Dear Marcus,

Recently, Marcus Karolewski wrote:
> Dear Matt,
>
> I have been looking through your (highly informative) PhD thesis today in the hope of finding some discussion of the zero-padding used in FT operations on XAFS data, but there doesn't seem to be any direct reference to the reasoning behind it, although I gather you padded your data up to 25 Ang^-1.
>
> I realise that FT padding makes the data look nice in R-space, but I'd like to ask whether you know of any other benefit to be gained from it. To my way of thinking, there seems to be a definite drawback to padding, insofar as one ends up with data that consist mostly of interpolated points in R-space. This would seem to make a mockery of the fitting criteria, which strictly should be based on the actual number of points justified by the range of the experimental data. I would be very interested to hear your comments if you can find the time.
Excellent question! I hope it's OK if I CC this to the ifeffit mailing list, as it's relevant to a current discussion there on interpolating QEXAFS data, and because others might be interested or have opinions on this as well.

Short answer: yes, zero-padding is mostly to make the data look nice in R-space. A mockery of the fitting criteria?? I wouldn't go that far!

Just for reference, the important values for the FTs are:

- kgrid: the k-grid spacing, which needs to be fine enough to get 'all the features' in chi(k).
- N_fft: the number of points in the FFT, which should be a power of 2, even if zero-padding is used.
- rmax: the largest frequency (that is, the largest R) that can be seen.
- rgrid: the resulting grid spacing in R, which should be small enough to avoid missing real features in R-space.

Ifeffit uses kgrid = 0.05 Ang^-1 and zero-pads chi(k) to N_fft = 2048. The outputs from feffit also use these values.

rmax and kgrid are simply related: rmax = pi / (2*kgrid) ~= 31.4 Ang for kgrid = 0.05 Ang^-1. The spectrum above rmax is a mirror image of the lower-R portion: |chi(R)| = |chi(2*rmax - R)| for R > rmax. 31.4 Ang is much further than the XAFS really extends, but we can use the high-R portion to estimate the noise in the signal (feffit does this). So we probably wouldn't want to go to kgrid = 0.10 Ang^-1 and rmax ~= 16 Ang. In fact, we might be tempted to use a smaller kgrid.

The spacing of the R-space data, rgrid, is given by rgrid = pi / (kgrid * N_fft). For N_fft = 2048 and kgrid = 0.05 Ang^-1, rgrid ~= 0.03 Ang. If we had N_fft = 512, the padded chi(k) array would end at kmax = 0.05 * 512 = 25.6 Ang^-1, which is probably reasonable: I've seen a few data sets extend past 25 Ang^-1, but these are rare. We'd also have rgrid = pi / (0.05 * 512) ~= 0.123 Ang. This has the advantage of giving data points in R that are closer to the 'real spacing' of pairs of independent data points in R-space, Delta R ~= pi / (k-range). The emphasis here should be on the '~=' (approximately equal) sign.
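As a quick check of the arithmetic above, here is a small Python sketch (standard library only) that computes rmax, rgrid and kmax from kgrid and N_fft using the relations quoted in this message. It also compares the number of R-grid points inside a fit window with the usual estimate of independent points, N_idp ~= 2 * (k-range) * (R-range) / pi, which follows from one pair of points per Delta R ~= pi / (k-range). The function names and the example fit window (k = 3 to 13 Ang^-1, R = 1 to 3 Ang) are hypothetical, chosen only for illustration; this is not code from ifeffit or feffit.

```python
import math

def ft_grid(kgrid, n_fft):
    """Return (rmax, rgrid, kmax) for an FFT of chi(k) sampled every kgrid Ang^-1."""
    rmax = math.pi / (2 * kgrid)       # highest R represented; |chi(R)| mirrors above this
    rgrid = math.pi / (kgrid * n_fft)  # spacing of the chi(R) output array
    kmax = kgrid * n_fft               # k at the end of the (zero-padded) chi(k) array
    return rmax, rgrid, kmax

def n_independent(k_range, r_range):
    """Usual estimate of independent points: 2 * dk * dR / pi (one point per pi/(2*dk))."""
    return 2 * k_range * r_range / math.pi

def grid_points(rgrid, r_lo, r_hi):
    """Approximate count of R-grid points falling in [r_lo, r_hi]."""
    return int((r_hi - r_lo) / rgrid) + 1

# Hypothetical fit window, for illustration only.
K_LO, K_HI = 3.0, 13.0   # Ang^-1
R_LO, R_HI = 1.0, 3.0    # Ang

for n_fft in (2048, 512):
    rmax, rgrid, kmax = ft_grid(0.05, n_fft)
    print(f"N_fft={n_fft}: rmax={rmax:.1f} Ang, rgrid={rgrid:.4f} Ang, "
          f"kmax={kmax:.1f} Ang^-1, R points in window={grid_points(rgrid, R_LO, R_HI)}")

print(f"N_idp ~= {n_independent(K_HI - K_LO, R_HI - R_LO):.1f}")
```

Run as-is, it reports about 66 R-grid points in the 2 Ang window for N_fft = 2048 and 17 for N_fft = 512, against N_idp ~= 12.7. So the R-grid point count and the number of independent points are different quantities: padding changes the first but not the second.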
It is tempting to interpret this Delta R formula as "my independent data points are exactly 0.XXX Ang apart", but this is a serious overstatement. A better interpretation is that distances separated by less than about pi / (2 * k-range) will be very difficult to distinguish.

Using a coarser R spacing does make the fits faster. In fact, feffit (the program) does NOT use N_fft = 2048 when doing the fits: it uses 256, 512, 1024, or 2048, depending on the data k-range. This does indeed speed up the fits (for all output arrays, N_fft = 2048 is used to make the plots prettier). Long ago (when computers ran on diesel), N_fft = 128 was also allowed, but this was not reliable because there was a good chance of getting stuck in a 'false minimum' with data so sparsely spaced in R. I saw such false minima a few times, and so took out the option of using N_fft = 128.

Because it's painful to constantly switch N_fft, ifeffit currently uses kgrid = 0.05 Ang^-1 and N_fft = 2048 throughout. I have started working toward making these changeable at will inside the program, but this still causes a few problems at the moment.

Anyway, I wouldn't say that using zero-padding and a large N_fft is really cheating much. It's really just oversampling the data. An FT is linear, so zero-padding does not change your data (much); it effectively interpolates between the 'independent data points' with smooth, roughly Gaussian-shaped peak functions (their exact shape is set by the k-window).

Finally, zero-padding does change your data somewhat, because it means that you are asserting that chi(k) = 0 for k > kmax. A better approach would be to pad with random noise at a level consistent with your data. This could be done, I suppose, but has not been done yet.

Hope that helps,
--Matt
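The oversampling point can be checked directly. In the pure-Python sketch below (a plain O(N^2) DFT rather than a real FFT, so that it needs no external libraries), a toy chi(k) on the kgrid = 0.05 Ang^-1 grid is transformed once as-is (128 points) and once zero-padded to 512 points. Every 4th point of the padded transform matches the unpadded transform, so padding only adds points in between; and because chi(k) is real, |chi(R)| is mirrored about rmax. The toy signal and the helper names `dft` and `zero_pad` are made up for illustration, and no k-weighting or window is applied, unlike a real XAFS transform.

```python
import cmath
import math

KGRID = 0.05  # Ang^-1, the k spacing ifeffit uses

def dft(x):
    """Plain O(N^2) discrete Fourier transform: X[m] = sum_t x[t] exp(-2*pi*i*t*m/N).

    With k_t = t*KGRID and R_m = m*pi/(KGRID*N), the phase is 2*k_t*R_m,
    i.e. the exp(2ikR)-style transform used for XAFS (up to the sign convention).
    """
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * m / n) for t in range(n))
            for m in range(n)]

def zero_pad(x, n_total):
    """Append zeros up to n_total points, i.e. assert chi(k) = 0 beyond the data."""
    return list(x) + [0.0] * (n_total - len(x))

# Toy chi(k): two shells at 2.0 and 2.6 Ang (made-up amplitudes, not real data).
N_DATA, N_PAD = 128, 512
ks = [t * KGRID for t in range(N_DATA)]
chi = [math.exp(-0.1 * k) * math.sin(2 * 2.0 * k) + 0.5 * math.sin(2 * 2.6 * k)
       for k in ks]

plain = dft(chi)                     # rgrid = pi / (0.05 * 128) ~= 0.49 Ang
padded = dft(zero_pad(chi, N_PAD))   # rgrid = pi / (0.05 * 512) ~= 0.12 Ang
step = N_PAD // N_DATA               # 4x oversampling in R

# 1) Oversampling: the padded transform passes through every unpadded point.
max_diff = max(abs(padded[step * m] - plain[m]) for m in range(N_DATA))

# 2) Mirror image about rmax: for real chi(k), |X[N-m]| == |X[m]|,
#    and index N-m corresponds to R = 2*rmax - R_m.
max_mirror = max(abs(abs(padded[m]) - abs(padded[N_PAD - m])) for m in range(1, N_PAD))

print(f"max |padded[4m] - plain[m]| = {max_diff:.2e}")
print(f"max mirror asymmetry       = {max_mirror:.2e}")
```

Both printed differences come out at floating-point round-off level. The padded array has four times as many R points, but it carries no information that was not already in the 128-point transform.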