Dear Marcus,
Recently, Marcus Karolewski wrote:
> Dear Matt,
>
> I have been looking thro' your (highly informative) PhD thesis today in the
> hope of finding some discussion about the zero padding used in FT
> operations on XAFS data, but there doesn't seem to be any direct reference
> to the reasoning behind it, although I gather you padded your data up to 25
> A-1.
>
> I realise that FT padding makes the data look nice in R-space, but I'd like
> to ask whether you know of any other benefit to be gained from it. To my
> way of thinking, there seems to be a definite drawback to padding insofar
> as one ends up with data that consist mostly of interpolated points in R
> space. This would seem to make a mockery of the fitting criteria which
> strictly should be based on the actual number of points justified by the
> range of the expl data. I would be very interested to hear your comments if
> you can find the time.

Excellent question! I hope it's OK if I CC this to the ifeffit
mailing list, as it's relevant to a current discussion there on
interpolating QEXAFS data, and because others might be
interested or have opinions on this as well.

Short answer: Yes, zero-padding is mostly to make the data
look nice in R-space. A mockery of the fitting criteria?? I
wouldn't go that far!

Just for reference, the important values for the FTs are:
  kgrid: the k-grid spacing, which must be fine enough to capture
         all the features in chi(k).
  N_fft: the number of points in the FFT, which should be a power
         of 2; chi(k) is zero-padded up to this length.
  rmax:  the largest R (the highest frequency) that can be seen.
  rgrid: the resulting grid spacing in R, which should be small
         enough to avoid missing real features in R-space.

Ifeffit uses kgrid=0.05 Ang^-1 and zero-pads chi(k) to
N_fft=2048 points. The outputs from feffit also use these values.
rmax and kgrid are simply related:
   rmax = pi / (2 * kgrid) ~= 31.4 Ang   for kgrid=0.05 Ang^-1
where the factor of 2 comes from chi(k) oscillating as sin(2kR).
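
As a quick check of that arithmetic, here is a minimal Python sketch. The function name rmax_from_kgrid is hypothetical and not part of ifeffit.

```python
import math


def rmax_from_kgrid(kgrid):
    """Largest R (Ang) visible for chi(k) sampled every kgrid (Ang^-1).

    chi(k) oscillates as sin(2kR), so the Nyquist limit in R is
    pi / (2 * kgrid) rather than pi / kgrid.
    """
    return math.pi / (2.0 * kgrid)


for kgrid in (0.05, 0.10):
    print(f"kgrid = {kgrid:.2f} Ang^-1 -> rmax = {rmax_from_kgrid(kgrid):.1f} Ang")
```

This prints rmax = 31.4 Ang for kgrid=0.05 and 15.7 Ang for kgrid=0.10.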
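
The spectrum above rmax is a mirror image of the lower-R
portion of the spectrum:
   |chi(R)| = |chi(2*rmax - R)|   for R > rmax
(for a real chi(k), the complex values at R and 2*rmax - R are
complex conjugates of one another).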
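
The mirror symmetry is easy to check numerically. The sketch below evaluates the transform directly (and slowly) with the exp(2ikR) kernel at R and at 2*rmax - R. The single-shell test signal, its taper, and the kgrid/sqrt(pi) normalization are assumptions for illustration, not taken from ifeffit.

```python
import cmath
import math

KGRID = 0.05                        # Ang^-1
RMAX = math.pi / (2 * KGRID)        # ~31.4 Ang


def chi_r(chi_k, r, kgrid=KGRID):
    """Direct discrete FT of a real chi(k) sampled on k = 0, kgrid, 2*kgrid, ..."""
    total = 0j
    for n, value in enumerate(chi_k):
        total += value * cmath.exp(2j * (n * kgrid) * r)
    return total * kgrid / math.sqrt(math.pi)


# Hypothetical single shell at 2.2 Ang, k = 0..15 Ang^-1, with a smooth taper.
chi_k = [math.sin(2 * 2.2 * n * KGRID) * math.sin(math.pi * n / 300) ** 2
         for n in range(300)]

low = chi_r(chi_k, 2.2)
high = chi_r(chi_k, 2 * RMAX - 2.2)
print(abs(low), abs(high))          # equal up to rounding
print(abs(high - low.conjugate()))  # ~0: the two values are complex conjugates
```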

31.4 Ang is much farther out than the XAFS really extends, but we
can use the high-R portion to estimate the noise in the signal
(feffit does this). So we probably wouldn't want to go to
kgrid=0.10 Ang^-1 and rmax ~= 16 Ang. In fact, we might be
tempted to use a smaller kgrid.
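
A rough sketch of that noise-estimate idea: take the RMS of |chi(R)| over a high-R window where no real XAFS signal is expected. The 15-25 Ang window, the test signals, and all names below are assumptions for illustration; this is not feffit's actual code. Evaluating the transform directly on the R grid of a 2048-point FFT gives the same values that the zero-padded FFT would give at those points.

```python
import cmath
import math
import random

KGRID = 0.05                           # Ang^-1
RGRID = math.pi / (KGRID * 2048)       # R spacing of a 2048-point FFT


def chi_r(chi_k, r):
    """Direct discrete FT (exp(2ikR) kernel) of a real chi(k) starting at k = 0."""
    total = sum(v * cmath.exp(2j * n * KGRID * r) for n, v in enumerate(chi_k))
    return total * KGRID / math.sqrt(math.pi)


def high_r_noise(chi_k, r_lo=15.0, r_hi=25.0):
    """RMS of |chi(R)| over [r_lo, r_hi]; the window is an assumed choice."""
    n_pts = int((r_hi - r_lo) / RGRID) + 1
    rs = [r_lo + i * RGRID for i in range(n_pts)]
    return math.sqrt(sum(abs(chi_r(chi_k, r)) ** 2 for r in rs) / n_pts)


rng = random.Random(0)
taper = [math.sin(math.pi * n / 300) ** 2 for n in range(300)]   # k = 0..15 Ang^-1
clean = [0.5 * math.sin(2 * 2.2 * n * KGRID) * t for n, t in enumerate(taper)]
noisy = [c + rng.gauss(0.0, 0.05) for c in clean]

print(f"clean: {high_r_noise(clean):.2e}   noisy: {high_r_noise(noisy):.2e}")
```

The clean signal leaves almost nothing at high R, while the noisy one does, which is what makes this region usable as a noise gauge.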

The spacing of the R-space data, rgrid, is given by
   rgrid = pi / (kgrid * N_fft)
For N_fft=2048 and kgrid=0.05 Ang^-1, rgrid ~= 0.03 Ang. If we had
N_fft = 512, the k array would extend to kmax = 0.05 * 512 =
25.6 Ang^-1, which is probably reasonable -- I've seen a few data
sets past 25 Ang^-1, but these are rare. We'd also have
   rgrid = pi / (0.05 * 512) ~= 0.123 Ang
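
The same arithmetic for each N_fft that feffit can use, as a small Python sketch (the helper name fft_grid is hypothetical):

```python
import math


def fft_grid(n_fft, kgrid=0.05):
    """Return (k extent in Ang^-1, R spacing in Ang) for an n_fft-point FFT."""
    return n_fft * kgrid, math.pi / (kgrid * n_fft)


for n_fft in (256, 512, 1024, 2048):
    k_extent, rgrid = fft_grid(n_fft)
    print(f"N_fft = {n_fft:4d}: k up to {k_extent:5.1f} Ang^-1, rgrid = {rgrid:.3f} Ang")
```

Going from 512 to 2048 points shrinks rgrid from about 0.123 to 0.031 Ang; the extra k range is all zero-padding.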

This has the advantage of giving data points in R that are
closer to the 'real spacing' of pairs of independent data
points in R-space: Delta R ~= pi / (K-range). The emphasis
here should be on the 'approximately equal' sign. It is tempting
to read this formula as "my independent data points are
exactly 0.XXX Ang apart", but that is a serious overstatement.
A better interpretation is that distances separated by less
than (2pi/K-range) will be very difficult to distinguish.
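
To put rough numbers on this, the sketch below compares the pair spacing pi/(K-range) with rgrid, i.e. about how many R points fall within one 'independent pair'. The k-range of 2 to 14 Ang^-1 is a hypothetical example. Given the caveat above, treat the result as an approximate oversampling factor, nothing more precise.

```python
import math


def pair_spacing(k_range):
    """Approximate R spacing (Ang) of pairs of independent points: pi / K-range."""
    return math.pi / k_range


def oversampling(k_range, n_fft, kgrid=0.05):
    """R-grid points per pair spacing; this reduces to kgrid * n_fft / k_range."""
    rgrid = math.pi / (kgrid * n_fft)
    return pair_spacing(k_range) / rgrid


k_range = 14.0 - 2.0   # hypothetical data range, Ang^-1
print(f"pair spacing ~ {pair_spacing(k_range):.2f} Ang")
for n_fft in (512, 2048):
    print(f"N_fft = {n_fft}: ~{oversampling(k_range, n_fft):.1f} R points per pair")
```

With N_fft=2048 each pair is sampled about 8.5 times, versus about 2 times with N_fft=512.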

Using a coarser R spacing does make the fits faster. In fact,
feffit (the program) does NOT use N_fft = 2048 when doing the
fits -- it uses 256, 512, 1024, or 2048 depending on the
data k-range, and this does speed up the fits. For all output
arrays, N_fft=2048 is used so that the plots look nicer.
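
How feffit picks among these sizes isn't spelled out here. One plausible rule, written purely as a hypothetical sketch and not feffit's actual logic, is to take the smallest allowed N_fft whose k array covers the data:

```python
ALLOWED_NFFT = (256, 512, 1024, 2048)   # the sizes mentioned in the text


def choose_nfft(kmax, kgrid=0.05):
    """Hypothetical rule: smallest allowed N_fft with n_fft * kgrid >= kmax.

    Falls back to the largest size if even that cannot cover kmax.
    """
    for n_fft in ALLOWED_NFFT:
        if n_fft * kgrid >= kmax:
            return n_fft
    return ALLOWED_NFFT[-1]


for kmax in (12.0, 16.0, 30.0):
    print(kmax, choose_nfft(kmax))
```

A real implementation would probably also want a floor on N_fft so that rgrid never gets too coarse; the trouble with N_fft=128 is a reason for that caution.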

Long ago (when computers ran on diesel), N_fft=128 was also
allowed, but this was not reliable because there was a good
chance of getting stuck in a false minimum with data that
were sparsely spaced in R. I saw such false minima happen a few
times, and removed the option of using N_fft = 128.

Because it's painful to keep switching N_fft, ifeffit
currently uses kgrid=0.05 and N_fft = 2048 throughout. I have
started working toward making these changeable at will
inside the program, but this still causes a few problems at the
moment.

Anyway, I wouldn't say that using zero padding and a large
N_fft is really cheating much. It's really just oversampling
the data. Zero-padding does not change your data (much): it
only fills in interpolated points between the 'independent
data points', in effect convolving them with smooth,
sinc-like functions.
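
A small numerical check of the 'oversampling' point, in plain Python: padding a sequence to twice its length with zeros and transforming it gives every original DFT value back at the even-numbered points, with interpolated values in between. The test sequence is hypothetical, and a generic DFT sign convention is used rather than the exp(2ikR) kernel of XAFS.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform (an FFT library would be used in practice)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * m / n) for j in range(n))
            for m in range(n)]


# Hypothetical chi(k)-like test sequence of 32 points.
chi = [math.sin(0.9 * j) * math.exp(-0.02 * j) for j in range(32)]
coarse = dft(chi)
fine = dft(chi + [0.0] * 32)        # zero-pad to 64 points

# Even-indexed points of the padded transform reproduce the unpadded DFT;
# the odd-indexed points are interpolated between them.
max_diff = max(abs(fine[2 * m] - coarse[m]) for m in range(len(coarse)))
print(f"max |fine[2m] - coarse[m]| = {max_diff:.1e}")
```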

Finally, zero-padding does change your data somewhat, because it
amounts to asserting that chi(k)=0 for k>kmax. A better
approach would be to pad with random noise at a level
consistent with your data. This could be done, I suppose, but
has not been done yet.
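
Since this hasn't been implemented, the following is only a minimal sketch of what padding with noise could look like. It assumes you already have an estimate sigma of the noise level in chi(k) (for example from the high-R region), and the function name pad_with_noise is hypothetical.

```python
import random


def pad_with_noise(chi_k, n_fft, sigma, seed=None):
    """Pad chi_k to n_fft points with Gaussian noise of width sigma instead of zeros.

    sigma is assumed to be an independent estimate of the noise level in chi_k.
    """
    if len(chi_k) > n_fft:
        raise ValueError("chi_k is longer than n_fft")
    rng = random.Random(seed)
    padding = [rng.gauss(0.0, sigma) for _ in range(n_fft - len(chi_k))]
    return list(chi_k) + padding


data = [0.1, -0.2, 0.15]           # stand-in for measured chi(k) values
padded = pad_with_noise(data, 2048, sigma=0.01, seed=1)
print(len(padded), padded[:3])
```

This sketch assumes a constant sigma across the padded k range; whether the noise level should instead vary with k is left open.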

Hope that helps,
--Matt