Hi Matthew, On Wed, 12 Dec 2012, Matthew Marcus wrote:
On 12/12/2012 8:03 PM, Matt Newville wrote:
Hi Matthew, Bruce, All,
Sorry for not being able to join this discussion earlier. I agree that having glitch-free data is preferred. But I also think it's OK to simply remove diffraction peaks from XAFS data -- you're just asserting that you know those values are not good measurements of mu(E). I don't see it as all that different from removing obvious outliers from fluorescence channels from a multi-element detector -- though that has the appeal of keeping good measurements at a particular energy.
That's kind of my point. The standard methods, which require uniform tabulation, cause you to fill gaps in data by interpolation. Effectively, you're claiming knowledge of data that you don't actually have. Better to use a method which preserves agnosticism about the bad data. The method I described for deglitching with reference scalers tries to preserve what information you do have by allowing you to ignore some channels. That only works if there are 'good' channels to serve as references.
I would say that identifying a point as an invalid measure of mu(E) and simply not including it in the analysis is OK: it is claiming knowledge that we didn't measure that point well. I just don't see it as much different from claiming that outliers in the set of fluorescence intensities are not good measures of fluorescence intensity. You might argue (I would agree) that with multiple measures of fluorescence you can determine outliers more rigorously. But I would also say that the best thing to do then is not to simply remove outliers but to take the average AND standard deviation of the samples and include both in the analysis. A glitch would then not be removed, but would be reflected by having a less certain value for that energy point.
I'm not sure that "slow FT" versus "discrete FT" is that important here, though perhaps I'm missing your meaning. My vies is that The EXAFS signal is band-limited (finite k range due to F(k) and 1/k, and finite R range due to F(k), lambda(k), and 1/R^2), so that sampling on reasonably fine grids is going to preserve all the information that is really there.
Again, FFT requires a uniform tabulation of data in k-space, which is generally secured by interpolation. Traditionally, this has been histogram interpolation, which tries to take into account the possibility of having multiple data points within one of the new intervals in k. That method, however, has to 'make up' data when there isn't any in one or more intervals.
Actually, there are three possible tiers of speed. There's FFT, there's discrete FT done the slow way by computing the sums as they appear in the textbook (possibly good if you want a funny grid in R space), and then there's what I propose.
I agree with your larger point that we can definitely afford the CPU cycles to do better analysis than use something simply because it is Fast. I'm not sure that a slow discrete FT would be significantly different than an FFT, though. I think (but am listening...) that the continuous v discrete FT is like classic audiophile snobbery about the superiority of analog over digital music. With sufficient sampling and resolution and dynamic range in the signal, the difference has to be trivial. I'm willing to believe that a digital recording has less dynamic range, but I'm also willing to believe that this is swamped by my stereo and ears (and I mostly listen to mp3s through cheap headphones these days, so I clearly am not buying a tube amplifier anytime soon...).

But for EXAFS, we know the limits of the signals in k and R (say, 50 Ang^-1 and 25 Ang for extrema), and so can choose samplings that definitely over-sample enough to not cause significant artifacts. There can be dynamic range issues (especially with standard pre-amplifiers and VtoF converters), but I think this is a slightly separate issue from "the glitch problem".

Interpolation methods are definitely worth discussing and studying. I'm not sure that the energy values that happened to have been where the monochromator was stopped for a measurement of mu(E) are more sacred or a better sampling of the first shell of an EXAFS spectrum than a set of energies interpolated from those measurements to be on a uniform k grid, provided that the original data isn't woefully undersampled. With a typical step size of 0.05 Ang^-1 you can easily miss several points and still accurately measure distances up to 5 or probably even 10 Angstroms.
I do think that having richer window functions and spectral weightings would be very useful. You could view chi(k) data with glitches as wanting a window function that had a very small value (possibly zero: remove this point) exactly at the glitch, but had a large value (~1) at nearby k-values.
Another (and possibly best) approach is to assign an uncertainty to each k value of chi. At the glitch, the uncertainty needs to go way up, so that not matching that point does not harm the overall fit.
A window function which went to 0 in the glitch region is tantamount to an assertion that chi(k)=0 there, which is generally not so and is probably a lot worse than the error you make by even the lamest interpolation through the glitch. The uncertainty makes more sense, except that if you filter data, the noise spreads out in a correlated manner over adjacent regions. Also, the tacit assumption of an uncertainty-based method is that the noise is uncorrelated within the high-uncertainty region. That said, I can imagine that this is one of those methods that works better than it has any right to.
I think of the window as weighting of measurements to consider, and that a weight of zero is applied to data with infinite uncertainty. I don't think of chi(k) as zero outside the window range (say, below kmin or above kmax), just that we don't want to consider that in the analysis. Again, I think the better thing would be to have an uncertainty for each value of chi(k), and propagate that through the analysis.
Now, in my proposed system, how would you fit filtered (q-space) data? The first step is to do the very slow FT (VSFT) on the input data with no interpolation. This leads to a set of sines and cosines which replicate the data. I would then apply the window and do the back-summation of all those sines and cosines at the given points. This will undoubtedly introduce some artifacts near the edges of the deleted regions simply due to our ignorance of what the data really do in there. Next, in each fit iteration, I would evaluate the model function at each point, then do the VSFT and filtering exactly as had been done on the data, which causes the fit to have the same artifacts. Since the VSFT and filter operations are linear, it should be possible to pre-compute a kernel which relates the values of unfiltered data to those of filtered data. This would be a matrix of size Np*Np with Np=# of points. Using the kernel instead of evaluating gazillions of trig functions would be reasonably fast; I suspect that ifeffit does a similar trick. Similarly, fitting in R-space would be accomplished by computing the VSFT on data and model function.
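In rough numpy terms, the precomputed kernel could look something like the sketch below. This is only an illustration of the idea (not necessarily what ifeffit does, and the function name and grids are made up): fit the irregularly tabulated chi(k) to sines and cosines of 2kR by least squares, apply an R-space window to the coefficients, and sum back at the original k points. Because every step is linear, the whole filter collapses into one Np x Np matrix.

import numpy as np

def vsft_filter_kernel(k, rgrid, window_r):
    """Np x Np matrix mapping chi(k) -> filtered chi(k) at the same k points."""
    phases = 2.0 * np.outer(k, rgrid)                    # shape (Np, Nr)
    basis = np.hstack([np.sin(phases), np.cos(phases)])  # shape (Np, 2*Nr)
    forward = np.linalg.pinv(basis)                      # least-squares "VSFT" coefficients
    w = np.concatenate([window_r, window_r])             # window both sine and cosine parts
    return basis @ (w[:, None] * forward)                # back-sum at the input k points

# usage: chi_filtered = vsft_filter_kernel(k, rgrid, window_r) @ chi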
Yes, I know I talk a good game. Unfortunately, I'm too swamped to attempt implementation. Also, I'd have to reinvent a number of very well-engineered wheels to do it.
I hear you! I'm interested in pursuing "more advanced weightings and transforms", but don't have much time to spend on such things these days, so development is slow....
Larch has a separate Transform class used for each dataset in a fit - this includes the standard FT parameters and fit ranges. It is intended to be able to do this sort of advanced windowing (and in principle, advanced weighting too). Currently (I'd be ready for a release but am commissioning our new beamline and am hoping to post a windows installer by the end of the month) this has "the standard XAFS window functions", but one can create their own window functions (for k- and/or R-space) with suppressed points/regions, etc, and use them as well. I have to admit I haven't tried this, but it was definitely part of the intention, and should work. My hope is to extend this to include some wavelet-like weightings as well.
What is Larch? Is it a replacement for ifeffit? By 'class' do you mean object as in OOP? "And now for something completely different. The larch." - Monty Python mam
Yes, Mostly yes, and, Yes, you get the reference exactly ("How to recognize different types of trees from quite a long way away, Number 1: The Larch"). http://xraypy.github.com/xraylarch/

Another wide-open appeal: I'd love help in implementing improvements like these (better windows, comparing continuous v discrete transforms, using wavelets, etc), and think work like this would be publishable. There are several other improvements I'd like to see, such as generating Feff Paths on-the-fly, and so forth. Essentially all of Larch is in python so that it's very easy to modify and add features like this. The available numerical libraries for python are plentiful and well-maintained. It's also easy to add wrappers for C and Fortran code if that is needed.....

--Matt
You might argue (I would agree) that with multiple measures of fluorescence you can determine outliers more rigorously. But I would also say that the best thing to do then is not to simply remove outliers but to take the average AND standard deviation of the samples and include both in the analysis. A glitch would then not be removed, but would be reflected by having a less certain value for that energy point.

But how do you know what the standard deviation should be for a point you had to exclude? I suppose you could try to come up with some plausibility measure which says that it can't be more different from the results of interpolation than some amount, a heuristic measure. The idea there would be that you do a best-effort deglitch and then assign an uncertainty based on how badly you may have done at it. This still makes me queasy. In a typical case, the glitch dominates over the real signal so you actually have *no* information about what the signal was there except for what you know from the surrounding points. Thus, attempting to use the glitched-out region is really an attempted double-use of the surrounding points. Better to adapt your methods so that they can deal with non-uniform tabulation, which you have anyway.
I agree with your larger point that we can definitely afford the CPU cycles to do better analysis than use something simply because it is Fast. I'm not sure that a slow discrete FT would be significantly different than an FFT, though. I think (but am listening...) that the continuous v discrete FT is like classic audiophile snobbery about the superiority of analog over digital music. With sufficient sampling and resolution and dynamic range in the signal, the difference has to be trivial. I'm willing to believe that a digital recording has less dynamic range, but I'm also willing to believe that this is swamped by my stereo and ears (and I mostly listen to mp3s through cheap headphones these days, so I clearly am not buying a tube amplifier anytime soon...).

You're right that the DFT is exactly equivalent to the FFT. However, the VSFT is different in that it works on non-uniformly tabulated data with no interpolation. I've heard of an AB test in which Golden Ears were challenged with two amps, one of which was solid-state and the other also solid-state but with some 2nd-harmonic distortion and (I think) noise added to simulate a tube amp. Guess which won? Somebody once wrote a parody in which mercury-filled speaker cables were advertised for their "liquid, shimmering sound", but I digress.
But for EXAFS, we know the limits of the signals in k and R (say, 50 Ang^-1 and 25 Ang for extrema), and so can choose samplings that definitely over-sample enough to not cause significant artifacts. There can be dynamic range issues (especially with standard pre-amplifiers and VtoF converters), but I think this is a slightly separate issue from "the glitch problem".
Are you suggesting changes to the methods used to pick points at which to take data? The original discussion had to do with data which had already been taken. An example of what you're talking about - the QXAS setup on my beamline scans the mono at a constant angular speed and samples on-the-fly, like an on-the-fly fluorescence map and using the same hardware. Thus, at the end of the spectrum, to get longer dwell times, you have to pick larger energy steps, which means that you're smearing each point because each point is the average over, say, 5eV, rather than being taken at one energy. However, a simple calculation shows that with reasonable stepsizes, you don't lose anything significant even out to 10 Ang.
Interpolation methods are definitely worth discussing and studying. I'm not sure that the energy values that happened to have been where the monochromator was stopped for a measurement of mu(E) are more sacred or a better sampling of the first shell of an EXAFS spectrum than a set of energies interpolated from those measurements to be on a uniform k grid, provided that the original data isn't woefully undersampled. With a typical step size of 0.05 Ang^-1 you can easily miss several points and still accurately measure distances up to 5 or probably even 10 Angstroms.
That's kind of the idea behind the histogram interpolation used in at least some FT programs. Let's say that the data need to be interpolated on a 0.05 Ang^-1 grid, and there happen to be points at 3.01, 3.03 and 3.04A^-1. Then, chi_out(3.00A^-1) = (4/5)*chi_in(3.01A^-1)+(2/5)*chi_in(3.03A^-1)+(1/5)*chi_in(3.04A^-1)+ contributions from points to the left of 3.00, and similar for chi_out(3.05A^-1). If there are no data points between 3.00 and 3.05, then values are assigned by linear interpolation from the ones on either side of the 3.00-3.05 interval.

Getting back to the original problem, suppose you have a Bragg glitch whose wings go out far enough that you have to lop off the top of an EXAFS wiggle. If you try to interpolate over the gap, you will probably screw up the amplitude and maybe the phase in that area. This may not affect a 1st-shell measurement with only one kind of neighbor, but it could do nasty things to anything more complex.
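For concreteness, a rough numpy sketch of that histogram rule (illustrative only; this version normalizes each grid point by the sum of its weights, which the example above leaves implicit, and the function name is made up):

import numpy as np

def histogram_rebin(k_in, chi_in, k_grid):
    """Rebin irregular chi(k_in) onto a uniform float grid k_grid."""
    dk = k_grid[1] - k_grid[0]
    num = np.zeros_like(k_grid)
    den = np.zeros_like(k_grid)
    for k, c in zip(k_in, chi_in):
        j = int(np.floor((k - k_grid[0]) / dk))
        for jj in (j, j + 1):                          # the two bracketing grid points
            if 0 <= jj < len(k_grid):
                w = max(0.0, 1.0 - abs(k - k_grid[jj]) / dk)
                num[jj] += w * c
                den[jj] += w
    good = den > 0
    chi_out = np.empty_like(k_grid)
    chi_out[good] = num[good] / den[good]
    # intervals with no data at all: linear interpolation from filled neighbors
    chi_out[~good] = np.interp(k_grid[~good], k_grid[good], chi_out[good])
    return chi_out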
I do think that having richer window functions and spectral weightings would be very useful. You could view chi(k) data with glitches as wanting a window function that had a very small value (possibly zero: remove this point) exactly at the glitch, but had a large value (~1) at nearby k-values.
Another (and possibly best) approach is to assign an uncertainty to each k value of chi. At the glitch, the uncertainty needs to go way up, so that not matching that point does not harm the overall fit.
A window function which went to 0 in the glitch region is tantamount to an assertion that chi(k)=0 there, which is generally not so and is probably a lot worse than the error you make by even the lamest interpolation through the glitch. The uncertainty makes more sense, except that if you filter data, the noise spreads out in a correlated manner over adjacent regions. Also, the tacit assumption of an uncertainty-based method is that the noise is uncorrelated within the high-uncertainty region. That said, I can imagine that this is one of those methods that works better than it has any right to.
I think of the window as weighting of measurements to consider, and that a weight of zero is applied to data with infinite uncertainty. I don't think of chi(k) as zero outside the window range (say, below kmin or above kmax), just that we don't want to consider that in the analysis. Again, I think the better thing would be to have an uncertainty for each value of chi(k), and propagate that through the analysis.
That's not what's usually meant by windowing. Usually, a window is a multiplying factor. Now, that said, if you apply the same multiplying factor to the results of a fit, you're effectively doing a weighting. Sumsq = sum(W[i]*(y[i]-yfit[i])^2) = sum((w[i]*y[i]-w[i]*yfit[i])^2) for W[i]=w[i]^2 . Also, the issue of weighting gets into some deep and murky waters. The idealist would say that one should weight according to some measure of noise statistics, such as sqrt(N) noise. However, it's not usual for the spectrum to be as good as one would expect from sqrt(N), and how do you define N when you're using analog counters like ion chambers or Lytle detectors anyway? Also, systematic noise is often point-to-point correlated so not white. Further, people like to play with varying k-weight exponents in order to make certain features show up. Now add the popular practice of fitting in R-space, wherein every point you fit is actually a linear combination of the input data so the noise in Re(chi~(R)) and Im(chi~(R)) is definitely not white. That's one can of worms I don't plan to open, not even to go fishing with.
Now, in my proposed system, how would you fit filtered (q-space) data? The first step is to do the very slow FT (VSFT) on the input data with no interpolation. This leads to a set of sines and cosines which replicate the data. I would then apply the window and do the back-summation of all those sines and cosines at the given points. This will undoubtedly introduce some artifacts near the edges of the deleted regions simply due to our ignorance of what the data really do in there. Next, in each fit iteration, I would evaluate the model function at each point, then do the VSFT and filtering exactly as had been done on the data, which causes the fit to have the same artifacts. Since the VSFT and filter operations are linear, it should be possible to pre-compute a kernel which relates the values of unfiltered data to those of filtered data. This would be a matrix of size Np*Np with Np=# of points. Using the kernel instead of evaluating gazillions of trig functions would be reasonably fast; I suspect that ifeffit does a similar trick. Similarly, fitting in R-space would be accomplished by computing the VSFT on data and model function.
Yes, I know I talk a good game. Unfortunately, I'm too swamped to attempt implementation. Also, I'd have to reinvent a number of very well-engineered wheels to do it.
I hear you! I'm interested in pursuing "more advanced weightings and transforms", but don't have much time to spend on such things these days, so development is slow....
Larch has a separate Transform class used for each dataset in a fit - this includes the standard FT parameters and fit ranges. It is intended to be able to do this sort of advanced windowing (and in principle, advanced weighting too). Currently (I'd be ready for a release but am commissioning our new beamline and am hoping to post a windows installer by the end of the month) this has "the standard XAFS window functions", but one can create their own window functions (for k- and/or R-space) with suppressed points/regions, etc, and use them as well. I have to admit I haven't tried this, but it was definitely part of the intention, and should work. My hope is to extend this to include some wavelet-like weightings as well.
What is Larch? Is it a replacement for ifeffit? By 'class' do you mean object as in OOP? "And now for something completely different. The larch." - Monty Python mam
Yes, Mostly yes, and, Yes, you get the reference exactly ("How to recognize different types of trees from quite a long way away, Number 1: The Larch"). http://xraypy.github.com/xraylarch/
Another wide-open appeal: I'd love help in implementing improvements like these (better windows, comparing continuous v discrete transforms, using wavelets, etc), and think work like this would be publishable. There are several other improvements I'd like to see, such as generating Feff Paths on-the-fly, and so forth. Essentially all of Larch is in python so that it's very easy to modify and add features like this. The available numerical libraries for python are plentiful and well-maintained. It's also easy to add wrappers for C and Fortran code if that is needed.....
Sigh. If I had infinite time or multiple copies of myself, I'd learn Python. It seems to be the way to go for this kind of work. For the last 12 years or so, I've been working almost exclusively in LabVIEW.
--Matt
Hi Matthew,
On Thu, Dec 13, 2012 at 11:28 PM, Matthew Marcus
You might argue (I would agree) that with multiple measures of fluorescence you can determine outliers more rigorously. But I would also say that the best thing to do then is not to simply remove outliers but to take the average AND standard deviation of the samples and include both in the analysis. A glitch would then not be removed, but would be reflected by having a less certain value for that energy point.
But how do you know what the standard deviation should be for a point you had to exclude? I suppose you could try to come up with some plausibility measure which says that it can't be more different from the results of interpolation than some amount, a heuristic measure. The idea there would be that you do a best-effort deglitch and then assign an uncertainty based on how badly you may have done at it. This still makes me queasy. In a typical case, the glitch dominates over the real signal so you actually have *no* information about what the signal was there except for what you know from the surrounding points. Thus, attempting to use the glitched-out region is really an attempted double-use of the surrounding points. Better to adapt your methods so that they can deal with non-uniform tabulation, which you have anyway.
Oh, I meant that for each energy point measured you could use the different fluorescence channels to compute a mean value for I_f and standard deviation for I_f, and then propagate both measurement and uncertainty through the entire analysis. For a point with a glitch, the standard deviation would be very high, and so that region of k space would receive less weight. I'd view data not measured not as having a value of 0 but as having infinite uncertainty. In that sense, removing a glitch doesn't do any harm - we're treating it as if we have no measurement at that energy point.
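In code form that is roughly the following (a sketch with assumed array shapes, not anyone's actual acquisition or analysis code):

import numpy as np

def fluor_mean_and_sigma(ifluor, i0):
    """ifluor: (n_energies, n_channels) fluorescence counts; i0: (n_energies,) monitor."""
    ratio = ifluor / i0[:, None]
    mu = ratio.mean(axis=1)
    # standard error of the mean across channels; a glitch confined to a few
    # channels inflates sigma at that energy, so 1/sigma weighting in the fit
    # automatically down-weights that point
    sigma = ratio.std(axis=1, ddof=1) / np.sqrt(ratio.shape[1])
    return mu, sigma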
I agree with your larger point that we can definitely afford the CPU cycles to do better analysis than use something simply because it is Fast. I'm not sure that a slow discrete FT would be significantly different than an FFT, though. I think (but am listening...) that the continuous v discrete FT is like classic audiophile snobbery about the superiority of analog over digital music. With sufficient sampling and resolution and dynamic range in the signal, the difference has to be trivial. I'm willing to believe that a digital recording has less dynamic range, but I'm also willing to believe that this is swamped by my stereo and ears (and I mostly listen to mp3s through cheap headphones these days, so I clearly am not buying a tube amplifier anytime soon...).
You're right that the DFT is exactly equivalent to the FFT. However, the VSFT is different in that it works on non-uniformly tabulated data with no interpolation. I've heard of an AB test in which Golden Ears were challenged with two amps, one of which was solid-state and the other also solid-state but with some 2nd-harmonic distortion and (I think) noise added to simulate a tube amp. Guess which won? Somebody once wrote a parody in which mercury-filled speaker cables were advertised for their "liquid, shimmering sound", but I digress.
I agree / understand that a VSFT is or can be different from a DFT. For me the point is that with sufficient sampling and bit depth, the differences go to zero. And I think we know how to sample XAFS well enough. For digital audio there's a fair argument about bit depth / dynamic range and whether people can actually detect frequencies above 18 kHz. But there aren't such arguments for EXAFS (yet): no one is (yet) measuring EXAFS reliably to 50 Ang^-1.
Are you suggesting changes to the methods used to pick points at which to take data?
Well, that might be worth considering, but I'm mostly just saying that the way we pick points now is pretty arbitrary for the analysis we want to do. That is, what we need is to sample the XAFS out to 8 or 10 Ang. With steps in k of 0.05Ang^-1 (typical in my experience), you're sampling out to 31.4Ang, though I'm willing to say you might see some aliasing above 20Ang or so, and if there really is signal beyond 15Ang you might want to sample more finely. But I think a "normal sampling" is going to reproduce 8 Ang well enough to miss an occasional point and apply any sort of linear or cubic interpolation. It's interesting to think about whether (and how) we could do better.
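The arithmetic behind those numbers, spelled out:

import numpy as np

dk = 0.05                   # Ang^-1, a typical k-space step
r_max = np.pi / (2 * dk)    # conjugate range for chi(k) ~ sin(2kR): ~31.4 Ang
print(r_max)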
The original discussion had to do with data which had already been taken. An example of what you're talking about - the QXAS setup on my beamline scans the mono at a constant angular speed and samples on-the-fly, like an on-the-fly fluorescence map and using the same hardware. Thus, at the end of the spectrum, to get longer dwell times, you have to pick larger energy steps, which means that you're smearing each point because each point is the average over, say, 5eV, rather than being taken at one energy. However, a simple calculation shows that with reasonable stepsizes, you don't lose anything significant even out to 10 Ang.
Right, and how to best convert QXAS data (way over-sampled at high k) to chi(k) is a good question. I think most people just do a boxcar average.
Interpolation methods are definitely worth discussing and studying. I'm not sure that the energy values that happened to have been where the monochromator was stopped for a measurement of mu(E) are more sacred or a better sampling of the first shell of an EXAFS spectrum than a set of energies interpolated from those measurements to be on a uniform k grid, provided that the original data isn't woefully undersampled. With a typical step size of 0.05 Ang^-1 you can easily miss several points and still accurately measure distances up to 5 or probably even 10 Angstroms.
That's kind of the idea behind the histogram interpolation used in at least some FT programs. Let's say that the data need to be interpolated on a 0.05 Ang^-1 grid, and there happen to be points at 3.01, 3.03 and 3.04A^-1. Then, chi_out(3.00A^-1) = (4/5)*chi_in(3.01A^-1)+(2/5)*chi_in(3.03A^-1)+(1/5)*chi_in(3.04A^-1)+ contributions from points to the left of 3.00, and similar for chi_out(3.05A^-1). If there are no data points between 3.00 and 3.05, then values are assigned by linear interpolation from the ones on either side of the 3.00-3.05 interval.
Getting back to the original problem, suppose you have a Bragg glitch whose wings go out far enough that you have to lop off the top of an EXAFS wiggle. If you try to interpolate over the gap, you will probably screw up the amplitude and maybe the phase in that area. This may not affect a 1st-shell measurement with only one kind of neighbor, but it could do nasty things to anything more complex.
I agree with all of that... It's hard to tell how far in energy a diffraction glitch will extend.
I do think that having richer window functions and spectral weightings would be very useful. You could view chi(k) data with glitches as wanting a window function that had a very small value (possibly zero: remove this point) exactly at the glitch, but had a large value (~1) at nearby k-values.
Another (and possibly best) approach is to assign an uncertainty to each k value of chi. At the glitch, the uncertainty needs to go way up, so that not matching that point does not harm the overall fit.
A window function which went to 0 in the glitch region is tantamount to an assertion that chi(k)=0 there, which is generally not so and is probably a lot worse than the error you make by even the lamest interpolation through the glitch. The uncertainty makes more sense, except that if you filter data, the noise spreads out in a correlated manner over adjacent regions. Also, the tacit assumption of an uncertainty-based method is that the noise is uncorrelated within the high-uncertainty region. That said, I can imagine that this is one of those methods that works better than it has any right to.
I think of the window as weighting of measurements to consider, and that a weight of zero is applied to data with infinite uncertainty. I don't think of chi(k) as zero outside the window range (say, below kmin or above kmax), just that we don't want to consider that in the analysis. Again, I think the better thing would be to have an uncertainty for each value of chi(k), and propagate that through the analysis.
That's not what's usually meant by windowing. Usually, a window is a multiplying factor.
That's similar to weighting by the confidence in your data...
Now, that said, if you apply the same multiplying factor to the results of a fit, you're effectively doing a weighting. Sumsq = sum(W[i]*(y[i]-yfit[i])^2) = sum((w[i]*y[i]-w[i]*yfit[i])^2) for W[i]=w[i]^2 .
Exactly. You also might include a 1/eps[i] in the terms in the sum, which can be folded into (or, is not really different from) the window function.
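Schematically (just the idea, not how Ifeffit actually implements it; model and p0 below are placeholders), the window and the 1/eps weighting are the same multiplication on the residual:

import numpy as np

def weighted_residual(y, yfit, window, eps):
    # sum of squares of this is sum( window[i] * (y[i]-yfit[i])^2 / eps[i]^2 );
    # eps -> infinity (or window -> 0) drops a point from the fit without
    # asserting that chi is zero there.
    w = np.sqrt(window) / np.asarray(eps, dtype=float)
    return w * (y - yfit)

# e.g. scipy.optimize.least_squares(lambda p: weighted_residual(y, model(p), win, eps), p0)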
Also, the issue of weighting gets into some deep and murky waters. The idealist would say that one should weight according to some measure of noise statistics, such as sqrt(N) noise. However, it's not usual for the spectrum to be as good as one would expect from sqrt(N), and how do you define N when you're using analog counters like ion chambers or Lytle detectors anyway? Also, systematic noise is often point-to-point correlated so not white. Further, people like to play with varying k-weight exponents in order to make certain features show up. Now add the popular practice of fitting in R-space, wherein every point you fit is actually a linear combination of the input data so the noise in Re(chi~(R)) and Im(chi~(R)) is definitely not white. That's one can of worms I don't plan to open, not even to go fishing with.
Well, sqrt(N) implies a counting noise. For a glitch, you could just say that the uncertainty is dominated by non-normal statistics and is either very large, or even infinite, in which case you can set w[i] to zero.
Now, in my proposed system, how would you fit filtered (q-space) data? The first step is to do the very slow FT (VSFT) on the input data with no interpolation. This leads to a set of sines and cosines which replicate the data. I would then apply the window and do the back-summation of all those sines and cosines at the given points. This will undoubtedly introduce some artifacts near the edges of the deleted regions simply due to our ignorance of what the data really do in there. Next, in each fit iteration, I would evaluate the model function at each point, then do the VSFT and filtering exactly as had been done on the data, which causes the fit to have the same artifacts. Since the VSFT and filter operations are linear, it should be possible to pre-compute a kernel which relates the values of unfiltered data to those of filtered data. This would be a matrix of size Np*Np with Np=# of points. Using the kernel instead of evaluating gazillions of trig functions would be reasonably fast; I suspect that ifeffit does a similar trick. Similarly, fitting in R-space would be accomplished by computing the VSFT on data and model function.
Yes, I know I talk a good game. Unfortunately, I'm too swamped to attempt implementation. Also, I'd have to reinvent a number of very well-engineered wheels to do it.
I hear you! I'm interested in pursuing "more advanced weightings and transforms", but don't have much time to spend on such things these days, so development is slow....
Larch has a separate Transform class used for each dataset in a fit - this includes the standard FT parameters and fit ranges. It is intended to be able to do this sort of advanced windowing (and in principle, advanced weighting too). Currently (I'd be ready for a release but am commissioning our new beamline and am hoping to post a windows installer by the end of the month) this has "the standard XAFS window functions", but one can create their own window functions (for k- and/or R-space) with suppressed points/regions, etc, and use them as well. I have to admit I haven't tried this, but it was definitely part of the intention, and should work. My hope is to extend this to include some wavelet-like weightings as well.
What is Larch? Is it a replacement for ifeffit? By 'class' do you mean object as in OOP? "And now for something completely different. The larch." - Monty Python mam
Yes, Mostly yes, and, Yes, you get the reference exactly ("How to recognize different types of trees from quite a long way away, Number 1: The Larch"). http://xraypy.github.com/xraylarch/
Another wide-open appeal: I'd love help in implementing improvements like these (better windows, comparing continuous v discrete transforms, using wavelets, etc), and think work like this would be publishable. There are several other improvements I'd like to see, such as generating Feff Paths on-the-fly, and so forth. Essentially all of Larch is in python so that it's very easy to modify and add features like this. The available numerical libraries for python are plentiful and well-maintained. It's also easy to add wrappers for C and Fortran code if that is needed.....
Sigh. If I had infinite time or multiple copies of myself, I'd learn Python. It seems to be the way to go for this kind of work. For the last 12 years or so, I've been working almost exclusively in LabVIEW.
I hear you and completely understand. It took me all day to be able to get to even respond to this discussion! (OTOH, the new microprobe is fun too!)

But, if anyone else out there is reading this far and is even halfway interested in working on making EXAFS analysis better, this is the time to get involved. If you've ever asked Bruce or me to make some change, or wished that (D)Athena, (D)Artemis, or Ifeffit did something new, better, or different, now is the time. If you think you'd like to be doing EXAFS analysis in 5 years, now is the time to make sure that will happen. It's never been easier to make real and meaningful improvements to how we do these EXAFS analysis steps. Bruce and I can barely keep up with this effort, and have less time available for it with each passing year. If there is no interest, we will stop doing this.

--Matt
On 12/14/2012 4:10 PM, Matt Newville wrote:
Hi Matthew,
On Thu, Dec 13, 2012 at 11:28 PM, Matthew Marcus
wrote: You might argue (I would agree) that with multiple measures of fluorescence you can determine outliers more rigorously. But I would also say that the best thing to do then is not to simply remove outliers but to take the average AND standard deviation of the samples and include both in the analysis. A glitch would then not be removed, but would be reflected by having a less certain value for that energy point.
But how do you know what the standard deviation should be for a point you had to exclude? I suppose you could try to come up with some plausibility measure which says that it can't be more different from the results of interpolation than some amount, a heuristic measure. The idea there would be that you do a best-effort deglitch and then assign an uncertainty based on how badly you may have done at it. This still makes me queasy. In a typical case, the glitch dominates over the real signal so you actually have *no* information about what the signal was there except for what you know from the surrounding points. Thus, attempting to use the glitched-out region is really an attempted double-use of the surrounding points. Better to adapt your methods so that they can deal with non-uniform tabulation, which you have anyway.
Oh, I meant that for each energy point measured you could use the different fluorescence channels to compute a mean value for I_f and standard deviation for I_f, and then propagate both measurement and uncertainty through the entire analysis. For a point with a glitch, the standard deviation would be very high, and so that region of k space would receive less weight.
I'd view data not measured not as having a value of 0 but as having infinite uncertainty. In that sense, removing a glitch doesn't do any harm - we're treating it as if we have no measurement at that energy point.
If I understand you correctly, you're saying that 'deglitch with reference scalers' is OK, but with some additional error estimate imposed on the points you had to do that to. I've found no reason to suspect that any artifacts were introduced by this procedure because the ratio of the "bad" to the "good" counters is quite slowly-varying except in the glitch region, so interpolation is safe. Therefore, the only extra noise is shot noise due to counting with fewer elements. Now, given how non-rigorous our treatment of noise is, I think that the extra noise is not going to influence the fit significantly. The real problem is what to do when you can't do any kind of signal-based data recovery, but have to either acknowledge that you don't have any data there or make some up and assign some sort of uncertainty to your curve-drawing artistry.
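In rough code terms, the reference-scaler procedure amounts to something like this (a sketch with made-up variable names, not the actual beamline code; 'glitch' is a boolean mask over the affected energy points):

import numpy as np

def deglitch_by_ratio(energy, bad, good, glitch):
    """Repair a glitched channel over the masked region using a clean reference channel."""
    ratio = bad / good
    # the bad/good ratio is slowly varying outside the glitch, so interpolating
    # it across the masked region is safe
    ratio[glitch] = np.interp(energy[glitch], energy[~glitch], ratio[~glitch])
    return ratio * good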
I agree with your larger point that we can definitely afford the CPU cycles to do better analysis than use something simply because it is Fast. I'm not sure that a slow discrete FT would be significantly different than an FFT, though. I think (but am listening...) that the continuous v discrete FT is like classic audiophile snobbery about the superiority of analog over digital music. With sufficient sampling and resolution and dynamic range in the signal, the difference has to be trivial. I'm willing to believe that a digital recording has less dynamic range, but I'm also willing to believe that this is swamped by my stereo and ears (and I mostly listen to mp3s through cheap headphones these days, so I clearly am not buying a tube amplifier anytime soon...).
You're right that the DFT is exactly equivalent to the FFT. However, the VSFT is different in that it works on non-uniformly tabulated data with no interpolation. I've heard of an AB test in which Golden Ears were challenged with two amps, one of which was solid-state and the other also solid-state but with some 2nd-harmonic distortion and (I think) noise added to simulate a tube amp. Guess which won? Somebody once wrote a parody in which mercury-filled speaker cables were advertised for their "liquid, shimmering sound", but I digress.
I agree / understand that a VSFT is or can be different from a DFT. For me the point is that with sufficient sampling and bit depth, the differences go to zero. And I think we know how to sample XAFS well enough. For digital audio there's a fair argument about bit depth / dynamic range and whether people can actually detect frequencies above 18 kHz. But there aren't such arguments for EXAFS (yet): no one is (yet) measuring EXAFS reliably to 50 Ang^-1.
Sampling is exactly the problem - if you have gaps in the data, then you're not sampled densely in the relevant region. If you assume the data to be band-limited, then you could in principle interpolate. One way to do that is to do the VSFT and then evaluate the resulting weighted sum of trig functions in the bad region. Unlike DFT, VSFT defines an inverse transform over a continuous range in abscissa. You can even extrapolate. When you use Artemis in q-space mode and it plots q over a bigger range than your input data, I suspect that it's doing such a summation.

BTW, I think that argument about detecting high audio frequencies is more subtle than whether you can hear them directly. I think the argument is that their presence somehow affects the perception of the lower frequencies. I'd have to hear an AB comparison of a piece of music with and without the 18kHz and above frequencies before I'll believe such a claim.
Are you suggesting changes to the methods used to pick points at which to take data?
Well, that might be worth considering, but I'm mostly just saying that the way we pick points now is pretty arbitrary for the analysis we want to do. That is, what we need is to sample the XAFS out to 8 or 10 Ang. With steps in k of 0.05Ang^-1 (typical in my experience), you're sampling out to 31.4Ang, though I'm willing to say you might see some aliasing above 20Ang or so, and if there really is signal beyond 15Ang you might want to sample more finely. But I think a "normal sampling" is going to reproduce 8 Ang well enough to miss an occasional point and apply any sort of linear or cubic interpolation. It's interesting to think about whether (and how) we could do better.
The original discussion had to do with data which had already been taken. An example of what you're talking about - the QXAS setup on my beamline scans the mono at a constant angular speed and samples on-the-fly, like an on-the-fly fluorescence map and using the same hardware. Thus, at the end of the spectrum, to get longer dwell times, you have to pick larger energy steps, which means that you're smearing each point because each point is the average over, say, 5eV, rather than being taken at one energy. However, a simple calculation shows that with reasonable stepsizes, you don't lose anything significant even out to 10 Ang.
Right, and how to best convert QXAS data (way over-sampled at high k) to chi(k) is a good question. I think most people just do a boxcar average.
Interpolation methods are definitely worth discussing and studying. I'm not sure that the energy values that happened to have been where the monochromator was stopped for a measurement of mu(E) are more sacred or a better sampling of the first shell of an EXAFS spectrum than a set of energies interpolated from those measurements to be on a uniform k grid, provided that the original data isn't woefully undersampled. With a typical step size of 0.05 Ang^-1 you can easily miss several points and still accurately measure distances up to 5 or probably even 10 Angstroms.
That's kind of the idea behind the histogram interpolation used in at least some FT programs. Let's say that the data need to be interpolated on a 0.05 Ang^-1 grid, and there happen to be points at 3.01, 3.03 and 3.04A^-1. Then, chi_out(3.00A^-1) = (4/5)*chi_in(3.01A^-1)+(2/5)*chi_in(3.03A^-1)+(1/5)*chi_in(3.04A^-1)+ contributions from points to the left of 3.00, and similar for chi_out(3.05A^-1). If there are no data points between 3.00 and 3.05, then values are assigned by linear interpolation from the ones on either side of the 3.00-3.05 interval.
Getting back to the original problem, suppose you have a Bragg glitch whose wings go out far enough that you have to lop off the top of an EXAFS wiggle. If you try to interpolate over the gap, you will probably screw up the amplitude and maybe the phase in that area. This may not affect a 1st-shell measurement with only one kind of neighbor, but it could do nasty things to anything more complex.
I agree with all of that... It's hard to tell how far in energy a diffraction glitch will extend.
I do think that having richer window functions and spectral weightings would be very useful. You could view chi(k) data with glitches as wanting a window function that had a very small value (possibly zero: remove this point) exactly at the glitch, but had a large value (~1) at nearby k-values.
Another (and possibly best) approach is to assign an uncertainty to each k value of chi. At the glitch, the uncertainty needs to go way up, so that not matching that point does not harm the overall fit.
A window function which went to 0 in the glitch region is tantamount to an assertion that chi(k)=0 there, which is generally not so and is probably a lot worse than the error you make by even the lamest interpolation through the glitch. The uncertainty makes more sense, except that if you filter data, the noise spreads out in a correlated manner over adjacent regions. Also, the tacit assumption of an uncertainty-based method is that the noise is uncorrelated within the high-uncertainty region. That said, I can imagine that this is one of those methods that works better than it has any right to.
I think of the window as weighting of measurements to consider, and that a weight of zero is applied to data with infinite uncertainty. I don't think of chi(k) as zero outside the window range (say, below kmin or above kmax), just that we don't want to consider that in the analysis. Again, I think the better thing would be to have an uncertainty for each value of chi(k), and propagate that through the analysis.
That's not what's usually meant by windowing. Usually, a window is a multiplying factor.
That's similar to weighting by the confidence in your data...
Now, that said, if you apply the same multiplying factor to the results of a fit, you're effectively doing a weighting. Sumsq = sum(W[i]*(y[i]-yfit[i])^2) = sum((w[i]*y[i]-w[i]*yfit[i])^2) for W[i]=w[i]^2 .
Exactly. You also might include a 1/eps[i] in the terms in the sum, which can be folded into (or, is not really different from) the window function.
Also, the issue of weighting gets into some deep and murky waters. The idealist would say that one should weight according to some measure of noise statistics, such as sqrt(N) noise. However, it's not usual for the spectrum to be as good as one would expect from sqrt(N), and how do you define N when you're using analog counters like ion chambers or Lytle detectors anyway? Also, systematic noise is often point-to-point correlated so not white. Further, people like to play with varying k-weight exponents in order to make certain features show up. Now add the popular practice of fitting in R-space, wherein every point you fit is actually a linear combination of the input data so the noise in Re(chi~(R)) and Im(chi~(R)) is definitely not white. That's one can of worms I don't plan to open, not even to go fishing with.
Well, sqrt(N) implies a counting noise. For a glitch, you could just say that the uncertainty is dominated by non-normal statistics and is either very large, or even infinite, in which case you can set w[i] to zero.
Now, in my proposed system, how would you fit filtered (q-space) data? The first step is to do the very slow FT (VSFT) on the input data with no interpolation. This leads to a set of sines and cosines which replicate the data. I would then apply the window and do the back-summation of all those sines and cosines at the given points. This will undoubtedly introduce some artifacts near the edges of the deleted regions simply due to our ignorance of what the data really do in there. Next, in each fit iteration, I would evaluate the model function at each point, then do the VSFT and filtering exactly as had been done on the data, which causes the fit to have the same artifacts. Since the VSFT and filter operations are linear, it should be possible to pre-compute a kernel which relates the values of unfiltered data to those of filtered data. This would be a matrix of size Np*Np with Np=# of points. Using the kernel instead of evaluating gazillions of trig functions would be reasonably fast; I suspect that ifeffit does a similar trick. Similarly, fitting in R-space would be accomplished by computing the VSFT on data and model function.
Yes, I know I talk a good game. Unfortunately, I'm too swamped to attempt implementation. Also, I'd have to reinvent a number of very well-engineered wheels to do it.
I hear you! I'm interested in pursuing "more advanced weightings and transforms", but don't have much time to spend on such things these days, so development is slow....
Larch has a separate Transform class used for each dataset in a fit - this includes the standard FT parameters and fit ranges. It is intended to be able to do this sort of advanced windowing (and in principle, advanced weighting too). Currently (I'd be ready for a release but am commissioning our new beamline and am hoping to post a windows installer by the end of the month) this has "the standard XAFS window functions", but one can create their own window functions (for k- and/or R-space) with suppressed points/regions, etc, and use them as well. I have to admit I haven't tried this, but it was definitely part of the intention, and should work. My hope is to extend this to include some wavelet-like weightings as well.
What is Larch? Is it a replacement for ifeffit? By 'class' do you mean object as in OOP? "And now for something completely different. The larch." - Monty Python mam
Yes, Mostly yes, and, Yes, you get the reference exactly ("How to recognize different types of trees from quite a long way away, Number 1: The Larch"). http://xraypy.github.com/xraylarch/
Another wide-open appeal: I'd love help in implementing improvements like these (better windows, comparing continuous v discrete transforms, using wavelets, etc), and think work like this would be publishable. There are several other improvements I'd like to see, such as generating Feff Paths on-the-fly, and so forth. Essentially all of Larch is in python so that it's very easy to modify and add features like this. The available numerical libraries for python are plentiful and well-maintained. It's also easy to add wrappers for C and Fortran code if that is needed.....
Sigh. If I had infinite time or multiple copies of myself, I'd learn Python. It seems to be the way to go for this kind of work. For the last 12 years or so, I've been working almost exclusively in LabVIEW.
I hear you and completely understand. It took me all day to be able to get to even respond to this discussion! (OTOH, the new microprobe is fun too!)
But, if anyone else out there is reading this far and is even halfway interested in working on making EXAFS analysis better, this is the time to get involved. If you've ever asked Bruce or me to make some change, or wished that (D)Athena, (D)Artemis, or Ifeffit did something new, better, or different, now is the time. If you think you'd like to be doing EXAFS analysis in 5 years, now is the time to make sure that will happen. It's never been easier to make real and meaningful improvements to how we do these EXAFS analysis steps. Bruce and I can barely keep up with this effort, and have less time available for it with each passing year. If there is no interest, we will stop doing this.
--Matt
Hi Matthew, On Fri, 14 Dec 2012, Matthew Marcus wrote:
On 12/14/2012 4:10 PM, Matt Newville wrote:
Hi Matthew,
On Thu, Dec 13, 2012 at 11:28 PM, Matthew Marcus
wrote: Oh, I meant that for each energy point measured you could use the different fluorescence channels to compute a mean value for I_f and standard deviation for I_f, and then propagate both measurement and uncertainty through the entire analysis. For a point with a glitch, the standard deviation would be very high, and so that region of k space would receive less weight.
I'd view data not measured not as having a value of 0 but as having infinite uncertainty. In that sense, removing a glitch doesn't do any harm - we're treating it as if we have no measurement at that energy point.
If I understand you correctly, you're saying that 'deglitch with reference scalers' is OK, but with some additional error estimate imposed on the points you had to do that to.
No, I'm saying first that it is OK to simply remove a glitch -- you didn't measure mu(E) at that energy, you measured something else. Alternatively, one could keep the glitch but say that the uncertainty at that energy is extremely large. This approach would require one to propagate the uncertainties per point throughout the analysis -- not a bad idea, but not normally done. Your method of replacing an outlier in a set of I_f values with the mean value is also fine with me -- it effectively removes the bad point, preserving the average I_f. Alternatively, one could use not the simple sum or mean of the set of I_f values, but use both the mean and variance, and propagate these both all the way through the analysis. The advantage of propagating the mean and variance is that one may not correctly identify small glitches or tails of glitches or other artifacts not due to counting statistics, but their effect would be handled as well as possible.
I've found no reason to suspect that any artifacts were introduced by this procedure because the ratio of the "bad" to the "good" counters is quite slowly-varying except in the glitch region, so interpolation is safe. Therefore, the only extra noise is shot noise due to counting with fewer elements. Now, given how non-rigorous our treatment of noise is, I think that the extra noise is not going to influence the fit significantly.
The real problem is what to do when you can't do any kind of signal-based data recovery, but have to either acknowledge that you don't have any data there or make some up and assign some sort of uncertainty to your curve-drawing artistry.
Right. I think acknowledging you don't have the data is the best approach.
I agree with your larger point that we can definitely afford the CPU cycles to do better analysis than use something simply because it is Fast. I'm not sure that a slow discrete FT would be significantly different than an FFT, though. I think (but am listening...) that the continuous v discrete FT is like classic audiophile snobbery about the superiority of analog over digital music. With sufficient sampling and resolution and dynamic range in the signal, the difference has to be trivial. I'm willing to believe that a digital recording has less dynamic range, but I'm also willing to believe that this is swamped by my stereo and ears (and I mostly listen to mp3s through cheap headphones these days, so I clearly am not buying a tube amplifier anytime soon...).
You're right that the DFT is exactly equivalent to the FFT. However, the VSFT is different in that it works on non-uniformly tabulated data with no interpolation. I've heard of an AB test in which Golden Ears were challenged with two amps, one of which was solid-state and the other also solid-state but with some 2nd-harmonic distortion and (I think) noise added to simulate a tube amp. Guess which won? Somebody once wrote a parody in which mercury-filled speaker cables were advertised for their "liquid, shimmering sound", but I digress.
I agree / understand that a VSFT is or can be different from a DFT. For me the point is that with sufficient sampling and bit depth, the differences go to zero. And I think we know how to sample XAFS well enough. For digital audio there's a fair argument about bit depth / dynamic range and whether people can actually detect frequencies above 18 kHz. But there aren't such arguments for EXAFS (yet): no one is (yet) measuring EXAFS reliably to 50 Ang^-1.
Sampling is exactly the problem - if you have gaps in the data, then you're not sampled densely in the relevant region.
Yes, sampling is the problem. But, removing a point or two from a 0.05Ang^-1 grid won't do much damage to the signal below 8 Ang. You could even mask out a full 0.5 Ang^-1 from the chi(k) and still do OK. This is essentially the earlier suggestion for using a notched window around a glitch or region of glitches -- it is possible to ignore portions of an oscillatory function and still recover phase/amplitude. Sure, it's not ideal, but it's actually not that bad for the lower frequencies.
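A notched window of that sort could be built along these lines (a sketch with Hanning-style sills of width dk; not one of the standard Ifeffit/Larch window functions, and all names are made up):

import numpy as np

def notched_window(k, kmin, kmax, dk, glitch_lo, glitch_hi):
    def ramp(x):                              # smooth 0 -> 1 over a width dk
        return 0.5 * (1.0 - np.cos(np.pi * np.clip(x / dk, 0.0, 1.0)))
    win = ramp(k - kmin) * ramp(kmax - k)     # ordinary k-space window with soft edges
    # notch drops smoothly to zero across [glitch_lo, glitch_hi]
    notch = 1.0 - ramp(k - (glitch_lo - dk)) * ramp((glitch_hi + dk) - k)
    return win * notch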
If you assume the data to be band-limited, then you could in principle interpolate.
The data is limited. We're not going to be modeling EXAFS beyond 50 Ang^-1 or beyond 20 Ang anytime soon. Possibly ever. Sampling at 0.05 Ang^-1 would let us measure to 31 Ang (well, aliasing might push that down, but only if you believe there is non-zero signal past 31 Ang!). That is, we over-sample the EXAFS below 8 Ang well enough to be able to miss a few points.
One way to do that is to do the VSFT and then evaluate the resulting weighted sum of trig functions in the bad region. Unlike DFT, VSFT defines an inverse transform over a continuous range in abscissa. You can even extrapolate. When you use Artemis in q-space mode and it plots q over a bigger range than your input data, I suspect that it's doing such a summation.
BTW, I think that argument about detecting high audio frequencies is more subtle than whether you can hear them directly. I think the argument is that their presence somehow affects the perception of the lower frequencies. I'd have to hear an AB comparison of a piece of music with and without the 18kHz and above frequencies before I'll believe such a claim.
Sure.... For audio, the detectors are complicated. ;). Cheers, --Matt