Matt and Bruce: Following our previous discussion of rebinning continuous scan data, I am attaching a Python code fragment which we use to perform this function. You are free to use it as you like. It was written by Ken McIvor, a student at IIT, as part of a larger project to manipulate and preview MR-CAT XAFS data. We simply pulled the code out and annotated it somewhat.

Carlo

--
Carlo U. Segre -- Professor of Physics
Associate Dean for Research, Armour College
Illinois Institute of Technology
Voice: 312.567.3498    Fax: 312.567.3494
Carlo.Segre@iit.edu    http://www.iit.edu/~segre
Very cool! I will be very happy to add rebinning to Athena. I must say that I am very pleased to see our mailing list working so well. To receive not just a suggestion but a fleshed-out algorithm suggests that our little community here is starting to work very well. I was also pleased that Ken did such a good job commenting the code. It was very easy to read and to understand what was going on. Thanks!

B

--
Bruce Ravel ----------------------------------- ravel@phys.washington.edu
Code 6134, Building 3, Room 222
Naval Research Laboratory                 phone: (1) 202 767 5947
Washington DC 20375, USA                  fax: (1) 202 767 1697
NRL Synchrotron Radiation Consortium (NRL-SRC)
Beamlines X11a, X11b, X23b, X24c, U4b
National Synchrotron Light Source
Brookhaven National Laboratory, Upton, NY 11973
My homepage:    http://feff.phys.washington.edu/~ravel
EXAFS software: http://feff.phys.washington.edu/~ravel/software/exafs/
Hi Carlo,

Thanks!! That's wonderful. It seems like this should be an option in spline() and bkg_cl(), and possibly in pre_edge(). I admit I'm still a little confused by the goal of the rebinning, especially with respect to loss of resolution. It seems like to use this script, you need to do two things:

1. Select an E0. The data will be put at energies that give an even k-grid with this E0. I think it's inevitable that if E0 changes, you might want to re-bin.

2. Assume that the k-grid is small enough that the measurements at independent energies are within the acceptable energy resolution, so that they can simply be summed.

Is that right?

Would it be OK with you (and everyone else) to have a weighted average replace the boxcar average? For example, at k=12, [E(k+0.05) - E(k)] = 5 eV, which is larger than normal energy resolutions, so the boxcar average might wash out the high-k EXAFS, no? It might be better to convolve the spectra with a Lorentzian reflecting the incident energy resolution (probably defaulting to 1 eV). Does that seem OK, or is there something else going on?

Anyway, thanks!!

--Matt
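For readers who want to see what Matt's suggestion might look like in practice, here is a rough sketch of convolving mu(E) with a Lorentzian point-spread function. It is not part of the thread; the 1 eV default width and the function name are illustrative, and it assumes mu is sampled on a roughly uniform energy grid.

import numpy as np

def lorentzian_broaden(energy, mu, width_ev=1.0):
    """Convolve mu(E) with a unit-area Lorentzian of FWHM width_ev.

    Assumes 'energy' is approximately uniformly spaced.  Illustrative
    sketch only, not the rebinning code discussed in the thread.
    """
    step = np.mean(np.diff(energy))          # average energy spacing
    half = 0.5 * width_ev                    # half width at half maximum
    # kernel extending +/- 10 half-widths, normalized to unit area
    x = np.arange(-10 * half, 10 * half + step, step)
    kernel = (half / np.pi) / (x**2 + half**2)
    kernel /= kernel.sum()
    # 'same' keeps the output on the original energy grid
    return np.convolve(mu, kernel, mode='same')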
Matt:

Let me see if I can explain my reasoning adequately. The reason we want to rebin continuous scan data is that in our continuous scan we oversample the data significantly, so reducing it to a step size in k of 0.05 really costs nothing. With the rebinning, it is possible to make use of the statistical weight of all the data. A simple interpolation would throw this away. I know that you have a spline fit in IFEFFIT, and perhaps that is an equivalent method to make use of all the counts in the data, but this is slightly different. We have been using this for a while with MR-CAT data and it seems to work well for us.

Yes, for this algorithm, we select an E0 and a distance above that at which to begin rebinning. You could just as well set an energy at which to begin rebinning, but the conventional idea of using energies relative to E0 makes sense to me. The bin size just starts at this E0+Offset, whatever it is. There is no regard for an even value in k-space, since the data need to be kept in terms of energy anyway. (If you have suggestions about this, please let me know.) If E0 changes, interpolation will be required for analysis, but this is something that has to be done anyway when summing multiple spectra where there could be a small shift in E0 from one to the next. We just have to live with that, I suppose.

This code fragment is also incorporated in a bigger "filter" that Ken has written which fits multiple spectra to each other to determine shifts, then interpolates the spectra to a common grid, sums or averages them, and then performs the rebinning on the final product. We plan to use this in a GUI to assess the quality of data as we take it, in order to figure out when to stop collecting on a sample.

I suppose that it is possible to do what you suggest in your point "2", but the boxcar gives a better representation of where the center of mass of the actual data is. As I mentioned before, the goal is not to get data on an even grid but just to remove the oversampling without losing statistics. Interpolation to an even grid is left for later, in the analysis software. Typically, in a step scan you would set the delta-k to some value like 0.05 or more, so your original data would be no better than the rebinned data. If it is important to have smaller steps, then perhaps the default should be smaller than 0.05 in k? The way I see it, we have not yet pushed the data collection as far as we will eventually go. Right now we have an EPICS limitation of no more than 4000 data points in a scan, but once we break past this limit, I expect that there is plenty of flux at the APS to take data even more densely.

Your question about weighted averaging is a good one. I would have to think about the convolution a bit more. What do others think?

Carlo
--
Carlo U. Segre -- Professor of Physics
Associate Dean for Research, Armour College
Illinois Institute of Technology
Voice: 312.567.3498    Fax: 312.567.3494
Carlo.Segre@iit.edu    http://www.iit.edu/~segre
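A minimal sketch of the kind of rebinning described above (not Ken McIvor's actual code): bins of constant width in k starting at E0 + offset, with each output point placed at the center of mass of the raw points in the bin, averaging the energies as well as the signal, as Carlo clarifies later in the thread. The function name, the default offset, and the 3.81 eV*Angstrom^2 free-electron conversion are assumptions for illustration.

import numpy as np

HBARC_FACTOR = 3.81  # eV*Angstrom^2: E - E0 ~= 3.81 * k**2 for the photoelectron

def rebin_qexafs(energy, mu, e0, offset=20.0, dk=0.05):
    """Rebin oversampled mu(E) above e0 + offset into constant-dk bins.

    Returns the center-of-mass energy and the averaged mu of each
    non-empty bin; points below e0 + offset are passed through unchanged.
    Illustrative sketch only, not the MR-CAT code from the thread.
    """
    energy = np.asarray(energy, dtype=float)
    mu = np.asarray(mu, dtype=float)

    start = e0 + offset
    low = energy < start                       # keep the pre-rebin region as-is
    out_e, out_mu = list(energy[low]), list(mu[low])

    kmax = np.sqrt((energy.max() - e0) / HBARC_FACTOR)
    k_edges = np.arange(np.sqrt(offset / HBARC_FACTOR), kmax + dk, dk)
    e_edges = e0 + HBARC_FACTOR * k_edges**2   # bin edges, non-uniform in energy

    for elo, ehi in zip(e_edges[:-1], e_edges[1:]):
        inbin = (energy >= elo) & (energy < ehi)
        if inbin.any():
            out_e.append(energy[inbin].mean())  # average the energies ...
            out_mu.append(mu[inbin].mean())     # ... and the counts
    return np.array(out_e), np.array(out_mu)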
Matt:

Jeff Terry mentioned that perhaps I did not explain myself adequately in the last message. The algorithm averages BOTH the counts and the energy; we do not just place the average number of counts at the center of the bin. Because of this, there should be no distortion and no need to weight.

Carlo
--
Carlo U. Segre -- Professor of Physics
Associate Dean for Research, Armour College
Illinois Institute of Technology
Voice: 312.567.3498    Fax: 312.567.3494
Carlo.Segre@iit.edu    http://www.iit.edu/~segre
Hi Carlo,
I'm not sure I see that at first glance, but I'll take your word for it. That isn't what I'm concerned about. Let me explain where I get stuck.

Let's say you have data collected in energy steps of 0.25 eV -- could be QEXAFS, could be step scan. For the sake of argument, let's set the energy resolution to 1 eV. You have to get the data on the 0.25 eV grid onto an even k-grid for the analysis (at least in ifeffit); for the sake of argument, we'll say dk = 0.05 Ang^-1. No matter what the details are, you need to get values of mu(E) for the data onto a gridded set E={E_i}.

At k=4, E is about 61.0 eV above the edge, and a step of 0.05 in k corresponds to 1.5 eV. A boxcar average will average the original data between 60.25 and 61.75 and call that the new data for E=61. Seems reasonable, though giving equal weight to the data at 61.0 and 61.75 could be questioned for 1 eV resolution.

At k=16, E is about 975.0 eV above the edge, and a step of 0.05 in k corresponds to 6 eV. So here, you average the data between E=972.0 and 978.0 eV and call that the data for E=975.0 eV. Giving equal weight to the data at 972.0 and 975.0 when the resolution is 1 eV is what worries me. The simple average of mu(E) between 972.0 and 978.0 is definitely not the same as mu(E) at E=975.0. It would probably be better to average the original data between 974.0 and 976.0 for the new data at 975.0, and it might be preferable to just do a convolution with a 1 eV point spread function for all the data.

Certainly, if you were to re-bin data collected in 0.25 eV steps to a grid of 10 eV there would be real problems. By 'using all the data', one can use too much data and spoil the resolution.

I don't claim that the boxcar average is wrong, or that convolution is definitely the right thing to do, just that I'm confused by this.

--Matt
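Matt's numbers follow from the free-electron relation E - E0 ~= 3.81 k^2 (energies in eV, k in inverse Angstroms), so a bin of constant width dk spans roughly 2 * 3.81 * k * dk in energy. A quick check, added here for clarity rather than taken from the original message:

# E - E0 ~= 3.81 * k**2   =>   dE ~= 2 * 3.81 * k * dk
for k in (4.0, 12.0, 16.0):
    dk = 0.05
    e_above_edge = 3.81 * k**2
    bin_width = 2 * 3.81 * k * dk
    print(f"k={k:5.1f}  E-E0={e_above_edge:7.1f} eV  dE(bin)={bin_width:4.1f} eV")
# k=  4.0  E-E0=   61.0 eV  dE(bin)= 1.5 eV
# k= 12.0  E-E0=  548.6 eV  dE(bin)= 4.6 eV
# k= 16.0  E-E0=  975.4 eV  dE(bin)= 6.1 eV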
Matt:

I think that I didn't explain clearly. First of all, I misused the name boxcar: in the most stringent of definitions we are not doing a boxcar average, because we make no assumption about where the energy position is to be. Let me try to restate it.

1. There is no effort to put the data on a uniform energy grid. The data from a continuous scan are never on an even grid anyway, only approximately. This is because, at least at MR-CAT, we just count the encoder steps and use that to determine the angular position and thence the energy. Since we can only set our monochromator to move at a uniform angular speed (and even the spacing in angle is not always perfectly uniform), the energy spacing will have a sinusoidal dependence at best.

2. There is no effort to put the data on an even grid in k-space. The purpose of the delta-k is to set the window over which we average the data. The algorithm determines which points in energy space are within the desired box in k-space, and then it finds the center of mass of those points by averaging the counts AND averaging the energy too. The resultant spectrum will have data points with the desired density, but they will not be on any kind of regular grid. The idea is to let the analysis programs such as IFEFFIT interpolate and regrid.

Carlo
--
Carlo U. Segre -- Professor of Physics
Associate Dean for Research, Armour College
Illinois Institute of Technology
Voice: 312.567.3498    Fax: 312.567.3494
Carlo.Segre@iit.edu    http://www.iit.edu/~segre
Carlo, Grant, Shelly, Bruce,

Thanks! Shelly's script seems to be similar to what Carlo, Ken, and Grant have done as well, at least as far as how the rolling average is done. I think Sam Webb mentioned he had used a similar technique for some of his QEXAFS data too. So the consensus definitely seems to be that the moving simple average over a limited energy/k range is 'good enough' when converting to k-space. I agree with that: it's certainly no worse than what happens in ifeffit/autobk now.

I'm somewhat uncomfortable with automatically rebinning the mu(E) data for its own sake, because I think it's too easy to lose resolution of the data. I think no one was proposing that -- the discussion seems to be only about how to convert data on a fine energy grid to k -- but I want to make sure. At any rate, the conversion of mu(E) to chi(k) seems to be the part that ifeffit should be concerned with.

I propose these behaviors for ifeffit commands:

- read_data() should leave the QEXAFS energy values intact, optionally sorting the data. That is the current behavior.

- spline() and bkg_cl() [the commands that convert mu(E) to chi(k)] need to work with both 'step scan' and 'continuous' energy data. That complicates things a little, but I think it means they should create chi(k) on the even k-grid like this: at each k-point (i*kgrid for i=0,Npts), if there are more than 2 data points in the range (k - kgrid/2, k + kgrid/2], average all points in that region. If there are fewer than 2 points in that range, do a 3-point interpolation using the 2 surrounding points and the next nearest point. Both of these averages may spoil resolution somewhat, but like Grant says, kgrid=0.05 is pretty conservative anyway.

I believe this approach will handle QEXAFS data about as well and as simply as possible, and needs no additional flags or settings. The rolling average will *ALWAYS* be done by spline() and bkg_cl() if and where it is needed. That will make spline() a little slower, but only because it would be using more data.

This change should help other data, and do no harm that isn't already being done. For example, data taken on an even energy grid would 'use all the data' to effectively increase the dwell time at each k value linearly with k. Currently, this does not happen with ifeffit/autobk, and the additional statistics inherent in such data are lost.

This approach should also make Bruce's job much easier, as the calls to spline() and bkg_cl() from athena would not need any changes for QEXAFS data.

Any objections, suggestions, or other thoughts?

--Matt

PS: I think that the fix for the non-uniform k-grid problem in feffit() and this new E->k procedure would be enough to call it a new version. Were there any other outstanding requests?
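A minimal sketch of the gridding rule Matt proposes (not ifeffit's actual implementation), assuming chi(k) has already been extracted on the raw, non-uniform k values. The function name and the "two or more points" cutoff are assumptions for illustration, and plain linear interpolation stands in for the 3-point interpolation described above.

import numpy as np

def chi_on_even_kgrid(k_raw, chi_raw, kgrid=0.05):
    """Put chi(k), sampled at arbitrary k_raw, onto an even k-grid.

    Following the rule proposed above: average all raw points falling in
    (k - kgrid/2, k + kgrid/2]; where a bin holds fewer points, fall back
    to interpolation from the nearest raw points. Illustrative sketch only.
    """
    k_raw = np.asarray(k_raw, dtype=float)
    chi_raw = np.asarray(chi_raw, dtype=float)
    order = np.argsort(k_raw)
    k_raw, chi_raw = k_raw[order], chi_raw[order]

    npts = int(np.floor(k_raw.max() / kgrid)) + 1
    k_out = kgrid * np.arange(npts)
    chi_out = np.empty(npts)

    for i, k in enumerate(k_out):
        inbin = (k_raw > k - kgrid / 2) & (k_raw <= k + kgrid / 2)
        if inbin.sum() >= 2:
            # enough oversampled points: use them all (rolling average)
            chi_out[i] = chi_raw[inbin].mean()
        else:
            # sparse (step-scan-like) data: simple interpolation instead
            chi_out[i] = np.interp(k, k_raw, chi_raw)
    return k_out, chi_out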