Hi Everyone,

Thanks for the discussion and example data and code! By itself, Larch was definitely not handling these datasets gracefully, and now it is doing a much better job. There were lots of different topics discussed, but I'd like to make a few comments:

First, I do not understand the objection to rebinning QXAFS data that is heavily oversampled in energy. I think this is necessary. When I do continuous XAFS scans (still at ~1 sec per energy, so maybe not "Quick" anymore), I set up a "normally gridded XAFS scan" and then use that to set triggers and scan at approximately constant energy velocity so that each bin of energy is centered on the target energy. That is, I don't get 8000 values, only ~400 on a nearly-normal EXAFS grid. To be clear, we do this for convenience, so that the binning is done in the motor controller and detector hardware, not in post-processing software. But there should be no difference, and as Carlo and Edmund point out, doing it in software does add flexibility in how the data is treated. As long as the final energy grid is fine enough relative to the energy resolution and/or EXAFS resolution (so, ~0.05 Ang^-1), I think this is fine.
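To put rough numbers on what a 0.05 Ang^-1 grid means in energy, here is a small sketch. It only uses the standard relation k^2 = ETOK * (E - E0); the function name exafs_bin_width is made up for this illustration and is not part of Larch.

```python
ETOK = 0.2624684  # k**2 = ETOK * (E - E0), with k in Ang^-1 and E in eV


def exafs_bin_width(k, kstep=0.05):
    """Energy width (eV) of one k-space bin of size kstep centered at k."""
    e_hi = (k + kstep / 2) ** 2 / ETOK
    e_lo = (k - kstep / 2) ** 2 / ETOK
    return e_hi - e_lo


for k in (3, 6, 10, 14):
    print(f"k = {k:2d} Ang^-1: bin width = {exafs_bin_width(k):.2f} eV")
```

At k = 10 Ang^-1 a bin is about 3.8 eV wide, so data taken at ~0.35 eV per point (the density Carlo quotes for his Fe scans) puts roughly 11 raw points in each bin.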

But, whether done in hardware or software, binning *will* happen with QXAFS.

Interpolation and smoothing could work, but these introduce questions of how much and which data to use for the interpolation or smoothing. For sure, smoothing with Savitzky-Golay followed by simple interpolation could work. But, when Larch/Ifeffit/Athena treat data by default, they use a simple interpolation and do not make use of all the data points in finely spaced data. Instead, they use only the 2 or 3 nearest energy values and do linear or quadratic interpolation with those limited data, assuming the data is accurate. That does not work so well for heavily oversampled data and so results in artificially noisy data, as shown in the attached plots (see especially the Ru example).
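A toy comparison makes the point. This is a sketch on synthetic data (a sine wave plus Gaussian noise standing in for mu(E); all the numbers are invented), not a test on real scans:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, heavily oversampled "scan": 8001 points, 0.125 eV apart.
energy = np.linspace(0.0, 1000.0, 8001)
noisy = np.sin(energy / 50.0) + rng.normal(0.0, 0.05, energy.size)

# Coarse output grid with 2.5 eV steps, i.e. about 20 raw points per step.
step = 2.5
grid = np.arange(5.0, 996.0, step)
truth = np.sin(grid / 50.0)

# (a) Linear interpolation: uses only the two raw points nearest each grid value.
interp = np.interp(grid, energy, noisy)

# (b) Boxcar rebinning: average every raw point that falls in each bin.
edges = np.append(grid - step / 2, grid[-1] + step / 2)
sums, _ = np.histogram(energy, bins=edges, weights=noisy)
counts, _ = np.histogram(energy, bins=edges)
boxcar = sums / counts

rms_interp = np.sqrt(np.mean((interp - truth) ** 2))
rms_boxcar = np.sqrt(np.mean((boxcar - truth) ** 2))
print(f"rms error, interpolation: {rms_interp:.4f}")
print(f"rms error, boxcar:        {rms_boxcar:.4f}")
```

For independent noise, the boxcar error should come out several times smaller, roughly by the square root of the number of points per bin, while the interpolated result keeps close to the full single-point noise.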

I'm still digesting Matthew's code. I think it is close in spirit to what I have, but I have not tested it numerically.

For rebinning in Larch and its in-active-development XAS_Viewer, here's what I have so far (full code at https://github.com/xraypy/xraylarch/blob/master/plugins/xafs/rebin_xafs.py#L52), based on playing around with data from Carlo and Edmund:

Step 1: make "standard XAFS grid" array of energy, with XANES steps of ~0.5 eV or better and EXAFS steps of 0.05 Ang^-1

Step 2: identify segments in original energy array for each value in the new energy array.

Step 3: for each energy value in the new array:

a) if the segment has 2 or fewer energy values, do linear interpolation.

b) otherwise, take either the mean value ("boxcar") or centroid of mu for the segment.

c) estimate the uncertainty as the standard deviation of the mu for that segment.
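In code, the three steps look roughly like the sketch below. This is a simplified illustration, not a copy of the Larch function linked above: the names xafs_grid and rebin_xafs, the region boundaries (-20 eV and +30 eV around E0), and reading "centroid" as an energy-weighted mean of mu are all assumptions made for the example.

```python
import numpy as np

ETOK = 0.2624684  # k**2 = ETOK * (E - E0); k in Ang^-1, E in eV


def xafs_grid(e0, emin, emax, pre_step=5.0, xanes_step=0.5, kstep=0.05,
              pre_end=-20.0, xanes_end=30.0):
    """Step 1: pre-edge, XANES and EXAFS regions joined into one energy grid."""
    pre = np.arange(emin, e0 + pre_end, pre_step)
    xanes = np.arange(e0 + pre_end, e0 + xanes_end, xanes_step)
    kmin = np.sqrt(ETOK * xanes_end)
    kmax = np.sqrt(ETOK * (emax - e0))
    exafs = e0 + np.arange(kmin, kmax, kstep) ** 2 / ETOK
    return np.concatenate((pre, xanes, exafs))


def rebin_xafs(energy, mu, grid, method="boxcar"):
    """Steps 2 and 3: rebin (energy, mu) onto grid; returns (mu_new, dmu)."""
    order = np.argsort(energy)
    energy, mu = np.asarray(energy)[order], np.asarray(mu)[order]
    # Step 2: each grid point owns the raw points lying between the
    # midpoints to its two neighbouring grid points.
    mid = 0.5 * (grid[1:] + grid[:-1])
    edges = np.concatenate(([2 * grid[0] - mid[0]], mid,
                            [2 * grid[-1] - mid[-1]]))
    first = np.searchsorted(energy, edges[:-1])
    last = np.searchsorted(energy, edges[1:])
    mu_new = np.empty(grid.size)
    dmu = np.zeros(grid.size)
    for i in range(grid.size):
        seg_e = energy[first[i]:last[i]]
        seg_mu = mu[first[i]:last[i]]
        if seg_mu.size <= 2:
            # Step 3a: too few points to average, so interpolate linearly.
            mu_new[i] = np.interp(grid[i], energy, mu)
        elif method == "centroid":
            # Step 3b (assumed meaning): energy-weighted mean of mu.
            mu_new[i] = np.average(seg_mu, weights=seg_e)
        else:
            # Step 3b: plain boxcar mean.
            mu_new[i] = seg_mu.mean()
        if seg_mu.size > 0:
            # Step 3c: scatter of mu within the segment.
            dmu[i] = seg_mu.std()
    return mu_new, dmu


# Made-up oversampled scan: a step edge at E0 plus noise, 8001 points.
rng = np.random.default_rng(1)
e0 = 7112.0
energy = np.linspace(e0 - 200.0, e0 + 800.0, 8001)
mu = np.where(energy > e0, 1.0, 0.0) + rng.normal(0.0, 0.02, energy.size)

grid = xafs_grid(e0, energy[0], energy[-1])
mu_new, dmu = rebin_xafs(energy, mu, grid)
print(f"{energy.size} raw points rebinned onto {grid.size} grid points")
```

The grid comes out at a few hundred points for a scan like this. Note that dmu is simply zero wherever the segment has a single point or none.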

I know most of the discussion here was about EXAFS, but I am also slightly concerned about rebinning introducing systematic shifts at the edge, just in case this data is used as XANES. Because of this, I use a hybrid solution of doing rebinning when there are enough original data points (3 or more), and doing linear interpolation when there are 2 or fewer points. I think Matthew's code does that too....

I thought that using the centroid might improve the noise, but it seemed to have a tiny (1.e-7) effect, at least on the data looked at so far. Also, so far, I'm calculating the uncertainty in mu due to the rebinning, but not doing anything with that yet.

With this and other fixes for sorting and removing duplicate data, Larch XAS Viewer now does an OK job with Carlo's and Edmund's data. Attached are two plots showing data as originally imported without rebinning, and after rebinning. The unbinned data look really, really noisy. For Carlo's fluorescence data at the Fe K edge (1 scan only), rebinning helps a lot. For Edmund's 8000+ point dataset for Ru in transmission, not rebinning is a complete disaster, and rebinning is a huge improvement.

Let me know if you have any more suggestions on how to do this better or questions about this,

On Thu, Jun 28, 2018 at 8:19 AM Edmund Welter <edmund.welter@desy.de> wrote:

Dear Matt,

attached you will find a recent data file that shows the problem, and two plots of the data in that file (a RuO2 powder sample on Scotch tape). One plot shows chi(k) with and without rebinning, the other just mu(E) with and without rebinning. It seems it is no good to convert previously rebinned data to k-space. The rebinned mu(E) looks okay, and the original data converted to k-space also looks OK.

I produced the plots with Athena 0.9.24 (I know, not the most recent version) under Windows 7 with IFEFFIT as the backend. I have not yet been able to reproduce this on my office desktop using Linux and the most recent dathena and Larch, because of some trouble with reading the data file... No idea yet; it might be worth another thread, or it might be something stupid on my side.

Cheers,

Edmund

On 27.06.2018 20:31, Matt Newville wrote:
Hi Ilya, Edmund, Carlo,

Ilya and/or Carlo: can you post some example unbinned data? As it turns out, I am adding a rebinning feature to the Larch XAS Viewer GUI that should be ready to try in a release very soon (in time for the IIT XAFS School and XAFS2018).

This seems like a good chance to test these procedures out.

My approach for this is to make the "normal XAFS energy grid" that the downstream processing needs (~5 eV steps in the pre-edge, 0.25 eV steps through the edge, 0.05 Ang^-1 steps in the EXAFS), and then use one of two strategies (maybe there should be more?):

a) do a straight interpolation onto this array -- that is probably the "noisy" result.

b) assign each energy point in the original data to one of these energy bins, and take the average of all the points in each bin.

I'd also like to try using energy-weighted mean (centroid). Probably most of the data is so finely spaced that this won't make much difference, but it might be a good option. It might be able to help compensate for energy jitter, assuming that the recorded energy (probably from an encoder) is more accurate than the requested energy.

It's also interesting to think about doing a Savitzky-Golay smoothing, though that might require knowing if the data points are actually uniform in mono angle or mono energy. It also makes it easy to over-do the smoothing, and so a little trickier to prevent bad results.
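To show what over-doing it looks like, here is a minimal sketch of a Savitzky-Golay smoother written out with NumPy only, so the window and polynomial order are explicit (in practice scipy.signal.savgol_filter would be the usual tool). It assumes uniformly spaced points, which is exactly the caveat above; the helper names, the test peak, and the window lengths are invented for the illustration.

```python
import numpy as np


def savgol_coeffs(window, order):
    """Smoothing weights of a centered Savitzky-Golay filter (odd window)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window; row 0 of the
    # pseudo-inverse gives the fitted value at the window center.
    design = np.vander(offsets, order + 1, increasing=True)
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window, order):
    """Apply the filter to uniformly spaced y, repeating the end values."""
    coeffs = savgol_coeffs(window, order)
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")


# A narrow, noise-free peak (sigma = 5 points) on a uniform grid.
x = np.arange(401, dtype=float)
peak = np.exp(-0.5 * ((x - 200.0) / 5.0) ** 2)

light = savgol_smooth(peak, window=7, order=2)    # window narrower than the peak
heavy = savgol_smooth(peak, window=101, order=2)  # window much wider than the peak

print(f"peak height, original:   {peak.max():.3f}")
print(f"peak height, window 7:   {light.max():.3f}")
print(f"peak height, window 101: {heavy.max():.3f}")
```

The short window leaves the peak height essentially unchanged, while the long one flattens it badly, which is the kind of amplitude reduction that is easy to introduce without noticing.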

Do you (or anyone else) have any suggestions for how to best re-bin this kind of data?

--Matt

On Wed, Jun 27, 2018 at 10:15 AM Carlo Segre <segre@iit.edu> wrote:

Yes, we measure fast and have taken as many as 20000 points. The problem
is not in the shifts that you mention. This is normal and expected. The
problem is specifically in the rebinning algorithm in Demeter. It seems
to be different from the one in the old Horae package. I have done a test
of this and I attach a couple of figures that show the difference.

I have used 10 continuous scans for this test. The data were taken at the
MRCAT beamline, Sector 10 at the APS. The data are for the Fe K-edge and
there are about 3400 points per scan with a point density of about 0.35
eV/step. I used both versions of Athena and performed the following steps
to give the data groups shown in the plots

new_athena.png
Fe_new_rebin_merge - (blue) all 10 scans rebinned at input and then
merged
Fe_new_merge - (red) all 10 scans merged only
Fe_new_merge_rebin - (green) all 10 scans merged then rebinned

old_athena.png
Fe_old_rebin_merge - (blue) all 10 scans rebinned at input and then
merged
Fe_old_merge - (red) all 10 scans merged only
Fe_old_merge_rebin - (green) all 10 scans merged then rebinned

comp_athena.png
Fe_old_rebin_merge - (blue)
Fe_new_rebin_merge - (red)

It is clear that the new Athena (Demeter) is not rebinning the same way as
the old one (Horae). The contrast is particularly evident with the last
plot. The new rebinning algorithm is introducing more noise. For the
moment, I recommend only merging and perhaps smoothing if you can tolerate
a bit of amplitude reduction.

I have been thinking that it might even be better to have the data
acquisition software do the rebinning on the fly so the data does not have
to be manipulated in Athena. I am not sure if this is a good idea yet but
I think it would help my users.

Carlo

On Wed, 27 Jun 2018, Edmund Welter wrote:

> Dear Carlo,
>
> do you also measure as fast as possible in the sense that for two consecutive
> scans the points on the energy axis are not at the same positions? This is
> what happens at my beamline. The differences are typically very small but
> there are differences and one should not just add all the first points and
> all the second points and so on because they are not necessarily exactly at
> the same energy. Sometimes the beamline computer is doing something else in
> parallel (whatever that might be) and the distance between points A and B is
> significantly larger than the distance between B and C.
>
> So, the problem is, at which point does it make sense to merge several
> spectra of the same sample? I presume that Athena is taking care of this when
> I use it to merge spectra, but it can only do so by interpolating the points
> in the spectrum onto a common grid before summing up the spectra.
>
> The best solution might be to rebin/interpolate the spectra onto a fixed grid
> before they are imported into Athena (or any other program), depends on what
> Athena is exactly doing when it is rebinning data.
>
> Another aspect is that Athena is not very happy about 8600 points/spectrum
> anyway, at least as long as it using Ifeffit.
>
> Cheers,
>
> Edmund
>
>
>
> On 27.06.2018 15:14, Carlo Segre wrote:
>>
>>
>> Hi Ilya:
>>
>> We always take data in this mode at APS Sector 10 and I have also found that
>> the rebinning function is not working satisfactorily at this time. I find
>> that for the current version of the software it is better to merge your
>> data and let IFEFFIT interpolate to the dk=0.05 grid that it uses.
>>
>> Carlo
>>
>>
>> On Wed, 27 Jun 2018, Ilya Sinev wrote:
>>
>>> Hi all,
>>>
>>>
>>>
>>> I have a question regarding the chi(k) function isolation and rebinning
>>> processes. I have some data recorded in "quasi channel-cut" mode, i.e. with
>>> the mono constantly moving and the data points collected at the highest
>>> possible rate. A 180 sec measurement yields a spectrum of ca. 8600
>>> points, which obviously needs to be rebinned. The rebinned data, however,
>>> does not look good in k-space even if multiple scans are merged. Moreover, I
>>> have the impression that the raw spectrum in k-space no longer has those
>>> 8000+ points but significantly fewer. Is there any reduction in the number
>>> of data points that is not shown (e.g. as a preparation step for the FT)?
>>> Since the unbinned data has higher quality, does it then make more sense to
>>> keep using it for EXAFS analysis?
>>>
>>> Thank you
>>>
>>> Ilya Sinev
>>>
>>>
>>>
>>>
>>
>
> _______________________________________________
> Ifeffit mailing list
> Ifeffit@millenia.cars.aps.anl.gov
> http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
> Unsubscribe: http://millenia.cars.aps.anl.gov/mailman/options/ifeffit
>

--
Carlo U. Segre -- Duchossois Leadership Professor of Physics
Interim Chair, Department of Chemistry
Director, Center for Synchrotron Radiation Research and Instrumentation
Illinois Institute of Technology
Voice: 312.567.3498 Fax: 312.567.3494
segre@iit.edu http://phys.iit.edu/~segre segre@debian.org

--

--Matt Newville <newville at cars.uchicago.edu> 630-252-0431