LCF, Sayers & Bunker, R and ATHENA
Dear mailing list,

We recently published a paper in which we broke with the "Sayers & Bunker 1988" paradigm in order to apply normalization as consistently as possible between samples and standards when performing linear combination fitting (LCF): http://pubs.acs.org/doi/abs/10.1021/acs.est.5b03096. There, we showed that for known mixtures of phosphorus standards, LCF of P K-edge XANES spectra delivers better results when the baseline and normalization parameters of the sample are also varied (within defined ranges, using the software /R/).

I'm writing to the mailing list for several reasons:

1) A reviewer explicitly asked us to contact Bruce Ravel so that this new insight could be implemented in ATHENA.

2) I tried to combine my code with ATHENA's way of doing LCF before publishing the code. Unfortunately, combining both approaches results in very large numbers of LCF procedures (sometimes millions or billions), which cannot be fitted in reasonable time -- maybe because my code is slow, or maybe because /R/ (or just the nlsLM function that I'm using for LCF) isn't fast enough. I'm stuck. Would programming in Perl be a better choice (I've never programmed in Perl)? Would a different kind of fitting algorithm be faster?

3) When using the nlsLM function to perform LCF in /R/, I also have to provide starting portions of the standards before the fit. I used the same approach as ATHENA, dividing 1 by the number of standards so that every standard gets the same starting portion. However, when playing with the code I realized that changing the starting portion can have an impact on the fitting result. Has this been observed or tested before by anyone else? How do you, in general, justify using the same starting portion for every standard?
4) Some minor questions about LCF in ATHENA: a) Could the standards (the columns) in the output .csv file after LCF be sorted alphabetically? b) Does LCF in ATHENA sometimes produce a negative portion because the weight of the "last" standard in a set is always calculated as "1 - sum(portions_of_other_standards)" (could it be this? https://github.com/bruceravel/demeter/blob/master/lib/Demeter/LCF.pm#L375-L3...)? To mimic ATHENA's "all portions sum to one" behaviour without producing negative portions, we kept only those LCF results satisfying "abs(1 - sum(portions_of_all_standards)) < 0.0005"; maybe this is of interest for ATHENA too?

Cheers,
Florian
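[Editorial note: the sum-to-one filter described above is simple enough to sketch. The snippet below is an illustrative reconstruction in Python, not the authors' R code; the candidate weights are made-up numbers, and only the 0.0005 tolerance comes from the message.]

```python
# Sketch of the sum-to-one filter: keep only LCF results whose
# fitted portions sum to 1 within a small tolerance (0.0005 in the
# paper).  The candidate weight sets below are invented for
# illustration.

def passes_sum_filter(weights, tol=5e-4):
    """Return True if the fitted portions sum to 1 within tol."""
    return abs(1.0 - sum(weights)) < tol

# Three hypothetical fits with different fitted portions:
candidates = [
    [0.40, 0.35, 0.2501],   # sums to ~1.0001 -> kept
    [0.50, 0.30, 0.2100],   # sums to  1.0100 -> rejected
    [0.33, 0.33, 0.3402],   # sums to ~1.0002 -> kept
]
kept = [w for w in candidates if passes_sum_filter(w)]
```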
Hi Florian,
On Thu, Nov 12, 2015 at 11:06 AM, Florian Werner wrote:
Dear mailing list,
We recently published a paper in which we broke with the "Sayers & Bunker 1988" paradigm in order to apply normalization as consistently as possible between samples and standards when performing linear combination fitting (LCF): http://pubs.acs.org/doi/abs/10.1021/acs.est.5b03096. There, we showed that for known mixtures of phosphorus standards, LCF of P K-edge XANES spectra delivers better results when the baseline and normalization parameters of the sample are also varied (within defined ranges, using the software *R*).
That seems really nice. I haven't looked at it in great detail, but it looks generally useful. Are you willing to share this code? My R is pretty poor, but if you can share that, we would at least have something to make a test suite from so we could know that any translation was working well.
I'm writing to the mailing list for several reasons: 1) A reviewer explicitly asked us to contact Bruce Ravel so that this new insight could be implemented in ATHENA. 2) I tried to combine my code with ATHENA's way of doing LCF before publishing the code. Unfortunately, combining both approaches results in very large numbers of LCF procedures (sometimes millions or billions), which cannot be fitted in reasonable time -- maybe because my code is slow, or maybe because *R* (or just the nlsLM function that I'm using for LCF) isn't fast enough. I'm stuck. Would programming in Perl be a better choice (I've never programmed in Perl)? Would a different kind of fitting algorithm be faster?
First, I would like to be able to include this in Larch, so having it available in Python would be very useful. If it *is* available in Python/Larch, then Athena can (at least in principle) also use it, and so can many other people.

One issue with the way "standard Athena" (that is, using Ifeffit) does LCF -- and nlsLM in R unless I am mistaken -- is that it uses a non-linear least-squares method (from MINPACK) to solve a problem that *should* be linear (the L in LCF). Of course, if one allows for energy shifts, that is no longer quite true. But when doing high-precision work these days, the spectra should be well-aligned, so that energy shifting should not be needed as part of the LCF process.

So, a first question would be whether one could actually use linear least-squares (or even likelihood estimates other than least-squares) rather than MINPACK. If so, that could significantly improve performance. With Athena/Ifeffit, this simply was not an option. With Larch (using scipy and/or Python statsmodels or lmfit) there are many more choices for how to do this analysis.

For reference, an LCF fit (still using MINPACK) with Larch is described at http://cars.uchicago.edu/xraylarch/fitting/examples.html#example-3-fitting-x...
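[Editorial note: Matt's point that LCF without energy shifts is a linear problem can be shown in a few lines. The sketch below uses synthetic stand-in "spectra" (sine/cosine curves), not real XANES data, and solves for the weights in one direct linear step instead of iterating with MINPACK.]

```python
# LCF as a *linear* least-squares problem: solve ||A w - y|| for
# the weights w directly.  The "standards" here are synthetic
# curves chosen only to be linearly independent.
import numpy as np

energy = np.linspace(0.0, 1.0, 200)
std1 = np.sin(3 * energy)            # stand-in "standard" spectrum 1
std2 = np.cos(2 * energy)            # stand-in "standard" spectrum 2
sample = 0.7 * std1 + 0.3 * std2     # noise-free mixture, weights 0.7/0.3

A = np.column_stack([std1, std2])    # one column per standard
w, *_ = np.linalg.lstsq(A, sample, rcond=None)
# w recovers [0.7, 0.3] up to numerical precision, with no
# starting values and no iteration.
```

Because no iteration is involved, there is also no dependence on starting portions, which bears on Florian's point 3.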
3) When using the nlsLM function to perform LCF in *R*, I also have to provide starting portions of the standards before the fit. I used the same approach as ATHENA, dividing 1 by the number of standards so that every standard gets the same starting portion. However, when playing with the code I realized that changing the starting portion can have an impact on the fitting result. Has this been observed or tested before by anyone else? How do you, in general, justify using the same starting portion for every standard?
Yes, I have seen that results can depend on starting points for LCF fits. The justification for an automated assignment of equal starting values for all weights would be that this is the least ordered (highest entropy) distribution. Any automated assignment is somewhat hard to justify, but anything other than "all things being equal" would be even harder to justify!

4) Some minor questions about LCF in ATHENA:
a) Could the standards (the columns) in the output .csv-file after LCF be sorted alphabetically?
Sure, and using any alphabet you choose!
b) Does LCF in ATHENA sometimes produce a negative portion because the weight of the "last" standard in a set is always calculated as "1 - sum(portions_of_other_standards)" (could it be this? https://github.com/bruceravel/demeter/blob/master/lib/Demeter/LCF.pm#L375-L3...)? To mimic ATHENA's "all portions sum to one" behaviour without producing negative portions, we kept only those LCF results satisfying "abs(1 - sum(portions_of_all_standards)) < 0.0005"; maybe this is of interest for ATHENA too?
The weights for all standards should be constrained to the range [0, 1]. Again, this can be done more easily with Larch/Python than with Athena+Ifeffit.

--Matt
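[Editorial note: one scipy route to the [0, 1] constraint Matt mentions is `scipy.optimize.lsq_linear`, which solves a linear least-squares problem with box bounds. This sketch again uses synthetic stand-in spectra; note it imposes only the box bounds, not the sum-to-one constraint.]

```python
# Bounded linear least-squares: weights are forced into [0, 1]
# by construction, so no "1 - sum" trick and no negative portions
# from that trick.  Synthetic stand-in spectra, as before.
import numpy as np
from scipy.optimize import lsq_linear

energy = np.linspace(0.0, 1.0, 200)
A = np.column_stack([np.sin(3 * energy), np.cos(2 * energy)])
sample = A @ np.array([0.8, 0.2])    # known mixture, weights 0.8/0.2

res = lsq_linear(A, sample, bounds=(0.0, 1.0))
# res.x lies inside [0, 1] and recovers the known weights here.
```

A sum-to-one constraint would still have to be handled separately (e.g. by filtering on abs(1 - sum) as Florian does, or by reformulating the problem).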
Florian,

I agree that that's a wonderful paper. As I understand the main point, you are providing an algorithmic, somewhat brute-force way of sampling a variety of slightly different background removals in order to optimize the fit in a space that includes not just the weighting parameters, but also the background removal parameters. Great idea! Very nice work, indeed! I agree with all that Matt said.

In response to your referee who mentioned me by name (what the what?!?! how did the world come to a place where THAT happens?), I am disinclined to implement something like this in Athena. To my mind, this seems a lot like another of my common questions, from people wanting to process an ensemble of qXAS data in Athena. I understand why one would want to do so, but it's not really the job that Athena was designed to perform. It is better to think of Athena as an XAS data processing prototype tool. It's great for a few or a few dozen scans, but it's not really the right tool for a qXAS sequence of 50,000 scans. Similarly, including for the reasons that Matt explained about linear and non-linear minimization algorithms, Athena probably isn't the right tool for the protocol you describe in your paper (although that could be made better in Larch, which Athena is already able to use in place of Ifeffit). You were right to implement the thing you wanted in a way that did so efficiently.

That said, it is useful to think about how existing tools can be useful for sophisticated, specialized projects like your protocol. On one hand, using Larch sure seems like a good idea. Python and Larch certainly have tools comparable to what you used in R. That is, your protocol could be implemented in Python and Larch with about the same amount of effort as you expended in R. That would have the advantage of providing a short, unobstructed path to making a contribution that many in the XAS community will actually use.

On the other hand, one should not feel compelled to use Python simply because all the cool kids use Python. I know I don't! From your paper, it appears that you used Athena to do basic data reduction, alignment, deglitching, etc. It turns out that there is a recent, scantily documented feature in Athena whereby you can write project files in the form of a compressed JSON file. That is, the data that are compressed can be interpreted by any JSON parser. Thus, if you love R and have no intention of working in some other language, but you want a good pipeline from Athena into your R code, you could save your project file in this new JSON format and read the data from Athena directly in your R program. (See the "athena-->project_format" configuration parameter.) My point is that, while I may not choose to implement your cool algorithm in Athena, Athena could still be useful as part of a specialized workflow.

As for your 4th point, the '1-sum' thing is, as Matt said, awkward and not quite right in Athena. As for the sorting of standards in the output csv file -- sure -- I have no strong opinion about how it is presented right now. And it was certainly kind of Matt to make promises for me ;)

Cheers,
B
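[Editorial note: the compressed-JSON pipeline Bruce describes can be sketched generically. The snippet below assumes, for illustration only, that the project file is gzip-compressed JSON; the file path, group names, and keys are hypothetical, not Athena's actual schema. It writes its own stand-in file so the round trip is self-contained.]

```python
# Generic round trip through a gzip-compressed JSON file, standing
# in for the "read the Athena project with any JSON parser" idea.
import gzip
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "project.json.gz")

# Stand-in for a project file written by some other program:
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump({"groups": {"sample1": {"energy": [0, 1, 2],
                                      "mu": [0.1, 0.5, 0.9]}}}, f)

# Any JSON-capable language (R included) can read it back the same way:
with gzip.open(path, "rt", encoding="utf-8") as f:
    project = json.load(f)
```

In R, the analogous pair would be a gzip connection plus any JSON parser; the point is only that nothing Athena-specific is needed on the reading side.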
--
Bruce Ravel ------------------------------------ bravel@bnl.gov
National Institute of Standards and Technology
Synchrotron Science Group at NSLS-II
Building 535A, Upton NY, 11973
Homepage: http://bruceravel.github.io/home/
Software: https://github.com/bruceravel
Demeter: http://bruceravel.github.io/demeter/
On 11/12/2015 12:06 PM, Florian Werner wrote:
a) Could the standards (the columns) in the output .csv file after LCF be sorted alphabetically?
I just checked this into the github repository. It will be available for Windows users the next time I make an installer.

Cheers,
B

P.S. I am also making good progress on Marcelo Alves' observation that Athena doesn't like files with non-ASCII characters in the file names. Hopefully, that too will be fixed in the next release.
Participants (3):
- Bruce Ravel
- Florian Werner
- Matt Newville