Hi Florian,

On Thu, Nov 12, 2015 at 11:06 AM, Florian Werner <florian.werner@wzw.tum.de> wrote:

Dear mailing list,

We recently published a paper in which we broke with the "Sayers & Bunker 1988 paradigm" in order to apply normalization as consistently as possible to samples and standards when performing linear combination fitting (LCF): http://pubs.acs.org/doi/abs/10.1021/acs.est.5b03096. There, we showed that, for known mixtures of phosphorus standards, LCF of P K-edge XANES spectra gives better results when the baseline and normalization parameters of the sample are varied (within defined ranges, using R).

That seems really nice. I haven't looked at it in great detail, but it looks generally useful. Are you willing to share this code? My R is pretty poor, but if you share it, we would at least have something to build a test suite from, so we could verify that any translation works correctly.

I'm writing to the mailing list for several reasons:
1) A reviewer explicitly asked us to contact Bruce Ravel so that the new insight could be implemented in ATHENA.
2) I was trying to combine my code with ATHENA's approach to LCF before publishing the code. Unfortunately, I have not been able to get around the fact that combining both approaches results in a very large number of LCF procedures (sometimes millions or billions). That many fits cannot be done in a reasonable time, either because my code is slow or because R (or the nlsLM function that I'm using for LCF) isn't fast enough. I'm stuck. Would programming in Perl be a better choice (I have never programmed in Perl)? Would a different kind of fitting algorithm be faster?
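To get a rough feel for how quickly the number of fits in point 2 grows, the count can be written down directly. This is a minimal sketch under stated assumptions: it assumes combinatorial LCF tries every subset of 1 up to `max_components` standards, and that each subset is refit at every point of a grid of normalization parameters. The function name and all numbers are hypothetical, not taken from the paper or from ATHENA.

```python
from math import comb


def count_lcf_fits(n_standards, max_components, n_norm_grid_points):
    """Number of LCF fits for combinatorial fitting over a normalization grid.

    Assumes every subset of 1..max_components standards is fitted, and each
    subset is refit at every point of a (hypothetical) grid of baseline and
    normalization parameters.
    """
    n_subsets = sum(comb(n_standards, k) for k in range(1, max_components + 1))
    return n_subsets * n_norm_grid_points


# Made-up example: 10 standards, up to 4 per fit, and 4 normalization
# parameters with 10 trial values each (10**4 grid points).
print(count_lcf_fits(10, 4, 1))       # subsets of standards only
print(count_lcf_fits(10, 4, 10**4))   # subsets times grid points
```

With these made-up numbers, 385 subsets times 10,000 grid points already gives 3,850,000 fits, and each additional normalization parameter with 10 trial values multiplies the total by another factor of 10.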

First, I would like to be able to include this in Larch, so having it available in Python would be very useful. If it *is* available in Python/Larch, then Athena can (at least in principle) also use it, and so can many other people.

One issue with the way "standard Athena" (that is, using Ifeffit) does LCF -- and nlsLM in R, unless I am mistaken -- is that it uses a non-linear least-squares method (from MINPACK) to solve a problem that *should* be linear (the L in LCF). Of course, if one allows for energy shifts, that is no longer quite true. But for high-precision work these days, the spectra should be aligned well enough not to need energy shifts as part of the LCF process.

So, a first question would be whether one could use linear least squares (or even likelihood-based estimates other than least squares) rather than MINPACK. If so, that could significantly improve performance. With Athena/Ifeffit, this simply was not an option. With Larch (using scipy and/or the Python statsmodels or lmfit packages) there are many more choices for how to do this analysis.
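As a rough illustration of the linear alternative, the sketch below solves an LCF problem in a single call to NumPy's `np.linalg.lstsq`, with no starting values and no iterations. It assumes the spectra are already normalized and energy-aligned on a common grid (no energy shifts), and it uses made-up synthetic "standards" built from arctan steps and Gaussian peaks, not real P K-edge data. It is not Larch's LCF implementation, and it applies no constraints on the weights.

```python
import numpy as np


def lcf_linear(standards, spectrum):
    """Unconstrained linear least-squares LCF.

    standards: array (n_energies, n_standards), one aligned, normalized
               standard spectrum per column.
    spectrum:  array (n_energies,), the normalized sample spectrum.
    Returns the fitted weights and the residual sum of squares.
    """
    weights, _, _, _ = np.linalg.lstsq(standards, spectrum, rcond=None)
    resid = spectrum - standards @ weights
    return weights, float(resid @ resid)


# Synthetic, made-up data on a common energy grid (eV).
energy = np.linspace(2140.0, 2180.0, 401)


def fake_xanes(e0, width, peak):
    """Made-up XANES-like curve: arctan edge step plus a Gaussian peak."""
    step = 0.5 + np.arctan((energy - e0) / width) / np.pi
    white_line = peak * np.exp(-0.5 * ((energy - e0 - 2.0) / 1.5) ** 2)
    return step + white_line


standards = np.column_stack([
    fake_xanes(2150.0, 1.0, 1.5),
    fake_xanes(2152.0, 0.8, 3.0),
    fake_xanes(2155.0, 1.5, 0.5),
])
true_weights = np.array([0.2, 0.5, 0.3])
rng = np.random.default_rng(0)
sample = standards @ true_weights + rng.normal(0.0, 1e-3, energy.size)

weights, rss = lcf_linear(standards, sample)
print(np.round(weights, 3), rss)
```

Because the model is linear in the weights, the least-squares solution is unique whenever the standards are linearly independent, so the result does not depend on any starting guess.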

For reference, an LCF fit (still using MINPACK) with Larch is described at http://cars.uchicago.edu/xraylarch/fitting/examples.html#example-3-fitting-xanes-spectra-as-a-linear-combination-of-other-spectra

3) When using the nlsLM function to perform LCF in R, I also have to provide starting values for the fractions of the standards before starting the fit. I used the same approach as ATHENA, dividing 1 by the number of standards so that every standard starts with the same fraction. However, while experimenting with the code I noticed that changing the starting fractions can affect the fit result. Has anyone else observed or tested this? In general, how do you justify using the same starting fraction for every standard?

Yes, I have seen that results can depend on starting points for LCF fits. The justification for an automated assignment of equal starting values for all weights would be that this is the least ordered (highest entropy) distribution. Any automated assignment is sort of hard to justify, but something other than "all things being equal" would be even harder to justify!
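The "highest entropy" argument can be checked numerically: among all sets of N fractions that sum to one, equal fractions maximize the Shannon entropy, at ln N. A minimal sketch; the skewed starting values are arbitrary, made-up numbers used only for comparison.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (natural log) of fractions that sum to one."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    # Terms with w == 0 contribute nothing (limit of w*log(w) is 0).
    return -sum(w * math.log(w) for w in weights if w > 0)


n = 4
equal_start = [1.0 / n] * n           # equal starting values for 4 standards
skewed_start = [0.7, 0.1, 0.1, 0.1]   # arbitrary, made-up alternative

print(shannon_entropy(equal_start))   # ln(4), the maximum for 4 fractions
print(shannon_entropy(skewed_start))  # smaller than ln(4)
```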

4) Some minor questions about LCF in ATHENA:
a) Could the standards (the columns) in the .csv file written after LCF be sorted alphabetically?

Sure, and using any alphabet you choose!
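Until such an option exists, the columns could be reordered after the fact. A minimal Python sketch using the standard `csv` module, assuming a plain comma-separated file with one header row; the column names (`sample`, `rfactor`, `std_a`, ...) are hypothetical and do not reflect ATHENA's actual output layout. Sorting uses `str.casefold`, so upper- and lower-case names sort together.

```python
import csv
import io


def sort_standard_columns(csv_text, fixed_columns):
    """Return CSV text with all columns not in fixed_columns sorted by name.

    Columns listed in fixed_columns keep their original order and come first.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    fixed = [c for c in header if c in fixed_columns]
    standards = sorted((c for c in header if c not in fixed_columns),
                       key=str.casefold)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fixed + standards,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # DictWriter places values by column name
    return out.getvalue()


# Made-up example layout (not ATHENA's actual .csv format).
example = "sample,rfactor,std_c,std_a,std_b\nsoil1,0.002,0.1,0.5,0.4\n"
print(sort_standard_columns(example, {"sample", "rfactor"}))
```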

b) Does LCF in ATHENA sometimes produce negative fractions because the weight of the "last" standard in a set is always calculated as "1-sum(portions_of_other_standards)" (could this be the cause? https://github.com/bruceravel/demeter/blob/master/lib/Demeter/LCF.pm#L375-L378)? To mimic "all samples sum to one" from ATHENA without producing negative fractions, we kept only those LCF results with a residual "absolute(1-sum(portions_of_all_standards))" below 0.0005. Maybe this is of interest for ATHENA as well?
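The two points in 4b can be illustrated in a few lines of Python. `last_weight` mimics a "1 - sum of the others" parameterization as described above (it is not copied from Demeter): it forces the sum to one but does nothing to keep the last weight non-negative. `keep_sum_to_one` applies the |1 - sum| < 0.0005 filter described above. Both function names and the fit results are hypothetical.

```python
def last_weight(other_weights):
    """Weight of the 'last' standard when all weights are forced to sum to one."""
    return 1.0 - sum(other_weights)


def keep_sum_to_one(results, tol=0.0005):
    """Keep only fit results whose fractions sum to 1 within tol.

    results: list of dicts mapping standard name -> fitted fraction
             (hypothetical structure, for illustration only).
    """
    return [r for r in results if abs(1.0 - sum(r.values())) < tol]


# Forcing the sum to one does not prevent a negative fraction:
print(last_weight([0.7, 0.45]))  # about -0.15

fits = [
    {"std_a": 0.60, "std_b": 0.40},      # sum 1.0000 -> kept
    {"std_a": 0.70, "std_b": 0.35},      # sum 1.05   -> rejected
    {"std_a": 0.3001, "std_b": 0.6998},  # sum 0.9999 -> kept
]
print(len(keep_sum_to_one(fits)))  # 2
```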

The weights of all standards should be restricted to the range [0, 1]. Again, this is easier to do with Larch/Python than with Athena+Ifeffit.
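One way to impose the [0, 1] range, combined with the sum-to-one condition from 4b, is to fit on the probability simplex: all weights non-negative and summing to one, which automatically keeps each weight at or below 1. The sketch below uses projected gradient descent with the standard sort-based Euclidean projection onto the simplex. This is a swapped-in technique chosen for illustration, not what Larch or ATHENA does; the function names are hypothetical and the toy data are made up.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                # sort in descending order
    cssv = np.cumsum(u) - 1.0           # cumulative sums minus the target total
    ind = np.arange(1, v.size + 1)
    rho = ind[u - cssv / ind > 0][-1]   # number of entries that stay positive
    theta = cssv[rho - 1] / rho         # shift that makes the sum equal one
    return np.maximum(v - theta, 0.0)


def lcf_simplex(standards, spectrum, n_iter=5000):
    """Least-squares LCF with weights in [0, 1] that sum to one.

    Projected gradient descent on 0.5 * ||A w - b||**2: take a gradient
    step, then project back onto the simplex.
    """
    A = np.asarray(standards, dtype=float)
    b = np.asarray(spectrum, dtype=float)
    n = A.shape[1]
    w = np.full(n, 1.0 / n)                 # equal starting values
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A.T @ A
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = project_to_simplex(w - step * grad)
    return w


# Toy check with made-up numbers: with A = identity, the unconstrained
# solution would be b itself (one negative weight, sum 1.3); the
# constrained fit returns approximately [0.6, 0.4, 0.0].
w = lcf_simplex(np.eye(3), [0.8, 0.6, -0.1])
print(np.round(w, 3))
```

Projected gradient can converge slowly for nearly collinear standards; a general constrained optimizer (for example SLSQP via `scipy.optimize.minimize`, with bounds plus an equality constraint) is another option.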

--Matt