Hi Teresa,

I'm sorry that I cannot give you a definitive answer. I should admit that when adding PCA methods to Larch and XAS Viewer (which I invite you to try out), I tried to follow both the scikit-learn approach and the Athena implementation. I don't think I ever tested against the results from SixPack. FWIW, Larch only does PCA (easily) on normalized mu(E) or its derivative. I suppose PCA on chi(k) could be added, but I'm a bit skeptical of this.

In fact, the code at https://github.com/xraypy/xraylarch/blob/master/larch/math/pca.py (and, just to be clear, having this both in Python and publicly available is motivated by exactly these conversations of "what does it do?") has a few different methods to train a PCA set: one directly from scikit-learn, one basically reproducing Demeter's PCA.pm (modulo slight differences in underlying math libraries, which should be insignificant), one that aims to use only non-negative components (not really worth it, in my opinion), and one that is sort of hand-coded and includes the IND statistic. I don't know what SixPack does.

I cannot really explain why, but the default that is readily exposed in Larch XAS Viewer is the hand-coded version of `pca_train`. In fact, they should all be more or less interchangeable. I did some tests with these, but that was several years ago, so it might be worth trying again. If you're up for that, please do try. If not, and you would like to send your project and an outline of what you get, I might be able to look at this too.
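If you do want to test this, here is a minimal numpy-only sketch of the kind of comparison I have in mind. It is not the Larch code, and it says nothing about what SixPack or Athena actually do. The function names and the synthetic "spectra" are made up for illustration. It shows that two routes to the eigenvalues (an SVD of the data and an eigen-decomposition of its cross-product matrix) agree, while a preprocessing choice such as subtracting the mean spectrum changes the variance fractions a lot. That is one generic way two programs could disagree.

```python
import numpy as np

def explained_variance_svd(spectra, center=True):
    """Variance fraction per component from an SVD of the data matrix.

    spectra: array of shape (n_spectra, n_points), one spectrum per row.
    """
    data = np.asarray(spectra, dtype=float)
    if center:
        data = data - data.mean(axis=0)  # subtract the mean spectrum
    sing = np.linalg.svd(data, compute_uv=False)
    eigvals = sing**2
    return eigvals / eigvals.sum()

def explained_variance_eig(spectra, center=True):
    """Same quantity from the eigenvalues of data @ data.T."""
    data = np.asarray(spectra, dtype=float)
    if center:
        data = data - data.mean(axis=0)
    eigvals = np.linalg.eigvalsh(data @ data.T)[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return eigvals / eigvals.sum()

# Hypothetical stand-in for 20 spectra on a common grid:
# a constant offset plus two varying contributions.
x = np.linspace(0, 1, 200)
n = np.arange(20)
w1 = 0.10 * np.cos(n)
w2 = 0.05 * np.sin(2 * n)
spectra = (1.0 + np.outer(w1, np.sin(2 * np.pi * x))
           + np.outer(w2, np.sin(4 * np.pi * x)))

svd_centered = explained_variance_svd(spectra, center=True)
eig_centered = explained_variance_eig(spectra, center=True)
svd_raw = explained_variance_svd(spectra, center=False)

print("SVD and eigen routes agree:", np.allclose(svd_centered, eig_centered))
print("first component, centered:", svd_centered[0])
print("first component, raw:     ", svd_raw[0])
```

Running your own spectra through both settings and comparing with the values each program reports might at least narrow down where the difference comes from.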

"Target transformation" is implemented as `pca_fit`: how well can a data set be explained by the first N components of a training model?
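To make that question concrete, here is a rough sketch of the idea in plain numpy. It is not what `pca_fit` does internally. The function name `target_transform`, the choice to include the mean spectrum in the model, and the synthetic data are all my assumptions for illustration.

```python
import numpy as np

def target_transform(train, target, ncomps):
    """Fit `target` with the mean and first `ncomps` components of `train`.

    train: (n_spectra, n_points) training spectra; target: (n_points,).
    Returns (best_fit, relative_residual).
    """
    train = np.asarray(train, dtype=float)
    target = np.asarray(target, dtype=float)
    mean = train.mean(axis=0)
    # rows of vt are orthonormal principal components
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    comps = vt[:ncomps]
    # with an orthonormal basis, least-squares weights are projections
    weights = comps @ (target - mean)
    best_fit = mean + weights @ comps
    resid = np.linalg.norm(target - best_fit) / np.linalg.norm(target)
    return best_fit, resid

# Hypothetical training set: constant offset plus two varying terms.
x = np.linspace(0, 1, 200)
n = np.arange(20)
b1 = np.sin(2 * np.pi * x)
b2 = np.sin(4 * np.pi * x)
train = 1.0 + np.outer(0.10 * np.cos(n), b1) + np.outer(0.05 * np.sin(2 * n), b2)

# A target built from the same ingredients, and one with a new feature.
target_in = 1.0 + 0.07 * b1 - 0.02 * b2
target_out = 1.0 + 0.10 * np.sin(6 * np.pi * x)

_, resid_in = target_transform(train, target_in, ncomps=2)
_, resid_out = target_transform(train, target_out, ncomps=2)
print("residual, target inside model: ", resid_in)
print("residual, target outside model:", resid_out)
```

A small residual says the candidate spectrum is consistent with the training set. A large one says it contains something the first N components cannot reproduce.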

For fit statistics: I have seen "SPOIL" used in several places in the EXAFS literature, but I am afraid I do not actually know of a definition for it. If anyone can explain what it is, that would be helpful. Larch can calculate the F1 and IND statistics that are more common in the PCA literature. XAS Viewer exposes and automatically plots IND - it's a very useful way to select how many components are significant.
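For reference, here is a sketch of IND as I understand Malinowski's definition: with r data points, c spectra, and eigenvalues lambda_j of the data cross-product matrix, RE(n) = sqrt(sum_{j>n} lambda_j / (r (c - n))) and IND(n) = RE(n) / (c - n)^2. The number of significant components is taken where IND is smallest. This is not copied from Larch. Using the data without mean subtraction is my assumption here, and the function name and test data are invented.

```python
import numpy as np

def ind_statistic(spectra):
    """Malinowski's IND for n = 1 .. c-1 retained components.

    spectra: (n_spectra, n_points) array with n_points > n_spectra.
    Element n-1 of the result is IND for n components.
    """
    data = np.asarray(spectra, dtype=float)
    c, r = data.shape  # c spectra, r points per spectrum
    eigvals = np.linalg.svd(data, compute_uv=False)**2
    ind = np.empty(c - 1)
    for ncomp in range(1, c):
        # "real error" from the eigenvalues left out of the model
        real_err = np.sqrt(eigvals[ncomp:].sum() / (r * (c - ncomp)))
        ind[ncomp - 1] = real_err / (c - ncomp)**2
    return ind

# Hypothetical data: constant offset plus two varying terms plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
n = np.arange(20)
spectra = (1.0
           + np.outer(0.10 * np.cos(n), np.sin(2 * np.pi * x))
           + np.outer(0.05 * np.sin(2 * n), np.sin(4 * np.pi * x))
           + rng.normal(scale=0.002, size=(20, 200)))

ind = ind_statistic(spectra)
n_best = int(np.argmin(ind)) + 1
print("IND values:", ind)
print("components at minimum IND:", n_best)
```

Because the data are not mean-subtracted in this sketch, the constant offset counts as a component of its own, so the count it suggests can differ by one from a mean-subtracted analysis.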

I'm pretty sure that does not answer your actual question, but maybe it will be helpful. If you or anyone else has suggestions for additions, improvements, or other optional methods or statistics for PCA and/or related methods, please let me know.

--Matt

On Tue, Jun 8, 2021 at 7:43 AM <t.zahoransky@mineralogie.uni-hannover.de> wrote:

Dear XAS Community,

I stumbled over the issue that a PCA on 20 EXAFS spectra (k², k =
2.0-11.0 Å-1) performed in Sixpack does not give the same results
(Eigenvalues, variance) as in Athena. However, when I use other
statistical programs (e.g., TIBCO Statistica or SPSS), I get the same
results as reported in Athena. I tested this with another EXAFS
dataset of over 30 samples and the problem persisted.

An old entry from 2017 in the ifeffit mailing list ("[Ifeffit]
Calculation of SPOIL value for the reconstruction of standard
spectra") told me that from Sixpack version 1.4 on, a new/different
PCA algorithm from the scikit-learn Python package is used.

So I downloaded older versions of Sixpack (e.g., 1.3) and used the "Use Old
PCA" option in the "Rotation" menu bar, which did give
different results. However, they are still different from the
Athena/statistical program results.

My question is: What is behind this? Is there some sort of
normalization or axis rotation, that leads to the different values? Is
there any way to change this so that the results are comparable to
other programs?

As I need to use the Target Transform option after PCA, which is not
yet possible in Athena, I am at a loss as to how to deal with these
different results and where they come from.

Thank you very much for your help,

Teresa

--
Teresa Zahoransky

Soil Mineralogy

Gottfried Wilhelm Leibniz Universität Hannover

Institute of Mineralogy

Callinstr. 3, Room 325

D-30167 Hannover, Germany

Phone: +49 (0)511 762-8058

Email: t.zahoransky@mineralogie.uni-hannover.de


--Matt Newville <newville at cars.uchicago.edu> 630-327-7411