PCA Sixpack results compared to Athena (etc.)
Dear XAS Community,
I stumbled over the issue that a PCA on 20 EXAFS spectra (k², k = 2.0-11.0 Å⁻¹) performed in Sixpack does not give the same results (eigenvalues, variance) as in Athena. However, when I use other statistical programs (e.g., TIBCO Statistica or SPSS), I get the same results as reported in Athena. I tested this with another EXAFS dataset of over 30 samples and the problem persisted.
An old entry from 2017 in the ifeffit mailing list ("[Ifeffit] Calculation of SPOIL value for the reconstruction of standard spectra") told me that from Sixpack version 1.4 on, a new/different PCA algorithm from the scikit-learn Python package is used.
So I downloaded older versions of Sixpack (e.g., 1.3) and used the "Use Old PCA" selection in the "Rotation" menu bar, which did give different results. However, they are still different from the results of Athena and the statistical programs.
My question is: What is behind this? Is there some sort of normalization or axis rotation that leads to the different values? Is there any way to change this so that the results are comparable to other programs?
As I need to use the Target Transform option after PCA, which is not yet possible in Athena, I am at a loss as to how to deal with these different results and where they come from.
Thank you very much for your help,
Teresa
-- Teresa Zahoransky
Soil Mineralogy
Gottfried Wilhelm Leibniz Universität Hannover
Institute of Mineralogy
Callinstr. 3, Room 325
D-30167 Hannover, Germany
Phone: +49 (0)511 762-8058
Email: t.zahoransky@mineralogie.uni-hannover.de
Hi Teresa,
I'm sorry that I cannot give you a definitive answer. I should admit that
when adding PCA methods to Larch and XAS Viewer (which I invite you to try
out), I tried to follow the scikit-learn approach but also to follow the
Athena implementation. I think I never tried to test against the results
from SixPack. FWIW, Larch only does (easily) PCA on normalized mu(E) or
its derivative. I suppose PCA on chi(k) could be added, but I'm a bit
skeptical of this.
In fact, the code at
https://github.com/xraypy/xraylarch/blob/master/larch/math/pca.py (and,
just to be clear, having this both in Python and publicly available is
motivated by having these conversations of "what does it do?") has a few
different methods to train a PCA set: one directly from scikit-learn, one
basically reproducing Demeter's PCA.pm (modulo slight differences in
underlying math libraries, which should be insignificant), one that aims to
use only non-negative components (not really worth it, in my opinion), and
one that is hand-coded and includes the IND statistic. I don't know
what SixPack does.
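One generic reason PCA codes can report different eigenvalues for the same spectra is preprocessing: whether the data are mean-centered, scaled to unit variance, or decomposed as they are. The sketch below uses only NumPy and synthetic data to show the size of that effect. The function name `pca_eigenvalues` and its arguments are made up for this illustration, and nothing here is a statement of what SixPack, Athena, or Larch actually do.

```python
import numpy as np

def pca_eigenvalues(data, center=True, scale=False):
    """Eigenvalues (descending) and variance fractions for one PCA convention.

    data   : array of shape (n_points, n_spectra), one spectrum per column
    center : subtract each spectrum's mean before decomposing
    scale  : also divide each spectrum by its standard deviation
             (with center=True this is PCA on the correlation matrix)
    """
    x = np.array(data, dtype=float)
    if center:
        x = x - x.mean(axis=0)
    if scale:
        x = x / x.std(axis=0, ddof=1)
    # Squared singular values of x are the eigenvalues of x.T @ x.
    svals = np.linalg.svd(x, compute_uv=False)
    eigvals = svals**2 / (x.shape[0] - 1)
    return eigvals, eigvals / eigvals.sum()

# Synthetic stand-in for 20 k^2-weighted spectra (assumption: not real EXAFS data).
rng = np.random.default_rng(0)
k = np.linspace(2.0, 11.0, 181)
basis = np.column_stack([np.sin(2.0 * k), np.sin(3.1 * k + 0.5)])
weights = rng.uniform(0.0, 1.0, size=(2, 20))
chi = basis @ weights + 0.01 * rng.normal(size=(181, 20))
data = chi * k[:, None] ** 2

conventions = {
    "uncentered": dict(center=False, scale=False),
    "covariance": dict(center=True, scale=False),
    "correlation": dict(center=True, scale=True),
}
results = {}
for label, opts in conventions.items():
    eigvals, frac = pca_eigenvalues(data, **opts)
    results[label] = (eigvals, frac)
    print(f"{label:12s} leading variance fractions:", np.round(frac[:3], 4))
```

Running the same comparison on the real spectra, and checking which convention reproduces each program's eigenvalues, would be one way to narrow down where the discrepancy comes from.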
I cannot really explain why, but the default that is readily exposed in
Larch XAS Viewer is to use the hand-coded version of `pca_train`. In fact,
they should all be more or less interchangeable. I did some tests with
these several years ago, and it might be worth trying that again. If you're
up for that, please do try. If not, and you would like to send your project
and an outline of what you get, I might be able to look at this too.
For "target transformation", this is implemented as `pca_fit`: how well can
a data set be explained by the first N components of a training model?
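The idea behind that question can be sketched as a plain least-squares projection of a candidate spectrum onto the first N components. This is a minimal NumPy illustration, not Larch's `pca_fit`: the function `target_transform`, its arguments, and the misfit measure are assumptions made for this example, and the data are synthetic.

```python
import numpy as np

def target_transform(components, target, ncomps, mean=None):
    """Least-squares fit of `target` with the first `ncomps` components.

    components : array (n_components, n_points), one component per row
    target     : array (n_points,), the candidate spectrum
    mean       : optional mean spectrum that was subtracted during training
    Returns the coefficients, the reconstructed spectrum, and the fraction
    of the target's squared amplitude that is left unexplained.
    """
    target = np.asarray(target, dtype=float)
    offset = np.zeros_like(target) if mean is None else np.asarray(mean, dtype=float)
    design = np.asarray(components, dtype=float)[:ncomps].T
    coefs, *_ = np.linalg.lstsq(design, target - offset, rcond=None)
    fit = design @ coefs + offset
    misfit = np.sum((target - fit) ** 2) / np.sum(target ** 2)
    return coefs, fit, float(misfit)

# Synthetic training set: 20 noisy mixtures of two underlying signals.
rng = np.random.default_rng(1)
k = np.linspace(2.0, 11.0, 181)
sig_a = np.sin(2.0 * k)
sig_b = np.sin(3.1 * k + 0.5)
mix = rng.uniform(0.0, 1.0, size=(20, 2))
train = mix @ np.vstack([sig_a, sig_b]) + 0.01 * rng.normal(size=(20, 181))

# Components from an uncentered SVD (one of several possible conventions).
_, _, components = np.linalg.svd(train, full_matrices=False)

# A target inside the span of the training signals, and one outside it.
inside = 0.3 * sig_a + 0.7 * sig_b
outside = np.sin(4.3 * k)
_, fit_in, misfit_in = target_transform(components, inside, ncomps=2)
_, fit_out, misfit_out = target_transform(components, outside, ncomps=2)
print("unexplained fraction, target inside span: ", misfit_in)
print("unexplained fraction, target outside span:", misfit_out)
```

A small unexplained fraction suggests the candidate is consistent with the retained components; a large one suggests it is not present in the data set.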
For fit statistics: I have seen "SPOIL" used in several places in the EXAFS
literature, but I am afraid I do not actually know of a definition for it.
If anyone can explain how it is defined, that would be helpful. Larch can
calculate the F1 and IND statistics that are more common in the PCA
literature. XAS Viewer exposes and automatically plots IND - it's a very
useful way to select how many components are significant.
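For readers unfamiliar with IND, here is a rough NumPy sketch of Malinowski's indicator function as it is usually written in the factor-analysis literature: for r data points and c spectra, RE(n) = sqrt(sum of the c - n smallest eigenvalues / (r (c - n))) and IND(n) = RE(n) / (c - n)^2, with the minimum of IND suggesting the number of significant components. The function name `malinowski_ind` and the choice of an uncentered data matrix are assumptions for this example; Larch's own calculation may differ in its details.

```python
import numpy as np

def malinowski_ind(data):
    """IND(n) for n = 1 .. c-1 retained components.

    data : array of shape (r, c) with r points per spectrum and c spectra,
           assuming r >= c.
    """
    x = np.asarray(data, dtype=float)
    r, c = x.shape
    # Eigenvalues of x.T @ x, in descending order.
    eigvals = np.linalg.svd(x, compute_uv=False) ** 2
    ind = []
    for n in range(1, c):
        # Real error from the eigenvalues that are discarded when keeping n.
        real_error = np.sqrt(eigvals[n:].sum() / (r * (c - n)))
        ind.append(real_error / (c - n) ** 2)
    return np.array(ind)

# Synthetic test: 12 spectra built from 3 underlying signals plus noise.
rng = np.random.default_rng(2)
k = np.linspace(2.0, 11.0, 181)
signals = np.column_stack([
    np.sin(2.0 * k),
    np.sin(3.1 * k + 0.5),
    np.sin(4.3 * k + 1.0),
])
weights = rng.uniform(0.0, 1.0, size=(3, 12))
data = signals @ weights + 0.01 * rng.normal(size=(181, 12))

ind = malinowski_ind(data)
best_n = int(np.argmin(ind)) + 1
print("IND values:", ind)
print("number of components at the IND minimum:", best_n)
```

Plotting `ind` against the number of components and looking for the minimum is the same kind of check that an IND plot provides.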
I'm pretty sure that does not answer your actual question, but maybe it
will be helpful. If you or anyone else has suggestions for additions,
improvements, or other optional methods or statistics for PCA and/or
related methods, please let me know.
--Matt
On Tue, Jun 8, 2021 at 7:43 AM
-- --Matt Newville <newville at cars.uchicago.edu> 630-327-7411