Re: [Ifeffit] Ifeffit Digest, Vol 220, Issue 2
Hi Teresa and Matt,
Concerning the SPOIL function, some months ago I dug into that and found
what I believe is the first paper on this topic, by Edmund R. Malinowski
in Analytica Chimica Acta, 1978
(https://doi.org/10.1016/S0003-2670(01)83099-3).
Instead of reproducing the paper here (it is not just a couple of lines),
I will leave it to you to read the precise definition of the function.
I have a PDF copy of the paper if either of you is interested in it.
Cheers,
Danilo
*Danilo OLIVEIRA DE SOUZA*
*Research Fellow @ ELETTRA Sincrotrone, XAFS beamline*
* Strada Statale 14 - km 163,5 in AREA Science Park*
* 34149 Basovizza, Trieste ITALY*
*http://www.elettra.trieste.it/elettra-beamlines/xafs.html*
* e-mail: danilo.oliveiradesouza@elettra.eu*
Send Ifeffit mailing list submissions to ifeffit@millenia.cars.aps.anl.gov
To subscribe or unsubscribe via the World Wide Web, visit http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit or, via email, send a message with subject or body 'help' to ifeffit-request@millenia.cars.aps.anl.gov
You can reach the person managing the list at ifeffit-owner@millenia.cars.aps.anl.gov
When replying, please edit your Subject line so it is more specific than "Re: Contents of Ifeffit digest..."
Today's Topics:
1. Re: PCA Sixpack results compared to Athena (etc.) (Matt Newville)
----------------------------------------------------------------------
Message: 1
Date: Thu, 10 Jun 2021 07:08:19 -0500
From: Matt Newville
To: XAFS Analysis using Ifeffit
Subject: Re: [Ifeffit] PCA Sixpack results compared to Athena (etc.)
Message-ID:
Content-Type: text/plain; charset="utf-8"

Hi Teresa,
I'm sorry that I cannot give you a definitive answer. I should admit that when adding PCA methods to Larch and XAS Viewer (which I invite you to try out), I tried to follow the scikit-learn approach but also to follow the Athena implementation. I think I never tested against the results from SixPack. FWIW, Larch only does (easily) PCA on normalized mu(E) or its derivative. I suppose PCA on chi(k) could be added, but I'm a bit skeptical of this.
In fact, the code at https://github.com/xraypy/xraylarch/blob/master/larch/math/pca.py has a few different methods to train a PCA set (and, just to be clear, having this both in Python and publicly available is motivated by having these conversations of "what does it do?"): one directly from scikit-learn; one basically reproducing Demeter's PCA.pm (modulo slight differences in underlying math libraries, which should be insignificant); one that aims to use only non-negative components (not really worth it, in my opinion); and one that is sort of hand-coded and includes the IND statistic. I don't know what SixPack does.
I cannot really explain why, but the default that is readily exposed in Larch XAS Viewer is the hand-coded version of `pca_train`. In fact, they should all be more or less interchangeable. I did some tests with these, but that was several years ago, so it might be worth trying that again. If you're up for that, please do try. If not, and you would like to send your project and an outline of what you get, I might be able to look at this too.
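As an illustration of why two programs can report different eigenvalues and variance fractions for the same spectra, here is a minimal NumPy sketch. It is not the code from Larch, Athena, or SixPack; the function name `explained_variance` and the synthetic spectra are made up for this example. It only shows that the preprocessing convention (no centering, mean-centering, or full standardization) changes the numbers, which is one possible source of such differences.

```python
import numpy as np

def explained_variance(data, center=True, scale=False):
    """Fraction of total variance carried by each principal component.

    data   : array of shape (n_spectra, n_points)
    center : subtract the mean spectrum before decomposing
    scale  : also divide each point by its standard deviation
    (hypothetical helper for illustration only)
    """
    x = np.asarray(data, dtype=float)
    if center:
        x = x - x.mean(axis=0)
    if scale:
        x = x / x.std(axis=0, ddof=1)
    # squared singular values are proportional to the PCA eigenvalues
    eigvals = np.linalg.svd(x, compute_uv=False) ** 2
    return eigvals / eigvals.sum()

# synthetic data: 20 noisy mixtures of two made-up "end-member" spectra
rng = np.random.default_rng(0)
energy = np.linspace(0, 1, 200)
comp_a = np.tanh(20 * (energy - 0.3))
comp_b = np.exp(-((energy - 0.5) / 0.05) ** 2)
frac = rng.uniform(0, 1, size=20)
spectra = np.outer(frac, comp_a) + np.outer(1 - frac, comp_b)
spectra += 0.01 * rng.normal(size=spectra.shape)

raw = explained_variance(spectra, center=False)
centered = explained_variance(spectra, center=True)
standardized = explained_variance(spectra, center=True, scale=True)

for name, frac_var in [("raw", raw), ("centered", centered),
                       ("standardized", standardized)]:
    print(f"{name:>12}: {np.round(frac_var[:3], 4)}")
```

The three rows printed at the end come from identical input data, so any disagreement between them is purely a matter of convention. Checking which convention each program uses would be a reasonable first step when comparing outputs.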
For "target transformation", this is implemented as `pca_fit`: how well can a data set be explained by the first N components of a training model?
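The idea of target transformation can be sketched in a few lines of NumPy. This is only an illustration of the concept under simple assumptions (mean-centered training data, orthonormal components from an SVD); it is not the implementation of `pca_fit` in Larch, and the names `train_components` and `target_transform` are hypothetical.

```python
import numpy as np

def train_components(spectra):
    """Mean-center the training spectra; return (mean, components).

    Rows of `components` are orthonormal, ordered by decreasing weight.
    """
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt

def target_transform(target, mean, components, ncomps):
    """Best least-squares reconstruction of `target` from the first
    `ncomps` components; returns (model, rms_residual)."""
    basis = components[:ncomps]
    # projection gives the least-squares weights because the rows of
    # `basis` are orthonormal
    weights = basis @ (target - mean)
    model = mean + weights @ basis
    resid = target - model
    return model, np.sqrt(np.mean(resid ** 2))

# synthetic training set: noisy mixtures of two made-up spectra
rng = np.random.default_rng(1)
energy = np.linspace(0, 1, 200)
comp_a = np.tanh(20 * (energy - 0.3))
comp_b = np.exp(-((energy - 0.5) / 0.05) ** 2)
frac = rng.uniform(0, 1, size=20)
training = np.outer(frac, comp_a) + np.outer(1 - frac, comp_b)
training += 0.01 * rng.normal(size=training.shape)

mean, comps = train_components(training)

# a candidate that is a mixture of the same end-members ...
good_target = 0.4 * comp_a + 0.6 * comp_b
# ... and one that has an unrelated shape
bad_target = np.sin(6 * np.pi * energy)

_, rms_good = target_transform(good_target, mean, comps, ncomps=1)
_, rms_bad = target_transform(bad_target, mean, comps, ncomps=1)
print(f"rms residual, plausible target: {rms_good:.4f}")
print(f"rms residual, unrelated target: {rms_bad:.4f}")
```

A small residual suggests the candidate spectrum lies in the space spanned by the retained components; a large one suggests it does not. How that residual is turned into a named statistic differs between programs.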
For fit statistics: I have seen "SPOIL" used in several places in the EXAFS literature, but I am afraid I do not actually know of a definition for it. If anyone can explain what it is, that would be helpful. Larch can calculate the F1 and IND statistics that are more common in the PCA literature. XAS Viewer exposes and automatically plots IND - it's a very useful way to select how many components are significant.
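For reference, Malinowski's indicator function is commonly written as IND(n) = RE(n) / (c - n)^2, with the real error RE(n) = sqrt( sum of the eigenvalues after the n-th / (r (c - n)) ), for a data matrix with r rows and c columns (r >= c). The sketch below computes it with NumPy on the data as given, without mean-centering, which is an assumption made for this example. It is not taken from Larch, and the function name `malinowski_ind` and the synthetic spectra are hypothetical.

```python
import numpy as np

def malinowski_ind(data):
    """IND(n) for n = 1 .. c-1, following Malinowski's definition.

    The matrix is oriented so that r >= c; eigenvalues are taken as the
    squared singular values of the (uncentered) data matrix.
    """
    x = np.asarray(data, dtype=float)
    if x.shape[0] < x.shape[1]:
        x = x.T
    r, c = x.shape
    eigvals = np.linalg.svd(x, compute_uv=False) ** 2  # descending order
    ind = np.empty(c - 1)
    for n in range(1, c):
        # real error left after keeping the first n components
        real_error = np.sqrt(eigvals[n:].sum() / (r * (c - n)))
        ind[n - 1] = real_error / (c - n) ** 2
    return ind

# synthetic data: 20 noisy combinations of three made-up spectra
rng = np.random.default_rng(2)
energy = np.linspace(0, 1, 200)
members = np.vstack([
    np.tanh(20 * (energy - 0.3)),
    np.exp(-((energy - 0.5) / 0.05) ** 2),
    np.sin(6 * np.pi * energy),
])
weights = rng.uniform(0, 1, size=(20, 3))
spectra = weights @ members + 0.01 * rng.normal(size=(20, 200))

ind = malinowski_ind(spectra)
n_best = int(np.argmin(ind)) + 1   # IND is indexed from n = 1
print("IND values:", np.round(ind[:6], 6))
print("number of components at the IND minimum:", n_best)
```

The position of the minimum of IND is usually taken as the number of significant components. With noisy real data the minimum can be shallow, so it is worth looking at the whole curve rather than only the argmin.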
I'm pretty sure that does not answer your actual question, but maybe it will be helpful. If you or anyone else has suggestions for additions, improvements, or other optional methods or statistics for PCA and/or related methods, please let me know.
--Matt
On Tue, Jun 8, 2021 at 7:43 AM wrote:
Dear XAS Community,
I stumbled over the issue that a PCA on 20 EXAFS spectra (k?, k = 2.0-11.0 Å⁻¹) performed in Sixpack does not give the same results (eigenvalues, variance) as in Athena. However, when I use other statistical programs (i.e., TIBCO Statistica or SPSS), I get the same results as reported in Athena. I tested this with another EXAFS dataset of over 30 samples and the problem persisted.
An old entry from 2017 in the ifeffit mailing list ("[Ifeffit] Calculation of SPOIL value for the reconstruction of standard spectra") told me that from Sixpack version 1.4 on, a new/different PCA algorithm from the scikit-learn Python package is used.
So I downloaded older versions of Sixpack (i.e. 1.3) and used the "Use Old PCA" selection in the "Rotation" menu bar, which actually gave different results. However, they are still different from the Athena/statistical program results.
My question is: What is behind this? Is there some sort of normalization or axis rotation that leads to the different values? Is there any way to change this so that the results are comparable to other programs?
As I need to use the Target Transform option after PCA, which is not yet possible in Athena, I am at a loss as to how to deal with these different results and where they come from.
Thank you very much for your help,
Teresa
-- Teresa Zahoransky
Soil Mineralogy
Gottfried Wilhelm Leibniz Universität Hannover
Institute of Mineralogy
Callinstr. 3, Room 325
D-30167 Hannover, Germany
Phone: +49 (0)511 762-8058
Email: t.zahoransky@mineralogie.uni-hannover.de
-- --Matt Newville <newville at cars.uchicago.edu> 630-327-7411