Hi Teresa and Matt,

Concerning the SPOIL function: some months ago I dug into this and found what is, I believe, the first paper on the topic, by Edmund R. Malinowski in Analytica Chimica Acta, 1978 (doi: 10.1016/S0003-2670(01)83099-3).
Rather than reproducing the paper here (it is more than a couple of lines), I will let you read the precise definition of the function there.
I have a PDF copy of the paper if either of you is interested.

Cheers,
  Danilo



Danilo OLIVEIRA DE SOUZA          

Research Fellow @ ELETTRA Sincrotrone, XAFS beamline

    Strada Statale 14 - km 163,5 in AREA Science Park
   34149 Basovizza, Trieste ITALY

   mobile: (+39) 351 50.30.731
   bureau: (+39) 040-375-8604


On Thu, Jun 10, 2021 at 19:00, <ifeffit-request@millenia.cars.aps.anl.gov> wrote:
Send Ifeffit mailing list submissions to
        ifeffit@millenia.cars.aps.anl.gov

To subscribe or unsubscribe via the World Wide Web, visit
        http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
or, via email, send a message with subject or body 'help' to
        ifeffit-request@millenia.cars.aps.anl.gov

You can reach the person managing the list at
        ifeffit-owner@millenia.cars.aps.anl.gov

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Ifeffit digest..."


Today's Topics:

   1. Re: PCA Sixpack results compared to Athena (etc.) (Matt Newville)


----------------------------------------------------------------------

Message: 1
Date: Thu, 10 Jun 2021 07:08:19 -0500
From: Matt Newville <newville@cars.uchicago.edu>
To: XAFS Analysis using Ifeffit <ifeffit@millenia.cars.aps.anl.gov>
Subject: Re: [Ifeffit] PCA Sixpack results compared to Athena (etc.)
Message-ID:
        <CA+7ESbq=J9fZ3Y0PPuyjShk+XNFOGjOTktX4m7oyu6PFfTwM5w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Teresa,

I'm sorry that I cannot give you a definitive answer.  I should admit that
when adding PCA methods to Larch and XAS Viewer (which I invite you to try
out), I tried to follow the scikit-learn approach but also to follow the
Athena implementation.  I think I never tried to test against the results
from SixPack.   FWIW, Larch only does (easily) PCA on normalized mu(E) or
its derivative.  I suppose PCA on chi(k) could be added, but I'm a bit
skeptical of this.

In fact, the code at
https://github.com/xraypy/xraylarch/blob/master/larch/math/pca.py (and,
just to be clear, having this both in Python and publicly available is
motivated by having these conversations of "what does it do?") has a few
different methods to train a PCA set: one directly from scikit-learn, one
basically reproducing Demeter's PCA.pm (modulo slight differences in
underlying math libraries, which should be insignificant), one that aims to
use only non-negative components (not really worth it, in my opinion), and one
that is sort of hand-coded and includes the IND statistic.   I don't know
what SixPack does.

I cannot really explain why, but the default "readily exposed in Larch XAS
Viewer" is to use the hand-coded version of `pca_train`.  In fact, they
should all be more or less interchangeable.  I did some tests with these
several years ago, and it might be worth trying that again.  If you're up
for that, please do try.  If not, and you would like to send your project
and an outline of what you get, I might be able to look at this too.

For "target transformation", this is implemented as `pca_fit`: how well can
a data set be explained by the first N components of a training model?
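
A minimal NumPy-only sketch of that idea follows. It is not Larch's actual
`pca_fit` code: the function name `target_transform`, the choice to
mean-center the training set, and the synthetic "spectra" are all
assumptions made for illustration.

```python
import numpy as np

def target_transform(train, target, ncomps):
    """Fit `target` with the mean and first `ncomps` principal components
    of `train` (rows = spectra, columns = energy points). Hypothetical helper."""
    mean = train.mean(axis=0)
    # rows of vt are orthonormal principal components of the centered data
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    comps = vt[:ncomps]
    weights = comps @ (target - mean)   # projection onto each component
    recon = mean + weights @ comps      # reconstruction from ncomps components
    resid = target - recon
    return recon, float(np.sqrt(np.mean(resid ** 2)))

# synthetic training set: mixtures of two made-up line shapes
x = np.linspace(-5, 5, 201)
s1 = 1 / (1 + np.exp(-2 * x))
s2 = s1 + 0.5 * np.exp(-(x - 1) ** 2)
fracs = np.linspace(0, 1, 8)
train = np.array([f * s1 + (1 - f) * s2 for f in fracs])

good = 0.3 * s1 + 0.7 * s2        # a mixture of the same two shapes
bad = np.exp(-(x + 2) ** 2)       # an unrelated shape

_, rms_good = target_transform(train, good, ncomps=1)
_, rms_bad = target_transform(train, bad, ncomps=1)
print(f"rms residual, good target: {rms_good:.2e}")
print(f"rms residual, bad target:  {rms_bad:.2e}")
```

A target that lies in the space spanned by the training components is
reproduced with a very small residual, while an unrelated shape is not.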

For fit statistics: I have seen "SPOIL" used in several places in the EXAFS
literature but am afraid I do not actually know of a definition for it.
If anyone can explain what it is, that would be helpful.   Larch can
calculate the F1 and IND statistics that are more common in the PCA
literature.  XAS Viewer exposes and automatically plots IND - it's a very
useful way to select how many components are significant.
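
As a rough illustration of how IND can be computed, here is a NumPy-only
sketch using Malinowski's definitions: RE(n) = sqrt(sum of the eigenvalues
beyond the first n / (r (c - n))) and IND(n) = RE(n) / (c - n)^2, for a data
matrix with r rows and c columns (r > c). The function name
`malinowski_ind`, the use of uncentered data, and the synthetic test data
are assumptions for this example; this is not the code used in Larch.

```python
import numpy as np

def malinowski_ind(data):
    """Return RE and IND for n = 1 .. c-1 components.

    data: 2-D array with r rows (energy points) and c columns (spectra), r > c.
    """
    r, c = data.shape
    # eigenvalues of data.T @ data are the squared singular values of data
    eigvals = np.linalg.svd(data, compute_uv=False) ** 2
    re, ind = [], []
    for n in range(1, c):
        resid = eigvals[n:].sum()               # eigenvalues left out of the model
        re_n = np.sqrt(resid / (r * (c - n)))   # real error, RE
        re.append(re_n)
        ind.append(re_n / (c - n) ** 2)         # indicator function, IND
    return np.array(re), np.array(ind)

# synthetic data: noisy mixtures of two made-up line shapes
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 201)
s1 = 1 / (1 + np.exp(-2 * x))
s2 = s1 + 0.5 * np.exp(-(x - 1) ** 2)
fracs = rng.uniform(0, 1, size=10)
data = np.column_stack([f * s1 + (1 - f) * s2 for f in fracs])
data += rng.normal(scale=1e-3, size=data.shape)

re, ind = malinowski_ind(data)
n_best = int(np.argmin(ind)) + 1    # number of components at the IND minimum
print("IND:", ind)
print("components at minimum IND:", n_best)
```

The number of significant components is usually taken as the n at which
IND reaches its minimum.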

I'm pretty sure that does not answer your actual question, but maybe it
will be helpful.  If you or anyone else has suggestions for additions,
improvements, or other optional methods or statistics for PCA and/or
related methods, please let me know.

--Matt


On Tue, Jun 8, 2021 at 7:43 AM <t.zahoransky@mineralogie.uni-hannover.de>
wrote:

>
> Dear XAS Community,
>
> I stumbled over the issue that a PCA on 20 EXAFS spectra (k?, k =
> 2.0-11.0 Å^-1) performed in Sixpack does not give the same results
> (Eigenvalues, variance) as in Athena. However, when I use other
> statistical programs (e.g., TIBCO Statistica or SPSS), I get the same
> results as reported in Athena. I tested this with another EXAFS
> dataset of over 30 samples and the problem persisted.
>
> An old entry from 2017 in the ifeffit mailing list ("[Ifeffit]
> Calculation of SPOIL value for the reconstruction of standard
> spectra"), told me that, from Sixpack version 1.4 on, a new/different
> PCA algorithm from the scikit-learn Python package is used.


> So I downloaded older versions of Sixpack (e.g., 1.3) and used the "Use Old
> PCA" selection in the "Rotation" menu bar, which actually gave
> different results. However, they are still different from the
> Athena/Statistical program results.
>
> My question is: What is behind this? Is there some sort of
> normalization or axis rotation, that leads to the different values? Is
> there any way to change this so that the results are comparable to
> other programs?
>
> As I need to use the Target Transform option after PCA, which is not
> yet possible in Athena, I am at a loss as to how to deal with these
> different results and where they come from.
>
>
> Thank you very much for your help,
>
> Teresa
>
>
> --
> Teresa Zahoransky
>
> Soil Mineralogy
>
> Gottfried Wilhelm Leibniz Universität Hannover
>
> Institute of Mineralogy
>
> Callinstr. 3, Room 325
>
> D-30167 Hannover, Germany
>
>
>
> Phone: +49 (0)511 762-8058
>
> Email: t.zahoransky@mineralogie.uni-hannover.de
>
> _______________________________________________
> Ifeffit mailing list
> Ifeffit@millenia.cars.aps.anl.gov
> http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
> Unsubscribe: http://millenia.cars.aps.anl.gov/mailman/options/ifeffit
>


--
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411

------------------------------


End of Ifeffit Digest, Vol 220, Issue 2
***************************************