[Ifeffit] Question about parameters of principal component analysis (PCA)

Matt Newville newville at cars.uchicago.edu
Sat Jan 26 11:50:29 CST 2019


On Fri, Jan 25, 2019 at 12:48 PM 雷思聪 <1510405 at tongji.edu.cn> wrote:

> Hello, I see in many publications the indicator (IND) values used to
> evaluate how many components to include, and the SPOIL value used to
> evaluate the quality of a target transform in principal component analysis
> (PCA). However, these are obtained from PCA performed in SIXpack, rather
> than Athena (version 0.9.26). Can I get similar parameters after running
> PCA in Athena? If not, how do I evaluate the PCA results?

I don't know of a single mathematical definition for the number of
significant components in PCA.  For one thing, there is some ambiguity in
the literature about whether the spectral average is counted as a component
or is removed from the list of components.  Beyond that, I don't think there
is a clear standard for deciding whether an eigenvector is below the noise
level.  If someone knows of one, I'd be happy to use it!
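For comparison, the IND values in many XANES PCA papers are usually
attributed to Malinowski's empirical indicator function.  Below is a minimal
NumPy sketch, assuming a data matrix with r energy points (rows) and c
spectra (columns, r > c), eigenvalues lambda_j sorted in decreasing order,
a "real error" RE(n) = sqrt( sum_{j>n} lambda_j / (r (c - n)) ), and
IND(n) = RE(n) / (c - n)^2.  The function name `malinowski_ind` and the
synthetic Gaussian data are hypothetical, not part of Athena, Larch, or
SIXpack.  The mean spectrum is not subtracted here; whether to subtract it
is the same ambiguity mentioned above.

```python
import numpy as np


def malinowski_ind(data):
    """Malinowski indicator function IND(n) for n = 1 .. c-1.

    data : array of shape (r, c), r energy points as rows and c spectra
           as columns (r > c assumed).  The mean is NOT subtracted here.
    Returns an array whose element [n - 1] is IND(n).
    """
    r, c = data.shape
    # Eigenvalues of data.T @ data are the squared singular values,
    # which NumPy returns in decreasing order.
    eigvals = np.linalg.svd(data, compute_uv=False) ** 2
    ind = np.empty(c - 1)
    for n in range(1, c):
        # "Real error" estimated from the c - n discarded eigenvalues
        real_err = np.sqrt(eigvals[n:].sum() / (r * (c - n)))
        ind[n - 1] = real_err / (c - n) ** 2
    return ind


# Hypothetical test data: 8 spectra mixed from 3 Gaussian "components" + noise
rng = np.random.default_rng(0)
x = np.arange(200.0)
basis = np.stack([np.exp(-((x - x0) / 10.0) ** 2) for x0 in (50, 100, 150)], axis=1)
weights = rng.uniform(0.2, 1.0, size=(3, 8))
data = basis @ weights + rng.normal(scale=1e-3, size=(200, 8))

ind = malinowski_ind(data)
n_best = int(np.argmin(ind)) + 1  # number of components at the IND minimum
print(n_best)
```

The minimum of IND is taken as the number of components.  For real XANES
data that minimum can be shallow, so it is a guide rather than a definitive
answer.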

For XAFS and XANES in particular, it is often the case that one component
(typically number 3, 4, or 5) has a high noise level, but some spectral
weight right around the edge/white line energy.  It could be that such a
component is above the noise level in an important part of the spectrum but
below the noise level overall.  That makes it something of a judgement call
whether such a component should be considered "significant".  My
inclination would be to consider such a component significant and see if it
is clearly important for any of the targeted spectra (and it often is not).

I also do not know the exact definition or origin of the term "Spoil" for
indicating the quality of a target transformation.  I see it in some XANES
papers, but not in general PCA descriptions.  In Larch's PCA routines, the
target transformation is simply a linear least-squares regression of each
target spectrum against the component eigenvectors (the mean spectrum is
subtracted before the PCA decomposition), so the most natural thing to
report is the resulting chi-square.  If I knew the definition of Spoil used
in other programs, we could report that too.
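A target transformation done this way can be sketched as an ordinary
least-squares fit.  This is a minimal NumPy sketch, not Larch's actual code:
the function names `pca_components` and `target_transform` are hypothetical,
the fit assumes the target minus the mean spectrum is regressed on the
leading eigenvectors (the mean is added back with a fixed weight of 1), and
the "chi-square" here is an unweighted sum of squared residuals.  Dividing
by an estimate of the noise variance would give a conventional chi-square.

```python
import numpy as np


def pca_components(spectra, n_comp):
    """Mean spectrum and leading n_comp eigenvectors of (spectra - mean).

    spectra : array of shape (npts, nspectra), one spectrum per column.
    """
    mean = spectra.mean(axis=1)
    centered = spectra - mean[:, None]
    # Left singular vectors of the centered data are the eigenvectors
    # ("components"), ordered by decreasing singular value.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return mean, u[:, :n_comp]


def target_transform(target, mean, comps):
    """Linear least-squares fit of (target - mean) with the components.

    Returns the fitted weights, the fitted spectrum, and an unweighted
    chi-square (sum of squared residuals).
    """
    coefs, _, _, _ = np.linalg.lstsq(comps, target - mean, rcond=None)
    fit = mean + comps @ coefs
    chi_square = float(np.sum((target - fit) ** 2))
    return coefs, fit, chi_square


# Hypothetical test data: 6 spectra mixed from two Gaussian "standards" + noise
rng = np.random.default_rng(1)
x = np.arange(200.0)
std_a = np.exp(-((x - 50) / 10.0) ** 2)
std_b = np.exp(-((x - 100) / 10.0) ** 2)
weights = rng.uniform(0.2, 1.0, size=(2, 6))
spectra = np.stack([std_a, std_b], axis=1) @ weights + rng.normal(scale=1e-3, size=(200, 6))

mean, comps = pca_components(spectra, n_comp=2)
# A target inside the span of the data should fit well...
_, _, chi_in = target_transform(0.5 * std_a + 0.5 * std_b, mean, comps)
# ...while an unrelated target should not.
_, _, chi_out = target_transform(np.exp(-((x - 150) / 10.0) ** 2), mean, comps)
print(chi_in, chi_out)
```

A target that is a mixture of the spectra used to build the data set gives a
small chi-square, while an unrelated spectrum gives a large one.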

Sorry, that's probably not the sort of simple answer you were looking for.
One reason for this mailing list and this open discussion of these analysis
tools is to arrive at clearer definitions of the sorts of terms that are
often reported in the literature.

--Matt Newville
