Hello, I see in many publications that indicator (IND) values are used to decide how many components to include, and SPOIL values are used to evaluate the quality of a target transform in principal component analysis (PCA). However, they are obtained from PCA performed in SIXpack, rather than Athena (version 0.9.26). Hence, I wonder if I can get similar parameters after running PCA in Athena? If not, how do I evaluate the PCA results?
I don't know that there is a single mathematical definition for the number of significant components in PCA. For one thing, there is some ambiguity in the literature about whether the spectral average is counted as a component or is removed from the list of components. Even beyond that, I think there is no clear standard for determining whether an eigenvector is below the noise level. If someone knows of one, I'd be happy to use it!
For XAFS and XANES in particular, it is often the case that one component (typically number 3, 4, or 5) has a high noise level, but some spectral weight right around the edge/white line energy. It could be that such a component is above the noise level in an important part of the spectrum but below the noise level overall. That makes it something of a judgement call whether such a component should be considered "significant". My inclination would be to consider such a component significant and see if it is clearly important for any of the targeted spectra (and it often is not).
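For reference, one definition of IND that appears in the chemometrics literature is Malinowski's factor indicator function, IND(n) = RE(n) / (c - n)^2, where RE(n) is the "real error" computed from the eigenvalues left out when n components are kept. Whether SIXpack or any particular paper uses exactly this form is an assumption on my part. Below is a minimal NumPy sketch under that assumption; the function name `ind_values` is made up for illustration, and the mean spectrum is not subtracted here, which is one of the ambiguities mentioned above.

```python
import numpy as np

def ind_values(data):
    """Malinowski-style IND for n = 1 .. c-1 components.

    data: 2-D array with one spectrum per column
          (r energy points by c spectra, with r >= c).
    Returns an array of length c-1; element n-1 is IND(n).
    """
    data = np.asarray(data, dtype=float)
    r, c = data.shape
    if r < c:
        raise ValueError("expected more energy points (rows) than spectra (columns)")
    # Squared singular values are the eigenvalues of data.T @ data.
    s = np.linalg.svd(data, compute_uv=False)
    lam = s ** 2
    ind = []
    for n in range(1, c):
        # Real error: RMS of the variance left in the discarded components.
        real_error = np.sqrt(lam[n:].sum() / (r * (c - n)))
        ind.append(real_error / (c - n) ** 2)
    return np.array(ind)

if __name__ == "__main__":
    # Synthetic example: 6 noisy mixtures of 2 made-up basis spectra.
    rng = np.random.default_rng(0)
    energy = np.linspace(0.0, 1.0, 200)
    basis = np.vstack([np.tanh(20 * (energy - 0.3)), np.exp(-((energy - 0.5) / 0.05) ** 2)])
    weights = rng.uniform(0.0, 1.0, size=(6, 2))
    spectra = (weights @ basis).T + 0.01 * rng.standard_normal((200, 6))
    ind = ind_values(spectra)
    print("IND(n) for n = 1..5:", ind)
    print("n at minimum IND:", int(np.argmin(ind)) + 1)
```

The usual reading is that the n at which IND reaches its minimum is the suggested number of components, but that minimum can be shallow for noisy data, so it is a guide and not a replacement for looking at the components themselves.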
I also do not know the exact definition or origin of the term "Spoil" to indicate the quality of a target transformation. I see it in some XANES papers, but not in general PCA descriptions. In Larch's PCA routines, the target transformation is simply a linear least-squares regression of the spectra against the component eigenvectors (the mean spectrum is subtracted for the PCA decomposition), so the most natural thing to report is the resulting chi-square. If I knew the definition of Spoil used in other programs, we could report that too.
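To make that description concrete, here is a rough NumPy sketch of a mean-subtracted PCA followed by a least-squares target transformation that reports chi-square. This is not Larch's actual code; the function names `pca_components` and `target_transform` are made up for illustration, and chi-square is taken here as the plain unweighted sum of squared residuals, which is an assumption.

```python
import numpy as np

def pca_components(spectra, ncomp):
    """Mean-subtracted PCA by SVD.

    spectra: 2-D array, one spectrum per row, all on the same energy grid.
    Returns the mean spectrum and the first ncomp component eigenvectors (rows).
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:ncomp]

def target_transform(target, mean, components):
    """Least-squares fit of (target - mean) with the component eigenvectors.

    Returns the fitted weights, the reconstructed spectrum, and
    chi-square as the unweighted sum of squared residuals.
    """
    target = np.asarray(target, dtype=float)
    weights, *_ = np.linalg.lstsq(components.T, target - mean, rcond=None)
    fit = mean + components.T @ weights
    chi_square = float(np.sum((target - fit) ** 2))
    return weights, fit, chi_square

if __name__ == "__main__":
    # Synthetic example with two made-up basis spectra.
    energy = np.linspace(0.0, 1.0, 200)
    b1 = np.tanh(20 * (energy - 0.3))
    b2 = np.exp(-((energy - 0.5) / 0.05) ** 2)
    coefs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.9]])
    spectra = coefs @ np.vstack([b1, b2])
    mean, comps = pca_components(spectra, ncomp=2)
    _, _, chi2_good = target_transform(0.7 * b1 + 0.3 * b2, mean, comps)
    _, _, chi2_bad = target_transform(np.sin(12 * energy), mean, comps)
    print("chi-square, target built from the basis spectra:", chi2_good)
    print("chi-square, unrelated target:", chi2_bad)
```

A target that can be built from the retained components gives a chi-square near zero, while an unrelated target leaves a large residual. Comparing chi-square across candidate targets is only meaningful when they are on the same energy grid and normalized the same way.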
Sorry, that's probably not the sort of simple answer you were looking for. One reason for this mailing list and this open discussion of these analysis tools is to get to clearer definitions of the sorts of terms that are often reported in the literature.