Re: [Ifeffit] Question about PCA

24 Apr 2019

Hi Joselaine,

On Tue, Apr 23, 2019 at 2:35 PM Joselaine Cáceres gonzalez <
joselainecaceres@gmail.com> wrote:
...
Dear all,
I performed PCA on a set of 11 XANES spectra and was trying to use some
of the functions and tests developed by Malinowski to extract the number of
primary components.
I am not familiar with the details of those tests.  Do you have a reference
for them?

I found differences between the eigenvalues calculated with a matrix calculator
and those calculated by ATHENA, but they are equivalent in some way, since
they explain the same amount of variance, and when I calculate the IND
function or the Malinowski F-test I reach the same conclusions.
To calculate the eigenvalues with the matrix calculator, I first
normalized my data matrix (step-height normalization) with ATHENA, then
exported the data to an Excel spreadsheet, where I centered them:
z=(value-mean)/standard_dev.
I believe that Athena does not do this centering. I would imagine that
dividing by the standard deviation could skew the data.
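
A minimal sketch of how the pre-treatment alone changes the result. Everything
here is an assumption for illustration: the spectra are made-up two-component
mixtures (not real XANES data), the mean and standard deviation are taken at
each energy point across the 11 spectra, and plain numpy SVD stands in for
whatever Athena or the matrix calculator actually does.

```python
import numpy as np

def synthetic_spectra(n_spectra=11, n_points=161, seed=0):
    """Made-up, step-normalized 'XANES-like' curves: two shapes mixed, plus noise."""
    rng = np.random.default_rng(seed)
    e = np.linspace(-20.0, 60.0, n_points)        # hypothetical energy grid (eV)
    step = 0.5 + np.arctan(e / 2.0) / np.pi       # edge step rising from 0 to 1
    comp_a = step + 0.6 * np.exp(-((e - 5.0) / 3.0) ** 2)
    comp_b = step + 0.3 * np.exp(-((e - 12.0) / 5.0) ** 2)
    frac = np.linspace(0.0, 1.0, n_spectra)[:, None]
    data = frac * comp_a + (1.0 - frac) * comp_b
    return data + 1e-3 * rng.standard_normal(data.shape)

data = synthetic_spectra()                   # shape (11, 161): spectra x energy points

centered = data - data.mean(axis=0)          # subtract the mean spectrum only
zscored = centered / data.std(axis=0, ddof=1)  # z = (value - mean) / standard_dev

fractions = {}
for name, matrix in [("raw", data), ("centered", centered), ("z-scored", zscored)]:
    s = np.linalg.svd(matrix, compute_uv=False)   # singular values, descending
    fractions[name] = s**2 / np.sum(s**2)         # fraction of total sum of squares
    print(f"{name:9s}", np.round(fractions[name][:3], 4))
```

In this toy case the z-scored matrix gives every energy point unit variance,
so the nearly flat pre-edge and post-edge regions (which are mostly noise)
count as much as the edge itself, and the leading component carries a smaller
share of the total than it does for the merely centered data.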
...
Then I used SVD and eigenvalue decomposition to find the eigenvalues, and
those obtained with SVD explain exactly the same amount of variance as
those found with ATHENA, but the values themselves are different.
I don't think I understand what you mean by "eigenvalues explain exactly
the amount of variance ... but they are different".  Can you clarify?
Giving an actual example might help.
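
One possible reading, offered only as a guess: two sets of values can differ
while giving identical variance fractions if they are related by squaring or
by a constant scale factor. The sketch below uses a random placeholder matrix
and numpy only; whether any of this matches what Athena reports is not
established here.

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.standard_normal((161, 11))      # placeholder matrix: 161 points x 11 spectra

def fractions(values):
    """Share of the total carried by each value."""
    return values / values.sum()

s = np.linalg.svd(d, compute_uv=False)                 # singular values, descending
eig_gram = np.linalg.eigvalsh(d.T @ d)[::-1]           # eigenvalues of D^T D
eig_scaled = np.linalg.eigvalsh(d.T @ d / (d.shape[0] - 1))[::-1]  # same, divided by r - 1

squares_match = np.allclose(eig_gram, s**2)            # eigenvalues are squared singular values
values_match = np.allclose(eig_gram, eig_scaled)       # the raw numbers differ by a factor r - 1
fractions_match = np.allclose(fractions(eig_gram), fractions(eig_scaled))

print(squares_match, values_match, fractions_match)
```

A constant factor cancels in any ratio of eigenvalues, so the variance
fractions agree even though the eigenvalues do not. Functions that use the
absolute size of the eigenvalues would still see the difference.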

In all the calculations I did, I used the same number of points (161)
in each spectrum in ATHENA and in the matrix calculator, over exactly the same
energy interval.

...
I'd like to know if I am missing some step in the data pre-treatment that
is preventing me from finding the same eigenvalues obtained by ATHENA, but also
I'd like to know if I can directly use the values obtained with
ATHENA to evaluate the functions proposed by Malinowski, since these
equations might be different depending on whether the calculations are made using
"covariance about the origin" or "correlation about the origin", and I am
not sure which is the case.
I don't know the answer to any of those questions.
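
As a starting point for checking this numerically, here is a rough sketch of
the two conventions as they are usually described: "covariance about the
origin" takes the eigenvalues of D^T D with no pre-treatment, and "correlation
about the origin" first scales each column of D to unit length, with no mean
subtraction in either case. The IND formula used is
IND(n) = RE(n) / (c - n)^2 with RE(n) = sqrt(sum_{j>n} lambda_j / (r (c - n))),
for r rows and c columns. The data, the function names, and the choice of
spectra as columns are all assumptions; this says nothing about which
convention Athena uses.

```python
import numpy as np

def eigenvalues_about_origin(d, correlation=False):
    """Eigenvalues (descending) of D^T D for an r x c matrix, no mean subtraction.
    With correlation=True each column is first scaled to unit length."""
    d = np.asarray(d, dtype=float)
    if correlation:
        d = d / np.sqrt((d**2).sum(axis=0))
    return np.linalg.eigvalsh(d.T @ d)[::-1]

def ind_function(eigvals, n_rows):
    """Return n = 1..c-1 and IND(n) = RE(n) / (c - n)^2."""
    lam = np.clip(eigvals, 0.0, None)           # guard against tiny negative round-off
    c = lam.size
    n = np.arange(1, c)
    tail = np.array([lam[k:].sum() for k in n]) # sum of eigenvalues n+1 .. c
    re = np.sqrt(tail / (n_rows * (c - n)))     # real error RE(n)
    return n, re / (c - n) ** 2

# Made-up two-component data: 161 energy points (rows) x 11 spectra (columns).
rng = np.random.default_rng(0)
e = np.linspace(-20.0, 60.0, 161)
step = 0.5 + np.arctan(e / 2.0) / np.pi
comp_a = step + 0.6 * np.exp(-((e - 5.0) / 3.0) ** 2)
comp_b = step + 0.3 * np.exp(-((e - 12.0) / 5.0) ** 2)
frac = np.linspace(0.0, 1.0, 11)
d = np.outer(comp_a, frac) + np.outer(comp_b, 1.0 - frac)
d = d + 1e-3 * rng.standard_normal(d.shape)

lam_cov = eigenvalues_about_origin(d)
lam_cor = eigenvalues_about_origin(d, correlation=True)
n, ind_cov = ind_function(lam_cov, n_rows=d.shape[0])
n_primary = int(n[np.argmin(ind_cov)])          # n at which IND is smallest
print("covariance about origin: IND minimum at n =", n_primary)
```

On this synthetic two-component data the IND minimum should fall at n = 2.
The eigenvalues from the two conventions differ (those from the unit-length
columns sum to the number of columns), which is one reason to confirm the
convention before plugging reported eigenvalues into these formulas.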

I can say that when I compare PCA with Athena and with Larch (using
scikit-learn's PCA), I do get very similar looking eigenvectors for the
first few components. To be clear, scikit-learn's PCA does first
subtract out the mean (but does not divide by the standard deviation),
whereas Athena identifies the mean as the 1st component. So there is a
potential "off by 1" counting issue, but that is easily worked out.
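
The counting shift can be illustrated with numpy alone. This is a sketch on
made-up two-component spectra: an SVD of the uncentered matrix stands in for a
decomposition with no mean subtraction (an assumption, not a statement about
Athena's internals), and an SVD of the mean-subtracted matrix stands in for
what scikit-learn's PCA does.

```python
import numpy as np

# Made-up data: 11 spectra (rows) x 161 energy points (columns).
rng = np.random.default_rng(0)
e = np.linspace(-20.0, 60.0, 161)
step = 0.5 + np.arctan(e / 2.0) / np.pi
comp_a = step + 0.6 * np.exp(-((e - 5.0) / 3.0) ** 2)
comp_b = step + 0.3 * np.exp(-((e - 12.0) / 5.0) ** 2)
frac = np.linspace(0.0, 1.0, 11)[:, None]
data = frac * comp_a + (1.0 - frac) * comp_b
data = data + 1e-3 * rng.standard_normal(data.shape)

def cosine(u, v):
    """Absolute cosine similarity, so the arbitrary sign of a component is ignored."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

mean_spectrum = data.mean(axis=0)
_, _, vt_raw = np.linalg.svd(data, full_matrices=False)                  # no centering
_, _, vt_cen = np.linalg.svd(data - mean_spectrum, full_matrices=False)  # mean removed

sim_first_vs_mean = cosine(vt_raw[0], mean_spectrum)  # uncentered #1 vs mean spectrum
sim_shifted = cosine(vt_raw[1], vt_cen[0])            # uncentered #2 vs centered #1

print(f"uncentered #1 vs mean spectrum: {sim_first_vs_mean:.3f}")
print(f"uncentered #2 vs centered #1:   {sim_shifted:.3f}")
```

Both similarities come out close to 1 for this toy data: without centering,
the first component is essentially the mean spectrum, and component k + 1
lines up with centered component k.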

The scalar values I get with scikit-learn's PCA are different from what
Athena reports.  scikit-learn PCA returns both the eigenvalues (the
explained variance) and the explained variance ratio -- weights that will
add to 1. I generally find the latter to be more useful, but maybe I'm not
understanding what the eigenvalues can be used for.

Hope that helps. FWIW, I'm trying to learn all this stuff better too.
Perhaps you can give some insight?

--Matt

Matt Newville