Hi Joselaine,

On Tue, Apr 23, 2019 at 2:35 PM Joselaine Cáceres gonzalez <joselainecaceres@gmail.com> wrote:

Dear all,

I performed PCA on a set of 11 XANES spectra and I was trying to use some of the functions and tests developed by Malinowski to extract the number of primary components.

I am not familiar with the details of those tests. Do you have a reference for them?

I found differences between the eigenvalues calculated with a matrix calculator and those calculated by ATHENA, but they are equivalent in some way, since they explain the same amount of variance, and when I calculate the IND function or the Malinowski F-test I reach the same conclusions.
To calculate the eigenvalues with the matrix calculator, I first normalized my data matrix (step height normalization) with ATHENA, then exported the data to an Excel spreadsheet, where I centered them: z=(value-mean)/standard_dev.

I believe that Athena does not do this centering. I would imagine that dividing by the standard deviation could skew the data.
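
To make the distinction concrete, here is a minimal numpy sketch comparing mean-centering alone with the full z-score from the formula quoted above. It uses random synthetic data, not real spectra. It also assumes the mean and standard deviation are taken per energy point across the spectra, which the original message does not actually specify. The helper name variance_fractions is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for a data matrix: 11 "spectra" (rows) x 161 points
# (columns), with a spread that grows across the columns.
scale = np.linspace(0.1, 2.0, 161)
data = rng.normal(size=(11, 161)) * scale

# Mean-centering only: subtract the mean of each column.
centered = data - data.mean(axis=0)

# Full z-score: additionally divide each column by its standard deviation.
zscored = centered / data.std(axis=0, ddof=1)


def variance_fractions(matrix):
    """Fraction of total variance carried by each component (from the SVD)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return s**2 / np.sum(s**2)


frac_centered = variance_fractions(centered)
frac_zscored = variance_fractions(zscored)

print("centered only:", np.round(frac_centered[:3], 3))
print("z-scored     :", np.round(frac_zscored[:3], 3))
```

After z-scoring, every column has unit variance, so columns where the spectra barely differ get the same weight as columns where they differ a lot. That is one way the division could change which components look important.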

Then I used SVD and eigenvalue decomposition to find the eigenvalues. Those obtained with SVD explain exactly the same amount of variance as those found with ATHENA, but they are different.

I think I don't understand what you mean by "eigenvalues explain exactly the amount of variance ... but they are different". Can you clarify? Giving an actual example might help.

In all the calculations I did, I used the same number of points (161 in each spectrum) in ATHENA and in the matrix calculator, over exactly the same energy interval.

I'd like to know if I am missing some step in the data pre-treatment that is preventing me from finding the same eigenvalues obtained by ATHENA. I'd also like to know if I can directly use the values obtained with ATHENA to evaluate the functions proposed by Malinowski, since these equations might differ depending on whether the calculations are made using "covariance about the origin" or "correlation about the origin", and I am not sure which is the case.

I don't know the answer to any of those questions.

I can say that when I compare PCA with Athena and with Larch (using scikit-learn's PCA), I do get very similar looking eigenvectors for the first few eigen components. To be clear, scikit-learn's PCA does first subtract out the mean (but does not divide by the standard deviation), whereas Athena identifies this as the 1st component. So there is a potential "off by 1" counting issue, but that is easily worked out.
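
As a rough illustration of the off-by-1 point, here is a numpy-only sketch on synthetic, edge-like data (not Athena or Larch output). It assumes that a plain SVD of the uncentered matrix is a fair stand-in for a decomposition that reports the mean as its first component. I have not checked that this is what Athena does internally. The spectra, peak shapes, and the helper abs_cosine are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
energy = np.linspace(-20.0, 60.0, 161)  # 161 points per spectrum

# Synthetic ingredients: an edge-like step shared by all spectra, plus two peaks.
step = 0.5 + np.arctan(energy / 2.0) / np.pi
peak_a = np.exp(-0.5 * ((energy - 5.0) / 3.0) ** 2)
peak_b = np.exp(-0.5 * ((energy - 15.0) / 5.0) ** 2)

# 11 hypothetical spectra (rows): common step + random amounts of each peak + noise.
weights = rng.uniform(0.0, 1.0, size=(11, 2))
spectra = step + 0.3 * (weights @ np.vstack([peak_a, peak_b]))
spectra += rng.normal(scale=1e-3, size=spectra.shape)

# Uncentered decomposition: SVD of the raw matrix.
_, s_raw, vt_raw = np.linalg.svd(spectra, full_matrices=False)

# Centered decomposition: subtract the mean spectrum first.
mean_spectrum = spectra.mean(axis=0)
_, s_cen, vt_cen = np.linalg.svd(spectra - mean_spectrum, full_matrices=False)


def abs_cosine(a, b):
    """Absolute cosine similarity; the sign of a singular vector is arbitrary."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Uncentered component 1 versus the mean spectrum.
print("raw #1 vs mean       :", abs_cosine(vt_raw[0], mean_spectrum))
# Uncentered component 2 versus centered component 1 (shifted by one).
print("raw #2 vs centered #1:", abs_cosine(vt_raw[1], vt_cen[0]))
```

For data like this, the first uncentered component is nearly parallel to the mean spectrum. The later uncentered components line up with the centered ones shifted by one index, but only approximately, since the uncentered components are forced to be orthogonal to the mean-like first one.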

The scalar values I get with scikit-learn's PCA are different from what Athena reports. scikit-learn's PCA returns both the eigenvalues (the explained variance) and the explained variance ratio -- weights that add to 1. I generally find the latter more useful, but maybe I don't understand what the eigenvalues can be used for.
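
For reference, here is a minimal numpy sketch of how those two sets of numbers relate. It computes the quantities that scikit-learn exposes as explained_variance_ and explained_variance_ratio_ by two routes: from the singular values of the mean-centered matrix, and from the eigenvalues of the covariance matrix. The data are random placeholders with the same shape as the set discussed here (11 spectra, 161 points), not real spectra.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(11, 161))  # hypothetical stand-in: 11 spectra x 161 points

n_samples = data.shape[0]
centered = data - data.mean(axis=0)

# Route 1: singular values of the centered data matrix.
singular_values = np.linalg.svd(centered, compute_uv=False)
explained_variance = singular_values**2 / (n_samples - 1)

# Route 2: eigenvalues of the 161 x 161 covariance matrix (np.cov centers the
# data itself and divides by n_samples - 1). Sort descending, keep the largest 11.
covariance = np.cov(data, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(covariance))[::-1][:n_samples]

# Normalized weights that add to 1.
explained_variance_ratio = explained_variance / explained_variance.sum()

print(np.allclose(eigenvalues, explained_variance))
print(explained_variance_ratio.sum())
```

Note that the squared singular values and the covariance eigenvalues differ only by the constant factor n_samples - 1, so they give identical variance fractions even though the raw numbers differ. A constant factor like this is one possible reason two programs could report different eigenvalues that explain the same amount of variance, though I can't say whether that is what is happening here.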

Hope that helps. FWIW, I'm trying to learn all this stuff better too. Perhaps you can give some insight?

--Matt