# [Ifeffit] Question about PCA

Matt Newville newville at cars.uchicago.edu
Tue Apr 23 21:14:40 CDT 2019

```
Hi Joselaine,

On Tue, Apr 23, 2019 at 2:35 PM Joselaine Cáceres gonzalez <
joselainecaceres at gmail.com> wrote:

> Dear all,
>
> I performed PCA in a set of 11 XANES spectra and I was trying to use some
> of the functions and tests developed by Malinowski to extract the number of
> primary components.
>

I am not familiar with the details of those tests.  Do you have a reference
for them?

> I found differences between eigenvalues calculated with a matrix calculator
> and those calculated by ATHENA, but they are equivalent in some way since
> they explain the same amount of variance and when I calculate the IND
> function or Malinowski F-test I reach the same conclusions.
> To calculate the eigenvalues with the matrix calculator I first
> normalized my data matrix (step height normalization) with ATHENA, then I
> exported the data to an Excel spreadsheet where I centered them:
> z=(value-mean)/standard_dev.
>

I believe that Athena does not do this centering.   I would imagine that
dividing by the standard deviation could skew the data.

> Then I used SVD and Eigenvalue decomposition to find the eigenvalues and
> those obtained with SVD explain exactly the same amount of variance as
> those found with ATHENA but they are different.
>

I think I don't understand what you mean by "eigenvalues explain exactly
the amount of variance ... but they are different".  Can you clarify?
Giving an actual example might help.

> In all the calculations I did, I used the same number of points (161 points
> in each spectrum) in ATHENA and in the matrix calculator, and exactly the
> same energy interval.
>

> I'd like to know if I am missing some step in the data pre-treatment that
> is preventing me from finding the same eigenvalues obtained by ATHENA, but
> I'd also like to know if I can directly use the values obtained with
> ATHENA to evaluate the functions proposed by Malinowski, since these
> equations might be different depending on whether calculations are made
> using "covariance about the origin" or "correlation about the origin" and
> I am not sure which is the case.
>
>

I don't know the answer to any of those questions.

I can say that when I compare PCA with Athena and with Larch (using
scikit-learn's PCA), I do get very similar looking eigenvectors for the
first few eigen components. To be clear, scikit-learn's PCA does first
subtract out the mean (but does not divide by the standard deviation),
whereas Athena identifies this as the 1st component.   So there is a
potential  "off by 1" counting issue, but that is easily worked out.

The scalar values I get with scikit-learn's PCA are different from what
Athena reports.  scikit-learn PCA returns both the eigenvalues (the
explained variance) and the explained variance ratio -- weights that will
add to 1. I generally find the latter to be more useful, but maybe I don't
understand what the eigenvalues can be used for.

Hope that helps. FWIW, I'm trying to learn all this stuff better too.
Perhaps you can give some insight?

--Matt
```
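
The preprocessing differences discussed in this message can be explored with a small NumPy sketch. It is only an illustration under stated assumptions: the spectra are synthetic (two made-up "end-member" curves, 11 spectra of 161 points), the helper names `squared_singular_values` and `variance_ratio` are hypothetical, and nothing here reproduces what ATHENA computes internally. It compares three treatments of the same matrix: no centering, mean subtraction only (which is what scikit-learn's PCA does before its SVD), and full z-scoring as in `z=(value-mean)/standard_dev`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 11 normalized spectra of 161 points each (assumption:
# mixtures of two invented end-member curves plus a small amount of noise).
n_spectra, n_points = 11, 161
x = np.linspace(-1.0, 1.0, n_points)
end_a = 0.5 * (1.0 + np.tanh(8.0 * x))                  # edge-like step
end_b = end_a + 0.3 * np.exp(-(((x - 0.2) / 0.1) ** 2))  # step plus a peak
frac = np.linspace(0.0, 1.0, n_spectra)[:, None]
data = frac * end_a + (1.0 - frac) * end_b
data += 1e-3 * rng.standard_normal(data.shape)


def squared_singular_values(matrix):
    """Return squared singular values and right singular vectors (rows)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return s**2, vt


def variance_ratio(eigenvalues):
    """Normalize eigenvalues so that they sum to 1."""
    return eigenvalues / eigenvalues.sum()


# (a) No centering: the decomposition is taken about the origin.
eig_raw, vec_raw = squared_singular_values(data)

# (b) Subtract the mean spectrum only.
mean_spectrum = data.mean(axis=0)
centered = data - mean_spectrum
eig_cen, vec_cen = squared_singular_values(centered)

# (c) z-score every energy point (mean subtraction and division by std).
zscored = centered / data.std(axis=0, ddof=1)
eig_z, _ = squared_singular_values(zscored)

# Eigenvalues of the sample covariance matrix equal the squared singular
# values of the centered data divided by (n_spectra - 1).
cov_eig = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
cov_eig = np.sort(cov_eig)[::-1][:n_spectra]

# Without centering, the first component is close to the mean spectrum,
# which is the source of the "off by 1" counting issue.
unit_mean = mean_spectrum / np.linalg.norm(mean_spectrum)
mean_overlap = abs(vec_raw[0] @ unit_mean)

print("ratio, no centering :", np.round(variance_ratio(eig_raw)[:3], 4))
print("ratio, mean removed :", np.round(variance_ratio(eig_cen)[:3], 4))
print("ratio, z-scored     :", np.round(variance_ratio(eig_z)[:3], 4))
print("overlap of 1st uncentered component with mean:", round(mean_overlap, 4))
```

The raw eigenvalues differ between the three treatments, and a constant factor such as `1 / (n_spectra - 1)` changes them without changing the normalized ratios, which is one way two calculations can report different eigenvalues while explaining the same fraction of variance. In this synthetic case the z-scored ratios come out much flatter, because dividing by the standard deviation gives noise-only energy points the same weight as points that carry signal.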