[Ifeffit] Question about PCA

Joselaine Cáceres gonzalez joselainecaceres at gmail.com
Wed Apr 24 20:38:24 CDT 2019


Hi Matt, thank you for your answer! The references I have about
Malinowski's work and some applications are:

Malinowski, E.R., *Theory of error in factor analysis.* Analytical
Chemistry, 1977. *49*(4): p. 606-612.

Malinowski, E.R., *Theory of the distribution of error eigenvalues
resulting from principal component analysis with applications to
spectroscopic data.* Journal of Chemometrics, 1987. *1*(1): p. 33-40.

Malinowski, E.R., *Statistical F-tests for abstract factor analysis and
target testing.* Journal of Chemometrics, 1989. *3*(1): p. 49-60.

Malinowski, E.R., *Adaptation of the Vogt–Mizaikoff F-test to determine the
number of principal factors responsible for a data matrix and comparison
with other popular methods.* Journal of Chemometrics, 2004. *18*(9): p.
387-392.

McCue, M. and E.R. Malinowski, *Target Factor Analysis of the Ultraviolet
Spectra of Unresolved Liquid Chromatographic Fractions.* Applied
Spectroscopy, 1983. *37*(5): p. 463-469.

Beauchemin, S., D. Hesterberg, and M. Beauchemin, *Principal Component
Analysis Approach for Modeling Sulfur K-XANES Spectra of Humic Acids.*
Soil Science Society of America Journal, 2002. *66*: p. 83-91.

Wasserman, S.R., et al., *EXAFS and principal component analysis: a new
shell game.* Journal of Synchrotron Radiation, 1999. *6*: p. 284-286.

> I think I don't understand what you mean by "eigenvalues explain exactly
> the amount of variance ... but they are different".  Can you clarify?
> Giving an actual example might help.

Here are the results obtained with ATHENA. The fractional variance explained
by each eigenvalue is calculated by dividing the eigenvalue by the sum of
all of them, right?
#   Eigenvalue   Explained variance   Cumulative variance
1   8.864394     0.80585              0.805854
2   1.227578     0.11160              0.917452
3   0.708334     0.06439              0.981846
4   0.129478     0.01177              0.993617
5   0.045127     0.00410              0.997719
6   0.012229     0.00111              0.998831
7   0.009464     0.00086              0.999692
8   0.001489     0.00014              0.999827
9   0.000989     0.00009              0.999917
10  0.000617     0.00006              0.999973
11  0.000298     0.00003              1

Here are the results obtained with a matrix calculator for the same data:
Eigenvalue   Explained variance   Cumulative variance
1418.3057    0.80586              0.80586
196.4118     0.11160              0.917453
113.3331     0.06439              0.981846
20.7162      0.01177              0.993617
7.2205       0.00410              0.997719
1.9566       0.00111              0.998831
1.5144       0.00086              0.999692
0.2381       0.00014              0.999827
0.1583       0.00009              0.999917
0.0987       0.00006              0.999973
0.0477       0.00003              1.000000
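A quick arithmetic check on the two tables follows. This is a sketch: the eigenvalues are simply retyped from the tables above using decimal points. The fractions λ_i/Σλ agree for the two sets, and each matrix-calculator eigenvalue is about 160 times the corresponding ATHENA eigenvalue. 160 is N − 1 for N = 161 energy points. The ATHENA values also sum to about 11, which is the number of spectra. This pattern is consistent with ATHENA reporting the eigenvalues of Z^T Z divided by N − 1 (the correlation matrix of the standardized data). That reading, however, is an inference from these numbers and has not been confirmed from ATHENA's source.

```python
import numpy as np

# Eigenvalues retyped from the two tables above (decimal points instead of commas).
athena = np.array([8.864394, 1.227578, 0.708334, 0.129478, 0.045127,
                   0.012229, 0.009464, 0.001489, 0.000989, 0.000617, 0.000298])
calc = np.array([1418.3057, 196.4118, 113.3331, 20.7162, 7.2205,
                 1.9566, 1.5144, 0.2381, 0.1583, 0.0987, 0.0477])

# Fractional (explained) variance: each eigenvalue divided by the sum of all of them.
frac_athena = athena / athena.sum()
frac_calc = calc / calc.sum()
cum_athena = np.cumsum(frac_athena)  # cumulative variance, last entry is 1

# Eigenvalue-by-eigenvalue ratio between the two sets.
ratio = calc / athena

print("sum of ATHENA eigenvalues:", round(athena.sum(), 4))      # close to 11 spectra
print("sum of calculator eigenvalues:", round(calc.sum(), 2))    # close to 160 * 11
print("ratio per eigenvalue:", np.round(ratio, 2))
print("largest difference in fractions:", np.abs(frac_athena - frac_calc).max())
```

Every fraction is a ratio of eigenvalues, so any constant factor between the two sets cancels. That would explain why both give the same explained and cumulative variance columns.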

The eigenvalues are then used to evaluate the IND function and the F-test.
Depending on the eigenvalues, the IND function reaches a minimum when the
set of primary components is separated from the secondary ones, which only
account for experimental error. In the equations, lambda denotes the
eigenvalues, r the number of rows, c the number of columns, and n the number
of primary components:
[image: equations referred to above (image.png, scrubbed from the archive)]
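The equations in the attached image are not available. As a working assumption, here is the IND function as it is commonly written after Malinowski (1977): RE(n) = sqrt( sum_{j=n+1..c} lambda_j / (r (c - n)) ) and IND(n) = RE(n) / (c - n)^2. The scrubbed image may differ in detail. The sketch below uses r = 161 energy points and c = 11 spectra from this thread. It shows that multiplying every eigenvalue by a constant k multiplies every IND(n) by sqrt(k), so the minimum falls at the same n. A similar cancellation applies to test statistics built from ratios of eigenvalues.

```python
import numpy as np

def ind_function(eigvals, r):
    """Real error RE(n) and indicator function IND(n) for n = 1..c-1.

    Assumes the common Malinowski form; eigvals must be sorted in decreasing
    order, and r (number of rows) should be larger than c (number of columns).
    """
    lam = np.asarray(eigvals, dtype=float)
    c = lam.size
    n = np.arange(1, c)                              # candidate numbers of primary components
    tail = np.array([lam[k:].sum() for k in n])      # sum of secondary eigenvalues lambda_{n+1}..lambda_c
    re = np.sqrt(tail / (r * (c - n)))               # real error
    ind = re / (c - n) ** 2                          # indicator function
    return n, re, ind

# Eigenvalues retyped from the two tables above.
athena = [8.864394, 1.227578, 0.708334, 0.129478, 0.045127,
          0.012229, 0.009464, 0.001489, 0.000989, 0.000617, 0.000298]
calc = [1418.3057, 196.4118, 113.3331, 20.7162, 7.2205,
        1.9566, 1.5144, 0.2381, 0.1583, 0.0987, 0.0477]

r = 161  # energy points per spectrum
n, _, ind_a = ind_function(athena, r)
_, _, ind_c = ind_function(calc, r)

n_min_a = n[np.argmin(ind_a)]
n_min_c = n[np.argmin(ind_c)]
print("IND minimum (ATHENA eigenvalues):", n_min_a)
print("IND minimum (calculator eigenvalues):", n_min_c)
print("IND ratio per n:", np.round(ind_c / ind_a, 3))  # roughly sqrt(160) for every n
```

The absolute IND and RE values depend on how the eigenvalues are scaled. The position of the minimum does not depend on that scaling.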


The results obtained with the two sets of eigenvalues are different, but they
reach the minimum at the same n. The F-test also gives me similar levels of
significance for the two sets, but I do not understand why I am not able to
find the same eigenvalues that ATHENA does.
By the way, I tried your suggestion of not dividing the data by the
standard deviation, and I still could not find the same eigenvalues.
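Below is a minimal sketch of the pre-treatment described in this thread: column-wise z-scores as in the Excel step, followed by an SVD. It runs on synthetic, made-up spectra because the original data are not available. It shows that the squared singular values of the standardized matrix Z are the eigenvalues of Z^T Z, and that these sum to (N − 1)·c. Dividing them by N − 1 gives the eigenvalues of the correlation matrix, which sum to c, and the variance fractions are identical either way. Whether this is exactly what ATHENA computes is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 161, 11                       # energy points x spectra, as in this thread
energy = np.linspace(0.0, 1.0, N)

# Synthetic stand-in for normalized XANES spectra: mixtures of three made-up
# components plus noise. These are NOT the data discussed in the thread.
comps = np.stack([1.0 / (1.0 + np.exp(-(energy - 0.3) / 0.02)),
                  np.exp(-((energy - 0.35) / 0.03) ** 2),
                  np.exp(-((energy - 0.5) / 0.05) ** 2)], axis=1)   # shape (N, 3)
weights = rng.uniform(0.2, 1.0, size=(3, C))
X = comps @ weights + 0.002 * rng.standard_normal((N, C))          # shape (N, C)

# Column-wise standardization, z = (value - mean) / standard_dev, using the
# sample standard deviation (ddof=1), as Excel's STDEV does.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

s = np.linalg.svd(Z, compute_uv=False)       # singular values, in descending order
eig_sumsq = s ** 2                           # eigenvalues of Z.T @ Z
eig_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # correlation matrix, descending

print("sum of eigenvalues of Z.T @ Z:", eig_sumsq.sum(), "vs (N-1)*C =", (N - 1) * C)
print("sum of correlation eigenvalues:", eig_corr.sum(), "vs C =", C)
print("same up to a factor N-1:", np.allclose(eig_sumsq / (N - 1), eig_corr))
print("same variance fractions:", np.allclose(eig_sumsq / eig_sumsq.sum(),
                                              eig_corr / eig_corr.sum()))
```

For 161 points and 11 spectra, (N − 1)·c = 1760, which is the sum of the matrix-calculator eigenvalues in the table above.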
Best regards.
Joselaine


On Tue, Apr 23, 2019 at 23:15, Matt Newville (<
newville at cars.uchicago.edu>) wrote:

> Hi Joselaine,
>
> On Tue, Apr 23, 2019 at 2:35 PM Joselaine Cáceres gonzalez <
> joselainecaceres at gmail.com> wrote:
>
>> Dear all,
>>
>> I performed PCA on a set of 11 XANES spectra and I was trying to use some
>> of the functions and tests developed by Malinowski to extract the number of
>> primary components.
>>
>
> I am not familiar with the details of those tests.  Do you have a
> reference for them?
>
>> I found differences between the eigenvalues calculated with a matrix
>> calculator and those calculated by ATHENA, but they are equivalent in some
>> way, since they explain the same amount of variance, and when I calculate
>> the IND function or the Malinowski F-test I reach the same conclusions.
>> To calculate the eigenvalues with the matrix calculator, I first
>> normalized my data matrix (step-height normalization) with ATHENA, then I
>> exported the data to an Excel spreadsheet where I centered and scaled them:
>> z=(value-mean)/standard_dev.
>>
>
> I believe that Athena does not do this centering.   I would imagine that
> dividing by the standard deviation could skew the data.
>
>
>
>> Then I used SVD and eigenvalue decomposition to find the eigenvalues, and
>> those obtained with SVD explain exactly the same amount of variance as
>> those found with ATHENA, but they are different.
>>
>
> I think I don't understand what you mean by "eigenvalues explain exactly
> the amount of variance ... but they are different".  Can you clarify?
> Giving an actual example might help.
>
>> In all the calculations I did, I used the same number of points (161
>> points in each spectrum) in ATHENA and in the matrix calculator, over
>> exactly the same energy interval.
>>
>
>
>> I'd like to know if I am missing some step in the data pre-treatment that
>> is preventing me from finding the same eigenvalues obtained by ATHENA. I'd
>> also like to know whether I can directly use the values obtained with
>> ATHENA to evaluate the functions proposed by Malinowski, since these
>> equations might differ depending on whether the calculations are made using
>> "covariance about the origin" or "correlation about the origin", and I am
>> not sure which is the case.
>>
>
> I don't know the answer to any of those questions.
>
> I can say that when I compare PCA with Athena and with Larch (using
> scikit-learn's PCA), I do get very similar looking eigenvectors for the
> first few eigencomponents. To be clear, scikit-learn's PCA does first
> subtract out the mean (but does not divide by the standard deviation),
> whereas Athena identifies the mean as the 1st component. So there is a
> potential "off by 1" counting issue, but that is easily worked out.
>
> The scalar values I get with scikit-learn's PCA are different from what
> Athena reports. scikit-learn's PCA returns both the eigenvalues (the
> explained variance) and the explained variance ratio, i.e. weights that
> add up to 1. I generally find the latter to be more useful, but maybe I'm
> not understanding what the eigenvalues can be used for.
>
> Hope that helps. FWIW, I'm trying to learn all this stuff better too.
> Perhaps you can give some insight?
>
> --Matt
> _______________________________________________
> Ifeffit mailing list
> Ifeffit at millenia.cars.aps.anl.gov
> http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
> Unsubscribe: http://millenia.cars.aps.anl.gov/mailman/options/ifeffit
>
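A numpy-only sketch can illustrate the two points in the quoted reply about comparing Athena and scikit-learn. It runs on made-up spectra, not the data from this thread. (a) Without mean subtraction, the first SVD component is essentially the average spectrum, which is the source of the "off by 1" counting. (b) With the mean subtracted, the eigenvalues are s^2/(n_samples − 1) and the ratios sum to 1. This mirrors how scikit-learn documents `PCA.explained_variance_` and `PCA.explained_variance_ratio_`, but scikit-learn itself is not called. Whether Athena's first component is exactly the normalized mean is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_spectra, n_points = 11, 161
energy = np.linspace(0.0, 1.0, n_points)

# Made-up spectra (NOT the thread's data): a common edge plus a white-line
# peak whose height varies from spectrum to spectrum, plus a little noise.
edge = 1.0 / (1.0 + np.exp(-(energy - 0.3) / 0.02))
peak = np.exp(-((energy - 0.35) / 0.03) ** 2)
heights = rng.uniform(0.5, 1.5, n_spectra)
A = edge + heights[:, None] * peak + 0.002 * rng.standard_normal((n_spectra, n_points))
# A has shape (n_spectra, n_points): one spectrum per row, the layout scikit-learn expects.

# (a) No mean subtraction: compare the first right singular vector with the mean spectrum.
_, s_raw, vt_raw = np.linalg.svd(A, full_matrices=False)
mean_spec = A.mean(axis=0)
cos_first = abs(vt_raw[0] @ mean_spec) / np.linalg.norm(mean_spec)

# (b) Mean subtracted first: eigenvalues with an n_samples - 1 denominator.
Ac = A - mean_spec
_, s_c, _ = np.linalg.svd(Ac, full_matrices=False)
explained_variance = s_c ** 2 / (n_spectra - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

print("cos(first uncentered component, mean spectrum):", cos_first)
print("explained variance ratio:", np.round(explained_variance_ratio, 5))
print("last centered singular value:", s_c[-1])  # ~0: 11 centered spectra span at most 10 dimensions
```

After mean subtraction, 11 spectra can have at most 10 nonzero eigenvalues. This is the counterpart of Athena counting the mean as its first component.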
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 44705 bytes
Desc: not available
URL: <http://millenia.cars.aps.anl.gov/pipermail/ifeffit/attachments/20190424/6b37d574/attachment-0001.png>