Determining the correct number of components from PCA
Hi all,

I have a question about the "best" method for deciding the correct number of components in a system based on a PCA. A collaborator and I took in-situ XANES scans of the formation of ZnO in aqueous solution. We have a series of scans from this experiment, and I have done a PCA on them using SIXPack. After doing the PCA, I unchecked components until I could no longer reconstruct the series of scans accurately, as judged by eye. Using this visual method I only need 2 components. But if I go by the minimum of the IND function (which the SIXPack user guide recommends), I should use 3 components.

I have posted the sample/reconstruction plots for the 2- and 3-component reconstructions here (the scan I chose to reconstruct is representative of the data series):

http://dunx1.irt.drexel.edu/~kmm333/xafs/pca_recon_12.5_90_scan10_2comp.html
http://dunx1.irt.drexel.edu/~kmm333/xafs/pca_recon_12.5_90_scan10_3comp.html

The main difference between the two plots is that the 3-component reconstruction fits the edge region better. Zoom in (hold down the right mouse button) and you'll see what I'm talking about. That said, I really have my doubts about 3 components being necessary: the visual differences are so slight that I can't help but think they are just part of the noise. As for whether 2 or 3 components are physically possible, both are feasible given the chemistry involved.

SIXPack does not provide the IE function or an F-test, so I plotted the IND results vs. number of components for all the reactions. Except for the lowest concentration, the IND value reached a minimum at 3 components for all the reactions. You can see the plot of IND value vs. number of components here:

http://dunx1.irt.drexel.edu/~kmm333/xafs/pca_ind_results.html

The IND minimum is rather broad. Are there other techniques that I should be using to determine the number of components from the PCA?

I look forward to your feedback.
Kevin McPeak
kmm333@drexel.edu
--
PhD Candidate
Chemical Engineering
Drexel University

P.S. If you use Firefox or Safari you can zoom in on these plots by holding down the right mouse button and drawing a box. Release the right mouse button to zoom in on that box. Press the left arrow button to the left of the plot to un-zoom.
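For anyone who wants to compute IND, and the IE function that SIXPack does not report, outside SIXPack, the following is a minimal sketch using Malinowski's standard definitions: the real error RE(n) is the square root of the sum of the eigenvalues beyond the first n, divided by r(c - n); IE(n) = RE(n)·sqrt(n/c); IND(n) = RE(n)/(c - n)^2. It assumes the spectra are already on a common energy grid and stored as the columns of a NumPy array. The function name `malinowski_indicators` and the array layout are illustrative assumptions, not part of SIXPack, and the result may differ from SIXPack's if it preprocesses the data differently.

```python
import numpy as np


def malinowski_indicators(data):
    """Return (n, RE, IE, IND) for n = 1 .. c-1 trial components.

    data: 2-D array with one spectrum per column (assumed layout).
    The matrix is transposed if needed so that rows >= columns.
    """
    d = np.asarray(data, dtype=float)
    if d.shape[0] < d.shape[1]:
        d = d.T
    r, c = d.shape

    # Eigenvalues of d.T @ d are the squared singular values of d.
    eigenvalues = np.linalg.svd(d, compute_uv=False) ** 2

    n = np.arange(1, c)  # IND is undefined at n = c
    # Sum of the eigenvalues left out when keeping n components.
    residual = np.array([eigenvalues[k:].sum() for k in n])

    real_error = np.sqrt(residual / (r * (c - n)))
    imbedded_error = real_error * np.sqrt(n / c)
    indicator = real_error / (c - n) ** 2
    return n, real_error, imbedded_error, indicator


if __name__ == "__main__":
    # Synthetic stand-in for a scan series: 3 made-up components plus noise.
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 199.0, 200)
    shapes = np.stack(
        [np.exp(-0.5 * ((grid - c0) / 15.0) ** 2) for c0 in (60, 100, 140)],
        axis=1,
    )
    weights = rng.uniform(0.2, 1.0, size=(3, 12))
    series = shapes @ weights + 1e-3 * rng.standard_normal((200, 12))

    n, re, ie, ind = malinowski_indicators(series)
    print("IND minimum at n =", n[np.argmin(ind)])
```

Because IND divides a nearly flat RE by (c - n)^2 once only noise is left, its minimum can be broad when the last real component is only slightly above the noise, so it helps to look at IE and RE alongside it rather than at the IND minimum alone.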
Hi Kevin,

Maybe the 3rd component is not due to noise but accounts for very small shifts in energy calibration between scans (considering that the reconstructed white line is shifted to higher energies in the 2-component reconstruction). Do you have a reference foil measured with each scan to align the spectra exactly, or some other way to check for small shifts in energy calibration?

Best regards,
Andreas

-----Original Message-----
From: ifeffit-bounces@millenia.cars.aps.anl.gov [mailto:ifeffit-bounces@millenia.cars.aps.anl.gov] On Behalf Of Kevin McPeak
Sent: Thursday, 27 August 2009 07:23
To: ifeffit@millenia.cars.aps.anl.gov
Subject: [Ifeffit] Determining the correct number of components from PCA
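Andreas's suggestion can be illustrated with a toy simulation: a series that is an exact mixture of two spectra has only two non-zero singular values, but small scan-to-scan energy shifts add a further, derivative-like component. Everything in the sketch below is an assumption made for illustration: the arctangent-plus-Gaussian line shape, the edge positions near the Zn K-edge, the white-line heights, and the ±0.2 eV shift range. None of it is taken from the measured data.

```python
import numpy as np


def model_spectrum(energy, e0, white_line_height):
    """Toy XANES shape (assumed): arctangent edge plus a Gaussian white line."""
    edge = 0.5 + np.arctan((energy - e0) / 1.5) / np.pi
    peak = white_line_height * np.exp(-0.5 * ((energy - (e0 + 4.0)) / 2.0) ** 2)
    return edge + peak


def simulated_series(shifts):
    """Two-species mixture series; scan i is shifted in energy by shifts[i] (eV)."""
    energy = np.linspace(9640.0, 9700.0, 301)
    fractions = np.linspace(0.0, 1.0, len(shifts))
    columns = []
    for fraction, shift in zip(fractions, shifts):
        species_a = model_spectrum(energy - shift, 9660.0, 0.3)
        species_b = model_spectrum(energy - shift, 9662.0, 0.8)
        columns.append((1.0 - fraction) * species_a + fraction * species_b)
    return np.column_stack(columns)


rng = np.random.default_rng(1)
n_scans = 15
small_shift = rng.uniform(-0.2, 0.2, n_scans)  # hypothetical calibration drift

s_aligned = np.linalg.svd(simulated_series(np.zeros(n_scans)), compute_uv=False)
s_shifted = np.linalg.svd(simulated_series(small_shift), compute_uv=False)

print("aligned, first 4 singular values:", s_aligned[:4])
print("shifted, first 4 singular values:", s_shifted[:4])
```

In the aligned case the third singular value is zero to within rounding, while in the shifted case it is clearly non-zero even though only two species are present. Aligning the scans against a simultaneously measured reference before running the PCA is the way to rule this out in real data.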
participants (2)
- Kevin McPeak
- Voegelin, Andreas