Partial Least Squares

The Partial Least Squares methods simply says that there is a weighting matrix W that can be used to transform the spectra matrix X into the valence matrix V:

V = X \times W

and seeks to find the weighting matrix W using a least-squares approach.

In principle, :math;`W` needs j=n rows, but we know from PCA that we probably can by with many fewer weighting components than spectra.

That is, it finds W such that minimizes:

V - X \times W

The Partial Least Squares method finds a weighting matrix to satisfy this relation. It is essentially linear regression of the spectra to the value of valence. And we can determine j – how many weighting components are needed to explain the variance in our external variable.

With W, we can predict the value for valence of an unknown spectra. Without any other analysis.

This machine learning method can be a very powerful analysis approach.

We can use try many permutations to cross-validate the weights: use some spectra to find a set of weights (train the model) and then test how well this works with other spectra left out of the training set.

Warnings and Caveats

Like other machine learning methods, this is powerful but not without concerns

1. It has no idea that this is X-ray absorption data. No physics/chemistry at all.

2. The energy array has no meaning. The energy order does not matter, and the energy resolution does not matter.

3. Like all linear regression mehods, it will find a linear relationship between spectra and “external variable” (valence), even it one does not actually exist.

That is, things can go badly. But when it works, it can seem like magic…