LASSO

With Partial Least Squares, we get a weighting matrix, and can select the number j of important components.

But it still uses several components and hundreds of energy points to end up predicting a single external parameter (or maybe a couple): valence.

Note

We have way too many energy measurements in a XANES spectrum to determine valence.

We should be able to identify the n energies needed to determine valence, or maybe even fewer. If we’re trying to determine

[\mathrm{Fe}^{3+}]/[\mathrm{Fe}]

we might need only 2 or 3 energy values.
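As an illustration only, assume the ratio depends linearly on the absorbance \mu(E_k) measured at a few energies E_k (the symbols \mu, E_k and w_k are introduced here, not taken from the data). A model that needs just 3 energy values then has the form

```latex
[\mathrm{Fe}^{3+}]/[\mathrm{Fe}] \approx w_0 + w_1\,\mu(E_1) + w_2\,\mu(E_2) + w_3\,\mu(E_3)
```

so, of the weights over all measured energies, only 3 would be nonzero (plus the intercept w_0).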

How can we identify which energy channels are most needed to explain the full variation in valence V (or any other quantitative external variable)?

The LASSO (least absolute shrinkage and selection operator) method provides a robust way to shrink the number of dimensions even further. It does this with a regularization parameter \alpha that changes the least-squares minimization from minimizing

\| V - X \times W \|_2^2

to minimizing

\| V - X \times W \|_2^2 + \alpha \|W\|_1

That is, it adds the sum of the absolute values of the weights, \|W\|_1 = \sum_k |W_k| (the “L1 norm”), as a penalty on top of the misfit. This penalty pushes the weights of unhelpful energies to exactly zero.
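A minimal NumPy sketch of this objective, solved by cyclic coordinate descent (one common LASSO solver; not necessarily what the course uses). It assumes scikit-learn's scaling, with a factor 1/(2n) in front of the squared misfit, which only rescales \alpha. The names `soft_threshold`, `lasso_objective` and `lasso_cd` are hypothetical. X holds one spectrum per row and V the valences, both assumed already centered so no intercept is fitted, and the example data are random numbers, not XANES spectra.

```python
import numpy as np


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes exactly 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_objective(X, V, W, alpha):
    """(1/2n) * ||V - X W||^2 + alpha * ||W||_1  (scikit-learn-style scaling, assumed)."""
    n = X.shape[0]
    residual = V - X @ W
    return residual @ residual / (2 * n) + alpha * np.abs(W).sum()


def lasso_cd(X, V, alpha, n_iter=500, tol=1e-8):
    """Minimise lasso_objective by cyclic coordinate descent, starting from W = 0.

    X: (n_spectra, n_energies), V: (n_spectra,). Assumes X's columns and V are
    already centered, so no intercept is fitted.
    """
    n, p = X.shape
    W = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * ||X_j||^2 for each energy j
    residual = np.array(V, dtype=float)        # V - X W, with W = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in np.flatnonzero(col_sq):        # skip all-zero energy channels
            # correlation of energy j with the residual, adding back W[j]'s own part
            rho = X[:, j] @ residual / n + col_sq[j] * W[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * (new_wj - W[j])
            max_change = max(max_change, abs(new_wj - W[j]))
            W[j] = new_wj
        if max_change < tol:                     # no weight moved appreciably: converged
            break
    return W


# Synthetic example: 30 "spectra" x 100 "energies", valence driven by 3 energies.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
true_W = np.zeros(100)
true_W[[20, 50, 80]] = [1.5, -2.0, 1.0]
V = X @ true_W + 0.01 * rng.normal(size=30)
X -= X.mean(axis=0)
V -= V.mean()

W = lasso_cd(X, V, alpha=0.05)
print("energies with nonzero weight:", np.flatnonzero(W))
```

The soft-threshold step is what produces exact zeros. For an orthogonal design with X^T X / n = I, the loop reduces to soft-thresholding each X_j^T V / n once.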

With \alpha = 0 the penalty vanishes, and LASSO can select as many as n (the number of spectra) energies to explain all of the variance in valence.

As \alpha increases, LASSO selects fewer energies as being important.

In this way, LASSO can identify around 10 energy points that are enough to determine valence.
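To see the number of selected energies shrink as \alpha grows, the hypothetical helper `selected_energies` below (again a NumPy coordinate-descent sketch on synthetic data, not the course's code) records which energies keep a nonzero weight over a grid of \alpha values. It relies on a standard property of the centered, 1/(2n)-scaled objective: every weight is zero once \alpha \ge \alpha_{max} = \max_j |X_j^T V| / n.

```python
import numpy as np


def lasso_cd(X, V, alpha, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n)||V - X W||^2 + alpha*||W||_1, assuming centered X and V."""
    n, p = X.shape
    W = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = np.array(V, dtype=float)
    for _ in range(n_iter):
        max_change = 0.0
        for j in np.flatnonzero(col_sq):
            rho = X[:, j] @ residual / n + col_sq[j] * W[j]
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            residual -= X[:, j] * (new_wj - W[j])
            max_change = max(max_change, abs(new_wj - W[j]))
            W[j] = new_wj
        if max_change < tol:
            break
    return W


def selected_energies(X, V, alphas):
    """For each alpha, return the indices of energies that keep a nonzero weight."""
    return [np.flatnonzero(lasso_cd(X, V, a)) for a in alphas]


# Synthetic, centered data: 30 "spectra" x 100 "energies"; valence uses energies 40 and 70.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
X -= X.mean(axis=0)
V = 2.0 * X[:, 40] - 1.0 * X[:, 70] + 0.05 * rng.normal(size=30)
V -= V.mean()

alpha_max = np.max(np.abs(X.T @ V)) / X.shape[0]   # at or above this, all weights are 0
alphas = alpha_max * np.array([1.01, 0.5, 0.2, 0.05])
for a, idx in zip(alphas, selected_energies(X, V, alphas)):
    print(f"alpha = {a:.4f}: {len(idx)} energies selected")
```

Scanning \alpha downward from \alpha_{max} is also how LASSO paths are usually computed, since each solution is a good starting point for the next.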

We can apply the same cross-training schemes and prediction methods as for Partial Least Squares.
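The scheme itself is not spelled out here. As one assumed example, the sketch below picks \alpha by k-fold cross-validation: each fold of spectra is held out in turn, the model is fitted on the rest, and the squared prediction error on the held-out valences is averaged. It then refits on all spectra and predicts valence. `fit_lasso` and `cv_select_alpha` are hypothetical names, and the data are synthetic.

```python
import numpy as np


def lasso_cd(X, V, alpha, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n)||V - X W||^2 + alpha*||W||_1 on centered X, V."""
    n, p = X.shape
    W = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = np.array(V, dtype=float)
    for _ in range(n_iter):
        max_change = 0.0
        for j in np.flatnonzero(col_sq):
            rho = X[:, j] @ residual / n + col_sq[j] * W[j]
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            residual -= X[:, j] * (new_wj - W[j])
            max_change = max(max_change, abs(new_wj - W[j]))
            W[j] = new_wj
        if max_change < tol:
            break
    return W


def fit_lasso(X, V, alpha):
    """Fit weights plus an intercept by centering the training spectra and valences."""
    x_mean, v_mean = X.mean(axis=0), V.mean()
    W = lasso_cd(X - x_mean, V - v_mean, alpha)
    return W, v_mean - x_mean @ W


def cv_select_alpha(X, V, alphas, n_folds=5, seed=0):
    """k-fold CV: return the alpha with lowest mean held-out squared error, and all errors."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(V)), n_folds)
    errors = []
    for alpha in alphas:
        fold_mse = []
        for test_idx in folds:
            train = np.ones(len(V), dtype=bool)
            train[test_idx] = False                  # hold out this fold of spectra
            W, b = fit_lasso(X[train], V[train], alpha)
            pred = X[test_idx] @ W + b
            fold_mse.append(np.mean((V[test_idx] - pred) ** 2))
        errors.append(np.mean(fold_mse))
    return alphas[int(np.argmin(errors))], np.array(errors)


rng = np.random.default_rng(2)
X = rng.normal(size=(80, 50))                        # synthetic stand-in for 80 spectra
V = 2.5 + 0.3 * X[:, 10] - 0.2 * X[:, 35] + 0.02 * rng.normal(size=80)

alphas = np.array([0.1, 0.03, 0.01, 0.003])
best_alpha, cv_errors = cv_select_alpha(X, V, alphas)
W, b = fit_lasso(X, V, best_alpha)                   # refit on all spectra
print("best alpha:", best_alpha, "selected energies:", np.flatnonzero(W))
print("predicted valence of spectrum 0:", X[0] @ W + b)
```

Centering inside `fit_lasso` uses only the training fold's means, so no information from held-out spectra leaks into the fit.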