Athena: the R factor for normalized mu(E)
Hello all,
I have been a frequent user of Athena for many years, mostly for interpreting P K-edge XANES spectra. Until last week I thought that the R factor in Athena was always defined as:
sum( [data_i - fit_i]^2 )
-------------------------------
sum( data_i^2 )
This is also the definition given in the online manual, and it has been stated by me and by other colleagues in a number of papers dealing with P K-edge XANES. But well, this is not true when dealing with normalized XANES spectra! I realized this when I played around with a number of my old LC fits in Excel. While the chi-square value (or maybe more precisely, the sum of squared residuals) was reproduced perfectly, I always got "R factors" (i.e. with the above definition) between 2 and 3 times lower than what Athena gives. After that I consulted the Demeter programming documentation (https://bruceravel.github.io/demeter/pods/Demeter/LCF.pm.html) to find that, for normalized mu(E), "Demeter thus scales the R-factor to make it somewhat closer to 10^-2". However, the equation stated on this page actually reproduces the R factor even more poorly, and therefore I won't reiterate it here. After inspecting the Perl code, and trying out different alternatives in Excel, I now believe that the following equation provides a more accurate definition of the R factor (correct me if I'm wrong!):
sum( [data_i - fit_i]^2 )
-------------------------------
sum( [data_i - avg data]^2 )
where "avg data" is the arithmetic mean of the data in the LC fitting range. It would be great if others could confirm this. As far as I understand, this won't affect the interpretations that any of us have made over the years, it only affects the understanding of what the R factor actually is...
Kind regards,
Jon Petter

Jon Petter Gustafsson, Professor in Soil Chemistry
Department of Soil and Environment
Swedish University of Agricultural Sciences (SLU)
Box 7014, 750 07 Uppsala, Sweden
Phone: 018-671284; e-mail: jon-petter.gustafsson@slu.se
Hi Jon Petter,
On Tue, Jul 12, 2022 at 12:46 AM Jon-Petter Gustafsson <jon-petter.gustafsson@slu.se> wrote:
Hello all,
I have been a frequent user of Athena for many years, mostly for interpreting P K-edge XANES spectra. Until last week I thought that the R factor in Athena was always defined as:
sum( [data_i - fit_i]^2 )
-------------------------------
sum( data_i^2 )
This is also the definition given in the online manual, and it has been stated by me and by other colleagues in a number of papers dealing with P K-edge XANES. But well, this is not true when dealing with normalized XANES spectra! I realized this when I played around with a number of my old LC fits in Excel. While the chi-square value (or maybe more precisely, the sum of squared residuals) was reproduced perfectly, I always got “R factors” (i.e. with the above definition) between 2 and 3 times lower than what Athena gives. After that I consulted the Demeter programming documentation ( https://bruceravel.github.io/demeter/pods/Demeter/LCF.pm.html) to find that, for normalized mu(E), “Demeter thus scales the R-factor to make it somewhat closer to 10^-2”. However, the equation stated on this page actually reproduces the R factor even more poorly, and therefore I won’t reiterate it here. After inspecting the Perl code, and trying out different alternatives in Excel, I now believe that the following equation provides a more accurate definition of the R factor (correct me if I’m wrong!):
sum( [data_i - fit_i]^2 )
-------------------------------
sum( [data_i - avg data]^2 )
where “avg data” is the arithmetic mean of the data in the LC fitting range. It would be great if others could confirm this. As far as I understand, this won’t affect the interpretations that any of us have made over the years, it only affects the understanding of what the R factor actually is…
Thanks, and yes, that does appear to be exactly what the Demeter code is doing. I never noticed that, or I guess it has honestly been a very long time since I used Athena for linear combination fitting.
I'm not 100% sure why it would do that when fitting normalized mu(E), but not otherwise. I agree that it will not alter the actual interpretation of whether one fit is better than another. It might be that some sort of "remove the most obvious data trend" (often called "whitening") is a fine thing to do.
FWIW, linear combination fitting in Larch reports an R-factor that does not subtract the average of the data in the denominator. Maybe it should? OTOH, one of the appealing features of the R factor is that it is meant to be really easy to understand and reproduce.
Cheers,
--Matt
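For anyone who wants to check the two denominators discussed in this thread outside of Excel, here is a minimal sketch in plain Python. It is not taken from the Demeter or Larch sources: the function names `r_factor_plain` and `r_factor_centered` are made up for illustration, and the `data` and `fit` values are hypothetical numbers, not a real spectrum. Both lists are assumed to cover only the LC fitting range.

```python
def sum_sq_residuals(data, fit):
    """Sum of squared residuals over the fitting range."""
    return sum((d - f) ** 2 for d, f in zip(data, fit))


def r_factor_plain(data, fit):
    """R factor with sum(data_i^2) in the denominator."""
    return sum_sq_residuals(data, fit) / sum(d ** 2 for d in data)


def r_factor_centered(data, fit):
    """R factor with sum([data_i - avg data]^2) in the denominator,
    where avg data is the arithmetic mean over the fitting range."""
    avg = sum(data) / len(data)
    return sum_sq_residuals(data, fit) / sum((d - avg) ** 2 for d in data)


# Hypothetical values for illustration only (not measured data).
data = [0.10, 0.60, 1.40, 1.10, 0.95, 1.00]
fit = [0.12, 0.57, 1.36, 1.13, 0.97, 0.99]

r_plain = r_factor_plain(data, fit)
r_centered = r_factor_centered(data, fit)

print("plain R factor:   ", r_plain)
print("centered R factor:", r_centered)
print("ratio:            ", r_centered / r_plain)
```

Because sum(data_i^2) = sum([data_i - avg data]^2) + N * (avg data)^2, the mean-subtracted form can never be smaller than the plain one, and the ratio between them equals 1 + N * (avg data)^2 / sum([data_i - avg data]^2). The size of that ratio therefore depends only on the data in the fitting range, not on the fit, which is consistent with the point that the ranking of fits to the same data is unchanged.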