Re: [Ifeffit] Breaking down correlationships between parameters
Hi Matt,
Thanks a lot for your prompt reply. The method I am referring to is not the multiple-k-weight fit with N*S02 constrained. My apologies for not being clear enough; let me try again. I am actually referring to an approach that takes advantage of the different k-dependence of various parameters to break down the correlations between them. For example, S02 and sigma2: S02 is k-independent, while sigma2 enters with a k^2 dependence.
In this case, to break down the correlation between S02 and sigma2, one can assume a series of S02 values and perform fits using a single k-weight each time (say k-weight 1, 2, and 3), recording the corresponding sigma2 values. For k-weight = 1, a series of preset S02 values will yield a series of refined sigma2 values, which can be plotted as a straight line in a sigma2 vs. S02 plot. Similar straight lines can be obtained for fits using k-weight = 2 and 3. These three lines may intersect at or near some point, which will determine the "true" values of the parameters independent of k-weight. One can then constrain S02 to the value at the intersection of the three lines and vary sigma2 in a fit. In this particular case the advantage is that S02 does not depend on changes inside the sample, and we have a very good estimate of its range (say 0.7 - 1.0).
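To make the procedure concrete, here is a small numerical sketch. This is a toy single-path amplitude model written for illustration, not Artemis/IFEFFIT code, and every value in it (k-range, S02, sigma2) is invented:

```python
# Toy sketch of the line-crossing idea: for one path, the EXAFS amplitude
# goes roughly like S02 * exp(-2*sigma2*k^2). For each preset S02 and each
# k-weight we refine sigma2, fit a straight line to sigma2 vs. S02, and
# look at where the lines for different k-weights cross.
import numpy as np

k = np.linspace(3.0, 12.0, 200)          # k grid (1/Angstrom)
S02_TRUE, SIG2_TRUE = 0.85, 0.006        # "unknown" truth used to fake data
data = S02_TRUE * np.exp(-2.0 * SIG2_TRUE * k**2)

def best_sigma2(s02, kweight):
    """Grid-search the sigma2 minimizing the k-weighted residual for fixed S02."""
    sig2 = np.linspace(0.0, 0.02, 2001)
    model = s02 * np.exp(-2.0 * sig2[:, None] * k[None, :]**2)
    resid = (((data - model) * k**kweight) ** 2).sum(axis=1)
    return sig2[np.argmin(resid)]

s02_grid = np.linspace(0.7, 1.0, 7)
lines = {}                               # k-weight -> (slope, intercept)
for kw in (1, 2, 3):
    sig2_vals = [best_sigma2(s, kw) for s in s02_grid]
    lines[kw] = np.polyfit(s02_grid, sig2_vals, 1)

# Intersection of the k-weight 1 and k-weight 3 lines:
(m1, b1), (m3, b3) = lines[1], lines[3]
s02_cross = (b3 - b1) / (m1 - m3)
sig2_cross = m1 * s02_cross + b1
print(s02_cross, sig2_cross)             # lands near the true 0.85, 0.006
```

With noiseless data the lines cross essentially at the true values, which is the best case for the method; with real data each refined sigma2 carries an uncertainty and the "lines" have finite width.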
Now suppose that instead of S02 (which I now set to a reasonable value), I am interested in determining N, but it is highly correlated with sigma2. Each time the disorder in the sample increases, sigma2 increases and, due to the high correlation, N is also overestimated. On the other hand, when the disorder in the sample decreases, sigma2 decreases and I can get a "true" estimate of N. Can I still apply the above-mentioned approach to break the correlation between N and sigma2 and get a "true" estimate of N, even if the disorder in my samples is high? Or is it simply not possible, because both N and sigma2 vary with changes inside the sample?
Best regards,
Jatin
________________________________________
From: ifeffit-bounces@millenia.cars.aps.anl.gov [ifeffit-bounces@millenia.cars.aps.anl.gov] on behalf of ifeffit-request@millenia.cars.aps.anl.gov [ifeffit-request@millenia.cars.aps.anl.gov]
Sent: Saturday, March 21, 2015 3:14 PM
To: ifeffit@millenia.cars.aps.anl.gov
Subject: Ifeffit Digest, Vol 145, Issue 38
Today's Topics:
1. Re: amplitude parameter S02 larger than 1 (Scott Calvin)
2. Breaking down correlationships between parameters
(Rana, Jatinkumar Kantilal)
3. Re: Breaking down correlationships between parameters
(Matt Newville)
----------------------------------------------------------------------
Message: 1
Date: Fri, 20 Mar 2015 18:53:06 -0400
From: Scott Calvin
On Mar 20, 2015, at 4:30 PM, huyanyun@physics.utoronto.ca wrote:
Hi Scott,
In all situations, 31.2 independent data points and 24 variables were used. In the case of setting S02 to a value, 23 variables were used.
Let me know if there is any other info needed.
Best, Yanyun
Quoting Scott Calvin:
Hi Yanyun,
To actually do a Hamilton test, the one other thing I need to know is the number of degrees of freedom in the fit. If you provide that, I'll walk you through how to do a Hamilton test--it's not that bad, with the aid of an online calculator, and I think it might be instructive for some of the other people reading this list who are trying to learn EXAFS.
--Scott Calvin Sarah Lawrence College
On Mar 20, 2015, at 3:46 PM, huyanyun@physics.utoronto.ca wrote:
Hi Scott,
Thank you so much for giving me your thought again. It is very helpful to know how you and other XAFS experts deal with unusual situations.
The floating S02 fits to 1.45+/-0.14; this just means the fit doesn't like the idea of an S02 in a typical range. Instead of setting S02 to 0.9, I have to figure out why this happens and what it might indicate.
I guess a Hamilton test is done by adjusting one parameter (i.e., S02) while keeping the other conditions and the model the same. Is that right? I record the test as follows:
1) Floating S02: S02 fits to 1.45+/-0.14, R=0.0055, reduced chi^2=17.86, percentage=0.53+/-0.04
2) Set S02=0.7: R=0.044, reduced chi^2=120.6, percentage=0.81+/-0.2
3) Set S02=0.8: R=0.030, reduced chi^2=86.10, percentage=0.77+/-0.07
4) Set S02=0.9: R=0.021, reduced chi^2=60.16, percentage=0.72+/-0.06
5) Set S02=1.0: R=0.017, reduced chi^2=49.5, percentage=0.67+/-0.05
6) Set S02=1.1: R=0.012, reduced chi^2=35.1, percentage=0.62+/-0.03
7) Set S02=1.2: R=0.009, reduced chi^2=24.9, percentage=0.59+/-0.02
8) Set S02=1.3: R=0.007, reduced chi^2=18.9, percentage=0.57+/-0.02
9) Set S02=1.4: R=0.0057, reduced chi^2=16.1, percentage=0.55+/-0.02
10) Floating S02 (1.45+/-0.14) falls here, as in 1)
11) Set S02=1.6: R=0.006, reduced chi^2=17.8, percentage=0.53+/-0.02
12) Set S02=2.0: R=0.044, reduced chi^2=120.7, percentage=0.37+/-0.06
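As an aside for readers following along: once the degrees of freedom are known, this kind of comparison can be run without an online calculator. Below is a sketch in Python using the numbers quoted in this thread (31.2 independent points; 24 variables with S02 floated, 23 with S02 fixed at 0.9). Framing the Hamilton comparison as an extra-parameter F-test on chi-square is a common approximation, not necessarily the exact recipe Scott has in mind, so treat it as illustrative:

```python
# Sketch: compare the floated-S02 fit to the S02=0.9 fit with an F-test
# (in the spirit of the Hamilton test). Fit statistics are taken from the
# list above; the test itself is a standard extra-parameter F-test.
from scipy.stats import f as f_dist

n_idp = 31.2                          # independent points in the fit
nu_free = n_idp - 24                  # dof with S02 floated
nu_fixed = n_idp - 23                 # dof with S02 fixed
chi2_free = 17.86 * nu_free           # reduced chi^2 -> chi^2, S02 floated
chi2_fixed = 60.16 * nu_fixed         # S02 fixed at 0.9

b = 1                                 # one extra floated parameter (S02)
F = ((chi2_fixed - chi2_free) / b) / (chi2_free / nu_free)
p = f_dist.sf(F, b, nu_free)          # chance the improvement is accidental
print(F, p)
```

A small p (say, below 0.05) would mean floating S02 improves the fit significantly, which per Scott's later comments is the case where the high S02 needs to be tracked down rather than overridden.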
Therefore, I would say S02 falling in the range 1.2-1.6 gives a statistically improved fit, but S02=0.9 is not terrible either. I agree with you that I could always be confident saying the percentage is 0.64+/-0.15, but I do want to shrink the uncertainty and think about other possibilities that could cause a large S02.
I did double-check the data-reduction and normalization process. I don't think I can improve anything in this step. By the way, I have a series of similar samples, and their fits all show a floating S02 larger than one based on the same two-site model.
Best, Yanyun
Quoting Scott Calvin:
Hi Yanyun,
Lots of comments coming in now, so I'm editing this as I write it!
One possibility for why you're getting a high best-fit S02 is that the fit doesn't care all that much about the value of S02; i.e., there is a broad range of S02's compatible with describing the fit as "good." That should be reflected in the uncertainty that Artemis reports. If S02 is 1.50 +/- 0.48, for example, the fit isn't all that "sure" what S02 should be. That would mean we could just shrug our shoulders and move on, except that it correlates with a parameter you are interested in (in this case, site occupancy). So in such a case, I think you can cautiously fall back on what might be called a "Bayesian prior"; i.e., the belief that S02 should be "around" 0.9, and set S02 to 0.9. (Or perhaps restrain S02 to 0.9; then you're really doing something a bit more like the notion of a Bayesian prior.)
On the other hand, if the S02 is, say, 1.50 +/- 0.07, then the fit really doesn't like the idea of an S02 in the typical range. An S02 that high, with that small an uncertainty, suggests to me that something is wrong--although it could be as simple as a normalization issue during data reduction. In that case, I'd be more skeptical of just setting S02 to 0.90 and going with that result; the fit is trying to tell you something, and it's important to track down what that something is.
Of course, once in a while a fit will find a local minimum while there's another good local minimum around a more realistic value. That would be reflected by a fit that gave similarly good quantitative measures of fit quality (e.g. R-factors) when S02 is floated (yielding 1.50 +/- 0.07) as when it's forced to 0.90. That's somewhat unusual, however, particularly with a global parameter like S02.
A good way to defend setting S02 to 0.90 is to use the Hamilton test to see if floating S02 yields a statistically significant improvement over forcing it to 0.90. If not, using your prior best estimate for S02 is reasonable.
If you did that, though, I'd think it would be good to mention what happened in any eventual publication or presentation; it might provide an important clue to someone who follows up on this or a similar system. It would also be good to increase your reported uncertainty for site occupancy (and indicate in the text what you've done). I now see that your site occupancies are 0.53 +/- 0.04 for the floated S02, and 0.72 +/- 0.06 for S02 = 0.90. That's not so bad, really. It means that you're pretty confident that the site occupancy is 0.64 +/- 0.15, which isn't an absurdly large uncertainty as these things go.
To be concrete, if the Hamilton test does not show a statistically significant improvement from floating S02, then I might write something like this in any eventual paper: "The site occupancy was highly correlated with S02 in our fits, making it difficult to determine the site occupancy with high precision. If S02 is constrained to 0.90, a plausible value for element [X] [ref], then the site occupancy is 0.53 +/- 0.04. If constrained to 1.0, the site occupancy is [whatever it comes out to be]. To reflect the increased uncertainty associated with the unknown value for S02, we are adopting a value of 0.53 +/- [enough uncertainty to cover the results found for S02 = 1.0]."
Of course, if you do that, I'd also suggest tracking down as many other possibilities for why your fit is showing high values of S02 as you can; e.g., double-check your normalization during data reduction.
If, on the other hand, the Hamilton test does show the floated S02 is yielding a statistically significant improvement, I think you have a bigger issue. Looking at, e.g., whether you may have constrained coordination numbers incorrectly becomes more critical.
--Scott Calvin Sarah Lawrence College
_______________________________________________ Ifeffit mailing list Ifeffit@millenia.cars.aps.anl.gov http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
------------------------------
Message: 2
Date: Sat, 21 Mar 2015 13:06:39 +0000
From: "Rana, Jatinkumar Kantilal"
Hi Jatin,

On Sat, Mar 21, 2015 at 10:41 AM, Rana, Jatinkumar Kantilal <jatinkumar.rana@helmholtz-berlin.de> wrote:
Hi Matt,
Thanks a lot for your prompt reply. The method I am referring to is not the multiple-k-weight fit with N*S02 constrained. My apologies for not being clear enough; let me try again. I am actually referring to an approach that takes advantage of the different k-dependence of various parameters to break down the correlations between them. For example, S02 and sigma2: S02 is k-independent, while sigma2 enters with a k^2 dependence.
Yes, I am familiar with this approach, and I understand that this is what you are using. What I am saying is that it does not work nearly as well as (sometimes) claimed, and is sort of cheating: it ignores the measures of statistical significance.

In this case, to break down the correlation between S02 and sigma2,

The correlation between N*S02 and sigma2 is inherent to the finite k-range of the EXAFS signal. It cannot be "broken", though it might be reduced.
one can assume a series of S02 values and perform fits using a single k-weight each time (say k-weight 1,2 and 3) and record corresponding sigma2 values.
Let us say for k-weight = 1, a series of preset S02 values will result in a series of corresponding sigma2 values refined in fits, which can be plotted as a straight line in a sigma2 vs. S02 plot.
OK, one can fit sigma2 with a series of preset values of N*S02. That's fine. But it does NOT lead to an infinitely thin line of sigma2 vs. N*S02: each sigma2 value on that line has a width, corresponding to its uncertainty. In fact, the line you produce nicely demonstrates and measures the correlation of N*S02 and sigma2, as the slope of this line.
Similar straight lines can be obtained for fits using k-weight = 2 and then 3.
Now, these three lines may intersect at or near some point, which will determine the "true" value of parameters independent of k-weight.
The different lines (each with finite thickness) will give a *range of values* for N*S02 and sigma2, not a single value.

The biggest problem with this approach is that it ignores the relative goodness-of-fits (let's just assume that is 'chi-square' for the purpose of this discussion) for the fits along these lines. Some fits are better than others, and this approach completely ignores that fact; equally importantly, it ignores the fact that there is a range of values for chi-square that are consistent with "good". If you include these values, your linear plot will become contours of chi-square as a function of N*S02 and sigma2. And, yes, by using different k-weights and k-ranges and so on you can get overlapping contour plots, which may reduce the correlation a small amount when looked at as an ensemble. You can then find a best set of values for N*S02 and sigma2, but *each* of these will have an uncertainty. So, you can use this approach to find a good value for N*S02, but it is not breaking the correlation.

You can do this by hand. Or you can just do a fit with datasets with different k-weights and k-ranges. When you do this as a fit, you will see that the correlation is still fairly large.

Also, just to be clear, this is absolutely not a "true" value. It is a measured value. Not at all the same thing.

One can then constrain S02 to a value obtained from the point of intersection of three lines and vary sigma2 in a fit.
Well, one can certainly set N*S02 to some value and fit sigma2. As I said earlier, this ignores the correlation of N*S02 and sigma2, but does not remove that correlation.
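The contour picture is easy to reproduce numerically. Below is a toy single-path amplitude model invented for illustration (amp stands in for N*S02; nothing here is real data or IFEFFIT code); the parameter correlation is read off the inverse of the chi-square curvature (Hessian) at the minimum:

```python
# Toy numerical version of the contour argument: map chi-square over
# (amp, sigma2) and read the parameter correlation off the inverse of
# the curvature (Hessian) at the minimum. All numbers are invented.
import numpy as np

k = np.linspace(3.0, 12.0, 200)
AMP_TRUE, SIG2_TRUE = 0.85, 0.006
data = AMP_TRUE * np.exp(-2.0 * SIG2_TRUE * k**2)

def chi2(amp, sig2, kweight=2):
    model = amp * np.exp(-2.0 * sig2 * k**2)
    return np.sum(((data - model) * k**kweight) ** 2)

# Finite-difference Hessian of chi2 at the (known) minimum:
ha, hs = 1e-4, 1e-6
c0 = chi2(AMP_TRUE, SIG2_TRUE)
H = np.empty((2, 2))
H[0, 0] = (chi2(AMP_TRUE + ha, SIG2_TRUE) - 2 * c0
           + chi2(AMP_TRUE - ha, SIG2_TRUE)) / ha**2
H[1, 1] = (chi2(AMP_TRUE, SIG2_TRUE + hs) - 2 * c0
           + chi2(AMP_TRUE, SIG2_TRUE - hs)) / hs**2
H[0, 1] = H[1, 0] = (chi2(AMP_TRUE + ha, SIG2_TRUE + hs)
                     - chi2(AMP_TRUE + ha, SIG2_TRUE - hs)
                     - chi2(AMP_TRUE - ha, SIG2_TRUE + hs)
                     + chi2(AMP_TRUE - ha, SIG2_TRUE - hs)) / (4 * ha * hs)

cov = np.linalg.inv(H)               # covariance matrix, up to overall scale
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(corr)                          # strongly positive: amp up, sigma2 up
```

The chi-square contours in the (amp, sigma2) plane are elongated ellipses whose elongation reflects this correlation; changing the k-weight tilts them slightly, which is why combining k-weights reduces (but never removes) the correlation.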
In this particular case, however, the advantage is that S02 does not depend on changes inside the sample, and we have a very good estimate of its range (say 0.7 - 1.0).

Now suppose that instead of S02 (which I now set to a reasonable value), I am interested in determining N, but it is highly correlated with sigma2. Each time the disorder in the sample increases, sigma2 increases and, due to the high correlation, N is also overestimated. On the other hand, when the disorder in the sample decreases, sigma2 decreases and I can get a "true" estimate of N. Can I still apply the above-mentioned approach to break the correlation between N and sigma2 and get a "true" estimate of N, even if the disorder in my samples is high? Or is it simply not possible, because both N and sigma2 vary with changes inside the sample?
N and S02 are always 100% correlated (mathematically, not merely by the finite k range). So, to the extent that the approach works at all, you can use it for "N" or "S02". Really, the approach is comparing N*S02 and sigma2; in one case you asserted a value of "N" and projected all changes onto "S02" -- you could equally assert "S02" and project all changes onto "N".

To be clear, this is not going to find the "true" value of anything, because no analysis is ever going to find the "true" value -- it is going to find a measured value.

Finally, the correlation of N*S02 and sigma2 does not imply a bias in the values for N*S02. N*S02 is NOT overestimated because it is highly correlated with sigma2.

Hope that helps,
--Matt
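The "100% correlated" statement can be seen directly: for a single path the model only ever sees the product N*S02, so different (N, S02) pairs with the same product produce the same signal. A toy amplitude model with invented numbers:

```python
# For a single path, only the product N*S02 enters the amplitude, so a
# fit cannot distinguish (N, S02) pairs with the same product.
import numpy as np

k = np.linspace(3.0, 12.0, 200)

def amplitude(n, s02, sig2=0.006):
    # toy single-path amplitude; only n * s02 matters
    return n * s02 * np.exp(-2.0 * sig2 * k**2)

a = amplitude(6.0, 0.70)   # N = 6,   S02 = 0.70  -> product 4.2
b = amplitude(4.2, 1.00)   # N = 4.2, S02 = 1.00  -> product 4.2
print(np.allclose(a, b))   # True: identical signals, 100% correlation
```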
On Sun, Mar 22, 2015 at 12:44 PM, Scott Calvin wrote:
One side-comment from me:
On Mar 22, 2015, at 12:52 PM, Matt Newville wrote:
N and S02 are always 100% correlated (mathematically, not merely by the finite k range).
Matt is saying that N and S02 are always 100% correlated *for a single path*. But in some situations you might know N for one path but not others. For example, you might know that the absorbing atom is octahedrally coordinated to oxygen but not be as certain as to next-nearest neighbors, or that there are copper atoms on the corners of a simple cubic lattice with a mixture of atoms at other positions. In cases like that, both N for all paths but one and S02 can be fit without 100% correlation.
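This point can be illustrated with a toy two-shell signal (invented path functions and numbers, not FEFF output). For a single shell, the Jacobian columns for S02 and N are perfectly parallel (correlation magnitude 1); fixing N for one shell makes them separable:

```python
# Toy two-shell model: chi(k) = S02 * (N1*f1 + N2*f2), with N1 known.
# The magnitude of the correlation between two fit parameters equals the
# cosine between the corresponding Jacobian columns; with the N1*f1 term
# present, the S02 and N2 columns are no longer parallel.
import numpy as np

k = np.linspace(3.0, 12.0, 200)
# invented path functions for shells near R = 2.0 and 3.0 Angstrom:
f1 = np.sin(2 * 2.0 * k) * np.exp(-2 * 0.005 * k**2) / k
f2 = np.sin(2 * 3.0 * k) * np.exp(-2 * 0.008 * k**2) / k

N1, S02, N2 = 6.0, 0.9, 4.0            # N1 treated as known
J_s02 = N1 * f1 + N2 * f2              # d(model)/d(S02)
J_n2 = S02 * f2                        # d(model)/d(N2)

cos = np.dot(J_s02, J_n2) / (np.linalg.norm(J_s02) * np.linalg.norm(J_n2))
print(abs(cos))                        # well below 1: S02 and N2 separable
```

Dropping the N1*f1 term makes both columns proportional to f2 and the cosine exactly 1, which is the single-path case Matt described.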
Yes, I completely agree with Scott -- this is a good point that I neglected. In addition to looking at multiple shells, one might also consider using temperature or pressure dependence to separate N*S02 and sigma2. Those aren't without assumptions, and still don't remove the inherent correlation, but they are useful approaches.

The degeneracy of multiple-scattering paths can often be constrained in terms of the coordination numbers for direct-scattering paths, which can further reduce (not “break”) the correlation.
In terms of the main question, I agree with Matt: I don’t think there’s much point in using the line-crossing technique nowadays; fitting using multiple k-weights simultaneously accomplishes the same thing but is a bit easier to interpret statistically.
—Scott Calvin Sarah Lawrence College
--Matt
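A last numerical footnote on the simultaneous multi-k-weight fit discussed above: the same toy Jacobian arithmetic shows Matt's earlier point that the correlation remains fairly large even when k-weights are combined. The model and all numbers below are invented for illustration:

```python
# Toy check that the N*S02 / sigma2 correlation stays large even for a
# simultaneous multi-k-weight fit of one path (amp stands in for N*S02).
# The correlation magnitude equals the cosine between Jacobian columns;
# each k-weighted copy of the data is normalized so no single k-weight
# dominates the stack.
import numpy as np

k = np.linspace(3.0, 12.0, 200)
AMP, SIG2 = 0.85, 0.006
env = np.exp(-2.0 * SIG2 * k**2)

def corr_for(weights):
    ja, js = [], []
    for w in weights:
        scale = 1.0 / np.linalg.norm(AMP * env * k**w)  # per-k-weight scaling
        ja.append(env * k**w * scale)                   # d(model)/d(amp)
        js.append(-2 * k**2 * AMP * env * k**w * scale) # d(model)/d(sigma2)
    ja, js = np.concatenate(ja), np.concatenate(js)
    return np.dot(ja, js) / (np.linalg.norm(ja) * np.linalg.norm(js))

for w in ([1], [2], [3], [1, 2, 3]):
    print(w, abs(corr_for(w)))   # large in every case, combined included
```

Unlike the line-crossing construction, though, the combined fit reports this correlation and honest uncertainties directly, which is the advantage Scott points to.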
participants (3)
- Matt Newville
- Rana, Jatinkumar Kantilal
- Scott Calvin