[Ifeffit] Breaking down correlationships between parameters

Matt Newville newville at cars.uchicago.edu
Sun Mar 22 11:52:30 CDT 2015


Hi Jatin,


On Sat, Mar 21, 2015 at 10:41 AM, Rana, Jatinkumar Kantilal <
jatinkumar.rana at helmholtz-berlin.de> wrote:

> Hi Matt,
>
> Thanks a lot for your prompt reply. The method I am referring to is not
> the multiple k-weight fits by constraining N*S02. My apologies for not
> being clear enough. Let's do it again. I am actually referring to an
> approach where we take an advantage of a different k-dependence of various
> parameters to breakdown correlations between them. For example, S02 and
> sigma2. S02 is k-independent and Sigma2 has k^2 dependence.
>
>
Yes, I am familiar with this approach, and I understand that this is what
you are using.  What I am saying is that this does not work nearly as well
as (sometimes) claimed, and is sort of cheating.  It ignores the measures
of statistical significance.

> In this case, to breakdown correlation between S02 and sigma2,


The correlation between N*S02 and sigma2 is inherent to the finite k-range
of the EXAFS signal.  It cannot be "broken", though it might be reduced.
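
A rough sketch of why the k-range matters, under assumptions that are not
part of any real analysis: take only the amplitude envelope, linearize it as
ln A(k) = ln(N*S02) - 2*sigma2*k^2, and fit a straight line in k^2 with equal
weights on a uniform k grid.  The function name and the k-ranges are
hypothetical.  The intercept/slope correlation then depends only on which k
values are sampled:

```python
import math

def amp_sigma2_correlation(kmin, kmax, dk=0.05):
    """Correlation between ln(N*S02) and sigma2 in an equal-weight
    straight-line fit of ln A(k) = ln(N*S02) - 2*sigma2*k**2."""
    n = int(round((kmax - kmin) / dk)) + 1
    x = [(kmin + i * dk) ** 2 for i in range(n)]   # x = k^2
    sx = sum(x)
    sxx = sum(v * v for v in x)
    # For y = a + b*x the covariance matrix is proportional to the inverse
    # of [[n, sx], [sx, sxx]], so corr(a, b) = -sx / sqrt(n * sxx).
    # sigma2 = -b/2 flips the sign, so corr(a, sigma2) is positive.
    return sx / math.sqrt(n * sxx)

for kmin, kmax in [(6.0, 9.0), (3.0, 12.0), (3.0, 16.0)]:
    print(kmin, kmax, round(amp_sigma2_correlation(kmin, kmax), 3))
```

In this simplified picture a wider k-range lowers the correlation, but it
stays well away from zero for any finite range.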


> one can assume a series of S02 values and perform fits using a single
> k-weight each time (say k-weight 1,2 and 3) and record corresponding sigma2
> values.

> Let us say for k-weight =1, a series of preset S02 values will result in a
> series of corresponding sigma2 values refined in fits, which can be plotted
> as a straight line in sigma2 vs. S02 plot.


OK, one can fit sigma2 with a series of preset values of N*S02.  That's
fine.  But it does NOT lead to an infinitely thin line of sigma2 vs.
N*S02.  Each sigma2 value on that line has a width, corresponding to its
uncertainty.  In fact, the slope of the line you produce nicely demonstrates
and measures the correlation of N*S02 and sigma2.
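
The procedure can be sketched with a toy single-shell model.  Everything
here is an assumption made for illustration: the model has no F(k), phase
shift, or mean free path, the "data" are noise-free, and R, TRUE_AMP,
TRUE_SIGMA2, and the grids are made-up values.  It is not an Ifeffit or Feff
calculation.  For each k-weight, sigma2 is found by a simple grid search at
each preset N*S02:

```python
import math

R = 2.0               # assumed path distance (Angstrom), illustration only
TRUE_AMP = 0.9        # assumed N*S02 used to generate the "data"
TRUE_SIGMA2 = 0.005   # assumed sigma2 (Angstrom^2) used to generate the "data"
K = [3.0 + 0.1 * i for i in range(91)]          # k = 3 .. 12
SIGMA2_GRID = [4e-5 * i for i in range(501)]    # sigma2 = 0 .. 0.02

def chi(k, amp, sigma2):
    """Toy single-shell EXAFS: no F(k), phase shift, or mean free path."""
    return (amp * math.exp(-2.0 * sigma2 * k * k)
            * math.sin(2.0 * k * R) / (k * R * R))

DATA = [chi(k, TRUE_AMP, TRUE_SIGMA2) for k in K]   # noise-free "data"

def best_sigma2(amp, kweight):
    """Grid-search the sigma2 that minimises the k-weighted misfit
    for a preset amplitude amp = N*S02."""
    best, best_misfit = None, float("inf")
    for s2 in SIGMA2_GRID:
        misfit = sum((k ** kweight * (d - chi(k, amp, s2))) ** 2
                     for k, d in zip(K, DATA))
        if misfit < best_misfit:
            best, best_misfit = s2, misfit
    return best

PRESET_AMPS = [0.8, 0.9, 1.0]
LINES = {kw: [best_sigma2(a, kw) for a in PRESET_AMPS] for kw in (1, 2, 3)}
for kw, s2_values in LINES.items():
    print("k-weight", kw, [round(s, 5) for s in s2_values])
```

Because the data are noise-free, every line passes through the values used
to generate them.  The sketch does not compute an error bar for any point,
and that missing error bar is the width discussed above.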


> Similar straight lines can be obtained for fits using k-weight = 2 and
> then 3.

> Now, these three lines may intersect at or near some point, which will
> determine the "true" value of parameters independent of k-weight.


The different lines (each with finite thickness) will give a *range of
values* for N*S02 and sigma2, not a single value.

The biggest problem with this approach is that it ignores the relative
goodness-of-fit (let's just assume that is 'chi-square' for the purpose of
this discussion) of the fits along these lines.  Some fits are better
than others, and this approach completely ignores that fact.  Equally
importantly, it ignores the fact that there is a range of chi-square values
that are consistent with "good".  If you include these values, your
linear plot will become contours of chi-square as a function of N*S02 and
sigma2.  And, yes, by using different k-weights and k-ranges and so on you
can get overlapping contour plots which may reduce the correlation a small
amount when looked at as an ensemble.  And you can find a best set of
values for N*S02 and sigma2, but *each* of these will have an
uncertainty.
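
A minimal sketch of such a chi-square map, again with a toy single-shell
model and made-up numbers (R, the noise level EPS, the "true" values, and the
grids are all assumptions, and the noise is simulated).  It takes every grid
point within 1 of the minimum chi-square as "good", which is the usual
one-sigma convention for a single parameter if EPS really is the noise:

```python
import math
import random

R = 2.0                              # assumed path distance (Angstrom)
TRUE_AMP, TRUE_SIGMA2 = 0.9, 0.005   # assumed values used to fake the data
EPS = 0.002                          # assumed noise level in chi(k)
K = [3.0 + 0.1 * i for i in range(91)]        # k = 3 .. 12
AMPS = [0.7 + 0.005 * i for i in range(81)]   # N*S02 = 0.7 .. 1.1
SIGMA2S = [5e-5 * i for i in range(201)]      # sigma2 = 0 .. 0.01

def chi(k, amp, sigma2):
    """Toy single-shell EXAFS: no F(k), phase shift, or mean free path."""
    return (amp * math.exp(-2.0 * sigma2 * k * k)
            * math.sin(2.0 * k * R) / (k * R * R))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
DATA = [chi(k, TRUE_AMP, TRUE_SIGMA2) + rng.gauss(0.0, EPS) for k in K]

# Unit-amplitude model for every sigma2, so the double loop below is cheap.
UNIT = [[chi(k, 1.0, s2) for k in K] for s2 in SIGMA2S]

def chi_square(amp, unit_row):
    return sum(((d - amp * u) / EPS) ** 2 for d, u in zip(DATA, unit_row))

CHI2 = {(a, s2): chi_square(a, UNIT[j])
        for a in AMPS for j, s2 in enumerate(SIGMA2S)}

BEST = min(CHI2, key=CHI2.get)
CHI2_MIN = CHI2[BEST]
# Every grid point within 1 of the minimum chi-square counts as "good".
GOOD = [p for p, c2 in CHI2.items() if c2 <= CHI2_MIN + 1.0]
AMP_RANGE = (min(a for a, _ in GOOD), max(a for a, _ in GOOD))
SIGMA2_RANGE = (min(s for _, s in GOOD), max(s for _, s in GOOD))

print("best (N*S02, sigma2):", BEST)
print("N*S02 range:", AMP_RANGE)
print("sigma2 range:", SIGMA2_RANGE)
```

The result is a best grid point plus a range for each parameter, not a
single point.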

So, you can use this approach to find a good value for N*S02, but it is not
breaking the correlation.  You can do this by hand.  Or you can just do a
fit with datasets with different k-weights and k-ranges.   When you do this
as a fit, you will see that the correlation is still fairly large.

Also, just to be clear, this is absolutely not a "true" value.  It is a
measured value.  Not at all the same thing.

> One can then constrain S02 to a value obtained from the point of
> intersection of three lines and vary sigma2 in a fit.


Well, one can certainly set N*S02 to some value and fit sigma2.  As I said
earlier, this ignores the correlation of N*S02 and sigma2, but does not
remove that correlation.


> In this particular case, however, the advantage is, S02 does not depend on
> changes inside sample and we have very good estimate of its range (say 0.7
> - 1.0).
>
> Now suppose instead of S02 (which i now set to a reasonable value), I am
> interested in determining N, but it is highly correlated with sigma2. Each
> time when disorder in the sample increases, the sigma2 increases and due to
> its high correlation, N is also overestimated. On the other hand, when the
> disorder in the sample decreases, the sigma2 decreases and I can have a
> "true" estimation of N in the sample. Can I still apply the above mentioned
> approach to break the correlationship between N and sigma2 and get a "true"
> estimation of N, even if disorder is high in my samples ? or it is simply
> not possible due to the fact that both N and sigma2 varies with changes
> inside the sample.
>
>
N and S02 are always 100% correlated (mathematically, not merely because of
the finite k-range).  So, to the extent that the approach works at all, you
can use it for "N" or "S02".  Really, the approach is comparing N*S02 and
sigma2.  In one case you asserted a value of "N" and projected all changes
onto "S02" -- you can equally assert "S02" and project all changes onto "N".
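
The degeneracy is easy to check numerically.  This is a sketch with the same
kind of toy single-shell model (R, sigma2, and the two (N, S02) pairs are
made-up values); the point carries over to the full EXAFS equation because N
and S02 enter it only as a product:

```python
import math

R = 2.0          # assumed path distance (Angstrom)
SIGMA2 = 0.005   # assumed sigma2 (Angstrom^2)
K = [3.0 + 0.1 * i for i in range(91)]   # k = 3 .. 12

def chi(k, n, s02, sigma2):
    """Toy single-shell EXAFS; N and S02 enter only as the product N*S02."""
    return (n * s02 * math.exp(-2.0 * sigma2 * k * k)
            * math.sin(2.0 * k * R) / (k * R * R))

# Two different (N, S02) pairs with the same product give the same chi(k).
CHI_A = [chi(k, 6.0, 0.9, SIGMA2) for k in K]
CHI_B = [chi(k, 5.4, 1.0, SIGMA2) for k in K]
MAX_DIFF = max(abs(a - b) for a, b in zip(CHI_A, CHI_B))

# The sensitivities d(chi)/dN = chi/N and d(chi)/dS02 = chi/S02 are
# proportional at every k, so their normalised overlap is 1.
D_N = [c / 6.0 for c in CHI_A]
D_S02 = [c / 0.9 for c in CHI_A]
OVERLAP = sum(a * b for a, b in zip(D_N, D_S02)) / math.sqrt(
    sum(a * a for a in D_N) * sum(b * b for b in D_S02))

print("largest difference in chi(k):", MAX_DIFF)
print("overlap of the two sensitivities:", OVERLAP)
```

An overlap of 1 means the two sensitivities are linearly dependent, so no
choice of k-weight or k-range can separate N from S02 in a fit.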

To be clear,  this is not going to find the "true" value of anything,
because no analysis is ever going to find the "true" value -- it's going to
find a measured value.

Finally, the correlation of N*S02 and sigma2 does not imply a bias in the
values for N*S02.  N*S02 is NOT overestimated because it is highly
correlated with sigma2.
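
A small simulation can illustrate the difference between correlation and
bias.  The assumptions are strong: only the linearized amplitude
ln A(k) = ln(N*S02) - 2*sigma2*k^2 is fitted, the noise is Gaussian with the
same size at every k, and all numbers are made up.  Many noisy data sets are
fitted and the scatter of the results is compared with their average:

```python
import math
import random

TRUE_LN_AMP = math.log(0.9)   # assumed ln(N*S02)
TRUE_SIGMA2 = 0.005           # assumed sigma2 (Angstrom^2)
NOISE = 0.05                  # assumed noise on ln A(k), same at every k
X = [(3.0 + 0.1 * i) ** 2 for i in range(91)]   # x = k^2 for k = 3 .. 12

def fit_line(y):
    """Ordinary least squares for y = a + b*x; returns (ln_amp, sigma2)."""
    n, sx, sy = len(X), sum(X), sum(y)
    sxx = sum(v * v for v in X)
    sxy = sum(v * w for v, w in zip(X, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, -b / 2.0

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
FITS = []
for _ in range(2000):
    y = [TRUE_LN_AMP - 2.0 * TRUE_SIGMA2 * v + rng.gauss(0.0, NOISE)
         for v in X]
    FITS.append(fit_line(y))

LN_AMPS = [f[0] for f in FITS]
SIGMA2S = [f[1] for f in FITS]
M_A, M_S = mean(LN_AMPS), mean(SIGMA2S)
SD_A = math.sqrt(mean([(a - M_A) ** 2 for a in LN_AMPS]))
SD_S = math.sqrt(mean([(s - M_S) ** 2 for s in SIGMA2S]))
CORR = mean([(a - M_A) * (s - M_S)
             for a, s in zip(LN_AMPS, SIGMA2S)]) / (SD_A * SD_S)
BIAS_A = M_A - TRUE_LN_AMP   # average error in ln(N*S02)
BIAS_S = M_S - TRUE_SIGMA2   # average error in sigma2

print("correlation of fitted ln(N*S02) and sigma2:", round(CORR, 3))
print("bias / scatter for ln(N*S02):", BIAS_A / SD_A)
print("bias / scatter for sigma2:", BIAS_S / SD_S)
```

In this toy case the fitted values scatter together along the correlated
direction, but their averages stay close to the values used to generate the
data.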

Hope that helps,

--Matt