Re: [Ifeffit] Breaking down correlationships between parameters

23 Mar 2015

Hi Matt,

Thank you very much for your detailed explanation. As you pointed out, this approach ignores the statistical significance of the fits and assumes that all fits are "good" fits. You also noted that it yields a parameter value that is only slightly less correlated with the other one, and does not completely remove the correlation. This makes it really clear to me how the approach works and what its pros and cons are.

I have never tried this approach of minimizing the correlation between N*S02 and sigma2 myself, but I have read a lot about it in the literature. With my limited knowledge of the method, I could not judge the approach, although I had my own doubts.

I truly appreciate your efforts in providing me a deeper insight into this approach.

Best regards,
Jatin

-----Original Message-----
From: ifeffit-bounces@millenia.cars.aps.anl.gov [mailto:ifeffit-bounces@millenia.cars.aps.anl.gov] On Behalf Of ifeffit-request@millenia.cars.aps.anl.gov
Sent: 22 March, 2015 18:00
To: ifeffit@millenia.cars.aps.anl.gov
Subject: Ifeffit Digest, Vol 145, Issue 40

Today's Topics:

   1. Re: Breaking down correlationships between parameters
      (Matt Newville)

----------------------------------------------------------------------

Message: 1
Date: Sun, 22 Mar 2015 11:52:30 -0500
From: Matt Newville 
To: XAFS Analysis using Ifeffit 
Subject: Re: [Ifeffit] Breaking down correlationships between
	parameters
Message-ID:

Content-Type: text/plain; charset="utf-8"

Hi Jatin,

On Sat, Mar 21, 2015 at 10:41 AM, Rana, Jatinkumar Kantilal < jatinkumar.rana@helmholtz-berlin.de> wrote:
...
> Hi Matt,
> Thanks a lot for your prompt reply. The method I am referring to is
> not the multiple k-weight fit with N*S02 constrained. My apologies for
> not being clear enough. Let me try again. I am actually referring to
> an approach where we take advantage of the different k-dependence of
> various parameters to break down the correlations between them. For
> example, S02 and sigma2: S02 is k-independent and sigma2 has a k^2 dependence.

Yes, I am familiar with this approach, and I understand that this is what you are using.  What I am saying is that this does not work nearly as well as (sometimes) claimed, and is sort of cheating.  It ignores the measures of statistical significance.

> In this case, to break down the correlation between S02 and sigma2,

The correlation between N*S02 and sigma2 is inherent to the finite k-range of the EXAFS signal.  It cannot be "broken", though it might be reduced.
...
> one can assume a series of S02 values and perform fits using a single
> k-weight each time (say k-weight 1, 2 and 3) and record the corresponding
> sigma2 values.
> Let us say for k-weight = 1, a series of preset S02 values will result in a
> series of corresponding sigma2 values refined in fits, which can be
> plotted as a straight line in a sigma2 vs. S02 plot.

OK, one can fit sigma2 with a series of preset values for N*S02.  That's fine.  But it does NOT lead to an infinitely thin line of sigma2 vs. N*S02.  Each sigma2 value on that line has a width, corresponding to its uncertainty.  In fact, the line you produce nicely demonstrates the correlation of N*S02 and sigma2, and the slope of this line measures it.
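
The scan described here can be sketched numerically. This is only a minimal sketch on a toy single-shell signal, not Ifeffit and not real data: `chi_model`, `R_TOY`, the k-range, the noise level and the generating values (N*S02 = 5.4, sigma2 = 0.005) are all assumptions made for illustration, and the fit is a plain unweighted least-squares fit in k-space. For each k-weight it presets N*S02, refines sigma2 together with its standard error, and records the slope of the resulting sigma2 vs. N*S02 line.

```python
import numpy as np
from scipy.optimize import curve_fit

R_TOY = 2.0  # hypothetical path length in Angstrom (assumption)


def chi_model(k, amp, sigma2):
    """Toy single-shell EXAFS: amp stands for N*S02; no phase shift or mean free path."""
    return amp * np.sin(2 * k * R_TOY) * np.exp(-2 * k**2 * sigma2) / (k * R_TOY**2)


def make_fit_function(amp, kw):
    """Return a k-weighted model in which only sigma2 is free."""
    def k_weighted_model(k, sigma2):
        return k**kw * chi_model(k, amp, sigma2)
    return k_weighted_model


rng = np.random.default_rng(0)
k = np.linspace(3.0, 12.0, 181)
# Synthetic "data" generated with N*S02 = 5.4 and sigma2 = 0.005 plus noise.
data = chi_model(k, 5.4, 0.005) + rng.normal(0.0, 0.001, k.size)

preset_amps = np.linspace(4.4, 6.4, 5)  # preset N*S02 values to scan
lines = {}
for kw in (1, 2, 3):
    sig, err = [], []
    for amp in preset_amps:
        popt, pcov = curve_fit(make_fit_function(amp, kw), k, k**kw * data, p0=[0.005])
        sig.append(popt[0])               # refined sigma2
        err.append(np.sqrt(pcov[0, 0]))   # its standard error: the "width" of the line
    slope = np.polyfit(preset_amps, sig, 1)[0]
    lines[kw] = (np.array(sig), np.array(err), slope)
    print(f"k-weight {kw}: slope d(sigma2)/d(N*S02) = {slope:.2e}")
```

Each point on a line carries its own `err`, so the lines are really bands. In this toy model the slope also changes with k-weight, which is why the lines cross in the first place.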
...
> Similar straight lines can be obtained for fits using k-weight = 2 and
> then 3.
> Now, these three lines may intersect at or near some point, which will
> determine the "true" value of the parameters, independent of k-weight.

The different lines (each with finite thickness) will give a *range of values* for N*S02 and sigma2, not a single value.

The biggest problem with this approach is that it ignores the relative goodness of fit (let's just assume that is 'chi-square' for the purpose of this discussion) for the fits along these lines.  Some fits are better than others, and this approach completely ignores that fact.  Equally importantly, it ignores the fact that there is a range of values for chi-square that are consistent with "good".  If you include these values, your linear plot will become contours of chi-square as a function of N*S02 and sigma2.  And, yes, by using different k-weights and k-ranges and so on you can get overlapping contour plots which may reduce the correlation a small amount when looked at as an ensemble.  And you can find a best set of values for N*S02 and sigma2, but *each* of these will have an uncertainty.
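
To make the contour picture concrete, here is a rough sketch under the same kind of toy assumptions (a hypothetical single-shell `chi_model`, an assumed path length `R_TOY`, synthetic noise `eps`, and grid ranges chosen by hand). It evaluates chi-square on a grid of N*S02 and sigma2 and compares the full extent of the region within chi-square(min) + 1 to a cut through that region at the best-fit sigma2.

```python
import numpy as np

R_TOY = 2.0  # hypothetical path length in Angstrom (assumption)


def chi_model(k, amp, sigma2):
    """Toy single-shell EXAFS: amp stands for N*S02; no phase shift or mean free path."""
    return amp * np.sin(2 * k * R_TOY) * np.exp(-2 * k**2 * sigma2) / (k * R_TOY**2)


rng = np.random.default_rng(0)
k = np.linspace(3.0, 12.0, 181)
eps = 0.01  # assumed noise level in chi(k)
data = chi_model(k, 5.4, 0.005) + rng.normal(0.0, eps, k.size)

amps = np.linspace(5.0, 5.8, 81)       # grid of N*S02 values
sig2s = np.linspace(0.004, 0.006, 81)  # grid of sigma2 values
A, S = np.meshgrid(amps, sig2s, indexing="ij")

# chi-square for every (N*S02, sigma2) pair on the grid
model = chi_model(k[None, None, :], A[..., None], S[..., None])
chi2 = np.sum(((data - model) / eps) ** 2, axis=-1)

i_best, j_best = np.unravel_index(np.argmin(chi2), chi2.shape)
inside = chi2 <= chi2.min() + 1.0  # fits statistically as good as the best one

amp_projected = amps[inside.any(axis=1)]    # N*S02 range over the whole contour
sig2_projected = sig2s[inside.any(axis=0)]  # sigma2 range over the whole contour
amp_slice = amps[inside[:, j_best]]         # N*S02 range with sigma2 held at its best value

print("best N*S02, sigma2:", amps[i_best], sig2s[j_best])
print("N*S02 range (full contour):", amp_projected.min(), amp_projected.max())
print("N*S02 range (sigma2 fixed):", amp_slice.min(), amp_slice.max())
```

Because the contour is tilted, the range of N*S02 over the whole contour is generally wider than the range seen when sigma2 is held fixed. Fixing one parameter hides that extra width, but does not remove it.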

So, you can use this approach to find a good value for N*S02, but it is not breaking the correlation.  You can do this by hand.  Or you can just do a fit with datasets with different k-weights and k-ranges.  When you do this as a fit, you will see that the correlation is still fairly large.
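
A simultaneous fit of this kind can be sketched as follows. Again this is a toy illustration rather than an Ifeffit fit: `chi_model`, `R_TOY`, the noise level and the per-k-weight scaling `eps_kw` (a single noise scale for each k-weight) are assumptions. It fits N*S02 and sigma2 together against k-weights 1, 2 and 3 and reads the correlation coefficient off the covariance matrix.

```python
import numpy as np
from scipy.optimize import least_squares

R_TOY = 2.0  # hypothetical path length in Angstrom (assumption)
K_WEIGHTS = (1, 2, 3)


def chi_model(k, amp, sigma2):
    """Toy single-shell EXAFS: amp stands for N*S02; no phase shift or mean free path."""
    return amp * np.sin(2 * k * R_TOY) * np.exp(-2 * k**2 * sigma2) / (k * R_TOY**2)


rng = np.random.default_rng(1)
k = np.linspace(3.0, 12.0, 181)
data = chi_model(k, 5.4, 0.005) + rng.normal(0.0, 0.001, k.size)


def residual(params):
    """Stack the k-weighted residuals for all k-weights into one vector."""
    amp, sigma2 = params
    diff = data - chi_model(k, amp, sigma2)
    blocks = []
    for kw in K_WEIGHTS:
        weight = k**kw
        eps_kw = np.sqrt(np.mean(weight**2))  # one noise scale per k-weight (assumption)
        blocks.append(weight * diff / eps_kw)
    return np.concatenate(blocks)


fit = least_squares(residual, x0=[5.0, 0.004], x_scale="jac")

# Covariance from the Jacobian at the solution, scaled by reduced chi-square.
n_pts, n_par = fit.jac.shape
cov = np.linalg.inv(fit.jac.T @ fit.jac) * (2.0 * fit.cost / (n_pts - n_par))
stderr = np.sqrt(np.diag(cov))
corr = cov[0, 1] / (stderr[0] * stderr[1])

print(f"N*S02  = {fit.x[0]:.3f} +/- {stderr[0]:.3f}")
print(f"sigma2 = {fit.x[1]:.5f} +/- {stderr[1]:.5f}")
print(f"correlation(N*S02, sigma2) = {corr:.2f}")
```

In this toy case the correlation coefficient stays well above zero even with three k-weights fitted at once, and both parameters still come with uncertainties.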

Also, just to be clear, this is absolutely not a "true" value.  It is a measured value.  Not at all the same thing.

> One can then constrain S02 to a value obtained from the point of
> intersection of the three lines and vary sigma2 in a fit.

Well, one can certainly set N*S02 to some value and fit sigma2.  As I said earlier, this ignores the correlation of N*S02 and sigma2, but does not remove that correlation.
...
> In this particular case, however, the advantage is that S02 does not
> depend on changes inside the sample and we have a very good estimate of its
> range (say 0.7 - 1.0).
> Now suppose that instead of S02 (which I now set to a reasonable value), I
> am interested in determining N, but it is highly correlated with
> sigma2. Each time the disorder in the sample increases, sigma2
> increases and, due to the high correlation, N is also overestimated. On
> the other hand, when the disorder in the sample decreases, sigma2
> decreases and I can have a "true" estimate of N in the sample. Can I
> still apply the above-mentioned approach to break the correlation
> between N and sigma2 and get a "true" estimate of N, even if the disorder
> is high in my samples? Or is it simply not possible, because both N
> and sigma2 vary with changes inside the sample?

N and S02 are always 100% correlated (mathematically, not merely by the finite k range).  So, to the extent that the approach works at all, you can use it for "N" or "S02".  Really, the approach is comparing N*S02 and sigma2.  In one case you asserted a value of "N" and projected all changes to "S02" -- you can equally assert "S02" and project all changes to "N".
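
The 100% correlation of N and S02 follows from the fact that they only enter the signal as a product. A small check on a toy model (the function `chi_model`, its default path length and the parameter values are hypothetical) shows two different (N, S02) pairs with the same product giving the same signal, and the derivatives with respect to N and S02 being exactly parallel:

```python
import numpy as np


def chi_model(k, n, s02, sigma2, r=2.0):
    """Toy single-shell EXAFS in which N and S02 enter only as the product n * s02."""
    return n * s02 * np.sin(2 * k * r) * np.exp(-2 * k**2 * sigma2) / (k * r**2)


k = np.linspace(3.0, 12.0, 181)

# Two different (N, S02) pairs with the same product N*S02 = 5.4
signal_a = chi_model(k, 6.0, 0.9, 0.005)
signal_b = chi_model(k, 5.4, 1.0, 0.005)
same_signal = np.allclose(signal_a, signal_b)

# The model is linear in N and in S02, so the derivatives at (N=6, S02=0.9) are:
d_chi_d_n = chi_model(k, 1.0, 0.9, 0.005)    # d(chi)/dN   = S02 * shape(k)
d_chi_d_s02 = chi_model(k, 6.0, 1.0, 0.005)  # d(chi)/dS02 = N   * shape(k)
cosine = (d_chi_d_n @ d_chi_d_s02) / (
    np.linalg.norm(d_chi_d_n) * np.linalg.norm(d_chi_d_s02)
)

print("identical signals:", same_signal)
print("cosine between the two derivative vectors:", cosine)
```

With parallel derivatives no data set of this form, whatever its k-range or k-weight, can separate N from S02; only the product is determined.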

To be clear,  this is not going to find the "true" value of anything, because no analysis is ever going to find the "true" value -- it's going to find a measured value.

Finally, the correlation of N*S02 and sigma2 does not imply a bias in the values for N*S02.  N*S02 is NOT overestimated because it is highly correlated with sigma2.

Hope that helps,

--Matt