[Ifeffit] Question about transform windows and statistical parameters

Fri May 13 10:38:53 CDT 2011

Scott,

On Thu, May 12, 2011 at 10:51 AM, Scott Calvin
<dr.scott.calvin at gmail.com> wrote:
> Hi Brandon,
> Matt and Bruce both gave good, thorough answers to your questions this
> morning. Nevertheless, I'm going to chime in too, because there are some
> aspects of this issue I'd like to put emphasis on.
> On May 11, 2011, at 8:46 PM, Brandon Reese wrote:
>
>  I tried your suggestion with epsilon and the chi-square values came out to
> be very similar values with the different windows.  Does this mean that
> reporting reduced chi-square values in a paper that compared several data
> sets would not be necessary and/or appropriate?
>
> Bruce said "no" emphatically, and I say "yes," but I think we've understood
> the question differently. As Bruce says:
>
> Of course, reduced chi-square can only be compared for fitting models which
> compute epsilon the same way or use the same value for epsilon.
>
> That's the key point. I've gotten away from reporting values for reduced
> chi-square (RCS). That's a personal choice, and is not in accord with the
> International X-Ray Absorption Society's Error Reporting Recommendation,
> available here:
> http://ixs.iit.edu/subcommittee_reports/sc
> I think the difficulty in choosing epsilon is more likely to make a reduced
> chi-square number confusing than enlightening. But I am moving increasingly
> toward reporting changes in reduced chi-square between fits on the same
> data, and applying Hamilton's test to determine if improvements are
> statistically significant.

Well, the Error Reporting recommendation is a minimal recommendation.
Going beyond it by comparing fits with Hamilton or other statistical
tests is not at all against the spirit of that document.  Such tests
are highly useful, they're just hard to apply to a single fit.

That seems very different from saying that reporting reduced chi-square
is not necessary or appropriate, which could be taken to mean that
reporting no statistical analysis at all is better than reporting
reduced chi-square.
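
To make Bruce's point about epsilon concrete, here is a minimal sketch
(plain Python, not Ifeffit code).  It assumes the common convention
chi-square = (N_idp / (N_pts * epsilon^2)) * sum(residual^2), with
reduced chi-square = chi-square / (N_idp - N_var), and the Nyquist
estimate N_idp = 2*dk*dR/pi.  The function names and the residual values
are made up for illustration.

```python
import math

def n_independent(kmin, kmax, rmin, rmax):
    """Nyquist estimate of the number of independent points (assumed convention)."""
    return 2.0 * (kmax - kmin) * (rmax - rmin) / math.pi

def reduced_chi_square(residuals, epsilon, n_idp, n_var):
    """Reduced chi-square for a list of (data - model) residuals.

    epsilon is the estimated uncertainty in the data; n_var is the
    number of variables in the fit.
    """
    n_pts = len(residuals)
    chi2 = n_idp / (n_pts * epsilon ** 2) * sum(r * r for r in residuals)
    return chi2 / (n_idp - n_var)

# Illustrative residuals only; not taken from any real fit.
residuals = [0.002, -0.001, 0.0015, -0.0005, 0.001, -0.002]
n_idp = n_independent(3.0, 12.0, 1.0, 3.0)

rcs_a = reduced_chi_square(residuals, 1.0e-3, n_idp, n_var=4)
rcs_b = reduced_chi_square(residuals, 2.0e-3, n_idp, n_var=4)

# Same residuals, epsilon doubled: reduced chi-square drops by a factor of 4.
print(rcs_a, rcs_b, rcs_a / rcs_b)
```

Since reduced chi-square scales as 1/epsilon^2, two fits are only
comparable on this statistic if epsilon was chosen the same way for both.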

>  Would setting a value for epsilon allow comparisons across different
> k-ranges, different (but similar) data sets, or a combination of the two
> using the chi-square parameter?
>
> Maybe not. After all, the epsilon should be different for different
> k-ranges, as your signal to noise ratio probably changes as a function of k.
> Using the same epsilon doesn't reflect that.

Without seeing the data in question, this seems like speculation to
me.  I'm not at all sure why epsilon (the variation in chi(k)) should
depend strongly on the k-range.  In my experience, it usually does
not.  The S/N ratio will certainly change with k, but that change
should be dominated by the rapid decay in |chi(k)| rather than by a
change in epsilon.
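
A toy illustration of that last point, under loud assumptions: a
single-shell amplitude envelope |chi(k)| ~ N / (k R^2) * exp(-2 k^2 sigma^2)
with no backscattering amplitude or mean-free-path term, and a
k-independent epsilon.  All names and numbers here are hypothetical and
chosen only to show the trend.

```python
import math

def chi_amplitude(k, n_scale=1.0, r=2.5, sigma2=0.005):
    """Toy EXAFS amplitude envelope; ignores F(k) and the mean free path."""
    return n_scale / (k * r * r) * math.exp(-2.0 * sigma2 * k * k)

def signal_to_noise(k, epsilon):
    """S/N at wavenumber k for a constant (white) noise level epsilon."""
    return chi_amplitude(k) / epsilon

epsilon = 1.0e-3  # assumed constant noise in chi(k)

snr = {k: signal_to_noise(k, epsilon) for k in (3.0, 6.0, 9.0, 12.0)}
for k, value in snr.items():
    print(f"k = {k:5.1f}   |chi| = {chi_amplitude(k):.5f}   S/N = {value:8.2f}")
```

In this sketch the S/N falls steadily with k even though epsilon never
changes, so a falling S/N by itself is not evidence for a k-dependent
epsilon.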

> In playing around with different windows and dk values my fit variables
> generally stayed within the error bars, but the size of the error bars could
> change by more than a factor of 2.  Does this mean that it would make sense to
> find a window/dk that seems to "work" for a given group of data and stay
> consistent when analyzing that data group?
>
> The fact that your variables stay within the error bars is good news. The
> change in the size of the error bars may be related to a less-than-ideal
> value for dk you may have used for the Kaiser-Bessel window.
> But yes, find a window and dk combination that seems to work well and then
> stay consistent for that analysis. Unless the data is particularly
> problematic, I'd prefer making a reasoned choice before beginning to fit and
> then sticking with it; a posteriori choices for that kind of thing make me a
> little nervous.
>
> * * *
> At the risk of being redundant, four quick examples.
> Example 1: You change the range of R values in the Fourier transform over
> which you are fitting a data set.
> For this example, RCS is a valuable statistic for letting you know whether
> the fit supports the change in R-range.

> Example 2: You change the range of k values over which you are fitting your
> data.
> For this example, comparing RCS is unlikely to be useful. You are likely
> trying different k-ranges because you are suspicious about some of the data
> at the extremes of your range. Including or excluding that data likely
> implies epsilon should be changed, but by how much?

I think your assumption that epsilon will depend strongly on k may not
be correct.  Do you have evidence for this?  I would say that it is not
strongly dependent on k, and that reduced chi-square is useful in
comparing fits with different k-ranges.

I find it confusing that you expect the noise in the data to depend
(strongly, even) on k, but not on R.  The general wisdom is that the
estimate of epsilon from the high-R components is too low, suggesting
that the R dependence is significant.  Every time I've looked, I
come to the conclusion that noise in data is either consistent with
"white" or so small as to be difficult to measure.  I believe Corwin
Booth's recent work supports the conventional wisdom that epsilon
decreases with R, but I don't recall it suggesting a significant k
dependence.
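
For reference, the high-R estimate of epsilon mentioned above can be
sketched as follows.  This is an illustration in plain Python, not the
Ifeffit source.  It assumes that epsilon_R is the RMS of |chi(R)| over a
signal-free window (15 to 25 Ang. here), and that the conversion to
k-space is
epsilon_k = epsilon_R * sqrt(pi*(2w+1) / (dk*(kmax^(2w+1) - kmin^(2w+1)))),
with w the k-weight and dk the k-grid spacing.  The function names and
the synthetic |chi(R)| values are hypothetical.

```python
import math

def epsilon_r_from_high_r(r_grid, chir_mag, r_lo=15.0, r_hi=25.0):
    """RMS of |chi(R)| over a high-R window assumed to contain only noise."""
    vals = [m for r, m in zip(r_grid, chir_mag) if r_lo <= r <= r_hi]
    return math.sqrt(sum(v * v for v in vals) / len(vals))

def epsilon_k_from_epsilon_r(eps_r, kmin, kmax, kweight, dk_grid=0.05):
    """Convert a white-noise level in chi(R) to one in chi(k) (assumed formula)."""
    p = 2 * kweight + 1
    return eps_r * math.sqrt(math.pi * p / (dk_grid * (kmax ** p - kmin ** p)))

# Synthetic |chi(R)|: "signal" below 5 Ang., a flat noise floor above it.
r_grid = [0.5 * i for i in range(61)]               # 0 to 30 Ang.
chir_mag = [1.0 if r < 5.0 else 0.01 for r in r_grid]

eps_r = epsilon_r_from_high_r(r_grid, chir_mag)
eps_k = epsilon_k_from_epsilon_r(eps_r, kmin=3.0, kmax=12.0, kweight=2)
eps_k_short = epsilon_k_from_epsilon_r(eps_r, kmin=3.0, kmax=10.0, kweight=2)
print(eps_r, eps_k, eps_k_short)
```

The estimate only equals the noise at low R if the noise really is white,
which is exactly the assumption in question here.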

It would be interesting to have more evidence.

--Matt