[Ifeffit] Question about transform windows and statistical parameters

Fri May 13 12:32:14 CDT 2011

Hi Scott,

Sorry, I read epsilon as "noise in chi(k)".  That is the most
meaningful physical/statistical measure.  epsilon_r, on the other hand,
surely depends on k-weight and can depend on k-range, since it samples
different portions of the spectrum.  As you say, it will tend to
increase as you increase the k-range.

On Fri, May 13, 2011 at 11:58 AM, Scott Calvin
<dr.scott.calvin at gmail.com> wrote:
> Matt,
> On May 13, 2011, at 8:39 AM, Matt Newville wrote:
>
>  After all, the epsilon should be different for different k-ranges, as your
> signal to noise ratio probably changes as a function of k. Using the same
> epsilon doesn't reflect that.
>
> Without seeing the data in question, this seems like speculation to me.  I'm
> not at all sure why epsilon (the variation in chi(k)) should depend strongly
> on the k-range.  In my experience, it usually does not.  The S/N ratio will
> surely change with k, but that would surely be dominated by the rapid decay
> in |chi(k)|, rather than a change in epsilon.
>
> I'm confused. We Fourier transform k-weighted data. Since Ifeffit uses the
> high-R amplitude to estimate uncertainty, it seems to me that what matters
> is signal-to-noise, not just noise in the original unweighted chi(k). Am I
> wrong in that? I may be misunderstanding how epsilon_r is calculated. And
> epsilon_r is the relevant epsilon for a fit in R space, right?
>
> I think your assumption that epsilon will depend strongly on k may
> not be correct.  Do you have evidence for this?   I would say that it is
> not strongly dependent on k, and that reduced chi-square is useful
> in comparing fits with different k-ranges.
>
> I just tried it on the FeC2O4 chi(k) attached to this post. It's a good
> example of data where it's not immediately clear to me what the "best" value
> for kmax is, so it would be tempting to use RCS to compare fits over
> different k-ranges. I used k-weight 3, and Hanning windows with dk = 1. I
> chose kmin as 2 and stepped kmax by 0.5, recording epsilon_r for each:
> kmax      epsilon_r
> 7         0.034840105
> 7.5       0.041843848
> 8         0.082627337
> 8.5       0.087550367
> 9         0.086032007
> 9.5       0.085996216
> 10        0.088679339
> 10.5      0.090364699
> 11        0.092509939
> 11.5      0.108103081
>
> There's a general trend of increasing epsilon_r with an increase in k.
> There's also a jump of a factor of 2 between 7.5 and 8. Why? Because there's
> a glitch there, and the glitch adds high-R structure.

Well, except for that jump (which I would say is appropriate, as the
spike adds weight at all frequencies), I'd say epsilon_r is pretty
constant, varying by about 10% (not bad for a crude estimate) up to k=11.
|chi(k)| drops considerably over that range, possibly to well below
the noise level by k=10.  So the higher end there is clearly not going
to help the fit -- all you're adding is noise.

> To make sure there wasn't something odd about this particular chi(k), I took
> one of the data sets included with the horae distribution: the file y300.chi
> in the ybco folder.
> I followed the same procedure as before, except I stepped by 1 inverse
> angstrom each time, because of the greater data range.
> kmax      epsilon_r
> 7         0.012866125
> 8         0.073383695
> 9         0.078255772
> 10        0.080016040
> 11        0.091634572
> 12        0.105419473
> 13        0.164341701
> 14        0.195266957
> 15        0.224727593
> 16        0.411139882
> 17        0.480293296
> If anything, the trend is more clear here.

Between 8 and 12 Ang^-1 there is what I would call a small change.
You're certainly adding more noise and progressively less signal as
you increase k, even for a noise level in chi(k) that does not depend
on k.   There are sharp features that could easily be considered
"white noise".  But I don't strongly disagree either -- epsilon_r does
definitely increase as you increase the k-range.
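
To illustrate why a k-independent noise level in chi(k) can still give
an epsilon_r that grows with kmax, here is a minimal sketch of an
idealized case: perfectly white noise in chi(k), a rectangular window,
and k-weight w.  The mean-square noise in the weighted spectrum then
scales with the integral of k^(2w) from kmin to kmax.  This is not
Ifeffit's calculation; the function name is hypothetical, the Hanning
window is ignored, and only ratios are computed so that no particular
normalization convention is assumed.  Measured values like those in the
tables above need not follow this scaling.

```python
import math

def white_noise_eps_r_ratio(kmax_a, kmax_b, kmin=2.0, kweight=3):
    """Ratio eps_r(kmax_b) / eps_r(kmax_a) for k-independent white noise.

    Idealized assumption: the mean-square noise of k**kweight * chi(k)
    summed over [kmin, kmax] is proportional to the integral of
    k**(2*kweight), i.e. to kmax**p - kmin**p with p = 2*kweight + 1.
    eps_r is taken as the square root of that quantity.
    """
    p = 2 * kweight + 1
    return math.sqrt((kmax_b**p - kmin**p) / (kmax_a**p - kmin**p))

# Growth of the idealized eps_r relative to kmax = 8, for k-weight 3.
for kmax in (8, 9, 10, 11):
    ratio = white_noise_eps_r_ratio(8, kmax)
    print(f"kmax = {kmax:2d}: eps_r / eps_r(kmax=8) = {ratio:.2f}")
```

With kweight=0 the same function reduces to the square root of the
ratio of k-range widths, which is the usual "more points, more total
noise" behavior without any weighting.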

> I find it confusing that you expect  the noise in the data to
> depend (strongly, even) on k, but not on R.    The general wisdom is that
> the estimate of epsilon from the high-R components is too low,
> suggesting that the R dependence is significant.    Every time I've looked,
> I come to the conclusion that noise in data is either consistent
> with "white" or so small as to be difficult to measure.  I believe
> Corwin Booth's recent work supports the conventional wisdom that
>  epsilon decreases with R, but I don't recall it suggesting a significant
> k dependence.
>
> I'm not making any claims as to whether, in general, the noise in the data
> depends on R. I can speculate about circumstances where low R noise is
> greater (due, for instance, to temperature fluctuations in cooling water,
> which are likely to be fairly slow), or where high R noise is greater (an
> example here would be if whatever system is keeping the beam on the sample
> vertically as the mono scans is tending to overshoot).
> But Ifeffit's estimation of epsilon_r demonstrably does not depend on the
> R-range used for fitting, regardless of the distribution of noise in R.
> That's a very different thing. Thus, changing the R-range of a fit is
> completely safe as far as comparing RCS goes.

Ah, OK, I think I see what you were getting at.     But I think
epsilon_r and epsilon_k are still good enough for using reduced
chi-square to compare fits of different k- and R-ranges.    If
anything, the estimate of the number of independent points is much
cruder than the estimate of epsilon.
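
To make the comparison concrete, here is a minimal sketch of how the
number of independent points and epsilon both enter reduced chi-square.
It assumes the Nyquist-style estimate N_idp = 2 * dk * dR / pi (some
conventions add 1 or 2, exposed here as `offset`) and a chi-square
normalized as (N_idp / N_pts) * sum((residual / epsilon)^2).  The
function names, the fit ranges, and the residuals are all hypothetical
and chosen only for illustration.

```python
import math

def n_independent(kmin, kmax, rmin, rmax, offset=0.0):
    """Nyquist-style estimate of the number of independent points.

    offset is 0, 1, or 2 depending on the convention; which one a given
    program uses is not assumed here.
    """
    return 2.0 * (kmax - kmin) * (rmax - rmin) / math.pi + offset

def reduced_chi_square(residuals, epsilon, n_idp, n_var):
    """Reduced chi-square under the normalization assumed above."""
    n_pts = len(residuals)
    chi2 = (n_idp / n_pts) * sum((r / epsilon) ** 2 for r in residuals)
    return chi2 / (n_idp - n_var)

# Hypothetical fit: k = 2..8 Ang^-1, R = 1..3 Ang, 4 variables,
# made-up residuals.
residuals = [0.05, -0.08, 0.11, -0.04, 0.09, -0.10, 0.06, -0.07]

# Effect of the N_idp convention at fixed epsilon.
for offset in (0.0, 1.0, 2.0):
    n_idp = n_independent(2.0, 8.0, 1.0, 3.0, offset)
    rcs = reduced_chi_square(residuals, 0.087, n_idp, n_var=4)
    print(f"offset = {offset:.0f}: N_idp = {n_idp:.2f}, RCS = {rcs:.3f}")

# Effect of a 10% change in epsilon at fixed N_idp.
n_idp = n_independent(2.0, 8.0, 1.0, 3.0)
for epsilon in (0.087 * 0.9, 0.087, 0.087 * 1.1):
    rcs = reduced_chi_square(residuals, epsilon, n_idp, n_var=4)
    print(f"epsilon = {epsilon:.4f}: RCS = {rcs:.3f}")
```

When the number of variables is a sizeable fraction of N_idp, the
choice of offset changes the degrees of freedom noticeably, so the two
loops give a feel for how each uncertainty propagates.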

--Matt