[Ifeffit] Question about transform windows and statistical parameters
newville at cars.uchicago.edu
Fri May 13 12:32:14 CDT 2011
Sorry, I read epsilon as "noise in chi(k)". This is the most
meaningful physical/statistical measure: epsilon_r surely depends on
k-weight and can depend on k-range as it samples different portions of
the spectra. Like you say, it will tend to increase as you increase
On Fri, May 13, 2011 at 11:58 AM, Scott Calvin
<dr.scott.calvin at gmail.com> wrote:
> On May 13, 2011, at 8:39 AM, Matt Newville wrote:
> After all, the epsilon should be different for different k-ranges, as your
> signal to noise ratio probably changes as a function of k. Using the same
> epsilon doesn't reflect that.
> Without seeing the data in question, this seems like speculation to me. I'm
> not at all sure why epsilon (the variation in chi(k)) should depend strongly
> on the k-range. In my experience, it usually does not. The S/N ratio will
> surely change with k, but that would surely be dominated by the rapid decay
> in |chi(k)|, rather than a change in epsilon.
> I'm confused. We Fourier transform k-weighted data. Since Ifeffit uses the
> high-R amplitude to estimate uncertainty, it seems to me that what matters
> is signal-to-noise, not just noise in the original unweighted chi(k). Am I
> wrong in that? I may be misunderstanding how epsilon_r is calculated. And
> epsilon_r is the relevant epsilon for a fit in R space, right?
> I think your assumption that epsilon will depend strongly on k may
> not be correct. Do you have evidence for this? I would say that it is
> not strongly dependent on k, and that reduced chi-square is useful
> in comparing fits with different k-ranges.
> I just tried it on the FeC2O4 chi(k) attached to this post. It's a good
> example of data where it's not immediately clear to me what the "best" value
> for kmax is, so it would be tempting to use RCS to compare fits over
> different k-ranges. I used k-weight 3, and Hanning windows with dk = 1. I
> chose kmin as 2 and stepped kmax by 0.5, recording epsilon_r for each:
> kmax   epsilon_r
>  7     0.034840105
>  7.5   0.041843848
>  8     0.082627337
>  8.5   0.087550367
>  9     0.086032007
>  9.5   0.085996216
> 10     0.088679339
> 10.5   0.090364699
> 11     0.092509939
> 11.5   0.108103081
> There's a general trend of increasing epsilon_r with an increase in k.
> There's also a jump of a factor of 2 between 7.5 and 8. Why? Because there's
> a glitch there, and the glitch adds high-R structure.
Well, except for that jump (which I would say is appropriate, as the
spike adds weight at all frequencies), I'd say epsilon_r is pretty
constant, varying by 10% (not bad for a crude estimate) up to k=11.
|chi(k)| drops considerably over that range, possibly to well below
the noise level by k=10. So the higher end there is clearly not going
to help the fit -- all you're adding is noise.
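
As a quick check on the numbers quoted above, here is a minimal sketch that recomputes the size of the jump across the glitch and the relative spread of epsilon_r for kmax between 8 and 11. The values are copied from the FeC2O4 table; the variable names are only illustrative.

```python
# epsilon_r versus kmax, copied from the FeC2O4 table above
# (k-weight 3, Hanning windows with dk = 1, kmin = 2).
eps_r = {
    7.0: 0.034840105,
    7.5: 0.041843848,
    8.0: 0.082627337,
    8.5: 0.087550367,
    9.0: 0.086032007,
    9.5: 0.085996216,
    10.0: 0.088679339,
    10.5: 0.090364699,
    11.0: 0.092509939,
    11.5: 0.108103081,
}

# Factor by which epsilon_r rises when kmax moves past the glitch.
jump = eps_r[8.0] / eps_r[7.5]

# Relative spread of epsilon_r for kmax from 8 to 11 inclusive.
plateau = [v for kmax, v in eps_r.items() if 8.0 <= kmax <= 11.0]
spread = (max(plateau) - min(plateau)) / min(plateau)

print(f"jump between kmax = 7.5 and 8: factor of {jump:.2f}")
print(f"spread for kmax = 8 to 11: {spread:.1%}")
```

The jump comes out close to a factor of 2, and the spread over kmax = 8 to 11 is on the order of 10%.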
> To make sure there wasn't something odd about this particular chi(k), I took
> one of the data sets included with the horae distribution: the file y300.chi
> in the ybco folder.
> I followed the same procedure as before, except I stepped by 1 inverse
> angstrom each time, because of the greater data range.
> kmax   epsilon_r
>  7     0.012866125
>  8     0.073383695
>  9     0.078255772
> 10     0.080016040
> 11     0.091634572
> 12     0.105419473
> 13     0.164341701
> 14     0.195266957
> 15     0.224727593
> 16     0.411139882
> 17     0.480293296
> If anything, the trend is more clear here.
Between 8 and 12 Ang^-1 there is what I would call a small change.
You're certainly adding more noise and progressively less signal as
you increase k, even for a noise level in chi(k) that does not depend
on k. There are sharp features that could easily be considered
"white noise". But I don't strongly disagree either -- epsilon_r does
definitely increase as you increase the k-range.
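
To illustrate why k-weighted noise grows with the k-range even when the noise in chi(k) itself is constant, here is a rough sketch. It assumes idealized white noise of size eps_k in chi(k), ignores the window function, and uses Parseval's theorem: the total power in R space is proportional to the sum of squares of k^w * noise over the transform range. This is not Ifeffit's actual epsilon_r calculation, and the function name, grid spacing, and normalization are assumptions made for illustration only.

```python
import numpy as np

def kweighted_noise_amplitude(kmin, kmax, kweight, eps_k=1.0, dk=0.05):
    """Relative R-space amplitude of k-weighted white noise.

    Assumes constant noise eps_k in chi(k) and no window function.
    Only ratios between calls are meaningful; the absolute scale
    depends on the Fourier transform convention.
    """
    k = np.arange(kmin, kmax + dk / 2, dk)
    return eps_k * np.sqrt(np.sum(k ** (2 * kweight)) * dk)

# Same settings as the tests in this thread: k-weight 3, kmin = 2.
reference = kweighted_noise_amplitude(2.0, 8.0, 3)
ratios = {
    kmax: kweighted_noise_amplitude(2.0, kmax, 3) / reference
    for kmax in (8, 10, 12, 14, 16)
}

for kmax, ratio in ratios.items():
    print(f"kmax = {kmax:2d}: noise amplitude relative to kmax = 8: {ratio:.2f}")
```

Under these assumptions the amplitude scales roughly as sqrt((kmax^(2w+1) - kmin^(2w+1)) / (2w+1)), so some increase with kmax is expected for any k-weight above zero even with perfectly white noise in chi(k).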
> I find it confusing that you expect the noise in the data to
> depend (strongly, even) on k, but not on R. The general wisdom is that
> the estimate of epsilon from the high-R components is too low,
> suggesting that the R dependence is significant. Every time I've looked,
> I come to the conclusion that noise in data is either consistent
> with "white" or so small as to be difficult to measure. I believe
> Corwin Booth's recent work supports the conventional wisdom that
> epsilon decreases with R, but I don't recall it suggesting a significant
> k dependence.
> I'm not making any claims as to whether, in general, the noise in the data
> depends on R. I can speculate about circumstances where low R noise is
> greater (due, for instance, to temperature fluctuations in cooling water,
> which are likely to be fairly slow), or where high R noise is greater (an
> example here would be if whatever system is keeping the beam on the sample
> vertically as the mono scans is tending to overshoot).
> But Ifeffit's estimation of epsilon_r demonstrably does not depend on the
> R-range used for fitting, regardless of the distribution of noise in R.
> That's a very different thing. Thus, changing the R-range of a fit is
> completely safe as far as comparing RCS goes.
Ah, OK, I think I see what you were getting at. But I think
epsilon_r and epsilon_k are still roughly good enough for using reduced
chi-square to compare fits over different k- and R-ranges. If
anything, the estimate of the number of independent points is much
cruder than the estimate of epsilon.