[Ifeffit] measurement uncertainty
newville at cars.uchicago.edu
Tue Oct 22 11:58:23 CDT 2002
I agree 99.99% with Bruce's explanation, and just want to clarify
a couple of points and go on about a few other aspects. As Bruce
points out, the estimated error bars are rescaled because the
automated estimate of the uncertainty in the data is almost
always too small: the reported reduced chi-square for a good fit
often exceeds 20 (whereas it should be close to 1).
Bruce also points out (again, correctly) that we have very little
experience training monkeys. Sorry, I couldn't resist.
Chi-square, Reduced Chi-square, and the R-factor can be used to
determine whether a fit is 'good'. They can certainly be used to
compare two fits to decide which is 'better'.
In Statistics 101 (i.e., a first pass at discussing Data and Error
Analysis), Chi-square is used to estimate uncertainties in fitted
parameters, provided the uncertainties in the data are known.
That is, in Statistics 101, Chi-square has two very different
purposes: 1) Is the fit good? and 2) What are the uncertainties
in the parameters? Since we don't know the uncertainties in the
data (which Statistics 101 happily ignores), we have to do the
best we can. The approach used by ifeffit is common and
carefully critiqued in Numerical Recipes by Press, et al.
This approach definitely leads to the caution Bruce expressed
about using some judgment about whether to trust fit results.
The rescaling means that the reported uncertainties in the
parameters are valid *IF* the fit is "good". That seems
reasonable because if the fit is not "good", you probably don't
care that the reported uncertainties are not good either.
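
The rescaling described above can be sketched in a few lines. This is
only an illustration of the general technique (scale the error bars by
the square root of reduced chi-square), not Ifeffit's actual code; the
function name and the numbers are hypothetical.

```python
import math

def rescale_uncertainties(raw_errors, chi_square, n_idp, n_params):
    """Return (rescaled_errors, reduced_chi_square).

    raw_errors are parameter uncertainties computed with the original
    (possibly too small) estimate of epsilon. Hypothetical helper.
    """
    dof = n_idp - n_params  # degrees of freedom
    if dof <= 0:
        raise ValueError("need more independent points than parameters")
    reduced = chi_square / dof
    # Inflating epsilon by sqrt(reduced) forces chi-square down to dof,
    # and the parameter error bars grow by the same factor.
    scale = math.sqrt(reduced)
    return [err * scale for err in raw_errors], reduced

# Made-up example: chi-square = 200 with 15 independent points
# and 5 fitted parameters.
errors, reduced = rescale_uncertainties([0.010, 0.020], 200.0, 15, 5)
print(reduced, errors)
```

Because the scale factor is built from the fit's own chi-square, the
rescaled error bars only mean something if the fit is believed to be
good in the first place.
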
Incidentally, the Statistics 101 view of Chi-square also glosses
over the notion of what counts as a 'data point'. That leads to
the whole idea of Number of Independent Points, N_idp, which
would set a maximum number of fittable parameters and would go
into the Chi-square equation, and has led to lots of discussion
in the EXAFS community. If non-rescaled Chi-square is used to
estimate uncertainties, N_idp would also affect the error bars.
Rescaling error bars to assert that the fit is good (that is,
scaling epsilon so that Chi-square = N_idp - N_Parameters)
actually lessens the dependence of the error bars on N_idp.
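
For reference, a commonly quoted estimate is N_idp ~ 2 * dk * dR / pi,
where dk and dR are the widths of the k and R ranges used in the fit.
The sketch below assumes that form; whether a small constant (such as
+1 or +2) should be added is part of the discussion mentioned above and
is deliberately left out. The function name is hypothetical.

```python
import math

def n_independent_points(k_min, k_max, r_min, r_max):
    """Estimate N_idp as 2 * dk * dR / pi (assumed form, no offset)."""
    dk = k_max - k_min  # width of the k range, in inverse Angstroms
    dr = r_max - r_min  # width of the R range, in Angstroms
    return 2.0 * dk * dr / math.pi

# Made-up ranges: k from 3 to 13, R from 1 to 3.
n_idp = n_independent_points(3.0, 13.0, 1.0, 3.0)
n_params = 5
print(n_idp, n_idp - n_params)  # N_idp and the degrees of freedom
```
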
The 'white-noise' estimate of epsilon_R that Ifeffit (and Feffit)
does is very easy, and usually not too far off for white-noise
(that is, the portion of the noise that is independent of R).
It actually works reasonably well for very noisy data. We all do
the best we can to avoid that situation!!
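
A rough sketch of a white-noise estimate follows. It assumes that
epsilon_R is taken as the RMS of |chi(R)| over a high-R window (15 to
25 Angstroms here) where no real signal is expected, and it assumes a
Fourier transform normalized as dk / sqrt(pi) with no window function.
Both the window and the normalization are assumptions made for
illustration, not a statement of what Ifeffit does internally, and the
function names are hypothetical.

```python
import cmath
import math
import random

def ft_magnitude(chi_k, dk, r_values):
    """|chi(R)| from a direct sum of chi(k) * exp(2ikR) on a uniform
    k grid starting at k = 0 (assumed normalization: dk / sqrt(pi))."""
    mags = []
    for r in r_values:
        total = sum(c * cmath.exp(2j * (i * dk) * r)
                    for i, c in enumerate(chi_k))
        mags.append(abs(total) * dk / math.sqrt(math.pi))
    return mags

def white_noise_epsilon_r(chi_k, dk, r_lo=15.0, r_hi=25.0, n_r=50):
    """RMS of |chi(R)| over [r_lo, r_hi], used as the noise estimate."""
    r_values = [r_lo + (r_hi - r_lo) * j / (n_r - 1) for j in range(n_r)]
    mags = ft_magnitude(chi_k, dk, r_values)
    return math.sqrt(sum(m * m for m in mags) / len(mags))

# Synthetic data: one shell near R = 2 plus Gaussian white noise.
random.seed(0)
dk, n_k, sigma = 0.05, 300, 0.01
chi_k = [0.1 * math.sin(2 * (i * dk) * 2.0) + random.gauss(0.0, sigma)
         for i in range(n_k)]
print(white_noise_epsilon_r(chi_k, dk))
```

Since white noise has the same amplitude at all R, the level measured
at high R stands in for the noise under the low-R signal. Anything
that is not white (for example a slowly varying background error) is
invisible to this estimate.
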
Bruce gave the usual argument that the white-noise estimate
doesn't include systematic errors. I would put a slight
variation on this: It does include systematic and statistical
errors that are 'white', but doesn't include statistical or
systematic errors that are not white. There has been a lot of
speculation in the EXAFS community about the importance of
systematic errors. Many have suggested that systematic errors
are dominated by bad background-removal. I'm not sure I agree,
but this would certainly count as a non-white, systematic error.
Glitches are systematic errors that have a fairly large component
that is white. I don't think anyone really has a complete handle
on this topic, or indeed why EXAFS fits tend to be much worse
than white-noise would predict. Blaming the Feff calculations is
another popular option!!
In principle, a Bayesian approach could help, but I don't think
that it would magically give better error bar estimates. In a
Bayesian approach, we would need to put uncertainties on the Feff
calculations too -- a good idea, but not trivial to do. That
being said, if anyone has any ideas of a better approach or even
a robust alternative, please let me know!!