Scott,
On Thu, May 12, 2011 at 10:51 AM, Scott Calvin wrote:

Hi Brandon,

Matt and Bruce both gave good, thorough answers to your questions this morning. Nevertheless, I'm going to chime in too, because there are some aspects of this issue I'd like to emphasize.

On May 11, 2011, at 8:46 PM, Brandon Reese wrote:
I tried your suggestion with epsilon and the chi-square values came out to be very similar values with the different windows. Does this mean that reporting reduced chi-square values in a paper that compared several data sets would not be necessary and/or appropriate?
Bruce said "no" emphatically, and I say "yes," but I think we've understood the question differently. As Bruce says:
Of course, reduced chi-square can only be compared for fitting models which compute epsilon the same way or use the same value for epsilon.
That's the key point. I've gotten away from reporting values for reduced chi-square (RCS). That's a personal choice, and is not in accord with the International X-Ray Absorption Society's Error Reporting Recommendation, available here: http://ixs.iit.edu/subcommittee_reports/sc

I think the difficulty in choosing epsilon is more likely to make a reduced chi-square number confusing than enlightening. But I am moving increasingly toward reporting changes in reduced chi-square between fits on the same data, and applying Hamilton's test to determine whether improvements are statistically significant.
Well, the Error Reporting Recommendation is a minimal recommendation. Going beyond it by comparing fits with Hamilton's test or other statistical tests is not at all against the spirit of that document. Such tests are highly useful; they're just hard to apply to a single fit. That seems very different from saying that reporting reduced chi-square is not necessary or appropriate, which could be taken to mean that reporting no statistical analysis is better than reporting reduced chi-square.
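To make the comparison concrete, here is a sketch of a Hamilton-style F-test between a simpler fit and one with more variables. The function name and all the numbers are illustrative, not from any fit in this thread; the only requirement, as Bruce says, is that both fits use the same data and the same epsilon.

```python
from scipy.stats import f as f_dist

def hamilton_test(chi2_simple, nvar_simple, chi2_complex, nvar_complex, n_idp):
    """Hamilton's F-test: is the drop in chi-square from adding
    variables statistically significant?  Both fits must be on the
    same data with the same epsilon."""
    dof = n_idp - nvar_complex          # degrees of freedom of the bigger model
    extra = nvar_complex - nvar_simple  # number of added variables
    f_stat = ((chi2_simple - chi2_complex) / extra) / (chi2_complex / dof)
    # probability that an improvement this large arises by chance
    return f_dist.sf(f_stat, extra, dof)

# hypothetical numbers: one added variable drops chi-square from
# 120 to 80, with 18 independent points in the data
p = hamilton_test(120.0, 5, 80.0, 6, 18)
```

If p comes out small (say, below 0.05), the added variable is doing statistically meaningful work; otherwise the simpler model is preferred.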
Would setting a value for epsilon allow comparisons across different k-ranges, different (but similar) data sets, or a combination of the two using the chi-square parameter?
Maybe not. After all, epsilon should be different for different k-ranges, since your signal-to-noise ratio probably changes as a function of k. Using the same epsilon doesn't reflect that.
Without seeing the data in question, this seems like speculation to me. I'm not at all sure why epsilon (the noise in chi(k)) should depend strongly on the k-range; in my experience, it usually does not. The S/N ratio will certainly change with k, but that change is dominated by the rapid decay of |chi(k)|, not by a change in epsilon.
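A toy illustration of that point with synthetic data (the signal shape, noise level, and grid below are all made up, and real analysis codes apply windows and k-weights before transforming): for white noise, the high-frequency part of the transform, where the structural signal has decayed away, recovers epsilon directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dk = 400, 0.05
k = np.arange(n) * dk                 # k grid out to 20 inverse Angstroms
eps_true = 0.05                       # made-up white-noise level
# an EXAFS-like decaying oscillation plus white noise
chi = 0.5 * np.exp(-0.3 * k) * np.sin(5.0 * k) + rng.normal(0.0, eps_true, n)

spec = np.fft.fft(chi)
# the upper-frequency part of the spectrum: the structural signal has
# decayed away there, leaving only the flat (white) noise floor
floor = np.abs(spec[n // 4 : n // 2]) ** 2
eps_est = np.sqrt(floor.mean() / n)   # for white noise, E|FFT|^2 = n * eps^2
```

On measured data the same idea underlies estimating epsilon from the high-R portion of |chi(R)|.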
In playing around with different windows and dk values my fit variables generally stayed within the error bars, but the size of the error bars could change by more than a factor of 2. Does this mean that it would make sense to find a window/dk that seems to "work" for a given group of data and stay consistent when analyzing that data group?
The fact that your variables stay within the error bars is good news. The change in the size of the error bars may be related to a less-than-ideal value of dk for the Kaiser-Bessel window. But yes, find a window and dk combination that seems to work well and then stay consistent for that analysis. Unless the data is particularly problematic, I'd prefer making a reasoned choice before beginning to fit and then sticking with it; a posteriori choices for that kind of thing make me a little nervous.
* * *

At the risk of being redundant, two quick examples.

Example 1: You change the range of R values in the Fourier transform over which you are fitting a data set. For this example, RCS is a valuable statistic for letting you know whether the fit supports the change in R-range.
Example 2: You change the range of k values over which you are fitting your data. For this example, comparing RCS is unlikely to be useful. You are likely trying different k-ranges because you are suspicious about some of the data at the extremes of your range. Including or excluding that data likely implies epsilon should be changed, but by how much?
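For concreteness, the statistic being compared in these examples is reduced chi-square in the usual Ifeffit-style definition, sketched here with made-up arrays and numbers:

```python
import numpy as np

def reduced_chi_square(data, model, eps, n_idp, n_var):
    """Ifeffit-style reduced chi-square.

    eps    : the measurement uncertainty (the epsilon discussed above)
    n_idp  : number of independent points in the data
    n_var  : number of fitted variables
    """
    resid = (data - model) / eps
    chi2 = (n_idp / data.size) * np.sum(resid ** 2)
    return chi2 / (n_idp - n_var)

# made-up numbers: residuals of exactly one epsilon everywhere
data = np.array([1.0, 2.0, 3.0, 4.0])
model = data + 0.01
rcs = reduced_chi_square(data, model, eps=0.01, n_idp=4, n_var=2)  # -> 2.0
```

Note that rescaling eps rescales both chi-square and RCS, which is exactly why comparisons are only meaningful when epsilon is computed the same way for both fits.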
I think your assumption that epsilon will depend strongly on k may not be correct. Do you have evidence for this? I would say that it is not strongly dependent on k, and that reduced chi-square is useful for comparing fits with different k-ranges.

I find it confusing that you expect the noise in the data to depend (strongly, even) on k, but not on R. The general wisdom is that the estimate of epsilon from the high-R components is too low, suggesting that the R dependence is significant. Every time I've looked, I've come to the conclusion that the noise in the data is either consistent with "white" noise or so small as to be difficult to measure. I believe Corwin Booth's recent work supports the conventional wisdom that epsilon decreases with R, but I don't recall it suggesting a significant k dependence. It would be interesting to have more evidence.

--Matt