Hi Scott,

On Mon, 7 Jul 2003, Scott Calvin wrote:
> In any case, I don't really understand the logic of averaging scans based on some estimate of the noise. For that to be appropriate, you'd have to believe there was some systematic difference in the noise between scans. What's causing that difference, if they're collected on the same beamline on the same sample? (Or did I misunderstand Michel's comment -- was he talking about averaging data from different beamlines or something?)
>
> If there are no systematic changes during data collection, then the noise level should be the same, and any attempt to weight by some proxy for the actual noise will actually decrease the statistical content of the averaged data by overweighting some scans (i.e., random fluctuations in the quantity used to estimate the uncertainty will cause some scans to dominate the average, which is not ideal if the actual noise level is the same). If, on the other hand, there is a systematic difference between subsequent scans, it is fairly unlikely to be "white," and thus will not be addressed by this scheme anyway. Perhaps one of you can give me examples where this kind of variation in data quality is found.
Using a solid-state detector with low-concentration samples, it's common to do a couple of scans counting for a few seconds per point, then more scans counting for a longer time per point (say, first 3 sec/pt, then 10 sec/pt). The data is typically better with the longer counting time (not always by the square root of the counting time), but you want to use all the noisy data you have. In such a case, a weighted average based on after-the-fact data quality would be useful.
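To be concrete about the bookkeeping I have in mind, here is a rough sketch with plain numpy. The arrays and the noise values are made up just to show the arithmetic -- the eps values stand in for whatever estimate you trust (the high-R estimate, counting statistics, ...), and none of the names refer to anything Ifeffit actually does:

    import numpy as np

    # Hypothetical example: two scans of the same sample on a common grid,
    # one counted 3 sec/pt (noisier) and one 10 sec/pt.
    npts = 401
    x = np.linspace(0.0, 20.0, npts)
    signal = np.sin(x) * np.exp(-0.1 * x)
    scan_3sec  = signal + np.random.normal(scale=0.05, size=npts)
    scan_10sec = signal + np.random.normal(scale=0.03, size=npts)

    scans = [scan_3sec, scan_10sec]
    eps = np.array([0.05, 0.03])       # estimated noise of each scan

    # inverse-variance weights, normalized to sum to 1
    w = 1.0 / eps**2
    w = w / w.sum()

    weighted_ave = sum(wi * s for wi, s in zip(w, scans))
    simple_ave = sum(scans) / len(scans)

    # expected noise in each average, if the eps values are believed:
    print("weighted:", 1.0 / np.sqrt(np.sum(1.0 / eps**2)))   # ~0.026
    print("simple:  ", np.sqrt(np.sum(eps**2)) / len(eps))    # ~0.029

With weights proportional to 1/eps^2, the expected noise in the average is 1/sqrt(sum(1/eps_i^2)), versus sqrt(sum(eps_i^2))/N for the simple average; the two agree only when all the eps_i are equal, which is also the case where equal weights are optimal.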
> So right now I don't see the benefit to this method. Particularly if it's automated, I hesitate to add hidden complexity to my data reduction without a clear rationale for it.
I can't see a case where it would be obviously better to do the simple average than to average using weights from the estimated noise. For very noisy data (such as the 3 sec/pt scan and the 10 sec/pt scan), the simple average is almost certainly worse. Anyway, it seems simple enough to allow either option and/or to allow overriding the high-R estimate of the noise.

Maybe I'm just not understanding your objection to using the high-R estimate of the data noise, but I don't see how a weighted average would "actually decrease the statistical content of the averaged data" unless the high-R estimate of the noise is pathologically *way* off, which I don't think it is (I think it's generally a little low, but reasonable). If two spectra give different high-R estimates of the noise, saying they actually have the same noise seems pretty bold.

Of course, one can assert that the data should be evenly weighted, or assert that they should be weighted by their individually estimated noise. Either way, something is being asserted about different measurements in order to treat them as one. Whatever average is used, it is that assertion that is probably the most questionable, complex, and in need of a rationale. So maybe it is better to make the assertion explicit and provide options for how it is done.

--Matt
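P.S. In case code says it better, here is a rough sketch of the weighting I have in mind, again in plain numpy. The function and variable names are made up, and the noise measure is just the rms of |chi(R)| well above the structured region -- not Ifeffit's exact epsilon_k formula. Since the weights get normalized, only the relative noise between scans matters, so the overall Parseval scale factor drops out as long as every scan is transformed the same way:

    import numpy as np

    def high_r_noise(chi_r_mag, r, rmin=15.0, rmax=25.0):
        # rms of |chi(R)| in a region with no structural signal --
        # a relative measure of the white-noise level for this scan
        mask = (r >= rmin) & (r <= rmax)
        return np.sqrt(np.mean(chi_r_mag[mask]**2))

    def weighted_average(scans, eps):
        # inverse-variance average of scans (arrays on a common grid)
        w = 1.0 / np.asarray(eps)**2
        w = w / w.sum()
        return sum(wi * s for wi, s in zip(w, scans))

    # intended use, with chi_k_list, chi_r_list, and r coming from
    # your own data reduction:
    #   eps = [high_r_noise(np.abs(chi_r), r) for chi_r in chi_r_list]
    #   ave = weighted_average(chi_k_list, eps)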