Hi Matt,

I think that your reply sounds reasonable. Let's talk some more about this.
> The other side of this is that I think it's more difficult than generally acknowledged to over-interpret data. Most of the cases I've seen are due to blatantly ignoring the confidence limits or to doing completely unfair things like fitting S02 to get 0.9 +/- 0.1, then fixing S02 to 0.90, fitting N, and claiming N to better than 10%. Those are serious, but they are the normal mistakes in analysis that can happen with any data.
If you fix S02 and then fit N and get an error, let's call it dN, then the more generous estimate of the uncertainty for N comes from d(S02*N) = dS02*N + S02*dN. How would you estimate the error for N if you fix S02, R, and sigma2?

I will try to be more specific about my original problem. Let's say that the original data set (s3) gives these results for the Ca shell:

Ncas3 = 3.4 +/- 0.9
Rcas3 = 4.01 +/- 0.01
sigma2cas3 = 0.006 +/- 0.3
s02s3 = 1.05 +/- 0.10

Now I run another fit in which I fix all the parameters except for Nca. This time I fit three data sets (s3, s4, and s5) together, and I get these results:

Ncas3 = 3.2 +/- 0.1
Ncas4 = 2.5 +/- 0.4
Ncas5 = 2.0 +/- 0.4

What would you think would be an "appropriate" uncertainty? Surely 0.1 is too small for Ncas3, but 0.9 seems rather huge. So we have some limits, but can I do any better than that? The difference between these two extremes is quite profound: in one case we see a change in the value of Nca, in the other we see nothing.

Shelly
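To make that propagation concrete, here is a minimal Python sketch using the numbers above. The linear sum is the "generous" worst case described in the message; adding the two terms in quadrature would be the usual independent-error estimate. The function name is just for illustration, not part of Ifeffit:

    def n_uncertainty(n, dn, s02, ds02):
        """Fold the uncertainty of a fixed S02 back into N via the
        linear (worst-case) rule d(S02*N) = dS02*N + S02*dN."""
        return (ds02 * n + s02 * dn) / s02

    # Constrained-fit value Ncas3 = 3.2 +/- 0.1, with S02 fixed at
    # 1.05 +/- 0.10 from the single-data-set fit:
    print(n_uncertainty(n=3.2, dn=0.1, s02=1.05, ds02=0.10))  # ~0.40

By this rule the propagated uncertainty, about 0.4, lands between the two extremes of 0.1 and 0.9 quoted above.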
-----Original Message-----
From: Newville, Matthew G.
Sent: Friday, November 22, 2002 10:17 AM
To: 'ifeffit@millenia.cars.aps.anl.gov'
Subject: Re: [Ifeffit] Double dipping
Hi Shelly,
> I would like to take a bunch of parameters (all of them that do not vary significantly from the known, as determined by individual fits to each data set) from this "known" data set and then fit the series including the "original" to see how a particular parameter (the number of Ca atoms) changes as the ratio of U to Ca changes. I would like to hear a discussion on the fairness of this approach and on how you would calculate the number of independent points.
I think what you propose is fair, but you might need to be a little more specific....
In general, I think it's fair to use parameters from one fit in a fit to another data set. Once one accepts the fact that there is a limited amount of information, getting hung up on trying to count or enforce N_idp becomes pointless -- the data and estimated uncertainties __WILL__ indicate how many parameters can be determined with any confidence.
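As an illustration of the kind of constrained, multi-data-set fit being discussed, here is a toy sketch with scipy. A crude damped sine stands in for a Feff path, the noise and "true" values are made up, and none of this is Ifeffit's actual API -- it only shows the structure: one free N per data set, with S02 and sigma2 held fixed at values from an earlier fit.

    import numpy as np
    from scipy.optimize import least_squares

    def model(k, s02, n, sigma2):
        # Crude stand-in for a single EXAFS shell at R = 4.0.
        return s02 * n * np.exp(-2 * sigma2 * k**2) * np.sin(2 * 4.0 * k)

    rng = np.random.default_rng(1)
    k = np.linspace(3, 12, 200)
    data = [model(k, 1.05, n, 0.006) + 0.05 * rng.standard_normal(k.size)
            for n in (3.2, 2.5, 2.0)]          # three synthetic data sets

    S02_FIXED, SIGMA2_FIXED = 1.05, 0.006      # taken from the earlier fit

    def residuals(params):
        # One free N per data set; the shared parameters stay fixed.
        return np.concatenate(
            [model(k, S02_FIXED, n, SIGMA2_FIXED) - d
             for n, d in zip(params, data)])

    fit = least_squares(residuals, x0=[3.0, 3.0, 3.0])
    print(fit.x)   # per-data-set N with the shared parameters fixed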
N_idp is one statistic; it should not be construed as an exact value nor as a hard upper limit on how much information can be extracted from the data. Any attempt to use N_idp this way is abuse.
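For context, N_idp here is the usual Nyquist-style estimate, roughly 2*dk*dR/pi (conventions differ on small additive terms). A minimal sketch, with made-up fit ranges rather than Shelly's:

    import math

    def n_idp(kmin, kmax, rmin, rmax):
        """Rough Nyquist estimate of independent points in an EXAFS
        fit: N_idp ~ 2 * (kmax - kmin) * (rmax - rmin) / pi."""
        return 2.0 * (kmax - kmin) * (rmax - rmin) / math.pi

    # Hypothetical ranges, for illustration only:
    print(n_idp(kmin=3.0, kmax=12.0, rmin=1.0, rmax=4.0))  # ~17.2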
The classic test is whether 100 very noisy scans (say, because data was collected for 0.01 s per point) have 100x more 'independent points' than 1 scan collected for 1.0 s per point. The 100 noisy scans __ARE__ independent, but when added together they give exactly the same result as the 1 clean scan. So more independent data does not mean you can determine more parameters.
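A quick numerical illustration of that point, with made-up noise levels and assuming the noise scales as 1/sqrt(dwell time):

    import numpy as np

    rng = np.random.default_rng(0)
    signal = 1.0        # stand-in for chi at one data point
    sigma_noisy = 0.10  # noise of one 0.01 s/point scan
    sigma_clean = 0.01  # noise of one 1.0 s/point scan (10x cleaner)

    # Average 100 independent noisy scans:
    scans = signal + sigma_noisy * rng.standard_normal(100)
    print(scans.mean())                              # close to 1.0
    print(sigma_noisy / np.sqrt(100), sigma_clean)   # both 0.01

The mean of the 100 noisy scans carries exactly the uncertainty of the single clean scan, so nothing extra can be determined from them.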
The other side of this is that I think it's more difficult than generally acknowledged to over-interpret data. Most of the cases I've seen are due to blatantly ignoring the confidence limits or to doing completely unfair things like fitting S02 to get 0.9 +/- 0.1, then fixing S02 to 0.90, fitting N, and claiming N to better than 10%. Those are serious, but they are the normal mistakes in analysis that can happen with any data.
--Matt
Hi all,

Interesting discussion. It seems to me that one issue of importance in these cases is the uncertainty in the difference of the parameters between sites. In Shelly's case, it certainly seems reasonable to say that the coordination number decreases from set 3 to 4 to 5, with the difference between 3 and 4 being statistically significant, and the difference between 4 and 5 being borderline.

Fixing S02 introduces a very similar fractional error into every determination of N, and thus does not affect determinations of the differences. The same may be true of sigma2, although it is also possible that changes in coordination number are accompanied by changes in sigma2 that are not being allowed under this scheme.

If you're willing to believe that sigma2 is similar for all samples, it seems reasonably conservative to me to assume the uncertainties found in the constrained fits are independent and to add them in quadrature to get the uncertainties in the differences: Ncas4 - Ncas3 = -0.7 +/- 0.4; Ncas5 - Ncas4 = -0.5 +/- 0.6. It would then be reasonable to assign an uncertainty of +/- 0.9 to the absolute value of Ncas3, although the relative values are known more precisely.

Comments?

--Scott
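For concreteness, a minimal sketch of the quadrature arithmetic above, using the constrained-fit values from Shelly's message (the helper name is just for illustration):

    import math

    def diff_with_err(a, da, b, db):
        """Difference of two fitted values whose (assumed independent)
        uncertainties are added in quadrature."""
        return a - b, math.hypot(da, db)

    print(diff_with_err(2.5, 0.4, 3.2, 0.1))  # (-0.7, ~0.41): significant
    print(diff_with_err(2.0, 0.4, 2.5, 0.4))  # (-0.5, ~0.57): borderline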