# [Ifeffit] Double dipping

Kelly, Shelly D. SKelly at anl.gov
Fri Nov 22 11:05:53 CST 2002

```
Hi Matt,

I think that your reply sounds reasonable.  Let's talk some more about this.
> The other side of this is that I think it's more difficult than
> generally acknowledged to over-interpret data.  Most of the cases
> I've seen are due to blatantly ignoring the confidence limits or doing
> completely unfair things like fitting So2 to get 0.9+/-0.1, then
> fixing So2 to 0.90, fitting N, and claiming N to better than 10%.
> Those are serious, but are the normal mistakes in analysis that
> can happen with any data.

If you fix s02 and then fit N and get an error, let's call it dn, then the
more generous estimate of the uncertainty for N is d(s02*n) = ds02*n +
s02*dn.  How would you estimate the error for N if you fix s02, R and
sigma2?

I will try to be more specific about my original problem.  Let's say that the
original data set (s3) gives these results for the Ca shell.
Ncas3=3.4+/-0.9
Rcas3=4.01+/-0.01
sigma2cas3=0.006+/-0.3
s02s3=1.05+/-0.10

Now I run another fit and I fix all the parameters except for Nca. This time
I fit three data sets s3, s4 and s5 together and I get these results.
Ncas3 = 3.2+/-0.1
Ncas4 = 2.5+/-0.4
Ncas5 = 2.0+/-0.4
What would you think would be an "appropriate" uncertainty?  Surely 0.1 is
too small for Ncas3, but 0.9 seems rather huge.  So we have some limits, but
can I do any better than that?  The difference between these two extremes
is quite profound.  In one case we see a change in the value for Nca; in the
other we see nothing.

Shelly

> -----Original Message-----
> From: Newville, Matthew G.
> Sent: Friday, November 22, 2002 10:17 AM
> To: 'ifeffit at millenia.cars.aps.anl.gov'
> Subject: Re: [Ifeffit] Double dipping
>
>
> Hi Shelly,
>
> > I would like to take a bunch of parameters (all of them that do
> > not vary significantly from the known, as determined by
> > individual fits to each data set) from this "known" data set
> > and then fit the series including the "original" to see how a
> > particular parameter (the number of Ca atoms) changes as the
> > ratio of U to Ca changes.  I would like to hear a discussion on
> > the fairness of this approach and how would you calculate the
> > number of independent points?
>
> I think what you propose is fair, but you might need to be a
> little more specific....
>
> In general, I think it's fair to use parameters from one fit in
> another data set.  Once one accepts the fact that there is a
> limited amount of information, getting hung up on trying to count
> or enforce N_idp becomes pointless -- the data and estimated
> uncertainties __WILL__ indicate how many parameters can be
> determined with any confidence.
>
> N_idp is one statistic; it should not be construed as an exact
> value nor a hard upper limit on how much information can be extracted
> from data.  Any attempt to use N_idp this way is abuse.
>
> The classic test is whether 100 very noisy scans (say, because
> data was collected for 0.01s per point) have 100x more
> 'independent points' than 1 scan that collected for 1.0s per point.
> The 100 noisy scans __ARE__ independent, but when added together
> will give exactly the same result as using the 1 clean scan. So
> independent data does not mean you can determine more
> parameters.
>
> The other side of this is that I think it's more difficult than
> generally acknowledged to over-interpret data.  Most of the cases
> I've seen are due to blatantly ignoring the confidence limits or doing
> completely unfair things like fitting So2 to get 0.9+/-0.1, then
> fixing So2 to 0.90, fitting N, and claiming N to better than 10%.
> Those are serious, but are the normal mistakes in analysis that
> can happen with any data.
>
> --Matt

```
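The propagation Shelly writes down, d(s02*n) = ds02*n + s02*dn, can be tried on the numbers in the message. The sketch below is only an illustration, not a method endorsed in the thread: the function name `propagate_fixed_s02` is hypothetical, it assumes the fit really constrains the product s02*N, and it assumes the uncertainty in the fixed s02 is independent of the fitted dn. Dividing Shelly's expression by s02 gives the linear form; a quadrature sum is shown as an alternative. Neither form accounts for the fixed R and sigma2.

```python
import math

def propagate_fixed_s02(n, dn, s02, ds02, quadrature=True):
    """Estimate the uncertainty in N after a fit with S02 held fixed.

    Assumes the fit constrains the amplitude s02*N, so the relative
    uncertainty in N combines the fit's own relative error (dn/n)
    with the relative uncertainty of the fixed S02 (ds02/s02).
    """
    rel_fit = dn / n
    rel_s02 = ds02 / s02
    if quadrature:
        rel = math.hypot(rel_fit, rel_s02)   # independent errors
    else:
        rel = rel_fit + rel_s02              # d(s02*n)/s02, the linear sum
    return n * rel

# Values quoted in the message: s02 from the single-data-set fit,
# N and dn from the three-data-set fit with everything else fixed.
s02, ds02 = 1.05, 0.10
fits = {"Ncas3": (3.2, 0.1), "Ncas4": (2.5, 0.4), "Ncas5": (2.0, 0.4)}

for name, (n, dn) in fits.items():
    linear = propagate_fixed_s02(n, dn, s02, ds02, quadrature=False)
    quad = propagate_fixed_s02(n, dn, s02, ds02, quadrature=True)
    print(f"{name}: N = {n}, fit error {dn}, "
          f"linear {linear:.2f}, quadrature {quad:.2f}")
```

Under these assumptions the estimate for Ncas3 lands between the two extremes of 0.1 and 0.9. Because the same fixed s02 is shared by all three data sets, the s02 term would move all three N values together rather than independently.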

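Matt's "classic test" of 100 noisy scans versus 1 clean scan can be checked with a small simulation. This is a rough sketch under stated assumptions: the noise is taken to be Gaussian and to scale as 1/sqrt(counting time), the signal is an arbitrary sine wave, and the helper names (`simulate_scan`, `average_scans`, `rms_noise`) and the noise level 0.01 are made up for the illustration. With random noise the two results agree statistically rather than exactly.

```python
import math
import random

def simulate_scan(signal, sigma, rng):
    """Return the signal with Gaussian noise of standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average_scans(scans):
    """Average a list of scans point by point."""
    return [sum(col) / len(col) for col in zip(*scans)]

def rms_noise(measured, signal):
    """Root-mean-square deviation of a measurement from the true signal."""
    sq = [(m - s) ** 2 for m, s in zip(measured, signal)]
    return math.sqrt(sum(sq) / len(sq))

rng = random.Random(0)
signal = [math.sin(0.05 * i) for i in range(2000)]  # arbitrary test signal

sigma_long = 0.01                                   # assumed noise at 1.0 s per point
sigma_short = sigma_long * math.sqrt(1.0 / 0.01)    # 10x noisier at 0.01 s per point

one_clean = simulate_scan(signal, sigma_long, rng)
merged = average_scans(
    [simulate_scan(signal, sigma_short, rng) for _ in range(100)]
)

print("1 clean scan, rms noise:      ", rms_noise(one_clean, signal))
print("100 noisy scans merged, rms:  ", rms_noise(merged, signal))
```

The two noise levels come out comparable, which is the point of the example: the 100 scans are independent measurements, but merged they carry no more information than the single long scan.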