[Ifeffit] Another R-factor question

Thu May 6 11:55:24 CDT 2004

Hi Matt,

Thanks again for the help; I was not sure if I missed something in the code.
Also, I meant to say that R_factor in ifeffit is 
sum((data-fit)^2)/sum(data^2).
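
For concreteness, here is a minimal Python sketch of that sum-of-squares ratio. The function name r_factor and the sample numbers are made up for illustration; this is not the ifeffit source, only the formula as written above. abs() is used so the same code works for complex chi(R) arrays as well as real data.

```python
def r_factor(data, fit):
    """Fractional misfit: sum(|data - fit|^2) / sum(|data|^2)."""
    # abs() makes this valid for both real and complex arrays.
    misfit = sum(abs(d - f) ** 2 for d, f in zip(data, fit))
    norm = sum(abs(d) ** 2 for d in data)
    return misfit / norm


# Hypothetical numbers, only to show the call.
example_data = [1.0, 2.0, 3.0]
example_fit = [1.1, 1.9, 3.2]
example_r = r_factor(example_data, example_fit)
print(example_r)
```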

On May 6, 2004, at 7:46 AM, Matt Newville wrote:

> Hi Wayne,
>
>> Thanks again for clearing up my confusion over the origin of the R
>> factor for fits in R-space.  I now have a different and unrelated
>> question about the R factor.
>> From looking through the ifeffit code, it appears that the R
>> factor is defined as sum(data-fit)/sum(data).  Is this correct?
>
> Yes, that is correct.   It is a fractional misfit.
>
>> The reason that I am asking is that I would like to use Hamilton's
>> test (Acta Cryst 1965, V18, p.502) to determine whether adding
>> additional shells to a fit actually results in a better fit.
>> Hamilton's test uses the "crystallographic R factor", which is
>> sqrt(sum(data-fit)/sum(data)), so I would like to know whether or
>> not to take the square root of the R-factor ratios in Hamilton's
>> test.  Thanks again for the help!
>
> I'm not familiar with "Hamilton's test", just downloaded the paper,
> and glanced at page 2 of it.  I am sure I do not know all the
> subtleties of R-factor(s) used in crystallography: I thought there
> were a couple different R-factors used, and except for something
> called 'R merge' (which seems to be only about data quality???) that
> they were all essentially 'sum(data-fit)/sum(data)', differing in
> whether they used Intensities, F values, and how they weighted the
> different reflections (this would seem similar to the different ways
> of treating XAFS data: weighting, k-, R-space, etc).  It looks to me
> like Hamilton used F values.
>
As far as R factors in crystallography go, the usual definition of R is
sum(||data|-|model||)/sum(|data|), but sometimes the RMS value
is used instead: R=sqrt(sum((|data|-|model|)^2)/sum(data^2)).
R-merge is the residual for averaging equivalent data,
R_merge=sum(||data|-|average||)/sum(|data|), if I recall
correctly, and reflects the data quality.

From the manual for SHELX, one of the typical single crystal refinement
programs, the R-factors reported for model quality when doing crystal
structures are the weighted R-factor, wR2, and the R-factor, R1.  In SHELX,
wR2 = sqrt(sum(w(data^2-model^2)^2)/sum(w(data^2)^2)), where
each reflection has its own weight, w.   The R-factor, R1, is
sum(||data|-|model||)/sum(|data|) and is roughly comparable to the
square root of the R-factor reported by ifeffit, as I currently
understand it.
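
As a rough illustration of the two SHELX-style quantities, here is a small Python sketch. It assumes the observed structure factors go in the denominator (which is how I read the SHELX manual), and the function names r1 and wr2 and the numbers are hypothetical, not anything taken from SHELX itself.

```python
import math


def r1(f_obs, f_calc):
    """Unweighted R1 on |F|: sum(||Fo| - |Fc||) / sum(|Fo|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def wr2(f_obs, f_calc, weights):
    """Weighted wR2 on F^2: sqrt(sum(w(Fo^2 - Fc^2)^2) / sum(w(Fo^2)^2))."""
    num = sum(w * (fo ** 2 - fc ** 2) ** 2
              for fo, fc, w in zip(f_obs, f_calc, weights))
    den = sum(w * (fo ** 2) ** 2 for fo, w in zip(f_obs, weights))
    return math.sqrt(num / den)


# Hypothetical reflections with unit weights.
f_obs = [10.0, 20.0, 30.0]
f_calc = [11.0, 19.0, 30.0]
print(r1(f_obs, f_calc), wr2(f_obs, f_calc, [1.0, 1.0, 1.0]))
```

Note that R1 is a ratio of absolute deviations while wR2 is a root-mean-square quantity, so the two generally give different numbers for the same model.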

> I have no doubt that there are people on this list who know a lot more
> about this than I do.  Can anybody provide any insight on the
> R-factors and tests used in crystallography, and correct all the
> mistakes above?
>
> I think you might also be interested in the Joyner tests, from
> Joyner et al, J Phys C 20, p 4005 (1987).  If I recall correctly,
> these are very close to standard statistics F-tests on the
> chi-square values, with the aim of testing whether adding data
> and/or variables improves a fit.  I believe the on-line EXCURVE
> manuals might have some discussion of these.  Of course, seeing if
> reduced chi-square is improved is the simplest way to compare two
> fits with different number of variables or data ranges.
>

Hamilton's test is an F-test that compares the ratios of R for two
different models with different degrees of freedom.   The R ratio is
converted to an F value, then the likelihood of the F value is examined.
Typically, the hypothesis that the two models are equivalent is rejected
if the likelihood of the F value is less than 5%, but that number is
somewhat arbitrary.  Hamilton's test gives somewhat different information
from examining the reduced chi squared.  The reduced chi squared value
tells you which fit is better;  Hamilton's test tells you the probability
that the fits are actually equivalent.
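
The procedure can be sketched in Python roughly as follows. This is my own minimal sketch, not code from ifeffit or from Hamilton's paper. It assumes RMS-type R factors (so for ifeffit one would take the square root of R_factor first, if it is the sum-of-squares ratio given at the top of this message), a smaller model nested in a bigger one, and the usual relation F = ((R_small/R_big)^2 - 1) * dof_big / n_added. The names hamilton_test, reg_inc_beta, n_added, and dof_big are hypothetical. The F-distribution tail probability is evaluated through the regularized incomplete beta function so that only the standard library is needed.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even step, then odd step, of the continued fraction.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = c * d
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def hamilton_test(r_small, r_big, n_added, dof_big):
    """Return (F, p) for adding n_added variables to a nested model.

    r_small, r_big: RMS-type R factors of the smaller and bigger model.
    dof_big: degrees of freedom of the bigger model.
    p is the chance of an improvement at least this large if the extra
    variables were not really needed.
    """
    ratio_sq = (r_small / r_big) ** 2
    f_value = (ratio_sq - 1.0) * dof_big / n_added
    # Upper tail of F(n_added, dof_big) written as an incomplete beta function.
    p_value = reg_inc_beta(dof_big / 2.0, n_added / 2.0, 1.0 / ratio_sq)
    return f_value, p_value


# Hypothetical R factors: two extra variables, 10 degrees of freedom.
f_value, p_value = hamilton_test(0.020, 0.018, n_added=2, dof_big=10)
print(f_value, p_value)
```

With a 5% threshold, the added variables would be kept only if p_value came out below 0.05. If the bigger model does not lower R at all, the sketch returns p = 1.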

Thank you for the reference to Joyner, et al.'s paper; it looks like this
is the same test.  Joyner, et al.'s paper defines the # of degrees of
freedom as # of points in the chi curve - # of variables rather than
# of independent data points - # of variables.  The F test is
very sensitive to the # of degrees of freedom.
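
To show how far apart the two counts can be, here is a small Python sketch. It assumes the common Nyquist-type estimate for EXAFS, N_idp of about 2*dk*dR/pi, which is my assumption and not something taken from Joyner et al.; the k-range, R-range, grid size, and number of variables are all made-up numbers.

```python
from math import pi


def n_independent(k_min, k_max, r_min, r_max):
    """Nyquist-type estimate of independent points: 2 * dk * dR / pi."""
    return 2.0 * (k_max - k_min) * (r_max - r_min) / pi


# Hypothetical fit: k = 2..12 on a 0.05 grid, R = 1..3, six variables.
n_points = 201
n_variables = 6
n_idp = n_independent(2.0, 12.0, 1.0, 3.0)

dof_from_points = n_points - n_variables  # counting every point in chi
dof_from_idp = n_idp - n_variables        # counting independent points only
print(dof_from_points, dof_from_idp)
```

The two degrees-of-freedom values differ by more than an order of magnitude here, so the same R ratio would give a very different F value depending on which definition is used.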

Thanks again for the help!

Sincerely,

Wayne