Thanks again everyone, and thanks for the clarity, Bruce. Yes, I guess it was a poor question. The purely numerical aspect of R in a single-data-set fit is certainly not lost on me, but in reviewing the analyses I want to have some concrete arguments against the fits instead of just "your method and fits are indefensible." There are several problems with the fits outside any statistical metrics... in fact, I would guess that this was the unique case where the user depended exclusively upon R instead of the "red line matching the blue line". (see attached)

- Chris

********************************
Christopher J. Patridge, PhD
NRC Post Doctoral Research Associate
Naval Research Laboratory
Washington, DC 20375
Cell: 315-529-0501

On 1/25/2013 3:32 PM, Bruce Ravel wrote:
Chris et al,
Sorry I didn't pipe up earlier. I haven't had a good chance to sit down and follow this discussion until this afternoon.
I'll start with the practical issue. Recently someone requested per-data-set R-factors in Artemis. It was a perfectly fine request, so I sat down to implement it. Since fits in Artemis are usually done with multiple k-weights, it wasn't clear to me how to display the information in the clearest manner.
The "overall" R factor, the one that Ifeffit reports after the fit finishes, includes all the data and all the k-weights used in the fit. That is certainly a useful number in that it summarizes the closeness of the fit in the aggregate.
As long as I was breaking down the R-factors by data set, I figured it would be useful to do so by k-weight as well. I could imagine a scenario where it would be helpful to know how a particular data set and a particular k-weight contributed to the overall closeness of the fit. That should explain the why of what you find in Artemis' log file.
My intent is to use the same formula for the R-factor as in the Ifeffit reference manual. If you do a single-data-set, single-k-weight fit, the overall R-factor, the per-data-set R-factor, and the R-factor at that k-weight (all three are reported regardless) should be the same. It is possible that this is not well enough tested.
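For concreteness, the R-factor in the Ifeffit reference manual is the fractional misfit summed over the fit range, and the per-data-set, per-k-weight breakdown follows directly from it. Here is a minimal NumPy sketch of that idea; the function names and the k-weighting-by-multiplication shown here are my illustration, not Ifeffit's actual code:

```python
import numpy as np

def r_factor(data, fit):
    """Ifeffit-style R-factor: fractional misfit between data and model,
    R = sum((data - fit)**2) / sum(data**2), summed over the fit range."""
    data = np.asarray(data, dtype=float)
    fit = np.asarray(fit, dtype=float)
    return np.sum((data - fit) ** 2) / np.sum(data ** 2)

def per_kweight_r_factors(k, chi_data, chi_fit, kweights=(1, 2, 3)):
    """Break the misfit down by k-weight for a single data set,
    by weighting both data and model by k**kw before comparing."""
    return {kw: r_factor(chi_data * k ** kw, chi_fit * k ** kw)
            for kw in kweights}
```

A single-data-set, single-k-weight fit run through `per_kweight_r_factors` with one k-weight reproduces the plain `r_factor`, which is the consistency check Bruce describes.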
Matt's point about Larch being the superior tool for user-specified R-factors is certainly true, although few GUI users would avail themselves of that.
If some R-factor other than the one reported by Ifeffit (or, soon, the one reported by default by Larch) is needed, that would be a legitimate request. If something sophisticated or flexible is needed, that too can be put into the GUI.
As for the actual question -- how to "decide" between the R-factors -- well, my take is that that's not a well-posed question. The R-factor is not reduced chi-square. It does not measure *goodness*, it only measures *closeness* of fit. The term "goodness" means something in a statistical context. An R-factor is some kind of percentage misfit, without any consideration of how the information content of the actual data ensemble was used. In short, the R-factor is a numerical value expressing how closely the red line overplots the blue line in the plot made after Artemis finishes her fit. Thus, the overall R-factor expresses how closely all the red lines together overplot all the blue lines. The R-factors broken out by data set and k-weight express how closely a particular red line overplots a particular blue line.
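The closeness-vs-goodness distinction can be made concrete in a few lines. In this sketch the R-factor never sees the measurement uncertainty or the information content, while reduced chi-square scales the same residual by both; the exact n_idp/N scaling follows the Ifeffit convention as I understand it, so treat it as an assumption rather than a definitive formula:

```python
import numpy as np

def r_factor(data, fit):
    """Closeness: pure fractional misfit, no statistics involved."""
    resid = np.asarray(data) - np.asarray(fit)
    return np.sum(resid ** 2) / np.sum(np.asarray(data) ** 2)

def reduced_chi_square(data, fit, eps, n_idp, n_var):
    """Goodness: misfit scaled by the measurement uncertainty eps and
    the degrees of freedom (n_idp - n_var).  The n_idp/N prefactor is
    the Ifeffit convention as I recall it -- an assumption here."""
    resid = np.asarray(data) - np.asarray(fit)
    chi_sq = (n_idp / resid.size) * np.sum((resid / eps) ** 2)
    return chi_sq / (n_idp - n_var)
```

Two fits can have identical R-factors yet very different reduced chi-squares if their uncertainties or numbers of variables differ, which is why only the latter carries statistical meaning.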
HTH, B
On Friday, January 25, 2013 01:11:22 PM Christopher Patridge wrote:
Thank you for the discussion Matt and Jason,
My main objective was to decide between the two different reported R-factors in some older Artemis fit file logs. I suspect that the analysis was prematurely completed because the user found small R-factor values printed out along with the other fit statistics near the beginning of the fit log. Scrolling down the log file leads to the area which gives:

R-factor for this data set = ?
k1,k2,k3 weightings R-factors = ?
This R-factor is the average of the R-factors over the k-weights, and it is much larger (say, 0.01 near the top of the log vs. 0.07-0.08 here), turning a typical "good fit" to a single data set into a rather questionable one.
Looking at more current fit logs from Demeter (attached; just a quick example), the R-factor printed near the beginning of the fit file is equal to the average R-factor over the k-weightings. Therefore the value found in the earlier Artemis log files must have been faulty or buggy, as was said, so one should not rely on that value to evaluate the fits. Sorry for any confusion, but this is all in the name of weeding out good/bad analysis....
Thanks again,
Chris
On 1/25/2013 12:04 PM, Matt Newville wrote:
Hi Jason, Chris,

On Fri, Jan 25, 2013 at 10:01 AM, Jason Gaudet wrote:

Hi Chris,

Might be helpful also to link to the archived thread you're talking about.

http://millenia.cars.aps.anl.gov/pipermail/ifeffit/2006-June/007048.html

Bruce might have to correct me on this, but if I remember right there were individual-data-set R-factor and chi-square calculations at some point, which came not from IFEFFIT but from Bruce's own post-fit calculations; these were eventually found to be pretty buggy and were dropped.

I don't understand what "the average over the k-weights" R-factor is; analyzing the same data set with multiple k-weights (which is pretty typical) still means a single fit result and a single statistical output in IFEFFIT, as far back as I can remember, anyhow. The discussion about multiple R-factors is for when you're simultaneously fitting multiple data sets (i.e., trying to fit a couple of different data sets to some shared or partially shared set of guess variables).

I think the overall residuals and chi-square are the more statistically meaningful values, as they are actually calculated by the same algorithm used to determine the guess variables: they're the quantities IFEFFIT is attempting to reduce. I don't believe I've reported the per-data-set residuals in my final results, as I only treated them as an internal check for myself. (It would be nice to have again, though...)

-Jason

I can understand the desire for "per data set" R-factors. I think there are a few reasons why this hasn't been done so far. First, the main purpose of chi-square and the R-factor is to be simple, well-defined statistics that can be used to compare different fits. In the case of the R-factor, the actual value can also be readily interpreted and so mapped to "that's a good fit" or "that's a poor fit" more easily (even if still imperfectly). Second, it would be a slight technical challenge for Ifeffit to make these different statistics and decide what to call them. Third, this is really asking for information on different portions of the fit, and it's not necessarily obvious how to break the whole into parts. OK, for fitting multiple data sets, it might *seem* obvious how to break the whole. But, well, fitting with multiple k-weights *is* fitting different data. Also, multiple-data-set fits can mix fits in different fit spaces, with different k-weights, and so on. Should the chi-square and R-factors be broken up for different k-weights too? Perhaps they should. You can give different weights to different data sets in a fit, but how best to do this can quickly become a field of study on its own. I guess that's not a valid reason not to report these....
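The point about the fit reducing one overall quantity can be sketched directly: the optimizer minimizes a single concatenated residual vector, and per-data-set R-factors are just partial sums over slices of that same vector. This is illustrative NumPy only, not IFEFFIT's actual internals:

```python
import numpy as np

def overall_and_per_set_r(datasets, fits):
    """Overall R-factor from the stacked residual the fit minimizes,
    plus the per-data-set breakdown of the very same quantity."""
    resid = np.concatenate([np.asarray(d) - np.asarray(f)
                            for d, f in zip(datasets, fits)])
    all_data = np.concatenate([np.asarray(d) for d in datasets])
    overall = np.sum(resid ** 2) / np.sum(all_data ** 2)
    per_set = [float(np.sum((np.asarray(d) - np.asarray(f)) ** 2)
                     / np.sum(np.asarray(d) ** 2))
               for d, f in zip(datasets, fits)]
    return overall, per_set
```

Note that the overall value is not the average of the per-set values: each data set contributes in proportion to its own amplitude, which is one reason "how to break the whole into parts" is not obvious.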
So, again, I think it's reasonable to ask for per-data-set and/or per-k-weight statistics, but it's not necessarily obvious what to report here. For example, you might also want to use other partial sums-of-squares (based on a k- or R-range, for example) to see where a fit was better or worse. Of course, you can calculate any of the partial sums and R-factors yourself. This isn't so obvious with Artemis or DArtemis, but it is possible. It's much easier to do yourself, and to implement for others, with Larch than in Ifeffit or Artemis. Patches welcome for this and/or any other advanced statistical analyses.
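A range-restricted partial R-factor of the kind mentioned above is only a few lines. This is a hypothetical helper, not an existing Larch or Ifeffit function:

```python
import numpy as np

def partial_r_factor(x, data, fit, xmin, xmax):
    """R-factor restricted to a window on the x axis (e.g. a k- or
    R-range), to see where a fit is better or worse."""
    x, data, fit = map(np.asarray, (x, data, fit))
    m = (x >= xmin) & (x <= xmax)
    return np.sum((data[m] - fit[m]) ** 2) / np.sum(data[m] ** 2)
```

Sliding the window across the fit range gives a crude profile of the misfit, which is also one answer to the visualization point below.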
Better visualizations of the fit and/or mis-fit might be useful to think about too.
--Matt

_______________________________________________
Ifeffit mailing list
Ifeffit@millenia.cars.aps.anl.gov
http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit