[Ifeffit] Different R-factor values

Christopher Patridge patridge at buffalo.edu
Fri Jan 25 12:11:22 CST 2013


Thank you for the discussion Matt and Jason,

My main objective was to decide between the two different R-factors 
reported in some older Artemis fit logs.  I suspect that the analysis 
was concluded prematurely because the user found small R-factor values 
printed along with the other fit statistics near the beginning of the 
fit log.  Scrolling further down the log file, however, leads to the 
section that gives:

R-factor for this data set = ?
k1,k2,k3 weightings R-factors = ?

This R-factor is the average of the per-k-weight R-factors and can be 
much larger (say, 0.01 printed above versus 0.07-0.08 here), turning a 
typical "good fit" to a single data set into a rather questionable one.

Looking at more current fit logs from Demeter (a quick example is 
attached), the R-factor printed near the beginning of the fit file is 
equal to the average of the per-k-weight R-factors.  The value found in 
the earlier Artemis fit logs must therefore have been faulty or buggy, 
as was said, so one should not rely on that value when evaluating the 
fits.  Sorry for any confusion, but this is all in the name of weeding 
out good analysis from bad....
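
As a quick sanity check (this is just the arithmetic, nothing taken 
from Artemis itself), averaging the three per-k-weight R-factors listed 
at the bottom of the attached log reproduces the headline R-factor to 
within rounding:

  # values copied from the "R-factor by k-weight" line of the attached log
  r_by_kweight = [0.00467, 0.00544, 0.01010]    # k-weight = 1, 2, 3
  print(sum(r_by_kweight) / len(r_by_kweight))  # ~0.00674, vs. reported 0.0067330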

Thanks again,

Chris

********************************
Christopher J. Patridge, PhD
NRC Post Doctoral Research Associate
Naval Research Laboratory
Washington, DC 20375
Cell: 315-529-0501

On 1/25/2013 12:04 PM, Matt Newville wrote:
> Hi Jason, Chris,
>
> On Fri, Jan 25, 2013 at 10:01 AM, Jason Gaudet <jason.r.gaudet at gmail.com> wrote:
>> Hi Chris,
>>
>> Might be helpful also to link to the archived thread you're talking about.
>>
>> http://millenia.cars.aps.anl.gov/pipermail/ifeffit/2006-June/007048.html
>>
>> Bruce might have to correct me on this, but if I remember right there were
>> individual-data-set R-factor and chi-square calculations at some point,
>> which came not from IFEFFIT but from Bruce's own post-fit calculations, and
>> these eventually were found to be pretty buggy and were dropped.
>>
>> I don't understand what "the average over the k weights" R factor is;
>> analyzing the same data set with multiple k weights (which is pretty
>> typical) still means a single fit result and a single statistical output in
>> IFEFFIT, as far back as I can remember, anyhow.  The discussion about
>> multiple R-factors is for when you're simultaneously fitting multiple data
>> sets (i.e. trying to fit a couple different data sets to some shared or
>> partially shared set of guess variables).
>>
>> I think the overall residuals and chi-square are the more statistically
>> meaningful values, as they are actually calculated by the same algorithm
>> used to determine the guess variables - they're the quantities IFEFFIT is
>> attempting to reduce.  I don't believe I've reported the per-data-set
>> residuals in my final results, as I only treated it as an internal check for
>> myself.  (It would be nice to have again, though...)
>>
>> -Jason
> I can understand the desire for "per data set" R-factors.  I think
> there are a few reasons why this hasn't been done so far.  First, the
> main purpose of chi-square and the R-factor is to be simple, well-defined
> statistics that can be used to compare different fits.   In the case
> of R-factor,  the actual value can also be readily interpreted and so
> mapped to "that's a good fit" and "that's a poor fit" more easily
> (even if still imperfect).   Second, it would be a slight technical
> challenge for Ifeffit to make these different statistics and decide
> what to call them.     Third, this is  really asking for information
> on different portions of the fit, and it's not necessarily obvious how
> to break the whole into parts.  OK, for fitting multiple data sets, it
> might *seem* obvious how to break the whole.
>
> But, well, fitting with multiple k-weights *is* fitting different
> data.  Also, multiple-data-set fits can mix fits in different fit
> spaces, with different k-weights, and so on.  Should the chi-squared
> and R-factors be broken up for different k-weights too?  Perhaps they
> should.  You can give different weights to different data sets in a fit,
> but how to best do this can quickly become a field of study on its
> own.  I guess that's not a valid reason to not report these....
>
> So, again, I think it's reasonable to ask for per-data-set and/or
> per-k-weight statistics, but not necessarily obvious what to report
> here.  For example, you might also want to use other partial
> sums-of-squares (based on k- or R-range, for example) to see where a
> fit was better and worse.    Of course, you can calculate any of the
> partial sums and R-factors yourself.  This isn't so obvious with
> Artemis or DArtemis, but it is possible.  It's  much easier to do
> yourself and implement for others with larch than doing it in Ifeffit
> or Artemis.  Patches welcome for this and/or any other advanced
> statistical analyses.
>
> Better visualizations of the fit and/or mis-fit might be useful to
> think about too.
>
> --Matt
> _______________________________________________
> Ifeffit mailing list
> Ifeffit at millenia.cars.aps.anl.gov
> http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
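
For anyone who wants to try Matt's suggestion above of computing 
partial sums-of-squares and R-factors by hand, a minimal sketch in 
Python/numpy might look like the following.  The function and array 
names are made up for illustration (they are not Ifeffit or larch 
variables), and an actual Ifeffit fit evaluates the misfit on the 
Fourier-transformed, windowed data rather than on raw chi(k), so this 
is only a sketch of the idea:

  import numpy as np

  def r_factor(chi_data, chi_fit, k, kmin=None, kmax=None, kweight=0):
      """Misfit sum-of-squares normalized by the data, optionally
      restricted to a k-range and with a k-weight applied."""
      mask = np.ones_like(k, dtype=bool)
      if kmin is not None:
          mask &= (k >= kmin)
      if kmax is not None:
          mask &= (k <= kmax)
      w = k[mask] ** kweight
      resid = w * (chi_data[mask] - chi_fit[mask])
      return np.sum(resid**2) / np.sum((w * chi_data[mask])**2)

Calling this with kweight = 1, 2, and 3 on the same arrays gives 
per-k-weight R-factors like the ones reported in the attached log, and 
restricting kmin/kmax gives the range-based partial R-factors Matt 
mentions.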

-------------- next part --------------


Independent points          : 6.8125000
Number of variables         : 6
Chi-square                  : 3785.3828959
Reduced chi-square          : 4658.9327950                                      
R-factor                    : 0.0067330                                         
Measurement uncertainty (k) : 0.0007327
Measurement uncertainty (R) : 0.0007677
Number of data sets         : 1


Happiness = 74.31/100             color = #FEC082                               
   Used 6 of 6.813 independent points for a penalty of 25.688                   
***** Note: happiness is a semantic parameter and should *****                  
*****    NEVER be reported in a publication -- NEVER!    *****                  

guess parameters:                                                               
  amp                =   0.94829226    # +/-   0.22579777     [0.941]
  enot               =  -1.64270447    # +/-   2.10845432     [0]
  dO8                =  -0.19174946    # +/-   0.02373561     [0]
  ssO8               =   0.00339416    # +/-   0.00484903     [0.00300]
  ssP                =   0.01295715    # +/-   0.00658638     [0.00300]
  dP8                =  -0.10529215    # +/-   0.04348177     [0]

Correlations between variables:                                                 
                sso8 & amp                -->  0.9151
                 do8 & enot               -->  0.8896
                 dp8 & enot               -->  0.5757
                 dp8 & do8                -->  0.4888
All other correlations below 0.4

===== Data set >> 4_0V.009 << ====================================              

: Athena project       = C:\Users\christopher_patridge\Desktop\Raw XAS Datat\Raw XAS Battery Data\C_LiFePO4 9967-15\CarbonDataImportProject.prj, 15
: name                 = 4_0V.009
: k-range              = 2 - 7.5
: dk                   = 1
: k-window             = hanning
: k-weight             = 1,2,3
: R-range              = 1 - 3
: dR                   = 0.0
: R-window             = hanning
: fitting space        = r
: background function  = no
: phase correction     = 
: R-factor by k-weight = 1 -> 0.00467,  2 -> 0.00544,  3 -> 0.01010

