Re: [Ifeffit] Question about statistics

6 Mar 2015

      Hi Olga,

It's a bit hard for me to tell what here is actually a question and
what is an assertion of preconceived ideas, or a description of what
you want (or think you want).  You might find it more useful to ask
questions in smaller units, and aim for more clarity.

On Wed, Mar 4, 2015 at 3:51 PM, Ольга Кашурникова  wrote:
...
Good day to developers and users of Ifeffit,
I'd like to get help with my question of statistical calculations in Ifeffit
(I use the version embedded in Demeter, but in console mode or from my
Fortran little program: think of Larch using but there are no some features
and I'm not sure).
Not sure what that means.  FWIW, there is not going to be future
development or changes beyond critical bugs to Ifeffit.  If you want
to use anything supported or ready for adding features, Larch is the
only way to go.
...
The task is to try making Bayes statistics, in the sense that was published
in Krappe et al. papers from 1999,2002,2004 years and the book of
Bretthorst.
Not familiar with that book.
...
I haven't found the software for it and my own tries to write it
are not completed because of no experience.
I believe that Krappe and Rossner's used a software modified from
Feffit, but I'm not sure they ever made this available for others to
use.
...
I saw something in papers about
BFEFFIT, an extension for such statistics, but haven't found it in the web
or any other information. May be somebody knows what stage is it on.
I've never heard of this.
...
I need it because of complicate model (4 spheres, may be splitting of
spheres because it is many-phase system with dopant phase) and need of
regularization in Tikhonov sense (parametra are correlated and I can not
determine which constraint or model is better without statistics, and with
nonconstrained parametra it simply will not fit)
Tikhnonov regularization and Bayesian analysis seem quite different to
me.  Can you explain what you mean?   Of course parameters are
correlated...  this happens all the time, but does not mean they
cannot be fitted.    Also, saying "simply will not fit" gives no clue
about what might be happening.
...
I understood that it can be emulated in Ifeffit with help of restraints. All
thing is to:
-minimize chi**2=(dat-model)**2/epsilon_k**2+A*sum(xi-x0i)**2, where A is
regularizer, xi - parametra in fit end, x0i - a priori values of parametra.
-calculate pair correlations and errors so that we can calculate the second
derivatives and variance matrix of chi**2 determines in this sence.
Restraints are a way to encode prior knowlege, but I'm not sure it's
really a Bayesian analysis.  Or perhaps: it's one way of doing a
Bayesian analysis, but almost certainly not the only way.

For a Bayesian analysis you need to find the most likely values for
parameters p given data (and a model) and a set of prior probabilities
for the parameters.   So you need to:
   a) have an expectation value and probability distribution for each parameter
   b) somehow encode this in the actual analysis.

Note that this doesn't say anything about how one does the analysis.
When doing a Bayesian analysis, one is given an additional problem
(identify a probability distribution for each parameter) and no
guidance of what to do once that is known.   In this sense, Bayesian
analysis is "obviously better" because it prescribes a general maxim
("include prior knowledge of your parameters") and no guidance of any
practical utility.  It is approximately equivalent to saying "Find a
career that makes you happy".

Chi-square minimization is completely different -- a separate topic
really.  It's not Bayesian or Non-Bayesian  (Frequentist), it's just a
method.  Practical, no theoretical.  It finds a set of parameters p to
make "[data - model(p)]/epsilon" match as well as possible.   This is
convenient because there are a variety of mathematical methods ("Curve
Fitting / Optimization") for finding a set of parameters that
minimizes some array in the least-squares sense.

If looking at it from a theoretical viewpoint, this is exactly
asserting that the variance in the data is given by a Normal /
Gaussian distribution.    If you don't know what probability
distribution to use, this would be one of the first to assume.

Given this, one route to a practical way to encode "prior information"
/ Bayesian priors is to restraints.   Here one simply makes up a "data
point" and extends the array to be minimized:

    [data-model(p)]/epsilon + restraint1 + restraint2 + ....

If one sets restraint1 to be
    [paramA - A_0]/eps_A

that would effectively assert a normal distribution for parameter A
centered at A_0 with sigma = eps_A.
Of course, that doesn't address the question of how much you believe
"the parameters should be varied so that the model matches the data"
compared to "the value for parameter A should be near A_0".  You may
need to adjust a weighting factor, or scale eps_A accordingly.  That's
sort of like regularization....  To be clear, Bayes gives no guidance
on how to do this.
...
But I can not understand about how restraint goes in fit and how IFEFFIT
calculates chi_square (not reduced) and errors. From manuals of FEFFIT and
IFEFFIT I supposed that there goes:
-LS minimisation of (dat-model)**2+restraint**2, may be normed, but it
doesn't matter
-chi_square is this value divided by eps_k**2, may be divided by points num
-for errors found variance of chi_square defined above, but then error bars
are scaled to chi_reduced to eliminate noise dependence
-pair correlations are corr(i,j)/sqrt(corr(i,i)*corr(j,j) and doesn't depend
of normalisations
See above.  Chi-square is the scaled sum of squares of the array to be
minimized.  So, yes, it would include restraints.  And restraints can
alter the uncertainties and the correlations because they change the
array to be minimized.
...
What is right? I need the above defined
chi**2=(dat-model)**2/epsilon_k**2+A*sum(xi-x0i)**2 and its variance matrix,
because it defines my statistics, optimal A,errors and 'projections' of
parametra that I need to verify model.
I use the fit in k space (not q) without Fourier transforms or k weight,
because it is hard to do for this statistics.
Sorry, I can't read this in any way except "I do X because it is hard
to do".   Is that really what you mean?

I strongly recommend not fitting is k-space.  Doing so willfully
ignores something that you do, in fact, know about the model you are
using for the data -- that it is limited in R-space.   That is, if the
model has no weight above R>8 Angstroms, why try to fit this part of
the data?   There is no situation in which fitting in k-space is
better than fitting in R-space, and very many situations where it is
worse.
...
(Filtration of high fr3equency
components I can do with rectangle windows if needed, I think, but it is not
much bigger than noise)
I thought (dat-model)**2 in k space is in real part for points from kmax to
kmin for defined dat array(my_spectrum.chi for instance) and fit.chi array.
But I calculated chi_square in hand from these arrays and there was no right
number. for test of 1 sphere chi_square=8e-3, (dat-model)**2=3e-4,
(dat-model)**2/noise**2=2,37e1, (dat-model)**2/noise**2=9.5e-2 and so on
(regularizer part A(x-x0)**2=0.91). For instance.
Sorry, I'm not sure what you are trying to say.    Is that your
by-hand calculation of chi-square is not matching the one reported?
If you wanted us to help you solve this, you would have to give us
more details.
...
But if it calculates in real and imaginary part of Fourier transform no
meaning what space I use - and this function is used to find variance - then
I'm not sure how to find error bars and variance matrix I need.
Not sure what you're talking about here.
...
There are many other questions and limitations, I want to make this
calculation approximately with what I can (constaint noise, may be +bkg part
as Ifeffit gives, two or more spectra - and the other things later), but I
hope  someone could help the beginner to solve this problem or find the
simplest way in existing software... I need to know what is the stage of
this topic and need I end my program for it or not. There in mind are such
things as mu0 polynome fit, many-spectra fit with PDF and diffraction
spectra for model and so on, but may be there is the best program in one
step from completeness and I need not to do it.
Sorry, I don't understand this.
...
Thanks sincerely,
Olga Kashurnikova (from MePHI XAFS group in Moscow)
I'm not sure I answered any of your questions, but hope it helps.   Cheers,

--Matt

Re: [Ifeffit] Question about statistics

Matt Newville