Re: [Ifeffit] Question about statistics
Hello, Dr. Matt Newville,
Thank you very much for the answer.
There is a message with some corrections in the next thread: the numbers from my first hand calculation, which I made to check whether I understand the chi_square and error calculation in IFEFFIT, were not right. I will try to attach the calculation files later; it is not clear to me how they will be shown here.
I will write the main question first and then the part about Bayesian analysis, so that the thread can be split if you don't mind. These things are interconnected, which is why I didn't split them.
The main question is how IFEFFIT calculates chi_square, and to which minimization function it adds restraints, in the case of a k-space fit. What is the normalization? I thought it should be (N_idp/N_data)*sum[(dat-model)^2/eps^2], with only the real part of the data compared with the imaginary part of the model in k-space, but my hand calculation to check this gave a different result. I need this to choose the optimal restraint (it is chi^2 without normalization that I need to compare) and to normalize the errors so that they form the covariance matrix of this chi^2, as in Krappe and Rossner. A restraint will change the errors, of course, because the covariance matrix is (Q+A)^(-1), where A is the regularizer in the restraint and Q is what it would be without the restraint. If Q is not invertible (when some parameters do not influence the spectrum) this is critical, and an optimal A can be found that strictly divides the 'parameter space' and verifies models. That is why I need to rescale them well, because for Bayes it should be chi^2 = sum[(dat-model)^2/eps^2] + A*sum[(x-x0)^2]. (A is connected with the a priori ranges of the parameters; it should not be one number, but the algorithm for finding the optimal diagonal A matrix is harder and, I think, not possible with the help of IFEFFIT/Larch.) It is the same as your (paramA-A_0)/eps_A restraint. I understand what the formula for a restraint should be, but I don't understand how to weight the data part and the restraint part. It was Krappe and Rossner who mentioned Tikhonov regularization, if I remember correctly, and it is very close, if I understood the paper about this regularization.
That is what I mean by 'doesn't fit': 4 coordination spheres (Gd-O, Gd-Gd, Gd-Hf, and the next Gd-O, for instance, of which only the first is a separate peak) may be within the N_idp range, but they are so correlated that without constraints or another structural model the fit gives bad results: some parameters leave the acceptable range, and so on. I'm not sure that without a Bayesian analysis I can define a better model of constraints, and the sphere splitting is even more difficult to define.
So, the main question is 'how does IFEFFIT calculate this, what is the formula', because the FEFFIT manual doesn't give the k-space case, and my check was wrong. I can attach the calculation files in a while if needed (I need to convert them from a program to an iff file), though I wrote out the formulae and maybe they can be verified without numerical values.
Chi_square in your method is, with the right normalization, close to what should be minimized, and for which the covariance matrix should be found, in Bayesian analysis. That is why I ask whether I could use IFEFFIT to calculate this and renormalize the results. If I put the right normalization into the restraint, I only need to rescale chi_square and the errors and then calculate what I need. It is not that I think IFEFFIT uses Bayesian statistics; I understand there are different approaches.
Thankful for your assistance,
Olga Kashurnikova, MEPhI, Moscow
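For reference, the quantities named in this message can be written out as follows. This is only a transcription of the formulas as the message states them, not a statement of what IFEFFIT actually computes. Two things are assumptions added here for concreteness: Q is taken to be the curvature matrix of a linearized model with uniform noise, and the regularizer A is taken to be diagonal with entries set by the restraint widths.

```latex
\begin{align}
  \chi^2_{\mathrm{scaled}} &= \frac{N_{\mathrm{idp}}}{N_{\mathrm{data}}}
      \sum_{i=1}^{N_{\mathrm{data}}} \frac{(d_i - m_i)^2}{\epsilon^2} \\
  \chi^2_{\mathrm{Bayes}} &= \sum_{i=1}^{N_{\mathrm{data}}} \frac{\bigl(d_i - m_i(x)\bigr)^2}{\epsilon^2}
      + \sum_{j} A_j \,(x_j - x_{0,j})^2,
      \qquad A_j = \frac{1}{\epsilon_{A,j}^{2}} \\
  C &= (Q + A)^{-1},
      \qquad Q = \frac{J^{\mathsf{T}} J}{\epsilon^2},
      \qquad J_{ij} = \frac{\partial m_i}{\partial x_j}
\end{align}
```

Here d_i is the data, m_i the model, x_j the fit parameters with prior values x_{0,j}, and the choice A_j = 1/eps_A^2 is what makes the second term identical to the squared (paramA-A_0)/eps_A restraint.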
Dear Olga,
Thanks for your interest in the Bayesian analysis codes. I'm glad to
hear that you are interested in developing the method. Although some
prototype Bayes-Turchin based codes were developed by Josh Kas as part
of his thesis research, the project was never fully integrated into the
IFEFFIT package and progress stalled for lack of funding. But there has
also been a parallel effort to develop a Bayes-Turchin code by Elizabeta
Holub-Krappe and her colleagues in Chiba, Japan, and I would suggest
that you ask her for a copy of their BT code and documentation. I think
this topic will also be discussed at the XAFS16 conf in Karlsruhe in
Aug 2015 - will you be there?
With very best regards,
John
cc E. Holub-Krappe
On Fri, Mar 6, 2015 at 12:39 PM, Olga Kashurnikova
_______________________________________________ Ifeffit mailing list Ifeffit@millenia.cars.aps.anl.gov http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
Glad to hear your answer, Dr. John J. Rehr! It was very helpful!
I saw BFEFFIT in Dr. Joshua Kas's thesis but didn't find anything more about it on the web, so I was not sure. For some time I tried to do this analysis myself, but I could not find the errors in my program. Maybe I will find them later, but I don't have much time and need to either use something that works or change my thesis topic.
It would be great if there is such a code, though if it is not in wide use, may I use it or not? And, if the code exists, could I for now finish in the simplest case what I began in IFEFFIT, for which I need the norms? My scientific advisor asks me to write a paper on the results from our local conference within a week, and because the restraint works well, I wanted to get the extra results, thinking that it would not be a bad solution for a first attempt.
As for the conference, I haven't sent the abstract for it yet; I don't remember what the deadlines are, and it depends on whether I can calculate something for it. We have new spectra for new samples, and they can be analyzed without the Bayes approach, but it depends on the time left, and because there are the same problems with the model, I think it would be better done with Bayes. For the old spectra and the non-Bayes model we have already published, some with my participation, some by others.
Thankful, Olga Kashurnikova
From: John J Rehr [mailto:jjr@uw.edu]
Sent: Saturday, March 07, 2015 12:27 AM
To: XAFS Analysis using Ifeffit; Olga Kashurnikova
Cc: Matt Newville; Elizabeta Holub-Krappe; Joshua Kas
Subject: Re: [Ifeffit] Question about statistics
On Fri, Mar 6, 2015 at 12:39 PM, Olga Kashurnikova
Olga,
I'm having a very difficult time understanding what you are trying to say and do. Lengthy emails aren't necessarily helpful. Please try to state simple questions and post simple examples. Please understand that you want us to read your messages, understand them, and respond to them with something other than requests for clarification. The risk for you is that we'll simply give up asking for clarification after a few attempts. I've made a few attempts already.
Much of what you say is really very confusing. Some of it is just wrong (Bayesian approaches certainly do not dictate what "space" to use). It's also simply too much to reply to all of it.
It is clear that you're trying to do some fitting in k-space AND have a meaningful statistical treatment. Stop doing this. Fit in R-space if you want to do any meaningful statistical analysis.
You also want to do something "Bayesian" because some fit "didn't work", which you don't actually define. If a fit doesn't give meaningful results, it seems likely to me that the data simply doesn't support the variables you're trying to fit. I doubt a Bayesian approach is likely to help. Of course, you can use priors to skew the fit to expected values for the parameters, but that's probably not very different than just constraining parameter values.
If you have any questions about the code or statistical treatments, or want to use something supported or for which development can possibly progress, use Larch. Ifeffit is no longer supported and won't be developed further. Specifically for you, Ifeffit may very well have bugs in fitting in k-space -- it's never been a good idea to fit in k-space, so this option was never tested well.
If you're interested in adding restraints or priors or other Bayesian tools to Larch, that would be great. The fitting code in Larch is already far superior to that in Ifeffit, but there is always room for improvement. There are a handful of Python tools for Bayesian analysis, and also for using MCMC methods. These are well-supported and tested, and would not be hard to incorporate into Larch.
Cheers,
--Matt
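The remark that a prior acts much like a constraint, together with the (Q+A)^(-1) covariance mentioned earlier in the thread, can be illustrated with a minimal Python sketch. It assumes a linearized two-parameter model with uniform noise `eps` in which the second parameter has no effect on the data. The helper names `curvature_matrix` and `invert_2x2` and all numbers are hypothetical; nothing here is taken from Ifeffit or Larch.

```python
import math

def curvature_matrix(jacobian, eps):
    """Q = J^T J / eps^2 for a two-parameter linearized model."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for row in jacobian:
        for i in range(2):
            for j in range(2):
                q[i][j] += row[i] * row[j] / eps**2
    return q

def invert_2x2(m):
    """Invert a 2x2 matrix; raise if it is exactly singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular matrix: data do not constrain every parameter")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical Jacobian: the second parameter does not change the model at all.
jacobian = [[1.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
eps = 0.5
q = curvature_matrix(jacobian, eps)  # second row and column are zero, so Q is singular

# Gaussian priors (restraints) with assumed widths: A = diag(1 / sigma_prior^2).
sigma_prior = [0.5, 0.1]
a_reg = [[1.0 / sigma_prior[0]**2, 0.0], [0.0, 1.0 / sigma_prior[1]**2]]

# Posterior covariance (Q + A)^(-1) exists even though Q alone cannot be inverted.
q_plus_a = [[q[i][j] + a_reg[i][j] for j in range(2)] for i in range(2)]
cov = invert_2x2(q_plus_a)
sigma_post = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(sigma_post)
```

In this sketch the posterior width of the second parameter comes out equal to its prior width: the data add no information about it, so the prior behaves like a soft constraint. Comparing posterior and prior widths is one possible way to flag parameters that the data do not support.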
I'm very sorry that I'm long. That is because I didn't suppose to explain the approach itself, but to understand what is with formulae of statistics in IFEFFIT. I understood that you say that k-space is a problem because you didn't rely on it. I thought it is not a good idea for IFEFFIT statistic, not Bayes.
I would like to know what you mean of 'Bayes doesn't dictate space', I didn't understand it from papers and books, maybe you will help me find where it is said and what is math? I really didn't know anything of it, there was analysis of original spectra in all cases.
I have statistically good fits on these compounds, but with constraints, and Bayesian approach could help decide what parameters are not meaningful. It is hard to decide between models without it.
I will try a test in R-space now, and to simplify the test, if you think, but not sure how to use it to the end. Is it that I should use epsilon_R and simply use FT-transformed dat and model and Bayes statistics will be the same?
I'm not sure noise can be treated as constant in this case, you see, it depends on k-weighting and so on. Uncertainty is from noise (that can be thought of as Gaussian) and mu0, and it is not quite understandable how to add mu0 co-fitting then.
Thank you for help,
Olga Kashurnikova
Olga,
On Sat, Mar 7, 2015 at 12:26 AM, Olga Kashurnikova
I'm very sorry that I'm long. That is because I didn't suppose to explain the approach itself, but to understand what is with formulae of statistics in IFEFFIT. I understood that you say that k-space is a problem because you didn't rely on it. I thought it is not a good idea for IFEFFIT statistic, not Bayes.
I'm sorry, but I simply do not understand what this sentence means. K-space is not a problem because I don't rely on it. I don't rely on k-space because it is a problem. Specifically, using k-space neglects to properly filter out frequency components that we know we're not trying to model.
I would like to know what you mean of 'Bayes doesn't dictate space', I didn't understand it from papers and books, maybe you will help me find where it is said and what is math? I really didn't know anything of it, there was analysis of original spectra in all cases.
No analysis method specifies the independent variable(s) 'x'. In fact, usually the model is described as a function *of the parameters*, and any independent variable(s) are ignored. That the data happens to also be a function of some 'x' (whether that be time, frequency, energy, wavenumber, distance, voltage, ....) is not important mathematically. Data can be transformed many ways, and can always be modeled statistically. For EXAFS the data in k-space is certainly not any more real or fundamental than the data in R-space.
I have statistically good fits on these compounds, but with constraints, and Bayesian approach could help decide what parameters are not meaningful. It is hard to decide between models without it.
There are many statistical tests one can do to decide between models and parameters. See, for example, F-tests, Akaike Information Criteria, Bayesian Information Criteria, and so on.
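As a rough illustration of the information criteria named above, here is a minimal Python sketch. It assumes Gaussian noise with known uncertainties, so that -2 ln(likelihood) equals chi-square up to a constant shared by all models fitted to the same data. The function names and all numbers are hypothetical, and using the number of independent points as `n_data` for EXAFS is an assumption rather than something the thread establishes.

```python
import math

def aic(chi_square, n_varys):
    """Akaike criterion, up to an additive constant common to all models."""
    return chi_square + 2 * n_varys

def bic(chi_square, n_varys, n_data):
    """Bayesian (Schwarz) criterion, up to the same additive constant."""
    return chi_square + n_varys * math.log(n_data)

# Two hypothetical fits to the same data: B adds two variables for a small gain.
n_data = 12
chi2_a, varys_a = 25.0, 4
chi2_b, varys_b = 23.5, 6

aic_a, aic_b = aic(chi2_a, varys_a), aic(chi2_b, varys_b)  # 33.0 and 35.5
bic_a, bic_b = bic(chi2_a, varys_a, n_data), bic(chi2_b, varys_b, n_data)
print("prefer A" if aic_a < aic_b and bic_a < bic_b else "no clear preference")
```

Lower values are preferred for both criteria. In this made-up case the extra two variables of model B do not lower chi-square enough to justify them.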
I will try a test in R-space now, and to simplify the test, if you think, but not sure how to use it to the end. Is it that I should use epsilon_R and simply use FT-transformed dat and model and Bayes statistics will be the same?
I'm not really sure what you're doing, so I can't answer this. Like, I have absolutely no idea what you're doing for "Bayes statistics". It seems from subsequent messages that you may have figured out the issue, but I don't really understand what you're trying to do. Was calculating chi-square by hand the major stumbling block? That would be much easier with Larch than Ifeffit.
I'm not sure noise can be treated as constant in this case, you see, it depends on k-weighting and so on. Uncertainty is from noise (that can be thought of as Gaussian) and mu0, and it is not quite understandable how to add mu0 co-fitting then.
Why would co-refining the background function change the methodology? One models some signal (say, "chi(k) + mu0(k)") and then transforms/projects/samples the data and model and compares them. If the data has been transformed, you want the uncertainty in that transformed data. I hope that helps, but I'm not really sure. --Matt
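The point about needing the uncertainty of the transformed data can be sketched with ordinary linear error propagation. The Python example below assumes uncorrelated noise of constant size `eps_k` on chi(k) and a plain discrete transform sum_i k_i^w chi_i exp(2i k_i r) dk with no window function. That convention, the helper name `propagated_sigma`, and the numbers are all hypothetical; this is not the formula Ifeffit or Larch use for epsilon_r.

```python
import math

def propagated_sigma(k, eps_k, kweight, r):
    """Std. dev. of the real and imaginary parts of the transform at distance r."""
    dk = k[1] - k[0]
    # Each output is a weighted sum of independent noisy points, so the
    # variances add with the squared weights.
    var_re = sum((eps_k * ki**kweight * math.cos(2 * ki * r) * dk) ** 2 for ki in k)
    var_im = sum((eps_k * ki**kweight * math.sin(2 * ki * r) * dk) ** 2 for ki in k)
    return math.sqrt(var_re), math.sqrt(var_im)

k = [2.0 + 0.05 * i for i in range(201)]  # uniform grid from 2 to 12
eps_k, kweight = 0.001, 2

for r in (1.0, 2.5):
    sig_re, sig_im = propagated_sigma(k, eps_k, kweight, r)
    print(r, sig_re**2 + sig_im**2)
```

Under these assumptions the summed variance of the real and imaginary parts is the same at every r, because cos^2 + sin^2 = 1 for each term. The k-weighting changes the size of the noise in R-space but not the fact that it can be described by a single number there.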
Hello, Dr. Newville,
I attach the simple example you asked for. It is the same single oxygen sphere, but without the restraint, and I simplified it and got rid of the macros.
Hfo1.chi is the spectrum.
Feff0001.dat is the feff file.
Check_k.iff is the check for the k-space fit, as it was before.
Check_r.iff is the check for the r-space fit.
The result that I ask you to check is the program variable chi_square against the manually calculated chi_square_calc. They are not equal in either case. I used the chir_mag array in the r-space fit for the formula because I couldn't get the re and im arrays; there is something with flags, or maybe I need to do fftf manually, but I didn't want to complicate the example.
I will be very glad if you could help me understand where my formulae are wrong, when you can... Because I haven't got equal values even in r-space, I'm not sure that r-space will solve the problem of renormalization. If I simply didn't understand the formulae, that is exactly why I tried to ask here, to be corrected as a user.
Thanks a lot,
Olga Kashurnikova
Hello, Dr. Matt Newville,
I think I found what was wrong, and it seems to work in k-space. I haven't tried R-space yet. Here is a new file, where the calculation is correct. Of course it is a test spectrum without noise, which gave small values of chi_square; I also tested a normal spectrum, and there it was a minor change (but noticeable, in about the second decimal place).
One error is that in k-space we should divide by 2*N_data. That is not what is in the manual, but you say it is not well tested, and I don't mind because it works. The second error is mine: I used the nofx function because there was no rounding to the smaller number in the IFEFFIT shell, and I didn't think 1 point would give such a change, but it did, because the spectrum is exact. There is something strange with the array numbering, because written arrays begin with 0 and in the program it seems shifted, but that is also a minor problem. The accuracy is 4-5 decimal places; I think the IFEFFIT shell uses single precision, but it is enough for the calculations.
With that I can proceed. So if you are busy, it is not worth spending much time on my question now (that is why I write), but I will be very glad if you correct me somewhere if something is wrong, and if you tell me what I don't know about Fourier transforms with Bayesian analysis.
I'm very thankful for your invitation to think of Larch and Bayes together, and for your colleagues' suggestion of the Bayes project work. I will surely try Larch when I have time for it and, if I can, will continue these studies. For us it obviously depends on the situation and financing too, and it should give results, which is why I tried a simple way. But it is very interesting and you were very helpful.
Thanks a lot,
Olga Kashurnikova
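The hand check described in this message can be sketched in a few lines of Python. The example compares the two normalizations mentioned in the thread, N_idp/N_data and N_idp/(2*N_data). The function names, the sample arrays, and the fit ranges are hypothetical, and the Nyquist estimate of N_idp is used without any additive offset. Which normalization Ifeffit really applies is known here only from the report in the message above.

```python
import math

def weighted_sum_sq(data, model, eps):
    """Unnormalized sum of squared residuals in units of the noise eps."""
    return sum(((d - m) / eps) ** 2 for d, m in zip(data, model))

def n_independent(k_min, k_max, r_min, r_max):
    """Nyquist estimate 2*dk*dR/pi; any additive offset is left out here."""
    return 2.0 * (k_max - k_min) * (r_max - r_min) / math.pi

def scaled_chi_square(data, model, eps, n_idp, n_norm):
    """chi-square scaled by n_idp / n_norm, where n_norm is N_data or 2*N_data."""
    return (n_idp / n_norm) * weighted_sum_sq(data, model, eps)

# Hypothetical data, model, and noise level.
data = [0.10, -0.05, 0.02, 0.07]
model = [0.08, -0.04, 0.03, 0.05]
eps = 0.01
n_idp = n_independent(2.0, 12.0, 1.0, 3.0)

chi2_n = scaled_chi_square(data, model, eps, n_idp, len(data))
chi2_2n = scaled_chi_square(data, model, eps, n_idp, 2 * len(data))
print(chi2_n, chi2_2n)
```

The two results differ by exactly a factor of two, so a hand calculation that is off from the program value by that factor points to the normalization rather than to the residuals themselves.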
participants (3): John J Rehr, Matt Newville, Olga Kashurnikova