Hi,

I wanted to make some comments on last week's discussion, mostly between Matt and Shelly, in response to my request for testing volunteers. Matt's suggestions were mostly about *regression testing* of the ifeffit library. I was hoping to inspire volunteers to do *use testing* of Athena and Artemis. Both are needed.

Regression testing of ifeffit is undoubtedly important and would be a splendid way for a volunteer to contribute something of genuine value to the ifeffit project. One could imagine creating some sort of database built from standards data. Then, every time a new version of ifeffit is released, the standards data could be checked against this database.

One series of regression tests might be to Fourier transform a standard chi(k) data file using a sequence of different parameters (e.g. different k-ranges, different k-weights, phase correction, different window types, and so on -- essentially every option listed in the Ifeffit Reference Manual for the fftf() function could be poked during the regression testing). The database would contain the trusted results [1] of each permutation of the basic Fourier transform operation. The regression tests would then use some kind of R-factor-like metric to determine that the Fourier transforms under each set of conditions are sufficiently close to the trusted results, so that the Fourier transform operation can be said to still be working in the new release of ifeffit. A rough sketch of what one such comparison might look like appears below. This kind of thing could be implemented in C or perl or python or whatever -- whatever makes the implementer happy.

This kind of automated, low-level testing is very much necessary. Although Matt has done an extraordinary job [2] of providing a stable, high-quality analysis engine, there have been examples in recent months of errors being introduced in a new release. In some cases, those mistakes would not have survived good regression testing. Thus, the contribution of a good set of regression tests would help the core ifeffit library immensely.

The other day, when I asked for testing volunteers, I was asking for something a little different from regression testing. The problem with a GUI is that it encompasses not only algorithms but also a user interface built on top of them. Testing the underlying algorithms is necessary but not sufficient. It is quite possible for a GUI to have properly functioning algorithms but to have serious problems between the on-screen user experience and those properly functioning algorithms. Thus a GUI must be used by a human [3] to be tested. My understanding of how companies do this is to have a crew of folks who use the program repeatedly and report any indication that something is misbehaving.

My concept of how my codes might be tested would be for a number of people each to volunteer for some aspect of the program. For example, one person might volunteer to read files into Athena and write them out in every supported form, then plot the output data independently of Athena to verify that writing them out worked as expected. Another person might volunteer to put the alignment dialog through its paces. And so on. Not only is that kind of user-level interaction difficult to automate, but automating it might also fail to turn up some problem in the user interface [4]. If enough people volunteered to check out some aspect of the code each time a new release comes out, then no one would actually have to do that much work [5].
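To make the regression-testing idea a bit more concrete, here is a minimal sketch, in Python (though C or perl would serve just as well), of one such comparison against a trusted result. The file names, the column layout, and the tolerance are illustrative assumptions on my part, not part of any existing test suite:

  # A minimal sketch of one regression check: compare a freshly generated
  # Fourier transform output against a trusted result using an
  # R-factor-like metric.  File names, column layout, and tolerance are
  # made up here purely for illustration.
  import numpy as np

  def r_factor(trusted, new):
      """Misfit between two curves, normalized like an EXAFS R-factor."""
      return np.sum((trusted - new) ** 2) / np.sum(trusted ** 2)

  # Assume column 0 is R and column 1 is |chi(R)| in both files.
  trusted = np.loadtxt("trusted/cu_fft_kw2_hanning.chir")
  current = np.loadtxt("current/cu_fft_kw2_hanning.chir")

  tolerance = 1.0e-6   # what counts as "sufficiently close" is a choice
  r = r_factor(trusted[:, 1], current[:, 1])
  print("R = %.3g : %s" % (r, "ok" if r < tolerance else "FAILED"))

Run over every permutation of transform parameters, a script of that sort would flag any release in which the fftf() machinery drifts away from the trusted results.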
Indeed, a network of use testing volunteers could be incorporated into the release process, such that a version does not get released until the problems the testers uncover get fixed.

Soooooo... regression testing and use testing... any takers?

Regards,
B

Footnotes:

[1] The sense in which I mean "trusted" is similar to its sense in cryptography or security research. That is, the trusted results are the ones that we presume to be true, or have convinced ourselves are true, by some form of scrutiny outside the scope of the regression tests. If we allow a faulty operation to slip into the "trusted results", then future regression tests will turn up false positives that preserve that faulty result. However, with enough care and scrutiny, a set of trusted results can be made that are, indeed, trustworthy.

[2] Really, he has! Has everyone remembered to thank Matt again recently for all he has done?

[3] This is essentially the same reason that faculty have grad students rather than well-trained bonobos. ;-)

[4] And it certainly would for Athena and Artemis. I did not consider the possibility of automating the on-screen behavior when I first designed the programs. There really isn't a good way to automate screen events within my codes in the manner of, say, writing VB scripts for Microsoft Office applications. And retooling the codes to allow for that would be a huge job that would take me away from implementing new features for weeks or months, for relatively little benefit.

[5] Except me, of course!

-- 
 Bruce Ravel  ----------------------------------- ravel@phys.washington.edu
 Code 6134, Building 3, Room 222
 Naval Research Laboratory                          phone: (1) 202 767 5947
 Washington DC 20375, USA                             fax: (1) 202 767 1697

 NRL Synchrotron Radiation Consortium (NRL-SRC)
 Beamlines X11a, X11b, X23b
 National Synchrotron Light Source
 Brookhaven National Laboratory, Upton, NY 11973

 My homepage:    http://feff.phys.washington.edu/~ravel
 EXAFS software: http://feff.phys.washington.edu/~ravel/software/exafs/