Re: User interface, client/server issues, and FEFF
This is a very interesting topic and I will be a while digesting all
that Bruce has put up on the web pages. At a quick read, this all
strikes me as a VERY good idea. The piece of this that I can
contribute to is that Feff now runs on parallel processing clusters.
This is a natural for the server part of this problem.
I have been working with Jim Sims at NIST (translation: giving
occasional advice while Jim does masterful work with his parallel
processing knowledge) to get Feff running on parallel processing
clusters. A quick summary:
We used the Message Passing Interface (MPI) to create the new version, FeffMPI.
FeffMPI is nearly as portable as the original code: in addition to an
f77 compiler, all you need is the MPI libraries and a cluster to run on.
FeffMPI runs on clusters that use shared or distributed memory, or a
combination of both.
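For those curious what this looks like at the code level, here is a
minimal sketch of the kind of MPI pattern involved (this is not the
actual FeffMPI source, just an illustration in f77 of splitting
independent energy points across processes and collecting the partial
results on one node; the array names and the dummy computation are
made up):

c     Illustrative only: each process handles a subset of independent
c     energy points, then the pieces are combined on process 0.
      program sketch
      include 'mpif.h'
      integer ierr, rank, nprocs, ie, ne
      parameter (ne = 100)
      double precision work(ne), total(ne)
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      do ie = 1, ne
         work(ie) = 0.d0
c        each rank computes only its share of the energy points
         if (mod(ie-1, nprocs) .eq. rank) then
            work(ie) = dble(ie)
         endif
      enddo
c     sum the partial arrays back onto rank 0
      call MPI_REDUCE(work, total, ne, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, ierr)
      if (rank .eq. 0) write(*,*) 'first point:', total(1)
      call MPI_FINALIZE(ierr)
      end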
We have successfully compiled and run FeffMPI on two Linux/Pentium
clusters, two IBM SP2 clusters, a WinNT cluster, an SGI cluster, and
a dual Pentium shared memory machine.
We are nearly finished with merging the MPI changes into a single
Fortran code base. From that base, different makefiles can build
single-processor or MPI versions of Feff. The idea is to get the MPI
version integrated with the main code base so that development can
continue on physics and algorithmic improvements (Alex's new matrix
methods) without breaking the MPI version or splitting the code into
two development tracks.
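To give a feel for how a single code base can serve both builds, here
is one common approach (a hypothetical sketch, not necessarily how
Feff organizes it; the routine and file names par_begin, par_end,
seqstubs.f, and mpistubs.f are made up): the physics routines call
small wrapper routines, and each makefile links in either the MPI
implementations or do-nothing serial stubs.

c     ----- file seqstubs.f: stubs linked by the single-CPU makefile
      subroutine par_begin(rank, nprocs)
      integer rank, nprocs
      rank = 0
      nprocs = 1
      return
      end
      subroutine par_end
      return
      end

c     ----- file mpistubs.f: MPI versions linked by the FeffMPI makefile
      subroutine par_begin(rank, nprocs)
      include 'mpif.h'
      integer rank, nprocs, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      return
      end
      subroutine par_end
      include 'mpif.h'
      integer ierr
      call MPI_FINALIZE(ierr)
      return
      end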
The speedups are pretty amazing. We get speeds that are 50 times
faster than "typical" desktop speeds. For about $15-20k any group
that wants to can get about an order of magnitude speed improvement
in the calculations. See
http://www.ceramics.nist.gov/programs/thinfilms/parallelXANES.html
for some sense of how the speed scales with cluster size. The scaling
law is that the runtime scales with the number of processors as
Const * (0.03 + 0.97/Nprocs).
The constant term depends on the type of processor and the Fortran
compiler; apart from that scale factor, all the clusters appear to be
equivalent. About 3% of the execution time is spent in code that is
still executed sequentially on the clusters. This implies an
asymptotic speedup of about 1/0.03 = 33, meaning that a cluster of
16-32 processors gets you into the range of diminishing returns,
where it is not worth going to a larger cluster. Cluster machines in
that size range are readily available. Eventually, we will run the
profiler again to see whether any hot spots within the remaining 3%
of sequential code could be parallelized. This could easily raise the
asymptotic speedup to a factor of about 50-70.
(Note: FeffMPI is about 50 times faster than "typical" desktops, but
has an asymptotic speedup of about 33 relative to a single processor
of the SAME TYPE on a given cluster. The two numbers express
different things: the 33 is about scalability, while the 50 is about
what an average user will see when running the code.)
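To put some numbers on the diminishing returns, here is a tiny f77
program (just an illustration using the 3% serial fraction quoted
above) that evaluates the implied speedup, 1/(0.03 + 0.97/Nprocs),
for a few cluster sizes; it gives roughly 7x at 8 processors, 11x at
16, 17x at 32, and 22x at 64, against the asymptotic limit of about 33.

      program scaling
c     Speedup implied by the scaling law quoted above:
c     runtime ~ Const*(0.03 + 0.97/Nprocs), so the speedup relative
c     to one processor of the same type is 1/(0.03 + 0.97/Nprocs).
      integer i, n(5)
      double precision s
      data n /1, 8, 16, 32, 64/
      do i = 1, 5
         s = 1.d0 / (0.03d0 + 0.97d0/dble(n(i)))
         write(*,'(a,i3,a,f6.1)') ' Nprocs =', n(i), ' speedup =', s
      enddo
      end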
It would take some work to map this onto the 2-5(??) machines that a
typical XAFS group might have in their offices without disrupting
their other uses, but those machines could function as a FeffMPI
cluster during the day and as a dedicated FeffMPI cluster at night.
The other part of this is that I wonder about setting up some
reciprocity arrangements, i.e., a group in Japan uses the UW FeffMPI
"office cluster" at night in exchange for letting the UW folks do the
same with the Japanese machines when it is night in Japan, etc.
A NIST internal report has been written up on this; a PRB paper is in
the works.