
Re: User interface, client/server issues, and FEFF



This is a very interesting topic, and it will take me a while to digest all 
that Bruce has put up on the web pages. At a quick read, this all 
strikes me as a VERY good idea. The piece I can contribute is that 
Feff now runs on parallel processing clusters, which is a natural fit 
for the server side of this problem.

I have been working with Jim Sims at NIST (translation: giving 
occasional advice while Jim does masterful work with his parallel 
processing knowledge) to get Feff running on parallel processing 
clusters. A quick summary:

We use the message passing interface to create the new version FeffMPI.

FeffMPI is nearly as portable as the original code: in addition to an 
f77 compiler, you only need the MPI libraries and a cluster to run on.

FeffMPI runs on clusters that use shared or distributed memory, or a 
combination of both.

We have successfully compiled and run FeffMPI on two Linux/Pentium 
clusters, two IBM SP2 clusters, a WinNT cluster, an SGI cluster, and 
a dual Pentium shared memory machine.

We are nearly finished merging the MPI changes into a single 
Fortran code base. From that base, different makefiles can build 
either the single-processor or the MPI version of Feff. The idea is to 
keep the MPI version integrated with the main code base 
so that development can continue on physics and algorithmic 
improvements (Alex's new matrix methods) without breaking the MPI 
version or splitting the code into two development tracks.
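The single-code-base idea could be sketched as two build targets over the same sources. This is purely hypothetical; the actual makefiles, target names, file names, and flags in the Feff distribution are not shown here:

```make
# Hypothetical sketch only: one Fortran source tree, two build targets.
# All names and flags here are illustrative, not the real Feff ones.
SRC = feff.f            # stand-in for the actual source list

serial:                 # ordinary single-processor Feff
	f77 -O2 -o feff $(SRC)

mpi:                    # FeffMPI, built with the MPI compiler wrapper
	mpif77 -O2 -o feffmpi $(SRC)
```

The point of this arrangement is that physics changes land once, in the shared sources, and both binaries pick them up at the next build.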

The speedups are pretty amazing. We get speeds that are 50 times 
faster than "typical" desktop speeds, and for about $15-20k any 
interested group can get roughly an order of magnitude speed improvement 
in its calculations. See
http://www.ceramics.nist.gov/programs/thinfilms/parallelXANES.html
for some sense of how the speed scales with cluster size. The scaling law is
that the runtime scales with number of processors as

runtime(Nprocs) = Const * (0.03 + 0.97/Nprocs).

The constant depends on the type of processor and the Fortran 
compiler; apart from that scale factor, all clusters appear to be 
equivalent. About 3% of the execution time is in code that is still 
executed sequentially on the clusters. This implies an asymptotic 
speedup of about 1/0.03 = 33, meaning that a cluster of 16-32 
processors gets you into the range of diminishing returns, 
where it is not worth going to a larger cluster. Cluster machines in 
that size range are readily available. Eventually, we will run the 
profiler again to see whether any hot spots within the 3% of the 
code that is still sequential could be parallelized. That could 
easily raise the asymptotic speedup to a factor of 50-70.
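The scaling law above is just Amdahl's law with a roughly 3% serial fraction. A short sketch (using only the 0.03/0.97 split quoted above; the function names are mine, not part of FeffMPI) shows where the diminishing returns set in:

```python
# Amdahl's-law model of FeffMPI runtime, using the serial fraction
# quoted above (about 3% of execution time stays sequential).
SERIAL = 0.03            # fraction of runtime that is still sequential
PARALLEL = 1.0 - SERIAL  # fraction that parallelizes across the cluster

def relative_runtime(nprocs):
    """Runtime relative to one processor of the same type."""
    return SERIAL + PARALLEL / nprocs

def speedup(nprocs):
    """Speedup over a single processor of the same cluster type."""
    return 1.0 / relative_runtime(nprocs)

# The asymptotic limit as nprocs grows is 1/SERIAL, about 33.
print(f"asymptotic speedup: {1.0 / SERIAL:.1f}")
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} procs: speedup {speedup(n):5.1f}")
```

Running this shows the speedup already flattening between 16 and 32 processors, which is why a larger cluster buys little for this serial fraction.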

(Note: FeffMPI is about 50 times faster than "typical" desktops, but 
has an asymptotic speedup of about 33 relative to a single processor 
of the SAME TYPE on a given cluster. The two numbers express 
different things: the 33 is about scalability; the 50 is about what 
an average user will see when running the code.)

It would take some work to map this onto the 2-5(??) machines that a 
typical XAFS group might have in their offices, but those machines 
could function as a part-time FeffMPI cluster during the day without 
disrupting their other uses, and as a dedicated FeffMPI cluster at night.

The other part of this is that I wonder about setting up some 
reciprocity arrangements, i.e., a group in Japan uses the UW FeffMPI 
"office cluster" at night in exchange for letting the UW folks do the 
same with the Japanese machines when it is night in Japan, etc.

A NIST internal report has been written up on this; a PRB paper is in 
the works.