Re: MPI Calculation Strategy with Lanczos Algorithm

To: "FEFF Software Development" <feffdevel@u.washington.edu>
Subject: Re: MPI Calculation Strategy with Lanczos Algorithm
From: "John J. Rehr" <jjr@leonardo.phys.washington.edu>
Date: Fri, 4 May 2001 14:26:51 -0700 (PDT)
In-Reply-To: <3AF2D268.2C6CCF0A@nist.gov> from James S Sims at "May 4, 2001 12:01:44pm"

The tests that Alex has been doing with Lanczos methods indicate that a
van der Vorst scheme called BiCGSTAB (i.e. "stabilized bi-conjugate gradient", I think),
which is essentially a Lanczos procedure applied to LU, is both efficient and highly
stable. Overall the method is typically about 5 times faster than LU.

Interestingly, the number of iterations required for convergence can be interpreted
roughly as the order of the MS expansion used. For a 600 atom Si run, this number
starts at about 50 iterations at low energies and then drops to about 5-6 at
the highest energies. In path expansion terms, this means one needs paths with
up to 50 legs near threshold for XANES but only 5-leg paths in EXAFS. Thus this
shows how the calculation crosses over from full multiple scattering to the
path expansion with increasing energy.

This also suggests a way of revising the MPI strategy. Presently we assign the
first processor the first nloc = nex/numproc energies, e.g., 0 - nloc, the
next processor nloc+1 ... 2nloc, etc. This works fine with the default LU method, when
all processors do exactly the same N**3 steps of exact matrix inversion.

However, with the Lanczos approach, the last processor will be sitting idle most of the time,
having finished each of its energy points in about 5 iterations, while the first processor
will still be groaning away. Thus it's preferable to reorder the work, so that the processor
of rank i does energy points i, i + numproc, i + 2*numproc, ... and the work is more evenly
spread out. Is there a version of MPE_DECOMP1D that can do this kind of ordering?
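
To make the comparison concrete, here is a rough Python sketch (not FEFF code) of the
two assignments. The function names and the cost model are hypothetical: the cost is
simply assumed to fall linearly from about 50 iterations at the lowest energy to about
5 at the highest, and the block split is only an approximation of the present scheme.

```python
def block_indices(rank, numproc, nex):
    """Contiguous block of energy points for `rank` (roughly the present scheme).

    Any remainder of nex / numproc is handed to the lowest ranks.
    """
    nloc, rem = divmod(nex, numproc)
    start = rank * nloc + min(rank, rem)
    stop = start + nloc + (1 if rank < rem else 0)
    return list(range(start, stop))


def cyclic_indices(rank, numproc, nex):
    """Round-robin assignment: rank i gets i, i + numproc, i + 2*numproc, ..."""
    return list(range(rank, nex, numproc))


def iteration_cost(ie, nex, n_low=50.0, n_high=5.0):
    """Assumed cost model: iterations drop linearly with energy index."""
    if nex == 1:
        return n_low
    return n_low + (n_high - n_low) * ie / (nex - 1)


def load_per_rank(assign, numproc, nex):
    """Total assumed iteration count handled by each rank under `assign`."""
    return [
        sum(iteration_cost(ie, nex) for ie in assign(rank, numproc, nex))
        for rank in range(numproc)
    ]


if __name__ == "__main__":
    nex, numproc = 100, 4
    for name, assign in (("block", block_indices), ("cyclic", cyclic_indices)):
        loads = load_per_rank(assign, numproc, nex)
        print(name, [round(x, 1) for x in loads])
```

Under this assumed cost model the block split leaves the last rank with only a fraction
of the first rank's work, while the round-robin split gives every rank nearly the same
total. In an MPI code the same ordering is just a strided loop over the energy index,
starting at the rank and stepping by numproc.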

Let me know if you have other comments on this idea.

This change does not have to be done immediately, of course. But it will likely
be very useful in the future, especially when the big rotation matrix drix is reconfigured.

J. Rehr

Follow-Ups:
- Re: MPI Calculation Strategy with Lanczos Algorithm
  - From: Francois Farges <farges@univ-mlv.fr>
- Re: MPI Calculation Strategy with Lanczos Algorithm
  - From: Charles Bouldin <charles.bouldin@nist.gov>
