
A Short Guide to SpeedShop


SGI's SpeedShop, an integrated package of performance tools, is used
here for the performance analysis of FeffMPI. For an introduction to
SpeedShop, read the SpeedShop User's Guide, which describes and
illustrates methods for measuring program performance with the
SpeedShop commands. The guide is available from
http://www.techpubs.sgi.com/library, or through 'insight', the IRIS
InSight Online Documentation (Book) Viewer. In particular, read the
section "Running Experiments on MPI Programs" in Chapter 6, "Setting
Up and Running Experiments: ssrun".


A short guide to using SpeedShop is provided below:

1. Do a normal compilation of your source code to create the
   executable files, such as ffmod1, ffmod2, etc.
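
For example, each module might be built with a command along these
lines (a hypothetical command; the actual compiler, flags, and source
files depend on your FeffMPI build setup):

f90 -n32 -O2 -o ../../bin/ffmod1 pot_tot.f -lmpi

No special instrumentation is needed; the pcsampx experiment works on
a normally compiled executable.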

2. Create a batch script file, feffmpi8_run, as follows:

#!/bin/sh -vf
#QSUB -r feff1 -o feffmpi_8.out -eo
#QSUB -ro
#QSUB -lt 3600 -lT 3600
#QSUB -lm 128mb -lM 128mb
#QSUB -l mpp_p=8
MPI_RLD_HACK_OFF=1; export MPI_RLD_HACK_OFF
cd /tay/nist/hung/Feff825/TestMPI/runs/Gan.8p
mpirun -np 8 ssrun -pcsampx ../../bin/rdinp
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod1
mpirun -np 8 ssrun -pcsampx ../../bin/ldos
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod2
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod3
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod4
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod5
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod6


The environment variable MPI_RLD_HACK_OFF must be set to 1 before
the ssrun commands are executed.
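
Note that the script runs under /bin/sh, so the variable is set with
the Bourne-shell idiom

MPI_RLD_HACK_OFF=1; export MPI_RLD_HACK_OFF

whereas under csh or tcsh the equivalent would be

setenv MPI_RLD_HACK_OFF 1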

The -pcsampx option estimates process virtual time: the time your
program spends actually executing, excluding time the system spends
providing services such as executing system calls. It uses 32-bit
counters and samples every 10 ms. (In the report shown in step 5,
7562 samples at 10 ms each account for the 75.620 seconds of
accumulated time.)



3. On NIST Origins, we submit the batch request to NQS:

qsub feffmpi8_run

However, the script can also be run interactively, by typing
feffmpi8_run. In an interactive run, the #QSUB lines are treated as
ordinary comments.
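
While the batch request is queued or running, its state can be
checked with the NQS status command (options vary between NQS
installations):

qstat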

All output files will be written in the working directory. In this
case, they are in /tay/nist/hung/Feff825/TestMPI/runs/Gan.8p, the
working directory as prescribed in the batch file, feffmpi8_run.

ssrun writes one data file for each process that mpirun spawns,
including each of the actual MPI application slaves. For example,

mpirun -np 8 ssrun -pcsampx ../../bin/ffmod1

generates the following 10 files:

ffmod1.pcsampx.e18050811
ffmod1.pcsampx.f17784230
ffmod1.pcsampx.f17981454
ffmod1.pcsampx.f18027285
ffmod1.pcsampx.f18047906
ffmod1.pcsampx.f18049449
ffmod1.pcsampx.f18050314
ffmod1.pcsampx.f18050794
ffmod1.pcsampx.f18051358
ffmod1.pcsampx.m18050811 

The performance data for the eight MPI slave processes are contained
in the files ffmod1.pcsampx.f********; the letter in the suffix
indicates how the process was created (f for a forked slave process,
m for the master, e for an exec'ed process) and the digits give the
process ID.


4. The last step is to use prof to analyze the data files generated by
ssrun and produce reports. Prof writes its analysis of the performance
data to stdout, so to save a report one generally redirects the output
to a file.

In our example, we do

prof ffmod1.pcsampx.f17784230 > ffmod1.pcsampx.p1
prof ffmod1.pcsampx.f17981454 > ffmod1.pcsampx.p2
...
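
Rather than typing one prof command per data file, a short
Bourne-shell loop can generate all the reports at once (a sketch,
assuming the file naming shown above):

n=1
for f in ffmod1.pcsampx.f*
do
    prof $f > ffmod1.pcsampx.p$n
    n=`expr $n + 1`
done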


5. To view the report file ffmod1.pcsampx.p1 just created, type:

more ffmod1.pcsampx.p1


-------------------------------------------------------------------------
SpeedShop profile listing generated Fri Apr 27 13:41:16 2001
   prof ffmod1.pcsampx.f17784230
                  ffmod1 (n32): Target program
                       pcsampx: Experiment name
               pc,4,10000,0:cu: Marching orders
               R12000 / R12010: CPU / FPU
                            32: Number of CPUs
                           300: Clock frequency (MHz.)
  Experiment notes--
	  From file ffmod1.pcsampx.f17784230:
	Caliper point 0 at target begin, PID 17784230
			/tay/nist/hung/Feff825/TestMPI/bin/ffmod1
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical PC sampling data (pcsampx)--
                          7562: Total samples
                        75.620: Accumulated time (secs.)
                          10.0: Time per sample (msecs.)
                             4: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
 [index]      secs    %    cum.%   samples  function (dso: file, line)

     [1]    24.780  32.8%  32.8%      2478  yprep (ffmod1: pot_tot.f, 9236)
     [2]     6.470   8.6%  41.3%       647  MPI_SGI_progress (libmpi.so: progress.c, 70)
     [3]     5.640   7.5%  48.8%       564  xgllm (ffmod1: pot_tot.f, 8367)
     [4]     5.480   7.2%  56.0%       548  fms (ffmod1: pot_tot.f, 7759)
     [5]     3.620   4.8%  60.8%       362  MPI_SGI_bypass_progress (libmpi.so: bypass.c, 1021)
     [6]     3.400   4.5%  65.3%       340  MPI_SGI_inet_progress (libmpi.so: inet.c, 522)
     [7]     2.940   3.9%  69.2%       294  MPI_SGI_getqueue (libmpi.so: getqueue.s, 29)
     [8]     2.730   3.6%  72.8%       273  __libm_rcis (libm.so: rcis.c, 95)
     [9]     2.530   3.3%  76.2%       253  MPI_SGI_request_wait (libmpi.so: req.c, 755)
    [10]     2.290   3.0%  79.2%       229  MPI_SGI_shared_progress (libmpi.so: shared.c, 163)
    [11]     2.150   2.8%  82.0%       215  terp (ffmod1: pot_tot.f, 17760)
    [12]     1.980   2.6%  84.6%       198  MPI_SGI_request_test (libmpi.so: req.c, 698)
    [13]     1.690   2.2%  86.9%       169  CTRSM (ffmod1: pot_tot.f, 14364)
    [14]     1.310   1.7%  88.6%       131  __expf (libm.so: fexp.c, 96)
    [15]     1.050   1.4%  90.0%       105  dfovrg (ffmod1: pot_tot.f, 10540)
    [16]     0.820   1.1%  91.1%        82  memcpy (libc.so.1: bcopy.s, 329)
    [17]     0.670   0.9%  92.0%        67  terpc (ffmod1: pot_tot.f, 17942)
    [18]     0.570   0.8%  92.7%        57  solin (ffmod1: pot_tot.f, 11736)
    [19]     0.560   0.7%  93.5%        56  gglu (ffmod1: pot_tot.f, 8506)
    [20]     0.480   0.6%  94.1%        48  besjn (ffmod1: pot_tot.f, 16814)
    [21]     0.460   0.6%  94.7%        46  fixdsx (ffmod1: pot_tot.f, 18234)
    [22]     0.450   0.6%  95.3%        45  intout (ffmod1: pot_tot.f, 11005)
    [23]     0.410   0.5%  95.8%        41  __exp (libm.so: exp.c, 103)
    [24]     0.300   0.4%  96.2%        30  soldir (ffmod1: pot_tot.f, 7152)
    [25]     0.280   0.4%  96.6%        28  fixvar (ffmod1: pot_tot.f, 18369)
    [26]     0.250   0.3%  96.9%        25  ff2g (ffmod1: pot_tot.f, 1054)
    [27]     0.200   0.3%  97.2%        20  __pow (libm.so: pow.c, 256)
    [28]     0.170   0.2%  97.4%        17  potrdf (ffmod1: pot_tot.f, 5731)
    [29]     0.160   0.2%  97.6%        16  MPI_SGI_buddy_alloc (libmpi.so: buddy.c, 195)
    [30]     0.140   0.2%  97.8%        14  MPI_SGI_packet_state_data (libmpi.so: packet_state.c, 251)
    [31]     0.140   0.2%  98.0%        14  rholie (ffmod1: pot_tot.f, 1910)
    [32]     0.140   0.2%  98.2%        14  diff (ffmod1: pot_tot.f, 10762)
    [33]     0.120   0.2%  98.4%        12  scmtmp (ffmod1: pot_tot.f, 4490)
    [34]     0.090   0.1%  98.5%         9  CSCAL (ffmod1: pot_tot.f, 14078)
    [35]     0.080   0.1%  98.6%         8  ICAMAX (ffmod1: pot_tot.f, 15706)
    [36]     0.080   0.1%  98.7%         8  buddy_index (libmpi.so: buddy.c, 178)
    [37]     0.080   0.1%  98.8%         8  __log (libm.so: log.c, 207)
    [38]     0.070   0.1%  98.9%         7  CGERU (ffmod1: pot_tot.f, 14204)
    [39]     0.070   0.1%  99.0%         7  wfirdc (ffmod1: pot_tot.f, 12229)
    [40]     0.070   0.1%  99.1%         7  sumax (ffmod1: pot_tot.f, 2595)
    [41]     0.060   0.1%  99.2%         6  potdvp (ffmod1: pot_tot.f, 11527)
    [42]     0.050   0.1%  99.2%         5  MPI_SGI_packet_alloc_payload (libmpi.so: packet_alloc.c, 39)
    [43]     0.040   0.1%  99.3%         4  nucdec (ffmod1: pot_tot.f, 11295)
    [44]     0.040   0.1%  99.3%         4  fdrirk (ffmod1: pot_tot.f, 5947)
    [45]     0.040   0.1%  99.4%         4  xclmz (ffmod1: pot_tot.f, 8333)
    [46]     0.040   0.1%  99.4%         4  __atan2f (libm.so: fatan2.c, 163)
    [47]     0.040   0.1%  99.5%         4  __acosf (libm.so: facos.c, 119)
    [48]     0.030   0.0%  99.5%         3  CGEMM (ffmod1: pot_tot.f, 14782)
    [49]     0.030   0.0%  99.6%         3  flatv (ffmod1: pot_tot.f, 10671)
    [50]     0.030   0.0%  99.6%         3  getorb (ffmod1: pot_tot.f, 18514)
    [51]     0.020   0.0%  99.6%         2  list_insert (libmpi.so: buddy.c, 113)
    [52]     0.020   0.0%  99.7%         2  movrlp (ffmod1: pot_tot.f, 3349)
    [53]     0.020   0.0%  99.7%         2  CGETF2 (ffmod1: pot_tot.f, 13794)
    [54]     0.020   0.0%  99.7%         2  istprm (ffmod1: pot_tot.f, 2946)
    [55]     0.020   0.0%  99.7%         2  inmuac (ffmod1: pot_tot.f, 10902)
    [56]     0.020   0.0%  99.8%         2  csomm2 (ffmod1: pot_tot.f, 17309)
    [57]     0.020   0.0%  99.8%         2  MPI_SGI_pack_some (libmpi.so: pack.c, 81)
    [58]     0.010   0.0%  99.8%         1  __cosh (libm.so: sinh.c, 364)
    [59]     0.010   0.0%  99.8%         1  __read (libc.so.1: read.s, 20)
    [60]     0.010   0.0%  99.8%         1  __sinh (libm.so: sinh.c, 153)
    [61]     0.010   0.0%  99.8%         1  CLASWP (ffmod1: pot_tot.f, 14264)
    [62]     0.010   0.0%  99.9%         1  vlda (ffmod1: pot_tot.f, 6734)
    [63]     0.010   0.0%  99.9%         1  lagdat (ffmod1: pot_tot.f, 5992)
    [64]     0.010   0.0%  99.9%         1  MPI_SGI_packet_state_medium (libmpi.so: packet_state.c, 84)
    [65]     0.010   0.0%  99.9%         1  inipot (ffmod1: pot_tot.f, 1268)
    [66]     0.010   0.0%  99.9%         1  scfdat (ffmod1: pot_tot.f, 4955)
    [67]     0.010   0.0%  99.9%         1  dsordf (ffmod1: pot_tot.f, 6356)
    [68]     0.010   0.0%  99.9%         1  MPI_SGI_getlist (libmpi.so: getlist.s, 29)
    [69]     0.010   0.0%  99.9%         1  cgetrs (ffmod1: pot_tot.f, 13846)
    [70]     0.010   0.0% 100.0%         1  t_getc (libftn.so: lread.c, 125)
    [71]     0.010   0.0% 100.0%         1  z_wnew (libftn.so: iio.c, 134)
    [72]     0.010   0.0% 100.0%         1  s_cmp (libftn.so: s_cmp.c, 63)
    [73]     0.010   0.0% 100.0%         1  MPI_SGI_putqueue (libmpi.so: putqueue.s, 28)

            75.620 100.0% 100.0%      7562  TOTAL