A Short Guide to SpeedShop
SGI's SpeedShop, an integrated package of performance tools, was used for the
performance analysis of FeffMPI. For an introduction to SpeedShop, read the
SpeedShop User's Guide, which describes and illustrates methods for measuring
program performance with the SpeedShop commands. The guide is available at
http://www.techpubs.sgi.com/library, or use 'insight', the IRIS InSight Online
Documentation (Book) Viewer. In particular, read the section "Running
Experiments on MPI Programs" in Chapter 6, "Setting Up and Running
Experiments: ssrun".
A short guide to using SpeedShop is provided below:
1. Do a normal compilation of your source code and create executables
   such as ffmod1, ffmod2, etc. (a typical compile line is sketched below).
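   A normal MPI build on the Origin might look like the following; the
   compiler flags, the source file name, and the -lmpi link are only
   illustrative assumptions here, not the actual FeffMPI makefile commands
   (the -n32 ABI matches the prof listing shown later):

       f77 -n32 -O2 -o ffmod1 pot_tot.f -lmpi

   No ssrun-specific options are needed at this stage; pcsampx is a
   statistical PC-sampling experiment and does not require special
   instrumentation in the executable.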
2. Create a batch script file, feffmpi8_run, as follows:
#!/bin/sh -vf
#QSUB -r feff1 -o feffmpi_8.out -eo
#QSUB -ro
#QSUB -lt 3600 -lT 3600
#QSUB -lm 128mb -lM 128mb
#QSUB -l mpp_p=8
# MPI_RLD_HACK_OFF must be set to 1 before ssrun is invoked
export MPI_RLD_HACK_OFF=1
cd /tay/nist/hung/Feff825/TestMPI/runs/Gan.8p
# run each FeffMPI module on 8 processors under ssrun;
# ssrun writes one pcsampx data file per spawned process
mpirun -np 8 ssrun -pcsampx ../../bin/rdinp
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod1
mpirun -np 8 ssrun -pcsampx ../../bin/ldos
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod2
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod3
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod4
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod5
mpirun -np 8 ssrun -pcsampx ../../bin/ffmod6
The environment variable MPI_RLD_HACK_OFF must be set to 1 before the
ssrun command is executed.
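How the variable is set depends on the shell interpreting the batch script;
either of the standard forms works:

    export MPI_RLD_HACK_OFF=1     # Bourne/Korn shell (as in the script above)
    setenv MPI_RLD_HACK_OFF 1     # C shell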
The -pcsampx option estimates the process virtual time. This includes the
time your program is actually executing, but it excludes the time the
system spends providing services, such as executing system calls. It uses
32-bit counters and samples the program counter every 10 ms.
3. On the NIST Origins, we submit the batch request to NQS:
   qsub feffmpi8_run
   The script can also be run interactively by typing feffmpi8_run (see the
   commands sketched below); in an interactive run, the QSUB directives are
   treated as ordinary comments.
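   If the batch system is not used, the same commands can be run from an
   interactive session, for example:

       chmod +x feffmpi8_run
       ./feffmpi8_run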
All output files are written in the working directory; in this case they
are in /tay/nist/hung/Feff825/TestMPI/runs/Gan.8p, the working directory
prescribed in the batch file feffmpi8_run. ssrun writes an output file for
each process that the MPI program spawns, including the actual MPI
application slaves. For example,
    mpirun -np 8 ssrun -pcsampx ../../bin/ffmod1
generates the following 10 files:
ffmod1.pcsampx.e18050811
ffmod1.pcsampx.f17784230
ffmod1.pcsampx.f17981454
ffmod1.pcsampx.f18027285
ffmod1.pcsampx.f18047906
ffmod1.pcsampx.f18049449
ffmod1.pcsampx.f18050314
ffmod1.pcsampx.f18050794
ffmod1.pcsampx.f18051358
ffmod1.pcsampx.m18050811
The performance data for the MPI application processes are contained in the
files ffmod1.pcsampx.f******** (one per process, with the process ID as the
suffix).
4. The last step is to use prof to analyze the data files generated by
   ssrun and produce reports. prof writes an analysis of the performance
   data to stdout; to save a report, redirect the output to a file.
In our example, we do
prof ffmod1.pcsampx.f17784230 > ffmod1.pcsampx.p1
prof ffmod1.pcsampx.f17981454 > ffmod1.pcsampx.p2
...
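To generate a report for every per-process data file in one pass, a small
Bourne-shell loop such as the following can be used; it simply repeats the
prof commands above with the same .p1, .p2, ... report names (the loop is
only a convenience sketch, not part of the original procedure):

    i=1
    for f in ffmod1.pcsampx.f*
    do
        prof $f > ffmod1.pcsampx.p$i
        i=`expr $i + 1`
    done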
5. To view the report file, ffmod1.pcsampx.p1, just created, type:
more ffmod1.pcsampx.p1
-------------------------------------------------------------------------
SpeedShop profile listing generated Fri Apr 27 13:41:16 2001
prof ffmod1.pcsampx.f17784230
ffmod1 (n32): Target program
pcsampx: Experiment name
pc,4,10000,0:cu: Marching orders
R12000 / R12010: CPU / FPU
32: Number of CPUs
300: Clock frequency (MHz.)
Experiment notes--
From file ffmod1.pcsampx.f17784230:
Caliper point 0 at target begin, PID 17784230
/tay/nist/hung/Feff825/TestMPI/bin/ffmod1
Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical PC sampling data (pcsampx)--
7562: Total samples
75.620: Accumulated time (secs.)
10.0: Time per sample (msecs.)
4: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
[index] secs % cum.% samples function (dso: file, line)
[1] 24.780 32.8% 32.8% 2478 yprep (ffmod1: pot_tot.f, 9236)
[2] 6.470 8.6% 41.3% 647 MPI_SGI_progress (libmpi.so: progress.c, 70)
[3] 5.640 7.5% 48.8% 564 xgllm (ffmod1: pot_tot.f, 8367)
[4] 5.480 7.2% 56.0% 548 fms (ffmod1: pot_tot.f, 7759)
[5] 3.620 4.8% 60.8% 362 MPI_SGI_bypass_progress (libmpi.so: bypass.c, 1021)
[6] 3.400 4.5% 65.3% 340 MPI_SGI_inet_progress (libmpi.so: inet.c, 522)
[7] 2.940 3.9% 69.2% 294 MPI_SGI_getqueue (libmpi.so: getqueue.s, 29)
[8] 2.730 3.6% 72.8% 273 __libm_rcis (libm.so: rcis.c, 95)
[9] 2.530 3.3% 76.2% 253 MPI_SGI_request_wait (libmpi.so: req.c, 755)
[10] 2.290 3.0% 79.2% 229 MPI_SGI_shared_progress (libmpi.so: shared.c, 163)
[11] 2.150 2.8% 82.0% 215 terp (ffmod1: pot_tot.f, 17760)
[12] 1.980 2.6% 84.6% 198 MPI_SGI_request_test (libmpi.so: req.c, 698)
[13] 1.690 2.2% 86.9% 169 CTRSM (ffmod1: pot_tot.f, 14364)
[14] 1.310 1.7% 88.6% 131 __expf (libm.so: fexp.c, 96)
[15] 1.050 1.4% 90.0% 105 dfovrg (ffmod1: pot_tot.f, 10540)
[16] 0.820 1.1% 91.1% 82 memcpy (libc.so.1: bcopy.s, 329)
[17] 0.670 0.9% 92.0% 67 terpc (ffmod1: pot_tot.f, 17942)
[18] 0.570 0.8% 92.7% 57 solin (ffmod1: pot_tot.f, 11736)
[19] 0.560 0.7% 93.5% 56 gglu (ffmod1: pot_tot.f, 8506)
[20] 0.480 0.6% 94.1% 48 besjn (ffmod1: pot_tot.f, 16814)
[21] 0.460 0.6% 94.7% 46 fixdsx (ffmod1: pot_tot.f, 18234)
[22] 0.450 0.6% 95.3% 45 intout (ffmod1: pot_tot.f, 11005)
[23] 0.410 0.5% 95.8% 41 __exp (libm.so: exp.c, 103)
[24] 0.300 0.4% 96.2% 30 soldir (ffmod1: pot_tot.f, 7152)
[25] 0.280 0.4% 96.6% 28 fixvar (ffmod1: pot_tot.f, 18369)
[26] 0.250 0.3% 96.9% 25 ff2g (ffmod1: pot_tot.f, 1054)
[27] 0.200 0.3% 97.2% 20 __pow (libm.so: pow.c, 256)
[28] 0.170 0.2% 97.4% 17 potrdf (ffmod1: pot_tot.f, 5731)
[29] 0.160 0.2% 97.6% 16 MPI_SGI_buddy_alloc (libmpi.so: buddy.c, 195)
[30] 0.140 0.2% 97.8% 14 MPI_SGI_packet_state_data (libmpi.so: packet_state.c, 251)
[31] 0.140 0.2% 98.0% 14 rholie (ffmod1: pot_tot.f, 1910)
[32] 0.140 0.2% 98.2% 14 diff (ffmod1: pot_tot.f, 10762)
[33] 0.120 0.2% 98.4% 12 scmtmp (ffmod1: pot_tot.f, 4490)
[34] 0.090 0.1% 98.5% 9 CSCAL (ffmod1: pot_tot.f, 14078)
[35] 0.080 0.1% 98.6% 8 ICAMAX (ffmod1: pot_tot.f, 15706)
[36] 0.080 0.1% 98.7% 8 buddy_index (libmpi.so: buddy.c, 178)
[37] 0.080 0.1% 98.8% 8 __log (libm.so: log.c, 207)
[38] 0.070 0.1% 98.9% 7 CGERU (ffmod1: pot_tot.f, 14204)
[39] 0.070 0.1% 99.0% 7 wfirdc (ffmod1: pot_tot.f, 12229)
[40] 0.070 0.1% 99.1% 7 sumax (ffmod1: pot_tot.f, 2595)
[41] 0.060 0.1% 99.2% 6 potdvp (ffmod1: pot_tot.f, 11527)
[42] 0.050 0.1% 99.2% 5 MPI_SGI_packet_alloc_payload (libmpi.so: packet_alloc.c, 39)
[43] 0.040 0.1% 99.3% 4 nucdec (ffmod1: pot_tot.f, 11295)
[44] 0.040 0.1% 99.3% 4 fdrirk (ffmod1: pot_tot.f, 5947)
[45] 0.040 0.1% 99.4% 4 xclmz (ffmod1: pot_tot.f, 8333)
[46] 0.040 0.1% 99.4% 4 __atan2f (libm.so: fatan2.c, 163)
[47] 0.040 0.1% 99.5% 4 __acosf (libm.so: facos.c, 119)
[48] 0.030 0.0% 99.5% 3 CGEMM (ffmod1: pot_tot.f, 14782)
[49] 0.030 0.0% 99.6% 3 flatv (ffmod1: pot_tot.f, 10671)
[50] 0.030 0.0% 99.6% 3 getorb (ffmod1: pot_tot.f, 18514)
[51] 0.020 0.0% 99.6% 2 list_insert (libmpi.so: buddy.c, 113)
[52] 0.020 0.0% 99.7% 2 movrlp (ffmod1: pot_tot.f, 3349)
[53] 0.020 0.0% 99.7% 2 CGETF2 (ffmod1: pot_tot.f, 13794)
[54] 0.020 0.0% 99.7% 2 istprm (ffmod1: pot_tot.f, 2946)
[55] 0.020 0.0% 99.7% 2 inmuac (ffmod1: pot_tot.f, 10902)
[56] 0.020 0.0% 99.8% 2 csomm2 (ffmod1: pot_tot.f, 17309)
[57] 0.020 0.0% 99.8% 2 MPI_SGI_pack_some (libmpi.so: pack.c, 81)
[58] 0.010 0.0% 99.8% 1 __cosh (libm.so: sinh.c, 364)
[59] 0.010 0.0% 99.8% 1 __read (libc.so.1: read.s, 20)
[60] 0.010 0.0% 99.8% 1 __sinh (libm.so: sinh.c, 153)
[61] 0.010 0.0% 99.8% 1 CLASWP (ffmod1: pot_tot.f, 14264)
[62] 0.010 0.0% 99.9% 1 vlda (ffmod1: pot_tot.f, 6734)
[63] 0.010 0.0% 99.9% 1 lagdat (ffmod1: pot_tot.f, 5992)
[64] 0.010 0.0% 99.9% 1 MPI_SGI_packet_state_medium (libmpi.so: packet_state.c, 84)
[65] 0.010 0.0% 99.9% 1 inipot (ffmod1: pot_tot.f, 1268)
[66] 0.010 0.0% 99.9% 1 scfdat (ffmod1: pot_tot.f, 4955)
[67] 0.010 0.0% 99.9% 1 dsordf (ffmod1: pot_tot.f, 6356)
[68] 0.010 0.0% 99.9% 1 MPI_SGI_getlist (libmpi.so: getlist.s, 29)
[69] 0.010 0.0% 99.9% 1 cgetrs (ffmod1: pot_tot.f, 13846)
[70] 0.010 0.0% 100.0% 1 t_getc (libftn.so: lread.c, 125)
[71] 0.010 0.0% 100.0% 1 z_wnew (libftn.so: iio.c, 134)
[72] 0.010 0.0% 100.0% 1 s_cmp (libftn.so: s_cmp.c, 63)
[73] 0.010 0.0% 100.0% 1 MPI_SGI_putqueue (libmpi.so: putqueue.s, 28)
75.620 100.0% 100.0% 7562 TOTAL
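As a quick consistency check on the pcsampx report, the accumulated time in
the summary is just the sample count multiplied by the 10 ms sampling
period, 7562 x 0.010 s = 75.62 s, and each function's time is likewise its
sample count times 10 ms (for example, yprep: 2478 x 0.010 s = 24.78 s).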