Performance Analysis¶
Performance Analysis Tools | Version |
---|---|
Intel VTune | 2015, 2016, 2017, 2018 |
PGI pgprof | 2015, 2016, 2017 |
GNU gprof | 2.25, 2.26, 2.27, 2.28, 2.29 |
Scalasca | 2.2.2, 2.3.1 |
mpiP | 3.4.1 |
nvprof | 6.5.14, 7.0.28, 7.5.18, 8.0.27, 8.0.44, 8.0.61 |
GPROF¶
To use the GNU profiler gprof, load the binutils module:

module load binutils
GNU gprof is a widely used profiling tool for Unix systems which produces an execution profile of C and Fortran programs. It can show the application call graph, which represents the calling relationships between functions in the program, and the percentage of total execution time spent in each function.
Compile and link your code with the -pg flag.
gcc [flags] -g [source_file] -o [output_file] -pg
Invoke gprof to analyze and display the profiling results.
gprof options [executable-file] gmon.out bb-data [yet-more-profile-data-files...] [> outfile]
Output Options

- --flat-profile: prints the total amount of time spent and the number of calls to each function
- --graph: prints the call-graph analysis from the application execution
- --annotated-source: prints profiling information next to the original source code
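As a quick end-to-end illustration, the sketch below compiles an instrumented binary, runs it to produce gmon.out, and then prints the flat profile and the call graph. The file names myapp.c, flat_profile.txt, and call_graph.txt are only placeholders.

```
# build with profiling instrumentation enabled
gcc -O2 -g -pg myapp.c -o myapp

# running the program writes gmon.out into the current directory
./myapp

# print the flat profile and the call graph from the collected data
gprof --flat-profile ./myapp gmon.out > flat_profile.txt
gprof --graph ./myapp gmon.out > call_graph.txt
```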
VTUNE Amplifier XE¶
Whether you are tuning for the first time or doing advanced performance optimization, Intel® VTune™ Amplifier XE provides the data needed to meet a wide variety of tuning needs. Collect a rich set of performance data for hotspots, threading, OpenCL, locks and waits, DirectX*, bandwidth, and more. But good data is not enough. You need tools to mine the data and make it easy to interpret. Powerful analysis lets you sort, filter, and visualize results on the timeline and on your source. Identify serial time and load imbalance. Select slow OpenMP instances and discover why they are slow.
module load intel
GUI¶
amplxe-gui
Please use the GUI only on the login nodes to analyze your report.
Command Line¶
You can use the command-line tool to analyze your program on the compute nodes.
amplxe-cl
Check the help information
amplxe-cl -help
amplxe-cl -help collect
Perform a hotspot analysis
amplxe-cl -collect hotspots -result-dir mydir /home/test/myprogram
Check the result summary
amplxe-cl -R summary -r mydir
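For example, a minimal Slurm batch script for collecting and summarizing hotspot data on a compute node could look like the sketch below; the resource requests and ./myprogram are placeholders to adapt to your own run.

```
#!/bin/bash
#SBATCH --job-name=vtune-hotspots
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

module load intel

# collect hotspot data on the compute node, then print the summary
amplxe-cl -collect hotspots -result-dir mydir ./myprogram
amplxe-cl -R summary -r mydir
```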
SCALASCA¶
Scalasca is a software tool that supports the performance optimization of parallel programs by measuring and analyzing their runtime behavior. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes.
module load scalasca/2.2.2
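A typical Scalasca 2.x measurement, sketched below, has three steps: instrument the code at compile time (here via the Score-P compiler wrapper, assuming Score-P is installed alongside Scalasca on the system), run the instrumented executable under measurement control, and examine the resulting experiment archive. The compiler wrapper and file names are placeholders.

```
module load scalasca/2.2.2

# 1. instrument the application with the Score-P compiler wrapper
scorep mpif90 mycode.f -o mycode.x

# 2. run the instrumented executable under Scalasca measurement and analysis
scalasca -analyze srun mycode.x

# 3. examine the generated experiment archive (a scorep_* directory)
scalasca -examine scorep_mycode_*
```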
mpiP¶
mpiP is a lightweight profiling library for MPI applications. Because it only collects statistical information about MPI functions, mpiP generates considerably less overhead and much less data than tracing tools. All the information captured by mpiP is task-local. It only uses communication during report generation, typically at the end of the experiment, to merge results from all of the tasks into one output file.
Compile and Link with the mpiP library¶
module load mpiP
# gnu
mpif90 mycode.f -g -L$MPIPROOT/lib -lmpiP -lbfd -lunwind -o mycode.x
# intel
mpiifort mycode.f -g -L$MPIPROOT/lib -lmpiP -lbfd -lunwind -o mycode.x
In your Slurm batch script, simply run the executable
srun mycode.x
After completion, check the report file mycode.x.NPROCS.PID.mpiP (where NPROCS is the number of MPI tasks and PID the process ID).
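Putting it together, a minimal Slurm batch script might look like the following sketch; the resource requests and the executable name are placeholders for your own job.

```
#!/bin/bash
#SBATCH --job-name=mpip-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:20:00

module load mpiP

# run the mpiP-linked executable; the report file
# (mycode.x.NPROCS.PID.mpiP) is written when MPI_Finalize is called
srun mycode.x
```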
nvprof¶
You can use nvprof to collect and view profiling data from the command line, or import the data into the visual profiler nvvp.
Command line nvprof¶
http://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview
nvprof <GPU_EXECUTABLE>
Remote profiling with nvprof¶
http://docs.nvidia.com/cuda/profiler-users-guide/index.html#unique_307789860
nvprof --export-profile timeline.nvprof <GPU_EXECUTABLE>
To view the collected timeline data, import the timeline.nvprof file into nvvp as described in Import Single-Process nvprof Session:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html#import-session
MPI Profiling¶
http://docs.nvidia.com/cuda/profiler-users-guide/index.html#mpi-profiling
The nvprof profiler can be used to profile individual MPI processes.
srun nvprof -o output.%h.%p.%q{SLURM_PROCID} <GPU_EXECUTABLE>
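For example, a batch job along the lines of the sketch below writes one profile file per MPI rank, named after the host, the process ID, and the Slurm rank. The resource requests are placeholders, and <GPU_EXECUTABLE> stands for your CUDA application as in the commands above.

```
#!/bin/bash
#SBATCH --job-name=nvprof-mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:15:00

# one profile file per rank, e.g. output.node01.12345.0.nvprof
srun nvprof -o output.%h.%p.%q{SLURM_PROCID}.nvprof <GPU_EXECUTABLE>

# afterwards, import the per-rank .nvprof files into nvvp on a login node
```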