The gprofng Application Profiling Tool

1. What is gprofng?

Gprofng is a next-generation application profiling tool. It supports the profiling of programs written in C, C++, Java, or Scala running on systems that use processors from Intel, AMD, or Arm. The extent of the support is processor dependent, but the basic views are always available.

Two distinct steps are needed to produce a profile. In the first step, the performance data is collected and stored in a directory called the experiment directory. In the second step, one of several available tools is used to display and analyze the information stored in that directory.

2. The main features of gprofng

These are some of the features:

3. The gprofng tools

Several tools are part of the gprofng suite. The following one-line summaries describe the key functionality of each tool:

gprofng collect app     collect the performance data
gprofng display text    display the performance results in ASCII format
gprofng display html    generate an html structure to be navigated in a browser
gprofng display src     display the source code interleaved with the instructions
gprofng archive         archive an experiment directory

4. Usage examples

4.1. About the example program

This program is written in C and uses Pthreads to parallelize a matrix-vector algorithm, multiplying an m x n matrix with a vector of length n. The program supports the -m and -n options to set the matrix sizes. The -t option sets the number of threads and -h prints a usage overview. The algorithm is executed repeatedly to lengthen the run time, which makes the timings easier to measure. The repeat count can be adjusted using the -r option. The -v option enables verbose mode. All options have defaults, so the program can be executed without any options.
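The source of the example program is not shown on this page. As a rough illustration only, the following is a minimal sketch of how such a Pthreads matrix-vector multiplication could be structured. The function names mxv_core and driver_mxv are borrowed from the profile shown later on this page, but their bodies, the thread_arg_t type, the mxv_parallel function, the MAX_THREADS limit, and the row-block work distribution are all assumptions. Option parsing, the repeat loop, and the error check are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical upper limit, not from the original program */

/* Work description for one thread: a contiguous block of rows (assumed layout). */
typedef struct {
    size_t row_start, row_end;   /* half-open row range [row_start, row_end) */
    size_t n;                    /* number of columns */
    const double *a;             /* m x n matrix, stored row by row */
    const double *b;             /* input vector of length n */
    double *c;                   /* result vector of length m */
} thread_arg_t;

/* Compute c[i] = sum over j of a[i][j] * b[j] for the rows in the given range. */
static void mxv_core(size_t row_start, size_t row_end, size_t n,
                     const double *a, const double *b, double *c)
{
    for (size_t i = row_start; i < row_end; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * b[j];
        c[i] = sum;
    }
}

/* Thread entry point: unpack the argument and run the kernel on its rows. */
static void *driver_mxv(void *p)
{
    const thread_arg_t *t = p;
    mxv_core(t->row_start, t->row_end, t->n, t->a, t->b, t->c);
    return NULL;
}

/* Split the m rows over nthreads threads. Returns 0 on success, or the
   error code of pthread_create if a thread could not be started. */
int mxv_parallel(size_t m, size_t n, const double *a, const double *b,
                 double *c, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    thread_arg_t arg[MAX_THREADS];
    size_t chunk = (m + (size_t)nthreads - 1) / (size_t)nthreads;
    int started = 0, rc = 0;

    for (int t = 0; t < nthreads; t++) {
        size_t lo = (size_t)t * chunk;
        size_t hi = lo + chunk;
        if (lo > m) lo = m;      /* surplus threads get an empty range */
        if (hi > m) hi = m;
        arg[t] = (thread_arg_t){ lo, hi, n, a, b, c };
        rc = pthread_create(&tid[t], NULL, driver_mxv, &arg[t]);
        if (rc != 0)
            break;
        started++;
    }

    /* Wait for every thread that was actually started. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Each thread writes to its own rows of the result vector, so no locking is needed. When building such a program with GCC, the -pthread option is normally added to the compile and link commands.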

This is an example of how to run the program. A 5000 x 4000 matrix is used and the thread count is set to 2. The program prints a single line with the execution status, the matrix sizes, and the number of threads used:

$ ./mxv-pthreads.exe -m 5000 -n 4000 -t 2
mxv: error check passed - rows = 5000 columns = 4000 threads = 2

Clearly, everything went well.

4.2. How to get a basic profile

We now want to make a profile in order to see where the time was spent.

To collect the performance information, all that is needed is to run the same job, but now under the control of gprofng. To this end, we use the gprofng collect app command:

$ gprofng collect app ./mxv-pthreads.exe -m 5000 -n 4000 -t 2
Creating experiment directory (Process ID: 2607338) ...
mxv: error check passed - rows = 5000 columns = 4000 threads = 2

The second line shown above is printed by gprofng. It tells us that an experiment directory has been created and shows the process ID. This line is followed by the program output.

The experiment directory is a regular Linux directory that contains the information generated while the program was running. Its name is a default name. Each subsequent profiling run creates a new directory with the next default name.

In the second step, we can extract the information stored in the experiment directory using the gprofng display text command. If invoked without additional commands, the interpreter is launched and the user can issue commands within this interactive environment:

$ gprofng display text
Warning: History and command editing is not supported on this system.
(gp-display-text) quit

Although it is perfectly valid to use this feature, in practice it is probably easier to add one or more additional commands, or as we shall see later, put the commands in a script file.

This is an example of how to extract information from the experiment directory. The -functions command displays a table with the functions executed and the values for the metrics. By default, the total CPU time is shown:

$ gprofng display text -functions
Functions sorted by metric: Exclusive Total CPU Time

Excl. Total   Incl. Total    Name
CPU           CPU
 sec.      %   sec.      %
5.554 100.00  5.554 100.00   <Total>
5.274  94.95  5.274  94.95   mxv_core
0.140   2.52  0.270   4.86   init_data
0.090   1.62  0.110   1.98   erand48_r
0.020   0.36  0.020   0.36   __drand48_iterate
0.020   0.36  0.130   2.34   drand48
0.010   0.18  0.010   0.18   _int_malloc
0.      0.    0.280   5.05   __libc_start_main
0.      0.    0.010   0.18   allocate_data
0.      0.    5.274  94.95   collector_root
0.      0.    5.274  94.95   driver_mxv
0.      0.    0.280   5.05   main
0.      0.    0.010   0.18   malloc
0.      0.    5.274  94.95   start_thread

The table has five columns. The first two columns show the exclusive total CPU time, both as a number of seconds and as a percentage. The next two columns show the same for the inclusive total CPU time. The function names are in the last column.
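The page does not define the two metrics. Under the usual meaning, the exclusive time is the time spent in the function itself, and the inclusive time adds the time spent in the functions it calls. The sketch below illustrates that relationship. The function inclusive_time is a hypothetical helper, and the call chain used in the note that follows (init_data calls drand48, which calls erand48_r, which calls __drand48_iterate) is an assumption. The table itself does not show who calls whom.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive time of a function, in seconds: its own (exclusive) time plus
   the inclusive time of each function it calls. This simple sum assumes
   that every callee is called from this function only. */
double inclusive_time(double exclusive, const double *callee_inclusive,
                      size_t ncallees)
{
    assert(exclusive >= 0.0);
    double total = exclusive;
    for (size_t i = 0; i < ncallees; i++)
        total += callee_inclusive[i];
    return total;
}
```

Under the assumed call chain, the numbers in the table add up: 0.090 + 0.020 gives the 0.110 seconds listed as inclusive time for erand48_r, 0.020 + 0.110 gives 0.130 for drand48, and 0.140 + 0.130 gives 0.270 for init_data.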

The first line in the table shows the values for <Total>. This is a pseudo function, generated by gprofng. The total value(s) for the metric(s) are shown here. The percentages are relative to this number.

Note that the very first line above shows the metric used to sort the data. By default, the metric displayed in the first column is used in the sort, but this is one of the things that can be modified.

4.3. A first example of customization

5. How does the data collection work?

In the data collection phase, statistical call stack sampling is used.

The target application is executed under the control of gprofng, and at regular intervals (10 ms by default) the program is stopped. At that point, gprofng gathers information and stores it in the experiment directory. For example, the call stack is recorded using a stack unwind algorithm. The application then resumes execution until it is stopped again. Because of this approach, gprofng can profile essentially any executable and does not need access to the source code to collect the information.
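To make the idea of statistical call stack sampling more concrete, here is a toy model of it. It does not stop a real process or unwind a real stack. Instead, the program's execution is described as a list of segments, each with a fixed call stack, and a sample is taken at every multiple of the sampling interval. All names (segment_t, profile_t, sample_timeline, MAX_DEPTH, MAX_FUNCS) are hypothetical and this is not how gprofng is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8    /* deepest call stack the model can describe */
#define MAX_FUNCS 16   /* number of distinct function ids */

/* One stretch of execution during which the call stack does not change. */
typedef struct {
    long duration_us;       /* how long the program stays in this state */
    int depth;              /* number of frames on the stack */
    int stack[MAX_DEPTH];   /* function ids, outermost caller first */
} segment_t;

/* Sample counts per function id. */
typedef struct {
    long exclusive[MAX_FUNCS];  /* samples with the function on top of the stack */
    long inclusive[MAX_FUNCS];  /* samples with the function anywhere on the stack */
} profile_t;

/* Take a sample at every positive multiple of interval_us and record
   which functions were on the call stack at that moment. */
void sample_timeline(const segment_t *seg, size_t nseg, long interval_us,
                     profile_t *prof)
{
    assert(interval_us > 0);
    for (int f = 0; f < MAX_FUNCS; f++)
        prof->exclusive[f] = prof->inclusive[f] = 0;

    long seg_start = 0;               /* start time of the current segment */
    long next_sample = interval_us;   /* time of the next sample */

    for (size_t s = 0; s < nseg; s++) {
        const int *st = seg[s].stack;
        int depth = seg[s].depth;
        long seg_end = seg_start + seg[s].duration_us;
        assert(depth >= 1 && depth <= MAX_DEPTH);

        /* Process every sample that falls inside [seg_start, seg_end). */
        while (next_sample < seg_end) {
            for (int d = 0; d < depth; d++) {
                assert(st[d] >= 0 && st[d] < MAX_FUNCS);
                /* Count a recursive function only once per sample. */
                int seen = 0;
                for (int e = 0; e < d; e++)
                    if (st[e] == st[d])
                        seen = 1;
                if (!seen)
                    prof->inclusive[st[d]]++;
            }
            prof->exclusive[st[depth - 1]]++;
            next_sample += interval_us;
        }
        seg_start = seg_end;
    }
}
```

Multiplying a sample count by the sampling interval gives the estimated time. For example, a segment of 30 ms followed by a segment of 100 ms, sampled every 10 ms, yields 2 and 10 samples respectively, so the first segment is estimated at 20 ms rather than 30 ms. This is the kind of statistical error that sampling introduces.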

Due to the statistical nature of this process, it is natural to see small differences across seemingly identical profiling jobs. In practice, this is nothing to worry about, because the variations are typically below 5%. There is something else to pay attention to, though. A relatively small function may not be represented well because the sampling granularity is too coarse. In this case, the -p option can be used to increase the sampling rate. The same option can also be used to decrease the sampling rate.
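The effect of the sampling granularity on a short-running function can be illustrated with a little arithmetic. The sketch below counts how many samples land inside the time window in which a function executes, assuming one sample at every positive multiple of the interval. The function name samples_in_window and the numbers in the note that follows are made up for illustration.

```c
#include <assert.h>

/* Count the samples that land inside [start_us, start_us + length_us)
   when a sample is taken at every positive multiple of interval_us. */
long samples_in_window(long start_us, long length_us, long interval_us)
{
    assert(start_us >= 0 && length_us >= 0 && interval_us > 0);
    long end_us = start_us + length_us;

    /* Smallest k >= 1 with k * interval_us >= start_us. */
    long first = (start_us + interval_us - 1) / interval_us;
    if (first < 1)
        first = 1;

    /* Largest k with k * interval_us < end_us. */
    long last = (end_us - 1) / interval_us;

    return last >= first ? last - first + 1 : 0;
}
```

A function that runs for 3 ms, starting 12 ms into the run, receives no samples at all with a 10 ms interval, but three samples with a 1 ms interval. With the coarser interval the function would simply not show up in the profile.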

6. How to contribute

7. How to port

8. Frequently Asked Questions (FAQ)

gprofng (last edited 2022-11-24 06:46:56 by RuudVanDerPas)

All content (C) 2008 Free Software Foundation. For terms of use, redistribution, and modification, please see the WikiLicense page.