Bug 29015 - On Intel Skylake the call tree is incorrect
Status: ASSIGNED
Alias: None
Product: binutils
Classification: Unclassified
Component: gprofng
Version: 2.39
Importance: P3 normal
Target Milestone: ---
Assignee: Vladimir Mezentsev
Reported: 2022-04-01 05:26 UTC by Ruud van der Pas
Modified: 2022-07-22 13:53 UTC

Last reconfirmed: 2022-04-19 00:00:00


Attachments
This directory contains everything needed to reproduce the problem. (2.29 MB, application/x-gzip)
2022-04-01 05:26 UTC, Ruud van der Pas

Description Ruud van der Pas 2022-04-01 05:26:28 UTC
Created attachment 14046
This directory contains everything needed to reproduce the problem.

The call tree output is incorrect for this example, which I ran on an Intel Skylake-based system. The code is parallelized with Pthreads, so the function "start_thread" should appear in the call tree. It does not, which suggests a problem with stack unwinding.

This is the output I get:

Functions Call Tree. Metric: Attributed Total CPU Time

Attr.      Name
Total
CPU sec.
4.827      +-<Total>
4.712        +-collector_root
4.712        |  +-driver_mxv
4.712        |    +-mxv_core
0.116        +-__libc_start_main
0.116          +-main
0.106            +-init_data
0.050            |  +-drand48
0.039            |    +-erand48_r
0.014            |      +-__drand48_iterate
0.010            +-allocate_data
0.010              +-malloc
0.010                +-_int_malloc
0.003                  +-sysmalloc
0.002                    +-__default_morecore
0.002                      +-sbrk
0.002                        +-brk

I used gcc 10 with no optimizations enabled, but I see the same problem with -O, for example.

On an older Intel Haswell-based system, start_thread does appear in the call tree.
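
For illustration only, here is a minimal sketch of the kind of Pthreads program involved. It is not the code from the attachment: the function names driver_mxv and mxv_core are taken from the call tree above, while their signatures, the work_t struct, and the run_mxv helper are assumptions made for this sketch. In glibc, a function passed to pthread_create is invoked from start_thread, so a correct unwind of a worker thread would be expected to show start_thread above driver_mxv.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description; not taken from the attachment. */
typedef struct {
    int row_start, row_end;   /* half-open row range [row_start, row_end) */
    int n;                    /* number of columns */
    const double *a;          /* m x n matrix, row-major */
    const double *b;          /* input vector of length n */
    double *c;                /* result vector of length m */
} work_t;

/* Matrix-vector kernel: c[i] = sum over j of a[i][j] * b[j]. */
static void mxv_core(int row_start, int row_end, int n,
                     const double *a, const double *b, double *c)
{
    for (int i = row_start; i < row_end; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += a[(size_t)i * n + j] * b[j];
        c[i] = sum;
    }
}

/* Thread entry point. glibc calls this from start_thread, so the expected
   chain for a worker is start_thread -> driver_mxv -> mxv_core. */
static void *driver_mxv(void *arg)
{
    work_t *w = arg;
    mxv_core(w->row_start, w->row_end, w->n, w->a, w->b, w->c);
    return NULL;
}

/* Hypothetical helper: split m rows over nthreads threads.
   Returns 0 on success, -1 on allocation or thread-creation failure. */
int run_mxv(int nthreads, int m, int n,
            const double *a, const double *b, double *c)
{
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    work_t *work = malloc((size_t)nthreads * sizeof *work);
    if (!tid || !work) {
        free(tid);
        free(work);
        return -1;
    }

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        int lo = (int)((long)m * t / nthreads);
        int hi = (int)((long)m * (t + 1) / nthreads);
        work[t] = (work_t){ lo, hi, n, a, b, c };
        if (pthread_create(&tid[t], NULL, driver_mxv, &work[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    free(tid);
    free(work);
    return rc;
}
```

Built with something like "gcc -pthread", all of the time spent in mxv_core belongs to the worker threads, which is why the missing start_thread frame is visible at the top of the tree.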

The attachment has everything needed to reproduce the problem. The source code is in the directory "src" and can be built with "make". I deliberately left my object files and the binary in place, as well as the experiment directory. The run.sh script was used to demonstrate the problem, and sample output from it is in run.res.