Summary: | On Intel Skylake the call tree is incorrect | | |
---|---|---|---|
Product: | binutils | Reporter: | Ruud van der Pas <ruud.vanderpas> |
Component: | gprofng | Assignee: | Vladimir Mezentsev <vladimir.mezentsev> |
Status: | ASSIGNED --- | | |
Severity: | normal | CC: | kurt.goebel, ruud.vanderpas |
Priority: | P3 | | |
Version: | 2.39 | | |
Target Milestone: | --- | | |
Host: | | Target: | |
Build: | | Last reconfirmed: | 2022-04-19 00:00:00 |
Attachments: | This directory contains everything needed to reproduce the problem. | | |

Created attachment 14046
This directory contains everything needed to reproduce the problem.

The call tree output is not correct for this example, which I ran on an Intel Skylake based system. The code has been parallelized using Pthreads, so we should see the function "start_thread" in the call tree. It is not there, though, and this looks like an issue related to stack unwinding.

This is the output I get:

    Functions Call Tree. Metric: Attributed Total CPU Time

    Attr.      Name
    Total
    CPU sec.
    4.827      +-<Total>
    4.712        +-collector_root
    4.712        |  +-driver_mxv
    4.712        |    +-mxv_core
    0.116        +-__libc_start_main
    0.116          +-main
    0.106            +-init_data
    0.050            |  +-drand48
    0.039            |    +-erand48_r
    0.014            |      +-__drand48_iterate
    0.010            +-allocate_data
    0.010              +-malloc
    0.010                +-_int_malloc
    0.003                  +-sysmalloc
    0.002                    +-__default_morecore
    0.002                      +-sbrk
    0.002                        +-brk

I used gcc 10 and did not enable any optimizations, but I also see this problem if I use -O, for example. On an older Intel Haswell based system, I do see start_thread in the call tree.

The attachment has everything needed to reproduce the problem. The code is in directory "src" and can be built with "make". I deliberately left my objects and the binary in, as well as the experiment directory. The run.sh script was used to show the problem; sample output of this script is in run.res.
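
For readers without the attachment, here is a minimal sketch of the kind of Pthreads program described above. It is not the code from the attachment. Only the function names driver_mxv and mxv_core are taken from the call tree; the work_t structure, the mxv_parallel helper and the row partitioning are assumptions made for illustration. The point is that driver_mxv is the start routine handed to pthread_create, so glibc invokes it from its internal start_thread function, which is why start_thread is expected above driver_mxv in a correctly unwound stack.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (not from the attachment). */
typedef struct {
    int row_start, row_end;  /* rows [row_start, row_end) for this thread */
    int n;                   /* number of columns */
    const double *a;         /* m x n matrix, row-major */
    const double *b;         /* input vector of length n */
    double *c;               /* result vector of length m */
} work_t;

/* Computational kernel: c = A * b for one block of rows. */
static void mxv_core(const work_t *w)
{
    for (int i = w->row_start; i < w->row_end; i++) {
        double sum = 0.0;
        for (int j = 0; j < w->n; j++)
            sum += w->a[(size_t)i * w->n + j] * w->b[j];
        w->c[i] = sum;
    }
}

/* Thread start routine. glibc calls it from start_thread, so the
   expected stack is start_thread -> ... -> driver_mxv -> mxv_core. */
static void *driver_mxv(void *arg)
{
    mxv_core((const work_t *)arg);
    return NULL;
}

/* Hypothetical helper: run the multiplication on nthreads threads.
   Returns 0 on success, -1 on allocation or thread creation failure. */
int mxv_parallel(int m, int n, const double *a, const double *b,
                 double *c, int nthreads)
{
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    work_t *work = malloc((size_t)nthreads * sizeof *work);
    int started = 0, rc = 0;

    if (tid == NULL || work == NULL) {
        free(tid);
        free(work);
        return -1;
    }
    for (int t = 0; t < nthreads; t++) {
        /* Split the m rows into nthreads contiguous blocks. */
        work[t].row_start = (int)((long)m * t / nthreads);
        work[t].row_end = (int)((long)m * (t + 1) / nthreads);
        work[t].n = n;
        work[t].a = a;
        work[t].b = b;
        work[t].c = c;
        if (pthread_create(&tid[t], NULL, driver_mxv, &work[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    free(tid);
    free(work);
    return rc;
}
```

Calling mxv_parallel from main on a large enough matrix and profiling the binary should give a per-thread stack of this shape. In the output above, the chain stops at collector_root instead of continuing up to start_thread.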