Optimizing elfutils usage for unwinding

Milian Wolff mail@milianw.de
Thu Feb 1 11:34:30 GMT 2024

Hey all,

I'm working on perfparser/hotspot which ingests perf.data files and does 
unwinding and symbolication etc.

We got a bug report from a user [1] that exposes a worst-case performance 
situation in our usage of elfutils, which I do not know how to handle - thus 
me reaching out to you all here.

The problem is that the perf.data file for the workload there contains samples 
for tens of thousands of short-lived processes - but overall there are only 
about a hundred _different_ binaries being executed.

When we analyze the data, we currently have one dwfl* per process. We already 
employ extensive caching on the symbolication end, which allows us to only 
look at inline frames and demangling once per executable or library, instead 
of once per process.
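
To illustrate the kind of per-binary caching described above, here is a minimal 
sketch. It is not perfparser's actual code: `BinaryInfo`, `BinaryCache` and the 
choice of the build ID as cache key are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for whatever is expensive to compute once per
// executable or library (demangled names, inline frame data, ...).
struct BinaryInfo {
    std::string buildId;
};

// Hypothetical cache shared by all processes: the loader runs once per
// distinct binary, no matter how many processes map that binary.
class BinaryCache {
public:
    using Loader = std::function<BinaryInfo(const std::string &)>;

    explicit BinaryCache(Loader loader) : m_loader(std::move(loader))
    {
        assert(m_loader); // a cache without a loader cannot fill itself
    }

    // Returns the cached entry for buildId, loading it on first use.
    std::shared_ptr<const BinaryInfo> get(const std::string &buildId)
    {
        auto it = m_entries.find(buildId);
        if (it == m_entries.end()) {
            ++m_loads;
            auto info = std::make_shared<BinaryInfo>(m_loader(buildId));
            it = m_entries.emplace(buildId, std::move(info)).first;
        }
        return it->second;
    }

    // Number of times the loader actually ran.
    std::size_t loads() const { return m_loads; }

private:
    Loader m_loader;
    std::unordered_map<std::string, std::shared_ptr<const BinaryInfo>> m_entries;
    std::size_t m_loads = 0;
};
```

With such a cache, the cost of the expensive step scales with the number of 
distinct binaries rather than with the number of processes.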

But this kind of caching across dwfl* is not possible for what 
dwfl_thread_getframes does internally. Profiling our analysis, I see that most 
of the time is spent by this stack:


Another big chunk is then later on when `dwfl_end` cleans up the modules and 
we get into `__libdw_destroy_frame_cache`.

So, there already seems to be a cache of sorts being built - but it's tied to 
the `dwfl*` structure. In our case we have tens of thousands of these 
structures, each very short lived (as the processes underneath are 

I understand that each process will have its own custom address space mapping, 
but is the CFI/FDE data also tied to such process-specific data? Or could it 
in theory be reused across `dwfl*` instances?
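
To make the question concrete, here is a small sketch of what keying data by 
binary instead of by process could look like. It only shows the address 
arithmetic and makes no claim about how elfutils stores its frame cache; 
`LoadedModule`, `ModuleRelativeKey` and `toModuleRelative` are hypothetical 
names, and using the build ID plus the offset from the module start as the key 
is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical description of one binary as mapped into one process.
struct LoadedModule {
    std::string buildId;  // identifies the binary independent of the process
    std::uint64_t start;  // first mapped address in this process
    std::uint64_t end;    // one past the last mapped address
};

// Process-independent key: which binary, and where inside it.
using ModuleRelativeKey = std::pair<std::string, std::uint64_t>;

// Strips the process-specific load address from an instruction pointer.
ModuleRelativeKey toModuleRelative(const LoadedModule &module, std::uint64_t ip)
{
    assert(ip >= module.start && ip < module.end);
    return {module.buildId, ip - module.start};
}
```

Two processes that map the same binary at different addresses would then 
produce the same key for the same instruction, so anything that depends only 
on that key could in principle be shared between them.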


[1]: https://github.com/KDAB/hotspot/issues/394
Milian Wolff