[PATCH v2] dwarf_getaranges: Build aranges list from CUs instead of .debug_aranges
Aaron Merey
amerey@redhat.com
Tue Feb 20 04:20:13 GMT 2024
Hi Mark,
On Tue, Feb 13, 2024 at 8:28 AM Mark Wielaard <mark@klomp.org> wrote:
>
> > This patch's method of building the aranges list is slower than simply
> > reading .debug_aranges. On my machine, running eu-stack on a 2.9G
> > firefox core file takes about 8.7 seconds with this patch applied,
> > compared to about 3.3 seconds without this patch.
>
> That is significant. 2.5 times slower.
> Did you check with perf or some other profiler where exactly the extra
> time goes? Does the new method find more aranges (and so produces
> "better" backtraces)?
I took another look at the performance and realized I made a silly
mistake when I originally tested this. My build that was 2.5x slower
was compiled with -O0 but I tested it against an -O2 build. Oops!
With the optimization level set to -O2 in all cases, the runtime of
'eu-stack -s' on the original 2.9G firefox core file is only about
9% slower: 3.6 seconds with the patch applied compared to 3.3
seconds without the patch.

As for the number of aranges found, there is a difference for libxul.so:
250435 with the patch compared to 254832 without. So 4397 fewer aranges
are found when using the new CU iteration method. I'll dig into this and
see if there is a problem or if it's just due to some redundancy in
libxul's .debug_aranges. FWIW there was no change to the aranges counts
for the other modules searched during this eu-stack firefox corefile test.
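One way to tell lost coverage from simple redundancy is to check, for each .debug_aranges entry, whether it lies entirely inside the union of the CU-derived ranges. The following is a minimal sketch in plain C. It assumes both lists have already been extracted as half-open [start, end) pairs, where a .debug_aranges entry's end is its address plus its length. struct range, merge_ranges and count_uncovered are hypothetical helpers, not libdw API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical half-open address range [start, end).  */
struct range
{
  uint64_t start;
  uint64_t end;
};

/* qsort comparator: order by start address, then by end address.  */
static int
cmp_range (const void *a, const void *b)
{
  const struct range *x = a, *y = b;
  if (x->start != y->start)
    return x->start < y->start ? -1 : 1;
  return (x->end > y->end) - (x->end < y->end);
}

/* Sort RANGES and merge overlapping or adjacent entries in place.
   Returns how many disjoint ranges remain at the front of the array.  */
static size_t
merge_ranges (struct range *ranges, size_t n)
{
  if (n == 0)
    return 0;
  qsort (ranges, n, sizeof *ranges, cmp_range);
  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
      if (ranges[i].start <= ranges[out].end)
        {
          /* Overlaps or touches the current range: extend it.  */
          if (ranges[i].end > ranges[out].end)
            ranges[out].end = ranges[i].end;
        }
      else
        ranges[++out] = ranges[i];
    }
  return out + 1;
}

/* Count entries of OLD that are not fully inside some range of MERGED
   (the output of merge_ranges).  Covered entries are only redundant.  */
static size_t
count_uncovered (const struct range *old, size_t n_old,
                 const struct range *merged, size_t n_merged)
{
  size_t missing = 0;
  for (size_t i = 0; i < n_old; i++)
    {
      assert (old[i].start <= old[i].end);
      /* Count the merged ranges starting at or before old[i].start.  */
      size_t lo = 0, hi = n_merged;
      while (lo < hi)
        {
          size_t mid = lo + (hi - lo) / 2;
          if (merged[mid].start <= old[i].start)
            lo = mid + 1;
          else
            hi = mid;
        }
      /* merged[lo - 1] is the only candidate that can contain old[i].  */
      if (lo == 0 || old[i].end > merged[lo - 1].end)
        missing++;
    }
  return missing;
}
```

If count_uncovered returned 0 for libxul.so, that would suggest the 4397 extra entries are duplicates or overlaps in .debug_aranges rather than address ranges the CU iteration misses.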
>
> > Ideally we could assume that .debug_aranges is complete if it is present
> > and build the aranges list via CU iteration only when .debug_aranges
> > is absent. This would let us save time on gcc-compiled binaries, which
> > include complete .debug_aranges by default.
>
> Right. This is why the question is whether the firefox case sees more or
> fewer aranges. If I remember correctly it is built with gcc and rustc, and
> rustc might not produce .debug_aranges.
>
> > However the DWARF spec appears to permit partially complete
> > .debug_aranges [1]. We could improve performance by starting with a
> > potentially incomplete list built from .debug_aranges. If a lookup
> > fails then search the CUs for missing aranges and add to the list
> > when found.
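The lazy approach described above could look roughly like this: consult the table seeded from .debug_aranges first, scan the CUs only on a miss, and append whatever range is found. This is a hypothetical sketch, not libdw code. struct arange, struct cu, struct arange_table, MAX_ARANGES and lookup_with_fallback are invented for illustration. Each simulated CU covers a single contiguous range, and the table uses a linear scan for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARANGES 64  /* hypothetical fixed capacity for the sketch */

/* Hypothetical arange: [start, end) maps to the CU at cu_off.  */
struct arange
{
  uint64_t start, end, cu_off;
};

/* Hypothetical simulated CU covering one contiguous range.  */
struct cu
{
  uint64_t cu_off, start, end;
};

/* Table seeded from a possibly incomplete .debug_aranges.  */
struct arange_table
{
  struct arange entries[MAX_ARANGES];
  size_t count;
};

/* Linear scan of the table for a range containing ADDR.  */
static const struct arange *
table_find (const struct arange_table *t, uint64_t addr)
{
  for (size_t i = 0; i < t->count; i++)
    if (t->entries[i].start <= addr && addr < t->entries[i].end)
      return &t->entries[i];
  return NULL;
}

/* Look ADDR up in T.  On a miss, scan the CUs and append the range that
   covers ADDR so later lookups hit directly.  Stores the CU offset in
   *CU_OFF and returns 0, or returns -1 if no CU covers ADDR.  */
static int
lookup_with_fallback (struct arange_table *t, const struct cu *cus,
                      size_t ncus, uint64_t addr, uint64_t *cu_off)
{
  const struct arange *hit = table_find (t, addr);
  for (size_t i = 0; hit == NULL && i < ncus; i++)
    if (cus[i].start <= addr && addr < cus[i].end)
      {
        assert (t->count < MAX_ARANGES);
        t->entries[t->count]
          = (struct arange) { cus[i].start, cus[i].end, cus[i].cu_off };
        hit = &t->entries[t->count++];  /* the table grows here */
      }
  if (hit == NULL)
    return -1;
  *cu_off = hit->cu_off;
  return 0;
}
```

Each successful fallback increments t->count, which is exactly why a naranges value handed out earlier by dwarf_getaranges could go stale under this scheme.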
> >
> > This approach would complicate the dwarf_getaranges interface. The
> > list it initially provides could no longer be assumed to be complete.
> > The number of elements in the list could change during calls to
> > dwarf_getarange{info, _addr}. This would invalidate the naranges value
> > set by dwarf_getaranges. The current API doesn't include a way to
> > communicate to the caller when naranges changes and by how much.
> >
> > Due to these complications I think it's better to simply ignore
> > .debug_aranges altogether and build the aranges table via CU iteration,
> > as is done in this patch.
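When the table is built up front from the CUs, it is complete as soon as it exists, so its size never changes afterwards. The sketch below covers only the table and lookup side. It assumes each CU's address ranges (from DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges) have already been collected. struct cu_range, build_aranges and find_arange are hypothetical names, and this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one address range contributed by a CU, e.g. taken from
   its DW_AT_low_pc/DW_AT_high_pc pair or its DW_AT_ranges list.  */
struct cu_range
{
  uint64_t start;   /* first address covered */
  uint64_t end;     /* one past the last address covered */
  uint64_t cu_off;  /* offset of the owning CU in .debug_info */
};

/* qsort comparator: order by start address.  */
static int
cmp_start (const void *a, const void *b)
{
  const struct cu_range *x = a, *y = b;
  return (x->start > y->start) - (x->start < y->start);
}

/* Turn the ranges collected from every CU into a sorted lookup table.
   The table is complete once this returns, so its size never changes.  */
static void
build_aranges (struct cu_range *ranges, size_t n)
{
  qsort (ranges, n, sizeof *ranges, cmp_start);
  /* This sketch assumes ranges from different CUs do not overlap.  */
  for (size_t i = 1; i < n; i++)
    assert (ranges[i - 1].end <= ranges[i].start);
}

/* Binary search for the range containing ADDR; NULL if none does.  */
static const struct cu_range *
find_arange (const struct cu_range *ranges, size_t n, uint64_t addr)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (ranges[mid].start <= addr)
        lo = mid + 1;
      else
        hi = mid;
    }
  /* ranges[lo - 1] is the last range starting at or before ADDR.  */
  if (lo == 0 || addr >= ranges[lo - 1].end)
    return NULL;
  return &ranges[lo - 1];
}
```

Sorting once keeps each lookup logarithmic in the table size, which matters for a module like libxul.so with roughly 250k aranges.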
>
> Might it be an idea to leave dwarf_getaranges as it is and introduce a
> new (internal) function to get "dynamic" ranges? It looks like what
> programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> and dwfl_module_addrdie. These are currently built on dwarf_getaranges,
> but could maybe use a new interface?
IMO this depends on what users expect from dwarf_getaranges. Do they
want the exact contents of .debug_aranges (whether or not it's complete)
or should dwarf_getaranges go beyond .debug_aranges to ensure the most
complete results?

The comment for dwarf_getaranges in libdw.h simply reads "Return list
address ranges". Since there's no mention of .debug_aranges specifically,
I think it's fair for dwarf_getaranges to do whatever it can to ensure
comprehensive results, in which case it should probably generate aranges
dynamically.
Aaron