How to associate Elf with Dwfl_Module returned by dwfl_report_module
Thu Mar 22 12:29:00 GMT 2018
On Wednesday, 21 March 2018 22:21:13 CET Mark Wielaard wrote:
> Hi Milian,
> On Wed, Mar 21, 2018 at 02:01:41PM +0100, Milian Wolff wrote:
> > Here's the code for the perf tools:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/unwind-libdw.c?h=perf/core#n52
> > Here's the code for the perfparser:
> > http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/perfsymboltable.cpp#n479
> > Let's concentrate on perf for now, but perfparser has similar logic:
> > We parse the mmap events in the perf.data file and store that information.
> > Note that the perf.data file does not contain events for munmap calls.
> > Then, while unwinding the callstack of a perf sample, we look up the most
> > recent mmap event for every given instruction pointer address, and ensure
> > that the corresponding ELF was registered with libdw.
> So, modules are never deregistered?
> In that case, that might explain the issue.
No, they are deregistered - that is not the issue. Perf actually starts with a
clean dwfl on every sample and registers whatever modules are relevant for the
given sample. perfparser tries to be a bit smarter and caches more, but also
has code to deregister if something goes amiss.
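
The per-sample lookup quoted above (most recent mmap event covering an instruction pointer) can be sketched roughly as follows. This is only an illustration, not the actual perf or perfparser code: `struct mmap_event` and `find_latest_mmap` are hypothetical names, and the events are assumed to be stored in the order they were recorded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one mmap event parsed from perf.data. */
struct mmap_event {
    uint64_t start;   /* first mapped address */
    uint64_t len;     /* length of the mapping in bytes */
    uint64_t pgoff;   /* offset into the backing file */
    const char *path; /* backing file */
};

/* Return the most recently recorded event whose range covers ip,
 * or NULL if none does. Events are assumed to be in recording order,
 * so the array is scanned from the end. */
const struct mmap_event *
find_latest_mmap(const struct mmap_event *events, size_t n, uint64_t ip)
{
    for (size_t i = n; i-- > 0; ) {
        if (ip >= events[i].start && ip - events[i].start < events[i].len)
            return &events[i];
    }
    return NULL;
}
```

Because there are no munmap events, a later mapping simply shadows an earlier one at the same address; the earlier event is only found again for addresses the later mapping does not cover.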
> But I see there is a check if there is already something at the address.
> The interface to "remove" a module might not be immediately clear.
> The idea is that if modules need to be removed you'll call
> dwfl_report_begin, possibly dwfl_report_elf for any new module and then
> dwfl_report_end has a callback that gets all old modules and decides
> whether to re-report them, or they'll get removed. You might want to
> experiment with doing that and not re-report any module that overlaps
> with the new module. (See the libdwfl.h documentation for a hopefully
> clearer description.)
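
The decision suggested above (do not re-report any old module that overlaps the new one) boils down to an interval overlap test. The following is a sketch in plain C with no libdwfl calls, so the wiring into the dwfl_report_end callback is left out. `struct range`, `ranges_overlap` and `select_modules_to_keep` are hypothetical names, and treating module ranges as half-open [start, end) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end) of a module. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Two half-open ranges overlap if each starts before the other ends. */
bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/* For each previously reported module, set keep[i] to true if it does
 * not overlap the incoming module and could therefore be re-reported.
 * Returns the number of modules that would be kept. */
size_t select_modules_to_keep(const struct range *old, size_t n,
                              struct range incoming, bool *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = !ranges_overlap(old[i], incoming);
        if (keep[i])
            kept++;
    }
    return kept;
}
```

With this policy the newly reported ELF takes precedence: every old module touching its range is dropped, and adjacent modules that merely share a boundary address are kept.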
> > > Specifically are you using false for the add_p_vaddr argument?
> > Yes, we are.
> > > And could you provide some example where the reported address is
> > > wrong/different from the start address of the Dwfl_Module?
> > I don't think it's the start address that is wrong, rather it's the end
> > address. But it's hard for me to come up with a small self-contained
> > example at this stage. I am regularly seeing broken backtraces for
> > samples where I have the gut feeling that missing reported ELFs are to
> > blame. But we report everything, except for scenarios where the mmap
> > events seemingly overlap. This overlapping is, as far as I can see,
> > actually a side effect of remapping taking place in the dynamic linker
> > (i.e. a single dlopen/dynamic linked library can yield multiple mmap
> > events). One way or another, we end up with a situation where we cannot
> > report an ELF to dwfl due to two issues:
> > a) either ELF tells us we are overlapping some module and just stops, which
> > is bad, since we would actually much prefer the newly reported ELF to
> > take precedence
> > b) we find an mmap event with a non-zero pgoff, and have no clue how
> > to call dwfl_report_elf and just give up.
> > In both cases, I was hoping for dwfl_report_module to help since it
> > seemingly allows me to exactly recreate the mapping that was traced
> > originally.
> If you could add some logging and post that plus the eu-readelf -l
> output of the ELF file, that might help track down what is really going on.
Yes, I will try to find the time to write a more elaborate reproducer for this
issue, to better figure out what is going on here.