This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: How to associate Elf with Dwfl_Module returned by dwfl_report_module


On Mittwoch, 21. März 2018 14:01:41 CET Milian Wolff wrote:
> On Dienstag, 20. März 2018 23:05:49 CET Mark Wielaard wrote:
> > Hi Milian,
> 
> Hey Mark :)
> 
> > On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote:
> > > a recurring issue in the libdwfl integration of perf and perfparser are
> > > supposedly overlapping modules. The perf data file contains the exact
> > > mappings for all files corresponding to the actual mmap events that
> > > occurred during runtime, e.g.:
> > > 
> > > ```
> > > $ perf script --show-mmap-events | grep MMAP | grep stdc
> > > heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp
> > > /usr/lib/libstdc+ +.so.6.0.24
> > > heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p
> > > /usr/lib/ libstdc++.so.6.0.24
> > > heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p
> > > /usr/lib/
> > > libstdc++.so.6.0.24
> > > heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962:
> > > [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p
> > > /usr/lib/
> > > libstdc++.so.6.0.24
> > > ```
> > > So far, both perf and perfparser are using dwfl_report_elf to report the
> > > file. But that API is deducing the mapping addresses internally, which
> > > may or may not be what we saw at runtime. I suspect that this is the
> > > reason for some issues we are seeing, such as supposedly overlapping
> > > modules.
> > 
> > How exactly are you calling dwfl_report_elf?
> 
> Here's the code for the perf tools:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/
> perf/util/unwind-libdw.c?h=perf/core#n52
> 
> Here's the code for the perfparser:
> 
> http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/
> perfsymboltable.cpp#n479
> 
> Let's concentrate on perf for now, but perfparser has similar logic:
> 
> We parse the mmap events in the perf.data file and store that information.
> Note that the perf.data file does not contain events for munmap calls. Then
> while unwinding the callstack of a perf sample, we lookup the most recent
> mmap event for every given instruction pointer address, and ensure that the
> corresponding ELF was registered with libdw.
> 
> > Specifically are you using false for the add_p_vaddr argument?
> 
> Yes, we are.
> 
> > And could you provide some example where the reported address is
> > wrong/different from the start address of the Dwfl_Module?
> 
> I don't think it's the start address that is wrong, rather it's the end
> address. But it's hard for me to come up with a small selfcontained example
> at this stage. I am regularly seeing broken backtraces for samples where I
> have the gut feeling that missing reported ELFs are to blame. But we report
> everything, except for scenarios where the mmap events seemingly overlap.
> This overlapping is, as far as I can see, actually a side effect of
> remapping taking place in the dynamic linker (i.e. a single dlopen/dynamic
> linked library can yield multiple mmap events). One way or another, we end
> up with a situation where we cannot report an ELF to dwfl due to two
> issues:
> 
> a) either ELF tells us we are overlapping some module and just stops which
> is bad, since we would actually much prefer the newly reported ELF to take
> precedence
> 
> b) we find an mmap event that with a non-zero pgoff, and have no clue how to
> call dwfl_report_elf and just give up.
> 
> In both cases, I was hopeing for dwfl_report_module to help since it
> seemingly allows me to exactly recreate the mapping that was traced
> originally.

Here's one way to investigate where perf and dwfl disagree on the file 
mappings:

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 7bdd239c795c..739c68b73772 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al, u64 
ip,
 			mod = 0;
 	}
 
-	if (!mod)
+	if (!mod) {
+		Dwarf_Addr s, e;
 		mod = dwfl_report_elf(ui->dwfl, dso->short_name,
 				      (dso->symsrc_filename ? dso->symsrc_filename : dso-
>long_name), -1, al->map->start,
 				      false);
 
+		dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL);
+		if (al->map->start != s || al->map->end != e)
+			pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n", dso-
>short_name, al->map->start, al->map->end, s, e);
+	}
+
 	return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
 }
 
On one of my perf.data files with many broken backtraces this gives me:

MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000 
56166ea38880
MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000 
7fd0afe920f8
MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000 
7fd0ae5208f0
MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl: 
7fd0aea84000 7fd0aee0a640
MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000 
7fd0af086030

Interestingly, here the mmap events observed by perf are actually always 
*larger* than what dwfl sees...
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

Attachment: signature.asc
Description: This is a digitally signed message part.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]