Summary: | Inconsistently-biased addresses for ET_EXEC | ||
---|---|---|---|
Product: | systemtap | Reporter: | Josh Stone <jistone> |
Component: | translator | Assignee: | Unassigned <systemtap> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | mark |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Description
Josh Stone
2014-03-08 00:12:16 UTC
Here's another example, probing function("_start") because that will resolve from the symbol table either way. You can see this with "main" too, but it will be resolving from debuginfo when available, so it's a very different path.

With coreutils-debuginfo:

$ stap -e 'probe process.function("_start") {next}' -c /usr/bin/ls -p2
# probes
process("/usr/bin/ls").function("_start") /* pc=.absolute+0x4e3c */ /* <- process("/usr/bin/ls").function("_start") */

Without coreutils-debuginfo:

$ stap -e 'probe process.function("_start") {next}' -c /usr/bin/ls -p2
# probes
process("/usr/bin/ls").function("_start") /* pc=.dynamic+0x4e3c */ /* <- process("/usr/bin/ls").function("_start") */

This explains why we aren't bitten by the buildid more often. For inode-uprobes, we always ultimately use a file-offset "address", but .absolute/.dynamic affects how we get task_finder callbacks. For .absolute, we use a process callback and fake a 0 "relocation", so having an absolute build-id address from there works fine. For .dynamic, we use an mmap callback where we know the relocation, so having a relative build-id address also works.

But process.plt is always giving me .absolute, which fails if the build-id address was relative. Unless it happens to follow a function probe, in which case it becomes .dynamic too. :/

So maybe process.plt just needs to trigger something in dwfl to make it always follow suit? (Honestly, I'd rather get rid of the ".absolute" concept, and convert everything to ".dynamic" with relative addresses, but that may be more invasive.)
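To make the build-id pairing above concrete, here is a minimal sketch in C. It is not taken from the systemtap runtime: the enum, the function name, and every address used with it are made up for illustration. It only models the rule described above, that the process callback fakes a relocation of 0 while the mmap callback supplies the real load base.

```c
#include <stdint.h>

/* Hypothetical names; nothing here is copied from the systemtap sources. */
enum reloc_kind { RELOC_ABSOLUTE, RELOC_DYNAMIC };

/* Run-time address = relocation + stored address.
   .absolute: process callback, relocation is faked as 0.
   .dynamic:  mmap callback, relocation is the load base of the mapping. */
static uint64_t runtime_address(enum reloc_kind kind, uint64_t stored_addr,
                                uint64_t load_base)
{
    uint64_t relocation = (kind == RELOC_ABSOLUTE) ? 0 : load_base;
    return relocation + stored_addr;
}
```

With an illustrative load base of 0x400000, an absolute build-id address of 0x400284 under RELOC_ABSOLUTE and a relative one of 0x284 under RELOC_DYNAMIC both resolve to 0x400284. Pairing RELOC_ABSOLUTE with the relative 0x284 resolves to 0x284, which is the mismatch that process.plt runs into.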
Consider:

$ ./stap -e 'probe process.plt("strstr"),process.function("_start") {next}' -c /usr/bin/ls --poison-cache -p2
# probes
process("/usr/bin/ls").statement(0x402c00) /* pc=.absolute+0x2c00 */ /* <- process("/usr/bin/ls").plt("strstr").statement(0x402c00) */
process("/usr/bin/ls").function("_start") /* pc=.dynamic+0x4e3c */ /* <- process("/usr/bin/ls").plt("strstr"),process("/usr/bin/ls").function("_start") */

In one run, we changed our mind from .absolute to .dynamic!?!

We make this decision in dwflpp::relocate_address, which looks at dwfl_module_relocations(). That function will return 0 if mod->e_type is ET_EXEC, or 1 if mod->e_type is ET_DYN. And sure enough, the e_type is changing in the middle of this run. A hardware watchpoint tells me where:

libdwfl/dwfl_module_getdwarf.c
134│ mod->e_type = ehdr->e_type;
135│
136│ /* Relocatable Linux kernels are ET_EXEC but act like ET_DYN. */
137│ if (mod->e_type == ET_EXEC && file->vaddr != mod->low_addr)
138├> mod->e_type = ET_DYN;

(gdb) bt
#0 0x000000370481dc4b in open_elf (file=file@entry=0x2185be0, mod=<optimized out>, mod=<optimized out>) at dwfl_module_getdwarf.c:138
#1 0x000000370481e4b1 in find_aux_sym (aux_strshndx=<synthetic pointer>, aux_xndxscn=<synthetic pointer>, aux_symscn=<synthetic pointer>, mod=0x2185b60) at dwfl_module_getdwarf.c:907
#2 find_symtab (mod=mod@entry=0x2185b60) at dwfl_module_getdwarf.c:1022
#3 0x000000370481ee8e in dwfl_module_getsymtab (mod=0x2185b60) at dwfl_module_getdwarf.c:1259
#4 0x00000000004e4c24 in symbol_table::get_from_elf (this=0x2188fb0) at ../tapsets.cxx:7806

So when it opened the aux minisymtab (.gnu_debugdata), this triggered a kernel heuristic that really should not apply to this case. FWIW, file->vaddr = 0x400020, and mod->low_addr = 0x400000.

Should be fixed by elfutils

commit 65cefbd0793c0f9e90a326d7bebf0a47c93294ad
Author: Josh Stone <jistone@redhat.com>
Date: Tue Mar 11 10:19:28 2014 -0700

    libdwfl: dwfl_module_getdwarf.c (open_elf) only (re)set mod->e_type once.
    As noted in https://sourceware.org/bugzilla/show_bug.cgi?id=16676#c2
    for systemtap, the heuristic used by open_elf to set the kernel
    Dwfl_Module type to ET_DYN, even if the underlying ELF file e_type was
    set to ET_EXEC, could trigger erroneously for non-kernel/non-main
    (debug or aux) files. Make sure we only set the e_type of the module
    once when processing the main file (when the phdrs can be trusted).

I confirmed that on elfutils-0.158-2.fc21, ET_EXEC stays ".absolute" in all cases.