dwfl_module_getsrc() unexpectedly fails when the target executable is compiled with clang, although it works with an executable compiled with gcc.

For testing, I used executables built from the following source code (target.c):

static int f1(void)
{
  while (1);
  return 0;
}

static int f0(void)
{
  return f1();
}

int main(void)
{
  return f0();
}

I built two executables, one with each compiler:

$ gcc -O0 -g target.c -o target-gcc
$ clang -O0 -g target.c -o target-clang

I ran the two executables in the background:

$ ./target-gcc &
[2] 2118229
$ ./target-clang &
[3] 2118253

I then ran src/stack with the -s and -p options.

For the executable built with gcc, the backtrace reported source lines:

$ ./src/stack -s -p 2118229
PID 2118229 - process
TID 2118229:
#0  0x000000000040110a f1
    /home/yamato/var/elfutils/tests/target.c:4:9
#1  0x0000000000401115 f0
    /home/yamato/var/elfutils/tests/target.c:11:10
#2  0x0000000000401120 main
    /home/yamato/var/elfutils/tests/target.c:16:10
#3  0x00007f7d6c823b8a __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58:16
#4  0x00007f7d6c823c4b __libc_start_main@@GLIBC_2.34
    ../csu/libc-start.c:360:3
#5  0x0000000000401045 _start

For the executable built with clang, the backtrace did not report source lines for f1, f0, and main:

$ ./src/stack -s -p 2118253
PID 2118253 - process
TID 2118253:
#0  0x0000000000401149 f1
#1  0x0000000000401139 f0
#2  0x0000000000401124 main
#3  0x00007f4a84ab5b8a __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58:16
#4  0x00007f4a84ab5c4b __libc_start_main@@GLIBC_2.34
    ../csu/libc-start.c:360:3
#5  0x0000000000401045 _start

The bt command of gdb can report them:

$ ./target-clang &
[1] 2116805
[yamato@dev64]~/var/elfutils/tests% pstack 2116805
#0  0x0000000000401149 in f1 () at target.c:4
#1  0x0000000000401139 in f0 () at target.c:11
#2  0x0000000000401124 in main () at target.c:16

Since gdb can report the source lines, I guess clang put enough information into the executable.

The versions of the compilers:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC)

$ clang -v
clang version 16.0.6 (Fedora 16.0.6-3.fc38)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/13
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/13
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

The revision of elfutils: 557aa6a4b7b1d678b7c2c3b9aae1dafcc2160c64 (git commit).
Does compiling with -gdwarf-aranges help? For some reason clang doesn't seem to include .debug_aranges by default.
Thank you. With -gdwarf-aranges, the stack command works as expected.

$ clang -O0 -g -gdwarf-aranges target.c -o target-clang
$ ./target-clang &
[1] 4104789
$ ./src/stack -s -p 4104789
PID 4104789 - process
TID 4104789:
#0  0x0000000000401149 f1
    /home/yamato/var/elfutils/target.c:4:3
#1  0x0000000000401139 f0
    /home/yamato/var/elfutils/target.c:11:10
#2  0x0000000000401124 main
    /home/yamato/var/elfutils/target.c:16:10
#3  0x00007fb7e9213b8a __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58:16
#4  0x00007fb7e9213c4b __libc_start_main@@GLIBC_2.34
    ../csu/libc-start.c:360:3
#5  0x0000000000401045 _start
See also https://sourceware.org/bugzilla/show_bug.cgi?id=22288

The issue is that we rely on .debug_aranges to know whether a (code) address is described in a particular DWARF CU (and from there which line table to use).

We could create an aranges table ourselves. That means scanning over all the Compile Units and looking at the first CU DIE, which will have a low-pc and high-pc (or ranges) attribute describing the addresses.

The tricky bits are "merging" that with any existing .debug_aranges table, and the fact that even if there is a .debug_aranges, it might not be complete.

So I think what we need to do is this: whenever we try to match up an address to a CU, we first check the .debug_aranges. If it doesn't match (and it is the first time), we do a full scan over all CUs and create an auxiliary aranges table, then use that for (any future) mapping.

I think this needs to be done in dwarf_addrdie, which uses dwarf_getaranges, but we might have to look at other places which use dwarf_getaranges and decide whether to make that function the "magic" one or whether it will always report the actual .debug_aranges. I think I would prefer if dwarf_getaranges would remain reporting the actual .debug_aranges data (if it is there).
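
To make the proposed lookup order concrete, here is a minimal, self-contained sketch in C. It is not libdw code: struct cu, struct arange, struct debug_info, MAX_RANGES and addr_to_cu are hypothetical names invented for this illustration, and each CU is assumed to cover a single [low_pc, high_pc) range. It only models the idea of consulting the table read from .debug_aranges first and lazily building an auxiliary table from the CUs on the first miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a CU: one contiguous [low_pc, high_pc) range.  */
struct cu { const char *name; uint64_t low_pc; uint64_t high_pc; };

/* One address-range entry that points back to the CU it describes.  */
struct arange { uint64_t start; uint64_t end; const struct cu *cu; };

#define MAX_RANGES 16  /* arbitrary bound for this sketch */

struct debug_info
{
  const struct cu *cus;          /* all CUs in the file */
  size_t ncus;
  const struct arange *aranges;  /* as read from .debug_aranges; may be
                                    empty or incomplete */
  size_t naranges;
  struct arange aux[MAX_RANGES]; /* auxiliary table built from the CUs */
  size_t naux;
  bool aux_built;                /* the full CU scan happens only once */
};

/* Linear search for the entry whose half-open range contains ADDR.  */
static const struct cu *
search (const struct arange *tab, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= tab[i].start && addr < tab[i].end)
      return tab[i].cu;
  return NULL;
}

/* Scan every CU and record its address range in the auxiliary table.  */
static void
build_aux (struct debug_info *di)
{
  for (size_t i = 0; i < di->ncus && di->naux < MAX_RANGES; i++)
    {
      const struct cu *cu = &di->cus[i];
      assert (cu->low_pc <= cu->high_pc);
      if (cu->low_pc == cu->high_pc)
        continue;  /* CU covers no code */
      di->aux[di->naux++]
        = (struct arange) { cu->low_pc, cu->high_pc, cu };
    }
  di->aux_built = true;
}

/* Map ADDR to a CU: try .debug_aranges first, then the CU-derived table.  */
const struct cu *
addr_to_cu (struct debug_info *di, uint64_t addr)
{
  const struct cu *cu = search (di->aranges, di->naranges, addr);
  if (cu != NULL)
    return cu;
  if (!di->aux_built)
    build_aux (di);
  return search (di->aux, di->naux, addr);
}
```

In this model, a file whose .debug_aranges is missing (as with the clang build above) simply has naranges == 0, so the first lookup triggers the CU scan and every later lookup reuses the auxiliary table. The existing table is never modified, which matches the preference that dwarf_getaranges keeps reporting the actual .debug_aranges data.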
Fixed in the following commit:

commit d7768acc697735cc7498ddc891a1065439ba1d6f
Author: Aaron Merey <amerey@redhat.com>
Date:   Mon Feb 26 09:58:39 2024 -0500

    Add __libdw_getdieranges

    __libdw_getdieranges builds an aranges list by iterating over each
    CU and recording each address range.

    This function is an alternative to dwarf_getaranges.  dwarf_getaranges
    attempts to read address ranges from .debug_aranges, which might be
    absent or incomplete.

    This patch replaces dwarf_getaranges with __libdw_getdieranges in
    dwarf_addrdie and dwfl_module_addrdie.  The existing tests in
    run-getsrc-die.sh are also rerun with .debug_aranges removed from
    the testfiles.

    https://sourceware.org/bugzilla/show_bug.cgi?id=22288
    https://sourceware.org/bugzilla/show_bug.cgi?id=30948

    Signed-off-by: Aaron Merey <amerey@redhat.com>