Bug 30948 - src/stack doesn't show source information if the target is compiled with clang
Summary: src/stack doesn't show source information if the target is compiled with clang
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: libdw
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2023-10-08 17:22 UTC by Masatake YAMATO
Modified: 2024-02-29 22:54 UTC

Description Masatake YAMATO 2023-10-08 17:22:20 UTC
dwfl_module_getsrc() unexpectedly fails to return source information
when the target executable is compiled with clang, although it works
well with an executable compiled with gcc.

For testing I used executables built from the following source code (target.c):

    static int
    f1(void)
    {
      while (1);
      return 0;
    }

    static int
    f0(void)
    {
      return f1();
    }

    int main(void)
    {
      return f0();
    }

I made two executables with two compilers.
  $ gcc -O0 -g target.c -o target-gcc
  $ clang -O0 -g target.c -o target-clang

I ran the two executables in the background:

    $ ./target-gcc &  
    [2] 2118229

    $ ./target-clang &         
    [3] 2118253

I ran src/stack with the -s and -p options.

For the executable built with gcc, the backtrace reported source lines:

    $ ./src/stack -s -p 2118229
    PID 2118229 - process
    TID 2118229:
    #0  0x000000000040110a f1
	/home/yamato/var/elfutils/tests/target.c:4:9
    #1  0x0000000000401115 f0
	/home/yamato/var/elfutils/tests/target.c:11:10
    #2  0x0000000000401120 main
	/home/yamato/var/elfutils/tests/target.c:16:10
    #3  0x00007f7d6c823b8a __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58:16
    #4  0x00007f7d6c823c4b __libc_start_main@@GLIBC_2.34
	../csu/libc-start.c:360:3
    #5  0x0000000000401045 _start


For the executable built with clang, the backtrace didn't report source lines for f1, f0, and main:

    $ ./src/stack -s -p 2118253
    PID 2118253 - process
    TID 2118253:
    #0  0x0000000000401149 f1
    #1  0x0000000000401139 f0
    #2  0x0000000000401124 main
    #3  0x00007f4a84ab5b8a __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58:16
    #4  0x00007f4a84ab5c4b __libc_start_main@@GLIBC_2.34
	../csu/libc-start.c:360:3
    #5  0x0000000000401045 _start

GDB (run here through pstack) can report them:

    $ ./target-clang &
    [1] 2116805
    [yamato@dev64]~/var/elfutils/tests% pstack 2116805
    #0  0x0000000000401149 in f1 () at target.c:4
    #1  0x0000000000401139 in f0 () at target.c:11
    #2  0x0000000000401124 in main () at target.c:16

GDB can report the source lines, so I guess clang put enough debug
information into the executable.

The versions of the compilers:
    $ gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper
    OFFLOAD_TARGET_NAMES=nvptx-none
    OFFLOAD_TARGET_DEFAULT=1
    Target: x86_64-redhat-linux
    Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
    Thread model: posix
    Supported LTO compression algorithms: zlib zstd
    gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC)
    
    $ clang -v
    clang version 16.0.6 (Fedora 16.0.6-3.fc38)
    Target: x86_64-redhat-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/13
    Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/13
    Candidate multilib: .;@m64
    Candidate multilib: 32;@m32
    Selected multilib: .;@m64

The revision of elfutils: 557aa6a4b7b1d678b7c2c3b9aae1dafcc2160c64 (git commit).
Comment 1 Mark Wielaard 2023-11-02 13:25:20 UTC
Does compiling with -gdwarf-aranges help?
For some reason clang doesn't seem to include .debug_aranges by default.
Comment 2 Masatake YAMATO 2023-11-06 21:21:16 UTC
Thank you. With -gdwarf-aranges, the stack command works as expected.

$ clang -O0 -g -gdwarf-aranges target.c -o target-clang
$ ./target-clang &                                     
[1] 4104789
$ ./src/stack -s -p  4104789
PID 4104789 - process
TID 4104789:
#0  0x0000000000401149 f1
    /home/yamato/var/elfutils/target.c:4:3
#1  0x0000000000401139 f0
    /home/yamato/var/elfutils/target.c:11:10
#2  0x0000000000401124 main
    /home/yamato/var/elfutils/target.c:16:10
#3  0x00007fb7e9213b8a __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58:16
#4  0x00007fb7e9213c4b __libc_start_main@@GLIBC_2.34
    ../csu/libc-start.c:360:3
#5  0x0000000000401045 _start
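
Whether a given binary carries a .debug_aranges section at all can also be checked directly. The following is only a minimal sketch, not part of elfutils: the function name elf64_has_section is hypothetical, it uses the glibc <elf.h> definitions, and it assumes a 64-bit ELF file in host byte order with fewer than 0xff00 sections. It does not look into separate debuginfo files.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an elfutils API).
   Returns 1 if the ELF64 file at PATH has a section called NAME,
   0 if it does not, and -1 on error (unreadable, not ELF64, ...).  */
int
elf64_has_section (const char *path, const char *name)
{
  int result = -1;
  char *names = NULL;
  Elf64_Shdr *shdrs = NULL;
  const Elf64_Shdr *str;
  Elf64_Ehdr eh;

  FILE *f = fopen (path, "rb");
  if (f == NULL)
    return -1;

  /* Read and sanity-check the ELF header.  */
  if (fread (&eh, sizeof eh, 1, f) != 1
      || memcmp (eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_shentsize != sizeof (Elf64_Shdr)
      || eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
    goto out;

  /* Read the whole section header table.  */
  shdrs = malloc ((size_t) eh.e_shnum * sizeof *shdrs);
  if (shdrs == NULL
      || fseek (f, (long) eh.e_shoff, SEEK_SET) != 0
      || fread (shdrs, sizeof *shdrs, eh.e_shnum, f) != eh.e_shnum)
    goto out;

  /* Read the section name string table.  */
  str = &shdrs[eh.e_shstrndx];
  names = malloc (str->sh_size + 1);
  if (names == NULL
      || fseek (f, (long) str->sh_offset, SEEK_SET) != 0
      || fread (names, 1, str->sh_size, f) != str->sh_size)
    goto out;
  names[str->sh_size] = '\0';

  /* Compare every section name against NAME.  */
  result = 0;
  for (size_t i = 0; i < eh.e_shnum; i++)
    if (shdrs[i].sh_name < str->sh_size
        && strcmp (names + shdrs[i].sh_name, name) == 0)
      result = 1;

out:
  free (names);
  free (shdrs);
  fclose (f);
  return result;
}
```

Calling elf64_has_section ("target-clang", ".debug_aranges") on the two clang builds above would be one way to confirm the difference that -gdwarf-aranges makes.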
Comment 3 Mark Wielaard 2023-11-07 14:43:58 UTC
See also https://sourceware.org/bugzilla/show_bug.cgi?id=22288

The issue is that we rely on .debug_aranges to know whether a (code) address is described in a particular DWARF CU (and from there which line table to use). We could create an aranges table ourselves, which means scanning over all the Compile Units and looking at the first DIE of each CU, which will have low_pc and high_pc (or ranges) attributes describing the addresses. The tricky bit is "merging" that with any existing .debug_aranges table, and the fact that even if there is a .debug_aranges section it might not be complete.

So I think what we need to do is this: whenever we try to match up an address to a CU, we first check .debug_aranges; if it doesn't match (and it is the first time), we do a full scan over all CUs and create an auxiliary aranges table, then use that for (any future) mapping. I think this needs to be done in dwarf_addrdie, which uses dwarf_getaranges, but we might have to look at other places that use dwarf_getaranges and decide whether to make that function the "magic" one or whether it will always report the actual .debug_aranges. I would prefer that dwarf_getaranges keep reporting the actual .debug_aranges data (if it is there).
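
The lookup order proposed above can be sketched as follows. This is only an illustrative model under stated assumptions, not libdw code: struct range, struct debug_info and addr_to_cu are hypothetical names, and the "full scan" is modelled by a pre-collected array of CU ranges instead of a real walk over CU DIEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT libdw's real data structures.  */
struct range { uint64_t low, high; int cu; };  /* [low, high) belongs to CU  */

struct debug_info
{
  const struct range *aranges;    /* from .debug_aranges; may be NULL  */
  size_t naranges;
  const struct range *cu_ranges;  /* low_pc/high_pc (or ranges) per CU DIE  */
  size_t ncu_ranges;
  const struct range *aux;        /* auxiliary table, set on first miss  */
  size_t naux;
  bool aux_built;
  int full_scans;                 /* how often the expensive scan ran  */
};

/* Linear search; returns the CU index covering ADDR, or -1.  */
static int
lookup (const struct range *tab, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= tab[i].low && addr < tab[i].high)
      return tab[i].cu;
  return -1;
}

/* Try .debug_aranges first.  On a miss, build the auxiliary table once
   and use it for this and all later misses.  */
int
addr_to_cu (struct debug_info *dbg, uint64_t addr)
{
  assert (dbg != NULL);

  int cu = lookup (dbg->aranges, dbg->naranges, addr);
  if (cu >= 0)
    return cu;

  if (!dbg->aux_built)
    {
      /* A real implementation would iterate over every CU DIE here;
         this model just adopts the pre-collected CU ranges.  */
      dbg->aux = dbg->cu_ranges;
      dbg->naux = dbg->ncu_ranges;
      dbg->aux_built = true;
      dbg->full_scans++;
    }
  return lookup (dbg->aux, dbg->naux, addr);
}
```

In this model the table read from .debug_aranges is never modified, which matches the preference that dwarf_getaranges keep reporting the section's actual contents.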
Comment 4 Aaron Merey 2024-02-29 22:54:05 UTC
Fixed in the following commit:

commit d7768acc697735cc7498ddc891a1065439ba1d6f
Author: Aaron Merey <amerey@redhat.com>
Date:   Mon Feb 26 09:58:39 2024 -0500

    Add __libdw_getdieranges
    
    __libdw_getdieranges builds an aranges list by iterating over each
    CU and recording each address range.
    
    This function is an alternative to dwarf_getaranges.  dwarf_getaranges
    attempts to read address ranges from .debug_aranges, which might be
    absent or incomplete.
    
    This patch replaces dwarf_getaranges with __libdw_getdieranges in
    dwarf_addrdie and dwfl_module_addrdie.  The existing tests in
    run-getsrc-die.sh are also rerun with .debug_aranges removed from
    the testfiles.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=22288
    https://sourceware.org/bugzilla/show_bug.cgi?id=30948
    
    Signed-off-by: Aaron Merey <amerey@redhat.com>
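
For illustration, the idea described in the commit message (iterate over each CU, record each address range, then resolve addresses against that list) might look roughly like the sketch below. It is not the code from the commit: struct pc_range, struct cu_model, struct arange, build_die_ranges and find_cu are hypothetical names, CUs are modelled as plain arrays of ranges, and the binary search assumes that ranges do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model types; NOT libdw's real data structures.  */
struct pc_range { uint64_t low, high; };      /* half-open [low, high)  */
struct cu_model { const struct pc_range *ranges; size_t nranges; };
struct arange { uint64_t low, high; size_t cu; };

static int
cmp_arange (const void *a, const void *b)
{
  const struct arange *x = a, *y = b;
  return (x->low > y->low) - (x->low < y->low);
}

/* Collect every non-empty range of every CU into one array sorted by
   start address.  Returns a malloc'ed array (caller frees) and stores
   the entry count in *N; returns NULL if there is nothing to collect
   or allocation fails.  */
struct arange *
build_die_ranges (const struct cu_model *cus, size_t ncus, size_t *n)
{
  assert (n != NULL && (cus != NULL || ncus == 0));

  size_t total = 0;
  for (size_t i = 0; i < ncus; i++)
    total += cus[i].nranges;

  *n = 0;
  if (total == 0)
    return NULL;

  struct arange *out = malloc (total * sizeof *out);
  if (out == NULL)
    return NULL;

  for (size_t i = 0; i < ncus; i++)
    for (size_t j = 0; j < cus[i].nranges; j++)
      {
        const struct pc_range *r = &cus[i].ranges[j];
        if (r->low >= r->high)
          continue;             /* skip empty ranges  */
        out[(*n)++] = (struct arange) { r->low, r->high, i };
      }

  qsort (out, *n, sizeof *out, cmp_arange);
  return out;
}

/* Binary search over the sorted list.  Returns the index of the CU
   covering ADDR, or (size_t) -1 if no range matches.  */
size_t
find_cu (const struct arange *tab, size_t n, uint64_t addr)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (addr < tab[mid].low)
        hi = mid;
      else if (addr >= tab[mid].high)
        lo = mid + 1;
      else
        return tab[mid].cu;
    }
  return (size_t) -1;
}
```

Because the list is derived from the CUs themselves, a lookup of this kind does not depend on .debug_aranges being present or complete.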