GDB shared library tracking with stap probes x _dl_debug_state
Fri May 7 19:42:03 GMT 2021
I'm cc-ing the GDB ML as well, as this might be an issue for other
architectures that, like armhf, store flags in ELF symbols.
Matthias (cc-ed) reported the following ticket on GDB's bugzilla:
This is related to how GDB tracks shared library loads/unloads in
dynamically-linked executables. GDB is aided by some hooks provided by
the dynamic linker.
There are two ways GDB will track these shared library events:
* _dl_debug_state mechanism
This is a dummy function that gets called by the dynamic linker's
dl_main (...) function so debuggers can place a breakpoint on it and
track shared library events.
_dl_debug_state is a real ELF symbol that lives in .dynsym. This is a
fallback mechanism in GDB these days.
* stap probes
This is a more recent approach where some probe points are provided by
the ELF file and GDB breakpoints a list of known probes instead.
There are no real ELF symbols here, just probe names and addresses that
debuggers should use to put breakpoints into. This is the preferred way
to track shared library events in GDB nowadays.
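
As a rough illustration of the preference order described above, here is
a small Python sketch. It is not GDB's code: the function name
choose_tracking_mechanism, the probe names, and the rule that every
listed probe must be present are assumptions made for the example.

```python
# Hypothetical sketch: choose how to track shared library events.
# The probe names are illustrative; the real list lives in the debugger.
REQUIRED_PROBES = ("init_start", "init_complete", "map_start",
                   "reloc_complete", "unmap_start", "unmap_complete")


def choose_tracking_mechanism(probes, dynsym):
    """Return (mechanism, breakpoint_addresses).

    probes: dict mapping probe name -> address (stap metadata, not symbols).
    dynsym: dict mapping ELF symbol name -> address.
    """
    # Assumption: probes are used only if every expected probe is present.
    if all(name in probes for name in REQUIRED_PROBES):
        return "probes", sorted(probes[name] for name in REQUIRED_PROBES)
    # Fallback: a single breakpoint on the dummy function.
    if "_dl_debug_state" in dynsym:
        return "_dl_debug_state", [dynsym["_dl_debug_state"]]
    return "none", []
```

Note that the probe path only ever sees names and addresses, while the
fallback path goes through a real ELF symbol.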
Going back to bz27826, up until Ubuntu 18.04 (glibc 2.27) on armhf, GDB
used the _dl_debug_state mechanism to track shared library events. This
is due to a bug in stap that made GDB fail a check, thus falling back to
using the _dl_debug_state mechanism.
With Ubuntu 20.04 (glibc 2.31), this check no longer fails and GDB
decides to use the new stap mechanism instead.
That's all fine, but there is one small detail that doesn't work for
armhf, and that is discovering if we're dealing with a PC that is arm
mode or thumb mode.
armhf's GDB uses a few strategies to figure out the mode: mapping
symbols, LSB of the PC and ELF symbol flags.
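
The three hints can be sketched in Python as below. This is a simplified
model, not GDB's implementation: the helper names and the order in which
the hints are consulted are assumptions. It relies on two ARM ELF
conventions: mapping symbols named $a and $t mark ARM and Thumb code, and
a function symbol whose value has bit 0 set denotes a Thumb function.

```python
from typing import Optional

ARM, THUMB = "arm", "thumb"


def mode_from_mapping_symbol(name: Optional[str]) -> Optional[str]:
    """Mapping symbols: $a marks ARM code, $t marks Thumb code."""
    if name is None:
        return None
    if name == "$a" or name.startswith("$a."):
        return ARM
    if name == "$t" or name.startswith("$t."):
        return THUMB
    return None  # e.g. $d (data) says nothing about the instruction set


def mode_from_function_symbol(st_value: Optional[int]) -> Optional[str]:
    """ELF function symbol: bit 0 of the value is set for Thumb functions."""
    if st_value is None:
        return None
    return THUMB if st_value & 1 else ARM


def guess_mode(pc: int,
               mapping_symbol: Optional[str] = None,
               function_st_value: Optional[int] = None) -> str:
    # An address with the low bit set is taken as Thumb.
    if pc & 1:
        return THUMB
    # Otherwise consult whatever symbol information is available.
    for hint in (mode_from_mapping_symbol(mapping_symbol),
                 mode_from_function_symbol(function_st_value)):
        if hint is not None:
            return hint
    # No information at all: fall back to ARM mode.
    return ARM
```

With no hint available, guess_mode(0x1000) returns "arm", which is the
default that matters later in this discussion.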
Given distros usually strip binaries (ld.so is also stripped), only a
few symbols are left in the executable file itself, and _dl_debug_state
is one of them.
GDB can still peek at the _dl_debug_state ELF symbol and retrieve the
flag that indicates we have an arm or thumb mode function. That way GDB
can place the proper arm/thumb breakpoint at that symbol's address.
With the stap probes approach, this is not possible. As was said before,
the probe points are not real ELF symbols. They're just metadata with a
name and an address.
Of course we could look up which symbol contains a particular probe
address, but those symbols are not available in stripped binaries
(dl_main, for example).
So GDB is left with no useful information to insert the right kind of
breakpoint for the specified address. It defaults to arm mode, and,
since dl_main is thumb mode, things just break.
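
To make the failure concrete, here is a toy Python model of the lookup.
All addresses, sizes and table contents are made up for illustration, and
the helper names are hypothetical. The only thing it borrows from the
real world is the convention that a Thumb function symbol has bit 0 set
in its value.

```python
from typing import NamedTuple, Optional, Sequence


class FuncSym(NamedTuple):
    name: str
    st_value: int  # bit 0 set means a Thumb function
    size: int


def enclosing_symbol(addr: int, symbols: Sequence[FuncSym]) -> Optional[FuncSym]:
    """Return the function symbol whose range contains addr, if any."""
    for sym in symbols:
        start = sym.st_value & ~1  # drop the Thumb bit to get the address
        if start <= addr < start + sym.size:
            return sym
    return None


def breakpoint_kind(addr: int, symbols: Sequence[FuncSym]) -> str:
    sym = enclosing_symbol(addr, symbols)
    if sym is None:
        return "arm"  # nothing to go on: default to ARM mode
    return "thumb" if sym.st_value & 1 else "arm"


# Made-up symbol tables: before and after stripping.
FULL = [FuncSym("dl_main", 0x1001, 0x200),
        FuncSym("_dl_debug_state", 0x2001, 0x4)]
STRIPPED = [FuncSym("_dl_debug_state", 0x2001, 0x4)]

PROBE_ADDR = 0x1080  # hypothetical probe address inside dl_main

print(breakpoint_kind(PROBE_ADDR, FULL))      # symbol found
print(breakpoint_kind(PROBE_ADDR, STRIPPED))  # symbol gone, default used
```

In the stripped case the probe address falls outside every remaining
symbol, so the default is used even though the code there is Thumb.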
I believe this may also be a problem for MIPS, since it has to determine
the ISA bit for some operations.
Now, I can see three possible solutions:
1 - Force GDB to fall back to using _dl_debug_state for armhf (and
possibly other architectures). This is considered bad because the
affected architectures can't take advantage of a more advanced mechanism
for tracking shared library events.
2 - Not stripping ld.so/glibc. I can't determine the impact of this
choice, but distros strip binaries for a reason. Having to carry all
symbols for a particular library may not be desirable.
It is also not desirable to force users to install a dbg package for
ld.so/glibc just to be able to use a debugger.
3 - Strip symbols from ld.so/glibc, but keep a few select critical
symbols that debuggers will want to use. I've been told this may be a
bit undesirable from glibc's perspective.
I noticed the probe points fall into the following functions: dl_main,
_dl_map_object_from_fd, lose, dl_open_worker and _dl_close_worker.
If we keep those symbols, GDB will be able to figure out what mode we
have and the proper breakpoint to use for each of those symbols.
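
For option 3, one possible way to express "strip everything except a few
symbols" is GNU strip's --keep-symbol option. The Python helper below
only builds such a command line; it does not run it. The helper name, the
file path and the use of --strip-unneeded are assumptions, and a distro's
actual packaging tooling may strip binaries differently.

```python
from typing import List, Sequence


def build_strip_command(path: str, keep: Sequence[str]) -> List[str]:
    """Build a GNU strip invocation that preserves the named symbols."""
    cmd = ["strip", "--strip-unneeded"]
    # One --keep-symbol option per symbol the debugger relies on.
    cmd += [f"--keep-symbol={name}" for name in keep]
    cmd.append(path)
    return cmd


# Hypothetical usage; the path is a placeholder.
print(" ".join(build_strip_command("ld.so", ["dl_main", "dl_open_worker"])))
```

The resulting list could be passed to subprocess.run as-is.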
Before making a decision, it sounds best to discuss this and come up
with the best solution for both projects and the distros.