Tools Interface NG - glibc wiki

Next Generation Tools Interface

Fully contained within glibc

Userspace SDT probe points on the entry and exit of all system calls and important library functions - System administrators and users without root access have long been asking for something like a lighter-weight strace and ltrace. The place where strace taps into the kernel is hit so often by active processes that tracing frequently makes bugs disappear. They would like to be able to tap into the userspace side of a process to see what is going on. While root users can use systemtap for this by tapping into just the syscalls that they are most interested in, normal unprivileged users cannot. They would like to tap into the glibc userspace to be able to write stap scripts against specific syscalls.
Address normalization - Address space randomization provides an important security function, but it causes problems with MPI debugging. When libraries are preloaded at different locations on different machines, a parallel debugger has to track the locations of variables and functions on each node participating in the parallel job. Adding the capability to override the preloaded and randomized addresses when starting up a process on a particular node would allow a parallel debugger to launch the job on one node, see where it loads the libraries, then launch the job on other nodes with the same addresses. The alternative, turning off all address randomization, reduces the security of the system as a whole.
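The "see where it loads the libraries" step can be sketched with what Linux already exposes in /proc/<pid>/maps. This is only an illustration of the bookkeeping a parallel debugger would do on the first node. The function names are hypothetical, and nothing here overrides load addresses, which is the capability being requested.

```python
def library_load_addresses(maps_text):
    """Return {path: lowest start address} for file-backed mappings.

    `maps_text` is the text of /proc/<pid>/maps for the process of interest.
    """
    bases = {}
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # anonymous mapping or a pseudo-entry such as [heap]
        start = int(fields[0].split("-")[0], 16)
        path = fields[5]
        bases[path] = min(start, bases.get(path, start))
    return bases


def differing_libraries(node_a, node_b):
    """List libraries present on both nodes whose base addresses differ."""
    common = node_a.keys() & node_b.keys()
    return sorted(path for path in common if node_a[path] != node_b[path])
```

Feeding the contents of /proc/<pid>/maps from two nodes through library_load_addresses() and comparing the results with differing_libraries() shows which libraries the debugger currently has to track per node.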
Access to thread's TLS address space - OpenMP for both C/C++ and Fortran is becoming more and more common. Getting access to a thread's TLS area for both dynamic and static binaries is required.
- For static binaries: Right now, this works well for dynamically linked executables; however, the mechanism needed to make it work for statically linked binaries is a bit of a hack (note: statically linked binaries are the default on Cray systems as well as Blue Gene/Q systems); see https://sourceware.org/ml/libc-help/2014-03/msg00024.html. Switching to the correct behavior will require some modifications to the _r_debug interface so that it provides a link map. One proposal floating around is to remove libthread_db and make a structure in the executable's address space which can be used to find the TLS address; see https://sourceware.org/ml/libc-help/2014-03/msg00026.html. As of glibc 2.20, _r_debug has been fixed to include a link map; see https://sourceware.org/ml/libc-alpha/2014-04/msg00287.html
- From within the inferior: Functions that are inserted into a process's address space by the debugger need to be able to access the TLS area for that thread. GDB can use DWARF to find the offset of a variable within the TLS and pass that in as a compile-time offset into the function being inserted into the traced process's address space. However, to get the address of the TLS area for a particular thread, it needs to know the module id to make a call to __tls_get_addr(). This module id is buried in a private area of the link_map data structure, which is subject to change. It is therefore proposed that we add a new function to libthread_db which extracts the module id from the link map.
Add functions to enumerate pthread primitives to libthread_db - The current capabilities of libthread_db in glibc are very basic compared with those of other Unix operating systems, most notably AIX. It would be great if libthread_db were enhanced to support the enumeration of pthread primitives such as mutexes, condition variables, and read-write locks. The thread properties API (ThreadPropertiesAPI), which has been proposed to facilitate ASAN, LSAN, and MSAN, demonstrates some of the capabilities currently missing from thread debugging. Supplying these functions not only internally but also to debuggers through libthread_db would add the needed capabilities.
Cross-architecture support in libthread_db - This is needed so that GDB can remotely debug a 32-bit binary running in a container on a RHEL machine.
Pretty printers usable by other "debuggers" - The Python pretty printers for GDB are handy, but the interface needed to make them available to other debuggers is too tied to the internals of GDB. It would be nice if the interface could be generalized and made available to other tool authors. This would allow a culture of small tools to be implemented using the consistent human-readable versions of types supplied by libraries.
The audit interface - The audit interface is used by Spindle (https://computation-rnd.llnl.gov/spindle). Although it works, they are misusing the interface. Section 3.2 of their paper (https://computation-rnd.llnl.gov/spindle/pdfs/spindle-paper.pdf) points out some challenges that they have with the audit interface:
- the limited set of functions available to processes implementing audit functions
- using the audit facilities puts the dynamic loader into debug mode, which slows down its operation
- the audit interface has no way to intercept the load of a binary or library
- ProposalAuditFlag

Require kernel changes

File descriptor based process control - The signal and wait mechanism is very difficult to program against and has notable scalability problems. A new file descriptor based interface between the debugger and the kernel would solve many problems.
- GDB requests it: http://sourceware.org/gdb/wiki/LinuxKernelWishList
- The signal and wait mechanisms make it difficult to implement a GUI because their programming interfaces do not blend well with other GUI programming paradigms.
- Signals are difficult to program properly even for experienced programmers.
- Multiple concurrently running inferiors combined with unreliable signal delivery present extreme scalability problems. Consider a problem with literally hundreds of inferiors. While you are iterating through the signals that come back after you have sent SIGSTOP to all the threads, many things can happen, such as thread creation and deletion. Other platforms have synchronous commands: the debugger writes a command to the process control socket, and when the write completes, the process is stopped in a consistent state.
- A proposed solution is to use a form of netlink socket to implement a ptrace-like process control protocol.
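The shape of such a descriptor-based protocol can be sketched with ordinary sockets. This is a toy model and not a kernel interface: the STOP and CONT commands, the reply strings, and the thread standing in for the kernel side are all hypothetical. It shows only that the controller issues one command per descriptor and gathers the acknowledgements in a single event loop, the same kind of loop a GUI toolkit already runs.

```python
import selectors
import socket
import threading


def fake_inferior(sock):
    """Stand-in for the kernel side: apply each command, then acknowledge it."""
    state = "running"
    while True:
        command = sock.recv(16)
        if not command:          # controller closed its end
            break
        if command == b"STOP":   # hypothetical command name
            state = "stopped"
        elif command == b"CONT":  # hypothetical command name
            state = "running"
        sock.sendall(state.encode())
    sock.close()


def launch_fake_inferiors(count):
    """Create `count` control sockets, each served by a fake inferior."""
    controls = {}
    for index in range(count):
        ours, theirs = socket.socketpair()
        threading.Thread(target=fake_inferior, args=(theirs,), daemon=True).start()
        controls[f"inferior-{index}"] = ours
    return controls


def stop_all(controls, timeout=5.0):
    """Send STOP on every descriptor and wait until each one acknowledges."""
    selector = selectors.DefaultSelector()
    for name, sock in controls.items():
        sock.sendall(b"STOP")
        selector.register(sock, selectors.EVENT_READ, name)
    states = {}
    while len(states) < len(controls):
        events = selector.select(timeout)
        if not events:
            raise TimeoutError("an inferior did not acknowledge STOP")
        for key, _ in events:
            states[key.data] = key.fileobj.recv(16).decode()
            selector.unregister(key.fileobj)
    selector.close()
    return states
```

Calling stop_all(launch_fake_inferiors(200)) returns once every descriptor has reported its state, with no signal handlers and no wait loop in the controller.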
Multiple debuggers - There are many times when having multiple debuggers attached to one particular process would be helpful. A special case of this might be one debugger handing off to another, but it would be useful in general to allow multiple debuggers to be attached to a process. LLNL worked with IBM to design and implement a debugger interface which allows multiple debuggers to be attached to a process. They describe the protocol as a "Baton Passing" protocol, which allows multiple debuggers to have concurrent read-only access but gives only one debugger control authority. This protocol is described in section 4.5 of the IBM Redbook found here: http://www.redbooks.ibm.com/abstracts/redp4659.html. Some example use cases:
- Allowing valgrind or some other memory monitoring tool to seamlessly hand off to a general-purpose debugger like GDB when it detects a memory error of some kind.
- Large-scale debugging with a general-purpose debugger like GDB doesn't scale up to the huge number of inferiors needed for current and planned HPC machines. The current state-of-the-art mechanism for debugging programs running on something like a Blue Gene/Q machine with 256,000 processors is to have at least one lightweight debugger running, looking for potential misbehavior, and then giving control to a general-purpose parallel debugger like TotalView when requested.
- Being able to do something like strace a process that is also being debugged with a debugger like GDB.
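The access rule described above (concurrent read-only access for every attached tool, control authority for exactly one) can be modeled in a few lines. This is a minimal sketch of that rule only. The class and method names are hypothetical and do not come from the Blue Gene/Q interface, and granting the baton to the first tool that attaches is an assumption of this sketch.

```python
class Inferior:
    """Toy model of a process with several attached tools and one baton."""

    def __init__(self):
        self.attached = set()
        self.holder = None   # the tool with control authority, if any
        self.memory = {}     # address -> value; stands in for process state

    def attach(self, tool):
        """Attach a tool; the first tool to attach receives the baton."""
        self.attached.add(tool)
        if self.holder is None:
            self.holder = tool

    def read(self, tool, address):
        """Any attached tool may read, whether or not it holds the baton."""
        if tool not in self.attached:
            raise PermissionError(f"{tool} is not attached")
        return self.memory.get(address, 0)

    def write(self, tool, address, value):
        """Only the baton holder may modify the inferior."""
        if tool != self.holder:
            raise PermissionError(f"{tool} does not hold the baton")
        self.memory[address] = value

    def pass_baton(self, giver, receiver):
        """Hand control authority from the current holder to another tool."""
        if giver != self.holder:
            raise PermissionError(f"{giver} does not hold the baton")
        if receiver not in self.attached:
            raise PermissionError(f"{receiver} is not attached")
        self.holder = receiver
```

In the valgrind-to-GDB use case, the memory checker would attach first, hold the baton while it monitors, and call pass_baton() when it detects an error; from then on only the debugger may modify the process, while both can still read.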
Large writes to inferior's address space - /proc/<pid>/mem provides read access to a process's memory, but the kernel has provided inconsistent write access. At one time writing to the process's address space was allowed through the proc interface, but some potential races were found in the code and that mechanism was disabled until it could be fixed. Rik van Riel has the details on that. Nevertheless, there are several situations that call for large writes:
- GDB is working on using GCC as a JIT to evaluate complex C and C++ expressions in the context of the inferior. To be able to do this, it needs to be able to drop a newly generated function into the process's address space to be executed. Poking it into the process's address space 8 bytes at a time is slow and cumbersome.
- lightweight advanced conditional breakpoints that execute in the inferior's address space
- things that use dyninst to replace functions or instrument live code
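To make the cost difference concrete, the sketch below counts the word-sized ptrace pokes a transfer would need and contrasts that with a single positioned read or write on /proc/<pid>/mem. The helper names are hypothetical. Reading another process's memory this way still requires ptrace-level permission, and whether the write path succeeds depends on the running kernel, as described above.

```python
import ctypes
import os

# PTRACE_POKEDATA moves one machine word per call.
WORD_SIZE = ctypes.sizeof(ctypes.c_long)


def poke_calls_needed(nbytes, word_size=WORD_SIZE):
    """Number of word-sized pokes needed to transfer `nbytes` bytes."""
    return -(-nbytes // word_size)  # ceiling division


def read_memory(pid, address, length):
    """Read `length` bytes at `address` with one pread() on /proc/<pid>/mem."""
    fd = os.open(f"/proc/{pid}/mem", os.O_RDONLY)
    try:
        return os.pread(fd, length, address)
    finally:
        os.close(fd)


def write_memory(pid, address, data):
    """Attempt one pwrite() on /proc/<pid>/mem; the kernel may refuse it."""
    fd = os.open(f"/proc/{pid}/mem", os.O_WRONLY)
    try:
        return os.pwrite(fd, data, address)
    finally:
        os.close(fd)
```

For a 4096-byte generated function on a machine with 8-byte words, poke_calls_needed(4096, 8) is 512 system calls, whereas the /proc path needs one write when the kernel allows it.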
mmap_inferior-like function - A function like mmap which allows a debugger to make an mmap syscall to create a new block of valid addresses in an inferior. The purpose of this area would most often be to hold the contents of a function to be executed within the context of the process's address space. Some potential uses of this would be:
- advanced conditional breakpoints without the need for excessive syscalls.
- function replacement

Reference documents

Blue Gene/Q Code Development and Tools Interface - http://www.redbooks.ibm.com/abstracts/redp4659.html
AIX pthread debug library interface (libpthdebug) - http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/pthdb_attr.htm
Solaris 5.9's manpage on libthread_db - This is what Linux's libthread_db was modeled after. http://modman.unixdev.net/index.php?page=libthread_db&sektion=3LIB&manpath=SunOS-5.9
MPIR Acquisition interface - http://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf
Bull OpenSource's NPTL tools interface - http://nptl.bullopensource.org/ols/paper.pdf
OpenMP Tools interface - http://openmp.org/mp-documents/ompt-tr.pdf
Using the DLFM Package on the Cray XE6 System - http://www.nersc.gov/assets/dlfmuserguide.pdf
Shared library performance on Hopper - https://cug.org/proceedings/attendee_program_cug2012/includes/files/pap124.pdf
FS-Cache: A Network Filesystem Caching Facility - http://people.redhat.com/dhowells/fscache/FS-Cache.pdf