This page describes the internal design of the fast tracepoint mechanism available in GDB. This involves GDB, gdbserver and a special in-process agent library (commonly referred to as IPA). Even though fast tracepoints are only available in gdbserver right now, there is no technical reason that prevents having them in native GDB. It is simply that the use cases for them generally involve using gdbserver as well.
When a fast tracepoint is installed in the target, an instruction is replaced by an unconditional jump to a region called the "jump pad". This jump pad contains code that performs condition checking and data collection. The trace data is saved in a buffer in the process memory. In addition, the code in the jump pad executes the replaced instruction out of line (in a displaced manner). Execution then jumps back to where it came from to continue the program.
Why fast tracepoints?
Regular tracepoints work by using the standard breakpoint mechanism. Once a regular tracepoint is hit, gdbserver carries out the actions associated with the tracepoint and resumes execution. The cost of hitting a breakpoint, going to the kernel, transferring control to gdbserver and resuming is generally acceptable, but there are some situations where we can't afford it. The fast tracepoint design allows GDB to trace the application (collect data when control reaches certain points in the program) without leaving the context of the program. That's right: no context switch, no system call, no nothing.
In-process agent library (IPA)
The in-process agent library is a dynamic library shipped with GDB that is meant to be loaded into the program with which you want to use fast tracepoints. This can be done using LD_PRELOAD (see ld.so(8)). The library does its initialization work in a constructor function:
- Allocate a trace buffer (5 MB by default)
- Allocate a "jump pad" zone (20 memory pages by default)
- Start the helper thread, a thread that runs in the debugged program and handles communication with gdbserver
  - During its initialization, the helper thread creates a named socket of the form /tmp/gdb_ust<pid> and starts listening on it.
GDBserver <-> IPA communication
To send commands to the IPA, gdbserver uses the named socket created by the helper thread, but not in the obvious way. Since gdbserver has the ability to read/write the inferior's memory, it has the ability to read/write memory that "belongs" to the IPA. To send a command, gdbserver writes it in a global buffer in the IPA and then sends a byte through the socket to tell the IPA that a new command is available in the command buffer. The IPA places its reply in the same buffer and also writes back a byte in the socket to inform gdbserver that the reply is ready.
gdbserver installs internal breakpoints on some special functions designed to allow the IPA to inform gdbserver of some events. When the IPA wants to trigger an event (for example, flush the trace buffer when it is full), it simply calls the associated function. The breakpoint is hit, and control goes back to gdbserver, which executes the necessary actions before resuming the inferior.
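The buffer-plus-notification-byte handshake described above can be modeled in a few lines. This is only a sketch under loud assumptions: a Python `bytearray` stands in for the IPA's global command buffer (gdbserver really reaches it by writing the inferior's memory), `socket.socketpair()` stands in for the named socket, and the command arguments and the `E01` error reply are hypothetical placeholders.

```python
import socket
import threading

# Stand-in for the global command buffer inside the IPA (assumption: the
# real buffer lives in the inferior and is accessed through memory writes).
command_buffer = bytearray(64)


def write_buffer(payload: bytes) -> None:
    """Overwrite the command buffer with a NUL-padded payload."""
    command_buffer[:] = payload.ljust(len(command_buffer), b"\0")


def read_buffer() -> bytes:
    """Return the buffer contents up to the first NUL byte."""
    return bytes(command_buffer).split(b"\0", 1)[0]


def helper_thread(sock: socket.socket) -> None:
    """IPA side: wait for the notification byte, handle the command, reply."""
    sock.recv(1)                      # a new command is in the buffer
    command = read_buffer()
    # "E01" is a hypothetical error reply used only in this sketch.
    write_buffer(b"OK" if command.startswith(b"FastTrace") else b"E01")
    sock.sendall(b"\x01")             # tell the peer the reply is ready


def send_command(sock: socket.socket, command: bytes) -> bytes:
    """gdbserver side: write the command, notify the IPA, read the reply."""
    write_buffer(command)
    sock.sendall(b"\x01")
    sock.recv(1)                      # wait until the reply is ready
    return read_buffer()


gdbserver_end, ipa_end = socket.socketpair()
worker = threading.Thread(target=helper_thread, args=(ipa_end,))
worker.start()
reply = send_command(gdbserver_end, b"FastTrace:hypothetical-args")
worker.join()
gdbserver_end.close()
ipa_end.close()
```

Note that the socket carries no payload at all in this model; it only signals that the shared buffer changed hands.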
Fast tracepoints installation
Whenever gdbserver wants to install a fast tracepoint in the inferior, it uses the communication mechanism outlined above to send the IPA the FastTrace command, which includes various pieces of information about the tracepoint (look at tracepoint_send_agent() to see them all). The IPA prepares the jump pad, the zone where the program will jump to when reaching the tracepoint, by writing machine code that performs various tasks such as condition checking and data collection. The instruction to be overwritten by a jump is also copied into the jump pad so that it can be executed out-of-line (see the Displaced stepping section). If everything goes as expected, it replies OK to gdbserver, which in turn overwrites the instruction with a jump to the jump pad.
Displaced stepping
This is also called out-of-line stepping/execution in other contexts. Since the instruction at the tracepoint location is overwritten by a jump, it has to be executed somehow. It is copied to the end of the jump pad, right before the jump back to the main execution. The challenge with this is to deal properly with instructions that use relative addressing. For example, an instruction that jumps to (current instruction pointer + 0x20) wouldn't work if you executed it at a different address. GDB has to adapt the instruction to make it behave as if it had been executed at its original location.
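To make the relative-addressing problem concrete, here is a minimal sketch of the fix-up for one specific case: an x86 `jmp rel32` (opcode 0xE9), whose displacement is measured from the end of the instruction. The function name `relocate_jmp_rel32` is hypothetical and is not gdbserver's actual relocation code, which has to handle many more instruction forms.

```python
import struct


def relocate_jmp_rel32(insn: bytes, old_addr: int, new_addr: int) -> bytes:
    """Adjust a 5-byte `jmp rel32` so it reaches the same target from new_addr."""
    opcode, old_disp = struct.unpack("<Bi", insn)
    if opcode != 0xE9:
        raise ValueError("this sketch only handles jmp rel32 (opcode 0xE9)")
    # Absolute target as computed at the original location.
    target = old_addr + len(insn) + old_disp
    # Same target, expressed relative to the copy in the jump pad.
    new_disp = target - (new_addr + len(insn))
    if not -(2**31) <= new_disp < 2**31:
        raise ValueError("target is out of rel32 range from the new address")
    return struct.pack("<Bi", opcode, new_disp)


# A jump of +0x20 located at 0x1000, copied to a jump pad at 0x5000.
original = bytes([0xE9, 0x20, 0x00, 0x00, 0x00])
relocated = relocate_jmp_rel32(original, 0x1000, 0x5000)
```

Both encodings resolve to the same absolute target (0x1025); only the displacement bytes differ.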
The trampoline, called the jump pad in gdbserver, is the code section that the patched instruction stream jumps to. It is responsible for storing the register values on the stack and calling the tracepoint collector function. The original, displaced instruction is then executed and the normal flow of the instruction stream resumes. As an example, suppose the instructions before patching are as follows:
INSN1
INSN2
INSN3
INSN4
INSN5
INSN6
INSN7
After patching a tracepoint at INSN3's address:
INSN1
INSN2      | --- tracepoint *'s jump pad ---
* JUMP ==> | 1. push all registers to the stack,
INSN4      | 2. hold pointer to this "buffer".
INSN5      | 3. call collector(register_buffer)
INSN6      | 4. execute original instruction
INSN7      | 5. jump back to INSN4.
And this is how trampolines work. Boing!
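The five steps in the diagram can be replayed with a toy model. This is an illustration only: instructions are plain strings, the "registers" are a dictionary holding just a program counter, and the names `run` and `install_tracepoint` are made up for this sketch. No real machine code is involved.

```python
def install_tracepoint(program, jump_pads, index):
    """Replace program[index] with a JUMP and record its jump pad."""
    jump_pads[index] = {"original": program[index], "return_to": index + 1}
    program[index] = "JUMP"


def run(program, jump_pads, start=0):
    """Execute a toy instruction list; return (executed names, trace buffer)."""
    executed, trace = [], []
    registers = {"pc": start}
    while registers["pc"] < len(program):
        pc = registers["pc"]
        insn = program[pc]
        if insn == "JUMP":
            pad = jump_pads[pc]
            register_buffer = dict(registers)   # 1-2. save registers, keep a handle
            trace.append(register_buffer)       # 3. call collector(register_buffer)
            executed.append(pad["original"])    # 4. execute original instruction
            registers["pc"] = pad["return_to"]  # 5. jump back
        else:
            executed.append(insn)
            registers["pc"] = pc + 1
    return executed, trace


program = [f"INSN{i}" for i in range(1, 8)]
jump_pads = {}
install_tracepoint(program, jump_pads, 2)       # INSN3 lives at index 2
executed, trace = run(program, jump_pads)
```

The program still executes INSN1 through INSN7 in order; the only visible difference is the extra record left in the trace buffer.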
Instruction length limitation
The most important limitation (at least on x86) is that you can only install a tracepoint on instructions that are 5 bytes or longer. This is because the jump that replaces the instruction is itself 5 bytes long. You may say: "But why not replace two (or more) instructions with the jump and execute them out-of-line?". The problem is, if you have a branch instruction whose destination is the second instruction you replaced, the processor will try to interpret the bytes that encode the jump destination as an instruction. This will result either in a random instruction being executed, or in an illegal instruction. This limits the choice of places where you can install a fast tracepoint in your program, but it is a performance vs flexibility tradeoff.
(Note: this might not be true because of trampolines...)
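The 5-byte figure comes from the encoding of the x86 near jump with a 32-bit displacement: one opcode byte (0xE9) followed by a 4-byte little-endian offset measured from the end of the jump. A short sketch of that encoding follows; the helper names `encode_jmp_rel32` and `can_patch` are hypothetical and only serve the illustration.

```python
import struct

JMP_REL32_OPCODE = 0xE9   # x86 near jump, 32-bit relative displacement
JMP_REL32_LENGTH = 5      # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(insn_addr: int, target_addr: int) -> bytes:
    """Return the 5 bytes of a jump to target_addr when placed at insn_addr."""
    # The displacement is relative to the address of the next instruction.
    displacement = target_addr - (insn_addr + JMP_REL32_LENGTH)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("jump pad is out of rel32 range")
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)


def can_patch(insn_length: int) -> bool:
    """A single instruction can host the jump only if it is long enough."""
    return insn_length >= JMP_REL32_LENGTH


# Jump from a tracepoint at 0x1000 to a jump pad at 0x2000.
patch = encode_jmp_rel32(0x1000, 0x2000)
```

The range check also shows a second constraint implied by this encoding: the jump pad must sit within about 2 GB of the patched instruction.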