Differences between revisions 2 and 3
Revision 2 as of 2014-10-13 13:51:49
Size: 6701
Editor: PhilMuldoon
Comment: Update History And Genesis. Add project approach. Add "A first example"
Revision 3 as of 2014-10-14 12:00:50
Size: 13306
Editor: PhilMuldoon
Comment:
GDB/GCC Compile and Execute Project

** DRAFT **

History and Genesis

The compile and execute project originated from the desire to reuse parts of the GCC language parser in GDB. Language parsing is a complex and involved task. In C++ it is a fiendishly difficult problem, especially when parsing templates. Over the last few years many C++ parsing bugs have been found in GDB, most of which are not easily solved. This accumulation of bugs, many of which exist to this day, has made for a less than ideal experience for GDB users.

Several projects have tried to solve this by using the GCC language parsers. GCC, after all, has presumably already parsed and compiled the source. If we take GCC as an example of a "best in class" parser, the conclusion is to reuse that parser in as many places as possible. There have been efforts to do this, but all of them ran into two fundamental issues:

  • Access to the source code is needed.
  • Compiling the source takes time.

Most other projects relied on recompiling an expression within the source code to provide context. GDB could then examine the new debug information. But source code is not always available, and with even a modestly sized project the delay of the compile and link step makes for a poor user experience. These two obstacles have stymied previous efforts and, to this day, GDB still relies on its own internal language parsers to examine an expression.

GDB's internal parsers get a lot of things right, but they also get a lot of things wrong. GDB has to parse the expression “by hand” and figure out what is a symbol, what the types are, what is an inferior function call, and so on. It also has to deal with various versions of each language. If GDB were to work around a C++ bug exposed in GCC, what then of newer GCC versions that fixed that bug? In short, it can't work around these bugs because GCC changes (hopefully for the better!).

Reverse engineering a complex and ever more complex language (yes, C++, I come to you again) through the murky lens of debug info results in a workman-like parser. It works with expressions of simple to medium complexity, but tends to fail in complex scenarios. It sometimes fails in simple scenarios too. Take a plain old C++ iterator:

(gdb) p it
$1 = 1
(gdb) p it+1
Attempt to take address of value not located in memory.

While there may be many internal technical reasons why that would not work in GDB, to the user, who just wants the iterator incremented, it is a GDB failure.

Project Approach

This project approaches the problem in a similar manner to the previous projects described above: it uses GCC to construct the meaning or value of a given expression by asking GCC to compile something. GCC does the heavy lifting. GDB then executes the resulting object code, caches the return value and (in some cases) prints it.

This approach is achieved through the use of “Oracles”, a concept of two-way information exchange between GCC and GDB. Put simply, an Oracle answers queries from GCC, with the answers supplied by GDB. In return for these answers, GCC can produce object code without the original source code. One way of looking at this project is as a collection of Oracles and Marshallers that coordinate the flow of information between GDB and GCC. In this sense, GDB provides what it knows about the inferior, and GCC provides either errors or the resulting object code. But before we delve into the technical innards of this project, let's look at it from a functionality point of view.
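To make the Oracle idea concrete, here is a minimal sketch of the exchange. It is not the project's interface: the struct, the callback type, the function names and the addresses are all hypothetical, chosen only to show a compiler-side routine asking a debugger-side callback about an identifier it cannot resolve on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: what the debugger side knows about one inferior
   variable.  */
struct symbol_info
{
  const char *name;
  const char *type;
  uintptr_t address;   /* made-up addresses, for illustration only */
};

/* Hypothetical callback type: the compiler side calls this when it
   meets an identifier it cannot resolve.  NULL means "unknown".  */
typedef const struct symbol_info *(*oracle_fn) (const char *identifier);

/* Debugger-side knowledge about the stopped frame.  */
static const struct symbol_info known_symbols[] = {
  { "i", "int", 0x7ffe0010 },
  { "c", "int", 0x7ffe0014 },
  { "f", "char *", 0x7ffe0018 },
};

/* Debugger-side oracle: answer a query from what GDB knows.  */
const struct symbol_info *
debugger_oracle (const char *identifier)
{
  assert (identifier != NULL);
  for (size_t n = 0; n < sizeof known_symbols / sizeof known_symbols[0]; n++)
    if (strcmp (known_symbols[n].name, identifier) == 0)
      return &known_symbols[n];
  return NULL;
}

/* Compiler-side stand-in: returns 1 if the identifier can be resolved
   through the oracle, 0 if an error would have to be reported.  */
int
compiler_resolve (oracle_fn ask, const char *identifier)
{
  return ask (identifier) != NULL;
}
```

In this toy, compiler_resolve (debugger_oracle, "c") succeeds because the debugger side knows about c, while an identifier that neither side has seen is rejected.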

A First Example

Here is a simple piece of example code. It has been compiled, loaded into GDB and stopped at a breakpoint:

   1 #include <stdio.h>
   2 
   3 int main (void)
   4 {
   5    int i = 5;
   6    int c = 12;
   7    char *f = "Hello World";
   8 
   9    printf ("f is %s\n", f);
  10 }

And this is the GDB output thus far:

(gdb) break 9
Breakpoint 4 at 0x400696: file /home/build/simple.c, line 9.

(gdb) run
Starting program: /home/build/simple
Breakpoint 4, main () at /home/build/simple.c:9
9 printf("f is %s\n", f);

Here is what I want to do in this example:

  • I want to create a new integer variable, one that has not previously been defined in either GCC or GDB.
  • I want to compute some value and assign it to the location of this new variable.
  • I want to assign the value of this new variable to an existing variable in the program above, while the program is running, without recompiling the code or stopping execution in any way.

This raises some interesting questions. GCC does not know anything at this point. GDB has started the program up and is now sitting idle, waiting on user input. GDB knows the type and location of all inferior variables (in this case i, c and f), but it does not know that I want to create a new inferior variable, because I have not typed anything into GDB yet. Neither GCC nor GDB knows that this new variable's value will be computed somehow, or that the value will then be assigned to an existing variable.

So how do we do this? We write some brand new C code to tell GDB and GCC what to do. Currently the project provides a command called “compile code”. So let's see what happens in GDB.

(gdb) compile code int z = 5; c = z;

This example creates z, a new variable, and just assigns a value to it. It then assigns the value of z to c.

The problems highlighted above are:

  • GDB knows about c, but GCC has no idea what c is. When we pass this code snippet to GCC, c will be undefined and, without the use of the oracles, compilation would fail with an undeclared identifier error.
  • We don't have access to the source code, remember! So no cheating that way.
  • Neither GDB nor GCC knows about z. At the moment it is just a bunch of text on the screen.

Before we investigate how it happens, let's look at the result of the command. When you type the above into a GDB with this patch-set, what happens? Well, nothing! That is, nothing is printed, but there is plenty going on behind the scenes: the code is compiled and executed. So if you were to type:

(gdb) print c
$1 = 5

The variable c, which was previously 12, is now equal to z, which is 5.

Internals of the example

We have the result. It works (phew!). How did that happen?

The first thing GDB does is annotate the code so that it is in a format GCC can recognize.

The annotation process falls into several discrete steps, enumerated below. (Note that this is for the C language. Other languages may have different steps or operations.)

Enumerate and include macros that are in scope

The code the user writes and wishes to be compiled and injected might refer to a macro. GDB enumerates all of the macros it knows about and selects the ones that are in the current scope. The scope referred to here is the place where the inferior is stopped in GDB, which is where the compiled code will be injected. GDB (with this project) does not check whether the snippet actually uses a given macro. If the macro is in the current scope, it is included unconditionally in the annotated output.

So the top of our annotated output looks something like this:

   #define BUFSIZ _IO_BUFSIZ
   #define EOF (-1)
   #define FILENAME_MAX 4096
   #define FOPEN_MAX 16
   #define L_ctermid 9

And so on. In most programs there are dozens and dozens of system-defined macros, so this section can be quite lengthy.
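The selection rule described above can be sketched in a few lines. This is an illustration only, not GDB's implementation: the macro_def record, its in_scope flag and the emit_macros function are all hypothetical names. The point it shows is that every in-scope macro is emitted, with no check on whether the snippet uses it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one macro the debugger knows about.  */
struct macro_def
{
  const char *name;
  const char *body;
  int in_scope;   /* nonzero if visible where the inferior is stopped */
};

/* Write a "#define" line into OUT for every in-scope macro, whether or
   not the user's snippet mentions it.  Returns the number emitted.  */
int
emit_macros (const struct macro_def *macros, size_t count,
             char *out, size_t out_size)
{
  int emitted = 0;
  size_t used = 0;

  assert (out != NULL && out_size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < count; i++)
    {
      if (!macros[i].in_scope)
        continue;
      int n = snprintf (out + used, out_size - used, "#define %s %s\n",
                        macros[i].name, macros[i].body);
      if (n < 0 || (size_t) n >= out_size - used)
        {
          /* Out of room: drop the partial line and stop.  */
          out[used] = '\0';
          break;
        }
      used += (size_t) n;
      emitted++;
    }
  return emitted;
}
```

Given a table holding EOF and FOPEN_MAX in scope and a third macro out of scope, the buffer ends up with two #define lines, in table order.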

Generate callable scope

Currently the project does not “patch in” the newly compiled bytes at the program counter. That might be a viable solution in the future, perhaps as we explore other avenues for this project (like “fast” breakpoints). The project instead uses GDB's ability to make inferior function calls (basically, calling a function out of sequence without affecting the current execution context of the inferior). For that we need to generate a unique callable scope. GDB then “knows” the function name to call when it prepares to execute the snippet. We call this inserted scope a “code header”. Currently there is only one code header, for C. Other languages will need their own code headers, and further code headers may be needed for other types of functionality. The C code header looks like this:

   void _gdb_expr (struct __gdb_regs *__regs) {

The function is called “_gdb_expr”. This is what GDB calls when it comes to execute the code snippet. It takes one parameter, an auto-generated C struct that holds register name and value pairings. The use of this structure, and why we need these registers, is explored in the next section. There are disadvantages to this approach, however. It relies upon the host language's ability to access such low-level details. Java, for example, may have some difficulty with this if someone were to write an extension to this project for GCJ's compiled Java. So far we have always managed to find a workaround (for Java one could use CNI/JNI).
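As a rough sketch of how the code header wraps a snippet, the following assumes a register struct with a single __rbp field (the real struct is auto-generated, and its full layout is not shown in this document). The global counter and the snippet text are made up for the illustration, and the assert is there only to guard the demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one field per register the snippet needs.  Only
   __rbp is taken from the text; the rest of the struct is unknown.  */
struct __gdb_regs
{
  uintptr_t __rbp;
};

/* Hypothetical stand-in for a global variable in the inferior.  */
int counter = 0;

/* Code header, then the user's text, then the closing brace.  */
void _gdb_expr (struct __gdb_regs *__regs) {
  assert (__regs != 0);      /* demo guard, not part of generated code */
  int z = 5; counter = z;    /* user-supplied snippet, pasted as typed */
}
```

Calling _gdb_expr with a pointer to a filled-in struct runs the snippet once, which is the role the inferior function call plays in GDB.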

Generate locals location

If you read the section above about a callable scope, you will know that we wrap the code snippet in our own callable scope, for the reasons explained there. But that creates a problem we have to solve. We want the code to act as if it were running in the scope where the inferior is currently stopped. In our example above we are stopped in the main function. It has three local variables: i, c and f. But as the snippet is executed in its own auto-generated scope, accessing those local variables becomes problematic. The snippet's callable scope has its own stack frame, and those variables will not be found within it. We experimented with copying the stack and with other approaches, but they all fell short of our project guidelines: permanence and non-involvement. Permanence means that if a user assigns to a variable in the local inferior frame's stack (say, in this example, c), that local should keep the value even after the snippet has finished executing. Non-involvement means we like to tread lightly in the inferior, and wholesale copying of stacks or manipulation of registers in the inferior is not, as it were, treading lightly.

The solution we came up with and implemented was to calculate the location of each of the inferior's locals in the current stack, and map “shadow” locals to those. I quoted shadow because it is an often overloaded term in computer science. Don't attribute any of those other meanings here: a shadowed local in this context is a local in our snippet that “points to” another local. Writing to a shadowed local alters the local variable it points to.
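The permanence property can be seen in ordinary C. In this sketch, which is only an analogy, snippet_scope plays the role of the generated scope and stopped_frame plays the role of the frame where the inferior is stopped; both names are hypothetical. How the project ties the user's c to the shadow pointer is not described here, so the explicit cast and dereference is an assumption.

```c
#include <assert.h>

/* Plays the role of the generated scope: it has its own stack frame,
   so it can reach the caller's local only through a pointer.  */
void
snippet_scope (void *c_ptr)
{
  assert (c_ptr != 0);
  int z = 5;
  *(int *) c_ptr = z;   /* stands in for the user's "c = z;" */
}

/* Plays the role of the stopped frame; returns c after the call.  */
int
stopped_frame (void)
{
  int c = 12;
  snippet_scope (&c);
  return c;
}
```

Because the write goes through the pointer into the caller's frame, c still holds the new value after snippet_scope has returned and its own frame is gone.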

To do this we have to write a compile-loc2{language}.{ext} file. Other languages will need their own implementation. This part of the project enumerates the locals and adds the annotations for them to the code header detailed above. Here is an example for one of the variables (for brevity, just one; the rest look similar). Note that this code is auto-generated, may change, and has to use obscured variable names so as not to interfere with the user's snippet.

   void *__i_ptr;
   {
     __gdb_uintptr __gdb_stack[1];
     int __gdb_tos = -1;
     /* DW_OP_fbreg */
     void *__frame_base_2;
     {
       __gdb_uintptr __gdb_stack[1];
       int __gdb_tos = -1;
       /* DW_OP_call_frame_cfa */
       __gdb_stack[__gdb_tos + 1] = __regs->__rbp + 0x10;
       ++__gdb_tos;
       __frame_base_2 = (void *) __gdb_stack[__gdb_tos];
     }
     __gdb_stack[__gdb_tos + 1] = __frame_base_2 + 0xffffffffffffffec;
     ++__gdb_tos;
     __i_ptr = (void *) __gdb_stack[__gdb_tos];
   }

In the example above we are calculating the location of the local variable i. In the callable scope section, we saw that the scope is passed a register struct of name and value pairings. The auto-generated code above shows why we need those registers: we need the RBP register value to calculate stack offsets in the current context of the inferior. So in this example we first calculate the base of the frame (RBP + 0x10). We then add the offset of the local within that frame (0xffffffffffffffec, which is -20 as a 64-bit two's-complement value) and assign the resulting location to our shadowed local (in this case, __i_ptr). Beyond the location we have no need for type-based information, so we just use the utility “void” type here. This was a matter of convenience for the implementation of the C language. If you are writing your own language adaptation you may have to deal with types more explicitly here.
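The arithmetic in the generated code can be checked by hand. The sketch below repeats it with unsigned integers rather than void pointers, so that the wraparound is well defined in standard C; the function name local_address is hypothetical. Adding 0xffffffffffffffec is the same as subtracting 20, so the local ends up 4 bytes below the RBP value passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the generated arithmetic, on integers instead of void *.  */
uintptr_t
local_address (uintptr_t rbp)
{
  /* DW_OP_call_frame_cfa: the frame base is RBP + 0x10.  */
  uintptr_t frame_base = rbp + 0x10;

  /* DW_OP_fbreg: adding this constant wraps around, which is the
     same as subtracting 20 from the frame base.  */
  return frame_base + (uintptr_t) 0xffffffffffffffecULL;
}
```

For any RBP value, local_address returns RBP + 16 - 20, that is, RBP - 4.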

So for each local in scope, we have a snippet largely similar to the one above. These variables are located in the “_gdb_expr” scope.

None: GCCCompileAndExecute (last edited 2016-07-27 11:01:00 by PedroAlves)

All content (C) 2008 Free Software Foundation. For terms of use, redistribution, and modification, please see the WikiLicense page.