Differences between revisions 1 and 2
Revision 1 as of 2014-10-10 12:01:21
Size: 778
Editor: PhilMuldoon
Comment: First draft, history and genesis
Revision 2 as of 2014-10-13 13:51:49
Size: 6701
Editor: PhilMuldoon
Comment: Update History And Genesis. Add project approach. Add "A first example"
GDB/GCC Compile and Execute Project

History and Genesis

The compile and execute project originated from the desire to reuse parts of the GCC language parser in GDB. Language parsing is a complex and involved task. In C++ it is a fiendishly difficult problem, particularly when parsing templates. Over the last few years many C++ parsing bugs have been found in GDB, most of which are not easily solved. This accumulation of bugs, many of which still exist today, has made for a less than ideal experience for GDB users.

Several projects have tried to solve this problem by reusing the GCC language parsers. GCC, after all, has presumably already parsed and compiled the source. If we treat GCC as a "best in class" parser, the natural conclusion is to reuse that parser in as many places as possible. Every such effort, however, has run into two fundamental issues:

  • Access to the source code is needed.
  • Recompiling the source is slow.

Most earlier projects relied on recompiling an expression within the source code to provide context; GDB could then examine the newly generated debug information. But source code is not always available, and even with a modestly sized project, the delay of the compile and link steps makes for a poor user experience. These two obstacles have stymied previous efforts, and to this day GDB still relies on its own internal language parsers to evaluate an expression.

GDB's internal parsers get a lot of things right, but they also get a lot of things wrong. GDB has to parse the expression “by hand”: work out which names are symbols, what their types are, which calls are inferior function calls, and so on. It also has to deal with multiple versions of each language. If GDB were to work around a C++ bug exposed in GCC, what then of newer GCC versions that fixed that bug? In short, GDB cannot reliably work around these bugs, because GCC keeps changing (hopefully for the better!).

Reverse-engineering an ever more complex language (yes, C++, I come to you again) through the murky lens of debug info results in a workmanlike parser. It handles expressions of simple to medium complexity, but tends to fail in complex scenarios, and sometimes even in simple ones. Take a plain old C++ iterator:

(gdb) p it
$1 = 1
(gdb) p it+1
Attempt to take address of value not located in memory.

There may be many internal technical reasons why that does not work in GDB, but to the user, who just wants the iterator incremented, it is a GDB failure.

Project Approach

This project approaches the problem in a similar manner to the previous projects described above: it uses GCC to determine the meaning or value of a given expression by asking GCC to compile it. In other words, GCC does the heavy lifting. GDB then executes the resulting object code, caches the return value and, in some cases, prints it.

This approach is built on “Oracles”: a two-way exchange of information between GCC and GDB. Put simply, an Oracle answers GCC's queries using information provided by GDB. In return for these answers, GCC can produce object code without the original source code. One way of looking at this project is as a collection of Oracles and Marshallers that coordinate the flow of information between GDB and GCC. In this sense, GDB provides information about the inferior it is debugging, and GCC provides information about errors or the resulting object code. But before we delve into the technical innards of this project, let's look at it from a functional point of view.

A First Example

Here is a simple piece of example code. It has been compiled, loaded into GDB and stopped at a breakpoint:

   1 #include <stdio.h>
   2 
   3 int main (void)
   4 {
   5    int i = 5;
   6    int c = 12;
   7    char *f = "Hello World";
   8 
   9    printf ("f is %s\n", f);
  10    return 0;
  11 }

And this is the GDB output thus far:

(gdb) break 9
Breakpoint 4 at 0x400696: file /home/build/simple.c, line 9.

(gdb) run
Starting program: /home/build/simple
Breakpoint 4, main () at /home/build/simple.c:9
9 printf("f is %s\n", f);

Here is what I want to do in this example:

  • I want to create a new integer variable that has not previously been defined in either GCC or GDB.
  • I want to compute some value and assign it to the location of this new variable.
  • I want to assign the value of this new variable to an existing variable in the program above while the program is running, without recompiling the code or stopping execution in any way.

This raises some interesting questions. At this point GCC knows nothing. GDB has started the program, and the program is sitting idle at the breakpoint. GDB knows the type and location of all the inferior's variables (in this case, i, c and f), and is waiting for user input. It does not know that I want to create a new inferior variable, because I have not typed anything into GDB yet. Neither GCC nor GDB knows that this new variable's value will be computed somehow, or that it will then be assigned to an existing variable.

So how do we do this? We write some brand-new C code that tells GDB and GCC what to do. The project currently provides a command called “compile code”, so let's see what happens in GDB.

(gdb) compile code int z = 5; c = z;

This example creates a new variable, z, and assigns it the value 5. It then assigns the value of z to c.

The problems highlighted above are:

  • GDB knows about c, but GCC has no idea what c is. When we pass this code snippet to GCC, c will be undeclared, and without the oracles GCC would reject the snippet with an error.
  • Remember, we don't have access to the source code, so there is no cheating that way.
  • Neither GDB nor GCC knows about z. At the moment it is just a bunch of text on the screen.

Before we investigate how it works, let's look at the result of the command. When you type the above into a GDB built with this patch set, what happens? Well, nothing! That is, nothing is printed, but there is plenty going on behind the scenes: the code is compiled and executed. So if you now type:

(gdb) print c
$1 = 5

The variable c now holds the value of z, which is 5 (previously, c was 12).

Internals of the example

We have the result. It works (phew!). How did that happen?

TODO: Write up mechanical internals document.

GCCCompileAndExecute (last edited 2016-07-27 11:01:00 by PedroAlves)

All content (C) 2008 Free Software Foundation. For terms of use, redistribution, and modification, please see the WikiLicense page.