This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: RFC: C/C++ preprocessor macro support for GDB
Neil Booth <neil@daikokuya.demon.co.uk> writes:
> What are the issues with using libcpp? It would be a good test of its
> viability as an independent library to have it used somewhere else.
I think there are two issues. Both might simply be my
misunderstanding of the libcpp header files and code I read; I'd love
to be set straight.
- GDB has commands like this:
(gdb) break *ADDRESS if CONDITION
This sets a conditional breakpoint at the address computed by
evaluating the expression ADDRESS, whose condition is CONDITION.
ADDRESS needs to be evaluated in the current scope --- the currently
selected frame and its PC --- but CONDITION needs to be evaluated in
the scope in force at the *breakpoint's* address. So you can't just
take the whole command and smoosh it through an expander all at
once: ADDRESS and CONDITION might have totally different contexts,
as far as the preprocessor is concerned.
This means you've got to decide if there's an `if' in the command
before you can macro-expand things. Obviously, an `if' in a string,
or as part of a larger identifier, doesn't count --- you really need
to work in terms of tokens.
(There's a similar situation involving commas: sometimes the parser
is supposed to stop when it finds its first comma outside of any
parens.)
So my macro expander has the following function in its public
interface:
/* If the null-terminated string pointed to by *LEXPTR begins with a
macro invocation, return the result of expanding that invocation as
a null-terminated string, and set *LEXPTR to the next character
after the invocation. The result is completely expanded; it
contains no further macro invocations.
Otherwise, if *LEXPTR does not start with a macro invocation,
return zero, and leave *LEXPTR unchanged.
Use LOOKUP_FUNC and LOOKUP_BATON to find macro definitions.
If this function returns a string, the caller is responsible for
freeing it, using xfree.
We need this expand-one-token-at-a-time interface in order to
accommodate GDB's C expression parser, which may not consume the
entire string. When the user enters a command like
(gdb) break *func+20 if x == 5
the parser is expected to consume `func+20', and then stop when it
sees the "if". But of course, "if" appearing in a character string
or as part of a larger identifier doesn't count. So you pretty
much have to do tokenization to find the end of the string that
needs to be macro-expanded. Our C/C++ tokenizer isn't really
designed to be called by anything but the yacc parser engine. */
char *macro_expand_next (char **lexptr,
                         macro_lookup_ftype *lookup_func,
                         void *lookup_baton);
I changed GDB's lexer to call macro_expand_next before carving out
each token. This means we don't have to worry about commas or `if's
inside macro invocations being mistaken for the terminating comma or
`if': the expander consumes them before the lexer ever sees them.
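To make the contract concrete, here is a toy sketch of the
"expand at most one invocation, then advance the pointer" shape. It
is not the code from my patch: toy_expand_next, table_lookup, and the
two-field struct macro_definition are made up for illustration, it
takes a const char ** rather than a char **, it handles only
object-like macros, and it does not rescan the replacement text for
further macros (the real function returns a fully expanded result).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a macro definition; object-like only.  */
struct macro_definition
{
  const char *name;
  const char *replacement;
};

typedef struct macro_definition *(macro_lookup_ftype) (const char *name,
                                                       void *baton);

/* Toy lookup function: BATON points to an array of definitions
   terminated by an entry whose name is NULL.  */
static struct macro_definition *
table_lookup (const char *name, void *baton)
{
  struct macro_definition *d;

  for (d = baton; d->name != NULL; d++)
    if (strcmp (d->name, name) == 0)
      return d;
  return NULL;
}

/* If *LEXPTR begins with the name of an object-like macro, return a
   malloc'd copy of its replacement text and advance *LEXPTR past the
   name.  Otherwise return NULL and leave *LEXPTR unchanged.  The
   caller frees the result with free.  */
static char *
toy_expand_next (const char **lexptr, macro_lookup_ftype *lookup_func,
                 void *lookup_baton)
{
  const char *p;
  size_t len = 0;
  char name[64];
  struct macro_definition *def;
  char *result;

  assert (lexptr != NULL && *lexptr != NULL && lookup_func != NULL);
  p = *lexptr;

  /* A macro invocation has to start with an identifier.  */
  if (!isalpha ((unsigned char) p[0]) && p[0] != '_')
    return NULL;
  while (isalnum ((unsigned char) p[len]) || p[len] == '_')
    len++;
  if (len >= sizeof name)
    return NULL;
  memcpy (name, p, len);
  name[len] = '\0';

  def = lookup_func (name, lookup_baton);
  if (def == NULL)
    return NULL;

  result = malloc (strlen (def->replacement) + 1);
  if (result == NULL)
    return NULL;
  strcpy (result, def->replacement);

  /* Only now, having succeeded, consume the invocation.  */
  *lexptr = p + len;
  return result;
}
```

A lexer built on this would call toy_expand_next at the start of
each token; when it gets a string back, it lexes that string before
going back to the remaining input at *LEXPTR.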
As far as I can tell, libcpp doesn't provide an analogous
token-by-token entry point.
Another way to deal with this would be to lex the command string
twice: once to find the `if' or comma, and then again to do the real
parsing, after macro-expanding each of the various expressions
properly. The only difficulty here is that GDB's lexer expects to
be called by a yacc-style parsing engine; it deposits tokens'
semantic values in yylval, etc. To work around this, we'd need to
make the lexer independent of yacc --- give it some other way to
return semantic values, mostly --- and hook that into both yacc and
the code looking for `if's and commas. But that approach wouldn't
require any change to libcpp's interface.
There's nothing too hard there. But I wanted to put together a
patch which actually worked, while disturbing the existing GDB code
as little as possible. And I think there's something unsatisfying
about the two-pass approach; parsers ought to be able to leave input
unconsumed if they want. It's a common enough idiom. Shouldn't
libcpp support it?
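For comparison, the first pass of the two-pass approach could look
roughly like the sketch below. It is a standalone scanner, not GDB's
lexer, and find_if_token is a name I made up. It only knows enough
about C tokens to skip string and character literals, numbers, and
identifiers, and it does no macro expansion at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int
is_ident_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Return a pointer to the first `if' in S that is a complete
   identifier token: not inside a string or character literal, and
   not part of a longer identifier or number.  Return NULL if there
   is no such token.  */
static const char *
find_if_token (const char *s)
{
  assert (s != NULL);

  while (*s != '\0')
    {
      if (*s == '"' || *s == '\'')
        {
          /* Skip a string or character literal, honoring backslash
             escapes.  An unterminated literal runs to end of input.  */
          char quote = *s++;

          while (*s != '\0' && *s != quote)
            {
              if (*s == '\\' && s[1] != '\0')
                s++;
              s++;
            }
          if (*s == quote)
            s++;
        }
      else if (isalpha ((unsigned char) *s) || *s == '_')
        {
          /* Scan a whole identifier, then see whether it is `if'.  */
          const char *start = s;

          while (is_ident_char (*s))
            s++;
          if (s - start == 2 && strncmp (start, "if", 2) == 0)
            return start;
        }
      else if (isdigit ((unsigned char) *s))
        {
          /* Skip a number along with any letters glued onto it.  */
          while (is_ident_char (*s) || *s == '.')
            s++;
        }
      else
        s++;
    }
  return NULL;
}
```

The caller would macro-expand and parse the text before the returned
pointer as ADDRESS, and the text after the `if' as CONDITION, each
in its own scope.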
- GDB's macro data structures record all the macros that were ever
#defined in a compilation unit, and the line numbers at which they
were in force. Given a name and an #inclusion and a line number (or
in libcpp's terminology, a logical line number?), it can find the
#definition in scope at that point.
This is a bit different from libcpp's data structures, which only
record the macros currently in force as libcpp makes a pass through
the file's text. (At least, that's the impression I got.)
My macro expander is completely ignorant of the lookup table's
structure; you pass it a function and a data pointer that it uses
blindly for lookups. Here's the relevant typedef, and one of the
prototypes, from the expander's public interface:
/* A function for looking up preprocessor macro definitions. Return
the preprocessor definition of NAME in scope according to BATON, or
zero if NAME is not defined as a preprocessor macro.
The caller must not free or modify the definition returned. It is
probably unwise for the caller to hold pointers to it for very
long; it probably lives in some objfile's obstacks. */
typedef struct macro_definition *(macro_lookup_ftype) (const char *name,
                                                       void *baton);
/* Expand any preprocessor macros in SOURCE, and return the expanded
text. Use LOOKUP_FUNC and LOOKUP_FUNC_BATON to find identifiers'
preprocessor definitions. SOURCE is a null-terminated string. The
result is a null-terminated string, allocated using xmalloc; it is
the caller's responsibility to free it. */
char *macro_expand (const char *source,
                    macro_lookup_ftype *lookup_func,
                    void *lookup_func_baton);
When expanding an expression, GDB packages up the #inclusion and
line number in the baton argument, and provides a lookup_func that
takes those together with the macro name to search the macro table.
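As a rough illustration of the baton idea, here is a self-contained
sketch. The table layout is invented for the example and is not
GDB's actual macro table: struct macro_entry, struct scope_baton,
and scope_lookup are hypothetical names, and each entry simply
records the file and the range of lines over which one definition
was in force.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a macro definition.  */
struct macro_definition
{
  const char *replacement;
};

typedef struct macro_definition *(macro_lookup_ftype) (const char *name,
                                                       void *baton);

/* Hypothetical table entry: NAME had definition DEF in FILE from
   START_LINE (inclusive) to END_LINE (exclusive).  An END_LINE of
   zero means the definition stayed in force to the end of FILE.  */
struct macro_entry
{
  const char *name;
  const char *file;
  int start_line;
  int end_line;
  struct macro_definition def;
};

/* Hypothetical baton: the table to search, plus the #inclusion and
   line number whose scope the lookup should use.  */
struct scope_baton
{
  struct macro_entry *table;
  size_t count;
  const char *file;
  int line;
};

/* A macro_lookup_ftype function.  Return the definition of NAME in
   force at the file and line recorded in BATON, or NULL if NAME is
   not defined as a macro there.  */
static struct macro_definition *
scope_lookup (const char *name, void *baton)
{
  struct scope_baton *scope = baton;
  size_t i;

  assert (scope != NULL);

  for (i = 0; i < scope->count; i++)
    {
      struct macro_entry *e = &scope->table[i];

      if (strcmp (e->name, name) == 0
          && strcmp (e->file, scope->file) == 0
          && e->start_line <= scope->line
          && (e->end_line == 0 || scope->line < e->end_line))
        return &e->def;
    }
  return NULL;
}
```

The expander never looks inside the baton; it just hands it back to
the lookup function on every call, so the same expander works for
any file and line the caller chooses.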