Adding a Source Language to GDB
This page is a high-level guide to adding support for a new source language to GDB. This is not too difficult, and one nice thing is that you can do the work in pieces, gradually adding more functionality.
Language definition
The first step is to add an entry for the new language to enum language in gdb/defs.h.
Next, make a new instance of struct language_defn (see gdb/language.h). The best approach is to create a new "-lang.c" file named after your language (e.g., ada-lang.c, c-lang.c), and to start the new language definition as a copy of the C definition, replacing only the first three elements (la_name, la_natural_name, and la_language). You will then refine the definition as you write more components.
Add an initialization function to your "-lang.c" file that registers the new language definition with add_language.
Edit init_filename_language_table in gdb/symfile.c to add any language extensions that should be associated with your new language.
At this point, you should be able to start gdb and use set language to change to your new language.
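The registration steps above can be pictured with a small standalone model. This is only a sketch: every toy_ name, the "mylang" language, and the ".myl" extension are hypothetical stand-ins, and the structures are far smaller than the real ones in gdb/defs.h, gdb/language.h, and gdb/symfile.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration only; these are NOT the real
   declarations from gdb/defs.h or gdb/language.h.  */
enum toy_language
{
  TOY_LANGUAGE_UNKNOWN,
  TOY_LANGUAGE_C,
  TOY_LANGUAGE_MYLANG,          /* the newly added enum entry */
  TOY_NR_LANGUAGES
};

struct toy_language_defn
{
  const char *la_name;          /* name accepted by "set language" */
  const char *la_natural_name;  /* human-readable name */
  enum toy_language la_language;
  int toy_case_sensitive;       /* hypothetical scalar field */
};

/* Registry filled in by toy_add_language.  */
const struct toy_language_defn *toy_languages[TOY_NR_LANGUAGES];

void
toy_add_language (const struct toy_language_defn *lang)
{
  assert (lang != NULL && lang->la_language < TOY_NR_LANGUAGES);
  toy_languages[lang->la_language] = lang;
}

/* Roughly what "set language NAME" would consult.  */
const struct toy_language_defn *
toy_find_language (const char *name)
{
  for (int i = 0; i < TOY_NR_LANGUAGES; i++)
    if (toy_languages[i] != NULL
        && strcmp (toy_languages[i]->la_name, name) == 0)
      return toy_languages[i];
  return NULL;
}

const struct toy_language_defn toy_c_language_defn =
  { "c", "C", TOY_LANGUAGE_C, 1 };

struct toy_language_defn toy_mylang_language_defn;

/* Start as a copy of the C definition, replace the first three
   elements, then register the result.  */
void
toy_initialize_mylang_language (void)
{
  toy_mylang_language_defn = toy_c_language_defn;
  toy_mylang_language_defn.la_name = "mylang";
  toy_mylang_language_defn.la_natural_name = "MyLang";
  toy_mylang_language_defn.la_language = TOY_LANGUAGE_MYLANG;
  toy_add_language (&toy_mylang_language_defn);
}

/* Model of a filename-extension table.  */
struct toy_filename_language
{
  const char *ext;
  enum toy_language lang;
};

const struct toy_filename_language toy_filename_language_table[] =
  { { ".c", TOY_LANGUAGE_C }, { ".myl", TOY_LANGUAGE_MYLANG } };

enum toy_language
toy_deduce_language_from_filename (const char *filename)
{
  const char *dot = strrchr (filename, '.');
  size_t n = sizeof toy_filename_language_table
             / sizeof toy_filename_language_table[0];

  if (dot == NULL)
    return TOY_LANGUAGE_UNKNOWN;
  for (size_t i = 0; i < n; i++)
    if (strcmp (dot, toy_filename_language_table[i].ext) == 0)
      return toy_filename_language_table[i].lang;
  return TOY_LANGUAGE_UNKNOWN;
}
```

After toy_initialize_mylang_language has run, toy_find_language ("mylang") returns the new definition, which is the model's equivalent of set language succeeding.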
Update the DWARF reader
Because most GDB targets use DWARF, this task should be considered an early must-do. Change dwarf2read.c:set_cu_language to translate the DWARF code for your language to the enum value you added in the previous step. You may need to edit include/dwarf2.h (which is canonically maintained in GCC) to add the new value. If your language doesn't have a language code yet, you can add DWARF producer sniffing in read_file_scope.
You may want to update the DWARF reader some more in a later step.
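As a rough illustration of the translation described above, here is a standalone sketch. It is not the code in dwarf2read.c: the toy_ names, the "mylangc" producer string, and the vendor-range language code are assumptions made up for the example. The DW_LANG_C, DW_LANG_C_plus_plus, and DW_LANG_lo_user values are the ones from the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for GDB's enum language.  */
enum toy_language
{
  TOY_LANGUAGE_MINIMAL,
  TOY_LANGUAGE_C,
  TOY_LANGUAGE_CPLUS,
  TOY_LANGUAGE_MYLANG
};

/* Values taken from the DWARF standard.  */
#define TOY_DW_LANG_C           0x0002
#define TOY_DW_LANG_C_plus_plus 0x0004
#define TOY_DW_LANG_lo_user     0x8000

/* Hypothetical code in the vendor range, for a language that has no
   official DW_LANG_ value yet.  */
#define TOY_DW_LANG_mylang      (TOY_DW_LANG_lo_user + 1)

/* Map a DW_AT_language value (plus the DW_AT_producer string, which
   may be NULL) to the debugger's own language enum.  */
enum toy_language
toy_language_from_dwarf (unsigned int dw_lang, const char *producer)
{
  /* DW_AT_language values fit in 16 bits (hi_user is 0xffff).  */
  assert (dw_lang <= 0xffff);

  /* Producer sniffing: a compiler without its own language code may
     claim to be C, so check the producer string first.  */
  if (producer != NULL && strstr (producer, "mylangc") != NULL)
    return TOY_LANGUAGE_MYLANG;

  switch (dw_lang)
    {
    case TOY_DW_LANG_C:
      return TOY_LANGUAGE_C;
    case TOY_DW_LANG_C_plus_plus:
      return TOY_LANGUAGE_CPLUS;
    case TOY_DW_LANG_mylang:
      return TOY_LANGUAGE_MYLANG;
    default:
      return TOY_LANGUAGE_MINIMAL;
    }
}
```

Checking the producer before the language code is a design choice of this sketch; it lets a compiler that still emits a C language code be recognized anyway.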
Next steps
There are many choices of what to do next. Many of them can be done in any order. This guide presents one possible sequence, leaving the most difficult tasks for last. Most of the remaining tasks involve implementing one or more methods from struct language_defn.
There are also some tasks that you may or may not have to do, depending on your language. These are covered in the very last section.
Correct scalar fields
struct language_defn has several scalar fields -- as opposed to function pointers, or pointers to other tables. Go through each of these and make sure that the value in your new definition is correct for the language you are implementing.
Add a character printer
Implement the la_printchar and la_emitchar methods.
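To give a feel for the division of labour between the two methods, here is a minimal standalone sketch. It assumes that the "emit" routine writes one character with whatever escaping the language needs, and that the "print" routine wraps that in quotes. The toy_ functions and their buffer-based signatures are hypothetical; the real methods write to a GDB output stream and deal with wide characters and encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write character C into BUF, escaping it if it equals QUOTER, is a
   backslash, or is not printable.  Returns what snprintf returns.  */
int
toy_emitchar (int c, int quoter, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  c &= 0xff;

  if (c == '\\' || (quoter != 0 && c == quoter))
    return snprintf (buf, size, "\\%c", c);
  if (c == '\n')
    return snprintf (buf, size, "\\n");
  if (c == '\t')
    return snprintf (buf, size, "\\t");
  if (isprint (c))
    return snprintf (buf, size, "%c", c);
  return snprintf (buf, size, "\\%03o", c);   /* octal escape */
}

/* Print C as a character literal, using single quotes.  */
int
toy_printchar (int c, char *buf, size_t size)
{
  char body[8];   /* longest escape is "\377" plus the terminator */

  assert (buf != NULL && size >= 3);
  toy_emitchar (c, '\'', body, sizeof body);
  return snprintf (buf, size, "'%s'", body);
}
```

A language with different literal syntax would change the quote character and the escape table, while keeping the same split between emitting and printing.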
Add a typedef printer
Implement the la_print_typedef method.
Add a type printer
Implement the la_print_type method. Ideally (for your users), this should be able to display any type that would be used in programs written in the new language. Because programs can be written in multiple languages, and because GDB doesn't record the language of a type, if your printer sees a type it doesn't recognize, it is usually best to delegate it to c_print_type.
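The delegation idea can be sketched in a few lines. The model below is standalone and hypothetical: toy_c_print_type merely stands in for c_print_type, the type representation is reduced to three fields, and TOY_TYPE_CODE_SLICE is an invented type code that only the new language understands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum toy_type_code
{
  TOY_TYPE_CODE_INT,
  TOY_TYPE_CODE_PTR,
  TOY_TYPE_CODE_SLICE   /* hypothetical, known only to "mylang" */
};

struct toy_type
{
  enum toy_type_code code;
  const char *name;               /* may be NULL */
  const struct toy_type *target;  /* pointee or element type */
};

/* Stand-in for the C type printer used as the fallback.  */
void
toy_c_print_type (const struct toy_type *type, char *buf, size_t size)
{
  if (type->code == TOY_TYPE_CODE_PTR && type->target != NULL)
    {
      toy_c_print_type (type->target, buf, size);
      strncat (buf, " *", size - strlen (buf) - 1);
    }
  else
    snprintf (buf, size, "%s", type->name != NULL ? type->name : "<unnamed>");
}

/* The new language's printer: handle what it recognizes, and hand
   everything else to the C printer.  */
void
toy_mylang_print_type (const struct toy_type *type, char *buf, size_t size)
{
  assert (type != NULL && buf != NULL && size > 0);

  if (type->code == TOY_TYPE_CODE_SLICE && type->target != NULL)
    {
      char elem[64];

      toy_mylang_print_type (type->target, elem, sizeof elem);
      snprintf (buf, size, "[]%s", elem);
    }
  else
    toy_c_print_type (type, buf, size);
}
```

In this model a slice of pointers to int prints as "[]int *": the outer part uses the new language's syntax and the inner part falls through to the C fallback.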
Add a val printer
Value printing is split into two phases -- value printing, which tries to print a struct value, and "val" printing, which essentially tries to print a value that has been decomposed into its constituent parts. Normally the generic value printer is fine, so you will probably only need to implement a val printer.
Many values can be printed nicely using generic_val_print, which can be customized to some degree using an instance of generic_val_print_decorations. However, the generic printer cannot handle every type code; TYPE_CODE_STRUCT is one example. Your printer should handle the ones it cannot.
There are a number of print options that your printer should handle, in order to integrate nicely into GDB. See struct value_print_options, and the manual, for details.
One question you should consider is which types should have special code in the val printer. One decent approach is to have GDB know how to print values that correspond to types that are specially treated (or known) by the compiler. Then, delegate the printing of other types to Python pretty-printers that are shipped with the standard library.
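One way to picture the arrangement described in this section is a language printer that first tries a generic routine, parameterized by a small decorations table, and only handles aggregates itself. Everything in the sketch below is hypothetical, including the toy_ names, the value representation, and the output syntax; it is not the signature of generic_val_print or the layout of generic_val_print_decorations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum toy_type_code { TOY_TYPE_CODE_INT, TOY_TYPE_CODE_BOOL, TOY_TYPE_CODE_STRUCT };

struct toy_value
{
  enum toy_type_code code;
  long scalar;                     /* used for INT and BOOL */
  const char **field_names;        /* used for STRUCT */
  const struct toy_value *fields;  /* used for STRUCT */
  int nfields;
};

/* Per-language spellings used by the generic printer.  */
struct toy_decorations
{
  const char *true_name;
  const char *false_name;
};

/* Generic printer: returns 1 if it handled VAL, 0 otherwise.  */
int
toy_generic_val_print (const struct toy_value *val,
                       const struct toy_decorations *decorations,
                       char *buf, size_t size)
{
  switch (val->code)
    {
    case TOY_TYPE_CODE_INT:
      snprintf (buf, size, "%ld", val->scalar);
      return 1;
    case TOY_TYPE_CODE_BOOL:
      snprintf (buf, size, "%s", val->scalar ? decorations->true_name
                                             : decorations->false_name);
      return 1;
    default:
      return 0;   /* e.g. structs are left to the language */
    }
}

/* Append TEXT to BUF without overflowing SIZE bytes.  */
static void
toy_append (char *buf, size_t size, const char *text)
{
  size_t len = strlen (buf);

  if (len + 1 < size)
    strncat (buf, text, size - len - 1);
}

const struct toy_decorations toy_mylang_decorations = { "True", "False" };

/* Language val printer: delegate scalars, print structs itself.  */
void
toy_mylang_val_print (const struct toy_value *val, char *buf, size_t size)
{
  assert (val != NULL && buf != NULL && size > 0);

  if (toy_generic_val_print (val, &toy_mylang_decorations, buf, size))
    return;

  buf[0] = '\0';
  toy_append (buf, size, "{");
  for (int i = 0; i < val->nfields; i++)
    {
      char field[64];

      toy_mylang_val_print (&val->fields[i], field, sizeof field);
      if (i > 0)
        toy_append (buf, size, ", ");
      toy_append (buf, size, val->field_names[i]);
      toy_append (buf, size, " = ");
      toy_append (buf, size, field);
    }
  toy_append (buf, size, "}");
}
```

The sketch ignores print options entirely; a real printer would consult struct value_print_options at each step.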
Implement symbol lookup
The language method la_lookup_symbol_nonlocal is used by GDB when searching for a name. In particular, GDB calls this method after searching the various function-local blocks (and after searching this, if you've defined la_name_of_this), and before searching file-scoped and global blocks. This provides a way for your language to handle more complex name lookup, such as searching any associated namespaces or module imports.
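As an illustration of the kind of work this hook can do, the standalone sketch below models a language with module imports: a name that was not found locally is tried under each imported module before falling back to a plain global lookup. The symbol table, the "::" separator, and all toy_ names are assumptions invented for the example, and the signature is not that of la_lookup_symbol_nonlocal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct toy_symbol
{
  const char *qualified_name;
  int value;
};

/* Stand-in for the file-scoped and global blocks.  */
const struct toy_symbol toy_global_symbols[] =
  {
    { "math::pi_times_100", 314 },
    { "io::stdout_fd", 1 },
    { "answer", 42 }
  };

const struct toy_symbol *
toy_lookup_global (const char *name)
{
  size_t n = sizeof toy_global_symbols / sizeof toy_global_symbols[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp (toy_global_symbols[i].qualified_name, name) == 0)
      return &toy_global_symbols[i];
  return NULL;
}

/* Called once the function-local blocks have been searched without
   success.  Try NAME inside each imported module, then globally.  */
const struct toy_symbol *
toy_mylang_lookup_symbol_nonlocal (const char *name,
                                   const char **imports, size_t n_imports)
{
  assert (name != NULL);

  for (size_t i = 0; i < n_imports; i++)
    {
      char qualified[128];
      int len = snprintf (qualified, sizeof qualified, "%s::%s",
                          imports[i], name);

      if (len < 0 || (size_t) len >= sizeof qualified)
        continue;   /* name too long for this sketch */

      const struct toy_symbol *sym = toy_lookup_global (qualified);
      if (sym != NULL)
        return sym;
    }
  return toy_lookup_global (name);
}
```

With imports "io" and "math" in scope, the unqualified name pi_times_100 resolves to math::pi_times_100 in this model, while the same name with no imports is not found.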
Write the documentation
Your language should have a node in the manual, near the other source language nodes. You should also write a NEWS entry.
Write tests
Porting the test suite can be difficult, depending on the specifics of your language. See gdb/testsuite/lib/future.exp for a good spot to add hooks for your language.
It's a good idea to run coverage tests while writing your test suite, to ensure your new code is sufficiently tested.
Add a demangler
If your language mangles symbol names, say to include type information, then you will want to teach GDB how to demangle these names. This has a few steps:
1. The demangler implementation itself should go in libiberty, alongside the other demanglers there. The demangler test suite lives there as well.
2. You should update c++filt to recognize the name of the newly-added demangling style as an argument to the --format flag.
3. Update the la_demangle field in your language definition to call the new demangler.
4. Update gdb/symtab.c:symbol_find_demangled_name to handle your language.
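For step 1, the shape of a demangler can be sketched with an invented mangling scheme. The scheme here is purely hypothetical: a "_M" prefix followed by length-prefixed components, so that "_M3foo3bar" stands for "foo.bar". The function returns a freshly allocated string, or NULL when the input is not in this style, which is the usual contract for demanglers; the name toy_mylang_demangle is not part of libiberty.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Demangle MANGLED under the hypothetical "_M<len><name>..." scheme.
   Returns a malloc'd string the caller must free, or NULL if MANGLED
   is not a valid name in this scheme.  */
char *
toy_mylang_demangle (const char *mangled)
{
  assert (mangled != NULL);

  if (strncmp (mangled, "_M", 2) != 0)
    return NULL;

  const char *p = mangled + 2;
  if (*p == '\0')
    return NULL;

  /* The output is never longer than the input.  */
  char *out = malloc (strlen (mangled) + 1);
  if (out == NULL)
    return NULL;

  size_t used = 0;
  while (*p != '\0')
    {
      if (!isdigit ((unsigned char) *p))
        {
          free (out);
          return NULL;
        }

      /* Read the decimal component length.  */
      size_t len = 0;
      while (isdigit ((unsigned char) *p))
        len = len * 10 + (size_t) (*p++ - '0');

      if (len == 0 || strlen (p) < len)
        {
          free (out);
          return NULL;
        }

      if (used > 0)
        out[used++] = '.';   /* component separator */
      memcpy (out + used, p, len);
      used += len;
      p += len;
    }

  out[used] = '\0';
  return out;
}
```

Returning NULL for anything that does not match matters in practice, because a debugger typically tries symbol names against several demangling styles.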
Create the expression parser
If you use a yacc-based parser, it should reside in a file named after your language, and ending in "-exp.y". Since we can't depend upon everyone having Bison, and yacc produces parsers that define a bunch of global names, GDB provides a header file, yy-remap.h, which can be used to rename symbols that might possibly conflict.
Routines for building parsed expressions into a union exp_element list are in parse.c.
Due to the way the GDB CLI works, expression parsers must follow a few rules in addition to those required by the source language:
- If the global comma_terminates is non-zero, then a top-level (i.e., unparenthesized) comma should be treated as an EOF. This is used for commands like printf.
- The word if should be treated as EOF. This lets watch EXPR if OTHER-EXPR and break *EXPR if OTHER-EXPR work.
- The words task and thread, or any abbreviation of them, should be treated as EOF if followed by an integer. So, for example task + 7 is part of an ordinary expression but task 98 should be considered to end the expression. This is also used by breakpoints.
- It should be possible to start a variable with $ so that convenience variables and the value history can be accessed.
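The termination rules above can be sketched as a small scanner that reports where an expression ends. This is a simplified standalone model, not code from any GDB parser: toy_expression_end is a hypothetical name, comma_terminates is passed as a parameter rather than read from a global, and string literals, brackets, and language-specific tokens are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the LEN characters at WORD are a non-empty prefix
   (abbreviation) of FULL.  */
static int
toy_is_prefix_of (const char *word, size_t len, const char *full)
{
  return len >= 1 && len <= strlen (full) && strncmp (word, full, len) == 0;
}

/* Return the offset in S at which the lexer should report EOF.  */
size_t
toy_expression_end (const char *s, int comma_terminates)
{
  assert (s != NULL);

  int depth = 0;   /* parenthesis nesting */
  size_t i = 0;

  while (s[i] != '\0')
    {
      unsigned char c = (unsigned char) s[i];

      if (c == '(')
        {
          depth++;
          i++;
        }
      else if (c == ')')
        {
          if (depth > 0)
            depth--;
          i++;
        }
      else if (c == ',' && depth == 0 && comma_terminates)
        return i;   /* top-level comma */
      else if (isalpha (c) || c == '_' || c == '$')
        {
          /* Scan a word; '$' lets convenience variables through.  */
          size_t start = i;
          while (isalnum ((unsigned char) s[i]) || s[i] == '_' || s[i] == '$')
            i++;
          size_t len = i - start;

          if (len == 2 && strncmp (s + start, "if", 2) == 0)
            return start;

          if (toy_is_prefix_of (s + start, len, "task")
              || toy_is_prefix_of (s + start, len, "thread"))
            {
              /* Only a terminator when an integer follows.  */
              size_t j = i;
              while (s[j] == ' ' || s[j] == '\t')
                j++;
              if (j > i && isdigit ((unsigned char) s[j]))
                return start;
            }
        }
      else
        i++;
    }
  return i;
}
```

In this model "task + 7" is consumed in full, while "x task 98" ends at offset 2, mirroring the example in the list.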
Other GDB expression extensions, such as the @ operator, can be supported if you like, and if they make sense for your language. There are a number of these of varying degrees of obscurity.
It's typical for a GDB expression parser to be able to parse either an expression or a type; this is what makes ptype TYPE work. Accepting both often introduces ambiguities that the language's grammar did not previously have (though this can be worked around with a special start state, followed by parsing the same token stream two different ways). Unfortunately there is no way for your parser to know whether it has been called from ptype or some other command.
The parser API has special support for field name completion. This support makes it so that pressing tab will narrow the list of completions to just members of an aggregate object. See mark_struct_expression and parse_completion.
Add any evaluation routines, if necessary
If you need new opcodes (that represent the operations of the language), add them to std-operator.def. Add support code for these operations in the evaluate_subexp function defined in the file eval.c. Add cases for new opcodes to prefixify_subexp, operator_length_standard, print_subexp_standard, and dump_subexp_body.
You can also make the new operators specific to your language, by writing local variants of these functions (that delegate most cases to the standard versions); and by adding a struct exp_descriptor to your language implementation. You can also override standard operators this way -- most commonly by redefining the semantics of a particular operator, but also even changing the layout in struct expression.
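The "local variant that delegates most cases" pattern can be sketched as follows. This is a standalone model with invented names: the toy_ opcodes are not entries from std-operator.def, TOY_BINOP_POWER is a hypothetical language-specific operator, and the flat toy_expr struct does not resemble the real layout of struct expression.

```c
#include <assert.h>
#include <stddef.h>

enum toy_opcode
{
  TOY_BINOP_ADD,
  TOY_BINOP_MUL,
  TOY_BINOP_POWER   /* hypothetical operator known only to "mylang" */
};

struct toy_expr
{
  enum toy_opcode op;
  long lhs;
  long rhs;
};

/* Stand-in for the standard evaluator.  Returns 1 and stores the
   result if it knows the opcode, 0 otherwise.  */
int
toy_evaluate_standard (const struct toy_expr *e, long *result)
{
  switch (e->op)
    {
    case TOY_BINOP_ADD:
      *result = e->lhs + e->rhs;
      return 1;
    case TOY_BINOP_MUL:
      *result = e->lhs * e->rhs;
      return 1;
    default:
      return 0;
    }
}

/* Language-specific evaluator: handle the extra opcode locally and
   delegate everything else to the standard version.  */
int
toy_mylang_evaluate (const struct toy_expr *e, long *result)
{
  assert (e != NULL && result != NULL);

  if (e->op == TOY_BINOP_POWER)
    {
      if (e->rhs < 0)
        return 0;   /* negative exponents not supported here */

      long acc = 1;
      for (long i = 0; i < e->rhs; i++)
        acc *= e->lhs;
      *result = acc;
      return 1;
    }
  return toy_evaluate_standard (e, result);
}
```

Overriding a standard operator works the same way in this model: add a case for it ahead of the call to toy_evaluate_standard.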
"Maybe" tasks
There are some tasks that you may or may not have to do, depending either on how your language works, or how close it is to some language that GDB already supports.
It's not unusual to have to modify the DWARF reader beyond merely adding support for a language tag.
- If your language uses a module hierarchy, you may want to encode this information into the symbol names generated by GDB. This requires modifying the DWARF reader.
- If your language supports types that aren't already available in GDB, you may need:
  - To modify the DWARF reader to add such types;
  - To add new GDB type codes to represent the types; or
  - To add a new enum type_specific_kind constant and update associated types (especially union type_specific) to encode information specific to your language.
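The last option amounts to extending a tagged union. The sketch below shows the general pattern only; the toy_ names, the is_interface flag, and the layout are hypothetical and much simpler than GDB's real enum type_specific_kind and union type_specific.

```c
#include <assert.h>
#include <stddef.h>

/* The tag: one constant per kind of language-specific payload.  */
enum toy_type_specific_kind
{
  TOY_TYPE_SPECIFIC_NONE,
  TOY_TYPE_SPECIFIC_CPLUS_STUFF,
  TOY_TYPE_SPECIFIC_MYLANG_STUFF   /* newly added for "mylang" */
};

/* Hypothetical per-type data needed only by the new language.  */
struct toy_mylang_stuff
{
  int is_interface;
};

struct toy_type
{
  const char *name;
  enum toy_type_specific_kind kind;
  union
  {
    void *cplus_stuff;                     /* placeholder */
    struct toy_mylang_stuff *mylang_stuff; /* valid for MYLANG_STUFF */
  } type_specific;
};

/* Accessors should check the tag before touching the union.  */
int
toy_type_is_interface (const struct toy_type *type)
{
  assert (type != NULL);
  return type->kind == TOY_TYPE_SPECIFIC_MYLANG_STUFF
         && type->type_specific.mylang_stuff != NULL
         && type->type_specific.mylang_stuff->is_interface;
}
```

Checking the tag in every accessor keeps types created by other languages' readers from being misinterpreted.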
It's possible your language may even require deeper changes to GDB. Whatever those might be, they are outside the scope of this document.