Symbol handling issues and improvements
This page describes the issues GDB has with symbol handling, and the improvements we're thinking of making. For the purposes of this page "symbol handling" is a catch-all that incorporates all things related to symbols and debug information.
Getting the code / Helping
Discussions are held on the main GDB mailing lists. Patches should be posted to the firstname.lastname@example.org mailing list. Work is being committed directly to the mainline (i.e., there's no special feature branch).
For testing, run the testsuite using your desktop o/s of choice, and make sure there are no regressions. amd64-linux and i386-linux are generally important platforms to not break.
For now this is just a raw list of unordered issues, "to get things down on paper". It is certainly an incomplete list.
Memory usage is a real problem, with multiple facets.
- Worst case is GDB will grow to use all available swap for very large programs
(here a "very large" program is roughly, say >=1G of debug info in the ELF binary).
Memory used to create both "partial symbols" and "full symbols" can probably be improved on, as can minsyms. That's potentially three copies. Symbols can be shrunk a bit by better packing (see http://sourceware.org/ml/gdb-patches/2009-11/msg00119.html). Also, the obj_section field is redundant and can be removed, saving a word per symbol; see the archer-tromey-remove-obj_section branch.
Storing minsyms involves building their own copy of the demangled form (this is related to PR 12707, see the patch submission).
- Tab-completion on symbols can use excessive amounts of memory. For example, do we need to (prematurely) expand symtabs for C++ parameters? Most of the excessive memory usage affects speed of course too.
- The full name expansion (and canonicalization) that the DWARF reader does costs both memory and CPU.
BFD waste. E.g., http://sourceware.org/bugzilla/show_bug.cgi?id=14108
- obstack alignment. On amd64 the alignment is 16 bytes because of SSE. However, gdb generally doesn't need that much alignment. Being able to reduce it to 8 bytes saves a measurable amount of memory.
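As a rough illustration of the obstack alignment point: every allocation is rounded up to the alignment boundary, so smaller alignment means less padding. The sketch below only does the arithmetic. The allocation sizes are invented for illustration and are not measurements taken from gdb.

```python
def aligned_size(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)

# Hypothetical allocation sizes in bytes, chosen only for illustration.
sizes = [20, 24, 40, 56, 72]

total_16 = sum(aligned_size(s, 16) for s in sizes)
total_8 = sum(aligned_size(s, 8) for s in sizes)
print(total_16, total_8)
```

How much is saved in practice depends entirely on the real mix of allocation sizes.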
Speed is another real problem, with multiple facets.
- The ".gdb_index" section greatly improves gdb startup time. For large programs the time to read "minimal symbols" (the ELF symbol table) now dominates and takes enough time to be worth considering improving on.
- The handling of "partial symbols" versus "full symbols" is a source of slowness (and memory usage and complexity). When not using gdb_index, during startup gdb reads the debug info as quickly as possible to create an initial set of symbol tables ("partial symbols"). Then later when the symbol is actually needed gdb reads the debug info again creating symbol tables that gdb ultimately uses ("full symbols").
- Full CU expansion is excessive work. Whether we use gdb_index or not, when we create the "full symbols" we expand the entire CU (DWARF). It should be possible to improve on this.
- Tab-completion on symbols has been really slow in the past, and is still not as fast as it could/should be.
See, e.g., the end of http://sourceware.org/bugzilla/show_bug.cgi?id=13498 for an example. Also, even if tab-completion is blazingly fast, dumping 1000s of symbols in the output isn't always what the user wants.
- Symbol lookup is sometimes less efficient than it could be. [This is apart from debug info reading.] For example, the code may try finding a symbol in the static (or global) block list even though it knows the symbol "should" be on the other list. But it tries it anyway "just in case". In large apps this can be painful. It would be better to get it right. Another example is lookup_symbol_aux_objfile (circa December 2012). It pre-expands every symtab matching the symbol, but then the subsequent loop just returns the first one it finds. For static and global symbols this is a waste (one needs to be careful with things like -fshort-double where "double" can be different in different files, but lookup_symbol_aux_objfile can't handle that anyway).
- Rerunning a program shouldn't require rereading debug info for shared libs that haven't changed.
For large apps (say, >1000 shared libs, but even for less) it's unnecessarily painful.
- Single-stepping can be excessively slow. In one profile run, find_pc_sect_psymtab is the main culprit (this is w/o .gdb_index). It is called an inordinate number of times for each step (and an inordinate number of times for the same pc value - maybe some caching will help).
For single-stepping through dynsym resolving code, see the PR at http://sourceware.org/bugzilla/show_bug.cgi?id=10952 for details. The bug turns out to be due to a missing glibc resolver, but the data collected shows some inefficiencies here.
- Watch, carefully, all that GDB does to lookup "int" in things like "watch -l *(int*) $rsp" or "py print gdb.lookup_type("int")" in a large C++ program (with many shared libs, with and without .gdb_index).
- Two calls to lookup_symbol_global ("int"). They may be (relatively) fast (though in large apps, less so), but it's clumsy.
- "int" is in STATIC_BLOCK, but GDB searches GLOBAL_BLOCK first. There's a comment that says we shouldn't *have* to try the other block, but that's not always true.
- When .gdb_index is in use, "int" matches so gdb will expand the symbol table, but the match doesn't take into account the block kind. So gdb will proceed to expand one symbol table from every objfile looking for "int" in GLOBAL_BLOCK, finding it, but not using it.
Only after that is done will GDB try STATIC_BLOCK. In a large app (say >1000 shared libs) this gets painful. A similar excessive expansion can happen with "break foo::bar::baz". [This is obviously also a memory issue.]
- There can be way more TUs (DWARF Type Units) than CUs (DWARF Compilation Units). E.g. 200K vs 8K. The current way TUs are handled can be slow.
- Having headers in the same symtab/psymtab lists as "primary" symtabs often means a lot of iteration for nothing.
- On some systems with NFS-like file systems (overlayfs and whatnot), reading from disk can be slower than it otherwise could be. E.g., are there potential wins from being able to tune prefetch options, with flexibility provided by exporting to Python somehow?
- When printing the type of a symbol, the struct type it came from is discarded and we pass plain text to the lookup routines (e.g., during canonicalization). Is this necessary? We lose all the context of where the type came from (for example), and are in essence starting over from scratch.
- A canonical way gdb does symbol lookup is to expand all "matching symtabs", and then do a search over all symtabs. E.g., linespec.c:iterate_over_all_matching_symtabs (circa February 2013). Why not collect a list of matching symtabs and only search those? Another example of a clumsy API successfully hiding performance issues?
- When setting a breakpoint on namespace::class::method (or just class::method), we first lookup class (though we do it twice: once in VAR_DOMAIN and once in STRUCT_DOMAIN, ref: linespec.c:lookup_prefix_sym circa February 2013). The lookup uses expand_symtabs_matching which iterates over all symbol table slots (in the case of .gdb_index). There's no need for this generality here since we're looking up a specific name, and thus should be able to hash the name and quickly find it in the index's symtab. Large apps can have 4M symtab slots (or more). Another example of a clumsy API successfully hiding performance issues?
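The last item above contrasts visiting every symbol table slot with hashing a specific name. A minimal sketch of the difference follows, assuming an open-addressed table with linear probing. The hash function, table size, and helper names are all hypothetical; this is not the hash or layout that .gdb_index uses.

```python
def toy_hash(name, nslots):
    """Stand-in hash; not the hash .gdb_index actually uses."""
    h = 0
    for ch in name:
        h = (h * 67 + ord(ch)) & 0xFFFFFFFF
    return h % nslots

def build_table(names, nslots):
    """Open-addressed table with linear probing; nslots must exceed len(names)."""
    slots = [None] * nslots
    for name in names:
        i = toy_hash(name, nslots)
        while slots[i] is not None:
            i = (i + 1) % nslots
        slots[i] = name
    return slots

def scan_lookup(slots, name):
    """Visit every slot, as a general matcher must. Returns (found, visited)."""
    found = False
    for entry in slots:
        if entry == name:
            found = True
    return found, len(slots)

def hashed_lookup(slots, name):
    """Probe from the hashed slot until the name or an empty slot turns up."""
    i = toy_hash(name, len(slots))
    visited = 0
    while slots[i] is not None:
        visited += 1
        if slots[i] == name:
            return True, visited
        i = (i + 1) % len(slots)
    return False, visited + 1

names = ["foo%d" % n for n in range(100)]
slots = build_table(names, 256)
print(scan_lookup(slots, "foo42"), hashed_lookup(slots, "foo42"))
```

The scan always touches all slots, while the hashed probe stops at the first match or empty slot. With millions of slots, that gap is the cost the item describes.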
This section is a random collection of known bugs.
"info var" doesn't find LOC_UNRESOLVED var: http://sourceware.org/bugzilla/show_bug.cgi?id=14025
- bfd caches files, and can close and reopen them behind gdb's back.
If the file has changed in the interim this can lead to incorrect behaviour: http://sourceware.org/bugzilla/show_bug.cgi?id=14202
'info variable' and 'info functions' very slow and memory consuming: http://sourceware.org/bugzilla/show_bug.cgi?id=13511
- gdb's handling of files compiled with a mix of things like with/without -fshort-double is broken. If double isn't defined by the current CU gdb will pick the first it finds, which will return randomly 4 or 8 for sizeof(double). gdb should first look in the current CU and if not found there try its builtin types list (and then continue as before if the symbol is not a builtin type).
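The lookup order proposed in the -fshort-double item can be sketched as follows. This is a toy model only: CUs are plain dictionaries mapping type names to sizes, and the builtin table and function name are hypothetical stand-ins rather than gdb code.

```python
# Hypothetical builtin type sizes, for illustration only.
BUILTIN_TYPES = {"double": 8, "int": 4}

def lookup_type_size(name, current_cu, other_cus):
    """Return sizeof(name): current CU first, then builtins, then other CUs."""
    if name in current_cu:
        return current_cu[name]
    if name in BUILTIN_TYPES:
        return BUILTIN_TYPES[name]
    for cu in other_cus:
        if name in cu:
            return cu[name]
    return None

short_cu = {"double": 4}    # models a file built with -fshort-double
normal_cu = {"double": 8}   # models a file built without it

print(lookup_type_size("double", short_cu, [normal_cu]))
print(lookup_type_size("double", {}, [short_cu, normal_cu]))
```

With this order the answer no longer depends on which other CU happens to be found first.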
This section is a random collection of annoyances that don't fit anywhere else (yet).
- The error message "warning: (Internal error: pc 0x19 in read in psymtab, but not in symtab.)" often appears, is generally useless to the user, and often ignorable. (I haven't seen this in a long, long time. It indicates a bug in the psymtab reader, so a reproducer would be very helpful.)
- GDB doesn't warn when the debug info it is using doesn't match the binary (plus possible core) being debugged. In practice it can be less of a problem with the main binary and more of a problem with the shared libs being used. This leaves the user with a false sense of confidence in what gdb prints, e.g., in backtraces, and frustration trying to figure out what is wrong.
- Lazy expansion can cause gdb to change its behaviour, based on what commands the user types and in what sequence. This shouldn't happen, so as we make things more lazy we should take care to catch and minimize the frequency of these kinds of bugs.
- When looking up linespecs, say to set a breakpoint, I (dje) have seen GDB throw away information it already has (obj_section?) only to go look it up again. It mightn't always slow things down (though for long operations (info func regex?) it may be a problem), but such clumsiness makes the code harder to understand/maintain.
- Symbol lookup, besides sometimes being slow, is just clumsy and in need of some clean up. There needs to be a cleaner API that the implementation (e.g. psyms) hides behind. Language dependencies are strewn throughout. The global "block_found" symbol, and is_a_field_of_this are all annoying. It would be much cleaner if the symtab API just concerned itself with the structure of the symbol tables and left all language-specific lookup rules to the language code.
- minsyms::filename seems barely useful. It is only used by stabs; it would be better if only stabs users paid for this.
- The DWARF reader currently stores demangled syms in the mangled entry of the symbol struct, and leaves the demangled entry as NULL. One thought is to go back to storing both. (There's a patch for this.)
- GDB records runtime offsets in symbol locations. This prevents symbols from being shared across inferiors. There is some ongoing work in this area, but it is a long process.
- One can print a specific case of a variable used in multiple locations with "print filename::varname". It would be useful to also be able to do "print objfile::varname" and "print objfile::filename::varname".
- Some types live in VAR_DOMAIN. Functions live in VAR_DOMAIN. VAR_DOMAIN covers so much that as a tool for narrowing down the search, it's not very useful. XXX_DOMAIN is a historical C artifact. Is there something better for a multi-language world? There is also the symbol_matches_domain() hack to make, e.g., c++ classes appear in STRUCT_DOMAIN and VAR_DOMAIN.
- Calling psymtab_search_name in lookup_partial_symbol is clumsy. [Gets repetitively done for each psymtab.]
- The handling of include files as non-primary symtabs is clumsy.
- Complaints in debug info readers are generally ignored.
- Complaints and errors from the DWARF reader should generally mention at least the objfile name and the DIE offset. Currently, if you see the message, it is still a bit of work to track down the problem. There is at least one PR open about this.
- Errors when reading debug info could be handled more gracefully (i.e., not abort loading of the file). (This was partly addressed by the PR 14931 fix.)
- The strcmp_iw function is a bit of a wart. A symbol table redesign (e.g., hierarchical) could allow removing it.
- check_typedef is a constant source of pain. Maybe a necessary evil, but IWBN to see if there's a better way. Plus, it doesn't just do typedef dereferencing, it also handles opaque type lookup (IIRC - this one was added much later after looking into it). Handling opaque type lookup isn't bad, per se, but it's not expected given the name "check_typedef".
These issues may not be directly related to symbol/debug info handling, but they're tangentially related, and so documented here.
- Linespecs have a few problems.
- "break foo:bar": Is "foo" a C source file (gcc -x c foo) or a function?
- "break foo" may currently resolve to the main binary, and is the intuitive way to specify that. But gdb will try setting that breakpoint on each shared library it opens as well. [This can tie in with "final" breakpoints.]
- Separate debug file objfiles are kept in the same list as the "real" objfile.
This section describes some ideas we have. They're just ideas, not anything even remotely cast in concrete.
Lazier CU reading
When we need full symbols, we expand the entire CU that contains the thing we need. We could be smarter and only expand the part we need (or some small but useful superset if it simplifies the implementation at reasonable cost).
Lazier type expansion
Expanding TU's to resolve DW_FORM_ref_sig8 could be done lazily. This could be extended to all types.
Smarter TU reading
In large apps there can be way more TUs than CUs (e.g., 200K vs 8K). Since TUs often share abbrev tables, we could sort TUs by the abbrev table they use and thus greatly reduce time spent reading abbrev tables (which shows up high in profiles of gdb startup). In the 200K vs 8K example, the number of TU abbrev tables is ~8K.
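To make the sorting idea concrete, here is a small model that counts abbrev table reads when the reader keeps only the most recently read table. That single-entry assumption, the function name, and all offsets are invented for illustration; the real DWARF reader may behave differently.

```python
def count_abbrev_reads(tus):
    """Count abbrev table reads when only the latest table is kept around."""
    reads = 0
    current = None
    for _tu_offset, abbrev_offset in tus:
        if abbrev_offset != current:
            reads += 1
            current = abbrev_offset
    return reads

# (tu_offset, abbrev_offset) pairs; the values are made up.
tus = [(0, 100), (40, 200), (80, 100), (120, 200), (160, 100), (200, 300)]

unsorted_reads = count_abbrev_reads(tus)
sorted_reads = count_abbrev_reads(sorted(tus, key=lambda tu: tu[1]))
print(unsorted_reads, sorted_reads)
```

After sorting, the number of reads equals the number of distinct abbrev tables, which is the reduction the paragraph above is after.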
In addition to smarter reading, storing source file information better for TUs would be good as they typically share the same info.
One thing to try is sharing TUs across objfiles.
Hierarchical Symbol Tables
Currently symbol files are source file based. For larger programs this breaks down because, for example, classes and namespaces can be spread out over several files, and it's rather clumsy, for example, to go looking through every source file for elements of a particular class.
Hierarchical symbol tables can also help with lazier CU reading. E.g., we can skip all the children of namespace and class DIEs until we know we need them.
Another thought is that this would let us defer the full name expansion (+ canonicalization) that we do now in the DWARF reader.
Not necessarily tied to Hierarchical Symbol Tables, but supporting name expansion on demand would allow us to do things like choose whether to print typedef'd names or the underlying type, and whether to omit defaulted template parameters. Tab completion could also take advantage of this (e.g. to avoid symtab expansion).
Some *very* rough timings I (dje) have done suggest we could bring GDB startup time down from 31sec to 15sec in one example large app (200K TUs, 8K CUs, 1G of debug info). 6sec of that is minsym reading btw, so for debug info it's 25sec -> 9sec. I think some other improvements could reduce that number by a few more seconds, but still not what .gdb_index provides.
Combine partial symbols into full symbols
Instead of building partial symbols, and then in turn building full symbols from them, build full symbols to begin with, but just lazily fill out the details.
The details of the combined form aren't spec'd out. The point is to take the best of both, and combine them into one symbol.
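Since the combined form isn't spec'd out, the following is only one possible shape: a symbol that records its name and where its DIE lives, and fills in the rest on first use. The class, its fields, and the fake reader are all hypothetical.

```python
class LazySymbol:
    """A symbol whose cheap fields are set up front and details read on demand."""

    def __init__(self, name, die_offset, read_details):
        self.name = name
        self.die_offset = die_offset
        self._read_details = read_details  # callable: die_offset -> dict
        self._details = None

    @property
    def details(self):
        # Read the expensive part once, the first time anyone asks for it.
        if self._details is None:
            self._details = self._read_details(self.die_offset)
        return self._details

calls = []

def fake_reader(die_offset):
    """Stand-in for parsing a DIE; records each call so laziness is visible."""
    calls.append(die_offset)
    return {"type": "int", "line": 12}

sym = LazySymbol("counter", 0x40, fake_reader)
print(sym.name, calls)        # nothing has been read yet
print(sym.details["type"], calls)
```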
Do debug info reading in a separate thread
A lot of the information needed from the debug info (including minsyms from the ELF symbol table) isn't needed right away. It might speed up gdb's startup and response times if such reading were done in the background.
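A minimal sketch of the background-reading idea, using a worker thread and a future. The reader function and its symbol table are made up; the point is only that startup returns immediately and a lookup blocks just when the data isn't ready yet.

```python
from concurrent.futures import ThreadPoolExecutor

def read_minsyms(path):
    """Stand-in for a slow reader; returns a made-up symbol table."""
    return {"main": 0x1000, "helper": 0x1040}

class BackgroundMinsyms:
    def __init__(self, path):
        # Start reading right away, but don't wait for the result.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future = self._executor.submit(read_minsyms, path)

    def lookup(self, name):
        """Block only if the background read has not finished yet."""
        table = self._future.result()
        return table.get(name)

    def close(self):
        self._executor.shutdown(wait=True)

minsyms = BackgroundMinsyms("a.out")
print(minsyms.lookup("main"))
minsyms.close()
```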
GDB generally only needs a small portion of all of the debug info. In a distributed build environment, it can make sense to leave all that info where it is, instead of (via various means) copying it to the user's desktop. Tab completion could still be handled in the server, with only the results sent to GDB. Such a symbol server might even be useful locally if it turns out that reading/processing debug info in a separate thread makes sense (it mightn't be a separate process, it could be just a separate thread).
Is it reasonable to do the Symbol API in such a way that it can be exported to Python, and Python code could talk to the Symbol Server? That would provide some useful flexibility.
My thinking was that the symbol server would serve up a variant on DWARF. In response to a request for a type, or a variable, or a function, it would send back a custom-crafted DWARF CU that holds all the needed information. It could also annotate the DWARF with hash codes for all objects returned, so that gdb could keep a single instance of all returned entities (without needing a stateful session). Finally, the symbol server could use build ids so that it could unify common objects across all the objects it held, without presenting incorrect information to its clients.
Discard symtab expansions when memory is tight
While in general one might want to just let the o/s handle the paging, in worst-case situations it's not possible - all the swap is gone. And even for less than worst-case situations, it can be beneficial to just discard the expansions and reread the debug info when necessary. It's faster to throw something away than to write it to disk, plus debug info is relatively compact compared to its expansion. Whether it will ever be needed again, and how soon ... well, that's the tradeoff.
Maybe add some parameters to control it?
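One way to picture this is a bounded cache of expansions that drops the least recently used entry and re-expands on demand. The class below is a sketch under that assumption; max_entries stands in for whatever control parameter might be added, and a real policy would more likely react to memory pressure than to a fixed count.

```python
from collections import OrderedDict

class ExpansionCache:
    def __init__(self, expand, max_entries):
        self._expand = expand            # callable: cu_id -> expanded symtab
        self._max_entries = max_entries  # hypothetical tuning parameter
        self._cache = OrderedDict()
        self.expansions = 0              # how many times debug info was (re)read

    def get(self, cu_id):
        if cu_id in self._cache:
            self._cache.move_to_end(cu_id)
            return self._cache[cu_id]
        symtab = self._expand(cu_id)
        self.expansions += 1
        self._cache[cu_id] = symtab
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # discard the least recently used
        return symtab

cache = ExpansionCache(lambda cu_id: "symtab-%d" % cu_id, max_entries=2)
for cu_id in (1, 2, 1, 3, 2):
    cache.get(cu_id)
print(cache.expansions)
```

The tradeoff from the text shows up in the counter: CU 2 is discarded to make room and has to be expanded a second time.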
[AIUI] "final" breakpoints have their location assignment finalized and so when reading new shared libs, and more importantly when re-running a program, there is no need to do a general search for new locations (which can be expensive).
Have a simple API around the (internalized form of the) debug info (and ELF symtab, minsyms - and yes, when we say ELF symtab we also mean all the other file formats ...), and then have the languages build their semantics on top of that.
Can we do without minsyms and just use BFD's symbols instead? (It is tempting, but BFD symbols are often larger than minimal symbols.) Or, similarly, could we bypass BFD entirely and just refer directly to the relevant ELF sections, interpreting on demand? Another alternative is to add them to .gdb_index. Another alternative is to read them in the background, apart from reading debug info.
Once reading of debug info is sped up, like with .gdb_index, reading minsyms dominates (e.g. 6 of 7 seconds spent in gdb start up is spent reading minsyms in one example).
Lots of operations (e.g., setting a breakpoint) involve searching minsyms in addition to the debug info. If the function is described in the debug info, searching minsyms is unnecessary extra effort. To what extent could we have a flag that turns minsyms off (maybe modulo the few places that do need them), and an option to turn them back on as desired?
Standardize a .gdb_index workalike
The LLVM project is working on something very similar to .gdb_index:
They're open to enhancing it where it makes sense. Do we want to replace .gdb_index with that? Is it worth trying to get something like this into the DWARF Standard?
Do not cache symbols
I wonder why we should cache symbols, at all. With caching, I mean any form of symbol object that duplicates data from the debug information. I would instead leave all the data where it is and just use "dwarf pointers" to the data.
A dwarf pointer is essentially either a file or section offset; whereas the former should really be 64bit, we should get away with 32bit section offsets. The base (i.e. the debug information file or section) should always be clear from the context, so we don't waste another 64bit for the pointer. We would, of course, mmap() the respective file or section.
When necessary, the DIE pointed to by the dwarf pointer will be parsed into a temporary symbol object. This object will be destroyed once it is no longer needed. This requires frequent re-parsing. On the other hand, since we're only parsing a single DIE each time, the overhead should be negligible.
A simple pointer won't suffice for lookups, since it would require too much and too frequent re-parsing. But we should be able to extend it in a low-memory-overhead way using the same technique. Instead of copying data, I would again use offsets, this time from the DIE. The name, for example, can be a 16bit offset from the DIE to its DW_AT_name (where -1 means not present); same for DW_AT_high_pc and DW_AT_low_pc or DW_AT_ranges. We would need a small type enum to select between alternative representations (e.g. high/low pc vs. ranges, or direct string vs. pointer). A symbol like this would only take 12 bytes.
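One layout that reaches 12 bytes is sketched below: a 32-bit section offset to the DIE, three signed 16-bit offsets relative to the DIE, and a 16-bit selector for the representation. The exact field order and the width of the selector are assumptions made for this example, not something the proposal pins down.

```python
import struct

# Hypothetical layout: 32-bit DIE section offset, three signed 16-bit
# offsets from the DIE (-1 = attribute absent), and a 16-bit kind selector.
SYMBOL_FORMAT = "<IhhhH"

KIND_HIGH_LOW_PC = 0
KIND_RANGES = 1

def pack_symbol(die_offset, name_off, low_pc_off, high_or_ranges_off, kind):
    return struct.pack(SYMBOL_FORMAT, die_offset, name_off, low_pc_off,
                       high_or_ranges_off, kind)

def unpack_symbol(blob):
    return struct.unpack(SYMBOL_FORMAT, blob)

packed = pack_symbol(0x1234, 2, 6, 14, KIND_HIGH_LOW_PC)
print(len(packed), struct.calcsize(SYMBOL_FORMAT))
```

A 16-bit offset limits how far an attribute can sit from the start of its DIE, so very large DIEs would need a fallback.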
Lazy .gdb_index generation
When debugging programs without .gdb_index, GDB could write separate files containing .gdb_index in the background.
One would want to record (or copy over) a build id to help with versioning problems.
One can imagine a central repository for shared shared-libraries (e.g., system libraries). GDB could look for .gdb_index files in the directories in debug-file-directory. The user will need to be able to specify where to put new files. A default could be ~/.gdb_index. Heh.
Direct expansion of psymtabs
Currently expansion from psymtabs to symtabs is done by scanning the DWARF a second time. This is inefficient, and also leads to bugs when the two readers get out of sync. This can be fixed by instantiating symtabs directly from psymtabs. This idea requires lazy CU and type expansion in order to work properly.
First, we would record a pointer to the DIE with each partial symbol. However, due to the bcache, we would not want to record this directly in the partial symbol, but instead in a separate table. (If memory pressure is an issue here, we can arrange for symtab expansion to free this table.)
Then, expanding a symtab could be done entirely without referring to the DWARF data. In fact, with appropriate changes to struct symbol, we would not even have to copy any data -- we would simply create the symtab and populate it with partially-completed symbols; these symbols would point to their corresponding partial symbols. This would shrink the size of symbols created in this way. (I picture a union here; but really all symbols could be treated this way, with some work, perhaps leading to more memory savings due to increased use of the bcache.)
Lazy CU expansion would let us avoid reading the type DIEs until they were needed by some request. Similarly, we could avoid reading function bodies until needed.
This approach would not immediately help when the index was in use. Lazy CU expansion could still operate, though, letting us avoid some processing while instantiating the CU (I did an experiment where I had the DWARF reader skip function bodies, and this gave a 40% boost during CU expansion); and if necessary we could change the index to record the DIE information.
Split up symbol-based and line-number-based symtabs
At the moment, the symbol data and line-number data are kept together (in struct symtab), with an entry in the symtab list for every file, including every header, with entries for the same CU (DWARF-speak) sharing the same blockvector. This can massively increase the number of entries in the symtab list for large programs. Instead, maybe have separate tables: one for symbol based lookups and one for line-number based lookups. The win is that for symbol based lookups we don't need to skip over non-symtab symtabs (the non-primary ones), and that for line-number based lookups we could do something like have a table based on the file's basename, and only have to iterate over a much smaller set (the basenames_may_differ case would still have to be handled of course).
This may also provide a vehicle for speeding up debug-info reading, though with other improvements the need may not be as great. When doing symbol-based lookups we don't need to build the line table, and when doing line-number based lookups we don't need to read symbols. In practice, there are times when we need both anyway, so that's another reason speeding up this aspect of debug-info reading may not be needed.
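The basename-keyed table for line-number lookups might look roughly like this. The function names and paths are invented, the matching rule is a simplification, and the basenames_may_differ case is not handled.

```python
import posixpath
from collections import defaultdict

def build_basename_index(paths):
    """Map each basename to the full paths that have line tables."""
    index = defaultdict(list)
    for path in paths:
        index[posixpath.basename(path)].append(path)
    return index

def find_line_tables(index, filename):
    """Look only at entries sharing the basename, then match the path suffix."""
    candidates = index.get(posixpath.basename(filename), [])
    return [p for p in candidates
            if p == filename or p.endswith("/" + filename)]

# Made-up source paths for illustration.
paths = ["/src/a/util.c", "/src/b/util.c", "/src/a/main.c",
         "/usr/include/stdio.h"]
index = build_basename_index(paths)
print(find_line_tables(index, "util.c"))
print(find_line_tables(index, "b/util.c"))
```

A lookup for "util.c" only ever examines the two util.c entries, not the whole list of files.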