== libdw changes to exploit sharing better ==

There are various internal reorganizations of libdw data structures that we should do to better exploit the shared data we expect to have in DWARF files from the compression/writer work.

=== .debug_abbrev sharing ===

The plan for the writer is to emit just one big .debug_abbrev chunk that will be used by all CUs in a file. That is not a change of the format per se, and existing reader code will handle it fine. The CU header fields will all just refer to the same .debug_abbrev offset (i.e. all of them being 0 in a normal final-linked file).

The current libdw code will redundantly decode and internalize the abbrev details separately for each CU. We should change this. The ''abbrev_hash'' field and related fields should move from '''struct Dwarf_CU''' into a place that can be shared. This would be a new structure for the abbrev info. Then that would be stored in a central table that's keyed by the pointer into the .debug_abbrev section.

=== .debug_loc sharing ===

Similarly we now have a table of previously-decoded location expressions that is indexed by the pointer into the .debug_loc section. This is now rooted in the ''locs'' field of '''struct Dwarf_CU'''.

In the writer, there is no reason not to put all location lists from all CUs into one big pool and generate just one unified .debug_loc section. It's not very likely we'd get much extra sharing from this, since the CUs' addresses won't overlap. But perhaps we'd get some in a DwarfArchive or perhaps there are cases we haven't thought of within a single file.

We can move ''locs'' into someplace shared across CUs (and across files). It's already indexed by the pointer into the .debug_loc data, so nothing about how it's used would change.

=== .debug_line sharing ===

In the writer, it will make sense to unify some separate CUs' .debug_line chunks together.
To start with, each ''partial_unit'' will normally contain no code definitions, so no actual line program, but it will need a ''stmt_list'' pointer to a .debug_line chunk giving its file table. We can certainly consolidate all such file tables together into one that has the union of all the file names used in any CU that has no line program.

It might even make sense to unify all .debug_line chunks into a single one. No two CUs will have overlapping addresses, so it possibly makes sense to join all CUs' line tables together, thus sharing all file tables completely with no repetition of any file name. It's not clear that this is a net win, since consumers would then have to process the entire joined line program to populate their address<->line tables. That might not be worthwhile if a consumer is only going to look at a few CUs out of many. There are also ideas about a new overarching by-address index that could optimize lookup in a large single line program without playing out the whole thing. That all remains to be resolved.

But it's quite certain that for ''partial_unit'' .debug_line chunks with no line program, it is advantageous to join them together as a single unified file table. When that's done, libdw should avoid decoding the same table more than once. Similar to .debug_abbrev, we can move the ''lines'' and ''files'' pointers out of '''struct Dwarf_CU''' into someplace shared. That would be keyed by the pointer into .debug_line data that we get from ''DW_AT_stmt_list''.

=== inter-file sharing ===

The first obvious thing is to have these various pointer-indexed tables live in the '''Dwarf''' object. But that only gets them shared across multiple CUs in the same file. In the DwarfArchive plan, there is the possibility for sharing across files too. So it is better to define a new struct, e.g. '''dwarf_sharing_context'''. That would contain all these tables.
Each table is indexed by the actual pointer to the mapped file data, rather than by section or file offset values.

In normal single-file reading, there would be one '''dwarf_sharing_context''' object allocated per file and '''Dwarf''' can just point to that. With DwarfArchive, there would be one '''dwarf_sharing_context''' for the whole archive, and all the '''Dwarf''' objects for individual files would point to that same object.

=== libdwfl sharing ===

With the DwarfRelocatable ideas about revamping libdwfl, it could make sense to move from sharing one '''Dwarf''' object across many '''Dwfl_Module''' objects (probably in different '''Dwfl''' address spaces) to having one '''Dwarf''' object per '''Dwfl_Module'''. Then almost everything that is now in '''Dwarf''' should move to a new internal structure that is shared across all the different '''Dwarf''' objects referring to the same file.

All the uses that can refer to addresses would have to supply both the '''Dwarf''' pointer through which the access came as well as the pointers into the internal structures. Most of the DWARF resolution and caching stuff would be shared in those internal structures. But actually resolving addresses would refer to the '''Dwarf''' object in question to find its libdwfl hooks.
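
As a rough illustration of the '''dwarf_sharing_context''' idea, here is a minimal sketch of pointer-keyed tables shared by more than one reader object. Apart from the name '''dwarf_sharing_context''' itself, every identifier below (''share_table'', ''share_get'', ''dwarf_handle'', ''demo_decode'' and so on) is hypothetical and does not exist in libdw; the chained hash table is just one possible choice, and the decoder is a stand-in that counts how often it runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical illustrations, not existing libdw API.  */

#define SHARE_BUCKETS 64

/* One cached decoding, keyed by the address of the mapped section bytes.  */
struct share_entry
{
  const void *key;              /* Pointer into mapped section data.  */
  void *decoded;                /* Whatever the decoder produced.  */
  struct share_entry *next;
};

struct share_table
{
  struct share_entry *buckets[SHARE_BUCKETS];
};

/* The pointer-indexed tables, gathered in one shareable object.  */
struct dwarf_sharing_context
{
  struct share_table abbrevs;   /* Keyed by pointer into .debug_abbrev.  */
  struct share_table locs;      /* Keyed by pointer into .debug_loc.  */
  struct share_table lines;     /* Keyed by pointer into .debug_line.  */
};

/* Stand-in for the real Dwarf object: it only points at the context.  */
struct dwarf_handle
{
  struct dwarf_sharing_context *sharing;
};

typedef void *(*share_decoder) (const void *data);

/* Return the cached decoding for KEY, calling DECODE only on a miss.
   Returns NULL if DECODE fails or memory runs out.  */
void *
share_get (struct share_table *table, const void *key, share_decoder decode)
{
  assert (table != NULL && key != NULL && decode != NULL);
  size_t b = ((uintptr_t) key >> 2) % SHARE_BUCKETS;

  for (struct share_entry *e = table->buckets[b]; e != NULL; e = e->next)
    if (e->key == key)
      return e->decoded;

  struct share_entry *fresh = malloc (sizeof *fresh);
  if (fresh == NULL)
    return NULL;
  fresh->decoded = decode (key);
  if (fresh->decoded == NULL)
    {
      free (fresh);
      return NULL;
    }
  fresh->key = key;
  fresh->next = table->buckets[b];
  table->buckets[b] = fresh;
  return fresh->decoded;
}

/* Drop all entries.  The decoded objects are not freed in this sketch.  */
void
share_table_clear (struct share_table *table)
{
  for (size_t b = 0; b < SHARE_BUCKETS; ++b)
    {
      struct share_entry *e = table->buckets[b];
      while (e != NULL)
        {
          struct share_entry *next = e->next;
          free (e);
          e = next;
        }
      table->buckets[b] = NULL;
    }
}

/* Demo decoder (assumption): counts calls and returns the input pointer
   itself as the "decoded" form.  */
unsigned int demo_decode_calls;

void *
demo_decode (const void *data)
{
  ++demo_decode_calls;
  return (void *) data;
}
```

In single-file reading, each ''dwarf_handle'' would get its own context. In the DwarfArchive case, one context would be allocated for the archive and every handle would point at it, so a second lookup of the same mapped pointer through a different handle would hit the cache instead of decoding again.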