== libdw interfaces for relocatable addresses ==

The git branch roland/relocate is a fork from the mainline that has fairly complete support in libdw proper for handling relocatable addresses. It makes only pure C additions to the API/ABI. There are some mailing list postings about this from the time I did it; I haven't dug up the archive pointers, but they should be added here. This branch needs some updating before it can be merged. The plain C interfaces were mostly done, though not tested very well.

The crux of the API additions is the new type Dwarf_Relocatable. This is meant to be an opaque structure in the API that can serve in every place Dwarf_Addr is used. All interfaces dealing with addresses get variants that use Dwarf_Relocatable instead. Then there is the new function dwarf_relocatable_info to extract the contents of a Dwarf_Relocatable. That yields a GElf_Sym with its associated symbol name and section name, and an addend relative to that symbol/section (i.e. r_addend from the reloc, or the in-place contents for SHT_REL).

These interfaces make it possible to deal with ET_REL (.o/.ko) files without first relocating them to fixed addresses as libdwfl does today. This is necessary for DWARF compression to operate on .ko files, which is a key target. The original relocatable addresses must be preserved as relocatable in the output.

=== plain reader benefits ===

On the roland/relocate branch, libdwfl no longer applies relocations to ET_REL files when they are first read in. Instead, the relocation support just interns the reloc sections when they are first needed. This turns them into some simple internal arrays that are sorted for binary search. The individual reloc records are looked up here when they are needed to resolve a relocatable address value. In the traditional interfaces, this leads back to using the libdwfl hooks to resolve section addresses or symbols on demand.
In the new '''dwarf_*_relocatable''' interfaces, the lookup doesn't resolve anything but instead just fills in '''Dwarf_Relocatable''' objects to describe the reloc.

This should improve the performance of libdwfl-using readers (e.g. systemtap) that don't exhaustively use all values in the file. Now the relocation work will be done lazily, and mapped DWARF file data will never take COW faults because it's never modified in place.

=== libdwfl reworking ===

This work also offers the possibility of revamping the libdwfl implementation and interfaces to be somewhat less cumbersome to use. Originally libdwfl was a pure layer on top of libdw, so it is organized around giving access to the libdw data as it is and then giving address bias values that must be applied to addresses from libdw or libelf. Now that libdw has reloc hooks in its innards, we could change this.

It could make sense to reorganize the libdw data structures in some ways related to DwarfReaderSharing. Then we could get all the material sharing we want, but have each '''Dwarf''' object you get from libdwfl actually be tied to a single '''Dwfl_Module''' instead of being shared. Using plain libdw interfaces would then go via the reloc hooks back into libdwfl code to fully resolve to absolute addresses. We might retain all the existing libdwfl interfaces, but all the ones wrapping libdw interfaces (not libelf interfaces) would then yield bias values of zero.

=== C++ interfaces ===

The unfinished precursor to relocatable address support in the writer is the reader-side C++ interfaces for relocatable addresses. I didn't work these out in detail, but had some ideas.

The core would be a new C++ type '''symbolic_address''', which is probably just a thin C++ wrapper object around Dwarf_Relocatable. This would have query methods like '''is_absolute''', returning a '''bool''' saying whether this is really a resolved address or is truly relocatable.
An '''address''' method would yield the '''Dwarf_Addr''', or throw an exception if applied to an unresolved relocatable address. For relocatable addresses, my idea was that accessors would extract a symbol table iterator and an addend. The symbol table iterator requires first defining some C++ interfaces for pure libelf stuff, not just libdw. Those would give an STL-style container interface for a symbol table, so you could iterate through it or index it with the usual interfaces.

Various places in the '''dwarf''' branch code are marked with "XXX reloc" comments where some attention for this stuff is needed.

The '''dwarf::attr_value::address''' method would be replaced or supplemented with a '''symbolic_address''' method that gives the new object, and its '''dwarf_edit'''/'''dwarf_output''' cousins would permit assigning one. In the writer, this eventually leads to generating reloc records in the output.

Note that '''constant''' and '''signed_constant''' may also need relocatable variants. I'm no longer really sure about that, but it needs to be investigated. dwarflint should be able to tell us whether relocs are applied to any attribute values other than '''DW_FORM_addr''' ones.

It's certainly the case that '''constant_block''' can need embedded relocs. This would need a new interface more sophisticated than the '''const_vector''' hack for those values. I imagine something like an STL-style container whose elements are either literal-byte blocks or relocatable values.

Similarly, '''range_list''', '''line_entry''', '''ranges''', '''arange_list''', and '''location_attr''' need to be changed so that their address pieces can be '''symbolic_address''' objects. (The corresponding new libdw C interfaces support that on the roland/relocate branch already.) Note that an individual '''location_attr::mapped_type''' needs to change from being a plain '''const_vector''' to being a structured type as well.
The structured type could wrap the '''dwarf_getlocation*''' interfaces, which yield a vector of distinct operations. Each individual operation might be a '''DW_OP_addr''' that needs to be relocatable. My original thinking was that this would tie in to the much higher-level DwarfLocations ideas.

=== dwarf_output emitting ET_REL ===

There are two kinds of relocations in DWARF data.

 1. References to other DWARF sections. Here the target of the reloc records is in some '''.debug_*''' section. These don't need to exist as relocatable entities in the API at all. In the reloc-aware libdw reading code, they are resolved automatically. In the writer code to emit ET_REL files, these relocs would be generated implicitly wherever they are needed to make the output files properly linkable.
 1. Address and constant data. Here the target of the reloc records is in some '''SHF_ALLOC''' section, i.e. it's a real address. For these, the reader has to yield '''symbolic_address''' objects. In the writer, these would emit relocs matching what the original input relocs were.

There are also two kinds of '''ET_REL''' files that we write:

 1. Actually relocatable objects. This is for when we use the writer to produce normal ''.o'' files. Here the internal DWARF references need to be relocatable so that they can be linked together later.
 1. ''Final'' objects. This is for cases like ''.ko'' files (Linux kernel modules). Here the DWARF data stands on its own and will never be fed to the linker, so these files don't actually need relocs for the internal DWARF references at all.

Among the writer's output format knobs will be ''final'' vs. ''reloc'' mode. In ''final'' mode, we don't emit relocs for DWARF references; we just encode them directly. This is somewhat akin to how output files come from '''eu-unstrip -R'''. Then when the reader is dealing with these input files, there is no extra work for these internal references at all: they are just like those in a real final linked object.
References to real '''SHF_ALLOC''' sections still need relocs even in a ''final'' object like a .ko file.
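
The combination of reloc kind and output mode described above can be summarized as a small decision function. This is only a sketch of the policy as stated in the text: the names '''reloc_target''', '''output_mode''', and '''needs_reloc_record''' are hypothetical and do not exist in elfutils or on the '''dwarf''' branch.

```cpp
// Hypothetical names, for illustration only; none of these exist in elfutils.

// What a reloc record in the DWARF data points at.
enum class reloc_target
{
  debug_section,  // a reference into some .debug_* section
  alloc_section   // a real address in some SHF_ALLOC section
};

// The writer's assumed output format knob.
enum class output_mode
{
  reloc,          // normal .o file that may be fed to the linker later
  final_object    // e.g. a .ko file; DWARF data stands on its own
};

// Returns true if the writer would have to emit a reloc record.
// Real addresses stay relocatable in both modes; internal DWARF
// references need relocs only when the output may be linked again.
constexpr bool
needs_reloc_record (reloc_target target, output_mode mode)
{
  return target == reloc_target::alloc_section || mode == output_mode::reloc;
}
```

Under these assumptions, the only case that emits no reloc is a '''.debug_*''' reference written in ''final'' mode, which matches the behavior compared to '''eu-unstrip -R''' above.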