== libdw interfaces for relocatable addresses ==

The git branch roland/relocate is a fork from the mainline that has fairly
complete support in libdw proper for handling relocatable addresses.  It
makes only pure C additions to the API/ABI.  There are some mailing list
postings about this from the time I did it; I haven't dug up the archive
pointers, but they should be added here.  This branch needs some updating
before it can be merged.  The plain C interfaces were mostly done, though
not tested very well.

The crux of the API additions is the new type Dwarf_Relocatable.  This is
meant to be an opaque structure in the API that can serve in every place
Dwarf_Addr is used.  All interfaces dealing with addresses get variants
that use Dwarf_Relocatable instead.  Then there is the new function
dwarf_relocatable_info to extract the contents of a Dwarf_Relocatable.
That yields a GElf_Sym and associated symbol name and section name,
and an addend from that symbol/section (i.e. r_addend from the reloc,
or in-place contents for SHT_REL).
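
As a rough illustration of the pieces listed above, here is a minimal sketch
in C++.  The names '''reloc_info''' and '''resolve''' are hypothetical
stand-ins invented for this example and are not the branch's actual
declarations.  The real interface reports a GElf_Sym, of which only the
symbol value is modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for what dwarf_relocatable_info reports.
// The real interface yields a GElf_Sym; only its value is modelled here.
struct reloc_info
{
  std::string symbol_name;     // empty for a section-relative reloc
  std::string section_name;    // section the symbol lives in
  std::uint64_t symbol_value;  // st_value: offset into the section in ET_REL
  std::int64_t addend;         // r_addend, or in-place contents for SHT_REL
};

// What a consumer does once it has picked a load address for the section.
inline std::uint64_t
resolve (const reloc_info &info, std::uint64_t section_base)
{
  return section_base + info.symbol_value
         + static_cast<std::uint64_t> (info.addend);
}

// Usage: a symbol at .init.text+0x40, referenced with addend 8.
inline void
reloc_info_example ()
{
  reloc_info r = { "init_module", ".init.text", 0x40, 8 };
  assert (resolve (r, 0x1000) == 0x1048);
}
```

The point of keeping the three parts separate is that a consumer such as a
DWARF compressor never has to call anything like '''resolve''' at all; it can
carry the symbol and addend through to its output unchanged.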

These interfaces make it possible to deal with ET_REL (.o/.ko) files
without first relocating them to fixed addresses as libdwfl does today.
This is necessary for DWARF compression to operate on .ko files, which is a
key target.  The original relocatable addresses must be preserved as
relocatable in the output.

=== plain reader benefits ===

On the roland/relocate branch, libdwfl no longer applies relocations to
ET_REL files when they are first read in.  Instead, the relocation support
just interns the reloc sections when they are first needed.  This turns
them into some simple internal arrays that are sorted for binary search.
The individual reloc records are looked up here when they are needed to
resolve a relocatable address value.  In the traditional interfaces, this
leads back to using the libdwfl hooks to resolve section addresses or
symbols on demand.  In the new '''dwarf_*_relocatable''' interfaces, it
doesn't resolve anything but instead just fills in '''Dwarf_Relocatable'''
objects to describe the reloc.
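
The interning scheme described above can be sketched roughly as follows.
This is not the branch's code: '''reloc_entry''' and '''interned_relocs'''
are hypothetical names, and the sketch assumes each reloc record is keyed by
the offset in the debug section that it applies to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical interned form of one reloc record.
struct reloc_entry
{
  std::uint64_t offset;  // r_offset: position in the debug section
  std::uint32_t symndx;  // symbol table index from r_info
  std::int64_t addend;   // r_addend
};

class interned_relocs
{
public:
  // Keeps the records as read; nothing is sorted until the first lookup.
  explicit interned_relocs (std::vector<reloc_entry> raw)
    : entries_ (std::move (raw)), sorted_ (false) {}

  // Returns the record applying at OFFSET, or nullptr if there is none.
  const reloc_entry *find (std::uint64_t offset)
  {
    if (!sorted_)
      {
        std::sort (entries_.begin (), entries_.end (),
                   [] (const reloc_entry &a, const reloc_entry &b)
                   { return a.offset < b.offset; });
        sorted_ = true;
      }
    auto it = std::lower_bound (entries_.begin (), entries_.end (), offset,
                                [] (const reloc_entry &e, std::uint64_t off)
                                { return e.offset < off; });
    return (it != entries_.end () && it->offset == offset) ? &*it : nullptr;
  }

private:
  std::vector<reloc_entry> entries_;
  bool sorted_;
};

inline void
interned_relocs_example ()
{
  std::vector<reloc_entry> raw = { { 0x30, 2, 0 }, { 0x10, 1, 4 } };
  interned_relocs table (raw);
  assert (table.find (0x10) != nullptr && table.find (0x10)->symndx == 1);
  assert (table.find (0x18) == nullptr);  // no reloc applies at this offset
}
```

A traditional interface would take the record that '''find''' returns and
resolve it through the libdwfl hooks, while a '''dwarf_*_relocatable'''
variant would only copy it into the caller's '''Dwarf_Relocatable'''.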

This should improve the performance of libdwfl-using readers
(e.g. systemtap) that don't exhaustively use all values in the file.  Now
the relocation work will be done lazily, and mapped DWARF file data will
never take COW faults because it's never modified in place.

=== libdwfl reworking ===

This stuff also offers the possibility of revamping the libdwfl
implementation and interfaces to be somewhat less cumbersome to use.
Originally libdwfl was a pure layer on top of libdw, so it is organized
around giving access to the libdw data as it is and then giving address
bias values that must be applied to addresses from libdw or libelf.

Now that libdw has reloc hooks in its innards, we could change this.
It could make sense to reorganize the libdw data structures in some
ways related to DwarfReaderSharing.  Then we could get all the material
sharing we want, but have each '''Dwarf''' object you get from libdwfl
actually be tied to a single '''Dwfl_Module''' instead of being shared.
Using plain libdw interfaces would then go via the reloc hooks back into
libdwfl code to fully resolve to absolute addresses.  

We might retain all the existing libdwfl interfaces, but all the ones
wrapping libdw interfaces (not libelf interfaces) would then yield
bias values of zero.

=== C++ interfaces ===

The unfinished precursor to relocatable address support in the writer is
the reader-side C++ interfaces for relocatable addresses.  I didn't work
these out in detail, but had some ideas.

The core would be a new C++ type '''symbolic_address''', which is probably
just a thin C++ wrapper object around Dwarf_Relocatable.  This would have
query methods like '''is_absolute''', returning a '''bool''' saying whether
this is really a resolved address or is truly relocatable.  An
'''address''' method would yield the '''Dwarf_Addr''' or throw an exception
if applied to an unresolved relocatable address.
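
A minimal sketch of that shape, under stated assumptions: the '''Dwarf_Addr'''
typedef below is a local stand-in, the factory functions '''absolute''' and
'''relocatable''' are invented for the example, and the exception type is a
guess, since none of this was worked out in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

typedef std::uint64_t Dwarf_Addr;  // local stand-in for the libdw type

class symbolic_address
{
public:
  // Hypothetical factories; a real wrapper would be built from a
  // Dwarf_Relocatable instead.
  static symbolic_address absolute (Dwarf_Addr a)
  { return symbolic_address (true, a); }
  static symbolic_address relocatable (Dwarf_Addr addend)
  { return symbolic_address (false, addend); }

  // True if this is a resolved address, false if truly relocatable.
  bool is_absolute () const { return absolute_; }

  // Yields the address, or throws for an unresolved relocatable one.
  Dwarf_Addr address () const
  {
    if (!absolute_)
      throw std::logic_error ("address() on unresolved relocatable address");
    return value_;
  }

private:
  symbolic_address (bool abs, Dwarf_Addr v) : absolute_ (abs), value_ (v) {}
  bool absolute_;
  Dwarf_Addr value_;
};

inline void
symbolic_address_example ()
{
  symbolic_address a = symbolic_address::absolute (0x400000);
  assert (a.is_absolute () && a.address () == 0x400000);

  symbolic_address r = symbolic_address::relocatable (16);
  bool threw = false;
  try { r.address (); } catch (const std::logic_error &) { threw = true; }
  assert (!r.is_absolute () && threw);
}
```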

For relocatable addresses, my idea was that accessors would extract a
symbol table iterator and an addend.  The symbol table iterator requires
first defining some C++ interfaces for pure libelf stuff, not just libdw.
Those would give an STL-style container interface for a symbol table, so
you could iterate through it or index it with the usual interfaces.

Various places in the '''dwarf''' branch code are marked with "XXX reloc"
comments where some attention for this stuff is needed.  The
'''dwarf::attr_value::address''' method would be replaced or supplemented
with a '''symbolic_address''' method that gives the new object, and its
'''dwarf_edit'''/'''dwarf_output''' cousins would permit assigning one.  In
the writer, this eventually leads to generating reloc records in the
output.  

Note that it's possible the '''constant''' and '''signed_constant''' methods
also need to have relocatable variants.  I'm no longer really sure about that,
but it needs to be investigated.  dwarflint should be able to tell us
whether relocs are applied to any attribute values other than
'''DW_FORM_addr''' ones.

It's certainly the case that '''constant_block''' can need embedded relocs.
This would need a new interface more sophisticated than the
'''const_vector''' hack for those values.  I imagine something like an
STL-style container whose elements are either literal-byte blocks or are
relocatable values.
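
To make the idea concrete, here is one possible shape for such a container,
as a sketch only.  '''block_piece''', '''relocatable_block''' and
'''encoded_size''' are hypothetical names, and a plain tagged struct is used
where a real design might prefer something more type-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One element of a hypothetical relocatable constant_block.
struct block_piece
{
  enum kind_type { literal, relocatable };
  kind_type kind;
  std::vector<std::uint8_t> bytes;  // used when kind == literal
  std::string symbol;               // used when kind == relocatable
  std::int64_t addend;              // used when kind == relocatable
  std::size_t width;                // encoded size of a relocatable value

  static block_piece make_literal (std::vector<std::uint8_t> b)
  {
    block_piece p;
    p.kind = literal;
    p.bytes = std::move (b);
    p.addend = 0;
    p.width = 0;
    return p;
  }

  static block_piece make_relocatable (std::string sym, std::int64_t addend,
                                       std::size_t width)
  {
    block_piece p;
    p.kind = relocatable;
    p.symbol = std::move (sym);
    p.addend = addend;
    p.width = width;
    return p;
  }

  // Number of bytes this piece occupies in the encoded block.
  std::size_t size () const
  { return kind == literal ? bytes.size () : width; }
};

typedef std::vector<block_piece> relocatable_block;

inline std::size_t
encoded_size (const relocatable_block &blk)
{
  std::size_t n = 0;
  for (const block_piece &p : blk)
    n += p.size ();
  return n;
}

inline void
relocatable_block_example ()
{
  relocatable_block blk;
  blk.push_back (block_piece::make_literal ({ 0x03 }));            // DW_OP_addr
  blk.push_back (block_piece::make_relocatable ("my_var", 0, 8));  // operand
  blk.push_back (block_piece::make_literal ({ 0x23, 0x04 }));  // DW_OP_plus_uconst 4
  assert (encoded_size (blk) == 11);
}
```

A writer walking such a block would copy the literal pieces verbatim and emit
a reloc record for each relocatable piece at its running offset.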

Similarly, '''range_list''', '''line_entry''', '''ranges''',
'''arange_list''', and '''location_attr''' need to be changed so that their
address pieces can be '''symbolic_address''' objects.  (The corresponding
new libdw C interfaces support that on the roland/relocate branch already.)

Note that an individual '''location_attr::mapped_type''' needs to change
from being a plain '''const_vector<uint8_t>''' to being a structured type
as well.  It could wrap the '''dwarf_getlocation*''' interfaces, which
yield a vector of distinct operations.  Each individual operation might
be a '''DW_OP_addr''' that needs to be relocatable.
My original thinking was that this would tie in to the much higher-level
DwarfLocations ideas.

=== dwarf_output emitting ET_REL ===

There are two kinds of relocations in DWARF data.

 1. References to other DWARF sections.

    Here the target of the reloc records is in some '''.debug_*''' section.
    These don't need to exist as relocatable entities in the API at all.
    In the reloc-aware libdw reading code, they are resolved automatically.
    In the writer code to emit ET_REL files, these relocs would be
    generated implicitly wherever they are needed to make the output files
    properly linkable.

 2. Address and constant data.

    Here the target of the reloc records is in some '''SHF_ALLOC'''
    section, i.e. it's a real address.  For these, the reader has to yield
    '''symbolic_address''' objects.  In the writer, these would emit relocs
    matching what the original input relocs were.

There are also two kinds of '''ET_REL''' files that we write:

 1. Actually relocatable objects.

    This is for when we use the writer to produce normal ''.o'' files.
    Here the internal DWARF references need to be relocatable so that they
    can be linked together later.

 2. ''Final'' objects.

    This is for cases like ''.ko'' files (Linux kernel modules).
    Here the DWARF data stands on its own and will never be fed to the
    linker.  So they don't actually need to have relocs for the internal
    DWARF references at all.  Among the writer's output format knobs will
    be ''final'' vs. ''reloc'' mode.  In ''final'' mode, we don't emit
    relocs for DWARF references; we just encode them directly.  This is
    somewhat akin to how output files come from '''eu-unstrip -R'''.
    Then when the reader is dealing with these input files, there is no
    extra work for these internal references at all; they are handled just
    as in a real final linked object.

    References to real '''SHF_ALLOC''' sections still need relocs even in a
    ''final'' object like a ''.ko'' file.
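
The combination of the two reloc kinds and the two output modes boils down to
a small decision rule, sketched here.  The enum and function names are
hypothetical; the writer has no such interface yet.

```cpp
#include <cassert>

// Hypothetical names for the writer's output format knob.
enum class output_mode { final_object, relocatable_object };

// What a reloc record in DWARF data points at.
enum class reloc_target { debug_section, alloc_section };

// Should the writer emit a reloc record for this reference?
inline bool
must_emit_reloc (output_mode mode, reloc_target target)
{
  // Real addresses stay relocatable in every ET_REL output.
  if (target == reloc_target::alloc_section)
    return true;
  // DWARF-internal references need relocs only if the file will be linked.
  return mode == output_mode::relocatable_object;
}

inline void
must_emit_reloc_example ()
{
  // A .ko file: internal references are encoded directly ...
  assert (!must_emit_reloc (output_mode::final_object,
                            reloc_target::debug_section));
  // ... but addresses in SHF_ALLOC sections still get relocs.
  assert (must_emit_reloc (output_mode::final_object,
                           reloc_target::alloc_section));
}
```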