Refactor ELF symbol table reading by adding a new symtab reader
Based on existing functionality, implement the reading of ELF symbol
tables as a separate component. This reduces the complexity of
abg-dwarf-reader's read_context by separating and delegating the
functionality. This also allows dedicated testing.
The new namespace symtab_reader contains a couple of new components
that are loosely coupled. Together they provide a consistent view of a
symbol table. Filter criteria can restrict those views, and the
restricted views can be iterated or used to build consistent lookup
maps. While this implementation tries to address some shortcomings of
the previous model, it still provides the high-level interfaces to the
symbol table contents through sorted iteration and name/address mapped
access.
symtab_reader::symtab
While the other classes in the same namespace are merely helpers, this
is the main implementation of symtab reading and storage.
Symtab objects are factory created to ensure a consistent construction
and valid invariants. Thus a symtab will be loaded by either passing
an ELF handle (when reading from binary) or by passing a set of
function/variable symbol maps (when reading from XML).
Once constructed, they are considered const and are no longer
writable. As such, all public methods are const.
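The factory-plus-const idea can be sketched as follows. This is a
minimal illustration, not the code added by this commit: the class
symtab_sketch, its load() signature, the int payload and the "empty
input is rejected" invariant are all hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a factory-created symbol table.
class symtab_sketch
{
  std::map<std::string, int> by_name_; // name -> illustrative payload

  symtab_sketch() = default; // private: only load() can construct

public:
  // Factory: either returns a fully built, const table or nothing.
  static std::unique_ptr<const symtab_sketch>
  load(std::map<std::string, int> symbols)
  {
    // Illustrative invariant (an assumption): refuse empty input.
    if (symbols.empty())
      return nullptr;

    std::unique_ptr<symtab_sketch> result(new symtab_sketch);
    result->by_name_ = std::move(symbols);
    // Hand the object out as const; callers cannot modify it anymore.
    return std::unique_ptr<const symtab_sketch>(std::move(result));
  }

  // All public accessors are const.
  bool
  has_symbol(const std::string& name) const
  {return by_name_.count(name) != 0;}
};
```

Because the constructor is private and the result is a pointer to
const, callers can neither observe a partially loaded table nor modify
it after loading.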
The load reuses the existing implementation for loading symtab
sections, but since the new implementation does not distinguish
between functions and variables, the code could be simplified. The
support for ppc64 function entry addresses has been deferred to a
later commit.
Linux Kernel symbol tables are now loaded directly by name when
encountering symbols that carry the conventional __ksymtab_ prefix.
Reading the ksymtab has been tricky in the past due to the various
binary layouts (relocations, position relative relocations, symbol
namespaces, CFI indirections, differences between vmlinux and kernel
modules). The new implementation is thus much simpler and less
vulnerable to future ksymtab changes. As we no longer look up the
Kernel symbols by address, we could also resolve shortcomings with
symbol aliasing: previously a symbol and its alias were
indistinguishable because they have the same symbol address, so we
could not identify the one that is actually exported via ksymtab.
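The name-based approach can be sketched as below. It is only an
illustration under simplifying assumptions: symbol names arrive as
plain strings and the helper collect_ksymtab_exports is hypothetical;
the actual loader works on ELF symbol entries.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: derive the set of exported names from the
// __ksymtab_<name> marker symbols found among all symbol names.
std::unordered_set<std::string>
collect_ksymtab_exports(const std::vector<std::string>& symbol_names)
{
  static const std::string prefix = "__ksymtab_";
  std::unordered_set<std::string> exported;

  for (const std::string& name : symbol_names)
    // Keep only names that start with the prefix and have a suffix.
    if (name.size() > prefix.size()
        && name.compare(0, prefix.size(), prefix) == 0)
      exported.insert(name.substr(prefix.size()));

  return exported;
}
```

Since the match is on the name, an alias that shares the address of an
exported symbol is not mistaken for the exported one.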
One major architectural difference of this implementation is that we
do not discard suppressed symbols early. While we keep them out of the
vector of exported symbols, we still make them available for lookup.
That helps address issues when looking up a suppressed symbol by
address (e.g. from the ksymtab read implementation), which would fail
in the existing implementation.
Still, we intend to only instantiate each symbol once and pass around
shared_ptr instances to refer to it from the vector as well as from
the lookup maps.
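A minimal sketch of that storage scheme follows. The types
symbol_sketch and table_sketch are hypothetical, and the sketch
ignores aliases (several names sharing one address), which a real
table has to handle.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, reduced symbol representation.
struct symbol_sketch
{
  std::string   name;
  std::uint64_t address;
  bool          suppressed;
};
using symbol_sptr = std::shared_ptr<symbol_sketch>;

struct table_sketch
{
  std::vector<symbol_sptr>             exported;   // seen by iteration
  std::map<std::string, symbol_sptr>   by_name;    // lookup by name
  std::map<std::uint64_t, symbol_sptr> by_address; // lookup by address

  void
  add(const std::string& name, std::uint64_t address, bool suppressed)
  {
    // One instance per symbol, shared by all containers.
    symbol_sptr sym =
      std::make_shared<symbol_sketch>(symbol_sketch{name, address, suppressed});

    // Suppressed symbols stay reachable through the lookup maps ...
    by_name[name] = sym;
    by_address[address] = sym;

    // ... but are kept out of the vector of exported symbols.
    if (!suppressed)
      exported.push_back(sym);
  }
};
```

With this layout a lookup by address still finds a suppressed symbol,
while iteration over the exported vector never yields it.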
For reading, there are two access paths that serve the existing
patterns:
1) lookup_symbol: either via a name or an address
2) filtered iteration with begin(), end()
The former is used for direct access with a clue in hand (like a name
or an address), the latter is used for iteration (e.g. when emitting
the XML).
symtab_reader::symtab_iterator
The symtab_iterator is an STL-compatible iterator that is returned
from begin() and end() of the symtab. It supports the usual forward
iterator operations and can optionally take a filter predicate to skip
non-matching elements.
symtab_reader::symtab_filter
The symtab_filter serves as a predicate for the symtab_iterator by
providing a matches(const elf_symbol_sptr&) function. The predicate
is built by ANDing together several conditions on attributes a symbol
can have. The filter conditions are implemented in terms of
std::optional<bool> members to allow a tristate: "needs to have the
condition set", "must not have it set" and "don't care".
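The tristate matching can be sketched like this (C++17). The names
filter_sketch and symbol_attrs, and the two attributes chosen, are
hypothetical; the real filter operates on elf_symbol_sptr.

```cpp
#include <optional>

// Hypothetical attributes of a symbol.
struct symbol_attrs
{
  bool is_function;
  bool is_defined;
};

class filter_sketch
{
  // Unset means "don't care"; true/false means the attribute must
  // have exactly that value.
  std::optional<bool> functions_;
  std::optional<bool> defined_;

public:
  void set_functions(bool value = true) {functions_ = value;}
  void set_defined(bool value = true) {defined_ = value;}

  // All conditions that are set are ANDed together.
  bool
  matches(const symbol_attrs& symbol) const
  {
    if (functions_ && symbol.is_function != *functions_)
      return false;
    if (defined_ && symbol.is_defined != *defined_)
      return false;
    return true;
  }
};
```

A default-constructed filter matches every symbol; each condition that
is set narrows the match further.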
symtab_reader::filtered_symtab
The filtered_symtab is a zero-cost convenience abstraction that allows
prepopulating the symtab_filter (call it a capture) such that begin()
and end() are accessible without the need to pass the filter again.
Argumentless begin() and end() are a requirement for range-for loops
and other STL-based algorithms.
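How a skipping iterator and a view with a captured filter fit together
can be sketched as follows. Everything here is hypothetical: symbols
are plain ints, the filter is a std::function (chosen for brevity, so
this sketch is not zero-cost), and the class names are made up.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

using predicate = std::function<bool(int)>; // stand-in for the filter

class skipping_iterator
{
  using base = std::vector<int>::const_iterator;
  base             cur_{};
  base             end_{};
  const predicate* pred_ = nullptr;

  // Advance until the current element matches or the end is reached.
  void skip() {while (cur_ != end_ && !(*pred_)(*cur_)) ++cur_;}

public:
  using iterator_category = std::forward_iterator_tag;
  using value_type        = int;
  using difference_type   = std::ptrdiff_t;
  using pointer           = const int*;
  using reference         = const int&;

  skipping_iterator() = default;
  skipping_iterator(base cur, base end, const predicate& pred)
    : cur_(cur), end_(end), pred_(&pred)
  {skip();}

  reference operator*() const {return *cur_;}
  skipping_iterator& operator++() {++cur_; skip(); return *this;}
  skipping_iterator operator++(int) {auto tmp = *this; ++*this; return tmp;}
  bool operator==(const skipping_iterator& o) const {return cur_ == o.cur_;}
  bool operator!=(const skipping_iterator& o) const {return cur_ != o.cur_;}
};

// Captures the filter once so begin()/end() need no arguments.
class filtered_view
{
  const std::vector<int>& data_;
  predicate               pred_;

public:
  filtered_view(const std::vector<int>& data, predicate pred)
    : data_(data), pred_(std::move(pred))
  {}

  skipping_iterator begin() const {return {data_.begin(), data_.end(), pred_};}
  skipping_iterator end() const {return {data_.end(), data_.end(), pred_};}
};

// Usage: range-for works because begin()/end() take no arguments.
int
sum_even(const std::vector<int>& values)
{
  int sum = 0;
  for (int v : filtered_view(values, [](int x) {return x % 2 == 0;}))
    sum += v;
  return sum;
}
```

The iterators only borrow the predicate from the view, so the view
must outlive them; inside a range-for loop that holds automatically.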