Range lists, zero-length functions, linker gc

Sat Jun 20 00:46:46 GMT 2020

On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Tue, 2020-06-02 at 11:06 -0700, David Blaikie via Elfutils-devel wrote:
> > > I do think combining Split DWARF and LTO might not be the best
> > > solution. When doing LTO you probably want something like GCC Early
> > > Debug, which is like Split DWARF, but different, because the Early
> > > Debug simply doesn't contain any address (ranges) yet (not even through
> > > indirection like .debug_addr).
> >
> > I don't think Early Debug fits here - it seems like it was
> > specifically for DWARF that doesn't refer to any code (eg: function
> > declarations and type definitions). I don't see how it could be used
> > for the actual address-referencing DWARF needed to describe function
> > definitions.
>
> I think that is kind of the point of Early Debug. Only use DWARF (at
> first) for address/range-less data like types and program scope
> entries, but don't emit anything (in DWARF format) for things that
> might need adjustments during link/LTO phase. The problem with using
> DWARF with address (ranges) during early object creation is that the
> linker isn't capable to rewrite the DWARF. You'll need a linker plugin
> that calls back into the compiler to do the actual LTO and emit the
> actual DWARF containing address/ranges (which can then link back to the
> already emitted DWARF types/program scope/etc during the Early Debug
> phase). I think the issue you are describing is actually that you do
> use DWARF to describe function definitions (not just the declarations)
> too early. If you aren't sure yet which addresses will be used DWARF
> isn't really the appropriate (temporary) debug format.

Sorry, I think we keep talking past each other. Not sure if we can
reach a good consensus or shared understanding on this topic.

DWARF in unlinked object files has been a fairly well used temporary
debug format for a long time - and the DWARF spec has done a lot to
ensure it is compatible with ELF in both object files and linkers,
basically forever. So I don't think it'd be suitable to say "DWARF
isn't an appropriate intermediate debug format to use between
compilers and linkers" - I don't think the DWARF committee members,
producers, or consumers would agree with this sentiment.

> > > > > > & again the overhead of all those separate contributions, headers,
> > > > > > etc, turns out to be not very desirable in any case.
> > > > >
> > > > > Yes, I agree with that. But as said earlier, maybe the compiler
> > > > > > shouldn't have generated the code/data in the first place?
> > > >
> > > > In the (especially) C++ compilation model, I don't believe that's
> > > > possible - inline functions, templates, etc, require duplication -
> > > > unless you have a more complicated build process that can gather the
> > > > potential duplication, then fan back out again to compile, etc.
> > > > ThinLTO does some of this - at a cost of a more complicated build
> > > > system, etc.
> > >
> > > It might be useful for the original discussion to have a few more
> > > concrete examples to show when you might have unused code that the
> > > linker might want to discard, but where the compiler could only produce
> > > DWARF in one big blob. Apart from the -ffunction-sections case,
> >
> > Function sections, inline functions, function templates are core examples.
>
> I understand the function sections case, but can you give actual
> examples of an inline function or function template source code and how
> a DWARF producer generates DWARF for that? Maybe some simple source
> code we can put through gcc or clang to see how they (mis)handle it.
> Not being a compiler architect I am not sure I understand why those
> cannot be expressed correctly.

oh, sure! sorry.

a simple case of inline functions being deduplicated looks like this:

a.cpp:
inline void f1() { }
void f2() {
  f1();
}

b.cpp:
inline void f1() { }
void f2();
int main() {
  f1();
  f2();
}

This actually demonstrates a slightly different behavior of bfd and
gold: when the comdats are the same size (I'm told that's the
heuristic) and the local symbol names the DWARF uses to refer to the
functions (f1 in this case) match, both DWARF descriptions are
resolved to point to the same deduplicated copy of 'f1', eg:

BFD and Gold both produce this DWARF (uninteresting attributes have
been omitted):

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000401110, 0x000000000040111b)
     [0x0000000000401120, 0x0000000000401126))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4]       (0x0000000b)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x0000009d] = "f2")
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000030
     [0x0000000000401130, 0x0000000000401142)
     [0x0000000000401120, 0x0000000000401126))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401130)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000012)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000b0] = "main")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")

Now you have two CUs that have overlapping ranges, which is
interesting - if not strictly invalid (DWARF being permissive and
all). Though I think the size heuristic is risky - it's possible that
'f1' was optimized differently in the two compilations and just
happened to end up with the same size - in which case one CU's DWARF
description would be incorrect for the copy of the function that was
actually kept (eg: one compiler chose to put a constant in one
register, the other compiler used another register - same instruction
sequence length, but the DWARF would be different, and incorrect to
mismatch like that).
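
To make the overlap concrete, here's a small sketch that intersects the
two CUs' range lists from the first dump above. The Range struct and the
shared_ranges helper are names made up for illustration - they aren't
part of any DWARF library, and a real consumer would read these values
out of debug_ranges rather than hard-coding them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [low, high), as printed in the dumps.
struct Range {
  std::uint64_t low;
  std::uint64_t high;
};

// True if the two half-open ranges share at least one address.
bool overlaps(const Range &a, const Range &b) {
  assert(a.low <= a.high && b.low <= b.high);
  return a.low < b.high && b.low < a.high;
}

// Collects the address ranges claimed by both range lists.
std::vector<Range> shared_ranges(const std::vector<Range> &x,
                                 const std::vector<Range> &y) {
  std::vector<Range> out;
  for (const Range &a : x)
    for (const Range &b : y)
      if (overlaps(a, b))
        out.push_back({std::max(a.low, b.low), std::min(a.high, b.high)});
  return out;
}
```

Fed a.cpp's list ([0x401110, 0x40111b), [0x401120, 0x401126)) and
b.cpp's list ([0x401130, 0x401142), [0x401120, 0x401126)), this reports
the single shared range [0x401120, 0x401126) - the deduplicated 'f1'
that both CUs claim.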

If you end up with different function lengths (which is common enough
in larger programs - different other definitions may be available,
different inlining heuristics about overall object size, etc, may kick
in) then you get BFD and Gold's current tombstoning behavior:

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000401110, 0x000000000040111b)
     [0x0000000000401120, 0x000000000040112b))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4]       (0x0000000b)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x0000009d] = "f2")
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401120)
    DW_AT_high_pc [DW_FORM_data4]       (0x0000000b)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000030
     [0x0000000000401130, 0x0000000000401142)
     [0x0000000000000001, 0x0000000000000001))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401130)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000012)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000b0] = "main")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")

In this case BFD uses the tombstone value 0 in most sections, but uses
1 in debug_ranges to ensure it doesn't produce the 0,0 that would end
the range list early. This workaround is incomplete: it should also be
applied to debug_loc, which is terminated by 0,0 too. GCC and Clang
don't produce any inter-function location lists, though, so this
doesn't present a problem in practice/for now, except for dumping
tools, which end up seeing "holes" in debug_loc that would otherwise be
dumpable.
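
Here's a minimal sketch of why the 0,0 matters, assuming a simplified
reader for a DWARF v4 style debug_ranges list: it stops at the first
0,0 pair and skips empty ranges. RawEntry, Range and read_range_list are
hypothetical names for this example, and base address selection entries
are deliberately not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One raw (begin, end) pair as stored in a debug_ranges list.
typedef std::pair<std::uint64_t, std::uint64_t> RawEntry;

// A decoded, non-empty half-open range [low, high).
struct Range {
  std::uint64_t low;
  std::uint64_t high;
};

// Simplified reader: stops at the first 0,0 pair (end of list) and
// skips empty ranges, which cover no addresses.
std::vector<Range> read_range_list(const std::vector<RawEntry> &raw) {
  std::vector<Range> out;
  for (const RawEntry &e : raw) {
    if (e.first == 0 && e.second == 0)
      break; // end-of-list marker
    // Base address selection entries are out of scope for this sketch.
    assert(e.first <= e.second);
    if (e.first == e.second)
      continue; // empty range, e.g. a 1,1 tombstone
    out.push_back({e.first, e.second});
  }
  return out;
}
```

If the discarded function's entry happened to come first and were
resolved to 0,0, a reader like this would stop before ever reaching
[0x401130, 0x401142). With 1,1 in its place the entry is just an empty
range that gets skipped, and the live range after it is still seen.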

Gold's behavior in this case is a little different, using the 0+addend approach:

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x00000065] = "a.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000400540, 0x000000000040054b)
     [0x0000000000400550, 0x000000000040055b))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400540)
    DW_AT_high_pc [DW_FORM_data4]       (0x0000000b)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x0000009d] = "f2")
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400550)
    DW_AT_high_pc [DW_FORM_data4]       (0x0000000b)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x000000aa] = "b.cpp")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000030
     [0x0000000000400560, 0x0000000000400572)
     [0x0000000000000000, 0x0000000000000006))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400560)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000012)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000b0] = "main")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x000000a7] = "f1")

I introduced an ODR violation here (by modifying a.cpp's f1 to call f2
- thus making a.cpp's f1 a different length from b.cpp's f1) just as
an easy way to demonstrate the "different lengths" issue - but this
could arise from valid code that was differently optimized in the two
translation units.
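
The two behaviors visible in the dumps above can be modelled roughly
like this. It's a toy model of what the output looks like, not code
from either linker: DeadRelocPolicy and resolve_dead_reloc are invented
names, and I'm assuming the end of b.cpp's f1 range is expressed as a
relocation against the discarded section with addend 6.

```cpp
#include <cassert>
#include <cstdint>

// Which observed behavior to model for relocations against a
// discarded (deduplicated or gc'd) section.
enum class DeadRelocPolicy { BfdStyle, GoldStyle };

// Value written into a debug section for such a relocation.
std::uint64_t resolve_dead_reloc(DeadRelocPolicy policy,
                                 bool in_debug_ranges,
                                 std::uint64_t addend) {
  switch (policy) {
  case DeadRelocPolicy::BfdStyle:
    // 0 in most sections, 1 in debug_ranges so that a 0,0 pair
    // doesn't terminate the list early. The addend is dropped.
    return in_debug_ranges ? 1 : 0;
  case DeadRelocPolicy::GoldStyle:
    // Symbol treated as address 0, addend kept.
    return 0 + addend;
  }
  assert(false && "unknown policy");
  return 0;
}
```

For the f1 range in b.cpp (addends 0 and 6) the BFD-style model gives
[1, 1) and the Gold-style model gives [0, 6), matching the two
debug_ranges dumps. Both give 0 for DW_AT_low_pc (addend 0, not in
debug_ranges).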

& yeah - on an LLVM thread we did dabble with what it'd look like to
use comdats without whole separate units to put these together - and
it's possible, though that doesn't apply to Split DWARF (can't piece
together the debug_addr section either - since it'd throw off the
indexes used from the Split DWARF file) - and still adds extra section
overhead. I did prototype debug_ranges/debug_rnglists comdat assembling
(so the CU's range list wouldn't have entries for the
deduplicated/gc'd functions), but again, that means more ELF sections:
little gain in linked debug info size for the cost in intermediate
object size.

> > > where I
> > > would argue the compiler simply needs to make sure that if it generates
> > > code in separate sections it also should create the DWARF separate
> > > section (groups).
> >
> > I don't think that's practical - the overhead, I believe, is too high.
> > Headers for each section contribution (ELF headers but DWARF headers
> > moreso - having a separate .debug_addr, .debug_line, etc section for
> > each function would be very expensive) would make for very large
> > object files.
>
> I see your point, but maybe this shouldn't be handled by the linker
> then, but maybe have a linker plugin so the compiler can fixup the
> DWARF (or generate it later).

This sounds like it'd still be fairly intrusive (architecturally) and
expensive (both in software complexity and in linking time/memory
usage/etc). I'm not ruling it out as a possibility - and I'm
interested in dabbling with this kind of deduplication purely
academically (my users use Split DWARF, so there's no opportunity
there to fix this - so my interest in in-.o/linked executable DWARF is
limited to personal interest). I'm curious about just how expensive
the ELF sections would be, and what sort of custom scheme might be used
instead (I could imagine a content-aware feature that might be more
terse than generic ELF sections, but not especially invasive - it
wouldn't require parsing or rewriting DWARF DIEs, etc). That's being
discussed in the LLVM community - but I don't expect it'll be soon, nor
pervasively used even if it is built.

So I come back to Split DWARF making this fairly well impossible to
implement without a tombstone value, so far as I can imagine/think of,
and function sections at least making it very expensive to implement
(in terms of object size and/or significant changes to the nature of
linking DWARF). This has been a pretty well established use
case/feature for decades now, with some relatively small drawbacks in
certain narrow cases (zero length functions, zero or low address values
that are valid in some use cases) - so I think adding an explicit
tombstone is necessary in some cases, and beneficial if not strictly
necessary in others.
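
As a rough sketch of what an explicit tombstone buys a consumer: the
value used below (the largest 64-bit address) is purely an assumption
for illustration - nothing in this thread settles on a particular
value - and Subprogram, is_discarded and live_subprograms are made-up
names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical tombstone value, chosen only for this example.
const std::uint64_t kTombstone = std::numeric_limits<std::uint64_t>::max();

// The attributes of a DW_TAG_subprogram that matter here.
struct Subprogram {
  const char *name;
  std::uint64_t low_pc; // DW_AT_low_pc
  std::uint64_t size;   // DW_AT_high_pc, encoded as an offset from low_pc
};

// With a reserved value, "discarded" is an exact comparison rather
// than a guess based on low addresses or empty ranges.
bool is_discarded(const Subprogram &sp) { return sp.low_pc == kTombstone; }

// Keeps every subprogram the linker did not mark as discarded.
std::vector<Subprogram> live_subprograms(const std::vector<Subprogram> &all) {
  std::vector<Subprogram> out;
  for (const Subprogram &sp : all)
    if (!is_discarded(sp))
      out.push_back(sp);
  return out;
}
```

A function that legitimately lives at address 0, and a zero length
function at a normal address, both survive this filter; only the entry
carrying the reserved value is dropped. With a 0 tombstone the first of
those can't be told apart from a discarded function.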

- Dave

>
> Cheers,
>
> Mark