Range lists, zero-length functions, linker gc

David Blaikie dblaikie@gmail.com
Thu Jun 25 23:45:54 GMT 2020


On Wed, Jun 24, 2020 at 3:22 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi David,
>
> On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel wrote:
> > On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard <mark@klomp.org> wrote:
> > > I think that is kind of the point of Early Debug. Only use DWARF (at
> > > first) for address/range-less data like types and program scope
> > > entries, but don't emit anything (in DWARF format) for things that
> > > might need adjustments during link/LTO phase. The problem with using
> > > DWARF with address (ranges) during early object creation is that the
> > > linker isn't capable of rewriting the DWARF. You'll need a linker plugin
> > > that calls back into the compiler to do the actual LTO and emit the
> > > actual DWARF containing address/ranges (which can then link back to the
> > > already emitted DWARF types/program scope/etc during the Early Debug
> > > phase). I think the issue you are describing is actually that you do
> > > use DWARF to describe function definitions (not just the declarations)
> > > too early. If you aren't sure yet which addresses will be used DWARF
> > > isn't really the appropriate (temporary) debug format.
> >
> > Sorry, I think we keep talking around each other. Not sure if we can
> > reach a good consensus or shared understanding on this topic.
>
> I think the confusion comes from the fact that we seem to cycle through
> a couple of different topics which are related, but not really
> connected directly.
>
> There is the topic of using "tombstones" in place of some pc or range
> attributes/tables in the case of traditional linking separate compile
> units/objects. Where we seem to agree that those are better than
> silently producing bad data, but where we disagree on whether there are
> other ways to solve the issue (using comdat sections for example, where
> we might see the overhead/gains differently).
>
> There is the topic of LTO where part of the linker optimization is done
> through a (compiler) plugin. Where it isn't clear (to me at least)
> whether the traditional ways of handling DWARF in object files make
> sense.

Oh - perhaps to clarify: I don't know of any implementation that
creates DWARF in intermediate object files in LTO.

> I would argue that GCC shows that for LTO you need something
> like Early Debug, where you only produce parts of the DWARF early that
> don't contain any addresses or ranges, since you don't know yet where
> code/data will end up till after the actual LTO phase, only after which
> it can be produced.

Yeah - I guess that's the point of the name "Early Debug" - it's
earlier than usual, rather than making the rest later than usual.

In LLVM's implementation the faux .o files in LTO contain no DWARF
whatsoever - but a semantic representation something like DWARF
intended to be manipulated by compiler optimizations and designed to
drop unreferenced portions as optimizations make changes. (if you
inline and optimize away a function call, that function may get
dropped - then no DWARF is emitted for it, same as if it were never
called)

Yeah, it'd be theoretically possible to create all the DWARF up-front,
use loclists and rnglists for /everything/ (because you wouldn't know
if a variable would have a single location or multiple until after
optimizations) and then fill in those loclists and rnglists
post-optimization. I don't know of any implementation that does that,
though - it'd make for very verbose DWARF, and I agree with you that
that wouldn't be great - I think the only point of conflict there is:
I don't think that's a concern that's actually manifesting in DWARF
producers today. Certainly not in LLVM & doesn't sound like it is in
GCC.

I think there's enough incentive from compiler performance alone - not
to produce loads of duplicate DWARF, and to have a fairly
compact/optimizable intermediate representation. A lot of work went
into changing LLVM's representation to be more amenable to LTO, to
ensure things got dropped and deduplicated as soon as possible.

> Then there is the topic of Split Dwarf, where I am not sure it is
> directly relevant to the above two topics. It is just a different
> representation of the DWARF data, with an extra layer of indirections
> used for addresses. Which in the case of the traditional model means
> that you still hit the tombstones, just through an indirection table.
> And for LTO it just makes some things more complicated because you have
> this extra address indirection table, but since you cannot know where
> the addresses end up till after the LTO phase you now have an extra
> layer of indirection to fix up.

I think the point about Split DWARF is - to your first point about you
and I perhaps weighing the .o size cost differently (using comdats to
deduplicate/drop DWARF for dead or deduplicated functions) - that with
Split DWARF such dropping is impossible; well, impossible if you're
going to use fragmented DWARF (eg: use comdats to stitch together a
single CU out of droppable parts). If you were going to drop the DWARF
related to a dead or deduplicated function when using Split DWARF you'd
have to use a whole separate unit (possibly a partial_unit) - which
would add a lot more size overhead. Perhaps enough that we'd both agree
it's prohibitive (especially since that cost would persist into the
linked binary - so it wouldn't be a .o/linked-executable tradeoff, but
outright growth).

>
> > DWARF in unlinked object files has been a fairly well used temporary
> > debug format for a long time - and the DWARF spec has done a lot to
> > ensure it is compatible with ELF in both object files and linkers
> > forever, basically? So I don't think it'd be suitable to say "DWARF
> > isn't an appropriate intermediate debug format to use between
> > compilers and linkers". In the sense that I don't think either the
> > DWARF committee members, producers, or consumers would agree with this
> > sentiment.
>
> I absolutely agree with that statement for the traditional linker
> model, where you build up DWARF data per compile unit.

Ah, OK - then perhaps that's all we need to really agree on to move
forward with the discussion of a tombstone value, what value it is,
that it should be in the DWARF spec and all the implementations should
know and agree on it?

> But for the LTO
> model, where there is a feedback loop between compiler and linker, I
> don't think (all of) DWARF is an appropriate intermediate debug format.

Neither do I - though if we both agree there is a need for a tombstone
in the traditional linker model, then we do leave it open for very
inefficient LTO implementations to use that feature too. But there are
lots of ways a DWARF producer could produce very inefficient DWARF & I
don't think there's a great need to mandate against it in general. (If
we could avoid having the tombstone concept entirely - sure - but if
we've got to have it, I don't think the LTO conversation goes anywhere
in terms of informing the design of the tombstone feature.)

> If only because the concept of "compile unit" gets really fuzzy. I
> think in that model a lot of DWARF can still be used usefully as
> intermediate debug format to pass between compiler, linker, compiler,
> linker during the LTO phase. Just not the part that describes the
> program scope and variable/data locations represented as (ranges of)
> addresses (when produced early).
>
> > > I understand the function sections case, but can you give actual
> > > examples of an inline function or function template source code and how
> > > a DWARF producer generates DWARF for that? Maybe some simple source
> > > code we can put through gcc or clang to see how they (mis)handle it.
> > > Not being a compiler architect I am not sure I understand why those
> > > cannot be expressed correctly.
> >
> > oh, sure! sorry.
> >
> > a simple case of inline functions being deduplicated looks like this:
> >
> > a.cpp:
> > inline void f1() { }
> > void f2() {
> >   f1();
> > }
> >
> > b.cpp:
> > inline void f1() { }
> > void f2();
> > int main() {
> >   f1();
> >   f2();
> > }
> >
> > This actually demonstrates a slightly different behavior of bfd and
> > gold: When the comdats are the same size (I'm told that's the
> > heuristic) and the local symbol names the DWARF uses to refer to the
> > functions (f1 in this case) - then both DWARF descriptions are
> > resolved to point to the same deduplicated copy of 'f1', eg:
>
> Thanks for the concrete example. I'll study it.
>
> Would you mind telling which DWARF producer/compiler you used and which
> command line flags you used to the compiler and linker invocations?

clang or gcc without any extra flags should suffice here

To get the summarized DWARF I showed above, I used this complete command line:

$ clang++ -g a.cpp b.cpp && llvm-dwarfdump -v -debug-info a.out | grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | sed -e "s/............//"

(using clang and llvm-dwarfdump from LLVM trunk)

> I'd
> like to replicate the produced DWARF but wasn't able to get something
> that used ranges like in your examples. I also wonder about the ODR
> violation: does your example depend on this being C++ or does it
> produce the same issues when it was built as a C program?

I believe C has different "inline" semantics that I'm not as familiar
with - but I /believe/ the actual C standard inline semantics wouldn't
produce the kind of situation that C++ does. (In C++ you define an
inline function in every translation unit it's used in - the compiler
can choose to inline or not, and if it doesn't actually inline then the
object file carries a deduplicable definition of the function and the
linker picks one of those definitions from any of the input object
files - whereas in C the inline function definition, if not inlined, is
discarded by the compiler and the user must have provided a non-inline
definition in one file as usual - so there's no
duplication/deduplication.)

You could use function-sections/gc-sections to observe the "ODR
violation" sort of situation where the addresses go to zero/tombstone
rather than the "two subprograms point to one function" behavior:

eg:

$ clang -g -ffunction-sections -Wl,-gc-sections a.c && llvm-dwarfdump-tot -v -debug-info a.out | grep "DW_TAG\|DW_AT_[^ ]*pc\|DW_AT_ranges\|^ *\[\|DW_AT_name" | sed -e "s/............//"
DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x00000065] = "a.c")
  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000000)
  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000000001, 0x0000000000000001)
     [0x0000000000401110, 0x0000000000401118))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000006)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x00000094] = "f1")
  DW_TAG_subprogram [3]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000401110)
    DW_AT_high_pc [DW_FORM_data4]       (0x00000008)
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x00000097] = "main")
  DW_TAG_base_type [4]
    DW_AT_name [DW_FORM_strp]   ( .debug_str[0x0000009c] = "int")

& I guess now we can show the full variety of tombstone behavior...

(The above example was with bfd ld, using 1 as the tombstone in
debug_ranges and 0 as the tombstone elsewhere, such as in the low_pc of
the "f1" subprogram.) This works until small addresses like 0 or 1 are
part of the valid address range of the program, or until functions get
large enough that a tombstoned range like [0, 6) grows and starts
overlapping the non-gc'd functions - at which point the subprogram
address ranges become ambiguous & you don't know which function you're
in.

Then we've got gold (add "-fuse-ld=gold" to the compilation command),
just snipping the relevant bit of the output:

  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0x0000000000000000, 0x0000000000000006)
     [0x0000000000400510, 0x0000000000400518))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)

Here we can see gold's technique of using "0+addend" as the tombstone
value - which works, again, until your valid address range is that low
or you have large functions. (Or a consumer could special-case zero as
the tombstone - which then works until zero is a valid code address,
until you have empty functions (where range and loc lists would get
terminated prematurely), or until you have a function that starts at a
non-zero addend...)

Then we've got lld's new behavior (which will hopefully be adopted by
the other linkers and the DWARF standard as a more robust solution):

  DW_AT_ranges [DW_FORM_sec_offset]     (0x00000000
     [0xfffffffffffffffe, 0xfffffffffffffffe)
     [0x0000000000201690, 0x0000000000201698))
  DW_TAG_subprogram [2]
    DW_AT_low_pc [DW_FORM_addr] (0xffffffffffffffff)

(The range tombstone here would be 0xffffffffffffffff in DWARFv5, but
needs to be 0xfffffffffffffffe in DWARFv4 to avoid creating unintended
base address selection entries in debug_loc and debug_ranges.)

Which probably works about as well as the other solutions if the
consumer isn't special casing things (& isn't too fussy about the fact
that low_pc + (data4) high_pc might overflow...), and also allows the
consumer to special-case more intentionally without ruling out zero as
a valid address, etc.

