How to sort mixed DWARF32 and DWARF64 .debug_*

Fangrui Song
Fri Nov 13 18:31:22 GMT 2020


On 2020-11-13, Michael Matz wrote:
>On Fri, 13 Nov 2020, Nick Clifton via Binutils wrote:
>> > Let me rephrase the problem:
>> > If a .debug_* output section S can be larger than 32-bit and its section
>> > offset is referenced by a DWARF32 input section of itself or another
>> > .debug_* output section,
>> > the relocation may be subject to 32-bit relocation overflow.
>> But in this case using DWARF32 is the wrong thing to do.  The producers
>> should be generating DWARF64.  Mitigation is all very well, but it will
>> not solve the overall problem, nor will it guarantee good debug info.
>> > (We can imagine that for a known large project, most compilation units
>> > are DWARF64.
>> > Some archives may have DWARF32 debug info and may be difficult to rebuild.
>> > This partition scheme can solve the problems.)
>> Ok, although I still think that the onus should be on the library maintainers
>> to rebuild their binaries.  But I take your point that this might not always
>> be practical.
>The typical example in such cases are the crt*.o files.  (As are
>system-provided static archives, no matter how bad we think such practice
>is :) libc_nonshared.a? ).

In a link,

... user_object_files.o libs.a libstdc++.a libc_nonshared.a libgcc.a libgcc_eh.a crtn.o crtend.o

Even if user object files are all DWARF64, it may be impractical to
ensure all the system libraries/object files are DWARF64. Moreover, the linked libraries
may not all be DWARF64. When a user has relocation overflow problems,
the object files and libraries they control usually dominate the total
size. The linked libraries contribute only a small portion, but that
small portion can still cause overflows if it is not ordered before DWARF64.
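
To make the ordering argument concrete, here is a rough sketch in Python
(not linker code). The input names and sizes are made up, and it simplifies
things by treating "the start offset of a DWARF32 contribution does not fit
in 32 bits" as the overflow condition:

```python
# Hypothetical model: names and sizes are invented for illustration only.
UINT32_MAX = 0xFFFFFFFF
GIB = 1 << 30

def layout(inputs):
    """Assign offsets within one output section, in the given input order.

    inputs: list of (name, is_dwarf64, size) tuples.
    Returns a list of (name, is_dwarf64, start_offset) tuples.
    """
    placed, offset = [], 0
    for name, is_dwarf64, size in inputs:
        placed.append((name, is_dwarf64, offset))
        offset += size
    return placed

def overflowing(placed):
    """Names of DWARF32 contributions whose start offset needs more than 32 bits."""
    return [name for name, is_dwarf64, start in placed
            if not is_dwarf64 and start > UINT32_MAX]

inputs = [
    ("user_a.o", True, 3 * GIB),                 # DWARF64, large
    ("user_b.o", True, 2 * GIB),                 # DWARF64, large
    ("libc_nonshared.a(x.o)", False, 1 << 20),   # DWARF32, small
    ("crtend.o", False, 1 << 12),                # DWARF32, small
]

command_line_order = layout(inputs)
# sorted() is stable, so the relative order inside each group is preserved.
dwarf32_first = layout(sorted(inputs, key=lambda s: s[1]))

print(overflowing(command_line_order))  # ['libc_nonshared.a(x.o)', 'crtend.o']
print(overflowing(dwarf32_first))       # []
```

In command-line order the two small DWARF32 contributions land above 4GiB;
with DWARF32 placed first they start near offset 0 and nothing overflows.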

For such linked libraries, shipping two copies (DWARF32 and DWARF64) seems to
be asking too much...

>On Fri, 13 Nov 2020, Nick Clifton via Binutils wrote:
>> Does such an option --dwarf32-before-dwarf64 look good to binutils folks?
>Would this result in multiple copies of the same section ?  Eg would the
>output file contain say two .debug_info sections, one for 32-bit dwarf
>and one for 64-bit dwarf, and the second one would be further down in the
>memory map ?

This is actually another linker inconsistency problem I want to resolve:)
In GNU ld, if there are two .debug_info output section descriptions,
the contents are merged, while in Gold and LLD, two .debug_info 0 : { ...  }
descriptions lead to two output sections. See below.

>I do not actually have any objections to the idea, although it does occur
>to me that maybe it should be extended to cover other types of section as
>well.  (Eg 32-bits vs 64-bit notes).
>Hmmm - maybe this would be better expressed in the linker script, rather
>than as a command line option.  For example:
>  .debug_info     0 : { *(.debug_info${RELOCATING+ .gnu.linkonce.wi.*}) }
>  .debug_abbrev   0 : { *(.debug_abbrev) }
>  .debug_line     0 : { *(.debug_line${RELOCATING+ .debug_line.* .debug_line_end}) }
>  [...]
>Could be replaced by:
>  .debug_info     0 : { IS_32BIT(*(.debug_info${RELOCATING+ .gnu.linkonce.wi.*})) }
>  .debug_abbrev   0 : { IS_32BIT(*(.debug_abbrev)) }
>  .debug_line     0 : { IS_32BIT(*(.debug_line${RELOCATING+ .debug_line.* .debug_line_end})) }
>  [...and then later on....]
>  .debug_info     0 : { IS_64BIT(*(.debug_info${RELOCATING+ .gnu.linkonce.wi.*})) }
>  .debug_abbrev   0 : { IS_64BIT(*(.debug_abbrev)) }
>  .debug_line     0 : { IS_64BIT(*(.debug_line${RELOCATING+ .debug_line.* .debug_line_end})) }
>   [...]
>This would work for any section types that come in 32-bit and 64-bit flavours...
raised the section type idea: we could assign a dedicated section type for
DWARF64 .debug_* sections. The thread also raised ideas about using sh_info.

The section type idea actually looks quite good to me. There are some features we will need:

* we might need to ask on generic-abi whether
   Solaris/HP-UX/... folks are happy with a dedicated section type, if not
   then perhaps we could use SHT_GNU_*
* we need a way in linker scripts to match input sections by type. AFAIK
   there is no existing mechanism.

If we invent a keyword (say, TYPE) to match sections by type, we could use

   .debug_info     0 : { *(TYPE (SHT_PROGBITS) .debug_info${RELOCATING+ .gnu.linkonce.wi.*}) }
   .debug_info     0 : { *(TYPE (SHT_GNU_DWARF64) .debug_info) }


   .debug_info     0 : { *(TYPE (SHT_PROGBITS) .debug_info${RELOCATING+ .gnu.linkonce.wi.*} TYPE (SHT_GNU_DWARF64) .debug_info) }

>> I'd be happy to act as a bridge communicating thoughts from both sides:)
>Are you volunteering to write a patch as well ? :-)  Not to worry, I will
>have a go myself as long as we can agree on a solution that works for both
>the binutils and lld communities.

I am afraid that my ld knowledge is not solid enough to work on such a
large feature:( Nick, can you drive it? :-)

Last, let me share my experiments with the
detecting-DWARF64-by-first-relocation approach. An important guideline is that
we should avoid parsing section contents, even lightweight header parsing
(this is contrary to the "smart format, dumb linker" spirit and can lead to
performance issues).

The first relocation of a .debug_* section being of a 64-bit absolute relocation type (e.g.
R_X86_64_64, R_PPC64_ADDR64) is a good indicator that it is a DWARF64 section.
To list a few:

* .debug_info: the first relocation is a .debug_abbrev offset
* .debug_names references .debug_info: the first relocation is a .debug_info offset
* .debug_aranges references .debug_info: the first relocation is a .debug_info offset
* .debug_str_offsets references .debug_str: the first relocation is a .debug_str offset

(some "is" above probably should be "is usually")
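
A minimal sketch of that check, in Python rather than linker code. The
relocation type names are the real ones mentioned above; the function name
and the "list of relocation type names in order" data model are assumptions
made up for this illustration:

```python
# Relocation type names come from the text; everything else is hypothetical.
ABS64_RELOCS = {"R_X86_64_64", "R_PPC64_ADDR64"}

def is_dwarf64_by_first_reloc(reloc_types):
    """Guess whether a .debug_* input section is DWARF64.

    reloc_types: relocation type names for the section, in order.
    Returns True/False, or None when there are no relocations to look at.
    """
    if not reloc_types:
        return None  # undecidable from relocations alone
    return reloc_types[0] in ABS64_RELOCS

# A DWARF32 .debug_info: 32-bit .debug_abbrev offset first, 64-bit addresses later.
print(is_dwarf64_by_first_reloc(["R_X86_64_32", "R_X86_64_64"]))  # False
# A DWARF64 .debug_info: the .debug_abbrev offset is itself 64-bit.
print(is_dwarf64_by_first_reloc(["R_X86_64_64", "R_X86_64_64"]))  # True
# A section with no relocations at all.
print(is_dwarf64_by_first_reloc([]))                               # None
```

Only the first relocation is inspected because later 64-bit relocations
(e.g. code addresses) also appear in DWARF32 sections on 64-bit targets.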

However, there is some difficulty when deciding whether a .debug_str is DWARF64 (.debug_str
can even be larger than .debug_info and definitely needs DWARF32/DWARF64 sorting).

In DWARF v5, if .debug_str_offsets is used, it has relocations referencing .debug_str =>
this is good: we can mark the referenced .debug_* section as DWARF64 based on the
first relocation of the referencing .debug_* section.

In DWARF v4 or if .debug_str_offsets is not used, it is a problem. A heuristic is:
if an input section in a file is marked DWARF64, we mark all other .debug_* sections
in that file as DWARF64.
This makes me feel a bit uneasy because for an output section description

   .debug_str 0 : { *(.debug_str) }

Now the behavior of `*` (or of a `SORT_*` keyword, if we invent one) is also dependent on other output sections.
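
A rough Python sketch of the per-file heuristic, to show where the
cross-section dependency comes from. The function name and the dict-based
model of an object file are hypothetical; only the relocation type names
are real:

```python
# Hypothetical model of one object file: section name -> ordered relocation types.
ABS64_RELOCS = {"R_X86_64_64", "R_PPC64_ADDR64"}

def classify_debug_sections(sections):
    """Apply the per-file heuristic to the .debug_* sections of one file.

    If any .debug_* section starts with a 64-bit absolute relocation,
    every .debug_* section in the file is treated as DWARF64.
    Returns a dict: section name -> True (DWARF64) or False (DWARF32).
    """
    debug = {name: relocs for name, relocs in sections.items()
             if name.startswith(".debug_")}
    any_dwarf64 = any(relocs and relocs[0] in ABS64_RELOCS
                      for relocs in debug.values())
    return {name: any_dwarf64 for name in debug}

dwarf64_file = {
    ".text": ["R_X86_64_PLT32"],
    ".debug_info": ["R_X86_64_64"],
    ".debug_str": [],            # no relocations of its own
}
dwarf32_file = {
    ".debug_info": ["R_X86_64_32", "R_X86_64_64"],
    ".debug_str": [],
}

print(classify_debug_sections(dwarf64_file))
# {'.debug_info': True, '.debug_str': True}
print(classify_debug_sections(dwarf32_file))
# {'.debug_info': False, '.debug_str': False}
```

Here the classification of .debug_str is decided entirely by .debug_info in
the same file, which is the cross-section dependency I am uneasy about.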

Coming to this, I feel that deciding by relocation types looks pretty hacky.

Having thought about both approaches (section type vs checking relocations), I prefer a new section type.
(There is a question whether a new section type works with other binary formats.
While I mostly care about ELF and don't mind if the scheme only works for ELF,
it'd be nice if the section type idea works for some other binary formats, but I don't know about
linker scripts on non-ELF:( )
