Suppress the fetch of an archive member via --defsym (glibc/elf/librtld.map.o)

Fangrui Song maskray@google.com
Mon Mar 16 18:31:01 GMT 2020


>>>On 2020-03-16, H.J. Lu wrote:
>>>Glibc build requires a linker compatible with ld.  Can you provide an lld
>>>option to make lld compatible with ld for cases like this?
>>
>>As an lld contributor, I am happy to cooperate and adapt lld if the proposed semantics are reasonable.
>>
>>I am concerned that --defsym's order dependence with respect to archive files is not obvious, given -u's behavior:
>>
>># -u inserts an undefined which fetches b.a(b.o)
>>ld.bfd -u foo b.a       # b.a(b.o) is fetched. free is present
>># This can't be order dependent: b.a (not in a group) would already have been dropped by the time -u is seen
>>ld.bfd b.a -u foo       # b.a(b.o) is fetched. free is present
>>
>>
>>Some observations:
>>
>>
>># GNU ld --defsym interacts with an archive
>>ld.bfd a.o b.a --defsym foo=0  # b.a(b.o) is fetched. free is present
>>ld.bfd --defsym foo=0 a.o b.a  # b.a(b.o) is not fetched. free is absent
>>
>># a.x contains one line `foo = 0;`
>># -T a.x is similar to --defsym
>>ld.bfd a.o b.a -T a.x -o a  # b.a(b.o) is fetched. free is present
>>ld.bfd -T a.x a.o b.a -o a  # b.a(b.o) is not fetched. free is absent
>>
>># -u is usually order independent
>># The second can't be order dependent: b.a would already have been dropped by the time -u is seen
>>ld.bfd -u foo b.a       # b.a(b.o) is fetched. free is present
>>ld.bfd b.a -u foo       # b.a(b.o) is fetched. free is present
>>
>>
>># gold --defsym is order independent. In the more complex glibc elf/librtld.map.o case, this happens to make it work
>>gold a.o b.a --defsym foo=0    # b.a(b.o) is not fetched. free is absent
>>gold --defsym foo=0 a.o b.a    # b.a(b.o) is not fetched. free is absent
>>
>># gold --export-dynamic-symbol (not in GNU ld) implies -u
>>gold --export-dynamic-symbol foo b.a    # b.a(b.o) is fetched. free is present
>>gold b.a --export-dynamic-symbol foo    # b.a(b.o) is fetched. free is present
>>
>>
>># lld --defsym is order independent; --defsym is processed last. For elf/librtld.map.o it reports a multiple definition error.
>>ld.lld a.o b.a --defsym foo=0  # b.a(b.o) is fetched. free is present
>>ld.lld --defsym foo=0 a.o b.a  # b.a(b.o) is fetched. free is present
>>
>>
>>If we aim for robustness and want to support the librtld.map.o trick (I will add a note that gold happens to work),
>>I hope both of the following suppress b.a(b.o):
>>
>>  ld.bfd a.o b.a --defsym foo=0
>>  ld.bfd --defsym foo=0 a.o b.a
>>
>>(a) Since --defsym is similar to a symbol assignment in a -T script, we would hope -T does not behave too differently.
>>(b) Input files listed in a linker script should, at least, remain ordered with respect to input files on the command line.
>>
>>(a)+(b) => symbol assignments specified by -T need to be declared early, but input files specified by -T remain ordered with respect to input files on the command line.
>>
>>
>>For linker portability, projects using this trick (currently glibc is the only one) should place --defsym first to work with
>>existing releases of GNU ld.
>>
>>The added librtld.map.o code is related to https://sourceware.org/bugzilla/show_bug.cgi?id=25486
>
>Gold is irrelevant here since it isn't supported to build glibc:
>
>https://sourceware.org/bugzilla/show_bug.cgi?id=24148
>
>There are also other gold bugs which may impact glibc build.   Please remove
>gold from this discussion.

Thanks for the link. I am always eager to learn more about linkers.

My motivation is admittedly selfish: I dream of making glibc compilable with clang and linkable with lld.
As a contributor to lld and various LLVM tools, I think I am best positioned to improve the situation :)
At least for lld, it seems very few portability patches are needed.
(At least for the default configuration. There are numerous other configurations I do not yet know how to test; I know very little about glibc.)

To be clear, I won't add workarounds for quality-of-implementation issues in clang+lld.
I mostly target lld HEAD (lld < 8 is not very reliable).

To make my intentions clearer: this effort is in the spirit of https://gcc.gnu.org/wiki/cauldron2019 (GCC/LLVM Collaboration BoF).



After more thought, I hope we can stop relying on the --defsym trick for malloc.os symbols
(which was already in use before commit 3a0ecccb599a6b1ad4b149dc569c0080e92d057b).

lld processes input files and -T/-u/--defsym in the following order:

1. parseFile(files[i])  // shared, relocatable, archive, linker scripts. Most archive member fetches happen in this step
2. handle -u            // this can fetch more archive members
3. declare symbols specified by -T and --defsym // too late to suppress archive member fetches

(I suspect gold's overall strategy is similar, though when handling --start-group it
gets into an undesirable "mixed" state: my simple a.o b.a example and librtld.map.o exhibit
different behaviors.)
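To make the ordering concrete, here is a toy Python model of the three lld steps above. It is my own sketch, not lld's code: the symbol tables (a.o references foo; the archive member b.a(b.o) defines foo and free) are assumptions mirroring the earlier a.o/b.a example, and -T assignments are left out for brevity.

```python
# Toy model of lld's order: (1) parse files, (2) -u, (3) --defsym.
# Hypothetical inputs: a.o references foo; b.a(b.o) defines foo and free.
OBJECTS = {"a.o": (set(), {"foo"})}                # name -> (defs, refs)
MEMBERS = {"b.a(b.o)": ({"foo", "free"}, set())}   # member -> (defs, refs)
ARCHIVES = {"b.a": ["b.a(b.o)"]}


def lld_like_fetches(cmdline):
    """Return fetched archive members for a command line such as
    ["a.o", "b.a", ("--defsym", "foo")] or ["b.a", ("-u", "foo")]."""
    defined, undefined, lazy, fetched = set(), set(), {}, []

    def options(flag):
        return [item[1] for item in cmdline
                if isinstance(item, tuple) and item[0] == flag]

    def load(defs, refs):
        # Record definitions, then any references that are still unresolved.
        defined.update(defs)
        undefined.update(refs - defined)
        undefined.difference_update(defined)

    def resolve():
        # Fetch a member while some undefined symbol has a lazy definition.
        while pending := sorted(s for s in undefined if s in lazy):
            member = lazy[pending[0]]
            for sym in [s for s, m in lazy.items() if m == member]:
                del lazy[sym]
            fetched.append(member)
            load(*MEMBERS[member])

    # Step 1: parse input files in command-line order.
    for item in cmdline:
        if item in OBJECTS:
            load(*OBJECTS[item])
        elif item in ARCHIVES:
            for member in ARCHIVES[item]:
                for sym in MEMBERS[member][0] - defined:
                    lazy.setdefault(sym, member)
        resolve()
    # Step 2: -u adds undefined symbols, which can fetch more members.
    undefined.update(set(options("-u")) - defined)
    resolve()
    # Step 3: --defsym is declared last, after all fetching is done.
    for sym in options("--defsym"):
        defined.add(sym)
        undefined.discard(sym)
    return fetched
```

Because --defsym only takes effect in step 3, both lld_like_fetches(["a.o", "b.a", ("--defsym", "foo")]) and lld_like_fetches([("--defsym", "foo"), "a.o", "b.a"]) return ["b.a(b.o)"], matching the ld.lld results quoted earlier.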

I could make lld match GNU ld by adding a separate --defsym step, but the logic would then be
inconsistent with -T processing:

1. declare symbols specified by --defsym
2. parseFile(files[i])
3. handle -u
4. declare symbols specified by -T
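A toy Python model of this hypothetical four-step order (again a sketch with assumed inputs, not a proposed patch) shows the inconsistency: --defsym foo=0 suppresses the fetch of b.a(b.o) regardless of its position, while -T a.x, where a.x contains `foo = 0;`, does not. The symbol tables for a.o and b.a(b.o) are assumptions mirroring the earlier example.

```python
# Toy model of the 4-step order: --defsym, files, -u, then -T assignments.
# Hypothetical inputs: a.o references foo; b.a(b.o) defines foo and free;
# the script a.x contains `foo = 0;`.
OBJECTS = {"a.o": (set(), {"foo"})}                # name -> (defs, refs)
MEMBERS = {"b.a(b.o)": ({"foo", "free"}, set())}   # member -> (defs, refs)
ARCHIVES = {"b.a": ["b.a(b.o)"]}
SCRIPTS = {"a.x": {"foo"}}                         # script -> assigned symbols


def four_step_fetches(cmdline):
    """Return fetched archive members. Items are "a.o", "b.a",
    ("--defsym", sym), ("-u", sym) or ("-T", script)."""
    defined, undefined, lazy, fetched = set(), set(), {}, []

    def options(flag):
        return [item[1] for item in cmdline
                if isinstance(item, tuple) and item[0] == flag]

    def load(defs, refs):
        defined.update(defs)
        undefined.update(refs - defined)
        undefined.difference_update(defined)

    def resolve():
        # Fetch a member while some undefined symbol has a lazy definition.
        while pending := sorted(s for s in undefined if s in lazy):
            member = lazy[pending[0]]
            for sym in [s for s, m in lazy.items() if m == member]:
                del lazy[sym]
            fetched.append(member)
            load(*MEMBERS[member])

    # 1. --defsym symbols are declared before any input file is read.
    defined.update(options("--defsym"))
    # 2. Parse input files in order; archives only offer lazy definitions.
    for item in cmdline:
        if item in OBJECTS:
            load(*OBJECTS[item])
        elif item in ARCHIVES:
            for member in ARCHIVES[item]:
                for sym in MEMBERS[member][0] - defined:
                    lazy.setdefault(sym, member)
        resolve()
    # 3. -u can still fetch more members.
    undefined.update(set(options("-u")) - defined)
    resolve()
    # 4. -T assignments are declared last, after fetching is done.
    for script in options("-T"):
        defined.update(SCRIPTS[script])
    return fetched
```

Under this order, -T a.x a.o b.a still fetches b.a(b.o), whereas GNU ld does not fetch it for that command line.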

Note that --defsym accepts arbitrary expressions, whose st_shndx/st_value are not finalized until later in the link.
Postponing their effect as late as possible could, in theory, make implementations (ld/ldlang.c,
ld/ldexp.c) simpler.

Without the --defsym trick, the following scheme may work:

(1) remove malloc.os from libc_pic.a
(2) ld -r -( dl-allobjs.os libc_pic.a -) -Map librtld.map
(3) add malloc.os back to libc_pic.a
(4) process librtld.map and get the libc dependencies
...
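Steps (1) to (3) could be scripted roughly as in the following Python sketch. The file names are placeholders taken from the list above; the `-o librtld.map.o` output name and the malloc.os path are assumptions, and step (4) is not covered.

```python
import os
import subprocess


def librtld_map_commands(archive="libc_pic.a", member_path="malloc.os",
                         objs=("dl-allobjs.os",), map_file="librtld.map",
                         output="librtld.map.o"):
    """Return argv lists for steps (1)-(3); all paths are placeholders."""
    member = os.path.basename(member_path)       # name stored in the archive
    remove = ["ar", "d", archive, member]                       # (1)
    link = ["ld", "-r", "-o", output, "-(", *objs, archive,     # (2)
            "-)", "-Map", map_file]
    restore = ["ar", "r", archive, member_path]                 # (3)
    return remove, link, restore


def run_scheme(**kwargs):
    remove, link, restore = librtld_map_commands(**kwargs)
    subprocess.run(remove, check=True)
    try:
        subprocess.run(link, check=True)
    finally:
        # Put malloc.os back even if the link fails.
        subprocess.run(restore, check=True)
```

The try/finally keeps a failed link from leaving libc_pic.a without malloc.os. With GNU ar, the re-added member is appended at the end of the archive, so member order changes.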

(1) and (3) could be avoided if GNU ld supported --start-lib/--end-lib (as gold and lld do):
https://sourceware.org/bugzilla/show_bug.cgi?id=24600


As to why ld a.o b.a --defsym foo=0 and ld --defsym foo=0 a.o b.a should behave the same:
order dependence is not robust, and we should avoid subtle failures where possible.

