Bug 27924

Summary: ld.so: Support DT_RELR relative relocation format
Product: glibc Reporter: Fangrui Song <i>
Component: dynamic-linkAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: carlos, ccoutant, fweimer, hjl.tools, sam
Priority: P2 Flags: fweimer: security-
Version: 2.35   
Target Milestone: 2.36   
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=27923
https://bugzilla.redhat.com/show_bug.cgi?id=2014699
https://sourceware.org/bugzilla/show_bug.cgi?id=28495
Host: Target:
Build: 2021-06-24 0:00 Last reconfirmed: 2021-06-24 00:00:00
Attachments: A patch
A patch to properly handle zero DT_RELA/DT_REL values

Description Fangrui Song 2021-05-27 18:14:01 UTC
Position independent executables and shared objects built with -Bsymbolic/-Bsymbolic-functions may have many relative relocations (instead of absolute relocations (e.g. R_X86_64_64)).

(The number of R_*_RELATIVE is the dominating factor between a PDE and a PIE in terms of their size differences.)

Relative relocations can be packed in a compact format to save a lot of space. See the last few messages on https://groups.google.com/g/generic-abi/c/bX460iggiKg

An Elf64_Rela takes 24 bytes. 64 consecutive R_*_RELATIVE take 24*64 bytes. In RELR, this just needs 8*2 bytes.

DT_RELR has support in the Linux kernel (arch/arm64/Kconfig ARCH_HAS_RELR) since 2019 https://git.kernel.org/linus/5cf896fb6be3effd9aea455b22213e27be8bdb1d

SHT_RELR/DT_RELR has dumper support in llvm-readobj/llvm-readelf. 
ld.lld --pack-dyn-relocs=relr can encode R_*_RELATIVE in RELR.

GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27923
Comment 1 Fangrui Song 2021-05-27 19:01:46 UTC
It is probably well-known that position independent executables have many relative relocations.

If the Distribution adopts -Bsymbolic-non-weak-functions (safer alternative -Bsymbolic-functions which respects pointer equality for C++ vague linkage definitions), shared objects can benefit a lot as well https://sourceware.org/bugzilla/show_bug.cgi?id=27871

Here is an example of libLLVM-13git.so

# -Bsymbolic-functions (-Bsymbolic-non-weak-functions is similar since -fvisibility-inlines-hidden is already used)
% llvm-readelf -WS symbolic.so | egrep 'Nr|\.rel[ar]'
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 8] .rela.dyn         RELA            00000000003c8d60 3c8d60 27d368 18   A  1   0  8
  [ 9] .rela.plt         RELA            00000000006460c8 6460c8 0020a0 18  AI  1  25  8

# -Bsymbolic-functions --pack-dyn-relocs=relr
% llvm-readelf -WS relr.so | egrep 'Nr|\.rel[ar]'    
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 8] .rela.dyn         RELA            00000000003c8d60 3c8d60 016bd8 18   A  1   0  8
  [ 9] .relr.dyn         RELR            00000000003df938 3df938 007ba8 08   A  0   0  8
  [10] .rela.plt         RELA            00000000003e74e0 3e74e0 0020a0 18  AI  1  26  8
Comment 2 Fangrui Song 2021-05-27 23:20:16 UTC
I can find a Chrome OS patch against the old glibc 2.23. Hope a glibc maintainer can refresh it and apply it in the upstream.

https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/sys-libs/glibc/files/local/glibc-2.23-sht_relr.patch
Comment 3 Carlos O'Donell 2021-06-24 16:08:08 UTC
The proposal to the ABI seems stalled and probably needs to be pushed again to the gABI group with the updated wording for review again? The alternative is not to extend the gABI but consider this a GNU ABI extension. If Google has interest in moving this forward someone needs to be a change champion here and work with upstream.

Objectively I have no problem with SHT_RELR, but it should be documented in our ABI docs *somewhere* so we have a concrete place to discuss future changes and accept those changes.

Today the gABI list is a "consensus" group with Cary Coutant documenting that consensus. The GNU ABI is implemented by the GNU Toolchain, so there is probably some consensus there that needs forming, but we can do that too.
Comment 4 Carlos O'Donell 2021-06-24 16:30:30 UTC
GNU gABI discussion:
https://sourceware.org/legacy-ml/gnu-gabi/2017-q4/msg00005.html
Comment 5 Fangrui Song 2021-06-24 18:03:52 UTC
(In reply to Carlos O'Donell from comment #3)
> The proposal to the ABI seems stalled and probably needs to be pushed again
> to the gABI group with the updated wording for review again? The alternative
> is not to extend the gABI but consider this a GNU ABI extension. If Google
> has interest in moving this forward someone needs to be a change champion
> here and work with upstream.
> 
> Objectively I have no problem with SHT_RELR, but it should be documented in
> our ABI docs *somewhere* so we have a concrete place to discuss future
> changes and accept those changes.
> 
> Today the gABI list is a "consensus" group with Cary Coutant documenting
> that consensus. The GNU ABI is implemented by the GNU Toolchain, so there is
> probably some consensus there that needs forming, but we can do that too.

http://www.sco.com/developers/gabi/latest/contents.html is unmaintained and receives no update since circa 2016. So SHT_RELR is wording adopted by many groups, including Solaris which has many differences with the GNU ABI.

If https://groups.google.com/g/generic-abi/c/9OO5vhxb00Y "Ongoing Maintenance of the gABI" will take some time, perhaps the wording can be added to System V Application Binary Interface Linux Extensions temporarily?
Comment 6 Carlos O'Donell 2021-06-24 18:15:51 UTC
(In reply to Fangrui Song from comment #5)
> (In reply to Carlos O'Donell from comment #3)
> > The proposal to the ABI seems stalled and probably needs to be pushed again
> > to the gABI group with the updated wording for review again? The alternative
> > is not to extend the gABI but consider this a GNU ABI extension. If Google
> > has interest in moving this forward someone needs to be a change champion
> > here and work with upstream.
> > 
> > Objectively I have no problem with SHT_RELR, but it should be documented in
> > our ABI docs *somewhere* so we have a concrete place to discuss future
> > changes and accept those changes.
> > 
> > Today the gABI list is a "consensus" group with Cary Coutant documenting
> > that consensus. The GNU ABI is implemented by the GNU Toolchain, so there is
> > probably some consensus there that needs forming, but we can do that too.
> 
> http://www.sco.com/developers/gabi/latest/contents.html is unmaintained and
> receives no update since circa 2016. So SHT_RELR is wording adopted by many
> groups, including Solaris which has many differences with the GNU ABI.

No update since 2016 does not mean unmaintained. Please speak with Cary Coutant to determine the current status.

If Solaris has adopted SHT_RELR I assume they put it into their OS-specific ABI?

> If https://groups.google.com/g/generic-abi/c/9OO5vhxb00Y "Ongoing
> Maintenance of the gABI" will take some time, perhaps the wording can be
> added to System V Application Binary Interface Linux Extensions temporarily?

It could, but why temporary? If there are objections at the ELF gABI level, and there might be, then we can just look at a Linux extension [1].

We would submit the constants in the scope of the OS-specific extensions, and then binaries with the old ABI would not match, but I don't know if that is an issue you want to address?

[1] Just for the sake of others reading, the generic Linux ABI (GNU ABI) is maintained by HJ with input from various toolchain developers: https://gitlab.com/x86-psABIs/Linux-ABI
Comment 7 Cary Coutant 2021-06-24 21:38:01 UTC
(In reply to Carlos O'Donell from comment #6)
> (In reply to Fangrui Song from comment #5)
> > (In reply to Carlos O'Donell from comment #3)
> > > The proposal to the ABI seems stalled and probably needs to be pushed again
> > > to the gABI group with the updated wording for review again? The alternative
> > > is not to extend the gABI but consider this a GNU ABI extension. If Google
> > > has interest in moving this forward someone needs to be a change champion
> > > here and work with upstream.
> > > 
> > > Objectively I have no problem with SHT_RELR, but it should be documented in
> > > our ABI docs *somewhere* so we have a concrete place to discuss future
> > > changes and accept those changes.
> > > 
> > > Today the gABI list is a "consensus" group with Cary Coutant documenting
> > > that consensus. The GNU ABI is implemented by the GNU Toolchain, so there is
> > > probably some consensus there that needs forming, but we can do that too.
> > 
> > http://www.sco.com/developers/gabi/latest/contents.html is unmaintained and
> > receives no update since circa 2016. So SHT_RELR is wording adopted by many
> > groups, including Solaris which has many differences with the GNU ABI.
> 
> No update since 2016 does not mean unmaintained. Please speak with Cary
> Coutant to determine the current status.

I am now under contract with Xinuos to convert the documentation
to a more maintainable form and maintain it from there. Current plans
are to translate it to ReStructured Text and place it on github.

SHT_RELR is the first of the updates I plan to apply once the
conversion is done.
Comment 8 Cary Coutant 2021-06-24 21:38:30 UTC
Thanks for forwarding!

-cary

On Thu, Jun 24, 2021 at 11:15 AM carlos at redhat dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=27924
>
> --- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to Fangrui Song from comment #5)
> > (In reply to Carlos O'Donell from comment #3)
> > > The proposal to the ABI seems stalled and probably needs to be pushed again
> > > to the gABI group with the updated wording for review again? The alternative
> > > is not to extend the gABI but consider this a GNU ABI extension. If Google
> > > has interest in moving this forward someone needs to be a change champion
> > > here and work with upstream.
> > >
> > > Objectively I have no problem with SHT_RELR, but it should be documented in
> > > our ABI docs *somewhere* so we have a concrete place to discuss future
> > > changes and accept those changes.
> > >
> > > Today the gABI list is a "consensus" group with Cary Coutant documenting
> > > that consensus. The GNU ABI is implemented by the GNU Toolchain, so there is
> > > probably some consensus there that needs forming, but we can do that too.
> >
> > http://www.sco.com/developers/gabi/latest/contents.html is unmaintained and
> > receives no update since circa 2016. So SHT_RELR is wording adopted by many
> > groups, including Solaris which has many differences with the GNU ABI.
>
> No update since 2016 does not mean unmaintained. Please speak with Cary Coutant
> to determine the current status.
>
> If Solaris has adopted SHT_RELR I assume they put it into their OS-specific
> ABI?
>
> > If https://groups.google.com/g/generic-abi/c/9OO5vhxb00Y "Ongoing
> > Maintenance of the gABI" will take some time, perhaps the wording can be
> > added to System V Application Binary Interface Linux Extensions temporarily?
>
> It could, but why temporary? If there are objections at the ELF gABI level, and
> there might be, then we can just look at a Linux extension [1].
>
> We would submit the constants in the scope of the OS-specific extensions, and
> then binaries with the old ABI would not match, but I don't know if that is an
> issue you want to address?
>
> [1] Just for the sake of others reading, the generic Linux ABI (GNU ABI) is
> maintained by HJ with input from various toolchain developers:
> https://gitlab.com/x86-psABIs/Linux-ABI
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
Comment 10 Fangrui Song 2021-10-19 16:51:21 UTC
in-reply-to: https://bugzilla.redhat.com/show_bug.cgi?id=2014699#c1

> * [v2] elf: Support DT_RELR relative relocation format [BZ #27924]
>  * Need official gABI assignment (Carry Coutant status?)
>   * Needs to be reserved verbally by Carry (Action item)
>   * Required for patch acceptance.

s/Carry/Cary.

https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/6psrcLHLAgAJ (2018-02)
https://sourceware.org/bugzilla/show_bug.cgi?id=27924#c7 (2021-06) and
https://sourceware.org/pipermail/libc-alpha/2021-October/131781.html
are verbal by Cary.

> * Need BFD port or a released llvm/lld that supports DT_RELR to review patches.
>  * Minimally a 32-bit and 64-bit port that can build glibc and passes regression testing.
>  * Required for patch review.

LLD since 7.0.0 supports --pack-dyn-relocs=relr.
Both arm and aarch64 ports have users.

ld.lld --pack-dyn-relocs=relr produces working executables on Fuchsia, Linux kernel's arm64 port, FreeBSD-CURRENT, ChromeOS's (patched glibc) today.
Just that the produced executable segfaults on glibc due to lack of support.

> * Official glibc relese with feature?
>  * Need BFD port to support build-many-glibcs to provide testing confidence.
> * HJ: Would like to see a BFD port before a glibc release with the features.
>  * Sufficient to have a released binutils with DT_RELR also.

Wish that a GNU ld maintainer can work on this...

> * Release date of released static linkers with support should be close to glibc release with feature.
>  * When glibc is released users can pick a static linker to test with (binutils or lld).

LLD since 7.0.0 supports --pack-dyn-relocs=relr, so the release date (2018) cannot be close to the glibc version now...
Even in the absence of GNU ld support,  with DT_RELR support, users can use ld.lld --pack-dyn-relocs=relr today.
Comment 11 H.J. Lu 2022-01-05 00:25:15 UTC
The code doesn't work when DT_RELR is enabled in ld.so:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7639766 in _dlerror_run (operate=operate@entry=0xf7639ba0 <dlopen_doit>, args=args@entry=0xffffddb0) at dlerror.c:138
138	  int errcode = GLRO (dl_catch_error) (&objname, &errstring, &malloced,
(gdb) p _rtld_local_ro._dl_catch_error
$1 = (int (*)(const char **, const char **, _Bool *, void (*)(void *), 
    void *)) 0xeffae7f0
(gdb) watch _rtld_local_ro._dl_catch_error
Hardware watchpoint 1: _rtld_local_ro._dl_catch_error
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/global --direct
BFD: /export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/ld.so: unknown type [0x13] section `.relr.dyn'
warning: `/export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/ld.so': Shared library architecture unknown is not compatible with target architecture i386:x86-64.
warning: `/export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/ld.so': Shared library architecture unknown is not compatible with target architecture i386:x86-64.

Hardware watchpoint 1: _rtld_local_ro._dl_catch_error

Old value = (int (*)(const char **, const char **, _Bool *, void (*)(void *), void *)) 0x2a7f0
New value = (int (*)(const char **, const char **, _Bool *, void (*)(void *), void *)) 0xf7fec7f0 <_rtld_catch_error>
0x00007ffff7fcffb7 in _dl_start (arg=<error reading variable: Cannot access memory at address 0xffffde60>) at rtld.c:572
572	      ELF_DYNAMIC_RELOCATE (&bootstrap_map, NULL, 0, 0, 0);
(gdb) c
Continuing.
BFD: /export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/ld.so: unknown type [0x13] section `.relr.dyn'
warning: `/export/build/gnu/tools-build/glibc-gitlab-test/build-x86_64-linux/elf/ld.so': Shared library architecture unknown is not compatible with target architecture i386:x86-64.

Hardware watchpoint 1: _rtld_local_ro._dl_catch_error

Old value = (int (*)(const char **, const char **, _Bool *, void (*)(void *), void *)) 0xf7fec7f0 <_rtld_catch_error>
New value = (int (*)(const char **, const char **, _Bool *, void (*)(void *), void *)) 0xeffae7f0
0x00007ffff7fdae3f in _dl_relocate_object (l=l@entry=0xf7ff9250 <_rtld_local+2736>, scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:288
288	    ELF_DYNAMIC_RELOCATE (l, scope, lazy, consider_profiling, skip_ifunc);
(gdb) 

The problem is that ELF_DYNAMIC_DO_RELR is down twice on ld.so.  The first
time is for RTLD_BOOTSTRAP.  It needs to be fixed.
Comment 12 Fangrui Song 2022-01-05 01:11:06 UTC
(In reply to H.J. Lu from comment #11)
> The code doesn't work when DT_RELR is enabled in ld.so:
> 
> [...]
>
> The problem is that ELF_DYNAMIC_DO_RELR is down twice on ld.so.  The first
> time is for RTLD_BOOTSTRAP.  It needs to be fixed.

I noticed this (building ld.so itself with DT_RELR) but did not investigate it. Thanks for investigation! How to fix it?
Comment 13 H.J. Lu 2022-01-05 01:49:30 UTC
Created attachment 13891 [details]
A patch
Comment 14 H.J. Lu 2022-01-05 01:50:04 UTC
(In reply to Fangrui Song from comment #12)
> (In reply to H.J. Lu from comment #11)
> > The code doesn't work when DT_RELR is enabled in ld.so:
> > 
> > [...]
> >
> > The problem is that ELF_DYNAMIC_DO_RELR is down twice on ld.so.  The first
> > time is for RTLD_BOOTSTRAP.  It needs to be fixed.
> 
> I noticed this (building ld.so itself with DT_RELR) but did not investigate
> it. Thanks for investigation! How to fix it?

See my patch.
Comment 15 Fangrui Song 2022-01-05 03:27:40 UTC
(In reply to H.J. Lu from comment #14)
> (In reply to Fangrui Song from comment #12)
> > (In reply to H.J. Lu from comment #11)
> > > The code doesn't work when DT_RELR is enabled in ld.so:
> > > 
> > > [...]
> > >
> > > The problem is that ELF_DYNAMIC_DO_RELR is down twice on ld.so.  The first
> > > time is for RTLD_BOOTSTRAP.  It needs to be fixed.
> > 
> > I noticed this (building ld.so itself with DT_RELR) but did not investigate
> > it. Thanks for investigation! How to fix it?
> 
> See my patch.

Thanks. Folded into https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/maskray/relr
Comment 16 H.J. Lu 2022-01-05 03:39:41 UTC
There is another problem.  A static PIE with DT_RELR can have
0 DT_RELA entry when all relative relocations are in DT_RELR:

[hjl@gnu-tgl-3 DT_RELR-3]$ readelf -d x-static

Dynamic section at offset 0xbbd88 contains 24 entries:
  Tag        Type                         Name/Value
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x8d630
 0x0000000000000019 (INIT_ARRAY)         0xb9750
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0xb9758
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x368
 0x0000000000000005 (STRTAB)             0x3a0
 0x0000000000000006 (SYMTAB)             0x388
 0x000000000000000a (STRSZ)              1 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0xbd000
 0x0000000000000002 (PLTRELSZ)           576 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x3a8
 0x0000000000000007 (RELA)               0x0
 0x0000000000000008 (RELASZ)             0 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffb (FLAGS_1)            Flags: PIE
 0x0000000000000024 (RELR)               0x5e8
 0x0000000000000023 (RELRSZ)             312 (bytes)
 0x0000000000000025 (RELRENT)            8 (bytes)
 0x0000000000000000 (NULL)               0x0
[hjl@gnu-tgl-3 DT_RELR-3]$ 

ld.so doesn't work with 0 DT_RELA entry.
Comment 17 H.J. Lu 2022-01-05 03:42:19 UTC
Created attachment 13892 [details]
A patch to properly handle zero DT_RELA/DT_REL values
Comment 18 H.J. Lu 2022-01-05 22:54:38 UTC
On Fedora 35/x86-64, I saw

FAIL: nptl/test-cond-printers
FAIL: nptl/test-condattr-printers
FAIL: nptl/test-mutex-printers
FAIL: nptl/test-mutexattr-printers
FAIL: nptl/test-rwlock-printers
FAIL: nptl/test-rwlockattr-printers
FAIL: nptl/tst-pthread-gdb-attach-static

when DT_RELR is enabled on glibc itself:

$ cat nptl/test-cond-printers.out 
Error: Response does not match the expected pattern.
Command: print *condvar
Expected pattern: pthread_cond_t
Response:  Cannot access memory at address 0xffffd858
(gdb) 
$ 

Is GDB DT_RELR change needed for these tests?
Comment 19 H.J. Lu 2022-01-08 20:36:16 UTC
It is too late for 2.35.  I'd like to support it in 2.36 with binutils 2.38
after 2.35 is branched.
Comment 20 H.J. Lu 2022-01-09 17:10:07 UTC
(In reply to H.J. Lu from comment #18)
> On Fedora 35/x86-64, I saw
> 
> FAIL: nptl/test-cond-printers
> FAIL: nptl/test-condattr-printers
> FAIL: nptl/test-mutex-printers
> FAIL: nptl/test-mutexattr-printers
> FAIL: nptl/test-rwlock-printers
> FAIL: nptl/test-rwlockattr-printers

This is caused by PR 28757.
Comment 21 Fangrui Song 2022-06-07 08:05:19 UTC
Resolved by some changes pushed on 2022-04-26: https://sourceware.org/pipermail/libc-alpha/2022-April/138085.html ([PATCH v11 0/7] Support DT_RELR relative relocation format)