Can DT_RELR catch up glibc 2.35?

Rich Felker dalias@libc.org
Fri Nov 19 19:18:52 GMT 2021


On Wed, Nov 17, 2021 at 04:30:25PM -0800, Fangrui Song wrote:
> On 2021-11-17, H.J. Lu wrote:
> >On Wed, Nov 17, 2021 at 4:46 AM Adhemerval Zanella
> ><adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >>On 16/11/2021 21:26, H.J. Lu wrote:
> >>> On Tue, Nov 16, 2021 at 1:07 PM Adhemerval Zanella
> >>> <adhemerval.zanella@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/11/2021 04:47, Fangrui Song wrote:
> >>>>> I am glad that https://sourceware.org/pipermail/libc-alpha/2021-October/132029.html
> >>>>> ("[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924]") gets
> >>>>> some traction and many folks acknowledge the size benefit.
> >>>>> (On my Arch Linux, I measured 8% decrease for my /usr/bin.)
> >>>>
> >>>> I brought this up on the weekly glibc call two weeks ago and, if I recall correctly,
> >>>> the *main* issue is that we need a proper generic ABI definition published to move this
> >>>> forward on the glibc side (H.J. Lu was adamant about this).
> >>>>
> >>>> For my part, the current status, where we have multiple systems that already support
> >>>> it (android, chromeos, freebsd) and a toolchain that supports building/checking
> >>>> glibc on at least 4 different ABIs (lld 13 on x86 and arm), is good enough.
> >>>>
> >>>> The lack of proper testing with bfd might be a drawback, since we lack a way
> >>>> to generate binaries without linker support.
> >>>>
> >>>>>
> >>>>> There are two potential issues.
> >>>>>
> >>>>> 1. Lack of "Time travel compatibility" detector
> >>>>> 2. Some folks feel that being unable to test with scripts/build-many-glibcs.py is a problem.
> >>>>>   (ld.lld --pack-dyn-relocs=relr (since July 2018) is the only linker implementation
> >>>>>   and scripts/build-many-glibcs.py doesn't have an lld configuration)
> >>>>>
> >>>>> Let me address them for you.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> 1.
> >>>>>
> >>>>> "Time travel compatibility" means running a new object on an old system.
> >>>>> A new object using DT_RELR doesn't have the R_*_RELATIVE part in
> >>>>> .rel.dyn/.rela.dyn and is destined to crash.
> >>>>>
> >>>>> If the GNU ld implementation (which may take a while) adopts an
> >>>>> undefined versioned .dynsym symbol (e.g. _dl_have_relr
> >>>>> https://sourceware.org/pipermail/binutils/2021-October/118347.html),
> >>>>> we can guarantee old ld.so will report an error.
> >>>>> The undefined symbol needs to be versioned because ld -shared (default
> >>>>> to --allow-shlib-undefined) does not error on unversioned symbols. Say
> >>>>> GNU ld adopts something like _dl_have_relr@GLIBC_2.40 . Now it is funny as GNU
> >>>>> ld needs to know the glibc version "GLIBC_2.40", not just the stem
> >>>>> glibc-flavored symbol name "_dl_have_relr".
> >>>>
> >>>> This might be troublesome to backport, since it would require to use a higher
> >>>> version than the baseline one.  I am not sure if distro will be willing or plan
> >>>> to backport such feature though.
> >>>>
> >>>>>
> >>>>> There are non-Linux OSes which don't like a "_dl_have_relr" symbol name.
> >>>>> GNU ld would have to provide options in two flavors, one with
> >>>>> _dl_have_relr@GLIBC_2.40, one without. Among glibc systems, there are
> >>>>> plenty of distros which don't rigidly require a friendly
> >>>>> diagnostic for "time travel compatibility", e.g. I am pretty sure many
> >>>>> Gentoo Linux folks doing aggressive optimizations know that their
> >>>>> executables don't run on old systems.
> >>>>
> >>>> I think even other Linux libcs, such as musl, won't be willing to support
> >>>> tying DT_RELR to the existence of a loader/libc symbol (musl even less because
> >>>> it explicitly does not support symbol versioning).
> >>>>
> >>>>>
> >>>>> An alternative to _dl_have_relr is EI_ABIVERSION. That is probably even
> >>>>> less appealing because bumping the version locks out many ELF consumers.
> >>>>> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#ei_abiversion
> >>>>> In addition, I noticed that Debian ld.so 2.32 just seems to ignore EI_ABIVERSION.
> >>>>
> >>>> The problem with EI_ABIVERSION is a limitation of glibc, which only checks
> >>>> EI_ABIVERSION in open_verify(), and this is not called on default process
> >>>> execution, where the kernel is the one responsible for loading both the binary
> >>>> and the interpreter:
> >>>>
> >>>> ---
> >>>> $ cat test.c
> >>>> #include <stdio.h>
> >>>>
> >>>> int main ()
> >>>> {
> >>>>   return 0;
> >>>> }
> >>>> $ gdb ./test
> >>>> [...]
> >>>> (gdb) starti
> >>>> [...]
> >>>> process 1420253
> >>>> Mapped address spaces:
> >>>>
> >>>>           Start Addr           End Addr       Size     Offset objfile
> >>>>       0x555555554000     0x555555555000     0x1000        0x0 /tmp/test/test
> >>>>       0x555555555000     0x555555556000     0x1000     0x1000 /tmp/test/test
> >>>>       0x555555556000     0x555555557000     0x1000     0x2000 /tmp/test/test
> >>>>       0x555555557000     0x555555559000     0x2000     0x2000 /tmp/test/test
> >>>>       0x7ffff7fc2000     0x7ffff7fc6000     0x4000        0x0 [vvar]
> >>>>       0x7ffff7fc6000     0x7ffff7fc8000     0x2000        0x0 [vdso]
> >>>>       0x7ffff7fc8000     0x7ffff7fc9000     0x1000        0x0 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7fc9000     0x7ffff7ff1000    0x28000     0x1000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7ff1000     0x7ffff7ffb000     0xa000    0x29000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffff7ffb000     0x7ffff7fff000     0x4000    0x32000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
> >>>>       0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
> >>>>   0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]
> >>>> ---
> >>>>
> >>>> However, the check is correctly performed for any loaded library and/or if the
> >>>> executable is run by invoking the loader directly:
> >>>>
> >>>> ---
> >>>> $ readelf -h test
> >>>> ELF Header:
> >>>>   Magic:   7f 45 4c 46 02 01 01 00 *04* 00 00 00 00 00 00 00
> >>>> [...]
> >>>> $ /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./test
> >>>> ./test: error while loading shared libraries: ./test: ELF file ABI version invalid
> >>>> ---
> >>>>
> >>>> I think this is a bug, since it basically defeats the EI_ABIVERSION check
> >>>> and gives programs executed by invoking the loader different semantics
> >>>> from those executed through the execve syscall.
> >>>>
> >>>> Afaik the kernel does not pass such information in the auxv vector (we might ask
> >>>> for an AT_EHDR eventually), so a potential fix will cost us some extra
> >>>> syscalls on every program execution (to read and check the ELF header with
> >>>> a test similar to the one done in open_verify()).
> >>>>
> >>>> However it does *not* help on older glibc which will still accept old binaries.
> >>>>
> >>>>>
> >>>>> % r2 -wqc 'wx 22 @ 8' a; readelf -Wh a | grep ABI; ./a
> >>>>>   OS/ABI:                            UNIX - GNU
> >>>>>   ABI Version:                       34
> >>>>> hello
> >>>>>
> >>>>
> >>>> I am not really sure if 'time travel compatibility' is really an issue,
> >>>> although I saw reports where users try to use a chromeos library on glibc that
> >>>> fails in some strange ways (most likely due to DT_RELR). If a user is deploying
> >>>> an *opt-in* feature that requires proper dynamic loader support, I would
> >>>> expect them to know the environment they are targeting.
> >>>>
> >>>> So I think the best course of action for this issue is indeed to fix EI_ABIVERSION
> >>>> and make DT_RELR a new 'libc-abis' entry.  We might backport the EI_ABIVERSION
> >>>> fix to some older releases, and distros that want to use DT_RELR should do so as well.
> >>>
> >>> Given that EI_ABIVERSION doesn't really work, should we revisit my
> >>> GNU_PROPERTY_1_GLIBC_2_NEEDED proposal:
> >>>
> >>> https://sourceware.org/pipermail/binutils/2021-October/118292.html
> >>
> >>The GNU_PROPERTY_1_GLIBC_2_NEEDED still does not really help much if the idea
> >>is to backport DT_RELR to older versions, and it still adds logic to the static
> >>linker about glibc symbol versions.  I would like the static linker to know as
> >>little as possible about the glibc version; EI_ABIVERSION is way simpler and
> >>already expresses ABI extensions.
> >>
> >>I still think that for DT_RELR, instead of inventing another GNU extension, we might
> >>fix EI_ABIVERSION and use it properly.   Checking with the kernel, I think it should
> >>be simple: the elf header is located at the AT_PHDR - sizeof (ElfW(Ehdr)), so we
> >>can refactor the tests in open_verify and use them in rtld.c for the case where execve()
> >>is called for the executable.
> >
> >The scheme should work for older systems without changes.  Can we add
> >GLIBC_PRIVATE_DT_RELR?  Linker adds GLIBC_PRIVATE_DT_RELR
> >version dependency when DT_RELR is generated
> 
> For CCed folks who may be puzzled about the context,
> I have a write-up
> https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#time-travel-compatibility
> which provides my reply to HJ's question as well.
> 
> A synthesized versioned undefined dynamic symbol can indeed catch "time
> travel compatibility", but the mechanism would be the first time ld adds an option variant
> for a particular libc implementation (glibc) locking out all other
> implementations: --pack-dyn-relocs=relr-glibc or -z relr-glibc.
> Sigh, it is really not pretty.
> 
> We know many other libc implementations don't want to synthesize such a
> symbol.

If you really want this, I have an alternate solution: add a new
relocation type to live in the normal REL/RELA table, whose semantics
are "process a DT_RELR table". This will cause the dynamic linker to
error out if it's too old to know about DT_RELR, and it can be ignored
as a no-op (or used as the trigger to process DT_RELR) by ldso that's
new enough to know about it.
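
To make that concrete, here is a rough sketch, simulated in plain C
outside any real loader. Everything named here is made up for
illustration: R_SIM_RELR_TRIGGER and its value, the SimRela struct,
process_rela() and the knows_relr flag are all hypothetical, and no
such relocation type exists in any psABI as far as I know. The image is
modeled as an array of 64-bit words, and the RELR decoding follows the
proposed encoding (an even entry is an address, an odd entry is a
bitmap covering the next 63 words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated relocation types. 8 matches R_X86_64_RELATIVE; the trigger
   value is HYPOTHETICAL and chosen arbitrarily for this sketch. */
enum { R_SIM_RELATIVE = 8, R_SIM_RELR_TRIGGER = 0x7001 };

/* Simplified stand-in for an Elf64_Rela entry. */
typedef struct {
    uint64_t offset;   /* byte offset into the image */
    uint32_t type;
    int64_t  addend;
} SimRela;

/* Decode a RELR table and add `base` to every word it names. */
static void apply_relr(uint64_t *image, uint64_t base,
                       const uint64_t *relr, size_t nrelr)
{
    size_t where = 0;   /* index of the next word a bitmap refers to */
    int have_addr = 0;

    for (size_t k = 0; k < nrelr; k++) {
        uint64_t entry = relr[k];
        if ((entry & 1) == 0) {
            /* Even entry: an address; relocate that word. */
            where = (size_t)(entry / sizeof(uint64_t));
            image[where++] += base;
            have_addr = 1;
        } else {
            /* Odd entry: bitmap for the 63 words starting at `where`. */
            assert(have_addr);
            for (size_t i = 0; (entry >>= 1) != 0; i++)
                if (entry & 1)
                    image[where + i] += base;
            where += 8 * sizeof(uint64_t) - 1;
        }
    }
}

/* Walk the normal RELA table. knows_relr == 0 models an old ldso that
   treats the trigger like any other unknown type and fails. */
static int process_rela(uint64_t *image, uint64_t base,
                        const SimRela *rela, size_t nrela,
                        const uint64_t *relr, size_t nrelr,
                        int knows_relr)
{
    for (size_t i = 0; i < nrela; i++) {
        switch (rela[i].type) {
        case R_SIM_RELATIVE:
            image[rela[i].offset / sizeof(uint64_t)] =
                base + (uint64_t)rela[i].addend;
            break;
        case R_SIM_RELR_TRIGGER:
            if (!knows_relr)
                return -1;          /* old ldso: unknown relocation */
            apply_relr(image, base, relr, nrelr);
            break;
        default:
            return -1;              /* unknown relocation type */
        }
    }
    return 0;
}
```

Calling process_rela() with knows_relr set to 0 returns -1 as soon as
it reaches the trigger entry, which is the point of the scheme: the
old loader fails through its existing unknown-relocation path, with no
libc-specific symbol or version needed.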

Rich

