Bug 29456 - missing DT_HASH section in shared objects
Status: UNCONFIRMED
Product: glibc
Component: build
Version: 2.36
Importance: P2 normal
Reported: 2022-08-08 09:10 UTC by Arkadiusz Hiler
Modified: 2022-08-24 08:57 UTC (History)
Description Arkadiusz Hiler 2022-08-08 09:10:18 UTC
Commit e47de5cb2d4dbecb58f569ed241e8e95c568f03c causes the glibc shared objects to no longer have a DT_HASH section. Only DT_GNU_HASH is still present.

What seems to be forgotten is that glibc's dlopen() is not the only thing that may depend on it. This breaks SysV ABI compatibility and all the tools that use DT_HASH for symbol lookup.

This breaks all the games running via Proton that are using EAC.
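
For anyone who wants to check which hash tables a given object advertises, here is a minimal sketch in Python. It assumes you already have the raw bytes of an ELF64 little-endian `.dynamic` section; the helper name `hash_tags` is made up for illustration. `readelf -d` reports the same tags.

```python
import struct

# Dynamic tag values from the ELF specification and the GNU extension.
DT_NULL = 0
DT_HASH = 4
DT_GNU_HASH = 0x6FFFFEF5


def hash_tags(dynamic: bytes) -> set:
    """Return the hash-table tags found in a raw ELF64 LE .dynamic section.

    Each Elf64_Dyn entry is 16 bytes: a signed 64-bit d_tag followed by
    an unsigned 64-bit d_val/d_ptr. Scanning stops at DT_NULL.
    """
    found = set()
    for off in range(0, len(dynamic) - 15, 16):
        tag, _value = struct.unpack_from("<qQ", dynamic, off)
        if tag == DT_NULL:
            break
        if tag == DT_HASH:
            found.add("DT_HASH")
        elif tag == DT_GNU_HASH:
            found.add("DT_GNU_HASH")
    return found
```

An object linked with `--hash-style=gnu` would yield only `{"DT_GNU_HASH"}` here, which is the situation this report is about.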
Comment 1 Florian Weimer 2022-08-08 10:45:30 UTC
On typical distributions, none of the system libraries use DT_HASH, not even core libraries such as libgcc_s.so.1. Any tool that performs symbol lookups needs to support DT_GNU_HASH these days.
Comment 2 Arkadiusz Hiler 2022-08-08 11:33:51 UTC
Hi, thanks for the reply. I wonder where the defaulting to having only DT_GNU_HASH is coming from?

Quoting ld's man page:

       --hash-style=style
           Set the type of linker's hash table(s).
           style can be either "sysv" for classic
           ELF ".hash" section, "gnu" for new style
           GNU ".gnu.hash" section or "both" for
           both the classic ELF ".hash" and new
           style GNU ".gnu.hash" hash tables.  The
           default depends upon how the linker was
           configured, but for most Linux based
           systems it will be "both".


and indeed `ld --help` claims:

    --hash-style=STYLE          Set hash style to sysv/gnu/both.  Default: both

As a side note, mold defaults to sysv-style hashes [0], and that format is well documented [1], whereas for the GNU-style one you have to read the sources of one of the implementations or rely on blog posts.

The change also breaks existing software; I'm not sure what glibc's stance on that is.

[0]: https://github.com/rui314/mold/blob/415673a49e24b1f4189000b9eb4fb1a36a697093/elf/mold.h#L1475
[1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf section 5-21
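
To make the difference between the two formats concrete, here is a rough Python sketch of the two string hash functions. The sysv one follows the function given in the gABI document [1]; the GNU one is the commonly described "times 33" hash starting at 5381. Both are truncated to 32 bits, which is an assumption about how the values are stored, not something quoted from either source.

```python
def sysv_hash(name: str) -> int:
    """Classic ELF hash as given in the System V gABI, kept to 32 bits."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def gnu_hash(name: str) -> int:
    """Hash used by DT_GNU_HASH: h = h * 33 + c, starting from 5381."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


print(hex(sysv_hash("printf")))  # 0x77905a6
print(hex(gnu_hash("printf")))   # 0x156b2bb8
```

The hash functions themselves are equally simple; what differs is the table layout around them.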
Comment 3 Florian Weimer 2022-08-08 12:01:26 UTC
Distributions configure or patch GCC to pass --hash-style=gnu to the linker by default. The linker default does not matter.
Comment 4 Adhemerval Zanella 2022-08-08 12:43:20 UTC
(In reply to Arkadiusz Hiler from comment #0)
> commit e47de5cb2d4dbecb58f569ed241e8e95c568f03c causes glibc shared object
> to not have DT_HASH section anymore. Only the DT_GNU_HASH is still present.
> 
> What seems to be forgotten is that glibc's dlopen() is not the only thing
> that may depend on it. This breaks SysV ABI compatibility and all the tools
> that use DT_HASH for symbol lookup.

The static linker option produces objects with only one kind of hash section, so I take it that tools are expected to handle this in GNU/Linux environments.

> 
> This breaks all the games running via Proton that are using EAC.

What exactly is EAC doing that requires DT_HASH in glibc? What happens if you mix DT_HASH and DT_GNU_HASH shared objects (for instance, shared libraries with only one of the two)? It does seem to be a shortcoming of EAC, although I am not sure what exactly it is trying to enforce here.
Comment 5 Arkadiusz Hiler 2022-08-08 14:34:37 UTC
(In reply to Florian Weimer from comment #3)
> Distributions configure or patch GCC to pass --hash-style=gnu to the linker
> by default. The linker default does not matter.

Thanks, that explains a few things. My gcc -v indeed claims that:

    Configured with: /build/gcc/src/gcc/configure ... --with-linker-hash-style=gnu ...

According to gcc/doc/install.texi the default is sysv, but after browsing
the code for a bit it looks like, if `--with-linker-hash-style` is not
specified, no `--hash-style` is passed to the linker, so it would depend on
the linker's defaults.

Do you know why distros do this? Looking at package history, it seems like an
artifact from the time when "both" was a bit problematic. There used to be
some custom patches for that years ago.


(In reply to Adhemerval Zanella from comment #4)
> > This breaks all the games running via Proton that are using EAC.
> 
> What EAC is doing exactly that requires a DT_HASH on glibc? What happens if
> you mix DT_HASH and DT_GNU_HASH shared objects (for instance shared
> libraries with only one option)? It does seem a shortcoming from EAC,
> although I am not sure what exactly it is trying to enforce here.

I can only guess, but it probably looks up some symbols in multiple ways to
ensure that no hooking has occurred.

It seems to require it only for the glibc DSOs as they were shipping with
DT_HASH for years, even on distros that default to gnu, and this is an
unexpected change.
Comment 6 Florian Weimer 2022-08-08 14:45:39 UTC
(In reply to Arkadiusz Hiler from comment #5)
> (In reply to Florian Weimer from comment #3)
> > Distributions configure or patch GCC to pass --hash-style=gnu to the linker
> > by default. The linker default does not matter.
> 
> Thanks, that explains a few things. My gcc -v indeed claims that:
> 
>     Configured with: /build/gcc/src/gcc/configure ...
> --with-linker-hash-style=gnu ...
> 
> According to gcc/docs/install.texi the default is sysv but after browsing
> the code for a bit it looks like if `--with-linker-hash-style` is not
> specified no `--hash-style` is passed to the linker so it would depend on
> linker's defaults.
> 
> Do you know why distros do this?

sysv hashing is much slower for symbol binding because we have to do a strcmp even on lookup failure. The Bloom filter is missing, too. Once the glibc dynamic linker supported it, I think distributions really wanted to use it, even if binutils and GCC defaults had not caught up at that point yet. It fixes real issues around startup performance.
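
A minimal Python sketch of the Bloom filter idea, for readers who have not seen it: one cheap bit test can reject a symbol without any strcmp. The word size (64 bits), the number of words, the shift value, and the helper names are assumptions picked for illustration, not values taken from glibc or binutils.

```python
BITS = 64  # assumed Bloom word width (64-bit ELF class)


def gnu_hash(name: str) -> int:
    """GNU-style symbol hash: h = h * 33 + c, starting from 5381."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def bloom_mask(h: int, shift: int) -> int:
    """Two bits per symbol: one from h, one from h shifted right."""
    return (1 << (h % BITS)) | (1 << ((h >> shift) % BITS))


def build_bloom(names, nwords: int = 4, shift: int = 6) -> list:
    """Set both bits for every defined symbol (the linker's side)."""
    words = [0] * nwords
    for name in names:
        h = gnu_hash(name)
        words[(h // BITS) % nwords] |= bloom_mask(h, shift)
    return words


def maybe_defined(words, name: str, shift: int = 6) -> bool:
    """False means 'definitely absent'; True means 'go check the chain'."""
    h = gnu_hash(name)
    mask = bloom_mask(h, shift)
    return (words[(h // BITS) % len(words)] & mask) == mask
```

With the sysv table there is no such early exit: a failed lookup still walks the bucket's chain and compares names.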
Comment 7 Adhemerval Zanella 2022-08-08 14:49:57 UTC
(In reply to Arkadiusz Hiler from comment #5)
> (In reply to Florian Weimer from comment #3)
> > Distributions configure or patch GCC to pass --hash-style=gnu to the linker
> > by default. The linker default does not matter.
> 
> Thanks, that explains a few things. My gcc -v indeed claims that:
> 
>     Configured with: /build/gcc/src/gcc/configure ...
> --with-linker-hash-style=gnu ...
> 
> According to gcc/docs/install.texi the default is sysv but after browsing
> the code for a bit it looks like if `--with-linker-hash-style` is not
> specified no `--hash-style` is passed to the linker so it would depend on
> linker's defaults.
> 
> Do you know why distros do this? Looking at package history it seems like an
> artifact from the times where both was a bit problematic. There used to be
> some custom patches for that years ago.
> 
> 
> (In reply to Adhemerval Zanella from comment #4)
> > > This breaks all the games running via Proton that are using EAC.
> > 
> > What EAC is doing exactly that requires a DT_HASH on glibc? What happens if
> > you mix DT_HASH and DT_GNU_HASH shared objects (for instance shared
> > libraries with only one option)? It does seem a shortcoming from EAC,
> > although I am not sure what exactly it is trying to enforce here.
> 
> I can only guess but probably to look up some symbols in multiple ways to
> assure no hooking has occurred.
> 
> It seems to require it only for the glibc DSOs as they were shipping with
> DT_HASH for years, even on distros that default to gnu, and this is an
> unexpected change.

Right; however, I am not sure this qualifies as an ABI break, since the symbol lookup information is still provided (albeit in a different format). It should also be transparent to most applications, since symbol resolution is done by the loader itself.
Comment 8 Arkadiusz Hiler 2022-08-08 15:08:32 UTC
(In reply to Florian Weimer from comment #6)
> > Do you know why distros do this?
> 
> sysv hashing is much slower for symbol binding because we have to do a
> strcmp even on lookup failure. The Bloom filter is missing, too. Once the
> glibc dynamic linker supported it, I think distributions really wanted to
> use it, even if binutils and GCC defaults had not caught up at that point
> yet. It fixes real issues around startup performance.

I've read through both algorithms and seen the benchmarks / original
discussion around the new hash algorithm. The new one is unquestionably
superior.

I don't understand why people forced "gnu" instead of "both" though.
From my point of view, as an outsider, it looks like people hopped on the
feature a bit early overriding upstream defaults and it has stuck.

If they had not carried over `--with-linker-hash-style=gnu` from
that era, we would respect the linker's default, which is "both" for the GNU
linker.

As far as I understand, there's a small size increase and no real performance
penalty caused by having both.

This is mostly beside the point though and doesn't affect the perceived
breakage, but I appreciate having the historical context.

(In reply to Adhemerval Zanella from comment #7)
> Right, however, I am not sure this characterize as an ABI break since the
> symbol lookup information would be indeed provided (albeit in a different
> format). And it should be transparent to most applications as well since
> symbol resolution is done by the loader itself.

The glibc loader is not the only thing that can do symbol resolution, and
software that ships with distros is not the only software out there.
Changing the section name and the format is significant. All the historical
software that uses an old version of dlsym() that is linked in to use pthreads
would break.
Comment 9 Adhemerval Zanella 2022-08-08 15:29:29 UTC
(In reply to Arkadiusz Hiler from comment #8)
> (In reply to Florian Weimer from comment #6)
> > > Do you know why distros do this?
> > 
> > sysv hashing is much slower for symbol binding because we have to do a
> > strcmp even on lookup failure. The Bloom filter is missing, too. Once the
> > glibc dynamic linker supported it, I think distributions really wanted to
> > use it, even if binutils and GCC defaults had not caught up at that point
> > yet. It fixes real issues around startup performance.
> 
> I've read through both algorithms and seen the benchmarks / original
> discussion around the new hash algorithm. The new one is unquestionably
> superior.
> 
> I don't understand why people forced "gnu" instead of "both" though.
> From my point of view, as an outsider, it looks like people hopped on the
> feature a bit early overriding upstream defaults and it has stuck.
> 
> If they would not have carried over `--with-linker-hash-style=gnu` from
> that era we would respect linker's default, which is both for The GNU
> Linker.
> 
> As far as I understand there's small size increase and no real performance
> penalty caused by having both.
> 
> This is mostly besides the point though and doesn't affect the perceived
> breakage, but I appreciate having the historical context.

It was done as a size optimization, removing a feature perceived as unused, since DT_GNU_HASH has been the default on most distros for a long time.

On libc.so for x86_64, I see:

* with -Wl,--hash-style=both
$ size libc.so
   text    data     bss     dec     hex filename
1992171   20320   55120 2067611  1f8c9b libc.so

* with -Wl,--hash-style=gnu
$ size libc.so
   text    data     bss     dec     hex filename
1975923   20304   55120 2051347  1f4d13 libc.so

Roughly 1%, which is considerable for an unused feature.
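
For reference, the difference between the two `size` outputs above can be worked out with a few lines of Python. This is only arithmetic on the numbers already quoted, not a new measurement.

```python
# Section sizes copied from the two `size libc.so` runs above.
both = {"text": 1992171, "data": 20320, "bss": 55120}
gnu = {"text": 1975923, "data": 20304, "bss": 55120}

total_both = sum(both.values())  # 2067611, matches the "dec" column
total_gnu = sum(gnu.values())    # 2051347, matches the "dec" column

saved = total_both - total_gnu
percent = 100 * saved / total_both

print(f"{saved} bytes saved, {percent:.2f}% of the 'both' build")
# prints: 16264 bytes saved, 0.79% of the 'both' build
```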

> 
> (In reply to Adhemerval Zanella from comment #7)
> > Right, however, I am not sure this characterize as an ABI break since the
> > symbol lookup information would be indeed provided (albeit in a different
> > format). And it should be transparent to most applications as well since
> > symbol resolution is done by the loader itself.
> 
> The glibc loader is not the only thing that can do symbol resolution, and
> software that ships with distros is not the only software out there.
> Changing the section name and the format is significant. All the historical
> software that uses old version of dlsym() that's linked in to use pthreads
> would break.

But we do not really support loading two different C runtime environments (for instance, having libc.so with different versions in different namespaces), nor loading libc.so from a different executable (for instance, calling dlsym on libc.so from a different C runtime implementation), and static dlopen is being deprecated (meaning that we will eventually phase out its support, on top of the already known issues).

And dlopen of libpthread will still work, since we continue to provide a stub libpthread.so. Are you aware of an issue with the current scheme?

So it does not make much sense to enforce DT_HASH on libc.so, especially since symbol resolution is still possible through DT_GNU_HASH for users who want to do it. The glibc build now uses the default set by binutils, so one option may be to enforce -Wl,--hash-style=both at the distro level if this is really a compatibility issue (since it might not be specific to glibc shared objects).
Comment 10 Arkadiusz Hiler 2022-08-08 16:49:40 UTC
(In reply to Adhemerval Zanella from comment #9)
> It was done as size optimization from perceived unused features since
> DT_GNU_HASH is being used as a default on most distros for a long time. 
> 
> On libc.so for x86_64, I see:
> 
> * with -Wl,--hash-style=both
> $ size libc.so
>    text    data     bss     dec     hex filename
> 1992171   20320   55120 2067611  1f8c9b libc.so
> 
> * with -Wl,--hash-style=gnu
>    text    data     bss     dec     hex filename
> 1975923   20304   55120 2051347  1f4d13 libc.so
> 
> Roughly 1%, which is considerable for an unused feature.

~16kB that's used by a whole class of games that are now unplayable on more
bleeding-edge distros.

See: https://github.com/ValveSoftware/Proton/issues/6051

> So it does not make much sense to enforce DT_HASH on libc.so, especially if
> users do want to do symbol resolution it is still provided by DT_GNU_HASH.
> The glibc build now uses the default set by binutils, so maybe one option is
> to enforce -Wl,--hash-style=both as distro level if this is really a
> compatibility issue (since it might not be specific to glibc shared objects).

Shifting responsibility onto downstream to maintain compatibility with something
that glibc provided for years as the default, and that now has users,
is not really feasible.

I don't think I can convince you. I really hoped it could be just a simple revert.
The best I can do is to report that a thing that used to work is now broken and
point to the commit that caused it.
Comment 11 Adhemerval Zanella 2022-08-08 17:13:25 UTC
(In reply to Arkadiusz Hiler from comment #10)
> (In reply to Adhemerval Zanella from comment #9)
> > It was done as size optimization from perceived unused features since
> > DT_GNU_HASH is being used as a default on most distros for a long time. 
> > 
> > On libc.so for x86_64, I see:
> > 
> > * with -Wl,--hash-style=both
> > $ size libc.so
> >    text    data     bss     dec     hex filename
> > 1992171   20320   55120 2067611  1f8c9b libc.so
> > 
> > * with -Wl,--hash-style=gnu
> >    text    data     bss     dec     hex filename
> > 1975923   20304   55120 2051347  1f4d13 libc.so
> > 
> > Roughly 1%, which is considerable for an unused feature.
> 
> ~16kB that's used by a whole class of games that are now unplayable on more
> bleeding-edge distros.
> 
> See: https://github.com/ValveSoftware/Proton/issues/6051
> 
> > So it does not make much sense to enforce DT_HASH on libc.so, especially if
> > users do want to do symbol resolution it is still provided by DT_GNU_HASH.
> > The glibc build now uses the default set by binutils, so maybe one option is
> > to enforce -Wl,--hash-style=both as distro level if this is really a
> > compatibility issue (since it might not be specific to glibc shared objects).
> 
> Shifting responsibility on downstream to maintain compatibility with
> something
> that glibc provided for years as the default and there are now users of it
> is not really feasible.
> 
> I don't think I can convince you. I really hoped it can be just a simple
> revert.
> The best I can do is to report that a thing used to work is now broken and
> point to a commit that caused it.

In fact, we take backward compatibility quite seriously. The main issue here is what characterizes an ABI break, and what kinds of forward compatibility and internal-detail changes we need to take care of.

I am not against reverting this change, but it is really odd that, among all system binaries, we need to keep the sysv hash solely for glibc ad aeternum because of a specific tool, when we do not really understand why it requires such functionality and it does not provide any extra functionality to glibc itself.

Does it also prevent us from adding another possible hash scheme in the future, given that EAC might break if it sees a DT_GNU_HASH?
Comment 12 Adhemerval Zanella 2022-08-08 17:32:43 UTC
(In reply to Arkadiusz Hiler from comment #10)
> 
> Shifting responsibility on downstream to maintain compatibility with
> something
> that glibc provided for years as the default and there are now users of it
> is not really feasible.
> 

I sent a message to libc-alpha to obtain some opinions from other maintainers and developers [1].

[1] https://sourceware.org/pipermail/libc-alpha/2022-August/141302.html
Comment 13 Arkadiusz Hiler 2022-08-12 07:46:23 UTC
> Does it also prevent us to add another possible HASH scheme in the future, taking that EAC might break if it does see a DT_GNU_HASH?

The breakage here is not related to having DT_GNU_HASH, it's related to missing DT_HASH. I don't see how this is relevant.



There were two more breakages caused by dropping DT_HASH:

A native game - Shovel Knight:
https://github.com/ValveSoftware/Proton/issues/6051#issuecomment-1212748397

Framerate limiter for OpenGL - libstrangle:
https://bugs.gentoo.org/863863
Comment 14 Adhemerval Zanella 2022-08-12 13:17:57 UTC
(In reply to Arkadiusz Hiler from comment #13)
> > Does it also prevent us to add another possible HASH scheme in the future, taking that EAC might break if it does see a DT_GNU_HASH?
> 
> The breakage here is not related to having DT_GNU_HASH, it's related to
> missing DT_HASH. I don't see how this is relevant.
> 
> 
> 
> There were two more breakages caused by dropping DT_HASH:
> 
> A native game - Shovel Knight:
> https://github.com/ValveSoftware/Proton/issues/6051#issuecomment-1212748397
> 
> Framerate limiter for OpenGL - libstrangle:
> https://bugs.gentoo.org/863863

It is relevant because we are discussing whether it is reasonable for the GNU ELF ABI extension to keep DT_HASH mandatory. From the generic ABI discussion [1], it seems that other maintainers find it reasonable to make DT_HASH optional if DT_GNU_HASH is present in GNU objects, although some are not comfortable with making it optional for the generic ABI (which is not the case anyway).

As Carlos has summarized [2], DT_GNU_HASH is the *de facto* standard hash scheme for GNU, and it has been used in distro deployments for over 16 years. The generic ABI discussion has raised some gaps that DT_GNU_HASH does not cover (the total symbol list size), which prompted another gABI discussion about adding a new dynamic section type [3].

>  Shifting responsibility on downstream to maintain compatibility with something
> that glibc provided for years as the default and there are now users of it
> is not really feasible.

And it is already being done anyway [5]: in such cases, compatibility is being restored by reverting upstream patches rather than by working with vendors to adapt their code to a decade-old standard.

So I tend to agree that, if DT_HASH compatibility is really required, it would be better to provide it through the default static linker option used to build glibc (so you also ensure that all installed shared objects have it), as Gentoo is doing [4].

[1] https://groups.google.com/g/generic-abi/c/th5919osPAQ
[2] https://sourceware.org/pipermail/libc-alpha/2022-August/141304.html
[3] https://groups.google.com/g/generic-abi/c/9L03yrxXPBc
[4] https://sourceware.org/pipermail/libc-alpha/2022-August/141312.html
[5] https://sourceware.org/pipermail/libc-alpha/2022-August/141313.html
Comment 15 James Y Knight 2022-08-20 18:01:13 UTC
The problem that programs are running into here seems really quite particular to libc.

Normally, if someone is intercepting a function in a shared-lib, they'll use dlsym RTLD_NEXT to find the next (or original) version of the function. No problem, no custom symbol-table parsing required.

However, if you intercept "dlsym", how do you then find the original "dlsym"? You cannot call dlsym to find it! Unfortunately, it seems that there's no great answer there... but a bunch of software has decided that the best choice is to dlopen libdl.so.2 and iterate the symbol table to find dlsym. And, sadly, even if you just iterate the symbol table in order, that still requires using DT_HASH to get the count of symbols, since (as discussed on the ABI list) there's no such thing as DT_SYMTAB_COUNT.

This bug report from 2014 has a nice explanation: https://bugs.gentoo.org/527504

That code was indeed updated to support reading DT_GNU_HASH, later on, see
https://github.com/mumble-voip/mumble/blob/master/overlay_gl/init_unix.c
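
As a rough sketch of why the symbol count is easy with DT_HASH and more work with DT_GNU_HASH, here is how each table yields it, in Python over raw table bytes. The assumptions are mine: little-endian data, 32-bit hash words (a few targets use 64-bit DT_HASH entries), 64-bit Bloom words by default, and made-up function names. This is not the mumble code.

```python
import struct


def sysv_symbol_count(table: bytes) -> int:
    """DT_HASH starts with nbucket, nchain; nchain is the symbol count."""
    _nbucket, nchain = struct.unpack_from("<II", table, 0)
    return nchain


def gnu_symbol_count(table: bytes, word_size: int = 8) -> int:
    """DT_GNU_HASH stores no count, so derive it from the last chain.

    Layout: nbuckets, symoffset, bloom_size, bloom_shift (4 x u32),
    then bloom words, then buckets (u32), then chain values (u32).
    """
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from("<IIII", table, 0)
    buckets_off = 16 + bloom_size * word_size
    chain_off = buckets_off + 4 * nbuckets
    buckets = struct.unpack_from(f"<{nbuckets}I", table, buckets_off)

    # The highest bucket value is the first symbol of the last chain.
    last = max(buckets, default=0)
    if last < symoffset:
        return symoffset  # no hashed symbols at all

    # Walk that chain until the entry whose low bit marks its end.
    while True:
        (value,) = struct.unpack_from("<I", table, chain_off + 4 * (last - symoffset))
        if value & 1:
            return last + 1
        last += 1
```

Code that only knows the first function has nothing to fall back on once DT_HASH is gone, which matches the failures reported here.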

Anyhow, I think the discussion about whether the REST of the libs on the system should also continue to require DT_HASH is irrelevant, because they don't provide dlsym, and so folks aren't using this hack to look up symbols in them. The question to ask is really just: "Should glibc go back to forcing --hash-style=both for its own build?"

And IMO, the answer should be yes, because the compat break isn't worth the tiny file-size savings.
Comment 16 Adhemerval Zanella 2022-08-22 14:16:11 UTC
(In reply to James Y Knight from comment #15)
> The problem that programs are running into here seems really quite
> particular to libc.
> 
> Normally, if someone is intercepting a function in a shared-lib, they'll use
> dlsym RTLD_NEXT to find the next (or original) version of the function. No
> problem, no custom symbol-table parsing required.
> 
> However, if you intercept "dlsym", how do you then find the original
> "dlsym"? You cannot call dlsym to find it! Unfortunately, it seems that
> there's no great answer there...but a bunch of software has decided that the
> best choice is to dlopen libdl.so.2, and iterate the symbol table to find
> dlsym. And, sadly, even if you just iterate the symboltable in order, that
> still requires using DT_HASH to get the count of symbols, since (as
> discussed on the ABI list) there's no such thing as DT_SYMTAB_COUNT.
> 
> This bug report from 2014 has a nice explanation:
> https://bugs.gentoo.org/527504
> 
> That code was indeed updated to support reading DT_GNU_HASH, later on, see
> https://github.com/mumble-voip/mumble/blob/master/overlay_gl/init_unix.c

Unfortunately, there is no direct way to override dlsym using the default ELF
approach of providing an LD_PRELOAD library that implements the symbols.  You
can accomplish it with an audit module, but that requires refactoring how the
overriding library works, and until recently you could not rebind symbols for
bind-now libraries (that was fixed in 2.35).

By the way, mumble probably does not work with glibc 2.34, since dlsym was
moved to libc.so.  It would be better to include <gnu/lib-names.h> for glibc
and test first LIBDL_SO and then LIBC_SO.
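
A minimal sketch of that lookup order, written in Python with ctypes rather than C. The two sonames are hard-coded assumptions standing in for LIBDL_SO and LIBC_SO (their usual values on glibc/Linux), and `find_symbol` is a hypothetical helper name.

```python
import ctypes


def find_symbol(symbol, candidates=("libdl.so.2", "libc.so.6"), loader=ctypes.CDLL):
    """Return (soname, symbol object) from the first library that has it.

    A library that fails to load, or that loads but does not export the
    symbol, is skipped in favour of the next candidate.
    """
    for soname in candidates:
        try:
            lib = loader(soname)
            return soname, getattr(lib, symbol)
        except (OSError, AttributeError):
            continue
    raise LookupError(f"{symbol} not found in any of {candidates}")
```

On a glibc system, `find_symbol("dlsym")` would be the typical call; the `loader` parameter exists only so the order can be exercised without real libraries.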

> 
> Anyhow, I think the discussion about whether the REST of the libs on the
> system should also continue to require DT_HASH is irrelevant, because they
> don't provide dlsym, and so folks aren't using this hack to look up symbols
> in them. The question to ask is really just: "Should glibc go back to
> forcing --hash-style=both for its own build?"
> 
> And IMO, the answer should be yes, because the compat break isn't worth the
> tiny file-size savings.

I agree this change is doing more harm than good. I brought up that I was willing
to revert it, but Carlos asked to hold off, since he is engaging with affected
parties to see what kind of support they require and to find a better solution.
Comment 17 John Brooks 2022-08-22 16:42:57 UTC
I think there needs to be a much higher bar for keeping changes that break real applications, especially applications that have little or no chance of getting fixed. This is part of maintaining a stable operating system. When something as integral as glibc decides to keep breaking changes, it damages GNU/Linux as a platform. 

I'm sure this exact debate has been had all over the internet in the past couple of weeks, and I don't want to rehash it here or advocate for some absolute principle in one direction or the other. But regardless of whose fault it is in any particular case, compatibility breakage (however intentional or accidental, necessary or frivolous, justifiable or unwarranted) is still a bad thing for both users and developers, and that should be assigned the appropriate weight in these decisions.
Comment 18 Fangrui Song 2022-08-24 04:12:39 UTC
(In reply to John Brooks from comment #17)
> I think there needs to be a much higher bar for keeping changes that break real applications, especially applications that have little or no chance of getting fixed. This is part of maintaining a stable operating system. When something as integral as glibc decides to keep breaking changes, it damages GNU/Linux as a platform. 

There has been a lot of misinformation about the glibc change.
I wrote https://maskray.me/blog/2022-08-21-glibc-and-dt-gnu-hash

See my comment on "Easy Anti-Cheat" and the new finding about the Arch Linux package.

I think glibc should close this issue now.
Comment 19 Richard Biener 2022-08-24 08:28:17 UTC
SUSE uses the linker default from GCC and configures binutils explicitly to default to 'both'.  If glibc doesn't honor this in its default behavior, I'd declare it broken (why should it use an explicit --hash-style that does not honor the system's default?)
Comment 20 Florian Weimer 2022-08-24 08:56:54 UTC
(In reply to Richard Biener from comment #19)
> SUSE uses the linker default from GCC and configures binutils explicitly to
> default to 'both'.  If glibc doesn't honor this in its default behavior I'd
> declare it broken (why should it use an explicit --hash-style not honoring
> the system's default?)

Current glibc sources do not override the toolchain default. Older versions overrode it with --hash-style=both.