Commit e47de5cb2d4dbecb58f569ed241e8e95c568f03c causes glibc shared objects to no longer have a DT_HASH section. Only DT_GNU_HASH is still present. What seems to have been forgotten is that glibc's dlopen() is not the only thing that may depend on it. This breaks SysV ABI compatibility and all the tools that use DT_HASH for symbol lookup. This breaks all the games running via Proton that use EAC.
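One way to see which hash tables a given DSO advertises is to list the tags in its dynamic segment. The following is only a minimal sketch, assuming a little-endian ELF64 file; the helper name `dynamic_tags` is hypothetical, while the numeric constants are the standard ELF values for PT_DYNAMIC, DT_HASH and DT_GNU_HASH.

```python
import struct

PT_DYNAMIC = 2
DT_NULL = 0
DT_HASH = 4
DT_GNU_HASH = 0x6FFFFEF5


def dynamic_tags(elf: bytes) -> set:
    """Return the d_tag values found in the PT_DYNAMIC segment.

    Hypothetical helper: handles little-endian ELF64 images only.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("sketch only handles little-endian ELF64")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    tags = set()
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        if p_type != PT_DYNAMIC:
            continue
        # ELF64 program header: p_offset at +8, p_filesz at +32.
        (p_offset,) = struct.unpack_from("<Q", elf, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", elf, base + 32)
        # Each Elf64_Dyn entry is a signed 64-bit tag plus a 64-bit value.
        for off in range(p_offset, p_offset + p_filesz - 15, 16):
            d_tag, _d_val = struct.unpack_from("<qQ", elf, off)
            if d_tag == DT_NULL:
                break
            tags.add(d_tag)
    return tags
```

Reading the libc.so.6 installed on a system into bytes and testing `DT_HASH in dynamic_tags(data)` would then report whether the SysV table is still present; the path to that file varies by distribution and is not assumed here.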
On typical distributions, none of the system libraries use DT_HASH, not even core libraries such as libgcc_s.so.1. Any tool that performs symbol lookups needs to support DT_GNU_HASH these days.
Hi, thanks for the reply. I wonder where the default of having only DT_GNU_HASH comes from. Quoting ld's man page: --hash-style=style Set the type of linker's hash table(s). style can be either "sysv" for classic ELF ".hash" section, "gnu" for new style GNU ".gnu.hash" section or "both" for both the classic ELF ".hash" and new style GNU ".gnu.hash" hash tables. The default depends upon how the linker was configured, but for most Linux based systems it will be "both". And indeed `ld --help` claims: --hash-style=STYLE Set hash style to sysv/gnu/both. Default: both As a side note, mold defaults to sysv-style hashes [0] and that format is well documented [1], whereas for the GNU-style one you have to read the sources of one of the implementations or rely on blog posts. The change also breaks existing software; I am not sure what glibc's stance is on that. [0]: https://github.com/rui314/mold/blob/415673a49e24b1f4189000b9eb4fb1a36a697093/elf/mold.h#L1475 [1]: https://refspecs.linuxfoundation.org/elf/gabi41.pdf section 5-21
Distributions configure or patch GCC to pass --hash-style=gnu to the linker by default. The linker default does not matter.
(In reply to Arkadiusz Hiler from comment #0) > commit e47de5cb2d4dbecb58f569ed241e8e95c568f03c causes glibc shared object > to not have DT_HASH section anymore. Only the DT_GNU_HASH is still present. > > What seems to be forgotten is that glibc's dlopen() is not the only thing > that may depend on it. This breaks SysV ABI compatibility and all the tools > that use DT_HASH for symbol lookup. The static linker option can produce objects with only one kind of hash section, so I take it that tools are expected to handle this in GNU/Linux environments. > > This breaks all the games running via Proton that are using EAC. What exactly is EAC doing that requires DT_HASH in glibc? What happens if you mix DT_HASH and DT_GNU_HASH shared objects (for instance, shared libraries built with only one option)? It does seem to be a shortcoming of EAC, although I am not sure what exactly it is trying to enforce here.
(In reply to Florian Weimer from comment #3) > Distributions configure or patch GCC to pass --hash-style=gnu to the linker > by default. The linker default does not matter. Thanks, that explains a few things. My gcc -v indeed claims that: Configured with: /build/gcc/src/gcc/configure ... --with-linker-hash-style=gnu ... According to gcc/docs/install.texi the default is sysv but after browsing the code for a bit it looks like if `--with-linker-hash-style` is not specified no `--hash-style` is passed to the linker so it would depend on linker's defaults. Do you know why distros do this? Looking at package history it seems like an artifact from the times where both was a bit problematic. There used to be some custom patches for that years ago. (In reply to Adhemerval Zanella from comment #4) > > This breaks all the games running via Proton that are using EAC. > > What EAC is doing exactly that requires a DT_HASH on glibc? What happens if > you mix DT_HASH and DT_GNU_HASH shared objects (for instance shared > libraries with only one option)? It does seem a shortcoming from EAC, > although I am not sure what exactly it is trying to enforce here. I can only guess but probably to look up some symbols in multiple ways to assure no hooking has occurred. It seems to require it only for the glibc DSOs as they were shipping with DT_HASH for years, even on distros that default to gnu, and this is an unexpected change.
(In reply to Arkadiusz Hiler from comment #5) > (In reply to Florian Weimer from comment #3) > > Distributions configure or patch GCC to pass --hash-style=gnu to the linker > > by default. The linker default does not matter. > > Thanks, that explains a few things. My gcc -v indeed claims that: > > Configured with: /build/gcc/src/gcc/configure ... > --with-linker-hash-style=gnu ... > > According to gcc/docs/install.texi the default is sysv but after browsing > the code for a bit it looks like if `--with-linker-hash-style` is not > specified no `--hash-style` is passed to the linker so it would depend on > linker's defaults. > > Do you know why distros do this? sysv hashing is much slower for symbol binding because we have to do a strcmp even on lookup failure. The Bloom filter is missing, too. Once the glibc dynamic linker supported it, I think distributions really wanted to use it, even if binutils and GCC defaults had not caught up at that point yet. It fixes real issues around startup performance.
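For readers unfamiliar with the two schemes, the hash functions themselves can be sketched as below. This is an illustration only: it shows the SysV function from the gABI and the GNU function (multiply by 33, starting from 5381), not the table layouts. In DT_GNU_HASH the lookup additionally consults a Bloom filter and compares stored hash values before falling back to a string comparison, which is not modelled here. The Python function names are hypothetical.

```python
def sysv_hash(name: bytes) -> int:
    """Classic ELF (DT_HASH) hash as specified in the gABI."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF  # clear the top four bits
    return h


def gnu_hash(name: bytes) -> int:
    """Hash used by DT_GNU_HASH: h = h * 33 + c, seeded with 5381."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


print(hex(sysv_hash(b"printf")))  # 0x77905a6
print(hex(gnu_hash(b"printf")))   # 0x156b2bb8
```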
(In reply to Arkadiusz Hiler from comment #5) > (In reply to Florian Weimer from comment #3) > > Distributions configure or patch GCC to pass --hash-style=gnu to the linker > > by default. The linker default does not matter. > > Thanks, that explains a few things. My gcc -v indeed claims that: > > Configured with: /build/gcc/src/gcc/configure ... > --with-linker-hash-style=gnu ... > > According to gcc/docs/install.texi the default is sysv but after browsing > the code for a bit it looks like if `--with-linker-hash-style` is not > specified no `--hash-style` is passed to the linker so it would depend on > linker's defaults. > > Do you know why distros do this? Looking at package history it seems like an > artifact from the times where both was a bit problematic. There used to be > some custom patches for that years ago. > > > (In reply to Adhemerval Zanella from comment #4) > > > This breaks all the games running via Proton that are using EAC. > > > > What EAC is doing exactly that requires a DT_HASH on glibc? What happens if > > you mix DT_HASH and DT_GNU_HASH shared objects (for instance shared > > libraries with only one option)? It does seem a shortcoming from EAC, > > although I am not sure what exactly it is trying to enforce here. > > I can only guess but probably to look up some symbols in multiple ways to > assure no hooking has occurred. > > It seems to require it only for the glibc DSOs as they were shipping with > DT_HASH for years, even on distros that default to gnu, and this is an > unexpected change. Right. However, I am not sure this qualifies as an ABI break, since the symbol lookup information is still provided (albeit in a different format). It should also be transparent to most applications, since symbol resolution is done by the loader itself.
(In reply to Florian Weimer from comment #6) > > Do you know why distros do this? > > sysv hashing is much slower for symbol binding because we have to do a > strcmp even on lookup failure. The Bloom filter is missing, too. Once the > glibc dynamic linker supported it, I think distributions really wanted to > use it, even if binutils and GCC defaults had not caught up at that point > yet. It fixes real issues around startup performance. I've read through both algorithms and seen the benchmarks / original discussion around the new hash algorithm. The new one is unquestionably superior. I don't understand why people forced "gnu" instead of "both" though. From my point of view, as an outsider, it looks like people hopped on the feature a bit early, overriding upstream defaults, and it has stuck. If they had not carried over `--with-linker-hash-style=gnu` from that era, we would respect the linker's default, which is "both" for the GNU linker. As far as I understand, there's a small size increase and no real performance penalty caused by having both. This is mostly beside the point though and doesn't affect the perceived breakage, but I appreciate having the historical context. (In reply to Adhemerval Zanella from comment #7) > Right, however, I am not sure this characterize as an ABI break since the > symbol lookup information would be indeed provided (albeit in a different > format). And it should be transparent to most applications as well since > symbol resolution is done by the loader itself. The glibc loader is not the only thing that can do symbol resolution, and software that ships with distros is not the only software out there. Changing the section name and the format is significant. All the historical software that uses an old version of dlsym() that's linked in to use pthreads would break.
(In reply to Arkadiusz Hiler from comment #8) > (In reply to Florian Weimer from comment #6) > > > Do you know why distros do this? > > > > sysv hashing is much slower for symbol binding because we have to do a > > strcmp even on lookup failure. The Bloom filter is missing, too. Once the > > glibc dynamic linker supported it, I think distributions really wanted to > > use it, even if binutils and GCC defaults had not caught up at that point > > yet. It fixes real issues around startup performance. > > I've read through both algorithms and seen the benchmarks / original > discussion around the new hash algorithm. The new one is unquestionably > superior. > > I don't understand why people forced "gnu" instead of "both" though. > From my point of view, as an outsider, it looks like people hopped on the > feature a bit early overriding upstream defaults and it has stuck. > > If they would not have carried over `--with-linker-hash-style=gnu` from > that era we would respect linker's default, which is both for The GNU > Linker. > > As far as I understand there's small size increase and no real performance > penalty caused by having both. > > This is mostly besides the point though and doesn't affect the perceived > breakage, but I appreciate having the historical context.
It was done as a size optimization, dropping a feature perceived as unused, since DT_GNU_HASH has been the default on most distros for a long time. On libc.so for x86_64, I see:

* with -Wl,--hash-style=both
$ size libc.so
   text    data     bss     dec     hex filename
1992171   20320   55120 2067611  1f8c9b libc.so

* with -Wl,--hash-style=gnu
   text    data     bss     dec     hex filename
1975923   20304   55120 2051347  1f4d13 libc.so

Roughly 1%, which is considerable for an unused feature.
> > (In reply to Adhemerval Zanella from comment #7) > > Right, however, I am not sure this characterize as an ABI break since the > > symbol lookup information would be indeed provided (albeit in a different > > format). 
And it should be transparent to most applications as well since > > symbol resolution is done by the loader itself. > > The glibc loader is not the only thing that can do symbol resolution, and > software that ships with distros is not the only software out there. > Changing the section name and the format is significant. All the historical > software that uses old version of dlsym() that's linked in to use pthreads > would break. But we do not really support loading two different C runtime environments (for instance, having libc.so with different versions in a different namespace), nor loading libc.so from a different executable (for instance, dlsym on libc.so from a different C runtime implementation), and static dlopen is being deprecated (meaning that we will eventually phase out its support, besides the already known issues). And dlopen of libpthread will still work, since we continue to provide a stub libpthread.so. Are you aware of an issue with the current scheme? So it does not make much sense to enforce DT_HASH on libc.so, especially since users who do want to do symbol resolution can still do so via DT_GNU_HASH. The glibc build now uses the default set by binutils, so maybe one option is to enforce -Wl,--hash-style=both at the distro level if this is really a compatibility issue (since it might not be specific to glibc shared objects).
(In reply to Adhemerval Zanella from comment #9) > It was done as size optimization from perceived unused features since > DT_GNU_HASH is being used as a default on most distros for a long time. > > On libc.so for x86_64, I see: > > * with -Wl,--hash-style=both > $ size libc.so > text data bss dec hex filename > 1992171 20320 55120 2067611 1f8c9b libc.so > > * with -Wl,--hash-style=gnu > text data bss dec hex filename > 1975923 20304 55120 2051347 1f4d13 libc.so > > Roughly 1%, which is considerable for an unused feature. ~16kB that's used by a whole class of games that are now unplayable on more bleeding-edge distros. See: https://github.com/ValveSoftware/Proton/issues/6051 > So it does not make much sense to enforce DT_HASH on libc.so, especially if > users do want to do symbol resolution it is still provided by DT_GNU_HASH. > The glibc build now uses the default set by binutils, so maybe one option is > to enforce -Wl,--hash-style=both as distro level if this is really a > compatibility issue (since it might not be specific to glibc shared objects). Shifting responsibility onto downstream to maintain compatibility with something that glibc provided by default for years, and that now has users, is not really feasible. I don't think I can convince you. I really hoped it could be just a simple revert. The best I can do is to report that a thing that used to work is now broken and point to the commit that caused it.
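For reference, the size figures quoted above work out as follows. This is just the arithmetic on the numbers from the `size` output in this thread; the variable names are made up for the illustration.

```python
# Figures copied from the `size libc.so` output quoted in this thread.
text_both, total_both = 1992171, 2067611  # --hash-style=both
text_gnu, total_gnu = 1975923, 2051347    # --hash-style=gnu

saved_text = text_both - text_gnu     # bytes saved in the text column
saved_total = total_both - total_gnu  # bytes saved in the dec column
percent = saved_total / total_both * 100

print(saved_text, saved_total, round(percent, 2))  # 16248 16264 0.79
```

So the saving is a little under 16 KiB, or roughly 0.8% of the file, consistent with both the "roughly 1%" and the "~16kB" characterizations.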
(In reply to Arkadiusz Hiler from comment #10) > (In reply to Adhemerval Zanella from comment #9) > > It was done as size optimization from perceived unused features since > > DT_GNU_HASH is being used as a default on most distros for a long time. > > > > On libc.so for x86_64, I see: > > > > * with -Wl,--hash-style=both > > $ size libc.so > > text data bss dec hex filename > > 1992171 20320 55120 2067611 1f8c9b libc.so > > > > * with -Wl,--hash-style=gnu > > text data bss dec hex filename > > 1975923 20304 55120 2051347 1f4d13 libc.so > > > > Roughly 1%, which is considerable for an unused feature. > > ~16kB that's used by a whole class of games that are now unplayable on more > bleeding-edge distros. > > See: https://github.com/ValveSoftware/Proton/issues/6051 > > > So it does not make much sense to enforce DT_HASH on libc.so, especially if > > users do want to do symbol resolution it is still provided by DT_GNU_HASH. > > The glibc build now uses the default set by binutils, so maybe one option is > > to enforce -Wl,--hash-style=both as distro level if this is really a > > compatibility issue (since it might not be specific to glibc shared objects). > > Shifting responsibility on downstream to maintain compatibility with > something > that glibc provided for years as the default and there are now users of it > is not really feasible. > > I don't think I can convince you. I really hoped it can be just a simple > revert. > The best I can do is to report that a thing used to work is now broken and > point to a commit that caused it. In fact, we take backward compatibility quite seriously. The main issue here is what characterizes an ABI break, and what kind of forward compatibility and changes to internal details we need to take care of. 
I am not against reverting this change, but it is really odd that, of all system binaries, we need to keep the sysv hash solely for glibc ad aeternum because of a specific tool that requires it for reasons we do not really understand, and it does not provide any extra functionality to glibc itself. Does it also prevent us from adding another possible HASH scheme in the future, given that EAC might break if it does see a DT_GNU_HASH?
(In reply to Arkadiusz Hiler from comment #10) > > Shifting responsibility on downstream to maintain compatibility with > something > that glibc provided for years as the default and there are now users of it > is not really feasible. > I sent a message to libc-alpha to obtain some opinions from other maintainers and developers [1]. [1] https://sourceware.org/pipermail/libc-alpha/2022-August/141302.html
> Does it also prevent us to add another possible HASH scheme in the future, taking that EAC might break if it does see a DT_GNU_HASH? The breakage here is not related to having DT_GNU_HASH, it's related to missing DT_HASH. I don't see how this is relevant. There were two more breakages caused by dropping DT_HASH: A native game - Shovel Knight: https://github.com/ValveSoftware/Proton/issues/6051#issuecomment-1212748397 Framerate limiter for OpenGL - libstrangle: https://bugs.gentoo.org/863863
(In reply to Arkadiusz Hiler from comment #13) > > Does it also prevent us to add another possible HASH scheme in the future, taking that EAC might break if it does see a DT_GNU_HASH? > > The breakage here is not related to having DT_GNU_HASH, it's related to > missing DT_HASH. I don't see how this is relevant. > > > > There were two more breakages caused by dropping DT_HASH: > > A native game - Shovel Knight: > https://github.com/ValveSoftware/Proton/issues/6051#issuecomment-1212748397 > > Framerate limiter for OpenGL - libstrangle: > https://bugs.gentoo.org/863863 It is relevant because we are discussing whether it is reasonable for the GNU ELF ABI extension to keep DT_HASH mandatory. From the generic ABI discussion [1], it seems that other maintainers find it reasonable to make DT_HASH optional if DT_GNU_HASH is present in GNU objects, although some are not comfortable making it optional in the generic ABI (which is not the case anyway). As Carlos has summarized [2], DT_GNU_HASH is the *de facto* standard hash scheme for GNU and it has been used in distro deployments for over 16 years. The generic ABI discussion has raised a gap that DT_GNU_HASH does not cover (the total symbol list size), which led to another gABI discussion about adding a new dynamic section type [3]. > Shifting responsibility on downstream to maintain compatibility with something > that glibc provided for years as the default and there are now users of it > is not really feasible. And it is already being done anyway [5]: in such cases compatibility is maintained by reverting upstream patches rather than by working with vendors to adapt their code to a decade-old standard. So I tend to agree that if DT_HASH compatibility is really required, it would be better to provide it through the default static linker option used to build glibc (so you also ensure that all installed shared objects have it), as Gentoo is doing [4]. 
[1] https://groups.google.com/g/generic-abi/c/th5919osPAQ [2] https://sourceware.org/pipermail/libc-alpha/2022-August/141304.html [3] https://groups.google.com/g/generic-abi/c/9L03yrxXPBc [4] https://sourceware.org/pipermail/libc-alpha/2022-August/141312.html [5] https://sourceware.org/pipermail/libc-alpha/2022-August/141313.html
The problem that programs are running into here seems really quite particular to libc. Normally, if someone is intercepting a function in a shared-lib, they'll use dlsym RTLD_NEXT to find the next (or original) version of the function. No problem, no custom symbol-table parsing required. However, if you intercept "dlsym", how do you then find the original "dlsym"? You cannot call dlsym to find it! Unfortunately, it seems that there's no great answer there... but a bunch of software has decided that the best choice is to dlopen libdl.so.2, and iterate the symbol table to find dlsym. And, sadly, even if you just iterate the symbol table in order, that still requires using DT_HASH to get the count of symbols, since (as discussed on the ABI list) there's no such thing as DT_SYMTAB_COUNT. This bug report from 2014 has a nice explanation: https://bugs.gentoo.org/527504 That code was indeed updated to support reading DT_GNU_HASH, later on, see https://github.com/mumble-voip/mumble/blob/master/overlay_gl/init_unix.c Anyhow, I think the discussion about whether the REST of the libs on the system should also continue to require DT_HASH is irrelevant, because they don't provide dlsym, and so folks aren't using this hack to look up symbols in them. The question to ask is really just: "Should glibc go back to forcing --hash-style=both for its own build?" And IMO, the answer should be yes, because the compat break isn't worth the tiny file-size savings.
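To make the symbol-count point concrete: with DT_HASH the count is simply the `nchain` field, whereas with DT_GNU_HASH it has to be derived by walking the chain of the highest-numbered bucket. The sketch below works on raw section bytes and assumes little-endian data with 32-bit hash words and, for DT_GNU_HASH, 64-bit Bloom filter words; the function names are hypothetical, and this is not the code any of the affected programs actually use.

```python
import struct


def symcount_from_sysv_hash(data: bytes) -> int:
    """DT_HASH starts with nbucket, nchain; nchain equals the symbol count."""
    _nbucket, nchain = struct.unpack_from("<II", data, 0)
    return nchain


def symcount_from_gnu_hash(data: bytes, word_size: int = 8) -> int:
    """Derive the symbol count from DT_GNU_HASH.

    Layout: nbuckets, symoffset, bloom_size, bloom_shift (4 x u32),
    then bloom words, then buckets (u32), then chain values (u32).
    """
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from("<IIII", data, 0)
    buckets_off = 16 + bloom_size * word_size
    chain_off = buckets_off + 4 * nbuckets
    buckets = struct.unpack_from(f"<{nbuckets}I", data, buckets_off)
    last = max(buckets, default=0)
    if last < symoffset:
        return symoffset  # no hashed symbols at all
    # Walk the chain of the last bucket; the low bit marks the chain's end.
    while True:
        (value,) = struct.unpack_from("<I", data, chain_off + 4 * (last - symoffset))
        if value & 1:
            return last + 1
        last += 1
```

The second function is noticeably more involved than the first, which is presumably part of why older interception code only ever read DT_HASH.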
(In reply to James Y Knight from comment #15) > The problem that programs are running into here seems really quite > particular to libc. > > Normally, if someone is intercepting a function in a shared-lib, they'll use > dlsym RTLD_NEXT to find the next (or original) version of the function. No > problem, no custom symbol-table parsing required. > > However, if you intercept "dlsym", how do you then find the original > "dlsym"? You cannot call dlsym to find it! Unfortunately, it seems that > there's no great answer there...but a bunch of software has decided that the > best choice is to dlopen libdl.so.2, and iterate the symbol table to find > dlsym. And, sadly, even if you just iterate the symboltable in order, that > still requires using DT_HASH to get the count of symbols, since (as > discussed on the ABI list) there's no such thing as DT_SYMTAB_COUNT. > > This bug report from 2014 has a nice explanation: > https://bugs.gentoo.org/527504 > > That code was indeed updated to support reading DT_GNU_HASH, later on, see > https://github.com/mumble-voip/mumble/blob/master/overlay_gl/init_unix.c Unfortunately, there is no direct way to override dlsym using the usual ELF approach of providing an LD_PRELOAD library that implements the symbols. You can accomplish it with an audit module, but that requires refactoring how the overriding library works, and until recently you could not rebind symbols in bind-now libraries (this was fixed in 2.35). By the way, mumble probably does not work with glibc 2.34, since dlsym was moved to libc.so. It would be better to include <gnu/lib-names.h> for glibc and try LIBDL_SO first and then LIBC_SO. > > Anyhow, I think the discussion about whether the REST of the libs on the > system should also continue to require DT_HSAH is irrelevant, because they > don't provide dlsym, and so folks aren't using this hack to lookup symbols > in them. The question to ask is really just: "Should glibc go back to > forcing --hash-style=both for its own build?" 
> > And IMO, the answer should be yes, because the compat break isn't worth the > tiny file-size savings. I agree this change is doing more harm than good. I brought up that I was willing to revert it, but Carlos asked me to hold off, since he is engaging with affected parties to see what kind of support they require in order to find a better solution.
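The "try LIBDL_SO first, then LIBC_SO" ordering suggested above can be sketched as follows. This only illustrates the lookup order using ctypes; it goes through the normal dynamic loader, so it does not reproduce the symbol-table walking that the interception code does. The library names are assumed to match what `<gnu/lib-names.h>` defines on x86_64 glibc, and `find_dlsym` is a hypothetical helper.

```python
import ctypes

# Assumed values of LIBDL_SO and LIBC_SO on x86_64 glibc.
CANDIDATES = ("libdl.so.2", "libc.so.6")


def find_dlsym(candidates=CANDIDATES, loader=ctypes.CDLL):
    """Return (library name, dlsym handle) for the first library exporting dlsym."""
    for name in candidates:
        try:
            lib = loader(name)
        except OSError:
            continue  # library not present on this system
        # Attribute lookup on a ctypes library fails if the symbol is missing.
        sym = getattr(lib, "dlsym", None)
        if sym is not None:
            return name, sym
    return None
```

The `loader` parameter exists only so the ordering logic can be exercised without depending on which libraries are installed.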
I think there needs to be a much higher bar for keeping changes that break real applications, especially applications that have little or no chance of getting fixed. This is part of maintaining a stable operating system. When something as integral as glibc decides to keep breaking changes, it damages GNU/Linux as a platform. I'm sure this exact debate has been had all over the internet in the past couple of weeks, and I don't want to rehash it here or advocate for some absolute principle in one direction or the other. But regardless of whose fault it is in any particular case, compatibility breakage (however intentional or accidental, necessary or frivolous, justifiable or unwarranted) is still a bad thing for both users and developers, and that should be assigned the appropriate weight in these decisions.
(In reply to John Brooks from comment #17) > I think there needs to be a much higher bar for keeping changes that break real applications, especially applications that have little or no chance of getting fixed. This is part of maintaining a stable operating system. When something as integral as glibc decides to keep breaking changes, it damages GNU/Linux as a platform. There was lots of misinformation about the glibc change. I created https://maskray.me/blog/2022-08-21-glibc-and-dt-gnu-hash See my comment on "Easy Anti-Cheat" and the new finding about Arch Linux package. I think glibc should close this issue now.
SUSE uses the linker default from GCC and configures binutils explicitly to default to 'both'. If glibc doesn't honor this in its default behavior I'd declare it broken (why should it use an explicit --hash-style that does not honor the system's default?)
(In reply to Richard Biener from comment #19) > SUSE uses the linker default from GCC and configures binutils explicitely to > default to 'both'. If glibc doesn't honor this in it's default behavior I'd > declare it broken (why should it use an explicit --hash-style not honoring > the > systems default?) Current glibc sources do not override the toolchain default. Older versions overrode it with --hash-style=both.
You are overlooking the main point boyos. People MUST be able to tell, in advance, that the ABI will be missing a previous component. The simple way to do that is just to increment the soname. If you can't do that, you MUST not reduce the ABI. You cannot rely on a component just not being used anymore, announcing the change in a blog, or even documenting it. The change needs to be explicit while compiling. Any other thing is a leap of faith into the void.
I'm marking this issue as RESOLVED WONTFIX, because the glibc build will include both hashes if that's how the distribution builds the ELF. It is up to the distributions to decide how to proceed with the availability of the hashes. There is no intent to fix this in glibc by forcing both hashes.