Bug 24636 - Export basic metadata about ABI compatibility
Summary: Export basic metadata about ABI compatibility
Status: RESOLVED NOTABUG
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: unspecified
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-05 06:18 UTC by Nathaniel J. Smith
Modified: 2021-09-21 03:13 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Nathaniel J. Smith 2019-06-05 06:18:48 UTC
It's well known that glibc follows strict backwards compatibility rules: if you build against glibc 2.X, then you can expect your binary to work when run against glibc 2.Y, so long as X <= Y.

I'm sure this is a lot of work, but it's a fabulous property, so thank you! One project that relies on this is the "manylinux" project for distributing precompiled Python packages on Linux. The way it works is:

1. We tag the built binaries with metadata about what glibc version they were compiled against
2. When a user wants to install a Python package, the installer uses the gnu_get_libc_version() function to figure out whether the current system is using glibc and if so which version, and then compares that to the metadata attached to each available binary to decide if it's a good candidate to install.
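The two steps above can be sketched in C roughly as follows. This is a hypothetical illustration, not the installer's real code (which is written in Python): `wheel_is_candidate` and `runtime_glibc_version` are invented names, the check encodes one reading of the rule at the top of this report (same major version, equal or newer minor version), and glibc detection is approximated by the compile-time `__GLIBC__` macro, whereas a real installer has to probe the running system.

```c
#include <stdbool.h>
#include <stdlib.h>           /* any libc header defines __GLIBC__ on glibc */
#ifdef __GLIBC__
#include <gnu/libc-version.h> /* declares gnu_get_libc_version() */
#endif

/* Step 2, hypothetical helper: a binary tagged as built against glibc
 * REQ_MAJOR.REQ_MINOR is a candidate if the running system reports the
 * same major version and an equal or newer minor version. */
static bool wheel_is_candidate(int req_major, int req_minor,
                               int sys_major, int sys_minor)
{
    return req_major == sys_major && req_minor <= sys_minor;
}

/* Runtime version string of the C library, or NULL if this program was
 * not built against glibc (so there is nothing to compare against). */
static const char *runtime_glibc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return NULL;
#endif
}
```

Requiring an exact major-version match is a conservative choice: it never installs a 2.x binary on a hypothetical 3.0 system, at the cost of rejecting binaries that might have worked.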

(Well, there are a lot more details to making this work in practice, but that's the fundamental idea.)

This system is popular, and works really well in practice. Last month (May), Python users used glibc version metadata to download ~260 million of these precompiled packages.

There are some challenges though:

Using gnu_get_libc_version() like this is a bit awkward, especially because in practice some vendors like to stick random strings in there. (We have an empirically derived regex, basically just matching "2\.[0-9]+" and then ignoring anything after the first non-numeric character.)
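A minimal C sketch of that parsing step, assuming the behaviour described above: read the leading `MAJOR.MINOR` and ignore everything after the first non-digit of the minor number. The helper name `parse_glibc_version` is hypothetical, and unlike the regex it accepts any major number rather than only `2`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the leading "MAJOR.MINOR" of a glibc version string such as
 * "2.28" or "2.20-2014.11". Anything after the minor number's digits is
 * treated as a vendor suffix and ignored. Returns false if the string
 * does not start with digits, a '.', and more digits. */
static bool parse_glibc_version(const char *s, int *major, int *minor)
{
    char *end;

    if (!isdigit((unsigned char)s[0]))
        return false;
    long maj = strtol(s, &end, 10);
    if (*end != '.' || !isdigit((unsigned char)end[1]))
        return false;
    long min = strtol(end + 1, &end, 10);
    /* *end may now point at "-2014.11", ".1", etc.; deliberately ignored */
    *major = (int)maj;
    *minor = (int)min;
    return true;
}
```

For example, `"2.5.1"` parses as major 2, minor 5, while `"glibc 2.17"` is rejected because it does not start with a digit.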

It's not clear if the glibc maintainers intended this function to be used programmatically like this, or intend to preserve the invariants that we rely on.

And in particular, Zack Weinberg recently told us that there might some day be a glibc 3.0 that kept the soname and 99% ABI compatibility, but dropped some ancient ABIs: https://github.com/pypa/manylinux/pull/304#discussion_r290046198
Currently this would be a catastrophic compatibility break for us, because our metadata code doesn't know how to handle 3.X versions.

After discussing it with him, I'd like to propose that glibc add a new function:

  void gnu_get_libc_abi_levels(int *build_abi, int *min_runtime_abi, int *max_runtime_abi)

Basically this is a function that reports three integers through its pointer arguments. build_abi specifies the "ABI level" of binaries built against this glibc. min_runtime_abi and max_runtime_abi specify the range of "ABI levels" of binaries that this glibc supports running.

If I build against a glibc whose build_abi is X, and then run against a glibc whose min_runtime_abi and max_runtime_abi are Y and Z respectively, I can expect the binary to work iff Y <= X <= Z.

The implementation is trivial of course:

  void gnu_get_libc_abi_levels(int *build_abi, int *min_runtime_abi, int *max_runtime_abi) {
      if (build_abi)
          *build_abi = __GLIBC_MINOR__;
      if (min_runtime_abi)
          *min_runtime_abi = 0;
      if (max_runtime_abi)
          *max_runtime_abi = __GLIBC_MINOR__;
  }

And we retroactively declare that for previous versions of glibc that didn't include this function, their build abi matches __GLIBC_MINOR__.
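Under the proposal as written, an installer-side check might look like the sketch below. `gnu_get_libc_abi_levels` does not exist in glibc, so this code does not call it; `abi_level_ok` and `legacy_abi_levels` are hypothetical helpers that encode the `Y <= X <= Z` rule and the retroactive convention for older releases (taking the minimum runtime level to be 0, as in the implementation above).

```c
#include <stdbool.h>

/* The proposed rule: a binary built at ABI level X runs on a glibc that
 * reports [Y, Z] as its supported runtime range iff Y <= X <= Z. */
static bool abi_level_ok(int build_abi, int min_runtime_abi, int max_runtime_abi)
{
    return min_runtime_abi <= build_abi && build_abi <= max_runtime_abi;
}

/* Retroactive convention for a glibc 2.x without the new function:
 * its runtime range is [0, MINOR], where MINOR is its minor version
 * (for example, obtained by parsing gnu_get_libc_version()). */
static void legacy_abi_levels(int runtime_minor,
                              int *min_runtime_abi, int *max_runtime_abi)
{
    *min_runtime_abi = 0;
    *max_runtime_abi = runtime_minor;
}
```

With a legacy 2.28 system the range is [0, 28], so a binary at level 17 passes and one at level 29 does not.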

Of course there are a bunch of bikesheddable details here. The most crucial part is that there be some way to, at runtime, fetch the range of supported ABI levels.

Compared to the current system:

- This lets us (someday, eventually) drop our version string parsing code,
- It makes your public ABI compatibility commitments more obvious
- And it gives you the option of someday increasing the min_runtime_abi version without breaking everyone using Python
Comment 1 joseph@codesourcery.com 2019-06-06 20:39:30 UTC
On Wed, 5 Jun 2019, njs at pobox dot com wrote:

> Using gnu_get_libc_version() like this is a bit awkward, especially because in
> practice some vendors like to stick random strings in there. (We have an
> empirically-derived regex, basically just matching "2\.[0-9]+" and then
> ignoring anything after the first non-numeric digit.)

The right place for vendor-specific information is in PKGVERSION 
(configure --with-pkgversion=<something>), which affects the banner you 
get when you run libc.so.6, but not the result of gnu_get_libc_version.

> After discussing it with him, I'd like to propose that glibc add a new
> function:
> 
>   void gnu_get_libc_abi_levels(int *build_abi, int *min_runtime_abi, int
> *max_runtime_abi)
> 
> Basically this is a function that returns 3 integers. build_abi specifies the
> "ABI level" of binaries built against this glibc. min_runtime_abi and
> max_runtime_abi specify the supported "ABI levels" of binaries run against this
> glibc.

I'm not convinced ABI levels are a defined concept like that, at least not 
as integers.  (We have symbol versions, and an ordering relation between 
them.  The particular set of symbol versions may depend on the 
architecture.  Up to 2.3.x there were sometimes new symbol versions in 
point releases.  Although we don't currently do point releases, and 
haven't had new symbol versions in them for a very long time, it's not 
obvious to me that there will never be a case in future for doing point 
releases and adding symbol versions in them.  Say, if some security issue 
shows up an API design issue and it's concluded to be important to add a 
new API quickly including into older versions.)
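To make the "ordering relation" concrete: point-release symbol versions such as `GLIBC_2.3.4` sit between `GLIBC_2.3` and `GLIBC_2.4`, so they do not fit on a single integer scale. The sketch below compares version names component by component. It assumes the ordering can be approximated numerically from the name alone; in glibc itself the set of versions comes from its build configuration and can differ per architecture, as noted above. `symver_cmp` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Read one numeric component and step past a following '.'. A missing
 * component counts as 0; on malformed input *p jumps to the end of the
 * string so the caller's loop always terminates. */
static long next_component(const char **p)
{
    char *end;
    long v;

    if (**p == '\0')
        return 0;
    v = strtol(*p, &end, 10);
    if (end == *p) {
        *p += strlen(*p);
        return 0;
    }
    *p = (*end == '.') ? end + 1 : end;
    return v;
}

/* Compare names like "GLIBC_2.3.4" and "GLIBC_2.4" the way strcmp does:
 * negative if a is older, zero if equal, positive if a is newer. */
static int symver_cmp(const char *a, const char *b)
{
    static const char prefix[] = "GLIBC_";
    const size_t n = sizeof prefix - 1;

    if (strncmp(a, prefix, n) == 0)
        a += n;
    if (strncmp(b, prefix, n) == 0)
        b += n;
    while (*a != '\0' || *b != '\0') {
        long x = next_component(&a);
        long y = next_component(&b);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```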

The min_runtime_abi concept is questionable.  We removed the 
--enable-oldest-abi option years ago as bitrotten (bug 6652).  Any 
suggested slightly-incompatible changes would *not* remove GLIBC_2.0 
symbols in general; they might very selectively remove certain 
compatibility features that quite likely could not be associated with a 
symbol version at all, and would not have anything we could define in 
advance as an ABI level that might later be removed (there wouldn't be a 
total ordering between such compatibility features, either, which rather 
prevents defining their presence or absence by such a minimum ABI level).

The nearest thing we have to a minimum ABI level is the minimum symbol 
version - but any change of that is *completely* incompatible (replaces 
symbol versions for every symbol at the old version, so would indicate a 
new SONAME or dynamic linker name).

What *is* clearly defined is the __GLIBC__ and __GLIBC_MINOR__ integer 
values, so there could be a C API to provide those (if such an API is 
useful).  The comparison should not treat 3.0 as being incompatible with 
2.x, just as being later (as any slightly-incompatible changes without 
change of SONAME would be such that very few binaries would be likely to 
be affected, and any (unlikely) change of SONAME would mean a program 
built for a different SONAME of glibc simply wouldn't run).
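The comparison described here treats versions as ordered `(major, minor)` pairs, so 3.0 is simply later than any 2.x rather than incompatible with it. A minimal sketch, assuming the SONAME and platform ABI have already been matched by other means; `glibc_version_cmp` is a hypothetical name.

```c
/* Order glibc versions as (major, minor) tuples, like strcmp:
 * negative if a < b, zero if equal, positive if a > b. */
static int glibc_version_cmp(int a_major, int a_minor, int b_major, int b_minor)
{
    if (a_major != b_major)
        return a_major < b_major ? -1 : 1;
    if (a_minor != b_minor)
        return a_minor < b_minor ? -1 : 1;
    return 0;
}
```

Under this ordering, a binary built against `(bm, bn)` would be expected to run on `(rm, rn)` whenever `glibc_version_cmp(bm, bn, rm, rn) <= 0`.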
Comment 2 Nathaniel J. Smith 2019-06-07 01:50:16 UTC
> The right place for vendor-specific information is in PKGVERSION 
> (configure --with-pkgversion=<something>), which affects the banner you 
> get when you run libc.so.6, but not the result of gnu_get_libc_version.

That's nice to know, but it turns out not all vendors got the message, and some of their users use Python... https://github.com/pypa/pip/issues/3588

> I'm not convinced ABI levels are a defined concept like that, at least not 
> as integers.

I see what you mean. Symbol versions are great, and definitely give more fine-grained information. But dumping full symbol version information into every package's metadata doesn't work very well. The idea of an "ABI level" is to provide a shorthand name for some common collections of symbol versions.

And pragmatically speaking, there are millions of systems relying on __GLIBC_MINOR__ to act as an ABI level right now, so they're defined in that sense :-). Fortunately, I don't think we need them to do as much as symbol versions do, so it can work.

Let's imagine a hypothetical 2.40 that turns out to have urgent bugs, so we end up with a 2.40.1 release that includes some @GLIBC_2.41 symbols.

The easy case is where you also rush out 2.41 with the exact same symbols as 2.40.1. In this case, it just means that binaries built against 2.40.1 require ABI level 41, and 2.40.1 supports ABI levels [0, 41].

The trickier case is where the final 2.41 includes other new symbols that didn't make it into 2.40.1. In that case, I guess the best the metadata could do would be to say that binaries built against 2.40.1 have ABI level 41, and that 2.40.1 supports ABI levels [0, 40]. That looks really odd, because it seems to suggest that if you build against 2.40.1, you might not be able to run against 2.40.1. But if you obey what it's telling you, you will end up with a working configuration; it just isn't precise enough to tell you about some other configurations that would also work.

This would be totally fine for us: we're OK with occasionally not installing a binary that would have worked, so long as we avoid installing binaries that don't work. So if we end up only installing 2.40.1 binaries on systems that have 2.41, that's no big deal. Also, when building binaries we have a lot of control over the environment: we choose the distro and the compiler, we have tooling that audits symbol versions, etc. So we'd probably just not use 2.40.1 for builds.

Where we really need reliable automated rules is in the installer, since that runs on random end users' systems that we don't control, including systems that haven't been released yet. So our core requirement is: some metadata we can query at runtime that gives a conservative estimate of which binaries will work with the current glibc. We wouldn't actually use the build_abi runtime query API; the crucial thing would be for the glibc devs to have chosen a build_abi value for each release, so that later on the runtime ABI range has a way to refer to that release.

> The min_runtime_abi concept is questionable

This would be different from --enable-oldest-abi, because it wouldn't be configurable at build time; it'd just be taking whatever decision you'd made about abi compatibility, and making it more visible to the rest of us. And it's different from changing the minimum symbol version, because it doesn't refer to symbols, it refers to binaries built against those old versions. So bumping the min_runtime_abi to, say, 3, would just mean "if a binary was built against glibc 2.2 or earlier, we no longer guarantee that it works".

I don't know if that's the best approach – really what we want is the glibc devs to make a commitment about how to tell which glibc versions are compatible with which other glibc versions, that's precise enough to encode in software. What exactly that commitment might look like is a policy question for glibc; my proposal was just a guess based on what Zack told me :-).

An API to fetch __GLIBC__ and __GLIBC_MINOR__ at runtime would be somewhat useful, because it would let us stop parsing strings, but it doesn't really touch on the "commitment" part. Your statement here in the tracker about what a hypothetical future glibc 3.0 would mean is definitely helpful, but I do wonder a little whether the glibc devs of 2029 will feel themselves bound to match a one-off comment in an issue from 2019. And maybe if you had a way to tell software like ours *which* binaries were broken by 3.0, instead of having to handwave and say "probably not many, don't worry about it", then that would make it easier to ship 3.0? I don't have a strong conclusion here; these are just things I'm thinking about.
Comment 3 joseph@codesourcery.com 2019-06-10 17:02:50 UTC
On Fri, 7 Jun 2019, njs at pobox dot com wrote:

> Let's imagine a hypothetical 2.40 that turns out to have urgent bugs, so we end
> up with a 2.40.1 release that includes some @GLIBC_2.41 symbols.

In that case I expect we'd use @GLIBC_2.40.1 instead.

> This would be different from --enable-oldest-abi, because it wouldn't be
> configurable at build time; it'd just be taking whatever decision you'd made
> about abi compatibility, and making it more visible to the rest of us. And it's
> different from changing the minimum symbol version, because it doesn't refer to
> symbols, it refers to binaries built against those old versions. So bumping the
> min_runtime_abi to, say, 3, would just mean "if a binary was built against
> glibc 2.2 or earlier, we no longer guarantee that it works".

But that's simply not how slightly-incompatible changes work in practice.  
It's not generally removing features that might only be used by binaries 
built with a given glibc version.  It's removing features that might only 
be used by C++ binaries built with GCC 2.95 or earlier, for example 
(independent of the glibc version they were built with).  Or removing 
features that might be used by binaries built with any glibc version, but 
are sufficiently obscure we think that is unlikely to be relevant in 
practice - take the recent discussion of the copy_file_range emulation for 
older kernels, for example; that would be removing a feature 
"copy_file_range sometimes works without ENOSYS on older kernels".

Because there is no total ordering for such features and no relation in 
general to particular old glibc versions, a minimum ABI can't really be 
defined in a way that could usefully change to reflect such 
slightly-incompatible changes.
Comment 4 Nathaniel J. Smith 2019-06-11 09:12:50 UTC
Fair enough.

Here's another idea that occurred to me, that I'll throw out here. glibc could provide an API that lets you explicitly query whether the current glibc thinks it can run a binary built against a given version:

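  #include <features.h>   /* __GLIBC__, __GLIBC_MINOR__ */
  #include <stdbool.h>

  bool
  gnu_get_libc_can_run (int build_major, int build_minor)
  {
    /* (build_major, build_minor) <= (__GLIBC__, __GLIBC_MINOR__) as tuples */
    return (build_major < __GLIBC__
            || (build_major == __GLIBC__ && build_minor <= __GLIBC_MINOR__));
  }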

If we had this we wouldn't even need a way to query __GLIBC__ and __GLIBC_MINOR__, because we're only querying them so we can implement this ourselves :-). Basically it would let us get rid of all this logic:

  https://github.com/pypa/pip/blob/5776ddd05896162e283737d7fcdf8f5a63a97bbc/src/pip/_internal/utils/glibc.py#L40-L62

It would also give the glibc devs full control over expressing whatever compatibility guidelines they want to commit to. (You'll notice that the code I linked to assumes that 3.x and 2.x are incompatible, which you're saying is wrong, so I guess we have a poor track record at reading the glibc devs' minds!)
Comment 5 joseph@codesourcery.com 2019-06-13 17:09:01 UTC
On Tue, 11 Jun 2019, njs at pobox dot com wrote:

> Here's another idea that occurred to me, that I'll throw out here. glibc could
> provide an API that lets you explicitly query whether the current glibc thinks
> it can run a binary built against a given version:
> 
>   bool
>   gnu_get_libc_can_run (int build_major, int build_minor)
>   {
>     return (build_major < __GLIBC__
>             || (build_major == __GLIBC__ && build_minor <= __GLIBC_MINOR__));
>   }

Note that such logic is only valid on the assumption that both versions 
are using the same SONAME, and the same one of the ABIs listed at 
<https://sourceware.org/glibc/wiki/ABIList>.  (Some ABI incompatibilities 
are checked for by glibc dynamic linker code, but not all.)  For example, 
on Arm it would happily report being able to run binaries built with glibc 
2.0, but any Arm binaries built with a version before 2.4 would be using 
the old ABI instead of EABI, and so certainly not able to run with current 
glibc (and for a while, both ABIs were supported before old-ABI support 
was removed).  Or on x86 it would claim support for glibc 1.x binaries, 
which aren't compatible with 2.x (different SONAME, different dynamic 
linker, etc.).

I'm not clear on the context in which you'd be calling such a function.  
Would it already be guaranteed that a case of non-matching SONAME or 
non-matching ABI either never reached this code, or is not something it 
needs to care about?
Comment 6 Nathaniel J. Smith 2019-06-16 02:02:08 UTC
Regarding SONAMEs: those are implied by the glibc version number, right? So I guess they should be handled by this function. We don't have a separate SONAME check, and would rather not add one. You're right though that I was sloppy about handling glibc 1.x. So I guess a better version would be:

   #include <features.h>   /* __GLIBC__, __GLIBC_MINOR__ */
   #include <stdbool.h>

   bool
   gnu_get_libc_can_run (int build_major, int build_minor)
   {
     if (build_major < 2)
       {
         /* glibc 1.x had a totally different ABI */
         return false;
       }
     else
       {
         /* for glibc 2.x or later, the rule is simply
          *   (build_major, build_minor) <= (runtime_major, runtime_minor)
          * where <= is tuple comparison.
          */
         return (build_major < __GLIBC__
                 || (build_major == __GLIBC__ && build_minor <= __GLIBC_MINOR__));
       }
   }

Regarding low-level ABI differences (architecture, calling convention, etc.): for the Python packaging case, our metadata has a platform ABI tag that we check separately, so we can assume that that's already been handled. That said, so far we've only supported x86 and x86-64, so there are probably some exciting surprises waiting for us as we start supporting architectures like ARM. Maybe we'll discover that glibc could do something to help here (maybe a gnu_get_libc_supported_abis, or something like that?). But I think we can treat that as an independent discussion.
Comment 7 joseph@codesourcery.com 2019-06-17 15:24:29 UTC
On Sun, 16 Jun 2019, njs at pobox dot com wrote:

> Regarding SONAMEs: those are implied by the glibc version number, right?

They're implied by the glibc ABI (from the list at 
<https://sourceware.org/glibc/wiki/ABIList>).  Different glibc versions 
support different sets of ABIs.  (Given that glibc 1.x used a disjoint set 
of ABIs and SONAMEs, it's thus questionable whether the function does need 
to handle it or not.)
Comment 8 Nathaniel J. Smith 2019-06-17 18:53:10 UTC
Ok, I guess I meant that *given a platform ABI*, the version number implies the soname?

I don't really care that much about glibc 1.x honestly. Nobody is shipping glibc 1.x-based binaries or systems, and it doesn't affect my use cases at all. So you can handle it however you like.

I'm not sure I understand what you're trying to figure out here.
Comment 9 joseph@codesourcery.com 2019-06-17 19:18:10 UTC
On Mon, 17 Jun 2019, njs at pobox dot com wrote:

> Ok, I guess I meant that *given a platform ABI*, the version number implies the
> soname?

A platform ABI implies the SONAME.  You don't need the version number.

> I'm not sure I understand what you're trying to figure out here.

Whether a simple version number comparison would actually address your 
problem (with all comparisons of the platform ABI - including all 
distinguishing of the different ABI variants listed at 
<https://sourceware.org/glibc/wiki/ABIList> - being the responsibility of 
something else, not the function in glibc).
Comment 10 Nathaniel J. Smith 2019-06-17 19:39:34 UTC
Right now our platform ABI tags on Linux are literally just the two strings "x86" and "x86_64". I guess glibc 1.x did support "x86", maybe with a slightly different calling convention? But that's not really important anymore.

The interesting cases will be as we add more ARM support. Since we get to choose what tags we use, we can make our tags as fine-grained as necessary. So I'm guessing that yes, we can arrange things so that we do one ABI check using those tags, combine that with a version check using this function, and together that will take care of everything. But I'm definitely not an expert on the fine details of ARM ABIs. Do you foresee any problems if we split up responsibilities like that?
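The split of responsibilities being proposed can be sketched as two independent checks. Everything here is hypothetical: the tag strings are opaque labels chosen by the packaging tools (only "x86" and "x86_64" exist today, per the comment above), and the version test follows the tuple rule from the improved `gnu_get_libc_can_run` sketch, re-implemented locally because that function is only a proposal.

```c
#include <stdbool.h>
#include <string.h>

/* Check 1 (packaging tools): exact match on an opaque platform ABI tag.
 * Any distinction between ABI variants must be encoded in the tag itself. */
static bool platform_tag_ok(const char *wheel_tag, const char *system_tag)
{
    return strcmp(wheel_tag, system_tag) == 0;
}

/* Check 2 (glibc's job under the proposal): version ordering only. */
static bool glibc_version_ok(int build_major, int build_minor,
                             int run_major, int run_minor)
{
    if (build_major < 2)
        return false;   /* glibc 1.x binaries are never accepted */
    return build_major < run_major
           || (build_major == run_major && build_minor <= run_minor);
}

/* A binary is installable only if both independent checks pass. */
static bool wheel_installable(const char *wheel_tag, int build_major, int build_minor,
                              const char *system_tag, int run_major, int run_minor)
{
    return platform_tag_ok(wheel_tag, system_tag)
           && glibc_version_ok(build_major, build_minor, run_major, run_minor);
}
```

Keeping the tag check a plain string comparison means new architectures only require new tags, while new glibc releases require no changes at all.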
Comment 11 joseph@codesourcery.com 2019-06-17 21:42:10 UTC
On Mon, 17 Jun 2019, njs at pobox dot com wrote:

> Right now our platform ABI tags on Linux are literally just the two strings
> "x86" and "x86_64". I guess glibc 1.x did support "x86", maybe with a slightly
> different calling convention? But that's not really important anymore.

The glibc notion of different ABIs effectively treats that as being a 
different platform (and, likewise, Arm old-ABI and EABI as different 
platforms, MIPS classic-NaN and NaN2008 as different platforms, etc.).

Note, incidentally, that the choice between BE8 and BE32 for Arm 
big-endian is a choice made when the static linker is run - code being 
built for Arm big-endian can't tell when compiled which version of 
big-endian will be chosen at link time.
Comment 12 Nathaniel J. Smith 2019-06-18 01:46:26 UTC
OK, so it sounds like the answer is yes, a version-number-only function like this would be helpful, and we can take care of ABI differences separately.

(The big thing about version numbers is that they change all the time, and we don't want to have to patch and deploy new packaging tools every time glibc makes a release. But if we have to patch and deploy new packaging tools to enable support for a new microarchitecture/calling convention/etc., that's fine.)
Comment 13 Carlos O'Donell 2021-09-21 03:13:30 UTC
I have reviewed this issue, and it is my opinion that Joseph has answered all of Nathaniel's questions.

The mythical "glibc 3.0" is more than likely based upon Florian Weimer's GNU Tools Cauldron 2017 talk "glibc 3.0":
https://gcc.gnu.org/wiki/cauldron2017#glibc30
https://slideslive.com/38902629/glibc-30

The talk was meant to start a lively discussion about features that might be deprecated. Please review the talk if you have questions about where the community might go. The kinds of deprecation we're talking about for glibc 3.0 should not affect any Python application built today.

I'm marking this issue RESOLVED/NOTABUG since the existing mechanisms that the Python community is using for Wheels should continue to work in the future as discussed in PEP-600 (https://www.python.org/dev/peps/pep-0600/).

If a future glibc drops an old symbol, manylinux_2_5 binary wheels could become incompatible with future modern distributions. A compatibility resolver would need to know about that, and would then install a manylinux_X_Y wheel whose X and Y are closer to the modern distribution instead. This hasn't happened yet, but when it does, I'd expect that whatever we deprecate will be unused or untestable, and so won't affect the existing wheels.
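A resolver following the PEP 600 scheme could pick the most recent compatible tag roughly as follows. This is a simplified sketch, not pip's implementation: `best_manylinux_tag` is a hypothetical helper, it assumes tags of the form `manylinux_X_Y_ARCH` (e.g. `manylinux_2_17_x86_64`), applies only the version-ordering rule, and ignores the rest of PEP 600, such as the legacy aliases like `manylinux2014`.

```c
#include <stdio.h>
#include <string.h>

/* Return the index of the newest tag in TAGS[0..N) usable on a system with
 * glibc SYS_MAJOR.SYS_MINOR on architecture ARCH, or -1 if none is usable. */
static int best_manylinux_tag(const char *const *tags, int n, const char *arch,
                              int sys_major, int sys_minor)
{
    int best = -1, best_major = 0, best_minor = 0;

    for (int i = 0; i < n; i++) {
        int major, minor;
        char tag_arch[32];

        if (sscanf(tags[i], "manylinux_%d_%d_%31s", &major, &minor, tag_arch) != 3)
            continue;                 /* not a manylinux_X_Y_ARCH tag */
        if (strcmp(tag_arch, arch) != 0)
            continue;                 /* built for another architecture */
        if (major > sys_major || (major == sys_major && minor > sys_minor))
            continue;                 /* needs a newer glibc than the system has */
        if (best < 0 || major > best_major
            || (major == best_major && minor > best_minor)) {
            best = i;                 /* closest to the running glibc so far */
            best_major = major;
            best_minor = minor;
        }
    }
    return best;
}
```

On a glibc 2.27 x86_64 system, for instance, a choice between `manylinux_2_5_x86_64`, `manylinux_2_17_x86_64` and `manylinux_2_28_x86_64` would settle on `manylinux_2_17_x86_64`.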