This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: glibc 2.19 status?
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Allan McRae <allan at archlinux dot org>, Paul Pluzhnikov <ppluzhnikov at google dot com>, Andrew Hunter <ahh at google dot com>
- Cc: Roland McGrath <roland at hack dot frob dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Tue, 04 Feb 2014 23:07:00 -0500
- Subject: Re: glibc 2.19 status?
- References: <52E649BF dot 5020400 at archlinux dot org> <20140128205657 dot 16DBA74438 at topped-with-meat dot com> <52E9DEB7 dot 4000709 at redhat dot com> <52E9E84F dot 50907 at redhat dot com> <52EA682D dot 90900 at archlinux dot org> <52F03BEC dot 1020202 at archlinux dot org> <52F062C5 dot 6050705 at redhat dot com> <52F06713 dot 1040005 at archlinux dot org> <Pine dot LNX dot 4 dot 64 dot 1402050004130 dot 25166 at digraph dot polyomino dot org dot uk> <20140205001815 dot B59AB7444A at topped-with-meat dot com>
On 02/04/2014 07:18 PM, Roland McGrath wrote:
>> Well, what we should not do is sit around indefinitely delaying the
>> release! Revert the changes, run the testsuite on x86_64 and x86, commit
>> the reversion and start the process for the actual release. It's clear we
>> do not have consensus to keep the changes in 2.19, which is what matters.
>
> Agreed.
>
>> We can discuss later in what form such changes might come back for 2.20
>> (on the whole my view is that the problems are fundamental to the approach
>> of signal-safe allocation and would best be avoided by the approach of
>> allocating at dlopen / pthread_create time - where objects opened with the
>> old symbol version of dlopen, or using a new RTLD_LAZY_TLS flag, keep lazy
>> TLS but do without signal-safety). I think providing better interfaces
>> for tools to identify memory allocated by glibc is a good idea, but
>> largely orthogonal to solving the TLS signal-safety problem.
>
> Broadly agreed with some of the details to be argued later.
I will not put forth a sustained objection to the reversion of
the current AS-Safe TLS patches. I feel like we had consensus
from the submitters and reviewers and that the fix solved a
real and immediate problem.
I agree with Joseph that there are alternative solutions
to this problem. However, my worry is that nobody has signed up
to implement those considerably more complex alternatives
(which come with no guarantee that they won't break ASAN).
The solution we have today is good and solves the problem.
I disagree with Roland; my opinion is that he is being
ultra-conservative while I am being merely conservative. It is a
difference of opinion. Although our positions are similar,
I feel his has problematic long-term maintenance
consequences (discussed below).
It seems as though Joseph and Roland object for different
reasons: Joseph because the solution still has the
potential to fail at runtime in odd ways, and Roland because
we are not sufficiently conservative. I don't know that we will
be able to satisfy their requests any time soon, if ever.
I will be back in 2.20 to champion the re-inclusion of the
AS-Safe TLS patches from Paul and Andrew.
My more detailed comments are as follows:
(1) Are glibc internals considered fixed ABIs?
(a) "Yes, the internal interfaces are undocumented ABIs."
* Many unknown tools can rely on these interfaces; changing them
breaks things we are not aware of.
* These interfaces are undocumented subtleties that we must change
only very slowly and conservatively.
* If the interfaces change they need to change only after slow
and detailed review, and that happens only after several releases
of notification that such interfaces are going to change.
(b) "No, the internal interfaces are not ABIs; we can break them to fix
bugs."
* External tools must not rely on internal implementation details.
* Tools must work with glibc to define tooling APIs to provide
supportable and stable interfaces for capturing events of interest
to the tools.
* The community must work with tool vendors to ensure that there are
workarounds for any changes that allow the newest version of the
tool to work with the newest version of glibc. We provide no
backwards compatibility when it comes to internals and their
implementation.
I argue (b) is the choice that reduces future maintenance for the
project, allows us to make internal changes to fix bugs, and gives
us the flexibility to expand glibc in ways which benefit all of
our users.
How does it reduce future maintenance?
If we have to maintain all internal interfaces as potentially
useful points of interposition by external tools, say malloc
interposition, then all future solutions to fix bugs must also
have this property. That complicates the requirements of fixes
that would otherwise simply change internal implementation details.
In the case of making TLS from dlopen'd modules AS-Safe, the only
robust solution is to throw away lazy initialization. That is
a lot of work, and you can see how (a) imposes this huge maintenance
burden on the internals of the library. Thus (b) carries less maintenance
burden for the project. However, it means we need to actively
engage with third-party tool authors to talk about sensible tooling
APIs.
How does it allow us to make internal changes to fix bugs?
We know tools interact with glibc through interposition of
symbols, or a fixed API. Period. That's easy to review when
fixing bugs.
How does it allow us to expand glibc in new directions?
The internals are unconstrained by unknown, undocumented
application requirements.
I can't tell if Roland agrees with (b) but is taking the
ultra-conservative approach that anything in the internals, including
the ability to rely on interposing malloc for internal allocations,
is part of an expected ABI. Thus, while agreeing with (b), the
position is that we must instead document where we might allow
symbol interposition and not remove those points without serious
consideration. I find this approach too conservative, sorry.
(2) Current ASAN and LeakSanitizer are fixed.
- Using new glibc 2.19? Upgrade ASAN.
- Using an older glibc (before 2.19)? Use any ASAN you want.
Kostya has a fix that enables LeakSanitizer to work with 2.19. It is
true that old ASAN with new glibc will not work correctly, but that is
the reality when you deal with undocumented glibc internals. Once we
have a stable API, then I will stand behind this working correctly.
The balancing act here is:
Break old ASAN with new glibc
vs.
Unsafe first access to TLS variables used in signal handlers from
dlopen'd code.
My opinion is that, while it is terrible that we broke ASAN,
upgrading ASAN is infinitely easier than asking users to rewrite
their use of TLS variables.
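For illustration only, here is a minimal sketch in C of the user-side
pattern at stake: a signal handler whose only work is touching a TLS
variable. It is not taken from Paul and Andrew's patches, and the names
tls_hits, on_signal and count_signal_hits are hypothetical. Built into
the main executable, as here, the variable's storage exists before the
handler can run, so the access is harmless. The case under discussion is
the same code living in a dlopen'd module, where the first access in a
thread may have to allocate the TLS block lazily, and that allocation is
the part that is not AS-Safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* TLS variable in the main executable: its storage is set up with the
   thread, so touching it from a handler involves no allocation.
   (Assumption: in a dlopen'd module the first access could allocate.) */
static __thread volatile sig_atomic_t tls_hits;

/* Handler whose only work is a TLS access. */
static void on_signal(int signo)
{
    (void)signo;
    tls_hits++;
}

/* Install the handler, raise SIGUSR1 n times on the calling thread, and
   return how many times the handler ran, or -1 if setup failed. */
int count_signal_hits(int n)
{
    struct sigaction sa;

    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    tls_hits = 0;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);   /* handler runs before raise() returns */
    return (int)tls_hits;
}
```

Calling count_signal_hits(3) delivers three signals synchronously and
reports three handler runs. Asking users to avoid this pattern in
dlopen'd code is the "rewrite their use of TLS variables" option.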
(3) Create stable tools APIs.
What should be happening is open discussion and the creation of stable
APIs for use by tool authors.
Kostya, Rich, myself and others have been working on what the API should
look like, and Kostya has already started documenting a design for 2.20
here:
https://sourceware.org/glibc/wiki/ThreadPropertiesAPI
Keeping implicit ABIs stable is a recipe for disaster and I do not
condone it at any level. It is poor engineering practice, difficult to
maintain, and locks down the implementation in undocumented ways.
(4) TLS access should never fail at runtime.
Joseph and Rich have both argued that TLS access should never fail
at runtime. While that is a good goal it seems contrary to the
scalability goals that some users have regarding DSO loading, TLS,
and threads. As Joseph suggests, it might be that the lazy behaviour
is invoked via a new dlopen flag. That still means we need a fix
for dlopen with the alternate flag, and that fix still breaks ASAN.
Thus we have no real reason to reject Paul and Andrew's patch to
make GNU TLS fully AS-Safe. Similar arguments apply to GNU2 TLS,
and forcing those allocations to happen at dlopen time for all
required descriptors.
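The difference between allocating at dlopen time and allocating lazily
can be sketched with a toy model in C. This is not glibc code: struct
tls_slot and the slot_* functions are hypothetical names, and calloc
merely stands in for whatever allocator a real implementation would use.
The point is only where the allocation, and therefore the possible
failure, happens.

```c
#include <stdlib.h>
#include <stddef.h>

/* Toy stand-in for one module's TLS block in one thread. */
struct tls_slot {
    size_t size;   /* bytes the module needs */
    void  *block;  /* NULL until allocated */
};

/* Eager: allocate at (simulated) dlopen/pthread_create time, where a
   failure can simply be reported to the caller. Returns 0 or -1. */
int slot_init_eager(struct tls_slot *s, size_t size)
{
    s->size = size;
    s->block = calloc(1, size);
    return s->block != NULL ? 0 : -1;
}

/* Lazy: only record the size; allocation is deferred to first access. */
void slot_init_lazy(struct tls_slot *s, size_t size)
{
    s->size = size;
    s->block = NULL;
}

/* Access path. After eager setup this is a plain load. After lazy setup
   the first call allocates, so it can fail at runtime and must not be
   reached from a signal handler unless the allocator is AS-Safe. */
void *slot_get(struct tls_slot *s)
{
    if (s->block == NULL)
        s->block = calloc(1, s->size);
    return s->block;
}

/* Release the block, if any. */
void slot_destroy(struct tls_slot *s)
{
    free(s->block);
    s->block = NULL;
}
```

With slot_init_eager the cost is paid up front for every module and
thread, which is the scalability concern; with slot_init_lazy the cost
is paid only on use, but slot_get inherits the failure mode.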
Cheers,
Carlos.