This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: glibc 2.19 status?


On 02/04/2014 07:18 PM, Roland McGrath wrote:
>> Well, what we should not do is sit around indefinitely delaying the 
>> release!  Revert the changes, run the testsuite on x86_64 and x86, commit 
>> the reversion and start the process for the actual release.  It's clear we 
>> do not have consensus to keep the changes in 2.19, which is what matters.
> 
> Agreed.
> 
>> We can discuss later in what form such changes might come back for 2.20 
>> (on the whole my view is that the problems are fundamental to the approach 
>> of signal-safe allocation and would best be avoided by the approach of 
>> allocating at dlopen / pthread_create time - where objects opened with the 
>> old symbol version of dlopen, or using a new RTLD_LAZY_TLS flag, keep lazy 
>> TLS but do without signal-safety).  I think providing better interfaces 
>> for tools to identify memory allocated by glibc is a good idea, but 
>> largely orthogonal to solving the TLS signal-safety problem.
> 
> Broadly agreed with some of the details to be argued later.

I will not put forth a sustained objection to the reversion of
the current AS-Safe TLS patches. I feel like we had consensus
from the submitters and reviewers and that the fix solved a
real and immediate problem.

I agree with Joseph that there are alternative solutions
to this problem. However, my worry is that nobody has signed up
to implement those considerably more complex alternatives
(which come with no guarantee that they won't break ASAN).
The solution we have today is good and solves the problem.

I disagree with Roland: in my opinion he is being
ultra-conservative while I am being merely conservative. It is a
difference of opinion. Although our positions are similar,
I feel his has problematic long-term maintenance
consequences (discussed below).

It seems as though Joseph and Roland object for different
reasons: Joseph because the solution still has the
potential to fail at runtime in odd ways, and Roland because
we are not sufficiently conservative. I don't know that we will
be able to resolve their objections any time soon, if ever.

I will be back for 2.20 to champion the re-inclusion of the
AS-Safe TLS patches from Paul and Andrew.

My more detailed comments are as follows:

(1) Are glibc internals considered fixed ABIs?

(a) "Yes, the internal interfaces are undocumented ABIs."

* Many unknown tools can rely on these interfaces, changing them
  breaks things we are not aware of.

* These interfaces are undocumented subtleties that we must change
  only very slowly and conservatively.

* If the interfaces change they need to change only after slow
  and detailed review, and that happens only after several releases
  of notification that such interfaces are going to change.

(b) "No, the internal interfaces are not ABIs; we can break them to
     fix bugs."

* External tools must not rely on internal implementation details.

* Tools must work with glibc to define tooling APIs to provide
  supportable and stable interfaces for capturing events of interest
  to the tools.

* The community must work with tool vendors to ensure that there are
  workarounds for any changes that allow the newest version of the
  tool to work with the newest version of glibc. We provide no
  backwards compatibility when it comes to internals and their
  implementation.

I argue (b) is the choice that reduces future maintenance for the
project, allows us to make internal changes to fix bugs, and gives
us the flexibility to expand glibc in ways which benefit all of
our users.

How does it reduce future maintenance?

If we have to maintain all internal interfaces as potentially
useful points of interposition by external tools, say malloc
interposition, then all future solutions to fix bugs must also
have this property. That complicates the requirements of fixes
that would otherwise simply change internal implementation details.
In the case of making TLS in dlopen'd modules AS-Safe, the only
robust solution is to throw away lazy initialization. That is
a lot of work, and you can see how (a) imposes a huge maintenance
burden on the internals of the library. Thus (b) carries less
maintenance burden for the project. However, it means we need to
actively engage with 3rd party tool authors to talk about sensible
tooling APIs.

How does it allow us to make internal changes to fix bugs?

We know tools interact with glibc through interposition of
symbols, or a fixed API. Period. That's easy to review when
fixing bugs.

How does it allow us to expand glibc in new directions?

The internals are unconstrained by unknown, undocumented
application requirements.

I can't tell if Roland agrees with (b) but is taking the
ultra-conservative approach that anything in the internals including
the ability to rely on interposing malloc for internal allocations
is part of an expected ABI. Thus while agreeing with (b), the
position is that we must instead document where we might allow
symbol interposition and not remove those points without serious
consideration. I find this approach too conservative, sorry.

(2) Current ASAN and LeakSanitizer are fixed.

- Using new glibc 2.19? Upgrade ASAN.

- Using an older glibc (before 2.19)? Use any ASAN you want.

Kostya has a fix that enables LeakSanitizer to work with 2.19.  It is
true that old ASAN with new glibc will not work correctly, but that is
the reality when you deal with undocumented glibc internals. Once we
have a stable API, then I will stand behind this working correctly.

The balancing act here is:

Break old ASAN with new glibc

vs.

Unsafe first access TLS variables used in signal handlers from
dlopened code.

My opinion is that, while it is terrible that we broke ASAN,
upgrading ASAN is infinitely easier than asking users to rewrite
their use of TLS variables.
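
To make the second side of that trade-off concrete, the access
pattern in question looks roughly like the sketch below. It is an
illustration only: the names tls_counter, on_signal and run_demo are
made up here and do not come from the patches. Built into a single
executable like this, the variable lives in static TLS and the access
is already safe. The hazard under discussion arises when the same
code sits in a module loaded with dlopen, where a thread's first
access may have to allocate the module's TLS block lazily.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical thread-local variable.  In a dlopen'd module, the first
   access from a given thread may trigger lazy allocation of the
   module's TLS block; here (main executable) it is static TLS. */
static __thread volatile sig_atomic_t tls_counter;

/* Signal handler whose only work is to touch the TLS variable. */
static void on_signal(int signo)
{
    (void)signo;
    tls_counter++;
}

/* Install the handler, raise the signal once in the calling thread,
   and return the counter.  The variable is deliberately not touched
   before raise(), so the handler performs this thread's first access. */
int run_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1);   /* handler runs before raise() returns */
    return (int)tls_counter;
}
```

Without signal-safe TLS, the alternative for the user is to ensure
every thread touches each such variable before any handler can run,
which is the kind of rewrite referred to above.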

(3) Create stable tools APIs.

What should be happening is open discussion and the creation of stable 
APIs for use by tool authors.

Kostya, Rich, myself and others have been working on what the API should
look like, and Kostya has already started documenting a design for 2.20
here:

https://sourceware.org/glibc/wiki/ThreadPropertiesAPI

Keeping implicit ABIs stable is a recipe for disaster and I do not
condone it at any level. It is poor engineering practice, difficult to
maintain, and locks down the implementation in undocumented ways.

(4) TLS access should never fail at runtime.

Joseph and Rich have both argued that TLS access should never fail
at runtime. While that is a good goal it seems contrary to the 
scalability goals that some users have regarding DSO loading, TLS, 
and threads. As Joseph suggests, it might be that the lazy behaviour
is invoked via a new dlopen flag. This still means we need a fix
for dlopen with the alternate flag, and that fix still breaks ASAN.
Thus we have no real reason to reject Paul and Andrew's patch to
make GNU TLS fully AS-Safe. Similar arguments apply to GNU2 TLS,
and forcing those allocations to happen at dlopen time for all
required descriptors.
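
The lazy-versus-eager tension can be sketched at user level as
follows. This is an analogy, not glibc's implementation: slot,
SLOT_SIZE and the three functions are hypothetical names chosen for
illustration. The lazy getter allocates on first access, so it can
fail at access time and is not safe to call from a signal handler.
The eager variant moves allocation (and failure) to thread setup, at
the cost of paying for the memory in every thread whether or not it
is ever used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOT_SIZE 64   /* arbitrary size for the illustration */

/* Per-thread pointer to a block that stands in for a module's TLS. */
static __thread char *slot;

/* Lazy: allocate on first access.  calloc is not async-signal-safe,
   and a failure surfaces here, at access time (returns NULL). */
char *slot_get_lazy(void)
{
    if (slot == NULL)
        slot = calloc(1, SLOT_SIZE);
    return slot;
}

/* Eager: call once at thread start.  Failure is reported at setup
   time.  Returns 0 on success, -1 on allocation failure. */
int slot_init_eager(void)
{
    if (slot == NULL)
        slot = calloc(1, SLOT_SIZE);
    return slot != NULL ? 0 : -1;
}

/* After a successful slot_init_eager(), access is just a load.
   The assert documents the precondition and compiles out with NDEBUG. */
char *slot_get_eager(void)
{
    assert(slot != NULL);
    return slot;
}
```

A thread that calls slot_init_eager() first can then use
slot_get_eager() anywhere, including in a handler; the cost is that
every thread allocates up front, which is where the scalability
concern comes from.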

Cheers,
Carlos.


