This is the mail archive of the
mailing list for the glibc project.
Re: [PR19826] fix non-LE TLS in static programs
- From: Torvald Riegel <triegel at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: Andreas Schwab <schwab at suse dot de>, libc-alpha at sourceware dot org, "Carlos O'Donell" <carlos at redhat dot com>
- Date: Thu, 29 Sep 2016 13:19:58 +0200
- Subject: Re: [PR19826] fix non-LE TLS in static programs
- Authentication-results: sourceware.org; auth=none
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org>
On Sun, 2016-09-25 at 19:18 -0300, Alexandre Oliva wrote:
> On Sep 24, 2016, Torvald Riegel <email@example.com> wrote:
> > On Sat, 2016-09-24 at 00:44 -0300, Alexandre Oliva wrote:
> >> In this patch, I add a long comment explaining why we can't modify
> >> another threads' DTV while initializing its static TLS block during
> >> dlopen.
> > I appreciate that you added a comment. Given that you seemed to have
> > paged in all of this currently, it would be even better if you could try
> > to further improve the documentation of all of this.
> As your feedback to the patch shows, it's pointless for me to even try.
I do not think it is, provided that you are willing to work within the
memory model that we have chosen for glibc (which is basically
equivalent to C11's).
> My way of reasoning about this stuff doesn't seem to make sense to you.
> So why bother?
This isn't about my personal opinion, or taste. glibc has chosen the
C11 memory model (see above), so I consider this our technical context.
As with any other technical context we pick, we want to build code (and
documentation) that is sound within that context. Hence, if I think
that a patch can be improved in this regard, I'll speak up.
Consider an exaggerated but similar example: I want to add a
non-trivial piece of code, and like to reason about it using notes on
paper. Yet glibc has decided that it wants comments in the code. What
should I do? Should I just give up, or try to find a way to do what
glibc wants and how it decided to maintain the code base (eg, finding a
back-and-forth conversion between glibc model of taking notes and mine)?
> > (I suppose missing the issue you fix here would have been less likely
> > if proper documentation existed before.)
> *nod*. That's why I added the new, longer comments: the comments I'd
> added before were not enough to stop even myself from reintroducing the
> error, because there was a shallow, more obvious (apparent? harmless?)
> race, and a deeper, less obvious one that's very harmful when it hits.
Right. But we also need to consider that more than one person will
eventually want to maintain any piece of glibc code. Thus, we need a
common way of reasoning about it. This needs to very well specified,
and that's a job we can't do on our own. So, picking something that
will be wide-spread, like C11's model, is the right thing to do. But
then we also need to make use of it.
I've noted a couple of times that of course, the project needs to find
its own way of how to talk about concurrency. Discussing patches such
as yours is a way of doing that. I've also always asked reviewers of my
patches that concurrent code to also comment on whether they find the
comments about concurrency in these good (eg, level of detail), or
whether they have suggestions for improvement.
However, we need to be technically correct in how we speak about it, and
that includes sticking to the terminology that defines the memory model
we have picked. Otherwise, we're deviating from our technical base of
reasoning, and this will just create misunderstandings.
> > Are you aware of any synchronization in the TLS stuff that is not based
> > on locks?
> Yes, plenty. Off the top of my head, there's static TLS block
> initialization at dlopen vs use by other threads, there was DTV
> initialization vs use and resize (fixed by my 1.5yo-patch, broken by my
> patch from last week, and fixed again once the patch that reverts the
> breakage goes in, because we only mess with a thread's DTV at thread its
> start-up and then by the thread itself)
Can you fix this to use C11 atomics (or even the old-style atomics if
you're not comfortable with using C11 atomics yet), please?
> but IIRC there are
> self-concurrency issues if DTV resizing is interrupted by a signal that
> uses GD and starts further resizing) and IIRC there is plenty of
> suspicious stuff in slotinfo manipulation vs use (the data structure is
> only modified while holding the rtld lock, but it's read while updating
> a DTV without any explicit synchronization; I never looked into it
> enough to tell what the assumptions were that make it safe at some
> point, to tell whether there's any chance they still are);
That sounds like a incorrect attempt at double-checked locking, which
I've found several cases of so far, so it would be "within the pattern".
> I have a
> hunch that there are probably issues with TLS descriptors as well,
> because although they're created while holding the rtld lock, their use
> is not explicitly guarded either, nor are than the GOT entries relocated
> so as to point to them.
> IMHO these are all internal implementations, so the C11 memory model for
> applications doesn't quite apply to them:
The C11 model is the memory that glibc uses. See
> it's ok if they make stronger
> or weaker assumptions than what the language is supposed to expose to
> users of the language. Just like the *implementation* of mutexes,
> locks, condition variables and other such concepts won't necessarily
> abide by the rules imposed on users of the language, as long as they
> deliver the intended semantics, icluding synchronization and memory
> model properties.
See above. While I agree that we could use something else, we chose C11
for glibc's code too. There are very few cases where we'd need more
than C11 atomics to implement glibc's concurrent code.
> > If so, it would be great if you could transform them to using the
> > new-style C11 atomics.
> I guess it would, wouldn't it? Too bad I'm probably not. I just can't
> stand even the thought of the never-ending conversations that would
If your patches would follow the C11 model and the comments would be
fine, you'd just get an "OK" in my review.
> So, I'm out of here; the only reason you're even from me is that
> I'm responsible enough to at least try to fix the breakage I introduced,
> once it was brought to my attention (at a time in which I actually
> noticed it; too bad I missed it the first time).
> > I've reviewed nscd concurrency recently and converted it to use C11
> > atomics, and this process revealed many synchronization bugs such as
> > missing barriers.
> That shouldn't be surprising. A lot of that code was written before C11
> was even in the foreseeable future.
It was fine to assume TSO when glibc was started, and the role of
compilers in concurrency was less well understood than before.
However, glibc claims to support ARM and PowerPC correctly now for quite
a while, and that is not the case in concurrent code in many cases (eg,
because assuming something like TSO).
Also note that even if assuming a strong memory model, some of the nscd
synchronization would be broken.
Furthermore, you reviewed nscd-related code and marked it as
thread-safe. This wasn't that long ago, and happened at a time at which
weak memory models such as ARM's, and the role of the compiler were well
understood. My patch shows that this wasn't thread-safe. Thus, I find
it surprising that you think it wasn't surprising that the
synchronization in nscd is buggy.
> >> + Another race, not to be mistaken for the above, has to do with
> >> + the TLS block initialized below. Although we perform this
> >> + initialization while holding the rtld lock, once dlopen
> >> + completes, other threads might run code that accesses the
> >> + variables in these TLS blocks, without taking the rtld lock or
> >> + any other explicit memory acquire.
> > What is a "memory acquire"?
> That's a term from long before C standardized atomics, from the 20 years
> or so in which people like myself have been doing concurrent programming
> on multi-CPU computers without explicit language support, and even on
> single-CPU multi-tasked systems. In dusted books from that age, you may
> be able to find tons of information about such things as memory barriers
> or fences, some with acquire and release semantics, as well as the
> implied acquire and release of all memory that occurs at mutex acquire
> and release time, respectively.
> There are still some remnants from those days in e.g. Linux docs:
> but a lot of what can be found on the Internet now seems to be more
> recent than C and C++ atomics. Still, the notion of memory acquire and
> release from the ancient scrolls is still there, even if a bit in
> disguise. There's no reason for us to limit our language and throw away
> all the old books just because the term was reused in a new, narrower
> context, is there?
My point is that you're not specific. There's no such thing as The
Acquire Semantics -- so if you should be specific which semantics you
are referring to. Referring to a group of semantics isn't good enough
if the members of this group differ significantly.
One such difference is, for example, whether you assume that all
acquire/release actions are global, or always "tied" to accesses to a
specific memory location. You seem to think the former, but this isn't
true in weak HW memory models we have today and that glibc claims to
The language we use in glibc needs to be sufficiently precise for what
we want to do in glibc. What you (or anyone else for that matter)
thinks might have been popular terminology X years ago or on some
architectures of the past is frankly irrelevant; sure, it can helps us
get to proper terminology, but it's not sufficient in glibc code /
> >> + Our implementation appears to assume that, in order for the other
> >> + threads to access those variables, there will be some form of
> >> + synchronization between the end of dlopen() in one thread and the
> >> + use of the just-loaded modules in others, otherwise the very use
> >> + of the loaded code could be undefined. */
> > I'd reverse that: I find it misleading that you start that there is a
> > race condition, when in fact this seems not to be the case because of
> > the assumption. Therefore, I'd mention the assumption first, and then
> > derive from that there is no data-race and that uses of the TLS will
> > find it initialized.
> Unless the assumption fails to meet the definition of data race in some
> possibly irrelevant fashion.
This assumption is orthogonal to how a data race is defined.
> The point is that the implementation
> doesn't guard itself from certain races, if someone's determined to
> create one,
That's why I suggested to clearly state the assumption first. You could
also call it a precondition for this functionality in glibc. (I'm not
aware of any "official" requirement / precondition, but if it exists, we
should cite it.)
> but I pose they don't matter. Let me give you some
> - there's the obvious case of the program with a race: the dlopen caller
> sets a global non-atomic variable when dlopen completes, and another
> thread busy-waits for the variable to change, and then uses its value to
> call a function that accesses the newly-loaded TLS variable. The race
> on the global variable enables the race on the TLS variable, so
> undefined behavior has already been invoked by the time the TLS
> implementation incurs its own race.
> - here's a variant that doesn't involve any other race: the dlopen
> caller writes to a pipe a pointer to a function that accesses the TLS
> variable, and another thread reads the pointer from the pipe and calls
> it -> race? there's clearly a happens-before order implied by the
> write() and read(), but that doesn't imply memory synchronization in the
> memory model AFAICT.
I would think that the preferable semantics would be that a read()
reading data from a write() call would be considered to create a
synchronizes-with edge between these.
If it doesn't, then it's the same as if using relaxed-MO atomics to
communicate the pointer, which doesn't create a synchronizes-with. In
this case, which is not a data race, the precondition of dlopen would
apply; if the user violates it, then it's the user's fault. The comment
I suggest would then remind glibc developers that there is this
precondition, and thus glibc is correct wrt its specification (which
includes the precondition).
> - another variant in which there's a clear happens-before order, but no
> memory synchronization, so there's possibly a race: the dlopen caller
> installs a signal handler that accesses the TLS variable, and then gets
> the signal sent to the process itself; another thread is chosen to
> handle the signal and accesses the TLS variable -> race?
Violates the precondition, so undefined behavior. Whether that might or
might not lead to a data race in the execution of the implementation is
an implementation detail.
> - generalizing, use any out-of-(memory-)band information-carrying side
> effect that one thread could use to communicate to another thread how to
> call a function that accesses the TLS variable, both function and
> variable brought in by dlopen, and you'll have, technically, a data
> race, but IMO a harmless data race.
I'm missing some detail in the description of the case, so I can't say
whether it would or would not be a data race in how the variable is
communicated. But irrespective of whether the