This is the mail archive of the
mailing list for the glibc project.
Re: [BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS limit
- From: Torvald Riegel <triegel at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: Florian Weimer <fw at deneb dot enyo dot de>, Andreas Schwab <schwab at suse dot de>, Richard Henderson <rth at twiddle dot net>, Alan Modra <amodra at gmail dot com>, Roland McGrath <roland at hack dot frob dot com>, codonell at redhat dot com, libc-alpha at sourceware dot org, rth at redhat dot com
- Date: Sat, 24 Sep 2016 12:52:56 +0200
- Subject: Re: [BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS limit
- Authentication-results: sourceware.org; auth=none
- References: <firstname.lastname@example.org> <20141118224048.600312C3B23@topped-with-meat.com> <email@example.com> <20141120021703.86F032C3B18@topped-with-meat.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <20150304050529.GD26435@bubble.grove.modra.org> <20150304110430.GE26435@bubble.grove.modra.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org>
On Fri, 2016-09-23 at 23:52 -0300, Alexandre Oliva wrote:
> (if it decides to update it without
> resizing, we also have a race, but both threads end up writing the same
> final value, which AFAIK is not so much of a problem)
Can this still happen after your patches? If so, then this needs to be
fixed. Data races are a problem. You must used atomics to actually
make writes of the same value, concurrently by more than one thread,
Also, for code comments at least, we need to be less ambiguous regarding
our use of the word "race". I think we should avoid it unless we
specifically mean a data race (as defined by C11), or something that's
bad and needs to be fixed. If we want to say that something is not
ordered by happens-before, we should say exactly that. This allows us
to see the term "race" as a red flag.
If we need to use atomics to avoid data races, they will not be ordered
by happens-before. But that's okay, because that's in the nature of
cases where we have to synchronize.
The final state that we should head towards is that we have code that is
either data-race-free or proper synchronization (e.g., using atomics)
that is well-documented.
> The end result is that the initializer may write to stale/freed memory,
> that may even have already been reassigned to some other purpose.
> That's very bad indeed.
> So, although I post the entire backported patch, because that's what I
> tested, and the comment I refer to above is in a part of the patch that,
> if applied, would restore the described race, I immediately withdraw it,
> and propose instead to revert the changes to elf/dl-reloc.c and
> nptl/allocatestack.c that I'd installed earlier this week to the master
> branch. They're not necessary for use with the regular __tls_get_addr,
> and they won't affect the alternate __tls_get_addr used in static
> programs, because the Static TLS block for the main program (the only
> module it cares about) and its DTV entry are always set up by
> __libc_setup_tls (main thread) or _dl_allocate_tls_init (other threads):
> it will never be the case that we will dlopen the main program after the
> program starts and have to initialize its static TLS blocks (and DTV
> entry) for all running threads, because we never dlopen the main
ISTM that all this should become a code comment, and not just be in an
email on the list.
> For the master branch, I'll post a separate patch momentarily, reverting
> the incorrect portion of the recent change, and expanding the comments
> about the race condition, so that neither myself nor anyone else ever
> mistakes another (harmless?) race for that one: that one thread
> initializes other threads' TLS blocks while holding the rtld lock, but
> later on the other threads access the initialized TLS blocks without
> taking any locks or any form of memory synchronization that would
> clearly establish, memory-wise, the implied happens-before.
IIUC, that's were the assumption about dlopen vs. usage comes in that
you included in the other patch. If so, this is not a data race (nor a
race condition). Or are you referring to something else?
BTW, there are no harmless data races ;)
> I'll retest and repost the patches for master and branches early next
> week, if not over the weekend.
Please also remember that with the tools we have, it's pretty unlikely
that we can test the absence of concurrency bugs. Therefore, it is very
important that our reasoning about concurrency is thorough. And we need
to do that collectively, which means:
* Shared terminology
* Proper, detailed documentation
Otherwise, we can't communicate properly about it, which means we can't
review it, and that we're not decreasing bus factors.