[PATCH 2/2] manual: Document __libc_single_threaded

Fri May 22 16:14:14 GMT 2020

On Fri, May 22, 2020 at 11:07:20AM -0400, Rich Felker wrote:
> On Fri, May 22, 2020 at 11:54:58AM +0100, Szabolcs Nagy wrote:
> > The 05/22/2020 12:05, Florian Weimer wrote:
> > > * Szabolcs Nagy:
> > > 
> > > > The 05/21/2020 15:44, Florian Weimer wrote:
> > > >> * Szabolcs Nagy:
> > > >> > what's wrong with pthread_join updating it?
> > > >> 
> > > >> It's tricky to do it correctly if there are two remaining threads, one of
> > > >> them the one being joined, the other one a detached thread.  A
> > > >> straightforward implementation merely looking at __nptl_nthreads before
> > > >> returning from pthread_join would not perform the required
> > > >> synchronization on the detached thread exit.
> > > >
> > > > i'm trying to understand this, but don't see
> > > > what's wrong if the last thread is detached.
> > > 
> > > > Sorry, I meant three remaining threads in total, i.e., two more threads
> > > in addition to the one thread that keeps going after the other two
> > > exited, and may use __libc_single_threaded in the future.
> > > 
> > > Clearer now?
> > 
> > hm so a detached thread is concurrently exiting with
> > a pthread_join which sees a decremented __nptl_nthreads
> > but the detached thread has not actually exited yet.
> 
> In principle this is no big deal as long as the exiting thread cannot
> make any further actions where its existence causes an observable
> effect on users of __libc_single_threaded. (For this purpose, I think
> you actually need to define what uses are valid, though; see setxid
> remarks below.) If it makes a problem for pthread_join that's an
> implementation detail that should be fixable. The bigger issue is
> memory synchronization.
> 
> > i think glibc can issue a memory barrier syscall before
> > decrementing __nptl_nthreads in a detached thread, this
> > means if pthread_join observes __nptl_nthreads==1
> > then user memory accesses in the detached thread are
> > synchronized with non-atomic memory accesses after
> > pthread_join returns. (i.e. __nptl_nthreads==1 should
> > mean at all times that as far as user code is concerned
> > the process is single threaded even if some detached
> > thread is still hanging around)
> 
> This still has consequences for setxid safety which is why musl now
> fully synchronizes the existing threads list. But if you're not using
> the thread count for that, it's not an issue. Indeed I think
> SYS_membarrier is a solution here, but if it's not supported or
> blocked by seccomp then __libc_single_threaded must not be made true
> again at this time.

Ugh, SYS_membarrier is *not* a solution here. The problem is far
worse, because the user of __libc_single_threaded potentially lacks
*compiler barriers* too.

Consider something like:

	int need_unlock = 0;
	if (!__libc_single_threaded) { lock(); need_unlock=1; }
	x = *p;
	if (need_unlock) unlock();
	/* ... */
	need_unlock = 0;
	if (!__libc_single_threaded) { lock(); need_unlock=1; }
	x = *p;
	if (need_unlock) unlock();

Here, in the case where __libc_single_threaded is true the second time
around, there is no (memory or compiler) acquire barrier between the
first access to *p and the second. Thus the compiler can (and actually
does! I don't have a minimal PoC, but musl just hit a bug very close
to this) omit the second load from memory and use the cached value,
which may be incorrect because the exiting thread modified it.

This could potentially be avoided with complex contracts about
barriers needed to use __libc_single_threaded, but it seems highly
error-prone.
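
To make "contracts about barriers" concrete, here is a rough sketch of
one shape such a contract might take: the lockless path has to issue a
compiler-only acquire fence before touching shared data. The names
single_threaded, lock(), unlock(), lock_calls and guarded_read() are
hypothetical stand-ins for illustration, not glibc interfaces, and I
am not claiming this fence is sufficient in every case, only showing
what each caller would have to remember to write.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration only, not glibc API. */
static char single_threaded = 1;  /* plays the role of __libc_single_threaded */
static int lock_calls;            /* counts lock() calls so the paths are observable */

static void lock(void)   { lock_calls++; }
static void unlock(void) { }

/* Read *p, taking the lock only when other threads may exist. */
static int guarded_read(const int *p)
{
	int need_unlock = 0;

	if (!single_threaded) {
		lock();
		need_unlock = 1;
	} else {
		/* Compiler-only fence: asks the compiler not to reuse a
		 * value of *p loaded before this point. It emits no
		 * hardware barrier instruction. */
		atomic_signal_fence(memory_order_acquire);
	}

	int x = *p;
	if (need_unlock) unlock();
	return x;
}
```

Every lockless branch in every caller would need the same fence, which
is the error-prone part.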

Note that the issue becomes much easier to hit with a sort of "pretest
not under lock, re-check with lock" idiom of the form:

	x = *p;
	if (predicate(x)) {
		if (!__libc_single_threaded) { lock(); need_unlock=1; }
		x = *p;
		/* ... */
		if (need_unlock) unlock();
	}

Unlike the first example, this one breaks even if the release barrier
in unlock() also happens to act as a compiler acquire barrier, because
no unlock() call sits between the two loads.
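
For illustration, a caller of the pretest idiom could force the
re-check to be a real load by going through a volatile-qualified
access. This is only a sketch under assumptions: single_threaded,
lock(), unlock(), lock_calls, predicate(), LOAD_ONCE and
pretest_then_recheck() are hypothetical names, and the volatile access
only stops the compiler from reusing the earlier value; it provides no
hardware memory ordering.

```c
/* Hypothetical stand-ins for illustration only, not glibc API. */
static char single_threaded = 1;  /* plays the role of __libc_single_threaded */
static int lock_calls;            /* counts lock() calls so the paths are observable */

static void lock(void)   { lock_calls++; }
static void unlock(void) { }

static int predicate(int x) { return x > 0; }

/* Volatile-qualified read: the compiler must perform this load and
 * cannot substitute a value it loaded earlier. */
#define LOAD_ONCE(p) (*(const volatile int *)(p))

/* Returns the re-checked value, or -1 if the pretest fails. */
static int pretest_then_recheck(const int *p)
{
	int need_unlock = 0;
	int x = *p;                    /* unlocked pretest */

	if (!predicate(x))
		return -1;

	if (!single_threaded) {
		lock();
		need_unlock = 1;
	}
	x = LOAD_ONCE(p);              /* re-check with a forced load */
	/* ... */
	if (need_unlock) unlock();
	return x;
}
```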

Rich