This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [COMMITTED PATCH] NaCl: Make thread exit wake pthread_join.


Torvald, thanks very much for reviewing this NaCl-specific code!  I
very much appreciate your help with the concurrency issues (and any
other comments you or other folks might have).  It's above and beyond
the call of duty for anybody else to worry about the NaCl code, so I
consider it a personal favor to me and am grateful.

> Maybe add a comment why the relaxed store is sufficient here (it's just
> a flag for busy-waiting, it doesn't imply a happens-before for any other
> state change).  This may be different from the store of the final value,
> for which I'm wondering whether it needs to be a release MO store.  See
> below for comments on the (acquire?) load side of this.

Frankly I'm not all sure that it actually is sufficient.  

This code is modelled after an old glibc fork modified for NaCl (that
was never anywhere near fit for going upstream).  In that version the
field had been changed to be volatile and this was just a simple load.
I knew it didn't make any sense to use volatile, and started out just
using atomic_forced_read as the equivalent.  Then I decided that, in
keeping with new best practices, I should use something in the C11
vocabulary instead, and atomic_load_relaxed is the thing from that set
that is actually the same as atomic_forced_read in practice.  But I
was consciously glossing over the real memory-model issues to just get
something working already to meet a personal deadline I'd set for
myself.  However, that old model both was not at all necessarily
right, and was only used in production on x86, where you can get away
with a lot in practice.

> > +# ifndef BUSY_WAIT_NOP
> > +#  define BUSY_WAIT_NOP		__sync_synchronize ()
> > +# endif
> 
> Are you sure that adding a barrier is the right thing to do here
> (especially one of the __sync variety that we want to get rid off
> eventually)?  Technically, you don't need the barrier.  Other archs use
> "pause" code sequences.

A barrier is certainly not the right thing in any general sense.  This
is the generic fallback for when the machine-specific file doesn't
define anything.  On x86, it's defined to the pause instruction.  I
saw that the Linux kernel's equivalent macro cpu_relax() is defined to
smp_mb() on some machines that don't have an ideal instruction for
this, and decided __sync_synchronize was close enough to that to be a
reasonable fallback choice and at least better than nothing.  Perhaps
it's not really better than just a pure compiler barrier, which is
what Linux's cpu_relax() does on many other configurations.

What I'd really like to see is the sometimes-defined macro replaced
with an always-defined macro (or inline) in a sysdeps header for
purely machine-specific things (perhaps just its own tiny header).
Then the sysdeps/generic/ definition would be whatever consensus of
those wiser than me says is the most useful generic fallback, and of
course each machine maintainer can supply the optimal
machine-dependent thing if there is one.

> > +/* See exit-thread.h for details.  */
> > +# define NACL_EXITING_TID	1
> > +
> > +# undef lll_wait_tid
> > +# define lll_wait_tid(tid)				\
> > +  do {							\
> > +    __typeof (tid) __tid;				\
> > +    volatile __typeof (tid) *__tidp = &(tid);		\
> > +    while ((__tid = atomic_load_relaxed (__tidp)) != 0) \
> 
> I'm aware we use a relaxed load in the generic version as well but I'm
> wondering why that is actually sufficient and we don't need an acquire
> load.  pthread_join does seem to access values provided by the finished
> thread (eg, load pd->result).
> 
> I vaguely remember thinking about that issue when we changed the
> documentation of lll_wait_tid -- but I can't find any notes or emails
> about it.
> 
> Do you have a reasoning why it's sufficient, or is this something we
> need to dig up / come up with?

Nope, I have no reasons. :-)

It's quite plausible to me that there is something else done before
pthread_join's load and/or after after where the pthread_create.c code
(START_THREAD_DEFN) sets the value that constitutes a sufficient
barrier.  But if so, it is implicit and not explained in any comment.
So at the very least we need more comments, and it seems likely we do
actually need more barriers.

For example, it's not clear to me what barriers are implied by the
futex operations.  The Linux CLONE_THREAD_CLEARTID behavior is that
the exiting thread implicitly does a FUTEX_WAKE call, so if that
constitutes a necessary barrier then it should already be sufficient
(on Linux).  But we should be thoroughly explicit about such
expectations.  I imagine that is something you will document properly
in the new internal futex API, and that documentation can say how
__exit_thread is required to interact.

The actual NaCl picture is probably even worse.  The clearing done in
the system code does not use any kind of explicit barrier or atomic
operation.  The documentation (such as it is) for the system's
thread-exit call does not say anything about barrier semantics at all.
In practice it probably does what matters, because it incidentally
acquires and releases some internal locks both before and after
clearing the word.  But I would certainly like to get some clear
documentation on what memory-model semantics are provided by Linux's
CLONE_THREAD_CLEARTID and are required by libpthread.  (Then we'll
probably just try to make the NaCl system semantics match that
explicitly, modulo the actual futex-waking part.)


Thanks,
Roland


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]