This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix sem_post race (bug 14532)


On Tue, 2012-09-11 at 18:26 -0400, Carlos O'Donell wrote:
> On Tue, Sep 11, 2012 at 5:41 PM, Torvald Riegel <triegel@redhat.com> wrote:
> >
> > On Thu, 2012-08-30 at 21:09 +0000, Joseph S. Myers wrote:
> > > Bug 14532 is a race condition in sem_post (generic C version), where
> > > it wrongly uses an acquire barrier rather than a release barrier, so
> > > wrongly allowing memory operations from before the unlock to be moved
> > > after it.
> >
> > What is your (or anybody else's) opinion about documenting concurrent
> > pieces of code in more detail to make issues like this one less likely?
> >
> > For example, something like:
> >
> > > 7e9..67e8cc5 100644
> > > --- a/nptl/sysdeps/unix/sysv/linux/sem_post.c
> > > +++ b/nptl/sysdeps/unix/sysv/linux/sem_post.c
> > > @@ -1,5 +1,5 @@
> > >  /* sem_post -- post to a POSIX semaphore.  Generic futex-using version.
> > > -   Copyright (C) 2003, 2004, 2007, 2008 Free Software Foundation, Inc.
> > > +   Copyright (C) 2003-2012 Free Software Foundation, Inc.
> > >     This file is part of the GNU C Library.
> > >     Contributed by Jakub Jelinek <jakub@redhat.com>, 2003.
> > >
> > > @@ -40,7 +40,7 @@ __new_sem_post (sem_t *sem)
> > >         return -1;
> > >       }
> > >      }
> > > -  while (atomic_compare_and_exchange_bool_acq (&isem->value, cur + 1,
> > > cur));
> > > +  while (atomic_compare_and_exchange_bool_rel (&isem->value, cur + 1,
> > > cur));
> >
> > /* This rel memory order synchronizes with the acq memory order in
> > atomic_decrement_if_positive() called by waiters.  We don't need acq_rel
> > here because even though we can race with other posters, we only use the
> > CAS to modify isem->value, so reading an outdated value of isem->value
> > does not matter. */
> 
> I would accept such a patch.
> 
> I don't like the use of `acq_rel' and would prefer we come up with some
> consistent single name for an operation that has both acquire and release
> semantics (totally bike-shed though).

"acq_rel" is consistent with the naming in the C11 and C++11 memory
models (e.g., C++11 has "memory_order_acq_rel").  I believe it would be
good to stick as close as possible to these models.

In general, does glibc intend to replicate/use the C11/C++11 memory
model (i.e., the semantics etc; we can still implement the atomic ops
differently)?  That is, are _acq and _rel supposed to have the same
semantics as in those memory models?

Is there interest in consume memory order too (e.g., by PowerPC / ARM
people)?

> I'm strongly in favour of such comments in the code, in fact I wish all
> atomic sequences had such descriptions. Writing them up is quite a bit of
> work but goes a long way towards documenting expectations for future
> developers.
> 
> > Side notes:
> > - Could not having acq_rel matter on ARM? I've never read the ARM memory
> > model, but heard that there's no guarantee that the most recent value
> > would get picked up eventually. Although I suspect that the CAS would
> > do.
> 
> I don't have an example where not having acq_rel would matter.
> 
> In my mind these kinds of problems are best discussed with a specific example,
> otherwise it's purely speculation.

Well, I was asking whether such an execution example could exist, on ARM
specifically, and for the __new_sem_post piece of code.  I'm not an ARM
expert, so that's why I posed this as a question, and not with an
example that would fail on ARM :)

> > - Should atomic_decrement_if_positive() document the acq memory order
> > that is has (and must have)?
> >
> > >    atomic_full_barrier ();
> >
> > Why do we have the full barrier after the rel barrier?  Is it necessary?
> > Was it meant to have the affect of the rel barrier, but was incorrectly
> > placed after the CAS?
> 
> I don't know why there is an atomic_full_barrier(), at first glance it
> seems unneeded, but as a conservative project we'd have to prove that
> it's not needed before removing it. It appears to have been present since
> the function was rewritten in 2007 by Ulrich.

I'm all for proofs when it comes to concurrent code (not necessarily a
formal proof, but reasoning that is sufficiently complete and precise to
make us confident enough that the code should work).  I'll make a note,
but if anybody wants to review this barrier again, please go ahead...

Torvald


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]