Re: [PATCH] Add STATIC_GETENV macro.


On Thu, 2013-11-14 at 13:52 +0100, Ondřej Bílka wrote:
> On Thu, Nov 14, 2013 at 11:13:58AM +0100, Torvald Riegel wrote:
> > On Wed, 2013-11-13 at 22:33 +0100, Ondřej Bílka wrote:
> > > On Wed, Nov 13, 2013 at 04:34:18PM +0100, Torvald Riegel wrote:
> > > > I don't think a macro is wise here.  You seem to want to make the macro
> > > > safe for use in multi-threaded programs, so this is similar to a
> > > > pthread_once except that there can be more than one call to getenv (i.e.,
> > > > the equivalent of the initialization function), but exactly one
> > > > assignment.
> > > > Maybe a pthread_once_store(pointer, producer) would be better (*if* we
> > > > actually want to add such a feature): pointer is like p in your code,
> > > > producer is a function that is called when we actually need to
> > > > initialize but can safely be called more than once; we set pointer
> > > > exactly once to the value returned by one of the producer calls.
> > > > 
> > > Yes, but this does not require forcing users to link with libpthread.
> > > 
> > > 
> > > > Implementing this properly requires more than just a simple CAS (see
> > > > pthread_once and my comments below); thus, you'll need to use proper
> > > > atomics or whatever the compiler has available, which requires you to
> > > > take care of all of that in the context of the client's compilation
> > > > environment.
> > > > 
> > > > On Sat, 2013-11-09 at 11:38 +0100, Ondřej Bílka wrote:
> > > > > +#define STATIC_GETENV(c) \
> > > > > +({									      \
> > > > > +  static char *__p = (char *) &__p;					      \
> > > > > +  char *__new = __p;							      \
> > > > > +  if (__p == (char *) &__p)						      \
> > > >
> > > There should be
> > > if (__new == (char *) &__p)
> > 
> > Even with that change, the load of p needs to be atomic with
> > acquire/consume memory order.
> >
> We should support older GCC versions that do not support C11 atomics.

Sure, but you still need something equivalent to make it work correctly.
In effect, you're just arguing that it shouldn't be a macro but a
function call, so that glibc can deal with how to support older
compilers / language standards.
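
Just to make that idea concrete, such a function-call interface could
look roughly like the sketch below.  This is only an illustration: the
name __libc_once_store and its signature are made up here, a real
implementation would use glibc's internal atomic macros instead of
<stdatomic.h>, and it would need a sentinel other than NULL if a NULL
result must be cachable (as with getenv).

#include <stdatomic.h>
#include <stddef.h>

/* Set *PTR exactly once to the value returned by one of the calls to
   PRODUCER; PRODUCER may run more than once, but only one result is
   ever published.  Returns the published value.  */
void *
__libc_once_store (void *_Atomic *ptr, void *(*producer) (void *), void *arg)
{
  /* Acquire: if we see the published pointer, we also see the data it
     points to.  */
  void *val = atomic_load_explicit (ptr, memory_order_acquire);
  if (val == NULL)
    {
      void *new = producer (arg);
      void *expected = NULL;
      /* Release on success publishes the producer's writes; acquire on
         failure synchronizes with the winning thread's release.  */
      if (atomic_compare_exchange_strong_explicit (ptr, &expected, new,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire))
        val = new;
      else
        val = expected;         /* Another thread published first.  */
    }
  return val;
}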

> Also, if you are concerned about machines that do issue nonatomic reads,
> then the relaxed model is appropriate; acquire/consume could yield
> unacceptable performance overhead.

Which relaxed model?  Do you mean relaxed memory order?  Atomicity is
orthogonal to the ordering constraints enforced by particular memory
orders.  Also, acquire/consume don't need any hardware instructions on
x86, SPARC, and the like (they still constrain compiler optimizations,
of course).  Consume should add very little overhead on Power, and even
acquire shouldn't cost too much on Power and ARM.
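
For example, with C11 atomics the load in question would be spelled
roughly like this (a sketch, assuming p were declared _Atomic (char *)):

char *val = atomic_load_explicit (&p, memory_order_acquire);
/* On x86 and SPARC TSO this compiles to an ordinary load; the acquire
   order only keeps the compiler from moving later accesses to *val
   (or other memory) up before this load.  */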

> > > > This is a data race (at least in C11, which has a concept of threads).
> > > > You need to use an atomic access, and it must have acquire memory order.
> > > > Consume memory order could also be sufficient, I believe.
> > 
> > Like it or not, this is a data race according to C11 and C++11.  You
> > offer this using a macro, so it becomes part of the client's code.  Data
> > race detectors will find this race condition.
> > 
> This is a valid comment; do race detectors also report a race for
> relaxed loads?

(Relaxed) atomic accesses do not create data races according to the
C11/C++11 memory model.  A particular data race detector needs to pay
attention to this, of course.
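
As a minimal illustration (my own example, not the patch's code):

#include <stdatomic.h>

static _Atomic (char *) p;      /* shared between threads */
static char buf[16];            /* plain data published through p */

static void
thread1 (void)
{
  atomic_store_explicit (&p, buf, memory_order_relaxed);
}

static void
thread2 (void)
{
  char *v = atomic_load_explicit (&p, memory_order_relaxed);
  (void) v;
  /* The two accesses to p conflict, but both are atomic, so by the
     C11/C++11 definition this is not a data race.  Relaxed order still
     provides no ordering for nonatomic accesses to buf; that is the
     separate acquire/release question.  */
}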

> When you use a race detector, it will also uncover a race in fnmatch and
> other races in libc.

Yes, and this is something we might want to address in the future.

> > > You do not need atomic access for correctness; the worst that can happen
> > > is performance degradation from repeatedly recalculating the value.
> > 
> > No.  Please understand the C11/C++11 memory model.  The compiler could
> > choose to read p nonatomically, or read it a couple of times and assume
> > that the value won't change, speculating that it always equals whatever
> > the second of the two accesses returned.
> >
> 
> The default pthread_once implementation is the following:
> 
> int
> __pthread_once (once_control, init_routine)
>      pthread_once_t *once_control;
>      void (*init_routine) (void);
> {
>   /* XXX Depending on whether the LOCK_IN_ONCE_T is defined use a
>      global lock variable or one which is part of the pthread_once_t
>      object.  */
>   if (*once_control == PTHREAD_ONCE_INIT)
>     {
>       lll_lock (once_lock, LLL_PRIVATE);
> 
>       /* XXX This implementation is not complete.  It doesn't take
>          cancelation and fork into account.  */
>       if (*once_control == PTHREAD_ONCE_INIT)
>         {
>           init_routine ();
> 
>           *once_control = !PTHREAD_ONCE_INIT;
>         }
> 
>       lll_unlock (once_lock, LLL_PRIVATE);
>     }
> 
>   return 0;
> }
> 
> The type of pthread_once_t is int, so the compiler could read once_control
> nonatomically.  According to your argument this code is broken, so file a
> bug report and send a patch to fix this serious issue.

Which I did quite a while ago, and which is waiting for review :)

> > Furthermore, read about weak memory models.  We need acquire/consume
> > memory order to ensure that accesses to *p happen after *p has been
> > initialized (and likewise for release on the assignment to p).
> > 
> I know them

So if you know how memory orders interact (considering the C11/C++11
model here, as it's the one we should be following), why do you disagree
that we need to add certain memory orders / barriers in this example,
or, alternatively, why can't you explain why we don't need them?
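
To make the disagreement concrete, this is roughly what I think the body
of the macro would have to look like if it were written with C11 atomics
directly (only a sketch, wrapped in a function here for illustration;
glibc would spell it with its internal atomic macros, and the sentinel
trick is the one from your patch):

#include <stdatomic.h>
#include <stdlib.h>

static char *
static_getenv (const char *c)
{
  static _Atomic (char *) p = (char *) &p;  /* sentinel: not yet computed */

  char *val = atomic_load_explicit (&p, memory_order_acquire);
  if (val == (char *) &p)
    {
      char *new = getenv (c);
      char *expected = (char *) &p;
      /* acq_rel on success / acquire on failure; the seq-cst ordering
         that the __sync builtin gives you is stronger than needed.  */
      if (atomic_compare_exchange_strong_explicit (&p, &expected, new,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire))
        val = new;
      else
        val = expected;   /* Another thread stored its result first.  */
    }
  return val;
}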

> and 
> 
> > If you can show that in this particular case, initialization of *p will
> > happen before every thread can use STATIC_GETENV, then you can do
> > without acquire/release -- but then you need to add a comment explaining
> > why this is the case and still use relaxed memory order atomic accesses
> > (also on the CAS, to avoid the barrier overhead; the __sync CAS has
> > essentially seq-cst memory order).
> > 
> Using a static initializer is a common pattern in nptl, and we do not
> need a comment for obvious cases.

This is not related to having __p initialized statically.  It's about
how you synchronize with other threads.  I'd claim that I know a bit
about synchronization, and if you do too, then the fact that we disagree
on this at the very least shows that it isn't quite an obvious case.

> 
> > > Unless the compiler decides to inline functions so that it accesses the
> > > variable in a loop, which on the first call would cause repeated
> > > recomputations.  A solution would be to mark the variable volatile.
> > 
> > Volatile is not a sufficient replacement for atomic accesses because it
> > only tells the compiler that accesses to such variables should happen
> > exactly as if the abstract machine were running the program -- but this
> > doesn't necessarily mean they should be atomic wrt. accesses in other
> > threads.  And you also cannot establish inter-thread happens-before
> > relations with just volatiles (for that you need the memory orders
> > and/or the total orders the memory model defines for seq-cst actions).
> > 
> This relationship is not necessary.

Are you saying that establishing an inter-thread happens-before is not
necessary?  Do you recall the definition of a data race in C11/C++11?
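
To recap (paraphrasing C11 5.1.2.4): two conflicting actions in
different threads constitute a data race, and hence undefined behaviour,
unless both are atomic or one happens before the other.  Volatile gives
you neither atomicity nor a happens-before edge (my own example, with
made-up names):

static char *volatile p;        /* volatile, but not _Atomic */
static char buf[16];

static void
thread_a (void)
{
  /* A volatile store: the compiler must perform it as written, but it
     is not atomic and creates no happens-before edge.  */
  p = buf;
}

static void
thread_b (void)
{
  /* A conflicting, nonatomic load in another thread: by the definition
     above this is a data race, hence undefined behaviour.  */
  char *q = p;
  (void) q;
}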

> > > > 
> > > > > +    {									      \
> > > > > +      __new = getenv (c);						      \
> > > > > +      char *__val = __sync_val_compare_and_swap (&__p, (char *) &__p, __new); \
> > > > 
> > > > This needs acquire-release memory order.  Seq-cst would be too much,
> > > > anything less than acquire-release won't work on archs with weak memory
> > > > models.
> > > > 
> > > No need to complicate matters with a custom memory order, as this path
> > > will likely be visited at most once per program lifetime.
> > 
> > First, whether something is "unlikely" to go wrong isn't a good
> > criterion for glibc code.
> 
> This is not about going wrong but about separating hot code from cold code.

That seems to be your intent, but if you want the code you posted to
work in a multi-threaded setting, it's simply not correct as posted.
(It might be correct under some assumptions / preconditions, but those
aren't obvious, and you haven't done anything to show that they hold.)

> Hot code should be optimized for speed but cold code for readability.
> 
> From a correctness perspective this always returns the correct value

If so, then please show this based on the language's memory model.  If
it's as obvious as you claim, it should be easy for you to show...

> so only
> performance is a concern.  You evaluate this by risk analysis: in the worst
> case we always recompute getenv from scratch; as the unoptimized code
> always computed getenv anyway, there is only the additional overhead of
> atomic instructions,

The worst case with the code you posted is that you read garbage when
you access *p, or read an invalid p.
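
Spelled out (an illustration with made-up names, not the patch's code):

static char *p;                 /* plain, nonatomic pointer */
static char buf[16];

static void
thread_a (void)                 /* initializes and publishes */
{
  buf[0] = 'x';
  buf[1] = '\0';
  p = buf;                      /* plain store, no release */
}

static void
thread_b (void)                 /* runs concurrently */
{
  char *q = p;                  /* plain load, no acquire */
  if (q != NULL)
    {
      char ch = q[0];           /* may observe stale contents of buf */
      (void) ch;
    }
  /* Nothing in the language orders thread_a's stores to buf before
     thread_b's read of q[0], and the racing plain accesses to p are
     undefined behaviour in C11 anyway: q can be stale or invalid, and
     q[0] can be garbage.  */
}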


