This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/4] Add atomic operations similar to those provided by C11.


On Thu, 2014-10-30 at 17:17 +0000, Joseph S. Myers wrote:
> On Thu, 30 Oct 2014, Torvald Riegel wrote:
> 
> > On Wed, 2014-10-29 at 23:06 +0000, Joseph S. Myers wrote:
> > > On Wed, 29 Oct 2014, Torvald Riegel wrote:
> > > 
> > > > First, do you agree that we need to make the compiler aware of
> > > > concurrency?  For example, it would be bad if the compiler assumes that
> > > > it can safely reload from an atomic variable just because it was able to
> > > > prove that the loading thread didn't change it in the meantime.
> > > 
> > > I believe that the code already makes it explicit where such reloading 
> > > would be problematic,
> > 
> > Why do you think this is the case?  There is an atomic_forced_read, for
> > example, but no atomic_read_exactly_once.
> 
> See for example elf/dl-lookup.c:do_lookup_x, using an asm to ensure reads 
> happen in a particular order:
> 
>   size_t n = scope->r_nlist;
>   /* Make sure we read the value before proceeding.  Otherwise we
>      might use r_list pointing to the initial scope and r_nlist being
>      the value after a resize.  That is the only path in dl-open.c not
>      protected by GSCOPE.  A read barrier here might be to expensive.  */
>   __asm volatile ("" : "+r" (n), "+m" (scope->r_list));
>   struct link_map **list = scope->r_list;

That ensures that the compiler won't reorder the reads -- but it does
*not* ensure that the reads happen in a particular order when executed
on an arch with a weak memory model.  If we didn't care about HW
reordering, we wouldn't need the asm at all (unless perhaps we
communicate with external stuff through volatile, or do something
behind the compiler's back), because the language semantics for
sequential code already take care of that.

Thus, my guess is that this is a bug.
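
To make that concrete: an empty asm only constrains GCC, whereas an
acquire fence also emits whatever barrier instruction the architecture
needs.  A minimal sketch using the GCC __atomic builtins (illustrative
helper names, not glibc's internal macros):

  static inline void
  plain_compiler_barrier (void)
  {
    /* Forbids the compiler from moving memory accesses across this
       point, but emits no instruction, so a weakly ordered CPU can
       still reorder the surrounding loads.  */
    __asm volatile ("" ::: "memory");
  }

  static inline void
  acquire_fence (void)
  {
    /* Also a compiler barrier, but additionally emits the barrier
       instruction the architecture requires (e.g. lwsync on POWER,
       dmb on ARM); typically costs nothing extra on x86.  */
    __atomic_thread_fence (__ATOMIC_ACQUIRE);
  }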

> (There are lots of cases of asms being used - via macros - to control 
> exactly when floating-point evaluations take place and associated 
> exceptions are raised, but that's not relevant to atomics.  In general, 
> any asm of an empty string is probably doing something like this.)
> 
> > > either (a) using asm with appropriate clobbers so 
> > > the compiler doesn't know the value is unchanged
> > 
> > Do you have any examples for this where the clobber is not due to a
> > compiler barrier used for acquire or release fences?
> 
> I don't understand the question.

The fences consist of a compiler barrier and, depending on the arch, a
real HW fence instruction.  The example you gave above is just a
compiler barrier, but it looks like it should actually be an acquire
fence.
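
Concretely, with a __atomic-providing compiler the dl-lookup.c snippet
could use an explicit acquire load instead of the empty asm -- just a
sketch of the direction, not a proposed patch:

  /* Acquire MO on the r_nlist load keeps both the compiler and the CPU
     from moving the subsequent r_list load before it, so we cannot see
     the resized r_nlist together with the old r_list.  */
  size_t n = __atomic_load_n (&scope->r_nlist, __ATOMIC_ACQUIRE);
  struct link_map **list = scope->r_list;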

> > > or (b) using volatile.
> > 
> > But volatile does not guarantee atomicity.  I also think that current
> 
> The point is to guarantee ordering

volatile does not guarantee ordering in a concurrent setting.  The
guarantee is that the loads/stores will be issued in exactly the order
in which the abstract machine would execute them -- but this doesn't
constrain visibility to other threads, or what other threads observe.
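
The usual message-passing idiom shows the difference.  With plain C11
<stdatomic.h> (hypothetical variables, not glibc code), the
release/acquire pair below guarantees that a reader seeing ready == 1
also sees data == 42; making both variables volatile instead would keep
the compiler from reordering the two stores, but a weakly ordered CPU
could still let another thread observe them in the other order:

  #include <stdatomic.h>

  static int data;
  static atomic_int ready;

  static void
  writer (void)
  {
    data = 42;
    /* Release store: all writes before it become visible to a thread
       that loads ready with acquire MO and sees the value 1.  */
    atomic_store_explicit (&ready, 1, memory_order_release);
  }

  static int
  reader (void)
  {
    if (atomic_load_explicit (&ready, memory_order_acquire) == 1)
      return data;   /* Guaranteed to read 42.  */
    return -1;
  }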

> (with the atomicity of aligned word 
> loads / stores being part of the GNU C language semantics required for 
> building glibc).

Ok.  We can continue to rely on such things, but we can also just
embrace that C11 now has an actual memory model for concurrent code.

> > But I also see no real evidence that using atomics for relaxed
> > loads/stores would hurt at all.  The sanitizer use case in the BZ you
> > cited is arguably special.  The pthread_once case didn't show any
> > difference on the fast path.  I can test with a __atomic*-providing
> > compiler with and without using __atomic to look for differences, if
> > that would help you.
> 
> Yes.  We may find in particular cases (possibly all of them) that the 
> change does not have significant code generation consequences, but given 
> the known bug we should actually check that in each case rather than 
> presuming it.

I can't argue against being conservative by default, but I also can't
imagine this being an actual problem for typical synchronizing code.

> > Can we at least agree on having all glibc code use our own
> > atomic_store_relaxed / atomic_load_relaxed (see the patch)?  Then we can
> > still argue whether to use __atomic* to implement these.  But at least
> > we can easily switch to __atomic in the future.  Any objections?
> 
> I think explicit atomic loads / stores are fine, subject to checking code 
> generation before making them use __atomic_*.
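
For illustration, such wrappers could be roughly the following on a
__atomic-providing compiler (just the shape, not the actual patch,
which also needs fallbacks for older compilers):

  /* Relaxed MO: the access is atomic, but imposes no ordering on other
     memory accesses.  */
  #define atomic_load_relaxed(mem) \
    __atomic_load_n ((mem), __ATOMIC_RELAXED)

  #define atomic_store_relaxed(mem, value) \
    __atomic_store_n ((mem), (value), __ATOMIC_RELAXED)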

How do you envision checking the code generation?

  - Compare generated code on all archs before and after a change, for
    each algorithm on its own?
  - Do the C11 transition, and then compare all code?
  - Look at a few algorithms to see whether it makes a difference, and
    then see whether we have anecdotal evidence that it might affect
    performance?

