[PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.

Thu Apr 6 12:04:00 GMT 2017

On Wed, 2017-03-22 at 12:56 +0000, Szabolcs Nagy wrote:
> On 21/03/17 15:43, Stefan Liebler wrote:
> > On 03/14/2017 04:55 PM, Stefan Liebler wrote:
> >> Okay. I've attached an updated patch. It is now using case 2).
> >> This choice applies to pthread_spin_trylock.c and the first attempt to
> >> acquire the lock in pthread_spin_lock.c.
> >> Therefore I've introduced ATOMIC_EXCHANGE_USES_CAS for all architectures
> >> in atomic-machine.h files. There is a check in include/atomic.h which
> >> ensures that it is defined to either 0 or 1. Can you please review the
> >> setting of 0 or 1?
> >>
> >> Bye Stefan
> > Ping
> > 
> 
> the aarch64 changes look ok to me (but this is
> something that ideally would be benchmarked on real
> hw with interesting workload and i haven't done that
> because it is non-trivial)

This is something that we need to continue working on.  I don't think
it's required for this patch, but any further tuning will need to be
backed by benchmarks.

I don't expect to have time to work on benchmarks in the foreseeable
future; it would be great if you, Stefan, or someone else could
continue to work on this.

> power consumption of a contended spin lock on armv8
> can be improved using a send-event/wait-event mechanism,
> but then the atomic_spin_nop needs to be in a loop with
> an ll/sc pair not with a relaxed load.
> (i guess we can introduce a target specific spinlock
> if this turns out to be relevant)

Interesting.  I expect the machine maintainers to drive such
optimizations in the future; performance differences should be made
reproducible using benchmarks contributed to glibc.
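
For readers following the thread, here is a minimal sketch of the
pattern under discussion, written with plain C11 <stdatomic.h> rather
than glibc's internal atomic macros.  It is not the code from the
patch.  The names sketch_spinlock, sketch_try_acquire, sketch_lock,
sketch_unlock and the macro SKETCH_EXCHANGE_USES_CAS are hypothetical;
the macro only stands in for the per-architecture
ATOMIC_EXCHANGE_USES_CAS setting.  The first acquisition attempt uses
either an exchange or a CAS depending on that setting, and the
contended path spins on a relaxed load before retrying.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-architecture
   ATOMIC_EXCHANGE_USES_CAS setting: 1 if the machine implements
   exchange in terms of CAS, 0 if it has a native exchange.  */
#ifndef SKETCH_EXCHANGE_USES_CAS
# define SKETCH_EXCHANGE_USES_CAS 0
#endif

/* Hypothetical lock type: 0 means free, 1 means held.  */
typedef struct { atomic_int value; } sketch_spinlock;

/* One acquisition attempt; returns true if the lock was taken.  */
static bool
sketch_try_acquire (sketch_spinlock *l)
{
#if SKETCH_EXCHANGE_USES_CAS
  /* Exchange would be a CAS loop anyway, so issue a single CAS.  */
  int expected = 0;
  return atomic_compare_exchange_strong_explicit
    (&l->value, &expected, 1,
     memory_order_acquire, memory_order_relaxed);
#else
  /* Native exchange: we own the lock if the old value was 0.  */
  return atomic_exchange_explicit (&l->value, 1,
                                   memory_order_acquire) == 0;
#endif
}

static void
sketch_lock (sketch_spinlock *l)
{
  /* Fast path: a single attempt for the uncontended case.  */
  if (sketch_try_acquire (l))
    return;

  for (;;)
    {
      /* Wait with relaxed loads until the lock looks free.  A real
         implementation would place a machine-specific pause hint
         (glibc's atomic_spin_nop) in this loop.  */
      while (atomic_load_explicit (&l->value,
                                   memory_order_relaxed) != 0)
        ;

      /* Retry; a weak CAS may fail spuriously, so loop again.  */
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit
            (&l->value, &expected, 1,
             memory_order_acquire, memory_order_relaxed))
        return;
    }
}

static void
sketch_unlock (sketch_spinlock *l)
{
  /* Release store pairs with the acquire in the lock paths.  */
  atomic_store_explicit (&l->value, 0, memory_order_release);
}
```

Building the sketch with -DSKETCH_EXCHANGE_USES_CAS=1 selects the CAS
variant of the first attempt.  Which setting is faster on a given
machine is exactly what the benchmarks mentioned above would have to
show; the sketch makes no claim either way.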