[PATCH v4 0/3] Optimize CAS [BZ #28537]
Paul A. Clarke
pc@us.ibm.com
Wed Nov 10 20:07:22 GMT 2021
On Wed, Nov 10, 2021 at 08:26:09AM -0600, Paul E Murphy via Libc-alpha wrote:
> On 11/9/21 6:16 PM, H.J. Lu via Libc-alpha wrote:
> > CAS instruction is expensive. From the x86 CPU's point of view, getting
> > a cache line for writing is more expensive than reading. See Appendix
> > A.2 Spinlock in:
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> >
> > The full compare and swap will grab the cache line exclusive and cause
> > excessive cache line bouncing.
> >
> > Optimize CAS in low level locks and pthread_mutex_lock.c:
> >
> > 1. Do an atomic load and skip CAS if compare may fail to reduce cache
> > line bouncing on contended locks.
> > 2. Replace atomic_compare_and_exchange_bool_acq with
> > atomic_compare_and_exchange_val_acq to avoid the extra load.
> > 3. Drop __glibc_unlikely in __lll_trylock and lll_cond_trylock since we
> > don't know if it's actually rare; in the contended case it is clearly not
> > rare.
>
> Are you able to share benchmarks of this change? I am curious what effects
> this might have on other platforms.
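For anyone following along, here is a minimal sketch of the load-before-CAS idea from item 1 of the quoted description. It uses C11 <stdatomic.h> rather than glibc's internal atomic macros, and every name in it (sketch_lock, sketch_trylock_cas, sketch_trylock_load_first, sketch_unlock) is hypothetical. It illustrates the technique only and is not the code from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held.  */
typedef struct
{
  atomic_int word;
} sketch_lock;

/* Plain CAS trylock: the CAS is issued even when the lock is
   already held.  */
static bool
sketch_trylock_cas (sketch_lock *l)
{
  int expected = 0;
  return atomic_compare_exchange_strong_explicit (&l->word, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed);
}

/* Load-first trylock: a relaxed load checks the lock word and the
   CAS is skipped when the compare would fail anyway.  */
static bool
sketch_trylock_load_first (sketch_lock *l)
{
  if (atomic_load_explicit (&l->word, memory_order_relaxed) != 0)
    return false;

  int expected = 0;
  return atomic_compare_exchange_strong_explicit (&l->word, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed);
}

/* Release the lock.  */
static void
sketch_unlock (sketch_lock *l)
{
  atomic_store_explicit (&l->word, 0, memory_order_release);
}
```

In the uncontended case the load-first variant does one extra load before the CAS, so whether it helps or hurts on a given platform is exactly the kind of thing the benchmarks have to show.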
I'd like to see the expected performance results, too.
For me, the results are not uniformly positive (Power10).
From bench-pthread-locks:
                          bench     bench-patched  change
mutex-empty               4.73371   4.54792          3.9%
mutex-filler              18.5395   18.3419          1.1%
mutex_trylock-empty       10.46     2.46364         76.4%
mutex_trylock-filler      16.2188   16.1758          0.3%
rwlock_read-empty         16.5118   16.4681          0.3%
rwlock_read-filler        20.68     20.4416          1.2%
rwlock_tryread-empty      2.06572   2.17284         -5.2%
rwlock_tryread-filler     16.082    16.1215         -0.2%
rwlock_write-empty        31.3723   31.259           0.4%
rwlock_write-filler       41.6492   69.313         -66.4%
rwlock_trywrite-empty     2.20584   2.32178         -5.3%
rwlock_trywrite-filler    15.7044   15.9088         -1.3%
spin_lock-empty           16.7964   16.7731          0.1%
spin_lock-filler          20.6118   20.4175          0.9%
spin_trylock-empty        8.99989   8.98879          0.1%
spin_trylock-filler       16.4732   15.9957          2.9%
sem_wait-empty            15.805    15.7391          0.4%
sem_wait-filler           19.2346   19.5098         -1.4%
sem_trywait-empty         2.06405   2.03782          1.3%
sem_trywait-filler        15.921    15.8408          0.5%
condvar-empty             1385.84   1387.29         -0.1%
condvar-filler            1419.82   1424.01         -0.3%
consumer_producer-empty   2550.01   2395.29          6.1%
consumer_producer-filler  2709.4    2558.28          5.6%
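
The percentage column appears to be the relative change (bench - patched) / bench, so a positive value means the patched build reported a lower number than the baseline. A small sketch of that arithmetic follows; the helper name pct_change is hypothetical and is not part of the benchmark.

```c
/* Relative change of the patched result against the baseline, in
   percent.  Positive means the patched value is lower.  */
static double
pct_change (double base, double patched)
{
  return (base - patched) / base * 100.0;
}
```

For example, pct_change (10.46, 2.46364) reproduces the mutex_trylock-empty row and pct_change (41.6492, 69.313) the rwlock_write-filler row, to the one decimal place shown.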
PC