This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
- From: Stefan Liebler <stli at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Thu, 23 Mar 2017 17:15:54 +0100
- Subject: Re: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
- References: <1481905917-15654-1-git-send-email-stli@linux.vnet.ibm.com> <5857CF10.1060100@arm.com> <628f6311-239c-5eea-572c-c2acae6fcbee@linux.vnet.ibm.com> <1487017743.16322.80.camel@redhat.com> <60a34645-17e4-6693-1343-03c55b0c47ad@linux.vnet.ibm.com> <1487437038.20203.68.camel@redhat.com> <25ad863b-6f20-bfb7-95e6-3b04a2b3eee8@linux.vnet.ibm.com> <1487598702.20203.138.camel@redhat.com> <b57d3477-a041-7b06-82ac-6d2b6c6bb08c@linux.vnet.ibm.com> <9c3fc2b3-57b6-b160-3f97-5ce3be05f4c0@linux.vnet.ibm.com> <58D2746A.90405@arm.com>
On 03/22/2017 01:56 PM, Szabolcs Nagy wrote:
> On 21/03/17 15:43, Stefan Liebler wrote:
>> On 03/14/2017 04:55 PM, Stefan Liebler wrote:
>>> Okay. I've attached an updated patch. It is now using case 2).
>>> This choice applies to pthread_spin_trylock.c and the first attempt to
>>> acquire the lock in pthread_spin_lock.c.
>>> Therefore I've introduced ATOMIC_EXCHANGE_USES_CAS for all architectures
>>> in atomic-machine.h files. There is a check in include/atomic.h which
>>> ensures that it is defined to either 0 or 1. Can you please review the
>>> setting of 0 or 1?
>>> Bye Stefan
>> Ping
> the aarch64 changes look ok to me (but this is
> something that ideally would be benchmarked on real
> hw with interesting workload and i haven't done that
> because it is non-trivial)
> on a trivial benchmark it seems to be a bit better
> than the current code.
This sounds good.
> the performance of the uncontended case can be improved
> slightly by reverting the unlock change (the release
> store is stronger than the barrier was; conceptually
> there is a barrier before and after an armv8 release
> store, preventing an independent load-acquire from being
> reordered with it in either direction)
So you mean something like the following?
atomic_thread_fence_release ();
atomic_store_relaxed (lock, 0);
(Info: I've used scripts/build-many-glibcs.py to get the following
objdumps. sysdeps/powerpc/nptl/pthread_spin_unlock.c also uses
atomic_store_release. For the following powerpc64-linux-gnu
objdumps, I've removed the powerpc-specific spinlock implementation to
show the differences.)
=>aarch64-linux-gnu
0000000000000000 <pthread_spin_unlock>:
0: d5033bbf dmb ish
4: b900001f str wzr, [x0]
8: 52800000 mov w0, #0x0 // #0
c: d65f03c0 ret
=>powerpc64-linux-gnu:
0000000000000000 <.pthread_spin_unlock>:
0: 7c 69 1b 78 mr r9,r3
4: 7c 20 04 ac lwsync
8: 39 40 00 00 li r10,0
c: 38 60 00 00 li r3,0
10: 91 49 00 00 stw r10,0(r9)
14: 4e 80 00 20 blr
Here is the upstream code for comparison:
atomic_full_barrier ();
*lock = 0;
=>aarch64-linux-gnu
0000000000000000 <pthread_spin_unlock>:
0: aa0003e1 mov x1, x0
4: d5033bbf dmb ish
8: 52800000 mov w0, #0x0 // #0
c: b900003f str wzr, [x1]
10: d65f03c0 ret
=>powerpc64-linux-gnu:
0000000000000000 <.pthread_spin_unlock>:
0: 7c 69 1b 78 mr r9,r3
4: 7c 00 04 ac hwsync
8: 39 40 00 00 li r10,0
c: 38 60 00 00 li r3,0
10: 91 49 00 00 stw r10,0(r9)
14: 4e 80 00 20 blr
And the code of my latest patch:
atomic_store_release (lock, 0);
=>aarch64-linux-gnu
0000000000000000 <pthread_spin_unlock>:
0: 889ffc1f stlr wzr, [x0]
4: 52800000 mov w0, #0x0 // #0
8: d65f03c0 ret
=>powerpc64-linux-gnu:
0000000000000000 <.pthread_spin_unlock>:
0: 7c 69 1b 78 mr r9,r3
4: 7c 20 04 ac lwsync
8: 39 40 00 00 li r10,0
c: 38 60 00 00 li r3,0
10: 91 49 00 00 stw r10,0(r9)
14: 4e 80 00 20 blr
> power consumption of a contended spin lock on armv8
> can be improved using a send-event/wait-event mechanism,
> but then the atomic_spin_nop needs to be in a loop with
> an ll/sc pair, not with a relaxed load.
> (i guess we can introduce a target specific spinlock
> if this turns out to be relevant)
> (git apply complained about an extra newline at the
> end of atomic.h)
Oops. I'll fix it in the next version.
Thanks.
Stefan