This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Unify pthread_once (bug 15215)
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 11 Apr 2014 16:27:12 +0200
- Subject: Re: [PATCH] Unify pthread_once (bug 15215)
- Authentication-results: sourceware.org; auth=none
- References: <1368024237 dot 7774 dot 794 dot camel at triegel dot csb> <519D97E4 dot 4030808 at redhat dot com> <1381018836 dot 8757 dot 3598 dot camel at triegel dot csb> <Pine dot LNX dot 4 dot 64 dot 1310071604090 dot 31470 at digraph dot polyomino dot org dot uk> <1381182784 dot 18547 dot 138 dot camel at triegel dot csb> <533605BF dot 9000005 at redhat dot com> <534307E1 dot 5050904 at linux dot vnet dot ibm dot com> <1397159860 dot 10643 dot 16419 dot camel at triegel dot csb> <534710B9 dot 6010303 at linux dot vnet dot ibm dot com>
On Thu, 2014-04-10 at 18:44 -0300, Adhemerval Zanella wrote:
> On 10-04-2014 16:57, Torvald Riegel wrote:
> > On Mon, 2014-04-07 at 17:17 -0300, Adhemerval Zanella wrote:
> >> Digging up on current code and the unified one, I noted PowerPC one is currently doing load+and+store
> >> condition followed by a full barrier.
> > Hmm, the code I see is using __lll_acq_instr, which is an isync, which
> > AFAICT is not a full barrier (hwsync).
> >> This is overkill for some scenarios: if the initialization is
> >> done (val & 2) we don't need a full barrier (isync), since the requirement (as Tovalds has explained)
> >> is just a load acquire.
> > The atomic_read_barrier used in the unified code seems to result in an
> > lwsync. Is that faster than an isync? If not, the overhead may be due
> > to something else.
>
> You are correct, bad wording from me. In fact the issue here ir, from some profiling, the load/store
> with *reservation*, the isync is fact necessary in current case as acquire barrier. The atomic
> read/store shows a better performance for POWER. And yes, lwsync is faster than isync (show below
> by the experiment with load acquire/store release).
Okay. If lwsync is preferable for an acquire load, it might be best to
check the GCC atomics implementation because I believe it's following
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, which suggests
isync for acquire loads.
>
> >
> >> I ran make check and results looks good. I also checked with GCC issues that originated the fix
> >> that added the release barrier in the code (gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10) and
> >> it does not show the issue with new implementation.
> > Thanks for doing the tests. Would you want to remove the powerpc
> > implementation after this patch has been committed?
>
> Yes I planning to do it.
Thanks. I've just committed the patch.