This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] PowerPC: libc single-thread lock optimization
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Mon, 05 May 2014 12:33:08 +0200
- Subject: Re: [PATCH 2/2] PowerPC: libc single-thread lock optimization
- Authentication-results: sourceware.org; auth=none
- References: <5361013D dot 9020005 at linux dot vnet dot ibm dot com> <1399045370 dot 32485 dot 6639 dot camel at triegel dot csb> <5363D878 dot 9080803 at linux dot vnet dot ibm dot com>
On Fri, 2014-05-02 at 14:40 -0300, Adhemerval Zanella wrote:
> On 02-05-2014 12:42, Torvald Riegel wrote:
> > On Wed, 2014-04-30 at 10:57 -0300, Adhemerval Zanella wrote:
> >> +#else
> >> +# define __lll_is_single_thread (0)
> >> +#endif
> >> +
> >> +
> >> #define lll_futex_wait(futexp, val, private) \
> >> lll_futex_timed_wait (futexp, val, NULL, private)
> >>
> >> @@ -205,7 +215,9 @@
> >> /* Set *futex to ID if it is 0, atomically. Returns the old value */
> >> #define __lll_robust_trylock(futex, id) \
> >> ({ int __val; \
> >> - __asm __volatile ("1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> >> + if (!__lll_is_single_thread) \
> >> + __asm __volatile ( \
> >> + "1: lwarx %0,0,%2" MUTEX_HINT_ACQ "\n" \
> >> " cmpwi 0,%0,0\n" \
> >> " bne 2f\n" \
> >> " stwcx. %3,0,%2\n" \
> >> @@ -214,6 +226,12 @@
> >> : "=&r" (__val), "=m" (*futex) \
> >> : "r" (futex), "r" (id), "m" (*futex) \
> >> : "cr0", "memory"); \
> >> + else \
> >> + { \
> >> + __val = *futex; \
> >> + if (__val == 0) \
> >> + *futex = id; \
> >> + } \
> >> __val; \
> >> })
> > Conceptually, you can safely use trylock in signal handlers, because
> > it's not a blocking operation. And this is used for normal and robust
> > locks. I haven't checked all the trylock code and all uses to see
> > whether it is indeed AS-Safe, but I'm pretty sure this change is wrong.
> > At the very least, it doesn't document that trylock is now AS-Unsafe.
>
> If there is the case I concede to you, however I couldn't find any proper
> documentation (either POSIX or from our manual) saying trylocks should be
> AS-Safe.
We haven't documented it AFAIK, but these are our internal locks, and
unlike lock() which blocks, trylock() won't block. So, trylock's
semantics can be useful in signal handler contexts, and your are
removing this possibility. At the very least, if we have no current
internal users of trylock in AS-Safe functions, then we could consider
this change. But I think this needs wider consensus in the community.
Regarding POSIX, I'm not aware of any explicit mentioning of what's safe
in signal handlers. However, trylock is synchronizing, and documented
as non-blocking, so one could consider it being AS-Safe.
> And answering your previous question, I don't know if anyone is
> tracking the remaining documentation about pthread.
Nor do I really, but if you make a change to the semantics, it's worth
documenting it even if there's no existing documentation.
> >
> > Finally, for a change as this whose effect on performance is
> > non-obvious, I think we really need performance measurements. We need
> > to have microbenchmarks so that we can track the change of performance.
> >
> Although micro-benchmarks can provided (and they will prob be in that loop)
> 'this is not a real case'),
What do you mean? Will you provide microbenchmarks later on, just some
for the atomics, or some for a specific use of the modified locks (e.g.,
malloc)?
> for PowerPC avoid LL/SC instructions where they
> are not required is always a gain.
I think we had consensus in the community that we want to track
performance, and base performance-focused changes on data that we can
reassess later on (e.g., for a future PowerPC generation). If we want
to do that, then I do think this needs microbenchmarks. Two options
come to mind:
* Plain RMWs or CASs to cached and uncached memory. The best case for
the optimization, I suppose.
* Real use of the optimization in, say, malloc. That's a more realistic
benchmark.
If the second doesn't show any significant performance differences
(e.g., because there's so much other stuff going on that this doesn't
matter), then maybe it's not worth it to maintain it. We can only say
with some confidence what's the case after having numbers.