This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: PowerPC: libc single-thread lock optimization
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 02 May 2014 16:37:56 +0200
- Subject: Re: PowerPC: libc single-thread lock optimization
- Authentication-results: sourceware.org; auth=none
- References: <5343F8F1 dot 4000400 at linux dot vnet dot ibm dot com> <535ECADE dot 2050004 at linux dot vnet dot ibm dot com> <20140428214938 dot 3B10F2C3A13 at topped-with-meat dot com> <535ED72A dot 5060203 at linux dot vnet dot ibm dot com> <1398788543 dot 32485 dot 1139 dot camel at triegel dot csb> <535FD835 dot 1090702 at linux dot vnet dot ibm dot com> <1398794007 dot 32485 dot 1490 dot camel at triegel dot csb> <535FE9EE dot 1040305 at linux dot vnet dot ibm dot com> <1399039455 dot 32485 dot 6253 dot camel at triegel dot csb> <5363A865 dot 1070306 at linux dot vnet dot ibm dot com>
On Fri, 2014-05-02 at 11:15 -0300, Adhemerval Zanella wrote:
> On 02-05-2014 11:04, Torvald Riegel wrote:
> > On Tue, 2014-04-29 at 15:05 -0300, Adhemerval Zanella wrote:
> >> On 29-04-2014 14:53, Torvald Riegel wrote:
> >>> On Tue, 2014-04-29 at 13:49 -0300, Adhemerval Zanella wrote:
> >>>> On 29-04-2014 13:22, Torvald Riegel wrote:
> >>>>> On Mon, 2014-04-28 at 19:33 -0300, Adhemerval Zanella wrote:
> >>>>>> I bring up x86 because it is usually the reference implementation, and it sometimes puzzles
> >>>>>> me that copying the same idea to another platform raises architectural questions. But I concede
> >>>>>> that the reference itself may not have opted for the best solution in the first place.
> >>>>>> So if I have understood correctly, is the optimization of checking for single-threadedness before
> >>>>>> using locks meant to be focused on lowlevellock solely? If so, how do you suggest other archs
> >>>>>> mimic the x86 optimization on the atomic.h primitives? Should other archs follow x86_64 and
> >>>>>> check the __libc_multiple_threads value instead? That could be a way, but it is mostly redundant:
> >>>>>> the TCB definition already contains the required information, so there is no need to keep track
> >>>>>> of it in another memory reference. Also, x86_64 checks the TCB header information in
> >>>>>> sysdeps/CPU/bits/atomic.h, but __libc_multiple_threads in lowlevellock.h. Which is the correct
> >>>>>> guideline for other archs?
> >>>>> From a synchronization perspective, I think any single-thread
> >>>>> optimizations belong in the specific concurrent algorithms (e.g.,
> >>>>> mutexes, condvars, ...):
> >>>>> * Doing the optimization at the lowest level (i.e., the atomics) might be
> >>>>> insufficient, because if there's indeed just one thread, then lots of
> >>>>> synchronization code can be made much simpler than by just avoiding
> >>>>> atomics (e.g., avoiding loops, checks, ...).
> >>>>> * The mutexes, condvars, etc. are what's exposed to the user, so
> >>>>> assumptions about whether there really is concurrency or not only make
> >>>>> sense there. For example, a single-threaded program can still have a
> >>>>> process-shared condvar, so the condvar would need to use
> >>>>> synchronization.
> >>>> Following the x86_64 idea, this optimization is only for internal atomic usage
> >>>> within libc itself: for a process-shared condvar, one would use the pthread code,
> >>>> which is *not* built with this optimization.
> >>> pthread code uses the same atomics we use for libc internally.
> >>> Currently, the x86_64 condvar, for example, doesn't use the atomics --
> >>> but this is what we'd need it to do if we ever want to use unified
> >>> implementations of condvars (e.g., like we did for pthread_once
> >>> recently).
> >> If you check my patch, the SINGLE_THREAD_P is defined as:
> >> #ifndef NOT_IN_libc
> >> # define SINGLE_THREAD_P \
> >> (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
> >> #else
> >> # define SINGLE_THREAD_P 0
> >> #endif
> >> So for libpthread, the non-atomic code path is eliminated. x86_64 is
> >> not that careful in some atomic primitives, though.
> > I think that's not sufficient, nor are the low-level atomics the right
> > place for this kind of optimization.
> > First, there are several sources of concurrency affecting shared-memory
> > synchronization:
> > * Threads created by nptl.
> > * Other processes we're interacting with via shared memory.
> > * Reentrancy.
> > * The kernel, if we should synchronize with it via shared memory (e.g.,
> > recent perf does so, IIRC).
> > We control the first. The second case is, I suppose, only reachable by
> > using pthreads pshared sync operations (or not?).
> > In the case of reentrancy, there is concurrency between a signal handler and
> > a process consisting of a single thread, so we might want to use atomics
> > to synchronize. I haven't checked whether we actually do (Alex might
> > know after doing the MT-Safety documentation) -- but I would not want us
> > to be prevented from using atomics for that, so a check on just
> > multiple_threads is not sufficient IMO.
> > Something similar applies to the kernel case. Or if, in the future, we
> > should want to sync with any accelerators or similar.
> As I stated previously, I have dropped the atomic.h modification in favor of just
> changing lowlevellock.h.
> And I think we then need to reevaluate the x86_64 code, which does exactly what
> you think is wrong (adding the single-thread optimization in the atomics).
Note that those are different: They drop the "lock" prefix, but they are
not sequential code like what you add.
I agree that it's worth documenting them, but those should work in case