This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Unify pthread_once (bug 15215)
- From: Torvald Riegel <triegel at redhat dot com>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, "Carlos O'Donell" <carlos at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, libc-ports <libc-ports at sourceware dot org>
- Date: Mon, 31 Mar 2014 13:58:44 +0200
- Subject: Re: [PATCH] Unify pthread_once (bug 15215)
On Mon, 2014-03-31 at 12:44 +0100, Will Newton wrote:
> On 7 October 2013 22:53, Torvald Riegel <triegel@redhat.com> wrote:
> > On Mon, 2013-10-07 at 16:04 +0000, Joseph S. Myers wrote:
> >> I have no comments on the substance of this patch, but note that ports/
> >> has a separate ChangeLog file for each architecture.
> >
> > Sorry. The attached patch now has separate ChangeLog entries for each of
> > the affected archs.
>
> There seems to be a significant performance delta on aarch64:
>
> Old code:
>
> "pthread_once": {
> "": {
> "duration": 9.29471e+09, "iterations": 1.10667e+09, "max": 24.54,
> "min": 8.38, "mean": 8.39882
>
> New code:
>
> "pthread_once": {
> "": {
> "duration": 9.72366e+09, "iterations": 4.33843e+08, "max": 30.86,
> "min": 22.38, "mean": 22.4128
>
> And also ARM:
>
> Old code:
>
> "pthread_once": {
> "": {
> "duration": 8.38662e+09, "iterations": 6.6695e+08, "max": 35.292,
> "min": 12.416, "mean": 12.5746
>
> New code:
>
> "pthread_once": {
> "": {
> "duration": 9.26424e+09, "iterations": 3.07574e+08, "max": 86.125,
> "min": 28.875, "mean": 30.1204
>
> It would be nice to understand the source of this variation. I can put
> it on my todo list but I can't promise I will be able to look at it
> any time soon.
The old ARM code (or rather, the old code in general) was lacking a
memory barrier. Here's what I wrote in the email that first sent the patch:
> > Both I1 and I2 were missing acquire MO on the very first load of
> > once_control. This needs to synchronize with the release MO on setting
> > the state to init-finished, so without it it's not guaranteed to work
> > either.
> > Note that this will make a call to pthread_once that doesn't need to
> > actually run the init routine slightly slower due to the additional
> > acquire barrier. If you're really concerned about this overhead, speak
> > up. There are ways to avoid it, but it comes with additional complexity
> > and bookkeeping.
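To illustrate the pairing described in the quoted text, here is a minimal sketch in C11 atomics. It is not the glibc implementation: the type once_sketch_t, the state values, and the function names are made up for illustration, and a waiter simply spins where real code would block (for example on a futex). The point is only that the first load uses acquire MO so that it synchronizes with the release store that publishes the init-finished state.

```c
#include <stdatomic.h>

/* Hypothetical state encoding, not glibc's actual one. */
enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } once_sketch_t;

static int init_calls;                      /* plain data written by the initializer */
static void example_init(void) { init_calls++; }

static void once_sketch(once_sketch_t *o, void (*init)(void))
{
    /* Fast path: this acquire load pairs with the release store below,
       so everything init() wrote is visible once we observe ONCE_DONE. */
    if (atomic_load_explicit(&o->state, memory_order_acquire) == ONCE_DONE)
        return;

    int expected = ONCE_NEW;
    if (atomic_compare_exchange_strong_explicit(&o->state, &expected,
                                                ONCE_RUNNING,
                                                memory_order_acquire,
                                                memory_order_acquire)) {
        init();
        /* Publish completion; the release MO orders init()'s writes before it. */
        atomic_store_explicit(&o->state, ONCE_DONE, memory_order_release);
        return;
    }

    /* Someone else is initializing. Spinning is an assumption made to keep
       the sketch short; a real implementation would sleep instead. */
    while (atomic_load_explicit(&o->state, memory_order_acquire) != ONCE_DONE)
        ;
}
```

On architectures with a weak memory model, the acquire load on the fast path needs a barrier or a load-acquire instruction, and that cost is paid even when the init routine has long since finished.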
One way to try to work around the overhead is to keep thread-local state
that records, via a counter or similar, whether a particular thread has
already used an acquire barrier on a load of this pthread_once instance.
This will help only if the same pthread_once is called several times from
the same thread -- it won't help if several threads each call a
particular pthread_once only a few times.
Also, because we can't keep thread-local state for each pthread_once,
we'd need to group them all -- in turn, this will lead to some
synchronization between the initialization phases of unrelated
pthread_once instances.
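One possible shape of such a scheme, sketched in C11 and not taken from any patch: every name here (once_tl_t, global_epoch, seen_epoch) is hypothetical, counter wraparound is ignored, and waiters spin instead of blocking. All instances share one global epoch counter. A finished instance stores the epoch at which it completed, and each thread remembers the highest epoch it has already acquired. The acq_rel increment of the shared counter is exactly the cross-instance synchronization mentioned above.

```c
#include <stdatomic.h>

enum { TL_NEW = 0, TL_RUNNING = 1 };        /* values >= 2 are finish epochs */

typedef struct { atomic_uint state; } once_tl_t;

static atomic_uint global_epoch = 1;        /* shared by all instances */
static _Thread_local unsigned seen_epoch;   /* highest epoch this thread acquired */

static int tl_init_calls;
static void tl_example_init(void) { tl_init_calls++; }

static void once_tl(once_tl_t *o, void (*init)(void))
{
    /* Fast path: relaxed load, no barrier. If the finish epoch is one this
       thread has already acquired, the initialization already happens-before
       this point. */
    unsigned s = atomic_load_explicit(&o->state, memory_order_relaxed);
    if (s >= 2 && s <= seen_epoch)
        return;

    unsigned expected = TL_NEW;
    if (s == TL_NEW &&
        atomic_compare_exchange_strong_explicit(&o->state, &expected,
                                                TL_RUNNING,
                                                memory_order_acquire,
                                                memory_order_acquire)) {
        init();
        /* acq_rel chains this initialization after all earlier ones, so
           acquiring epoch N also covers every epoch below N. */
        s = atomic_fetch_add_explicit(&global_epoch, 1,
                                      memory_order_acq_rel) + 1;
        atomic_store_explicit(&o->state, s, memory_order_release);
    } else {
        /* Finished but not yet acquired by this thread, or still running:
           pay for the acquire load (spinning only to keep the sketch short). */
        do
            s = atomic_load_explicit(&o->state, memory_order_acquire);
        while (s < 2);
    }

    if (s > seen_epoch)
        seen_epoch = s;
}
```

The barrier-free path is taken only for instances whose finish epoch is not newer than one the calling thread has already acquired, so the first call to each newly finished instance in a thread still pays for the acquire load. The extra counter, the thread-local variable, and the wraparound handling left out here are the kind of bookkeeping this approach brings with it.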