This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly implementations for pthread_cond_*
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Darren Hart <dvhart at linux dot intel dot com>
- Cc: gratian dot crisan at ni dot com, libc-alpha at sourceware dot org, Carlos O'Donell <carlos at redhat dot com>, Joseph Myers <joseph at codesourcery dot com>, Jeff Law <law at redhat dot com>, Scot Salmon <scot dot salmon at ni dot com>, Siddhesh Poyarekar <spoyarek at redhat dot com>, Thomas Gleixner <tglx at linutronix dot de>, Torvald Riegel <triegel at redhat dot com>, Clark Williams <williams at redhat dot com>, "Paul E. McKenney" <paulmck at linux dot vnet dot ibm dot com>, Will Newton <will dot newton at linaro dot org>, gratian at gmail dot com
- Date: Thu, 31 Jul 2014 11:17:44 +0200
- Subject: Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly implementations for pthread_cond_*
- Authentication-results: sourceware.org; auth=none
- References: <OF6ABEE614 dot FAE80AD2-ON86257D0E dot 006B38F4-86257D0E dot 0070034A at ni dot com> <1406680317-20189-1-git-send-email-gratian dot crisan at ni dot com> <1406680317-20189-6-git-send-email-gratian dot crisan at ni dot com> <CFFF0B3F dot 9EB56%dvhart at linux dot intel dot com>
On Wed, Jul 30, 2014 at 08:58:03PM -0700, Darren Hart wrote:
> On 7/29/14, 17:31, "gratian.crisan@ni.com" <gratian.crisan@ni.com> wrote:
>
> >From: Gratian Crisan <gratian.crisan@ni.com>
> >
> >Switch x86_64 from using assembly implementations for pthread_cond_signal,
> >pthread_cond_broadcast, pthread_cond_wait, and pthread_cond_timedwait to
> >using the generic C implementation. Based on benchmark results (see below),
> >the C implementation is comparable in performance, easier to maintain,
> >less bug prone, and supports priority inheritance for associated mutexes.
> >Note: the bench-pthread_cond output was edited to fit within 80 columns by
> >removing some white space and the 'variance' column.
>
>
> The Atom tests in particular seem to vary *greatly* between the C and ASM
> implementations. The E3825 is a Bay Trail dual core (Silvermont core), I
> believe, from which I would have expected somewhat better performance, with
> fewer bubbles in the instruction pipeline, etc. Perhaps the compiler now
> does a better job at this than the hand-written asm in this case.
>
> I would *love* to see the ASM go away though - thanks for including this.
>
Could you rerun these tests? The difference is probably because the first
test ran at 1.33GHz and the second at 500MHz or so. I cannot otherwise
explain why the futex call got slower.
In general, on Atom, whether the C or the assembly implementation is faster
is basically a coin flip, as both were optimized for a different
architecture. You would need to write an Atom-specific assembly
implementation in which you pair instructions that should be executed in
parallel, which is a pessimization for machines with out-of-order execution.
The second thing you should try is adding -march=atom to the C
implementation to see if it helps.
> >
> >C implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz, gcc 4.7.3
> >pthread_cond_[test]      iter/threads  mean     min    max      std. dev
> >------------------------------------------------------------------------
> >signal (w/o waiters)     1000000/100   95.077   90     28960    33.3326
> >broadcast (w/o waiters)  1000000/100   114.874  90     13820    78.6426
> >signal                   1000000/1     6704.17  3510   49390    3537.21
> >broadcast                1000000/1     6726.35  3850   55430    3297.21
> >signal/wait              100000/100    16888.2  12240  6682020  15045.4
> >signal/timedwait         100000/100    19246.6  13560  6874950  15969.5
> >broadcast/wait           100000/100    17228.5  12390  6461480  14780.2
> >broadcast/timedwait      100000/100    19414.5  13910  6656950  15681.8
> >
> >Assembly implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz
> >pthread_cond_[test]      iter/threads  mean     min    max        std. dev
> >--------------------------------------------------------------------------
> >signal (w/o waiters)     1000000/100   263.81   70     120171680  90138
> >broadcast (w/o waiters)  1000000/100   264.213  70     160178010  91861.4
> >signal                   1000000/1     15851.7  3800   13372770   13889
> >broadcast                1000000/1     16095.2  5900   14940170   16346.7
> >signal/wait              100000/100    33151    7930   252746080  475402
> >signal/timedwait         100000/100    34921.1  10950  147023040  270191
> >broadcast/wait           100000/100    33400.2  11810  247194720  455105
> >broadcast/timedwait      100000/100    35022.1  13610  161552720  30328
> >