This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly implementations for pthread_cond_*


On Wed, Jul 30, 2014 at 08:58:03PM -0700, Darren Hart wrote:
> On 7/29/14, 17:31, "gratian.crisan@ni.com" <gratian.crisan@ni.com> wrote:
> 
> >From: Gratian Crisan <gratian.crisan@ni.com>
> >
> >Switch x86_64 from using assembly implementations for pthread_cond_signal,
> >pthread_cond_broadcast, pthread_cond_wait, and pthread_cond_timedwait to
> >using the generic C implementation. Based on benchmark results (see below),
> >the C implementation is comparable in performance, easier to maintain, less
> >bug prone, and supports priority inheritance for associated mutexes.
> >Note: the bench-pthread_cond output was edited to fit within 80 columns by
> >removing some white space and the 'variance' column.
> 
> 
> The Atom tests in particular seem to vary *greatly* between the C and ASM
> implementations. A 3825 is a Baytrail dual core (silvermont core) I
> believe, which I would have expected some better performance from, with
> fewer bubbles in the instruction pipeline, etc. Perhaps the compiler now
> does a better job at this than the hand written asm in this case.
> 
> I would *love* to see the ASM go away though - thanks for including this.
> 
Could you rerun these tests? The difference is probably because the first
test ran at 1.33GHz and the second at 500MHz or so. I cannot otherwise
explain why the futex call got slower.
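
One way to rule this out is to record the CPU frequency before and after
each benchmark run. The following is only a sketch: it assumes a Linux
system that exposes the cpufreq sysfs file
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq (not every kernel
or driver does), and the helper names read_cpu_khz and freq_changed are
hypothetical, not part of glibc or its benchmark suite.

```c
#include <stdio.h>

/* Read a frequency in kHz from a cpufreq sysfs file, for example
   /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq (assumed to
   exist on the test machine).  Returns -1 if the file cannot be
   opened or does not start with a number.  */
long read_cpu_khz(const char *path)
{
    FILE *f = fopen(path, "r");
    long khz = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Return 1 if the two readings differ by more than tolerance_pct
   percent of the first reading, 0 otherwise.  */
int freq_changed(long before_khz, long after_khz, long tolerance_pct)
{
    long diff = before_khz > after_khz ? before_khz - after_khz
                                       : after_khz - before_khz;

    return diff * 100 > before_khz * tolerance_pct;
}
```

Calling read_cpu_khz on the sysfs path before and after a run and passing
both values to freq_changed would flag a run where the clock dropped, as
in the suspected 1.33GHz versus 500MHz case.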

In general, whether the C or the assembly implementation is faster on Atom
is basically a coin flip, as both were optimized for a different
architecture.

You would need to write an Atom-specific assembly implementation in which
you pair instructions that should execute in parallel, which is a
pessimization for machines with out-of-order execution.

The second thing you should try is building the C implementation with
-march=atom to see if it helps.
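
To compare a default build against one built with -march=atom, a small
timing loop can serve as a quick sanity check. This is a minimal sketch,
not the bench-pthread_cond benchmark from the patch: it only covers the
"signal (w/o waiters)" case, the name mean_signal_ns is hypothetical, and
it measures whichever glibc the program is linked against, so the flag
has to be applied when glibc itself is built.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Condition variable that never has any waiters.  */
static pthread_cond_t bench_cond = PTHREAD_COND_INITIALIZER;

/* Return the mean wall-clock cost in nanoseconds of one
   pthread_cond_signal call with no waiters, averaged over iters
   calls.  Returns -1.0 if iters is not positive or the clock
   cannot be read.  */
double mean_signal_ns(long iters)
{
    struct timespec start, end;
    double total_ns;
    long i;

    if (iters <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (i = 0; i < iters; i++)
        pthread_cond_signal(&bench_cond);

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
               + (double)(end.tv_nsec - start.tv_nsec);
    return total_ns / (double)iters;
}
```

Running mean_signal_ns(1000000) against each glibc build, with the CPU
frequency held constant, gives a rough per-call figure. It reports
elapsed time rather than the units of the tables below, so only compare
its results with each other.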

> >
> >C implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz, gcc 4.7.3
> >pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> >----------------------------------------------------------------------------
> >signal (w/o waiters)    1000000/100    95.077     90     28960      33.3326
> >broadcast (w/o waiters) 1000000/100    114.874    90     13820      78.6426
> >signal                  1000000/1      6704.17    3510   49390      3537.21
> >broadcast               1000000/1      6726.35    3850   55430      3297.21
> >signal/wait             100000/100     16888.2    12240  6682020    15045.4
> >signal/timedwait        100000/100     19246.6    13560  6874950    15969.5
> >broadcast/wait          100000/100     17228.5    12390  6461480    14780.2
> >broadcast/timedwait     100000/100     19414.5    13910  6656950    15681.8
> >
> >Assembly implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz
> >pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> >----------------------------------------------------------------------------
> >signal (w/o waiters)    1000000/100    263.81     70     120171680  90138
> >broadcast (w/o waiters) 1000000/100    264.213    70     160178010  91861.4
> >signal                  1000000/1      15851.7    3800   13372770   13889
> >broadcast               1000000/1      16095.2    5900   14940170   16346.7
> >signal/wait             100000/100     33151      7930   252746080  475402
> >signal/timedwait        100000/100     34921.1    10950  147023040  270191
> >broadcast/wait          100000/100     33400.2    11810  247194720  455105
> >broadcast/timedwait     100000/100     35022.1    13610  161552720  30328
> >

