This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly implementations for pthread_cond_*


Ondřej Bílka <neleai@seznam.cz> wrote on 07/31/2014 04:17:44 AM:

> Subject: Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly 
> implementations for pthread_cond_*
> 
> On Wed, Jul 30, 2014 at 08:58:03PM -0700, Darren Hart wrote:
> > On 7/29/14, 17:31, "gratian.crisan@ni.com" <gratian.crisan@ni.com> wrote:
> > 
> > >From: Gratian Crisan <gratian.crisan@ni.com>
> > >
> > >Switch x86_64 from using assembly implementations for pthread_cond_signal,
> > >pthread_cond_broadcast, pthread_cond_wait, and pthread_cond_timedwait to
> > >using the generic C implementation. Based on benchmark results (see below),
> > >the C implementation is comparable in performance, easier to maintain,
> > >less bug-prone, and supports priority inheritance for associated mutexes.
> > >Note: the bench-pthread_cond output was edited to fit within 80 columns by
> > >removing some white space and the 'variance' column.
> > 
> > 
> > The Atom tests in particular seem to vary *greatly* between the C and ASM
> > implementations. A 3825 is a Baytrail dual core (Silvermont core), I
> > believe, from which I would have expected somewhat better performance,
> > with fewer bubbles in the instruction pipeline, etc. Perhaps the compiler
> > now does a better job at this than the hand-written asm in this case.
> > 
> > I would *love* to see the ASM go away though. Thanks for including this.
> > 
> Could you rerun these tests? It is probably because the first test ran at
> 1.33GHz and the second at 500MHz or so. I cannot otherwise explain why the
> futex call got slower.
> 
> In general, on Atom, whether the C or the assembly implementation is faster
> is basically a coin flip, as both were optimized for a different architecture.
> 
> You would need to write an Atom-specific assembly implementation where you
> pair instructions that should be executed in parallel, which is a
> pessimization for machines with out-of-order execution.
> 
> Second, you should try adding -march=atom to the C implementation and see
> if it helps.
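
One way to check the frequency hypothesis before re-running is to log the clock each CPU reports while the benchmark runs. The sketch below assumes a Linux kernel whose cpufreq driver exposes `/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq` (a value in kHz); the helper name `cpu_cur_freq_khz` is hypothetical and is not part of glibc or of the benchmark suite.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the current clock of CPU `cpu` in kHz, as
 * reported by the Linux cpufreq sysfs interface, or -1 if it cannot be read
 * (no cpufreq driver loaded, no such CPU, or unexpected file contents).
 */
long cpu_cur_freq_khz(int cpu)
{
    char path[96];
    long khz = -1;
    FILE *f;

    if (cpu < 0)
        return -1;
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1 || khz <= 0)
        khz = -1;
    fclose(f);
    return khz;
}
```

Sampling it right before and after each run, e.g. `printf("%ld kHz\n", cpu_cur_freq_khz(0));`, would show whether the C and assembly runs executed at different clocks; setting the cpufreq `scaling_governor` to `performance` for the duration is one common way to remove that variable altogether.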

Thanks. These are good suggestions. I will re-run the benchmarks on the 
Atom/Baytrail board and re-post the results.
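
For a quick re-run that exercises the signal/wait paths the patch touches, a two-thread ping-pong over one mutex and one condition variable is a simple sketch. It is not `bench-pthread_cond` from the patch series; `struct pingpong`, `partner` and `pingpong_ns_per_roundtrip` are hypothetical names, and the reported figure (mean wall-clock nanoseconds per round trip, measured with `CLOCK_MONOTONIC`) includes scheduler wakeup latency, so it is sensitive to CPU frequency and affinity as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state for a two-thread ping-pong. */
struct pingpong {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int turn;   /* 0: main thread may proceed, 1: partner may proceed */
    int done;   /* set by the main thread once all round trips finish */
};

/* Partner thread: wait for its turn, hand it back, repeat until done. */
static void *partner(void *arg)
{
    struct pingpong *pp = arg;

    pthread_mutex_lock(&pp->lock);
    for (;;) {
        while (pp->turn != 1 && !pp->done)
            pthread_cond_wait(&pp->cond, &pp->lock);
        if (pp->done)
            break;
        pp->turn = 0;
        pthread_cond_signal(&pp->cond);
    }
    pthread_mutex_unlock(&pp->lock);
    return NULL;
}

/* Mean nanoseconds per signal/wait round trip, or -1.0 on error. */
double pingpong_ns_per_roundtrip(long iters)
{
    struct pingpong pp = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cond = PTHREAD_COND_INITIALIZER,
        .turn = 0,
        .done = 0,
    };
    struct timespec t0, t1;
    pthread_t tid;

    if (iters <= 0 || pthread_create(&tid, NULL, partner, &pp) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&pp.lock);
    for (long i = 0; i < iters; i++) {
        pp.turn = 1;                      /* hand the turn to the partner */
        pthread_cond_signal(&pp.cond);
        while (pp.turn != 0)              /* wait until it hands it back */
            pthread_cond_wait(&pp.cond, &pp.lock);
    }
    pp.done = 1;                          /* release the partner */
    pthread_cond_signal(&pp.cond);
    pthread_mutex_unlock(&pp.lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_join(tid, NULL);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Built with `-pthread` and linked once against each glibc build, a call such as `pingpong_ns_per_roundtrip(100000)` gives a rough comparison provided the governor and affinity settings are identical between runs. Because each thread waits only while it is the other thread's turn, at most one thread is ever blocked on the condition variable, so every `pthread_cond_signal` reaches the intended waiter.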
I have a few other ideas to try that might explain the variation in the
results (they are related to how this particular system is configured with
the PREEMPT_RT patch and default core affinity for non-RT processes).
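
If default core affinity is a factor, one way to take it out of the comparison is to pin the benchmark thread to a single CPU before timing. The sketch below uses the Linux-specific `sched_getaffinity`/`sched_setaffinity` calls that glibc exposes under `_GNU_SOURCE`; `pin_to_first_allowed_cpu` is a hypothetical helper, and picking the lowest-numbered allowed CPU is only an illustration. On a PREEMPT_RT system one would more likely choose the CPU deliberately, for example one not shared with RT tasks.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to the lowest-numbered
 * CPU in its current affinity mask. Returns that CPU number, or -1 on
 * failure. A pid of 0 in these calls means "the calling thread" on Linux.
 */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
    }
    return -1;  /* empty mask: not expected in practice */
}
```

Threads created afterwards with `pthread_create` inherit the calling thread's mask, so calling this at the start of `main` would keep both threads of a two-thread benchmark on one CPU. That is a different measurement from spreading them across cores, so whichever placement is used should be the same for the C and assembly runs.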

-Gratian

