This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Re: ppc64 vDSO in mainline

From: Steve Munroe <sjmunroe at us dot ibm dot com>
To: Ulrich Drepper <drepper at redhat dot com>
Cc: Alan Modra <amodra at bigpond dot net dot au>, Benjamin Herrenschmidt <benh at kernel dot crashing dot org>, libc-alpha at sources dot redhat dot com, Roland McGrath <roland at redhat dot com>
Date: Tue, 29 Mar 2005 08:50:14 -0600
Subject: Re: ppc64 vDSO in mainline

Ulrich Drepper <drepper@redhat.com> wrote on 03/28/2005 05:59:36 PM:

> Steve Munroe wrote:
> > 3) a function which is currently
> > exported by libc , but a better optimized version (with a different
> > symbol) is also exported by the VDSO.
> 
> There is no reason to add any complications or dependency problems for
> this.  Just using a pointer in libc itself, a test for NULL and if not,
> jump to the function is enough.  The penalty for this extra indirection
> is minimal compared to all the other work involved.
> 
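
For reference, the scheme described above (a pointer held in libc, a
NULL test, then a jump) could be sketched roughly as follows. This is
only an illustration: vdso_memcmp, generic_memcmp and my_memcmp are
hypothetical names, not glibc or vDSO symbols, and how the pointer
would get filled in at startup is left out.

```c
#include <stddef.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical pointer to an optimized routine. It stays NULL when no
   optimized version is available. */
static memcmp_fn vdso_memcmp = NULL;

/* Portable byte-by-byte fallback. */
static int generic_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

/* Exported entry point: load the pointer, test it for NULL, and make
   an indirect call if it is set. The load, compare, branch and
   indirect call are the extra work paid on every call. */
int my_memcmp(const void *a, const void *b, size_t n)
{
    memcmp_fn f = vdso_memcmp;

    if (f != NULL)
        return f(a, b, n);
    return generic_memcmp(a, b, n);
}
```

Every caller pays for the pointer load and the test, whether or not an
optimized version is present.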

But this level of indirection is unacceptable overhead for some functions.
I have rewritten memcpy twice and will rewrite it again. I am in the
process of rewriting memcmp and then strncmp. Why? Because they show up as
hotspots in important benchmarks like SPEC and TPC-C.

I can do this aggressive optimization for powerpc64 because I have access
to all currently released 64-bit implementations. I can't do the same for
powerpc32 because there are so many different 32-bit implementations. My
best efforts for a powerpc32 memcpy/memcmp on POWER4/POWER5 might make
users of older pMAC and 4xx embedded hardware very unhappy. But if I know
I am running on a PPC64 kernel, I will know exactly which processor I am
running on and can provide appropriately optimized string functions for
both powerpc32 and powerpc64.

There is no mechanism in glibc to deal with this problem (processor
specific optimization).

So the additional overhead (at least 9 cycles with the NULL pointer check)
does matter. In the new memcmp I can compare 8 bytes per cycle, so those
9 cycles cost as much as comparing 72 bytes (8 x 9 == 72). This overhead
is significant.
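
To put numbers on that, here is a rough back-of-the-envelope model in C.
The 9 cycles and 8 bytes per cycle are the figures quoted above.
Treating memcmp as pure throughput with no startup cost of its own is a
simplifying assumption, and the function names are made up for the
illustration.

```c
/* How many bytes could have been compared in the time spent on the
   dispatch overhead. */
static unsigned overhead_in_bytes(unsigned overhead_cycles,
                                  unsigned bytes_per_cycle)
{
    return overhead_cycles * bytes_per_cycle;
}

/* Fraction of the total call time spent on dispatch overhead for a
   comparison of len bytes, under the pure-throughput assumption. */
static double overhead_fraction(unsigned len,
                                unsigned overhead_cycles,
                                unsigned bytes_per_cycle)
{
    double work_cycles = (double)len / bytes_per_cycle;

    return overhead_cycles / (work_cycles + overhead_cycles);
}
```

Under this model overhead_in_bytes(9, 8) is 72, and a 72-byte
comparison spends half of its time in the dispatch. Shorter
comparisons fare worse.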

If you don't believe me, put that G5 to good use and find out for
yourself: http://www.alphaworks.ibm.com/tech/simppc and
http://sourceforge.net/projects/perfinsp

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

Follow-Ups:
- Re: ppc64 vDSO in mainline
  - From: Ulrich Drepper

References:
- Re: ppc64 vDSO in mainline
  - From: Ulrich Drepper
