This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] PowerPC - Optimization for str[n]casecmp functions
- From: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- To: "GNU C. Library" <libc-alpha at sourceware dot org>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Date: Tue, 22 Nov 2011 15:50:05 -0600
- Subject: Re: [PATCH] PowerPC - Optimization for str[n]casecmp functions
- References: <4EA70840.email@example.com><CAOPLpQeE+8b_KVbX366vnqEh7-O5ejYsSKRhs1YC6HBmF3B9YA@mail.gmail.com><4ECC067A.firstname.lastname@example.org>
On Tue, Nov 22, 2011 at 2:30 PM, Adhemerval Zanella wrote:
> On 10/29/2011 02:03 PM, Ulrich Drepper wrote:
> > On Tue, Oct 25, 2011 at 15:04, Adhemerval Zanella
> > <email@example.com> wrote:
> >> This patch provides throughput boost for the strcasecmp/strncasecmp
> >> functions for POWER7 for both ppc32 (25%) and ppc64 (40%),
> > If it does, then fix the compiler to automatically do this. Add
> > options for the ppc builds. There is no reason whatsoever to add this
> > code as it is identical to the generic code.
> I have tried to make GCC emit good code for the function strcasecmp by adding an option
> to force loop unrolling (GCC does not unroll loops with branches, even with
> -funroll-all-loops) and although the resulting code was better I still could see some
> room for improvements.
> For the strcasecmp algorithm GCC unrolls loops by adding arithmetic instructions to
> update the string descriptors, while a better version would use loads with
> offsets (lbz r1, offset(r2)) and update the descriptors only once at the end of the
> loop. While this is the best option for this algorithm it might not be the best
> option in general.
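For readers following along, the difference described above can be sketched in
C. This is only an illustration, not the code from the patch: the function
names are hypothetical, the unroll factor of 4 is an arbitrary assumption, and
it uses tolower() from <ctype.h> rather than glibc's internal locale tables.
The first function advances both pointers on every byte; the second reads each
byte at a fixed offset from an unchanged base and advances the pointers once
per unrolled iteration, which is the shape a compiler may be able to map onto
displacement-form loads such as lbz r1, offset(r2).

```c
#include <ctype.h>

/* Hypothetical names; neither function is taken from the patch. */

/* Baseline: one byte per iteration, both pointers updated every time. */
int casecmp_ptr_bump(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    int c1, c2;

    do {
        c1 = tolower(*p1++);
        c2 = tolower(*p2++);
    } while (c1 != '\0' && c1 == c2);

    return c1 - c2;
}

/* Unrolled by 4 (assumed factor): loads use constant offsets from p1/p2,
   and the pointers are advanced once at the end of the loop body. */
int casecmp_offset_loads(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    int c1, c2;

    for (;;) {
        c1 = tolower(p1[0]);
        c2 = tolower(p2[0]);
        if (c1 == '\0' || c1 != c2)
            break;

        c1 = tolower(p1[1]);
        c2 = tolower(p2[1]);
        if (c1 == '\0' || c1 != c2)
            break;

        c1 = tolower(p1[2]);
        c2 = tolower(p2[2]);
        if (c1 == '\0' || c1 != c2)
            break;

        c1 = tolower(p1[3]);
        c2 = tolower(p2[3]);
        if (c1 == '\0' || c1 != c2)
            break;

        /* Single pointer update per four bytes compared. */
        p1 += 4;
        p2 += 4;
    }

    return c1 - c2;
}
```

Both functions should return the same sign for any pair of NUL-terminated
strings; for example, casecmp_offset_loads("Hello", "hELLO") returns 0. Each
offset is read only after the previous byte has been checked for NUL, so the
unrolled version does not read past the terminator.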
We discussed this issue with the IBM PowerPC compiler team and
Adhemerval's approach is seen as the best option currently available.
The compiler could in theory do such unrolling, but right now it is
likely beyond the current unrolling implementation. And even if this
could be enabled it would be quite some time before one could build a
glibc that would benefit from the change.
So I'm acking this patch. We'd like to see this performance enhancement.
Ryan S. Arnold