This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32, 64}/power7/memchr.S.
- From: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- To: Will Schmidt <will_schmidt at vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org, willschm at us dot ibm dot com
- Date: Fri, 11 May 2012 09:32:53 -0500
- Subject: Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32, 64}/power7/memchr.S.
- References: <20120511135221.7637.33663.stgit@brimstone>
On Fri, May 11, 2012 at 8:52 AM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
>
> Assorted tweaking, twisting and tuning to squeeze a few additional cycles
> out of the memchr code.  Changes include bypassing the shift pairs (sld,srd)
> when they are not required, and unrolling the small_loop that handles short
> and trailing strings.
> Per scrollpipe data measuring aligned strings for 64-bit, these changes save
> between five and eight cycles (9-13% overall) for short strings (<32 bytes).
> Longer aligned strings see a slight improvement of 1-3% due to bypassing the
> shifts and the instruction rearrangement.  Attempts to rework and partially
> unroll the main loop did not show any benefits.
> The Powerpc32 version of the code was changed in the same way, and should
> show similar improvements.
>
> Passed make check with no regressions.
>
> While I was in the neighborhood, I updated a few of the existing comments so
> they made a bit more sense to me, and touched up a bit of the whitespace for
> better consistency throughout.
>
> 2012-05-10  Will Schmidt  <will_schmidt@vnet.ibm.com>
>
> 	* sysdeps/powerpc/powerpc64/power7/memchr.S: Unrolled short loop and
> 	slight instruction rearrangements per scrollpipe analysis.
> 	* sysdeps/powerpc/powerpc32/power7/memchr.S: Ditto.
Hi Will,
I'll apply the patch and check it out. My only questions are about some of
the formatting, but otherwise I trust your numbers.
Were there any data sets that showed performance regressions, or is this
an all-around improvement?
Ryan
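For readers following the patch description, the shift-pair bypass can be modeled in portable C. This is an illustrative sketch, not the patch's code, and it assumes the routine's cmpb-based approach: `cmpb` emulates the POWER Compare Bytes instruction, doublewords are assembled in big-endian byte order so that the shift pair clears leading bytes the way sld/srd do on big-endian POWER, and doubleword alignment is taken relative to `buf` (equivalent to address alignment when `buf` is 8-byte aligned). The names `memchr_sketch`, `load_be64` and `first_hit` are hypothetical. Bytes past the end of the buffer read as zero, standing in for the rest of the final doubleword. The unrolled small_loop and any short-string threshold are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates POWER's cmpb: each result byte is 0xFF where the corresponding
   bytes of a and b are equal, and 0x00 otherwise. */
static uint64_t cmpb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t mask = (uint64_t)0xFF << (i * 8);
        if ((a & mask) == (b & mask))
            r |= mask;
    }
    return r;
}

/* Loads the doubleword at buf[off] in big-endian byte order.  Bytes past
   buflen read as 0 (stand-in for the rest of the final doubleword). */
static uint64_t load_be64(const unsigned char *buf, size_t buflen, size_t off)
{
    uint64_t w = 0;
    for (size_t i = 0; i < 8; i++)
        w = (w << 8) | (off + i < buflen ? buf[off + i] : 0);
    return w;
}

/* Byte index (0 = most significant) of the first 0xFF byte.
   Precondition: hits != 0. */
static size_t first_hit(uint64_t hits)
{
    size_t i = 0;
    while ((hits & 0xFF00000000000000ULL) == 0) {
        hits <<= 8;
        i++;
    }
    return i;
}

/* Hypothetical helper: returns the index of the first byte equal to c in
   buf[start, start + n), or -1 if there is none. */
static long memchr_sketch(const unsigned char *buf, size_t buflen,
                          size_t start, size_t n, int c)
{
    assert(start + n <= buflen);
    if (n == 0)
        return -1;

    uint64_t rep = 0x0101010101010101ULL * (unsigned char)c; /* c in every byte */
    size_t end = start + n;
    size_t off = start & ~(size_t)7;  /* aligned doubleword holding start */
    size_t skip = start - off;        /* bytes of it that precede start */
    uint64_t hits = cmpb(load_be64(buf, buflen, off), rep);

    /* The shift pair clears the 'skip' leading bytes so matches before
       start are ignored.  With skip == 0 it is a no-op, so it is bypassed. */
    if (skip != 0)
        hits = (hits << (skip * 8)) >> (skip * 8);

    for (;;) {
        if (hits != 0) {
            size_t idx = off + first_hit(hits);
            return idx < end ? (long)idx : -1;  /* hit past the end: miss */
        }
        off += 8;
        if (off >= end)
            return -1;
        hits = cmpb(load_be64(buf, buflen, off), rep);
    }
}
```

For example, searching `buf[3, 13)` for a byte that occurs only at `buf[0]` returns -1, because the shifts clear the three bytes that precede the start; a search that starts on a doubleword boundary skips that step entirely.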