This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: PowerPC LE strlen
- From: Will Schmidt <will_schmidt at vnet dot ibm dot com>
- To: Alan Modra <amodra at gmail dot com>
- Cc: libc-alpha at sourceware dot org, ryan dot arnold at gmail dot com
- Date: Tue, 13 Aug 2013 15:53:48 -0500
- Subject: Re: PowerPC LE strlen
- References: <20130809051815 dot GH3294 at bubble dot grove dot modra dot org>
- Reply-to: will_schmidt at vnet dot ibm dot com
On Fri, 2013-08-09 at 14:48 +0930, Alan Modra wrote:
> This is the first of nine patches adding little-endian support to the
> existing optimised string and memory functions. I did spend some
> time with a power7 simulator looking at cycle by cycle behaviour for
> memchr, but most of these patches have not been run on cpu simulators
> to check that we are going as fast as possible. I'm sure PowerPC can
> do better. However, the little-endian support mostly leaves main
> loops unchanged, so I'm banking on previous authors having done a
> good job on big-endian. As with most code you stare at long enough,
> I found some improvements for big-endian too.
>
> This one is LE support for strlen. Like most of the string functions,
> I leave the main word or multiple-word loops substantially unchanged,
> just needing to modify the tail.
>
> Removing the branch in the power7 functions is just a tidy. .align
> produces a branch anyway. Modifying regs in the non-power7 functions
Interesting detail - I was not aware that .align produced a branch for
us. That answered most of my questions. :-)
<...>
> ENTRY (strlen)
> CALL_MCOUNT 1
>
> -#define rTMP1 r0
> +#define rTMP4 r0
> #define rRTN r3 /* incoming STR arg, outgoing result */
> #define rSTR r4 /* current string position */
> #define rPADN r5 /* number of padding bits we prepend to the
> @@ -88,9 +93,9 @@ ENTRY (strlen)
> #define rWORD1 r8 /* current string doubleword */
> #define rWORD2 r9 /* next string doubleword */
> #define rMASK r9 /* mask for first string doubleword */
> -#define rTMP2 r10
> -#define rTMP3 r11
> -#define rTMP4 r12
> +#define rTMP1 r10
> +#define rTMP2 r11
> +#define rTMP3 r12
<...>
> - nor rTMP1, rTMP2, rTMP1
> - and. rWORD1, rTMP1, rMASK
> + nor rTMP3, rTMP2, rTMP1
> + and. rTMP3, rTMP3, rMASK
^ For this and related changes, is this a clean-up to make the code
easier to read, or is there an underlying improvement in how the
registers involved are being used?
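For anyone following along without the full source, here is a rough C sketch of what I understand the nor/and. pair to be finishing off, assuming the routine uses the common 0x7f7f7f7f zero-byte test. It is not taken from the patch: the function names are made up for illustration, it works on 32-bit words rather than doublewords, and it leaves out the rMASK handling of padding bytes. The endian-dependent part is only the last step, which would fit with the main loops staying unchanged and only the tail needing LE work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0x80 in every byte position where x holds
   a zero byte, and 0x00 everywhere else.  */
static uint32_t zero_byte_flags(uint32_t x)
{
    const uint32_t m = 0x7f7f7f7fu;
    uint32_t t1 = (x & m) + m;  /* bit 7 of a byte set iff its low 7 bits are nonzero */
    uint32_t t2 = x | m;        /* bit 7 of a byte set iff bit 7 of x is set */
    return ~(t1 | t2);          /* the "nor" step: 0x80 only where the byte was zero */
}

/* Hypothetical helper: index (in memory order) of the first flagged byte.
   The direction of the scan is the only endian-dependent part.  */
static unsigned first_zero_index(uint32_t flags, int little_endian)
{
    assert(flags != 0);
    for (unsigned i = 0; i < 4; i++) {
        /* LE: first byte in memory is the least significant byte.
           BE: first byte in memory is the most significant byte.  */
        unsigned shift = little_endian ? 8 * i : 8 * (3 - i);
        if ((flags >> shift) & 0x80u)
            return i;
    }
    return 4; /* not reached when flags != 0 */
}

/* Hypothetical helper: offset of the first zero byte among the four bytes
   at p, or 4 if none of them is zero.  Detects host byte order at run time.  */
static unsigned first_zero_in_word(const unsigned char *p)
{
    uint32_t w;
    const uint16_t probe = 1;
    unsigned char lo;

    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    memcpy(&lo, &probe, 1);     /* lo == 1 on a little-endian host */

    uint32_t flags = zero_byte_flags(w);
    return flags ? first_zero_index(flags, lo == 1) : 4;
}
```

The real assembly would presumably use a count-leading-zeros style instruction rather than a loop for the last step; the loop here is only to keep the sketch portable.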
I've got no issues with the patch - looks good to me. :-)
Thanks,
-Will