This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: PowerPC LE strlen

From: Will Schmidt <will_schmidt at vnet dot ibm dot com>
To: Alan Modra <amodra at gmail dot com>
Cc: libc-alpha at sourceware dot org, ryan dot arnold at gmail dot com
Date: Tue, 13 Aug 2013 15:53:48 -0500
Subject: Re: PowerPC LE strlen
References: <20130809051815 dot GH3294 at bubble dot grove dot modra dot org>
Reply-to: will_schmidt at vnet dot ibm dot com

On Fri, 2013-08-09 at 14:48 +0930, Alan Modra wrote:
> This is the first of nine patches adding little-endian support to the
> existing optimised string and memory functions.  I did spend some
> time with a power7 simulator looking at cycle by cycle behaviour for
> memchr, but most of these patches have not been run on cpu simulators
> to check that we are going as fast as possible.  I'm sure PowerPC can
> do better.  However, the little-endian support mostly leaves main
> loops unchanged, so I'm banking on previous authors having done a
> good job on big-endian.  As with most code you stare at long enough,
> I found some improvements for big-endian too.
> 
> This one is LE support for strlen.  Like most of the string functions,
> I leave the main word or multiple-word loops substantially unchanged,
> just needing to modify the tail.
> 
> Removing the branch in the power7 functions is just a tidy.  .align
> produces a branch anyway.  Modifying regs in the non-power7 functions

Interesting detail - I was not aware that .align produced a branch for
us.   That answered most of my questions.  :-)

<...>

>  ENTRY (strlen)
>  	CALL_MCOUNT 1
> 
> -#define rTMP1	r0
> +#define rTMP4	r0
>  #define rRTN	r3	/* incoming STR arg, outgoing result */
>  #define rSTR	r4	/* current string position */
>  #define rPADN	r5	/* number of padding bits we prepend to the
> @@ -88,9 +93,9 @@ ENTRY (strlen)
>  #define rWORD1	r8	/* current string doubleword */
>  #define rWORD2	r9	/* next string doubleword */
>  #define rMASK	r9	/* mask for first string doubleword */
> -#define rTMP2	r10
> -#define rTMP3	r11
> -#define rTMP4	r12
> +#define rTMP1	r10
> +#define rTMP2	r11
> +#define rTMP3	r12
<...>
> -	nor	rTMP1, rTMP2, rTMP1
> -	and.	rWORD1, rTMP1, rMASK

> +	nor	rTMP3, rTMP2, rTMP1
> +	and.	rTMP3, rTMP3, rMASK

^ For this and related changes, is this purely a clean-up to make the
code easier to read, or is there an underlying improvement in how the
registers involved are being used?
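
For anyone following along, here is a rough C sketch of what I understand
the nor / and. pair to compute.  It assumes the usual 0x7f7f... zero-byte
test set up by the instructions just before this hunk (those are not quoted
here), so the constant and the names zero_byte_flags,
masked_zero_byte_flags and check_examples are my own illustration, not
anything taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 0x80 in every byte position where x holds
   a zero byte, and 0 in every other byte position. */
static uint64_t zero_byte_flags(uint64_t x)
{
    const uint64_t k7f = 0x7f7f7f7f7f7f7f7fULL;   /* assumed constant */
    /* Bit 7 of each byte is set iff the low 7 bits of that byte are
       nonzero; the per-byte sum is at most 0xfe, so no carry escapes. */
    uint64_t tmp1 = (x & k7f) + k7f;
    /* Low 7 bits forced to 1; bit 7 is the byte's own top bit. */
    uint64_t tmp2 = x | k7f;
    /* The "nor": bit 7 survives only where the whole byte was zero. */
    return ~(tmp2 | tmp1);
}

/* The "and." step: drop flags that fall in padding bytes which precede
   the start of the string in the first aligned doubleword. */
static uint64_t masked_zero_byte_flags(uint64_t x, uint64_t mask)
{
    return zero_byte_flags(x) & mask;
}

/* Small self-check of the sketch above. */
void check_examples(void)
{
    assert(zero_byte_flags(0x4141414141414141ULL) == 0);
    assert(zero_byte_flags(0x4141410041414141ULL) == 0x0000008000000000ULL);
    assert(masked_zero_byte_flags(0x4141410041414141ULL,
                                  0x00000000ffffffffULL) == 0);
}
```

Going only by the quoted lines, the old sequence writes the and. result
into rWORD1 while the new one writes it into rTMP3, so rWORD1 is left
intact afterwards.  Whether the LE tail depends on that is part of what I
am asking.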


I've got no issues with the patch - looks good to me.  :-) 

Thanks, 
-Will

Follow-Ups:
- Re: PowerPC LE strlen
  - From: Alan Modra

References:
- PowerPC LE strlen
  - From: Alan Modra
