This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6

On Mon, 2014-12-22 at 18:02 +0000, Joseph Myers wrote:
> On Fri, 19 Dec 2014, Steve Ellcey  wrote:
> > @@ -188,9 +205,9 @@ LEAF(MEMSET_NAME)
> >  
> >  	.set	nomips16
> >  	.set	noreorder
> > -/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb).  Regardless of
> > +/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb).  Regardless of
> >     size, copy dst pointer to v0 for the return value.  */
> > -	slti	t2,a2,(2 * NSIZE)
> > +	slti	t2,a2,(4 * NSIZE)
> >  	bne	t2,zero,L(lastb)
> >  	move	v0,a0
> >  
> This doesn't appear to have anything obvious to do with r6.  Please submit 
> it separately from the r6 changes, with its own rationale (or if it is in 
> fact needed as part of the r6 changes, please explain further).
> In general, r6 changes should not result in any changes to the code in 
> glibc built for non-r6 - if the best way to support r6 also involves 
> changes to code for non-r6 (e.g. if some cleanup or optimization naturally 
> results in r6 support because the non-r6-compatible code was bad in some 
> way and r6-compatible code is better for all versions, or if code 
> compatible with both r6 and pre-r6 can be just as good on all versions as 
> the existing non-r6-compatible code) then it's best to submit those 
> changes separately.

The change is needed for r6 because I align the buffer to 8 bytes
instead of to 4 bytes for 32 bit mode and 8 bytes for 64 bit mode.  This
should help with cache alignment and, depending on the CPU, with memory
bonding where two sequential 4 byte loads can be handled as if they were
a single 8 byte load.

Without the slti change (assuming O32 mode), if I have a non-aligned 9
byte buffer I could set up to 7 bytes to get it 8 byte aligned and that
leaves less then a full 4 byte word to be set.  The existing code after
this check assumes there is at lease one full word needs to be set.

I considered changing the alignment code to only align on a 4 byte
boundary for O32 mode, or ifdef'ing this test but it seemed cleaner to
increase the minimum size of buffers that get handled via a simple byte
copy loop for both r6 and earlier CPU's.

If you don't like this part of the patch I could go back to aligning on
a 4 byte boundary in 32 bit mode for now and get some performance data
on the different alignment options later.

Steve Ellcey

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]