This is the mail archive of the
mailing list for the glibc project.
Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6
- From: Steve Ellcey <sellcey at imgtec dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: <libc-alpha at sourceware dot org>
- Date: Mon, 22 Dec 2014 11:20:35 -0800
- Subject: Re: [Patch, MIPS] Modify memset.S for mips32r6/mips64r6
- Authentication-results: sourceware.org; auth=none
- References: <2923c970-026c-4e00-be7a-0650e82421b5 at BAMAIL02 dot ba dot imgtec dot org> <alpine dot DEB dot 2 dot 10 dot 1412221759370 dot 5278 at digraph dot polyomino dot org dot uk>
- Reply-to: <sellcey at imgtec dot com>
On Mon, 2014-12-22 at 18:02 +0000, Joseph Myers wrote:
> On Fri, 19 Dec 2014, Steve Ellcey wrote:
> > @@ -188,9 +205,9 @@ LEAF(MEMSET_NAME)
> > .set nomips16
> > .set noreorder
> > -/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb). Regardless of
> > +/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb). Regardless of
> > size, copy dst pointer to v0 for the return value. */
> > - slti t2,a2,(2 * NSIZE)
> > + slti t2,a2,(4 * NSIZE)
> > bne t2,zero,L(lastb)
> > move v0,a0
> This doesn't appear to have anything obvious to do with r6. Please submit
> it separately from the r6 changes, with its own rationale (or if it is in
> fact needed as part of the r6 changes, please explain further).
> In general, r6 changes should not result in any changes to the code in
> glibc built for non-r6 - if the best way to support r6 also involves
> changes to code for non-r6 (e.g. if some cleanup or optimization naturally
> results in r6 support because the non-r6-compatible code was bad in some
> way and r6-compatible code is better for all versions, or if code
> compatible with both r6 and pre-r6 can be just as good on all versions as
> the existing non-r6-compatible code) then it's best to submit those
> changes separately.
The change is needed for r6 because I align the buffer to 8 bytes
instead of to 4 bytes for 32 bit mode and 8 bytes for 64 bit mode. This
should help with cache alignment and, depending on the CPU, with memory
bonding where two sequential 4 byte loads can be handled as if they were
a single 8 byte load.
Without the slti change (assuming O32 mode), if I have a non-aligned 9
byte buffer I could set up to 7 bytes to get it 8 byte aligned and that
leaves less then a full 4 byte word to be set. The existing code after
this check assumes there is at lease one full word needs to be set.
I considered changing the alignment code to only align on a 4 byte
boundary for O32 mode, or ifdef'ing this test but it seemed cleaner to
increase the minimum size of buffers that get handled via a simple byte
copy loop for both r6 and earlier CPU's.
If you don't like this part of the patch I could go back to aligning on
a 4 byte boundary in 32 bit mode for now and get some performance data
on the different alignment options later.