This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [Patch, MIPS] Modify memcpy.S for mips32r6/mips64r6
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: sellcey at imgtec dot com, Joseph Myers <joseph at codesourcery dot com>, libc-alpha at sourceware dot org
- Date: Tue, 23 Dec 2014 21:30:46 +0100
- Subject: Re: [Patch, MIPS] Modify memcpy.S for mips32r6/mips64r6
- Authentication-results: sourceware.org; auth=none
- References: <7ec2bf7e-fc1e-428b-ac0a-747f2a3ab3e6 at BAMAIL02 dot ba dot imgtec dot org> <alpine dot DEB dot 2 dot 10 dot 1412221758190 dot 5278 at digraph dot polyomino dot org dot uk> <1419354526 dot 27606 dot 73 dot camel at ubuntu-sellcey> <5499ABF8 dot 3060307 at twiddle dot net>
On Tue, Dec 23, 2014 at 09:52:56AM -0800, Richard Henderson wrote:
> On 12/23/2014 09:08 AM, Steve Ellcey wrote:
> > + andi t8,a0,7
> > + lapc t9,L(atable)
> > + PTR_LSA t9,t8,t9,2
> > + jrc t9
> > +L(atable):
> > + bc L(lb0)
> > + bc L(lb7)
> > + bc L(lb6)
> > + bc L(lb5)
> > + bc L(lb4)
> > + bc L(lb3)
> > + bc L(lb2)
> > + bc L(lb1)
> > +L(lb7):
> > + lb a3, 6(a1)
> > + sb a3, 6(a0)
> > +L(lb6):
> > + lb a3, 5(a1)
> > + sb a3, 5(a0)
> > +L(lb5):
> > + lb a3, 4(a1)
> > + sb a3, 4(a0)
> > +L(lb4):
> > + lb a3, 3(a1)
> > + sb a3, 3(a0)
> > +L(lb3):
> > + lb a3, 2(a1)
> > + sb a3, 2(a0)
> > +L(lb2):
> > + lb a3, 1(a1)
> > + sb a3, 1(a0)
> > +L(lb1):
> > + lb a3, 0(a1)
> > + sb a3, 0(a0)
> L(lbx):
> > +
> > + li t9,8
> > + subu t8,t9,t8
> > + PTR_SUBU a2,a2,t8
> > + PTR_ADDU a0,a0,t8
> > + PTR_ADDU a1,a1,t8
> > +L(lb0):
>
> This table is regular enough that I wonder if it wouldn't be better to do some
> arithmetic instead of a branch-to-branch. E.g.
>
> andi t7,a0,7
> li t8,L(lb0)-L(lbx)
> lsa t8,t7,t8,8
> lapc t9,L(lb0)
> selnez t8,t8,t7
> PTR_SUBU t9,t9,t8
> jrc t9
>
> Which is certainly smaller than your 12 insns, unlikely to be slower on any
> conceivable hardware, but probably faster on most.
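[Both variants under discussion implement the same idea: jump into a run of byte copies based on the destination's misalignment, so that exactly 8 - (dst & 7) bytes are copied before the pointer is 8-byte aligned. A hypothetical C sketch of that fallthrough structure (names and the simple return-value interface are illustrative, not from the patch):]

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of what the quoted assembly does: copy the first
   8 - (dst & 7) bytes one at a time via a computed jump
   (here expressed as switch fallthrough) so dst ends up
   8-byte aligned.  Returns the number of bytes copied. */
static size_t copy_head(unsigned char *dst, const unsigned char *src,
                        size_t n)
{
    size_t misalign = (uintptr_t)dst & 7;
    if (misalign == 0 || n < 8)
        return 0;                  /* the L(lb0) case: nothing to do */
    size_t head = 8 - misalign;    /* bytes needed to reach alignment */
    switch (head) {                /* mirrors the bc L(lbN) table */
    case 7: dst[6] = src[6]; /* FALLTHROUGH */
    case 6: dst[5] = src[5]; /* FALLTHROUGH */
    case 5: dst[4] = src[4]; /* FALLTHROUGH */
    case 4: dst[3] = src[3]; /* FALLTHROUGH */
    case 3: dst[2] = src[2]; /* FALLTHROUGH */
    case 2: dst[1] = src[1]; /* FALLTHROUGH */
    case 1: dst[0] = src[0];
    }
    return head;
}
```

[Richard's suggestion replaces the table of branches with arithmetic on the label addresses, since each L(lbN) block is the same fixed size; the C switch above is just the portable picture of either dispatch.]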
>
Do you have that hardware? I already objected to the table but don't
have data. I wouldn't be surprised if it's slower than a byte-by-byte
copy with a conditional after each byte, or than just copying 8 bytes
unconditionally, though I am not sure how the hardware handles the
overlapping stores. The difference will be bigger in practice: in
profiling, around 50% of calls are 8-byte aligned, and you save the
address-calculation cost on those.
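[The unconditional-copy alternative mentioned above is the usual overlapping-store trick: always copy 8 bytes, then restart from the aligned boundary, re-storing a few bytes. A hypothetical C sketch (memcpy stands in for an unaligned load/store pair, which compilers lower to plain moves; the byte loop stands in for the aligned word loop a real memcpy would use):]

```c
#include <string.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the "copy 8 bytes without condition" idea: no dispatch on
   alignment at all.  The first 8 bytes are copied unconditionally;
   the main loop then starts at the aligned boundary, so up to 7 bytes
   may be stored twice (the overlapping stores the mail worries
   about).  Assumes n >= 8. */
static void copy_overlap(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
    memcpy(dst, src, 8);                     /* unconditional head copy */
    size_t head = 8 - ((uintptr_t)dst & 7);  /* offset of aligned boundary */
    for (size_t i = head; i < n; i++)        /* illustrative tail loop */
        dst[i] = src[i];
}
```

[Whether the redundant stores cost anything depends on the store buffer and write-combining behavior of the core, which is exactly the open question in the thread.]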