This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch, MIPS] Modify memcpy.S for mips32r6/mips64r6


On Tue, Dec 23, 2014 at 09:52:56AM -0800, Richard Henderson wrote:
> On 12/23/2014 09:08 AM, Steve Ellcey wrote:
> > +	andi	t8,a0,7
> > +	lapc	t9,L(atable)
> > +	PTR_LSA	t9,t8,t9,2
> > +	jrc	t9
> > +L(atable):
> > +	bc	L(lb0)
> > +	bc	L(lb7)
> > +	bc	L(lb6)
> > +	bc	L(lb5)
> > +	bc	L(lb4)
> > +	bc	L(lb3)
> > +	bc	L(lb2)
> > +	bc	L(lb1)
> > +L(lb7):
> > +	lb	a3, 6(a1)
> > +	sb	a3, 6(a0)
> > +L(lb6):
> > +	lb	a3, 5(a1)
> > +	sb	a3, 5(a0)
> > +L(lb5):
> > +	lb	a3, 4(a1)
> > +	sb	a3, 4(a0)
> > +L(lb4):
> > +	lb	a3, 3(a1)
> > +	sb	a3, 3(a0)
> > +L(lb3):
> > +	lb	a3, 2(a1)
> > +	sb	a3, 2(a0)
> > +L(lb2):
> > +	lb	a3, 1(a1)
> > +	sb	a3, 1(a0)
> > +L(lb1):
> > +	lb	a3, 0(a1)
> > +	sb	a3, 0(a0)
> L(lbx):
> > +
> > +	li	t9,8
> > +	subu	t8,t9,t8
> > +	PTR_SUBU a2,a2,t8
> > +	PTR_ADDU a0,a0,t8
> > +	PTR_ADDU a1,a1,t8
> > +L(lb0):
> 
> This table is regular enough that I wonder if it wouldn't be better to do some
> arithmetic instead of a branch-to-branch.  E.g.
> 
> 	andi	t7,a0,7
> 	li	t8,L(lb0)-L(lbx)
> 	lsa	t8,t7,t8,8
> 	lapc	t9,L(lb0)
> 	selnez	t8,t8,t7
> 	PTR_SUBU t9,t9,t8
> 	jrc	t9
> 
> Which is certainly smaller than your 12 insns, unlikely to be slower on any
> conceivable hardware, but probably faster on most.
> 
Do you have that hardware? I already objected to the table but do not
have data. I wouldn't be surprised if it's slower than a byte-by-byte
copy with a branch after each byte. Another option is to just copy 8
bytes unconditionally, but I am not sure how the hardware handles
overlapping stores. The difference will be bigger in practice: in
profiling, around 50% of calls are 8-byte aligned, and you save the
address calculation cost on these.
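
To make the two alternatives concrete, here is a rough C sketch. It is
not code from the patch or from glibc: the function names are made up,
and the precondition that at least 8 bytes remain is an assumption of
the sketch. Both variants leave dst 8-byte aligned and adjust src and n
the same way the quoted prologue adjusts a0, a1 and a2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alternative 1 (hypothetical name): copy one byte at a time and test
   the destination alignment after each byte.  When dst is already
   8-byte aligned the loop body never runs, so no address arithmetic
   is done on that path.  */
static void
align_bytewise (unsigned char **dst, const unsigned char **src, size_t *n)
{
  assert (*n >= 8);		/* Assumed precondition of this sketch.  */
  while (((uintptr_t) *dst & 7) != 0)
    {
      *(*dst)++ = *(*src)++;
      (*n)--;
    }
}

/* Alternative 2 (hypothetical name): copy 8 bytes unconditionally, then
   advance by 1..8 bytes to the next 8-byte boundary.  The aligned
   stores that follow rewrite some of the bytes stored here with the
   same values, which is the overlapping-store case in question.  */
static void
align_unconditional (unsigned char **dst, const unsigned char **src,
		     size_t *n)
{
  assert (*n >= 8);		/* Assumed precondition of this sketch.  */
  size_t step = 8 - ((uintptr_t) *dst & 7);	/* 8 if already aligned.  */
  memcpy (*dst, *src, 8);	/* Stands in for one unaligned 8-byte copy.  */
  *dst += step;
  *src += step;
  *n -= step;
}
```

Which of these is faster than the jump table or the computed jump is
exactly the open question; the sketch only shows what each variant
does, not how it performs on r6 hardware.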

