This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PowerPC: Optimized mpn functions for PowerPC64/POWER7


On Tue, Nov 05, 2013 at 12:15:03PM -0200, Adhemerval Zanella wrote:
> This patch add optimized __mpn_add_n/__mpn_sub_n for PowerPC64/POWER7.
> They are originally from GMP with adjustments for GLIBC. It provides a
> small boost for mpn_divrem.

It's a pity to see the comments from the gmp version gone..

> +#define RP  r3
> +#define UP  r4
> +#define VP  r5
> +#define N   r6

This is just a style preference, but I like assembly code better that
uses defines with somewhat descriptive names throughout, *or* uses the
actual registers throughout, with a preference for the latter.  In
code like yours here that uses a mix, I find myself wondering whether
rXYZ clashes with r3, and having to go back to the defines to check.
Well, not so much with a simple function like this one, but other
powerpc code in glibc makes me scratch my head.

> +
> +EALIGN(FUNC, 5, 0)
> +#ifdef USE_AS_SUB
> +	addic	r0, r0, r0

Don't use r0 for the last arg when you mean 0.

> +#else
> +	addic   r0, r1, -1
> +#endif
> +	andi.	r7, N, r1

Similarly here.  andi is not operating on the contents of r1 here.

> +	beq	L(bx0)
> +
> +	ld	r7, r0(UP)

0 not r0, similarly below.

> +	ld	r9, r0(VP)
> +	ADDSUBC	r11, r9, r7
> +	std	r11, r0(RP)
> +	cmpldi	N, N, r1

Again, 1 not r1.

> +	beq	N, L(end)
> +	addi	UP, UP, r8
> +	addi	VP, VP, r8
> +	addi	RP, RP, r8

And again.  8, not r8.

> +
> +L(bx0):	addi	r0, N, 2
> +	srdi	r0, r0, 2
> +	mtctr	r0
> +
> +	andi.	r7, N, 2
> +	bne	L(mid)
> +
> +	addi	UP, UP, 16
> +	addi	VP, VP, 16
> +	addi	RP, RP, 16
> +
> +	.align	5
> +L(top):	ld	N, -16(UP)

Use the register here rather than N.  r6 is no longer the count.
Similarly below.

> +	ld	r7, -8(UP)
> +	ld	r8, -16(VP)
> +	ld	r9, -8(VP)
> +	ADDSUBC	r10, r8, N
> +	ADDSUBC	r11, r9, r7
> +	std	r10, -16(RP)
> +	std	r11, -8(RP)
> +L(mid):	ld	N, 0(UP)
> +	ld	r7, r8(UP)

8 not r8.

> +	ld	r8, 0(VP)
> +	ld	r9, r8(VP)

Ditto.

> +	ADDSUBC	r10, r8, N
> +	ADDSUBC	r11, r9, r7
> +	std	r10, r0(RP)
> +	std	r11, r8(RP)

And again, 0 not r0 too.

> +	addi	UP, UP, 32
> +	addi	VP, VP, 32
> +	addi	RP, RP, 32
> +	bdnz	L(top)
> +
> +L(end):	subfe	RP, r0, r0

I wouldn't use RP here and below, since the use of r3 here is really
nothing to do with the pointer input parameter.  It's this sort of
multiple use of a register and the r6/N case above that most times
lead me to prefer using registers throughout rather than defines.

> +#ifdef USE_AS_SUB
> +	neg	RP, RP
> +#else
> +	addi	RP, RP, r1
> +#endif
> +	blr
> +END(FUNC)

-- 
Alan Modra
Australia Development Lab, IBM


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]