This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: PowerPC: Optimized mpn functions for PowerPC64/POWER7
- From: Alan Modra <amodra at gmail dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Wed, 4 Dec 2013 15:20:39 +1030
- Subject: Re: PowerPC: Optimized mpn functions for PowerPC64/POWER7
- Authentication-results: sourceware.org; auth=none
- References: <5278FD67 dot 3010105 at linux dot vnet dot ibm dot com>
On Tue, Nov 05, 2013 at 12:15:03PM -0200, Adhemerval Zanella wrote:
> This patch add optimized __mpn_add_n/__mpn_sub_n for PowerPC64/POWER7.
> They are originally from GMP with adjustments for GLIBC. It provides a
> small boost for mpn_divrem.
It's a pity to see the comments from the gmp version gone..
> +#define RP r3
> +#define UP r4
> +#define VP r5
> +#define N r6
This is just a style preference, but I like assembly code better that
uses defines with somewhat descriptive names throughout, *or* uses the
actual registers throughout, with a preference for the latter. In
code like yours here that uses a mix, I find myself wondering whether
rXYZ clashes with r3, and having to go back to the defines to check.
Well, not so much with a simple function like this one, but other
powerpc code in glibc makes me scratch my head.
> +
> +EALIGN(FUNC, 5, 0)
> +#ifdef USE_AS_SUB
> + addic r0, r0, r0
Don't use r0 for the last arg when you mean 0.
> +#else
> + addic r0, r1, -1
> +#endif
> + andi. r7, N, r1
Similarly here. andi is not operating on the contents of r1 here.
> + beq L(bx0)
> +
> + ld r7, r0(UP)
0 not r0, similarly below.
> + ld r9, r0(VP)
> + ADDSUBC r11, r9, r7
> + std r11, r0(RP)
> + cmpldi N, N, r1
Again, 1 not r1.
> + beq N, L(end)
> + addi UP, UP, r8
> + addi VP, VP, r8
> + addi RP, RP, r8
And again. 8, not r8.
> +
> +L(bx0): addi r0, N, 2
> + srdi r0, r0, 2
> + mtctr r0
> +
> + andi. r7, N, 2
> + bne L(mid)
> +
> + addi UP, UP, 16
> + addi VP, VP, 16
> + addi RP, RP, 16
> +
> + .align 5
> +L(top): ld N, -16(UP)
Use the register here rather than N. r6 is no longer the count.
Similarly below.
> + ld r7, -8(UP)
> + ld r8, -16(VP)
> + ld r9, -8(VP)
> + ADDSUBC r10, r8, N
> + ADDSUBC r11, r9, r7
> + std r10, -16(RP)
> + std r11, -8(RP)
> +L(mid): ld N, 0(UP)
> + ld r7, r8(UP)
8 not r8.
> + ld r8, 0(VP)
> + ld r9, r8(VP)
Ditto.
> + ADDSUBC r10, r8, N
> + ADDSUBC r11, r9, r7
> + std r10, r0(RP)
> + std r11, r8(RP)
And again, 0 not r0 too.
> + addi UP, UP, 32
> + addi VP, VP, 32
> + addi RP, RP, 32
> + bdnz L(top)
> +
> +L(end): subfe RP, r0, r0
I wouldn't use RP here and below, since the use of r3 here is really
nothing to do with the pointer input parameter. It's this sort of
multiple use of a register and the r6/N case above that most times
lead me to prefer using registers throughout rather than defines.
> +#ifdef USE_AS_SUB
> + neg RP, RP
> +#else
> + addi RP, RP, r1
> +#endif
> + blr
> +END(FUNC)
--
Alan Modra
Australia Development Lab, IBM