This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Split mantissa calculation loop and add branchprediction to mp multiplication
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 02 Jan 2013 14:20:13 -0600
- Subject: Re: [PATCH] Split mantissa calculation loop and add branchprediction to mp multiplication
- References: <20121231092850.GA21621@spoyarek.pnq.redhat.com>
- Reply-to: munroesj at us dot ibm dot com
On Mon, 2012-12-31 at 14:58 +0530, Siddhesh Poyarekar wrote:
> Hi,
>
> The loop inside __mul in mpa.c is slow because of the branch
> evaluation within the loop. Attached patch splits the loop to make
> the loop simpler to evaluate and also adds some branch prediction to
> improve performance. I have verified that this does not break the
> testsuite.
>
> I checked for performance improvement with this patch using the test
> program here:
>
> http://entropymine.com/imageworsener/slowpow/
>
> with the following commandline:
>
> time ./powtest 1000000 1.0000000000000020 1.5000000000500000
>
> Taking into account the best times with and without the patch (with
> more than 10 runs each), the improvement is about a quarter of a
> second.
>
> OK to commit?
>
> Siddhesh
>
> ChangeLog:
>
> * sysdeps/ieee754/dbl-64/mpa.c (__mul): Split mantissa
> calculation loop and add branch prediction.
>
> diff --git a/sysdeps/ieee754/dbl-64/mpa.c b/sysdeps/ieee754/dbl-64/mpa.c
> index 78afd99..278fdb6 100644
> --- a/sysdeps/ieee754/dbl-64/mpa.c
> +++ b/sysdeps/ieee754/dbl-64/mpa.c
> @@ -447,33 +447,51 @@ void
> SECTION
> __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
>
> - int i, i1, i2, j, k, k2;
> + int i, j, k, k2;
> double u;
>
> - /* Is z=0? */
> - if (X[0]*Y[0]==ZERO)
> - { Z[0]=ZERO; return; }
> -
> - /* Multiply, add and carry */
> - k2 = (p<3) ? p+p : p+3;
> - Z[k2]=ZERO;
> - for (k=k2; k>1; ) {
> - if (k > p) {i1=k-p; i2=p+1; }
> - else {i1=1; i2=k; }
> - for (i=i1,j=i2-1; i<i2; i++,j--) Z[k] += X[i]*Y[j];
> -
> - u = (Z[k] + CUTTER)-CUTTER;
> - if (u > Z[k]) u -= RADIX;
> - Z[k] -= u;
> - Z[--k] = u*RADIXI;
> - }
> + /* Is z=0? */
> + if (__glibc_unlikely (X[0] * Y[0] == ZERO))
> + {
> + Z[0]=ZERO;
> + return;
> + }
>
> - /* Is there a carry beyond the most significant digit? */
> - if (Z[1] == ZERO) {
> - for (i=1; i<=p; i++) Z[i]=Z[i+1];
> - EZ = EX + EY - 1; }
> - else
> - EZ = EX + EY;
> + /* Multiply, add and carry */
> + k2 = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
> + Z[k2] = ZERO;
> +
> + for (k = k2; k > p; )
> + {
> + for (i = k - p, j = p; i < p + 1; i++, j--)
> + Z[k] += X[i] * Y[j];
> +
> + u = (Z[k] + CUTTER)-CUTTER;
> + if (u > Z[k]) u -= RADIX;
> + Z[k] -= u;
> + Z[--k] = u * RADIXI;
> + }
> +
> + for (; k > 1; )
> + {
> + for (i = 1,j = k - 1; i < k; i++, j--)
> + Z[k] += X[i] * Y[j];
> +
> + u = (Z[k] + CUTTER)-CUTTER;
> + if (u > Z[k])
> + u -= RADIX;
> + Z[k] -= u;
> + Z[--k] = u*RADIXI;
> + }
> +
I do not understand what you are doing here. If the intent is to replace
the X[], Y[], Z[] doubles with int's you will get overflows in Z[] if
you are changing X[], y[]. Z[] with uint64_t then you avoid the
overflows but (Z[k] + CUTTER)-CUTTER has no effect and you have not
saved any space. Also u is still a double, so you are adding some
expensive int->float->int converts to the inter loop.
Have you really thought this through?
> + EZ = EX + EY;
> + /* Is there a carry beyond the most significant digit? */
> + if (__glibc_unlikely (Z[1] == ZERO))
> + {
> + for (i = 1; i <= p; i++)
> + Z[i] = Z[i+1];
> + EZ--;
> + }
>
> Z[0] = X[0] * Y[0];
> }
>