This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] Don't spin around multiplying zeroes in __mul
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: libc-alpha at sourceware dot org
- Date: Fri, 11 Jan 2013 19:49:41 +0530
- Subject: [PATCH] Don't spin around multiplying zeroes in __mul
Hi,
The current implementation of __mul unconditionally multiplies X[i]
and Y[j] regardless of whether they're zero. The sequence of X[i] *
Y[j] is such that i increments up to the initial value of j and j
decrements to the initial value of i. Hence, if both X and Y have
only zeroes after IP digits, the terms at the extreme ends,
i.e. 1..ip-1 and ip+1..p are always zero (P being the precision we're
calculating in). It is possible to skip those terms altogether and
just compute the non-zero terms.
Attached patch only bothers performing multiplication of mantissa
digits up to the point where either X or Y have non-zero digits and
skips the rest. I have verified that this passes the testsuite on
x86_64. The patch applies on top of the earlier patch that added the
zk variable.
Performance improvement:
For 100000 iterations of pow with input as (1.0000000000000020, 1.5):
Without the patch:
Total:42972814009, Fastest:416278, Slowest:1242152, Avg:429728.140090
With the patch:
Total:29527884109, Fastest:285682, Slowest:842066, Avg:295278.841090
That means an improvement in fastest run by ~31.4% and average
performance by ~31.3%!
Siddhesh
* sysdeps/ieee754/dbl-64/mpa.c (__mul): Don't bother with zero
values in the mantissa.
diff --git a/sysdeps/ieee754/dbl-64/mpa.c b/sysdeps/ieee754/dbl-64/mpa.c
index 98a9b36..68f5240 100644
--- a/sysdeps/ieee754/dbl-64/mpa.c
+++ b/sysdeps/ieee754/dbl-64/mpa.c
@@ -383,7 +383,7 @@ void
SECTION
__mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
- int i, j, k, k2;
+ int i, j, k, k2, ip = p;
double u, zk;
/* Is z=0? */
@@ -393,13 +393,21 @@ __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
return;
}
+ /* We need not iterate through all X's and Y's since it's pointless to
+ multiply zeroes. */
+ for (i = p; i > 0; i--)
+ if (X[i] == 0 && Y[i] == 0)
+ ip--;
+ else
+ break;
+
/* Multiply, add and carry. */
k2 = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
zk = Z[k2] = ZERO;
for (k = k2; k > p; k--)
{
- for (i = k - p, j = p; i < p + 1; i++, j--)
+ for (i = k - ip, j = ip; i < ip + 1; i++, j--)
zk += X[i] * Y[j];
u = (zk + CUTTER) - CUTTER;
@@ -408,6 +416,8 @@ __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
Z[k] = zk - u;
zk = u * RADIXI;
}
+ Z[k] = zk;
+ k = MIN(k, 2 * ip);
while (k > 1)
{