This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] Minor tweak to mp multiplication
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: libc-alpha at sourceware dot org
- Date: Fri, 11 Jan 2013 16:58:31 +0530
- Subject: [PATCH] Minor tweak to mp multiplication
Hi,
Attached is a minor tweak to the multiplication function (__mul) that
helps the compiler generate slightly faster code. It adds a local
variable zk that acts as an accumulator within and across the loops,
which reduces the reads from and writes to the array z.d. I took
this idea from the powerpc code. This results in an improvement of
about 5% on x86_64 for the pow function, with no regressions in the
testsuite.
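The accumulator idea can be sketched outside glibc roughly as follows. This is a simplified standalone illustration, not the patched __mul itself: the names P, split, mul_in_array and mul_local_acc are hypothetical, the digit count is arbitrary, and the constant values (base 2^24, cutter 2^76) are assumptions chosen so that the add-and-subtract rounding trick works with IEEE doubles in round-to-nearest mode. Index 0 of each array is left unused so that digits run from 1 (most significant) upwards.

```c
#include <assert.h>
#include <string.h>

enum { P = 3 };                        /* digits per operand (illustrative) */
static const double RADIX  = 0x1p24;   /* assumed digit base */
static const double RADIXI = 0x1p-24;  /* 1 / RADIX */
static const double CUTTER = 0x1p76;   /* v + CUTTER - CUTTER rounds v to a multiple of RADIX */

/* Split v into a digit in [0, RADIX) (returned) and a carry (*carry). */
static double split (double v, double *carry)
{
  assert (v >= 0.0 && v < 0x1p53);     /* v must be an exactly represented integer */
  double u = (v + CUTTER) - CUTTER;    /* nearest multiple of RADIX */
  if (u > v)
    u -= RADIX;                        /* rounded up: step back down to the floor */
  *carry = u * RADIXI;
  return v - u;
}

/* Variant A: accumulate each column directly in z[k]. */
static void mul_in_array (const double *x, const double *y, double *z)
{
  double carry;
  z[2 * P] = 0.0;
  for (int k = 2 * P; k > 1; k--)
    {
      int lo = k > P ? k - P : 1;
      int hi = k > P ? P : k - 1;
      for (int i = lo; i <= hi; i++)
        z[k] += x[i] * y[k - i];       /* read-modify-write of memory each time */
      z[k] = split (z[k], &carry);
      z[k - 1] = carry;                /* seed the next column with the carry */
    }
}

/* Variant B: accumulate in a local, store each digit exactly once. */
static void mul_local_acc (const double *x, const double *y, double *z)
{
  double zk = 0.0, carry;
  for (int k = 2 * P; k > 1; k--)
    {
      int lo = k > P ? k - P : 1;
      int hi = k > P ? P : k - 1;
      for (int i = lo; i <= hi; i++)
        zk += x[i] * y[k - i];         /* can stay in a register */
      z[k] = split (zk, &carry);
      zk = carry;                      /* carry becomes the next accumulator */
    }
  z[1] = zk;                           /* final carry is the top digit */
}
```

Both variants perform the same additions in the same order, so they should produce identical digits; the only difference is that variant B no longer forces the compiler to assume that stores to z may alias the x and y inputs inside the inner loop.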
For 100000 iterations of pow with input as (1.0000000000000020, 1.5):
Without the patch:
Total:43300983220, Fastest:421664, Slowest:1347553, Avg:433009.832200
With the patch:
Total:41006921512, Fastest:402743, Slowest:1326084, Avg:410069.215120
The fastest time improved by about 4.5% and the average improved by
about 5.3%.
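For reference, the percentages can be reproduced from the figures above with a small helper. The function name improvement_pct is hypothetical; it simply computes the relative reduction from the "without" value to the "with" value.

```c
/* Relative improvement, in percent, going from `before` to `after`. */
static double improvement_pct (double before, double after)
{
  return (before - after) / before * 100.0;
}
```

Calling improvement_pct (421664, 402743) gives the improvement in the fastest time, and improvement_pct (433009.8322, 410069.21512) gives the improvement in the average.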
OK to commit?
Siddhesh
* sysdeps/ieee754/dbl-64/mpa.c (__mul): Add a local variable
to optimize copies.
diff --git a/sysdeps/ieee754/dbl-64/mpa.c b/sysdeps/ieee754/dbl-64/mpa.c
index 7abad67..98a9b36 100644
--- a/sysdeps/ieee754/dbl-64/mpa.c
+++ b/sysdeps/ieee754/dbl-64/mpa.c
@@ -384,7 +384,7 @@ SECTION
__mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
int i, j, k, k2;
- double u;
+ double u, zk;
/* Is z=0? */
if (__glibc_unlikely (X[0] * Y[0] == ZERO))
@@ -395,31 +395,33 @@ __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
/* Multiply, add and carry. */
k2 = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
- Z[k2] = ZERO;
+ zk = Z[k2] = ZERO;
- for (k = k2; k > p; )
+ for (k = k2; k > p; k--)
{
for (i = k - p, j = p; i < p + 1; i++, j--)
- Z[k] += X[i] * Y[j];
+ zk += X[i] * Y[j];
- u = (Z[k] + CUTTER) - CUTTER;
- if (u > Z[k])
+ u = (zk + CUTTER) - CUTTER;
+ if (u > zk)
u -= RADIX;
- Z[k] -= u;
- Z[--k] = u * RADIXI;
+ Z[k] = zk - u;
+ zk = u * RADIXI;
}
while (k > 1)
{
for (i = 1,j = k - 1; i < k; i++, j--)
- Z[k] += X[i] * Y[j];
+ zk += X[i] * Y[j];
- u = (Z[k] + CUTTER) - CUTTER;
- if (u > Z[k])
+ u = (zk + CUTTER) - CUTTER;
+ if (u > zk)
u -= RADIX;
- Z[k] -= u;
- Z[--k] = u * RADIXI;
+ Z[k] = zk - u;
+ zk = u * RADIXI;
+ k--;
}
+ Z[k] = zk;
EZ = EX + EY;
/* Is there a carry beyond the most significant digit? */