This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Split mantissa calculation loop and add branch prediction to mp multiplication
On Thu, 2013-01-03 at 22:14 +0530, Siddhesh Poyarekar wrote:
> On Thu, Jan 03, 2013 at 10:18:08AM -0600, Steven Munroe wrote:
> > This is very bad for POWER. PowerPC has (multiple) independent fixed
> > point and floating point pipelines. This allows super-scalar out-of-order
> > execution, UNTIL you force a transfer (through memory) between the
> > FPRs/GPRs. PowerPC has lots of registers (32+32+32), we expect the
> > compiler to keep lots of data in the registers, and so we don't optimize
> > the hardware for dependent load after store, we optimize for memory
> > bandwidth.
> >
> > Your proposed code forces an (unnecessary) double->long conversion and
> > FPR to GPR transfer into the inner loop, disabling any super-scalar
> > parallel execution. It also prevents loop unrolling and does not allow
> > GCC to make good use of all those registers we provide in the
> > architecture.
> >
> > So your code is optimized for (register poor, in-order-execution) X86 at
> > the expense of PowerPC.
> >
>
> I'm confused, which patch are you talking about, the current loop
> split patch or the conversion of mantissa to int or some other patch?
> I'll summarize the patches that are currently under review:
>
I was referring to this code from your note:
> (2) u does not exist since it is replaced by a much simpler operation,
> which results in that snippet looking like this:
>
> int64_t tmp = Z[k];
> for (i=i1,j=i2-1; i<i2; i++,j--)
> tmp += (int64_t) X[i]*Y[j];
>
> Z[k] = (int) (tmp % (1 << 24));
> Z[--k] = (int) (tmp / (1 << 24));
I do performance analysis and tuning for a living and this is an obvious
problem. oprofile will show this as a hot spot.
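
For readers following along, the quoted snippet can be fleshed out into a
self-contained C sketch. This is an illustration only, not the code from the
patch: the function name mul_column, the RADIX macros, and the assumption
that the digits are stored as int in radix 2^24 are all hypothetical. If X
and Y were instead arrays of double, the cast inside the loop would become a
double->integer conversion on every iteration, which is the cost described
above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed radix: each multi-precision digit holds 24 bits.  */
#define RADIX_BITS 24
#define RADIX ((int64_t) 1 << RADIX_BITS)

/* Hypothetical helper: accumulate the products X[i] * Y[j] for i in
   [i1, i2), with j running down from i2 - 1, on top of Z[k].  The low
   24 bits are stored back in Z[k] and the carry overwrites Z[k - 1],
   as in the quoted snippet.  Digits are assumed to be non-negative.  */
static void
mul_column (const int *X, const int *Y, int *Z, int i1, int i2, int k)
{
  assert (i1 <= i2 && k >= 1);

  int64_t tmp = Z[k];
  for (int i = i1, j = i2 - 1; i < i2; i++, j--)
    tmp += (int64_t) X[i] * Y[j];   /* widen before multiplying */

  Z[k] = (int) (tmp % RADIX);       /* digit that stays in this column */
  Z[k - 1] = (int) (tmp / RADIX);   /* carry into the next column */
}
```

With int digits the whole loop stays in the fixed point unit. Whether that
holds for the real code depends on how the mantissa arrays are declared,
which this sketch does not settle.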