This is the mail archive of the mailing list for the libc-ports project.

Re: [PATCH 24/26] arm: Add optimized addmul_1

Richard Henderson writes:

> On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>>> > +0:
>>> > +	ldr	r6, [r1], #4		/* load next ul */
>>> > +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
>>> > +	ldr	r5, [r0, #4]		/* load next rl */
>>> > +	str	r4, [r0], #4
>>> > +	adc	r4, ip, #0		/* cl = hpl + c */
>> You might gain a cycle here on some cores by replacing r4 by something
>> else in the adds/str sequence and reversing the order of the last two
>> insns to better exploit dual-issue.  On most semi-modern cores you can
>> get another register for free by pushing one more to the stack
>> (load/store multiple instructions transfer registers pairwise).
>> I'd expect this to benefit the A8 and maybe A9.  On A15 it should make
>> no difference.
> To swap the adc and str, I'd have to add another move insn too.  I guess the
> intent is that would dual-issue with the store, giving us 6 insns in 3 cycles
> as opposed to 5 insns in 4 cycles?

I meant like this:

	ldr	r6, [r1], #4		/* load next ul */
	adds	r7, r4, r5		/* (out, c) = cl + lpl */
	ldr	r5, [r0, #4]		/* load next rl */
	adc	r4, ip, #0		/* cl = hpl + c */
	str	r7, [r0], #4

It seems to me this leaves everything with the same values as your
version.  r7 can be pushed/popped for free since you're currently
preserving an odd number of registers.
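
One rough way to check the "same values" claim is a small C model of
the two instruction orders.  This is only a sketch, not ARM code: the
State struct, the function names and the array contents are made up
for illustration, r0/r1 are modelled as word indices rather than byte
addresses, and timing or dual-issue behaviour is not modelled at all.
It relies only on the facts that str leaves the flags alone and that
adc without the s suffix reads the carry but does not update it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the registers and memory the loop body touches. */
typedef struct {
    uint32_t r4, r5, r6, r7, ip;
    unsigned carry;          /* the C flag */
    size_t r0, r1;           /* word indices into rl[] and ul[] */
    uint32_t rl[4], ul[4];
} State;

/* adds: 32-bit add that also records the carry out. */
static uint32_t adds(State *s, uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    s->carry = sum < a;
    return sum;
}

/* Order from the patch: adds into r4, str r4, then adc. */
static void original_order(State *s)
{
    s->r6 = s->ul[s->r1++];            /* ldr  r6, [r1], #4 */
    s->r4 = adds(s, s->r4, s->r5);     /* adds r4, r4, r5   */
    s->r5 = s->rl[s->r0 + 1];          /* ldr  r5, [r0, #4] */
    s->rl[s->r0++] = s->r4;            /* str  r4, [r0], #4 */
    s->r4 = s->ip + s->carry;          /* adc  r4, ip, #0   */
}

/* Suggested order: adds into r7, adc, then str r7. */
static void suggested_order(State *s)
{
    s->r6 = s->ul[s->r1++];            /* ldr  r6, [r1], #4 */
    s->r7 = adds(s, s->r4, s->r5);     /* adds r7, r4, r5   */
    s->r5 = s->rl[s->r0 + 1];          /* ldr  r5, [r0, #4] */
    s->r4 = s->ip + s->carry;          /* adc  r4, ip, #0   */
    s->rl[s->r0++] = s->r7;            /* str  r7, [r0], #4 */
}

/* Compare everything except r7, which only the suggested order clobbers. */
static int same_live_state(const State *a, const State *b)
{
    return a->r4 == b->r4 && a->r5 == b->r5 && a->r6 == b->r6 &&
           a->ip == b->ip && a->carry == b->carry &&
           a->r0 == b->r0 && a->r1 == b->r1 &&
           memcmp(a->rl, b->rl, sizeof a->rl) == 0 &&
           memcmp(a->ul, b->ul, sizeof a->ul) == 0;
}

/* Run both orders from the same starting point; 1 if they agree. */
static int check_equivalence(uint32_t cl, uint32_t lpl, uint32_t hpl)
{
    State a = { .r4 = cl, .r5 = lpl, .ip = hpl,
                .rl = { 1, 2, 3, 4 }, .ul = { 5, 6, 7, 8 } };
    State b = a;
    original_order(&a);
    suggested_order(&b);
    return same_live_state(&a, &b);
}

/* A few spot checks, including one that generates a carry. */
static void check_some_cases(void)
{
    assert(check_equivalence(0, 0, 0));
    assert(check_equivalence(0xFFFFFFFFu, 1u, 7u));
    assert(check_equivalence(0x80000000u, 0x80000000u, 0xFFFFFFFFu));
}
```

Calling check_some_cases() from a test harness should pass without
tripping an assert.  The only register that differs between the two
orders is r7, which is why it has to be saved and restored.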

> Fair enough.
> I'm not willing to work *too* hard on this.  If someone cares about the last
> cycle of performance on A[89], they should work on getting the real libgmp
> routines re-licensed for glibc.  I'm not willing to do politics.

Nor am I.

Måns Rullgård