This is the mail archive of the mailing list for the libc-ports project.
Re: [PATCH 24/26] arm: Add optimized addmul_1
On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>> > +0:
>> > + ldr r6, [r1], #4 /* load next ul */
>> > + adds r4, r4, r5 /* (out, c) = cl + lpl */
>> > + ldr r5, [r0, #4] /* load next rl */
>> > + str r4, [r0], #4
>> > + adc r4, ip, #0 /* cl = hpl + c */
> You might gain a cycle here on some cores by replacing r4 by something
> else in the adds/str sequence and reversing the order of the last two
> insns to better exploit dual-issue. On most semi-modern cores you can
> get another register for free by pushing one more to the stack
> (load/store multiple instructions transfer registers pairwise).
> I'd expect this to benefit the A8 and maybe A9. On A15 it should make
> no difference.
To swap the adc and str, I'd have to add another move insn too. I guess the
intent is that the extra move would dual-issue with the store, giving us 6 insns
in 3 cycles as opposed to 5 insns in 4 cycles?
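Taking the counts in that question at face value, and assuming the sequence runs once per limb with no stalls outside it, the trade works out as follows:

$$
\frac{5\ \text{insns}}{4\ \text{cycles}} = 1.25\ \text{insns/cycle}
\quad\longrightarrow\quad
\frac{6\ \text{insns}}{3\ \text{cycles}} = 2\ \text{insns/cycle},
\qquad
\frac{4 - 3}{4} = 25\%\ \text{fewer cycles per limb}.
$$

So the extra instruction pays for itself only if it really does pair with the store; if it does not dual-issue, the sequence grows to 6 insns with no cycle saved.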
I'm not willing to work *too* hard on this. If someone cares about the last
cycle of performance on A, they should work on getting the real libgmp
routines re-licensed for glibc. I'm not willing to do politics.
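For reference, the loop quoted at the top belongs to addmul_1, which in the GMP mpn convention multiplies the n-limb number at up by the single limb v, adds the product into the n limbs at rp, and returns the carry-out limb. Below is a minimal portable C sketch of those semantics, assuming 32-bit limbs as on ARM; the name ref_addmul_1 and its exact signature are hypothetical and not the glibc or GMP internal API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference for addmul_1 semantics, assuming 32-bit limbs:
   {rp, n} += {up, n} * v, returning the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0; /* high word carried into the next limb */

    for (size_t i = 0; i < n; i++) {
        /* up[i]*v + rp[i] + carry <= (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1,
           so the 64-bit intermediate can never overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (uint32_t)t;           /* low word is stored back */
        carry = (uint32_t)(t >> 32);   /* high word feeds the next iteration */
    }
    return carry;
}
```

The 64-bit intermediate t plays the role of the register pair in the assembly (the lpl/hpl names in the patch comments appear to be the low and high halves of the product), and the bound in the comment is why a single carry limb always suffices.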