This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.
Re: [PATCH 24/26] arm: Add optimized addmul_1
Richard Henderson <rth@twiddle.net> writes:
> On 02/28/2013 05:58 AM, Måns Rullgård wrote:
>>> > +0:
>>> > + ldr r6, [r1], #4 /* load next ul */
>>> > + adds r4, r4, r5 /* (out, c) = cl + lpl */
>>> > + ldr r5, [r0, #4] /* load next rl */
>>> > + str r4, [r0], #4
>>> > + adc r4, ip, #0 /* cl = hpl + c */
>> You might gain a cycle here on some cores by replacing r4 by something
>> else in the adds/str sequence and reversing the order of the last two
>> insns to better exploit dual-issue. On most semi-modern cores you can
>> get another register for free by pushing one more to the stack
>> (load/store multiple instructions transfer registers pairwise).
>>
>> I'd expect this to benefit the A8 and maybe A9. On A15 it should make
>> no difference.
>>
>
> To swap the adc and str, I'd have to add another move insn too. I guess the
> intent is that it would dual-issue with the store, giving us 6 insns in 3 cycles
> as opposed to 5 insns in 4 cycles?
I meant like this:
ldr r6, [r1], #4 /* load next ul */
adds r7, r4, r5 /* (out, c) = cl + lpl */
ldr r5, [r0, #4] /* load next rl */
adc r4, ip, #0 /* cl = hpl + c */
str r7, [r0], #4 /* store out */
It seems to me this leaves everything with the same values as your
version. r7 can be pushed/popped for free since you're currently
preserving an odd number of registers.
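
For example (the register list here is made up, not the one in your actual
prologue), since ldm/stm move two registers per cycle on these cores, adding
one register to an odd-sized list costs no extra transfer cycle, and the same
goes for the matching pop:

push {r4, r5, r6} /* odd-sized list: rounds up to 2 transfer cycles */
push {r4, r5, r6, r7} /* one more register: still 2 transfer cycles */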
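
And assuming the core can pair a load or store with an adjacent ALU op and
nothing else stalls (the exact pairing rules differ between A8 and A9, so take
this as a sketch rather than a promise), the five instructions above could
then issue in three cycles:

ldr r6, [r1], #4 /* cycle 1, load/store pipe */
adds r7, r4, r5 /* cycle 1, ALU pipe */
ldr r5, [r0, #4] /* cycle 2, load/store pipe */
adc r4, ip, #0 /* cycle 2, ALU pipe */
str r7, [r0], #4 /* cycle 3, load/store pipe */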
> Fair enough.
>
> I'm not willing to work *too* hard on this. If someone cares about the last
> cycle of performance on A[89], they should work on getting the real libgmp
> routines re-licensed for glibc. I'm not willing to do politics.
Nor am I.
--
Måns Rullgård
mans@mansr.com