This is the mail archive of the
libffi-discuss@sourceware.org
mailing list for the libffi project.
Re: [PATCH 3/4] s390: Reorganize assembly
- From: Dominik Vogt <vogt at linux dot vnet dot ibm dot com>
- To: libffi-discuss at sourceware dot org
- Date: Tue, 23 Dec 2014 10:54:39 +0100
- References: <1418938403-15836-1-git-send-email-rth at twiddle dot net> <1418938403-15836-4-git-send-email-rth at twiddle dot net> <20141222121250 dot GA25775 at linux dot vnet dot ibm dot com> <20141222122517 dot GA30481 at linux dot vnet dot ibm dot com> <54984712 dot 2050609 at redhat dot com>
- Reply-to: libffi-discuss at sourceware dot org
On Mon, Dec 22, 2014 at 08:30:10AM -0800, Richard Henderson wrote:
> On 12/22/2014 04:25 AM, Dominik Vogt wrote:
> > Or rather the attached patch that replaces
> >
> > stm %r2,%r3,0(%r12)
> > nop
> >
> > with
> >
> > st %r2,0(%r12)
> > st %r3,4(%r12)
>
> Is that really an improvement?
>
> (1) You now need a branch for the (presumed) normal "int" case.
Well, that depends on whether we are talking about 64 bit or 31 bit:

64 bit
------
+ saves decoding two nop instructions for int64_t
+ saves four bytes

31 bit
------
+ saves decoding two nop instructions for int64_t
- replaces a nop with a jump instruction for int32_t

Of course, one could order the types differently depending on
the platform to eliminate the disadvantage in the 31 bit case.
> (2) Is stm really that much faster than two st? I would have
> thought the reverse, actually.
No, stm+nop is potentially slower than two st. That's why the
patch uses the two st instructions.
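
To illustrate that the two sequences store the same bytes, here is a
minimal C sketch. It only models what ends up in memory and says nothing
about timing or decoding cost. The names regs_t, model_st, model_stm and
same_result are hypothetical helpers made up for this illustration; they
are not part of libffi or of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 31 bit register file: 16 registers, 32 bits each. */
typedef struct { uint32_t r[16]; } regs_t;

/* Model of "st": store one 32-bit register at addr, big-endian. */
static void model_st(const regs_t *regs, unsigned reg, uint8_t *addr)
{
    assert(reg < 16);
    uint32_t v = regs->r[reg];
    addr[0] = (uint8_t)(v >> 24);
    addr[1] = (uint8_t)(v >> 16);
    addr[2] = (uint8_t)(v >> 8);
    addr[3] = (uint8_t)v;
}

/* Model of "stm": store registers first..last into consecutive words,
   wrapping from register 15 back to register 0. */
static void model_stm(const regs_t *regs, unsigned first, unsigned last,
                      uint8_t *addr)
{
    assert(first < 16 && last < 16);
    for (unsigned i = first;; i = (i + 1) & 15u) {
        model_st(regs, i, addr);
        addr += 4;
        if (i == last)
            break;
    }
}

/* Returns 1 if "stm %r2,%r3,0(base)" and the pair
   "st %r2,0(base)" / "st %r3,4(base)" leave the same 8 bytes. */
static int same_result(uint32_t r2, uint32_t r3)
{
    regs_t regs;
    uint8_t a[8] = {0};
    uint8_t b[8] = {0};

    memset(&regs, 0, sizeof regs);
    regs.r[2] = r2;
    regs.r[3] = r3;

    model_stm(&regs, 2, 3, a);   /* stm %r2,%r3,0(%r12) */
    model_st(&regs, 2, b + 0);   /* st  %r2,0(%r12)     */
    model_st(&regs, 3, b + 4);   /* st  %r3,4(%r12)     */

    return memcmp(a, b, sizeof a) == 0;
}
```

Calling same_result() with any pair of register values should return 1,
so the choice between the two forms is about code size and speed only,
not about the stored result.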
P.S.: Just noticed there's a duplicate ".balign" in the patched
code.
Ciao
Dominik ^_^ ^_^
--
Dominik Vogt
IBM Germany