This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: x86: Swap destination/source to encode VEX only if possible
>>> On 13.09.18 at 15:19, <hjl.tools@gmail.com> wrote:
> On Thu, Sep 13, 2018 at 5:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Sep 5, 2018 at 6:05 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> Various moves come in load and store forms, and just like on the GPR
>>> and FPU sides there should preferably be only one pattern. In some cases
>>> this is not feasible because the opcodes are too different, but quite a
>>> few cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD
>>> and Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse
>>> operand handling there simply needs to match the "straight" operand one),
>>> and fix a long-standing, but so far only latent, bug with when to zap
>>> found_reverse_match.
>>>
>>> Also once again drop IgnoreSize where pointlessly applied to templates
>>> touched anyway as well as *word when redundant with Reg*.
>>>
>>> gas/
>>> 2018-09-05 Jan Beulich <jbeulich@suse.com>
>>>
>>> * config/tc-i386.c (operand_size_match): Mirror
>>> .reg/.regsimd/.acc handling from forward to reverse case.
>>> (build_vex_prefix): Check first and last operand types are equal
>>> and also consider .d for swapping operands for VEX2 encoding.
>>> (match_template): Clear found_reverse_match on every iteration.
>>> Use Opcode_SIMD_FloatD and Opcode_SIMD_IntD.
>>> * testsuite/gas/i386/pseudos.s,
>>> testsuite/gas/i386/x86-64-pseudos.s: Add kmov* tests.
>>> * testsuite/gas/i386/pseudos.d,
>>> testsuite/gas/i386/x86-64-pseudos.d: Adjust expectations.
>>>
>>> opcodes/
>>> 2018-09-05 Jan Beulich <jbeulich@suse.com>
>>>
>>> * i386-opc.tbl (bndmov, kmovb, kmovd, kmovq, kmovw, movapd,
>>> movaps, movd, movdqa, movdqu, movhpd, movhps, movlpd, movlps,
>>> movq, movsd, movss, movupd, movups, vmovapd, vmovaps, vmovd,
>>> vmovdqa, vmovdqa32, vmovdqa64, vmovdqu, vmovdqu16, vmovdqu32,
>>> vmovdqu64, vmovdqu8, vmovq, vmovsd, vmovss, vmovupd, vmovups):
>>> Fold load and store templates where possible, adding D. Drop
>>> IgnoreSize where it was pointlessly present. Drop redundant
>>> *word.
>>> * i386-tbl.h: Re-generate.
>>
>> On Linux/x86-64, this caused
>>
>> FAIL: i386 arch 10
>> FAIL: i386 arch 10 (lzcnt)
>> FAIL: i386 arch 10 (prefetchw)
>> FAIL: i386 arch 10 (bdver1)
>> FAIL: i386 arch 10 (bdver2)
>> FAIL: i386 arch 10 (bdver3)
>> FAIL: i386 arch 10 (bdver4)
>> FAIL: i386 arch 10 (btver1)
>> FAIL: i386 arch 10 (btver2)
>> FAIL: i386 noavx-1
>> FAIL: i386 noavx-3
>> FAIL: i386 AVX
>> FAIL: i386 AVX (Intel disassembly)
>> FAIL: x86-64 arch 2
>> FAIL: x86-64 arch 2 (lzcnt)
>> FAIL: x86-64 arch 2 (prefetchw)
>> FAIL: x86-64 arch 2 (bdver1)
>> FAIL: x86-64 arch 2 (bdver2)
>> FAIL: x86-64 arch 2 (bdver3)
>> FAIL: x86-64 arch 2 (bdver4)
>> FAIL: x86-64 arch 2 (btver1)
>> FAIL: x86-64 arch 2 (btver2)
>> FAIL: x86-64 AVX
>> FAIL: x86-64 AVX (Intel mode)
>>
>> Can you fix it today?
>>
>
> as fails to assemble
>
> vzeroall

Interesting. I have absolutely no idea why it appeared to work for
me.

> I checked in the following patch to fix it.

Thanks a lot!

Jan
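
Editorial aside for readers of the archive: the operand swap named in the subject line can be illustrated with a small sketch. It is not the code from tc-i386.c; the struct, the function names and the two constants are hypothetical and chosen only for this example. It assumes a register-to-register move in the 0F opcode map with VEX.W clear, where the load and store opcodes differ in a single bit (for instance 0x28/0x29 for vmovaps and 0x6f/0x7f for vmovdqa). The 2-byte VEX prefix can extend only ModRM.reg, so a high register in ModRM.rm normally forces the 3-byte prefix; switching to the opposite direction form moves that register into ModRM.reg.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the identifiers used in gas. */
enum { DIR_BIT_FLOAT = 0x01, DIR_BIT_INT = 0x10 };

struct reg_move {
    uint8_t  opcode;   /* opcode byte in the 0F map */
    uint8_t  dir_bit;  /* bit that toggles between the load and store forms */
    unsigned reg;      /* register number encoded via ModRM.reg (0-15) */
    unsigned rm;       /* register number encoded via ModRM.rm (0-15) */
};

/* The 2-byte VEX prefix carries VEX.R but not VEX.B, so a register
   numbered 8 or above in ModRM.rm requires the 3-byte prefix.
   (Assumes map 0F and VEX.W clear, as stated above.) */
static bool needs_vex3(const struct reg_move *m)
{
    return m->rm >= 8;
}

/* If only ModRM.rm holds a high register, exchange the two operands and
   flip the direction bit so the instruction keeps its meaning.
   Returns true when a swap was performed. */
static bool swap_for_vex2(struct reg_move *m)
{
    if (m->rm < 8 || m->reg >= 8)
        return false;          /* swap is either unnecessary or would not help */

    unsigned tmp = m->reg;
    m->reg = m->rm;
    m->rm = tmp;
    m->opcode ^= m->dir_bit;   /* load form <-> store form */
    return true;
}
```

With opcode 0x28, reg 0 and rm 8, the sketch switches to opcode 0x29 with reg 8 and rm 0, which no longer needs the 3-byte prefix. When both registers are numbered 8 or above, it leaves the encoding alone, because a swap cannot help.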