[PATCH] x86: Correct EVEX vector load/store optimization
Jan Beulich
JBeulich@suse.com
Tue Mar 19 08:30:00 GMT 2019
>>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote:
> On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote:
>> > --- a/gas/config/tc-i386.c
>> > +++ b/gas/config/tc-i386.c
>> > @@ -4075,6 +4075,56 @@ optimize_encoding (void)
>> > i.types[j].bitfield.ymmword = 0;
>> > }
>> > }
>> > + else if ((cpu_arch_flags.bitfield.cpuavx
>> > + || cpu_arch_isa_flags.bitfield.cpuavx)
>>
>> Once again a questionable condition, as per earlier replies to
>> other patches of yours.
>
> Fixed.
>
>> > + && i.vec_encoding != vex_encoding_evex
>> > + && !i.types[0].bitfield.zmmword
>> > + && !i.mask
>> > + && is_evex_encoding (&i.tm)
>> > + && (i.tm.base_opcode == 0x666f
>> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f
>> > + || i.tm.base_opcode == 0xf36f
>> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f
>> > + || i.tm.base_opcode == 0xf26f
>> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>>
>> All three of these can be expressed with just a single comparison,
>> using & or | instead of ^ and (if necessary) adjusting the literal
>> value compared against.
>
> Fixed.
>
>> > + && i.tm.extension_opcode == None)
>> > + {
>> > + /* Optimize: -O1:
>> > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16,
>> > + vmovdqu32 and vmovdqu64:
>> > + EVEX VOP %xmmM, %xmmN
>> > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16)
>> > + EVEX VOP %ymmM, %ymmN
>> > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16)
>> > + EVEX VOP %xmmM, mem
>> > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16)
>> > + EVEX VOP %ymmM, mem
>> > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16)
>> > + EVEX VOP mem, %xmmN
>> > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16)
>>
>> There's some confusion on this line.
>>
>> > + EVEX VOP mem, %ymmN
>> > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16)
>> > + */
>>
>> For the variants with a memory operand I doubt the conversion
>> is always a win, and it may be against the user request in case of
>> -Os. This is because of the Disp8 scaling the EVEX encoding permits.
>
> Fixed.
>
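To illustrate the Disp8 point made above, here is a rough size model in C. It is only a sketch under stated assumptions: the helper names are hypothetical and not part of gas, the SIB byte is ignored (it is the same for both encodings), the rBP/r13 base special case is ignored, and N is assumed to be the full vector width in bytes (16 for xmm, 32 for ymm) for these full-vector moves.

```c
#include <assert.h>

/* Hypothetical helper: displacement bytes in the legacy/VEX scheme,
   where a one-byte displacement is never scaled.  */
static int
vex_disp_bytes (long disp)
{
  if (disp == 0)
    return 0;
  return (disp >= -128 && disp <= 127) ? 1 : 4;
}

/* Hypothetical helper: displacement bytes with EVEX Disp8*N compression.
   A one-byte displacement is usable when DISP is a multiple of N and
   DISP / N fits in a signed byte.  */
static int
evex_disp_bytes (long disp, long n)
{
  if (disp == 0)
    return 0;
  if (disp % n == 0 && disp / n >= -128 && disp / n <= 127)
    return 1;
  return 4;
}

/* Prefix + opcode byte + ModRM byte + displacement.  VEX_PREFIX_LEN is 2
   or 3, depending on whether the short VEX form is usable.  */
static int
vex_insn_bytes (long disp, int vex_prefix_len)
{
  return vex_prefix_len + 1 + 1 + vex_disp_bytes (disp);
}

/* The EVEX prefix is always 4 bytes.  */
static int
evex_insn_bytes (long disp, long n)
{
  return 4 + 1 + 1 + evex_disp_bytes (disp, n);
}
```

Under this model, a ymm access at displacement 256 takes 7 bytes as EVEX (256 / 32 = 8 fits in Disp8) but 8 bytes with a two-byte VEX prefix, since VEX needs a four-byte displacement there. At displacement 64 the VEX form is the shorter one (5 bytes against 7).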
>> > + if (i.tm.base_opcode == 0xf26f)
>> > + i.tm.base_opcode = 0xf36f;
>> > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f)
>> > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD;
>>
>> This again can be expressed without "else if()" afaict.
>>
>
> Fixed.
>
> Here is the patch.
Thanks.
>--- a/gas/config/tc-i386.c
>+++ b/gas/config/tc-i386.c
>@@ -4068,18 +4068,14 @@ optimize_encoding (void)
> i.types[j].bitfield.ymmword = 0;
> }
> }
>- else if ((cpu_arch_flags.bitfield.cpuavx
>- || cpu_arch_isa_flags.bitfield.cpuavx)
>- && i.vec_encoding != vex_encoding_evex
>+ else if (i.vec_encoding != vex_encoding_evex
> && !i.types[0].bitfield.zmmword
Ah, here the remaining cpuavx goes away as well.
>+ if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
>+ {
>+ i.tm.base_opcode &= Opcode_SIMD_IntD;
>+ i.tm.base_opcode |= 0xf36f;
>+ }
How about the even simpler
	  if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f)
	    i.tm.base_opcode ^= 0xf36f ^ 0xf26f;
?
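The three ways of writing the 0xf26f -> 0xf36f rewrite that came up in this thread can be checked against each other in a small standalone C sketch. It is not taken from tc-i386.c: the function names are made up for illustration, and the value used for Opcode_SIMD_IntD (0x10) is an assumption that should be checked against opcodes/i386-opc.h.

```c
#include <assert.h>

/* Assumed value of the direction bit; verify against opcodes/i386-opc.h.  */
#define Opcode_SIMD_IntD 0x10u

/* Form from the first version of the patch: two explicit comparisons.  */
static unsigned int
rewrite_else_if (unsigned int op)
{
  if (op == 0xf26f)
    op = 0xf36f;
  else if ((op ^ Opcode_SIMD_IntD) == 0xf26f)
    op = 0xf36f ^ Opcode_SIMD_IntD;
  return op;
}

/* Form from the revised patch: mask out the direction bit for the
   comparison, keep only that bit, then OR in the new opcode.  */
static unsigned int
rewrite_mask_or (unsigned int op)
{
  if ((op & ~Opcode_SIMD_IntD) == 0xf26f)
    {
      op &= Opcode_SIMD_IntD;
      op |= 0xf36f;
    }
  return op;
}

/* Suggested form: 0xf36f ^ 0xf26f is 0x0100, so a single XOR flips
   exactly the bit that distinguishes the two prefixes and leaves the
   direction bit untouched.  */
static unsigned int
rewrite_xor (unsigned int op)
{
  if ((op & ~Opcode_SIMD_IntD) == 0xf26f)
    op ^= 0xf36f ^ 0xf26f;
  return op;
}
```

With the assumed bit value, all three functions map 0xf26f to 0xf36f and 0xf27f to 0xf37f, and leave the 0x66 and 0xf3 prefixed opcodes unchanged.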
Jan