[PATCH 2/4] i386: Align branches within a fixed boundary
Jan Beulich
jbeulich@suse.com
Thu Dec 5 08:04:00 GMT 2019
On 05.12.2019 00:55, H.J. Lu wrote:
> On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 03.12.2019 20:02, H.J. Lu wrote:
>>> +static int
>>> +maybe_fused_with_jcc_p (void)
>>> +{
>>> + /* No RIP address. */
>>> + if (i.base_reg && i.base_reg->reg_num == RegIP)
>>> + return 0;
>>> +
>>> + /* and, add, sub with destination register. */
>>> + if ((i.tm.base_opcode | 5) == 0x25
>>> + || (i.tm.base_opcode | 5) == 0x5
>>> + || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
>>
>> Why not the same "| 5" approach here as above? And why then not
>> further fold AND and SUB (using "| 0xd" along the lines of ...
>
> It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work since
> 5 doesn't contain consecutive bits. I need to use
>
> i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25
Then use "| 7", as suggested further down, since ...
>>> + || ((i.tm.base_opcode | 3) == 0x83
>>> + && ((i.tm.extension_opcode | 1) == 0x5
>>
>> ... this?
>>
>>> + || i.tm.extension_opcode == 0x0)))
>>> + return (i.types[1].bitfield.class == Reg
>>> + || i.types[1].bitfield.instance == Accum);
... this will produce 0 for opcodes 0x26 and 0x27.
>>> + /* test, cmp with any register. */
>>> + if ((i.tm.base_opcode | 1) == 0x85
>>> + || (i.tm.base_opcode | 1) == 0xa9
>>> + || (i.tm.base_opcode | 1) == 0xf7
>>
>> This needs further qualification by extension opcode, to avoid
>> covering not, neg, mul, etc.
>
> I am adding i.tm.extension_opcode == 0.
>
>>> + || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
>>
>> Along the lines of the above use "| 7" here, seeing that ...
>
> We can't use "| 7" since it will match 0x3e and 0x3f.
>
>>> + || ((i.tm.base_opcode | 3) == 0x83
>>> + && (i.tm.extension_opcode == 0x7)))
>>> + return (i.types[0].bitfield.class == Reg
>>> + || i.types[0].bitfield.instance == Accum
>>> + || i.types[1].bitfield.class == Reg
>>> + || i.types[1].bitfield.instance == Accum);
>>
>> ... this will also be correct (returning 0) for AAS?
>
> AAS can't be fused with jcc.
Of course, but my comment did cover this fact: The expression of
the return statement will produce 0 for it, which is what you
want (aiui).
>>> +static int
>>> +add_branch_prefix_frag_p (void)
>>> +{
>>> + /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix
>>> + to PadLock instructions since they include prefixes in opcode. */
>>> + if (!align_branch_power
>>> + || now_seg == absolute_section
>>> + || i.tm.cpu_flags.bitfield.cpupadlock
>>
>> I continue to question this: PadLock insns are no different to legacy
>> encoded SSE ones, and even some newer GPR ones. Why don't you instead
>> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
>
> There are a few i.tm.cpu_flags.bitfield.cpupadlock checks. I want to be
> conservative here. PadLock insns should be on dead code path for affected
> processors anyway.
But legacy encoded insns with prefixes embedded in their base
opcode won't be. I appreciate you wanting to be conservative here
- what I'm asking for is to be even more conservative. If there
is a concern about such prefixes in the first place, then the
concern ought to be universal, I would think.
>>> + || !cpu_arch_flags.bitfield.cpui386)
>>> + return 0;
>>> +
>>> + /* Don't add prefix if it is a prefix or there is no operand. */
>>> + if (!i.operands || i.tm.opcode_modifier.isprefix)
>>> + return 0;
>>
>> Why is there not being any operand a reason to avoid adding prefixes?
>
> I am just a little bit conservative.
Especially in cases of arbitrary decisions like this one, may I
ask that you leave suitable comments, so it won't remain guess
work down the road as to why the condition is there?
>>> +static int
>>> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
>>> +{
>>> + int add_padding;
>>> +
>>> + /* NB: Don't work with COND_JUMP86 without i386. */
>>> + if (!align_branch_power
>>> + || now_seg == absolute_section
>>> + || !cpu_arch_flags.bitfield.cpui386)
>>> + return 0;
>>> +
>>> + add_padding = 0;
>>> +
>>> + /* Check for jcc and direct jmp. */
>>> + if (i.tm.opcode_modifier.jump == JUMP)
>>> + {
>>> + if (i.tm.base_opcode == JUMP_PC_RELATIVE)
>>> + {
>>> + *branch_p = align_branch_jmp;
>>> + add_padding = align_branch & align_branch_jmp_bit;
>>> + }
>>> + else
>>> + {
>>> + *branch_p = align_branch_jcc;
>>> + if ((align_branch & align_branch_jcc_bit))
>>
>> Extra pair of parentheses (more similar instances further down
>> in this function).
>
> I consider it as a coding style in case a new condition is added.
Oh. I didn't think we do so commonly. But anyway, you're the
maintainer.
Jan
More information about the Binutils
mailing list