[PATCH 2/4] i386: Align branches within a fixed boundary

Jan Beulich jbeulich@suse.com
Thu Dec 5 08:04:00 GMT 2019


On 05.12.2019 00:55, H.J. Lu wrote:
> On Wed, Dec 4, 2019 at 2:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 03.12.2019 20:02,  H.J. Lu  wrote:
>>> +static int
>>> +maybe_fused_with_jcc_p (void)
>>> +{
>>> +  /* No RIP address.  */
>>> +  if (i.base_reg && i.base_reg->reg_num == RegIP)
>>> +    return 0;
>>> +
>>> +  /* and, add, sub with destination register.  */
>>> +  if ((i.tm.base_opcode | 5) == 0x25
>>> +      || (i.tm.base_opcode | 5) == 0x5
>>> +      || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
>>
>> Why not the same "| 5" approach here as above? And why then not
>> further fold AND and SUB (using "| 0xd" along the lines of ...
> 
> It turns out that (i.tm.base_opcode | 5) == 0x25 doesn't work since
> 5 doesn't contain consecutive bits.  I need to use
> 
> i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25

Then use "| 7", as suggested further down, since ...

>>> +      || ((i.tm.base_opcode | 3) == 0x83
>>> +       && ((i.tm.extension_opcode | 1) == 0x5
>>
>> ... this?
>>
>>> +           || i.tm.extension_opcode == 0x0)))
>>> +    return (i.types[1].bitfield.class == Reg
>>> +         || i.types[1].bitfield.instance == Accum);

... this will produce 0 for opcodes 0x26 and 0x27.

>>> +  /* test, cmp with any register.  */
>>> +  if ((i.tm.base_opcode | 1) == 0x85
>>> +      || (i.tm.base_opcode | 1) == 0xa9
>>> +      || (i.tm.base_opcode | 1) == 0xf7
>>
>> This needs further qualification by extension opcode, to avoid
>> covering not, neg, mul, etc.
> 
> I am adding i.tm.extension_opcode == 0.
> 
>>> +      || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
>>
>> Along the lines of the above use "| 7" here, seeing that ...
> 
> We can't use "| 7" since it will match 0x3e and 0x3f.
> 
>>> +      || ((i.tm.base_opcode | 3) == 0x83
>>> +       && (i.tm.extension_opcode == 0x7)))
>>> +    return (i.types[0].bitfield.class == Reg
>>> +         || i.types[0].bitfield.instance == Accum
>>> +         || i.types[1].bitfield.class == Reg
>>> +         || i.types[1].bitfield.instance == Accum);
>>
>> ... this will also be correct (returning 0) for AAS?
> 
> AAS can't be fused with jcc.

Of course, but my comment did cover this fact: The expression of
the return statement will produce 0 for it, which is what you
want (aiui).

>>> +static int
>>> +add_branch_prefix_frag_p (void)
>>> +{
>>> +  /* NB: Don't work with COND_JUMP86 without i386.  Don't add prefix
>>> +     to PadLock instructions since they include prefixes in opcode.  */
>>> +  if (!align_branch_power
>>> +      || now_seg == absolute_section
>>> +      || i.tm.cpu_flags.bitfield.cpupadlock
>>
>> I continue to question this: PadLock insns are no different to legacy
>> encoded SSE ones, and even some newer GPR ones. Why don't you instead
>> check i.tm.base_opcode >> (8 * i.tm.opcode_bytes) to be (non-)zero?
> 
> There are a few  i.tm.cpu_flags.bitfield.cpupadlock checks.  I want to be
> conservative here.  PadLock insns should be on dead code path for affected
> processors anyway.

But legacy encoded insns with prefixes embedded in their base
opcode won't be. I appreciate you wanting to be conservative here
- what I'm asking for is to be even more conservative. If there
is a concern about such prefixes in the first place, then the
concern ought to be universal, I would think.

>>> +      || !cpu_arch_flags.bitfield.cpui386)
>>> +    return 0;
>>> +
>>> +  /* Don't add prefix if it is a prefix or there is no operand.  */
>>> +  if (!i.operands || i.tm.opcode_modifier.isprefix)
>>> +    return 0;
>>
>> Why is there not being any operand a reason to avoid adding prefixes?
> 
> I am just a little bit conservative.

Especially in cases of arbitrary decisions like this one, may I
ask that you leave suitable comments, so it won't remain guess
work down the road as to why the condition is there?

>>> +static int
>>> +add_branch_padding_frag_p (enum align_branch_kind *branch_p)
>>> +{
>>> +  int add_padding;
>>> +
>>> +  /* NB: Don't work with COND_JUMP86 without i386.  */
>>> +  if (!align_branch_power
>>> +      || now_seg == absolute_section
>>> +      || !cpu_arch_flags.bitfield.cpui386)
>>> +    return 0;
>>> +
>>> +  add_padding = 0;
>>> +
>>> +  /* Check for jcc and direct jmp.  */
>>> +  if (i.tm.opcode_modifier.jump == JUMP)
>>> +    {
>>> +      if (i.tm.base_opcode == JUMP_PC_RELATIVE)
>>> +     {
>>> +       *branch_p = align_branch_jmp;
>>> +       add_padding = align_branch & align_branch_jmp_bit;
>>> +     }
>>> +      else
>>> +     {
>>> +       *branch_p = align_branch_jcc;
>>> +       if ((align_branch & align_branch_jcc_bit))
>>
>> Extra pair of parentheses (more similar instances further down
>> in this function).
> 
> I consider it as a coding style in case a new condition is added.

Oh. I didn't think we do so commonly. But anyway, you're the
maintainer.

Jan



More information about the Binutils mailing list