[PATCH 0/5] i386: Optimize for Jump Conditional Code Erratum

Thu Nov 14 19:16:00 GMT 2019

On 11/14/19 11:20 AM, H.J. Lu wrote:
> On Thu, Nov 14, 2019 at 3:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>>> Microcode update for Jump Conditional Code Erratum may cause performance
>>> loss for some workloads:
>>>
>>> https://www.intel.com/content/www/us/en/support/articles/000055650.html
>>>
>>> Here is the set of assembler patches to mitigate performance impact by
>>> aligning branches within 32-byte boundary.  The impacted instructions
>>> are:
>>>
>>>   a. Conditional jump.
>>>   b. Fused conditional jump.
>>>   c. Unconditional jump.
>>>   d. Call.
>>>   e. Ret.
>>>   f. Indirect jump and call.
>>>
>>> The new -mbranches-within-32B-boundaries command-line option aligns
>>> conditional jump, fused conditional jump and unconditional jump within
>>> 32-byte boundary.
>>
>> Should this mitigation be enabled by default?
> 
> We'd like to see it enabled as much as it can.   The potential issues are
> 
> 1.  Some assembly code, like the Linux kernel, checks the code size.  Adding
> prefixes increases code size and may break such assembly code.
ISTM the kernel could just turn off the flag if we were to turn it on by
default.

But I can't help but point out that if we do this, then we're forcing
everyone to pay a price in terms of runtime performance and code size --
even if they're on a processor where this doesn't matter.

Additionally, I have yet to find any documentation which indicates with
better precision when this happens and what the consequences are when it
does happen.  That makes it impossible to know if there's any kind of
filtering we can do to avoid inserting too many nops/prefixes to ensure
branch alignment.  It's also exceedingly hard to assess the real world
impact of the bug which in turn makes it hard to assess the importance
of the mitigations.

> 2. The linker may rewrite instructions, as in TLS optimization.  Adding
> prefixes may cause linker errors or incorrect linker output.
> 
>> According to the whitepaper, this mitigation has some overhead on
>> non-affected CPUs (some Intel Atom-type CPUs are mentioned).  Is there a
>> way to avoid this overhead, so that the decision to enable this by
> 
> Yes, we are looking into it.
I don't want to open a can of worms here, but...

Aligning everything seems needlessly wasteful.  So at a high level I
wouldn't bother aligning anything in a cold code segment.   I'd probably
also avoid aligning things in code that isn't on the hot paths.

With some hackery we probably could identify the former.  The latter is
tougher because all the mitigation work is done in the assembler with no
cooperation from the compiler.  One might reasonably ask that the
compiler identify jumps that should be aligned, which we could do with a
pseudo-op or directive.

The other thing that comes to mind is branch target alignment.   If a
jump is going to need alignment, might it be better to instead insert
the nops/prefixes at the previous label in some cases?  This might be
particularly interesting if those nops/prefixes happen to increase the
alignment of the label to a nicer value.  If this happens with any
regularity, then we're killing two birds with one stone -- we're fixing
the alignment of the jump itself, but also improving the alignment of a
branch target which can be good for performance.
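
To make the label idea concrete, here is a rough Python sketch of the
arithmetic.  It is not what the assembler patches do.  It assumes a jump
is affected when it crosses or ends on a 32-byte boundary, that the jump
is shorter than 32 bytes, and that nothing between the label and the jump
changes size when shifted.  All function names are made up for
illustration.

```python
BOUNDARY = 32  # bytes; the window size named in the patch description


def is_affected(addr, size, boundary=BOUNDARY):
    """Assumed condition: the jump occupying [addr, addr + size) crosses
    or ends on a `boundary`-byte boundary."""
    return addr // boundary != (addr + size) // boundary


def padding_needed(addr, size, boundary=BOUNDARY):
    """Bytes of padding that push an affected jump to the next boundary."""
    if not is_affected(addr, size, boundary):
        return 0
    return boundary - addr % boundary


def alignment_of(addr, cap=BOUNDARY):
    """Largest power of two (at most `cap`) that divides addr."""
    if addr == 0:
        return cap
    return min(addr & -addr, cap)


def pad_at_label(label_addr, jump_addr, jump_size):
    """Move the jump's padding in front of the preceding label.

    Returns (pad, new_label_addr, new_jump_addr).  Assumes nothing
    between the label and the jump changes size when shifted.
    """
    pad = padding_needed(jump_addr, jump_size)
    return pad, label_addr + pad, jump_addr + pad


# Hypothetical layout: label at 0x1c, a 6-byte jump at 0x3c.
pad, label, jump = pad_at_label(label_addr=0x1C, jump_addr=0x3C, jump_size=6)
print(pad, hex(label), hex(jump))               # 4 0x20 0x40
print(alignment_of(0x1C), alignment_of(label))  # 4 32
print(is_affected(jump, 6))                     # False
```

In this made-up layout the same 4 bytes of padding fix the jump and move
the label from 4-byte to 32-byte alignment.  With other offsets the label
alignment can just as easily get worse (a label at 0x1c pushed by 2 bytes
lands on 0x1e), so a real implementation would have to compare both
placements rather than always padding at the label.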

Jeff