This is the mail archive of the mailing list for the binutils project.

Re: [PATCH 0/5] i386: Optimize for Jump Conditional Code Erratum

On Thu, Nov 14, 2019 at 4:16 PM Fangrui Song <> wrote:
> On 2019-11-12, H.J. Lu wrote:
> >Microcode update for Jump Conditional Code Erratum may cause performance
> >loss for some workloads:
> >
> >
> >
> >Here is the set of assembler patches to mitigate performance impact by
> >aligning branches within 32-byte boundary.  The impacted instructions
> >are:
> So, a few questions.
> 1. Without the assembler mitigation, what is the performance hit with and
>   without the microcode update?

The JCC erratum microcode update causes more misses in the Decoded
ICache and, as a result, more switches to the legacy decode pipeline.
This happens because branches that cross or end on a 32-byte boundary
can no longer be cached in the Decoded ICache.
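
To make the "cross or end on a 32-byte boundary" condition concrete,
here is a minimal Python sketch. It is only an illustration, not code
from the assembler patches; the function name and the BOUNDARY constant
are hypothetical.

```python
BOUNDARY = 32  # boundary size in bytes, as described for the erratum


def crosses_or_ends_on_boundary(addr: int, size: int,
                                boundary: int = BOUNDARY) -> bool:
    """Return True if the instruction occupying [addr, addr + size)
    crosses or ends on a `boundary`-byte boundary."""
    if size <= 0:
        raise ValueError("size must be positive")
    end = addr + size  # address of the first byte after the instruction
    # If `end` falls in a later boundary-sized block than `addr`, the
    # instruction either spills past the boundary (crosses it) or its
    # last byte is the last byte of the block (ends on it).
    return addr // boundary != end // boundary
```

For example, a 2-byte jcc at address 30 ends on the boundary, and the
same jcc at address 31 crosses it; at address 0 or 32 it is unaffected.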

The potential performance impact of the JCC erratum mitigation arises
from two different sources:

1. A switch penalty that occurs when execution moves from the Decoded
   ICache to the legacy decode pipeline.
2. Inefficiencies in the legacy decode pipeline that the Decoded
   ICache would otherwise hide.

Intel has observed performance effects associated with the workaround
ranging from 0-4% on many industry-standard benchmarks. In
subcomponents of these benchmarks, Intel has observed outliers higher
than the 0-4% range. The effects on other workloads not observed by
Intel may vary. Intel has developed software-based tools to minimize
the impact on potentially affected applications and workloads.

> 2. What is the code size increase of this assembler mitigation?

We measured the increase in code size due to the addition of padding
to instructions to align branches correctly. The geomean increase in
code size is 3-4% with individual outliers of up to 5%.
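
For anyone who wants to repeat this kind of measurement on their own
binaries, a geometric-mean increase can be computed roughly as below.
This is a sketch, not the script used for the numbers above; the
function name and the (before, after) pair layout are assumptions, and
no measured data is included.

```python
import math
from typing import Iterable, Tuple


def geomean_size_increase(sizes: Iterable[Tuple[int, int]]) -> float:
    """Geometric-mean code size increase as a fraction (0.04 == 4%).

    `sizes` holds (size_before, size_after) pairs in bytes, one pair
    per binary or object file.
    """
    ratios = [after / before for before, after in sizes]
    if not ratios:
        raise ValueError("need at least one (before, after) pair")
    # Geometric mean of the ratios, computed via logs for stability.
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(mean_log) - 1.0
```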

> 3. Why is the jcc+fused+jmp set suggested? (-mbranches-within-32B-boundaries)
>   What is the performance and code size impact with this set compared
>   with the full set? Among "jcc+fused+call+jmp+ret+indirect", which one
>   gives the largest hit?

In our investigation, aligning jcc+fused+jmp mitigated most of the
performance effect on the benchmarks with a moderate code size
increase. We think it strikes an appropriate balance between
performance gain and code size increase for most workloads.  For other
cases, we provide separate options so that users can explore
additional performance improvement.

> 4. Shall we default to -mbranches-within-32B-boundaries if the specified
>    -march= or -mtune= may be affected by the erratum?

No. It’s a performance mitigation for the microcode update, not a
functional fix. Although it mitigated the potential performance effect
in most cases we observed, it increases code size and may hurt
performance in some cases. It may also affect performance on
architectures that are not affected by the JCC erratum.

The software mitigation cannot be applied in some scenarios where
application behavior depends on exact code size. In other words, the
inserted padding (prefix, nop) may break assumptions about code size
that the programmer has made.  We have observed such assumptions in
the compilation of the Linux kernel.
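
As a rough illustration of how padding changes code size, the sketch
below computes how many bytes are needed to push an affected branch to
the start of the next 32-byte block. This is a simplified model, not
the assembler's actual strategy (which may distribute prefixes over
earlier instructions or use nops); the function name is hypothetical.

```python
def padding_needed(addr: int, size: int, boundary: int = 32) -> int:
    """Bytes of padding to insert before the branch at `addr` so that
    it no longer crosses or ends on a `boundary`-byte boundary.

    Simplified model: an affected branch is moved to the start of the
    next boundary-sized block. Assumes size < boundary.
    """
    end = addr + size  # first byte after the branch
    if addr // boundary == end // boundary:
        return 0  # neither crosses nor ends on the boundary
    return -addr % boundary  # distance from addr to the next boundary
```

In this model, a 2-byte jump at offset 30 needs 2 bytes of padding, so
everything after it moves by 2 bytes. Code that hard-codes the distance
between two labels would then be off by that amount.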

Therefore we do not enable it by default. Users should evaluate its
impact and decide for themselves whether to enable the software
mitigation, knowing that the performance impact may vary case by case
when this option is enabled.

> 5. Do we need to increase the section alignment (sh_addralign)?

The minimum section alignment is increased to 32 bytes to ensure
that branches can be properly aligned.
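
The reason alignment matters can be sketched as follows: the assembler
only knows offsets within a section, and an offset predicts the final
address modulo 32 only if the section itself starts on a 32-byte
boundary. The Python below is an illustration under that assumption;
the function names are hypothetical.

```python
def crosses_or_ends_on_boundary(addr: int, size: int,
                                boundary: int = 32) -> bool:
    """True if [addr, addr + size) crosses or ends on the boundary."""
    return addr // boundary != (addr + size) // boundary


def affected_at_runtime(section_base: int, offset: int, size: int,
                        boundary: int = 32) -> bool:
    """Evaluate the condition at the final address, which is the
    section's load address plus the offset chosen at assembly time."""
    return crosses_or_ends_on_boundary(section_base + offset, size,
                                       boundary)
```

For instance, a 2-byte branch at section offset 4 is fine when the
section starts at 0x1000 (32-byte aligned), but if the section were
placed at 0x101a, the same branch would land on addresses 0x101e to
0x101f and end on a boundary.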

