[PATCH 00/11] x86: NOP emission adjustments

Wed Sep 27 15:59:25 GMT 2023

On 27.09.2023 17:46, Jan Beulich via Binutils wrote:
> I've noticed a number of issues and inefficiencies.
> 
> 01: x86: record flag_code in tc_frag_data
> 02: x86: i386_generate_nops() may not derive decisions from global variables
> 03: x86: don't use 32-bit LEA as NOP surrogate in 64-bit code
> 04: x86: don't use operand size override with NOP in 16-bit code
> 05: x86: respect ".arch nonop" when selecting which NOPs to emit
> 06: x86: i686 != PentiumPro
> 07: x86: don't record full i386_cpu_flags in struct i386_tc_frag_data
> 08: x86: add a few more NOP patterns
> 09: x86: fold a few of the "alternative" NOP patterns
> 10: x86: fold NOP testcase expectations where possible
> 11: gas: make .nops output visible in listing

I should have mentioned one further observation: when we use LEA as a NOP
surrogate, we always use %si, %esi, or %rsi (depending on mode) as the
destination. I suspected this might not be optimal when such sequences
actually end up being executed, and indeed on one of the three systems I
checked (a Skylake) there was a reliably measurable difference between that
and alternating the destination registers. The question is whether that's
enough of a concern, given that we generally expect people to build 64-bit
code and not to use .arch .nonop.
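To make "alternating the destination registers" concrete at the byte level, here is a minimal sketch for 32-bit code only. It is not gas's implementation, and the names `lea_nop` and `lea_nop_run` are hypothetical. It encodes the 3-byte `lea 0x0(%reg),%reg` form (opcode 0x8D, ModRM with mod=01, reg=rm=register, disp8=0) and builds a run either with %esi only or cycling through several registers. Each such LEA reads and writes the same register, so a run using only %esi forms one dependency chain, while alternating registers splits it. That is one plausible reading of the Skylake measurement, not something the measurement itself establishes.

```python
# Hypothetical sketch, 32-bit mode only: encode "lea 0x0(%reg),%reg" NOP surrogates.
# In 64-bit code a 32-bit LEA destination would zero the upper half of the register,
# so it would not be a NOP there.

# Standard x86 register numbers; %esp (4) is left out because as a base it
# requires a SIB byte, and it is the stack pointer anyway.
REG_NUM = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "ebp": 5, "esi": 6, "edi": 7}


def lea_nop(reg: str) -> bytes:
    """Encode the 3-byte 'lea 0x0(%reg),%reg' (opcode 0x8D, mod=01, disp8=0)."""
    n = REG_NUM[reg]
    modrm = (0b01 << 6) | (n << 3) | n  # mod=01, reg=destination, rm=base
    return bytes([0x8D, modrm, 0x00])


def lea_nop_run(count: int, regs: tuple = ("esi",)) -> bytes:
    """Emit `count` LEA NOP surrogates, cycling through `regs` as destinations."""
    return b"".join(lea_nop(regs[i % len(regs)]) for i in range(count))


if __name__ == "__main__":
    print(lea_nop_run(3).hex())                   # single-register run (%esi only)
    print(lea_nop_run(3, ("esi", "edi")).hex())   # alternating %esi / %edi
```

Running it prints `8d76008d76008d7600` for the %esi-only run and `8d76008d7f008d7600` for the alternating one. Both runs have the same length, so only the register choice changes, not the padding size.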

Jan