New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
Fri Feb 9 11:35:00 GMT 2018

On 09/02/18 02:22, H.J. Lu wrote:
> On Thu, Feb 8, 2018 at 5:14 PM, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper wrote:
>>> On 09/02/2018 00:24, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper wrote:
>>>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper wrote:
>>>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu wrote:
>>>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper wrote:
>>>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>>> pseudo-NOP N
>>>>>>>>>>> which generates a long NOP with N bytes.  Is that correct?  If yes,
>>>>>>>>>>> what is the range of N?
>>>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>>> The real answer probably depends at what point it is more efficient to
>>>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>> How about
>>>>>>>>> {nop} N
>>>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>>>> instruction over nops.  Does it work for you?
>>>>>>>> N will be limited to 255.
>>>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>>>> where the split lies, and I expect it is very uarch specific.
>>>>>> How about this
>>>>>> {nop} N, L
>>>>>> {nop} N
>>>>>> N is <= 255. If L is missing, L is 15.
>>>>>> If N < L then
>>>>>>   Long NOPs up to N bytes
>>>>>> else
>>>>>>   jmp + long nops up to N bytes.
>>>>>> fi
>>>>> I'm afraid that I don't think that will be very helpful in that form.
>>>>> Are there technical reasons why you don't want to emit more than a
>>>>> single 15byte long nop?
>>>> Doesn't
>>>> {nop} 28, 40
>>>> generate 2 x 14-byte nops?
>>> By the above logic, yes.  I still don't see the value in the L
>>> parameter, because I don't expect an average programmer to know how to
>>> choose it sensibly.  Then again, a compiler generating code for a
>>> specified uarch probably could have some idea of what value to feed in.
>>> If the semantics were a little more like:
>>> {nop} N => N bytes of nops with no jumps
>>> {nop} N, L => as above
>>> Then this might be more useful.
>>> I expect N will typically be an expression rather than an absolute
>>> number, because the usecase I've proposed is for filling in a specific,
>>> calculated number of bytes.  (In particular, what commonly happens is
>>> that memory references in alternatives are the thing which cause the
>>> exact length to fluctuate.)  When there is a sensible uarch value for L,
>>> that can be fed in, but shouldn't be mandatory.  In particular, if it
>>> is unknown, 15 is almost certainly the wrong default for it.
>> So, you want
>> .nop SIZE
>> and
>> .jump SIZE
>> which are similar to '.skip SIZE , FILL'.  But they fill SIZE with nops or
>> jmp + nops.
> Or
> If SIZE < JUMP_SIZE then
>   SIZE of nops.
> else
>   SIZE of jmp + nops.
> fi

I'm still not sure why you want the jump functionality in the first
place, but yes - this latest option would work.

FWIW, jumping over code with alternatives is typically done like:

ALTERNATIVE "jmp .L\@_skip", "", FEATURE_X

At which point it is only the 2 or 5 byte jmp which is being
dynamically modified.  The converse case is where we begin with 2 or 5
bytes of nops, and dynamically insert the jmp.
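
For reference, the 2 and 5 byte sizes correspond to the short (EB rel8)
and near (E9 rel32) forms of jmp.  A minimal Python sketch of choosing
between them, assuming 32-bit or 64-bit mode; the helper name encode_jmp
is hypothetical and not part of any tool discussed here:

```python
import struct

def encode_jmp(skip: int) -> bytes:
    """Encode a jmp whose target is `skip` bytes past the end of the jmp.

    Hypothetical helper for illustration only.  Uses the 2-byte short
    form when the displacement fits in a signed byte, otherwise the
    5-byte near form (32-bit/64-bit mode assumed).
    """
    if -128 <= skip <= 127:
        # EB cb: jmp rel8
        return b"\xeb" + struct.pack("<b", skip)
    # E9 cd: jmp rel32
    return b"\xe9" + struct.pack("<i", skip)

if __name__ == "__main__":
    print(encode_jmp(10).hex())   # short form
    print(encode_jmp(200).hex())  # near form
```

The displacement is measured from the end of the jmp itself, which is
why skipping a block only requires knowing the block's length.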

If we're in the line for other related feature requests, how about being
able to optionally specify the maximum length of individual nops?  e.g.

.nop SIZE [, MAX_NOP = 9 [, JUMP_SIZE = -1]]

If SIZE < JUMP_SIZE then
  SIZE of nops (of MAX_NOP len or less).
else
  SIZE of jmp + nops.
fi
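
As a rough illustration of how such a directive might split up SIZE
bytes, here is a Python sketch.  It is not how gas implements anything;
the function name plan_nop_fill is hypothetical, and treating a negative
JUMP_SIZE as "never emit a jmp" is an assumption made for this example.

```python
def plan_nop_fill(size, max_nop=9, jump_size=-1):
    """Return a list of (kind, length) pieces covering exactly `size` bytes.

    Assumptions for this sketch (not taken from any real assembler):
      * a negative jump_size means "never emit a jmp";
      * a jmp is 2 bytes if the bytes it skips fit in rel8, else 5 bytes;
      * if the region is too small to hold a jmp, fall back to nops.
    """
    if size < 0 or max_nop < 1:
        raise ValueError("size must be >= 0 and max_nop >= 1")

    pieces = []
    remaining = size

    if jump_size >= 0 and size >= jump_size and size >= 2:
        # The jmp skips everything after itself within the region.
        jmp_len = 2 if size - 2 <= 127 else 5
        pieces.append(("jmp", jmp_len))
        remaining -= jmp_len

    # Fill the rest with nops no longer than max_nop each.
    while remaining > 0:
        n = min(remaining, max_nop)
        pieces.append(("nop", n))
        remaining -= n

    return pieces

if __name__ == "__main__":
    print(plan_nop_fill(43))               # nops only
    print(plan_nop_fill(43, 9, 40))        # jmp, then nops as padding
```

With the defaults, the existing N=43 case becomes four 9-byte nops and
one 7-byte nop.  Passing a JUMP_SIZE of 40 instead yields a 2-byte jmp
followed by 41 bytes of nops that are never executed.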

uarch considerations also affect the maximum length of long nops which
can be executed without suffering decode stalls.  A sensible default (on
64-bit capable processors) is 9, rather than the 15 which would be the
more obvious answer.  However, in the case of inserting the jmp, we
don't end up executing the nops, at which point decode stalls are not of
any concern.
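
To make the MAX_NOP = 9 case concrete, the following Python sketch fills
a region using the recommended multi-byte NOP encodings from the Intel
SDM, which are listed for lengths 1 to 9.  The names LONG_NOPS and
nop_bytes are hypothetical.  Longer nops (up to 15 bytes) need extra
prefixes and are deliberately not covered here.

```python
# Recommended multi-byte NOP encodings (Intel SDM, NOP instruction),
# indexed by length in bytes.
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}

def nop_bytes(size, max_nop=9):
    """Return `size` bytes of nops, each individual nop <= max_nop bytes.

    Hypothetical helper: only lengths 1..9 are supported, matching the
    table above.
    """
    if size < 0:
        raise ValueError("size must be >= 0")
    if not 1 <= max_nop <= 9:
        raise ValueError("this sketch only supports max_nop in 1..9")
    out = bytearray()
    while size > 0:
        n = min(size, max_nop)
        out += LONG_NOPS[n]
        size -= n
    return bytes(out)

if __name__ == "__main__":
    print(nop_bytes(43).hex())
```

Emitting the largest permitted nop first keeps the instruction count
low, which is the point of using long nops rather than a run of
single-byte 0x90s.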

