New .nops directive, to aid Linux alternatives patching?

H.J. Lu hjl.tools@gmail.com
Sat Feb 10 15:44:00 GMT 2018


On Fri, Feb 9, 2018 at 5:29 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 09/02/18 11:55, H.J. Lu wrote:
>> On Fri, Feb 9, 2018 at 3:35 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 09/02/18 02:22, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 5:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> On 09/02/2018 00:24, H.J. Lu wrote:
>>>>>>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>>>>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> pseudo-NOP N
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> which generates a long NOP of N bytes.  Is that correct?  If yes,
>>>>>>>>>>>>>> what is the range of N?
>>>>>>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>>>>>>
>>>>>>>>>>>> How about
>>>>>>>>>>>>
>>>>>>>>>>>> {nop} N
>>>>>>>>>>>>
>>>>>>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>>>>>>> instruction over nops.  Does it work for you?
>>>>>>>>>>> N will be limited to 255.
>>>>>>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>>>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>>>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>>>>>>> where the split lies, and I expect it is very uarch specific.
>>>>>>>>> How about this
>>>>>>>>>
>>>>>>>>> {nop} N, L
>>>>>>>>> {nop} N
>>>>>>>>>
>>>>>>>>> N is <= 255.  If L is missing, L is 15.
>>>>>>>>>
>>>>>>>>> If N < L then
>>>>>>>>>   Long NOPs up to N bytes
>>>>>>>>> else
>>>>>>>>>   jmp + long nops up to N bytes.
>>>>>>>>> fi
>>>>>>>> I'm afraid that I don't think that will be very helpful in that form.
>>>>>>>> Are there technical reasons why you don't want to emit more than a
>>>>>>>> single 15byte long nop?
>>>>>>>>
>>>>>>> Doesn't
>>>>>>>
>>>>>>> {nop} 28, 40
>>>>>>>
>>>>>>> generate 2 x 14-byte nops?
>>>>>> By the above logic, yes.  I still don't see the value in the L
>>>>>> parameter, because I don't expect an average programmer to know how to
>>>>>> choose it sensibly.  Then again, a compiler generating code for a
>>>>>> specified uarch probably could have some idea of what value to feed in.
>>>>>>
>>>>>> If the semantics were a little more like:
>>>>>>
>>>>>> {nop} N => N bytes of nops with no jumps
>>>>>> {nop} N, L => as above
>>>>>>
>>>>>> Then this might be more useful.
>>>>>>
>>>>>> I expect N will typically be an expression rather than an absolute
>>>>>> number, because the usecase I've proposed is for filling in a specific,
>>>>>> calculated number of bytes.  (In particular, what commonly happens is
>>>>>> that memory references in alternatives are the thing which cause the
>>>>>> exact length to fluctuate.)  When there is a sensible uarch value for L,
>>>>>> that can be fed in, but shouldn't be mandatory.  In particular, if it
>>>>>> is unknown, 15 is almost certainly the wrong default for it.
>>>>> So, you want
>>>>>
>>>>> .nop SIZE
>>>>>
>>>>> and
>>>>>
>>>>> .jump SIZE
>>>>>
>>>>> which are similar to '.skip SIZE , FILL'.  But they fill SIZE with nops or
>>>>> jmp + nops.
>>>>>
>>>> Or
>>>>
>>>> .nop SIZE, JUMP_SIZE
>>>>
>>>> If SIZE < JUMP_SIZE then
>>>>   SIZE of nops.
>>>> else
>>>>   SIZE of jmp + nops.
>>>> fi
>>> I'm still not sure why you want the jump functionality in the first
>>> place, but yes - this latest option would work.
>>>
>>> FWIW, jumping over code with alternatives is typically done like:
>>>
>>> ALTERNATIVE "jmp .L\@_skip", "", FEATURE_X
>>> ...
>>> .L\@_skip:
>>>
>>> At which point it is only the 2- or 5-byte jmp which is being
>>> dynamically modified.  The converse case is where we begin with 2 or 5
>>> bytes of nops, and dynamically insert the jmp.
>>>
>>> If we're in the line for other related feature requests, how about being
>>> able to optionally specify the maximum length of individual nops?  e.g.
>>>
>>> .nop SIZE [, MAX_NOP = 9 [, JUMP_SIZE = -1]]
>> OK, let's go with
>>
>>  .nop SIZE [, MAX_NOP = 9]
>>
>> It is easier to implement with 2 arguments.   MAX_NOP must be a constant.
>
> Sounds good to me.

Please try users/hjl/nop branch:

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/nop

It implements:

.nop SIZE [, MAX_NOP = 10]

The maximum SIZE is 255.
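
To make the intended semantics concrete, here is a rough Python model of how
SIZE bytes might be covered by individual nops of at most MAX_NOP bytes each.
This is only a sketch: the function name split_nops is hypothetical, the
greedy "largest nops first" split and the 1..15 bound on MAX_NOP are
assumptions (15 being the x86 instruction length limit), and the branch may
well choose a different split.

```python
def split_nops(size, max_nop=10):
    """Return the byte lengths of the nops used to fill `size` bytes.

    Hypothetical model of `.nop SIZE [, MAX_NOP]`; not the assembler's code.
    """
    # The thread states that SIZE is limited to 255.
    if not 0 <= size <= 255:
        raise ValueError("SIZE must be in the range 0..255")
    # Assumption: a single nop cannot exceed the 15-byte x86 instruction limit.
    if not 1 <= max_nop <= 15:
        raise ValueError("MAX_NOP must be in the range 1..15")
    # Assumed greedy split: as many MAX_NOP-byte nops as fit, then a remainder.
    full, rest = divmod(size, max_nop)
    return [max_nop] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # The existing N=43 user mentioned earlier in the thread.
    print(split_nops(43))
    # The "{nop} 28" case discussed above, with nops capped at 14 bytes.
    print(split_nops(28, 14))
```

Whatever split the assembler actually uses, the property that matters for
alternatives patching is that the lengths always add up to exactly SIZE.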

-- 
H.J.


