New .nops directive, to aid Linux alternatives patching?

H.J. Lu hjl.tools@gmail.com
Fri Feb 9 01:14:00 GMT 2018


On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 09/02/2018 00:24, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>>> <andrew.cooper3@citrix.com> wrote:
>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>
>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>
>>>>>>>>> pseudo-NOP N
>>>>>>>>>
>>>>>>>>> which generates a long NOP with N bytes.  Is that correct?  If yes,
>>>>>>>>> what is the range of N?
>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>
>>>>>>>> The real answer probably depends on the point at which it is more efficient to
>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>
>>>>>>> How about
>>>>>>>
>>>>>>> {nop} N
>>>>>>>
>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>> instruction over nops.  Does it work for you?
>>>>>> N will be limited to 255.
>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>> where the split lies, and I expect it is very uarch specific.
>>>> How about this
>>>>
>>>> {nop} N, L
>>>> {nop} N
>>>>
>>>> N is <= 255. If L is missing, L is 15.
>>>>
>>>> If N < L then
>>>>   Long NOPs up to N bytes
>>>> else
>>>>   jmp + long nops up to N bytes.
>>>> fi
>>> I'm afraid that I don't think that will be very helpful in that form.
>>> Are there technical reasons why you don't want to emit more than a
>>> single 15byte long nop?
>>>
>> Doesn't
>>
>> {nop} 28, 40
>>
>> generate 2 x 14-byte nops?
>
> By the above logic, yes.  I still don't see the value in the L
> parameter, because I don't expect an average programmer to know how to
> choose it sensibly.  Then again, a compiler generating code for a
> specified uarch probably could have some idea of what value to feed in.
>
> If the semantics were a little more like:
>
> {nop} N => N bytes of nops with no jumps
> {nop} N, L => as above
>
> Then this might be more useful.
>
> I expect N will typically be an expression rather than an absolute
> number, because the use case I've proposed is for filling in a specific,
> calculated number of bytes.  (In particular, what commonly happens is
> that memory references in alternatives are the thing which cause the
> exact length to fluctuate.)  When there is a sensible uarch value for L,
> that can be fed in, but shouldn't be mandatory.  In particular, if it is
> unknown, 15 is almost certainly the wrong default for it.

So, you want

.nop SIZE

and

.jump SIZE

which are similar to '.skip SIZE, FILL'.  But they fill SIZE with nops or
jmp + nops.
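
To make the two fill strategies concrete, here is a rough Python sketch of
how SIZE bytes might be split up.  It only plans instruction lengths and
emits no encodings.  The function names (nop_fill, jump_fill, fill) are
made up for illustration, and several things are assumptions rather than
anything settled in this thread: each nop is capped at 15 bytes, the split
is balanced (so 28 bytes becomes 2 x 14, as in the example above), and the
jump is a 2-byte short jmp when its rel8 displacement reaches the end of
the padding, else a 5-byte near jmp as in 32/64-bit mode.

```python
MAX_NOP = 15    # assumed cap: no x86 instruction may exceed 15 bytes
MAX_SIZE = 255  # limit on N mentioned earlier in the thread


def nop_fill(size):
    """Plan `size` bytes as balanced long-nop lengths, with no jump."""
    if not 0 <= size <= MAX_SIZE:
        raise ValueError("size must be in 0..255")
    if size == 0:
        return []
    count = (size + MAX_NOP - 1) // MAX_NOP   # fewest nops that can cover size
    base, extra = divmod(size, count)
    # `extra` nops get one more byte so that the lengths sum to `size`.
    return [base + 1] * extra + [base] * (count - extra)


def jump_fill(size):
    """Plan `size` bytes as (jmp length, nop lengths): a jmp over nops."""
    if not 0 <= size <= MAX_SIZE:
        raise ValueError("size must be in 0..255")
    if size < 2:
        # Too small to hold even a short jmp; fall back to plain nops.
        return 0, nop_fill(size)
    # A short jmp (2 bytes) has a signed 8-bit displacement, at most +127.
    # Beyond that, assume a near jmp with a 32-bit displacement (5 bytes).
    jmp_len = 2 if size - 2 <= 127 else 5
    return jmp_len, nop_fill(size - jmp_len)


def fill(n, l=None):
    """Hypothetical '{nop} N[, L]': nops only unless L is given and N >= L."""
    if l is None or n < l:
        return 0, nop_fill(n)
    return jump_fill(n)
```

With these assumptions, fill(28, 40) gives (0, [14, 14]), and jump_fill(43)
gives (2, [14, 14, 13]): a short jmp followed by 41 bytes of nops.  Leaving
L out never produces a jump, which matches the "N bytes of nops with no
jumps" semantics suggested above.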

-- 
H.J.


