This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: New .nops directive, to aid Linux alternatives patching?


On 09/02/2018 00:24, H.J. Lu wrote:
> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 08/02/2018 20:36, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>
>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>
>>>>>>>> pseudo-NOP N
>>>>>>>>
>>>>>>>> which generates a long NOP of N bytes.  Is that correct?  If so,
>>>>>>>> what is the range of N?
>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>
>>>>>>> The real answer properly depends on the point at which it is more
>>>>>>> efficient to jmp rather than waste decode bandwidth decoding nops.  I
>>>>>>> don't know the answer, but expect that it isn't larger than 255.
>>>>>>>
>>>>>> How about
>>>>>>
>>>>>> {nop} N
>>>>>>
>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>> instruction over nops.  Does it work for you?
>>>>> N will be limited to 255.
>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>> where the split lies, and I expect it is very uarch specific.
>>> How about this
>>>
>>> {nop} N, L
>>> {nop} N
>>>
>>> N is <= 255. If L is missing, L is 15.
>>>
>>> If N < L then
>>>   Long NOPs up to N bytes
>>> else
>>>   jmp + long nops up to N bytes.
>>> fi
>> I'm afraid that I don't think that will be very helpful in that form.
>> Are there technical reasons why you don't want to emit more than a
>> single 15-byte long nop?
>>
> Doesn't
>
> {nop} 28, 40
>
> generate 2 x 14-byte nops?

By the above logic, yes.  I still don't see the value in the L
parameter, because I don't expect an average programmer to know how to
choose it sensibly.  Then again, a compiler generating code for a
specified uarch probably could have some idea of what value to feed in.

If the semantics were a little more like:

{nop} N => N bytes of nops with no jumps
{nop} N, L => as above

Then this might be more useful.
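
To make those semantics concrete, here is a rough model in Python.  It is
only a sketch of the proposal, not anything gas implements: the name
plan_nops, the even split of the padding (chosen so that 28 bytes come
out as 2 x 14, matching the example above), and the choice between a
2-byte short jmp and a 5-byte near jmp are all my assumptions.

```python
MAX_NOP = 15      # longest legal x86 instruction, so the longest single nop
MAX_TOTAL = 255   # upper bound on N discussed in this thread
SHORT_JMP = 2     # jmp rel8: opcode plus 8-bit displacement
NEAR_JMP = 5      # jmp rel32: opcode plus 32-bit displacement
REL8_MAX = 127    # furthest forward distance a short jmp can cover


def split_nops(count):
    """Split `count` bytes into as few nops as possible, sized evenly."""
    if count == 0:
        return []
    pieces = (count + MAX_NOP - 1) // MAX_NOP
    base, extra = divmod(count, pieces)
    return [("nop", base + 1)] * extra + [("nop", base)] * (pieces - extra)


def plan_nops(n, l=None):
    """Model of `{nop} N` (no jumps) and `{nop} N, L` (jmp once N >= L)."""
    if not 0 <= n <= MAX_TOTAL:
        raise ValueError("N must be in the range 0..255")
    # Without L, or below the threshold, emit nothing but nops.
    if l is None or n < l or n < SHORT_JMP:
        return split_nops(n)
    # Otherwise jump over the padding, using the shortest jmp that reaches.
    jmp_len = SHORT_JMP if n - SHORT_JMP <= REL8_MAX else NEAR_JMP
    return [("jmp", jmp_len)] + split_nops(n - jmp_len)


print(plan_nops(28, 40))
print(plan_nops(43))
print(plan_nops(43, 40))
```

Leaving L out never produces a jmp, which is the behaviour I would want
by default; a caller that does know a sensible threshold for its uarch
passes it explicitly.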

I expect N will typically be an expression rather than an absolute
number, because the use case I've proposed is for filling in a specific,
calculated number of bytes.  (In particular, what commonly happens is
that memory references in alternatives are what cause the exact length
to fluctuate.)  When there is a sensible uarch value for L, that can be
fed in, but it shouldn't be mandatory.  In particular, if it is unknown,
15 is almost certainly the wrong default for it.
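
As an illustration of why N ends up being calculated, here is a small
Python sketch.  It assumes the padding is simply the amount by which the
replacement is longer than the original sequence; the function name and
the byte counts are hypothetical, and in the assembler the two lengths
would be differences between labels rather than literal numbers.

```python
def nop_padding(orig_len, repl_len):
    """Bytes of nops needed after the original so the replacement fits."""
    if orig_len < 0 or repl_len < 0:
        raise ValueError("lengths must be non-negative")
    # No padding is needed when the replacement is not longer.
    return max(0, repl_len - orig_len)


# A memory operand encoded with an 8-bit displacement is 3 bytes shorter
# than the same operand with a 32-bit displacement, so the padding moves
# by 3 bytes depending on which form the assembler picks.
print(nop_padding(orig_len=3, repl_len=7))    # replacement using disp8
print(nop_padding(orig_len=3, repl_len=10))   # replacement using disp32
```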

~Andrew

