This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: New .nops directive, to aid Linux alternatives patching?
On 09/02/2018 00:24, H.J. Lu wrote:
> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 08/02/2018 20:36, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>
>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>
>>>>>>>> pseudo-NOP N
>>>>>>>>
>>>>>>>> which generates a long NOP with N bytes. Is that correct? If yes,
>>>>>>>> what is the range of N?
>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>> ought to be long enough for anyone. There is one existing user for
>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>
>>>>>>> The real answer probably depends on at what point it is more efficient to
>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>
>>>>>> How about
>>>>>>
>>>>>> {nop} N
>>>>>>
>>>>>> If N is less than 15 bytes, it generates a long nop. Otherwise, we use a jump
>>>>>> instruction over nops. Does it work for you?
>>>>> N will be limited to 255.
>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>> over 15 bytes? For alternatives in the range of 15-30, a jmp is almost
>>>> certainly slower than executing through the nops. The ORM isn't clear
>>>> on where the split lies, and I expect it is very uarch-specific.
>>> How about this
>>>
>>> {nop} N, L
>>> {nop} N
>>>
>>> N <= 255. If L is missing, L is 15.
>>>
>>> If N < L then
>>> Long NOPs up to N bytes
>>> else
>>> jmp + long nops up to N bytes.
>>> fi
>> I'm afraid that I don't think that will be very helpful in that form.
>> Are there technical reasons why you don't want to emit more than a
>> single 15-byte long nop?
>>
> Doesn't
>
> {nop} 28, 40
>
> generate 2 x 14-byte nops?
By the above logic, yes. I still don't see the value in the L
parameter, because I don't expect an average programmer to know how to
choose it sensibly. Then again, a compiler generating code for a
specified uarch probably could have some idea of what value to feed in.

If the semantics were a little more like:

{nop} N => N bytes of nops with no jumps
{nop} N, L => as in your proposal above (jmp + long nops once N >= L)

Then this might be more useful.
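To make the two proposed semantics concrete, here is a minimal Python sketch of what an assembler might emit for `{nop} N[, L]`. It is illustrative only: `long_nop`, `fill_nops` and `nop_directive` are hypothetical names, the 1-9 byte encodings are the recommended multi-byte NOP sequences from the Intel SDM, and building 10-15 byte NOPs by prepending redundant 0x66 prefixes is an assumption (some cores decode long prefix runs slowly). With `limit=None` it follows the "no jumps" semantics suggested here; passing `limit=15` mimics the earlier proposal where a missing L means 15.

```python
from typing import Optional

# Recommended multi-byte NOP encodings (Intel SDM), keyed by length in bytes.
BASE_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}
MAX_NOP = 15  # architectural limit on x86 instruction length


def long_nop(size: int) -> bytes:
    """Return a single NOP instruction of `size` bytes (1..15)."""
    if not 1 <= size <= MAX_NOP:
        raise ValueError("x86 instructions are 1..15 bytes long")
    if size <= 9:
        return BASE_NOPS[size]
    # Assumption: lengthen the 9-byte form with redundant 0x66 prefixes.
    return bytes([0x66] * (size - 9)) + BASE_NOPS[9]


def fill_nops(n: int) -> bytes:
    """Return n bytes of NOPs, split as evenly as possible into <=15-byte NOPs."""
    if n == 0:
        return b""
    count = -(-n // MAX_NOP)  # ceiling division: fewest NOPs that fit
    base, extra = divmod(n, count)
    # The first `extra` NOPs get one extra byte so the lengths sum to n.
    return b"".join(long_nop(base + (1 if i < extra else 0)) for i in range(count))


def nop_directive(n: int, limit: Optional[int] = None) -> bytes:
    """Hypothetical '{nop} N[, L]': plain NOPs unless L is given and N >= L."""
    if not 0 <= n <= 255:
        raise ValueError("N must be in 0..255")
    if limit is None or n < limit or n < 2:
        return fill_nops(n)
    if n - 2 <= 127:
        jmp = bytes([0xEB, n - 2])  # jmp rel8 to the end of the region
    else:
        jmp = bytes([0xE9]) + (n - 5).to_bytes(4, "little")  # jmp rel32
    # The bytes after the jmp are never executed; fill them with NOPs anyway.
    return jmp + fill_nops(n - len(jmp))
```

For example, `nop_directive(28, 40)` yields two 14-byte NOPs, matching the `{nop} 28, 40` case discussed above, while `nop_directive(43, 15)` begins with `EB 29`, a short jmp over the remaining 41 bytes. Splitting evenly rather than greedily is just one possible choice; a real assembler might prefer a different split.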

I expect N will typically be an expression rather than an absolute
number, because the use case I've proposed is for filling in a specific,
calculated number of bytes. (In particular, what commonly happens is
that memory references in alternatives are the thing which cause the
exact length to fluctuate.) When there is a sensible uarch value for L,
that can be fed in, but it shouldn't be mandatory. In particular, if it
is unknown, 15 is almost certainly the wrong default for it.

~Andrew
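The point that memory references make the exact length fluctuate can be illustrated with `mov eax, [rbx + disp]`: its encoding grows from 2 to 3 to 6 bytes as the displacement widens, so the padding needed at an alternatives site is a computed value rather than a constant. The Python sketch below is hypothetical: `site_padding` and `MOV_EAX_RBX` are illustrative names, and it assumes that the original instructions are padded up to the length of the longest replacement.

```python
from typing import Sequence

# 64-bit encodings of `mov eax, [rbx + disp]` (opcode 8B /r); only the
# ModRM byte and the displacement width change with the displacement size.
MOV_EAX_RBX = {
    0x0: bytes([0x8B, 0x03]),                             # mod=00, no displacement
    0x8: bytes([0x8B, 0x43, 0x08]),                       # mod=01, disp8
    0x1000: bytes([0x8B, 0x83, 0x00, 0x10, 0x00, 0x00]),  # mod=10, disp32
}


def site_padding(original: bytes, replacements: Sequence[bytes]) -> int:
    """NOP bytes needed after `original` so the site can hold any replacement.

    Hypothetical helper: assumes the site is sized to the longest sequence.
    """
    longest = max([len(original)] + [len(r) for r in replacements])
    return longest - len(original)


# Same instruction, different displacement: the padding is a computed value.
print(site_padding(MOV_EAX_RBX[0x0], [MOV_EAX_RBX[0x1000]]))   # 4
print(site_padding(MOV_EAX_RBX[0x1000], [MOV_EAX_RBX[0x8]]))   # 0
```

In assembler source the same quantity would naturally be written as an expression over label differences, which is why N is expected to be an expression rather than a literal.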