New .nops directive, to aid Linux alternatives patching?

H.J. Lu hjl.tools@gmail.com
Mon Feb 12 13:48:00 GMT 2018


On Mon, Feb 12, 2018 at 5:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Feb 12, 2018 at 3:26 AM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 12/02/18 00:26, H.J. Lu wrote:
>>> On Sun, Feb 11, 2018 at 4:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Sun, Feb 11, 2018 at 3:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Sun, Feb 11, 2018 at 3:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Sun, Feb 11, 2018 at 10:58 AM, Andrew Cooper
>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 11/02/2018 17:19, H.J. Lu wrote:
>>>>>>>> On Sun, Feb 11, 2018 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>>> On Sun, Feb 11, 2018 at 8:25 AM, Andrew Cooper
>>>>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 11/02/2018 00:59, H.J. Lu wrote:
>>>>>>>>>>>>> Please try users/hjl/nop branch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/nop
>>>>>>>>>>>> Oh - thankyou!  I was about to ask if there were any pointers to get
>>>>>>>>>>>> started hacking on binutils.
>>>>>>>>>>>>
>>>>>>>>>>>> As for the functionality, there are unfortunately some issues.  Given
>>>>>>>>>>>> this source:
>>>>>>>>>>>>
>>>>>>>>>>>>         .text
>>>>>>>>>>>> single:
>>>>>>>>>>>>         nop
>>>>>>>>>>>>
>>>>>>>>>>>> pseudo_1:
>>>>>>>>>>>>         .nop 1
>>>>>>>>>>>>
>>>>>>>>>>>> pseudo_8:
>>>>>>>>>>>>         .nop 8
>>>>>>>>>>>>
>>>>>>>>>>>> pseudo_8_4:
>>>>>>>>>>>>         .nop 8, 4
>>>>>>>>>>>>
>>>>>>>>>>>> pseudo_20:
>>>>>>>>>>>>         .nop 20
>>>>>>>>>>>>
>>>>>>>>>>>> I get the following disassembly:
>>>>>>>>>>>>
>>>>>>>>>>>> 0000000000000000 <single>:
>>>>>>>>>>>>    0:    90                       nop
>>>>>>>>>>>>
>>>>>>>>>>>> 0000000000000001 <pseudo_1>:
>>>>>>>>>>>>    1:    66 90                    xchg   %ax,%ax
>>>>>>>>>>>>
>>>>>>>>>>>> 0000000000000003 <pseudo_8>:
>>>>>>>>>>>>    3:    66 0f 1f 84 00 00 00     nopw   0x0(%rax,%rax,1)
>>>>>>>>>>>>    a:    00 00
>>>>>>>>>>>>
>>>>>>>>>>>> 000000000000000c <pseudo_8_4>:
>>>>>>>>>>>>    c:    90                       nop
>>>>>>>>>>>>    d:    0f 1f 40 00              nopl   0x0(%rax)
>>>>>>>>>>>>   11:    0f 1f 40 00              nopl   0x0(%rax)
>>>>>>>>>>>>
>>>>>>>>>>>> 0000000000000015 <pseudo_20>:
>>>>>>>>>>>>   15:    90                       nop
>>>>>>>>>>>>   16:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
>>>>>>>>>>>>   1d:    00 00 00
>>>>>>>>>>>>   20:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
>>>>>>>>>>>>   27:    00 00 00
>>>>>>>>>>>>
>>>>>>>>>>>> The MAX_NOP part looks to be working as intended (including reducing
>>>>>>>>>>>> below the default of 10), but there appears to be an off-by-one
>>>>>>>>>>>> somewhere, as one too many nops are emitted in the block.
>>>>>>>>>>>>
>>>>>>>>>>>> Furthermore, attempting to use .nop 30 yields:
>>>>>>>>>>>>
>>>>>>>>>>>> /tmp/ccI2Eakp.s: Assembler messages:
>>>>>>>>>>>> /tmp/ccI2Eakp.s: Fatal error: can't write 145268933551616 bytes to
>>>>>>>>>>>> section .text of nops.o: 'Bad value'
>>>>>>>>>>> Please try my branch again.  It should be fixed.
>>>>>>>>>> Thanks.  All of that looks to be in order.
>>>>>>>>>>
>>>>>>>>>> However, when trying to build larger examples, I've started hitting:
>>>>>>>>>>
>>>>>>>>>> /tmp/ccvxOy2v.s: Assembler messages:
>>>>>>>>>> /tmp/ccvxOy2v.s: Internal error in md_convert_frag at
>>>>>>>>>> ../../gas/config/tc-i386.c:9510.
>>>>>>>>>>
>>>>>>>>>> Which is the gas_assert (fragP->fr_var != BFD_RELOC_X86_NOP); you've added.
>>>>>>>>>>
>>>>>>>>>> It occurs when the calculation of the number of nops to insert evaluates
>>>>>>>>>> to 0, and a simple ".nop 0" managed to reproduce the issue.  The
>>>>>>>>>> calculation evaluating to 0 is a side effect of the existing logic to
>>>>>>>>>> evaluate how much, if an, padding is required, and follows this kind of
>>>>>>>>>> pattern:
>>>>>>>>>>
>>>>>>>>> It should be fixed now.  I also added 11-byte nop for 64-bit:
>>>>>>>>>
>>>>>>>>> 67 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:0x0(%eax,%eax,1)
>>>>>>>>>
>>>>>>>> I implemented:
>>>>>>>>
>>>>>>>> .nop SIZE [, MAX_NOP]
>>>>>>>>
>>>>>>>> where the maximum size is 255 bytes.  Should we go with
>>>>>>>>
>>>>>>>> .nop MAX_SIZE, SIZE [, MAX_NOP]
>>>>>>>>
>>>>>>>> to support more than 255 bytes?
>>>>>>> If you were to do that, why not simply remove the 255 maximum limit,
>>>>>>> rather than having a user pass two identical numbers?  That said, I
>>>>>>> think the current implementation with 255 is probably fine; My example
>>>>>>> of ~45 is pushing it, but I expect that any example trying to use 64 or
>>>>>>> more almost certainly has a better way to do the same thing.
>>>>>>>
>>>>>>> As for your latest branch, I've found one very curious failure which I'm
>>>>>>> at a loss to explain.  Its all building fine, except for one single
>>>>>>> RSB-stuffing alternative in VT-x vmexit handler.  The alternative in
>>>>>>> question should be 0 +21 nops padding, optionally replaced with 21 bytes
>>>>>>> of actual RSB-stuffing, and several identical copies of this alternative
>>>>>>> elsewhere appear to be working correctly.
>>>>>>>
>>>>>>> Using your latest branch, when building using .skip, everything works
>>>>>>> correctly, but when building with .nop, the calculation believes that
>>>>>>> there are only 3 bytes of padding necessary, and trip the assertion that
>>>>>>> the replacement length is not longer than original length.
>>>>>>>
>>>>>>> At a guess, I'd say that something is suspect with the relocation
>>>>>>> calculations, but I have no idea what.
>>>>>>>
>>>>>>> I haven't managed to miniaturise the repro any further than this:
>>>>>>>
>>>>>>> Grab
>>>>>>> http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/alternatives-v1
>>>>>>> which is a branch cleaning up a load of our alternatives handling, and
>>>>>>> has support for .nop, and use the following build rune from the root of
>>>>>>> the tree:
>>>>>>>
>>>>>>> (cd xen/arch/x86/hvm/vmx;
>>>>>>> PATH=/local/bin/gcc-ret/bin:/local/bin/nops-binutils/bin:$PATH gcc
>>>>>>> -D__ASSEMBLY__ -m64 -DBUILD_ID -fno-strict-aliasing -Wall
>>>>>>> -Wstrict-prototypes -Wdeclaration-after-statement
>>>>>>> -Wno-unused-but-set-variable -Wno-unused-local-typedefs -O1
>>>>>>> -fno-omit-frame-pointer -nostdinc -fno-builtin -fno-common -Werror
>>>>>>> -Wredundant-decls -Wno-pointer-arith -pipe -g -D__XEN__ -include
>>>>>>> ../../../../include/xen/config.h '-D__OBJECT_FILE__="entry.o"'
>>>>>>> -Wa,--strip-local-absolute -MMD -MF ./.entry.o.d -I../../../../include
>>>>>>> -I../../../../include/asm-x86/mach-generic
>>>>>>> -I../../../../include/asm-x86/mach-default -DXEN_IMG_OFFSET=0x200000
>>>>>>> '-D__OBJECT_LABEL__=arch$x86$hvm$vmx$entry.o' -msoft-float
>>>>>>> -fno-stack-protector -fno-exceptions -Wnested-externs -DHAVE_GAS_VMX
>>>>>>> -DHAVE_GAS_SSE4_2 -DHAVE_GAS_EPT -DHAVE_GAS_RDRAND -DHAVE_GAS_FSGSBASE
>>>>>>> -DHAVE_GAS_RDSEED -DHAVE_GAS_LONG_NOPS -U__OBJECT_LABEL__
>>>>>>> -DHAVE_GAS_QUOTED_SYM '-D__OBJECT_LABEL__=arch/x86/hvm/vmx/entry.o'
>>>>>>> -mno-red-zone -fpic -fno-asynchronous-unwind-tables -mno-sse
>>>>>>> -mskip-rax-setup -DGCC_HAS_VISIBILITY_ATTRIBUTE
>>>>>>> -mindirect-branch=thunk-extern -mindirect-branch-register
>>>>>>> -DCONFIG_INDIRECT_THUNK -Wa,-I../../../../include -c entry.S -o entry.o)
>>>>>>>
>>>>>>> vmx's entry.S is fairly small, and in this example, I happen to be using
>>>>>>> one of your repoline branch versions of from "gcc (GCC) 7.2.1
>>>>>>> 20171218".  Substitute the PATH as appropriate, and the interesting bits
>>>>>>> of the ALTERNATIVE implementation are all in
>>>>>>> xen/include/asm-x86/alternative-asm.h
>>>>>> Is this the error message you saw:
>>>>>>
>>>>>> gcc -D__ASSEMBLY__ -m64 -DBUILD_ID -fno-strict-aliasing -Wall
>>>>>> -Wstrict-prototypes -Wdeclaration-after-statement
>>>>>> -Wno-unused-but-set-variable -Wno-unused-local-typedefs -O1
>>>>>> -fno-omit-frame-pointer -nostdinc -fno-builtin -fno-common -Werror
>>>>>> -Wredundant-decls -Wno-pointer-arith -pipe -g -D__XEN__ -include
>>>>>> /export/ssd/git/kernel.org/xen/xen/include/xen/config.h
>>>>>> '-D__OBJECT_FILE__="entry.o"' -Wa,--strip-local-absolute -MMD -MF
>>>>>> ./.entry.o.d -I/export/ssd/git/kernel.org/xen/xen/include
>>>>>> -I/export/ssd/git/kernel.org/xen/xen/include/asm-x86/mach-generic
>>>>>> -I/export/ssd/git/kernel.org/xen/xen/include/asm-x86/mach-default
>>>>>> -DXEN_IMG_OFFSET=0x200000
>>>>>> '-D__OBJECT_LABEL__=arch$x86$hvm$vmx$entry.o' -msoft-float
>>>>>> -fno-stack-protector -fno-exceptions -Wnested-externs -DHAVE_GAS_VMX
>>>>>> -DHAVE_GAS_SSE4_2 -DHAVE_GAS_EPT -DHAVE_GAS_RDRAND -DHAVE_GAS_FSGSBASE
>>>>>> -DHAVE_GAS_RDSEED -DHAVE_GAS_LONG_NOPS -U__OBJECT_LABEL__
>>>>>> -DHAVE_GAS_QUOTED_SYM '-D__OBJECT_LABEL__=arch/x86/hvm/vmx/entry.o'
>>>>>> -mno-red-zone -fpic -fno-asynchronous-unwind-tables -mno-sse
>>>>>> -mskip-rax-setup -DGCC_HAS_VISIBILITY_ATTRIBUTE
>>>>>> -mindirect-branch=thunk-extern -mindirect-branch-register
>>>>>> -DCONFIG_INDIRECT_THUNK
>>>>>> -Wa,-I/export/ssd/git/kernel.org/xen/xen/include -c entry.S
>>>>>> entry.S: Assembler messages:
>>>>>> entry.S:41: Error: value of 292 too large for field of 1 byte at 1
>>>>>>
>>>>>>
>>>>> I need a small testcase to work on assembler.  Please double
>>>>> check to verify that your change is correct.
>>>>>
>>>> Does it look a testcase?
>>>>
>>>> .macro mknops nr_bytes
>>>> #ifdef NOP
>>>>     .nop \nr_bytes, 9
>>>> #else
>>>>     .skip \nr_bytes, 9
>>>> #endif
>>>> .endm
>>>>
>>>> .L_orig_s:
>>>> .L_orig_e:
>>>>      mknops (-(((.L_repl_e1 - .L_repl_s1) - (.L_orig_e - .L_orig_s)) >
>>>> 0) * ((.L_repl_e1 - .L_repl_s1) - (.L_orig_e - .L_orig_s)))
>>>> .L_orig_p:
>>>>
>>>>     .byte 0xff + (.L_repl_e1 - .L_repl_s1) - (.L_orig_p - .L_orig_s)
>>>>     .section .altinstr_replacement, "ax", @progbits
>>>> .L_repl_s1:
>>>> .L_fill_rsb_loop:
>>>>     jnz .L_fill_rsb_loop
>>>>     mov %rax, %rsp
>>>> .L_repl_e1:
>>>>
>>>> [hjl@gnu-bdx-1 vmx]$ gcc -c y.S -DNOP
>>>> y.S: Assembler messages:
>>>> y.S:14: Error: value of 257 too large for field of 1 byte at 3
>>>> [hjl@gnu-bdx-1 vmx]$ gcc -c y.S
>>>> [hjl@gnu-bdx-1 vmx]$
>>>>
>>>>
>>> This is because I used a machine dependent relax state in
>>> assembler to implement this so that I only need to change
>>> the x86 specific part of assembler.  But it is also used to
>>> implement branches.  They won't work together.
>>>
>>> I need to add a new relax state.
>>>
>>
>> Oops sorry yes - I should have given you the error as well, but it looks
>> like you're on top of the problem.  If there is anything else I can do
>> to help, please ask.
>>
>
> It is fixed now.
>
> BTW, linker doesn't support "-r -s" on object generated by "gcc -g3".
> -g2 works.  I am working on a fix.
>

This:

https://sourceware.org/bugzilla/show_bug.cgi?id=22836


-- 
H.J.



More information about the Binutils mailing list