Handling of spaces between directives and their arguments when expanding macros is not consistent in GNU as -- it only supports selective directives. For example, the following code assembled with aarch64-linux-gnu-gcc, .macro foo insn \insn .endm foo .inst (XXX) However, if we replace the argument of the macro expansion with another directive, such as "foo .section .sec1", it will fail. This is the case on x86 as well. Removing the space and surround the section name with parenthesis will work, i.e. "foo .section(.sec1)". But the section name is (.sec1) instead of .sec1 as intended. Not sure if this is intended but it introduces some confusion IMO.
Hi caij, Which version of the binutils are you using for these tests ? When I tried to reproduce the problem using the current development sources I received the message "Error: too many positional arguments" for each invocation of foo. I did however find a variation that worked and did not have any behavioural inconsistencies: .macro foo insn:vararg \insn .endm foo .word (XXX) foo .word 0x1234 foo .section sec1 Using the :vararg suffix to tell the macro processor to use all of the text as a single argument (inside the macro) preserves spaces and results in the output that you would expect. Cheers Nick
Hi Nick, Thank you for the information. I used gas 2.33.1 $ as -v GNU assembler version 2.33.1 (aarch64-linux-gnu) using BFD version (GNU Binutils for Debian) 2.33.1 Does :vararg work for multiple arguments? Thanks.
Hi Jian, (In reply to Jian Cai from comment #2) > Does :vararg work for multiple arguments? Thanks. Yes it does: % cat fred.s .macro foo insn:vararg \insn .endm foo .section sec1 , "ax" , @progbits foo .section sec2 , "aw" , @note % as fred.s -o fred.o % readelf --wide --sections fred.o | grep sec [ 4] sec1 PROGBITS 0000000000000000 000040 000000 00 AX 0 0 1 [ 5] sec2 NOTE 0000000000000000 000040 000000 00 WA 0 0 1 Cheers Nick
Sorry I just realized my question was misleading. What I really meant was if vararg would work on a macro taking as input multiple directives with arguments, such as the one below. vararg seemed to not work with such case in my experiment. .macro foo insn1, insn2 \insn1 \insn2 .endm foo .section .sec1 foo .section .sec2
Hi Jian, > Sorry I just realized my question was misleading. What I really meant was if > vararg would work on a macro taking as input multiple directives with > arguments, such as the one below. vararg seemed to not work with such case > in my experiment. > > .macro foo insn1, insn2 > \insn1 > \insn2 > .endm > > foo .section .sec1 > foo .section .sec2 Well now that particular example would not work[1], but then it is not using the :vararg suffix either. You are correct however in assuming that you cannot have two :vararg suffixes in a macro definition. So for example this does not work: .macro foo insn1:vararg insn2:vararg \insn1 \insn2 .endm foo .section .sec1 .section .sec2 In fact only the last macro argument can have the :vararg suffix, which is why it can only be used once. Cheers Nick [1] Just to be pedantic, your example would work if you put the two insns on the same line. So this: .macro foo insn1 insn2 \insn1 \insn2 .endm foo .section .sec1 Does work.
> So this: > > .macro foo insn1 insn2 > \insn1 \insn2 > .endm > > foo .section .sec1 > > Does work. Thanks for the suggestion. I found this issue while working on https://github.com/ClangBuiltLinux/linux/issues/939. Essentially, the following case assembled with arm-linux-gnueabihf-gcc but not clang. $ cat bad.s .macro alternative_insn insn1 .endm alternative_insn .inst (0x0) There were several ways to make it work with clang, including removing the space between .inst and its argument, quoting the directive and its arguments, and now using vararg as you suggested. I made an LLVM patch for (https://reviews.llvm.org/D76962), which will make clang's integrated assembler to treat .inst and its arguments as one macro argument in the above example. It did not go though upstream review as it would create mismatch between clang and gcc, so I am curious if gcc is interested to have more general support, so that code like the example in Comment 4 will work without being rewritten.
(In reply to Jian Cai from comment #6) Hi Jian, > I made an LLVM patch for (https://reviews.llvm.org/D76962), which will make > clang's integrated assembler to treat .inst and its arguments as one macro > argument in the above example. I am not clear on this. Do you mean that if a macro's argument is the string ".inst" then it will treat any text that follows on the same line as being part of the same argument ? If so, then is this special treatment reserved only for macro arguments that start with ".inst" ? > It did not go though upstream review as it > would create mismatch between clang and gcc, [Just to be pedantic, it would be a mismatch between llvm's integrated assembler and the binutils's stand alone assembler. Gcc is a separate project to the binutils]. > so I am curious if gcc is > interested to have more general support, so that code like the example in > Comment 4 will work without being rewritten. I think not. I am definitely against having macros change their behaviour based upon the text inside their arguments. Plus there is a hard built-in requirement that assembler directives operate on a per-line basis. If there is a need to spread the arguments to a directive across multiple lines in the input text, then the newline should be escaped. Eg: .section sec1 \ , "ax" , @progbits Cheers Nick
Hi Nick, Thanks for the feedback. First all of, apology and correction to the code example I came up with in https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c4. It should be .macro foo insn1 insn2 \insn1 \insn2 .endm foo .section .sec1, .section .sec2 > Do you mean that if a macro's argument is the string ".inst" then it will treat any text that follows on the same line as being part of the same argument ? If so, then is this special treatment reserved only for macro arguments that start with ".inst" ? What I meant was for gcc to treat the macro expansion in this example as foo ".section .sec1", ".section .sec2", albeit without the quotes, i.e. considering .section .sec1 as the argument for insn1, and .section .sec2 as the argument for insn2. Currently, as treats foo .inst (XXX), .inst (YYY) as foo ".inst (XXX)", ".inst (YYY)" but not some others, such as foo .section .sec1, .section .sec2. All in all I don't think this is a major issue and we have different ways to work around it, so it's completely understandable if this remains what it is on gcc. I just figured it may be worth reporting this issue to upstream in case there is any interest to fix it.
(In reply to Jian Cai from comment #8) Hi Jian, > .macro foo insn1 insn2 > \insn1 > \insn2 > .endm > > foo .section .sec1, .section .sec2 It occurs to me that we might be able to achieve something like this, if we extend the macro argument syntax to include a definition of the argument separator. For example: .macro foo insn1:sep, insn2:vararg \insn1 \insn2 .endm This would tell the macro processor that insn1 takes all characters up to (but excluding) the next comma character and that insn2 takes the rest of the line (starting at the first non-whitespace character after the comma). The new syntax "<arg>:sep<char>" tells the macro processor that argument <arg> is separated from the next argument by the <char> character, and only by the <char> character, so that any other character (apart from line ending characters) forms part of <arg>. This syntax does not exist *yet*. I am going to have a look at implementing it, but if it is too difficult I will give up. What do you think - would this change provide the functionality that you desire ? Cheers Nick
(In reply to Jian Cai from comment #8) Hi Jian, > I did however find a variation that worked and did not have any behavioural inconsistencies: For .macro memorize a \a .endm memorize .long (19371226) The x86 backend rejects it while most other backends accept it. This is actually unintentional. gas/config/tc-i386.c defines /* List of chars besides those in app.c:symbol_chars that can start an operand. Used to prevent the scrubber eating vital white-space. */ const char extra_symbol_chars[] = "*%-([{}" The type of these symbols are considered LEX_IS_SYMBOL_COMPONENT. Before the input is fed to the parser, gas runs do_scrub_chars() to perform some preprocessing. * For other backends, lex['('] is 0, even though the state is 10, no space is kept macro_expand() will see `.long(19371226)` and the string will be parsed as an argument. * For the x86 backend, lex['('] is LEX_IS_SYMBOL_COMPONENT, the space before '(' is kept (state is 10). macro_expand() will see `.long (19371226)` and the string will be parsed as two arguments. If the argument does not have the `:vararg` qualifier, gas will error `Error: too many positional arguments` There is one feature not documented in https://sourceware.org/binutils/docs/as/Macro.html For a string literal argument, gas will remove the quotes when evaluation the argument. This makes the following example work: .macro foo a \a .endm foo ".section sec" For the Linux kernel, __emit_insn may be used as either a directive or an operand. Its reliance on `.inst (x)` being parsed as a whole is brittle. arch/arm64/include/asm/sysreg.h 52:#define __emit_inst(x) .inst (x) arch/arm64/include/asm/sysreg.h 93:#define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) arch/arm64/kvm/hyp/entry.S 112: ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) #define __emit_inst(x) .inst(x) may be the most suitable workaround for clang integrated assembler.
(In reply to Nick Clifton from comment #9) > (In reply to Jian Cai from comment #8) > Hi Jian, > > > .macro foo insn1 insn2 > > \insn1 > > \insn2 > > .endm > > > > foo .section .sec1, .section .sec2 > > It occurs to me that we might be able to achieve something like this, > if we extend the macro argument syntax to include a definition of the > argument separator. For example: > > .macro foo insn1:sep, insn2:vararg > \insn1 > \insn2 > .endm > > This would tell the macro processor that insn1 takes all characters up to > (but excluding) the next comma character and that insn2 takes the rest of > the line (starting at the first non-whitespace character after the comma). > > The new syntax "<arg>:sep<char>" tells the macro processor that argument > <arg> is separated from the next argument by the <char> character, and only > by the <char> character, so that any other character (apart from line ending > characters) forms part of <arg>. > > This syntax does not exist *yet*. I am going to have a look at implementing > it, but if it is too difficult I will give up. > > What do you think - would this change provide the functionality that you > desire ? > > Cheers > Nick Hi Nick, for Jian's request, we can probably just quote the arguments .macro foo insn1 insn2 \insn1 \insn2 .endm - foo .section .sec1, .section .sec2 + foo ".section .sec1", ".section .sec2" We need to document it, though. > Comment 10 > There is one feature not documented in https://sourceware.org/binutils/docs/as/Macro.html For a string literal argument, gas will remove the quotes when evaluation the argument. I am a bit concerned that the :sep qualifier will not be accurate, either. do_scrub_chars() can squeeze a run of spaces or delete some spaces arbitrarily. The :sep argument will not be a sole separator as spaces/commas can be separators as well. % x86_64-linux-gnu-as a.s && objdump -s 0000 612c6220 28632064 292c65 a,b (c d),e % aarch64-linux-gnu-as a.s && objdump -s ... 0000 612c6228 63206429 2c65 a,b(c d),e
Forgot to upload the assembly file I used for the `x86_64-linux-gnu-as a.s && objdump -s` command in Comment 11 % cat a.s .macro foo a:vararg .ascii "\a" .endm foo a, b (c d), e
Yes - I have had second thoughts about my ":sep" idea, and decided that it is the wrong approach. Using double quotes is right solution and no extra hacking should be needed. Changing the lexical behaviour of the x86 backend would probably be a bad idea. I am sure that it would break something somewhere. I suspect that, broken though it may be, we will be best off if we accept the status quo. Cheers Nick
Yes. I would hope do_scrub_chars() did not squeeze a sequence of spaces or dropped spaces before '(', but making such a change would just be disruptive now, see https://lore.kernel.org/linux-arm-kernel/20200414154307.2cke3x5ocz3q2as4@google.com/ Linux kernel between https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bca8f17f57bd76ddf2bbd2527eb890d6f588853e (2016) til now relies on `.inst (...)` scrubbed by do_scrub_chars() as `.inst(...)` Not making the change will keep the nuance regarding spaces but that may be bearable. % cat a.s .macro foo a:vararg .ascii "\a" .endm foo a b (c d) % aarch64-linux-gnu-as a.s && objdump -s ... 0000 61206228 63206429 a b(c d) % as a.s && objdump -s ... 0000 61206220 28632064 29 a b (c d)
Thanks Nick and Fangrui. Going through all the comments, it seems keeping the status quo is a better choice to keep existing (especially legacy) code work. Thank you all for the discussion and will close this issue now.