Bug 22267

Summary: ld.bfd does not accept "foo = ~0xFF" in linker script but accepts "foo = ~ 0xFF"
Product: binutils
Component: ld
Reporter: georgerim
Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED
Severity: normal
CC: hjl.tools, rafael
Priority: P2
Version: 2.30
Target Milestone: 2.30

Description georgerim 2017-10-06 11:44:45 UTC
The following linker script triggers an error when using
"GNU ld (GNU Binutils) 2.29.51.20171006":

SECTIONS { . = 0x10000; foo = ~0xFF; }
"test.script:3: undefined symbol `~0xFF' referenced in expression"

However, when I insert a space between `~` and `0xFF`, ld.bfd parses the script fine:
SECTIONS { . = 0x10000; foo = ~ 0xFF; }

It is not clear what the intended behavior is, as I don't think it is documented, but ld.bfd does not require a space in the following assignment: "foo = !0xFF".

FWIW, gold does not have this "issue"; I'll open a bug for it too, just in case it has unwanted relaxed behavior.
Comment 1 georgerim 2017-10-06 11:49:09 UTC
Bug for ld.gold posted here:
https://sourceware.org/bugzilla/show_bug.cgi?id=22268
Comment 2 H.J. Lu 2017-10-07 01:35:13 UTC
Don't know why '~' is allowed as the first char in a symbol name.
This patch disallows it:

diff --git a/ld/ldlex.l b/ld/ldlex.l
index ba618ecc27..795a4d7c8e 100644
--- a/ld/ldlex.l
+++ b/ld/ldlex.l
@@ -94,6 +94,7 @@ static void lex_warn_invalid (char *where, char *what);
 CMDFILENAMECHAR   [_a-zA-Z0-9\/\.\\_\+\$\:\[\]\\\,\=\&\!\<\>\-\~]
 CMDFILENAMECHAR1  [_a-zA-Z0-9\/\.\\_\+\$\:\[\]\\\,\=\&\!\<\>\~]
 FILENAMECHAR1	[_a-zA-Z\/\.\\\$\_\~]
+SYMBOLNAMECHAR1	[_a-zA-Z\/\.\\\$\_]
 SYMBOLCHARN     [_a-zA-Z\/\.\\\$\_\~0-9]
 FILENAMECHAR	[_a-zA-Z0-9\/\.\-\_\+\=\$\:\[\]\\\,\~]
 WILDCHAR	[_a-zA-Z0-9\/\.\-\_\+\=\$\:\[\]\\\,\~\?\*\^\!]
@@ -136,7 +137,7 @@ V_IDENTIFIER [*?.$_a-zA-Z\[\]\-\!\^\\]([*?.$_a-zA-Z0-9\[\]\-\!\^\\]|::)*
 
 <DEFSYMEXP>"-"                  { RTOKEN('-');}
 <DEFSYMEXP>"+"                  { RTOKEN('+');}
-<DEFSYMEXP>{FILENAMECHAR1}{SYMBOLCHARN}*   { yylval.name = xstrdup (yytext); return NAME; }
+<DEFSYMEXP>{SYMBOLNAMECHAR1}{SYMBOLCHARN}*   { yylval.name = xstrdup (yytext); return NAME; }
 <DEFSYMEXP>"="                  { RTOKEN('='); }
 
 <MRI,EXPRESSION>"$"([0-9A-Fa-f])+ {
@@ -390,7 +391,7 @@ V_IDENTIFIER [*?.$_a-zA-Z\[\]\-\!\^\\]([*?.$_a-zA-Z0-9\[\]\-\!\^\\]|::)*
 				  yylval.name = xstrdup (yytext + 2);
 				  return LNAME;
 				}
-<EXPRESSION>{FILENAMECHAR1}{NOCFILENAMECHAR}*	{
+<EXPRESSION>{SYMBOLNAMECHAR1}{NOCFILENAMECHAR}*	{
 				 yylval.name = xstrdup (yytext);
 				  return NAME;
 				}
Comment 3 georgerim 2017-10-07 13:58:51 UTC
Thanks !
I can confirm this patch works:
ld.bfd no longer reports the issue and produces the correct symbol 'foo' for me,
so the output is consistent with ld.gold.

Can we expect this change to be included in the 2.30 release?
Comment 4 Sourceware Commits 2017-10-09 11:18:34 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=76f361eb4934dcda0626517c311b34fbc92d09b9

commit 76f361eb4934dcda0626517c311b34fbc92d09b9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Oct 9 04:17:10 2017 -0700

    ld: Don't allow '~' as the first char in symbol name
    
    Don't allow '~' as the first character in symbol name in linker script.
    
    	PR ld/22267
    	* ldlex.l (SYMBOLNAMECHAR1) New.
    	(DEFSYMEXP): Replace FILENAMECHAR1 with SYMBOLNAMECHAR1.
    	(EXPRESSION): Likewise.
    	* testsuite/ld-scripts/expr.exp: Run pr22267.
    	* testsuite/ld-scripts/pr22267.d: New file.
    	* testsuite/ld-scripts/pr22267.s: Likewise.
    	* testsuite/ld-scripts/pr22267.t: Likewise.
Comment 5 H.J. Lu 2017-10-09 11:18:54 UTC
Fixed for 2.30.