backslashes in quoted symbol names
Nick Clifton
nickc@redhat.com
Fri Jun 11 12:30:20 GMT 2021
Hi Jan
[Sorry - I was sidetracked by other issues...]
> What we need to consider is how we want to deal with things in the
> middle of a string. What right now can be expressed as "sym\bol"
> (backslash will be retained) might need to be changed to "sym\\bol".
The documentation currently suggests that backslashes ought to be
escape characters. In the "Symbol Names" section of the assembler
documentation it says:
    Multibyte characters are supported. To generate a symbol
    name containing multibyte characters enclose it within
    double quotes and use escape codes. See Strings.
This implies that symbol names inside double quotes are treated as
strings. It turns out, however, that this is not true, and multibyte
characters cannot be encoded in this way.
> And the question we need to answer up front is what treatment a
> backslash preceding anything other than another backslash or a double quote
> should receive. IMO, strictly speaking, such uses should be
> documented as reserved, such that we could alter the behavior down
> the road, e.g. when it turns out necessary to escape other stuff.
> That's what we may want to be warning about.
Personally I think that we should follow the documentation on this one
and treat the backslash character as a real escape character, including
generating an error when the escaped character is not one with a special
meaning.
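A minimal sketch of that stricter rule, written in C since GAS itself is C. The helper `unescape_symbol_name` is hypothetical and not part of GAS. It assumes the lexer has already stripped the surrounding double quotes, and it honours only a subset of the string escapes (\\, \", \b, \f, \n, \r, \t), leaving out the octal and hex forms for brevity. Any other character after a backslash is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GAS code.  IN is the body of a double-quoted
   symbol name (the text between the quotes).  Decode it the way a string
   would be decoded, writing the result to OUT.  Returns false for an
   escape with no special meaning, for a lone trailing backslash, or when
   OUT is too small.  Only a subset of string escapes is handled here.  */
static bool
unescape_symbol_name (const char *in, char *out, size_t out_size)
{
  size_t n = 0;

  assert (in != NULL && out != NULL && out_size > 0);

  for (; *in != '\0'; in++)
    {
      char c = *in;

      if (c == '\\')
        {
          in++;                 /* Look at the escaped character.  */
          switch (*in)
            {
            case '\\': c = '\\'; break;
            case '"':  c = '"';  break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:
              /* Unknown escape, or a backslash at the very end.  */
              return false;
            }
        }

      if (n + 1 >= out_size)
        return false;           /* No room for C plus the terminator.  */
      out[n++] = c;
    }

  out[n] = '\0';
  return true;
}
```

With this sketch the text sym\\bol decodes to sym\bol with a single backslash, while sym\bol no longer keeps its backslash at all: \b becomes a backspace character, and an escape such as \q is an error.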
> Yet then anyone using
> "sym\\bol" now would still observe a silent change, as we'd convert
> what now results in two backslashes to just one. I guess this might
> be acceptable if mentioned in NEWS?
Yes, we should do that. We should also extend the documentation to
make it clear that symbol names enclosed in double quotes are
definitely treated as strings.
Maybe we should add a warning for PE-based targets that \\ is now
being treated as a single backslash? Those are the only targets where
I would expect this situation to actually arise.
If we do get complaints from users about this, then blame me. :-)
There is one other situation as well. In my testing I discovered that:
  "<space>":
  "<tab>":
  "<newline>":
are all treated as valid symbol names. (To be clear, each of those
symbol names contains only one character.) These should all be
rejected as invalid, as should:
  " foo":
and the like.
Are you volunteering to create a patch for this?
Cheers
Nick