support C/C++ identifiers named with non-ASCII characters
Paul.Koning@dell.com
Mon May 21 19:25:00 GMT 2018
> On May 21, 2018, at 2:14 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: <Paul.Koning@dell.com>
>> CC: <simark@simark.ca>, <zjz@zjz.name>, <gdb-patches@sourceware.org>
>> Date: Mon, 21 May 2018 18:03:17 +0000
>>
>>> Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
>>> can not include invalid UTF-8 sequences?
>>
>> Encoding is an I/O question.
>
> Not necessarily.
>
> I asked that question because scanning a string for certain ASCII
> characters using a 'char *' pointer will only work reliably if the
> string is in UTF-8 or in some single-byte encoding. Otherwise, we
> might find false hits for the delimiters, which are actually parts of
> multibyte sequences.
I see your point.
The I/O encoding is tied to the internal encoding. UTF-8 can be read into char[] and processed using C string primitives. Encodings with wider code units cannot. For example, if you have UTF-16 or UTF-32, you'd have to read it into a wchar_t string of the correct character width and use the wchar string functions.
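To make the false-hit concern concrete, here is a small sketch in C. It is only an illustration, not code from GDB: the helper count_delim_bytes and the two sample arrays are hypothetical names made up for this example. Both arrays encode the same two characters, U+4E3A followed by an ASCII ':'. In UTF-8 every byte of a multibyte sequence has its high bit set, so a byte-wise scan sees only the real ':'. In UTF-16 (big-endian assumed here) the second byte of U+4E3A is 0x3A, which looks exactly like ':'.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: count the bytes in
   BUF[0..LEN) that are equal to DELIM.  This mimics what a naive
   'char *' scan for an ASCII delimiter would match.  */
static size_t
count_delim_bytes (const unsigned char *buf, size_t len, unsigned char delim)
{
  size_t n = 0;

  for (size_t i = 0; i < len; i++)
    if (buf[i] == delim)
      n++;
  return n;
}

/* U+4E3A followed by ':' in UTF-8.  The three bytes of U+4E3A are
   all >= 0x80, so none of them can be mistaken for ASCII.  */
static const unsigned char utf8_sample[] = { 0xE4, 0xB8, 0xBA, 0x3A };

/* The same two characters in UTF-16BE (byte order is an assumption
   of this sketch).  The low byte of U+4E3A is 0x3A, i.e. ':'.  */
static const unsigned char utf16be_sample[] = { 0x4E, 0x3A, 0x00, 0x3A };
```

Scanning utf8_sample for ':' finds one byte, the real delimiter. Scanning utf16be_sample finds two, and the first is a false hit inside U+4E3A. The UTF-16 sample also contains a 0x00 byte, so functions such as strchr would stop before reaching the end of the text.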
So there are two questions:
1. What are the valid characters? (Unicode question, independent of encoding)
2. What encoding do we expect in I/O? (UTF question) From the answer we conclude what processing functions we need.
paul