This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: support C/C++ identifiers named with non-ASCII characters
> From: <Paul.Koning@dell.com>
> CC: <zjz@zjz.name>, <gdb-patches@sourceware.org>
> Date: Mon, 21 May 2018 14:12:12 +0000
>
> > Given unlimited time, would the right solution be to use a lib to parse the
> > string as utf-8, and reject strings that are not valid utf-8?
>
> This sounds like a scenario where "stringprep" is helpful (or necessary). It validates strings to be valid utf-8, can check that they obey certain rules (such as "word elements only" which rejects punctuation and the like), and can convert them to a canonical form so equal strings match whether they are encoded the same or not.
Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
cannot include invalid UTF-8 sequences?
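
For concreteness, here is a minimal sketch of what the three steps described in the quoted text (validate as UTF-8, check a "word elements only" rule, convert to a canonical form) could look like. It is not stringprep itself and not anything from the GDB sources: it uses only the Python standard library, the function name canonical_identifier is hypothetical, NFC is assumed as the canonical form, and Python's own identifier rules stand in for whatever rule set a real implementation would choose.

```python
import unicodedata


def canonical_identifier(raw: bytes):
    """Return a canonical form of an identifier, or None if it is rejected.

    Hypothetical helper for illustration only; the choice of NFC and of
    Python's identifier rules are assumptions, not GDB behavior.
    """
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

    # Canonicalize so that precomposed and decomposed spellings compare equal.
    text = unicodedata.normalize("NFC", text)

    # Stand-in for a "word elements only" rule: rejects punctuation,
    # whitespace, and the empty string.
    if not text.isidentifier():
        return None
    return text


# Two encodings of the same visible name map to one canonical string.
precomposed = "caf\u00e9".encode("utf-8")   # e-acute as a single code point
decomposed = "cafe\u0301".encode("utf-8")   # 'e' followed by a combining accent
same = canonical_identifier(precomposed) == canonical_identifier(decomposed)
```

With a scheme like this, a byte string that is not valid UTF-8 is rejected outright, which is exactly the behavior the question above asks about: it is only appropriate if identifiers are in fact guaranteed to be valid UTF-8.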