[PATCH] [gdb/tui] Handle unicode chars in prompt

Fri May 26 13:56:24 GMT 2023

> Cc: Tom Tromey <tom@tromey.com>
> Date: Fri, 26 May 2023 15:25:12 +0200
> From: Tom de Vries via Gdb-patches <gdb-patches@sourceware.org>
> 
> +/* Return true if STRING starts with a multi-byte char.  Return the length of
> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
> +   Implementation based on _rl_read_mbchar.  */
> +
> +static bool
> +is_mb_char (const char *string, int &len)
> +{
> +  for (len = 1; len <= MB_CUR_MAX; len++)
> +    {
> +      size_t res;
> +
> +      {
> +	wchar_t wc;  <<<<<<<<<<<<<<<<<<<<<<<
> +	mbstate_t ps;
> +	memset (&ps, 0, sizeof (mbstate_t));
> +	res = mbrtowc (&wc, string, len, &ps);

The above assumes each call to mbrtowc produces only one wchar_t
value.  But that's non-portable: on MS-Windows wchar_t is a 16-bit
data type, and wchar_t "wide characters" are actually encoded in
UTF-16.  So characters beyond the BMP will yield 2 wchar_t values (a
surrogate pair), not one.
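
To illustrate the arithmetic behind this, here is a minimal sketch of
how a code point maps to UTF-16 code units.  The helper names
utf16_units and to_surrogates are hypothetical and not part of the
patch or of GDB; the sketch only shows the encoding rule and does not
call mbrtowc at all.

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-16 code units needed to encode the code point CP.
// Hypothetical helper, for illustration only.
constexpr std::size_t
utf16_units (char32_t cp)
{
  // Code points up to U+FFFF (the BMP) fit in one 16-bit unit;
  // anything above needs a surrogate pair.
  return cp <= 0xFFFF ? 1 : 2;
}

// Split a code point beyond the BMP into its high and low surrogates.
inline void
to_surrogates (char32_t cp, char16_t &high, char16_t &low)
{
  assert (cp > 0xFFFF && cp <= 0x10FFFF);
  char32_t v = cp - 0x10000;
  high = static_cast<char16_t> (0xD800 + (v >> 10));   // top 10 bits
  low = static_cast<char16_t> (0xDC00 + (v & 0x3FF));  // bottom 10 bits
}
```

With a 16-bit wchar_t, a single "wchar_t wc" can hold only one of the
two units that to_surrogates produces, which is why code that expects
one wchar_t per character does not carry over.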

One additional caveat: "multibyte" != "UTF-8".  There's more than one
multibyte encoding, and the current locale could use some non-UTF-8
encoding instead, for example one of the ISO-2022 family.
I'm not sure what this means for the issue at hand.

Yet another consideration is whether tui_puts_internal is used for
outputting text in the target charset, in which case you may have
problems with using mbrtowc, because AFAIK that supports only the
current locale's codeset.  If the target charset is different from the
locale's (basically, the host) charset, and we don't convert one to
the other before calling tui_puts_internal, mbrtowc will fail.
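
As a rough sketch of what "fail" means here, the following separates
the possible mbrtowc outcomes.  The names mb_result and decode_one are
hypothetical, not from the patch.  mbrtowc interprets the bytes in the
current locale's encoding, so text in a different charset may come
back as invalid, or decode to the wrong characters.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Possible outcomes of decoding one character (hypothetical names).
enum class mb_result { null_char, complete, incomplete, invalid };

// Decode the first multibyte character of the null-terminated string
// S using the current locale's encoding.  On mb_result::complete, LEN
// is the number of bytes consumed; otherwise LEN is 0.
inline mb_result
decode_one (const char *s, std::size_t &len)
{
  assert (s != nullptr);
  wchar_t wc;
  std::mbstate_t ps;
  std::memset (&ps, 0, sizeof ps);
  len = 0;

  // Include the terminating null so an empty string decodes to L'\0'.
  std::size_t res = std::mbrtowc (&wc, s, std::strlen (s) + 1, &ps);
  if (res == static_cast<std::size_t> (-1))
    return mb_result::invalid;     // Not valid in the locale's codeset.
  if (res == static_cast<std::size_t> (-2))
    return mb_result::incomplete;  // More bytes would be needed.
  if (res == 0)
    return mb_result::null_char;
  len = res;
  return mb_result::complete;
}
```

A caller that gets mb_result::invalid can't tell from this alone
whether the bytes are garbage or just in another charset; that would
need the text converted to the host charset first.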

Yes, this is a mess.

Thanks.