[PATCH/WIP] C/C++ wchar_t/Unicode printing support

Fri Jan 16 09:36:00 GMT 2009

> Date: Thu, 15 Jan 2009 20:24:11 +0000
> From: Julian Brown <julian@codesourcery.com>
> Cc: tromey@redhat.com
> 
> This patch contains (at least the start of) support for printing
> wchar_t strings from a debugged program within GDB. This is the subject
> for GDB bugs 9103 (and its duplicates 9369, 9268) and maybe 7821.

Thank you!

> OK to apply

Not without documentation, sorry.  Such an important feature should
not go in undocumented.

> or any comments?

A few:

> (gdb) show host-charset
> The host character set is "UTF-8" (auto).

Elsewhere in GDB, we show such settings in a slightly different form:

    (gdb) show language
    The current source language is "auto; currently c".

I like this latter form better: it first says that the setting is
"auto", then what the detected state is.

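To illustrate the form I have in mind, here is a minimal sketch.  The
helper name format_auto_setting and its parameters are hypothetical,
made up for this message; they are not part of the patch or of GDB's
existing "show" machinery.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): format the value of a
   setting that may be "auto", following the style that
   "show language" uses.  WHAT names the setting, VALUE is the
   explicit or detected value.  Returns what snprintf returns.  */
static int
format_auto_setting (char *buf, size_t size, const char *what,
                     int is_auto, const char *value)
{
  if (is_auto)
    /* Say "auto" first, then the detected state.  */
    return snprintf (buf, size, "The %s is \"auto; currently %s\".",
                     what, value);

  /* Explicitly set: just report the value.  */
  return snprintf (buf, size, "The %s is \"%s\".", what, value);
}
```

With WHAT set to "host character set", IS_AUTO nonzero and VALUE set
to "UTF-8", this yields:
The host character set is "auto; currently UTF-8".
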
> + #ifndef GDB_DEFAULT_TARGET_WIDE_CHARSET
> + #define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
> + #endif
> + 
> + #ifndef GDB_INTERNAL_CODESET
> + #define GDB_INTERNAL_CODESET "UCS-4LE"
> + #endif

Why are these the defaults?  Is it because of what GNU/Linux (i.e.
glibc) does, or for some other reason?  If the former, shouldn't this
be autoconfigured?

> + static const char *target_wide_charset_enum[] =
> + {
> +   "UCS-2",
> +   "UCS-2LE",
> +   "UCS-2BE",
> +   "UCS-4",
> +   "UCS-4LE",
> +   "UCS-4BE",
> +   "UTF-16",
> +   "UTF-16LE",
> +   "UTF-16BE",
> +   "UTF-32",
> +   "UTF-32LE",
> +   "UTF-32BE",
> +   0
> + };

Why do we need the UCS-2 charsets?  That's just confusing; are there
important platforms that support UCS-2 instead of UTF-16?  I'd also
suggest removing UTF-32 and its endian variants, since they are
identical to UCS-4.  (Unless someone wants to support the Emacs 23
internal representation, but that one should be called by its own
name anyway.)
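
To make the UCS-2 vs UTF-16 point concrete, here is a small sketch.
The function names utf16_encode and ucs2_representable are my own,
for illustration only; nothing like them is in the patch.  It shows
that the two encodings agree for every character in the Basic
Multilingual Plane, and that UTF-16 can additionally encode
characters above U+FFFF as a surrogate pair, which UCS-2 cannot
represent at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): encode one Unicode code
   point CP as UTF-16 code units in OUT.  Returns the number of units
   written (1 or 2), or 0 if CP is not a valid Unicode scalar
   value.  */
static size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  /* Surrogate code points and values past U+10FFFF are invalid.  */
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0;

  /* BMP characters take one unit, the same value UCS-2 would use.  */
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;
      return 1;
    }

  /* Supplementary characters take a high/low surrogate pair.  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 | (cp >> 10));
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
  return 2;
}

/* UCS-2 is fixed-width, so a character is representable only when a
   single 16-bit unit suffices.  */
static int
ucs2_representable (uint32_t cp)
{
  uint16_t units[2];
  return utf16_encode (cp, units) == 1;
}
```

For example, U+20AC encodes as the single unit 0x20AC in both, while
U+1D11E becomes the pair 0xD834 0xDD1E in UTF-16 and has no UCS-2
encoding.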