[RFC-v2] Allow explicit 16 or 32 char in 'x /s'
Pierre Muller
pierre.muller@ics-cnrs.unistra.fr
Thu Apr 1 09:34:00 GMT 2010
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org]
> On Behalf Of Eli Zaretskii
> > +the unit size defaults to @samp{b}, unless it is explicitly given.
> > +Use @kbd{x /hs} to display 16-bit char strings and @kbd{x /ws} to display
> > +32-bit strings. The next use of @kbd{x /s} will again display 8-bit strings.
>
> This is okay, but I still think we should mention that the encoding is
> UTF-16 and UCS-4, respectively, and that it cannot be changed.
According to the c_emit_char function, it is
UTF-16 (LE or BE depending on target endianness)
or UTF-32 (also LE or BE).
Is UCS-4 exactly the same as UTF-32?
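
To make the endianness point concrete, here is a minimal sketch. It is not GDB code: encode_utf16 is a hypothetical helper written only for illustration. It shows that the same character ends up as different bytes in target memory depending on the byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GDB: encode one Unicode code point
   CP as UTF-16 into OUT, big-endian if BIG_ENDIAN_P is nonzero and
   little-endian otherwise.  Return the number of bytes written (2 or
   4), or 0 if CP is a surrogate or lies above 0x10FFFF.  */
static size_t
encode_utf16 (uint32_t cp, int big_endian_p, unsigned char out[4])
{
  uint16_t units[2];
  size_t n_units, i;

  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  if (cp < 0x10000)
    {
      /* Basic Multilingual Plane: a single 16-bit unit.  */
      units[0] = (uint16_t) cp;
      n_units = 1;
    }
  else
    {
      /* Supplementary planes: a high and a low surrogate.  */
      cp -= 0x10000;
      units[0] = (uint16_t) (0xD800 | (cp >> 10));
      units[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
      n_units = 2;
    }

  for (i = 0; i < n_units; i++)
    {
      unsigned char hi = (unsigned char) (units[i] >> 8);
      unsigned char lo = (unsigned char) (units[i] & 0xFF);

      out[2 * i] = big_endian_p ? hi : lo;
      out[2 * i + 1] = big_endian_p ? lo : hi;
    }
  return 2 * n_units;
}
```

With this sketch, U+20AC is stored as the bytes AC 20 on a little-endian target and as 20 AC on a big-endian one.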
Furthermore, this is c_emit_char, which means that this
is language-specific output.
Several languages have their own emit_char functions,
and several of those start with a
c &= 0xFF;
line, which discards the higher bytes of the character value
(found in f-lang.c:86, m2-lang.c:45, objc-lang.c:287 and p-lang.c:161).
Of course these implementations would benefit from
using the more up-to-date c-lang.c implementation, but that is another
story.
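
The effect of that masking line can be sketched as follows. This is an illustration under stated assumptions, not the code from those files: mask_to_byte is a hypothetical stand-in that performs only the masking step.

```c
/* Hypothetical stand-in for the masking step, not the actual GDB
   code: keep only the low 8 bits of the character value C.  */
static int
mask_to_byte (int c)
{
  c &= 0xFF;
  return c;
}
```

For example, mask_to_byte (0x20AC) yields 0xAC, so a 16-bit EURO SIGN would be treated as if it were the 8-bit character 0xAC, while an ASCII value such as 0x41 passes through unchanged.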
This means that UTF-16 and UTF-32 will only be used
for the c, cplus, assembler and minimal languages.
The Java language seems to use another scheme to represent
extended characters: it uses
fprintf_unfiltered (stream, "\\u%.4x", (unsigned int) c);
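
To show what that format string produces, here is a small sketch that uses standard snprintf in place of fprintf_unfiltered. The helper name format_java_escape is hypothetical and exists only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: format the character value C into BUF (of
   SIZE bytes) with the same format string as the call quoted above.
   Return what snprintf returns, i.e. the length of the full result.  */
static int
format_java_escape (char *buf, size_t size, int c)
{
  return snprintf (buf, size, "\\u%.4x", (unsigned int) c);
}
```

The ".4" precision gives at least four lowercase hex digits, so 0x41 is written as \u0041 and 0x20AC as \u20ac. It is only a minimum: a value above 0xFFFF would be written with more than four digits.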
To summarize, I don't think it is correct to say that 'x /hs' uses UTF-16
without specifying that this is language specific.
Should I just mention that the output is language dependent
and uses UTF-16 or UTF-32 for the c, cplus, assembler and minimal languages?
Pierre Muller