This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.
Displaying Unicode
[This came from a different thread.]
[Was: extending Gdb to display app specific data]
Eli Zaretskii wrote:
> Could you please post a list of the types which you'd like to be able
> to display, and tell how each one of these types should look on the
> screen when displayed by GDB?
Let's look at Unicode.
UCS-2
=============
Each character is 16 bits, which makes it easier for
programs to handle languages with more characters than
fit in a single byte (e.g. Japanese). The ASCII values are
simply zero-extended to 16 bits: for example, "A", which
is 0x41 in ASCII, is 0x0041 in UCS-2.
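A minimal C sketch of that widening, assuming plain 7-bit ASCII input and using `uint16_t` to stand in for a UCS-2 code unit (the helper name `ascii_to_ucs2` is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen a NUL-terminated 7-bit ASCII string into
   UCS-2 code units. Each byte keeps its value and the high byte is zero,
   so "A" (0x41) becomes 0x0041. Writes at most cap-1 units plus a
   terminating 0 unit and returns the number of characters copied. */
size_t ascii_to_ucs2(const char *in, uint16_t *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (in[n] != '\0' && n + 1 < cap) {
        out[n] = (uint16_t)(unsigned char)in[n];
        n++;
    }
    out[n] = 0;
    return n;
}
```

For example, `ascii_to_ucs2("A", buf, 2)` stores 0x0041 in `buf[0]` and the terminating 0 in `buf[1]`, and returns 1.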
The simplest UCS-2 display would be to show the ASCII
characters as ASCII and everything else as hex. That way
engineers who don't work on internationalization (i18n)
continue to see their strings as before. The i18n
engineers will have to look up the values (which they
often need to do anyway).
A more sophisticated display routine would check whether
each non-ASCII character is displayable in the current
locale and, if so, convert it to the locale's encoding
for display. That way developers who can read Japanese,
etc. can see (hopefully something close to) the intended
text.
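A rough sketch of the locale-aware approach, assuming a C library whose `wchar_t` holds Unicode code points (glibc, for instance, defines `__STDC_ISO_10646__`) and a program that has called `setlocale(LC_CTYPE, "")` beforehand. The function name `format_ucs2_locale` and its buffer-based interface are hypothetical. Each code unit is handed to the standard `wcrtomb`; when the locale's encoding cannot represent it, the sketch falls back to the same hex form as the simple display:

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: render a NUL-terminated UCS-2 string into out
   (capacity cap, always NUL-terminated when cap > 0). Characters the
   current LC_CTYPE locale can encode are converted with wcrtomb();
   the rest, including lone surrogates, are written as "\xHHHH ".
   Assumes wchar_t values are Unicode code points (e.g. glibc). */
size_t format_ucs2_locale(const uint16_t *s, char *out, size_t cap)
{
    size_t len = 0;
    mbstate_t st;
    memset(&st, 0, sizeof st);
    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (; *s != 0; s++) {
        char tmp[MB_LEN_MAX + 8];   /* room for a multibyte char or "\xHHHH " */
        size_t n = (size_t)-1;
        if (*s < 0xD800 || *s > 0xDFFF)
            n = wcrtomb(tmp, (wchar_t)*s, &st);
        if (n == (size_t)-1) {
            /* Not representable in this locale: reset state, use hex. */
            memset(&st, 0, sizeof st);
            n = (size_t)snprintf(tmp, sizeof tmp, "\\x%04x ", (unsigned)*s);
        }
        if (len + n >= cap)
            break;                  /* out of room: stop at a whole character */
        memcpy(out + len, tmp, n);
        len += n;
        out[len] = '\0';
    }
    return len;
}
```

Without a `setlocale` call the program runs in the "C" locale, where non-ASCII characters will usually take the hex fallback, so the output degrades to the simple display.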
To my knowledge there is no universally used type for
UCS-2, but for GDB display we could require that the
application use "UCS2 *".
UTF-16
========
For the purposes of this discussion it is the same as
UCS-2. UTF-16 supports more than 64K characters by
encoding each character above U+FFFF as a pair of 16-bit
code units (a surrogate pair).
Display it the same way as UCS-2.
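The surrogate-pair arithmetic is small enough to show directly. This is only a sketch; the helper names `utf16_is_high`, `utf16_is_low` and `utf16_decode` are hypothetical. A high surrogate (0xD800-0xDBFF) carries the top 10 bits and a low surrogate (0xDC00-0xDFFF) the bottom 10 bits of the code point minus 0x10000:

```c
#include <stdint.h>

/* Hypothetical helpers for the UTF-16 surrogate ranges. */
static int utf16_is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int utf16_is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Decode the code point starting at s[0] of a NUL-terminated UTF-16
   string and store the number of units consumed (1 or 2) in *used.
   A valid surrogate pair yields a code point in U+10000..U+10FFFF;
   an unpaired surrogate is returned as-is so a display routine can
   still print it in hex. Reading s[1] is safe because s[0] != 0. */
uint32_t utf16_decode(const uint16_t *s, int *used)
{
    if (utf16_is_high(s[0]) && utf16_is_low(s[1])) {
        *used = 2;
        return 0x10000u + (((uint32_t)s[0] - 0xD800u) << 10)
                        + ((uint32_t)s[1] - 0xDC00u);
    }
    *used = 1;
    return s[0];
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600 with `*used` set to 2, while an unpaired 0xD800 is returned unchanged with `*used` set to 1.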
To my knowledge there is no universally used type for
UTF-16, but for GDB display we could require that the
application use "UTF16 *".
UTF-8
=========
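An alternative (8-bit, multibyte) encoding. It is popular
because UTF-8 never produces a 0 byte except for the NUL
character itself, so it passes through code that expects
NUL-terminated C strings. For display, convert it to UCS-2
(or UTF-16, for characters above U+FFFF) and then display
the result.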
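A minimal sketch of that conversion step. The function name `utf8_to_utf16` is hypothetical, and the validation is deliberately light: bad lead bytes, missing continuation bytes, overlong forms, surrogates and values past U+10FFFF all become U+FFFD (the Unicode replacement character). Characters above U+FFFF are emitted as surrogate pairs so nothing is lost:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: decode NUL-terminated UTF-8 into UTF-16 code
   units, writing at most cap-1 units plus a terminating 0. Malformed
   input becomes U+FFFD. Returns the number of units written. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (*p != 0) {
        uint32_t cp;
        int extra, i;
        if (*p < 0x80) {
            cp = *p; extra = 0;
        } else if (*p >= 0xC2 && *p <= 0xDF) {
            cp = *p & 0x1Fu; extra = 1;
        } else if (*p >= 0xE0 && *p <= 0xEF) {
            cp = *p & 0x0Fu; extra = 2;
        } else if (*p >= 0xF0 && *p <= 0xF4) {
            cp = *p & 0x07u; extra = 3;
        } else {
            cp = 0xFFFD; extra = -1;         /* invalid lead byte */
        }
        p++;
        for (i = 0; i < extra; i++) {
            if ((*p & 0xC0) != 0x80) {       /* missing continuation byte */
                cp = 0xFFFD;                 /* (NUL also stops here) */
                break;
            }
            cp = (cp << 6) | (*p & 0x3Fu);
            p++;
        }
        /* Reject overlong 3/4-byte forms, surrogates, and > U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        if (cp >= 0x10000) {                 /* needs a surrogate pair */
            if (n + 2 >= cap)
                break;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                break;
            out[n++] = (uint16_t)cp;
        }
    }
    out[n] = 0;
    return n;
}
```

For example, the bytes E6 97 A5 convert to the single unit 0x65E5, and F0 9F 98 80 convert to the pair 0xD83D 0xDE00, which can then go through the same display path as UCS-2/UTF-16.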
To my knowledge there is no universally used type for
UTF-8, but for GDB display we could require that the
application use "UTF8 *".
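If GDB keyed its display off type names as proposed, an application could opt in with declarations along these lines. This is only an illustration: `UCS2`, `UTF16` and `UTF8` are the names suggested above, not existing GDB or library types, and the underlying integer types chosen here are an assumption:

```c
#include <stdint.h>

/* Hypothetical marker types whose names a debugger could recognize.
   The underlying types only fix the width of one code unit. */
typedef uint16_t UCS2;   /* one UCS-2 code unit (BMP only)            */
typedef uint16_t UTF16;  /* one UTF-16 code unit (may be a surrogate) */
typedef uint8_t  UTF8;   /* one UTF-8 byte                            */

/* Example data declared with the marker types. */
static const UCS2  greeting_ucs2[] = { 0x0048, 0x0069, 0x65E5, 0 };  /* "Hi" + U+65E5 */
static const UTF16 smile_utf16[]   = { 0xD83D, 0xDE00, 0 };          /* U+1F600 */
static const UTF8  greeting_utf8[] = { 'H', 'i', 0xE6, 0x97, 0xA5, 0 };
```

C treats `UCS2` and `UTF16` here as the same type, but compilers normally keep typedef names in the debugging information, which is what a name-based rule in GDB would have to look at.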
Here is a UCS-2 display routine I use:
==============================================
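#include <stdint.h>
#include <stdio.h>

/* Stand-in for ICU's 16-bit UChar; omit if ICU headers are included. */
typedef uint16_t UChar;
/* Print a NUL-terminated UCS-2 string: code units below 0x7F as ASCII,
   everything else as a hex escape followed by a space. */
void
dump_UCharString(const UChar *uChar_str)
{
    while (*uChar_str) {
        if (*uChar_str < 0x7F) {
            putchar((char)*uChar_str);
        }
        else {
            printf("\\x%04x ", (unsigned)*uChar_str);
        }
        uChar_str++;
    }
    fflush(stdout); /* make output visible at once when run via GDB "call" */
}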