This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
[PATCH/WIP] C/C++ wchar_t/Unicode printing support
- From: Julian Brown <julian at codesourcery dot com>
- To: gdb-patches at sourceware dot org
- Cc: tromey at redhat dot com
- Date: Thu, 15 Jan 2009 20:24:11 +0000
- Subject: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
Hi,
This patch contains (at least the start of) support for printing
wchar_t strings from a debugged program within GDB. This is the subject
of GDB bugs 9103 (and its duplicates 9369 and 9268) and maybe 7821.
Notes on the implementation:
1. I've added a new configuration variable, similar to "host-charset"
and "target-charset". The latter can't be used for printing wide
characters, because regular C strings and wide strings aren't
necessarily (or in fact ever) encoded using the same encoding. The new
variable is set like:
(gdb) set target-wide-charset UTF-32
I considered adding "set target-wide-charset auto" to auto-detect the
charset used for wchar_t strings from the size of wchar_t (i.e.
probably 4 bytes -> UCS-4, 2 bytes -> UTF-16), but that's not done
presently.
2. The host terminal may be able to print Unicode characters if it is
fed UTF-8 encoded text. There are some limitations: I don't think Unix
terminals support combining character sequences -- I've ignored that
for now. GDB currently defaults "host-charset" to ISO-8859-1, although
a given terminal may not print top-bit-set characters correctly.
I've added a new way of setting the host character set from the
host terminal (using nl_langinfo (CODESET)), like so:
(gdb) set host-charset auto
If the terminal supports UTF-8 (e.g. LC_ALL is set to en_US.UTF-8), we
will then see:
(gdb) show host-charset
The host character set is "UTF-8" (auto).
If the terminal only supports ASCII (e.g. LC_ALL is set to C), we will
instead see:
(gdb) show host-charset
The host character set is "ANSI_X3.4-1968" (auto).
3. Types which are literally called "wchar_t" are assumed to be wide
characters. So we can do:
wchar_t *msg = L"Hello world";
and then:
(gdb) p msg
$1 = (wchar_t *) 0x85c4 "Hello world"
If the message contains non-ASCII characters, and the user has typed
"set host-charset auto" on a UTF-8 capable terminal, they will be
printed as-is:
(gdb) p msg
$2 = (wchar_t *) 0x85c4 "Schöne Grüße"
The caveat is that GDB has no way to know whether you have a font with
the right glyphs in it. If not, you can fall back to ASCII:
(gdb) set host-charset ASCII
(gdb) p msg
$3 = (wchar_t *) 0x85c4 "Sch\x00f6ne Gr\x00fc\x00dfe"
4. If you want to print an integer array type which isn't literally
called "wchar_t" but nevertheless contains a wchar_t string, you can
override using "/s", just like with regular strings, e.g.:
(gdb) p/s intmsg
$2 = (int *) 0x85c4 "Schöne Grüße"
5. The existing string-printing code is careful about not printing out
lots of repeating characters. For wchar_t strings (taking into account
the differences between what they represent on various platforms
mentioned above), there is generally an X-to-Y correspondence between
the number of input bytes and the number of output bytes for each
character, where neither X nor Y is fixed. To detect repeats, we
convert however many input bytes make up one character to UCS-4, detect
repeated UCS-4 values, then translate each value to however many output
bytes the host charset needs.
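The repeat detection described in note 5 can be sketched in isolation as
follows. This is only an illustration, not the code from the patch: the
names format_with_repeats, append and append_char are hypothetical, the
input is assumed to be already converted to UCS-4 code points, and
anything outside printable ASCII is written as a \xNNNN escape instead of
going through iconv.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to BUF at *POS, never writing past SIZE bytes.  */
static void
append (char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
  va_list ap;
  int n;

  if (*pos >= size)
    return;
  va_start (ap, fmt);
  n = vsnprintf (buf + *pos, size - *pos, fmt, ap);
  va_end (ap);
  if (n > 0)
    *pos += (size_t) n < size - *pos ? (size_t) n : size - *pos - 1;
}

/* Emit one code point: printable ASCII literally, the rest as \xNNNN.
   (Assumption for this sketch; the patch converts via iconv instead.)  */
static void
append_char (char *buf, size_t size, size_t *pos, uint32_t cp)
{
  if (cp >= 0x20 && cp < 0x7f)
    append (buf, size, pos, "%c", (int) cp);
  else
    append (buf, size, pos, "\\x%04lx", (unsigned long) cp);
}

/* Format COUNT code points from CPS into BUF.  Runs longer than
   THRESHOLD become 'c' <repeats N times>; everything else is quoted.  */
void
format_with_repeats (const uint32_t *cps, size_t count, size_t threshold,
                     char *buf, size_t size)
{
  size_t pos = 0, i = 0;
  int in_quotes = 0, need_comma = 0;

  assert (buf != NULL && size > 0);
  memset (buf, 0, size);

  while (i < count)
    {
      size_t run = 1, k;

      /* Measure the run of identical code points starting at I.  */
      while (i + run < count && cps[i + run] == cps[i])
        run++;

      if (run > threshold)
        {
          if (in_quotes)
            {
              append (buf, size, &pos, "\"");
              in_quotes = 0;
              need_comma = 1;
            }
          if (need_comma)
            append (buf, size, &pos, ", ");
          append (buf, size, &pos, "'");
          append_char (buf, size, &pos, cps[i]);
          append (buf, size, &pos, "' <repeats %lu times>",
                  (unsigned long) run);
          need_comma = 1;
        }
      else
        {
          if (!in_quotes)
            {
              if (need_comma)
                append (buf, size, &pos, ", ");
              append (buf, size, &pos, "\"");
              in_quotes = 1;
            }
          for (k = 0; k < run; k++)
            append_char (buf, size, &pos, cps[i]);
        }
      i += run;
    }

  if (in_quotes)
    append (buf, size, &pos, "\"");
}
```

Comparing code points rather than raw bytes is what makes this independent
of how many bytes each character occupies on the target or on the host.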
Current shortcomings:
1. There's no support for non-C-like languages.
2. I've probably broken building with iconv disabled (actually I
couldn't figure out how to build without iconv() support -- even for
e.g. a mingw32 host which shouldn't support it).
3. Currently wrong-endian wide characters from the target will confuse
things (but you can explicitly set target-wide-charset to UCS-4LE or
UCS-4BE for example).
4. I've not written documentation or altered test cases yet
(charset.exp shows some regressions).
Tom Tromey is working on a patch related to this. Some of his comments
are incorporated in this patch relative to an earlier version sent to
him privately (thanks!).
Regression tested on x86-64 Linux, and spot-checked with an ARM Linux
cross debugger (from x86 build/host). As mentioned above, there are
some regressions so far.
OK to apply, or any comments?
Cheers,
Julian
ChangeLog
gdb/
* c-valprint.c (textual_element_type): Alter TYPE to be the type of
the element before looking through typedefs, and update comment. Add
wide-character support.
(c_val_print): Pass type before typedef resolution to
textual_element_type calls.
* charset.c (langinfo.h): Include, if HAVE_LANGINFO_CODESET.
(GDB_DEFAULT_TARGET_WIDE_CHARSET, GDB_INTERNAL_CODESET): New macros.
(host_charset_auto): New.
(show_host_charset_name): Indicate automatically-selected charset.
(target_wide_charset_name, show_target_wide_charset_name): New.
(host_charset_enum): Add "auto".
(target_wide_charset_enum): New. Support a limited number of
wchar_t character sets.
(iconv_char_print_literally): New.
(iconv_to_control): New.
(lookup_and_register_iconv_charset): New.
(default_c_internal_char_has_backslash_escape): New.
(current_target_wide_charset, internal_charset): New.
(set_host_charset): Add support for "auto" host charset.
(show_charset): Show target wide charset.
(set_target_wide_charset, set_target_wide_charset_sfunc)
(target_wide_charset, cached_iconv_target_to_internal)
(cached_iconv_internal_to_host, target_to_internal_iconv_t)
(internal_to_host_iconv_t, reset_host_char_state)
(target_char_to_internal, internal_char_host_emit): New.
(_initialize_charset): Add wide-character support.
* charset.h (target_wide_charset, reset_host_char_state)
(target_char_to_internal) (internal_char_host_emit): Add prototypes.
* c-lang.c (c_internal_char_host_emit, c_printwidestr): New.
(c_printstr): Call c_printwidestr when appropriate.
* printcmd.c (print_formatted): Add wide-character support.
* configure.ac (AM_LANGINFO_CODESET): Add.
* acinclude.m4 (../config/codeset.m4): Include.
* config.in: Regenerate.
* configure: Regenerate.
Index: gdb/c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.55
diff -c -p -r1.55 c-valprint.c
*** gdb/c-valprint.c 3 Jan 2009 05:57:51 -0000 1.55
--- gdb/c-valprint.c 15 Jan 2009 20:10:38 -0000
*************** print_function_pointer_address (CORE_ADD
*** 59,70 ****
to TYPE should be printed as a textual string. Return non-zero if
it should, or zero if it should be treated as an array of integers
or pointer to integers. FORMAT is the current format letter,
! or 0 if none.
We guess that "char" is a character. Explicitly signed and
unsigned character types are also characters. Integer data from
vector types is not. The user can override this by using the /s
! format letter. */
static int
textual_element_type (struct type *type, char format)
--- 59,76 ----
to TYPE should be printed as a textual string. Return non-zero if
it should, or zero if it should be treated as an array of integers
or pointer to integers. FORMAT is the current format letter,
! or 0 if none. So that we can detect wchar_t strings, TYPE should
! *not* have been resolved using check_typedef before calling this
! function (in C, wchar_t would then appear to be a plain integer).
We guess that "char" is a character. Explicitly signed and
unsigned character types are also characters. Integer data from
vector types is not. The user can override this by using the /s
! format letter. The /s format letter can also be used to print arrays
! of 2- or 4-byte integers as wide character strings.
!
! If TYPE is named "wchar_t" (before looking through typedefs), and elements
! are of 2 or 4-byte integer type, detect as a wide-character string. */
static int
textual_element_type (struct type *type, char format)
*************** textual_element_type (struct type *type,
*** 80,89 ****
if (format == 's')
{
! /* Print this as a string if we can manage it. For now, no
! wide character support. */
if (TYPE_CODE (true_type) == TYPE_CODE_INT
! && TYPE_LENGTH (true_type) == 1)
return 1;
}
else
--- 86,96 ----
if (format == 's')
{
! /* Print this as a string if we can manage it. */
if (TYPE_CODE (true_type) == TYPE_CODE_INT
! && (TYPE_LENGTH (true_type) == 1
! || TYPE_LENGTH (true_type) == 2
! || TYPE_LENGTH (true_type) == 4))
return 1;
}
else
*************** textual_element_type (struct type *type,
*** 97,102 ****
--- 104,116 ----
return 1;
}
+ if (TYPE_NAME (type) && strcmp (TYPE_NAME (type), "wchar_t") == 0
+ && TYPE_CODE (true_type) == TYPE_CODE_INT
+ && (TYPE_LENGTH (true_type) == 2
+ || TYPE_LENGTH (true_type) == 4)
+ && !TYPE_NOTTEXT (true_type))
+ return 1;
+
return 0;
}
*************** c_val_print (struct type *type, const gd
*** 115,121 ****
{
unsigned int i = 0; /* Number of characters printed */
unsigned len;
! struct type *elttype;
unsigned eltlen;
LONGEST val;
CORE_ADDR addr;
--- 129,136 ----
{
unsigned int i = 0; /* Number of characters printed */
unsigned len;
! struct type *elttype, *unresolved_elttype;
! struct type *unresolved_type = type;
unsigned eltlen;
LONGEST val;
CORE_ADDR addr;
*************** c_val_print (struct type *type, const gd
*** 124,131 ****
switch (TYPE_CODE (type))
{
case TYPE_CODE_ARRAY:
! elttype = check_typedef (TYPE_TARGET_TYPE (type));
! if (TYPE_LENGTH (type) > 0 && TYPE_LENGTH (TYPE_TARGET_TYPE (type)) > 0)
{
eltlen = TYPE_LENGTH (elttype);
len = TYPE_LENGTH (type) / eltlen;
--- 139,147 ----
switch (TYPE_CODE (type))
{
case TYPE_CODE_ARRAY:
! unresolved_elttype = TYPE_TARGET_TYPE (type);
! elttype = check_typedef (unresolved_elttype);
! if (TYPE_LENGTH (type) > 0 && TYPE_LENGTH (unresolved_elttype) > 0)
{
eltlen = TYPE_LENGTH (elttype);
len = TYPE_LENGTH (type) / eltlen;
*************** c_val_print (struct type *type, const gd
*** 135,141 ****
}
/* Print arrays of textual chars with a string syntax. */
! if (textual_element_type (elttype, options->format))
{
/* If requested, look for the first null char and only print
elements up to it. */
--- 151,157 ----
}
/* Print arrays of textual chars with a string syntax. */
! if (textual_element_type (unresolved_elttype, options->format))
{
/* If requested, look for the first null char and only print
elements up to it. */
*************** c_val_print (struct type *type, const gd
*** 145,153 ****
/* Look for a NULL char. */
for (temp_len = 0;
! (valaddr + embedded_offset)[temp_len]
! && temp_len < len && temp_len < options->print_max;
! temp_len++);
len = temp_len;
}
--- 161,173 ----
/* Look for a NULL char. */
for (temp_len = 0;
! (temp_len < len
! && temp_len < options->print_max
! && extract_unsigned_integer (valaddr + embedded_offset
! + temp_len * eltlen,
! eltlen) != 0);
! temp_len++)
! ;
len = temp_len;
}
*************** c_val_print (struct type *type, const gd
*** 209,215 ****
print_function_pointer_address (addr, stream, options->addressprint);
break;
}
! elttype = check_typedef (TYPE_TARGET_TYPE (type));
{
addr = unpack_pointer (type, valaddr + embedded_offset);
print_unpacked_pointer:
--- 229,236 ----
print_function_pointer_address (addr, stream, options->addressprint);
break;
}
! unresolved_elttype = TYPE_TARGET_TYPE (type);
! elttype = check_typedef (unresolved_elttype);
{
addr = unpack_pointer (type, valaddr + embedded_offset);
print_unpacked_pointer:
*************** c_val_print (struct type *type, const gd
*** 228,236 ****
/* For a pointer to a textual type, also print the string
pointed to, unless pointer is null. */
- /* FIXME: need to handle wchar_t here... */
! if (textual_element_type (elttype, options->format)
&& addr != 0)
{
i = val_print_string (addr, -1, TYPE_LENGTH (elttype), stream,
--- 249,256 ----
/* For a pointer to a textual type, also print the string
pointed to, unless pointer is null. */
! if (textual_element_type (unresolved_elttype, options->format)
&& addr != 0)
{
i = val_print_string (addr, -1, TYPE_LENGTH (elttype), stream,
*************** c_val_print (struct type *type, const gd
*** 268,274 ****
}
else
{
! wtype = TYPE_TARGET_TYPE (type);
}
vt_val = value_at (wtype, vt_address);
common_val_print (vt_val, stream, recurse + 1, options,
--- 288,294 ----
}
else
{
! wtype = unresolved_elttype;
}
vt_val = value_at (wtype, vt_address);
common_val_print (vt_val, stream, recurse + 1, options,
*************** c_val_print (struct type *type, const gd
*** 442,448 ****
Since we don't know whether the value is really intended to
be used as an integer or a character, print the character
equivalent as well. */
! if (textual_element_type (type, options->format))
{
fputs_filtered (" ", stream);
LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
--- 462,468 ----
Since we don't know whether the value is really intended to
be used as an integer or a character, print the character
equivalent as well. */
! if (textual_element_type (unresolved_type, options->format))
{
fputs_filtered (" ", stream);
LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
Index: gdb/charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.16
diff -c -p -r1.16 charset.c
*** gdb/charset.c 3 Jan 2009 05:57:51 -0000 1.16
--- gdb/charset.c 15 Jan 2009 20:10:38 -0000
***************
*** 30,35 ****
--- 30,39 ----
#include <iconv.h>
#endif
+ #ifdef HAVE_LANGINFO_CODESET
+ #include <langinfo.h>
+ #endif
+
/* How GDB's character set support works
*************** struct translation {
*** 162,174 ****
#define GDB_DEFAULT_TARGET_CHARSET "ISO-8859-1"
#endif
static const char *host_charset_name = GDB_DEFAULT_HOST_CHARSET;
static void
show_host_charset_name (struct ui_file *file, int from_tty,
struct cmd_list_element *c,
const char *value)
{
! fprintf_filtered (file, _("The host character set is \"%s\".\n"), value);
}
static const char *target_charset_name = GDB_DEFAULT_TARGET_CHARSET;
--- 166,192 ----
#define GDB_DEFAULT_TARGET_CHARSET "ISO-8859-1"
#endif
+ #ifndef GDB_DEFAULT_TARGET_WIDE_CHARSET
+ #define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
+ #endif
+
+ #ifndef GDB_INTERNAL_CODESET
+ #define GDB_INTERNAL_CODESET "UCS-4LE"
+ #endif
+
static const char *host_charset_name = GDB_DEFAULT_HOST_CHARSET;
+ static int host_charset_auto = 1;
static void
show_host_charset_name (struct ui_file *file, int from_tty,
struct cmd_list_element *c,
const char *value)
{
! fprintf_filtered (file, _("The host character set is \"%s\""), value);
!
! if (host_charset_auto)
! fprintf_filtered (file, _(" (auto).\n"));
! else
! fputs_filtered (".\n", file);
}
static const char *target_charset_name = GDB_DEFAULT_TARGET_CHARSET;
*************** show_target_charset_name (struct ui_file
*** 180,190 ****
--- 198,217 ----
value);
}
+ static const char *target_wide_charset_name = GDB_DEFAULT_TARGET_WIDE_CHARSET;
+ static void
+ show_target_wide_charset_name (struct ui_file *file, int from_tty,
+ struct cmd_list_element *c, const char *value)
+ {
+ fprintf_filtered (file, _("The target wide character set is \"%s\".\n"),
+ value);
+ }
static const char *host_charset_enum[] =
{
"ASCII",
"ISO-8859-1",
+ "auto",
0
};
*************** static const char *target_charset_enum[]
*** 197,202 ****
--- 224,246 ----
0
};
+ static const char *target_wide_charset_enum[] =
+ {
+ "UCS-2",
+ "UCS-2LE",
+ "UCS-2BE",
+ "UCS-4",
+ "UCS-4LE",
+ "UCS-4BE",
+ "UTF-16",
+ "UTF-16LE",
+ "UTF-16BE",
+ "UTF-32",
+ "UTF-32LE",
+ "UTF-32BE",
+ 0
+ };
+
/* The global list of all the charsets GDB knows about. */
static struct charset *all_charsets;
*************** ebcdic_family_charset (const char *name)
*** 376,381 ****
--- 420,474 ----
#if defined(HAVE_ICONV)
+ /* Note: this is a stub. */
+
+ static int
+ iconv_char_print_literally (void *baton, int c)
+ {
+ return 1;
+ }
+
+ /* Note: this is a stub. */
+
+ static int
+ iconv_to_control (void *baton, int c, int *ctrl_char)
+ {
+ return 0;
+ }
+
+ /* Check charset is permitted by iconv, and return a "struct charset *"
+ representing it if so. Return NULL on failure. */
+ static struct charset *
+ lookup_and_register_iconv_charset (const char *name)
+ {
+ struct charset **ptr, *cs;
+ iconv_t probe;
+
+ /* On Solaris, identity conversions are apparently not permitted. Try two
+ probes: the first to GDB_INTERNAL_CODESET, the second from ASCII. If one
+ of these succeeds, we know that iconv supports charset NAME. */
+ probe = iconv_open (name, GDB_INTERNAL_CODESET);
+ if (probe == (iconv_t) -1)
+ probe = iconv_open ("ASCII", name);
+
+ if (probe == (iconv_t) -1)
+ {
+ warning (_("Invalid iconv character set `%s'."), name);
+
+ return NULL;
+ }
+
+ iconv_close (probe);
+
+ for (ptr = &all_charsets; *ptr; ptr = &(*ptr)->next)
+ if (! strcmp (name, (*ptr)->name))
+ return *ptr;
+
+ /* Warning: valid_host_charset == 1 isn't necessarily true. */
+ return simple_charset (xstrdup (name), 1, iconv_char_print_literally, NULL,
+ iconv_to_control, NULL);
+ }
+
struct cached_iconv {
struct charset *from, *to;
iconv_t i;
*************** default_c_parse_backslash (void *baton,
*** 575,580 ****
--- 668,688 ----
}
+ /* Similar to default_c_target_char_has_backslash_escape, but works on an
+ internal char in UCS-4. */
+ static const char *
+ default_c_internal_char_has_backslash_escape (unsigned long internal_char)
+ {
+ const char *ix;
+
+ ix = strchr (represented, internal_char);
+ if (ix)
+ return backslashed[ix - represented];
+ else
+ return NULL;
+ }
+
+
/* Convert using a cached iconv descriptor. */
static int
iconv_convert (void *baton, int from_char, int *to_char)
*************** simple_table_translation (const char *fr
*** 898,904 ****
/* The current host and target character sets. */
! static struct charset *current_host_charset, *current_target_charset;
/* The current functions and batons we should use for the functions in
charset.h. */
--- 1006,1013 ----
/* The current host and target character sets. */
! static struct charset *current_host_charset, *current_target_charset,
! *current_target_wide_charset, *internal_charset;
/* The current functions and batons we should use for the functions in
charset.h. */
*************** set_host_and_target_charsets (struct cha
*** 1041,1048 ****
static void
set_host_charset (const char *charset)
{
! struct charset *cs = lookup_charset_or_error (charset);
! check_valid_host_charset (cs);
set_host_and_target_charsets (cs, current_target_charset);
}
--- 1150,1183 ----
static void
set_host_charset (const char *charset)
{
! struct charset *cs;
!
! if (strcmp (charset, "auto") == 0)
! {
! const char *old_charset_name = host_charset_name;
! struct charset *old_charset = current_host_charset;
! #ifdef HAVE_LANGINFO_CODESET
! charset = nl_langinfo (CODESET);
! #else
! /* No nl_langinfo (CODESET). Fall back to default. */
! charset = GDB_DEFAULT_HOST_CHARSET;
! #endif
! host_charset_auto = 1;
! host_charset_name = charset;
! cs = lookup_and_register_iconv_charset (charset);
! if (!cs)
! {
! host_charset_auto = 0;
! host_charset_name = old_charset_name;
! cs = old_charset;
! }
! }
! else
! {
! cs = lookup_charset_or_error (charset);
! host_charset_auto = 0;
! check_valid_host_charset (cs);
! }
set_host_and_target_charsets (cs, current_target_charset);
}
*************** set_target_charset (const char *charset)
*** 1055,1060 ****
--- 1190,1203 ----
set_host_and_target_charsets (current_host_charset, cs);
}
+ static void
+ set_target_wide_charset (const char *charset)
+ {
+ struct charset *cs = lookup_and_register_iconv_charset (charset);
+
+ current_target_wide_charset = cs;
+ }
+
/* 'Set charset', 'set host-charset', 'set target-charset', 'show
charset' sfunc's. */
*************** set_target_charset_sfunc (char *charset,
*** 1087,1092 ****
--- 1230,1243 ----
set_target_charset (target_charset_name);
}
+ /* Wrapper for the 'set target-wide-charset' command. */
+ static void
+ set_target_wide_charset_sfunc (char *charset, int from_tty,
+ struct cmd_list_element *c)
+ {
+ set_target_wide_charset (target_wide_charset_name);
+ }
+
/* sfunc for the 'show charset' command. */
static void
show_charset (struct ui_file *file, int from_tty, struct cmd_list_element *c,
*************** show_charset (struct ui_file *file, int
*** 1103,1108 ****
--- 1254,1261 ----
fprintf_filtered (file, _("The current target character set is `%s'.\n"),
target_charset ());
}
+ fprintf_filtered (file, _("The current target wide character set is `%s'.\n"),
+ target_wide_charset ());
}
*************** target_charset (void)
*** 1120,1125 ****
--- 1273,1284 ----
return current_target_charset->name;
}
+ const char *
+ target_wide_charset (void)
+ {
+ return current_target_wide_charset->name;
+ }
+
/* Public character management functions. */
*************** target_char_to_host (int target_char, in
*** 1174,1179 ****
--- 1333,1516 ----
(target_char_to_host_baton, target_char, host_char));
}
+ /* Wide character support, via iconv. */
+
+ static struct cached_iconv cached_iconv_target_to_internal;
+ static struct cached_iconv cached_iconv_internal_to_host;
+
+ static iconv_t
+ target_to_internal_iconv_t (void)
+ {
+ check_iconv_cache (&cached_iconv_target_to_internal,
+ current_target_wide_charset,
+ internal_charset);
+
+ return cached_iconv_target_to_internal.i;
+ }
+
+ static iconv_t
+ internal_to_host_iconv_t (void)
+ {
+ check_iconv_cache (&cached_iconv_internal_to_host,
+ internal_charset,
+ current_host_charset);
+
+ return cached_iconv_internal_to_host.i;
+ }
+
+ void
+ reset_host_char_state (struct ui_file *stream)
+ {
+ char resetcode[200]; /* FIXME: Yuck, fixed-size buffer. */
+ size_t output_to_go = sizeof (resetcode), ret;
+ char *op = &resetcode[0];
+ iconv_t cd = internal_to_host_iconv_t ();
+
+ ret = iconv (cd, NULL, NULL, &op, &output_to_go);
+
+ if (ret != -1)
+ {
+ int i, reset_seq_length = sizeof (resetcode) - output_to_go;
+
+ for (i = 0; i < reset_seq_length; i++)
+ fputc_filtered (resetcode[i], stream);
+ }
+ }
+
+ /* Convert target bytes at *CP until we've read one code point in internal form
+ (UCS-4). Move *CP to the next input (multibyte) character. Returns the
+ converted character in *INTERN. Returns 0 on success, 1 on error. */
+
+ int
+ target_char_to_internal (unsigned long *intern, gdb_byte **cp)
+ {
+ char *ip = *cp;
+ char outbuf[4], *op;
+ size_t outbytesleft = sizeof (outbuf), ret, inbytes, probe_inbytes;
+ unsigned long internal = 0;
+ int i;
+ iconv_t cd = target_to_internal_iconv_t ();
+
+ probe_inbytes = 1;
+
+ *intern = 0;
+
+ while (outbytesleft != 0)
+ {
+ inbytes = probe_inbytes;
+ memset (outbuf, '\0', sizeof (outbuf));
+ ip = *cp;
+ op = &outbuf[0];
+ outbytesleft = sizeof (outbuf);
+
+ /* Reset conversion state. */
+ iconv (cd, NULL, NULL, NULL, NULL);
+ /* And do conversion. */
+ ret = iconv (cd, (ICONV_CONST char **) &ip, &inbytes, &op, &outbytesleft);
+
+ if (ret == (size_t) -1)
+ {
+ switch (errno)
+ {
+ case EILSEQ:
+ /* Illegal multibyte sequence -- give up. */
+ (*cp) += probe_inbytes;
+ return 1;
+
+ case EINVAL:
+ /* Incomplete multibyte sequence. Try converting a longer
+ one. */
+ probe_inbytes++;
+ break;
+
+ default:
+ /* Something else went wrong. */
+ error (_("GDB encountered unexpected `iconv' error."));
+ return 1;
+ }
+ }
+ }
+
+ /* Note: We explicitly use little-endian UCS-4 for our internal
+ representation, so that this gets the codepoint right. */
+ for (i = 0; i < 4; i++)
+ internal |= (unsigned char) outbuf[i] << (i * 8);
+
+ /* Move to next input char. */
+ *cp = ip;
+ *intern = internal;
+
+ return 0;
+ }
+
+ /* Return 0 on success, 1 on error. */
+
+ int
+ internal_char_host_emit (struct ui_file *stream, unsigned long codept)
+ {
+ char inbuf[4], *outbuf, *ip, *op;
+ static size_t outbufsize = 4;
+ size_t inbytesleft, rc, outbytesleft;
+ int i, converted;
+ iconv_t cd = internal_to_host_iconv_t ();
+
+ /* Handle control characters, etc. specially. Hm, this is C-specific. */
+ if (codept < 32 || codept == 127)
+ {
+ const char *esc = default_c_internal_char_has_backslash_escape (codept);
+
+ if (esc)
+ fprintf_filtered (stream, "\\%s", esc);
+ else
+ fprintf_filtered (stream, "\\%.3lo", codept);
+
+ return 0;
+ }
+
+ for (i = 0; i < 4; i++)
+ {
+ inbuf[i] = codept & 255;
+ codept >>= 8;
+ }
+
+ outbuf = xmalloc (outbufsize);
+
+ while (1)
+ {
+ ip = &inbuf[0];
+ op = outbuf;
+ inbytesleft = 4;
+ outbytesleft = outbufsize;
+ /* Reset conversion state. */
+ iconv (cd, NULL, NULL, NULL, NULL);
+ /* Attempt conversion. */
+ rc = iconv (cd, (ICONV_CONST char **) &ip, &inbytesleft, &op,
+ &outbytesleft);
+
+ if (rc != (size_t) -1)
+ break;
+
+ if (errno == E2BIG)
+ {
+ outbufsize *= 2;
+ outbuf = xrealloc (outbuf, outbufsize);
+ }
+ else
+ break;
+ }
+
+ converted = outbufsize - outbytesleft;
+
+ if (inbytesleft != 0 || converted == 0 || rc > 0)
+ return 1;
+
+ for (i = 0; i < converted; i++)
+ fputc_filtered (outbuf[i], stream);
+
+ free (outbuf);
+
+ return 0;
+ }
/* The charset.c module initialization function. */
*************** _initialize_charset (void)
*** 1231,1236 ****
--- 1568,1576 ----
set_host_charset (host_charset_name);
set_target_charset (target_charset_name);
+ set_target_wide_charset (target_wide_charset_name);
+
+ internal_charset = lookup_and_register_iconv_charset (GDB_INTERNAL_CODESET);
add_setshow_enum_cmd ("charset", class_support,
host_charset_enum, &host_charset_name, _("\
*************** To see a list of the character sets GDB
*** 1271,1274 ****
--- 1611,1628 ----
set_target_charset_sfunc,
show_target_charset_name,
&setlist, &showlist);
+
+ target_wide_charset_name = xstrdup (GDB_DEFAULT_TARGET_WIDE_CHARSET);
+
+ add_setshow_enum_cmd ("target-wide-charset", class_support,
+ target_wide_charset_enum, &target_wide_charset_name,
+ _("\
+ Set the target wide character (wchar_t) character set."), _("\
+ Show the target wide character (wchar_t) character set."), _("\
+ The `target wide character set' is the one used by the program being\n\
+ debugged for wide characters, e.g. literal wchar_t strings."),
+ set_target_wide_charset_sfunc,
+ show_target_wide_charset_name,
+ &setlist, &showlist);
+
}
Index: gdb/charset.h
===================================================================
RCS file: /cvs/src/src/gdb/charset.h,v
retrieving revision 1.7
diff -c -p -r1.7 charset.h
*** gdb/charset.h 3 Jan 2009 05:57:51 -0000 1.7
--- gdb/charset.h 15 Jan 2009 20:10:38 -0000
***************
*** 49,54 ****
--- 49,55 ----
it. */
const char *host_charset (void);
const char *target_charset (void);
+ const char *target_wide_charset (void);
/* In general, the set of C backslash escapes (\n, \f) is specific to
the character set. Not all character sets will have form feed
*************** int target_char_to_host (int target_char
*** 103,107 ****
--- 104,117 ----
zero. */
int target_char_to_control_char (int target_char, int *target_ctrl_char);
+ /* Wide character support: reset terminal state. */
+ void reset_host_char_state (struct ui_file *stream);
+
+ /* Wide character support: convert target character to internal form. */
+ int target_char_to_internal (unsigned long *, gdb_byte **cp);
+
+ /* Wide character support: emit character in internal form to host output
+ stream. */
+ int internal_char_host_emit (struct ui_file *stream, unsigned long codept);
#endif /* CHARSET_H */
Index: gdb/c-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/c-lang.c,v
retrieving revision 1.60
diff -c -p -r1.60 c-lang.c
*** gdb/c-lang.c 3 Jan 2009 05:57:51 -0000 1.60
--- gdb/c-lang.c 15 Jan 2009 20:10:38 -0000
*************** c_printchar (int c, struct ui_file *stre
*** 78,83 ****
--- 78,213 ----
fputc_filtered ('\'', stream);
}
+ void
+ c_internal_char_host_emit (struct ui_file *stream, unsigned long codept)
+ {
+ int err;
+
+ err = internal_char_host_emit (stream, codept);
+
+ /* Some error occurred before printing anything. NOTE: This can cause
+ ambiguity in the displayed output. Not sure what to do about that. */
+ if (err)
+ fprintf_filtered (stream, "\\x%.4lx", codept);
+ }
+
+ /* Convert wchar_t elements (of WIDTH bytes each) from target memory to
+ internal form (a buffer of PRINT_MAX such elements) -- UCS-4 code points in
+ host endianness. Perform repeated character detection on this buffer --
+ allowing extension in case more characters are repeated. If a break in
+ repetition is detected, emit elements (in internal form) to the output
+ stream, in the host charset.
+ Don't print more than LENGTH target elements.
+ Note: WIDTH is currently ignored. */
+
+ void
+ c_printwidestr (struct ui_file *stream, const gdb_byte *string,
+ unsigned int length, int width, int force_ellipses,
+ const struct value_print_options *options)
+ {
+ unsigned long *buffer;
+ int buf_read_idx = 0, buf_write_idx = 0, repeat_starts_at = 0;
+ gdb_byte *sp = (gdb_byte *) string;
+ unsigned long repeating_char = -1u;
+ int repeat_count = 0, endpoint;
+ int in_quotes = 0, need_comma = 0, found_terminator = 0, any_errs = 0;
+ unsigned int buf_length = options->print_max + 1, things_printed = 0;
+
+ buffer = xmalloc (sizeof (long) * buf_length);
+
+ /* Most likely this is not necessary. */
+ reset_host_char_state (stream);
+
+ while (!found_terminator || buf_read_idx != buf_write_idx)
+ {
+ int err = target_char_to_internal (&buffer[buf_write_idx], &sp);
+
+ any_errs |= err;
+
+ if (need_comma)
+ {
+ fputs_filtered (", ", stream);
+ need_comma = 0;
+ }
+
+ if (buffer[buf_write_idx] == repeating_char && !found_terminator)
+ repeat_count++;
+ else
+ {
+ int repeating_tail = repeat_count > options->repeat_count_threshold;
+ int nonrepeating_end = repeating_tail ? repeat_starts_at
+ : buf_write_idx;
+ int nonrepeating_head = nonrepeating_end > buf_read_idx;
+
+ if (!in_quotes && nonrepeating_head)
+ {
+ if (options->inspect_it)
+ fputs_filtered ("\\\"", stream);
+ else
+ fputs_filtered ("\"", stream);
+ in_quotes = 1;
+ }
+
+ while (buf_read_idx < nonrepeating_end)
+ {
+ c_internal_char_host_emit (stream, buffer[buf_read_idx++]);
+ things_printed++;
+ }
+
+ if (repeating_tail)
+ {
+ if (in_quotes)
+ {
+ if (options->inspect_it)
+ fputs_filtered ("\\\", ", stream);
+ else
+ fputs_filtered ("\", ", stream);
+ in_quotes = 0;
+ }
+
+ fputc_filtered ('\'', stream);
+ c_internal_char_host_emit (stream, repeating_char);
+ fputc_filtered ('\'', stream);
+
+ fprintf_filtered (stream, _(" <repeats %u times>"), repeat_count);
+ buf_read_idx = buf_write_idx;
+
+ things_printed += repeat_count;
+
+ need_comma = 1;
+ }
+
+ repeating_char = buffer[buf_write_idx];
+ repeat_starts_at = buf_write_idx;
+ repeat_count = 1;
+ }
+
+ if (buf_write_idx < length && things_printed < options->print_max && !err)
+ buf_write_idx++;
+ else
+ found_terminator = 1;
+ }
+
+ if (in_quotes)
+ {
+ if (options->inspect_it)
+ fputs_filtered ("\\\"", stream);
+ else
+ fputs_filtered ("\"", stream);
+ }
+
+ /* Most likely this is not necessary. */
+ reset_host_char_state (stream);
+
+ if (any_errs)
+ fputs_filtered ("<character conversion error>", stream);
+
+ if (force_ellipses || buf_write_idx < length)
+ fputs_filtered ("...", stream);
+
+ free (buffer);
+ }
+
/* Print the character string STRING, printing at most LENGTH characters.
LENGTH is -1 if the string is nul terminated. Each character is WIDTH bytes
long. Printing stops early if the number hits print_max; repeat counts are
*************** c_printstr (struct ui_file *stream, cons
*** 109,114 ****
--- 239,250 ----
return;
}
+ if (width > 1)
+ {
+ c_printwidestr (stream, string, length, width, force_ellipses, options);
+ return;
+ }
+
for (i = 0; i < length && things_printed < options->print_max; ++i)
{
/* Position of the character we are examining
Index: gdb/printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.141
diff -c -p -r1.141 printcmd.c
*** gdb/printcmd.c 3 Jan 2009 05:57:53 -0000 1.141
--- gdb/printcmd.c 15 Jan 2009 20:10:41 -0000
*************** print_formatted (struct value *val, int
*** 269,279 ****
switch (options->format)
{
case 's':
! /* FIXME: Need to handle wchar_t's here... */
! next_address = VALUE_ADDRESS (val)
! + val_print_string (VALUE_ADDRESS (val), -1, 1, stream,
! options);
! return;
case 'i':
/* We often wrap here if there are long symbolic names. */
--- 269,293 ----
switch (options->format)
{
case 's':
! {
! struct type *elttype = TYPE_TARGET_TYPE (type)
! ? check_typedef (TYPE_TARGET_TYPE (type))
! : NULL;
! unsigned eltlen = 1;
!
! /* If this is a plausible string of wide characters, try to print
! it as such. */
! if (TYPE_CODE (type) == TYPE_CODE_PTR
! && elttype
! && TYPE_CODE (elttype) == TYPE_CODE_INT
! && (TYPE_LENGTH (elttype) == 2 || TYPE_LENGTH (elttype) == 4))
! eltlen = TYPE_LENGTH (elttype);
!
! next_address = VALUE_ADDRESS (val)
! + val_print_string (VALUE_ADDRESS (val), -1, eltlen, stream,
! options);
! return;
! }
case 'i':
/* We often wrap here if there are long symbolic names. */
Index: gdb/configure.ac
===================================================================
RCS file: /cvs/src/src/gdb/configure.ac,v
retrieving revision 1.84
diff -c -p -r1.84 configure.ac
*** gdb/configure.ac 12 Jan 2009 01:10:27 -0000 1.84
--- gdb/configure.ac 15 Jan 2009 20:10:42 -0000
*************** AC_DEFINE(GDB_DEFAULT_HOST_CHARSET, "ISO
*** 1913,1918 ****
--- 1913,1920 ----
AM_ICONV
+ AM_LANGINFO_CODESET
+
AC_OUTPUT(Makefile .gdbinit:gdbinit.in gnulib/Makefile,
[
dnl Autoconf doesn't provide a mechanism for modifying definitions
Index: gdb/acinclude.m4
===================================================================
RCS file: /cvs/src/src/gdb/acinclude.m4,v
retrieving revision 1.24
diff -c -p -r1.24 acinclude.m4
*** gdb/acinclude.m4 3 Jan 2009 05:57:50 -0000 1.24
--- gdb/acinclude.m4 15 Jan 2009 20:10:42 -0000
*************** sinclude(../config/acx.m4)
*** 23,28 ****
--- 23,31 ----
dnl for TCL definitions
sinclude(../config/tcl.m4)
+ dnl for langinfo check
+ sinclude(../config/codeset.m4)
+
dnl For dependency tracking macros.
sinclude([../config/depstand.m4])