This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


printing 0xbeef wchar_t on x86-windows...

From: Joel Brobecker <brobecker at adacore dot com>
To: gdb-patches at sourceware dot org
Cc: Tom Tromey <tromey at redhat dot com>
Date: Mon, 15 Oct 2012 12:00:52 -0700
Subject: printing 0xbeef wchar_t on x86-windows...

Hello,

I have a variable of type wchar_t whose value is 0xbeef, simply
defined as follows:

    wchar_t single = 0xbeef;

But with the current HEAD, I get:

    (gdb) print single
    $5 = 48879 L'\357'

In chronological order:

  * valprint.c:generic_emit_char calls wchar_iterate, and finds
    one valid character according to the intermediate encoding
    ("wchar_t"), even though the character isn't valid in the
    original/target charset ("CP1252").

  * valprint.c:print_wchar then checks whether the character is
    printable or not. If it weren't, print_wchar would have
    converted the multi-byte sequence into a hex string image.
    But unfortunately for us, Windows' iswprint considers 0xbeef
    printable, and so print_wchar puts it in the buffer as is, to
    be printed.

  * Before actually printing the buffer, generic_emit_char converts
    the string from the intermediate encoding into the host encoding,
    which is "CP1252". The conversion routine now finds that,
    although the multi-byte sequence is printable, it isn't valid
    in the target encoding (iconv returns EILSEQ), and thus
    replaces the wchar by a string with a sequence of octal numbers,
    one for each byte. For instance \357 is 0xef.

    But the problem is that convert_between_encodings was called
    with the width set to 1, instead of using the character type's
    size.
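
To illustrate the effect of the width argument, here is a minimal
sketch. It is not GDB's actual code: escape_invalid_char is a
hypothetical helper that emits one octal escape per byte for the first
"width" bytes of a character that could not be converted. It assumes
the 2-byte wchar_t 0xbeef is stored little-endian, as on x86-windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of GDB: write one octal escape
   (backslash plus three octal digits) per byte for the first WIDTH
   bytes of BYTES into OUT.  Returns the number of characters written,
   not counting the terminating NUL.  */
static size_t
escape_invalid_char (const unsigned char *bytes, size_t width,
                     char *out, size_t out_size)
{
  size_t used = 0;
  size_t i;

  assert (out != NULL && out_size > 0);
  out[0] = '\0';

  for (i = 0; i < width; i++)
    {
      /* Each escape needs 4 characters plus the trailing NUL.  */
      if (out_size - used < 5)
        break;
      used += (size_t) snprintf (out + used, out_size - used,
                                 "\\%03o", (unsigned int) bytes[i]);
    }

  return used;
}
```

With the bytes { 0xef, 0xbe }, a width of 1 yields \357 only, while a
width of 2 yields \357\276.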

With the attached patch, we now get the following output...

    (gdb) print single
    $2 = 48879 L'\357\276'

... which is no longer missing half of the wide character value.

For completeness' sake, GDB 7.5 used to produce the following output:

    (gdb) print single
    $2 = 48879 L'\xbeef'

I prefer this output, as it provides the wide character as one number,
rather than two. GDB 7.5 presented the value this way because it took
a different path during the initial iteration, thanks to the fact that
the intermediate encoding was "CP1252" instead of "wchar_t", making
the character invalid the whole way. The switch to "wchar_t" comes from
a change in defs.h which added an include of build-gnulib/config.h,
which itself caused HAVE_WCHAR_H to be defined, thus influencing
the intermediate encoding.

I have a feeling that going back to "CP1252" as the intermediate
encoding isn't something that we'd like to do. What I explored for
a while was the idea of having convert_between_encodings transform
invalid sequences into one single number, the same way print_wchar
does. But I think that there is an endianness issue - not sure -
as we don't really know whether the buffer is following the target
or host endianness. We need that piece of info in order to extract
the wide character's value.
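
As a rough sketch of why the byte order matters here, rebuilding a
single number from the raw bytes could look like the following.
extract_wide_char is a hypothetical helper for illustration, not an
existing GDB function, and the big_endian flag stands for exactly the
piece of information that is missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GDB: combine the first WIDTH bytes
   of BYTES into one integer.  BIG_ENDIAN is non-zero when the most
   significant byte is stored first.  */
static unsigned long
extract_wide_char (const unsigned char *bytes, size_t width,
                   int big_endian)
{
  unsigned long value = 0;
  size_t i;

  assert (width <= sizeof (unsigned long));

  for (i = 0; i < width; i++)
    {
      /* Always consume bytes from most to least significant.  */
      size_t idx = big_endian ? i : width - 1 - i;

      value = (value << 8) | bytes[idx];
    }

  return value;
}
```

For the bytes { 0xef, 0xbe }, reading them as little-endian gives
0xbeef, whereas reading them as big-endian would give 0xefbe.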

Nonetheless, I think that this can be looked at separately if desired.
In the meantime, the following patch updates the calls to
convert_between_encodings to pass the correct width, and the new
output is already an improvement. So I think that the attached
patch is worth checking in on its own.

gdb/ChangeLog:

        * valprint.c (generic_emit_char): Pass correct width in call to
        convert_between_encodings.
        (generic_printstr): Likewise.

Tested on x86-linux. OK to commit?

Thanks,
-- 
Joel

diff --git a/gdb/valprint.c b/gdb/valprint.c
index 6e651f6..31cef54 100644
--- a/gdb/valprint.c
+++ b/gdb/valprint.c
@@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     TYPE_LENGTH (type), &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);
@@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     width, &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);

Follow-Ups:
- Re: printing 0xbeef wchar_t on x86-windows...
  - From: Eli Zaretskii
- Re: printing 0xbeef wchar_t on x86-windows...
  - From: Tom Tromey
