This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: printing 0xbeef wchar_t on x86-windows...


>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel>   * valprint.c:generic_emit_char calls wchar_iterate, and finds
Joel>     one valid character according to the intermediate encoding
Joel>     ("wchar_t"), even though the character isn't valid in the
Joel>     original/target charset ("CP1252").

FWIW I think Eli's analysis here is correct.

generic_emit_char should be assuming that the character is in the target
wide charset, not in the target charset.  That is, "show
target-wide-charset".

If the 'encoding' argument to generic_emit_char is "CP1252" then I think
something went wrong earlier.

Joel>   * Before actually printing the buffer, generic_emit_char converts
Joel>     the string from the intermediate encoding into the host encoding,
Joel>     which is "CP1252". The conversion routine now finds that,
Joel>     although the multi-byte sequence is printable, it isn't valid
Joel>     in the target encoding (iconv returns EILSEQ), and thus

Must be the host encoding here, not the target encoding?

Joel>     But the problem is that convert_between_encodings was called
Joel>     with the width set to 1, instead of using the character type's
Joel>     size.

This does seem wrong.  But, I don't think that using the type length
here is correct, either.

The width argument to convert_between_encodings is documented as:

   WIDTH is the width of a character from the FROM charset, in bytes.
   For a variable width encoding, WIDTH should be the size of a "base
   character".

(I didn't check whether this comment is accurate.)

And, this call to convert_between_encodings is converting from the
intermediate charset to the host charset.  So, I think this should be
sizeof (gdb_wchar_t).
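
To illustrate why the width matters, here is a minimal sketch, not GDB's
actual code.  The helper escape_units is hypothetical, and it assumes the
unconvertible bytes are rendered as hex escapes with each WIDTH-byte
group read as one little-endian unit.  GDB's real escape format may
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GDB code: render LEN bytes from BUF as hex
   escapes, treating every WIDTH bytes as one little-endian character.
   Returns the number of characters written to OUT (excluding the NUL),
   or -1 if OUT is too small or LEN is not a multiple of WIDTH.  */
static int
escape_units (const unsigned char *buf, size_t len, size_t width,
              char *out, size_t outsz)
{
  size_t used = 0;

  assert (width > 0 && width <= sizeof (unsigned long));
  if (len % width != 0 || outsz == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; i < len; i += width)
    {
      unsigned long value = 0;

      /* Assemble one unit, least significant byte first.  */
      for (size_t j = width; j > 0; j--)
        value = (value << 8) | buf[i + j - 1];

      /* Emit the unit as a single escape, two hex digits per byte.  */
      int n = snprintf (out + used, outsz - used, "\\x%0*lx",
                        (int) (2 * width), value);
      if (n < 0 || (size_t) n >= outsz - used)
        return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

Given the two bytes 0xef 0xbe (0xbeef stored little-endian in a 2-byte
wide character), a width of 1 produces two escapes, one per byte, while a
width of 2 produces the single escape \xbeef.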

Before putting something like that in, though, I would like to look at
Keith's pending patch that reworks this code.  Maybe he already fixed
the bug.

Also, I think this should have a regression test.

Joel> For completeness' sake, GDB 7.5 used to produce the following output:
Joel>     (gdb) print single
Joel>     $2 = 48879 L'\xbeef'
Joel> I prefer this output, as it provides the wide character as one number,
Joel> rather than two.

Offhand I agree, but my recollection is that all these little
decisions have some logic behind them (though sometimes just "that's how
it used to work"), and so you have to dig down to see what the change
would really imply.

Joel> The reason why GDB 7.5 presented the value this way
Joel> is because it took a different path during the initial iteration, thanks
Joel> to the fact that the intermediate encoding was "CP1252" instead of
Joel> "wchar_t", making the character invalid the whole way. This comes from
Joel> a change in defs.h which added an include of build-gnulib/config.h,
Joel> which itself caused HAVE_WCHAR_H to be defined, thus influencing
Joel> the intermediate encoding.

This area is quite fiddly, unfortunately.

It sounds like the recent gnulib imports have invalidated some of the
logic in gdb_wchar.h.  It seems that we can now always rely on wchar.h
being available.  So maybe we could at least remove some configury and
#ifs.

Tom

