This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: printing wchar_t*


> From:  Vladimir Prus <ghost@cs.msu.su>
> Date:  Fri, 14 Apr 2006 10:10:19 +0400
> 
> The problem is that I don't see any way how gdb can print wchar_t in a way
> that does not require post-processing. It can print it as UTF8, but then
> for printing char* gdb should use local 8 bit encoding, which is likely to
> be *not* UTF8.

You are talking about a GUI front-end, aren't you?  In that case, you
will need to code a routine that accepts a wchar_t string, and then
_displays_ it using the appropriate font.  It is wrong to talk about
``printing'' it and about ``local 8-bit encoding'', because you don't
want to encode it, you want to display it using the appropriate font.

In particular, if the original wchar_t uses Unicode codepoints, then
presumably there should be some GUI API call, specific to your
windowing system, that would accept such a wchar_t string and display
it using a Unicode font.

So if you are going to do this in the front-end, I think all you need
to do is ask GDB to supply the wchar_t string in array notation; the
rest will have to be done inside the front-end.  Am I missing
something?
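
To illustrate the front-end side of this, here is a minimal sketch in
C.  It assumes the front-end has asked GDB for the string as an array
of integers (for example with the `@' operator, as in
"print/d *ws@len", where `ws' and `len' are hypothetical names) and has
received brace-delimited text such as "{104, 105, 8364}".  The helper
name parse_code_units is made up for this example, and the sketch does
not handle any elision or repeat markers that GDB may emit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical front-end helper: parse text such as "{104, 105, 8364}"
   (the assumed shape of an integer array as printed by GDB) into an
   array of code values.  Returns the number of values stored, which
   is at most max.  */
size_t parse_code_units(const char *text, unsigned long *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0' && n < max) {
        char *end;
        /* Base 0 accepts decimal, octal and 0x-prefixed hex.  */
        unsigned long v = strtoul(p, &end, 0);

        if (end == p) {
            p++;            /* no number here: skip '{', ',' or '}' */
        } else {
            out[n++] = v;   /* one code value per array element */
            p = end;
        }
    }
    return n;
}
```

The resulting integers can then be handed to whatever text-drawing
call the windowing system provides; no 8-bit encoding is involved.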

> Gdb can probably use some extra markers for values: like:
> 
>    "foo"  for string in local 8-bit encoding
>    L"foo" for string in UTF8 encoding.
> 
> It's also possible to use "\u" escapes.

Why do you need any of these?  16-bit Unicode characters are just
integers, so ask GDB to send them as integers.  That should be all you
need, since displaying them is something your FE will need to do
itself, no?

> But then there's a problem:
> 
>    - Do we assume that wchar_t is always UTF-16 or UTF-32?

You don't need to assume; you can ask the application.  Wouldn't
"sizeof(wchar_t)" do the trick?
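
As a rough sketch of that idea: the front-end could have GDB evaluate
"print sizeof(wchar_t)" in the program being debugged and map the
answer to a guess about the code-unit width.  The enum and function
names below are hypothetical, and the mapping is only a heuristic; the
C standard does not promise that a 2-byte wchar_t holds UTF-16 or that
a 4-byte one holds UTF-32.

```c
#include <stddef.h>

/* Hypothetical classification of the wide-character form.  */
enum wide_form { WIDE_UNKNOWN, WIDE_UTF16, WIDE_UTF32 };

/* Guess the form from the size of wchar_t in bytes, as reported by
   the debugged program.  This is a heuristic, not a guarantee.  */
enum wide_form guess_wide_form(size_t wchar_size)
{
    switch (wchar_size) {
    case 2:
        return WIDE_UTF16;
    case 4:
        return WIDE_UTF32;
    default:
        return WIDE_UNKNOWN;
    }
}
```

The size must come from the debugged program, not from the front-end's
own build, since the two may have been compiled for different targets.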

>      - how user-specified encoding will be handled

wchar_t is not an encoding; it holds the character codes themselves.
Encoded characters are (in general) multibyte character strings, not
wchar_t.  See, for example, the description of library functions
mbsinit, mbrlen, mbrtowc, etc., for more about this distinction.
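
The distinction can be seen in a small sketch that uses mbrtowc to
turn a multibyte string (in the current locale's encoding) into
wchar_t values, one character at a time.  The function name
decode_multibyte is made up for this example; only the standard
library calls are real.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Decode a multibyte string, encoded per the current locale, into
   wide characters.  Returns the number of wide characters stored (at
   most max), or (size_t)-1 if the input is not a valid multibyte
   string.  */
size_t decode_multibyte(const char *mbs, wchar_t *out, size_t max)
{
    mbstate_t state;
    size_t left = strlen(mbs);
    size_t n = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */

    while (left > 0 && n < max) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, mbs, left, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (used == 0)
            break;                      /* reached a null character */

        out[n++] = wc;                  /* one character code, not bytes */
        mbs += used;                    /* used = bytes consumed */
        left -= used;
    }
    return n;
}
```

Several input bytes may collapse into a single wchar_t here, which is
the point: the multibyte string is the encoded form, and the wchar_t
array is the sequence of character codes.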

