This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Re: printing wchar_t*

From: Vladimir Prus <ghost at cs dot msu dot su>
To: Eli Zaretskii <eliz at gnu dot org>
Cc: gdb at sources dot redhat dot com
Date: Fri, 14 Apr 2006 18:37:25 +0400
Subject: Re: printing wchar_t*
References: <e1lsqg$aml$1@sea.gmane.org> <200604141257.41690.ghost@cs.msu.su> <uu08w1cnf.fsf@gnu.org>

On Friday 14 April 2006 17:59, Eli Zaretskii wrote:

> > In an original post, I've asked if gdb can print wchar_t just as a raw
> > sequence of values, like this:
> >
> >     0x56, 0x1456
>
> The answer is YES.  Use array notation, and add a feature to report
> the length of a wchar_t array.

Ok.

> Now, the same letter ``small a'' can be encoded in several other ways:
> for example, its ISO-2022-7bit encoding is 0x1B 0x24 0x2C 0x31 0x28
> 0x50, its KOI8-r encoding is 0xC1, its ISO-8859-5 encoding is 0xD0,
> etc.  It should be obvious that, of all the encodings, only the
> fixed-length ones can be used in a wchar_t array (because wchar_t
> arrays are stateless, 

I don't think this statement is backed up by anything.

> This is why I said that wchar_t is not used for an encoding (such as
> ISO-8859-5 or UTF-8 or UTF-16), but for characters' codepoints.  It is
> nowadays almost universally accepted that wchar_t is a Unicode
> codepoint, 

Again, can you provide any specific pointers to support that view?

> the only difference between applications being whether only 
> the first 64K characters (the so-called BMP) are supported by 16-bit
> wchar_t, or the entire 23-bit range is supported by a 32-bit wchar_t.

I believe that on Windows:

- wchar_t is 16-bit
- wchar_t* values are supposed to be in UTF-16 encoding
  (see
  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp)

Do you disagree with either of the above statements? If not, then it directly 
follows that a given wchar_t is not a Unicode code point, but a code unit in a 
specific representation (UTF-16), and a given code point takes either one or 
two code units, that is, either one or two wchar_t. This is contrary to your 
statement that wchar_t is a single code point.
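
To make the code unit versus code point distinction concrete, here is a minimal
sketch in C. It is only an illustration, not anything from gdb: the helper names
(next_code_point, count_code_points) are made up for this message, and uint16_t
is used as a stand-in for a 16-bit wchar_t so that the behaviour does not depend
on the platform. It assumes the buffer holds UTF-16 and passes unpaired
surrogates through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uint16_t stands in for a 16-bit wchar_t (an assumption for portability). */

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: decode the code point that starts at units[*pos]
   and advance *pos by one or two code units.  An unpaired surrogate is
   returned as-is and consumes a single unit.  */
static uint32_t next_code_point(const uint16_t *units, size_t len, size_t *pos)
{
    assert(units != NULL && *pos < len);

    uint16_t hi = units[(*pos)++];
    if (is_high_surrogate(hi) && *pos < len && is_low_surrogate(units[*pos])) {
        uint16_t lo = units[(*pos)++];
        /* Combine the two 10-bit halves and add the 0x10000 offset.  */
        return 0x10000u
               + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    }
    return hi;
}

/* Hypothetical helper: count code points in a buffer of len code units.  */
static size_t count_code_points(const uint16_t *units, size_t len)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        (void)next_code_point(units, len, &pos);
        n++;
    }
    return n;
}
```

With the three code units 0x0056, 0xD834, 0xDD1E this yields two code points,
U+0056 and U+1D11E, so the number of wchar_t elements and the number of
characters differ as soon as anything outside the BMP is involved.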

Anyway, this is quickly getting off-topic for the gdb list, so maybe we should 
take this somewhere else.

- Volodya

Follow-Ups:
- Re: printing wchar_t*
  - From: Eli Zaretskii

References:
- printing wchar_t*
  - From: Vladimir Prus
- Re: printing wchar_t*
  - From: Vladimir Prus
- Re: printing wchar_t*
  - From: Eli Zaretskii
