This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.

Re: printing wchar_t*

From: Eli Zaretskii <eliz at gnu dot org>
To: Vladimir Prus <ghost at cs dot msu dot su>
Cc: pkoning at equallogic dot com, gdb at sources dot redhat dot com
Date: Fri, 14 Apr 2006 20:10:29 +0300
Subject: Re: printing wchar_t*
References: <e1lsqg$aml$1@sea.gmane.org> <17471.42725.651176.368871@gargle.gargle.HOWL> <uodz41b8l.fsf@gnu.org> <200604141850.08495.ghost@cs.msu.su>
Reply-to: Eli Zaretskii <eliz at gnu dot org>

> From: Vladimir Prus <ghost@cs.msu.su>
> Date: Fri, 14 Apr 2006 18:50:07 +0400
> Cc: Paul Koning <pkoning@equallogic.com>,  gdb@sources.redhat.com
> 
> > You could use wchar_t arrays for that, but then not every array
> > element will be a full character, and you will not be able to access
> > individual characters by their positional index.
> 
> So what? Even if wchar_t is 32 bits, the element at position 'i' can be a
> combining character that modifies another character and is of little use
> by itself.

You are introducing into the argument yet another aspect of a
character: how it is displayed.  It's true that some characters, when
they are adjacent to each other, are displayed in some special way (the
ff ligature is one simple example of that), but that is something for
the rendering engine to take care of; it has nothing to do with the
string's content.  As far as any software, except the rendering
engine, is concerned, the combining character is, in fact, part of the
string.  For example, if the user wants to search for such a
character, the program must find it.

So, for the purposes of processing wchar_t strings, it is very
important to know whether they hold fixed-size wide characters or a
variable-size encoding.  If you just copy the string verbatim to and
fro, then it doesn't matter, but for anything more complex the
difference is very large.
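
To illustrate the difference, here is a minimal sketch in C.  With
fixed-size UCS-4 elements, the number of characters is simply the
number of array elements; with UTF-16, one has to walk the array and
treat each surrogate pair as a single character.  The function name
utf16_count_chars is hypothetical and is not part of GDB, and the
sketch assumes 16-bit code units in host byte order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GDB function: count the characters
   (code points) in a UTF-16 buffer of N_UNITS 16-bit code units.
   A well-formed surrogate pair counts as one character; an unpaired
   surrogate is counted as one character as well.  */
static size_t
utf16_count_chars (const uint16_t *s, size_t n_units)
{
  size_t i = 0, count = 0;

  while (i < n_units)
    {
      /* High surrogate followed by a low surrogate: two code units
         encode one character.  */
      if (s[i] >= 0xD800 && s[i] <= 0xDBFF
          && i + 1 < n_units
          && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        i += 2;
      else
        i += 1;
      count++;
    }
  return count;
}
```

For the four code units 0x0041 0xD834 0xDD1E 0x0042 this yields 3
characters, so the element at index 2 is only the second half of a
character, not a character of its own.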

> > If we want to support wchar_t arrays that store UTF-16, we will need
> > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > codepoints, and output those.  
> 
> That's what I mentioned in a reply to Jim -- since the current string printing 
> code operates "one wchar_t at a time", it's not suitable for outputting UTF-16 
> encoded wchar_t values to the user.

I don't understand: if the wchar_t array holds a UTF-16 encoding, then
when you receive the entire string, you have a UTF-16 encoding of what
you want to display, and you yourself said that displaying a UTF-16
encoded string is easy for you.  So where is the problem?  Is it only
that you cannot know the length of the UTF-16 encoded string?  Or is
there something else missing?
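
For reference, the conversion from UTF-16 to full UCS-4 codepoints
discussed in this thread could look roughly like the sketch below.
This is only an illustration under stated assumptions, not existing
GDB code: the name utf16_to_ucs4 is hypothetical, the input is assumed
to be 16-bit code units in host byte order, the caller is assumed to
provide an output buffer with room for N_UNITS elements, and replacing
an unpaired surrogate with U+FFFD is just one possible policy:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GDB function: decode N_UNITS UTF-16
   code units from IN into UCS-4 codepoints in OUT.  OUT must have
   room for at least N_UNITS elements.  Returns the number of
   codepoints written.  */
static size_t
utf16_to_ucs4 (const uint16_t *in, size_t n_units, uint32_t *out)
{
  size_t i = 0, n_out = 0;

  while (i < n_units)
    {
      uint32_t u = in[i++];

      if (u >= 0xD800 && u <= 0xDBFF
          && i < n_units
          && in[i] >= 0xDC00 && in[i] <= 0xDFFF)
        {
          /* Combine a high/low surrogate pair into one codepoint
             in the range U+10000..U+10FFFF.  */
          u = 0x10000 + ((u - 0xD800) << 10) + (in[i] - 0xDC00);
          i++;
        }
      else if (u >= 0xD800 && u <= 0xDFFF)
        {
          /* Unpaired surrogate: substitute U+FFFD (an assumption
             of this sketch).  */
          u = 0xFFFD;
        }
      out[n_out++] = u;
    }
  return n_out;
}
```

Note that the caller still has to know how many code units to pass
in, which is the string-length question raised above.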

> > Alternatively, the FE will have to 
> > support display of UTF-16 encoded characters.
> 
> Speaking of the FE, handling UTF-16 is trivial.

Maybe in your environment and windowing system, but not in all cases,
AFAIK.

Follow-Ups:
- Re: printing wchar_t*
  - From: Vladimir Prus

References:
- printing wchar_t*
  - From: Vladimir Prus
- Re: printing wchar_t*
  - From: Paul Koning
- Re: printing wchar_t*
  - From: Eli Zaretskii
- Re: printing wchar_t*
  - From: Vladimir Prus