This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.

Re: [RFC] string handling in python

From: Thiago Jung Bauermann <bauerman at br dot ibm dot com>
To: tromey at redhat dot com
Cc: gdb ml <gdb at sourceware dot org>
Date: Tue, 08 Jul 2008 02:30:13 -0300
Subject: Re: [RFC] string handling in python
References: <1215408302.1795.38.camel@localhost.localdomain> <m3prpp9srd.fsf@fleche.redhat.com>

On Mon, 2008-07-07 at 17:30 -0600, Tom Tromey wrote:
> >>>>> "Thiago" == Thiago Jung Bauermann <bauerman@br.ibm.com> writes:
> 
> Thiago> So, in my opinion for GDB's Python bindings we should always
> Thiago> use Unicode strings, and convert to/from desired encodings as
> Thiago> necessary. Strings provided by the user would be assumed to
> Thiago> have host_charset () encoding, and strings coming from/going
> Thiago> to the inferior would be assumed to have target_charset ()
> Thiago> encoding.
> 
> Sounds reasonable to me.
> 
> I thought we already did some of this... search for host_charset in
> the python directory.

It doesn't really work. PyString_Decode converts the string from
host_charset to Unicode, and then from Unicode to Python's default
charset (almost always ASCII). So if the string contains any non-ASCII
character, Python can't even print it on screen. I just tested this by
making valpy_str use PyString_Decode instead of PyString_FromString:

(gdb) p s
$2 = 0x80484f0 "acentuação"
(gdb) py a = gdb.get_value_from_history (1)
(gdb) py print a
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 17: ordinal not in range(128)

Oddly enough, if I use PyString_FromString then the example works. I'm
not sure why, though. Probably PyString_FromString doesn't try to
convert back and forth between charsets, and just prints the stream of
bytes to the screen, hoping for the best...
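
The same kind of failure can be reproduced outside GDB. This is only a
rough sketch in Python 3, not GDB code: the helper name is made up, the
host charset is assumed to be UTF-8, and the explicit encode step stands
in for the implicit conversion to the default charset described above.

```python
def decode_then_encode(raw, host_charset="utf-8", default_charset="ascii"):
    """Hypothetical helper: host charset -> Unicode -> default charset."""
    text = raw.decode(host_charset)      # bytes -> Unicode ("decoding")
    return text.encode(default_charset)  # Unicode -> bytes; may raise


# Sample bytes containing two non-ASCII characters (assumed UTF-8 encoded).
raw = "acentua\u00e7\u00e3o".encode("utf-8")

try:
    decode_then_encode(raw)
    failed = False
except UnicodeEncodeError:
    # The ASCII codec cannot represent the accented characters.
    failed = True

# A pure-ASCII string survives the same round trip unchanged.
ascii_ok = decode_then_encode(b"plain") == b"plain"
```

Passing the bytes through untouched avoids the error only because no
conversion is attempted at all, which would fit the guess above about
why the other call appears to work.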

> Thiago> So for example, to create a value object of char * type using
> Thiago> a string provided by the user or coming from Python code, GDB
> Thiago> would first convert the Python string object (assumed to be in
> Thiago> the host charset) to a unicode object (this process is called
> Thiago> "decoding", in python parlance), and then convert it from
> Thiago> unicode to a string in the target charset.
> 
> This sounds like a good candidate for convenience functions, one for
> each direction.

Right, I'll add them.
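
Roughly, the two directions could look like the sketch below. It is
written in Python 3 purely for illustration: the function names are
hypothetical, and the charsets are passed in explicitly as stand-ins
for host_charset () and target_charset (). The real functions would
live in GDB's C code and may look quite different.

```python
def host_to_target(data, host_charset, target_charset):
    """Hypothetical: host-charset bytes -> Unicode -> target-charset bytes."""
    return data.decode(host_charset).encode(target_charset)


def target_to_host(data, target_charset, host_charset):
    """Hypothetical: target-charset bytes -> Unicode -> host-charset bytes."""
    return data.decode(target_charset).encode(host_charset)


# Assumed setup for the example: UTF-8 host, ISO-8859-1 target.
host_bytes = "acentua\u00e7\u00e3o".encode("utf-8")
target_bytes = host_to_target(host_bytes, "utf-8", "iso-8859-1")
round_trip = target_to_host(target_bytes, "iso-8859-1", "utf-8")
```

Unicode is the pivot in both directions, so neither function needs to
know about the other side's encoding beyond its name.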
-- 
[]'s
Thiago Jung Bauermann
Software Engineer
IBM Linux Technology Center
