This is the mail archive of the archer@sourceware.org mailing list for the Archer project.
Re: Python pretty-printers and non-ASCII strings do not play well together :-(
Tom> What should happen here, though? The string contains invalid
Tom> characters for its declared (via set target-charset) encoding.
Paul> As an end-user, I would expect something like
Paul> $2 = <"\xef\xcd\xab">
It occurs to me I am not completely certain where this error
originates. My theory is that it is the call to PyUnicode_Decode in
valpy_str.
If so, then we aren't seeing a value representation problem, which is
what I was worried about. Instead, I think common_val_print is
emitting a string which is not actually valid according to
host_charset. That seems wrong.
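
To make the suspected failure concrete outside gdb, here is a minimal sketch in plain Python. It is not the valpy_str code; the helper name decode_or_escape and the choice of \xNN escapes (borrowed from Paul's example) are assumptions for illustration only. It shows a strict decode rejecting bytes that are invalid for the named charset, and one possible fallback.

```python
# Bytes from Paul's example; a stand-in for what common_val_print might emit.
raw = b"\xef\xcd\xab"

def decode_or_escape(data, charset):
    """Decode data in charset; if that fails, fall back to \\xNN escapes.

    Hypothetical helper, not part of gdb or its Python bindings.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on invalid input,
        # much as a strict PyUnicode_Decode call would fail.
        return data.decode(charset)
    except UnicodeDecodeError:
        return "".join("\\x%02x" % byte for byte in data)

for charset in ("ascii", "utf-8", "iso-8859-1"):
    print(charset, "->", decode_or_escape(raw, charset))
```

The bytes are valid ISO-8859-1 but not valid ASCII or UTF-8, so only the first two charsets take the escape path.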
We could work around this in valpy_str, I suppose. But I'm curious to
know why this is happening -- why isn't common_val_print printing the
escape sequences itself?
My guess is that the target and host charsets are the same, and
charset.c is passing characters through without checking them for
validity. I didn't debug it, but when I set host-charset to ASCII (my
target-charset is ISO-8859-1), I do see the escapes.
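
The guess above can be modelled with a toy sketch in plain Python. This is only a model of the theory, not what charset.c actually does: the function name print_target_string is hypothetical, the \xNN escape format is an assumption, and the byte-at-a-time loop only makes sense for single-byte target charsets.

```python
def print_target_string(data, target_charset, host_charset):
    """Toy model of the passthrough theory; not gdb's real logic."""
    if target_charset.lower() == host_charset.lower():
        # Suspected shortcut: identical charsets, so the bytes are
        # passed through without any validity check.
        return data
    out = bytearray()
    for byte in data:
        try:
            # Convert one target character to the host charset.
            char = bytes([byte]).decode(target_charset)
            out += char.encode(host_charset)
        except (UnicodeDecodeError, UnicodeEncodeError):
            # Not representable on the host: emit an escape instead.
            out += ("\\x%02x" % byte).encode("ascii")
    return bytes(out)

sample = b"a\xef\xcd\xabz"
print(print_target_string(sample, "ISO-8859-1", "ISO-8859-1"))
print(print_target_string(sample, "ISO-8859-1", "ASCII"))
```

With matching charsets the raw bytes come back untouched; with an ASCII host the three high bytes are escaped, which is consistent with what I see when I change host-charset.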
Every time I look at this stuff I'm reminded that the gdb charset code
could use a good scrubbing. For example, the default host charset
ought to come from the locale settings. I have a patch to implement
this, but there's no point submitting it since it breaks gdb on
typical Linux systems -- most people use UTF-8 locales, but gdb
doesn't handle UTF-8.
Maybe we should just install a smart Python printer for 'char *' ;-)
Paul> What are some of the good Python references?
Tom> http://www.python.org/doc/2.5.2/api/api.html
Paul> Yes, I've seen the above, but it didn't have the answers I was
Paul> looking for :(
What do you want to know? Both Thiago and I have worked in this area,
maybe one of us knows.
Tom