This is the mail archive of the
gdb@sourceware.org
mailing list for the GDB project.
Re: [RFC] Signed/unsigned character arrays are not strings
- From: Daniel Jacobowitz <drow at false dot org>
- To: Jan Kratochvil <jan dot kratochvil at redhat dot com>
- Cc: mathieu lacage <Mathieu dot Lacage at sophia dot inria dot fr>, Nick Roberts <nickrob at snap dot net dot nz>, gdb at sourceware dot org
- Date: Tue, 10 Apr 2007 17:59:51 -0400
- Subject: Re: [RFC] Signed/unsigned character arrays are not strings
- References: <17887.62990.937672.281975@kahikatea.snap.net.nz> <20070224161315.GA27534@caradoc.them.org> <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net>
On Sun, Feb 25, 2007 at 08:53:50PM +0100, Jan Kratochvil wrote:
> On Sun, 25 Feb 2007 08:59:41 +0100, mathieu lacage wrote:
> ...
> > I don't know how useful that is to you but a lot of people (the first
> > which comes to my mind is libxml2) decided to use "unsigned char *" to
> > identify utf-8 encoded strings in C.
>
> Together with the attached RMS's response I became more inclined to revert this
> change and provide only "$xmm"-specific fix instead (probably for the GDB
> int8_t/uint8_t internal types).
There was a lot of discussion about how to treat signed char, unsigned
char, signed char *, et cetera. There weren't a lot of conclusions,
but several people did not like the new behavior, and then discussion
trailed off.
I don't want to just revert the patch, because the problem that Jan
was fixing (unhelpful display of $xmm registers) is really quite
annoying. I see these options:
1. Make vector types special. Treat arrays of single byte integers
as characters, like before, unless they occur in a vector type. This
is reasonable, but tricky to implement.
2. Make two special single byte integer types, with a GDB internal
"not a char" flag set. Use them for our builtin int8_t and uint8_t.
Use these to build types for vector registers. Print all other single
byte types from user code as chars or strings. This is similar to
#1, a little less helpful, but fairly easy.
3. Treat "char" as a character, but "unsigned char" and "signed char"
as numbers (Jan's patch started down this road and Jim's went a bit
further). Treat pointers/arrays of char as strings and
pointers/arrays of unsigned or signed char as numbers. Add a "/s"
flag to the print command that treats single byte types as
characters or strings.
For example:

    char str[] = "hi";
    unsigned char version[] = "6.5";

    (gdb) p version
    $1 = { 54, 46, 53 }
    (gdb) p/s version
    $2 = "6.5"
    (gdb) p str
    $3 = "hi"
4. Like #3, except that instead of adding a /s modifier, add a "set"
knob. Of course in this case we get to argue about the default value.
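To make the rule in #3 concrete, here is a rough C sketch of the decision it implies. This is not GDB code: the names byte_kind and format_bytes, the as_string parameter (standing in for "/s", or for the "set" knob in #4), and the exact output spacing are all assumptions made for illustration. Note that when the whole array is passed, the terminating NUL shows up as a final 0 in the numeric form.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type tags for single-byte types; GDB's real type
   representation is different. */
enum byte_kind { KIND_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR };

/* Format LEN bytes at DATA into OUT following the rule in option 3:
   plain char prints as a string; signed/unsigned char print as numbers
   unless AS_STRING (the assumed "/s" behavior) is nonzero.
   Returns 0 on success, -1 if OUT is too small. */
int format_bytes(char *out, size_t outsz, enum byte_kind kind,
                 const void *data, size_t len, int as_string)
{
    const unsigned char *bytes = data;
    size_t used = 0;
    size_t i;
    int n;

    if (outsz == 0)
        return -1;

    if (kind == KIND_CHAR || as_string) {
        /* String form: stop at the first NUL, or at LEN if there is none. */
        size_t slen = 0;
        while (slen < len && bytes[slen] != '\0')
            slen++;
        n = snprintf(out, outsz, "\"%.*s\"", (int)slen, (const char *)bytes);
        return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
    }

    /* Numeric form: every element, including any terminating NUL. */
    n = snprintf(out, outsz, "{");
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    used = (size_t)n;

    for (i = 0; i < len; i++) {
        int value = (kind == KIND_SIGNED_CHAR) ? (signed char)bytes[i]
                                               : bytes[i];
        n = snprintf(out + used, outsz - used, "%s%d", i ? ", " : "", value);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, outsz - used, "}");
    return (n < 0 || (size_t)n >= outsz - used) ? -1 : 0;
}
```

With this sketch, passing the version array from the example with as_string set to 0 gives the numeric form, setting as_string to 1 gives "6.5", and a plain char array gives a string either way. Under #4 the same function would apply, with as_string taken from a global setting rather than a per-command flag.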
I think it's important that we resolve this open issue before we
release a new version of GDB, so please post which you prefer. I like
#3 best, followed by #2; #4 is a good compromise but I worry that we
are proliferating knobs that no one ever changes. I'm interested in
any other suggestions, though I think we've ruled out guessing based
on the type name.
--
Daniel Jacobowitz
CodeSourcery