String printing in Rust is somewhat wrong. I think gdb always tries to print &[u8] as UTF-8. Maybe that is ok, but sometimes &[u8] is just bytes. Maybe some detection could be done; or maybe it could default to printing bytes and special-case the contents of &str. Escapes should be printed Rust-style, like \u{nnn}, but currently are not.
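To illustrate the "detection" idea above, here is a minimal sketch in Rust. It is not gdb code: `render_bytes` is a hypothetical helper, and falling back to a plain byte list is just one possible policy. It prints a byte slice as a quoted string only when the bytes are valid UTF-8.

```rust
/// Hypothetical helper (not part of gdb): show a byte slice as a quoted
/// string when it is valid UTF-8, and as a list of byte values otherwise.
fn render_bytes(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        // Valid UTF-8: use Rust's Debug formatting, which quotes and escapes.
        Ok(text) => format!("{:?}", text),
        // Not valid UTF-8: fall back to printing the raw byte values.
        Err(_) => format!("{:?}", bytes),
    }
}
```

A heuristic like this can still guess wrong, since arbitrary binary data may happen to be valid UTF-8, which is one argument for defaulting to bytes and special-casing &str instead.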
Special-casing &str is being done in bug 22236.
Working on the escapes now.
I wonder if char printing should use unicode-ish syntax like:

(gdb) print 'x'
$1 = U+0078 'x'

Not sure.
(In reply to Tom Tromey from comment #3)
> I wonder if char printing should use unicode-ish syntax like:
>
> (gdb) print 'x'
> $1 = U+0078 'x'
>
> Not sure.

I like it. Note that the convention is to prepend zeros when there are fewer than 4 digits, but not when there are 5 or 6 digits.

https://www.unicode.org/versions/Unicode14.0.0/appA.pdf
"Leading zeros are omitted, unless the code point would have fewer than four hexadecimal digits — for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345."

I am not sure how to flag the "illegal" char values, U+D800 to U+DFFF and anything larger than U+10FFFF. But it would be good to have some indicator you are dealing with an invalid value.
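The padding convention quoted above maps directly onto a minimum-width hex format. A small Rust sketch follows; `format_code_point` is a name made up for illustration and is not an existing gdb or standard library function.

```rust
/// Hypothetical helper: format a code point in "U+XXXX" notation.
/// `{:04X}` pads with zeros to at least four upper-case hex digits and
/// adds no padding when the value already needs five or six digits.
fn format_code_point(value: u32) -> String {
    format!("U+{:04X}", value)
}
```

With this, 0x78 comes out as U+0078 and 0x12345 as U+12345, matching the examples in the Unicode appendix.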
(In reply to Mark Wielaard from comment #4)
> I am not sure how to flag the "illegal" char values, U+D800 to U+DFFF and
> anything larger than U+10FFFF. But it would be good to have some indicator
> you are dealing with an invalid value.

You'll already get a hex escape in this situation:

(gdb) p '\u{d800}'
$2 = 55296 '\u{00d800}'

I guess the escapes sort of make the "U+..." idea a bit weird. It would just be printing the same info twice.
(In reply to Tom Tromey from comment #5)
> (In reply to Mark Wielaard from comment #4)
>
> > I am not sure how to flag the "illegal" char values, U+D800 to U+DFFF and
> > anything larger than U+10FFFF. But it would be good to have some indicator
> > you are dealing with an invalid value.
>
> You'll already get a hex escape in this situation:
>
> (gdb) p '\u{d800}'
> $2 = 55296 '\u{00d800}'

But isn't that escape "value" itself invalid?
Shouldn't it be something like

$2 = 55296 '???'

So, use U+xxxxxx 'unicode-char' for valid char values, otherwise "raw number" '???'
Yeah, I see that rustc rejects a program using that.
We could print something like

$2 = U+D800 <invalid Unicode character>

... at least if the invalid ranges aren't a pain to figure out.
Is it really just surrogates and values > 0x10ffff? I didn't look yet.

There are still characters that print as escapes, for example U+200C.
But maybe double-printing these oddities isn't so bad.
(In reply to Tom Tromey from comment #7)
> Yeah, I see that rustc rejects a program using that.
> We could print something like
>
> $2 = U+D800 <invalid Unicode character>

I would like that.

> ... at least if the invalid ranges aren't a pain to figure out.
> Is it really just surrogates and values > 0x10ffff? I didn't look yet.

It is, see https://doc.rust-lang.org/reference/types/textual.html
"A value of type char is a Unicode scalar value (i.e. a code point that is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF range. It is immediate Undefined Behavior to create a char that falls outside this range."

> There are still characters that print as escapes, for example U+200C.
> But maybe double-printing these oddities isn't so bad.

I don't think it is that odd. You could also use ASCII escapes (https://doc.rust-lang.org/reference/tokens.html#ascii-escapes) when possible.
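The two valid ranges from the Rust reference are easy to check. The sketch below is in Rust rather than gdb's own C++, and both function names are hypothetical; the standard library's `char::from_u32` performs the same validity test. For brevity `describe_char` prints valid characters unescaped, so it leaves the escaping question discussed above aside.

```rust
/// Hypothetical helper: true if `value` is a Unicode scalar value,
/// i.e. not a surrogate and not larger than 0x10FFFF.
fn is_unicode_scalar(value: u32) -> bool {
    matches!(value, 0x0000..=0xD7FF | 0xE000..=0x10FFFF)
}

/// Hypothetical helper: render a char value in the proposed style,
/// flagging values that cannot be a valid Rust `char`.
fn describe_char(value: u32) -> String {
    match char::from_u32(value) {
        // Valid scalar value: show the code point and the character itself.
        Some(c) => format!("U+{:04X} '{}'", value, c),
        // Surrogate or out of range: show the code point and a marker.
        None => format!("U+{:04X} <invalid Unicode character>", value),
    }
}
```

Under these assumptions, 0x78 is described as U+0078 'x' and 0xD800 as U+D800 <invalid Unicode character>.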
(In reply to Mark Wielaard from comment #8)
> I don't think it is that odd. You could also use ascii escapes
> (https://doc.rust-lang.org/reference/tokens.html#ascii-escapes) when
> possible.

That's already done, though I see on that page that only up to 0x7f is supported for char types, so there's a little bug in the output here, as gdb checks "if (value <= 255)".
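To make the 0x7f boundary concrete, here is a rough Rust sketch of an escaper for char values. It is not gdb's implementation: `escape_char_value` is a hypothetical name, the `\u{...}` digits are not zero-padded the way gdb's current output is, and every value above 0x7f is escaped for simplicity, even if it would be printable.

```rust
/// Hypothetical helper: escape a char value using Rust literal syntax.
/// ASCII escapes (`\n`, `\xNN`, ...) are used only up to 0x7f; anything
/// larger is written as a `\u{...}` Unicode escape.
fn escape_char_value(value: u32) -> String {
    match value {
        0x0A => "\\n".to_string(),
        0x0D => "\\r".to_string(),
        0x09 => "\\t".to_string(),
        0x5C => "\\\\".to_string(),
        0x00 => "\\0".to_string(),
        0x27 => "\\'".to_string(),
        // Printable ASCII is emitted as-is.
        0x20..=0x7E => (value as u8 as char).to_string(),
        // Remaining 7-bit values get a two-digit hex escape.
        0x01..=0x7F => format!("\\x{:02x}", value),
        // Above 0x7f a `\x` escape is not valid in a char literal.
        _ => format!("\\u{{{:x}}}", value),
    }
}
```

The point of the last two arms is that 0x7f still yields \x7f while 0x80 and 0xff become \u{80} and \u{ff}, which is where a "value <= 255" check would differ.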