Summary: | GDB 13.1 no longer prints length of Rust slice wrappers | ||
---|---|---|---|
Product: | gdb | Reporter: | Josh Stone <jistone> |
Component: | rust | Assignee: | Tom Tromey <tromey> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | michaelwoerister, tromey |
Priority: | P2 | ||
Version: | 13.1 | ||
Target Milestone: | 15.1 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Binary that triggers crash when printing Rust &str slice |
Description
Josh Stone
2023-04-10 20:45:10 UTC
Apparently the intent now is to print the underlying data in the slice, not just the shape of it. However gdb stumbles over this obviously incorrect debuginfo: <2><2699>: Abbrev Number: 7 (DW_TAG_structure_type) <269a> DW_AT_name : (indirect string, offset: 0x41a): Foo<[u8]> <269e> DW_AT_byte_size : 0 <269f> DW_AT_alignment : 1 This structure does not really have a byte size of 0, because it has this member: <3><26a9>: Abbrev Number: 25 (DW_TAG_member) <26aa> DW_AT_name : (indirect string, offset: 0xff104): value <26ae> DW_AT_type : <0x23b0> <26b2> DW_AT_byte_size : 1 <26b3> DW_AT_bit_size : 0 <26b4> DW_AT_bit_offset : 8 <26b5> DW_AT_data_member_location: 0 This seems to be a rustc bug. I don't know yet if this can be worked around in a reasonable way. Hmm, Rust 1.59 to 1.60 changed those member attributes.
1.59:
<3><1002>: Abbrev Number: 12 (DW_TAG_member)
<1003> DW_AT_name : (indirect string, offset: 0xf63af): value
<1007> DW_AT_type : <0x13ba>
<100b> DW_AT_alignment : 1
<100c> DW_AT_data_member_location: 0
1.60:
<3><13a6>: Abbrev Number: 41 (DW_TAG_member)
<13a7> DW_AT_name : (indirect string, offset: 0x105d4b): value
<13ab> DW_AT_type : <0xb7>
<13af> DW_AT_byte_size : 1
<13b0> DW_AT_bit_size : 0
<13b1> DW_AT_bit_offset : 8
<13b2> DW_AT_data_member_location: 0
> This structure does not really have a byte size of 0,
It doesn't have a *static* size at all -- it needs the reference's length.
> It doesn't have a *static* size at all -- it needs the reference's length.
Yeah, oops, sorry about that.
I guess we need some kind of type resolution that will rewrite
the size of the final field in an unsized struct. Unfortunately
this doesn't seem to fit that well into how dynamic type resolution
is currently handled in gdb. So maybe we need some Rust-specific
bit... Ada handles similar(-ish) cases entirely on its own but I think
it would be better to somehow hook into the generic code.
FWIW I can't gdb 12 to print anything really useful here.
You can inspect the data, a bit, but it isn't really obvious
and I even get some weird stuff:
(gdb) p *a.data_ptr
$2 = prog::Foo<[u8]> {
value: 0
}
(gdb) p a.data_ptr.value
$3 = 97
Also it seems a little weird that the trailing member is a u8 and not an array: <3><26a9>: Abbrev Number: 25 (DW_TAG_member) <26aa> DW_AT_name : (indirect string, offset: 0xff104): value <26ae> DW_AT_type : <0x23b0> ... <1><23b0>: Abbrev Number: 6 (DW_TAG_base_type) <23b1> DW_AT_name : (indirect string, offset: 0x158e): u8 <23b5> DW_AT_encoding : 7 (unsigned) <23b6> DW_AT_byte_size : 1 BTW, I'm not trying to shirk rustc responsibility here - there could well be improvements needed in that debuginfo. But frankly, you've done more with that than I have. :) (In reply to Josh Stone from comment #5) > BTW, I'm not trying to shirk rustc responsibility here Not at all, also I'm never really sure I understand unsized types. I found this comment in rust/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs: /// Create debuginfo for `[T]` and `str`. These are unsized. /// /// NOTE: We currently emit just emit the debuginfo for the element type here /// (i.e. `T` for slices and `u8` for `str`), so that we end up with /// `*const T` for the `data_ptr` field of the corresponding fat-pointer /// debuginfo of `&[T]`. /// /// It would be preferable and more accurate if we emitted a DIArray of T /// without an upper bound instead. That is, LLVM already supports emitting /// debuginfo of arrays of unknown size. But GDB currently seems to end up /// in an infinite loop when confronted with such a type. I don't recall seeing a report about this infinite loop. I'm not exactly sure how to reproduce it. Anyway I tend to think the rustc debuginfo here is just bad. I'm reluctant to try to work around it. It would be better if the DWARF accurately described what is really being emitted, including some information about where the array bounds can be found dynamically. To implement a workaround here, I guess I'd need to know exactly which field(s) might have to change from 'T' to 'T[]' to be correct. I don't know how to find this. Unsized types doc'd here: https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait I don't know why I keep not understanding this but I think I finally figured it out. (gdb) ptype a type = struct &prog::Foo<[u8]> { struct prog::Foo<[u8]> *data_ptr; usize length; } So here, gdb should change the type of 'data_ptr' to 'struct prog::Foo<[u8; ...length...]> *' This can't readily be expressed in DWARF I think. But gdb can apply the change by itself. So then all that remains is fixing up the instance not to use 'u8' (but rather 'u8[]') as the type of the "VLA" part. Created attachment 15244 [details] Binary that triggers crash when printing Rust &str slice Compiled version of this rustc test case: https://github.com/rust-lang/rust/blob/568f6a8641e391ffcdcdb03e79d0901731d8d399/tests/debuginfo/empty-string.rs#L15 The rustc version this is compiled with has been modified to emit the proposed DWARF for slices: https://github.com/michaelwoerister/rust/tree/try-fixing-slice-debuginfo The bug with the test case is due to insufficient paranoia in the code to specially handle &str: if (strcmp (type->name (), "&str") == 0) val_print_string (base->type ()->target_type (), "UTF-8", value_as_address (base), value_as_long (len), stream, options); Here the bug is that base->type()->target_type() used to be a pointer to the underlying char type, but now it is a 0-length array. If you want to keep data_ptr as an array, we can fix this in gdb. However the earliest the fix could appear is gdb 14.2. If you want this to work for already-released versions of gdb, having data_ptr just be a "C-like" pointer-to-element would be fine. IIRC in this bug the issue with unsized was that data_ptr just had the element type, i.e. 'u8' and not 'u8 *'. (In reply to Tom Tromey from comment #10) > IIRC in this bug the issue with unsized was that data_ptr just > had the element type, i.e. 'u8' and not 'u8 *'. I re-read comment #8 and I do not, in fact, "RC". There, data_ptr has type struct prog::Foo<[u8]> *data_ptr; but this "hides" the array Using this branch of rustc with the original program in this bug, I still get: <2><1750>: Abbrev Number: 8 (DW_TAG_structure_type) <1751> DW_AT_name : (indirect string, offset: 0xe76): Foo<[u8]> <1755> DW_AT_byte_size : 0 <1756> DW_AT_alignment : 1 Ok, I think I see now. Sorry about all the noise btw. We have: <2><1750>: Abbrev Number: 8 (DW_TAG_structure_type) <1751> DW_AT_name : (indirect string, offset: 0xe76): Foo<[u8]> <1755> DW_AT_byte_size : 0 <1756> DW_AT_alignment : 1 <3><1757>: Abbrev Number: 15 (DW_TAG_template_type_param) <1758> DW_AT_type : <0x3bb> <175c> DW_AT_name : (indirect string, offset: 0x27b): T <3><1760>: Abbrev Number: 4 (DW_TAG_member) <1761> DW_AT_name : (indirect string, offset: 0x8ad): value <1765> DW_AT_type : <0x3bb> <1769> DW_AT_alignment : 1 <176a> DW_AT_data_member_location: 0 Size==0 but then this is the unsized part. The type of 'value' is now: <1><3bb>: Abbrev Number: 29 (DW_TAG_array_type) <3bc> DW_AT_type : <0x342> <2><3c0>: Abbrev Number: 30 (DW_TAG_subrange_type) <3c1> DW_AT_type : <0x3c7> <3c5> DW_AT_lower_bound : 0 So I suppose what gdb ought to do is: * Notice a slice-like type (has data_ptr & length) * Notice that data_ptr points to a zero-sized struct * Find the trailing array in that struct * Transform the type from [mumble] to [mumble;length], updating the struct size as well Just wanting to verify this is the correct procedure before I go implement it. > Notice that data_ptr points to a zero-sized struct
This does not look quite right. Let's try to enumerate the cases from the ground up:
- There are two kinds of unsized types in Rust: slice-like (containing `[T]` in some form) and dyn-objects (containing `dyn` in some form). We'll only talk about slice-like unsized types here.
- A slice-like unsized value needs some kind of fat pointer that contains the length of the slice. An fat pointer can be `&`, `&mut`, `*const`, `*mut` and in debuginfo will be represented as a struct with a `data_ptr` field and a `length` field.
- A slice-like unsized value can either be "just the slice" (e.g. `[T]`) or a struct with the slice type as its last field (e.g. `struct Foo { a: u32, b: u32, c: [u32] }`). An unsized struct can have any number of regular fields, i.e. it is not always zero-sized.
- The slice type `[T]` in both cases will be represented by a `DW_TAG_array_type` with no count or upper bound. I'd call this array "unsized" rather than "zero-sized".
- In the case of just the slice, `data_ptr` will then be a regular thin pointer to that `DW_TAG_array_type`.
- In the case of an unsized struct, the `DW_TAG_array_type` will be the type of the last field. `data_ptr` will be a thin pointer to `Foo<[T]>`, i.e. it will point to the beginning of the struct, not to the unsized field.
So, it sounds like the following, slightly adapted algorithm should work:
- Notice a fat-pointer (it is a struct + its name starts with `&`, or `&mut`, or `*const`, or `*mut` + it has a `data_ptr` and a `length` field).
- Check the pointee type:
- If it is DW_TAG_array_type, this is a "just the slice" case (e.g. `&[T]`). Turn the fat pointer into a thin pointer of type `&[T; length]`.
- If it is a struct, we expect the last field to be unsized. Note that this could be a vanilla `[T]` type (i.e. it would be a DW_TAG_array_type as described) OR it could be *another unsized struct* (as in `Foo<Bar<[T]>>`), so the type transformation has be applied recursively. I.e. turn the fat pointer `&Foo<Bar<[T]>>` into a thin pointer `&Foo<Bar<[T; length]>>`.
Let me know what you think :)
Here is an example program containing all the cases: (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=11cf2917347fd3cdab77351222b045a7) // A generic struct that is unsized if T is unsized. pub struct MaybeUnsizedStruct<T: ?Sized> { pub regular_field1: u32, pub regular_field2: u32, pub unsized_field: T, } fn main() { // This struct is still sized because T is a fixed-length array let sized_struct = &MaybeUnsizedStruct { regular_field1: 1, regular_field2: 2, unsized_field: [1, 2, 3, 4, 5, 6, 7], }; // This struct is still sized because T is sized let nested_sized_struct = &MaybeUnsizedStruct { regular_field1: 1, regular_field2: 2, unsized_field: MaybeUnsizedStruct { regular_field1: 1, regular_field2: 2, unsized_field: [1, 2, 3, 4, 5, 6, 7], }, }; // This is a regular slice let slice: &[u32] = &[1, 2, 3]; // unsized_struct will be a fat pointer, // containing the length of the final: let unsized_struct: &MaybeUnsizedStruct<[u32]> = sized_struct; // nested_unsized_struct will also be a fat pointer, // containing the length of the final field: let nested_unsized_struct: &MaybeUnsizedStruct<MaybeUnsizedStruct<[u32]>> = nested_sized_struct; // Print some addresses: coercing from sized to unsized does // not change the address the pointer points to. let data_ptr_addr_sized = sized_struct as *const _ as usize; let data_ptr_addr_unsized = unsized_struct as *const _ as *const () as usize; let data_ptr_addr_slice = slice.as_ptr() as usize; let data_ptr_addr_nested_sized = nested_sized_struct as *const _ as usize; let data_ptr_addr_nested_unsized = nested_unsized_struct as *const _ as *const () as usize; dbg!(data_ptr_addr_sized); dbg!(data_ptr_addr_unsized); dbg!(data_ptr_addr_nested_sized); dbg!(data_ptr_addr_nested_unsized); dbg!(data_ptr_addr_slice); } I'm finally actually working on this. I was thinking today that maybe I'd relent and just work around the current debuginfo. I'm not sure why I changed my mind, I guess I just figured it might be more friendly to users. Anyway, with the program in comment #15, and with rustc 1.75.0, I do not see any field or structure with byte-size == 0. So, maybe handling older versions just can't be done; though it does give me pause that the main weirdness just doesn't exist here. My original example still has that 0 with 1.75: <2><1720>: Abbrev Number: 8 (DW_TAG_structure_type) <1721> DW_AT_name : (strp) (offset: 0xdb5): Foo<[u8]> <1725> DW_AT_byte_size : (data1) 0 <1726> DW_AT_alignment : (udata) 1 I have a patch that works "pretty ok" with the current rustc output. Still working out some details about exactly where to hook into gdb. (In reply to Josh Stone from comment #17) > My original example still has that 0 with 1.75: > > <2><1720>: Abbrev Number: 8 (DW_TAG_structure_type) > <1721> DW_AT_name : (strp) (offset: 0xdb5): Foo<[u8]> > <1725> DW_AT_byte_size : (data1) 0 > <1726> DW_AT_alignment : (udata) 1 I figured it out. What happens is that the array that comes at the end of the unsized type is not included in the object size. So if there are no other preceding fields, the size is 0; but if there are preceding fields, the size will be nonzero. This can be detected by looking for a field whose offset is the same as the struct type's size. BTW, with that patch, no compiler change will be needed. The master branch has been updated by Tom Tromey <tromey@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b0dd661fa16a424f059b1e1d80e779508b1a9a12 commit b0dd661fa16a424f059b1e1d80e779508b1a9a12 Author: Tom Tromey <tromey@adacore.com> Date: Tue Jan 30 10:06:46 2024 -0700 Rewrite Rust slice type handling This patch rewrites the handling of slice types in Rust. More recent versions of the Rust compiler changed how unsized types were emitted, letting gdb inspect them more nicely. However, gdb did not do this, and in fact treated all such types as if they were slices of arrays, which is incorrect. This patch rewrites this handling and removes the restriction that unsized types must be array slices. I've added a comment explaining how unsized types are represented to rust-lang.c as well. I looked into a different approach, namely changing the DWARF reader to fix up slice types to have a dynamic type. However, the approach taken here turned out to be simpler. Tested on x86-64 Fedora 38 with a variety of Rust compiler versions. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30330 I think this should be fixed now. The master branch has been updated by Tom Tromey <tromey@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=251cedaeb57fe1e0fd28798f476fbee75373bbf4 commit 251cedaeb57fe1e0fd28798f476fbee75373bbf4 Author: Tom Tromey <tromey@adacore.com> Date: Thu Mar 7 12:57:07 2024 -0700 Print type name when printing Rust slice The recent change to how unsized Rust values are printed included a small regression from past behavior. Previously, a slice's type would be printed, like: (gdb) print slice $80 = &[i32] [3] The patch changed this to just (gdb) print slice $80 = [3] This patch restores the previous behavior. Reviewed-By: Simon Marchi <simon.marchi@efficios.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30330 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31517 |