Bug 30330 - GDB 13.1 no longer prints length of Rust slice wrappers
Summary: GDB 13.1 no longer prints length of Rust slice wrappers
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: rust (show other bugs)
Version: 13.1
: P2 normal
Target Milestone: 15.1
Assignee: Tom Tromey
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-10 20:45 UTC by Josh Stone
Modified: 2024-04-02 17:44 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Binary that triggers crash when printing Rust &str slice (12.09 KB, application/octet-stream)
2023-12-08 16:34 UTC, Michael Woerister
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Stone 2023-04-10 20:45:10 UTC
GDB 13.1 has regressed in the Rust debuginfo testsuite here:
https://github.com/rust-lang/rust/blob/a73288371e3fa0a610fbc11e7e8418017bdfde42/tests/debuginfo/unsized.rs#L7-L9

I have reduced it to the following test case compiled with Rust 1.68.2, rustc -g unsized.rs

    #![allow(unused)]

    #[derive(Debug)]
    struct Foo<T: ?Sized> {
        value: T,
    }

    fn main() {
        let foo: Foo<[u8; 4]> = Foo { value: *b"abc\0" };
        let a: &Foo<[u8]> = &foo;
        dbg!(a);
    }

With GDB 12.1 after stepping to the dbg line, I get:

    (gdb) p a
    $1 = &unsized::Foo<[u8]> {data_ptr: 0x7fffffffd488, length: 4}

With GDB 13.1 on the same executable, I get:

    (gdb) p a
    $1 = &unsized::Foo<[u8]> 0x7fffffffd488

I bisected the change to this commit:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=506ec52e8805d8edd538d6bd11750489a8c8bbee

commit 506ec52e8805d8edd538d6bd11750489a8c8bbee
Author: Tom Tromey <tom@tromey.com>
Date:   Fri Mar 25 13:36:53 2022 -0600

    Reimplement Rust slice printing
    
    The current nightly Rust compiler (aka 1.61) added better DWARF
    representation for unsized types.  Fixing this is PR rust/21466; but
    the code is actually the same as what is required to make slice
    printing more useful, which is PR rust/23871.  This patch implements
    this.  I tested this against various Rust compilers: 1.48, current
    stable, current beta, and current nightly.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21466
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=23871
Comment 1 Tom Tromey 2023-04-11 11:47:09 UTC
Apparently the intent now is to print the underlying
data in the slice, not just the shape of it.
However gdb stumbles over this obviously incorrect debuginfo:

 <2><2699>: Abbrev Number: 7 (DW_TAG_structure_type)
    <269a>   DW_AT_name        : (indirect string, offset: 0x41a): Foo<[u8]>
    <269e>   DW_AT_byte_size   : 0
    <269f>   DW_AT_alignment   : 1

This structure does not really have a byte size of 0,
because it has this member:

 <3><26a9>: Abbrev Number: 25 (DW_TAG_member)
    <26aa>   DW_AT_name        : (indirect string, offset: 0xff104): value
    <26ae>   DW_AT_type        : <0x23b0>
    <26b2>   DW_AT_byte_size   : 1
    <26b3>   DW_AT_bit_size    : 0
    <26b4>   DW_AT_bit_offset  : 8
    <26b5>   DW_AT_data_member_location: 0

This seems to be a rustc bug.

I don't know yet if this can be worked around in a reasonable way.
Comment 2 Josh Stone 2023-04-11 18:52:39 UTC
Hmm, Rust 1.59 to 1.60 changed those member attributes.

1.59:

<3><1002>: Abbrev Number: 12 (DW_TAG_member)
    <1003>   DW_AT_name        : (indirect string, offset: 0xf63af): value
    <1007>   DW_AT_type        : <0x13ba>
    <100b>   DW_AT_alignment   : 1
    <100c>   DW_AT_data_member_location: 0

1.60:

<3><13a6>: Abbrev Number: 41 (DW_TAG_member)
    <13a7>   DW_AT_name        : (indirect string, offset: 0x105d4b): value
    <13ab>   DW_AT_type        : <0xb7>
    <13af>   DW_AT_byte_size   : 1
    <13b0>   DW_AT_bit_size    : 0
    <13b1>   DW_AT_bit_offset  : 8
    <13b2>   DW_AT_data_member_location: 0

> This structure does not really have a byte size of 0,

It doesn't have a *static* size at all -- it needs the reference's length.
Comment 3 Tom Tromey 2023-04-14 03:19:07 UTC
> It doesn't have a *static* size at all -- it needs the reference's length.

Yeah, oops, sorry about that.
I guess we need some kind of type resolution that will rewrite
the size of the final field in an unsized struct.  Unfortunately
this doesn't seem to fit that well into how dynamic type resolution
is currently handled in gdb.  So maybe we need some Rust-specific
bit... Ada handles similar(-ish) cases entirely on its own but I think
it would be better to somehow hook into the generic code.

FWIW I can't gdb 12 to print anything really useful here.
You can inspect the data, a bit, but it isn't really obvious
and I even get some weird stuff:

(gdb) p *a.data_ptr
$2 = prog::Foo<[u8]> {
  value: 0
}
(gdb) p a.data_ptr.value
$3 = 97
Comment 4 Tom Tromey 2023-04-14 04:25:46 UTC
Also it seems a little weird that the trailing member is a u8
and not an array:

 <3><26a9>: Abbrev Number: 25 (DW_TAG_member)
    <26aa>   DW_AT_name        : (indirect string, offset: 0xff104): value
    <26ae>   DW_AT_type        : <0x23b0>

...

 <1><23b0>: Abbrev Number: 6 (DW_TAG_base_type)
    <23b1>   DW_AT_name        : (indirect string, offset: 0x158e): u8
    <23b5>   DW_AT_encoding    : 7	(unsigned)
    <23b6>   DW_AT_byte_size   : 1
Comment 5 Josh Stone 2023-04-14 15:47:42 UTC
BTW, I'm not trying to shirk rustc responsibility here - there could well be improvements needed in that debuginfo. But frankly, you've done more with that than I have. :)
Comment 6 Tom Tromey 2023-04-14 22:03:13 UTC
(In reply to Josh Stone from comment #5)
> BTW, I'm not trying to shirk rustc responsibility here 

Not at all, also I'm never really sure I understand unsized types.
Comment 7 Tom Tromey 2023-07-24 13:48:38 UTC
I found this comment in rust/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs:

/// Create debuginfo for `[T]` and `str`. These are unsized.
///
/// NOTE: We currently emit just emit the debuginfo for the element type here
/// (i.e. `T` for slices and `u8` for `str`), so that we end up with
/// `*const T` for the `data_ptr` field of the corresponding fat-pointer
/// debuginfo of `&[T]`.
///
/// It would be preferable and more accurate if we emitted a DIArray of T
/// without an upper bound instead. That is, LLVM already supports emitting
/// debuginfo of arrays of unknown size. But GDB currently seems to end up
/// in an infinite loop when confronted with such a type.


I don't recall seeing a report about this infinite loop.  I'm not exactly
sure how to reproduce it.

Anyway I tend to think the rustc debuginfo here is just bad.  I'm reluctant
to try to work around it.  It would be better if the DWARF accurately
described what is really being emitted, including some information about
where the array bounds can be found dynamically.

To implement a workaround here, I guess I'd need to know exactly which field(s)
might have to change from 'T' to 'T[]' to be correct.  I don't know how to
find this.
Comment 8 Tom Tromey 2023-07-28 21:31:45 UTC
Unsized types doc'd here:
https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait

I don't know why I keep not understanding this but I think I
finally figured it out.

(gdb) ptype a
type = struct &prog::Foo<[u8]> {
    struct prog::Foo<[u8]> *data_ptr;
    usize length;
}

So here, gdb should change the type of 'data_ptr'
to 'struct prog::Foo<[u8; ...length...]> *'

This can't readily be expressed in DWARF I think.
But gdb can apply the change by itself.

So then all that remains is fixing up the instance
not to use 'u8' (but rather 'u8[]') as the type of the
"VLA" part.
Comment 9 Michael Woerister 2023-12-08 16:34:10 UTC
Created attachment 15244 [details]
Binary that triggers crash when printing Rust &str slice

Compiled version of this rustc test case:
https://github.com/rust-lang/rust/blob/568f6a8641e391ffcdcdb03e79d0901731d8d399/tests/debuginfo/empty-string.rs#L15

The rustc version this is compiled with has been modified to emit the proposed DWARF for slices:
https://github.com/michaelwoerister/rust/tree/try-fixing-slice-debuginfo
Comment 10 Tom Tromey 2023-12-08 21:34:48 UTC
The bug with the test case is due to insufficient paranoia
in the code to specially handle &str:

  if (strcmp (type->name (), "&str") == 0)
    val_print_string (base->type ()->target_type (), "UTF-8",
		      value_as_address (base), value_as_long (len), stream,
		      options);

Here the bug is that base->type()->target_type() used to
be a pointer to the underlying char type, but now it is
a 0-length array.

If you want to keep data_ptr as an array, we can fix this in gdb.
However the earliest the fix could appear is gdb 14.2.

If you want this to work for already-released versions of gdb,
having data_ptr just be a "C-like" pointer-to-element would be fine.
IIRC in this bug the issue with unsized was that data_ptr just
had the element type, i.e. 'u8' and not 'u8 *'.
Comment 11 Tom Tromey 2023-12-08 21:36:43 UTC
(In reply to Tom Tromey from comment #10)

> IIRC in this bug the issue with unsized was that data_ptr just
> had the element type, i.e. 'u8' and not 'u8 *'.

I re-read comment #8 and I do not, in fact, "RC".
There, data_ptr has type

    struct prog::Foo<[u8]> *data_ptr;

but this "hides" the array
Comment 12 Tom Tromey 2023-12-08 21:43:27 UTC
Using this branch of rustc with the original program in this bug,
I still get:

 <2><1750>: Abbrev Number: 8 (DW_TAG_structure_type)
    <1751>   DW_AT_name        : (indirect string, offset: 0xe76): Foo<[u8]>
    <1755>   DW_AT_byte_size   : 0
    <1756>   DW_AT_alignment   : 1
Comment 13 Tom Tromey 2023-12-08 21:49:10 UTC
Ok, I think I see now.  Sorry about all the noise btw.

We have:

 <2><1750>: Abbrev Number: 8 (DW_TAG_structure_type)
    <1751>   DW_AT_name        : (indirect string, offset: 0xe76): Foo<[u8]>
    <1755>   DW_AT_byte_size   : 0
    <1756>   DW_AT_alignment   : 1
 <3><1757>: Abbrev Number: 15 (DW_TAG_template_type_param)
    <1758>   DW_AT_type        : <0x3bb>
    <175c>   DW_AT_name        : (indirect string, offset: 0x27b): T
 <3><1760>: Abbrev Number: 4 (DW_TAG_member)
    <1761>   DW_AT_name        : (indirect string, offset: 0x8ad): value
    <1765>   DW_AT_type        : <0x3bb>
    <1769>   DW_AT_alignment   : 1
    <176a>   DW_AT_data_member_location: 0


Size==0 but then this is the unsized part.
The type of 'value' is now:

 <1><3bb>: Abbrev Number: 29 (DW_TAG_array_type)
    <3bc>   DW_AT_type        : <0x342>
 <2><3c0>: Abbrev Number: 30 (DW_TAG_subrange_type)
    <3c1>   DW_AT_type        : <0x3c7>
    <3c5>   DW_AT_lower_bound : 0

So I suppose what gdb ought to do is:

* Notice a slice-like type (has data_ptr & length)
* Notice that data_ptr points to a zero-sized struct
* Find the trailing array in that struct
* Transform the type from [mumble] to [mumble;length],
  updating the struct size as well

Just wanting to verify this is the correct procedure before I
go implement it.
Comment 14 Michael Woerister 2023-12-11 10:27:08 UTC
> Notice that data_ptr points to a zero-sized struct

This does not look quite right. Let's try to enumerate the cases from the ground up:

- There are two kinds of unsized types in Rust: slice-like (containing `[T]` in some form) and dyn-objects (containing `dyn` in some form). We'll only talk about slice-like unsized types here.
- A slice-like unsized value needs some kind of fat pointer that contains the length of the slice. An fat pointer can be `&`, `&mut`, `*const`, `*mut` and in debuginfo will be represented as a struct with a `data_ptr` field and a `length` field. 
- A slice-like unsized value can either be "just the slice" (e.g. `[T]`) or a struct with the slice type as its last field (e.g. `struct Foo { a: u32, b: u32, c: [u32] }`). An unsized struct can have any number of regular fields, i.e. it is not always zero-sized.
- The slice type `[T]` in both cases will be represented by a `DW_TAG_array_type` with no count or upper bound. I'd call this array "unsized" rather than "zero-sized".
- In the case of just the slice, `data_ptr` will then be a regular thin pointer to that `DW_TAG_array_type`.
- In the case of an unsized struct, the `DW_TAG_array_type` will be the type of the last field. `data_ptr` will be a thin pointer to `Foo<[T]>`, i.e. it will point to the beginning of the struct, not to the unsized field.

So, it sounds like the following, slightly adapted algorithm should work:

- Notice a fat-pointer (it is a struct + its name starts with `&`, or `&mut`, or `*const`, or `*mut` + it has a `data_ptr` and a `length` field).
- Check the pointee type:
  - If it is DW_TAG_array_type, this is a "just the slice" case (e.g. `&[T]`). Turn the fat pointer into a thin pointer of type `&[T; length]`.
  - If it is a struct, we expect the last field to be unsized. Note that this could be a vanilla `[T]` type (i.e. it would be a DW_TAG_array_type as described) OR it could be *another unsized struct* (as in `Foo<Bar<[T]>>`), so the type transformation has be applied recursively. I.e. turn the fat pointer `&Foo<Bar<[T]>>` into a thin pointer `&Foo<Bar<[T; length]>>`.

Let me know what you think :)
Comment 15 Michael Woerister 2023-12-11 10:32:45 UTC
Here is an example program containing all the cases:
(https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=11cf2917347fd3cdab77351222b045a7)

// A generic struct that is unsized if T is unsized.
pub struct MaybeUnsizedStruct<T: ?Sized> {
    pub regular_field1: u32,
    pub regular_field2: u32,
    pub unsized_field: T,
}

fn main() {
    // This struct is still sized because T is a fixed-length array
    let sized_struct = &MaybeUnsizedStruct {
        regular_field1: 1,
        regular_field2: 2,
        unsized_field: [1, 2, 3, 4, 5, 6, 7],
    };

    // This struct is still sized because T is sized
    let nested_sized_struct = &MaybeUnsizedStruct {
        regular_field1: 1,
        regular_field2: 2,
        unsized_field: MaybeUnsizedStruct {
            regular_field1: 1,
            regular_field2: 2,
            unsized_field: [1, 2, 3, 4, 5, 6, 7],
        },
    };

    // This is a regular slice
    let slice: &[u32] = &[1, 2, 3];

    // unsized_struct will be a fat pointer, 
    // containing the length of the final:
    let unsized_struct: &MaybeUnsizedStruct<[u32]> = sized_struct;

    // nested_unsized_struct will also be a fat pointer, 
    // containing the length of the final field:
    let nested_unsized_struct: 
        &MaybeUnsizedStruct<MaybeUnsizedStruct<[u32]>> = nested_sized_struct;
    
    // Print some addresses: coercing from sized to unsized does 
    // not change the address the pointer points to.
    let data_ptr_addr_sized = sized_struct as *const _ as usize;
    let data_ptr_addr_unsized = 
        unsized_struct as *const _ as *const () as usize;
    let data_ptr_addr_slice = slice.as_ptr() as usize;
    let data_ptr_addr_nested_sized = nested_sized_struct as *const _ as usize;
    let data_ptr_addr_nested_unsized = 
        nested_unsized_struct as *const _ as *const () as usize;

    dbg!(data_ptr_addr_sized);
    dbg!(data_ptr_addr_unsized);
    dbg!(data_ptr_addr_nested_sized);
    dbg!(data_ptr_addr_nested_unsized);
    dbg!(data_ptr_addr_slice);
}
Comment 16 Tom Tromey 2024-01-30 18:24:28 UTC
I'm finally actually working on this.

I was thinking today that maybe I'd relent and just work around
the current debuginfo.  I'm not sure why I changed my mind, I
guess I just figured it might be more friendly to users.

Anyway, with the program in comment #15, and with rustc 1.75.0,
I do not see any field or structure with byte-size == 0.
So, maybe handling older versions just can't be done; though
it does give me pause that the main weirdness just doesn't
exist here.
Comment 17 Josh Stone 2024-01-30 20:33:12 UTC
My original example still has that 0 with 1.75:

 <2><1720>: Abbrev Number: 8 (DW_TAG_structure_type)
    <1721>   DW_AT_name        : (strp) (offset: 0xdb5): Foo<[u8]>
    <1725>   DW_AT_byte_size   : (data1) 0
    <1726>   DW_AT_alignment   : (udata) 1
Comment 18 Tom Tromey 2024-01-31 20:57:39 UTC
I have a patch that works "pretty ok" with the current rustc output.
Still working out some details about exactly where to hook into gdb.
Comment 19 Tom Tromey 2024-02-01 19:42:28 UTC
(In reply to Josh Stone from comment #17)
> My original example still has that 0 with 1.75:
> 
>  <2><1720>: Abbrev Number: 8 (DW_TAG_structure_type)
>     <1721>   DW_AT_name        : (strp) (offset: 0xdb5): Foo<[u8]>
>     <1725>   DW_AT_byte_size   : (data1) 0
>     <1726>   DW_AT_alignment   : (udata) 1

I figured it out.  What happens is that the array that
comes at the end of the unsized type is not included
in the object size.  So if there are no other preceding
fields, the size is 0; but if there are preceding fields,
the size will be nonzero.  This can be detected by looking
for a field whose offset is the same as the struct type's size.
Comment 21 Tom Tromey 2024-02-02 18:21:38 UTC
BTW, with that patch, no compiler change will be needed.
Comment 22 Sourceware Commits 2024-02-20 20:57:30 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b0dd661fa16a424f059b1e1d80e779508b1a9a12

commit b0dd661fa16a424f059b1e1d80e779508b1a9a12
Author: Tom Tromey <tromey@adacore.com>
Date:   Tue Jan 30 10:06:46 2024 -0700

    Rewrite Rust slice type handling
    
    This patch rewrites the handling of slice types in Rust.
    
    More recent versions of the Rust compiler changed how unsized types
    were emitted, letting gdb inspect them more nicely.  However, gdb did
    not do this, and in fact treated all such types as if they were slices
    of arrays, which is incorrect.
    
    This patch rewrites this handling and removes the restriction that
    unsized types must be array slices.  I've added a comment explaining
    how unsized types are represented to rust-lang.c as well.
    
    I looked into a different approach, namely changing the DWARF reader
    to fix up slice types to have a dynamic type.  However, the approach
    taken here turned out to be simpler.
    
    Tested on x86-64 Fedora 38 with a variety of Rust compiler versions.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30330
Comment 23 Tom Tromey 2024-02-20 20:58:43 UTC
I think this should be fixed now.
Comment 24 Sourceware Commits 2024-04-02 17:44:28 UTC
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=251cedaeb57fe1e0fd28798f476fbee75373bbf4

commit 251cedaeb57fe1e0fd28798f476fbee75373bbf4
Author: Tom Tromey <tromey@adacore.com>
Date:   Thu Mar 7 12:57:07 2024 -0700

    Print type name when printing Rust slice
    
    The recent change to how unsized Rust values are printed included a
    small regression from past behavior.  Previously, a slice's type would
    be printed, like:
    
        (gdb) print slice
        $80 = &[i32] [3]
    
    The patch changed this to just
    
        (gdb) print slice
        $80 = [3]
    
    This patch restores the previous behavior.
    
    Reviewed-By: Simon Marchi <simon.marchi@efficios.com>
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30330
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31517