Bug 28681 - Wrong pretty-printed unique_ptr value when using "finish"
Summary: Wrong pretty-printed unique_ptr value when using "finish"
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-10 17:02 UTC by Simon Marchi
Modified: 2022-11-14 21:23 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Possible fix. (405 bytes, patch)
2021-12-13 16:50 UTC, Andrew Burgess
Details | Diff
Updated fix, with tests. (1.24 KB, patch)
2021-12-13 17:31 UTC, Andrew Burgess
Details | Diff
attachment-747313-0.html (586 bytes, text/html)
2021-12-23 13:05 UTC, Simon Marchi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Marchi 2021-12-10 17:02:54 UTC
With this program:

---
#include <memory>

std::unique_ptr<int> foo()
{
  int *p = new int (0x1234);
  printf("In foo: %p\n", p);
  return std::unique_ptr<int> (p);
}

int main()
{
  auto ptr = foo ();
  printf("In main: %p\n", ptr.get());
  return 0;
}

---

$ ./gdb -q -iex "add-auto-load-safe-path /usr/share/gdb/auto-load" -iex "add-auto-load-scripts-directory /usr/share/gdb/auto-load" -nx --data-directory=data-directory a.out
Reading symbols from a.out...
(gdb) b foo
Breakpoint 1 at 0x11d9: file test.cpp, line 5.
(gdb) r
Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out 

Breakpoint 1, foo () at test.cpp:5
5         int *p = new int (0x1234);
(gdb) finish
Run till exit from #0  foo () at test.cpp:5
In foo: 0x55555556aeb0
main () at test.cpp:13
13        printf("In main: %p\n", ptr.get());
Value returned is $1 = std::unique_ptr<int> = {get() = 0x7fffffffe130}
(gdb) n
In main: 0x55555556aeb0
14        return 0;
(gdb) p ptr
$2 = std::unique_ptr<int> = {get() = 0x55555556aeb0}


Notice how the value printed by finish doesn't match all other three printed values.
Comment 1 Andrew Burgess 2021-12-13 16:37:34 UTC
I see the same result on x86-64.  I'm just guessing right now, but I suspect that this is a case of the result being miss-classified as something that can be passed in register, when, it is really something that is passed in memory.

The sys-v ABI says that if the return value is a non-trivial, C++ object, the caller should allocate space for the return value, and pass the address of the space to the callee as a hidden first argument.  The address of the object should then be returned as the return value.

Checking the disassembly of this test, and I can confirm that this is what is happening.

However, on return, GDB appears to treat the address of the unique_ptr on the stack as if that was the unique_ptr, so I suspect, that GDB has classified the return value as being passed in register, rather than in memory.

The culprit is likely to be amd64_return_value (or something called from that function) for x86-64.
Comment 2 Andrew Burgess 2021-12-13 16:50:52 UTC
Created attachment 13848 [details]
Possible fix.

Other than testing on the small example in this bug (which now works), this patch is completely untested.

I'll still need to write a test case for this.
Comment 3 Andrew Burgess 2021-12-13 17:31:31 UTC
Created attachment 13849 [details]
Updated fix, with tests.

An updated patch, this includes some tests.  Unfortunately, the tests are hitting a different pre-existing bug in GDB.  I see this assertion failure:

  gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind == FIELD_LOC_KIND_BITPOS' failed.

If I keep my new tests, but revert the fix in GDB, I still hit the same assert, so I don't think its a result of my changes. I'll take a look at this assertion before posting the fix to the list.
Comment 4 Simon Marchi 2021-12-13 18:22:51 UTC
(In reply to Andrew Burgess from comment #3)
> Created attachment 13849 [details]
> Updated fix, with tests.
> 
> An updated patch, this includes some tests.  Unfortunately, the tests are
> hitting a different pre-existing bug in GDB.  I see this assertion failure:
> 
>   gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind ==
> FIELD_LOC_KIND_BITPOS' failed.
> 
> If I keep my new tests, but revert the fix in GDB, I still hit the same
> assert, so I don't think its a result of my changes. I'll take a look at
> this assertion before posting the fix to the list.

Can you give some more info about that? It might be already known.
Comment 5 Andrew Burgess 2021-12-14 09:50:33 UTC
Sure.  My original comment was all I knew at the time, but I've looked into it a little more since then.

The call stack when the assert triggers is:

  amd64_return_value
    amd64_classify
      amd64_classify_aggregate
        amd64_has_unaligned_fields

In amd64_has_unaligned_fields we have this loop:

  for (int i = 0; i < type->num_fields (); i++)
    {
      struct type *subtype = check_typedef (type->field (i).type ());

      /* Ignore static fields, empty fields (for example nested
         empty structures), and bitfields (these are handled by
         the caller).  */
      if (field_is_static (&type->field (i))
          || (TYPE_FIELD_BITSIZE (type, i) == 0
              && TYPE_LENGTH (subtype) == 0)
          || TYPE_FIELD_PACKED (type, i))
        continue;

      int bitpos = type->field (i).loc_bitpos ();

      /* ... snip ... */
    }

And it is when we ask for the loc_bitpos that the assert triggers, as the field's type is no FIELD_LOC_KIND_BITPOS, but is instead, FIELD_LOC_KIND_DWARF_BLOCK.

In the case we're handling, the field being processed is the field that describes the virtual base of the type being processed.

I originally starting playing with is_dynamic_type, but noticed that the type in question doesn't return true for is_dynamic_type.  This seemed weird, as clearly the offset needs resolving.  But then I found some code in is_dynamic_type_internal which specifically detects the exact case I'm looking at, and ensures that such a case is NOT enough to make a type appear dynamic.  The comment on this code says:

	    /* Do not consider C++ virtual base types to be dynamic
	       due to the field's offset being dynamic; these are
	       handled via other means.  */

But, frustratingly, doesn't indicate what that other means is, or where I might find it.

However, I might be able to side step this issue for now.

The patch I'm playing with adds a check in amd64_classify_aggregate, if the aggregate is not trivially_copyable then it should be passed in memory.  No further checks are required.

I initially added the trivially_copyable check as the last check of the if condition, like this:

  if (TYPE_LENGTH (type) > 16
      || amd64_has_unaligned_fields (type)
      || !language_pass_by_reference (type).trivially_copyable)
    {
      theclass[0] = theclass[1] = AMD64_MEMORY;
      return;
    }

However, while looking at the assertion case, I noticed that the type with a virtual base was not trivially_copyable, so, if I reorder the checks like this:

  if (TYPE_LENGTH (type) > 16
      || !language_pass_by_reference (type).trivially_copyable
      || amd64_has_unaligned_fields (type))
    {
      theclass[0] = theclass[1] = AMD64_MEMORY;
      return;
    }

Then, problem solved!  Or, at least, problem kicked down the road!  I'm still experimenting, but, so far, I've not been able to create an example that both uses virtual base classes, and is trivially_copyable.

My plan for now is to place the checks in the "correct" order, and add a solid comment explaining that the order is important (plus the test will fail if the order is ever changed).

Things that would help are:

  1. Suggestions for how amd64_has_unaligned_fields can be improved in order to handle virtual base class fields, e.g. how do we obtain the field offset for those fields, or

  2. Confirmation that in C++ types with a virtual base class, are, by definition, never trivially_copyable, this would imply my solution is actually not such a hack after all, or

  3. An example of a class with a virtual base class, that is trivially copyable.  Ideally, the class would be less than 8-bytes in size so that it can be passed in a single register, such an example would then fail in my patched GDB.
Comment 6 Simon Marchi 2021-12-14 13:52:24 UTC
> And it is when we ask for the loc_bitpos that the assert triggers, as the
> field's type is no FIELD_LOC_KIND_BITPOS, but is instead,
> FIELD_LOC_KIND_DWARF_BLOCK.
> 
> In the case we're handling, the field being processed is the field that
> describes the virtual base of the type being processed.
> 
> I originally starting playing with is_dynamic_type, but noticed that the
> type in question doesn't return true for is_dynamic_type.  This seemed
> weird, as clearly the offset needs resolving.  But then I found some code in
> is_dynamic_type_internal which specifically detects the exact case I'm
> looking at, and ensures that such a case is NOT enough to make a type appear
> dynamic.  The comment on this code says:
> 
> 	    /* Do not consider C++ virtual base types to be dynamic
> 	       due to the field's offset being dynamic; these are
> 	       handled via other means.  */
> 
> But, frustratingly, doesn't indicate what that other means is, or where I
> might find it.

Ok, it is that issue I was curious about.  It sounds similar to this, but at a different place:

https://sourceware.org/pipermail/gdb-patches/2021-September/182307.html
Comment 7 Andrew Burgess 2021-12-15 17:39:35 UTC
I posted my fix to the mailing list:

  https://sourceware.org/pipermail/gdb-patches/2021-December/184496.html
Comment 8 Sourceware Commits 2021-12-23 12:15:57 UTC
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608

commit b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Mon Dec 13 16:56:16 2021 +0000

    gdb: on x86-64 non-trivial C++ objects are returned in memory
    
    Fixes PR gdb/28681.  It was observed that after using the `finish`
    command an incorrect value was displayed in some cases.  Specifically,
    this behaviour was observed on an x86-64 target.
    
    Consider this test program:
    
      struct A
      {
        int i;
    
        A ()
        { this->i = 0; }
        A (const A& a)
        { this->i = a.i; }
      };
    
      A
      func (int i)
      {
        A a;
        a.i = i;
        return a;
      }
    
      int
      main ()
      {
        A a = func (3);
        return a.i;
      }
    
    And this GDB session:
    
      $ gdb -q ex.x
      Reading symbols from ex.x...
      (gdb) b func
      Breakpoint 1 at 0x401115: file ex.cc, line 14.
      (gdb) r
      Starting program: /home/andrew/tmp/ex.x
    
      Breakpoint 1, func (i=3) at ex.cc:14
      14      A a;
      (gdb) finish
      Run till exit from #0  func (i=3) at ex.cc:14
      main () at ex.cc:23
      23      return a.i;
      Value returned is $1 = {
        i = -19044
      }
      (gdb) p a
      $2 = {
        i = 3
      }
      (gdb)
    
    Notice how after the `finish` the contents of $1 are junk, but, when I
    immediately ask for the value of `a`, I get back the correct value.
    
    The problem here is that after the finish command GDB calls the
    function amd64_return_value to figure out where the return value can
    be found (on x86-64 targets anyway).
    
    This function makes the wrong choice for the struct A in our case, as
    sizeof(A) <= 8, then amd64_return_value decides that A will be
    returned in a register.  GDB then reads the return value register an
    interprets the contents as an instance of A.
    
    Unfortunately, A is not trivially copyable (due to its copy
    constructor), and the sys-v specification for argument and return
    value passing, says that any non-trivial C++ object should have space
    allocated for it by the caller, and the address of this space is
    passed to the callee as a hidden first argument.  The callee should
    then return the address of this space as the return value.
    
    And so, the register that GDB is treating as containing an instance of
    A, actually contains the address of an instance of A (in this case on
    the stack), this is why GDB shows the incorrect result.
    
    The call stack within GDB for where we actually go wrong is this:
    
      amd64_return_value
        amd64_classify
          amd64_classify_aggregate
    
    And it is in amd64_classify_aggregate that we should be classifying
    the type as AMD64_MEMORY, instead of as AMD64_INTEGER as we currently
    do (via a call to amd64_classify_aggregate_field).
    
    At the top of amd64_classify_aggregate we already have this logic:
    
      if (TYPE_LENGTH (type) > 16 || amd64_has_unaligned_fields (type))
        {
          theclass[0] = theclass[1] = AMD64_MEMORY;
          return;
        }
    
    Which handles some easy cases where we know a struct will be placed
    into memory, that is (a) the struct is more than 16-bytes in size,
    or (b) the struct has any unaligned fields.
    
    All we need then, is to add a check here to see if the struct is
    trivially copyable.  If it is not then we know the struct will be
    passed in memory.
    
    I originally structured the code like this:
    
      if (TYPE_LENGTH (type) > 16
          || amd64_has_unaligned_fields (type)
          || !language_pass_by_reference (type).trivially_copyable)
        {
          theclass[0] = theclass[1] = AMD64_MEMORY;
          return;
        }
    
    This solved the example from the bug, and my small example above.  So
    then I started adding some more extensive tests to the GDB testsuite,
    and I ran into a problem.  I hit this error:
    
      gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind == FIELD_LOC_KIND_BITPOS' failed.
    
    This problem is triggered from:
    
      amd64_classify_aggregate
        amd64_has_unaligned_fields
          field::loc_bitpos
    
    Inside the unaligned field check we try to get the bit position of
    each field.  Unfortunately, in some cases the field location is not
    FIELD_LOC_KIND_BITPOS, but is FIELD_LOC_KIND_DWARF_BLOCK.
    
    An example that shows this bug is:
    
      struct B
      {
        short j;
      };
    
      struct A : virtual public B
      {
        short i;
    
        A ()
        { this->i = 0; }
        A (const A& a)
        { this->i = a.i; }
      };
    
      A
      func (int i)
      {
        A a;
        a.i = i;
        return a;
      }
    
      int
      main ()
      {
        A a = func (3);
        return a.i;
      }
    
    It is the virtual base class, B, that causes the problem.  The base
    class is represented, within GDB, as a field within A.  However, the
    location type for this field is a DWARF_BLOCK.
    
    I spent a little time trying to figure out how to convert the
    DWARF_BLOCK to a BITPOS, however, I realised that, in this case at
    least, conversion is not needed.
    
    The C++ standard says that a class is not trivially copyable if it has
    any virtual base classes.  And so, in this case, even if I could
    figure out the BITPOS for the virtual base class fields, I know for
    sure that I would immediately fail the trivially_copyable check.  So,
    lets just reorder the checks in amd64_classify_aggregate to:
    
      if (TYPE_LENGTH (type) > 16
          || !language_pass_by_reference (type).trivially_copyable
          || amd64_has_unaligned_fields (type))
        {
          theclass[0] = theclass[1] = AMD64_MEMORY;
          return;
        }
    
    Now, if we have a class with virtual bases we will fail quicker, and
    avoid the unaligned fields check completely.
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28681
Comment 9 Andrew Burgess 2021-12-23 12:16:59 UTC
I believe this should now be resolved in master.
Comment 10 Simon Marchi 2021-12-23 13:05:06 UTC
Created attachment 13873 [details]
attachment-747313-0.html

Thanks! 

On December 23, 2021 7:16:59 AM EST, aburgess at redhat dot com <sourceware-bugzilla@sourceware.org> wrote:
>https://sourceware.org/bugzilla/show_bug.cgi?id=28681
>
>Andrew Burgess <aburgess at redhat dot com> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>         Resolution|---                         |FIXED
>             Status|NEW                         |RESOLVED
>
>--- Comment #9 from Andrew Burgess <aburgess at redhat dot com> ---
>I believe this should now be resolved in master.
>
>-- 
>You are receiving this mail because:
>You reported the bug.
Comment 11 Sourceware Commits 2022-03-14 10:39:04 UTC
The master branch has been updated by Luis Machado <luisgpm@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bab22d0640914384d467e09c3e796585fd08e7c6

commit bab22d0640914384d467e09c3e796585fd08e7c6
Author: Luis Machado <luis.machado@arm.com>
Date:   Tue Jan 4 14:06:36 2022 -0300

    [aarch64/arm] Properly extract the return value returned in memory
    
    When running gdb.cp/non-trivial-retval.exp, the following shows up for
    both aarch64-linux and armhf-linux:
    
    Breakpoint 3, f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35
    35        A a;
    (gdb) finish
    Run till exit from #0  f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35
    main () at /src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:163
    163       B b = f2 (i1, i2);
    Value returned is $6 = {a = -11952}
    (gdb)
    
    The return value should be {a = 123} instead. This happens because the
    backends don't extract the return value from the correct location. GDB should
    fetch a pointer to the memory location from X8 for aarch64 and r0 for armhf.
    
    With the patch, gdb.cp/non-trivial-retval.exp has full passes on
    aarch64-linux and armhf-linux on Ubuntu 20.04/18.04.
    
    The problem only shows up with the "finish" command. The "call" command
    works correctly and displays the correct return value.
    
    This is also related to PR gdb/28681
    (https://sourceware.org/bugzilla/show_bug.cgi?id=28681) and fixes FAIL's in
    gdb.ada/mi_var_array.exp.
    
    A new testcase is provided, and it exercises GDB's ability to "finish" a
    function that returns a large struct (> 16 bytes) and display the
    contents of this struct correctly. This has always been incorrect for
    these backends, but no testcase exercised this particular scenario.
Comment 12 Sourceware Commits 2022-11-14 21:23:46 UTC
The master branch has been updated by Carl Love <carll@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=24b27e5e9b6bf5a37fb39dfca151722bb801cbaa

commit 24b27e5e9b6bf5a37fb39dfca151722bb801cbaa
Author: Carl Love <cel@us.ibm.com>
Date:   Mon Nov 14 16:22:11 2022 -0500

    PowerPC, function ppc64_sysv_abi_return_value add missing return value convention
    
    This patch address five testcase failures in gdb.cp/non-trivial-retval.exp.
    The following commit resulted in the five testcases failures on PowerPC.
    The value returned by the function is being reported incorrectly.
    
      commit b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608
      Author: Andrew Burgess <aburgess@redhat.com>
      Date:   Mon Dec 13 16:56:16 2021 +0000
    
          gdb: on x86-64 non-trivial C++ objects are returned in memory
    
          Fixes PR gdb/28681.  It was observed that after using the `finish`
          command an incorrect value was displayed in some cases.  Specifically,
          this behaviour was observed on an x86-64 target.
    
    The function:
    
      enum return_value_convention
      ppc64_sysv_abi_return_value (struct gdbarch *gdbarch, struct value *function,
                                   struct type *valtype, struct regcache *regcache,
                                   gdb_byte *readbuf, const gdb_byte *writebuf)
    
    should return RETURN_VALUE_STRUCT_CONVENTION if the valtype->code() is
    TYPE_CODE_STRUCT and if the language_pass_by_reference is not
    trivially_copyable.
    
    This patch adds the needed code to return the value
    RETURN_VALUE_STRUCT_CONVENTION in this case.
    
    With this patch, the five test cases still fail but with the message "Value
    returned has type: A. Cannot determine contents".  The PowerPC ABI stores
    the address of the buffer containing the function return value in register
    r3 on entry to the function.  However, the PowerPC ABI does not guarentee
    that r3 will not be modified in the function.  So when the function returns,
    the return buffer address cannot be reliably obtained from register r3.
    Thus the message "Cannot determine contents" is appropriate in this case.