With this program: --- #include <memory> std::unique_ptr<int> foo() { int *p = new int (0x1234); printf("In foo: %p\n", p); return std::unique_ptr<int> (p); } int main() { auto ptr = foo (); printf("In main: %p\n", ptr.get()); return 0; } --- $ ./gdb -q -iex "add-auto-load-safe-path /usr/share/gdb/auto-load" -iex "add-auto-load-scripts-directory /usr/share/gdb/auto-load" -nx --data-directory=data-directory a.out Reading symbols from a.out... (gdb) b foo Breakpoint 1 at 0x11d9: file test.cpp, line 5. (gdb) r Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out Breakpoint 1, foo () at test.cpp:5 5 int *p = new int (0x1234); (gdb) finish Run till exit from #0 foo () at test.cpp:5 In foo: 0x55555556aeb0 main () at test.cpp:13 13 printf("In main: %p\n", ptr.get()); Value returned is $1 = std::unique_ptr<int> = {get() = 0x7fffffffe130} (gdb) n In main: 0x55555556aeb0 14 return 0; (gdb) p ptr $2 = std::unique_ptr<int> = {get() = 0x55555556aeb0} Notice how the value printed by finish doesn't match all other three printed values.
I see the same result on x86-64. I'm just guessing right now, but I suspect that this is a case of the result being miss-classified as something that can be passed in register, when, it is really something that is passed in memory. The sys-v ABI says that if the return value is a non-trivial, C++ object, the caller should allocate space for the return value, and pass the address of the space to the callee as a hidden first argument. The address of the object should then be returned as the return value. Checking the disassembly of this test, and I can confirm that this is what is happening. However, on return, GDB appears to treat the address of the unique_ptr on the stack as if that was the unique_ptr, so I suspect, that GDB has classified the return value as being passed in register, rather than in memory. The culprit is likely to be amd64_return_value (or something called from that function) for x86-64.
Created attachment 13848 [details] Possible fix. Other than testing on the small example in this bug (which now works), this patch is completely untested. I'll still need to write a test case for this.
Created attachment 13849 [details] Updated fix, with tests. An updated patch, this includes some tests. Unfortunately, the tests are hitting a different pre-existing bug in GDB. I see this assertion failure: gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind == FIELD_LOC_KIND_BITPOS' failed. If I keep my new tests, but revert the fix in GDB, I still hit the same assert, so I don't think its a result of my changes. I'll take a look at this assertion before posting the fix to the list.
(In reply to Andrew Burgess from comment #3) > Created attachment 13849 [details] > Updated fix, with tests. > > An updated patch, this includes some tests. Unfortunately, the tests are > hitting a different pre-existing bug in GDB. I see this assertion failure: > > gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind == > FIELD_LOC_KIND_BITPOS' failed. > > If I keep my new tests, but revert the fix in GDB, I still hit the same > assert, so I don't think its a result of my changes. I'll take a look at > this assertion before posting the fix to the list. Can you give some more info about that? It might be already known.
Sure. My original comment was all I knew at the time, but I've looked into it a little more since then. The call stack when the assert triggers is: amd64_return_value amd64_classify amd64_classify_aggregate amd64_has_unaligned_fields In amd64_has_unaligned_fields we have this loop: for (int i = 0; i < type->num_fields (); i++) { struct type *subtype = check_typedef (type->field (i).type ()); /* Ignore static fields, empty fields (for example nested empty structures), and bitfields (these are handled by the caller). */ if (field_is_static (&type->field (i)) || (TYPE_FIELD_BITSIZE (type, i) == 0 && TYPE_LENGTH (subtype) == 0) || TYPE_FIELD_PACKED (type, i)) continue; int bitpos = type->field (i).loc_bitpos (); /* ... snip ... */ } And it is when we ask for the loc_bitpos that the assert triggers, as the field's type is no FIELD_LOC_KIND_BITPOS, but is instead, FIELD_LOC_KIND_DWARF_BLOCK. In the case we're handling, the field being processed is the field that describes the virtual base of the type being processed. I originally starting playing with is_dynamic_type, but noticed that the type in question doesn't return true for is_dynamic_type. This seemed weird, as clearly the offset needs resolving. But then I found some code in is_dynamic_type_internal which specifically detects the exact case I'm looking at, and ensures that such a case is NOT enough to make a type appear dynamic. The comment on this code says: /* Do not consider C++ virtual base types to be dynamic due to the field's offset being dynamic; these are handled via other means. */ But, frustratingly, doesn't indicate what that other means is, or where I might find it. However, I might be able to side step this issue for now. The patch I'm playing with adds a check in amd64_classify_aggregate, if the aggregate is not trivially_copyable then it should be passed in memory. No further checks are required. I initially added the trivially_copyable check as the last check of the if condition, like this: if (TYPE_LENGTH (type) > 16 || amd64_has_unaligned_fields (type) || !language_pass_by_reference (type).trivially_copyable) { theclass[0] = theclass[1] = AMD64_MEMORY; return; } However, while looking at the assertion case, I noticed that the type with a virtual base was not trivially_copyable, so, if I reorder the checks like this: if (TYPE_LENGTH (type) > 16 || !language_pass_by_reference (type).trivially_copyable || amd64_has_unaligned_fields (type)) { theclass[0] = theclass[1] = AMD64_MEMORY; return; } Then, problem solved! Or, at least, problem kicked down the road! I'm still experimenting, but, so far, I've not been able to create an example that both uses virtual base classes, and is trivially_copyable. My plan for now is to place the checks in the "correct" order, and add a solid comment explaining that the order is important (plus the test will fail if the order is ever changed). Things that would help are: 1. Suggestions for how amd64_has_unaligned_fields can be improved in order to handle virtual base class fields, e.g. how do we obtain the field offset for those fields, or 2. Confirmation that in C++ types with a virtual base class, are, by definition, never trivially_copyable, this would imply my solution is actually not such a hack after all, or 3. An example of a class with a virtual base class, that is trivially copyable. Ideally, the class would be less than 8-bytes in size so that it can be passed in a single register, such an example would then fail in my patched GDB.
> And it is when we ask for the loc_bitpos that the assert triggers, as the > field's type is no FIELD_LOC_KIND_BITPOS, but is instead, > FIELD_LOC_KIND_DWARF_BLOCK. > > In the case we're handling, the field being processed is the field that > describes the virtual base of the type being processed. > > I originally starting playing with is_dynamic_type, but noticed that the > type in question doesn't return true for is_dynamic_type. This seemed > weird, as clearly the offset needs resolving. But then I found some code in > is_dynamic_type_internal which specifically detects the exact case I'm > looking at, and ensures that such a case is NOT enough to make a type appear > dynamic. The comment on this code says: > > /* Do not consider C++ virtual base types to be dynamic > due to the field's offset being dynamic; these are > handled via other means. */ > > But, frustratingly, doesn't indicate what that other means is, or where I > might find it. Ok, it is that issue I was curious about. It sounds similar to this, but at a different place: https://sourceware.org/pipermail/gdb-patches/2021-September/182307.html
I posted my fix to the mailing list: https://sourceware.org/pipermail/gdb-patches/2021-December/184496.html
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608 commit b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608 Author: Andrew Burgess <aburgess@redhat.com> Date: Mon Dec 13 16:56:16 2021 +0000 gdb: on x86-64 non-trivial C++ objects are returned in memory Fixes PR gdb/28681. It was observed that after using the `finish` command an incorrect value was displayed in some cases. Specifically, this behaviour was observed on an x86-64 target. Consider this test program: struct A { int i; A () { this->i = 0; } A (const A& a) { this->i = a.i; } }; A func (int i) { A a; a.i = i; return a; } int main () { A a = func (3); return a.i; } And this GDB session: $ gdb -q ex.x Reading symbols from ex.x... (gdb) b func Breakpoint 1 at 0x401115: file ex.cc, line 14. (gdb) r Starting program: /home/andrew/tmp/ex.x Breakpoint 1, func (i=3) at ex.cc:14 14 A a; (gdb) finish Run till exit from #0 func (i=3) at ex.cc:14 main () at ex.cc:23 23 return a.i; Value returned is $1 = { i = -19044 } (gdb) p a $2 = { i = 3 } (gdb) Notice how after the `finish` the contents of $1 are junk, but, when I immediately ask for the value of `a`, I get back the correct value. The problem here is that after the finish command GDB calls the function amd64_return_value to figure out where the return value can be found (on x86-64 targets anyway). This function makes the wrong choice for the struct A in our case, as sizeof(A) <= 8, then amd64_return_value decides that A will be returned in a register. GDB then reads the return value register an interprets the contents as an instance of A. Unfortunately, A is not trivially copyable (due to its copy constructor), and the sys-v specification for argument and return value passing, says that any non-trivial C++ object should have space allocated for it by the caller, and the address of this space is passed to the callee as a hidden first argument. The callee should then return the address of this space as the return value. And so, the register that GDB is treating as containing an instance of A, actually contains the address of an instance of A (in this case on the stack), this is why GDB shows the incorrect result. The call stack within GDB for where we actually go wrong is this: amd64_return_value amd64_classify amd64_classify_aggregate And it is in amd64_classify_aggregate that we should be classifying the type as AMD64_MEMORY, instead of as AMD64_INTEGER as we currently do (via a call to amd64_classify_aggregate_field). At the top of amd64_classify_aggregate we already have this logic: if (TYPE_LENGTH (type) > 16 || amd64_has_unaligned_fields (type)) { theclass[0] = theclass[1] = AMD64_MEMORY; return; } Which handles some easy cases where we know a struct will be placed into memory, that is (a) the struct is more than 16-bytes in size, or (b) the struct has any unaligned fields. All we need then, is to add a check here to see if the struct is trivially copyable. If it is not then we know the struct will be passed in memory. I originally structured the code like this: if (TYPE_LENGTH (type) > 16 || amd64_has_unaligned_fields (type) || !language_pass_by_reference (type).trivially_copyable) { theclass[0] = theclass[1] = AMD64_MEMORY; return; } This solved the example from the bug, and my small example above. So then I started adding some more extensive tests to the GDB testsuite, and I ran into a problem. I hit this error: gdbtypes.h:676: internal-error: loc_bitpos: Assertion `m_loc_kind == FIELD_LOC_KIND_BITPOS' failed. This problem is triggered from: amd64_classify_aggregate amd64_has_unaligned_fields field::loc_bitpos Inside the unaligned field check we try to get the bit position of each field. Unfortunately, in some cases the field location is not FIELD_LOC_KIND_BITPOS, but is FIELD_LOC_KIND_DWARF_BLOCK. An example that shows this bug is: struct B { short j; }; struct A : virtual public B { short i; A () { this->i = 0; } A (const A& a) { this->i = a.i; } }; A func (int i) { A a; a.i = i; return a; } int main () { A a = func (3); return a.i; } It is the virtual base class, B, that causes the problem. The base class is represented, within GDB, as a field within A. However, the location type for this field is a DWARF_BLOCK. I spent a little time trying to figure out how to convert the DWARF_BLOCK to a BITPOS, however, I realised that, in this case at least, conversion is not needed. The C++ standard says that a class is not trivially copyable if it has any virtual base classes. And so, in this case, even if I could figure out the BITPOS for the virtual base class fields, I know for sure that I would immediately fail the trivially_copyable check. So, lets just reorder the checks in amd64_classify_aggregate to: if (TYPE_LENGTH (type) > 16 || !language_pass_by_reference (type).trivially_copyable || amd64_has_unaligned_fields (type)) { theclass[0] = theclass[1] = AMD64_MEMORY; return; } Now, if we have a class with virtual bases we will fail quicker, and avoid the unaligned fields check completely. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28681
I believe this should now be resolved in master.
Created attachment 13873 [details] attachment-747313-0.html Thanks! On December 23, 2021 7:16:59 AM EST, aburgess at redhat dot com <sourceware-bugzilla@sourceware.org> wrote: >https://sourceware.org/bugzilla/show_bug.cgi?id=28681 > >Andrew Burgess <aburgess at redhat dot com> changed: > > What |Removed |Added >---------------------------------------------------------------------------- > Resolution|--- |FIXED > Status|NEW |RESOLVED > >--- Comment #9 from Andrew Burgess <aburgess at redhat dot com> --- >I believe this should now be resolved in master. > >-- >You are receiving this mail because: >You reported the bug.
The master branch has been updated by Luis Machado <luisgpm@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bab22d0640914384d467e09c3e796585fd08e7c6 commit bab22d0640914384d467e09c3e796585fd08e7c6 Author: Luis Machado <luis.machado@arm.com> Date: Tue Jan 4 14:06:36 2022 -0300 [aarch64/arm] Properly extract the return value returned in memory When running gdb.cp/non-trivial-retval.exp, the following shows up for both aarch64-linux and armhf-linux: Breakpoint 3, f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35 35 A a; (gdb) finish Run till exit from #0 f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35 main () at /src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:163 163 B b = f2 (i1, i2); Value returned is $6 = {a = -11952} (gdb) The return value should be {a = 123} instead. This happens because the backends don't extract the return value from the correct location. GDB should fetch a pointer to the memory location from X8 for aarch64 and r0 for armhf. With the patch, gdb.cp/non-trivial-retval.exp has full passes on aarch64-linux and armhf-linux on Ubuntu 20.04/18.04. The problem only shows up with the "finish" command. The "call" command works correctly and displays the correct return value. This is also related to PR gdb/28681 (https://sourceware.org/bugzilla/show_bug.cgi?id=28681) and fixes FAIL's in gdb.ada/mi_var_array.exp. A new testcase is provided, and it exercises GDB's ability to "finish" a function that returns a large struct (> 16 bytes) and display the contents of this struct correctly. This has always been incorrect for these backends, but no testcase exercised this particular scenario.
The master branch has been updated by Carl Love <carll@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=24b27e5e9b6bf5a37fb39dfca151722bb801cbaa commit 24b27e5e9b6bf5a37fb39dfca151722bb801cbaa Author: Carl Love <cel@us.ibm.com> Date: Mon Nov 14 16:22:11 2022 -0500 PowerPC, function ppc64_sysv_abi_return_value add missing return value convention This patch address five testcase failures in gdb.cp/non-trivial-retval.exp. The following commit resulted in the five testcases failures on PowerPC. The value returned by the function is being reported incorrectly. commit b1718fcdd1d2a5c514f8ee504ba07fb3f42b8608 Author: Andrew Burgess <aburgess@redhat.com> Date: Mon Dec 13 16:56:16 2021 +0000 gdb: on x86-64 non-trivial C++ objects are returned in memory Fixes PR gdb/28681. It was observed that after using the `finish` command an incorrect value was displayed in some cases. Specifically, this behaviour was observed on an x86-64 target. The function: enum return_value_convention ppc64_sysv_abi_return_value (struct gdbarch *gdbarch, struct value *function, struct type *valtype, struct regcache *regcache, gdb_byte *readbuf, const gdb_byte *writebuf) should return RETURN_VALUE_STRUCT_CONVENTION if the valtype->code() is TYPE_CODE_STRUCT and if the language_pass_by_reference is not trivially_copyable. This patch adds the needed code to return the value RETURN_VALUE_STRUCT_CONVENTION in this case. With this patch, the five test cases still fail but with the message "Value returned has type: A. Cannot determine contents". The PowerPC ABI stores the address of the buffer containing the function return value in register r3 on entry to the function. However, the PowerPC ABI does not guarentee that r3 will not be modified in the function. So when the function returns, the return buffer address cannot be reliably obtained from register r3. Thus the message "Cannot determine contents" is appropriate in this case.