[PATCH] gdb: fix target_ops reference count for some cases
Andrew Burgess
aburgess@redhat.com
Thu Sep 22 14:21:45 GMT 2022
Simon Marchi <simark@simark.ca> writes:
> On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote:
>> This commit started as an investigation into why the test
>> gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file
>> behind.
>>
>> The crash occurs in connpy_connection_dealloc, and is actually
>> triggered by this assert:
>>
>> gdb_assert (conn_obj->target == nullptr);
>>
>> Now a little aside...
>>
>> ... the assert is never actually printed, instead GDB crashes due to
>> calling a pure virtual function. The backtrace at the point of crash
>> looks like this:
>>
>> #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6
>> #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
>> #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3606
>> #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/inferior.h:377
>> #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047
>> #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target-delegates.c:1223
>> #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112
>> #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 <internal_error_problem>, file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379
>>
>> Notice in frame #12 we called target_ops::supports_terminal_ours;
>> however, this is the_dummy_target, which is of type dummy_target, and
>> so we should have called dummy_target::supports_terminal_ours. I
>> believe the reason we ended up in the wrong implementation of
>> supports_terminal_ours (which is a virtual function) is because we
>> made the call during GDB's shut-down, and, I suspect, the vtables were
>> in a weird state.
>>
>> Anyway, the point of this patch is not to fix GDB's ability to print
>> an assert during exit, but to address the root cause of the assert.
>> With that aside out of the way, we can return to the main story...
>>
>> Connections are represented in Python with gdb.TargtetConnection
>> objects (or its sub-classes). The assert in question confirms that
>> when a gdb.TargtetConnection is deallocated, the underlying GDB
>> connection has itself been removed from GDB. If this is not true then
>> we risk creating multiple different gdb.TargtetConnection objects for
>> the same connection, which would be bad.
>>
>> When a connection removed in GDB the connection_removed observer
>
> Missing "is".
>
>> fires, which we catch with connpy_connection_removed; this function
>> then sets conn_obj->target to nullptr.
>>
>> The first issue here is that connpy_connection_dealloc is being called
>> as part of GDB's exit code, which is run after the Python interpreter
>> has been shut down. The connpy_connection_dealloc function is used to
>> deallocate the gdb.TargtetConnection Python object. Surely it is
>> wrong for us to be deallocating Python objects after the interpreter
>> has been shut down.
>>
>> The reason why connpy_connection_dealloc is called during GDB's exit
>> is that the global all_connection_objects map is holding a reference
>> to the gdb.TargtetConnection object. When the map is destroyed during
>
> Typo in "TargtetConnection".
>
>> GDB's exit, the gdb.TargtetConnection objects within the map can
>> finally be deallocated.
>>
>> Another job of connpy_connection_removed (the function we mentioned
>> earlier) is to remove connections from the all_connection_objects map
>> when the connection is removed from GDB.
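To make the interplay between the observer and the map concrete, here is a minimal C++ sketch. Everything in it is a hypothetical stand-in (connection, connection_object, decref, connection_removed_model, all_connection_objects_model) that only models the behaviour described above. It is not the actual code in py-connection.c.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-in for a GDB connection with a reference count.
struct connection
{
  int refcount = 0;
};

// Hypothetical stand-in for the Python wrapper object; its back pointer
// must be nullptr by the time the wrapper is deallocated.
struct connection_object
{
  connection *target;
};

// Observers fired when a connection's reference count drops to zero.
std::vector<std::function<void (connection *)>> connection_removed_observers;

// Models all_connection_objects: connection -> wrapper.
std::map<connection *, connection_object *> all_connection_objects_model;

// Drop one reference; when the last one goes, notify the observers.
void
decref (connection *conn)
{
  assert (conn->refcount > 0);
  if (--conn->refcount == 0)
    for (auto &observer : connection_removed_observers)
      observer (conn);
}

// Models connpy_connection_removed: clear the back pointer and drop the
// map entry so nothing keeps the wrapper alive until GDB exits.
void
connection_removed_model (connection *conn)
{
  auto it = all_connection_objects_model.find (conn);
  if (it == all_connection_objects_model.end ())
    return;
  it->second->target = nullptr;
  all_connection_objects_model.erase (it);
}
```

If a reference is leaked, decref never reaches zero, the observer never runs, and the map entry survives until the map itself is destroyed at exit.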
>>
>> And so, the reason why all_connection_objects has contents when GDB
>> exits, and the reason the assert fires, is that, when GDB exits, there
>> are still some connections that have not yet been removed from GDB,
>> that is, they have a non-zero reference count.
>>
>> If we take a look at quit_force (top.c) you can see that, for each
>> inferior, we call pop_all_targets before we (later in the function)
>> call do_final_cleanups. It is the do_final_cleanups call that is
>> responsible for shutting down the Python interpreter.
>>
>> So, in theory, we should have popped all targets be the time GDB
>
> be -> before?
>
>> exits, this should have reduced their reference counts to zero, which
>> in turn should have triggered the connection_removed observer, and
>> resulted in the connection being removed from all_connection_objects,
>> and the gdb.TargtetConnection object being deallocated.
>
> "TargtetConnection"
>
>> That this is not happening indicates that earlier, somewhere else in
>> GDB, we are leaking references to GDB's connections.
>>
>> I tracked the problem down to the 'remove-inferiors' command,
>> implemented with the remove_inferior_command function (in inferior.c).
>> This function calls delete_inferior for each inferior the user
>> specifies.
>>
>> In delete_inferior we do some housekeeping, and then delete the
>> inferior object, which calls inferior::~inferior.
>>
>> In neither delete_inferior nor inferior::~inferior do we call
>> pop_all_targets, and it is this missing call that means we leak some
>> references to the target_ops objects on the inferior's target_stack.
>>
>> To fix this we need to add a pop_all_targets call either in
>> delete_inferior or in inferior::~inferior. Currently, I think that we
>> should place the call in delete_inferior.
>>
>> Before calling pop_all_targets the inferior for which we are popping
>> needs to be made current, along with the program_space associated with
>> the inferior.
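The leak and the proposed fix can be modelled with a toy target stack. This is a sketch under simplifying assumptions: target, inferior, current_inferior_model, delete_inferior_leaky and delete_inferior_fixed are hypothetical names, and the real delete_inferior also deals with the program space and other housekeeping that is omitted here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; not GDB's real classes.
struct target
{
  int refcount = 0;
  bool closed = false;
};

static void
decref (target *t)
{
  assert (t->refcount > 0);
  if (--t->refcount == 0)
    t->closed = true;   // GDB would close the target at this point.
}

struct inferior
{
  // Raw pointers: each push takes a reference, each pop drops one.
  std::vector<target *> stack;
};

static inferior *current_inferior_model = nullptr;

static void
push_target (target *t)
{
  t->refcount++;
  current_inferior_model->stack.push_back (t);
}

// Like pop_all_targets, this only ever looks at the *current* inferior.
static void
pop_all_targets ()
{
  auto &stack = current_inferior_model->stack;
  while (!stack.empty ())
    {
      target *t = stack.back ();
      stack.pop_back ();
      decref (t);
    }
}

// Buggy version: INF is freed with targets still pushed, so their
// references are never dropped.
static void
delete_inferior_leaky (inferior *inf)
{
  delete inf;
}

// Fixed version: make INF current, pop its targets, restore the previous
// current inferior, then free INF.
static void
delete_inferior_fixed (inferior *inf)
{
  inferior *saved = current_inferior_model;
  current_inferior_model = inf;
  pop_all_targets ();
  current_inferior_model = (saved == inf) ? nullptr : saved;
  delete inf;
}
```

The temporary switch is only needed because pop_all_targets works on the current inferior; an API that took the inferior as a parameter would not need it.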
>
> Why do the inferior and program_space need to be made current in order
> to pop the targets? I understand that pop_all_targets_above and other
> functions use `current_inferior`, but could we convert them (or add new
> versions) so they don't? Off-hand I don't see why they couldn't receive
> the inferior as a parameter (or be made methods of inferior and/or
> target_stack).
>
> It shouldn't be important which inferior is the current one when calling
> target_close on a target. If we are closing a target, it means it is no
> longer controlling any inferior.

I agree with you 100%. Unfortunately, the following targets all seem to
depend on current_inferior being set (in their ::close methods):

  bsd_kvm_target
  core_target
  darwin_nat_target
  record_btrace_target
  ctf_target
  tfile_target
  windows_nat_target (though this is only for debug output)

I suspect this means these targets only really work when GDB has a
single inferior? In most cases GDB seems to be clearing out some
per-inferior state relating to the target. I need to investigate more,
but I wanted to raise this in case you (or anyone) had thoughts.
>
>> At the moment the inferior's program_space is deleted in
>> delete_inferior before we call inferior::~inferior, so, I think, to
>> place the pop_all_targets call into inferior::~inferior would require
>> additional adjustment to GDB. As delete_inferior already exists, and
>> includes various housekeeping tasks, it doesn't seem unreasonable to
>> place the pop_all_targets call there.
>
> I don't object to fixing it like this. I'm just wondering, did you
> consider changing target_stack::m_stack to make it hold strong
> references, something like std::vector<target_ops_ref>? I haven't tried
> so maybe this doesn't make sense / is too difficult. But if it does, I
> guess the problem would take care of itself. When deleting an inferior
> that still has some targets pushed, they would be automatically decref'd
> and closed if needed.

I did think about this. I think in the end the fix I proposed here
was just less churn.

I've revisited the idea of holding target_ops_ref objects, and I have
some patches that move GDB in that direction, though I haven't yet
figured out if we can get rid of the whole pop_all_targets API, which I
think is what you're hinting at.
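For comparison, here is a rough sketch of the alternative Simon suggests, where the stack holds strong references so that destroying it drops them automatically. target_ref is a hypothetical stand-in written for illustration. It is not GDB's target_ops_ref or gdb::ref_ptr.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a reference-counted target.
struct target
{
  int refcount = 0;
  bool closed = false;
};

// Minimal intrusive strong reference (assumes a non-null target).
class target_ref
{
public:
  explicit target_ref (target *t) : m_t (t) { m_t->refcount++; }

  target_ref (const target_ref &o) : m_t (o.m_t)
  {
    if (m_t != nullptr)
      m_t->refcount++;
  }

  // Moving transfers the reference without touching the count.
  target_ref (target_ref &&o) noexcept : m_t (std::exchange (o.m_t, nullptr))
  {}

  target_ref &operator= (const target_ref &) = delete;
  target_ref &operator= (target_ref &&) = delete;

  ~target_ref ()
  {
    if (m_t != nullptr && --m_t->refcount == 0)
      m_t->closed = true;   // GDB would close the target here.
  }

  target *get () const { return m_t; }

private:
  target *m_t;
};

// A stack of strong references: destroying the stack (for example when
// its inferior is deleted) releases every reference it still holds.
struct target_stack
{
  std::vector<target_ref> m_stack;

  void push (target *t) { m_stack.emplace_back (t); }
  void pop () { m_stack.pop_back (); }
};
```

With this design there is nothing to forget in delete_inferior, since the references go away with the stack. The question of which inferior is current when a target's ::close runs would remain, though.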
>
>> Now when I run py-inferior.exp, by the time GDB exits, the reference
>> counts are correct. The final pop_all_targets calls in quit_force
>> reduce the reference counts to zero, which means the connections are
>> removed before the Python interpreter is shut down. When GDB actually
>> exits the all_connection_objects map is empty, and no further Python
>> objects are deallocated at that point. The test now exits cleanly
>> without creating a core file.
>>
>> I've made some additional, related, changes in this commit.
>>
>> In inferior::~inferior I've added a new assert that ensures, by the
>> time the inferior is destructed, the inferior's target stack is
>> empty (with the exception of the dummy_target). If this is not true
>> then we will be losing a reference to a target_ops object.
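The new invariant in inferior::~inferior might look roughly like the following. The strata enum, the_dummy_target_model and only_dummy_target_left are hypothetical simplifications, and this is not the check as written in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of the strata, bottom to top.
enum strata { dummy_stratum, file_stratum, process_stratum };

struct target
{
  strata stratum;
};

// Stand-in for the global dummy target singleton.
static target the_dummy_target_model { dummy_stratum };

struct inferior
{
  // Every stack starts out holding just the dummy target.
  std::vector<target *> stack { &the_dummy_target_model };

  // True if nothing but the dummy target remains on the stack.
  bool only_dummy_target_left () const
  {
    return stack.size () == 1 && stack.back ()->stratum == dummy_stratum;
  }

  ~inferior ()
  {
    // Any other target still pushed here would have its reference lost.
    assert (only_dummy_target_left ());
  }
};
```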
>>
>> It is worth noting that we are losing references to the dummy_target
>> object, however, I've not tried to fix that problem in this patch, as
>> I don't think it is as important. The dummy target is a global
>> singleton, there's no observer for when the dummy target is deleted,
>> so no other parts of GDB care when the object is deleted. As a global
>> it is always just deleted as part of the exit code, and we never
>> really care what its reference count is. So, though it is a little
>> annoying that its reference count is wrong, it doesn't really matter.
>> Maybe I'll come back in a later patch and try to clean that up... but
>> that's for another day.
>>
>> When I tested the changes above I ran into a failure from 'maint
>> selftest infrun_thread_ptid_changed'.
>>
>> The problem is with scoped_mock_context. This object creates a new
>> inferior (called mock_inferior), with a thread, and some other
>> associated state, and then selects this new inferior. We also push a
>> process_stratum_target sub-class onto the new inferior's target stack.
>>
>> In ~scoped_mock_context we call:
>>
>> pop_all_targets_at_and_above (process_stratum);
>>
>> this will remove all target_ops objects from the mock_inferior's
>> target stack, but leaves anything at the dummy_stratum and the
>> file_stratum (which I find a little weird, but more on this later).
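The difference between the two pop functions comes down to which strata survive. In this hypothetical model the stack is just a vector of strata ordered bottom to top, and pop_all is assumed to behave like pop_all_targets, popping everything above dummy_stratum.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of the strata, ordered bottom to top.
enum strata { dummy_stratum, file_stratum, process_stratum, thread_stratum };

struct target_stack
{
  std::vector<strata> m_stack { dummy_stratum };

  // Pop every entry whose stratum is at or above S.
  void pop_at_and_above (strata s)
  {
    while (!m_stack.empty () && m_stack.back () >= s)
      m_stack.pop_back ();
  }

  // Pop every entry strictly above S.
  void pop_above (strata s)
  {
    while (!m_stack.empty () && m_stack.back () > s)
      m_stack.pop_back ();
  }

  // Like pop_all_targets: everything except the dummy target.
  void pop_all () { pop_above (dummy_stratum); }
};
```

Popping at and above process_stratum leaves the file_stratum entry behind, while pop_all leaves only the dummy target.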
>>
>> The problem though is that pop_all_targets_at_and_above, just like
>> pop_all_targets, removes things from the target stack of the current
>> inferior. In ~scoped_mock_context we don't ensure that the
>> mock_inferior associated with the current scoped_mock_context is
>> actually selected.
>>
>> In most tests we create a single scoped_mock_context, which
>> automatically selects its contained mock_inferior. However, in the
>> test infrun_thread_ptid_changed, we create multiple
>> scoped_mock_context objects, and then change which inferior is currently
>> selected.
>>
>> As a result, in one case, we end up in ~scoped_mock_context with the
>> wrong inferior selected. The pop_all_targets_at_and_above call then
>> removes the target_ops from the wrong inferior's target stack. This
>> leaves the target_ops on the scoped_mock_context::mock_inferior's
>> target stack, and, when the mock_inferior is destructed, we lose
>> some references, which triggers the assert I placed in
>> inferior::~inferior.
>>
>> To fix this I added a switch_to_inferior_no_thread call within the
>> ~scoped_mock_context function.
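Here is a simplified model of the scoped_mock_context problem and the fix. mock_context, current_inferior_model and pop_all_targets_of_current are hypothetical stand-ins. The assignment at the top of the destructor plays the role of the switch_to_inferior_no_thread call.

```cpp
#include <cassert>
#include <vector>

struct inferior
{
  std::vector<int> stack;   // stand-in for pushed targets
};

static inferior *current_inferior_model = nullptr;
static int targets_popped = 0;

// Like pop_all_targets_at_and_above: acts on the current inferior only.
static void
pop_all_targets_of_current ()
{
  targets_popped += static_cast<int> (current_inferior_model->stack.size ());
  current_inferior_model->stack.clear ();
}

// Hypothetical RAII helper in the spirit of scoped_mock_context.
struct mock_context
{
  inferior mock_inferior;

  mock_context ()
  {
    mock_inferior.stack.push_back (1);   // a mock process target
    current_inferior_model = &mock_inferior;
  }

  mock_context (const mock_context &) = delete;
  mock_context &operator= (const mock_context &) = delete;

  ~mock_context ()
  {
    // The fix: select our own inferior first.  Without this, the pop
    // below would act on whichever inferior the test last selected.
    current_inferior_model = &mock_inferior;
    pop_all_targets_of_current ();
    assert (mock_inferior.stack.empty ());
    // Keep this model from leaving a dangling pointer behind.
    current_inferior_model = nullptr;
  }
};
```

With two contexts alive and the first one's inferior selected, the second destructor still cleans up its own stack, which is the situation infrun_thread_ptid_changed creates.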
>
> Good catch. Although, if that could be fixed by making
> pop_all_targets_at_and_above not use the current_inferior, I think it
> would be nicer. And if the target stack could take care of managing the
> refcount, as mentioned above, even nicer.

As I mention above, right now it seems we do need the correct inferior
selected, so we might need something like this. I'll see how my new
patches work out.

Thanks,
Andrew
>
>> As I mention above, it seems weird that we call
>> pop_all_targets_at_and_above instead of pop_all_targets, so I've
>> changed that. I didn't see any test regressions after this, so I'm
>> assuming this is fine.
>
> Seems fine to me (this is essentially what a target stack holding
> target_ops_refs would do).
>
> Simon