Bug 31281 - [gdb] intrusive_list.h:329: internal-error: push_back: Assertion elem_node->prev == INTRUSIVE_LIST_UNLINKED_VALUE failed.
Summary: [gdb] intrusive_list.h:329: internal-error: push_back: Assertion elem_node->p...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: build (show other bugs)
Version: HEAD
: P2 normal
Target Milestone: 15.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-23 17:59 UTC by Guinevere Larsen
Modified: 2024-10-31 04:30 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Guinevere Larsen 2024-01-23 17:59:22 UTC
When testing with the newest clang in rawhide (17.0.6), multiple tests in the testsuite trigger the assertion in the testsuite.

One example is gdb.base/watch_thread_num.exp, happening on the test "Watchpoint triggered iteration 1".

I haven't had time to dig any further, but I did manage to confirm that it isn't fixed by the change suggested in PR 31061.
Comment 1 Guinevere Larsen 2024-01-24 15:05:20 UTC
Turns out the issue is the GCC version change from 13.2.1 to 14.0.1 (fedora 39 to rawhide).

I mistakenly thought it was clang because I saw this on the clang buildbot, but it also fails when using gcc to build the test. It's 100% related to the gcc used to build the test.
Comment 2 Tom Tromey 2024-01-24 16:17:38 UTC
This is suggestive:

#11 0x000000000087a3d1 in process_stratum_target::maybe_remove_resumed_with_pending_wait_status (this=0x10fac80 <the_amd64_linux_nat_target>, thread=0x237ea40)
    at ../../binutils-gdb/gdb/process-stratum-target.c:136
#12 0x00000000009fed78 in set_thread_exited (tp=0x237ea40, 
    exit_code=std::optional [no contained value], silent=true)
    at ../../binutils-gdb/gdb/thread.c:236
#13 0x0000000000762cc2 in operator() (__closure=0x7ffe08d2b507, thr=0x237ea40)
    at ../../binutils-gdb/gdb/inferior.c:268
#14 0x0000000000765977 in intrusive_list<thread_info, intrusive_base_node<thread_info> >::clear_and_dispose<inferior::clear_thread_list()::<lambda(thread_info*)> >(struct {...}) (this=0x1c0b068, disposer=...)
    at ../../binutils-gdb/gdb/../gdbsupport/intrusive_list.h:518


In frame 11 the assertion triggers.
But maybe the entry was already removed by the loop in frame 14.
Comment 3 Tom de Vries 2024-01-24 17:06:01 UTC
I'm not too sure what this code is doing, but this seems to fix the mentioned test-case:
...
diff --git a/gdb/process-stratum-target.c b/gdb/process-stratum-target.c
index 8737938e3b4..69d3a1deac7 100644
--- a/gdb/process-stratum-target.c
+++ b/gdb/process-stratum-target.c
@@ -119,7 +119,8 @@ process_stratum_target::maybe_add_resumed_with_pending_wait_status
     {
       infrun_debug_printf ("adding to resumed threads with event list: %s",
                           thread->ptid.to_string ().c_str ());
-      m_resumed_with_pending_wait_status.push_back (*thread);
+      if (!thread->is_linked ())
+       m_resumed_with_pending_wait_status.push_back (*thread);
     }
 }
 
@@ -133,9 +134,11 @@ process_stratum_target::maybe_remove_resumed_with_pending_wait_status
     {
       infrun_debug_printf ("removing from resumed threads with event list: %s",
                           thread->ptid.to_string ().c_str ());
-      gdb_assert (thread->resumed_with_pending_wait_status_node.is_linked ());
-      auto it = m_resumed_with_pending_wait_status.iterator_to (*thread);
-      m_resumed_with_pending_wait_status.erase (it);
+      if (thread->resumed_with_pending_wait_status_node.is_linked ())
+       {
+         auto it = m_resumed_with_pending_wait_status.iterator_to (*thread);
+         m_resumed_with_pending_wait_status.erase (it);
+       }
     }
   else
     gdb_assert (!thread->resumed_with_pending_wait_status_node.is_linked ());
...
Comment 4 Simon Marchi 2024-01-24 17:18:57 UTC
The problem appears to be with the computation of the node address, here:

---
/* For element types that keep the node as member field.  */

template<typename T, intrusive_list_node<T> T::*MemberNode>
struct intrusive_member_node
{
  static intrusive_list_node<T> *as_node (T *elem)
  { return &(elem->*MemberNode); }
};
---

I stopped at the first internal error.  Going up a few frames, the thread_info object's address is:

(gdb) p thread
$1 = (thread_info *) 0x517000050180

The address of its `resumed_with_pending_wait_status_node` member is (btw printing this expression took like 10 seconds, it might be good to dig to see if that's normal):

(gdb) p &thread->resumed_with_pending_wait_status_node 
$2 = (intrusive_list_node<thread_info> *) 0x5170000503d0

However, in the `intrusive_list<thread_info>::push_back` frame, `elem_node` is:

(gdb) p elem_node
$3 = (intrusive_list_node<thread_info> *) 0x5170000503d8

We would expect `elem_node` to be equal to `$2` above.
Comment 5 Guinevere Larsen 2024-01-24 18:11:40 UTC
Tom De Vries, you patch does fix all the crashes relating to this specific crash, but given Simon's analysis, I don't think its a fix, just a way to get gdb not to assert.

Also, that patch adds a regression on gdb.base/bp-cond-failure.exp, for example in the test "access_type=short: cond_eval=auto: multi-loc: continue", where we expect gdb to print 'condition for breakpoint 2.1' and with the patch it prints 'condition for breakpoint 2'.
Comment 6 Simon Marchi 2024-01-24 19:23:28 UTC
I bisected gcc, the failure appeared with this commit:

c++: non-dependent .* operand folding [PR112427]
https://gitlab.com/gnutools/gcc/-/commit/d3f48f68227

I know nothing about compilers, so I don't know what the code changes.  But the included test case uses a pointer-to-member, which is somewhat relevant in our case.
Comment 7 Tom de Vries 2024-01-24 19:24:37 UTC
(In reply to Guinevere Larsen from comment #5)
> Tom De Vries, you patch does fix all the crashes relating to this specific
> crash, but given Simon's analysis, I don't think its a fix, just a way to
> get gdb not to assert.
> 

Agreed.

> Also, that patch adds a regression on gdb.base/bp-cond-failure.exp, for
> example in the test "access_type=short: cond_eval=auto: multi-loc:
> continue", where we expect gdb to print 'condition for breakpoint 2.1' and
> with the patch it prints 'condition for breakpoint 2'.

OK, that confirms it, thanks for letting me know.
Comment 8 Tom de Vries 2024-01-24 19:41:18 UTC
(In reply to Simon Marchi from comment #6)
> I bisected gcc, the failure appeared with this commit:
> 
> c++: non-dependent .* operand folding [PR112427]
> https://gitlab.com/gnutools/gcc/-/commit/d3f48f68227
> 
> I know nothing about compilers, so I don't know what the code changes.  But
> the included test case uses a pointer-to-member, which is somewhat relevant
> in our case.

At the parser level (original tree dump file), I see:
...
;; Function static intrusive_list_node<T>* intrusive_member_node<T, MemberNode>::as_node(T*) [with T = thread_info; intrusive_list_node<T> T::* MemberNode = &thread_info::resumed_with_pending_wait_status_node] (null)
;; enabled by -tree-original


return <retval> = NON_LVALUE_EXPR <elem> != 0B ? &NON_LVALUE_EXPR <elem>->D.191606 + (sizetype) 520 : (struct intrusive_list_node *) 520;
...

The 520 is the offset of resumed_with_pending_wait_status_node:
...
(gdb) p /x (void *)&thread->resumed_with_pending_wait_status_node - (void *)thread
$46 = 0x208
(gdb) p 0x208
$12 = 520
...

The D.191606 bit adds an offset of 8 (this becomes apparent at the expand rtl dump file).  That also happens to be the offset of the inherited intrusive list node, see:
...
class thread_info : public refcounted_object,
                    public intrusive_list_node<thread_info>
...
and:
...
(gdb) p (void *) &thread.next - (void *) thread
$13 = 8
...
Comment 9 Simon Marchi 2024-01-24 21:02:14 UTC
(In reply to Tom de Vries from comment #8)
> (In reply to Simon Marchi from comment #6)
> > I bisected gcc, the failure appeared with this commit:
> > 
> > c++: non-dependent .* operand folding [PR112427]
> > https://gitlab.com/gnutools/gcc/-/commit/d3f48f68227
> > 
> > I know nothing about compilers, so I don't know what the code changes.  But
> > the included test case uses a pointer-to-member, which is somewhat relevant
> > in our case.
> 
> At the parser level (original tree dump file), I see:
> ...
> ;; Function static intrusive_list_node<T>* intrusive_member_node<T,
> MemberNode>::as_node(T*) [with T = thread_info; intrusive_list_node<T> T::*
> MemberNode = &thread_info::resumed_with_pending_wait_status_node] (null)
> ;; enabled by -tree-original
> 
> 
> return <retval> = NON_LVALUE_EXPR <elem> != 0B ? &NON_LVALUE_EXPR
> <elem>->D.191606 + (sizetype) 520 : (struct intrusive_list_node *) 520;
> ...
> 
> The 520 is the offset of resumed_with_pending_wait_status_node:
> ...
> (gdb) p /x (void *)&thread->resumed_with_pending_wait_status_node - (void
> *)thread
> $46 = 0x208
> (gdb) p 0x208
> $12 = 520
> ...
> 
> The D.191606 bit adds an offset of 8 (this becomes apparent at the expand
> rtl dump file).  That also happens to be the offset of the inherited
> intrusive list node, see:
> ...
> class thread_info : public refcounted_object,
>                     public intrusive_list_node<thread_info>
> ...
> and:
> ...
> (gdb) p (void *) &thread.next - (void *) thread
> $13 = 8
> ...

I'm not sure I understand, but aren't you conflating thread_info::next with thread_info::resumed_with_pending_wait_status_node::next?
Comment 10 Tom de Vries 2024-01-24 22:01:40 UTC
(In reply to Simon Marchi from comment #9)
> I'm not sure I understand, but aren't you conflating thread_info::next with
> thread_info::resumed_with_pending_wait_status_node::next?

AFAICT no, but I'm wondering if the compiler is conflating them.
Comment 11 Tom de Vries 2024-01-24 22:39:13 UTC
(In reply to Simon Marchi from comment #9)
> The D.191606 bit adds an offset of 8 (this becomes apparent at the expand
> rtl dump file).  That also happens to be the offset of the inherited
> intrusive list node, see:
> ...
> class thread_info : public refcounted_object,
>                     public intrusive_list_node<thread_info>
> ...
> and:
> ...
> (gdb) p (void *) &thread.next - (void *) thread
> $13 = 8
> ...

And following that logic, by flipping the order here:
...
diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
index e7035d40ad4..3167bf31945 100644
--- a/gdb/gdbthread.h
+++ b/gdb/gdbthread.h
@@ -246,8 +246,8 @@ using private_thread_info_up = std::unique_ptr<private_thread_info>;
 
    The intrusive_list_node base links threads in a per-inferior list.  */
 
-class thread_info : public refcounted_object,
-                   public intrusive_list_node<thread_info>
+class thread_info : public intrusive_list_node<thread_info>,
+                   public refcounted_object
 {
 public:
   explicit thread_info (inferior *inf, ptid_t ptid);
...
we change the offset from 8 to 0, and ... the test-case passes.
Comment 12 Simon Marchi 2024-01-25 03:51:12 UTC
I made what I think is a fairly minimal reproducer: https://godbolt.org/z/rbevnhG1f

node_ptr_1 is the node field address computed using the pointer to member machinery.  node_ptr_2 is also the node field address, but computed directly by taking its address.  We would expect them to be the same, but they're not with gcc 14 / trunk.  They are with other compilers.

Does this look sufficient to file a gcc bug?

Here's the code in case godbolt is down.

---

#include <cassert>
#include <iostream>

struct intrusive_list_node {
  void *next;
};

struct dummy {
  void *base;
};

struct thread_info : public dummy, public intrusive_list_node {
  intrusive_list_node node;
};

template <intrusive_list_node thread_info::*MemberNode>
struct intrusive_member_node {
  static intrusive_list_node *as_node(thread_info *elem) {
    return &(elem->*MemberNode);
  }
};

int main() {
  for (int i = 0; i < 100000; ++i) {
    thread_info *tp = new thread_info;
    auto node_ptr_1 = intrusive_member_node<&thread_info::node>::as_node(tp);
    auto node_ptr_2 = &tp->node;
    std::cout << "With tp == " << tp << ", does " << node_ptr_1
              << " == " << node_ptr_2 << " -> " << (node_ptr_1 == node_ptr_2)
              << std::endl;
    assert(node_ptr_1 == node_ptr_2);
  }
}
Comment 13 Tom de Vries 2024-01-25 08:45:37 UTC
(In reply to Simon Marchi from comment #12)
> I made what I think is a fairly minimal reproducer: https://godbolt.org/z/rbevnhG1f

Great.

> Does this look sufficient to file a gcc bug?

I've managed to get rid of the template bit.

Also, I've removed the includes, the goal is to minimize the size of the preprocessed example.

Futhermore, I've replaced the new with an object declaration, which allows for simpler expressions ('.' instead of '->').

So, I get to:
...
struct intrusive_list_node {
  void *next;
};

struct dummy {
  void *base;
};

struct thread_info : public dummy, public intrusive_list_node {
  intrusive_list_node node;
};

static thread_info ti;

int main (void) {
  auto thread_info::*MemberNode = &thread_info::node;
  auto node_ptr_1 = &(ti.*MemberNode);
  auto node_ptr_2 = &ti.node;
  return !(node_ptr_1 == node_ptr_2);
}
...

gcc-13:
...
$ g++ test.cpp; ./a.out; echo $?
0
...

gcc-14:
...
$ g++ test.cpp; ./a.out; echo $?
1
...

I think this should be good enough to file.
Comment 14 Simon Marchi 2024-01-25 12:38:06 UTC
(In reply to Tom de Vries from comment #13)
> (In reply to Simon Marchi from comment #12)
> > I made what I think is a fairly minimal reproducer: https://godbolt.org/z/rbevnhG1f
> 
> Great.
> 
> > Does this look sufficient to file a gcc bug?
> 
> I've managed to get rid of the template bit.
> 
> Also, I've removed the includes, the goal is to minimize the size of the
> preprocessed example.
> 
> Futhermore, I've replaced the new with an object declaration, which allows
> for simpler expressions ('.' instead of '->').
> 
> So, I get to:
> ...
> struct intrusive_list_node {
>   void *next;
> };
> 
> struct dummy {
>   void *base;
> };
> 
> struct thread_info : public dummy, public intrusive_list_node {
>   intrusive_list_node node;
> };
> 
> static thread_info ti;
> 
> int main (void) {
>   auto thread_info::*MemberNode = &thread_info::node;
>   auto node_ptr_1 = &(ti.*MemberNode);
>   auto node_ptr_2 = &ti.node;
>   return !(node_ptr_1 == node_ptr_2);
> }
> ...
> 
> gcc-13:
> ...
> $ g++ test.cpp; ./a.out; echo $?
> 0
> ...
> 
> gcc-14:
> ...
> $ g++ test.cpp; ./a.out; echo $?
> 1
> ...
> 
> I think this should be good enough to file.

Thank you very much for that, filed:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599
Comment 15 Tom de Vries 2024-01-25 14:21:25 UTC
(In reply to Tom de Vries from comment #11)
> (In reply to Simon Marchi from comment #9)
> > The D.191606 bit adds an offset of 8 (this becomes apparent at the expand
> > rtl dump file).  That also happens to be the offset of the inherited
> > intrusive list node, see:
> > ...
> > class thread_info : public refcounted_object,
> >                     public intrusive_list_node<thread_info>
> > ...
> > and:
> > ...
> > (gdb) p (void *) &thread.next - (void *) thread
> > $13 = 8
> > ...
> 
> And following that logic, by flipping the order here:
> ...
> diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
> index e7035d40ad4..3167bf31945 100644
> --- a/gdb/gdbthread.h
> +++ b/gdb/gdbthread.h
> @@ -246,8 +246,8 @@ using private_thread_info_up =
> std::unique_ptr<private_thread_info>;
>  
>     The intrusive_list_node base links threads in a per-inferior list.  */
>  
> -class thread_info : public refcounted_object,
> -                   public intrusive_list_node<thread_info>
> +class thread_info : public intrusive_list_node<thread_info>,
> +                   public refcounted_object
>  {
>  public:
>    explicit thread_info (inferior *inf, ptid_t ptid);
> ...
> we change the offset from 8 to 0, and ... the test-case passes.

Posted this as workaround: https://sourceware.org/pipermail/gdb-patches/2024-January/206218.html
Comment 16 Sourceware Commits 2024-01-25 15:30:35 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=2ec7980408fafb172d2ddb716a367ce2111e2e9e

commit 2ec7980408fafb172d2ddb716a367ce2111e2e9e
Author: Tom de Vries <tdevries@suse.de>
Date:   Thu Jan 25 16:31:47 2024 +0100

    [gdb/build] Workaround gcc PR113599
    
    Since gcc commit d3f48f68227 ("c++: non-dependent .* operand folding
    [PR112427]"), with gdb we run into PR gcc/113599 [1], a wrong-code bug, as
    reported in PR build/31281.
    
    Work around this by flipping inherit order:
    ...
    -class thread_info : public refcounted_object,
    -                   public intrusive_list_node<thread_info>
    +class thread_info : public intrusive_list_node<thread_info>,
    +                   public refcounted_object
    ...
    
    An argument could be made that this isn't necessary, because this occurred in
    an unreleased gcc version.
    
    However, I think it could be useful when bisecting gcc for other problems in
    building gdb.  Having this workaround means the bisect won't reintroduce the
    problem.  Furthermore, the workaround is harmless.
    
    Tested on Fedora rawhide x86_64.
    
    Approved-By: Tom Tromey <tom@tromey.com>
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31281
    
    [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113599
Comment 17 Tom de Vries 2024-01-25 15:31:25 UTC
Fixed by committing workaround.