Bug 9381

Summary: A multithreaded application with ~7000 threads causes gdb to produce an internal error in linux-nat.c
Product: gdb
Reporter: atulsvasu
Component: threads
Assignee: Not yet assigned to anyone <unassigned>
Status: ASSIGNED ---
Severity: normal
CC: gdb-prs, pedro
Priority: P3
Version: 6.6
Target Milestone: ---
Host:
Target:
Build:
Last reconfirmed:
Attachments: gdb_report.tar.bz2

Description atulsvasu 2007-06-22 21:08:02 UTC
[Converted from Gnats 2276]

Once it caused:
----------------

linux-nat.c:1229: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

linux-nat.c:1229: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

---------------

linux-nat.c:546: internal-error: wait returned unexpected status 0x100
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

linux-nat.c:546: internal-error: wait returned unexpected status 0x100
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

Release:
GNU gdb 6.6

Environment:
Linux atul.mooo.com 2.6.18-gentoo-r2 #6 PREEMPT Fri May 4 18:38:43 IST 2007 x86_64 AMD Athlon(tm) 64 Processor 3500+ AuthenticAMD GNU/Linux

gcc (GCC) 4.1.2 (Gentoo 4.1.2)

How-To-Repeat:
It no longer occurs now that I have fixed something in my code,
but it used to appear after the program started Initializing users
and all threads went into a blocking state.

Ctrl + Z suspended the program; then I entered
thread apply all cont

which produced the first error listed above.

Another time, I pressed Ctrl + C and then continued. Both times
the program had over 1000 threads running, most of them in a waiting
state, some on condition variables and some on mutexes (pthread).

I had a core dump but deleted it :(. I will update this report if I
get the error again.
Comment 1 Pedro Alves 2008-08-16 14:37:19 UTC
From: Pedro Alves <pedro@codesourcery.com>
To: gdb-gnats@sources.redhat.com
Cc:  
Subject: Re: c++/2276: A multithreaded application with ~ 7000 threads cause gdb to produce internal error. File linux-nat.c
Date: Sat, 16 Aug 2008 15:37:19 +0100

 This assertion can be easily reproduced like so:
 
 Start an application with more than one thread, like below, from gdb's
 testsuite.  Run it under GDB.  Type ctrl-z, to generate a SIGTSTP, 
 switch to another thread, and issue continue.  Do it a couple of 
 times, since it may not trigger the first time.
 
 This is related to:
 
  1) GDB storing a global stop_signal, and passing the last stop
    signal to the thread we're resuming, irrespective of whether it
    was the thread that reported the signal in the first place.
 
  2) To stop all threads in linux, we have to iterate over them,
    and stop them individually by sending each of them a SIGSTOP.  When we go  
    to wait for the SIGSTOP, we may notice that a thread reported a signal back
    instead of SIGSTOP.  We aren't interested in it at this point yet, so we
    cache it for use when we later resume or go wait for the target again.  We
    do another wait on the thread until we really see a SIGSTOP coming out.
 
  3) Expanding on #2 for the case in hand, when we noticed a thread got a
     SIGTSTP, and went to stop all other threads before reporting the stop
     to GDB's core, we also notice that other threads report a stop signal
     different from SIGSTOP.  It was SIGTSTP, but we didn't care what
     it is at this point.  We just care that it was something, and we leave
     it cached in lp->status.
 
  4) If #3 has happened, and the user switches threads before resuming,
     and the thread the user is resuming is one that has a SIGTSTP cached
     already, due to #1, we'll try to pass another SIGTSTP to this thread.
     That is, in the example run below, we are passing the SIGTSTP that was
     reported against thread 3 to thread 2, but thread 2 also has a SIGTSTP
     event cached.
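
 The four points above can be pictured with a small toy model. This is
 only a sketch of the interaction as described here, not GDB's code:
 struct toy_lwp, toy_stop_signal, toy_stop_all and toy_resume are
 made-up names, and cached_signal merely stands in for lp->status.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for GDB's per-LWP state; NOT the real struct lwp_info.  */
struct toy_lwp {
    int id;
    int cached_signal;  /* models lp->status: 0 means no cached event */
};

/* Point 1: a single global "last stop signal", shared by every thread.  */
static int toy_stop_signal = 0;

/* Points 2 and 3: thread `reporter` reported `sig`.  While the other
   threads are being stopped, those flagged in `also_signalled` report
   the same signal, which is left cached on them.  */
static void toy_stop_all(struct toy_lwp *lwps, size_t n, size_t reporter,
                         int sig, const bool *also_signalled)
{
    assert(lwps != NULL && also_signalled != NULL && reporter < n);
    toy_stop_signal = sig;
    for (size_t i = 0; i < n; i++)
        if (i != reporter && also_signalled[i])
            lwps[i].cached_signal = sig;
}

/* Point 4: resume one thread, passing it `signo`.  Returns false in the
   situation where the real code asserts: a signal is being passed to a
   thread that already has an event cached.  */
static bool toy_resume(struct toy_lwp *lp, int signo)
{
    assert(lp != NULL);
    if (lp->cached_signal != 0)
        return signo == 0;  /* mirrors the signo == TARGET_SIGNAL_0 check */
    return true;
}
```

 In this model, if thread 3 reports SIGTSTP while thread 2 ends up with
 one cached, resuming thread 3 with toy_stop_signal succeeds, whereas
 resuming thread 2 with it returns false.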
 
 Changing #1 involves making each thread store its own stop_signal, and
 changing the target-side code to make the signal pass or no-pass
 decision itself, on a per-thread basis.  This would fix the issue
 the user reported.
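
 A rough sketch of what a per-thread stop signal could look like follows.
 It is an illustration under assumptions, not a patch: struct toy_thread,
 toy_signal_is_passed and toy_signal_for_resume are hypothetical names,
 and the pass/no-pass policy shown is only a placeholder.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; the layout is an assumption.  */
struct toy_thread {
    int id;
    int stop_signal;  /* signal this thread itself last reported, 0 if none */
};

/* Assumed pass/no-pass policy, for illustration only.  */
static bool toy_signal_is_passed(int sig)
{
    return sig != 0 && sig != SIGTRAP && sig != SIGINT;
}

/* Per-thread decision: only a signal that this thread reported can be
   handed back to it on resume.  The stored signal is consumed.  */
static int toy_signal_for_resume(struct toy_thread *t)
{
    assert(t != NULL);
    int sig = t->stop_signal;
    t->stop_signal = 0;
    return toy_signal_is_passed(sig) ? sig : 0;
}
```

 With this shape, a SIGTSTP reported against thread 3 is never offered to
 thread 2 when the user switches threads before continuing, because
 thread 2's own stop_signal is consulted instead of a global.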
 
 You'd still hit the assert if passing a signal to the inferior with
 "signal nn", instead of "continue" when the thread we're resuming
 already has a cached event.  To fix it, we would need to queue in the
 inferior's signal queue, one of either
   - the already cached signal
   - the signal we're just trying to pass
 
 And, perhaps, as a special case, if they're the same signal number,
 just ignore one of them.
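
 The reconciliation just described could be modelled roughly as below.
 This is a sketch only: struct toy_pending and toy_pass_signal are
 invented names, the fixed-size array is merely a stand-in for the
 inferior's signal queue, and the choice to queue the signal being
 passed (rather than the cached one) is an arbitrary pick between the
 two options listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

struct toy_pending {
    int cached_signal;         /* event already cached for the thread, 0 if none */
    int queue[TOY_QUEUE_MAX];  /* stand-in for the inferior's signal queue */
    size_t queued;
};

/* Reconcile a signal being passed on resume with a cached one.
   Returns false only if the toy queue is full.  */
static bool toy_pass_signal(struct toy_pending *lp, int signo)
{
    assert(lp != NULL);
    if (signo == 0)
        return true;                  /* plain "continue": nothing to pass */
    if (signo == lp->cached_signal)
        return true;                  /* same signal number: ignore one of them */
    if (lp->queued == TOY_QUEUE_MAX)
        return false;
    lp->queue[lp->queued++] = signo;  /* otherwise hand the signal to the queue */
    return true;
}
```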
 
 >./gdb/baseline/build/gdb/gdb /home/pedro/gdb/tests/threads
 GNU gdb (GDB) 6.8.50.20080815-cvs
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "x86_64-unknown-linux-gnu".
 For bug reporting instructions, please see:
 <http://www.gnu.org/software/gdb/bugs/>...
 (gdb) r
 Starting program: /home/pedro/gdb/tests/threads
 [Thread debugging using libthread_db enabled]
 [New Thread 0x40800950 (LWP 14674)]
 [New Thread 0x41001950 (LWP 14675)]
 
 Program received signal SIGTSTP, Stopped (user).
 0x00007ffff7bcb796 in pthread_join () from /lib/libpthread.so.0
 (gdb) info threads
   3 Thread 0x41001950 (LWP 14675)  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
   2 Thread 0x40800950 (LWP 14674)  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 * 1 Thread 0x7ffff7fd76e0 (LWP 14671)  0x00007ffff7bcb796 in pthread_join () from /lib/libpthread.so.0
 (gdb) t 3
 [Switching to thread 3 (Thread 0x41001950 (LWP 14675))]#0  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 (gdb) c
 Continuing.
 
 Program received signal SIGTSTP, Stopped (user).
 0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 (gdb) t 2
 [Switching to thread 2 (Thread 0x40800950 (LWP 14674))]#0  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 (gdb) c
 Continuing.
 ../../src/gdb/linux-nat.c:1717: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) 
 
 
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
 #include <pthread.h>
 
 /* Pointer to function executed by each thread */
 void *thread_function0(void *arg);

 /* Pointer to function executed by each thread */
 void *thread_function1(void *arg);
 
 unsigned int args[2];
 
 int main() {
     int res;
     pthread_t threads[2];
     void *thread_result;
     long i = 0;
 
     args[i] = 1; /* Init value.  */
     res = pthread_create(&threads[i],
                          NULL,
                          thread_function0,
                          (void *) i);
 
     i++;
     args[i] = 1; /* Init value.  */
     res = pthread_create(&threads[i],
                          NULL,
                          thread_function1,
                          (void *) i);
 
     pthread_join (threads[0], &thread_result);
     pthread_join (threads[1], &thread_result);
     exit(EXIT_SUCCESS);
 }
 
 void *thread_function0(void *arg) {
     int my_number =  (long) arg;
     volatile int *myp = (volatile int *) &args[my_number];
 
     /* Don't run forever.  Run just short of it :)  */
     while (*myp > 0)
       {
         (*myp) ++;
         usleep (1);  /* Loop increment.  */
       }
 
     pthread_exit(NULL);
 }
 
 
 void *thread_function1(void *arg) {
     int my_number =  (long) arg;
     volatile int *myp = (volatile int *) &args[my_number];
 
     /* Don't run forever.  Run just short of it :)  */
     while (*myp > 0)
       {
         (*myp) ++;
         usleep (1);  /* Loop increment.  */
       }
 
     pthread_exit(NULL);
 }
 
 -- 
 Pedro Alves
Comment 2 Tom Tromey 2010-01-19 02:30:47 UTC
Changing component to threads; this isn't a c++ issue.