[Converted from Gnats 2276]

Once it caused:
----------------
linux-nat.c:1229: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

linux-nat.c:1229: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
---------------
linux-nat.c:546: internal-error: wait returned unexpected status 0x100
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

linux-nat.c:546: internal-error: wait returned unexpected status 0x100
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

Release:
GNU gdb 6.6

Environment:
Linux atul.mooo.com 2.6.18-gentoo-r2 #6 PREEMPT Fri May 4 18:38:43 IST 2007 x86_64 AMD Athlon(tm) 64 Processor 3500+ AuthenticAMD GNU/Linux
gcc (GCC) 4.1.2 (Gentoo 4.1.2)

How-To-Repeat:
The problem no longer appears since I fixed something in my code, but it used to appear after the program started initializing users and all threads went into a blocking state.

Ctrl + Z suspended the program; then I entered "thread apply all cont", which produced error (1) listed above. Another time, I pressed Ctrl + C and then continued.

Both times the program had over 1000 threads running, most of them in a waiting state, some on a condition variable, some on a mutex (pthread).

I had a core dump, but deleted it :(. I shall update if I get the error again.
From: Pedro Alves <pedro@codesourcery.com>
To: gdb-gnats@sources.redhat.com
Cc:
Subject: Re: c++/2276: A multithreaded application with ~ 7000 threads cause gdb to produce internal error. File linux-nat.c
Date: Sat, 16 Aug 2008 15:37:19 +0100

This assertion can be easily reproduced like so:

Start an application with more than one thread, like the one below, from GDB's testsuite. Run it under GDB. Type ctrl-z to generate a SIGTSTP, switch to another thread, and issue continue. Do it a couple of times, since it may not trigger the first time.

This is related to:

1) GDB storing a global stop_signal, and passing the last stop signal to the thread we're resuming, irrespective of whether it was the thread that reported the signal in the first place.

2) To stop all threads on Linux, we have to iterate over them and stop them individually by sending each of them a SIGSTOP. When we go to wait for the SIGSTOP, we may notice that a thread reported a different signal instead of SIGSTOP. We aren't interested in it at this point yet, so we cache it for use when we later resume or go wait for the target again. We then do another wait on the thread until we really see a SIGSTOP coming out.

3) Expanding on #2 for the case at hand: when we noticed a thread got a SIGTSTP, and went to stop all other threads before reporting the stop to GDB's core, we also noticed that other threads reported a stop signal different from SIGSTOP. It was SIGTSTP, but we didn't care what it was at this point. We just cared that it was something, and we left it cached in lp->status.

4) If #3 has happened, and the user switches threads before resuming, and the thread the user is resuming is one that already has a SIGTSTP cached, then due to #1 we'll try to pass another SIGTSTP to this thread. That is, in the example run below, we are passing the SIGTSTP that was reported against thread 3 to thread 2, but thread 2 also has a SIGTSTP event cached.
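As a rough illustration of how points 1 to 4 combine, here is a small self-contained model. It is only a sketch: the names (model_lwp, model_stop_signal, model_stop_all, model_resume_ok) are hypothetical stand-ins and not GDB's actual data structures or functions, and the check in model_resume_ok merely mimics the condition under which the assertion described above would fail.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one thread (LWP); not GDB's real struct. */
struct model_lwp {
    int id;
    int cached_status;  /* signal cached while waiting for SIGSTOP, 0 if none */
};

/* Point 1: a single global remembers the last reported stop signal,
   regardless of which thread reported it. */
static int model_stop_signal = 0;

/* Points 2 and 3: thread `reporter` reports `sig`.  While the other
   threads are being stopped, any of them flagged in `also_got_sig`
   reports `sig` instead of SIGSTOP, so it is cached for later. */
static void model_stop_all(struct model_lwp *lwps, size_t n, size_t reporter,
                           int sig, const bool *also_got_sig)
{
    model_stop_signal = sig;
    for (size_t i = 0; i < n; i++) {
        if (i != reporter && also_got_sig[i])
            lwps[i].cached_status = sig;
    }
}

/* Point 4: resuming a thread with `signo`.  Returns false in the
   situation where the assertion would fail: a non-zero signal is
   being passed to a thread that already has a cached event. */
static bool model_resume_ok(const struct model_lwp *lp, int signo)
{
    if (lp->cached_status != 0)
        return signo == 0;
    return true;
}
```

With three threads where thread 3 reports SIGTSTP and thread 2 has a SIGTSTP cached during the stop, calling model_resume_ok on thread 2 with the global model_stop_signal returns false, which corresponds to the failing case; the same call on thread 3 returns true.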
Changing #1 involves making each thread store its own stop_signal, and changing the target-side code to make the pass/no-pass decision for the signal itself, on a per-thread basis. This would fix the issue the user reported.

You'd still hit the assert when passing a signal to the inferior with "signal nn" instead of "continue", if the thread we're resuming already has a cached event. To fix that, we would need to queue, in the inferior's signal queue, one of either:

 - the already cached signal
 - the signal we're just trying to pass

And, perhaps, as a special case, if they're the same signal number, just ignore one of them.

>./gdb/baseline/build/gdb/gdb /home/pedro/gdb/tests/threads
GNU gdb (GDB) 6.8.50.20080815-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
(gdb) r
Starting program: /home/pedro/gdb/tests/threads
[Thread debugging using libthread_db enabled]
[New Thread 0x40800950 (LWP 14674)]
[New Thread 0x41001950 (LWP 14675)]

Program received signal SIGTSTP, Stopped (user).
0x00007ffff7bcb796 in pthread_join () from /lib/libpthread.so.0
(gdb) info threads
  3 Thread 0x41001950 (LWP 14675)  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
  2 Thread 0x40800950 (LWP 14674)  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
* 1 Thread 0x7ffff7fd76e0 (LWP 14671)  0x00007ffff7bcb796 in pthread_join () from /lib/libpthread.so.0
(gdb) t 3
[Switching to thread 3 (Thread 0x41001950 (LWP 14675))]#0  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
(gdb) c
Continuing.

Program received signal SIGTSTP, Stopped (user).
0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
(gdb) t 2
[Switching to thread 2 (Thread 0x40800950 (LWP 14674))]#0  0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
(gdb) c
Continuing.
../../src/gdb/linux-nat.c:1717: internal-error: linux_nat_resume: Assertion `signo == TARGET_SIGNAL_0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>

void *thread_function0(void *arg); /* Pointer to function executed by each thread */
void *thread_function1(void *arg); /* Pointer to function executed by each thread */

unsigned int args[2];

int main() {
    int res;
    pthread_t threads[2];
    void *thread_result;
    long i = 0;

    args[i] = 1; /* Init value.  */
    res = pthread_create(&threads[i],
                         NULL,
                         thread_function0,
                         (void *) i);

    i++;

    args[i] = 1; /* Init value.  */
    res = pthread_create(&threads[i],
                         NULL,
                         thread_function1,
                         (void *) i);

    pthread_join (threads[0], &thread_result);
    pthread_join (threads[1], &thread_result);

    exit(EXIT_SUCCESS);
}

void *thread_function0(void *arg)
{
    int my_number = (long) arg;
    volatile int *myp = (volatile int *) &args[my_number];

    /* Don't run forever.  Run just short of it :)  */
    while (*myp > 0)
      {
        (*myp) ++;
        usleep (1);  /* Loop increment.  */
      }

    pthread_exit(NULL);
}

void *thread_function1(void *arg)
{
    int my_number = (long) arg;
    volatile int *myp = (volatile int *) &args[my_number];

    /* Don't run forever.  Run just short of it :)  */
    while (*myp > 0)
      {
        (*myp) ++;
        usleep (1);  /* Loop increment.  */
      }

    pthread_exit(NULL);
}

--
Pedro Alves
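The two changes proposed in the message above (a per-thread stop signal, and queuing a signal when the resumed thread already has a cached event) might look roughly like the sketch below. All names here (sketch_thread, sketch_signal_for_continue, sketch_signal_to_queue) are hypothetical and not part of GDB. The choice to queue the newly requested signal rather than the cached one is an assumption; the message leaves open which of the two should be queued.

```c
#include <signal.h>

/* Hypothetical per-thread state; not GDB's real struct. */
struct sketch_thread {
    int stop_signal;    /* signal this thread itself last reported, 0 if none */
    int cached_signal;  /* event cached while stopping all threads, 0 if none */
};

/* With a per-thread stop_signal, "continue" passes only the signal
   that this particular thread reported, never another thread's. */
static int sketch_signal_for_continue(const struct sketch_thread *t)
{
    return t->stop_signal;
}

/* For "signal nn" on a thread that already has a cached event:
   return the signal to put on the inferior's signal queue, or 0 if
   nothing needs queuing.  Assumption: the requested signal is the
   one queued; identical signal numbers are collapsed into one. */
static int sketch_signal_to_queue(const struct sketch_thread *t, int requested)
{
    if (t->cached_signal == 0 || requested == 0)
        return 0;               /* no conflict, nothing to queue */
    if (requested == t->cached_signal)
        return 0;               /* same signal number: ignore one of them */
    return requested;
}
```

In the session above, thread 2 would have stop_signal 0 and a cached SIGTSTP, so a plain continue would pass no signal to it and the failing condition would not arise. A request to deliver a different signal to thread 2 would be queued, while a request to deliver SIGTSTP again would be dropped as a duplicate.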
Changing component to threads; this isn't a c++ issue.