A pthread_barrier_wait defined to release after two threads deadlocks two Java threads after reaching the barrier:

Thread 3 (Thread -1211135072 (LWP 22483)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00136406 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x02e91ddb in _Jv_CondWait () from /usr/lib/libgcj.so.7
#3  0x02e79f6d in gnu::gcj::runtime::FinalizerThread::run ()
#4  0x02e89e0b in _Jv_ThreadRun () from /usr/lib/libgcj.so.7
#5  0x02e91880 in _Jv_ThreadRegister () from /usr/lib/libgcj.so.7
#6  0x0347ef24 in GC_start_routine () from /usr/lib/libgcj.so.7
#7  0x0013340b in start_thread () from /lib/libpthread.so.0
#8  0x00a56b7e in clone () from /lib/libc.so.6

Thread 2 (Thread -1232114784 (LWP 23494)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00136d7b in pthread_barrier_wait () from /lib/libpthread.so.0
#2  0x0812fd8c in ?? ()
#3  0x080a64cc in frysk::sys::Ptrace::ptrace_thread_run ()
#4  0x080a5147 in frysk::sys::Ptrace$PtraceThread::run ()
#5  0x02e89e0b in _Jv_ThreadRun () from /usr/lib/libgcj.so.7
#6  0x02e91880 in _Jv_ThreadRegister () from /usr/lib/libgcj.so.7
#7  0x0347ef24 in GC_start_routine () from /usr/lib/libgcj.so.7
#8  0x0013340b in start_thread () from /lib/libpthread.so.0
#9  0x00a56b7e in clone () from /lib/libc.so.6

Thread 1 (Thread -1209034544 (LWP 22466)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00136d7b in pthread_barrier_wait () from /lib/libpthread.so.0
#2  0x0812fd8c in ?? ()
#3  0x080a6132 in ptrace_thread_head ()
#4  0x080a63ff in frysk::sys::Ptrace::attach ()
#5  0x080a073b in frysk::proc::LinuxTask::sendAttach ()
#6  0x0808549f in frysk::proc::TaskState$1::handleAttach ()
#7  0x08084a96 in frysk::proc::Task::performAttach ()
#8  0x08081d67 in frysk::proc::ProcState$Attaching::initialState ()
#9  0x08081df2 in frysk::proc::ProcState$1::handleAddObservation ()
#10 0x0807fe49 in frysk::proc::Proc::handleAddObservation ()
#11 0x0807f5fe in frysk::proc::Proc$6::execute ()
#12 0x0809d321 in frysk::event::EventLoop::runEventLoop ()

Unit test to follow.
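For comparison, a barrier initialized with a count of two is expected to release both waiters once the second one arrives. The following is only a minimal sketch of that expected behaviour in plain C, without gcj, the GC, or ptrace involved. It is not the frysk unit test, and the names run_barrier_demo, worker and passed are made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>

/* Hypothetical names, used only for this illustration. */
static pthread_barrier_t barrier;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int passed;  /* number of threads that got past the barrier */

static void *worker(void *arg)
{
    (void)arg;
    /* Blocks until two threads have called pthread_barrier_wait(). */
    int rc = pthread_barrier_wait(&barrier);
    /* Exactly one waiter gets PTHREAD_BARRIER_SERIAL_THREAD, the rest get 0. */
    if (rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD) {
        pthread_mutex_lock(&count_lock);
        passed++;
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Returns the number of threads released by the barrier, or -1 on setup error. */
int run_barrier_demo(void)
{
    pthread_t tid;

    passed = 0;
    if (pthread_barrier_init(&barrier, NULL, 2) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        pthread_barrier_destroy(&barrier);
        return -1;
    }
    worker(NULL);            /* the calling thread is the second participant */
    pthread_join(tid, NULL);
    pthread_barrier_destroy(&barrier);
    return passed;
}
```

Called from a main() and built with -pthread, run_barrier_demo() should return 2. A hang at this point would correspond to Threads 1 and 2 in the backtrace above, both of which are sitting in pthread_barrier_wait.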
Strangely enough, this problem also seems to occur using simple pthread condition variables and mutex locks, as soon as the GC kicks in:

Thread 3 (Thread -1210889312 (LWP 7762)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00c2b406 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x02e91ddb in _Jv_CondWait () from /usr/lib/libgcj.so.7
#3  0x02e79f6d in gnu::gcj::runtime::FinalizerThread::run ()
#4  0x02e89e0b in _Jv_ThreadRun () from /usr/lib/libgcj.so.7
#5  0x02e91880 in _Jv_ThreadRegister () from /usr/lib/libgcj.so.7
#6  0x0347ef24 in GC_start_routine () from /usr/lib/libgcj.so.7
#7  0x00c2840b in start_thread () from /lib/libpthread.so.0
#8  0x00a56b7e in clone () from /lib/libc.so.6

Thread 2 (Thread -1221379168 (LWP 8771)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00c2b406 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x02e91ddb in _Jv_CondWait () from /usr/lib/libgcj.so.7
#3  0x02e85e8e in java::lang::Object::wait () from /usr/lib/libgcj.so.7
#4  0x02e75b33 in java::lang::Object::wait () from /usr/lib/libgcj.so.7
#5  0x080a5399 in frysk::sys::Ptrace$PtraceThread::run ()
#6  0x02e89e0b in _Jv_ThreadRun () from /usr/lib/libgcj.so.7
#7  0x02e91880 in _Jv_ThreadRegister () from /usr/lib/libgcj.so.7
#8  0x0347ef24 in GC_start_routine () from /usr/lib/libgcj.so.7
#9  0x00c2840b in start_thread () from /lib/libpthread.so.0
#10 0x00a56b7e in clone () from /lib/libc.so.6

Thread 1 (Thread -1208788784 (LWP 7761)):
#0  0x00970402 in __kernel_vsyscall ()
#1  0x00c2d97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2  0x00c2a22f in _L_mutex_lock_71 () from /lib/libpthread.so.0
#3  0x00c2a00e in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0x080a62be in ptrace_thread_head ()
#5  0x080a655b in frysk::sys::Ptrace::attach ()
#6  0x080a061b in frysk::proc::LinuxTask::sendAttach ()
#7  0x0808537f in frysk::proc::TaskState$1::handleAttach ()
#8  0x08084976 in frysk::proc::Task::performAttach ()
Even more interestingly, this may only happen when ptrace calls are thrown into the mix, especially PTRACE_ATTACH. The unit test is up in CVS.
Is there an upstream bug report for this?
Some more info from Andrew: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180637
I think it was assumed that that bug was related, so this one was blocked waiting for it to be fixed first.