fork hang with corrupted list_all_lock

david wu master.wcj@gmail.com
Fri Jul 9 14:00:00 GMT 2010


Wayne H. Badger <badger@...> writes:

> 
> I have discovered an anomaly whose investigation has led to glibc and I'm
> wondering if this has been seen before.
> 
> I have a cluster of machines running RHEL5.4 (glibc-2.5 based) on Nehalem
> E5530 processors (16 hyperthreaded CPUs, stepping 5) that are running a java
> process (hadoop TaskTracker).  TaskTracker is 32-bit and multithreaded (~80
> threads).  The kernel is 64 bit running 2.6.18-164.2.1.el5.
> 
> I have caught the process in a relatively rare event that is one of those
> "can't happen" scenarios.
> 
...
> Wayne
> 
> --
> Wayne Badger
> Yahoo!
> 
> 

We have run into the same issue in our Tomcat web application.
The application calls Runtime.getRuntime().exec() regularly to run a shell
command, and it hangs once every few days or weeks. Using gdb to dump the
native stacks, we saw that the forking thread hangs in the call to
_IO_list_lock(), while some other threads block in free_atfork(). When we
inspected list_all_lock with gdb, it was in the same state as you reported.
Even after we removed the Runtime.getRuntime().exec() calls, we could still
see some other threads blocked on list_all_lock, which was in an invalid
state.

OS: SLE 10 64-bit
CPU: E5504
JDK: 1.6_13

If we run our app on another server where the CPU is replaced with a Xeon
540x, or the OS with SLE 8, or the JDK with JDK 5, we fail to reproduce it.
It's so daunting.
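
A minimal sketch of the pattern described above (several busy threads while
one thread repeatedly calls Runtime.getRuntime().exec() to run a shell) might
look like the following. It is not the actual application code: the class
name ExecLoopSketch, the worker count, the iteration count and the
"sh -c" command are assumptions chosen for illustration, and nothing
guarantees that it triggers the hang on any given machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stress harness; all names here are illustrative only.
class ExecLoopSketch {

    // Read a stream to EOF so the child never blocks on a full pipe.
    static void drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        while (in.read(buf) != -1) { }
        in.close();
    }

    // Run one command through Runtime.exec() and return its exit code.
    // stdout is drained before stderr, which is only safe for commands
    // that write little output (as the ones used here do).
    static int execOnce(String... command) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(command);
        p.getOutputStream().close();
        drain(p.getInputStream());
        drain(p.getErrorStream());
        return p.waitFor();
    }

    // Keep `workers` threads busy while the calling thread spawns a shell
    // `iterations` times. Returns how many spawns exited with status 0.
    static int stress(int workers, int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        final AtomicBoolean stop = new AtomicBoolean(false);
        for (int i = 0; i < workers; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    // Stand-in for the application's request threads.
                    while (!stop.get()) {
                        String.valueOf(System.nanoTime()).hashCode();
                        Thread.yield();
                    }
                }
            });
        }
        int ok = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                if (execOnce("sh", "-c", "exit 0") == 0) {
                    ok++;
                }
            }
        } finally {
            stop.set(true);
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return ok;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("successful exec calls: " + stress(8, 100));
    }
}
```

If a run stops making progress, attaching gdb to the JVM process and dumping
the native stacks of all threads is how the blocked _IO_list_lock() and
free_atfork() frames mentioned above were observed.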

David wu





More information about the Libc-help mailing list