fork hang with corrupted list_all_lock
david wu
master.wcj@gmail.com
Fri Jul 9 14:00:00 GMT 2010
Wayne H. Badger <badger@...> writes:
>
> I have discovered an anomaly whose investigation has led to glibc and
> I'm wondering if this has been seen before.
>
> I have a cluster of machines running RHEL5.4 (glibc-2.5 based) on
> Nehalem E5530 processors (16 hyperthreaded CPUs, stepping 5) that are
> running a java process (hadoop TaskTracker). TaskTracker is 32-bit and
> multithreaded (~80 threads). The kernel is 64 bit running
> 2.6.18-164.2.1.el5.
>
> I have caught the process in a relatively rare event that is one of
> those "can't happen" scenarios.
>
...
> Wayne
>
> --
> Wayne Badger
> Yahoo!
>
>
We have bumped into the same issue in our Tomcat web application.
With Runtime.getRuntime().exec() regularly invoking a shell, the
application can be expected to hang every few days or weeks. Using gdb
to dump the native stacks, we saw the forking thread hang in the call to
_IO_list_lock(), while some other threads block in free_atfork(). When
we use gdb to inspect list_all_lock, it is in the same state as you
reported. Even after we removed the Runtime.getRuntime().exec() call, we
still see some other threads blocked on list_all_lock, which is in an
invalid state.
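
For anyone trying to catch this in the act, here is a minimal sketch of
a watchdog around the exec call. It is not our production code. The
class name, the TIMED_OUT sentinel and the "sh -c" command are made up
for illustration. The idea is that the fork happens inside exec(), so a
hang there never returns to the caller. Running exec() on a helper
thread lets the caller notice the stall and then attach gdb.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExecWatchdogSketch {
    // Hypothetical sentinel: exec()/waitFor() did not finish in time.
    static final int TIMED_OUT = -1;

    // Runs cmd on a helper thread and returns its exit code, or TIMED_OUT
    // if the helper has not finished after timeoutSeconds.
    static int runWithWatchdog(final String[] cmd, long timeoutSeconds)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = pool.submit(new Callable<Integer>() {
                public Integer call() throws Exception {
                    // The native fork happens inside exec().
                    Process p = Runtime.getRuntime().exec(cmd);
                    // We send no input; the child's output is not read here,
                    // so this sketch only suits commands that print little.
                    p.getOutputStream().close();
                    return p.waitFor();
                }
            });
            try {
                return result.get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                // Either the child is slow or exec() itself is stuck.
                return TIMED_OUT;
            }
        } finally {
            // Interrupts waitFor(); a thread stuck in native fork() will
            // not react to this and has to be inspected with gdb.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int code = runWithWatchdog(new String[] {"sh", "-c", "exit 0"}, 10);
        System.out.println(code == TIMED_OUT
                ? "exec did not return in time"
                : "exit code " + code);
    }
}
```

A timeout alone does not prove the fork is stuck, since the child may
simply be slow. It only tells you when to look at the native stack of
the helper thread.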
OS: SLE 10 (64-bit)
CPU: E5504
JDK: 1.6_13
If we run our app on another server with any one of these changed (the
CPU to a Xeon 540x, the OS to SLE 8, or the JDK to JDK 5), we fail to
reproduce it.
It's so daunting.
David wu