This is the mail archive of the
mailing list for the Cygwin project.
RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
- From: "Ernie Coskrey" <Ernie dot Coskrey at steeleye dot com>
- To: <cygwin at cygwin dot com>
- Date: Wed, 8 Aug 2007 14:10:57 -0400
- Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> -----Original Message-----
> From: email@example.com
> [mailto:firstname.lastname@example.org] On Behalf Of Ernie Coskrey
> Sent: Tuesday, July 31, 2007 3:40 PM
> To: email@example.com
> Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> I've run into a problem with cygwin 1.5.20-1 and pdksh
> 5.2.14. We've got a pdksh.exe process that is spinning,
> using all the CPU.
> This scenario is very hard to reproduce, but has happened on
> our test systems occasionally. It occurred recently, and I
> currently have gdb attached to the process and have the
> symbols loaded. I see that pdksh is continually calling
> "sigsuspend()", which is immediately returning from
> cancelable_wait due to the fact that the signal_arrived event
> is set. I also see that pdksh is waiting for a subprocess to
> complete, and has a handle to the PID of that process -
> however the process has long since terminated.
> It appears that something went wrong during delivery of SIGCHLD.
> I've got two questions related to this:
> - have there been changes between 1.5.20-1 and 1.5.24-2, or
> the latest snapshot, that might have fixed this issue? We've
> done some limited testing with 1.5.24-2 and haven't seen this
> happen yet, but as I said it only happens rarely.
> - is there anything I can look at in gdb to help identify
> what the issue is?
> Any suggestions would be appreciated!
> Ernie Coskrey
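For readers less familiar with the pattern described above, here is a minimal sketch of a parent waiting for a child with sigsuspend(). It is not pdksh's code; the helper name run_child_and_wait and the flag child_exited are made up for illustration, and it assumes a POSIX system. The point is that the loop only ends once the SIGCHLD handler has actually run, so if sigsuspend() keeps returning without the handler being called, the loop spins.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; the wait loop checks it after each sigsuspend(). */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Hypothetical helper: fork a child that exits with exit_code, wait for
 * it with a sigsuspend() loop, and return its exit status (-1 on error). */
int run_child_and_wait(int exit_code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid, reaped;
    int status = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGCHLD so it can only be delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    /* Mask used while suspended: the caller's mask with SIGCHLD open. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    child_exited = 0;
    pid = fork();
    if (pid == 0)
        _exit(exit_code);           /* child: exit immediately */
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }

    /* If sigsuspend() returned without the handler running, this loop
     * would never see child_exited change and would spin. */
    while (!child_exited)
        sigsuspend(&wait_mask);

    reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(7) from a test program should return the child's exit code once SIGCHLD has been delivered and the child reaped.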
I've discovered an interesting piece of information that I think is
related to this. I'm hoping this might ring a bell with someone on the
list.

Looking at _main_tls->stack, when I've set a breakpoint in
handle_sigsuspend just after the cancelable_wait() call, I see the
following:
0x6109186f is "sigdelayed()", which is the routine that should have been
called to deliver the signal and reset the signal_arrived event.
0x4132ac is j_waitj (in pdksh).
So, somehow, when this problem occurs, "sigdelayed" gets pushed onto the
stack *before* j_waitj does. So, _sigbe never calls sigdelayed.
I don't think there's ever a case where sigdelayed should be at
_main_tls->stack. However this happened is, I believe, the cause of
the problem.
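To make the ordering argument concrete, here is a toy model of a last-in, first-out stack of return targets. It is only a sketch based on the description in this message, not Cygwin's implementation: the struct ret_stack, the push/pop helpers, and handler_runs are all hypothetical names, and the return stub is modelled as a single pop. If sigdelayed is pushed last, the return path reaches it first and the handler runs. If it is pushed first, the return path goes straight back to j_waitj and sigdelayed is left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* Toy LIFO of return targets, standing in for a per-thread stack. */
struct ret_stack {
    const char *slot[MAX_DEPTH];
    size_t depth;
};

static void push(struct ret_stack *s, const char *target)
{
    assert(s->depth < MAX_DEPTH);
    s->slot[s->depth++] = target;
}

static const char *pop(struct ret_stack *s)
{
    assert(s->depth > 0);
    return s->slot[--s->depth];
}

/* Push two targets in the given order, then model the return stub as one
 * pop. Returns 1 if control would go to "sigdelayed" (handler runs),
 * 0 if it would go anywhere else (handler skipped). */
int handler_runs(const char *pushed_first, const char *pushed_second)
{
    struct ret_stack s = { {0}, 0 };

    push(&s, pushed_first);
    push(&s, pushed_second);
    return strcmp(pop(&s), "sigdelayed") == 0;
}
```

In this model, handler_runs("j_waitj", "sigdelayed") is the expected order, and handler_runs("sigdelayed", "j_waitj") is the order observed in the debugger.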