cygwin 1.5.20-1, spinning pdksh, 100% CPU

Igor Peshansky pechtcha@cs.nyu.edu
Tue Jul 31 21:16:00 GMT 2007


On Tue, 31 Jul 2007, Ernie Coskrey wrote:

> I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
> got a pdksh.exe process that is spinning, using all the CPU.
>
> This scenario is very hard to reproduce, but has happened on our test
> systems occasionally.  It occurred recently, and I currently have gdb
> attached to the process and have the symbols loaded.

I assume you've rebuilt pdksh from source, since the packaged binary is
stripped...  Do you also have the symbols for the Cygwin DLL?

> I see that pdksh is continually calling "sigsuspend()", which is
> immediately returning from cancelable_wait due to the fact that the
> signal_arrived event is set.

Do you mean the sigpause() call?  Can you see which signal it attempts to
suspend?  Can you email me (privately, if you wish) the stack dump from
gdb?
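For reference, a shell typically waits for a child with sigsuspend() roughly as follows. This is a minimal sketch under POSIX assumptions, not pdksh's actual code. spawn_and_wait() and on_sigchld() are hypothetical names. The part that matters is the loop. Suppose sigsuspend() keeps returning immediately, as Ernie reports it doing while signal_arrived stays set, but the SIGCHLD handler never runs to set the flag. The loop then never exits, which would produce exactly the 100% CPU spin.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set asynchronously by the SIGCHLD handler, read by the wait loop. */
static volatile sig_atomic_t child_exited = 0;

/* Hypothetical handler: only records that some child changed state. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;
}

/* Hypothetical helper: fork a child that exits with `code`, then wait
 * for it using the block-SIGCHLD / sigsuspend() pattern.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_wait(int code)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;
    int status;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD before fork() so it cannot slip in between the
     * flag test and sigsuspend() (the classic lost-wakeup race). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    child_exited = 0;
    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);                 /* child: exit immediately */

    /* SIGCHLD is unblocked only while inside sigsuspend(). If
     * sigsuspend() returned without the handler ever setting the
     * flag, this loop would spin forever. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_exited)
        sigsuspend(&wait_mask);      /* always returns -1 with EINTR */

    sigprocmask(SIG_SETMASK, &old, NULL);

    /* Reap the child; only now is its process entry released. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, spawn_and_wait(7) returns 7 once the child has been reaped.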

> I also see that pdksh is waiting for a subprocess to complete, and has a
> handle to the PID of that process - however the process has long since
> terminated.

That's normal (I think).  Cygwin may not deliver SIGCHLD immediately after
process termination.  Until pdksh gets SIGCHLD, it'll keep the process
handle.
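Holding on to a terminated child's PID is also expected at the POSIX level. A child that has exited stays in the process table as a zombie until its parent reaps it with wait()/waitpid(). The sketch below demonstrates this, assuming POSIX waitid() with WNOWAIT. zombie_demo() and pid_exists() are hypothetical helpers. The sketch shows the POSIX semantics that Cygwin emulates, not Cygwin's internal Windows process handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 1 if `pid` still names a process we may signal. */
static int pid_exists(pid_t pid)
{
    return kill(pid, 0) == 0;        /* signal 0 only checks existence */
}

/* Hypothetical demo. Returns a bitmask:
 *   bit 0: the exited-but-unreaped child is still visible,
 *   bit 1: after waitpid() the pid is gone (ESRCH).
 * Returns -1 on error. */
int zombie_demo(void)
{
    siginfo_t info;
    int status, result = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);                    /* child: terminate right away */

    /* Wait until the child has terminated, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    if (pid_exists(pid))
        result |= 1;                 /* zombie: still in the process table */

    /* Reaping releases the entry; the pid no longer exists. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!pid_exists(pid) && errno == ESRCH)
        result |= 2;
    return result;
}
```

On a POSIX system with default SIGCHLD disposition, zombie_demo() is expected to return 3. The PID stays valid after the child exits and disappears only after waitpid().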

> It appears that something went wrong during delivery of SIGCHLD.

Does this happen before or after j_sigchld() gets invoked?

> I've got two questions related to this:
>
> - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> snapshot, that might have fixed this issue?  We've done some limited
> testing with 1.5.24-2 and haven't seen this happen yet, but as I said
> it only happens rarely.

Quite possibly.  There were changes to signal handling since 1.5.20, IIRC.
Unless I'm mistaken, there's even a patch for a race condition in process
handling code (though it's not in 1.5.24, I think).

> - is there anything I can look at in gdb to help identify what the issue
> is?
>
> Any suggestions would be appreciated!

Posting a sequence of steps that reliably reproduces the problem for you
would be great (but not necessarily easy).

As I said above, a stack dump (with full pdksh symbols) would help...
That might mean that you'd need to build an unstripped pdksh and attempt
to reproduce the problem again.
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_	    pechtcha@cs.nyu.edu | igor@watson.ibm.com
ZZZzz /,`.-'`'    -.  ;-;;,_		Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'		old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Belief can be manipulated.  Only knowledge is dangerous.  -- Frank Herbert
