This is the mail archive of the cygwin mailing list for the Cygwin project.



RE: Shells hang during script execution


Here's a description of a second hang condition we were encountering, along with a patch for it.


The application (pdksh in this case) reads from a pipe, which eventually calls fhandler_pipe::read (in pipe.cc) on Thread 1.  That function creates a new cygthread with read_pipe() as its thread function, then calls th->detach(read_state).

When the hang occurs, the new thread is terminated early, before cygthread::stub() can call callfunc().  You see the error message "erroneous thread activation".  I'm not sure what causes the thread to fail to activate, but the result is that the read_state semaphore never gets signalled.

So Thread 1 goes into cygthread::detach(read_state).  The first thing that happens is that signal_arrived is set.  The old code would then set n=1 but leave howlong=INFINITE.  My change sets howlong=100 in this case.  Then, when the wait times out (WAIT_TIMEOUT), we check whether __name is non-NULL.  Since the thread was terminated, its name is now NULL, so i is not decremented, and eventually we break out of the loop and clean up as expected.
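
As a rough analogy only (this is bash, not the Cygwin code, and every name in it is hypothetical), the general idea of the change is to wait in bounded slices and to notice when the party that was supposed to signal has gone away.  In this sketch a FIFO on fd 3 stands in for the read_state semaphore; opening it read-write so that the open cannot block is an assumption that holds on Linux and may not hold everywhere.

```shell
#!/bin/bash
# Analogy only: not Cygwin code. All names here are hypothetical.
# A private FIFO on fd 3 plays the role of the read_state semaphore.
sigdir=$(mktemp -d)
mkfifo "$sigdir/sig"
exec 3<>"$sigdir/sig"   # read-write open, so the open itself does not block (Linux behaviour)
rm -rf "$sigdir"        # fd 3 stays usable after the name is removed

# Wait for one line on fd 3 in STEP-second slices instead of blocking forever.
# Give up when a slice times out and the signaller PID was already gone
# before that slice began, because nothing can arrive after that point.
wait_for_signal() {
    local pid=$1 step=$2 line alive
    while :; do
        if kill -0 "$pid" 2>/dev/null; then alive=1; else alive=0; fi
        if read -r -t "$step" -u 3 line; then
            echo "signalled: $line"
            return 0
        fi
        if [ "$alive" -eq 0 ]; then
            echo "signaller exited without signalling"
            return 1
        fi
    done
}
```

Checking liveness before each slice, rather than after it, avoids misreporting a signaller that wrote its line and exited just as the slice expired.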



--- cygthread.cc.ORIG	2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc	2006-02-23 15:50:23.894461500 -0500
@@ -374,10 +374,12 @@
 		break;
 	      case WAIT_OBJECT_0 + 1:
 		n = 1;
-		if (i--)
-		  howlong = 50;
+		i--;
+		howlong = 100;
 		break;
 	      case WAIT_TIMEOUT:
+		if(!i && __name)
+			i--;
 		break;
 	      default:
 		if (!exiting)

> -----Original Message-----
> From: Ernie Coskrey 
> Sent: Friday, February 10, 2006 1:31 PM
> To: Ernie Coskrey; 'cygwin@cygwin.com'
> Subject: RE: Shells hang during script execution
> 
> 
> We've been able to narrow this down some more.  The shell 
> gets hung in sigsuspend(), waiting for SIGCHLD.  We've 
> verified that the process that's executed as part of the 
> command substitution does complete, and returns EOF, and the 
> shell (we're testing with pdksh) goes into sigsuspend and 
> never comes out.
> 
> If we execute "kill -CHLD <pid>", the shell resumes its processing.
> 
> I'm going to continue to look into this - if anybody has any 
> insight into how SIGCHLD might be getting lost, please let me 
> know.  Thanks!
> 
> Ernie Coskrey
> 
> 
> -----Original Message-----
> From: Ernie Coskrey
> Sent: Wed 2/1/2006 3:27 PM
> To: 'cygwin@cygwin.com'
> Subject: Shells hang during script execution
>  
> I've run into problems with shell scripts hanging during 
> execution for no apparent reason.  I've narrowed down my test 
> case to two simple shell scripts.  To reproduce the problem, 
> I ran three instances of the "top.sh" script included here, 
> and after a bit (30 minutes to an hour or so) I'll see that 
> one or two of the shells have just stopped in their tracks.
> 
> Here are the scripts:
> 
> ----<top.sh>----
> dir=$1
> loops=$2
> 
> for loop in `seq 1 $loops`
> do
>         x=`./subtest.sh $dir`
>         date
>         echo loop $loop
> done
> 
> ----<subtest.sh>----
> for j in `ls $1`
> do
>         if [ `echo $j | egrep -i "A|B" | wc -l` -ne 0 ]
>         then
>                 echo $j
>         fi
> done
> echo subtest1 done >&2
> 
> --------
> 
> I then ran three bash shells.  The commands I ran, 
> simultaneously, were:
> 
> 1) ./top.sh C:/ 600
> 2) ./top.sh C:/windows 300
> 3) ./top.sh C:/windows/system32 100
> 
> These ran for about 45 minutes, and then I noticed that two 
> of them (1 and 2 above) had stopped printing any output.  The 
> third was still moving along.  The third completed, but the 
> first two never progressed any further.  I used Process 
> Explorer from ntinternals.com, and saw that the two hung 
> shells were not using any CPU, and did not have any child 
> processes created; they were simply stopped.  If a process 
> dump would be helpful, I can generate one with Windbg or gdb.
> 
> 
> -----
> Ernie Coskrey       SteelEye Technology, Inc.    803-461-3875
> 
> 
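
For anyone trying to reproduce the first hang from the quoted messages, the "kill -CHLD <pid>" workaround could be automated with something like the sketch below.  It is only a sketch: the function names, the log file and the thresholds are all made up for illustration, it assumes GNU stat and date, and it only nudges the shell whose PID you give it, which may not be the shell that is actually stuck in sigsuspend().

```shell
#!/bin/bash
# Hypothetical helpers for watching a reproduction run; not part of Cygwin.

# Succeed (return 0) if FILE has not been modified for at least LIMIT seconds.
is_stalled() {
    local file=$1 limit=$2 now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2   # GNU stat assumed
    [ $((now - mtime)) -ge "$limit" ]
}

# If PID is still alive and its log FILE has stalled, send it SIGCHLD.
# Returns 1 when the process is already gone.
nudge_if_stalled() {
    local pid=$1 file=$2 limit=$3
    kill -0 "$pid" 2>/dev/null || return 1
    if is_stalled "$file" "$limit"; then
        echo "pid $pid stalled; sending SIGCHLD"
        kill -CHLD "$pid"
    fi
}

# Example use (run1.log, 300 and 60 are arbitrary choices):
#   ./top.sh C:/ 600 > run1.log 2>&1 &
#   pid=$!
#   while kill -0 "$pid" 2>/dev/null; do
#       nudge_if_stalled "$pid" run1.log 300
#       sleep 60
#   done
```

Since top.sh prints the date on every loop, a log that stops growing is a reasonable sign of a stall, but a slow "ls" of a large directory could trip a threshold that is set too low.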


