This is the mail archive of the
mailing list for the glibc project.
mt-application hanging in exit()
- From: Bertold Kolics <Bertold dot Kolics at Sun dot COM>
- To: libc-alpha at sources dot redhat dot com
- Date: Fri, 18 Jan 2002 17:15:12 -0600
- Subject: mt-application hanging in exit()
- Organization: Sun Microsystems, Inc.
- Reply-to: Bertold dot Kolics at Sun dot COM
I am using RedHat 7.2 with glibc-2.2.4-19.3 on a single-CPU, Pentium-based system. I have a simple multi-threaded application that cannot terminate properly, because it hangs in the exit() call.
Here is how the test application works:
1) installs a signal handler function for the TERM signal (the signal handler only sets a variable which tells the main thread to terminate)
2) creates 1 thread which only calls a sleep() once and then terminates
3) the main thread checks (in a loop) if a variable has a certain value (the expected value is set by the signal handler)
4) if so, the main thread exits by calling exit()
Now, here is the test I did:
- I waited for the termination of the only thread created by the application
- then I sent a TERM signal to the process
The problem was that the application did not terminate, but was waiting for the restart signal (in a suspend() called by pthread_onexit_process() ).
I've spent quite a bit of time figuring out what happened and found the following:
- pthread_onexit_process() notifies the thread manager that the process is about to exit and then suspends itself and waits for the restart signal
- thread manager calls the exit handler and just before _exit()'ing, sends the restart signal to the issuing thread
The problem seems to be that the restart signal arrives BEFORE the suspend() has finished modifying the signal mask.
However, it is important to emphasize that this happened only if there were 2 threads remaining (1 user thread and the thread manager thread). When the only thread created by the main thread was still active, the application could shut down cleanly.
I suspect that this race condition only happens if the *only* thing the thread manager has to do when servicing a REQ_PROCESS_EXIT request is to send a restart signal to the issuing thread. Otherwise, the thread manager first sends cancel signals to the other threads, which gives the suspend() in the main thread a chance to finish before the restart signal is sent to it.
I've modified the glibc in the following manner:
- I didn't see any good reason why signalling is needed when the process is about to exit: the user thread will wait for the thread manager to exit (using waitpid()), so I don't see why an extra signal is needed.
- So, I removed the suspend() call from pthread_onexit_process() and the restart() call from pthread_handle_exit().
And the application is working now.
I would be interested in your comments; please let me know if you need the source code.
PS: please Cc any reply to me as well, because I am not on the list.
Sun Microsystems, Inc. \ Direct: +1 (512) 401-1188
Attn: Bertold Kolics \ Fax: +1 (512) 401-1197
5300 Riata Park Court, Bldg A \ Phone: +1 (512) 401-1184
Austin, TX 78727, USA \