This is the mail archive of the
mailing list for the glibc project.
Re: [libc-alpha] mt-application hanging in exit()
- From: Bertold Kolics <Bertold dot Kolics at Sun dot COM>
- To: Kaz Kylheku <kaz at ashi dot footprints dot net>
- Cc: libc-alpha at sources dot redhat dot com
- Date: Fri, 18 Jan 2002 18:39:25 -0600
- Subject: Re: [libc-alpha] mt-application hanging in exit()
- Organization: Sun Microsystems, Inc.
- References: <Pine.LNX.4.33.0201181549250.26357-100000@ashi.FootPrints.net>
- Reply-to: Bertold dot Kolics at Sun dot COM
Thanks for the prompt answer.
Kaz Kylheku wrote:
> On Fri, 18 Jan 2002, Bertold Kolics wrote:
> [ reformatted to 79 cols ]
I am sorry about that.
> > I suspect that this race condition happens only when the *only* thing
> > the thread manager has to do while servicing a REQ_PROCESS_EXIT request
> > is to send a restart signal to the issuing thread. Otherwise, the thread
> > manager first sends cancel signals to the other threads, which gives
> > suspend() in the main thread a chance to finish before the restart
> > signal is sent to it.
> suspend() and restart() are fundamental to all synchronization in
> LinuxThreads. If they are broken, then the problem is much more serious
> than the hang in exit() that you are seeing, and requires a real fix.
Then we must have a problem: I can reproduce it every time. I ran my app
under strace with microsecond timestamps, and I could see the restart signal
arriving before suspend() had modified the signal mask.
So what could be done to avoid this race condition? How could the thread
manager detect that the main thread is not yet waiting for the restart signal
and delay sending it?
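To make the question concrete, here is a minimal C sketch of the usual way to
keep an early wakeup signal from being lost. It is not the LinuxThreads code:
SIG_RESTART_DEMO, restart_setup() and wait_for_restart() are hypothetical names
used only for illustration, and SIGUSR1 stands in for the real restart signal.
The assumption is that the signal stays blocked everywhere except inside
sigsuspend(), so a signal sent too early remains pending instead of being
delivered before the wait has begun.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the restart signal (an assumption). */
#define SIG_RESTART_DEMO SIGUSR1

/* Set by the handler, cleared by the waiter. */
static volatile sig_atomic_t restart_seen = 0;

static void restart_handler(int sig)
{
    (void)sig;
    restart_seen = 1;
}

/* Install the handler and block the signal, so that it can only be
   delivered while the thread sits inside sigsuspend().
   Returns 0 on success, -1 on failure. */
int restart_setup(void)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = restart_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIG_RESTART_DEMO, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIG_RESTART_DEMO);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Wait until the handler has run. sigsuspend() installs the temporary
   mask and sleeps as one atomic step, so there is no window between
   "unblock" and "wait" in which the signal could slip through.
   Returns the number of sigsuspend() calls that were needed. */
int wait_for_restart(void)
{
    sigset_t mask;
    int calls = 0;

    sigprocmask(SIG_SETMASK, NULL, &mask);  /* fetch the current mask */
    sigdelset(&mask, SIG_RESTART_DEMO);     /* unblocked only while waiting */

    while (!restart_seen) {   /* loop: other signals may also wake us */
        sigsuspend(&mask);
        calls++;
    }
    restart_seen = 0;
    return calls;
}
```

In a single-threaded test, calling restart_setup(), then raise(SIG_RESTART_DEMO),
then wait_for_restart() does not hang: the early signal stays pending until
sigsuspend() lets it in. The sender needs no knowledge of whether the waiter is
ready, which is the property I am asking about.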
> > I've modified glibc in the following manner:
> > - I didn't see any good reason why signalling is needed when the
> > process is about to exit: the user thread will wait for the thread manager
> > to exit (using waitpid()), so I don't see why an extra signal is needed.
> I don't have the code before me, but one thing to remember is that a
> thread other than the main thread is permitted to call exit(). Threads
> other than the main one cannot waitpid() for the thread manager, because
> it is their parent, not their child.
Oh, I see the problem. Thanks for the explanation. In my particular case, I
called exit() from the main thread.
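For the record, the parent/child restriction described above is easy to check
with a small C sketch. This is only an illustration under stated assumptions:
it uses fork() and ordinary processes rather than LinuxThreads' clone()-based
threads, and try_waitpid(), spawn_short_lived_child() and waitpid_demo() are
hypothetical helper names. It shows waitpid() succeeding on a child and failing
with ECHILD when asked to wait for the caller's own parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if waitpid() accepted `pid` as something the caller may
   wait for, or the errno value (ECHILD for a non-child) otherwise. */
int try_waitpid(pid_t pid, int options)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, options);
    } while (r == -1 && errno == EINTR);  /* retry if interrupted */

    return r == -1 ? errno : 0;
}

/* Fork a child that exits immediately. Returns its pid in the parent,
   or -1 if fork() failed. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(0);  /* child: nothing to do */
    return pid;
}

/* A process can wait for its child, but not for its parent. */
void waitpid_demo(void)
{
    pid_t child = spawn_short_lived_child();

    assert(child > 0);
    assert(try_waitpid(child, 0) == 0);                 /* our child */
    assert(try_waitpid(getppid(), WNOHANG) == ECHILD);  /* our parent */
}
```

This matches my case only because the main thread is the one for which the
thread manager is a child; a thread created by the manager would be in the
position of the second call.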
I appreciate your help.