This is the mail archive of the cygwin mailing list for the Cygwin project.
cygserver blocking on semctl(SETVAL) call
- From: Ethan Tira-Thompson <ejt at andrew dot cmu dot edu>
- To: cygwin at cygwin dot com
- Date: Fri, 25 Aug 2006 11:21:20 -0400
- Subject: cygserver blocking on semctl(SETVAL) call
[Slightly modified from the version previously sent to cygwin-developers,
who suggested this is a better forum for discussion]
I've discovered what I believe to be an internal deadlock issue in
cygserver.
I have a piece of code:
  void SemaphoreManager::setValue(semid_t id, int x) const {
    semun params;
    params.val=x;
    cout << "SEMCTL..." << flush;
    if(semctl(semid,id,SETVAL,params)<0) {
      perror("ERROR: SemaphoreManager::setValue (semctl)");
      exit(EXIT_FAILURE);
    }
    cout << "done" << endl;
  }
This is part of a function which gets called a number of times
throughout the life of the program. It works just fine up until one
particular call (with x=0) which reliably causes it to block between
the two cout's. Not just my program either -- all IPC is blocked at
this point. So bringing up new cygwin windows, running 'ipcs', etc.,
all hang. Once I kill any one process in the group that is using
the semaphore, things seem to jump-start and may run a bit longer,
but they usually block again eventually, until all of my program's
processes are killed.
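For anyone who wants to exercise the same call outside of our code, here
is a minimal standalone sketch of the pattern. It is not the actual
SemaphoreManager code: SemArg, createSet, getValue and removeSet are
hypothetical names introduced for illustration, and SemArg stands in for
whatever semun type the real code uses. Under Cygwin it assumes cygserver
is running.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the semun type used in the post. Some
// platforms (e.g. Linux) expect the caller to define this union.
union SemArg {
    int val;                // value for SETVAL
    struct semid_ds* buf;   // buffer for IPC_STAT / IPC_SET
    unsigned short* array;  // array for GETALL / SETALL
};

// Create a private set holding `count` semaphores; returns -1 on failure.
int createSet(int count) {
    return semget(IPC_PRIVATE, count, IPC_CREAT | 0600);
}

// Set semaphore `idx` of set `semid` to `x`, reporting any failure.
bool setValue(int semid, int idx, int x) {
    assert(idx >= 0);
    SemArg arg;
    arg.val = x;
    if (semctl(semid, idx, SETVAL, arg) < 0) {
        perror("setValue (semctl SETVAL)");
        return false;
    }
    return true;
}

// Read back the current value of semaphore `idx`; returns -1 on failure.
int getValue(int semid, int idx) {
    return semctl(semid, idx, GETVAL);
}

// Remove the whole set so nothing is leaked between runs.
bool removeSet(int semid) {
    return semctl(semid, 0, IPC_RMID) == 0;
}
```

On a healthy system, createSet(2) followed by setValue(id, 1, 0) and
getValue(id, 1) should all return promptly; the report above is that the
SETVAL step never returns.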
My code runs fine under Linux and Mac OS X. It's only now, as we're
nearing release, that I'm testing under cygwin and finding that something
has gone wrong in the past 9 months or so: either something updated
on your end, or a change in our code that's now tickling an issue.
The kicker to note here -- is there any reason a *SETVAL* operation
could be blocked??? It should either go through or return an error.
I'm fairly convinced it's *not* this particular semctl call that's
causing the block, it just gets hung up because some *other*,
previous, operation has hung cygserver, and it's that operation
that's causing the trouble.
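One way to probe that theory, sketched under the assumption that the
waits elsewhere in the program go through semop(): issue the operation
with IPC_NOWAIT, so that a decrement which would have to wait returns
EAGAIN instead of blocking. If even such a call never returns, that would
suggest the wait is not on the semaphore itself. tryDecrement, makeSet,
dropSet and SemArg are hypothetical names, not part of our code or of
cygserver.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for the semun type (caller-defined on Linux).
union SemArg {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

// Create a one-semaphore private set with the given initial value.
// Returns the set id, or -1 on failure.
int makeSet(int initial) {
    assert(initial >= 0);
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) return -1;
    SemArg arg;
    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);  // don't leak the set on failure
        return -1;
    }
    return semid;
}

// Try to decrement semaphore `idx` without waiting.
// Returns 1 if the decrement happened, 0 if it would have blocked
// (errno == EAGAIN), and -1 on any other error.
int tryDecrement(int semid, unsigned short idx) {
    struct sembuf op;
    op.sem_num = idx;
    op.sem_op = -1;
    op.sem_flg = IPC_NOWAIT;
    if (semop(semid, &op, 1) == 0) return 1;
    return (errno == EAGAIN) ? 0 : -1;
}

// Remove the set again.
bool dropSet(int semid) {
    return semctl(semid, 0, IPC_RMID) == 0;
}
```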
One nuisance is that when I run cygserver with -d, it doesn't block
in the same place -- something about all that debugging output
changes the race conditions. In any case, I've attached the
cygserver output leading up to a block, in hopes it means something
to you.
Thanks for taking a look -- I'm afraid I'm stumped. (It doesn't help
that gdb only reports '??' for all function calls when I attach to a
process, so I can't tell what any of my code is doing. And yes, I do
have -g enabled.)
Our code can be checked out from CVS, but before running you'll need
to increase the semmns and semmsl parameters as described in step 5:
http://www.cs.cmu.edu/~tekkotsu/cygwin-install.html
After that's set up:
  cvs -d :pserver:anonymous@cvs.tekkotsu.org:/cvs checkout -P Tekkotsu
  cd Tekkotsu
  setenv TEKKOTSU_ROOT `pwd` || export TEKKOTSU_ROOT=`pwd`
  cd project
  make sim
  ./sim-ERS7 Speed=0
When launched, the simulator forks into four processes, using IPC to
communicate between them. 'Speed=0' pauses our simulator so it
shouldn't be trying to process anything. During startup it goes
through a series of runlevels: CONSTRUCTING, STARTING, RUNNING,
STOPPING, DESTRUCTING, DESTRUCTED. Passing InitialRunlevel=X on the
command line will stop in a runlevel other than "running", and then
you can use the 'runlevel' command within the simulator to advance.
It reliably gets into the "starting" runlevel, but something about
the "running" runlevel triggers the problem. SemaphoreManager (from
the code displayed above) is found in the root IPC directory.
Beware of leaked semaphore sets, by the way. Since this problem also
causes the signal handler to block when it tries to remove the set on
being killed, you'll need to kill -9 the process, use 'ipcs' to check
for any leftover sets, and then 'ipcrm' them manually between runs.
(Actually, I find it easier to just kill and relaunch cygserver itself,
which releases all of the blocked processes and clears leftover
semaphores at the same time.)
-ethan
The following trace corresponds to the 'cygserver -d' activity
after entering the 'runlevel' command to move from STARTING to
RUNNING, and the block that occurs in that runlevel.
Attachment: cygserverout.txt (Description: Text document)