cygrunsrv hangs forever on exec error (fix included)
Corinna Vinschen
corinna-cygwin@cygwin.com
Sun Nov 13 16:09:00 GMT 2005
On Nov 12 17:44, Christian Franke wrote:
> Hi,
>
> if the exec in cygrunsrv fails or the command exits to early, cygrunsrv
> will hang forever.
> The service can no longer be controlled until cygrunsrv has been kill(-9)ed.
> Auto restart of service does not work in this case.
>
>
> Steps to reproduce (at least on XP SP2):
>
> $ cygrunsrv -I test -p /bin/true
>
> $ cygrunsrv -S test
> cygrunsrv: Error starting a service: ... Win32 error 1053:
> ...
>
> $ sleep 3600 #;-)
>
> $ cygrunsrv -S test
> Service : test
> Current State : Start Pending
> Controls Accepted : Stop
> Command : /bin/true
>
> cygrunsrv process has to be killed manually.
>
> Same result occurs if the command's exe is removed.
>
>
> This is due to the following (IMO undocumented and "interesting";-)
> behavior of the windows SCM:
>
> The StartServiceControlDispatcher() routine does not return unless some
> thread (no necessarily service_main()) sets SERVICE_STOPPED.
>
> The service_main() thread started by SCM is left alone.
> Exiting service_main() does nothing, in particular it does not end
> StartServiceControlDispatcher().
>
> But, if SERVICE_STOPPED is set, StartServiceControlDispatcher()
> immediately returns, again without any care about service_main().
> Because usually now the program exit()'s and kills all threads, code
> following SERVICE_STOPPED may not be executed at all.
Thanks for this report and the simple testcases. The description
is very helpful. I just don't really like the idea to leave the
service_main function through _exit.
So I rearranged the waitpid
evaluation to accomplish two results, first, except in the neverexit
case, always set the service status to SERVICE_STOPPED *and* do it as
the last action in service_main, second, simplify the code. What bugged
me was that the WIFEXITED/WIFSIGNALED parts exists twice, even though
there isn't really a lot of difference between the return code from
the first vs. the second waitpid.
I didn't create a new cygrunsrv version for now, instead I'm sending
my diff. I would like to hear what you think and if I didn't made a
fatal mistake, I'll uplaod a new cygrunsrv version with this changes.
Corinna
* cygrunsrv.cc (service_main): Simplify waitpid return value
evaluation. Always set service status to SERVICE_STOPPED,
except in the neverexits case. Move the set_service_status
call to be always the last action in service_main.
Index: cygrunsrv.cc
===================================================================
RCS file: /cvs/cygwin-apps/cygrunsrv/cygrunsrv.cc,v
retrieving revision 1.28
diff -p -u -r1.28 cygrunsrv.cc
--- cygrunsrv.cc 22 May 2005 16:34:55 -0000 1.28
+++ cygrunsrv.cc 13 Nov 2005 13:10:53 -0000
@@ -1515,81 +1515,78 @@ service_main (DWORD argc, LPSTR *argv)
report_service_status ();
int status = 1;
- if (!waitpid (server_pid, &status, WNOHANG))
+ int ret = waitpid (server_pid, &status, WNOHANG);
+ if (!ret)
{
/* Child has probably `execv'd properly. */
set_service_status (SERVICE_RUNNING);
syslog(LOG_INFO, "`%s' service started", svcname);
/* Keep repeating waitpid() command as long as it returned because
of a handled signal sent to this cygrunsrv process */
- int ret;
while ((ret = waitpid (server_pid, &status, 0)) == -1 &&
errno == EINTR)
;
+ }
+ else
+ /* The ret == -1 case below is only valid for the inner watpid call. */
+ ret = 0;
- /* If ret is -1, report errno, else process the status */
- if (ret == -1)
+ /* If ret is -1, report errno, else process the status */
+ if (ret == -1)
+ {
+ switch (errno)
{
- switch (errno)
- {
- case ECHILD:
- syslog (LOG_ERR, "service `%s' exited, & its status was lost"
- " errno ECHILD", svcname);
- break;
- default:
- syslog (LOG_ERR, "service `%s' error: waitpid() failed: "
- "errno %d", svcname, errno);
- }
+ case ECHILD:
+ syslog (LOG_ERR, "service `%s' exited, its status was lost"
+ " errno ECHILD", svcname);
+ break;
+ default:
+ syslog (LOG_ERR, "service `%s' error: waitpid() failed: "
+ "errno %d", svcname, errno);
}
- else if (WIFEXITED (status))
+ service_main_exitval = errno;
+ set_service_status (SERVICE_STOPPED);
+ }
+ else if (WIFEXITED (status))
+ {
+ unsigned char s = WEXITSTATUS (status);
+ if (neverexits && !shutting_down)
{
- unsigned char s = WEXITSTATUS (status);
- if (neverexits && !shutting_down)
- {
- syslog (LOG_ERR, "`%s' service exited prematurely with "
- "exit status: %u", svcname, s);
- /* Do not report that the service is stopped so that if
- recovery options are set, Windows will automatically
- restart the service. */
- service_main_exitval = s;
- }
- else
- {
- syslog (LOG_INFO, "`%s' service stopped, exit status: %u",
- svcname, s);
- set_service_status (SERVICE_STOPPED);
- service_main_exitval = 0;
- }
+ syslog (LOG_ERR, "`%s' service exited prematurely with "
+ "exit status: %u", svcname, s);
+ /* Do not report that the service is stopped so that if
+ recovery options are set, Windows will automatically
+ restart the service. */
+ service_main_exitval = s;
}
- else if (WIFSIGNALED (status))
+ else
{
- /* If the signal is the one we've send, everything's ok.
- Otherwise we log the signal event. */
- if (!termsig_sent || WTERMSIG (status) != termsig)
- syslog (LOG_ERR, "service `%s' failed: signal %d raised",
- svcname, WTERMSIG (status));
- else
- syslog (LOG_INFO, "`%s' service stopped, signal %d received",
- svcname, WTERMSIG (status));
- set_service_status (SERVICE_STOPPED);
+ syslog (LOG_INFO, "`%s' service stopped, exit status: %u",
+ svcname, s);
service_main_exitval = 0;
+ set_service_status (SERVICE_STOPPED);
}
- else
- syslog (LOG_ERR, "`%s' service stopped for an unknown reason",
- svcname);
- }
- else if (WIFEXITED (status))
- {
- /* Although we're not going to set the service status to stopped,
- only allow zero exit status if neverexits is not set. */
- if (!neverexits)
- service_main_exitval = WEXITSTATUS (status);
- syslog_starterr ("execv", 0, WEXITSTATUS (status));
}
else if (WIFSIGNALED (status))
- syslog (LOG_ERR, "starting service `%s' failed: signal %d raised",
- svcname, WTERMSIG (status));
- break;
+ {
+ /* If the signal is the one we've send, everything's ok.
+ Otherwise we log the signal event. */
+ if (!termsig_sent || WTERMSIG (status) != termsig)
+ syslog (LOG_ERR, "service `%s' failed: signal %d raised",
+ svcname, WTERMSIG (status));
+ else
+ syslog (LOG_INFO, "`%s' service stopped, signal %d received",
+ svcname, WTERMSIG (status));
+ service_main_exitval = 0;
+ set_service_status (SERVICE_STOPPED);
+ }
+ else
+ {
+ syslog (LOG_ERR, "`%s' service stopped for an unknown reason",
+ svcname);
+ service_main_exitval = 0;
+ set_service_status (SERVICE_STOPPED);
+ }
}
}
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat, Inc.
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
More information about the Cygwin
mailing list