This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)


Siddhesh Poyarekar <siddhesh@redhat.com> writes:

> That's the meaning I got from a systemd maintainer, see:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1048123#c2
>
> Fixing nscd to be a well-behaved daemon that recovers from its errors
> is a fairly complicated task, given the number of errors it would need
> to recover from.  I think it is a good compromise to start with and
> maybe keep a bug open to track nscd's error recovery.

The advice that you got here doesn't seem to be quite right.  It seems to
me like it's confusing the readiness protocol with the way systemd
monitors processes.

The bug report here was that nscd failed to start, but systemctl didn't
realize that and returned a zero status.  This is a readiness problem.  If
you use Type=simple, systemd will assume that the process is ready
(started) as soon as the binary is execed.  systemctl will then return
success at that point, and will not be aware of the subsequent failure.
But in the case of nscd, that's not actually true; it's *not* ready to
answer queries as soon as the binary has been run with exec.  It has more
startup work to do, and that work can fail.

Normally, with socket activation you don't care much whether the daemon
is ready to answer requests immediately, since requests will queue in the
socket that systemd created, so it's fine to assume immediate readiness.
That's why a lot of services that use socket activation don't bother with
readiness notification.  But that assumption
does *not* hold if the service can fail without ever answering a request,
*and* you want to stall any services that depend on it until you're sure
that the service has started properly.  It sounds like that's the concern
here.  (If that isn't a concern, then I think the correct answer to that
bug report is that it's not a bug, just a misunderstanding of what the
return status of systemctl means.)

You can "fix" this by converting it to Type=forking, but that's only
because you're changing the readiness protocol to wait for the process to
create a PID file.  Failures before creation of the PID file will
therefore be detected... but at the cost of using a more complex service
startup and having to carry a PID file around that isn't actually
necessary.  Note, though, that if you don't use a PID file and the
PIDFile= option, you're still left with the same problem if nscd fails
after forking but before actually being ready to answer requests.  (Now,
it's possible -- I've not checked -- that nscd is very careful to not exit
in the parent process until the child process is ready to answer
connections, in which case that readiness protocol would work.  But that's
tricky to do properly and a lot of internal work.)
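
To make the parenthetical above concrete, here is a rough sketch of one way
a forking daemon could keep its parent alive until the child has finished
initializing.  It is not taken from nscd; fork_and_wait_ready and the three
stand-in callbacks are hypothetical names used only for illustration, and a
real daemon would also need setsid(), descriptor cleanup, and so on.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not nscd code): fork, run init() in the child, and
 * return in the parent only after the child has reported over a pipe
 * whether initialization succeeded.  Returns 0 if the child reported
 * ready, -1 otherwise.  The child never returns from this function. */
static int fork_and_wait_ready(int (*init)(void), void (*serve)(void))
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: do the startup work, then report the outcome. */
        close(fds[0]);
        char status = (init() == 0) ? 0 : 1;
        ssize_t written = write(fds[1], &status, 1);
        close(fds[1]);
        if (status != 0 || written != 1)
            _exit(EXIT_FAILURE);
        serve();
        _exit(EXIT_SUCCESS);
    }

    /* Parent: block until the child reports, or until the pipe closes
     * because the child died before reporting (read returns 0). */
    close(fds[1]);
    char status = 1;
    ssize_t n;
    do {
        n = read(fds[0], &status, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    return (n == 1 && status == 0) ? 0 : -1;
}

/* Stand-in callbacks for illustration only. */
static int init_ok(void) { return 0; }
static int init_fail(void) { return -1; }
static void serve_nothing(void) { }
```

The parent would then exit with EXIT_SUCCESS or EXIT_FAILURE depending on
the return value, for example after fork_and_wait_ready(init_ok,
serve_nothing), so that with Type=forking the parent's exit status would
reflect whether startup actually worked.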

The best fix from a systemd perspective is to not use either of these
service types and instead use Type=notify along with the sd_notify(3) API
to clearly inform systemd when the daemon is actually ready to answer
requests.  This would produce the correct behavior in this case without
requiring reverting the service type to the forking model.  This is
exactly the problem for which the sd_notify(3) interface was created.
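
As a minimal sketch of what the daemon side of that could look like, the
following sends the "READY=1" message by hand using the documented
notification protocol (a datagram to the AF_UNIX socket named in
$NOTIFY_SOCKET), so that it does not need to link against libsystemd.  The
function name notify_ready is hypothetical, and this is only an
illustration, not a proposed nscd patch; calling sd_notify(0, "READY=1")
from <systemd/sd-daemon.h> should have the same effect.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: tell the service manager that startup is complete.
 * Returns 1 if the message was sent, 0 if NOTIFY_SOCKET is not set (for
 * example when not started by systemd), or a negative errno value. */
static int notify_ready(void)
{
    const char *path = getenv("NOTIFY_SOCKET");
    if (path == NULL || path[0] == '\0')
        return 0;

    /* Only absolute paths and abstract sockets ('@' prefix) are handled. */
    if (path[0] != '/' && path[0] != '@')
        return -EAFNOSUPPORT;

    struct sockaddr_un addr;
    size_t len = strlen(path);
    if (len >= sizeof(addr.sun_path))
        return -E2BIG;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len);
    if (addr.sun_path[0] == '@')
        addr.sun_path[0] = '\0';   /* abstract namespace socket */

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -errno;

    static const char msg[] = "READY=1";
    socklen_t addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
    ssize_t sent = sendto(fd, msg, sizeof(msg) - 1, 0,
                          (const struct sockaddr *)&addr, addrlen);
    int result = (sent < 0) ? -errno : 1;
    close(fd);
    return result;
}
```

The daemon would call this once, after the last piece of startup work that
can fail, and the unit file would set Type=notify.  If startup fails
first, the process exits without ever sending READY=1, and systemctl can
then report the failure.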

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

