This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)

From: Russ Allbery <eagle at eyrie dot org>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: Rich Felker <dalias at aerifal dot cx>, Siddhesh Poyarekar <siddhesh at redhat dot com>, <libc-alpha at sourceware dot org>
Date: Wed, 26 Feb 2014 17:42:46 -0800
Subject: Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
References: <20140226172242 dot GE6419 at spoyarek dot pnq dot redhat dot com> <20140226183950 dot GK184 at brightrain dot aerifal dot cx> <20140226185509 dot GG6419 at spoyarek dot pnq dot redhat dot com> <87a9ddwkyg dot fsf at windlord dot stanford dot edu> <20140227004603 dot GN184 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1402270106190 dot 17207 at digraph dot polyomino dot org dot uk>

"Joseph S. Myers" <joseph@codesourcery.com> writes:

> And for a build in the glibc context you'd want to use dlopen to avoid
> circular dependencies (dependencies of code built with glibc on any
> other library that needs glibc to build are best avoided where
> possible), complicating things further.

Or embed the equivalent code.  It's fairly straightforward and only
differs in some details from what Rich proposes (primarily to allow
notifications to be multiplexed across one listening socket and to support
the other features of the notification protocol, which are not in play
here).
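For a sense of how small the embedded equivalent can be, here is a minimal sketch of the daemon side of the readiness notification. It is written against systemd's documented NOTIFY_SOCKET convention (a datagram socket given as an absolute path or as an abstract name with a leading '@'), not copied from libsystemd. notify_state is a hypothetical name. Unlike sd_notify(), it does no credential or descriptor passing and does not unset the variable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send STATE (for example "READY=1") to the datagram socket named by
   $NOTIFY_SOCKET.  Returns 1 if sent, 0 if no service manager asked for
   notifications, and -1 on error.  Hypothetical helper, not sd_notify(). */
static int
notify_state (const char *state)
{
  const char *path = getenv ("NOTIFY_SOCKET");
  struct sockaddr_un addr;
  size_t len;
  socklen_t addrlen;
  ssize_t sent;
  int fd, saved_errno;

  if (path == NULL || path[0] == '\0')
    return 0;
  /* Accept absolute paths and abstract names written with a leading '@'. */
  if (path[0] != '/' && path[0] != '@')
    return -1;
  len = strlen (path);
  if (len >= sizeof addr.sun_path)
    return -1;

  memset (&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  memcpy (addr.sun_path, path, len);
  if (path[0] == '@')
    {
      /* Abstract namespace: leading NUL byte, no terminator counted.  */
      addr.sun_path[0] = '\0';
      addrlen = offsetof (struct sockaddr_un, sun_path) + len;
    }
  else
    /* Filesystem socket: count the terminating NUL (memset put it there). */
    addrlen = offsetof (struct sockaddr_un, sun_path) + len + 1;

  fd = socket (AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
  if (fd < 0)
    return -1;
  sent = sendto (fd, state, strlen (state), MSG_NOSIGNAL,
                 (const struct sockaddr *) &addr, addrlen);
  saved_errno = errno;
  close (fd);
  errno = saved_errno;
  return sent < 0 ? -1 : 1;
}
```

A daemon started with Type=notify would call notify_state ("READY=1") once it can answer queries. When NOTIFY_SOCKET is unset the call does nothing and returns 0, so the same binary still works under Type=simple.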

I think the question here is how much effort you want to put into
detecting nscd failures and converting them into service activation
failures, and what types of failures you want to detect that way.

In general, if a daemon starts and then dies, the command that started it
often doesn't know about the failure and still returns success (frequently
before the daemon has even died).  This has been true as long as there have been
init systems, and it's an unsolvable problem in general since the daemon
could die at any point and the start command can't wait forever for
failures.  You already have to handle those failures some other way,
either by alerting someone or by attempting to restart the process or
both.

The point of a notification process is to decide when the service is
sufficiently up to allow other services that depend on it to be started.
This is a complex question with no clear solution that works for everyone;
to some extent it comes down to local policy.  Some people want everything
to start as fast as possible, provided that no queries to
correctly-configured daemons will be lost.  Others want each service to be
fully verified as running before any service that depends on it is
started.  Still others may actually want to stop downstream services if an
upstream service fails unexpectedly.  Some of those failures won't be
caught by the startup command because they happen too late.  The only
thing one can do in practice is shift where "too late" falls, based on
what one thinks the common case is.  (This is true
regardless of init system; all notification protocols have the same basic
set of tradeoffs.)

Anyway, in practice, there are five notification methods you can use that
work with current init systems:

1. None.  Treat the service as ready as soon as the process starts.  With
   socket activation, this satisfies the requirement that no requests to
   correctly-configured services will be lost, but it means that you will
   not detect runtime misconfiguration at the time of service start and
   will need to catch it some other way (for example, by asking systemd
   which services have failed).

   All widely-used init systems except traditional init scripts support
   this method.

2. Exit of the parent process.  This requires a forking service model,
   which has various drawbacks and which essentially all init systems
   written after the classic shell script init system have tried to move
   away from.  This allows you to detect all errors that can be detected
   before the fork, but requires some sort of internal IPC mechanism to
   tell the parent process when to exit if you want to detect errors that
   happen after the fork.  This is the most common historical method, but
   it's usually incorrectly implemented because getting the details right
   is hard.

   All widely-used init systems support this method.

3. Writing of the PID file.  The service is considered started when the
   PID file is created.  This has various problems with stale PID files,
   locking concerns when two copies of the daemon are started at the same
   time with the same PID file, and so forth, but is often easier to get
   right than coordinating the parent process exit.

   I'm not sure any init system actually supports this.  Debian's
   start-stop-daemon wrapper used with traditional init does not; it still
   uses exit of the parent process.  systemd can read the PID file but
   doesn't appear to take it into account for startup notification.
   However, in theory, it would be possible, and it may be that the
   traditional init libraries on platforms I'm less familiar with than
   Debian do use this method.

4. sd_notify, which uses a UNIX domain datagram socket (in the filesystem
   or in the abstract namespace) to tell the init system when the daemon
   is actually ready.  This is the easiest method to get right, since the
   daemon has complete control over the notification timing without having
   to coordinate things like process exit.  However, its notification
   protocol is the most complex of the five options.

   Of the widely-used init systems, only systemd supports this method.

5. Raising SIGSTOP when the process is ready.  Getting the timing right is
   as easy as with sd_notify, for the same reasons, but this method uses
   (or abuses, depending on how you feel about it) SIGSTOP for something
   other than its documented purpose, and it requires the init system to
   send SIGCONT afterwards or the results are very confusing.

   Of the widely-used init systems, only upstart supports this method.
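For method 1 with socket activation, the init system creates the listening socket and hands it to the daemon, so requests queue in the kernel until the daemon gets to them. A minimal sketch of how a daemon could pick up such sockets without linking against libsystemd, assuming systemd's LISTEN_PID/LISTEN_FDS environment convention with passed descriptors starting at 3. parse_listen_fds and listen_fds are hypothetical names, not libsystemd's sd_listen_fds().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* First inherited descriptor in systemd's socket-activation protocol
   (libsystemd calls this SD_LISTEN_FDS_START). */
#define LISTEN_FDS_FIRST 3

/* Parse the LISTEN_PID/LISTEN_FDS values.  Returns the number of passed
   sockets, or 0 if the values are absent, malformed, or meant for another
   process (e.g. a child that merely inherited the environment). */
static int
parse_listen_fds (const char *pid_str, const char *fds_str, pid_t self)
{
  char *end;
  long pid, n;

  if (pid_str == NULL || fds_str == NULL)
    return 0;

  errno = 0;
  pid = strtol (pid_str, &end, 10);
  if (errno != 0 || end == pid_str || *end != '\0' || pid != (long) self)
    return 0;

  errno = 0;
  n = strtol (fds_str, &end, 10);
  /* The upper bound of 1024 is an arbitrary sanity limit (an assumption). */
  if (errno != 0 || end == fds_str || *end != '\0' || n <= 0 || n > 1024)
    return 0;
  return (int) n;
}

/* Hypothetical wrapper: the sockets are then descriptors LISTEN_FDS_FIRST
   through LISTEN_FDS_FIRST + count - 1.  Unlike sd_listen_fds(), this
   sketch neither sets FD_CLOEXEC nor unsets the variables. */
static int
listen_fds (void)
{
  return parse_listen_fds (getenv ("LISTEN_PID"), getenv ("LISTEN_FDS"),
                           getpid ());
}
```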
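For method 3, a sketch of a PID-file writer that uses a lockf() lock against the stale-file and two-copies problems: a dead process's lock disappears with it, so a stale file is simply reused, while a second live copy fails to take the lock. write_pidfile is a hypothetical helper. It does not fix the readiness race itself: a watcher can still see the file before the PID is written, or see a stale file, which is part of why this method is awkward as a start signal.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create or reuse PATH, lock it, and write our PID.  Returns a descriptor
   that must stay open (holding the lock) for the daemon's lifetime, or -1
   on error or if another live instance already holds the lock. */
static int
write_pidfile (const char *path)
{
  char buf[32];
  int len;
  int fd = open (path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
  if (fd < 0)
    return -1;

  /* F_TLOCK fails immediately instead of waiting if the lock is held. */
  if (lockf (fd, F_TLOCK, 0) < 0)
    goto fail;

  len = snprintf (buf, sizeof buf, "%ld\n", (long) getpid ());
  if (ftruncate (fd, 0) < 0 || write (fd, buf, (size_t) len) != len)
    goto fail;
  return fd;

fail:
  close (fd);
  return -1;
}
```

On Linux, lockf() locks are fcntl() record locks, so they are released when the process closes any descriptor for that file and are not inherited across fork(); the file therefore has to be written by the final daemon process, after any forking.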
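For method 5, a sketch of both sides using only POSIX signals: ready_by_sigstop is what the daemon would call once it is ready, and wait_for_sigstop is a simplified, hypothetical stand-in for the supervisor. In upstart this corresponds to the "expect stop" job stanza, but the code below is not upstart's.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Daemon side: stop ourselves once ready.  Without a supervisor that
   expects this, the process just sits stopped until someone sends
   SIGCONT, which is the confusing case described in method 5. */
static void
ready_by_sigstop (void)
{
  raise (SIGSTOP);
}

/* Supervisor side (hypothetical): wait for PID to stop itself with
   SIGSTOP, then resume it with SIGCONT.  Returns 0 once PID has signalled
   readiness, -1 if it exited, stopped for another reason, or waitpid()
   failed. */
static int
wait_for_sigstop (pid_t pid)
{
  int status;
  pid_t r;

  do
    r = waitpid (pid, &status, WUNTRACED);
  while (r < 0 && errno == EINTR);
  if (r != pid || !WIFSTOPPED (status) || WSTOPSIG (status) != SIGSTOP)
    return -1;
  return kill (pid, SIGCONT) == 0 ? 0 : -1;
}
```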

Basically, pick your poison.  They all have advantages and disadvantages.
But since it's an IPC protocol, you have to pick some method that the init
system you're targeting actually supports, or there's no point.

The suggestion in the bug was to switch nscd from type 1 to type 2
notifications.  That may or may not be the right thing to do.  It depends
on what errors you want to catch during startup, whether catching those
errors is worth the additional complexity of a forking service, etc.  It's
hard to make a general decision for everyone.
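One common way to build the internal IPC that a forking (type 2) service needs, so that errors after the fork still reach the parent's exit status, is a pipe: the parent blocks reading it and exits successfully only if the child writes a byte. If the child dies first, the kernel closes the write end and the parent sees EOF. A minimal sketch with hypothetical names (start_daemon, notify_ready, wait_for_ready); it is not nscd's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Parent side: block until the child writes one byte (ready) or the pipe
   reaches EOF (child exited or gave up).  Returns an exit status. */
static int
wait_for_ready (int fd)
{
  char byte;
  ssize_t n;
  do
    n = read (fd, &byte, 1);
  while (n < 0 && errno == EINTR);
  close (fd);
  return n == 1 ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Child side: report readiness by writing one byte, then close the pipe. */
static int
notify_ready (int fd)
{
  char byte = 1;
  ssize_t n;
  do
    n = write (fd, &byte, 1);
  while (n < 0 && errno == EINTR);
  close (fd);
  return n == 1 ? 0 : -1;
}

/* Fork.  The parent never returns: it exits with the child's readiness
   status.  The child starts a new session and gets back the write end of
   the pipe.  Returns -1 without forking if pipe() or fork() fails. */
static int
start_daemon (void)
{
  int fds[2];
  pid_t pid;

  if (pipe (fds) < 0)
    return -1;
  pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid > 0)
    {
      close (fds[1]);
      _exit (wait_for_ready (fds[0]));
    }
  close (fds[0]);
  if (setsid () < 0)
    _exit (EXIT_FAILURE);   /* closing the pipe makes the parent fail */
  return fds[1];
}
```

The daemon would call start_daemon () early, do its setup (opening sockets, reading configuration), exit on any error, and call notify_ready () on the returned descriptor once it can serve. One of the details that is easy to get wrong: any other process that inherits the write end keeps it open, so the parent never sees EOF when the daemon itself dies.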

In a systemd world, from the system administrator's perspective, supporting
type 4 notifications is a clear win, since administrators can then use
either Type=notify or Type=simple based on their local requirements and
both work as expected.  The significant drawback from a glibc perspective,
however, is that the daemon side of the sd_notify protocol, while
relatively straightforward, is not trivial.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

References:
- [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Siddhesh Poyarekar
- Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Rich Felker
- Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Siddhesh Poyarekar
- Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Russ Allbery
- Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Rich Felker
- Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
  - From: Joseph S. Myers
