The unreliability of AF_UNIX datagram sockets

Corinna Vinschen
Mon May 3 10:30:45 GMT 2021

Hi Ken,

On May  1 17:41, Ken Brown wrote:
> I've been thinking about the overall design of using mqueues instead of
> pipes, and I just want to make sure I'm on the right track.  Here are my
> thoughts:
> 1. Each socket needs to create its own mqueue that it uses only for reading.
> For writing, it opens its peer's mqueue.  So each socket holds two mqueue
> descriptors, one for reading and one for writing.

Sounds right to me.
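
A minimal sketch of that two-descriptor layout, using only standard
POSIX mqueue calls.  The struct, the function names and the "/af_unix_"
name prefix are assumptions made for illustration, not existing Cygwin
code:

```c
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical per-socket state: one queue we own and read from,
   one queue owned by the peer that we only write to. */
struct sock_mq {
    mqd_t rd;   /* our own mqueue, opened O_RDONLY */
    mqd_t wr;   /* the peer's mqueue, opened O_WRONLY */
};

/* Build an mqueue name from a socket's unique ID.  The prefix is an
   assumption of this sketch.  Returns 0 on success, -1 if the buffer
   is too small. */
int sock_mq_name(char *buf, size_t len, uint64_t id)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "/af_unix_%016llx", (unsigned long long) id);
    return (n > 0 && (size_t) n < len) ? 0 : -1;
}

/* Create our own queue for reading.  Returns 0 on success, -1 on error. */
int sock_mq_create(struct sock_mq *s, uint64_t my_id)
{
    char name[64];
    if (sock_mq_name(name, sizeof name, my_id) < 0)
        return -1;
    s->wr = (mqd_t) -1;   /* no peer yet */
    s->rd = mq_open(name, O_RDONLY | O_CREAT | O_EXCL, 0600, NULL);
    return s->rd == (mqd_t) -1 ? -1 : 0;
}

/* Open the peer's already existing queue for writing. */
int sock_mq_attach_peer(struct sock_mq *s, uint64_t peer_id)
{
    char name[64];
    if (sock_mq_name(name, sizeof name, peer_id) < 0)
        return -1;
    s->wr = mq_open(name, O_WRONLY);
    return s->wr == (mqd_t) -1 ? -1 : 0;
}
```

O_EXCL is used here on the assumption that IDs are unique, so a name
collision indicates a bug rather than a queue to be shared.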

> 2. A STREAM socket S that wants to connect to a listening socket T sends a
> message to T containing S's mqueue name.  (Probably it's sufficient for S to
> send its unique ID, from which the mqueue name will be constructed.)  T then
> creates a socket T1, which sends its mqueue name (or ID) to S, and S and T1
> are then connected.  In the async case, maybe S uses mq_notify to set up the
> thread that waits for a connection.

Sounds good as well.  Maybe it's better to look at this from the
listener side in the first place, because that's the more tricky side,
but that's just a POV thingy.
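
The handshake could be sketched as two small messages that carry only
the sender's ID.  The message types, the struct and the 12-byte layout
below are hypothetical.  Host byte order is used on the assumption that
both ends always run on the same machine:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message types for the connect handshake. */
enum conn_type {
    CONN_REQUEST = 1,  /* S -> T:  "I want to connect, here is my ID" */
    CONN_ACCEPT  = 2   /* T1 -> S: "accepted, this is the ID to write to" */
};

/* Hypothetical message.  Only the ID travels; the receiver derives
   the mqueue name from it. */
struct conn_msg {
    uint32_t type;
    uint64_t sender_id;
};

#define CONN_MSG_SIZE 12   /* 4 bytes type + 8 bytes ID, no padding */

/* Serialize m into buf.  Returns the number of bytes written,
   or 0 if buf is too small. */
size_t conn_msg_pack(char *buf, size_t len, const struct conn_msg *m)
{
    assert(buf != NULL && m != NULL);
    if (len < CONN_MSG_SIZE)
        return 0;
    memcpy(buf, &m->type, 4);
    memcpy(buf + 4, &m->sender_id, 8);
    return CONN_MSG_SIZE;
}

/* Parse a received message.  Returns 0 on success, -1 if the length
   or the type is not what the handshake expects. */
int conn_msg_unpack(struct conn_msg *m, const char *buf, size_t len)
{
    assert(buf != NULL && m != NULL);
    if (len != CONN_MSG_SIZE)
        return -1;
    memcpy(&m->type, buf, 4);
    memcpy(&m->sender_id, buf + 4, 8);
    if (m->type != CONN_REQUEST && m->type != CONN_ACCEPT)
        return -1;
    return 0;
}
```

S would hand the packed CONN_REQUEST to mq_send on T's queue.  T reads
it with mq_receive, creates T1, and T1 answers with a CONN_ACCEPT on
the queue derived from sender_id.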

> 3. In fhandler_socket_unix::dup, the child will need to open any mqueues
> that the parent holds open.  Maybe an internal _mq_dup function would be
> useful here.

Makes sense.

> 4. I'm not sure what needs to be done after fork/exec.  After an exec, all

Same here, see below.

> mqueue descriptors are automatically closed according to Kerrisk, but I
> don't see where this is done in the Cygwin code.  Or is it somehow automatic
> as a consequence of the mqueue implementation (which I haven't studied in
> detail)?

Yes, that's automatic.  The handles are duped, the addresses are either
on the heap or in an mmap, those are duplicated automatically during
fork.  The file descriptor for the mmap'ed file gets closed right during
mq_open, so it's not inherited at all, and memory isn't inherited by an
exec'ed child.  But, see below (Note 2).

> On the other hand, why does Cygwin's mq_open accept O_CLOEXEC if
> this is the case?

The mq code doesn't handle incoming O_CLOEXEC explicitly, it just lets
open flags slip through.  I don't know what Linux' idea here is, but for
our implementation O_CLOEXEC has no meaning because the open flags other
than O_NONBLOCK are only used in the open(2) call for the mapped file,
and that uses O_CLOEXEC anyway.

> And after a fork, something might need to be done to make sure that the
> child can set the blocking mode of its inherited mqueue descriptors
> independently of the parent.  If I understand the mqueue documentation
> correctly, this isn't normally the case.  In the terminology of Kerrisk, the
> mqueue descriptor that the child inherits from the parent refers to the same
> mqueue description as the parent's descriptor, and the blocking mode is part
> of the description.  But again, this might be Linux terminology that doesn't
> apply to Cygwin.

Doesn't apply to Cygwin.  The structure representing the mqd_t, mq_info,
is used to keep track of the O_NONBLOCK flag, not the mqueue header.  So
the flag is local only.

> That's all I have for the moment, but I'm sure there will be more questions
> when I actually start coding.

Certainly.  As for the above "see below"s... I encountered a couple of
problems over the weekend myself (during soccer viewing, which I don't
care for at all), which all need either fixing, or have to be
implemented first.

1. As you noticed, the socket descriptors are inherited by exec'ed
   children, but the mqueue isn't.  So we need at least some kind of
   fixup_after_exec for mqueues used as part of AF_UNIX sockets.

2. While none of the mqueue structures are propagated to child
   processes, the handles to the synchronization objects accidentally

3. Notes 1 and 2 can only be implemented if we introduce a new
   superstructure keeping track of all mqd_t/mq_info structure
   pointers in an application.  Oh well.  Bummer, I was SOO happy
   that the posix_ipc stuff didn't need it yet...

4. As stated in the code comment leading the mqueue implementation,
   I used Stevens' code as the basis.  What I didn't realize so far is
   that Stevens simplified the implementation in some ways.  The code
   works for real POSIX mqueues, but needs some more fixing before it
   can be used for AF_UNIX at all.

5. I hacked a bit on an mq-only mmap call, which is supposed to allow
   creating/opening of named shared memory areas, but that's a tricky
   extension to the mmap scenario.  I have a gut feeling that it's
   better to avoid using mmap at all and use Windows section mapping
   directly in mq_open/mq_close, especially if we have to implement
   fixup_after_exec semantics anyway.

6. Ultimately, AF_UNIX sockets should not run file-backed at all,
   anyway.  Given that sockets can't be bound multiple times, there's
   no persistency requirement for the mqueue.

7. ...?  Not sure if I forgot something here, but the above problems
   are quite enough to spend some time on already...
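
As a rough illustration of the superstructure from note 3, a per-process
registry of mq_info pointers could look like the sketch below.  All
names are hypothetical, mq_info is treated as opaque, and locking is
left out entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mq_info;   /* opaque here; defined by the real mqueue code */

/* One list node per open mqueue descriptor. */
struct mq_node {
    struct mq_info *info;
    struct mq_node *next;
};

/* Hypothetical per-process registry of all open mqueues. */
struct mq_registry {
    struct mq_node *head;
    size_t count;
};

/* Called from the open path.  Returns 0 on success, -1 on ENOMEM. */
int mq_registry_add(struct mq_registry *r, struct mq_info *info)
{
    assert(r != NULL);
    struct mq_node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->info = info;
    n->next = r->head;
    r->head = n;
    r->count++;
    return 0;
}

/* Called from the close path.  Returns 0 if found, -1 otherwise. */
int mq_registry_remove(struct mq_registry *r, struct mq_info *info)
{
    assert(r != NULL);
    for (struct mq_node **pp = &r->head; *pp; pp = &(*pp)->next)
        if ((*pp)->info == info) {
            struct mq_node *dead = *pp;
            *pp = dead->next;
            free(dead);
            r->count--;
            return 0;
        }
    return -1;
}

/* Visit every registered mqueue, as fixup code running after fork
   or exec would have to do. */
void mq_registry_foreach(struct mq_registry *r,
                         void (*fn)(struct mq_info *, void *), void *arg)
{
    for (struct mq_node *n = r->head; n; n = n->next)
        fn(n->info, arg);
}
```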

