The unreliability of AF_UNIX datagram sockets

Ken Brown
Sat May 1 21:41:51 GMT 2021

On 4/29/2021 1:39 PM, Corinna Vinschen wrote:
> On Apr 29 12:44, Ken Brown wrote:
>> On 4/29/2021 11:05 AM, Corinna Vinschen wrote:
>>> So maybe we should really think hard about the alternative
>>> implementation using POSIX message queues, I guess.  And *if* we do
>>> that, this should be used likewise for STREAM as for DGRAM sockets, so
>>> the code is easier to maintain.  Obvious advantage: No problem with
>>> older OS versions.  And maybe it's even dirt easy to implement in
>>> comparison with using other methods, because the transport mechanism
>>> is already in place.
>> Yes, I don't think it should be too hard.  The one thing I can think of
>> that's missing is a facility for doing a partial read of a message on the
>> message queue.  (This would be needed for a recv call on a STREAM socket, in
>> which the buffer is smaller than the payload of the next message on the
>> queue.)  But this should be straightforward to implement.
>> Alternatively, I guess we could read the whole message and store the excess
>> in a readahead buffer.
> Alternatively, we could introduce a new, internal-only method into the
> POSIX msq code, one that reads a partial message, reduces the message
> to the remainder and keeps it on the queue head...
>> On 4/29/2021 11:18 AM, Corinna Vinschen wrote:
>>> While searching the net I found this additional gem of information:
>>> Native AF_UNIX sockets don't support abstract sockets.  You must bind to
>>> a valid path, so you always have a visible file in the filesystem.
>>> Discussed here:
>>> We could work around that with our POSIX unlink semantics, probably,
>>> but it's YA (yet another) downside.
>> Agreed.  The more features that are missing from native AF_UNIX sockets, the
>> less appealing they become.
>> Concerning abstract sockets, would we still have an issue if we used message
>> queues?  Wouldn't there be a visible file under /dev/mqueue?  Or is there a
>> way around that?
> Good point!  There's no way around that yet.  In theory that shouldn't
> matter because /dev/mqueue is kind of a "virtual" path, even if Cygwin
> implements the queues as real files.  But, to set the perspective
> straight, we're in fact no better than native AF_UNIX here ¯\_(ツ)_/¯
> Probably we should actually add an internal-only way of creating
> non-file backed mqueues for the purpose of adding abstract sockets.
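The readahead-buffer alternative mentioned above can be sketched in a few lines. POSIX mq_receive fails with EMSGSIZE when the caller's buffer is smaller than the queue's mq_msgsize, so a STREAM recv with a small buffer has to pull the whole message and keep the excess. This is a minimal sketch, not Cygwin code. The names `struct readahead`, `stream_recv`, `recv_whole_fn` and the `RA_CAP` bound are hypothetical. The message source is abstracted as a callback so the sketch runs without a real mqueue. With a real queue, the callback would wrap mq_receive into a buffer of at least mq_msgsize bytes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical callback: fetch the next WHOLE message into buf (capacity
   cap) and return its length, or -1 on error.  With a real mqueue this
   would wrap mq_receive(), whose buffer must be >= mq_msgsize. */
typedef ssize_t (*recv_whole_fn) (void *ctx, char *buf, size_t cap);

#define RA_CAP 8192  /* assumed upper bound on the message size */

/* Per-socket readahead state; must start zeroed (off == len == 0). */
struct readahead {
  char buf[RA_CAP];
  size_t off;  /* first unread byte in buf */
  size_t len;  /* one past the last valid byte in buf */
};

/* STREAM-style recv: return at most len bytes.  Any excess of the
   current message stays in the readahead buffer for the next call. */
static ssize_t
stream_recv (struct readahead *ra, recv_whole_fn recv_whole, void *ctx,
             char *out, size_t len)
{
  if (ra->off == ra->len)
    {
      /* Nothing buffered: take the next whole message off the queue. */
      ssize_t got = recv_whole (ctx, ra->buf, sizeof ra->buf);
      if (got < 0)
        return -1;
      ra->off = 0;
      ra->len = (size_t) got;
    }
  size_t avail = ra->len - ra->off;
  size_t take = len < avail ? len : avail;
  memcpy (out, ra->buf + ra->off, take);
  ra->off += take;
  return (ssize_t) take;
}
```

For simplicity this returns after draining data from a single message. A real STREAM recv might keep pulling further messages to fill the caller's buffer. Corinna's in-queue variant would avoid the extra copy by trimming the message at the queue head instead.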

I've been thinking about the overall design of using mqueues instead of pipes, 
and I just want to make sure I'm on the right track.  Here are my thoughts:

1. Each socket needs to create its own mqueue that it uses only for reading. 
For writing, it opens its peer's mqueue.  So each socket holds two mqueue 
descriptors, one for reading and one for writing.

2. A STREAM socket S that wants to connect to a listening socket T sends a 
message to T containing S's mqueue name.  (Probably it's sufficient for S to 
send its unique ID, from which the mqueue name will be constructed.)  T then 
creates a socket T1, which sends its mqueue name (or ID) to S, and S and T1 are 
then connected.  In the async case, maybe S uses mq_notify to set up the thread 
that waits for a connection.

3. In fhandler_socket_unix::dup, the child will need to open any mqueues that 
the parent holds open.  Maybe an internal _mq_dup function would be useful here.

4. I'm not sure what needs to be done after fork/exec.  According to Kerrisk, 
all mqueue descriptors are automatically closed on exec, but I don't 
see where this is done in the Cygwin code.  Or is it somehow automatic as a 
consequence of the mqueue implementation (which I haven't studied in detail)? 
On the other hand, why does Cygwin's mq_open accept O_CLOEXEC if this is the case?

And after a fork, something might need to be done to make sure that the child 
can set the blocking mode of its inherited mqueue descriptors independently of 
the parent.  If I understand the mqueue documentation correctly, this isn't 
normally the case.  In Kerrisk's terminology, the mqueue descriptor that 
the child inherits from the parent refers to the same mqueue description as the 
parent's descriptor, and the blocking mode is part of the description.  But 
again, this might be Linux terminology that doesn't apply to Cygwin.
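Points 1 and 2 could be sketched at the POSIX level roughly as follows. Everything here is an illustrative assumption, not Cygwin code. That includes the `/afunix-sketch.<id>` naming scheme, the `struct sock` layout, and the helpers `sock_init`, `sock_attach`, `send_id`, `recv_id`, `handshake` and `sock_close`. For simplicity, the handshake runs sequentially in one process. The accept reply also travels on the same queue as later payload, so a real implementation would need to tell control messages from data, for example with a type header or message priority.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical naming scheme: a socket's read queue is derived from its
   unique ID, so a peer only needs the ID to open it for writing. */
static void
mq_name_for (unsigned long id, char *name, size_t size)
{
  snprintf (name, size, "/afunix-sketch.%lu", id);
}

struct sock {
  unsigned long id;
  mqd_t rq;  /* own queue, read-only (point 1) */
  mqd_t wq;  /* peer's queue, write-only, or (mqd_t) -1 */
};

/* Point 1: each socket creates its own queue and only reads from it. */
static int
sock_init (struct sock *s, unsigned long id)
{
  char name[64];
  struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = 256 };
  mq_name_for (id, name, sizeof name);
  s->id = id;
  s->wq = (mqd_t) -1;
  s->rq = mq_open (name, O_RDONLY | O_CREAT | O_EXCL, 0600, &attr);
  return s->rq == (mqd_t) -1 ? -1 : 0;
}

/* For writing, open the peer's queue, knowing only the peer's ID. */
static int
sock_attach (struct sock *s, unsigned long peer_id)
{
  char name[64];
  mq_name_for (peer_id, name, sizeof name);
  s->wq = mq_open (name, O_WRONLY);
  return s->wq == (mqd_t) -1 ? -1 : 0;
}

static int
send_id (struct sock *s)
{
  return mq_send (s->wq, (const char *) &s->id, sizeof s->id, 0);
}

static int
recv_id (struct sock *s, unsigned long *peer_id)
{
  char buf[256];  /* must be >= mq_msgsize */
  if (mq_receive (s->rq, buf, sizeof buf, NULL) != (ssize_t) sizeof *peer_id)
    return -1;
  memcpy (peer_id, buf, sizeof *peer_id);
  return 0;
}

/* Point 2: S sends its ID to listener T; T creates T1, which attaches
   to S's queue and replies with its own ID; S then attaches to T1. */
static int
handshake (struct sock *s, struct sock *t, struct sock *t1,
           unsigned long t1_id)
{
  unsigned long id;
  if (sock_attach (s, t->id) < 0 || send_id (s) < 0)  /* S -> T */
    return -1;
  mq_close (s->wq);  /* S no longer needs T's queue */
  s->wq = (mqd_t) -1;
  if (recv_id (t, &id) < 0)  /* T learns S's ID */
    return -1;
  if (sock_init (t1, t1_id) < 0 || sock_attach (t1, id) < 0
      || send_id (t1) < 0)  /* T1 -> S */
    return -1;
  return (recv_id (s, &id) < 0 || sock_attach (s, id) < 0) ? -1 : 0;
}

static void
sock_close (struct sock *s)
{
  char name[64];
  mq_name_for (s->id, name, sizeof name);
  if (s->wq != (mqd_t) -1)
    mq_close (s->wq);
  mq_close (s->rq);
  mq_unlink (name);
}
```

After handshake() succeeds, S writes into T1's queue and T1 writes into S's queue. Each side therefore holds exactly the two descriptors described in point 1. In the nonblocking case, S could register mq_notify on its own queue instead of blocking in recv_id, as suggested in point 2.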

That's all I have for the moment, but I'm sure there will be more questions when 
I actually start coding.

