Unix Domain Socket Limitation?

Ken Brown kbrown@cornell.edu
Mon Nov 30 18:14:52 GMT 2020


On 11/30/2020 12:19 PM, Norton Allen wrote:
> On 11/26/2020 12:13 PM, Ken Brown wrote:
>> [Adding the Cygwin list back to the Cc.]
>>
>> On 11/26/2020 11:27 AM, Norton Allen wrote:
>>> On 11/25/2020 5:27 PM, Ken Brown via Cygwin wrote:
>>>> On 11/25/2020 4:47 PM, Norton Allen wrote:
>>>>> In my recent tests, it appears as though it is not possible to successfully 
>>>>> connect via two Unix Domain sockets from one client application to one 
>>>>> server application.
>>>>>
>>>>> Specifically, if I create a server which listens on a Unix Domain socket 
>>>>> and a client, which attempts to connect() twice, both seem to lock up. This 
>>>>> is not the behavior under Linux.
>>>>>
>>>>> I will be happy to work up a minimal example if it is helpful in tracking 
>>>>> this down. I wanted to start by asking whether this is a known limitation 
>>>>> and/or if there is something about the Cygwin implementation that makes 
>>>>> this sort of thing very difficult.
>>>>
>>>> A minimal example would be extremely helpful.
>>>>
>>>> Corinna can answer questions about limitations in the current 
>>>> implementation. But there is a new implementation under development. It's in 
>>>> the topic/af_unix branch of the Cygwin git repository if you're interested 
>>>> in looking at it.
>>>>
>>>> Corinna began working on this a couple years ago, and I've recently been 
>>>> trying to finish it.  I've made quite a bit of progress, but there's still 
>>>> more to do and undoubtedly many bugs. So any test cases you have would be 
>>>> very useful. 
>>>
>>> Thanks Ken,
>>>
>>> As it happens, attempting to produce a minimal example suggests my problem 
>>> may be somewhere else. I think I've worked in most of the features of my 
>>> application one by one but have not yet revealed a failure.
>>
>> OK.  But if you ever do have occasion to write small test programs involving 
>> AF_UNIX sockets, please send them on.  The new AF_UNIX code needs as much 
>> testing as it can get.
>>
> I have finally put together a start of a minimal example, although it seems to 
> require a certain level of complexity before tripping on the bug. At the moment, 
> I do not believe the issue is related to having multiple sockets between the 
> client and server. I am thinking it is some sort of race condition related to 
> non-blocking sockets, since I have only observed it when both the client and 
> server are using non-blocking sockets.
> 
> I have yet to plunge into cygwin.dll, but I think I have reached that point.
> 
> Here is the code: https://github.com/nthallen/cygwin_unix
> 
> Since I have only exercised this on my machine, I would be very interested to 
> know if it is reproducible on anyone else's.

I can reproduce the hang, and it also happens with the new AF_UNIX code.
But what I'm seeing (at least with the new code) isn't exactly what you describe.

When the server's first select call returns, accept succeeds.  The server then 
calls select a second time, and that call doesn't return.  I haven't checked yet 
to see what's going on in the client, and I may not get to that for a while.
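
For anyone following along, the server-side sequence described above (select, accept, then a second select) can be sketched roughly as below. This is only an illustration, not the code from the linked repository. The names select_accept_select and wait_ready, the socket path argument, and the 2-second timeout are all assumptions made for the sketch. It also runs the client and server ends in a single process for brevity, whereas the real test case uses separate programs.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait up to 2 seconds (an arbitrary choice) for fd to become readable,
   or writable if for_write is nonzero.
   Returns 1 if ready, 0 on timeout, -1 on error. */
static int wait_ready(int fd, int for_write)
{
    fd_set fds;
    struct timeval tv = { 2, 0 };
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    return select(fd + 1, for_write ? NULL : &fds,
                  for_write ? &fds : NULL, NULL, &tv);
}

/* Hypothetical helper: walks through select -> accept -> select on a
   non-blocking AF_UNIX stream socket bound to 'path'.
   Returns 0 if both select calls return as expected, otherwise the
   number of the step that failed or timed out. */
int select_accept_select(const char *path)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, afd = -1, rc = 0;
    char c = 0;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                       /* remove a stale socket file */

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 || set_nonblocking(lfd) || set_nonblocking(cfd)) {
        rc = 1; goto out;
    }
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 5) < 0) {
        rc = 2; goto out;
    }
    /* Non-blocking connect may complete at once or report EINPROGRESS. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
        errno != EINPROGRESS) {
        rc = 3; goto out;
    }
    /* First select: the listening socket should become readable. */
    if (wait_ready(lfd, 0) != 1) { rc = 4; goto out; }
    afd = accept(lfd, NULL, NULL);
    if (afd < 0 || set_nonblocking(afd)) { rc = 5; goto out; }

    /* Client side sends one byte once its socket is writable. */
    if (wait_ready(cfd, 1) != 1 || write(cfd, "x", 1) != 1) {
        rc = 6; goto out;
    }
    /* Second select: the accepted socket should become readable. */
    if (wait_ready(afd, 0) != 1) { rc = 7; goto out; }
    if (read(afd, &c, 1) != 1 || c != 'x')
        rc = 8;

out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    unlink(path);
    return rc;
}
```

Calling select_accept_select("/tmp/demo.sock") returns 0 when both select calls return. Because each select has a timeout, a call that would otherwise block forever shows up as a nonzero step number (7 for the second select) rather than a hang.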

Ken

