1.5.9-1: socket() appears NOT to be thread-safe

Brian Ford ford@vss.fsi.com
Thu Jun 10 15:21:00 GMT 2004


On Fri, 21 May 2004, Christopher Faylor wrote:

> On Fri, May 21, 2004 at 05:21:19PM -0500, Brian Ford wrote:
> >On Thu, 15 Apr 2004, Christopher Faylor wrote:
> >>Corinna showed me that this was a problem in my autoload code rather
> >>than a problem with winsock.  That's comforting.  I guess I've grown
> >>too quick to judge Windows.
> >>
> >>I've checked in a fix and am regenerating a snapshot.  The fix
> >>consisted of deleting a few lines of code so that's always nice...
> >>
> >>Thanks for the test case.  It helped a lot in tracking this problem
> >>down.
> >
> >I still see the same symptom (i.e., socket randomly returns "Operation
> >not permitted" at application startup) with current CVS, but not with
> >the original test case, and only on a dual CPU box :-(.

I'm now seeing it on some Hyperthreaded boxes too.  It is still a problem
in current CVS.

> It's not usually helpful to see a "it doesn't work" a month after the
> announcement of a fix.  Call me absent minded but I don't even remember
> what I did to supposedly fix this.

I know, but I was unable to get time on the particular box that exhibits
the problem, with the particular software scenario that triggers it, in a
timely fashion.  Sorry.

> >About 30% of the time, socket returns the error above.  I tried
> >replacing the exec line in the shell script with:
> >
> >exec strace -o tracefile -b 1000000 socket_error.exe
> >
> >but then it doesn't fail.  It also doesn't fail if socket_error.exe is
> >launched directly from the bash prompt.
> >
> >I will keep trying to come up with a test case that I can actually study,
> >but I was hoping someone might have an idea about how to catch it better
> >or where to look.
>
> Put a call to the debugger at the offending error message and look around.

Ok, I found out a tiny bit more.

I put a call to try_to_debug in find_winsock_errno at the point where it
returns the default EPERM.  Then, by setting
CYGWIN="error_start=c:\cygwin\bin\gdb.exe", I can get gdb to pop up and do a
backtrace.  The error was:

10093
Either the application has not called WSAStartup, or WSAStartup failed.

and came from the socket call in cygwin_socket.
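
To illustrate why 10093 surfaces as "Operation not permitted", here is a
minimal sketch of a table-driven errno mapping with an EPERM fallback.  It
is not the actual find_winsock_errno code; the table contents, the struct,
and the name map_winsock_errno are assumptions made up for illustration.
The numeric codes are the standard Winsock values (10035 WSAEWOULDBLOCK,
10048 WSAEADDRINUSE, 10061 WSAECONNREFUSED, 10093 WSANOTINITIALISED).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mapping entry: Winsock error code -> POSIX errno. */
struct errmap {
    int wsa;
    int posix;
};

/* Deliberately small, illustrative table.  10093 (WSANOTINITIALISED)
   is absent, so it falls through to the default below. */
static const struct errmap errmap_table[] = {
    { 10035, EWOULDBLOCK },   /* WSAEWOULDBLOCK */
    { 10048, EADDRINUSE },    /* WSAEADDRINUSE */
    { 10061, ECONNREFUSED },  /* WSAECONNREFUSED */
};

/* Return the mapped errno.  *used_default is set to 1 when no table
   entry matched and the EPERM fallback was used; that is the spot
   where a debugger hook would go. */
static int map_winsock_errno(int wsa, int *used_default)
{
    size_t n = sizeof errmap_table / sizeof errmap_table[0];
    for (size_t i = 0; i < n; i++) {
        if (errmap_table[i].wsa == wsa) {
            *used_default = 0;
            return errmap_table[i].posix;
        }
    }
    *used_default = 1;
    return EPERM;
}
```

With a mapping shaped like this, any code missing from the table is
reported as EPERM, so the real Winsock error is only visible by stopping at
the fallback.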

> >Is it possible that the autoload code needs to be made dual CPU safe?
>
> No.

So, given this is some sort of race condition, how do I debug the autoload
code and find out if WSAStartup was actually called or if it failed?
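
One way to tell those cases apart would be to record the state of the lazy
initialization and inspect it at the point of failure.  The following is
only a sketch under stated assumptions: it is not Cygwin's autoload code,
every name in it (traced_init, g_init_state, the fake_startup_* stand-ins)
is hypothetical, and the startup routine is a stub rather than WSAStartup.

```c
#include <stdatomic.h>

/* Hypothetical states for a lazily initialized library. */
enum init_state {
    INIT_NOT_CALLED,
    INIT_IN_PROGRESS,
    INIT_OK,
    INIT_FAILED
};

static _Atomic int g_init_state = INIT_NOT_CALLED;
static _Atomic int g_init_result = 0;  /* return value of the startup call */

/* Stand-in type for the real startup routine; 0 means success. */
typedef int (*startup_fn)(void);

/* Stubs used only to exercise the sketch. */
static int fake_startup_ok(void)   { return 0; }
static int fake_startup_fail(void) { return -1; }

/* Run startup at most once and record what happened.  Returns the
   state observed after the call.  A caller that gets INIT_IN_PROGRESS
   has arrived while another thread is still inside startup: that is
   the window in which using the library too early would fail. */
static int traced_init(startup_fn startup)
{
    int expected = INIT_NOT_CALLED;
    if (atomic_compare_exchange_strong(&g_init_state, &expected,
                                       INIT_IN_PROGRESS)) {
        int rc = startup();
        atomic_store(&g_init_result, rc);
        atomic_store(&g_init_state, rc == 0 ? INIT_OK : INIT_FAILED);
    }
    return atomic_load(&g_init_state);
}
```

Checking g_init_state from the debugger when the error fires would then
show whether startup was never called, was still running on another
thread, or had returned a failure (with the code in g_init_result).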

I'm still digging, but very slowly given the nature of the problem and my
time to debug it.

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained pilot...
