This is the mail archive of the mailing list for the Cygwin project.


Re: [Fwd: RE: ssh problem on Windows XP]

On Jan 21,  6:34pm, Corinna Vinschen wrote:
-- Subject: [Fwd: RE: ssh problem on Windows XP]
> is there any chance that we get a fix in the next couple of weeks?

I remain absolutely committed to fixing the problems that have been
reported, but I can't say that I'll have a fix in that timeframe,
because I have some urgent deadlines for other projects.  Maybe
early to mid-February?

> If we don't get a patch, I'm inclined to revert the pipe patch before
> we release 1.5.13.

Instead of reverting the entire patch, if you want to restore the old
behavior (select always returning true for writes on pipes), you could
add a small piece of code to "short-circuit" the NtQueryInformationFile
logic that I added.

That would make it much easier for me to apply my fix when it's available,
because I could just remove the "short-circuit" when I add a test to
detect the problem, which I think I understand completely, and have
described in an earlier posting: NtQueryInformationFile acts strangely
when there is a pending, blocking read on the other end of the pipe.
I need time to finish prototyping the new test, however.

> Btw., didn't you announce more pipe patches yet to come?  Is it possible
> that you already have a patch which will get that working again?  I'm
> still hoping for something more satisfying than reverting...
-- End of excerpt from Corinna Vinschen

Yes, I have more patches, but they don't fix the outstanding problem
with the first patch (I would have certainly sent the fix if I had one).
All of my fixes are related to detecting and avoiding deadlocks, and I
have some that are not pipe related.

In case anyone is curious, let me relate the story of what started all
of this ...

At Curl (where I work), we have a pool of about a dozen Windows servers
that we use for automated builds of our products.  Each build starts by
rsync-ing the sources from a Linux server to a Windows build server (over
ssh), then we launch make via ssh and collect std{out,err} over an ssh
channel, and finally the build finishes by rsync-ing the build tree back
from the Windows build server to the Linux build server, again over ssh.

Our Windows build servers are in almost continuous use.  As you
can imagine, this setup acts as a severe stress test for Cygwin.
Unfortunately, last year Cygwin deadlocks were killing our productivity:
at least 25% of our builds were wedging, which was completely

So, I rolled up my sleeves, installed a Cygwin DLL with all of the
debugging symbols, and went to work investigating and fixing each deadlock
that I encountered.

It soon became apparent that pipes were a major problem.  We observed
deadlocks because select for writes on pipes always returned true,
so I implemented a fix (that's the first patch).  Similar deadlocks
occurred because nonblocking writes on pipes could block, so I added
an implementation for nonblocking writes too (a second patch, which I
submitted, but was never applied, because we wanted to investigate the
reported problems with the first patch).
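
For readers who want to see the behavior that a correct select for
writes is supposed to give, here is a minimal sketch in Python on a
POSIX system. It is not the Cygwin patch itself (that lives inside the
DLL and is not shown in this message); the helper names fill_pipe,
is_writable and demo are made up for illustration. The idea: a pipe
should be reported writable while it has room, not writable once it is
full, and writable again after the reader drains it. A nonblocking
write to a full pipe should fail with EAGAIN instead of blocking.

```python
import os
import select


def fill_pipe(write_fd, chunk=4096):
    """Write to the pipe without blocking until it is full.

    Returns the number of bytes that were accepted. A nonblocking
    write to a full pipe raises BlockingIOError (EAGAIN) rather than
    hanging, which is the behavior the second patch aims for.
    """
    os.set_blocking(write_fd, False)
    total = 0
    while True:
        try:
            total += os.write(write_fd, b"x" * chunk)
        except BlockingIOError:
            return total


def is_writable(write_fd):
    """Poll (timeout 0) whether select reports the fd as writable."""
    _, writable, _ = select.select([], [write_fd], [], 0)
    return bool(writable)


def demo():
    """Return select's answer for an empty, a full, and a drained pipe."""
    read_fd, write_fd = os.pipe()
    try:
        when_empty = is_writable(write_fd)
        written = fill_pipe(write_fd)
        when_full = is_writable(write_fd)

        # Drain everything that was written so the pipe has room again.
        remaining = written
        while remaining:
            remaining -= len(os.read(read_fd, min(remaining, 65536)))
        when_drained = is_writable(write_fd)
        return when_empty, when_full, when_drained
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo())
```

If select always answered "writable", as it did for pipes before the
first patch, the middle value would be wrong and a writer relying on it
could wedge on a full pipe.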

Later, I found that Cygwin is burned by unfortunate winsock behavior
that we had already encountered in other contexts: it sometimes assigns a
local port that can't actually be used immediately, because it is still
in TIME_WAIT, so connect fails with EADDRINUSE.  I fixed one nasty case
where this phenomenon can cause a missed notification in the code for
select on sockets (a third patch), and another that caused socketpair
to fail sporadically (a fourth patch).
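
To illustrate the failure mode only: an application that hits this can
work around it by retrying with a fresh socket, so that the stack picks
a different local port. The sketch below is an assumption about how
such a workaround might look, not a description of the third or fourth
patch (those change Cygwin's own select and socketpair code). The
function name connect_with_retry and its parameters are hypothetical.

```python
import errno
import socket
import time


def connect_with_retry(address, attempts=5, delay=0.05,
                       make_socket=socket.socket):
    """Connect to address, retrying on EADDRINUSE with a new socket.

    Each retry creates a fresh socket so that a different local port
    can be assigned. Any other error is raised immediately. The
    make_socket parameter exists only so the function can be exercised
    without real network access.
    """
    last_error = None
    for _ in range(attempts):
        sock = make_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect(address)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise
            last_error = exc
            time.sleep(delay)
    raise last_error
```

A caller would use it in place of a bare connect, for example
connect_with_retry(("127.0.0.1", 8080)), where the address is just a
placeholder.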

The improvement in Cygwin's behavior with these four patches has been
dramatic at Curl.  Our builds almost never experience deadlocks now.  I am
eager to contribute them to the rest of the world, but I recognize that
I need to fix the first patch before we apply the rest, and I will do it.

Our patches have been extensively tested, but we missed the problem
that occurs for pending, blocking reads, because our automated builds
don't use commands like sftp, unison, etc.  Most of the other commands
seem to use nonblocking I/O on pipes (often with select), and that works
with my patches.

