This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.
Cygwin deadlocks due to broken select() when writing to pipes
- From: "Bob Byrnes" <byrnes at curl dot com>
- To: cygwin at cygwin dot com
- Date: Thu, 30 Oct 2003 22:31:19 -0500
- Subject: Cygwin deadlocks due to broken select() when writing to pipes
- Organization: Curl Corporation
I have recently discovered that the Cygwin implementation of select()
is broken (or at best incomplete): it incorrectly claims that file
descriptors are *always* ready to write to pipes.
That's bad, because when select() indicates that file descriptors are
ready for writing (or reading), then it is supposed to be guaranteed
that a subsequent write() (or read()) will not block. But writes to
a pipe can certainly block if the pipe happens to be full (i.e., the
process reading from the other end of the pipe is doing so slowly, and
the amount of data in transit exceeds the system-dependent limit on the
buffer size of the pipe).
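To make the expected contract concrete, here is a minimal sketch in C. It is not taken from rsync, sshd, or Cygwin; the helper names writable_now() and fill_pipe() are hypothetical. It fills a pipe using nonblocking writes and then asks select() with a zero timeout whether the write end is ready. On a system that honors the contract, the answer should change from "ready" to "not ready" once the pipe is full. The 1024-byte chunk size is an assumption chosen to match the test program appended to this message.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Poll fd for writability with a zero timeout.
 * Returns 1 if select() reports it writable, 0 if not, -1 on error. */
int
writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    return select(fd + 1, NULL, &wfds, NULL, &tv);
}

/* Fill the pipe behind wfd using nonblocking writes, so this helper
 * itself never blocks.  Returns the number of bytes accepted before
 * the pipe reported EAGAIN, or -1 on error.  The original file status
 * flags are restored before returning. */
long
fill_pipe(int wfd)
{
    static char chunk[1024];    /* assumed chunk size, as in sel-pipe.c */
    long total = 0;
    int flags = fcntl(wfd, F_GETFL);

    if (flags == -1 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    for (;;) {
        ssize_t w = write(wfd, chunk, sizeof(chunk));
        if (w == -1) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* pipe is full */
            total = -1;
            break;
        }
        total += w;
    }
    (void) fcntl(wfd, F_SETFL, flags);
    return total;
}
```

A caller would create a pipe with pipe(), check that writable_now() returns 1 on the empty write end, call fill_pipe(), and then check writable_now() again. With the behavior described in this message, the second check would still report the descriptor as ready.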
Many programs (rsync and sshd come to mind) are written to use select()
to avoid blocking write() and read() calls, and if select() misbehaves as
described above, then they can deadlock. We have observed this happening
in a variety of scenarios, but the most reproducible is to run rsync over
ssh to pull data from a Cygwin system to some other system, like Linux.
This has been reported by others to the rsync mailing list:
http://www.mail-archive.com/rsync@lists.samba.org/msg07559.html
The strace output reported in that message is consistent with our
experience, and shows that the deadlock occurs while the rsync server
process is looping on these two calls:
select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})
write(1, "...", 4096) = 4096
The write() blocks after select() incorrectly claims that fd 1 is ready
for writing. The Cygwin strace output shows this even more clearly:
----------------------------------------
128 124570283 [main] rsync 940 cygwin_select: 2, 0x0, 0x226A30, 0x0, 0x226A20
182 124570465 [main] rsync 940 dtable::select_write: fd 1
95 124570560 [main] rsync 940 cygwin_select: to->tv_sec 60, to->tv_usec 0, ms 60000
98 124570658 [main] rsync 940 cygwin_select: sel.always_ready 1
103 124570761 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
104 124570865 [main] rsync 940 set_bits: me 0x101BA4C0, testing fd 1 ()
103 124570968 [main] rsync 940 set_bits: ready 1
96 124571064 [main] rsync 940 select_stuff::poll: returning 1
101 124571165 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
101 124571266 [main] rsync 940 select_stuff::~select_stuff: deleting select records
178 124571444 [main] rsync 940 writev: writev (1, 0x2269F0, 1)
97 124571541 [main] rsync 940 fhandler_base::write: binary write
... write() blocks here, eventually ...
140 124571681 [main] rsync 940 fhandler_base::write: 4096 = write (0x226A60, 4096)
102 124571783 [main] rsync 940 writev: 4096 = write (1, 0x2269F0, 1), errno 0
----------------------------------------
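For readers unfamiliar with the pattern in the traces above, the following is a rough sketch in C of a select()-guarded write loop. It is not rsync's actual code; the function name write_all_select() and its timeout handling are assumptions made for illustration. The loop relies entirely on select() to decide when write() is safe, which is why a select() that always reports "ready" lets the write() block.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes from data to fd, calling select() before every
 * write() so that write() is only attempted when the descriptor is
 * reported ready.  Returns 0 on success, or -1 with errno set
 * (ETIMEDOUT if select() timed out after timeout_sec seconds). */
int
write_all_select(int fd, const char *data, size_t len, int timeout_sec)
{
    size_t off = 0;

    while (off < len) {
        fd_set wfds;
        struct timeval tv;
        int n;
        ssize_t w;

        /* select() may modify both arguments, so reset them each time. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        n = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            errno = ETIMEDOUT;  /* descriptor never became writable */
            return -1;
        }

        /* On a blocking descriptor, this is the call that hangs if
         * select() wrongly reported the pipe as ready. */
        w = write(fd, data + off, len - off);
        if (w == -1) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        off += (size_t) w;      /* handle partial writes */
    }
    return 0;
}
```

The timeout path is the only way this loop can notice a stalled reader, so it is never taken if select() reports the pipe as always ready.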
I have also appended a short test program that reproduces the bug.
The program creates a pipe and writes to it in small chunks until the
pipe fills. If it is compiled with -USELECT, then eventually write()
blocks, as expected. However, if we compile with -DSELECT, then on
UNIX systems, one or more write() calls succeed, and eventually select()
starts timing out to indicate that the pipe is full (so the write file
descriptor is not ready). On Cygwin the program blocks in write()
even with -DSELECT, which isn't supposed to happen.
I was a bit surprised not to see any mention of this important
limitation of select() for pipes in the User's Guide (section 1.6.10)
or in the source code. But in winsup/cygwin/select.cc it is clear
that fhandler_pipe::select_write just sets the write_ready field of the
select_record to true, and peek_pipe doesn't do anything for the write
file descriptor case. We can also see that the always_ready field is
set in the strace output above.
It isn't immediately clear how to fix this. I see that PeekNamedPipe()
is used to determine if read descriptors for pipes are ready, but
this obviously won't work for write file descriptors. Were any other
approaches considered and rejected while this code was being developed,
or was the problem not recognized at the time?
--
Bob Byrnes                        e-mail: byrnes@curl.com
Curl Corporation                  phone:  617-761-1200
1 Cambridge Center, 10th Floor    fax:    617-761-1201
Cambridge, MA 02142-1612
----------------------------------------
/* sel-pipe.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#ifdef SELECT
#include <sys/time.h>
#include <sys/types.h>
#include <sys/select.h>
#endif /* SELECT */
#ifndef CHUNK
#define CHUNK 1024
#endif
static char buf[CHUNK];
int
main(int argc, char **argv)
{
    int pfds[2];
    int count = 0;

    if (pipe(pfds) == -1) {
        perror("pipe");
        exit(2);
    }

    while (1) {
#ifdef SELECT
        int nfds;
        struct timeval timeout;
        fd_set wfds;
        int found;

        nfds = pfds[1] + 1;
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;
        FD_ZERO(&wfds);
        FD_SET(pfds[1], &wfds);

        switch (found = select(nfds, NULL, &wfds, NULL, &timeout)) {
        case 1:
            if (!FD_ISSET(pfds[1], &wfds)) {
                fprintf(stderr, "select returned without fd set\n");
                exit(3);
            }
            break; /* continue with write, below */
        case 0:
            printf("pipe is full\n");
            fflush(stdout);
            continue;
        case -1:
            perror("select");
            exit(4);
        default:
            fprintf(stderr, "select returned strange fd count %d\n", found);
            exit(5);
        }
#endif /* SELECT */

        printf("writing chunk #%d ... ", ++count);
        fflush(stdout);

        if (write(pfds[1], buf, sizeof(buf)) == -1) {
            perror("write");
            exit(9);
        }

        printf("done\n");
        fflush(stdout);
    }
}