This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.


Cygwin deadlocks due to broken select() when writing to pipes


I have recently discovered that the Cygwin implementation of select()
is broken (or at best incomplete): it *always* reports the write end
of a pipe as ready for writing, whether or not the pipe is full.

That's bad, because when select() indicates that file descriptors are
ready for writing (or reading), then it is supposed to be guaranteed
that a subsequent write() (or read()) will not block.  But writes to
a pipe can certainly block if the pipe happens to be full (i.e., the
process reading from the other end of the pipe is doing so slowly, and
the amount of data in transit exceeds the system-dependent limit on the
buffer size of the pipe).
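
To make the full-pipe condition concrete, here is a minimal sketch
(separate from the test program appended to this message) that fills a
pipe in non-blocking mode and then asks select() whether the write end
is ready.  The helper names set_nonblocking, fill_pipe, and writable_now
are hypothetical, and the sketch assumes a POSIX system where O_NONBLOCK
makes write() fail with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write 1 KiB chunks to wfd until the kernel
 * refuses more data (EAGAIN).  wfd must already be non-blocking.
 * Returns the number of bytes accepted, or -1 on any other failure. */
static long fill_pipe(int wfd)
{
    static const char chunk[1024];
    long total = 0;

    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof(chunk));

        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;           /* pipe is full */
        return -1;
    }
}

/* Hypothetical helper: poll fd for writability with a zero timeout.
 * Returns 1 if select() reports it ready, 0 if not, -1 on error. */
static int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv;
    int r;

    tv.tv_sec = 0;
    tv.tv_usec = 0;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);

    r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r == -1)
        return -1;
    return (r == 1 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}
```

With a correct select(), writable_now() on the write end of a freshly
created pipe should return 1, and it should return 0 once fill_pipe()
has returned.  The count returned by fill_pipe() approximates the
system-dependent buffer limit mentioned above.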

Many programs (rsync and sshd come to mind) are written to use select()
to avoid blocking write() and read() calls, and if select() misbehaves as
described above, then they can deadlock.  We have observed this happening
in a variety of scenarios, but the most reproducible is to run rsync over
ssh to pull data from a Cygwin system to some other system, like Linux.
This has been reported by others to the rsync mailing list:

    http://www.mail-archive.com/rsync@lists.samba.org/msg07559.html
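
For reference, the pattern that such programs depend on looks roughly
like the sketch below.  It is not taken from rsync or sshd: the function
name write_with_select and the constant MAX_CHUNK are hypothetical, and
capping each write at 512 bytes is an assumption of this sketch, made so
that a write issued right after select() reports readiness stays small.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap on the size of each write() (an assumption of this
 * sketch; 512 is the minimum value POSIX allows for PIPE_BUF). */
#define MAX_CHUNK 512

/* Hypothetical helper: write up to len bytes from buf to fd, calling
 * select() before every write().  Stops early if select() times out
 * after timeout_sec seconds.  Returns the number of bytes written
 * (possibly fewer than len), or -1 on error. */
static ssize_t write_with_select(int fd, const char *buf, size_t len,
                                 int timeout_sec)
{
    size_t done = 0;

    while (done < len) {
        fd_set wfds;
        struct timeval tv;
        size_t want = len - done;
        ssize_t n;
        int r;

        /* Re-initialize on every pass: select() may modify both. */
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);

        r = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                  /* timed out: fd never became ready */

        if (want > MAX_CHUNK)
            want = MAX_CHUNK;

        /* Only reached when select() claimed fd is writable. */
        n = write(fd, buf + done, want);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }

    return (ssize_t)done;
}
```

If select() always reports the pipe as writable, the timeout branch is
never taken, and the write() inside the loop is where the process hangs
once the pipe fills.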

The strace output reported in that rsync-list message is consistent
with our experience, and shows that the deadlock occurs while the rsync
server process is looping on ...

select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "...", 4096) = 4096

The write() blocks after select() incorrectly claims that fd 1 is ready
for writing.  The Cygwin strace output shows this even more clearly:

----------------------------------------
  128 124570283 [main] rsync 940 cygwin_select: 2, 0x0, 0x226A30, 0x0, 0x226A20
  182 124570465 [main] rsync 940 dtable::select_write:  fd 1
   95 124570560 [main] rsync 940 cygwin_select: to->tv_sec 60, to->tv_usec 0, ms 60000
   98 124570658 [main] rsync 940 cygwin_select: sel.always_ready 1
  103 124570761 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
  104 124570865 [main] rsync 940 set_bits: me 0x101BA4C0, testing fd 1 ()
  103 124570968 [main] rsync 940 set_bits: ready 1
   96 124571064 [main] rsync 940 select_stuff::poll: returning 1
  101 124571165 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
  101 124571266 [main] rsync 940 select_stuff::~select_stuff: deleting select records
  178 124571444 [main] rsync 940 writev: writev (1, 0x2269F0, 1)
   97 124571541 [main] rsync 940 fhandler_base::write: binary write
        ... write() blocks here, eventually ...
  140 124571681 [main] rsync 940 fhandler_base::write: 4096 = write (0x226A60, 4096)
  102 124571783 [main] rsync 940 writev: 4096 = write (1, 0x2269F0, 1), errno 0
----------------------------------------

I have also appended a short test program that reproduces the bug.
The program creates a pipe and writes to it in small chunks until the
pipe fills.  If it is compiled with -USELECT, then eventually write()
blocks, as expected.  However, if we compile with -DSELECT, then on
UNIX systems, one or more write() calls succeed, and eventually select()
starts timing out to indicate that the pipe is full (so the write file
descriptor is not ready).  On Cygwin the program blocks in write()
even with -DSELECT, which isn't supposed to happen.

I was a bit surprised not to see any mention of this important
limitation of select() for pipes in the User's Guide (section 1.6.10)
or in the source code.  But in winsup/cygwin/select.cc it is clear
that fhandler_pipe::select_write just sets the write_ready field of the
select_record to true, and peek_pipe doesn't do anything for the write
file descriptor case.  We can also see that the always_ready field is
set in the strace output above.

It isn't immediately clear how to fix this.  I see that PeekNamedPipe()
is used to determine if read descriptors for pipes are ready, but
this obviously won't work for write file descriptors.  Were any other
approaches considered and rejected while this code was being developed,
or was the problem not recognized at the time?

--
Bob Byrnes                        e-mail: byrnes@curl.com
Curl Corporation                  phone:  617-761-1200
1 Cambridge Center, 10th Floor    fax:    617-761-1201
Cambridge, MA 02142-1612

----------------------------------------

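/* sel-pipe.c
 *
 * Create a pipe that is never read and write CHUNK bytes to it at a
 * time until it fills.  Compile with -DSELECT to call select() before
 * each write(); with -USELECT, write() is called unconditionally and
 * eventually blocks.
 */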

#include <stdio.h>

#include <stdlib.h>
#include <unistd.h>

#ifdef  SELECT
#include <sys/time.h>
#include <sys/types.h>
#include <sys/select.h>
#endif  /* SELECT */

#ifndef CHUNK
#define CHUNK   1024
#endif

static char buf[CHUNK];

int
main(void)
{
    int pfds[2];        /* pfds[0]: read end (never read), pfds[1]: write end */
    int count = 0;      /* number of chunks attempted so far */

    if (pipe(pfds) == -1) {
        perror("pipe");
        exit(2);
    }

    while (1) {
#ifdef  SELECT
        int nfds;
        struct timeval timeout;
        fd_set wfds;
        int found;

        nfds = pfds[1] + 1;

        timeout.tv_sec = 1;
        timeout.tv_usec = 0;

        FD_ZERO(&wfds);
        FD_SET(pfds[1], &wfds);

        switch (found = select(nfds, NULL, &wfds, NULL, &timeout)) {
            case 1:
                if (!FD_ISSET(pfds[1], &wfds)) {
                    fprintf(stderr, "select returned without fd set\n");
                    exit(3);
                }
                break;  /* continue with write, below */

            case 0:
                printf("pipe is full\n");
                fflush(stdout);
                continue;

            case -1:
                perror("select");
                exit(4);

            default:
                fprintf(stderr, "select returned strange fd count %d\n", found);
                exit(5);
        }
#endif  /* SELECT */

        printf("writing chunk #%d ... ", ++count);
        fflush(stdout);

        if (write(pfds[1], buf, sizeof(buf)) == -1) {
            perror("write");
            exit(9);
        }

        printf("done\n");
        fflush(stdout);
    }
}
