View Bug Activity | Format For Printing
When nscd has a sufficiently large reply to a query (for instance a group with many members), the entire reply may not yet be ready for reading in the client when it first returns from __poll() indicating that the socket is ready for reading. The functions __readall() and __readvall() in nscd/nscd_helper.c expect that the entire reply can be read immediately, since __poll() indicated that the socket was ready, but in reality this is not always the case - nscd may need to get scheduled again before more data will be available. The effect of this problem is that when using nscd, calls to functions like getgrgid() may fail intermittently when the reply is large. The following patch adds calls to __poll() inside the loops in these two functions when errno is EAGAIN to correct the problem. --- nscd/nscd_helper.c 2006-02-28 21:39:03.000000000 -0800 +++ nscd/nscd_helper.c 2006-09-14 16:29:45.812773000 -0700 @@ -44,6 +44,14 @@ do { ret = TEMP_FAILURE_RETRY (__read (fd, buf, n)); + if (ret < 0 && errno == EAGAIN) + { + struct pollfd fds[1]; + fds[0].fd = fd; + fds[0].events = POLLIN | POLLERR | POLLHUP; + if (__poll (fds, 1, 200) > 0) + continue; + } if (ret <= 0) break; buf = (char *) buf + ret; @@ -58,8 +66,10 @@ __readvall (int fd, const struct iovec *iov, int iovcnt) { ssize_t ret = TEMP_FAILURE_RETRY (__readv (fd, iov, iovcnt)); - if (ret <= 0) + if (ret <= 0 && errno != EAGAIN) return ret; + if (ret < 0) + ret = 0; size_t total = 0; for (int i = 0; i < iovcnt; ++i) @@ -82,6 +92,17 @@ iovp->iov_base = (char *) iovp->iov_base + r; iovp->iov_len -= r; r = TEMP_FAILURE_RETRY (__readv (fd, iovp, iovcnt)); + if (r < 0 && errno == EAGAIN) + { + struct pollfd fds[1]; + fds[0].fd = fd; + fds[0].events = POLLIN | POLLERR | POLLHUP; + if (__poll (fds, 1, 200) > 0) + { + r = 0; + continue; + } + } if (r <= 0) break; ret += r;
This could indeed be an issue. The proposed patch had a whole bunch of problems, though. I wrote my on patch which also avoids duplication. THe result is in cvs.
Well, it was meant to be a dirty patch, not a definitive solution. Happy to see this is finally fixed, however.