Bugzilla Bug 3242: getgrgid() and getgrnam() can fail for large groups when using nscd
Last modified: 2007-10-13 18:15
Bug#: 3242   Reporter: mdm@google.com
Status: RESOLVED
Resolution: FIXED
Assigned To: Ulrich Drepper <drepper@redhat.com>

Description:   Opened: 2006-09-21 18:27
When nscd sends a sufficiently large reply to a query (for instance, for a
group with many members), the entire reply may not yet be available to the
client when __poll() first returns and reports that the socket is ready for
reading. The functions __readall() and __readvall() in nscd/nscd_helper.c
assume that the whole reply can be read at once because __poll() indicated
readiness, but that is not always the case: a read can fail with EAGAIN
until nscd is scheduled again and writes more data.
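To illustrate the failure mode, here is a minimal sketch that makes the same assumption: it waits for readability once, then reads until the buffer is full or read() stops. It uses plain POSIX read() and poll() in place of glibc's internal __read() and __poll(), assumes the descriptor is non-blocking, and the name read_after_single_poll() is hypothetical. If the peer has written only part of the reply, the loop ends early with errno set to EAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical illustration of the faulty assumption: wait once for
   readability, then read as if the whole reply were already buffered.
   On a non-blocking socket the loop stops at the first EAGAIN, so a
   reply the server has only partly written comes back short.  */
static ssize_t
read_after_single_poll (int fd, void *buf, size_t n, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  if (poll (&pfd, 1, timeout_ms) <= 0)
    return -1;                  /* Timeout or poll failure.  */

  size_t done = 0;
  while (done < n)
    {
      ssize_t ret = read (fd, (char *) buf + done, n - done);
      if (ret < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal: retry.  */
      if (ret <= 0)
        break;                  /* EAGAIN here means "not yet", not "never".  */
      done += (size_t) ret;
    }
  return (ssize_t) done;
}
```

With a socket pair on which only 4 of 8 expected bytes have been written, this returns 4 rather than 8, which is the kind of short read the description blames for the intermittent failures.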

As a result, when nscd is in use, functions such as getgrgid() may fail
intermittently for large replies. The following patch fixes this by calling
__poll() again inside the read loops of both functions whenever errno is
EAGAIN.

--- nscd/nscd_helper.c  2006-02-28 21:39:03.000000000 -0800
+++ nscd/nscd_helper.c  2006-09-14 16:29:45.812773000 -0700
@@ -44,6 +44,14 @@
   do
     {
       ret = TEMP_FAILURE_RETRY (__read (fd, buf, n));
+      if (ret < 0 && errno == EAGAIN)
+       {
+         struct pollfd fds[1];
+         fds[0].fd = fd;
+         fds[0].events = POLLIN | POLLERR | POLLHUP;
+         if (__poll (fds, 1, 200) > 0)
+           continue;
+       }
       if (ret <= 0)
        break;
       buf = (char *) buf + ret;
@@ -58,8 +66,10 @@
 __readvall (int fd, const struct iovec *iov, int iovcnt)
 {
   ssize_t ret = TEMP_FAILURE_RETRY (__readv (fd, iov, iovcnt));
-  if (ret <= 0)
+  if (ret <= 0 && errno != EAGAIN)
     return ret;
+  if (ret < 0)
+    ret = 0;

   size_t total = 0;
   for (int i = 0; i < iovcnt; ++i)
@@ -82,6 +92,17 @@
          iovp->iov_base = (char *) iovp->iov_base + r;
          iovp->iov_len -= r;
          r = TEMP_FAILURE_RETRY (__readv (fd, iovp, iovcnt));
+         if (r < 0 && errno == EAGAIN)
+           {
+             struct pollfd fds[1];
+             fds[0].fd = fd;
+             fds[0].events = POLLIN | POLLERR | POLLHUP;
+             if (__poll (fds, 1, 200) > 0)
+               {
+                 r = 0;
+                 continue;
+               }
+           }
          if (r <= 0)
            break;
          ret += r;
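Outside glibc, the idea behind the first hunk (wait again with poll() on EAGAIN instead of treating it as a failure) can be sketched as a standalone loop. This is a hypothetical helper, readall_poll(), written against plain POSIX read() and poll(); it is neither the patch above nor the fix that was eventually committed. As in the patch, the timeout applies to each individual wait rather than to the whole reply.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not glibc's __readall): read N bytes from a
   non-blocking descriptor, waiting with poll() whenever read() reports
   that no data is available yet.  Returns N on success, fewer bytes if
   EOF arrives first, or -1 on error or timeout (errno = ETIMEDOUT).  */
static ssize_t
readall_poll (int fd, void *buf, size_t n, int timeout_ms)
{
  size_t done = 0;
  while (done < n)
    {
      ssize_t ret = read (fd, (char *) buf + done, n - done);
      if (ret > 0)
        {
          done += (size_t) ret;
          continue;
        }
      if (ret == 0)
        break;                  /* EOF: the peer closed early.  */
      if (errno == EINTR)
        continue;               /* Interrupted by a signal: retry.  */
      if (errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;              /* Genuine I/O error.  */

      /* Socket drained for now: wait until the server writes more.  */
      struct pollfd pfd = { .fd = fd, .events = POLLIN };
      int pr = poll (&pfd, 1, timeout_ms);
      if (pr < 0 && errno == EINTR)
        continue;
      if (pr == 0)
        errno = ETIMEDOUT;
      if (pr <= 0)
        return -1;              /* Timeout or poll failure.  */
    }
  return (ssize_t) done;
}
```

Retrying on EINTR explicitly plays the role of TEMP_FAILURE_RETRY in the original code; a production version would probably also bound the total time spent waiting.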

------- Additional Comment #1 From Ulrich Drepper 2007-10-13 18:03 -------
This could indeed be an issue.  The proposed patch had a whole bunch of
problems, though.  I wrote my own patch, which also avoids duplication.  The
result is in CVS.

------- Additional Comment #2 From Giampaolo Tomassoni 2007-10-13 18:15 -------
Well, it was meant to be a dirty patch, not a definitive solution.

Happy to see this is finally fixed, however.
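Comment #1 says the committed fix avoids duplicating the poll logic; that code is not reproduced here. As a hypothetical sketch of the same design idea, the wait can live in one helper (wait_readable(), a name invented for this example) that any read loop calls on EAGAIN. The readv() loop shown, readvall_poll() (also hypothetical, using POSIX readv() and poll()), additionally illustrates the bookkeeping a readv-based reader needs after a partial readv(): advancing through the iovec array by the number of bytes actually read.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical shared helper: block until FD is readable (or has EOF or
   an error pending).  Returns 0 on readiness, -1 on timeout
   (errno = ETIMEDOUT) or poll failure.  */
static int
wait_readable (int fd, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int pr;
  do
    pr = poll (&pfd, 1, timeout_ms);
  while (pr < 0 && errno == EINTR);
  if (pr == 0)
    errno = ETIMEDOUT;
  return pr > 0 ? 0 : -1;
}

/* Hypothetical readv() read-all loop: fill every buffer in IOV, skipping
   past whatever a partial readv() consumed and waiting on EAGAIN.
   Modifies IOV in place.  Returns the total bytes read (short only on
   EOF), or -1 on error or timeout.  */
static ssize_t
readvall_poll (int fd, struct iovec *iov, int iovcnt, int timeout_ms)
{
  ssize_t total = 0;
  while (iovcnt > 0)
    {
      if (iov->iov_len == 0)
        {
          ++iov;                /* Skip buffers that are already full.  */
          --iovcnt;
          continue;
        }
      ssize_t r = readv (fd, iov, iovcnt);
      if (r < 0)
        {
          if (errno == EINTR)
            continue;
          if ((errno == EAGAIN || errno == EWOULDBLOCK)
              && wait_readable (fd, timeout_ms) == 0)
            continue;           /* More data arrived: try again.  */
          return -1;
        }
      if (r == 0)
        break;                  /* EOF before all buffers were filled.  */
      total += r;

      /* Consume R bytes from the front of the iovec array.  */
      size_t left = (size_t) r;
      while (left > 0)
        {
          size_t take = left < iov->iov_len ? left : iov->iov_len;
          iov->iov_base = (char *) iov->iov_base + take;
          iov->iov_len -= take;
          left -= take;
          if (iov->iov_len == 0)
            {
              ++iov;
              --iovcnt;
            }
        }
    }
  return total;
}
```

Because this sketch advances the caller's iov entries in place, pass a scratch copy of the array if the original buffer descriptors are still needed afterwards.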
