Bug 18493

Summary: Infinite loop/deadlock? in __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
Product: glibc Reporter: Dan Searle <dan>
Component: libc Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED MOVED    
Severity: critical CC: drepper.fsp, fweimer
Priority: P2 Flags: fweimer: security-
Version: 2.19   
Target Milestone: ---   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
https://bugzilla.kernel.org/show_bug.cgi?id=99461
Host: Target:
Build: Last reconfirmed:

Description Dan Searle 2015-06-05 11:12:18 UTC
In a multi-threaded pthreads process running on Ubuntu 14.04 AMD64 (with over 1000 threads) that uses real-time FIFO scheduling, we occasionally see calls to recv() with flags (MSG_PEEK | MSG_WAITALL) get stuck in an infinite loop or deadlock. The affected threads lock up inside recv(), consuming as much CPU as they can (because of the FIFO scheduling).

Here's an example gdb back trace:

[Switching to thread 4 (Thread 0x7f6040546700 (LWP 27251))]
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146, buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
33      ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
(gdb) bt
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146, buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x0000000000421945 in recv (__flags=258, __n=5, __buf=0x7f6040543600, __fd=146) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]
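The raw flags value in the trace (flags@entry=258) decodes to MSG_PEEK | MSG_WAITALL on Linux (MSG_PEEK is 0x2, MSG_WAITALL is 0x100). A small sketch for decoding such values when reading backtraces; the helper name describe_recv_flags is hypothetical and only a few common flags are named.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Turn a raw recv() flags value (e.g. the flags@entry=258 in the
 * backtrace) into symbolic MSG_* names. Only a few common flags are
 * named; leftover bits are appended in hex. `len` must be >= 1. */
static void describe_recv_flags(int flags, char *out, size_t len)
{
    static const struct { int bit; const char *name; } known[] = {
        { MSG_OOB, "MSG_OOB" },
        { MSG_PEEK, "MSG_PEEK" },
        { MSG_DONTWAIT, "MSG_DONTWAIT" },
        { MSG_WAITALL, "MSG_WAITALL" },
    };
    size_t used = 0;
    int n;

    out[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (!(flags & known[i].bit))
            continue;
        n = snprintf(out + used, len - used, "%s%s",
                     used ? " | " : "", known[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return;                     /* output truncated: stop here */
        used += (size_t)n;
        flags &= ~known[i].bit;         /* mark this bit as decoded */
    }
    if (flags != 0)                     /* bits not in the table above */
        snprintf(out + used, len - used, "%s0x%x",
                 used ? " | " : "", (unsigned)flags);
}
```

Applied to the value in the backtrace, describe_recv_flags(258, buf, sizeof buf) yields "MSG_PEEK | MSG_WAITALL".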

The socket is a TCP socket in blocking mode. The recv() call sits inside an outer loop with a counter; I've checked the counter with gdb and it is always 1, so the outer loop isn't the problem: the thread really is deadlocked inside the recv() internals.
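The reporter's code is not shown in the thread. A hypothetical reconstruction of the pattern described (the name peek_exact and the choice to retry only on EINTR are assumptions): peek at exactly n bytes without consuming them, with an attempt counter playing the role of the outer-loop counter inspected in gdb.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical reconstruction: peek at n bytes (5 in the backtrace)
 * without removing them from the socket's receive queue. If the thread
 * hangs while *attempts stays at 1, the very first recv() never
 * returned to user space. */
static ssize_t peek_exact(int fd, void *buf, size_t n, int *attempts)
{
    ssize_t r;

    *attempts = 0;
    do {
        ++*attempts;
        r = recv(fd, buf, n, MSG_PEEK | MSG_WAITALL);
    } while (r < 0 && errno == EINTR);  /* assumed: retry only on signals */

    /* Normally n; fewer bytes or 0 if the peer shuts down; -1 on error. */
    return r;
}
```

Because MSG_PEEK leaves the data queued, a later recv() without MSG_PEEK still returns the same bytes.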

Other notes:
* There always seem to be two or more threads deadlocked in the same place (the same recv() call, but on distinct FDs).
* The threads calling recv() have cancellation disabled, having previously executed: pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
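recv() is a POSIX cancellation point, so with cancellation disabled a pthread_cancel() aimed at one of the stuck threads only stays pending. A sketch of a hypothetical helper (recv_nocancel is not a glibc function) that scopes the disabled state to a single call and restores the caller's previous state, rather than disabling it for the thread's whole lifetime as in the report.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: recv() with cancellation disabled for the
 * duration of the call only. While disabled, a pthread_cancel() aimed
 * at this thread stays pending until the previous state is restored. */
static ssize_t recv_nocancel(int fd, void *buf, size_t n, int flags)
{
    int old_state, ignored;
    ssize_t r;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    r = recv(fd, buf, n, flags);
    pthread_setcancelstate(old_state, &ignored);  /* restore caller's state */
    return r;
}
```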

I've even tried adding a poll() call for POLLRDNORM on the socket before calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure there's data available on the socket before calling poll(), but it makes no difference.
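One likely reason the poll() check made no difference: POLLRDNORM only reports that some normal data (or EOF) can be read, not that the 5 bytes being peeked for are queued. A sketch, assuming Linux, where ioctl(FIONREAD) on a socket reports the number of queued bytes; the helper name queued_after_poll is hypothetical.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Wait up to timeout_ms for the socket to become readable, then store
 * the number of queued bytes in *queued. Returns 1 when readable,
 * 0 on timeout, -1 on error (errno set). */
static int queued_after_poll(int fd, int timeout_ms, int *queued)
{
    struct pollfd p = { .fd = fd, .events = POLLRDNORM };
    int r = poll(&p, 1, timeout_ms);

    if (r <= 0)
        return r;
    /* Readable only means >= 1 byte (or EOF); ask how many are queued. */
    if (ioctl(fd, FIONREAD, queued) < 0)
        return -1;
    return 1;
}
```

With 3 bytes queued, poll() reports the socket readable even though a recv() that peeks for 5 bytes with MSG_WAITALL would still have to wait.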

So I don't know what is wrong here. I've read all the recv() documentation and believe recv() is being used correctly; the only conclusion I can come to is that there is a bug in libc recv() when using the flags MSG_PEEK | MSG_WAITALL with thousands of pthreads running.
Comment 1 Dan Searle 2015-06-05 11:17:06 UTC
Typo, original: "I've even tried adding a poll() call for POLLRDNORM on the socket before calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure there's data available on the socket before calling poll(), but it makes no difference."

Should have been: "I've even tried adding a poll() call for POLLRDNORM on the socket before calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure there's data available on the socket before calling *recv()*, but it makes no difference."
Comment 2 Andreas Schwab 2015-06-05 11:44:18 UTC
The __libc_recv function is just a thin wrapper around the recvfrom system call.  Please report this to the kernel people.
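To illustrate the point that recv() is only a thin wrapper: on Linux x86_64, recv(fd, buf, n, flags) behaves like the recvfrom system call with a NULL source address. A sketch (raw_recv is a hypothetical name) that issues the system call directly through syscall(2), bypassing the libc wrapper; if a hang reproduces with it, libc is not involved.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the recvfrom system call directly, with no source address
 * requested, which is what recv() reduces to on Linux x86_64. */
static ssize_t raw_recv(int fd, void *buf, size_t n, int flags)
{
    return (ssize_t)syscall(SYS_recvfrom, fd, buf, n, flags, NULL, NULL);
}
```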
Comment 3 Dan Searle 2015-06-05 11:52:04 UTC
by "kernel people", you mean https://bugzilla.kernel.org/ ?
Comment 4 Florian Weimer 2015-06-10 14:00:44 UTC
(In reply to Dan Searle from comment #3)
> by "kernel people", you mean https://bugzilla.kernel.org/ ?

More like one of the mailing lists, either linux-kernel or netdev.  You will need to provide a proper test case, though.  It's also not quite clear what you mean by “the threads lock up chewing as much CPU as they can”. Does recv actually return from the kernel?
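One way to gather evidence on that question (an assumption, not something posted in the thread): sample the stuck thread's CPU clock. A thread genuinely blocked in recv() sleeps in the kernel and accumulates almost no CPU time; a thread whose recv() never returns (outer-loop counter still 1) yet keeps accumulating CPU time is spinning inside the call. The names thread_cpu_ns_over and peek_worker are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* CPU time (user + system, in ns) consumed by thread t over roughly
 * interval_ms of wall-clock time; -1 if the clock cannot be read. */
static long long thread_cpu_ns_over(pthread_t t, unsigned interval_ms)
{
    clockid_t cid;
    struct timespec a, b;
    struct timespec d = { interval_ms / 1000,
                          (long)(interval_ms % 1000) * 1000000L };

    if (pthread_getcpuclockid(t, &cid) != 0 || clock_gettime(cid, &a) != 0)
        return -1;
    nanosleep(&d, NULL);
    if (clock_gettime(cid, &b) != 0)
        return -1;
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

/* Demo worker: blocks in the same kind of call as in the report, on
 * the fd passed in, and returns recv()'s result cast to a pointer. */
static void *peek_worker(void *arg)
{
    int fd = *(int *)arg;
    char buf[5];
    ssize_t r = recv(fd, buf, sizeof buf, MSG_PEEK | MSG_WAITALL);
    return (void *)(intptr_t)r;
}
```

For a worker that is properly blocked waiting for data, the measured value stays close to zero; on the affected system the report suggests it would grow with the interval.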
Comment 5 Dan Searle 2015-06-11 08:00:49 UTC
There is a tracker for this bug with a test case here: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Comment 6 Florian Weimer 2015-06-11 08:07:58 UTC
(In reply to Dan Searle from comment #5)
> There is a tracker for this bug with a test case here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1205258

Also <https://bugzilla.kernel.org/show_bug.cgi?id=99461>.  This clearly is a kernel bug.
Comment 7 Dan Searle 2015-06-11 08:15:08 UTC
Agreed, it's a kernel bug: it doesn't handle simultaneous use of both the MSG_PEEK and MSG_WAITALL flags in the recvfrom syscall in certain edge case(s).

I have worked around the issue for now by not using MSG_WAITALL (while still using MSG_PEEK), wrapping recv() in an outer loop with a sleep() and a counter that retries the recv() call a set number of times before timing out.
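A minimal sketch of that workaround, assuming a blocking socket. The helper name, the millisecond delay via nanosleep() instead of sleep(), and the return convention are all hypothetical choices, not the reporter's code.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Peek (without MSG_WAITALL) until n bytes are queued, sleeping between
 * attempts and giving up after max_tries. Returns n on success, 0 if
 * the peer shut down, -1 on error, -2 on timeout. */
static ssize_t peek_n_with_retry(int fd, void *buf, size_t n,
                                 int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < max_tries; i++) {
        ssize_t r = recv(fd, buf, n, MSG_PEEK);
        if (r < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: counts as one try */
            return -1;
        }
        if (r == 0)
            return 0;                /* orderly shutdown by the peer */
        if ((size_t)r >= n)
            return r;                /* all n bytes are queued */
        nanosleep(&delay, NULL);     /* partial data: wait, peek again */
    }
    return -2;                       /* timed out */
}
```

On a blocking socket the first recv() still waits until at least one byte arrives, so the retry counter only bounds the wait for the rest of the n bytes. A peer that sends fewer than n bytes and then closes shows up as a timeout.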