This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: Indefinite hang in getaddrinfo / check_pf / make_request


On Sep 22, 2015, at 9:59 PM, Steven Schlansker <stevenschlansker@gmail.com> wrote:

> 
>> On Sep 22, 2015, at 9:04 PM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
>> 
>> On Tue, Sep 22, 2015 at 8:53 PM, Steven Schlansker
>> <stevenschlansker@gmail.com> wrote:
>> 
>>> We found the following issue:
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=15946
>> 
>> You may be seeing https://sourceware.org/bugzilla/show_bug.cgi?id=12926 instead.
>> 
>> See if that patch has been applied to your sources as well.
> 
> Thanks for finding this.  While that fix is not applied to our deployed version,
> I think the symptoms are slightly different

Thanks Paul and Adhemerval for the advice.  I believe I have evidence that this is
not the same issue as either 15946 or 12926.

Here is an strace log of my process making a few separate calls to getaddrinfo.
The first one succeeds; the last one hangs forever:


socket(PF_NETLINK, SOCK_RAW, 0)         = 12
bind(12, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(12, {sa_family=AF_NETLINK, pid=1, groups=00000000}, [12]) = 0
sendto(12, "\24\0\0\0\26\0\1\3\37'\4V\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"L\0\0\0\24\0\2\0\37'\4V\1\0\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10\0\2\0\177\0\0\1\7\0\3\0lo\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\327\2\0\0\327\2\0\0X\0\0\0\24\0\2\0\37'\4V\1\0\0\0\2\27\200
\0\7\0\0\0\10\0\1\0\n\36ON\10\0\2\0\n\36ON\10\0\4\0\n\36O\377\t\0\3\0eth0\0\0\0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377\377\377\377\377\343\2\0\0\343\2\0\0P\0\0\0\24\0\2\0\37'\4V\1\0\0\0\2\20\200\0\20\0\0\0\10\0\1\0\254\21*\1\10\0\2\0\254\21*\1\f\0\3\0docker0\0\10\0\10\0\200\0\0\0\24\0\6\0\377\377\377\377
\377\377\377\377\200\3\0\0\200\3\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 244
recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\24\0\6\0\377\377\377\377\377\377\377\377\327\2\0\0\327\2\0\0\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375\7\0\0\0
\24\0\1\0\376\200\0\0\0\0\0\0\10\203h\377\376\221:?\24\0\6\0\377\377\377\377\377\377\377\377\341\2\0\0\341\2\0\0\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375\20\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0004\361\34\377\376\256\345\230\24\0\6\0\377\377\377\377\377\377\377\377\204\3\0\0\204\3\0\0\10\0\10\0
\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375\32\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\34\243\256\377\376\342n\10\24\0\6\0\377\377\377\377\377\377\377\377a\357\262\10a\357\262\10\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375\36\0\0\0\24\0\1\0\376\200\0\0\0\0\0\0,pt\377\376lI?\24\0\6\0\377\377
\377\377\377\377\377\3774q\326\t4q\326\t\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375 \0\0\0\24\0\1\0\376\200\0\0\0\0\0\0\4a3\377\376\374\354\341\24\0\6\0\377\377\377\377\377\377\377\3778\257\24\n8\257\24\n\10\0\10\0\200\0\0\0H\0\0\0\24\0\2\0\37'\4V\1\0\0\0\n@\200\375(\0\0\0\24\0\1\0\376\200\0\0
\0\0\0\0\234%\7\377\376\202\255d\24\0\6\0\377\377\377\377\377\377\377\377\360\240\177\17\360\240\177\17\10\0\10\0\200\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 504
recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\37'\4V\1\0\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
close(12)                               = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=277, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 12
fstat(12, {st_mode=S_IFREG|0644, st_size=351, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f53e1d00000
read(12, "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4\n mesos-slave4-prod-uswest2.otsql.opentable.com mesos-slave4-prod-uswest2\n# The following lines are desirable for IPv6 capable hosts\n::1 ip6-localhost ip6-loopback\nfe00::0 ip6-localnet\nff00::0 ip6-mcastprefix\nff02::1 ip6-
allnodes\nff02::2 ip6-allrouters\nff02::3 ip6-allhosts\n\n", 4096) = 351
read(12, "", 4096)                      = 0
close(12)                               = 0
munmap(0x7f53e1d00000, 4096)            = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=277, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 12
connect(12, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.30.64.254")}, 16) = 0
gettimeofday({1443112735, 433606}, NULL) = 0
poll([{fd=12, events=POLLOUT}], 1, 0)   = 1 ([{fd=12, revents=POLLOUT}])
sendto(12, "\343/\1\0\0\1\0\0\0\0\0\0\31mesos-slave4-prod-uswest2\5otsql\topentable\3com\0\0\1\0\1", 63, MSG_NOSIGNAL, NULL, 0) = 63
poll([{fd=12, events=POLLIN}], 1, 5000) = 1 ([{fd=12, revents=POLLIN}])
ioctl(12, FIONREAD, [227])              = 0
recvfrom(12, "\343/\201\200\0\1\0\1\0\4\0\4\31mesos-slave4-prod-uswest2\5otsql\topentable\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\0\266\0\4\n\36ON\300,\0\2\0\1\0\0\0P\0\17\4ns-b\4pnap\3net\0\300,\0\2\0\1\0\0\0P\0\7\4ns-c\300`\300,\0\2\0\1\0\0\0P\0\7\4ns-d\300`\300,\0\2\0\1\0\0\0P\0\7\4ns-a\300`\300\234\0\1\0\1\0\0\0P\0
\4@^{\4\300[\0\1\0\1\0\0\0P\0\4@^{$\300v\0\1\0\1\0\0\0P\0\4@_=\4\300\211\0\1\0\1\0\0\0P\0\4@_=$", 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.30.64.254")}, [16]) = 227
close(12)                               = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 12
connect(12, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.30.64.254")}, 16) = 0
gettimeofday({1443112735, 435382}, NULL) = 0
poll([{fd=12, events=POLLOUT}], 1, 4998) = 1 ([{fd=12, revents=POLLOUT}])
sendto(12, "\214\200\1\0\0\1\0\0\0\0\0\0\31mesos-slave4-prod-uswest2\5otsql\topentable\3com\0\0\34\0\1", 63, MSG_NOSIGNAL, NULL, 0) = 63
gettimeofday({1443112735, 435596}, NULL) = 0
poll([{fd=12, events=POLLIN}], 1, 4998) = 1 ([{fd=12, revents=POLLIN}])
ioctl(12, FIONREAD, [113])              = 0
recvfrom(12, "\214\200\201\200\0\1\0\0\0\1\0\0\31mesos-slave4-prod-uswest2\5otsql\topentable\3com\0\0\34\0\1\300&\0\6\0\1\0\0\0\1\0&\10sc-dc-01\300&\5admin\0\0\0\345\"\0\0\3\204\0\0\2X\0\1Q\200\0\0\1,", 1821, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.30.64.254")}, [16]) = 113
close(12)                               = 0
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8


futex(0x15ed764, FUTEX_WAIT_PRIVATE, 50538, NULL) = 0
futex(0x15ed7a0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x15ed7a0, FUTEX_WAKE_PRIVATE, 1) = 0


socket(PF_NETLINK, SOCK_RAW, 0)         = 12
bind(12, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(12, {sa_family=AF_NETLINK, pid=1, groups=00000000}, [12]) = 0
sendto(12, "\24\0\0\0\26\0\1\3 '\4V\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(12, <hangs forever>




If this were bug #15946, I would expect *DNS query traffic* to go out over the wrong file descriptors.
But we are not actually performing a DNS query here; we are inside check_pf, which is fetching the list of
network interfaces over the kernel's netlink socket.  And you can see the fd is always 12.

If this were bug #12926, I would expect the recvmsg call to return 0 and the userspace library to spin
forever.  But we are actually blocked inside the recvmsg system call itself.

While thinking about how this code might fail, I read some interesting notes in the netlink(7) documentation:

> Netlink is not a reliable protocol.  It tries its best to deliver a message to its destination(s), but may drop messages when an out-of-memory condition or other error occurs.  For reliable transfer the sender can request an acknowledgement from the receiver by  setting  the  NLM_F_ACK  flag.   An acknowledgment is an NLMSG_ERROR packet with the error field set to 0.  The application must generate acknowledgements for received messages itself.  The kernel tries to send an NLMSG_ERROR message for every failed packet.  A user process should follow this convention too.
> However,  reliable transmissions from kernel to user are impossible in any case.  The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the user-space process will no longer have the same view of kernel state.  It is up to the application to detect when this happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.

The check_pf code does neither: it neither requests acknowledgements with NLM_F_ACK nor handles the lost-message case.  So if the kernel's netlink reply is lost, it looks like the process could be stuck in a recvmsg that will never complete?


I am going to spend some time distilling a test case that exercises just the check_pf code, to see whether I can reproduce this in isolation.
In the meantime, does anyone have ideas for further diagnostics that would be useful?  I'm not sure how to effectively inspect the kernel side of the netlink socket,
to see whether it actually tried to reply or not...
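As a first cut at that distilled test case, I'm starting from something like the loop below: repeated AF_UNSPEC lookups, since (as I understand it) those are what trigger the check_pf interface probe.  The real reproduction presumably needs the same multi-threaded load as the production process, and "localhost" here is just a stand-in hostname, not the one from the trace:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;    /* ask for both A and AAAA, so the
                                         check_pf netlink probe runs */
    hints.ai_socktype = SOCK_STREAM;

    for (int i = 0; i < 100; i++) {
        int rc = getaddrinfo("localhost", NULL, &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return 1;
        }
        freeaddrinfo(res);
    }
    puts("100 lookups completed");
    return 0;
}
```

If this never reproduces the hang single-threaded, the next step would be running the loop from many threads at once.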

