This is the mail archive of the mailing list for the glibc project.

[Bug libc/12994] New: getaddrinfo fails if response records returned in wrong order and one of them is server failure

           Summary: getaddrinfo fails if response records returned in
                    wrong order and one of them is server failure
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc

Created attachment 5848
tcpdump capture from getaddrinfo

A program calls getaddrinfo.

Deep within the bowels of the resolver library, __libc_res_nquery in
res_query.c creates two queries, an A query and an AAAA query.
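
One way to see which of the two lookups misbehaves is to ask getaddrinfo for one address family at a time. The following is a minimal sketch, not code from glibc or from the attached test program. It assumes that AF_UNSPEC is what triggers the paired A and AAAA queries, and that AF_INET or AF_INET6 restricts the lookup to one of them. The helper names lookup_family and compare_families are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolve `host` for a single address family and
   return getaddrinfo's status code (0 on success). */
int lookup_family(const char *host, int family)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;        /* AF_INET, AF_INET6 or AF_UNSPEC */
    hints.ai_socktype = SOCK_STREAM; /* avoid one result per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Hypothetical helper: print the status for each family side by side. */
void compare_families(const char *host)
{
    static const struct { const char *name; int family; } cases[] = {
        { "AF_INET",   AF_INET   },
        { "AF_INET6",  AF_INET6  },
        { "AF_UNSPEC", AF_UNSPEC },
    };
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int rc = lookup_family(host, cases[i].family);
        printf("%-9s: %s\n", cases[i].name,
               rc == 0 ? "ok" : gai_strerror(rc));
    }
}
```

If the AF_INET lookup succeeds on its own while the AF_UNSPEC lookup fails, that would point at how the two responses are combined rather than at the A query itself.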

Deeper within the bowels of the resolver library, send_dg in res_send.c sends
both queries and waits for responses. My name server sends the response to the
*second* query *first*, and it's a server failure. I'm pretty sure that if the
responses were sent in the reverse order, the problem would not occur.

At this point things get all screwed up. I'm not sure whether the problem is in
send_dg or __libc_res_nsend or __libc_res_nquery. I've spent hours poring over
the code trying to figure out who is at fault. I can't, because this is some of
the most poorly written code I've looked at in a very long time. It's
completely incomprehensible and most of its "cleverness" is inadequately
documented.

Anyway, by the time status results bubble back up to getaddrinfo, the code has
decided that it was unable to resolve the host name to an address, even though
one of the two responses that came back from the DNS server had a valid A
record in it.
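
The behavior I would expect can be stated as a small rule: the combined lookup succeeds if any reply carries usable answers, no matter which reply arrived first. The sketch below only illustrates that rule. It is not how the resolver is structured, and struct reply and any_usable are hypothetical names. The response codes are the standard ones from RFC 1035.

```c
#include <assert.h>
#include <stddef.h>

/* DNS response codes as defined in RFC 1035. */
enum { RCODE_NOERROR = 0, RCODE_SERVFAIL = 2, RCODE_NXDOMAIN = 3 };

/* Hypothetical summary of one reply, stored in arrival order. */
struct reply {
    int rcode;    /* response code from the DNS header */
    int answers;  /* number of records in the answer section */
};

/* Return 1 if at least one reply has usable answers, 0 otherwise.
   The arrival order of the replies does not affect the result. */
int any_usable(const struct reply *replies, size_t n)
{
    size_t i;

    assert(replies != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (replies[i].rcode == RCODE_NOERROR && replies[i].answers > 0)
            return 1;
    }
    return 0;
}
```

In the capture, the AAAA reply is a server failure and arrives first, and the A reply has an answer and arrives second. Under this rule the outcome is the same as if they had arrived in the opposite order.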

Test case? Run getaddrinfo on the affected host name immediately after restarting
your name server. I'm using BIND 9.8.0-7.P4.fc15.x86_64; I don't know how
universal this behavior is. I am attaching a wireshark dump from the virtual
interface that captures both my loopback interface (on which my client is
making its queries) and the queries my DNS server is making to try to satisfy
the local queries. And here's what my test program (which I will also attach)
prints as output:

Wed Jul 13 00:14:18 2011: getaddrinfo: Name or service not known

Note that if you run the exact same getaddrinfo call a second time immediately
afterwards it works, because the previous successful query response, which is a
CNAME, is cached and gets returned in response to both the A and AAAA queries.
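
For anyone who wants to reproduce this without the attachment, a test along these lines should be enough. This is a rough sketch and not the attached program. The helper names format_failure, try_once and try_twice are hypothetical, and the timestamp layout is only chosen to resemble the output line above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Format "<timestamp>: getaddrinfo: <message>" into buf.
   Returns the snprintf result, or -1 if the timestamp did not fit. */
int format_failure(char *buf, size_t len, const struct tm *when, int rc)
{
    char stamp[64];

    assert(buf != NULL && when != NULL);
    if (strftime(stamp, sizeof stamp, "%a %b %e %H:%M:%S %Y", when) == 0)
        return -1;
    return snprintf(buf, len, "%s: getaddrinfo: %s", stamp, gai_strerror(rc));
}

/* Resolve host once with AF_UNSPEC (both A and AAAA lookups) and print
   a timestamped line on failure. Returns getaddrinfo's status code. */
int try_once(const char *host)
{
    struct addrinfo hints, *res = NULL;
    char line[256];
    int rc;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        time_t now = time(NULL);
        struct tm tm_now;

        if (localtime_r(&now, &tm_now) != NULL &&
            format_failure(line, sizeof line, &tm_now, rc) > 0)
            puts(line);
        return rc;
    }
    freeaddrinfo(res);
    return 0;
}

/* Make two back-to-back attempts so the first and second results
   can be compared. */
void try_twice(const char *host)
{
    int first = try_once(host);
    int second = try_once(host);

    printf("first attempt: %d, second attempt: %d\n", first, second);
}
```

Call try_twice with the host name right after restarting the name server. If the bug is present, the first attempt should fail and the second should return 0.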

Since this bug causes DNS queries that should succeed to fail in a very
user-visible way, I'm tempted to set it to critical, but I suppose since
there's no permanent loss of data it isn't actually. I don't know, tough call.
