This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] CVE-2015-7547 --- glibc getaddrinfo() stack-based buffer overflow


The glibc project thanks the Google Security Team and Red Hat for 
reporting the security impact of this issue, and Robert Holiday of Ciena
for reporting the related bug 18665.

Summary
=======

During upstream review of the public open bug 18665 for glibc, it was
discovered that the bug could lead to a stack-based buffer overflow.

https://sourceware.org/bugzilla/show_bug.cgi?id=18665

The buffer overflow occurs in the function send_dg (UDP) and send_vc
(TCP) for the NSS module libnss_dns.so.2 when calling getaddrinfo with
AF_UNSPEC family and in some cases also with AF_INET6 before the fix in
commit 8479f23a (only use gethostbyname4_r if PF_UNSPEC).

The use of AF_UNSPEC triggers the low-level resolver code to send out
two parallel queries for A and AAAA. A mismanagement of the buffers used
for those queries could result in the response writing beyond the alloca
allocated buffer created by __res_nquery.

Main conclusions:

- Via getaddrinfo with family AF_UNSPEC or AF_INET6 the overflowed
  buffer is located on the stack via alloca (a 2048 byte fixed size
  buffer for DNS responses).

- At most 65535 bytes (MAX_PACKET) may be written to the alloca buffer
  of 2048 bytes. Overflowing bytes are entirely under the control of the
  attacker and are the result of a crafted DNS response.

- Local testing shows that we have been able to control at least the
  execution of one free() call with the buffer overflow and gained
  control of EIP. Further exploitation was not attempted, only this
  single attempt to show that it is very likely that execution control
  can be gained without much more effort. We know of no known attacks
  that use this specific vulnerability.

- Mitigating factors for UDP include:
  - A firewall that drops UDP DNS packets > 512 bytes.
  - A local resolver (that drops non-compliant responses).
  - Avoid dual A and AAAA queries (avoids buffer management error) e.g.
    Do not use AF_UNSPEC.
  - No use of `options edns0` in /etc/resolv.conf since EDNS0 allows
    responses larger than 512 bytes and can lead to valid DNS responses
    that overflow.
  - No use of `RES_USE_EDNS0` or `RES_USE_DNSSEC` since they can both
    lead to valid large EDNS0-based DNS responses that can overflow.

- Mitigating factors for TCP include:
  - Limit all replies to 1024 bytes.

- Mitigations that don't work:
  - Setting `options single-request` does not change buffer management
    and does not prevent the exploit.
  - Setting `options single-request-reopen` does not change buffer
    management and does not prevent the exploit.
  - Disabling IPv6 does not disable AAAA queries. The use of AF_UNSPEC
    unconditionally enables the dual query.
    - The use of `sysctl -w net.ipv6.conf.all.disable_ipv6=1` will not
      protect your system from the exploit.
  - Blocking IPv6 at a local or intermediate resolver does not work to
    prevent the exploit. The exploit payload can be delivered in A or
    AAAA results, it is the parallel query that triggers the buffer
    management flaw.

- The code that causes the vulnerability was introduced in May 2008 as
  part of glibc 2.9.

- The code that causes the vulnerability is only present in glibc's copy
  of libresolv which has enhancements to carry out parallel A and AAAA
  queries. Therefore only programs using glibc's copy of the code have
  this problem.

- A back of the envelope analysis shows that it should be possible to
  write correctly formed DNS responses with attacker controlled payloads
  that will penetrate a DNS cache hierarchy and therefore allow
  attackers to exploit machines behind such caches.

Solution
========

The immediate solution to the buffer mismanagement issues are as
follows:

- Remove buffer reuse.

- Always malloc the second response buffer if needed.

  - Requires fix for sourceware bug 16574 to avoid memory leak.
    commit d668061994a7486a3ba9c7d5e7882d85a2883707
    commit ab09bf616ad527b249aca5f2a4956fd526f0712f

- Correctly adjust pointer *and* size for buffer in use.

In order to validate and test the resulting changes, including valgrind
validation, the following was fixed:

- Uninitialized uses of *herrno_p.
  - With all uses initialized we have clean valgrind runs.

- Result of NSS_STATUS_SUCCESS masking the case where the second
  response has failed with an ERANGE failure. In this case the second
  response will contain whatever was on the stack last (alloca).
  - With NSS_STATUS_TRYAGAIN returned if any of the results fail with
    ERANGE we have deterministic results that can be validated.

Attached to the email are:
- Patch to fix the vulnerability.

To follow:
- Tarball of validation tests which will be integrated into glibc.

NEWS update will be included in the final commit.

High-level Analysis:
====================

The defect is located in the glibc sources in the following file:

- resolv/res_send.c

as part of the send_dg and send_vc functions which are part of the
__libc_res_nsend (res_nsend) interface which  is used by many of the
higher level interfaces including getaddrinfo (indirectly via the DNS
NSS module.)

One way to trigger the buffer mismanagement is like this:

* Have the target attempt a DNS resolution for a domain you control.
  - Need to get A and AAAA queries.
* First response is 2048 bytes.
  - Fills the alloca buffer entirely with 0 left over.
  - send_dg attemps to reuse the user buffer but can't.
  - New buffer created but due to bug old alloca buffer is used with new
    size of 65535 (size of the malloc'd buffer).
  - Response should be valid.
* Send second response.
  - This response should be flawed in such a way that it forces
    __libc_res_nsend to retry the query. It is sufficient for example to
    pick any of the listed failure modes in the code which return zero.
* Send third response.
  - The third response can contain 2048 bytes of valid response.
  - The remaining 63487 bytes of the response are the attack payload and
    the recvfrom smashes the stack with it.

The flaw happens because when send_dg is retried it restarts the query,
but the second time around the answer buffer points to the alloca'd
buffer but with the wrong size.

Please note that there are other ways to trigger the buffer management
flaw, but they require slightly more control over the timing of the
responses and use poll timeout to carry out the exploit with just two
responses from the attacker (as opposed to three).

A similar exploit is possible with TCP, but requires closing the TCP
connection (either with a TCP reset or a regular 3-way connection
close), or sending an empty response with a zero length header. Any such
action with forces send_vc to exit and retry with the wrong buffer size
will trigger a similar failure as seen in send_dg.

Detailed Analysis:
==================

>From getaddrinfo we call into the NSS DNS module.

First in resolv/nss_dns/dns-host.c (_nss_dns_gethostbyname4_r):

 307   host_buffer.buf = orig_host_buffer = (querybuf *) alloca (2048);
...
 315   int n = __libc_res_nsearch (&_res, name, C_IN, T_UNSPEC,
 316                               host_buffer.buf->buf, 2048, &host_buffer.ptr,
 317                               &ans2p, &nans2p, &resplen2, &ans2p_malloced);

We have the alloca which is the root cause of the stack smash (even a
malloc'd buffer might be exploitable via malloc metadata manipluation
similar to CVE-2015-0235).

Then in resolv/res_send.c (send_dg):

We are in POLLIN reading the server response to our request:

1151         } else if (pfd[0].revents & POLLIN) {
1152                 int *thisanssizp;
1153                 u_char **thisansp;
1154                 int *thisresplenp;
1155 
1156                 if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
1157                         thisanssizp = anssizp;
1158                         thisansp = anscp ?: ansp;
1159                         assert (anscp != NULL || ansp2 == NULL);
1160                         thisresplenp = &resplen;

Neither reply is received yet so thisanssizp is anssizp e.g. 2048,
matching the user supplied buffer (from _nss_dns_gethostbyname4_r).

1189                 if (*thisanssizp < MAXPACKET
1190                     /* Yes, we test ANSCP here.  If we have two buffers
1191                        both will be allocatable.  */
1192                     && anscp
1193 #ifdef FIONREAD
1194                     && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
1195                         || *thisanssizp < *thisresplenp)
1196 #endif

The buffer is sufficient and `thisresplenp` is 2048 and that fits in the
user buffer.

1357                 /* Mark which reply we received.  */
1358                 if (recvresp1 == 0 && hp->id == anhp->id)
1359                         recvresp1 = 1;

We mark the first response as received, and go back to `wait:`

1362                 /* Repeat waiting if we have a second answer to arrive.  */
1363                 if ((recvresp1 & recvresp2) == 0) {
...
1374                         goto wait;

This time around though we have already received one reply, and buf2 is
non-NULL so we trigger this:

1162                         if (*anssizp != MAXPACKET) {
1163                                 /* No buffer allocated for the first
1164                                    reply.  We can try to use the rest
1165                                    of the user-provided buffer.  */
1166 #if _STRING_ARCH_unaligned
1167                                 *anssizp2 = orig_anssizp - resplen;
1168                                 *ansp2 = *ansp + resplen;

The use of MAXPACKET is a value/boolean-style check. The send_dg
internal logic uses a value of *anssizp of MAXPACKET to indicate an
internal buffer was allocated for the response. It does not contemplate
a user might pass in a buffer that big, but let us ignore that for now.

The above code, noting it has not allocated an malloc'd buffer, attempts
to use the remainder of the user buffer to store the second response to
avoid malloc.

However, the user buffer of 2048 was fully used by the 2048 response,
and the value of `orig_anssizp - resplen` is zero, so `*anssizp2` is
zero.

This isn't relevant to the failure itself, but serves to explain why we
trigger the malloc code below which leads to the failure.

1169 #else
...
1184                         thisanssizp = anssizp2;
1185                         thisansp = ansp2;
1186                         thisresplenp = resplen2;

Then `thisansp` points also just beyond the stack, but because the size
is zero we'll never use this pointer.

1189                 if (*thisanssizp < MAXPACKET
1190                     /* Yes, we test ANSCP here.  If we have two buffers
1191                        both will be allocatable.  */
1192                     && anscp
1193 #ifdef FIONREAD
1194                     && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
1195                         || *thisanssizp < *thisresplenp)

This is all true. We are reusing the user buffer so it's size is less
than MAXPACKET, yes we have `anscp` non-NULL pointer (allowed to modify
callers storage pointers), and the ioctl shows we have 10000 bytes
arriving.

1196 #endif
1197                     ) {
1198                         u_char *newp = malloc (MAXPACKET);
1199                         if (newp != NULL) {
1200                                 *anssizp = MAXPACKET;
1201                                 *thisansp = ans = newp;
1202                                 if (thisansp == ansp2)
1203                                   *ansp2_malloced = 1;
1204                         }
1205                 }

So we allocate a new buffer, set *anssizp to MAXPACKET, but fail to set
*ansp to the new buffer, and fail to update *thisanssizp to the new
size.

1209                 *thisresplenp = recvfrom(pfd[0].fd, (char*)*thisansp,
1210                                          *thisanssizp, 0,
1211                                         (struct sockaddr *)&from, &fromlen);

Then in recvfrom we read zero bytes because *thisanssizp is zero.

Which means we error out:

1212                 if (__glibc_unlikely (*thisresplenp <= 0))       {
...
1218                         goto err_out;

However, we return to __libc_res_nsend, and we have manipulated the
callers state (they are all pointers-to-pointers) incorrectly.

Firstly:

1167                                 *anssizp2 = orig_anssizp - resplen;

... this sets the size of answer buffer #2 to 0.


1185                         thisansp = ansp2;
...
1201                                 *thisansp = ans = newp;

... but then we set the answer buffer #2 pointer to the malloced block.

So now in __libc_res_nsend the second answer buffer has size 0, but is
malloced and points to a valid block of MAXPACKET bytes of memory.

Secondly:

1200                                 *anssizp = MAXPACKET;

... this sets the size of the answer buffer #1 to MAXPACKET bytes, but
does nothing else to change *ansp to the new malloc'd buffer.

So now in __libc_res_nsend the first answer buffer has a recorded size
of MAXPACKET bytes, but is still the same alloca'd space that is only
2048 bytes long.

The send_dg function exits, and we loop in __libc_res_nsend looking for
an answer with the next resolver. The buffers are reused and send_dg is
called again and this time it results in `MAXPACKET - 2048` bytes being
overflowed from the response directly onto the stack.

Playing with the attack buffer slightly and the timing gives another way
to trigger the stack smash.

Set the response buffer to 2049 or larger to trigger malloc buffer
reallocation right away.

1182                 if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
1183                         thisanssizp = anssizp;
1184                         thisansp = anscp ?: ansp;
1185                         assert (anscp != NULL || ansp2 == NULL);
1186                         thisresplenp = &resplen;

this executes and we assign `thisansp` to `anscp` which is a pointer
that is non-null if the caller intends to allow us to change the
allocation of the answer buffer.

Then we go on to allocate the new buffer:

1224                         u_char *newp = malloc (MAXPACKET);
1225                         if (newp != NULL) {
1226                                 *thisanssizp = MAXPACKET;
1227                                 *thisansp = ans = newp;
1228                                 if (thisansp == ansp2)
1229                                   *ansp2_malloced = 1;

We update `ans` (our cached copy of `ansp` or `&ans` in res_nsend), and
we update `*thisansp` which is `anscp` (`ansp` in res_nsend), but
nowhere do we update `ansp` which means that in res_nsend the two
pointers `&ans` and `ansp` point to two different buffers, but only have
one size `*anssizp` which is now MAXPACKET, but the original `ansp` is
still pointing at the 2048 alloca buffer.

Everything is OK for now since this works:

1235                 *thisresplenp = recvfrom(pfd[0].fd, (char*)*thisansp,
1236                                          *thisanssizp, 0,
1237                                         (struct sockaddr *)&from, &fromlen);

We point at the new buffer, and read the new length at most.

But if you delay the application enough to timeout the retry loop
(perhaps load induced in real life) and that causes us to exit send_dg
with 0 to indicate we should retry.

When we restart send_dg, we now have `&ans` pointing at the 2048 alloca
buffer but `&anssiz` is MAXPACKET now and doesn't match. This is OK for
the first answer because:

1184                 if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
1185                         thisanssizp = anssizp;
1186                         thisansp = anscp ?: ansp;
1187                         assert (anscp != NULL || ansp2 == NULL);
1188                         thisresplenp = &resplen;

... this again selects anscp (malloc'd storage) over ansp (alloca).

However, if we are quick this time and don't timeout, we flip over to
trying to make the second query and setup an ansp2 buffer:

1204                         } else {
1205                                 /* The first reply did not fit into the
1206                                    user-provided buffer.  Maybe the second
1207                                    answer will.  */
1208                                 *anssizp2 = orig_anssizp;
1209                                 *ansp2 = *ansp;
1210                         }

Here we assign ansp2 to ansp (alloca) with an orig_anssizp size which is
wrong (MAXPACKET).

The next time we do a recvfrom we will smash the stack again.

The same exact problems exist in send_vc.

At this point we set out to write validation tests to verify the various
aspects of the bug and the supported options used in the resolver.

The test cases showed 2 uninitialized uses in getaddrinfo under
valgrind:

==4917== Conditional jump or move depends on uninitialised value(s)
==4917==    at 0x512674E: gaih_inet (getaddrinfo.c:870)
==4917==    by 0x512866D: getaddrinfo (getaddrinfo.c:2417)
==4917==    by 0x400D0B: main (bug18665.c:166)
==4917==  Uninitialised value was created by a stack allocation
==4917==    at 0x5125850: gaih_inet (getaddrinfo.c:275)
==4917== 
==4917== Conditional jump or move depends on uninitialised value(s)
==4917==    at 0x5126584: gaih_inet (getaddrinfo.c:1079)
==4917==    by 0x512866D: getaddrinfo (getaddrinfo.c:2417)
==4917==    by 0x400D0B: main (bug18665.c:166)
==4917==  Uninitialised value was created by a stack allocation
==4917==    at 0x5125850: gaih_inet (getaddrinfo.c:275)

Which are caused by failure to set *h_errnop in
resolv/nss_dns/dns-host.c (gaih_getanswer_slice) which results in
gaih_inet jump/move depends on uninitialized value here:

 862                       status = DL_CALL_FCT (fct4, (name, pat, tmpbuf,
 863                                                    tmpbuflen, &rc, &herrno,
 864                                                    NULL));
 865                       if (status == NSS_STATUS_SUCCESS)
 866                         break;
 867                       if (status != NSS_STATUS_TRYAGAIN
 868                           || rc != ERANGE || herrno != NETDB_INTERNAL)
 869                         {
 870                           if (herrno == TRY_AGAIN)
                                   ^^^^^^
 871                             no_data = EAI_AGAIN;
 872                           else
 873                             no_data = herrno == NO_DATA;
 874                           break;

All paths through _nss_dns_gethostbyname4_r and called functions must
correctly set *h_errnop or we have an uninitialized use of the stack
allocated herrno variable. This is fixed in the patch.

As a final belt-and-suspenders change we adjust the size of malloc'd
second buffer to zero when freeing the buffer in __libc_res_nsearch.
This way there is no confusion about the size of the buffer on free'd.

Lastly we also fix a buffer status issue in gaih_getanswer where a
second response failing with ERANGE still results in an
NSS_STATUS_SUCCESS result.  We need deterministic results in the buffers
in order test the API completely.  Returning NSS_STATUS_SUCCESS might
have allowed an A query to succeed at the expense of a failing AAAA
query, but being unable to disambiguate a failure from a success makes
testing impossible. We document very carefully all of the return codes
and indicate with reason '[5]' the exact code that was changed.

Taken from the patch itself this change captures what we are fixing:

+      /* Do not return a truncated second response (unless it was
+         unavoidable e.g. unrecoverable TRYAGAIN).  */
+      if (status == NSS_STATUS_SUCCESS
+         && (status2 == NSS_STATUS_TRYAGAIN
+             && *errnop == ERANGE && *h_errnop != NO_RECOVERY))
+       status = NSS_STATUS_TRYAGAIN;

This completes the testing-related chages.

All in all we go slightly beyond the minimal changes required to fix
and test the failure, but we do this in order to ensure we prevent
future problems with this code.

Cheers,
Carlos.



CVE-2015-7547

2016-02-15  Carlos O'Donell  <carlos@redhat.com>

	[BZ #18665]
	* resolv/nss_dns/dns-host.c (gaih_getanswer_slice): Always set
	*herrno_p.
	(gaih_getanswer): Document functional behviour. Return tryagain
	if any result is tryagain.
	* resolv/res_query.c (__libc_res_nsearch): Set buffer size to zero
	when freed.
	* resolv/res_send.c: Add copyright text.
	(__libc_res_nsend): Document that MAXPACKET is expected.
	(send_vc): Document. Remove buffer reuse.
	(send_dg): Document. Remove buffer reuse. Set *thisanssizp to set the
	size of the buffer. Add Dprint for truncated UDP buffer.

diff --git a/resolv/nss_dns/dns-host.c b/resolv/nss_dns/dns-host.c
index a255d5e..47cfe27 100644
--- a/resolv/nss_dns/dns-host.c
+++ b/resolv/nss_dns/dns-host.c
@@ -1031,7 +1031,10 @@ gaih_getanswer_slice (const querybuf *answer, int anslen, const char *qname,
   int h_namelen = 0;
 
   if (ancount == 0)
-    return NSS_STATUS_NOTFOUND;
+    {
+      *h_errnop = HOST_NOT_FOUND;
+      return NSS_STATUS_NOTFOUND;
+    }
 
   while (ancount-- > 0 && cp < end_of_message && had_error == 0)
     {
@@ -1208,7 +1211,14 @@ gaih_getanswer_slice (const querybuf *answer, int anslen, const char *qname,
   /* Special case here: if the resolver sent a result but it only
      contains a CNAME while we are looking for a T_A or T_AAAA record,
      we fail with NOTFOUND instead of TRYAGAIN.  */
-  return canon == NULL ? NSS_STATUS_TRYAGAIN : NSS_STATUS_NOTFOUND;
+  if (canon != NULL)
+    {
+      *h_errnop = HOST_NOT_FOUND;
+      return NSS_STATUS_NOTFOUND;
+    }
+
+  *h_errnop = NETDB_INTERNAL;
+  return NSS_STATUS_TRYAGAIN;
 }
 
 
@@ -1222,11 +1232,101 @@ gaih_getanswer (const querybuf *answer1, int anslen1, const querybuf *answer2,
 
   enum nss_status status = NSS_STATUS_NOTFOUND;
 
+  /* Combining the NSS status of two distinct queries requires some
+     compromise and attention to symmetry (A or AAAA queries can be
+     returned in any order).  What follows is a breakdown of how this
+     code is expected to work and why. We discuss only SUCCESS,
+     TRYAGAIN, NOTFOUND and UNAVAIL, since they are the only returns
+     that apply (though RETURN and MERGE exist).  We make a distinction
+     between TRYAGAIN (recoverable) and TRYAGAIN' (not-recoverable).
+     A recoverable TRYAGAIN is almost always due to buffer size issues
+     and returns ERANGE in errno and the caller is expected to retry
+     with a larger buffer.
+
+     Lastly, you may be tempted to make significant changes to the
+     conditions in this code to bring about symmetry between responses.
+     Please don't change anything without due consideration for
+     expected application behaviour.  Some of the synthesized responses
+     aren't very well thought out and sometimes appear to imply that
+     IPv4 responses are always answer 1, and IPv6 responses are always
+     answer 2, but that's not true (see the implemetnation of send_dg
+     and send_vc to see response can arrive in any order, particlarly
+     for UDP). However, we expect it holds roughly enough of the time
+     that this code works, but certainly needs to be fixed to make this
+     a more robust implementation.
+
+     ----------------------------------------------
+     | Answer 1 Status /   | Synthesized | Reason |
+     | Answer 2 Status     | Status      |        |
+     |--------------------------------------------|
+     | SUCCESS/SUCCESS     | SUCCESS     | [1]    |
+     | SUCCESS/TRYAGAIN    | TRYAGAIN    | [5]    |
+     | SUCCESS/TRYAGAIN'   | SUCCESS     | [1]    |
+     | SUCCESS/NOTFOUND    | SUCCESS     | [1]    |
+     | SUCCESS/UNAVAIL     | SUCCESS     | [1]    |
+     | TRYAGAIN/SUCCESS    | TRYAGAIN    | [2]    |
+     | TRYAGAIN/TRYAGAIN   | TRYAGAIN    | [2]    |
+     | TRYAGAIN/TRYAGAIN'  | TRYAGAIN    | [2]    |
+     | TRYAGAIN/NOTFOUND   | TRYAGAIN    | [2]    |
+     | TRYAGAIN/UNAVAIL    | TRYAGAIN    | [2]    |
+     | TRYAGAIN'/SUCCESS   | SUCCESS     | [3]    |
+     | TRYAGAIN'/TRYAGAIN  | TRYAGAIN    | [3]    |
+     | TRYAGAIN'/TRYAGAIN' | TRYAGAIN'   | [3]    |
+     | TRYAGAIN'/NOTFOUND  | TRYAGAIN'   | [3]    |
+     | TRYAGAIN'/UNAVAIL   | UNAVAIL     | [3]    |
+     | NOTFOUND/SUCCESS    | SUCCESS     | [3]    |
+     | NOTFOUND/TRYAGAIN   | TRYAGAIN    | [3]    |
+     | NOTFOUND/TRYAGAIN'  | TRYAGAIN'   | [3]    |
+     | NOTFOUND/NOTFOUND   | NOTFOUND    | [3]    |
+     | NOTFOUND/UNAVAIL    | UNAVAIL     | [3]    |
+     | UNAVAIL/SUCCESS     | UNAVAIL     | [4]    |
+     | UNAVAIL/TRYAGAIN    | UNAVAIL     | [4]    |
+     | UNAVAIL/TRYAGAIN'   | UNAVAIL     | [4]    |
+     | UNAVAIL/NOTFOUND    | UNAVAIL     | [4]    |
+     | UNAVAIL/UNAVAIL     | UNAVAIL     | [4]    |
+     ----------------------------------------------
+
+     [1] If the first response is a success we return success.
+         This ignores the state of the second answer and in fact
+         incorrectly sets errno and h_errno to that of the second
+	 answer.  However because the response is a success we ignore
+	 *errnop and *h_errnop (though that means you touched errno on
+         success).  We are being conservative here and returning the
+         likely IPv4 response in the first answer as a success.
+
+     [2] If the first response is a recoverable TRYAGAIN we return
+	 that instead of looking at the second response.  The
+	 expectation here is that we have failed to get an IPv4 response
+	 and should retry both queries.
+
+     [3] If the first response was not a SUCCESS and the second
+	 response is not NOTFOUND (had a SUCCESS, need to TRYAGAIN,
+	 or failed entirely e.g. TRYAGAIN' and UNAVAIL) then use the
+	 result from the second response, otherwise the first responses
+	 status is used.  Again we have some odd side-effects when the
+	 second response is NOTFOUND because we overwrite *errnop and
+	 *h_errnop that means that a first answer of NOTFOUND might see
+	 its *errnop and *h_errnop values altered.  Whether it matters
+	 in practice that a first response NOTFOUND has the wrong
+	 *errnop and *h_errnop is undecided.
+
+     [4] If the first response is UNAVAIL we return that instead of
+	 looking at the second response.  The expectation here is that
+	 it will have failed similarly e.g. configuration failure.
+
+     [5] Testing this code is complicated by the fact that truncated
+	 second response buffers might be returned as SUCCESS if the
+	 first answer is a SUCCESS.  To fix this we add symmetry to
+	 TRYAGAIN with the second response.  If the second response
+	 is a recoverable error we now return TRYAGIN even if the first
+	 response was SUCCESS.  */
+
   if (anslen1 > 0)
     status = gaih_getanswer_slice(answer1, anslen1, qname,
 				  &pat, &buffer, &buflen,
 				  errnop, h_errnop, ttlp,
 				  &first);
+
   if ((status == NSS_STATUS_SUCCESS || status == NSS_STATUS_NOTFOUND
        || (status == NSS_STATUS_TRYAGAIN
 	   /* We want to look at the second answer in case of an
@@ -1242,8 +1342,15 @@ gaih_getanswer (const querybuf *answer1, int anslen1, const querybuf *answer2,
 						     &pat, &buffer, &buflen,
 						     errnop, h_errnop, ttlp,
 						     &first);
+      /* Use the second response status in some cases.  */
       if (status != NSS_STATUS_SUCCESS && status2 != NSS_STATUS_NOTFOUND)
 	status = status2;
+      /* Do not return a truncated second response (unless it was
+         unavoidable e.g. unrecoverable TRYAGAIN).  */
+      if (status == NSS_STATUS_SUCCESS
+	  && (status2 == NSS_STATUS_TRYAGAIN
+	      && *errnop == ERANGE && *h_errnop != NO_RECOVERY))
+	status = NSS_STATUS_TRYAGAIN;
     }
 
   return status;
diff --git a/resolv/res_query.c b/resolv/res_query.c
index 4a9b3b3..95470a9 100644
--- a/resolv/res_query.c
+++ b/resolv/res_query.c
@@ -396,6 +396,7 @@ __libc_res_nsearch(res_state statp,
 		  {
 		    free (*answerp2);
 		    *answerp2 = NULL;
+		    *nanswerp2 = 0;
 		    *answerp2_malloced = 0;
 		  }
 	}
@@ -447,6 +448,7 @@ __libc_res_nsearch(res_state statp,
 			  {
 			    free (*answerp2);
 			    *answerp2 = NULL;
+			    *nanswerp2 = 0;
 			    *answerp2_malloced = 0;
 			  }
 
@@ -521,6 +523,7 @@ __libc_res_nsearch(res_state statp,
 	  {
 	    free (*answerp2);
 	    *answerp2 = NULL;
+	    *nanswerp2 = 0;
 	    *answerp2_malloced = 0;
 	  }
 	if (saved_herrno != -1)
diff --git a/resolv/res_send.c b/resolv/res_send.c
index a968b95..21843f1 100644
--- a/resolv/res_send.c
+++ b/resolv/res_send.c
@@ -1,3 +1,20 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
 /*
  * Copyright (c) 1985, 1989, 1993
  *    The Regents of the University of California.  All rights reserved.
@@ -355,6 +372,8 @@ __libc_res_nsend(res_state statp, const u_char *buf, int buflen,
 #ifdef USE_HOOKS
 	if (__glibc_unlikely (statp->qhook || statp->rhook))       {
 		if (anssiz < MAXPACKET && ansp) {
+			/* Always allocate MAXPACKET, callers expect
+			   this specific size.  */
 			u_char *buf = malloc (MAXPACKET);
 			if (buf == NULL)
 				return (-1);
@@ -630,6 +649,77 @@ get_nsaddr (res_state statp, int n)
     return (struct sockaddr *) (void *) &statp->nsaddr_list[n];
 }
 
+/* The send_vc function is responsible for sending a DNS query over TCP
+   to the nameserver numbered NS from the res_state STATP i.e.
+   EXT(statp).nssocks[ns].  The function supports sending both IPv4 and
+   IPv6 queries at the same serially on the same socket.
+
+   Please note that for TCP there is no way to disable sending both
+   queries, unlike UDP, which honours RES_SNGLKUP and RES_SNGLKUPREOP
+   and sends the queries serially and waits for the result after each
+   sent query.  This implemetnation should be corrected to honour these
+   options.
+
+   Please also note that for TCP we send both queries over the same
+   socket one after another.  This technically violates best practice
+   since the server is allowed to read the first query, respond, and
+   then close the socket (to service another client).  If the server
+   does this, then the remaining second query in the socket data buffer
+   will cause the server to send the client an RST which will arrive
+   asynchronously and the client's OS will likely tear down the socket
+   receive buffer resulting in a potentially short read and lost
+   response data.  This will force the client to retry the query again,
+   and this process may repeat until all servers and connection resets
+   are exhausted and then the query will fail.  It's not known if this
+   happens with any frequency in real DNS server implementations.  This
+   implementation should be corrected to use two sockets by default for
+   parallel queries.
+
+   The query stored in BUF of BUFLEN length is sent first followed by
+   the query stored in BUF2 of BUFLEN2 length.  Queries are sent
+   serially on the same socket.
+
+   Answers to the query are stored firstly in *ANSP up to a max of
+   *ANSSIZP bytes.  If more than *ANSSIZP bytes are needed and ANSCP
+   is non-NULL (to indicate that modifying the answer buffer is allowed)
+   then malloc is used to allocate a new response buffer and ANSCP and
+   ANSP will both point to the new buffer.  If more than *ANSSIZP bytes
+   are needed but ANSCP is NULL, then as much of the response as
+   possible is read into the buffer, but the results will be truncated.
+   When truncation happens because of a small answer buffer the DNS
+   packets header feild TC will bet set to 1, indicating a truncated
+   message and the rest of the socket data will be read and discarded.
+
+   Answers to the query are stored secondly in *ANSP2 up to a max of
+   *ANSSIZP2 bytes, with the actual response length stored in
+   *RESPLEN2.  If more than *ANSSIZP bytes are needed and ANSP2
+   is non-NULL (required for a second query) then malloc is used to
+   allocate a new response buffer, *ANSSIZP2 is set to the new buffer
+   size and *ANSP2_MALLOCED is set to 1.
+
+   The ANSP2_MALLOCED argument will eventually be removed as the
+   change in buffer pointer can be used to detect the buffer has
+   changed and that the caller should use free on the new buffer.
+
+   Note that the answers may arrive in any order from the server and
+   therefore the first and second answer buffers may not correspond to
+   the first and second queries.
+
+   It is not supported to call this function with a non-NULL ANSP2
+   but a NULL ANSCP.  Put another way, you can call send_vc with a
+   single unmodifiable buffer or two modifiable buffers, but no other
+   combination is supported.
+
+   It is the caller's responsibility to free the malloc allocated
+   buffers by detecting that the pointers have changed from their
+   original values i.e. *ANSCP or *ANSP2 has changed.
+
+   If errors are encountered then *TERRNO is set to an appropriate
+   errno value and a zero result is returned for a recoverable error,
+   and a less-than zero result is returned for a non-recoverable error.
+
+   If no errors are encountered then *TERRNO is left unmodified and
+   a the length of the first response in bytes is returned.  */
 static int
 send_vc(res_state statp,
 	const u_char *buf, int buflen, const u_char *buf2, int buflen2,
@@ -639,11 +729,7 @@ send_vc(res_state statp,
 {
 	const HEADER *hp = (HEADER *) buf;
 	const HEADER *hp2 = (HEADER *) buf2;
-	u_char *ans = *ansp;
-	int orig_anssizp = *anssizp;
-	// XXX REMOVE
-	// int anssiz = *anssizp;
-	HEADER *anhp = (HEADER *) ans;
+	HEADER *anhp = (HEADER *) *ansp;
 	struct sockaddr *nsap = get_nsaddr (statp, ns);
 	int truncating, connreset, n;
 	/* On some architectures compiler might emit a warning indicating
@@ -731,6 +817,8 @@ send_vc(res_state statp,
 	 * Receive length & response
 	 */
 	int recvresp1 = 0;
+	/* Skip the second response if there is no second query.
+           To do that we mark the second response as received.  */
 	int recvresp2 = buf2 == NULL;
 	uint16_t rlen16;
  read_len:
@@ -767,36 +855,14 @@ send_vc(res_state statp,
 	u_char **thisansp;
 	int *thisresplenp;
 	if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
+		/* We have not received any responses
+		   yet or we only have one response to
+		   receive.  */
 		thisanssizp = anssizp;
 		thisansp = anscp ?: ansp;
 		assert (anscp != NULL || ansp2 == NULL);
 		thisresplenp = &resplen;
 	} else {
-		if (*anssizp != MAXPACKET) {
-			/* No buffer allocated for the first
-			   reply.  We can try to use the rest
-			   of the user-provided buffer.  */
-			DIAG_PUSH_NEEDS_COMMENT;
-			DIAG_IGNORE_NEEDS_COMMENT (5, "-Wmaybe-uninitialized");
-#if _STRING_ARCH_unaligned
-			*anssizp2 = orig_anssizp - resplen;
-			*ansp2 = *ansp + resplen;
-#else
-			int aligned_resplen
-			  = ((resplen + __alignof__ (HEADER) - 1)
-			     & ~(__alignof__ (HEADER) - 1));
-			*anssizp2 = orig_anssizp - aligned_resplen;
-			*ansp2 = *ansp + aligned_resplen;
-#endif
-			DIAG_POP_NEEDS_COMMENT;
-		} else {
-			/* The first reply did not fit into the
-			   user-provided buffer.  Maybe the second
-			   answer will.  */
-			*anssizp2 = orig_anssizp;
-			*ansp2 = *ansp;
-		}
-
 		thisanssizp = anssizp2;
 		thisansp = ansp2;
 		thisresplenp = resplen2;
@@ -804,10 +870,14 @@ send_vc(res_state statp,
 	anhp = (HEADER *) *thisansp;
 
 	*thisresplenp = rlen;
-	if (rlen > *thisanssizp) {
-		/* Yes, we test ANSCP here.  If we have two buffers
-		   both will be allocatable.  */
-		if (__glibc_likely (anscp != NULL))       {
+	/* Is the answer buffer too small?  */
+	if (*thisanssizp < rlen) {
+		/* If the current buffer is not the the static
+		   user-supplied buffer then we can reallocate
+		   it.  */
+		if (thisansp != NULL && thisansp != ansp) {
+			/* Always allocate MAXPACKET, callers expect
+			   this specific size.  */
 			u_char *newp = malloc (MAXPACKET);
 			if (newp == NULL) {
 				*terrno = ENOMEM;
@@ -819,6 +889,9 @@ send_vc(res_state statp,
 			if (thisansp == ansp2)
 			  *ansp2_malloced = 1;
 			anhp = (HEADER *) newp;
+			/* A uint16_t can't be larger than MAXPACKET
+			   thus it's safe to allocate MAXPACKET but
+			   read RLEN bytes instead.  */
 			len = rlen;
 		} else {
 			Dprint(statp->options & RES_DEBUG,
@@ -948,6 +1021,66 @@ reopen (res_state statp, int *terrno, int ns)
 	return 1;
 }
 
+/* The send_dg function is responsible for sending a DNS query over UDP
+   to the nameserver numbered NS from the res_state STATP i.e.
+   EXT(statp).nssocks[ns].  The function supports IPv4 and IPv6 queries
+   along with the ability to send the query in parallel for both stacks
+   (default) or serially (RES_SINGLKUP).  It also supports serial lookup
+   with a close and reopen of the socket used to talk to the server
+   (RES_SNGLKUPREOP) to work around broken name servers.
+
+   The query stored in BUF of BUFLEN length is sent first followed by
+   the query stored in BUF2 of BUFLEN2 length.  Queries are sent
+   in parallel (default) or serially (RES_SINGLKUP or RES_SNGLKUPREOP).
+
+   Answers to the query are stored firstly in *ANSP up to a max of
+   *ANSSIZP bytes.  If more than *ANSSIZP bytes are needed and ANSCP
+   is non-NULL (to indicate that modifying the answer buffer is allowed)
+   then malloc is used to allocate a new response buffer and ANSCP and
+   ANSP will both point to the new buffer.  If more than *ANSSIZP bytes
+   are needed but ANSCP is NULL, then as much of the response as
+   possible is read into the buffer, but the results will be truncated.
+   When truncation happens because of a small answer buffer the DNS
+   packets header feild TC will bet set to 1, indicating a truncated
+   message, while the rest of the UDP packet is discarded.
+
+   Answers to the query are stored secondly in *ANSP2 up to a max of
+   *ANSSIZP2 bytes, with the actual response length stored in
+   *RESPLEN2.  If more than *ANSSIZP bytes are needed and ANSP2
+   is non-NULL (required for a second query) then malloc is used to
+   allocate a new response buffer, *ANSSIZP2 is set to the new buffer
+   size and *ANSP2_MALLOCED is set to 1.
+
+   The ANSP2_MALLOCED argument will eventually be removed as the
+   change in buffer pointer can be used to detect the buffer has
+   changed and that the caller should use free on the new buffer.
+
+   Note that the answers may arrive in any order from the server and
+   therefore the first and second answer buffers may not correspond to
+   the first and second queries.
+
+   It is not supported to call this function with a non-NULL ANSP2
+   but a NULL ANSCP.  Put another way, you can call send_vc with a
+   single unmodifiable buffer or two modifiable buffers, but no other
+   combination is supported.
+
+   It is the caller's responsibility to free the malloc allocated
+   buffers by detecting that the pointers have changed from their
+   original values i.e. *ANSCP or *ANSP2 has changed.
+
+   If an answer is truncated because of UDP datagram DNS limits then
+   *V_CIRCUIT is set to 1 and the return value non-zero to indicate to
+   the caller to retry with TCP.  The value *GOTSOMEWHERE is set to 1
+   if any progress was made reading a response from the nameserver and
+   is used by the caller to distinguish between ECONNREFUSED and
+   ETIMEDOUT (the latter if *GOTSOMEWHERE is 1).
+
+   If errors are encountered then *TERRNO is set to an appropriate
+   errno value and a zero result is returned for a recoverable error,
+   and a less-than zero result is returned for a non-recoverable error.
+
+   If no errors are encountered then *TERRNO is left unmodified and
+   a the length of the first response in bytes is returned.  */
 static int
 send_dg(res_state statp,
 	const u_char *buf, int buflen, const u_char *buf2, int buflen2,
@@ -957,8 +1090,6 @@ send_dg(res_state statp,
 {
 	const HEADER *hp = (HEADER *) buf;
 	const HEADER *hp2 = (HEADER *) buf2;
-	u_char *ans = *ansp;
-	int orig_anssizp = *anssizp;
 	struct timespec now, timeout, finish;
 	struct pollfd pfd[1];
 	int ptimeout;
@@ -991,6 +1122,8 @@ send_dg(res_state statp,
 	int need_recompute = 0;
 	int nwritten = 0;
 	int recvresp1 = 0;
+	/* Skip the second response if there is no second query.
+           To do that we mark the second response as received.  */
 	int recvresp2 = buf2 == NULL;
 	pfd[0].fd = EXT(statp).nssocks[ns];
 	pfd[0].events = POLLOUT;
@@ -1154,55 +1287,56 @@ send_dg(res_state statp,
 		int *thisresplenp;
 
 		if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
+			/* We have not received any responses
+			   yet or we only have one response to
+			   receive.  */
 			thisanssizp = anssizp;
 			thisansp = anscp ?: ansp;
 			assert (anscp != NULL || ansp2 == NULL);
 			thisresplenp = &resplen;
 		} else {
-			if (*anssizp != MAXPACKET) {
-				/* No buffer allocated for the first
-				   reply.  We can try to use the rest
-				   of the user-provided buffer.  */
-#if _STRING_ARCH_unaligned
-				*anssizp2 = orig_anssizp - resplen;
-				*ansp2 = *ansp + resplen;
-#else
-				int aligned_resplen
-				  = ((resplen + __alignof__ (HEADER) - 1)
-				     & ~(__alignof__ (HEADER) - 1));
-				*anssizp2 = orig_anssizp - aligned_resplen;
-				*ansp2 = *ansp + aligned_resplen;
-#endif
-			} else {
-				/* The first reply did not fit into the
-				   user-provided buffer.  Maybe the second
-				   answer will.  */
-				*anssizp2 = orig_anssizp;
-				*ansp2 = *ansp;
-			}
-
 			thisanssizp = anssizp2;
 			thisansp = ansp2;
 			thisresplenp = resplen2;
 		}
 
 		if (*thisanssizp < MAXPACKET
-		    /* Yes, we test ANSCP here.  If we have two buffers
-		       both will be allocatable.  */
-		    && anscp
+		    /* If the current buffer is not the the static
+		       user-supplied buffer then we can reallocate
+		       it.  */
+		    && (thisansp != NULL && thisansp != ansp)
 #ifdef FIONREAD
+		    /* Is the size too small?  */
 		    && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
 			|| *thisanssizp < *thisresplenp)
 #endif
                     ) {
+			/* Always allocate MAXPACKET, callers expect
+			   this specific size.  */
 			u_char *newp = malloc (MAXPACKET);
 			if (newp != NULL) {
-				*anssizp = MAXPACKET;
-				*thisansp = ans = newp;
+				*thisanssizp = MAXPACKET;
+				*thisansp = newp;
 				if (thisansp == ansp2)
 				  *ansp2_malloced = 1;
 			}
 		}
+		/* We could end up with truncation if anscp was NULL
+		   (not allowed to change caller's buffer) and the
+		   response buffer size is too small.  This isn't a
+		   reliable way to detect truncation because the ioctl
+		   may be an inaccurate report of the UDP message size.
+		   Therefore we use this only to issue debug output.
+		   To do truncation accurately with UDP we need
+		   MSG_TRUNC which is only available on Linux.  We
+		   can abstract out the Linux-specific feature in the
+		   future to detect truncation.  */
+		if (__glibc_unlikely (*thisanssizp < *thisresplenp)) {
+			Dprint(statp->options & RES_DEBUG,
+			       (stdout, ";; response may be truncated (UDP)\n")
+			);
+		}
+
 		HEADER *anhp = (HEADER *) *thisansp;
 		socklen_t fromlen = sizeof(struct sockaddr_in6);
 		assert (sizeof(from) <= fromlen);


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]