This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] resolv: Always set *resplen2 out parameter in send_dg [BZ #19791]


On 03/23/2016 04:22 PM, Florian Weimer wrote:
> This is the new patch.
> 
> It sets *resplen2 strictly before each return (except for the success
> case).  I have convinced myself that this does not alter any correct
> cases.  If send_dg does not return 0

This patch looks good to me, though it took me a while to understand
the code path that triggers the flaw and for the sake of correctness
I'm going to write it down below.

My concerns are that this change adds complexity and I wonder if there
isn't a simpler solution to just unconditionally clear *resplen2 if
resplen2 is non-NULL?

So in summary:

- Patch looks good to me.
- If *resplen2 can't be set unconditionally I think this patch should
  be applied as-is.
- If *resplen2 can be set unconditionally we should do so to simplify
  the patch.

> I have a companion patch for send_vc (bug 19825) which I still need to
> submit.
> 
> I ran the external resolver test suite, apart from a tst-partial-fail
> failure due to libresolv's newly found ability to find non-broken
> responses, there are no failures.
> 
> Thanks,
> Florian
> 
> Since commit 44d20bca52ace85850012b0ead37b360e3ecd96e (Implement
> second fallback mode for DNS requests), there is a code path which
> returns early, before *resplen2 is initialized.  This happens if the
> name server address is immediately recognized as invalid (because of
> lack of protocol support, or if it is a broadcast address such
> 255.255.255.255, or another invalid address).
>

I assume the case looks like this:

- send_dg(): sets *resplen2 in error and returns.
- __libc_res_nsend(): tries an invalid server (ipv6 with ipv6 disabled)
- send_dg(): calls reopen() which ignores the server as invalid
             and returns without clearing *resplen2
- __libc_res_nsend():
 546                         if (n == 0 && (buf2 == NULL || *resplen2 == 0))
 547                                 goto next_ns;
 Line 546, particularly the *resplen2 == 0 prevents us from switching to the
 next ns because we believe we have a valid answer in the second response
 buffer.

It isn't entirely clear to me how you would get an initial failure that
would set *resplen2 to non-zero in a configuration as described in BZ
#19791. It seems like you would always fail first and that would mean
you'd be stuck looping over servers you can't contact.

I agree that there is a problem here and that clearing *resplen2 in all
paths is appropriate.

> If this happens and *resplen2 was non-zero (which is the case if a
> previous query resulted in a failure), __libc_res_nquery would reuse
> an existing second answer buffer.  This answer has been previously
> identified as unusable (for example, it could be an NXDOMAIN
> response).  Due to the presence of a second answer, no name server
> switching will occur.  The result is a name resolution failure,
> although a successful resolution would have been possible if name
> servers have been switched and queries had proceeded along the search
> path.
> 
> The above paragraph still simplifies the situation.  Before glibc
> 2.23, if the second answer needed malloc, the stub resolver would
> still attempt to reuse the second answer, but this is not possible
> because __libc_res_nsearch has freed it, after the unsuccessful call
> to __libc_res_nquerydomain, and set the buffer pointer to NULL.  This
> eventually leads to an assertion failure in __libc_res_nquery:
> 
> 	/* Make sure both hp and hp2 are defined */
> 	assert((hp != NULL) && (hp2 != NULL));
> 
> If assertions are disabled, the consequence is a NULL pointer
> dereference on the next line.
> 
> Starting with glibc 2.23, as a result of commit
> e9db92d3acfe1822d56d11abcea5bfc4c41cf6ca (CVE-2015-7547: getaddrinfo()
> stack-based buffer overflow (Bug 18665)), the second answer is always
> allocated with malloc.  This means that the assertion failure happens
> with small responses as well because there is no buffer to reuse, as
> soon as there is a name resolution failure which triggers a search for
> an answer along the search path.
> 
> This commit addresses the issue by ensuring that *resplen2 is
> initialized before the send_dg function returns.
> 
> This commit also addresses a bug where an invalid second reply is
> incorrectly returned as a valid to the caller.
> 
> 
> 0001-resolv-Always-set-resplen2-out-parameter-in-send_dg-.patch
> 
> 
> 2016-03-23  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #19791]
> 	* resolv/res_send.c (close_and_return_error): New function.
> 	(send_dg): Initialize *resplen2 after reopen failure.  Call
> 	close_and_return_error for error returns.  On error paths without
> 	__res_iclose, initialze *resplen2 explicitly.  Update comment for
> 	successful return.
> 
> diff --git a/resolv/res_send.c b/resolv/res_send.c
> index 25c19f1..b4efcb6 100644
> --- a/resolv/res_send.c
> +++ b/resolv/res_send.c
> @@ -649,6 +649,18 @@ get_nsaddr (res_state statp, int n)
>      return (struct sockaddr *) (void *) &statp->nsaddr_list[n];
>  }
>  
> +/* Close the resolver structure, assign zero to *RESPLEN2 if RESPLEN2
> +   is not NULL, and return zero.  */
> +static int
> +__attribute__ ((warn_unused_result))
> +close_and_return_error (res_state statp, int *resplen2)
> +{
> +  __res_iclose(statp, false);
> +  if (resplen2 != NULL)
> +    *resplen2 = 0;
> +  return 0;
> +}

OK, provide a function to do the cleanup in a common case.

All paths that would call this function either succeeded at
response #1 and failed at response #2 or failed at response #1
and stopped, in both cases setting *resplen2 to zero is OK.
If you knew this was a failure for response #1 you could avoid
the clearing of *resplen2, but that's so complicated at this
point that it's not worth it.

> +
>  /* The send_vc function is responsible for sending a DNS query over TCP
>     to the nameserver numbered NS from the res_state STATP i.e.
>     EXT(statp).nssocks[ns].  The function supports sending both IPv4 and	
> @@ -1114,7 +1126,11 @@ send_dg(res_state statp,
>   retry_reopen:
>  	retval = reopen (statp, terrno, ns);
>  	if (retval <= 0)
> -		return retval;
> +	  {
> +	    if (resplen2 != NULL)
> +	      *resplen2 = 0;
> +	    return retval;
> +	  }

OK.

>   retry:
>  	evNowTime(&now);
>  	evConsTime(&timeout, seconds, 0);
> @@ -1127,8 +1143,6 @@ send_dg(res_state statp,
>  	int recvresp2 = buf2 == NULL;
>  	pfd[0].fd = EXT(statp).nssocks[ns];
>  	pfd[0].events = POLLOUT;
> -	if (resplen2 != NULL)
> -	  *resplen2 = 0;

OK, not needed since every other path sets it to zero.

>   wait:
>  	if (need_recompute) {
>  	recompute_resend:
> @@ -1136,9 +1150,7 @@ send_dg(res_state statp,
>  		if (evCmpTime(finish, now) <= 0) {
>  		poll_err_out:
>  			Perror(statp, stderr, "poll", errno);
> -		err_out:

Thank you for getting rid of another goto label in this already too complex
function.

> -			__res_iclose(statp, false);
> -			return (0);
> +			return close_and_return_error (statp, resplen2);

OK, fixes another return path that didn't set *resplen2 to zero.

>  		}
>  		evSubTime(&timeout, &finish, &now);
>  		need_recompute = 0;

FYI, it annoys me that we set *resplen2 to 1 here as a `hack` to cause
the code in __libc_res_nsend to avoid switching to the next nameserver.

1185                     *resplen2 = 1;
1186                     return resplen;

We are overloading the meaning of *resplen2 here for our own purposes
of avoiding switching to the next nameserver, but the only way to fix
that is to add explicit information about the desire for send_dg to
switch or not switch to the next nameserver.

> @@ -1185,7 +1197,9 @@ send_dg(res_state statp,
>  		  }
>  
>  		*gotsomewhere = 1;
> -		return (0);
> +		if (resplen2 != NULL)
> +		  *resplen2 = 0;
> +		return 0;

OK.

>  	}
>  	if (n < 0) {
>  		if (errno == EINTR)
> @@ -1253,7 +1267,7 @@ send_dg(res_state statp,
>  
>  		      fail_sendmmsg:
>  			Perror(statp, stderr, "sendmmsg", errno);
> -			goto err_out;
> +			return close_and_return_error (statp, resplen2);

OK.

>  		      }
>  		  }
>  		else
> @@ -1271,7 +1285,7 @@ send_dg(res_state statp,
>  		      if (errno == EINTR || errno == EAGAIN)
>  			goto recompute_resend;
>  		      Perror(statp, stderr, "send", errno);
> -		      goto err_out;
> +		      return close_and_return_error (statp, resplen2);

OK.

>  		    }
>  		  just_one:
>  		    if (nwritten != 0 || buf2 == NULL || single_request)
> @@ -1349,7 +1363,7 @@ send_dg(res_state statp,
>  				goto wait;
>  			}
>  			Perror(statp, stderr, "recvfrom", errno);
> -			goto err_out;
> +			return close_and_return_error (statp, resplen2);

OK.

>  		}
>  		*gotsomewhere = 1;
>  		if (__glibc_unlikely (*thisresplenp < HFIXEDSZ))       {
> @@ -1360,7 +1374,7 @@ send_dg(res_state statp,
>  			       (stdout, ";; undersized: %d\n",
>  				*thisresplenp));
>  			*terrno = EMSGSIZE;
> -			goto err_out;
> +			return close_and_return_error (statp, resplen2);

OK.

>  		}
>  		if ((recvresp1 || hp->id != anhp->id)
>  		    && (recvresp2 || hp2->id != anhp->id)) {
> @@ -1409,7 +1423,7 @@ send_dg(res_state statp,
>  				? *thisanssizp : *thisresplenp);
>  			/* record the error */
>  			statp->_flags |= RES_F_EDNS0ERR;
> -			goto err_out;
> +			return close_and_return_error (statp, resplen2);

OK.

>  	}
>  #endif
>  		if (!(statp->options & RES_INSECURE2)
> @@ -1461,10 +1475,10 @@ send_dg(res_state statp,
>  			    goto wait;
>  			  }
>  
> -			__res_iclose(statp, false);
>  			/* don't retry if called from dig */
>  			if (!statp->pfcode)
> -				return (0);
> +			  return close_and_return_error (statp, resplen2);
> +			__res_iclose(statp, false);

OK.

Please kill next_ns label some day soon :-)

>  		}
>  		if (anhp->rcode == NOERROR && anhp->ancount == 0
>  		    && anhp->aa == 0 && anhp->ra == 0 && anhp->arcount == 0) {
> @@ -1486,6 +1500,8 @@ send_dg(res_state statp,
>  			__res_iclose(statp, false);
>  			// XXX if we have received one reply we could
>  			// XXX use it and not repeat it over TCP...
> +			if (resplen2 != NULL)
> +			  *resplen2 = 0;

OK.

>  			return (1);

Again, this annoys me that we overload size 1 to prevent ns switching
from happening, which is something we should be explicit about, but that
would require better data structures in this code and that's a future change.

Notice the non-obiouvs trickery with things like "resplen > 1" to avoid the use
of "1" to mean we stayed on the same nameserver (not to mention the use of
saved

>  		}
>  		/* Mark which reply we received.  */
> @@ -1501,21 +1517,22 @@ send_dg(res_state statp,
>  					__res_iclose (statp, false);
>  					retval = reopen (statp, terrno, ns);
>  					if (retval <= 0)
> -						return retval;
> +					  {
> +					    if (resplen2 != NULL)
> +					      *resplen2 = 0;
> +					    return retval;
> +					  }

OK.

>  					pfd[0].fd = EXT(statp).nssocks[ns];
>  				}
>  			}
>  			goto wait;
>  		}
> -		/*
> -		 * All is well, or the error is fatal.  Signal that the
> -		 * next nameserver ought not be tried.
> -		 */
> +		/* All is well.  We have received both responses (if
> +		   two responses were requested).  */

OK.

>  		return (resplen);
> -	} else if (pfd[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
> -		/* Something went wrong.  We can stop trying.  */
> -		goto err_out;
> -	}
> +	} else if (pfd[0].revents & (POLLERR | POLLHUP | POLLNVAL))
> +	  /* Something went wrong.  We can stop trying.  */
> +	  return close_and_return_error (statp, resplen2);

OK.

>  	else {
>  		/* poll should not have returned > 0 in this case.  */
>  		abort ();


-- 
Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]