Bug 21336 - __libc_res_nsearch() leak?
Summary: __libc_res_nsearch() leak?
Status: RESOLVED DUPLICATE of bug 16574
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.19
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-30 16:46 UTC by Rob Krakora
Modified: 2017-04-04 19:37 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Attached is the eglibc bitbake recipe provided with Yocto Daisy which we use to build our Yocto image for our ARM based embedded system. (94.80 KB, application/gzip)
2017-04-04 16:35 UTC, Rob Krakora
Details
Here is the pcap and strace captures... (3.83 KB, application/octet-stream)
2017-04-04 17:29 UTC, Rob Krakora
Details
Here is the strace capture... (198.60 KB, text/plain)
2017-04-04 17:31 UTC, Rob Krakora
Details
Patch for eglibc-2.19 (3.19 KB, patch)
2017-04-04 18:10 UTC, Rob Krakora
Details | Diff
Patch for eglibc-2.19 - CORRECTION (3.19 KB, patch)
2017-04-04 18:16 UTC, Rob Krakora
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rob Krakora 2017-03-30 16:46:36 UTC
In glibc 2.25, shouldn't this...

	 * If the query has not already been tried as is then try it
	 * unless RES_NOTLDQUERY is set and there were no dots.
	 */
	if ((dots || !searched || (statp->options & RES_NOTLDQUERY) == 0)
	    && !(tried_as_is || root_on_list)) {
		ret = __libc_res_nquerydomain(statp, name, NULL, class, type,
					      answer, anslen, answerp,
					      answerp2, nanswerp2, resplen2,
					      answerp2_malloced);
		if (ret > 0 || (ret == 0 && resplen2 != NULL
				&& *resplen2 > 0))
			return (ret);
	}

...look like this...

	 * If the query has not already been tried as is then try it
	 * unless RES_NOTLDQUERY is set and there were no dots.
	 */
	if ((dots || !searched || (statp->options & RES_NOTLDQUERY) == 0)
	    && !(tried_as_is || root_on_list)) {
		ret = __libc_res_nquerydomain(statp, name, NULL, class, type,
					      answer, anslen, answerp,
					      answerp2, nanswerp2, resplen2,
					      answerp2_malloced);
		if (ret > 0 || (ret == 0 && resplen2 != NULL
				&& *resplen2 > 0))
			return (ret);

		if (answerp && *answerp != answer) {
			answer = *answerp;
			anslen = MAXPACKET;
		}
	}

...a leak occurred in this spot in eglibc 2.19 provided by openembedded for Yocto Daisy...
Comment 1 Florian Weimer 2017-04-04 13:18:26 UTC
(In reply to Rob Krakora from comment #0)
> 		if (answerp && *answerp != answer) {
> 			answer = *answerp;
> 			anslen = MAXPACKET;
> 		}

I don't see how this could make a difference because the answer and anslen variables are dead at this point.

> ...a leak occurred in this spot in eglibc 2.19 provided by openembedded for
> Yocto Daisy...

Can you provide more information how to reproduce the leak?  Thanks.
Comment 2 Rob Krakora 2017-04-04 15:31:08 UTC
Here is a trace flagged by dmalloc (we could not use Valgrind as we are on an ARM v5 and Valgrind support starts at ARM v7) as the source of the 64K memory leak.

#0  send_dg (resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, 
    gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, ns=-1290429468, terrno=0xb3159bcc, anssizp=0xb3159c80, 
    ansp=0xb3159c7c, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x0)
    at res_send.c:1359
#1  __libc_res_nsend (statp=statp@entry=0xb315c704, buf=0x0, buf@entry=0xb3159ca8 "(\255\001", buflen=0, buflen@entry=63, 
    buf2=0xb315c704 "\005", buf2@entry=0xb3159ce8 "\001q\001", buflen2=buflen2@entry=63, ans=ans@entry=0xb315a7f0 "(\255\201\200", 
    anssiz=anssiz@entry=2048, ansp=ansp@entry=0xb315aff8, ansp2=ansp2@entry=0xb315affc, nansp2=nansp2@entry=0xb315b000, 
    resplen2=resplen2@entry=0xb315b004) at res_send.c:575
#2  0xb0927b84 in __GI___libc_res_nquery (statp=statp@entry=0xb315c704, 
    name=name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", class=class@entry=1, type=type@entry=17429524, 
    answer=0xb315a7f0 "(\255\201\200", answer@entry=0xb0928634 <__GI___libc_res_nsearch+656> "", anslen=2048, anslen@entry=62321, 
    answerp=0xb315aff8, answerp@entry=0xb315a7f0, answerp2=0xb315affc, answerp2@entry=0x800, nanswerp2=0xb315b000, 
    nanswerp2@entry=0xb315aff8, resplen2=0xb315b004, resplen2@entry=0xb315affc) at res_query.c:226
#3  0xb0928114 in __libc_res_nquerydomain (statp=statp@entry=0xb315c704, 
    name=name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", domain=domain@entry=0x0, class=1, 
    class@entry=5024748, type=62321, type@entry=0, answer=0xb315a7f0 "(\255\201\200", 
    answer@entry=0xb315b03c "\330K\241\266`\265\025\263l\265\025\263", anslen=anslen@entry=2048, answerp=answerp@entry=0xb315aff8, 
    answerp2=answerp2@entry=0xb315affc, nanswerp2=0xb315b000, nanswerp2@entry=0xb315aff8, resplen2=0xb315b004, 
    resplen2@entry=0xb315affc) at res_query.c:585
#4  0xb0928634 in __GI___libc_res_nsearch (statp=0xb315c704, name=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", 
    class=5024748, class@entry=1, type=0, type@entry=62321, answer=answer@entry=0xb315a7f0 "(\255\201\200", 
    anslen=anslen@entry=2048, answerp=0xb315aff8, answerp2=answerp2@entry=0xb315affc, nanswerp2=nanswerp2@entry=0xb315b000, 
    resplen2=resplen2@entry=0xb315b004) at res_query.c:378
#5  0xb093f4a4 in _nss_dns_gethostbyname4_r (name=<optimized out>, 
    name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", pat=pat@entry=0xb315b55c, 
    buffer=buffer@entry=0xb315b058 "\177", buflen=buflen@entry=1056, errnop=errnop@entry=0xb315b560, 
    herrnop=herrnop@entry=0xb315b56c, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:314
#6  0xb6a14bd8 in gaih_inet (name=<optimized out>, name@entry=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", 
    service=<optimized out>, req=req@entry=0xb315b6f8, pai=pai@entry=0xb315b608, naddrs=naddrs@entry=0xb315b614)
    at ../sysdeps/posix/getaddrinfo.c:850
#7  0xb6a16ab4 in __GI_getaddrinfo (name=0x109f414 "bp-cloud-api-app-01.eastus.cloudapp.azure.com", service=0x0, hints=0xb315b6f8, 
    pai=0xb315b6d8) at ../sysdeps/posix/getaddrinfo.c:2406
#8  0x00220ddc in Poco::Net::DNS::hostByName (hostname=..., hintFlags=hintFlags@entry=34)
    at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/DNS.cpp:60
#9  0x00202000 in Poco::Net::SocketAddress::init (this=this@entry=0xb315b7a0, hostAddress=..., portNumber=<optimized out>)
    at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/SocketAddress.cpp:233
#10 0x002022fc in Poco::Net::SocketAddress::SocketAddress (this=0xb315b7a0, hostAddress=..., portNumber=<optimized out>)
    at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/SocketAddress.cpp:76
#11 0x002186fc in Poco::Net::HTTPClientSession::reconnect (this=this@entry=0x1219a08)
    at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/HTTPClientSession.cpp:341
#12 0x00219760 in Poco::Net::HTTPClientSession::sendRequest (this=0x1219a08, request=...)
    at /mnt/gateway-image/mlinux-3.x/build/tmp/work/arm926ejste-mlinux-linux-gnueabi/poco/1.7.7-r0/poco-poco-1.7.7-release/Net/src/HTTPClientSession.cpp:205
#13 0x000be8b0 in AvisePocoClient::sendRequest (this=0x10f7008, req=...)
Comment 3 Rob Krakora 2017-04-04 15:35:08 UTC
The trace was against eglibc 2.19 running on a Yocto Daisy build...
Comment 4 Florian Weimer 2017-04-04 15:37:49 UTC
(In reply to Rob Krakora from comment #3)
> The trace was against eglibc 2.19 running on a Yocto Daisy build...

Can you reproduce this at will?  In this case, it would be very helpful to have a capture of the DNS queries and responses.

Do you have any peculiar /etc/resolv.conf or RES_OPTIONS settings?  What does the hints structure for getaddrinfo look like?
Comment 5 Rob Krakora 2017-04-04 15:42:08 UTC
Yes, I can reproduce this at will.  resolv.conf only contains the this:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8

Where 8.8.8.8 is the Google DNS server.  It also fails when other DNS servers are utilized.

I can capture the DNS traffic for you when this occurs.  tcpdump capture Ok?
Comment 6 Florian Weimer 2017-04-04 15:46:22 UTC
(In reply to Rob Krakora from comment #5)

> I can capture the DNS traffic for you when this occurs.  tcpdump capture Ok?

Yes, please do a binary (pcap file) capture with “-s 0” (complete packets).  You should limit the capture with “host 8.8.8.8”, so that both UDP, fragmented UDP, and TCP packets are included.

Capturing strace output at the same time would be helpful, too, in case the stub resolver performs unexpected system calls which are not reflected in the network traffic.

How close are your sources to upstream glibc 2.19?  I wonder if this could be a faulty backport.
Comment 7 Rob Krakora 2017-04-04 16:35:13 UTC
Created attachment 9969 [details]
Attached is the eglibc bitbake recipe provided with Yocto Daisy which we use to build our Yocto image for our ARM based embedded system.

Attached is the eglibc bitbake recipe provided with Yocto Daisy which we use to build our Yocto image for our ARM based embedded system.
Comment 8 Rob Krakora 2017-04-04 17:29:52 UTC
Created attachment 9970 [details]
Here is the pcap and strace captures...

Here is the pcap capture...
Comment 9 Rob Krakora 2017-04-04 17:31:18 UTC
Created attachment 9971 [details]
Here is the strace capture...

Here is the strace capture...
Comment 10 Rob Krakora 2017-04-04 17:57:58 UTC
The leak is occurring in the eglibc 2.19 code below...however, in the glibc 2.25 code further below there is some code added to indicated to the layers above that a malloc has occurred which would negate the my proposed patch.  eglibc 2.19 and glibc 2.19 are closer and glibc 2.19 does not have the code to indicate to the layers above that a malloc has occurred so it could possibly have a leak???

eglibc 2.19:

		if (*thisanssizp < MAXPACKET
		    /* If the current buffer is not the the static
		       user-supplied buffer then we can reallocate
		       it.  */
		    && (thisansp != NULL && thisansp != ansp)
#ifdef FIONREAD
		    /* Is the size too small?  */
		    && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
			|| *thisanssizp < *thisresplenp)
#endif
                    ) {
			/* Always allocate MAXPACKET, callers expect
			   this specific size.  */
			u_char *newp = malloc (MAXPACKET);
			if (newp != NULL) {
				*thisanssizp = MAXPACKET;
				*thisansp = newp;
			}
		}

glibc 2.25:

		if (*thisanssizp < MAXPACKET
		    /* If the current buffer is not the the static
		       user-supplied buffer then we can reallocate
		       it.  */
		    && (thisansp != NULL && thisansp != ansp)
#ifdef FIONREAD
		    /* Is the size too small?  */
		    && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
			|| *thisanssizp < *thisresplenp)
#endif
                    ) {
			/* Always allocate MAXPACKET, callers expect
			   this specific size.  */
			u_char *newp = malloc (MAXPACKET);
			if (newp != NULL) {
				*thisanssizp = MAXPACKET;
				*thisansp = newp;
				if (thisansp == ansp2)
				  *ansp2_malloced = 1;
			}
		}
Comment 11 Rob Krakora 2017-04-04 18:04:28 UTC
That is not it...adding the code from my previous reply does not fix the leak.  Only the code change I originally proposed fixes it...hmmm....
Comment 12 Rob Krakora 2017-04-04 18:10:50 UTC
Created attachment 9972 [details]
Patch for eglibc-2.19
Comment 13 Rob Krakora 2017-04-04 18:16:49 UTC
Created attachment 9973 [details]
Patch for eglibc-2.19 - CORRECTION
Comment 14 Andreas Schwab 2017-04-04 18:18:23 UTC
If you want to continue using 2.19 you are on your own.  Unless you can reproduce it with 2.25 this is a WONTFIX.
Comment 15 Rob Krakora 2017-04-04 18:21:37 UTC
We are moving to Yocto Krogoth which employs glibc 2.23 instead of eglibc 2.19 in a couple of months.  I will run the same test and see if the leak exists there.
Comment 16 Florian Weimer 2017-04-04 18:26:58 UTC
The announcement of CVE-2015-7547 said:

“
- Always malloc the second response buffer if needed.

  - Requires fix for sourceware bug 16574 to avoid memory leak.
    commit d668061994a7486a3ba9c7d5e7882d85a2883707
    commit ab09bf616ad527b249aca5f2a4956fd526f0712f
”

<https://www.sourceware.org/ml/libc-alpha/2016-02/msg00416.html>

Coincidentally, this regression originally delayed the disclosure of CVE-2015-7547.  The upstream glibc 2.19 branch already had the fix for bug 16574 when CVE-2015-7547 was fixed, but our downstream 2.12 and 2.17 branches still needed it.

If you have you own resolv backports, you should really try to get a valgrind-clean pass with the external resolv test suite:

<https://pagure.io/glibc-resolv-tests>

I'm in the process of upstreaming the remaining tests.

*** This bug has been marked as a duplicate of bug 16574 ***
Comment 17 Rob Krakora 2017-04-04 18:30:21 UTC
Yes, I would LOVE to use Valgrind.  However, we are on a ARM v5 and Valgrind ARM support starts at v7.  :)  Thanks so much for your help and valuable time!!!
Comment 18 Florian Weimer 2017-04-04 19:27:59 UTC
(In reply to Rob Krakora from comment #17)
> Yes, I would LOVE to use Valgrind.  However, we are on a ARM v5 and Valgrind
> ARM support starts at v7.

I assume you could perform memory leak testing on a different architecture than your production architecture.  The important bit is to increase test coverage.  Architecture-specific memory leaks happen, but are not the norm.
Comment 19 Rob Krakora 2017-04-04 19:37:14 UTC
True.