Hi guys,

First of all, I really don't have the ability or skills to build CVS glibc and check this against it, although this seems to be a bug in glibc. I really would like to, but I can't.

I have already filed this in the Fedora bugzilla. The Fedora developers are very busy because of a major recompile with the new GCC packages, so the bug has had no replies yet. I did get some replies on the fedora-devel mailing list and, because of them, I was asked to send this bug report directly to the glibc people.

The Fedora bug is here:

https://bugzilla.redhat.com/show_bug.cgi?id=428067

It contains some important information that I will reproduce in this report. I hope I can provide enough information to help you confirm and, if confirmed, fix what seems to me to be a bug.

Let's go ...

Everything started when I got abnormal results with the postfix reverse DNS restriction parameters. Wietse, the postfix maintainer, pointed out that postfix wasn't the problem: postfix was getting wrong results from the system libraries, in this case from getnameinfo. Wietse asked me to run some tests against getnameinfo with getnameinfo.c, a simple program provided with the postfix sources, and we confirmed that getnameinfo really does behave strangely in situations where an IP address has LOTS of PTR records.

The problem isn't multiple PTR records as such; those work fine. But when I hit an IP that has LOTS of PTRs, getnameinfo behaves strangely: it returns unknown when it is supposed to return one of those PTRs. In my case, my mail server hit an IP address with 258 PTRs.

The postfix discussion is here:

http://marc.info/?l=postfix-users&m=119973018023513&w=2

So, when getnameinfo is called against an IP address with LOTS of PTRs, the function seems to return unknown when it is supposed to return any one of those PTRs.

As I said, I am not able to provide test results against CVS glibc.
But I can test against different versions on the different Fedora boxes I have. With Fedora 3, 4, 5, 6 and 8 (I don't have 7 to test) the results are the same: getnameinfo returns unknown when it should return something.

--> Fedora Core 8 fully updated

[root@f8 tmp]# cat /etc/fedora-release
Fedora release 8 (Werewolf)
[root@f8 tmp]# rpm -qi glibc
Name: glibc
Version: 2.7
Vendor: Fedora Project
Release: 2
Build Date: Thu 18 Oct 2007 06:49:18 AM BRST

--> running host and getnameinfo against an IP with multiple (but NOT lots of) PTR records

[root@f8 tmp]# host 201.34.255.57 | grep "domain name pointer" | wc -l
11
[root@f8 tmp]# ./getnameinfo 201.34.255.57
Hostname: mail.egiro.com.br
Address: 201.34.255.57
[root@f8 tmp]#

--> running host and getnameinfo against an IP with LOTS of PTR records

[root@f8 tmp]# host 201.34.255.58 | grep "domain name pointer" | wc -l
258
[root@f8 tmp]# ./getnameinfo 201.34.255.58
getnameinfo 201.34.255.58: Name or service not known
[root@f8 tmp]#

Although having 258 PTRs for a single IP address is very weird, it seems to be legal, RFC-speaking. I haven't found an RFC saying it is illegal, but I found this draft, which will probably become an RFC at some point, and which says:

It is possible for there to be multiple PTRs at a single reverse tree node. In extreme cases, these multiple PTRs could cause a DNS response to exceed the UDP limit, and fall back to TCP or otherwise exceed the DNS protocol limits. Such a case could be one where the advantages of reverse mapping are exceeded by the disadvantages of the additional burden. This may be of particular significance for "mass virtual hosting" systems, where many hostnames are associated with a single IP.

http://tools.ietf.org/html/draft-ietf-dnsop-reverse-mapping-considerations-05

This draft discusses the questions related to UDP and TCP responses, given the reply size when there are many PTRs. But it doesn't say it is illegal.
It just raises some concerns about this situation, which appears to be legal until some RFC says it is not.

And now the final piece of information: this, which I believe to be a bug, doesn't happen with the oldest glibc I still have access to, on a Red Hat 9 box:

[root@aparecida tmp]# cat /etc/redhat-release
Red Hat Linux release 9 (Shrike)
[root@aparecida tmp]# rpm -qi glibc
Name: glibc
Version: 2.3.2
Vendor: Red Hat, Inc.
Release: 27.9
Build Date: Mon 07 Apr 2003 08:29:45 PM BRT
[root@aparecida tmp]# host 201.34.255.58 | grep "domain name pointer" | wc -l
258
[root@aparecida tmp]# ./getnameinfo 201.34.255.58
Hostname: webmail.alexandreabreu.com
Address: 201.34.255.58
[root@aparecida tmp]#
[root@aparecida tmp]# host 201.34.255.70 | grep "domain name pointer" | wc -l
387
[root@aparecida tmp]# ./getnameinfo 201.34.255.70
Hostname: www.sobrapgoias.com.br
Address: 201.34.255.70
[root@aparecida tmp]#

The IP addresses I used for testing here (201.34.255.58 and .70) are still returning 258 and 387 PTRs at the time I am writing this bug report, so I believe any of you can easily test this just by calling getnameinfo on those IPs.

The getnameinfo.c program, from the postfix sources, is attached to the Fedora bug I created. It can be downloaded from here:

https://bugzilla.redhat.com/attachment.cgi?id=291104
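
For readers who do not want to fetch the attachment, the kind of call being tested can be sketched as below. This is not the postfix getnameinfo.c, which is not reproduced here; the helper name reverse_lookup and the choice of flags are assumptions made for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: reverse-resolve a dotted-quad IPv4 address.
   Returns getnameinfo()'s result (0 on success), or EAI_FAIL when the
   string is not a valid IPv4 address. */
static int reverse_lookup(const char *ip, char *host, size_t hostlen, int flags)
{
    struct sockaddr_in sa;

    assert(ip != NULL && host != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_FAIL;

    /* No service name is requested, so the service buffer is NULL/0. */
    return getnameinfo((const struct sockaddr *) &sa, sizeof sa,
                       host, (socklen_t) hostlen, NULL, 0, flags);
}
```

Calling reverse_lookup("201.34.255.58", buf, sizeof buf, NI_NAMEREQD) and printing gai_strerror() of a non-zero result should give a message in the style of the "Name or service not known" line shown in the transcripts, assuming the test program asks for a name in a similar way.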
The problem seems to be in resolv/nss_dns/dns-host.c. The initial buffer is too small (1024 bytes), and getanswer_r will do:

      /* The buffer is too small.  */
    too_small:
      *errnop = ERANGE;
      *h_errnop = NETDB_INTERNAL;
      return NSS_STATUS_TRYAGAIN;

This is the right thing to do, and it is what getnameinfo eventually expects:

      while (__gethostbyaddr_r ((const char *) &in_addr,
                                sizeof (struct in_addr), AF_INET,
                                &th, tmpbuf, tmpbuflen, &h, &herror))
        {
          if (herror == NETDB_INTERNAL && errno == ERANGE)
            tmpbuf = extend_alloca (tmpbuf, tmpbuflen, 2 * tmpbuflen);
          else
            break;
        }

But, unlike _nss_dns_gethostbyname*_r, _nss_dns_gethostbyaddr*_r will overwrite both values:

      status = getanswer_r (host_buffer.buf, n, qbuf, T_PTR, result, buffer,
                            buflen, errnop, h_errnop, 0 /* XXX */, ttlp, NULL);
      if (host_buffer.buf != orig_host_buffer)
        free (host_buffer.buf);
      if (status != NSS_STATUS_SUCCESS)
        {
          *h_errnop = h_errno;
          *errnop = errno;
          return status;
        }

I guess removing the

      *h_errnop = h_errno;
      *errnop = errno;

could fix this.
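
The interaction described above can be reproduced without any DNS traffic. The following is a minimal sketch, not glibc code: mock_getanswer, mock_gethostbyaddr and lookup_with_retry are hypothetical stand-ins, and the values written in the "clobber" branch merely imitate stale globals.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for NETDB_INTERNAL on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stddef.h>

/* Stand-in for getanswer_r: reports "buffer too small" through the
   caller-supplied error slots until buflen reaches `needed`. */
static int mock_getanswer(size_t buflen, size_t needed,
                          int *errnop, int *h_errnop)
{
    assert(errnop != NULL && h_errnop != NULL);
    if (buflen < needed) {
        *errnop = ERANGE;
        *h_errnop = NETDB_INTERNAL;
        return -1;
    }
    return 0;
}

/* Stand-in for _nss_dns_gethostbyaddr_r. With clobber != 0 it overwrites
   the error slots on failure, imitating the copy from h_errno/errno
   (assumed here to hold unrelated, stale values). */
static int mock_gethostbyaddr(size_t buflen, size_t needed, int clobber,
                              int *errnop, int *h_errnop)
{
    int status = mock_getanswer(buflen, needed, errnop, h_errnop);

    if (status != 0 && clobber) {
        *h_errnop = HOST_NOT_FOUND;  /* assumed stale h_errno */
        *errnop = ENOENT;            /* assumed stale errno */
    }
    return status;
}

/* Caller-side loop in the style of getnameinfo: double the buffer only
   when the failure is reported as ERANGE/NETDB_INTERNAL.
   Returns the buffer size that worked, or 0 if the lookup gave up. */
static size_t lookup_with_retry(size_t needed, int clobber)
{
    size_t buflen = 1024;
    int err = 0, herr = 0;

    while (mock_gethostbyaddr(buflen, needed, clobber, &err, &herr) != 0) {
        if (herr == NETDB_INTERNAL && err == ERANGE)
            buflen *= 2;
        else
            return 0;
    }
    return buflen;
}
```

With the error slots preserved, lookup_with_retry(5000, 0) keeps doubling until the answer fits. With them clobbered, lookup_with_retry(5000, 1) gives up on the first failure, which matches the symptom reported for addresses with many PTRs.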
I don't know anything about coding, but just as a curiosity, the two IP addresses I listed, which have 258 and 387 PTRs, produce the following reply sizes:

[root@correio postfix]# dig @ns1.cultura.com.br 58.255.34.201.in-addr.arpa ptr
;; Truncated, retrying in TCP mode.
;; Query time: 625 msec
;; SERVER: 201.34.255.2#53(201.34.255.2)
;; WHEN: Tue Feb 26 10:27:59 2008
;; MSG SIZE rcvd: 9137

[root@correio postfix]# dig @ns1.cultura.com.br 70.255.34.201.in-addr.arpa ptr
;; Truncated, retrying in TCP mode.
;; Query time: 542 msec
;; SERVER: 201.34.255.2#53(201.34.255.2)
;; WHEN: Tue Feb 26 10:29:57 2008
;; MSG SIZE rcvd: 12116

If I understood correctly, the buffer would have to be enlarged to fit replies this big. Is that correct?
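
As a rough illustration of what "enlarging" means under the doubling scheme quoted earlier, the sketch below counts how many doublings a 1024-byte buffer needs to reach a given size. It is only arithmetic: grow_by_doubling is a hypothetical helper, and the MSG SIZE figures are wire-format reply sizes, which are assumed here to be merely indicative of the space the resolver needs for its result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: double `start` until it is at least `needed`.
   Returns the final size and, if steps is non-NULL, stores the number
   of doublings performed. */
static size_t grow_by_doubling(size_t start, size_t needed, unsigned *steps)
{
    size_t len = start;
    unsigned n = 0;

    assert(start > 0);
    while (len < needed) {
        len *= 2;
        n++;
    }
    if (steps != NULL)
        *steps = n;
    return len;
}
```

For both 9137 and 12116 bytes, starting from 1024, this gives four doublings and a final size of 16384 bytes, so under these assumptions the caller's retry loop would reach a large enough buffer quickly once the ERANGE error is reported correctly.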
I applied the patch to the trunk.