Bug 304 - AI_CANONNAME in getaddrinfo(): name is an address; result sorting
Summary: AI_CANONNAME in getaddrinfo(): name is an address; result sorting
Status: RESOLVED WORKSFORME
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.3.3
: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-06 07:59 UTC by A. Guru
Modified: 2019-04-10 10:53 UTC (History)
4 users (show)

See Also:
Host: n/a
Target: n/a
Build: n/a
Last reconfirmed:
fweimer: security-


Attachments
patch for first 2 of 3 problems described (2.43 KB, patch)
2004-08-06 08:01 UTC, A. Guru
Details | Diff
oops... this is still necessary (apply on top of first patch) (242 bytes, patch)
2004-08-06 08:49 UTC, A. Guru
Details | Diff
patch for ""If hints is a null pointer, ... value zero for the ai_flags... " (440 bytes, patch)
2004-08-08 00:30 UTC, A. Guru
Details | Diff
patch to solve third of 3 problems described (667 bytes, patch)
2004-08-08 00:38 UTC, A. Guru
Details | Diff
Fix typo + safety check. (Apply on top of first patch.) (391 bytes, patch)
2004-08-08 00:44 UTC, A. Guru
Details | Diff
new patch for cvs version 1.67 of getaddrinfo.c (3.18 KB, patch)
2004-08-17 04:22 UTC, A. Guru
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description A. Guru 2004-08-06 07:59:34 UTC
According to

  <http://www.opengroup.org/onlinepubs/009695399/functions/getaddrinfo.html>,

regarding AI_CANONNAME

  "A numeric host address string is not a ``name'', and thus does not
   have a ``canonical name'' form; no address to host name translation
   is performed. See below for handling of the case where a canonical
   name cannot be obtained."

  "if the canonical name is not available, then ai_canonname shall
   refer to the nodename argument or a string with the same contents."

The current glibc version of getaddrinfo() attempts to fill
ai_canonname even when the name argument turned out to be an
IP address.  The caller might count on the fact that no DNS server
is ever contacted on his behalf when he provides a numerical address
(for privacy or security reasons).  This is fixed by the attached
patch.

Also

  "the ai_canonname field of the first returned addrinfo structure"

Since the glibc implementation of getaddrinfo() performs sorting of
these structures according to RFC 3484, which one of them is the
first is only known after sorting.  Therefore, the appropriate
ai_canonname field can only be filled after the sort is done.  The
implementation also has the problem that it attemptsto fill the
field for many structures when more than one service is found (first
value of at2, _all_ values of st2 in the nested for loops of
gaih_inet()).  The attached patch also attempts to solve this problem.
We have to be careful with memory in the solution because

  "The freeaddrinfo() function shall support the freeing of
   arbitrary sublists of an addrinfo list originally returned
   by getaddrinfo()"

        ********

There is another issue with AI_CANONNAME that is _not_ addressed
by the attached patch.

  "the function shall attempt to determine the canonical name
   corresponding to nodename (for example, if nodename is an
   alias or shorthand notation for a complete name)."

  "the terms ``canonical name'' and ``alias''"

  "Domain Name System implementations are expected to interpret
   them as they are used in RFC 1034."

From RFC 1034:

  "CNAME identifies the canonical name of an alias"

  "CNAME chains should be followed"

  "A name error indicating that the name does not exist.  This
   may include CNAME RRs that indicate that the original query
   name was an alias for a name which does not exist."

This implies that, in the DNS case, ai_canonname should contain
a copy of the "owner" of the found A or AAAA (or A6) resource
record(s) that provided the address(es) that were found, possibly
after having followed a CNAME chain.

The current implementation systematically uses __gethostbyaddr_r(),
which translates into a PTR query in the DNS case, instead of somehow
using the result of the AAAA and A queries that were already performed
(since we must only fill ai_canonname if the name argument is not
null and is an actual name, not an address).  There are problems with
this.  There might not even be a PTR RR even if there is one or more
AAAA or A RR(s).  If there is a PTR RR, it may be on another DNS server
than the AAAA or A one and the caller might not want or expect that
this other server be contacted on his behalf (for privacy or security
reasons).

I don't know how to fix this behavior because I am not familiar with
the implementation of the lookup functions on which getaddrinfo() is
based.

        ********

The proposed patch follows.  I wrote it and I am putting it in the
public domain.  It is relative to cvs version 1.63 of the file.
Comment 1 A. Guru 2004-08-06 08:01:28 UTC
Created attachment 152 [details]
patch for first 2 of 3 problems described
Comment 2 A. Guru 2004-08-06 08:49:44 UTC
Created attachment 153 [details]
oops... this is still necessary (apply on top of first patch)
Comment 3 A. Guru 2004-08-08 00:30:43 UTC
Created attachment 155 [details]
patch for ""If hints is a null pointer, ... value zero for the ai_flags... "
Comment 4 A. Guru 2004-08-08 00:38:01 UTC
Created attachment 156 [details]
patch to solve third of 3 problems described
Comment 5 A. Guru 2004-08-08 00:44:09 UTC
Created attachment 157 [details]
Fix typo + safety check.  (Apply on top of first patch.)
Comment 6 Ulrich Drepper 2004-08-16 18:52:17 UTC
None of this applies to the current sources anymore.

Consider updating your comments for the current sources and mark the patches
which are not useful as obsolete.
Comment 7 A. Guru 2004-08-17 04:19:54 UTC
This comment replaces my previous ones.
All quotes are from

  <http://www.opengroup.org/onlinepubs/009695399/functions/getaddrinfo.html>.

All issues relative to cvs version 1.67 of getaddrinfo.c, which is
the latest at the time of writing.

Issues:

	-- "The freeaddrinfo() function shall support the freeing of
	    arbitrary sublists of an addrinfo list originally returned
	    by getaddrinfo()."

		For example:

			struct addrinfo hints = { .ai_flags = AI_CANNONNAME };
			rc = getaddrinfo(nodename, servname, &hints, &res);
			freeaddrinfo(res->ai_next);
			printf("%s\n", res->ai_canonname);

		This is legal (except for the error and non-NULL checkings).

		The problem is that the way ai_canonname is allocated
		(in the same malloc() as the original struct addrinfo
		itself), the first struct addrinfo after reordering may
		be other than the original one and now contains a pointer
		to memory in another malloc(), that of the original one.
		By the time the printf() above is executed, we may
		dereference freed memory.

		The solution is to either allocate ai_canonname separately
		(which is what I have done) or realloc() the new first
		struct addrinfo if it is not the original one and store a
		copy of ai_canonname in it.

		In the new patch, notice earlier use of local_hints, removal
		of "const" in prototypes, usage of req->ai_canonname, usage
		of strdup(), and usage of free(... ai_canonname).

	-- "In this hints structure every member other than ai_flags,
	    ai_family, ai_socktype, and ai_protocol shall be set to zero
	    or a null pointer."

	   "If hints is a null pointer, the behavior shall be as if it
	    referred to a structure containing the value zero for the
	    ai_flags, ai_socktype, and ai_protocol fields, and AF_UNSPEC
	    for the ai_family field."

		I removed all fields which should be zero in the
		initializer of default_hints which will accomplish
		just that.  No more AI_DEFAULT.

	-- "If the AI_CANONNAME flag is specified and the nodename argument
	    is not null..."

	   "A numeric host address string is not a ``name'', and thus
	    does not have a ``canonical name'' form; no address to host
	    name translation is performed. See below for handling of
	    the case where a canonical name cannot be obtained."

	   "... if the canonical name is not available, then ai_canonname
	    shall refer to the nodename argument or a string with the
	    same contents."

		Here, the nodename argument should be an unmodified
		"name", not some transformation of it by libidn.
		I've called the transformed version "iname" and used
		it where appropriate when HAVE_LIBIDN.

	-- The ai_canonname field should be the result of following a
	   CNAME RR chain (if any) to the _owner_ of an AAAA or A record,
	   not the result of waiting to follow those address records
	   (there may be many) then following back PTR RRs (which may
	   be one of many or not exist at all for each address record).
	   See original bug description for more details.

		I didn't remove the code that uses __gethostbyaddr_r(),
		just in case ai_canonname has not be found by then,
		but maybe it should just be gone altogether.

	-- Each and every call to resolving functions (through NSS or
	   direct) may result in network traffic or at least file
	   manipulations.  (Assume there is no cache, which is a
	   possibility.)  We should always avoid calling such a function
	   if the result it can provide could have been extracted from
	   a previous call.

		I tried to set canon as early as possible and only if
		necessary.

	-- Weird returned struct hostent are possible.

		Thus the (h && h->h_name && h->h_name[0]) checks.
Comment 8 A. Guru 2004-08-17 04:22:02 UTC
Created attachment 165 [details]
new patch for cvs version 1.67 of getaddrinfo.c
Comment 9 Ulrich Drepper 2004-09-26 21:35:19 UTC
Once again, getaddrinfo changed significantly.  Do you still think something is
wrong?
Comment 10 Ulrich Drepper 2005-09-23 03:18:01 UTC
No reply in almost a year, closing.
Comment 11 Jackie Rosen 2014-02-16 19:35:17 UTC Comment hidden (spam)