Differences between revisions 49 and 50
Revision 49 as of 2014-06-03 13:48:51
Revision 50 as of 2014-06-19 03:02:39
=== Fedora ===

 * Fix getaddrinfo - https://fedoraproject.org/wiki/Features/FixNetworkNameResolution

 * Fix AI_ADDRCONFIG - https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

 * Name resolution overview - https://fedoraproject.org/wiki/Networking/NameResolution

 * Distribution networking overview - https://fedoraproject.org/wiki/Networking

Introduction

The purpose of this page is to coordinate an effort to decide on the correct behavior of getaddrinfo() in certain corner cases.

The behavior of getaddrinfo() is governed by POSIX but unfortunately the spec is not entirely clear.

For background see the discussion at bug #15726.

getaddrinfo()

#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(
    const char *restrict nodename,
    const char *restrict service,
    const struct addrinfo *restrict hints,
    struct addrinfo **restrict result
);

The nodename argument can be a hostname, a numeric address string (dotted-decimal for IPv4 or colon-separated hexadecimal for IPv6), or NULL, indicating the local machine. The service can be a service name, a port number in string form, or NULL, indicating that network-level addresses should be returned. At least one of nodename and service should be non-NULL.

If not NULL, hints causes the returned information to be filtered by family, flags, protocol and/or socket type.

In case of success the function returns a nonempty linked list of addrinfo structures pointed to by result.
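The call pattern described above can be sketched as follows. This is a minimal illustration, not a reference implementation: count_addresses is a hypothetical helper name, and the choice of AF_UNSPEC and SOCK_STREAM in the hints is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L /* expose getaddrinfo() under strict modes */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: resolve nodename/service and return the number of
 * entries in the result list, or -1 if getaddrinfo() reports an error.
 * The raw EAI_* code is stored in *gai_rc when gai_rc is not NULL. */
int count_addresses(const char *nodename, const char *service,
                    int flags, int *gai_rc)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* filter the list by socket type */
    hints.ai_flags = flags;

    int rc = getaddrinfo(nodename, service, &hints, &result);
    if (gai_rc != NULL)
        *gai_rc = rc;
    if (rc != 0)
        return -1;

    /* On success the list is nonempty; walk it via ai_next. */
    for (const struct addrinfo *ai = result; ai != NULL; ai = ai->ai_next)
        count++;

    freeaddrinfo(result);
    return count;
}
```

Calling count_addresses("127.0.0.1", "80", AI_NUMERICHOST | AI_NUMERICSERV, &rc) exercises the function without consulting any lookup source, since both arguments are already numeric.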

Resolving

Most often getaddrinfo() is used to resolve a hostname to an address.

Qualified variants of the given hostname should also be sought.

All available address types (IPv4 and IPv6) should be returned.

The correct sequence of lookups is to finish one information source before moving to the next: within the first source, look up every hostname variant and, for each variant, every requested address type. For example, if a "hosts" file is the first source then the resolver should look for all hostname variants in the hosts file before trying DNS.

If a hostname is found in a source (a positive answer) then the resolver should look up all the address types that were requested and then not search any further.

Some sources might also contain the information that the hostname does not exist (negative answer). In case of a negative answer the resolver should also stop searching.

If there is neither a positive nor a negative answer then the resolver should continue searching until all sources have been searched.

If all sources have been exhausted and no positive answer was obtained then that should be considered a negative answer.
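The search order described above can be sketched as a loop over sources. Everything here is hypothetical: the enum, the callback type and the two stub sources exist only for illustration, and errors (as opposed to positive or negative answers) are deliberately left out of the model.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical answer kinds a single source can give for a hostname. */
enum answer {
    ANSWER_POSITIVE, /* the source knows the name */
    ANSWER_NEGATIVE, /* the source asserts that the name does not exist */
    ANSWER_UNKNOWN   /* the source has nothing to say about the name */
};

/* Hypothetical per-source lookup. A real one would try every qualified
 * variant of the name and every requested address type in this source. */
typedef enum answer (*source_lookup_fn)(const char *name);

/* Walk the sources in configured order and stop at the first definite
 * answer. Exhausting every source counts as a negative answer. */
enum answer resolve_in_order(const source_lookup_fn *sources,
                             size_t nsources, const char *name)
{
    for (size_t i = 0; i < nsources; i++) {
        enum answer a = sources[i](name);
        if (a != ANSWER_UNKNOWN)
            return a; /* positive or negative: do not search further */
    }
    return ANSWER_NEGATIVE;
}

/* Stand-in sources for illustration only. */
enum answer hosts_file_stub(const char *name)
{
    return strcmp(name, "printer") == 0 ? ANSWER_POSITIVE : ANSWER_UNKNOWN;
}

enum answer dns_stub(const char *name)
{
    return strcmp(name, "gone.example") == 0 ? ANSWER_NEGATIVE
                                             : ANSWER_UNKNOWN;
}
```

With the array { hosts_file_stub, dns_stub }, a lookup of "printer" stops at the first source, while a name unknown to both sources falls through to the final negative answer.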

Outcomes

The following outcomes are possible.

  • A positive answer: the hostname exists. The hostname could have zero, one or more addresses assigned to it.

  • A negative answer: the hostname does not exist.

  • Some error occurred during the search. The error can be either local — reported perhaps by a local subsystem such as the memory allocator — or remote, reported by a remote service such as a nameserver.

The difference between a negative answer and an error is important because in the case of a negative answer there is no point in retrying, whereas in the case of an error there may be some point in retrying (later).

Return value

The spec allows the following return values (http://pubs.opengroup.org/onlinepubs/9699919799/functions/gai_strerror.html):

0
[EAI_AGAIN]
[EAI_BADFLAGS]
[EAI_FAIL]
[EAI_FAMILY]
[EAI_MEMORY]
[EAI_NONAME]
[EAI_OVERFLOW]
[EAI_SERVICE]
[EAI_SOCKTYPE]
[EAI_SYSTEM]

The getaddrinfo() spec explains them this way:

[EAI_AGAIN]
    The name could not be resolved at this time. Future attempts may succeed.
[EAI_BADFLAGS]
    The flags parameter had an invalid value.
[EAI_FAIL]
    A non-recoverable error occurred when attempting to resolve the name.
[EAI_FAMILY]
    The address family was not recognized.
[EAI_MEMORY]
    There was a memory allocation failure when trying to allocate storage for the return value.
[EAI_NONAME]
    The name does not resolve for the supplied parameters.

    Neither nodename nor servname were supplied. At least one of these shall be supplied.
[EAI_SERVICE]
    The service passed was not recognized for the specified socket type.
[EAI_SOCKTYPE]
    The intended socket type was not recognized.
[EAI_SYSTEM]
    A system error occurred; the error code can be found in errno. 

Glibc currently (March 2014) defines the following values (in glibc/sysdeps/posix/gai_strerror-strs.h) in addition to 0.

_S(EAI_ADDRFAMILY, N_("Address family for hostname not supported"))
_S(EAI_AGAIN, N_("Temporary failure in name resolution"))
_S(EAI_BADFLAGS, N_("Bad value for ai_flags"))
_S(EAI_FAIL, N_("Non-recoverable failure in name resolution"))
_S(EAI_FAMILY, N_("ai_family not supported"))
_S(EAI_MEMORY, N_("Memory allocation failure"))
_S(EAI_NODATA, N_("No address associated with hostname"))
_S(EAI_NONAME, N_("Name or service not known"))
_S(EAI_SERVICE, N_("Servname not supported for ai_socktype"))
_S(EAI_SOCKTYPE, N_("ai_socktype not supported"))
_S(EAI_SYSTEM, N_("System error"))
_S(EAI_INPROGRESS, N_("Processing request in progress"))
_S(EAI_CANCELED, N_("Request canceled"))
_S(EAI_NOTCANCELED, N_("Request not canceled"))
_S(EAI_ALLDONE, N_("All requests done"))
_S(EAI_INTR, N_("Interrupted by a signal"))
_S(EAI_IDN_ENCODE, N_("Parameter string not correctly encoded"))

Note the extra values EAI_ADDRFAMILY and EAI_NODATA.

It is clear, at least, that getaddrinfo() should:

  • return 0 if a positive answer was received and at least one address was obtained;

  • return some error code if an error was encountered.

It is reasonably clear that the following error codes should be returned under the following circumstances.

  • A source could not be reached due to some apparently temporary condition: EAI_AGAIN
  • AI_NUMERICHOST was used and the hostname wasn't a valid numeric string representation of the address: EAI_NONAME
  • AI_NUMERICSERV was used and the service name wasn't a valid number in string form: EAI_NONAME
  • Both nodename and servname are NULL: EAI_NONAME
  • Unknown bits were set in ai_flags: EAI_BADFLAGS
  • Unknown or unsupported family was used: EAI_FAMILY
  • Unknown or unsupported socket type: EAI_SOCKTYPE
  • There was a failure to allocate memory: EAI_MEMORY
  • There was an answer indicating that the service name doesn't exist for the given socket type: EAI_SERVICE
  • The source information doesn't make sense, for example a file could not be parsed or DNS returned an invalid packet: EAI_FAIL
  • Some system error occurred and errno is set: EAI_SYSTEM.
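Several of the cases in the list above depend only on the arguments and can be observed without any network access. The following is a small sketch for doing so; gai_code is a hypothetical helper name, and the hints it fills in (AF_UNSPEC, SOCK_STREAM) are assumptions of the example.

```c
#define _POSIX_C_SOURCE 200809L /* expose getaddrinfo() under strict modes */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: call getaddrinfo() with the given flags and return
 * its raw result code, freeing any list that comes back. */
int gai_code(const char *nodename, const char *service, int flags)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    int rc = getaddrinfo(nodename, service, &hints, &result);
    if (rc == 0)
        freeaddrinfo(result);
    return rc;
}
```

On glibc, gai_code(NULL, NULL, 0), gai_code("not-numeric", NULL, AI_NUMERICHOST) and gai_code("127.0.0.1", "http", AI_NUMERICSERV) should each come back as EAI_NONAME, matching the list above. Other implementations may differ in the cases the spec leaves open.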

Some reasons why the system might return EAI_SYSTEM and set errno:

  • We tried to open a file and got an error like EACCES, EMFILE, ... It should probably return EAI_SYSTEM in that case.
  • We tried to create a socket and got an error like EACCES, EINVAL, ... It should probably only return that error in case all the different sockets it tried to create failed.
  • We tried to do network communication (send, sendto, recv, recvfrom) and got an error. Such errors should probably all be treated as non-permanent and might then result in EAI_AGAIN.
  • EINTR should always be handled by getaddrinfo() itself. It should not return EAI_SYSTEM in this case.
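Because errno is only meaningful when EAI_SYSTEM is returned, a caller that wants a readable message has to choose between strerror() and gai_strerror(). A minimal sketch, in which describe_gai_result is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L /* expose gai_strerror() under strict modes */
#include <errno.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: turn a getaddrinfo() result into a message.
 * saved_errno should be captured immediately after the call, because
 * errno only carries information when the result is EAI_SYSTEM. */
const char *describe_gai_result(int rc, int saved_errno)
{
    if (rc == 0)
        return "success";
    if (rc == EAI_SYSTEM)
        return strerror(saved_errno); /* the real cause is in errno */
    return gai_strerror(rc);          /* all other EAI_* codes */
}
```

A caller would typically write int rc = getaddrinfo(...); int saved = errno; and pass both values on, so that later library calls cannot overwrite errno before it is reported.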

But...

  • In case of a negative answer it's unclear what should be returned. Some implementations return EAI_FAIL which seems justifiable insofar as a negative answer is an "unrecoverable" failure (to resolve the name); others return EAI_NONAME which "sounds right" but conflates negative answers with other failure cases which amount to programming errors.
  • For the case where there was a positive answer with no addresses (of the requested family), EAI_NODATA used to be specified as the return value. This value is still defined in glibc gai_strerror(), but it is not mentioned in recent versions of the spec. We can't return 0 (along with an empty list of addresses) because the standard stipulates that when 0 is returned there must be at least one address returned. Either EAI_FAIL or EAI_NONAME might be returned, but either one of these conflates the no-address case with other rather different failure cases. There was an interesting discussion about this on the bind-users mailing list: https://lists.isc.org/pipermail/bind-users/2011-April/083701.html

From the perspective of the application that calls getaddrinfo() it perhaps doesn't matter that much since EAI_FAIL, EAI_NONAME and EAI_NODATA are all permanent failure codes and the causes are all permanent failures in the sense that there is no point in retrying later.
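From the application's side, the distinction drawn above can be sketched as a small classifier. The enum and function names are hypothetical, and lumping every remaining code into GAI_OTHER is a simplification chosen for the example, not something the spec prescribes. EAI_NODATA is guarded because it is not part of current POSIX and is not visible on every system.

```c
#define _POSIX_C_SOURCE 200809L /* expose the EAI_* codes under strict modes */
#include <netdb.h>

/* Hypothetical classification of getaddrinfo() results. */
enum gai_class {
    GAI_SUCCESS,     /* at least one address was returned */
    GAI_RETRY_LATER, /* temporary condition: a later attempt may succeed */
    GAI_PERMANENT,   /* no point in retrying with the same arguments */
    GAI_OTHER        /* argument, memory or system errors: caller decides */
};

enum gai_class classify_gai_result(int rc)
{
    switch (rc) {
    case 0:
        return GAI_SUCCESS;
    case EAI_AGAIN:
        return GAI_RETRY_LATER;
    case EAI_NONAME:
    case EAI_FAIL:
#if defined(EAI_NODATA) && EAI_NODATA != EAI_NONAME
    case EAI_NODATA: /* only where the implementation still defines it */
#endif
        return GAI_PERMANENT;
    default:
        return GAI_OTHER;
    }
}
```

An application following this sketch would schedule a retry only for GAI_RETRY_LATER and report GAI_PERMANENT results immediately.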

Currently (March 2014) Ubuntu and FreeBSD return EAI_NONAME in case of permanent failure.

Files specifics

A hostname might be assigned multiple addresses in the hosts file so the whole file has to be checked.

In case the hostname exists but doesn't have the address type that is asked for, should the resolver return the positive answer with no addresses, or what?

Both FQDNs and non-FQDNs can be looked up in this source.
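The need to scan the whole file can be illustrated with a small parser over hosts-format text. This is a sketch under simplifying assumptions: count_hosts_entries is a hypothetical name, the input is an in-memory string rather than the real hosts file, and names are compared case-insensitively without any qualification of short names.

```c
#define _POSIX_C_SOURCE 200809L /* expose strtok_r() and strcasecmp() */
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Count the lines in hosts-format text that assign an address to `name`,
 * scanning every line instead of stopping at the first match.
 * Returns -1 if memory for a working copy cannot be allocated. */
int count_hosts_entries(const char *text, const char *name)
{
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    int matches = 0;
    char *line_state = NULL;
    for (char *line = strtok_r(copy, "\n", &line_state); line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        char *hash = strchr(line, '#');
        if (hash != NULL)
            *hash = '\0'; /* strip a trailing comment */

        char *tok_state = NULL;
        char *addr = strtok_r(line, " \t\r", &tok_state);
        if (addr == NULL)
            continue; /* blank or comment-only line */

        /* Every token after the address is a name or alias for it. */
        for (char *alias = strtok_r(NULL, " \t\r", &tok_state);
             alias != NULL;
             alias = strtok_r(NULL, " \t\r", &tok_state)) {
            if (strcasecmp(alias, name) == 0) {
                matches++;
                break; /* one address per line; go on to the next line */
            }
        }
    }

    free(copy);
    return matches;
}
```

A name that appears on one IPv4 line and one IPv6 line yields two entries, which is why stopping at the first matching line would lose addresses.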

DNS specifics

When looking things up in DNS, there may be more than one server that can be asked for the address of the hostname. If a server does not answer, or a communication error occurs, the resolver should move on to the next server. If no server answers, the query should be retried, since it is sent over UDP and the packet could have been lost. The time between packets sent to the same server should probably increase while there is no answer. There should be an overall timeout, and only once it has been reached should the resolver give up and abort with an error.
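One possible shape for this retry policy is sketched below. It is an illustration only: the callback type, the doubling of the wait after each full pass over the servers, and the stub transport are all assumptions of the example, and time is accounted for by adding up the waits rather than by reading a clock.

```c
#include <stddef.h>

/* Hypothetical transport hook: send one query to server number `server`
 * and wait up to wait_ms for a reply. Returns 1 if a usable answer
 * arrived, 0 on silence or a communication error. */
typedef int (*query_fn)(size_t server, int wait_ms, void *ctx);

/* Ask each server in turn. After a full pass without an answer, double
 * the per-query wait and go around again until total_ms is spent.
 * Returns the index of the server that answered, or -1 on timeout. */
int query_with_backoff(query_fn query, void *ctx, size_t nservers,
                       int first_wait_ms, int total_ms)
{
    int wait_ms = first_wait_ms;
    int spent_ms = 0;

    if (nservers == 0 || first_wait_ms <= 0)
        return -1;

    while (spent_ms < total_ms) {
        for (size_t s = 0; s < nservers; s++) {
            int budget = total_ms - spent_ms;
            if (budget <= 0)
                return -1; /* overall timeout reached */
            int this_wait = wait_ms < budget ? wait_ms : budget;
            if (query(s, this_wait, ctx))
                return (int)s;
            spent_ms += this_wait; /* assume the whole wait elapsed */
        }
        /* Back off before asking the same servers again. */
        wait_ms = wait_ms <= total_ms / 2 ? wait_ms * 2 : total_ms;
    }
    return -1;
}

/* Stand-in transport for illustration: stays silent for the number of
 * calls stored in *ctx, then answers. */
int silent_then_answer(size_t server, int wait_ms, void *ctx)
{
    int *remaining_silent = ctx;
    (void)server;
    (void)wait_ms;
    if (*remaining_silent > 0) {
        (*remaining_silent)--;
        return 0;
    }
    return 1;
}
```

The open questions in this section, such as how to treat an invalid packet, would show up here as additional return values from the transport hook.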

In case of an invalid packet, should the resolver retry until the timeout, or report an error?

If the DNSSEC verification fails for whatever reason, should this be treated as a fatal error?

mDNS specifics

mDNS can only do lookups in the .local domain, and so the hostname should always be an FQDN.

mDNS queries should be retried until there is an answer, a fatal error occurs, or a timeout is reached (which is also an error). There should probably be an increase in time between packets.

Since this is based on DNS, it can also return the case of no address.

There is a problem with the .local domain since both DNS and mDNS claim to be authoritative over it, and DNS will always return a negative answer for it if a root nameserver can be reached. Changing the order of the sources doesn't solve this. Therefore, if mDNS returns no answer for a host in the .local domain, the next source should not be tried, and DNS should come after mDNS.
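Applying this rule requires deciding whether a name belongs to the .local domain in the first place. A minimal sketch of such a check follows; is_mdns_name is a hypothetical helper, and accepting an optional trailing root dot is an assumption of the example.

```c
#define _POSIX_C_SOURCE 200809L /* expose strncasecmp() under strict modes */
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Return true when `name` lies in the .local domain, with or without a
 * trailing root dot. The comparison is case-insensitive. */
bool is_mdns_name(const char *name)
{
    static const char suffix[] = ".local";
    const size_t slen = sizeof suffix - 1;
    size_t len = strlen(name);

    if (len > 0 && name[len - 1] == '.')
        len--; /* ignore the trailing root dot */
    if (len <= slen)
        return false; /* need at least one label before the suffix */
    return strncasecmp(name + len - slen, suffix, slen) == 0;
}
```

A resolver following the rule above would stop after the mDNS source for names where this check is true, rather than falling through to DNS.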

Configuration changes

There are various reasons why the configuration files can change while a process is running. getaddrinfo() should re-read those files if they have changed.
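One way to detect such changes is to compare stat() results between calls. The sketch below is an assumption about how this could be done, not a description of what any resolver actually does; the struct and function names are hypothetical. Note that a timestamp with one-second granularity can miss a rewrite that keeps the same size within the same second.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical snapshot of a configuration file's identity. */
struct config_stamp {
    dev_t dev;
    ino_t ino;
    off_t size;
    time_t mtime;
};

/* Record the current stamp of `path`.
 * Returns 0 on success, -1 if stat() fails. */
int config_stamp_read(const char *path, struct config_stamp *out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    out->dev = st.st_dev;
    out->ino = st.st_ino;
    out->size = st.st_size;
    out->mtime = st.st_mtime;
    return 0;
}

/* True if the file was replaced (device or inode changed) or modified
 * (size or modification time changed) between the two snapshots. */
bool config_stamp_differs(const struct config_stamp *a,
                          const struct config_stamp *b)
{
    return a->dev != b->dev || a->ino != b->ino ||
           a->size != b->size || a->mtime != b->mtime;
}
```

A resolver could keep one stamp per configuration file, take a fresh one at the start of each lookup, and re-parse the file only when the two differ.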

Relevant Standards

RFC3493 "Basic Socket Interface Extensions for IPv6" http://tools.ietf.org/html/rfc3493.html (obsoletes RFC2553, which in turn obsoleted RFC2133)

RFC6724 "Default Address Selection for Internet Protocol version 6 (IPv6)" https://tools.ietf.org/html/rfc6724 (obsoletes RFC3484)

POSIX Issue 7 "freeaddrinfo, getaddrinfo - get address information" http://pubs.opengroup.org/onlinepubs/9699919799/functions/freeaddrinfo.html

Fedora

Raw API designs

Resolver libraries

None: NameResolver (last edited 2014-06-19 03:02:39 by CarlosODonell)