Bug 15726 - getaddrinfo() returns incorrect status
Summary: getaddrinfo() returns incorrect status
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: network
Version: 2.17
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://pubs.opengroup.org/onlinepubs/...
Reported: 2013-07-10 18:04 UTC by Kurt Roeckx
Modified: 2016-05-16 18:18 UTC

fweimer: security-


Description Kurt Roeckx 2013-07-10 18:04:04 UTC
There have been many changes in the return values from getaddrinfo() in glibc to the point I can't rely on the return values to mean anything.

As far as I know the standard for this is rfc3493 (also known as 2553bis).

I'm going to quote those that I'm having a problem with as glibc now returns:
   [EAI_AGAIN]     The name could not be resolved at this time.  Future
                   attempts may succeed.
   [EAI_FAIL]      A non-recoverable error occurred when attempting to
                   resolve the name.
   [EAI_NONAME]    The name does not resolve for the supplied
                   parameters.  Neither nodename nor servname were
                   supplied.  At least one of these must be supplied.

And from the manpage:
      EAI_AGAIN
              The name server returned a temporary failure indication.  Try again later.
      EAI_FAIL
              The name server returned a permanent failure indication.
      EAI_NONAME
              The  node  or service is not known; or both node and service are NULL; or AI_NUMERICSERV was specified in hints.ai_flags and service was not a numeric port-number string.


What I expect:
- Things work as expected: return 0
- The nameserver replies that the hostname does not exist: EAI_FAIL
- The nameserver doesn't reply, or replies with a temporary failure: EAI_AGAIN
- You used AI_NUMERICHOST or AI_NUMERICSERV and didn't give a number: EAI_NONAME

What I think the current situation is:
- Things work as expected: return 0
- The nameserver replies that the hostname does not exist: EAI_NONAME
- The nameserver doesn't reply: EAI_NONAME
- The nameserver replies with a temporary failure: EAI_NONAME


Kurt
Comment 1 Andreas Schwab 2013-07-10 21:14:53 UTC
EAI_FAIL is only returned for an erroneous answer.  A negative answer is not erroneous.
Comment 2 Kurt Roeckx 2013-07-10 21:35:18 UTC
(In reply to Andreas Schwab from comment #1)
> EAI_FAIL is only returned for an erroneous answer.  A negative answer is not
> erroneous.

I'm not sure what you're trying to say here.  I never said anything about what I think an erroneous answer should return.  But I can understand both EAI_FAIL and EAI_AGAIN as valid return codes for that case.

But that has nothing to do with what a negative answer should have as return value.  And clearly EAI_NONAME is the wrong thing to return in any of the cases I've mentioned where it is currently returned.

It should only return EAI_NONAME in case of:
- AI_NUMERICHOST was used and nodename is not a numeric string representing an address.
- AI_NUMERICSERV was used and servname is not a numeric string representing a port
- Both nodename and servname are NULL
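
The three argument-related cases listed above can be checked without any nameserver. The following is only a sketch: lookup_status() is a hypothetical helper, not part of glibc or any other library, and it assumes a system that provides AI_NUMERICSERV.

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not a library function): call getaddrinfo() with the
   given ai_flags and return only its status code, freeing any result list. */
int lookup_status(const char *node, const char *service, int flags)
{
    struct addrinfo hints = {0};
    struct addrinfo *result = NULL;

    hints.ai_family = AF_UNSPEC;   /* accept IPv4 and IPv6 */
    hints.ai_flags = flags;

    int status = getaddrinfo(node, service, &hints, &result);
    if (status == 0)
        freeaddrinfo(result);      /* only the status is of interest here */
    return status;
}
```

With this helper, lookup_status(NULL, NULL, 0), lookup_status("not-numeric.example", NULL, AI_NUMERICHOST) and lookup_status("127.0.0.1", "not-a-port", AI_NUMERICHOST | AI_NUMERICSERV) exercise the three cases; none of them needs to contact a nameserver, so the result does not depend on the network state.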
Comment 3 Andreas Schwab 2013-07-10 21:44:14 UTC
The correct value for a negative answer is EAI_NONAME, not EAI_FAIL.
Comment 4 Kurt Roeckx 2013-07-10 21:45:50 UTC
(In reply to Andreas Schwab from comment #3)
> The correct value for a negative answer is EAI_NONAME, not EAI_FAIL.

Can you at least agree that all other cases it now returns EAI_NONAME for are wrong?
Comment 5 Zack Weinberg 2013-07-11 13:26:50 UTC
http://pubs.opengroup.org/onlinepubs/9699919799/ is the X/Open Issue 7 spec for getaddrinfo, which is a little clearer about EAI_NONAME.  It's the exact same text but with a paragraph break inserted at a key point:


    [EAI_AGAIN]
        The name could not be resolved at this time. Future attempts may succeed.
    [EAI_FAIL]
        A non-recoverable error occurred when attempting to resolve the name.
    [EAI_NONAME]
        The name does not resolve for the supplied parameters.

        Neither nodename nor servname were supplied. At least one of these
        shall be supplied.
    [EAI_SYSTEM]
        A system error occurred; the error code can be found in errno.

I read that as specifying that EAI_NONAME is the appropriate error return *both* when the name does not resolve (== NXDOMAIN at the DNS level), *and* when "neither nodename nor servname were supplied".

I think it's kind of unfortunate that EAI_NONAME is overloaded this way; it would have been better to have a code specifically for a bad argument combination (like the existing EAI_BADFLAGS).  Also, as an application programmer I have no idea how I'm supposed to interpret EAI_FAIL.  At least with EAI_SYSTEM there is an errno code to give additional guidance.

For context, as I understand it this grew out of a conversation about what happens when you try to resolve a name during boot, and the network isn't yet configured enough to provide name service.  On the operational level it seems desirable for that always to produce EAI_AGAIN.
Comment 7 Thomas Hood 2013-07-11 15:17:21 UTC
Kurt's interpretation appears prima facie to be correct. The definition of EAI_NONAME is:

    The name does not resolve for the supplied parameters.
    Neither nodename nor servname were supplied. At least
    one of these must be supplied.

I.e., getaddrinfo() was called with invalid arguments, period.

Compare with other *NIXes.

OpenBSD
-------
The code returns EAI_NONAME in two cases. First:
		if (as->as.ai.hostname == NULL &&
		    as->as.ai.servname == NULL) {
			ar->ar_gai_errno = EAI_NONAME;
Second:
		[...]
		if (ai->ai_flags & AI_NUMERICHOST) {
			ar->ar_gai_errno = EAI_NONAME;

OpenBSD man page for gai_strerror:

   EAI_NONAME  hostname or servname not provided, or not known

NetBSD
------
NetBSD getaddrinfo() returns EAI_NONAME in two kinds of cases. First:
	if (hostname == NULL && servname == NULL)
		return EAI_NONAME;
Second:
	[...]
	if (pai->ai_flags & AI_NUMERICHOST)
		ERR(EAI_NONAME);

NetBSD man page for getaddrinfo:

     EAI_NONAME  nodename nor servname provided, or not known.

A relevant NetBSD problem report:

   http://gnats.netbsd.org/44915

Solaris
-------
Solaris man page for getaddrinfo():
      EAI_NONAME  Neither nodename nor servname is provided or known.

HP Tru64 UNIX
-------------
  [EAI_NONAME]
      The node name cannot be resolved with the supplied parameters.
      You did not pass either the nodename or servname parameter.
      You must pass at least one.

bind-users
----------
There's an interesting discussion of this issue on the bind-users mailing list.

   https://lists.isc.org/pipermail/bind-users/2011-April/083701.html
Comment 8 Kurt Roeckx 2013-07-11 16:54:33 UTC
So I've been looking into the case of a negative response. It's clear to me that it's not documented properly in the standard(s), and I can see how you can interpret either EAI_FAIL or EAI_NONAME as the proper answer.  And I have no problem with either as the reply.

Looking at the history of it, rfc 2133 and 2553 had:
      EAI_FAIL        non-recoverable failure in name resolution
      EAI_NODATA      no address associated with nodename
      EAI_NONAME      nodename nor servname provided, or not known

Looking at old wrappers around gethostbyname, I find code like this:
	hptr = gethostbyname(host);
	if (hptr == NULL) {
		switch (h_errno) {
			case HOST_NOT_FOUND:	return(EAI_NONAME);
			case TRY_AGAIN:		return(EAI_AGAIN);
			case NO_RECOVERY:	return(EAI_FAIL);
			case NO_DATA:		return(EAI_NODATA);
			default:		return(EAI_NONAME);
		}
	}

That is code written by W. Richard Stevens, who was, among other things, a co-author of the RFC.
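
For reference, the same mapping as a self-contained function. This is a sketch: the name h_errno_to_eai() is made up here, and since EAI_NODATA is not in RFC 3493 or POSIX (glibc only exposes it with _GNU_SOURCE), the fallback to EAI_NONAME when it is missing is an assumption of this sketch, not something the quoted wrapper does.

```c
#include <netdb.h>

/* Hypothetical helper: translate a gethostbyname() h_errno value into a
   getaddrinfo()-style status, following the wrapper quoted above. */
int h_errno_to_eai(int herr)
{
    switch (herr) {
    case HOST_NOT_FOUND:
        return EAI_NONAME;   /* no such host is known */
    case TRY_AGAIN:
        return EAI_AGAIN;    /* temporary, possibly transient error */
    case NO_RECOVERY:
        return EAI_FAIL;     /* unexpected, non-recoverable server failure */
    case NO_DATA:
#ifdef EAI_NODATA
        return EAI_NODATA;   /* name exists but has no address */
#else
        return EAI_NONAME;   /* assumption: closest standard code */
#endif
    default:
        return EAI_NONAME;
    }
}
```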

Those error codes are defined as:
       HOST_NOT_FOUND

	      No such host is known.

       NO_DATA
	      The  server  recognized the request and the name, but no address
	      is available. Another type of request to the name server for the
	      domain might return an answer.

       NO_RECOVERY

	      An unexpected server failure occurred which cannot be recovered.

       TRY_AGAIN
	      A  temporary  and  possibly  transient error occurred, such as a
	      failure of a server to respond.

Kurt
Comment 9 Thomas Hood 2013-07-12 10:03:36 UTC
(Drat — sorry for the duplicate comment #6. If someone has the rights, please delete comment #6.)

In general I think we should try to follow the RFC — preferably, if reasonable, as it has already been interpreted by other significant unixes.

The most important distinction is that between (1) not receiving an answer (whether this be because the function is not called properly, the network is down, there are no more file descriptors, whatever); and (2) receiving an answer (whether it be an answer containing the requested information or an answer containing the information that the name in question does not exist in the available sources).

1. No answer received: AGAIN, BADFLAGS, FAIL, FAMILY, MEMORY, SERVICE, SOCKTYPE, SYSTEM
2. Answer was received: 0

I omit ADDRFAMILY and NODATA which are not mentioned in RFC3493.

The question about NONAME is mainly the question which of these classes it falls into. 

RFC3493 says very clearly that it must be returned at least in the case when neither nodename nor servname were supplied.

Zack Weinberg wrote:
> I read that as specifying that EAI_NONAME is the appropriate error
> return *both* when the name does not resolve (== NXDOMAIN at the
> DNS level), *and* when "neither nodename nor servname were supplied".
>
> I think it's kind of unfortunate that EAI_NONAME is overloaded this way

Agreed, although it is true in some sense that there is no such domain name in DNS as the null name.

> Also, as an application programmer I have no idea how I'm supposed
> to interpret EAI_FAIL.  At least with EAI_SYSTEM there is an errno
> code to give additional guidance.

FAIL means that a remote fault occurred whereas SYSTEM means that a local fault occurred? I can imagine that for many clients this won't make any difference.
Comment 10 Kurt Roeckx 2013-07-13 12:23:20 UTC
(In reply to Thomas Hood from comment #9)
> In general I think we should try to follow the RFC — preferably, if
> reasonable, as it has already been interpreted by other significant unixes.

If the RFC is open to interpretation, I think we should try to get it (or POSIX) fixed.

> The most important distinction is that between (1) not receiving an answer
> (whether this be because the function is not called properly, the network is
> down, there are no more file descriptors, whatever); and (2) receiving an
> answer (whether it be an answer containing the requested information or an
> answer containing the information that the name in question does not exist
> in the available sources).
> 
> 1. No answer received: AGAIN, BADFLAGS, FAIL, FAMILY, MEMORY, SERVICE,
> SOCKTYPE, SYSTEM
> 2. Answer was received: 0

I'm not sure I understand what you're trying to say here.  Do both 1's and 2's match?  Note that if it returns 0 it should have at least 1 address as result.

I also think the return value can be AGAIN in case the server does reply, but for instance returns a temporary failure, or when we only got the A reply and not the AAAA reply.

> RFC3493 says very clearly that it must be returned at least in the case when
> neither nodename nor servname were supplied.

For some cases it's very explicit about which error code should be returned when.  For the rest it just lists possible error codes, without saying when each should be returned.

> Zack Weinberg wrote:
> > I read that as specifying that EAI_NONAME is the appropriate error
> > return *both* when the name does not resolve (== NXDOMAIN at the
> > DNS level), *and* when "neither nodename nor servname were supplied".
> >
> > I think it's kind of unfortunate that EAI_NONAME is overloaded this way

I first thought it wasn't overloaded this way and was only for cases where you
supplied bad parameters.  Now I'm not sure anymore.


Kurt
Comment 11 Thomas Hood 2013-07-14 09:44:07 UTC
> > 1. No answer received: AGAIN, BADFLAGS, FAIL, FAMILY, MEMORY, SERVICE,
> > SOCKTYPE, SYSTEM
> > 2. Answer was received: 0
>
> I'm not sure I understand what you're trying to say here.  Do both 1's and
> 2's match?  Note that if it returns 0 it should have at least 1 address
> as result.

When status 0 is received it means that an answer has been received to the question (simplifying a bit) "What are the addresses corresponding to this nodename + servname?"

"There is no address corresponding to the given nodename + servname" is also an answer.

But "Sorry, we couldn't find out what addresses correspond to this nodename + servname because we ran out of file descriptors" is not what I am calling an answer.

Put another way, when a client receives an answer, the client has the information that was stored in the name services; when the client receives no answer, the client does not have the information that was stored in the name services. The client may want to react differently to these two outcomes.

When AGAIN, BADFLAGS, FAIL, FAMILY, MEMORY, SERVICE, SOCKTYPE, SYSTEM are returned, it means that no answer was received.

When status 0 is returned, it means that an answer was received.

Q1. Now, what does it mean when NONAME is returned? If both nodename and servname were null then it means just that. And if nodename and servname were not null? Is NONAME ever returned in that case, and if so, under what circumstances?

Q2. What is returned when the answer is "There is no address corresponding to the given nodename + servname", i.e., in the case of DNS, NXDOMAIN?

I would be inclined to come out and say that NONAME should be returned under exactly the following circumstances

    Either both nodename and servname are null
    or there is no address corresponding to the given nodename+servname

if it weren't for the fact that the RFC doesn't clearly say that this is what NONAME means, and the fact that I haven't yet looked closely enough at how other unices have interpreted the RFC.
Comment 12 Rich Felker 2013-07-14 16:06:48 UTC
Conceptually, this whole topic is very simple. getaddrinfo has two "successful" result possibilities, 0 and EAI_NONAME.

0 means a successful query was performed and returned a result.

EAI_NONAME means a successful query was performed and determined that the queried name does not (by hostname lookup) or cannot (by virtue of being an invalid ip string) exist.

All other result codes indicate to the application that something went wrong during the name resolving process, and that the result is indeterminate.
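
Expressed as code, this reading gives three outcomes. The sketch below is only an illustration of the comment above; the enum and function names are invented for it.

```c
#include <netdb.h>

/* Hypothetical names, introduced only for this sketch. */
enum lookup_outcome {
    LOOKUP_FOUND,          /* query succeeded and returned addresses */
    LOOKUP_NOT_FOUND,      /* query succeeded; the name does not or cannot exist */
    LOOKUP_INDETERMINATE   /* something went wrong; nothing is known */
};

/* Classify a getaddrinfo() status: 0 and EAI_NONAME are the two
   "successful" results, every other code is indeterminate. */
enum lookup_outcome classify_status(int status)
{
    if (status == 0)
        return LOOKUP_FOUND;
    if (status == EAI_NONAME)
        return LOOKUP_NOT_FOUND;
    return LOOKUP_INDETERMINATE;
}
```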
Comment 13 Carlos O'Donell 2013-07-18 07:03:09 UTC
(In reply to Rich Felker from comment #12)
> Conceptually, this whole topic is very simple. getaddrinfo has two
> "successful" result possibilities, 0 and EAI_NONAME.
> 
> 0 means a successful query was performed and returned a result.
> 
> EAI_NONAME means a successful query was performed and determined that the
> queried name does not (by hostname lookup) or cannot (by virtue of being an
> invalid ip string) exist.
> 
> All other result codes indicate to the application that something went wrong
> during the name resolving process, and that the result is indeterminate.

What about EAI_NODATA, which glibc still uses? It would seem that EAI_NODATA is more likely to be returned in the current implementation than EAI_NONAME (which is mostly returned for the parameter errors discussed).
Comment 14 Rich Felker 2013-07-18 14:56:12 UTC
On Thu, Jul 18, 2013 at 07:03:09AM +0000, carlos at redhat dot com wrote:
> What about EAI_NODATA which glibc still uses? It would seem that EAI_NODATA is
> more likely to be returned in the current implementation than EAI_NONAME (which is
> mostly returned for the parameter errors discussed).

EAI_NODATA is not even in the specification for this function; I don't
know why it exists at all, as it could badly confuse conforming
applications which don't expect a positive "successful query with no
result" return code other than the specified EAI_NONAME.

On the other hand, I see the motivation, at least according to the
documentation. The Linux man page suggests glibc is using EAI_NONAME
when there is no such host/domain name, and EAI_NODATA when the
hostname exists but does not have any A or AAAA records (nor a CNAME
pointing to one that does).

Is it behaving differently from how it's documented?
Comment 15 Carlos O'Donell 2013-07-18 17:43:22 UTC
(In reply to Rich Felker from comment #14)
> On Thu, Jul 18, 2013 at 07:03:09AM +0000, carlos at redhat dot com wrote:
> > What about EAI_NODATA which glibc still uses? It would seem that EAI_NODATA is
> > more likely to be returned in the current implementation than EAI_NONAME (which is
> > mostly returned for the parameter errors discussed).
> 
> EAI_NODATA is not even in the specification for this function; I don't
> know why it exists at all, as it could badly confuse conforming
> applications which don't expect a positive "successful query with no
> result" return code other than the specified EAI_NONAME.

EAI_NODATA is defined under GNU source and was part of the getaddrinfo specification at one point and was then removed.

It is not documented anywhere in glibc (classic glibc).

The linux man pages project says:
~~~
EAI_NODATA
           The specified network host exists, but does not have
           any network addresses defined.
~~~

> On the other hand, I see the motivation, at least according to the
> documentation. The Linux man page suggests glibc is using EAI_NONAME
> when there is no such host/domain name, and EAI_NODATA when the
> hostname exists but does not have any A or AAAA records (nor a CNAME
> pointing to one that does).
> 
> Is it behaving differently from how it's documented?

It behaves as documented.

For example the following pseudo code:
  hints.ai_family = AF_INET6;
  print_errorcode (getaddrinfo ("www.redhat.com", NULL, &hints, &result));

Would print:
  EAI_NODATA

Since the host exists but it has no AAAA record.

If you remove /etc/resolv.conf it prints:
  EAI_NONAME

As expected since given your system configuration there is no such host/domain name.

Does that make sense?
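
A compilable version of the pseudo code above might look like the following sketch. status_for_family() is a hypothetical helper name, and what it returns for a real hostname depends on the resolver configuration and the DNS data at the time of the call.

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: look up `host` restricted to one address family
   (AF_INET, AF_INET6 or AF_UNSPEC) and return the getaddrinfo() status. */
int status_for_family(const char *host, int family)
{
    struct addrinfo hints = {0};
    struct addrinfo *result = NULL;

    hints.ai_family = family;

    int status = getaddrinfo(host, NULL, &hints, &result);
    if (status == 0)
        freeaddrinfo(result);   /* the addresses themselves are not needed */
    return status;
}
```

A caller could then print gai_strerror(status_for_family("www.redhat.com", AF_INET6)) to see which code the local resolver produces.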
Comment 16 Thomas Hood 2013-07-30 07:05:33 UTC
Ubuntu has just released a new libc6 package which returns -2 (EAI_NONAME) both when the nameserver can't be reached and when the name does not exist. Formerly Ubuntu libc6 returned -11 in both cases. It returns -3 (EAI_AGAIN) if the network interface is down. Details here:

    https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1154599
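
Since the numeric values of the EAI_* codes differ between systems, reports are easier to compare when they use the symbolic names. A small sketch for that, with eai_symbol() being a hypothetical helper and EAI_NODATA treated as optional because it is not in the current specification:

```c
#include <netdb.h>

/* Hypothetical helper: return the symbolic name of a getaddrinfo() status.
   An if-chain is used instead of a switch so that it still compiles on a
   system where two codes happen to share a value. */
const char *eai_symbol(int status)
{
    if (status == 0)            return "0";
    if (status == EAI_AGAIN)    return "EAI_AGAIN";
    if (status == EAI_BADFLAGS) return "EAI_BADFLAGS";
    if (status == EAI_FAIL)     return "EAI_FAIL";
    if (status == EAI_FAMILY)   return "EAI_FAMILY";
    if (status == EAI_MEMORY)   return "EAI_MEMORY";
    if (status == EAI_NONAME)   return "EAI_NONAME";
    if (status == EAI_SERVICE)  return "EAI_SERVICE";
    if (status == EAI_SOCKTYPE) return "EAI_SOCKTYPE";
    if (status == EAI_SYSTEM)   return "EAI_SYSTEM";
#ifdef EAI_NODATA
    if (status == EAI_NODATA)   return "EAI_NODATA";   /* non-standard */
#endif
    return "unknown";
}
```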
Comment 17 Siddhesh Poyarekar 2013-07-30 07:16:56 UTC
That was bug 15339, which is fixed in 2.18.
Comment 18 Rich Felker 2013-07-30 12:07:07 UTC
> --- Comment #16 from Thomas Hood <jdthood at gmail dot com> ---
> Ubuntu has just released a new libc6 package which returns -2 (EAI_NONAME) both
> when the nameserver can't be reached and when the name does not exist. Formerly

This is wrong too. The nameserver being unreachable does not tell you
that the name does not exist. The right behavior is very simple; why
can nobody get it right?
Comment 19 Carlos O'Donell 2013-07-31 04:07:53 UTC
(In reply to Rich Felker from comment #18)
> > --- Comment #16 from Thomas Hood <jdthood at gmail dot com> ---
> > Ubuntu has just released a new libc6 package which returns -2 (EAI_NONAME) both
> > when the nameserver can't be reached and when the name does not exist. Formerly
> 
> This is wrong too. The nameserver being unreachable does not tell you
> that the name does not exist. The right behavior is very simple; why
> can nobody get it right?

My plan is to write a technical memo on this as a glibc wiki page and use that to educate people about the correct behaviour. Additionally it will serve as rationale for RFC interpretations and to define our choices (or adjust them later). If you want to start drafting something I'd be more than happy :-)
Comment 20 Kurt Roeckx 2013-08-09 14:35:30 UTC
(In reply to Carlos O'Donell from comment #19)
> My plan is to write a technical memo on this as a glibc wiki page and use
> that to educate people about the correct behaviour. Additionally it will
> serve as rationale for RFC interpretations and to define our choices (or
> adjust them later). If you want to start drafting something I'd be more than
> happy :-)

I'm willing to start on this.  What kind of things do you think it should cover?
Where do I put it exactly?
Comment 21 Carlos O'Donell 2013-08-09 16:04:53 UTC
(In reply to Kurt Roeckx from comment #20)
> (In reply to Carlos O'Donell from comment #19)
> > My plan is to write a technical memo on this as a glibc wiki page and use
> > that to educate people about the correct behaviour. Additionally it will
> > serve as rationale for RFC interpretations and to define our choices (or
> > adjust them later). If you want to start drafting something I'd be more than
> > happy :-)
> 
> I'm willing to start on this.  What kind of things do you think it should
> cover?
> Where do I put it exactly?

(1) Create a glibc wiki account.
http://sourceware.org/glibc/wiki/
(2) Get someone to vouch for you and add you to the editor list (anti-spam measure). I can add you once you confirm your account name.
http://sourceware.org/glibc/wiki/EditorGroup
(3) Create a new page on the wiki and start adding information.
e.g. "NameResolver" (bikeshed).
Comment 22 Kurt Roeckx 2013-08-13 15:38:05 UTC
(In reply to Carlos O'Donell from comment #21)
> (3) Create a new page on the wiki and start adding information.
> e.g. "NameResolver" (bikeshed).

So I started on:
http://sourceware.org/glibc/wiki/NameResolver

More to come later.  Any feedback welcome.
Comment 23 Carlos O'Donell 2013-08-13 19:25:04 UTC
(In reply to Kurt Roeckx from comment #22)
> (In reply to Carlos O'Donell from comment #21)
> > (3) Create a new page on the wiki and start adding information.
> > e.g. "NameResolver" (bikeshed).
> 
> So I started on:
> http://sourceware.org/glibc/wiki/NameResolver
> 
> More to come later.  Any feedback welcome.

Great start. I'm happy to see people helping put this document together. I'll be helping here over the next couple of months.
Comment 24 Kurt Roeckx 2013-08-24 18:08:15 UTC
(In reply to Kurt Roeckx from comment #22)
> So I started on:
> http://sourceware.org/glibc/wiki/NameResolver
> 
> More to come later.  Any feedback welcome.

I've added more things to that page.  I'm not sure what else to add.
Comment 25 Carlos O'Donell 2013-08-28 04:53:04 UTC
(In reply to Kurt Roeckx from comment #24)
> (In reply to Kurt Roeckx from comment #22)
> > So I started on:
> > http://sourceware.org/glibc/wiki/NameResolver
> > 
> > More to come later.  Any feedback welcome.
> 
> I've added more things to that page.  I'm not sure what else to add.

What you've got is a great start. I'll get to reviewing this as part of my work on getaddrinfo.
Comment 26 Rich Felker 2013-08-28 09:04:55 UTC
I disagree with this:

"In case there was a negative answer it's unclear what should be returned. Some implementations return EAI_FAIL, others EAI_NONAME."

In the case of a negative answer, EAI_NONAME is the only correct answer. You have an answer: the name does not resolve. POSIX directly specifies this as:

[EAI_NONAME]
The name does not resolve for the supplied parameters. [or]
Neither nodename nor servname were supplied. At least one of these shall be supplied.

The only time EAI_FAIL would be appropriate is when you don't have a local system reason for the failure that would be reportable in errno, but rather an error reported by the nameserver. For instance (this one actually hit me recently due to misconfiguration in a resolv.conf file) the nameserver could give response code 5 (Refused) if it's authoritative-only and you're sending recursive requests to it, or if it's configured to be accessible only to certain client IP addresses. This condition should probably be ignored however if there are other nameservers to fall back to. As for the other response codes, most of them seem to be things that either should not happen, or that would warrant EAI_AGAIN rather than EAI_FAIL.
Comment 27 Kurt Roeckx 2013-08-28 21:20:00 UTC
(In reply to Rich Felker from comment #26)
> I disagree with this:
> 
> "In case there was a negative answer it's unclear what should be returned.
> Some implementations return EAI_FAIL, others EAI_NONAME."
> 
> In the case of a negative answer, EAI_NONAME is the only correct answer. You
> have an answer: the name does not resolve. POSIX directly specifies this as:

There clearly is disagreement about what the behavior should be, which is why I didn't say what should be returned.  I can see arguments for both ways.

> [EAI_NONAME]
> The name does not resolve for the supplied parameters. [or]
> Neither nodename nor servname were supplied. At least one of these shall be
> supplied.

I can interpret the "for the supplied parameters" as meaning the AI_NUMERIC* cases, since it seems the reason for the failure is the parameters.  And the text above indicates in which cases EAI_NONAME should be returned.

But I can also interpret it as just not resolving, and that it's not a limiting list of cases.  There is also Stevens's implementation that returns EAI_NONAME in this case.

EAI_FAIL on the other hand talks about a "non-recoverable error", which can be just about anything.

> For instance (this one actually hit me
> recently due to misconfiguration in a resolv.conf file) the nameserver could
> give response code 5 (Refused) if it's authoritative-only and you're sending
> recursive requests to it, or if it's configured to be accessible only to
> certain client IP addresses. This condition should probably be ignored
> however if there are other nameservers to fall back to. As for the other
> response codes, most of them seem to be things that either should not
> happen, or that would warrant EAI_AGAIN rather than EAI_FAIL.

We should just not use that nameserver anymore (for the current lookup), and if we find no nameservers that give us an answer it should result in "no answer" / EAI_FAIL.  But I'm not sure how much detail we should go into with all those things.
Comment 28 Rich Felker 2013-08-30 01:10:01 UTC
On Wed, Aug 28, 2013 at 09:20:00PM +0000, kurt at roeckx dot be wrote:
> I can interpret the "for the supplied parameters" as meaning the AI_NUMERIC*
> cases, since it seems the reason for the failure is the parameters.  And the
> text above indicates in which cases EAI_NONAME should be returned.

The "supplied parameters" would include AI_NUMERIC* as well as the
requested address family.

> EAI_FAIL on the other hand talks about a "non-recoverable error", which can be
> just about anything.

A negative response is not an "error" at all, much less a
non-recoverable error. Even if it were, it's clearly not the kind of
thing EAI_FAIL was intended to represent.

> We should just not use that nameserver anymore (for the current lookup), and if
> we find no nameservers that give us an answer it should result in "no answer" /
> EAI_FAIL.  But I'm not sure how detailed we should go with all those things.

If the nameserver gives response code 5, you could discontinue using
it for the current request, but it probably shouldn't be dropped in
general -- for example, the nameserver might be denying recursion but
serving a local domain only. In any case, EAI_FAIL is the right
response here if the only nameserver(s) available give(s) response
code 5, since it's not a temporary condition, there's no way to work
around it, and it's neither a positive result nor a negative result.
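
One way to write down the mapping argued for here, as a sketch only. The RCODE numbers are those of RFC 1035; the function name, the choice of EAI_AGAIN for SERVFAIL, and the EAI_FAIL default for the remaining codes are assumptions of this sketch rather than anything the specifications state. It describes the final status once every available nameserver has given the same response code, and it assumes a NOERROR answer actually carried addresses.

```c
#include <netdb.h>

/* DNS response codes as defined in RFC 1035, section 4.1.1. */
enum dns_rcode {
    RCODE_NOERROR  = 0,
    RCODE_FORMERR  = 1,
    RCODE_SERVFAIL = 2,
    RCODE_NXDOMAIN = 3,
    RCODE_NOTIMP   = 4,
    RCODE_REFUSED  = 5
};

/* Hypothetical helper: map the response code that all consulted
   nameservers returned to a getaddrinfo() status. */
int rcode_to_eai(int rcode)
{
    switch (rcode) {
    case RCODE_NOERROR:
        return 0;            /* positive answer (addresses assumed present) */
    case RCODE_NXDOMAIN:
        return EAI_NONAME;   /* negative answer: the name does not exist */
    case RCODE_SERVFAIL:
        return EAI_AGAIN;    /* assumption: treated as temporary */
    case RCODE_REFUSED:
        return EAI_FAIL;     /* not temporary, neither positive nor negative */
    default:
        return EAI_FAIL;     /* assumption: other codes are non-recoverable */
    }
}
```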
Comment 29 Kurt Roeckx 2013-08-30 07:11:23 UTC
(In reply to Rich Felker from comment #28)
> > We should just not use that nameserver anymore (for the current lookup), and if
> > we find no nameservers that give us an answer it should result in "no answer" /
> > EAI_FAIL.  But I'm not sure how detailed we should go with all those things.
> 
> If the nameserver gives response code 5, you could discontinue using
> it for the current request, but it probably shouldn't be dropped in
> general -- for example, the nameserver might be denying recursion but
> serving a local domain only. In any case, EAI_FAIL is the right
> response here if the only nameserver(s) available give(s) response
> code 5, since it's not a temporary condition, there's no way to work
> around it, and it's neither a positive result nor a negative result.

I meant "no answer" / EAI_AGAIN.

But I can also see how EAI_FAIL might be a good value for it.  One of the reasons for EAI_FAIL could be "local configuration error".
Comment 30 Thomas Hood 2013-09-01 00:46:34 UTC
(In reply to Kurt Roeckx from comment #27)
> There clearly is disagreement about what the behavior should be, which is
> why I didn't say what should be returned.  I can see arguments for both ways.
> 
> > [EAI_NONAME]
> > The name does not resolve for the supplied parameters. [or]
> > Neither nodename nor servname were supplied. At least one of these shall be
> > supplied.
> 
> I can interpret the "for the supplied parameters" as meaning the AI_NUMERIC*
> > cases, since it seems the reason for the failure is the parameters.  And
> the text above indicates in which cases EAI_NONAME should be returned.

Kurt, can you explain this interpretation more fully?

Rereading the quoted code snippets in my comment #7 I see support for the proposition that in other UNIXes EAI_NONAME is returned either when nodename and servname are NULL or in another case where the AI_NUMERIC flag is set.
Comment 31 Thomas Hood 2013-09-01 02:11:46 UTC
(I wrote:
> or in another case where the AI_NUMERIC flag is set.

I should have said "or in another case where an AI_NUMERIC* flag is set.")


The RFC is unclear. So the spec will ultimately have to be clarified.  Instead of reading tea leaves it might be better to establish what we think a good spec would be and then take this to POSIX. (I don't know what that entails, though.)
Comment 32 Thomas Hood 2014-03-11 22:11:26 UTC
From the point of view of the user of getaddrinfo() it is mainly going to be important to distinguish the following three outcomes of the getaddrinfo() call.

    A. Success => Go ahead and use the returned address
    B. Temporary failure => Retry
    C. Permanent failure => Abort

Although there is ambiguity in the specs, variability in the implementations and lack of consensus in this discussion, am I right in saying that it is correct to program the application as follows?

    0 => Success => Go ahead and use the returned address
    EAI_AGAIN => Temporary failure => Retry
    EAI_FAIL | EAI_NONAME => Permanent failure => Abort

With reference to the last line: EAI_FAIL is explicitly specified to indicate a permanent failure. The "bad parameters" cause of EAI_NONAME is also a permanent failure. And whether it should result in EAI_FAIL or EAI_NONAME or EAI_NODATA, the absence of the hostname from the name service or absence of addresses for that hostname in the name service is also a permanent condition which can be regarded as a permanent failure to obtain an address for that name.
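
In application code, the scheme above could be sketched as follows. The names next_action() and resolve_with_retry() are hypothetical, treating every code other than 0 and EAI_AGAIN as a permanent failure is an assumption that goes slightly beyond the three lines above, and the backoff delays are arbitrary.

```c
#include <netdb.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names, introduced only for this sketch. */
enum gai_action { GAI_USE_RESULT, GAI_RETRY, GAI_ABORT };

/* Decide what the application should do with a getaddrinfo() status. */
enum gai_action next_action(int status)
{
    if (status == 0)
        return GAI_USE_RESULT;   /* success: use the returned addresses */
    if (status == EAI_AGAIN)
        return GAI_RETRY;        /* temporary failure */
    return GAI_ABORT;            /* EAI_FAIL, EAI_NONAME and, by assumption,
                                    every other code */
}

/* Call getaddrinfo() up to max_attempts times, sleeping 1, 2, 4, ... seconds
   (capped at 32) between attempts that ended in EAI_AGAIN. Returns the last
   status; on 0 the caller owns *res and must call freeaddrinfo(). */
int resolve_with_retry(const char *node, const char *service,
                       const struct addrinfo *hints, struct addrinfo **res,
                       int max_attempts)
{
    int status = EAI_AGAIN;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        status = getaddrinfo(node, service, hints, res);
        if (next_action(status) != GAI_RETRY)
            break;
        if (attempt + 1 < max_attempts)
            sleep(attempt < 5 ? 1u << attempt : 32u);
    }
    return status;
}
```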
Comment 33 Thomas Hood 2014-03-20 16:51:14 UTC
New related Ubuntu bug report: "With 'hosts: mdns4' in nsswitch.conf, getaddrinfo() returns -5 (EAI_NODATA) when network interface is down"

    https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/1295229
Comment 34 Thomas Hood 2014-03-21 11:26:04 UTC
I just checked FreeBSD and it always returns 8 (EAI_NONAME) when the name cannot be resolved, whether it's because the name doesn't exist or the nameserver can't be reached or the network interface is down.