This is the mail archive of the
mailing list for the glibc project.
NSS error reporting (bug 20532)
- From: Florian Weimer <fweimer at redhat dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 3 Aug 2017 18:28:51 +0200
- Subject: NSS error reporting (bug 20532)
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=fweimer at redhat dot com
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com A64F07EA84
I'm looking again at bug 20532 and related issues.
The core problem is that the NSS service functions have many different
ways of reporting result status:
 The return value, enum nss_status.
 errno value via the int *errnop argument
 direct errno update
 h_errno value via the int *herrnop argument (for host lookups)
 direct h_errno update (for host lookups)
 empty results (while otherwise reporting success)
The overall behavior is extremely inconsistent even within glibc.
(A) The general assumption seems to be that NSS_STATUS_SUCCESS and
NSS_STATUS_NOTFOUND  do not have any other error codes associated
with them (i.e., the other error flags are indeterminate). This is
explicitly enforced by the get*by*_r wrapper in glibc, which discards
errno for the success case.
(B) For NSS_STATUS_TRYAGAIN , *errnop  equals ERANGE, and, for
hosts, lookups, *errnop equals NETDB_INTERNAL, is used to request retry
with a larger buffer. Therefore NSS_STATUS_TRYAGAIN requires a valid
*errnop  value (non-host lookup) or a valid *herrnop  value (host
lookup), or a valid *herrnop  value and valid *errnop  value (host
lookup with *herrnop equals NETDB_INTERNAL).
(C) NSS_STATUS_UNAVAIL  seems to follow the same rules about validity
(D) getaddrinfo, in gethosts, unconditionally uses *errnop  and
*herrnop , without checking the return value, in violation of (B).
(E) In contrast to (D), getaddrinfo largely follows (B) for the
AF_UNSPEC/fct4 NSS function call.
(F) getaddrinfo manually propagates *herrnop  to h_errno in many
cases, for internal use and perhaps for use by the caller of getaddrinfo
(but this is not part of POSIX and underspecified).
(G) Googles fix for bug 20532 touches getaddrinfo code which reads
h_errno  directly (not *herrnop ).
(H) getaddrinfo documentation and actually callers suggest that errno is
only valid if getaddrinfo fails with EAI_SYSTEM.
(I) getaddrinfo, besides the handling for (B)/(D)/(E), generally assumes
that the NSS service function has set errno directly , and does not
propagate it to the caller even for EAI_SYSTEM.
(J) Empty results  cannot be reported by most of the get* NSS
functions (e.g., a struct passwd * indicates a single). For
getaddrinfo, the struct addrinfo * list includes at least one element if
getaddrinfo is successful (says POSIX). So that seems to rule out 
(K) _nss_files_get*_r saves and restores errno around the
internal_setent call. This means that even a failure with
NSS_STATUS_UNAVAIL or NSS_STATUS_TRYAGAIN does not result in a proper
*errnop  or errno  value. This could cause problems for the
get*ent implementation in glibc if the initial errno value was ERANGE
(because there is no way for the caller to tell that the error occurred
during the internal_setent phase).
(L) Carlos added NSS_STATUS_NOTFOUND with *errnop equals 0 as a
documented special case to the manual, in commit
d4e301c5c65393837e438b6d81feabfbfde7b9c7. This contradicts (A).
NSS_STATUS_NOTFOUND is handled implicitly by __nss_next2, which does not
have access to the errno value, so I do not understand how this could work.
(M) NSS_STATUS_RETURN does not seem to have associated error codes, like
NSS_STATUS_SUCCESS. The manual refuses to document this status code,
and I have not tried to reverse-engineer its behavior.
(N) Many callers alias *errnop  and errno  (passing &errno as
errnop), but this is not fully consistent.
(O) Fewer callers alias *herrnop  and h_errnop, and most of them are
(Q) The glibc get*ent wrappers with a herrnop parameter appear to ignore
that and use &h_errno instead (which looks like a genuine bug).
I suppose this list could go on for quite some time. The question is in
which direction we want to move this.
I think the three values (enum nss_status return value , *errnop ,
*herrnop ) should behave roughly like this (as expressed in Standard
ML, I hope anyone reading thus far can understand this):
type errno = int
datatype h_errno =
| NETDB_INTERNAL of errno
datatype 'a nss_status =
NSS_STATUS_TRYAGAIN of 'a
| NSS_STATUS_UNAVAIL of 'a
(* The hosts database has an associated h_errno variable, and errno is
only valid in the NETDB_INTERNAL case.)
type host_status = h_errno nss_status
(* The other databases use errno directly, without h_errno. *)
type other_status = errno nss_status
In addition, we should change NSS service function calls so that we call
them with *errnop == 0, *herrnop == 0, and if we need any of these error
values (due to NSS_STATUS_TRYAGAIN, NSS_STATUS_UNAVAIL results) and they
are still 0, we use errno  and h_errno  directly as a replacement.
This should provide backwards compatibility in cases where external NSS
service functions currently expect aliasing of to the thread-local
variables and updates (say) errno, but not *errnop.
The gethosts issue (D) should be fixed, and getaddrinfo should only read
the other error codes for NSS_STATUS_TRYAGAIN, NSS_STATUS_UNAVAIL, per
the SML types above.
(Q) is also a clear bug. (L) probably should be reverted. (K) looks
like it needs fixing as well.
I don't really know how to test this, except by making the cleanups and
doing some whole-system tests.