Bug 15862 - nscd doesn't cache record containing more than one IP address.
Summary: nscd doesn't cache record containing more than one IP address.
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: nscd
Version: 2.18
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-20 14:52 UTC by frimik
Modified: 2017-03-09 05:57 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Enable nscd to cache multiple addresses in gethostby* (1.26 KB, patch)
2014-05-14 07:48 UTC, Alexandre Oliva
Use nscd in getaddrinfo before falling back to gethostbyname2_r (1.65 KB, patch)
2014-05-14 07:51 UTC, Alexandre Oliva
Add getent cmdline arg to set default getaddrinfo args (933 bytes, patch)
2014-05-14 07:57 UTC, Alexandre Oliva

Description frimik 2013-08-20 14:52:58 UTC
http://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/hstcache.c;h=0d421fcbbb5e8823b660973e08b73e15e0dac3c8;hb=HEAD#l240

explains:
 240       /* If the record contains more than one IP address (used for
 241          load balancing etc) don't cache the entry.  This is something
 242          the current cache handling cannot handle and it is more than
 243          questionable whether it is worthwhile complicating the cache
 244          handling just for handling such a special case. */

This is hardly a special case anymore. DNS round-robin is increasingly common; AWS, for example, uses it extensively for all of its S3 hosts.

Is this as complicated to fix as it sounds? Would really appreciate a patch for v2.12 (or later).
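
For readers unfamiliar with the data structure involved, here is a minimal sketch of the condition the quoted comment describes: a hostent carries a NULL-terminated h_addr_list, and "more than one IP address" means that list has at least two entries. The predicate name single_address_only is hypothetical and only mirrors the policy stated in the comment; it is not the actual code in nscd/hstcache.c.

```c
#include <netdb.h>
#include <stddef.h>

/* Count the addresses carried by a hostent.
   h_addr_list is a NULL-terminated array of address pointers. */
static size_t hostent_addr_count(const struct hostent *h)
{
    size_t n = 0;

    if (h == NULL || h->h_addr_list == NULL)
        return 0;
    while (h->h_addr_list[n] != NULL)
        n++;
    return n;
}

/* Hypothetical predicate mirroring the policy in the quoted comment:
   only records with exactly one address would be considered cacheable.
   This is an illustration, not the nscd implementation. */
static int single_address_only(const struct hostent *h)
{
    return hostent_addr_count(h) == 1;
}
```

A round-robin record with two A entries would make single_address_only return 0, which is the case this bug is about.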
Comment 1 Alexandre Oliva 2014-05-14 07:40:22 UTC
AFAICT, current cache handling *can* handle multiple addresses, and this was true in 2.12 as well.  I'm a bit concerned earlier versions might fail should they find multiple addresses in the mmapped cache, but since they could always handle multiple addresses in (not cached) responses from nscd, I'm inclined to think it wouldn't be a problem to remove the tests for a single address throughout nscd/hstcache.c.

That said, such a change, by itself, would remove load balancing, whereas consulting a (caching and presumably close) name server retains load balancing without too much of a performance penalty.  I suppose this was the driving factor for not caching answers with multiple addresses.

One oddity is that getaddrinfo *does* cache multiple addresses, and if it is to be a modern replacement for gethostbyname, the same considerations should apply.

Furthermore, it is very weird that getaddrinfo will cache multiple answers in all but AF_INET non-AI_CANONNAME requests, because this combination in getaddrinfo is implemented in terms of gethostbyname2_r.
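
To make the two request shapes concrete, here is a small sketch of a helper that issues a getaddrinfo request with a given family and flags and counts the results. The helper name count_results is made up for illustration. Which internal path glibc takes for each combination is as described in the comment above for the versions discussed here; the code only shows how a caller ends up with one shape or the other.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve `name` with the given address family and flags.
   Returns the number of results, or -1 if getaddrinfo fails. */
static int count_results(const char *name, int family, int flags)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    assert(name != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;
    hints.ai_flags = flags;
    /* Fix the socket type so we get one entry per address
       rather than one per (address, socket type) pair. */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}
```

With this helper, count_results(name, AF_INET, 0) is the AF_INET non-AI_CANONNAME request, and count_results(name, AF_INET, AI_CANONNAME) is the same lookup with the canonical name requested.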
Comment 2 Alexandre Oliva 2014-05-14 07:48:08 UTC
Created attachment 7596
Enable nscd to cache multiple addresses in gethostby*

This patch disables the code that prevents multiple addresses from being cached, and removes comments that are apparently obsolete.  I haven't tested it thoroughly, but code review and some testing suggest this should work.
Comment 3 Alexandre Oliva 2014-05-14 07:51:54 UTC
Created attachment 7597
Use nscd in getaddrinfo before falling back to gethostbyname2_r

This patch makes getaddrinfo try an nscd query before falling back to gethostbyname2_r, so that getaddrinfo results get consistently cached when using nscd.  This may obviate the previous patch, at least when it comes to programs using getaddrinfo even for IPv4 lookups.
Comment 4 Alexandre Oliva 2014-05-14 07:57:41 UTC
Created attachment 7598
Add getent cmdline arg to set default getaddrinfo args

This is the patch for getent I used for testing the previous two patches.  Even ahostsv4 requests wouldn't take the gethostbyname2_r path because AI_CANONNAME was given by default, and hosts requests wouldn't ask for IPv4 addresses of names that had valid IPv6 addresses.  Setting the default flags to zero enables both paths to be tested, with and without nscd.

In order to run these programs from the build tree, even within a debugger, I added the following flags to the link commands of nss/getent and nscd/nscd: -Wl,-I,`pwd`/elf/ld-linux-x86-64.so.2,-R,`pwd`:`pwd`/nss:`pwd`/resolv:`pwd`/nptl
Comment 5 Alexandre Oliva 2014-05-14 08:11:39 UTC
Another thought that occurred to me: getaddrinfo could get a way to query the nscd cache without triggering a lookup when there is no cached answer.  It would try that before falling back to gethostbyname2_r, and perform a lookup proper afterwards if it doesn't fall back.

Some means to query the cache without a lookup on failure might enable us to use GETAI cached values to satisfy GETHOSTBYNAME or GETHOSTBYNAME6 requests, and perhaps vice-versa.  Although the latter isn't necessarily right: given a sequence of nss backends for hosts, if we find a result for say v6 with the first backend, should we ever combine it with a result for v4 with later backends?  Should it matter whether we're looking for v4 addresses only?  I'd think not, and that if we find v6 addresses first, we should stop there and not report any IPv4 addresses, but falling back to gethostbyname2_r doesn't behave that way.
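
The two behaviours contrasted above can be sketched with a toy model. Everything here is hypothetical (the struct names, the backend list, the address counts); it is one reading of the question, not glibc code. Policy 1 stops at the first backend that knows the name at all and then filters by requested family. Policy 2 resolves each family independently along the sequence, so answers from different backends can end up combined.

```c
#include <stddef.h>

/* Hypothetical model of one NSS hosts backend: how many IPv4 and IPv6
   addresses it would return for the name being looked up. */
struct backend {
    const char *name;
    int v4;
    int v6;
};

struct answer {
    int v4;
    int v6;
};

/* Policy 1: the first backend that knows the name in any family wins;
   later backends are never consulted, whatever families were requested. */
static struct answer first_backend_wins(const struct backend *b, size_t n,
                                        int want_v4, int want_v6)
{
    struct answer a = {0, 0};

    for (size_t i = 0; i < n; i++) {
        if (b[i].v4 + b[i].v6 > 0) {
            a.v4 = want_v4 ? b[i].v4 : 0;
            a.v6 = want_v6 ? b[i].v6 : 0;
            break;
        }
    }
    return a;
}

/* Policy 2: each family is looked up on its own, so an IPv6 answer from
   one backend may be combined with an IPv4 answer from a later one. */
static struct answer per_family(const struct backend *b, size_t n,
                                int want_v4, int want_v6)
{
    struct answer a = {0, 0};

    for (size_t i = 0; i < n; i++) {
        if (want_v4 && a.v4 == 0)
            a.v4 = b[i].v4;
        if (want_v6 && a.v6 == 0)
            a.v6 = b[i].v6;
    }
    return a;
}
```

With a first backend that has one IPv6 address and a second that has two IPv4 addresses, policy 1 reports only the IPv6 address (and nothing at all for an IPv4-only request), while policy 2 reports all three.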