Bug 22479 - getaddrinfo: check_pf caches "ipv6 seen, ipv4 not seen" and any DNS lookup fails until restarting the process
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: network
Version: 2.26
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2017-11-22 22:55 UTC by David Faure
Modified: 2018-08-29 06:56 UTC
CC: 3 users

Flags: fweimer: security-


Attachments
patch for debug output (4.49 KB, patch)
2017-11-22 22:55 UTC, David Faure

Description David Faure 2017-11-22 22:55:33 UTC
Created attachment 10629
patch for debug output

Context:

A laptop running openSUSE boots into Plasma 5, which restores the session (including Firefox and KMail), while NetworkManager brings up WiFi via DHCP.

Symptoms:

- Firefox keeps saying "Server Not Found" for any host, and hitting Reload makes no difference. Only restarting Firefox makes it work again.
- KMail does not fetch any new mail, and any click on "Synchronize" asks "the resource needs to be made online, do you accept?". Then nothing happens, and if I try again the same thing repeats, i.e. the resource never actually manages to go online. Only restarting the resource (or Akonadi) fixes it.

Investigation:

These processes are attempting to connect sockets while the network is being brought up.
What often happens is that wlan0 gets a link-local IPv6 address immediately (fe80::a634:d9ff:fecc:1451/64 in my case), while the IPv4 address only arrives later, when the DHCP server assigns it. So for a little while we locally have v6=yes, v4=no.

Now the problem is that glibc's make_request() function in sysdeps/unix/sysv/linux/check_pf.c caches this information (see line 268):
      result->seen_ipv4 = seen_ipv4;
      result->seen_ipv6 = true;

So from now on, __check_pf won't even call make_request anymore; it simply uses the cache and returns "v6=yes, v4=no", which narrows all DNS queries to IPv6. This makes everything fail when the rest of the local network really only supports IPv4.
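
To make this concrete, here is a minimal standalone test program of the kind of lookup the affected applications perform (my own snippet, not from glibc or the apps; the hostname and port are just examples):

/* Lookup with AI_ADDRCONFIG, the mode in which getaddrinfo consults
   __check_pf.  With the stale "v6=yes, v4=no" cache the A query is
   suppressed, so only IPv6 results (or an error, on a v4-only
   network) come back.  */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netdb.h>

int
main (void)
{
  struct addrinfo hints, *res;
  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_ADDRCONFIG;   /* filter on configured address families */

  int err = getaddrinfo ("www.opensuse.org", "80", &hints, &res);
  if (err != 0)
    {
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (err));
      return 1;
    }
  for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
    printf ("got %s result\n", ai->ai_family == AF_INET ? "IPv4" : "IPv6");
  freeaddrinfo (res);
  return 0;
}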

There is a cache invalidation mechanism based on nl_timestamp, but I failed to see what is supposed to increment that; it's always 1 for me, whatever I try.
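
For reference, the validity check boils down to comparing the cached timestamp against the one nscd publishes, something like this (paraphrased from my reading of check_pf.c, not the literal source):

/* Paraphrase of check_pf.c's cache check: get_nl_timestamp() ends up
   in __nscd_get_nl_timestamp() when nscd is in use, and returns 0
   without nscd, in which case the cache is never considered valid.  */
static bool
cache_valid_p (void)
{
  if (cache != NULL)
    {
      uint32_t timestamp = get_nl_timestamp ();
      return timestamp != 0 && cache->timestamp == timestamp;
    }
  return false;
}

So with the timestamp stuck at 1, a cache entry stored with timestamp=1 stays valid forever.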

The attached patch adds a ton of debug output to glibc, and shows how I found all of the above information.
The output when things fail is the following:

(pid 12904) check_pf: cache not valid (cache->timestamp=0), made PF_NETLINK socket 48
(pid 12904)   seen non-loopback ipv6: host=fe80::a634:d9ff:fecc:1451, serv=0
(pid 12904)         __nscd_get_nl_timestamp: returning 1 from NSCD_HST_IDX_CONF_TIMESTAMP
(pid 12904)  ipv6 seen ==> storing seen_ipv4=0 into cached data, timestamp=1
(pid 12904) WARNING, CACHING IPv4=false IPv6=true, THIS IS BAD
(pid 12904)  data from PF_NETLINK socket port id cf1efdff says seen_ipv4=false seen_ipv6=true!
(pid 12904) Narrowed search for www.opensuse.org to ipv6

(pid 12904)         __nscd_get_nl_timestamp: returning 1 from NSCD_HST_IDX_CONF_TIMESTAMP
(pid 12904) check_pf: cache valid, and cached data says seen_ipv4=false!

Additional thoughts:

* why doesn't this happen to everyone?

The race condition only happens when using pam_kwallet5.so, which lets NetworkManager autoconnect to the WiFi during session restoration.

Without pam_kwallet, the apps start first (and fail to load anything), while the network-manager applet waits for the user to type in their kwallet password before being able to connect to the wifi, so no race.

I have seen the same issue in situations other than session startup though, e.g. when suspending the laptop and resuming it on a different WiFi network: there the kwallet password is available, and NetworkManager drops the IPv4 address (but not the link-local IPv6 one) before getting a new IPv4 address via DHCP. But the race is harder to trigger there, since it requires an app to try to connect to a host during the short window before the new IPv4 address comes in.

* possible solutions:

Should we only cache results when both IPv4 and IPv6 were found? Nothing is cached when neither is found, so it is already the case that every query may end up calling check_pf's make_request().
Or the nl_timestamp mechanism has to be fixed...

* side bug:

My debug output sometimes indicates
(pid 8536) ERROR wait_on_socket 211 returned 0 or negative
(pid 8536) ERROR: nscd not running or wrong version. __nss_not_use_nscd_hosts = 1
which leads to no caching (no nscd) for the next 100 requests.
Let me know if I should dig further into why this happens.

(And confusingly, when that happens, check_pf is "caching" its results with a timestamp of 0, which effectively means that cache cannot possibly be used).
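
The 100-request figure appears to come from this back-off pattern in the nscd client code (paraphrased from my reading, not the literal source):

/* After an nscd failure, __nss_not_use_nscd_hosts is set to 1; each
   subsequent request increments it, and once it exceeds NSS_NSCD_RETRY
   (defined as 100) it resets to 0 so that nscd is tried again.  */
if (__nss_not_use_nscd_hosts > 0
    && ++__nss_not_use_nscd_hosts > NSS_NSCD_RETRY)
  __nss_not_use_nscd_hosts = 0;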
Comment 1 David Faure 2017-12-05 12:16:49 UTC
This patch for sysdeps/unix/sysv/linux/check_pf.c "fixes" the bug for me, but I suppose it's not good enough in general, in case people have a v6-only network interface? (does that even exist?)

@@ -261,8 +261,8 @@ make_request (int fd, pid_t pid)
     }
   while (! done);
 
-  if (seen_ipv6 && result != NULL)
+  if (seen_ipv6 && seen_ipv4 && result != NULL)
     {
       result->timestamp = get_nl_timestamp ();
       result->usecnt = 2;
       result->seen_ipv4 = seen_ipv4;


(But maybe all this is a red herring, and the issue is the kernel netlink code not notifying of changes, so that __nscd_get_nl_timestamp always returns 1?)
Comment 2 David Faure 2018-04-03 11:06:52 UTC
Ping? Any input on this almost-fully-debugged problem?

Any advice on how __nscd_get_nl_timestamp is supposed to work, if that's where the cache invalidation is supposed to come from?
Comment 3 David Faure 2018-07-25 07:19:42 UTC
Bug originally found in glibc 2.22, still present in 2.26.
Comment 4 Frederick Eaton 2018-08-16 20:26:39 UTC
David, could this bug be the cause of the problem I experienced recently with link-local IPv6 addresses lacking an interface?

https://superuser.com/questions/1349197/nscd-on-ipv6-dhcp-with-link-local-addresses-causes-ssh-connect-to-host-myserve?noredirect=1#comment2022321_1349197

https://bbs.archlinux.org/viewtopic.php?id=239604

I'm wondering if I should submit a new bug. It looks like you did some great debugging and it's too bad that nobody gave you a response yet. Maybe I ought to try your patch... However I think that I was using glibc 2.27 before and the problem must have appeared in 2.28, while your bug appeared in 2.22...
Comment 5 Carlos O'Donell 2018-08-17 03:14:09 UTC
(In reply to Frederick Eaton from comment #4)
> I'm wondering if I should submit a new bug. It looks like you did some great
> debugging and it's too bad that nobody gave you a response yet. Maybe I
> ought to try your patch... However I think that I was using glibc 2.27
> before and the problem must have appeared in 2.28, while your bug appeared
> in 2.22...

Volunteering to try patches like the one posted by David in comment #1 always helps.

In general these issues are difficult to reproduce because they require some complex system startup orchestration. Most likely no developer has seen the issue themselves, or we would have fixed it.

Thus it's mostly up to David to drive his use case further, and get involved in fixing it. I don't have any more advice than that.

The next step here is:

* Post a proper patch to libc-alpha@sourceware.org.
  See the contribution checklist:
  https://sourceware.org/glibc/wiki/Contribution%20checklist

Florian Weimer <fweimer@redhat.com> is the network subsystem maintainer, so you should put him on the To: line of your patch explaining the issue. Florian worked on the /etc/resolv.conf dynamic reloading, so he's empathetic to the dynamic aspects of network configuration :-)
Comment 6 Frederick Eaton 2018-08-17 05:11:48 UTC
Thanks Carlos, for the encouragement and instructions.

Before submitting a bug I want to try downgrading glibc to an earlier
version which doesn't exhibit the problem. I checked on another
machine with version 2.25 and found that nscd was disabled, but upon
enabling it, ssh breaks.
(Arch), and downgrading glibc tends to break a lot of things, and I'm
relatively inexperienced, so it may be a while before I get around to
it (and in the meantime there is an easy workaround: disable the
systemd unit). I'm also confused about why I experienced this problem
suddenly after upgrading from glibc 2.27 to 2.28, as when I downgrade
to 2.27 I still get the failure (and as I said 2.25 shows it too...).

David, sorry for spamming your bug comments. The connection between
our problems may be superficial, but I need to do more debugging to know more.
Comment 7 Frederick Eaton 2018-08-18 17:03:05 UTC
By the way, not to be mean or anything, but to me it seems that David asked some reasonably simple questions and projected a willingness to make further contributions. If I were him then I wouldn't interpret almost a year of silence as "it's mostly up to David to drive his use case further, and get involved in fixing it". I would interpret that silence as "project leadership isn't willing to delegate simple tasks to interested volunteers".

Regarding this bug in particular: I'm not familiar with the code, but I don't understand why nscd would be caching information about a local interface in the first place. I thought the purpose of the facility was to avoid expensive DNS lookups, not to avoid (inexpensive) system calls. What if the address assigned to the local interface changes, does some kind of cache invalidation take place? Maybe I'm misinterpreting the purpose of "glibc/sysdeps/unix/sysv/linux/check_pf.c", whose sole documentation consists of the sentence "Determine protocol families for which interfaces exist.".

Also, I noticed that this bug doesn't appear on Google, is that intentional? I think collaboration would be easier if search engines were allowed to index the bugs in your software.
Comment 8 jsm-csl@polyomino.org.uk 2018-08-20 15:59:42 UTC
v6-only interfaces / networks certainly exist (with or without a NAT64 
gateway doing DNS64 rewriting).  So a patch that wouldn't work with 
v6-only interfaces, as asserted in comment #1, wouldn't be suitable for
inclusion in glibc.

> Also, I noticed that this bug doesn't appear on Google, is that intentional? I
> think collaboration would be easier if search engines were allowed to index the
> bugs in your software.

I see nothing in robots.txt or any robots meta tags to exclude indexing of 
either Bugzilla or the glibc-bugs list archives.  You would have to ask 
Google about why it's not included (or maybe someone on overseers has 
access to or could set up Google Webmaster Tools for sourceware).
Comment 9 David Faure 2018-08-20 21:23:14 UTC
I know my patch is a hack, I'm definitely not suggesting to include it in glibc upstream, given that I still don't understand everything around this. Note however that it wouldn't break ipv6 only networks, it would just make them skip the cache in that code. But yeah, not ideal.

Thanks for finally answering *one* of my many questions... ok, ipv6-only can happen. Good, what about the other questions?

Does *anyone* know what might lead to __nscd_get_nl_timestamp ever being a value higher than 1? In my tests it's always 1, so the cache is never invalidated.

If this indeed never happens, then surely you guys must agree that check_pf.c caching "seen_ipv4 = false; seen_ipv6 = true;" forever is obviously wrong, since ipv4 can come up slower...

Is nscd even guaranteed to run? If not, then nothing will ever invalidate that cache, right?

Is there a description somewhere of how nscd, the kernel's NETLINK stuff, and glibc interact with each other? This is further down the rabbit hole than I ever looked myself before this bug...
Comment 10 Carlos O'Donell 2018-08-21 00:02:07 UTC
(In reply to David Faure from comment #9)
> I know my patch is a hack, I'm definitely not suggesting to include it in
> glibc upstream, given that I still don't understand everything around this.
> Note however that it wouldn't break ipv6 only networks, it would just make
> them skip the cache in that code. But yeah, not ideal.
> 
> Thanks for finally answering *one* of my many questions... ok, ipv6-only can
> happen. Good, what about the other questions?
> 
> Does *anyone* know what might lead to __nscd_get_nl_timestamp ever being a
> value higher than 1? In my tests it's always 1, so the cache is never
> invalidated.
> 
> If this indeed never happens, then surely you guys must agree that
> check_pf.c caching "seen_ipv4 = false; seen_ipv6 = true;" forever is
> obviously wrong, since ipv4 can come up slower...
> 
> Is nscd even guaranteed to run? If not, then nothing will ever invalidate
> that cache, right?
> 
> Is there a description somewhere of how nscd, the kernel's NETLINK stuff,
> and glibc interact with each other? This is further down the rabbit hole
> than I ever looked myself before this bug...

The description is in the code. I know that's a lame answer, but that's it. With DJ's recent test-in-container patches we're going to be able to start running real nscd tests as part of the testsuite. Right now it's only ever really tested by enterprise distribution testing.

I don't have any answers to your other questions. I generally jump head first into the rabbit hole and start debugging.
Comment 11 frederik 2018-08-22 16:50:25 UTC
I'm still confused about what is being cached and why. Isn't this all
data that lives in some device driver in the kernel? So why does it
need to be cached?

Looks like Mr. Ulrich Drepper did a lot of the work; is he still
around?

https://sourceware.org/git/?p=glibc.git;a=blame;f=sysdeps/unix/sysv/linux/check_pf.c;h=04cd104aec5b6b39d22bfd0fa14336803189e212;hb=HEAD

https://sourceware.org/git/?p=glibc.git;a=commit;f=sysdeps/unix/sysv/linux/check_pf.c;h=3a2c02424d9824f5cdea4ebd32ff929b2b1f49c6

Here's part of his commit message: "Whenever getaddrinfo needed
network interface information it used the netlink interface to read
the information every single time. The problem is that this
information can change at any time."

I'm confused by this: why is it a problem for programs to have the
latest version of the interface information from the kernel? Isn't
that what they want? Especially if it can "change at any time"?
Comment 12 Florian Weimer 2018-08-22 17:04:21 UTC
The caching is pretty much required because systems can have many, many network interfaces, and enumerating all of them can be very slow.  This is especially true today on container hosts, and in the past it could also happen with IP-based virtual hosting on web servers.  With a different kernel interface, the caching would not be needed.
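
The uncached path is essentially the following enumeration, expressed here with getifaddrs() as a sketch of the idea (glibc actually does a netlink RTM_GETADDR dump, and also filters out loopback and link-local addresses, which I omit):

#include <ifaddrs.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Sketch only: roughly what __check_pf computes.  The cost grows
   with the number of configured addresses on the system.  */
static int
seen_families (bool *seen_ipv4, bool *seen_ipv6)
{
  struct ifaddrs *ifa;
  *seen_ipv4 = *seen_ipv6 = false;
  if (getifaddrs (&ifa) != 0)
    return -1;
  for (struct ifaddrs *p = ifa; p != NULL; p = p->ifa_next)
    {
      if (p->ifa_addr == NULL)
        continue;
      if (p->ifa_addr->sa_family == AF_INET)
        *seen_ipv4 = true;
      else if (p->ifa_addr->sa_family == AF_INET6)
        *seen_ipv6 = true;
    }
  freeifaddrs (ifa);
  return 0;
}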
Comment 13 Frederick Eaton 2018-08-22 20:56:59 UTC
I see, thanks Florian... so the kernel requires applications to enumerate all interfaces at once, when they just might be curious about one?

How long does this enumeration take, milliseconds? Microseconds?

Maybe we can have some kind of cache expiration that actually works (what I'm getting from David is that whatever is there now doesn't work), so that interface changes are only delayed by half a second or so before becoming visible to applications which use nscd?
Comment 14 Florian Weimer 2018-08-28 12:36:52 UTC
(In reply to Frederick Eaton from comment #13)
> I see, thanks Florian... so the kernel requires applications to enumerate
> all interfaces at once, when they just might be curious about one?
> 
> How long does this enumeration take, milliseconds? Microseconds?

With many interfaces and frequent changes leading to partial/aborted dumps, hundreds of milliseconds are possible.  This could happen during system startup, for example.

Cache expiration is difficult.  We could ask the kernel folks to update a counter whenever the interface configuration changes, but such changes could be rather frequent if this includes changes in adjacency information.

I want to add an option to /etc/gai.conf which tells getaddrinfo to use a routing lookup in AI_ADDRCONFIG mode and include the address only if the kernel knows how to route it.  It's not quite what AI_ADDRCONFIG means today, hence the need for configuration, but I think this approach would have other benefits besides solving the caching issue.
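
Roughly, such a routing lookup can be approximated from userspace by connect()ing a datagram socket to the candidate address, the same trick getaddrinfo's result sorting already uses; this is a sketch of the idea only, not an implementation:

#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* connect() on a UDP socket transmits nothing but forces a kernel
   routing lookup; it fails with ENETUNREACH when there is no route.
   The sockaddr must carry some non-zero port.  */
static bool
address_routable (const struct sockaddr *sa, socklen_t salen)
{
  int fd = socket (sa->sa_family, SOCK_DGRAM, 0);
  if (fd < 0)
    return false;
  bool ok = connect (fd, sa, salen) == 0;
  close (fd);
  return ok;
}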
Comment 15 Frederick Eaton 2018-08-29 06:56:47 UTC
Thank you Florian. This is getting a bit over my head but maybe others understand it better.

After checking getaddrinfo(3), I vaguely understand your AI_ADDRCONFIG suggestion, and it even sounds like what you are suggesting would be an improvement on the current semantics of the flag, generalizing them in the spirit of the original intent. But I don't understand how it is relevant to this bug.

I don't understand the comment about a counter. Again, if changes are frequent then I would think that caching is a bad idea.

I don't see why cache expiration should be difficult. If enumerating the interfaces takes 100ms on certain specialized virtual-hosting systems, then we cache the result for 10 seconds before asking the kernel to enumerate them again. This would slow things down by at most 1%. For the typical system with a small number of interfaces, where enumerating them may take microseconds, we could cache the result for a shorter period like 100 ms. This would seem to eliminate David's problem. The cache expiration timeout would be configurable through whatever configuration files are used by nscd.
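
Something like this is what I have in mind, just a sketch of the expiry check (CACHE_TTL_SEC is the hypothetical tunable described above; cache_filled would be set whenever the enumeration repopulates the cache):

#include <stdbool.h>
#include <time.h>

#define CACHE_TTL_SEC 10   /* hypothetical tunable, per the above */

static struct timespec cache_filled;

/* Re-run the interface enumeration once the cached result is older
   than the TTL, measured on the monotonic clock.  */
static bool
cache_fresh (void)
{
  struct timespec now;
  clock_gettime (CLOCK_MONOTONIC, &now);
  return now.tv_sec - cache_filled.tv_sec < CACHE_TTL_SEC;
}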

If people complain about a 1% slow-down, we can ask them to create a more efficient kernel interface, or we can suggest they configure the cache expiry timeout to a larger value or infinity for their systems.

Please let me know if I'm misunderstanding something.