I noticed recently that if I start nscd, then 'ssh' fails intermittently. The failure is related to 'ssh' getting a link-local IPv6 address with no specified interface when it tries to resolve a host.
The details are here:
I tried to narrow it down to a specific glibc version but was not able to. I have two machines which exhibit the problem: one is running glibc 2.25 and the other 2.28.
I also have a third machine, running glibc 2.24, which doesn't exhibit the problem. I upgraded its glibc package to 2.28 and still could not reproduce the problem (but I have not been able to do a full system upgrade on that machine). It is a 32-bit machine while the others are 64-bit. So it could be that 32-bit systems are unaffected, or it could be an issue with another package interacting with glibc in some way, or it could be something else that I haven't taken into account.
I'm not sure why I didn't experience this before. It appears that nscd is disabled on all my systems, which are running Arch Linux. I upgraded recently and must have enabled it while debugging another problem, because when I downgraded I found that nscd still causes problems with ssh.
I have a Linksys router which runs a DHCP server and a DNS server, and it automatically lets the hosts on my local network choose their own hostnames.
I think my configurations and my network are pretty standard, so if you can find someone (someone who is using Arch? someone with a Linksys router?), then reproducing this bug should be a matter of getting them to enable nscd and running something like this:
H=somehost; while ssh -F /dev/null "$H" date; do sleep 1; done; echo "ssh failed"; date
The loop should eventually stop with an error, and then you can use the timestamps to try to correlate it to some event in the logs or whatever. I've never seen it take more than a few minutes to encounter the error.
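To watch the resolver directly instead of going through ssh, something like the following Python sketch might help. It is only an illustration, not something taken from the original report: the function names are made up, and it assumes that Python's socket.getaddrinfo ends up in glibc's getaddrinfo and therefore takes the same nscd path that ssh does. It flags any IPv6 link-local address (fe80::/10) that comes back with a scope id of 0, i.e. with no interface attached.

```python
import ipaddress
import socket
import time


def unscoped_link_local(sockaddrs):
    """Return the IPv6 link-local hosts whose sockaddr has scope id 0."""
    bad = []
    for sockaddr in sockaddrs:
        if len(sockaddr) != 4:
            continue  # IPv4 sockaddrs are (host, port) and have no scope id
        host, _port, _flowinfo, scope_id = sockaddr
        # Some Python versions append "%iface" to the host string; strip it.
        addr = ipaddress.IPv6Address(host.split("%")[0])
        if addr.is_link_local and scope_id == 0:
            bad.append(host)
    return bad


def resolve(hostname, port=22):
    """Resolve hostname with the system resolver and return the sockaddrs."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return [info[4] for info in infos]


def watch(hostname, interval=1.0):
    """Poll until an unscoped link-local address shows up, then report it.

    A resolution failure raises socket.gaierror and ends the loop.
    """
    while True:
        bad = unscoped_link_local(resolve(hostname))
        if bad:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "unscoped link-local address:", bad)
            return bad
        time.sleep(interval)
```

Calling watch("somehost") with nscd running should, if the assumption above holds, stop at roughly the same moment the ssh loop fails, and the printed timestamp can be correlated with the logs in the same way.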
I thought this might be related to the following bug:
but on reflection it seems unlikely, since the interfaces I'm working with have the same address for the duration of the experiment, while David Faure's bug has to do with interfaces receiving an IPv4 address sometime after an application starts. David did a lot more debugging than I did. Unfortunately I don't have time to pursue this in such depth, although I am happy to answer questions or run code that people send me.
I found out that this is related to systemd-resolved and LLMNR, see my answer to the superuser.com question. Disabling systemd-resolved is another way to fix the problem. The bug still deserves some investigation in my opinion, as many people have systemd-resolved enabled.
Right, there are nscd <-> systemd-resolved interactions that we should review. Thanks.
I just saw that the title was changed to "Problems with nscd and systemd-resolved interactions on IPv6 network.". I'm not sure it is accurate to call it an IPv6 network: I haven't configured anything IPv6-related on either my computers or the router. The problem is related to IPv6, but the network is a "typical" home network, with a consumer-grade Linksys router serving IPs in the 192.168.1.* range.