When getnameinfo() gets a temporary error from NSS, it should return a temporary error to the application instead of a permanent one. This has already been solved for the case where nscd is not running by the following patch: http://sourceware.org/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c.diff?r1=1.34&r2=1.35&cvsroot=glibc&f=h However, with nscd running it still doesn't return the proper value. We first hit this with 2.4 and verified that it still exists in 2.8. I have a working fix for 2.4 and an unverified fix for current CVS (the underlying code is a bit different). I'll attach both to this bug.
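For readers following along, here is a minimal sketch of why the distinction matters to a caller. It assumes the temporary error in question is reported as EAI_AGAIN, which is the getnameinfo() code for "try again later". The helper names gai_error_is_temporary and reverse_lookup are hypothetical and are not part of glibc or of the attached patches.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if a getnameinfo() result code means
   "the lookup may succeed if retried later", 0 otherwise. */
static int gai_error_is_temporary(int rc)
{
    return rc == EAI_AGAIN;
}

/* Hypothetical helper: reverse-resolve a dotted-quad IPv4 address.
   Returns the getnameinfo() result code (0 on success). */
static int reverse_lookup(const char *ip, char *host, socklen_t hostlen,
                          int flags)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_FAIL; /* assumption: an unparsable address is permanent */

    return getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                       host, hostlen, NULL, 0, flags);
}
```

A caller would typically check the result of reverse_lookup() with gai_error_is_temporary() and either schedule a retry or report gai_strerror(rc) as a final failure. If the library reports a permanent error for what was really a transient NSS failure, that retry path is never taken.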
Created attachment 2883 nscd patch for current CVS
Created attachment 2884 just for reference - working, verified patch that applies to 2.4
... ping?
Created attachment 4826 updated patch for HEAD
I don't like this patch. It does no caching at all. It is highly unlikely that a request will succeed a very short while afterwards. Whether negtimeout is sufficient to use is another question. Maybe a third timeout value is needed. But the patch as-is is no good.
I do not think it is appropriate on two counts. First, caching temporary results would not be specific to this cache and would be a completely orthogonal nscd-wide change, so I do not think the patch should be judged based on this at all. Second, the idea of caching intermittent failures itself seems strange to me. The situation itself should happen only in relatively exceptional cases, usually due to some network outage - in that case, it is not clear that the benefit of giving quicker feedback to the application outweighs the disadvantage of prolonging service outages or even massively amplifying mere singular errors. After all, the return code specifically says "try again", and it is quite plausible the application will go into some limited loop where it tries to re-resolve after a very short interval. In short, I am worried about the amplification of singular failures, and no one has shown up so far who would actually want to cache intermittent failures to cater for some plausible scenario.
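The "limited loop" mentioned above could look roughly like the following sketch. It is only an illustration of an application-side bounded retry on EAI_AGAIN, not code from any particular application. The names retry_on_eai_again, lookup_fn, fail_n_times and demo_retry are hypothetical, and the lookup is passed in as a callback so that the loop can be exercised with a stub instead of a real resolver.

```c
#include <netdb.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: performs one lookup attempt and returns a
   getnameinfo()-style result code. */
typedef int (*lookup_fn)(void *ctx);

/* Call fn up to max_attempts times, sleeping delay_ms between attempts,
   and stop as soon as the result is anything other than EAI_AGAIN.
   Returns the last result code (EAI_AGAIN if max_attempts <= 0). */
static int retry_on_eai_again(lookup_fn fn, void *ctx, int max_attempts,
                              long delay_ms)
{
    int rc = EAI_AGAIN;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = fn(ctx);
        if (rc != EAI_AGAIN)
            break;
        if (attempt + 1 < max_attempts && delay_ms > 0) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L
            };
            nanosleep(&ts, NULL);
        }
    }
    return rc;
}

/* Demo stub standing in for a real lookup: fails with EAI_AGAIN until the
   counter pointed to by ctx reaches zero, then succeeds. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return EAI_AGAIN;
    }
    return 0;
}

/* Demo driver: simulate `failures` transient errors, no delay. */
static int demo_retry(int failures, int max_attempts)
{
    return retry_on_eai_again(fail_n_times, &failures, max_attempts, 0);
}
```

With a loop like this, a cached temporary failure would make every retry inside the cache lifetime fail as well, which is the amplification concern raised in this comment. Without any caching, each retry goes back to the name service, which is the load concern raised in the previous one.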
(In reply to comment #6)
> I do not think it is appropriate on two counts.
> [...]

But you're wrong. The whole point of nscd is to reduce load and latency. These errors, when they happen, usually happen for some time. And if one lookup happens, a second often follows shortly. Caching these types of errors is definitely the right thing to do.
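To make the "third timeout value" idea from the earlier comment concrete, here is a rough sketch of choosing a cache lifetime by result class. This is not nscd's actual code or configuration: the types lookup_result and cache_ttl, the function entry_expiry, and the numbers in demo_ttl are all hypothetical and chosen only for illustration.

```c
#include <time.h>

/* Hypothetical classification of a lookup outcome. */
enum lookup_result {
    LOOKUP_FOUND,    /* positive answer */
    LOOKUP_NOTFOUND, /* definite negative answer */
    LOOKUP_TRYAGAIN  /* temporary failure */
};

/* Hypothetical per-cache lifetimes in seconds. A tryagain value of 0
   means temporary failures are effectively not cached. */
struct cache_ttl {
    time_t positive;
    time_t negative;
    time_t tryagain;
};

/* Illustrative values only, not taken from any real configuration. */
static const struct cache_ttl demo_ttl = { 3600, 20, 2 };

/* Return the absolute time at which an entry with result r, stored at
   time now, should expire. */
static time_t entry_expiry(const struct cache_ttl *ttl,
                           enum lookup_result r, time_t now)
{
    switch (r) {
    case LOOKUP_FOUND:
        return now + ttl->positive;
    case LOOKUP_NOTFOUND:
        return now + ttl->negative;
    case LOOKUP_TRYAGAIN:
        return now + ttl->tryagain;
    }
    return now; /* unknown result class: expire immediately */
}
```

Under this sketch, a short tryagain lifetime would absorb bursts of identical lookups during an outage while still letting a retry a few seconds later reach the name service again.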
Ok, what you say also makes sense; however, that means I'd be wrong only on the second count - I still believe that fixing the hstcache bug that returns wrong results for temporary errors is independent of implementing a new feature for general caching of temporary errors.
I checked in a patch.