This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
High Availability with resolv.conf
- From: Edouard COLE <Edouard dot COLE at rgsystem dot com>
- To: "libc-help at sourceware dot org" <libc-help at sourceware dot org>
- Date: Wed, 4 Apr 2018 15:04:07 +0000
- Subject: High Availability with resolv.conf
Hello,
Today I discovered the rotate option in resolv.conf. I looked through the sources to see what behavior this option induces and, if I understand correctly, it randomly selects one server from the nameserver list and uses that one. Combined with a low timeout (timeout:1) and a low retry count (attempts:1), it gives load balancing and high availability with a "small tradeoff", but it is not perfect (especially when you are unlucky and hit the broken name server first).
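For reference, the setup described above would look roughly like this in /etc/resolv.conf. The addresses are placeholders from the documentation range, not real servers, and the resolv.conf keyword for the retry count is attempts:

```
# Placeholder addresses (192.0.2.0/24 is reserved for documentation)
nameserver 192.0.2.1
nameserver 192.0.2.2
nameserver 192.0.2.3
# Spread queries across the list, wait 1 second per server, try the list once
options rotate timeout:1 attempts:1
```

With these values a dead server still costs about one timeout whenever it happens to be tried first, which is the latency described below.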
We had this problem in a production environment: when a name server went down, it caused latency on name lookups in applications with no cache (a freshly started application, for example). I would like to know what the best approach to solve this would be, and I was wondering two things:
- Why did you drop the RES_BLAST option? It looks like it fired all lookups simultaneously (but I was not able to find any documentation on exactly how it behaved before it became unimplemented).
- Would it be possible to add an option (something like "parallel-factor: 3") that performs the lookup on 3 random name servers from the list given in resolv.conf and returns as soon as at least one response is received?
Maybe this option already exists? Maybe this makes no sense at all? I do understand that such an option could break the internet if it multiplied all DNS queries by a fixed factor, but for HA in a dedicated environment it sounds pretty neat.
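To make the idea concrete, here is a rough user-space sketch in Python of what I have in mind. It is not glibc code, and "parallel-factor" is not an existing resolv.conf option; the names parallel_lookup, build_query and factor are made up for illustration. It assumes IPv4 and UDP only, takes servers as (address, port) pairs, and returns the raw reply without parsing it:

```python
import random
import select
import socket
import struct
import time


def build_query(name, txid):
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def parallel_lookup(name, servers, factor=3, timeout=1.0):
    """Send the same query to up to `factor` random servers at once.

    Returns (server, raw_reply) for the first matching reply,
    or None if nobody answers before `timeout` seconds.
    """
    chosen = random.sample(servers, min(factor, len(servers)))
    query = build_query(name, random.randrange(0x10000))
    socks = {}
    try:
        for server in chosen:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                sock.connect(server)
                sock.send(query)
            except OSError:
                sock.close()  # unreachable server: skip it
                continue
            socks[sock] = server
        deadline = time.monotonic() + timeout
        while socks:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            readable, _, _ = select.select(list(socks), [], [], remaining)
            for sock in readable:
                try:
                    reply = sock.recv(4096)
                except OSError:
                    continue  # e.g. ICMP port unreachable from a dead server
                # Accept only replies carrying our transaction ID.
                if reply[:2] == query[:2]:
                    return socks[sock], reply
        return None
    finally:
        for sock in socks:
            sock.close()
```

For example, parallel_lookup("example.com", [("192.0.2.1", 53), ("192.0.2.2", 53), ("192.0.2.3", 53)]) would return as soon as any one of the three placeholder servers answers, so a single dead server no longer adds a timeout. The cost is the one mentioned above: every lookup sends up to factor queries.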
Thanks
Edouard COLE