High Availability with resolv.conf

Edouard COLE Edouard.COLE@rgsystem.com
Wed Apr 4 15:04:00 GMT 2018


Today I discovered the rotate option in the resolv.conf file. I have been reading the sources to see what behavior this option induces and, if I understand correctly, it randomly selects one server from the nameserver list and uses that one. Combined with a low timeout (timeout:1) and a low attempt count (attempts:1), it gives load balancing and high availability with a "small tradeoff", but it is far from perfect, especially when you are unlucky and hit the broken name server first.
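
For reference, a minimal sketch of the kind of resolv.conf described above. The addresses are placeholders from the documentation range, not real servers, and note that resolv.conf(5) spells the retry count as attempts:n.

```
# Placeholder addresses (192.0.2.0/24 is reserved for documentation).
nameserver 192.0.2.1
nameserver 192.0.2.2
nameserver 192.0.2.3
# rotate: spread queries across the listed servers
# timeout:1  wait 1 second for a reply before moving on
# attempts:1 go through the server list only once
options rotate timeout:1 attempts:1
```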

We hit this problem in a production environment: when a server goes down, name lookups become slow in applications with no cache (a freshly started application, for example). I wanted to know what the best approach to solve this would be, and I was wondering about two things:
- Why did you drop the RES_BLAST option? It looks like it fired all lookups simultaneously, but I was not able to find any documentation on exactly how it behaved before it was unimplemented.
- Would it be possible to add an option (something like "parallel-factor: 3") that performs the lookup on 3 random name servers from the list in resolv.conf and returns as soon as at least one response is received?
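
To make the second idea concrete, here is a rough sketch in Python of what such a "parallel-factor" lookup could do. This is not glibc code and not an existing resolver option: the names parallel_lookup and parallel_factor are hypothetical, servers are given as (host, port) pairs only for illustration, and the reply is returned as raw bytes without being parsed.

```python
import random
import select
import socket
import struct
import time


def build_query(hostname, query_id):
    """Encode a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parallel_lookup(hostname, nameservers, parallel_factor=3, timeout=1.0):
    """Hypothetical sketch: query several random servers, keep the first reply.

    nameservers is a list of (host, port) tuples (an assumption made for
    this example). Returns (server, raw_reply_bytes), or None on timeout.
    """
    chosen = random.sample(nameservers, min(parallel_factor, len(nameservers)))
    packet = build_query(hostname, random.getrandbits(16))
    socks = {}
    try:
        # Fire the same query at every chosen server, one UDP socket each.
        for server in chosen:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            socks[s] = server
            s.connect(server)
            s.send(packet)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            readable, _, _ = select.select(list(socks), [], [], remaining)
            for s in readable:
                try:
                    data = s.recv(512)
                except OSError:
                    continue  # e.g. nothing listening on that server
                # Accept only a reply carrying the same query ID.
                if data[:2] == packet[:2]:
                    return socks[s], data
    finally:
        for s in socks:
            s.close()
```

With three servers and a factor of 3, one dead server costs nothing extra as long as another one answers within the timeout. The price is that every lookup sends up to parallel_factor packets.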

Maybe this option already exists? Maybe this makes no sense at all? I do understand that this option could break the internet if it multiplies all DNS calls by a fixed factor, but when it comes to HA in a dedicated environment, it sounds pretty neat.

Edouard COLE
