Client-side DNS failover is one of the coolest things I learned about recently. It’s simple. Instead of having a single A record with one IP, you add additional A records with different IPs, and clients automatically skip over failing servers.
Check out this Webmasters Stack Exchange question: “Using multiple A-records for my domain - do web browsers ever try more than one?” One of the answers there says:
Pretty much every browser does indeed receive the full list of A records, and does indeed check others if the one it is using fails. You can expect each client to have a 30 second wait when they first try to access a site when a server is down, until it connects to a working address. The browser will then cache which address is working and continue using that one for future requests unless it also fails, then it will have to search through the list again. So 30 second wait on first request, fine thereafter.
For example, let’s look at the A records for amazon.com:
    $ dig amazon.com

    ;; QUESTION SECTION:
    ;amazon.com.            IN  A

    ;; ANSWER SECTION:
    amazon.com.     55  IN  A   18.104.22.168
    amazon.com.     55  IN  A   22.214.171.124
    amazon.com.     55  IN  A   126.96.36.199
    amazon.com.     55  IN  A   188.8.131.52
    amazon.com.     55  IN  A   184.108.40.206
    amazon.com.     55  IN  A   220.127.116.11
If 18.104.22.168 is down, my browser will automatically try 22.214.171.124, and so on.
It’s not just web browsers that can do this. Programs like curl and telnet do it too.
    $ telnet amazon.com
    Trying 126.96.36.199...
    telnet: connect to address 188.8.131.52: Connection refused
    Trying 184.108.40.206...
    telnet: connect to address 220.127.116.11: Connection refused
    Trying 18.104.22.168...
    telnet: connect to address 22.214.171.124: Connection refused
    Trying 126.96.36.199...
    telnet: connect to address 188.8.131.52: Connection refused
    Trying 184.108.40.206...
    telnet: connect to address 220.127.116.11: Connection refused
    Trying 18.104.22.168...
    telnet: connect to address 22.214.171.124: Connection refused
    telnet: Unable to connect to remote host
If you’re writing your own programs, you can think about using this technique to implement high availability. Go programmers have it easy, as the standard library’s net package does it automatically and also supports dual-stack fallback. I’m not sure about other languages.
I can see this being useful for analytics services. Browsers will automatically fail over to active servers, so you don’t have to do anything fancy at the back-end. Last year, I posted an HA example with a DigitalOcean Floating IP and leader election with my libab library. That approach makes failover transparent to the client, but I think client-side failover is more elegant. Clients are smart!