Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.
An ancient haiku reads, "It's not DNS. There's no way it's DNS. It was DNS."
Welcome to the Thursday episode of the AWS Morning Brief, what you can also think of as Networking in the Cloud. This episode is sponsored by ThousandEyes and their Cloud State Live event, Wednesday, November 13th, from 11:00 AM until noon Central Time. They'll be live-streaming from Austin, Texas the live reveal of their latest cloud performance benchmark, where they pit AWS, Azure, GCP, IBM, and Alibaba Cloud against each other from a variety of networking perspectives. Oracle Cloud is pointedly not invited. If you'd like to follow along, visit snark.cloud/cloudstatelive, that's snark.cloud/cloudstatelive, and thanks to ThousandEyes for their sponsorship of this ridiculous yet educational podcast episode.
DNS, the Domain Name System: it's how computers translate names into numbers and back again for humans whose first language is not math. Put more succinctly, if I want to translate www.twitterforpets.com into an IP address like 22.214.171.124, I probably want a computer to do that for me, because humans find it easier to remember twitterforpets.com. Originally, this was done with a far more manual process: there was a file on every computer on the internet, and those files were kept in sync with each other. The internet was a smaller place back then, a friendlier time, when jerks trying to monetize everything at the expense of others didn't yet lurk behind every shadow. So how does this service work?
Well, let's go back to the beginning. When you look at a typical domain name, let's call it www.twitterforpets.com, there's a hierarchy built in, and it goes from right to left. In fact, pick any domain you'd like that ends in .com, .net, .technology, .dev, .anything else you care about: there's another dot at the end of it. That's right. You could go to www.google.com., and it works just the same way you would expect it to. That dot represents the root, and there are a number of root servers, run by various organizations that no one entity controls, scattered around the internet. They have an interesting job: their role is to resolve who the authoritative, responsible DNS server is for the top-level domains. That's all that the root servers do.
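That right-to-left hierarchy can be sketched in a few lines of Python; the function name and output shape here are mine, purely for illustration:

```python
def hierarchy(fqdn: str):
    """Return the resolution hierarchy for a name, from the root down."""
    labels = fqdn.rstrip(".").split(".")
    # The implicit trailing dot is the root; each step down adds one label.
    return ["."] + [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]
```

For www.twitterforpets.com this walks the root ".", then "com.", then "twitterforpets.com.", then "www.twitterforpets.com.", which is exactly the order resolution proceeds in.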
The top-level domains, in turn, have name servers that refer out to whoever is responsible for any given domain within that top-level domain, and so on and so forth. You can have subdomains running at your own company: you could have twitterforpets.com, but all of the engineering.twitterforpets.com domains are delegated out to a separate name server, and so on down the chain. It can hit ludicrous lengths if you'd like. Now, once upon a time, this was relatively straightforward because there were only so many top-level domains in existence: .com, .net, .org, .edu, .mil, and so on. Then the governing body, ICANN, decided, "You know what's great? Money," so they wound up selling additional top-level domains. You could grab .technology, .blog, .underpants for all I know; no one can keep them all in their head anymore, and one incredibly obnoxious purchase leaps to mind: Google buying .dev.
Now, anything you want ending in .dev can exist as a domain, because Google has taken ownership of that top-level domain. Why is that obnoxious? Well, historically, for the longest time on the internet, there were a finite number of top-level domains that people had to worry about. So internally, when people were building out their own environments, they would come up with something that was guaranteed never to resolve; .dev was a popular pick. You could point that at a local name server inside your firewall, or you could even hard-code it on your laptop itself, and it worked out super well. Now, anyone who registers whatever domain you picked has the potential to set up a listener on their end. That is not just a theoretical concern. I worked at a company once that had their domain.com as their external domain and domain.net for their internal domain, which is reasonable, except for the part where they didn't own the .net version of their domain.
Someone else did, and kept refusing offers to buy it, so periodically we would try to log into something internal while not being on the VPN, despite thinking that we were, type a credential into the listener that was set up there, and immediately have to reset our credentials. It was awful. Try not to do that. If you use a development domain, make sure you own it; it's $12, and everyone will be happier for it. Now, a common interview question that people love to ask sysadmins, SREs, DevOps engineers, whatever we're calling them this week, is: when I punch www.google.com into my web browser and hit enter, how does it translate that into an IP address?
There are a lot of layers you can hit, but by and large, the way it works is something like this. Oh, and a caveat they love to add in, because otherwise this gets way more complicated: every server involved has a cold cache, and we'll get to what that means in a bit. At that point, your browser says, "Oh, who has www.google.com?" It passes that query to the system resolver on your computer, which goes through a series of different resolution techniques. It usually will check the /etc/hosts file if it's on a Mac or Linux-style box, and if there isn't anything hardcoded in there, which there isn't for the purposes of this exercise, it queries the system's external resolver.
This is usually provided by your ISP, but you can also use Google's public resolvers, 8.8.8.8 and 8.8.4.4, Cloudflare's 1.1.1.1, or OpenDNS's, which are really weird and no one can remember them off the top of their head; there are a lot of different options. When that resolver gets queried, it looks at www.google.com, and because it has a cold cache, its first question is, "Great, who owns .com?" It queries the root name server. The root name server says, "Oh, .com is handled by the .com TLD authoritative servers," and returns who's authoritative for .com to the resolver. The resolver says, "Great," and then queries the authoritative name server for .com: "Who has www.google.com?" and it returns the authoritative name servers for google.com.
Now, something strange if you were to actually try this yourself is that the answer to that question is generally ns1.google.com. That sets up the opportunity for an infinite loop: oh, ns1.google.com? Ask .com, "Who has ns1.google.com?" Except that when it returns that result, it specifically includes an IP address. That IP address is known as a glue record, and it breaks the circular dependency. Glue records are often one of those things that pop up in sysadmin-type interviews to prove the interviewer thinks they're smarter than you are. From there, the resolver queries ns1.google.com: "Who has www.google.com?" and the ns1.google.com authoritative server, in turn, responds with an IP address. The resolver caches that result while passing it back to the original requester, and the next time that resolver is queried, it has the answer in cache until the TTL expires.
What is the TTL? It stands for time to live. A lot of these things don't change very often, but they do change from time to time. For example, if I want to re-point my website from one provider to another, I don't want everyone to continue going to the old provider in perpetuity, but I also don't necessarily want to slow everyone down by going through that whole "who has responsibility for my site?" DNS chain on every single query. Setting reasonable time-to-live values is a bit of an art when it comes to DNS; some forms of load balancing use incredibly low values in case things change on a minute-to-minute basis. But by and large, what happens is that when anything along that path queries and gets a result, the result comes with a time-to-live field, and when that gets exceeded, the result is considered stale and discarded, and the query has to go out again.
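That whole cold-cache walk, root to TLD to authoritative server, plus the TTL-based caching, can be sketched in a few lines of Python. Every server name, IP address, and TTL below is invented stand-in data; a real resolver would be making network queries, not dictionary lookups:

```python
import time

# Toy zone data standing in for the real servers in the referral chain.
ROOT = {"com.": "the .com TLD servers"}
TLD_REFERRALS = {"google.com.": ("ns1.google.com.", "192.0.2.53")}  # name server + glue IP
AUTHORITATIVE = {"www.google.com.": ("192.0.2.80", 300)}            # (address, TTL seconds)

cache = {}  # name -> (address, expiry timestamp)

def resolve(name, now=None):
    """Walk the referral chain the way a caching resolver does."""
    now = time.time() if now is None else now
    if name in cache:                           # warm cache: serve until the TTL expires
        address, expires_at = cache[name]
        if now < expires_at:
            return address
        del cache[name]                         # stale: discard and query again
    labels = name.rstrip(".").split(".")
    _ = ROOT[labels[-1] + "."]                  # 1. root: who handles .com?
    zone = ".".join(labels[-2:]) + "."
    ns_name, glue_ip = TLD_REFERRALS[zone]      # 2. TLD: referral, with glue record
    address, ttl = AUTHORITATIVE[name]          # 3. authoritative: the actual answer
    cache[name] = (address, now + ttl)          # remember it for TTL seconds
    return address
```

The second call for the same name within the TTL window never leaves the cache; once the expiry timestamp passes, the full walk happens again.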
Let's talk a little bit about how that works in a time of cloud. First, however, let's talk a little bit more about Cloud State Live. Who's going to be there? Well, cloud, internet, and networking experts from both digital-native and digitally transforming companies will be there, as will media and industry analysts, which is how I managed to sneak in last year; it turns out when you call yourself an analyst, there's no actual certifying body that proves it, and I had everyone fooled. Anyone and everyone with a vested interest in the cloud is welcome to join Cloud State Live, either in person in Austin, Texas or on the live stream, where I will attempt to aggressively live-tweet it in my typical style. Last year's was in San Francisco, so I was able to sneak in without hopping a plane. This year I'll be doing it remotely, so you're definitely going to want to follow along here.
They're also teasing the reveal of a major innovation that will finally change the power dynamic for every business that relies on the internet, which, of course, is every business. Find out more and see if they live up to that at snark.cloud/cloudstatelive. That's snark.cloud/cloudstatelive, and my thanks to ThousandEyes for sponsoring this ridiculous yet strangely informative podcast. Now, in a world of cloud, there are two different kinds of DNS servers, the same as on the internet at large: the authoritative servers that own the records for a zone, and the resolvers that go out and figure out what everything else on the internet has. Route 53 is AWS's authoritative DNS service, and unlike any other public service that AWS offers, it has a 100% SLA, meaning it will always be available; it'll always be up.
Now, some folks don't believe that, and you shouldn't either; I take SLA guarantees with a grain of salt, except for the fact that this is DNS. If they're publishing a 100% SLA, then services internal to AWS are inherently going to be built to that 100% SLA. So should Route 53 go down, and at some point it almost certainly will, because it's a computer and computers break; it's what they do, then we'll almost certainly see significant outages of other AWS services with baked-in dependencies on Route 53. So if you're looking to insulate yourself from potential DNS issues by just having a secondary provider available, you may have to do more work than just DNS.
Now, cloud providers are fascinating in this world, because they have built systems that are fully compatible with DNS, it being a well-known worldwide protocol that you can't just replace without serious buy-in from other folks, but they also wind up doing it in their own special way. A common question that you'll get in these interviews, again, is, "Does DNS speak UDP or TCP?" The easy answer is, "Oh, it speaks UDP," which it does, but there are exceptions, and those exceptions are what that condescending interviewer at Google almost certainly wants to hear.
UDP, in a DNS context, was originally limited to 512 bytes. That's why there are only 13 root name servers; anything more wouldn't fit in a single DNS packet. Now, if the result is larger than 512 bytes, what happens traditionally is that the UDP packet fits as much data as it can, and then the server sets the truncate bit in the packet, meaning it's left up to the client to decide: "Do I just make do with these partial results, or do I retry using TCP?" You can't guarantee that any particular client is going to do exactly what you expect, so you have to account for that. So the correct answer to "Does DNS speak UDP or TCP?" is "both," but there's one other edge case as well.
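Checking for that truncate bit is just a matter of inspecting the flags word in the 12-byte DNS header; here's a rough Python sketch, with the helper name being mine:

```python
import struct

TC_BIT = 0x0200  # the "truncated" flag inside the DNS header's flags word

def is_truncated(packet: bytes) -> bool:
    """Report whether a DNS message has the truncate bit set.

    The header is 12 bytes: a 16-bit ID, a 16-bit flags word, then
    four 16-bit section counts. The flags word lives at bytes 2-3.
    """
    if len(packet) < 12:
        raise ValueError("too short to be a DNS message")
    (flags,) = struct.unpack("!H", packet[2:4])
    return bool(flags & TC_BIT)
```

A client that sees this return True would typically retry the same query over TCP.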
All of the records for a zone live in what is known as a zone file. Zone files are effectively the authoritative record, right now, of what lives inside a given zone, and when one updates on a primary DNS server, a secondary DNS server pulls it to figure out what has changed. There are two ways that zone transfers happen in legacy systems, by which I mean not-cloud systems, by which I mean computers you control: the AXFR, which is a complete transfer of the entire zone file, and the IXFR, which is a partial transfer of just what's changed. Now, Route 53 supports neither of those, which works for some use cases and causes problems for others, but it does wind up being a difference that people sometimes forget about, and if you're trying to pair Route 53 with some other form of DNS server, you have some work to do.
Now, lastly, before we sign off, I want to talk about a few stupid DNS tricks that I love. My personal favorite is that Route 53 is my favorite database, because again, it has that 100% SLA, you can query it, and DNS is fundamentally a large key-value store. Some would say a key-value store isn't a database, but Redis calls itself one, so who are we to complain? Now, originally I would use TXT records to give me further information about the various systems I had: what rack they lived in, et cetera. So you could make a TXT record query for any given resource and get a pile of information back. There are much better ways to do this these days, but it mostly worked.
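A toy sketch of that TXT-records-as-database trick might look like this in Python; the record name and metadata are invented, and real answers would come back from a DNS query rather than a dictionary, but TXT answers really do arrive as lists of strings, which is the shape used here:

```python
# Invented stand-in for the TXT records in a hosted zone.
TXT_RECORDS = {
    "web01.twitterforpets.com.": ["rack=12 row=3 role=frontend"],
}

def host_metadata(name: str) -> dict:
    """Parse space-separated key=value pairs out of a host's TXT data."""
    joined = " ".join(TXT_RECORDS[name])
    return dict(field.split("=", 1) for field in joined.split())
```

Querying any resource by name hands back its pile of information as a dictionary, which is about all you need for a key-value "database."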
That said, it was a database, and I maintain that Route 53 remains my favorite database. You could take it a step further, although Route 53 doesn't support this yet, and use DNS itself as a transport layer for something else riding on top of it; iodine is a good example of this. You can put TCP streams over DNS as a transport, so you can have a VPN that goes over DNS, or OpenVPN can be hacked to work over DNS as well.
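The encoding half of that trick can be sketched like so; this is a simplified illustration of stuffing bytes into the labels of a query name, not iodine's actual wire format, and the domain used is made up:

```python
import base64

MAX_LABEL = 63  # DNS limits each label (the bits between dots) to 63 bytes

def encode_payload(data: bytes, domain: str) -> str:
    """Pack arbitrary bytes into the labels of a DNS query name."""
    text = base64.b32encode(data).decode().rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    return ".".join(labels + [domain])

def decode_payload(name: str, domain: str) -> bytes:
    """Recover the bytes a cooperating name server would pull back out."""
    text = name[: -(len(domain) + 1)].replace(".", "").upper()
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text + padding)
```

Each outbound chunk becomes a query for something like xxxx.t.example.com; a name server you control for that domain decodes the labels and smuggles its reply back in the response.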
Why would you do this? Well, an awful lot of terrible captive portals in various coffee shops and whatnot won't let you connect to the internet without paying them or handing over a bunch of personal information, but they will resolve DNS externally. So you can sort of pivot over the top of those barriers by using DNS. Now, beyond being a scammy way to dodge paying money to people who arguably don't deserve it, this can be a serious security concern, because you can use DNS to exfiltrate data from inside an environment. It's always DNS, even when it's not. That more or less rounds out what I had to say about DNS this week. If you disagree with anything I have to say, first, let me condescendingly tell you you're wrong, even though you're probably not. Secondly, feel free to chime into the conversation on Twitter; I'm @QuinnyPig, that's Q-U-I-N-N-Y-P-I-G, or visit me on the web at www.lastweekinaws.com. I'm cloud economist Corey Quinn, and I'll talk to you next week about more network things.
Announcer: This has been a HumblePod Production. Stay humble.