Summer Replay - Ironing out the BGP Ruffles with Ivan Pepelnjak

Episode Summary

If you need a point of contact for all things networking, then look no further than Ivan Pepelnjak. Ivan is the webinar author at ipSpace.net where he is working on making networking an approachable subject for everyone. From teaching to writing books, Ivan has been at it for a long and storied career, and as a de facto go-to for networking knowledge, you can’t beat him. In this Summer Replay of Screaming in the Cloud, Ivan and Corey discuss Ivan’s status as a CCIE Emeritus and the old days of Cisco. Ivan also lends his network engineering expertise and helps Corey answer some questions about BGP and its implementation. Ivan aptly narrows it down into “layers” that he kindly runs us through. So tune in for a Dante-esque descent into BGP, DNS and Facebook, seeing out the graybeards of tech, and more!

Episode Video

Episode Show Notes & Transcript

Show Highlights:

(0:00) Intro to episode
(1:23) Panoptica sponsor read
(2:04) The world of VAX/VMS
(2:39) The significance of being a CCIE emeritus
(5:02) The value of certification in the modern tech world
(7:37) BGP and networking
(12:41) Internal vs. external BGP
(15:23) “Unfair criticisms” of BGP
(17:35) Differences between BGP and DNS
(23:19) Cloud growth vs. loss of networking engineers
(24:57) Panoptica sponsor read
(25:20) Outsourcing admin work
(27:45) Breaking down the Facebook DNS outage
(31:37) Disconnect at the data center
(37:06) Where you can find Ivan

About Guest:

Ivan Pepelnjak, CCIE#1354 Emeritus, is an independent network architect, blogger, and webinar author at ipSpace.net. He's been designing and implementing large-scale service provider and enterprise networks as well as teaching and writing books about advanced internetworking technologies since 1990.

Links Referenced:

ipSpace.net: https://ipspace.net
Original Episode: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/ironing-out-the-bgp-ruffles-with-ivan-pepelnjak/

Sponsor

Panoptica: https://www.panoptica.app/

Transcript

Ivan Pepelnjak: [00:00:00] They have DNS servers around the world, and the DNS servers serve the local region, if you wish. And that DNS server then decides what facebook.com really stands for. So if you query for facebook.com, you'll get a different answer in Europe than in US.

Corey Quinn: Welcome to Screaming in the Cloud. I'm Corey Quinn. I have an interesting and storied career path. I dabbled in marketing and security engineering slash infosec for a while before I realized that being crappy to people in the community wasn't really my thing. I was a grumpy Unix systems administrator, because it's not like there's a second kind of those out there.

And I dabbled ever so briefly in the wide world of network administration slash network engineering slash plugging the computers in to make them talk to one another ideally correctly. But I was always a dabbler. When it comes [00:01:00] time to have deep conversations about networking, I immediately tag out and look to an expert.

My guest today is one such person. Ivan Pepelnjak is oh so many things. He's a CCIE emeritus and, well, let's start there. Ivan, welcome to the show.

Ivan Pepelnjak: Thanks for having me. And oh, by the way, I have to tell people that I was a VAX/VMS administrator in those days.

Corey Quinn: This episode has been sponsored by our friends at Panoptica, part of Cisco.

This is one of those real rarities where it's a security product that you can get started with for free, but also scale to enterprise grade. Take a look. In fact, if you sign up for an enterprise account, they'll even throw you one of the limited, heavily discounted AWS Skill Builder licenses they got, because believe it or not, unlike so many companies out there, they do understand AWS.

To learn more, please visit panoptica.app/lastweekinaws. That's [00:02:00] panoptica.app/lastweekinaws.

Oh yes, the VAX/VMS world was fascinating. I talked to a company that was finally emulating them on physical cards because that was the only way to get them there. Did you refer to them as VAXen or VAXes, or how did you wind up referring?

Ivan Pepelnjak: VAXes.

Corey Quinn: VAXes. Okay. I was on the other side of that, inappropriately pluralizing anything that ends with an X with an "en": boxen and the rest. And that's why I had no friends for many years.

Ivan Pepelnjak: You don't know what the first VAX was, right?

Corey Quinn: I do not.

Ivan Pepelnjak: It was a Swedish Hoover company.

Corey Quinn: Ooh.

Ivan Pepelnjak: And they had a trademark dispute with Digital over the name, and then they settled that.

Corey Quinn: You describe yourself in your bio as a CCIE emeritus, and you give the number, which is low, number 1354. Now, I've talked about certifications on this show in the context of the modern era, and whether it makes sense to get cloud certifications or not, but this is from a different time. Understand that for many listeners, [00:03:00] these stories might be older than you are in some cases, and that's okay.

But Cisco at one point, believe it or not, was a shining beacon of the industry, the kind of place that people wanted to work at. And their certification path was no joke. I got my CCNA from them, Cisco Certified Network Administrator. And that was basically a byproduct of learning how networks worked.

There are several more tiers beyond that, culminating in the CCIE, which stands for Cisco Certified Internetworking Expert, or am I misremembering?

Ivan Pepelnjak: No, no, that's it.

Corey Quinn: Perfect. And that was known as the Doctorate of Networking in many circles for many years. Back in those days, if you had a CCIE, you were guaranteed to be making an awful lot of money at basically any company you wanted to, because you knew how networking worked.

Ivan Pepelnjak: In the U.S.

Corey Quinn: Well, in the U.S., true. There's always the interesting stories of working in places that are trying to go with the lowest bidder for networking gear, and you wind up spending weeks on end trying to figure out [00:04:00] why things are breaking intermittently, only to find out at the end that someone saved 20 bucks by buying cheap patch cables.

I digress, and I still have the scars from those. But it was fascinating in those days because there was a lab component of getting those tasks. There were constant rumors that in the middle of the night during the two day certification exam, they would come in and mess with the lab and things you'd set up to fix the following day.

That is true.

Ivan Pepelnjak: Yeah. So in the good old days when the lab was still physical, they would even turn the connectors around so that they would look like they would be plugged in, but obviously there was no signal coming through. And they would mess the jumpers on the line cards and all that stuff. So when you got your broken lab, you really had to work hard, you know, from the physical layer up from the jumpers and they would mess up your config and everything else.

It was, you know, the real deal. The thing you would experience in the real world with underqualified technicians [00:05:00] putting stuff together. Let's put it this way.

Corey Quinn: I don't wish to besmirch our brethren working in the data centers, but having worked with folks who did some hilariously awful things with cabling, and having been one of those people myself from time to time, it's hard to have sympathy when you just spent hours chasing it down.

To be clear, the CCIE is one of those things where in a certain era, if you're trying to have an argument on the internet with someone about how networks work, and their response is, well, I'm a CCIE. Yeah, the conversation was over at that point. I'm not one to appeal to authority on stuff like that very often, but it's the equivalent of arguing about medicine with a practicing doctor.

It's the same type of story. It is someone where, if they're wrong, it's going to be in the very fringes or the nuances, back in this era. Today, I cannot speak to the quality of CCIEs. I'm not attempting to disparage or besmirch any of them, but I'm also not endorsing that certification the way I once did.

Ivan Pepelnjak: Yeah, well, I totally agree with you. When this became, you know, a [00:06:00] mass certification, uh, the reason it became a mass certification is because reseller discounts are tied to reseller status, which is tied to the number of CCIEs they have. It became, you know, this, well, still high end, but commodity that you simply had to get to remain employed because your employer needed the extra two point discount.

Corey Quinn: It used to be that the prerequisite for getting the certification was, beyond other certifications, you spent five or six years working on things.

Ivan Pepelnjak: Well, that was what gave you the experience you needed, because in those days there were no boot camps. Today you have a boot camp.

Corey Quinn: Now there's boot camp brain dump things where it's, we're going to train you for four straight weeks of nothing but this. Teach to the test. And, okay.

Ivan Pepelnjak: Yeah, no, it's even worse. There were rumors that some of these bootcamps in some parts of the world that shall remain unnamed were actually teaching you how to type in the commands from the actual lab.

Corey Quinn: [00:07:00] Even better.

Ivan Pepelnjak: Yeah. You don't have to think, you don't have to remember. You just have to type in the commands you've learned.

You're done.

Corey Quinn: There's an arc to the value of a certification. It comes out and no one knows what the hell it is. And suddenly it's great. And you can use that to really identify what's great and what isn't. And then it goes at some point down into the point where it becomes commoditized and you need it for partner requirements and the rest.

And at that point, it is no longer something that is a reliable signal of anything other than that someone spent some time and or money.

Ivan Pepelnjak: Well, are you talking about bachelor degree now?

Corey Quinn: Well, no, I don't have one of those either. I have an 8th grade education because I'm about as good of an academic as it probably sounds like I am.

The thing that really differentiated in my world, the, the difference between what I was doing in the network engineering sense and the things that folks like you, who are actually, you know, professionals rather than enthusiastic amateurs, took into account was that I was always working inside of the LAN, the local area network, inside of a data center.

Cool, everything here inside the cage, I can make it talk to each other, I can [00:08:00] screw up the switching fabric, etc, etc. I didn't deal with any of the WAN, wide area network, think internet in some cases. And at that point, we're talking about things like BGP or OSPF in some parts of the world, or RIP, or RIPv2 if you make terrible life choices.

BGP is the routing protocol that more or less powers the internet. At the time of this recording, we're a couple weeks past a BGP kerfuffle that took Facebook down for a number of hours, during which time the internet was terrific. I wish they could do that more often. In fact, it was almost like a holiday.

It was fantastic. I took my elderly relatives out and got them vaccinated. It was glorious. Now we're back to having Facebook and terrific. The problem I have whenever something like this happens is there's a whole bunch of crappy explainers out there of what is BGP and how might it work and people have angry opinions about all of these things.

So instead, I prefer to talk to you, given that you are a networking trainer, [00:09:00] you have taught people about these things, you have written books, you have operated large scale environments.

Ivan Pepelnjak: I even developed a BGP course for Cisco.

Corey Quinn: You taught it for Cisco, of all places. Back when that was impressive and awesome and not a has been.

It's honestly, I feel like I could go there and still wind up going back in time and still, it's the same Cisco in some respects. They've all ever died, Dinosaur, and they got frozen in amber. But let's start at the very beginning. What is BGP?

Ivan Pepelnjak: Well, you know, when the internet was young, they figured out that, uh, we aren't all friends on the internet anymore.

And, uh, I want to control what I tell you, and you want to control what you tell me. And furthermore, I want to control what I believe from what you're telling me. So we needed a protocol that would implement policy, where I could say, I will only announce my customers to you, but not what I've heard from Verizon.

And you would do the same. And then I would say, well, but I don't want to hear about that customer of [00:10:00] yours because he's also my customer. So we need some sort of policy. And so they invented a protocol where you will tell me what you have. I will tell you what I have. And then we would both choose what we want to believe and follow those paths to forward traffic.

And so BGP was born.
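The announce-and-filter idea Ivan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any router implements BGP: the `Route` class, the `learned_from` labels, and the two policy functions are hypothetical names, and the prefixes come from the address ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str        # destination prefix (documentation ranges only)
    learned_from: str  # hypothetical label: "customer" or "upstream"


def export_policy(routes):
    """What I tell you: only routes learned from my own customers."""
    return [r for r in routes if r.learned_from == "customer"]


def import_policy(announced, rejected_prefixes):
    """What I believe: drop prefixes I refuse to hear from this neighbor."""
    return [r for r in announced if r.prefix not in rejected_prefixes]


my_routes = [
    Route("203.0.113.0/24", "customer"),
    Route("198.51.100.0/24", "upstream"),  # heard from an upstream, not re-announced
    Route("192.0.2.0/24", "customer"),
]

# My side: announce customers only.
announced = export_policy(my_routes)

# Your side: you already serve 192.0.2.0/24 yourself, so you ignore it from me.
accepted = import_policy(announced, rejected_prefixes={"192.0.2.0/24"})

print([r.prefix for r in announced])  # ['203.0.113.0/24', '192.0.2.0/24']
print([r.prefix for r in accepted])   # ['203.0.113.0/24']
```

Each side applies its own filter independently, which is the point of the quote: one party controls what it says, the other controls what it believes.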

Corey Quinn: On some level, it seems like it's this faraway thing to people like me, because I have a residential internet connection, and I am not generally allowed to make my own BGP announcements to the greater world. Even when I was working in data centers, very often the BGP was handled by our upstream provider, or very occasionally by a router they would drop in with the easiest maintenance instructions in the world for me. Step one, make sure it has power.

Step two, never touch it. Step three, we'd prefer if you don't even look at it and remain at least 20 feet away to keep from bringing your aura near anything we care about. And that's basically what you should do with me in the context of hardware. So it was always this [00:11:00] arcane magic thing.

Ivan Pepelnjak: Well, it's not, you know, it's like a power transmission.

When you know enough about it, it stops being magic. It's technology. It's a bit more complicated than some other stuff. It's way less complicated than some other stuff like quantum physics, but still it's so rarely used that it gets this aura of being mysterious. And then of course, everyone starts getting their opinion, particularly the graduates of the Facebook Academy.

And yes, it is true that usually BGP would be used between service providers. So whenever, you know, we are big enough to need policy, if you just need one uplink, there is no policy there. You either use the uplink or you don't use the uplink. If you want to have two different links to two different points of presence or to two different service providers, then you are already in the policy land.

Do I prefer one provider over the other? Do I want to [00:12:00] announce some things to one provider but other things to the other? Do I want to take local customers from both providers because I want to, you know, have lower latency because they are local customers, or do I want to use one solely as the backup link because I paid so little for that link that I know it's shitty?

So you need all that policy stuff. And to do that, you really need BGP. There is no other routing protocol in the world where you could implement that sort of policy. Because everything else is concerned mostly with, let's figure out as fast as possible what is reachable and how to get there. And BGP is like, hey, slow down.

There's policy.

Corey Quinn: Yeah. In the context of someone whose primary interaction with networks is their home internet, where there's a single cable coming in from the outside world, you plug it into a device, maybe yours, maybe the ISP's, maybe we don't care, that's sort of the end of it. But think in terms of large interchanges, where there are multiple redundant networks to get from [00:13:00] here to somewhere else, which one should traffic go down at any given point in time?

Which networks are reachable on the other end of various distant links? That's the sort of problem that BGP is very good at addressing and what it was built for. If you're running BGP internally, in a small network, consider not doing exactly that.

Ivan Pepelnjak: Well, I've seen two use cases, well, three use cases for people running BGP internally.

Corey Quinn: Okay, this I want to hear, because I was always told, "No touch 'em!" But, you know, I'm about to learn something. That's why I'm talking to you.

Ivan Pepelnjak: The first one was multinationals who needed policy.

Corey Quinn: Yes, many multi site environments, large scale companies that have redundant links. They're trying to run full mesh in some cases, or partial mesh, between a bunch of facilities.

Ivan Pepelnjak: In this case, it was multiple continents and really expensive transcontinental links. And it was, I don't want to go from Europe to Sydney over US, I want to go over [00:14:00] Middle East. And to implement that type of policy, you have to split, you know, the whole network into regions, and then each region is what BGP calls an autonomous system, so that it gets its tag, its autonomous system number. And then you can do policy on that, saying, well, I will not announce Asian routes to Europe through US, or I will make them less preferred, so that if the Middle East region goes down, I can still reach Asia through US, but preferably, I will not go there.
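The "less preferred, but still usable as backup" policy can be sketched as a simple ranking. This is a simplified model, assuming only two of the tie-breakers real BGP uses (a locally assigned preference, then AS path length); the `Path` class, the region labels, and the preference values are hypothetical, and the AS numbers are taken from the private-use range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    via_region: str  # hypothetical label for the transit region
    as_path: tuple   # AS numbers the announcement passed through (private-use range)
    local_pref: int  # set by our own policy; higher is preferred


def best_path(paths):
    """Highest local preference wins; a shorter AS path breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


europe_to_asia = [
    Path("middle-east", (64512, 64513), local_pref=200),
    Path("us", (64514, 64513), local_pref=50),  # kept, but made less preferred
]

normal_choice = best_path(europe_to_asia).via_region

# The Middle East region goes down: its path disappears, the backup takes over.
remaining = [p for p in europe_to_asia if p.via_region != "middle-east"]
backup_choice = best_path(remaining).via_region

print(normal_choice)  # middle-east
print(backup_choice)  # us
```

Lowering the preference instead of filtering the route outright is what keeps the US path available when the preferred one fails.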

The second one is, yet again, large networks where they had too many prefixes for something like OSPF to carry. And so their OSPF was breaking down and the only way to solve that was to go to something that was designed to scale better, which was BGP. And third one is if you want to implement some of the stuff that was designed for service providers initially, like VPNs, layer [00:15:00] two or layer three, then, uh, BGP becomes this kitchen sink protocol.

You know, it's like using, uh, Route 53 as a database. We are using BGP to carry any information anyone ever wants to carry around. I'm just waiting for someone to design JSON in BGP RFC and then we are, you know, where we need to be.

Corey Quinn: I feel on some level like BGP gets relatively unfair criticism because the only time it really intrudes on the general awareness is when something has happened and it breaks.

This is sort of the quintessential network or systems or honestly computer type of issue. It's either Or, you're getting screamed at because something isn't working. It's almost like a utility on some level. When you turn on a faucet, you don't wonder whether water is going to come out this time, but if it doesn't, there's hell to pay.

Ivan Pepelnjak: Unless it's brown.

Corey Quinn: Well, there is that. Let's, let's stay away from that particular direction. There's a beautiful metaphor, probably involving IBM, if we do. So, the [00:16:00] challenge, too, when you look at it, is that it's this weird, esoteric thing that isn't super well understood, and as soon as it breaks, everyone wants to know more about it.

And then, in theory, full on charging to the wrong side of the Dunning Kruger curve. It's, well, that doesn't sound hard. Why are they so bad at this? I would be able to run this better than they could. I assure you, you can't. This stuff is complicated. It is nuanced. It is difficult. But the common question is, why is this so fragile and able to easily break?

I'm going to turn that around. How is it that something that is this esoteric and touches so many different things works as well as it does?

Ivan Pepelnjak: Yeah, it's a miracle, particularly considering how crappily things are configured around the world.

Corey Quinn: There have been periodic outages of sites when some ISP sends out a bad BGP announcement and their upstream doesn't suppress it because, hey, you misconfigured things, and suddenly half the internet believes, oh, YouTube now lives in this different tiny place halfway around the world, rather than where it's currently being [00:17:00] anycasted from.

Ivan Pepelnjak: Called Pakistan, to be precise.

Corey Quinn: Exactly. There was an actual incident there. We are not dunking on Pakistan as an example of a faraway place. No, no, a Pakistani ISP wound up doing exactly this and taking YouTube down for an afternoon a while back.

It's a common problem.

Ivan Pepelnjak: Yeah, the problem was that they tried to stop local users accessing YouTube and they figured out that, you know, YouTube is announcing this prefix and if they would announce two more specific prefixes then, you know, they would attract the traffic and the local users wouldn't be able to reach YouTube.

Perfect. But that leaked.

Corey Quinn: If you wind up saying that, all right, the entire internet is available on this interface, and a small network of 256 nodes is available on the second interface, the most specific route always wins. That's why the default route, or route of last resort, is the entire internet: if you don't know where to send it, throw it down this direction. That is usually, in most home environments, the gateway that then hands it up to your ISP, where they inspect it and do all [00:18:00] kinds of fun things to sell ads to you and then eventually get it to where it's going.
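The "most specific route always wins" rule is longest-prefix matching, and it can be sketched with Python's standard `ipaddress` module. The routing table, interface names, and prefixes below are hypothetical (documentation ranges), and the appended /25 entry only illustrates why a leaked more-specific announcement attracts traffic; it does not reproduce the prefixes from the real incident.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the next hop of the most specific (longest-prefix) matching route."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routing_table if address in net]
    if not matches:
        return None  # no route at all, not even a default
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return next_hop


routing_table = [
    (ipaddress.ip_network("0.0.0.0/0"), "first-interface"),      # default route
    (ipaddress.ip_network("192.0.2.0/24"), "second-interface"),  # 256 addresses
]

print(lookup(routing_table, "192.0.2.17"))   # second-interface
print(lookup(routing_table, "203.0.113.9"))  # first-interface

# A more specific prefix covering half of the /24 shows up from somewhere else.
routing_table.append((ipaddress.ip_network("192.0.2.0/25"), "leaked-announcement"))

print(lookup(routing_table, "192.0.2.17"))   # leaked-announcement
print(lookup(routing_table, "192.0.2.200"))  # second-interface
```

Nothing about the original /24 changed in the last two lookups; the longer prefix simply outranks it for the addresses it covers.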

This gets complicated at these higher levels. And I have sympathy for the technical aspects of what happened at Facebook, no sympathy whatsoever for the company itself, because they basically do far more harm than they do good, and I've been very upfront about that. But I want to talk to you as well about something that people are going to be convinced I'm taking in my database direction, but I assure you I'm not: DNS. What is the relationship between BGP and DNS? Which sounds like a strange question sometimes.

Ivan Pepelnjak: There is none.

Corey Quinn: Excellent.

Ivan Pepelnjak: It's just that different large scale properties decided to implement global load balancing, globally optimal access to their servers, in different ways. So Cloudflare is a typical example of someone who is doing anycast. They're announcing the same networks, the same prefixes, from [00:19:00] 100 locations around the world. So BGP will take care that you always get to the closest Cloudflare PoP. And that's it. That's how they work. No magic.

Facebook didn't believe in the power of anycast when they started designing their service. So what they're doing is they have DNS servers around the world, and the DNS servers serve the local region, if you wish. And that DNS server then decides what facebook.com really stands for. So if you query for facebook.com, you'll get a different answer in Europe than in US.

Corey Quinn: Just a slight diversion on what anycast is. If I ping Google's public resolver, 8.8.8.8, easy to remember, from my computer right now, the packet gets there and back in about 5 milliseconds. Wherever you are listening to this, if you were to try that same thing, you'd see something roughly similar.

Now, one [00:20:00] of two things is happening. Either Google has found a way to break the laws of physics and get traffic to a central point faster than light, or the 8.8.8.8 that I'm talking to and the one that you are talking to are not, in fact, the same computer.
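The physics argument can be checked with a little arithmetic. The sketch below takes the 5 millisecond round trip from the conversation and computes how far away the responding machine could possibly be. The speed of light in a vacuum is a known constant; the two-thirds factor for light in optical fiber is a commonly quoted approximation and is an assumption here, as is the helper name `max_one_way_km`.

```python
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBER_FRACTION = 2 / 3           # assumption: rough speed of light in fiber relative to vacuum


def max_one_way_km(rtt_ms, fraction_of_c=1.0):
    """Upper bound on the distance to a responder, given a round-trip time."""
    one_way_seconds = (rtt_ms / 1000) / 2
    return C_VACUUM_KM_PER_S * fraction_of_c * one_way_seconds


print(round(max_one_way_km(5)))                  # 749
print(round(max_one_way_km(5, FIBER_FRACTION)))  # 500
```

A 5 millisecond reply therefore has to come from somewhere within a few hundred kilometers, so listeners on different continents who all see similar numbers cannot be talking to the same machine.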

Ivan Pepelnjak: Well, by the way, it's 13 milliseconds for me, and between you and me it's 200 milliseconds, so yes, they are cheating.

Corey Quinn: Just a little bit, or unless they tunneled through the earth rather than having to bounce it off of satellites or through cables.

Ivan Pepelnjak: No, even that wouldn't work.

Corey Quinn: That's what the quantum computers are for. I always wondered, now we know.

Ivan Pepelnjak: Yeah, they're entangling the replies in advance and that's how it works. Yeah, you're right.

Corey Quinn: Please continue. I just wanted to clarify that point because I got that one hilariously wrong once upon a time and was extremely confused for about six months.

Ivan Pepelnjak: Yeah, it's something that no one ever thinks about unless, you know, you're really running large scale DNS, because honestly, root DNS servers were anycasted for ages.

You think there are like 12 different [00:21:00] root DNS servers? In reality, there are like 300 instances hidden behind those 12 addresses.

Corey Quinn: And fun trivia fact, the reason there are 12 addresses is because any more than that would no longer fit within the 512 byte limit of a UDP packet without truncating.

Ivan Pepelnjak: Thanks for that. I didn't know that.

Corey Quinn: Of course. Now, EDNS extensions let you go out with a larger stop, but you can't guarantee that's going to hit. And what happens when you receive a UDP packet, when you receive a DNS result with a truncate flag set on the UDP packet? It is left to the client. It can either use the partial result, or it can try and re establish over a TCP connection.

That is one of those weird trivia questions they love to ask in sysadmin interviews. But it's, yeah, fundamentally, if you're doing something that requires the root name servers, you don't really want to start going down those arcane paths. You want it to just be something that fits in a single packet, not require a whole bunch of computational overhead.
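The truncate flag Corey mentions is a single bit in the 12-byte DNS message header. The sketch below builds two header-only messages by hand and checks that bit; it does no network I/O. The flag position follows the DNS message format in RFC 1035, while the helper names (`header`, `is_truncated`, `next_step`) and the choice to always retry over TCP are assumptions for illustration, since the conversation notes that the client may also use the partial result.

```python
import struct

TC_FLAG = 0x0200  # truncation bit within the 16-bit flags field of the DNS header


def header(flags):
    """Build a 12-byte DNS header: ID, flags, then four record counts."""
    return struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0)


def is_truncated(message):
    """True if the response did not fit and the server set the TC bit."""
    if len(message) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    _message_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def next_step(message):
    """One possible client policy: retry over TCP whenever the answer was cut off."""
    return "retry over TCP" if is_truncated(message) else "use the UDP answer"


complete = header(0x8180)               # response with recursion desired and available
cut_off = header(0x8180 | TC_FLAG)      # same response, but marked as truncated

print(next_step(complete))  # use the UDP answer
print(next_step(cut_off))   # retry over TCP
```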

Ivan Pepelnjak: Yeah, and even within those 300 instances, there are multiple servers listening to the same IP address, [00:22:00] and incoming packets are just sprayed across those servers, and whichever one gets the packet replies to it. And because it's UDP, it's one packet in, one packet out, problem solved, it all works. People thought that this doesn't work for TCP because, you know, you need a whole session.

So you need to establish the session. You send the request, you get the reply, there are acknowledgements, all that stuff. Turns out that there are almost never two ways to get to a certain destination across the internet from you. So people thought that, you know, this wouldn't work because half of your packets will end in San Francisco and half of the packets will end in San Jose, for example.

Doesn't work that way.

Corey Quinn: Why not?

Ivan Pepelnjak: Well, because the global internet is so diverse that you almost never get two equal cost paths to two different destinations. Because it would be San Francisco and San Jose announcing 8.8.8.8. [00:23:00] And it would be a miracle if you would be sitting just in the middle. So that the first packet would go to San Francisco, the second one would go to San Jose and, you know, back and forth.

That never happens. That's why Cloudflare makes it work by announcing the same prefix throughout the world.

Corey Quinn: So I just learned something new about how routing announcements work, an aspect of BGP. And you, a few minutes ago, learned something about the UDP size limit and the root name servers. BGP and DNS are two of the oldest protocols in existence.

You and I are also decades into our careers. If someone is starting out their career today working in a cloudy environment, there are very few network centric roles because cloud providers handle a lot of this for us. Given these protocols are so foundational to what goes on, and they're as old as they are, are we as an industry slash sector slash engineers losing the skills to effectively deploy and manage these things?

Ivan Pepelnjak: Yes. The same [00:24:00] problem that you have in any other sufficiently developed technology area. How many people can build power lines? How many people can write a compiler? How many people can design a new CPU? How many people can design a new motherboard? I mean, when I was 18 years old, I was wire wrapping my own motherboard with 8 bit processor.

You can't do that today. You know, as the technology is evolving and maturing, it's no longer fun. It's no longer sexy. It stops being a hobby. And so it bifurcates into users and people who know about stuff. And it's really hard to bridge the gap from one to the other. So in the end, you have like these 20 graybeard people who know everything about the technology, and the youngsters have no idea.

And when these people die, don't ask me how we'll get any further on.

Corey Quinn: Few things are better for your career and your company than [00:25:00] achieving more expertise in the cloud. Security improves, compensation goes up, employee retention skyrockets. Panoptica, a cloud security platform from Cisco, has created an academy of free courses just for you.

Head on over to academy.panoptica.app to get started.

On some level, it feels like it's a bit of a down the stack analogy for what happened to me early in my career. My first systems administration job was running a large scale email system. It was a hobby that I was interested in. I basically bluffed my way into working at a university for a year.

Thanks Chapman, I appreciate that. And it was great, but it was also pretty clear to me that with the rise of things like hosted email, Gmail, and whatnot, it was not going to be the future of what the present day at that point looked like, which was most large companies needed an email administrator.

Those jobs were dwindling. Now, if you want to be an email systems administrator, there are maybe a dozen companies or so that can really use that [00:26:00] skill set. And everyone else just outsources that. That said, at those companies like Google and Microsoft, there are some incredibly gifted email administrators who are phenomenal at understanding every nuance of this.

Do you think that that is what we're going to see in the world of running BGP at large scale, where a few companies really need to know how this stuff works and everyone else just sort of smiles, nods, and rolls with it?

Ivan Pepelnjak: Absolutely. We are already there. Because, you know, if I am an end customer and I need BGP because I have two uplinks to two ISPs, that's really easy.

I mean, there are a few tricks you should follow, and hopefully some of the guard rails will be built into network operating systems so that you will really have to configure explicitly that you want to leak routes between Verizon and AT&T, which is great fun if you have two low speed links to both of them and now you're becoming transit between the two. Which did happen to Verizon, that's why I'm mentioning them. Sorry, guys. Anyway, if you are a [00:27:00] small guy and you just need two uplinks and maybe do a bit of policy, that's easy, and that's, uh, achievable, let's say, with some Google interface and throwing spaghetti at the wall and seeing what sticks.

On the other hand, what the large scale providers like, for example, Facebook, because we were talking about them, are doing is like light years away. It's like comparing me turning on the light bulb and someone running, you know, a nuclear reactor.

Corey Quinn: Yeah, you kind of want the experts running some aspects on that.

Honestly, in my case, you probably want someone more competent flipping the light switch too, but that's why I have IoT devices here that power my lights. It, on the one hand, keeps me from hurting myself, and on the other, leads to a nice seasonal feel because my house is freaking haunted.

Ivan Pepelnjak: So, coming back to Facebook.

They have these DNS servers all around the world, and they don't want everyone else to freak out when one of these DNS servers goes away. So that's why they're using the [00:28:00] same IP address for all the DNS servers sitting anywhere in the world. So the name server for facebook.com is the same worldwide. But it's different machines, and they will give you different answers when you ask, where is facebook.com? I will get a European answer, you will get a US answer, someone in Asia will get whatever. And so they're using BGP to advertise the DNS servers to the world so that everyone gets to the closest DNS server. And now it doesn't make sense, right, for the DNS server to say, hey, come to European Facebook, if European Facebook tends to be down.

So if their DNS server discovers that it cannot reach the servers in the data center, it stops advertising itself with BGP. Why with BGP? Because that's the only thing it can do. That's the only protocol where I can tell you, Hey, I know about this prefix. You really should send the traffic to me. And that's what happened to [00:29:00] Facebook.

They bricked their backbone, whatever they did, they never told. And so their DNS server said, Gee, I can't reach the data center. I better stop announcing that I'm a DNS server because obviously I am disconnected from the rest of Facebook. And that happens to all DNS servers because you know, the backbone was bricked.

And so they just, you know, de-peered from the internet. They stopped advertising themselves. And so we thought that there was no DNS server for Facebook because no DNS server was able to reach their core. And so all DNS servers were like, Gee, I better get off this because, you know, I have no clue what's going on.

So everything was working fine. Everything was there. It's just that they didn't want to talk to us because they couldn't reach the backend servers. And of course people blamed DNS first because the DNS servers weren't working. Of course they weren't. And then they blamed BGP because it [00:30:00] must be BGP if it isn't DNS.

But it's like, you know, you're blaming headache and muscle cramps and high fever. But in fact, you have flu.
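The mechanism described above (every DNS site announces the same address with BGP and withdraws it when it cannot reach the data center) can be modeled in a few lines. This is a simplified simulation, not Facebook's actual implementation; the `DnsNode` class, the site names, and the `ANYCAST_PREFIX` value are hypothetical.

```python
from dataclasses import dataclass

ANYCAST_PREFIX = "192.0.2.53/32"  # hypothetical shared address (documentation range)


@dataclass
class DnsNode:
    site: str
    backend_reachable: bool = True
    announcing: bool = True

    def health_check(self):
        # Announce the anycast prefix only while the data center is reachable;
        # otherwise withdraw it so clients are routed to another site.
        self.announcing = self.backend_reachable


def sites_announcing(nodes):
    """Sites still advertising ANYCAST_PREFIX to the outside world."""
    return [n.site for n in nodes if n.announcing]


nodes = [DnsNode("europe"), DnsNode("us"), DnsNode("asia")]

# Case 1: one site loses its backend. Only that site withdraws.
nodes[0].backend_reachable = False
for n in nodes:
    n.health_check()
one_site_down = sites_announcing(nodes)
print(one_site_down)  # ['us', 'asia']

# Case 2: the backbone fails, so every site loses its backend at once.
for n in nodes:
    n.backend_reachable = False
    n.health_check()
backbone_down = sites_announcing(nodes)
print(backbone_down)  # []
```

In the second case each node behaves exactly as designed, yet nobody announces the prefix any more, which from the outside looks like a DNS failure followed by a BGP failure.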

Corey Quinn: For almost any other company that wasn't Facebook, this would have been a less severe outage just because most companies are interdependent on other companies to run infrastructure. Whereas Facebook itself has evolved the way that it has, everything that they use internally runs on these same systems.

So they wound up almost with a bootstrapping problem. An example of this in more prosaic terms is, okay, the data center had a power outage. Okay, now I need to power up all the systems again, and the physical servers I'm trying to turn on need to talk to a DNS server to finish booting, but the DNS server is a VM that lives on those physical servers.

Uh oh, now I'm in trouble. That is an overly simplified and real example of what Facebook encountered trying to get back into this, to my understanding.
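The bootstrapping problem in this example is a cycle in the boot dependency graph, which a depth-first search can find before an outage does. The sketch below is a generic cycle check, with the `find_cycle` helper and the component names invented for illustration.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none.

    `deps` maps each component to the components it needs in order to start.
    """
    in_progress, done = set(), set()
    path = []

    def visit(node):
        in_progress.add(node)
        path.append(node)
        for dep in deps.get(node, []):
            if dep in in_progress:
                # Found a component that (indirectly) waits on itself.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in deps:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# The example from the conversation: servers need DNS to boot,
# but the DNS server is a VM running on those same servers.
circular = {
    "physical_server": ["dns_vm"],
    "dns_vm": ["physical_server"],
}
print(find_cycle(circular))  # ['physical_server', 'dns_vm', 'physical_server']

# Hypothetical fix: DNS runs on a standalone appliance with no dependencies.
fixed = {
    "physical_server": ["dns_appliance"],
    "dns_appliance": [],
}
print(find_cycle(fixed))  # None
```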

Ivan Pepelnjak: Yes, so it was worse than that. It looks like, you know, even out of [00:31:00] band management access didn't work. Which to me would suggest that out of band management was using authentication servers that were down.

People couldn't even log in to Zoom because Zoom was using single sign-on based on Facebook.com and Facebook.com was down, so they couldn't even make Zoom calls or open Google Docs or whatever. There were rumors that there was a certain hardware tool with a rotating blade that was used to get into a data center and unbrick a box, but those rumors were vehemently denied, so who knows?

Corey Quinn: The idea of having someone trying to physically break into a data center in order to power things back up is hilarious, but it does lead to an interesting question, which is in this world of cloud computing, there are a lot of people in the physical data centers themselves, but they don't have access in most cases to log into any of the boxes.

One of the most naive things I see all the time is, Oh, well, the cloud provider can read [00:32:00] all of your data. No, they can't. These things are audited, and yeah, theoretically, if they're lying outright and somehow have falsified all of the third party audit stuff that has been reported and are willing to completely destroy their business when it gets out, and I assure you it would, yeah, theoretically that's there.

There is an element of trust here. I've had to answer a couple of journalist questions recently of, ooh, is AWS going to start scanning all customer content? No, they physically cannot do it, because there are many ways you can do, you can configure things where they cannot see it. And that's exactly what we want.

Ivan Pepelnjak: Yeah, like a disk encryption?

Corey Quinn: Exactly! Disk encryption, KMS on some level, rolling your own, etc, etc. They use a lot of the same systems we do. The point being, though, is that people in the data centers do not even have login rights to any of these nodes for the physical machines in some cases, let alone the customer tenants on top of those things.

So, on some level, you wind up with the people building these systems that run on top of these computers and they've never set foot in one of the data [00:33:00] centers. That seems ridiculous to me as someone who came up visiting data centers because I had to know where things were when they were working so I could put them back that way when they broke later but that's not necessary anymore.

Ivan Pepelnjak: Yeah, and that's the problem that Facebook was facing with that outage, because you start believing that certain systems will always work, and when those systems break down you're totally cut off. And then, oh, there was an article in ACM Queue a long while ago, uh, where they were discussing, you know, the results of simulated failures, not real ones, and there were hilarious things like phone directory was offline because it wasn't on, uh, UPS, and so they didn't know whom to call.

Or alerts couldn't be diverted to a different data center because the management station for alert configuration was offline because it wasn't on UPS. Or, [00:34:00] you know the one, right, where in New York they placed the gas pump in the basement? And the diesel generators were on the top floor and the hurricane came in and they had to carry gas manually all the way up to the top floor because the gas pump in the basement just stopped working.

It was flooded. So they did everything right. Just the fuel wouldn't come to the diesel generators.

Corey Quinn: It's always the stuff that is under the hood on these things that you can't make sense of. One of the biggest things I did when I was evaluating data center sites was I'd get a one line diagram, which is an electrical layout of the entire facility.

Great. I talked to the folks running it. Now let's take a walk and tour it. Hmm. Okay. You show four transformers on your one line diagram. I see two transformers and two empty concrete pads. It's an aspirational one line diagram. It's a joke that makes it a one liner diagram, and it's not very funny. So it's, okay, if I can't trust you [00:35:00] for those little things, that's a problem.

Ivan Pepelnjak: Yeah, well, I have another funny story like that. We had two power feeds coming into the house plus the diesel generator, and it was, you know, the properly tested every month diesel generator. And then they were doing some maintenance and they told us in advance that they will cut both power feeds at 2 a.m. on a Sunday morning.

And guess what? The diesel generator didn't start. Half an hour later, UPS was empty. We were totally dead in the water with quadruple redundancy because you can't get someone at 2 a.m. on a Sunday morning to press that button on the diesel generator in half an hour.

Corey Quinn: That is unfortunate.

Ivan Pepelnjak: Yeah, but that's how the world works.

Corey Quinn: So, it's been fantastic reminding myself of some of the things I've forgotten, because let's be clear, in working with cloud, a lot of this stuff is completely abstracted away. I don't have to care about most of these things anymore. Now, [00:36:00] there's a small team of people at AWS who very much has to care, and if they don't, I will say mean things to them on Twitter if I let my HugOps position slip just a smidgen.

They do such a good job at this. We don't have problems like this almost ever, to the point where when it does happen, it's noteworthy. It's been fun talking to you about this just because it's a trip down a memory lane that is a lot more aligned with the things that are there. And we tend not to think about them.

It's almost a how it's made episode.

Ivan Pepelnjak: Yeah. And, uh, don't be so relaxed regarding the cloud networking, because, you know, if you don't go full serverless with nothing on premises, you know what protocol you're running between on premises and the cloud on Direct Connect? It's called BGP.

Corey Quinn: Ah, you know, I did not know that.

I've, I've done some ridiculous IPSec pairings over those things and was extremely unhappy for a while afterwards, but never got to the BGP piece of it. Makes sense.

Ivan Pepelnjak: Yeah. Even over IPSec, if you want [00:37:00] to have any dynamic, uh, failover or multiple sites or anything, it's BGP.

Corey Quinn: I really want to thank you for taking the time to go through all this with me.

People want to learn more about how you view these things, learn more things from you, as I strongly recommend they should, if they're even slightly interested by the conversation we've had, where can they find you?

Ivan Pepelnjak: Well, just go to ipspace.net and start exploring. There's the blog with thousands of blog entries, some of them snarkier than others, uh, then there are like 200 webinars, uh, short snippets of a few hours of

Corey Quinn: It's like a one man version of re:Invent, my god.

Ivan Pepelnjak: Yeah, sort of, but I've been working on this for 10 years and they do it every year, so I can't produce the content at their speed. And then there are three different full blown courses. Some of them are just, you know, the materials from the webinars, plus guest speakers, plus hands on exercises, plus I personally [00:38:00] review all the stuff people submit, and they cover data centers and automation and public clouds.

Corey Quinn: Fantastic. And we will, of course, put links to that into the show notes. Thank you so much for being so generous with your time. I appreciate it.

Ivan Pepelnjak: Oh, it's been such a huge pleasure. It's always great talking with you. Thank you.

Corey Quinn: It really is. Thank you once again, Ivan Pepelnjak, Network Architect, and oh so much more, CCIE number 1354 Emeritus, and read the bio. It's well worth it.

I am Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five star review on your podcast platform of choice and a comment formatted as a RIPv2 announcement.
