Networking in the Cloud Fundamentals: BGP Revisited with Ivan Pepelnjak

Episode Summary

Join me as I conclude my series on cloud fundamentals by reexamining Border Gateway Protocol (BGP) with Ivan Pepelnjak, Chief Technology Advisor at NIL Data Communications. This episode features a discussion about what Ivan believes Corey got wrong about BGP in a previous episode of this podcast; Ivan’s telling of the history of BGP and how it has evolved over time; why Ivan thinks that, when something goes wrong, it’s not fair to blame the tool itself, and that the misuse of the tool is what deserves the blame; why regulators may have to think about driver’s licenses for the internet; the year modern BGP emerged; and more.

Episode Show Notes & Transcript

About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Corey: Hello and welcome to our Networking In The Cloud mini series, sponsored by ThousandEyes. That's right. There may be just one of you, but there are a thousand eyes. On a more serious note, ThousandEyes has sponsored their cloud performance benchmarking report for 2019, at the end of last year, talking about what it looks like when you race various cloud providers. They looked at all the big cloud providers and determined what does performance look like from an end user perspective? What does the user experience look like among and between different cloud providers? To get your copy of this report, you can visit Why real clouds? Well, because they raced AWS, Azure, GCP, IBM Cloud, and Alibaba, all of which are real clouds. They did not include Oracle cloud because, once again, they are real clouds. Check out your copy of the report at

Welcome to week 12 of the Networking In The Cloud mini series of the AWS Morning Brief, sponsored by ThousandEyes. So one of the early episodes of the Networking In The Cloud mini series had me opining in relatively uninformed broad brush strokes about the nature of BGP. Today I am joined by Ivan Pepelnjak, who is a former CCIE who wrote a fascinating blog post that I will link to in the show notes, saying, "This is great, but this is what happens when someone who's good at one thing steps completely out of their comfort zone into things they don't fully understand and starts opining confidently, if not authoritatively." Ivan, thank you for taking the time to speak with me.

Ivan: Thanks for having me on. And no, I was way more polite than your summary.

Corey: Absolutely. I believe that there's a way to tell a story of the hero's journey that everyone talks about when they're building a narrative arc. Instead, I go for the moron's journey and I always like to be the moron because, generally, I tend to be, and as I walk through the world and get things sometimes right, occasionally wrong, I love being corrected when I stumble blindly into an area I don't know. First because it gives me an opportunity to learn something new, which is great, but it also gives me that opportunity to be the dumbest person in the room again, which is awesome. So...

Ivan: That's exactly why I blog to get your opinions.

Corey: Exactly. You have data, I have opinions and mine are louder seems to be the way that discourse works in the modern era. So from a high level, what did I get wrong about BGP?

Ivan: Well, you got everything right about the mess that we are in and the fragility of the generic internet infrastructure. The only thing you got wrong was that you blamed the tool, but not people using the tool.

Corey: It always feels like it's safer, on some level, to blame technology because if the takeaway is, "Well, the user experience around tool X isn't great, and that adds a contributing factor to why things break." That seems to be a message that carries slightly better than, "And thus the answer is for everyone to be smarter and stop screwing up." And that may very well be the answer. It's just a bitter pill to swallow sometimes. So I find blaming a tool is easy.

Ivan: Yeah, but it's like blaming the knives for people to get cut or blaming the chainsaw for people to cut off their arm because they were not properly trained.

Corey: One of my assertions was that BGP is more or less a hot mess because it was designed for an era when people on the internet fundamentally could trust one another and that doesn't seem to be the case today. The analogy in my mind, that I don't think I mentioned, was SMTP, the email protocol, for lack of a better term. When that was built, the internet was more or less comprised of researchers and who in the world would ever abuse a protocol like email? It's not like there was any money involved in the internet. Fast forward to today and your spam folder is inherently a garbage fire.

Ivan: Yeah, but BGP has a slightly different history. It was redesigned a few times. There were several attempts to get the global routing protocol right. And BGP, the last attempt, already included the tools that allow entities that don't trust each other, like commercial internet service providers, to exchange information and apply policies on inbound and outbound updates. So for example, I don't want to hear about your customers because I hate you and I don't want to peer with you or I don't want to tell you about my customer because that customer has a special deal and their traffic can only go through some other transit providers so I will not tell you about that customer. Those things were already a major requirement when BGP was designed and it always included the tools to implement the policies that individual commercial entities wanted to have, which by the way, never happened to SMTP. We have BGP version 4 now and we are still on SMTP version plus enhancements.
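
Editor's note: the inbound and outbound policies Ivan describes can be sketched in a few lines of Python. This is only an illustration of the idea, not router configuration and not how any BGP implementation is written. The neighbor names, the customer label, and the two policy tables are all hypothetical, and the prefixes are documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # e.g. "192.0.2.0/24" (documentation range)
    customer: str  # hypothetical label for whose route this is


# Hypothetical policy tables, invented for this illustration.
# Inbound: neighbors whose updates we refuse to hear at all.
REFUSED_NEIGHBORS = {"isp-b"}
# Outbound: per customer, the neighbors we will not tell about that customer.
EXPORT_DENY = {"special-deal-customer": {"isp-c"}}


def accept_inbound(route: Route, neighbor: str) -> bool:
    """Inbound policy: drop every update that comes from a refused neighbor."""
    return neighbor not in REFUSED_NEIGHBORS


def advertise_outbound(route: Route, neighbor: str) -> bool:
    """Outbound policy: withhold a customer's routes from selected neighbors."""
    return neighbor not in EXPORT_DENY.get(route.customer, set())


def export_to(neighbor: str, routes: list[Route]) -> list[Route]:
    """Return only the routes our outbound policy lets this neighbor see."""
    return [r for r in routes if advertise_outbound(r, neighbor)]


if __name__ == "__main__":
    table = [
        Route("192.0.2.0/24", "ordinary-customer"),
        Route("198.51.100.0/24", "special-deal-customer"),
    ]
    print([r.prefix for r in export_to("isp-c", table)])
    print(accept_inbound(table[0], "isp-b"))
```

The point of the sketch is that the decision is made per neighbor and per direction, which is what lets two providers that do not trust each other still exchange routes.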

Corey: I guess the best analogy I can come up with through my exposure with BGP, because I tend to handle internetworking between various groups about as well as I write code, things that I have some vague awareness that there are things you should be doing here that I will almost certainly not get right, so I back away slowly and leave it to professionals. As a result, every time I really see how BGP works in any hands-on sense or a point where it's forced upon my awareness, it's similar to how I become aware of plumbing. I don't think about it. I don't question it. I just expect when I turn the faucet on or flush the toilet that water will do what it's going to do. I don't expect the toilet to explode. So the only time I think about BGP is when there is a peering dispute or when there's a flap or, on one notable occasion, when I was at a security conference and, as a demo, some folks hijacked the entire ASN for the conference and rerouted it halfway around the world and back, which explained why everything was super latent and crappy.

Ivan: Yeah. You're absolutely right, but all the incidents you mentioned are not the fault of the tool. They are the fault of the tool not being properly used. And also, let's be honest, it took them hundreds of years to get the plumbing to the point when you can just turn on the faucet and the clean and drinkable water comes out of it. It's not like that would have happened in the last year or two, and very probably it wouldn't have happened without public pressure to bring us drinkable water and interest in paying for the drinkable water and some wide regulation to ensure that, if the water company says the water is drinkable, it actually is drinkable, and we have none of those in BGP or in generic internet, global internet infrastructure, I should say. Now you see where you got me, I started blaming the tool.

Corey: See isn't it addictive? Because it's easy to blame tools. When you start blaming individuals or people, it suddenly feels like, "Oh dear, now I'm calling people out. Sometimes intentionally, sometimes not." And then "Oh, did they ever come out of the woodwork?"

Ivan: Yeah. If we go back to, for example, the one example you mentioned where someone was able to hijack the whole conference autonomous system, that is because no one is looking at the updates that are being sent through the internet. For various reasons. A, because the service providers are not motivated to filter the announcements that their customers are sending. And B, in the internet core, you might be in the place where everything is so complex that you just don't want to touch anything. So all you have to do to hijack whoever you wish is you find the sloppiest possible tier one provider, find the sloppiest possible tier two provider that is connected to the tier one provider, because now you know that neither one of them will filter what you're sending out, and then you just start hijacking AWS DNS servers, for example.

Corey: What is the answer to something like this, other than yelling at those sloppy providers to clean up their act?

Ivan: Well, there are two answers. One is customer pressure, but as long as the customers will go and buy the cheapest bandwidth possible without considering the quality of the service that the service provider is offering, we're not getting anywhere there. The other thing is regulation. There is a reason we have driving licenses and there is a reason that truck drivers have to pass a different exam than you and me because they could do more damage. We don't have anything like that on the global internet. It's totally unregulated, apart from, let's say, mutually agreed understanding that there are five organizations worldwide who handle the address space and autonomous system numbers. Even there I am getting messages from a few mailing lists where every week someone is yet again describing how the crooks managed to hijack unused address space belonging to whatever legacy entity just because one of those five organizations that were supposed to do the right thing and take care of proper allocation of address space just didn't check the very basics of whether the request they got was legit or not.

Corey: The challenge of authenticating that something comes from who it claims to be generally feels like an authentication piece, is possibly an encryption story as well, but to my understanding that was only added to BGP after the internet was already a going concern. Is my history mistaken on that?

Ivan: No, you're absolutely right. You have two ways of solving this problem. One is with technology and the other one is with good processes because what's stopping us from having a global database of who owns what and having someone being responsible for that database? And then we can all use the information from that database to build filters. So for example, if you have AS number one and that database says that you only own one prefix, why would I ever accept more than one prefix from you, my customer? And why would I ever accept prefix from an address space that doesn't belong to you?

But of course that requires that A, everyone registers in that database and no one has ever made that mandatory, and B, that I, as the service provider, actually care about security. Honestly, for a sloppy service provider, it's cheaper not to care about security because caring about security causes support costs, it causes education of clueless customers and all that is costing money. It's way simpler to just accept everything, propagate everything, pollute the global internet with toxic waste, claim that it's not your fault but your customer's fault, and then everyone comes to the conclusion that BGP is a hot mess.
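
Editor's note: the registry-driven filter Ivan describes ("if that database says that you only own one prefix, why would I ever accept more than one prefix from you?") might look roughly like the sketch below. The registry here is a hypothetical in-memory dictionary standing in for the global database he has in mind. The AS numbers and prefixes come from ranges reserved for documentation, and the exact-match rule is a simplifying assumption made for this example.

```python
import ipaddress

# Hypothetical stand-in for the "global database of who owns what".
# Keys are AS numbers, values are the prefixes registered to that AS.
REGISTRY = {
    64500: ["192.0.2.0/24"],
    64501: ["198.51.100.0/24", "203.0.113.0/24"],
}


def is_registered(asn: int, prefix: str, registry=REGISTRY) -> bool:
    """True only if this exact prefix is registered to this AS.

    Exact matching is an assumption of this sketch; it mirrors the
    "you only own one prefix" example from the conversation.
    """
    announced = ipaddress.ip_network(prefix)
    owned = {ipaddress.ip_network(p) for p in registry.get(asn, [])}
    return announced in owned


def filter_announcements(asn: int, prefixes: list[str], registry=REGISTRY):
    """Split a customer's announcements into (accepted, rejected) lists."""
    accepted = [p for p in prefixes if is_registered(asn, p, registry)]
    rejected = [p for p in prefixes if not is_registered(asn, p, registry)]
    return accepted, rejected


if __name__ == "__main__":
    # AS 64500 announces its own prefix plus one registered to someone else.
    ok, dropped = filter_announcements(64500, ["192.0.2.0/24", "198.51.100.0/24"])
    print("accepted:", ok)
    print("rejected:", dropped)
```

As Ivan points out, the hard part is not the filter. It only works if everyone registers in the database and the provider bothers to apply it.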

Corey: When did modern BGP, as it stands today, emerge in its current form?

Ivan: It's my vague memory that it must have been in the early 1990s.

Corey: Which is later than one would expect given that the internet predates that by a significant margin.

Ivan: Yeah. They had one routing protocol when ARPANET was still the core of the internet because then it was easy, everyone was sending information to ARPANET and only ARPANET needed to know where everyone was. And then they figured out that no, this will not work and they invented a routing protocol, I think it was called EGP, and that thing worked for a little bit longer and then they figured out that no, this is not going to work. And then there was the famous coffee shop, or whatever chat between two engineers, resulting in the famous three napkins that were the original specification of BGP. That got implemented and that was version one. Then we had version two and version three, and I started playing with being an ISP when they were just migrating from version three to version four, which is what we have today, and that must have been in the early 1990s. But I think that Russ White did a podcast on the history of BGP once and, if I ever find it, I'll send you the link to include in the show notes.

Corey: I would like to thank once again ThousandEyes for making this entire ridiculous rant possible. In addition to their cloud performance benchmark report, ThousandEyes winds up giving companies insight into what's going on on the broader internet: routing issues, provider failures upstream, different companies having different problems. It more or less is a real time traffic meets weather map for the internet. This helps companies who use them wind up getting a better perspective of what the current end user experience is and begin routing around provider failures, yelling at providers, et cetera, ideally before those errors become evident to customers. To learn more, visit and tell them Corey sent you. In fact, they may very well say something like, "Wow, you heard about us from Corey and you still looked us up, what a testament to how awesome our product is." My thanks again to ThousandEyes for putting up with my ridiculous nonsense to sponsor this ridiculous podcast.

Tell me a little bit about who you are and why you're well positioned to opine on these ethereal topics that those of us working in small, scrappy TwitterForPets-style startups don't generally have to think about these level of networking deep dive complexities. Who is Ivan Pepelnjak?

Ivan: Well, I started with networking in the mid 1980s, and in those days there was no internet where I was, and then internet came to central Europe and I was, in that time, in Yugoslavia, so almost behind the iron curtain. In the early 1990s, at which time I was already building local area networks, and then set up the first commercial ISP in my country. At that time approximately, we became a Cisco partner and they pushed me into becoming one of the instructors, which I think I got the instructor number 12 worldwide or something along those lines, this was one of the first batches of Cisco certified instructors, and then I started developing courses for Cisco and, actually, BGP was the first course I developed for them. Years later that turned into some official Cisco training and I have no idea whether they are still doing that or not or how that course would be called today.

Then the big internet bubble happened and we started offering professional services throughout Europe, designing and building large internet networks for the traditional service providers, and then that bubble burst and I was smart enough to retire at approximately that time or, as someone said, "Took a long coffee break." Got bored, started blogging, and then figured out that there is this tiny little niche for someone to explain to networking engineers how the networking vendors are trying to oversell whatever they are doing. Whereas in reality, it's usually just a recast of old stuff with new clothes and some shiny glamor on top of that so you don't figure out what's going on. And that's what I've been doing for the last almost 15 years.

Corey: You have a similar aspect of your business as I do for mine. Namely, you are independent, you are not backed by any particular vendor and, as a result, you're not sitting here with an agenda of, "Oh, you should do whatever you want, but as long as you're buying Cisco gear to do it," for example. You're a trusted voice in your space.

Ivan: Well, I would hope I am, but yeah, I don't have any vendor behind me. Actually, one of the vendor reps once told me, "We don't care what you say about us as long as you're equally snarky towards everyone."

Corey: Exactly. You can be a jerk as much as you want, just make sure you're a jerk to everyone. That's part of it, for me at least. The other part, given my unique styling, has always been punch up, never down. The reason I own is because making fun of an actual startup where people are doing blood, sweat, and tears trying to get something off the ground just makes me a jerk.

Ivan: Yeah, likewise. I would never go after a small company, I would try to help them, but the major networking vendors are fair game.

Corey: Absolutely, and this also helps bring this mini series to a close by answering a question I didn't know how I was going to answer until we got here at the end, which is: if people want to learn more about networking in the cloud, now that I've more or less exhausted my knowledge on this, where can they go next? Until today the answer was "Idunno," but now I can say, "They talk to you." You can take them down the path of what modern networking in a cloud era looks like. You could be found at, as well as wherever fine networking snark is sold.

Ivan: Yes, exactly. Thank you.

Corey: Of course. You do webinars, you do a podcast of your own, you've written several books, and your blog is, I'm going to say, obnoxiously prolific.

Ivan: Yeah. I try to publish something on my blog every day, and sometimes it's just a pointer to some other stuff I'm doing. Sometimes it's a technical deep dive into a particular topic. I try to publish one rant per week to keep people amused and other people extremely angry. And yes, I do webinars on particular networking technologies, often with guest speakers, so right now we have more guests than my own webinars. Some of this would be on networking in public clouds, others would be on network automation. So yeah, if you want to know how networking really works in public clouds, either you go for the public cloud official training, AWS has something, Azure has something, or you can try and look at my stuff and see what my opinion is on what they're telling you.

Corey: Which I strongly endorse and recommend. Thank you so much for taking the time to correct some of my misunderstandings around what is, admittedly, a highly complex topic.

Ivan: You're most welcome. Thanks for having me.

Corey: Of course. Ivan Pepelnjak,, independent blogger, trusted voice, and gentle corrector when folks get it wrong. This has been the 12 week Networking In The Cloud mini series. Thank you one last time to ThousandEyes for their generous sponsorship of this ridiculous podcast mini series. Thank you again, Ivan, for correcting me when I get it wrong in a variety of fascinating but incredibly confident sounding ways. I am cloud economist Corey Quinn, if you've enjoyed this podcast, please leave a five star review on Apple podcasts. If you've hated this podcast, please leave a five star review on Apple podcasts, and tell me exactly what my problem is in the comments.

Announcer: This has been a HumblePod Production. Stay humble.
