Join Pete and Jesse as they talk about the Herculean effort that is lifting and shifting a system to AWS; why your work is just getting started after you do a lift-and-shift; how technical debt piles up when you don’t modernize applications to take advantage of cloud-native tools, frustrating your team; how many people forget about the new costs they’ll need to pay after moving to AWS (e.g., data transfer and storage); how deciding to use Oracle was probably a good choice at the time but why most businesses are migrating away from it; how you can think about moving data off of Oracle the same way you might think about moving from a monolith to a microservices architecture; how to get rid of your Oracle addiction; managing costs in AWS Batch; and more.
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Pete: Hello, and welcome to the AWS Morning Brief: Fridays From the Field. I am Pete Cheslock.
Jesse: I’m Jesse DeRose.
Jesse: This is fantastic. I’m really excited that we have one fan. I’ve always wanted one fan.
Pete: Well, two fans now. Maybe even more because we keep getting questions. And you can also become one of our Friends of the Pod by going to lastweekinaws.com/QA. You can give us some feedback, you can send us a question, and, like, we’ll totally answer it because we like Friends of the Pod.
Jesse: We may or may not enter you into a raffle to get a Members Only jacket that’s branded with ‘Friends with the Pod.’
Pete: We should get some pins made, maybe.
Pete: I think that's a good idea.
Pete: So, what are we answering today, or attempting to answer for our listener, Jesse?
Jesse: So today, we’ve got a really great question from Godwin. Thank you, Godwin. Godwin writes, “I truly believe that the system that I support is, like, a data hoarder. We do a lot of data ingestion, we recently did a lift-and-shift of the system to AWS, we use an Oracle database. The question is, how do I segregate the data and start thinking about moving it out of traditional relational databases and into other types of databases? Presently, our method is all types of data goes into a quote-unquote, ‘all-purpose database,’ and the database is growing quite fast. Where should I get started?”
Pete: Well, I just want to commend you for a lift-and-shift into Amazon. That’s a Herculean feat, no matter what you’re lifting and shifting over. Hopefully, you have maybe started to decommission those original data centers and you don’t just have more data in twice as many locations.
Jesse: [laugh]. But I also want to call out well done for thinking about not just the lift-and-shift, but the next step. I feel like that’s the thing that a lot of people forget about. They think about the lift-and-shift, and then they go, “Awesome. We’re hybrid. We’re in AWS, now. We’re in our data center. We’re good. Case closed.” And they forget that there’s a lot more work to do to modernize all those workloads in AWS, once you’ve lifted and shifted. And this is part of that conversation.
Pete: Yeah, that’s a really good point because I know we’ve talked about this in the past: the lift-and-shift shot clock. If you don’t start modernizing those applications to take advantage of things that are more cloud-native, the technical debt is really going to start piling up, the folks who have to manage it are going to get more burnt out, and it really is going to end poorly. So, the fact that you’re starting to think about this now is a great thing. Also, what is available to you now that you’re on AWS is huge compared to a traditional data center.
Pete: And that’s not just talking about the—I don’t even know if I’ve ever counted how many different databases exist on Amazon. I mean, they have a database for, at this point, every type of data. I mean, is there a type of data that they’re going to create, just so that they can create a database to put it into?
Jesse: Wouldn’t surprise me at this point.
Pete: They’ll find a way [laugh] to come up with that charge on your bill. But when it comes to Oracle, specifically Oracle databases, there’s obviously a big problem: not only the cost of the engine, running the database on RDS or something to that effect, but also the licensing costs that are added on top. Maybe you bring your own license, or maybe you’re just using the off-the-shelf, kind of, ‘retail on-demand pricing’ RDS (I’m using air quotes for all these things, but you can’t see that), which just has the licensing costs baked in. So, you’re paying for it, kind of, either way.
Jesse: And I think this is also something to think about that we’ll dive into in a minute: one of the things that a lot of people forget when they move into AWS is that you’re not just paying for data sitting on a depreciating piece of hardware in a data center anymore. You’re paying for storage, you’re paying for I/O, you’re paying for data transfer, and, to Pete’s point, you’re potentially paying for the license as well. So, there are lots of different costs associated with keeping an Oracle Database running in AWS. That’s actually probably the best place to start thinking about this next step and where to get started: think about the usage patterns of your data.
And this may be something where you need to involve engineering, and maybe product, if they’re part of the conversations about how your product or feature sets store data. Figure out what the usage patterns of your data actually are.
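To make Jesse’s point concrete, here is a minimal Python sketch of a monthly cost breakdown for a database running in AWS. The `Usage` and `Rates` classes and the `monthly_cost_breakdown` function are hypothetical names, and every rate is a placeholder you would fill in from the AWS pricing pages for your region, engine, and license model; whether I/O is billed separately also depends on the storage configuration you choose.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Usage:
    """One month of usage for a single database (hypothetical structure)."""
    storage_gb: float            # provisioned storage
    io_requests_millions: float  # billed I/O, if your storage type charges for it
    transfer_out_gb: float       # data leaving the region or going to the internet
    instance_hours: float        # hours the database instance ran


@dataclass
class Rates:
    """Placeholder unit prices; these are NOT real AWS prices."""
    storage_per_gb_month: float
    io_per_million: float
    transfer_per_gb: float
    instance_per_hour: float     # license-included or BYOL rate, whichever applies


def monthly_cost_breakdown(usage: Usage, rates: Rates) -> dict[str, float]:
    """Split one month's bill into the line items that are easy to forget."""
    items = {
        "storage": usage.storage_gb * rates.storage_per_gb_month,
        "io": usage.io_requests_millions * rates.io_per_million,
        "transfer": usage.transfer_out_gb * rates.transfer_per_gb,
        "instance_and_license": usage.instance_hours * rates.instance_per_hour,
    }
    items["total"] = sum(items.values())  # computed before "total" is inserted
    return items


# Example with made-up numbers, chosen only to show the shape of the output.
breakdown = monthly_cost_breakdown(
    Usage(storage_gb=100, io_requests_millions=10, transfer_out_gb=20, instance_hours=720),
    Rates(storage_per_gb_month=0.5, io_per_million=0.25, transfer_per_gb=0.125, instance_per_hour=2.0),
)
```

Keeping each line item separate makes it easier to see which usage pattern is actually driving the bill.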
Pete: Yeah, exactly. Now, you may say to yourself, “Well, we’re on Oracle,” and I’m sure people listening are like, “Well, that’s your problem. You should just move off of Oracle.” But you can’t go back in time and undo that decision, and the reality is, it probably was a good decision at the time. There are a lot of businesses, including Amazon, that ran all of their systems on Oracle and then migrated off of it.
Understanding the usage patterns, what type of data is going into Oracle, I think is a big one. Because if you can understand the access patterns of the types of data that are going in, that can help you start peeling off where that data should go. Now, let’s say you’re just pushing in all newly created data. We don’t even know what your data is, so we’re going to make some wild assumptions here about what you could possibly do (but more so just give you homework, really): think about the type of data going in, right?
If you’re just saying, “I’m pushing all of my data into this database because someday we might need to query it,” that’s actually a situation where you really want to start thinking about a more data warehouse-style approach: you have a large amount of data being created, you don’t know if you’re going to need to query it in the future, but you might want to glean some value out of it. S3, which is now available to you outside of your data center world, is going to be super valuable: you can very cheaply shove data into S3 and go back to it later. Then you can use things like Athena to query that data ad hoc, or leverage a lot of the ingestion services that exist to suck that data into other databases. But thinking about what’s being created and where it’s going is a big first step to start understanding, well, how quickly does this data need to come back?
Can the query be measured in many seconds? Can it be done ad hoc, like in Athena? Does it need to be measured in milliseconds? What’s the replication that needs to happen? Is this very valuable data that we need to have multiple backups on?
Is it queried more than it’s created? Maybe you need multiple read replicas. So, it’s all about really understanding what’s there to begin with, and that’s probably going to involve talking to a lot of engineering teams.
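The questions Pete lists can be turned into a rough first-pass triage. This Python sketch is illustrative only: `TableProfile`, `suggest_target`, `triage`, and the thresholds (a one-second latency budget, a 10:1 read-to-write ratio) are hypothetical, and the real answer should come out of those conversations with engineering teams.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableProfile:
    """What you learn about one table or dataset from talking to its owners (hypothetical)."""
    name: str
    reads_per_day: int
    writes_per_day: int
    latency_budget_ms: float  # how quickly callers need an answer
    needs_relational: bool    # are joins, transactions, or constraints actually used?


def suggest_target(p: TableProfile) -> str:
    """Return a starting point for discussion, not a final answer."""
    if not p.needs_relational and p.latency_budget_ms >= 1000:
        # Seconds are acceptable and no relational features: cheap storage, ad hoc SQL.
        return "S3 + Athena"
    if p.needs_relational and p.reads_per_day > 10 * max(p.writes_per_day, 1):
        # Queried far more than it is written: spread reads across replicas.
        return "Aurora with read replicas"
    if p.needs_relational:
        return "Aurora"
    # Low latency but no relational needs: pick a store that matches the access pattern.
    return "purpose-built store (evaluate the access pattern)"


def triage(profiles: list[TableProfile]) -> dict[str, list[str]]:
    """Group table names by suggested target."""
    plan: dict[str, list[str]] = {}
    for p in profiles:
        plan.setdefault(suggest_target(p), []).append(p.name)
    return plan
```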
Jesse: Yeah, you can think about this project in the same way that you might move from a monolith to a microservice architecture. So, if you’re moving from a monolith to a microservice architecture, you might start peeling away pieces of the monolith, one at a time. Pieces that can easily be turned into microservices that stand on their own within the cloud, even if they’re running on the same underlying infrastructure as the monolith itself within AWS. And then, as you can pull those pieces away, then start thinking about does this need to be in a relational database? Does this need to have the same amount of uptime and availability as the resources that are sitting in my Oracle Database right now?
All those things that Pete just mentioned: start thinking about all of those components to figure out where best to pull off the individual pieces of data, and ultimately put them in different places within AWS. And to be clear, there are lots of great guides on the internet that talk about moving from your Oracle database into, gosh, just about any database of choice. AWS even has specific instructions for this, and we’ll throw a link in the show notes.
They really, really want you to move this data to Amazon Aurora. They go into painstaking detail about using the AWS Schema Conversion Tool to convert your schema, using the AWS Database Migration Service to migrate the data over, and then performing post-migration activities such as running SQL queries to validate object types, object counts, things like that. I think that a lot of folks actually don’t know that the Database Migration Service exists, and it’s worth calling out as a really powerful tool.
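A post-migration check like the one Jesse describes can start as simply as comparing row counts per table between source and target. In this sketch Python’s built-in `sqlite3` stands in for both Oracle and Aurora so that it runs anywhere; against the real databases you would open connections with each engine’s own DB-API driver, and identifier quoting may need adjusting per engine. The function names are hypothetical.

```python
import sqlite3


def row_counts(conn, tables):
    """Count rows per table. Table names must come from a trusted inventory, never user input."""
    counts = {}
    for table in tables:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[table] = count
    return counts


def count_mismatches(source, target):
    """Return {table: (source_count, target_count)} for every table whose counts differ."""
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        src_count, dst_count = source.get(table), target.get(table)
        if src_count != dst_count:
            mismatches[table] = (src_count, dst_count)
    return mismatches


# Demo: two in-memory databases where one customer row did not make it across.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn, customers in ((src, ["a", "b"]), (dst, ["a"])):
    conn.execute("CREATE TABLE customers (name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(c,) for c in customers])
    conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

tables = ["customers", "orders"]
mismatches = count_mismatches(row_counts(src, tables), row_counts(dst, tables))
```

Here `mismatches` flags only `customers`, with two rows in the source and one in the target.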
Pete: Yeah, AWS DMS is honestly, I think, a super-underrated service that people just don’t know about. It can replicate data from on-premises databases into Amazon databases, but also between databases already running on Amazon. You could replicate from a database running on EC2 into Aurora. You could replicate data into S3 that way, keeping things in sync, and then maybe use it for other purposes. It can replicate data from DocumentDB into other targets.
So, they’re clearly making a big investment there. And to Jesse’s point, yeah, Amazon really wants this data. So, talk to your account manager as you’re testing out some of these services. Maybe do a small proof of concept to see how well it works: see if you can understand the queries, or point your application at an Aurora database with some of this data migrated in. That’s a great way to understand how well this could work for your organization. But as Jesse mentioned, they do want that data in Aurora.
So, if it turns out that you migrate some data in there, it’s starting to work, and you’re getting a feel for the engineering effort to migrate, stop. Talk to your account manager before you spend any more money on Aurora, because it’s very likely that they can put together a program, if one doesn’t already exist, to incentivize you to move that data over; they can give you subject matter expertise; they can provide you credits to help you migrate that data over. Don’t feel like you have to do this on your own. You have an account team; you should definitely reach out to them, and they will provide you a lot of help to get that data in there. They’ve done it for many of their other clients, and they’re happy to do it for you because they know that, long term, when you move that data to Aurora, it’s going to be very sticky in Aurora.
You’re probably not going to move off of there. It’s a long game for them; that’s how they play it. So, check out those services; that could be a really great way to help you get rid of your Oracle addiction.
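Peeling tables off one at a time pairs well with DMS table mappings, which select the tables a replication task includes. This Python sketch builds a selection-rule mapping for a hypothetical list of tables and shows how it could be passed to boto3’s `create_replication_task`; the endpoint and instance ARNs, the task identifier, and the helper names are placeholders, and the boto3 call only runs if you invoke it.

```python
from __future__ import annotations

import json


def build_table_mappings(schema: str, tables: list[str]) -> str:
    """Build DMS table-mapping JSON with one 'include' selection rule per table."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),
            "rule-name": f"include-{table.lower()}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules})


def create_peel_off_task(task_id, source_arn, target_arn, instance_arn, schema, tables):
    """Create (not start) a full-load-plus-CDC task for just these tables; needs AWS credentials."""
    import boto3  # imported lazily so build_table_mappings works without boto3 installed

    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",  # copy existing rows, then keep replicating changes
        TableMappings=build_table_mappings(schema, tables),
    )
```

Oracle typically stores unquoted identifiers in uppercase, so schema and table names in the mapping are usually uppercase as well.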
Jesse: Yeah, and if you’re able to, as we talked about earlier, if you’re able to identify workloads that don’t need to run in a relational database, or don’t need to run in, maybe, a database at all, for that matter, stick that data in S3. Call it a day. Put them on lifecycle management policies or different storage tiers, and use Athena for ad hoc queries, or maybe Redshift if you’re doing more data warehouse-style tasks. But if that data doesn’t need to live in a relational database, there are many cheaper options for that data.
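For data that can live in S3, two small pieces go a long way: a date-partitioned key layout that Athena can prune by, and a lifecycle rule that moves older objects to cheaper storage classes. In this Python sketch the `raw/` prefix, the `dt=` partition name, the function names, and the 30/90-day thresholds are hypothetical choices; the returned dictionary follows the shape boto3’s `put_bucket_lifecycle_configuration` expects, and Athena still needs the partitions registered (for example with partition projection or `MSCK REPAIR TABLE`).

```python
from __future__ import annotations

from datetime import datetime, timezone


def archive_key(dataset: str, ts: datetime, record_id: str) -> str:
    """Hive-style date partition (dt=YYYY-MM-DD) so Athena can skip irrelevant days."""
    return f"raw/{dataset}/dt={ts.astimezone(timezone.utc):%Y-%m-%d}/{record_id}.json"


def lifecycle_configuration(prefix: str, ia_after_days: int = 30, glacier_after_days: int = 90,
                            expire_after_days: int | None = None) -> dict:
    """One lifecycle rule that tiers objects under `prefix` down to cheaper storage."""
    if ia_after_days < 30:
        raise ValueError("S3 lifecycle transitions to STANDARD_IA require objects at least 30 days old")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition should come after the Standard-IA transition")
    rule = {
        "ID": "tier-" + prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Applying it (requires AWS credentials; the bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-archive-bucket", LifecycleConfiguration=lifecycle_configuration("raw/"))
```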
Pete: Exactly. But one last point I will make: don’t shove it into MongoDB just because you want to be schema-less. Think about what you’re going to use it for and what the data access patterns are, because there is a right place for your data. Don’t just jump into NoSQL just ’cause, because you’ll probably end up with a bigger problem in the long run.
Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com.
Pete: So Jesse, I’m looking at our list of questions. And it turns out, we have another question.
Pete: Two questions came in.
Jesse: You like me, you really like me!
Pete: It’s so great. Again, you can also send us a question at lastweekinaws.com/QA. You can go there, drop in a question and feel free to put your name. Or not; you can be anonymous, it’s totally fine. We’ll happily answer your question either way. So Jesse, who is our next question from? What is this one about?
Jesse: This one’s from Joseph. They write in, “Hey, folks. Love the show. Longtime listener, first-time caller.” Thank you. “I would love to know how people manage their costs in AWS Batch. Jobs themselves can’t be tagged for cost allocation, which makes things a bit complicated.” Lord Almighty, yes, it does. “How best should I see if the jobs are right-sized? Are they over-provisioned in terms of memory or compute? What’s the best way to see if EC2 is my better choice, versus Fargate, versus other options? How can I tell if the batch-managed cluster itself is under-utilized?”
Pete: Oof. This is a loaded question with a lot of variables.
Jesse: Yeah. And so we’re going to break it down because there are definitely a couple of questions here. But I want to start off with what AWS Batch is, really quick, to make sure everybody’s on the same page. AWS Batch is a managed service in AWS that schedules and runs your batch computing jobs on top of AWS compute resources. It does a lot of the heavy-lifting configuration for you, so you can just focus on analyzing the results of those jobs.
Pete: Yeah, exactly. And Batch supports a really wide variety of tooling, which is why it’s hard for us to say specifically how you might optimize this. But I think a lot of the optimizations mirror the ones we’ve done for EMR clusters and things of that nature, where you’re running these distributed jobs. If you’re running straight off of EC2 instances, you want to make sure that they are essentially maxed out. If the CPU is anything less than 100% on an on-demand instance, then there’s waste, or at least an opportunity for improvement. So making sure that your jobs are sized appropriately, balancing memory and CPU so that you’re effectively using all of the memory and all of the CPU, is a real basic first step.
But honestly, a lot of folks kind of miss out on that. They just kind of run a job and go off and do their own thing. They never really go back and look at those graphs. You can go to CloudWatch, they’re all going to be there for you.
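Once those CloudWatch graphs are in front of you, the arithmetic for right-sizing is simple: take a high percentile of observed utilization and scale the request so that usage lands near a target. This Python sketch assumes you have already exported per-job CPU and memory utilization as percentages of what the job requested (for example from CloudWatch Container Insights); the function names, the p95 choice, and the 80% target are hypothetical.

```python
import math


def p95(samples):
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one utilization sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def suggest_request(requested, usage_pct_samples, target_pct=80.0, minimum=1):
    """Resize a vCPU or memory request so observed p95 usage sits near target_pct of it."""
    suggested = math.ceil(requested * p95(usage_pct_samples) / target_pct)
    return max(minimum, suggested)


def is_over_provisioned(usage_pct_samples, threshold_pct=50.0):
    """Flag jobs whose p95 utilization stays below threshold_pct of what they asked for."""
    return p95(usage_pct_samples) < threshold_pct
```

Run it separately for vCPU and memory to see which dimension is over-provisioned. On Fargate, round the result up to one of the supported vCPU and memory combinations.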
Jesse: Yeah. And to this point, there’s always an opportunity to make these workloads more ephemeral. If you have the opportunity to make it more ephemeral, please, please, please, please, absolutely do so. Unless your batch job needs to run 24/7. We’ve seen that in a few cases where they have, essentially, clusters that are running 24/7, but they’re not actually utilized regularly; the workloads are only scheduled for a short amount of time.
So, if you don’t need those batch jobs running 24/7, please, by all means, move to more ephemeral resources, like Fargate. Fargate on Spot, Spot Instances in general, or even Lambda, which AWS Batch now supports as well.
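One way to tell whether a Batch-managed cluster is under-utilized is to measure how much of the day any job was actually running. This Python sketch merges job run intervals and returns the busy fraction of a time window. It assumes you have collected each job’s start and stop times (the `startedAt` and `stoppedAt` fields returned by `DescribeJobs` are epoch milliseconds), and the function name is hypothetical. A cluster that is busy for only a small slice of the day is a strong candidate for scaling to zero or moving to more ephemeral capacity.

```python
def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) during which at least one job was running.

    `intervals` is an iterable of (start, stop) pairs in the same units as the window,
    e.g. epoch milliseconds or hours.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    # Clip each job to the window and drop jobs that fall entirely outside it.
    clipped = sorted(
        (max(start, window_start), min(stop, window_end))
        for start, stop in intervals
        if stop > window_start and start < window_end
    )
    busy = 0
    current_start = current_end = None
    for start, stop in clipped:
        if current_end is None or start > current_end:
            # Gap before this job: close out the previous merged run.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, stop
        else:
            # Overlaps (or touches) the current run: extend it.
            current_end = max(current_end, stop)
    if current_end is not None:
        busy += current_end - current_start
    return busy / (window_end - window_start)
```

For example, jobs at hours 0 to 2, 1 to 3, and 10 to 12 keep a cluster busy for 5 of 24 hours.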
Pete: Yeah, it has some Step Functions support, which is pretty interesting. Yeah, this is a great opportunity to aggressively, aggressively leverage Spot, if you’re not already doing so today. Check out Fargate on Spot if you don’t need, like, a custom operating system or a custom EBS volume size. If you do, then EC2 on Spot is probably the best option you really have. But you really do not want to be running anything on on-demand instances. Even with a really good Savings Plan, you’re still leaving money on the table, because Spot Instances are going to be a lot cheaper than even the best Savings Plan that’s out there.
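Pete’s rule of thumb (Fargate Spot unless you need a custom OS or EBS size, otherwise EC2 Spot) maps onto the `computeResources` argument of AWS Batch’s `create_compute_environment`. This Python sketch only builds that dictionary. The function name and parameters are hypothetical, the subnet, security group, and instance-profile values are placeholders, and a real custom AMI or EBS size would be supplied through a launch template, which is left out here.

```python
from __future__ import annotations


def spot_compute_resources(needs_custom_ami: bool, needs_custom_ebs: bool, max_vcpus: int,
                           subnets: list[str], security_group_ids: list[str],
                           instance_role: str | None = None) -> dict:
    """Choose Fargate Spot when possible, otherwise EC2 Spot that scales to zero."""
    if not (needs_custom_ami or needs_custom_ebs):
        return {
            "type": "FARGATE_SPOT",
            "maxvCpus": max_vcpus,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
        }
    if instance_role is None:
        raise ValueError("EC2-based compute environments need an instance profile (instanceRole)")
    return {
        "type": "SPOT",
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,  # no idle instances when the queue is empty
        "maxvCpus": max_vcpus,
        "instanceTypes": ["optimal"],
        "instanceRole": instance_role,
        "subnets": subnets,
        "securityGroupIds": security_group_ids,
        # A launch template carrying the custom AMI / EBS settings would be added here.
    }


# Creating it (requires AWS credentials; names are placeholders):
# import boto3
# boto3.client("batch").create_compute_environment(
#     computeEnvironmentName="example-spot-ce", type="MANAGED",
#     computeResources=spot_compute_resources(False, False, 256, ["subnet-placeholder"], ["sg-placeholder"]))
```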
Jesse: And I think that’s a good point, too, Pete, which is if you do need to run these workloads on-demand, 24/7, think about if you can get away with using Spot Instances. If you can’t get away with using Spot Instances, at least purchase a savings plan if you don’t do anything else. If you take nothing else away from this, at least make sure that you have some kind of savings plan in place for these resources so that you’re not paying on-demand costs 24/7. But in most cases, you can likely make them more ephemeral, which is going to save you a lot more money in the long run.
Pete: Yeah, exactly. That’s the name of the game. I mean, when we talk to folks on Amazon, the more ephemeral you can make your application, the more it can handle interruption, the less expensive it will be to operate. And that goes for everything, including Spot Instances and how they’re priced, right? If you just get a normal Spot Instance, you’ll get a really aggressive discount on it if you can tolerate interruption with essentially no advance notice.
So, if that instance can just go away at any second, then you’ll get the best discount on that Spot Instance. But if your app needs a little time, or runs for a defined period of time (let’s say your app runs for one hour), you can get a defined-duration Spot Instance of one hour. You’ll still get a great discount and you’ll only pay for however long you use it, but you will get that resource for one whole hour, and then you’ll lose it. If that’s still too aggressive, there are configurable options up to six hours. Again, less discount, but more stability in that resource. So, that’s the trade-off you make when you move over to Spot Instances.
Jesse: So, I also want to make sure that we get to the second part of this question, which is about attributing cost to your AWS Batch workloads. According to the AWS Batch documentation, you can tag AWS Batch compute environments, jobs, job definitions, and job queues, but you can’t propagate those tags to the underlying resources that actually run those jobs. Which to me, kind of just defeats the point.
Pete: Yeah. [sigh]. Hashtag AWS wishlist here. You know, again, continuing to expand tagging support to things that don’t support it. I know we’ve seen kind of weird inconsistencies, even with, like, tagging ECS tasks and where you have to tag them for the tags to apply.
So, I know it’s a hard problem, but it’s obviously something that should continue to be worked on because, yeah, if you’re trying to attribute these costs, your only option is to run them in separate Amazon accounts, which solves this problem but, depending on your organization, could increase the management overhead. But that is the ultimate way. I mean, the one way to ensure 100% of costs are attributed to a service is to have it run in a dedicated account. The downside is that if you have a series of different jobs running across different business units, then obviously that’s going to break down super quick.
Jesse: Yeah, and it’s also worth calling out that if any batch jobs need to send data to different places (maybe the batch job belongs to product A, but it needs to send data to product B), there’s going to be some amount of data transfer, either cross-region or cross-account, to share that data, depending on how your organization and your products are set up. So, keep in mind that some minor charges may appear with this, but ultimately, if you’re talking about the best way to really attribute costs for your AWS Batch workloads, linked accounts are the way to go.
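With one linked account per workload or business unit, Cost Explorer can group spend by the `LINKED_ACCOUNT` dimension. This Python sketch rolls that response up into per-business-unit totals using `Decimal`, so cents are not lost to floating-point rounding. The account-to-unit mapping and the function names are hypothetical, and the boto3 call (which omits `NextPageToken` pagination) only runs if you invoke it.

```python
from __future__ import annotations

from collections import defaultdict
from decimal import ROUND_HALF_UP, Decimal


def cost_by_business_unit(ce_response: dict, account_to_unit: dict[str, str]) -> dict[str, Decimal]:
    """Sum UnblendedCost per business unit from a get_cost_and_usage response grouped by account."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for period in ce_response["ResultsByTime"]:
        for group in period["Groups"]:
            account_id = group["Keys"][0]
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[account_to_unit.get(account_id, "unassigned")] += amount
    # Round only once, at the end, to whole cents.
    return {unit: total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) for unit, total in totals.items()}


def fetch_costs_by_account(start: str, end: str) -> dict:
    """Fetch monthly spend grouped by linked account (requires AWS credentials)."""
    import boto3  # imported lazily so the roll-up above can be tested offline

    return boto3.client("ce").get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # ISO dates; the end date is exclusive
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
    )
```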
Pete: Yeah. If you need attribution down to the penny, and some of our clients absolutely do because, for invoicing purposes, they need attribution by business unit down to the penny, then the only way to get that, effectively, is segmented accounts. So, keep that in mind.
That is, until Amazon comes out with more flexible tagging. But also, feel free to yell at your account manager. I mean, ask them nicely; they are people, too. But, you know, let them know that you want this. Amazon builds what the customers want, and if you don’t tell them that you want it, they’re not going to prioritize it. I’m not saying if you tell them, you’re going to get it in a couple of months, but you’re never going to get it if you don’t say anything. So, definitely let people know when there’s something that doesn’t work the way you expect it to.
Pete: Awesome. Wow. Two questions. I feel like it’s Christmas, except it’s Christmas in almost springtime. It’s great. Well, again, you, too, can join us by being a Friend of the Pod, which Jesse really loves for some reason. [laugh].
Jesse: Yeah. Don’t know why, but it’s going to be stuck in my brain.
Pete: Exactly. You too can be a Friend of the Pod by going to lastweekinaws.com/QA and sending us a question. We would love to spend some time in a future episode answering them for you.
If you’ve enjoyed this podcast, please go to lastweekinaws.com/review and give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review and give it a five-star rating on your podcast platform of choice and tell us why you want to be a Friend of the Pod. Thank you.
Announcer: This has been a HumblePod production. Stay humble.