AWS S3 Storage Lens: The Best Service Not Announced at AWS Storage Day

Episode Summary

Join Pete and Jesse as they talk about the coolest service not announced at AWS Storage Day: AWS S3 Storage Lens, which lets you track your S3 usage across accounts. They discuss how this new service solves a major problem, how you’d have to track S3 usage prior to Storage Lens, how many organizations spend a lot of money storing multipart file uploads that fail, how AWS deserves kudos for making it super easy to set up the new service, what’s missing in AWS S3 Storage Lens, how Jesse and Pete spend more time than anyone should spend reading AWS documentation, and more.

Episode Show Notes & Transcript

Transcript
Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.


Pete: Hello, welcome to AWS Morning Brief. I am Pete Cheslock, and I am here yet again with Jesse DeRose.


Jesse: Hello. 


Pete: We're here to talk about the best service announced not during AWS Storage Day 2020.


Jesse: So close.


Pete: So close, though. It was announced a few days after, and that is the AWS S3 Storage Lens service, which I think I've got that naming right. I know sometimes it's ‘AWS thing,’ sometimes it's ‘Amazon thing,’ and to be honest, I never know which is which.


Jesse: Yeah.


Pete: AWS S3 Storage Lens is honestly one of the best new services that I've seen out, released thus far. I guess we're still pre-re:Invent announcements in a lot of this stuff. But what it is is a—from their site it says, “S3 Storage Lens delivers organization-wide visibility into object storage usage, activity trends,” blah, blah, blah, blah, blah, marketing speak. Basically, it allows you to get a view of your S3 usage across accounts. Which, that's mindblowing, right?


Jesse: Yeah. This feature has so much potential; I'm really excited to see where they go with it.


Pete: Yeah. And so when I first saw this blog post on Amazon’s site talking about it, my mind just started going crazy because again, we work in Duckbill Group as cloud economists with a lot of different clients, and because Amazon organizations may be the reason why, made it very easy to spin up new accounts, maybe also the adage, the design principle of creating many Amazon accounts to kind of segment workloads or to provide you to—segment your workloads in a way for cost reasoning or security reasons. But all of those things—somewhat related, somewhat not—have caused a lot of our clients to have lots of Amazon accounts. I mean, you could see hundreds, in some cases, of Amazon accounts. 


And the issue that I've always kind of had, and especially an issue we deal with in helping our clients analyze their costs and optimize their costs is how do you aggregate S3 usage? Because S3 is normally in the top five of services that we see in usage, how do you pull that together? And I guess we do that a lot of different ways. Jesse, maybe you can chat a little bit about what are some of the ways that we try to analyze this spend currently?


Jesse: Yeah. Pete, I think I'm really excited about this feature because AWS already offers aggregate looks at metrics for other top services by spend. Like, for EC2, you've got Compute Optimizer. We don't have anything for RDS yet, but I feel like that might be not far off, given Compute Optimizer’s existence. And we already have other tools that allow you to look across multiple accounts to look at metrics, especially if you're looking at Cost Explorer, for example, you can see metrics across multiple accounts, you can see spend across multiple accounts. 


So, I feel like this makes sense. I'm really excited to see that you can look at all of your S3 storage metrics in one place because right now, the only way that we're able to get any kind of representation of S3 usage is through Cost Explorer. And there are ways that you can go about filtering and slicing that data to get usage information and certain metrics, slicing and dicing on different filters for accounts and cost allocation tags, but it's all at the bucket level, or at the usage level, and if you really want to dig in deeper, you don't have a lot of options.


Pete: Yeah, it's a service that they're operating on your behalf. So, your only insight is what they give you insight into. Maybe some of that is CloudWatch metrics, there's obviously the S3 storage analytics that can give you some idea in your storage—based on access—that can help you kind of optimize, but nothing really again at the—ability to see it across multiple accounts is I think, really the big game-changer too.


Jesse: And I think what's really amazing here is that the majority of metrics that they're offering are free. And we'll get into that in a minute, but I'm really impressed that so many of these metrics are shared free of charge. You just have to turn it on. And then you have access to all of this great information that you can work with. 


Pete: Yeah. I think that's a great point that we haven't mentioned yet, that this is—the basic form of this is free. And the metrics that you can get are pretty useful in the free tier. Also, this is actually something that is turned on in your account right now. If you have an Amazon account, go into S3, it's actually under S3, it'll be on the left-hand column—at least it should be unless they go move stuff around—but you'll see a drop-down for Storage Lens, and you'll see an option for dashboards. 


And when you go into the dashboards, there will be a default dashboard already pre-configured with the free metrics enabled for your account. Now, that could be super helpful if, let's say, you just have one account: you can get some real good high-level metrics around your storage based on bucket. You can go into that dashboard and really quickly see total storage across all your buckets. You can see trend analysis with day-by-day, week-by-week change comparison: how are things growing? There was one thing that I saw that I was really blown away by, because this is something we deal with a lot: they have broken the metrics out in kind of a high-level summary, focusing on data protection, like being able to see data percentage replicated or encrypted, but also based on cost efficiency, too, being able to see if you have versioning enabled. Obviously, there's a cost for that.


How many old versions of this thing do you have, but also incomplete multipart uploads? That is potentially a large and in many ways, super hidden cost for some users of Amazon S3. If you are uploading a multipart file, and it fails, it lives in this purgatory, storage purgatory, where you're charged for it, but you may not see it in an obvious way. 


Jesse: And we see that with a lot of our clients who have multipart uploads and end up with these incomplete multipart uploads that just take up space. There were no clear metrics, prior to Storage Lens, that said, here's all of this stale multipart upload usage that you're paying for, that's effectively just taking up wasted space. But now we have metrics for that; now we have information that can clearly tell us where they are, how much space they're taking, and you can actually do something about it.
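One way to act on that metric, sketched here rather than taken from the episode: S3 lifecycle configurations support an AbortIncompleteMultipartUpload action that removes the parts of an upload a set number of days after it was started. The helper name, the rule ID, and the seven-day window below are illustrative assumptions. The boto3 call appears only in a comment so the snippet runs without AWS credentials.

```python
def abort_incomplete_uploads_rule(days_after_initiation=7):
    """Build a lifecycle configuration that cleans up stale multipart uploads.

    The rule ID and the default window are illustrative choices, not values
    recommended in the episode.
    """
    if days_after_initiation < 1:
        raise ValueError("days_after_initiation must be at least 1")
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": days_after_initiation
                },
            }
        ]
    }


config = abort_incomplete_uploads_rule(7)
print(config)

# Applying it would look roughly like this (needs boto3 and credentials;
# "example-bucket" is a placeholder name):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keep in mind that putting a lifecycle configuration replaces whatever configuration the bucket already has, so in practice this rule would be merged into the existing rules first.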


Pete: Right. Yeah, it gives you this intelligence that you can act upon. To talk about those metrics, since we're kind of on that stage, when I went into that default dashboard, I obviously started looking through what kind of metrics. And there's a whole lot of them, and these are included in the Amazon documentation, which is linked within the Storage Lens in S3, but you can see things such as average object size, and object count, total storage. Those things can be helpful, depending on maybe you want to see kind of where you're spending in which buckets. 


Maybe your top-level spend, but you want to know how much is in certain buckets. Being able to see current and percentage of current version storage: how much data is an old version, which can really stack up the charges. Like I mentioned, multipart uploads. And then even dive into things around replication, how much your data is replicated and replicating? And the encryption as well. Like—


Jesse: Yeah, that one I'm really excited about because—


Pete: Yeah. Like, what percentage of your data is encrypted?


Jesse: Yeah. I feel like this is something that so many companies harp on—or especially security teams harp on to get all of their data encrypted end-to-end, everywhere in their application ecosystem. So, to be able to see at a glance, you have 80 percent of your buckets are encrypted, or 80 percent of your S3 objects are encrypted, you have a clear picture of how much of your data is protected the way that you expect it to be protected, how much more you have to go, and it's all in one pane of glass, essentially. As much as I hate to use that phrase, it really is this clean dashboard that gives you all this information at a glance.


Pete: Yeah, exactly. And when I logged in, checked out the default dashboard, I was, like, thinking—and this is maybe just a confusion on my part. Also, I didn't read the directions. Classic move; I just went right in and clicked on it—it does call itself default account dashboard, not default organizational dashboard. 


Jesse: Yeah.


Pete: And so when I clicked that, I was like, wow, this storage is kind of small because obviously, it was just for this account. So, what I did is, let's go create an all Duckbill account view; I wanted to look at a dashboard for all of those, and to do that you actually need to go in and enable an Amazon organizational setting where it's authorizing S3 Storage Lens to access the organizations, so that you can create those organization-level dashboards. That was amazingly easy to do: you click a box, and you tick a thing, and hit save. You're not dealing with IAM. I didn't go into IAM once for any of this. So, kudos on—


Jesse: Which is huge.


Pete: Kudos on making that setup super easy. And so I went to go create a dashboard for all my buckets, gave it a name. Again, kind of read through as you go create these things. I didn't read anything, and I ran into some issues, one of which is there's a region for your dashboard. And that's important because if you create a dashboard in a region, and then want to dump data to a separate bucket, it actually told us that you needed to create the bucket in the same region as the dashboard. 


So, that's one of the cool features as well of Storage Lens is the ability to output the metrics that it has for you into S3. So, now you can consume this into whatever you're using, like if you want to consume it into your other monitoring services. And I'm sure there's going to be a variety of third-party integrations for this kind of data. As you go and create the dashboard, you can limit it to maybe all of the accounts, certain accounts, including, excluding certain things. But then you get into the section on metrics collection. 


And there's free metrics that's the default. But you can also enable the advanced metrics and recommendations. There is a price for this; that's not free. And interestingly enough, that is actually something that is twice as expensive as the current pricing for the storage analytics, I believe, and—I believe it's 20 cents per million objects monitored. So, not a lot of people may know how many objects they have, but here's the beauty: now you do. 


You can turn on the free metrics, figure out how much you have, and actually get an accurate cost idea before you turn it on. That's pretty awesome and rare in the Amazon world.


Jesse: Yeah. One other thing that I do want to call out is that this feature is enabled in your individual accounts already—or has been enabled in your individual accounts already, but if you do want to turn it on for, let's say, the entire organization or some subset of your accounts, once you turn it on and it starts gathering metrics, it will only start gathering metrics across whatever subset of accounts or buckets you give it at the time that you turn it on. So, effectively when we turned it on, it started giving us metrics across all of our linked accounts at that time, but wouldn't go any further back. So, similar to S3 Analytics, where you turn it on, and then it starts giving you metrics based on your usage patterns over the first 30, or 60, or 90 days that you have it turned on. Similar case here, where you will only see metrics across multiple accounts or across an entire organization once you turn it on and effectively tell AWS that you want to gather all of that data in one place. It won't automatically have that data and store that data historically for you.


Corey: This episode is sponsored in part by ChaosSearch. Now their name isn’t in all caps, so they’re definitely worth talking to. What is ChaosSearch? A scalable log analysis service that lets you add new workloads in minutes, not days or weeks. Click. Boom. Done. ChaosSearch is for you if you’re trying to get a handle on processing multiple terabytes, or more, of log and event data per day, at a disruptive price. One more thing, for those of you that have been down this path of disappointment before, ChaosSearch is a fully managed solution that isn’t playing marketing games when they say “fully managed.” The data lives within your S3 buckets, and that’s really all you have to care about. No managing of servers, but also no data movement. Check them out at chaossearch.io and tell them Corey sent you. Watch for the wince when you say my name. That’s chaossearch.io.


Pete: And what do you get for your 20 cents per million objects monitored? You get a lot of activity metrics: get requests, put requests, lists, posts, deletes, et cetera. If you're using S3 Select, you'll be able to see details around select requests, and amount scanned, and bytes downloaded, uploaded. All kinds of things like that, super helpful. One thing it doesn't have, though, and I'm hoping these services get merged because, honestly, Storage Lens, I want the cross-account view of my storage analytics data.


So, I want the view of how often files are being accessed in this same view. And I really hope they incorporate it in. And to be honest, if it's 10 cents for just Analytics but 20 cents for Storage Lens, I would pay more for Storage Lens if it gave me that insight, the storage class analytics, because then I can optimize for not only requests, maybe identify where I need CloudFront in front of one of my buckets, but also storage tier as well.


Jesse: Yeah, absolutely. That's something that I think is going to be really impactful for—or is a really great use case for the advanced metrics because ultimately, I think that most people can get away with just the free tier of metrics that are available today free of charge, you don't need to enable the advanced metrics. But if you really want to go the extra mile to really start looking at how can I optimize my applications’ ability to use data and read and write data in S3, those advanced metrics will absolutely help.


Pete: But here's the beauty. Turn on the free mode—well, it's all really on by default, but if you want to turn on cross-account, figure out how many objects you have that you want to monitor that you want to get those insights on, turn on for just those specific buckets or items. 


Jesse: Absolutely. 


Pete: And because you're going to know how many objects you have, you'll know the cost impact in advance. You also don't need to have it on all the time. You can turn it on for a month; you can turn it on for a period of time, get the insight you need, make the recommendations, and move on. One very critical point, though, that I will call out here because we obsess about Amazon pricing and bills. Is that a valid assessment, Jesse?


Jesse: Yeah. I think that's an understatement.


Pete: To the point where we spend more time than really any human should reading Amazon documentation around pricing because every time something new comes out, we obviously want to know how it's priced because people are going to ask us how it's priced and we want to have that answer. So, this is, again, it's 20 cents per million objects monitored per month; straightforward. And there's a pricing model that already exists for this with Storage Analytics, S3 Analytics for Storage Class, so we understand that model. But in the pricing guide on S3, there is a line that I called out here, which is interesting because you can enable or disable a dashboard, which I thought was weird. 


Jesse: Yeah. 


Pete: Why is that there? And I now know why it's there. And this is the line; it says, “For S3 Storage Lens advanced metrics and recommendations, you will be charged object monitoring fees for each Storage Lens dashboard used. The Storage Lens advanced metrics and recommendations pricing include all the stuff that you get: 15-month data retention, activity metrics, et cetera.”


What that means is, if you create a dashboard to monitor all of your accounts and all of your buckets, and you turn on advanced metrics, you will be charged 20 cents per million objects monitored. If you create a second dashboard doing the exact same thing, you will get charged an additional 20 cents. Your price will literally double. And they will keep doing that. You will keep getting charged 20 cents per million objects monitored per month, per dashboard. 
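The per-dashboard arithmetic can be written out as a rough sketch. It uses the 20 cents per million objects per month figure quoted in this episode, which should be checked against current S3 pricing before relying on it. The function name and the 500 million object count are made up for illustration.

```python
from decimal import Decimal

# Price quoted in the episode; an assumption to verify against current pricing.
PRICE_PER_MILLION_OBJECTS = Decimal("0.20")


def monthly_advanced_metrics_cost(objects_monitored, dashboards=1):
    """Estimate the monthly advanced-metrics charge in US dollars.

    Every dashboard with advanced metrics enabled is billed separately,
    even when two dashboards cover exactly the same objects.
    """
    millions = Decimal(objects_monitored) / Decimal(1_000_000)
    return millions * PRICE_PER_MILLION_OBJECTS * dashboards


# Hypothetical organization with 500 million objects.
one = monthly_advanced_metrics_cost(500_000_000, dashboards=1)
two = monthly_advanced_metrics_cost(500_000_000, dashboards=2)
print(f"1 dashboard: ${one:.2f} per month, 2 dashboards: ${two:.2f} per month")
```

Because the free metrics already report object counts, the first argument can come straight from the default dashboard before any paid tier is switched on.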


Jesse: This feels like a very typical AWS move where they announce something really, really awesome, really, really cool, really, really exciting, but the pricing and the documentation doesn't quite clearly highlight those very sharp edge cases. 


Pete: And we see it a lot with other services, that people have no idea why they're getting charged in certain ways. And it's simply because of the pricing being specific to something like this, like every dashboard you create. Also too, if I create an all-account dashboard, in my top-level, kind of master payer account, let's say, and other people create dashboards at their maybe lower-level accounts aggregating that same data, again, you're going to get these additional charges. And so that's definitely something to keep in mind. It's a rough edge there. And it's something that you'll want to monitor for. Maybe they'll create S3 Storage Analytics Dashboard Systems Manager Cost Manager for us later. I don’t know.


Jesse: [laugh]. God Almighty, help us.


Pete: Don't actually do that Amazon; that was a joke. Don't create that service. But what's interesting is when we did create this dashboard, I was like, “Cool. I want to go look at it.” And we got an Amazon Detective pulled on us here, Jesse. What happened?


Jesse: Yeah, so as soon as we enabled this dashboard, we clicked into the dashboard to look at it, and it said, “Thanks for enabling me. I have to do some stuff behind the scenes. Come back in 48 hours.”


Pete: I mean, you can't be that upset about it, but it is still funny. At least Detective was like, “Come back in 14 days,” and we were like, “Okay.” [laugh].


Jesse: Well, yeah. And it was worse for AWS Detective because—or, excuse me, Amazon Detective because you were effectively paying for those 14 days or you are in part of a trial period for those 14 days, where you effectively couldn't do anything.


Pete: Yeah, exactly. Now, there was one again, great feature, the chef's kiss feature, that when you create these metrics, whether they're free or paid, you can have them be exported to an S3 bucket into CSV format or Parquet format, and again, shout out for more Parquet storage because these metrics could potentially be pretty sizable, I guess, if you have a lot of data, just like everything. But also, if you have it in Parquet format on S3, you can immediately query that stuff with Athena in a super-easy way. But if that's just a little too advanced, which I get it; it's not the easiest to use, CSV is very flexible as well. And I think it’s, again, great that they're giving you insight into this data and then giving you the data for, then, you to do whatever, and maybe that is to consume into your third-party metric system or consume into your own tool.
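For the CSV route, a minimal sketch of consuming an export in your own tool might look like the following. The column names (bucket_name, metric_name, metric_value) and the metric names are assumptions modeled on the export format, so check them against the schema in the Storage Lens documentation. The sample rows, account numbers, and bucket names are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for a downloaded export file.
SAMPLE_EXPORT = """\
report_date,aws_account_number,bucket_name,metric_name,metric_value
2020-11-20,111111111111,logs-bucket,StorageBytes,5000
2020-11-20,111111111111,logs-bucket,IncompleteMultipartUploadStorageBytes,1200
2020-11-20,222222222222,media-bucket,StorageBytes,9000
2020-11-20,222222222222,media-bucket,IncompleteMultipartUploadStorageBytes,300
"""


def total_metric_by_bucket(csv_text, metric_name):
    """Sum one metric per bucket across every account in the export."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["metric_name"] == metric_name:
            totals[row["bucket_name"]] += int(row["metric_value"])
    return dict(totals)


stale = total_metric_by_bucket(
    SAMPLE_EXPORT, "IncompleteMultipartUploadStorageBytes"
)
# Largest offenders first.
for bucket, size in sorted(stale.items(), key=lambda item: item[1], reverse=True):
    print(bucket, size)
```

The same grouping could be expressed as a query in Athena against the Parquet export; plain Python is used here only to keep the example self-contained.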


But there's still some questions, I think, that we're trying to figure out when it comes to pricing. There's some places where maybe the web is free, but the CLI isn’t.


Jesse: Yeah.


Pete: Is there an API to this? I don't know. We didn't have time to check that out.


Jesse: Yeah. I will say final thoughts for me, this is definitely awesome. I'm really excited that the free tier has so many amazing features and I'm really excited to dig in more to it. I would say go out, enable the free tier today—well, it’s already enabled for your individual accounts, but if you want to enable it across multiple accounts, if you've got multiple accounts, by all means do so. Again, it's free, and who doesn't love more data to make more data-driven decisions?


Pete: Let's be honest, AWS Storage Lens is the best new service that was not announced at AWS Storage Day. So, if you've enjoyed this podcast, please go to lastweekinaws.com/review, give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review and give it a five-star rating on your podcast platform of choice and tell us what interesting things you found with AWS Storage Lens.


Announcer: This has been a HumblePod production. Stay humble.