TRACKING THE INFINITY THAT IS SNOWFLAKE
With FlowObjects Auditing & Monitoring
Snowflake’s unlimited scale can quickly outgrow what administrators are able to track with the naked eye. Keeping tabs on what is going on in Snowflake requires a much deeper analysis that keeps a pulse on the near-infinite potential that is Snowflake.
In this short session, the FlowObjects team demonstrates some of the analytics that come packaged with the FlowObjects Auditing & Monitoring solution.
Read the Transcript:
WHAT IS FLOW OBJECTS AUDITING & MONITORING?
Jared Hillam (0:08)
Hey, folks, thanks for joining us. In this presentation, we're going to go through how to manage the infinity that is Snowflake. And we're gonna dive into the FlowObjects Auditing & Monitoring.
I'm gonna jump in by talking about, what is the Auditing & Monitoring solution and why we created it. It's part of a suite of utilities that we've created around Snowflake. There's Auditing & Monitoring, Access Control Master, Excel Add-In, and Data Loader.
First, let me just talk about what's happened to Snowflake over the last few years and what we've learned about Snowflake over the last few years, truly how unlimited it is. You can have unlimited numbers of tables, unlimited numbers of roles, databases, warehouses (which is compute), allocation plans, and all of that stuff. It's just all unlimited.
And think back in the days of old here on the left, we kind of used to have this natural fence. We'd have this natural fencing of a server to kind of hold us in and corral us. And of course, it was very slow to plan for these servers. We don't even talk about hardware anymore, but in the old days, it was a big deal. It was kind of crazy back then; you'd have to plan... you couldn't go too small, and you couldn't go too big, you kind of had to figure out what "just right" was.
Well, now in our day here, it can be small, it can be incredibly massive... it can be anywhere. And the only limit is our budget. That's the only thing that limits us. And so obviously, the budget can be a problem in Snowflake accounts, especially where teams haven't thought through the things that can add a lot of dollars to the bottom line when they don't invest in the appropriate level of engineering. So how do you know if you're paying too much? Or if you're just right? And how do you monitor this infinity? It's great to have infinite potential. But if you have no ability to monitor that potential, it becomes difficult.
We've seen some strides taken here. Snowflake has released what is called "Snowsight", which does give you some visibility into some of these objects. But what we're talking about is really level two, which is how we go really deep into this analysis and really, in a detailed view, get to see what's going on inside of Snowflake so that we can optimize it for its maximum potential. And that's really what the FlowObjects Auditing & Monitoring is all about.
So what we're going to do is I'm going to pass it over to Rich to demo the Auditing & Monitoring, and then he'll pass it back over to me and I'll show you what the pricing structure looks like for it.
Rich Hathaway (3:25)
DASHBOARD OVERVIEW: Alright, you should be seeing my little dashboard that I've got here. So let's talk about this a little bit. First of all, we're showing the dashboard, but as everybody knows, behind any dashboard the value is in the data and where we're getting the information. One of the most important pieces of this product is that in Snowflake we are taking advantage of all of the different opportunities to gather metadata about what is happening in your Snowflake data estate. What's happening in your account.
We're pulling information from information schemas, from Snowflake account usage - all the secured views there. We're pulling information from some show commands, describe commands, and some of the table functions. We gather all that information and we actually persist it on a regular daily basis. Doing that allows us two things:
1) It allows us to display the metrics and look at the information much faster, because going against the Snowflake views/secured views, sometimes querying that metadata can be fairly slow. So by persisting that information into tables, it allows us to go after that data much faster.
2) It allows us to persist that data for a longer period of time. In Snowflake, some of the metadata views have short retention. For example, if you want to look at what's happening in a specific database, you can look at the load history view in the information schema for that particular database - but that data is only available for 14 days. You can go to the Snowflake account usage area, and that information is persisted for one year. But what if you want to do some analysis of what we're doing year over year? How is my credit spend trending this year compared to the same period last year? ...Those types of things. Without persisting that data, you're not able to do that.
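To make that concrete, here's a minimal sketch of the kind of daily persistence Rich describes, assuming a hypothetical ADMIN.AUDIT target schema (the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view is Snowflake's own; everything else is illustrative, not the actual FlowObjects implementation):

```sql
-- One-time setup: an empty archive table shaped like the ACCOUNT_USAGE view.
CREATE TABLE IF NOT EXISTS ADMIN.AUDIT.QUERY_HISTORY_ARCHIVE AS
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE 1 = 0;

-- Run daily: append only the rows newer than what's already archived,
-- so the history survives past Snowflake's one-year retention window.
INSERT INTO ADMIN.AUDIT.QUERY_HISTORY_ARCHIVE
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME > (SELECT COALESCE(MAX(START_TIME), '1970-01-01'::TIMESTAMP_LTZ)
                    FROM ADMIN.AUDIT.QUERY_HISTORY_ARCHIVE);
```

The same pattern applies to the metering, storage, and login history views.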
We built this whole process that goes through and grabs all that data and persists all that data. And we're continuing to find new things, new ways to persist that information, new ways to analyze that information. And I'll talk about some of those, as we look through the dashboard. But there's quite a bit here, and we're continually adding new things to this as we learn.
So this is mainly the front page of the Auditing & Monitoring dashboard. And what this is really looking at is our Snowflake usage. We want to see the things that are costing us money... how much are we paying for Snowflake? We know that the two ways that we're mainly paying for Snowflake are:
- One is the credits - the credits that we're using. So the use of virtual warehouses.
- And then the other is the storage. So the storage, we'll talk some more about that, there are some things that you need to understand there.
What we do is persist this information. You can see on this dashboard up in the top: first of all, we can look at this by a trailing number of months. Do we want 6/12/18/24 months? How long do we want to be able to look at this? There are a number of options that we're building in here. We can see how much we've used over those 12 months. We can also look at this by credits or by currency.
Jared Hillam (7:22)
It's kind of nice to be able to have that currency button for the "bean counters" who are trying to see how much this is costing in actual dollar terms.
Rich Hathaway (7:36)
Yeah! And actually, we've got the ability up here so you can kind of look at, what if our credit cost were something a little bit different? We've recently introduced, in the data that we are persisting, a place where we can actually store the actual credit cost that we have under contract with Snowflake. And so you can use it with your actual costs, or you can look at it with a "what if". What if our credit cost was this, what would our costs look like? That type of thing.
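The "what if" boils down to multiplying metered credits by a candidate price. A sketch against Snowflake's real WAREHOUSE_METERING_HISTORY view (the $3.00 rate is a placeholder, not anyone's contract price):

```sql
-- Placeholder what-if rate; substitute your contracted price per credit.
SET WHAT_IF_CREDIT_PRICE = 3.00;

SELECT DATE_TRUNC('MONTH', START_TIME)            AS usage_month,
       SUM(CREDITS_USED)                          AS credits,
       SUM(CREDITS_USED) * $WHAT_IF_CREDIT_PRICE  AS est_cost_usd
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1
ORDER BY 1;
```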
CREDIT USAGE: The first thing we can see at the top here is Warehouse Credits, SnowPipe Usage, and Average Monthly Storage in Terabytes. So for whatever the date range is - in this particular case, we're looking at the last 12 months. And (a little warning) some of this data may look a little funny as we're going through this. This is one of Intricity's Snowflake accounts. So we'll be heavy users of this for a couple of days and then nobody touches it for a few days... it's not a normal Snowflake account. But it's ours and we love it. So the warehouse credits: we've used 424 credits in here over the last 12 months. We also did a little bit of usage of Snowpipe, so we track those separately. And our average monthly storage is a little over half a terabyte.
And then we have broken this out into a number of visualizations. So we can look at it a few different ways to easily see what's going on in our Snowflake environment.
CUMULATIVE COMPUTE USAGE: This visualization here is really our cumulative compute usage over that period of time. So we can see, over the past 12 months, that we've used our 425 credits, but we can see how that came in in monthly chunks. Down here below, we can actually look at it by the specific warehouses. So we've got a whole bunch of warehouses here. We can look at it by the warehouses if we wanted to look at any one particular warehouse. You can see our workshop warehouse. We use this when we do workshops/hands-on training with students and folks who are learning Snowflake. So you can see quite a bit of our credit usage is in that workshop warehouse. But we can also look at it for any of the other warehouses, and then there are lots of things we can do. Like if we want to see the details of some of these really small ones, we can zoom in on them. But again, this is our monthly usage. It's just split up and looked at by warehouse.
Over here, we've got some more hills on the monthly usage. And then we've got kind of like a little waterfall diagram that shows how our usage is changing from month to month. Is it continually growing? Is it growing at a certain rate? This is showing us that, for example, in September, we used 38.5 more credits than we used in August of 2020. But then in October, we dropped down another 21 credits. So we can see it kind of as a waterfall.
Then we want to look at it by day of the week. Are we really heavy users on Monday? Which day of the week are we doing the heavy usage in Snowflake?
And then down below, this is just the same thing, but by hours of the day. We can see when most of our usage lands. Are the loads happening in the middle of the night? Is it consumers running reports during the day? So we can look at this in a number of different ways to examine the credits. And you'll see, as we look at some of the other pages in the dashboard and in the reports, some more details on some of these areas.
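The day-of-week and hour-of-day views can be approximated directly from the metering history (this is a sketch of the underlying idea, not the product's actual queries):

```sql
-- Credits by day of week...
SELECT DAYNAME(START_TIME)  AS day_of_week,
       SUM(CREDITS_USED)    AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1;

-- ...and by hour of day, to spot overnight loads vs. daytime reporting.
SELECT HOUR(START_TIME)   AS hour_of_day,
       SUM(CREDITS_USED)  AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1
ORDER BY 1;
```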
AVERAGE STORAGE USAGE: Over here on the right, we actually have average storage usage. And so we're pulling this by all the different types of storage so that we can see what's going on with our storage. We've got regular storage, fail-safe storage, and stage storage. You can see we had .015 terabytes of stage storage in June of 2020. So we've got a bunch of files that are sitting out there and the stage hasn't been cleaned up. We might look at that and say, do we really need those out there? Or can we clean those up?
Jared Hillam (11:49)
Is that a common scenario to see a big massive spike in a certain type of storage?
Rich Hathaway (11:56)
Yeah, and that's one of the things we'll look at. You can see we've got some other storage metrics here; we'll go into some more details on that. But this is one thing that you really want to watch. Again, the storage is not that expensive... a terabyte of compressed storage is about $23 a month. But what will happen is that it can get out of control. Especially with Snowflake's ability to do time travel, and the way the micro partitions work in Snowflake, with those being immutable. If you're not paying attention to what's going on, your storage can get out of control! So it's something that, again, we want to be aware of and understand what's going on. And that's what this is allowing us to do. Looking at the different types of storage, and then again from month to month, so we can look at how our storage is growing. Are we adding 20 terabytes every month? What's going on with our storage from month to month?
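For a back-of-the-envelope version of that storage view, Snowflake's STORAGE_USAGE view already splits database, stage, and fail-safe bytes; the ~$23/TB-month figure below is the ballpark rate Rich quotes, which varies by region and contract:

```sql
SELECT DATE_TRUNC('MONTH', USAGE_DATE)        AS month,
       AVG(STORAGE_BYTES  / POWER(1024, 4))   AS db_tb,        -- active data plus time travel
       AVG(STAGE_BYTES    / POWER(1024, 4))   AS stage_tb,     -- files left sitting in stages
       AVG(FAILSAFE_BYTES / POWER(1024, 4))   AS failsafe_tb,
       AVG((STORAGE_BYTES + STAGE_BYTES + FAILSAFE_BYTES)
           / POWER(1024, 4)) * 23             AS est_cost_usd  -- assumed ~$23/TB-month
FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
GROUP BY 1
ORDER BY 1;
```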
WAREHOUSE SETUP METRICS: Let's talk a little bit more about the credit side of things first, and that's the virtual warehouses. The next tab that we've got here is some information about how our warehouses are being set up, and how they're being used. Of course, this is very important for us to be able to see what's going on with our credit usage, because all the credits are coming through the usage of these virtual warehouses. So we can select specific warehouses, if we were just interested in a couple. By default I'm showing all the warehouses here. We can look at a date range. The date range is really for the average queries running, and we'll talk about that in a moment. So the rest of this is setup information about the warehouses.
Jared Hillam (13:33)
I think it's safe to say that our favorite warehouse size is extra small.
Rich Hathaway (13:37)
Yes, absolutely. And that should be the case unless you have a good reason for it not to be. In our environment, extra small does an amazing, amazing job. Out of our 34 warehouses, 32 of them are extra smalls. We have one whose current size is a medium, and one whose current size is a small. The warehouse sizes with Snowflake can scale up and down. We can easily change a warehouse instantly. We can change it from an extra small to a small, or from an extra small to a 2XL. We can easily do that. So this is the current size of the warehouse from the last time that we pulled that information; this is not showing us the history of the warehouses. So we can see the warehouse count by size.
AUTO SUSPEND: And then this is the auto suspend. This is important for us to know and to be able to look at. The auto suspend, of course, is on your virtual warehouse. Let's say your warehouse is sitting there, it's not currently on... somebody kicks up a query, the Snowflake warehouse starts up, runs the query, and that kicks it off. Once that query is finished, if there are no other queries running on that warehouse, this is how long the warehouse is going to continue to run before it shuts itself down. You can actually set it up so the warehouse would never shut down, which is a very bad idea because then you're just getting charged credits the whole time. That is an option, though, if you go into Snowflake in the warehouse: auto suspend - never. Never select never, by the way. So this is showing us that we have 15 warehouses where the auto suspend is set to one minute. So that warehouse will continue to run up until one minute after the last query is finished, and then it will shut itself down. And because we've got them all on auto-resume, it will start up again as soon as we launch another query. We've also got 10 that have an auto suspend of five minutes. Again, this is something we need to look at. I've seen and worked with a number of clients where they're saying their credit usage went up. We go in and look: well, somebody changed it to "never auto suspend". Or somebody changed it to 60 minutes... Somebody changed something and we didn't know it. And so we need to be able to maintain, manage, and understand that.
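In Snowflake DDL, the settings Rich is describing look like this (the warehouse name is illustrative; AUTO_SUSPEND is specified in seconds):

```sql
ALTER WAREHOUSE MY_WH SET
  AUTO_SUSPEND = 60      -- shut down one minute after the last query finishes
  AUTO_RESUME  = TRUE;   -- wake back up as soon as the next query arrives
-- Setting AUTO_SUSPEND to 0 (or NULL) means "never suspend" - the option Rich warns against.
```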
WAREHOUSE SCALING: The next one talks about scaling up and down. We can change warehouse size from extra small to small, medium, large. But this is looking at the warehouse clustering. So this is scaling in and out - the horizontal scaling, which is what we want to set up on our warehouses. This is for concurrency scaling. What that means is: if we've got a warehouse with so many queries running on it that some of those queries are starting to queue to wait for other queries to finish... are we going to let that warehouse automatically spin up another cluster (which is basically like starting another warehouse of the exact same size)? You can see we have 20 warehouses where the maximum cluster count is one. So if you're running on one of those warehouses, and your query is queued because there are too many other queries running on that warehouse at that point in time, you're just gonna have to wait until one of those other queries finishes and then you get your turn (so you're in that queue). We've got six warehouses where the max cluster count is two. So if we had queries going into a queueing state, these warehouses are automatically going to spin up one more cluster, and then they've got the ability to run more of those queries. If it gets to the point where we still have queries queuing after the two clusters, it's not ever going to spin up another cluster. This is important for us to know, because we need to understand if we have queries that are queuing because there's too much load on the warehouse, or if we have warehouses that are just spinning up a whole bunch of clusters. Again, there's a cost involved with that.
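Those cluster limits are warehouse properties too. A sketch with an illustrative warehouse name (multi-cluster warehouses require Snowflake's Enterprise edition or above):

```sql
ALTER WAREHOUSE MY_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 2            -- at most one extra same-size cluster when queries queue
  SCALING_POLICY    = 'STANDARD';  -- prefer starting a cluster over letting queries wait
```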
CREDITS BY WAREHOUSE: And then this is just a view of the different credits by warehouse. You can see we use an awful lot of the "load warehouse". We've got a lot of warehouses that aren't used very often; we play around and do lots of things with them. When we've got this list of all the warehouses, you can see the details of the warehouse information. And then we can see, over a period of time, the average queries that were running during a given five minutes, and the actual number of credits that we used over that same period of time. We can zoom this in and out if we wanted to see these peaks a little better. So we can see that on May 30th at 10 am we actually used 7.48 credits in that timeframe.
I talked about the scaling a little bit with the clusters. This view allows us to really see what's going on with the warehouse clusters. The top view up here is in five-minute increments. Again, you'll see that this is one of the days that we had the heaviest usage on our Snowflake instance. And you can see there were a whole bunch of times when that warehouse was doing nothing. What we can see here is that this is a warehouse that we have set to scale out. We had one cluster, one cluster, one cluster... and then we jumped up to four clusters because we had a bunch of queries running. So we can see that we actually got up to five clusters. The top view here is how many extra clusters were running in that five-minute period. Down below we can see the average number of queries running. So at that same time, you can see 30.97 was the average number of queries running during that five minutes. We can also see how many were queued, and how many were queued for load or for provisioning. This allows us to really look at how our warehouses are being used as far as scaling in and out - the horizontal or concurrency scaling. So another way for us to be able to look at that. All of this is very important when managing our warehouse credits and our usage of the credits.
STORAGE METRICS & TIME TRAVEL: Let's talk a little bit more about storage. This is the ability to look at any of our tables. For example, if we only wanted to look at the admin database, we can use this selection up here. We can expand this down so we can look at certain schemas, or we can even go down to individual tables if we wanted to. We're going to select all because we want to look at everything that's here. Then in Snowflake, we've got time travel. So on each table, you can set up retention days - which is how many days you want that time travel data to be available. And you can see, in our instance, most of our tables are set to one day of retention. We have a couple that are transient tables, a couple that are set to zero, and very few that are set to seven. And again, this is one of the things that we need to look at as you set these. We've worked with a number of clients who said, "Oh, time travel, that's a great plan! We want to be able to get back to that data. And if somebody messed up some data three weeks ago, we want to be able to easily get back to that," which is great. But if you have a table that's got a lot of churn in it, then what's going to happen is you're going to be adding up lots and lots of storage in that time travel. And so you're gonna see some things happen that you may not really expect.
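Retention days are set per table in Snowflake. For example (table names are illustrative):

```sql
-- Dial a high-churn table down from the default to one day of time travel.
ALTER TABLE MY_DB.MY_SCHEMA.HIGH_CHURN_TABLE SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Transient tables cap at 1 day of time travel and carry no fail-safe storage at all,
-- which makes them a good fit for staging data.
CREATE TRANSIENT TABLE MY_DB.MY_SCHEMA.STAGING_ORDERS (ID INTEGER, PAYLOAD VARIANT);
```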
Jared Hillam (22:28)
In some of the cases that you've come across, how big can that get?
Rich Hathaway (22:33)
I've actually seen a number of clients where they'll have tables where the active size of the table (so the storage) is maybe five gigabytes, and the time travel and the fail-safe storage is like 20 terabytes. Because what's happening is you've got a large table that you're updating every day or every five minutes (depending on how often you're updating it, whether you're doing blind updates, and all those types of things). Snowflake micro partitions are immutable. So every time you do an update or insert, it creates new micro partitions. Say we want to keep those around for 45 days, and we update every record in a 20 million record table 10 times a day - all of that retained storage adds up. This shows up here if I do a sort on the fail-safe. So you can see that we've got a table here called DB tables; the total size is 21 megabytes. The active size is 1.4 megabytes, and in fail-safe there are 19.9 megabytes. This is a way that allows us to look at each of our individual tables. How much active storage are we using? How much storage are we using for time travel, for fail-safe? Do we have some tables out there that are just being retained because there's a clone? What are the retention days that were set on those? And then we've got a percent of the storage size of that table that's being used for active data, and for retention micro partitions. So again, just a really eye-opening and interesting way to see this. And what we've learned with certain clients is that not everything needs to be set to 90 days of time travel. And also, if you are going to set some tables to time travel, you want to make sure that you're not doing blind updates or other things that add to the churn of that table, which is just going to be loading up all your retention data there.
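The per-table breakdown Rich is sorting through comes from Snowflake's TABLE_STORAGE_METRICS view; a query along these lines surfaces the tables paying mostly for retention rather than active data (a sketch of the idea, not the product's actual query):

```sql
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME,
       ACTIVE_BYTES             / POWER(1024, 2) AS active_mb,
       TIME_TRAVEL_BYTES        / POWER(1024, 2) AS time_travel_mb,
       FAILSAFE_BYTES           / POWER(1024, 2) AS failsafe_mb,
       RETAINED_FOR_CLONE_BYTES / POWER(1024, 2) AS clone_mb
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
ORDER BY FAILSAFE_BYTES DESC
LIMIT 20;
```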
STORAGE GROWTH: We also have the ability to look at some storage growth, and this is a little funky. Again, we don't run this on a daily basis, but we persist a snapshot of the tables on a daily basis. So we can look at, over time, the growth of specific tables. If we wanted to look at this specific table, you can see that back here on April 7th, that table was brand new; there was nothing there. Again, we're not running this on a daily basis on our system. But you can see that it grew to 23 megabytes on May 4th. By May 10th, it had grown some more. So we can look at this and see how this shows up over time. And this is very valuable if you're trying to understand things like time travel and fail-safe storage.
So let's say you've got a table where, like we talked about, you've got just a huge amount of storage in time travel because it's a high-churn table and you've set it to have retention days of 45. And then let's say we go in and we change that table down to seven days or something like that. What's going to happen is, we're going to see the time travel and fail-safe bytes start to go down as that kind of cleans itself up over time. So again, just another way for us to look at that storage and understand what's going on with it.
QUERY PLANS & PERFORMANCE ENHANCEMENT: We also have - and this is an area we've been working on quite a bit, with a lot of new visualizations and new information coming out here - the query plans. What we're doing is taking all of the queries that are run, actually running query plans, and persisting the query plans for those queries. With those query plans, we can see certain pieces of information. Like we can see how many micro partitions a certain query opened up for a specific table, and how many micro partitions that table has in total. So we can see what's going on in all these queries. Again, this is an area that we're revamping right in the middle of right now. But what this is allowing us to do is to take our top 10 and say which of our tables are really in a lot of queries and are causing the queries to be slow.
And there's an opportunity to improve the performance. And when we improve performance, we're improving credit usage, of course (because the less we're running our virtual warehouses, the less we're paying in credits). So we want to look at all of this, and by bringing in all of this information, we can look at any of these tables - how often they're being used in queries - and we can actually look at any of the specific queries.
For example, if we look at some of these query history queries, and we come down here, and let's say we want to look at this one that returned 1000 rows. If we click on that, we can actually see the query here and we can see all the steps in the query. So we can see all the table scans, we can see which tables were used in there, and we can see how many micro partitions were opened up out of the total. We can see the percent of the partitions accessed. So in this one, the table had 40 micro partitions, and we opened up all 40 of them for the query. As we persist this information for large numbers of queries, we can apply some analysis to it. We can say: here are some tables where, the way we're querying them on a regular basis, it looks like we can set up a cluster key on these two columns, and that will greatly enhance the performance of our queries. And so we're going to see better performance, our users are going to get their queries back faster, and our credit usage is going to go down because we're going to be opening up many fewer micro partitions for those queries. So there are lots of things going on there in the query performance world and in how we're persisting that data.
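The cluster-key fix Rich describes is a one-line change in Snowflake (the table and columns here are illustrative, not from the demo):

```sql
-- Cluster the table on the columns most queries filter by.
ALTER TABLE SALES.PUBLIC.ORDERS CLUSTER BY (ORDER_DATE, REGION);

-- Afterwards, check how well-clustered the data is on those columns.
SELECT SYSTEM$CLUSTERING_INFORMATION('SALES.PUBLIC.ORDERS', '(ORDER_DATE, REGION)');
```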
One of the really cool things about that, as I said, is we're doing a lot of work in that area right now. We're actually using Snowflake external functions and the asynchronous Python ability with Snowflake. Because grabbing and persisting the query plans is actually a process that takes a little bit of time. And so if you're looping through them one after another, and you've got hundreds of thousands of queries running every day, that's not going to happen. But using the external functions and the asynchronous Python ability with Snowflake, it's amazing how fast we can jump through all those queries. We can persist all those query plans so that we are able to do this analysis and find out which tables can benefit from cluster keys, which tables can benefit from search optimization, and which developers are just writing queries that we need to go and teach how to write some better SQL. All of those things - that's what this can get for us.
USER INFORMATION: Real quickly, we'll jump through a couple of these other ones. Here's some user information. We want to see our users: how many users do we have? Do we have some that haven't logged in? How are they logging in? We can look at their logins by different client types, and we can look at users that have a password or don't have a password. Do we have users that we're only allowing to come in through SSO? If they have a password, we know they can get in through the Snowflake UI. If they don't have a password, we know they're only able to come in through SSO. Have they got multi-factor authentication set up? All these things we can see here in the user information.
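Most of those user facts live in Snowflake's ACCOUNT_USAGE.USERS view; a sketch of the kind of query behind a page like this:

```sql
-- Password vs. SSO-only users, MFA (Duo) status, and last successful login.
SELECT NAME,
       HAS_PASSWORD,                       -- FALSE generally means SSO/key-pair only
       EXT_AUTHN_DUO AS has_mfa,
       LAST_SUCCESS_LOGIN,
       DISABLED
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE DELETED_ON IS NULL
ORDER BY LAST_SUCCESS_LOGIN NULLS FIRST;   -- never-logged-in users float to the top
```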
ROLES: Also, we've been doing some work here, and a lot more will be done. But we want to look at roles. This is one of the things that is very, very important in Snowflake, but it's kind of difficult to see in Snowflake. This allows us to pick certain roles and see how these roles are associated with each other. So let me just grab account admin. We've got the role of account admin, and all of these other roles inherit up to account admin. Say we wanted to look at my sysadmin, account admin, and my sysadmin dev, and I want to see how all of these relate to each other. We've got sysadmin here, which inherits up from all of these roles. We've got sysadmin dev that inherits up to sysadmin, and then from sysadmin to account admin. So it's just another way, using a Sankey diagram, that we can kind of pull some of these things out and take a look at how these relationships are set up. And then we've got that also for users and roles.
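The edges of that Sankey diagram correspond to role-to-role grants, which you can pull from Snowflake's GRANTS_TO_ROLES view (a sketch of the idea, not the product's actual query):

```sql
-- Each row is one edge in the role hierarchy: a child role granted to a parent role.
SELECT NAME         AS child_role,
       GRANTEE_NAME AS parent_role
FROM SNOWFLAKE.ACCOUNT_USAGE.GRANTS_TO_ROLES
WHERE GRANTED_ON = 'ROLE'
  AND PRIVILEGE  = 'USAGE'
  AND DELETED_ON IS NULL;
```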
(Rich's connection dropped)
Jared Hillam (31:57)
PRICING: Alright, so what I want to do is walk you through the costing. If you go to flowobjects.com and then click on Products, under Auditing & Monitoring it'll bring you in here, where you can see what the pricing is. The pricing is $1,200 a month, which translates to about $14,400 a year. So it's not over-the-top, and there are a lot of features being packed in here. So there's a lot of value for that.
If you're interested in FlowObjects and if you'd like us to present it to your organization, I'd recommend you reach out to us. And of course, if you're interested in getting a quote for your organization, it's priced per instance. So if you have multiple instances of Snowflake, then we can go ahead and work from each of the instances. Just to give you a scenario that might make sense there: imagine you're a global organization and you have multiple divisions, each with their own Snowflake instance - we license it by instance.
And that concludes our presentation on the FlowObjects Auditing & Monitoring. If you're interested in connecting with us about this, please do reach out. We will catch you all later. Thank you so much.
As an Enterprise Data Solutions Architect at INTRICITY, Rich has deployed Snowflake in several dozen Fortune 500 accounts and designs and delivers orchestration, modeling, security and strategy to enable elastic cloud data warehousing.
Jared acts as our Vice President of Emerging Technologies and is the host of our Snowflake ProTalk webinar series. He has been listed as one of the “Top Marketing Operations Leaders You Should Know” and is the author and creator of our INTRICITY101 video series.