Augie Gonzalez: So, today we’re going to talk a little bit about a fairly modern issue, which is a result of so many monolithic designs being replaced by purpose-built gear. What I want to give you is a simple, three-step process that helps you convert this collection of diverse block, file, and cloud storage into something that can be treated more as a resource pool. We’ll do that in the next few minutes, so you should be able to walk out of here with some really actionable insights. The choices that we have in front of us today come as a result of tuning storage for particular purposes. Most of us look at block storage as the place where we would drive anything that has low latency demands and very high IOPS [weave], whereas we treat object storage and cloud as the place for durability, elasticity, long-term retention, and archive. And then everything in the middle falls into file services and NAS.
So, that creates some predicaments for us, and I think you can appreciate the difference between having a single system that somehow tries to accomplish all those objectives and having three or four, or some diverse number, of these. One of the biggest problems is that they all have their own unique way to provision capacity and protect the data. So, that’s issue number one. The second part is that these are essentially islands of storage. They’re silos. Consequently, moving data from one to the other as it ages or becomes less relevant is a manual task for somebody, and it seldom gets done. So, while you would think, well, I have these three different types of buckets and I really would like to relocate data when appropriate to the best fit, in practice, that doesn’t happen, just because there’s no time and there’s no fidelity in the management to be able to do that.
So, what I want to encourage you to do is think of putting all of them under the hood of a common control plane, where we can have a unified set of data services that operate infrastructure-wide across these, and treat them not as silos anymore, but rather as tiers of storage that have a role throughout the life cycle of the data. For that, the first thing we need to do is layer some software on top of them. That software helps us treat all of these very different kinds of storage as different elements in a bigger storage pool. By pool, I mean a federation or aggregation of those resources, so that they appear as multiple resources under a single broker, and that broker can determine, at any point in time, which files should be put in high-performance storage and which ones should be relegated to lower-cost capacity. Whether those are coming in through NFS or SMB file shares, or through the Amazon S3 API, it matters not. They all share the characteristic that they go from being hot data to moderate to chilled-out information, and that naturally causes the traversal of these different types of resources.
So today, I’m going to give you a little bit of a sneak peek at an announcement that we’re making on November 14th. I’m only going to share a bit of the information with you, so as to leave some of the mystery for later, but enough for you to run with. I won’t get into all the comprehensive data services that we’re offering at that point, but I will give you enough to work through. At the outset I said I could create this collection and aggregation in three steps. I may have shaded that a little bit. When I put it all together, you’ll find that it’s about five steps, but three are mandatory, and the other two are optional.
So, the first part, and I’ll give you some more graphics behind this, is the idea of assimilating the existing file systems, the shares that you have in place, and putting them under the wing of what would essentially appear to be a virtual scale-out NAS, but without touching the data. The data remains in place in the repositories that currently exist, so there’s no migration as you would think of it. In order to take advantage of these new functionalities, you’d basically just say, look, I need some other intelligence above what sits in place now in order to combine these into something in the aggregate.
And the other thing I do as part of this assimilation process – some people call it ingestion, but assimilation is probably a better word, is to retain all the file attributes, all the permissions, the ownerships – all of those things that determine how access controls operate, and who is the owner and who has rights to this.
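As a rough illustration of what "assimilating without touching the data" could mean, here is a minimal Python sketch. It is not the DataCore implementation; the `FileRecord` type and the `assimilate` function are hypothetical names invented for this example. It assumes the share is mounted locally, and it only reads attributes (size, permission bits, owner, timestamps) while leaving file contents in place.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Attributes captured for one file (hypothetical record layout)."""
    share: str      # label for the source share, chosen by the caller
    rel_path: str   # path relative to the share root
    size: int       # size in bytes
    mode: int       # permission bits only
    uid: int        # owning user id
    gid: int        # owning group id
    mtime: float    # last modification time
    atime: float    # last access time


def assimilate(share_name, root):
    """Walk a mounted share and record attributes only.

    File contents are never opened, copied, or moved.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            records.append(FileRecord(
                share=share_name,
                rel_path=os.path.relpath(full, root),
                size=st.st_size,
                mode=stat.S_IMODE(st.st_mode),
                uid=st.st_uid,
                gid=st.st_gid,
                mtime=st.st_mtime,
                atime=st.st_atime,
            ))
    return records
```

Because only metadata is collected, the cost of a pass like this scales with the number of files rather than with the volume of data stored.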
The second part, then, in this process, is to let the software take inventory of all those things that exist, that collection of unstructured data, and create what then becomes one global, scalable, searchable catalogue. This is sometimes known as a global Namespace, and that global Namespace spans the collection of individual silos that you have. From that point on, as you’ll see in a subsequent graphic, neither the users nor the administrative staff have to know the specific filer or the specific NAS box on which the data sits. We can separate how the data is stored from how we access it.
And in fact, how we access it is also independent of protocol. So, the same file that may have been placed by a Linux system using NFS can also be accessed from a Windows system using the SMB protocol, or from a new, modern application that may be using the S3 API. Any of those are equally valid ways to get at the information.
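The idea of separating how data is stored from how it is accessed can be sketched as a catalogue that maps a stable logical path to whatever backend currently holds the file. This is a toy model under stated assumptions, not the product's design: `GlobalCatalog`, `Location`, and the backend names are all made up for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Location:
    backend: str        # hypothetical backend label, e.g. "netapp-01"
    physical_path: str  # where the file actually lives on that backend


class GlobalCatalog:
    """Toy global namespace: logical path -> current physical location."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, backend, physical_path):
        self._entries[logical_path] = Location(backend, physical_path)

    def resolve(self, logical_path):
        # Callers ask by logical path only; the backend is looked up here.
        return self._entries[logical_path]

    def search(self, pattern):
        # Search spans every backend because it runs against the catalogue.
        return sorted(p for p in self._entries if fnmatchcase(p, pattern))

    def relocate(self, logical_path, backend, physical_path):
        # The data moved; the logical path that users see does not change.
        if logical_path not in self._entries:
            raise KeyError(logical_path)
        self._entries[logical_path] = Location(backend, physical_path)


catalog = GlobalCatalog()
catalog.register("/marketing/launch.mp4", "isilon-01", "/ifs/mkt/launch.mp4")
catalog.register("/hr/handbook.pdf", "netapp-01", "/vol/hr/handbook.pdf")
catalog.relocate("/marketing/launch.mp4", "cloud-archive", "mkt/launch.mp4")
```

In this sketch an NFS, SMB, or S3 front end would all call `resolve` with the same logical path, which is why relocating a file does not disturb any of them.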
The third step here is allowing the software layer to load balance new requests across the available storage systems. So, rather than having someone manually decide, well, this share is starting to get loaded on this particular storage box, or this NAS, and have to move users off of it and create another Namespace and create yet more confusion, what we allow the software to say is, I understand how that capacity is being exhausted; let me make sure I distribute the consumption of capacity well across the available systems, based on some criteria that are site-specific.
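One simple way to picture capacity-aware placement is a rule that sends each new file to the least-utilized system that still has room. The sketch below assumes that rule purely for illustration; the talk only says the criteria are site-specific, and the `Backend` type and `place` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self):
        # Fraction of capacity currently consumed.
        return self.used_gb / self.capacity_gb


def place(backends, size_gb):
    """Pick the least-utilized backend with enough free space.

    Updates that backend's usage and returns its name.
    """
    candidates = [b for b in backends if b.used_gb + size_gb <= b.capacity_gb]
    if not candidates:
        raise RuntimeError("no backend has room for this file")
    target = min(candidates, key=lambda b: b.utilization)
    target.used_gb += size_gb
    return target.name


pool = [Backend("nas-a", 100.0, 90.0), Backend("nas-b", 100.0, 10.0)]
chosen = place(pool, 5.0)
```

A real policy would likely weigh more than free space (performance, cost, locality), but the shape is the same: the decision moves from an administrator to a rule evaluated on every write.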
So, those are the three main things that we want to get done right away. The next things then become advanced functions that can help mitigate some of the costs. Here, what we can do is, for one, expand the capacity of what we have on-prem by treating cloud storage as an extension to that global pool, and simply let the software say, whenever certain conditions exist and the information is deemed appropriate, I shall move it off on-prem storage to the lower-cost storage, where I have much more elastic space for it. So, we’ll walk through that in a little bit more detail.
Dynamic File Relocation
Then, the final element here is where a lot of the smarts go, and that’s the ability of the software, through some hints, to dynamically relocate the information to the best device, based on what we call business intent. That is, what is your objective? How are you classifying this particular file or object? Is it the most relevant thing, the most interesting thing that you’re after, or is it something that perhaps has aged to the point where it really merits being placed somewhere else, outside of the high-performance storage, so that we can leave more room for things that actually need that kind of performance?
And that same approach applies whether the information is saved as files or objects. So, before I give you any more information on this, I’d like to understand a little bit about your current environment. And David, if you could help me with this poll.
David: Absolutely. We have a poll here. I’ll just bring it up. And the question is: where do your file shares sit? And there are a number of answers here: NetApp Filers, Isilon, other NAS, Windows SMB/CIFS/NFS Servers, Linux File Servers, or maybe you have a mixture. So, I’ll give you a minute to answer this question and then I’ll share the results with you so you can see how you stack up with your peers who are on the event today.
Gonzalez: Yeah, so, as I’m seeing the dials move, it looks like the dominant group have a mix of them, as I would expect, certainly from the audience profile that I understand, it makes sense that that would be the case, because some of you are – likely have built-out some number of NAS systems, and then you’ve supplemented that with general purpose – kind of turned some Linux servers into shares and maybe some Windows servers [in] some – a little bit of that – and that seems to be about where the strongest mix is that I can tell from, Dave.
David: Absolutely, yeah, 47 percent said a mix, followed by 30 percent who are using Windows with a variety of different protocols.
Gonzalez: Okay. So, let me show you exactly how we would deal with this. Here is a screenshot from the software that helps us visualize what I mean by assimilating. What I can immediately tell the system is, okay, there are devices out here that have certain personalities to them, as you can see here. For those of you who are running Isilon, or running some of the NetApp, whether it’s cluster mode or the seven series, we can ingest that and put it under that Namespace, as well as any of the NFS-class machines, whether that’s a Windows server with the NFS role on or not; that’s where Windows shares would fall.
So, that’s where the data already exists; that should be your existing Namespace. What we’re going to do is then take that under the uber management of the DataCore software, and it starts, from the sidelines, without being in band or anything, basically saying okay, now I understand all those shares; I know the files and the directory structure and I know who they belong to and how I should enforce access to this. Then the second thing I do is ask of the site, where would you like to supplement the existing on-prem storage for things that could go on lower-cost capacity? And these are the choices that I can give you: I could say, well, I want to also expand the overall space by looking at some buckets in the Amazon cloud, or in the Azure cloud or Google cloud, or I may have some object storage systems that I’m considering putting on-prem, and that’s what will provide that elastic storage for me. That will have a much lower cost per terabyte than any of the other systems, but I understand those tradeoffs.
And so that is where I’m going to be placing any of this less relevant information that I want to archive. And it’s that simple. It’s basically, here’s what I have, and here’s what I’m going to add, and now I understand all of the assets at my disposal. So, under this view, we are no longer bound by the hierarchies that we’re accustomed to with file shares, for example. Today, the way you’re likely structured is that individual file shares sit on specific devices or servers, and access to them is governed by first understanding which location they’re at, and then navigating that directory folder hierarchy. That can be quite difficult as things move from one of these buckets to another, and as people fail to follow the discipline that you had hoped for; all of a sudden the folks in [APAC] are putting some of their stuff out of necessity, because they ran out of room on the Americas folder, and maybe the HR people are having to open up some of their space to the support team, because they’ve run out of space.
So, that’s how things deteriorate over time. Under this global Namespace, here defined as omniShare, we basically distance ourselves from the way the information is stored. We simply say we have a collection of this; the directory structures remain in place, but we are not bound by them. Instead, now when I make a request, when I’m searching for something, it can look across this entire address space and find it, rather than my having to pinpoint, hunting and pecking, where it is. That allows me to do things in a pretty interesting way. For example, from a sysadmin or devops standpoint, I can make a request of the software to say, archive anything that’s been untouched in six months. Clearly, nobody is doing anything with this. Why keep it around here? Let’s move that to this Amazon cloud, where I think it’s going to be a lot cheaper for us to keep it. And so, we can make that request in a very straightforward way, simply by stating these objectives in this panel.
So, here, if you open this up and expand it a little bit, what you would see is that we’re looking for very specific search criteria in this example, as to what to place somewhere else. What this is saying is, I want to take any MP4 files that I find here that haven’t been active, haven’t been touched, haven’t been accessed at all over the past few months, and I’m going to place them in Azure, in Azure Blob storage. For the MP4 files that have been touched more recently than that, I’m not doing a thing with those. Just leave them where they are. In fact, I want to make sure they don’t get put in the object store. Customers have already found that they are identifying a bunch of wasted space from these video clips that are sitting around, unbeknownst to them, tying up a lot of room. And so, this is one way, first, to identify the fact that you’ve got a bunch of those, and, if you still choose to keep them, to move them off-site, where they’re going to cost you a lot less money.
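The MP4 rule described here boils down to a filter on file type and idle time. The following Python sketch shows that logic only; it is not the product's policy syntax. The `Entry` type, the `apply_archive_policy` function, the `"azure-blob"` tier label, and the 180-day threshold (borrowed from the earlier "six months" example, since "the past few months" is not specified) are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    path: str
    last_access: datetime
    tier: str  # hypothetical label for where the file currently lives


def apply_archive_policy(entries, now, suffix=".mp4",
                         max_idle=timedelta(days=180), target="azure-blob"):
    """Retarget files of one type that have been idle longer than max_idle.

    Files accessed more recently, and files of other types, are left alone.
    Returns the paths that were retargeted.
    """
    moved = []
    for entry in entries:
        idle = now - entry.last_access
        if entry.path.lower().endswith(suffix) and idle > max_idle:
            entry.tier = target
            moved.append(entry.path)
    return moved


now = datetime(2024, 1, 1)
files = [
    Entry("/video/old_demo.mp4", datetime(2023, 1, 1), "nas"),
    Entry("/video/new_demo.mp4", datetime(2023, 12, 1), "nas"),
    Entry("/docs/old_spec.docx", datetime(2022, 6, 1), "nas"),
]
moved = apply_archive_policy(files, now)
```

Running the same filter without changing `tier` gives a report of how much space such files occupy, which matches the "first identify, then decide" use described above.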
These same kinds of objectives can then be stated for different criteria, with the intent being that you define what drives the data placement. I can use several different parameters to establish that. Those policies or business objectives can be based on performance characteristics, or on resiliency, that is, how many nines does this particular file, or this directory, or this project collection need to have? And based on that, the software puts it in the right place. Or it may be based on cost: the maximum I want to spend on this class of files should be held to so many cents per gigabyte. And I can also be as explicit as saying this group of files should never leave this location, because I may have, for example, GDPR requirements, or some other regulation, that says those should be confined to a particular area.
And aging is another natural one: things that just get old. So, I can make these policies very specific, or, as we say here, fuzzy. Fuzzy means I’m not telling you exactly what it is, but I’m giving you some hint, such as, I really want five nines; I’m not telling you specifically what kind of storage it should be on. The telemetry that the system gathers, and the additional hints that you give it about the storage devices underneath, will determine where this is placed, and as systems are put in and out of service, those decisions change accordingly. So, that’s part of the machine learning the system is doing. It’s actually gauging the response, and the cost, of the backend storage, and from that, aligning your data placement to match your objectives.
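To make the idea of "fuzzy" objectives concrete, here is a small Python sketch in which an objective states requirements (nines, cost ceiling, latency ceiling, allowed regions) and a function picks the cheapest tier that satisfies them. This is a static rule match, not the telemetry-driven machine learning described in the talk, and every name and number in it (`Tier`, `Objective`, `choose_tier`, the tier attributes) is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    nines: int           # advertised availability, e.g. 5 for 99.999%
    cost_per_gb: float   # illustrative cost figure
    latency_ms: float    # illustrative typical latency
    region: str


@dataclass(frozen=True)
class Objective:
    min_nines: int = 0
    max_cost_per_gb: Optional[float] = None
    max_latency_ms: Optional[float] = None
    allowed_regions: Optional[FrozenSet[str]] = None


def choose_tier(tiers, obj):
    """Return the cheapest tier meeting every stated requirement, or None."""
    def satisfies(t):
        return (
            t.nines >= obj.min_nines
            and (obj.max_cost_per_gb is None or t.cost_per_gb <= obj.max_cost_per_gb)
            and (obj.max_latency_ms is None or t.latency_ms <= obj.max_latency_ms)
            and (obj.allowed_regions is None or t.region in obj.allowed_regions)
        )
    eligible = [t for t in tiers if satisfies(t)]
    if not eligible:
        return None
    return min(eligible, key=lambda t: t.cost_per_gb)


tiers = [
    Tier("flash", nines=5, cost_per_gb=0.30, latency_ms=1.0, region="eu"),
    Tier("nas", nines=4, cost_per_gb=0.10, latency_ms=5.0, region="eu"),
    Tier("cloud", nines=5, cost_per_gb=0.02, latency_ms=80.0, region="us"),
]
five_nines_anywhere = choose_tier(tiers, Objective(min_nines=5))
five_nines_in_eu = choose_tier(
    tiers, Objective(min_nines=5, allowed_regions=frozenset({"eu"})))
```

The objective never names a storage system. Adding or retiring a tier changes the answer without anyone rewriting the policy, which is the property the speaker attributes to fuzzy policies.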
This is all done at the file level, so it’s much more granular. Most of us have been accustomed to dealing in coarse volumes, and all of a sudden, all we wanted was two of the – two files out of this entire directory to be put in high-performance storage, but we’re forced to put the entire thing instead there, or to move the entire thing to a different storage system. And that’s just overkill and excess baggage. It also costs a lot of bandwidth. So here, we’re able to do this at the file level. So, with that, I’d like to get another poll from you. David?
David: Absolutely, yeah. The next poll here is: what percentage of your data is inactive? So, you’re not using it, it’s just sitting there, it’s inactive. I’m curious to see the responses to this. The options are more than 80 percent; 60 to 80; 30 to 59; or less than 30 percent.
All right, a lot of –
Gonzalez: I’m actually surprised –
David: – feedback coming in.
Gonzalez: [Laughter] Yeah, I’m surprised to see the less than 30 percent, because I thought that would be pretty much vacant. But I do see a few people. And, yes, okay, a lot of them are in the 30 to 59 percent, which is, you know, just a guess, generally, because that’s what we’re trying to do. We’re kind of gauging that based on our past experiences.
David: Yeah, absolutely. In fact, someone just said in the questions box there, “This is a guess.” Because I’m sure there are a lot of companies out there who don’t quite know the answer to this question.
All right. Let me share the results, and it looks like 43 percent said 30-to-59; followed by 60-to-80 percent, which were 29 percent of the respondents. What do you think [unintelligible 0:19:31]?
Gonzalez: Yeah, it’s a big chunk of change that’s sitting out there, and it’s probably occupying premium resources, so that’s the other part. So, what can we do about that? Well, first we can be much more deliberate in estimating that, and part of what the tools that DataCore provides do is give you insight into that; to actually look at that collection, look at the metadata on the files, and from that, deduce what is active and inactive, so you’re not guessing. And you can say inactivity to me means something hasn’t been touched in four months, or it could mean something that hasn’t been touched in two years. So, it is very much site- and organization-specific. Now, what we can do immediately to free up that space and put the data in the right bucket is use the software to basically say, given those criteria, anything that was inactive under that definition and was tying up your premium storage here, let’s move it, globally de-dupe it, compress it, and then put it in the low-cost capacity storage that you’ve defined for it.
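Replacing the guess with a measurement can be as simple as summing sizes from file metadata against a site-chosen idle threshold. The sketch below assumes that metadata has already been collected as `(size_bytes, last_access)` pairs; the `inactive_share` function and the sample figures are invented for illustration.

```python
from datetime import datetime, timedelta


def inactive_share(files, now, idle_threshold):
    """Percentage of stored bytes not accessed within idle_threshold.

    `files` is a list of (size_bytes, last_access) pairs.
    """
    total = sum(size for size, _last in files)
    if total == 0:
        return 0.0
    inactive = sum(size for size, last in files if now - last > idle_threshold)
    return 100.0 * inactive / total


now = datetime(2024, 1, 1)
sample = [
    (600, datetime(2022, 1, 1)),    # idle for about two years
    (400, datetime(2023, 12, 15)),  # touched a couple of weeks ago
]
four_month_view = inactive_share(sample, now, timedelta(days=120))
three_year_view = inactive_share(sample, now, timedelta(days=3 * 365))
```

The two calls show how much the answer depends on the definition of inactivity: the same data is 60 percent inactive under a four-month rule and 0 percent inactive under a three-year rule.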
That does two things: it saves a boatload of money, and it also frees up these resources, the premium storage, so you can potentially defer the purchase of another NAS. The other thing it does is reduce the load on that system, because much of the load is just the fact that the system is busy trying to put new stuff in a place that’s already heavily occupied, and is having to jump through hoops to do that. No longer is that the case. So, you’re getting better use of the resources you have by freeing them from this extra burden. Now, this can happen at the same time that you’re moving new gear in. So, don’t let this indicate that somehow you’re tied to those particular systems. At any point, you could say, look, this individual storage device, this file server, or this NAS is either coming off lease, or it’s used up its financial life. I want to replace it, and I can decommission it in the background, and the software will say, okay, anything that was stored here, let me distribute that amongst the surviving resources and any new gear that you put in. And that kind of migration in the background happens completely transparently, without ever changing the Namespace or the process that you’re using internally. So, this makes the injection of modern technology super-trivial, where today it’s a nightmare.
So, in conclusion, what we’re basically providing with the new product capabilities out of DataCore is, first of all, unprecedented visibility and control over unstructured data and the capacity that it’s occupying. And it’s done in a very simple way, and a very quick way as well, because this process of assimilating only takes minutes. It’s not like I have to wait for all these petabytes of data to be moved somewhere, because we’re only looking at the attributes, not having to move the files anywhere. The efficiency is clearly one of the outstanding benefits: the data is put in the right place, where it’s most cost-effective, and I can do this flexibly, regardless of the makeup of the storage devices at my disposal. So, with that, I think we are in good shape.