Webcast - First aired on
1:00 pm EDT

Migrating from Storage Silos to Tiered Data Pools (Modernization without Aggravation)

Augie Gonzalez

Director of Product Marketing

DataCore Software

Webcast Transcript

David:  I’m excited to introduce Augie Gonzalez, director of product marketing at DataCore.  Augie, are you there?

Augie Gonzalez: I am, David.  How are you?

David: I’m doing excellent.  Thank you for being on; take it away.

Augie Gonzalez: Good, so the topic for our discussion today has a lot to do with modernization.  A lot of the things that you’re hearing throughout this Eco Cast have to do with moving to new gear.  And what often goes unspoken when one does that is the level of effort to migrate data from one piece of equipment to another.  So what I want to do is walk through that, especially if you’ve been in an environment where there are a number of storage silos that you’re working through, and a mixed bag of those, and now you’re in a situation where you’re trying to replace some of that gear or expand on it.  One of the examples we run into quite frequently is a collection of distributed network-attached storage and a bunch of filers where people’s key information is sitting.

And that may be mixed in with a good batch of storage area networks, the SANs, maybe even some new HCI storage clusters that are coming into the mix.  And all of these represent small islands of data that tend to be compartmentalized, their own little sandboxes.  And when you combine all of these and look at the number of different tools you’re having to use to administer them, and your inability to take advantage of their mutual excess capacity, it does make you wonder what you could do to put those to better use.  That’s especially true from the IT perspective, but it’s also true from the user’s perspective.

So they are having to suffer with the idea of having data scattered all over the place, where only if they know the location can they possibly arrive at the actual source file they’re looking for.  So one of the things that we’ve seen is that from a data migration standpoint there are a number of steps that are recommended.  There’s a pretty rigorous process to do that if you’re doing it well, but that process usually is either violated or short-circuited, might be a good way to put it, with just ad hoc kinds of moves.  What we want to do is take a more disciplined approach to that, one that can be used on a recurring basis to take advantage of the resources at our disposal.  So part of the proposal that I will be making to you today is the idea of using a federated approach, or a storage pooling approach.

By pooling, I mean collecting or aggregating a number of the storage resources that sit in isolation and putting them in a collective, aggregated view, and then using software not only to create that pool but also to balance the load across them.  I’ll work through some diagrams on this shortly.  The important element of this decision to federate the resources is also to be able to classify them into tiers, because not all of those silos are created equal.  There may be some that are pretty premium, maybe some all-flash arrays, some midrange equipment, and some cheap and deep stuff where you’re putting some of your archive.  So let’s make collective sense of that and see how best to organize them.

With that in mind, use a common control plane where these data services look uniform, so that we can automate some of the movement between them.  This is a pretty good picture of the kind of current state you might be in, on the left, and the proposed state, on the right.  So on the left what we’re seeing is the individual pieces of storage, whether those are individual NAS boxes, or file servers, or SANs.  And what we are doing from the DataCore perspective is placing those all under this common control point and then up-leveling the functions that one uses to provision capacity from there.

The tools that you would use to do data protection and migration apply uniformly across all of the members of that pool, because they’re basically treated as backend resources.  And this has the benefit of giving you a single view of your entire collection of gear, and a number of other benefits.  Really important is that when you have these silos, there are usually some that are up to their necks, full of files, while others may have quite a bit of vacancy.  By pooling them you’re able to redistribute the capacity and put it all to use, so you can often defer buying some capacity as a result of that; pretty big chunks can be freed up.  You also tend to vacate space on your most premium resources that may be used poorly.

And that, in effect, lets you defer buying additional capacity and spending on more high-end gear.  There’s also the ability to take the perspective, from that point on, that storage devices are interchangeable.  So let’s look at that in a little bit more detail in this next picture.  Essentially, by having this software layer, or storage virtualization layer, sitting between the consumers of storage and the place where the bits are held, we can play those games.  We can set up these different tiers, tier 1, tier 2, tier 3 type storage, and at any time substitute the most desirable piece of gear for any of those tiers.  So at one point you may have a real favorite vendor, a favorite model from them that you like to use.

Sometime in the future you’d like to swap something in its place because its useful life has expired, and we can make that transparent, without disruption.  So you can move things in and out of here very quickly without disturbing the consumers.  And those consumers may be coming in over file shares; in some cases they’re coming in over iSCSI or Fibre Channel connections.  Either way, DataCore offers solutions for both of those.  In the process, the other thing that happens is the transparent migration of data from the premium resources to lower-cost ones.  So as the value of data decreases over time, because it ages or simply goes unused, there’s no sense tying up your most precious assets to hold that information.  Instead we move it, through an auto-tiering technique, off to the lower-cost storage.

And eventually, if you need to, go ahead and archive it off to the cloud.  This puts you in an unusually good bargaining position to select from competing alternatives without having concerns about mutual compatibility or trying to match up with what you had before, because these services are occurring at a higher level.  And it also lets you add capacity pretty much on the fly when you need it.  You don’t have to buy ahead; you buy it basically on demand.  And you get better distribution of the load across your on-prem storage.  There’s a little bit of animation on this.  This is one of the ways that you can go ahead and expand the environment effortlessly.

So what you’ll see is that we can bring a new piece of gear in here.  Nobody needs to know about it, and we don’t have to take any planned downtime to accomplish that.  The software simply says, hey, the pool has grown, I have new capacity, and I also have a new level or new tier that I can drop that information into based on the policies that I’ve established.  So some things that would’ve gravitated from tier 1 to tier 2 can now do that seamlessly behind the scenes.  A similar thing is the ability to choose.  The choice that you have now is that your hands are no longer tied in terms of using the best hardware for the job, because you distance yourself from the nuances and the peculiarities of the individual hardware.

The same is true when you’re refreshing hardware, when you’re replacing it.  That’s often the thing that most of you are probably shopping around for right now: how do I modernize what I have, and in the process how do I clean out, reclaim, or remove the data that I have on the other things without intruding and encroaching on the users and having to disturb them?  So this allows you that privilege.  It allows you to go ahead and swap out stuff behind the scenes again, and here’s a similar animation for that.  You basically tell the software, here’s what I want to do: I have a new piece of gear that I’ve introduced into the pool, and it’s going to replace this other element that I had, which is long in the tooth now.

So please evacuate the other one and move its data where appropriate.  Appropriate may mean, for example, that part of those files move onto the new piece of gear, but some may actually be redistributed across other parts of the pool based on the policies that we’ve established for this site and based on the aging and relevance that we talked about.  And then at the point that that evacuation is complete, the software basically says, OK, what do you want to do with this now?  You want to just go ahead and decommission it?  Fine.  And so it’s removed.  In this case we have an Isilon NAS that had been a storage system inside the pool, and we’ve decided that it is no longer valuable to us.  Whatever the reason is, maybe the lease has expired, or its useful financial life has been exhausted.
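As an illustration of the evacuation step described above, here is a minimal Python sketch.  It is not DataCore’s implementation: the `Device` class, the `evacuate` function, the device names, and the simple "send each file to the device with the most free space" rule are all assumptions made up for this example, standing in for whatever policies a real pool would apply.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical pool member; files maps a file name to its size in GB."""
    name: str
    capacity_gb: int
    files: dict = field(default_factory=dict)

    @property
    def used_gb(self):
        return sum(self.files.values())

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


def evacuate(retiring, targets):
    """Move every file off `retiring` onto the target with the most free space."""
    moves = []
    # Place the largest files first so they are less likely to be stranded.
    for fname, size in sorted(retiring.files.items(), key=lambda kv: -kv[1]):
        dest = max(targets, key=lambda d: d.free_gb)
        if dest.free_gb < size:
            raise RuntimeError(f"no room left in the pool for {fname}")
        dest.files[fname] = size
        del retiring.files[fname]
        moves.append((fname, dest.name))
    return moves


# Assumed example data: an aging NAS, a new replacement, and a midrange box.
old = Device("old-nas", 100, {"a.mp4": 40, "b.db": 30, "c.log": 10})
new = Device("new-nas", 60)
mid = Device("midrange", 50, {"x.bak": 20})

moves = evacuate(old, [new, mid])
ready_to_decommission = old.used_gb == 0
```

In this sketch some files land on the new device and one is redistributed to the midrange box, and the old device is only flagged as ready to decommission once it holds nothing.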

So at that point, we remove it.  Its bits have been relocated, moved around to the appropriate buckets on other machines, and no one is the wiser for it.  It’s that straightforward, and that whole concern about migration and disruption and everything else goes away in this environment.  The other thing that’s occurring throughout this process is that we’re unifying the tools that we are using.  So the same techniques that you’re using to provision virtual volumes, to protect against outages, to do your backups and your DR replication practices, those are constants, irrespective of the make or model of the elements in the pool.  I will clarify, for those of you who might be looking at the minutiae here, that for some of the device-specific components there may be tools you still have to use to originally set up those individual members of the pool.

And for any detailed troubleshooting, the supplier may still use those tools.  What we’re talking about here is a control plane that deals at a higher level with the general provisioning and care and feeding of the overall infrastructure, not the individual devices.  And to give you a better sense for that, this is a collection of the feature set that’s available through DataCore’s vFilO, a distributed file and object storage virtualization product.  You can see it deals with all the things that you’re looking to do in a device-independent fashion.  So we can do auto placement, and I’ll show you a little bit about that, as well as data mobility and load balancing.

Load balancing, I think, is one of the really hard ones to do in a siloed environment.  I know that in our own shop in the past we’ve had the issue where file servers get full, and they ask us to back them up and move them somewhere, and they’re going to have to relocate some of our shares to some other places.  All that manual intervention is pretty untenable at times, and often what it leads to is, hey, I’m not even going to bother, I’m just going to have to buy another file share because I’m not willing to go through that.  With a product like vFilO, what you’re able to do is basically say,

I know how much capacity I have out there, so let me just distribute as new files get added to this mix.  I know which buckets have space on them and I’ll move data there, and I can also set upper limits on how full I’m willing to tolerate any one of these systems getting before I move somewhere else.  So it kind of ping-pongs between the available resources.  Through this collection of functions, then, we learn them once.  We don’t have to worry when you bring in new gear; you’re still using the same tool to provision, protect data, do your archives, and things like that.  And the automation then remains intact.  So you’re not having to learn every new tool, and that no longer is a constraint on who you’re shopping from.
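The placement idea above, new files going to whichever system has room, subject to an upper fill limit, can be sketched in a few lines of Python.  This is only an illustrative assumption of how such a rule might look, not vFilO’s actual algorithm: the `place` function, the 85% default limit, and the filer names are all hypothetical.

```python
def place(file_size_gb, systems, max_fill=0.85):
    """Pick the system that would be least full after the write.

    `systems` maps a name to a (used_gb, capacity_gb) tuple and is updated
    in place.  Systems that would exceed `max_fill` are skipped.
    """
    candidates = []
    for name, (used, capacity) in systems.items():
        fill_after = (used + file_size_gb) / capacity
        if fill_after <= max_fill:
            candidates.append((fill_after, name))
    if not candidates:
        raise RuntimeError("every system would exceed the fill limit")
    _, chosen = min(candidates)
    used, capacity = systems[chosen]
    systems[chosen] = (used + file_size_gb, capacity)
    return chosen


# Assumed example pool: one nearly full filer and two with vacancy.
pool = {"filer-a": (80, 100), "filer-b": (40, 100), "filer-c": (45, 100)}
placements = [place(10, pool) for _ in range(4)]
```

With these made-up numbers the nearly full filer is never chosen, and successive writes alternate between the two filers that have space, which is the ping-pong behavior described in the talk.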

The added advantage of that is having much better visibility over the infrastructure and much better control over it at a macro level, as well as over where the data is placed.  So let’s take a look at what the software does on your behalf.  There are two or three really key functions here; let me jump ahead to some of the animations.  What you have is a way to tell the software, basically, these are the policies, these are the objectives that I want, based on characteristics of the data.  So I can say, for example, this piece of data from the finance group should never leave this particular site; it should never be migrated anywhere else.  However, I have a number of, let’s say, MP4 files, video files, and if we see that any of those haven’t been touched in six months, let’s move them to archive in the cloud.

That’s just a standard policy.  Nobody has to be looking at it; the software is kind of your [eye keeper], and it’s watching behind the scenes.  When any of those situations occur, it takes care of that data migration on your behalf without ever having to involve IT in the process.  Yet the file remains in the catalog.  So you still see the file name, unlike other situations where you would archive something and then delete it from the file hierarchy.  Here it remains in place, and if you need to go get it, it’s still there.  It’s just going to take a little bit longer to retrieve because the software’s going to have to go and rehydrate it from the cloud, bring it back over, and put it in the active space.
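The two example policies mentioned above (finance data pinned to its site, MP4 files untouched for six months archived to the cloud, with the catalog entry kept either way) can be sketched as follows.  This is a hedged illustration only: the `CatalogEntry` fields, the `apply_policies` function, the location labels, and the choice of 182 days for "six months" are assumptions, not the product’s policy language.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; the entry stays even if the data moves."""
    path: str
    owner_group: str
    last_access: datetime
    location: str  # e.g. "site-a" or "cloud-archive"


ARCHIVE_AFTER = timedelta(days=182)  # assumed stand-in for "six months"


def apply_policies(entry, now):
    """Return where the entry's data should live under the two example policies."""
    # Policy 1: finance data never leaves its current site.
    if entry.owner_group == "finance":
        return entry.location
    # Policy 2: MP4 files untouched for six months go to the cloud archive.
    if entry.path.lower().endswith(".mp4") and now - entry.last_access > ARCHIVE_AFTER:
        return "cloud-archive"
    return entry.location


now = datetime(2024, 1, 1)
catalog = [
    CatalogEntry("/finance/q3-review.mp4", "finance", datetime(2023, 1, 1), "site-a"),
    CatalogEntry("/media/demo.mp4", "marketing", datetime(2023, 5, 1), "site-a"),
    CatalogEntry("/media/launch.mp4", "marketing", datetime(2023, 12, 1), "site-a"),
]

for entry in catalog:
    # Only the location changes; nothing is removed from the catalog.
    entry.location = apply_policies(entry, now)
```

Note that the old finance video stays put even though it is stale, because the pinning rule is checked first, and that the catalog still lists all three files after the stale marketing video is archived.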

A number of these can also be related to durability and high availability.  So you can specify at a coarse level, or what we call a fuzzy level here, to simply say, look, I need five nines for some of this data, it’s really critical to the company, so make sure you’re spreading copies of it over multiple storage elements.  And it just goes and does that.  So you might be wondering how some of that happens, and a lot of it has to do with machine learning and artificial intelligence working behind the scenes.  The automation is not only on the data placement and tiering; it also applies to how the software fails over should one of these systems go offline, whether it was intentional or unintentional.  And a lot of that activity happens in parallel.  Now, how do you get the system in place initially?  I’m sure that’s one of your questions.
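To make the "five nines" objective concrete, here is a small worked calculation in Python.  It is a back-of-the-envelope sketch, not a description of how the software decides: it assumes each storage element fails independently and has a known availability, and the `copies_needed` function and the example availability figures are made up for illustration.  Under that assumption, data with n copies is unavailable only when all n elements are down at once, so availability is 1 - (1 - a)^n.

```python
def copies_needed(target, element_availability, max_copies=10):
    """Smallest number of copies whose combined availability meets `target`.

    Assumes independent failures, so n copies are all unavailable with
    probability (1 - element_availability) ** n.
    """
    if not (0 < target < 1 and 0 < element_availability < 1):
        raise ValueError("availabilities must be strictly between 0 and 1")
    all_down = 1.0
    for n in range(1, max_copies + 1):
        all_down *= 1 - element_availability
        if 1 - all_down >= target:
            return n
    raise RuntimeError("target not reachable within max_copies")


FIVE_NINES = 0.99999
copies_on_two_nines_gear = copies_needed(FIVE_NINES, 0.99)
copies_on_three_nines_gear = copies_needed(FIVE_NINES, 0.999)
```

Real failures are rarely fully independent (shared power, network, or site), so a figure from this kind of calculation is a lower bound on the copies needed rather than a guarantee.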

And that has to do with what we call assimilation.  Assimilation basically says, I am going to layer the software on top of your existing resources, I’m going to take inventory of what I have back there, and I’m going to use the metadata there to create the uber catalog, and from that point on, it becomes part of the pool.  I don’t actually have to move any data anywhere else; it all remains in place, and then I can choose when new equipment comes in to make those migrations.  So the original picture for you may have looked something like this.  This is the widely scattered data across multiple filers or NAS, where you have to have location knowledge in order to figure out where to search for information.
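The assimilation step described above, inventorying existing filers and building one catalog from their metadata while the data stays where it is, can be sketched like this.  It is a simplified illustration under stated assumptions: the `assimilate` function, the `/corp` prefix for the global share, and the filer listings are hypothetical, and a real system would read metadata from the devices rather than from an in-memory dictionary.

```python
def assimilate(filers, global_root="/corp"):
    """Build a single catalog from each filer's existing listing.

    `filers` maps a filer name to {path: size_gb}.  Nothing is copied or
    moved; the catalog only records where each file already lives.
    """
    catalog = {}
    for filer_name, listing in filers.items():
        for path, size_gb in listing.items():
            global_path = global_root + path
            if global_path in catalog:
                raise ValueError(f"path collision in global namespace: {global_path}")
            catalog[global_path] = {
                "filer": filer_name,
                "backend_path": path,
                "size_gb": size_gb,
            }
    return catalog


# Assumed example: data scattered across two existing filers.
filers = {
    "nas-1": {"/eng/design.doc": 1, "/eng/spec.pdf": 2},
    "nas-2": {"/finance/q3.xlsx": 1},
}
catalog = assimilate(filers)

# A user looks up the global path without knowing which filer holds it.
where = catalog["/corp/finance/q3.xlsx"]["filer"]
```

Because the catalog stores the backing location separately from the global path, a later migration would only need to update the `filer` field while the path users see stays the same.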

And obviously there’s the fragmentation of the file structure.  Under the vFilO single namespace, all those appear as part of one global corporate share, if you will.  And you don’t have to know anything about where the data is sitting, and the data may in fact be moving from bucket to bucket without you knowing it.  So that should give you a pretty good idea of what we’re up to, and the next thing we want to do is encourage you to contact one of our solution architects for much more insight into it.
