Webcast - First aired on
1:00 pm EDT

Disrupt without Disruption (How you achieve full control on your unstructured data with your existing storage environment)

Alfons Michels

Member of the Product Team

DataCore Software

Webcast Transcript

Alfons Michels: Yeah, Disrupt without Disruption is all about doing much more with what you already have.  So it’s about leveraging existing assets, but more efficiently and effectively, and also being prepared for the future, for example for the tremendous data growth, especially in the area of unstructured data storage.  Today’s session can also be seen as preparation, or setting the scene, for the deep dive we will have tomorrow.  If you take a look at the upper right corner under handouts, the first link is a link to that webcast on the same platform, but with a different presenter.  It’s a technical deep dive, so a full demo of what we will discuss briefly here today on slides.  My colleague Steve [Hunsaker] will hold that tomorrow at the same time, on the same platform, just [unintelligible 00:00:56] yourself and take a look at the proof of what I am presenting here in slides.

Let’s get started with a problem.  The problem with unstructured data is often that there are many different places where it is stored.  So you have, we call that [metric] grown infrastructure, different file servers or NAS systems or different shares, just a Linux share or Windows share, which are used to exchange files.  I mean Word files, MP4 files, photos, whatever.  And the problem here is that they are located on, let me say, separate islands.  You can connect them to each other, of course, with an appropriate file system.  Nevertheless, it’s always a different access point to get those files.  So if you know a file name, or even that what you’re looking for is a presentation or a photo, you also need to know where it’s stored.  So where is your access point to get to it?

This is the user’s perspective.  If you take a look at this from a management perspective, take for example the dark grey one, the support filer: if it has reached its capacity, it’s not simple to enlarge the capacity there, unless there is some space for additional disks in that filer.  Otherwise it’s a really tricky thing to store the data that belongs under this structure somewhere else.  Or you have to move it, which then is a painful migration.  And the ideal way to overcome this issue would be if everything were in one very large, single system.

If you take a look at that, it would overcome all these problems, because users can easily find their files from a central catalog.  They just type in that they are looking for a photo, and that the photo has, for example, summer in its name, and they will find it regardless of where it’s stored.  And from the administrative perspective, you have the central ability to set policies, for example: please archive all video files which were not touched during the past six months to a different system.  But take a look at this large system, and just think about the fact that 20% of the respondents say they have more than 20 of those sources.  How large would this system have to be?  And all the data from those systems, even if you have just five to nine shares, needs to be migrated into such a large system.
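The two ideas in this passage, a central catalog search and a central archive policy, can be illustrated with a minimal Python sketch. This is not vFilO code: `CatalogEntry`, `search`, `archive_candidates`, the share names and the sample dates are all hypothetical and only illustrate that both operations work on metadata, independent of where a file is stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CatalogEntry:
    """One file in a hypothetical central catalog (metadata only)."""
    name: str
    kind: str              # e.g. "photo" or "video"
    source: str            # the share or filer that actually holds the file
    last_access: datetime


def search(catalog, kind, name_contains):
    """Find files by type and name fragment, regardless of their source."""
    needle = name_contains.lower()
    return [e for e in catalog if e.kind == kind and needle in e.name.lower()]


def archive_candidates(catalog, now, kind="video", max_idle_days=180):
    """Select files of one type not accessed for roughly six months."""
    cutoff = now - timedelta(days=max_idle_days)
    return [e for e in catalog if e.kind == kind and e.last_access < cutoff]


# Illustrative sample data; names and dates are made up.
NOW = datetime(2020, 6, 1)
CATALOG = [
    CatalogEntry("Summer_party.jpg", "photo", "nas-marketing", datetime(2020, 5, 20)),
    CatalogEntry("kickoff.mp4", "video", "linux-share", datetime(2019, 9, 1)),
    CatalogEntry("demo.mp4", "video", "support-filer", datetime(2020, 5, 1)),
]
```

With this sample data, `search(CATALOG, "photo", "summer")` finds the photo without the caller naming a share, and `archive_candidates(CATALOG, NOW)` selects only the video that has been idle for more than 180 days.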

And the data migration can be painful and also very risky.  And if you have that large system, what about the capability and performance of that system?  Is it still ensured?  And what do you do when a hardware refresh needs to be done in a couple of years?  Then it’s the entire migration process again.  This is exactly where the Disrupt without Disruption approach comes into the game: it’s the software-defined approach, leveraging the existing resources we have, as well as network-attached storage, file and object storage.  And here is the before and after scenario for what we have seen earlier, but in a simplified way.

Today, most people have n sources, which means n storage systems or n places where the data is stored.  That also means n access paths, and n management interfaces from the administrative perspective.  With abstraction, or you can also say virtualization, you can change this, and this is done with software-defined storage.  After the change, you still have these n sources, but there is only one access point from the user’s perspective and one management interface from the administrator’s perspective, whereby all the existing assets are leveraged underneath.  And this brings me exactly to the introduction of DataCore vFilO, which does the abstraction of file and object storage from the underlying storage hardware.

It’s especially created for the needs of file and object management, so for unstructured data.  These requirements, or at least their level of importance, are different from the needs of structured data.  For structured data, for example, performance is often a very critical factor and a key element, whereas for the majority of unstructured data performance is just a secondary requirement.  Let’s have a look at what can be achieved by leveraging this software-defined storage.  Of course you have the virtualization and, very obviously, you have everything under one roof.  So you have a common catalog, and this is a consolidated namespace which covers several filers and sources.

So from the user perspective, and also from the data perspective, any file can easily be found, accessed, shared and, of course, also backed up or treated in whatever manner the administrator would like to treat the file.  The second obvious use case, and I haven’t touched on this before, is to leverage the cloud as lower-cost storage, from two angles.  One, of course, is as an archive, for archiving infrequently used files.  You need to know that here deduplication and compression can also be leveraged over the S3 interface in the background, which means that much less space is needed in the cloud.  But the cloud can also be used, by companies who do not have a second or third location, as a backup or DR location to store files, meaning having additional replicas of critical files stored in the cloud.
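Why deduplication plus compression reduces the space needed in an archive can be shown with a small Python sketch. It assumes whole-file deduplication by SHA-256 content hash and zlib compression; the webcast does not say how vFilO implements either, so both choices and the function name `archive_size` are illustrative assumptions only.

```python
import hashlib
import zlib


def archive_size(blobs):
    """Return the bytes needed to store the blobs after whole-file
    deduplication (by SHA-256 digest) and zlib compression.

    Assumption: identical files are stored once; real products may
    deduplicate at block level or use other compressors.
    """
    unique = {}
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in unique:
            # Only the first copy of a given content is compressed and kept.
            unique[digest] = zlib.compress(blob)
    return sum(len(compressed) for compressed in unique.values())


# Illustrative data: three identical files plus one different file.
SAMPLE = [b"a" * 10_000, b"a" * 10_000, b"a" * 10_000, b"b" * 10_000]
RAW_SIZE = sum(len(blob) for blob in SAMPLE)
```

In this sketch the three identical files cost the same as one, and the highly repetitive content compresses well, so `archive_size(SAMPLE)` is far below `RAW_SIZE`. Real savings depend entirely on the data.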

And last but not least, it’s also to scale out and load balance the data between the different data sources, so that you can keep up with the growth of unstructured data and balance not only the load but also the capacity between the available resources, so that you don’t have to take care of where everything is stored.  Now let’s have a look at how vFilO is structured.  This is, in a way, a description of the product, an overview.  So what are the possible consumers?  Of course we have physical servers hosting applications, and of course users running applications on PCs and laptops.  Then applications which are running in virtual machines and, of course, as an upcoming technology, applications running in containers.

Typically, when we talk about unstructured data, they access their data either with NFS, SMB or S3, and here vFilO of course provides operations and insights.  We will talk about the extensible metadata in particular in a little more detail later on, because this is not only a key capability, it’s also important for understanding the power of data about data.  This also explains how vFilO operates and what can be achieved by leveraging vFilO.  A further thing, also leveraging the underlying technologies, is data migration, and when I say data migration, I mean fully transparent data migration.  That means data can be moved in the background without any user even noticing that the data is being moved.

From the user’s perspective the files are still fully accessible, but in the background they are moved from one storage system to another, or even to the cloud, back and forth.  Of course any kind of charting, alerts and provisioning is also covered.  In terms of command and control, it’s important to keep the access controls, meaning the rights which are associated with each file, so they are still there.  In addition to a very intuitive [unintelligible 00:09:47] we also have a command-line interface and of course also plug-ins, for example to provide persistent storage for container virtualization.  But one point I would like to highlight here is the file granularity.

Everything I have talked about and will talk about can be done at file granularity.  This is especially important when you keep in mind the specific rules, and also if you would like to have several copies of just one file: the rules do not have to apply to a folder or to an entire system, everything can be done at file granularity.  And of course, underneath, any kind of storage is supported.  So any kind of classical block storage, typically attached with Fibre Channel or iSCSI, or directly attached, even with those interfaces.  Then of course any kind of file services, meaning attached by NFS or SMB; this could be for example [unintelligible 00:10:50] or any manner of system, or just a simple Windows or Linux file share.  Then local object storage via the S3 interface.  S3 is not always S3, and we support a variety of those.

And of course there is also leveraging the cloud, and this is one of the key use cases.  We touched on this: we support all major players like AWS, Azure or Google Cloud Storage and many more.  But at the heart of all that, we have the so-called data services, which define how the data is, let’s say, handled.  And here we have a lot of interesting things.  No worries, I won’t go through them one by one, I’ll just touch on a few.  For example, the active archive: the active archive is where you can move data seamlessly in the background, using the data migration and data mobility.

But from the user’s perspective, the files are still accessible, and they are also quickly accessible.  If somebody really opens a file and it has been moved to cloud storage, this might take a few milliseconds longer because of the connection.  But in terms of visibility, the data is still there.  Then auto placement: based on the metadata, we’ll see that later, you can set policies so that data is placed on the storage or resource in your storage pool which best fits the needs in terms of relevance and cost, not only performance.  And then of course the global namespace, but here again, it is like the S3 interface in the access methods, not to be mixed up with the S3 interface to the storage resources.

This is in planning and will come with a future release.  The global namespace enables you to have several installations with one single global namespace worldwide, to access all this data and a global catalog.  And the load balancing of course also covers the capacity side of things, so it means moving the data between the available capacities.  So what’s in it?  What does anybody get out of it?  The vFilO benefits and also differentiation: the most obvious one, of course, is control and visibility, because all unstructured data is accessed and administered centrally.  From a central access point, the data can follow central rules, and everybody can see what’s available.

Of course, users always see what’s available according to their rights.  This dramatically improves the collaboration between locations and of course also between users, because it’s no longer about remembering names.  It’s very simple to comply with business requirements through explicit policies and high-level objectives.  This is something we can present very nicely in the live demo.  There you can see that if, for example, you add information to the metadata saying that a certain file is not allowed to leave a region, or even a building, this policy can be applied automatically to any file, as we are talking about file-level granularity.  And you do not have to worry that there will be a copy in the cloud or on another service, just as an example.
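A metadata-driven placement rule like the "must not leave a region" example can be sketched in a few lines of Python. The tag name `restrict_region`, the classes `Target` and `FileMeta`, and the sample targets are hypothetical; the sketch only shows how a per-file metadata tag can filter the storage targets a copy may be placed on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    """A storage target with the region it is located in (hypothetical)."""
    name: str
    region: str


@dataclass
class FileMeta:
    """Per-file metadata; 'tags' stands in for extensible metadata."""
    path: str
    tags: dict = field(default_factory=dict)


def allowed_targets(meta, targets):
    """Return the targets this file may be placed or replicated on.

    If the file carries a 'restrict_region' tag, only targets in that
    region qualify; untagged files may go anywhere.
    """
    region = meta.tags.get("restrict_region")
    if region is None:
        return list(targets)
    return [t for t in targets if t.region == region]


# Illustrative targets; names and regions are made up.
TARGETS = [
    Target("premium-nas", "eu"),
    Target("local-s3", "eu"),
    Target("cloud-archive", "us"),
]
CONTRACT = FileMeta("/legal/contract.pdf", {"restrict_region": "eu"})
FLYER = FileMeta("/marketing/flyer.pdf")
```

Because the rule is evaluated per file, the tagged contract never becomes eligible for the out-of-region cloud target, while the untagged flyer can be placed on any of the three.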

And of course you’re going to get new insights from the metadata, because you now have a central view of which files are used when and by whom, and you can do a little bit more with that.  The next very important thing is the ultimate flexibility provided with that.  You don’t have any dependencies on hardware vendors, meaning on the storage which is used, or on the cloud services, so you have free choice.  And you can also integrate new storage, or even completely new technologies, alongside the existing environment, and you can do the data migration fully transparently for the users in the background if it is needed.

And as already explained, you can keep the data where it makes most sense from a business perspective, as all the aspects relevant for the business, which of course are cost and the importance of the data, meaning for example how many copies or replicas do I need and where are the replicas stored, can be taken into account automatically and applied to every single file.  I haven’t explained this so far, but you also have the capability to access the same file over multiple protocols, currently via NFS and SMB, so NFS v3 and v4 and SMB v2 and v3, and in the near future also via an S3 interface.  And then last but not least, it’s efficiency and simplicity, because setting up such a vFilO environment, in addition to the existing infrastructure, takes less than an hour.

We will see what the setup looks like in principle, and [share] the other capabilities to do automatic data placement and replication of critical files.  You have these non-disruptive data migrations, and you have this unified storage management and access point for different data types and profiles.  Not to forget, of course, the seamless cloud integration, the graphical user interface and the short time to value in operation.  I mentioned already that the installation can be done in less than an hour.  But there is also the assimilation, which we will take a slightly deeper look at later on.  Depending, of course, on how many files should be assimilated, it’s a question of minutes, not days and weeks.

Having mentioned the assimilation so often, let’s have a look at the power of metadata, as mentioned earlier.  The power of metadata, data about data, is getting deeper insight: knowing what’s happening with the files and having them accessible at a single point, not in different locations.  It also allows a very quick search, and lets you apply rules over your entire set of data, as we already discussed.  And this is exactly how vFilO operates, because the first step, once you have set up a vFilO configuration, is the assimilation of metadata from existing, diverse NAS and file systems.  The assimilation process does not touch the data at all.  There is a long list of supported devices which can be assimilated, and the data remains where it is during the assimilation.  It is really untouched; there’s no replacement and no migration associated with that.

And of course, all the attributes remain unchanged: the rights, timestamps, users, everything which is known remains as it is.  The assimilation, before vFilO has control over the data, is a one-time process, and the assimilation itself does not change anything.  But during the assimilation, and also later on, the functional parameters of the storage are measured.  For example, is this a highly available file storage?  What is the performance, so that the numbers are known for that file storage?  How much capacity is free on it, if there is any capacity free at all?  All these parameters are measured as well, and this is an ongoing process.  Once vFilO is in place, it keeps measuring all this data, because this enables the other steps.
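The idea of collecting metadata without touching file contents can be illustrated with Python’s standard library. This is only an analogy on a local directory tree, not how vFilO assimilates a NAS: the function names `assimilate` and `free_capacity_bytes` and the chosen fields are assumptions. The sketch reads attributes with `os.lstat` and never opens a file, and it reads free capacity with `shutil.disk_usage`.

```python
import os
import shutil
import stat


def assimilate(root):
    """Walk a directory tree and record metadata for every file.

    Only attributes are read (os.lstat); file contents are never opened,
    moved or modified.
    """
    catalog = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            info = os.lstat(path)
            catalog.append({
                "path": os.path.relpath(path, root),
                "size": info.st_size,
                "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
                "uid": info.st_uid,
                "mtime": info.st_mtime,
            })
    return catalog


def free_capacity_bytes(root):
    """One 'functional parameter' of the storage: free space in bytes."""
    return shutil.disk_usage(root).free
```

Running `assimilate` on a share mount point would yield one catalog record per file, and calling `free_capacity_bytes` periodically corresponds to the ongoing measurement mentioned in the talk. Availability and performance would need separate probes that are not sketched here.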

And this is, of course, then the creation of the one global, scalable and searchable catalog, what I called the namespace, and in the future the global namespace, where we’re talking about several vFilO instances.  Then the access over mixed protocols, here again NFS v3 and v4.2, SMB v2 and v3, and S3 in the near future.  Then the balancing of data across all of the available storage systems, of course including leveraging the cloud as a very elastic extension of local capacity.  Between those systems, according to the policies which were previously set, or manually, you are able to move the data around.  This all happens dynamically if it’s done by policies, and it can also be done manually, but it is always fully transparent to the users.

So the business requirements determine what would be the most appropriate storage in terms of performance, availability, cost or whatever you define, and this is a uniform, common approach for file and object data.  Let’s have a look at an example.  This is a [huge] example of how your business needs determine the best location for your data, and I need to mention that it’s just a schematic example.  What you see here right now is a premium NAS or file storage.  It can also be a cluster.  But what is true of those premium systems is that at most 20% of the data is really active or important to the business.

Let’s assume in this example that the active and important data is the storage for your [unintelligible 00:21:21].  So this is the active and important data, but you also have a lot of inactive data, meaning a lot of pictures, videos, old presentations, old PDF files, and it is hosted on this very expensive premium storage.  So the majority is not so important, but it’s still hosted there, because this is the available file storage, and file storage is easy to manage, easy to share, and fully integrated into your business continuity concept and also into your backup concept.  So it is pretty expensive to include it there.  And for some NAS systems you need to take into account that when they get close to the capacity limit, they also get slow, which then also affects the applications running in [unintelligible 00:22:13] file, in this example.

And by having the metadata and the capability to set policies, those policies can easily be applied only to those files which are less important.  As I said, it’s file-level granularity, or share-level granularity, as you decide.  And then you could leverage affordable storage instead.  Here, just as an example, it’s a [j port] and some cloud storage.  If you then go further with the non-disruptive data migration in the background, your premium storage can be freed from the inactive data, which can be moved either to the affordable storage as [one, one, one] copies, or even deduplicated and compressed to local S3 storage or to cloud S3 storage.  That means you have also saved storage space, and your premium storage is free for more important files.
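The tiering decision in this example can be sketched as a small planning function in Python. The tier names, the 180-day idle threshold and the names `FileState` and `plan_moves` are assumptions for illustration; the webcast only states that policies select the less important files at file granularity and that the move is transparent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileState:
    """Where a file currently lives and when it was last used."""
    path: str
    tier: str              # "premium" or "affordable" (hypothetical tier names)
    last_access: datetime


def plan_moves(files, now, idle_days=180):
    """Return (path, from_tier, to_tier) for every file on the wrong tier.

    Recently used files belong on premium storage, idle files on
    affordable storage; files already on the right tier are left alone.
    """
    cutoff = now - timedelta(days=idle_days)
    moves = []
    for f in files:
        wanted = "premium" if f.last_access >= cutoff else "affordable"
        if wanted != f.tier:
            moves.append((f.path, f.tier, wanted))
    return moves


# Illustrative sample data; paths and dates are made up.
NOW = datetime(2020, 6, 1)
FILES = [
    FileState("/vm/disk01.vmdk", "premium", datetime(2020, 5, 30)),
    FileState("/media/old_talk.mp4", "premium", datetime(2018, 3, 1)),
    FileState("/docs/revived.pptx", "affordable", datetime(2020, 5, 25)),
]
```

The same rule covers both directions: the idle video is planned for a move off premium storage, and a file on affordable storage that has become active again is planned for a move back.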

And the side effect is that if this is a system which has performance issues when it becomes nearly full, now it’s not full any longer, the performance goes up, and everybody is happy just from having made this change.  From the user’s perspective this is, as I said, fully transparent.  And of course what can be done in one direction can also be done in the other.  So if a file becomes important again, and no longer belongs to the less important or inactive files, it can of course be moved back to the active and important files, again fully transparently.  I already mentioned that we would briefly talk about the configuration principles and how this happens.  If you remember the overview slide, we have the consumers, and you should see them here at the top.

We’re talking about the virtual machines, the PCs, laptops, physical servers and of course also containers.  This is an open-ended list.  And of course there are the diverse resources underneath, which can be either file [unintelligible 00:24:32] block devices, or SMB, NFS, [unintelligible] servers, or object storage locally, or any cloud storage.  This is a typical setup, and this is the access path.  When vFilO comes into the game, and this also explains how the assimilation process starts, it is mandatory in production environments to have two metadata servers.  They host the metadata, which explains the importance of the metadata, and they are the central points of administration from the administrator’s perspective.

They host the catalog and make everything available.  For test environments, of course, one server could be enough, but for production environments it’s mandatory to have two of those servers.  Then you also need servers which provide the data services to the consumers above, and these are what we call data services servers or data movers.  The minimum configuration is exactly one of them.  While I’m on it: where you see physical servers here, these do not need to be physical servers.  They can of course also be virtual machines, but especially for the metadata servers it’s recommended to have those virtual machines on two different physical resources, for availability.  And the data servers can also scale up and down as the load requires.

So it’s no problem to add a further one.  They act like a grid, they can be scaled up and down, and this can be extended depending on the needs, also the life of things independent of software defined storage.  How many data servers can you have?  It can go up to 40, and please keep in mind that 40 is the limit we have tested so far.  So this is not a design limit, just a technical limit we tested.  If there is the need for hundreds of megabyte of files to have more data movers, it’s not a problem to increase that number.  So this is the principal setup, and this is what you’re going to see in the [unintelligible 00:26:54].  What we have here further is the summary: get the most out of your existing storage environment and be prepared for the future.

This is all part of what we understand by Disrupt without Disruption, or the motto of today’s webcast day: do more with less.  You can find further information on vFilO, of course, on our product page, and I really encourage you to attend our deep dive with a live demo and, of course, the option to ask questions during that webcast.
