From an architecture perspective there are three types of storage: block, file and object.
Historically the differences between each were easy to understand. However, recent advancements in protocol support, virtualization and software-defined approaches are blurring the lines between them.
In this educational webinar, DataCore domain experts for each storage type will translate the technological differences into the value they deliver using real world examples and experiences gathered over the course of hundreds of deployments.
Adrian Herrera: Hello everyone and welcome to Block, File and Object Storage and Why They Matter to You. This is the first in a series of DataCore educational webinars on object storage. First let’s start with some introductions. My name is Adrian J. Herrera. I’m the one to the right there, I am the product marketing principal for object storage. And I’m going to be acting as the moderator today.
And with me, I have our product experts, we’ll start with Augie Gonzalez. Augie why don’t you introduce yourself to everyone?
Augie Gonzalez: Thanks Adrian. And I’m out in Fort Lauderdale, Florida, beautiful Fort Lauderdale. Been with DataCore for a dozen years. And before that I was with a space program, involved with missile assembly and testing and manufacturing. So I’ve got firsthand experience with all these technologies that we’re going to talk about, both as a user and on the vendor side of it. Eric?
Eric Dey: And I am Eric Dey, product manager for the Swarm object storage component of DataCore’s offering. I’ve been involved with the development and evolution of Swarm for the past 15 years or so. Prior to that I did a lot of work in the consulting industry around IT, IT management and automation of processes. And so, I’m one of the original core users of technologies like object storage.
Adrian Herrera: And of course, my name is Adrian J. Herrera. I am again the product marketing principal for object storage. I came with Eric through the Caringo acquisition to DataCore. Very happy to be here, in sunny San Diego. And before DataCore I worked for some cloud storage companies both on the enterprise and on the consumer side. So I have a very strong storage background. And with that let’s go ahead and talk about what you’re going to learn in this webinar.
This is what we hope you will learn. You’ll learn the differences between block, file and object architectures and topologies. We’re going to spend a lot of time on protocols and how you can access those different architectures and topologies in different ways. How those architectures and protocols affect use cases and workflows. And then finally, how to determine the right architecture and deployment method for your specific requirements.
We’re going to go over three topologies, three architectures. There is a time, a place, a workflow, a workload for each one of those. And there are some overlaps between them. We will discuss them, and hopefully at the end of the webinar you will be able to determine the right architecture and deployment method for your requirements.
So, from a very high level, here are the three types of storage architectures. These should not be a surprise since they were in the title of the webinar. But let’s go ahead and start with a quick description of what you’re seeing here. So, you see block, and we start with block, and from a one-statement perspective, block is generally meant for high transaction and high rate of change data segments. And we’ll go into the reasoning for that in this webinar.
File is really about collaboration and portability for frequently changing files. And object is for ease of management and access at scale for static content. I think Eric, when we were talking the other day, you brought up a really good point from a visual perspective seeing kind of, how data grows from left to right. Do you want to give a quick explanation of what you were talking about there?
Eric Dey: Sure. Yeah, basically, you go from the very granular to managing higher and higher levels of abstraction as you move from the left to the right. And so, it’s really about the sort of unit of storage that you’re managing, or you’re concerned with, on this continuum. Down at the block level you’re looking at 512 bytes or 4K worth of information at a time.
As you move through files, now you’re worried about full file objects or full files. And then as you get into object, again you have the full file, but you may be looking at a version chain of files. The object may actually represent a continuum of versions and the changes that have occurred. And with objects we’re looking at whole versions of it and not just something like the current file that’s out there. So, that was kind of the observation: how you’re managing higher levels of abstraction as you move across the scale.
Adrian Herrera: Yeah. And the reason we wanted you to point that out is because this will come out in the information that we’re about to cover. So, from a visual perspective I really like the way that you positioned that and that’s what we try to represent here visually. And do you have anything to add Augie before we jump into block?
Augie Gonzalez: Well, we’ll talk a little bit more about file in the ensuing slides, I think we’ll save it for then.
Adrian Herrera: Right. So, let’s jump into block storage. So Augie, take it away.
Augie Gonzalez: Yeah, block storage is at the ground floor of storage architectures. It’s used as the native disk, basically, that you’re familiar with. Unlike the other two it’s seldom in touch with the human. There’s an intermediary that’s involved with this, and we’ll get into that a little bit more.
The orientation of blocks is very fixed, it’s very regimented into these block sizes whether those are 512 bytes or 4K bytes. They very much are targeted at low latency, high transaction environments where you have a lot of change in a small segment of the information hierarchy. And there’s really no tagging or anything like that. There’s no relationship to the origin of this, you’re pretty much looking at raw data in its native form carved out into these things. The most common usage for block storage is for the operating system proper or the hypervisors. And it’s kind of structured along the lines of a volume, a classical volume.
So, we as carbon robots seldom are involved in this, this is all downstream from us, but we benefit from it. And the connections tend to be as simple as the legacy kind of SATA drives that we see, and the iSCSI protocol. FC stands for Fibre Channel. Fibre Channel is on the faster, low-latency end of the business, and iSCSI is kind of the dominant way to connect to block storage for storage area networks. The one that gets a lot of attention these days is NVMe. And that’s a multi-channel connection, a lot of parallel paths into storage for high throughput and also low latency. So, we’ll see how these shape up in the conversations.
Adrian Herrera: Do you have anything to add there Eric? I know you’re the object guy, but you have a lot of knowledge about block and file too.
Eric Dey: Yeah, I think the interesting observation about what Augie just said was that we humans, and I hadn’t heard the term carbon robots.
Adrian Herrera: Yeah, carbon robots, I was going to ask him to explain carbon robots for those who may not speak or are familiar with the English language.
Eric Dey: But for the humans, I just thought it was an interesting observation that typically we end users don’t have a personal relationship, we don’t interact directly with block. It’s always something else that’s deciding on how to organize it. Think of that as we now walk through file and object.
Adrian Herrera: Yeah, and I want to just have everyone kind of focus on that no-metadata and the carbon robot and human aspect of it. Because when we shift into the object world, you’re designing user interfaces and applications for humans, where metadata really comes into play. So, we’ll just kind of capture that mentally, we’ll talk about that coming up in the object slides, but I think that’s a really good segue and it’s really good here, this is all for machines. But your metadata becomes very, very important once you start designing applications for humans and we’ll talk about that in a bit. So, with that, let’s jump into file storage, so, Augie again, you are the file expert, take it away.
Augie Gonzalez: Files are completely the opposite, right, files are all about how people see the universe of data. How they structure themselves. So this is a carryover from the old cabinets and how you file things in the drawer. And so, that whole directory and folder structure mimics how individuals set themselves up. And it tends to have a hierarchy, where there’s departments, or facilities, or dates, or anything like that, that seems to help one establish where they place the information. Because you’re really relying on that hierarchy and your memory of where you stored the information and how you structured and set up those files and folders, you’re very much dependent on that.
And that gets into also the second part of it, which is that files tend to be a natural means by which we collaborate, by which we share information. So I point you to a shared folder someplace, say that’s where I last put it, here’s the latest version, go grab it from there. That makes the exchange of information all that much more portable and easy for us to work with, since we all kind of share that same understanding about what a file is and it seems pretty natural. Unlike block storage, we do include a number of attributes with each file. These attributes, which are loosely defined as metadata, cover things like the ownership. In the block world there’s really no sense of who’s the owner, you might know who created a volume but ownership is not so singular. Here we can actually define owners, permissions to the file, whether they have read, write permissions or just exclusively execute permission, things like that. And there may be some other elements that help define and help seclude or open up the use of the files.
The common way to access file systems these days, especially in a network environment, would be through SMB, or the CIFS protocol as many of you who’ve been in the business for a while may know it, as well as NFS. The way that this gets lined up is usually that on the Windows-centric side of the world people use the SMB protocol, and on the Linux and UNIX side of the world they tend to use NFS, and there are bridge products that let you access from either one using those native protocols. And this is described as a file system. Now we also have file systems where you’re not going over the network, you’re actually looking at a file directly on your personal machine. But we’re really trying to deal more in this seminar with the broader use where there are multiple collaborators getting to these network resources. [unintelligible 00:12:21] your thoughts would be on that.
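To make the file attributes mentioned above (ownership, read/write/execute permissions) concrete, here is a minimal Python sketch using only the standard library. It is an illustration added to the transcript, not something shown in the webinar; the file name and contents are made up, and on Windows the permission bits and owner id are only partially meaningful.

```python
import os
import stat
import tempfile

# Create a throwaway file so the example is self-contained (hypothetical name).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report_v1.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers\n")

    # Restrict the file to owner read/write only.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

    # The file system keeps these attributes alongside the file's contents.
    info = os.stat(path)
    file_size = info.st_size                   # size in bytes
    mode_string = stat.filemode(info.st_mode)  # permission string, POSIX style
    owner_id = info.st_uid                     # numeric owner id (0 on Windows)
    print(file_size, mode_string, owner_id)
```

Note that this is a small, fixed set of attributes defined by the file system, which is quite different from the free-form metadata discussed later for object storage.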
Eric Dey: Yeah, sort of relating it back to the previous slide. So, what we have here is that file systems really don’t exist without underlying block storage in most cases. They really grew up with block storage behind them. And what they’re doing is abstracting the block, from a single block device up to, when you’re talking about network file systems, dozens of different backend block devices aggregated together. But again, as Augie’s saying here, it presents the hierarchical view of the storage that humans are familiar with seeing.
So, think about the hierarchy as we talk now about object though, that the hierarchy that we’re all used to in the file space is a single hierarchy. In that, you start at the top-level directory, and it’s got some number of sub-directories and you follow the tree, whether you draw your trees up or down, in computer science I draw mine down. You open a folder, you see additional folders, and you keep drilling down deeper into the tree, into the single tree hierarchy of the folder or sub-directory organization here. So, again, as Augie was saying, you have to decide what your hierarchy is going to be when you lay out the files. The file system natively doesn’t tend to give you good ways to search for things, that’s generally add-on products to it. But this is the predominant storage that we’re familiar with seeing, the familiar paradigm.
Augie Gonzalez: And Eric made a point earlier about versioning. In file storage, you’re responsible for your versions and you have to know that they are discrete. So you kept one copy in maybe the version folder and now you have a version two, you keep that. And then maybe you have your current folder that you keep things in, and maybe that’s how you arrange versioning. It does lead to situations where somebody is saying, “Hey, I’ve shared this with you previously,” but what you shared with them may have been an earlier version and they may not be aware that there’s a more current revision that they should be working with. So you have to be cognizant of these variations because they’re not intuitive unless, maybe in the naming, you append to the name things like, oh that’s dot V1, that’s V2, and always use the latest.
Eric Dey: Yeah. And there have been some versioning file systems out there, VMS had theirs. But it doesn’t tend to catch on. I haven’t seen it typically used with file systems in practice.
Augie Gonzalez: Mines work for many years.
Adrian Herrera: And just pointing out that the reason we’re calling on all these architectural differences and the differences in usability is because, they ultimately impact the workload and workflow and we’ll go into that. But, what’s under the hood here, how these solutions are architected affect how they perform for certain uses, for certain workloads, in certain environments. And that’s why we’re setting the foundation for that. So, just be aware of that as we move on to object storage. And you are the object expert Eric, so this is where you jump in.
Eric Dey: Okay, thanks Adrian. So, we’ve come from the block, which is more application interfaced, to file which we think of as both application and human interface. It’s where the human can start seeing the organization, having some ability to compartmentalize things. And then, sort of the natural evolution of that is up into object storage. Now where again, you have applications and humans both interacting with it. Object, in its most basic description, is a key-value store.
The key that you would think of from the file system world would be the full path. Full path to a file. And unlike coming from the hierarchy of a file system, in the object world things tend to be very flat. The namespace is flat, the objects don’t really have this organizational hierarchy to them. Instead, it relies more on metadata to self-describe the objects that are stored within it. And the reason for that is the metadata, being self-descriptive, lends itself to searching.
So, think in terms of the Mac Finder or the way that Google Docs does things, or even internet searches. You used to browse the internet by going through some sort of directory and you would click down through what you were interested in looking for. That would be sort of like the hierarchy that we’re used to from file systems. And we’ve changed, and now we just kind of start by searching, and we’re basically searching on attributes of the things we’re looking for. We no longer kind of browse the hierarchy to go find them.
And so, this is where object comes in as these singular representations of files. And it presents itself as a massive pool of data. So you’re no longer concerning yourself with necessarily which servers things are kept on. The system is allowed to move things around in the background to optimize or to heal itself from failures that occur in the underlying hardware. And by having the system be this dynamic manager of the content, that opens up the ability to scale into many petabytes worth of data and billions and billions, I’m sounding like Carl Sagan. Billions of files under management. And it’s possible because you’re essentially using a searching paradigm to find things. You’re asking for things by their descriptions rather than trying to remember which sub-directory and folders you put stuff into.
It lends itself very directly to user interfaces because object storage tends to be accessed through HTTP-style protocols, RESTful protocols. So, in the case of Swarm, we support the common S3 protocol. We have a native one that we call SCSP. But that tends to be the normal access pattern for it, and so user interfaces, HTML5 web interfaces, are very naturally built upon that because that just looks like asynchronous calls to it, as it’s presenting to you some representation of the data behind the scenes.
Adrian Herrera: I think you touched on an important point on interfaces that I want to point out. I’m going to ask you, Augie, about the primary differences between block and file. But I did want the viewers to really pay attention to the native interfaces. You see S3; we’re going to talk about S3 and use cases across the different storage types. Just having an S3 interface does not make something object storage, it means it has an object interface. There’s a big difference because of all the underlying architectural things that are happening behind the scenes. And we’ll dive into that in a bit. I just wanted to point that out. Augie, you were going to say something?
Augie Gonzalez: Regarding the searches, they may be misconstrued, as though the object storage is actually weaving through, like a spider, the internals of what’s in those objects. It’s actually the metadata that makes it searchable. So you don’t have to actually traverse or sweep the internal contents; you’re getting some hints from these attributes, these self-describing attributes that are attached to the object. And those you can enrich over time, you can enhance the tags. Because something might be both an animal and be in the zoo and that might help you identify and narrow in on it without ever having a peek inside the image, for example, of a giraffe.
Eric Dey: Yeah, that’s right. And so that metadata can be added at the time the objects are written or could be augmented later by humans, or by application processes that are going through and decorating the content with more descriptive information.
Augie Gonzalez: I just want to make one other point Adrian, which is that we’re not picking one of these. DataCore offers solutions in all three, in block, file and object storage. What we’re trying to do is highlight some essential characteristics that target why you would use one or the other, to help you prescribe which is the better architecture to use for a given circumstance. That’s really a big part of the objective for our discussion today.
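The flat key-value namespace and metadata-driven search described in this part of the discussion can be sketched with a toy in-memory store. This is an illustration added to the transcript and is not the Swarm or S3 API; every function name, key and tag below is hypothetical.

```python
# A toy, in-memory stand-in for an object store: a flat key -> object map.
# Hypothetical names throughout; this is not a real product API.
store = {}

def put_object(key, data, metadata):
    # Objects carry their own descriptive metadata alongside the payload.
    store[key] = {"data": data, "metadata": dict(metadata)}

def add_tags(key, **tags):
    # Enrich the metadata later without touching the object's contents.
    store[key]["metadata"].update(tags)

def search(**criteria):
    # Match on metadata only; the payload bytes are never inspected.
    return sorted(
        key for key, obj in store.items()
        if all(obj["metadata"].get(k) == v for k, v in criteria.items())
    )

put_object("img-0001", b"...image bytes...", {"kind": "animal", "species": "giraffe"})
put_object("img-0002", b"...image bytes...", {"kind": "animal", "species": "zebra"})
put_object("img-0003", b"...image bytes...", {"kind": "building"})

# Tags added after the fact, by a person or by an application process.
add_tags("img-0001", location="zoo")
add_tags("img-0003", location="zoo")

zoo_animals = search(kind="animal", location="zoo")
print(zoo_animals)
```

There is no directory tree to remember here: the giraffe image is found because it is tagged as both an animal and being in the zoo, without ever opening the image itself.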
Eric Dey: Yeah, that’s right.
Adrian Herrera: Yeah, and that’s a perfect segue to the next slide. There’s a time and a place for all three of these architectures. There’s a time and a place for all of these interfaces. But it really depends on your data. What are your data requirements? That is what ultimately affects your storage selection. So, the data type, whether it’s structured or unstructured. And we just spent some time talking about databases. In the traditional world you have a structured database where you talk SQL, and you need to define everything before you put your data into that database.
What Augie and Eric were just talking about is an unstructured database, a NoSQL database, something that you can continually modify and then just run ad hoc queries against. That can be indexed at any time for any type of data. There is a time and a place for both of those. Is the data type random or sequential? From a data access perspective, are these shared systems, what applications are accessing the data? What users, and what authorization and authentication do they need to access the data? Or what is required from the authorization and authentication perspective? Is the data chunked or complete? Do you have a whole file or are you just updating files? Is the data local, or over the WAN, or maybe over the LAN, or maybe over the internet? What kind of security and access controls are there? What are your performance and latency requirements, from an IOPS perspective or a throughput perspective? And of course, how do you want to protect the data? What are the durability levels that you’re looking for?
Because of course, as the durability levels increase, as business continuity increases as you make things highly available that increases the cost. Maybe you don’t need to do that, maybe there are more economical ways where you have some sort of reasonable data durability level set? So, you really need to take a look at your data requirements, and you need to define them, and that’ll help you select the right storage for the use case. And we’ll talk about this, I would ask your opinions on this Augie and Eric, but we’re going to talk about these through the next few slides. And this is how data is written via protocol. So, maybe Augie, if you want to start with the block and file, Eric, you wrap it up with the object. I think we touched upon a lot of this already.
Augie Gonzalez: Yes, I think an important point was made in the last slide. Some of you might have seen the word chunked versus complete in the data access area; it’s actually very visual with what’s happening in this slide. With block you are often just making an update to a very small segment. So, in the left picture here, it might be block 12 that you need to update. So sensor information coming from an IoT device may be just trying to refresh the state of a power grid in that one area. That’s all it’s doing, it’s just touching that. So it doesn’t want to make any other modification and it’s a very lean protocol for those purposes. All of these handshakes that are used to communicate with block storage devices are meant to turn around, send a quick command with a payload, come right back and get to the next one. So, that’s back to that high transaction rate.
With the file level stuff, we have a very rich set of verbs. These are almost like that human instantiation of what we want to do with a file and that’s what this POSIX compliant part is. And they define the verbs that have grown up over time on how you set the permissions, how you open a file, how you traverse the file to get to different segments of that file. But they tend to be a little bit more constrained in contrast to object, in terms of the scale. We’d sometimes see use cases where for compatibility purposes the S3 protocol is also used to access file and there is usually some translation that happens. And we’ll get into that a little bit more, what the tradeoffs are, but you will see those sometimes together and that creates part of the confusion that we want to help clear up during this webinar.
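The "update just block 12" pattern can be sketched in a few lines of Python. This is an illustration added to the transcript under stated assumptions: a regular temporary file stands in for a block device so the code runs anywhere, the 512-byte block size is one of the sizes mentioned earlier, and the sensor payload is invented.

```python
import os
import tempfile

BLOCK_SIZE = 512  # bytes per block; 4096 is the other common size

# A regular file stands in for a block device (assumption for portability).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as dev:
    dev.write(bytes(BLOCK_SIZE * 32))  # 32 zero-filled blocks

def write_block(path, block_number, payload):
    # Blocks are fixed-size: the payload must fill exactly one block.
    if len(payload) != BLOCK_SIZE:
        raise ValueError("payload must be exactly one block")
    with open(path, "r+b") as dev:
        dev.seek(block_number * BLOCK_SIZE)  # address by block number
        dev.write(payload)

def read_block(path, block_number):
    with open(path, "rb") as dev:
        dev.seek(block_number * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)

# Hypothetical sensor reading, padded out to one full block.
reading = b"grid-sector-7:OK".ljust(BLOCK_SIZE, b"\x00")
write_block(path, 12, reading)

block_12 = read_block(path, 12)
block_11 = read_block(path, 11)   # neighbouring block is untouched
total_size = os.path.getsize(path)
os.remove(path)
```

Only the 512 bytes of block 12 change; nothing else on the "device" is rewritten, and nothing in the block itself records who wrote it or what it means.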
Eric Dey: And as we move into the object access, unlike the scenario with block where you may be updating just a very small portion in the middle of your sort of application-defined data structure on block, object is looking at dealing with whole objects at a time. So the typical pattern is that, if you’re going to write a new object into the system you provide the entire contents of the object. If you’re going to update it with a new version, there are some exceptions, but in general you’re writing the entire thing again, so that you have a complete version of the object that is consistent.
Meaning that you don’t have the partial write at some unknown point in time where, if I read it right now, am I reading the old half and some of the new parts that are being written? You don’t have the ambiguity because the objects exist as whole consistent versions when you deal with them. And this is where, as Augie was saying, you can put an S3 protocol on top of file and generally the application may not know the difference, but when you start to push on the limits of scaling, that’s when the inefficiency of the translation starts to matter. And that’s where the HTTP RESTful protocols sitting on top of object start to come into their own.
Now with object it’s also very common to see these translation gateways that put something that looks like a file interface, a network file interface, on top of object. That has some of those same translation and representation problems that you get when you put an object interface on top of file, in that you can really push the bounds of that. And we’re going to talk about some of the impedance mismatches of that in a moment.
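The whole-object write pattern Eric describes, where each update supplies the complete contents and readers only ever see complete versions, can be sketched as a toy versioned store. This illustration is added to the transcript; the function names and the sample object are hypothetical and do not represent any real product API.

```python
# Toy versioned object store: every write supplies the complete contents and
# becomes a new immutable version. Hypothetical names; not a real API.
versions = {}

def put(key, data):
    # Append a full copy as the next version; earlier versions are untouched.
    versions.setdefault(key, []).append(bytes(data))
    return len(versions[key])  # version number, starting at 1

def get(key, version=None):
    # Readers always receive one complete version, never a mix of two.
    chain = versions[key]
    return chain[-1] if version is None else chain[version - 1]

v1 = put("config.json", b'{"mode": "old", "level": 1}')
v2 = put("config.json", b'{"mode": "new", "level": 2}')

latest = get("config.json")
first = get("config.json", version=1)
print(v1, v2, latest, first)
```

Because a version is only visible once it has been written in full, a reader cannot observe half of the old contents and half of the new, which is the ambiguity an in-place partial write can create.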
Adrian Herrera: Yeah, we’re going to talk about impedance mismatches. And we’re going to ask you to define impedance on the next slide Eric. But we got two questions, and they’re both very similar. One of them is, is there an automated process to move files into the object world? And the other one is, can object storage be POSIX compliant? And they’re both related. Let’s answer both directly. So let’s answer the first one, can you move files from the file-based world to the object world? I’ll go ahead and answer that one.
The answer is yes, there are file interfaces that can take file system data and move it into the object world. But again you have to take into consideration what Eric was talking about. There is an impedance mismatch and maybe this is a good time to talk about impedance, I think we can talk about this in the next slide.
Eric Dey: Let’s back up for one second. Can you copy files from a file system into an object storage? Yes, there are absolutely utilities and tools for that. Part of the question was also, is object the future? Yes, it’s the future, but it’s not going to completely replace the other two. Part of what we’re trying to get across in this webinar is that all of them have places in the world. It’s about choosing the right tool for the job. There are things that block just cannot be matched for. It’s the reason it still exists, even with object storage.
And there are things that a file could do that are very difficult for the object world. Think about a log file where multiple times per second you’re making these small little appends to the file. In the object world each of those little appends looks like a new version of an object. That’s a very difficult use case in the object world to deal with. And so, it’s kind of a misleading answer to say is object the future and say yes to it. It’s the future for some application patterns.
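A rough back-of-the-envelope calculation shows why the log file case is hard. The sketch below is added to the transcript and assumes the naive model in which every append rewrites the entire object as a new version; real object stores may offer append or multipart features that soften this, and the line size and append count are invented numbers.

```python
def bytes_written_file_append(line_size, appends):
    # A file system writes only the new bytes on each append.
    return line_size * appends

def bytes_written_whole_object(line_size, appends):
    # Naive object model: version n contains n lines and is written in full.
    return sum(line_size * n for n in range(1, appends + 1))

LINE_SIZE = 100   # assumed bytes per log line
APPENDS = 1000    # assumed number of appends

file_total = bytes_written_file_append(LINE_SIZE, APPENDS)
object_total = bytes_written_whole_object(LINE_SIZE, APPENDS)
print(file_total, object_total)
```

Under these assumptions the file grows linearly in bytes written, while the whole-object approach grows roughly with the square of the number of appends.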
Augie Gonzalez: Yeah, just adding to that. Object has been around for quite some time, for many, many years, about 15 or more years. Most people have interacted with it when they deal with cloud storage, for many, many years, email systems, many, many years. I think what you’re saying is, it isn’t so much the future but it’s the way that humans are interacting with data now, today and for the foreseeable future.
So, now you interact with the same data in different ways, on your mobile device, on your set top box, on your laptop, on your desktop. So, when you do that, you need a very flexible way to be able to call and view that data and that is object storage. And I think that’s why you’re seeing object storage really come up. It’s not necessarily, it’s been around, it’s technology that has been around for a long time, but it’s becoming popular now because humans are interacting with data, the same data sets in very, very different ways. Is that fair? A fair statement Eric?
Eric Dey: Yeah, it is. And also, I’m an evangelist for objects so, I don’t mean to say that the others are dead and leave them in the dust. It’s that there are some very good use cases for objects, and there remain very good use cases for file and block.
Augie Gonzalez: It’s also, I just wanted to point out there’s an economic activity that drives some of that, like the move that’s being asked about, in some cases. If there’s a lot of file activity you want to make sure you can get to that and perform the rapid updates and all that. But once those files become less relevant, they become stale or they can be archived, in some cases that’s when that move occurs. And some of our solutions in fact provide that graceful migration from a file system into object storage, so that the appropriate architecture is applied at the appropriate time to the data.
Adrian Herrera: Absolutely. Let’s dive into this a little bit, I think we covered some of these points. But, what we really wanted to let the viewers know is, protocol support does not really guarantee the full value of the underlying storage architecture. That is one of the reasons why DataCore has all three in our storage portfolio: there is a time and a place, and there will always continue to be a time and a place, for all three architectures. I think both of the questions were actually addressed on this slide. File-based storage that supports object protocols does not deliver the same ease of management, protection and access at scale. And define scale for the viewers out there Eric, what are we talking about when we say object scale?
Eric Dey: So, object scale, again, I’ll try to channel my Carl Sagan with the billions and billions of distinct objects under management, as well as many petabytes, dozens and dozens or more petabytes within an object system. And managed usually by half a person, half a full-time administrator. And so, that’s scale in the object world, and it can do that because the system itself is doing a lot of those sort of day-to-day management activities such as dealing with disk drives that go bad, because you will always have disk drives that go bad in the system. And dealing with optimizing where things are stored, such as load balancing, leveling of storage, those kinds of tasks are automated. I will give Augie a chance but then I want to come back to the concept of impedance in one of the questions that we have.
Adrian Herrera: Yeah, and Augie make a quick comment about POSIX compliance. There was a question, is object storage POSIX compliant. I’ll go on the record and say, no, object storage is not one hundred percent POSIX compliant.
Augie Gonzalez: That’s what I wanted to cover too.
Adrian Herrera: Any vendor that tells you different, is not being truthful with you. So, let’s put that on the table Augie, anything else to add to that point?
Augie Gonzalez: No that was it.
Adrian Herrera: Okay. All right so, define impedance quickly Eric.
Eric Dey: So, let’s pop back for one moment. So impede [unintelligible 00:37:26] into the electrical engineering definition of impedance, very far. It’s basically something that resists changes in electrical current. Think of it as, it resists changes or it’s sort of a barrier, an impedance if you will, to change. And so, the question out there was, can object storage be POSIX compliant? And as an engineer, essentially, you could design a protocol gateway that would be a hundred percent POSIX compliant, but what would you give up by doing that? This is where the impedance mismatch comes in. Essentially, you’d be asking the underlying storage system to behave in ways that completely destroy some of its benefits.
So, the benefits of scale, the benefits of allowing it to grow up to many, many petabytes and billions of things under management, start to be slowed down severely as you try to bring what is really file semantics, 100 percent file semantics, into the object world. And the other way applies as well: as you try to bring object semantics 100 percent into the file space you start to get similar kinds of challenges there. So, the question, could it be? Sure, one could do it, but you’d be destroying then a lot of the capabilities of object by doing that. And if you need that you probably need to stay in the file space.
So, things like file locking, where you say, I don’t want anyone else to touch this while I make a change, that’s very hard in a distributed system, because its scale comes from the fact that it has independent operators in there that are not absolutely coordinating all of their activities.
Adrian Herrera: And that is a good segue into this, popular use cases by storage type. Because, sure, you can, but it will greatly impact what you’re trying to do with it. So, trying to run home directories on object may be a mismatch. This is not meant to be an exhaustive list of use cases, this is just meant to say hey, these are some popular use cases for these architectures, but you start to see some overlap here. We have databases on the block side and databases on the object side. Why is that? You have media archives on the object side and media production on the file side. Well, why is that? Some people use their file systems for their media archive and for their content archives. You see data lakes for artificial intelligence and machine learning in both object and file.
So, you can have these use cases, and it really depends on what your requirement is, what your budget is, what you’re trying to do with the data. And I think we’re going to jump into this in a bit here: there are tradeoffs. That’s what you’re talking about, Eric, you’re talking about the tradeoffs. So, impedance, electrical engineering, if you want to use a natural term, you’re kind of swimming upstream, or you’re trying to fit a square peg in a round hole?
Eric Dey: Yeah, or vice versa.
Adrian Herrera: Sure, if you have a hammer, you can bang it in there enough and you can get it to fit, but there are some tradeoffs. So, let’s look at this really fast, Augie do you want to talk about some of the tradeoffs from your perspective?
Augie Gonzalez: Performance is a really important one. So the performance, and the latency in particular, how quickly it turns around the transaction, is often the distinguishing requirement that drives something towards block, for example. So if it’s a small change that has to happen quickly, with very little overhead associated with it, that tends to drive you toward block. If it’s more relaxed, or the payload is very, very large, maybe a streaming image, that might steer you further to the right in the diagram that we’ve been showing.
Adrian Herrera: And then Eric, do you have anything else to add on the tradeoffs? You were talking about file locking and collaboration. I guess that would be security and authentication, and you talked about management and resources already, and scalability?
Eric Dey: Yeah. So, I think I’ve covered a lot of the tradeoffs in these comments and in the slides prior to this. We certainly have a lot of resources on our website, and other videos talking about a lot of these aspects, so I’ll defer to those.
Adrian Herrera: Yeah, and I’m going to point out that last bullet point: the requirements also affect whether you use software-defined storage (we’re huge believers in software-defined storage here at DataCore), appliances, cloud, or hybrid. Now, even in the cloud and hybrid world, someone is managing storage somewhere. So, there is storage going on; cloud and hybrid are really deployment methods in a remote data center being managed by someone else. But you need to take a look at your tradeoffs and then select the appropriate deployment method for your requirements.
Data protection, so this is kind of a big one. Obviously, you want your data to be protected. Augie do you want to explain what’s going on here in the block and file world?
Augie Gonzalez: Yes. The most common thing that you see as far as data protection in the block world has to do with either redundant, or [unintelligible 00:43:21]. Or some kind of synchronous mirroring, or mirroring, just complete copies of that LUN, the logical unit or the volume. And then, in metro cluster environments, you often find block systems that are using synchronous mirroring between them, so they’re in lock step, with one copy in a different fault domain from the other. Replication may also be asynchronous replication for long-haul copies, extra data protection outside of the regional disaster area. And then of course you have your traditional snapshots and backups that act on a specific volume. So, the unit that’s being protected is of smaller scale. It is a fractional piece.
On the file side, replication and backups tend to be the things that you do to take care of business. Essentially, I’m making a copy of my file, and underneath, that file may have been stored on a block storage device that was mirrored. So I have not only the two copies that I made of the file, but the blocks where that data is actually stored are also spread across two different systems. So if one fails, the other one can provide full access. And for object, I leave it to Eric.
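To make the distinction between synchronous mirroring and asynchronous replication concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how SANsymphony or any real array works: the class `MirroredVolume` and its methods are hypothetical names, and plain dictionaries stand in for the volumes.

```python
from collections import deque


class MirroredVolume:
    """Toy block volume with a synchronous mirror and an asynchronous replica."""

    def __init__(self):
        self.primary = {}         # block number -> data, local copy
        self.sync_mirror = {}     # copy in a second fault domain, kept in lock step
        self.async_replica = {}   # long-haul copy, allowed to lag behind
        self._pending = deque()   # writes not yet shipped to the async replica

    def write(self, block_no, data):
        # Synchronous mirroring: both copies are updated before the write
        # is acknowledged, so they never diverge.
        self.primary[block_no] = data
        self.sync_mirror[block_no] = data
        # Asynchronous replication: the write is only queued here.
        self._pending.append((block_no, data))
        return "ack"

    def drain(self):
        """Ship queued writes to the remote replica; returns how many were sent."""
        shipped = 0
        while self._pending:
            block_no, data = self._pending.popleft()
            self.async_replica[block_no] = data
            shipped += 1
        return shipped


vol = MirroredVolume()
vol.write(7, b"ledger entry")
lagging = dict(vol.async_replica)  # still empty: the replica has not caught up
shipped = vol.drain()              # now the remote copy matches the primary
```

The point of the sketch is the acknowledgement boundary: the synchronous mirror is inside it, the asynchronous replica is outside it and catches up later.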
Adrian Herrera: Eric, remember we have those slides on erasure coding coming up, so we can cover that on the next slide.
Eric Dey: Yeah. So, in the object world the protection is based upon replication and erasure coding. And that’s handled in the back end by the system. But the important point to think about here is that these protections are decisions that are made on a per-object basis instead of on a per-file-system or per-LUN basis. So, you can have both replicated objects and erasure-coded objects, with different replication values and different erasure coding schemes, in the same system.
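As a rough illustration of per-object protection, the following Python sketch attaches a policy to each object rather than to a volume. The class names, object names, and sizes are hypothetical, and this is not Swarm’s actual API; it only shows replicated and erasure-coded objects coexisting in one catalog and what each policy costs in raw capacity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Replication:
    copies: int            # total number of full copies kept


@dataclass(frozen=True)
class ErasureCoding:
    data_segments: int     # segments the object is split into
    parity_segments: int   # extra segments used for reconstruction


Protection = Union[Replication, ErasureCoding]


def raw_bytes_stored(size_bytes: int, policy: Protection) -> float:
    """Raw capacity consumed by one object under its own protection policy."""
    if isinstance(policy, Replication):
        return size_bytes * policy.copies
    total = policy.data_segments + policy.parity_segments
    return size_bytes * total / policy.data_segments


# Hypothetical objects living side by side in the same system.
catalog = {
    "logs/app-2020-06-01.txt": (1_000, Replication(copies=2)),
    "finance/invoice-0001.pdf": (1_000, ErasureCoding(data_segments=5, parity_segments=2)),
}

usage = {name: raw_bytes_stored(size, policy)
         for name, (size, policy) in catalog.items()}
```

Here the replicated log consumes twice its size, while the erasure-coded invoice consumes 7/5 of its size, and both sit in the same pool.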
Augie Gonzalez: I guess one point to make there is, this is where file system-based architectures differ from object-based architectures, specifically when you’re talking about replication and erasure coding. Object systems that are built upon file systems have more rigid forms or rigid structures of protection, where you cannot mix replicated and erasure-coded objects on the same infrastructure.
So, one point, if that is important to you, we do encourage you to ask your vendor if you can mix replicated and erasure coding schemes on the same infrastructure. Then you’ll know if there’s an actual underlying file system underneath. If they say no, you have to set your replicated object here and your erasure coded objects here, conceptually, you now know that there’s a file system underneath.
Adrian Herrera: So, do you want to go over erasure coding really fast?
Eric Dey: Sure. So, erasure coding: some may not know the term, but I’m sure everyone has actually lived with it before, because RAID is very familiar to us. We think about RAID 5 or RAID 6, and there are others. But RAID 5 is essentially, think of it as some number of drives plus a parity disk. RAID 6 is some number of drives plus two parity drives. And that is erasure coding. In its essence, you have taken something and split it up into these slices, or parts, or segments. And then you have equally sized segments that serve as the parity, so that you can lose any of the primary data segments or the parity segments and you can still reconstruct what was lost, up until the point that you lose too many of them. And too many of them means one more segment than you have parity segments for.
So in an erasure coding five-two scheme, that means you can lose up to two of anything in this stripe or segmented view of the object, and you can recalculate what those two lost segments should be. And so, you can have the system in the background be busy reconstructing and re-protecting an object when a drive fails. Choosing an erasure coding scheme essentially allows you to choose your level of durability for the object in the system, and to have different durabilities assigned within the system. How many disk drives, or how much overhead, are you willing to assign to assuring data protection for objects? The answer to that is going to be different based upon what the content is. If it’s just system logs of an application, you know, that’s maybe not as important as billing and financial records in terms of your need for durability.
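The reconstruction idea can be sketched in Python for the single-parity case, which is the RAID 5 style described above. This is a minimal illustration under stated assumptions: it uses XOR parity, which tolerates only one lost segment, whereas schemes with two or more parity segments (such as five-two) rely on codes like Reed-Solomon that are not shown here. The function names are hypothetical, and `storage_overhead` assumes "five-two" means five data segments plus two parity segments.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized segments."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal segments (zero-padded) plus one XOR parity segment."""
    seg_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity]


def rebuild(stripe: List[Optional[bytes]], missing: int) -> bytes:
    """Recompute the one missing segment by XOR-ing all surviving segments."""
    survivors = [seg for i, seg in enumerate(stripe) if i != missing]
    return reduce(xor_bytes, survivors)


def storage_overhead(data_segments: int, parity_segments: int) -> float:
    """Raw capacity used per byte of content, e.g. 5 data + 2 parity -> 7/5."""
    return (data_segments + parity_segments) / data_segments


stripe = split_with_parity(b"billing and financial records", k=5)
lost = stripe[2]
stripe[2] = None                      # simulate a failed drive holding segment 2
recovered = rebuild(stripe, missing=2)
overhead = storage_overhead(5, 2)
```

The overhead function shows the tradeoff Eric describes: more parity segments buy more tolerated failures at the cost of more raw capacity per object.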
Adrian Herrera: Yeah, absolutely. And we are seeing erasure coding being used in the block-based world too, but we wanted to give you a high-level view of the popular ways that data is protected within the storage architectures.
Now we jump into use cases, again, we’re going to show you how these architectures flow into popular use cases. Again, this is not meant to be an exhaustive list. This is just meant to give you an example of how the architectural differences and the interfaces are applied for workloads and use cases. So Augie take this one away.
Augie Gonzalez: Yeah, the most fundamental visualization of block storage would be the internal drives on a laptop, or PC, or a server. If you’re dealing with a physical machine, you may have something like a SQL Server engine that’s basically talking directly to disk. So, SQL Server is the database underneath the billing application that a finance team might be using. Generally we refer to this as direct attached storage because there’s no network. I think that’s a distinguisher.
In the next picture directly below it, you now elaborate to a storage area network, where you have multiple machines accessing a centralized shared resource of block storage. Usually in that case, the central controller shown here has some level of redundancy, because it would otherwise be the single point of failure. So, if that SAN controller were to go down, or that SAN array were to go down, all of those machines that are dependent on that block storage would also fail, miserably. That would not be a good thing.
In the next center piece here, we’re showing an example of DataCore software being used to build out a hyper-converged infrastructure, where you’re essentially combining some of these concepts of the internal storage in a server that is distributed and replicated between multiple nodes for redundancy, yet you have complete access to them. So the virtual machines on the left can get to disk on the right, and similarly from the reverse side. And so, it creates a pool of resources in that way, but they’re all internal to the systems. Yet they are accessed over, often, iSCSI connections or Fibre Channel between the two clustered nodes. These might be ESX servers, for example; VMware or Hyper-V in the Microsoft world could be two examples.
In the third picture, it’s a more nuanced view of how block storage is being applied, where we’re actually aggregating or pooling different SAN arrays as well as internal storage to make them look like a diverse collection of different tiers of storage that the software makes some decisions about based on the workload. So, we have this notion of virtual disks that are presented up to the applications, so to the virtual machines or the operating systems up there. And then, based on maybe things like access frequency, the software makes a choice: “Oh, I think, seeing as how this data is being hammered, and it seems to be a very high priority item, I will prefer to keep its blocks on the tier one storage.” Things that are seldom used gravitate to the right, to the cheaper, higher density storage, which might be tier four. And that pooling is part of the block storage virtualization that DataCore brings to the party. That kind of gives you a quick idea about our SANsymphony product that was shown in the earlier slide.
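The tiering decision described above can be sketched as a simple ranking by access frequency. This Python example is an assumption-laden toy, not the SANsymphony algorithm: the function name `assign_tiers`, the block names, the access counts, and the tier capacities are all made up for illustration.

```python
from typing import Dict, List


def assign_tiers(access_counts: Dict[str, int], tier_capacity: List[int]) -> Dict[str, int]:
    """Place the most frequently accessed blocks on tier 1, then fill lower tiers.

    tier_capacity[i] is how many blocks tier i+1 can hold; the last tier
    takes whatever is left over.
    """
    # Hottest blocks first.
    ranked = sorted(access_counts, key=lambda b: access_counts[b], reverse=True)
    placement = {}
    tier = 0
    used = 0
    for block in ranked:
        # Move to the next (cheaper, denser) tier once the current one is full.
        while tier < len(tier_capacity) - 1 and used >= tier_capacity[tier]:
            tier += 1
            used = 0
        placement[block] = tier + 1   # tiers are numbered from 1
        used += 1
    return placement


# Hypothetical access counts observed over some window.
access_counts = {"blk-a": 950, "blk-b": 3, "blk-c": 410, "blk-d": 0, "blk-e": 57}
placement = assign_tiers(access_counts, tier_capacity=[1, 2, 2])
```

With these made-up numbers, the block that is "being hammered" lands on tier 1 and the seldom-used blocks gravitate to tier 3.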
Adrian Herrera: Now we talk about file storage?
Eric Dey: Yeah. So file storage looks a little bit different. We’ve kind of hinted at these paths that you need to know. So the path that engineering uses to get to their files might be slash engineering, slash development team in, say, Bulgaria, and a different group, maybe regional, in sales has their own file share somewhere. These appear as discrete file servers. One has to know which machine to go get that file from, and perhaps you are cordoned off from being able to access another department unless you’re given special permission. So there’s kind of a segregation that tends to occur here. In the general case, that segregation also tends to lead to silos of the storage underneath.
So, engineering’s got their bucket of 40 terabytes in this example. Sales has their 60 terabytes; they tend to be bigger consumers, I guess, in this case. Same thing with the others. One of the things that DataCore brings to the party is the idea of a global file system. And somebody was asking about things like [unintelligible 00:53:53] distributed file system. This is where we essentially blend all of these and collect all of what appear to be independent shares, like engineering, sales, marketing; they roll up as directories under this global uber entry point. And that’s what you mount as a user or as an application. From there you can navigate to any of the other resources without having to know the physical device they sit on. That’s really important; this is really good for collaboration.
The other thing that happens here is we’re able to aggregate the collective capacity of all those individual silos and turn them into one big pool. And so, even if engineering is not consuming as much and sales actually needed a little bit more capacity, they could draw on that pool based on their current needs and not feel like, well, I’ve got to add more to mine even though engineering has got space, or the reverse. So this is a way to kind of combine the two concepts of this centralized, virtualized view and split it out over multiple sites. And we’ll do some other webinars on this to elaborate.
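A small Python sketch shows why pooling the silos matters. The 40 TB and 60 TB capacities come from the example above; the "used" figures and the 5 TB request are hypothetical numbers chosen only to illustrate the arithmetic.

```python
# Capacities (TB) from the example; "used" values are assumed for illustration.
silos = {
    "engineering": {"capacity": 40, "used": 25},
    "sales": {"capacity": 60, "used": 58},
}


def fits_in_silo(silos: dict, dept: str, request_tb: int) -> bool:
    """Can the department satisfy the request from its own silo alone?"""
    silo = silos[dept]
    return silo["capacity"] - silo["used"] >= request_tb


def fits_in_pool(silos: dict, request_tb: int) -> bool:
    """Can the request be satisfied from the combined free space of all silos?"""
    free = sum(s["capacity"] - s["used"] for s in silos.values())
    return free >= request_tb


# Sales needs 5 TB more: only 2 TB is free in its own silo,
# but 17 TB is free across the pooled capacity.
sales_in_silo = fits_in_silo(silos, "sales", 5)
sales_in_pool = fits_in_pool(silos, 5)
```

In the siloed case sales would have to buy more storage; in the pooled case the request is absorbed by engineering’s unused space.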
Adrian Herrera: Yeah, and that’s what I’ll point out: we have a lot of great information at datacore.com, in the resources section, on a lot of these concepts. This was just meant to be a very, very high-level overview of all three architectures. And then we segue into object storage use cases. So Eric, do you want to describe what’s going on, or do you want me to take it?
Eric Dey: Why don’t you take it here Adrian.
Adrian Herrera: All right. So, from an active archive perspective, you’ll see it’s a little different than what Augie was showing. You see Swarm in the middle; Swarm is DataCore’s object storage solution. It’s software-defined, you install it on any commodity x86 server infrastructure, and it creates a massively scalable pool of storage, as we were discussing earlier. You see Swarm just plugs straight into the workflow. You have your HSM or your data mover application on the left; you know, that’s plugging into your block or file solution. You see block or file: that can be SANsymphony, that can be vFilO, or that can be any NAS that you have.
And then you have your users and applications on one end creating the content and storing it on block and file, and the data mover or HSM moves it to Swarm. Sometimes the block or file solution can write directly to Swarm if it supports the S3 protocol. And then that’s where Swarm stores it and protects it. It can be sent to a disaster recovery site, as is shown there on the bottom. And users have direct access to it; it doesn’t just have to be a human user, it can be an application. So the application, if it has that unique ID, can just call that data, call the metadata, and display it in a lot of different ways.
So, even though you don’t know it, a lot of the different services that you interact with on the internet are architected in this way. Something is creating the content and storing it on object storage. And then there’s some sort of end-user interface or application that is calling that data and presenting it to you in a very rapid way over the internet. There’s a lot of the value of object storage built into that workflow.
The second use case is video streaming and video on demand. It’s very similar to what we saw in the active archive use case. You can see that data is being captured, recorded, produced in the block and file world. And then once it is done, once it’s ready for consumption, a lot of the time that’s when it moves into what’s called an archive. And that term archive is kind of a misnomer these days. You think of archive, and a lot of people’s minds go to a cave somewhere, or maybe that last scene in Indiana Jones, right, where they have all the stuff that they’re just throwing in a warehouse. These days the archive is continuously accessed in a very distributed way, for very good reasons.
Because humans are interacting with data in a lot of different ways: they’re interacting on their mobile devices, their iPads, their tablets, they’re interacting even over telephone systems calling different audio files. So, object storage is a very good way to do that. Here in the video streaming world, since the native interface for object storage is HTTP, you can deliver video directly from the object layer itself. So, a lot of the time object storage is being used to deliver video directly to subscribers. Organizations like Netflix, like Disney, all of the major video-on-demand services out there are using object storage in one way, shape or form.
And finally, we’re getting to the end here, so if you have any questions, you can go ahead and start writing them in the question dialogue box. But this is one of the wrap-up slides: how do you determine your storage needs? You really need to detail your workflow. You need to have an understanding of the application and performance requirements, your availability and durability requirements, your accessibility, and the network you have to transfer data. A lot of the time that’s where you would need to bring storage in house, on-prem, because you don’t have consistent bandwidth out to your cloud storage provider, your cloud provider.
You need to forecast growth of the capacity and performance needs. We do recommend consulting with trusted advisors; those are your VARs, your resellers, your SIs. A lot of the time those organizations have spent a lot of time talking to vendors, understanding the differences and understanding a lot of what we covered today. And again, we do encourage evaluations and proofs of concept, depending on your particular workloads and workflows. Anything to add here, guys?
Augie Gonzalez: I think you nailed it.
Adrian Herrera: All right. So, for resources and next steps, here’s what we recommend. We were kind of between technical and business today, but if you do want a technical deep dive going down into how data is actually addressed via these different architectures on the specific storage media, I do recommend watching SNIA’s file versus block versus object webinars. You can just type in “SNIA file versus block versus object” and this link will pop up. We do encourage you to go to our events section, where we will have all of our upcoming webinars. We also have a library of previously recorded webinars for you. If you do have specific questions and we didn’t have time to get to them today, you can ask questions now, and we’ll see what we can get to. But if we didn’t answer your question or if you want to ask something more direct, please email us at firstname.lastname@example.org and we will respond as quickly as we can.
And we’ll open it up for questions, we’re going to go ahead and leave it on this slide. So, you know, just wrapping it up, Augie and Eric, is there anything that we didn’t touch upon, are there any recommendations you want to give to the audience out there on how to get started? Maybe we’ll start with you Augie.
Augie Gonzalez: There’s one area that we didn’t touch much on, which is the financial aspect of this. Among those tradeoffs, we mentioned budget, but it also tends to be just plain old, how much is the per-terabyte cost? The cost of each of these solutions tends to be higher on the left and decreasing to the right, just because of the nature of it, the density of how much capacity is kept per unit of storage and the price per terabyte for that. But that is commensurate with the need, the urgency and the regularity with which you need to get high transaction rates, versus something where you’re looking more for ease of management. The content is more static on the far right, and so that’s intrinsic in the tradeoff decision you’re making.
Adrian Herrera: And Eric any comments there, or how to get started, or what we didn’t cover?
Eric Dey: Yeah, so the economics is definitely something that we have other webinars on. And that’s actually a very important distinction for choosing among these. It deals with the scale of deployment and the needs of the application using it.
Adrian Herrera: Yeah, we have a question on, and this is all we have time for, but the unified, versus SAN, versus data storage. Do you want to make a comment on that Augie? I know we didn’t specifically say unified, but how does unified storage fit into the mix here?
Augie Gonzalez: Unified tends to be a blend; it’s typically an array or system that has two heads to it. On one side you can access it as a block storage device; through another protocol and access method, you get the file system. Internally, most of these tend to be either a block storage system that’s emulating a file system, with a file system head, or the reverse: it’s a file system natively and it exposes chunks of information as volumes.
So, it’s trying to compress the functionalities into a dual-purpose system. And some are adequate at doing that, but generally, our approach has always been to provide best of breed for the access method and architecture you’re looking for.
Adrian Herrera: All right, and we did get a question on Hadoop earlier. Eric, you kind of touched upon Hadoop.
Eric Dey: Yeah, so Hadoop kind of goes back to that AI/ML case and where it is appropriate. I know Hadoop is used a lot of times with file, which tends to put NFS in as sort of the “I don’t want to think about storage, just show me the NFS share where I can go to get at it.” And really what’s happening there is, you’re saying Hadoop is a distributed system and you need these independently acting operators to go out to some shared pool of storage, and they need a shared interface to it.
One of the things to weigh when choosing file versus object in that case is that you do have these independent operators trying to come through a file interface, and they’re sort of needlessly traversing down a directory tree that doesn’t need to exist. They don’t really need a directory tree, but you’re still paying the overhead of traversing through one. Whereas, if they came directly to object in that case, they know the key, and the key they have is what looks like a full directory path. And they could use the key to get directly to the content they need to get a hold of.
And so, that’s actually kind of the case where file is often used, but that’s just because it’s the easiest, most readily available interface that most people have in house already. Whereas what you see in the bigger deployments that you might read about is that people are using an object system for AI/ML, and the reasoning for that is that you have this concept of a lot of parallel access. And they don’t really need the file system hierarchy; that just sort of gets in the way for them there. What they really know is the key to the thing they want, and they need to go process it and then store some result based upon that.
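The difference between walking a directory tree and going straight to a key can be sketched in Python. This is a simplified model under stated assumptions: nested dictionaries stand in for a file system hierarchy, a flat dictionary stands in for an object store’s key lookup, and the paths and function names are hypothetical. Real systems resolve paths and keys very differently under the hood; the sketch only counts lookups.

```python
from typing import Any, Tuple


def tree_lookup(root: dict, path: str) -> Tuple[Any, int]:
    """Walk a nested directory structure one component at a time."""
    node = root
    steps = 0
    for part in path.strip("/").split("/"):
        node = node[part]     # one lookup per directory level
        steps += 1
    return node, steps


def key_lookup(store: dict, key: str) -> Tuple[Any, int]:
    """Fetch an object directly by key; the key merely looks like a path."""
    return store[key], 1      # a single lookup, no hierarchy to traverse


# The same content, addressed two ways (hypothetical names).
tree = {"datasets": {"images": {"train": {"cat-001.jpg": b"pixels"}}}}
flat = {"datasets/images/train/cat-001.jpg": b"pixels"}

via_tree, tree_steps = tree_lookup(tree, "/datasets/images/train/cat-001.jpg")
via_key, key_steps = key_lookup(flat, "datasets/images/train/cat-001.jpg")
```

Both calls return the same bytes, but the tree walk pays for every level of the hierarchy, while the keyed lookup goes directly to the content, which is the overhead Eric is describing for many parallel workers.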
Adrian Herrera: Well, that concludes the webinar. I want to thank everyone for their time, especially the viewers out there, thank you so much for watching our webinar. Augie, Eric, thank you so much for your time. Again, if you have any questions, please feel free to email us at email@example.com. So, Augie, Eric, any closing statements before we wrap up here?
Augie Gonzalez: Thank you.
Eric Dey: Just remember it’s important to know the application use case because these all have an important place in the world, file, block, or object. And it depends on what’s optimal for what you’re trying to do.