One is a type of storage architecture (object storage), one is a protocol (S3), and one is a storage service (cloud storage), but all three are related and often confused by end users.
So what are the differences and how do you leverage each to solve your specific data storage and access challenges?
In this webinar David Boland, Product Marketing Director from Wasabi joins Adrian Herrera, Product Marketing Principal, Object Storage from DataCore to discuss these technologies, evolving market requirements, and how all three are reducing storage TCO while keeping petabytes of data instantly accessible.
Adrian Herrera: Hello everyone, welcome to our webinar today. The topic is: what is the difference between object storage, S3, and cloud storage? With me today I have David Boland, Product Marketing Director for Wasabi. I, of course, am Adrian J. Herrera. But David, I want you to introduce yourself, tell everyone what your background is, and then we’ll jump into the content after that.
David Boland: Absolutely. Thanks AJ, and thank you for having me here today. I’m really excited about this webinar; I look forward to sharing information with the crowd and learning more about DataCore. I know we’ve been partners for a long time now and it’s a relationship that I value, so I’m happy that I could be here. Like you said, I’m the director of our product marketing team here at Wasabi. We are a cloud storage company; we’ll talk more about that in a moment. Wasabi was founded in 2015 and our service went live in 2017, so we’ve got almost five years’ worth of service live, up and running.
I’ve been here for almost three years of that. Prior to Wasabi, I was at NetApp, the storage vendor, for a few years, and before NetApp I spent a lot of time on the network side. I spent a decade or so at Juniper Networks, and before that it was Lucent Technologies, and then Ascend Communications and Cascade Communications in the days of frame relay and ATM. I got my start on the network side back in the late eighties and early nineties at a little Ethernet company called Cabletron Systems, building Ethernet switches. So the majority of my life has been spent on the network side, and it translates well into cloud and transferring data to and from the cloud. And the last few years have been all storage.
Adrian Herrera: Yeah, so you have very relevant experience, and that’s what I wanted the viewers to hear: the relevant experience you had in your past, so they know where your point of view is coming from. I also come from a pretty strong cloud storage background; I came through the acquisition of Caringo, that’s how I came to DataCore, just a few months ago at the beginning of the year. Before Caringo, I was on the founding team of a company called Nirvanix, which was one of the first enterprise cloud storage services, the second generation of storage services. This was way, way back in 2005, 2006, so it seems so long ago.
David Boland: I remember Nirvanix.
Adrian Herrera: Yeah, yeah. I was also part of direct-to-consumer cloud storage services and media organizations, with Yahoo for a bit and [unintelligible 00:02:57], so I come more from the media and service side, then transitioned to enterprise storage. I just wanted everyone to know where our experience and points of view are coming from. Because this is a thought leadership series, feel free to ask questions throughout; viewers, feel free to type in your questions. David and I like to have conversations through these webinars; we’ve done a few together. So if you have a question on a particular topic, just go ahead and type it into the chat window. And as always, David, feel free to ask me questions; I will ask you questions throughout.
But, being part of a thought leadership series, the first webinar in this series went over object, file and block storage, architectures and interfaces, and why interfaces don’t always give you the value of the underlying architectures. One of the other things that we often see in the market is confusion between object storage, cloud storage and the S3 protocol, and David, you and I have had conversations about this before. It seems so clear to us because we’re so close to it, but the people I talk to are confused, users and partners are confused, and I’m sure you experience the same thing.
David Boland: Every day, every day. I get asked the difference between the S3 protocol and the S3 service, and the difference between the different types of cloud storage out there. So yeah, every day.
Adrian Herrera: Yeah, and hopefully we’re going to clear a little bit of that up today. This is what we hope you will learn: the differences between object storage and cloud storage; the differences between a RESTful interface and standards-based interfaces; and the difference between an S3 service and the S3 protocol, and I think that one’s big. We covered that pretty extensively on the OnPrem side; now we have an expert like David here to walk us through the differences on the service side. And then how to determine the right solution or service for your specific needs. We’ll go over a real light matrix to help you with your decision making to fulfill your requirements. And with that, let’s just jump right into it.
We want to set the stage here and provide a real easy visual to show you the differences. Starting at the bottom there, there’s object storage in that teal color, then cloud storage. Almost every major cloud storage service runs on object storage, but there are a lot of value-added services layered on top of cloud storage, and we’ll go over that; David will do a deep dive in his section. And then there’s an interface: how you interface with a service, how you interface with the storage. We’re going to be talking specifically about the S3 API, but almost every storage service out there, at least when they first started a while ago, had their own proprietary API. Then the S3 protocol, the S3 API, started gaining momentum and popularity, and a lot of service providers and even OnPrem storage solutions just started supporting it natively. But again, we’ll explain that. David, anything you want to add here from a level-setting, foundational perspective?
David Boland: Nah, you covered it well Adrian, I think we’ve got a lot to talk about, so I don’t want to add anything else to our menu.
Adrian Herrera: Yeah, yup. So let’s start with object storage, and again I encourage you to go take a look at the recording that we have in our webinar section; you can go to DataCore.com, in the resources section, and look at the webinars. There’s one that goes over this topic in detail. Block is really meant for high transaction rates and high rates of change for data segments. You can think of it as a raw volume that’s presented to operating systems and to systems as a whole, like databases, that kind of thing. Then you have file. File was really meant for collaboration and portability for frequently changing files. It was designed in a way that is easy for humans to understand; putting stuff in directories is very similar to putting stuff into folders.
And those concepts were needed when storage technology was first founded. Then you have object. Object grew up a little bit after file, and it was really meant to provide one massive, scalable pool of storage, focusing on ease of management and access at scale. That’s why a lot of cloud storage services use object as the underlying storage technology: when you store something, you get a key, and all you need is that key to retrieve it. Of course, from a cloud storage perspective, there are a lot of other value adds layered on top of object storage from an operations perspective; the cloud storage service manages the data and everything around it.
And again, David will go into that, so I don’t want to go too far into that. But we want to set the stage with the different storage architectures because this is a slightly deeper dive on object storage and cloud storage.
David Boland: Hey AJ, can I throw out a couple definitions here?
Adrian Herrera: Sure.
David Boland: Yeah, so if we go back one slide just for a second: we talk about block, file and object, and I want to talk about structured data and unstructured data. In the case of block storage, you may see definite examples of people saying, well, I used block storage for my structured data. And structured data is really going to be anything that fits well into a database, right? So you’ll see block storage used with databases a lot. If you get into more unstructured data, that could be anything from voice recordings, or video recordings, or PowerPoint presentations, or x-rays; anything that doesn’t fit well into a database is a great fit for file or object. So you’ll see structured and unstructured data using different types of storage. Thanks Adrian.
Adrian Herrera: Yeah, yeah, absolutely. And again, we do a deeper dive and go into datasets too in that webinar, so if you’re interested in more of what David said, we had some product managers [unintelligible 00:09:12] product managers from DataCore do a real deep dive into the different types of data and the use cases, so that’s a really good webinar. From the object storage perspective, again, we talked about key value; I don’t want to spend too much time on this because we have a lot of great information on our site. But the point to drive home here is that there is a very easy way for an application to access data. Data is stored in a massive pool of storage; it was really designed with applications and scalability in mind, and that’s the point to drive home.
And when we talk about scalability, when we talk about scale, we’re talking billions, hundreds of billions of files; we’re talking thousands to millions of tenants; we’re talking anywhere between petabytes, hundreds of petabytes, even approaching exabytes of data. Those are the kinds of scale that object storage was designed to handle. And of course you have the associated metadata with the object storage, the data about the data. In the file system world, sure, you can have data about the data, but usually it’s stored in an external database; with a lot of object storage solutions, at least the best-of-breed ones, the metadata is just part of the object itself. Again, if you want more information on object storage, you can go take a look at DataCore.com.
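The key-based access and object-embedded metadata described above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor’s implementation; the class and field names are made up for the example:

```python
import uuid

# Toy in-memory "object store": store bytes plus metadata, get back a key.
# Metadata travels with the object itself, not in an external database,
# which is the point being made about best-of-breed object storage.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        """Store data and its metadata; return the key used to retrieve both."""
        key = uuid.uuid4().hex
        self._objects[key] = {"data": data, "metadata": metadata}
        return key

    def get(self, key: str) -> dict:
        """All you need is the key: no directory path, no volume, no filename."""
        return self._objects[key]

store = ToyObjectStore()
key = store.put(b"x-ray scan bytes", {"patient_id": "12345", "modality": "x-ray"})
obj = store.get(key)   # returns both the data and its metadata
```

Notice there is no hierarchy to navigate: the flat key space is what lets real object stores spread billions of objects across many nodes.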
But from a very high level, here’s why users consider object storage. I put Swarm object storage because it’s our product, but from a very high level, when you take a look at why organizations purchase object storage and run it within their data center, it’s really because they need to manage billions of files, petabytes of data and thousands of tenants with limited staff. We have a number of different organizations that are really struggling with providing distributed access to content and data, and we’ll show you how you can do that in a distributed way in a bit. Often organizations require the scalability of cloud storage, but the data needs to remain onsite.
Whether it’s for some sort of regulation or some privacy concern, if there’s a requirement to keep data onsite, or maybe there isn’t consistent, persistent high-speed bandwidth, they need to keep that data onsite and within their own data center. Or maybe they already have an archiving solution, maybe they put something on tape, but they need to access that data now. I think one thing that recent events in the world have shown us is that you just can’t lock data up anymore. You can’t put it in a closet, you can’t put it in a vault and expect to be able to access it quickly. Sometimes you won’t be able to get to that vault or that closet or wherever you have your archives. So you really need to think about keeping that data online and accessible.
And those are really the primary reasons why organizations are using object storage today. David, I know this is an object storage slide, but do you have anything to add there?
David Boland: Yeah, you know, I just want to point out that sometimes when we say that there are billions of files or billions of objects, that’s not marketing, that’s the truth. Honestly, we’ve got customers that have hundreds of millions into the multiple billions of objects stored in different buckets. Think about your average object size; everyone’s object or file size is different, it all depends on what you do with them. If you have, say, an average one-megabyte object that you’re storing, one petabyte is a billion objects. So if you’ve got two petabytes, three petabytes, you’re looking at two or three billion objects that you have to manage and store. Like AJ said, nothing’s better at doing that than object storage, and I’m a big fan of Swarm object storage for managing billions of objects and petabytes of data.
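David’s back-of-the-envelope math checks out and is easy to reproduce. A quick sketch, using decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) as storage vendors typically do:

```python
# How many objects fit in a given capacity at a given average object size?
def objects_per_capacity(capacity_bytes: int, avg_object_bytes: int) -> int:
    return capacity_bytes // avg_object_bytes

PB = 10**15  # decimal petabyte
MB = 10**6   # decimal megabyte

one_pb = objects_per_capacity(1 * PB, 1 * MB)    # 1_000_000_000: a billion objects
three_pb = objects_per_capacity(3 * PB, 1 * MB)  # 3_000_000_000: three billion
```

The same arithmetic shows why average object size matters so much: at a 100 KB average, the same petabyte holds ten billion objects.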
Adrian Herrera: Yeah, it’s funny when you write billions; it is hard to conceptualize billions, but you’re right, we have many customers storing tens of billions of files. And there’s no reason the solution can’t store hundreds of billions to trillions of files. I was just on a call yesterday and someone said, hey, we should put billions of billions; I’m like, well, we could put trillions, right? But that’s the scale that object storage can handle, and the reason it can handle that is because it doesn’t have some of the limits of an underlying file system. Again, there’s a deeper dive on the architectures and a lot of good content out there if anyone’s interested.
If anyone’s interested, we’re going to put our email addresses at the end, email addresses for the organizations, where you can just go ahead and email us for more information. I wanted to do a quick use case for object storage, just to level set again and show you how object storage is used. You can see the Swarm archive there in the middle; that’s object storage. You can also have different clusters in a different site, in a different DR site. One of those DR sites can be Wasabi: you can just go ahead and set that up in the UI. You have your access key and your secret key, you just drop them into the UI and specify what datasets you want replicated to Wasabi, and it goes to Wasabi.
Here this is more of a media and entertainment use case, but this could be any use case. That metadata controller or HSM could be a gateway, it could be a NAS that supports the S3 API; it really could be any data management application. That primary shared storage down there could be, of course, SANsymphony, another DataCore product. I should point out that the metadata controller or HSM could also be vFilO, so there are a lot of different ways to architect solutions that utilize object storage. And of course you can write directly to object storage; that’s the application there on the left. Having integrated the Swarm API, or even the S3 protocol, it can write directly to the Swarm archive. And then you get the benefits of the scalability and the automated management.
You can provide internal archive access directly: there is a content portal built into the Swarm archive, and users can just log in and only have access to the datasets that they want. Again, you can send a dataset out to Wasabi for disaster recovery purposes. If anything were to happen to that primary Swarm archive, or even to your other disaster recovery site within your data center, you can recall that content back from Wasabi, just flip the switch and rebuild everything. So it makes it very easy, very flexible, and of course the data is accessible. This is a tape replacement use case; there are lots of other use cases. But in the tape replacement use case, usually that data would have been stored on some form of tape archive; to access that data you needed to go get the tape, bring it back online, find the piece of content and then make sure it’s accessible.
Here, in this world, content remains accessible. It’s always on disk, you have metadata so you can easily search for it, and you’re always protected because you have your disaster recovery site and your secondary or tertiary copy up on Wasabi. So again, there are a lot of different ways to utilize object storage, but I just wanted to give you an example of one use case. And then do a deeper dive on one of our mutual customers, one that Wasabi and DataCore both have as a customer, and that’s Kinetiq. They used to be iQ Media, then they were purchased and became Kinetiq. Kinetiq has a media intelligence platform: they record shows and commercials, and organizations that are interested in how their brands or messages are performing in particular markets go to Kinetiq.
And they say, “Hey, I want you to run this analysis in this specific region to let me know how I’m doing.” That means they need to record and collect a lot of content, a lot of unstructured data, like David was talking about. Furthermore, they need to keep it accessible in perpetuity, so they are experts at recording and collecting data and at analyzing that data after the fact. From a case study perspective, when they first started, S3 wasn’t huge; they had to make do with the storage technologies that were popular at the time. That meant they built their solution on Windows File Server, and Windows File Server is based on a file system; it had some scale limits.
So they wanted to take advantage of this new storage, object storage, S3 storage, years ago, but they needed to do it in a way that they could plug into their existing file system and file server workflow. That’s when they did their search and came across what back then was Caringo Swarm, today DataCore Swarm. But the benefit is the same: they were able to plug into their existing workflows, they were able to reduce their RAID rebuild times and move away from RAID protection schemes altogether, because the objects are protected via replication and erasure coding. And they were able to scale. It just shows the automation that’s built in, the data protection functionality that’s built in, and how it was applied to a traditional workflow by leveraging some of the concepts that we spoke about on the object storage side, and some of the concepts that we’re about to talk about on the cloud storage side. They also use Wasabi for some of their cloud storage needs. I believe they do, I didn’t –
David Boland: Yup, you’re absolutely right, they do. One of the things I love about Kinetiq is that the guys have a fantastic IT architecture, which I know we’ll talk about later: a hybrid cloud architecture where you keep your hot data OnPrem, easily accessible, fast and performing, and then you can move your colder data to the cloud for cheaper, deeper storage or for offsite backup. This case study is just a fantastic usage of DataCore OnPrem, Wasabi and the cloud, and if anybody has any questions about how this operates, just reach out to Adrian and me and we’ll go into some detail on it. This is a great hybrid cloud case study.
Adrian Herrera: Yeah, and I think why I like it is because they’re doing some cutting-edge stuff on the analysis side and in the service that they provide to their users. But they’re also experiencing a lot of the issues that most in the IT world are experiencing. You purchase certain technologies, and the ecosystem, the landscape, changes so fast; you make an investment in particular infrastructure, and then three or four years down the road, something else comes up. How do you leverage those advancements? What do you do? I think Kinetiq did a really good job of still extracting as much as they could from their initial investment and then creating a nice transition, a nice bridge, to some more current underlying storage technologies, foundational technologies.
And that’s what I think a lot of these services and products do these days: they bridge the gaps and help you migrate, or create a bridge, from traditional technologies to these more current technologies and ways of doing things, more service-based approaches, I should say. And with that – so I know we put RESTful interface in between the cloud storage section and the object storage section. The reason is that the main way to interact with object storage is over HTTP via a RESTful interface. All object storage solutions, if they don’t have their own proprietary API, have their own implementation of the S3 API, and we’ll go into that in a bit. But we wanted to define what a RESTful interface is, because when a lot of people hear RESTful there’s just a lot of confusion.
It’s like, what is it? Well, REST stands for representational state transfer; it’s really a set of architectural constraints. So it’s not a protocol, it’s really a framework. I put standard there, but it should be framework. It’s a framework for how requests should be made over HTTP in a stateless client-server environment. As a framework there are some guidelines: it needs to be lightweight, it needs to be fast, it needs to be scalable. And because it’s going over HTTP it needs to be fault tolerant, so it needs to expect issues and gracefully work through them; it’ll just keep on chugging and working. It’s basically the way that you talk to services these days.
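What "stateless requests over HTTP" looks like in practice can be sketched without any network code: each request carries everything the server needs, with the HTTP verb expressing the intent and the URL path identifying the object. The bucket, key and endpoint names below are invented for illustration:

```python
# Sketch of a stateless REST request to an object store. No session, no
# open file handle: the verb + path + headers are the whole conversation.
def build_request(method: str, bucket: str, key: str, body: bytes = b"") -> dict:
    assert method in ("GET", "PUT", "DELETE", "HEAD")
    return {
        "method": method,
        "path": f"/{bucket}/{key}",          # the object is addressed by URL alone
        "headers": {
            "Host": "storage.example.com",   # hypothetical endpoint
            "Content-Length": str(len(body)),
        },
        "body": body,
    }

put_req = build_request("PUT", "media-archive", "show-0427.mp4", b"...")
get_req = build_request("GET", "media-archive", "show-0427.mp4")
```

Because no state lives between requests, any server behind the endpoint can answer any request, which is what makes the model fault tolerant and easy to scale out.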
Back when S3 first came out, most organizations were plugging into POSIX-compliant storage solutions via standard storage protocols, like, I don’t even want to say it; I don’t even think it was SMB back then, David, I think it was CIFS, right?
David Boland: Yes, I don’t think you can say that word anymore; I think it’s no longer popular to say CIFS, you’ve got to say SMB.
Adrian Herrera: Yeah, exactly, Microsoft’s going to knock on my door or something. But yeah, the reason is that there needed to be standards for applications to talk to the underlying file systems. Without those types of protocols, the ecosystem would not have grown; you would not have had innovation on the application side. It was a little bit different when RESTful interfaces came out, because then the innovation really moved to the service side, and if you had put constraints on RESTful interfaces when they first came out, I think you would have stifled innovation from the services perspective. And that’s setting the stage for what’s coming next. So David, do you have anything to add to this before we jump into your section here?
David Boland: No, you covered it pretty well. I sometimes like to use a higher-level metaphor for how an API works, because I know some of the people joining today are not deeply technical, so they may still be confused about what an API actually does. You did a great job, Adrian, of explaining it, but I’m going to go a little bit higher and use this metaphor I’ve seen used a couple of times in the past. We’re going to use a restaurant as the metaphor: you’re in the process of ordering food at a restaurant, you’re sitting at the table, and you’re given a menu of choices to order from. Well, the kitchen is the system. Right?
They’ll prepare what you order, but how that kitchen understands what you’re going to pick from the menu is important, and that’s where the waiter comes into play. In this example the waiter is the communicator; the waiter is the API. He takes what you choose from the menu and tells the kitchen what needs to be prepared, and then the waiter takes it from the kitchen back to you at your table. So in this example, the kitchen is the application that has things, the restaurant customer is the application that needs things, and the menu is the list of API calls, the things the kitchen can make for you or deliver to you. And the waiter is the programming interface that communicates back and forth between the customer and the kitchen.
So it’s a higher-level overview, the restaurant metaphor, but I find it helps people better understand how an API works if you put food in front of them.
Adrian Herrera: Yeah, yeah and you also made me hungry. So –
David Boland: Sorry Adrian. I know it’s lunchtime for you over there.
Adrian Herrera: Yeah, definitely. It isn’t even breakfast time here yet. So let’s jump into cloud storage. We said object storage is the underlying architecture, the storage technology. We talked about the RESTful API, the framework setting the stage for how you interface with both object storage solutions and cloud storage solutions. So now we’re jumping into cloud storage. With that, David, why don’t you explain to us what cloud storage is?
David Boland: All right, this is simple. Storage is storage for the most part, but where it sits, and who owns, operates and manages it, are different things. You could have block storage, file storage and object storage OnPrem, like AJ talked about and like that first webinar covered. Or you could have it sitting in the cloud. I’ll use AWS as an example; they’re the granddaddy of all cloud storage. They launched their service in 2006, originally with S3 object storage. Time went by and they delivered file storage, EFS, and FSx for Lustre and FSx for Windows, and they also have block storage, EBS. So file, block and object can be OnPrem or in the cloud; it’s really about who owns it and who operates it. In the case of cloud, on-demand storage should be there when you ask for it.
You can ask for it honestly; if you wanted to spin up a couple of terabytes today, you could probably get that done in 15 or 20 minutes, right? And when you were done with it, you could be done with it. But between us, there’s not a lot of elasticity in cloud storage – well, I take that back, not a lot of elasticity in cloud object storage. You pretty much know what you need and it’s not going to deviate too much; it may grow, you may delete some stuff over a period of time, but you don’t really spin it up or spin it down like you can with compute or block. It’s just there for you when you need it, and as Adrian covered in the RESTful interface section a couple of minutes ago, it’s accessible via HTTP. Super simple to use, people understand it, it’s just there.
So the infrastructure is managed by the service provider: AWS, Azure, Google, Wasabi, whoever that cloud service provider is. That’s who owns it and operates it. So you, as the end user, don’t have to worry much about patches, or releases, or data migrations. As AJ said earlier, one of the nice things about the Swarm piece is it’s easy to understand, it’s easy to use and you can do it with a limited staff. If you have even tighter staff constraints, cloud storage is a good option for you too, because you don’t really need to spend much time worrying about data management for cloud storage. The application communicates with the cloud storage via the API, so once you set it up, it should be easy to use and walk away from.
And then from an economics perspective: when cloud storage started in 2006, was it less expensive than OnPrem? No, it really wasn’t, and in many cases it’s still not; it’s sometimes more expensive than your OnPrem storage, depending on who you use and what you use. But there is the option of pay-as-you-go pricing, so you don’t need to worry about buying a lot of storage upfront and then hoping you have the capacity as time goes by. If you only need 50 or 100 terabytes, you buy 50 or 100 terabytes; you don’t need to buy 200 or 250 knowing that you’re going to grow into it. All this really does is help you reduce operational costs if you are consolidating data centers, or if you have a bunch of old storage out there, or, as Adrian mentioned, tape.
Right? And you say, OK, I don’t want to rack and stack any new LTO-9 stuff, what do I do? Well, you can migrate that tape to Swarm, you can migrate it to Wasabi. Hell, you could use DataCore vFilO to figure out which data is new and which data is old and move that old data off-prem to the cloud. And the vFilO product has great [unintelligible 00:32:00] and compression technology that allows you to reduce the amount of storage you need in the cloud, so you can reduce your operational cost really easily just by having a hybrid cloud solution comprised of OnPrem storage, applications like vFilO, and cloud storage like Wasabi.
Adrian Herrera: Yeah, I guess the point is it’s really dependent on what your requirements are, and we’ll give everyone a nice little table to at least help conceptualize: how do I compare these different services and solutions, and how do I make my decision? That’s how we close out this webinar. But it leads to the question: OK, we just set the stage for what cloud storage is, so what is S3? Why don’t you take the first shot at describing this and then I’ll come in.
David Boland: Yeah, so this is where the confusion sometimes hits, right? You have S3 that is a service, and it’s also a protocol. I guess you would call it a protocol; it’s a de facto standard of a protocol, or like you said, really more of a framework. The S3 service, one of Amazon’s first services, if not the first, was S3, the Simple Storage Service. It’s object storage in the cloud offered by Amazon. And it was designed in 2006 really as – was it file sharing? I don’t know if it was designed as a file sharing solution, but it was more of a Dropbox or Box type solution where you could put big files up into the cloud so that other people could download them. One of the first use cases was some folks who wanted to put satellite video up into the cloud and let researchers pull down that satellite video, or satellite images, and then search for aliens or whatever they were searching for; that was one of the first use cases.
And then the Dropbox type stuff. But it was designed in 2006, long before a lot of the modern use cases that are out there now. And it did a good job, and it still does a good job for hundreds or thousands of customers out there. So it is a good service, but it’s also the protocol that we use to interface with that service, the S3 protocol. We’ll get into this more in a moment, but if you are an independent software vendor, an ISV – let’s say you’re DataCore with the vFilO product, or you’re [unintelligible 00:34:23] or Rubrik or Commvault, or on the surveillance side maybe a Milestone, or anybody in the media asset management space in Hollywood – you have to decide whose object storage you want to interface with. Right?
And the list of object storage vendors 10 or 15 years ago was long, right? You had AWS S3, you had IBM object storage, Azure, Google, NetApp and EMC. Of course you had Caringo, Hitachi, Fujitsu, HPE; you had all these different object storage vendors. And if you wanted to make your software work with all of these, you had to write a different version of the software most of the time in order for your customers to use whatever storage they wanted to. What ended up happening is that most folks wanted to start writing to Amazon, and so people started adopting that S3 protocol. So the S3 protocol, as the API, can also be found on other object storage. If you look at Swarm, or Hitachi, or EMC, or NetApp, or Google, IBM, AWS, Azure or Wasabi, you’ll find S3 protocol support.
So those independent software vendors can now write to other people's object storage and allow their customers to use the storage back end of their choice, that's what it is. Long story short, S3 is two different things: it's the service, and it's the API that allows independent software vendors to have their applications talk to that storage.
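A minimal sketch of what "the S3 API as a common interface" means in practice: S3 is REST over HTTP, so the same client logic can target any S3-compatible back end just by changing the endpoint hostname. The `s3_request` helper and the endpoint names below are illustrative placeholders, not taken from any vendor's documentation; real SDKs such as boto3 add authentication (SigV4 request signing) and headers on top of this basic request shape.

```python
# Sketch: each basic S3 operation maps to an HTTP verb and a path-style URL.
# Only the endpoint changes between S3-compatible providers.

def s3_request(operation, bucket, key=None, endpoint="s3.example-provider.com"):
    """Map a basic S3 operation to its HTTP method and URL (path-style)."""
    verbs = {
        "create_bucket": "PUT",
        "put_object": "PUT",
        "get_object": "GET",
        "delete_object": "DELETE",
        "head_object": "HEAD",
        "list_objects": "GET",
    }
    # Bucket-level operations have no key; object operations append it.
    path = f"/{bucket}" if key is None else f"/{bucket}/{key}"
    return verbs[operation], f"https://{endpoint}{path}"

# The same application logic works against different providers:
method, url = s3_request("put_object", "videos", "frame-001.jpg",
                         endpoint="s3.us-east-1.amazonaws.com")
print(method, url)  # PUT https://s3.us-east-1.amazonaws.com/videos/frame-001.jpg
```

This is why an ISV that codes to the S3 interface once can let customers point the same application at AWS, Wasabi, Swarm, or any other S3-compatible target by swapping the endpoint.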
Adrian Herrera: Yeah, and I guess the point to get across is that S3 the protocol is still owned and managed by Amazon, so all of the advancements come from Amazon. And then it's up to the service and software community to support whatever they come out with; it's a moving target, it needs to be continuously managed. You need to make sure, from a software perspective, that you support the latest and greatest calls – all of the menu options that David was talking about, you need to be able to support. And furthermore, there are different SDKs and different ways to integrate the S3 protocol. From my perspective, going back to 2006 before there was a standard, I think Amazon did the entire industry a great service by spending a tremendous amount of money and effort pushing this type of standard, this type of protocol, training the overall IT ecosystem to really utilize REST interfaces.
It took a lot of effort and a lot of time, and yeah, I think it did the entire industry a service. Of course there were other organizations involved – Microsoft with Azure, and of course Oracle was there, and then Google and others – but yeah, Amazon really was the first. The point I want to get across here is that from the S3 protocol perspective, it is a moving target, and it is a de facto standard, but we say de facto for a reason. Right, it can change, it does change, and just because an application supports the S3 API, it doesn't necessarily mean that it works with every S3 target; you really have to test it and make sure your workflows work and are interoperable. Usually there's just some minor tweak that needs to be done, but you do need to go ahead and validate everything and make sure that it works from a workflow perspective.
That brings us to deficiencies, all right, so what are the deficiencies of S3? Maybe I'll explain this one, David, and then you can jump in. So from a protocol perspective, as David mentioned, it was designed to just be simple storage; it was designed as a ramp to other AWS services. In 2006, Amazon pretty much gave S3 away for free as they tried to push it into the market and train developers to use it for the archival storage behind their applications. That's how they went to market: they really focused on a grassroots campaign where they went and showed developers how to use it, how to put in a link and then access it via a URL. So it was really designed to be a ramp to other AWS services.
It relies on the ecosystem for apps, for content management, for search – because it's simple, the ecosystem is where the intelligence is. So it's a very, very light menu of options. You have your calls, your puts, your gets, different versions; you do have object locking, so it is moving, intelligence is being added, but for the most part it's a relatively simple API, a relatively limited set of API calls. It wasn't designed with granular data durability and replication options in mind. So again, it was pretty simple: you store something, and the way Amazon has architected their service determines how things are protected and stored.
They have lots of different tiers – I think, David, you'll talk about this in a bit, because Wasabi also has advantages there in how you protect data on the back end, and in how you expose those costs to users. If it's costing Amazon something, you are being charged for it, right. So I think that's the point to get across here: as you go higher up from a functionality and feature set perspective, you are getting charged more on the S3 and AWS side. And of course that leads into what's going on on the service side; it leads to a relatively expensive service, right. And it is difficult to forecast costs, and this is a good segue into some of what Wasabi's done. So anything to add here, David, to this slide?
David Boland: No, you did a great job covering the protocols and features; I'll talk more about the service expenses and the difficulty of forecasting costs for sure. Let's just jump right into it – let's go back to the hidden cost slide there, Adrian, and we'll talk about this. So the world was different in 2006, right, I mean I'm old enough – I'm sure that some of the folks listening today probably don't remember dial-up modems, or DSL, or the difficulty of getting network connections to the internet. But everything was expensive: every time you cranked the CPU to do something in the cloud, you were charged for it. Every time you wanted to use any kind of network bandwidth, you were charged for it.
And so with the architecture of the first-generation cloud storage service providers, like AWS, or IBM, or even Azure, they charge their customers for every possible thing that they do. Think of it – let's go back to my original metaphor of the restaurant. Right. You are the application, you're sitting at your table and you're given the menu by the waiter, and you ask for a glass of water; the act of the waiter walking to the water station and bringing back a glass of water costs you money. If you ask for a knife, or a fork, or a spoon, and a napkin, the act of asking for that is going to cost you a few pennies here, a few pennies there.
If you say, ooh, I dropped my spoon, I'd like to have a new one – a clean one please – that act of asking is going to cost you a few pennies. So every time you make an API request in AWS, Azure, Google, or any of the other first-generation clouds, the API request is going to cost the customer money. And those requests on that menu could be your API puts or your API gets, your lists, your heads, inventory operations. If you ask the waiter, “Well, what's the special today?” and he has to go to the kitchen and ask the chef what the special is, that ask of what the specials are, well, that's going to cost you as well.
So inventory operations are all part of your cloud service bill from AWS. Now, if you have a small object – OK, we'll talk about tiers in a moment, but you have different tiers in these cloud storage architectures like AWS: you have your Standard, your Infrequent Access, One Zone Infrequent Access, Glacier, Glacier Deep Archive, yada, yada, yada. If you are in Infrequent Access, for example, and you have objects that are smaller than 128 KB in size, they're going to charge you a minimum object size of 128 KB. So if you have 64 KB objects, you'll be charged 128 KB prices; there's a small object tax. That's like ordering a side salad or a small plate and getting charged a surtax because it's not taking up the full plate. Right? Then there are transfer acceleration charges, if you put your data into the wrong tier in AWS and it winds up in Glacier.
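The small-object tax described here fits in a couple of lines. The 128 KB minimum comes from the discussion above; the function name is just illustrative:

```python
# Sketch of the "small object tax": infrequent-access tiers bill each object
# at a minimum size, so a 64 KB object is charged as if it were 128 KB.

MIN_BILLABLE_KB = 128

def billable_kb(object_size_kb):
    """Return the size (in KB) an object is billed at under a 128 KB minimum."""
    return max(object_size_kb, MIN_BILLABLE_KB)

print(billable_kb(64))   # 128 -> a 64 KB object pays 128 KB prices
print(billable_kb(512))  # 512 -> larger objects are billed as stored
```

Across a billion small objects, that doubling of billable size adds up fast, which is why the tier you land in matters.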
Yeah, if it winds up in Glacier and you need that back faster than the standard retrieval time of around five hours, they'll charge you to get it back faster – those are your acceleration charges. If you want to have a bucket replicated from one region to another, you're going to pay network transfer charges. Retrieval fees – that's the charge you get just for asking to get your data back. That applies to Standard-Infrequent Access, One Zone-Infrequent Access, Glacier and Deep Archive, and it's different from your egress charges. The retrieval fee is the cost of asking for it; egress charges are the cost of the transport they use to get it back to you.
And again, if you wind up in Deep Archive or Glacier, every object you have in there – and let's go back to our petabyte example again, if you have a billion objects in Glacier or Deep Archive, that's a billion objects – each one of those objects has an object overhead charge associated with it, because they add metadata to that object so that they can find it when you ask for it later. So they're charging you for the ability to find that data when they go to look for it, if you ask for it back. So there are all these overhead charges, there are egress fees and API charges, and all of these things are on top of the visible monthly charge of $23 a terabyte for AWS Standard, or $12.50 for Infrequent Access, or four bucks for Glacier.
And all of these hidden charges make it impossible, impossible, to predict what your cost is going to be at the end of the month. It's never the same, because you're always going to have different egress charges, different API request charges, different object overhead charges; that monthly storage charge is always going to be different in the case of AWS, Azure or Google. All right, so this is an architecture that possibly worked well back in 2006 or 2010, but it's no longer a viable architecture for many customers these days, because they don't want to be hit with mystery charges every month. And they're tired of getting nickel-and-dimed. So in the case of Wasabi, we've done away with that; Wasabi is different from AWS, Azure or Google, we have a different architecture.
Hey Jim, let's jump ahead one more slide. So, why is Wasabi different, right? It's cloud storage, just like AWS, Azure, Google or IBM; the difference is we don't charge egress fees to get your data back. We don't charge our customers for API requests – you look at your menu at your table and you say, “Ooh, I would like to have this,” and we're not going to charge you for requests, we're not going to charge you for any kind of overhead charge or network transfer charge, anything like that. We don't have charges other than straight-up storage: five dollars and 99 cents per terabyte per month, that's all it is. So if you know that you've got 100 terabytes of storage per month, you know it's $599 at the end of the month.
Multiply that by 12 and you know exactly what your costs for the year are going to be when forecasting your cloud storage bill. It's simple to use, easy to understand, no hidden charges – that's what makes Wasabi different. Also what makes us different: there are no complex tiers. All the data that's kept in Wasabi is kept on a hot tier, that's it. All your data is milliseconds away; all your application has to do is request it, or you hit the return button on your computer or your mouse and you start to get your data back. So it's always available when you want it; there are no archive tiers where you have to wait five hours or 15 hours in order to get that data back. Our price is low enough to compete with those archive tiers, but the data comes back to you as fast or faster, in most cases, than the hottest tiers in Amazon, Azure or Google.
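The forecasting arithmetic above can be sketched side by side. The flat rate ($5.99/TB/month) and the $23/TB/month Standard-tier rate come from the discussion; the egress and per-request fees in the itemized model are illustrative placeholders, not any provider's published prices:

```python
# Sketch: flat-rate vs itemized cloud storage billing.

def flat_rate_monthly(tb_stored, rate_per_tb=5.99):
    """Flat-rate model: storage is the only line item."""
    return tb_stored * rate_per_tb

def itemized_monthly(tb_stored, tb_egressed, api_requests,
                     storage_per_tb=23.0, egress_per_tb=90.0,
                     per_1k_requests=0.005):
    """Itemized model: storage plus egress plus per-request charges.
    Egress and request fees vary month to month, so the total does too."""
    return (tb_stored * storage_per_tb
            + tb_egressed * egress_per_tb
            + api_requests / 1000 * per_1k_requests)

print(flat_rate_monthly(100))       # 599.0 per month for 100 TB
print(flat_rate_monthly(100) * 12)  # 7188.0 for the year, known up front
```

Under the flat-rate model the annual bill is a single multiplication; under the itemized model it depends on two usage variables you can't know in advance, which is the forecasting problem being described.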
Adrian Herrera: Then we’ll talk about performance, just doing a quick time check, David, we have about 10 minutes.
David Boland: All right.
Adrian Herrera: But yeah, talk about performance – and just to confirm again, you guys are using object storage at your core, correct?
David Boland: Yeah, we use object storage at the core, that's our file system. It doesn't really matter to the end user what the file system is; it matters to the application that's sending the data to Wasabi. But heck, we could be using just about anything on the back end, as long as we have that API interface that allows us to work with the application that the customer is using. If the customer's using DataCore vFilO, it works perfectly, right. So on the back-end side, yeah, it's object storage in the cloud, but a customer doesn't need to know how to operate object storage. And the world's changed a lot since 2006 – I know, Adrian, you've been around since the beginning of object storage in the late nineties, possibly even earlier than that, and I've been on it too, and things have changed since 2006, when the first cloud object storage architectures were designed.
Right? We can now put terabytes and terabytes on our spinning disks, whereas back in 2006 the highest-density disk was one terabyte, right. Today we get 18 to 20 terabytes on a disk. The amount of capacity outstrips what we would've thought about in 2006. Wasabi's performance is based on leading-edge hardware technology that we've designed, and we use a proprietary operating system and file system that allow us to be really efficient in how we manage the data and disk space and drive down cost, to pass those cost savings on to customers. Our performance is also enabled by a distributed architecture in the cloud where each Wasabi storage vault – we call them vaults, they're good-sized units – has purpose-built software running on leading-edge hardware that is scalable, and we use the concept of a user server, a database server and a storage server.
We tie those together with big, fat, hundred-gig Ethernet pipes and load balancers and a lot of parallel processing. So as you mentioned earlier, object storage is great for a multi-tenant environment – bingo, cloud is the multi-tenant environment – and we just have fantastic performance. And we have tests that show our write performance, [unintelligible 00:50:28] performance, versus AWS, and depending on object size and the number of cores you're using to write to the cloud, we're going to beat AWS S3, which is the fastest-performing storage service. In our case, we ran 40 tests and we beat them in 36 out of 40. And if you want to check that benchmark test out, you can find it at wasabi.com/performance; that'll give you an idea of where we are versus other cloud object storage.
Adrian Herrera: Yup, and then use cases – I think we talked about a lot of these as we were having our discussion. Are there any you want to point out here?
David Boland: You know what, backup and recovery is really a fantastic fit for cloud storage – backup and recovery and archiving, right. If it's really high performance, and you need your reading and writing and pulling stuff in and out of storage, like your Swarm, keep that on-prem, keep that Swarm piece on-prem. But if you have a backup and recovery need, you put stuff offsite – or maybe it's for remote access, for people who work from home; you know what, you can use the cloud. Tape-to-cloud works well, archiving works well, we see more people using global file share, and we see a lot of interest in storing video surveillance footage in the cloud. So the number of use cases is growing on a weekly basis, really, for cloud object storage. That wasn't the case 10 years ago; now we're just doing all kinds of crazy things with cloud object storage.
Adrian Herrera: Yeah, I guess if you were to take a look at a Swarm use case diagram, it'd look almost the same, and there's a reason, right: object storage is at the foundation of Wasabi, and Swarm obviously is object storage. But the point to get across is it really depends on what your requirements are – what your cost requirements are, what your data protection requirements are, the bandwidth connectivity that you have. There are a lot of options, and we wanted to show two different views to users, because when you just get one vendor's view you don't always know that, hey, there may be other benefits with other technologies. And you can combine both of these technologies in a hybrid way. So part of the educational series is showing both views, one from the service provider view, one from the on-prem view.
There are times when some of these solutions compete, but there are also times when they work together. And we're getting to a scale, from a data perspective, where you really need them to work together – you need your cloud storage service and your on-prem storage to work together, because datasets are getting so big that they're difficult to move quickly.
David Boland: Yup, you're exactly right, and that's one of the reasons why Legendary Entertainment – I'll just touch on it real quickly. Legendary Entertainment is a Wasabi customer; they were looking at Wasabi for the use cases of backup, archive and remote collaboration. Their challenge was that they needed to reduce their on-prem footprint and they wanted to eliminate some storage silos – they had tape, they had different spinning disks, they had SSDs out there, just a bunch of different silos – and they said, hey, we need to figure out a way to reduce our footprint, save some money. Additionally, they also had a lot of video footage in other clouds and they wanted to lower that cloud storage expense and eliminate egress fees and those hidden costs I talked about a moment ago.
So they looked around, they tested Wasabi, they said, hey, this works well, and we've been storing their video footage and their backup and archive and business data for the last couple of years. One of the important things for these guys was our support for that S3 API interface, which allowed their existing S3 API applications to work with Wasabi. On the media asset management side, Covalent was the media asset management package that they use, and it was important that we support that, which of course we do. And then Adobe Premiere Pro, right – so you have remote work, in the times of COVID, with a work-from-home mandate, and how do you do that with people working from home? You have to support that Adobe Premiere Pro interface, and so we did.
Next slide for me, Adrian – I know we're running short on time, so I'm going to try to wrap this up. Yeah, so, right, as Adrian mentioned before, S3 API-compliant applications need to work with S3 API-compliant object storage, right. So if you have an independent software vendor or a piece of software that you're using today and you want to make sure that it works with Wasabi, don't worry about it, because Wasabi is AWS S3 compliant. Everything, down to our identity and access management, is the same as AWS itself. So maybe you have Adobe Premiere Pro, or Covalent, or DataCore vFilO, or Milestone on the video management software side for the surveillance world, or you have Nasuni or Panzura for global file sharing.
Because these guys work with AWS S3, they're going to work with Wasabi, and they're going to work with DataCore Swarm. This is the important piece of AWS S3 compatibility, this is it.
Adrian Herrera: Yeah. And then – so let's close it on this slide; we have a thank-you slide, let me just show you those emails really fast, email@example.com and firstname.lastname@example.org. But I know we're approaching the hour here; we can go a few minutes over, so if people have questions, feel free to ask them now. But we'll talk about this: how do you choose the solution for your requirements? We talked about this throughout the whole webinar, but I'll focus on the object storage side and you can focus on the cloud storage side, David. From a capacity perspective, as David mentioned on the service side, it's very easy to get started with Wasabi and other services.
So for anything under 100 terabytes you should really consider cloud storage; it's very easy to get set up, you put down your credit card, you can start storing. Once you get to that 100 terabyte mark, the benefits of economies of scale start to kick in and you can bring that data in-house. And you can definitely optimize your cost from a long-term perspective. If you're just looking at storing something for a few months, then cloud storage is a really good solution. But if you're keeping stuff for years – three years, five years, 10 years – the long-term retention economies of scale really kick in. From an overall data center perspective, you can fit anywhere between 168 terabytes and 200 terabytes in about one unit of rack space from an object storage perspective.
You can get more dense; you can use SSDs, or you can use HDDs very cost-effectively, so it's very flexible from that perspective. From an egress and time-to-store perspective, you're talking about internal network speeds, so a lot of the time it comes down to application requirements. So I know we're hitting the limit here – from the cloud storage perspective, is there anything you want to point out, David?
David Boland: Nah, you covered it pretty well. The only thing I have to say is, cloud storage in general makes a great fit for primary storage overflow. If somebody's sitting there thinking, OK, I need to order more arrays, I need more spinning disks, I need more boxes in my data center to support my growth, and you're not going to get it on time, right – you can, like you said, put your credit card down, store some of that data in Wasabi for a period of time, and then when you get your new gear racked and stacked, egress it back out of Wasabi, back into your Swarm. Because there are no charges for egress, you can move it in and move it out as you see fit.
Adrian Herrera: Yeah, and I think that's the point to get across: you can take a hybrid approach. We're seeing more organizations take a hybrid approach; it doesn't need to be an either-or decision, it can be both. We have technologies at DataCore that can control that from a data management perspective, and I'm guessing a lot of the viewers also have data management applications that they already use today that can manage sending data to both Wasabi and DataCore products, whether that's SANsymphony, vFilO or Swarm. And with that, we have next steps: if you have any questions for David or Wasabi, you can contact them at email@example.com. Any questions for me or DataCore, you can send an email to firstname.lastname@example.org. Of course, we have a lot of different resources on our respective websites, wasabi.com and datacore.com. And I know we're hitting the end here, so I think we will wrap it up, since we're already two minutes over. If you do have any questions that we didn't answer, please go ahead and send them to either one of these addresses; I'll follow up with any questions that we didn't get to throughout the webinar. And with that, David, any closing statements or comments for anyone? How do people get started with Wasabi – maybe that's a good one –
David Boland: That's a great question. All right, so hey, if you're curious and you want to kick the tires, we give a 30-day free trial for a terabyte. Go to wasabi.com and look for the free trial button. Put in your contact information and we will respond to your email with free trial information. We're not going to ask you for a credit card, and you won't be automatically enrolled – kick the tires for 30 days, decide if you like it, and if you do, awesome. If you don't, ah, come back some other time. But yeah, hit the Wasabi website and look for the free trial. And if you also go to our knowledge base, which is part of our resource section, you will find how Wasabi works with vFilO and Swarm; we have both how to configure Wasabi for vFilO and how to configure Wasabi for Swarm on our knowledge base, it's there and it's ready for you.
Adrian Herrera: Yeah, and of course you can get started with DataCore by going to datacore.com or talking to your regional DataCore rep, whether that's a partner or someone at DataCore. And with that, David, thank you so much for your time. As always, it's been fun and informative.
David Boland: Thank you, Adrian, great time, I love you guys.
Adrian Herrera: Yeah, likewise and thanks to all our viewers, thank you for spending the time with us today. All right, this concludes our webinar, thanks everyone. Bye.