On Demand Webcast
39 min

How to Integrate Hyperconverged Systems with Existing SANs

Webcast Transcript

Danielle: Good morning and good afternoon, everyone. Thank you so much for joining us today. My name is Danielle Brown. I’m a marketing manager here at DataCore Software, and I’ll be your moderator. We’d like to welcome you to today’s webinar, “How to Integrate Hyperconverged Systems with Existing SANs.” But before we begin today’s presentation, I’d like to go over just a few housekeeping points.

This presentation is being recorded and will be available on demand; you will receive an email from BrightTALK with an on-demand link at the conclusion of this webinar. And lastly, feel free to type in any questions throughout, and we’ll address those at the conclusion as well. And with that, I’d like to kick it off and introduce our presenter for today, Augie Gonzalez, director of product marketing here at DataCore Software. Augie?

Augie: Thanks, Danielle. What I want to do is go through several items that I think will be top of mind for you. The first is to try to understand, and give you some tips on, whether hyperconverged systems are for you. In some cases people think that’s a foregone conclusion, so what I’d like to describe are some of the boundary conditions and considerations there. Where is the best fit for hyperconvergence? Because it does have places that would be considered outside the norm.

I also want to discuss some special considerations if your applications tend to be IO-intensive. General-purpose workloads do really well, but IO-intensive apps require a little more planning, and perhaps different choices. In discussing that, we’ll go through a case study from a 911 emergency dispatch customer of ours, which I think will cement some of the concepts we’re discussing here. We’ll follow that with a little more information on how to leverage resources that are already sitting on the floor, equipment you’ve already paid good money for and want to take advantage of.

We’ll close the discussion with some specific steps and guidance to ease your transition from the three-tier SAN you may be on today to a hyperconverged environment. Now, for the basic comparison: three-tier SANs, what would be considered external storage on the left, where servers are connected through some kind of iSCSI or Fibre Channel network to an array, and hyperconverged systems, where you have servers with applications running in a cluster, are both trying to achieve the identical objective. That is, to provide shared storage to a number of applications so they can partake in that community pool of capacity, and in some cases to enable things like live migration of applications and virtual machines, including containers, from one physical server to another without disruption.

In these diagrams you’ll see I use a kind of aqua blue rectangle to show where software is participating. So on the left we have it on the array as the front-end controller. In the case of hyperconverged, using internal storage in the servers, it would be a piece of software sitting either on a VM or right on the hypervisor.

DataCore offers both, so we have no particular horse in the race; we don’t have to drive you one way or the other. But I will describe some of the variations and why we might recommend one vs. the other. There are very clear differences in what you’re trying to accomplish. In the case where you have a centralized SAN, job one is to offload the servers, concentrate the storage capacity in one area that’s easy to manage, and provide network access to it over that iSCSI or Fibre Channel connection.

It’s also interesting to segregate the responsibilities of, let’s say, the storage administrator, who has purview over those central resources. Contrast that with a hyperconverged system. What we’re trying to do there is say: all these servers have excess capacity in terms of compute, IO, and potentially storage. So let’s pool all of those across them, take full advantage of them, and where possible ensure that the applications can get to local disk without having to make a traversal across the network.
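
To make the pooling idea concrete, here’s a minimal Python sketch of a cluster-wide pool that prefers placing a volume on the requesting node’s local disk and spills to neighbors otherwise. All class names, methods, and capacities are hypothetical illustrations of the concept, not DataCore’s API:

```python
# Minimal illustration of pooling internal storage across cluster nodes.
# Names and capacities are hypothetical; this models the concept of a
# shared pool, not DataCore's actual API.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity_gb: int      # raw internal storage this node contributes
    used_gb: int = 0

class StoragePool:
    def __init__(self, nodes):
        self.nodes = nodes

    def total_free_gb(self):
        return sum(n.capacity_gb - n.used_gb for n in self.nodes)

    def place_volume(self, size_gb, preferred_node):
        # Prefer local disk so the app avoids a network traversal;
        # otherwise spill to the neighbor with the most free space.
        others = sorted((n for n in self.nodes if n is not preferred_node),
                        key=lambda n: n.used_gb - n.capacity_gb)
        for node in [preferred_node] + others:
            if node.capacity_gb - node.used_gb >= size_gb:
                node.used_gb += size_gb
                return node.name
        raise RuntimeError("pool exhausted")

nodes = [Node("hci-1", 4000), Node("hci-2", 4000), Node("hci-3", 2000)]
pool = StoragePool(nodes)
print(pool.total_free_gb())                             # 10000
print(pool.place_volume(500, preferred_node=nodes[0]))  # hci-1 (local)
```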

At the same time, we potentially have an enhanced way to increase fault isolation by distributing the storage across more nodes. Part of the trade-off there is that we’re also combining roles. So from an administrative standpoint, the person who’s taking care of the servers also has to take custody of the storage.

And I’ll get into some of the socio-political factors in one second. Here’s part of that view. If you redraw that central SAN to look something like this, it gives some emphasis to the fact that the storage on the left is all concentrated; it’s a centralized pool. It may be made up of multiple arrays or storage devices, but there’s something wrapped around them that makes them look like a central pool. And then the compute elements, where the applications are running, just connect to it through whatever fabric we’re dealing with.

So that’s what we mean by clear separation of roles. In the second case, it’s not clear, because each one of these nodes has the potential to both provide resources in terms of capacity and, at the same time, be a consumer of storage capacity, network bandwidth, and computational cycles. So you have to be concerned that, as a storage provider – let’s say you’re the left node here – you are providing capacity to your companions to the right, to your neighbors.

If one is not taking special precautions, you may actually reduce the service level that you’re offering to your neighbors without realizing it, just because you are dealing with your self-contained server and didn’t see the ripple effect this was having on the adjacent consumers. Very important point. In the other case you’re always thinking about it: my job is to provide storage to the broader environment, and I want to make sure nothing’s getting in the way of that. That’s a little more difficult to achieve when you’re running hyperconverged systems.

So the general philosophy here is sometimes to over-provision resources, just so you have a little more cushion and a little more fan-out. Now, on the social front, there are things that guide the orientation of the individuals responsible for this, and it’s kind of curious, because I see this a lot: I talk to customers on both sides of the fence, even in the same company, who have different roles.

Most of the storage administrators I’ve spoken with, and many of you might be in the audience today, have a mentality of: I’m a special-ops person. I have to worry about controlling all that capacity, and I’m all about providing the highest levels of service to my consumers by centralizing those assets and making sure nothing ever goes down, that the response time and the throughput are where everybody expects them.

And I have to feed that central SAN as necessary, to make sure I’m able to refresh the gear so it’s the most modern and most capable, and to avoid anything that smells of an outage that could have a catastrophic effect on all my clients. The hyperconverged person, in contrast, is usually somebody who’s already taken responsibility for virtualization administration. Whether you’re a vSphere, Hyper-V, or KVM administrator, you usually own that broader range of requirements, so you worry about compute, and you worry about load balancing between different servers.

It’s much more of an application focus, and you’re really managing this from the host standpoint rather than from the storage side. It’s also key that you may be quite accustomed to self-provisioning, that is, giving yourself capacity by drawing from that pool. So you can see those are two different ways to look at it, and for somebody making the switch from either side, a little bit of reprogramming can be required in your own mind.

You also have different favorites in terms of the vendors you deal with, depending on whether you’re coming at it from a central SAN standpoint or from the server perspective. The companies you’ve been dealing with and buying gear from could be quite different, and that may appear in your selection criteria. There’s quite a bit of allure and charm around hyperconverged; it’s been that way for maybe three years now. The first thing that’s really apparent is the optics: when you see it all packaged together, hyperconverged appears very, very simple. It’s really only concerned with one type of gear, that is, servers.

So I’m not dealing with three elements; the compute part, the storage, and the network have all been combined and consolidated in this one package. It’s also purported in many cases to be cheaper – less expensive, I guess, would be the appropriate word – because the view is that commodity servers can use lower-cost equipment than large proprietary storage arrays, which carry a lot of other sheet metal and other items.

And when you look at some of the diagrams shown out there, you may reach the conclusion that hyperconverged must also be faster, because it does not incur the delays of hopping over a network to get to the storage; the storage is local to the machine. So we’ll peel back on all of those a little. And there’s a caution throughout that I want to get across to you: many of the hyperconverged products we’re seeing out there, while they are marketed to large enterprises for big applications, have in fact been designed for relatively small and general-purpose scenarios.

They do really well at light loads, especially when things can be easily split, when the application is more of a distributed app where you can say, well, I can either run two copies on two nodes or four copies on four nodes and handle it that way. Where things get a little crazy is when you have monolithic apps. A lot of systems of record, for example, hammer really heavily on IO and have a kind of linear build to them; they cannot be split up across multiple nodes, and that’s where these products fall off track and you may have issues trying to get those in there.

So that’s an immediate decision you need to make in your assessment of potential ways to solve your problem. One of the things we find is that while the hardware is fully capable of addressing very high performance requirements, the software stack that’s been put in place in fact becomes a bottleneck. It’s a server-side bottleneck that’s introduced, and it causes sluggish performance you would not have expected when you look at the network diagram.

And because of the way hyperconverged systems are packaged for ease of use, you’ll find in many cases that you can’t get your fingers in there to tune things; it’s pretty much “these are the defaults and we’re working with that.” That can lead to a term we in the real-time business refer to as non-deterministic behavior: you get certain variabilities because of the way it’s trying to do a little bit of time sharing. There is also a very important concern we’ve discovered over the past several years of working on this and seeing other offerings: in many of them, out of all the cores you see on a single server, effectively a single core is being used to handle the IO.
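
As a rough way to see why that single core matters, here is a back-of-the-envelope model. The per-IO CPU cost below is an assumed figure for illustration, not a measurement of any product:

```python
# Back-of-the-envelope model of the single-core IO ceiling described
# above. The per-IO CPU cost is an assumed figure for illustration.
def max_iops(cores_handling_io, per_io_cpu_us):
    """IOPS ceiling when each IO consumes per_io_cpu_us of CPU time."""
    return cores_handling_io * 1_000_000 / per_io_cpu_us

PER_IO_US = 20  # assumed CPU microseconds to process one IO request
print(max_iops(1, PER_IO_US))   # one core handling IO:    50,000 IOPS
print(max_iops(16, PER_IO_US))  # IO spread over 16 cores: 800,000 IOPS
```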

And that can be the single biggest reason for sluggish behavior. As I said before, one of the things you have to worry about is that, without knowing it, you may be causing competition for resources because you’ve loaded up one or two of the nodes out of your complex of four or five, and those are the ones everybody’s drawing capacity from.

And that’s not very apparent, especially if you’re used to working just from the server perspective. Now, what can we do about these things? Well, the design DataCore has put in place for hyperconverged systems is very, very simple and very high performance, because the class of applications we’ve been asked to design these systems for demands the lowest latency, that is, the fastest response, and also a very attractive price-performance, that is, the price you pay per IOP.

In fact, some of these systems have been fully vetted with the Storage Performance Council to see how they stand up head to head against other gear. The example I’m showing you here was a really basic but, it turns out, very beefy system in terms of its capability. It’s just a little two-node system, and it set the world record for hyperconverged IO on the SPC-1 benchmark; those are special heavy-duty, transaction-type workloads, typical of what you see in OLTP environments.

IOPS are sometimes over-marketed. Really, what most people care about is how much useful work you can do, and at what price. Those two measures, especially latency, are what’s key to these figures. So not only was it the fastest hyperconverged system shown here; it also had the fastest response, a sub-millisecond response. It was 220 microseconds, at about 10 cents per SPC-1 IOP. So unmatched economics is the way we describe that.
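
To see how that price-performance figure is derived, the arithmetic is simply total tested-system price divided by peak SPC-1 IOPS. In this sketch both inputs are placeholder numbers chosen to land at the roughly 10 cents per IOP cited above:

```python
# How an SPC-1 price-performance figure is derived: total tested-system
# price divided by peak SPC-1 IOPS. Both inputs are placeholders.
system_price_usd = 150_000      # hypothetical all-in tested price
peak_spc1_iops = 1_500_000      # hypothetical benchmark result
print(f"${system_price_usd / peak_spc1_iops:.2f} per SPC-1 IOP")  # $0.10
```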

This is an industry standard. If you go to the SPC website, you’ll see a number of the biggest, baddest storage suppliers have published their results, and you can see how DataCore stands next to them, even with these very economical configurations. Now I should say, regarding the picture I showed: DataCore only sells software, so we team with hardware suppliers, whether that’s Lenovo, Dell – you pick the supplier of servers – and we put that all together; our authorized resellers take the responsibility to package it for you.

In many cases we take full advantage of equipment you already have in place. That’s the big theme here: you’ve already paid good money for this, so let’s put it to use. Part of our capability, and what sets us apart, is the ability to put all the cores from these servers to use and handle the IO in parallel, where others are forcing it down basically one single thread.

We have other webinars where we talk a bit about parallel IO, and you can see how that provides the ability to do more work in less time, so fewer machines can do the work. Those are essential characteristics and distinguishing aspects of the DataCore offering. Now let’s make this a little more real for you. I want to draw on a case study that was done with a 911 emergency dispatch service; it’s a governmental agency in Oregon. We’re all pretty familiar with 911, and fortunately we’re not having to call it on a regular basis. But you can imagine the life criticality of making a call and making sure these systems are turning the information around quickly, so the call dispatchers and the first responders all have the latest information.

I’ll provide a link, in the next slide I believe, that gives you a little more detail on this. But the whole issue for them was that this call dispatch center was finding that the queries and inputs to their SQL Server back end, which makes up the centerpiece of their environment, were just not turning around quickly enough for them to meet the service level agreements they’d set out, in order to provide quick enough response back to the folks on the other end of the line.

They were experiencing very high latencies. They were using conventional storage devices, sitting out on arrays. They also had some very specific goals in terms of uptime and data loss, which you see expressed here, and which they were not fulfilling. They certainly had a problem, and when they looked at how to correct it financially, they found that every alternative given to them, whether it was hyperconverged or more arrays, required a lot of forklift upgrades, a lot of brand-new money being spent.

It just could not be justified. They reached that conclusion and eventually selected DataCore to address their requirements. Here’s a quick picture, just to give you an idea of the flow for this, as you might expect. I was just watching, in fact last night, one of these programs; it’s called “911.” I don’t necessarily recommend it, but it does give you an idea of what’s happening behind the scenes. Part of the issue here is that both the text being typed about what’s going on at the emergency site and the audio recordings – all that information needs to be passed on, and situational awareness needs to be given to the first responders, whether they’re law enforcement or firefighters. All those folks need to know very quickly; most of their SLAs are within 90 seconds.

They have to process all the information and get it to somebody who can take action in order to save lives, or in some cases to keep the law enforcement officer from unknowingly approaching somebody who may be a threat to them. Pulling that off obviously requires the fastest performance from a latency standpoint, not so much throughput, because throughput, while interesting, is not the biggest measure here.

After they put the DataCore hyperconverged system in place, the latencies that had exceeded 200 milliseconds in their SQL Server behavior disappeared. According to the gentleman who spoke to this, they outright disappeared, such that when this new capability was instituted, the dispatchers basically said it looks like somebody threw a switch and made this thing go super fast. And this was achieved through the transition to a DataCore hyperconverged system. The other notable effect concerned their disaster recovery site. In the event that the primary location where they service these calls had an outage or needed major maintenance, they would have to fail over to a second location, and they could be out several minutes where 911 calls would basically go nowhere.

There’d be nobody to take those calls. After instituting a stretch cluster with DataCore across these two sites, roughly two miles apart, the dispatchers could not even sense when the service had been cut over to the DR site and back during maintenance periods. It was that seamless in its response. So that gives you, I think, an idea of how snappy DataCore is in serving these IO requests.

But it was also important, and feasible, because it was lower cost at the end of the day. In concluding their changeover, they found that by going to this hyperconverged system they had reduced the infrastructure by over 60 percent. They had reduced the number of physical hosts needed to run it, because they were collapsing all of that onto fewer servers doing double duty, and they saved quite a bundle on their SQL Server licenses as well, because they had fewer instances.

Not something you’d normally think about when you’re doing this, but it is an important financial byproduct of hyperconverged done right. Part of what they, and several of our hyperconverged customers, talk about as a second round of conversation is not just the performance, low latency, and high availability, but the features they’re able to put to use to enhance the way they restore, minimizing the backup and restore windows. Normally that would have required some separate backup products that created a bit of delay and uncertainty for them.

We have things like continuous data protection (CDP), which has helped a number of them combat ransomware. Not insignificant these days; it’s making a lot of headlines. CDP is a way, basically, of recording the stream of IOs to critical volumes, almost like a DVR, and it allows you to trace back in time, dial back to a point before the ransomware attack occurred, and restore data from there.
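
Conceptually, a CDP journal works like the sketch below: each write is timestamped, and a restore replays writes only up to the chosen instant. This is a deliberately simplified, hypothetical model; real CDP operates on block-level IO streams with far more machinery:

```python
# Simplified model of a "DVR for IO" CDP journal: every write to a
# protected volume is timestamped, and a restore replays writes only
# up to the chosen instant. All names are invented for illustration.
import bisect

class CdpJournal:
    def __init__(self):
        self.entries = []  # (timestamp, block, data), appended in order

    def record_write(self, ts, block, data):
        self.entries.append((ts, block, data))

    def restore_as_of(self, ts):
        """Rebuild the volume image as it existed at time ts."""
        cutoff = bisect.bisect_right(self.entries, (ts, float("inf"), b""))
        volume = {}
        for _, block, data in self.entries[:cutoff]:
            volume[block] = data
        return volume

j = CdpJournal()
j.record_write(100, 0, b"payroll records")
j.record_write(200, 0, b"\x00ENCRYPTED\x00")   # ransomware hits at t=200
clean = j.restore_as_of(199)                   # dial back before the attack
print(clean[0])                                # b'payroll records'
```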

So you can tell those folks who are hijacking your system: sorry, you are of no consequence, because we can recover completely from it. Having said that, let’s talk a little about the configuration at large. One of the things you’re fully capable of doing with the hyperconverged system from DataCore is scaling as you need to. You can easily start with a system that originally may have been as modest as two nodes and grow it in place, and not only with like systems; you may in fact be able to tap servers you already have on the floor and recommission them in this role. They do not have to be identical.

That’s a rule many other competitors in this space impose: they force you to have like machines in every one of those nodes, and that ties your hands. The second element I want to draw your attention to is that some of these nodes may be storage-heavy, that is, big suppliers of capacity, while others may be compute-only. You can certainly mix and match as you see fit, and from the standpoint of licensing and pricing, the number of nodes really doesn’t matter; we’re all capacity-based.

When you look at the way DataCore software is deployed, it’s quite unusual, and another source of distinction. In these diagrams, everywhere you see those two aqua arrows horizontally, that’s an instance of DataCore software running. Many of our customers in years past who have run central SANs have used those to pool and virtualize storage arrays from third-party suppliers, whether it was Dell, EMC, Hitachi, IBM, you name it.

We sit in front of that as a caching layer and a high-availability layer that pools those collective resources and makes them look as if they were potentially from the same manufacturer. That same software, and the same licenses, can also be applied at the point where those storage devices reach the end of their useful life and you replace them with just direct-attached storage, that is, local storage on the servers where DataCore is running. That’s the second picture you see here, from the left.

So that’s your first level of collapsing. At some point, if you take this one step at a time, you get to the scenario where you’re hyperconverged, where everything fits tightly and neatly inside just the servers, with the applications, storage, and network all combined. The fourth picture I’ll talk about in a minute; it’s a hybrid of those. I’ll leave a little mystery for that.

Now, if you’re hands-on on a hyperconverged system, you’re wondering how this all works, especially from a provisioning standpoint. If you’re a vSphere administrator, you’ll find this very, very familiar. We take advantage of a thing called VVols. With VVols, you basically see the catalog of resources available for you to provision, whether it’s very high performance storage, midrange storage, or secondary storage for large archives, and you simply select and specify that at the time you create your VMs, just like you would any datastore, and DataCore fulfills those requirements based on the types of disks and the quality of service you requested.

Once a system has been set up, you don’t have to be aware of the magic DataCore is doing behind the covers; it simply looks like your familiar provisioning and instantiation of vSphere virtual machines, attaching disks to them, adding more disks, all of that. And this is all policy-based storage. Sometimes you’ll hear terms like gold, silver, and bronze used to specify the different service levels and availability requirements.
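
A minimal model of what profile-driven provisioning looks like from the administrator’s side might read like the sketch below. The profile names, attributes, and function are invented for illustration; this is not the vSphere VVols or DataCore API:

```python
# Toy model of policy-based ("gold/silver/bronze") provisioning: the
# admin picks a service level, and the platform resolves it to backing
# storage. All names and values here are hypothetical.
PROFILES = {
    "gold":   {"media": "NVMe flash",  "mirrored": True,  "max_ms": 1},
    "silver": {"media": "SATA SSD",    "mirrored": True,  "max_ms": 5},
    "bronze": {"media": "HDD archive", "mirrored": False, "max_ms": 20},
}

def provision_vm_disk(vm_name, size_gb, profile):
    policy = PROFILES[profile]  # the quality of service requested
    return {
        "vm": vm_name,
        "size_gb": size_gb,
        "backing": policy["media"],          # fulfilled from matching pool
        "highly_available": policy["mirrored"],
        "latency_target_ms": policy["max_ms"],
    }

print(provision_vm_disk("dispatch-sql01", 500, "gold"))
```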

If you happen to come from the Microsoft Hyper-V world and are a fan of Microsoft System Center Virtual Machine Manager, the same is true there. You would use your familiar interface for provisioning those virtual machines, and you would see these storage pools available for you to tap and associate with your virtual machines.

So the whole idea is to hide the intricacies of storage management from you, so you can simply fulfill the capacity needs you have and the availability and performance requirements your users are requesting of you. In practice, what we find is the same software being used in a variety of ways. This picture shows you some of the diversity you might run into in applying it. In the centerpiece we have what has been a three-tier SAN, where DataCore is being used to pool these dissimilar arrays.

In the top left corner is a case where the same software stack has now been configured as a hyperconverged cluster for Tier 1 applications that demand the highest performance and lowest latency but are basically cordoned off, potentially, from the rest of the system. Notice here at the DR sites, the same software can be used to normalize or standardize the way connections happen from the primary site to the DR site, or from the primary site to the remote and branch offices.

And all of this can be centrally managed. In fact, the personalities of those locations can change: what may at one time have been a branch office could turn into a DR site, and you simply stand up and reconfigure the system so that the top right element might, over time, start to look more like the one below it as the definition and requirements for that branch office have grown. And there was no forklift upgrade through this.

It was simply a matter of configuring additional resources and allowing the site to tap into them as its role changed. The same is true if you’re trying to do disaster recovery to the Cloud. Let’s say you’re trying to get a copy from on-prem to Azure or AWS; what you would do in that case is have, for example, a hyperconverged system on premises, or multiples of them, replicating to copies of the DataCore software sitting in that public Cloud.

It would look very much as if you were at a co-location site or just another branch office; there’s really no difference other than how that capacity is managed. There are a number of advantages I want to highlight here. At the end of the day, there are a few measures of what’s different and what stands out as spectacular when people evaluate our product, and that tends to be the responsiveness of the system and the price-performance, but also the flexibility. Rather than forcing you and tying your hands, saying your hyperconverged has to look like this because that’s how it’s been defined, we give you the ability to contour it, to make changes to match what you already have in place and take advantage of those resources, rather than tossing them just because the architecture forces it.

And so here’s the transition I would suggest for any of you, especially if you’re coming from a three-tier SAN. It’s pretty straightforward, and it’s self-paced. The first part is to say: okay, if I want to take control of the diversity I have, the first thing I’d do is institute the DataCore software between my current hosts, whether those are virtualized or bare-metal servers, and the arrays, and allow us to pool those arrays. At the point where we feel those arrays have exceeded their financial value, we substitute internal storage for them, and the behavior, operationally, is the same. The performance may actually in some cases exceed what you’re seeing on the left, because some of those arrays have in fact become a little long in the tooth, I guess is one way to describe it.

They’re simply not able to deliver the kinds of speed that solid-state disk and flash, NVMe flash, directly on these servers can. In the second step of that transition, you can start to get a taste for what happens when those servers do double duty: they’re not just doing storage, they’re handling both the storage and the application’s computational workload.

That’s the third figure. There you might say, hey, I want to do a few nodes like this; let’s see how that works out. Am I able to manage the quality of service that I’ve been asked to? Am I comfortable operating the environment that way? If so, fine, you’re there, and then you apply that same approach to the rest of your nodes. You may also find that you have some outliers. You may have some bare-metal machines that will never be part of your hyperconverged cluster; for that application, nobody’s going to take the time to virtualize it and put it on a cluster.

And you’ll be able to feed and serve capacity in a more appropriate way using a hybrid of this. That is, I’m taking my hyperconverged system, which is normally just fulfilling the requirements of the private cluster, and making either Fibre Channel or iSCSI connections to these external hosts, which are strictly consumers of capacity. They do not contribute to the storage pool.

And they too gain the performance, availability, and low-cost benefits. So with that, what I’d like to do is invite you to see the slides and schedule a live demo with us. The link I provide here lets you schedule a request, and one of our solution advisors will get with you and walk you through the process, so you can see how straightforward it is and how it applies to your environment and the rules of governance you’re operating under. And with that, we’ll take questions.

Danielle: I see we have a few questions here, Augie. Number one would be, how is your software priced?

Augie: Okay, yeah. The software is priced very straightforwardly. It is consumption-based: how much capacity you have under management, that is, how much capacity you’re managing and how much is available for us to serve, whether to the hyperconverged system or to the three-tier SAN, the same. And we have three models. One is the top-of-the-line premium model, which has all the bells and whistles for the highest performance, kind of what I call first class.

We’ve got a business model as well, and then the economy model, which is really meant for large-scale secondary storage. And please consult one of our value-added solution providers for the complete packaging.
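
As a toy illustration of capacity-based licensing, where cost tracks terabytes under management rather than node count, consider this sketch; the tier names echo the models above, but the per-TB rates are invented placeholders, not DataCore’s price list:

```python
# Toy illustration of capacity-based licensing: cost follows terabytes
# under management, not node count. Rates below are invented.
RATES_PER_TB = {"premium": 400, "business": 250, "economy": 90}

def license_cost(managed_tb, tier):
    return managed_tb * RATES_PER_TB[tier]

# The same 100 TB costs the same whether it sits in 2 nodes or 10:
print(license_cost(100, "premium"))  # 40000
```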

Danielle: Awesome, thank you. And then the last one before we close up: can I use your software in the Cloud?

Augie: Yeah, I touched on that very quickly but didn’t get into it too much. More and more, what we’re seeing is a combination, really a hybrid Cloud deployment of DataCore, where most of your emphasis initially has been on taking care of business on premises, and rather than stand up a remote site somewhere else that needs brick and mortar, you use public Cloud services to create that second or third copy of the data and have a way to fall back to it in the case of a major disaster, or really to avoid the consequences of such a disaster.

The software is exactly the same package. You simply install it in a virtual machine instance, or multiple virtual machine instances, in Azure, AWS, Google Cloud, you name it.

Danielle: Okay, I see we’re getting a few more questions. If anyone has any further questions, feel free to email us as well; the information’s provided on our website. You can get in touch with a solutions architect, and they’ll be able to answer any further questions in a more personal dialogue.

So with that, I’d like to close everything up. Before you go, we wanted to say that we’ll provide a PDF version of this presentation; it will be in the email with the on-demand link that BrightTALK will be sending to your inbox. And lastly, we’re always looking for ways to improve, so don’t forget to rate this webinar before you go as well. And with that, have a great day, everyone. Thank you so much.
