At the founding of DataCore, the storage landscape was dominated by large incumbents producing custom-built hardware appliances. We were established on the premise that general-purpose computers were sufficiently performant that storage could best be implemented as a flexible software solution running on commodity hardware. Not only were we proved correct through our delivery of world-record-holding performance and enterprise-class data services, but our efforts also created the entire software-defined storage product category.
Delivering a storage solution based on commodity hardware means that it can seamlessly take advantage of improvements in the underlying platform. Moore’s Law projects that the number of transistors on a chip will double every year or two, and for a long time this meant faster CPU clock speeds and hence faster applications.
There is a lot of hidden detail behind the simple phrase “faster CPU clock speeds”: pipelining to ensure that every part of the chip is kept busy, hyper-threading to turn spare capacity into the appearance of another CPU, and caching to help the memory subsystem keep up. In the main, these are implementation details that can be ignored, but they start to become more significant as the demands on the system increase.
Sooner or later though, every dream has to come to an end. In recent years, CPU clock speeds have remained largely static; instead, Moore’s Law has driven the development of multicore chips with their massive parallelism. A single chip can now contain over a hundred logical CPUs, and a system might contain four such chips. This can significantly change the way that software needs to be developed.
Balancing Performance, Contention, and Consistency
Since its inception, DataCore has always targeted multi-CPU hardware and has a wealth of expertise in this area due to its heritage in Real-Time and Symmetric Multi-Processing. However, there is a vast difference between algorithms that perform well with a few CPUs and the sort of techniques that are required to enable hundreds of logical processors to handle millions of I/Os per second with microsecond response.
As an example, consider a common design pattern: a linked list used as a work queue, protected by a spinlock. The lock has to be taken whenever work is added to the queue or removed from it. On a busy system this can result in almost continuous contention for the lock, which means that CPUs are tied up just waiting for access. Basic spinlocks don’t provide the concept of first-come, first-served, so an individual CPU can get delayed for an extended period, holding up I/O requests indefinitely and affecting application response time. On a lightly loaded system, of course, there is less contention and the issues are simply not seen.
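To make the contention point concrete, here is a minimal sketch of that pattern: a shared queue guarded by a basic test-and-set spinlock. (This is a generic illustration, not DataCore code; a `std::deque<int>` stands in for the linked list of I/O requests.) Every producer and consumer must pass through the same lock, so under heavy load the lock itself becomes the bottleneck.

```cpp
#include <atomic>
#include <deque>

// Basic test-and-set spinlock: no fairness guarantee, so a CPU
// can lose the race to re-acquire the lock indefinitely.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) { /* busy-wait */ } }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Shared work queue: the single lock serializes every operation.
class WorkQueue {
    SpinLock lock_;
    std::deque<int> items_;   // an int stands in for a real I/O request
public:
    void push(int item) {
        lock_.lock();         // every CPU contends here...
        items_.push_back(item);
        lock_.unlock();
    }
    bool pop(int& item) {
        lock_.lock();         // ...and here, on every single operation
        if (items_.empty()) { lock_.unlock(); return false; }
        item = items_.front();
        items_.pop_front();
        lock_.unlock();
        return true;
    }
};
```

On a lightly loaded system this works perfectly well; the pathology only appears when dozens of CPUs hammer the same lock.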
There are some well-known techniques that can be adopted, including queued spinlocks, which incorporate the concept of fairness, and lock-free programming based around compare-and-set instructions. At first glance, these seem to provide elegant solutions to the problem, but none of them really hold up well under this level of load. Even a simple memory access can have a significant impact on performance if it misses the cache or falls on the wrong cache line, and on a multi-socket system the cost of an access can also depend upon which CPU is making it.
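A ticket lock is perhaps the simplest form of queued spinlock, and a short sketch shows both the appeal and the caveat. Each arriving CPU takes the next ticket and spins until the “now serving” counter reaches it, giving strict first-come, first-served ordering. But note that all waiters still spin on the same cache line, so fixing fairness does not fix cache-line contention.

```cpp
#include <atomic>

// Ticket lock: a minimal queued spinlock with FIFO fairness.
class TicketLock {
    std::atomic<unsigned> next_ticket_{0};   // handed out to arriving CPUs
    std::atomic<unsigned> now_serving_{0};   // ticket currently allowed in
public:
    void lock() {
        // fetch_add hands out tickets strictly in arrival order
        unsigned my_ticket = next_ticket_.fetch_add(1, std::memory_order_relaxed);
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            /* busy-wait: every waiter polls the same cache line */
        }
    }
    void unlock() {
        // wake the next ticket holder in FIFO order
        now_serving_.fetch_add(1, std::memory_order_release);
    }
};
```

Lock-free alternatives built on compare-and-set avoid the lock entirely, but under sustained load they degenerate into retry storms with the same cache-line ping-pong between sockets.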
Innovative Approaches for Performing Parallel IO Operations
DataCore has developed a number of innovative techniques to maximize performance, whilst still maintaining the required level of consistency. One of the key principles is that state is maintained in the domain with the lowest contention until there is a necessity to expose it more widely. The protection techniques within each domain are matched to the expected level of contention. This all has to be achieved whilst retaining global visibility to the state.
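The patents describe DataCore’s actual techniques; as a generic illustration of the stated principle only, the sketch below keeps state in the lowest-contention domain (one cache-line-aligned slot per CPU, uncontended on the hot path) and exposes a global view only when it is actually needed. The names and the 128-CPU limit are assumptions for the example, not anything from the patents.

```cpp
#include <array>
#include <atomic>

constexpr int kMaxCpus = 128;   // assumed limit for this illustration

// One counter per CPU, each on its own cache line to avoid false sharing.
struct alignas(64) PerCpuCounter {
    std::atomic<unsigned long> value{0};
};

// Illustrative per-CPU statistics: hot path touches only local state;
// the globally visible total is computed on demand.
class IoCompletionStats {
    std::array<PerCpuCounter, kMaxCpus> slots_{};
public:
    // Hot path: uncontended update of this CPU's own cache line.
    void record_completion(int cpu) {
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }
    // Cold path: aggregate per-CPU state into the global view.
    unsigned long total() const {
        unsigned long sum = 0;
        for (const auto& s : slots_)
            sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }
};
```

The design choice is the interesting part: the expensive cross-CPU traffic happens only on the rare aggregation path, while the millions-per-second hot path stays entirely within one CPU’s cache.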
This may sound simple, but there’s a lot of complexity hidden in the detail; multithreading on highly parallel systems is far from trivial. In fact, it’s so difficult that the trend now is towards scaling applications by building them from independent, containerized, microservices … an environment just crying out for a new kind of storage stack, but that’s a story for another day!
DataCore’s most recently issued patent in this area is:
U.S. Patent Number 10,409,640 – Methods and Apparatus for Data Request Scheduling in Performing Parallel IO Operations.
It’s part of a set of related applications covering techniques for improving I/O performance on a multi-core system, including:
U.S. Patent Number 10,318,354 – Methods and Apparatus for Command List Processing in Performing Parallel IO Operations
What do these patents mean in practice? A software-defined storage product that delivers greater I/O throughput, with better and more deterministic latency, whilst using fewer resources. It’s technical innovations like these that cause nearly 80 percent of our customers to report 300–1000 percent storage performance improvements, and that elevate DataCore from being just a provider of storage virtualization into the Authority on Software-Defined Storage.
Request a live demo or fully-functional free trial of the most advanced technology to accelerate performance, increase resource efficiency, and achieve zero-downtime availability for your storage systems.