A shift in the computer industry has occurred, though not a recent one. The year was 2005, and Moore’s Law deviated from the
path it had followed for over 35 years. Up to that point, improved processor performance came mainly from frequency scaling, but once
core speeds reached ~3.8GHz, pushing further quickly became cost-prohibitive because of the physics involved (factors such as
core current, voltage, heat dissipation, structural integrity of the transistors, etc.). Thus, processor manufacturers (and Moore’s Law) were forced to
take a different path. This was the dawn of the massive symmetric multiprocessing era (or what we refer to today as ‘multicore’).
The shift to multicore symmetric multiprocessing (SMP) architectures required a specialized skill set in parallel programming to realize the
full performance of the many available processor cores. Relying on frequency scaling for better application response times and throughput was no longer
enough. More than a decade later, a severe gap persists in our ability to harness the power of multicore, mainly due to either a lack of
understanding of parallel programming or the inherent difficulty of porting a well-established application framework to a parallel programming model.
Virtualization may also be responsible for some of the gap, since the entire concept of virtualization (specifically compute virtualization) is to
create many independent virtual machines, each of which can run the same application simultaneously and independently. Within this framework, the demand
for parallelism at the application level may have diminished, because the parallelism is handled by the abstraction layer and scheduler within the compute
hypervisor and is no longer as necessary for the application developer (I’m just speculating here). So, while databases and hypervisors are largely rooted
in parallelism, one massive area still suffers from a lack of parallelism: storage.
THE PARALLEL STORAGE REVOLUTION BEGINS
In 1998, DataCore Software began work on a framework specifically intended for driving storage I/O. This framework would become known as a storage
hypervisor. At the time, the best commercially available multiprocessor systems were multi-socket, single-core systems (2 or 4 sockets per
server). From 1998 to 2005, DataCore perfected a method of harnessing the full potential of commodity x86 SMP architectures with the sole purpose of driving
high-performance storage I/O. For the first time, the storage industry had a portable, software-based storage controller technology that was not tied to
a proprietary hardware frame.
When multicore processors arrived in the x86 market in 2005, they intersected with increasingly parallel applications
such as VMware’s hypervisor and parallel database engines such as Microsoft SQL Server and Oracle. Enterprise applications gradually became more
parallel while, surprisingly, the storage subsystems that supported them remained serial.
The serial nature of storage subsystems did not go unnoticed, at least by storage manufacturers. It was well understood that, given the rate of
increase in processor density and the wider adoption of virtualization technologies (which drove much higher I/O demand density per system), a change
was needed at the storage layer to keep up with growing workloads.
To overcome the serial limitation in storage I/O processing, the industry had to go parallel. At the time, the path of least
resistance was simply to make disks faster or, from another perspective, to make solid-state disks (which by 2005 had existed in some form for over
30 years) more affordable and denser.
As it turns out, the path of least resistance was chosen, either because alternative methods of storage I/O parallelization had not been realized or because
the storage industry was unwilling to completely recode its already highly complex storage subsystem software. The chosen technique,
referred to as [Hardware] Device Parallelization, is now used by every major storage vendor in the industry. The only problem is that it does not
substantially address the fundamental problem of storage performance: latency.
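To make the serial-versus-parallel distinction concrete, here is a minimal Python sketch of a toy queueing model. It assumes a burst of I/O requests that all arrive at the same moment, a fixed per-request service time, and a configurable number of workers (think of cores processing I/O) that each handle one request at a time. The function name `avg_latency` and all numbers are hypothetical and purely illustrative. This is not DataCore's implementation, and the numbers are not measured results.

```python
import heapq


def avg_latency(n_requests: int, service_time_us: float, workers: int) -> float:
    """Average completion latency for a burst of requests arriving at t=0.

    Hypothetical toy model: every request needs the same service time, and
    each worker (e.g. a core servicing I/O) processes one request at a time.
    """
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    total = 0.0
    for _ in range(n_requests):
        start = heapq.heappop(free_at)  # earliest available worker
        finish = start + service_time_us
        total += finish  # latency equals finish time, since all arrive at t=0
        heapq.heappush(free_at, finish)
    return total / n_requests


# Serial path: one worker, so each request queues behind all earlier ones.
print(avg_latency(8, 100.0, workers=1))  # 450.0
# Faster device, still serial: service time halved, single worker.
print(avg_latency(8, 50.0, workers=1))   # 225.0
# Parallel path: four workers drain the same burst concurrently.
print(avg_latency(8, 100.0, workers=4))  # 150.0
```

In this toy model, halving the device service time while keeping a single serial worker still leaves average latency above what four parallel workers achieve with the slower device. In the serial case, most of each request's latency is time spent waiting in the queue. Real storage stacks add contention, variable service times, and arrival patterns that this sketch ignores.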
Chris Mellor of The Register recently wrote in an article: “The entire recent investment in developing all-flash arrays could have been avoided simply by
parallelizing server IO and populating the servers with SSDs.”