Lately I have found myself in many conversations about parallelization and why it matters, particularly for I/O processing. On first hearing the phrase ‘Parallel I/O’, people often jump directly to a traditional performance discussion. There is no doubt that application workload performance improves in the traditional sense (lower latency, more operations per second, and so on), but there is much more to the story: another dimension, if you will.
You’ve Most Likely Been Here Before
Let’s consider a real-world example we can build on to explain how this works and why it is important. Assume we are standing at a department store checkout area with a single open lane and one cashier. There are 60 customers waiting to check out (or, in slightly more technical terms, those 60 customers are in queue). Each customer is likely to have a different number of items, but for the sake of simplicity, let’s assume they are all roughly the same. If it takes the cashier one minute to check out each customer, the checkout rate is 1 customer per minute. Pretty simple so far.
About this time, the store manager recognizes the bottleneck and decides to open five additional lanes, for a total of six open lanes. With six lanes now available, six customers can be checked out at the same time (i.e. in parallel). The new checkout rate becomes 6 customers per minute.
Let’s stop here for a moment. I don’t think anyone would argue against the fact that by simply opening more checkout lanes, more customers can be handled in the same amount of time. But the other interesting aspect is that the total effective wait time for all the customers has also been reduced. In the first scenario, with only one checkout lane, the total time to check out 60 customers was 60 minutes. In the second scenario, with six checkout lanes, the total time to check out 60 customers is 10 minutes. And of course, you can see where I’m going here. Suppose now there are 60 operational checkout lanes. The checkout rate is now 60 customers per minute, so the total time to check out all 60 customers is one minute. In effect, the store manager has achieved what I call “matching the customer demand one-to-one with the capabilities of the store checkouts”.
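The arithmetic in the three scenarios can be written down as a small Python sketch. The function name `total_checkout_minutes` and its parameters are my own, chosen for illustration, and the sketch assumes every customer takes exactly the same time at the register, as in the example.

```python
import math


def total_checkout_minutes(customers: int, lanes: int,
                           minutes_per_customer: float = 1.0) -> float:
    """Time until the last customer is done, assuming equal service times.

    Hypothetical helper for the store example: customers are served in
    "rounds" of up to `lanes` customers at once.
    """
    if customers < 0 or lanes < 1:
        raise ValueError("customers must be >= 0 and lanes must be >= 1")
    rounds = math.ceil(customers / lanes)
    return rounds * minutes_per_customer


if __name__ == "__main__":
    # The three scenarios from the text: 1, 6, and 60 open lanes.
    for lanes in (1, 6, 60):
        minutes = total_checkout_minutes(60, lanes)
        print(f"{lanes} lane(s): {minutes:.0f} minute(s) for 60 customers")
```

Calling it with 1, 6, and 60 lanes reproduces the 60-minute, 10-minute, and 1-minute totals described above.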
This example is precisely what we see in the technology world. The application workloads are the customers (I/O generators) waiting for service, and the checkout lane with its cashier is the servicer (I/O response layer). The problem with today’s systems is that they are largely driven by a single “cashier” when it comes to the checkout process.
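To make the mapping from customers to I/O requests a little more concrete, here is a rough simulation in Python. It is only a sketch under my own assumptions, not a description of how any particular product schedules I/O: each request has a service time, and each new request goes to whichever worker becomes free first. The name `makespan` and the sample service times are hypothetical.

```python
import heapq
from typing import Iterable


def makespan(service_times: Iterable[float], workers: int) -> float:
    """Finish time of the last request when requests are handed, in order,
    to whichever worker frees up first (a simplifying assumption)."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = 0.0
    for service_time in service_times:
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + service_time
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    requests = [3.0, 1.0, 2.0, 2.0]  # hypothetical, uneven service times
    for workers in (1, 2, 4):
        print(f"{workers} worker(s): done at t={makespan(requests, workers)}")
```

With uniform one-minute requests this gives the same totals as the store example; with uneven requests it shows that adding workers still shortens the overall finish time, just not always in exact proportion.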
To make the customer experience even better, let’s imagine being able to walk into the store, grab your items, and walk straight out (while paying for your stuff, of course) without having to stand in a checkout line at all. This assumes the store has what you want in stock, but with today’s massive department stores, that usually isn’t a problem. The net effect is that the store can now process many times more customers than it originally could, and overall customer satisfaction is high because checkout time is very low. This is another attribute of Parallel I/O, a mechanism referred to as anti-queuing (in other words, don’t queue unless you have to).
Adaptive Parallelization is not just a neat innovation; it is an absolute must when it comes to dealing with the highly variable onslaught of I/O generated by today’s applications. But the extra dimension of performance I mentioned earlier is the ability to run many times more applications simultaneously, each at a better performance level than a single application achieved before. If each workload has its own cashier (or worker), you can run many more applications (e.g. databases, virtual machines) and run them faster, since there is less demand on each individual cashier (or worker).
Particularly with hyper-converged infrastructure (HCI) workloads, the handicap caused by single-worker I/O processing is precisely what is holding HCI back from achieving much higher levels of workload density per system. The I/O response layer simply cannot keep up with the highly concurrent application workload demand. And to make matters worse, we now live in the “container era”. This will only drive concurrency higher, aggravating the I/O response layer bottleneck even more.