Learn how to move data between diverse online storage media and explore the two popular techniques: data tiering and data placement.
Determining which data gets placed on which storage is a herculean challenge storage administrators grapple with on a daily basis. Not all storage media are the same. They can vary by performance, cost, compliance, deployment, location, etc., and certainly not all data are equally important. Some are hot data accessed very frequently, some are infrequently accessed data, and some are just redundant copies of data kept for disaster recovery and accessed only in the event of disruption and data loss. It is important to note that the importance of data, as well as data temperatures, changes over time. For example, warm data stored on fast HDDs could begin to be accessed frequently by a certain application and may need to be reclassified as hot and moved to faster SSDs.
It is the responsibility of the storage administrator to figure out which data goes where. Given the speed and volume at which data is processed, executing this manually and in real time is impossible. This is where automated data movement comes to the rescue.
Data storage management software – whether built into the storage hardware or available from third-party solution providers – provides the means to automatically move data to the appropriate storage tier. And this happens fully transparently to the applications and users accessing the data, without any impact on operational continuity.
In this blog we will compare and contrast two techniques – data tiering and data placement – that are similar in principle, but different in the way they work. Let’s dive right in.
Automated Data Tiering
Automated data tiering (also known as storage tiering or auto-tiering) is a widely used technique in the block storage world where the software controlling the data movement uses machine learning to track access patterns and understand data temperatures. The science of data tiering distills down to monitoring I/O behavior, determining frequency of use, and then dynamically moving blocks of information to the most suitable class or tier of storage media. Based on whether data is hot, warm, or cold, it gets placed on the corresponding storage tier. Typically, the storage administrator defines the storage tiers – tier 1, 2, 3, and so on. Then, the software does the rest.
Data tiering can work within a single storage device with different tiers distinguished within itself, or across devices from the same manufacturer or from different manufacturers. Its full potential can be realized when there is no vendor or device constraint, and tiering is performed across any storage system.
Consider an environment where there is a mix of premium SSD flash arrays, HDD storage systems, and JBODs. You would not want to waste space on the premium flash array with cold data that rarely gets accessed, leaving the device constantly hungry for more capacity – that is neither smart nor cost-effective. Data tiering enables automatic data movement so that the high-performing and expensive storage (tier 1) stores the hottest data and the lower tiers (tier 2, 3, and so on) get the warm and cold data.
This movement does not happen only when new data is written to disk. Even as existing data is accessed and its frequency of use changes, the data storage management software intelligently recognizes the pattern and moves the data to the respective storage tier. Data movement happens continuously, automatically, and fully transparently to the application in the front end.
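The promote/demote cycle described above can be sketched in simplified form. This is an illustrative model only – the tier names and access-count thresholds are hypothetical, and real tiering software tracks I/O at the block level and derives temperature thresholds dynamically from telemetry:

```python
from dataclasses import dataclass

# Hypothetical tiers with access-count thresholds per monitoring window.
# Ordered fastest to slowest; first threshold met wins.
TIERS = [
    ("tier1-ssd", 100),   # hot: 100+ accesses per window
    ("tier2-hdd", 10),    # warm: 10-99 accesses
    ("tier3-jbod", 0),    # cold: everything else
]

@dataclass
class Block:
    block_id: int
    tier: str
    access_count: int = 0  # accesses observed in the current window

def classify(access_count: int) -> str:
    """Map an observed access frequency to a storage tier."""
    for tier, threshold in TIERS:
        if access_count >= threshold:
            return tier
    return TIERS[-1][0]

def retier(blocks: list[Block]) -> list[tuple[int, str, str]]:
    """Promote/demote blocks whose temperature changed; return the moves."""
    moves = []
    for b in blocks:
        target = classify(b.access_count)
        if target != b.tier:
            moves.append((b.block_id, b.tier, target))
            b.tier = target    # the move is invisible to the application
        b.access_count = 0     # reset the counter for the next window
    return moves
```

Run periodically, this captures the essential behavior: a block sitting on the slowest tier that suddenly receives heavy I/O in one window is promoted, while a block on the fastest tier that goes quiet is demoted.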
At DataCore, we have incorporated automated data tiering into our block-based software-defined storage solution, SANsymphony, which uses storage virtualization technology to abstract storage capacity from the storage hardware and create virtual pools. Within a storage pool, storage tiers can be characterized, and SANsymphony performs data tiering in real time, letting you take full advantage of the capacity on your performant hardware for storing critical/hot data. SANsymphony promotes the most frequently used blocks to the fastest tier, whereas the least frequently used blocks get demoted to the slowest tier. This also gives you the ability to integrate new technologies into your storage infrastructure seamlessly. For example, if you are adding storage disks based on 3D XPoint, SANsymphony can add that storage non-disruptively into its virtual storage pool and make it your tier 1 storage, to which all your hot data is automatically promoted. SANsymphony’s unique value is that it supports data tiering across any make or model of storage hardware and any deployment type (including hyperconverged).
Automated Data Placement
In the world of unstructured data, which is growing much faster than structured data, file storage is generally the preferred storage medium. IT organizations require the flexibility to move data back and forth between file storage systems such as NAS devices, file servers, etc. – and also to and from object storage when needed – based on their requirements.
This is possible with automated data placement, which is a variant of automated data tiering, but goes far beyond that in meeting different criteria for data movement. Here, the data storage management software is typically a global file system which resides above the storage layer. Leveraging file virtualization technology, the global file system first gathers the metadata from the data payload stored on various storage systems (file servers, NAS, cloud, etc.). It then assimilates the files, including their metadata information, into its global namespace.
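The metadata-assimilation step just described can be sketched in simplified terms. The backend names and metadata fields below are illustrative assumptions, not vFilO's actual schema:

```python
from datetime import datetime

# Illustrative per-backend listings as a global file system might gather
# them from different storage systems (NAS, file server, cloud bucket).
nas_listing = [
    {"path": "/projects/report.docx", "size": 120_000,
     "owner": "alice", "last_access": datetime(2024, 5, 1)},
]
cloud_listing = [
    {"path": "/archive/2020/logs.tar", "size": 5_000_000_000,
     "owner": "ops", "last_access": datetime(2021, 1, 9)},
]

def build_namespace(listings: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge per-backend listings into one namespace keyed by global path."""
    namespace = {}
    for backend, files in listings.items():
        for meta in files:
            # Record which backend currently holds the payload; the
            # application-facing path stays the same wherever data lives.
            namespace[meta["path"]] = {**meta, "backend": backend}
    return namespace

ns = build_namespace({"nas01": nas_listing, "s3-archive": cloud_listing})
```

The key idea is the indirection: applications see one consistent namespace, while the backend recorded against each file can change as data is moved.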
Now, the global file system knows which files are stored where, what type of files they are, when they were created and last accessed, how large they are, which user created them, etc., as well as the capacity utilization of the storage systems. The information gathered about the data is much richer than in the case of block storage. So, there are more options for customizing the criteria that govern how data moves between storage media. Frequency of data access (or data temperature) is indeed one of them, but the administrator can create numerous other bespoke policies to regulate data movement. Hence data placement has greater applicability than data tiering.
Here are some examples for better understanding:
- Durability and data protection: Create copies of data stored on a certain share and place them in multiple locations as backups.
- Performance: Offload data stored on premium NAS devices to slower disks and cheaper storage. This helps free up capacity on your primary storage and minimizes I/O bottlenecks.
- Compliance: Regulatory compliance policies may require organizations to retain data in a specific location for a given time period before it is moved or deleted. For example, store customer data within a country or within a specific site to meet compliance and security requirements.
- Offload to object storage: Organizations focused on leveraging object storage as a low-cost alternative to file storage can use automated data placement to move inactive/cold data to object storage, either on-premises or in the cloud.
- Custom business objectives: Move all snapshot files to the cloud; move all data from the HR department stored on specific storage hardware to secondary storage; when the capacity limit is reached on a specific storage volume, direct all new data to another storage volume (this helps balance load across storage systems); and more.
For performing automated data placement across distributed file and object storage, DataCore offers vFilO, a software-defined storage solution that acts as a global file system and governs data movement based on custom policies set by the storage administrator. vFilO uses machine learning to detect patterns as data gets written to storage and then performs data placement based on these policies. Using vFilO allows you to aggregate namespaces across disparate NAS devices and filers into a single global namespace and streamline data mobility as you desire.
As with data tiering, data movement happens dynamically and fully transparently to the applications and users in the front end. With the option to move data to the cloud and between different public cloud platforms, vFilO can also support your cloud journey and help you leverage economical options to store data.
While it is common for IT pros to use the terms data tiering and data placement interchangeably, from DataCore’s perspective they are two distinct techniques: data tiering focuses on data movement based on data temperature, while data placement uses custom policies to control data movement based on business requirements (which can include data temperature as one criterion). You can check out SANsymphony and/or vFilO based on what your storage environment is made up of (block, file, or object) and what type of data you are dealing with (structured or unstructured).