What Is Deduplication & Compression?
Deduplication analyzes blocks of data and computes a hash for each data block. If a new block being written to disk has the same hash value as an existing block, it is replaced with an identifier that simply points to the existing data block.
Multiple redundant copies of data can be replaced with references to a single copy, which reduces the amount of capacity needed.
Deduplication is most beneficial when there are multiple blocks of the same data, for example, redundancy within snapshots or VDI images.
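The hash-and-pointer idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not SANsymphony's implementation: the `DedupStore` class, the 4 KiB `BLOCK_SIZE`, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}  # hash -> block bytes; each unique block is stored once
        self.volume = []  # logical layout: ordered list of hashes (the "pointers")

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Keep the block only if its hash has not been seen before;
            # otherwise the existing copy is reused.
            self.blocks.setdefault(digest, block)
            self.volume.append(digest)

    def read(self) -> bytes:
        # Follow the pointers to rebuild the original data.
        return b"".join(self.blocks[d] for d in self.volume)

    def logical_size(self) -> int:
        return sum(len(self.blocks[d]) for d in self.volume)

    def physical_size(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
image = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write(image)  # first copy of an image
store.write(image)  # identical second copy: only pointers are added
print(store.logical_size(), store.physical_size())
```

In this sketch the second write adds no new blocks, so the physical size stays at two blocks while the logical size doubles, which mirrors the snapshot and VDI case mentioned above.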
Compression is an algorithmic process that shrinks data by encoding repeated patterns more compactly. In its simplest form, it finds identical data sequences that appear in a row, saves only the first sequence, and replaces the repeats with a count of how many times the sequence occurs.
Because only the first data sequence is stored as-is, the same information fits in less disk space. How well data compresses depends on the nature of the dataset itself: whether it is in a compressible format and how much of it can be compressed.
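The run-counting scheme described above is commonly known as run-length encoding. The sketch below shows it at byte granularity; the function names `rle_encode` and `rle_decode` are hypothetical, and production compressors generally use more sophisticated algorithms than this.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (byte value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Expand (byte value, run length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


encoded = rle_encode(b"AAAABBBCCD")
print(encoded)
print(rle_decode(encoded))
```

Data with long runs shrinks well under this scheme, while data with few repeats (for example, already-compressed media) can even grow, which is why results depend on the dataset.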
Benefits of Deduplication and Compression
- Reduced disk space requirements leading to optimized storage allocation
- Greater IT cost savings and increased ROI
- Reduced hardware footprint resulting in less floor space and lower energy requirements
- Improved storage efficiency
Two Approaches for Data Deduplication and Compression
DataCore SANsymphony offers two approaches to performing storage deduplication and compression. You can choose the appropriate approach based on your business and IT requirements.
Inline deduplication and compression: Here, data reduction happens before the data is written to disk. SANsymphony scans and analyzes the incoming data for optimization opportunities, then deduplicates and compresses it. Because data is reduced before it is stored, inline processing lowers disk capacity requirements from the outset. Inline processing is the recommended approach when backups run frequently and generate a lot of redundant data, because it cuts down the data size before the backup is stored.
Inline deduplication and compression are supported only with the EN edition of SANsymphony and can be enabled individually or together (deduplication, compression, or both) as needed.
Post-process deduplication and compression: Here, data reduction happens after the data is written to disk. SANsymphony first stores the raw data on the target storage device. Then, the stored data is scanned and analyzed for optimization opportunities. The deduplicated and compressed data is written back to the storage device, where it occupies less capacity than before. Note that the initial capacity allocation on the target device is larger with post-processing, because the raw data is stored as-is before undergoing data reduction. Post-processing allows capacity optimization to be scheduled during non-peak hours, minimizing the impact on IOPS during peak hours.
Post-process deduplication and compression are supported with the EN, ST, and LS editions of SANsymphony.
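The capacity difference between the two approaches can be made concrete with a small calculation. This is a simplified sketch under stated assumptions: the function `peak_and_final_capacity` is hypothetical, the reduction ratio is an input rather than something measured, and post-processing is assumed to run only after all raw data has landed on disk.

```python
def peak_and_final_capacity(raw_gb: float, reduction_ratio: float, mode: str):
    """Return (peak capacity, final capacity) in GB for a given reduction mode.

    reduction_ratio is raw size divided by reduced size, e.g. 4.0 for 4:1.
    """
    final = raw_gb / reduction_ratio
    if mode == "inline":
        # Data is reduced before it is written, so the peak equals the final size.
        peak = final
    elif mode == "post-process":
        # Raw data is stored as-is first, so the peak equals the raw size.
        peak = raw_gb
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return peak, final


for mode in ("inline", "post-process"):
    print(mode, peak_and_final_capacity(1000, 4.0, mode))
```

Under these assumptions both modes end at the same final capacity; only the peak allocation on the target device differs.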
Many factors determine the efficiency and output of deduplication and compression: type of data, rate of change, access frequency, backup frequency, and so on. Some workloads inherently perform some level of redundancy elimination at the application level and therefore yield lower deduplication and compression ratios. Other workloads, such as VDI with multiple copies of the same operating system image, yield higher deduplication and compression ratios when backed up. The files most likely to benefit from deduplication and compression contain repetitive data blocks, have relatively static content, and are accessed infrequently.
Both inline and post-processing capacity optimization techniques help IT teams achieve CAPEX savings. The actual savings depend on the efficiency of the deduplication and compression operations and their individual capacity optimization ratios.
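As a rough guide to how individual ratios translate into savings, the sketch below converts a deduplication ratio and a compression ratio into a fraction of capacity saved. It assumes the two ratios are independent and multiply, which is a simplification; the function name `capacity_savings` and the example ratios are illustrative, not measured values.

```python
def capacity_savings(dedup_ratio: float, compression_ratio: float = 1.0) -> float:
    """Fraction of raw capacity saved, assuming the two ratios multiply.

    A ratio of 2.0 means 2:1, i.e. the data shrinks to half its size.
    """
    combined_ratio = dedup_ratio * compression_ratio
    return 1.0 - 1.0 / combined_ratio


# Example: 2:1 deduplication followed by 2:1 compression gives 4:1 combined.
print(f"{capacity_savings(2.0, 2.0):.0%}")
```

Note the diminishing returns in this model: going from 2:1 to 4:1 combined saves an extra 25% of raw capacity, while going from 4:1 to 8:1 saves only a further 12.5%.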