In today’s fast-paced digital era, where data stands as the most valuable asset, protecting it against potential threats has emerged as one of the paramount concerns for businesses worldwide. The rise in cyber-attacks, accidental data deletions, natural disasters, and hardware malfunctions only underscores the fragility of our digital repositories. These challenges have forced organizations to confront the sobering reality of data loss, which translates not only into monetary setbacks but also into irreparable damage to reputation, loss of client trust, and regulatory repercussions.
Even minor disruptions can cascade into mammoth setbacks, bringing operations to a screeching halt and denting your company’s bottom line. In such a scenario, it is no longer just about storing data; it is about safeguarding it with an armor of redundancy.
What is Data Redundancy?
At its core, data redundancy is the practice of creating and storing duplicate copies of data, ensuring that if the primary data encounters any form of compromise, there’s a fallback waiting in the wings. This intentional replication serves as a safety net, catching anomalies before they snowball into full-blown catastrophes. It’s akin to having a spare tire in your car; while you hope never to face a flat tire, having a redundant copy ensures that you are never left stranded.
Data redundancy, as a strategy, is more than just creating a data copy; it is a combination of tools, techniques, and infrastructure planning to ensure that your data always remains accessible and intact. Redundancy can be achieved at multiple levels, from hardware to software, and from local to geographically distributed environments. In this blog, we will outline the different methodologies for achieving data redundancy and analyze the pros and cons of each approach.
1. Synchronous Mirroring
Synchronous mirroring, a cornerstone of high availability architectures, ensures real-time data availability across two (or in some cases three) storage systems – usually within the same site or across metro-clusters. When a write operation occurs, the system dispatches the data not only to the primary storage device but also, concurrently, to a mirror (or secondary) storage device. Thus, data redundancy is maintained constantly, ensuring a one-to-one data match across both storage systems.
The write operation is considered complete only after data is successfully stored in both primary and mirrored storages. This establishes a consistent data state across devices, ensuring a zero Recovery Point Objective (RPO) and a near-zero Recovery Time Objective (RTO). Underneath this functionality lies a series of protocols and communication methodologies ensuring real-time data transfer, synchronization checks, failover, and failback operations. This also necessitates robust network infrastructure, often leveraging Fibre Channel or high-speed Ethernet, to mitigate latency.
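The acknowledgment logic described above can be sketched in a few lines of Python. This is a minimal in-memory model, not a real storage stack: the dictionaries standing in for devices and the `mirrored_write` function are hypothetical, but they show the defining behavior of synchronous mirroring, namely that the caller is not acknowledged until both legs of the write have landed.

```python
import concurrent.futures

# Hypothetical in-memory "devices"; a real system would address block storage.
primary, mirror = {}, {}

def write_block(device, address, data):
    """Simulate a write to one storage device; returns True on success."""
    device[address] = data
    return True

def mirrored_write(address, data):
    """Dispatch the write to both devices concurrently and acknowledge
    only after BOTH succeed -- the property that yields zero RPO."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(lambda dev: write_block(dev, address, data),
                                (primary, mirror)))
    if not all(results):
        raise IOError("write failed on one leg; mirror is out of sync")
    return "ack"

mirrored_write(0x10, b"payload")
assert primary[0x10] == mirror[0x10] == b"payload"
```

The key design point is that the acknowledgment is withheld until the slowest leg completes, which is why synchronous mirroring demands low-latency links between the two systems.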
2. Asynchronous Replication
Redundancy in asynchronous replication is established by periodically copying data from the primary to a secondary location – usually across long distances over a WAN connection. A key application of this is disaster recovery. The primary storage acknowledges the write operation immediately, while replication to the secondary storage may lag slightly behind; hence it is asynchronous in nature. Over time, the secondary storage is synced with the primary one, ensuring that a redundant copy is available (even if it is slightly outdated compared to the primary copy).
Asynchronous replication employs a queue or buffer system. Once the primary storage acknowledges the write, the data is queued for replication. Advanced systems may implement algorithms to batch data, minimize network chatter, or prioritize data sequences. Change logs may be utilized to keep track of data states at specific intervals, allowing for periodic synchronization with secondary storage. This approach is especially prevalent in geographically dispersed disaster recovery architectures, where data is asynchronously replicated to remote sites.
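The queue-and-worker pattern described above can be illustrated with a small Python sketch. The dictionaries and the `replication_worker` thread are hypothetical stand-ins (a real system would ship the queued changes over a WAN link), but the flow is the same: the primary acknowledges immediately, and a background worker drains the buffer to the secondary.

```python
import queue
import threading

write_log = queue.Queue()   # buffer of pending replication work
primary, secondary = {}, {}

def write(address, data):
    """Primary acknowledges at once; replication happens later."""
    primary[address] = data
    write_log.put((address, data))  # queued for the replication worker
    return "ack"                    # note: the secondary may lag behind

def replication_worker(stop):
    """Drain the queue and apply changes to the secondary site."""
    while not (stop.is_set() and write_log.empty()):
        try:
            address, data = write_log.get(timeout=0.1)
        except queue.Empty:
            continue
        secondary[address] = data   # in reality: send over the WAN
        write_log.task_done()

stop = threading.Event()
worker = threading.Thread(target=replication_worker, args=(stop,))
worker.start()
write(1, b"a")
write(2, b"b")
write_log.join()                    # wait until the secondary catches up
stop.set()
worker.join()
assert secondary == primary
```

Between the acknowledgment and the worker's catch-up, the secondary is slightly stale; that window is exactly the non-zero RPO of asynchronous replication.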
3. Erasure Coding
Erasure coding represents a paradigm shift from traditional data redundancy methods. Instead of replicating the entire set of data multiple times, erasure coding breaks data into smaller chunks and generates additional parity chunks. When stored across different nodes or devices, even if some of these chunks are lost or corrupted, the original data can still be reconstructed using the remaining ones. This method is especially relevant in distributed storage systems, like object storage platforms or distributed file systems, where data resilience across nodes or even data centers is paramount.
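A toy example makes the chunk-plus-parity idea concrete. The sketch below uses the simplest possible scheme, k data chunks plus a single XOR parity chunk, so it can survive the loss of any one chunk; production systems typically use Reed-Solomon codes with multiple parity chunks. The function names are illustrative, not from any particular library.

```python
from functools import reduce

def xor_bytes(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data: bytes, k: int):
    """Split data into k equal chunks and append one XOR parity chunk
    (a toy k+1 scheme; real systems use Reed-Solomon with m parities)."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    return chunks + [xor_bytes(chunks)]

def reconstruct(chunks, lost_index):
    """Rebuild a single missing chunk by XOR-ing all surviving chunks,
    since the XOR of all k+1 chunks is zero."""
    return xor_bytes([c for i, c in enumerate(chunks) if i != lost_index])

pieces = encode(b"hello redundancy", k=4)
original = pieces[2]
rebuilt = reconstruct(pieces, lost_index=2)  # pretend chunk 2 was lost
assert rebuilt == original
```

The storage efficiency argument is visible here: five chunks protect four chunks' worth of data (25% overhead), where full replication of the same data would cost 100% overhead.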
When to Use Erasure Coding vs. Replication
Erasure coding is generally preferred when:
- Storage efficiency is crucial
- The number of storage nodes is generally high
- There is an expectation to have faster reads of data

Replication is generally preferred when:
- Low latency is a priority
- The number of storage nodes is comparatively low
- Computational overhead needs to be minimal
4. RAID
RAID stands for Redundant Array of Independent Disks. It is a technology that combines multiple disk drives into a single logical unit for the purposes of data redundancy and performance. By leveraging multiple drives, RAID can distribute I/O operations for throughput and use mirroring or parity to protect against drive failures.
Technically, RAID operates on principles of striping, mirroring, and parity.
- Striping (as in RAID 0) disperses data across multiple drives, increasing I/O parallelism.
- Mirroring (RAID 1), on the other hand, replicates identical data on two drives, serving as a direct backup.
- Parity (RAID 5 & 6) introduces a method where data is striped across drives with additional parity information. This parity allows data to be reconstructed if a drive fails.
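The parity mechanism behind RAID 5 can be shown with a short sketch. This is a simplified model of a single stripe over three data drives plus one parity drive (real RAID 5 rotates the parity position across stripes, and RAID 6 adds a second, differently-computed parity); the function names are illustrative.

```python
def xor(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def make_stripe(data_blocks):
    """One simplified RAID 5 stripe: data blocks plus their XOR parity.
    Returns the per-drive contents: [d0, d1, d2, parity]."""
    return data_blocks + [xor(*data_blocks)]

def rebuild(drives, failed):
    """A failed drive's block is the XOR of all surviving blocks,
    because XOR-ing everything (data + parity) yields zero."""
    return xor(*(d for i, d in enumerate(drives) if i != failed))

stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
assert rebuild(stripe, failed=1) == b"BBBB"   # lost data drive recovered
```

Note the similarity to erasure coding: RAID 5 parity is effectively a fixed k+1 XOR code applied within a single array rather than across distributed nodes.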
5. Backups
Backups represent a fundamental approach to data redundancy. They ensure that an independent copy of the data is stored away from the primary storage – usually on different media or even in different geographical locations. Backups are point-in-time copies of data that can be reverted to in the event of data corruption, deletion, or other catastrophic events.
A full backup captures the entirety of the designated dataset. Subsequent backups can either be differential, capturing changes since the last full backup, or incremental, capturing changes since the last backup of any kind. Underlying these processes, backup systems use data comparison algorithms, checksums, and indexing mechanisms.
Differential vs. Incremental Backup
A differential backup captures all the changes made to the data since the last full backup. In other words, it includes the differences between the last full backup and the current state of the data. This means it can be larger than incremental backups but offers a faster restore process as it only requires the last full backup and the latest differential backup.
Incremental backups, on the other hand, only capture the changes made since the last backup of any kind, which can be a full backup or a previous incremental backup. They tend to be smaller in size compared to differentials but may require more backups for a complete restoration since you need to apply each incremental backup in sequence, starting from the last full backup.
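The difference in restore chains can be made concrete with a small sketch. The backup records below are hypothetical dictionaries mapping file names to contents; the point is that a differential restore needs only the full backup plus the latest differential, while an incremental restore must replay every incremental in order.

```python
# Hypothetical backup records: file -> contents captured at backup time.
full = {"a.txt": "v1", "b.txt": "v1"}

# Monday: a.txt changed to v2. Tuesday: b.txt changed to v2.
diff_tue = {"a.txt": "v2", "b.txt": "v2"}   # ALL changes since the full backup
incr_mon = {"a.txt": "v2"}                  # changes since the full backup
incr_tue = {"b.txt": "v2"}                  # changes since Monday's incremental

def restore_differential(full_backup, latest_diff):
    """Differential restore: full backup + the single latest differential."""
    return {**full_backup, **latest_diff}

def restore_incremental(full_backup, incrementals):
    """Incremental restore: full backup + EVERY incremental, in sequence."""
    state = dict(full_backup)
    for incr in incrementals:
        state.update(incr)
    return state

# Both strategies converge on the same final state.
assert restore_differential(full, diff_tue) == \
       restore_incremental(full, [incr_mon, incr_tue])
```

The trade-off from the text is visible in the data: `diff_tue` stores more than `incr_tue` (it re-captures Monday's change), but restoring from it needs only one extra record.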
6. Snapshots
Snapshots achieve redundancy by preserving the state of data at specific moments in time. Instead of copying the entire dataset, a snapshot initially captures the full state and subsequently only logs changes relative to that state. Even if the primary data undergoes numerous changes or gets corrupted, the snapshot can serve as a redundant point-in-time copy, allowing data restoration to its state when the snapshot was taken.
Snapshots employ a copy-on-write or a redirect-on-write mechanism.
- In a copy-on-write snapshot, when data is modified, the original data block is copied and preserved before the modification occurs.
- In a redirect-on-write snapshot, the new data is written to a fresh block, and the original block remains untouched.
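The copy-on-write variant can be sketched in a few lines. The class below is a toy model over a dictionary of blocks, not a real volume manager: on the first overwrite of a block after the snapshot is taken, the original block is copied aside, so snapshot reads always see the preserved state.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot of a block map.
    On the first modification of a block after the snapshot, the ORIGINAL
    block is copied aside; snapshot reads prefer the preserved copies."""

    def __init__(self, volume):
        self.volume = volume     # live data, shared with the snapshot
        self.preserved = {}      # original blocks saved before overwrite

    def write(self, block, data):
        if block in self.volume and block not in self.preserved:
            self.preserved[block] = self.volume[block]  # the copy-on-write step
        self.volume[block] = data

    def read_snapshot(self, block):
        return self.preserved.get(block, self.volume.get(block))

vol = {0: b"old"}
snap = CowSnapshot(vol)
snap.write(0, b"new")                    # triggers the copy of the original
assert vol[0] == b"new"                  # the live volume sees new data
assert snap.read_snapshot(0) == b"old"   # the snapshot still sees old data
```

A redirect-on-write implementation would invert this: new data would go to fresh blocks while the snapshot kept pointing at the untouched originals, avoiding the extra copy on the write path.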
Just like backups, snapshots can also be differential or incremental. Full snapshots capture the entire dataset at a specific moment, offering comprehensive, independent copies but consuming more storage. Differential snapshots record changes relative to a full snapshot, providing space-efficient protection by capturing only the modifications since the last full snapshot.
7. Continuous Data Protection (CDP)
Continuous Data Protection offers a granular approach to data safety. Unlike traditional backups and snapshots with specific intervals, CDP ensures redundancy by incessantly recording every change made to the data. This continuous monitoring means there’s always a redundant log of data modifications. When a need arises to revert or recover data, CDP provides a granular rewind capability (an undo button so to speak). Even if substantial data gets corrupted or lost in the primary storage, the CDP system’s comprehensive journal can restore data to any of its previous states, providing redundant recovery points.
- In block-based CDP, changes at the storage block level are monitored and logged.
- File-based CDP, as the name suggests, watches for changes at the file level.
- Application-based CDP focuses on capturing data changes within specific applications, ensuring application-consistent recovery points.
RPO for CDP is typically near 0 (in seconds), and RTO would be in the range of a few minutes.
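The journaling idea behind CDP can be sketched as an append-only log of timestamped changes; rewinding to any point in time is then just replaying the journal up to that timestamp. The class and method names below are illustrative, and a real CDP system would journal at the block, file, or application level as described above.

```python
import time

class CdpJournal:
    """Toy Continuous Data Protection journal: every write is recorded
    with a timestamp, so the dataset can be rewound to any point in time."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value), strictly append-only

    def record(self, key, value, ts=None):
        self.entries.append((ts if ts is not None else time.time(), key, value))

    def state_at(self, ts):
        """Replay the journal up to `ts` to materialize that point in time."""
        state = {}
        for when, key, value in self.entries:
            if when > ts:
                break
            state[key] = value
        return state

journal = CdpJournal()
journal.record("doc", "draft", ts=1)
journal.record("doc", "final", ts=2)
journal.record("doc", "oops!", ts=3)             # accidental corruption
assert journal.state_at(2) == {"doc": "final"}   # rewind to before the mistake
```

Because every change is retained, any timestamp is a valid recovery point, which is what drives the near-zero RPO; the cost is that the journal itself consumes storage continuously.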
Analyzing Pros and Cons of Data Redundancy Measures
| Method | Pros | Cons |
|---|---|---|
| Synchronous Mirroring | Zero RPO; real-time one-to-one copy | Requires robust, low-latency links; typically limited to metro distances |
| Asynchronous Replication | Works over long WAN distances; writes acknowledged without delay | Secondary copy can lag slightly behind (non-zero RPO) |
| Erasure Coding | High storage efficiency; tolerates loss of multiple chunks | Computational overhead for encoding and reconstruction |
| RAID | Protects against drive failures; can improve I/O performance | Protection limited to a single array; rebuilds take time |
| Backups | Independent copies, often on different media or sites | Recovery limited to backup intervals (higher RPO and RTO) |
| Snapshots | Fast, space-efficient point-in-time copies | Typically reside on the same system as the primary data |
| Continuous Data Protection (CDP) | Near-zero RPO; granular rewind to any point in time | Continuous journaling consumes storage and system resources |
As we have journeyed through the intricacies of data redundancy, it is evident that safeguarding your data is not just a luxury but an imperative. The unexpected can strike at any moment. By implementing robust data redundancy measures, you can arm yourself against these unforeseen events, ensuring that data remains accessible and intact even when faced with adversity.
DataCore offers software-defined solutions for block and object storage environments with many data redundancy measures built in. Contact us to learn more about data redundancy and the best practices to implement them in your IT infrastructure.