Adrian Herrera

5 Considerations for
Petabyte-Scale Data Protection

As an IT executive, do you remember when you moved from talking about protecting megabytes of data to gigabytes? How about from gigabytes to terabytes? How about petabytes?

If you haven’t started talking about protecting petabytes, you will soon. Data growth at every organization is accelerating due to the continued move to all-digital workflows. The average Swarm customer has close to a petabyte of data, and many have billions of files that need to be accessed by hundreds to thousands of users. At this scale, you need to think about data protection and recovery differently. It is not just about where you are today, but where you will be years from now.

Here are 5 things you need for petabyte-scale data protection and how DataCore Swarm provides them.

1. Automated, Fast Recovery

The first thing to consider is recovery. At a certain scale, hardware failure is inevitable. Your storage system needs to be designed to anticipate this and recover data gracefully. Swarm Object Storage was designed from the ground up to manage billions of files and petabytes of information. To do this efficiently, Swarm focuses on automated recovery and built-in data protection that doesn’t require manual intervention. Fast Volume Recovery uses a distributed algorithm whose performance scales with cluster size. It is automatically applied to lost or retired volumes to ensure adequate replication and preservation of the object segments that were on the lost or retired volume. Thus, when a damaged object is identified, it is actively rebuilt (as opposed to passively rebuilt, which is the case in most other storage solutions).
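The core idea behind this kind of active recovery can be sketched in a few lines. The following is an illustrative sketch, not Swarm’s actual algorithm: given a map of which volumes hold replicas of each object, losing a volume immediately yields the set of objects that have fallen below their target replica count and must be re-replicated onto surviving volumes.

```python
# Illustrative sketch of replica-driven volume recovery (not Swarm's code).
# Each object should keep `target_reps` replicas on distinct volumes.

def plan_recovery(replica_map, failed_volume, target_reps=2):
    """Return {object_id: surviving_volumes} for every object that
    dropped below target_reps when failed_volume was lost."""
    plan = {}
    for obj, volumes in replica_map.items():
        survivors = [v for v in volumes if v != failed_volume]
        if len(survivors) < target_reps:
            # This object must be actively re-replicated from a survivor.
            plan[obj] = survivors
    return plan

replicas = {
    "obj-a": ["vol1", "vol2"],
    "obj-b": ["vol2", "vol3"],
    "obj-c": ["vol1", "vol3"],
}
print(plan_recovery(replicas, "vol1"))  # obj-a and obj-c need rebuilding
```

Because each surviving node can compute its share of such a plan independently, rebuild work spreads across the cluster, which is why recovery speed grows with cluster size rather than shrinking.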

2. On-Premises Infrastructure Management and Cloud DR Optimization

The second thing to consider is optimizing all available resources, both your staff and the underlying infrastructure they support. Administrators often manage many different types of technology, including various types of storage, networking gear, applications, and everything in between. It is important to determine how difficult a solution is to maintain and expand, and whether an administrator can achieve their objectives with minimal effort and oversight.

With Swarm, a part-time administrator with general IT knowledge can manage over 100 petabytes of data and underlying infrastructure. This is due to functionality like the Swarm Health Processor that virtually eliminates manual infrastructure management by periodically examining the health of objects. Data integrity, number of replicas, erasure-coded data parity, volume IDs and the like are all continuously verified. Load balancing and space reclamation are also automated, avoiding hotspots while ensuring that hardware is utilized optimally.
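The continuous integrity verification described above can be illustrated with a minimal sketch. This is a conceptual example, not the Health Processor’s implementation: a content digest recorded at write time is periodically recomputed and compared, and any mismatch is flagged for repair from a good replica.

```python
import hashlib

# Conceptual sketch of a periodic integrity scan (not Swarm's Health
# Processor implementation): each object's digest, recorded at write
# time, is recomputed; mismatches are queued for repair from a replica.

def scan(objects):
    """objects: {object_id: (stored_sha256_hex, current_bytes)}.
    Returns the ids of objects whose content no longer matches."""
    damaged = []
    for obj_id, (stored_digest, data) in objects.items():
        if hashlib.sha256(data).hexdigest() != stored_digest:
            damaged.append(obj_id)
    return damaged

good = hashlib.sha256(b"hello").hexdigest()
store = {
    "ok-object": (good, b"hello"),
    "bit-rotted": (good, b"hellp"),  # simulated silent corruption
}
print(scan(store))  # only the corrupted object is flagged
```

Running checks like this on a rolling schedule is what lets the system catch silent corruption (bit rot) before the last good copy is lost.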

Swarm also includes the ability to replicate data for disaster recovery (DR) purposes to any S3-enabled device (like Fujifilm Object Archive) or S3-enabled service (like Amazon S3 Glacier or Wasabi). This makes it easy to rapidly comply with corporate policies requiring a copy of data stored offsite regardless of your location.

3. Policy-Driven Data Lifecycle Management

The third thing to consider is that the value of data changes over time, yet in today’s world it also needs to remain instantly accessible. In Swarm, Lifepoints handle this: they are policies and triggers stored as metadata that automatically manage storage utilization and the content lifecycle from creation to expiration, covering protection scheme, deletion, and delete protection. You can protect data from malicious attacks by making it immutable for its first few months, move from replication to erasure coding to significantly reduce the capacity needed, or move from erasure coding to replication to increase file performance. All of this is automated and can be set on the initial write.
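To make that concrete, a lifecycle like the one described (immutable replicas first, then erasure coding, then deletion) is expressed as a sequence of Lifepoint headers supplied at write time. The snippet below follows the general shape of Swarm’s documented Lifepoint header syntax, but the dates and policy values are illustrative only; consult the Swarm documentation for the authoritative format.

```python
# Illustrative Lifepoint headers for a write request. Syntax is shaped
# after Swarm's documented Lifepoint format; dates and values here are
# examples only -- check the Swarm docs for the authoritative syntax.

lifepoints = [
    # Phase 1: keep 3 replicas and forbid deletion (immutable) until this date.
    "[Wed, 01 Jan 2025 00:00:00 GMT] reps=3, deletable=no",
    # Phase 2: convert to 5:2 erasure coding to cut raw capacity used.
    "[Thu, 01 Jan 2026 00:00:00 GMT] reps=5:2",
    # Phase 3: no end date -- the object is deleted once the last phase ends.
    "[] delete",
]

headers = [("Lifepoint", lp) for lp in lifepoints]
for name, value in headers:
    print(f"{name}: {value}")
```

Because the policy travels with the object as metadata, no external scheduler has to track which objects are due for a protection change.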

4. Remote Storage Management and Alerting

The fourth thing to consider is how an administrator manages and views the health of a rapidly expanding storage solution from wherever they are. As recent events have shown, you may not always be able to access the data center in person. With Swarm, we created a highly responsive, HTML5-based Swarm Storage User Interface (UI) that you can access from any current browser on a PC, tablet, or mobile device. The Swarm Storage UI provides a dashboard for an at-a-glance health overview, trend reports for capacity planning, operational counters, and configuration of dynamic settings without requiring Simple Network Management Protocol (SNMP).


In addition to the management dashboard, Swarm provides a number of ways to monitor the status of your Swarm cluster, including phone-home reports, detailed system status via SNMP, and the ability to export detailed metrics to Prometheus for visualization in Grafana and alerting via email or Slack.
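Once cluster metrics are flowing into Prometheus, alerting becomes a matter of writing rules against them. The following is a minimal sketch of such a rule; the metric names (`swarm_cluster_free_bytes`, `swarm_cluster_total_bytes`) are hypothetical placeholders — substitute whatever names your exporter actually emits.

```yaml
# Hypothetical Prometheus alerting rule; the metric names below are
# assumptions -- use the names your Swarm exporter actually exposes.
groups:
  - name: swarm-capacity
    rules:
      - alert: SwarmClusterLowFreeSpace
        expr: swarm_cluster_free_bytes / swarm_cluster_total_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Swarm cluster has less than 10% free capacity"
```

Routing this alert to email or Slack is then handled by Prometheus Alertmanager’s standard receiver configuration.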

5. Tenant & Content Management

The fifth thing to consider is how users (be they end users or applications) access storage services and data. This goes beyond integration with identity management solutions: administrators need to manage access to data directly from the storage layer, while also enabling end users and applications to find and access the data they need, when they need it. Swarm’s tenant management features make it easy to create domains and buckets and to set bandwidth and capacity limits. In addition, administrators can set data protection policies and easily manage authentication and authorization via Active Directory, LDAP, or tokens (S3, SAML 2.0/SSO).

Swarm also includes a Content Management UI for both administrators and end-users that enables end-users to upload, download, search for, share, stream, and organize data (through customizable metadata).

Learn More: DataCore’s On-Demand Educational Object Storage Webinar

You can learn more about this topic in our educational webinar “Data Resilience and Recovery with Object Storage.” In this webinar, John Bell, DataCore’s Senior Technical Customer Engineer, and I take a deep dive into why object storage provides the best defense against a wide range of issues and how to identify the right object storage solution for your organization.

We will also provide an overview of why object storage is growing in popularity as a defense against malicious attacks, how replication and erasure coding work, and how object storage fits into the 3-2-1 and 3-2-1-1 backup rules.
