Data grows exponentially and ages fast, yet only a small portion of it is accessed frequently. That simple fact drives every organization to think about an effective archival strategy, which is paramount for cost-efficient long-term preservation and governance of data. Whether it is an active archive with immediate access to data or a deep archive for cold files, IT teams are tasked with finding effective yet economical storage options. In this blog, let's examine three commonly used archival options and compare their benefits and capabilities.
#1 On-premises Object Storage
This is a popular option for organizations that want to store and protect data within their own data center, under their own IT security and data governance policies. Typically, businesses either buy a turnkey object storage appliance from a leading storage supplier or build a customizable, scalable object storage cluster using software-defined storage with S3/HTTP access. Resiliency and durability are built into the architecture.
#2 Public Cloud Storage
Public cloud storage (based on S3 object storage) is attractive to organizations that prefer a third-party provider to house and manage the storage infrastructure. There is no need to worry about IT operations or about scaling capacity up and down as the business demands. Hosting providers offer various durability and availability SLAs to suit different needs. While an all-cloud approach is still not widespread, hybrid cloud adoption is growing, with data storage distributed between on-premises/private cloud and public cloud locations based on business needs.
#3 LTO Tape
Linear Tape-Open (LTO) is used for long-term data storage, particularly where offline storage is a requirement. Tape cartridges hold cold data that will not need to be retrieved in the foreseeable future. They are usually stowed away securely in offsite locations, providing an air-gapped extra layer of data protection for archives.
Comparing Archive Storage
Each of these storage options suits certain use cases. Let’s compare them across several aspects to help you decide which one best fits your specific requirements.
| Aspect | On-premises Object Storage | Public Cloud Storage | LTO Tape |
|---|---|---|---|
| Location | On-premises data center, secondary/DR sites, remote branches | Offsite – hosted in the public cloud | On-premises data center, offsite vaults |
| Data Storage Footprint for Organizations | Depends on storage capacity and data protection policy applied | Nil – as all the data is stored in the public cloud | Depends on storage capacity and data protection policy applied |
| Costs | Initial CAPEX for the appliance/hardware as needed, then simple and predictable OPEX for data services and management (possibly with software-defined storage) | No on-premises infrastructure cost, but unpredictable costs that compound annually: egress fees, the cost of retrieving archived data, WAN costs, and indirect costs associated with data migration to the cloud | Initial CAPEX for tape drive hardware. Low running costs to maintain and replace cartridges. |
| Data Security | Apply in-house security policies, encryption, authentication, etc. Granular control (at the object level) of security and compliance to meet requirements. | Managed by the hosting provider; no flexibility to apply organizational security regulations | Apply in-house security policies, encryption, authentication, etc. Granular policy definition is hard at the tape level. Once policies are set, it is hard to change the initial configuration later. |
| Data Protection | Flexible data protection policies as per requirement. Store redundant copies on-premises or copy to a secondary site/cloud. Additional self-healing capabilities protect against bit rot and data corruption. | Full reliance on the service provider to maintain high durability levels based on the SLA | Multiple backup copies can be stored on tape in different sites for recovery |
| Data Accessibility | On-demand, always-on active archive: best suited for collaborative workflows, distributed access, and content delivery | Longer access latencies and occasional loss of service (depends on the quality of Internet connectivity) | Slow load times and sequential access: not suitable for data that requires instant access |
| Searchability | Metadata-driven content management simplifies file search | Metadata-driven content management simplifies file search | Search is only possible via storage applications such as media asset managers |
| Downtime Recovery | Speedy recovery with erasure-coding segments and replicas | Dependence on cloud service provider SLAs to fix the issue and recover data | With frequent access, there is a higher risk of wear and tear. Recovery is complex. |
| Storage Management Effort | Very little effort is needed for management, especially when software-defined storage is used | Nil – it is all taken care of by the service provider, at additional cost | Management is not so simple. Manual effort is needed when changing cartridges, moving tape media between locations, etc. |
| Hardware Refresh | Nodes typically have a 5-year refresh cycle. With software-defined storage, it is easy to perform in-place migration of data and refresh hardware non-disruptively. | Managed entirely by the service provider | Significant manual intervention is needed when changing media and when aging drives need upgrading. Migrating data between different generations of LTO is also not very easy. |
| Single File & Bulk File Access | Well-suited for both | Better for a single file or a small number of files due to egress fees | Better for bulk retrievals (e.g., a production set), as stream rates are fast once the media is loaded |
As we saw in the table above, each approach has clear benefits and limitations. Ultimately, your requirements determine the right storage option, and often it is not a choice of one or the other. For companies that want to secure and safeguard their data in-house, an on-premises object storage solution would fit the bill. For other organizations, a combination of these options, with on-premises object storage as the active archive, would be ideal.
Let’s see an example where you can use a combination of these options. You can use an affordable on-premises object storage platform to offload seldom-accessed data from primary NAS/file servers. This would be your main active archive.
- Easily migrate inactive data, large media files, backups, etc. to this scalable on-premises object storage platform and quickly access any of it when needed.
- Preserve and protect this data for numerous years based on your organizational security policy and compliance mandates to store data locally.
- Then move this data to a deep archive in the public cloud or to tape for longer-term storage.
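The first step in the workflow above is deciding which files on the primary NAS or file server count as seldom accessed. Here is a minimal sketch of one way to shortlist them, using only the Python standard library. The function name `find_archive_candidates` and the 180-day idle threshold are illustrative assumptions, not part of any product; an actual migration tool would apply its own policy.

```python
import os
import time
from pathlib import Path


def find_archive_candidates(root, min_idle_days=180, now=None):
    """Return (path, size_bytes) for files idle for at least min_idle_days.

    Hypothetical helper: the threshold and the idle test are assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds in a day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Take the later of access and modification time, because
            # access times may not be updated on every read on some mounts.
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                candidates.append((str(path), info.st_size))
    return candidates
```

Summing the returned sizes gives a rough idea of how much primary capacity the archive could free up before any data is moved.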
You can create multiple copies of data (for redundancy) in the same on-premises location or in a secondary site. Because you have the flexibility to choose the storage media, you can keep your primary backup on an internal hard disk drive or a physical server and a secondary copy on removable media such as tape. You can also consider offsite DR in the cloud (cloud tiering tools come in handy here).
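When you keep several copies, it helps to confirm from time to time that they are still identical. The sketch below compares SHA-256 checksums of a primary file and its copies using Python's standard `hashlib` module. The function names `sha256_of` and `copies_match` are hypothetical, and this is only an illustration of the idea, not how any particular storage platform verifies data internally.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archive files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary_path, copy_paths):
    """Map each copy path to True if its checksum equals the primary's."""
    expected = sha256_of(primary_path)
    return {path: sha256_of(path) == expected for path in copy_paths}
```

A copy that maps to `False` has diverged from the primary and is a candidate for re-copying from a known-good replica.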
Talk to a DataCore active archive expert for guidance on the most suitable way to store and protect your data. Our Swarm software-defined object storage platform can be a cost-effective and secure choice within your data center that scales on demand and enables quick and easy access when needed.