What’s the Difference and Why it Matters
Data is the lifeblood of every modern organization. Our ability to share, store, and use it effectively is crucial to helping businesses grow, improve operational efficiency, keep customers happy, and gain a competitive edge. It’s also vital for empowering employees by giving them access to the information they need to get their jobs done. This is especially true with more of us working remotely during the current health crisis.
We all know that data is growing explosively – organizations have to buy more data storage than ever before. And that’s a big problem. However, every organization faces another big problem that affects everyone – business leaders, IT professionals and users – though it affects them in different ways: not all data is equally valuable.
Data is like cash. We treat, protect and use the cash in our wallet differently based on its value. We’re a lot more careful about how we look after and spend $100 bills than $1 bills. The same is true of data. Not all of it is equally important and, more to the point, its value changes over time – typically because of the information it contains, how often it is accessed, and even its age. Ideally, organizations should have storage platforms that are built to handle the importance of data in intelligent ways, rather than just storing bits and bytes unintelligently. That’s why data storage providers introduced the concept of “Data Temperature.”
To illustrate: there is usually a short burst of frenzied activity with newly created data, but this activity rapidly drops off over time. Typically 90% of I/O activity takes place in 10% of data storage. It is also true for most organizations that only about 20% of all data is being actively used. That leaves 80% of data just sitting there chilling. It might be used once a month, or once a year, or never again. The image below shows how data temperature equates to its value. Hot data is in active use and is most valuable to the organization. Inactive data is cold and less valuable, but you still have to store it for possible future use, which would make it hot again.
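As a rough illustration of the idea, the sketch below labels data as hot, warm, or cold based only on the time since it was last accessed. The 7-day and 90-day thresholds, the file names, and the dates are all assumptions made up for this example; real cut-offs depend on the organization.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def temperature(last_access: datetime, now: datetime) -> str:
    """Classify data as hot, warm, or cold from the time since last access."""
    idle = now - last_access
    if idle <= HOT_WINDOW:
        return "hot"
    if idle <= WARM_WINDOW:
        return "warm"
    return "cold"


# Made-up files and last-access dates.
now = datetime(2020, 6, 1)
last_access = {
    "q2-forecast.xlsx": datetime(2020, 5, 30),
    "onboarding.pdf": datetime(2020, 4, 1),
    "2015-audit.zip": datetime(2016, 1, 15),
}

labels = {name: temperature(ts, now) for name, ts in last_access.items()}
print(labels)
```

Accessing a cold file updates its last-access time, so the same function would label it hot again on the next pass.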
Note that data access need not be the only determining factor for classifying data as inactive or cold. For unstructured data, there could be other business requirements that determine when data can be deemed inactive, such as the age of the data, the cost of storing it, its protection level, compliance and so on.
Let’s look at the unstructured data world where data is more distributed and the two popular formats of data storage: file system and object storage.
What is File Storage?
File storage (aka file-based storage or file-level storage) is the type of data storage where data is stored in a hierarchical file and folder structure. A file is stored as a whole rather than being broken down into blocks, as it is in block storage. Files can be stored in folders, which can then be placed in other folders in a nested structure. The directory path of the file, including the folder it is stored in, is needed to retrieve that file from its storage location. NAS systems typically use file storage and are comparatively less expensive than block storage.
If you have a computer, you’ve used a file system. File systems contain documents, presentations, images, all the sorts of resources we move around on our desktop or store in our ‘Documents’ folder. File systems give us a hierarchical system for organization. It’s a similar approach to using a filing cabinet with the data organized into named directories, folders, subfolders and files. Applications and users know where everything is based on name and location. File systems are great for simple in and out access, provided you know the location of what you’re looking for.
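A minimal sketch of that hierarchy, using Python’s standard library and a temporary directory. The folder and file names are invented for the example; the point is that retrieval depends on knowing the full path.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Nested folders form the hierarchy: Documents/Reports/2020
    folder = Path(root) / "Documents" / "Reports" / "2020"
    folder.mkdir(parents=True)

    # Store the file as a whole inside the innermost folder.
    report = folder / "summary.txt"
    report.write_text("Q2 summary")

    # Retrieval needs the location: every folder name plus the file name.
    lookup = Path(root) / "Documents" / "Reports" / "2020" / "summary.txt"
    content = lookup.read_text()

    # The path components, relative to the top of the hierarchy.
    relative_parts = report.relative_to(root).parts

print(content)
print(relative_parts)
```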
For file storage beyond the ordinary desktop or laptop, organizations use NAS (Network Attached Storage) solutions and file servers to provide specialized and optimized file share capabilities across a network. They usually provide NFS and SMB protocol support for use in Unix, Linux, and Windows environments. These are great for file and document storage or sharing.
NAS is typically suited for file and document storage or sharing, as well as access control. But as you know from your own desktop, you’re only working on a few files at a time. Most of the files on your hard drive are cool, or cold. When the same is true of a file server or NAS, inactive data eats up capacity, and the system runs out of storage or its performance bogs down, just like your notebook. In such cases, IT organizations can consider object storage as a means to store cold (or inactive) data.
What is Object Storage?
Object storage (aka object-based storage) is a type of data storage used to handle large volumes of unstructured data where data is bundled along with metadata tags and a unique identifier. Each of these self-contained objects is placed into a flat address space, known as a storage pool. Unlike file storage, object storage does not follow a hierarchical structure. The metadata contains a description of the data, and the unique identifier, rather than a file name and file path, is used to retrieve the object. Amazon S3 is a popular cloud-based object storage option, in addition to on-premises object storage deployments.
Object storage is a more recent approach that doesn’t impose a file system on the data. Instead, metadata is used to describe all the details about the underlying data. This can include the name, creation date, location, owner and much more. Tables are used to make it possible to store, track and retrieve data based on this metadata.
This works in the same way as using a valet service at a car parking facility. Imagine millions of cars in an enormous parking lot. The valet provides a parking ticket in exchange for your car and then parks it for you. You don’t need to know where it’s parked, just that it’s safe and will be available when you need it next. It can be retrieved by the valet at any time based on the information (or metadata) on the parking ticket, no matter the size of the parking lot.
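The valet ticket idea can be sketched as a toy in-memory object store. This is an illustration only, not how any particular product is implemented; the `ObjectStore` class, its method names, and the sample metadata are hypothetical.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: a flat pool keyed by unique ID."""

    def __init__(self):
        # object ID -> (data, metadata); there are no folders or paths
        self._pool = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data with its metadata and hand back the 'parking ticket'."""
        object_id = str(uuid.uuid4())
        self._pool[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve data using only its unique identifier."""
        return self._pool[object_id][0]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self._pool.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
ticket = store.put(b"contract text", owner="legal", year=2019)
store.put(b"holiday photo", owner="marketing", year=2019)

data = store.get(ticket)                  # no path needed, just the ID
legal_ids = store.find(owner="legal")     # lookup by metadata
ids_2019 = store.find(year=2019)
```

The caller never learns where the bytes live. It keeps the identifier, or searches by metadata, which is what lets the pool grow without a directory tree to maintain.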
The advantages of object storage include low cost, massive scalability, and global access capabilities. The trade-offs include latency and performance, but these are improving over time. For users who almost never need access to old files and documents, it’s almost invisible. But for organizations that need to keep everything for regulatory compliance or legal defense, object storage is essential.
Putting the Right Data in the Right Place at the Right Time
The key take-away: Different data is worth more or less, depending on time, users, and importance. That means the most appropriate storage for any specific data will depend upon how valuable it is right now and the specific needs of applications or end-users utilizing it, or its business relevance. And that’s nearly impossible for a storage admin to determine, day by day. After all, your organization is creating millions of documents every year. Can you imagine a storage admin digging through every document, trying to decide whether it’s hot, warm, or cold, or applying different business relevance conditions manually and deciding which data is placed on which storage device?
The problem is that, up until now, we haven’t had a good way to make sure that data – whether on NAS devices or object stores – was in the right place at the right time, especially since needs change all the time, file and object platforms might come from different vendors or use different toolkits, and manual migration from one to another is a pain.
That’s where a modern software-defined storage solution like DataCore’s vFilO comes in.
- It uses AI/ML-driven auto-placement to move data to the most appropriate storage based on its access temperature. vFilO checks the heat template of data stored in a storage device and then determines whether to keep the data on a premium NAS device or move it to lower cost alternatives (such as object stores). vFilO not only looks at the access frequency of the data, but also other custom criteria based on business relevance that the storage administrator can set, such as age of file, location, resiliency, etc. This means you can balance performance, capacity, operational efficiency and cost factors. High-performance, high-cost storage can be reserved for hot data, while non-critical (or inactive) data can be migrated to low-cost storage or the cloud.
- You can tap into all the available capacity across the organization, unlocking pockets of unused storage you didn’t even know you had. That means you can delay expensive upgrades, or avoid them altogether.
- With a global namespace, it’s simple to find the data you need when you need it. All file and object data is now accessible from a central console regardless of which storage device or type it lives on. Using a metadata-driven search and find operation, vFilO accelerates the process of locating and accessing data across different types of storage devices (file or object stored on-premises or in the cloud).
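To make the idea of policy-based placement concrete, here is a simple rule-based sketch. It is not vFilO’s API or its AI/ML logic, which this article does not describe; the `FileRecord` and `PlacementPolicy` names, the thresholds, and the sample files are all hypothetical. It only shows how access recency and a business rule such as file age could be combined into a placement decision.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    days_since_access: int
    age_days: int


@dataclass
class PlacementPolicy:
    # Hypothetical thresholds an administrator might set.
    cold_after_days: int = 90          # access-based rule
    archive_after_age_days: int = 730  # age-based business rule

    def target(self, record: FileRecord) -> str:
        """Return 'nas' for active data, 'object' for inactive data."""
        if record.days_since_access > self.cold_after_days:
            return "object"
        if record.age_days > self.archive_after_age_days:
            return "object"
        return "nas"


policy = PlacementPolicy()
records = [
    FileRecord("/projects/current/design.docx", days_since_access=2, age_days=30),
    FileRecord("/projects/2017/final-report.pdf", days_since_access=400, age_days=1100),
    FileRecord("/finance/ledger-2016.xlsx", days_since_access=10, age_days=1500),
]

plan = {r.path: policy.target(r) for r in records}
for path, tier in plan.items():
    print(f"{path} -> {tier}")
```

The third file was accessed recently but still lands on object storage because of the age rule, which shows why access frequency alone may not match what the business wants.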
Why are those factors so important to business leaders, IT admins, and users right now?
- Because they enable quick and seamless access to data anytime, from anywhere, helping to drive innovation and gain a competitive advantage.
- Because you can balance and fine-tune performance, capacity, operational efficiency and cost across your entire storage landscape.
- Because they give you complete visibility and control to adapt to the radically new economic realities and even the new paradigm of a largely remote workforce.
To discuss your specific requirements, why not give DataCore a call?