Object Storage - What Is It? (1/3)
In this series of articles, we will start with a wide description of the Object Storage technology currently in production at Scaleway.
I’m Constance, Product Marketing Manager of Block Storage. Since arriving at Scaleway, I have realized that users are investing more and more in storage capacity, which has led us to accelerate the development of new Storage solutions to fit as many of our clients’ needs as possible.
I have come to two conclusions:
How can startups, companies and individuals make sure their cloud storage usage is optimized and that every cent invested in storage is worth it?
Because, believe it or not, I've seen impressively well-built infrastructures in which storage was a real lever for better performance, cost efficiency, security and compliance.
Keep reading to find our guide to Scaleway Storage solutions, how they differ, and discover which solution is best-suited to your cloud usage. And a big thank you to Franck Pagny, Scaleway’s PM on Serverless Database, Marie Debard, Scaleway’s PM on Object Storage, and Thomas Deschamps PM on Block Storage who helped to build this guide!
Object Storage allows you to store large amounts of unstructured data (documents, images, videos, etc.) and to distribute them instantly, anywhere in the world.
Object Storage stores data as distinct units, or "objects". Each object has a unique identifier, and is bundled with highly customizable metadata, making it easier for you to control the way that you upload, download, access and analyze your stored data. Objects are stored in a flat address space, with no file paths, and kept in secure and scalable buckets, making it easier to locate and retrieve your data across regions.
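To make the "flat address space" idea concrete, here is a minimal in-memory sketch. It is only an illustration of the concept, not how Scaleway implements Object Storage: the `Bucket` and `StoredObject` names, the keys, and the metadata fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                       # unique identifier inside the bucket
    data: bytes                                    # the object's content
    metadata: dict = field(default_factory=dict)   # custom key/value metadata


class Bucket:
    """Toy bucket: a flat mapping from key to object (hypothetical, for illustration)."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # A "/" in a key is just a character: a prefix only filters the flat namespace.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket("my-bucket")
bucket.put("photos/2022/beach.jpeg", b"...", content_type="image/jpeg")
bucket.put("logs/app.log", b"...", content_type="text/plain")
```

Listing with the prefix `"photos/"` returns only the first key, even though no directory called `photos` exists anywhere in the bucket.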
Object Storage is a solution for any organization which needs to store a massive and growing amount of unstructured data in a scalable, efficient, and affordable way. Common use cases include:
Applications & websites
Great when content needs to be highly available and highly durable, such as for streaming videos and serving images, documents and other website files.
Data lakes
Object storage allows you to centralize a vast amount of data in its native format. Advantages like unlimited volume, low price and high scalability make object storage the go-to option to build a data lake.
Machine learning
Training models often requires storing a large amount of data, and Object Storage is a great solution for this.
It is essential to choose the right storage for your intended usage.
Scaleway Object Storage offers three different storage classes whose price and performance vary, making it easy to optimize storage costs according to your needs.
You can also leverage storage with microservices. Data stored in Object storage needs to be processed, and this usually means provisioning and configuring VMs, managing load balancers, and tweaking autoscaling rules. This is where Serverless comes in: Serverless functions free you from configuring and managing infrastructure and let you focus on your data. You can schedule automated transformation of all *.jpg or *.png images stored in an Object Storage bucket.
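As a rough sketch of the image example above, the selection step of such a function could look like the following. The handler name, its signature, and the shape of the `event` payload are assumptions made for illustration, not the actual Scaleway Serverless Functions interface, and the transformation itself is left out.

```python
from fnmatch import fnmatch

IMAGE_PATTERNS = ("*.jpg", "*.png")


def is_image(key: str) -> bool:
    """Return True if the object key matches one of the image patterns."""
    lowered = key.lower()  # treat ".JPG" and ".jpg" the same way
    return any(fnmatch(lowered, pattern) for pattern in IMAGE_PATTERNS)


def handle(event: dict, context: object = None) -> dict:
    # Hypothetical event shape: {"keys": ["path/to/object", ...]}
    keys = event.get("keys", [])
    selected = [key for key in keys if is_image(key)]
    # A real function would now download each selected object,
    # transform it, and upload the result back to a bucket.
    return {"selected": selected, "skipped": len(keys) - len(selected)}
```

Keeping the filter separate from the handler makes it easy to test locally before wiring the function to a schedule.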
Cold Storage is an Object Storage class that is used to store “cold data”, the opposite of “hot data”. Hot data needs to be easily and quickly accessible as it is accessed and used very frequently. Cold storage, on the contrary, represents data that isn’t used frequently and thus doesn’t need fast access.
Scaleway’s cold storage service, Glacier, is engineered on specific hardware:
An object stored in the Glacier class is listed for you to see, but cannot be downloaded instantly. It needs to be restored to the Standard class first. It can take anywhere from a few seconds to 24 hours to retrieve the first byte of an average-sized file. To facilitate restoration and ensure fast retrieval of your data, we recommend using average-sized files (larger than 1 MB).
Cold Storage use cases center around deep archiving: storing data that you need to keep for regulatory purposes but don’t need to access frequently or quickly.
Backups and archiving
Use S3-compatible lifecycle management features and versioning to automatically archive data such as logs or backups to Scaleway Glacier after a certain period of time. Benefit from a lower price, and retrieve your data when needed.
Legal archive
Archive highly restricted legal documents required by the law such as contracts, accounting data, administrative documents or access logs.
Stay compliant with GDPR and other local regulations while limiting your budget.
Now that you understand what Scaleway Glacier is designed for, let’s have a closer look at how Scaleway Glacier is perfect for cost optimization.
The amount of data you need to store inevitably grows over time, and some of this will include data that you use and access very infrequently or not at all. But that doesn’t mean that this largely unused data is useless, and this is where Glacier comes in. Thanks to lifecycle rule management, you will be able to reduce your storage costs by “freezing” infrequently accessed data in cold storage. This will represent an approximate 56% saving on your object storage budget (by moving 70% of your data to Glacier).
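The arithmetic behind that figure can be sketched as follows. The 20% price ratio used here is not a published price: it is simply the value implied by the two numbers above (70% moved, 56% saved), and the model ignores any other costs such as restore operations.

```python
def storage_saving(moved_fraction: float, cold_to_hot_price_ratio: float) -> float:
    """Fraction of the storage bill saved when `moved_fraction` of the data
    moves to a class that costs `cold_to_hot_price_ratio` times the hot price."""
    return moved_fraction * (1.0 - cold_to_hot_price_ratio)


def implied_price_ratio(moved_fraction: float, saving: float) -> float:
    """Price ratio that would produce a given saving (inverse of the above)."""
    return 1.0 - saving / moved_fraction


# Assumption: a cold class at about 20% of the hot price, inferred from the article's figures.
saving = storage_saving(moved_fraction=0.70, cold_to_hot_price_ratio=0.20)
ratio = implied_price_ratio(moved_fraction=0.70, saving=0.56)
```

With your own split between hot and cold data, the same function gives a first estimate of what lifecycle rules could save.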
In addition, moving data from Object Storage “hot” classes to Glacier class (or vice versa) is free of charge.
On top of this, you can also set an expiration date for your data so that it is deleted after this time, to help you keep your budget even further under control.
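A lifecycle rule combining both ideas (transition to Glacier, then expiration) might be built like this. The structure follows the S3 lifecycle configuration format. The prefix, the day counts, and the helper name are hypothetical, and the exact options supported should be checked against Scaleway's documentation.

```python
def build_lifecycle_configuration(prefix: str, days_to_glacier: int, days_to_expire: int) -> dict:
    """Build an S3-style lifecycle configuration with one rule (illustrative helper)."""
    if days_to_expire <= days_to_glacier:
        raise ValueError("objects must expire after they transition to Glacier")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},       # only objects under this key prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": days_to_expire},  # delete after this many days
            }
        ]
    }


# Hypothetical values: archive logs after 30 days, delete them after one year.
config = build_lifecycle_configuration("logs/", days_to_glacier=30, days_to_expire=365)

# With an S3-compatible client such as boto3, the configuration would be applied with:
#   client.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
# The bucket name and the client setup (endpoint, credentials) are assumptions.
```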
Scaleway Block Storage provides network-attached storage (NAS) that can be plugged in and out of cloud products such as Instances like a virtual hard-drive. Block Storage devices are independent from the local storage of Instances, and the fact that they are accessed over a network connection makes it easy to move them between Instances in the same Availability Zone. From the user’s point of view, once mounted, the block device behaves like a regular disk.
When you create a block volume attached to an Instance, the operating system detects it as a raw disk.
In the background, a block device is managed by our Ceph cluster as a collection of smaller pieces (called chunks or blocks). Each of these chunks is replicated 3 times to avoid data loss in the event of a storage medium failure. So when you provision a certain amount of Block Storage inside the data center, we provision three times this amount on multiple devices to ensure your data resiliency.
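The effect of triple replication can be sketched in a few lines. The round-robin placement below is a deliberately naive stand-in chosen for readability (Ceph uses its own placement algorithm, CRUSH), and the device names are hypothetical.

```python
REPLICAS = 3  # from the text above: each chunk is replicated 3 times


def raw_capacity_gb(provisioned_gb: float, replicas: int = REPLICAS) -> float:
    """Raw capacity consumed in the cluster for a given provisioned volume size."""
    return provisioned_gb * replicas


def place_replicas(chunk_id: int, devices: list, replicas: int = REPLICAS) -> list:
    """Pick `replicas` distinct devices for one chunk (naive round-robin placement)."""
    if len(devices) < replicas:
        raise ValueError("not enough devices to keep replicas on distinct media")
    start = chunk_id % len(devices)
    return [devices[(start + i) % len(devices)] for i in range(replicas)]


devices = ["disk-0", "disk-1", "disk-2", "disk-3", "disk-4"]  # hypothetical devices
placement = place_replicas(chunk_id=4, devices=devices)
```

Because every chunk lands on three different devices, losing any single device still leaves two copies of each chunk it held.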
Block Storage at Scaleway consists of scalable and persistent SSD disks storing data on your virtual machines (VMs), making it easy to transfer to other VMs or to reinstall quickly on new VMs when you restart your machines.
Due to its replication, high availability and great performance, Block Storage is more expensive than other types of storage. However, you can nonetheless follow some simple rules of thumb to keep your costs low:
1. Start with a reasonable volume size. Because Block Storage can be increased without any downtime, you don’t have to choose a huge amount of storage upfront: you can expand it later when needed.
2. If you want to store files for which you don’t need fast access, Object Storage might be a better, more cost-effective solution. In this case, your files will still benefit from high availability and resilience, but you won’t pay for speed you don’t need.
3. Snapshots are full-volume copies of your Instance, and you can snapshot your block volume whenever you want. To make it cost-effective, you can transfer those snapshots to Object Storage!
File storage stores data as files and presents it to the user in a hierarchical directory structure. To access a stored file on a file storage system, you must use the specific path to where the file is located, such as
/home/myuser/myphotos/christmas2022/CostaRica/beach.jpeg.
Data can either be stored on a local computer hard drive or on a network-attached storage (NAS) device.
This storage system supports a range of file access management features, such as ownership and permissions across a set of authenticated users. It also supports multiple concurrent writes and ensures high data availability.
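The hierarchy encoded in the example path above can be made visible with a short sketch. It only manipulates the path as text using the standard library, so no such file needs to exist.

```python
from pathlib import PurePosixPath

# The example path used earlier in this article.
path = PurePosixPath("/home/myuser/myphotos/christmas2022/CostaRica/beach.jpeg")


def directories(p: PurePosixPath) -> list:
    """Return every directory that must be traversed to reach the file, root first."""
    return [str(parent) for parent in list(p.parents)[::-1]]


levels = directories(path)
file_name = path.name        # last component of the path
extension = path.suffix      # file extension, including the dot
```

Each entry in `levels` is a directory in its own right, with its own ownership and permissions, which is what distinguishes this model from the flat keys used by Object Storage.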
File storage solutions usually avoid relying only on a local computer hard drive and instead use protocols such as Network File System (NFS), RADOS (used in CephFS) or GlusterFS, which create virtual filesystems on a file storage server and expose them to clients. Clients can see and interact with these virtual filesystems exactly like a typical filesystem. As these virtual filesystems are stored on file storage servers and not locally, they can be shared by multiple clients simultaneously. Behind the scenes, servers implementing file storage will handle data replication and/or distribution among different physical nodes to ensure high availability and durability.
File storage can be used in a broad range of applications. For example:
Applications & website content
When hosting static files with tools such as Content Management Systems (e.g. WordPress, Drupal or Joomla), you can benefit from file storage to store images or video content. This enables the application to be containerized and scaled without having static storage as a bottleneck.
Data processing
Processing a large amount of data to perform operations such as data transformation or machine learning algorithms can require low storage latency to achieve higher performance. If data volumes and concurrent access stay reasonable, file storage can perform better than Object Storage for some use cases.
Drive or file system transfer
Providing personal cloud drives for end users to store their documents, images, or videos online is also a typical use case for file storage. Indeed, file storage’s hierarchical structure is already familiar to most end users who want to organize their files, and it also ensures high data availability.
File storage enables a good balance between latency requirements and scalability, despite generally being more expensive than Block or Object Storage per GB stored. File storage is mainly designed to scale data storage and concurrent access up to a certain point, while keeping the well-known file structure hierarchy.
Furthermore, as many solutions have used a file system structure over the last decades, it enables many legacy applications to use it with no or little migration effort required. In such situations, file storage can be a good asset to limit investment in application refactoring while still improving performance and limiting maintenance costs.
As with other storage systems, file storage cost optimization relies on assessing data access frequency and durability needs. Choosing a storage type adapted to these needs (hot/cold, single or multiple Availability Zone redundancy, etc.) can reduce costs drastically.
In short, if you need compute power to build, test and launch an application, you’ll provision Instances, such as PLAY2, which need Block Storage, as a minimum requirement.
If you are a professional wedding photographer and you keep your clients’ photos for several years before erasing them, you’ll need cold storage, such as Scaleway Glacier to store 8K ultra-high-definition footage or raw image data sets that don't require frequent access.
If you're an e-commerce company running OLTP (Online Transaction Processing) workloads, you need to deploy and scale your PostgreSQL database seasonally. To do this, you’ll have to provide high-availability block storage to guarantee your data redundancy and integrity.
I hope this article helped you understand some of the storage possibilities available, and that you learnt some tips along the way. There is no secret recipe to building your perfectly optimized infrastructure just yet, but if you start by identifying your needs, and take a step back on your options, you are going in the right direction.
If this article made you want to ask a million questions, reach out to us on our Slack Community!