Object Storage - How It Works? (2/3)
In this article, we will present the internal architecture of Scaleway Object Storage.
Scaleway just released its Block Storage in public beta and it is a great opportunity for us to explain the main differences between Block, File and Object storage.
Block storage is a technology that provides an abstraction over a low-level storage device. Its main advantage is low-latency operations. Ordering a block volume is like ordering a virtual hard drive that can be plugged into or unplugged from a cloud instance. As a user, you treat this block device as a regular disk. When you plug it in, the operating system detects it as a raw disk. You then format it to create a file system on it (ext4, XFS, NTFS…) and start using it as a regular device on which you can store your data.
In the background, a block device is managed by the cluster as a collection of smaller pieces (called chunks or simply blocks, hence the name). Each chunk is stored under a unique address and can live on any of the machines that make up the storage cluster. In cloud block storage in particular, those chunks are replicated to avoid data loss if a storage medium fails.
When the operating system of your instance asks for a particular file, it sends a request to the block device on which the file is stored. This request is then translated into a block storage system request, which delivers the data to your operating system just as a real hard drive would.
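To make the chunk translation more concrete, here is a minimal sketch of how a read request on a volume could be split into per-chunk requests. The chunk size and the names `ChunkRange` and `split_request` are assumptions made for illustration; they do not describe the actual implementation of any block storage product.

```python
from dataclasses import dataclass

# Hypothetical chunk size (4 MiB); real systems choose their own value.
CHUNK_SIZE = 4 * 1024 * 1024


@dataclass(frozen=True)
class ChunkRange:
    chunk_index: int      # which chunk of the volume holds the bytes
    offset_in_chunk: int  # where the bytes start inside that chunk
    length: int           # how many bytes to read from that chunk


def split_request(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a byte range of the volume into one range per chunk it touches."""
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be >= 0, chunk_size must be > 0")
    ranges = []
    position, remaining = offset, length
    while remaining > 0:
        index = position // chunk_size
        inner = position % chunk_size
        # Never read past the end of the current chunk.
        take = min(chunk_size - inner, remaining)
        ranges.append(ChunkRange(index, inner, take))
        position += take
        remaining -= take
    return ranges
```

Each `ChunkRange` could then be served by whichever machine holds that chunk (or one of its replicas), which is why the volume as a whole does not depend on a single disk.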
This makes block storage ideal for latency-critical applications such as virtual machine storage and transactional databases. It is also well suited to business-critical applications, as data is stored redundantly across multiple physical disks and nodes. If a disk fails, the missing blocks can easily be recovered from other disks in the cluster. Block storage also provides consistent performance regardless of the amount of data stored, unlike file storage, which may suffer performance issues once a certain number of files is reached. In addition, the block device is accessed over a network, making it easy to detach a volume from server A and attach it to server B inside the same availability zone. Block storage is convenient when using products such as Kubernetes or Database.
Block Storage Pros: | Block Storage Cons: |
---|---|
Adequate for applications that require high performance and optimized I/O | Block storage volumes are allocated with a fixed size. You may end up paying for unused storage. |
Highly redundant. The data is redundant across the volume, so if a disk fails, the data can be easily recovered without any impact on your applications. | |
File storage is a solution that stores data as files and presents it to its final users as a hierarchical directory structure. The main advantage is to provide a user-friendly way to store and retrieve files. To locate a file in file storage, the complete path of the file is required. For instance: `/home/myuser/myphotos/summer2019/italy/beach.webp`.
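As an illustration of why the complete path is required, the sketch below walks a nested directory tree one path component at a time. The `resolve` function and the in-memory `tree` are hypothetical stand-ins for a real file system, not how any particular file system is implemented.

```python
def resolve(root: dict, path: str):
    """Follow each component of an absolute path through nested directories."""
    node = root
    for part in (p for p in path.split("/") if p):
        # Every intermediate component must be an existing directory.
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node


# Directories are modeled as dicts and files as bytes (illustrative only).
tree = {
    "home": {
        "myuser": {
            "myphotos": {"summer2019": {"italy": {"beach.webp": b"image bytes"}}}
        }
    }
}

photo = resolve(tree, "/home/myuser/myphotos/summer2019/italy/beach.webp")
```

Every lookup traverses the whole hierarchy, so a missing or renamed directory anywhere along the path makes the file unreachable under its old name.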
File storage is how final users and many applications interact with a storage solution. Data can be stored either on a local hard drive or on a network-attached storage (NAS) device.
This type of storage system supports a range of file access management features, such as ownership and permissions across a set of authenticated users. It also supports multiple concurrent writes. Several users can mount the same file storage and edit it concurrently.
However, file storage has a number of drawbacks when it comes to scalability. It is limited in the number of files it can serve efficiently, and as the amount of data we need to manage keeps growing, retrieving files can become a burdensome and time-consuming task. Expanding the storage capacity requires careful management of the underlying storage medium, which can be problematic in the case of a NAS with limited slots.
Cloud solutions exist to solve the physical constraints of file storage. These services allow users to store their files on servers in a remote datacenter (the cloud) and make them available through a network connection. Multiple users can simultaneously access their files while the cloud provider manages the physical devices storing the data.
File Storage Pros: | File Storage Cons: |
---|---|
Accessible to multiple runtimes. A single file share can be accessed by multiple servers at once. | Performance affected by network traffic. |
Simultaneous reads and writes without worrying about your data being overwritten. | Performance suffers beyond a certain capacity |
| | Limited set of metadata |
Object storage is one of the most recent storage systems. It was created in the cloud computing industry to meet the requirement of storing vast amounts of unstructured data. Instead of using file paths, data is stored as immutable objects addressed by a key.
These objects can be log files, HTML websites, images, documents, or any other kind of data. As there is no specific schema to follow, these objects are called unstructured. Data objects include an ID (instead of a file name and a file path), metadata (e.g., authors of the file, permissions set on the file, creation date, etc.) and unstructured data (e.g., images, videos, website backups, etc.). The metadata is entirely customizable, which allows you to add more information to each piece of data.
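The object model described above (a key, customizable metadata and an opaque blob of data in a flat address space) can be sketched with a small in-memory store. `StoredObject` and `ObjectStore` are hypothetical names used only for illustration; a real service distributes and replicates objects across many machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class StoredObject:
    key: str        # the object ID; slashes are plain characters, not directories
    data: bytes     # the unstructured payload, opaque to the store
    metadata: dict  # free-form, user-defined key/value pairs


class ObjectStore:
    """A flat key -> object mapping with no directory hierarchy."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes, metadata: dict = None) -> StoredObject:
        # There is no partial update: writing a key replaces the whole object.
        obj = StoredObject(key, bytes(data), dict(metadata or {}))
        self._objects[key] = obj
        return obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]


store = ObjectStore()
store.put("photos/summer2019/italy/beach.webp", b"image bytes",
          {"author": "alice", "camera": "phone"})
```

Note that changing a single byte of the photo means calling `put` again with the full content, which is the trade-off behind the immutability of objects.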
Access to the objects and their metadata is done using a standard HTTP API, which is one of the reasons object storage became a massive success with all kinds of developers, and web developers in particular. Storing and retrieving objects using standard HTTP requests made it easy to develop libraries for almost all programming languages. Most object storage service providers also allow objects to be accessed through a public link, making it possible to host the static assets of a website on the object storage service instead of using a dedicated web server for this task.
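As a rough sketch of what such an HTTP request can look like, the snippet below builds (but does not send) a `PUT` request for one object using only the Python standard library. The endpoint, bucket and key are hypothetical, and the `x-amz-meta-` header prefix assumes an S3-style API for user metadata; a real request would also need to be authenticated according to the provider's documentation.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint, used for illustration only.
ENDPOINT = "https://objects.example.com"


def build_put_request(bucket: str, key: str, data: bytes, metadata: dict) -> Request:
    """Build an unsigned PUT request that would upload one whole object."""
    headers = {"Content-Type": "application/octet-stream"}
    for name, value in metadata.items():
        # Assumption: user metadata is sent as S3-style x-amz-meta-* headers.
        headers[f"x-amz-meta-{name}"] = value
    # quote() keeps "/" by default, so the key appears as-is in the URL path.
    url = f"{ENDPOINT}/{bucket}/{quote(key)}"
    return Request(url, data=data, headers=headers, method="PUT")


request = build_put_request(
    "my-bucket", "photos/beach.webp", b"image bytes", {"author": "alice"}
)
```

Because the whole exchange is a plain HTTP request, the same operation can be reproduced from almost any language or tool that speaks HTTP.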
Object Storage Pros: | Object Storage Cons: |
---|---|
Massively scalable | Object Storage does not allow the modification of part of a data blob: each object must be read and written completely, which may lead to performance issues |
Customizable metadata and flat address space | Higher latency than block storage |
Easily accessible via HTTP requests | |
Billed per usage, no fixed costs or very low entry fee | |
Object Storage is the ideal solution for storing large amounts of data that is not being altered once stored and where latency is secondary. For example, it can be used to provide storage for file-sharing services, backups or personal data storage like photos or videos.
This series of articles starts with a broad description of the Object Storage technology currently in production at Scaleway.
In this article, we will go through the infrastructure design on which our object storage service runs. The first challenge was to find the right balance between the network, CPUs and IOPS.