Object Storages

Thu 10 March 2016

An object is data (typically a file) bundled together with all of its metadata. The object is given an ID that is typically calculated from its content (both file and metadata). An application always retrieves an object by presenting that ID to the object store. Unlike files in a file system, objects are stored in a flat structure: you have a pool of objects, and you ask for a given object by its ID. Objects may be local or geographically separated, but because they share a flat address space, they are retrieved in exactly the same way.

An object is not limited to any type or amount of metadata. If you choose to, you can assign metadata such as the type of application the object is associated with; the importance of that application; the level of data protection you want for the object; whether to replicate it to another site or sites; when to move it to a different tier of storage or a different geography; and when to delete it. This goes well beyond the access control lists used in file systems. The freedom to define metadata as you wish is a distinguishing feature of object storage, and it opens up opportunities for analytics that were impractical before.

Given these properties, performance is not necessarily a hallmark of object storage. But if you want a simple way to manage storage and a service that spans geographies and provides rich, user-definable metadata, object storage is the way to go.
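The content-derived ID and the flat pool described above can be illustrated with a minimal in-memory sketch. The class and function names are hypothetical, and the choice of SHA-256 over the data plus JSON-serialized metadata is an assumption made for illustration, not how any particular product computes its IDs.

```python
import hashlib
import json


def object_id(data: bytes, metadata: dict) -> str:
    """Derive an ID from content and metadata (SHA-256 is an assumption)."""
    # Serialize metadata deterministically so equal metadata hashes equally.
    meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data + b"\x00" + meta_bytes).hexdigest()


class FlatObjectStore:
    """Hypothetical in-memory pool: no directories, only ID -> object."""

    def __init__(self):
        self._pool = {}

    def put(self, data: bytes, metadata: dict) -> str:
        oid = object_id(data, metadata)
        self._pool[oid] = (data, dict(metadata))
        return oid

    def get(self, oid: str):
        # Retrieval is always by ID, wherever the object physically lives.
        return self._pool[oid]


store = FlatObjectStore()
oid = store.put(b"hello", {"app": "demo", "replicate": True})
data, meta = store.get(oid)
```

Because the ID depends only on content and metadata, storing the same pair twice yields the same ID, and changing either one yields a different object.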

There is another characteristic of object storage that is critical to understanding its nature. Unlike a file or a block, an object is accessed through an HTTP-based REST application programming interface (API). These are simple calls such as Get, Put, Delete and a few others. Their simplicity is an advantage, but they do require changes to applications that were probably written to use SCSI, CIFS or NFS calls. Therein lies the problem. There are ways around this, but the cleanest approach is to change the application code to make direct REST-based calls. So, in a nutshell, an object store is easy to manage, can scale almost infinitely, transcends geographic boundaries in a single namespace and can carry a ton of metadata, but it is generally lower-performance and may require changes to the application code.

RESTful service

RESTful APIs explicitly take advantage of the HTTP methods defined in RFC 2616 (HTTP/1.1). They use "PUT" to change the state of or update a resource, which can be an object, file or block; "GET" to retrieve a resource; "POST" to create a resource; and "DELETE" to remove it. In practice PUT can also create a resource at a URI the client already knows, which is how Amazon S3 uploads objects.
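To make the verb-to-operation mapping concrete, here is a small in-memory sketch of a resource handler. It does no networking; `ResourceStore` and its `handle` method are hypothetical names, and the status codes follow one common convention (POST creates under a collection, PUT creates or replaces at a known path) rather than any specific service.

```python
class ResourceStore:
    """Hypothetical in-memory resources addressed by path."""

    def __init__(self):
        self._resources = {}
        self._next_id = 1

    def handle(self, method: str, path: str, body: bytes = b""):
        """Return (status_code, response_body) for one request."""
        if method == "GET":
            if path not in self._resources:
                return 404, b""
            return 200, self._resources[path]
        if method == "POST":
            # Create a new resource under the collection; the server picks the ID.
            new_path = f"{path}/{self._next_id}"
            self._next_id += 1
            self._resources[new_path] = body
            return 201, new_path.encode("ascii")
        if method == "PUT":
            # Create or replace the resource at exactly this path.
            created = path not in self._resources
            self._resources[path] = body
            return (201 if created else 200), b""
        if method == "DELETE":
            if self._resources.pop(path, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""


svc = ResourceStore()
status, location = svc.handle("POST", "/objects", b"v1")
```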

Working with Amazon S3 Objects

Amazon S3 is a simple key-value store designed to store as many objects as you want. You store these objects in one or more buckets. An object consists of the following:

  • Key: The name that you assign to an object. You use the object key to retrieve the object.
  • Version ID: Within a bucket, a key and version ID uniquely identify an object. The version ID is a string that Amazon S3 generates when you add an object to a bucket. For more information, see Object Versioning.
  • Value: The content that you are storing. An object value can be any sequence of bytes. Objects can range in size from zero to 5 TB.
  • Metadata: A set of name-value pairs with which you can store information regarding the object. You can assign metadata, referred to as user-defined metadata, to your objects in Amazon S3. Amazon S3 also assigns system-metadata to these objects, which it uses for managing objects.
  • Subresources: Amazon S3 uses the subresource mechanism to store object-specific additional information. Because subresources are subordinates to objects, they are always associated with some other entity such as an object or a bucket.
  • Access Control Information: You can control access to the objects you store in Amazon S3. Amazon S3 supports both the resource-based access control, such as an Access Control List (ACL) and bucket policies, and user-based access control.
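The parts of an object listed above can be pictured as a single record. The following is a hypothetical model for illustration only, not an AWS SDK type; it leaves out subresources and access control, treats "5 TB" as 5 TiB, and uses the string "null" as the version ID of an unversioned object, all of which are assumptions. The 1024-byte key limit is the one this article states for object keys.

```python
from dataclasses import dataclass, field

MAX_KEY_BYTES = 1024            # key length limit, measured in UTF-8 bytes
MAX_VALUE_BYTES = 5 * 1024**4   # "5 TB", read here as 5 TiB (an assumption)


@dataclass
class StoredObject:
    """Hypothetical model of one stored object."""
    key: str
    value: bytes = b""
    version_id: str = "null"    # placeholder when versioning is off (assumption)
    user_metadata: dict = field(default_factory=dict)
    system_metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        if len(self.key.encode("utf-8")) > MAX_KEY_BYTES:
            raise ValueError("key exceeds 1024 bytes in UTF-8")
        if len(self.value) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the maximum object size")
        # System metadata is maintained by the store, not by the caller.
        self.system_metadata["size"] = len(self.value)


obj = StoredObject(key="photos/sample.jpg", value=b"\xff\xd8",
                   user_metadata={"camera": "demo"})
# Within a bucket, the (key, version ID) pair identifies the object.
bucket = {(obj.key, obj.version_id): obj}
```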

Amazon S3 REST API

The Amazon S3 REST API is reviewed here as an example of working with object storage over REST. Authentication is out of scope, and only basic operations are covered; for more detail, see the Amazon Simple Storage Service REST API documentation.

  • Operations on the Service:

    • GET Service: returns a list of all buckets owned by the authenticated sender of the request.
  • Operations on Buckets:

    • DELETE Bucket: deletes the bucket named in the URI. All objects (including all object versions and delete markers) in the bucket must be deleted before the bucket itself can be deleted.
    • GET Bucket (List Objects): returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket.
    • HEAD Bucket: useful to determine if a bucket exists and you have permission to access it.
    • PUT Bucket: creates a new bucket.
  • Operations on Objects:

    • DELETE Object: removes the null version (if there is one) of an object and inserts a delete marker, which becomes the current version of the object. If there isn't a null version, Amazon S3 does not remove any objects.
    • GET Object: retrieves objects from Amazon S3. An Amazon S3 bucket has no directory hierarchy such as you would find in a typical computer file system. You can, however, create a logical hierarchy by using object key names that imply a folder structure. For example, instead of naming an object sample.jpg, you can name it photos/2006/February/sample.jpg.
    • HEAD Object: retrieves metadata from an object without returning the object itself. This operation is useful if you are interested only in an object's metadata.
    • OPTIONS Object: A browser can send this preflight request to Amazon S3 to determine if it can send an actual request with the specific origin, HTTP method, and headers.
    • POST Object: adds an object to a specified bucket using HTML forms. POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body. Amazon S3 never stores partial objects: if you receive a successful response, you can be confident the entire object was stored.
    • PUT Object: adds an object to a bucket.
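The operations listed above each correspond to an HTTP method applied to the service root, a bucket, or an object. The sketch below builds only the request line and assumes path-style addressing (`/bucket/key`) for simplicity. The `OPERATIONS` table and `request_line` helper are hypothetical, and a real request would also need a Host header and authentication, which are not covered here.

```python
from urllib.parse import quote

# Operation name -> (HTTP method, path template); path-style is an assumption.
OPERATIONS = {
    "GET Service":    ("GET",     "/"),
    "PUT Bucket":     ("PUT",     "/{bucket}"),
    "GET Bucket":     ("GET",     "/{bucket}"),
    "HEAD Bucket":    ("HEAD",    "/{bucket}"),
    "DELETE Bucket":  ("DELETE",  "/{bucket}"),
    "PUT Object":     ("PUT",     "/{bucket}/{key}"),
    "POST Object":    ("POST",    "/{bucket}"),   # key travels as a form field
    "GET Object":     ("GET",     "/{bucket}/{key}"),
    "HEAD Object":    ("HEAD",    "/{bucket}/{key}"),
    "OPTIONS Object": ("OPTIONS", "/{bucket}/{key}"),
    "DELETE Object":  ("DELETE",  "/{bucket}/{key}"),
}


def request_line(operation: str, bucket: str = "", key: str = "") -> str:
    """Return 'METHOD path' for one of the operations above."""
    method, template = OPERATIONS[operation]
    # Percent-encode the key but keep "/" so folder-like names stay readable.
    path = template.format(bucket=bucket, key=quote(key, safe="/"))
    return f"{method} {path}"


line = request_line("GET Object", "my-bucket", "photos/2006/February/sample.jpg")
```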

Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket. Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead.

To ensure that data is not corrupted traversing the network, use the Content-MD5 header. When you use this header, Amazon S3 checks the object against the provided MD5 value and, if they do not match, returns an error. Additionally, you can calculate the MD5 while putting an object to Amazon S3 and compare the returned ETag to the calculated MD5 value.
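As a sketch of the integrity check just described: the Content-MD5 header carries the base64 encoding of the binary MD5 digest, while the ETag of a simple (non-multipart) upload is typically the same digest in hexadecimal, wrapped in quotes. The helper names below are hypothetical, and the ETag comparison is an assumption that does not hold for multipart uploads.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Value for the Content-MD5 request header: base64 of the raw digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def etag_matches(body: bytes, etag: str) -> bool:
    """Compare a returned ETag with the locally computed hex MD5.

    Assumes a simple upload; multipart ETags are not plain MD5 values.
    """
    return etag.strip('"') == hashlib.md5(body).hexdigest()


body = b"hello"
headers = {"Content-MD5": content_md5(body)}
```

A client would send `headers` with the PUT request and, on success, pass the ETag response header to `etag_matches` as a second check.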

Object Key and Metadata

Each Amazon S3 object has data, a key, and metadata. The object key (or key name) uniquely identifies the object in a bucket. Object metadata is a set of name-value pairs. You can set object metadata at the time you upload the object. After the upload, you cannot modify the metadata in place; the only way to change it is to make a copy of the object and set the metadata on the copy.
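The copy-to-change-metadata approach can be sketched as the headers of a copy request that names the object itself as the source and asks for the metadata to be replaced. The function name is hypothetical and authentication is omitted; `x-amz-copy-source` and `x-amz-metadata-directive` are the S3 copy headers as I understand them, so check them against the REST API reference before relying on this.

```python
from urllib.parse import quote


def replace_metadata_headers(bucket: str, key: str, metadata: dict) -> dict:
    """Headers for a PUT to /bucket/key that copies the object onto itself."""
    headers = {
        # The source is the same object that the request targets.
        "x-amz-copy-source": "/" + bucket + "/" + quote(key, safe="/"),
        # REPLACE discards the old user metadata instead of copying it.
        "x-amz-metadata-directive": "REPLACE",
    }
    for name, value in metadata.items():
        headers["x-amz-meta-" + name.lower()] = str(value)
    return headers


copy_headers = replace_metadata_headers(
    "my-bucket", "photos/sample.jpg", {"reviewed": "yes"})
```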

Object Keys

When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long. The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using keyname prefixes and delimiters as the Amazon S3 console does.
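How a logical hierarchy can be inferred from flat key names is easiest to see in code. This is a simplified sketch of prefix-and-delimiter grouping over a plain list of keys; `list_keys` is a hypothetical helper, not an S3 call, and it ignores paging and the 1000-key limit.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Split matching keys into direct entries and folder-like common prefixes."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter acts like a subfolder name.
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = [
    "photos/2006/February/sample.jpg",
    "photos/2006/January/a.jpg",
    "photos/index.html",
    "readme.txt",
]
top_level = list_keys(keys, prefix="", delimiter="/")
inside_photos = list_keys(keys, prefix="photos/", delimiter="/")
```

The bucket itself still holds four flat keys; the "folders" exist only in how the listing groups them.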

Object Metadata

There are two kinds of metadata: system metadata and user-defined metadata.

  • System-Defined Metadata. For each object stored in a bucket, Amazon S3 maintains a set of system metadata. Amazon S3 processes this system metadata as needed. For example, Amazon S3 maintains object creation date and size metadata and uses this information as part of object management.
  • User-Defined Metadata. When uploading an object, you can also assign metadata to the object. You provide this optional information as a name-value pair when you send a PUT or POST request to create the object. When uploading objects using the REST API the optional user-defined metadata names must begin with "x-amz-meta-" to distinguish them from other HTTP headers. When you retrieve the object using the REST API, this prefix is returned. When uploading objects using the SOAP API, the prefix is not required. When you retrieve the object using the SOAP API, the prefix is removed, regardless of which API you used to upload the object.
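The REST naming rule for user-defined metadata can be sketched as a pair of helpers: one adds the "x-amz-meta-" prefix when building upload headers, the other picks the prefixed headers back out of a response. Both function names are hypothetical, and stripping the prefix on the way back is a client-side convenience assumed here, since the REST API itself returns the prefixed names.

```python
PREFIX = "x-amz-meta-"


def to_upload_headers(metadata: dict) -> dict:
    """Turn user metadata into HTTP headers for a PUT or POST upload."""
    return {PREFIX + name.lower(): str(value) for name, value in metadata.items()}


def from_response_headers(headers: dict) -> dict:
    """Extract user metadata from response headers, dropping the prefix."""
    return {
        name.lower()[len(PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(PREFIX)
    }


upload = to_upload_headers({"Author": "alice", "retention-days": 30})
response = dict(upload, **{"Content-Type": "text/plain"})
user_meta = from_response_headers(response)
```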

Links