
Comparing Windows Azure Blob Storage and Google Cloud Storage

Taking a break from my personal and emotionally charged Cerebrata Story (https://gauravmantri.com/tag/cerebrata/), I thought it would be nice to write a comparison between Google App Engine storage offerings and Windows Azure Storage offerings, as some readers have asked for it. So here I am, comparing Windows Azure Blob Storage and Google Cloud Storage.

This blog post will derive a lot of its content from my other blog posts comparing Windows Azure Blob Storage and Amazon S3, so if you have not read them, I would recommend reading them first and then coming back here. You can read those posts here:

Part I (Blob Containers & Buckets): https://gauravmantri.com/2012/05/09/comparing-windows-azure-blob-storage-and-amazon-simple-storage-service-s3part-i/

Part II (Blobs & Objects): https://gauravmantri.com/2012/05/11/comparing-windows-azure-blob-storage-and-amazon-simple-storage-service-s3part-ii/

Summary: https://gauravmantri.com/2012/05/13/comparing-windows-azure-blob-storage-and-amazon-simple-storage-service-s3summary/

As with the other posts, for the sake of brevity, we’re going to refer to Windows Azure Blob Storage as WABS and Google Cloud Storage as GCS in the rest of this blog post. Also, since I will be referring to Amazon S3 a lot in this post, it will be referred to as AS3.

From a fundamental functionality point of view, both WABS and GCS provide similar functionality. Simply put, both can be considered file systems in the cloud, allowing you to store huge amounts of unstructured data (usually in the form of files).

Both systems provide a REST-based API for working with files and folders, along with higher-level language libraries which are essentially wrappers implementing the REST API. Over the years, both systems have evolved in terms of the functionality provided, and each release of the API is versioned. At the time of writing this blog, the service version for WABS is 2011-08-18 (WABS identifies versions by date) while GCS is at version 2.0.

At a very high level, both systems provide similar functionality. Here are some of them:

  • Both systems are, in essence, file systems in the cloud with two levels of hierarchy.
  • Both systems allow you to store large amounts of data reliably and cheaply.
  • Both systems allow you to protect your content from unauthorized access.
  • Both systems provide access control mechanisms to protect the data stored. GCS provides ACLs and Query String Authentication whereas WABS provides ACLs and Shared Access Signatures.
  • Both systems allow you to keep many versions of an object; however, the implementation of versioning is different.

Concepts

Before we talk about these two services in greater detail, I think it is important to get some concepts clear. If you’re familiar with the basic foundation concepts of WABS and GCS, you may want to skip this section.

Blob Containers and Buckets: If these services are file systems in the cloud, think of a blob container (in WABS) or a bucket (in GCS) as a folder or directory. In a storage account (in WABS) or a project (in GCS), you can have zero or more blob containers or buckets respectively, and these will contain blobs or objects respectively.

A few comments about blob containers and buckets:

  • There is no concept of nested blob containers and buckets. Both WABS and GCS support a two-level hierarchy only, and nested folders are not allowed. However, both systems allow you to create the illusion of a folder hierarchy using something called a “prefix”.
  • There is no limit on the number of blob containers and buckets you can create in each system.
  • Both GCS and WABS provide the capability to log the requests made against your resources. This feature is called “logging” in GCS and “Storage Analytics” in WABS. The difference is that logging works at the bucket level in GCS while storage analytics works at the storage account level in WABS. Furthermore, logging in GCS puts the data in a separate, user-defined bucket whereas in WABS the storage analytics data goes into predefined tables and blob containers which are created automatically once you enable storage analytics on your storage account.

Blobs and Objects: Simply put, blobs (in WABS) and objects (in GCS) are the files in your cloud file system. They go into blob containers and buckets respectively.

A few comments about blobs and objects:

  • There is no limit on the number of blobs and objects you can store. While GCS does not tell you the maximum storage capacity allocated to you, the total amount of data you can store in WABS is restricted by the size of your storage account (100 TB currently).
  • The maximum size of a blob in WABS is 1 TB while GCS does not specify the maximum size of an object.
  • In WABS, there are two kinds of blobs – Block Blobs and Page Blobs. Block Blobs are suitable for streaming payloads (e.g. images, videos, documents etc.) and can be a maximum of 200 GB in size. Page Blobs are suitable for random read/write payloads and can be a maximum of 1 TB in size. A common use case for a page blob is a VHD mounted as a drive in a Windows Azure role. In GCS, there is no such distinction.
  • Both systems are quite feature rich as far as operations on blobs and objects are concerned. You can copy, upload, download and perform other operations on them.
  • While both systems allow you to protect your content from unauthorized access, the ACL mechanism is much more granular in GCS, where you can set a custom ACL on each object in a bucket. In WABS, it is at the blob container level.

Two of the most important functions that you would work with are uploading and downloading. Let’s take a moment and understand them in somewhat greater detail and then we’ll compare the functionality offered.

Uploading Blobs and Objects

Let’s talk about how you can upload blobs and objects to a blob container or bucket respectively. There are two mechanisms by which you can perform an upload: you can either upload an entire blob or object in a single attempt, or you can split it into chunks (called blocks or pages in WABS).

Uploading in Single Attempt

If the data you’re uploading is small and you have relatively good Internet connectivity, you can upload the entire data in a single attempt. In WABS you would use the Put Blob function. In GCS you would use the PUT Object or POST Object function.

Uploading in Chunks

As mentioned above, you can store really large data as blobs and objects in WABS and GCS respectively. If you’re trying to upload really large data, it becomes impractical to try and upload it all in a single attempt. Luckily, both WABS and GCS give you the ability to split the data into smaller chunks (called blocks or pages in WABS; GCS does not have a specific name for the individual chunks, but the functionality is called Resumable Uploads) and then upload those chunks. To do so in WABS, for block blobs you would use the Put Block and Put Block List functions and for page blobs you would use the Put Page function. To do so in GCS, you would use the POST Object and PUT Object functions.

There are many reasons why you should consider uploading data in chunks:

  • You’re trying to upload really large data. Thus it becomes impractical to upload data of such size in a single go.
  • Your Internet connectivity may be poor.
  • Both WABS and GCS are cloud services designed to cater to hundreds of thousands of concurrent users. They will time out your requests if they take longer than the allowed limit. For example, WABS allows a request 10 minutes to upload 1 MB of data. If you fail to upload that much data in 10 minutes, WABS will time your request out.
  • Splitting large data into chunks allows you to perform parallel uploads, so you can upload the data much faster.
  • In case a chunk fails, you can retry just that chunk. However, if you’re uploading the data in a single go and that fails, you would need to do the entire upload again, which is not very efficient.
  • You’re restricted by the system. WABS actually prohibits you from uploading data in a single go if the size of the data is more than 64 MB, i.e. if your data is more than 64 MB, you must upload it by splitting it into blocks or pages.

Now let’s see how uploading in chunks works in each system. Let’s say you want to upload a 100 MB file in each system and you want to upload it in chunks.

WABS

Let’s say the size of each chunk is 1 MB (though it is not really required that all your chunks are of the same size). Essentially, what you need to do is upload 100 chunks. For the sake of explanation, let’s assume that you’re uploading a block blob in WABS instead of a page blob. In WABS, you assign each block (chunk) a unique identifier (called a BlockId) and upload that block using the Put Block function. A BlockId is a Base64-encoded string value, the maximum length of which is 64 bytes. Also, all block ids (100 in our case) must be of the same length. It doesn’t matter in which order you upload the blocks, and you can upload blocks in parallel if you like. Once a block is uploaded successfully, WABS stores it somewhere in its infrastructure and will keep it for a period of 7 days. Once all blocks are uploaded successfully, you call the Put Block List function to commit these blocks. Until the time you call “Put Block List”, you will not be able to access the blob. If you don’t commit the blocks within 7 days, they will be garbage collected by the system. When you call Put Block List, WABS constructs the blob based on the order of BlockIds passed in the request payload and then makes it accessible. While it is not important how you name the BlockIds (all of them can be GUIDs), it is really important to send the BlockId list in the proper order to Put Block List. A rough code sketch of this flow follows the list of restrictions below.

There are a few restrictions here:

  • A blob can only be split into a maximum of 50,000 blocks.
  • A blob can have a maximum of 100,000 uncommitted blocks at any given time.
  • The set of uncommitted blocks cannot exceed 400 GB in total size.
  • As mentioned above, all BlockIds in a blob must be of the same length. It is not acceptable to have BlockIds like block8, block9, block11 (block11 is one character longer than the others).
  • The maximum length of a BlockId is 64 bytes.
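
To make this concrete, here is a minimal sketch of the block upload flow using raw REST calls. It assumes you already have the full URL of the target block blob with a Shared Access Signature (granting write access) appended to it; the account, container, and file names are made up for illustration.

```python
import base64
import urllib.parse

import requests

# Hypothetical destination blob URL with a Shared Access Signature appended.
blob_sas_url = "https://myaccount.blob.core.windows.net/mycontainer/largefile.bin?sv=...&sig=..."

CHUNK_SIZE = 1024 * 1024  # 1 MB per block
block_ids = []

with open("largefile.bin", "rb") as f:
    index = 0
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            break
        # All BlockIds must be Base64-encoded strings of the same length, so pad the counter.
        block_id = base64.b64encode(f"block-{index:06d}".encode()).decode()
        # Put Block: upload one chunk; it stays uncommitted until Put Block List is called.
        resp = requests.put(
            f"{blob_sas_url}&comp=block&blockid={urllib.parse.quote(block_id)}",
            data=chunk,
        )
        resp.raise_for_status()
        block_ids.append(block_id)
        index += 1

# Put Block List: commit the blocks in the order in which the blob should be assembled.
block_list = (
    "<?xml version='1.0' encoding='utf-8'?><BlockList>"
    + "".join(f"<Latest>{bid}</Latest>" for bid in block_ids)
    + "</BlockList>"
)
requests.put(f"{blob_sas_url}&comp=blocklist", data=block_list).raise_for_status()
```

The sketch uploads the blocks sequentially for simplicity; in practice you could issue several Put Block calls in parallel, since the order only matters when the block list is committed.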

GCS

Uploading a large file in chunks is called “Resumable Uploads” in GCS. The first thing you do is tell GCS that you’re starting this process. You do so by invoking the POST Object function. In general you use this function to upload a file using HTML forms, but in this case you will not specify a file. Rather, you specify some request headers which let GCS know that you are initiating a resumable upload. When this call completes successfully, GCS sends back a response containing an “Upload Id” which uniquely identifies this upload. You must preserve this upload id as it will be required when you’re uploading chunks. Based on the documentation, the next thing you do is try to upload the file in its entirety using the PUT Object function, providing this upload id along with the content. If this operation succeeds, GCS sends you a 200 OK status code; however, if the operation fails for any reason, you need to query GCS for the number of bytes it has received so far, again using the PUT Object function. One of the response headers is the “Range” header, which tells you how many bytes have been received by GCS, and along with the response GCS sends a 308 Resume Incomplete status code, which means the upload has not completed. Based on this information, you can send the remaining data to GCS, again using the PUT Object function. Every time you send data and get the response back, you check the status and calculate the bytes which are remaining. Since there is no Put Block List (as in WABS) or Complete Multipart Upload (as in AS3), I am guessing that when your upload is completed, you get a 200 OK status code back from GCS.
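
Here is a rough sketch of that flow, based purely on my reading of the documentation. The bucket name, object name, and OAuth token are placeholders, and the exact header and status-code handling may differ from what the service actually does.

```python
import os

import requests

# Hypothetical bucket, object name and OAuth token, for illustration only.
bucket, object_name = "mybucket", "largefile.bin"
headers = {
    "Authorization": "Bearer ya29.EXAMPLE",  # placeholder OAuth 2.0 access token
    "x-goog-api-version": "2",               # per my reading of the v2 API docs
}

file_path = "largefile.bin"
total = os.path.getsize(file_path)

# Step 1: POST Object with the x-goog-resumable header (and no file) to start the upload.
start = requests.post(
    f"https://{bucket}.storage.googleapis.com/{object_name}",
    headers={**headers, "x-goog-resumable": "start", "Content-Length": "0"},
)
start.raise_for_status()
upload_url = start.headers["Location"]  # carries the upload id

# Step 2: PUT Object chunks, telling GCS which byte range each request carries.
CHUNK = 1024 * 1024  # 1 MB
sent = 0
with open(file_path, "rb") as f:
    while sent < total:
        f.seek(sent)
        data = f.read(CHUNK)
        end = sent + len(data) - 1
        resp = requests.put(
            upload_url,
            headers={**headers, "Content-Range": f"bytes {sent}-{end}/{total}"},
            data=data,
        )
        if resp.status_code in (200, 201):
            break  # the whole object has been received
        if resp.status_code == 308:  # Resume Incomplete: ask how far GCS actually got
            rng = resp.headers.get("Range")  # e.g. "bytes=0-1048575"
            sent = int(rng.split("-")[-1]) + 1 if rng else 0
        else:
            resp.raise_for_status()
```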

Having just read about this process, here are some of my thoughts:

  • I think we can get rid of the first call to the PUT Object function where we’re trying to upload the entire file hoping to get a 200 OK status code in one shot. If I am trying to upload a 100 MB file, I’m pretty sure that it will not upload in a single go. So instead of even trying to upload the file in its entirety, I can skip the first two steps and simply upload a chunk of that file, get its status and then either retry that chunk or send the next chunk.
  • I’m not sure how we can do parallel uploads in GCS, as it only returns a Range header when I query for how many bytes I have uploaded. WABS has the concept of a BlockId and AS3 has the concept of a Part Number, which make it very easy to perform parallel uploads.

Downloading Blobs and Objects

Now let’s talk about how you can download blobs and objects from a blob container or bucket respectively. There are two mechanisms by which you can perform a download: you can either download an entire blob or object in a single attempt or you can download it in chunks.

Unlike uploading where there are many functions to perform upload (Put Blob, Put Block, Put Block List in WABS and PUT Object, POST Object in GCS), there is only a single function to perform a download. That function is Get Blob in WABS and GET Object in GCS.

Downloading in Single Attempt

If the data to be downloaded is small and you have good Internet connectivity, you can download the entire data in a single attempt using Get Blob in WABS and GET Object in GCS.

Downloading in Chunks

If the data to be downloaded is large and you think you will not be able to download it completely in a single attempt, you can download the data in chunks. You will still use the same functions as above however you would use an additional request header called “Range” and specify the byte range you would want to download.

The reasons listed above for uploading data in chunks also apply to downloading data in chunks.

This is the process you would typically follow (a rough code sketch follows the list):

  1. First you would need to know the size of the blob or object you wish to download. Let’s say you wish to download a 100 MB item.
  2. Next you determine the chunk size based on your requirement. Let’s say you’re comfortable with 1 MB chunks.
  3. Now you will repeatedly call the Get Blob or GET Object function and pass appropriate values in the Range request header. So if you’re downloading sequentially, your first request will have this header set to “bytes=0-1048575” (the first MB), your second request to “bytes=1048576-2097151” (the second MB), and so on and so forth.
  4. Once each chunk is downloaded, you will store that data somewhere.
  5. Once every chunk is downloaded, you would create an empty file of 100 MB size and fill that file with the chunks you have downloaded.
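
Here is a rough sketch of that sequence, assuming the blob (or object) is publicly readable or the URL carries a Shared Access Signature / signed query string; the names are placeholders. For simplicity, the chunks are requested sequentially and appended as they arrive rather than assembled at the end.

```python
import requests

# Hypothetical, publicly readable blob URL for illustration.
blob_url = "https://myaccount.blob.core.windows.net/mycontainer/largefile.bin"

# Step 1: find the total size (Get Blob Properties / HEAD Object returns Content-Length).
total = int(requests.head(blob_url).headers["Content-Length"])

# Step 2: pick a chunk size.
CHUNK = 1024 * 1024  # 1 MB

# Steps 3-5: repeatedly call Get Blob / GET Object with a Range header and write each slice.
with open("largefile.bin", "wb") as out:
    for start in range(0, total, CHUNK):
        end = min(start + CHUNK, total) - 1
        resp = requests.get(blob_url, headers={"Range": f"bytes={start}-{end}"})
        resp.raise_for_status()
        out.write(resp.content)
```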

General Similarities between WABS, AS3, and GCS

There is a common theme which cuts across all three systems. For example:

  • All three systems are essentially file systems in the cloud.
  • All three systems allow you to store huge amounts of data.
  • All three systems provide a two-level hierarchy for storage (Buckets/Objects in AS3 & GCS and Blob Containers/Blobs in WABS).
  • At the very least, all three systems provide a RESTful interface to interact with the service. All three systems also provide higher-level client libraries which are essentially wrappers over this REST interface.

Similarity with AS3

When I first started reading about GCS, what I found is that there are many striking resemblances between GCS and AS3 apart from the similarities mentioned above. For example:

  • Same Terminology: Both systems use the same terminology, like Buckets and Objects (WABS calls them Blob Containers and Blobs).
  • Same Operation Names: Both systems even use the same operation names. For example, the API function to get the list of buckets is called GET Service in both systems.
  • Same Pricing Structure: Both systems have a similar pricing structure. While WABS has a fixed per-transaction price for all transactions, both AS3 and GCS have per-transaction pricing that varies depending on the kind of transaction you’re performing. We’ll talk more about pricing in detail later in this post.
  • Same Hosting Style: Both systems support virtual-hosted-style (e.g. http://mybucket.s3.amazonaws.com/myobject) and path-style (e.g. http://s3-eu-west-1.amazonaws.com/mybucket/myobject) URLs, whereas WABS only supports path-style (e.g. http://myaccount.blob.core.windows.net/myblobcontainer/myblob).
  • Similar Consistency Model: Both systems provide a similar consistency model. For example, both provide strong read-after-write consistency for PUT requests and an eventually consistent model for List (GET) operations.

Unique things in GCS

When we start discussing the core functionality offered by GCS, it may look like it offers less functionality compared to WABS and AS3 (which to some extent is true), but it offers certain unique functionality which is not offered by either WABS or AS3. For example:

  • OAuth 2.0 Authentication: This is a unique and “modern” feature implemented only in GCS which eliminates the need to provide sensitive account information to users and applications that need to access the data. To read more about this, click here: https://developers.google.com/storage/docs/authentication#oauth.
  • Cookie-based Authentication: GCS provides browser-based authenticated downloads to users who do not have Google Cloud Storage accounts. To do this, you would need to apply Google account-based ACLs to the object and then provide users with a URL that is scoped to the object. To read more about this, click here: https://developers.google.com/storage/docs/authentication#cookieauth.
  • Cross-Origin Resource Sharing (CORS): This is again a unique and “modern” feature implemented only in GCS. The CORS specification was developed by the W3C to get around the same-origin policy, a security policy enforced on client-side web apps (e.g., web browsers) to prevent interactions between resources from different origins. While useful for preventing malicious behavior, this security measure also prevents useful and legitimate interactions between known origins. GCS supports this specification by allowing you to configure your buckets to return CORS-compliant responses. To read more about this, click here: https://developers.google.com/storage/docs/cross-origin. Please note that this feature is still in its “Experimental” stage (in other words, a beta phase). I’m not 100% sure, but something similar may be achievable through the “$root” blob container in WABS.

Pricing

Before we dig deep into the functionality, let’s take a look at the pricing. In both systems, there are no upfront costs. The pricing model is rather simple and consumption based. In both systems, you are charged on a usage basis and there are three components:

  1. Transactions: You’re charged for the number of transactions you perform against each system. Simply put, a single transaction can be defined as one call to a function in each system. However, there is one significant difference in the way this pricing is calculated. While in WABS the cost per transaction is fixed (currently $0.01 for 10,000 transactions), in GCS it varies based on the type of transaction performed. So if you perform PUT, POST, GET Bucket or GET Service operations you pay a higher price per transaction (e.g. $0.01 for 1,000 transactions), and for GET and all other requests you pay a lower price per transaction (e.g. $0.01 for 10,000 transactions). Since they didn’t mention anything about the DELETE operation on the pricing page, and considering their similarity with AS3, I am assuming that like AS3 all delete requests are free in GCS. A back-of-the-envelope comparison follows this list.
  2. Storage: You’re charged for the amount of data you store in each system.
  3. Data Transfer: You’re charged for the amount of data transferred from/to each system. At the time of writing this blog, both systems offer free ingress (i.e. data coming in) but charge a fee for data egress (i.e. data going out). There is no mention of whether data transfer within the same data center is chargeable in GCS. In WABS, only the data going out of a data center is charged.
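
To see what this difference means in practice, here is a back-of-the-envelope calculation using the example per-transaction rates quoted above (the workload numbers are made up, and storage and bandwidth charges are ignored):

```python
# 1,000,000 uploads (PUT-class) and 5,000,000 downloads (GET-class) in a month.
uploads, downloads = 1_000_000, 5_000_000

# WABS: flat $0.01 per 10,000 transactions, regardless of type.
wabs_cost = (uploads + downloads) / 10_000 * 0.01

# GCS: $0.01 per 1,000 PUT-class transactions, $0.01 per 10,000 GET-class transactions.
gcs_cost = uploads / 1_000 * 0.01 + downloads / 10_000 * 0.01

print(f"WABS transaction cost: ${wabs_cost:.2f}")  # $6.00
print(f"GCS transaction cost:  ${gcs_cost:.2f}")   # $15.00
```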

There is also a concept of specialized pricing, and both systems offer different pricing packages and offers which you can take advantage of. For more details on pricing, please refer to https://www.windowsazure.com/en-us/pricing/details/ for WABS and https://developers.google.com/storage/docs/pricingandterms for GCS.

Function/Feature Summary

The following table summarizes the list of functions provided by WABS and GCS. Again, since GCS provides a smaller set of functions, I will only describe the features which are provided by both WABS and GCS. For the functions which are provided by WABS only, I will not describe them here but will instead point you to my posts comparing WABS and AS3.

Function                                     WABS   GCS
Create Container/PUT Bucket                  Yes    Yes
List Containers/GET Service                  Yes    Yes
Delete Container/DELETE Bucket               Yes    Yes
List Blobs/GET Bucket (List Objects)         Yes    Yes
Set Container ACL/PUT Bucket (ACL or CORS)   Yes    Yes
Get Container ACL/GET Bucket (ACL or CORS)   Yes    Yes
Put Blob/PUT Object                          Yes    Yes
POST Object                                  No     Yes
Get Blob/GET Object                          Yes    Yes
Delete Blob/DELETE Object                    Yes    Yes
Copy Blob/PUT Object (Copy)                  Yes    Yes
Get Blob Properties/HEAD Object              Yes    Yes
Get Blob Metadata/HEAD Object                Yes    Yes

To keep the table above less cluttered, the following table lists the functions only available in WABS:

Function                        WABS   GCS
Set Blob Service Properties     Yes    No
Get Blob Service Properties     Yes    No
Set Container Metadata          Yes    No
Get Container Metadata          Yes    No
Set Blob Properties             Yes    No
Set Blob Metadata               Yes    No
Snapshot Blob                   Yes    No
Lease Blob                      Yes    No
Put Block                       Yes    No
Put Block List                  Yes    No
Get Block List                  Yes    No
Put Page                        Yes    No
Get Page Ranges                 Yes    No

Now we’ll explore these functions in somewhat more detail.

Create Container/PUT Bucket

WABS GCS
Create Container/PUT Bucket Yes Yes

As the name suggests, this function creates a new blob container and a bucket in WABS and GCS respectively.

One important thing to understand is that blob containers in WABS are scoped within a storage account while buckets in GCS are scoped within a project. When you create a storage account in WABS, you specify the location (data center) of that storage account, thus all your blob containers are within a particular data center (constrained to a geographic location). Whereas when you create a bucket in GCS, you can specify in which region you wish the bucket to be created. Thus, in essence, you could have your buckets spread over all data centers in GCS if you have such a requirement. To accomplish the same in WABS, you would need to create storage accounts in different data centers first and then create blob containers in each storage account.

There are few rules when it comes to naming a blob container/bucket. Following table summarizes these rules:

Rule                     WABS                        GCS
Minimum/Maximum length   3/63                        3/63
Case sensitivity         Lower case                  Lower case
Valid characters         Alphanumeric and dash (-)   Alphanumeric, dash (-) and period (.)

There are a few other rules for naming a blob container/bucket (a rough validation sketch follows the list):

  • Blob container and bucket names must start with a letter or number, i.e. they can’t start with a dash (-) character. Furthermore, in WABS, every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
  • Bucket names in GCS must be a series of one or more labels separated by a period (.), where each label must start and end with a lowercase letter or number. Also, bucket names must not be formatted as an IP address (e.g. 127.0.0.1).
  • Bucket names must normally be between 3 and 63 characters in length; however, if the bucket name contains dots, it can be up to 222 characters in length, where each dot-separated component cannot be longer than 63 characters.
  • Bucket names cannot begin with the “goog” prefix.
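
As a quick illustration of the WABS rules above, here is a rough validation sketch; it is not an exhaustive implementation of either service’s naming rules.

```python
import re

def is_valid_container_name(name: str) -> bool:
    """Rough check of the WABS blob container naming rules described above."""
    if not 3 <= len(name) <= 63:
        return False
    # Lowercase letters, numbers, and dashes; must start with a letter or number,
    # and every dash must be surrounded by letters or numbers (no consecutive dashes).
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None

print(is_valid_container_name("my-container"))   # True
print(is_valid_container_name("my--container"))  # False - consecutive dashes
print(is_valid_container_name("-container"))     # False - starts with a dash
```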

There are a few other things:

  • When creating a blob container, you can also set the ACL for that blob container. Setting the ACL at the time of creation is optional; if it is not specified, WABS creates a “Private” blob container which is only accessible to the owner. In GCS, however, you can’t set the ACL at the time a bucket is created; GCS applies the system default ACL at creation time. To learn more about default bucket and object ACLs, click here: https://developers.google.com/storage/docs/accesscontrol#default. You can change the ACL on a bucket or apply a CORS configuration to a bucket after it is created using this same function.
  • WABS allows you to specify custom metadata for a blob container. Metadata is essentially a name/value collection. The maximum size of all name/value pairs in metadata is 8 KB. GCS does not allow you to have custom metadata on a bucket.

List Containers/GET Service

WABS GCS
List Containers/GET Service Yes Yes

As the name suggests, this function returns the list of all blob containers in a storage account in WABS and all buckets owned by the authenticated sender of the request in GCS.

A few comments about List Containers:

  • A single call to the List Containers function in WABS returns a maximum of 5000 blob containers. If there are more than 5000 blob containers, WABS also returns a continuation token using which the next set of blob containers can be fetched. By default WABS returns up to 5000 blob containers, but you can instruct WABS to return a smaller number as well; the maximum, however, is 5000. The GCS documentation does not mention such a limit.
  • You can also perform server-side filtering by instructing WABS to return only those blob containers whose names start with a specified “prefix”.
  • You can also instruct WABS to optionally return blob container metadata along with the blob containers list.

Delete Container/DELETE Bucket

WABS GCS
Delete Container/DELETE Bucket Yes Yes

As the name suggests, this function deletes a blob container and a bucket in WABS and GCS respectively.

A few comments about this functionality:

  • It might appear that the Delete Container operation is synchronous; in reality it is not. When you send a request to delete a blob container in WABS, it is marked for deletion by the system and is no longer accessible. The blob container is then deleted through a garbage collection process. The actual time taken to delete a blob container depends, among other things, on the amount of data present in it. In my experience, deleting a very large blob container can take hours. During this time an attempt to create a blob container with the same name will result in an error (Conflict error – HTTP status code 409). Thus some forethought must be put into the impact of this when deleting a blob container.
  • In GCS, a bucket must be empty before it can be deleted. You would need to first delete all objects from a bucket before you can perform this operation on it. If you try to delete a non-empty bucket, you will get an error with a 409 (Conflict) status code.

List Blobs/GET Bucket (List Objects)

WABS GCS
List Blobs/GET Bucket (List Objects) Yes Yes

This function is used to get the list of blobs and objects in a blob container and bucket respectively. Both functions are almost identical in the sense that:

  • In both functions you can limit how many blobs you want each function to return.
  • Both functions have an upper limit on how many items the service will return. In the case of WABS, it is 5000 while in the case of GCS, it is 1000. What this means is that in a single call to this function, WABS can return a list consisting of a maximum of 5000 blobs while GCS can return a list consisting of a maximum of 1000 objects.
  • Both functions support the concept of a delimiter. In both systems, a delimiter is a character used to group blobs or objects. The most commonly used delimiter is “/”. As mentioned above, both WABS and GCS support only a two-level hierarchy; using a delimiter can create the illusion of a folder-like hierarchy. For example, let’s say you have the following blobs (or objects): images/a.png, images/b.png, images/c.png, logs/1.txt, logs/2.txt, files.txt. When you invoke this function and provide “/” as the delimiter, both systems return the following: images, logs, and files.txt (a small illustration follows this list).
  • Both functions support server-side filtering of the list by making use of what is called a prefix. When your request contains a prefix, both systems will return the items which start with that prefix. In the example above, if we provide the prefix “images” (and no delimiter), both systems will return the following: images/a.png, images/b.png, and images/c.png.
  • Both functions support the concept of a marker, which is kind of a continuation token to instruct both systems to start listing the items from that marker.
  • Both systems return items alphabetically.
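
The delimiter and prefix behaviour described above is easy to illustrate locally; the snippet below simulates what either service would return for the example names (it does not call any service):

```python
names = ["images/a.png", "images/b.png", "images/c.png", "logs/1.txt", "logs/2.txt", "files.txt"]

def simulate_listing(names, prefix="", delimiter=None):
    """Mimic server-side prefix filtering and delimiter grouping on a list of names."""
    results = []
    for name in (n for n in names if n.startswith(prefix)):
        if delimiter and delimiter in name[len(prefix):]:
            # Collapse everything after the first delimiter into a single "folder" entry.
            folder = prefix + name[len(prefix):].split(delimiter)[0] + delimiter
            if folder not in results:
                results.append(folder)
        else:
            results.append(name)
    return results

print(simulate_listing(names, delimiter="/"))    # ['images/', 'logs/', 'files.txt']
print(simulate_listing(names, prefix="images"))  # ['images/a.png', 'images/b.png', 'images/c.png']
```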

There are a few differences as well:

  • As mentioned above, in a single call to this function WABS can return a maximum of 5000 blobs while GCS can return a maximum of 1000 objects.
  • When fetching the list, you can instruct WABS to return snapshots of the blobs as well. In GCS, there is no equivalent capability.
  • When fetching the list, you can instruct WABS to return metadata for the blobs as well. GCS does not return this information; you would need to call HEAD Object for each object to get its metadata.
  • When fetching the list, you can instruct WABS to return the list of blobs which are not yet committed (i.e. partially uploaded). GCS only returns the objects which are fully uploaded.
  • In GCS, you can use this function to return the ACL or CORS configuration for a bucket as well.

Set Container ACL/PUT Bucket (ACL or CORS)

WABS GCS
Set Container ACL/PUT Bucket (ACL or CORS) Yes Yes

This function is used to set the ACL for a blob container and a bucket. For a blob container in WABS, you can also define one or more access policies. For a bucket in GCS, you can configure either the ACL or CORS (but not both in a single request).

Possible ACL values for a blob container are:

    • Full public read access (Container): Container and blob data can be read via anonymous request. Clients can enumerate blobs within the container via anonymous request, but cannot enumerate containers within the storage account.
    • Public read access for blobs only (Blob): Blob data within this container can be read via anonymous request, but container data is not available. Clients cannot enumerate blobs within the container via anonymous request.
    • No public read access (Private): Container and blob data can be read by the account owner only.

Possible ACL values for a bucket are:

  • READ: When this ACL is granted on a bucket, it allows grantee to list the objects in the bucket.
  • WRITE: When this ACL is granted on a bucket, it allows grantee to create, overwrite, and delete any object in the bucket.
  • FULL_CONTROL: When this ACL is granted on a bucket, it allows grantee the READ and WRITE permissions on the bucket.

One neat thing about setting ACLs in GCS is that it is very fine grained, in the sense that you can grant different sets of permissions to different users. For example, I can set a READ ACL for user1 and a WRITE ACL for user2. This flexibility is not there in WABS: you can only set one kind of permission on a blob container there.

One neat thing about this function in WABS is that apart from setting the ACL, you can also set one or more (up to a maximum of 5) container-level access policies. A container-level access policy allows you to specify a time-bound set of permissions. For example, you could create an access policy with “Write” permissions on a blob container which is valid for a day. Using this policy, you could then generate a URL which you can share with your users, who would then be able to write to that blob container but only for the time period the policy is valid. Another advantage of using a container-level access policy is that it gives you more flexibility in issuing Shared Access Signatures. A shared access signature permits you to provide access rights to blob containers and blobs at a more granular level than by simply setting a blob container’s permissions for public access. By specifying a Shared Access Signature, you can grant users access to a specific blob or to any blob within a specified container for a specified period of time.

Get Container ACL/GET Bucket (ACL or CORS)

WABS GCS
Get Container ACL/GET Bucket (ACL or CORS) Yes Yes

This function is used to get the ACL for a blob container and either the ACL or the CORS configuration for a bucket. In the case of WABS, this function also returns the access policies defined for that blob container.

To get ACL for a bucket, you call GET Bucket function with “acl” query string parameter and to get CORS configuration for a bucket, you call GET Bucket function with “cors” query string parameter. If none of the parameters are specified, then this function returns list of objects in that bucket.
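
For instance, a minimal sketch of the GCS side might look like this (the bucket name and OAuth token are placeholders, and the header details are per my reading of the docs):

```python
import requests

headers = {
    "Authorization": "Bearer ya29.EXAMPLE",  # hypothetical OAuth 2.0 token
    "x-goog-api-version": "2",
}

# GET Bucket with the "acl" query parameter returns the bucket's ACL document...
acl_xml = requests.get("https://mybucket.storage.googleapis.com/?acl", headers=headers).text

# ...with the "cors" query parameter it returns the CORS configuration...
cors_xml = requests.get("https://mybucket.storage.googleapis.com/?cors", headers=headers).text

# ...and with neither parameter the same request would simply list the objects in the bucket.
```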

Put Blob/PUT Object

WABS GCS
Put Blob/PUT Object Yes Yes

As explained earlier, this function adds a block or page blob to a blob container in WABS and an object to a bucket in GCS. You also use this function to set the ACL on an existing object in GCS (WABS does not support ACLs at the blob level), and to copy an object from one bucket to another in GCS.

A few comments about this functionality in each system:

  • In both systems, this function will overwrite an existing item with the same name.
  • Both systems allow you to define some properties on the items being uploaded, like cache control, content type, etc.
  • Both systems allow you to send an MD5 hash of the content to check data integrity between sender and receiver (a small Put Blob sketch follows this list).
  • You can set a predefined ACL on an object while creating it in GCS, which you can’t do in WABS. You can also apply a more granular ACL to an existing object using this function when its purpose is to set the ACL.
  • Both systems allow you to set custom metadata on blobs and objects in the form of a collection of name/value pairs. In WABS, the maximum size of metadata is 8 KB. I was not able to find that information for GCS.
  • When you create a Page Blob using this functionality in WABS, you only initiate the page blob. You don’t put data in that blob. To insert data in a page blob you would use Put Page functionality.
  • When you create a Block Blob in WABS or an object in GCS, the data is sent in the request payload.
  • The maximum size of a block blob which can be created with this function is 64 MB. If the size of the blob is more than 64 MB, it must be split into blocks and uploaded using the Put Block and Put Block List functions.
  • WABS allows you to specify preconditions which must be satisfied for successful completion of this operation by means of conditional headers (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
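
Here is a minimal Put Blob sketch for the WABS side, assuming the destination URL carries a Shared Access Signature with write access; the names and metadata are made up for illustration.

```python
import base64
import hashlib

import requests

# Hypothetical destination blob URL with a Shared Access Signature appended.
blob_sas_url = "https://myaccount.blob.core.windows.net/mycontainer/report.pdf?sv=...&sig=..."

with open("report.pdf", "rb") as f:
    data = f.read()

headers = {
    "x-ms-blob-type": "BlockBlob",                                          # block blob, not page blob
    "Content-Type": "application/pdf",                                      # a standard property
    "Content-MD5": base64.b64encode(hashlib.md5(data).digest()).decode(),   # integrity check
    "x-ms-meta-department": "finance",                                      # custom name/value metadata
}

# Put Blob: the whole payload goes in the request body (which is why it is capped at 64 MB).
requests.put(blob_sas_url, data=data, headers=headers).raise_for_status()
```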

POST Object

WABS GCS
POST Object No Yes

This function adds an object to a specified bucket using HTML forms. POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body.

Get Blob/GET Object

WABS GCS
Get Blob/GET Object Yes Yes

As explained earlier, this function downloads a blob from a blob container in WABS and an object from a bucket in GCS.

A few comments about this functionality in each system:

  • As mentioned above, you can do a partial download by specifying the bytes you want to download in the “Range” request header.
  • In WABS, this function also returns the custom metadata for the blob being downloaded.
  • In GCS, you can either use this function to get contents of an object or its ACL.
  • Both systems allow you to specify preconditions which must be satisfied for successful completion of this operation by means of conditional headers (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match); a small example follows this list.
  • You can also use this function to get versioned blobs. In order to get a versioned blob, you would need to specify the blob’s snapshot date/time in WABS.
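
As a small example of those precondition headers in action, the sketch below re-downloads a blob only if it has changed since the last request (the URL is a placeholder and is assumed to be publicly readable):

```python
import requests

blob_url = "https://myaccount.blob.core.windows.net/mycontainer/config.json"  # hypothetical

first = requests.get(blob_url)
first.raise_for_status()
etag = first.headers["ETag"]  # identifies the version we just downloaded

# Ask for the blob again, but only if its ETag no longer matches ours.
second = requests.get(blob_url, headers={"If-None-Match": etag})
if second.status_code == 304:
    print("Not Modified - the cached copy is still current")
else:
    print("Blob changed - downloaded the new content")
```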

Delete Blob/DELETE Object

WABS GCS
Delete Blob/DELETE Object Yes Yes

This function allows you to delete a blob or an object from storage.

A few comments about this functionality in each system:

  • You can use this function to remove only the snapshots of a blob in WABS without deleting the base blob (a small sketch follows this list). If the base blob is deleted, all of its snapshots are deleted as well.
  • You can also use this function to delete a particular version of a blob in WABS. In order to do that, you would need to specify the blob’s snapshot date/time.
  • WABS allows you to specify preconditions which must be satisfied for successful completion of this operation by means of conditional headers (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
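
For the WABS side, a minimal sketch of removing only a blob’s snapshots might look like this, assuming the URL carries a Shared Access Signature with delete rights (the x-ms-delete-snapshots header and its values are per my reading of the REST documentation):

```python
import requests

# Hypothetical blob URL with a Shared Access Signature granting delete access.
blob_sas_url = "https://myaccount.blob.core.windows.net/mycontainer/myblob?sv=...&sig=..."

# "only" deletes the snapshots and keeps the base blob; "include" would delete both.
resp = requests.delete(blob_sas_url, headers={"x-ms-delete-snapshots": "only"})
resp.raise_for_status()
```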

Copy Blob/Put Object – Copy

WABS GCS
Copy Blob/Put Object – Copy Yes Yes

As the name suggests, this function copies a blob from a source location to a target location.

A few comments about this functionality in each system:

  • Both systems allow you to specify preconditions which must be satisfied for successful completion of this operation by means of conditional headers (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match). You can specify these conditional headers on both source and target items in WABS, and on the source item in GCS.
  • WABS only allows you to copy an item from one blob container to another blob container in the same storage account. GCS does not have this restriction: as long as the source and target buckets belong to the same project, an object can be copied.
  • Both systems allow you to either copy the existing metadata or define new metadata for the target item.
  • In GCS, if you copy the metadata, the ACLs are not copied and the default object ACLs are applied to the new object. You can apply different ACLs during a copy operation by using the appropriate request header.
  • UPDATE: You can also use this function to update an object’s metadata. All you would need to do is specify the same object as both source and target and set the metadata directive to “replace”.

Cool Tips:

  • Neither system supports a “Move” function. However, you can perform a “move” operation by first copying the item from source to target and then deleting the source.
  • You can also “promote” a version of a blob to be the most current version. To do so, you would pass the versioned blob (by specifying its snapshot) as the source and the unversioned blob as the target.

Get Blob Properties/HEAD Object

WABS GCS
Get Blob Properties/HEAD Object Yes Yes

This function is used to fetch properties/metadata of a blob and an object. It does not return the contents of a blob or object.

A few comments about this functionality in each system:

  • Get Blob Properties in WABS and HEAD Object in GCS return all user-defined metadata, standard HTTP properties, and system properties for the blob or object respectively.
  • Both systems allow you to specify preconditions which must be satisfied for successful completion of this operation by means of conditional headers (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
  • You can also use this function to get the properties of a particular version of a blob in WABS. In order to get that information, you would need to specify the blob’s snapshot date/time.
  • If you’re only interested in getting the custom metadata for a blob, you would need to use the Get Blob Metadata function.

Get Blob Metadata/HEAD Object

WABS GCS
Get Blob Metadata/HEAD Object Yes Yes

This function returns all user-defined metadata for the specified blob or object. You can also use it to get metadata for a particular version of a blob; in order to get that information, you would need to specify the blob’s snapshot date/time in WABS.

Summary

As we saw in this post, both systems offer a similar feature set. There are some features present in one system that we wish for in the other, but generally speaking the disparity between them is not that big.

A few comments (Disclaimers)

  1. It is not the intent of this blog post to prove that one service is superior to the other. I just wanted to make a very objective comparison of these two services.
  2. I make my living using the Windows Azure Platform (if you wish, you can call me a Windows Azure fan boy). This does not mean that I have negative things to say about Google App Engine; it’s just that I have never had a chance to play with Google App Engine yet.
  3. Since I haven’t played with Google App Engine just yet, the information I have presented in this blog post about Google Cloud Storage is based solely on my understanding of this service from the documentation. Very likely, I may be wrong about some of the things I have written here. Please note that it is not done intentionally, and you can blame my lack of knowledge for that. If you find such issues, please let me know and I will fix them ASAP.
