Gaurav Mantri's Personal Blog.

Comparing Windows Azure Table Storage and Amazon SimpleDB

In this blog post, we are going to compare Windows Azure Table Storage Service and Amazon SimpleDB from core functionality point of view.

For the sake of brevity, we’re going to refer Windows Azure Table Storage Service as WATS and Amazon SimpleDB as ASDB.

From fundamental functionality point of view, both WATS and ASDB provide similar functionality. Both of them are NoSQL database systems designed to store massive amount of data. You can read more about NoSQL databases on Wikipedia. Amazon recently started offering another NoSQL database which is called Amazon DynamoDB. I will compare that with WATS in another blog post later.

At a very high level, two systems are quite similar in the sense that:

  • Both system are non-relational NoSQL systems.
  • Both systems are essentially key/value store where data is stored in the form of key/value pairs.
  • None of the system supports relations normally available in a relational database.
  • Both systems are designed for high availability and flexibility.
  • Both systems provide a REST based API for working with queues and messages and other higher level language libraries which are essentially wrappers implementing REST API. Over the years, both systems have evolved in terms of functionality provided. In both systems, each release of the API is versioned and is specified as a date. At the time of writing this blog, the service version number for WATS is 2011-08-18 while that of ASDB is 2009-04-15

There are a few key differences as well. I will mention them here and will talk about them later in the post:

  • One big difference in ASDB and WATS is that ASDB is eventually consistent while WATS is strongly consistent. What that means is that when you’re reading the data in ASDB it is quite possible that the service does not return you most recent data. However you can instruct ASDB to perform consistent reads and provide you with the most recent data.
  • ASDB indexes your data while WATS does not. Well, technically WATS also indexes your data but only on certain attributes (PartitionKey and RowKey). Having the ability to specify secondary indexes in WATS has been one of the most requested feature there. In this sense, WATS is more like Amazon DynamoDB than ASDB.
  • ASDB has a 10 GB limit of data in each domain (read table) while WATS does not have this limitation. Again in this sense, WATS is more like Amazon DynamoDB.

Concepts

Before we talk about these two services in greater detail, I think it is important to get some concepts clear. If you’re familiar with the basic foundation concepts of WATS and ASDB, you may want to skip this section.

  • Table and Domain:  When we think of tables, first thing that comes to our mind is “something” consisting of rows and columns (kind of like a grid). A “Table” in WATS and a “Domain” in ASDB may look like a table but they are not. Essentially think of them as a container containing collections of similar name/value pairs representing data. In a relational database, we define columns for a table and the rows contain data for these columns. In order to store data in a table, you have to define columns. A table in WATS and a domain in ASDB are schemaless i.e. you don’t have to define “columns” there to store the data. In short, think of them as a “bag” where you put relevant data.

    While fundamentally both Tables and Domains are containers for storing data, there are a few differences between them. Some of the differences are:

    • By default there is a limit of 250 domains in ASDB (which you can increase by contacting them), while in WATS there is no such limitation. You can have as many tables in WATS as you want provided you don’t exceed your storage quota (currently 100 TB).
    • There is a size limit of 10 GB per domain in ASDB where as in WATS you can store as much data as you want provided you don’t exceed your storage quota.
    • Each domain in ASDB can contain up to 1 billion attributes where as in WATS no such restriction exists.
    • Data in each domain is indexed on all attributes where as in WATS it is only indexed on few key attributes (more on this below).
       
  • Entity and Item: This is what defines the data in a table. Each entity (in WATS) and item (in ASDB) consists of one or more attributes. An attribute is a name/value (name/value/data type in WATS) collection. In a relational database, this would be a row. Here each row in a table/domain is on its own having no relation whatsoever with other rows in that table. Each entity in WATS is uniquely identified by two attributes: PartitionKey and RowKey. Think of them as a composite primary key in a table in WATS. In a table each entity should have a unique combination of  these two attributes. In ASDB, each item is uniquely identified by ItemName. Think of it as a primary key in a domain in ASDB. In a domain each item should have a unique item name.

    While fundamentally both Entity and Item store data, there are a few differences between them. Some of the differences are:

    • There can be a maximum of 256 attributes per entity or item. However each entity in WATS has 3 system defined attributes: PartitionKey, RowKey and Timestamp which leaves you with only up to 253 custom attributes. While you define values for PartitionKey and RowKey attributes, value for Timestamp attribute is provided by WATS. It tells you date/time value (in UTC) when an entity was created/updated. PartitionKey and RowKey are of “String” data type.
    • All attribute values are of String type in ASDB however in WATS they could be of one of the following 8 data types: Binary, Boolean, DateTime, Decimal, Int32, Int64, Guid, and String thus providing you with richer and more qualified data model.
    • Maximum size of data in an attribute in an item in ASDB is 1KB and combine that with the limitation of a maximum of 256 attributes, the maximum size of an item can be 256 KB (= 1 KB [max. size / attribute] * 256 [max. number of attributes]). There is no limit on the size of an individual attribute in an entity in WATS however the maximum size of an entity is 1 MB. This 1 MB size limitation includes the size of PartitionKey, RowKey and Timestamp as well.
    • In WATS, data is indexed only on PartitionKey and RowKey attributes. Indexing on custom attributes is not supported at this point of time. Furthermore data in Windows Azure is partitioned based on the PartitionKey value. Thus it becomes really important to choose your PartitionKey wisely as improper PartitionKey value could lead to severely degraded performance when fetching data from a table. There is a very useful whitepaper published by Windows Azure Storage team which you can read here: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx.   

Pricing

Before we talk about the functionality offered in each system, let’s take a moment and talk about the pricing. In both systems, there are no upfront costs. The pricing model is rather simple and consumption based. In both systems, you are charged on usage basis and there are three components:

  1. Transaction: You’re charged for the for the number of transactions you performed against each system. Simply put, a single transaction can be defined as one call to the functionality in each system. However there is one significant difference in the way this pricing is calculated in each system. While in WATS the cost per transaction is fixed (currently $0.01 for 10,000 transactions), in ASDB it varies based on the time it takes for the system to perform that transaction. Basically ASDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request. So in case of ASDB when it comes to calculate how much you owe Amazon, it simply calculates the total machine utilization (in hours) and multiplies it with machine utilization rate ($/hour) to figure out your transaction costs. Where as in case of WATS when it comes to calculate how much you owe Microsoft, it calculates the total number of transactions and multiples it with transaction rate ($/transaction) to figure out transaction costs.

    IMHO, Windows Azure pricing is simple, straight forward and predictable. If I can accurately predict the number of transactions I am going to perform against WATS, I can accurately predict my bill. However the good part about ASDB is that with each transaction it actually tells you the machine hours consumed by that transaction thus giving you the opportunity to fine tune your requests to ASDB. WATS does not expose this functionality with each transaction however you could find that out by using Windows Azure Storage Analytics (http://msdn.microsoft.com/en-us/library/windowsazure/hh343270.aspx).
     

  2. Storage: You’re charged for the amount of data you store in each system. Both systems have published detailed information about the logic/formula they use for calculating the storage costs which you can read here http://aws.amazon.com/simpledb/#calculating-your-storage-needs [ASDB] and  http://blogs.msdn.com/b/windowsazurestorage/archive/2010/07/09/understanding-windows-azure-storage-billing-bandwidth-transactions-and-capacity.aspx [WATS].
     
  3. Data Transfer: You’re charged for the amount of data transferred from/to each of the system. At the time of writing this blog, both systems offer free ingress (i.e. data coming from outside) but charge a fee for data egress (i.e. data going out). Data transferred between ASDB and Amazon EC2 within a single region is free of charge (i.e., $0.00 per GB). Data transferred between ASDB and Amazon EC2 in different regions will be charged at Internet Data Transfer rates on both sides of the transfer. Similarly in WATS, only the data going out of a data center is charged.

There is also a concept of specialized pricing and both systems offer different pricing packages and offers which you can avail. For more details on pricing, please refer to https://www.windowsazure.com/en-us/pricing/details/ for WATS and http://aws.amazon.com/simpledb/pricing/ for ASDB.

Function/Feature Summary

Following table summarizes the list of functions provided by WATS and ASDB.

  WATS ASDB
Create Table/CreateDomain Yes Yes
Query Tables/ListDomains Yes Yes
Delete Table/DeleteDomain Yes Yes
DomainMetadata No Yes
CRUD Operations on a single Entity/Item Yes Yes
– Insert Entity/PutAttributes Yes Yes
– Update Entity/PutAttributes Yes Yes
– Merge Entity/PutAttributes Yes Yes
– Insert Or Replace Entity/PutAttributes Yes Yes
– Insert Or Merge Entity/PutAttributes Yes Yes
– Delete Entity/DeleteAttributes Yes Yes
– Query Entities/GetAttributes Yes Yes
CRUD Operations on multiple Entities/Items Yes Yes
– Entity Group Transactions/BatchPutAttributes Yes Yes
– Entity Group Transactions/BatchDeleteAttributes Yes Yes
Query Entities/Select Yes Yes

 

Now we’ll explore these functions in somewhat more details.

 

Create Table/CreateDomain

  WATS ASDB
Create Table/CreateDomain Yes Yes

As the name suggests, this function creates a table and a domain in WATS and ASDB respectively. There is one important difference between this operation in two systems: CreateDomain operation is an idempotent operation while Create Table is not. In WATS, if you try to create a table with the name of an existing table, this function will throw an error (Conflict error – HTTP Status Code 409) however in ASDB no error will be thrown. Also based on documentation, CreateDomain operation might take 10 or more seconds to complete.

There are few rules when it comes to naming a table/domain. Following table summarizes these rules:

  WATS ASDB
Minimum/Maximum length 3/63 3/255
Case Sensitivity Mixed case Mixed case
Valid Characters Alphanumeric Alphanumeric, Dash (-), Hyphen (_) and Period (.)

There are a few other things:

  • In WATS, a table name cannot begin with a number e.g. 1a2 is an invalid table name while a12 is a valid table name. Furthermore, table names preserve the case with which they were created, but are case-insensitive when used.
  • As mentioned above also in the Concepts section, by default you can have up to 250 domains per account in ASDB. To increase this limit, you can submit a request to Amazon (http://aws.amazon.com/contact-us/simpledb-limit-request/).

 

Query Tables/ListDomains

  WATS ASDB
Query Tables/ListDomains Yes Yes

As the name suggests, this function returns the list of tables and domains from WATS and ASDB respectively. A single call to this function returns up to 1000 tables in WATS and 100 domains in ASDB. If there are more tables and domains available, then a “continuation token” is returned by both services which can be used to fetch next set of tables and domains.

To summarize:

  WATS ASDB
Maximum records returned per call 1000 100
Returns continuation token in case more data is available Yes Yes

 

Delete Table/DeleteDomain

  WATS ASDB
Delete Table/DeleteDomain Yes Yes

As the name suggests, this function removes a table and domain from WATS and ASDB respectively. There is one important difference between this operation in two systems: DeleteDomain operation is an idempotent operation while Delete Table is not. In WATS, if you try to create a table with the name of an existing table, this function will throw an error (NotFound error – HTTP Status Code 404) however in ASDB no error will be thrown.

Based on the documentation, deleting a domain in ASDB may take 10 seconds or more to complete while deleting a table in WATS may take 40 seconds or more to complete. As far as WATS is concerned, even though this operation might sound like a synchronous operation, in reality it is not. When you send a request to delete a table in WATS, it is marked for deletion by the system and is no longer accessible. The table is deleted through a garbage collection process. The actual time for deleting a table would among other things depend on the size of the data present in the table. In my experience, deleting a very large table could take hours. During this time an attempt to create a table by the same name would result in an error(Conflict error – HTTP Status Code 409). Thus a forethought must be put regarding the impact of this while deleting a table.

 

DomainMetadata

  WATS ASDB
DomainMetadata No Yes

This function is only available in ASDB and returns some useful information like size and number of attributes about a domain. Information returned by this function include: Timestamp, ItemCount, AttributeValueCount, AttributeNameCount, ItemNamesSizeBytes, AttributeValuesSizeBytes, AttributeNamesSizeBytes.

As I mentioned, this feature is not available in WATS however it is one of the most requested features there. Users have consistently asked for some statistics on WATS.

 

CRUD Operations on a single Entity/Item

  WATS ASDB
CRUD Operations on a single Entity/Item Yes Yes
– Insert Entity/PutAttributes Yes Yes
– Update Entity/PutAttributes Yes Yes
– Merge Entity/PutAttributes Yes Yes
– Insert Or Replace Entity/PutAttributes Yes Yes
– Insert Or Merge Entity/PutAttributes Yes Yes
– Delete Entity/DeleteAttributes Yes Yes
– Query Entities/GetAttributes Yes Yes

Both systems allow you to perform Create, Read, Update, and Delete (CRUD) operations on a single entity/item.

A few things to keep in mind:

  • Both systems allow you to specify up to 256 attributes per entity/item. However keep in mind in WATS, since there are 3 system defined attributes (PartitionKey, RowKey, and Timestamp) you can only define up to 253 attributes.
  • An attribute’s value is of string type in ASDB where as in WATS it could be one of the following eight data types: Binary, Boolean, DateTime, Double, GUID, Int32, Int64, and String.
  • Maximum amount of data that an attribute value can hold in ASDB is 1KB while the maximum size of an entity in WATS can be 1MB.

Create

There are many operations in WATS to perform create operations while in ASDB, it is nicely encapsulated in just one functionality (PutAttributes). PutAttributes function can be used update an item as well. To create an item in ASDB, you simply define a unique item name and define the attributes and their values and submit it to the system. To create an entity in WATS, you define values for PartitionKey, RowKey and other attributes and submit it to the system. Few comments about creating an entity in WATS:

  • If you create an entity using Insert Entity functionality, if an entity with same PartitionKey/RowKey combination exists and error will be thrown by the system.
  • WATS support UPSERT functionality through Insert or Replace Entity and Insert or Merge Entity functionality. When trying to create an entity using either of these two functionality, if a matching entity is found it will be updated else a new entity will be created.    

Read

To read a single entity in WATS, you would invoke something like a Query Entities functionality without any select criteria. You would need to provide that entity’s PartitionKey and RowKey value so that a matching entity can be returned by the system. In ASDB, you use GetAttributes functionality to get the attributes of an item. It is quite possible that the data returned by this functionality in ASDB is not the most recent one because of its “eventual consistency” model however you can instruct ASDB to perform a “consistent” read operation to ensure that most recent data is returned. In both systems, if no matching record is found an empty dataset is returned back.

Update

In ASDB, you can update one or more attributes of an item by calling PutAttributes function. If the attribute is present, its value will be changed otherwise a new attribute will be created. In WATS, there are 2 distinct mechanism by which an entity is updated:

  • Replace mechanism wherein existing entity is deleted and a new entity is created with same PartitionKey/RowKey values and new attribute values. This is supported through Update Entity operation.
  • Merge mechanism wherein existing attributes are updated if a new value for those attributes is provided in the payload. Any new attributes provided in the payload will be added to that entity. Any attributes which are present in existing entity but not provided in the update payload will remain untouched. This is supported through Merge Entity operation. Looking at the functionality in PutAttribute, I think it matches closely with Merge Entity operation than Update Entity operation. Come to think about it, it is not possible to replace an item in its entirety with new set of values in ASDB. In order to accomplish that, you must first delete that item (or delete all attributes) and then create new attributes.

Conditional Updates: Both systems support conditional updates to an entity/item however the mechanism is entirely different. In ASDB, you can define conditions on value of existing attributes i.e. you can instruct ASDB to update value of say attribute1 only if the value of another attribute say attribute2 is some value. Furthermore conditional updates on ASDB also supports existence of an attribute i.e. you can instruct ASDB to update value of (again) say attribute1 only of another attribute say attribute2 exists. In WATS, it is entirely different. In WATS, it is based on entity’s ETag value. To update an entity conditionally, you must provide entity’s ETag value in one of the request headers (when using REST API). WATS then compares this value with the most current value of ETag for that entity and the update only succeeds if both ETag value matches.

Delete

In ASDB, you can delete either few attributes of an item or an item completely using DeleteAttributes functionality. All all attributes of an item are deleted, that item is also deleted. In WATS there is no direct function to delete just one or more attributes from an entity. Using Delete Entity functionality in WATS you delete an entire entity. To delete one or more attributes from an entity in WATS, you would need to use Update Entity function and providing only those attributes which you want to keep in the request payload.

DeleteAttributes function in ASDB is idempotent i.e. if you try and delete an attribute which is not present, ASDB will not throw an error. If you’re trying to delete an entity which does not exist in WATS, an error (NotFound error – HTTP Status Code 404) will be thrown.

Conditional Deletes: Both systems support conditional deletion of an entity/item however the mechanism is entirely different. In ASDB, you can define conditions on value of existing attributes i.e. you can instruct ASDB to delete say attribute1 only if the value of another attribute say attribute2 is some value. Furthermore conditional updates on ASDB also supports existence of an attribute i.e. you can instruct ASDB to delete (again) say attribute1 only of another attribute say attribute2 exists. In WATS, it is entirely different. In WATS, it is based on entity’s ETag value. To delete an entity conditionally, you must provide entity’s ETag value in one of the request headers (when using REST API). WATS then compares this value with the most current value of ETag for that entity and the delete only succeeds if both ETag value matches.

 

CRUD Operations on multiple Entities/Items

  WATS ASDB
CRUD Operations on multiple Entities/Items Yes Yes
– Entity Group Transactions/BatchPutAttributes Yes Yes
– Entity Group Transactions/BatchDeleteAttributes Yes Yes

In both systems it is possible to perform CRUD operations on multiple entities/items in a single call to the service. When working with a single item in ASDB a single function is used to creating and updating and many functions are used for creating and updating a single entity in WATS, reverse is true when working with multiple entities/items in each system.

In WATS, you essentially use Entity Group Transactions functionality to create/update/delete one or more entities in a single transaction. In ASDB, you use BatchPutAttributes functionality to create/update one or more attributes and BatchDeleteAttributes functionality to delete one or more attributes.

A few comments about these functionalities:

  • In both systems, it is a transaction scoped operation i.e. either the whole operation succeeds or fails.
  • In ASDB, you can work with multiple items in a single call to BatchPutAttributes and BatchDeleteAttributes. While you can work with multiple entities when performing Entity Group Transactions, all entities must have same PartitionKey value i.e. you can’t go across a PartitionKey boundary when performing an entity group transaction.
  • In ASDB, maximum size of the request can be 1 MB while in WATS it is 4 MB.
  • When deleting multiple attributes/items using BatchDeleteAttributes in ASDB, there is a 25 item limit per operation. Similarly there is a limit of 100 entities per entity group transaction in WATS.

 

Query Entities/Select

  WATS ASDB
Query Entities/Select Yes Yes

This functionality is used to retrieve one or more entities/items from a table/domain based on a criteria.

A few comments about these functionalities:

  • In both systems, there is a limit to the maximum number of records returned per call. In ASDB a maximum of 2500 items are returned while in WATS a maximum of 1000 entities are returned. If there are more items/entities are available, both services return some kind of continuation token which can be used when you call this functionality next time. In both systems you can specify the maximum number of items/entities you want the service to return (this number should be less than the maximum items which can be returned by the service – currently 2500 in ASDB and 1000 in WATS). By Default 100 items are returned per call in ASDB and 1000 entities are returned per call in WATS.
  • Both systems are designed for high availability and will either timeout your requests and return partial results. Both systems are designed to timeout your queries after 5 seconds. Based on the data each service has fetched in 5 seconds will be returned back to the caller. If more data is available then a continuation token will be returned back.
  • In case of WATS, it is possible that your query result is sent back if the service crosses a PartitionKey boundary.
  • In case of ASDB, there is also a limit of 1 MB of the response size i.e. your maximum response size can be of 1 MB. If your query potentially result in a larger dataset, only a partial result will be sent back.
  • Based on the points above, it is quite possible that you get all the records you requested for or a partial result or no result at all even though there is matching data available. When partial records or no records are sent back despite the availability of data matching your query, the service will always return you a continuation token. Thus it becomes very important that you put provision in your application/code to handle this continuation token.
  • ASDB lets you write queries in a SQL SELECT like fashion whereas in WATS you have to specify a query using WCF data service filters ($filter).
  • You can use “ORDER BY” clause in your ASDB queries and the service will return you ordered data based on that however in WATS the data returned is always ordered by PartitionKey first and then RowKey. WATS does not support ordering query results on any other attributes at this time.
  • Both systems allow you to return all attributes or partial list of attributes in a query. In ASDB, you specify that as part of your select clause (e.g. Select attribute1, attribute2, … from … or Select * from …). In WATS, it is achieved by specifying the attribute names you want the service to return in $select query option e.g. $select=PartitionKey,RowKey,Attribute1,Attribute2….
  • By default ASDB supports eventual consistency so it is possible that the data returned by the select query is not the most recent one. However you can instruct ASDB to return you most recent data by forcing it to perform a consistent read. WATS on the other hand is strongly consistent thus always the most recent data is returned.

Summary

To summarize, both systems are quite comparable and offer similar feature sets. There are some differences in the functionality and if we as developers can keep them in mind, it is quite possible to build a system which abstracts both services so that it can be interchanged if needed. There are both pros and cons in each system (e.g. in ASDB indexing capability can be considered as a pro while limiting the size to 10 GB can be considered as a downside and likewise in WATS lack of secondary indexing can be considered as a downside while no limit on the data can be considered as a pro), and we just have to evaluate these pros and cons rationally to decide which system would best serve our need.

A few comments (Disclaimers)

  1. It is not the intent of the blog post to prove that one service is superior over other. I just wanted to have a very objective comparison of these two services.
  2. I make my living using Windows Azure Platform (if you wish, you can call me Windows Azure fan boy Smile). This does not mean that I have negative things to say about Amazon Cloud Computing Platform. It’s just that I never had a chance to play with Amazon just yet.
  3. Since I haven’t played with Amazon platform just yet, the information I have presented in this blog post about Amazon SimpleDB is solely based on my understanding of this service based on the documentation. Very likely, I may be wrong about some of things I have written here. Please note that it is not done intentionally and you can blame my lack of knowledge for that. If you find such issues, please let me know and I will fix them ASAP.

[This is the latest product I'm working on]