Gaurav Mantri's Personal Blog.

Comparing Windows Azure Table Storage and Amazon DynamoDB

In this blog post, we are going to compare Windows Azure Table Storage Service and Amazon DynamoDB from a core functionality point of view.

For the sake of brevity, we’re going to refer to Windows Azure Table Storage Service as WATS and Amazon DynamoDB as ADDB.

From a fundamental functionality point of view, WATS and ADDB are similar. Both of them are NoSQL database systems designed to store massive amounts of data. You can read more about NoSQL databases on Wikipedia. Apart from ADDB, Amazon has another NoSQL database offering called Amazon SimpleDB. I have written a blog post comparing Windows Azure Table Storage and Amazon SimpleDB which you can read here: http://gauravmantri.com/2012/04/21/comparing-windows-azure-table-storage-and-amazon-simpledb/.

One important thing to notice about ADDB is that it is not just a NoSQL database offering. It is a database service. Yes, it is used to manage data, but you also control how scalable the system should be by provisioning the throughput you desire. In that sense it is more like a compute instance in Amazon or Windows Azure. With compute instances, you decide what size of instance you want and the system allocates it for you. Similarly with ADDB, you tell the system how many read/write operations your application(s) will be performing against a table and ADDB provisions that capacity for you.

At a very high level, the two systems are quite similar in the sense that:

  • Both systems are non-relational NoSQL systems.
  • Both systems are essentially key/value stores where data is stored in the form of key/value pairs.
  • Neither system supports the relations normally available in a relational database.
  • Both systems are designed for high availability and flexibility.
  • Both systems support strongly consistent reads. In WATS this is by design, while in ADDB reads are eventually consistent by default and consistent reads are optional and come at an additional price.
  • Both systems provide a REST based API for working with tables and entities/items, along with higher level language libraries which are essentially wrappers over the REST API. Over the years, both systems have evolved in terms of functionality provided. In both systems, each release of the API is versioned and the version is specified as a date. At the time of writing this blog, the service version number for WATS is 2011-08-18 while that of ADDB is 2011-12-05.

There are a few key differences as well. I will mention them here and will talk about them later in the post:

  • In ADDB, the throughput you require from the system is something you provision up front, while in WATS the throughput is controlled by the system. Thus, when it comes to throughput provisioning, ADDB is more flexible; however, it requires more upfront work on your part.
  • Unlike SimpleDB, where there is a 10 GB data limit per domain, ADDB does not have that limitation and you can store as much data as you like. WATS also does not impose any hard limit on the data per table; however, you’re constrained by the size of your storage account (currently 100 TB).
  • ADDB can optionally index your data while WATS does not. Well, technically WATS also indexes your data but only on certain attributes (PartitionKey and RowKey). The ability to specify secondary indexes in WATS has been one of the most requested features there.

Concepts

Before we talk about these two services in greater detail, I think it is important to get some concepts clear. If you’re familiar with the basic foundation concepts of WATS and ADDB, you may want to skip this section.

Table: When we think of tables, the first thing that comes to mind is “something” consisting of rows and columns (kind of like a grid). A “Table” in WATS and ADDB may look like that, but it is not. Essentially, think of it as a container holding collections of similar name/value pairs representing data. In a relational database, we define columns for a table and the rows contain data for these columns; in order to store data in a table, you have to define columns first. Tables in WATS and ADDB are schemaless, i.e. you don’t have to define “columns” there to store the data. In short, think of them as a “bag” where you put relevant data.

While fundamentally Tables in both systems are containers for storing data, there are a few differences between them. Some of the differences are:

  • By default there is a limit of 256 tables in ADDB (which you can increase by contacting them), while in WATS there is no such limitation. You can have as many tables in WATS as you want provided you don’t exceed your storage quota (currently 100 TB).
  • When you define a table in ADDB you have to define a primary key for that table. This could be a single attribute or multiple attributes in that table. All items in that table must have unique value for this primary key. In WATS, the primary key (so to speak) is designated by the system and is a combination of “PartitionKey” and “RowKey” attributes.  Each entity in a table must have a unique combination of these two attributes.
  • You have to specify the provisioned throughput (read/write capacity) when you’re creating the table in ADDB, while this is not in your control in WATS. You can also change the throughput later on through the API in ADDB. When the requests against a particular table exceed the provisioned throughput, ADDB can throttle your requests.

Entity and Item: This is what defines the data in a table. Each entity (in WATS) and item (in ADDB) consists of one or more attributes. An attribute is a name/value pair (name/value/data type in WATS). In a relational database, this would be a row. Here each row in a table stands on its own, having no relation whatsoever with other rows in that table. Each entity in WATS is uniquely identified by two attributes: PartitionKey and RowKey. Think of them as a composite primary key in a table in WATS. In a table each entity should have a unique combination of these two attributes. In ADDB, each item is uniquely identified by a primary key made up of one or two attributes of the item. All items in a table in ADDB must have the primary key attribute(s).

While fundamentally both Entity and Item store data, there are a few differences between them. Some of the differences are:

  • There can be a maximum of 256 attributes per entity in WATS however there is no limit on the number of attributes per item in ADDB. However each entity in WATS has 3 system defined attributes: PartitionKey, RowKey and Timestamp which leaves you with only up to 253 custom attributes. While you define values for PartitionKey and RowKey attributes, value for Timestamp attribute is provided by WATS. It tells you date/time value (in UTC) when an entity was created/updated. PartitionKey and RowKey are of “String” data type.
  • Maximum size of an entity in WATS is 1MB while the maximum size of an item in ADDB is 64KB.
  • In WATS, attribute values can be of one of the following 8 data types: Binary, Boolean, DateTime, Double, Guid, Int32, Int64, and String, thus providing you with a richer and more qualified data model. In ADDB, however, they can be of one of the following types: String, Number and String/Number Sets (i.e. string or number arrays).
  • In WATS, data is indexed only on PartitionKey and RowKey attributes. Indexing on custom attributes is not supported at this point of time. Furthermore, data in Windows Azure is partitioned based on the PartitionKey value. Thus it becomes really important to choose your PartitionKey wisely, as an improper PartitionKey value could lead to severely degraded performance when fetching data from a table. There is a very useful whitepaper published by the Windows Azure Storage team which you can read here: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx. In ADDB, the data is indexed on the attributes which make up the primary key of the table.
  • ADDB supports two types of primary keys:
    1. Hash Type Primary Key: In this case the primary key is made of one attribute, a hash attribute. ADDB builds an unordered hash index on this primary key attribute.
    2. Hash and Range Type Primary Key: In this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. ADDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute.
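
To make the hash-and-range key type a bit more concrete, here is a small in-memory model in Python. It is only a conceptual sketch: the class and method names are made up for illustration, and it says nothing about how ADDB actually implements its indexes.

```python
import bisect
from collections import defaultdict


class HashAndRangeTable:
    """Toy model of a table with a hash-and-range primary key."""

    def __init__(self):
        # hash key -> sorted list of range keys, and a parallel list of items
        self._range_keys = defaultdict(list)
        self._items = defaultdict(list)

    def put(self, hash_key, range_key, item):
        keys = self._range_keys[hash_key]
        items = self._items[hash_key]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(keys) and keys[pos] == range_key:
            items[pos] = item  # same primary key: replace the item
        else:
            keys.insert(pos, range_key)
            items.insert(pos, item)

    def query(self, hash_key, low, high):
        """Return items for hash_key whose range key lies in [low, high]."""
        keys = self._range_keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return self._items.get(hash_key, [])[start:end]
```

The hash attribute picks the bucket with a plain dictionary lookup (unordered), while the range attribute is kept sorted inside the bucket so that a range of values can be located with a binary search.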

Throughput Provisioning

One of the most important things in ADDB is throughput provisioning. Apart from being a NoSQL database, it allows you to configure the desired throughput based on your application. Simply put, throughput provisioning is specifying how many reads and writes per second you wish to perform against a table in ADDB. Based on the values you specify, ADDB allocates sufficient resources to meet your requirements. You can update this on the fly either by using the API or through the Amazon Management Console.

Throughput provisioning is provided through something called Read Capacity Units for reads and Write Capacity Units for writes.

A Read Capacity Unit is defined as one consistent read per second of an item up to 1 KB in size. So if you request 10 read capacity units, you can perform consistent reads on 10 items of up to 1 KB each per second. If your items are larger than 1 KB, the number of items you can read per second will be lower. For example, if your items are between 1 KB and 2 KB in size, you will be able to perform only 5 consistent reads per second before being throttled by the system. If you’re doing eventually consistent reads instead of consistent reads, your throughput essentially doubles: with 10 read capacity units, you would be able to perform 20 eventually consistent reads per second on items 1 KB or less.

Similarly, a Write Capacity Unit is defined as one write per second of an item up to 1 KB in size. So if you request 10 write capacity units, you will be able to write 10 items of up to 1 KB each per second. If your items are larger than 1 KB, the number of items you can write per second will be lower. For example, if your items are between 1 KB and 2 KB in size, you will be able to perform only 5 writes per second before being throttled by the system.
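
The arithmetic above can be sketched in a few lines of Python. The function names are mine, and the rounding of item size up to the next whole KB is an assumption inferred from the "between 1 KB and 2 KB" example, not something taken from an official formula.

```python
import math


def reads_per_second(read_capacity_units, item_size_kb, consistent=True):
    """Items readable per second before throttling, per the rules above."""
    # Assumption: item size is rounded up to the next whole KB.
    units_per_item = max(1, math.ceil(item_size_kb))
    if not consistent:
        # Eventually consistent reads get roughly twice the throughput.
        read_capacity_units *= 2
    return read_capacity_units // units_per_item


def writes_per_second(write_capacity_units, item_size_kb):
    """Items writable per second before throttling."""
    return write_capacity_units // max(1, math.ceil(item_size_kb))
```

For example, `reads_per_second(10, 1.5)` gives 5 and `reads_per_second(10, 1, consistent=False)` gives 20, matching the numbers in the text.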

Please note that throughput provisioning has pricing implications, as the pricing for ADDB is based on it (apart from other things). Essentially you pay for the read and write capacity units you reserved. At the time of writing this post, you would pay $0.01 / hour for every 10 units of write capacity and $0.01 / hour for every 50 units of read capacity in the US East (Virginia) data center. In principle, this kind of pricing is similar to what you pay for compute instances. With compute instances you request a VM of a particular size (which offers you certain processing power and RAM) and you pay an hourly rate for that VM irrespective of whether you’re fully utilizing it or not. Similarly in ADDB, you pay an hourly rate based on the throughput you asked Amazon to provision for you, irrespective of whether you’re fully utilizing that throughput or not. Thus a lot of thought must go into setting up a table in ADDB, as you configure throughput provisioning per table.
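
As a rough illustration of how the hourly charge works out, here is a small Python sketch. It assumes the rates quoted at the time of writing ($0.01 per hour per 10 write units and $0.01 per hour per 50 read units); the constant and function names are my own, and current prices will differ.

```python
from decimal import Decimal

# Rates quoted in this post for US East (Virginia); treat them as historical.
WRITE_RATE = Decimal("0.01") / 10  # dollars per write capacity unit per hour
READ_RATE = Decimal("0.01") / 50   # dollars per read capacity unit per hour


def hourly_cost(read_units, write_units):
    """Hourly charge for the provisioned (not the consumed) throughput."""
    return read_units * READ_RATE + write_units * WRITE_RATE
```

With these rates, `hourly_cost(50, 10)` comes to $0.02 per hour whether or not a single request is made.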

There are a few other considerations when it comes to throughput provisioning:

  • As I said above, you do this for each table separately.
  • There is a minimum throughput of 5 read capacity units and 5 write capacity units per table. So for each table, at a minimum you’re paying $0.001 ($0.01 * 5 / 50) for consistent reads and $0.005 for writes per hour even if your table is not used at all.
  • When you increase or decrease the provisioned throughput, it has to be at least 10% more or less than previous values. For example, if your current throughput is 100 read capacity units and you wish to increase it, the new value must be greater than or equal to 110.
  • When you increase the provisioned throughput, a single request can raise it to at most twice the current value. For example, if your current throughput is 100 read capacity units, you can only increase it up to 200.
  • You can only decrease the provisioned throughput once a day.
  • There is a maximum of 10,000 units of read capacity units and 10,000 units of write capacity units per table by default. Also there is a maximum of 20,000 units of read capacity units and 20,000 units of write capacity units across all tables in an Amazon account by default. You can get this quota increased by contacting Amazon at http://www.amazon.com/gp/html-forms-controller/DynamoDB_Limit_Increase_Form.
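
The rules in this list lend themselves to a quick pre-flight check before calling the service. The following Python sketch is only that: the function name and messages are hypothetical, the limits are the defaults quoted above, and the once-a-day rule for decreases and the account-wide quota are not modelled.

```python
MIN_UNITS = 5
MAX_UNITS_PER_TABLE = 10_000  # default per-table quota quoted above


def check_throughput_change(current, requested):
    """Return a list of rule violations for changing `current` to `requested` units."""
    problems = []
    if requested < MIN_UNITS:
        problems.append("below the minimum of 5 units")
    if requested > MAX_UNITS_PER_TABLE:
        problems.append("above the default per-table maximum")
    # Integer arithmetic avoids floating point surprises at the 10% boundary.
    if abs(requested - current) * 10 < current:
        problems.append("change is smaller than 10%")
    if requested > current * 2:
        problems.append("increase is more than double the current value")
    return problems
```

An empty list means the change passes these checks; for instance going from 100 to 110 units is fine, while 100 to 105 or 100 to 250 is not.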

Pricing

Before we talk about the functionality offered in each system, let’s take a moment and talk about the pricing. In both systems, there are no upfront costs. Some components which constitute the overall pricing are same while some are radically different. Here are the pricing components:

  1. Transaction: In WATS, you’re charged for the number of transactions you performed. The cost per transaction is fixed (currently $0.01 per 10,000 transactions). Thus, to calculate how much you owe Microsoft, the system takes the total number of transactions and multiplies it by the transaction rate ($/transaction) to figure out the transaction costs.
  2. Provisioned Throughput: In ADDB, you’re charged for the throughput you have provisioned. There are fixed hourly rates for read and write capacity units in ADDB and based on that your bill is calculated. Thus when it comes to calculate how much you owe Amazon, it takes the read and write capacity units defined and multiplies that with the hourly rate to figure out the bill for a particular hour.
  3. Storage: You’re charged for the amount of data you store in each system.
  4. Data Transfer: You’re charged for the amount of data transferred from/to each of the system. At the time of writing this blog, both systems offer free ingress (i.e. data coming from outside) but charge a fee for data egress (i.e. data going out). Data transferred between ADDB and Amazon EC2 within a single region is free of charge (i.e., $0.00 per GB). Data transferred between ADDB and Amazon EC2 in different regions will be charged at Internet Data Transfer rates on both sides of the transfer. Similarly in WATS, only the data going out of a data center is charged.

With provisioned throughput, ADDB pricing is much more predictable than WATS transaction based pricing; however, the trick is to find the correct throughput. If you provision more throughput than the requests you actually get, you end up paying more than you need to. If you provision too little, you run the risk of your requests getting throttled in case the load suddenly increases.

Function/Feature Summary

The following table summarizes the list of functions provided by WATS and ADDB.

Function                                      WATS   ADDB
Create Table/CreateTable                      Yes    Yes
Query Tables/ListTables                       Yes    Yes
Delete Table/DeleteTable                      Yes    Yes
UpdateTable                                   No     Yes
DescribeTable                                 No     Yes
CRUD Operations on a single Entity/Item       Yes    Yes
CRUD Operations on multiple Entities/Items    Yes    Yes
Query Entities/Query (or Scan)                Yes    Yes

Now we’ll explore these functions in somewhat more detail.

Create Table/CreateTable

  WATS ADDB
Create Table/CreateTable Yes Yes

As the name suggests, this function creates a table in WATS and ADDB respectively. Unlike SimpleDB, where the CreateDomain operation is idempotent, in ADDB it is not, i.e. if you try to create a table with the name of an existing table, the system will throw an error.

There are a few rules when it comes to naming a table. The following table summarizes these rules:

Rule                      WATS            ADDB
Minimum/Maximum length    3/63            3/255
Case Sensitivity          Mixed case      Mixed case
Valid Characters          Alphanumeric    Alphanumeric, Dash (-), Underscore (_) and Period (.)

There are a few other things:

  • In WATS, a table name cannot begin with a number e.g. 1a2 is an invalid table name while a12 is a valid table name. Furthermore, table names preserve the case with which they were created, but are case-insensitive when used.
  • As mentioned in the Concepts section above, by default you can have up to 256 tables per account in ADDB. To increase this limit, you can submit a request to Amazon (http://www.amazon.com/gp/html-forms-controller/DynamoDB_Limit_Increase_Form).
  • This operation is asynchronous in ADDB, whereas in WATS it is synchronous. As soon as ADDB receives a request to create a table, it spawns off a number of processes (for resource allocation, I assume). You will not be able to use that table until all processes have completed and the table is in “Active” state.
  • When you’re creating a table in ADDB, you would need to specify the primary key for that table as well as specify the throughput provisioning desired for that table. You can change the provisioned throughput later using UpdateTable functionality however primary key can’t be changed once a table is created.
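
The naming rules above can be checked on the client before a request is ever sent. Here is a minimal Python sketch using regular expressions; the function names are my own and the patterns encode only the rules listed in this section.

```python
import re

# WATS: 3 to 63 alphanumeric characters, must not start with a digit.
WATS_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")
# ADDB: 3 to 255 characters: alphanumerics, dash, underscore and period.
ADDB_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def is_valid_wats_table_name(name):
    return WATS_NAME.fullmatch(name) is not None


def is_valid_addb_table_name(name):
    return ADDB_NAME.fullmatch(name) is not None
```

So `a12` passes the WATS check while `1a2` does not, and a name like `my-table_1.0` is acceptable only for ADDB.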

Query Tables/ListTables

  WATS ADDB
Query Tables/ListTables Yes Yes

As the name suggests, this function returns the list of tables from WATS and ADDB respectively. A single call to this function returns up to 1000 tables in WATS and all tables in ADDB. If there are more tables available, then a “continuation token” is returned by both services which can be used to fetch the next set of tables.

To summarize:

                                                             WATS    ADDB
Maximum records returned per call                            1000    -
Returns continuation token in case more data is available    Yes     Yes

Delete Table/DeleteTable

  WATS ADDB
Delete Table/DeleteTable Yes Yes

As the name suggests, this function removes a table from WATS and ADDB. Here again, like the CreateTable operation, this operation is not idempotent in ADDB.

In order for a table to be deleted from ADDB, that table must be in “Active” state, i.e. no “Create” or “Update” operation is being performed on that table when a delete request is sent. The DeleteTable operation in ADDB is asynchronous. As far as WATS is concerned, even though this operation might look synchronous, in reality it is not. When you send a request to delete a table in WATS, it is marked for deletion by the system and is no longer accessible. The table is then deleted through a garbage collection process. The actual time for deleting a table depends, among other things, on the size of the data present in the table. In my experience, deleting a very large table could take hours. During this time, an attempt to create a table by the same name results in an error (Conflict error – HTTP Status Code 409). Thus you must think through the impact of this before deleting a table in WATS.

UpdateTable

  WATS ADDB
UpdateTable No Yes

UpdateTable functionality is used to update the provisioned throughput of a table in ADDB. You can upgrade or downgrade provisioned throughput using this functionality provided:

  • The new throughput values are within the limits and no rules are violated (please see the Throughput Provisioning section above).
  • The table is in “Active” state.

DescribeTable

  WATS ADDB
DescribeTable No Yes

DescribeTable functionality is used to get information about a table. Following information about a table is returned by this function:

  • CreationDateTime: Date when the table was created in UNIX epoch time.
  • ItemCount: Number of items in the specified table. Amazon DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
  • KeySchema: The primary key (simple or composite) structure for the table.
  • ProvisionedThroughput: Throughput for the specified table, consisting of values for LastIncreaseDateTime (if applicable), LastDecreaseDateTime (if applicable), ReadCapacityUnits and WriteCapacityUnits. If the throughput for the table has never been increased or decreased, ADDB does not return values for those elements.
  • TableSizeBytes: Total size of the specified table, in bytes. Amazon DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
  • TableStatus: The current state of the table (CREATING, ACTIVE, DELETING or UPDATING).

Please note that results of this operation are eventually consistent i.e. it is not guaranteed that you will get the most recent results.
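
Since table creation, update and deletion are asynchronous in ADDB, client code usually has to poll the table status before using the table. Below is a minimal Python sketch of such a loop. The `describe_table` argument is a hypothetical callable standing in for whatever client library you use; it is assumed to return a dictionary with a "TableStatus" key, and the default timings are arbitrary.

```python
import time


def wait_until_active(describe_table, table_name, poll_seconds=5.0, max_polls=60):
    """Poll until the table reports ACTIVE; return True, or False on giving up.

    `describe_table` is a hypothetical callable that takes a table name and
    returns a dict with a "TableStatus" key, standing in for a real client.
    """
    for attempt in range(max_polls):
        if describe_table(table_name)["TableStatus"] == "ACTIVE":
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

Because the results of DescribeTable are eventually consistent, a real implementation would probably also want to tolerate a "table not found" answer for a short while right after creation.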

CRUD Operations on a single Entity/Item

  WATS ADDB
CRUD Operations on a single Entity/Item Yes Yes

Both systems allow you to perform Create, Read, Update, and Delete (CRUD) operations on a single entity/item.

A few things to keep in mind:

  • There is a limit of 256 attributes per entity in WATS whereas ADDB does not have this limit. Also in WATS, since there are 3 system defined attributes (PartitionKey, RowKey, and Timestamp) you can only define up to 253 attributes.
  • In WATS, attribute values can be of one of the following 8 data types: Binary, Boolean, DateTime, Double, Guid, Int32, Int64, and String, thus providing you with a richer and more qualified data model. In ADDB, however, they can be of one of the following types: String, Number and String/Number Sets (i.e. string or number arrays).
  • Maximum size of an item in ADDB is 64 KB while the maximum size of an entity in WATS is 1 MB.

Create

There are many operations in WATS to perform create operations while in ADDB, it is nicely encapsulated in just one functionality (PutItem). A PutItem operation either creates an item or if an item is found with the same primary key, this operation completely replaces it. In WATS, there are three functions which perform the operation of creating an entity:

  1. Insert Entity: Creates a new entity in a table. If an entity with same PartitionKey and RowKey is found, an error is thrown.
  2. Insert or Merge Entity: Creates a new entity in a table. If an entity with same PartitionKey and RowKey is found, that entity will be merged with this new entity i.e. values of any matching existing attribute will be updated, any attributes only defined in the new entity will be added to the existing entity and any attributes only defined in the old entity will remain unchanged.
  3. Insert or Replace Entity: Creates a new entity in a table. If an entity with same PartitionKey and RowKey is found, that entity will be replaced with this new entity by deleting the old entity and creating a new entity with same PartitionKey and RowKey values.

Read

In both systems, reading operation is essentially fetching attributes of an entity/item. In WATS, this is accomplished through Query Entities functionality and passing entity’s PartitionKey and RowKey as input parameters. In ADDB, this is accomplished through GetItem functionality and passing item’s primary key as input parameter.

Please note that by default GetItem operation provides an eventually consistent read. You can however force this operation to perform a consistent read by providing “ConsistentRead” optional parameter.

Update

Like create operation, there are many operations in WATS to perform update operations while in ADDB, it is nicely encapsulated in just two functions:

  1. PutItem: A PutItem operation either creates an item or if an item is found with the same primary key, this operation completely replaces it.
  2. UpdateItem: If you wish to just change a few attributes on an existing item in ADDB instead of replacing the whole item, you would use this functionality. It provides you very granular control over how you want your attributes to be updated.

In WATS, there are four functions which perform the operation of updating an entity:

  1. Merge Entity: This operation merges an existing entity with a new entity i.e. values of any matching existing attribute will be updated, any attributes only defined in the new entity will be added to the existing entity and any attributes only defined in the old entity will remain unchanged. If no matching entity is found, then an error is thrown.
  2. Update Entity: This operation replaces an existing entity with a new entity by deleting the old entity and creating a new entity with same PartitionKey and RowKey values. If no matching entity is found, then an error is thrown.
  3. Insert or Merge Entity: Creates a new entity in a table. If an entity with same PartitionKey and RowKey is found, that entity will be merged with this new entity i.e. values of any matching existing attribute will be updated, any attributes only defined in the new entity will be added to the existing entity and any attributes only defined in the old entity will remain unchanged.
  4. Insert or Replace Entity: Creates a new entity in a table. If an entity with same PartitionKey and RowKey is found, that entity will be replaced with this new entity by deleting the old entity and creating a new entity with same PartitionKey and RowKey values.
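
The differences between these WATS operations are easiest to see side by side. The Python sketch below models them against a plain in-memory dictionary; the class, method and exception names are invented for illustration and are not part of any storage client library.

```python
class EntityExists(Exception):
    pass


class EntityNotFound(Exception):
    pass


class ToyTable:
    """In-memory stand-in for a WATS table, keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, attrs):
        if key in self.rows:
            raise EntityExists(key)
        self.rows[key] = dict(attrs)

    def merge(self, key, attrs):
        if key not in self.rows:
            raise EntityNotFound(key)
        self.rows[key].update(attrs)  # attributes only on the old entity survive

    def update(self, key, attrs):
        if key not in self.rows:
            raise EntityNotFound(key)
        self.rows[key] = dict(attrs)  # attributes only on the old entity are dropped

    def insert_or_merge(self, key, attrs):
        self.rows.setdefault(key, {}).update(attrs)

    def insert_or_replace(self, key, attrs):
        self.rows[key] = dict(attrs)  # same effect as PutItem in ADDB
```

In this model, Insert or Replace behaves exactly like ADDB's PutItem, while the other four operations differ only in whether they require the entity to exist and whether old attributes are kept.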

Conditional Updates: Both systems support conditional updates to an entity/item; however, the mechanism is entirely different. In ADDB, you can define conditions on the value of existing attributes, i.e. you can instruct ADDB to update the value of, say, attribute1 only if the value of another attribute, say attribute2, is some value. Furthermore, conditional updates in ADDB also support checking for the existence of an attribute, i.e. you can instruct ADDB to update the value of (again) attribute1 only if another attribute, say attribute2, exists. In WATS, it is based on the entity’s ETag value. To update an entity conditionally, you must provide the entity’s ETag value in one of the request headers (when using the REST API). WATS then compares this value with the most current ETag value for that entity, and the update only succeeds if both ETag values match.
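
The two conditional update styles can be contrasted with a small in-memory Python model. Everything here is illustrative: the class and function names are made up, the ETag is just a random string, and using `None` to mean "the attribute must exist" is a convention invented for this sketch.

```python
import uuid


class PreconditionFailed(Exception):
    pass


class EtagStore:
    """WATS style: the caller must present the ETag it last read."""

    def __init__(self):
        self.rows = {}  # key -> (etag, attrs)

    def put(self, key, attrs):
        etag = uuid.uuid4().hex  # hypothetical ETag format
        self.rows[key] = (etag, dict(attrs))
        return etag

    def update_if_match(self, key, attrs, etag):
        current_etag, _ = self.rows[key]
        if etag != current_etag:
            raise PreconditionFailed("entity changed since it was read")
        return self.put(key, attrs)


def update_if_expected(item, changes, expected):
    """ADDB style: `expected` maps attribute name -> required value, or None
    to require only that the attribute exists (a convention made up here)."""
    for name, value in expected.items():
        if name not in item or (value is not None and item[name] != value):
            raise PreconditionFailed(name)
    item.update(changes)
```

The ETag approach protects the entity as a whole against concurrent changes, while the attribute based approach lets you state exactly which part of the item your update depends on.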

Delete

To delete an entity in WATS, you use Delete Entity function by passing PartitionKey and RowKey of that entity as input parameter. Similarly to delete an item in ADDB, you use DeleteItem function by passing primary key of that item as input parameter.

The DeleteItem function in ADDB is idempotent, i.e. if you try to delete an item which is not present, ADDB will not throw an error unless you are doing a conditional delete. If you’re doing a conditional delete in ADDB, then the operation is not idempotent. If you try to delete an entity which does not exist in WATS, an error (NotFound error – HTTP Status Code 404) will be thrown.

Conditional Deletes: Both systems support conditional deletion of an entity/item; however, the mechanism is entirely different. In ADDB, you can define conditions on the value of existing attributes, i.e. you can instruct ADDB to delete an item only if the value of an attribute, say attribute1, is some value. Furthermore, conditional deletes in ADDB also support checking for the existence of an attribute, i.e. you can instruct ADDB to delete an item only if an attribute, say attribute2, exists. In WATS, it is based on the entity’s ETag value. To delete an entity conditionally, you must provide the entity’s ETag value in one of the request headers (when using the REST API). WATS then compares this value with the most current ETag value for that entity, and the delete only succeeds if both ETag values match.

CRUD Operations on multiple Entities/Items

  WATS ADDB
CRUD Operations on multiple Entities/Items Yes Yes

In both systems it is possible to perform CRUD operations on multiple entities/items in a single call to the service.

In WATS, you essentially use Entity Group Transactions functionality to create/update/delete one or more entities in a single transaction. In ADDB, you use BatchWriteItem functionality to put or delete one or more items. Furthermore, you use BatchGetItem to read multiple items from different tables using the primary keys.

A few comments about these functionalities:

  • In WATS, it is a transaction scoped operation i.e. either the whole operation succeeds or fails however that is not true in case of ADDB. It is possible that some items in the batch might fail. When that happens, ADDB returns you the list of failed items which you can process later.
  • You can’t update an item using BatchWriteItem. You can only put or delete items using this functionality.
  • In ADDB, maximum size of the request can be 1 MB while in WATS it is 4 MB.
  • In ADDB, there is a 25 item limit per BatchWriteItem operation. Similarly there is a limit of 100 entities per entity group transaction in WATS.
  • BatchWriteItem in ADDB allows you to work with multiple tables in the same request. An entity group transaction in WATS is more constrained: it not only requires you to work with a single table but also requires that all participating entities have the same PartitionKey value. In return, it allows more entities (100 as compared to 25) and a bigger request payload (4 MB as compared to 1 MB).
  • BatchGetItem in ADDB performs eventually consistent reads. It is not possible to perform consistent reads using this functionality.
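
In practice, the count limits above mean that client code has to split its work into batches. Here is a minimal Python sketch of that splitting. The function names are my own, entities are assumed to be dictionaries with a "PartitionKey" entry, and the payload size limits (1 MB and 4 MB) are not checked.

```python
from collections import defaultdict


def addb_batches(requests, size=25):
    """Split put/delete requests into BatchWriteItem-sized chunks."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]


def wats_batches(entities, size=100):
    """Group entities by PartitionKey, then split each group into chunks,
    since an entity group transaction cannot span partitions."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    batches = []
    for group in by_partition.values():
        batches.extend(group[i:i + size] for i in range(0, len(group), size))
    return batches
```

Keep in mind that in ADDB a batch may only partially succeed, so the items returned as failed would need to be fed back through the same batching for a retry.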

Query Entities/Query (or Scan)

  WATS ADDB
Query Entities/Query (or Scan) Yes Yes

This functionality is used to retrieve one or more entities/items from a table based on some criteria.

A few comments about these functionalities:

  • In WATS, you use Query Entities functionality to fetch the list of entities. In ADDB, there are two functions to fetch the list of items: Query and Scan. The difference is that Query requires the primary key whereas Scan does not. Since the data is indexed on the primary key in ADDB, Query is much faster than Scan, which basically scans the entire table. Query is only available on hash-and-range primary key tables. It is my understanding (and I may be wrong here) that if you’re using the Query function in ADDB, you can only filter on the attributes comprising the primary key. You can’t filter on other attributes. For that you would need to use Scan. In WATS, to filter you have to specify a query using WCF data service filters ($filter).
  • Both systems are designed for high availability and will either time out your requests or return partial results. From the documentation it is not clear when a query will be timed out in ADDB. WATS is designed to time out your queries after 5 seconds.
  • In case of WATS, it is also possible that only a partial query result is sent back when the query crosses a partition boundary.
  • In case of ADDB, there is a limit of 1 MB on the response size. If your query potentially results in a larger dataset, only a partial result will be sent back.
  • Based on the points above, it is quite possible that you get all the records you requested for or a partial result or no result at all even though there is matching data available. When partial records or no records are sent back despite the availability of data matching your query, the service will always return you a continuation token. Thus it becomes very important that you put provision in your application/code to handle this continuation token.
  • Both systems allow you to return all attributes or partial list of attributes in a query. In ADDB, you specify that in “AttributesToGet” parameter. In WATS, it is achieved by specifying the attribute names you want the service to return in $select query option e.g. $select=PartitionKey,RowKey,Attribute1,..
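
Handling the continuation token boils down to a simple loop that keeps asking for the next page until no token comes back. The Python sketch below shows the shape of that loop. The `fetch_page` argument is a hypothetical callable, not part of either service's library: it takes a token (`None` for the first call) and returns a pair of records and the next token.

```python
def fetch_all(fetch_page):
    """Yield every record, following continuation tokens until none is returned.

    `fetch_page` is a hypothetical callable: it takes a token (None for the
    first call) and returns (records, next_token).
    """
    token = None
    while True:
        records, token = fetch_page(token)
        yield from records  # may be empty even though more data exists
        if token is None:
            break
```

Note that the loop does not stop on an empty page; it stops only when the service no longer returns a token, which is exactly the case described above where no records come back despite matching data being available.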

A Few Questions

There are a few questions I could not find answers for. Hopefully some of you may be able to provide answers for them:

  • Let’s say I have configured throughput provisioning for 10 read capacity units. What will happen if I cross that limit (say I get 12 read requests / second)? Will ADDB simply kill additional 2 requests or will it add to some kind of queue? Will ADDB do this for just those requests which are over the limit or will it do for all the requests? [Edit] See comments below from Simon which provide answer to this question (direct link: http://aws.amazon.com/dynamodb/faqs/#What_happens_if_my_application_performs_more_reads_or_writes_than_my_provisioned_capacity)
  • The documentation says that ADDB is an indexed data store. Does this mean it creates indexes on all attributes or just on the attributes comprising the primary key?

Summary

To summarize, both systems are quite comparable and offer similar feature sets. There are some differences in functionality, and if we as developers keep them in mind, it is quite possible to build a system which abstracts both services so that they can be interchanged if needed. There are pros and cons in each system and we just have to evaluate them rationally to decide which system would best serve our needs.

A few comments (Disclaimers)

  1. It is not the intent of this blog post to prove that one service is superior to the other. I just wanted to have a very objective comparison of these two services.
  2. I make my living using Windows Azure Platform (if you wish, you can call me a Windows Azure fan boy :)). This does not mean that I have negative things to say about Amazon Cloud Computing Platform. It’s just that I haven’t had a chance to play with Amazon yet.
  3. Since I haven’t played with the Amazon platform yet, the information I have presented in this blog post about Amazon DynamoDB is based solely on my understanding of the service from its documentation. Very likely, I may be wrong about some of the things I have written here. Please note that it is not done intentionally and you can blame my lack of knowledge for that. If you find such issues, please let me know and I will fix them ASAP.

Comments

  1. Gaurav,

    Well done on another detailed analysis.

    The answer to your first question is that you get errors if you exceed your provisioned capacity (see http://aws.amazon.com/dynamodb/faqs/#What_happens_if_my_application_performs_more_reads_or_writes_than_my_provisioned_capacity)

    Although you touch on it in your post, there are couple of killer features in DynamoDB over Windows Azure Table Storage.
    1. The ability to provision (and dynamically change) capacity is a big one. I am currently on an Azure project that will break the 5000 transactions per second storage limit and the workarounds (storage accounts) are architecturally unsightly. The ability to store *a lot* of data as fast as possible is important for the type of applications that we believe that the cloud is suited for.
    2. DynamoDB is a poster-child for NoSQL databases and embraces the NoSQL concepts far more than Table Storage does. The ability to have inconsistent reads, in exchange for cost and performance, is one of the fundamental arguments of BASE versus ACID. Windows Azure Table Storage is too strongly consistent and may be NoSQL by its adoption of a key value store, but is philosophically only dipping its toes into NoSQL. The fact that DynamoDB is engineered to be blazingly fast, including using solid state disks, is a big deal. The only alternative on Azure to get fast, eventually consistent data storage, is to run something like MongoDB — there is no out-the-box service offered by Windows Azure.

    Simon

  2. Thanks Simon. Appreciate your inputs.

    Regarding #1, I completely agree that having the flexibility to change the throughput provisioning on the fly is a very big plus. Real question is, how many would really appreciate this and don’t consider this as an administration overhead. “Noobs” like me would look at this and go “Huh!!! how in hell am I going to figure this thing out?”.

  3. Vishwas Lele says:

    Very useful. Thanks for sharing. SSD vs. commodity hardware is another key difference between the two ( of course this may change in the future)
