Gaurav Mantri's Personal Blog.

Storage Client Library 2.0 – Implementing Retry Policies

Continuing my blog posts about migrating code to make use of storage client library 2.0, in this post I will talk about some of the changes made in the library for implementing retries in your code. You can read my previous posts on storage client library 2.0 here:

https://gauravmantri.com/2012/12/26/storage-client-library-2-0-exception-handling/

https://gauravmantri.com/2012/11/28/storage-client-library-2-0-migrating-blob-storage-code/

https://gauravmantri.com/2012/11/24/storage-client-library-2-0-migrating-queue-storage-code/

https://gauravmantri.com/2012/11/17/storage-client-library-2-0-migrating-table-storage-code/

Like the previous posts, I will provide some code samples to demonstrate how you can implement retry operations in your code. I wrote two simple console applications, one using storage client library version 1.7 and the other using version 2.0, and demonstrated some simple functionality in each.

Read These First

Since the version 2.0 library is significantly different from the previous ones, before you decide to upgrade your code to use this version I strongly urge you to read the following blog posts by the storage team, as there are many breaking changes.

Introducing Windows Azure Storage Client Library 2.0 for .NET and Windows Runtime

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/10/29/introducing-windows-azure-storage-client-library-2-0-for-net-and-windows-runtime.aspx

Windows Azure Storage Client Library 2.0 Breaking Changes & Migration Guide

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/10/29/windows-azure-storage-client-library-2-0-breaking-changes-amp-migration-guide.aspx

Getting Started

Before jumping into the code, there are a few things I would like to mention:

Storage Client Libraries

To get the reference for storage client library 1.7, you can browse your local computer and navigate to the Azure SDK installation directory (C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\2012-10\ref – assuming you have SDK 1.8 installed) and select Microsoft.WindowsAzure.StorageClient.dll from there.

To get the reference for storage client library 2.0 (or the latest version for that matter), I recommend using NuGet. That way you’ll always get the latest version. Simply execute the following command in the NuGet Package Manager Console: Install-Package WindowsAzure.Storage. While this is an easy way to get the latest version, you should not upgrade before making sure the new version won’t break anything in your existing code.

Namespace

One good thing about version 2.0 is that the functionality is now neatly segregated into different namespaces. For retry policies, the following namespace is used:

using Microsoft.WindowsAzure.Storage.RetryPolicies;

Retry Policies – What, Why and Other Gory Details

Before talking about how to implement retry policies in your code, I would like to take a moment to explain why we need them and what they do. If you’re familiar with the concepts, please feel free to skip this section; otherwise please read on :).

What?

As the name suggests, retry policies allow you to execute an operation repeatedly (i.e. retry it) if an error occurs while executing it and certain conditions are met. They also let you define how many times the operation should be retried and with how much delay.

Why?

One key thing about any public cloud infrastructure is that the entire infrastructure is shared. In the grand scheme of things, you (or rather your application) are just a tenant there. You don’t own that infrastructure. Since it is designed to support hundreds of thousands (or maybe more) of tenants like you, your cloud provider defines some thresholds on it to keep it highly available. Take Windows Azure Storage for example. The storage account you create is not the only thing in that storage system; there are millions of other storage accounts as well. Windows Azure Storage defines some thresholds per tenant, and if you go above them, the storage system starts returning errors. For example, a single queue in Windows Azure Queues can process a maximum of 2000 messages per second. If you go above that, the system will throttle your requests.

Another key thing to understand is that even though cloud services are very highly available, there may be very brief moments during which a service is not available to you. This could happen for many reasons: throttling as mentioned above, Internet latency, or something else. These are commonly referred to as “Transient” situations because they last for a very short time.

Your cloud application code should be able to handle these “Transient” situations because, as the name says, they’re transient in nature: if you try the same request again a few moments later, it should work perfectly fine. Furthermore, your code should be able to differentiate between the errors caused by these situations and other errors, and deal with them separately.

By properly implementing retry policies, your code will be able to handle these transient errors.

How It Works?

Since all operations against Windows Azure Storage are REST-based operations over HTTP, every request (successful or failed) returns an HTTP status code. That is the foundation for retry policies. When a request fails, the retry policy code intercepts the exception, inspects the HTTP status code (and also the error code returned by the storage service), and based on that determines whether the request should be retried. Once it is decided that the request should be retried, other factors come into play: how long the code should wait before resending the request (remember that we’re dealing with transient errors, so it’s in our best interest to wait for some time before firing off the next request) and how many times the request has already been retried (there’s no point in retrying our requests infinitely).
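To make the decision flow above concrete, here is a minimal, library-independent sketch of a retry loop in C#. It is not the storage client library’s code: the helper name ExecuteWithRetry is hypothetical, the request is faked as a Func<int> that returns an HTTP status code, and treating 408 and any 5xx response as transient is an assumption made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the storage client library.
// Sends the request; if it fails with a status code treated as transient
// (assumed here: 408 or any 5xx), waits a fixed backoff and tries again,
// up to maxRetries extra attempts. Returns the last status code seen.
static int ExecuteWithRetry(Func<int> sendRequest, int maxRetries, TimeSpan backoff)
{
    int retryCount = 0;
    while (true)
    {
        int statusCode = sendRequest();
        bool succeeded = statusCode >= 200 && statusCode < 300;
        bool transient = statusCode == 408 || statusCode >= 500;

        // Stop on success, on a non-transient error, or once retries are used up.
        if (succeeded || !transient || retryCount >= maxRetries)
        {
            return statusCode;
        }

        retryCount++;
        Thread.Sleep(backoff); // give the service a moment before the next attempt
    }
}

// Usage: a fake request that returns 503 (Server Busy) twice and then 201 (Created).
int calls = 0;
int finalStatus = ExecuteWithRetry(() => ++calls <= 2 ? 503 : 201, 5, TimeSpan.FromMilliseconds(10));
Console.WriteLine($"Status {finalStatus} after {calls} attempts"); // prints "Status 201 after 3 attempts"
```

Note how both decisions from the paragraph above live in one place: whether to retry at all (status code and retry count) and how long to wait before the next attempt.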

Types

Basically, there are two types of standard retry policies available to you out of the box, distinguished by the interval after which an operation is retried (called the backoff period):

Linear

In a linear retry policy, an operation is retried periodically with the same backoff period. For example, if the retry interval is set to 5 seconds, the first retry is performed 5 seconds after the first response, the next retry 5 seconds after the second response, and so on.

[Figure: linear retry policy, an operation retried 5 times at 5-second intervals]

The picture above shows a linear retry policy where an operation is retried 5 times at intervals of 5 seconds. After 5 retries, the retry policy code gives up and throws an exception to the application.
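The linear schedule from the picture can be written down as a tiny sketch. LinearBackoffs is a hypothetical helper used only to illustrate the schedule; it ignores the time the requests themselves take.

```csharp
using System;
using System.Collections.Generic;

// Linear policy: every retry waits the same backoff period.
// LinearBackoffs is a hypothetical helper, not a library API.
static List<TimeSpan> LinearBackoffs(TimeSpan deltaBackoff, int maxRetries)
{
    var waits = new List<TimeSpan>();
    for (int retry = 0; retry < maxRetries; retry++)
    {
        waits.Add(deltaBackoff);
    }
    return waits;
}

// The scenario from the picture: 5 retries, 5 seconds apart.
List<TimeSpan> linearWaits = LinearBackoffs(TimeSpan.FromSeconds(5), 5);
TimeSpan elapsed = TimeSpan.Zero;
foreach (TimeSpan wait in linearWaits)
{
    elapsed += wait;
    Console.WriteLine($"wait {wait.TotalSeconds}s, total waiting so far {elapsed.TotalSeconds}s");
}
```

With these settings the application spends 25 seconds waiting in total before the policy gives up.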

Exponential

In an exponential retry policy, an operation is again retried after a backoff period; however, the backoff period between two retries is not constant but grows (typically increases) exponentially. For example, if the retry interval is set to 2 seconds, the first retry is performed 2 seconds after the first response, the second retry 4 seconds after the second response, the third retry 8 seconds after the third response, and so on.

[Figure: exponential retry policy, an operation retried 5 times starting with a 2-second interval]

The picture above shows an exponential retry policy where an operation is retried 5 times, starting with a 2-second retry interval that increases exponentially. After 5 retries, the retry policy code gives up and throws an exception to the application.

The example above doubles the backoff on every retry (2 -> 4 -> 8 -> 16 -> 32 seconds). In the actual implementation, however, the backoff is also randomized, usually by +/- 20%, so that multiple clients that failed at the same time don’t all retry at the same moment.
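Here is a simplified sketch of that schedule: doubling plus +/- 20% random jitter, capped at a maximum backoff. It is an illustration under those assumptions, not the storage client library’s exact formula; ExponentialBackoffs and the 120-second cap are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Simplified exponential backoff with jitter (not the library's exact formula).
// Nominal wait for retry n (0-based) is deltaBackoff * 2^n, i.e. 2s, 4s, 8s, ... for a 2-second delta.
// Each wait is multiplied by a random factor in [0.8, 1.2) and capped at maxBackoff.
static List<TimeSpan> ExponentialBackoffs(TimeSpan deltaBackoff, TimeSpan maxBackoff, int maxRetries, Random random)
{
    var waits = new List<TimeSpan>();
    for (int retry = 0; retry < maxRetries; retry++)
    {
        double nominalMs = deltaBackoff.TotalMilliseconds * Math.Pow(2, retry);
        double jitter = 0.8 + 0.4 * random.NextDouble(); // +/- 20% spreads clients apart
        double waitMs = Math.Min(nominalMs * jitter, maxBackoff.TotalMilliseconds);
        waits.Add(TimeSpan.FromMilliseconds(waitMs));
    }
    return waits;
}

// The scenario from the picture: 5 retries starting from a 2-second interval.
List<TimeSpan> exponentialWaits = ExponentialBackoffs(TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(120), 5, new Random());
foreach (TimeSpan wait in exponentialWaits)
{
    Console.WriteLine($"wait {wait.TotalSeconds:F1}s");
}
```

The cap matters for long retry chains: without it, the tenth retry of a 2-second policy would nominally wait about 17 minutes.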

No Retry

There’s a third retry policy as well, which technically is not a retry policy but is still available to us: “No Retry”. Basically it means the operation should not be retried at all :).

Custom Retry Policy

Apart from the standard retry policies, you can create your own custom retry policies as well if the standard retry policies don’t meet your requirements.

Recommendation

If you’re using one of the standard retry policies, the Windows Azure Storage team recommends the exponential retry policy.

Implementing Retry Policies

Enough talking :). Let’s see some code!!!

Retry Policies in Storage Client Library 1.7

In version 1.7, to implement a simple retry policy based on one of the standard retry policies (linear or exponential), just create an instance of the retry policy and assign it to the “RetryPolicy” property of the table, blob, or queue client object. For example, the code below implements a linear retry policy that retries the create blob container operation 10 times, with a backoff of 5 seconds:

        using System;
        using Microsoft.WindowsAzure;
        using Microsoft.WindowsAzure.StorageClient;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentialsAccountAndKey(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            //Create a linear retry policy: retry up to 10 times, waiting 5 seconds between attempts
            blobClient.RetryPolicy = RetryPolicies.Retry(10, TimeSpan.FromSeconds(5));
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            blobContainer.Create();
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

To show that this works, I turned off the Internet connection on my computer, ran the application and traced the requests in Fiddler. The screenshot below shows that the request is retried 10 times (11 attempts in total) before the error is thrown.

[Screenshot: Fiddler trace of the create container request being retried]

To create an exponential retry policy, just change the following line of code:

blobClient.RetryPolicy = RetryPolicies.Retry(10, TimeSpan.FromSeconds(5));

To

blobClient.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2));

So your code would look something like:

        using System;
        using Microsoft.WindowsAzure;
        using Microsoft.WindowsAzure.StorageClient;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentialsAccountAndKey(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            //Exponential retry: 5 retries, min backoff 1s, max backoff 30s, delta backoff 2s
            blobClient.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2));
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            blobContainer.Create();
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

Simple, isn’t it?

I could possibly keep on talking about this but since the intention of this post is to talk about retry policies in version 2.0, let’s stop right here with 1.7 and continue this post with version 2.0.

Retry Policies in Storage Client Library 2.0

Let’s talk about how you can implement retry policies in your code making use of storage client library 2.0.

What has changed?

Let’s first look, at a very high level, at what has changed between 1.7 and 2.0 as far as retry policies are concerned:

  • Namespace: As mentioned above, retry policies live in their own namespace “Microsoft.WindowsAzure.Storage.RetryPolicies” in 2.0, compared with the single namespace “Microsoft.WindowsAzure.StorageClient” in 1.7.
  • Delegate v/s Interface: A retry policy in 1.7 is a delegate, whereas in 2.0 it’s an interface, IRetryPolicy.
  • “Retryable” Operations: The retry policy implementation in 2.0 no longer prefilters certain types of exceptions or HTTP status codes before evaluating the user’s RetryPolicy. The retry policies contained in the library will by default not retry 400 class errors, but this can be overridden by implementing your own policy.
  • Retry Policy Application: In version 1.7, blob operations let you apply the retry policy in two places: at the CloudBlobClient level or at the individual operation level by using a BlobRequestOptions object. With tables and queues you don’t have that flexibility and can define it only at the CloudTableClient and CloudQueueClient level. This has changed significantly in 2.0. You still have all the options available in 1.7, but you can now also apply the retry policy at the operation level for tables and queues by using TableRequestOptions and QueueRequestOptions. This gives you much more granular control over retries.
  • Default Retry Policy: If you don’t specify a retry policy (i.e. keep the RetryPolicy member null in both the [Blob/Queue/Table]RequestOptions object and the Cloud[Blob/Table/Queue]Client object), the storage client library applies an exponential retry policy by default. If you don’t want an operation to be retried, you must explicitly set RetryPolicy to an instance of NoRetry, e.g.
                IRetryPolicy noRetryPolicy = new NoRetry();
                BlobRequestOptions requestOptions = new BlobRequestOptions()
                {
                    RetryPolicy = noRetryPolicy,
                };
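The “Retryable” Operations point can be illustrated with a small, library-independent model. IsRetryable is a hypothetical helper; treating every status code outside 400-499 as retryable is a simplifying assumption (the library’s own rules may exclude other codes too), and the custom opt-in stands in for what a custom IRetryPolicy can decide.

```csharp
using System;

// Hypothetical model, not library code: built-in policies skip 4xx responses,
// while a custom policy can opt specific codes back in.
// Assumption: every status code outside 400-499 is treated as retryable here.
static bool IsRetryable(int statusCode, Func<int, bool> customOptIn)
{
    if (customOptIn(statusCode))
    {
        return true; // a custom policy decided to retry this code anyway
    }
    bool clientError = statusCode >= 400 && statusCode < 500;
    return !clientError;
}

Func<int, bool> builtInOnly = s => false;
Func<int, bool> retryConflicts = s => s == 409;

foreach (int sample in new[] { 404, 409, 500, 503 })
{
    Console.WriteLine($"{sample}: built-in {IsRetryable(sample, builtInOnly)}, custom {IsRetryable(sample, retryConflicts)}");
}
```

Opting 409 back in is exactly what the custom retry policy later in this post does, with an extra check on the storage error code.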
    

Implementing Linear Retry Policy

The procedure for creating a linear retry policy in version 2.0 is simple and straightforward: just create an instance of the “LinearRetry” class with the appropriate parameters. For example, the code below implements a linear retry policy that retries the create blob container operation up to 10 times with a backoff of 2 seconds:

        using System;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Auth;
        using Microsoft.WindowsAzure.Storage.Blob;
        using Microsoft.WindowsAzure.Storage.RetryPolicies;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            //Linear retry: wait 2 seconds between attempts, up to 10 retries
            IRetryPolicy linearRetryPolicy = new LinearRetry(TimeSpan.FromSeconds(2), 10);
            blobClient.RetryPolicy = linearRetryPolicy;
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            blobContainer.Create();
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

To show that this works, I turned off the Internet connection on my computer, ran the application and traced the requests in Fiddler. The screenshot below shows that the request is retried 10 times (11 attempts in total) before the error is thrown.

[Screenshot: Fiddler trace of the create container request being retried]

Implementing Exponential Retry Policy

Again, the procedure for creating an exponential retry policy in version 2.0 is simple and straightforward: just create an instance of the “ExponentialRetry” class with the appropriate parameters. For example, the code below implements an exponential retry policy that retries the create blob container operation up to 10 times, starting from a 2-second backoff:

        using System;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Auth;
        using Microsoft.WindowsAzure.Storage.Blob;
        using Microsoft.WindowsAzure.Storage.RetryPolicies;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            //Exponential retry: 2-second delta backoff, up to 10 retries
            IRetryPolicy exponentialRetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 10);
            blobClient.RetryPolicy = exponentialRetryPolicy;
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            blobContainer.Create();
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

Implementing No Retry Policy

Consider a scenario where you want the storage client library to retry most operations but not certain specific ones. For those operations you can use a “No Retry” retry policy. Consider the following code for example:

        using System;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Auth;
        using Microsoft.WindowsAzure.Storage.Blob;
        using Microsoft.WindowsAzure.Storage.RetryPolicies;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            //Client-level policy: every blob operation is retried linearly by default
            IRetryPolicy linearRetryPolicy = new LinearRetry(TimeSpan.FromSeconds(2), 10);
            blobClient.RetryPolicy = linearRetryPolicy;
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            //Operation-level override: this call is never retried
            IRetryPolicy noRetryPolicy = new NoRetry();
            BlobRequestOptions requestOptions = new BlobRequestOptions()
            {
                RetryPolicy = noRetryPolicy,
            };
            blobContainer.Create(requestOptions);
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

In the example above, all other blob operations are retried with the linear retry policy, while the create blob container operation is not retried at all.
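The precedence at work here can be captured in a tiny model. EffectivePolicy is hypothetical and policies are represented by plain names; the rules it encodes are the ones described in this post: a policy in the request options wins over the client-level policy, and when both are null the library falls back to its default exponential policy.

```csharp
using System;

// Hypothetical model, not library code: which retry policy applies to a call.
// Request options win over the client; if both are null, the default is exponential.
static string EffectivePolicy(string requestOptionsPolicy, string clientPolicy)
{
    return requestOptionsPolicy ?? clientPolicy ?? "ExponentialRetry (default)";
}

// The example above: LinearRetry on the client, NoRetry only for the create call.
string createPolicy = EffectivePolicy("NoRetry", "LinearRetry");
string otherPolicy = EffectivePolicy(null, "LinearRetry");
string unconfiguredPolicy = EffectivePolicy(null, null);
Console.WriteLine($"Create container: {createPolicy}");
Console.WriteLine($"Other blob operations: {otherPolicy}");
Console.WriteLine($"Nothing configured: {unconfiguredPolicy}");
```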

[Screenshot: Fiddler trace of the create container request]

Implementing Custom Retry Policy

Let’s spice things up a bit and create a custom retry policy :). The scenario I want to cover is retrying the create blob container operation when I have just deleted a blob container and try to recreate it immediately. Under normal circumstances this fails with a 409 (Conflict) HTTP status code and the ContainerBeingDeleted error code. By default, the retry policy considers a 409 HTTP status code non-retryable.

To do so, let’s create a class called “ContainerBeingDeletedRetryPolicy” and have it implement the IRetryPolicy interface.

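    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.RetryPolicies;

    public class ContainerBeingDeletedRetryPolicy : IRetryPolicy
    {
        //Interface members must be public to implement IRetryPolicy implicitly
        public IRetryPolicy CreateInstance()
        {
            throw new NotImplementedException();
        }

        public bool ShouldRetry(int currentRetryCount, int statusCode, Exception lastException, out TimeSpan retryInterval, OperationContext operationContext)
        {
            throw new NotImplementedException();
        }
    }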

Now let’s implement the “ShouldRetry” method. Here are the things we’ll check:

  1. “currentRetryCount” must be less than the maximum number of retry attempts that we set.
  2. The HTTP status code returned should be 409 (Conflict).
  3. The exception should be of type StorageException.
  4. The error code should be “ContainerBeingDeleted”.

Based on these, this is how our class would finally look:

    using System;
    using System.Net;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.RetryPolicies;

    public class ContainerBeingDeletedRetryPolicy : IRetryPolicy
    {
        int maxRetryAttempts = 10;

        TimeSpan defaultRetryInterval = TimeSpan.FromSeconds(5);

        public ContainerBeingDeletedRetryPolicy(TimeSpan deltaBackoff, int retryAttempts)
        {
            maxRetryAttempts = retryAttempts;
            defaultRetryInterval = deltaBackoff;
        }

        public IRetryPolicy CreateInstance()
        {
            //Return a copy configured with the same settings as this instance.
            return new ContainerBeingDeletedRetryPolicy(defaultRetryInterval, maxRetryAttempts);
        }

        public bool ShouldRetry(int currentRetryCount, int statusCode, Exception lastException, out TimeSpan retryInterval, OperationContext operationContext)
        {
            retryInterval = defaultRetryInterval;
            //Stop once all retry attempts have been used up.
            if (currentRetryCount >= maxRetryAttempts)
            {
                return false;
            }
            //Since we're only interested in 409 status code, let's not retry any other operation.
            if ((HttpStatusCode)statusCode != HttpStatusCode.Conflict)
            {
                return false;
            }
            //We're only interested in storage exceptions so if there's any other exception, let's not retry it.
            var storageException = lastException as StorageException;
            if (storageException == null || storageException.RequestInformation == null || storageException.RequestInformation.ExtendedErrorInformation == null)
            {
                return false;
            }
            //Retry only when the service reports that the container is still being deleted.
            string errorCode = storageException.RequestInformation.ExtendedErrorInformation.ErrorCode;
            return string.Equals(errorCode, "ContainerBeingDeleted", StringComparison.Ordinal);
        }
    }
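Because ShouldRetry depends on a StorageException and an OperationContext, it is awkward to unit test. One option, sketched below as a hypothetical refactoring, is to pull checks 1, 2 and 4 into a pure function; check 3 (the exception type) is assumed to have been done by the caller, which passes in the error code as a string.

```csharp
using System;

// Hypothetical refactoring, not library code: the ContainerBeingDeleted checks as a pure
// function that can be unit tested without a storage account. The caller is assumed to
// have already verified the exception type (check 3) and extracted the error code.
static bool ShouldRetryContainerBeingDeleted(int currentRetryCount, int maxRetryAttempts, int statusCode, string errorCode)
{
    if (currentRetryCount >= maxRetryAttempts)
    {
        return false; // check 1: retries used up
    }
    if (statusCode != 409)
    {
        return false; // check 2: only 409 (Conflict)
    }
    return errorCode == "ContainerBeingDeleted"; // check 4: only this error code
}

Console.WriteLine(ShouldRetryContainerBeingDeleted(0, 10, 409, "ContainerBeingDeleted"));  // True
Console.WriteLine(ShouldRetryContainerBeingDeleted(0, 10, 409, "ContainerAlreadyExists")); // False
```

ShouldRetry in the class above could then delegate to this function once it has the StorageException in hand.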

Using this retry policy is quite simple. To simulate the “ContainerBeingDeleted” condition, I created a blob container, deleted it and immediately tried to recreate it, as shown in the code below:

        using System;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Auth;
        using Microsoft.WindowsAzure.Storage.Blob;
        using Microsoft.WindowsAzure.Storage.RetryPolicies;

        static string accountName = "<storage account name>";
        static string accountKey = "<storage account key>";
        static void Main(string[] args)
        {
            var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
            string blobContainerName = "temp-" + DateTime.UtcNow.Ticks;
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
            IRetryPolicy linearRetryPolicy = new LinearRetry(TimeSpan.FromSeconds(2), 10);
            blobClient.RetryPolicy = linearRetryPolicy;
            CloudBlobContainer blobContainer = blobClient.GetContainerReference(blobContainerName);
            blobContainer.Create();
            Console.WriteLine("Blob container created.");
            blobContainer.Delete();
            Console.WriteLine("Blob container deleted.");
            //Recreating immediately returns 409 ContainerBeingDeleted, which our custom policy retries
            IRetryPolicy containerBeingDeletedRetryPolicy = new ContainerBeingDeletedRetryPolicy(TimeSpan.FromSeconds(2), 10);
            BlobRequestOptions requestOptions = new BlobRequestOptions()
            {
                RetryPolicy = containerBeingDeletedRetryPolicy,
            };
            blobContainer.Create(requestOptions);
            Console.WriteLine("Blob container created.");
            Console.ReadLine();
        }

To ensure things are working properly, I traced the requests through Fiddler, and this is what I see there.

[Screenshot: Fiddler trace showing the create, delete and re-create container requests]

As you can see above, the application first sent the “create” request and then the “delete” request. After that it sent the “create” request again, only to get a 409 error back with “ContainerBeingDeleted” as the error code. At that point the retry policy kicks in and retries the operation.

Other Alternative

While we’re on the subject of retry policies, I briefly want to mention TOPAZ as well. If you don’t know it, TOPAZ is Microsoft’s Transient Fault Handling Application Block, which is part of Enterprise Library 5.0 for Windows Azure. You can read more about TOPAZ here: http://entlib.codeplex.com/wikipage?title=EntLib5Azure.

One thing I like about TOPAZ is that it supports not only Windows Azure Storage but also Windows Azure SQL Database (i.e. SQL Azure), Service Bus and the Caching Service. So if your application uses one or more of these components and you wish to implement unified retry policies across them, this one solution would work really well.

I may write a separate blog post about TOPAZ later.

Summary

As you saw above, there have been some changes to the storage client library as far as implementing retry policies is concerned, and proper care must be taken before doing the migration. However, I feel that in principle it’s the same, and the changes will enable us developers to extend this functionality to suit our needs much better. I especially like the interface-based approach and the ability to define separate retry policies for each operation.

Please feel free to share your experience with the migration by providing comments. This will help me and the readers of this blog immensely. Finally, if you find any issues with this post, please let me know and I will try to fix them ASAP.

So long and happy coding!!!

