Gaurav Mantri's Personal Blog.

Windows Azure Blob Storage – Dealing With “The specified blob or block content is invalid” Error

If you’re uploading blobs by splitting them into blocks and you get the error “The specified blob or block content is invalid”, then this post is for you.

Short Version

If you’re uploading blobs by splitting them into blocks and you get the above-mentioned error, make sure that all of your block ids have the same length. That includes the ids of any uncommitted blocks left behind by an earlier, interrupted upload of the same blob. If the block ids differ in length, you’ll get this error.

Long Version

Now for the longer version of this post. A few days back I was working with the storage client library, specifically on uploading blobs in chunks, and with one particular blob I kept getting the error “The specified blob or block content is invalid”. I tried numerous combinations, even resorting to the REST API directly, but to no avail. It happened with just that one blob. Furthermore, if I uploaded the same blob without splitting it into blocks, all was well. I was at my wits’ end. I searched the Internet for this error but could not find a conclusive answer to my problem.

After much trial and error, I was able to reproduce the same problem with other blobs as well. Here’s how you can recreate it:

  1. Start uploading the blob by splitting it into blocks. For the block id, use a 7-character string, e.g. intValue.ToString("d7"). This gives block ids such as “0000001”, “0000002”, …, “0000010”, and so on.
  2. After one or two blocks are uploaded, cancel the operation.
  3. Now re-upload the blob by splitting it into blocks. This time, however, use a 6-character string for the block id, e.g. intValue.ToString("d6").
  4. You’ll get the error as soon as you try to upload the first block.
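To make the mismatch concrete, here is a small sketch that formats block 1 the way steps 1 and 3 do and then Base64-encodes it, since block ids must be Base64 encoded before they are sent. The class and method names are hypothetical. The sketch only computes ids and does not talk to the storage service.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration: reproduces the id formats used in
// steps 1 ("d7") and 3 ("d6") and Base64-encodes them as the service requires.
public static class BlockIdMismatchDemo
{
    // Zero-pads the index to the given width (e.g. 1 with width 7 -> "0000001")
    // and returns the Base64 encoding of its UTF-8 bytes.
    public static string ToBlockId(int index, int width)
    {
        string padded = index.ToString("d" + width);
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(padded));
    }

    // True when two ids have the same encoded length, which is what the
    // service requires for all blocks of a single blob.
    public static bool SameLength(string firstId, string secondId)
    {
        return firstId.Length == secondId.Length;
    }
}
```

ToBlockId(1, 7) yields "MDAwMDAwMQ==" (12 characters), while ToBlockId(1, 6) yields "MDAwMDAx" (8 characters). The ids from the second session can therefore never match the length of the uncommitted blocks that the first session left behind.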

Possible Solutions

Now that we know the root cause of this problem, let’s look at some possible solutions.

Wait it out

One possible solution is to simply wait it out. I know it’s lame, but it is still a possible solution. Windows Azure Blob Storage keeps uncommitted blocks for 7 days, and if those blocks are not committed within that time, the storage service purges them.

I wish the storage service provided some mechanism to purge uncommitted blocks programmatically.

Commit uncommitted blocks

You could commit the blocks that are in the uncommitted state so that you at least get a blob (though it would not be the blob you wanted to upload in the first place). You can then delete that blob and re-upload the real one using block ids that all have the same length. To fetch the list of uncommitted blocks with the REST API directly, perform the “Get Block List” operation (comp=blocklist) and pass “blocklisttype=uncommitted” as one of the query string parameters. If you’re using the .NET storage client library (version 2.x is assumed here), you can do something like the code below:

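        // Requires (storage client library 2.x):
        // using System; using System.Collections.Generic; using System.Linq;
        // using System.Net; using System.Text;
        // using Microsoft.WindowsAzure.Storage.Blob;
        // using Microsoft.WindowsAzure.Storage.Blob.Protocol;
        private static List<string> GetUncommittedBlockIds(CloudBlockBlob blob)
        {
            // Create a short-lived, read-only SAS so the raw request is authorized
            // without having to sign it manually.
            var sasToken = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy()
            {
                SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5),
                Permissions = SharedAccessBlobPermissions.Read,
            });
            // Append the SAS query string directly to the blob URI.
            var blobUri = new Uri(string.Format("{0}{1}", blob.Uri, sasToken));
            List<string> uncommittedBlockIds = new List<string>();
            // Build a "Get Block List" request that asks only for uncommitted blocks.
            var request = BlobHttpWebRequestFactory.GetBlockList(blobUri, null, null, BlockListingFilter.Uncommitted, null, null);
            using (var resp = (HttpWebResponse)request.GetResponse())
            {
                using (var stream = resp.GetResponseStream())
                {
                    var getBlockListResponse = new GetBlockListResponse(stream);
                    var blocks = getBlockListResponse.Blocks;
                    foreach (var block in blocks.Where(b => !b.Committed))
                    {
                        // Block names come back Base64 encoded; decode them to get the original ids.
                        uncommittedBlockIds.Add(Encoding.UTF8.GetString(Convert.FromBase64String(block.Name)));
                    }
                }
            }
            return uncommittedBlockIds;
        }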

A few things to keep in mind here:

Fetch uncommitted blocks to see block id length

You could also fetch the list of uncommitted blocks just to find out the length of the block ids already in use, and then use that same length for the block ids in your new upload session. The code snippet above returns this information.
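If you’re calling the REST API directly rather than going through the library’s protocol classes, a sketch like the following could extract the same information from the “Get Block List” response body. It assumes you have already fetched the XML (with comp=blocklist and blocklisttype=uncommitted), that the body follows the BlockList / UncommittedBlocks / Block / Name layout of that operation, and that the ids decode to plain text, as zero-padded integers do. The type and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Sketch, assuming blockListXml holds the body returned by the REST
// "Get Block List" operation. All names here are hypothetical.
public static class UncommittedBlockInspector
{
    // Reads <BlockList><UncommittedBlocks><Block><Name> entries and
    // decodes each Base64 name back to the original block id.
    public static List<string> GetUncommittedBlockIds(string blockListXml)
    {
        XDocument doc = XDocument.Parse(blockListXml);
        return doc.Descendants("UncommittedBlocks")
                  .Elements("Block")
                  .Select(block => (string)block.Element("Name"))
                  .Select(name => Encoding.UTF8.GetString(Convert.FromBase64String(name)))
                  .ToList();
    }

    // Returns the length shared by all ids, or null when there are no
    // uncommitted blocks. Throws if the ids disagree on length.
    public static int? GetCommonIdLength(IEnumerable<string> blockIds)
    {
        List<int> lengths = blockIds.Select(id => id.Length).Distinct().ToList();
        if (lengths.Count == 0)
        {
            return null;
        }
        if (lengths.Count > 1)
        {
            throw new InvalidOperationException("Uncommitted block ids have mixed lengths.");
        }
        return lengths[0];
    }

    // Builds a zero-padded, Base64-encoded id of the given width so new
    // blocks match the length of the ones already on the service.
    public static string ToBlockId(int index, int width)
    {
        string padded = index.ToString("d" + width);
        if (padded.Length != width)
        {
            // e.g. a negative index, or an index with more digits than width
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(padded));
    }
}
```

For example, if GetCommonIdLength reports 7, you would generate the new session’s ids with ToBlockId(i, 7) instead of switching to a different width.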

Upload another blob with same name without splitting it into blocks

You could also upload another blob with the same name without splitting it into blocks. It can even be a zero-byte blob. That way, your uncommitted block list will be wiped clean. You can then delete that dummy blob and re-upload the actual blob.

A Few Words About Blocks

Since we’re talking about blocks, I thought it might be useful to mention a few points about them:

  • Blocks and block-related operations apply only to “Block Blobs”. Duh!! You’ll get an error if you try to perform these operations on a “Page Blob”.
  • For uploading large blobs, it is recommended that you split your blob into blocks. In fact, if your blob is larger than 64 MB, you have to split it into blocks.
  • The minimum size of a block is 1 byte and the maximum size is 4 MB. Choose a block size based on your internet connectivity and the number of parallel threads you want to use to upload the blocks.
  • A blob can be split into a maximum of 50,000 blocks. It’s important to remember this limit, because otherwise you’ll only be reminded of it when you try to upload the 50,001st block.
  • All block ids must have the same length. So if you’re using an integer value as the block id, make sure you pad it with leading “0”s so that every id has the same length, e.g. intValue.ToString("d6").
  • When passing the block id as a parameter, it must be Base64 encoded.
  • The order in which the blocks are uploaded is not important, but the order in which you commit the block list is, because that’s when the service constructs the blob. For example, let’s say you’re uploading a blob by splitting it into 5 blocks (with ids “000001”, “000002”, “000003”, “000004”, and “000005”). You could upload these blocks in any order (say 000004, 000001, 000003, 000005, 000002), but when you commit the block list, make sure the block ids are passed in the proper order, i.e. 000001, 000002, 000003, 000004, 000005.
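The points above can be turned into a quick planning step before any upload starts. The sketch below uses hypothetical names and takes its limits from this list, so adjust them if the limits that apply to you differ. It splits a blob of a given length into fixed-size blocks, rejects plans that break the 4 MB or 50,000-block limits, and assigns each block a zero-padded, Base64-encoded id. Under these limits, the largest blob you can assemble is 50,000 × 4 MB = 200,000 MB, roughly 195 GB.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// One planned block: its Base64 id plus the byte range it covers.
public sealed class PlannedBlock
{
    public string BlockId { get; }
    public long Offset { get; }
    public int Length { get; }

    public PlannedBlock(string blockId, long offset, int length)
    {
        BlockId = blockId;
        Offset = offset;
        Length = length;
    }
}

// Hypothetical planner; the limits mirror the ones listed in this post.
public static class BlockPlanner
{
    public const int MaxBlockSize = 4 * 1024 * 1024; // 4 MB
    public const int MaxBlockCount = 50000;
    // Fixed width, so every upload session uses ids of the same length.
    public const int IdWidth = 6;

    public static List<PlannedBlock> Plan(long blobLength, int blockSize)
    {
        if (blockSize < 1 || blockSize > MaxBlockSize)
        {
            throw new ArgumentOutOfRangeException(nameof(blockSize));
        }
        if (blobLength < 1)
        {
            // A zero-byte blob does not need to be split into blocks.
            throw new ArgumentOutOfRangeException(nameof(blobLength));
        }

        long blockCount = (blobLength + blockSize - 1) / blockSize; // ceiling division
        if (blockCount > MaxBlockCount)
        {
            throw new InvalidOperationException(
                $"{blockCount} blocks needed but the limit is {MaxBlockCount}; use a larger block size.");
        }

        var blocks = new List<PlannedBlock>((int)blockCount);
        for (int i = 0; i < blockCount; i++)
        {
            long offset = (long)i * blockSize;
            int length = (int)Math.Min(blockSize, blobLength - offset);
            // Ids start at 1 ("000001"), matching the example in the list above.
            string padded = (i + 1).ToString("d" + IdWidth);
            string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(padded));
            blocks.Add(new PlannedBlock(blockId, offset, length));
        }
        // List order is the order in which the ids must be committed.
        return blocks;
    }
}
```

The id width is deliberately fixed rather than derived from the block count. If it were derived, re-running an interrupted upload with a different block size could change the id length, which is exactly the situation that triggers this error. The blocks can be uploaded in any order, even in parallel, as long as the ids are committed in list order.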

Summary

That’s it for this post. I hope you’ve found this information useful. I spent a considerable amount of time trying to fix this problem, so I hope this post helps some folks out. As always, if you find any issues with the post, please let me know and I’ll fix them ASAP.

Happy Coding!

