
Understanding Windows Azure Diagnostics Costs And Some Ways To Control It

In this blog post, we’ll try to understand the costs associated with Windows Azure Diagnostics (WAD) and look at some of the things we can do to keep them down.

Brief Introduction

Let’s take a moment and talk briefly about WAD, especially around how its data is stored. If you’re familiar with WAD and its data store, please feel free to skip this section.

Essentially, Windows Azure Storage (Tables and Blobs) is used to store the WAD data collected by your application. The following table summarizes the tables and blob containers used for storing WAD data:

  • WADLogsTable: Table to store application tracing data.
  • WADDiagnosticInfrastructureLogsTable: Table to store diagnostics infrastructure data collected by Windows Azure.
  • WADPerformanceCountersTable: Table to store performance counter data.
  • WADWindowsEventLogsTable: Table to store Windows event log data.
  • WADDirectoriesTable: Pointer table for some of the diagnostics data stored in blob storage.
  • wad-iis-logfiles: Blob container to store IIS logs.
  • wad-iis-failedrequestlogfiles: Blob container to store IIS failed request logs.
  • wad-crash-dumps: Blob container to store crash dump data.
  • wad-control-container: Blob container to store WAD configuration data.

In this blog post, we will focus only on tables.

Understanding Overall Costing

Now let’s take a moment and understand how you’re charged. Since the data is stored in Windows Azure Storage, there are two main components:

Storage Costs

This is the cost of storing the data. Since the data is stored in the form of entities in the tables mentioned above, it is possible to calculate the storage size. The formula for calculating the size of an entity is:

4 bytes + Len (PartitionKey + RowKey) * 2 bytes + For-Each Property(8 bytes + Len(Property Name) * 2 bytes + Sizeof(.Net Property Type))

Where the Sizeof(.Net Property Type) for the different types is:

  • String – # of Characters * 2 bytes + 4 bytes for length of string
  • DateTime – 8 bytes
  • GUID – 16 bytes
  • Double – 8 bytes
  • Int – 4 bytes
  • INT64 – 8 bytes
  • Bool – 1 byte
  • Binary – sizeof(value) in bytes + 4 bytes for length of binary array

Reference: http://blogs.msdn.com/b/avkashchauhan/archive/2011/11/30/how-the-size-of-an-entity-is-caclulated-in-windows-azure-table-storage.aspx
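To make the formula concrete, here is a minimal sketch in C# (the helper is mine, not part of any SDK) that computes an entity’s size from its PartitionKey, RowKey and properties:

using System.Collections.Generic;

static class TableEntitySize
{
    // 8 bytes of per-property overhead + 2 bytes per character of the
    // property name + the size of the value (per the list above).
    public static int PropertySize(string name, int valueSize)
    {
        return 8 + 2 * name.Length + valueSize;
    }

    // 4 bytes of fixed overhead + 2 bytes per character of PartitionKey
    // and RowKey + the size of each property.
    public static int EntitySize(string partitionKey, string rowKey,
                                 IEnumerable<KeyValuePair<string, int>> propertySizes)
    {
        int size = 4 + 2 * (partitionKey.Length + rowKey.Length);
        foreach (var p in propertySizes)
            size += PropertySize(p.Key, p.Value);
        return size;
    }
}

// e.g. a String property's value size is 2 * length + 4 bytes, so:
// PropertySize("Message", 2 * "Worker Role is working.: Working".Length + 4) == 90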

At the time of writing this blog post, the cost of storing 1 GB of data in Windows Azure Storage was:

$0.125 – Geo redundant storage

$0.093 – Locally redundant storage

Transaction Costs

This is the cost of inserting records into Windows Azure Table Storage. WAD makes use of entity group transactions, and the PartitionKey for WAD tables represents date/time (in UTC) at minute precision. What that means is that for each minute of diagnostics data stored in table storage, you incur the charge for a single transaction. This is based on the assumption that:

  1. You’re not collecting more than 100 data points per minute, because there’s a limit of 100 entities per entity group transaction. E.g. if you’re collecting 5 performance counters every second, you’re collecting 300 data points per minute, and WAD would need to perform 3 transactions to transfer that data.
  2. The total payload size is less than 4 MB, because of the size limit on an entity group transaction. E.g. if a WAD entity is, say, 1 MB in size and you have 10 such entities per minute, the total payload is 10 MB, and WAD would need to perform 3 transactions to transfer it. (A small sketch of this arithmetic follows the list.)
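
Here is a minimal sketch of that arithmetic (the helper and its numbers are illustrative, not part of the WAD API):

using System;

static class WadTransactionEstimator
{
    // Entity group transactions are capped at 100 entities and roughly 4 MB
    // of payload, so a minute's worth of data needs whichever bound is worse.
    public static int TransactionsPerMinute(int dataPointsPerMinute, double payloadMB)
    {
        int byCount = (int)Math.Ceiling(dataPointsPerMinute / 100.0);
        int bySize = (int)Math.Ceiling(payloadMB / 4.0);
        return Math.Max(byCount, bySize);
    }
}

// WadTransactionEstimator.TransactionsPerMinute(300, 0.2) == 3  (5 counters sampled every second)
// WadTransactionEstimator.TransactionsPerMinute(10, 10.0) == 3  (ten 1 MB entities in a minute)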

At the time of writing this blog post, the cost of performing 100,000 transactions against your storage account was $0.01.

Bandwidth Costs

There is also a bandwidth cost, but we will not consider it in our calculations because I’m assuming your compute instances and your diagnostics storage account are co-located in the same data center (or even in the same affinity group), and you don’t pay for bandwidth unless the data goes out of the data center.

Storage Cost Calculator

Now let’s take some sample data and calculate how much it would cost us to store just that data. You can then extrapolate from it to estimate your total storage costs.

Since all tables have different attributes, we will take each table separately.

WADLogsTable

Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes)
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22
EventId | Int32 | 0 | 2 * (7) + 4 + 8 | 26
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18
Tid | Int32 | 2128 | 2 * (3) + 4 + 8 | 18
Message | String | Worker Role is working.: Working | 2 * (7) + 2 * 32 + 4 + 8 | 90
Overhead | | | | 4
Total / Entity | | | | 644

So if I am writing the following line of code once per second:

    Trace.WriteLine("Worker Role is working.: Working");

This is the cost I’m incurring in a month from a storage cost point of view:

644 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.125/GB = $0.19.

From a storage transaction point of view, the total cost I’m incurring in a month would be:

1 entity batch transaction for a minute * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.01/100000 = $0.004.
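
If you want to run these numbers yourself, here is a small back-of-the-envelope helper (the class and method names are mine; the prices are the ones quoted above, and a GB is taken as 2^30 bytes, which is what the calculations in this post assume):

using System;

static class WadCostEstimator
{
    const double GeoRedundantPricePerGB = 0.125;   // price at the time of writing
    const double PricePer100KTransactions = 0.01;  // price at the time of writing

    // Monthly storage cost of writing entities of a given size at a given rate.
    public static double MonthlyStorageCost(int bytesPerEntity, int entitiesPerSecond)
    {
        double bytesPerMonth = (double)bytesPerEntity * entitiesPerSecond * 60 * 60 * 24 * 30;
        return bytesPerMonth / (1024.0 * 1024 * 1024) * GeoRedundantPricePerGB;
    }

    // Monthly transaction cost given the number of batch transactions per minute.
    public static double MonthlyTransactionCost(int transactionsPerMinute)
    {
        return (double)transactionsPerMinute * 60 * 24 * 30 / 100000 * PricePer100KTransactions;
    }
}

// WadCostEstimator.MonthlyStorageCost(644, 1)  ~ $0.19
// WadCostEstimator.MonthlyTransactionCost(1)   ~ $0.004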

WADDiagnosticInfrastructureLogsTable

Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes)
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18
Tid | Int32 | 2245 | 2 * (3) + 4 + 8 | 18
Function | String | XTableConnection::PushOutMessages | 2 * (8) + 2 * 33 + 4 + 8 | 94
Line | Int32 | 969 | 2 * (4) + 4 + 8 | 20
MDRESULT | Int32 | 327705 | 2 * (8) + 4 + 8 | 28
ErrorCodeMsg | String | Some error message. | 2 * (12) + 2 * 19 + 4 + 8 | 74
Message | String | a ~1,980-character batch request trace (reproduced below) | 2 * (7) + 2 * 1980 + 4 + 8 | 3986
Overhead | | | | 4
Total / Entity | | | | 4730

The sample Message value, reproduced here to show just how verbose these logs can get:

Successfully sent out data (1 messages), "--batch_36522ad7-fc75-4b56-8c71-56071383e000
Content-Type: multipart/mixed; boundary=changeset_77162fcd-b8da-41ac-a9f8-9357efbbd000
Content-Length: 1811

--changeset_77162fcd-b8da-41ac-a9f8-9357efbbd000
Content-Type: application/http
Content-Transfer-Encoding:binary

POST https://cerebrataqa.table.core.windows.net//WADDiagnosticInfrastructureLogsTable HTTP/1.1
Host: https://cerebrataqa.table.core.windows.net/
Content-ID: 1
Content-Type: application/atom+xml; type=entry
Content-Length: 1329
x-ms-date: Fri, 23 Apr 2010 07:24:09 GMT

<?xml version="1.0" encoding="utf-8" standalone="yes"?><entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom"><title /><updated>2010-04-23T07:22:00.711Z</updated><author><name /></author><id /><content type="application/xml"><m:properties><d:PartitionKey>0634076041200000000</d:PartitionKey><d:RowKey>0851db6534634c1fbce8496c6c57a2e3___WorkerRole1___WorkerRole1_IN_0___0000000001735393281</d:RowKey><d:EventTickCount m:type="Edm.Int64">634076041207111524</d:EventTickCount><d:DeploymentId m:type="Edm.String">0851db6534634c1fbce8496c6c57a2e3</d:DeploymentId><d:Role m:type="Edm.String">WorkerRole1</d:Role><d:RoleInstance m:type="Edm.String">WorkerRole1_IN_0</d:RoleInstance><d:Level m:type="Edm.Int32">4</d:Level><d:Pid m:type="Edm.Int32">1400</d:Pid><d:Tid m:type="Edm.Int32">2712</d:Tid><d:Function m:type="Edm.String"></d:Function><d:Line m:type="Edm.Int32">0</d:Line><d:MDRESULT m:type="Edm.Int32">0</d:MDRESULT><d:ErrorCode m:type="Edm.Int32">0</d:ErrorCode><d:ErrorCodeMsg m:type="Edm.String"></d:ErrorCodeMsg><d:Message m:type="Edm.String">Checking for configuration updates 04/23/2010 07:21:55.; TraceSource 'Microsoft.WindowsAzure.Diagnostics' event</d:Message></m:properties></content></entry>
--changeset_77162fcd-b8da-41ac-a9f8-9357efbbd000--
--batch_36522ad7-fc75-4b56-8c71-5607138

Again, if I’m writing one of these records per second, this is the cost I’m incurring in a month from a storage cost point of view:

4730 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.125/GB = $1.43.

WADPerformanceCountersTable

Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes)
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68
CounterName | String | \Processor(_Total)\% Processor Time | 2 * (11) + 2 * 35 + 4 + 8 | 104
CounterValue | Double | 0.173347 | 2 * (12) + 8 + 8 | 40
Overhead | | | | 4
Total / Entity | | | | 614

If my sampling rate is once per second, this is the cost I’m incurring in a month from a storage cost point of view:

614 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.125/GB = $0.185.

However, we normally capture more than one performance counter. If we’re capturing, say, 5 performance counters at the same rate, the storage cost goes up 5 times, to approximately $0.93.

WADWindowsEventLogsTable

Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes)
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68
ProviderGuid | String | {555908D1-A6D7-4695-8E1E-26931D2012F4} | 2 * (12) + 2 * 38 + 4 + 8 | 112
ProviderName | String | Service Control Manager | 2 * (12) + 2 * 23 + 4 + 8 | 82
EventId | Int32 | 0 | 2 * (7) + 4 + 8 | 26
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18
Tid | Int32 | 2128 | 2 * (3) + 4 + 8 | 18
Channel | String | System | 2 * (7) + 2 * 6 + 4 + 8 | 38
RawXml | String | a ~676-character event XML (reproduced below) | 2 * (6) + 2 * 676 + 4 + 8 | 1376
Overhead | | | | 4
Total / Entity | | | | 2162

The sample RawXml value:

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Service Control Manager' Guid='{555908D1-A6D7-4695-8E1E-26931D2012F4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2010-06-20T06:35:04.000Z'/><EventRecordID>2237</EventRecordID><Correlation/><Execution ProcessID='0' ThreadID='0'/><Channel>System</Channel><Computer>RD00155D317B6C</Computer><Security/></System><EventData><Data Name='param1'>SLUINotify</Data><Data Name='param2'>stopped</Data></EventData></Event>

Assuming that I’m writing one of these records per second, this is the cost I’m incurring in a month from a storage cost point of view:

2162 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.125/GB = $0.65.

WADDirectoriesTable

Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes)
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68
AbsolutePath | String | C:\Users\Gaurav.Mantri\AppData\Local\dftmp\s0\deployment(60)\res\deployment(60).WAD_Basic.WADDemo.0\directory\DiagnosticStore\LogFiles\W3SVC1\u_ex11011913.log | 2 * (12) + 2 * 158 + 4 + 8 | 352
RelativePath | String | deployment(60)/WADDemo/deployment(60).WAD_Basic.WADDemo.0/W3SVC1\u_ex11011913.log | 2 * (12) + 2 * 81 + 4 + 8 | 198
Container | String | wad-iis-logfiles | 2 * (9) + 2 * 16 + 4 + 8 | 62
RootDirectory | String | C:\Users\Gaurav.Mantri\AppData\Local\dftmp\s0\deployment(60)\res\deployment(60).WAD_Basic.WADDemo.0\directory\DiagnosticStore\LogFiles | 2 * (13) + 2 * 134 + 4 + 8 | 306
Overhead | | | | 4
Total / Entity | | | | 1388

Assuming that I’m writing one of these records per second, this is the cost I’m incurring in a month from a storage cost point of view:

1388 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) * $0.125/GB = $0.42.

Some Considerations

At first glance these amounts look pretty small, and in the grand scheme of things they are. But hey, a penny saved is a penny earned!

A few things need to be kept in mind:

  • This is a very simplistic scenario. In real code, you will probably have more data, especially in the message (or similar) attributes.
  • This is for a single instance. These amounts multiply by the number of instances you have.
  • These are not one-time charges. If you don’t do anything with this data, you will keep incurring these charges month after month.

Controlling Costs

As I said above, a penny saved is a penny earned. So let’s talk about some ways you can control these costs. These are the things I could think of; if you have additional ways to control them, please feel free to share.

Transfer only the data you need for analysis

WAD gives you the flexibility to collect a lot of diagnostics data but transfer only a selected subset of it to storage. If you are configuring WAD through .NET code, you can specify both what kind of data you wish to collect and what kind of data you wish to persist in Windows Azure Storage. For example, with trace, diagnostics infrastructure and event logs, you could collect everything but choose to transfer only entries of level Error and above:

// Filter what will be sent to persistent storage.
// (Assuming the usual WAD setup in a role's OnStart: get the default
// configuration, adjust it, then start the diagnostic monitor with it.)
var config = DiagnosticMonitor.GetDefaultInitialConfiguration();
var myLogLevel = LogLevel.Error;
config.Logs.ScheduledTransferLogLevelFilter = myLogLevel;
config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = myLogLevel;
config.WindowsEventLog.ScheduledTransferLogLevelFilter = myLogLevel;
DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);


Obviously, a lot of this depends on what stage your application is in (in other words, how stable it is) and what kind of data is critical for you.

Go easy on performance counters!

One may get carried away and start capturing every possible performance counter at a very high sampling rate. Do think before you do that. Capturing a lot of performance counter data at a high sampling rate puts strain on your VM, and unlike the other logs, you don’t get the flexibility to transfer only selected performance counters: whatever you collect will get persisted in Windows Azure Storage.
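
For example, here is a minimal sketch (using the same DiagnosticMonitorConfiguration object as above; the counter, sample rate and transfer period are just illustrative choices) that samples a single counter every 30 seconds instead of every second:

// Sample one counter every 30 seconds rather than every second, and push
// the collected data to table storage every 5 minutes.
config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
{
    CounterSpecifier = @"\Processor(_Total)\% Processor Time",
    SampleRate = TimeSpan.FromSeconds(30)
});
config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

Going from a 1-second to a 30-second sample rate cuts the number of performance counter entities, and hence the storage and transactions for them, by a factor of 30.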

Keep an eye on Diagnostics Infrastructure Logs!

They can be the curveball (or googly, as we say in cricket) you never expected. These logs are created automatically by Windows Azure, i.e. you don’t write to them, Windows Azure does, and they’re usually quite verbose, as we saw in the example above.

Clean up occasionally

It may not hurt to clean up diagnostics data once in a while. There are two approaches to cleaning up the data:

Partial deletes

This means deleting some of the old data, e.g. data that is more than a month old. You could use one of the existing tools to do that; I know all Cerebrata tools are capable of this and are optimized to delete diagnostics data specifically. You could also write your own tool, which would fetch the diagnostics data for the date range you specify and then delete it. If you do end up writing your own tool, just keep in mind NOT to fetch the data based on the Timestamp or EventTickCount attribute; always use the PartitionKey. I wrote a blog post some time ago about fetching diagnostics data efficiently, which you can read here:

https://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/.

Partial deletes are usually cumbersome in the sense that they are more time consuming, and since you’re doing multiple reads and deletes, you incur transaction and bandwidth costs. However, they give you greater flexibility.
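
Here is a minimal sketch of what such a tool could look like, assuming the Windows Azure Storage client library (the 2.x Microsoft.WindowsAzure.Storage package) and the PartitionKey convention described in the post linked above ("0" followed by the UTC tick count). Treat it as a starting point rather than a finished tool:

using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

static class WadCleanup
{
    public static void DeleteEntitiesOlderThan(string connectionString, string tableName, DateTime cutoffUtc)
    {
        CloudTable table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference(tableName);

        // WAD partition keys are "0" + UTC ticks, so a lexical comparison on
        // PartitionKey is also a chronological one (and, unlike a Timestamp
        // filter, a PartitionKey filter is efficient to execute).
        string cutoffKey = "0" + cutoffUtc.Ticks;
        var query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, cutoffKey));

        // Entities sharing a PartitionKey can be deleted 100 at a time in a
        // single entity group transaction.
        foreach (var partition in table.ExecuteQuery(query).GroupBy(e => e.PartitionKey))
        {
            foreach (var chunk in partition.Select((e, i) => new { e, i }).GroupBy(x => x.i / 100))
            {
                var batch = new TableBatchOperation();
                foreach (var item in chunk)
                    batch.Delete(item.e);
                table.ExecuteBatch(batch);
            }
        }
    }
}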

Complete deletes

Complete deletion simply means deleting a table entirely. For example, if you use Cloud Storage Studio, you could simply right click on one of these tables and hit delete. Seems simple and rather painless, doesn’t it? Unfortunately, that’s not the case. The time it takes to actually delete a table depends on the amount of data in it; it may take a few seconds, or it may take a few days.

Here’s my theory on this: when you delete a table, Windows Azure marks the table as “Deleted” so that further operations can’t be performed on it; however, the table is not deleted immediately. Instead, it is handed over to some garbage collector process which deletes the table at its own pace, and you have no control over that.

What’s worse, while the table is being “Deleted”, any attempt to recreate it results in an error. So if your application is constantly writing data to the WAD tables, those writes will fail.

Some folks have recommended switching the diagnostics storage account before you do this. That way, the storage account you’re deleting tables from is no longer in active use by any of your applications, the diagnostics data keeps flowing into a separate storage account, and you don’t lose any information.

Switch to locally redundant storage for diagnostics data

Since locally redundant storage is about 25% cheaper than geo-redundant storage, you may be able to cut your storage costs by roughly that much. That said, this decision should be based not only on cost but also on your availability requirements.

Summary

These were some of my thoughts on the costing aspects of WAD and how you can control them. Feel free to pitch in if you have more information to share. As always, if you find any issues with this blog post, please let me know ASAP and I will fix them.

So long and stay tuned!!!

