In this blog post, we’ll try to understand the costs associated with Windows Azure Diagnostics (WAD) and some of the things we can do to keep them down.
Brief Introduction
Let’s take a moment and talk briefly about WAD, especially around how its data is stored. If you’re familiar with WAD and its data store, please feel free to skip this section.
Essentially, Windows Azure Storage (Tables and Blobs) is used to store the WAD data collected by your application. The following table summarizes the tables and blob containers used for storing WAD data:
Table / Blob Container Name | Purpose |
WADLogsTable | Table to store application tracing data. |
WADDiagnosticInfrastructureLogsTable | Table to store diagnostics infrastructure data collected by Windows Azure. |
WADPerformanceCountersTable | Table to store performance counters data. |
WADWindowsEventLogsTable | Table to store event logs data. |
WADDirectoriesTable | Pointer table for some of the diagnostics data stored in blob storage. |
wad-iis-logfiles | Blob container to store IIS logs. |
wad-iis-failedrequestlogfiles | Blob container to store IIS failed request logs. |
wad-crash-dumps | Blob container to store crash dump data. |
wad-control-container | Blob container to store WAD configuration data. |
In this blog post, we will focus only on tables.
Understanding Overall Costing
Now let’s take a moment and understand how you’re charged. Since the data is stored in Windows Azure Storage, there are two main components (plus bandwidth, which we’ll touch on briefly):
Storage Costs
This is the cost of storing the data. Since the data is stored in the form of entities in the tables mentioned above, it is possible to calculate the storage size. The formula for calculating the storage size of an entity is:
4 bytes + Len (PartitionKey + RowKey) * 2 bytes + For-Each Property(8 bytes + Len(Property Name) * 2 bytes + Sizeof(.Net Property Type))
where Sizeof(.Net Property Type) for the different types is:
- String – # of Characters * 2 bytes + 4 bytes for length of string
- DateTime – 8 bytes
- GUID – 16 bytes
- Double – 8 bytes
- Int32 – 4 bytes
- Int64 – 8 bytes
- Bool – 1 byte
- Binary – sizeof(value) in bytes + 4 bytes for length of binary array
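To make the formula concrete, here is a minimal sketch (a hypothetical helper, not part of any SDK) that computes an entity’s billable size; propertySizes maps each property name to the Sizeof(.Net Property Type) value from the list above:

using System.Collections.Generic;

static int EntitySizeInBytes(string partitionKey, string rowKey, IDictionary<string, int> propertySizes)
{
    // 4 bytes of overhead + 2 bytes per character of PartitionKey and RowKey.
    int size = 4 + (partitionKey.Length + rowKey.Length) * 2;
    // Per property: 8 bytes + 2 bytes per character of the name + the value size.
    foreach (var property in propertySizes)
        size += 8 + property.Key.Length * 2 + property.Value;
    return size;
}

Plugging in the WADLogsTable sample entity below reproduces the per-entity total shown in its table.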
At the time of writing this blog post, the monthly cost of storing 1 GB of data in Windows Azure Storage was:
- $0.125 – geo-redundant storage
- $0.093 – locally redundant storage
Transaction Costs
This is the cost of inserting records into Windows Azure Table Storage. WAD makes use of entity group transactions, and the PartitionKey for WAD tables actually represents date/time (in UTC) at minute precision. That means that for each minute of diagnostics data stored in table storage, you incur the charge for a single transaction. This is based on two assumptions:
- You’re not collecting more than 100 data points per minute, because there’s a limit of 100 entities per entity group transaction. E.g. if you’re collecting 5 performance counters every second, you’re collecting 300 data points per minute; to transfer this data, WAD would need to perform 3 transactions.
- The total payload size is less than 4 MB, because of the size limitation on an entity group transaction. E.g. if a WAD entity is, say, 1 MB in size and you have 10 such WAD entities per minute, the total payload is 10 MB; to transfer this data, WAD would need to perform 3 transactions.
At the time of writing this blog post, the cost of performing 100,000 transactions against your storage account was $0.01.
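A rough sketch of the per-minute transaction count implied by the two limits above (assuming roughly uniformly sized entities):

using System;

static int TransactionsPerMinute(int dataPointsPerMinute, double payloadSizeMB)
{
    int byEntityCount = (dataPointsPerMinute + 99) / 100;       // at most 100 entities per batch
    int byPayloadSize = (int)Math.Ceiling(payloadSizeMB / 4.0); // at most 4 MB per batch
    return Math.Max(byEntityCount, byPayloadSize);
}
// TransactionsPerMinute(300, 0.2) == 3 (first example above)
// TransactionsPerMinute(10, 10.0) == 3 (second example above)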
Bandwidth Costs
There are also bandwidth costs, but we won’t consider them in our calculations: I’m assuming your compute instances and your diagnostics storage account are co-located in the same data center (even in the same affinity group), and you don’t pay for bandwidth unless the data goes out of the data center.
Storage Cost Calculator
Now let’s take some sample data and calculate how much it would cost to store just that data. You can then extrapolate to calculate your total storage costs. Since the tables have different attributes, we will take each table separately.
WADLogsTable
Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes) |
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38 |
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174 |
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44 |
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100 |
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42 |
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68 |
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22 |
EventId | Int32 | 0 | 2 * (7) + 4 + 8 | 26 |
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18 |
Tid | Int32 | 2128 | 2 * (3) + 4 + 8 | 18 |
Message | String | Worker Role is working.: Working | 2 * (7) + 2 * 32 + 4 + 8 | 90 |
Overhead | | | | 4 |
Total / Entity | | | | 644 |
So if I am writing the following line of code once per second:
Trace.WriteLine("Worker Role is working.: Working");
this is the storage cost I’m incurring in a month:
644 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) ≈ 1.55 GB; 1.55 GB * $0.125/GB ≈ $0.19.
From a storage transactions point of view, the total cost I’m incurring in a month would be:
1 entity group transaction per minute * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) = 43,200 transactions; 43,200 * $0.01 per 100,000 transactions ≈ $0.004.
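The same arithmetic in code, if you want to plug in your own entity sizes and rates (prices as quoted earlier):

// Monthly cost for one 644-byte entity written every second.
long bytesPerEntity = 644;
long entitiesPerMonth = 60L * 60 * 24 * 30;                          // one entity per second
double storageGB = bytesPerEntity * entitiesPerMonth / (1024.0 * 1024 * 1024);
double storageCost = storageGB * 0.125;                              // ≈ $0.19 (geo-redundant)
double transactionCost = (60.0 * 24 * 30) * 0.01 / 100000;           // ≈ $0.004 (one batch per minute)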
WADDiagnosticInfrastructureLogsTable
Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes) |
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38 |
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174 |
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44 |
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100 |
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42 |
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68 |
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22 |
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18 |
Tid | Int32 | 2245 | 2 * (3) + 4 + 8 | 18 |
Function | String | XTableConnection::PushOutMessages | 2 * (8) + 2 * 33 + 4 + 8 | 94 |
Line | Int32 | 969 | 2 * (4) + 4 + 8 | 20 |
MDRESULT | Int32 | 327705 | 2 * (8) + 4 + 8 | 28 |
ErrorCodeMsg | String | Some error message. | 2 * (12) + 2 * 19 + 4 + 8 | 74 |
Message | String | Successfully sent out data (1 messages), POST https://cerebrataqa.table.core. uctureLogsTable HTTP/1.1 <?xml version="1.0" encoding="utf-8" 0634076041200000000</d:PartitionKey> e8496c6c57a2e3 ___WorkerRole1___ </d:RowKey><d:EventTickCount </d:EventTickCount> (excerpt; the full message is 1,980 characters) | 2 * (7) + 2 * 1980 + 4 + 8 | 3986 |
Overhead | | | | 4 |
Total / Entity | | | | 4730 |
Again, if I’m writing one such record per second, this is the storage cost I’m incurring in a month:
4730 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) ≈ 11.42 GB; 11.42 GB * $0.125/GB ≈ $1.43.
WADPerformanceCountersTable
Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes) |
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38 |
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174 |
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44 |
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100 |
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42 |
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68 |
CounterName | String | \Processor(_Total)\% Processor Time | 2 * (11) + 2 * 35 + 4 + 8 | 104 |
CounterValue | Double | 0.173347 | 2 * (12) + 8 + 8 | 40 |
Overhead | | | | 4 |
Total / Entity | | | | 614 |
If my sampling rate is once per second, this is the storage cost I’m incurring in a month:
614 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) ≈ 1.48 GB; 1.48 GB * $0.125/GB ≈ $0.185.
However, we normally capture more than one performance counter. If we’re capturing, say, 5 performance counters at the same rate, the storage cost goes up roughly five-fold, to approximately $0.93.
WADWindowsEventLogsTable
Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes) |
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38 |
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174 |
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44 |
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100 |
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42 |
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68 |
ProviderGuid | String | {555908D1-A6D7-4695-8E1E-26931D2012F4} | 2 * (12) + 2 * 38 + 4 + 8 | 112 |
ProviderName | String | Service Control Manager | 2 * (12) + 2 * 23 + 4 + 8 | 82 |
EventId | Int32 | 0 | 2 * (7) + 4 + 8 | 26 |
Level | Int32 | 5 | 2 * (5) + 4 + 8 | 22 |
Pid | Int32 | 1652 | 2 * (3) + 4 + 8 | 18 |
Tid | Int32 | 2128 | 2 * (3) + 4 + 8 | 18 |
Channel | String | System | 2 * (7) + 2 * 6 + 4 + 8 | 38 |
RawXml | String | <Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Service Control Manager' Guid='{555908D1-A6D7-4695-8E1E-26931D2012F4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2010-06-20T06:35:04.000Z'/><EventRecordID>2237</EventRecordID><Correlation/><Execution ProcessID='0' ThreadID='0'/><Channel>System</Channel><Computer>RD00155D317B6C</Computer><Security/></System><EventData><Data Name='param1'>SLUINotify</Data><Data Name='param2'>stopped</Data></EventData></Event> | 2 * (6) + 2 * 676 + 4 + 8 | 1376 |
Overhead | | | | 4 |
Total / Entity | | | | 2162 |
Assuming I’m writing one such record per second, this is the storage cost I’m incurring in a month:
2162 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) ≈ 5.22 GB; 5.22 GB * $0.125/GB ≈ $0.65.
WADDirectoriesTable
Attribute | Data Type | Sample Data | Size (Formula) | Size (Bytes) |
PartitionKey | String | 0634012319400000000 | 2 * (19) | 38 |
RowKey | String | 43b7f32f389648639f16b55c6dcd7c4b___WorkerRole1___WorkerRole1_IN_0___0000000001652032029 | 2 * (87) | 174 |
EventTickCount | Int64 | 634012319404982640 | 2 * (14) + 8 + 8 | 44 |
DeploymentId | String | 43b7f32f389648639f16b55c6dcd7c4b | 2 * (12) + 2 * 32 + 4 + 8 | 100 |
Role | String | WorkerRole1 | 2 * (4) + 2 * 11 + 4 + 8 | 42 |
RoleInstance | String | WorkerRole1_IN_0 | 2 * (12) + 2 * 16 + 4 + 8 | 68 |
AbsolutePath | String | C:\Users\Gaurav.Mantri\AppData\Local\dftmp\s0\deployment(60)\res\deployment(60).WAD_Basic.WADDemo.0\directory\DiagnosticStore\LogFiles\W3SVC1\u_ex11011913.log | 2 * (12) + 2 * 158 + 4 + 8 | 352 |
RelativePath | String | deployment(60)/WADDemo/deployment(60).WAD_Basic.WADDemo.0/W3SVC1\u_ex11011913.log | 2 * (12) + 2 * 81 + 4 + 8 | 198 |
Container | String | wad-iis-logfiles | 2 * (9) + 2 * 16 + 4 + 8 | 62 |
RootDirectory | String | C:\Users\Gaurav.Mantri\AppData\Local\dftmp\s0\deployment(60)\res\deployment(60).WAD_Basic.WADDemo.0\directory\DiagnosticStore\LogFiles | 2 * (13) + 2 * 134 + 4 + 8 | 306 |
Overhead | | | | 4 |
Total / Entity | | | | 1388 |
Assuming again one record per second, this is the storage cost I’m incurring in a month:
1388 bytes * 60 (seconds/minute) * 60 (minutes/hour) * 24 (hours/day) * 30 (days/month) ≈ 3.35 GB; 3.35 GB * $0.125/GB ≈ $0.42.
Some Considerations
At the outset these amounts look pretty small, and in the grand scheme of things they are. But hey, a penny saved is a penny earned.
A few things need to be kept in mind:
- This is a very simplistic scenario. In actual code you will probably have more data, especially in the message (or similar) attributes.
- This is for a single instance. Depending upon the number of instances you have, these amounts will multiply by that factor.
- Note that these are not one time charges. If you don’t do anything with this data, you will keep on incurring these charges month after month.
Controlling Costs
As I said above, a penny saved is a penny earned. So let’s talk about some ways you can control these costs. These are just the things I could think of; if you have additional ways to control them, please feel free to share.
Transfer only the data you need for analysis
WAD gives you the flexibility to collect a lot of diagnostics data but transfer only a selected subset. If you’re using the .NET Diagnostics API, you can specify both what kind of data you wish to collect and what kind of data you wish to persist in Windows Azure Storage. For example, with trace/diagnostics infrastructure/event logs you could collect everything but then choose to transfer only log entries of level Error and above:
// Filter what will be sent to persistent storage.
var myLogLevel = LogLevel.Error;
config.Logs.ScheduledTransferLogLevelFilter = myLogLevel;
config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = myLogLevel;
config.WindowsEventLog.ScheduledTransferLogLevelFilter = myLogLevel;
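For context, here is a minimal sketch of where a config object like the one above typically comes from, assuming the classic DiagnosticMonitor API (the connection string setting name may differ across SDK versions):

using System;
using Microsoft.WindowsAzure.Diagnostics;

// Start from the default configuration, filter, and start the monitor.
var config = DiagnosticMonitor.GetDefaultInitialConfiguration();
config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Error;
config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);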
Obviously a lot of that depends on what stage your application is in (in other words, how stable the application is) and what kind of data is critical for you.
Go easy on performance counters!
One may get carried away and start capturing all possible performance counters at a very high sampling rate, but do think before you do that. First, capturing a lot of performance counter data at a high sampling rate puts extra strain on your VM. Second, unlike the other logs, you don’t get the flexibility to transfer only selected performance counters: whatever you collect gets persisted in Windows Azure Storage.
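If you do capture counters, a lower sample rate and a less frequent transfer period directly reduce both the data points collected and the transactions performed. A sketch, reusing the config object from the snippet above:

// Sample a single counter every 30 seconds instead of every second.
config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
{
    CounterSpecifier = @"\Processor(_Total)\% Processor Time",
    SampleRate = TimeSpan.FromSeconds(30)
});
config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);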
Keep an eye on Diagnostics Infrastructure Logs!
They can be the curveball (or googly, as we say in cricket) you never expected. These logs are created automatically by Windows Azure, i.e. you don’t write to them, Windows Azure does, and they’re usually quite verbose, as we saw in the example above.
Clean up occasionally
It may not hurt to clean up diagnostics data once in a while. There are two approaches to cleaning up the data:
Partial deletes
This would mean deleting some old data, e.g. deleting data that is more than one month old. You could use one of the existing tools to do that. I know all Cerebrata tools are capable of doing that, and they are optimized specifically for deleting diagnostics data. You could also write your own tool, which would fetch the diagnostics data for the date range you specify and then delete it. If you do end up writing your own tool, just keep in mind NOT to fetch the data using the Timestamp or EventTickCount attribute; always use the PartitionKey. I wrote a blog post sometime ago about fetching diagnostics data efficiently which you can read here:
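If you write your own tool, note that the WAD PartitionKey is just the UTC tick count (at minute precision) left-padded to 19 digits, so you can compute the PartitionKey range for any date range. A minimal sketch of a hypothetical helper:

using System;

// Hypothetical helper: compute the WAD PartitionKey for a given UTC time.
static string ToWadPartitionKey(DateTime utcTime)
{
    long minuteTicks = utcTime.Ticks - (utcTime.Ticks % TimeSpan.TicksPerMinute);
    return minuteTicks.ToString("D19"); // e.g. 0634012319400000000
}

Everything with a PartitionKey less than ToWadPartitionKey(DateTime.UtcNow.AddMonths(-1)) is then a delete candidate.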
Partial deletes are usually cumbersome: they are time consuming, and since you’re doing multiple reads and deletes, you incur transaction and bandwidth costs. However, they give you greater flexibility.
Complete deletes
Complete delete simply means deleting a table entirely. For example, if you use Cloud Storage Studio, you could simply right-click on one of these tables and hit delete. Seems simple and rather painless, doesn’t it? Unfortunately that’s not the case. The time it takes to actually delete a table depends on the amount of data in it; it may take a few seconds or it may take a few days.
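For reference, with the classic storage client library a complete delete is a single call (a sketch; connectionString is assumed to point at your diagnostics storage account):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = CloudStorageAccount.Parse(connectionString);
var tableClient = account.CreateCloudTableClient();
// Returns once the table is marked as "Deleted"; the actual cleanup happens asynchronously.
tableClient.DeleteTableIfExist("WADLogsTable");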
Here’s my theory on this: when you delete a table, Windows Azure marks that table as “Deleted” so that no further operations can be performed on it; however, the table is not deleted immediately. Instead, it is handed over to a garbage collection process which deletes the table at its own pace, and you have no control over that.
What’s worse, while the table is being “Deleted”, any attempt to recreate it results in an error. So if your application is constantly writing data to the WAD tables, those writes will fail.
Some folks have recommended switching the diagnostics storage account before you do this. That way, the storage account isn’t actively used by any of your applications while you’re deleting its tables; diagnostics data flows into a separate storage account, and you don’t lose any information.
Switch to locally redundant storage for diagnostics data
Since locally redundant storage is about 25% cheaper than geo-redundant storage, you may be able to cut your storage costs by about that much. The decision should be based not only on cost but also on your availability requirements.
Summary
These were some of my thoughts on the costing aspects of WAD and how you can control them. Feel free to pitch in if you have more information to share. As always, if you find any issues with this blog post, please let me know ASAP and I will fix them.
So long and stay tuned!!!