Every now and then, a question about searching in Azure Blob Storage comes up in one of the community forums. Folks posting these questions want to find out things like:
- Finding blobs whose names contain certain characters, e.g. all pdf or png files in blob storage.
- Finding blobs by blob type – block or page.
- Finding blobs by last modified date e.g. all blobs which have not been modified since a particular date/time.
and much more. The problem is that, unlike Azure Table Storage, Blob Storage does not have querying capabilities. To answer questions like the ones above, you would need to list the blobs in a container and then do the filtering on the client side. That is not an efficient approach, but unfortunately it had been the only option available till now. Well, not anymore :). In this blog post we will talk about how you can query Azure Blob Storage using Azure Search Service. The title is a bit of a misnomer because you are not performing the searches directly on blob storage; rather, you seed some searchable data from blob storage into Azure Search Service and then search for that data there.
To be honest, I was not going to look into it, but my good friends Vishwas Lele and Mike Martin persuaded me to look into integrating this service into the Cloud Portam application, and I'm so glad that they did. I spent a few hours over this weekend reading up on this service. In fact, I had sent a really stupid email to both of these guys asking for clarification, but as I started reading the documentation, things became clearer. The code you see below is something I wrote in about 6 hours. Yeah, it's that easy!!!
Without further ado, let’s start.
What is Azure Search Service?
Let's take a moment and talk briefly about Azure Search Service (I will be brief because I didn't understand it completely :P). Essentially this is one of the newest services introduced in the Azure platform, and it provides (no marks for guessing :)) Search as a Service. Simply put, you (as a developer) can just use this service to build search functionality that can be easily integrated into your existing applications. It includes full-text search over your content, plus advanced features like type-ahead query suggestions and relevance scoring profiles used to create a custom ranking system for search results.
The Search service is exposed via a REST API (which we will use later on in the post), so you're not restricted to any specific platform. It doesn't matter whether your application is built in PHP, Java, .Net, Ruby, or any other language: as long as the programming language you're using can consume a REST API, you can have search functionality integrated into your application.
As I mentioned above, my good friends wanted me to incorporate search capability for blob storage, and I got all confused about how I would get this service to index the contents of my blob storage. I looked up the documentation trying to find an answer to this question and initially ended up more confused, but as I kept reading, a few things became clearer. Here are some concepts related to this service which I think are important to grasp:
- Azure Search Service is not a crawler. It is very important to understand this. It will not go and crawl the contents of your data store. Instead, you need to push the content you want to make searchable into the service; it will then index that data and provide a REST interface to query it.
- An Index is an entity in Azure Search Service which defines the search behavior. The data which you want to be searched is constrained by this index. An index can have many fields, and each field can have many properties like name, data type, etc.
- A Document is an entity in Azure Search Service which holds the data. It is important to understand that the data is not a blob; in other words, you can't have unstructured documents in this service. A document's structure is constrained by the index containing it. A document can have many properties; however, the property names/data types must match the fields of the index.
The Azure team has done a great job providing quality documentation. I would strongly encourage you to read about this service on MSDN: http://msdn.microsoft.com/en-us/library/azure/dn798933.aspx.
Problem Statement
Now that the basic introduction of Azure Search Service is done, it's time to dig into the code. But before we do that, let's take a moment and define our problem statement (in other words, what we're trying to achieve):
- We want our blobs to be searchable by blob name, container name, blob type, and blob's content type.
- For this demo, we only want to do simple text searches, though you can use this service to perform complex searches like "search for page blobs which are greater than 1 GB in size and have not been modified since last month". These kinds of complex searches are done via OData queries (a sample filter follows this list).
- Our blob content will not be searchable.
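To give a flavor of the syntax, a complex search like the one above could be expressed roughly like this with an OData $filter against the fields we will define below (this is an illustrative sketch, not code from this post's download; see the REST API documentation for the exact grammar):
```
$filter=BlobType eq 'Page' and Size gt 1073741824 and LastModified lt 2014-08-01T00:00:00Z
```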
Implementation
Now that the problem statement is defined, it's time to get our hands dirty and start implementing the code. We will walk through the steps needed to accomplish our goal.
Step 1: Create Search Service
The first step is to create a search service, if you have not done so already. I will not go through the process here as it has been covered in depth here: http://azure.microsoft.com/en-us/documentation/articles/search-configure/. There are 3 things we will need once the search service has been created in the portal:
- Your search service endpoint. For example, for my application I created a search service by the name “blob-search” thus my endpoint (without https) would be “blob-search.search.windows.net”.
- API key. It could be the primary key or the secondary key; I used the primary key in my application. This is used for performing management operations like creating an index, uploading documents, etc.
- Query Key. This is used for performing query operations.
Once we have all this information available, we can move on to the next step. Since we're going to consume the REST API directly, I wrote some helper classes which we will use throughout the application. Here's the code for those classes:
RequestResponseHelper.cs
Responsible for performing HTTP Web Requests and processing web responses.
```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Threading.Tasks;

namespace SearchServiceHelper
{
    /// <summary>
    /// Helper class to perform HTTP Web Requests and process Web Responses.
    /// </summary>
    public static class RequestResponseHelper
    {
        public static async Task<WebResponse> ProcessRequest(Uri requestUri, HttpMethod httpMethod, string contentType, NameValueCollection requestHeaders, byte[] requestBody)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
            request.Method = httpMethod.ToString();
            request.Host = SystemSettings.Endpoint;
            switch (httpMethod)
            {
                // Requests without a body must explicitly set a zero content length.
                case HttpMethod.GET:
                case HttpMethod.HEAD:
                case HttpMethod.DELETE:
                    request.ContentLength = 0;
                    break;
                default:
                    if (requestBody != null)
                    {
                        request.ContentLength = requestBody.Length;
                    }
                    break;
            }
            if (!string.IsNullOrWhiteSpace(contentType))
            {
                request.ContentType = contentType;
            }
            if (requestHeaders != null)
            {
                request.Headers.Add(requestHeaders);
            }
            return await ProcessHttpRequest(request, requestBody);
        }

        private static async Task<WebResponse> ProcessHttpRequest(HttpWebRequest request, byte[] requestBody)
        {
            // Write the request body (if any) before fetching the response.
            if (requestBody != null)
            {
                using (Stream s = request.GetRequestStream())
                {
                    s.Write(requestBody, 0, requestBody.Length);
                }
            }
            return await request.GetResponseAsync();
        }
    }

    public enum HttpMethod
    {
        GET,
        HEAD,
        POST,
        PUT,
        MERGE,
        DELETE
    }
}
```
SystemSettings.cs
Contains some static properties whose values are read from the application's configuration file.
```csharp
using System.Configuration;

namespace SearchServiceHelper
{
    /// <summary>
    /// Helper class to read keys from the configuration store.
    /// </summary>
    public static class SystemSettings
    {
        private static string _apiKey;
        private static string _apiVersion;
        private static string _endpoint;
        private static string _queryKey;

        public static string ApiKey
        {
            get
            {
                if (string.IsNullOrWhiteSpace(_apiKey))
                {
                    _apiKey = ReadSetting("ApiKey");
                }
                return _apiKey;
            }
        }

        public static string ApiVersion
        {
            get
            {
                if (string.IsNullOrWhiteSpace(_apiVersion))
                {
                    _apiVersion = ReadSetting("ApiVersion");
                }
                return _apiVersion;
            }
        }

        public static string Endpoint
        {
            get
            {
                if (string.IsNullOrWhiteSpace(_endpoint))
                {
                    _endpoint = ReadSetting("Endpoint");
                }
                return _endpoint;
            }
        }

        public static string QueryKey
        {
            get
            {
                if (string.IsNullOrWhiteSpace(_queryKey))
                {
                    _queryKey = ReadSetting("QueryKey");
                }
                return _queryKey;
            }
        }

        // Reads a single value from the appSettings section of the configuration file.
        private static string ReadSetting(string key)
        {
            AppSettingsReader rdr = new AppSettingsReader();
            return (string)rdr.GetValue(key, typeof(string));
        }
    }
}
```
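For reference, SystemSettings expects these four values in the appSettings section of the application's configuration file. A sample app.config could look like this (the key values are placeholders; "2014-07-31-Preview" was the preview API version at the time of writing):
```xml
<configuration>
  <appSettings>
    <!-- Search service endpoint, without the https:// prefix -->
    <add key="Endpoint" value="blob-search.search.windows.net" />
    <!-- Admin (primary/secondary) key: used for management operations -->
    <add key="ApiKey" value="[your API key]" />
    <!-- Query key: used for query operations -->
    <add key="QueryKey" value="[your query key]" />
    <add key="ApiVersion" value="2014-07-31-Preview" />
  </appSettings>
</configuration>
```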
Step 2: Create Index
The next step is to create an index programmatically. The following table summarizes the fields of the index we're going to create for our application.
| Name | Data Type | Key | Searchable | Retrievable | Comments |
|---|---|---|---|---|---|
| UniqueIdentifier | Edm.String | Yes | Yes | Yes | Unique identifier |
| Name | Edm.String | No | Yes | Yes | Blob name |
| Container | Edm.String | No | Yes | Yes | Container name |
| URL | Edm.String | No | Yes | Yes | Blob URL |
| BlobType | Edm.String | No | Yes | Yes | Blob type: Block or Page |
| ContentType | Edm.String | No | Yes | Yes | Blob's content type |
| Size | Edm.Double | No | No | Yes | Blob's size in bytes (only string fields can be searchable) |
| LastModified | Edm.DateTimeOffset | No | No | Yes | Blob's last modified date/time (only string fields can be searchable) |
Since an index contains one or more fields, we will create a class called "IndexField" which includes the properties needed for an index field. Also, when sending the data to the Search Service we need to convert it into JSON format, so I wrote a simple "ToJson()" method which does just that. Here's the code for the "IndexField" class (I fixed two small bugs from my first cut: a collection field's type string is "Collection(Edm.String)", and the "suggestions" attribute now uses the Suggestions property instead of Searchable):
```csharp
/// <summary>
/// Entity class for an Index field.
/// </summary>
public class IndexField
{
    public string Name { get; set; }
    public SearchDataType DataType { get; set; }
    public bool Searchable { get; set; }
    public bool Filterable { get; set; }
    public bool Sortable { get; set; }
    public bool Facetable { get; set; }
    public bool Suggestions { get; set; }
    public bool Key { get; set; }
    public bool Retrievable { get; set; }

    public string ToJson()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("{");
        sb.AppendFormat("\"name\":\"{0}\",", Name);
        switch (DataType)
        {
            case SearchDataType.Collection:
                // A collection field is always a collection of strings.
                sb.Append("\"type\":\"Collection(Edm.String)\",");
                break;
            default:
                sb.AppendFormat("\"type\":\"Edm.{0}\",", DataType.ToString());
                break;
        }
        // Only string/collection fields can be searchable or offer suggestions;
        // collection fields can be neither sorted nor faceted, and only a string field can be the key.
        sb.AppendFormat("\"searchable\":\"{0}\",", (DataType == SearchDataType.String || DataType == SearchDataType.Collection) ? Searchable.ToString().ToLowerInvariant() : "false");
        sb.AppendFormat("\"filterable\":\"{0}\",", Filterable.ToString().ToLowerInvariant());
        sb.AppendFormat("\"sortable\":\"{0}\",", (DataType == SearchDataType.Collection) ? "false" : Sortable.ToString().ToLowerInvariant());
        sb.AppendFormat("\"facetable\":\"{0}\",", (DataType == SearchDataType.Collection) ? "false" : Facetable.ToString().ToLowerInvariant());
        sb.AppendFormat("\"suggestions\":\"{0}\",", (DataType == SearchDataType.String || DataType == SearchDataType.Collection) ? Suggestions.ToString().ToLowerInvariant() : "false");
        sb.AppendFormat("\"key\":\"{0}\",", (DataType == SearchDataType.String) ? Key.ToString().ToLowerInvariant() : "false");
        sb.AppendFormat("\"retrievable\":\"{0}\"", Retrievable.ToString().ToLowerInvariant());
        sb.Append("}");
        return sb.ToString();
    }
}
```
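The SearchDataType enum referenced above isn't shown elsewhere in the snippets; a minimal definition, inferred from how the code uses it, would be:
```csharp
// Inferred definition: each value maps to an Edm data type supported by the Search API
// (Collection stands for Collection(Edm.String)).
public enum SearchDataType
{
    String,
    Boolean,
    Int32,
    Double,
    DateTimeOffset,
    GeographyPoint,
    GeographyPolygon,
    Collection
}
```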
We will do a similar thing with the "Index" class as well.
```csharp
/// <summary>
/// Entity class for an Index.
/// </summary>
public class Index
{
    public string Name { get; set; }
    public List<IndexField> Fields { get; set; }

    public string ToJson()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("{");
        sb.AppendFormat("\"name\":\"{0}\",", Name);
        if (Fields != null && Fields.Count > 0)
        {
            // Serialize each field, separating them with commas.
            sb.Append("\"fields\": [");
            for (var i = 0; i < Fields.Count; i++)
            {
                sb.Append(Fields[i].ToJson());
                if (i < Fields.Count - 1)
                {
                    sb.Append(",");
                }
            }
            sb.Append("]");
        }
        sb.Append("}");
        return sb.ToString();
    }
}
```
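For illustration, here is roughly what ToJson() emits for an index with the key field and one other field (note that this code serializes the boolean attributes as quoted strings):
```json
{
  "name": "blobsindex",
  "fields": [
    { "name": "UniqueIdentifier", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "false", "suggestions": "false", "key": "true", "retrievable": "true" },
    { "name": "Name", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "false", "suggestions": "false", "key": "false", "retrievable": "true" }
  ]
}
```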
Now the next thing we need to do is consume the REST API for creating the index. Here's the code for that:
```csharp
/// <summary>
/// Helper function to create an Index using the REST API.
/// </summary>
/// <param name="index">Index to create.</param>
/// <returns>True if the service returned 201 (Created).</returns>
public static async Task<bool> CreateIndex(Index index)
{
    var createIndexUri = new Uri(string.Format("https://{0}/indexes?api-version={1}", SystemSettings.Endpoint, SystemSettings.ApiVersion));
    var requestBody = index.ToJson();
    // Management operations authenticate with the admin API key.
    NameValueCollection requestHeaders = new NameValueCollection()
    {
        {"api-key", SystemSettings.ApiKey}
    };
    using (var response = (HttpWebResponse)await RequestResponseHelper.ProcessRequest(createIndexUri, HttpMethod.POST, "application/json", requestHeaders, Encoding.UTF8.GetBytes(requestBody)))
    {
        var statusCode = response.StatusCode;
        return statusCode == HttpStatusCode.Created;
    }
}
```
Now that our library functions are done, let's consume this code and create an index. To do the basic data population, I included a console application in my solution and wrote some functions there. Here's the code that creates the index.
```csharp
private static void CreateIndex()
{
    // Create the search index definition.
    List<IndexField> indexFields = new List<IndexField>();

    // String fields: the key (UniqueIdentifier) plus the blob attributes we want to search on.
    foreach (var fieldName in new[] { "UniqueIdentifier", "Url", "Name", "BlobType", "ContentType" })
    {
        indexFields.Add(new IndexField()
        {
            Name = fieldName,
            Key = (fieldName == "UniqueIdentifier"),
            Searchable = true,
            Filterable = true,
            Sortable = true,
            DataType = SearchDataType.String,
            Retrievable = true,
        });
    }
    indexFields.Add(new IndexField()
    {
        Name = "Size",
        Key = false,
        Searchable = true, // forced to false by IndexField.ToJson() for non-string types
        Filterable = true,
        Sortable = true,
        DataType = SearchDataType.Double,
        Retrievable = true,
    });
    indexFields.Add(new IndexField()
    {
        Name = "LastModified",
        Key = false,
        Searchable = true, // forced to false by IndexField.ToJson() for non-string types
        Filterable = true,
        Sortable = true,
        DataType = SearchDataType.DateTimeOffset,
        Retrievable = true,
    });
    indexFields.Add(new IndexField()
    {
        Name = "Container",
        Key = false,
        Searchable = true,
        Filterable = true,
        Sortable = true,
        DataType = SearchDataType.String,
        Retrievable = true,
    });

    var searchIndex = new SearchServiceHelper.Index()
    {
        Name = "blobsindex",
        Fields = indexFields,
    };
    var createIndexTask = Task.Run<bool>(async () =>
    {
        return await SearchApiHelper.CreateIndex(searchIndex);
    });
    createIndexTask.Wait();
    Console.WriteLine("Index created");
}
```
Once the index is created successfully, you should be able to see that index in the preview portal.
Since our application only needs to create/update the index, that is the only index functionality I implemented. Furthermore, there are many other options available when creating indexes which I chose to omit to keep things simple. I would strongly encourage you to go through the REST API documentation for details about all index-related operations: http://msdn.microsoft.com/en-us/library/azure/dn798918.aspx.
Step 3: Build Search Catalog
The technical term on the MSDN site for this is "Upload Documents", but I found that a bit confusing when I started, thus I titled this section "Build Search Catalog". Essentially what we are doing in this step is providing Azure Search Service the data on which searches will be performed.
As mentioned above, a document (or catalog item) contains many fields, so we will create a class called "DocumentField" which has 3 properties: Name, Value, and DataType. Again, when we upload a document we need to convert it into JSON format, so I wrote a simple "ToJson()" method which does just that. Here's the code for the "DocumentField" class.
```csharp
/// <summary>
/// Entity class for a document field.
/// </summary>
public class DocumentField
{
    public string Name { get; set; }
    public string Value { get; set; }
    public SearchDataType DataType { get; set; }

    public string ToJson()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendFormat("\"{0}\": ", Name);
        switch (DataType)
        {
            // Numeric and boolean values are written without quotes.
            case SearchDataType.Int32:
            case SearchDataType.Double:
            case SearchDataType.Boolean:
                sb.Append(Value);
                break;
            default:
                sb.AppendFormat("\"{0}\"", Value);
                break;
        }
        sb.Append(",");
        return sb.ToString();
    }
}
```
We will do a similar thing for "Document" as well. One important thing to understand here is that each document must have a "Key" field which uniquely identifies the document (think of it as the primary key for the document). Thus we will have a separate property called "KeyField" for that, plus a collection of other fields.
```csharp
/// <summary>
/// Entity class for a document.
/// </summary>
public class Document
{
    public DocumentField KeyField { get; set; }
    public List<DocumentField> Fields { get; set; }

    public string ToJson(DocumentAction action)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("{");
        sb.AppendFormat("\"@search.action\": \"{0}\",", action.ToString().ToLowerInvariant());
        // Write the key field first, trimming the trailing comma emitted by DocumentField.ToJson().
        var keyFieldJsonString = KeyField.ToJson();
        if (keyFieldJsonString.EndsWith(","))
        {
            keyFieldJsonString = keyFieldJsonString.Substring(0, keyFieldJsonString.Length - 1);
        }
        sb.Append(keyFieldJsonString);
        if (Fields != null && Fields.Count > 0)
        {
            sb.Append(",");
            // Serialize the remaining fields, skipping a duplicate of the key field.
            StringBuilder fieldsSB = new StringBuilder();
            foreach (var field in Fields)
            {
                if (field.Name != KeyField.Name)
                {
                    fieldsSB.Append(field.ToJson());
                }
            }
            var str = fieldsSB.ToString();
            if (str.EndsWith(","))
            {
                str = str.Substring(0, str.Length - 1);
            }
            sb.Append(str);
        }
        sb.Append("}");
        return sb.ToString();
    }
}
```
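The DocumentAction enum also isn't shown elsewhere in the snippets; a minimal definition, inferred from the @search.action verbs the Search API understands, would be:
```csharp
// Inferred definition: values map (lower-cased) to the @search.action verbs.
public enum DocumentAction
{
    Upload,
    Merge,
    Delete
}
```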
Now the next thing we need to do is consume the REST API for uploading documents. Here's the code for that:
```csharp
/// <summary>
/// Helper function to upload documents using the REST API.
/// </summary>
/// <param name="index">Index to upload the documents into.</param>
/// <param name="documents">Batch of documents to upload.</param>
/// <returns>True if the service returned 200 (OK).</returns>
public static async Task<bool> UploadDocument(Index index, List<Document> documents)
{
    var uploadDocumentUri = new Uri(string.Format("https://{0}/indexes/{1}/docs/index?api-version={2}", SystemSettings.Endpoint, index.Name, SystemSettings.ApiVersion));
    // Wrap the documents in the "value" array the REST API expects.
    StringBuilder sb = new StringBuilder();
    sb.Append("{ \"value\": [");
    for (var i = 0; i < documents.Count; i++)
    {
        sb.Append(documents[i].ToJson(DocumentAction.Upload));
        if (i < documents.Count - 1)
        {
            sb.Append(",");
        }
    }
    sb.Append("] }");
    var requestBody = sb.ToString();
    NameValueCollection requestHeaders = new NameValueCollection()
    {
        {"api-key", SystemSettings.ApiKey},
    };
    using (var response = (HttpWebResponse)await RequestResponseHelper.ProcessRequest(uploadDocumentUri, HttpMethod.POST, "application/json", requestHeaders, Encoding.UTF8.GetBytes(requestBody)))
    {
        var statusCode = response.StatusCode;
        return statusCode == HttpStatusCode.OK;
    }
}
```
Since we will be uploading multiple documents in a single operation, we pass a collection of documents to the method.
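To make the wire format concrete, the request body assembled above looks roughly like this for a single blob (the values here are purely illustrative):
```json
{
  "value": [
    {
      "@search.action": "upload",
      "UniqueIdentifier": "ID1038216571",
      "Url": "https://myaccount.blob.core.windows.net/images/logo.png",
      "Container": "images",
      "Name": "logo.png",
      "BlobType": "Block",
      "ContentType": "image/png",
      "Size": 15234,
      "LastModified": "2014-08-24T10:15:30Z"
    }
  ]
}
```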
Now that the library functions are written, we need to actually consume them. Since our objective is to upload information about the blobs in a storage account, we will iterate over all blob containers in the storage account and fetch the blobs from each container. To keep the request payload small, we fetch 100 blobs at a time from a container, create documents for those 100 blobs, upload them, and then fetch the next set of blobs, repeating as long as more blobs are available in the container. Once we have processed all blobs in a container, we move on to the next container and repeat these steps until we have iterated over all containers. Here's the code which does that. Please make sure to put your storage account credentials in this function.
```csharp
private static void UploadDocument()
{
    var searchIndex = new SearchServiceHelper.Index()
    {
        Name = "blobsindex",
    };
    var accountName = "[your Azure Storage Account Name]";
    var accountKey = "[your Azure Storage Account Key]";
    var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
    var blobClient = storageAccount.CreateCloudBlobClient();
    BlobContinuationToken continuationToken = null;
    BlobContinuationToken containerContinuationToken = null;
    long totalBlobsIndexed = 0;
    do
    {
        // Fetch containers 100 at a time.
        var fetchContainersTask = Task.Run<ContainerResultSegment>(async () =>
        {
            return await blobClient.ListContainersSegmentedAsync("", ContainerListingDetails.All, 100, containerContinuationToken, null, null);
        });
        var fetchContainersTaskResult = fetchContainersTask.Result;
        containerContinuationToken = fetchContainersTaskResult.ContinuationToken;
        var containers = fetchContainersTaskResult.Results.ToList();
        foreach (var container in containers)
        {
            Console.WriteLine("Listing blobs from '" + container.Name + "' container and uploading them in search service.");
            long totalBlobsUploaded = 0;
            continuationToken = null;
            do
            {
                // Fetch blobs 100 at a time (flat listing) to keep the upload payload small.
                var fetchBlobsTask = Task.Run<BlobResultSegment>(async () =>
                {
                    return await container.ListBlobsSegmentedAsync("", true, BlobListingDetails.All, 100, continuationToken, null, null);
                });
                var blobListingResult = fetchBlobsTask.Result;
                continuationToken = blobListingResult.ContinuationToken;
                var blobsList = blobListingResult.Results.ToList();
                if (blobsList.Count > 0)
                {
                    var documentsList = new List<Document>();
                    foreach (var blob in blobsList)
                    {
                        // The key is derived from the blob URI; it must be unique per document.
                        var keyDocumentField = new DocumentField()
                        {
                            Name = "UniqueIdentifier",
                            Value = string.Format("ID{0}", blob.Uri.AbsoluteUri.GetHashCode()),
                            DataType = SearchDataType.String,
                        };
                        List<DocumentField> documentFields = new List<DocumentField>();
                        documentFields.Add(new DocumentField() { Name = "Url", Value = blob.Uri.AbsoluteUri, DataType = SearchDataType.String, });
                        documentFields.Add(new DocumentField() { Name = "Container", Value = blob.Container.Name, DataType = SearchDataType.String, });
                        // Block and page blobs expose the same properties we need,
                        // so handle both through ICloudBlob instead of duplicating the two branches.
                        var cloudBlob = blob as ICloudBlob;
                        if (cloudBlob != null)
                        {
                            documentFields.Add(new DocumentField() { Name = "Name", Value = cloudBlob.Name, DataType = SearchDataType.String, });
                            documentFields.Add(new DocumentField() { Name = "BlobType", Value = (blob is CloudPageBlob) ? "Page" : "Block", DataType = SearchDataType.String, });
                            documentFields.Add(new DocumentField() { Name = "ContentType", Value = cloudBlob.Properties.ContentType, DataType = SearchDataType.String, });
                            documentFields.Add(new DocumentField() { Name = "Size", Value = cloudBlob.Properties.Length.ToString(), DataType = SearchDataType.Double, });
                            documentFields.Add(new DocumentField() { Name = "LastModified", Value = cloudBlob.Properties.LastModified.Value.ToString("yyyy-MM-ddTHH:mm:ssZ"), DataType = SearchDataType.DateTimeOffset, });
                        }
                        var document = new Document()
                        {
                            KeyField = keyDocumentField,
                            Fields = documentFields,
                        };
                        documentsList.Add(document);
                    }
                    // Upload this batch of (up to 100) documents to the search service.
                    var uploadDocumentTask = Task.Run<bool>(async () =>
                    {
                        return await SearchApiHelper.UploadDocument(searchIndex, documentsList);
                    });
                    uploadDocumentTask.Wait();
                    totalBlobsUploaded += documentsList.Count;
                    totalBlobsIndexed += documentsList.Count;
                }
            }
            while (continuationToken != null);
            Console.WriteLine("Uploaded " + totalBlobsUploaded + " blobs from '" + container.Name + "' container.");
            Console.WriteLine("---------------------------------------------------------------------------------------");
        }
    }
    while (containerContinuationToken != null);
    Console.WriteLine("Uploaded " + totalBlobsIndexed + " documents from '" + accountName + "' storage account for indexing....");
}
```
Once this step is done, your data is all ready to be searched.
Step 4: Build Search Interface
This is the last and final step (we're almost there:)). Now all we have to do is build a search interface. But before we do that, let's write a simple helper function which consumes the REST API for doing a simple search. Here's the code to do that (note that we URL-encode the search term so special characters survive the query string):
```csharp
/// <summary>
/// Helper function to perform a simple search.
/// </summary>
/// <param name="index">Index to search.</param>
/// <param name="searchCriteria">Search term.</param>
/// <returns>Raw JSON response from the service.</returns>
public static async Task<string> SimpleSearch(Index index, string searchCriteria)
{
    var searchUri = new Uri(string.Format("https://{0}/indexes/{1}/docs?api-version={2}&search={3}", SystemSettings.Endpoint, index.Name, SystemSettings.ApiVersion, Uri.EscapeDataString(searchCriteria)));
    NameValueCollection requestHeaders = new NameValueCollection()
    {
        // Query operations authenticate with the query key, not the admin API key.
        {"api-key", SystemSettings.QueryKey}
    };
    using (var response = (HttpWebResponse)await RequestResponseHelper.ProcessRequest(searchUri, HttpMethod.GET, "", requestHeaders, null))
    {
        using (var streamReader = new StreamReader(response.GetResponseStream()))
        {
            return await streamReader.ReadToEndAsync();
        }
    }
}
```
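Consuming it from the console application mirrors the earlier functions; for example, to search for the term "pdf" (the term is just an illustration) and dump the raw JSON response:
```csharp
private static void Search()
{
    var searchIndex = new SearchServiceHelper.Index() { Name = "blobsindex" };
    var searchTask = Task.Run<string>(async () =>
    {
        return await SearchApiHelper.SimpleSearch(searchIndex, "pdf");
    });
    searchTask.Wait();
    Console.WriteLine(searchTask.Result);
}
```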
Again, I opted for a simple implementation; I strongly urge you to check out the rich querying capabilities in the REST API: http://msdn.microsoft.com/en-us/library/azure/dn798940.aspx.
For the search interface, I built a simple MVC application which uses jQuery/Ajax to query a "search" action in the controller. The search action in turn invokes the helper function for searching and returns the string data, which gets passed back to the JavaScript code in the view. The response gets parsed as a JSON object, and then the results are shown on the web page using the underscore.js template engine. Simple!!!
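The full web project is in the download below, but as a rough sketch (the controller and parameter names here are my own, not necessarily those in the download), the "search" action boils down to:
```csharp
public class HomeController : Controller
{
    // Invoked via jQuery/Ajax from the view; returns the raw JSON from the search service,
    // which the view parses and renders through an underscore.js template.
    [HttpGet]
    public async Task<ContentResult> Search(string searchText)
    {
        var searchIndex = new SearchServiceHelper.Index() { Name = "blobsindex" };
        var resultJson = await SearchApiHelper.SimpleSearch(searchIndex, searchText);
        return Content(resultJson, "application/json");
    }
}
```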
You can get the code for web application from the download link below.
Closing Thoughts
That's pretty much it. As I mentioned, all in all it took me about 6 hours to do everything I described above. Obviously the code is not production grade, but it is insanely simple to build a search application using this service.
Obviously one thought that came to my mind is how this service will keep track of all the changes going on in my blob storage account. Well, it won't! Remember, it's not a crawler service. Thus it is your responsibility as a developer to keep your search catalog up-to-date if you don't want to serve stale search results.
Another thing that crossed my mind is how I could make this service search inside my blobs. Again, I omitted that part, but just thinking out loud, you could achieve it by doing one of two things:
- Include the blob's content as part of your catalog (during the document upload process). The downside of this approach is that it increases the time needed to populate the catalog, and Azure Search Service still can't search through binary content (to the best of my knowledge). Thus if you have PDF files or Word documents in your blob storage, Azure Search Service won't be able to search through them (again, to the best of my knowledge).
- Build a strong metadata repository. Blobs in blob storage can have metadata in the form of key/value pairs. I would encourage you to use that and include the metadata as part of your catalog (see the sketch after this list).
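As a sketch of the second approach, the upload loop above could be extended to pull each blob's metadata into the document (this assumes the index defines a matching field for every metadata key you care about; note that with BlobListingDetails.All the listing already returns metadata):
```csharp
// Hypothetical: copy blob metadata into extra document fields during the upload loop.
var cloudBlob = blob as ICloudBlob;
if (cloudBlob != null)
{
    foreach (var metadataItem in cloudBlob.Metadata)
    {
        documentFields.Add(new DocumentField()
        {
            Name = metadataItem.Key,
            Value = metadataItem.Value,
            DataType = SearchDataType.String,
        });
    }
}
```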
Important Resources
Man, I need to get on Twitter more often :). I thought I was the first one to do some work on this service, but my good friend Sandrino beat me to it, and by a big margin. Only after I published this post did I run into his blog posts, and he has done an amazing job with this service. Earlier in the post I said that there's no SDK available for this service. Well, I take that back. There's a .Net SDK (not official, but at this time who cares) developed by Sandrino. Not only is there a .Net SDK, there's a browser-based explorer/manager available as well. I would strongly recommend you check out these posts from Sandrino for more details:
I’m sure that you will find these useful.
Summary
That’s it for this post. I hope you’ve found it useful. If you find any issues with the post, please let me know ASAP and I will correct them. Overall, I think it’s a great service and I can’t wait to see what more features will be introduced as this service matures.
Oh, almost forgot! You can download the source code for this entire project from here: Download Source Code
Happy Coding!!!