As you may already know, for the past two weeks we have been implementing support for Azure Search Service in Cloud Portam. We released a new version yesterday (more on this here). There is currently no SDK available for this service, so we ended up writing our code directly against the Azure Search Service REST API. During the course of development, we learned a lot about the service and discovered a number of business rules. If you’re writing code to consume this REST API, you may find this post useful, as it will (hopefully) save you roundtrips to the server only to find out that you passed incorrect data.
Terminology
Let’s take a moment and talk about some of the terms used in Azure Search. Please note that this is my understanding and my interpretation may be wrong (if that’s the case, please let me know and I will update the post).
Index
I like to think of an index as a “search catalog”. If you’re coming from the relational database world, think of an index as a table in a relational database. Essentially, it is a container for the data you want to search, along with some additional properties that influence the search results.
Index Fields
Index fields define the schema for an index. Again, if you’re coming from the relational database world, think of index fields as the columns in a table. A column in a table is of a certain data type and has certain attributes (like Primary Key, Nullable etc.); similarly, an index field is of a certain data type and has certain attributes (e.g. Key, Searchable etc.).
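To make the table/column analogy concrete, here is a minimal sketch of an index definition held as a JavaScript object before it is serialized into a request body. The attribute names (`key`, `searchable`, `filterable`, `sortable`, `facetable`, `suggestions`, `retrievable`) mirror the properties discussed in this post; the index name `products` and all of its fields are hypothetical, and the exact JSON shape is my assumption, so please check it against the MSDN documentation for your service version.

```javascript
// Hypothetical "products" index: each entry in `fields` plays the role of a
// column, with a data type and a set of boolean attributes.
const productsIndex = {
  name: "products",
  fields: [
    // The key field acts like a primary key and must be Edm.String.
    { name: "productId", type: "Edm.String", key: true, filterable: true, retrievable: true },
    { name: "name", type: "Edm.String", searchable: true, sortable: true, suggestions: true, retrievable: true },
    { name: "tags", type: "Collection(Edm.String)", searchable: true, filterable: true, facetable: true },
    { name: "price", type: "Edm.Double", filterable: true, sortable: true, facetable: true },
    { name: "daysInStock", type: "Edm.Int32", filterable: true, sortable: true },
  ],
};

// The request body is simply the serialized object.
const body = JSON.stringify(productsIndex);
```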
Index Scoring Profile
This is where things get interesting 🙂 [read: I didn’t completely understand it :)]. To me, a scoring profile is a way to influence search results and show them in a different order. For example, let’s say you have an e-commerce website that uses Azure Search, and a user searches for “laptop”. Now assume you wish to get rid of your old inventory first. You could define a scoring profile in such a way that old inventory shows up towards the top of the search results.
Index Scoring Profile Weights
This is part of an index scoring profile and is used to assign different weights to different searchable fields in your index.
Index Scoring Profile Functions
This is also part of an index scoring profile and is used to alter the search ranking of items based on the values in the applicable fields.
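Putting weights and functions together for the “laptop” example, a scoring profile that pushes old inventory up might look roughly like the sketch below. The property names (`text.weights`, `functions`, `fieldName`, `boost`, `interpolation`, `magnitude.boostingRangeStart` and so on) reflect my reading of the 2014-07-31-Preview REST API and should be treated as an assumption; the profile name and the `name`, `description` and `daysInStock` fields are hypothetical.

```javascript
// Hypothetical scoring profile: text weights favour matches in "name" over
// "description", and a magnitude function boosts items that have been in
// stock longer, i.e. old inventory.
const clearOldInventory = {
  name: "clearOldInventory",
  text: {
    // Scoring profile weights: searchable field name -> positive weight.
    weights: { name: 3, description: 1.5 },
  },
  functions: [
    {
      // Scoring profile function: alters ranking based on a field's value.
      type: "magnitude",
      fieldName: "daysInStock",
      boost: 2,
      interpolation: "linear",
      magnitude: {
        // The boost ramps up between 90 and 365 days in stock.
        boostingRangeStart: 90,
        boostingRangeEnd: 365,
        constantBoostBeyondRange: true,
      },
    },
  ],
};
```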
Rules
Now that we’ve covered some basic terminology, let’s look at some of the rules we discovered. Please note that at the time of writing, the service version is 2014-07-31-Preview and the rules below apply to this version. Please check the MSDN documentation for the latest service version.
Index Fields
Here are some of the rules we discovered that apply to index fields:
Data Type / Properties Matrix
The following table shows each index field data type and the properties applicable to it.
Data Type | Key | Searchable | Filterable | Sortable | Facetable | Suggestions | Retrievable
Edm.String | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Collection(Edm.String) | No | Yes | Yes | No | Yes | Yes | Yes
Edm.Int32 | No | No | Yes | Yes | Yes | No | Yes
Edm.Double | No | No | Yes | Yes | Yes | No | Yes
Edm.Boolean | No | No | Yes | Yes | Yes | No | Yes
Edm.DateTimeOffset | No | No | Yes | Yes | Yes | No | Yes
Edm.GeographyPoint | No | No | Yes | Yes | No | No | Yes
The way to read this matrix is to ask: can an index field of data type XYZ have property ABC? For example, can an index field of “Edm.DateTimeOffset” data type be “Searchable”? The answer is “No”. Any attempt to set this property to true will result in a Bad Request error from the server.
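One way to enforce the matrix on the client side is to encode it as a lookup table. The sketch below copies the table above as observed for 2014-07-31-Preview; the helper names `canHaveProperty` and `disallowedProperties` are hypothetical.

```javascript
// Data type / property matrix, copied from the table above.
const PROPERTY_MATRIX = {
  "Edm.String":             { key: true,  searchable: true,  filterable: true, sortable: true,  facetable: true,  suggestions: true,  retrievable: true },
  "Collection(Edm.String)": { key: false, searchable: true,  filterable: true, sortable: false, facetable: true,  suggestions: true,  retrievable: true },
  "Edm.Int32":              { key: false, searchable: false, filterable: true, sortable: true,  facetable: true,  suggestions: false, retrievable: true },
  "Edm.Double":             { key: false, searchable: false, filterable: true, sortable: true,  facetable: true,  suggestions: false, retrievable: true },
  "Edm.Boolean":            { key: false, searchable: false, filterable: true, sortable: true,  facetable: true,  suggestions: false, retrievable: true },
  "Edm.DateTimeOffset":     { key: false, searchable: false, filterable: true, sortable: true,  facetable: true,  suggestions: false, retrievable: true },
  "Edm.GeographyPoint":     { key: false, searchable: false, filterable: true, sortable: true,  facetable: false, suggestions: false, retrievable: true },
};

// Answers "can an index field of this data type have this property?"
function canHaveProperty(dataType, property) {
  const row = PROPERTY_MATRIX[dataType];
  if (!row) throw new Error(`Unknown data type: ${dataType}`);
  return row[property] === true;
}

// Lists the properties a field sets to true that its data type does not allow.
function disallowedProperties(field) {
  const row = PROPERTY_MATRIX[field.type];
  if (!row) throw new Error(`Unknown data type: ${field.type}`);
  return Object.keys(row).filter((prop) => field[prop] === true && !row[prop]);
}
```

For example, `disallowedProperties({ name: "orderDate", type: "Edm.DateTimeOffset", searchable: true, sortable: true })` returns `["searchable"]`, which is exactly the request the server would reject.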
Key Property
Think of the “Key” property as the primary key for the index. We discovered the following rules for the key property:
- An index must have a key field; in other words, a primary key is required.
- There can be only one key field per index; in other words, an index can’t have a composite primary key.
- The key field must be of “Edm.String” data type. No other data types were supported for the key field at the time of writing.
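The three key-field rules translate into a short client-side check. This is a minimal sketch: `validateKeyField` is a hypothetical helper, and it assumes each field object carries `name`, `type` and an optional boolean `key`.

```javascript
// Checks the key field rules observed for 2014-07-31-Preview:
// exactly one key field, and it must be of type Edm.String.
function validateKeyField(fields) {
  const keyFields = fields.filter((f) => f.key === true);
  const errors = [];
  if (keyFields.length === 0) {
    errors.push("An index must have a key field.");
  } else if (keyFields.length > 1) {
    errors.push(`Only one key field is allowed; found ${keyFields.length}.`);
  }
  for (const f of keyFields) {
    if (f.type !== "Edm.String") {
      errors.push(`Key field "${f.name}" must be Edm.String, not ${f.type}.`);
    }
  }
  return errors;
}
```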
Some Other Things
- Currently, once an index is created, you can’t edit or delete any of its existing fields. You can, however, add new fields through an update operation.
- From what I have been told, currently only string collections are supported, through the “Collection(Edm.String)” data type.
- Currently there’s no limit on the number of fields an index can have. However, to future-proof your application in case the service announces a limit, it is better to add only the fields that are really required for searching.
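Because existing fields can’t be edited or deleted, it can help to compare the old and new field lists before sending an update. The sketch below is hypothetical (`checkIndexUpdate` and `sameDefinition` are not part of any SDK); it assumes field objects hold only primitive values and, as a simplification, treats a missing attribute and an explicit default value as different.

```javascript
// True when two field definitions list the same properties with the same values.
function sameDefinition(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((k) => a[k] === b[k]);
}

// Pre-flight check for an index update: existing fields may not be removed
// or changed; only new fields may be added.
function checkIndexUpdate(existingFields, proposedFields) {
  const proposedByName = new Map(proposedFields.map((f) => [f.name, f]));
  const problems = [];
  for (const field of existingFields) {
    const proposed = proposedByName.get(field.name);
    if (!proposed) {
      problems.push(`Field "${field.name}" cannot be deleted.`);
    } else if (!sameDefinition(field, proposed)) {
      problems.push(`Field "${field.name}" cannot be modified.`);
    }
  }
  return problems;
}
```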
Index Scoring Profile Weights
These are the things we discovered for scoring profile weights:
- Weights can only be defined on “Searchable” fields.
- Weight value must be a positive number.
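Both weight rules can be checked together. In this sketch, `validateWeights` is a hypothetical helper that takes weights as a `{ fieldName: weight }` map, which is an assumed shape. `POSITIVE_DECIMAL` is a positive-number pattern of my own (optional `+`, at least one non-zero digit).

```javascript
// Positive decimal: optional "+", digits with an optional fraction, and at
// least one non-zero digit somewhere (so "0" and "0.0" are rejected).
const POSITIVE_DECIMAL = /^\+?(?=.*[1-9])(\d+\.?\d*|\.\d+)$/;

// weights: { fieldName: weight }, fields: index field definitions.
function validateWeights(weights, fields) {
  const errors = [];
  for (const [fieldName, weight] of Object.entries(weights)) {
    const field = fields.find((f) => f.name === fieldName);
    if (!field) {
      errors.push(`Unknown field "${fieldName}".`);
    } else if (field.searchable !== true) {
      errors.push(`Field "${fieldName}" is not searchable.`);
    }
    if (!POSITIVE_DECIMAL.test(String(weight))) {
      errors.push(`Weight for "${fieldName}" must be a positive number.`);
    }
  }
  return errors;
}
```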
Index Scoring Profile Functions
This is where we had the most fun :). These are the things we discovered for scoring profile functions:
- Scoring profile functions can only be defined on “Filterable” fields.
- For each scoring profile function, you have to define a boosting value, which must be a positive number other than one (1).
- There are three types of functions supported currently – Distance, Freshness and Magnitude.
- Magnitude type functions can only be defined on fields of “Edm.Int32” and “Edm.Double” data type.
- Freshness type functions can only be defined on fields of “Edm.DateTimeOffset” data type.
- Distance type functions can only be defined on fields of “Edm.GeographyPoint” data type.
- For a Freshness function, you must define a boosting duration, which represents a time span. You need to specify it in the “P[nD][T[nH][nM][nS]]” format, e.g. P1DT12H32M50.345S.
- For a Distance function, you must define a boosting distance, which must be a number greater than zero (0). Please note that the boosting distance is in kilometers (so if you’re used to miles, just multiply the number by about 1.6 to get kilometers :)).
- For a Magnitude function, you must define numerical values for the boosting range start and end. In our tests, we could not find any restrictions on these values other than that they must be numeric.
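The function rules above can be rolled into one validator. This is a sketch under assumptions: the function object shape (`type`, `fieldName`, `boost`, `freshness.boostingDuration`, `distance.boostingDistance`, `magnitude.boostingRangeStart/End`) is my reading of the REST API, `validateScoringFunction` is hypothetical, and the `DURATION` pattern is my own, not one of the expressions from this post.

```javascript
// Field data types each function type accepts, as observed in 2014-07-31-Preview.
const FUNCTION_FIELD_TYPES = {
  magnitude: ["Edm.Int32", "Edm.Double"],
  freshness: ["Edm.DateTimeOffset"],
  distance: ["Edm.GeographyPoint"],
};

// P[nD][T[nH][nM][nS]] durations such as P1DT12H32M50.345S; the lookaheads
// reject a bare "P" and a dangling "T".
const DURATION = /^P(?!$)(?:\d+D)?(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?$/;

// fn: scoring function definition (assumed shape), fields: index fields.
function validateScoringFunction(fn, fields) {
  const field = fields.find((f) => f.name === fn.fieldName);
  if (!field) return [`Unknown field "${fn.fieldName}".`];

  const errors = [];
  if (field.filterable !== true) {
    errors.push(`Field "${field.name}" must be filterable.`);
  }
  const allowedTypes = FUNCTION_FIELD_TYPES[fn.type];
  if (!allowedTypes) {
    errors.push(`Unknown function type "${fn.type}".`);
  } else if (!allowedTypes.includes(field.type)) {
    errors.push(`A ${fn.type} function cannot use a field of type ${field.type}.`);
  }
  if (!(typeof fn.boost === "number" && fn.boost > 0 && fn.boost !== 1)) {
    errors.push("Boost must be a positive number other than 1.");
  }
  if (fn.type === "freshness" && !DURATION.test(fn.freshness?.boostingDuration ?? "")) {
    errors.push("Freshness functions need a boostingDuration such as P1DT12H.");
  }
  if (fn.type === "distance" && !(fn.distance?.boostingDistance > 0)) {
    errors.push("Distance functions need a boostingDistance (km) greater than 0.");
  }
  if (fn.type === "magnitude") {
    const range = fn.magnitude ?? {};
    if (!Number.isFinite(range.boostingRangeStart) || !Number.isFinite(range.boostingRangeEnd)) {
      errors.push("Magnitude functions need numeric boostingRangeStart and boostingRangeEnd.");
    }
  }
  return errors;
}
```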
Some Regular Expressions
Since we wanted to enforce these rules on the client side, we ended up writing some regular expressions, which I am including below. Please note that I am no RegEx ninja 🙂 and most of what I put below is copied from Stack Overflow and modified to suit our needs. If you find any issues with these or have a better way of doing it, please let me know.
Index Field Name
/(^(?!azureSearch))(^[a-zA-Z]([a-zA-Z0-9_]*)$)/
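The field-name expression can be used as-is with `RegExp.prototype.test`. In this sketch only the `isValidFieldName` wrapper is hypothetical; the pattern is the one above.

```javascript
// Field-name pattern from above: must not start with "azureSearch", must start
// with a letter, and may contain only letters, digits and underscores.
const FIELD_NAME = /(^(?!azureSearch))(^[a-zA-Z]([a-zA-Z0-9_]*)$)/;

function isValidFieldName(name) {
  return FIELD_NAME.test(name);
}
```

For example, `isValidFieldName("azureSearchScore")` returns false because of the negative lookahead, while `isValidFieldName("in_stock")` returns true.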
Index Scoring Profile Weight
/^\+?(?=.*[1-9])(\d+\.?\d*|\.\d+)$/
CORS Max Age (in Seconds)
/^[1-9][0-9]*$/
Index Scoring Profile Function Boost
/^\+?(?=.*[1-9])(\d+\.?\d*|\.\d+)$/
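A positive-number pattern alone cannot enforce the “not equal to one” boost rule, because “1”, “1.”, “1.00” and “+1” are all numerically one. A minimal sketch, with a hypothetical `isValidBoost` helper and a positive-decimal pattern of my own, checks that rule on the parsed value instead:

```javascript
// Positive decimal: optional "+", at least one non-zero digit.
const POSITIVE_DECIMAL = /^\+?(?=.*[1-9])(\d+\.?\d*|\.\d+)$/;

// A boost must be positive and must not equal one; the second rule is
// checked numerically after the pattern matches.
function isValidBoost(input) {
  return POSITIVE_DECIMAL.test(input) && parseFloat(input) !== 1;
}
```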
Index Scoring Profile Distance Function – Reference Point Parameter
/^[a-zA-Z][a-zA-Z0-9]{1,15}$/
Index Scoring Profile Distance Function – Boosting Distance
/^(?!0*\.?0*$)(\d*\.)?\d+$/
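Since the boosting distance is expressed in kilometers, a UI that also accepts miles needs a conversion step. This sketch uses a hypothetical helper `toBoostingDistanceKm` and the exact factor of 1.609344 km per international mile. It also rejects anything not strictly greater than zero.

```javascript
// Kilometers per international mile.
const KM_PER_MILE = 1.609344;

// Turns user input into the kilometer value expected for boostingDistance,
// rejecting anything that is not strictly greater than zero.
function toBoostingDistanceKm(value, unit = "km") {
  if (unit !== "km" && unit !== "mi") {
    throw new RangeError(`Unsupported unit: ${unit}`);
  }
  const distance = Number(value);
  if (!Number.isFinite(distance) || distance <= 0) {
    throw new RangeError("Boosting distance must be greater than zero.");
  }
  return unit === "mi" ? distance * KM_PER_MILE : distance;
}
```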
Index Scoring Profile Freshness Function – Number of Days
/^\d{0,3}$/
Index Scoring Profile Freshness Function – Number of Hours
/(^[1-9]$)|(^[01][0-9]$)|(^2[0-3]$)/
Index Scoring Profile Freshness Function – Number of Minutes/Seconds
/(^[1-9]$)|(^[0-5][0-9]$)/
Index Scoring Profile Freshness Function – Number of Milliseconds
/^\d{1,3}$/
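Once the individual day, hour, minute, second and millisecond inputs pass these expressions, they still have to be assembled into a single boosting duration string. A minimal sketch follows. `buildBoostingDuration` is hypothetical, expects numeric inputs, and turns milliseconds into the fractional part of the seconds, as in P1DT12H32M50.345S.

```javascript
// Assembles a P[nD][T[nH][nM][nS]] boosting duration from numeric parts.
function buildBoostingDuration({ days = 0, hours = 0, minutes = 0, seconds = 0, milliseconds = 0 }) {
  let duration = "P";
  if (days > 0) duration += `${days}D`;
  let time = "";
  if (hours > 0) time += `${hours}H`;
  if (minutes > 0) time += `${minutes}M`;
  if (seconds > 0 || milliseconds > 0) {
    // Milliseconds become the fractional part of the seconds, e.g. 50.345S.
    const fraction = milliseconds > 0 ? `.${String(milliseconds).padStart(3, "0")}` : "";
    time += `${seconds}${fraction}S`;
  }
  if (time) duration += `T${time}`;
  // An all-zero input would leave a bare "P", which is not a valid duration.
  if (duration === "P") throw new RangeError("Boosting duration must not be zero.");
  return duration;
}
```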
Index Scoring Profile Magnitude Function – Boosting Range Start/End
/^([-+]?(\d*\.)?)\d+$/
Summary
That’s it for this post. As you start building search applications using Azure Search Service, I hope you will find this post useful … or … you could simply use Cloud Portam, where we have already taken care of these rules (how’s that for a sales pitch :)).
Jokes aside, I do hope that you find this post useful. As always, if you find any issues with the post, please let me know ASAP so that I can fix them.
Happy Searching with Azure Search Service and Happy Coding!!!