Welcome to part four of my tutorial on getting started with Elastic using the .NET NEST library. In part one, I covered the reasons for choosing NoSQL, an explanation of Elastic, and the installation of and communication with Elastic. In part two, I went over the creation of your first Elastic index, CRUD operations, and simple search with paging and sorting. Part three focused on queries, nested types, and retrieving only part of a document. In this, the final installment, I will cover empty properties, dynamically constructed queries, custom analyzers, and document migration with the NEST Bulk API. Let's dig in.
Find an Empty Property
Null and empty values cannot be indexed, so the usual queries cannot find them. A filtered query should be used instead. Let's put a new post with an empty body property into our index:
var author1 = new Author
{
    Id = Guid.NewGuid(),
    FirstName = "John",
    LastName = "Doe"
};
var blogPost = new BlogPost
{
    Id = Guid.NewGuid(),
    Title = "test post 1",
    Body = null,
    Author = author1
};
elastic.Index(blogPost, p => p
    .Id(blogPost.Id.ToString())
    .Refresh()
);
Now we will write a filtered query to retrieve the post with the empty body property:
var res = elastic.Search<BlogPost>(s => s
    .Query(fq => fq
        .Filtered(f => f
            .Query(q1 => q1.MatchAll())
            .Filter(f2 => f2.Missing(p => p.Body)))));
Console.WriteLine(res.RequestInformation.Success);
Console.WriteLine(res.Hits.Count());
foreach (var hit in res.Hits)
{
    Console.WriteLine(hit.Source);
}
Now save the build and run the program. You should see the following output in the console:
True
1
Id: '44b002f9-2c6e-4356-9b00-a13b3c2160dc', Title: 'test post 1', Body: '', Author: Id: '9dcdc66d-e260-421f-8c2a-3eeb26110844', First name: 'John', Last Name: 'Doe'
We tell Elastic to match all documents, but additionally filter the results with a "missing" filter so that only documents having a null or empty "body" property are returned. You can read more about the missing filter in the Elastic documentation.
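Incidentally, the inverse query is just as easy: to return only the posts that do have a body, you could swap the missing filter for an "exists" filter. A minimal sketch, reusing the same client and index as above:
var withBody = elastic.Search<BlogPost>(s => s
    .Query(fq => fq
        .Filtered(f => f
            .Query(q1 => q1.MatchAll())
            // Keep only documents whose "body" property is present.
            .Filter(f2 => f2.Exists(p => p.Body)))));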
How To Dynamically Construct a Query
Exactly how to dynamically construct a query with NEST is not obvious at first. This example should put you on the right track:
var name = "John";
var surname = "Doe";
Func<BoolQueryDescriptor<BlogPost>, BoolQueryDescriptor<BlogPost>> boolQuery = bq => bq;
Func<FilterDescriptor<BlogPost>, FilterContainer> filter = f => f;
Func<QueryDescriptor<BlogPost>, QueryContainer> must = m => m;
Func<NestedQueryDescriptor<BlogPost>, NestedQueryDescriptor<BlogPost>> firstname = n => n
    .Path(p => p.Author)
    .Query(nq => nq
        .Match(nqm => nqm.OnField(p => p.Author.FirstName).Query(name).Lenient()));
Func<NestedQueryDescriptor<BlogPost>, NestedQueryDescriptor<BlogPost>> lastname = n => n
    .Path(p => p.Author)
    .Query(nq => nq
        .Match(nqm => nqm.OnField(p => p.Author.LastName).Query(surname).Lenient()));
if (!string.IsNullOrEmpty(name)) {
    must = m => m.Nested(n => firstname(n)) && m.Nested(n => lastname(n));
} else {
    must = m => m.Nested(n => lastname(n));
}
filter = f => f
    .Bool(b => b
        .Must(m2 => m2
            .Missing(p => p.Body)));
boolQuery = bq => bq.Must(m => must(m));
Func<QueryDescriptor<BlogPost>, QueryContainer> query = q => q
    .Filtered(f => f
        .Query(q1 => q1
            .Bool(b => boolQuery(b)))
        .Filter(f2 => filter(f2)));
var res = elastic.Search<BlogPost>(s => s.Query(query));
Console.WriteLine(res.RequestInformation.Success);
Console.WriteLine(res.Hits.Count());
foreach (var hit in res.Hits)
{
    Console.WriteLine(hit.Source);
}
Now save the build and run the program. You should see the following output in the console:
True
1
Id: '44b002f9-2c6e-4356-9b00-a13b3c2160dc', Title: 'test post 1', Body: '', Author: Id: '9dcdc66d-e260-421f-8c2a-3eeb26110844', First name: 'John', Last Name: 'Doe'
As you can see, we dynamically change the "must" part of the query and add the necessary filter part.
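When the number of conditions is not known up front, one handy variation is to collect the query parts in a list and hand them all to Must() at once. Here is a sketch reusing the firstname and lastname delegates from above; it assumes the params overload of Must(), so treat it as a starting point rather than the canonical pattern:
// Collect only the parts whose inputs are actually present.
var parts = new List<Func<QueryDescriptor<BlogPost>, QueryContainer>>();
if (!string.IsNullOrEmpty(name))
    parts.Add(m => m.Nested(n => firstname(n)));
if (!string.IsNullOrEmpty(surname))
    parts.Add(m => m.Nested(n => lastname(n)));
// Must() accepts an array of query parts.
boolQuery = bq => bq.Must(parts.ToArray());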
Defining and Using a Custom Analyzer (Autocomplete Example)
An analyzer performs three functions: character filtering, tokenization, and token filtering. A character filter cleans a block of text by removing or replacing unwanted text sequences, for example, stripping HTML tags. Tokenization splits a block of text into individual terms, which are suitable for use in an inverted index. Token filters normalize terms into standard forms. Elastic ships with several analyzers out of the box: the standard analyzer, the simple analyzer, the whitespace analyzer, and the language analyzers. The real power of Elastic lies in the ability to create custom analyzers suited to different purposes by combining the character filters, tokenizers, and token filters that Elastic provides out of the box.
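If you want to see exactly what an analyzer does to a piece of text, Elastic exposes an Analyze API for this, and NEST wraps it. A quick sketch (I am assuming NEST 1.x's Analyze method and its Tokens response property here, so verify against your version):
// Run the built-in standard analyzer over a sample sentence
// and print the tokens it produces.
var analysis = elastic.Analyze(a => a
    .Analyzer("standard")
    .Text("The QUICK brown fox!"));
foreach (var token in analysis.Tokens)
{
    Console.WriteLine(token.Token); // expected: "the", "quick", "brown", "fox"
}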
We will construct a custom analyzer from the "keyword" tokenizer and the "lowercase" token filter. The keyword tokenizer emits the entire input as a single token, and the lowercase token filter normalizes token text to lowercase. This custom analyzer is useful when the whole input (rather than its individual terms) should be matched in a case-insensitive way. To define our custom analyzer, we modify the index creation code in the following way:
var indexSettings = new IndexSettings();
var customAnalyzer = new CustomAnalyzer();
customAnalyzer.Tokenizer = "keyword";
customAnalyzer.Filter = new List<string>();
customAnalyzer.Filter.Add("lowercase");
indexSettings.Analysis.Analyzers.Add("custom_lowercase_analyzer", customAnalyzer);
var analyzerRes = elastic.CreateIndex(ci => ci
    .Index("my_third_index")
    .InitializeUsing(indexSettings)
    .AddMapping<BlogPost>(m => m.MapFromAttributes())
    .AddMapping<Author>(m => m.MapFromAttributes()));
Console.WriteLine(analyzerRes.RequestInformation.Success);
We also changed the mapping of BlogPost in order to use our custom analyzer:
[ElasticType(IdProperty = "Id", Name = "blog_post")]
public class BlogPost
{
    [ElasticProperty(Name = "_id", Index = FieldIndexOption.NotAnalyzed, Type = FieldType.String)]
    public Guid? Id { get; set; }

    [ElasticProperty(Name = "title", Index = FieldIndexOption.Analyzed, Analyzer = "custom_lowercase_analyzer")]
    public string Title { get; set; }

    [ElasticProperty(Name = "body", Index = FieldIndexOption.Analyzed, Type = FieldType.String)]
    public string Body { get; set; }

    [ElasticProperty(Name = "author", Index = FieldIndexOption.Analyzed, Type = FieldType.Nested)]
    public Author Author { get; set; }

    public override string ToString()
    {
        return string.Format("Id: '{0}', Title: '{1}', Body: '{2}', Author: {3}", Id, Title, Body, Author);
    }
}
Here we use "custom_lowercase_analyzer" in the mapping of the BlogPost Title property. We tell Elastic to use this analyzer, then we perform a search on the blog post title. In combination with a "wildcard" query, for example, this analyzer could be used to drive autocomplete functionality. We created a new index, so let's populate it with the following data:
var author = new Author { Id = Guid.NewGuid(), FirstName = "John", LastName = "Doe" };
var blogPosts = new[]
{
new BlogPost { Id = Guid.NewGuid(), Title = "Title 001", Body = "1" , Author = author },
new BlogPost { Id = Guid.NewGuid(), Title = "Title long 002", Body = "2" , Author = author },
new BlogPost { Id = Guid.NewGuid(), Title = "title long unique 003", Body = "3" , Author = author }
};
foreach (var blogPost in blogPosts)
{
    elastic.Index(blogPost, p => p
        .Id(blogPost.Id.ToString())
        .Refresh());
}
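As a side note, indexing documents one at a time in a loop is fine for three posts, but NEST can also index a whole collection in a single request. A sketch, assuming IndexMany() and Refresh() behave as in NEST 1.x:
// Index all posts in one request, then refresh the index
// so the documents are immediately visible to search.
elastic.IndexMany(blogPosts);
elastic.Refresh();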
Then, we create the following "wildcard" query to see our analyzer in action:
var wcRes = elastic.Search<BlogPost>(s => s
    .Query(q => q
        .Wildcard(wc => wc
            .OnField(f => f.Title).Value("*t*0*"))));
Console.WriteLine(wcRes.RequestInformation.Success);
Console.WriteLine(wcRes.Hits.Count());
foreach (var hit in wcRes.Hits)
{
    Console.WriteLine(hit.Source);
}
Now save the build and run the program. You should see the following output in the console:
True
3
Id: '116f4004-279c-4621-bf40-fe00d020aa43', Title: 'title long unique 003', Body: '3', Author: Id: 'bc855392-d9a0-4e68-a6c7-b1b90b2d18e0', First name: 'John', Last Name: 'Doe'
Id: '69f6bf05-a4d8-461c-8018-feb81c547223', Title: 'title long 002', Body: '2', Author: Id: 'bc855392-d9a0-4e68-a6c7-b1b90b2d18e0', First name: 'John', Last Name: 'Doe'
Id: '56f81fe3-d7a1-426b-b620-ee3c1c8af372', Title: 'title 001', Body: '1', Author: Id: 'bc855392-d9a0-4e68-a6c7-b1b90b2d18e0', First name: 'John', Last Name: 'Doe'
Our analyzer has done a fine job, matching the post titles while ignoring character case. Let's change the wildcard query to "*t*2*".
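Only the Value changes; the rest of the search stays the same:
var wcRes = elastic.Search<BlogPost>(s => s
    .Query(q => q
        .Wildcard(wc => wc
            .OnField(f => f.Title).Value("*t*2*"))));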
Now save the build and run the program. You should see the following output in the console:
True
1
Id: '69f6bf05-a4d8-461c-8018-feb81c547223', Title: 'title long 002', Body: '2', Author: Id: 'bc855392-d9a0-4e68-a6c7-b1b90b2d18e0', First name: 'John', Last Name: 'Doe'
Now only one title is matched: with the keyword tokenizer, each title is indexed as a single lowercase token, and only "title long 002" contains a "2" after a "t".
Using the NEST Bulk API to Migrate Documents Into a Newly Created Index
Many times in this article we have changed mappings, created a new index, and repopulated it. This is fine for learning purposes, but in a real production project you don't want to lose your old documents and repopulate an index from scratch every time you change mappings. What can we do about this? Here, document migration and the wonderful NEST Bulk API come to our rescue. You can use Bulk() to create any bulk request you'd like. Let's create a minimal example of migrating old documents into a newly created index. We have created a new index named "my_fourth_index" using the same code as in the previous example.
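For completeness, here is that index creation, identical to the previous example except for the index name (it reuses the indexSettings with our custom analyzer):
var createRes = elastic.CreateIndex(ci => ci
    .Index("my_fourth_index")
    .InitializeUsing(indexSettings)
    .AddMapping<BlogPost>(m => m.MapFromAttributes())
    .AddMapping<Author>(m => m.MapFromAttributes()));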
var local = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(local, "my_third_index");
var sourceIndex = new ElasticClient(settings);
var settings2 = new ConnectionSettings(local, "my_fourth_index");
var destinationIndex = new ElasticClient(settings2);
var scanResults = sourceIndex.Search<BlogPost>(s => s
    .Index("my_third_index")
    .From(0)
    .Size(10)
    .MatchAll()
    .SearchType(SearchType.Scan)
    .Scroll("2s"));
var results = sourceIndex.Scroll<BlogPost>("4s", scanResults.ScrollId);
while (results.Documents.Any())
{
    var request = new BulkRequest();
    request.Refresh = true;
    request.Consistency = Consistency.One;
    request.Operations = new List<IBulkOperation>();
    request.Index = "my_fourth_index";
    foreach (var document in results.Documents)
    {
        request.Operations.Add(new BulkIndexOperation<BlogPost>(document));
    }
    var response = destinationIndex.Bulk(request);
    results = sourceIndex.Scroll<BlogPost>("4s", results.ScrollId);
}
Now save the build and run the program. You can run a simple "match all" query to see that all the old documents from "my_third_index" were migrated into the new "my_fourth_index". We use a scan search with scrolling because it is the most efficient way to read a large number of documents in sequence; scan skips scoring and sorting entirely.
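A quick sketch of that verification; the hit count should equal the number of documents in the old index:
var check = destinationIndex.Search<BlogPost>(s => s
    .Index("my_fourth_index")
    .MatchAll());
Console.WriteLine(check.Hits.Count());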
That is all. Thank you for your time. The following "Reading resources" section offers even more information about Elastic and NEST.
Reading resources
Elastic: The Definitive Guide: http://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
Elastic documentation: http://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
NEST Quick start: http://nest.azurewebsites.net/nest/quick-start.html
Elastic setup on Azure: http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/
ElasticHQ plugin: http://www.elastichq.org/index.html
Fiddler web debugging proxy: http://www.telerik.com/fiddler