Welcome to part three of my series on Elastic using the .NET NEST library. In part one, I covered the basics of NoSQL, Elastic and a quick installation. In part two, I covered the creation of an index, CRUD operations and simple search with paging and sorting. This time around, we'll cover a few queries (match and bool), nested types and nested queries, and how to retrieve only part of a document.
Match Query
The first type of query is a “match” query, which accepts all kinds of data (text, numeric and date), analyzes it and then constructs a Boolean clause from each fragment of input text. By default, a match query uses the “or” operator to combine those clauses into a final query. You can control how Boolean clauses are combined by setting the “operator” flag either to “or” or “and”.
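To make the “or”/“and” behaviour concrete, here is a minimal, self-contained C# sketch that simulates it in memory. It is not NEST or Elasticsearch code: the `Analyze` and `MatchQuery` helpers are hypothetical, and `Analyze` is only a rough stand-in for the standard analyzer (it just lowercases and splits on spaces). The sketch assumes C# 9 or later, because it uses top-level statements.

```csharp
using System;
using System.Linq;

// Rough stand-in for an analyzer (hypothetical): lowercase, split on spaces.
// Elasticsearch's standard analyzer does considerably more than this.
string[] Analyze(string text) =>
    text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

// Hypothetical simulation of a match query on one field: every analyzed query
// term becomes a Boolean clause, and the clauses are combined with "or" or "and".
bool MatchQuery(string field, string query, bool useAnd)
{
    var fieldTerms = Analyze(field).ToHashSet();
    var clauses = Analyze(query);
    return useAnd
        ? clauses.All(c => fieldTerms.Contains(c))   // "and": every clause must match
        : clauses.Any(c => fieldTerms.Contains(c));  // "or": one matching clause is enough
}

var titles = new[] { "test post 123", "test something 123", "read this post" };

var orHits = titles.Where(t => MatchQuery(t, "test post 123", useAnd: false)).ToList();
var andHits = titles.Where(t => MatchQuery(t, "test post 123", useAnd: true)).ToList();

Console.WriteLine($"or:  {orHits.Count} hit(s): {string.Join(" | ", orHits)}");
Console.WriteLine($"and: {andHits.Count} hit(s): {string.Join(" | ", andHits)}");
```

With “or”, all three titles match; with “and”, only “test post 123” does, which is the same pattern the Elastic queries in this section produce.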
Let’s test how a match query works. First, we will put some fresh data into our index, as follows:
var blogPosts = new[]
{
new BlogPost { Id = Guid.NewGuid(), Title = "test post 123", Body = "1" },
new BlogPost { Id = Guid.NewGuid(), Title = "test something 123", Body = "2" },
new BlogPost { Id = Guid.NewGuid(), Title = "read this post", Body = "3" }
};
foreach (var blogPost in blogPosts)
{
elastic.Index(blogPost, p => p
.Id(blogPost.Id.ToString())
.Refresh());
}
Then add the following query:
var res = elastic.Search<BlogPost>(s => s
.Query(q => q
.Match(m => m.OnField(f => f.Title).Query("test post 123"))));
Console.WriteLine(res.RequestInformation.Success);
Console.WriteLine(res.Hits.Count());
foreach (var hit in res.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
3
Id: '13efef63-ed44-48a9-8eab-1a8a4ee9fcff', Title: 'test post 123', Body: '1'
Id: '92d4159c-1c0b-4f61-8691-fb81fe9246d9', Title: 'test something 123', Body: '2'
Id: '0d5315e0-df67-4d77-9b5c-ad07fbb62951', Title: 'read this post', Body: '3'
Why were all three posts returned? Recall that, by default, a match query combines Boolean clauses using the “or” operator, so if any of those clauses is satisfied by a document, the document is returned as a hit. We searched posts by title with the input text “test post 123”, and each returned document matches some part of it. Let’s see what happens if we modify the query to use the “and” operator instead of “or”:
var res = elastic.Search<BlogPost>(s => s
.Query(q => q
.Match(m => m
.OnField(f => f.Title)
.Query("test post 123")
.Operator(Operator.And))));
Console.WriteLine(res.RequestInformation.Success);
Console.WriteLine(res.Hits.Count());
foreach (var hit in res.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
1
Id: '9ca6456f-af9a-47ac-b80e-532a60ad72e9', Title: 'test post 123', Body: '1'
Now only documents whose titles match every Boolean clause are returned, so we have a single hit. There is also a “MinimumShouldMatch” option, which specifies how many of the Boolean clauses must match for a document to be included in the results. Let’s see what happens if we modify our query to use the “MinimumShouldMatch” option and set its value to 2:
var res = elastic.Search<BlogPost>(s => s
.Query(q => q
.Match(m => m
.OnField(f => f.Title)
.Query("test post 123")
.Operator(Operator.Or)
.MinimumShouldMatch("2"))));
Console.WriteLine(res.RequestInformation.Success);
Console.WriteLine(res.Hits.Count());
foreach (var hit in res.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
2
Id: '9ca6456f-af9a-47ac-b80e-532a60ad72e9', Title: 'test post 123', Body: '1'
Id: '7a9b5193-4714-443e-9ad0-9463da95c432', Title: 'test something 123', Body: '2'
Now, two posts are returned because they match at least two fragments of input text (“test” and “123”).
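The “minimum_should_match” value does not have to be a plain number of clauses. The Elasticsearch documentation also describes negative integers (how many clauses may be missing) and percentages (rounded down). Below is a minimal C# sketch, based on my reading of those rules, that resolves the simple forms to a number of required clauses and applies it to the three test titles. `ResolveMinimumShouldMatch` and `Analyze` are hypothetical helpers, combined specs such as “3<90%” are not handled, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: resolves a minimum_should_match spec to a clause count.
// Only the simple forms are handled: "2", "-1", "75%", "-25%".
int ResolveMinimumShouldMatch(int clauseCount, string spec)
{
    spec = spec.Trim();
    int result;
    if (spec.EndsWith("%"))
    {
        int percent = int.Parse(spec.TrimEnd('%'), CultureInfo.InvariantCulture);
        // Percentage of the clause count; the cast truncates toward zero.
        int computed = (int)(clauseCount * percent / 100.0);
        // A negative percentage is the share of clauses that may be missing.
        result = percent < 0 ? clauseCount + computed : computed;
    }
    else
    {
        int value = int.Parse(spec, CultureInfo.InvariantCulture);
        // A negative integer is the number of clauses that may be missing.
        result = value < 0 ? clauseCount + value : value;
    }
    return Math.Max(0, result); // never below zero
}

// Rough analyzer stand-in (hypothetical): lowercase, split on spaces.
string[] Analyze(string text) =>
    text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

var titles = new[] { "test post 123", "test something 123", "read this post" };
var queryTerms = Analyze("test post 123");

foreach (var spec in new[] { "2", "-1", "75%", "-25%" })
{
    int required = ResolveMinimumShouldMatch(queryTerms.Length, spec);
    // A title is a hit when at least `required` query terms occur in it.
    var hits = titles.Where(t => queryTerms.Count(q => Analyze(t).Contains(q)) >= required).ToList();
    Console.WriteLine($"{spec,4}: {required} of {queryTerms.Length} terms required, {hits.Count} hit(s)");
}
```

With three query terms, “2”, “-1” and “75%” all resolve to two required terms and two hits, while “-25%” resolves to three terms and a single hit.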
There are other options that can be specified. Read more about the match query here.
Next, I will show you how to use a bool query.
Bool Query
A “bool” query combines multiple criteria with defined occurrence constraints into a Boolean combination by which documents are matched. The following occurrence constraints are available: “must” – the criteria must be matched in a document; “should” – when the bool query has no “must” clauses, one or more of the “should” criteria must be matched in a document (you can require a minimum number of matching criteria with the “minimum_should_match” option); “must_not” – the criteria must not be matched in a document.
Here is an example of a bool query in which the criteria are combined using a “must” occurrence constraint:
var boolRes1 = elastic.Search<BlogPost>(s => s
.Query(q => q
.Bool(b => b
.Must(m =>
m.Match(mt1 => mt1.OnField(f1 => f1.Title).Query("title")) &&
m.Match(mt2 => mt2.OnField(f2 => f2.Body).Query("001")))
))
.Sort(o => o.OnField(p => p.Title).Ascending()));
Console.WriteLine(boolRes1.RequestInformation.Success);
Console.WriteLine(boolRes1.Hits.Count());
foreach (var hit in boolRes1.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
1
Id: '9172efe6-d7c4-4fc0-8b40-2d85faf73381', Title: 'title 001', Body: 'This is 001 very long blog post!'
We are telling Elastic to match only those posts that have a “title” fragment in the title and a “001” fragment in the body. Only one post satisfies both of those conditions, so we have only one match. Let’s see what happens if we modify our query to use a “should” occurrence constraint instead of “must”:
var boolRes2 = elastic.Search<BlogPost>(s => s
.Query(q => q
.Bool(b => b
.Should(sh =>
sh.Match(mt1 => mt1.OnField(f1 => f1.Title).Query("title")) ||
sh.Match(mt2 => mt2.OnField(f2 => f2.Body).Query("001"))
)))
.Sort(o => o.OnField(p => p.Title).Ascending()));
Console.WriteLine(boolRes2.RequestInformation.Success);
Console.WriteLine(boolRes2.Hits.Count());
foreach (var hit in boolRes2.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
10
Id: '461c803f-022c-4cfb-af82-3ff6d022664c', Title: 'title 000', Body: 'This is 000 very long blog post!'
Id: '9172efe6-d7c4-4fc0-8b40-2d85faf73381', Title: 'title 001', Body: 'This is 001 very long blog post!'
Id: '66d6659d-c78b-4c65-ba8b-9077502371c9', Title: 'title 002', Body: 'This is 002 very long blog post!'
Id: '2342f1af-6c9e-4196-8d18-f74057170a4d', Title: 'title 003', Body: 'This is 003 very long blog post!'
Id: '6a9e9940-7033-411a-8889-a0dec0e23d6c', Title: 'title 004', Body: 'This is 004 very long blog post!'
Id: '9ea05802-de42-431d-a6f7-26706e1d8759', Title: 'title 005', Body: 'This is 005 very long blog post!'
Id: '14e4c337-e9d6-48d9-9ed2-d863d63d17b6', Title: 'title 006', Body: 'This is 006 very long blog post!'
Id: 'd07b2abf-ef1a-4aa0-bf12-08a7ae07862b', Title: 'title 007', Body: 'This is 007 very long blog post!'
Id: 'b6827f69-75a3-47b3-8fc4-efff37806892', Title: 'title 008', Body: 'This is 008 very long blog post!'
Id: '5684c48d-f244-48fd-9ee4-4c62721a35c4', Title: 'title 009', Body: 'This is 009 very long blog post!'
We are telling Elastic to match posts that have a “title” fragment in the title or a “001” fragment in the body. All ten posts have a “title” fragment in the title, so one of the criteria is satisfied for each of them, and it does not matter that only one post has a “001” fragment in its body. Because this bool query has no “must” clauses, at least one “should” criterion has to match, and we get ten matches. Let’s add “must” and “must_not” occurrence constraints into the mix:
var boolRes3 = elastic.Search<BlogPost>(s => s
.Query(q => q
.Bool(b => b
.Should(sh =>
sh.Match(mt1 => mt1.OnField(f1 => f1.Title).Query("title")) ||
sh.Match(mt2 => mt2.OnField(f2 => f2.Title).Query("001")))
.Must(ms =>
ms.Match(mt2 => mt2.OnField(f => f.Body).Query("this")))
.MustNot(mn =>
mn.Match(mt2 => mt2.OnField(f => f.Body).Query("002"))
)))
.Sort(o => o.OnField(p => p.Title).Ascending()));
Console.WriteLine(boolRes3.RequestInformation.Success);
Console.WriteLine(boolRes3.Hits.Count());
foreach (var hit in boolRes3.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
9
Id: '461c803f-022c-4cfb-af82-3ff6d022664c', Title: 'title 000', Body: 'This is 000 very long blog post!'
Id: '9172efe6-d7c4-4fc0-8b40-2d85faf73381', Title: 'title 001', Body: 'This is 001 very long blog post!'
Id: '2342f1af-6c9e-4196-8d18-f74057170a4d', Title: 'title 003', Body: 'This is 003 very long blog post!'
Id: '6a9e9940-7033-411a-8889-a0dec0e23d6c', Title: 'title 004', Body: 'This is 004 very long blog post!'
Id: '9ea05802-de42-431d-a6f7-26706e1d8759', Title: 'title 005', Body: 'This is 005 very long blog post!'
Id: '14e4c337-e9d6-48d9-9ed2-d863d63d17b6', Title: 'title 006', Body: 'This is 006 very long blog post!'
Id: 'd07b2abf-ef1a-4aa0-bf12-08a7ae07862b', Title: 'title 007', Body: 'This is 007 very long blog post!'
Id: 'b6827f69-75a3-47b3-8fc4-efff37806892', Title: 'title 008', Body: 'This is 008 very long blog post!'
Id: '5684c48d-f244-48fd-9ee4-4c62721a35c4', Title: 'title 009', Body: 'This is 009 very long blog post!'
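We are telling Elastic to match posts that have a “title” or “001” fragment in their title, a “this” fragment in their body, and no “002” fragment in their body. The “must_not” clause forbids a “002” fragment in the body, so one post drops out of the results and we have nine matches. Strictly speaking, because this query also contains a “must” clause, the “should” clauses are optional here and only influence scoring (unless you set “minimum_should_match”); every post has “title” in its title anyway, so the result is the same.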
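The filtering logic of these three bool queries can be reproduced in memory. The C# sketch below is a hypothetical simulation, not NEST code: `Analyze`, `TitleHas`, `BodyHas`, `Q` and `BoolMatches` are illustrative helpers, `Analyze` is only a rough stand-in for the standard analyzer, the ten posts are rebuilt from the titles and bodies shown in the outputs above, and scoring is ignored. It encodes the rule that “should” clauses are required only when a bool query has no “must” clause and no explicit minimum. C# 9 or later is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Rough analyzer stand-in (hypothetical): lowercase, split on non-word characters.
HashSet<string> Analyze(string text) =>
    Regex.Split(text.ToLowerInvariant(), @"\W+").Where(t => t.Length > 0).ToHashSet();

// Illustrative single-term queries on the two fields.
Func<(string Title, string Body), bool> TitleHas(string term) => p => Analyze(p.Title).Contains(term);
Func<(string Title, string Body), bool> BodyHas(string term) => p => Analyze(p.Body).Contains(term);

// Small helper to build clause arrays; Q() gives an empty array.
Func<(string Title, string Body), bool>[] Q(params Func<(string Title, string Body), bool>[] queries) => queries;

// Filtering rules of a bool query (scoring is ignored).
bool BoolMatches(
    (string Title, string Body) doc,
    Func<(string Title, string Body), bool>[] must,
    Func<(string Title, string Body), bool>[] should,
    Func<(string Title, string Body), bool>[] mustNot,
    int? minimumShouldMatch = null)
{
    if (!must.All(q => q(doc))) return false;   // every must clause has to match
    if (mustNot.Any(q => q(doc))) return false; // no must_not clause may match
    // Without must clauses at least one should clause is required;
    // with must clauses, should clauses are optional unless a minimum is set.
    int required = minimumShouldMatch ?? (must.Length == 0 && should.Length > 0 ? 1 : 0);
    return should.Count(q => q(doc)) >= required;
}

// The ten posts from part two: "title 000" ... "title 009".
var posts = Enumerable.Range(0, 10)
    .Select(i => (Title: $"title {i:000}", Body: $"This is {i:000} very long blog post!"))
    .ToList();

var mustHits = posts.Where(p => BoolMatches(p, Q(TitleHas("title"), BodyHas("001")), Q(), Q())).ToList();
var shouldHits = posts.Where(p => BoolMatches(p, Q(), Q(TitleHas("title"), BodyHas("001")), Q())).ToList();
var mixedHits = posts.Where(p => BoolMatches(
    p,
    must: Q(BodyHas("this")),
    should: Q(TitleHas("title"), TitleHas("001")),
    mustNot: Q(BodyHas("002")))).ToList();

Console.WriteLine($"must: {mustHits.Count}, should: {shouldHits.Count}, mixed: {mixedHits.Count}");
```

It reports one, ten and nine hits, which matches the three Elastic queries.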
Note: The NEST library supports combining criteria with overloaded C# operators, so we can rewrite the previous query in the following shorter form:
elastic.Search<BlogPost>(s => s
.Query(q =>
(q.Match(mt1 => mt1.OnField(f1 => f1.Title).Query("title")) ||
q.Match(mt2 => mt2.OnField(f2 => f2.Title).Query("001")))
&& (q.Match(mt2 => mt2.OnField(f => f.Body).Query("this")))
&& (!q.Match(mt2 => mt2.OnField(f => f.Body).Query("002"))))
.Sort(o => o.OnField(p => p.Title).Ascending()));
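Here, “&&” replaces “must”, “||” replaces “should” and “!” replaces “must_not”. This is a very handy feature that keeps complex queries short and readable. Strictly speaking, the shorter form makes the “||” group required, because NEST turns it into its own bool query that becomes a “must” clause; on this data both forms return the same nine posts.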
You can read more about the bool query here.
Nested Types and Nested Query
Elastic, like most NoSQL databases, lacks the ability to include “joins” in queries. However, there are widely used NoSQL patterns that compensate for the lack of joins. These are the most commonly used ones:
Multiple queries - retrieving the needed data with additional queries. NoSQL queries are typically simple and fast, so the cost of additional queries may be acceptable.
Caching/replication/non-normalized data - instead of storing only foreign keys, it's common to also store copies of some related data values. If that data changes, it must be updated in many places in the database, so this approach works better when reads are much more common than writes.
Nesting data - with document databases, it's common to put more data in a smaller number of collections. For example, in a blogging application, one might choose to store comments within the blog post document so that a single query retrieves all comments. In other words, a single document contains all the needed data for a specific task.
Now, we will see how well Elastic handles nested data. Let’s add author information as a nested type into a post. We will define the author type in a separate “Author.cs” file. Create it and enter the following code:
[ElasticType(Name = "author", IdProperty = "Id")]
public class Author
{
[ElasticProperty(Name = "_id", Index = FieldIndexOption.NotAnalyzed, Type = FieldType.String)]
public Guid? Id { get; set; }
[ElasticProperty(Name = "first_name", Index = FieldIndexOption.Analyzed, Type = FieldType.String)]
public string FirstName { get; set; }
[ElasticProperty(Name = "last_name", Index = FieldIndexOption.Analyzed, Type = FieldType.String)]
public string LastName { get; set; }
public override string ToString()
{
return string.Format("Id: '{0}', First name: '{1}', Last Name: '{2}'", Id, FirstName, LastName);
}
}
Then modify the “BlogPost.cs” file:
[ElasticType(IdProperty = "Id", Name = "blog_post")]
public class BlogPost
{
[ElasticProperty(Name = "_id", Index = FieldIndexOption.NotAnalyzed, Type = FieldType.String)]
public Guid? Id { get; set; }
[ElasticProperty(Name = "title", Index = FieldIndexOption.Analyzed, Type = FieldType.String)]
public string Title { get; set; }
[ElasticProperty(Name = "body", Index = FieldIndexOption.Analyzed, Type = FieldType.String)]
public string Body { get; set; }
[ElasticProperty(Name = "author", Index = FieldIndexOption.Analyzed, Type = FieldType.Nested)]
public Author Author { get; set; }
public override string ToString()
{
return string.Format("Id: '{0}', Title: '{1}', Body: '{2}', Author: {3}", Id, Title, Body, Author);
}
}
We added the Author property to the BlogPost class. Pay attention to the Type option of the ElasticProperty attribute: we used “FieldType.Nested” to tell Elastic that the Author property is of the nested type. Because we changed the type mappings, we also have to create a new index that uses the new mappings. Let’s create an index:
var idxRes = elastic.CreateIndex(ci => ci
.Index("my_second_index")
.AddMapping<BlogPost>(m => m.MapFromAttributes())
.AddMapping<Author>(m => m.MapFromAttributes())
);
Console.WriteLine(idxRes.RequestInformation.Success);
We should also change the default index value in the connection settings:
var local = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(local, "my_second_index");
var elastic = new ElasticClient(settings);
Now, we will use the following code to populate a new index:
var author1 = new Author { Id = Guid.NewGuid(), FirstName = "John", LastName = "Doe" };
var author2 = new Author { Id = Guid.NewGuid(), FirstName = "Notjohn", LastName = "Doe" };
var author3 = new Author { Id = Guid.NewGuid(), FirstName = "John", LastName = "Notdoe" };
var blogPosts = new[]
{
new BlogPost { Id = Guid.NewGuid(), Title = "test post 1", Body = "1" , Author = author1 },
new BlogPost { Id = Guid.NewGuid(), Title = "test post 2", Body = "2" , Author = author2 },
new BlogPost { Id = Guid.NewGuid(), Title = "test post 3", Body = "3" , Author = author3 }
};
foreach (var blogPost in blogPosts)
{
elastic.Index(blogPost, p => p
.Id(blogPost.Id.ToString())
.Refresh());
}
Now we have a new index, mappings and data. Let’s construct a nested query:
var nestedRes = elastic.Search<BlogPost>(s => s
.Query(q => q
.Nested(n => n
.Path(b => b.Author)
.Query(nq =>
nq.Match(m1 => m1.OnField(f1 => f1.Author.FirstName).Query("John")) &&
nq.Match(m2 => m2.OnField(f2 => f2.Author.LastName).Query("Doe")))
)));
Console.WriteLine(nestedRes.RequestInformation.Success);
Console.WriteLine(nestedRes.Hits.Count());
foreach (var hit in nestedRes.Hits)
{
Console.WriteLine(hit.Source);
}
Now save, build and run the program. You should see the following output in the console:
True
1
Id: '071ef56c-4395-48cc-9425-1bd67f732f7a', Title: 'test post 1', Body: '1', Author: Id: 'b9b27efd-b3e2-4d0a-bd7c-f6a7446c509d', First name: 'John', Last Name: 'Doe'
We tell Elastic to search for posts whose author has the first name “John” and the last name “Doe”. Only one author satisfies both criteria, so only his post is returned. You can read more about nested types and nested queries here and here.
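In this example every post has a single author, so a plain object mapping would give the same result. The difference appears when a document holds an array of inner objects: with an object mapping, Elasticsearch flattens the inner objects into one array per field, so a first name from one author can be matched together with a last name from another, whereas a nested mapping indexes each inner object as a separate hidden document. The C# sketch below simulates both behaviours for a hypothetical post with two authors. It is an in-memory illustration, not NEST code, it compares lowercased names instead of running analyzed match queries, and it assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// A hypothetical post with two authors; the article's posts have one each.
var authors = new[]
{
    (FirstName: "John", LastName: "Notdoe"),
    (FirstName: "Notjohn", LastName: "Doe"),
};

// Object mapping: inner objects are flattened into one array per field,
// so the link between a first name and its last name is lost.
var firstNames = authors.Select(a => a.FirstName.ToLowerInvariant()).ToArray();
var lastNames = authors.Select(a => a.LastName.ToLowerInvariant()).ToArray();
bool objectMatch = firstNames.Contains("john") && lastNames.Contains("doe");

// Nested mapping: each author is indexed as its own hidden document,
// so both conditions have to hold for the same author.
bool nestedMatch = authors.Any(a =>
    a.FirstName.ToLowerInvariant() == "john" && a.LastName.ToLowerInvariant() == "doe");

Console.WriteLine($"object mapping matches: {objectMatch}"); // cross-object match
Console.WriteLine($"nested mapping matches: {nestedMatch}");
```

The object-style check reports a match even though no single author is named John Doe, while the nested-style check does not.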
Retrieve Only Part of a Document
Sometimes only a small part of a big document is needed. In such cases, you can limit the data returned for each hit by specifying either a “Fields” or a “Source” property. With “Fields”, you list the fields to return, for example only the “Id” and “Title” properties of matching posts. Pay attention to the returned field values: they are always arrays instead of single values, except for internal metadata values. It is also possible to get “null” instead of an array, so that situation should be handled too. Here is an example query with the specified “Source” property:
var sourceRes = elastic.Search<BlogPost>(s => s
.Source(f => f.Include(p => p.Author)) // return only the author part of _source
.Query(q => q.MatchAll())
);
Console.WriteLine(sourceRes.RequestInformation.Success);
Console.WriteLine(sourceRes.Hits.Count());
foreach (var hit in sourceRes.Hits)
{
Console.WriteLine(hit.Source.Author);
}
We tell Elastic to include only the “Author” nested property in the returned source. The remaining properties of “hit.Source”, such as Title and Body, are therefore null. You can read more about fields and source filtering here, here and here.
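Conceptually, source filtering trims the stored JSON “_source” on the server before it is returned, and NEST then deserializes the trimmed document. The following C# sketch imitates include filtering on a document modelled as nested dictionaries. The `Include` helper is hypothetical and supports only exact dotted paths (no wildcards), the sample document reuses the field names from the mappings above, and C# 9 or later is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical include filter: keeps only the listed dotted paths.
Dictionary<string, object> Include(Dictionary<string, object> source, string[] paths, string prefix = "")
{
    var result = new Dictionary<string, object>();
    foreach (var (key, value) in source)
    {
        string path = prefix.Length == 0 ? key : prefix + "." + key;
        if (paths.Contains(path))
        {
            result[key] = value; // the whole field or subtree is kept
        }
        else if (value is Dictionary<string, object> child &&
                 paths.Any(p => p.StartsWith(path + ".", StringComparison.Ordinal)))
        {
            // Descend only when some included path lies below this object.
            var filtered = Include(child, paths, path);
            if (filtered.Count > 0) result[key] = filtered;
        }
    }
    return result;
}

var post = new Dictionary<string, object>
{
    ["title"] = "test post 1",
    ["body"] = "1",
    ["author"] = new Dictionary<string, object>
    {
        ["first_name"] = "John",
        ["last_name"] = "Doe",
    },
};

var onlyAuthor = Include(post, new[] { "author" });
var onlyFirstName = Include(post, new[] { "author.first_name" });

Console.WriteLine(string.Join(", ", onlyAuthor.Keys)); // top-level keys that survive
var author = (Dictionary<string, object>)onlyFirstName["author"];
Console.WriteLine(string.Join(", ", author.Keys));     // inner keys that survive
```

Including “author” keeps the whole author object, while “author.first_name” keeps only that one inner field.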
That concludes part three of the series. Stay tuned for part four, in which I'll wrap up the series by covering empty properties, dynamically constructed queries, custom analyzers, and using the Bulk API to migrate documents into newly created indexes.