Making Sense of Your Logs with Elasticsearch

When users report an issue, the first thing you ask is what they were doing in the application. The difficulty is that only a small fraction of users can answer that question adequately, which often leaves us unable to provide a solution. To remedy this, the first thing you can do as a developer is add logging to your application. And sometimes you need to log a lot more than just exceptions and errors, that is, the things that went wrong.

Clearly, logging everything generates a lot of data (and affects performance, but luckily our users can adjust the log level through the UI). Even when you have all the information you need to solve a problem, the task does not become easier, especially if you have to go through every log file. If the logs are located in one place, it becomes more manageable. In our case, however, logs were generated on mobile devices, and in many cases the users operating them were located remotely. Our mobile application already communicated with a web server to get data and send updates, so we used that channel to deliver log data to a centralized location.

The problem with the centralized location was that it was not actually centralized. We had different development and testing environments for different customers using our mobile application, so all we were really doing was delivering logs to a different data server for each environment. Can you imagine being the person who has to answer support calls and check log files on each of those machines?!

What Is Elasticsearch

To lift that burden, we chose Elasticsearch, a powerful, near real-time search and analytics engine. All we had to do was deliver log events to it through its RESTful API. To differentiate between log sources (environments, in our case), we stored each one in its own index. Finally, we had all the log data in one place.
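As a rough illustration of what delivering a log event over the RESTful API can look like, here is a minimal Python sketch (our own component was written for log4net, so this is not the code we ran). The host, the `logs-<environment>` index naming scheme, the field names and the `build_index_request` helper are all assumptions made for the example. The `_doc` endpoint is the one used by recent Elasticsearch versions; older releases expected a type name in its place.

```python
import json
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def build_index_request(environment, level, message, when=None):
    """Build (but do not send) a request that indexes one log event."""
    when = when or datetime.now(timezone.utc)
    # Assumption: one index per environment, named "logs-<environment>".
    index = f"logs-{environment.lower()}"
    doc = {
        "@timestamp": when.isoformat(),
        "level": level,
        "message": message,
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request("customer-a-test", "ERROR", "Sync failed")
print(req.get_method(), req.full_url)
# To actually send the event: urllib.request.urlopen(req)
```

Keeping the request construction separate from the sending makes it easy to inspect what would go over the wire before pointing the code at a real cluster.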

Those of you who have tried Elasticsearch know that using it directly is not very user-friendly. You may also be aware of Kibana, a search and analysis UI for Elasticsearch. It is easy to set up (only three steps!) and to configure to your needs. At last, we could breathe a sigh of relief knowing that our support staff was geared up with all the tools needed to answer our users’ cries for help.

You might wonder if we should really feel so relieved: is this some sort of magical fix? Unfortunately, when it comes to computers, there is no magic. Elasticsearch is built on top of Apache Lucene™, a high-performance, full-featured information retrieval (full-text search) library, and supports near real-time search and analysis. Because it is distributed, you can start with whatever capacity you have and scale out as you grow. If you are concerned about availability, just add a few extra nodes. Additionally, it’s JSON document oriented, exposes a RESTful API, and runs on both Windows and Unix. And the best part of all: it’s free!

For indexing and searching logs, the Elasticsearch site offers a bundled solution: the ELK stack, which stands for Elasticsearch, Logstash and Kibana. The log delivery component that we coded ourselves can in most cases be replaced by Logstash, a tool that collects and processes logs and then delivers them to Elasticsearch (though it is not limited to that output).

Challenges with Elasticsearch

You might reconsider choosing Elasticsearch if you are worried about any of the following:

  • There is no authentication or access control functionality in Elasticsearch.

  • No transactions on data manipulation.

  • Lower priority on backups and durability compared with other datastores.

  • Lack of mature client libraries and third-party tools.

  • Data availability is only “near real-time”: when you add new data, it takes some time for the indexes to update before that data becomes searchable.

We weren’t too concerned with the issues above, but we did come across the following bumps when we implemented Elasticsearch:

  • Delivering logged events from the data service to Elasticsearch: We implemented this as a separate custom log4net appender, so it could be switched in and out easily through configuration files. The logging service that accepts requests from the mobile application only had to re-log those events at a specific log level. That way we could add more output types if needed.

  • Some logged events did appear in real time but were displayed in the wrong order. To solve this, we had to come up with a custom field that stores the logging order.

  • The Kibana front end supports paging only on the client side. With very broad filter criteria that generate lots of results, a page could take a while to load.
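The custom ordering field from the second bullet can be sketched roughly as follows, again in Python rather than the log4net code we actually used. The sketch assumes that events end up out of order because several of them share the same millisecond timestamp. That cause, the `seq` field name, and the `make_event` and `in_logging_order` helpers are all hypothetical choices for the example.

```python
import itertools
from datetime import datetime, timezone

# Assumption: one counter per logging process (for us, per device).
_sequence = itertools.count(1)


def make_event(message, when=None):
    """Attach a timestamp and a monotonically increasing sequence number."""
    when = when or datetime.now(timezone.utc)
    return {
        # Fixed-width millisecond timestamps in one time zone sort correctly as strings.
        "@timestamp": when.isoformat(timespec="milliseconds"),
        "seq": next(_sequence),
        "message": message,
    }


def in_logging_order(events):
    """Sort by timestamp first, then by sequence number to break ties."""
    return sorted(events, key=lambda e: (e["@timestamp"], e["seq"]))


# Three events logged within the same millisecond, then received out of order.
same_instant = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [make_event(m, same_instant) for m in ("open", "save", "close")]
shuffled = [events[2], events[0], events[1]]
print([e["message"] for e in in_logging_order(shuffled)])
```

A counter like this only orders events from a single source. Events from different devices would still be compared by timestamp alone, which was enough for our support scenarios.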

Many major companies have used Elasticsearch to solve their problems. Companies like Stack Overflow, GitHub, SoundCloud and StumbleUpon all recognize that Elasticsearch is a powerful tool for searching and indexing but should be used with care. It’s great for cases where you have to make sense of huge amounts of collected data, especially logs. On the other hand, there is no single tool that solves all your problems, and none of them is perfect. But in our case, our choice was perfect!

External references:

1. http://www.elasticsearch.org/

2. http://www.elasticsearch.org/overview/kibana/

3. http://logging.apache.org/log4net/

4. http://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore

5. http://logstash.net/