Step-by-Step Guide to Using Elasticsearch in Automation
Author: Parvez Ahmad

Elasticsearch feeds data to many modules in Shiksha, so the testing of these modules relies heavily on the consistency of the data in Elasticsearch and on the way it is shown on the page.
While automation had covered many DB- and SOLR-fed pages, there was now a requirement to check the pages that are based on data in Elasticsearch. For this reason we had to introduce Elasticsearch into the existing automation suite.
What is Elasticsearch?
Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, structured and unstructured. It is well known for its simple REST APIs, distributed nature, support for schema-free JSON objects and scalability.
Methods to use Elasticsearch in Selenium using Java
Java code interacts with Elasticsearch through clients. The major types of clients are:
- Transport Client
This client connects to a node of an existing Elasticsearch cluster using the transport module (the layer that is also used for inter-node communication). Once the connection to one node is established, the 'sniff' option can be used to discover the addresses of the other nodes and use them in a round-robin way. The issue here is that this client pings the ES nodes automatically, and so it can fail when nodes are added to or removed from the cluster.
- Reactive Client
The Reactive Elasticsearch Client is a driver based on WebClient. It uses the request/response objects provided by Elasticsearch itself. Calls are operated directly on a reactive stack, rather than wrapping asynchronous responses into reactive types.
- Java High Level REST Client
It is similar to the Transport Client: it accepts the same request arguments and returns the same response objects. However, the High Level Client is guaranteed to be able to communicate with any Elasticsearch node running on the same major version and a greater or equal minor version. All of its methods are also available in both synchronous and asynchronous versions.
The Transport Client is deprecated in ES 7 and will be removed in ES 8, and the Reactive Client had limitations in handling async responses. So we went with the Java High Level REST Client instead.
The REST client is the default client for Elasticsearch, and its asynchronous calls are operated on a client-level thread pool. It was also pretty simple to implement.
Implementation in Shiksha Automation Project
The Shiksha Automation Project is a Maven project, so the first step towards using the Elasticsearch clients was to add the dependencies to the pom.xml file, which is the fundamental unit of work in Maven projects. The dependency for Elasticsearch is as follows:
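As a rough sketch, a dependency entry for the Java High Level REST Client could look like the fragment below. The group and artifact IDs are the client's published Maven coordinates; the version number is only an assumption for illustration and should be matched to the Elasticsearch version of the cluster under test.

```xml
<!-- Sketch only: the version is an assumption, align it with your cluster's version -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.10.2</version>
</dependency>
```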
Once the POM was set up, the project could use the search requests of the REST client. The SearchRequest is used for any operation that has to do with searching documents, aggregations and suggestions, and it also offers ways of requesting highlighting on the resulting documents. In general, this was going to be the one solution for all types of data extraction.
After this we needed to set up the search API call and trigger the query, which would return the desired output as a JSON object. To do this, we start with the SearchRequest class: a SearchRequest object created without arguments runs against all indices. Most search parameters are added to the SearchSourceBuilder, which offers setters for everything that goes into the search request body. We then add a match_all query to the SearchSourceBuilder, and finally we add the SearchSourceBuilder to the SearchRequest.
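To make the shape of such a request concrete, here is a minimal plain-Java sketch of the two things the SearchRequest and SearchSourceBuilder end up describing: the target path and the JSON body. It deliberately uses only the JDK so it runs without the client library on the classpath; the class and method names are hypothetical and are not taken from the Shiksha suite.

```java
/**
 * Plain-Java sketch of a match_all search request.
 * All names here are hypothetical illustrations, not project code.
 */
final class MatchAllSearchSketch {

    /** No indices means "search all indices", like a SearchRequest created without arguments. */
    static String searchPath(String... indices) {
        if (indices.length == 0) {
            return "/_search";
        }
        return "/" + String.join(",", indices) + "/_search";
    }

    /** Request body holding a match_all query and a cap on the number of returned hits. */
    static String matchAllBody(int size) {
        return "{\"size\":" + size + ",\"query\":{\"match_all\":{}}}";
    }

    public static void main(String[] args) {
        System.out.println("POST " + searchPath());
        System.out.println(matchAllBody(10));
    }
}
```

With the High Level REST Client itself, the equivalent steps would be roughly `new SearchRequest()`, `new SearchSourceBuilder().query(QueryBuilders.matchAllQuery())` and `searchRequest.source(sourceBuilder)`.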
We then create a simple response object using the basic REST Assured functions and store the response, which here is the output of the query we need results from. A snippet of the above can be seen below.
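The following is a hedged sketch of this step rather than the project's own code. To stay runnable without extra dependencies it swaps REST Assured for the JDK's built-in `java.net.http.HttpClient` (Java 11+); the base URL and the class and method names are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: send a query to the _search endpoint and keep the response as a string. */
final class SearchCallSketch {

    /** Builds a POST to the _search endpoint with the query JSON as the body. */
    static HttpRequest buildSearchRequest(String baseUrl, String queryJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    /** Sends the request and returns the raw response body. */
    static String fetchResponse(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Hypothetical local node; replace with the real cluster address.
        HttpRequest request = buildSearchRequest("http://localhost:9200",
                "{\"query\":{\"match_all\":{}}}");
        System.out.println(request.method() + " " + request.uri());
        // fetchResponse(request) needs a reachable node, so it is not called here.
    }
}
```

In REST Assured the same call would be along the lines of `given().contentType("application/json").body(queryJson).post(url).asString()`.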
The above can be better understood by the flowchart below.
Once the response string is fetched, it can be converted back into a JSONObject. Elasticsearch stores documents as JSON, so the documents or responses that come back from the above execution are in JSON format.
This avoids the unnecessary effort of manipulating the response string, as handling JSON is much easier.
Below is a sample response from Shiksha. Here I have used an aggregation with size set to 0 in the query; I did this on purpose to avoid unnecessarily sharing the documents and keys of the indexed documents.
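For illustration, a query of this kind could be built as in the sketch below. Setting size to 0 means the response carries only the aggregation buckets and no document hits. The aggregation name and field name are placeholders, not real Shiksha fields.

```java
/** Sketch: body of a terms aggregation that returns buckets only, no documents. */
final class AggregationBodySketch {

    /** "size": 0 suppresses the hits, so only the aggregation result comes back. */
    static String termsAggregationBody(String aggregationName, String field) {
        return "{\"size\":0,\"aggs\":{\"" + aggregationName
                + "\":{\"terms\":{\"field\":\"" + field + "\"}}}}";
    }

    public static void main(String[] args) {
        // "by_city" and "city" are placeholder names for illustration only.
        System.out.println(termsAggregationBody("by_city", "city"));
    }
}
```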
What was the benefit?
The client response viewer interface was to be revamped, and it is served by Elasticsearch itself. The Elasticsearch integration was therefore needed to facilitate the testing of that ongoing project. It may also help in testing other functionality later, but the idea originated with this project.
The results obtained through automation were satisfactory and reliable, and would have taken more effort to obtain manually.
This was helpful in:
- Verification of values in the documents.
- Checking the total number of documents available (for any query conditions).
- Working on a larger dataset.
- Time and effort reduction once the scripts were up and running.
Challenges during the implementation
Although there were no big challenges in working with the REST client, some challenges were still faced while introducing ES into the Shiksha Automation suite.
- One issue was building the query and passing it in the request. A primitive solution was simply to build the query as a string and pass it as the parameter of the body function. This is not an efficient solution, but it can help in the short run. The full documentation on how to build a query can be found in this link.
- The many variations of the query builders were new to us and took time to get familiar with.
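One reason the string approach is fragile is that any quote or backslash in a search value breaks the JSON unless it is escaped by hand. The sketch below shows a minimal version of that escaping for a match query; the class and method names are hypothetical, and the query builders take care of this themselves.

```java
/** Sketch: building a match query as a string, with manual JSON escaping. */
final class QueryStringSketch {

    /** Escapes the characters that would otherwise break a JSON string literal. */
    static String escapeJson(String value) {
        StringBuilder escaped = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  escaped.append("\\\""); break;
                case '\\': escaped.append("\\\\"); break;
                case '\n': escaped.append("\\n"); break;
                case '\r': escaped.append("\\r"); break;
                case '\t': escaped.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        escaped.append(String.format("\\u%04x", (int) c));
                    } else {
                        escaped.append(c);
                    }
            }
        }
        return escaped.toString();
    }

    /** Builds {"query":{"match":{field:value}}} with both parts escaped. */
    static String matchQuery(String field, String value) {
        return "{\"query\":{\"match\":{\"" + escapeJson(field) + "\":\""
                + escapeJson(value) + "\"}}}";
    }

    public static void main(String[] args) {
        // The quotes inside the value would produce invalid JSON without escaping.
        System.out.println(matchQuery("title", "B.Tech \"CSE\""));
    }
}
```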
References