
6 Elasticsearch Implementation Best Practices: Lessons Learned

RDA experts determine Elasticsearch implementation best practices.

On a current project for a B2B wholesaler, the RDA development team was tasked with replacing an aging ColdFusion/SQL Server application. The client's challenge was providing relevant results across a corpus of 100,000+ products, with individualized pricing for each of thousands of customers. Customers needed to search, sort, and filter products by their own pricing and purchase history so they could quickly find the products most relevant to them with minimal effort.

While RDA evaluated many search solutions, we ultimately recommended that our client use Elasticsearch hosted by Elastic.co. This solution allows them to start small and grow the complexity and scalability as required, which gives them the confidence they need to introduce a new product into their tech stack. Over the course of this project, the RDA team has discovered several Elasticsearch implementation best practices to follow in the future. 

Elasticsearch Implementation Best Practices

What follows are some general lessons, or best practices, that the RDA team has learned over several months of working on this project.

Let Elasticsearch Generate the Document ID

The single biggest performance gain we saw in our indexing process came from letting Elasticsearch generate the document ID. It is tempting to choose a clever, meaningful ID for each document in the index. Resist the temptation: when a document arrives with an explicit ID, Elasticsearch must check whether a document with that ID already exists, and that check gets more expensive as the index grows. This can significantly slow indexing, leading to longer processing times and less up-to-date data for end users.
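As a minimal sketch, assuming only the Python standard library and a hypothetical `products` index, the helper below builds a request body for the Elasticsearch `_bulk` API in which each `index` action omits `_id`, so Elasticsearch assigns the ID itself. A meaningful identifier (the hypothetical `sku` field here) can still live in the document as an ordinary field for searching and filtering.

```python
import json
from typing import Any, Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping[str, Any]]) -> str:
    """Build an NDJSON body for the _bulk API without explicit document IDs."""
    lines = []
    for doc in docs:
        # Action line with no "_id" key: Elasticsearch generates the ID,
        # so it can skip checking whether that ID already exists.
        lines.append(json.dumps({"index": {"_index": index}}))
        # The document source follows its action line.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", [{"sku": "A-100", "name": "Widget"}])
```

The resulting string would be sent as `POST /_bulk` with a `Content-Type: application/x-ndjson` header.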

Application Performance Monitoring (APM) and Logging are Awesome, but…

Elasticsearch is more than just search. Its structured logging and APM capabilities turned out to be a huge bonus for our team when it came to monitoring and troubleshooting the application. There is no more digging through log files when the full query power of Elasticsearch can be applied to log and performance data. The but? Resist the urge to put these indexes in the same instance as your search data, because you do not want logging to compete with application search for resources. Instead, create a separate, dedicated instance for logging and APM. There is some additional cost, but keeping that load off your main search index is worth it.

Recreate the Index Each Day

One of our biggest aha moments came when we realized it was faster to build a new index every morning than to update every document in an existing index. Aliases are your friend here! We took a simple approach: a numeric day of the week is appended to the index name, and once the new index is built, the alias the application queries is reassigned to it. Any index older than two days is removed by a lifecycle rule. The bonus is that we always have yesterday's index to fall back on if a data issue arises.
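The daily rebuild can be sketched as plain request bodies. This is a simplified illustration rather than our production code: the `products` alias and `products-<day>` index names are hypothetical, and the day number comes from Python's `isoweekday()` (Monday=1 through Sunday=7). The alias swap is a single `POST /_aliases` request, so the remove and add take effect atomically, and the lifecycle rule is expressed as an index lifecycle management (ILM) policy body for `PUT _ilm/policy/<name>`.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Optional


def daily_index_name(day: date, prefix: str = "products") -> str:
    """Index name with the ISO day of the week appended (Monday=1 ... Sunday=7)."""
    return f"{prefix}-{day.isoweekday()}"


def alias_swap_body(
    alias: str, new_index: str, old_index: Optional[str] = None
) -> Dict[str, List[Dict[str, Any]]]:
    """Body for POST /_aliases; actions in one request are applied atomically."""
    actions: List[Dict[str, Any]] = []
    if old_index is not None:
        # Detach the alias from yesterday's index but keep the index as a fallback.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


def morning_swap(today: date, alias: str = "products") -> Dict[str, List[Dict[str, Any]]]:
    """Point the alias at today's freshly built index instead of yesterday's."""
    yesterday = today - timedelta(days=1)
    return alias_swap_body(alias, daily_index_name(today), daily_index_name(yesterday))


# Body for PUT _ilm/policy/<policy-name>: delete an index two days after it is
# created (min_age counts from index creation when rollover is not used).
DELETE_AFTER_TWO_DAYS: Dict[str, Any] = {
    "policy": {
        "phases": {
            "delete": {"min_age": "2d", "actions": {"delete": {}}},
        }
    }
}
```

For example, `morning_swap(date(2024, 1, 2))` moves the alias from `products-1` to `products-2`. Each daily index would also need its `index.lifecycle.name` setting pointed at the policy for the lifecycle rule to apply.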

Tune Your Batch Size

There really is no single correct batch size for inserts or updates. The key is to test and tune both the number of documents per batch and the number of concurrent requests while observing Elasticsearch performance. Document size plays a role here, and there is no substitute for testing and understanding your data. The RDA team will continue to fine-tune these parameters as the amount of data grows.
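One way to make the batch-size knob explicit is a helper that caps each batch by document count and by approximate serialized size. This is a hedged sketch: `max_docs` and `max_bytes` are hypothetical starting values to tune against your own data, not recommendations, and the size estimate ignores the bulk action lines.

```python
import json
from typing import Any, Iterable, Iterator, List, Mapping


def batches(
    docs: Iterable[Mapping[str, Any]],
    max_docs: int = 1000,
    max_bytes: int = 5_000_000,
) -> Iterator[List[Mapping[str, Any]]]:
    """Yield batches capped by document count and approximate JSON size."""
    if max_docs < 1 or max_bytes < 1:
        raise ValueError("max_docs and max_bytes must be positive")
    batch: List[Mapping[str, Any]] = []
    size = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Flush before this document would push the batch past either limit.
        if batch and (len(batch) >= max_docs or size + doc_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        # A single oversized document still goes out, alone in its own batch.
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch
```

Concurrency is a separate knob. The official Python client's `helpers.parallel_bulk` exposes `chunk_size`, `max_chunk_bytes`, and `thread_count` for the same kind of tuning.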

Plan Your Partitions

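It is imperative to understand the relationship between document size and the total number of documents, because together they determine how large your index will be. Elasticsearch divides each index into partitions called shards, and the number of primary shards is set when the index is created. Shards should max out at 50GB, so adding shards means building a new index (not an issue if you're rebuilding it every day!). If your index is going to be larger than 50GB, plan for it from the start and give yourself some breathing room.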
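The sizing arithmetic is simple enough to sketch. Assuming the 50GB ceiling above and a hypothetical 1.5x growth headroom, the number of primary shards is the estimated index size times the headroom, divided by the ceiling and rounded up. The result goes into the `number_of_shards` setting of the create-index request.

```python
import math
from typing import Any, Dict


def primary_shard_count(
    estimated_index_gb: float, max_shard_gb: float = 50.0, headroom: float = 1.5
) -> int:
    """Primary shards needed so each stays under max_shard_gb after growth."""
    if estimated_index_gb <= 0 or max_shard_gb <= 0 or headroom < 1:
        raise ValueError("sizes must be positive and headroom at least 1")
    return max(1, math.ceil(estimated_index_gb * headroom / max_shard_gb))


def create_index_body(estimated_index_gb: float) -> Dict[str, Any]:
    """Body for PUT /<index-name>; the shard count is fixed at creation time."""
    return {"settings": {"number_of_shards": primary_shard_count(estimated_index_gb)}}
```

For example, a 120GB index with 1.5x headroom needs ceil(180 / 50) = 4 primary shards.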

Choose the Right Hardware Profile

The last of these Elasticsearch implementation best practices is to choose the right hardware profile for your deployment. The RDA team initially thought storage was going to be a main concern, so we chose a Storage Optimized profile. We quickly learned that under a heavy indexing workload, CPU utilization was the bottleneck. Switching to a CPU Optimized profile made a substantial difference in total indexing time.

RDA, Search Implementation Experts

If you're just starting out with search implementation or struggling with indexing performance, we hope these Elasticsearch implementation best practices are valuable to you. Following them should make your implementation a more efficient process.

RDA has been specializing in the fine-tuning of digital strategy and technical processes for more than three decades. Our team of award-winning strategists, analysts, designers, architects, and engineers build solutions that grow businesses and deliver results. Get in touch to see how we can help!
