Saturday, March 14, 2015

Aggregating by day-of-week or month-of-year in Elastic

Aggregations are a useful tool in Elastic (previously named Elasticsearch) for summarizing data and getting statistics over it. With aggregations, getting the average price of some products or the number of posts made each week is easy.

Still, no default aggregation exists for grouping by day-of-week or month-of-year in the current version of Elastic. I've seen and answered many questions on Stack Overflow relating to this issue.

Script aggregations can, however, solve this with the following code:

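// doc['created_time'].value is the field's timestamp in epoch milliseconds
Date date = new Date(doc['created_time'].value);
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE');
// the last expression is the script's result and becomes the bucket key
format.format(date)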
The 'EEE' pattern returns the abbreviated day of the week. Use 'HH' for the hour of day, or 'MMM' for the month of year.
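
To see what these patterns produce, here is a small standalone Java sketch that formats one fixed timestamp with each of them. The class and method names (DatePartDemo, datePart) are made up for this illustration. The locale and time zone are pinned to English and UTC as an assumption so the output is predictable; the script above uses the one-argument constructor, so on a real node the result follows the JVM's default locale and time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DatePartDemo {
    // Formats an epoch-millisecond timestamp with a SimpleDateFormat pattern.
    // Assumption: English locale and UTC are forced here for reproducibility.
    static String datePart(long epochMillis, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long created = 1426336200000L; // 2015-03-14T12:30:00Z
        System.out.println(datePart(created, "EEE")); // Sat
        System.out.println(datePart(created, "HH"));  // 12
        System.out.println(datePart(created, "MMM")); // Mar
    }
}
```

Each returned string is what would end up as a bucket key in the terms aggregation.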

Then put the script into your aggregation's JSON:
{
    "aggs": {
        "perWeekDay": {
            "terms": {
                "script": "Date date = new Date(doc['created_time'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE'); format.format(date)"
            }
        }
    }
}
And it's done.
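
If you need the same request for several date parts, the JSON can be generated instead of copied. The sketch below is one possible way to do it in plain Java; DatePartAggregation and its methods are hypothetical helpers, not part of any Elastic client. It assumes the field name and pattern contain no quotes or backslashes, since it does no JSON escaping.

```java
final class DatePartAggregation {
    // Builds the one-line script from the post for a given field and pattern.
    static String script(String field, String pattern) {
        return "Date date = new Date(doc['" + field + "'].value); "
             + "java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('" + pattern + "'); "
             + "format.format(date)";
    }

    // Wraps the script in a terms aggregation with the given name.
    // Assumption: inputs need no JSON escaping (no double quotes or backslashes).
    static String termsAggregation(String name, String field, String pattern) {
        return "{\"aggs\": {\"" + name + "\": {\"terms\": {\"script\": \""
             + script(field, pattern) + "\"}}}}";
    }

    public static void main(String[] args) {
        // Same aggregation as above, but grouped by month-of-year.
        System.out.println(termsAggregation("perMonth", "created_time", "MMM"));
    }
}
```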

Sunday, January 18, 2015

Good practices for real time search with Elasticsearch

Elasticsearch good practices: introduction

As promised in my first article, this is the start of my series of posts about good practices when using Elasticsearch. These tips will come from the Elasticsearch documentation, other blogs and my own experience.

What is Elasticsearch?

Elasticsearch (ES) is a near-real-time, scalable, open-source search engine built upon Lucene. Near real time means that indexed documents become searchable only seconds after being uploaded. Scalable means that Elasticsearch splits the information into shards and replicas that are automatically distributed between the nodes of the cluster. ES is becoming more and more popular since it is easy to use, intuitive and well-documented. Because it is REST-based, you can interact with Elasticsearch from any language able to make HTTP requests.
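
As an illustration of the REST point, here is a minimal sketch of a search request sent with Java's built-in HTTP client (Java 11 or later). The node address http://localhost:9200 and the index name "blog" are assumptions for the example, and RestSearchDemo is a made-up class name; the main method only works if a node is actually listening at that address.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RestSearchDemo {
    // Builds a POST request against the _search endpoint of one index.
    static HttpRequest searchRequest(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed local node and hypothetical index name.
        String body = "{\"query\": {\"match_all\": {}}}";
        HttpRequest request = searchRequest("http://localhost:9200", "blog", body);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```
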
After discovering Elasticsearch, I was especially amazed by the speed of the searches: because it relies on Lucene's inverted indexes, finding the relevant documents in the Big Data haystack is child's play.
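
To give an idea of why an inverted index makes lookups fast, here is a toy version in Java. It is only a sketch of the general idea (a map from each term to the ids of the documents containing it), not how Lucene actually stores its index; the class name and the naive tokenizer are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexDemo {
    // term -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters and records each lowercased term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A search is a single map lookup, whatever the number of documents.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "Elasticsearch is built upon Lucene");
        index.add(2, "Lucene uses inverted indexes");
        System.out.println(index.search("lucene"));  // [1, 2]
        System.out.println(index.search("indexes")); // [2]
    }
}
```

The search never scans the documents themselves, which is why it stays quick as the haystack grows.
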
Now that you've had a small introduction to ES, stay tuned for the next parts of the series!