Saturday, January 30, 2016

Aliases of aliases: myth or reality?

The dangers of aliases of aliases in Elasticsearch

This post introduces Elasticsearch aliases and explains the danger of misusing them.

The aliases

In Elasticsearch, aliases provide a way to give alternative names to an index or a group of indexes. This makes it easy to search over multiple indexes as if they were a single one. Note that indexing a document through an alias only works when the alias points to a single index.

Aliases can also be used in your application code in place of the index name. This lets you change the underlying index without modifying the source and makes zero-downtime reindexing possible. With the addition of filters, aliases can give the impression that different small communities each have their own index, by filtering the retrieved documents on some user identifier, while in fact a single index is used.
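As a rough illustration of the zero-downtime idea, the sketch below builds the body of a POST _aliases request that moves an alias from an old index to a new one. The remove and add actions are sent in the same call, so the switch happens in one step. The names users, users_v1 and users_v2 are hypothetical and only serve the example.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build the body of a POST _aliases request that moves `alias`
    from `old_index` to `new_index` in a single call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application only ever talks to "users".
body = swap_alias_actions("users", "users_v1", "users_v2")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST _aliases request once the new index has been filled.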

Aliases of aliases

Going further, you could imagine many use cases where aliases of aliases would provide a way to group several aliases and add extra filters to them. As an example, in the shared index case, you might want an alias of aliases that refers to three specific communities and adds a filter to match only the documents after a specific date.

Does it work? Let's try it! Launch a local Elasticsearch instance and run the following requests in Sense:

 DELETE index
 POST index
 {
     "mappings": {
         "user": {
             "properties": {
                 "name": {
                     "type": "string"
                 }
             }
         }
     }
 }
 
 POST _aliases
 {
     "actions": [
        {
           "add": {
              "index": "index",
              "alias": "alias1",
              "filter" : { "term" : { "name" : "helain" } }
           }
        },
        {
           "add": {
              "index": "index",
              "alias": "alias2",
              "filter" : { "term" : { "name" : "nialeh" } }
           }
        }
     ]
 }

 POST _aliases
 {
     "actions": [
        {
           "add": {
              "index": ["alias1","alias2"],
              "alias": "meta-alias"
           }
        }
     ]
 }


With these steps, you just created the index and two aliases of it, alias1 and alias2, each with its own filter. The last operation added an alias meta-alias covering alias1 and alias2.

Let's now see our aliases with a GET _aliases:

{
   "index": {
      "aliases": {
         "meta-alias": {},
         "alias2": {
            "filter": {
               "term": {
                  "name": "nialeh"
               }
            }
         },
         "alias1": {
            "filter": {
               "term": {
                  "name": "helain"
               }
            }
         }
      }
   }
}

From the result, we can see that our meta-alias doesn't refer to both our aliases, but only to our index. If you index some documents and try to search on meta-alias, you will see that the filters of alias1 and alias2 aren't applied to your queries.

This behavior happens because aliases aren't real indexes; a query on an alias is forwarded to the index it refers to, with the alias filter added if there is one. When we created meta-alias, the alias creation request was sent to alias1 and alias2, which forwarded it to index. Consequently, meta-alias forwards its requests to index only, and the filters of alias1 and alias2 are never applied. If you don't expect this behavior, it can result in the exposure of information you wanted filtered out.
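The resolution described above can be mimicked with a small in-memory model. This is only a toy sketch written for this post, not Elasticsearch code: the ToyCluster class and its methods are made up for the illustration, and it assumes that an alias name used as a target is simply replaced by the concrete indexes behind it.

```python
class ToyCluster:
    """Toy in-memory model of alias resolution (not Elasticsearch code)."""

    def __init__(self):
        # index name -> {alias name -> filter or None}
        self.indexes = {}

    def create_index(self, name):
        self.indexes[name] = {}

    def resolve(self, name):
        """Return the concrete indexes that `name` refers to."""
        if name in self.indexes:
            return [name]
        return [idx for idx, aliases in self.indexes.items() if name in aliases]

    def add_alias(self, targets, alias, alias_filter=None):
        """Attach `alias` to the concrete indexes behind each target."""
        for target in targets:
            for idx in self.resolve(target):
                # Only the filter given in this call is stored; a filter
                # carried by `target` itself is not copied over.
                self.indexes[idx][alias] = alias_filter


cluster = ToyCluster()
cluster.create_index("index")
cluster.add_alias(["index"], "alias1", {"term": {"name": "helain"}})
cluster.add_alias(["index"], "alias2", {"term": {"name": "nialeh"}})
cluster.add_alias(["alias1", "alias2"], "meta-alias")

print(cluster.resolve("meta-alias"))            # ['index']
print(cluster.indexes["index"]["meta-alias"])   # None: no filter attached
```

In this model, as in the GET _aliases output, meta-alias ends up attached to index with no filter of its own.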

Conclusion

Creating an alias of an alias has the same result as creating an alias directly over the underlying indexes: the filters of the intermediate aliases are not carried over.
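If the goal was for meta-alias to match what alias1 and alias2 match together, one possible workaround is to create it directly on index and spell out the combined filter yourself. The sketch below builds such a request body by wrapping the two term filters in a bool/should clause. The helper name combined_alias_action is made up for the example, and whether this exact filter fits your mapping and Elasticsearch version is an assumption to verify on your own instance.

```python
import json


def combined_alias_action(index, alias, filters):
    """Build an `add` action whose filter accepts documents matching
    at least one of `filters` (a bool/should combination)."""
    return {
        "add": {
            "index": index,
            "alias": alias,
            "filter": {"bool": {"should": list(filters)}},
        }
    }


alias1_filter = {"term": {"name": "helain"}}
alias2_filter = {"term": {"name": "nialeh"}}

body = {
    "actions": [
        combined_alias_action("index", "meta-alias", [alias1_filter, alias2_filter])
    ]
}
print(json.dumps(body, indent=2))
```

The filters are duplicated rather than inherited, so they have to be kept in sync by hand if alias1 or alias2 changes.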

Monday, January 11, 2016

Elasticsearch in 20 minutes

Want to set up an Elasticsearch instance and tinker with this technology? Here is a quick and dirty tutorial that will help you set up an Elasticsearch instance on your computer in less than twenty minutes!

What's Elasticsearch?

If you already know what Elasticsearch is, you can skip this section and go straight to the installation.

Elasticsearch (also called Elastic) is a database built on Lucene. It is becoming extremely popular because it is free and open source, and because it makes search and data analytics easy and near real-time.

While an SQL query might take minutes or even time out when the database contains too many records, Elasticsearch scales better and typically returns results in milliseconds. I once tried to compare the performance of both technologies empirically by running many equivalent queries over hundreds of thousands of documents on both databases. Elastic was 3000 times faster overall, and it also provided many advanced natural language processing features such as tokenization, n-grams, stop-word removal and many more. These make it possible to implement a Google-like full-text search engine on your website and make search more natural for your visitors.

Installing Elasticsearch

To start your journey with Elasticsearch, just download the software from the official website. When the download is over, extract the contents of the elasticsearch-x.x.x.zip archive and go into the resulting folder.

Open a command prompt in the folder and move into the bin directory. Run the elasticsearch script to launch the database.
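To check that the node answers, you can query its root URL. The small Python sketch below assumes the default HTTP address http://localhost:9200 and reads the version number from the JSON document the node returns. The function name node_version and its fetch parameter are made up for the example; fetch only exists so the HTTP call can be swapped out.

```python
import json
from urllib.request import urlopen


def node_version(url="http://localhost:9200", fetch=None):
    """Return the version reported by the node at `url`, or None if the
    node cannot be reached or does not answer with valid JSON."""
    if fetch is None:
        def fetch(target):
            # Plain HTTP GET on the root endpoint of the node.
            with urlopen(target, timeout=5) as response:
                return response.read().decode("utf-8")
    try:
        info = json.loads(fetch(url))
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not JSON.
        return None
    return info.get("version", {}).get("number")
```

Calling node_version() with no arguments while the script is running should return the version you downloaded; a None result suggests the node is not up yet or listens on another address.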


Your first Elasticsearch instance is now running! In the next tutorials, we will see how to add documents into your Elastic server and unlock the power of real-time full-text searches and analytics.


  • Tip: if you get a "Could not find any executable java binary." or "JAVA_HOME environment variable must be set" error, you should install Java and edit your environment variables.