Hosted ELK

The hosted ELK stack: Centralizing and managing your logs for fun and profit

You don’t look a gift horse in the mouth, especially if it’s not a horse but actually an elk. It will poke you with its antlers. Or get drunk on fermented apples and trash your backyard. That’s an occupational hazard we’re willing to accept, though, so let’s take a look anyhow.

In this post we’d like to share some of our experience with the caveats of deploying and managing the ELK stack on your own, and introduce you to the world of hosted ElasticSearch. In other words: when does it make sense to move on from managing your own ElasticSearch deployment, and what are your options once you decide to flip the switch? Having been on both sides, and knowing that every team and system has its own requirements, we wanted to share some insights to help you reach the right decision for your environment.

What’s wrong with hosting your ELK stack on your own?

When thinking about the open source software landscape, you can roughly place any project on a spectrum according to how easy or hard it actually is to use: how complex and time-consuming it is to tend to its quirks and make it do what you want. Ease of use can then be broken down into parameters like deployment and setup, getting started with the tool, integrations, UI, scalability, stability and… maintenance. Spoiler alert: those last three areas are where we got hit. So if projects like, say, Logback sit far on the easy side of the scale, ELK goes deep into the hard side.

1. The pains of scaling up

Part of Takipi’s older Kibana dashboard

The tipping point came when the product reached a phase where companies with tens or hundreds of servers were able to get full teams on board and expand the visibility into their systems. The stats flew off the charts, and so did our poor Kibana dashboard, which started to hang far too often. At this stage we were experimenting with the ELK stack mainly for BI purposes and occasional high-touch support, so it was not considered mission critical.

Lesson #1: Don’t leave your ELK stack behind when gearing up towards high scale.


2. Stability takes a tough hit

What started as an occasionally slow, hanging Kibana dashboard quickly turned into a crashing ElasticSearch. Heavy queries on indices larger than the allocated RAM caused lots of OutOfMemory errors, eventually leaving the database unresponsive. The quick fix was simply to restart it and take it easy on the queries.

Lesson #2: Watch out for those big queries when you work through your Kibana dashboard.

The real solution, though, would require tuning $ES_HEAP_SIZE, moving to stronger AWS instances and adding more RAM. Disk space was running out too, so we needed to compensate for that as well: either through cyclic logs, which means a shorter history, or simply more disk space if you don’t want to flush your DB once in a while.
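To make the heap-tuning step a little more concrete, here is a minimal sketch that applies two rules of thumb from Elastic’s documentation of the ES 1.x/2.x era: give the JVM heap no more than half of the machine’s RAM (Lucene relies on the remaining memory for the OS file-system cache), and keep the heap below roughly 32GB so the JVM can keep using compressed object pointers. The class and method names are hypothetical, and the 31GB cap is an assumed safety margin rather than an official constant.

```java
// Hypothetical helper: suggests a value for the ES_HEAP_SIZE environment variable.
final class HeapSizing {
    // Assumed safety margin below the ~32GB compressed-oops threshold.
    static final long MAX_HEAP_MB = 31L * 1024;

    private HeapSizing() {}

    /** Half of physical RAM, capped at MAX_HEAP_MB. */
    static long recommendedHeapMb(long totalRamMb) {
        if (totalRamMb <= 0) {
            throw new IllegalArgumentException("totalRamMb must be positive");
        }
        return Math.min(totalRamMb / 2, MAX_HEAP_MB);
    }

    /** Formats the value as a JVM memory size, e.g. "8192m". */
    static String esHeapSizeValue(long totalRamMb) {
        return recommendedHeapMb(totalRamMb) + "m";
    }

    public static void main(String[] args) {
        // A mid-sized instance and a large one: the cap only kicks in for the latter.
        for (long ramMb : new long[] {15 * 1024, 122 * 1024}) {
            System.out.println(ramMb + " MB RAM -> ES_HEAP_SIZE=" + esHeapSizeValue(ramMb));
        }
    }
}
```

Past the cap, extra RAM on the instance is not wasted: it goes to the file-system cache, which is one reason "more RAM" and "bigger heap" are separate knobs.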

Lesson #3: What started with an easy deployment and integration process quickly turned into an issue that requires constant monitoring and domain expertise.

3. Surprise! You’re now an ElasticSearch DBA

At this stage, especially if you’re not only using ELK for BI stats but piping all your logging data through it, it might make sense to get on the paid subscription service for dedicated support and monitoring, security and permission management capabilities. The whole shebang.

Lesson #4: When your Kibana dashboard needs to be accessed by members of different teams, you’ll probably need to set up access and user control. Patching up a solution of your own might not be the smartest way to go here.

More things to keep tabs on include upgrades, backups, and managing sharding between nodes as your ElasticSearch cluster grows and gets… well… elastic. Before you know it, you’re moonlighting as an ElasticSearch DBA, and it consumes more and more of your time. How much that matters depends, of course, on how big your dev team is and whether it makes sense to invest more time in-house.

Lesson #5: Scaling up with more nodes for your ElasticSearch cluster is relatively easy and only requires a few settings, but don’t let it get out of hand: another node is not necessarily the right solution.
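One way to see why another node isn’t automatically the answer: ElasticSearch places each shard copy on exactly one node and does not put two copies of the same shard on the same node, so an index with P primary shards and R replicas has at most P × (1 + R) shard copies to spread around. Any nodes beyond that hold nothing for that index. The sketch below is a deliberately simplified, hypothetical model of that ceiling; it ignores other indices, allocation awareness and disk watermarks.

```java
// Simplified model: how many nodes can actually receive shards of a single index.
final class ShardCeiling {
    private ShardCeiling() {}

    /** Total shard copies (primaries plus their replicas) for one index. */
    static int totalShardCopies(int primaries, int replicas) {
        if (primaries < 1 || replicas < 0) {
            throw new IllegalArgumentException("need primaries >= 1 and replicas >= 0");
        }
        return primaries * (1 + replicas);
    }

    /** Nodes that would hold no shard of this index, under this simple model. */
    static int idleNodes(int nodes, int primaries, int replicas) {
        return Math.max(0, nodes - totalShardCopies(primaries, replicas));
    }

    public static void main(String[] args) {
        // 5 primaries and 1 replica were the defaults for new indices in ES 1.x/2.x.
        int primaries = 5, replicas = 1;
        for (int nodes : new int[] {3, 10, 12}) {
            System.out.println(nodes + " nodes: " + idleNodes(nodes, primaries, replicas)
                    + " idle for this index");
        }
    }
}
```

In the ElasticSearch versions of that period, the primary shard count was fixed when an index was created, so for existing indices the ceiling could only be raised by adding replicas or reindexing.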

Enter hosted ElasticSearch services

In a nutshell: in our in-house setup we were piping the logs through Logstash to an ElasticSearch cluster running on a few AWS nodes, with a Kibana dashboard on top, and we decided to move to a hosted ElasticSearch solution. In hosted mode, ElasticSearch cluster management is taken off your shoulders and you’re free to focus on other things. The two main questions here are: will it scale, and how much will it cost?

The basic requirement is a service that hosts its servers on the same cloud provider you use day to day. So if you’re on AWS, you want a hosted ElasticSearch service that runs on AWS, which saves costs and ensures better network performance.

Pricing: Using a hosted service would cost more than the infrastructure needed to run it on your own. The upside you’re getting here is freeing your time from managing your ElasticSearch deployment, with support from experts and DBAs.

Found.no (AWS)

QBox.io (AWS, Rackspace, SoftLayer, Azure)

These two are probably the most popular solutions currently available. Found was recently acquired by Elastic, and it will be interesting to see how this affects their offering in the long run. As far as pricing goes, both services bill hourly, with tiers based on parameters like disk size (and type), memory and data retention. Pricing also depends on the region where you host your ElasticSearch cluster, and if you need longer data retention, it wouldn’t be far-fetched to assume your bill will go well over $1,000 per month.
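Since disk size and retention drive much of the bill, a back-of-the-envelope storage estimate is a useful first input when asking for quotes. The sketch below multiplies daily ingest by retention days and by the number of copies (one primary plus its replicas), then applies an index-to-raw size ratio. That ratio is an assumption: actual on-disk size depends on mappings, compression and which fields are analyzed, so it’s worth measuring on a sample of your own logs. The helper is hypothetical and the numbers in `main` are purely illustrative.

```java
// Back-of-the-envelope disk estimate for a time-based log cluster (hypothetical helper).
final class StorageEstimate {
    private StorageEstimate() {}

    /**
     * @param rawGbPerDay     log volume shipped per day
     * @param retentionDays   how long daily indices are kept before deletion
     * @param replicas        replica count per shard (each replica is a full extra copy)
     * @param indexToRawRatio assumed on-disk size relative to raw size; measure it on your data
     */
    static double requiredDiskGb(double rawGbPerDay, int retentionDays, int replicas,
                                 double indexToRawRatio) {
        return rawGbPerDay * retentionDays * (1 + replicas) * indexToRawRatio;
    }

    public static void main(String[] args) {
        // Illustrative: 10GB/day kept for 30 days with one replica, assuming a ratio of 1.0.
        System.out.println(requiredDiskGb(10, 30, 1, 1.0) + " GB");
    }
}
```

Halving retention halves the estimate, which is the same trade-off as the cyclic-logs option for a self-hosted cluster.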

With so many moving parts, it’s best to experiment with a few of the solutions and get a customized quote that reflects your needs. Installation and setup promise to be super quick, so pricing becomes the major factor in the decision. As far as experimenting goes, Found has a 14-day trial (1GB memory, 8GB storage, on 2 AWS zones), and QBox gives new accounts a $60 credit. That’s enough to get a feel for the product but probably not sufficient for a full test run, which might require some negotiation or a paid POC. The cost of switching services is pretty low, just a matter of a few configuration changes, so you can afford to experiment here; the only downside is losing some history.
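The low switching cost comes from the fact that shippers and dashboards mostly need to know the cluster’s HTTP endpoint and credentials. A minimal sketch of keeping those in configuration rather than code, using the JDK’s `java.net.http` client (Java 11+): the environment variable names `ES_URL` and `ES_USERINFO` are hypothetical, HTTP basic auth is an assumption (providers differ, so check their docs), and the request is only built here, not sent. `_cluster/health` is a standard ElasticSearch endpoint.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical client config: everything provider-specific lives in the endpoint and credentials.
final class EsEndpoint {
    final URI baseUri;
    final Optional<String> basicAuth; // "user:password", if the provider uses HTTP basic auth

    EsEndpoint(String baseUrl, String userInfo) {
        // Ensure a trailing slash so resolve() appends paths instead of replacing the last segment.
        this.baseUri = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
        this.basicAuth = Optional.ofNullable(userInfo);
    }

    /** Reads the (hypothetical) ES_URL and ES_USERINFO environment variables. */
    static EsEndpoint fromEnv() {
        String url = System.getenv().getOrDefault("ES_URL", "http://localhost:9200");
        return new EsEndpoint(url, System.getenv("ES_USERINFO"));
    }

    /** Builds, but does not send, a GET _cluster/health request against this endpoint. */
    HttpRequest clusterHealthRequest() {
        HttpRequest.Builder builder =
                HttpRequest.newBuilder(baseUri.resolve("_cluster/health")).GET();
        basicAuth.ifPresent(creds -> builder.header("Authorization", "Basic "
                + Base64.getEncoder().encodeToString(creds.getBytes(StandardCharsets.UTF_8))));
        return builder.build();
    }
}
```

Moving from a self-managed cluster to a hosted one (or between providers) then means changing two environment variables rather than redeploying code.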

But how do you even identify your requirements? QBox and Found have a few nice tutorials available; check out these two posts here and here.

Found.no: It will fetch your data for you

Logz.io (AWS)

Bonsai (AWS)

Compose (AWS, DigitalOcean, SoftLayer)

FacetFlow (Azure)

Sematext Logsene

Currently at Takipi we’re testing the waters with Logz.io, which also supports shipping logs to ElasticSearch without necessarily using Logstash. Apart from the two bigger players in this space, there are more services like Bonsai.io, Compose.io, FacetFlow and others, each providing its own management dashboard that extends Kibana as the visualization engine for ElasticSearch.
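Shipping logs without Logstash typically means the application or a lightweight shipper writes straight to ElasticSearch’s HTTP API. As a hedged sketch (Java 17), the class below builds a body for the `_bulk` endpoint: newline-delimited JSON in which each document is preceded by an action line naming its target index, with a trailing newline at the end. The daily index naming mirrors Logstash’s default `logstash-YYYY.MM.DD` pattern. The class name and the deliberately minimal JSON escaping are assumptions (a real shipper would use a JSON library), and older ElasticSearch versions also expected a `_type` in each action line.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper that builds an ElasticSearch _bulk request body (NDJSON).
final class BulkPayload {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    private BulkPayload() {}

    /** Daily index name in the style of Logstash's default, e.g. "logstash-2015.06.01". */
    static String dailyIndex(String prefix, LocalDate day) {
        return prefix + "-" + DAY.format(day);
    }

    /** Minimal JSON string escaping; a production shipper should use a JSON library. */
    static String jsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
                }
            }
        }
        return out.append('"').toString();
    }

    /** One action line plus one source line per message; every line ends with a newline. */
    static String build(String index, List<String> messages) {
        StringBuilder body = new StringBuilder();
        for (String msg : messages) {
            body.append("{\"index\":{\"_index\":").append(jsonString(index)).append("}}\n");
            body.append("{\"message\":").append(jsonString(msg)).append("}\n");
        }
        return body.toString();
    }
}
```

The resulting string would be POSTed to `<endpoint>/_bulk`; batching many log lines per request is what keeps this approach efficient compared to indexing one document per call.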

The Logz.io Kibana dashboard

Conclusion

The ELK stack is an awesome open source platform that provides a complete solution for log management. It’s easy to get started with and the eye candy is super sweet, but when it comes to managing it in the long run, things get a bit awkward. While ElasticSearch is built to scale well, the effort you need to put into it may often outweigh the benefits of using a free, open solution. That’s where you should look into getting professional services involved: swallow the pill and let a hosted ElasticSearch-as-a-Service solution ease your pain, so you can keep enjoying the benefits of centralized log management and visualization.


Some kind of monster @ OverOps, GDG Haifa lead.
  • http://blog.sematext.com Stefan Thies
  • Otis Gospodnetić

    You may want to take a peek at Logsene. The following will sound like an ad, but when you are talking about logs and ES and ELK and hosted ELK, one just shouldn’t miss Logsene – http://sematext.com/logsene

    Logsene is built for log/event data. It’s hosted ELK, except instead of just L(ogstash) you can use any log shipper (e.g. Fluentd, rsyslog, syslog-ng, Apache Flume, nxlog, etc.) It exposes an ES API for reads and writes. It works with Kibana 3 and Kibana 4 and has both of them directly integrated in the service (you can also run your own K3/K4 and point them to Logsene). It is built by Sematext, which is known for its Elasticsearch (and ELK) expertise (ES/ELK consulting, production support, and training), as well as monitoring and running other data-intensive hosted services, like SPM and Site Search Analytics. I hope this helps. Logsene’s at http://sematext.com/logsene .

    • http://www.takipi.com/ Alex Zhitnitsky

      Cool, thanks for sharing Otis! I’ve added a link to the post.

    • Robin Ersek-Obadovics

      I can also recommend NXLog: it’s scalable, high-performance and open source, and an instant download is available on its website – https://nxlog.co/products/nxlog-community-edition. Not to mention it’s multi-platform, so it can collect and process logs from Linux, Windows, Android and more.

  • Asaf Yigal

    Hi,

    Thanks for the mention on this great article and thanks for being a loyal user! I hope you don’t mind, but I just wanted to clarify a small thing in your post that was not entirely accurate.

    You wrote:

    “Using a hosted service would cost more than the infrastructure needed to run it on your own. The upside you’re getting here is freeing your time from managing your ElasticSearch deployment, with support from experts and DBAs.”

    Although using hosted ELK costs more than running an ELK deployment in-house, we at logz.io have created a full-stack ELK system that actually makes it cheaper to use our hosted service. Just as an example: shipping 10GB a day with a retention of 30 days would cost $1356 a month on found.no but less than $400 a month on logz.io.

    We found the way to save that much money through our technology’s unique functionalities:

    1.) Running as a service and not just as a hosted solution — meaning that you can ship logs from any server and not only from AWS, Google Cloud, or Microsoft Azure.

    2.) Unlimited scalability — other services are limited in this capacity, but our users can ship as much data as they want and we process it and give them access to it

    3.) A free plan — we are the only ones that offer a free tier that is worth more than $229 a month (if people were using found.no)

    Thanks again for the mention!

    (Just for full disclosure: I am the co-founder of logz.io. https://logz.io)

    — Asaf.

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Asaf! Thanks for the comment and for sharing the stats. Right, so we compared hosted vs in-house pricing of the cloud resources for managing ElasticSearch deployments and haven’t gone into direct pricing comparisons between the hosted services. We also didn’t take the human factor and the time saved into account which might shift the scales in favor of the hosted solutions.

  • John Vanderzyden

    Thank you for spotlighting Qbox, Alex. Let me know if there’s a possibility for you to do some guest blogging for us. Maybe there could be another opportunity for us to work together. Kindly send me your contact information at john@qbox.io. Thanks again!

  • Ezra Quemuel

    Hi Alex, did you guys end up sticking with logz.io?

    • Gideon

      yeah I’d also be curious to know if you stuck with logz.io as we (at Converge) are looking to move to hosted ELK (or other logging solution that suits our needs)