Wednesday, April 02, 2014

Slimming down logstash

Following on from my previous post on logstash, I rapidly found that the elasticsearch indices grow rather large.

After a very quick look, it was obvious that some of the fields I was keeping were redundant or unnecessary.

For example, why keep the pathname of the log file itself? It doesn't change over time, and you can easily work out the name of the file if you ever want it. I can't see why you ever would: if you want to identify a source, that ought to be some other piece of data you create.

Also, why keep the full log message? You've parsed it, broken it up, and stored the individual fields you're interested in. So why keep the whole thing, a duplicate of the information you're already storing?

With that in mind, I used a mutate filter to remove the file name (the path field) and the original log entry (the message field), like so:

  mutate {
    # drop the source file path and the raw, unparsed log line
    remove_field => ["path", "message"]
  }


After this simple change, the daily elasticsearch indices on the first system I tried this on shrank from 4.5GB to 1.6GB - almost a factor of 3. Definitely worthwhile, and there are benefits in terms of network traffic, search performance, elasticsearch memory utilization, and capacity for future growth as well.
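
One caveat worth sketching: if the parsing step fails on a line, the message field is the only copy of that entry, so dropping it unconditionally loses the data. The fragment below is a minimal sketch of one way to guard against that, not necessarily how the setup above is configured. It assumes the parsing is done with a grok filter, and the COMBINEDAPACHELOG pattern is only a stand-in for whatever pattern you actually use. It relies on grok adding the _grokparsefailure tag to events it cannot match.

  filter {
    grok {
      # stand-in pattern (an assumption); substitute your own
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
    if "_grokparsefailure" not in [tags] {
      # parsed cleanly, so the raw line and the file path are redundant
      mutate {
        remove_field => ["path", "message"]
      }
    }
  }

Events that fail to parse keep their full message, so you can still see what went wrong and fix the pattern; they should be a small fraction of the total, so most of the space saving is retained.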
