Adventures with Logstash Grok

  • Tue 09 May 2017
  • misc

I just got over a multi-day odyssey where I thought I was going mad, trying to get Logstash working.

For those who are unaware, Logstash, is a component of the ELK Stack (which they now want you to call Elastic Stack) which facilitates ingesting logs from arbitrary places, optionally turning them into structured data, and finally putting in Elasticsearch, a search engine based on Lucene.

Taking free-form logfiles and turning them into structured data is one of those incredible pain-in-the-butt exercises which one does not fully appreciate until one has tried to do it. Simply put, text that is intended to be easily understandable by humans is not easily understandable by computers.

Logstash has a filter called "grok" (no doubt as a nod to Heinlein), which allows one to apply a regular expression based grammar to match lines and thus parse them into name:value pairs (and then export them, likely as JSON or a similar structure) into Elasticsearch. Yup, regexes. In the words of Jamie Zawinski, Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems..

With over 30 years' experience using regexes, that wasn't going to be the long pole in the tent here, even if the grammar for writing grok filters was new to me. But you wouldn't know it from the level of frustration that I was suffering when playing with grok. Simply put, filter lines that I thought should work, were returning _grokparseerror. Even patterns which perfectly passed various different validators that various folks have put up... just plain didn't work. Usually. Different lines which were minor variants of each other either worked or didn't. Changing names sometimes seemed to help.

It's probably worth noting at this point that you can have a directory full of files that get included to establish patterns for matching - all files in this directory get included and evaluated as filters. So one might have a stanza like this:

    filter {
        grok {
        patterns_dir => ["/Users/rs/elk/logstash-5.3.0/patterns"]
        match => { "message" => ["%{PF}", "%{DOVECOT}" ] }

Since I was quoting jwz you might reasonably surmise that I could be an Emacs user, and you'd be right... which is where things start to go off the rails.

Emacs has had a behavior since time immemorial wherein it saves the previous version of, for example, foo.c as foo.c~ - particularly in the days before we used version control systems as the saved-copy-of-record, this was occasionally useful for saving one's bacon - indeed, preserving multiple old copies of a file is a tuneable parameter.

Emacs is not the only editor that does this. I'm told that Elvis (a vi clone) also has a similar behavior.

Logstash does not enforce a file extension. It hoovers in all files in that patterns directory. See where this is going?

    Macintosh-HD:logstash-5.3.0 rs$ ls -l patterns/
    total 32
    -rw-r--r--  1 rseastro  178763599  7803 May  9 10:29 postfix
    -rw-r--r--  1 rseastro  178763599  7803 May  8 23:29 postfix~
    Macintosh-HD:logstash-5.3.0 rs$ 

Yup, Logstash was reading my definitions twice. First the current saved version, then the old saved version. And since the patterns files are basically an exercise in setting variables, and the ability to redefine a variable is a feature not a bug, there were no warnings whatsoever when this happened. Reading the file multiple times, or rather reading the main file and the backup file (in alphabetical order, sigh) overwrites the values in the current file.

A reasonable request for the Logstash folks might be that they start ignoring files that end in ~ - possibly with a warning - if they insist on not enforcing filename suffixes of any sort. This would represent a policy of least astonishment. Then again, I suspect that your typical person today who gets stuck with doing logstash configuration is probably using pico or nano as an editor of choice, or maybe even notepad. Sigh.