Do-over from scratch in Splunk Universal Forwarder


  • Fri 20 October 2017
  • misc

Lately I've been playing with Splunk, centralizing my mail server logs and getting telemetry from a Raspberry Pi 2 that I stashed at a location where I'm trying to smoke out flaky Internet connectivity.

My configuration is a little bit unusual in that I'm manually configuring the hosts where the Universal Forwarder runs rather than doing centralized management via the Splunk console. The Universal Forwarder is the process that reads /var/log and other similar directories where log files get written and sends their contents onward to the indexer and datastore.

Why am I doing this? Splunk is built around the notion that the people who are running the data collection are not the sysadmins on the systems that are generating the data. So they created a way to remotely steer what data is collected from where, via the system console.

But that's not me: I want to manage all my configuration by editing the config files locally and pushing them out to the Universal Forwarders with Ansible.
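
For the curious, here's a minimal sketch of what that push might look like. The group name, file layout under splunk/, and restart handler are my own conventions, not anything Splunk prescribes:

    ---
    # Hypothetical playbook excerpt; paths and names follow my layout.
    - hosts: forwarders
      become: true
      tasks:
        - name: Push Universal Forwarder config files
          copy:
            src: "splunk/{{ item }}"
            dest: "/opt/splunkforwarder/etc/{{ item }}"
            mode: "0600"
          with_items:
            - system/local/inputs.conf
            - system/local/outputs.conf
            - system/local/props.conf
          notify: Restart splunkforwarder
      handlers:
        - name: Restart splunkforwarder
          command: /opt/splunkforwarder/bin/splunk restart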

Since you're probably wondering, here are the files I have under management:

    /opt/splunkforwarder/etc/apps/search/metadata/local.meta
    /opt/splunkforwarder/etc/apps/search/local/inputs.conf
    /opt/splunkforwarder/etc/system/local/inputs.conf
    /opt/splunkforwarder/etc/system/local/outputs.conf
    /opt/splunkforwarder/etc/system/local/props.conf
    /opt/splunkforwarder/etc/system/local/server.conf
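
For flavor, here's roughly what one stanza of a system/local/inputs.conf looks like. This is a sketch only; the monitored path, sourcetype, and index are stand-ins for my real ones:

    # Sketch of an inputs.conf monitor stanza; path, sourcetype,
    # and index are illustrative stand-ins.
    [monitor:///var/log/syslog]
    sourcetype = syslog
    index = main
    disabled = false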

But did I get it right the first time? Of course not; development is an iterative process. I only discovered that I was defaulting to the time of import rather than the JSON timestamp field for the event timestamp after I had an outage. Hard to actually be sending the ping data to "the cloud" when the Internet is down, right? So everything buffered during the outage arrived stamped with the time the link came back, not the time it was recorded.
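
The fix belongs in props.conf. Here's a minimal sketch of the idea, assuming the telemetry arrives as JSON with an epoch-seconds field named time; the sourcetype name ping_telemetry and the field name are stand-ins for whatever yours are:

    # Hypothetical stanza: sourcetype and field name are stand-ins.
    [ping_telemetry]
    INDEXED_EXTRACTIONS = json
    TIMESTAMP_FIELDS = time
    TIME_FORMAT = %s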

So, I wanted a do-over: delete all the data that a host has sent, then send it again from scratch to see whether I got it right this time. It took some effort to figure out how to make this happen, and it's not perfect inasmuch as, according to the documentation, the deletion doesn't free up disk space, but this is a small amount of data and disk space is cheap...

First thing to do is stop the Universal Forwarder. Easy.

    root@raspberrypi:/home/pi# /opt/splunkforwarder/bin/splunk stop

Then, on the Splunk console, delete the records. Note that you'll need the "can_delete" role on the account from which you execute this command. You can find this (on Splunk 7.0.0) under Settings -> Users and Authentication -> Access Controls -> Users -> (username) -> Assign to roles.

The query to actually do the deletion is:

    host="raspberrypi" | delete

with a timespan preset of "all time".
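
Since delete only makes events unsearchable rather than reclaiming their space, a quick sanity check is to count what's left over the same "all time" span and confirm it comes back zero:

    host="raspberrypi" | stats count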

Great, now the data is out of the centralized datastore. But we can't restart the universal forwarder quite yet. It maintains a database of file hashes and offsets into every file under management (the "fishbucket") so that it doesn't send the same data twice. If we start the forwarder now, it will merely pick up where it left off. Since we're debugging our file-sending configuration, we need to give the universal forwarder amnesia so it starts over from scratch.

There are, or used to be, ways to do this using the splunk command (much like "splunk start" and "splunk stop"), but that subcommand seems to have gone away or changed in Splunk 7.0.0.

Basically, we want to get rid of that local database, or at least its contents. Fortunately, if it's deleted (the big hammer approach), the forwarder will regenerate it the next time it starts.

    root@raspberrypi:/home/pi# rm -rf /opt/splunkforwarder/var/lib/splunk/fishbucket/
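
If the big hammer makes you nervous, moving the directory aside has the same effect (the forwarder regenerates it either way) while leaving you a way back; the .bak name is just my convention:

    root@raspberrypi:/home/pi# mv /opt/splunkforwarder/var/lib/splunk/fishbucket /opt/splunkforwarder/var/lib/splunk/fishbucket.bak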

Now we can fire the universal forwarder back up.

    root@raspberrypi:/home/pi# /opt/splunkforwarder/bin/splunk start

Momentarily (or rather, as fast as the forwarder can git-r-done), you should start seeing records in the Splunk console.
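
To watch the re-sent data arrive, a search over a short recent window works; adjust the host name to match yours:

    host="raspberrypi" earliest=-15m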