
Setup a Central Logging Server using Splunk, syslog-ng and named pipes on Ubuntu

July 29th, 2007 cpetersen

Introduction

As you may already know, I love virtual computing. My program of choice is VMware, but regardless of what you use, I just love the concept. Virtual computing allows a small company like the Assay Depot to create many virtual computers, each handling a different aspect of the business. However, nothing is free in this world, and virtual computing comes with its own set of challenges. In this post, I hope to show how you can use Splunk, syslog-ng and named pipes to stay informed about what is happening on each of those virtual computers (it works just as well for physical computers). I use Ubuntu, so this post assumes you do too; you may have to do some interpreting if this is not the case. During this project I found this post informative: http://mysfitt.net/tutorials/splunk_fifo.php

Design

I have many Ubuntu machines: database servers, development servers, mail relays, application servers, etc. I want to monitor them using Splunk, a program that reads log files and makes them searchable and generally more manageable. Splunk can read log files from many different sources; for this post, I chose to tail files on the local file system. Since I want to manage all my machines with one instance of Splunk, my next challenge was to continuously feed the log files from those machines to one central logging server. This is where syslog-ng comes in. On one end, syslog-ng can be configured to send all system log statements to a network port. On the other end, it can be configured to receive those statements and write them to a file. In this way, our central logging server can collect the system logs of all the Ubuntu machines on our network, and Splunk can index them all.

Finally, I want to monitor services that don't write directly to the system log. I would have liked syslog-ng to simply tail the log files I'm interested in; however, either syslog-ng can't do that, or I couldn't figure it out (probably the latter). Instead, I wrote a Ruby daemon that tails a file and writes the results to a named pipe. Syslog-ng then reads from that named pipe and sends the statements over a network port to the central logging server, which writes them to a file for indexing. Simple enough; let's get to it.

Install Splunk

For this post, I've chosen the Splunk 3.0 beta. If you choose to use version 2.2.6, you can download the deb file. For the beta, I downloaded the tgz version. The following code should be executed on the machine you plan to use as your central logging server.
wget 'http://www.splunk.com/index.php/download_track?file=/3.0b2/linux32/splunk-3.0b2-19829-Linux-i686.tgz&ac=ostg-lb&wget=true&name=wget'
tar xvfz splunk-3.0b2-19829-Linux-i686.tgz
sudo mv splunk /opt
sudo cp /opt/splunk/etc/init.d/debian/splunk /etc/init.d
sudo chmod 755 /etc/init.d/splunk
sudo update-rc.d splunk start 50 2 .
sudo /etc/init.d/splunk start
That's it: we downloaded the package, expanded it into /opt, copied the init script to /etc/init.d and made it executable, told the system to start Splunk at boot, and finally started the server. Since we used the default location (/opt/splunk), we didn't have to change anything. If you want to install to a different location, Splunk's README is very informative.

Install syslog-ng (Server)

Next we need to install syslog-ng on the central logging server and configure it to accept logging statements from the clients. First, install syslog-ng using apt-get:
sudo apt-get install syslog-ng
By default, syslog-ng doesn't use DNS names, for performance reasons, but we want to turn DNS lookups on so our logs are stored in directories named by server name rather than by IP address. To turn it on, edit /etc/syslog-ng/syslog-ng.conf and change the line that says:
	use_dns(no);
to
	use_dns(yes);
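With use_dns(yes), syslog-ng looks up a name for each client's IP address, and that name is what ends up in the $HOST part of the log directory path configured below. If a client has no reverse DNS entry, syslog-ng should fall back to the IP address, so that client's logs land in a directory named after its IP. The Ruby sketch below uses the standard resolv library to preview the name a given address will get; host_dir_name is a hypothetical helper, and Ruby's lookup order (/etc/hosts first, then DNS) may not match the resolver syslog-ng uses exactly.

```ruby
require "resolv"

# Hypothetical helper: the name a reverse lookup of +ip+ returns, or the
# IP itself when no name can be found (mirroring syslog-ng's fallback).
def host_dir_name(ip)
  Resolv.getname(ip)
rescue Resolv::ResolvError, Resolv::ResolvTimeout
  ip
end

# Example, run on the logging server:
#   puts host_dir_name("192.168.1.20")
```

Run it on the logging server itself, since that is where syslog-ng performs the lookups.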
Next, you have to set up some sources and destinations. Sources are where syslog-ng will read from and are typically named with a leading "s_". Destinations are where syslog-ng will write to and are typically named with a leading "d_". Log statements connect sources and destinations. I added the following to my /etc/syslog-ng/syslog-ng.conf file.
source s_remote {
        tcp(port(5140) keep-alive(yes));
};

destination d_hosts {
        file(
                "/u01/log/hosts/$HOST/$YEAR/$MONTH/$DAY/$FACILITY"
                owner(root)
                group(root)
                perm(0644)
                dir_perm(0755)
                create_dirs(yes)
        );
};

log {
        source(s_remote);
        destination(d_hosts);
};
The source reads logging statements from TCP port 5140. The destination writes those statements to files under /u01/log/hosts, organized by host name, date and syslog facility (roughly, the kind of service that published the statement). Lastly, the log statement connects the two. Once you restart the syslog-ng service, your central logging server is ready to go:
sudo /etc/init.d/syslog-ng restart
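To make the destination template concrete, here is a small Ruby sketch that builds the path d_hosts would produce for a message. A syslog message carries a PRI number equal to facility * 8 + severity, and $FACILITY expands to the facility's name. The names below are the conventional Linux ones for the RFC 3164 facility codes, decode_pri and host_log_path are hypothetical helpers, and the sketch assumes $MONTH and $DAY are zero-padded to two digits; which timestamp syslog-ng uses (message time or receive time) depends on its configuration.

```ruby
# Conventional names for the RFC 3164 facility codes (12-15 omitted,
# since their names vary between implementations).
FACILITY_NAMES = {
  0 => "kern", 1 => "user", 2 => "mail", 3 => "daemon",
  4 => "auth", 5 => "syslog", 6 => "lpr", 7 => "news",
  8 => "uucp", 9 => "cron", 10 => "authpriv", 11 => "ftp",
  16 => "local0", 17 => "local1", 18 => "local2", 19 => "local3",
  20 => "local4", 21 => "local5", 22 => "local6", 23 => "local7"
}.freeze

# Splits a PRI value into [facility_name, severity]; unknown facility
# codes are returned as their number in string form.
def decode_pri(pri)
  facility, severity = pri.divmod(8)
  [FACILITY_NAMES.fetch(facility, facility.to_s), severity]
end

# Hypothetical helper reproducing the d_hosts template
# "/u01/log/hosts/$HOST/$YEAR/$MONTH/$DAY/$FACILITY".
def host_log_path(host, time, pri, base = "/u01/log/hosts")
  facility, _severity = decode_pri(pri)
  File.join(base, host, time.strftime("%Y"), time.strftime("%m"),
            time.strftime("%d"), facility)
end

puts host_log_path("db01", Time.utc(2007, 7, 29, 12, 0, 0), 30)
# prints /u01/log/hosts/db01/2007/07/29/daemon
```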

Install syslog-ng (Client)

Now that you have completed your central logging server, it's time to set up some clients. First (just like with the server), install syslog-ng:
sudo apt-get install syslog-ng
Next, set up a destination and a log statement in /etc/syslog-ng/syslog-ng.conf. This time you will be reading log statements from the system and writing them to a network port:
destination d_splunk {
        tcp("splunk01" port(5140));
};

log {
        source(s_all);
        destination(d_splunk);
};
The destination statement tells syslog-ng to write to TCP port 5140 on the server splunk01 (that's the name of my central logging server). The log statement connects s_all, the default source in Ubuntu's syslog-ng configuration, to your logging server. Once you restart syslog-ng on the client, your logging server should be collecting the system log messages (including kernel messages) from every client you've configured.
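To test the client-to-server leg without waiting for real log traffic, you can hand-craft a message and send it straight to the server's port. The Ruby sketch below assumes the tcp() source on splunk01 accepts newline-terminated, BSD-style (RFC 3164) syslog lines on port 5140 as configured above; syslog_line, send_syslog_line and the rlogd-test tag are made up for illustration. PRI 14 means facility user (1) and severity info (6), since 1 * 8 + 6 = 14.

```ruby
require "socket"

# Formats one BSD-style syslog line: "<PRI>Mmm dd hh:mm:ss HOST TAG: MESSAGE".
# %e pads single-digit days with a space, as RFC 3164 expects.
def syslog_line(pri, time, host, tag, message)
  "<#{pri}>#{time.strftime('%b %e %H:%M:%S')} #{host} #{tag}: #{message}\n"
end

# Sends one newline-terminated line to a TCP syslog collector.
def send_syslog_line(server, port, line)
  TCPSocket.open(server, port) { |sock| sock.write(line) }
end

line = syslog_line(14, Time.now, Socket.gethostname, "rlogd-test", "hello from the client")
# On a client that can reach the logging server:
#   send_syslog_line("splunk01", 5140, line)
```

If everything is wired up, the message should show up in that client's user facility file under /u01/log/hosts on the server.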

Log Custom Services

OK, so far we are sending all system log statements to the logging server. That's great, but we also want to log custom services that write to their own log files, such as postgres or Rails applications. Now, I'm certainly not a syslog-ng expert, but I couldn't figure out how to get syslog-ng to tail a file. I could get it to read a file once, but that didn't help much. I did, however, get syslog-ng to read from a named pipe, so I took it upon myself to write a program that tails a file into a named pipe. (A named pipe, aka a FIFO, is a tool for interprocess communication: one process writes to it like a file, while another reads from it like a file.) First, we create our named pipe:
sudo mkdir /var/syslog-ng
cd /var/syslog-ng
sudo mkfifo syslog_fifo
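If named pipes are new to you, the Ruby sketch below shows the behaviour syslog-ng will rely on: one process writes lines into a FIFO as if it were an ordinary file, and another process reads them back out in order. It uses a throwaway FIFO in a temporary directory rather than /var/syslog-ng/syslog_fifo, and it assumes Ruby 2.4 or later (for File.mkfifo and readlines(chomp: true)) on a Unix system with fork.

```ruby
require "tmpdir"

# Writes +lines+ into a throwaway FIFO from a forked child process and
# returns what the parent process read back out of it.
def fifo_roundtrip(lines)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo_fifo")
    File.mkfifo(path)                  # same effect as `mkfifo demo_fifo`
    writer = fork do
      # Opening a FIFO for writing blocks until a reader opens it.
      File.open(path, "w") { |f| lines.each { |line| f.puts(line) } }
      exit!(0)                         # skip the parent's cleanup in the child
    end
    # Reads until EOF, which arrives when the writer closes its end.
    received = File.open(path, "r") { |f| f.readlines(chomp: true) }
    Process.wait(writer)
    received
  end
end

p fifo_roundtrip(["first line", "second line"])
# prints ["first line", "second line"]
```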
I called mine /var/syslog-ng/syslog_fifo. Now let's make syslog-ng read from that FIFO and write to our network port. First, add a new source to /etc/syslog-ng/syslog-ng.conf:
source s_pipe {
        pipe( "/var/syslog-ng/syslog_fifo" );
};
Next we update the log statement from the previous section, it should now look like:
log {
        source(s_all);
        source(s_pipe);
        destination(d_splunk);
};
Now, it is reading from the s_all source as well as our new s_pipe source and writing everything to d_splunk, our logging server. You must restart syslog-ng for the changes to take effect:
sudo /etc/init.d/syslog-ng restart
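After the restart, syslog-ng should be holding the FIFO open for reading. One way to check, sketched in Ruby below, relies on a POSIX rule: opening a FIFO write-only with O_NONBLOCK fails with ENXIO when no process has it open for reading. fifo_has_reader? is a hypothetical helper; it only tells you that some reader exists, not that it is syslog-ng, and since the FIFO was created with sudo you will probably need to run it as root.

```ruby
# Hypothetical check: true if some process has the FIFO at +path+ open
# for reading. A non-blocking, write-only open of a FIFO with no reader
# fails with ENXIO instead of blocking.
def fifo_has_reader?(path)
  File.open(path, File::WRONLY | File::NONBLOCK) { true }
rescue Errno::ENXIO
  false
end

path = "/var/syslog-ng/syslog_fifo"
if File.pipe?(path)
  puts(fifo_has_reader?(path) ? "something is reading #{path}" : "nothing is reading #{path}")
else
  puts "#{path} is missing or is not a FIFO"
end
```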
Finally, we need to tail the files we are interested in and pipe them to our new fifo. To accomplish this I wrote the following daemon using Ruby:
#!/usr/bin/ruby
# rlogd: tails each file listed in /etc/rlogd.conf into the named pipe
# that syslog-ng reads from.

APP_NAME = "rlogd"

CONFIG_FILENAME = "/etc/rlogd.conf"
PID_FILENAME    = "/var/run/rlogd.pid"   # runtime state belongs in /var/run, not /etc
PIPE            = "/var/syslog-ng/syslog_fifo"

# rlogd counts as running while its PID file exists.
def status
  File.exist?(PID_FILENAME)
end

# Starts one "tail -f FILE > PIPE" process per configured file and
# records their PIDs so that stop can kill them later.
def start
  if status
    puts "#{APP_NAME} is already running... stopping"
    stop
  end
  pids = []
  File.foreach(CONFIG_FILENAME) do |line|
    file = line.strip
    next if file.empty? || file.start_with?("#")   # skip blanks and comments
    puts "tail -f #{file} > #{PIPE}"
    pid = fork do
      # exec replaces the forked child with tail itself (no shell in between),
      # so the recorded PID is tail's own PID and no grandchild lookup is needed.
      exec("tail", "-f", file, out: PIPE)
    end
    Process.detach(pid)
    pids << pid
  end
  File.write(PID_FILENAME, pids.map { |pid| "#{pid}\n" }.join)
end

# Sends TERM to every recorded tail process, then removes the PID file.
def stop
  unless status
    puts "#{APP_NAME} is not started"
    return
  end
  File.foreach(PID_FILENAME) do |line|
    pid = line.strip
    next unless pid.match?(/\A[1-9]\d*\z/)   # never pass 0 (the process group) to kill
    begin
      Process.kill("TERM", pid.to_i)
    rescue Errno::ESRCH, Errno::EPERM
      puts "Unable to kill #{pid}"
    end
  end
  File.delete(PID_FILENAME)
end

case ARGV.first
when "start"
  start
when "stop"
  stop
when "restart"
  stop
  start
when "status"
  puts(status ? "started" : "stopped")
else
  puts "Usage: #{APP_NAME} {start|stop|restart|status}"
end
The daemon reads /etc/rlogd.conf, which lists the files you want tailed, one path per line; blank lines and lines starting with # are ignored. For instance, if you wanted to monitor your postgres server, your rlogd.conf file would look something like:
#
# rlogd.rb config file
# Files added here will be tailed into /var/syslog-ng/syslog_fifo
# where they can be controlled via syslog-ng
#
/etc/postgresql/8.2/main/log
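Because each configured path is handed straight to tail -f, a typo in rlogd.conf typically shows up only as a tail process that exits right away. A quick pre-flight check, sketched in Ruby below, lists the entries that are not readable files, using the same skip rules as the daemon (blank lines and lines starting with #); unreadable_log_files is a hypothetical helper, not part of rlogd.

```ruby
# Hypothetical pre-flight check for rlogd.conf: returns every configured
# path that is not a readable regular file (symlinks are followed).
def unreadable_log_files(config_path = "/etc/rlogd.conf")
  File.readlines(config_path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .reject { |path| File.file?(path) && File.readable?(path) }
end

if File.exist?("/etc/rlogd.conf")
  unreadable_log_files.each { |path| warn "rlogd: cannot read #{path}" }
end
```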
Save the daemon as /etc/init.d/rlogd, make it executable, and start it:
sudo chmod 755 /etc/init.d/rlogd
sudo /etc/init.d/rlogd start
At this point, your client is set up. It should be sending all system log messages, as well as messages from the log files you've configured, to your logging server, where they are stored and indexed. Since I will be doing this on all of my servers, I went ahead and packaged it up as a deb file, so setting up my clients is as easy as:
sudo apt-get install syslog-ng
sudo apt-get install rlogd
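If you would rather not shell out to tail at all, the follow loop itself can be written in Ruby. The sketch below is a swapped-in alternative, not what rlogd does: it polls the file for new data once per interval, which adds up to one interval of latency, and after a log rotation it keeps reading the old, renamed file. follow and its parameters are hypothetical.

```ruby
# Polling alternative to "tail -f": starts at the current end of
# +log_path+ (like `tail -n 0 -f`) and copies each complete new line to
# +out+, any IO such as the opened FIFO. +max_polls+ only exists so the
# loop can be stopped (e.g. in tests); nil means follow forever.
def follow(log_path, out, interval: 1.0, max_polls: nil)
  File.open(log_path, "r") do |f|
    f.seek(0, IO::SEEK_END)
    pending = +""
    polls = 0
    loop do
      pending << f.read                # read with no length returns "" at EOF
      # Emit only complete lines; keep a trailing partial line for later.
      while (newline = pending.index("\n"))
        out.write(pending.slice!(0..newline))
      end
      out.flush
      polls += 1
      break if max_polls && polls >= max_polls
      sleep(interval)
    end
  end
end

# Example, as root on a client:
#   File.open("/var/syslog-ng/syslog_fifo", "w") do |fifo|
#     follow("/etc/postgresql/8.2/main/log", fifo)
#   end
```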

Configure Splunk

Your Splunk server should already be running, and your clients should be logging to it. The last step is to configure Splunk to read the log files. First, visit http://splunk01:8000/admin, where splunk01 is the name of your logging server. Next, click on "Data Inputs", then on "Files and Directories". From there, click "Add Input", set the path to /u01/log (or whatever directory you are using) and change the host setting to "Regex on path". I use the following regex:
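/u01/log/hosts/([^/]+)/
which tells Splunk to take the host name from the directory right after /u01/log/hosts/ (the part in parentheses). Matching everything up to the next slash, rather than a list of letters and digits, also covers host names that contain hyphens.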
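To preview which host name Splunk will pick out of a path, you can try the pattern in Ruby. The sketch below assumes Splunk takes the first capture group as the host and uses the pattern /u01/log/hosts/([^/]+)/, which captures everything up to the next slash, so names such as mail-relay01 also match. Ruby's regex syntax is close to, but not guaranteed identical to, the one Splunk accepts, so treat this as an approximation; host_from_path is a hypothetical helper.

```ruby
# Captures the directory name right after /u01/log/hosts/.
HOST_FROM_PATH = %r{/u01/log/hosts/([^/]+)/}

# Returns the captured host name, or nil if the path does not match.
def host_from_path(path)
  m = HOST_FROM_PATH.match(path)
  m && m[1]
end

p host_from_path("/u01/log/hosts/mail-relay01/2007/07/29/mail")  # prints "mail-relay01"
p host_from_path("/var/log/syslog")                              # prints nil
```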

Conclusion

That's it. You can now use Splunk to monitor the log files on all of your Ubuntu machines (virtual or otherwise). Additionally, once you have set up the server, setting up the clients is quite easy using the deb file. You could even publish the deb file on your own repository server and make deployment even easier.

1 Response to “Setup a Central Logging Server using Splunk, syslog-ng and named pipes on Ubuntu”

  1. 7echno7im Says:
    I just wanted to say thank you so much for explaining this! This was exactly what I was looking for, but could never find it. I really think Splunk should have similar documentation on their site, considering they own Splunk and Ubuntu has a huge footprint in the Linux world right now. If you don't mind, I may link to your site from mine? www.techtronic.us
