In this article I’ll explain the Linux sar command and some of its many uses for tracking down possible bottlenecks on your server.
Please note that to follow along with this guide, you’ll need a VPS (Virtual Private Server) or dedicated server, since that is what gives you access to the server’s sar data.
What is sar?
Most of what happens on your server is going to happen when you’re not logged in and watching things yourself. This is where the System Activity Reporter (sar) can come to the rescue.
The sar command is part of the Linux sysstat package, a collection of tools for monitoring your server’s usage statistics.
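Because sar reads data that sysstat records in the background, it helps to know where those daily files live. Below is a minimal sketch, assuming the usual locations: /var/log/sa on RHEL/CentOS (the path used later in this article) and /var/log/sysstat on Debian/Ubuntu. The function name sa_file and its optional root-prefix argument are hypothetical, made up for this example.

```bash
# Sketch: print the path of the binary sar data file for a day of the month.
# Assumes RHEL/CentOS-style /var/log/sa or Debian/Ubuntu-style /var/log/sysstat.
# The optional second argument is a hypothetical root prefix (handy for testing).
sa_file() {
  local day root dir
  day=$(printf '%02d' "${1#0}")   # "6" and "06" both become "06"
  root=${2:-}
  for dir in "$root/var/log/sa" "$root/var/log/sysstat"; do
    if [ -f "$dir/sa$day" ]; then
      echo "$dir/sa$day"
      return 0
    fi
  done
  echo "no sar data file found for day $day" >&2
  return 1
}
```

For example, sar -q -f "$(sa_file 6)" would read the run queue data for the 6th of the month instead of today’s file.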
What can I do with sar data?
The most basic thing you can do with the sar command is run it by itself, which shows the default view of the data that has been logged.
sar
So if you run this command alone:
sar
You’ll get output back like this:
Linux 2.6.32-279.22.1.el6.x86_64 (elite.inmotionhosting.com) 03/06/2013 _x86_64_ (8 CPU)
12:00:04 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all      0.06      0.08      0.04      0.40      0.00     99.42
12:20:01 AM       all      0.04      0.10      0.04      0.26      0.00     99.55
12:30:01 AM       all      0.04      0.09      0.03      0.02      0.00     99.82
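This shows the current day’s CPU statistics (the percentage of time spent in user, nice, system, I/O wait, steal, and idle states), which on their own usually aren’t very helpful for spotting a bottleneck. However, sar has many other features built in that can come in very handy.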
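If you do want to scan the default CPU report, a small filter can pick out the busy intervals. This is a minimal sketch, assuming the 12-hour layout shown above (time, AM/PM, CPU, %user, %nice, %system, %iowait, %steal, %idle); other sysstat versions or locales may move columns. The function name busy_cpu_intervals and the default threshold of 90% idle are hypothetical choices.

```bash
# Sketch: read default `sar` CPU output on stdin and print intervals where
# %idle (field 9) fell below a threshold, along with %iowait (field 7).
# Only rows whose CPU column is "all" are considered, which skips the
# header, the "Linux ..." banner, and the "Average:" summary line.
busy_cpu_intervals() {
  local threshold=${1:-90}
  awk -v t="$threshold" '$3 == "all" && $9 + 0 < t + 0 {
    print $1, $2, "idle=" $9 "%", "iowait=" $7 "%"
  }'
}
```

For example, sar | busy_cpu_intervals 95 would list today’s intervals where the CPUs were less than 95% idle.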
sar -q
When you use the -q flag, sar reports the run queue length (runq-sz), the number of tasks in the process list (plist-sz), and the 1, 5, and 15 minute load averages (ldavg-1, ldavg-5, ldavg-15), which is much more helpful for spotting possible server bottlenecks.
So if you run this command:
sar -q
You’ll get output back like this:
Linux 2.6.32-279.22.1.el6.x86_64 (elite.inmotionhosting.com) 03/06/2013 _x86_64_ (8 CPU)
12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
12:02:01 AM         2       260      0.75      0.53      0.67
12:04:01 AM         1       255      0.96      0.75      0.73
12:06:01 AM         2       254      0.31      0.57      0.67
12:08:01 AM         3       269      0.35      0.53      0.63
12:10:01 AM         5       269      7.30      2.25      1.20
In this example, you can see that at 12:02 AM our 8-core server had a 1-minute load average (the ldavg-1 column) of 0.75. By 12:10 AM this had spiked to 7.30. On an 8-core server, a load average of about 8 roughly means every core has work to do, so something pushed the server close to full utilization of its resources.
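The reasoning above (compare ldavg-1 to the number of cores) can be automated. This is a minimal sketch, assuming the 12-hour sar -q layout shown above (time, AM/PM, runq-sz, plist-sz, ldavg-1, ...) and, when no core count is passed, GNU coreutils’ nproc. The function name load_per_core is hypothetical.

```bash
# Sketch: read `sar -q` output on stdin and print each sample's 1-minute
# load average as a whole-number percentage of the CPU core count.
# Rows are kept only when field 5 is numeric, which drops the header and
# the "Linux ..." banner; the "Average:" line is skipped explicitly.
load_per_core() {
  local cores=${1:-$(nproc)}
  awk -v c="$cores" '$1 != "Average:" && $5 ~ /^[0-9.]+$/ {
    printf "%s %s ldavg-1=%s (%.0f%% of %d cores)\n", $1, $2, $5, 100 * $5 / c, c
  }'
}
```

Running sar -q | load_per_core 8 against the sample above would report the 12:10:01 AM reading of 7.30 as 91% of 8 cores. Passing the core count explicitly is useful when you copy sar files from a different machine.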
More advanced sar usage
In another article, on advanced server load monitoring, I explain how you can use the following one-liner to list every sample in your saved sar logs where the 1-minute load average was 1 or higher:
for log in /var/log/sa/sa[0-9]*; do echo "$log"; sar -q -f "$log" | grep -Ev "Average|ldavg" | awk '{ if ($5+0 >= 1) print $1, $2, $5 }'; echo ""; done | less
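If the one-liner above is hard to adapt, here is a longer, commented sketch of the same idea with the threshold and log directory made configurable. It assumes the 12-hour sar -q layout shown earlier, where ldavg-1 is the fifth field. The function names load_spikes and report_all_days, and the SA_DIR and THRESHOLD variables, are hypothetical names chosen for this example.

```bash
# Filter `sar -q` text on stdin: print time, AM/PM and ldavg-1 for every
# sample whose 1-minute load average is at or above the threshold.
load_spikes() {
  awk -v t="${1:-1}" '$1 != "Average:" && $5 ~ /^[0-9.]+$/ && $5 + 0 >= t + 0 {
    print $1, $2, $5
  }'
}

# Loop over every daily binary sar file and report the spikes in each one.
report_all_days() {
  local sa_dir=${SA_DIR:-/var/log/sa} threshold=${THRESHOLD:-1} log
  for log in "$sa_dir"/sa[0-9]*; do
    [ -f "$log" ] || continue   # the glob matched nothing
    echo "$log"
    sar -q -f "$log" | load_spikes "$threshold"
    echo
  done
}
```

After sourcing it, THRESHOLD=4 report_all_days | less would show only the samples with a 1-minute load of 4 or more, which is a more useful cutoff on a multi-core server than 1.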
Another article I wrote, on how to determine the cause of a server usage spike, also uses sar data to find when your server’s load spiked, and then shows how to check your Apache access logs from that same time to see which requests might have caused the spike.
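As a rough illustration of that correlation step (a minimal sketch, not the method from that article), you can turn a sar timestamp into the minute-level prefix Apache writes with its common/combined log format, for example [06/Mar/2013:00:10:01 -0500], and count the requests logged in that minute. This assumes sar and Apache both log in the server’s local time zone. The function names apache_minute and requests_in_minute are hypothetical, and the access log path depends on how Apache is configured on your server.

```bash
# Convert a 12-hour sar time ("12:10:01 AM") plus an Apache-style date
# ("06/Mar/2013") into "06/Mar/2013:00:10", the 24-hour minute prefix.
apache_minute() {
  local hh mm ampm
  hh=${1%%:*}; mm=${1#*:}; mm=${mm%%:*}; ampm=${1##* }
  hh=${hh#0}                                   # avoid octal parsing of "08"/"09"
  if [ "$ampm" = "AM" ] && [ "$hh" -eq 12 ]; then hh=0; fi
  if [ "$ampm" = "PM" ] && [ "$hh" -ne 12 ]; then hh=$((hh + 12)); fi
  printf '%s:%02d:%s\n' "$2" "$hh" "$mm"
}

# Count access-log lines whose timestamp falls in that minute.
requests_in_minute() {  # usage: requests_in_minute TIME DATE LOGFILE
  grep -cF "[$(apache_minute "$1" "$2"):" "$3"
}
```

For the spike shown earlier, the sar header date 03/06/2013 corresponds to 06/Mar/2013 in Apache’s format, so requests_in_minute "12:10:01 AM" 06/Mar/2013 /path/to/access_log would count the requests logged between 00:10:00 and 00:10:59.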