High server load can be caused by a number of problems, such as high CPU usage, high memory usage or excessive disk I/O.
High CPU and high memory usage are easy to spot with tools such as 'top' over SSH. Here's a typical top output...
top - 22:18:18 up  2:10,  1 user,  load average: 0.71, 0.71, 0.75
Tasks: 171 total,   1 running, 170 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.5%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   8166584k total,  2174004k used,  5992580k free,    55904k buffers
Swap:  4200888k total,        0k used,  4200888k free,   650228k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8952 root      10  -5     0    0    0 S  2.3  0.0   1:37.19 md2_resync
  813 root      10  -5     0    0    0 S  1.0  0.0   0:57.03 md2_raid1
  820 root      10  -5     0    0    0 S  0.3  0.0   0:00.45 kjournald
    1 root      15   0 10348  704  592 S  0.0  0.0   0:01.93 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/1
The load average in this case is fine, CPU usage is low and there's plenty of free memory; the swap space hasn't been touched at all. Generally, the worst load problems occur when a server runs out of physical memory and resorts to using disk swap space.
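If you'd rather not pick these figures out of the top header, the same information is available directly from the shell. For example (standard Linux commands; the exact column layout varies between distributions):

# free -m       # memory and swap usage in megabytes
# swapon -s     # per-device swap usage
# uptime        # the 1, 5 and 15 minute load averages on their own

If the swap 'used' figure keeps climbing, the server has run out of physical memory and is paging to disk, which is usually when the load average starts to spiral.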
If CPU and memory usage are both fine, the problem may lie elsewhere. One factor to check is IOWait, the proportion of time the CPU spends waiting for disk I/O to complete.
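To see IOWait as it happens, a couple of quick checks are useful (iostat, like sar, is part of the sysstat package; vmstat comes with procps and is present on virtually every Linux system):

# vmstat 5      # the 'wa' column is the percentage of CPU time spent waiting on I/O
# iostat -x 5   # extended per-device statistics; high %util or await values point at a struggling disk

Both commands print a fresh sample every five seconds until interrupted.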
To check this historically, the 'sar' command is very useful. Here's output from sar showing high IOWait usage.
# sar
Linux 2.6.18-164.11.1.el5 (server.aegishosting.co.uk)   02/25/2010

12:00:01 AM     CPU   %user   %nice   %system   %iowait   %steal   %idle
12:10:01 AM     all    0.37    0.09      0.11      0.02     0.00   99.41
12:20:01 AM     all    0.34    0.09      0.10      0.01     0.00   99.47
12:30:01 AM     all    0.34    0.06      0.11      0.02     0.00   99.47
12:40:01 AM     all    0.44    0.06      0.10      0.09     0.00   99.31
12:50:01 AM     all    0.45    0.10      0.10      0.25     0.00   99.11
01:00:01 AM     all    1.26    0.06      0.23      0.04     0.00   98.41
01:10:02 AM     all    0.65    6.28      1.06      8.50     0.00   83.51
01:20:03 AM     all    0.41    4.50      1.11     12.13     0.00   81.85
01:44:27 AM     all    0.25    1.81      0.61     22.47     0.00   74.86
01:44:28 AM     all    9.39    0.00      3.33     80.51     0.00    6.77
02:02:23 AM     all    0.47    0.12      0.19     28.66     0.00   70.57
02:02:25 AM     all   12.85    2.77      2.57     35.71     0.00   46.09
02:10:01 AM     all    0.61    0.13      0.48     20.12     0.00   78.66
02:20:02 AM     all    0.68    2.88      1.49     13.87     0.00   81.09
02:30:03 AM     all    0.45    9.63      1.11      8.11     0.00   80.71
02:40:01 AM     all    0.95    1.76      0.65      5.99     0.00   90.66
02:50:01 AM     all    0.35    0.10      0.11      0.27     0.00   99.17
03:00:01 AM     all    0.44    0.06      0.11      0.24     0.00   99.15
03:10:01 AM     all    0.34    0.06      0.10      0.38     0.00   99.12
03:20:01 AM     all    0.35    0.09      0.08      0.20     0.00   99.27
In the case above, the high IOWait figures occurred at about 1am, which is when this particular server runs its backups. A quick check of kernel error messages with the 'dmesg' command showed problems with one of the disks in our RAID array: the intensive disk activity during the backups was showing up as high IOWait. We swapped out the faulty disk before it failed completely.
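The sort of checks we ran were along these lines (a sketch rather than the exact commands; /proc/mdstat applies to Linux software RAID, which is what this server uses, as the md2_resync and md2_raid1 processes in the top output suggest):

# dmesg | grep -i -E 'error|fail|ata'   # kernel messages about disk errors, resets or timeouts
# cat /proc/mdstat                      # software RAID status; a failed member is marked (F), a degraded array shows [U_]
# sar -d                                # historical per-device I/O figures, if disk statistics are being collected

Catching a degrading disk at the high-IOWait stage, rather than waiting for it to drop out of the array, makes the eventual swap a much less stressful job.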