Linux Training

City LinUX Training Courses

Section 16. Process Control


"All stable processes we shall predict. All unstable processes we shall control."

John von Neumann.

16. Linux process control.

The process is one of the fundamental abstractions in Unix/Linux operating systems. A process is a program in execution. It consists of the executing program code together with a set of resources: open files, internal kernel data, an address space, one or more threads of execution and a data section containing global variables. Every process running on a Linux host has a process identifier (PID).

Processes are managed by the kernel.

The command ps lists the status of the current processes.

sa101$ ps
  PID TTY          TIME CMD
24837 pts/5    00:00:00 bash
24839 pts/5    00:00:00 ps
sa101$ ps -af
UID        PID  PPID  C STIME TTY        TIME CMD
fulford   4284  3850  0 Dec04 pts/0  00:12:07 terminal
fulford   4288     1  0 Dec04 pts/0  00:00:00 dbus-launch --autolaunch efe6149
fulford   4291  4284  0 Dec04 pts/0  00:00:00 gnome-pty-helper
fulford   4391  4389  0 Dec04 pts/3  00:01:15 gv
fulford  18218  3850  0 Dec09 pts/0  00:00:02 alpine
fulford  20160 23761  1 Dec09 pts/4  00:06:25 /usr/lib/firefox-4.0/firefox-bin
fulford  21621 20160  1 Dec09 pts/4  00:05:27 /usr/lib/firefox-4.0/plugin-cont
fulford  22635  3864  0 Dec06 pts/1  00:00:00 man hier
fulford  22638 22635  0 Dec06 pts/1  00:00:00 sh -c (cd "/usr/share/man" && (e
fulford  22639 22638  0 Dec06 pts/1  00:00:00 sh -c (cd "/usr/share/man" && (e
fulford  22643 22639  0 Dec06 pts/1  00:00:00 /usr/bin/less -is
fulford  24552  4391  0 Dec09 pts/3  00:00:13 gs -sDEVICE=x11 -dTextAlphaBits=
fulford  24576  4292  0 Dec09 pts/2  00:00:00 vi proccntrl.ms
fulford  24835 24576  0 00:05 pts/2  00:00:00 script
fulford  24836 24835  0 00:05 pts/2  00:00:00 script
fulford  24841 24837  0 00:05 pts/5  00:00:00 ps -af

Note that the snapshot generated by the ps command includes ps itself.

Each process runs with a user identity (UID). When the -f flag is used, this is listed as the first field of each line of output, below a first line that gives a heading for each field.

The UID may be that of the user invoking the process, or it may be set from the file ownership when the set user id (suid) bit is present in the file permissions.
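
As a small illustration of how the suid bit appears in a listing, the sketch below sets it on a scratch file and prints the resulting permission string. The scratch file and the variable names are assumptions made for the demonstration; a real suid program would be an executable owned by the user whose identity it takes on.

```shell
#!/bin/sh
# Scratch file used only to show the notation; it is removed at the end.
demo=$(mktemp)

# 4755 = suid bit (4000) plus rwxr-xr-x (755).
chmod 4755 "$demo"

# Keep the first ten characters of the listing: file type plus permissions.
mode=$(ls -ld "$demo" | cut -c1-10)
echo "$mode"    # the "s" in place of the owner's "x" marks the suid bit

rm -f "$demo"
```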

Each process has a unique reference number called the process id and a parent process id, which identifies the process from which it was invoked.

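All user processes can be traced back to the init process, process id 1. (Kernel threads, which ps shows in square brackets, descend instead from kthreadd, process id 2.)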

sa101$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
fulford  24986 24985  0 00:22 pts/5    00:00:00 bash -i
fulford  24987 24986  0 00:22 pts/5    00:00:00 ps -f
sa101$ ps -fp 24985
UID        PID  PPID  C STIME TTY          TIME CMD
fulford  24985 24984  0 00:22 pts/2    00:00:00 script
sa101$ ps -fp 24984
UID        PID  PPID  C STIME TTY          TIME CMD
fulford  24984 24861  0 00:22 pts/2    00:00:00 script
sa101$ ps -fp 24861
UID        PID  PPID  C STIME TTY          TIME CMD
fulford  24861  4292  0 00:08 pts/2    00:00:00 vi proccntrl.ms
sa101$ ps -fp 4292
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   4292  4284  1 Dec04 pts/2    01:24:39 bash
sa101$ ps -fp 4284
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   4284  3850  0 Dec04 pts/0    00:12:10 terminal
sa101$ ps -fp 3850
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   3850  3848  0 Dec04 pts/0    00:00:00 bash
sa101$ ps -fp 3848
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   3848  3843  0 Dec04 ?        00:00:01 xterm -sb
sa101$ ps -fp 3843
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   3843  3828  0 Dec04 ?        00:01:37 wmaker --for-real
sa101$ ps -fp 3828
UID        PID  PPID  C STIME TTY          TIME CMD
fulford   3828  2269  0 Dec04 ?        00:00:00 wmaker
sa101$ ps -fp 2269
UID        PID  PPID  C STIME TTY          TIME CMD
root      2269  2259  0 Dec04 ?        00:00:00 -:0
sa101$ ps -fp 2259
UID        PID  PPID  C STIME TTY          TIME CMD
root      2259     1  0 Dec04 ?        00:00:00 /usr/bin/kdm -nodaemon
sa101$ ps -fp 1
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0  2012 ?        00:00:40 init [4]
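
The walk up the process tree shown above can be automated. The following is a minimal sketch, assuming a ps that accepts the -o and -p options (as the procps version on most Linux hosts does); ancestry is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# ancestry: print one line per process, from the given PID up through its parents.
# Hypothetical helper for illustration; defaults to the current shell.
ancestry() {
    pid=${1:-$$}
    # Stop when the parent cannot be read or when we reach PID 0 (the parent of init).
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # An empty header (name=) suppresses the heading line for that column.
        ps -o pid= -o ppid= -o user= -o args= -p "$pid" || break
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

ancestry "$$"
```

On a conventional host the last line printed should be the init process. Inside a container the chain may end earlier, at whatever process has a parent id of 0.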

16.1. Background processes.

Programs can be invoked as background processes by appending the & character to the command.

In response the shell

prints a job number (in square brackets) and the PID,

prompts for further input without waiting for the process to complete,

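keeps the job from reading STDIN from the terminal device (under job control, a background job that attempts to read from the terminal is suspended),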

does not disconnect STDOUT or STDERR from the terminal.

We can bring a job back to the foreground with the fg command,

sa101$ find / -name impossible.file.name 2>/dev/null &
[1] 25199
sa101$ ps
  PID TTY          TIME CMD
25197 pts/5    00:00:00 bash
25199 pts/5    00:00:00 find
25200 pts/5    00:00:00 ps
sa101$ jobs
[1]+  Running      find / -name impossible.file.name 2> /dev/null&
sa101$ fg
find / -name impossible.file.name 2> /dev/null

and suspend it with the terminal control character ^Z, after which the bg command restarts it in the background.

^Z
[1]+  Stopped        find / -name impossible.file.name 2> /dev/null
sa101$ bg
[1]+ find / -name impossible.file.name 2> /dev/null &
sa101$ kill %1

When a running foreground process is suspended with ^Z it stops running. It can be scheduled to run again, in the background, by issuing the bg command.
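
Job numbers are mainly an interactive convenience. Inside a script the same idea is usually expressed with the PID of the background command. The sketch below is one way to do it; sleep stands in for any long-running command and the variable names are arbitrary.

```shell
#!/bin/sh
# sleep stands in for any long-running command.
sleep 1 &
bgpid=$!        # $! expands to the PID of the most recent background command
echo "background PID: $bgpid"

# The script carries on immediately. kill -0 sends no signal and only
# tests whether the process still exists.
if kill -0 "$bgpid" 2>/dev/null; then
    running=yes
else
    running=no
fi
echo "still running: $running"

# wait blocks until the background command finishes and returns its exit status.
wait "$bgpid"
status=$?
echo "exit status: $status"
```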

16.2. Terminating a process.

A process can be prematurely terminated by sending it a signal, which the kernel delivers to the process.

The list of signals available can be obtained with the kill -l command. Details of each signal are in the man pages.

sa101$ man 7 signal
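
Given a number, kill -l prints the corresponding signal name, which is a quick way to check the numbers used in this section. A short sketch:

```shell
#!/bin/sh
# Translate the signal numbers used in this section into their names.
for n in 2 9 15; do
    name=$(kill -l "$n")
    echo "$n SIG$name"
done
```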

The signals most commonly sent by users are

SIGINT (2), the keyboard interrupt invoked by ^C,

SIGTERM (15), the default signal sent by kill, which requests an orderly termination of the process (termination of subprocesses, closing files etc.) and

SIGKILL (9), which is sent with the kill -9 <pid> command. It cannot be caught or ignored by the process and should only be used in extremis when other signals have failed.
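
The difference between SIGTERM and SIGKILL can be seen with a child process that installs a handler. This is a sketch only: the child script, its exit status of 3 and the variable names are all invented for the demonstration.

```shell
#!/bin/sh
# A child that installs a SIGTERM handler, then idles.
# The handler tidies up and exits with status 3 (an arbitrary value chosen for the demo).
child_script='trap "echo child: tidying up; exit 3" TERM; while :; do sleep 1; done'

sh -c "$child_script" &
pid=$!
sleep 1                 # give the child time to install its handler
kill -TERM "$pid"       # equivalent to kill -15, or kill with no signal named
wait "$pid"
term_status=$?
echo "SIGTERM: child exited with status $term_status"

sh -c "$child_script" &
pid=$!
sleep 1
kill -KILL "$pid"       # equivalent to kill -9; the handler never runs
wait "$pid"
kill_status=$?
echo "SIGKILL: child exited with status $kill_status"
```

After SIGTERM the child runs its handler and reports the status it chose. After SIGKILL the shell reports 128 plus the signal number, i.e. 137, and nothing is tidied up.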

Background jobs can be killed by using the job number prefixed with the character %.

sa101$ find / -name afile -print 2>/dev/null|wc&
[1] 25631
sa101$ ps
  PID TTY          TIME CMD
25621 pts/5    00:00:00 bash
25630 pts/5    00:00:00 find
25631 pts/5    00:00:00 wc
25632 pts/5    00:00:00 ps
sa101$ kill %1
sa101$ ps
  PID TTY          TIME CMD
25621 pts/5    00:00:00 bash
25635 pts/5    00:00:00 ps
[1]+  Terminated      find / -name afile -print 2> /dev/null | wc

NB. Killing the job %1 killed all the processes in the pipeline.

16.3. Looking for process hogs.

When a process occupies an excessive number of CPU cycles it is called a "process hog".

A full listing of running processes should look something like this:

sa101$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Dec04 ?        00:00:05 init [4]
root         2     0  0 Dec04 ?        00:00:00 [kthreadd]
root         3     2  0 Dec04 ?        00:00:03 [ksoftirqd/0]
root         6     2  0 Dec04 ?        00:00:00 [migration/0]
root         7     2  0 Dec04 ?        00:00:00 [cpuset]
root         8     2  0 Dec04 ?        00:00:00 [khelper]
root         9     2  0 Dec04 ?        00:00:00 [kdevtmpfs]
root        10     2  0 Dec04 ?        00:00:00 [netns]
root        11     2  0 Dec04 ?        00:00:00 [kworker/u:1]
root       472     2  0 Dec04 ?        00:00:04 [sync_supers]
root       474     2  0 Dec04 ?        00:00:00 [bdi-default]
root       476     2  0 Dec04 ?        00:00:00 [kblockd]
root       559     2  0 Dec04 ?        00:00:00 [ata_sff]
root       566     2  0 Dec04 ?        00:00:00 [khubd]
root       572     2  0 Dec04 ?        00:00:00 [md]
root       674     2  0 Dec04 ?        00:00:00 [rpciod]
root       687     2  0 Dec04 ?        00:00:00 [khungtaskd]
root       693     2  0 Dec04 ?        00:00:12 [kswapd0]
root       757     2  0 Dec04 ?        00:00:00 [fsnotify_mark]
root       782     2  0 Dec04 ?        00:00:00 [nfsiod]
root       792     2  0 Dec04 ?        00:00:00 [jfsIO]
root       793     2  0 Dec04 ?        00:00:00 [jfsCommit]
root       794     2  0 Dec04 ?        00:00:00 [jfsSync]
root       802     2  0 Dec04 ?        00:00:00 [xfs_mru_cache]
root       803     2  0 Dec04 ?        00:00:00 [xfslogd]
root       804     2  0 Dec04 ?        00:00:00 [xfsdatad]
root       805     2  0 Dec04 ?        00:00:00 [xfsconvertd]
root       807     2  0 Dec04 ?        00:00:00 [ocfs2_wq]
root       810     2  0 Dec04 ?        00:00:00 [user_dlm]
root       817     2  0 Dec04 ?        00:00:00 [glock_workqueue]
root       818     2  0 Dec04 ?        00:00:00 [delete_workqueu]
root       822     2  0 Dec04 ?        00:00:00 [gfs_recovery]
root       824     2  0 Dec04 ?        00:00:00 [crypto]
root       867     2  0 Dec04 ?        00:00:00 [kthrotld]
root       996     2  0 Dec04 ?        00:00:00 [cciss_scan]
root      1015     2  0 Dec04 ?        00:00:00 [fc_exch_workque]
root      1016     2  0 Dec04 ?        00:00:00 [fc_rport_eq]
root      1017     2  0 Dec04 ?        00:00:00 [fcoethread/0]
root      1019     2  0 Dec04 ?        00:00:00 [fnic_event_wq]
root      1105     2  0 Dec04 ?        00:00:00 [scsi_eh_2]
root      1108     2  0 Dec04 ?        00:00:00 [scsi_eh_3]
root      1112     2  0 Dec04 ?        00:00:00 [kworker/u:3]
root      1164     2  0 Dec04 ?        00:00:00 [scsi_eh_4]
root      1167     2  0 Dec04 ?        00:00:00 [scsi_eh_5]
root      1186     2  0 Dec04 ?        00:00:00 [exec-osm]
root      1192     2  0 Dec04 ?        00:00:00 [block-osm]
root      1316     2  0 Dec04 ?        00:00:10 [kjournald]
root      1366     1  0 Dec04 ?        00:00:00 /sbin/udevd --daemon
root      1422     2  0 Dec04 ?        00:00:00 [kpsmoused]
root      1802     2  0 Dec04 ?        00:00:00 [nfsd]
root      1803     2  0 Dec04 ?        00:00:00 [nfsd]
root      1807     2  0 Dec04 ?        00:00:00 [nfsd]
daemon    1851     1  0 Dec04 ?        00:00:00 /usr/sbin/atd -b 15 -l 1
root      1854     1  0 Dec04 ?        00:00:19 sendmail: accepting connections
smmsp     1857     1  0 Dec04 ?        00:00:00 sendmail: Queue runner@00:25:00
root      1874     1  0 Dec04 ?        00:01:49 /usr/local/bin/spamd -d --pidfil
apache    2150  2123  0 Dec04 ?        00:01:27 /usr/sbin/httpd -k start
apache    2151  2123  0 Dec04 ?        00:01:28 /usr/sbin/httpd -k start
apache    2152  2123  0 Dec04 ?        00:01:31 /usr/sbin/httpd -k start
root      2234  2136  0 Dec04 ?        00:00:00 /usr/sbin/smbd -D
root      2242     1  0 Dec04 ?        00:00:14 automount
root      2252     1  0 Dec04 ?        00:00:05 /usr/local/sbin/opendkim -p loca
root      2253     1  0 Dec04 tty1     00:00:00 /sbin/agetty 38400 tty1 linux
root      2254     1  0 Dec04 tty2     00:00:00 /sbin/agetty 38400 tty2 linux
root      2255     1  0 Dec04 tty3     00:00:00 /sbin/agetty 38400 tty3 linux
root      2256     1  0 Dec04 tty4     00:00:00 /sbin/agetty 38400 tty4 linux
root      2257     1  0 Dec04 tty5     00:00:00 /sbin/agetty 38400 tty5 linux
root      2263  2259  0 Dec04 tty7     00:41:22 /usr/bin/X -br :0 vt7 -nolisten
root      2266     2  0 Dec04 ?        00:00:00 [ttm_swap]
root      2269  2259  0 Dec04 ?        00:00:00 -:0
apache    2287  2123  0 Dec04 ?        00:01:39 /usr/sbin/httpd -k start
root      3753     1  0 Dec04 ?        00:00:00 /usr/sbin/console-kit-daemon --n
root      3818     1  0 Dec04 ?        00:00:00 /usr/libexec/polkitd --no-debug
fulford  28692 28691  0 12:00 pts/5    00:00:00 ps -ef

NB. Output edited and truncated.

Very few processes appear to register any CPU time at all. Of those that do, only those that have been running for days have registered more than a few seconds.

If a second or third snapshot is taken and the CPU time on a process is rising rapidly, we may suspect a process hog.
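
Rather than reading the whole listing each time, the snapshot can be narrowed to the busiest processes. The sketch below assumes a ps that accepts -e and -o; top_cpu is a hypothetical helper name. With the procps ps found on most Linux systems the %CPU figure is CPU time used divided by the time the process has been running, so it is the TIME column that should be compared between snapshots.

```shell
#!/bin/sh
# top_cpu: list the N processes with the highest %CPU figure (default 5).
# Hypothetical helper for illustration.
# Columns printed: %CPU, PID, user, cumulative CPU time, command.
top_cpu() {
    ps -e -o pcpu= -o pid= -o user= -o time= -o args= | sort -rn | head -n "${1:-5}"
}

top_cpu 5
```

Running top_cpu two or three times a few seconds apart and watching for a TIME value that climbs rapidly gives the same evidence as comparing full ps -ef listings.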

The command top can also assist in identifying process hogs.

sa101$ top
top - 12:12:16 up 5 days, 22:19,  6 users,  load average: 0.04, 0.14, 0.46
Tasks: 151 total,   1 running, 149 sleeping,   1 stopped,   0 zombie
Cpu(s):  4.3%us,  0.7%sy,  0.0%ni, 93.9%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    493116k total,   308328k used,   184788k free,     6328k buffers
Swap:   989972k total,   387928k used,   602044k free,   111664k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28765 fulford   20   0  2828 1084  808 R  2.0  0.2   0:00.01 top
    1 root      20   0  2008    4    0 S  0.0  0.0   0:05.63 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.14 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:03.15 ksoftirqd/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset
    8 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdevtmpfs
   10 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 netns
   11 root      20   0     0    0    0 S  0.0  0.0   0:00.06 kworker/u:1
  472 root      20   0     0    0    0 S  0.0  0.0   0:04.60 sync_supers
  474 root      20   0     0    0    0 S  0.0  0.0   0:00.03 bdi-default
  476 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kblockd
  559 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 ata_sff
  566 root      20   0     0    0    0 S  0.0  0.0   0:00.01 khubd
  572 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 md
  674 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 rpciod
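
top normally runs interactively, but the procps version also has a batch mode that is convenient for saving a single snapshot to a file or a variable. A minimal sketch, in which the figure of 12 lines is an arbitrary choice:

```shell
#!/bin/sh
# -b selects batch (non-interactive) mode, -n 1 limits top to a single iteration.
# Keeping 12 lines is arbitrary: the summary lines plus the first few processes.
snapshot=$(top -b -n 1 | head -n 12)
printf '%s\n' "$snapshot"
```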

16.4. Nice and renice.

Each process is allocated a run time priority level. The priority can be adjusted by using nice when invoking the command.

sa101$ nice find / -ctime 1000 >/var/tmp/olderfiles &
[1] 29408
sa101$ ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S  1000 29402 29401  0  80   0 -  1171 wait   pts/9    00:00:00 bash
0 D  1000 29408 29402  0  90  10 -   619 sleep_ pts/9    00:00:00 find
0 R  1000 29409 29402  0  80   0 -   664 -      pts/9    00:00:00 ps

Note that the nice value for find is 10, which moves the PRI figure from the default 80 to 90 and so lowers the priority (the higher the number the lower the priority).

Although not much used by ordinary users these days, nice is still an important tool for administrators of busy multi-user and multi-tasking systems, for example when we want to start a reporting process in the background and time to completion is not an issue.

The range of nice values is -20 (most favourable priority) to 19 (least favourable priority). Ordinary users can normally only raise the nice value; negative values are reserved for root.

The nice value of a running process can be modified with the renice command. The systems administrator may want to renice a suspect process pending further investigation, or raise the priority of a process that appears to be hung, is not being rescheduled and hence is not seeing a termination signal.

sa101# nice find / -ctime +1000 >/var/tmp/oldfiles &
[2] 29472
sa101# ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S     0 29394 29393  0  80   0 -   920 wait   pts/8    00:00:00 bash
4 D     0 29467 29394  2  90  10 -   727 sleep_ pts/8    00:00:01 find
1 D     0 29472 29394  0  80   0 -   920 sleep_ pts/8    00:00:00 bash
4 R     0 29473 29394  0  80   0 -   664 -      pts/8    00:00:00 ps
sa101# renice +10 -p 29467
29467 (process ID) old priority 10, new priority 19
sa101# ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S     0 29394 29393  0  80   0 -   920 wait   pts/8    00:00:00 bash
4 D     0 29467 29394  1  99  19 -   727 sleep_ pts/8    00:00:01 find
4 R     0 29483 29394  0  80   0 -   664 -      pts/8    00:00:00 ps
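
The same sequence can be scripted. The sketch below is illustrative only: sleep stands in for a long-running report and the variable names are arbitrary. Versions of renice differ on whether the figure given with -n is an absolute nice value or an increment, so the result is read back with ps rather than assumed.

```shell
#!/bin/sh
# Start a low-priority background job; sleep stands in for a long-running report.
nice -n 10 sleep 5 &
job=$!
sleep 1     # allow nice to adjust the priority and start the command

ni_before=$(ps -o ni= -p "$job" | tr -d ' ')
echo "nice value after nice -n 10: $ni_before"

# Ask for 15. Depending on the renice version this is taken as an absolute
# value or as an increment; either way the job becomes less favoured.
renice -n 15 -p "$job" >/dev/null
ni_after=$(ps -o ni= -p "$job" | tr -d ' ')
echo "nice value after renice: $ni_after"

kill "$job"     # tidy up the demonstration job
```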

16.5. Checking the CPU.

The current overall systems activity can be monitored with the vmstat command.

sa101$ vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 375380  67912   9604 147524    2    3    12    21   81   72  4  1 94  1
 0  0 375380  67912   9604 147528    0    0     0     0  206  636  4  1 95  0
 0  0 375356  67788   9620 147528    0    0     0    22  218  646  4  1 95  0
 0  0 375348  66796   9628 147528    0    0     0    13  213  639  4  1 95  0
 0  0 375348  66796   9628 147528    0    0     0    22  210  641  4  1 95  0

On a well-behaved system with no CPU constraint we should expect the CPU idle time to be approaching 100%. The number of runnable processes in the queue should normally be 1 or 0. If we see the idle time consistently fall below 60% and the number of runnable processes in the queue exceeds 3, there are either badly behaved programs or inadequate CPU resources. There are likely to be performance issues if the situation is not soon addressed.

NB. The first line of vmstat output shows the averages since the last reboot, so the arguments 5 5 give us the average to date followed by 4 snapshots 5 seconds apart. By setting an interval of 5 seconds we reduce the impact of vmstat itself on the report.
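
The rule of thumb above (idle below 60% together with more than 3 runnable processes) can be applied mechanically. The following is a sketch, not a monitoring tool: flag_cpu_pressure is a hypothetical helper name, and the sample input is the header and first two rows of the vmstat listing above.

```shell
#!/bin/sh
# flag_cpu_pressure: read vmstat output on standard input and mark each
# sample as "ok" or "PRESSURE". Hypothetical helper for illustration.
flag_cpu_pressure() {
    awk '
        # Header row: remember which column holds each named field.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number.
        ("id" in col) && $1 ~ /^[0-9]+$/ {
            r = $(col["r"]) + 0; idle = $(col["id"]) + 0
            state = (idle < 60 && r > 3) ? "PRESSURE" : "ok"
            printf "r=%d idle=%d%% %s\n", r, idle, state
        }'
}

# Sample input taken from the vmstat listing above.
flag_cpu_pressure <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 375380  67912   9604 147524    2    3    12    21   81   72  4  1 94  1
 0  0 375380  67912   9604 147528    0    0     0     0  206  636  4  1 95  0
EOF
```

On a live system the function would be fed directly, for example vmstat 5 5 | flag_cpu_pressure, remembering that the first sample is the average since the last reboot. Columns are located by name rather than position because the set of CPU columns varies between vmstat versions.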

16.6. Exercise.

Use top to find the processes on your host that are currently using the most CPU cycles.

Check the output of vmstat and then try to raise the activity level by invoking several background processes and then running a couple of find commands (starting from the root directory with the output redirected to disk files).

Take ps listings redirected to files and check the output of vmstat once more.

Using the table below revise the commands you have learned in the Linux training course to date.

16.7. Tools and metanotation.

   

Image imgs/sa101-24.png

Image imgs/sa101-25.png

