Running programs

How to run your programs efficiently, without overloading the system.

This is a general guide on how to run programs on our Linux servers in an efficient way. At the Department, we have several Linux servers for communal use, but no queueing or scheduling systems. However, with some simple and standard tools, and some consideration for your fellow users, you can all utilize the resources to the max.

The ideas and tools presented here are also useful on a single-user Linux workstation or laptop, or even a Mac.

This guide assumes you know the basics of using the Linux/Unix shell. It does not cover how to connect, or, for example, how to write shell scripts. There are other articles for that.

Shortcuts: Load | Memory | at & batch | nice & renice | compiling

TL;DR: Scroll down to paragraphs in bold italics to see some "best practices" suggestions.

Load and processes

A process is a program during execution. It uses computer resources, such as CPU time and memory, and usually reads and writes some files. The Linux kernel will provide these resources to processes as they request them.

The load on a Unix computer is a simple measure of how busy it is. Simply put, it is the number of processes that are using or waiting for CPU time, averaged over some time interval. An idle machine will have a load close to zero, as (on average) the running processes do very little. For example, your login shell spends most of its time waiting for you to press some keys, and very little time actually running on the CPU.

You can check the current load on the system with the commands uptime or w, e.g.:

$ uptime
22:44:23 up 272 days, 29 min, 26 users,  load average: 10.67, 10.61, 10.46

The three load numbers shown are the 1, 5, and 15 minute load averages.

The top program shows a snapshot of computer resource usage — including load — and the busiest processes, updated every few seconds. Also, the xload program shows a simple histogram of the load.

Screenshot of xload on sverdrup.

One process running constantly adds one to the load. In the above example, the load is about 10, suggesting 10 processes are running full throttle.

If all you have is a single CPU core, these 10 processes will have to share the time on the CPU, and each will run slowly, taking 10 times as long to complete. On the other hand, if this particular system has 10 (or more) CPUs, each process has one CPU for itself, and all is good!

To see how many CPUs a system has, use the lscpu command:

$ lscpu
(...)
CPU(s):              24

Things get more complex if you account for things like I/O-wait, multithreading and hyperthreading, but the following rule of thumb should still be valid:

As long as the load is lower than the number of CPUs, there are idle CPUs that can be put to work!
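
If you want a quick check from the shell, a one-liner like the following sketch compares the 1-minute load (the first field of /proc/loadavg) with the CPU count from nproc. It is just a convenience, not an official tool:

$ awk -v cpus=$(nproc) '{ if ($1 + 0 < cpus + 0) print "idle CPUs available"; else print "all CPUs busy" }' /proc/loadavg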

Memory

A process uses memory to contain its own code, and to hold data that is being processed. It is important that the system has memory available for processes that request more of it; if it runs out, the kernel will start killing processes to free memory, and quite likely the whole system will become unresponsive.

You can check the free memory on the system with the free command:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           251G         49G         89G        485M        112G        200G
Swap:            9G        4.4G        5.6G

Here, we can see that the system has 251 GB of total memory, and that 200 GB is available — which is good!

The program code is usually not very big, and Linux is clever enough that identical pieces of code use the same physical memory (that's the "shared" memory in the above example), so it doesn't add much to the memory footprint to e.g. start another shell. Linux will also use otherwise unused memory for buffers and cache, which can be flushed out quickly when needed. In the example, 89 GB is truly free while 112 GB is buffers and cache, which together make up the roughly 200 GB (89 + 112) of available memory.

The output above also shows swap usage. Our servers are set up with relatively little swap, so it is not so important for this discussion.

To see which processes are consuming the most memory, run top and press M (shift-m) to sort on memory usage:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
44281 paul      35  15   86.0g  72.6g  39668 R  1296 12.3 360:13.12 python
13317 paul      35  15   50.3g  32.7g  29860 S   0.3  5.5  47:02.05 python
20128 sanches   35  15   25.3g   6.5g 303952 S   2.6  1.1 457:54.42 MATLAB
6427  sanches   35  15   25.2g   6.4g 302828 S   2.3  1.1 401:17.32 MATLAB

Here we see that user paul is running a python job that consumes about 12% of system memory. (The CPU usage of almost 1300% is because this single process uses several threads, running on several CPUs. Press I (shift-i) in top to normalize to 100%.)

To inspect the memory consumption of your own processes only, try the following:

$ ps -o pid,user,%mem,command x
 PID USER     %MEM COMMAND
13877 hpverne   0.0 sshd: hpverne@pts/19
13883 hpverne   0.0 -bash
26362 hpverne   0.0 ps -o pid,user,%mem,command x

In this case, I am only running a few processes that consume a negligible part of system memory.

As some memory is shared between processes, it can be difficult to pinpoint exactly where memory is used, but the above procedures should give you an idea.

This leads us to a couple of other rules of thumb:

Make sure there is plenty of available memory before you start a big job!

Inspect your long-running processes now and then. If they are likely to grow to consume more memory than available, kill or suspend them to prevent the system from crashing!
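
If you need to intervene, you can send your own processes signals with the kill command. Here 12345 is just a placeholder PID; find the real one with top or ps as shown above:

$ kill -STOP 12345     # suspend: the process stops using CPU, but keeps its memory
$ kill -CONT 12345     # resume a suspended process
$ kill 12345           # terminate; add -9 only if the process refuses to exit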

at & batch

Sometimes you can benefit from starting a job later — the servers are usually less busy at night. If you don't want to stay up late, you can use at to schedule a job at a given time, provided the job can be run as a single command (or several, separate commands).

For this, you might want to create a shell script that bundles the commands that should be run, including staging and post-processing tasks. If you need to load modules to run your jobs, do so in the job script.
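
As a sketch, such a job script could look something like this; the module name, paths and commands are only placeholders, to be replaced with your own:

$ cat myscript.sh
#!/bin/bash
# Load the modules the job needs (placeholder module name)
module load matlab
# Go to the directory where the job should run (placeholder path)
cd /path/to/workdir
# Staging, the job itself (run nicely), and post-processing (all placeholders)
./prepare_input.sh
nice ./run_analysis.sh
./collect_results.sh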

To use it, change directory (cd) to where you keep the script. Make sure the script works as intended (run it briefly). Then, run the at command with the desired start time, type the name of the script, and terminate with Ctrl-D. Something like this:

$ at 01:00
at> ./myscript.sh
at> ^D
job 112 at Thu Mar 20 01:00:00 2020

The command will be started at 1 o'clock, and run till completion. Any output from the script will be sent to you by email. If you have several job-scripts, you can add them to the same at command, or better yet, run the at command again with a later start-time.
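
For example, to queue a second (here hypothetical) script an hour later:

$ at 02:00
at> ./my_second_script.sh
at> ^D

You can list your pending jobs with atq, and remove one with atrm followed by the job number.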

Note that running a job this way makes it difficult to monitor load and memory usage, as suggested above.

You can use the batch command to start a job when the load drops below some system-defined limit. The usage is similar to at, except you don't specify time. For example:

$ batch
at> ./my_other_script.sh
at> ^D
job 113 at Thu Mar 19 11:28:00 2020

Again, you will receive output by email.

The system load limit is set in /etc/sysconfig/atd; it should typically be slightly (one) less than the number of CPUs in the system. E.g.:

$ grep OPTS /etc/sysconfig/atd
OPTS="-l 23"

Get in touch if you think it should be set differently for the server you are using.

Note that running multithreaded processes with batch defeats the purpose, as the process will be started as soon as the load drops below the limit (that is, when roughly one CPU is free), not when there is a sufficient number of free CPUs for all its threads to run on.

nice and renice

When there are several processes that compete for time on the CPU (i.e. load is high), the Linux kernel uses the process' nice value to decide its priority. A high nice value will give low priority.

To start a "nice" program (low priority), simply precede it with the nice command, e.g.:

$ nice ./my_third_script.sh &

Note that we add the ampersand character (&) to run in the background.

You can specify a numeric nice value, but it is usually not necessary. A "non-nice" process runs with the default nice value of 0; you can specify values up to 19 (nice -n 19 command).

You can also increase the "niceness" of a running program with the renice command. You will then need to find the process ID (PID), for example using the ps command:

$ ps x | grep MATLAB
 4499 pts/31   Sl     1:26 /opt/app-sync/matlab/bin/glnxa64/MATLAB
 6949 pts/31   S+     0:00 grep MATLAB

$ renice -n 10 4499
4499 (process ID) old priority 0, new priority 10

(The word "priority" in the above output is misleading. It's the nice value that's increased.)

You can only increase the nice value. The super-user (root) can also decrease it, and even set negative nice values. The nice value of processes can be inspected with top or ps (with the appropriate options).
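
For example, ps can show the nice value (the NI column) of your own processes like this:

$ ps -o pid,ni,comm -u $USER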

It's important to realize that the nice value has little effect unless the system is over-loaded! When CPU time is abundant, priority is not important.

Even when the system is not overloaded, running demanding processes "nicely" improves interactive response, because the user interface (the shell, etc) has higher priority. Which leads us to yet another rule of thumb:

Run your jobs with nice! It doesn't hurt.

Your system administrators will sometimes adjust (manually or automatically) the nice value of running processes, to ensure the system runs smoothly and completes the different jobs in due order. If your jobs already run "nicely", we are less likely to change that.

Compiling your own programs

This is a short introduction to a complex topic.

In the Linux world, many programs are distributed as source code, under a variety of more-or-less open-source licenses. To make use of these programs, they must be compiled and linked (or "built") on your computer of choice.

Your system admins will often do this upon request, but sometimes you might want to do this yourself. How this is done varies quite a bit; the following is intended to get you started.

First of all, make notes of your progress: what you do, what works and what doesn't. This will be highly useful when you want to upgrade the program, or if you want to persuade us to install it system-wide.

Obviously, you need to download the source package from somewhere, or perhaps you receive it directly from the authors. There should be some accompanying text describing how to build it. Read the instructions carefully!

Usually, you will have to decide upon a "target" directory, i.e. where the compiled binaries and libraries should go. As you don't have write access to system directories, this would in your case be someplace under your home directory or perhaps project directory.

Unpack the source package in a suitable subdirectory, and look through the various text files in the package (typically named README, INSTALL, HOWTO, or whatever). Possibly, this is the same information found online.

The two most common build systems are autoconf and cmake; the documentation should make it clear which one is used in this case. Sometimes, there will be a custom-made build script or other non-standard procedures involved.

The cmake that comes with RHEL7 is inadequate for many program packages, but you can load a newer version of cmake as a module on wessel (module avail cmake). You can also consider loading newer versions of the desired compiler (module avail), and possibly also of the libraries or MPI utilities.
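
For example, to see which cmake versions are available as modules and load one (the exact module names and versions will vary, so check the output of module avail first):

$ module avail cmake
$ module load cmake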

If your source uses autoconf, there will be a configure script that you must run, specifying your target directory:

$ ./configure --prefix=/target/directory

If the build system is cmake, it is often recommended to build in a separate directory. It could look like this, starting from the package source top directory:

$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX=/target/directory ..

Quite possibly, you will need other arguments to specify paths to libraries or compile-time options.

cmake/autoconf will try to figure out what kind of machine this is, what compiler you have, and what libraries are available. Sometimes it needs a little help.
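
One common way of helping it along is to state the compilers explicitly through environment variables; gcc and gfortran below are only examples, use whichever compilers you have loaded:

$ CC=gcc FC=gfortran ./configure --prefix=/target/directory

or, for cmake:

$ CC=gcc FC=gfortran cmake -DCMAKE_INSTALL_PREFIX=/target/directory ..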

Some libraries or other prerequisites are simple to install from the RedHat/EPEL repositories. You'll have to contact us to request this.

If a required library is not available either as a module or from the OS repos, you will likely have to install that from source as well!

When configure/cmake is finished (without errors), it's time to compile, usually with the make utility. If you have 10 idle CPUs, you can set them all to compile source files, in parallel:

$ make -j 10

If this succeeds, you can usually install the finished files/libraries in the chosen target directory:

$ make install

If all this succeeds, you are pretty much done! Make sure you add your target directory's bin subdirectory to your PATH, and you're ready to roll.
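
For a bash shell, that amounts to a line like the following, which you can run directly or add to your ~/.bashrc (with /target/directory replaced by the prefix you chose):

$ export PATH=/target/directory/bin:$PATH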


For all the utilities described here, there should be manuals available with the man command, e.g.:

$ man uptime

If the manual is missing, or unintelligible, you might find it helpful to google your question.


Written by Hans Peter Verne, last updated 2020-03-19
