Efficient use of CPUs/cores on BioHPC Lab machines
All BioHPC machines have multiple CPUs (cores) available. Modern processors are faster than older ones, but the biggest difference between old and new servers is the amount of parallel processing power available in multiple CPUs and cores. Without using this power, i.e. running programs in parallel, there will be little speedup and many resources will be wasted. From a practical point of view, the parallel processing power of a server is measured in cores, which are processing units capable of executing code independently. CPUs are separate microprocessors on separate chips, and each of them usually hosts multiple cores.
There are four general ways to run programs in parallel: (a) using a given program's built-in parallelization, (b) executing a bunch of programs in the background in parallel, (c) using a driver program to execute multiple programs, or (d) using a job scheduler (SLURM). Each of these methods is discussed below.
In order to illustrate how important it is to run programs in parallel, the tables below show two real-world examples.
(a) Using BLAST to search the Swissprot database for matches of 10,000 randomly chosen human cDNA sequences. Swissprot is a good example of a database with a small memory footprint.
Machine        CPUs available   cores available   cores used   time (hrs)   speedup (in machine)
cbsulm10             4                64               64          0.931          27.506
cbsulm10             4                64               16          1.962          13.056
cbsulm10             4                64                1         25.619           1.000
cbsumm15             2                24               24          2.058          12.117
cbsumm15             2                24               12          2.593           9.616
cbsumm15             2                24                1         24.930           1.000
cbsum1c2b008         2                 8                8          4.193           6.717
cbsum1c2b008         2                 8                1         28.161           1.000
(b) Using BLAST to search the nr database for matches of 2,000 randomly chosen human cDNA sequences. Nr is a good example of a database with a large memory footprint.
Machine        CPUs available   cores available   time (hrs)   speedup (in machine)
cbsulm10             4                64              10.97           2.222
cbsulm10             4                64              24.37           1.000
cbsumm15             2                24              26.10           2.140
cbsumm15             2                24              55.85           1.000
The above examples highlight several important points about parallel execution.
- It is VERY important to use multiple cores. BLAST on 64 cores takes only 0.931 hours (10K cDNA vs swissprot); the same run on a single core takes over 25 hours!
- Speedup is not directly proportional to the number of cores. Sometimes it is linear, but most often it is not; usually it is somewhat less than expected, yet still large enough to justify the effort. In example (a), 64 cores compared to 1 core give a speedup of 27.5, much less than the 64 expected from a linear trend, but still large!
- Speedup depends not only on the machine (hardware), but also on the program (algorithm) and parameters (nr vs swissprot). When using the nr database (example b) on cbsumm15, the speedup between 12 and 24 cores is 2.14; for swissprot in the same situation (example a) it is 12.117/9.616 = 1.26. It is often a good idea to run a short test first (if possible) on a subset of the data to figure out the optimal number of cores, as sketched below.
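For instance, a minimal timing sketch of such a test (the query file, database and thread counts are only placeholders; substitute the program and data you actually use):

# time the same small job at several thread counts to see where the speedup levels off
for n in 1 4 8 16; do
    /usr/bin/time -f "$n threads: %e seconds" \
        blastx -num_threads $n -query subset.fa -db swissprot -out /dev/null
done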
(a) Using a given program’s built-in parallelization
Many programs have built-in parallelization options. You will need to read the appropriate documentation to find the name of this option; usually it is described as the "number of threads" or "number of processes". Typical examples are blast (option '-a') and blast+ ('-num_threads'); other examples are tophat ('-p'), cuffdiff ('-p'), bwa ('-t') and bowtie ('-p'). For example:

blastall -a 8 [other options]
blast+ -num_threads 8 [other options]
tophat -p 8 [other options]
cuffdiff -p 8 [other options]
bwa -t 8 [other options]
bowtie -p 8 [other options]
Most of the common CPU-intensive programs do have multithreading options, if their algorithms permit
it. It is usually important to research how many cores are reasonable; sometimes using more cores will
not give any measurable speedup (diminishing returns). It may not be a big problem, since you can
always run multiple programs in the background, each on multiple cores.
(b) Executing a bunch of programs in the background in parallel
If the number of programs to run is less than or equal to the number of cores available, you can run them all in the background in parallel. To do so, you will need to prepare a file containing all the necessary commands, with each program's output redirected to a different file. Here is a real-world example of running several tophat commands, each using multiple cores, on a 64-core machine. The file contains 9 commands, since I had 9 tophat jobs to run for this particular RNA-seq pipeline. To use all cores of the 64-core machine I used 7 cores per tophat command.
In principle, the file should look like the template below, with each program's output redirected to its own log file and each line ending in '&' (so that the command runs in the background):

command1 [parameters] >& log1 &
command2 [parameters] >& log2 &
command3 [parameters] >& log3 &
command4 [parameters] >& log4 &
commandN [parameters] >& logN &

The actual file used for the tophat runs was:
tophat -p 7 -o B_L1-1 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7073_C3AR7ACXX_B_L1-1_ATCACG_R1.fastq.gz \
fastq/2284_6063_7073_C3AR7ACXX_B_L1-1_ATCACG_R2.fastq.gz >& B_L1-1.log &
tophat -p 7 -o B_L1-2 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7076_C3AR7ACXX_B_L1-2_TGACCA_R1.fastq.gz \
fastq/2284_6063_7076_C3AR7ACXX_B_L1-2_TGACCA_R2.fastq.gz >& B_L1-2.log &
tophat -p 7 -o B_L1-3 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7079_C3AR7ACXX_B_L1-3_CAGATC_R1.fastq.gz \
fastq/2284_6063_7079_C3AR7ACXX_B_L1-3_CAGATC_R2.fastq.gz >& B_L1-3.log &
tophat -p 7 -o L_L1-1 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7074_C3AR7ACXX_L_L1-1_CGATGT_R1.fastq.gz \
fastq/2284_6063_7074_C3AR7ACXX_L_L1-1_CGATGT_R2.fastq.gz >& L_L1-1.log &
tophat -p 7 -o L_L1-2 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7077_C3AR7ACXX_L_L1-2_ACAGTG_R1.fastq.gz \
fastq/2284_6063_7077_C3AR7ACXX_L_L1-2_ACAGTG_R2.fastq.gz >& L_L1-2.log &
tophat -p 7 -o L_L1-3 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7080_C3AR7ACXX_L_L1-3_ACTTGA_R1.fastq.gz \
fastq/2284_6063_7080_C3AR7ACXX_L_L1-3_ACTTGA_R2.fastq.gz >& L_L1-3.log &
tophat -p 7 -o S_L1-1 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7075_C3AR7ACXX_S_L1-1_TTAGGC_R1.fastq.gz \
fastq/2284_6063_7075_C3AR7ACXX_S_L1-1_TTAGGC_R2.fastq.gz >& S_L1-1.log &
tophat -p 7 -o S_L1-2 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7078_C3AR7ACXX_S_L1-2_GCCAAT_R1.fastq.gz \
fastq/2284_6063_7078_C3AR7ACXX_S_L1-2_GCCAAT_R2.fastq.gz >& S_L1-2.log &
tophat -p 7 -o S_L1-3 --transcriptome-index genome/transcriptome/ZmB73_5a_WGS \
--no-novel-juncs genome/maize \
fastq/2284_6063_7081_C3AR7ACXX_S_L1-3_GATCAG_R1.fastq.gz \
fastq/2284_6063_7081_C3AR7ACXX_S_L1-3_GATCAG_R2.fastq.gz >& S_L1-3.log &
Such a script can be executed by typing

bash script_name

in the directory where the script is. You will need to examine the log files to find out whether the programs have finished. Another way of monitoring them is to run the 'ps' command and filter for the name of your program. In my case the program was tophat, and the command was

ps -ef | grep tophat
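A related quick check (a sketch; 'tophat' is just the process name from this example) is to count how many of the programs are still running:

# count this user's running tophat processes; 0 means they have all finished
pgrep -c -u $USER tophat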
For more information about running programs, dealing with files and the Linux computing environment in general, please refer to our "Linux for Biologists" workshop slides, available online:
http://cbsu.tc.cornell.edu/lab/doc/Linux_workshop_Part1.pdf
http://cbsu.tc.cornell.edu/lab/doc/Linux_workshop_Part2.pdf
When running multiple programs in parallel it is important to check the memory requirements: how much memory (RAM) does a single program need? The sum of these memory requirements cannot exceed the total memory available on the server. For example, if a single program needs 3 GB of RAM, you can easily run 24 of them on a medium-memory machine (72 GB needed, 128 GB available), but only 5 on a general machine (16 GB available). The easiest way to check the memory requirements (besides reading the program's manual) is to run one task and check its memory usage with the 'top' command. top displays the percentage of memory used by each process; knowing the total RAM of the machine, it is easy to compute the program's memory footprint.
Memory usage is reported in the "%MEM" column of top's output. In this case I was running 8 codeml programs in parallel, each using 0.1% of memory (0.001 * 16 GB = 0.016 GB, i.e. 16 MB).
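A scriptable way to get the same numbers (a sketch; 'codeml' is just the program from this example):

# one-shot snapshot of memory use of all processes named codeml
# %MEM is the percentage of total RAM, RSS is resident memory in kilobytes
ps -C codeml -o pid,pmem,rss,comm
# total RAM of the machine, for converting %MEM into GB
free -g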
(c) Using a parallel driver program to execute multiple programs
If the number of programs to run is greater than the number of cores available, you need another program to control proper utilization of the cores (load balancing). The goal is to keep all cores busy, while never requesting more cores at a given time than are available. We have written a Perl program for this purpose: /programs/bin/perlscripts/perl_fork_univ.pl. It takes 2 arguments:

/programs/bin/perlscripts/perl_fork_univ.pl JobListFile ProcessNumber

JobListFile is a file containing all the commands to execute, one per line. ProcessNumber is the number of processes to execute in parallel, which is equal to the number of cores to be used.
Typical cases for the parallel Perl driver are those where the number of tasks exceeds the number of cores. For example, when the number of libraries in an RNA-seq project is large (say 50), you can prepare a file with all the tophat tasks needed (50 lines, no '&' at the end of the lines!, each task using 7 cores) and then run 9 of them at a time on a 64-core machine (63 cores in total: 9 instances at a time, 7 cores each). Another example is compressing a large number of files: prepare a list of 'gzip filename' commands in a file and run, say, 10 of them at a time (ProcessNumber=10), as sketched below.
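A minimal sketch of the gzip case (the file pattern and task list name are arbitrary):

# build a task list with one 'gzip filename' command per line
for f in *.fastq; do echo "gzip $f"; done > tasklist
# run the tasks 10 at a time
/programs/bin/perlscripts/perl_fork_univ.pl tasklist 10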
Another example is a PAML simulation on 110 genes. The example input data is in /programs/paml.example.tar. If you would like to try it, unpack the data into your /workdir subdirectory (substitute your own lab ID for 'jarekp'):

cd /workdir
mkdir jarekp
cd jarekp
tar xf /programs/paml.example.tar

The task list is stored in the file 'tasklist', which begins as follows (110 lines in total). Note that there is no '&' at the end of the lines, unlike in point (b).

cd results/984 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log
cd results/916 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log
cd results/935 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log
cd results/937 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log
cd results/922 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log
cd results/978 ; /workdir/jarekp/paml4.7/bin/codeml my.control >& log

To run the PAML simulations in parallel, execute the Perl parallel driver:

/programs/bin/perlscripts/perl_fork_univ.pl tasklist 8

I used 8 cores (tasks) since the example was run on one of the general machines.

As in point (b), you need to make sure that the total amount of memory needed by the concurrently running programs does not exceed the total amount of RAM available.
Similarly, it is possible to run programs in parallel across multiple workstations using a modified version of this Perl program: /programs/bin/perlscripts/perl_fork_univ_mn.pl. More details are available at
https://cbsu.tc.cornell.edu/lab/doc/using_perl_fork_univ_mn.pdf
(d) Using a job scheduler (SLURM) to execute multiple programs
SLURM is a workload manager (SLURM = Simple Linux Utility for Resource Management). It is much more flexible than the perl_fork_univ.pl script. Some benefits include: running multiple jobs in parallel with different processor/memory requirements; adding more jobs to the queue at any time; sharing cores between users; and distributing jobs across several machines.
Some machines at BioHPC are already running SLURM; to see the clusters that are available to you, use the command:

manage_slurm list
If no cluster is listed for the machine(s) where you want to run jobs, you can create one with the command "manage_slurm new machine1,machine2,…", where the argument is a comma-delimited list of the machine(s) to combine into a single SLURM cluster (so that jobs are distributed across all of them). For example:

manage_slurm new cbsum1c1b002,cbsum1c1b003

will create a new SLURM cluster across these two machines; the first machine is the master node of the cluster, and the cluster is named after it. For this command to succeed, you need to have an active reservation on all the machines. Any user who has access to the full set of machines can submit jobs to the cluster, but the user who created it becomes the head user and is the only one who can manage it (see manage_slurm --help for options). The cluster will automatically be killed when the head user's reservation on the master node ends, and other nodes will be removed from the cluster when the head user's reservation on those nodes ends.
We can now see the cluster created in the example above using manage_slurm list:

CLUSTER: cbsum1c1b002
Machine        CPUs   memory (GB)
cbsum1c1b002   8      16
cbsum1c1b003   8      16
Users: mt269

The cluster's name is cbsum1c1b002, and it consists of two machines, each with 8 CPUs and 16 GB of memory. It currently has only one user who can submit jobs (mt269). If a user is added to the reservations for these two machines, they will automatically be added to the user list (within 5 minutes of the reservation start).
Jobs can now be submitted to this cluster using the sbatch command, which takes as an argument the name of a script to execute. In the simplest form, you can just run 'sbatch script.sh'. If you have access to more than one cluster, you should specify which cluster to submit to using the --cluster option (for example, 'sbatch --cluster cbsum1c1b002 script.sh'). By default each job runs on a single core, and if a core is available on the cluster the job should start running immediately. You can submit many jobs, and they will wait in a queue until a core becomes available.
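For example, a sketch of submitting one job per input file (the script name run_tophat.sh and the file pattern are hypothetical):

# submit one SLURM job per R1 fastq file; run_tophat.sh is a hypothetical per-sample script
for f in fastq/*_R1.fastq.gz; do
    sbatch --cluster cbsum1c1b002 run_tophat.sh "$f"
done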
Many options are available to sbatch, and they can be specified on the command line or using a special tag in the script. For example, to give a name to your job, you can use the option
'--job-name=<jobname>'. This option can be given on the command line, or within the script on a line of the form:

#SBATCH --job-name=myJobName
Once the job is run, the output (by default) will be in a file with the name <jobname>-<jobID>.out. The
job ID is a unique identifier assigned to the job when it is submitted.
Other useful options to sbatch include:

#SBATCH --job-name=jobname
#SBATCH --ntasks=1
#SBATCH --mem=8000
#SBATCH --output=jobname.out.%j
#SBATCH --mail-user=<email_address>
#SBATCH --mail-type=ALL

To describe the options above:
--ntasks specifies how many cores the job should take.
--mem can be used to specify how much memory a job needs (in MB; the above requests 8 GB).
--output can be used to change the default output file; the %j will be automatically replaced by
the job ID.
--mail-user can be used to have SLURM send you emails.
--mail-type=ALL says to send emails at job start, end, and crash. Other useful values for --mail-type are END, FAIL, and BEGIN.
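Putting these together, a complete submission script might look like the sketch below (the job name, resource values and the final command are placeholders):

#!/bin/bash
#SBATCH --job-name=myJobName
#SBATCH --ntasks=1
#SBATCH --mem=8000
#SBATCH --output=myJobName.out.%j
#SBATCH --mail-user=<email_address>
#SBATCH --mail-type=ALL

# the actual work goes here; this command is just a placeholder
gzip bigfile.fastq

It would then be submitted with, for example, 'sbatch --cluster cbsum1c1b002 script.sh'.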
There are many other options to sbatch; a quick summary is available here:
https://slurm.schedmd.com/pdfs/summary.pdf, and extensive documentation is available online.
Once jobs are running, you can monitor them with the command "squeue". You may want to see only your own jobs, using "squeue -u $USER", or specify which cluster's queue to display with the --cluster option. You can cancel jobs with "scancel <jobID>".
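For example (the cluster name and job ID are placeholders):

squeue --cluster cbsum1c1b002 -u $USER   # list your jobs on this cluster
scancel --cluster cbsum1c1b002 12345     # cancel the job with ID 12345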
Another useful feature is to get an interactive job through SLURM; this can be done by typing the
command salloc. This is particularly useful for testing a script before submitting a large number of
jobs.
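A sketch of such an interactive session (the core and memory numbers are arbitrary, and the exact behavior depends on the cluster configuration):

# request an interactive allocation with 4 cores and 8 GB of memory
salloc --ntasks=4 --mem=8000
# ... test your commands inside the allocation, then release it
exit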