Running Production Jobs

Overview of the LSF Batch System

All production jobs on the Pegasus cluster are run using the LSF batch system. Each user may submit at most 32 jobs or use at most 256 slots (cores) on Pegasus through LSF, whichever limit is reached first. To learn more about the LSF batch system, please visit our documentation repository. If you need your job limit increased, please contact us.
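As a worked example of "whichever comes first": eight 32-slot MPI jobs already use 8 × 32 = 256 slots, so a ninth job would exceed the slot limit even though you are far below 32 jobs. Conversely, 32 serial (1-slot) jobs reach the job limit while using only 32 slots. The sketch below encodes that arithmetic in a hypothetical helper, can_submit, which is not an LSF command. It assumes the limits are exactly the ones stated above. LSF itself enforces the real limits, and these may differ for your account.

```shell
#!/bin/bash
# Hypothetical helper (not part of LSF). Assumes the per-user limits
# quoted above: 32 jobs or 256 slots, whichever is reached first.
MAX_JOBS=32
MAX_SLOTS=256

can_submit() {
    # $1 = jobs you already have, $2 = slots they use,
    # $3 = slots the new job would request
    if [ $(( $1 + 1 )) -gt "$MAX_JOBS" ]; then
        echo "no: job limit ($MAX_JOBS) reached"
        return 1
    fi
    if [ $(( $2 + $3 )) -gt "$MAX_SLOTS" ]; then
        echo "no: slot limit ($MAX_SLOTS) would be exceeded"
        return 1
    fi
    echo "yes"
}
```

For example, can_submit 8 256 32 reports that the slot limit would be exceeded, while can_submit 31 31 1 reports yes.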

Before submitting production work, you can run interactive jobs (LSF queue “debug”) to debug your program. Once your jobs are tested, run them as batch jobs for production. Unlike interactive jobs, batch jobs are controlled by scripts. These scripts tell the system which resources a job requires and for how long. The requests are then submitted to the LSF queue manager for processing. Here is a list of all the queues available on Pegasus.

Table 14: Common LSF commands and descriptions
Command             Purpose
bsub < ScriptFile   Submits a job to LSF via a script file. NOTE: the redirection symbol “<” is required when submitting the job.
bjobs               Displays running and pending jobs in the queue.
bhist               Displays historical information about your finished jobs.
bkill               Removes (cancels) one or more jobs from the queue.
bqueues             Shows the current configuration of the queues.
bhosts              Shows the load on each node.
bpeek               Displays the stdout and stderr output of an unfinished job.

Example Usage:

The command bsub < ScriptFile submits the given script for processing. The script must contain the information LSF needs to allocate the resources your job requires, to handle the standard I/O streams, and to run the job. Please see the example scripts below. On submission, LSF returns the job ID.

[user@kronos]:>bsub < test.job
Job <4225> is submitted to default queue .
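Later commands such as bjobs -l and bkill need this job ID. The sketch below shows one way to capture it in a script. It uses a hypothetical helper, parse_jobid, which is not an LSF command. The helper assumes bsub prints its confirmation in the "Job <ID> is submitted ..." form shown above.

```shell
#!/bin/bash
# Hypothetical helper: read bsub's confirmation line on stdin and print
# only the numeric job ID, e.g. 4225 from "Job <4225> is submitted ...".
# Prints nothing if the line does not have that form.
parse_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Usage on a login node (requires LSF):
#   jobid=$(bsub < test.job | parse_jobid)
#   bjobs -l "$jobid"
```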

The command bjobs shows all of your jobs currently running or queued on the system.

[user@m1 roms]$ bjobs
JOBID  USER   STAT  QUEUE    FROM_HOST  EXEC_HOST   JOB_NAME  SUBMIT_TIME
4225   user   RUN   small   m1         8*n0060     testjob   Mar  2 11:53
                                        8*n0061
                                        8*n0063
                                        8*n0064
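In the EXEC_HOST column, an entry such as 8*n0060 means the job holds 8 slots on node n0060. The job above therefore occupies 4 × 8 = 32 slots across four nodes. The hypothetical helper below (count_slots, not an LSF command) adds up those counts from bjobs output. It assumes every execution host is listed in the N*host form, as in this example. A host listed without a multiplier would not be counted.

```shell
#!/bin/bash
# Hypothetical helper: sum the "N*" slot counts of every EXEC_HOST entry
# in bjobs output read from stdin. Assumes hosts appear as N*host.
count_slots() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\*/) {     # an EXEC_HOST entry such as 8*n0060
                n = $i
                sub(/\*.*/, "", n)      # keep only the slot count, e.g. 8
                total += n
            }
    }
    END { print total + 0 }'
}

# Usage (requires LSF):  bjobs 4225 | count_slots
```

Applied to plain bjobs output, the same sum gives the slots used by all of your running jobs, which is what counts toward the slot limit.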

For details about a particular job, run bjobs -l JobID, where JobID is taken from the “JOBID” field of the bjobs output above. The command bkill JobID removes the job from the queue and terminates it if it is running.

[user@m1 roms]$ bkill 4225
Job <4225> is being terminated


An example script for a serial Job

#!/bin/bash
#BSUB -J serialjob
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 1:00
#BSUB -q small
#BSUB -n 1
#BSUB -B
#BSUB -N
#
# Run serial executable on 1 cpu of one node
cd ${HOME}/path/to/current/directory
./test.x a b c

Here is a detailed line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash
Specifies the shell used to execute the command portion of the script. The default is the Bash shell.

#BSUB -J serialjob
Assigns a name to the job. The job name appears in the bjobs output.

#BSUB -o %J.out
Redirects standard output to the specified file. In this example, %J is replaced by the job ID.

#BSUB -e %J.err
Redirects standard error to the specified file.

#BSUB -W 1:00
Sets a wall-clock time limit of 1 hour.

#BSUB -q small
Specifies the queue to be used.

#BSUB -n 1
Specifies the number of processors. For a serial job, this is 1.

#BSUB -B
Sends email when the job starts.

#BSUB -N
Sends email when the job ends.

LSF stops reading directives at the first executable line, i.e., the first line that is non-blank and does not begin with #. The last two lines change to the job's working directory and then run the executable “test.x” with the arguments “a b c”.
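Because of this rule, a #BSUB line placed after the first command is not read as a directive. The hypothetical helper below, list_bsub_directives, is not part of LSF. It mimics the rule as stated above so you can check which directives a script actually contributes. It is a sketch of the documented rule, not LSF's own parser.

```shell
#!/bin/bash
# Hypothetical helper: print the #BSUB lines of a job script, stopping at
# the first line that is non-blank and does not begin with "#".
list_bsub_directives() {
    local line
    # Default IFS trims surrounding whitespace, so blank-looking lines read as "".
    while read -r line || [ -n "$line" ]; do
        case $line in
            '#BSUB'*) printf '%s\n' "$line" ;;  # a directive LSF would read
            '#'*|'') ;;                         # shebang, comments, blank lines
            *) break ;;                         # first executable line: stop here
        esac
    done < "$1"
}

# Usage:  list_bsub_directives test.job
```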

An example script for an MPI Job

#!/bin/bash
#BSUB -J mpijob
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -a mpich2
#BSUB -W 1:30
#BSUB -q small
#BSUB -n 32
#
# Run an MPI job with the "mpirun.lsf" MPI job starter.
mpirun.lsf ./test.x mpi.in

Here is a line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash
Specifies the shell used to execute the command portion of the script. The default is the Bash shell.

#BSUB -J mpijob
Assigns a name to the job. The job name appears in the bjobs output.

#BSUB -o %J.out
Redirects standard output to the specified file. In this example, %J is replaced by the job ID.

#BSUB -e %J.err
Redirects standard error to the specified file.

#BSUB -a mpich2
Specifies the serial/parallel job option, here the MPI implementation (MPICH2) that the mpirun.lsf job starter should use.

#BSUB -W 1:30
Sets a wall-clock time limit of 1 hour and 30 minutes.

#BSUB -q small
Specifies the queue to be used for the job.

#BSUB -n 32
Specifies the number of processors. For MPI jobs, this is the total number of MPI processes launched.

LSF stops reading directives at the first executable line, i.e., the first line that is non-blank and does not begin with #.
The last line uses the mpirun.lsf job starter to run the executable “test.x” in the current directory (by default, the directory from which the job was submitted) with the input file “mpi.in”.
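The -W value is written as [hours:]minutes, so 1:30 means 90 minutes, and a bare 90 would request the same limit. When comparing the limits in different scripts, it can help to normalize them to minutes. The sketch below does this with a hypothetical helper, walltime_minutes, which is not an LSF command. It handles only the plain [hours:]minutes form.

```shell
#!/bin/bash
# Hypothetical helper: convert a -W value in [hours:]minutes form
# (e.g. 1:30, 4:00, or 90) to total minutes.
walltime_minutes() {
    # One field means plain minutes; two fields mean hours:minutes.
    # awk reads "08" as decimal 8, avoiding the shell's octal pitfall.
    printf '%s\n' "$1" | awk -F: '{ m = (NF == 2) ? $1 * 60 + $2 : $1 + 0; print m }'
}

# Usage:  walltime_minutes 1:30    # MPI example above
```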

An example script for an OpenMP Job

#!/bin/bash
#BSUB -J openmpjob
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 4:00
#BSUB -q small
#BSUB -n 8
#BSUB -R "span[ptile=8]"
#
# Run the OpenMP job with OMP_NUM_THREADS specified
export OMP_NUM_THREADS=8
./test.x openmp.in

Here is a line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash
Specifies the shell used to execute the command portion of the script. The default is the Bash shell.

#BSUB -J openmpjob
Assigns a name to the job. The job name appears in the bjobs output.

#BSUB -o %J.out
Redirects standard output to the specified file. In this example, %J is replaced by the job ID.

#BSUB -e %J.err
Redirects standard error to the specified file.

#BSUB -W 4:00
Sets a wall-clock time limit of 4 hours.

#BSUB -q small
Specifies the queue to be used for the job.

#BSUB -n 8
Specifies the number of processors. For this OpenMP job, it equals the number of threads (8).

#BSUB -R "span[ptile=8]"
Requests 8 slots (processors) per node. Combined with -n 8, this keeps all 8 slots on a single node, which an OpenMP job needs because its threads share one node's memory.

LSF stops reading directives at the first executable line, i.e., the first line that is non-blank and does not begin with #. After the LSF portion of the script, the environment variable OMP_NUM_THREADS is set to 8, so the program runs with 8 threads, matching the 8 slots requested with -n 8. The last line runs the executable “test.x” in the current directory with the input file “openmp.in”.
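The script above hard-codes 8 in two places (-n 8 and OMP_NUM_THREADS=8), and they can drift apart when edited. One alternative is to derive the thread count from the allocation LSF actually made. The sketch below uses a hypothetical helper, slots_on_first_host, which is not part of LSF. It assumes the job environment contains LSB_HOSTS, which lists one hostname per allocated slot. The helper counts how many slots the job received on its first host. With -n 8 and span[ptile=8], that count is 8.

```shell
#!/bin/bash
# Hypothetical helper: count the slots on the job's first host, assuming
# LSB_HOSTS lists one hostname per allocated slot (e.g. "n0060 n0060 ...").
# Prints 0 when LSB_HOSTS is unset, e.g. outside an LSF job.
slots_on_first_host() {
    local first n=0 h
    set -- ${LSB_HOSTS-}              # one word per allocated slot
    first=${1-}
    for h in "$@"; do
        if [ "$h" = "$first" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

In the job script, you would define this function before the executable line and replace export OMP_NUM_THREADS=8 with export OMP_NUM_THREADS=$(slots_on_first_host). Keeping a fixed value as a fallback is still sensible if LSB_HOSTS might be absent.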