Job Submission

To create a resource allocation and launch tasks, submit a batch script.

A batch script submitted to the scheduling system must specify the following job requirements:

  1. resource queue (partition); the default is compute
  2. number of nodes required
  3. number of cores per node required
  4. maximum wall time for the job (note that jobs exceeding their wall time are killed)
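The sample scripts below give wall time as HH:MM:SS and note a 48-hour maximum. The following bash sketch converts a requested time to seconds and rejects values over that limit before you submit. It assumes the 48h limit from those scripts and handles only the HH:MM:SS form (Slurm also accepts other formats such as D-HH:MM:SS); walltime_to_seconds and check_walltime are hypothetical helpers, not Slurm commands.

```bash
#!/bin/bash
# Hypothetical helpers (not part of Slurm): validate a --time request.
# Only the HH:MM:SS form is handled; Slurm also accepts e.g. D-HH:MM:SS.

# Assumed site limit: 48 hours, per the "(max 48h)" notes in the sample scripts.
MAX_WALLTIME_SECONDS=$(( 48 * 3600 ))

walltime_to_seconds() {
    local re='^[0-9]+:[0-9]{2}:[0-9]{2}$' h m s
    if [[ ! $1 =~ $re ]]; then
        echo "expected HH:MM:SS, got '$1'" >&2
        return 1
    fi
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10, so fields such as "08" or "09" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

check_walltime() {
    local secs
    secs=$(walltime_to_seconds "$1") || return 1
    if (( secs > MAX_WALLTIME_SECONDS )); then
        echo "wall time $1 exceeds the 48h limit" >&2
        return 1
    fi
    echo "wall time $1 = $secs seconds"
}
```

For example, check_walltime 01:30:00 prints "wall time 01:30:00 = 5400 seconds", while check_walltime 49:00:00 prints an error and returns a non-zero status.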

To submit a job, use the sbatch command:

sbatch my_script

See the sbatch man page for more information:

man sbatch

Define a batch script

Batch scripts contain:

  1. scheduler directives: lines beginning with #SBATCH
  2. shell commands: UNIX shell (bash) commands
  3. job steps: created with the srun command

#!/bin/bash
#SBATCH --job-name=my_script    # Job name
#SBATCH --ntasks=2              # Number of tasks
#SBATCH --time=01:30:00         # Run time (hh:mm:ss) - 1.5 hours

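module load gnu intelmpi        # Load any needed modules

echo "Start at $(date)"
cd "$HOME/workdir" || exit 1    # Stop if the work directory is missing
srun ./a.out                    # Launch the job step (2 tasks) with srun
echo "End at $(date)"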

To submit this batch script:

sbatch my_script
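One detail of directive parsing is easy to miss: sbatch stops processing #SBATCH directives at the first line that is neither blank nor a comment, so a directive placed after a shell command is silently ignored. The bash sketch below mimics that rule to list the directives a script will actually apply. list_sbatch_directives is a hypothetical helper, and it assumes each directive starts at the beginning of its line.

```bash
#!/bin/bash
# Hypothetical helper: print the #SBATCH directives sbatch would honor.
# sbatch stops reading directives at the first line that is neither blank
# nor a comment, so any #SBATCH lines after it are ignored.
list_sbatch_directives() {
    local line
    local directive_re='^#SBATCH[[:space:]]'
    local skip_re='^[[:space:]]*(#|$)'    # blank line, shebang or comment
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $directive_re ]]; then
            echo "$line"
        elif [[ $line =~ $skip_re ]]; then
            continue
        else
            break    # first executable line: stop scanning
        fi
    done < "$1"
}
```

Running list_sbatch_directives my_script on the script above should print its three #SBATCH lines.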

Job Specifications

Option                 Argument          Specification
--job-name, -J         job_name          Job name is job_name
--partition, -p        queue_name        Submit to queue queue_name
--account, -A          project_name      Project to charge compute hours to
--ntasks, -n           number_of_tasks   Total number of tasks
--nodes, -N            number_of_nodes   Number of nodes
--ntasks-per-node      ntasks_per_node   Tasks per node
--cpus-per-task, -c    cpus_per_task     Threads (CPUs) per task
--time, -t             HH:MM:SS          Time limit (hh:mm:ss)
--mem                  memory_mb         Memory per node (MB)
--output, -o           stdout_filename   Direct job standard output to stdout_filename (%j expands to the job ID)
--error, -e            stderr_filename   Direct job standard error to stderr_filename (%j expands to the job ID)
--dependency, -d       afterok:jobid     Start only after job jobid has completed successfully
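The dependency option is usually combined with sbatch --parsable, which prints only the job ID (followed by ";cluster" on multi-cluster setups), so that a later submission can wait for an earlier one. A minimal bash sketch, where submit_chain is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: submit batch scripts so that each one starts only
# after the previous one completed successfully (afterok). Prints job IDs.
submit_chain() {
    local script jobid prev=""
    for script in "$@"; do
        if [[ -z $prev ]]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency="afterok:$prev" "$script") || return 1
        fi
        jobid=${jobid%%;*}    # drop the optional ";cluster" suffix
        echo "$jobid"
        prev=$jobid
    done
}
```

For example, submit_chain preprocess.sh simulate.sh postprocess.sh (placeholder script names) submits three jobs, each held until the previous one finishes successfully.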

SLURM Environment Variables

SLURM provides environment variables for most of the values used in the #SBATCH directives.

Environment Variable      Description
$SLURM_JOBID              Job ID
$SLURM_JOB_NAME           Job name
$SLURM_SUBMIT_DIR         Submit directory
$SLURM_SUBMIT_HOST        Submit host
$SLURM_JOB_NODELIST       Node list
$SLURM_JOB_NUM_NODES      Number of nodes
$SLURM_CPUS_ON_NODE       Number of CPUs allocated on this node
$SLURM_CPUS_PER_TASK      Threads per task
$SLURM_NTASKS_PER_NODE    Number of tasks per node

#!/bin/bash
#SBATCH --job-name=slurm_env
#SBATCH --nodes=2                # 2 nodes
#SBATCH --ntasks-per-node=12     # Number of tasks to be invoked on each node
#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
#SBATCH --time=00:01:00          # Run time in hh:mm:ss
#SBATCH --error=job.%j.err       # Stderr (%j expands to jobId)
#SBATCH --output=job.%j.out      # Stdout (%j expands to jobId)

echo "Start at $(date)"
echo "Running on hosts: $SLURM_JOB_NODELIST"
echo "Running on $SLURM_JOB_NUM_NODES nodes."
echo "Running $SLURM_NTASKS_PER_NODE tasks per node"
echo "Job id is $SLURM_JOBID"
echo "End at $(date)"
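These variables let a script derive its resource totals instead of repeating the numbers from the directives. The bash sketch below multiplies nodes × tasks per node × threads per task. job_core_count is a hypothetical helper; it assumes --nodes and --ntasks-per-node were requested, as in the script above, and defaults SLURM_CPUS_PER_TASK to 1 because Slurm sets that variable only when --cpus-per-task is given.

```bash
#!/bin/bash
# Hypothetical helper: total cores used by the job's tasks.
# Assumes --nodes and --ntasks-per-node were requested. Slurm sets
# SLURM_CPUS_PER_TASK only when --cpus-per-task is given, so default it to 1.
job_core_count() {
    if [[ -z ${SLURM_JOB_NUM_NODES:-} || -z ${SLURM_NTASKS_PER_NODE:-} ]]; then
        echo "SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE must be set" >&2
        return 1
    fi
    local cpus_per_task=${SLURM_CPUS_PER_TASK:-1}
    echo $(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE * cpus_per_task ))
}

# Only report when running inside a Slurm job.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
    echo "Job $SLURM_JOB_ID uses $(job_core_count) cores"
fi
```

For the slurm_env script above (2 nodes, 12 tasks per node, no --cpus-per-task) this would report 24 cores.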

Job Scripts

Here are some sample job submission scripts for different runtime models.

  • Serial job: Run serial programs or scripts on a single core.
  • MPI job: Run multi-process programs with MPI.
  • Hybrid job: Parallel programs with MPI and OpenMP threads.
  • GPU job: Utilize GPU accelerators.
  • PHI job: Utilize Xeon Phi (MIC) accelerators (offload mode only).

Serial batch script

#!/bin/bash

#-----------------------------------------------------------------
# Serial job, requesting 1 core, 2800 MB of memory per job
#-----------------------------------------------------------------

#SBATCH --job-name=serialjob # Job name
#SBATCH --output=serialjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=serialjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=1 # Total number of tasks
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem=2800 # Memory per node in MB (1 node, so per job)
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=taskp # Submit queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules
module load gnu
module load intel

# Launch the executable a.out
./a.out ARGS
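Slurm records the exit status of the batch script as the job's exit code, and in the serial script above that is the exit status of ./a.out. If you also want the status and run time written to the job's error file, a small wrapper can do it. A minimal bash sketch; run_and_report is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical wrapper: run a command, log its exit status and run time
# to stderr (the job's --error file), and return the same status so the
# job still ends with the program's exit code.
run_and_report() {
    local start rc
    start=$(date +%s)
    "$@"
    rc=$?
    echo "'$*' exited with status $rc after $(( $(date +%s) - start )) s" >&2
    return "$rc"
}
```

To use it, define the function in the batch script and replace the final line with run_and_report ./a.out ARGS.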

Pure MPI batch script

Launch MPI jobs with the srun command.

DO NOT USE mpirun OR mpiexec.

#!/bin/bash

#-----------------------------------------------------------------
# Pure MPI job, using 80 procs on 4 nodes,
# with 20 procs per node and 1 thread per MPI task
#-----------------------------------------------------------------

#SBATCH --job-name=mpijob # Job name
#SBATCH --output=mpijob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=mpijob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=80 # Total number of tasks
#SBATCH --nodes=4 # Total number of nodes requested
#SBATCH --ntasks-per-node=20 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task(=1) for pure MPI
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module load gnu
module load intel
module load intelmpi

export I_MPI_FABRICS=shm:dapl

# Launch the executable

srun EXE ARGS
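Before launching a real executable, it can help to check how srun places tasks. srun exports SLURM_PROCID (global task rank), SLURM_LOCALID (task index within its node) and SLURM_NODEID (node index within the job) to every task it starts. A minimal sketch of a hypothetical wrapper script, rank_info.sh, that you could launch with srun ./rank_info.sh in place of srun EXE ARGS:

```bash
#!/bin/bash
# Hypothetical rank_info.sh: each task started by srun prints where it runs.
# The "?" fallbacks appear when the script is run outside srun.
rank_info() {
    echo "rank ${SLURM_PROCID:-?} (local ${SLURM_LOCALID:-?}) on node ${SLURM_NODEID:-?}, host ${HOSTNAME:-unknown}"
}

rank_info
```

With the pure MPI script above (80 tasks, 20 per node) this should print 80 lines, one per rank, in no particular order.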

Hybrid MPI/OpenMP batch script

Launch MPI jobs with the srun command.

DO NOT USE mpirun OR mpiexec.

#!/bin/bash

#-----------------------------------------------------------------
# Hybrid MPI/OpenMP job, using 80 cores on 4 nodes,
# with 2 MPI tasks per node and 10 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=hybridjob # Job name
#SBATCH --output=hybridjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=hybridjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=8 # Total number of tasks
#SBATCH --nodes=4 # Total number of nodes requested
#SBATCH --ntasks-per-node=2 # Tasks per node
#SBATCH --cpus-per-task=10 # Threads per task
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=compute # Submit queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module load gnu
module load intel
module load intelmpi

export I_MPI_FABRICS=shm:dapl

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the executable
srun EXE ARGS
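A mismatch between --ntasks, --nodes, --ntasks-per-node and --cpus-per-task is an easy mistake in hybrid jobs. The bash sketch below checks that ntasks equals nodes × ntasks-per-node and that tasks × threads per node do not exceed the available cores. check_layout is a hypothetical helper, and CORES_PER_NODE=20 is an assumption inferred from the sample scripts (20 MPI tasks per node, or 1 task with 20 threads); set it to your nodes' actual core count.

```bash
#!/bin/bash
# Hypothetical layout check: check_layout NTASKS NODES TASKS_PER_NODE CPUS_PER_TASK
# CORES_PER_NODE=20 is an assumption taken from the sample scripts.
CORES_PER_NODE=${CORES_PER_NODE:-20}

check_layout() {
    local ntasks=$1 nodes=$2 tasks_per_node=$3 cpus_per_task=$4
    if (( ntasks != nodes * tasks_per_node )); then
        echo "ntasks ($ntasks) != nodes ($nodes) x ntasks-per-node ($tasks_per_node)" >&2
        return 1
    fi
    if (( tasks_per_node * cpus_per_task > CORES_PER_NODE )); then
        echo "$tasks_per_node tasks x $cpus_per_task threads oversubscribes $CORES_PER_NODE cores" >&2
        return 1
    fi
    echo "OK: $ntasks tasks x $cpus_per_task thread(s) = $(( ntasks * cpus_per_task )) cores"
}
```

For the hybrid script above, check_layout 8 4 2 10 prints "OK: 8 tasks x 10 thread(s) = 80 cores".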

GPU batch script

Launch GPU accelerated jobs.

#!/bin/bash

#-----------------------------------------------------------------
# GPU job, using 80 cores on 4 nodes,
# with 2 GPUs per node, 1 MPI task per node and 20 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=gpujob # Job name
#SBATCH --output=gpujob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=gpujob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --gres=gpu:2 # GPUs per node
#SBATCH --nodes=4 # Total number of nodes requested
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --cpus-per-task=20 # Threads per task
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=gpu # Run on the GPU nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module load gnu
module load intel
module load intelmpi
module load cuda

export I_MPI_FABRICS=shm:dapl

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the executable
srun EXE ARGS
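To confirm that the GPUs requested with --gres=gpu:2 are visible to your task, you can inspect CUDA_VISIBLE_DEVICES, which Slurm typically exports as a comma-separated list of device indices when GPU GRES is configured on the cluster (this depends on the site's setup). A minimal bash sketch; count_visible_gpus is a hypothetical helper and the expected count of 2 mirrors the script above:

```bash
#!/bin/bash
# Hypothetical check: count the GPUs listed in CUDA_VISIBLE_DEVICES
# (e.g. "0,1" for two GPUs) and warn if it differs from the request.
count_visible_gpus() {
    local -a ids=()
    if [[ -n ${CUDA_VISIBLE_DEVICES:-} ]]; then
        IFS=, read -ra ids <<< "$CUDA_VISIBLE_DEVICES"
    fi
    echo "${#ids[@]}"
}

expected_gpus=2    # mirrors "#SBATCH --gres=gpu:2" (GPUs per node)
found_gpus=$(count_visible_gpus)
if (( found_gpus != expected_gpus )); then
    echo "warning: expected $expected_gpus GPUs, found $found_gpus" >&2
fi
```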

PHI batch script

Launch PHI accelerated jobs.

#!/bin/bash

#-----------------------------------------------------------------
# PHI job, using 80 cores on 4 nodes,
# with 2 PHI accelerators per node, 1 MPI task per node and 20 threads per MPI task.
#-----------------------------------------------------------------

#SBATCH --job-name=phijob # Job name
#SBATCH --output=phijob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=phijob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=4 # Total number of tasks
#SBATCH --nodes=4 # Total number of nodes requested
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --cpus-per-task=20 # Threads per task
#SBATCH --gres=mic:2 # Accelerators per node
#SBATCH --mem=56000 # Memory per node in MB
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - (max 48h)
#SBATCH --partition=phi # Run on the PHI nodes queue
#SBATCH -A testproj # Accounting project

# Load any necessary modules

module load gnu
module load intel
module load intelmpi

export I_MPI_FABRICS=shm:dapl

## (HOST) OPENMP NUMBER OF THREADS
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

## (MIC) OPENMP NUMBER OF THREADS
export MIC_ENV_PREFIX=MIC
## (MIC) 60 physical cores x 4 hardware threads = 240 threads
export MIC_OMP_NUM_THREADS=240

# Launch the executable
srun EXE ARGS
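The value 240 in the script above is the product of the coprocessor's 60 physical cores and 4 hardware threads per core. Computing it rather than hard-coding it makes that assumption explicit. In this sketch PHI_CORES and PHI_THREADS_PER_CORE are hypothetical variable names (only MIC_ENV_PREFIX and MIC_OMP_NUM_THREADS come from the script), and the values 60 and 4 are taken from its comment, so adjust them to your coprocessor model.

```bash
#!/bin/bash
# Derive the coprocessor OpenMP thread count instead of hard-coding 240.
# PHI_CORES and PHI_THREADS_PER_CORE are hypothetical names; they avoid the
# MIC_ prefix, which MIC_ENV_PREFIX=MIC uses to forward variables to the card.
PHI_CORES=60              # physical cores per coprocessor (assumed, from the script comment)
PHI_THREADS_PER_CORE=4    # hardware threads per core (assumed)

export MIC_ENV_PREFIX=MIC
export MIC_OMP_NUM_THREADS=$(( PHI_CORES * PHI_THREADS_PER_CORE ))
echo "MIC_OMP_NUM_THREADS=$MIC_OMP_NUM_THREADS"
```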