# Jobs

To execute workloads in the cluster, users must submit their jobs through the workload scheduler. The number of jobs a user can submit is based on their assigned quota. This quota can be found using the `myinfo` command.
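For example, you can check your quota from a login shell (the exact output depends on how `myinfo` is set up on the cluster):

```bash
myinfo
```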
## Submitting Jobs

### Submitting jobs with the `sbatch` command (recommended)

The `sbatch` command reads a shell script (.sh file) and executes the job when resources are available in the cluster. The workload scheduler can be configured to notify the user when the submitted job starts running and when it ends or fails.
**Tip**

Files written to `/tmp/` may be inaccessible after the job has completed. Because the submitted job may be distributed to other nodes for processing, do not use the `/tmp/` directory to store your temporary files or output. Instead, use your assigned scratch directory or your home directory to preserve any logs or output.
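As a rough sketch of this advice inside a job script, the paths below are placeholders (your assigned scratch location will differ), and `my_workload` stands in for whatever program the job runs:

```bash
# Placeholder paths -- replace with your assigned scratch and home directories
SCRATCH_DIR=/path/to/your/scratch/space
OUT_DIR="$HOME/job_logs"
mkdir -p "$SCRATCH_DIR" "$OUT_DIR"

# Keep temporary files on scratch and final output under the home directory,
# rather than /tmp/, so they remain accessible after the job completes
./my_workload --tmpdir "$SCRATCH_DIR" > "$OUT_DIR/run.log" 2>&1
```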
A sample job submission script is located here (open it with your favourite text editor to view): Download Shell Script Template

These scripts must have the executable permission granted. This can be done with the following command:

chmod +x <file path>/shellscript.sh
### Commands in sbatch

The parameters listed below allow users to adjust the resources assigned to their job. If no values are specified, the cluster will assign default values to the job.

**Important**

By default, no GPUs will be assigned to the job.

All parameters in the SBATCH file start with a `#SBATCH` prefix.
Command parameters | Description | Argument format |
---|---|---|
`--job-name` | Name of the job submitted, without spaces | `<jobid>.log` |
`--partition` | Name of the partition assigned | `project` |
`--mail-type` | When the cluster should send an email. Must be specified if `--mail-user` is set | `ALL`, `BEGIN`, `END` or `FAIL` (multiple options separated by commas) |
`--mail-user` | Email address to send notifications to. Must be specified if `--mail-type` is set | Email address |
`--time` | Maximum time before the cluster terminates the job | `HH:MM:SS` |
`--nodes` | How many nodes to run the job on | Integer |
`--cpus-per-task` | Number of CPUs to use for the workload | Integer |
`--mem` | Amount of memory required by the job in gigabytes (GB) | `<integer>GB` (no space) |
`--gres` | How many GPU(s) will be assigned to the job, in the format `gpu:quantity`. By default, no GPU will be issued for jobs. | `gpu:<integer>` |
`--output` | Where the log files should go. Recommended to be in the home directory, e.g. `--output /common/home/Module/moduleaccount/%u.%j.out` | File path |
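Putting these parameters together, a submission script might look like the sketch below. The partition name, email address, resource amounts, and paths are illustrative placeholders; refer to the downloadable template above for the version maintained for this cluster.

```bash
#!/bin/bash
#SBATCH --job-name=my_job                # job name, no spaces
#SBATCH --partition=project              # placeholder partition name
#SBATCH --mail-type=BEGIN,END,FAIL       # when the cluster should email you
#SBATCH --mail-user=user@example.com     # placeholder email address
#SBATCH --time=02:00:00                  # HH:MM:SS wall-time limit
#SBATCH --nodes=1                        # number of nodes
#SBATCH --cpus-per-task=4                # CPUs for the workload
#SBATCH --mem=8GB                        # memory, no space before GB
#SBATCH --gres=gpu:1                     # request one GPU; omit for CPU-only jobs
#SBATCH --output=%u.%j.out               # log file named <user>.<jobid>.out

# The actual workload goes here
srun python /path/to/your_script.py
```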
To submit the job to the cluster:
sbatch /path/to/sh/file.sh
### Submitting jobs with the `srun` command

The `srun` command is similar to `sbatch`. It enables the user to request resources from the cluster to run an interactive job. When your script is executed through `srun`, the output is directed to your terminal instead.

This is not recommended because it is an interactive session: the job will be terminated if you disconnect from the session.

`srun` should be used in a script that is read by `sbatch` (refer to the template). The command options are similar to those of `sbatch`:

srun --partition=normal --nodes=1 --cpus-per-task=30 --mem=2GB /path/to/final_finalProject.py
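Note that running a Python file directly like this assumes the file is executable and starts with an interpreter line. If it is not, a safer equivalent (same resources, same hypothetical script path) is to invoke the interpreter explicitly:

```bash
srun --partition=normal --nodes=1 --cpus-per-task=30 --mem=2GB python /path/to/final_finalProject.py
```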
## Job queue commands

### View the status of your jobs

myqueue

### Cancel your jobs

scancel <JOBID>

### Cancel all jobs submitted by your account

scancel --me
### View detailed information about a running/pending job

**Take note**

This command only fetches information on jobs that are currently pending/running or that completed within the past 5 minutes.

scontrol show jobid <job id>
### View information about previously completed jobs

#### Query using job ID

sacct --job=<jobid> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45

#### Query using job name

sacct --name=<jobname> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
### View all jobs for the past N days

You may view older jobs by modifying the `1 day ago` value in the query below (up to 30 days).
sacct --starttime $(date -d '1 day ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
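For example, to list jobs from the past 7 days, change the offset passed to `date`:

```bash
sacct --starttime $(date -d '7 days ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
```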