Jobs

To execute workloads on the cluster, users must submit their jobs through the workload scheduler. The number of jobs a user can submit is limited by their assigned quota, which can be checked with the myinfo command.

Submitting Jobs

The sbatch command reads a shell script (.sh file) and executes the job when resources become available in the cluster. The workload scheduler can be configured to notify the user when the submitted job starts executing and when it ends or fails.

Tip

Files written to /tmp/ may be inaccessible after the job has been completed.

As the submitted job may be distributed to other nodes for processing, do not use the /tmp/ directory to store your temporary files or output. Instead, use the assigned scratch directory or your home directory to preserve any logs or output.
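
As a sketch, a job script can write its temporary files and output under your home directory instead of /tmp; the directory layout below is illustrative, and $SLURM_JOB_ID is set by the scheduler for each job.

# Illustrative only: keep output out of /tmp so it survives after the job completes.
OUTDIR="$HOME/job_output/$SLURM_JOB_ID"
mkdir -p "$OUTDIR"
./my_program > "$OUTDIR/run.log" 2>&1    # my_program is a placeholder for your workload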

A sample job submission script is located here (open it with your favourite text editor to view) —

Download Shell Script Template

These scripts must be granted the executable permission, which can be done with the following command:

chmod +x <file path>/shellscript.sh
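
To verify that the permission was applied, list the file and check for the x flag in the permission string:

ls -l <file path>/shellscript.sh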

Commands in sbatch

The parameters listed below allow users to adjust the resources assigned to their job. If no value is specified, the cluster assigns a default value to the job.

Important

By default, no GPUs will be assigned to the job.
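
To request GPUs for a job, add a --gres directive to the job script; for example, to request one GPU:

#SBATCH --gres=gpu:1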

All scheduler parameters in the sbatch file are written as directives with the #SBATCH prefix, and the script should begin with a shebang line:

#!/bin/bash
#SBATCH --job-name=finalSubmissionFinal
#SBATCH --partition=project
Command parameters, descriptions, and argument formats:

--job-name
    Name of the job submitted, without spaces.
    Argument format: <jobid>.log

--partition
    Name of the partition assigned.
    Argument format: project

--mail-type
    When the cluster should send an email. Must be specified if --mail-user is set.
    Argument format: ALL, BEGIN, END or FAIL (multiple options separated by commas)

--mail-user
    Email address to send notifications to. Must be specified if --mail-type is set.
    Argument format: email address

--time
    Maximum time before the cluster terminates the job.
    Argument format: HH:MM:SS

--nodes
    Number of nodes to run the job on.
    Argument format: integer

--cpus-per-task
    Number of CPUs to use for the workload.
    Argument format: integer

--mem
    Amount of memory required by the job in gigabytes (GB).
    Argument format: (integer)GB, with no space

--gres
    How many GPU(s) will be assigned to the job, in the format gpu:quantity. By default, no GPU will be issued for jobs.
    Argument format: gpu:(integer)

--output
    Where the log files should be written. Recommended to be in your home directory, e.g. --output /common/home/Module/moduleaccount/%u.%j.out
    Argument format: file path
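
Putting these directives together, a complete job script might look like the sketch below. The job name, email address, resource values, and workload command are illustrative only; adjust them to your own job and quota.

#!/bin/bash
#SBATCH --job-name=exampleJob
#SBATCH --partition=project
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=user@example.com
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8GB
#SBATCH --gres=gpu:1
#SBATCH --output=/common/home/Module/moduleaccount/%u.%j.out

# Launch the workload; replace with your own program.
srun python /path/to/your_script.py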

To submit the job to the cluster —

sbatch /path/to/sh/file.sh
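
If the submission is accepted, sbatch prints the assigned job ID (for example, "Submitted batch job 12345"); this ID can be used with the queue and accounting commands below.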

Submitting jobs with the srun command

The srun command is similar to sbatch. It enables the user to request resources from the cluster and run an interactive job. When a script is executed through srun, its output is directed to your terminal instead.

This is not recommended because the session is interactive: the job will be terminated if you disconnect from the session.

srun should instead be used within a script that is read by sbatch (refer to the template).

The command options are similar to those of sbatch:

srun --partition=normal --nodes=1 --cpus-per-task=30 --mem=2GB /path/to/final_finalProject.py
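
As noted above, the recommended pattern is to place the srun command inside a script submitted with sbatch. A minimal sketch mirroring the command above (the partition, resource values, and script path are illustrative):

#!/bin/bash
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --cpus-per-task=30
#SBATCH --mem=2GB

# Launch the workload through srun; output goes to the job's log file instead of the terminal.
# Assumes the script is executable; otherwise invoke it through its interpreter.
srun /path/to/final_finalProject.py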

Job queue commands

View the status of your jobs

myqueue

Cancel your jobs

scancel <JOBID>

Cancel all jobs submitted by your account

scancel --me

View detailed information about a running/pending job

Take note

This command only fetches information on jobs that are currently pending or running, or that completed within the past 5 minutes.

scontrol show jobid <job id>
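
The output of scontrol is verbose. To pull out just a few fields, you can filter it; the job ID and field names below are illustrative of common fields in the output.

scontrol show jobid 12345 | grep -E 'JobState|RunTime|NodeList'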

View information about previously completed jobs

Query using job id

sacct --job=<jobid> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45

Query using job name

sacct --name=<jobname> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45

View all jobs for past N days

You may view older jobs by modifying the '1 day ago' value in the query below (up to 30 days).

sacct --starttime $(date -d '1 day ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
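
For example, to view jobs from the past 7 days:

sacct --starttime $(date -d '7 days ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45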