# Jobs

To execute workloads in the cluster, users must submit their jobs through the workload scheduler. The number of jobs a user can submit is based on their assigned quota. This quota can be found using the `myinfo` command.
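For example, you can check your quota from a login shell (the exact output depends on how `myinfo` is set up on the cluster):

```bash
myinfo
```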
## Submitting Jobs

### Submitting jobs with the `sbatch` command (recommended)

The `sbatch` command reads a shell script (.sh file) and executes the job when resources are available in the cluster. The workload scheduler can be configured to notify the user when the submitted job starts running and when it ends or fails.
**Tip**

Files written to `/tmp/` may be inaccessible after the job has completed. Because the submitted job may be distributed to other nodes for processing, do not use the `/tmp/` directory to store your temporary files or output. Instead, use your assigned scratch directory or your home directory to preserve any logs or output.
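As a rough sketch of this advice inside a job script, the paths below are placeholders (your assigned scratch location will differ), and `my_workload` stands in for whatever program the job runs:

```bash
# Placeholder paths -- replace with your assigned scratch and home directories
SCRATCH_DIR=/path/to/your/scratch/space
OUT_DIR="$HOME/job_logs"
mkdir -p "$SCRATCH_DIR" "$OUT_DIR"

# Keep temporary files on scratch and final output under the home directory,
# rather than /tmp/, so they remain accessible after the job completes
./my_workload --tmpdir "$SCRATCH_DIR" > "$OUT_DIR/run.log" 2>&1
```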
A sample job submission script is located here (open it with your favourite text editor to view): Download Shell Script Template

These scripts must have the executable permission granted. This can be done with the following command:

chmod +x <file path>/shellscript.sh
### Commands in sbatch

The parameters listed below allow users to adjust the resources assigned to their job. If no values are specified, the cluster will assign default values to the job.

**Important**

By default, no GPUs will be assigned to the job.

All parameters in the SBATCH file start with a `#SBATCH` prefix.
Command parameters | Description | Argument format |
---|---|---|
`--job-name` | Name of the job submitted, without spaces | `<jobid>.log` |
`--partition` | Name of the partition assigned | `project` |
`--mail-type` | When the cluster should send an email. Must be specified if `--mail-user` is set | `ALL`, `BEGIN`, `END` or `FAIL` (multiple options separated by commas) |
`--mail-user` | Email address to send notifications to. Must be specified if `--mail-type` is set | Email address |
`--time` | Maximum time before the cluster terminates the job | `HH:MM:SS` |
`--nodes` | How many nodes to run the job on | Integer |
`--cpus-per-task` | Number of CPUs to use for the workload | Integer |
`--mem` | Amount of memory required by the job in gigabytes (GB) | `<integer>GB` (no space) |
`--gres` | How many GPU(s) will be assigned to the job, in the format `gpu:quantity`. By default, no GPU will be issued for jobs. | `gpu:<integer>` |
`--output` | Where the log files should go. Recommended to be in the home directory, e.g. `--output /common/home/Module/moduleaccount/%u.%j.out` | File path |
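Putting these parameters together, a submission script might look like the sketch below. The partition name, email address, resource amounts, and paths are illustrative placeholders; refer to the downloadable template above for the version maintained for this cluster.

```bash
#!/bin/bash
#SBATCH --job-name=my_job                # job name, no spaces
#SBATCH --partition=project              # placeholder partition name
#SBATCH --mail-type=BEGIN,END,FAIL       # when the cluster should email you
#SBATCH --mail-user=user@example.com     # placeholder email address
#SBATCH --time=02:00:00                  # HH:MM:SS wall-time limit
#SBATCH --nodes=1                        # number of nodes
#SBATCH --cpus-per-task=4                # CPUs for the workload
#SBATCH --mem=8GB                        # memory, no space before GB
#SBATCH --gres=gpu:1                     # request one GPU; omit for CPU-only jobs
#SBATCH --output=%u.%j.out               # log file named <user>.<jobid>.out

# The actual workload goes here
srun python /path/to/your_script.py
```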
To submit the job to the cluster:
sbatch /path/to/sh/file.sh
### Submitting jobs with the `srun` command

The `srun` command is similar to `sbatch`. It enables the user to request resources from the cluster to run an interactive job. When your script is executed through `srun`, the output is directed to your terminal instead.

This is not recommended because it is an interactive session: the job will be terminated if you disconnect from the session.

`srun` should be used in a script that is read by `sbatch` (refer to the template). The command options are similar to those of `sbatch`:

srun --partition=normal --nodes=1 --cpus-per-task=30 --mem=2GB /path/to/final_finalProject.py
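Note that running a Python file directly like this assumes the file is executable and starts with an interpreter line. If it is not, a safer equivalent (same resources, same hypothetical script path) is to invoke the interpreter explicitly:

```bash
srun --partition=normal --nodes=1 --cpus-per-task=30 --mem=2GB python /path/to/final_finalProject.py
```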
## Job queue commands

### View the status of your jobs

myqueue

### Cancel your jobs

scancel <JOBID>

### Cancel all jobs submitted by your account

scancel --me
### View detailed information about a running/pending job

**Take note**

This command only fetches information on jobs that are currently pending/running or that completed within the past 5 minutes.

scontrol show jobid <job id>
### View information about previously completed jobs

#### Query using job ID

sacct --job=<jobid> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45

#### Query using job name

sacct --name=<jobname> --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
### View all jobs for the past N days

You may view older jobs by modifying the `1 day ago` value in the query below (up to 30 days).
sacct --starttime $(date -d '1 day ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
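For example, to list jobs from the past 7 days, change the offset passed to `date`:

```bash
sacct --starttime $(date -d '7 days ago' +"%Y-%m-%d") --format=JobID,User,Jobname,partition,state,time,start,end,elapsed,AllocTRES%45
```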