Selecting GPUs
Selecting a GPU is useful when you need to use a certain GPU model that meets your research requirement. There are two methods to select GPUs in your batch job script as shown below.
If the resource you have requested is in use, your job will wait in the queue until that resource becomes available.
Method 1
In the scenario where it's compulsory for the job to use a certain GPU, use the #SBATCH --constraint= parameter in the job submission script.
Example:
#SBATCH --output=%u.%j.out # Where should the log files go?
# You must provide an absolute path, e.g. /common/home/module/username/
# If no path is provided, the output file will be placed in your current working directory
#SBATCH --requeue # Remove if you do not want the workload scheduler to requeue your job after preemption
#SBATCH --constraint=a40 # This tells the workload scheduler to provision you a40 nodes
################################################################
## EDIT AFTER THIS LINE IF YOU ARE OKAY WITH DEFAULT SETTINGS ##
################################################################
Method 2
In the scenario where it's optional for the job to use a certain GPU, use the #SBATCH --prefer= parameter in the job submission script. If the requested resource is unavailable, other resources will be assigned to the job.
#SBATCH --output=%u.%j.out # Where should the log files go?
# You must provide an absolute path, e.g. /common/home/module/username/
# If no path is provided, the output file will be placed in your current working directory
#SBATCH --requeue # Remove if you do not want the workload scheduler to requeue your job after preemption
#SBATCH --prefer=a40 # This tells the workload scheduler to provision you a40 nodes on a best-effort basis
################################################################
## EDIT AFTER THIS LINE IF YOU ARE OKAY WITH DEFAULT SETTINGS ##
################################################################
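Because --prefer can fall back to a different GPU, it can help to log which card the job actually received. The lines below are a minimal sketch that could be placed in the body of the job script. The function gpu_matches_tag is a hypothetical helper, not part of the cluster's template, and it assumes that the tag appears as a substring of the name printed by nvidia-smi (true for names such as "NVIDIA A40", but not for multi-word names such as "NVIDIA H100 NVL", which would need their own mapping).

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the GPU name contains the tag (case-insensitive).
# $1: tag such as a40; $2: GPU name as printed by nvidia-smi.
gpu_matches_tag() {
    local tag name
    tag=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    name=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *"$tag"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Log which GPU the job received. Guarded so the script still runs
# on a machine where the NVIDIA driver tools are not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_name=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
    if gpu_matches_tag a40 "$gpu_name"; then
        echo "Running on the preferred GPU: $gpu_name"
    else
        echo "Preferred GPU not assigned, running on: $gpu_name"
    fi
fi
```

The message ends up in the job's output file, so a run that landed on a fallback GPU can be identified afterwards.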
Using operators
You can combine tags with the operators | (OR) and & (AND) to define the type of GPU resources that your job requires. For example, the following command requests either an H100 or an H100 NVL GPU.
srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="h100|h100nvl" nvidia-smi
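When the list of acceptable GPUs comes from a variable or another script, the constraint expression can be assembled instead of typed by hand. The following is a sketch only: join_by is a hypothetical helper, and the tags are taken from the examples on this page. The OR group is wrapped in parentheses because Slurm evaluates an unparenthesised expression strictly from left to right.

```shell
#!/bin/bash
# Hypothetical helper: join the remaining arguments with the separator in $1.
join_by() {
    [ "$#" -ge 2 ] || return 1
    local sep="$1" out="$2" item
    shift 2
    for item in "$@"; do
        out="${out}${sep}${item}"
    done
    printf '%s\n' "$out"
}

# Any one of these GPU tags is acceptable.
gpu_choice=$(join_by '|' h100 h100nvl)

# Group the OR so that nopreempt applies to every alternative.
constraint="nopreempt&(${gpu_choice})"
echo "--constraint=\"${constraint}\""
```

The printed option can then be passed to srun or written after #SBATCH in a job script.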
Tags
When the memory parameter --mem is used, the job may be assigned to any card that fulfills the memory requirement. If a GPU is preemptable, ensure your work is checkpointed to prevent data loss.
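One way to checkpoint a preemptable job is to record progress after each unit of work and read it back at start-up, since a requeued job runs its script again from the top. The following is a minimal sketch in shell, assuming the work can be split into numbered steps. The file name checkpoint.txt, the variables CKPT_FILE and TOTAL_STEPS, and the three functions are hypothetical and are not part of the cluster's templates.

```shell
#!/bin/bash
# Hypothetical checkpoint location and step count; adjust to your own job.
CKPT_FILE="${CKPT_FILE:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Print the last completed step, or 0 when no checkpoint exists yet.
load_step() {
    if [ -f "$CKPT_FILE" ]; then
        cat "$CKPT_FILE"
    else
        echo 0
    fi
}

# Write to a temporary file first, then rename it, so that a preemption
# in the middle of the write cannot leave a half-written checkpoint.
save_step() {
    printf '%s\n' "$1" > "${CKPT_FILE}.tmp" && mv "${CKPT_FILE}.tmp" "$CKPT_FILE"
}

# Resume from the recorded step and save progress after each step.
run_steps() {
    local step
    step=$(load_step)
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        step=$((step + 1))
        # ... one unit of real work goes here ...
        save_step "$step"
    done
    echo "finished at step $step"
}
```

In a job script, the last line would call run_steps. After a preemption and requeue, the loop continues from the step stored in the checkpoint file instead of starting over.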
Please refer to the Google Spreadsheet (here) for a list of resources and their tags. Log in with your SMU credentials (abc@smu.edu.sg) in order to view the spreadsheet.
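As a cross-check against the spreadsheet, the tags can usually be read from the scheduler as well. The following is a sketch, assuming that the tags are implemented as Slurm node features and that sinfo is available on the login node. The function unique_features is a hypothetical helper that turns the comma-separated feature lists into one tag per line.

```shell
#!/bin/bash
# Hypothetical helper: print each distinct feature tag once, sorted,
# from comma-separated feature lists read on standard input.
# Nodes without features are reported by sinfo as "(null)" and are skipped.
unique_features() {
    tr ',' '\n' | grep -v -e '^$' -e '^(null)$' | sort -u
}

# Usage on a login node (requires the Slurm client tools):
#   sinfo -h -o "%f" | unique_features
```

Here -h suppresses the header line and the %f format field prints the features of each node group.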
nopreempt tag
When a nopreempt tag is applied, the scheduler will run your jobs on nodes that will not preempt. Such nodes are limited and your jobs will be queued if there are no resources available.
Usage examples for node requests in srun and sbatch are shown below.
Example #1 Requesting a node that will not preempt
In srun:
srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint=nopreempt nvidia-smi
In sbatch file:
#SBATCH --constraint="nopreempt"
Example #2 Requesting a non-preemptive compute node with l40s GPUs
In srun:
srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="nopreempt&l40s" nvidia-smi
In sbatch file:
#SBATCH --constraint="nopreempt&l40s"
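Example #3 Requesting a non-preemptive compute node with either l40s or v100 GPUs
The OR group is placed in parentheses. Without them, Slurm evaluates the expression from left to right, so nopreempt&v100|l40s would also match l40s nodes that can preempt.
In srun:
srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="nopreempt&(v100|l40s)" nvidia-smi
In sbatch file:
#SBATCH --constraint="nopreempt&(v100|l40s)"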