Selecting GPUs

Selecting a GPU is useful when your research requires a specific GPU model. There are two ways to select GPUs in your batch job script, as shown below.

Note

If the resource you have requested is in use, your job will be placed in a queue until that resource is available.

Method 1

If the job must run on a certain GPU, use the #SBATCH --constraint= parameter in the job submission script.

Example:

#SBATCH --output=%u.%j.out          # Where should the log files go?
                                    # Paths must be absolute, e.g. /common/home/module/username/
                                    # If no path is provided, the output file is placed in your current working directory
#SBATCH --requeue                   # Remove if you do not want the workload scheduler to requeue your job after preemption
#SBATCH --constraint=a40            # This tells the workload scheduler to provision a40 nodes for you
################################################################
## EDIT AFTER THIS LINE IF YOU ARE OKAY WITH DEFAULT SETTINGS ##
################################################################
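Before hard-requiring a GPU with --constraint, it can help to check which feature tags the nodes actually advertise. The following is a minimal sketch, assuming the standard Slurm sinfo client is on your PATH and that GPU models are exposed as node features. The function name list_gpu_features is illustrative, and the default partition researchlong is taken from the srun examples on this page.

```shell
#!/bin/bash
# List each node with its feature tags and generic resources (GPUs).
# sinfo format fields: %N = node name, %f = features, %G = generic resources (gres).
list_gpu_features() {
    local partition="${1:-researchlong}"   # assumed default, taken from the srun examples
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: run this on a cluster login node" >&2
        return 1
    fi
    sinfo --partition="$partition" --Node --format="%N %f %G"
}
```

Calling `list_gpu_features researchlong` on a login node prints one line per node. Any value in the features column should be usable as a --constraint value.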

Method 2

If the job would prefer a certain GPU but can run on others, use the #SBATCH --prefer= parameter in the job submission script. If the requested resource is unavailable, other resources will be assigned to the job.

#SBATCH --output=%u.%j.out          # Where should the log files go?
                                    # Paths must be absolute, e.g. /common/home/module/username/
                                    # If no path is provided, the output file is placed in your current working directory
#SBATCH --requeue                   # Remove if you do not want the workload scheduler to requeue your job after preemption
#SBATCH --prefer=a40                # This tells the workload scheduler to provision a40 nodes for you on a best-effort basis
################################################################
## EDIT AFTER THIS LINE IF YOU ARE OKAY WITH DEFAULT SETTINGS ##
################################################################
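Because --prefer is best effort, a job may start on a different GPU model than the one requested. The following is a small sketch of a job-script body that logs which GPU was actually allocated, assuming nvidia-smi is installed on the compute node. The function name report_gpu is illustrative.

```shell
#!/bin/bash
#SBATCH --prefer=a40                # Same preference as in the example above

# Print the model name of every GPU visible to this job, one per line.
report_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name --format=csv,noheader
    else
        echo "nvidia-smi not found on ${HOSTNAME:-unknown host}" >&2
        return 1
    fi
}

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "unknown" elsewhere.
echo "Job ${SLURM_JOB_ID:-unknown} running on ${HOSTNAME:-unknown host}"
report_gpu || true                  # Do not abort the job if the check fails
```

Checking the log file for the reported model tells you whether the preference was honored before you compare timings across runs.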

Using operators

You can use operators such as | (OR) and & (AND) to define the type of GPU resources that your job requires. For example, if your job requires either an H100 or an H100 NVL GPU:

srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="h100|h100nvl" nvidia-smi

Tags

When the memory parameter --mem is used, the job may be assigned to any card that fulfills the memory requirement. If a GPU is preemptable, ensure your work is checkpointed to prevent data loss.
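One way to checkpoint a preemptable job is to record progress in a file after each unit of work and resume from that file when the job is requeued, since a requeued batch script starts again from the beginning. The following is a minimal sketch under that assumption. The checkpoint path, the step count, and the run_step placeholder are all hypothetical and stand in for your own workload.

```shell
#!/bin/bash
#SBATCH --requeue                   # Let the scheduler requeue the job after preemption

# Placeholder for one unit of real work (for example, one training epoch).
run_step() {
    echo "running step $1"
}

# Run steps 1..total_steps, skipping any already recorded in the checkpoint file.
run_from_checkpoint() {
    local ckpt_file="$1" total_steps="$2" last_done=0 step
    if [ -f "$ckpt_file" ]; then
        last_done="$(cat "$ckpt_file")"     # Last step that finished before preemption
    fi
    step=$((last_done + 1))
    while [ "$step" -le "$total_steps" ]; do
        run_step "$step"
        # Write to a temporary file, then rename it, so a preemption in the
        # middle of the write cannot leave a half-written checkpoint behind.
        echo "$step" > "$ckpt_file.tmp"
        mv "$ckpt_file.tmp" "$ckpt_file"
        step=$((step + 1))
    done
}

# Hypothetical defaults: override CKPT_FILE and TOTAL_STEPS for your own job.
run_from_checkpoint "${CKPT_FILE:-$PWD/progress.ckpt}" "${TOTAL_STEPS:-5}"
```

Real workloads usually save model or solver state rather than a step counter, but the resume-then-record pattern is the same.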

Please refer to the Google Spreadsheet (here) for a list of resources and their tags. Log in with your SMU credentials (abc@smu.edu.sg) in order to view the spreadsheet.

`nopreempt` tag

When the nopreempt tag is applied, the scheduler will run your jobs only on nodes where they will not be preempted. Such nodes are limited, so your jobs will be queued if no resources are available.

Usage examples for node requests in srun and sbatch can be found below.

Example #1 Requesting a node that will not preempt

In srun:

srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint=nopreempt nvidia-smi

In sbatch file:

#SBATCH --constraint="nopreempt"

Example #2 Requesting a non-preemptive compute node with l40s GPUs

In srun:

srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="nopreempt&l40s" nvidia-smi

In sbatch file:

#SBATCH --constraint="nopreempt&l40s"

Example #3 Requesting a non-preemptive compute node with either l40s or v100 GPUs

In srun:

srun -p researchlong -c 4 --mem=8gb --gres=gpu:1 --constraint="nopreempt&v100|l40s" nvidia-smi

In sbatch file:

#SBATCH --constraint="nopreempt&v100|l40s"
