Job submission guide (pip3)

Introduction

This guide is a quickstart for submitting a job to the cluster after you have transferred your code to the server.

If you developed your code in a Jupyter notebook, you will need to convert it into a Python file first.
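
One common way to do the conversion is nbconvert, which ships with Jupyter. The sketch below is not part of the cluster template: it assumes Jupyter is installed wherever you run it, that the notebook uses a Python kernel, and it uses a hypothetical notebook name, myNotebook.ipynb.

```shell
# Hypothetical notebook name (an assumption); replace with your own file.
NOTEBOOK="myNotebook.ipynb"

# For a Python notebook, nbconvert writes a .py file with the same base name.
if command -v jupyter >/dev/null 2>&1 && [ -f "$NOTEBOOK" ]; then
    jupyter nbconvert --to script "$NOTEBOOK"
else
    echo "jupyter or $NOTEBOOK not found; run this where both are available"
fi

# Name of the script to reference later in the srun line.
SCRIPT_NAME="${NOTEBOOK%.ipynb}.py"
echo "$SCRIPT_NAME"
```

Review the generated file before submitting it, since notebook-only features such as cell magics may need to be removed by hand.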

Note

The researchlong queue will preempt jobs whenever there are insufficient resources. Add the #SBATCH --requeue parameter to your sbatch file so that the job is re-submitted automatically if it gets preempted.
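
As a rough sketch, a requeue-aware script could start like this. The --requeue directive comes from the note above, and SLURM_RESTART_COUNT is a standard Slurm environment variable; the messages and the idea of resuming from saved state are illustrative assumptions, not part of the template.

```shell
#!/bin/bash
#SBATCH --requeue

# Slurm sets SLURM_RESTART_COUNT once a job has been requeued;
# fall back to 0 when it is unset (first run, or outside Slurm).
RESTARTS="${SLURM_RESTART_COUNT:-0}"

if [ "$RESTARTS" -gt 0 ]; then
    echo "Job was requeued $RESTARTS time(s); resume from saved state if you have any"
else
    echo "First run of this job"
fi
```

A preempted job starts again from the beginning of the script, so long-running code benefits from saving its progress periodically.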

Submitting Python Script to Cluster

Prerequisites

Before following this guide, ensure you have:

  1. Logged into the cluster through SSH
  2. Copied your project files into the cluster
  3. Downloaded a copy of the shell script template below.

Download pip3 job submission shell script template

Getting your account quota information

  1. Log into the GPU cluster
  2. Execute the command "myinfo" in your terminal/powershell

    [IS000G3@origami ~]$ myinfo
    ================ Account parameters ================
    
    Description                 | Value
    ---------------------------------------------
    Account name                | is000
    List of Assigned Partition  | tester
    List of Assigned QOS        | is000qos
    ---------------------------------------------
    ... output truncated
    
  3. Copy down the values of

    • Account name (Line 6)
    • List of Assigned Partition (Line 7)
    • List of Assigned QOS (Line 8)
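
If you prefer not to copy the values by hand, they can be pulled out of the text with awk. This is only a sketch: it works on the sample lines shown above and assumes the real myinfo output keeps the same "Description | Value" layout. The helper name get_value is hypothetical.

```shell
# Sample lines copied from the myinfo output above. On the cluster you could
# feed the real output instead, e.g. sample=$(myinfo)
sample='Account name                | is000
List of Assigned Partition  | tester
List of Assigned QOS        | is000qos'

# Split each line on "|" and print the trimmed value whose description
# starts with the requested key.
get_value() {
    printf '%s\n' "$sample" | awk -F'|' -v key="$1" '
        index($1, key) == 1 { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

ACCOUNT=$(get_value "Account name")
PARTITION=$(get_value "List of Assigned Partition")
QOS=$(get_value "List of Assigned QOS")
echo "account=$ACCOUNT partition=$PARTITION qos=$QOS"
```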

Amending the template file

  1. Open the downloaded shell script template with your favourite editor
  2. Amend line 26 of the file and replace it with the partition value you copied down earlier

    # The partition you've been assigned
    #SBATCH --partition=tester
    
  3. Amend line 27 of the file and replace it with the account value you copied down earlier

    # The account you've been assigned
    #SBATCH --account=is000
    
  4. Amend line 28 of the file and replace it with the QOS value you copied down earlier

    #What is the QOS assigned to you? Check with myinfo command
    #SBATCH --qos=is000qos
    
  5. Amend line 29 of the file and replace it with your email address. (To enter multiple email addresses, separate with a comma)

    # Who should receive the email notifications
    #SBATCH --mail-user=exampleuser1@scis.smu.edu.sg,exampleuser2@scis.smu.edu.sg
    
  6. Amend line 30 and give your job a title/name

    # Give the job a name
    #SBATCH --job-name=YourName
    
  7. Select the right modules to load.

    If you only require Python, there is no need to amend this portion of the code.

    • If you are using TensorFlow, you should amend lines 39 and 40 as shown

      # Purge the environment, load the modules we require.
      # Refer to https://violet.smu.edu.sg/origami/module/ for more information
      module purge
      module load Python/3.11.7
      module load cuDNN/8.9.7.29-CUDA-12.3.2
      

    TensorFlow Projects

    Please refer to the TensorFlow section of the build config guide

    • If you are using PyTorch, you should amend lines 39 and 40 as shown

      # Purge the environment, load the modules we require.
      # Refer to https://violet.smu.edu.sg/origami/module/ for more information
      module purge
      module load Python/3.11.7
      module load CUDA/12.4.0
      

    PyTorch Projects

    Please refer to the PyTorch section of the build config guide

  8. Create a Python virtual environment to reduce package conflicts. This command should only be run once and will be ignored if you have an existing virtual environment.

    The command below creates a virtual environment called myenv in your home directory.

    # Create a virtual environment
    python3.11 -m venv ~/myenv
    
  9. The virtual environment must be activated every time, before any pip packages are installed.

    User with multiple projects

    If you have multiple projects, make sure you activate the right virtual environment.

    Tip

    The ~ refers to your home directory.

    # You will need to have a virtual environment created for this command to work.
    source ~/myenv/bin/activate
    
  10. Include the libraries/packages you need to install for your project using pip3

    Tip

    After the first successful run of the script, you may choose to remove these commands to speed up your job submission.

    # If you require any packages, install them as usual before the srun job submission.
    pip3 install numpy
    pip3 install scikit-learn
    
  11. Replace the template path with the Python script you would like the cluster to execute. An example is provided below:

    # execute your job with the srun command
    srun --gres=gpu:1 python3 <file path>/myScript.py
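
Before submitting, it can help to confirm that every directive now holds your own values. The sketch below builds a stand-in file from the example values used in the steps above, then prints each setting. The helper name sbatch_value is hypothetical; on the cluster you would point SCRIPT at your edited template instead of a temporary file.

```shell
# Stand-in for the edited template, filled with the example values above.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
#!/bin/bash
#SBATCH --partition=tester
#SBATCH --account=is000
#SBATCH --qos=is000qos
#SBATCH --mail-user=exampleuser1@scis.smu.edu.sg,exampleuser2@scis.smu.edu.sg
#SBATCH --job-name=YourName
EOF

# Print the value of one "#SBATCH --option=value" line.
sbatch_value() {
    sed -n "s/^#SBATCH --$1=//p" "$SCRIPT"
}

# Collect one "option=value" line per directive for a quick visual check.
SUMMARY=$(for opt in partition account qos mail-user job-name; do
    echo "$opt=$(sbatch_value "$opt")"
done)
echo "$SUMMARY"

rm -f "$SCRIPT"
```

An empty value after the equals sign means the corresponding line is missing or was not written in the --option=value form.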
    

Submitting the script

  1. Locate the uploaded script and give it executable permissions

    [IS000G3@origami ~]$ chmod +x sbatchTemplatePython.sh
    
  2. Submit the script to the cluster for processing

    [IS000G3@origami ~]$ sbatch sbatchTemplatePython.sh
    Submitted batch job 1334
    
  3. The logs/output for this command will appear in the same directory where the sbatch command was executed. The file name follows the format <USERNAME>.<JOBID>.out

    [IS000G3@origami ~]$ ls -lrt
    -rw-rw-r--. 1 IS000G3 IS000G3     8 Feb 16 15:05 IS000G3.1334.out
    
  4. To view the output, read the file with the cat command

    [IS000G3@origami ~]$ cat IS000G3.1334.out
    ...output redacted
    

You will receive an email notification when your job starts, ends, or fails.
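
The log file name can be derived from the sbatch reply rather than typed by hand. The following is a minimal sketch using the sample reply from step 2; the variable names are hypothetical, and the username is hard-coded for illustration.

```shell
# Sample reply from step 2. On the cluster you could capture the real one with:
#   SUBMIT_OUTPUT=$(sbatch sbatchTemplatePython.sh)
SUBMIT_OUTPUT="Submitted batch job 1334"

# The job ID is the last word of the reply.
JOB_ID="${SUBMIT_OUTPUT##* }"

# Hard-coded for illustration; on the cluster this is presumably your login name.
CLUSTER_USER="IS000G3"

# Log name follows <USERNAME>.<JOBID>.out as described in step 3.
LOG_FILE="${CLUSTER_USER}.${JOB_ID}.out"
echo "$LOG_FILE"

# On the cluster, follow the log while the job runs (Ctrl+C to stop):
#   tail -f "$LOG_FILE"
```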

Other useful commands

View the status of your jobs

Use the myqueue command

[IS000G3@origami ~]$ myqueue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1329    tester mlProje-  IS000G3  R    3:36:23      1 mustang
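
For scripting, the job IDs of running jobs can be filtered out of this listing. The sketch below works on the sample output above and assumes the real myqueue output keeps the same column order, with the state in the fifth column.

```shell
# Sample listing copied from above. On the cluster: queue=$(myqueue)
queue='JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1329    tester mlProje-  IS000G3  R    3:36:23      1 mustang'

# Skip the header row and keep the IDs of jobs whose state column is R (running).
RUNNING_IDS=$(printf '%s\n' "$queue" | awk 'NR > 1 && $5 == "R" { print $1 }')
echo "$RUNNING_IDS"
```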

View detailed information about a running/pending job

The myjob <jobid> command fetches information on jobs that are currently running or that completed within the last 5 minutes.

This command is useful for checking the resources that were allocated to the job.

Tip

You can use the myqueue command to get the job id of your existing jobs

[IS000G3@origami fyp]$ myjob 1329
JobId=1329 JobName=mlProjectFinalFYP
   UserId=IS000G3(1008) GroupId=IS000G3(1012) MCS_label=N/A
   Priority=4294901739 Nice=0 Account=is000 QOS=is000qos
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A
...redacted
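
Individual fields can be picked out of this Key=Value output with standard tools. This is a sketch over a few lines of the sample above; the helper name job_field is hypothetical, and it assumes values contain no spaces.

```shell
# A few lines copied from the sample above. On the cluster: job_info=$(myjob 1329)
job_info='JobId=1329 JobName=mlProjectFinalFYP
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A'

# Put each Key=Value pair on its own line, then print the value for one key.
job_field() {
    printf '%s\n' "$job_info" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

JOB_STATE=$(job_field JobState)
RUN_TIME=$(job_field RunTime)
TIME_LIMIT=$(job_field TimeLimit)
echo "state=$JOB_STATE runtime=$RUN_TIME limit=$TIME_LIMIT"
```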

View past jobs

The mypastjob <number of days> command will show a history of the jobs that were executed in the past N days. A user may only fetch up to 30 days of past jobs.

[IS000G3@origami fyp]$ mypastjob 2