
Job submission guide (conda)

Introduction

Warning

If you are not using conda permanently in your environment, do not execute conda init. Instead, after loading the Anaconda3 module with module load Anaconda3, execute the following command:

eval "$(conda shell.bash hook)"
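If you are unsure whether conda init has already been run on your account, you can look for the block it writes to your shell startup file. conda init delimits that block with the lines # >>> conda initialize >>> and # <<< conda initialize <<<. The sketch below assumes bash and ~/.bashrc; the function name has_conda_init_block is hypothetical.

```shell
# Hypothetical helper: succeeds if the given shell startup file contains
# the block that "conda init" writes, fails otherwise.
has_conda_init_block() {
  # $1 = startup file to inspect, e.g. ~/.bashrc
  [ -f "$1" ] && grep -q '^# >>> conda initialize >>>' "$1"
}

# Example use on the cluster:
#   has_conda_init_block ~/.bashrc && echo "conda init has modified ~/.bashrc"
```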

This guide is a quickstart for submitting your job to the cluster after you have transferred your code to the server.

If you have used Jupyter Notebook to develop your code, you will need to convert it into a Python file first (for example, jupyter nbconvert --to script mynotebook.ipynb writes mynotebook.py).

Note

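The researchlong queue will preempt jobs whenever there are insufficient resources. Add the #SBATCH --requeue directive to your sbatch file, alongside the other #SBATCH lines at the top, so that the job is requeued if it gets preempted. sbatch ignores any #SBATCH line that appears after the first command in the script.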
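Because sbatch stops reading #SBATCH directives at the first line that is neither a comment nor blank, a --requeue line placed after a command such as module purge is treated as an ordinary comment. As a sketch (the function name sbatch_header_has is hypothetical, and it only approximates how Slurm parses the script), you can check that an option really sits in the directive header:

```shell
# Hypothetical helper: succeeds if option $2 (e.g. --requeue or --partition)
# appears in an #SBATCH line before the first command in script $1.
sbatch_header_has() {
  awk -v opt="$2" '
    /^#SBATCH/ {
      # Match "--requeue" exactly, or "--partition=..." style options
      if ($2 == opt || index($2, opt "=") == 1) { found = 1; exit }
      next
    }
    /^[ \t]*#/ { next }   # other comments (including the shebang) are skipped
    /^[ \t]*$/ { next }   # blank lines do not end the header
    { exit }              # first command: sbatch stops reading directives here
    END { exit !found }
  ' "$1"
}

# Example use:
#   sbatch_header_has sbatchTemplatePython.sh --requeue || echo "--requeue is missing or placed too late"
```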

Submitting Python Script to Cluster

Prerequisites

Before following this guide, ensure you have:

  1. Logged into the cluster through SSH
  2. Copied your project files into the cluster
  3. Downloaded a copy of the shell script template below.

Download pip3 job submission shell script template

Getting your account quota information

  1. Log into the GPU cluster
  2. Execute the command "myinfo" in your terminal/powershell

    [IS000G3@origami ~]$ myinfo
    ================ Account parameters ================
    
    Description                 | Value
    ---------------------------------------------
    Account name                | is000
    List of Assigned Partition  | tester
    List of Assigned QOS        | is000qos
    ---------------------------------------------
    ... output truncated
    
  3. Copy down the values of

    • Account name (Line 6)
    • List of Assigned Partition (Line 7)
    • List of Assigned QOS (Line 8)
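If you prefer not to copy the values by hand, a small awk filter can pull them out of the myinfo output. This is a sketch that assumes the "Description | Value" table layout shown above; the function name myinfo_field is hypothetical, and the output format of myinfo may change.

```shell
# Hypothetical helper: print the value for one description row of the
# myinfo table read from stdin, e.g. "Account name" -> "is000".
myinfo_field() {
  awk -F'|' -v key="$1" '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    # Only rows with a "|" separator are table rows
    NF >= 2 && trim($1) == key { print trim($2); exit }
  '
}

# Example use on the cluster:
#   ACCOUNT="$(myinfo | myinfo_field 'Account name')"
#   PARTITION="$(myinfo | myinfo_field 'List of Assigned Partition')"
#   QOS="$(myinfo | myinfo_field 'List of Assigned QOS')"
```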

Amending the template file

  1. Open the downloaded shell script template with your favourite editor
  2. Amend line 26 of the file and replace it with the partition value you copied down earlier

    # The partition you've been assigned
    #SBATCH --partition=tester
    
  3. Amend line 27 of the file and replace it with the account value you copied down earlier

    # The account you've been assigned
    #SBATCH --account=is000
    
  4. Amend line 28 of the file and replace it with the QOS value you copied down earlier

    #What is the QOS assigned to you? Check with myinfo command
    #SBATCH --qos=is000qos
    
  5. Amend line 29 of the file and replace it with your email address. (To enter multiple email addresses, separate them with commas.)

    # Who should receive the email notifications
    #SBATCH --mail-user=exampleuser1@scis.smu.edu.sg,exampleuser2@scis.smu.edu.sg
    
  6. Amend line 30 and give your job a title/name

    # Give the job a name
    #SBATCH --job-name=YourName
    
  7. Load the Anaconda module and all other required modules

    # Purge the environment, load the modules we require.
    # Refer to https://violet.smu.edu.sg/origami/module/ for more information
    module purge
    module load Anaconda3/2022.05
    
  8. Create a conda virtual environment to reduce package conflicts. This command only needs to be run once and will be ignored if the virtual environment already exists.

    This command creates a virtualenv called myenvnamehere in conda

    # Create a virtual environment
    conda create -n myenvnamehere
    
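  9. The virtual environment must be activated every time, before conda packages are installed and before your script runs. If conda has not been initialised with conda init, run eval "$(conda shell.bash hook)" first (see the warning at the top of this guide).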

    Users with multiple projects

    If you have multiple projects, make sure you activate the right virtual environment for each project.

    # You will need to have a virtual environment created for this command to work.
    conda activate myenvnamehere
    
  10. Include the libraries/packages you need to install for your project using conda

    Tip

    After the first successful run of the script, you may remove these commands to speed up subsequent job submissions.

    # If you require any packages, install them before the srun job submission.
    # -y answers conda's confirmation prompt, which a batch job cannot respond to.
    conda install -y pytorch torchvision torchaudio -c pytorch
    
  11. Replace the template path with the path to the Python script you would like the cluster to execute. An example is provided below:

    # Submit your job to the cluster
    srun --gres=gpu:1 python3 <file path>/myScript.py
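Steps 2 to 6 can also be applied from the command line instead of an editor. The sketch below assumes each option already appears exactly once in the template as a line starting with #SBATCH --option=; the function name set_sbatch_option is hypothetical, and values containing |, & or backslashes would need escaping before being passed to sed.

```shell
# Hypothetical helper: rewrite the "#SBATCH --<option>=..." line in a script.
# Usage: set_sbatch_option <script> <option> <value>
set_sbatch_option() {
  # -i.bak edits in place and keeps the previous version as <script>.bak
  sed -i.bak "s|^#SBATCH --$2=.*|#SBATCH --$2=$3|" "$1"
}

# Example use with the values copied from myinfo:
#   set_sbatch_option sbatchTemplatePython.sh partition tester
#   set_sbatch_option sbatchTemplatePython.sh account is000
#   set_sbatch_option sbatchTemplatePython.sh qos is000qos
#   set_sbatch_option sbatchTemplatePython.sh mail-user exampleuser1@scis.smu.edu.sg
#   set_sbatch_option sbatchTemplatePython.sh job-name YourName
```

Slurm also accepts these options directly on the sbatch command line (for example sbatch --qos=is000qos sbatchTemplatePython.sh), and command-line options take precedence over the #SBATCH lines in the script.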
    

Submitting the script to the cluster

  1. Locate the uploaded script and give it executable permissions

    [IS000G3@origami ~]$ chmod +x sbatchTemplatePython.sh
    
  2. Submit the script to the cluster for processing

    [IS000G3@origami ~]$ sbatch sbatchTemplatePython.sh
    Submitted batch job 1334
    
  3. The log/output file for this job will appear in the directory where the sbatch command was executed. The file name follows the format <USERNAME>.<JOBID>.out

    [IS000G3@origami ~]$ ls -lrt
    -rw-rw-r--. 1 IS000G3 IS000G3     8 Feb 16 15:05 IS000G3.1334.out
    
  4. To view the output, open the file or print it with the cat command

    [IS000G3@origami ~]$ cat IS000G3.1334.out
    ...output redacted
    

You will receive an email notification when your job starts, completes, ends, or fails.
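The submission steps above can be chained in a script by capturing the job ID from sbatch's "Submitted batch job <id>" message. This sketch assumes the <USERNAME>.<JOBID>.out naming described in step 3 and that your cluster username matches $USER; the function names job_id_from_sbatch and log_file_for are hypothetical.

```shell
# Hypothetical helper: extract the job ID from sbatch's
# "Submitted batch job 1334" line on stdin (prints nothing on errors).
job_id_from_sbatch() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: build the log file name in the <USERNAME>.<JOBID>.out format.
log_file_for() {
  printf '%s.%s.out\n' "$1" "$2"
}

# Example use on the cluster (the log only appears once the job starts):
#   jobid="$(sbatch sbatchTemplatePython.sh | job_id_from_sbatch)"
#   cat "$(log_file_for "$USER" "$jobid")"
```

sbatch also has a --parsable option that prints only the job ID (followed by ;cluster on multi-cluster setups), which makes the sed step unnecessary.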

Other useful commands

Viewing the status of your jobs

Use the myqueue command

[IS000G3@origami ~]$ myqueue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
1329    tester mlProje-  IS000G3  R    3:36:23      1 mustang
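To feed job IDs from myqueue into other commands such as myjob, you can filter its table with awk. This sketch assumes the column layout shown above (JOBID first, ST fifth) and job names without spaces; the function name queued_job_ids is hypothetical.

```shell
# Hypothetical helper: print job IDs from myqueue output on stdin,
# optionally keeping only jobs in one state (R = running, PD = pending).
queued_job_ids() {
  awk -v state="${1:-}" '
    NR == 1 { next }                        # skip the header row
    state == "" || $5 == state { print $1 }
  '
}

# Example use on the cluster:
#   myqueue | queued_job_ids R
```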

View detailed information about a running/pending job

The myjob <jobid> command fetches information on jobs that are currently running or that completed within the last 5 minutes.

This command is useful for checking the resources that were allocated to the job.

Tip

You can use the myqueue command to get the job id of your existing jobs

[IS000G3@origami fyp]$ myjob 1329
JobId=1329 JobName=mlProjectFinalFYP
   UserId=IS000G3(1008) GroupId=IS000G3(1012) MCS_label=N/A
   Priority=4294901739 Nice=0 Account=is000 QOS=is000qos
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A
...redacted
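The myjob output is a list of Key=Value pairs, so individual fields can be extracted, for example to work out how much of the TimeLimit a job has left. This sketch assumes the layout shown above and durations in [D-]HH:MM:SS form; job_field and to_seconds are hypothetical names, and values such as UNLIMITED are not handled.

```shell
# Hypothetical helper: print the value of one Key=Value field (e.g. JobState)
# from myjob output on stdin.
job_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, key "=") == 1) { print substr($i, length(key) + 2); exit }
  }'
}

# Hypothetical helper: convert a [D-]HH:MM:SS, MM:SS or SS duration to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    d = 0
    if (index(t, "-") > 0) {            # optional "D-" day prefix
      d = substr(t, 1, index(t, "-") - 1)
      t = substr(t, index(t, "-") + 1)
    }
    n = split(t, p, ":")
    s = 0
    for (i = 1; i <= n; i++) s = s * 60 + p[i]
    print d * 86400 + s
  }'
}

# Example use on the cluster:
#   info="$(myjob 1329)"
#   run="$(printf '%s\n' "$info" | job_field RunTime)"
#   limit="$(printf '%s\n' "$info" | job_field TimeLimit)"
#   echo "seconds left: $(( $(to_seconds "$limit") - $(to_seconds "$run") ))"
```

With the RunTime=03:37:52 and TimeLimit=06:00:00 values shown above, this works out to 21600 - 13072 = 8528 seconds remaining.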

View past jobs

The mypastjob <number of days> command will show a history of the jobs that were executed in the past N days. A user may only fetch up to 30 days of past jobs.

[IS000G3@origami fyp]$ mypastjob 2