Job submission guide (pip3)
Introduction
This guide is a quickstart for submitting your job to the cluster after you have transferred your code onto the server.
If you developed your code in a Jupyter notebook, you will need to convert it into a Python file first.
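One common way to do the conversion is with `jupyter nbconvert`. This is a minimal sketch, assuming Jupyter is installed on the machine where the notebook lives; the notebook name `analysis.ipynb` is hypothetical.

```shell
NOTEBOOK="analysis.ipynb"            # hypothetical notebook name, substitute your own
SCRIPT="${NOTEBOOK%.ipynb}.py"       # nbconvert writes analysis.py for a Python notebook

if command -v jupyter >/dev/null 2>&1 && [ -f "$NOTEBOOK" ]; then
    # --to script exports the code cells as a plain script next to the notebook
    jupyter nbconvert --to script "$NOTEBOOK"
    echo "Converted $NOTEBOOK to $SCRIPT"
else
    echo "Skipped: jupyter or $NOTEBOOK not found" >&2
fi
```

Check the generated file before uploading it, since notebook-only syntax such as `%` magics or `!` shell lines may not run as plain Python.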
Note
The researchlong queue will preempt jobs whenever there are insufficient resources. Append the #SBATCH --requeue parameter to your sbatch file to re-submit the job if it gets preempted.
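As a sketch of where the parameter goes, the fragment below shows the top of an sbatch file with the requeue directive added. The partition line is an assumption based on the queue name above; the directives in your own template may be arranged differently.

```shell
#!/bin/bash
#SBATCH --partition=researchlong   # assumed: the preemptible queue named above
#SBATCH --requeue                  # put the job back in the queue if it is preempted
```

A requeued job runs its batch script again from the beginning, so long jobs benefit from saving intermediate results that the script can pick up on restart.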
Submitting Python Script to Cluster
Prerequisites
Before following this guide, ensure you have:
- Logged into the cluster through SSH
- Copied your project files into the cluster
- Downloaded a copy of the shell script template below.
Download pip3 job submission shell script template
Getting your account quota information
- Log into the GPU cluster
- Execute the command "myinfo" in your terminal/PowerShell
- Copy down the values of
  - Account name (Line 6)
  - List of Assigned Partition (Line 7)
  - List of Assigned QOS (Line 8)
Amending the template file
- Open the downloaded shell script template with your favourite editor
- Amend line 26 of the file and replace it with the partition value you copied down earlier
- Amend line 27 of the file and replace it with the account value you copied down earlier
- Amend line 28 of the file and replace it with the QOS value you copied down earlier
- Amend line 29 of the file and replace it with your email address. (To enter multiple email addresses, separate them with commas)
- Amend line 30 and give your job a title/name
- Select the right modules to load.
  If you only require Python, there is no need to amend this portion of the code.
  - If you are using TensorFlow, you should amend lines 39 and 40.
    TensorFlow Projects
    Please refer to the TensorFlow section of the build config guide.
  - If you are using PyTorch, you should amend lines 39 and 40.
    PyTorch Projects
    Please refer to the PyTorch section of the build config guide.
- This command creates a Python virtual environment to reduce package conflicts. It should only be run once and will be ignored if you have an existing virtual environment.
  This command creates a virtualenv called myenv in your home directory.
- The virtual environment must be activated every time, before pip packages are downloaded.
Users with multiple projects
If you have multiple projects, you need to activate the right virtual environment for each one.
Tip
The ~ refers to your home directory.
- Include the libraries/packages you need to install for your project using pip3
Tip
After the first successful run of the script, you may choose to remove these commands to speed up your job submission.
- Replace the template path with the path to the Python script you would like the cluster to execute.
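To show how the amended pieces fit together, here is a hypothetical sketch of a finished sbatch file. It is not the downloaded template: the line numbers mentioned in the steps above refer to the template, not to this sketch. The partition, account, QOS and job name are taken from the sample output later in this guide, while the email address, module name, package names and script path are placeholders chosen for illustration.

```shell
#!/bin/bash
# Hypothetical sketch only; keep the downloaded template as your starting point.

#SBATCH --partition=tester               # "List of Assigned Partition" from myinfo
#SBATCH --account=is000                  # "Account name" from myinfo
#SBATCH --qos=is000qos                   # "List of Assigned QOS" from myinfo
#SBATCH --mail-user=you@example.com      # placeholder; comma-separate multiple addresses
#SBATCH --mail-type=BEGIN,END,FAIL       # assumed notification events
#SBATCH --job-name=mlProjectFinalFYP     # job title/name
#SBATCH --output=%u.%j.out               # Slurm pattern for <USERNAME>.<JOBID>.out
#SBATCH --requeue                        # re-queue the job if it is preempted

# Module names are site-specific; "Python" is a hypothetical name.
# Keep the module lines from the template or from the build config guide.
module purge
module load Python

# Create the virtual environment once; skip this step if ~/myenv already exists.
# (Assumes the standard venv module; the template may use a different command.)
[ -d ~/myenv ] || python3 -m venv ~/myenv

# Activate the environment every time, before installing packages.
source ~/myenv/bin/activate

# Install what the project needs (package names are examples only).
pip3 install numpy pandas

# Run your script (hypothetical path).
python3 ~/myproject/main.py
```

With multiple projects, one option is a separate environment per project, for example `~/projectA-env` and `~/projectB-env` (hypothetical names), activating the matching one in each project's sbatch file.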
Submitting the script
- Locate the uploaded script and give it executable permissions
- Submit the script to the cluster for processing
- The logs/output for this command will appear in the same directory where the sbatch command was executed. The file name follows the format <USERNAME>.<JOBID>.out
- To show the results of the output, you can open and read the file with the cat command
You will receive an email notification when your job starts, ends, or fails.
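The submission steps above can be sketched as follows. The script name `myjob.sh` and the helper `log_file_for` are hypothetical; `sbatch --parsable` is a standard Slurm option that prints only the job ID, and this sketch assumes the cluster's sbatch supports it.

```shell
# Build the expected log file name: <USERNAME>.<JOBID>.out
log_file_for() {
    printf '%s.%s.out\n' "$1" "$2"
}

SCRIPT="myjob.sh"   # hypothetical name of your amended template

if command -v sbatch >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    chmod +x "$SCRIPT"                      # give the script executable permissions
    JOBID=$(sbatch --parsable "$SCRIPT")    # submit; prints the job ID
    JOBID=${JOBID%%;*}                      # drop a ";cluster" suffix if one is present
    echo "Submitted job $JOBID"
    echo "Read the log with: cat $(log_file_for "$USER" "$JOBID")"
else
    echo "sbatch or $SCRIPT not found: run this on the cluster, next to your script" >&2
fi
```

The log file only appears once the job has started writing output, so `cat` may report a missing file while the job is still pending.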
Other useful commands
View the status of your jobs
Use the myqueue command
[IS000G3@origami ~]$ myqueue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1329 tester mlProje- IS000G3 R 3:36:23 1 mustang
View detailed information about a running/pending job
The myjob <jobid> command fetches information on jobs that are currently running or that completed within the last 5 minutes.
This command is useful for checking the resources that were allocated to the job.
Tip
You can use the myqueue command to get the job ID of your existing jobs
[IS000G3@origami fyp]$ myjob 1329
JobId=1329 JobName=mlProjectFinalFYP
UserId=IS000G3(1008) GroupId=IS000G3(1012) MCS_label=N/A
Priority=4294901739 Nice=0 Account=is000 QOS=is000qos
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A
...redacted
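Because the output is a series of KEY=VALUE pairs, a single field can be pulled out with standard tools. The helper `job_field` below is a hypothetical sketch, not a cluster command; it is demonstrated on a few lines copied from the sample output above so that it runs anywhere.

```shell
# Print the value of one KEY=VALUE field from myjob-style output read on stdin.
job_field() {
    # Put each KEY=VALUE token on its own line, then print the value of the first match.
    tr -s ' ' '\n' | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Stand-in for real output; on the cluster you would pipe the command instead:
#   myjob 1329 | job_field JobState
sample='JobId=1329 JobName=mlProjectFinalFYP
JobState=RUNNING Reason=None Dependency=(null)
RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A'

printf '%s\n' "$sample" | job_field JobState     # prints RUNNING
printf '%s\n' "$sample" | job_field TimeLimit    # prints 06:00:00
```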
View past jobs
The mypastjob <number of days> command shows a history of the jobs that were executed in the past N days. A user may only fetch up to 30 days of past jobs.