Frequently Anticipated Questions

Why is the cluster unresponsive?

When the cluster is handling a high volume of I/O operations, it responds more slowly.

If you are in the middle of an ls or copy operation, it may take longer than usual to complete.

Why can't I log in?

Ensure that your system meets the following requirements:

ClearPass is installed and reports a Healthy status
You are connected to the WLAN-SMU Wi-Fi network
No external VPN services are enabled

To access the GPU cluster while outside the SMU network, connect to the SMU VPN (Cisco).

If you do not have a school VPN account, approach your instructor for more information.

When you log in to the cluster for the first time, you are prompted to change your password. At the "(current) UNIX password:" prompt, enter the initial password provided by your instructor, then choose a new one.

ssh bob@origami.smu.edu.sg
The authenticity of host 'origami.smu.edu.sg (10.0.104.102)' can't be established.
ED25519 key fingerprint is SHA256:bQkcHdAzNGjjvv4NheCRSh7jNudtCocl9z/cFk2Tudo.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'origami.smu.edu.sg' (ED25519) to the list of known hosts.
You are required to change your password immediately (root enforced)
Last login: Mon Dec 11 12:19:19 2025 from 192.168.4.15
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for user bob.
Changing password for bob.
(current) UNIX password:

I am unable to detect or load GPUs

A GPU is assigned only to scripts submitted through the job scheduler; a script run directly in your login session has no GPU. Refer to the Job submission guide for more information.
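To confirm from inside your script that a GPU was actually assigned, you can inspect the environment. The sketch below assumes the scheduler is Slurm (the squeue command used on this cluster is a Slurm tool) and that, as is common but site-dependent, Slurm exports SLURM_JOB_ID for every job and CUDA_VISIBLE_DEVICES when GPUs are allocated. The function name gpu_assignment_status is hypothetical, and the result is a heuristic rather than a definitive check.

```python
import os


def gpu_assignment_status(env=None):
    """Heuristically report whether this process runs in a job with GPUs.

    Assumes Slurm sets SLURM_JOB_ID inside jobs and CUDA_VISIBLE_DEVICES
    when GPUs are allocated (site configuration may differ).
    """
    env = os.environ if env is None else env
    in_job = "SLURM_JOB_ID" in env
    devices = env.get("CUDA_VISIBLE_DEVICES", "")
    # "0,1" -> ["0", "1"]; an empty or missing value -> []
    gpu_ids = [d.strip() for d in devices.split(",") if d.strip()]
    if not in_job:
        return "not in a scheduler job: submit the script through the job scheduler"
    if not gpu_ids:
        return "in a scheduler job, but no GPU was requested or assigned"
    return f"GPUs assigned: {', '.join(gpu_ids)}"


if __name__ == "__main__":
    print(gpu_assignment_status())
```

Calling this at the top of a training script makes it obvious whether the problem is the submission (no job, or no GPU requested) rather than the framework.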

Help!! My job is not running

A job may not run for several reasons. Run squeue --me to check the state of your job.

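If the state (the ST column) is PD (pending), your job is waiting in the queue, usually because the resources it requested are currently in use. The NODELIST(REASON) column shows why, for example Resources.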
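If you check job states often, a small wrapper can translate squeue's compact state codes. This is a minimal sketch: it uses the standard squeue options -h (no header) and -o with the format fields %i (job ID), %t (compact state) and %r (reason), and it assumes a Slurm version recent enough to support --me. The helper names parse_squeue and my_jobs are hypothetical.

```python
import subprocess

# Common compact Slurm job state codes, as shown in squeue's ST column.
STATE_NAMES = {
    "PD": "pending (waiting in the queue)",
    "R": "running",
    "CG": "completing",
    "CD": "completed",
    "F": "failed",
    "CA": "cancelled",
    "TO": "timed out",
}


def parse_squeue(output):
    """Parse lines of the form '<jobid> <state> <reason>' into dicts."""
    jobs = []
    for line in output.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        job_id, state = parts[0], parts[1]
        reason = parts[2] if len(parts) == 3 else ""
        jobs.append({
            "job_id": job_id,
            # Fall back to the raw code for states not listed above.
            "state": STATE_NAMES.get(state, state),
            "reason": reason,
        })
    return jobs


def my_jobs():
    """Run squeue for the current user and return the parsed jobs."""
    result = subprocess.run(
        ["squeue", "--me", "-h", "-o", "%i %t %r"],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    try:
        for job in my_jobs():
            print(job["job_id"], job["state"], job["reason"])
    except FileNotFoundError:
        print("squeue not found: run this on the cluster")
```

For example, a pending job waiting for GPUs would print something like its job ID, "pending (waiting in the queue)" and "Resources".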

Why does my job fail?

Work through the following questions to narrow down why a job fails:

Is it the only job failing on the cluster?
Are all file paths referenced in the Python script available on the cluster?
Have you installed the right Python libraries?
Did you make the template file executable?
Have the right modules been loaded for the libraries (e.g. TensorFlow or PyTorch)?
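Several of these checks can be automated before you submit. The sketch below checks that paths exist, that template files have the execute bit set (the usual fix is chmod +x), and that Python modules are importable in the current environment. The function name preflight and the file names in the usage block are hypothetical; it cannot check that the scheduler modules you load at job time match what you test interactively.

```python
import importlib.util
import os


def preflight(paths=(), executables=(), modules=()):
    """Return a list of problems found before submitting a job.

    paths:       files or directories the Python script reads or writes
    executables: job template files that must be executable
    modules:     top-level Python module names, e.g. "torch"
    """
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"missing path: {path}")
    for path in executables:
        # os.access is False for missing files as well as non-executable ones.
        if not os.access(path, os.X_OK):
            problems.append(f"not executable (try chmod +x): {path}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical names: replace with your own paths and libraries.
    for problem in preflight(
        paths=["data/train.csv"],
        executables=["job_template.sh"],
        modules=["torch"],
    ):
        print(problem)
```

An empty list means none of these common causes applies, which points toward the first question (whether other jobs are failing too).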

Which CUDA (toolkit) version do I use?

The crimson cluster uses NVIDIA RTX 3090 GPUs, which require CUDA 11.1 or higher (the first release supporting their compute capability, 8.6).
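To check whether a given CUDA version meets this requirement, compare version numbers numerically rather than as strings (as strings, "9.2" would sort after "11.1"). The sketch below assumes PyTorch for the usage part, where torch.version.cuda reports the CUDA version the installed build was compiled with (None for CPU-only builds); TensorFlow users would need a different check. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '11.8' or '12.1.105' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# CUDA 11.1 is the first release that supports compute capability 8.6 (RTX 3090).
MIN_CUDA_FOR_RTX_3090 = (11, 1)


def cuda_supports_rtx_3090(cuda_version):
    """True if the given CUDA version string is 11.1 or newer."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_RTX_3090


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        built_with = torch.version.cuda  # None for CPU-only builds
        if built_with is None:
            print("This PyTorch build has no CUDA support")
        elif cuda_supports_rtx_3090(built_with):
            print(f"CUDA {built_with}: OK for RTX 3090")
        else:
            print(f"CUDA {built_with}: too old for RTX 3090, need 11.1+")
```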

Unable to find my question here

Post your question on the GitHub forum using the following format:

Subject: [Account name] <Your issue>

Description of your issue

What happened?
What should happen instead?
Steps to reproduce
Screenshots, if any
The .out file, if available