Introduction
Libraries such as TensorFlow and PyTorch are each released against a tested set of dependencies. Using the specific versions recommended by the developers helps reduce GPU compatibility issues on the cluster.
TensorFlow
TensorFlow's Tested Build Configurations are listed in the official TensorFlow installation documentation.
For example, TensorFlow 2.17 has been tested against Python 3.9 to 3.12, cuDNN 8.9, and CUDA 12.3.
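One way to confirm that a job's environment matches the tested configuration is to check it from Python. The sketch below is illustrative rather than part of the cluster's official tooling. The helper names `python_in_tested_range` and `tensorflow_build_report` are hypothetical, and the tested range is the TensorFlow 2.17 one quoted above. It assumes a TensorFlow release that provides `tf.sysconfig.get_build_info()`, and it returns `None` if TensorFlow is not installed.

```python
import sys

# Tested Python range for TensorFlow 2.17, as quoted in the text above.
TESTED_PYTHON = ((3, 9), (3, 12))


def python_in_tested_range(version=None, tested=TESTED_PYTHON):
    """Return True if (major, minor) lies within the inclusive tested range."""
    if version is None:
        version = sys.version_info[:2]
    low, high = tested
    return low <= tuple(version[:2]) <= high


def tensorflow_build_report():
    """Collect version details, or return None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # These keys may be absent on CPU-only builds, hence .get().
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print("Python in tested range:", python_in_tested_range())
    print("TensorFlow build:", tensorflow_build_report())
```

If the reported CUDA or cuDNN version differs from the tested configuration for your TensorFlow release, or the GPU list is empty on a GPU node, the environment is worth revisiting before you submit longer jobs.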
If you are using TensorFlow version 2.15, insert the following lines into your sbatch template script after line 52:
PyTorch
Similarly, PyTorch has its own set of recommended build configurations.
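Once PyTorch is installed, the CUDA and cuDNN versions it was built against can be checked from Python. This is a minimal sketch and not an official cluster utility. The helper name `pytorch_build_report` is hypothetical, and it returns `None` if PyTorch is not installed.

```python
def pytorch_build_report():
    """Collect version details, or return None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # torch.version.cuda is None on CPU-only builds.
        "cuda": torch.version.cuda,
        # Returns None when cuDNN is not available.
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("PyTorch build:", pytorch_build_report())
```

A `cuda` value of `None`, or `gpu_available` being `False` on a GPU node, suggests that a CPU-only build was installed or that the CUDA libraries loaded in the job do not match the build.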
Use the pip3 command to install PyTorch:
Insert the following lines into your sbatch template script after line 52: