Last updated on Mar 19, 2019.
These instructions assume Linux/Mac OS and Python 3.x. They do not cover how to run jobs on GPUs.
Contact me (hshan at g dot harvard dot edu) for suggestions, corrections and comments.
ssh [your RC username]@login.rc.fas.harvard.edu
You will be prompted to enter your RC account password and a “verification code”, the six-digit code from the Duo Mobile app used for two-step verification.
To run Python scripts, you need to create a conda environment with all the packages your script uses. To do this, use a command like this
conda create -n [env_name] python=3.6 numpy scipy
Here, I created an environment called [env_name], using Python 3.6, and requested the additional packages numpy and scipy.
You need to do this even if your script already contains the corresponding import statements; this is like installing the packages on your own computer. Note that the packages you can install this way are fairly basic ones. For less common packages (e.g. pytorch), don’t install them at this step.
Once you finish this step, type
source activate [env_name]
Now you can install additional packages as you normally would on your own computer. For example, you can use
conda install pytorch torchvision -c pytorch
to install pytorch.
With all the packages your script needs installed, the environment is now ready.
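As a quick sanity check, you can try importing the packages from inside the activated environment; the package list below just mirrors the examples above (note that pytorch is imported as torch), so adjust it to what you actually installed.
python -c "import numpy, scipy, torch; print(numpy.__version__, torch.__version__)"
If this prints version numbers without an ImportError, the environment is set up correctly.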
The way to run performance-intensive code on Odyssey is to submit it with a batch file. On your computer, create a plain text file with this content:
#!/bin/bash
#SBATCH -p general
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --mem 64000
#SBATCH -t 0-6:00
module load python
source activate [env_name]
python C3v_cluster.py
Let’s go through the lines one by one.
-p is the partition you submit the job to.
-N is the number of nodes you request. In general it should be 1.
-n is the number of CPU cores you request.
--mem is the amount of memory you request, in MB.
-t is the amount of runtime you request.
module load python and source activate [env_name] load Python and activate the environment you created, and the last line runs your script.
Upload the text file (see above) and the Python script to your directory on the cluster. Then simply run
sbatch [text_file_name]
and your job should be submitted.
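If you want to check on the job after submitting, you can use the standard Slurm commands; the username and job ID below are placeholders.
squeue -u [your RC username]
scancel [job ID]
squeue lists your pending and running jobs (including the job number), and scancel removes a job you no longer want.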
Outputs of your job (e.g. print() outputs in Python) are written to a file named slurm-1234567.out, where the number is the number of your job. A note on module load python: you can also load a specific version, e.g. module load python/3.6.3-fasrc02.
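For example, to see which Python modules are available on the cluster, and to read the output file once the job has run, you can use commands like these (the job number is just the example above)
module avail python
cat slurm-1234567.out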
Running the same script many times with different parameter values is a commonly encountered scenario when running simulations. On Odyssey this involves three pieces: a loop in the terminal that submits one job per parameter value, a batch file that passes the parameters to your script, and a Python script that reads the parameters from the command line.
Let’s set these up one by one, in reverse order.
At the beginning of your Python script, do this
import sys
args = sys.argv # pull the command-line arguments
What does it do? When you run a Python script in the terminal, typically you would use something like
python script.py
It turns out that you can add arguments to it. For example, you can do
python script.py 1 2 3
If script.py includes the Python code we mentioned above, then args would be a list with entries
['script.py', '1', '2', '3']
This way, you can pull parameters from the command line by converting entries of this list to integers, floats, etc.
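For example, a script expecting an integer and a float as its two parameters could start like this; the parameter names n_steps and temperature are just illustrative.
import sys

args = sys.argv                  # e.g. ['script.py', '100', '2.5'] for: python script.py 100 2.5
n_steps = int(args[1])           # first argument, converted to an integer
temperature = float(args[2])     # second argument, converted to a float
print(n_steps, temperature)      # echo the values; use them in your simulation instead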
You need to create a plain text file; here we call it [sbatch_file_name]. Inside, type
#!/bin/bash
# [sbatch_file_name]
#
#SBATCH -p general
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem 64000
#SBATCH -t 0-24:00
# load Python and activate your environment, as before
module load python
source activate [env_name]
python script.py ${PARAM1} ${PARAM2}
# Batch control file
This is the batch file that gets submitted (with the sbatch command, as before); each submitted job runs script.py with arguments ${PARAM1} and ${PARAM2}. You could add more parameters. Values for these parameters will be assigned by the next part.
When you have uploaded your Python script and the special batch file, run this in the terminal
for PARAM1 in $(seq 1 5); do
    #
    echo "${PARAM1}"
    export PARAM1
    #
    sbatch -o job_no_${PARAM1}.stdout.txt \
        --job-name=something_${PARAM1} \
        [sbatch_file_name]
    #
    sleep 1
done
With this code, we create a for loop where the value of PARAM1 goes from 1 to 5. For each value, we submit the special batch file [sbatch_file_name], name the job something_${PARAM1}, and ask Slurm to name the output file job_no_${PARAM1}.stdout.txt.
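The batch file above also references ${PARAM2}, which this loop does not set. A minimal sketch of a nested loop that exports both parameters (the ranges are just examples) would be
for PARAM1 in $(seq 1 5); do
    for PARAM2 in $(seq 1 3); do
        export PARAM1 PARAM2
        sbatch -o job_no_${PARAM1}_${PARAM2}.stdout.txt \
            --job-name=something_${PARAM1}_${PARAM2} \
            [sbatch_file_name]
        sleep 1
    done
done
This submits one job per combination of PARAM1 and PARAM2, and each job runs script.py with the two values as its command-line arguments.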