October 29, 2021
Teton is a high-performance computing cluster administered by UW’s Advanced Research Computing Center (ARCC). It has received investments from the University, from entities such as Wyoming INBRE and Wyoming EPSCoR, and from many individual research labs across campus. It features a large number of compute nodes, each with up to 32 processing units. Some compute nodes also contain Graphics Processing Units (GPUs), while other very high-capacity nodes offer up to 1 TB of random access memory, suitable for jobs with very high memory requirements.
ARCC administers these compute nodes using the SLURM scheduling system. Because some nodes are the result of investments by entities on campus, those entities have priority access to them. When jobs are submitted, SLURM schedules them in order of priority on those nodes; the remaining nodes are available on a first-come, first-served basis.
Documentation for Teton is available in the form of a Wiki.
The Secure Shell (SSH) network protocol provides the functionality to connect a client (you) to a host (the server). The protocol is used through a terminal session. Most systems employ SSH keys (long cryptographic credentials that function like very strong passwords) to authenticate a client. Teton does not use SSH keys; it instead uses two-factor authentication.
The simplest way to connect to Teton over SSH is as follows:
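Assuming you replace YOUR_USERNAME with your UWYO username (teton.uwyo.edu is the same hostname used in the SSH configuration later on this page):

```bash
ssh YOUR_USERNAME@teton.uwyo.edu
```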
When the server receives this login request, it opens up the following display:
NOTICE TO USERS
=============================================================================
This is a University of Wyoming computer system, which may be accessed and
used only for authorized business by authorized personnel. Unauthorized
access or use of this computer system may subject violators to criminal,
civil, and/or administrative action. All information on this computer
system may be intercepted, recorded, read, copied, and disclosed by and to
authorized personnel for any purpose. Access or use of this computer system
by any person, whether authorized or unauthorized, constitutes consent to
these terms. There is no right of privacy in this system. Discontinue access
immediately if you do not agree with the conditions stated in this notice.
=============================================================================
TWO-FACTOR AUTHENTICATION
=============================================================================
This system requires two-factor authentication.
The password requirement is your UWYO domain password.
The token can be generated by your registered YubiKey or manually input with
the Duo mobile app. If you have questions about using this implementation of
two-factor authentication, contact the ARCC team at arcc-help@uwyo.edu
Please enter the two-factor password the in the form:
<password>,<token>
=============================================================================
Enter your UWYO login password, a comma, and the 2FA passcode from the Duo mobile app, then hit ENTER.
The system will log you in and present several pieces of information.
Last login: Thu Oct 21 14:34:35 2021 from ondemand-prod-pub.arcc.uwyo.edu
******************************************************************************
________ _____
___ __/_____ __ /_______ _______
__ / _ _ \_ __/_ __ \__ __ \
_ / / __// /_ / /_/ /_ / / /
/_/ \___/ \__/ \____/ /_/ /_/
******************************************************************************
Maintenance Scheduled: Jan 5th, 2022 from 8am to 8pm
+----------------------------------------------------------------------------------+
| *arccquota tool* | Block | File |
+----------------------------------------------------------------------------------+
| Path | Used Limit % | Used Limit % |
+----------------------------------------------------------------------------------+
| /home/popgen | 3.47g 25.00g 13.87 | 43.2K 0.0 0.00 |
| /gscratch/popgen | 1.00t 5.00t 20.00 | 33.0K 0.0 0.00 |
+----------------------------------------------------------------------------------+
| /project/inbre-train | 0.59t 1.00t 58.76 | 9.0K 0.0 0.00 |
| `- popgen | 0.42t 0.00k 0.00 | 1.8K 0.0 0.00 |
+----------------------------------------------------------------------------------+
When you are working with Teton on a daily basis, it gets cumbersome to type out the entire login command every time. Use this simple shortcut to simplify your login.
Open up the SSH configuration file on your local workstation:
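On Linux and macOS workstations the per-user SSH configuration file typically lives at ~/.ssh/config (create it if it does not exist); any text editor will do:

```bash
vim ~/.ssh/config
```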
Add the following code to the file, then save and close.
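A minimal entry along these lines should work; replace YOUR_USERNAME with your UWYO username (the same settings reappear in the multiplexing example below):

```
Host teton
User YOUR_USERNAME
Port 22
HostName teton.uwyo.edu
```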
Now simply type the following to start the login process.
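```bash
ssh teton
```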
When working on Teton, you will often find yourself opening multiple terminal windows for multitasking. But authenticating separately in each of these windows is quite inefficient, and ARCC actually discourages it. Instead, you can piggyback multiple SSH instances on the original login. Here is how to do it. Reopen the SSH config file:
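```bash
vim ~/.ssh/config
```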
Add the following code to the file:
Host teton
User YOUR_USERNAME
Port 22
HostName teton.uwyo.edu
ControlMaster auto
ControlPath ~/.ssh/ssh-%r@%h:%p
Then open a new tab in your terminal and simply type ssh teton. You will not be asked to re-authenticate.
When you log in to Teton, you always end up at a login node; tlog1, for example, is one of the login nodes. It is mainly there as a landing point and for performing lightweight operations, such as copying files and folders or running other housekeeping commands that will not consume significant resources.
Checking things like job status, disk quota, or available nodes are all valid examples. Running an R script that will take several minutes or hours to complete and is memory-heavy is not. Do not attempt to run commands like that on login nodes.
Compute nodes are the appropriate places to run such commands and jobs, because there you can leverage the necessary resources such as the number of processors and the amount of memory. We will look at how to submit jobs to compute nodes in the following sections.
A module is simply a piece of software that is available to all users of the Teton HPC. The advantage of having a module is that if the software has any dependencies, those have already been identified and satisfied by ARCC. In contrast, if you run software that is not yet available as a module on Teton, you are responsible for satisfying its dependencies, which may not be a trivial task.
ARCC provides a master command, module spider, for searching the existing module database.
module spider python
---------------------------------------------------------------------------------------------------------------------------------
python:
---------------------------------------------------------------------------------------------------------------------------------
Versions:
python/2.7.5
python/2.7.14
python/2.7.15
python/3.4.0
python/3.6.3
python/3.7.6
python/3.8.7
Other possible modules matches:
biopython py-biopython python3-common
As you can see, there are multiple versions of Python available as independent modules. In order to find out what dependencies exist for a given version, you can search with the version number. For example:
module spider python/3.8.7
---------------------------------------------------------------------------------------------------------------------------------
python: python/3.8.7
---------------------------------------------------------------------------------------------------------------------------------
You will need to load all module(s) on any one of the lines below before the "python/3.8.7" module is available to load.
swset/2018.05 gcc/7.3.0
Here, in order to load the module python/3.8.7, you first need to load the swset and gcc modules, which are its dependencies. In fact, these two dependencies must be loaded for most modules to run on Teton.
Loading a module is very simple.
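The dependencies go first on the line, matching the module spider output above (the versions shown are the ones listed there):

```bash
module load swset/2018.05 gcc/7.3.0 python/3.8.7
```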
How do we know that the modules were loaded in memory?
module list
Currently Loaded Modules:
1) slurm/20.11 (S) 5) gcc/7.3.0 9) zlib/1.2.11 13) gettext/0.20.1 17) gdbm/1.18.1 21) python/3.8.7
2) arcc/0.1 (S) 6) ncurses/6.2 10) libxml2/2.9.9 14) readline/8.0 18) libbsd/0.10.0
3) singularity/2.5.2 7) libiconv/1.16 11) bzip2/1.0.8 15) openssl/1.1.1e 19) expat/2.2.9
4) swset/2018.05 8) xz/5.2.4 12) tar/1.32 16) sqlite/3.30.1 20) libffi/3.2.1
Where:
S: Module is Sticky, requires --force to unload or purge
The modules we loaded are numbers 4, 5, and 21. All other modules are usually loaded automatically when you first log in to the system.
Unloading is the reverse of loading; just use the module unload command.
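For example, to unload the Python module loaded earlier:

```bash
module unload python/3.8.7
```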
Remember that if you do not use the version number for a given module, the default version will be loaded.
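For instance, the following would load whichever Python version ARCC has marked as the default, which may not be the version you want:

```bash
module load python
```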
There are two ways to submit jobs to compute nodes on Teton:
- Interactive resource allocation (salloc)
- SLURM sbatch submission
For both types of jobs, you will need to understand some basic parameters:
- --time= or -t: Sets the amount of time you think the job will likely take. This is a hard time limit; your job will be killed if it does not finish in this time slot. Be judicious about choosing the value of this parameter.
- --nodes=: Number of compute nodes to engage.
- --ntasks-per-node=: Number of cores to engage from each of the selected compute nodes.
- --mem=: Amount of memory to be allocated to the job.
- -J: Name of the job. Only 8 characters are allowed, without any spaces.
- --account=: Your project name. You must provide a value for this parameter or you will not be able to submit jobs.
- --mail-type=: If you want email notifications about your job, set this parameter to ALL; otherwise, set it to NONE.
- --mail-user=: Only needed if you set the previous parameter to ALL. If you did, provide your complete email address.
The SLURM scheduling system ships with a utility called salloc, which allows you to request an allocation of resources on the fly.
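A request along these lines would do it (the --account value popgen matches the sbatch script later on this page; substitute your own project name):

```bash
salloc --account=popgen --time=1:00:00 --nodes=1 --ntasks-per-node=16 --mem=5G
```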
This command is asking for allocation of 1 node, 16 cores, 5G of memory for one hour of time. Type this command in your terminal session. Then check what happens to your login prompt.
Notice the name of the node: m067. That’s a compute node. The clock is ticking. Let’s try to run a simple job.
Write a small script named helpbwa.sh. Contents of the script follow:
# load bwa and its dependencies
module load gcc
module load swset
module load bwa
# confirm the loaded modules, then print the bwa help menu
module list
bwa
echo "This job finished at $(date)"
Now you know how interactive jobs run. One caveat with them is that SLURM will not generate a log file for these jobs. Log files can be quite useful in understanding why a job failed or when it did not do what you expected.
sbatch Submission

This is the non-lazy way of running jobs. First, write a script that includes the various sbatch parameters as necessary:
vim helpbwa2.sh
```bash
#!/bin/bash
#SBATCH -t 00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1G
#SBATCH -J help_bwa
#SBATCH --mail-type=NONE
#SBATCH --account=popgen
module load gcc swset bwa
echo "Printing help menu for bwa"
bwa
echo "Printing help menu for bwa aln algorithm"
bwa aln
echo "Printing help menu for bwa mem algorithm"
bwa mem
echo "This job finished at $(date)"
Then submit the job as follows:
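Using the script written above:

```bash
sbatch helpbwa2.sh
```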
You will notice that nothing is now printed to the screen. Instead, all the output goes to a log file that SLURM generates. If you search the current directory, you will find a file whose name ends in .out.
Check the contents of the file:
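By default, SLURM names this log file slurm-<jobid>.out, so something like the following should work:

```bash
cat slurm-*.out    # or name the specific slurm-<jobid>.out file
```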
SLURM has another utility that makes monitoring jobs easier: squeue.
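To see only your own jobs, you can filter by user (substitute your username):

```bash
squeue -u YOUR_USERNAME
```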
Currently no jobs are running so nothing is listed here. But the job will show up whether it was submitted through sbatch or interactively using salloc.
Sometimes you want to check not just the status of the job but the output it is producing. You can run the following command on the .out file to get real-time updates on it.
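Again assuming the default slurm-<jobid>.out naming:

```bash
tail -f slurm-*.out    # or the specific slurm-<jobid>.out file
```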
If the job is running, you will see new output printed on the screen. Press Ctrl+c to stop the tail command.
The methods of interacting with Teton presented above are powerful, but not everyone is thrilled to be using the command line all the time. ARCC is currently testing a graphical interface to some of Teton’s functionality through a system called SouthPass.
SouthPass can be accessed at southpass.arcc.uwyo.edu.