Working with Teton Computing Environment

October 29, 2021

Table of Contents

1. Teton Background and Documentation
2. Connecting Over SSH
   2.1 Quick Connect Using Shortcut
   2.2 Multiple Instances with Single Login
3. Login vs Compute Nodes
4. Module System on Teton
   4.1 How to Find a Module
   4.2 How to Load or Unload a Module
5. Submitting and Monitoring Jobs
   5.1 Job Submission Parameters
   5.2 Interactive Jobs
   5.3 SLURM sbatch Submission
   5.4 Monitoring the Jobs
6. SouthPass: An Alternate Gateway

1. Teton Background and Documentation

Teton is a high performance computing cluster administered by UW's Advanced Research Computing Center (ARCC). It has received investments from the University, from entities such as Wyoming INBRE and Wyoming EPSCoR, and from many individual research labs across campus. It features a large number of compute nodes, each with up to 32 processing units. Some compute nodes also contain Graphics Processing Units (GPUs), and other high-capacity nodes offer up to 1 TB of random access memory, suitable for jobs with very high memory requirements.

ARCC administers these compute nodes using the SLURM scheduling system. Because some nodes were funded by specific campus entities, those entities have priority access to them. When jobs are submitted, SLURM schedules them in order of priority on those nodes. The remaining nodes are available on a first come, first served basis.

Documentation for Teton is available in the form of a Wiki.



2. Connecting Over SSH

The Secure Shell (SSH) network protocol connects a client (you) to a host (the server) through a terminal session. Most systems authenticate clients with SSH keys (long cryptographic key pairs), but Teton does not use SSH keys. It instead uses two-factor authentication.

The simplest way to connect to Teton over SSH is as follows:


ssh USERNAME@teton.uwyo.edu

When the server receives this login request, it opens up the following display:


NOTICE TO USERS
=============================================================================
This is a University of Wyoming computer system, which may be accessed and
used only for authorized business by authorized personnel. Unauthorized
access or use of this computer system may subject violators to criminal,
civil, and/or administrative action. All information on this computer
system may be intercepted, recorded, read, copied, and disclosed by and to
authorized personnel for any purpose. Access or use of this computer system
by any person, whether authorized or unauthorized, constitutes consent to
these terms. There is no right of privacy in this system. Discontinue access
immediately if you do not agree with the conditions stated in this notice.
=============================================================================

                         TWO-FACTOR AUTHENTICATION
=============================================================================
This system requires two-factor authentication.

The password requirement is your UWYO domain password.

The token can be generated by your registered YubiKey or manually input with
the Duo mobile app. If you have questions about using this implementation of
two-factor authentication, contact the ARCC team at arcc-help@uwyo.edu

Please enter the two-factor password the in the form:

                            <password>,<token>

=============================================================================

Enter your UWYO login password, a comma, and the two-factor passcode from the Duo mobile app, then hit ENTER.

The system will log you in and present several pieces of information.


Last login: Thu Oct 21 14:34:35 2021 from ondemand-prod-pub.arcc.uwyo.edu
******************************************************************************
                      ________      _____                
                      ___  __/_____ __  /_______ _______ 
                      __  /   _  _ \_  __/_  __ \__  __ \
                      _  /    /  __// /_  / /_/ /_  / / /
                      /_/     \___/ \__/  \____/ /_/ /_/ 
******************************************************************************

Maintenance Scheduled:   Jan 5th, 2022 from 8am to 8pm
+----------------------------------------------------------------------------------+
|      *arccquota tool*       |          Block          |          File            |
+----------------------------------------------------------------------------------+
|            Path             |  Used      Limit      % |   Used      Limit      % |
+----------------------------------------------------------------------------------+
| /home/popgen                |   3.47g   25.00g  13.87 |    43.2K      0.0   0.00 |
| /gscratch/popgen            |   1.00t    5.00t  20.00 |    33.0K      0.0   0.00 |
+----------------------------------------------------------------------------------+
| /project/inbre-train        |   0.59t    1.00t  58.76 |     9.0K      0.0   0.00 |
|  `- popgen                  |   0.42t    0.00k   0.00 |     1.8K      0.0   0.00 |
+----------------------------------------------------------------------------------+



2.1 Quick Connect Using Shortcut

When you are working with Teton on a daily basis, it gets cumbersome to type out the entire login command every time. Use this simple shortcut to simplify your login.

Open up the SSH configuration file on your local workstation:


cd /Users/popgen/.ssh

vim config

Add the following code to the file, then save and close.


Host teton
User YOUR_USERNAME
Port 22
HostName teton.uwyo.edu

Now simply type the following to start the login process.


ssh teton



2.2 Multiple Instances with Single Login

When working on Teton, you will often find yourself opening multiple terminal windows for multi-tasking. But logging in separately for each of these windows is quite inefficient, and ARCC actually discourages it. Instead, you can piggyback multiple SSH instances on the original login. Here is how to do it. Reopen the ssh config file:


cd /Users/popgen/.ssh

vim config

Add the following code to the file:


Host teton
User YOUR_USERNAME
Port 22
HostName teton.uwyo.edu
ControlMaster auto
ControlPath ~/.ssh/ssh-%r@%h:%p

Then open a new tab in your terminal and simply type ssh teton. You will not be asked to re-authenticate.
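
With this configuration, the first ssh teton starts the master connection and later sessions ride on it. You can check whether a master connection is active, or shut it down, using OpenSSH's -O control commands:


ssh -O check teton

ssh -O exit teton

check reports the master's process ID if it is running; exit asks the master to close, which ends the shared connection.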




3. Login vs Compute Nodes

When you log in to Teton, you always end up at a login node. For example:


[popgen@tlog1 ~]$

tlog1 is one of the login nodes. It is mainly there as a landing point and for lightweight operations, such as copying files and folders or running other housekeeping commands that won't consume significant resources.

Checking things like job status, disk quota, or available nodes are all valid examples, as shown below. Running an R script that is memory-heavy and will take several minutes or hours to complete is not. Do not attempt to run commands like that on login nodes.
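
For instance, commands like these are fine to run on a login node (squeue and sinfo are standard SLURM utilities; arccquota is ARCC's disk usage tool from the login banner):


squeue -u popgen

sinfo

arccquota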

Compute nodes are the appropriate places to run such commands and jobs, since there you can leverage the necessary resources, such as processors and memory. We will look at how to submit jobs to compute nodes in the following sections.




4. Module System on Teton

A module is simply a piece of software made available to all users of the Teton HPC. The advantage of a module is that if the software has any dependencies, those have already been resolved by ARCC. In contrast, if you run software that is not yet available as a module on Teton, you are responsible for satisfying its dependencies yourself, which may not be a trivial task.



4.1 How to Find a Module

ARCC provides the command module spider for searching the existing module database.


module spider python

---------------------------------------------------------------------------------------------------------------------------------
  python:
---------------------------------------------------------------------------------------------------------------------------------
     Versions:
        python/2.7.5
        python/2.7.14
        python/2.7.15
        python/3.4.0
        python/3.6.3
        python/3.7.6
        python/3.8.7
     Other possible modules matches:
        biopython  py-biopython  python3-common

As you can see, there are multiple versions of Python available as independent modules. In order to find out what dependencies exist for a given version, you can search with the version number. For example:


module spider python/3.8.7

---------------------------------------------------------------------------------------------------------------------------------
  python: python/3.8.7
---------------------------------------------------------------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "python/3.8.7" module is available to load.

      swset/2018.05  gcc/7.3.0

Here, in order to load the module python/3.8.7, you first need to load the swset and gcc modules, which are its dependencies. In fact, these two modules must be loaded before most other modules on Teton.
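
Since module load processes its arguments in order, the dependencies and the target module can also be loaded with a single command:


module load swset/2018.05 gcc/7.3.0 python/3.8.7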



4.2 How to Load or Unload a Module

It’s very simple.


module load swset
module load gcc
module load python/3.8.7

How do we know that the modules were loaded in memory?


module list

Currently Loaded Modules:
  1) slurm/20.11       (S)   5) gcc/7.3.0       9) zlib/1.2.11    13) gettext/0.20.1  17) gdbm/1.18.1    21) python/3.8.7
  2) arcc/0.1          (S)   6) ncurses/6.2    10) libxml2/2.9.9  14) readline/8.0    18) libbsd/0.10.0
  3) singularity/2.5.2       7) libiconv/1.16  11) bzip2/1.0.8    15) openssl/1.1.1e  19) expat/2.2.9
  4) swset/2018.05           8) xz/5.2.4       12) tar/1.32       16) sqlite/3.30.1   20) libffi/3.2.1

  Where:
   S:  Module is Sticky, requires --force to unload or purge

The modules we loaded are numbers 4, 5, and 21. All the other modules are loaded automatically when you first log in to the system.

Unloading is the reverse of loading; just use the unload command.


module unload python

Remember that if you do not use the version number for a given module, the default version will be loaded.
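
If you want to clear everything you have loaded and start from a clean slate, Lmod also provides a purge command. Sticky modules such as slurm and arcc are kept unless you add --force:


module purge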




5. Submitting and Monitoring Jobs

There are two ways to submit jobs to compute nodes on Teton: interactively with salloc, or as batch scripts with sbatch.

For both types of jobs, you will need to understand some basic submission parameters, summarized in the next section.



5.1 Job Submission Parameters
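
These are the salloc/sbatch options used in the examples in this chapter:


--account           project account the job is charged against (here, popgen)
--time / -t         wall clock time limit, e.g. 1:00:00 for one hour
--nodes             number of compute nodes requested
--ntasks-per-node   number of tasks (cores) requested per node
--mem               memory requested per node, e.g. 5G
-J                  a short name for the job
--mail-type         when to send email notifications (e.g. NONE, END, FAIL)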



5.2 Interactive Jobs

The SLURM scheduling system ships with a utility called salloc, which obtains an interactive resource allocation. This program allows you to request access to compute resources on the fly.


salloc --account=popgen --nodes=1 --ntasks-per-node=16 --mem=5G --time=1:00:00

This command asks for an allocation of 1 node, 16 cores, and 5 GB of memory for one hour. Type this command in your terminal session, then watch what happens to your login prompt.


salloc: Granted job allocation 15429924

[popgen@m067] ~$

Notice the name of the node: m067. That’s a compute node. The clock is ticking. Let’s try to run a simple job.


module load gcc
module load swset
module load bwa


module list

bwa

echo "This job finished at $(date)"

bash helpbwa.sh

Now you know how interactive jobs run. One caveat is that SLURM will not generate a log file for these jobs. Log files can be quite useful for understanding why a job failed or did not do what you expected.
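
If you want a record of what an interactive command printed, one simple workaround is to redirect its output to a file yourself (the log file name below is just an example). When you are done, type exit to release the allocation and return to the login node.


bwa 2>&1 | tee bwa_help.log

exit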



5.3 SLURM sbatch Submission

This is the non-interactive, more robust way of running jobs. First, write a script that includes whatever sbatch parameters you need:


vim helpbwa2.sh

#!/bin/bash
#SBATCH -t 00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1G
#SBATCH -J help_bwa
#SBATCH --mail-type=NONE
#SBATCH --account=popgen


module load gcc swset bwa

echo "Printing help menu for bwa"
bwa

echo "Printing help menu for bwa aln algorithm"
bwa aln

echo "Printing help menu for bwa mem algorithm"
bwa mem


echo "This job finished at $(date)"

Then submit the job as follows:


sbatch helpbwa2.sh

You will notice that nothing is printed to the screen now. Instead, all the output goes to a log file that SLURM generates. If you look in the current directory, you will find a file that ends in .out.


ls -lh *.out

-rw-rw-r-- 1 popgen popgen 6.1K Oct 28 15:02 slurm-15430128.out

Check the contents of the file:


cat slurm-15430128.out
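
By default the log file is named slurm-<jobid>.out. If you prefer a more descriptive name, you can add an output directive to the script, where %j is replaced by the job ID, for example:


#SBATCH -o help_bwa_%j.out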



5.4 Monitoring the Jobs

SLURM has another utility which makes monitoring jobs easier: squeue.


squeue -u popgen

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

Currently no jobs are running, so nothing is listed here. A job will show up here whether it was submitted through sbatch or interactively using salloc.

Sometimes you want to check not just the status of a job but also the output it is producing. You can run the following command on the .out file to get real-time updates:


tail -f slurm-15430128.out

If the job is running, you will see new output printed to the screen as it is produced. Press Ctrl+C to exit the tail command.
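
Two other standard SLURM utilities are useful at this stage: scancel cancels a job that is misbehaving, and sacct shows accounting information (state, elapsed time, exit code) for a job even after it has finished. Using the job ID from above:


scancel 15430128

sacct -j 15430128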




6. SouthPass: An Alternate Gateway

The methods of interacting with Teton presented above are powerful, but not everyone is thrilled about using the command line all of the time. ARCC is currently testing a graphical interface to some of Teton's functionality through a system called SouthPass.

SouthPass can be accessed at southpass.arcc.uwyo.edu.