Reef Quick Start Guide
1. Introduction
This document provides a brief summary of information that you'll need to know to quickly get started working on Reef. For more detailed information, see the Reef User Guide.
2. Get a Kerberos Ticket
For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect to Reef. A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Information about installing Kerberos clients on your Windows desktop can be found at HPC Centers: Kerberos & Authentication.
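Once a Kerberos client is installed, obtaining and verifying a ticket typically looks like the following (the realm shown here is an assumption; use the principal and realm provided by your center):

```
% kinit user@HPCMP.HPC.MIL
% klist
```

The klist command displays your current tickets and their expiration times.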
3. Connect to Reef
Reef can be accessed via Kerberized ssh as follows:
% ssh user@reef.mhpcc.hpc.mil
% ssh user@reeflogin01.mhpcc.hpc.mil
4. Home, Working, and Center-wide Directories
Each user has file space in the $HOME and $WORKDIR directories. The $HOME and $WORKDIR environment variables are predefined for you and point to the appropriate locations in the file systems. You are strongly encouraged to use these variables in your scripts.
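As a sketch of using these variables instead of hard-coded paths (the project and file names are hypothetical, and the fallback value exists only so the snippet runs outside the cluster):

```shell
# Use $HOME and $WORKDIR rather than hard-coded paths in your scripts.
# Assumption: both are predefined at login; the fallback below is only
# so this sketch runs outside the cluster.
workdir="${WORKDIR:-/tmp/workdir-demo}"
project="$workdir/my_project"               # hypothetical project directory
mkdir -p "$project"
echo "sample input" > "$project/input.dat"  # hypothetical input file
ls "$project"
```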
NOTE: $WORKDIR is a "scratch" file system that is accessible to all center production machines. The $WORKDIR file system is not backed up. You are responsible for managing files in your $WORKDIR directories by backing up files to the archive system and deleting unneeded files. Currently, $WORKDIR files that have not been accessed in 30 days are subject to being purged.
If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, we will notify you via email 6 days prior to deletion.
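To see which of your files are approaching the 30-day limit, a find sketch like the following may help (the fallback directory exists only so the snippet runs outside the cluster):

```shell
# List files under $WORKDIR not accessed in 30+ days (purge candidates).
# Assumption: $WORKDIR is set at login; the fallback is for illustration only.
workdir="${WORKDIR:-/tmp/workdir-demo}"
mkdir -p "$workdir"
find "$workdir" -type f -atime +30 -print
```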
5. Transfer Files and Data to Reef
File transfers to DSRC systems must be performed using Kerberized versions of the following tools: scp, ftp, sftp, and mpscp. For example, the commands below use secure copy (scp) to copy a local file into a destination directory on a Reef login node.
% scp local_file reef.mhpcc.hpc.mil:/target_dir
% scp local_file user@reeflogin01.mhpcc.hpc.mil:/target_dir

For additional information on file transfers to and from Reef, see the File Transfers section of the Reef User Guide.
6. Submit Jobs to the Batch Queue
Slurm is the workload management system for Reef. To submit a batch job, use the following command:
sbatch [ options ] my_job_script

where my_job_script is the name of the file containing your batch script. For more information on using Slurm or on job scripts, see the Reef User Guide, the Reef Slurm Guide, or the sample script examples found in the $SAMPLES_HOME directory on Reef.
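A minimal job script might look like the following sketch (the partition, task count, and time limit are assumptions; adjust them to your workload and the queue table in the next section):

```shell
#!/bin/bash
#SBATCH --job-name=my_job        # name shown in squeue output
#SBATCH --partition=standard     # queue name (assumption; see the queue table)
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=4               # total tasks (assumption)
#SBATCH --time=01:00:00          # wall clock limit hh:mm:ss

# Commands below run on the compute node once the job starts.
cd "${SLURM_SUBMIT_DIR:-$PWD}"   # Slurm sets SLURM_SUBMIT_DIR to the sbatch directory
echo "Job running on $(hostname)"
```

Submit it with "sbatch my_job_script" and monitor its state with squeue.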
7. Batch Queues
The following table describes the Slurm queues available on Reef:
Queue Name | Max Wall Clock Time | Max Jobs | Min Cores Per Job | Max Cores Per Job | Description |
---|---|---|---|---|---|
standard | None | N/A | 1 | 190 | 5 non-GPU compute nodes |
tesla | None | None | 1 | 342 | 9 GPU compute nodes, dual Tesla V100 |
rtx | None | None | 1 | 76 | 2 GPU compute nodes, dual Quadro RTX 8000 |
8. Monitoring Your Job
You can monitor your batch jobs on Reef using the squeue command.
List your jobs
The squeue command lists all jobs in the queue. The "-u username" option shows only jobs owned by the given user, as follows:
% squeue -u username
% squeue -u username -t RUNNING
% squeue -u username -t PENDING

For example:

[smith@reeflogin01 IOR]$ squeue -u smith
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
567 standard ior-1x smith R 0:06 1 reefnode01
Notice that the output contains the JobID for each job. This ID can be used with the scontrol, sstat, sacct, and scancel commands.
Delete jobs
Delete What? | Command |
---|---|
A specific job | scancel jobID |
All of your jobs | scancel -u username |
All of your pending jobs | scancel -t PENDING -u username |
All of your jobs by jobname | scancel --name myJobName |
List detailed information for a job
% scontrol show jobid -dd jobid
% sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j jobid
After the job completes
% sacct -j jobid
9. Archiving Your Work
When your job is finished, you should archive any important data to prevent automatic deletion by the purge scripts.
Copy one or more files to the archive system
archive put [-C path ] [-D] [-s] file1 [file2 ...]
Copy one or more files from the archive system
archive get [-C path ] [-s] file1 [file2 ...]
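For example (the file and directory names here are hypothetical), archiving a tar file and retrieving it later might look like:

```
% archive put -C my_project results.tar
% archive get -C my_project results.tar
```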
For more information on archiving your files, see the Archive Guide.
10. Modules
Software modules are a very convenient way to set needed environment variables and include necessary directories in your path so that commands for particular applications can be found. Reef uses "modules" to initialize your environment with COTS application software, system commands and libraries, compiler suites, environment variables, and Slurm batch system commands.
A number of modules are loaded automatically as soon as you log in. To see the modules that are currently loaded, run "module list". To see the entire list of available modules, run "module avail". You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
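A typical module session looks like the following (the module name is hypothetical; use names reported by "module avail" on Reef):

```
% module list
% module avail
% module load gcc
% module unload gcc
```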
11. Available Software
A list of software on Reef is available on the software page.