Reef User Guide
Table of Contents
- 1. Introduction
- 1.1. Document Scope and Assumptions
- 1.2. Policies to Review
- 1.2.1. Login Node Abuse Policy
- 1.2.2. Workspace Purge Policy
- 1.2.3. Scheduled Maintenance Policy
- 1.2.4. Archive Policy
- 1.3. Obtaining an Account
- 1.4. Requesting Assistance
- 2. System Configuration
- 2.1. System Summary
- 2.2. Processor
- 2.3. Memory
- 2.4. Operating System
- 3. Accessing the System
- 3.1. Kerberos
- 3.2. Logging In
- 3.3. File Transfers
- 4. User Environment
- 4.1. User Directories
- 4.1.1. Home Directory
- 4.1.2. Work Directory
- 4.1.3. Center Directory
- 4.2. Shells
- 4.3. Environment Variables
- 4.3.1. Common Environment Variables
- 4.3.2. Batch-Only Environment Variables
- 4.4. Modules
- 4.5. Archive Usage
- 4.5.1. Archive Commands
- 5. Program Development
- 5.1. Programming Models
- 5.1.1. Message Passing Interface (MPI)
- 5.2. Available Compilers
- 5.2.1. Intel Compilers
- 5.2.2. PGI Compilers
- 5.2.3. GNU Compilers
- 6. Batch Scheduling
- 6.1. Scheduler
- 6.2. Queue Information
- 6.3. Interactive Logins
- 6.4. Interactive Batch Sessions
- 6.5. Batch Request Submission
- 6.6. Batch Resource Directives
- 7. Software Resources
- 7.1. Application Software
- 7.2. Useful Utilities
- 7.3. Sample Code Repository
1. Introduction
1.1. Document Scope and Assumptions
This document provides an overview and introduction to the use of the Reef system located at the MHPCC DSRC, along with a description of the specific computing environment on Reef. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:
- Use of the UNIX operating system
- Use of an editor (e.g., vi or emacs)
- Remote usage of computer systems via network or modem access
- A selected programming language and its related tools and libraries
1.2. Policies to Review
Users are expected to be aware of the following policies for working on Reef.
1.2.1. Login Node Abuse Policy
Memory- or CPU-intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 2 GBytes of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.
1.2.2. Workspace Purge Policy
The /scratch/ directory is subject to a 60-day purge policy. A system "scrubber" monitors scratch space utilization, and if available space becomes low, files not accessed within 60 days are subject to removal, although files may remain longer if space permits. There are no exceptions to this policy.
1.2.3. Scheduled Maintenance Policy
The Maui High Performance Computing Center may reserve the entire system for regularly scheduled maintenance on the fourth Wednesday of every month from 8:00 a.m. to 10:00 p.m. (HST). The reservation is scheduled on the preceding Friday, and a committee convenes every Monday afternoon to determine whether the maintenance will be performed.
Additionally, the system may be down periodically for software and hardware upgrades at other times. Users are usually notified of such times in advance by "What's New" and by the login banner. Unscheduled downtimes are unusual but do occur. In such cases, notification to users may not be possible. If you cannot access the system during a non-scheduled downtime period, please send an email to or call the HPC Help Desk.
1.2.4. Archive Policy
MHPCC provides information on its website about best practices for using the Archive. Users who read or write thousands of files, or very large files, to the Archive adversely impact its performance for all users. A user who is negatively impacting the performance of the Archive will be notified and advised of how best to use it. If the user continues to adversely impact the Archive after being notified, the user's access to the Archive will be suspended until the user has agreed to follow best-use practices. Data stored on the Archive must be for legitimate projects or task orders. Users will be asked to remove data from the Archive that is not for a sanctioned project or task order. If the user does not remove the unacceptable data, it will be removed by the MHPCC storage administrator.
1.3. Obtaining an Account
The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. Once you have an active pIE User Account, visit the MHPCC accounts page for instructions on how to request accounts on the MHPCC DSRC HPC systems. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.
1.4. Requesting Assistance
The HPC Help Desk is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).
- Web: https://helpdesk.hpc.mil
- E-mail: help@helpdesk.hpc.mil
- Phone: 1-877-222-2039 or (937) 255-0679
- Fax: (937) 656-9538
You can contact the MHPCC DSRC directly in any of the following ways for support services not provided by the HPC Help Desk:
- Web: https://mhpcc.hpc.mil/user/help_form.html
- E-mail: help@helpdesk.hpc.mil
- Phone: (808) 879-5077
- Fax: (808) 879-5018
- U.S. Mail:
Maui High Performance Computing Center
550 Lipoa Parkway
Kihei, Maui, HI 96753
For more detailed contact information, please see our Contact Page.
2. System Configuration
2.1. System Summary
Reef is an Aspen Systems Linux cluster. The login and compute nodes are populated with Intel 2.5-GHz Cascade Lake processors. Reef uses Mellanox InfiniBand as its high-speed network for MPI messages and I/O traffic. Reef uses Lustre to manage its parallel file system, which targets DDN ES200NV NVMe SSD arrays. Reef has 8 CPU-only and 11 GPU compute nodes that share memory only on the node; memory is not shared across nodes. Each compute node has two 20-core processors running its own Red Hat 7 operating system and sharing 768 GBytes of DDR4 memory, with no user-accessible swap space. Reef has 109 TBytes (formatted) of disk storage.
Reef is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational work (e.g., high memory, heavy I/O, or long executions). All executions requiring large amounts of system resources must be sent to the compute nodes through batch job submission.
| | Login Nodes | Standard Memory Compute Nodes | Tesla V100 GPU Accelerated Compute Nodes | Quadro RTX 8000 GPU Accelerated Compute Nodes |
| --- | --- | --- | --- | --- |
| Total Cores / Nodes | 40 Cores / 1 Node | 200 Cores / 5 Nodes | 360 Cores / 9 Nodes | 80 Cores / 2 Nodes |
| Operating System | RHEL7 | RHEL7 | RHEL7 | RHEL7 |
| Cores/Node (OpenStack VMs) | 38 Cores | 38 Cores | 38 Cores + 2 GPUs (2 GPUs x 5120 Cores) | 38 Cores + 2 GPUs (2 GPUs x 4608 Cores) |
| Processor Type | Dual Intel Xeon Gold 6248 Cascade Lake (20 cores/socket) | Dual Intel Xeon Gold 6248 Cascade Lake (20 cores/socket) | Dual Intel Xeon Gold 6248 Cascade Lake (20 cores/socket) | Dual Intel Xeon Gold 6248 Cascade Lake (20 cores/socket) |
| Processor Speed | 2.5 GHz | 2.5 GHz | 2.5 GHz | 2.5 GHz |
| Memory/Node | 768 GB | 768 GB | 768 GB + 2 GPUs x 32 GB/GPU | 768 GB + 2 GPUs x 48 GB/GPU |
| Usable Memory/Node (OpenStack VMs) | 640 GB | 640 GB | 640 GB | 640 GB |
| Interconnect Type | Mellanox HDR100, EDR InfiniBand | Mellanox HDR100, EDR InfiniBand | Mellanox HDR100, EDR InfiniBand | Mellanox HDR100, EDR InfiniBand |

| Path | Capacity | Type | User Quota | Minimum File Retention Time |
| --- | --- | --- | --- | --- |
| /work1/scratch ($WORKDIR) | 138 TBytes | Lustre | None | 30 days |
| /work2/home ($HOME) | 109 TBytes | Ceph | 50 GBytes | None |
| /p/cwfs ($CENTER) | 753 TBytes | NFS | None | 120 days |
2.2. Processor
Reef uses 2.5GHz Intel Cascade Lake processors on its Openstack VM login node. There are 2 processors per Openstack VM node, each with 19 cores, for a total of 38 cores per node. In addition, these processors have 1.25 MiB of L1 cache, 20 MiB of L2 cache, and 27.5 MiB of L3 cache.
Reef's GPU nodes pair 2.5-GHz Intel Cascade Lake processors with two NVIDIA Tesla V100 or Quadro RTX 8000 GPUs. The main processors have 20 cores each, for a total of 40 cores per node. These processors have 1.25 MiB of L1 cache, 20 MiB of L2 cache, and 27.5 MiB of L3 cache. Each Tesla and RTX GPU has 5120 and 4608 CUDA cores, respectively, operating at 1.23 and 1.39 GHz, with 32 and 48 GBytes of memory.
2.3. Memory
Reef uses both shared and distributed memory models. Memory is shared among all the cores on a node, but is not shared among the nodes across the cluster.
Each OpenStack VM login node contains 630 GBytes of main memory. All memory and cores on the node are shared among all users who are logged in.
Each compute node also contains 630 GBytes of usable shared memory.
Each NVIDIA GPU accelerated compute node contains 630 GBytes of usable shared memory on the node, as well as 32 or 48 GBytes of shared memory internal to each accelerator depending on the accelerator.
2.4. Operating System
The operating system on Reef is Red Hat Enterprise Linux (RHEL) 7.
3. Accessing the System
3.1. Kerberos
For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect to Reef. To obtain a ticket you must either install a Kerberos client kit on your desktop to enable you to get a Kerberos ticket, or else connect via the HPC Portal (discussed in Section 3.2.). Visit HPC Centers: Authentication for information about installing Kerberos clients on your Windows, Linux, or Mac desktop. Instructions are also available on those pages for getting a ticket and logging into the HPC systems from each platform.
3.2. Logging In
The system host name for the Reef cluster is reef.mhpcc.hpc.mil, which redirects the user to reeflogin01.mhpcc.hpc.mil. Hostnames and IP addresses of these nodes are available upon request from the HPC Help Desk.
The preferred way to login to Reef is via ssh, as follows:
ssh -l username reef.mhpcc.hpc.mil
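Before running the ssh command above, you must hold a valid Kerberos ticket (see Section 3.1). Assuming your HPCMP Kerberos client kit provides the standard kinit and klist utilities (the exact commands depend on the kit you installed), the sequence might look like the following, with username as a placeholder:
% kinit username       # obtain a Kerberos ticket; prompts for your password/passcode
% klist                # verify that a valid ticket exists
% ssh -l username reef.mhpcc.hpc.mil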
3.3. File Transfers
File transfers to DSRC systems (except for those to the local archive system) must be performed using the following HPCMP Kerberized tools: scp, mpscp, sftp, kftp, scampi, or tube. Before using any of these tools (except tube) you must use a Kerberos client to obtain a Kerberos ticket. Information about installing and using a Kerberos client can be found at HPC Centers: Kerberos & Authentication.
The command below uses secure copy (scp) to copy a single local file into a destination directory on a Reef login node. The mpscp command is similar to the scp command but has a different underlying means of data transfer and may enable greater transfer rates. The mpscp command has the same syntax as scp.
% scp local_file user@reeflogin01.mhpcc.hpc.mil:/target_dir
Both scp and mpscp can be used to send multiple files. This command transfers all files with the .txt extension to the same destination directory.
% scp *.txt user@reeflogin01.mhpcc.hpc.mil:/target_dir
The example below uses the secure file transfer protocol (sftp) to connect to Reef, then uses sftp's cd and put commands to change to the destination directory and copy a local file there. The sftp quit command ends the sftp session. Use the sftp help command to see a list of all sftp commands.
% sftp user@reeflogin01.mhpcc.hpc.mil
sftp> cd target_dir
sftp> put local_file
sftp> quit
The Kerberized file transfer protocol (kftp) command differs from sftp in that your username is not specified on the command line, but given later when prompted. The kftp command may not be available in all environments.
% kftp reeflogin01.mhpcc.hpc.mil
username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit
Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.
4. User Environment
4.1. User Directories
The following user directories are provided for all users on Reef.
4.1.1. Home Directory
When you log in, you are placed in your home directory, /p/home/username. It is accessible from the login and compute nodes and can be referenced by the environment variable $HOME.
Your home directory is intended for storage of frequently-used files, scripts, and small utility programs. It has a 50-GByte quota, and files stored there are not subject to automatic deletion based on age. It is backed up weekly to enable file restoration in the event of catastrophic system failure.
Important! The home file system is not tuned for parallel I/O and does not support application-level I/O. Jobs performing file I/O in your home directory will perform poorly and cause problems for everyone on the system. Running jobs should use the work file system (/work1/scratch) for file I/O.
4.1.2. Work Directory
The work file system is a high-performance Lustre-based file system tuned for parallel application-level I/O, with a capacity of 109 TBytes. It is accessible from the login and compute nodes and provides temporary file storage for queued and running jobs.
All users have a work directory, /p/work/username, on this file system, which can be referenced by the environment variable, $WORKDIR. This directory should be used for all application file I/O. NEVER allow your jobs to perform file I/O in $HOME.
$WORKDIR has no quota. It is not backed up or exported to any other system and is subject to an automated deletion cycle. If available disk space gets too low, files that have not been accessed in 30 days may be deleted. If this happens, or if a catastrophic disk failure occurs, lost files are irretrievable. To prevent the loss of important files, transfer them to a long-term storage area, such as your archival directory ($ARCHIVE_HOME), which has no quota, or, for smaller files, your home directory ($HOME).
Maintaining the high performance of the Lustre file system is important for the efficient and effective use of Reef by all users. You are expected to take steps to ensure your file storage and access methods follow the suggested guidelines in the Lustre User Guide. Additional examples can be found in $SAMPLES_HOME/Data_Management/OST_Stripes on Reef.
To avoid errors that can arise from two jobs using the same scratch directory, a common technique is to include the following lines in your batch script, which create a unique output subdirectory within $WORKDIR for each job:
TMPD=${WORKDIR}/${SLURM_JOBID}
mkdir -p ${TMPD}
4.1.3. Center Directory
The Center-Wide File System (CWFS) is an NFS-mounted file system with a formatted capacity of 753 TBytes. It is accessible from the login nodes of all HPC systems at the center and from the HPC Portal. It provides centralized, shared storage that enables users to easily access data from multiple systems. The CWFS is not tuned for parallel I/O and does not support application-level I/O.
All users have a directory on the CWFS. The name of your directory may vary between machines and between centers, but the environment variable $CENTER will always refer to this directory.
$CENTER has a quota of 100 TBytes. It is not backed up or exported to any other system and is subject to an automated deletion cycle. If available disk space gets too low, files that have not been accessed in 120 days may be deleted. If this happens, or if a catastrophic disk failure occurs, lost files are irretrievable. To prevent the loss of important files, transfer them to a long-term storage area, such as your archival directory ($ARCHIVE_HOME), which has no quota, or, for smaller files, your home directory ($HOME).
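For example, a job's results can be copied from your work directory to the CWFS before the purge cycle removes them; the directory name below is a placeholder:
% cp -r ${WORKDIR}/my_results ${CENTER}/my_results    # my_results is a placeholder name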
4.2. Shells
The following shells are available on Reef: csh, bash, ksh, tcsh, sh, and zsh.
To change your default shell, log into the Portal to the Information Environment and go to "User Information Environment" > "View/Modify personal account information". Scroll down to "Preferred Shell" and select your desired default shell. Then scroll to the bottom and click "Save Changes". Your requested change should take effect within 24 hours.
4.3. Environment Variables
A number of environment variables are provided by default on all HPCMP high performance computing (HPC) systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems. The following environment variables are automatically set in your login environment:
4.3.1. Common Environment Variables
The following environment variables are common to both the login and batch environments:
| Variable | Description |
| --- | --- |
| $ARCHIVE_HOME | Your directory on the archive system. |
| $ARCHIVE_HOST | The host name of the archive system. |
| $BC_ACCELERATOR_NODE_CORES | The number of CPU cores per node for a compute node that features CPUs and a hosted accelerator processor. |
| $BC_BIGMEM_NODE_CORES | The number of cores per node for a big memory (BIGMEM) compute node. |
| $BC_CORES_PER_NODE | The number of CPU cores per node for the node type on which the variable is queried. |
| $BC_HOST | The generic (not node-specific) name of the system. Examples include centennial, mustang, onyx, and gaffney. |
| $BC_NODE_TYPE | The type of node on which the variable is queried. Values of $BC_NODE_TYPE will be: LOGIN, STANDARD, PHI, BIGMEM, BATCH, or ACCELERATOR. |
| $BC_PHI_NODE_CORES | The number of Phi cores per node, if the system has any Phi nodes. It will be set to 0 on systems without Phi nodes. |
| $BC_STANDARD_NODE_CORES | The number of CPU cores per node for a standard compute node. |
| $BCI_HOME | The location of the HPCMO common set of open-source utilities provided under the HPCMO Baseline Configuration Program. |
| $CC | The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded. |
| $CENTER | Your directory on the Center-Wide File System (CWFS). |
| $COST_HOME | The top-level directory for the Common Open Source Tools (COST) software packages. This variable has been deprecated in favor of $CSE_HOME. |
| $CSE_HOME | The top-level directory for the Computational Science Environment (CSE) tools and applications. |
| $CSI_HOME (TBD) | The directory containing the following list of heavily used application packages: ABAQUS, Accelrys, ANSYS, CFD++, Cobalt, EnSight, Fluent, GASP, Gaussian, LS-DYNA, MATLAB, and TotalView, formerly known as the Consolidated Software Initiative (CSI) list. Other application software may also be installed here by our staff. |
| $CXX | The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded. |
| $DAAC_HOME | The top-level directory for the DAAC (Data Analysis and Assessment Center) supported tools. |
| $F77 | The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded. |
| $F90 | The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded. |
| $HOME | Your home directory on the system. |
| $JAVA_HOME | The directory containing the default installation of JAVA. |
| $KRB5_HOME | The directory containing the Kerberos utilities. |
| $LOCALWORKDIR | A high-speed work directory that is local and unique to an individual node, if the node provides such space. |
| $PET_HOME | The directory containing tools installed by PET staff, which are considered experimental or under evaluation. Certain older packages have been migrated to $CSE_HOME, as appropriate. |
| $PROJECTS_HOME | The directory in which user-supported applications and codes may be installed. |
| $SAMPLES_HOME | A directory that contains the Sample Code Repository, a variety of sample codes and scripts provided by a center's staff. |
| $WORKDIR | Your work directory on the local temporary file system (i.e., local high-speed disk). |
4.3.2. Batch-Only Environment Variables
In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.
| Variable | Description |
| --- | --- |
| $BC_MEM_PER_NODE | The approximate maximum user-accessible memory per node (in integer MBytes) for the compute node on which a job is running. |
| $BC_MPI_TASKS_ALLOC | The number of MPI tasks allocated for a job. |
| $BC_NODE_ALLOC | The number of nodes allocated for a job. |
4.4. Modules
Software modules are a convenient way to set needed environment variables and include necessary directories in your path so that commands for particular applications can be found. Reef uses "modules" to initialize your environment with COTS application software, system commands and libraries, compiler suites, environment variables, and SLURM batch system commands.
A number of modules are loaded automatically as soon as you log in. To see the modules which are currently loaded, use the "module list" command. To see the entire list of available modules, use "module avail". You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
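For example, the following sequence lists your currently loaded modules, shows what is available, and swaps compiler suites. The module names shown are illustrative; the actual names and versions appear in the "module avail" output.
% module list                 # show currently loaded modules
% module avail                # show all available modules
% module unload pgi           # illustrative module name
% module load intel           # illustrative module name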
4.5. Archive Usage
All of our HPC systems have access to an online archival mass storage system that provides long-term storage for users' files on a petascale tape file system that resides on a robotic tape library system. A 48-TByte disk cache front-ends the tape file system and temporarily holds files while they are being transferred to or from tape.
Tape file systems have very slow access times. The tapes must be robotically pulled from the tape library, mounted in one of the limited number of tape drives, and wound into position for file archival or retrieval. For this reason, users should always tar up their small files into a single large tarball when archiving a significant number of files. Files larger than 8 TBytes will span more than one tape, which will greatly increase the time required for both archival and retrieval.
The environment variable $ARCHIVE_HOME is automatically set for you and can be used to reference your archive directory when using archive commands.
4.5.1. Archive Command Synopsis
A synopsis of the archive utility is listed below. For information on additional capabilities, see the Archive User Guide or read the online man page that is available on each system. This command is non-Kerberized and can be used in batch submission scripts if desired.
Copy one or more files from the archive system:
archive get [-C path] [-s] file1 [file2...]
List files and directory contents on the archive system:
archive ls [lsopts] [file/dir ...]
Create directories on the archive system:
archive mkdir [-C path] [-m mode] [-p] [-s] dir1 [dir2 ...]
Copy one or more files to the archive system:
archive put [-C path] [-D] [-s] file1 [file2 ...]
Move or rename files and directories on the archive server:
archive mv [-C path] [-s] file1 [file2 ...] target
Remove files and directories from the archive server:
archive rm [-C path] [-r] [-s] file1 [file2 ...]
Check and report the status of the archive server:
archive stat [-s]
Remove empty directories from the archive server:
archive rmdir [-C path] [-p] [-s] dir1 [dir2 ...]
Change permissions of files and directories on the archive server:
archive chmod [-C path] [-R] [-s] mode file1 [file2 ...]
Change the group of files and directories on the archive server:
archive chgrp [-C path] [-R] [-h] [-s] group file1 [file2 ...]
5. Program Development
5.1. Programming Models
Reef supports Message Passing Interface (MPI). MPI is an example of a message- or data-passing model.
5.1.1. Message Passing Interface (MPI)
Reef provides the IBM Spectrum MPI and OpenMPI library suites.
5.2. Available Compilers
Reef has three programming environment suites:
- Intel
- PGI
- GNU
The paths for the compilers are already set up for users through the use of "modules". The default modules loaded can be viewed by executing the command "module list".
To see which modules are available, execute the command "module avail".
To change your environment to a different module/compiler, first execute "module purge" to unload your current modules, then load the desired compiler module.
5.2.1. Intel Compilers
| Command | Purpose |
| --- | --- |
| icc | Intel C compiler |
| icpc | Intel C++ compiler |
| ifort | Intel Fortran compiler |
| mpiicc | Compiles and links MPI programs written in C |
| mpiicpc | Compiles and links MPI programs written in C++ |
| mpiifort | Compiles and links MPI programs written in Fortran |
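For example, a serial C code and an MPI Fortran code might be compiled with the Intel suite as follows; the source and executable names are placeholders:
% icc -O2 -o my_serial my_serial.c            # placeholder file names
% mpiifort -O2 -o my_mpi_app my_mpi_app.f90   # placeholder file names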
5.2.2. PGI Compilers
| Command | Purpose |
| --- | --- |
| pgcc | PGI C compiler |
| pg++ | PGI C++ compiler |
| pgfortran | PGI Fortran compiler |
| pgf90 | PGI Fortran 90 compiler |
| mpicc | Compiles and links MPI programs written in C |
| mpiCC | Compiles and links MPI programs written in C++ |
| mpif77 | Compiles and links MPI programs written in Fortran 77 |
| mpif90 | Compiles and links MPI programs written in Fortran 90 |
5.2.3. GNU Compiler Collection
| Command | Purpose |
| --- | --- |
| gcc | C compiler, found in /usr/bin |
| g++ | C++ compiler, found in /usr/bin |
| g77 | Fortran 77 compiler, found in /usr/bin |
| mpicc | Compiles and links MPI programs written in C |
| mpiCC | Compiles and links MPI programs written in C++ |
| mpif77 | Compiles and links MPI programs written in Fortran 77 |
NOTE: All MPI compilers are built for InfiniBand interconnect communication. We do not support slower Ethernet drivers.
Library paths:
/usr/lib
/usr/lib64
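Similarly, with the GNU suite loaded, an MPI C code could be compiled and linked against libraries in the system library paths shown above; the file names below are placeholders:
% mpicc -O2 -o my_mpi_app my_mpi_app.c -L/usr/lib64 -lm   # placeholder file names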
6. Batch Scheduling
6.1. Scheduler
The SLURM batch scheduler is currently running on Reef. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. SLURM is able to manage both single-processor and multiprocessor jobs.
6.2. Queue Information
The following table describes the SLURM queues available on Reef:
| Max Wall Clock Time | Max Jobs | Min Cores Per Job | Max Cores Per Job | Description |
| --- | --- | --- | --- | --- |
| None | N/A | 1 | 190 | 5 non-GPU compute nodes |
| None | None | 1 | 342 | 9 GPU compute nodes (dual Tesla V100) |
| None | None | 1 | 76 | 2 GPU compute nodes (dual Quadro RTX 8000) |
6.3. Interactive Logins
When you log in to Reef, you will be running in an interactive shell on a login node. The login nodes provide login access for Reef and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse Policy. The preferred method to run resource-intensive executions is to use an interactive batch session.
6.4. Interactive Batch Sessions
You can run an interactive job like this:
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
Your batch shell request will be placed in the interactive queue and scheduled for execution. This may take anywhere from a few minutes to a long time, depending on system load. Once your shell starts, you will be logged into the first of the compute nodes assigned to your interactive batch job. At this point, you can run or debug applications interactively, execute job scripts, or start executions on your assigned compute nodes. If X-Windows access is required, X11 forwarding must be requested for the session (for example, via srun's --x11 option, where supported); otherwise it can be omitted.
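If your interactive work requires a GPU node, a similar request can ask for GPU resources. The --gres syntax below is standard SLURM, but whether it is enabled on Reef, and the appropriate partition and GPU type names, are assumptions; consult the Reef SLURM User Guide or the HPC Help Desk:
srun --nodes=1 --ntasks-per-node=1 --gres=gpu:1 --time=01:00:00 --pty bash -i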
6.5. Batch Request Submission
SLURM batch jobs are submitted via the sbatch command. The format of this command is:
sbatch [ options ] sbatch_script_file
sbatch options may be specified on the command line or embedded in the batch script file by lines beginning with "#SBATCH".
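A minimal batch script sketch is shown below, assuming a simple MPI execution; the job name, node and task counts, time limit, and executable are placeholders you must adjust for your own work. It reuses the unique $WORKDIR subdirectory technique from Section 4.1.2.
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=38
#SBATCH --time=01:00:00

# run from a unique directory on the work file system
TMPD=${WORKDIR}/${SLURM_JOBID}
mkdir -p ${TMPD}
cd ${TMPD}

srun ./my_mpi_app   # placeholder executable
The script would then be submitted with "sbatch my_job.sh".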
For a more thorough discussion of SLURM Batch Submission, see the Reef SLURM User Guide.
6.6. Batch Resource Directives
A complete listing of batch resource directives is available in the Reef SLURM User Guide.
7. Software Resources
7.1. Application Software
A complete listing with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.
7.2. Useful Utilities
The following utilities are available on Reef:
| Utility | Description |
| --- | --- |
| check_license | Checks the status of HPCMP shared applications. |
| node_use | Displays the amount of free and used memory for login nodes. |
| qpeek | Displays spooled stdout and stderr for an executing batch job. |
| qview | Displays information about batch jobs and queues. |
| showq | A user-friendly, highly descriptive representation of the batch queue specific to Reef. |
| show_queues | Reports current batch queue status, usage, and limits. |
| showres | An informative command regarding reservations. |
| show_storage | Displays MSAS allocation and usage by subproject. |
| show_usage | Displays CPU allocation and usage by subproject. |
7.3. Sample Code Repository
The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area, and is automatically defined in your login environment. Below is a listing of the examples provided in the Sample Code Repository on Reef.