
SuperMicro SuperServer MLA V100 (Vulcanite)
User Guide

Table of Contents

1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the SuperMicro SuperServer MLA V100 (Vulcanite) located at the ERDC DSRC, along with a description of the specific computing environment on Vulcanite. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network or modem access
  • A selected programming language and its related tools and libraries

1.2. Obtaining an Account

To get an account on Vulcanite, you must first submit a Vulcanite Project Proposal. You will also require an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." Once you have submitted your proposal, if you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.

1.3. Requesting Assistance

The ERDC DSRC HPC Service Center is available to help users with problems, issues, or questions. Analysts are on duty 8:00 a.m. - 5:00 p.m. Central, Monday - Friday (excluding Federal holidays).

To request assistance, contact the ERDC DSRC directly.

For more detailed contact information, please see our Contact Page.

2. System Configuration

2.1. System Summary

Vulcanite is an exploratory system intended to provide users with access to a variety of high-density GPU node configurations. Each node type has a different number of processors, amount of memory, number of GPUs, amount of SSD storage, and number of network interfaces. Because of this, users should take care when migrating between node types.

Node Configuration

Login Nodes
  • Total Nodes: 2
  • Operating System: RHEL 7
  • Cores/Node: 12
  • Core Type: Intel Xeon Gold 6126T (Skylake)
  • Core Speed: 2.6 GHz
  • Memory/Node: 192 GBytes DDR4-2666
  • Accessible Memory/Node: 8 GBytes
  • Interconnect Type: EDR InfiniBand
  • Local SSD on Node: 2 TBytes NVMe

Accelerator Nodes (2 GPUs)
  • Total Nodes: 26
  • Operating System: RHEL 7
  • Cores/Node: 12
  • Core Type: Intel Xeon Gold 6126T (Skylake, 12 cores)
  • Core Speed: 2.6 GHz
  • Memory/Node: 192 GBytes DDR4-2666
  • Accessible Memory/Node: 206 GBytes
  • Interconnect Type: 1x EDR InfiniBand
  • Local SSD on Node: 2 TBytes NVMe

Accelerator Nodes (4 GPUs)
  • Total Nodes: 8
  • Operating System: RHEL 7
  • Cores/Node: 24
  • Core Type: Dual Intel Xeon Gold 6136 (Skylake, 12 cores per socket)
  • Core Speed: 3.0 GHz
  • Memory/Node: 384 GBytes DDR4-2666
  • Accessible Memory/Node: 284 GBytes
  • Interconnect Type: 2x EDR InfiniBand
  • Local SSD on Node: 4 TBytes NVMe

Accelerator Nodes (8 GPUs)
  • Total Nodes: 5
  • Operating System: RHEL 7
  • Cores/Node: 48
  • Core Type: Dual Intel Xeon Platinum 8160 (Skylake, 24 cores per socket)
  • Core Speed: 2.1 GHz
  • Memory/Node: 768 GBytes DDR4-2666
  • Accessible Memory/Node: 764 GBytes
  • Interconnect Type: 4x EDR InfiniBand
  • Local SSD on Node: 8 TBytes NVMe

File Systems on Vulcanite
  Path          Capacity    Type
  /home         4 TBytes    SSD
  /gpfs/cwfs    3 PBytes    GPFS

2.2. Operating System

Vulcanite's operating system is RedHat Enterprise Linux 7.

2.3. File Systems

Vulcanite has the following file systems available for user storage:

2.3.1. /home

/home is a locally mounted SSD with an unformatted capacity of 4 TBytes. All users have a home directory located on this file system, which can be referenced by the environment variable $HOME. /home has a 30 GByte quota.

2.3.2. /gpfs/cwfs

The Center-Wide File System (CWFS) provides file storage that is accessible from all Vulcanite's nodes. The environment variable $CENTER refers to this directory.

2.3.3. /tmp

The /tmp directory gives users write access to the local SSD on each node. The size of the SSD depends on the configuration of the node (see the table above). Note: any files placed in /tmp will be removed when the batch job ends. Users should copy any necessary files from $CENTER to /tmp near the beginning of their batch scripts and copy any desired results from /tmp back to $CENTER before the end of their batch scripts.
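
For example, a minimal staging pattern inside a batch script might look like the following sketch (csh syntax, matching the sample script in section 5.6; the directory, file, and executable names are illustrative only):

cd /tmp
cp $CENTER/my_case/input.dat .        # stage input data onto the node-local SSD
./my_app.exe input.dat                # run against the fast local copy
cp results.out $CENTER/my_case/       # save results before the job ends and /tmp is cleared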

3. Accessing the System

3.1. Kerberos

A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with valid HPCMP Kerberos authentication can gain access to Vulcanite. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.

3.2. Logging In

The login nodes for the Vulcanite cluster are vulcanite01 and vulcanite02.

The preferred way to log in to Vulcanite is via ssh, as follows:

% ssh vulcanite01.erdc.hpc.mil

4. User Environment

4.1. Modules

A number of modules are loaded automatically as soon as you log in. To see the modules which are currently loaded, use the "module list" command. To see the entire list of available modules, use the "module avail" command. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
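
For example (the module name shown is illustrative only; check "module avail" for the actual module names on Vulcanite):

% module list                   # show the modules currently loaded
% module avail                  # show all available modules
% module load openmpi           # add a module to your environment (example name)
% module unload openmpi         # remove it again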

4.2. Archive Usage

Vulcanite does not have direct access to the MSAS archive server, but does have access to the CWFS.

4.3. Available Compilers

Vulcanite has the GNU and Intel compilers.

Vulcanite has several MPI suites:

  • OpenMPI (GCC)
  • MPICH (GCC)
  • MVAPICH2 (GCC)
  • IMPI (INTEL)
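
As a rough sketch, each suite provides its usual MPI compiler wrappers once the corresponding module is loaded (the source file names are illustrative, and the exact module names depend on your environment):

% mpicc -o mpijob.exe mpijob.c        # C, with OpenMPI, MPICH, or MVAPICH2 loaded
% mpif90 -o mpijob.exe mpijob.f90     # Fortran, with a GCC-based MPI suite loaded
% mpiifort -o mpijob.exe mpijob.f90   # Fortran, with Intel MPI (IMPI) loaded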

4.4. Programming Models

Vulcanite supports two base programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A Hybrid MPI/OpenMP programming model is also supported.
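
As a rough sketch, an OpenMP or hybrid build simply adds the OpenMP flag to the compile line (-fopenmp for the GNU compilers, -qopenmp for the Intel compilers; file names are illustrative):

% gcc -fopenmp -o omp_app.exe omp_app.c            # pure OpenMP
% mpicc -fopenmp -o hybrid_app.exe hybrid_app.c    # hybrid MPI/OpenMP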

5. Batch Scheduling

5.1. Scheduler

The Portable Batch System (PBS) is currently running on Vulcanite.

5.2. Queue Information

Vulcanite has only one queue, the standard queue. The maximum wall clock time is 168 hours.

5.3. Interactive Batch Sessions

To get an interactive batch session, you must first submit an interactive batch job through PBS. This is done by executing a qsub command with the "-I" option from within the interactive login environment. For example:

qsub -l select=N1:ncpus=12:mpiprocs=N2 -A Project_ID -q standard -l walltime=HHH:MM:SS -I

You must specify the number of nodes requested (N1), the number of processes per node (N2), the desired maximum walltime, your project ID, and a job queue.
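
For example, to request a single 2-GPU node for one hour of interactive use (the walltime shown is illustrative):

% qsub -l select=1:ncpus=12:mpiprocs=12:ngpus=2 -A Project_ID -q standard -l walltime=01:00:00 -I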

Your interactive batch session will be scheduled just as normal batch jobs are, so depending on the other jobs in the queue, it may take some time to start. Once your interactive batch shell starts, you can run or debug interactive applications, post-process data, etc.

At this point, you can run parallel applications on your assigned set of compute nodes. You can also run interactive commands or scripts on the node hosting your batch shell.

5.4. Batch Resource Directives

Batch resource directives allow you to specify to PBS how your batch jobs should be run and what resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.

The basic syntax of PBS directives is as follows:

#PBS option[[=]value]

where some options may require values to be included. For example, to start an 8-process job, you would request one 12-core node and specify that you will be running 8 processes per node:

#PBS -l select=1:ncpus=12:mpiprocs=8:ngpus=2

The following directives are required for all jobs:

Required PBS Directives

-A Project_ID
    Name of the project

-q queue_name
    Name of the queue

-l select=N1:ncpus=12:mpiprocs=N2:ngpus=2
    For 2-GPU nodes: N1 = number of nodes; N2 = MPI processes per node

-l select=N1:ncpus=24:mpiprocs=N2:ngpus=4
    For 4-GPU nodes: N1 = number of nodes; N2 = MPI processes per node

-l select=N1:ncpus=48:mpiprocs=N2:ngpus=8
    For 8-GPU nodes: N1 = number of nodes; N2 = MPI processes per node

-l walltime=HHH:MM:SS
    Maximum wall time

5.5. Launch Commands

To launch an MPI executable, use mpiexec. For example:

mpiexec -n #_of_MPI_tasks ./mpijob.exe

For OpenMP executables, no launch command is needed.
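
For example, an OpenMP or hybrid run typically sets the thread count first (csh syntax; the values and executable names are illustrative, and some MPI suites, such as OpenMPI, may require the variable to be explicitly exported to the remote ranks, e.g. with "-x OMP_NUM_THREADS"):

% setenv OMP_NUM_THREADS 12          # threads per process
% ./openmp_app.exe                   # pure OpenMP: run directly, no launch command needed
% mpiexec -n 2 ./hybrid_app.exe      # hybrid: 2 MPI tasks, each using 12 OpenMP threads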

5.6. Sample Script

While it is possible to include all PBS directives at the qsub command line, the preferred method is to embed the PBS directives within the batch request script using "#PBS". The following is a sample batch script:

#!/bin/csh

# Declare the project under which this job run will be charged. (required)
# Users can find eligible projects by typing "show_usage" on the command line.
#PBS -A Project_ID

# Request 1 hour of wallclock time for execution.
#PBS -l walltime=01:00:00

# Request nodes.
#PBS -l select=1:ncpus=12:mpiprocs=12:ngpus=2

# Submit job to standard queue.
#PBS -q standard

# Declare a jobname.
#PBS -N myjob

# Send standard output (stdout) and error (stderr) to the same file.
#PBS -j oe

# Change to the working directory.
cd $PBS_O_WORKDIR

# Execute a parallel program.
mpiexec -n 12 ./mpijob.exe

5.7. PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script

qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs

qdel: Used to kill queued or running jobs.
qdel PBS_JOBID