Storrs HPC Cluster
The Storrs HPC cluster currently comprises over 20,000 cores spread across 400 nodes, with two CPUs per node. The nodes include Intel Skylake and newer AMD Epyc CPUs (see the table below).
High-speed parallel storage is provided: 220TB of scratch storage and 3.65PB of persistent storage, including archive storage.
We have three types of GPU nodes available, with a total of 135 general-purpose GPUs, listed in the table below.
The Skylake nodes and newer run the Red Hat Enterprise Linux 8 (RHEL8) operating system; the remaining nodes will be upgraded to RHEL8 soon.
The Slurm scheduler manages jobs. Network traffic travels between nodes over Ethernet at 10 or 25Gb per second, and file data travels over InfiniBand at 100Gb or 200Gb per second, depending on the node. Each node is connected via the InfiniBand network to over 3.65PB of parallel storage, managed by WekaIO.
The Storrs HPC cluster is supported by three full-time staff and two or more student workers. Scientific applications are installed as needed; to date, over 200 have been made available.
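Jobs are submitted to Slurm as batch scripts. A minimal example might look like the following sketch; the partition name and the application being run are assumptions for illustration, not actual Storrs HPC names:

```shell
#!/bin/bash
#SBATCH --job-name=example        # job name shown in squeue
#SBATCH --partition=general       # hypothetical partition name
#SBATCH --nodes=1                 # request one node
#SBATCH --ntasks=36               # one task per core on a 36-core Skylake node
#SBATCH --time=02:00:00           # wall-clock limit (HH:MM:SS)
#SBATCH --mem=4G                  # memory for the job

# Run the application (replace with your own program)
srun ./my_program
```

Saved as `job.sh`, this would be submitted with `sbatch job.sh` and monitored with `squeue -u $USER`.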
Node Details
CPU | Cores per Node | Memory per Node | Number of Nodes | Total Cores | Network Speed | InfiniBand Speed
---|---|---|---|---|---|---
Intel Skylake | 36 | 192GB | 82 | 2952 | 10Gb | 100Gb
AMD Epyc | 64 | 256GB | 38 | 2432 | 10Gb | 200Gb
AMD Epyc | 128 | 512GB | 108 | 13824 | 25Gb | 200Gb
GPU Node Details
GPU Type | CPU Type | Number of Nodes | Cores per Node | Total Cores | GPUs per Node | Total GPUs
---|---|---|---|---|---|---
NVIDIA Tesla V100 | Skylake | 2 | 36 | 72 | 1 to 3 | 18
NVIDIA Tesla A100 | AMD Epyc | 16 | 64 | 1024 | 1 to 3 | 28
NVIDIA Tesla A30 | AMD Epyc | 16 | 64 | 1024 | 1 to 3 | 5
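GPUs are typically requested through Slurm's generic resource (GRES) mechanism. The commands below are a sketch; the partition name and the GRES type name `v100` are assumptions for illustration:

```shell
# Request one GPU of any type on a hypothetical GPU partition:
sbatch --partition=gpu --gres=gpu:1 job.sh

# Request a specific GPU model, if typed GRES names are configured
# (the type name 'v100' is an assumption):
sbatch --partition=gpu --gres=gpu:v100:1 job.sh
```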
The Condo Model
In the Condo Model, researchers who fund nodes get priority access to them. When a group's priority job queue is idle, however, unprivileged jobs may run on those nodes instead. Once started, an unprivileged job can run for up to twelve hours before it is stopped. A priority job could therefore wait up to twelve hours to start, but in practice most priority jobs wait less than an hour; and if priority users keep their job queue full, their jobs will not wait at all.
You can read more about the “Condo Model” on the HPC Knowledge Base.
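In practice, the two access modes can be illustrated with the following sketch; the partition names `priority` and `general` are hypothetical, not actual Storrs HPC names:

```shell
# Priority job on condo-funded nodes: not subject to the twelve-hour cap
# (partition name 'priority' is an assumption):
sbatch --partition=priority --time=48:00:00 job.sh

# Unprivileged job running opportunistically on idle condo nodes:
# limited to twelve hours once started
# (partition name 'general' is an assumption):
sbatch --partition=general --time=12:00:00 job.sh
```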
Last updated December 2, 2022