The University of Miami’s supercomputer, Pegasus, is a 350-node Lenovo cluster. Each node has two 8-core Intel Sandy Bridge E5-2670 (2.6 GHz) processors and 32 GB of 1600 MHz RAM (2 GB/core), for a total of over 160 TFlops. Connected by an FDR InfiniBand fabric, Pegasus was purpose-built for the style of data processing performed in biomedical research and analytics. In contrast with traditional supercomputers, where data flows over the slowest network available (Ethernet), Pegasus was built on the principle that data needs to be on the fastest fabric possible. By using the low-latency, high-bandwidth IB fabric for data, Pegasus is able to access all three storage tiers (SSD, 15K RPM SAS, 7.2K NL-SAS) at unprecedented speeds.
Unlike traditional HPC storage, the 150 TB /scratch filesystem is optimized for small random reads and writes, and can support over 125,000 sustained IOPS and 20 Gb/sec throughput at a 4 KB file size. Composed of over 500 15K RPM SAS disks, /scratch is ideal for the extremely demanding I/O requirements of biomedical workloads.
For instances where even /scratch is not fast enough, Pegasus now has access to over 8 TB of burst buffer space clocked at over 1,000,000 IOPS. This buffer space provides biomedical researchers a good place for large-file manipulation and transformation.
Along with the 350 nodes in the general processing queue, all researchers also have access to the 20 large-memory nodes in the bigmem queue. With access to the entire suite of software available on Pegasus, the bigmem queue provides large-memory access (256 GB) to researchers whose work cannot be parallelized. With 20 cores each, the bigmem servers provide an SMP-like environment well suited to biomedical research.
As many modern analysis tools require interaction, Pegasus has a unique feature of allowing ssh and graphical (GUI) access to programs using LSF. Tools ranging from Matlab to Knime and SAS to R are available to researchers in the interactive queue, with full-speed access to /scratch and the W.A.D.E. storage cloud.
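As a rough illustration of how work might be directed at the bigmem and interactive queues, the Python sketch below only assembles `bsub` command lines; it does not submit anything. The flags `-q`, `-n`, `-J`, and `-Is` are standard LSF `bsub` options, but the queue name `interactive`, the script name `analysis.R`, the job name, and the helper `build_bsub_command` are assumptions made for illustration, not documented Pegasus settings.

```python
import shlex


def build_bsub_command(queue, command, cores=1, interactive=False, job_name=None):
    """Return the argv list for an LSF `bsub` submission (nothing is executed)."""
    argv = ["bsub", "-q", queue, "-n", str(cores)]
    if job_name:
        argv += ["-J", job_name]  # human-readable job name
    if interactive:
        argv.append("-Is")  # interactive session with a pseudo-terminal
    argv += list(command)
    return argv


# Hypothetical large-memory batch job using all 20 cores of one bigmem node.
bigmem_job = build_bsub_command(
    "bigmem", ["Rscript", "analysis.R"], cores=20, job_name="bigmem_demo"
)

# Hypothetical interactive shell; the queue name "interactive" is assumed.
interactive_job = build_bsub_command("interactive", ["bash"], interactive=True)

print(shlex.join(bigmem_job))
print(shlex.join(interactive_job))
```

Building the argument list separately from running it makes the command easy to inspect before passing it to something like `subprocess.run` on a login node.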
- Five Racks of iDataPlex nodes in iDataPlex Racks
- One Standard Enterprise Rack for Networking and Management
- iDataPlex dx360 M4:
  - Qty (2) Intel Sandy Bridge E5-2670 (2.6 GHz)
  - 32 GB 1600 MHz RAM (2 GB/core)
  - Mellanox ConnectX-3 Single-Port FDR
- Mellanox FDR MSX6036
- DDN SFA12K:
  - Qty (12) 3 TB 7.2K RPM SATA (RAID 6 in 8+2)
  - Qty (360) 600 GB 15K SAS (RAID 6 in 8+2)
  - Qty (10) 400 GB MLC SSD (RAID 1 pairs)
- xCAT 2.7.x
- Platform LSF
- RHEL 6.2 for Login/Management Nodes
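The RAID layouts in the storage list above imply a usable capacity per tier that is smaller than the raw disk total. The following Python sketch works through that arithmetic using only the quantities and sizes listed. It assumes vendor-decimal gigabytes, counts only complete 8+2 groups, and ignores hot spares and filesystem overhead, so the figures are approximate upper bounds rather than measured capacities. The function names are hypothetical.

```python
def raid6_usable_gb(disk_count, disk_gb, data_disks=8, parity_disks=2):
    """Usable capacity of RAID 6 groups: only the data disks hold user data."""
    group_size = data_disks + parity_disks
    full_groups = disk_count // group_size  # leftover disks are not counted
    return full_groups * data_disks * disk_gb


def raid1_usable_gb(disk_count, disk_gb):
    """Usable capacity of mirrored pairs: one disk's worth per pair."""
    return (disk_count // 2) * disk_gb


sata_gb = raid6_usable_gb(12, 3000)   # 7.2K RPM SATA tier
sas_gb = raid6_usable_gb(360, 600)    # 15K RPM SAS tier
ssd_gb = raid1_usable_gb(10, 400)     # MLC SSD tier

for name, gb in [("SATA", sata_gb), ("SAS", sas_gb), ("SSD", ssd_gb)]:
    print(f"{name}: {gb / 1000:.1f} TB usable (approx.)")
```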
Pegasus’ CPU Workhorse — The IBM iDataPlex dx360M4
| Component | Specification |
|---|---|
| Compute Nodes | 350 dx360 M4 Compute Nodes |
| Processor | Two 8-core Intel Sandy Bridge 2.6 GHz scalar, 2.33 GHz* AVX |
| Memory | 32 GiB (2 GiB/core) using eight 4 GB 1600 MHz DDR3 DIMMs |
| Clustering Network | One FDR InfiniBand HCA |
| Management Network | Gigabit Ethernet NIC connected to the cluster management VLANs. IMM access shared through the eth0 port |
Examples of Projects Run on Pegasus
- Consortium for Advanced Research on Transport of Hydrocarbon in the Environment (CARTHE) Project | Project Leader: Tamay Ozgokmen
Fund Title: Consortium for Advanced Research on Transport of Hydrocarbon in the Environment
Fund Source: Gulf of Mexico Research Initiative
CPU Hours: 5,000,000 | CCS Focus Area: Climate & Environmental Hazards
Scientific Justification: Building on results from RFP-I, CARTHE remains focused on advancing fundamental understanding and modeling of the diverse physical mechanisms responsible for hydrocarbon transport in the Gulf of Mexico environment. An informed response to a future event like the Deepwater Horizon incident requires knowledge of the distribution of pollutants in the water column and the ability to predict where and how fast the pollutants will spread. This information is also crucial for estimating the pollutants’ impact on the local ecosystem and coastal communities. The overall goal of CARTHE is accurate predictive modeling of pollutant transport from ocean-bottom release to landfall on the beach. This project specifically identifies two topics whose understanding is critical for oil spill dispersion prediction: (i) the dynamics of transport in the near-surface ocean and lower atmosphere, and (ii) transport in deep-ocean plumes.
- Age-Related Macular Degeneration (AMD) | Project Leader: William Scott
Fund Title: Genomic Architecture of Progression and Treatment Response in AMD8
Fund Source: National Institutes of Health (NIH) funded project EY012118
CPU Hours: 1,000,000 | CCS Focus Area: Computational Biology & Bioinformatics
Scientific Justification: Age-related macular degeneration (AMD) is a significant health problem that affects millions of individuals and is the most common cause of severe vision loss among individuals over age 50 in the U.S. The influence of genetic variation on AMD is b and through the application of recent technological advances the genetic etiology of risk for AMD is being deconstructed. Independent studies, including our own, have identified and confirmed variations in multiple genes that affect risk of AMD, including CFH, HTRA1/ARMS2, C2/CFB, and C3. Variation in these genes explains a significant portion of the genetic risk for AMD, and ongoing studies are continuing to identify additional such genes. Also important are environmental risk factors such as smoking, hormone therapy, and diet, which contribute to AMD risk both independently and through their interactions with genes. Again, ongoing studies are teasing apart these contributions. However, risk is just one of the many facets of the overall genetic architecture of AMD. Disease progression and treatment response are two critical elements also influenced by genetic variation. The goal of this proposal is to increase our understanding of the genetic etiology of progression and treatment response in AMD, both of which have been understudied. Identifying the genes underlying clinical outcomes is directly relevant to better directing current treatments and to developing new and better treatments and regimens for those suffering from this disabling disorder.
- Mapes Lab | Project Leader: Brian Mapes
Fund Title: Understanding predictability and model error . . .
Fund Source: Office of Naval Research (ONR)
CPU Hours: 1,000,000 | CCS Focus Area: Climate & Environmental Hazards
Scientific Justification: Our global atmosphere models need to be kept on track by nudging their state toward observed state sequences through time. Those target state sequences are large 4D datasets that need to be consulted at runtime. At the same time, we need to output all the tendencies, including the nudging tendency, for diagnosis, which also requires storage space.
- Addiction Study| Project Leaders: Deborah Mash, CCS Fellow National Institute on Drug Abuse
CPU Hours: 50,000 | CCS Focus Area: Drug Discovery
Scientific Justification: Discovery science suggests that drug addiction and obesity are disorders in which the saliency of food or drug reward becomes exaggerated, and that they share some neurobiological overlaps. Several common neuropeptides and hormones, such as insulin and leptin, regulate the excessive consumption of both addictive drugs and palatable food; they are involved in the mesolimbic and nigrostriatal dopamine (DA) systems that regulate food and drug use. This study is focused on identifying common molecular targets for addiction and obesity for medication development.