HSIM Cluster – Mount
Overview
The cluster consists of four Dell PowerEdge T620 servers, a Dell 16-port 1Gbps switch, and a Synology NAS with 32TB of raw SATA III storage in a RAID-10 configuration (16TB effective storage). Each server has two 8-core, dual-threaded 2.6GHz processors (up to 3.3GHz with Intel Turbo Boost 2.0) that support the 256-bit AVX instruction set extension (8 FLOPs per cycle per core), 64GB of 1600MHz RDIMMs, 4x1Gbps NICs, a PERC H710 1GB RAID controller, one 4TB 7.2K RPM 6Gbps SAS drive, and one 400GB SLC 6Gbps SSD. Both drives are configured as single-drive RAID-0 volumes to allow for future expansion. The cluster's theoretical peak performance is 1.331-1.6896 TFLOP/s.
Processor Specifics
Each E5-2670 processor has:
- 8 cores at 2.6GHz (up to 3.3GHz with Intel Turbo Boost 2.0)
- 2 threads each for a total of 16 threads
- 32 nm manufacturing process
- 256-bit AVX instruction set extension (8 FLOPs per cycle)
- 115 W TDP, 0.60V-1.35V
- 40x PCI-e 3.0 lanes integrated into the processor
- 2x QPI links
- 32KB L1 cache per core
- 256KB L2 cache per core
- 20MB shared L3 cache
- 4 channels of DDR3 1600MHz RAM (quad-channel)
- 51.2 GB/s max memory bandwidth
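The theoretical peak memory bandwidth of one processor follows from the channel count and transfer rate in the list above. A minimal sketch, assuming each DDR3 channel has a 64-bit (8-byte) data bus, which is standard for DDR3 but not stated in the list; the function name is illustrative:

```python
def peak_memory_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal, 1 GB = 1e9 bytes)."""
    # Assumption: each DDR3 channel moves 8 bytes (64 bits) per transfer.
    bytes_per_second = channels * mega_transfers_per_s * 1e6 * bytes_per_transfer
    return bytes_per_second / 1e9


# Quad-channel DDR3-1600, as listed for the E5-2670.
per_cpu = peak_memory_bandwidth_gbs(channels=4, mega_transfers_per_s=1600)
print(f"{per_cpu:.1f} GB/s per processor")
```

This is a ceiling set by the memory interface; sustained bandwidth measured by real workloads is lower.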
Dual processor setup:
- “Sandy Bridge” microarchitecture
- Dual 8.0GT/s QPI link between processors
- Dedicated quad-channel memory controllers per CPU
- Integrated I/O controller per CPU, each providing 40 lanes of PCI-e Gen 3.0 expansion
- I/O Hub to connect to storage, legacy PCI bus, USB, and more
Estimated TFLOP/s:
- General – 2.6GHz
- Processor = 8(cores)*2.6G(clock)*8(FLOPs/cycle) = 166.4 GFLOP/s
- Server = 2*Processor = 332.8 GFLOP/s
- Cluster = 4*server = 1,331.2 GFLOP/s = 1.331 TFLOP/s
- With turbo boost 2.0 – 3.3GHz
- Processor = 8(cores)*3.3G(clock)*8(FLOPs/cycle) = 211.2 GFLOP/s
- Server = 2*Processor = 422.4 GFLOP/s
- Cluster = 4*server = 1,689.6 GFLOP/s = 1.6896 TFLOP/s
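The estimates above all apply the same formula: cores × clock × FLOPs per cycle, scaled by processors per server and servers per cluster. A minimal sketch of that arithmetic, using only the figures listed above; the function and parameter names are illustrative:

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int,
                sockets: int = 1, nodes: int = 1) -> float:
    """Theoretical peak in GFLOP/s: cores * GHz * FLOPs/cycle * sockets * nodes."""
    return cores * clock_ghz * flops_per_cycle * sockets * nodes


# Four servers, two 8-core AVX processors each.
base = peak_gflops(cores=8, clock_ghz=2.6, flops_per_cycle=8, sockets=2, nodes=4)
turbo = peak_gflops(cores=8, clock_ghz=3.3, flops_per_cycle=8, sockets=2, nodes=4)

print(f"Base clock:  {base:,.1f} GFLOP/s ({base / 1000:.4f} TFLOP/s)")
print(f"Turbo clock: {turbo:,.1f} GFLOP/s ({turbo / 1000:.4f} TFLOP/s)")
```

The turbo figure should be read as an upper bound: 3.3GHz is the maximum turbo frequency, and the clock sustained with all cores under AVX load is typically lower.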
Additional Hardware
Dell 2816 Switch:
- 16 1Gb ports
NAS RAID Array:
- Synology DiskStation 8-bay (DS1812+)
- 32TB of maximum storage
- 8x4TB WD RE Enterprise HDs 7.2K RPM, SATA III, 64MB cache
- RAID-10 configuration, for a total of 16TB of mirrored storage
- 3GB 1333MHz RAM
- Dual 1Gb NIC
- 4 USB 2.0, 2 USB 3.0
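The 16TB usable figure for the NAS follows from RAID-10 storing every block on a mirrored pair. A minimal sketch of the capacity arithmetic, assuming two-way mirrors and ignoring file system overhead; the function name is illustrative:

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID-10 array built from two-way mirrors."""
    # RAID-10 stripes across mirrored pairs, so it needs an even number
    # of drives and at least two pairs.
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    # Half of the raw capacity holds mirror copies.
    return drive_count * drive_tb / 2


raw_tb = 8 * 4                      # 8 bays of 4TB drives
usable_tb = raid10_usable_tb(8, 4)  # half of the raw capacity
print(f"raw: {raw_tb}TB, usable: {usable_tb:.0f}TB")
```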
Cluster Summary
Overall cluster:
- 64 2.6-3.3GHz cores (128 threads)
- 256GB RAM (max 64GB/machine)
- 16TB SAS 6Gbps (max 4TB/machine)
- 1600GB SSD 6Gbps (max 400GB/machine)
- 16TB SATA III RAID-10 NAS storage
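The summary figures are per-server values multiplied by the number of servers. A minimal sketch that reproduces them from the per-server specification given in the Overview; the `Node` class and `cluster_totals` function are illustrative names, not part of any cluster tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Per-server hardware, taken from the Overview."""
    sockets: int = 2
    cores_per_socket: int = 8
    threads_per_core: int = 2
    ram_gb: int = 64
    sas_tb: int = 4
    ssd_gb: int = 400


def cluster_totals(node: Node, node_count: int) -> dict:
    """Aggregate per-server resources across identical servers."""
    cores = node.sockets * node.cores_per_socket * node_count
    return {
        "cores": cores,
        "threads": cores * node.threads_per_core,
        "ram_gb": node.ram_gb * node_count,
        "sas_tb": node.sas_tb * node_count,
        "ssd_gb": node.ssd_gb * node_count,
    }


totals = cluster_totals(Node(), node_count=4)
for resource, amount in totals.items():
    print(f"{resource}: {amount}")
```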
Software
- Rocks Cluster 6.1 – CentOS 6.3
- Area51 Roll – file integrity software
- Bio Roll – bioinformatics tools
- HMMER, NCBI BLAST, MpiBLAST, Biopython, ClustalW, MrBayes, T_Coffee, Emboss, Phylip, fasta, Glimmer, TIGR Assembler, bioperl, bioperl-ext, bioperl-run, bioperl-db
- Condor Roll – high throughput computing
- Ganglia Roll – cluster monitoring
- HPC Roll – high performance computing
- OpenMPI (with Java support), OpenMP, MPICH2
- PVM – Parallel Virtual Machine
- Java Roll – various java packages
- Eclipse, Java JDK, JBoss, Maven, Tomcat
- Perl Roll – perl
- Python Roll – python
- Sun Grid Engine Roll – scheduler
- Torque Roll – scheduler
- Web-server Roll – web server
- Xen Roll – installing and configuring virtual machines
- ZFS-Linux Roll – the ZFS file system
Other HPC Systems on Campus
SGI Prism – Coffee
- Visualization engine
- 8 single-threaded Itanium 2 1.5GHz processors (64-bit instruction set)
- 16GB of memory
- 380GB of storage
- Estimated TFLOP/s
- Processor = 1 (core)*1.5G (clock)*2(FLOPs/cycle) = 3 GFLOP/s
- Server = 8*Processor = 24 GFLOP/s = 0.024 TFLOP/s
SGI Origin 350 – Zeus
- 32 single-threaded MIPS R16000A 800MHz processors (RISC) (64-bit instruction set)
- 32GB of memory
- 2TB of storage
- Max allowed usage per the job scheduler: 8 CPUs for up to 72 hours, 4 CPUs for up to 120 hours
- Estimated TFLOP/s
- Processor = 1 (core)*0.8G (clock)*2(FLOPs/cycle) = 1.6 GFLOP/s
- Cluster = 32*Processor = 51.2 GFLOP/s = 0.0512 TFLOP/s
SGI Altix 4700 – Jasta
- 64 dual-core, single-threaded Montecito 1.6GHz processors (9000 series, 64-bit instruction set)
- 128GB of shared memory
- 14TB of storage
- Same job scheduler limitations
- Estimated TFLOP/s
- Processor = 2 (cores)*1.6G (clock)*2(FLOPs/cycle) = 6.4 GFLOP/s
- Cluster = 64*Processor = 409.6 GFLOP/s = 0.4096 TFLOP/s
Mercury – Dell Xeon Cluster
- 16-node cluster, each node with two 10-core Xeon E5-2670v2 2.5-3.3GHz processors (320 total cores, 256-bit AVX instruction set)
- 32-core head node
- 128GB per node (2.0TB total pooled)
- 1TB storage per node / 180TB total including disk array
- OS: RedHat
- Compilers: Intel and Portland
- Scheduler: PBS Torque
- Additional: Intel MKL and MPI
- Unknown allocation max per job
- Estimated TFLOP/s
- General – 2.5GHz
- Processor = 10(cores)*2.5G(clock)*8(FLOPs/cycle) = 200 GFLOP/s
- Server = 2*Processor = 400 GFLOP/s
- Cluster = 16*server = 6,400 GFLOP/s = 6.4 TFLOP/s
- With turbo boost 2.0 – 3.3GHz
- Processor = 10(cores)*3.3G(clock)*8(FLOPs/cycle) = 264 GFLOP/s
- Server = 2*Processor = 528 GFLOP/s
- Cluster = 16*server = 8,448 GFLOP/s = 8.448 TFLOP/s
Biology HPC Facility
IBM Intelligent Cluster
- 2 quad-core HS22 servers as the head and storage nodes (no further information provided)
- 8 servers, each with 2 six-core Intel Xeon X5650 processors (assumed, per Intel’s documentation, to be dual-threaded, 2.66GHz, with the SSE4.2 instruction set extension (128-bit extension = 4 FLOPs/cycle), 6.4GT/s QPI, 3 channels of up to 1333MHz RAM, 32GB/s max memory bandwidth, and 12MB cache)
- Estimated TFLOP/s
- General – 2.66GHz
- Processor = 6 (cores)*2.66G (clock)*4(FLOPs/cycle) = 63.84 GFLOP/s
- Server = 2*Processor = 127.68 GFLOP/s
- Cluster = 8*server = 1,021.44 GFLOP/s = 1.021 TFLOP/s
- With turbo boost – 3.06GHz
- Processor = 6 (cores)*3.06G (clock)*4(FLOPs/cycle) = 73.44 GFLOP/s
- Server = 2*Processor = 146.88 GFLOP/s
- Cluster = 8*server = 1,175.04 GFLOP/s = 1.175 TFLOP/s
- No memory specification
- A single storage blade consisting of 10 600GB 15K SAS drives
Dell PowerEdge R910 large memory server
- 4xIntel Xeon X7550 2GHz processors (assumed to be dual-threaded, 8-core, with 18MB cache, 6.4GT/s QPI, 1066MHz RAM, and the SSE4.2 instruction set extension (128-bit extension = 4 FLOPs/cycle))
- Estimated TFLOP/s
- General – 2GHz
- Processor = 8 (cores)*2G (clock)*4(FLOPs/cycle) = 64 GFLOP/s
- Server = 4*Processor = 256 GFLOP/s = 0.256 TFLOP/s
- With turbo boost – 2.4GHz
- Processor = 8 (cores)*2.4G (clock)*4(FLOPs/cycle) = 76.8 GFLOP/s
- Server = 4*Processor = 307.2 GFLOP/s = 0.307 TFLOP/s
- 1TB (64×16GB) memory
- 2x146GB 15K SCSI drives
- 4x1TB 7.2K SAS drives
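To compare the systems described in this document side by side, the base-clock estimates can be recomputed and ranked. A rough sketch using only the processor counts, core counts, clocks, and FLOPs-per-cycle values listed above; the labels and the `peak_gflops` helper are illustrative, and Zeus is entered at its rated 800MHz (0.8GHz):

```python
# (name, processors, cores per processor, base clock in GHz, FLOPs per cycle)
systems = [
    ("HSIM cluster", 8, 8, 2.6, 8),
    ("Coffee (SGI Prism)", 8, 1, 1.5, 2),
    ("Zeus (SGI Origin 350)", 32, 1, 0.8, 2),
    ("Jasta (SGI Altix 4700)", 64, 2, 1.6, 2),
    ("Mercury", 32, 10, 2.5, 8),
    ("IBM Intelligent Cluster", 16, 6, 2.66, 4),
    ("Dell R910", 4, 8, 2.0, 4),
]


def peak_gflops(processors, cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s at the given clock."""
    return processors * cores * clock_ghz * flops_per_cycle


# Rank from highest to lowest theoretical peak.
ranked = sorted(
    ((name, peak_gflops(*spec)) for name, *spec in systems),
    key=lambda item: item[1],
    reverse=True,
)

for name, gflops in ranked:
    print(f"{name:<26}{gflops:>10.1f} GFLOP/s")
```

These are theoretical peaks at base clock only; they say nothing about interconnect, memory, or scheduler limits, which often matter more for real workloads.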