Cluster Details

HSIM Cluster – Mount

Overview

The cluster consists of four Dell PowerEdge T620 servers, a Dell 16-port 1Gbps switch, and a Synology 32TB SATA III NAS storage array in a RAID-10 configuration (16TB effective storage). Each server has two 8-core, dual-threaded 2.6GHz processors (up to 3.3GHz with Intel Turbo Boost 2.0 technology; the 256-bit AVX instruction set extension yields 8 FLOP per cycle), 64GB of 1600MHz RDIMMs, 4x1Gbps NIC interfaces, a PERC H710 1GB RAID controller, 1x4TB 7.2K RPM 6Gbps SAS drive (configured as RAID-0 for future additions), and 1x400GB SLC 6Gbps SSD drive (also configured as RAID-0 for future expansion). The cluster's theoretical peak is 1.331-1.6896 TFLOP/s.

Processor Specifics

Each E5-2670 processor has:

  • 8 cores at 2.6GHz (up to 3.3GHz with Intel Turbo Boost 2.0)
  • 2 threads each for a total of 16 threads
  • 32nm process technology
  • 256-bit AVX instruction set extension (8 FLOP/s per cycle)
  • 115W TDP, 0.60V-1.35V operating voltage
  • 40x PCI-e 3.0 lanes integrated into the processor
  • 2x QPI links
  • 32KB L1 cache per core
  • 256KB L2 cache per core
  • 20MB shared L3 cache
  • 4 channels of DDR3 1600MHz RAM (quad-channel)
  • 51.2 GB/s max memory bandwidth
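The 8 FLOP/s-per-cycle figure follows from the AVX vector width: a 256-bit register holds four 64-bit doubles, and a Sandy Bridge core can issue one vector add and one vector multiply each cycle. A quick sketch of that arithmetic (the per-cycle issue assumption is noted in the comments):

```python
# Peak double-precision FLOP/cycle for one Sandy Bridge core.
# Assumption: one 256-bit FP add and one 256-bit FP multiply issue per cycle.
avx_bits = 256
double_bits = 64
doubles_per_vector = avx_bits // double_bits   # 4 doubles per AVX register
fp_ops_per_cycle = 2                           # one add port + one multiply port
flop_per_cycle = doubles_per_vector * fp_ops_per_cycle
print(flop_per_cycle)  # → 8
```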

Dual processor setup:

  • “Sandy Bridge” microarchitecture
  • Dual 8.0GT/s QPI link between processors
  • Dedicated quad-channel memory controllers per CPU
  • Integrated I/O controller per CPU, each providing 40 lanes of PCI-e Gen 3.0 expansion
  • I/O Hub to connect to storage, legacy PCI bus, USB, and more

Estimated TFLOP/s:

  • General – 2.6GHz
    • Processor = 8(cores)*2.6G(clock)*8(FLOP/s/cycle) = 166.4 GFLOP/s
    • Server = 2*Processor = 332.8 GFLOP/s
    • Cluster = 4*server = 1,331.2 GFLOP/s = 1.331 TFLOP/s
  • With turbo boost 2.0 – 3.3GHz
    • Processor = 8(cores)*3.3G(clock)*8(FLOP/s/cycle) = 211.2 GFLOP/s
    • Server = 2*Processor = 422.4 GFLOP/s
    • Cluster = 4*server = 1,689.6 GFLOP/s = 1.6896 TFLOP/s
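The processor-to-server-to-cluster calculation above generalizes to a small helper (the function name is illustrative):

```python
def peak_gflops(cores, clock_ghz, flop_per_cycle, sockets=1, nodes=1):
    """Theoretical peak: cores * clock (GHz) * FLOP/cycle, scaled to the cluster."""
    return cores * clock_ghz * flop_per_cycle * sockets * nodes

# Mount: dual E5-2670 (8 cores, 8 FLOP/cycle with AVX) in each of 4 nodes
base  = peak_gflops(8, 2.6, 8, sockets=2, nodes=4)   # ≈ 1331.2 GFLOP/s
turbo = peak_gflops(8, 3.3, 8, sockets=2, nodes=4)   # ≈ 1689.6 GFLOP/s
print(f"{base:.1f} GFLOP/s base, {turbo:.1f} GFLOP/s with Turbo Boost")
```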

Additional Hardware

Dell PowerConnect 2816 Switch:

  • 16 1Gb ports

NAS RAID Array:

  • Synology DiskStation 8-bay (DS1812+)
  • 32TB of maximum storage
    • 8x4TB WD RE Enterprise HDs 7.2K RPM, SATA III, 64MB cache
    • RAID-10 configuration, for a total of 16TB of mirrored storage
  • 3GB 1333MHz RAM
  • Dual 1Gb NIC
  • 4 USB 2.0, 2 USB 3.0
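The 16TB usable figure follows directly from the RAID-10 layout, which stripes across mirrored pairs and so halves raw capacity; as a quick check:

```python
# RAID-10 usable capacity for the DS1812+ array: striping over
# mirrored pairs halves the raw capacity.
drives, drive_tb = 8, 4
raw_tb = drives * drive_tb       # 32 TB raw
usable_tb = raw_tb // 2          # 16 TB usable (mirrored)
print(raw_tb, usable_tb)  # → 32 16
```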

Cluster Summary

Overall cluster:

  • 64 2.6-3.3GHz cores (128 threads)
  • 256GB RAM (max 64GB/machine)
  • 16TB SAS 6Gbps (max 4TB/machine)
  • 1600GB SSD 6Gbps (max 400GB/machine)
  • 16TB SATA III RAID-10 NAS storage

Software
  • Rocks Cluster 6.1 – CentOS 6.3
  • Area51 Roll – file integrity software
  • Bio Roll – bioinformatics tools
    • HMMER, NCBI BLAST, MpiBLAST, Biopython, ClustalW, MrBayes, T_Coffee, Emboss, Phylip, fasta, Glimmer, TIGR Assembler, bioperl, bioperl-ext, bioperl-run, bioperl-db
  • Condor Roll – high throughput computing
  • Ganglia Roll – cluster monitoring
  • HPC Roll – high-performance computing
    • OpenMPI (with Java support), OpenMP, MPICH2
    • PVM – Parallel Virtual Machine
  • Java Roll – various java packages
    • Eclipse, Java JDK, JBoss, Maven, Tomcat
  • Perl Roll – perl
  • Python Roll – python
  • Sun Grid Engine Roll – scheduler
  • Torque Roll – scheduler
  • Web-server Roll – web server
  • Xen Roll – installing and configuring virtual machines
  • ZFS-Linux Roll – the ZFS file system

Other HPC Systems on Campus

SGI Prism – Coffee
  • Visualization engine
  • 8 single-threaded Itanium 2 1.5GHz processors (64-bit instruction set)
  • 16GB of memory
  • 380GB of storage
  • Estimated TFLOP/s
    • Processor = 1 (core)*1.5G (clock)*2(FLOP/s/cycle) = 3 GFLOP/s
    • Server = 8*Processor = 24 GFLOP/s = 0.024 TFLOP/s

SGI Origin 350 – Zeus
  • 32 single-threaded MIPS R16000A 800MHz processors (RISC) (64-bit instruction set)
  • 32GB of memory
  • 2TB of storage
  • Max allowed usage per the job scheduler: 8 CPUs for up to 72 hours, 4 CPUs for up to 120 hours
  • Estimated TFLOP/s
    • Processor = 1 (core)*0.8G (clock)*2(FLOP/s/cycle) = 1.6 GFLOP/s
    • Cluster = 32*Processor = 51.2 GFLOP/s = 0.0512 TFLOP/s

SGI Altix 4700 – Jasta
  • 64 dual-core, single-threaded Montecito 1.6GHz processors (9000 series, 64-bit instruction set)
  • 128GB of shared memory
  • 14TB of storage
  • Same job scheduler limitations
  • Estimated TFLOP/s
    • Processor = 2 (cores)*1.6G (clock)*2(FLOP/s/cycle) = 6.4 GFLOP/s
    • Cluster = 64*Processor = 409.6 GFLOP/s = 0.4096 TFLOP/s

Mercury – Dell Xeon Cluster
  • 16-node cluster, each node with 2 10-core Xeon E5-2670v2 2.5-3.3GHz processors (320 cores total) (256-bit AVX instruction set)
  • 32-core head node
  • 128GB per node (2.0TB total pooled)
  • 1TB storage per node / 180TB total including disk array
  • OS: RedHat
  • Compilers: Intel and Portland
  • Scheduler: PBS Torque
  • Additional: Intel MKL and MPI
  • Unknown allocation max per job
  • Estimated TFLOP/s
    • General – 2.5GHz
      • Processor = 10(cores)*2.5G(clock)*8(FLOP/s/cycle) = 200 GFLOP/s
      • Server = 2*Processor = 400 GFLOP/s
      • Cluster = 16*server = 6,400 GFLOP/s = 6.4 TFLOP/s
    • With turbo boost 2.0 – 3.3GHz
      • Processor = 10(cores)*3.3G(clock)*8(FLOP/s/cycle) = 264 GFLOP/s
      • Server = 2*Processor = 528 GFLOP/s
      • Cluster = 16*server = 8,448 GFLOP/s = 8.448 TFLOP/s

Biology HPC Facility
IBM Intelligent Cluster
  • 2 quad-core HS22 servers as the head and storage nodes (no further information provided)
  • 8 servers, each with 2 six-core Intel Xeon 5650 processors (assumed to be dual-threaded, 2.66GHz, SSE4.2 instruction set extension (128-bit extension = 4 FLOP/s/cycle), 6.4GT/s, 3 channels of up to 1333MHz RAM, 32GB/s max memory bandwidth, and 12MB cache per Intel’s documentation)
    • Estimated TFLOP/s
      • General – 2.66GHz
        • Processor = 6 (cores)*2.66G (clock)*4(FLOP/s/cycle) = 63.84 GFLOP/s
        • Server = 2*Processor = 127.68 GFLOP/s
        • Cluster = 8*server = 1,021.44 GFLOP/s = 1.021 TFLOP/s
      • With turbo boost – 3.06GHz
        • Processor = 6 (cores)*3.06G (clock)*4(FLOP/s/cycle) = 73.44 GFLOP/s
        • Server = 2*Processor = 146.88 GFLOP/s
        • Cluster = 8*server = 1,175.04 GFLOP/s = 1.175 TFLOP/s
  • No memory specification
  • A single storage blade consisting of 10x600GB 15K SAS drives

Dell PowerEdge R910 large-memory server
  • 4xIntel Xeon X7550 2GHz processors (assumed to be dual-threaded, 8-cores, 18MB cache, 6.4GT/s, 1066MHz RAM, and SSE4.2 instruction set extension (128-bit extension = 4 FLOP/s/cycle))
    • Estimated TFLOP/s
      • General – 2GHz
        • Processor GFLOP/s = 8 (cores)*2G (clock)*4(FLOP/s/cycle) = 64 GFLOP/s
        • Server = 4*Processor = 256 GFLOP/s = 0.256 TFLOP/s
      • With turbo boost – 2.4GHz
        • Processor GFLOP/s = 8(cores)*2.4G (clock)*4(FLOP/s/cycle) = 76.8 GFLOP/s
        • Server = 4*Processor = 307.2 GFLOP/s = 0.307 TFLOP/s
  • 1TB of memory (64x16GB DIMMs)
  • 2x146GB 15K SCSI drives
  • 4x1TB 7.2K SAS drives
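For comparison, the same peak estimate (cores × clock × FLOP per cycle × sockets × nodes) used throughout this document can be applied to every system above in one pass. The sketch below uses base clocks and the per-system assumptions stated in each section (Zeus is computed at its full 800MHz clock):

```python
# Theoretical peak at base clock for each campus system, using the same
# cores * clock(GHz) * FLOP/cycle * sockets * nodes estimate as the
# per-system calculations above.
systems = {
    # name: (cores/socket, base GHz, FLOP/cycle, sockets, nodes)
    "Mount (Dell T620 cluster)":  (8, 2.6, 8, 2, 4),
    "Coffee (SGI Prism)":         (1, 1.5, 2, 8, 1),
    "Zeus (SGI Origin 350)":      (1, 0.8, 2, 32, 1),
    "Jasta (SGI Altix 4700)":     (2, 1.6, 2, 64, 1),
    "Mercury (Dell Xeon)":        (10, 2.5, 8, 2, 16),
    "IBM Intelligent Cluster":    (6, 2.66, 4, 2, 8),
    "Dell R910":                  (8, 2.0, 4, 4, 1),
}
for name, (cores, ghz, fpc, sockets, nodes) in systems.items():
    gflops = cores * ghz * fpc * sockets * nodes
    print(f"{name}: {gflops:,.1f} GFLOP/s")
```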