Sample Hardware Configuration for Structural Analysis (ANSYS):
Penguin Computing 64-Core CAE Cluster
Operating Environment
Scyld ClusterWare 4: Manage a cluster like a single system; minimal
overhead on compute nodes; single point of control for Scyld
ClusterWare / NFS server

Head Node
Options:
1 x Relion Intel 2612: Dual processor, dual/quad-core Xeon CPU
52XX/54XX, redundant power supply, up to 12 hot-swappable SATA or SAS
hard drives
1 x Altus AMD 2650: Dual processor, dual/quad-core AMD Opteron
2200/2300 Series, redundant power supply, up to 6 hot-swappable SATA
or SAS hard drives

Compute Node (compute engines)
Options:
4 x Relion Intel 1672: Dual processor, quad-core Xeon CPU 54XX
(Harpertown), Intel chipset (Seaburg), ‘Twin’ system integrating two
nodes in one 1U unit → high density; single power supply → power
efficiency
8 x Altus AMD 650 Linux Server: Dual processor, quad-core Opteron CPU
235X (Barcelona), NVIDIA 3600 chipset, 16 DIMM slots → up to 128GB
RAM capacity

Storage
Compute nodes: 2 x 160GB SATA drives, 7200RPM, RAID0 configuration

Memory
16GB – 32GB (2 – 4GB per core): Depends on model complexity; ANSYS
recommends 1GB per million DOFs (a short sizing sketch follows this
configuration list)

Interconnect
Gigabit Ethernet
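
The 1GB-per-million-DOFs rule of thumb quoted above translates
directly into a per-node sizing estimate. The Python sketch below is
a minimal illustration of that arithmetic; the function name, the
default core count, and the use of the 2 – 4GB-per-core band as a
cross-check are assumptions for illustration, not an official ANSYS
sizing tool.

# Rough memory-sizing sketch based on the rule of thumb quoted above
# (ANSYS: ~1GB per million DOFs). Names and the per-core band are
# illustrative assumptions, not an official sizing method.

def recommended_memory_gb(dofs, cores_per_node=8,
                          gb_per_million_dofs=1.0,
                          gb_per_core=(2.0, 4.0)):
    """Return (solver_estimate_gb, low_gb, high_gb) for one node."""
    solver_estimate = dofs / 1e6 * gb_per_million_dofs
    low = cores_per_node * gb_per_core[0]    # 2GB per core
    high = cores_per_node * gb_per_core[1]   # 4GB per core
    return solver_estimate, low, high

if __name__ == "__main__":
    # Example: a 20-million-DOF model on an 8-core node
    solver, low, high = recommended_memory_gb(20_000_000)
    print(f"solver rule of thumb: {solver:.0f}GB")
    print(f"per-core band: {low:.0f}GB - {high:.0f}GB")

For the 20-million-DOF example, the rule of thumb suggests about
20GB, which lands inside the 16GB – 32GB per-node band listed in the
configuration above.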
Memory Configuration
The recommended amount of memory is highly model- and solver-dependent.
Figure 1 shows the runtimes for ANSYS benchmarks bm-1 through bm-8
under three different memory configurations: 8GB, 16GB, and 32GB. The
presented results were obtained on a cluster of Penguin Computing
Relion 1600 servers, each equipped with two dual-core Intel Xeon 5160
CPUs (3GHz clock speed). The benchmarks are described at
http://www.ansys.com/services/hardware-support-db.htm.
Figure 1: Performance Impact of Memory Configuration
Distributed ANSYS (DANSYS) Scalability
Distributed ANSYS spreads the computational workload of a single
solver run across multiple systems. Figure 2 illustrates the solver's
scalability using ANSYS benchmark bmd-4. The cores used for this set
of benchmark runs were allocated round-robin: each process was
launched on one core of a different system. After four cores on four
systems had been allocated, the allocation wrapped around and the
next core was assigned on the first node in the set, and so on. Each
node had 8GB of RAM installed.
Figure 2: Scalability of ANSYS’ Distributed Solver
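
As a concrete illustration of the round-robin placement described
above, the short Python sketch below generates the ordered host list
such an allocation produces. The node names, the function name, and
the one-host-per-line output format are assumptions for illustration;
the actual launch mechanism (for example, an MPI machinefile or the
Scyld scheduler) may differ.

# Sketch of the round-robin core placement described above: process i
# runs on node (i mod num_nodes). Node names and the one-host-per-line
# output format are illustrative assumptions.

def round_robin_hosts(num_processes, nodes):
    """Return the ordered host list for a round-robin launch."""
    return [nodes[i % len(nodes)] for i in range(num_processes)]

if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]      # four compute nodes
    hosts = round_robin_hosts(8, nodes)   # e.g. an 8-process run
    # The first four processes land on n0..n3; the list then wraps
    # back to n0, matching the allocation described for bmd-4.
    print("\n".join(hosts))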