Mont-Blanc
High Performance Computing on ARM Hardware - The Mont-Blanc Project
Axel Auweter
Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
7th Workshop on UnConventional High Performance Computing 2014, August 26, 2014
http://www.montblanc-project.eu
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreements n° 288777 and 610402.

The Top500 List of Supercomputers
[Figure: Linpack performance (FLOPs, log scale, 1E+08 to 1E+18) of the Top500 #1, #500 and Sum, 06/93 to 06/14; annotations: Increased Clock Speed, Increased Parallelism, Increased Budget]

Sustained Performance vs. Peak Performance
[Figure: Top500 average peak performance (Linpack FLOPs, log scale, 1E+09 to 1E+15) and Top500 average efficiency (0% to 100%), 06/93 to 06/14]

Sustained Performance vs. Peak Performance
[Figure: Top500 #1 peak performance (Linpack FLOPs, log scale, 1E+11 to 1E+17) and Top500 #1 efficiency (0% to 100%), 06/93 to 06/14]

Building Supercomputers - The Tradeoffs
Lower Price, Higher Performance, Higher Scientific Value
• Reduce component prices by using commodity hardware
• Improve energy efficiency across the entire HPC stack
• Ensure scientific value of the machine through Co-Design

Commodity Already Drives HPC
[Figure: number of Top500 systems (0 to 500) by architecture (Vector, RISC, x86), 06/93 to 06/14]

Mobile & Embedded Hardware: Energy Efficient & Commodity
• Powerful & energy efficient devices do exist!
• ... and they are the next step in commodity hardware!
• Sold in 2013 (2012):
  • ~10M Servers (<10M)
  • >320M PCs (>350M)
  • >195M Tablets (>100M)
  • >960M Smartphones (>700M)

ARM Processor Improvements in DP FLOPS
[Figure: DP ops/cycle (1 to 16, log scale) by year, 1999 to 2015, for ARM Cortex-A9, ARM Cortex-A15, ARMv8, Intel SSE2, Intel AVX, IBM BG/P and IBM BG/Q]
• 256-bit SIMD is today's de-facto standard for vectorization
  • 8 DP ops / cycle
• ARM quickly moved from optional floating-point to state-of-the-art
  • ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD)

The Mont-Blanc Project

Mont-Blanc Project Facts
• To develop a European Exascale approach
• Leverage commodity and embedded power-efficient technology
• Partners: [partner logos]
• Supported by the EU's FP7 with 16M € under two projects:
  • Mont-Blanc: October 2011 - September 2014 (14.5M €)
  • Mont-Blanc 2: October 2013 - September 2016 (8.0M €)

Selecting a Suitable System-on-Chip - Wishlist

Samsung Exynos 5 Dual (5250)
• Dual-core ARM Cortex-A15 @ 1.7 GHz
  • VFP for 64-bit floating point
    • 6.8 GFLOPS (1 FMA / cycle)
  • NEON for 32-bit floating-point SIMD
• Quad-core ARM Mali T604 GPU
  • Compute capable
    • OpenCL 1.1
    • 68 GFLOPS (SP)
    • 25.5 GFLOPS (DP)
• Shared memory between CPU and GPU

Exynos5 Multicore Performance
• Tegra3 platform as fast as Exynos5 platform, and slightly more energy efficient
  • 4-core Cortex-A9 vs. 2-core Cortex-A15
• Core i7 is 6x faster than Exynos5 at maximum frequency
• Tegra3 and Exynos5 as efficient as Core i7 at the same frequency

Mont-Blanc Server-on-Module (SoM)
• CPU+GPU+DRAM+storage+network all in a compute card that's just 8.5 x 5.6 cm!

Mont-Blanc Prototype
• Exynos 5 compute card
  • 2x Cortex-A15 @ 1.7 GHz
  • 1x Mali T-604 GPU
  • 6.8 + 25.5 GFLOPS (peak) @ 15 W
• Carrier Blade
  • 15x Compute Cards
  • 485 GFLOPS @ 300 W
  • 1 GbE to 10 GbE
• Blade Chassis 7U
  • 9x Carrier Blades (= 135 Compute Cards)
  • 4.3 TFLOPS @ 2.7 kW
• Rack
  • 6x Blade Chassis
  • 26 TFLOPS @ 18 kW

Mont-Blanc Applications
• BQCD - Particle physics
• BigDFT * - Electronic structure
• COSMO * & ** - Weather forecast
• EUTERPE * & ** - Fusion
• MP2C - Multi-particle collisions
• PEPC - Coulomb + gravitational forces
• ProFASI - Protein folding
• Quantum ESPRESSO * & ** - Electronic structure
• SMMP * - Protein folding
• SPECFEM3D * & ** - Wave propagation
• YALES2 - Combustion
* GPU capable (CUDA or OpenCL)
** OmpSs capable

Further Topics in Mont-Blanc
• System Software Stack
  • Linux
  • gcc, g++, gfortran, ...
  • ATLAS, FFTW, HDF5
  • SLURM
  • MPICH2, OpenMPI
• Development Tools
  • Allinea DDT debugger
  • Extrae, Scalasca
• Programming Models
  • OmpSs
• Industrial End-User Group
  • Provides SMEs remote access to Mont-Blanc prototype platforms
• Support & Training

The 4 Pillar Model of Energy Efficient HPC
Energy efficient hardware
• Use the newest semiconductor technology
• Use energy saving processor and memory technologies
• Consider using special hardware or accelerators designed for specific scientific problems or numerical algorithms
Energy efficient infrastructure
• Reduce power losses in the power supply chain
• Improve cooling technology
• Reuse waste heat from IT systems
Energy aware system software
• Monitor the energy consumption of the compute system and the building infrastructure
• Use energy aware system software to exploit the energy saving features of the platform
Energy efficient applications
• Use the most efficient algorithms
• Use the best libraries
• Use the most efficient programming paradigm

Monitoring Power Consumption in Mont-Blanc
[Diagram: 15 Exynos SoCs, each with a power measurement (PM) IC, feeding an FPGA and a BMC; several BMCs connect via IPMI to the Mont-Blanc Pusher, which sends data via the MQTT protocol to a Collect Agent backed by a Distributed Key-Value Store]
• Field Programmable Gate Array (FPGA)
  • Collects power consumption data from all 15 power measurement ICs
  • Sample interval: 70 ms
• Board Management Controller (BMC)
  • Collects 1 s averaged data from the FPGA
  • Stores measurement samples in a FIFO
• Mont-Blanc Pusher
  • Collects measurement data from multiple BMCs using custom IPMI commands
  • Forwards data using the MQTT protocol through a Collect Agent into a key-value store

Monitoring Power Consumption in the entire Data Center
[Diagram: Mont-Blanc Pusher, IPMI Pusher, SysFS Pusher, BACnet Pusher, ... send data via the MQTT protocol to one or more Collect Agents, which write into a Distributed Key-Value Store]

To take home...
www.montblanc-project.eu
MontBlancEU
@MontBlanc_EU
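The efficiency curves on the two "Sustained Performance vs. Peak Performance" slides relate sustained Linpack performance (Rmax) to theoretical peak (Rpeak). The following is a minimal Python sketch of that ratio; the function names and the (Rmax, Rpeak) values are hypothetical, and the slides do not say whether "Top500 Avg Efficiency" is a mean of per-system ratios or a ratio of sums, so both are shown.

```python
from statistics import mean

def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Sustained Linpack performance (Rmax) as a fraction of theoretical peak (Rpeak)."""
    if rpeak <= 0:
        raise ValueError("Rpeak must be positive")
    return rmax / rpeak

def mean_efficiency(systems: list[tuple[float, float]]) -> float:
    """Mean of per-system efficiencies over (Rmax, Rpeak) pairs.

    Assumption: one possible definition of an 'average efficiency';
    pooled_efficiency() below is another and weights large systems more.
    """
    return mean(linpack_efficiency(rmax, rpeak) for rmax, rpeak in systems)

def pooled_efficiency(systems: list[tuple[float, float]]) -> float:
    """Aggregate efficiency: total Rmax divided by total Rpeak."""
    total_rmax = sum(rmax for rmax, _ in systems)
    total_rpeak = sum(rpeak for _, rpeak in systems)
    return linpack_efficiency(total_rmax, total_rpeak)

# Hypothetical (Rmax, Rpeak) pairs in FLOPs, for illustration only.
systems = [(5.0e14, 1.0e15), (3.0e15, 4.0e15)]
print(mean_efficiency(systems))    # 0.625
print(pooled_efficiency(systems))  # 0.7
```

The two definitions diverge as soon as systems differ in size: the larger, more efficient machine pulls the pooled figure up to 0.7, while the per-system mean stays at 0.625.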
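The "8 DP ops / cycle" figure for 256-bit SIMD on the "ARM Processor Improvements in DP FLOPS" slide follows from the number of 64-bit lanes in a vector register times the operations each lane completes per cycle. This sketch assumes two operations per lane (one fused multiply-add, or a multiply and an add issued together) and a single vector pipeline by default; real cores differ in how many vector units they have, so treat it as arithmetic rather than a model of any particular chip. The function and parameter names are hypothetical.

```python
DOUBLE_BITS = 64  # width of one IEEE 754 double-precision value

def dp_ops_per_cycle(simd_bits: int, ops_per_lane: int = 2, pipelines: int = 1) -> int:
    """Peak double-precision operations per cycle for one core.

    simd_bits:    vector register width (e.g. 128 for NEON, 256 for AVX)
    ops_per_lane: flops each 64-bit lane completes per cycle
                  (2 for a fused multiply-add, or for a multiply plus an add)
    pipelines:    number of vector units issuing every cycle (assumption: 1)
    """
    if simd_bits <= 0 or simd_bits % DOUBLE_BITS:
        raise ValueError("SIMD width must be a positive multiple of 64 bits")
    lanes = simd_bits // DOUBLE_BITS
    return lanes * ops_per_lane * pipelines

print(dp_ops_per_cycle(256))  # 8: four 64-bit lanes x 2 ops
print(dp_ops_per_cycle(128))  # 4: two 64-bit lanes x 2 ops
print(dp_ops_per_cycle(64))   # 2: one lane, i.e. a scalar FMA per cycle
```

The scalar case (2 ops per cycle) is the "1 FMA / cycle" assumption behind the Cortex-A15 VFP figure on the Exynos 5 Dual slide.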
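The peak figures on the "Samsung Exynos 5 Dual (5250)" and "Mont-Blanc Prototype" slides can be cross-checked with a few lines of arithmetic. In this sketch the clock rate, core count, GPU peak, per-level counts and power envelopes are taken from those slides; the function and variable names are hypothetical, and GFLOPS/W is computed against the quoted power envelopes, not measured consumption.

```python
def cpu_peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x flops per cycle per core."""
    return cores * ghz * flops_per_cycle

# Exynos 5 Dual: 2 Cortex-A15 cores at 1.7 GHz, 1 FMA (2 flops) per cycle.
card_cpu = cpu_peak_gflops(cores=2, ghz=1.7, flops_per_cycle=2)  # about 6.8
card_gpu = 25.5                    # Mali T604 DP peak in GFLOPS, from the slide
card_peak = card_cpu + card_gpu    # about 32.3 GFLOPS per compute card

# (level, children per unit, quoted power per unit in W), from the prototype slide.
HIERARCHY = [
    ("Compute card", 1, 15.0),
    ("Carrier blade", 15, 300.0),
    ("Blade chassis", 9, 2_700.0),
    ("Rack", 6, 18_000.0),
]

def aggregate(card_gflops, levels):
    """Yield (level, total compute cards, peak GFLOPS, GFLOPS per W) per level."""
    cards = 1
    for name, children, power_w in levels:
        cards *= children                 # cumulative number of compute cards
        peak = cards * card_gflops
        yield name, cards, peak, peak / power_w

for name, cards, peak, per_watt in aggregate(card_peak, HIERARCHY):
    print(f"{name:<14} {cards:>4} cards {peak:>9.1f} GFLOPS {per_watt:5.2f} GFLOPS/W")
```

A carrier blade works out to 15 x 32.3 = 484.5 GFLOPS (the slide's 485), a chassis to about 4,360 GFLOPS and a rack of 810 cards to about 26,160 GFLOPS, which the slide quotes as 4.3 and 26 TFLOPS.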
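The FPGA-to-BMC stage on the "Monitoring Power Consumption in Mont-Blanc" slide (70 ms samples from 15 power measurement ICs, 1 s averages held in a FIFO) can be modelled in software. This is a hypothetical model for illustration only: the class name, the alignment of windows to whole seconds, the FIFO capacity and the `drain()` method standing in for the pusher's IPMI read are all assumptions, not the actual FPGA or BMC firmware.

```python
from collections import deque

SAMPLE_INTERVAL_MS = 70   # FPGA sampling interval, from the slide
WINDOW_MS = 1000          # BMC averaging window, from the slide
NUM_CHANNELS = 15         # one power measurement IC per compute card on a blade

class OneSecondAverager:
    """Hypothetical model of the BMC stage: average raw FPGA samples over
    1 s windows and keep the results in a bounded FIFO for the pusher."""

    def __init__(self, fifo_capacity=60):
        self.fifo = deque(maxlen=fifo_capacity)  # oldest entries are dropped when full
        self._window = None                      # index of the window being filled
        self._sums = [0.0] * NUM_CHANNELS
        self._count = 0

    def add_sample(self, t_ms, watts):
        """Add one reading (W) per channel taken at time t_ms."""
        if len(watts) != NUM_CHANNELS:
            raise ValueError(f"expected {NUM_CHANNELS} channel readings")
        window = t_ms // WINDOW_MS
        if self._window is not None and window != self._window:
            self._flush()                        # a new second has started
        self._window = window
        for i, w in enumerate(watts):
            self._sums[i] += w
        self._count += 1

    def _flush(self):
        """Close the current window and append (start_ms, per-channel averages)."""
        if self._count:
            averages = [s / self._count for s in self._sums]
            self.fifo.append((self._window * WINDOW_MS, averages))
        self._sums = [0.0] * NUM_CHANNELS
        self._count = 0

    def drain(self):
        """Return and remove all completed windows, oldest first."""
        items = list(self.fifo)
        self.fifo.clear()
        return items

# Simulate 3 s of constant 15 W readings on every channel.
bmc = OneSecondAverager()
for t in range(0, 3000, SAMPLE_INTERVAL_MS):
    bmc.add_sample(t, [15.0] * NUM_CHANNELS)
for start_ms, avg in bmc.drain():
    print(start_ms, avg[0])
```

Running it prints `0 15.0` and `1000 15.0`: with samples every 70 ms the first window receives 15 samples and the second 14, and the third window stays open because no sample from the following second has arrived. The bounded deque drops the oldest averages if the reader falls behind; whether the real BMC drops or blocks is not stated on the slides.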
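On the "Monitoring Power Consumption in the entire Data Center" slide, heterogeneous pushers (Mont-Blanc, IPMI, SysFS, BACnet) all publish over MQTT to Collect Agents that write into one distributed key-value store. The sketch below replaces the MQTT broker with a direct method call and the distributed store with a Python dict, so it only illustrates the data model; the class name, topic names and readings are invented for illustration.

```python
from collections import defaultdict

class CollectAgent:
    """Hypothetical stand-in for a Collect Agent: accepts (topic, timestamp, value)
    messages from any pusher and stores them keyed by topic."""

    def __init__(self):
        # A plain dict standing in for the distributed key-value store.
        self.store = defaultdict(dict)   # topic -> {timestamp_ms: value}

    def on_message(self, topic, timestamp_ms, value):
        """Store one reading; a later reading with the same timestamp overwrites it."""
        self.store[topic][timestamp_ms] = value

    def query(self, topic, start_ms, end_ms):
        """Return (timestamp, value) pairs with start_ms <= t < end_ms, in time order."""
        series = self.store.get(topic, {})   # .get() does not create missing topics
        return sorted((t, v) for t, v in series.items() if start_ms <= t < end_ms)

agent = CollectAgent()
# Hypothetical topics, one namespace per pusher type; the real scheme is not on the slides.
agent.on_message("/montblanc/blade01/card07/power", 1000, 14.8)
agent.on_message("/ipmi/node042/power", 1000, 210.0)
agent.on_message("/bacnet/chiller1/inlet_temp", 1000, 18.5)
agent.on_message("/montblanc/blade01/card07/power", 2000, 15.1)
print(agent.query("/montblanc/blade01/card07/power", 0, 3000))
# [(1000, 14.8), (2000, 15.1)]
```

Keying every reading by topic lets the same query path serve compute-node power data and building-infrastructure sensor data alike, which is what routing all pushers through one protocol and one store makes possible.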