The Mont-Blanc approach towards Exascale - Mont
Transcription
The Mont-Blanc approach towards Exascale - Mont
About BSC Computer Science Department • Origins in UPC Computer Architecture Department • • 1985- Inherited know-how and key personnel Our mission: To influence the way high-performance computing systems are built, programmed and used 1991-2000 2000-04 People and teams Programming Models Programming Models (26) Performance Tools Performance Tools (11) Accelerators for HPC (5) Grid and Cluster Grid and Cluster (21) Autonomic eBusiness Autonomic eBusiness (14) Storage Systems Storage Systems (11) Extreme Computing Computer Architecture Application Optimization Directors Team leaders Researchers Associate (UPC) Heterogenous Architectures Heterogenous Architectures (17) OS/CA Interface OS/CA Interface (21) OS/PM Interface OS/PM Interface (26) Network Architectures Network Architectures 100 75 50 25 Students 0 Support 2005 (28) 2006 (69) 2007 (82) (3) 2008 (130) 2009 (138) 2010 (162) 2011 (159) (2) Department funding sources 6.000.000 5.000.000 4.000.000 Contracts EU funding 3.000.000 National Personnel 2.000.000 1.000.000 0 2006 2007 2008 2009 2010 2011 EU projects HOPSA-EU OPTIMIS Red Iberoamericana de Supercomputación 6.000.000 5.000.000 4.000.000 3.000.000 2.000.000 1.000.000 0 Total: 10.76 M€ Average: 1.8 M€ Current (2012): 2.9 M€ 2006 2007 2008 2009 2010 2011 http://www.montblanc-project.eu The next generation supercomputer Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself ... All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia, Bull, or others, implied. This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777. Project goals • • To develop an European Exascale approach Based on embedded power-efficient technology • Objetives • • • Develop a first prototype system, limited by available technology Design a Next Generation system, to overcome the limitations Develop a set of Exascale applications targeting the new system Outline • A bit of history • • • • Supercomputers from mobile components • • • • • Vector supercomputers Commodity supercomputers The next step in the commodity chain Killer mobile examples Mont-Blanc architecture strawman Rely on OmpSs to handle the challenges BSC prototype roadmap Mont-Blanc project goals and milestones In the beginning ... there were only supercomputers • • • • • • • • Built to order • Very few of them Special purpose hardware • Very expensive Control Data, Convex, ... Cray-1 • 1975, 160 MFLOPS • 80 units, 5-8 M$ Cray X-MP • 1982, 800 MFLOPS Cray-2 • 1985, 1.9 GFLOPS Cray Y-MP • 1988, 2.6 GFLOPS Fortran+vectorizing compilers 10 The Killer Microprocessors 10.000 MFLOPS Cray-1, Cray-C90 NEC SX4, SX5 1000 Alpha AV4, EV5 Intel Pentium 100 IBM P2SC HP PA8200 10 1974 • 1984 1989 1994 1999 Microprocessors killed the Vector supercomputers • • • 1979 They were not faster ... ... but they were significantly cheaper and greener Need 10 microprocessors to achieve the performance of 1 Vector CPU • SIMD vs. MIMD programming paradigms 11 Then, commodity took over special purpose • ASCI Red, Sandia • • • • 1997, 1 TFLOPS (Linpack), 9298 cores @ 200 Mhz 1.2 Tbytes Intel Pentium Pro • Upgraded to Pentium II Xeon, 1999, 3.1 Tflops • MareNostrum, BSC • • 2004, 20 TFLOPS IBM PowerPC 970 FX • • • Blade enclosure Myrinet + 1 GbE network SuSe Linux Message-Passing Programming Models 12 The next step in the commodity chain HPC Servers Desktop • • Mobile • Total cores in Jun'12 Top500 • 13.5 Mcores Tablets sold in Q4 2011 • 27 Mtablets Smartphones sold Q4 2011 • > 100 Mphones 13 ARM Processor improvements in DP FLOPS IBM Intel BG/Q AVX 16 ARMv8 DP ops/cycle 8 Intel SSE2 4 IBM BG/P 2 ARM CortexTM-A9 1 1999 • 2001 2003 2005 2007 2009 2011 2013 2015 IBM BG/Q and Intel AVX implement DP in 256-bit SIMD • • ARM CortexTM-A15 8 DP ops / cycle ARM quickly moved from optional floating-point to state-of-the-art • ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD) 14 Integrated ARM GPU performance Performance Skrymir Mali-T658 High-end solution + compute capability Scalable to 8 cores, ARMv8 compatible 272 GFLOPS* Mali-T604 First Midgard architecture product Scalable to 4 cores 68 GFLOPS* 2012 • 2013 2014 GPU compute performance increases faster than Moore’s Law * Data from web sources, not an ARM commitment Are the “Killer Mobiles™" coming? Cost (log10) Server ($1500) Desktop ($150) HPC-Mobile ($40) ? Mobile ($20) Nowadays Near future Performance (log2) • Where is the sweet spot? Maybe in the low-end ... • • • Today ~ 1:8 ratio in performance, 1:100 ratio in cost Tomorrow ~ 1:2 ratio in performance, still 1:100 in cost ? The same reason why microprocessors killed supercomputers • Not so much performance ... but much lower cost, and power 16 Samsung Exynos 5 Dual Superphone SoC • 32nm HKMG • Dual-core ARM Cortex-A15 @ 1.7 GHz • Quad-core ARM Mali T604 • OpenCL 1.1 • Dual-channel DDR3 • USB 3.0 to 1 GbE bridge • All in a low-power mobile socket High density packaging architecture • • Standard BullX blade enclosure Multiple compute nodes per blade • Additional level of interconnect, on-blade network There is no free lunch 2X more nodes for the same performance ½ on-chip memory / core 8X more address spaces 1 GbE inter-chip communication OmpSs runtime layer manages architecture complexity P0 P1 P2 • • Programmer exposed a simple architecture Task graph provides lookahead • • Automatically handle all of the architecture challenges • • • • • No overlap Overlap Exploit knowledge about the future Strong scalability Multiple address spaces Low cache size Low interconnect bandwidth Enjoy the positive aspects • • Energy efficiency Low cost BSC ARM-based prototype roadmap GFLOPS / W Pedraforca: ARM + GPU Tibidabo: ARM multicore 2011 • Integrated ARM + GPU 2012 2013 2014 Prototypes are critical to accelerate software development • System software stack + applications Very high expectations ... • • High media impact of ARM-based HPC Scientific, HPC, general press quote MontBlanc objectives • Highlighted by Eric Schmidt, Google Executive Chairman, at the EC's Innovation Convention The hype curve Peak of Inflated Expectations Y1 Visibility Plateau of Productivity Y3 Slope of Enlightenment Trough of Disillusionment Technology Trigger Time • We'll see how deep it gets on the way down ... Conclusions • Mont-Blanc architecture is shaping up • • • ARM multicore + integrated OpenCL accelerator Ethernet NIC High density packaging • • OmpSs programming model port to OpenCL Applications being ported to tasking model • Stay tuned! www.montblanc-project.eu MontBlancEU @MontBlanc_EU