Mont-Blanc

http://www.montblanc-project.eu
High Performance Computing on ARM Hardware
The Mont-Blanc Project
Axel Auweter
Leibniz Supercomputing Centre
of the Bavarian Academy of Sciences and Humanities
7th Workshop on UnConventional High Performance Computing 2014
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreements n° 288777 and 610402.
August 26, 2014
The Top500 List of Supercomputers
[Figure: Linpack performance (FLOPs, log scale from 1E+08 to 1E+18) of the Top500 #1 system, the #500 system and the list sum, June 1993 to June 2014. Annotations: Increased Parallelism, Increased Clock Speed, Increased Budget.]
Sustained Performance vs. Peak Performance
[Figure: Top500 average peak performance (Linpack performance in FLOPs, log scale from 1E+09 to 1E+15) and Top500 average efficiency (0% to 100%), June 1993 to June 2014.]
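Efficiency in this sense is conventionally the ratio of sustained Linpack performance to theoretical peak performance. A minimal sketch of that ratio follows; the function name and the example figures are illustrative assumptions, not values read from the chart.

```python
def linpack_efficiency(sustained_flops: float, peak_flops: float) -> float:
    """Return the sustained-to-peak ratio as a fraction between 0 and 1."""
    if peak_flops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_flops / peak_flops


# Hypothetical system: 7.5E+14 FLOPs sustained on a 1E+15 FLOPs peak machine.
example_efficiency = linpack_efficiency(7.5e14, 1.0e15)
print(f"{example_efficiency:.0%}")
```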
Sustained Performance vs. Peak Performance
[Figure: Top500 #1 peak performance (Linpack performance in FLOPs, log scale from 1E+11 to 1E+17) and Top500 #1 efficiency (0% to 100%), June 1993 to June 2014.]
Building Supercomputers - The Tradeoffs
Lower Price
Higher Performance
Higher Scientific Value
• Reduce component prices by using commodity hardware
• Improve energy efficiency across the entire HPC stack
• Ensure scientific value of the machine through Co-Design
Commodity Already Drives HPC
[Figure: Number of Top500 systems (0 to 500) by processor type (Vector, RISC, x86), June 1993 to June 2014.]
Mobile & Embedded Hardware: Energy Efficient & Commodity
• Powerful & energy efficient devices do exist!
• ... and they are the next step in commodity hardware!
• Sold in 2013 (2012):
  • ~ 10M Servers (<10M)
  • > 320M PCs (>350M)
  • > 195M Tablets (>100M)
  • > 960M Smartphones (>700M)
ARM Processor Improvements in DP FLOPS
[Figure: DP ops/cycle (log scale, 1 to 16) by year, 1999 to 2015, for IBM BG/P, IBM BG/Q, Intel SSE2, Intel AVX, ARM Cortex-A9, ARM Cortex-A15 and ARMv8.]
• 256 bit SIMD is today’s de-facto standard for vectorization
• 8 DP ops / cycle
• ARM quickly moved from optional floating-point to state-of-the-art
• ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD)
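The "8 DP ops / cycle" figure for 256-bit SIMD can be reproduced with simple arithmetic: a 256-bit register holds four 64-bit values, and the count reaches eight if each lane completes two operations per cycle (for example a multiply and an add). The sketch below assumes exactly that; the two-operations-per-lane factor and the function name are assumptions, not something the slide states.

```python
DP_BITS = 64  # width of one double-precision value


def dp_ops_per_cycle(simd_bits: int, ops_per_lane: int = 2) -> int:
    """DP operations per cycle for one SIMD unit.

    ops_per_lane=2 is an assumption: one multiply plus one add
    (or one fused multiply-add) per lane per cycle.
    """
    lanes = simd_bits // DP_BITS
    return lanes * ops_per_lane


wide_simd = dp_ops_per_cycle(256)   # 4 lanes x 2 ops
neon_simd = dp_ops_per_cycle(128)   # 2 lanes x 2 ops
print(wide_simd, neon_simd)
```

Under the same assumption, a 128-bit unit such as ARMv8 NEON would come out at half the 256-bit figure.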
The Mont-Blanc Project
Mont-Blanc Project Facts
• To develop a European Exascale approach
• Leverage commodity and embedded power-efficient technology
• Partners:
• Supported by the EU’s FP7 with 16M € under two projects:
  • Mont-Blanc: October 2011 - September 2014 (14.5M €)
  • Mont-Blanc 2: October 2013 - September 2016 (8.0M €)
Selecting a Suitable System-on-Chip - Wishlist
Samsung Exynos 5 Dual (5250)
• Dual-core ARM Cortex-A15 @ 1.7 GHz
  • VFP for 64-bit Floating Point
  • 6.8 GFLOPS (1 FMA / cycle)
  • NEON for 32-bit floating point SIMD
• Quad-core ARM Mali T604 GPU
  • Compute capable
    • OpenCL 1.1
    • 68 GFLOPS (SP)
    • 25.5 GFLOPS (DP)
• Shared memory between CPU and GPU
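The 6.8 GFLOPS CPU figure is consistent with cores x clock x floating-point operations per cycle, counting one fused multiply-add (FMA) as two operations. A small sketch of that arithmetic, with an illustrative function name:

```python
FLOPS_PER_FMA = 2  # one fused multiply-add counts as a multiply plus an add


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS for identical cores at a fixed clock."""
    return cores * clock_ghz * flops_per_cycle


# Exynos 5 Dual CPU as listed on the slide: 2 cores, 1.7 GHz, 1 FMA per cycle.
cpu_dp_gflops = peak_gflops(cores=2, clock_ghz=1.7, flops_per_cycle=1 * FLOPS_PER_FMA)
print(round(cpu_dp_gflops, 1))
```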
Exynos5 Multicore Performance
• Tegra3 platform as fast as Exynos5 platform, a bit more energy efficient
  • 4-core Cortex-A9 vs. 2-core Cortex-A15
• Core i7 is 6x faster than Exynos5 at maximum frequency
• Tegra3 and Exynos5 as efficient as Core i7 at the same frequency
Mont-Blanc Server-on-Module (SoM)
• CPU+GPU+DRAM+storage+network all in a compute card that’s just 8.5 x 5.6 cm!
Mont-Blanc Prototype
• Exynos 5 compute card
  • 2x Cortex-A15 @ 1.7 GHz
  • 1x Mali T604 GPU
  • 6.8 + 25.5 GFLOPS (peak) @ 15 W
• Carrier Blade
  • 15x Compute Cards
  • 485 GFLOPS @ 300 W
  • 1 GbE to 10 GbE
• Blade Chassis 7U
  • 9x Carrier Blades (=135 Compute Cards)
  • 4.3 TFLOPS @ 2.7 kW
• Rack
  • 6x Blade Chassis
  • 26 TFLOPS @ 18 kW
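The per-level peak figures follow from multiplying the compute card peak up through the packaging hierarchy. The sketch below redoes that arithmetic and derives GFLOPS per watt from the power figures quoted on the slide; the variable and function names are illustrative.

```python
# Per-card peak from the slide: CPU (6.8) + GPU (25.5) GFLOPS, double precision.
CARD_GFLOPS = 6.8 + 25.5
CARDS_PER_BLADE = 15
BLADES_PER_CHASSIS = 9
CHASSIS_PER_RACK = 6

blade_gflops = CARD_GFLOPS * CARDS_PER_BLADE
chassis_tflops = blade_gflops * BLADES_PER_CHASSIS / 1000
rack_tflops = chassis_tflops * CHASSIS_PER_RACK

# Power figures as quoted on the slide, in watts.
BLADE_WATTS, CHASSIS_WATTS, RACK_WATTS = 300, 2700, 18000


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak energy efficiency of one packaging level."""
    return gflops / watts


print(f"blade:   {blade_gflops:.1f} GFLOPS, "
      f"{gflops_per_watt(blade_gflops, BLADE_WATTS):.2f} GFLOPS/W")
print(f"chassis: {chassis_tflops:.2f} TFLOPS, "
      f"{gflops_per_watt(chassis_tflops * 1000, CHASSIS_WATTS):.2f} GFLOPS/W")
print(f"rack:    {rack_tflops:.2f} TFLOPS, "
      f"{gflops_per_watt(rack_tflops * 1000, RACK_WATTS):.2f} GFLOPS/W")
```

The products come to about 484.5 GFLOPS per blade, 4.36 TFLOPS per chassis and 26.2 TFLOPS per rack, in line with the rounded values on the slide. Nine blades at 300 W account for the 2.7 kW chassis figure, while six chassis sum to 16.2 kW, so the 18 kW rack figure presumably includes components beyond the chassis.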
Mont-Blanc Applications
Application               Domain
BQCD                      Particle physics
BigDFT *                  Elect. Structure
COSMO * & **              Weather forecast
EUTERPE * & **            Fusion
MP2C                      Multi-particle collisions
PEPC                      Coulomb + Grav. Forces
ProFASI                   Protein folding
Quantum ESPRESSO * & **   Elect. Structure
SMMP *                    Protein folding
SPECFEM3D * & **          Wave propagation
YALES2                    Combustion
* GPU capable (CUDA or OpenCL)
** OmpSs capable
Further Topics in Mont-Blanc
• System Software Stack
  • Linux
  • gcc, g++, gfortran, ...
  • ATLAS, FFTW, HDF5
  • SLURM
  • MPICH2, OpenMPI
• Development Tools
  • Allinea DDT debugger
  • Extrae, Scalasca
• Programming Models
  • OmpSs
• Industrial End-User Group
  • Provides SMEs remote access to Mont-Blanc prototype platforms
• Support & Training
The 4 Pillar Model of Energy Efficient HPC
Energy efficient hardware
• Use newest semiconductor technology
• Use of energy saving processor and memory technologies
• Consider using special hardware or accelerators designed for specific scientific problems or numerical algorithms
Energy efficient infrastructure
• Reduce power losses in the power supply chain
• Improve cooling technology
• Reuse waste heat from IT systems
Energy aware system software
• Monitor the energy consumption of the compute system and the building infrastructure
• Use energy aware system software to exploit the energy saving features of the platform
Energy efficient applications
• Use the most efficient algorithms
• Use best libraries
• Use most efficient programming paradigm
Monitoring Power Consumption in Mont-Blanc
[Diagram: 15 Exynos compute cards, 15 power measurement ICs (PM), one FPGA and one BMC.]
Monitoring Power Consumption in Mont-Blanc
• Field Programmable Gate Array (FPGA)
  • Collects power consumption data from all 15 power measurement ICs
  • Sample interval: 70 ms
• Board Management Controller (BMC)
  • Collects 1 s averaged data from FPGA
  • Stores measurement samples in FIFO
• Mont-Blanc Pusher
  • Collects measurement data from multiple BMCs using custom IPMI commands
  • Forwards data using MQTT Protocol through Collect Agent into key-value store
[Diagram: BMCs -> IPMI -> Mont-Blanc Pusher -> MQTT Protocol -> Collect Agent -> Distributed Key-Value Store]
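The averaging and buffering steps can be sketched in a few lines. This is only an illustration of the idea of reducing 70 ms samples to roughly 1 s averages held in a bounded FIFO; the class name, the FIFO depth and the windowing rule are assumptions, and the slides do not describe how the FPGA and BMC actually implement it.

```python
from collections import deque

SAMPLE_INTERVAL_MS = 70   # sample interval from the slide
WINDOW_MS = 1000          # averaging window from the slide
FIFO_DEPTH = 64           # assumption: the real FIFO depth is not given


class PowerAverager:
    """Hypothetical stand-in for the 1 s averaging plus FIFO stage."""

    def __init__(self, depth: int = FIFO_DEPTH) -> None:
        # A bounded deque drops the oldest average when full.
        self.fifo: deque = deque(maxlen=depth)
        self._sum = 0.0
        self._count = 0
        self._elapsed_ms = 0

    def add_sample(self, watts: float) -> None:
        """Accumulate one 70 ms sample; emit an average once 1 s has elapsed."""
        self._sum += watts
        self._count += 1
        self._elapsed_ms += SAMPLE_INTERVAL_MS
        if self._elapsed_ms >= WINDOW_MS:
            self.fifo.append(self._sum / self._count)
            self._sum = 0.0
            self._count = 0
            self._elapsed_ms -= WINDOW_MS  # carry the overshoot into the next window

    def drain(self) -> list:
        """Return and clear all buffered averages, oldest first."""
        averages = list(self.fifo)
        self.fifo.clear()
        return averages


averager = PowerAverager()
for _ in range(15):            # 15 samples x 70 ms = 1050 ms, one full window
    averager.add_sample(10.0)
first_batch = averager.drain()
print(first_batch)
```

A pusher would then call something like `drain()` periodically and forward each value over MQTT; that step is left out here because it depends on the client library in use.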
Monitoring Power Consumption in the entire Data Center
[Diagram: Mont-Blanc Pusher, IPMI Pusher, SysFS Pusher, BACNet Pusher, ... -> MQTT Protocol -> Collect Agents -> Distributed Key-Value Store]
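The data flow in this diagram, many pushers feeding collect agents that write into a key-value store, can be illustrated with an in-memory stand-in. Everything in the sketch is hypothetical: the class and function names, the topic layout and the example readings are assumptions, and no real MQTT client or distributed store is involved.

```python
from collections import defaultdict


class CollectAgent:
    """In-memory stand-in for a Collect Agent plus its key-value store."""

    def __init__(self) -> None:
        # topic -> list of (timestamp, value) readings in arrival order
        self.store = defaultdict(list)

    def on_message(self, topic: str, timestamp: int, value: float) -> None:
        """Store one reading under its topic, as an MQTT callback might."""
        self.store[topic].append((timestamp, value))

    def latest(self, topic: str):
        """Return the most recent reading for a topic, or None if unknown."""
        readings = self.store.get(topic)
        return readings[-1] if readings else None


def make_pusher(agent: CollectAgent, prefix: str):
    """Build a pusher that publishes sensor readings under a topic prefix."""
    def push(sensor: str, timestamp: int, value: float) -> None:
        agent.on_message(f"{prefix}/{sensor}", timestamp, value)
    return push


agent = CollectAgent()
montblanc_push = make_pusher(agent, "montblanc")   # hypothetical topic prefixes
sysfs_push = make_pusher(agent, "sysfs")

montblanc_push("blade01/card03/power", 0, 12.5)
montblanc_push("blade01/card03/power", 1, 13.0)
sysfs_push("node42/cpu0/energy", 0, 880.0)

print(agent.latest("montblanc/blade01/card03/power"))
```

Keying readings by topic means every data source can share one store without knowing about the others, which is the property the diagram relies on.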
To take home...
www.montblanc-project.eu
MontBlancEU
@MontBlanc_EU
