The Mont-Blanc approach towards Exascale - Mont

Transcription

The Mont-Blanc approach towards Exascale - Mont
About BSC Computer Science Department
•
Origins in UPC Computer Architecture Department
•
•
1985-
Inherited know-how and key personnel
Our mission: To influence the way high-performance
computing systems are built, programmed and used
1991-2000
2000-04
People and teams
Programming
Models
Programming
Models
(26)
Performance
Tools
Performance
Tools
(11)
Accelerators
for HPC
(5)
Grid and
Cluster
Grid and
Cluster
(21)
Autonomic eBusiness
Autonomic eBusiness
(14)
Storage
Systems
Storage
Systems
(11)
Extreme
Computing
Computer
Architecture
Application
Optimization
Directors
Team leaders
Researchers
Associate (UPC)
Heterogenous
Architectures
Heterogenous
Architectures
(17)
OS/CA
Interface
OS/CA
Interface
(21)
OS/PM
Interface
OS/PM
Interface
(26)
Network
Architectures
Network
Architectures
100
75
50
25
Students
0
Support
2005 (28)
2006 (69)
2007 (82)
(3)
2008 (130)
2009 (138)
2010 (162)
2011 (159)
(2)
Department funding sources
6.000.000
5.000.000
4.000.000
Contracts
EU funding
3.000.000
National
Personnel
2.000.000
1.000.000
0
2006
2007
2008
2009
2010
2011
EU projects
HOPSA-EU
OPTIMIS
Red Iberoamericana
de Supercomputación
6.000.000
5.000.000
4.000.000
3.000.000
2.000.000
1.000.000
0
Total: 10.76 M€
Average: 1.8 M€
Current (2012): 2.9 M€
2006
2007
2008
2009
2010
2011
http://www.montblanc-project.eu
The next generation
supercomputer
Alex Ramirez
Barcelona Supercomputing Center
Disclaimer: Not only I speak for myself ... All references to unavailable products are speculative, taken from web sources.
There is no commitment from ARM, Samsung, TI, Nvidia, Bull, or others, implied.
This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777.
Project goals
•
•
To develop an European Exascale approach
Based on embedded power-efficient technology
•
Objetives
•
•
•
Develop a first prototype system, limited by available technology
Design a Next Generation system, to overcome the limitations
Develop a set of Exascale applications targeting the new system
Outline
•
A bit of history
•
•
•
•
Supercomputers from mobile components
•
•
•
•
•
Vector supercomputers
Commodity supercomputers
The next step in the commodity chain
Killer mobile examples
Mont-Blanc architecture strawman
Rely on OmpSs to handle the challenges
BSC prototype roadmap
Mont-Blanc project goals and milestones
In the beginning ... there were only supercomputers
•
•
•
•
•
•
•
•
Built to order
•
Very few of them
Special purpose hardware
•
Very expensive
Control Data, Convex, ...
Cray-1
•
1975, 160 MFLOPS
•
80 units, 5-8 M$
Cray X-MP
•
1982, 800 MFLOPS
Cray-2
•
1985, 1.9 GFLOPS
Cray Y-MP
•
1988, 2.6 GFLOPS
Fortran+vectorizing compilers
10
The Killer Microprocessors
10.000
MFLOPS
Cray-1, Cray-C90
NEC SX4, SX5
1000
Alpha AV4, EV5
Intel Pentium
100
IBM P2SC
HP PA8200
10
1974
•
1984
1989
1994
1999
Microprocessors killed the Vector supercomputers
•
•
•
1979
They were not faster ...
... but they were significantly cheaper and greener
Need 10 microprocessors to achieve the performance of 1 Vector
CPU
•
SIMD vs. MIMD programming paradigms
11
Then, commodity took over special purpose
•
ASCI Red, Sandia
•
•
•
•
1997, 1 TFLOPS (Linpack),
9298 cores @ 200 Mhz
1.2 Tbytes
Intel Pentium Pro
•
Upgraded to Pentium II Xeon,
1999, 3.1 Tflops
•
MareNostrum, BSC
•
•
2004, 20 TFLOPS
IBM PowerPC 970 FX
•
•
•
Blade enclosure
Myrinet + 1 GbE network
SuSe Linux
Message-Passing Programming Models
12
The next step in the commodity chain
HPC
Servers
Desktop
•
•
Mobile
•
Total cores in Jun'12 Top500
•
13.5 Mcores
Tablets sold in Q4 2011
•
27 Mtablets
Smartphones sold Q4 2011
•
> 100 Mphones
13
ARM Processor improvements in DP FLOPS
IBM Intel
BG/Q AVX
16
ARMv8
DP ops/cycle
8
Intel
SSE2
4
IBM
BG/P
2
ARM
CortexTM-A9
1
1999
•
2001
2003
2005
2007
2009
2011
2013
2015
IBM BG/Q and Intel AVX implement DP in 256-bit SIMD
•
•
ARM
CortexTM-A15
8 DP ops / cycle
ARM quickly moved from optional floating-point to state-of-the-art
•
ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD)
14
Integrated ARM GPU performance
Performance
Skrymir
Mali-T658
High-end solution + compute capability
Scalable to 8 cores, ARMv8 compatible
272 GFLOPS*
Mali-T604
First Midgard architecture product
Scalable to 4 cores
68 GFLOPS*
2012
•
2013
2014
GPU compute performance increases faster than
Moore’s Law
* Data from web sources, not an ARM commitment
Are the “Killer Mobiles™" coming?
Cost (log10)
Server ($1500)
Desktop ($150)
HPC-Mobile ($40) ?
Mobile ($20)
Nowadays
Near future
Performance (log2)
•
Where is the sweet spot? Maybe in the low-end ...
•
•
•
Today ~ 1:8 ratio in performance, 1:100 ratio in cost
Tomorrow ~ 1:2 ratio in performance, still 1:100 in cost ?
The same reason why microprocessors killed supercomputers
•
Not so much performance ... but much lower cost, and power
16
Samsung Exynos 5 Dual Superphone SoC
• 32nm HKMG
• Dual-core ARM Cortex-A15 @ 1.7 GHz
• Quad-core ARM Mali T604
• OpenCL 1.1
• Dual-channel DDR3
• USB 3.0 to 1 GbE bridge
•
All in a low-power mobile socket
High density packaging architecture
•
•
Standard BullX blade
enclosure
Multiple compute nodes per
blade
•
Additional level of
interconnect, on-blade
network
There is no free lunch
2X more nodes for
the same
performance
½ on-chip memory /
core
8X more address
spaces
1 GbE inter-chip
communication
OmpSs runtime layer manages architecture complexity
P0
P1
P2
•
•
Programmer exposed a simple
architecture
Task graph provides
lookahead
•
•
Automatically handle all of the
architecture challenges
•
•
•
•
•
No overlap
Overlap
Exploit knowledge about the
future
Strong scalability
Multiple address spaces
Low cache size
Low interconnect bandwidth
Enjoy the positive aspects
•
•
Energy efficiency
Low cost
BSC ARM-based prototype roadmap
GFLOPS / W
Pedraforca:
ARM + GPU
Tibidabo:
ARM multicore
2011
•
Integrated
ARM + GPU
2012
2013
2014
Prototypes are critical to accelerate software development
•
System software stack + applications
Very high expectations ...
•
•
High media impact of ARM-based HPC
Scientific, HPC, general press quote MontBlanc objectives
•
Highlighted by Eric Schmidt, Google Executive
Chairman, at the EC's Innovation Convention
The hype curve
Peak of Inflated Expectations
Y1
Visibility
Plateau of Productivity
Y3
Slope of Enlightenment
Trough of Disillusionment
Technology Trigger
Time
•
We'll see how deep it gets on the way down ...
Conclusions
•
Mont-Blanc architecture is shaping up
•
•
•
ARM multicore + integrated OpenCL accelerator
Ethernet NIC
High density packaging
•
•
OmpSs programming model port to OpenCL
Applications being ported to tasking model
•
Stay tuned!
www.montblanc-project.eu
MontBlancEU
@MontBlanc_EU