High-Productivity, Standards-Based Computing
for Weather Forecasting and Climate Modelling
Dave Parry, Senior Vice President of Products, SGI
Computers are beginning to catch up to ideas
64,000 “computers” working together
Lewis Fry Richardson, Weather Prediction by Numerical Process, 1922
Top 500 System Sizes with 8,192 or More Cores
[Chart: per-system core counts (0 to 140,000) for Top 500 systems with 8,192 or more cores, 2003–2007, distinguishing systems at exactly 8,192 cores from those above 8,192 cores. The number of qualifying systems grew from 3 (2003) to 4 (2004), 10 (2005), 19 (2006), and 29 (2007).]
Recent trends in system deployment
[Charts: Top 500 Systems by Type (counts, 0–500) and Top 500 Weather/Environment Systems by Type (counts, 0–35), by year, 2003–2007; Weather/Environment System Size By Type, 2003–2007, with the Earth Simulator (Vector) called out. Legend: Clusters, MPP/Specialty, SMP/Constellation, Vector.]
Altix systems for Weather Forecasting
• Finnish Meteorological Institute – 304p
• KNMI (Netherlands) – 224p
• Hungarian Meteorological Service – 144p
• KMI (Belgium) – 56p
• Puertos del Estado (Spain) – 20p
• Catalan Meteorological Service (CESCA/MeteoCat) – 128p
• Roshydromet – 112p
• Romanian Met – 2p
• Meteo Croatia – 16p
• Desert Research Institute (DRI) – 72p
• NOAA NSSL – 64p
• BAMS – 24p
• INMET Brazil – 64p
• China Met. Administration – 22p
• China Met. Administration, Institute of Arid Meteorology – 20p
• Shanghai Meteorology Center – 64p
• Yunnan Meteorology Bureau – 80p
• Sichuan Weather Bureau – 192p
• Taiwan Central Weather Bureau – 28p
• Meteorological Service of New Zealand – 20p
Altix systems for meteorology and climate research
• NOAA GFDL – 2560p Altix 3700 + 2560c Altix 4700 – MOM4, AM, CM2.1
• University of Oceanography of China, Tsing Dao – 224p
• Polar Research Inst. of China – 64p
• Nanjing UIST – 128p + 8p
• U Tasmania/Antarctic CRC – 128p
• CMMACS – 80p Altix 3700 BX2 & 350 – MOM4
• U of Waterloo – 64p Altix 3700 & 16p A350
• Universidad Complutense – 64p
• Beijing Normal University, Climate Modeling Branch, State Lab of Remote Sensing Science – 56p
• First Institute of Oceanography (China) – 56p
• Georgia Tech – 48p
• Institute of Desert Meteorology, China – 32p
• Univ. of Florida – 32p
• Harvard University – 28p
• Univ. of Wisconsin CMISS – 24p
• MIT Dept. of Earth, Atmosphere and Planetary Science – 20p
• Dalhousie University – 16p
• NIO, Goa, India – 16p
• Univ. of Utah – 16p
• Woods Hole Oceanographic Institute – 16p
• Univ. of Colorado Boulder – 12p
• APAT – 8p
• Utrecht Univ – 8p
• Univ. of South Florida – 4p
• Florida Institute of Technology – 2p
SGI systems for storage and data management in weather forecasting and climate modeling
• INM (Spain) uses DMF and CXFS with a Cray vector system
• NRW (Queensland) uses DMF with a Cray vector system
• Environment Canada uses SGI servers for pre- and post-processing with an IBM SP system
• MeteoFrance uses DMF with a NEC vector system
Selected Large Altix Installations
• NASA Columbia – 10,240p Altix 3700 constellation, 20 nodes + 512-core Altix 4700
  – 2048p single NUMAlink fabric with four 512p partitions + 16x512 (Madison 9M)
• LRZ – 9728p Altix 4700
  – 19x512-core nodes, single NUMAlink fabric
• WPAFB (US DoD) – 9216p (18x512-core nodes), single NUMAlink fabric, Montecito
• TU Dresden – 2048-core Altix 4700, Montecito
• NOAA GFDL – 2560p Altix 3700 and Altix 3700 BX2 systems (Madison) + 2560-core Altix 4700 (Montecito), plus SGI's largest DMF installation (~10 PB)
• APAC – 1936p Altix 3700/BX2, multiple partitions
All use CXFS for the shared filesystem.
Cluster Systems – Usage, Goals, Issues
[Charts (Source: IDC 2007): Cluster Utilization by Industry, 0–100% – Digital Content Creation, Geoscience, Weather, Software Engineering, Average. Top 3 System Goals, 0–60% – Aggregate Performance, Aggregate I/O, Processor Performance, High Availability. Top 3 Current Issues, 0–60% – Facilities Power & Cooling, System Management, Application Complexity.]
Escalating Computer Center Concerns
• Cost: 1 MW costs $1M/year
• Government
  – EPA Energy Star bill
  – EC Renewable Energies Unit "Code of Conduct"?
• In 2005, the U.S. spent
  – $20.5B on computer equipment
  – $9.3B on electricity to run computers
[Figure fragment: power comparison, 339W vs. 216W]
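A quick back-of-envelope check of the "1 MW costs $1M/year" figure. A minimal sketch; the electricity rate is an assumed blended ~$0.114/kWh, which the slide does not state:

```python
# Back-of-envelope check of the "1 MW costs ~$1M/year" figure.
# The rate is an assumption (~$0.114/kWh); the slide gives none.

HOURS_PER_YEAR = 24 * 365          # 8,760 hours
RATE_USD_PER_KWH = 0.114           # assumed blended rate

def annual_power_cost(load_mw: float) -> float:
    """Annual electricity cost in USD for a constant load in megawatts."""
    kwh_per_year = load_mw * 1000 * HOURS_PER_YEAR
    return kwh_per_year * RATE_USD_PER_KWH

print(f"1 MW for a year: ${annual_power_cost(1.0):,.0f}")  # ~ $999,000
```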
Weather & Environment Application Scaling
[Charts (Source: SGI Internal Testing): application scaling, simulation speed in forecast hours per wall-clock hour vs. number of cores (0–2048). Left: WRF V2.1.2+, 12 km CONUS, WSM5 microphysics, and Global Ocean Model POP 0.1°. Right: Regional Forecast Model WRF CONUS 5 km (970x720x37) and Global Ocean Model POP 1°. Systems: Altix 4700 MTC 1.6GHz/18M, Altix XE310 Xeon 5355 2.67GHz, Altix ICE Xeon 5355 2.67GHz.]
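The metric on these charts, forecast-hours per wall-clock hour, converts directly into parallel efficiency. A minimal sketch of that conversion; the sample points are hypothetical, not values read off the slide:

```python
# Parallel efficiency from simulation-speed measurements. Efficiency at
# core count n relative to a baseline run is (speed_n / speed_base)
# divided by (n / base_cores).

def efficiency(base_cores, base_speed, cores, speed):
    """Scaling efficiency relative to the smallest measured run."""
    speedup = speed / base_speed
    ideal = cores / base_cores
    return speedup / ideal

# (cores, forecast-hours/hour) -- a hypothetical WRF-like scaling curve
samples = [(256, 200.0), (512, 380.0), (1024, 700.0), (2048, 1200.0)]
base_cores, base_speed = samples[0]
for cores, speed in samples:
    e = efficiency(base_cores, base_speed, cores, speed)
    print(f"{cores:5d} cores: {speed:7.1f} fc-h/h, efficiency {e:5.1%}")
```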
What conclusions can we draw?
• Weather & climate applications scale better than most
  – 1,000+ cores feasible in global weather, regional weather, and ocean modeling
• Availability is much more important than in other markets
  – Forecasts must be delivered on time
• SMP/Constellation systems are dominant
  – Production installations in both weather forecasting and climate research
• Clusters are acceptable for research and small-scale production, but not for large, complex operational weather forecasting
  – Maturity is still not there for availability, system management, and I/O
  – Progress is being made in each area – see the SGI Altix ICE discussion
Critical Issues – What is SGI doing?
• Reliability – hardware, Linux, and tool improvements
• Scalability – hardware, Linux, and I/O improvements
• Achievable performance – low-latency interconnects & MPI, Linux improvements, IB storage
• Facilities issues – more efficient power supplies, water cooling
• System management – system management tools for 1,000s of nodes, parallel booting, lights-out error reporting
SGI Linux
Driving Performance Computing into the Community
• SGI is behind only IBM and Red Hat in total contributions to the Linux community. Of 16,678 "Copyright" instances in linux-2.6.18.tar.gz (Sept 20, 2006):
  – "IBM", "International Business", or "ibm.": 676 + 86 + 180 = 942
  – "Red Hat", "RedHat", "redhat.": 438 + 2 + 244 = 684
  – "SGI", "Silicon Graphics", "sgi.": 5 + 454 + 151 = 610
  – "SuSE", "SUSE", "suse.": 67 + 2 + 422 = 491
  – "HP", "Hewlett Packard" & "Hewlett-Packard", "hp.": 7 + 20 + 244 + 296 = 467
• SGI continues to drive the interests of our customers with Linux developers.
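A rough sketch of how such a tally could be reproduced. The matching is as crude as the slide's substring patterns (e.g. "HP" also hits words containing it), and the script assumes a local copy of linux-2.6.18.tar.gz:

```python
# Count copyright attributions in a kernel source tarball by naive
# substring match, roughly mirroring the slide's method.

import tarfile

PATTERNS = {
    "IBM":     ["IBM", "International Business", "ibm."],
    "Red Hat": ["Red Hat", "RedHat", "redhat."],
    "SGI":     ["SGI", "Silicon Graphics", "sgi."],
    "SuSE":    ["SuSE", "SUSE", "suse."],
    "HP":      ["HP", "Hewlett Packard", "Hewlett-Packard", "hp."],
}

counts = {name: 0 for name in PATTERNS}
with tarfile.open("linux-2.6.18.tar.gz", "r:gz") as tar:
    for member in tar:
        if not member.isfile():
            continue
        f = tar.extractfile(member)
        for raw in f:
            line = raw.decode("latin-1")
            if "Copyright" not in line:
                continue  # only count copyright lines, as the slide does
            for name, pats in PATTERNS.items():
                if any(p in line for p in pats):
                    counts[name] += 1

for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {n}")
```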
Today's Linux Environment
[Diagram: software stack – job scheduler; profilers, debuggers; libraries; compilers; file system; storage management; system management; Linux operating system; BIOS]
SGI Altix
• Proven in production weather & climate environments
• Single system based on commodity CPUs, memory, disks
– Single-system management with up to 1,024 Intel Itanium 2 cores
– Global shared memory up to 128TB
– Very low latency MPI (see the latency sketch below)
– Unified I/O at 10GB/sec
• Capable of running high-resolution simulations or multiple members of forecast ensembles
[Photo by gsiCom (Kai Hamman): SGI Altix 4700 at LRZ]
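The "very low latency MPI" claim is conventionally measured with a ping-pong microbenchmark. A minimal sketch using mpi4py (assumed available), reporting half the small-message round-trip time:

```python
# Minimal MPI ping-pong latency microbenchmark.
# Run with: mpirun -np 2 python pingpong.py

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(8, dtype=np.uint8)  # 8-byte message: latency-bound
REPS = 10_000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(REPS):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = MPI.Wtime() - t0

if rank == 0:
    # One-way latency is half the average round-trip time.
    print(f"one-way latency: {elapsed / REPS / 2 * 1e6:.2f} us")
```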
SGI Altix ICE
• Merging cluster economics with HPC integration
• Next generation blade solution
– Density optimized: Up to 512 Intel Xeon 5300 cores (6+ TFLOPS) per rack
– Power optimized: 76% rack-level power efficiency (1.4x typical clusters), water cooling
– Reliability optimized: hot-swap N+1 power, hot-swap N+1 fans, diskless nodes, integrated dual-plane IB 4xDDR switches, parallel boot & system management, cable-free blade enclosure for up to 128 cores
– Performance optimized: IB RAID storage, OS jitter control, dual IB 4xDDR
[Photo: multi-rack pizza-box cluster vs. multi-rack SGI Altix ICE]
SGI Altix ICE
• In standard cluster environments the OS is not synchronized across nodes; this "jitter" can reduce overall performance.
• SGI Altix ICE implements application-transparent synchronization of the OS to improve overall performance.
[Diagram: timelines for processes on Nodes 1–3. Unsynchronized OS noise => wasted cycles: system overhead lands at different times on each node, so every node wastes compute cycles waiting for the barrier to complete. Synchronized OS noise => faster results: overhead is aligned across nodes and far fewer cycles are wasted.]
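A toy model of the diagram's argument: in a bulk-synchronous step every rank waits for the slowest one, so random per-node noise is paid at the maximum across nodes, while synchronized noise is paid only once. The node count and noise parameters below are illustrative, not SGI measurements:

```python
# Toy simulation of barrier cost under unsynchronized vs. synchronized
# OS noise in a bulk-synchronous parallel step.

import random

NODES, STEPS = 512, 1000
COMPUTE = 10.0                  # ms of useful work per step
NOISE_P, NOISE_T = 0.05, 1.0    # 5% chance of a 1 ms interruption

def step_time(synchronized: bool) -> float:
    if synchronized:
        # All nodes take the OS interruption in the same window.
        noise = NOISE_T if random.random() < NOISE_P else 0.0
        return COMPUTE + noise
    # Unsynchronized: the barrier waits for the noisiest node.
    return COMPUTE + max(
        NOISE_T if random.random() < NOISE_P else 0.0
        for _ in range(NODES)
    )

for mode in (False, True):
    total = sum(step_time(mode) for _ in range(STEPS))
    label = "synchronized" if mode else "unsynchronized"
    print(f"{label:14s}: {total:8.1f} ms for {STEPS} steps")
```

With 512 nodes and a 5% noise probability, some node is almost always interrupted, so the unsynchronized run pays the full interruption on nearly every step.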
SGI Altix ICE
• SGI Tempo System Management Environment (see the fan-out sketch below)
  – Hierarchical system management
  – Server image management and provisioning
  – Rapid parallel diskless booting
  – System monitoring
  – Designed to scale to 1,000s of nodes
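Why hierarchical management and parallel booting scale: with a fan-out tree, reaching N nodes takes O(log N) distribution stages rather than N serial transfers. A sketch of the counting argument only, not SGI Tempo's actual implementation:

```python
# Distribution stages needed to reach N leaf nodes through a tree in
# which every reached node can serve `fanout` children per stage.

import math

def stages(nodes: int, fanout: int) -> int:
    """Stages to cover `nodes` leaves with per-level `fanout`."""
    return math.ceil(math.log(nodes, fanout)) if nodes > 1 else 0

for n in (128, 1024, 4096):
    print(f"{n:5d} nodes, fan-out 8: {stages(n, 8)} stages "
          f"(vs {n} serial transfers)")
```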
SGI® Server Technology Roadmap
All product lines run Red Hat® or SUSE® Linux with SGI ProPack™ and SGI data and system management software.
• Standard rack-mount servers & clusters (Altix XE, Intel® Xeon®)
  – Today: Altix XE 210/240 and Altix XE 1200 cluster; Altix XE 310 and Altix XE 1300 cluster; MS Windows CCS
  – 2008: Project Dixon / Dixon cluster; Project Gallup 2 / Gallup 2 cluster; MS Windows CCS
  – Future: future Altix XE; Project Gallup 3 / Gallup 3 cluster
• Integrated HPC blade MPP/cluster solution (Altix ICE, Intel Xeon)
  – Today: Altix ICE 8200 – IB DDR4X fabric, ultra-dense cool packaging, SGI® Tempo
  – 2008: Project Carlsbad+ – IB DDR4X fabric, FPGA co-processing, cluster management advances
  – Future: Project Carlsbad2 – future IB
• Enterprise-class SMP blade solution (Altix, Intel Itanium®)
  – Today: Altix 4700/450 – SHUB2/NUMAlink™ 4, IA64 Montecito, RASC™ FPGA RC100
  – 2008: Altix 4700/450 – SHUB2/NUMAlink 4, IA64 Montvale, RASC FPGA
  – Future: Project UV – NUMAlink 5, UV HUB, Xeon and Itanium
Industrial Strength Linux Environment
[Diagram: the same software stack – job scheduler; profilers, debuggers; libraries; compilers; file system; storage management; system management; Linux operating system; BIOS]
FY09 – Shared Workload
[Diagram: the software stack above shared across workloads – job scheduler; profilers, debuggers; libraries; compilers; file system; storage management; system management; Linux operating system; BIOS]
Project UV Overview
Bringing Fast MPI and Scalable Shared-Memory to x86-64
• Scale-up capability in an accessible, industry-standard platform
• Versatile performance for data-intensive or very large-scale workloads
• Robust, reliable operation with a complete software solution on industry-standard Linux
• Affordable acquisition and operation costs
UV HUB/Node Controller Features
Unmatched MPI and big-data capabilities
• Enabling enterprise-class scalability and reliability on x86-64
  – Cache coherence across nodes
  – Fault resiliency
  – Extensive fault isolation, datapath protection, monitoring/debug functions
• Accelerating large-scale workloads
  – Fast message passing
  – Extends CPU capability for load requests
  – System scales to 256+ sockets, 2048+ cores
• Accelerating data-intensive applications
  – Extended physical memory addressing
  – Extended TLB page size
  – Off-load instructions
UV Multi-Paradigm Architecture
Socket-Attached Co-processors
[Diagram: 2-socket UV nodes on a NUMAlink interconnect fabric with globally shared memory. Each node pairs Intel sockets (linked by QuickPath/CSI) with socket-attached co-processors and UV hub blocks (GRU, AMU; 90 nm), shown alongside SGI SHub2/TIO parts (180 nm/90 nm). Legend: [S]calar – Intel Xeon, Intel Itanium; [V]ector – GRU memory ops; [P]IM – processor in memory (AMU); [A]pp-specific – graphics (GPU), signals (DSP), programmable (FPGA), accelerators (ClearSpeed, Cell).]
UV Multi-Paradigm Architecture
I/O-Attached Co-processors
[Diagram: as on the previous slide, but with the [A]pp-specific co-processors attached through I/O rather than sockets; same legend, 2-socket UV nodes, and NUMAlink interconnect fabric with globally shared memory.]
Energy Efficiency: Rack Level
[Chart: net (all-in) rack energy efficiency roadmap, rising from roughly 60% to an 80% UV stretch goal across Origin 2000, Origin 3000, Altix 3000, Altix 4000, Carlsbad, and Ultraviolet. N.B.: even higher efficiency if no water coil.]
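What rack-level efficiency means at the wall: for a fixed useful IT load, wall draw is load divided by efficiency. The 20 kW load below is illustrative; the 76% and 80% figures come from the Altix ICE and UV slides above:

```python
# Wall power needed to deliver a fixed IT load at different rack-level
# power efficiencies. The 20 kW load is an illustrative assumption.

IT_LOAD_KW = 20.0

for label, eff in [("~60% (older racks)", 0.60),
                   ("76% (Altix ICE)", 0.76),
                   ("80% (UV stretch goal)", 0.80)]:
    wall = IT_LOAD_KW / eff
    print(f"{label:22s}: {wall:5.1f} kW at the wall "
          f"({wall - IT_LOAD_KW:4.1f} kW overhead)")
```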
Altix → ICE → UV Cooling Solution
Integrated water-cooled option (rear view):
• (2) 18-receptacle power strips
• (4) hinged water-cooled coils
• Rack chilled-water supply: 45°F to 60°F (7.2°C to 15.6°C), 14.4 gpm (3.3 m³/hr) max.
• (2) 60A 200-240VAC 3-phase IEC 60309 plugs
UV Software in Development
Assuring a Complete Solution
• Linux OS community features to support UV
  – Key items already submitted to assure adoption by UV launch
  – Drivers, APIs
  – UV HUB/node controller feature enablement
• System management, integration
  – Console
  – Monitoring, debug
  – Partitioning
  – Integration with storage; data sharing across UV and other systems
• RAS – enable resiliency features of the UV HUB plus advanced memory RAS
• Unified Parallel C source-to-source translator
  – On the Intel or GCC compilers
• Ongoing system management, MPT, and other ProPack advances
2010 – Hybrid System with Isle
[Diagram: hybrid system sharing one software stack – job scheduler; profilers, debuggers; libraries; compilers; file system; storage management; system management; Linux operating system; BIOS – across Xeon, Itanium, UV, GPU, and co-processor elements]
Conclusion
• SMP/Constellation systems own the majority of production weather forecasting & environmental research.
• Cluster systems bring further improvements in economics, but don't meet production requirements for large systems.
• Next-generation HPC cluster systems like SGI Altix ICE will improve on today's density, power, reliability, and performance characteristics.
• Future systems will bring enterprise RAS and scalability to HPC systems for weather forecasting and environmental research – but without the enterprise cost.
• SGI intends to remain at the technology forefront and drive this evolution.
