Simulation Science in Grid Environments: Integrated Adaptive Software Systems
Lennart Johnsson
Advanced Computing Research Laboratory
Department of Computer Science
University of Houston
and
Department of Numerical Analysis and Computer Science
Royal Institute of Technology, Stockholm
Outline
• Technology drivers
• Sample applications
• Domain specific software environments
• High-performance software
Cost of Computing (SIA Roadmap)
[Charts: functions per chip (Mtransistors) and cost per Mtransistor ($/Mtransistor) for DRAM, cost-performance MPU, high-performance MPU, and ASIC, at introduction and at production, 1999-2014 (180 nm down to 30 nm process generations).]
Today's most powerful computers (the power of 10,000 PCs at a cost of $100M) will cost a few hundred thousand dollars.
In 2010, the compute power of today's top-of-the-line PC can be found in $1 consumer electronics.
Average Price of Storage (Ed Grochowski, IBM Almaden)
[Chart: price per MByte in dollars, 1980-2010, for DRAM, Flash, 1", 2.5", and 3.5" HDDs, and paper/film, with DataQuest 2000 projections for Flash and 1" HDDs.]
In 2010, $1 will buy enough disk space to store 10,000 books, 35 hours of CD-quality audio, or 2 minutes of DVD-quality video.
Access Technologies: Growth of Cell vs. Internet
[Chart: cell subscriptions vs. Internet hosts, in millions, 1992-2001.]
Computing Platforms: 2001 ⇒ 2030
• Personal Computers O[$1000]
  – 10^9 Flops/sec in 2001 ⇒ 10^15 – 10^17 Flops/sec by 2030
• Supercomputers O[$100,000,000]
  – 10^13 Flops/sec in 2001 ⇒ 10^18 – 10^20 Flops/sec by 2030
• Number of Computers [global population ~10^10]
  – SCs ⇒ 10^-8 – 10^-6 per person ⇒ 10^2 – 10^4 systems
  – PCs ⇒ 0.1x – 10x per person ⇒ 10^9 – 10^11 systems
  – Embedded ⇒ 10x – 10^5x per person ⇒ 10^11 – 10^15 systems
  – Nanocomputers ⇒ 0x – 10^10 per person ⇒ 0 – 10^20 systems
• Available Flops Planetwide by 2030
  – 10^24 – 10^30 Flops/sec [assuming classical models of computation]
Courtesy Rick Stevens
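A rough back-of-the-envelope check of the planetwide 2030 figure, using the upper ends of the ranges above; the per-system rate assumed for embedded devices (~10^9 Flops/sec) is an illustrative assumption, not a number from the slide.

/* Upper-bound check of the 2030 planetwide Flops/sec estimate. */
#include <stdio.h>

int main(void)
{
    double sc  = 1e4  * 1e20;   /* supercomputers: 10^4 systems x 10^20 Flops/sec      */
    double pc  = 1e11 * 1e17;   /* personal computers: 10^11 systems x 10^17 Flops/sec */
    double emb = 1e15 * 1e9;    /* embedded: 10^15 systems x ~10^9 Flops/sec (assumed) */

    printf("~%.0e Flops/sec planetwide, excluding nanocomputers\n", sc + pc + emb);
    return 0;
}

The PC population dominates the sum, which already lands inside the 10^24 – 10^30 Flops/sec range quoted above.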
MEMS - Biosensors
http://www.darpa.mil/mto/mems/presentations/memsatdarpa3.pdf
MEMS – Jet Engine Application
http://www.darpa.mil/mto/mems/presentations/memsatdarpa3.pdf
Smart Dust - UCB
Sensor
RF Mote
Laser Mote with CCD
RF Mini Mote I
Laser Mote
IrDA Mote
RF Mini Mote II
http://robotics.eecs.berkeley.edu/~pister/SmartDust/
Polymer Radio Frequency
Identification Transponder
http://www.research.philips.com/pressmedia/pictures/polelec.html
Optical Communication costs
Larry Roberts, Caspian Networks
Fiber Optic Communication
In 2010…
• A million books can be sent across the Pacific for $1 in 8 seconds
• All books in the American Research Libraries can be sent across the Pacific in about 1 hour for $500
Fiberoptic Communication Milestones
• First laser, 1960
• First room temperature laser, ~1970
• Continuous mode commercial lasers, ~1980
• Tunable lasers, ~1990
• Commercial fiberoptic WANs, 1985
• 10 Tbps/strand demonstrated in 2000 (10% of fiber peak capacity). 10 Tbps is enough bandwidth to transmit a million high-definition resolution movies simultaneously, or over 100 million phone calls.
• WAN fiberoptic cables often have 384 strands of fiber and would have a capacity of 2 Pbps. Several such cables are typically deployed in the same conduit/right-of-way.
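A quick arithmetic check of the figures above; the per-stream rates (roughly 10 Mbps per high-definition movie, 64 kbps per phone call) and the ~5 Tbps/strand used for the cable total are illustrative assumptions, not values from the slide.

/* Sanity-check the 10 Tbps/strand and 2 Pbps/cable claims. */
#include <stdio.h>

int main(void)
{
    double strand_bps = 10e12;   /* 10 Tbps demonstrated per strand        */
    double movie_bps  = 10e6;    /* assumed ~10 Mbps per HD movie stream   */
    double call_bps   = 64e3;    /* assumed 64 kbps per phone call         */
    int    strands    = 384;     /* strands per WAN cable                  */

    printf("HD movies per strand:    %.1e\n", strand_bps / movie_bps);   /* ~1e6    */
    printf("Phone calls per strand:  %.1e\n", strand_bps / call_bps);    /* ~1.6e8  */
    printf("Cable at ~5 Tbps/strand: %.1e bps\n", strands * 5e12);       /* ~2 Pbps */
    return 0;
}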
[Chart: backbone bandwidth in Mbit/s for NSFnet, vBNS, Internet2/Abilene, SURFnet, and TeraGrid, 1986-2001, doubling every year (OC-3, OC-12, OC-48, OC-192), compared with Pacific and Atlantic cable capacity.]
DTF 40Gb
[Map: DTF 40 Gb network linking San Diego (SDSC), NCSA, Chicago/IU/U Wisconsin, and PSC, with connections to NTON, CalREN-2, CA*net4, and AMPATH sites from Seattle and Portland to NYC and Atlanta.]
I-WIRE
[Map: I-WIRE optical network connecting UIC, NU/Starlight, Star Tap, ANL, IIT, UC, and NCSA/UIUC.]
• State funded infrastructure to support networking and applications research
  – $6.5M total funding
    • $4M FY00-01 (in hand)
    • $2.5M FY02 (approved 1 June 2001)
    • Possible additional $1M in FY03-05
  – Application driven
    • Access Grid: telepresence & media
    • Computational Grids: Internet computing
    • Data Grids: information analysis
  – New technologies proving ground
    • Optical switching
    • Dense wave division multiplexing
    • Ultra-high speed SONET
    • Wireless
    • Advanced middleware infrastructure
Charlie Catlett
Argonne National Laboratory
CA*net 4 Architecture
[Map: CANARIE CA*net 4 topology with GigaPOP, ORAN DWDM, and carrier DWDM links connecting Victoria, Vancouver, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Windsor, Toronto, Ottawa, Montreal, Quebec, Fredericton, Charlottetown, Halifax, and St. John's, with peering in Seattle, Chicago, Boston, and New York; existing and possible future CA*net 4 nodes marked.]
Bill St. Arnaud, CANARIE
Wavelength Disk Drives
[Map: WDD nodes on CA*net 3/4 at Vancouver, Calgary, Regina, Winnipeg, Toronto, Ottawa, Montreal, Fredericton, Charlottetown, Halifax, and St. John's.]
Computer data continuously circulates around the WDD.
GEANT, Nordic Grid Networks, SURFnet4 Topology
[Maps: link capacities of 0.155, 0.622, 2.5, and 10 Gbps.]
Grid Applications
Grid Application Projects
PAMELA
ODIN
March 28, 2000 Fort Worth Tornado
Courtesy Kelvin Droegemeier
In 1988 … NEXRAD Was Becoming a Reality
Courtesy Kelvin Droegemeier
Environmental Studies
Houston, TX
Neptune Undersea Grid
Air Quality Measurement and Control
NCAR real-time data: surface data, radar data, balloon data, satellite data
Digital Mammography
• About 40 million mammograms/yr (USA) (estimates 32 – 48 million)
• About 250,000 new breast cancer cases detected each year
• Over 10,000 units (analogue)
• Resolution: up to about 25 microns/pixel
• Image size: up to about 4k x 6k (example: 4096 x 5624)
• Dynamic range: 12 bits
• Image size: about 48 Mbytes
• Images per patient: 4
• Data set size per patient: about 200 Mbytes
• Data set per year: about 10 Pbytes
• Data set per unit, if digital: 1 Tbytes/yr, on average
• Data rates/unit: 4 Gbytes/operating day, or 0.5 Gbytes/hr, or 1 Mbps
• Computation: 100 ops/pixel = 10 Mflops/unit, 100 Gflops total; 1000 ops/pixel = 1 Tflops total
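The aggregate numbers above follow from the per-image parameters; a small worked calculation reproduces their order of magnitude. The 2 bytes/pixel storage for the 12-bit dynamic range and the 8-hour operating day are assumptions made only for this calculation.

/* Derive the mammography data-volume and compute estimates
   from the per-image parameters listed above. */
#include <stdio.h>

int main(void)
{
    double pixels        = 4096.0 * 5624.0;        /* ~23 Mpixels per image        */
    double bytes_per_pix = 2.0;                    /* 12-bit range stored in 2 B   */
    double image_bytes   = pixels * bytes_per_pix; /* ~46 MB, "about 48 Mbytes"    */
    double per_patient   = 4.0 * image_bytes;      /* 4 images, "about 200 Mbytes" */
    double per_year      = 40e6 * per_patient;     /* 40M exams/yr, order 10 PB    */
    double per_unit      = per_year / 10000.0;     /* ~1 TB per unit and year      */

    double ops_per_pixel = 100.0;
    double day_pixels    = 4e9 / bytes_per_pix;    /* 4 GB per operating day       */
    double unit_flops    = ops_per_pixel * day_pixels / (8.0 * 3600.0); /* ~10 Mflops */

    printf("image: %.0f MB, patient: %.0f MB, year: %.1f PB, unit: %.1f TB\n",
           image_bytes / 1e6, per_patient / 1e6, per_year / 1e15, per_unit / 1e12);
    printf("sustained compute per unit: %.0f Mflops (x10,000 units = %.0f Gflops)\n",
           unit_flops / 1e6, 10000.0 * unit_flops / 1e9);
    return 0;
}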
E-Science: Data Gathering, Analysis, Simulation, and Collaboration
Simulated Higgs decay, CMS, LHC
Molecular Dynamics
Jim Briggs
University of Houston
Molecular Dynamics Simulations
SimDB (Simulation Data Base)
SimDB Architecture
Biological Imaging
[Images: JEOL3000-FEG electron microscope with liquid He stage (NSF support); micrograph with 500 Å scale bar.]
No. of Particles Needed for 3-D Reconstruction
Resolution    B = 100 Å²    B = 50 Å²
8.5 Å         6,000         3,000
4.5 Å         5,000,000     150,000
8.5 Å Structure of the HSV-1 Capsid
EMAN
[Workflow: vitrification robot → particle selection → power spectrum analysis → initial 3D model → classify particles → reproject 3D model → align, average, deconvolute → build new 3D model; EMEN database for archival, data mining, and management.]
Tele-Microscopy
Osaka, Japan
Mark Ellisman, UCSD
Computational Steering: GEMSviz at iGRID 2000
[Diagram: network path between Paralleldatorcentrum, KTH Stockholm and the University of Houston via NORDUnet, STAR TAP, APAN, and INET.]
GrADS – Grid Application Development Software
Grids – Contract Development
Grids – Application Launch
Grids – Library Evaluation
Grids – Performance Models
Cactus on the Grid
Cactus – Job Migration
Cactus – Migration Architecture
Cactus – Migration example
Adaptive Software
Challenges
• Diversity of execution environments
– Growing complexity of modern microprocessors.
• Deep memory hierarchies
• Out-of-order execution
• Instruction level parallelism
– Growing diversity of platform characteristics
• SMPs
• Clusters (employing a range of interconnect
technologies)
• Grids (heterogeneity, wide range of characteristics)
• Wide range of application needs
– Dimensionality and sizes
– Data structures and data types
– Languages and programming paradigms
Challenges
• Algorithmic
  – High arithmetic efficiency
    • low floating-point vs. load/store ratio
  – Unfavorable data access patterns (big 2^n strides; see the sketch after this list)
    • Application owns the data structures/layout
  – Additions/multiplications unbalanced
• Version explosion
  – Verification
  – Maintenance
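To illustrate the large power-of-two stride problem: with a 2^n leading dimension, walking a column of a matrix strides by a power of two and keeps hitting the same sets of a power-of-two-indexed cache; padding the leading dimension by one element spreads the accesses out. A minimal sketch, with array sizes chosen only for illustration:

/* Walking a column of a 1024 x 1024 double array uses a stride of
   1024*8 = 8192 bytes, so successive accesses map to the same cache sets.
   Padding the leading dimension to 1025 breaks the conflict pattern. */
#include <stdio.h>

#define N   1024
#define PAD 1          /* set to 0 to get the unpadded, conflicting layout */

static double a[N][N + PAD];

int main(void)
{
    double sum = 0.0;
    for (int j = 0; j < N; j++)        /* traverse one column at a time */
        for (int i = 0; i < N; i++)
            sum += a[i][j];            /* stride of (N+PAD)*sizeof(double) bytes */
    printf("sum = %f, stride = %zu bytes\n", sum, (N + PAD) * sizeof(double));
    return 0;
}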
Opportunities
• Multiple algorithms with comparable
numerical properties for many functions
• Improved software techniques and hardware
performance
• Integrated performance monitors, models
and data bases
• Run-time code construction
Approach
• Automatic algorithm selection – polyalgorithmic
functions (CMSSL, FFTW, ATLAS, SPIRAL, …); see the sketch after this list
• Exploit multiple precision options
• Code generation from high-level descriptions
(WASSEM, CMSSL, CM-Convolution-Compiler, FFTW,
UHFFT, SPIRAL, …..)
• Integrated performance monitoring, modeling and
analysis
• Judicious choice between compile-time and run-time
analysis and code construction
• Automated installation process
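A minimal sketch of the polyalgorithmic selection referenced above (not code from CMSSL, FFTW, ATLAS, or SPIRAL): two numerically equivalent variants of a simple kernel are timed once at initialization and the faster one is dispatched thereafter. The kernel, the two variants, and the timing harness are illustrative assumptions.

/* Polyalgorithmic dispatch sketch: pick the faster of two equivalent
   kernel variants at run time and remember the choice. */
#include <stdio.h>
#include <time.h>

#define N 1000000

static double x[N];

static double sum_forward(void)
{
    double s = 0.0;
    for (long i = 0; i < N; i++) s += x[i];
    return s;
}

static double sum_unrolled(void)
{
    double s0 = 0.0, s1 = 0.0;
    for (long i = 0; i < N; i += 2) { s0 += x[i]; s1 += x[i + 1]; }
    return s0 + s1;
}

typedef double (*kernel)(void);

static double seconds(kernel k)
{
    clock_t t0 = clock();
    for (int r = 0; r < 100; r++) k();      /* repeat to get a measurable time */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    kernel candidates[] = { sum_forward, sum_unrolled };
    kernel best = candidates[0];
    double best_t = seconds(candidates[0]);
    for (int i = 1; i < 2; i++) {
        double t = seconds(candidates[i]);
        if (t < best_t) { best_t = t; best = candidates[i]; }
    }
    printf("selected variant: %s, result %f\n",
           best == sum_forward ? "forward" : "unrolled", best());
    return 0;
}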
The UHFFT
• Program preparation at installation (platform
dependent)
• Integrated performance models (in progress)
and data bases
• Algorithm selection at run-time from set
defined at installation
• Automatic multiple precision constant
generation
• Program construction at run-time based on
application and performance predictions
Performance Tuning Methodology
[Diagram:
Installation: input parameters (system specifics, user options) → UHFFT code generator → library of FFT modules.
Run-time: input parameters (size, dimension, …) → initialization: select best plan (factorization) → execution: calculate one or more FFTs → performance monitoring → database update.
A performance database is shared between the installation and run-time phases.]
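A minimal sketch of the run-time bookkeeping in such a loop (illustrative only, not the UHFFT implementation): keep the best measured plan per transform size in a small table and update it after each monitored execution. The plan ids and Mflops numbers below are made up for the example.

/* Tiny "performance database" sketch: remember the best-performing
   plan id per transform size and update it from measurements. */
#include <stdio.h>

#define MAX_ENTRIES 64

typedef struct { int size; int plan_id; double mflops; } perf_entry;

static perf_entry db[MAX_ENTRIES];
static int db_count = 0;

static perf_entry *lookup(int size)
{
    for (int i = 0; i < db_count; i++)
        if (db[i].size == size) return &db[i];
    return NULL;
}

static void update(int size, int plan_id, double mflops)
{
    perf_entry *e = lookup(size);
    if (!e && db_count < MAX_ENTRIES) {
        e = &db[db_count++];
        e->size = size;
        e->mflops = 0.0;
    }
    if (e && mflops > e->mflops) { e->plan_id = plan_id; e->mflops = mflops; }
}

int main(void)
{
    update(16, 44, 310.0);       /* e.g. plan "44" measured at 310 Mflops   */
    update(16, 2222, 270.0);     /* a slower plan does not replace it       */
    perf_entry *best = lookup(16);
    printf("best plan for size 16: %d (%.0f Mflops)\n", best->plan_id, best->mflops);
    return 0;
}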
Codelet Efficiency
[Charts: codelet efficiency, radix-4 codelet efficiency, and radix-8 codelet efficiency on Intel PIV 1.8 GHz, AMD Athlon 1.4 GHz, and PowerPC G4 867 MHz.]
Plan Performance, 32-bit Architectures
[Chart: Power3 (222 MHz, 888 Mflops peak) plan performance in MFLOPS for the size-16 plans 2222, 422, 242, 224, 82, 44, 28, and 16, i.e. the ordered factorizations of 16 into radices 2, 4, 8, and 16.]
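The plan labels above are the ordered factorizations of the transform size into the available radices. A small sketch that enumerates them for N = 16; the radix set {2, 4, 8, 16} is assumed for illustration.

/* Enumerate ordered factorizations ("plans") of a transform size N
   into a fixed set of radices, e.g. N = 16 -> 2 2 2 2, 4 2 2, ..., 16. */
#include <stdio.h>

static const int radices[] = { 2, 4, 8, 16 };
static const int nradix = 4;

static void plans(int n, int *stack, int depth)
{
    if (n == 1) {                      /* complete factorization: print it */
        for (int i = 0; i < depth; i++) printf("%d ", stack[i]);
        printf("\n");
        return;
    }
    for (int i = 0; i < nradix; i++) {
        if (n % radices[i] == 0) {
            stack[depth] = radices[i];
            plans(n / radices[i], stack, depth + 1);
        }
    }
}

int main(void)
{
    int stack[32];
    plans(16, stack, 0);               /* prints the 8 plans for N = 16 */
    return 0;
}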
Itanium …
Processor            Clock      Peak performance   Cache structure
Intel Itanium        800 MHz    3.2 GFlops         L1: 16K+16K (Data+Instruction); L2: 96K; L3: 2-4M (off-die)
Intel Itanium 2      900 MHz    3.6 GFlops         L1: 16K+16K (Data+Instruction); L2: 256K; L3: 1.5M (on-die)
Intel Itanium 2      1000 MHz   4 GFlops           L1: 16K+16K (Data+Instruction); L2: 256K; L3: 3M (on-die)
Sun UltraSparc-III   750 MHz    1.5 GFlops         L1: 64K+32K+2K+2K (Data+Instruction+Pre-fetch+Write); L2: up to 8M (off-die)
Sun UltraSparc-III   1050 MHz   2.1 GFlops         L1: 64K+32K+2K+2K (Data+Instruction+Pre-fetch+Write); L2: up to 8M (off-die)
Memory Hierarchy – Tested Configurations
                            Itanium-2 (McKinley)                Itanium
L1I and L1D
  Size:                     16KB + 16KB                         16KB + 16KB
  Line size/Associativity:  64B/4-way                           32B/4-way
  Latency:                  1 cycle                             1 cycle
  Write policies:           Write through, no write allocate    Write through, no write allocate
Unified L2
  Size:                     256KB                               96KB
  Line size/Associativity:  128B/8-way                          64B/6-way
  Integer latency:          min 5 cycles                        min 6 cycles
  FP latency:               min 6 cycles                        min 9 cycles
  Write policies:           Write back, write allocate          Write back, write allocate
Unified L3
  Size:                     3MB or 1.5MB on chip                4MB or 2MB off chip
  Line size/Associativity:  128B/12-way                         64B/4-way
  Integer latency:          min 12 cycles                       min 21 cycles
  FP latency:               min 13 cycles                       min 24 cycles
  Bandwidth:                32B/cycle                           16B/cycle
Itanium Comparison
Workstation    HP i2000                      HP zx2000
Processor      800 MHz Intel Itanium         900 MHz Intel Itanium 2 (McKinley)
Bus speed      133 MHz                       400 MHz
Bus width      64 bit                        128 bit
Chipset        Intel 82460GX                 HP zx1
Memory         2 GB SDRAM (133 MHz)          2 GB DDR SDRAM (266 MHz)
OS             64-bit Red Hat Linux 7.1      HP version of the 64-bit RH Linux 7.2
Compiler       Intel 6.0                     Intel 6.0
HP zx1 Chipset
[2-way block diagram]
Features:
• 2-way and 4-way configurations
• Low latency connection to the DDR memory (112 ns)
  – Directly (112 ns latency)
  – Through up to 12 scalable memory expanders (+25 ns latency)
• Up to 64 GB of DDR today (256 GB in the future)
• AGP 4x today (8x in future versions)
• 1-8 I/O adapters supporting PCI, PCI-X, AGP
UHFFT Codelet Performance
[Charts: codelet performance for radix 2, 3, 4, 5, 6, 7, 13, and 64.]
The UHFFT: Summary
• Code generator written in C
• Code is generated at installation
• Codelet library is tuned to the underlying architecture
• The whole library can be easily customized through parameter specification
  – No need for laborious manual changes in the source
  – Existing code generation infrastructure allows easy library extensions
• Future:
  – Inclusion of vector/streaming instruction set extensions for various architectures
  – Implementation of new scheduling/optimization algorithms
  – New codelet types and better execution routines
  – Unified algorithm specification on all levels
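To illustrate what a generated codelet amounts to (a hand-written stand-in, not actual UHFFT generator output): a straight-line radix-2 butterfly with the twiddle factor supplied as precomputed constants, plus a tiny driver that applies it as a 2-point FFT.

/* Hand-written stand-in for a generated radix-2 FFT codelet:
   one decimation-in-time butterfly, twiddle factor precomputed
   by the generator and passed in as constants. */
#include <stdio.h>

typedef struct { double re, im; } cplx;

static void radix2_butterfly(cplx *a, cplx *b, double wr, double wi)
{
    /* t = w * b;  b = a - t;  a = a + t */
    double tre = wr * b->re - wi * b->im;
    double tim = wr * b->im + wi * b->re;
    b->re = a->re - tre;
    b->im = a->im - tim;
    a->re = a->re + tre;
    a->im = a->im + tim;
}

int main(void)
{
    cplx a = { 1.0, 0.0 }, b = { 2.0, 0.0 };
    radix2_butterfly(&a, &b, 1.0, 0.0);   /* 2-point FFT: twiddle w = 1 */
    printf("X0 = %g%+gi, X1 = %g%+gi\n", a.re, a.im, b.re, b.im);
    return 0;
}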
Acknowledgements
GrADS contributors
Dave Angulo, Ruth Aydt, Fran Berman, Andrew
Chien, Keith Cooper, Holly Dail, Jack Dongarra, Ian
Foster, Sridhar Gullapalli, Lennart Johnsson, Ken
Kennedy, Carl Kesselman, Chuck Koelbel, Bo Liu,
Chuang Liu, Xin Liu, Anirban Mandal, Mark Mazina,
John Mellor-Crummey, Celso Mendes, Graziano
Obertelli, Alex Olugbile, Mitul Patel, Dan Reed,
Martin Swany, Linda Torczon, Satish Vahidyar,
Shannon Whitmore, Rich Wolski, Huaxia Xia,
Lingyun Yang, Asim Yarkin, ….
Funding: NSF Next Generation Software initiative,
Los Alamos Computer Science Institute
Acknowledgements
SimDB Contributors:
Matin Abdullah
Michael Feig
Lennart Johnsson
Seonah Kim
Prerna Kohsla
Gillian Lynch
Montgomery Pettitt
Funding:
NPACI (NSF)
Texas Learning and Computation Center
Acknowledgements
UHFFT Contributors
Dragan Mirkovic
Rishad Mahasoom
Fredrick Mwandia
Nils Smeds
Funding:
Alliance (NSF)
LACSI (DoE)
