Integrated Adaptive Software Systems
Simulation Science in Grid Environments: Integrated Adaptive Software Systems
Lennart Johnsson
Advanced Computing Research Laboratory, Department of Computer Science, University of Houston,
and Department of Numerical Analysis and Computer Science, Royal Institute of Technology, Stockholm

Outline
• Technology drivers
• Sample applications
• Domain-specific software environments
• High-performance software

Cost of Computing (SIA Roadmap)
[Chart: functions per chip, Mtransistors, 1999-2014 (process nodes 180 nm down to 30 nm), for DRAM, MPU cost-performance, MPU high-performance, and ASIC parts, at introduction and at production]
[Chart: cost per Mtransistor, $/Mtransistor, 1999-2014, for DRAM (cost x 100) and MPU cost-performance and high-performance parts, at introduction and at production]
• Today's most powerful computers (the power of 10,000 PCs at a cost of $100M) will cost a few hundred thousand dollars
• In 2010, the compute power of today's top-of-the-line PC can be found in $1 consumer electronics

Average Price of Storage (Ed Grochowski, IBM Almaden)
[Chart: price per MByte in dollars, 1980-2010, log scale, for paper/film, DRAM, flash (8 KB through 128 MB), and hard disk drives from the Seagate ST500 to the IBM Deskstar 75GXP and IBM Microdrive, with 1", 2.5", and 3.5" HDD and flash projections (DataQuest 2000)]
• In 2010, $1 will buy enough disk space to store 10,000 books, 35 hrs of CD-quality audio, or 2 min of DVD-quality video

Growth of Cell vs. Internet
[Chart: cell subscriptions vs. Internet hosts, in millions, 1992-2001]
Access Technologies

Computing Platforms 2001 ⇒ 2030 (courtesy Rick Stevens)
• Personal Computers, O[$1000]
  – 10^9 Flops/sec in 2001 ⇒ 10^15 – 10^17 Flops/sec by 2030
• Supercomputers, O[$100,000,000]
  – 10^13 Flops/sec in 2001 ⇒ 10^18 – 10^20 Flops/sec by 2030
• Number of computers [global population ~10^10]
  – SCs: 10^-8 – 10^-6 per person ⇒ 10^2 – 10^4 systems
  – PCs: 0.1x – 10x per person ⇒ 10^9 – 10^11 systems
  – Embedded: 10x – 10^5x per person ⇒ 10^11 – 10^15 systems
  – Nanocomputers: 0x – 10^10 per person ⇒ 0 – 10^20 systems
• Available Flops planetwide by 2030
  – 10^24 – 10^30 Flops/sec [assuming classical models of computation]

MEMS – Biosensors (http://www.darpa.mil/mto/mems/presentations/memsatdarpa3.pdf)

MEMS – Jet Engine Application (http://www.darpa.mil/mto/mems/presentations/memsatdarpa3.pdf)

Smart Dust – UCB (http://robotics.eecs.berkeley.edu/~pister/SmartDust/)
[Images: Sensor RF Mote, Laser Mote with CCD, RF Mini Mote I, Laser Mote, IrDA Mote, RF Mini Mote II]

Polymer Radio Frequency Identification Transponder (http://www.research.philips.com/pressmedia/pictures/polelec.html)

Optical Communication Costs (Larry Roberts, Caspian Networks)

Fiber Optic Communication: In 2010 . . .
• A million books can be sent across the Pacific for $1 in 8 seconds
• All books in the American Research Libraries can be sent across the Pacific in about 1 hr for $500

Fiberoptic Communication Milestones
• First laser, 1960
• First room-temperature laser, ~1970
• Continuous-mode commercial lasers, ~1980
• Commercial fiberoptic WANs, 1985
• Tunable lasers, ~1990
• 10 Tbps/strand demonstrated in 2000 (10% of fiber peak capacity); 10 Tbps is enough bandwidth to transmit a million high-definition movies simultaneously, or over 100 million phone calls
• WAN fiberoptic cables often have 384 strands of fiber and would have a capacity of 2 Pbps; several such cables are typically deployed in the same conduit/right-of-way
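The capacity claims above are easy to sanity-check with back-of-envelope arithmetic. The sketch below does so in C; the per-item rates it assumes (bits per book, per high-definition stream, per phone call) are illustrative figures, not numbers taken from the slides.

/* Back-of-envelope check of the fiber-optic capacity claims above.
 * The per-item rates are assumptions chosen for illustration. */
#include <stdio.h>

int main(void)
{
    const double strand_bps = 10e12;    /* 10 Tbps/strand demonstrated in 2000   */
    const double cable_bps  = 2e15;     /* quoted capacity of a 384-strand cable */
    const double strands    = 384.0;
    const double book_bits  = 1e6 * 8;  /* ~1 MB per book (assumption)           */
    const double hdtv_bps   = 10e6;     /* ~10 Mbit/s per HD stream (assumption) */
    const double phone_bps  = 64e3;     /* 64 kbit/s per voice call (assumption) */

    printf("HD streams per 10 Tbps strand:  %.1f million\n",
           strand_bps / hdtv_bps / 1e6);
    printf("Phone calls per 10 Tbps strand: %.0f million\n",
           strand_bps / phone_bps / 1e6);
    printf("Per-strand rate implied by a 2 Pbps cable: %.1f Tbps\n",
           cable_bps / strands / 1e12);
    printf("Time to send 1e6 books over a 1 Tbps path: %.0f s\n",
           1e6 * book_bits / 1e12);
    return 0;
}

With these assumptions the orders of magnitude match the slides: roughly a million HD streams and well over 100 million phone calls per 10 Tbps strand, about 5 Tbps per strand implied by a 2 Pbps cable, and 8 seconds for a million books over a terabit path.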
[Chart: research network backbone capacity in Mbit/s, doubling every year, 1986-2001: NSFnet, vBNS, Internet2, Abilene, SURFnet, TeraGrid, from OC-3 through OC-192; Pacific and Atlantic capacity]

DTF 40Gb
[Map: 40 Gb Distributed Terascale Facility backbone and related networks (NTON, CA*net4, CalREN-2, I-WIRE) linking Vancouver, Seattle, Portland, San Francisco, Los Angeles, San Diego (SDSC), Chicago, U Wisconsin, IU, NCSA, PSC, NYC, Atlanta, AMPATH, UIC, and NU/Starlight]

I-WIRE (Charlie Catlett, Argonne National Laboratory)
• State-funded infrastructure to support networking and applications research
  – $6.5M total funding
    • $4M FY00-01 (in hand)
    • $2.5M FY02 (approved 1 June 01)
    • Possible add'l $1M in FY03-05
  – Application driven
    • Access Grid: telepresence & media
    • Computational Grids: Internet computing
    • Data Grids: information analysis
  – New technologies proving ground
    • Optical switching
    • Dense wave division multiplexing
    • Ultra-high-speed SONET
    • Wireless
    • Advanced middleware infrastructure
[Map: I-WIRE sites including Star Tap/Starlight, ANL, UIC, NU, IIT, UC, and NCSA/UIUC]

CA*net 4 Architecture (Bill St. Arnaud, CANARIE)
[Map: CANARIE GigaPOP, ORAN DWDM, and carrier DWDM links among CA*net 4 nodes and possible future nodes: Victoria, Vancouver, Seattle, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Windsor, Toronto, Ottawa, Montreal, Quebec, Fredericton, Charlottetown, Halifax, St. John's, Boston, New York, and Chicago]

CANARIE Wavelength Disk Drives
[Map: WDD nodes on CA*net 3/4 at Vancouver, Calgary, Regina, Winnipeg, Toronto, Ottawa, Montreal, Fredericton, Charlottetown, Halifax, and St. John's]
• Computer data continuously circulates around the WDD

GEANT

Nordic Grid Networks
[Map: link capacities of 0.155, 0.622, 2.5, and 10 Gbps]

SURFnet4 Topology

Grid Applications

Grid Application Projects
• PAMELA
• ODIN

March 28, 2000 Fort Worth Tornado (courtesy Kelvin Droegemeier)

In 1988 … NEXRAD Was Becoming a Reality (courtesy Kelvin Droegemeier)

Environmental Studies: Houston, TX

Neptune Undersea Grid

Air Quality Measurement and Control
[Diagram: NCAR; real-time data from surface, radar, balloon, and satellite observations]

Digital Mammography
• About 40 million mammograms/yr in the USA (estimates 32 – 48 million)
• About 250,000 new breast cancer cases detected each year
• Over 10,000 units (analogue)
• Resolution: up to about 25 microns/pixel
• Image size: up to about 4k x 6k pixels (example: 4096 x 5624)
• Dynamic range: 12 bits
• Image size: about 48 Mbytes
• Images per patient: 4
• Data set size per patient: about 200 Mbytes
• Data set per year: about 10 Pbytes
• Data set per unit, if digital: 1 Tbyte/yr on average
• Data rates per unit: 4 Gbytes/operating day, or 0.5 Gbytes/hr, or 1 Mbps
• Computation: 100 ops/pixel = 10 Mflops/unit, 100 Gflops total; 1000 ops/pixel = 1 Tflops total (worked through in the sketch below)
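The per-unit storage, bandwidth, and compute figures above follow from the image parameters by straightforward arithmetic. The sketch below redoes the calculation; the quantities it assumes (12-bit pixels stored in 16 bits, 250 operating days per year, an 8-hour operating day, 4 images per exam) are not stated on the slide, and with them the results come out within roughly a factor of two of the rounded figures above.

/* Back-of-envelope reconstruction of the digital-mammography estimates.
 * Assumed (not on the slide): 12-bit pixels stored in 16 bits, 250 operating
 * days/year, an 8-hour operating day, 4 images per exam. */
#include <stdio.h>

int main(void)
{
    const double pixels     = 4096.0 * 5624.0;    /* example image size   */
    const double img_bytes  = pixels * 2.0;       /* 16 bits/pixel stored */
    const double exam_bytes = 4.0 * img_bytes;    /* 4 images per patient */
    const double exams_yr   = 40e6;               /* mammograms/yr, USA   */
    const double units      = 10000.0;
    const double days = 250.0, hours = 8.0;

    double yr_bytes   = exams_yr * exam_bytes;
    double unit_day   = yr_bytes / units / days;
    double unit_bps   = unit_day * 8.0 / (hours * 3600.0);
    double unit_flops = (exams_yr / units / days) * 4.0 * pixels * 100.0
                        / (hours * 3600.0);       /* at 100 ops/pixel     */

    printf("Image size:         %.0f MB (slide: ~48 MB)\n", img_bytes / 1e6);
    printf("Per patient:        %.0f MB (slide: ~200 MB)\n", exam_bytes / 1e6);
    printf("Per year:           %.1f PB (slide: ~10 PB)\n", yr_bytes / 1e15);
    printf("Per unit and day:   %.1f GB, %.1f Mbps (slide: 4 GB, 1 Mbps)\n",
           unit_day / 1e9, unit_bps / 1e6);
    printf("Compute per unit:   %.1f Mflops (slide: 10 Mflops)\n",
           unit_flops / 1e6);
    printf("Compute, all units: %.0f Gflops (slide: 100 Gflops)\n",
           unit_flops * units / 1e9);
    return 0;
}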
E-Science: Data Gathering, Analysis, Simulation, and Collaboration

Simulated Higgs Decay (CMS, LHC)

Molecular Dynamics (Jim Briggs, University of Houston)

Molecular Dynamics Simulations

SimDB – Simulation Data Base

SimDB Architecture

Biological Imaging
• JEOL3000-FEG with liquid-He stage (NSF support) [micrograph, 500 Å scale bar]

No. of Particles Needed for 3-D Reconstruction
Resolution    B = 100 Å²    B = 50 Å²
8.5 Å         6,000         3,000
4.5 Å         5,000,000     150,000

8.5 Å Structure of the HSV-1 Capsid

EMAN
• Pipeline: Vitrification Robot → Particle Selection → Power Spectrum Analysis → Initial 3D Model → Classify Particles → Reproject 3D Model → Align → Average → Deconvolute → Build New 3D Model
• EMEN Database: archival, data mining, management

Tele-Microscopy (Osaka, Japan; Mark Ellisman, UCSD)

Computational Steering: GEMSviz at iGRID 2000
[Diagram: Paralleldatorcentrum, KTH Stockholm connected via NORDUnet, STAR TAP, INET, and APAN to the University of Houston] (NORDUnet 2000, 28 Sep 00)

GrADS – Grid Application Development Software
Grids – Contract Development
Grids – Application Launch
Grids – Library Evaluation
Grids – Performance Models

Cactus on the Grid
Cactus – Job Migration
Cactus – Migration Architecture
Cactus – Migration Example

Adaptive Software Challenges
• Diversity of execution environments
  – Growing complexity of modern microprocessors
    • Deep memory hierarchies
    • Out-of-order execution
    • Instruction-level parallelism
  – Growing diversity of platform characteristics
    • SMPs
    • Clusters (employing a range of interconnect technologies)
    • Grids (heterogeneity, wide range of characteristics)
• Wide range of application needs
  – Dimensionality and sizes
  – Data structures and data types
  – Languages and programming paradigms

Challenges
• Algorithmic
  – High arithmetic efficiency: low floating-point vs. load/store ratio
  – Unfavorable data access patterns (big 2^n strides; see the sketch below)
    • The application owns the data structures and layout
  – Unbalanced additions/multiplications
• Version explosion
  – Verification
  – Maintenance
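A small experiment makes the "big 2^n strides" item concrete: elements spaced by a large power of two fall into the same cache sets and evict one another on every pass, while a slightly padded stride leaves the same working set cached. The array size, padding amount, and iteration counts below are illustrative choices, not values from UHFFT.

/* Illustration of the "big 2^n strides" issue: 64 doubles spaced exactly
 * 32 KB apart (leading dimension 4096) land in the same cache sets and
 * evict one another on every pass, while the same data with a padded
 * leading dimension (4096 + 8) stays resident after the first pass. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

enum { N = 4096, PAD = 8, ROWS = 64, PASSES = 200000 };

/* volatile keeps the compiler from hoisting the loads out of the pass loop */
static double sweep(const volatile double *a, size_t ld)
{
    double s = 0.0;
    for (int pass = 0; pass < PASSES; pass++)
        for (int i = 0; i < ROWS; i++)
            s += a[(size_t)i * ld];
    return s;
}

static double seconds(clock_t t) { return (double)t / CLOCKS_PER_SEC; }

int main(void)
{
    /* one allocation large enough for both leading dimensions */
    double *a = calloc((size_t)ROWS * (N + PAD), sizeof *a);
    if (!a) return 1;

    clock_t t0 = clock();
    double s1 = sweep(a, N);        /* stride 4096 doubles = 32 KB   */
    clock_t t1 = clock();
    double s2 = sweep(a, N + PAD);  /* stride 4104 doubles = 32.8 KB */
    clock_t t2 = clock();

    printf("power-of-two stride: %.3f s\n", seconds(t1 - t0));
    printf("padded stride:       %.3f s   (sums %.0f %.0f)\n",
           seconds(t2 - t1), s1, s2);
    free(a);
    return 0;
}

On typical set-associative caches the padded sweep runs several times faster; avoiding such strides, for example by padding or by copying into contiguous work buffers, is one of the layout concerns an adaptive FFT library has to manage.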
Opportunities
• Multiple algorithms with comparable numerical properties for many functions
• Improved software techniques and hardware performance
• Integrated performance monitors, models, and databases
• Run-time code construction

Approach
• Automatic algorithm selection: polyalgorithmic functions (CMSSL, FFTW, ATLAS, SPIRAL, …)
• Exploit multiple precision options
• Code generation from high-level descriptions (WASSEM, CMSSL, CM-Convolution-Compiler, FFTW, UHFFT, SPIRAL, …)
• Integrated performance monitoring, modeling, and analysis
• Judicious choice between compile-time and run-time analysis and code construction
• Automated installation process

The UHFFT
• Program preparation at installation (platform dependent)
• Integrated performance models (in progress) and databases
• Algorithm selection at run-time from the set defined at installation
• Automatic multiple-precision constant generation
• Program construction at run-time based on the application and performance predictions

Performance Tuning Methodology
• Installation: input parameters (system specifics, user options) → UHFFT code generator → library of FFT modules
• Run-time: input parameters (size, dimensions, …) → initialization: select best plan (factorization) → execution: calculate one or more FFTs → performance monitoring and database update → performance database (a toy version of the planning step is sketched after the UHFFT summary below)

[Charts: codelet efficiency, including radix-4 and radix-8 codelets, on Intel PIV 1.8 GHz, AMD Athlon 1.4 GHz, and PowerPC G4 867 MHz]

Plan Performance, 32-bit Architectures
[Chart: Power3 (222 MHz, 888 Mflops peak) plan performance in MFLOPS for size-16 plans: 2·2·2·2, 4·2·2, 2·4·2, 2·2·4, 8·2, 4·4, 2·8, and 16]

Itanium …

Processor            Clock      Peak performance   Cache structure
Intel Itanium        800 MHz    3.2 GFlops         L1: 16K+16K (data+instruction); L2: 96K; L3: 2-4M (off-die)
Intel Itanium 2      900 MHz    3.6 GFlops         L1: 16K+16K (data+instruction); L2: 256K; L3: 1.5M (on-die)
Intel Itanium 2      1000 MHz   4 GFlops           L1: 16K+16K (data+instruction); L2: 256K; L3: 3M (on-die)
Sun UltraSparc-III   750 MHz    1.5 GFlops         L1: 64K+32K+2K+2K (data+instruction+prefetch+write); L2: up to 8M (off-die)
Sun UltraSparc-III   1050 MHz   2.1 GFlops         L1: 64K+32K+2K+2K (data+instruction+prefetch+write); L2: up to 8M (off-die)

Memory Hierarchy (tested configurations)
                      Itanium 2 (McKinley)               Itanium
L1I and L1D
  Size                16KB + 16KB                        16KB + 16KB
  Line size/assoc.    64B/4-way                          32B/4-way
  Latency             1 cycle                            1 cycle
  Write policies      Write through, no write allocate   Write through, no write allocate
Unified L2
  Size                256KB                              96KB
  Line size/assoc.    128B/8-way                         64B/6-way
  Integer latency     Min 5 cycles                       Min 6 cycles
  FP latency          Min 6 cycles                       Min 9 cycles
  Write policies      Write back, write allocate         Write back, write allocate
Unified L3
  Size                3MB or 1.5MB on-chip               4MB or 2MB off-chip
  Line size/assoc.    128B/12-way                        64B/4-way
  Integer latency     Min 12 cycles                      Min 21 cycles
  FP latency          Min 13 cycles                      Min 24 cycles
  Bandwidth           32B/cycle                          16B/cycle

Itanium Comparison
Workstation   HP i2000                   HP zx2000
Processor     800 MHz Intel Itanium      900 MHz Intel Itanium 2 (McKinley)
Bus speed     133 MHz                    400 MHz
Bus width     64 bit                     128 bit
Chipset       Intel 82460GX              HP zx1
Memory        2 GB SDRAM (133 MHz)       2 GB DDR SDRAM (266 MHz)
OS            64-bit Red Hat Linux 7.1   HP version of the 64-bit RH Linux 7.2
Compiler      Intel 6.0                  Intel 6.0

HP zx1 Chipset (2-way block diagram)
Features:
• 2-way and 4-way configurations
• Low-latency connection to DDR memory
  – Directly (112 ns latency)
  – Through up to 12 scalable memory expanders (+25 ns latency)
• Up to 64 GB of DDR today (256 GB in the future)
• AGP 4x today (8x in future versions)
• 1-8 I/O adapters supporting PCI, PCI-X, and AGP

UHFFT Codelet Performance
[Charts: codelet performance for radix-2, -3, -4, -5, -6, -7, -13, and -64 codelets]

The UHFFT: Summary
• Code generator written in C
• Code is generated at installation
• Codelet library is tuned to the underlying architecture
• The whole library can be easily customized through parameter specification
  – No need for laborious manual changes in the source
  – The existing code generation infrastructure allows easy library extensions
• Future:
  – Inclusion of vector/streaming instruction set extensions for various architectures
  – Implementation of new scheduling/optimization algorithms
  – New codelet types and better execution routines
  – Unified algorithm specification on all levels
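To make the run-time planning step concrete: given codelets fixed at installation and per-codelet costs from a performance database, a planner picks the factorization of the transform size with the lowest predicted cost. The sketch below is a toy version of that idea; the radix set, the cost table, and all names are invented for illustration and are not the UHFFT API.

/* Toy run-time planner: choose a factorization of the FFT size into
 * "installed" radices by minimizing a simple Cooley-Tukey cost model,
 *   cost(n) = n * c(r) + r * cost(n/r)   for a radix r dividing n,
 * where c(r) plays the role of a measured per-point codelet cost.
 * The radix set and costs are invented; a real library would take them
 * from its performance database for the machine at hand. */
#include <math.h>
#include <stdio.h>

#define MAXN 4096

static const int    radix[]       = { 2, 3, 4, 5, 8, 16 };
static const double cost_per_pt[] = { 1.00, 1.30, 0.85, 1.60, 0.80, 0.95 };
#define NRADIX ((int)(sizeof radix / sizeof radix[0]))

static double best[MAXN + 1];  /* memoized best cost per size (0 = not computed) */
static int    pick[MAXN + 1];  /* first radix of the best plan for each size     */

static double plan(int n)
{
    if (n == 1) return 0.0;
    if (best[n] != 0.0) return best[n];

    double bestc = INFINITY;
    for (int i = 0; i < NRADIX; i++) {
        if (n % radix[i] != 0) continue;
        double c = n * cost_per_pt[i] + radix[i] * plan(n / radix[i]);
        if (c < bestc) { bestc = c; pick[n] = radix[i]; }
    }
    return best[n] = bestc;   /* INFINITY if no factorization exists */
}

int main(void)
{
    const int sizes[] = { 1024, 360, 2048 };
    for (int k = 0; k < 3; k++) {
        int n = sizes[k];
        if (isinf(plan(n))) { printf("n = %d: no plan\n", n); continue; }
        printf("n = %4d  predicted cost %8.1f  plan:", n, best[n]);
        for (int m = n; m > 1; m /= pick[m])
            printf(" %d", pick[m]);
        printf("\n");
    }
    return 0;
}

In the UHFFT methodology above, the candidate factorizations come from the codelet library built at installation, and performance monitoring updates the database at run time; the hard-coded cost table here merely stands in for that measured data.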
Acknowledgements
GrADS contributors: Dave Angulo, Ruth Aydt, Fran Berman, Andrew Chien, Keith Cooper, Holly Dail, Jack Dongarra, Ian Foster, Sridhar Gullapallii, Lennart Johnsson, Ken Kennedy, Carl Kesselman, Chuck Koelbel, Bo Liu, Chuang Liu, Xin Liu, Anirban Mandal, Mark Mazina, John Mellor-Crummey, Celso Mendes, Graziano Obertelli, Alex Olugbile, Mitul Patel, Dan Reed, Martin Swany, Linda Torczon, Satish Vahidyar, Shannon Whitmore, Rich Wolski, Huaxia Xia, Lingyun Yang, Asim Yarkin, …
Funding: NSF Next Generation Software initiative, Los Alamos Computer Science Institute

Acknowledgements
SimDB contributors: Matin Abdullah, Michael Feig, Lennart Johnsson, Seonah Kim, Prerna Kohsla, Gillian Lynch, Montgomery Pettitt
Funding: NPACI (NSF), Texas Learning and Computation Center

Acknowledgements
UHFFT contributors: Dragan Mirkovic, Rishad Mahasoom, Fredrick Mwandia, Nils Smeds
Funding: Alliance (NSF), LACSI (DoE)