Tilera’s Many-core Processor
A scalable architecture on a single chip.
J. Whitesell & S. Ladavich
Tuesday, May 14th, 2013

History of Tilera
Pros and Cons of Building a Manycore Architecture
The Tilera Approach
Tilera’s …
▪ Tile Architecture
▪ iMesh Network Topology
Applications …
▪ Server
▪ Media
▪ Cloud
Performance Analysis and Benchmarking

MIT’s Dr. Anant Agarwal leads the way for Tiled Manycore
1990: Multi-processor made of single chips
1994: 32-node mesh-based cache-coherent processor
2002: MIT’s RAW architecture. DARPA pays the bill! Gives 10s of millions supporting RAW
2004: Tilera’s stealth launch
2007: Tilera’s corporate launch
2011: Latest line, the Gx series, is released
“Tilera has solved the multi-processor scalability problem!”
“… does not exist!”

Traditional Architectures aren’t Scalable
Most Multi-Core Chips Stop Around 8 Cores
Bus Interconnect
▪ Creates a Bottleneck for Main Memory (MM) Access
▪ Consumes Chip Area & Power
On-Chip Memory Limits
Software Support
▪ Efficient API Development is Challenging
▪ Parallel Languages and Programmers are Needed

On-Chip Communication is Fast!
▪ Reduced Overheads
▪ Finer Grain Size
On-Chip Network Footprint is Small!
▪ Natural Tiled Connections
▪ 2-D Mesh Suits 2-D Substrate

Create a Basic Modular Unit
Homogeneous Across Chip
Known as a Tile
▪ Full-Featured Processor Core
▪ Processor Engine
▪ Cache Engine
▪ Switch Engine
▪ Capable of Running an OS
[Figure: Basic Look Inside a Tile]

Processor Engine
64-bit VLIW Architecture
▪ 3 Execution Pipelines: ALU, Flow Control, LD/ST
Cache Engine
Dynamic Distributed Cache
▪ Shared L2 Caches (L3)
Switch Engine
▪ Direct Neighbor Connections
▪ I/O Connections on Periphery
[Figure: Detailed Look Inside a Tile]

Networks are easy!
Communication is cheap!
Leverage Multiple Independent Networks
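As a rough illustration of why a 2-D mesh of direct neighbor connections keeps on-chip communication cheap, the sketch below counts hops between two tiles under dimension-ordered (X first, then Y) routing. The function names `xy_route` and `hop_count`, the (column, row) coordinates, and the X-first routing order are assumptions made for illustration; the slides only state that tiles connect to their direct neighbors in a 2-D mesh.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row) position of a tile in the mesh


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Return every tile visited from src to dst, moving along X first, then Y.

    Assumption: dimension-ordered routing, chosen here only because it is a
    simple, deterministic way to route on a 2-D mesh.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x  # one hop to the east or west neighbor
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y  # one hop to the north or south neighbor
        path.append((x, y))
    return path


def hop_count(src: Tile, dst: Tile) -> int:
    """Number of neighbor-to-neighbor hops: the Manhattan distance."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


if __name__ == "__main__":
    route = xy_route((0, 0), (3, 2))
    print("route:", route)
    print("hops:", hop_count((0, 0), (3, 2)))
```

Because each hop only crosses a link between adjacent tiles, the cost of a transfer grows with the distance between the two tiles rather than with the total number of tiles contending for one shared bus.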
1) How many networks are needed?
2) What functionalities do the networks have?

How are the message types and communications defined?
Message Types:
1) Implicit, through…
▪ Tile-to-tile shared address space: non-uniform / distributed cache access (NUCA)
▪ Shared address space in off-chip / main memory: uniform memory access (UMA)
2) Message Passing
3) Streaming Data
a) Small stream
b) Large stream
c) Large/Continuous (Special Case: High Performance Streaming)
4) System Level & IO (Special Case: IO Messages, System Traffic)
[Diagram: Implicit Message Passing (1) and Explicit Message Passing (2); explicit traffic is divided into Messages and Streaming Data, with Small Buffers (3a) and Large Buffers (3b)]
Dedicated Networks:
1) MDN
2) TDN
3) UDN
4) STN
5) IDN
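One way to read the build-up of message types and dedicated networks is as a lookup table from message type to network. The sketch below encodes that reading. The pairings follow the order in which the slides introduce each network next to each message type, so they are an interpretation rather than a statement from Tilera's documentation; in particular, placing the small and large streams (3a, 3b) on the UDN is an assumption, made because the slides add no new network for them. The names `MessageType`, `NETWORK_FOR`, and `networks_for` are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class MessageType(Enum):
    """Message types, keyed by the labels used on the slides."""
    IMPLICIT = "1"
    MESSAGE_PASSING = "2"
    SMALL_STREAM = "3a"
    LARGE_STREAM = "3b"
    CONTINUOUS_STREAM = "3c"
    SYSTEM_AND_IO = "4"


# Full names of the five networks as listed in the deck.
NETWORK_NAMES: Dict[str, str] = {
    "MDN": "Memory Dynamic Network",
    "TDN": "Tile Dynamic Network",
    "UDN": "User Dynamic Network",
    "STN": "Static Network",
    "IDN": "I/O Dynamic Network",
}

# Assumed pairing, inferred from the slide build order (not from a manual).
NETWORK_FOR: Dict[MessageType, Tuple[str, ...]] = {
    MessageType.IMPLICIT: ("MDN", "TDN"),
    MessageType.MESSAGE_PASSING: ("UDN",),
    MessageType.SMALL_STREAM: ("UDN",),   # assumption: no new network shown
    MessageType.LARGE_STREAM: ("UDN",),   # assumption: no new network shown
    MessageType.CONTINUOUS_STREAM: ("STN",),
    MessageType.SYSTEM_AND_IO: ("IDN",),
}


def networks_for(kind: MessageType) -> Tuple[str, ...]:
    """Return the network abbreviation(s) assumed to carry this message type."""
    return NETWORK_FOR[kind]


if __name__ == "__main__":
    for kind in MessageType:
        names = ", ".join(NETWORK_NAMES[n] for n in networks_for(kind))
        print(f"{kind.value:>2} {kind.name:<18} -> {names}")
```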
5 Independent Hardware Networks:
1) MDN: Memory Dynamic Network
2) TDN: Tile Dynamic Network
3) UDN: User Dynamic Network
4) STN: Static Network
5) IDN: I/O Dynamic Network
Which minimize overheads for all desired forms of communication

Parallel Processing in Embedded Domain
Network
▪ Lossless Packet Capture
▪ Intrusion Detection & Prevention
Multimedia
▪ Video Conferencing
▪ IP Surveillance
Cloud
▪ In-Memory Caching
▪ Server Load Balancing

Numerous Evaluations
Single-Core Performance
▪ CoreMark Score
Parallelized Performance
▪ Information Fusion
▪ Gaussian Elimination
▪ MemCached
Comparisons of SMPs & Many-Core

Evaluates Single-Core Performance
▪ 4 Algorithms
▪ 1 Final Score
Tilera’s Processors Feature:
▪ VLIW Architecture
▪ 3 Pipelines
▪ 64-bit Instruction Words
▪ All-or-None Execution
[Figure: CoreMark Score, Single-Core Single Thread CoreMark Comparison]

Embedded Wireless Sensor Networks
Cluster Heads Receive from 10 Sensors
Head Node Performs Reduction
▪ Moving Average Filter

Results Vary Based on Application
▪ Integer-Based Arithmetic: Information Fusion Application
▪ Floating-Point Intensive: Gaussian Elimination Application
Why? Tiles Lack a Dedicated Floating-Point Unit!

Distributed Memory Caching System
Creates a Virtual Memory Pool
Used for Key-Value Stores
Designed to Alleviate Database Load
Currently Implemented by…
Social Media Giants
▪ Facebook, Twitter, and Zynga

For a Fixed Memory Footprint
▪ Tilera Achieves 3.35x Throughput @ Less Power
▪ Better Performance per Watt

The Tile Architecture Exhibits…
Superior Scalability
▪ Modular Design
▪ Low Cost of On-Chip Communication
▪ Exploiting a Variety of Task Grain Sizes
▪ ILP and TLP
High Performance per Watt
▪ Relatively Low Clock Speeds
▪ Idle Mode for Unused Tiles
▪ Reducing Costs of Web Datacenters

Waingold, E.; Taylor, M.; Srikrishna, D.; Sarkar, V.; Lee, W.; Lee, V.; Kim, J.; Frank, M.; Finch, P.; Barua, R.; Babb, J.; Amarasinghe, S.; Agarwal, A., "Baring it all to software: Raw machines," Computer, vol. 30, no. 9, pp. 86-93, Sep. 1997.
Tilera Corporation, "Tile Processor User Architecture Manual," UG101, Rev. 2.4, Nov. 2011.
Wentzlaff, D.; Griffin, P.; Hoffmann, H.; Liewei Bao; Edwards, B.; Ramey, C.; Mattina, M.; Chyi-Chang Miao; Brown, J.F.; Agarwal, A., "On-Chip Interconnection Architecture of the Tile Processor," IEEE Micro, vol. 27, no. 5, pp. 15-31, Sept.-Oct. 2007.
Munir, A.; Gordon-Ross, A.; Ranka, S., "Parallelized benchmark-driven performance evaluation of SMPs and tiled multi-core architectures for embedded systems," Performance Computing and Communications Conference (IPCCC), 2012 IEEE 31st International, pp. 416-423, 1-3 Dec. 2012.
Berezecki, M.; Frachtenberg, E.; Paleczny, M.; Steele, K., "Many-core key-value store," Green Computing Conference and Workshops (IGCC), 2011 International, pp. 1-8, 25-28 July 2011.
Schooler, R., "The TILE-Gx Processor: Enabling HPC through Massive-Scale Manycore," IEEE High Performance Embedded Computing Conference Proceedings, 2010. Presentation slides 28-30.

Links to Other Images (Presentation Only):
▪ Tilera Silicon - http://www.datacenterdynamics.com/focus/archive/2011/07/facebook-tilera-chips-more-energy-efficient-x86
▪ AMD Phenom Silicon - http://siliconmadness.blogspot.com/2010/05/amd-phenom-ii-x6-overclocking-record.html
▪ Scalability Graph - www.ll.mit.edu/HPEC/agendas/.../S2_1405_Schooler_presentation.ppt
▪ Tilera Products and Theme - http://www.tilera.com/contact/media_library
▪ Single Tile Detail - http://semiaccurate.com/2009/10/29/look-100-core-tilera-gx/