Innovus Implementation System
August 2015

Customer design challenge
• Increasing power, performance, and area (PPA) demand
• Too many blocks and long turnaround time (TAT)
• Conflicting design objectives: TAT versus PPA
• Traditional solutions have addressed only one objective: TAT or PPA
(Figure source: AnandTech)

© 2015 Cadence Design Systems, Inc. All rights reserved.

Cadence Full-flow Digital Solution
Traditional flow: separate placer, router, optimization, CTS, timing, power, and extraction engines
Cadence full-flow digital solution (synthesis, implementation, signoff):
• Unified engines: unified placement engine; unified CTS and global router; unified timing/power/extract
• Core PPA algorithms: best-in-class PPA optimization, 10-20% better PPA
• Massively parallel: up to 10X TAT/capacity gain
• Early signoff: full-flow correlation, design convergence, reduced iterations

Introducing “Innovus Implementation System”
• Full-flow speedup: up to 10X TAT/capacity gain
• Industry-best PPA: 10-20% better PPA
• Massively parallel
• Integrated signoff
• Ease of use
• Production proven: 16/14/10nm and established nodes
A new era of 5-10M+ instance block implementation

Agenda
• Dramatic TAT gains
  - Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
  - 10% to 20% better power, performance, and area (PPA)
• Enhanced planning and exploration
  - Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
  - Common user interface, 16/14/10nm readiness
Innovus Implementation System improves design schedule
• Effectively handles large blocks: 5-10M+ instance blocks
• Reduces total number of blocks: 150 -> 25-50 blocks
• Improves SoC design schedule: weeks to months saved
• 5-10X TAT/capacity gain
Massively parallel:
• Core algorithm speedup
• Full-flow multithreading
• Distributed network processing
• Multi (MMMC) scenario acceleration

Production-proven speedup with Innovus Implementation System
TAT speedup with Innovus™ Implementation System (chart: runtime in hours, 0 to 700, Innovus versus reference)
Design             Speedup
9.3M cell, 28nm    9.7X
2.8M cell, 28nm    7X
3.1M cell, 28nm    6.9X
5.5M cell, 16nm    6.1X
1.5M cell, 16nm    5.2X

1.5M-cell graphics processor (16nm, 8 CPUs)
• Runtime: 150 hours reduced to 29 hours
• Flow stages charted: place, prects, cts, postcts_hold, route, postroute, postroute_hold
• 8 views: 5 setup, 3 hold
• High performance, AOCV
• 70% utilization

2.8M-cell networking IP (28nm, 8 CPUs)
• Runtime: 336 hours reduced to 48 hours
• Flow stages charted: init, place, prects, cts, postcts_hold, route, postroute, postroute_hold
• 1.4M instances per day
• 2 setup views, 2 hold views
• 800MHz

NanoRoute Advanced Digital Router
• Massively parallel
• Unique pipelining for scalability
Design   4 CPUs      96 CPUs    Ratio
1        3:30:36     0:26:54    7.8
2        3:55:39     0:28:04    8.4
3        5:50:28     0:52:08    6.7
4        6:52:25     0:35:24    11.7
5        25:08:27    1:24:55    17.8
6        40:45:57    3:38:20    11.2

Design example: 4622735 instances, 4486830 nets, die size 10153806 Gcells, Gcell size 15x15
# CPUs       Elapsed time   Speedup
4            8:31:06        (baseline)
8            4:47:33        1.78X
16           2:48:08        3.04X
32           1:45:00        4.87X
64 (4x16)    0:58:10        8.81X
112 (7x16)   0:39:00        13X
Throughput: 4.5M nets/hour (10M gates/hour); 7M nets/hour (17M gates/hour)
Only router in the industry that can run on 100+ CPU machines
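The speedup column in the NanoRoute scaling example can be recomputed from the elapsed times. The following is a minimal Python sketch, not part of the original deck: the function names and the parallel-efficiency metric are illustrative assumptions, and only the CPU counts and elapsed times are taken from the slide.

```python
from typing import Dict


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# CPU count -> elapsed time, as listed in the design example on the slide.
RUNS: Dict[int, str] = {
    4: "8:31:06",
    8: "4:47:33",
    16: "2:48:08",
    32: "1:45:00",
    64: "0:58:10",
    112: "00:39:00",
}


def speedups(runs: Dict[int, str], base_cpus: int = 4) -> Dict[int, float]:
    """Speedup of each run relative to the run on base_cpus CPUs."""
    base = to_seconds(runs[base_cpus])
    return {cpus: base / to_seconds(t) for cpus, t in runs.items()}


def parallel_efficiency(runs: Dict[int, str], base_cpus: int = 4) -> Dict[int, float]:
    """Speedup divided by the increase in CPU count (1.0 means ideal scaling).

    This metric is an addition for illustration; the slide reports speedup only.
    """
    return {cpus: s / (cpus / base_cpus) for cpus, s in speedups(runs, base_cpus).items()}


if __name__ == "__main__":
    eff = parallel_efficiency(RUNS)
    for cpus, s in speedups(RUNS).items():
        print(f"{cpus:>3} CPUs: {s:5.2f}X speedup, {eff[cpus]:.0%} efficiency")
```

Recomputing this way reproduces the slide's 1.78X, 3.04X, 4.87X, and roughly 13X figures, with the 64-CPU run coming out near 8.8X. Against the 4-CPU baseline, efficiency at 112 CPUs is a little under one half, which is typical of diminishing returns as CPU counts grow.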
Complex, tough-to-close advanced-node design
Leading broadband and wireless company
• Previous vendor flow: multiple small blocks (28nm, 3-4 blocks, 7-15 days TAT)
• Cadence full-flow digital solution: bigger blocks (16nm, 1 block, 5-7 days TAT)
• 5-10% area reduction
• 3-4X gain in productivity
                   Original 28nm flow                  Cadence 16nm flow
Block size         1-2M instances                      5-7M instances
Number of blocks   3-4                                 1
Power, area        Inefficiencies due to hierarchy     Flat, for best area and power
Performance        Boundary timing closure issues      Flat timing
Productivity       Machines, resources                 3-4X productivity benefits

Agenda
• Dramatic TAT gains
  - Full flow, CPU-efficient improvements
• Industry-leading power, performance, and area
  - 10-20% better PPA
• Enhanced planning and exploration
  - Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
  - Common UI, 16/14/10nm readiness

Innovus technology: industry's best PPA
10-20% better PPA
• GigaPlace™ next-generation placement
  - Slack-driven, layer-aware, fully analytic
• GigaOpt™ power-driven optimization
  - All GigaOpt transforms made power aware
  - Minimizes leakage, internal and switching power
• Advanced CCOpt™ and slack-driven routing
  - Flex H-tree improves cross-corner variation
  - Slack-driven routing reduces SI TNS
(Figure: H-tree, FlexH, and regular CTS tree)
Innovus GigaPlace tool
Next-generation placement technology
• Analytical placement engine driven by three sets of objectives:
  - Electrical-driven (slack/MMMC/skew/power)
  - Physical-driven (topology/layer/color/pin-access)
  - Optimization-driven (gate sizing/buffering)
• Concurrent, multi-objective, massively-parallel algorithm
• Integrated and correlated with Tempus™ and GigaOpt™ tools
• Advanced-node (16/14/10nm) color-aware technology
• Better PPA and utilization, and faster design closure
• 2X better TNS, 5% better wirelength, 5% better leakage, 3% better utilization
(Figure labels: congestion, wirelength, slack)

GigaPlace QoR benefits
• Better TNS/WNS: slack-driven
• Better wirelength: topology-driven
• Better congestion: auto-density screening
• Better spreading: physical-driven
              WNS         TNS
Default       -0.274ns    -870ns
w/ Guide      -0.289ns    -637ns
Slack-driven  -0.266ns    -561ns
(Figure labels: Macro1, Macro2)

GigaPlace slack-driven placement
Test case results

preCTS
Metric      Existing placement   GigaPlace™
WNS         -0.056               -0.021
TNS         -29.999              -2.548
VP          1995                 276
Density     83%                  73%
Cong.       0.79%                0.73%
Leakage     0.96mW               0.84mW
% LSL       0%                   0%

postCTS
Metric      Existing placement   GigaPlace
WNS r2r     -0.16                -0.04
WNS I/O     -0.27                -0.14
TNS r2r     -222.7               -7.2
TNS I/O     -398.0               -76.6
VP r2r      7765                 1068
VP I/O      11659                2775
HWNS r2r    diverging            -0.19
HTNS r2r    diverging            -332.4
HVP r2r     diverging            12672
Density     86                   77.3
# DRC       324                  69
Leakage     2.02mW               1.32mW
% LSL       6.6%                 2.7%

postRoute
Metric      Existing placement   GigaPlace
WNS r2r     -0.032               0.0
WNS I/O     -0.157               -0.069
TNS r2r     -4.691               0.0
TNS I/O     -26.227              -2.480
VP r2r      412                  0
VP I/O      1147                 84
HWNS r2r    -0.071               -0.169
HTNS r2r    -256.044             -81.916
HVP r2r     45917                16617
Density     85.8                 76.5
# DRC       650                  496
Leakage     2.12mW               1.35mW
% LSL       7.1%                 2.5%

LSL: high speed/high leakage cells
GigaOpt power-driven optimization
GigaOpt™ concurrent power-driven optimization
• Static power: leakage
• Dynamic power: internal and switching, with $P_{\mathrm{trans}} = f_{\mathrm{clk}} C V^{2} / 2$
• All GigaOpt optimization transforms made power-aware
• Minimizes leakage, internal and switching power
• Avoids local minima to achieve globally optimal design PPA

GigaOpt power-driven optimization
Dynamic power reduction transforms
• Sequential cell downsize
• Gate composition transform
• Pin swap transform

Balancing leakage to dynamic priority
• User control to tune internal cost function to balance between leakage and dynamic power

NanoRoute TrackOpt
• TrackOpt fixes SI issues before detail route
• Reduces timing jump between pre-route and post-route
• Allows change in netlist and cell locations
• Full detail route on all nets longer than 1 gcell
• Clean for shorts and DPT but not every complex rule
• 95% correlation for SI timing with full detail route
• Little impact on flow runtime and routability
• Tightly integrated: SI opt PPA, post-route opt PPA
(Flow diagram labels: netlist, placer, GigaPlace™, GigaOpt™, CCOpt™, NanoRoute™ with track assignment, TrackOpt, detail route, and post-route opt., signoff, GDS, common timing engine)

Innovus Implementation System: Saving battery life
1.6M-cell mobile-computing 16nm block with built-in DSP
Challenge:
• Aggressive dynamic power target
• 15+ power domains
• DVFS, clock gating, power shutoff
Full-flow power-driven optimization
• Dynamic power reduction (mW): 4.2% better and 13% better (chart categories: combinational, sequential)
• Fully automated IEEE-1801 enabled Innovus™ flow
• Achieved 11% better total power mW/MHz vs. competition
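The switching-power relation shown on the GigaOpt slide, read here as P_trans = f_clk * C * V^2 / 2, can be made concrete with a short calculation. This is a minimal Python sketch under that reading of the formula; the function name and all numeric values (clock frequency, capacitance, supply voltages) are hypothetical and are not taken from the deck.

```python
def switching_power(f_clk_hz: float, cap_farads: float, vdd_volts: float) -> float:
    """Switching power in watts using P = f_clk * C * V^2 / 2."""
    return 0.5 * f_clk_hz * cap_farads * vdd_volts ** 2


# Hypothetical operating point: 1 GHz clock, 1 pF switched capacitance, 0.8 V supply.
baseline = switching_power(1e9, 1e-12, 0.80)

# Same clock and capacitance with the supply lowered by 10% (0.72 V).
scaled = switching_power(1e9, 1e-12, 0.72)

# Halving the switched capacitance at the original voltage, for comparison.
half_cap = switching_power(1e9, 0.5e-12, 0.80)

saving_from_voltage = 1 - scaled / baseline
saving_from_cap = 1 - half_cap / baseline

if __name__ == "__main__":
    print(f"baseline:            {baseline * 1e3:.3f} mW")
    print(f"10% lower supply:    {saving_from_voltage:.0%} less switching power")
    print(f"half the capacitance: {saving_from_cap:.0%} less switching power")
```

Because voltage enters quadratically while capacitance and frequency enter linearly, a 10% supply reduction saves about 19% in this model, whereas capacitance reductions (for example from downsizing cells or shortening wires) save power in direct proportion.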
Clock concurrent optimization (CCOpt)
Natively integrated CCOpt and full-flow CCOpt CTS
• Clock tree synthesis, clock/datapath opt., post-route clock ECO
• Native CCOpt TAT (chart: TAT in hours by design, 14.1 versus 14.2): better TAT, 1.5X on average and up to 3X
• 2-3X faster vs. scripted
• Better hold awareness, fence regions, halo and multi-corner support
(Flow diagram labels: netlist, placer, GigaPlace™, GigaOpt™, CCOpt™, NanoRoute™, GDS, common timing engine)

Flex H-Tree
Traditional H-tree
• Advantages
  - Good cross-corner scaling behavior
  - Balanced by construction
• Disadvantages
  - Need to have power-of-two sinks
  - Need rectangular unblocked area
  - Higher power than ad-hoc CTS tree
Cadence Flex H-Tree
• Any number of sinks in any arrangement
• Non-rectangular floorplans with multiple blockages supported
• Intelligent tradeoffs made between skew and power
• Flex H-tree improves cross-corner variation
(Figure: H-tree, FlexH, and regular CTS tree)

Production-proven PPA advantage
Better PPA (20% on average)
High-performance CPU design benchmarks (chart: PPA % gain, 0% to 45%)
• Designs: Design 1 CPU @28nm; Design 2 CPU @16nm; Design 3 64bit CPU @16nm; Design 4 CPU @20nm
• Chart annotations: exceeds 2GHz, converts to Cadence; 18% better utilization; 12% better power, converts to Cadence @16nm; 17% better power

Innovus Implementation System: Higher performance cores
Multiple 16nm CPU blocks
• Exceeded frequency targets by 10% on all blocks
• Achieved frequency relative to target over the tapeout timeline: 85% with the old flow, 100% target, 110% achieved
• Automated net-weighting, region guides, and pipeline placement
• Automated CCOpt™ multipoint CTS optimization
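The traditional H-tree properties listed on the Flex H-Tree slide (balanced by construction, power-of-two sinks, rectangular unblocked area) follow directly from how a classic H-tree is built. The following is a minimal Python sketch of a textbook H-tree, not of Cadence's Flex H-tree or any CCOpt algorithm; the function name, coordinates, and region sizes are hypothetical.

```python
from typing import List, Tuple

# (x, y, wire length from the tree root to this sink)
Sink = Tuple[float, float, float]


def h_tree_sinks(cx: float, cy: float, half_w: float, half_h: float,
                 levels: int, length: float = 0.0) -> List[Sink]:
    """Recursively place the sinks of a classic H-tree over a rectangle.

    Each level routes from the region center to the centers of its four
    quadrants (one 'H'), then recurses into each quadrant.
    """
    if levels == 0:
        return [(cx, cy, length)]
    # Manhattan wire from this center to any quadrant center.
    arm = half_w / 2 + half_h / 2
    sinks: List[Sink] = []
    for dx in (-1, 1):
        for dy in (-1, 1):
            sinks.extend(h_tree_sinks(cx + dx * half_w / 2,
                                      cy + dy * half_h / 2,
                                      half_w / 2, half_h / 2,
                                      levels - 1, length + arm))
    return sinks


# Two H levels over a hypothetical 16 x 16 region centered at the origin.
sinks = h_tree_sinks(0.0, 0.0, 8.0, 8.0, levels=2)

if __name__ == "__main__":
    lengths = {s[2] for s in sinks}
    print(f"{len(sinks)} sinks, root-to-sink wire lengths: {sorted(lengths)}")
```

Every sink ends up at the same wire length from the root, which is why skew is balanced by construction. The same construction also forces the sink count to a power of two (four per H level here) and assumes the whole rectangle is free of blockages, which are the restrictions the slide says the Flex H-tree relaxes.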
Reduced customization

Innovus Implementation System: Productive mixed-signal design
32-bit microcontroller (MCU) mixed-signal RISC design
Challenge
• Smaller footprint for IoT
• Digital and analog distributed throughout design
• Multi-Vt, multi-vdd
Virtuoso and Innovus on the OpenAccess unified design database
• MS floorplanning: better area
• Full timing model: better QoR
• Constraints passing: productivity
• Effective MS ECO: TAT
Results
• Reduced die size of small MCU design by additional 15%
• Eliminated manual iterations (TAT from weeks to hours)

Innovus Implementation System: Fully-automated mixed-signal timing analysis
• Full physical layout (routing + inst)
• STA (timing + signal integrity)
• No need for AMS block-level .lib
• Fully automatic identification and flattening of digital paths within MS hierarchy
• Mixed-signal global timing debug
• Significant reduction in iterations between analog and digital design teams
(Figure labels: analog; digital; FTM; digital P&R logic two levels down; hand-placed std. cells five levels down the physical hierarchy; custom digital logic five levels down the physical hierarchy)
Innovus Implementation System: Mixed-signal floorplanning enhancements
New pin-constraint interoperability
• AoT flow and DoT flow between Virtuoso® and Innovus™ through OA
• Pins pre-assigned by Virtuoso are kept; pin constraints (pin spacing, layer, side, etc.) for unassigned pins are passed through the OA view
• Unassigned pins get assigned by Innovus based on those constraints
• New interoperable pinGroupGuides (pin grouping, pin location, pin spacing, layer, side, order, exclusivity, etc.)
• Gives users control over which pins can and cannot be modified and what kinds of modifications are allowed
• Eliminates manual error-prone iterations, reduces TAT from weeks to hours
(Figure legend: pre-assigned pins; unassigned pins; analog, digital, and AMS blocks)

Agenda
• Dramatic TAT gains
  - Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
  - 10-20% better PPA
• Enhanced planning and exploration
  - Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
  - Common UI, 16/14/10nm readiness

Innovus hierarchical flow
Next-generation technology for 1B+ gate design
• SoC exploration: SoC architecture info. (SAI) flow
  - Inputs: reference unit gate, reference flop, macros, IP, memories, floorplan dimension, clocks, etc.
  - No/partial netlist
  - Floorplan/timing feasibility
  - Bus/power planning/pipelining
  - Congestion/utilization
  - Months to days; 500M gates in 2 days
• Design planning and prototyping: FlexModel-based hier. partitioning flow
  - Block partitioning
  - Shape generation
  - Feedthrough and pin assignment
  - Virtual opt
  - Timing budgeting
• Block closure
• Top-level assembly and closure: FlexILM flow
  - Concurrent top-block opt. for interface paths
  - Auto block update
  - Maintains block reg2reg timing
  - One-pass top-block closure

Agenda
• Dramatic TAT gains
  - Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
  - 10-20% better PPA
• Enhanced planning and exploration
  - Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
  - Common UI, 16/14/10nm readiness

Innovus Implementation System drives productivity
Integrated signoff for faster design closure
• Implementation with signoff analysis engines: Voltus power integrity, Tempus timing and SI, Quantus extraction
• Tempus™/Voltus™/Quantus™ analysis-driven design convergence through to GDSII
Common UI: Genus™, Innovus™, signoff
• Uniform commands across tools
• Uniform new GUI across tools
• Automated flow and metrics
• Uniform database access
• Uniform reports and logs
• Common initialization commands
• Robust reporting and visualization
• Improved ease-of-use and designer productivity
• Consistent RTL-to-signoff reporting and management

Summary
• Full-flow speedup: up to 10X TAT/capacity gain
• Industry-best PPA: 10-20% better PPA
• Massively parallel
• Integrated signoff
• Ease of use
• Production proven: 16/14/10nm and established nodes
A new era of 5-10M+ instance block implementation

© 2015 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, Virtuoso, and the Cadence logo are registered trademarks and CCOpt, Genus, GigaPlace, GigaOpt, Innovus, Quantus, Tempus, and Voltus are trademarks of Cadence Design Systems. ARM and Cortex are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.