Innovus Implementation System

August 2015
Customer design challenge
• Increasing power, performance, and area (PPA) demand
• Too many blocks and long turnaround time (TAT)
• Conflicting design objectives: TAT vs. PPA
Traditional solutions have addressed only one objective: TAT or PPA.
Source: AnandTech
© 2015 Cadence Design Systems, Inc. All rights reserved.
Cadence Full-flow Digital Solution
Traditional flow: separate engines at each step (synthesis, placer, Opt, CTS, router, extract, timing, power) before signoff
Cadence full-flow digital solution (synthesis, implementation, signoff):
• Unified engines
  – Unified placement engine
  – Unified CTS, global router
  – Unified timing/power/extract
• Core PPA algorithms
  – Best-in-class PPA optimization
  – Early signoff opt.
• Full-flow correlation and design convergence: reduced iterations
• Massively parallel
Result: 10-20% better PPA; up to 10X TAT/capacity gain
Introducing "Innovus Implementation System"
• Massively parallel: full-flow speedup, up to 10X TAT/capacity gain
• Industry-best PPA: 10-20% better PPA
• Integrated signoff
• Ease of use
• Production proven: 16/14/10nm and established nodes
A new era of 5-10M+ instance block implementation
Agenda
• Dramatic TAT gains
– Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
– 10% to 20% better power, performance, and area (PPA)
• Enhanced planning and exploration
– Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
– Common user interface, 16/14/10nm readiness
Innovus Implementation System improves design schedule
• Effectively handles large blocks: 5-10M+ instance blocks
• Reduces total number of blocks: from 150 to 25-50 blocks
• Improves SoC design schedule: weeks/months saved
Massively parallel: 5-10X TAT/capacity gain from
• Core algorithm speedup
• Full-flow multithreading
• Distributed network processing
• Multi-scenario (MMMC) acceleration
Production-proven speedup with Innovus Implementation System
TAT speedup with Innovus™ Implementation System vs. reference flow (chart; runtime axis in hours):
Design size    Node    Speedup
9.3M cells     28nm    9.7X
2.8M cells     28nm    7X
3.1M cells     28nm    6.9X
5.5M cells     16nm    6.1X
1.5M cells     16nm    5.2X
1.5M-cell graphics processor
• 16nm, 8 CPUs
• 8 views: 5 setup, 3 hold
• High performance, AOCV
• 70% utilization
Total runtime (chart, hours): reference flow 150, Innovus 29
Flow steps shown: place, prects, cts, postcts_hold, route, postroute, postroute_hold
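The 5.2X figure for this design in the speedup chart follows directly from the two total runtimes. A minimal sketch of the arithmetic, assuming the 150-hour bar is the reference flow and the 29-hour bar is Innovus; the function name is illustrative.

```python
def speedup(reference_hours: float, innovus_hours: float) -> float:
    """Runtime ratio of the reference flow to Innovus (higher means faster)."""
    if reference_hours <= 0 or innovus_hours <= 0:
        raise ValueError("runtimes must be positive")
    return reference_hours / innovus_hours

# Total full-flow runtimes from the 1.5M-cell graphics processor chart
print(f"{speedup(150, 29):.1f}X")  # 5.2X
```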
2.8M-cell networking IP
• 28nm, 8 CPUs
• 2 setup views, 2 hold views
• 800MHz
Total runtime (chart, hours): reference flow 336, Innovus 48 (1.4M instances per day)
Flow steps shown: init, place, prects, cts, postcts_hold, route, postroute, postroute_hold
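The 1.4M instances per day figure is consistent with 2.8M cells completing the full flow in 48 hours, and the 336-hour reference run gives the 7X speedup shown in the chart. A quick check, assuming "per day" means a 24-hour wall-clock day; the helper name is illustrative.

```python
def instances_per_day(instances: float, runtime_hours: float) -> float:
    """Average full-flow throughput in instances per 24-hour day."""
    return instances / (runtime_hours / 24.0)

innovus = instances_per_day(2.8e6, 48)     # 2.8M cells in 48 hours
reference = instances_per_day(2.8e6, 336)  # same design in 336 hours
print(f"Innovus: {innovus / 1e6:.1f}M/day, reference: {reference / 1e6:.1f}M/day")
# Innovus: 1.4M/day, reference: 0.2M/day
```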
NanoRoute Advanced Digital Router
Massively parallel; unique pipelining for scalability
Elapsed time, 4 CPUs vs. 96 CPUs:
Design    4 CPUs      96 CPUs    Ratio
1         3:30:36     0:26:54    7.8
2         3:55:39     0:28:04    8.4
3         5:50:28     0:52:08    6.7
4         6:52:25     0:35:24    11.7
5         25:08:27    1:24:55    17.8
6         40:45:57    3:38:20    11.2
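The Ratio column is the 4-CPU elapsed time divided by the 96-CPU elapsed time. A short sketch that recomputes it from the H:MM:SS values in the table; the function and variable names are illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an elapsed time written as H:MM:SS into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# (elapsed time on 4 CPUs, elapsed time on 96 CPUs) for designs 1-6
runs = [
    ("3:30:36", "0:26:54"),
    ("3:55:39", "0:28:04"),
    ("5:50:28", "0:52:08"),
    ("6:52:25", "0:35:24"),
    ("25:08:27", "1:24:55"),
    ("40:45:57", "3:38:20"),
]

ratios = [round(to_seconds(t4) / to_seconds(t96), 1) for t4, t96 in runs]
print(ratios)  # [7.8, 8.4, 6.7, 11.7, 17.8, 11.2]
```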
Design example:
# Instances    # Nets     Die size (# Gcells)    Gcell size
4622735        4486830    10153806               15x15

# CPUs        Elapsed time    Speedup
4             8:31:06         (baseline)
8             4:47:33         1.78X
16            2:48:08         3.04X
32            1:45:00         4.87X
64 (4x16)     0:58:10         8.81X
112 (7x16)    0:39:00         13X
Reported throughput: 4.5M nets/hour (10M gates/hour) and 7M nets/hour (17M gates/hour)
Only router in the industry that can run on 100+ CPU machines
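The scaling table can be read as speedup relative to the 4-CPU run, and dividing by the CPU-count ratio gives parallel efficiency (1.0 would be perfectly linear). A minimal sketch using the table's times and net count; the names are illustrative. Recomputing from the listed times gives about 8.79X at 64 CPUs against the 8.81X printed, and roughly 4.6M and 6.9M nets/hour at 64 and 112 CPUs, close to the reported 4.5M and 7M.

```python
def to_seconds(hms: str) -> int:
    """Convert an elapsed time written as H:MM:SS into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

NETS = 4_486_830                    # "# Nets" for the example design
BASE_CPUS, BASE_TIME = 4, "8:31:06"
RUNS = {8: "4:47:33", 16: "2:48:08", 32: "1:45:00",
        64: "0:58:10", 112: "00:39:00"}

def scaling(cpus: int, elapsed: str) -> tuple[float, float, float]:
    """Return (speedup vs. 4 CPUs, parallel efficiency, nets routed per hour)."""
    t = to_seconds(elapsed)
    speedup = to_seconds(BASE_TIME) / t
    efficiency = speedup / (cpus / BASE_CPUS)  # fraction of linear scaling
    nets_per_hour = NETS / (t / 3600)
    return speedup, efficiency, nets_per_hour

for cpus, elapsed in RUNS.items():
    s, e, n = scaling(cpus, elapsed)
    print(f"{cpus:>3} CPUs: {s:5.2f}X  efficiency {e:4.0%}  {n / 1e6:.1f}M nets/hour")
```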
Complex, tough-to-close advanced-node design
Leading broadband and wireless company
• Previous vendor flow, multiple small blocks: 28nm, 3-4 blocks, 7-15 days TAT
• Cadence full-flow digital solution, bigger blocks: 16nm, 1 block, 5-7 days TAT
• 5-10% area reduction; 3-4X gain in productivity

                    Original 28nm flow                 Cadence 16nm flow
Block size          1-2M instances                     5-7M instances
Number of blocks    3-4                                1
Power, area         Inefficiencies due to hierarchy    Flat, for best area and power
Performance         Boundary timing closure issues     Flat timing
Productivity        Machines, resources                3-4X productivity benefits
Agenda
• Dramatic TAT gains
– Full flow, CPU-efficient improvements
• Industry-leading power, performance, and area
– 10-20% better PPA
• Enhanced planning and exploration
– Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
– Common UI, 16/14/10nm readiness
Innovus technology
Industry's best PPA: 10-20% better PPA
• GigaPlace™ next-generation placement
  – Slack-driven, layer-aware, fully analytic
• GigaOpt™ power-driven optimization
  – All GigaOpt transforms made power-aware
  – Minimizes leakage, internal, and switching power
• Advanced CCOpt™ and slack-driven routing
  – Flex H-tree improves cross-corner variation (figure: H-tree, FlexH, regular CTS tree)
  – Slack-driven routing reduces SI TNS
Innovus GigaPlace tool
Next-generation placement technology
• Analytical placement engine: concurrent, multi-objective, massively parallel algorithm balancing congestion, wirelength, and slack
• Electrical-driven (slack/MMMC/skew/power)
• Physical-driven (topology/layer/color/pin-access)
• Optimization-driven (gate sizing/buffering)
• Integrated and correlated with Tempus™ and GigaOpt™ tools
• Advanced-node (16/14/10nm) color-aware technology
Better PPA and utilization, and faster design closure:
• 2X better TNS
• 5% better wirelength
• 5% better leakage
• 3% better utilization
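The slide describes placement driven by wirelength, slack, and congestion together, without giving GigaPlace's cost function. As a generic illustration only, the sketch below uses half-perimeter wirelength (HPWL), the usual wirelength proxy in analytical placers, and scales up the wirelength of timing-critical nets (net weighting). The weight formula, the `w_timing` value, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Net:
    pins: list[tuple[float, float]]  # (x, y) pin locations in microns
    slack_ns: float                  # worst slack on the net (negative = violating)

def hpwl(net: Net) -> float:
    """Half-perimeter wirelength: width + height of the pins' bounding box."""
    xs = [x for x, _ in net.pins]
    ys = [y for _, y in net.pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def placement_cost(nets: list[Net], w_timing: float = 4.0) -> float:
    """Hypothetical slack-weighted wirelength cost.

    Nets with negative slack have their HPWL multiplied by
    (1 + w_timing * |slack|); nets with positive slack keep weight 1.
    """
    total = 0.0
    for net in nets:
        weight = 1.0 + w_timing * max(0.0, -net.slack_ns)
        total += weight * hpwl(net)
    return total

nets = [Net([(0, 0), (10, 5), (4, 8)], slack_ns=0.2),  # HPWL 18, weight 1
        Net([(2, 2), (6, 3)], slack_ns=-0.25)]         # HPWL 5, weight 2
print(placement_cost(nets))  # 28.0
```

A placer minimizing this cost pulls the pins of the violating net closer together at the expense of non-critical wirelength, which is the basic trade a slack-driven placer makes.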
GigaPlace QoR benefits
• Better TNS/WNS (slack-driven):
  – Default: WNS -0.274ns, TNS -870ns
  – With guide: WNS -0.289ns, TNS -637ns
  – Slack-driven: WNS -0.266ns, TNS -561ns
• Better wirelength (topology-driven)
• Better congestion and better spreading (physical-driven, auto-density screening; figure shows placement around Macro1 and Macro2)
GigaPlace slack-driven placement
Test case results, preCTS:
                      WNS       TNS        VP      Density    Cong.    Leakage    % LSL
Existing placement    -0.056    -29.999    1995    83%        0.79%    0.96mW     0%
GigaPlace™            -0.021    -2.548     276     73%        0.73%    0.84mW     0%
postCTS (WNS, TNS, and VP given as r2r / I/O; HWNS, HTNS, and HVP for r2r):
Metric     Existing placement    GigaPlace
WNS        -0.032 / -0.157       0.0 / -0.069
TNS        -4.691 / -26.227      0.0 / -2.480
VP         412 / 1147            0 / 84
HWNS       -0.071                -0.169
HTNS       -256.044              -81.916
HVP        45917                 16617
Density    85.8                  76.5
# DRC      650                   496
Leakage    2.12mW                1.35mW
% LSL      7.1%                  2.5%

postRoute (same columns):
Metric                Existing placement    GigaPlace
WNS                   -0.16 / -0.27         -0.04 / -0.14
TNS                   -222.7 / -398.0       -7.2 / -76.6
VP                    7765 / 11659          1068 / 2775
HWNS / HTNS / HVP     diverging             -0.19 / -332.4 / 12672
Density               86                    77.3
# DRC                 324                   69
Leakage               2.02mW                1.32mW
% LSL                 6.6%                  2.7%

LSL: high speed/high leakage cells
GigaOpt power-driven optimization
GigaOpt™ concurrent power-driven optimization
• Static power: leakage
• Dynamic power: internal and switching, where switching power is
  $P_{\mathrm{trans}} = \tfrac{1}{2} f_{\mathrm{clk}} C V^2$
• All GigaOpt optimization transforms made power-aware
• Minimizes leakage, internal, and switching power
• Avoids local minima to achieve globally optimal design PPA
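The switching-power term can be evaluated directly from the slide's formula. A minimal sketch; the 1 GHz, 10 fF, and 0.8 V operating point is an illustrative assumption, not a figure from the slide. Because power scales with V², a 10% supply reduction leaves about 81% of the original switching power.

```python
def switching_power(f_clk_hz: float, cap_farads: float, vdd_volts: float) -> float:
    """Switching power from the slide's formula: P_trans = f_clk * C * V^2 / 2."""
    return f_clk_hz * cap_farads * vdd_volts ** 2 / 2

# Illustrative operating point: 1 GHz clock, 10 fF switched capacitance, 0.8 V supply
p_nominal = switching_power(1e9, 10e-15, 0.8)
p_scaled = switching_power(1e9, 10e-15, 0.72)   # same net with a 10% lower supply
print(f"{p_nominal * 1e6:.2f} uW -> {p_scaled * 1e6:.2f} uW "
      f"({p_scaled / p_nominal:.0%} of nominal)")
```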
GigaOpt power-driven optimization
Dynamic power reduction transforms:
• Sequential cell downsize
• Gate composition transform
• Pin swap transform
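A pin swap exploits logically equivalent inputs that have different pin capacitances. The slide does not describe GigaOpt's implementation; as a generic sketch, assuming switched capacitance is approximated by net toggle rate times pin capacitance, pairing the busiest net with the lowest-capacitance pin minimizes that sum (rearrangement inequality). Pin names, capacitances, and activities are hypothetical, and a production transform would also check timing, since equivalent pins usually have different delays.

```python
def pin_swap(net_activity: dict[str, float], pin_cap_ff: dict[str, float]) -> dict[str, str]:
    """Assign nets to logically equivalent input pins to minimize sum(activity * cap)."""
    if len(net_activity) != len(pin_cap_ff):
        raise ValueError("need one net per swappable pin")
    nets = sorted(net_activity, key=net_activity.get, reverse=True)  # busiest first
    pins = sorted(pin_cap_ff, key=pin_cap_ff.get)                     # smallest cap first
    return dict(zip(nets, pins))

def switched_cap(assignment: dict[str, str], net_activity: dict[str, float],
                 pin_cap_ff: dict[str, float]) -> float:
    """Activity-weighted input capacitance (fF per cycle) for an assignment."""
    return sum(net_activity[n] * pin_cap_ff[p] for n, p in assignment.items())

activity = {"n_clk_en": 0.05, "n_data": 0.40}  # toggles per cycle (hypothetical)
caps = {"A1": 1.2, "A2": 0.9}                   # pin capacitance in fF (hypothetical)
best = pin_swap(activity, caps)
print(best)  # {'n_data': 'A2', 'n_clk_en': 'A1'}
```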
Balancing leakage to dynamic priority
User control to tune the internal cost function to balance between leakage and dynamic power
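The slide says only that a user control tunes the balance between leakage and dynamic power. One simple way to express such a knob, shown purely as an assumption, is a weighted sum used to rank candidate implementations; the weight name, the cell names, and the power numbers are hypothetical.

```python
def weighted_power(leakage_uw: float, dynamic_uw: float, leakage_weight: float) -> float:
    """Hypothetical cost: blend leakage and dynamic power with a weight in [0, 1]."""
    if not 0.0 <= leakage_weight <= 1.0:
        raise ValueError("leakage_weight must be between 0 and 1")
    return leakage_weight * leakage_uw + (1.0 - leakage_weight) * dynamic_uw

def pick_cell(candidates: dict[str, tuple[float, float]], leakage_weight: float) -> str:
    """Return the candidate with the lowest blended power cost."""
    return min(candidates,
               key=lambda name: weighted_power(*candidates[name], leakage_weight))

# (leakage uW, dynamic uW) per candidate implementation; values are made up
cells = {"cell_a": (0.50, 2.0), "cell_b": (0.05, 2.6)}
print(pick_cell(cells, 0.9))  # cell_b (leakage-dominated cost)
print(pick_cell(cells, 0.1))  # cell_a (dynamic-dominated cost)
```

Moving the weight shifts which implementation wins, which is the kind of trade-off the slide's user control is described as steering.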
NanoRoute TrackOpt
Flow: Netlist → common timing engine (GigaPlace™ placer, GigaOpt™, CCOpt™, NanoRoute™: track assignment, TrackOpt, detail route, post-route opt.) → signoff → GDS
• TrackOpt fixes SI issues before detail route
  – Reduces the timing jump between pre-route and post-route
  – Allows changes in netlist and cell locations
• Full detail route on all nets longer than 1 gcell
• Clean for shorts and DPT, but not every complex rule
• 95% correlation for SI timing with full detail route
• Tightly integrated: better PPA from SI opt. and post-route opt.
• Little impact on flow runtime and routability
Innovus Implementation System: Saving battery life
1.6M-cell mobile-computing 16nm block with built-in DSP
Challenge:
• Aggressive dynamic power target
• 15+ power domains
• DVFS, clock gating, power shutoff
Full-flow power-driven optimization, dynamic power reduction (mW): 4.2% better (combinational), 13% better (sequential)
Fully automated IEEE 1801-enabled Innovus™ flow
Achieved 11% better total power (mW/MHz) vs. competition
Clock concurrent optimization (CCOpt)
Natively integrated CCOpt and full-flow CCOpt CTS
• Flow: Netlist → common timing engine (GigaPlace™ placer, GigaOpt™, CCOpt™, NanoRoute™) → GDS; CCOpt spans clock tree synthesis, clock/datapath opt., and post-route clock ECO
• Native CCOpt TAT: 1.5X better on average and up to 3X (chart of TAT in hours across designs)
• 2-3X faster vs. scripted
• Better hold awareness, fence regions, halo, and multi-corner support
Flex H-Tree
Traditional H-tree
• Advantages
  – Good cross-corner scaling behavior
  – Balanced by construction
• Disadvantages
  – Needs power-of-two sinks
  – Needs rectangular unblocked area
  – Higher power than ad-hoc CTS tree
Cadence Flex H-tree
• Any number of sinks in any arrangement
• Non-rectangular floorplans with multiple blockages supported
• Intelligent tradeoffs made between skew and power
Figure: H-tree, FlexH, and regular CTS tree; Flex H-tree improves cross-corner variation
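The power-of-two restriction follows from how a classic H-tree is built: each level splits every branch in two, so a tree of depth d has exactly 2^d leaves, all at the same wire distance from the root. A minimal sketch of that classic construction over an assumed 100 x 100 region (not Cadence's Flex H-tree, whose algorithm is not described here); all function names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True when n sinks can be driven one-per-leaf by a classic binary H-tree."""
    return n > 0 and n & (n - 1) == 0

def taps_needed(sinks: int) -> int:
    """Smallest power-of-two leaf count that covers the given number of sinks."""
    if sinks < 1:
        raise ValueError("need at least one sink")
    return 1 << (sinks - 1).bit_length()

def h_tree_taps(x: float, y: float, width: float, height: float, levels: int):
    """Leaf tap points of a classic binary H-tree centered at (x, y).

    Each level halves the region with alternating cuts and routes two
    equal-length branches to the half-region centers, so all 2**levels
    leaves are the same wire distance from the root.
    """
    if levels == 0:
        return [(x, y)]
    if width >= height:                      # cut left/right
        half = width / 2
        centers = [(x - half / 2, y), (x + half / 2, y)]
        sub = (half, height)
    else:                                    # cut bottom/top
        half = height / 2
        centers = [(x, y - half / 2), (x, y + half / 2)]
        sub = (width, half)
    return [tap for cx, cy in centers
            for tap in h_tree_taps(cx, cy, *sub, levels - 1)]

sinks = 12
leaves = taps_needed(sinks)                  # 16 leaves, so 4 would go unused
taps = h_tree_taps(50.0, 50.0, 100.0, 100.0, leaves.bit_length() - 1)
print(sinks, is_power_of_two(sinks), leaves, len(taps))  # 12 False 16 16
```

With 12 sinks, the classic tree still needs 16 equidistant taps, which illustrates the wasted structure and extra power listed under the disadvantages.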
Production-proven PPA advantage
Better PPA (20% on average) on high-performance CPU design benchmarks
Chart: PPA % gain for Design 1 (CPU @28nm), Design 2 (CPU @16nm), Design 3 (64-bit CPU @16nm), and Design 4 (CPU @20nm)
Callouts on the chart:
• Exceeds 2GHz, converts to Cadence
• 18% better utilization
• 12% better power, converts to Cadence
• 17% better power
Innovus Implementation System: Higher performance cores
Multiple 16nm CPU blocks
• Exceeded frequency targets by 10% on all blocks
• Automated net-weighting, region guides, and pipeline placement
• Automated CCOpt™ multipoint CTS optimization
• Reduced customization
Chart: achieved frequency vs. target over the timeline to tapeout, old flow vs. Innovus (marks at 85%, 100%, and 110%)
Innovus Implementation System: Productive mixed-signal design
32-bit microcontroller (MCU) mixed-signal RISC design
Challenge:
• Smaller footprint for IoT
• Digital and analog distributed throughout the design
• Multi-Vt, multi-VDD
Virtuoso and Innovus on the OpenAccess unified design database:
• MS floorplanning: better area
• Full timing model: better QoR
• Constraints passing: productivity
• Effective MS ECO: TAT
Reduced die size of the small MCU design by an additional 15%
Eliminated manual iterations (TAT from weeks to hours)
Innovus Implementation System: Fully-automated mixed-signal timing analysis
• Digital P&R logic two levels down the physical hierarchy
• Hand-placed std. cells and custom digital logic five levels down the physical hierarchy
• Full physical layout (routing + instances) and STA (timing + signal integrity)
• No need for AMS block-level .lib
• Full timing model (FTM) across analog and digital
• Fully automatic identification and flattening of digital paths within the MS hierarchy
• Mixed-signal global timing debug
Significant reduction in iterations between analog and digital design teams
Innovus Implementation System: Mixed-signal floorplanning enhancements
New pin-constraint interoperability between Virtuoso® and Innovus™ through the OpenAccess (OA) view, in both the AoT (analog-on-top) and DoT (digital-on-top) flows
• Pre-assigned pins are placed by Virtuoso
• Pin constraints (side, layer, etc.) for unassigned pins are passed through the OA view
• Innovus assigns the remaining pins based on those constraints (pin spacing, layer, side, etc.)
New interoperable pinGroupGuides (pin grouping, pin location, pin spacing, layer, side, order, exclusivity, etc.)
• Give users control over which pins can and can't be modified and what kinds of modifications are allowed
• Eliminate manual, error-prone iterations; reduce TAT from weeks to hours
Agenda
• Dramatic TAT gains
– Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
– 10-20% better PPA
• Enhanced planning and exploration
– Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
– Common UI, 16/14/10nm readiness
Innovus hierarchical flow
Next-generation technology for 1B+ gate design
• SoC exploration: SoC architecture info (SAI) flow, using reference unit gate, reference flop, macros/IP, memories, floorplan dimension, clocks, etc.
• Design planning and prototyping: FlexModel-based hierarchical partitioning flow
  – No/partial netlist, virtual opt., timing budgeting
  – Floorplan/timing feasibility, congestion/utilization
  – Bus/power planning/pipelining, block partitioning, shape generation, feedthrough and pin assignment
  – Maintains block reg2reg timing
  – 500M gates in 2 days; months to days
• Block closure
• Top-level assembly and closure: FlexILM flow
  – Concurrent top-block opt. for interface paths
  – Auto block update
  – One-pass top-block closure
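Timing budgeting assigns each block a share of the timing on paths that cross partition boundaries. The slide does not specify the method; a minimal proportional scheme, assuming per-segment delay estimates are available from the prototype, looks like this. Segment names and numbers are hypothetical, and real budgeting may also add margins or account for clock skew and I/O constraints.

```python
def budget_path(clock_period_ns: float, segment_delays_ns: dict[str, float]) -> dict[str, float]:
    """Split a clock period across path segments in proportion to estimated delay."""
    total = sum(segment_delays_ns.values())
    if total <= 0:
        raise ValueError("estimated delays must be positive")
    return {seg: clock_period_ns * d / total for seg, d in segment_delays_ns.items()}

# Interface path: launch in block A, through top-level routing, capture in block B
estimates = {"blockA_out": 0.30, "top_route": 0.20, "blockB_in": 0.50}
budgets = budget_path(1.25, estimates)
print({seg: round(b, 3) for seg, b in budgets.items()})
# {'blockA_out': 0.375, 'top_route': 0.25, 'blockB_in': 0.625}
```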
Agenda
• Dramatic TAT gains
– Full flow, CPU-efficient improvements
• Industry-leading power/performance/area
– 10-20% better PPA
• Enhanced planning and exploration
– Enabling billion-gate designs and fast PPA trials
• Usability improvements and advanced-node readiness
– Common UI, 16/14/10nm readiness
Innovus Implementation System drives productivity
Integrated signoff for faster design closure
• Implementation and signoff analysis engines (Tempus™ Timing and SI, Voltus™ Power Integrity, Quantus™ Extraction) drive design convergence through to GDSII
• Tempus/Voltus/Quantus analysis-driven convergence
• Robust reporting and visualization
Common UI: Genus™, Innovus™, signoff
• Uniform commands across tools
• Uniform new GUI across tools
• Uniform database access
• Uniform reports and logs
• Common initialization commands
• Automated flow and metrics
• Consistent RTL-to-signoff reporting and management
Improved ease-of-use and designer productivity
Summary
• Massively parallel: full-flow speedup, up to 10X TAT/capacity gain
• Industry-best PPA: 10-20% better PPA
• Integrated signoff
• Ease of use
• Production proven: 16/14/10nm and established nodes
A new era of 5-10M+ instance block implementation
© 2015 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, Virtuoso, and the Cadence logo are registered trademarks
and CCOpt, Genus, GigaPlace, GigaOpt, Innovus, Quantus, Tempus, and Voltus are trademarks of Cadence Design Systems. ARM and
Cortex are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved.