Lecture 3 - The University of Texas at Austin

EE-382M-8 VLSI–II
Early Design Planning: Back End
Mark McDermott
EE 382M-8 VLSI-2
Backend EDP Flow
• The project activities will include:
– Determining the standard cell and custom library elements needed to complete the design with APR tools.
– A detailed floorplan of the block-level components.
– A reasonably detailed top-level floorplan using the cluster abstracts.
– Approximate clock routing at the top level
– Approximate power/ground routing at the top level
EDP and Layout in the Design Flow
[Figure: design flow diagram. Labels: Concept, Architecture, Technology Readiness, EDP, uArchitecture, Logic, Front End Development, Circuits, Layout, Backend Design Execution, Silicon Ramp, Si Debug, Production]
EDP encompasses planning from architecture to the layout.
Standard Cell Library Effort
• Will be using a very minimal standard cell library for the project: ~80+ cells
– Basic logic gates and buffers
– 1 set-reset flip-flop
• The “CMOS65_SubVt.lib” file was derived using a scaled 65nm .lib file
– Need to validate the scaled numbers with HSPICE simulations.
– Need to validate power spreadsheet numbers using HSPICE:
• S-D leakage currents
• Intrinsic power
– Need to validate area spreadsheet numbers
Block Floorplanning Effort
• Objectives:
– Minimize area
– Determine best shape of the block
– Minimize total wire length
• Each team will do a detailed floorplan of their respective blocks. The output will be a spreadsheet analysis showing the contribution from each of the following:
– Power grid
– Clocking
– Signal Routing
– Datapath area
– Random logic area
– White space
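The spreadsheet analysis described above can be prototyped in a few lines before any real numbers exist. The sketch below is illustrative only: the per-category areas are made-up placeholders, and `area_breakdown` is a hypothetical helper, not part of the course flow.

```python
# Hypothetical per-category areas for one block, in square microns.
# These values are placeholders, not measured data.
block_area_um2 = {
    "Power grid": 1200.0,
    "Clocking": 800.0,
    "Signal routing": 2500.0,
    "Datapath area": 6000.0,
    "Random logic area": 3500.0,
    "White space": 1000.0,
}


def area_breakdown(areas):
    """Return each category's share of the total block area, in percent."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {name: 100.0 * area / total for name, area in areas.items()}


if __name__ == "__main__":
    shares = area_breakdown(block_area_um2)
    # Print the largest contributors first, as a spreadsheet sort would.
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:<18} {block_area_um2[name]:>9.1f} um^2 {pct:5.1f} %")
```

Sorting by share makes it easy to see whether a block is dominated by routing, logic, or white space.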
Integration Effort
• The integration team will be responsible for:
– Doing a floorplan of the top level of the chip
– Characterizing the top-level routing delays and determining the assertions and constraints for each cluster. They will be working with each cluster to optimize the constraints.
– Designing the clock routing structure:
– Determining the clock generation implementation (block diagrams)
– Determining the clock regeneration circuitry (block diagrams)
– Determining the reset logic.
– Designing the power grid.
– Determining the power estimation for the global clock and signal routing.
– Generating the power budget for each cluster.
– Generating the area budget for each cluster.
Layout Implementation Options
SPARC-T1
Layout Density & Die Size = Performance
[Figure: schematic and floorplan views of blocks A, B and C. Layout #1 places A, B, C; Layout #2 places A, B’, C]
The layout of Block B affects the timing of the path from A to C
• Higher density layout leads to smaller block sizes
• Smaller block sizes lead to shorter wires
• Shorter wires can lead to higher frequency
• Shorter wires can also lead to higher IPC by requiring fewer transmission pipe stages
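To see why shorter wires help both frequency and pipe-stage count, a first-order estimate is enough. The sketch below uses the standard distributed-RC approximation for an unbuffered wire (50% delay of about 0.38 times total resistance times total capacitance), which is not taken from the slides. The per-millimeter parasitics and the function names are hypothetical placeholders. Real designs insert repeaters, which makes delay closer to linear in length, so treat this as a pessimistic illustration.

```python
import math

# Hypothetical wire parasitics (placeholders, not from a real process).
R_PER_MM = 200.0      # ohms per mm
C_PER_MM = 0.2e-12    # farads per mm


def wire_delay_s(length_mm, r_per_mm=R_PER_MM, c_per_mm=C_PER_MM):
    """50% delay of an unbuffered distributed RC wire: 0.38 * R_total * C_total."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm)


def cycles_needed(length_mm, cycle_s):
    """Clock cycles needed to cross the wire (at least one)."""
    return max(1, math.ceil(wire_delay_s(length_mm) / cycle_s))


if __name__ == "__main__":
    cycle = 0.5e-9  # assumed 2 GHz clock
    for length in (10.0, 5.0):
        delay_ns = wire_delay_s(length) * 1e9
        print(f"{length:4.1f} mm: {delay_ns:.2f} ns, {cycles_needed(length, cycle)} cycle(s)")
```

Because both resistance and capacitance scale with length, halving the route cuts the unbuffered delay to a quarter, which is what lets a denser layout drop transmission pipe stages.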
Layout Implementation Options
• Synthesis – Random Logic Macro (RLM)
– Cell layout comes from a shared cell library
– Automated cell selection and placement
– Automated routing between cells
• Structured Custom (SC/SDP)
– Cell layout comes from a shared cell library
– Manual cell selection and placement
– Automated routing between cells
• Custom Design (CD)
– Cell layout is unique for each application
– Manual cell selection and placement
– Manual routing between cells
Increasing design effort (and density): RLM, then SC/SDP, then CD
Layout Implementation Options
A = Automatic, M = Manual
                     CD   SC   RLM
RTL Coding           M    M    M
Logic Minimization   M    M    A
Cell Placement       M    M    A
Device Sizing        M    A    A
Layout               M    A    A

              CD      SC       RLM
Timing        Best    Better   Worst
Density       Best    Better   Worst
Design Time   Worst   Better   Best
• RLM saves time in circuit design and layout
• SC saves time in layout.
• RLM and SDP make revisions easier.
Datapath and Block Floorplanning Procedures
MIPS R10K
Datapath and Block Floorplanning Procedure
• Step 1 - Identify feedthrus for RLM or SC/DP block
• Step 2 - Look for opportunities for track sharing
• Step 3 - Define the bitpitch of the block
• Step 4 - Review the metal plan within the cell
• Step 5 - Review and plan the clock routing and placement
• Step 6 - Plan the critical cell placement locations
• Step 7 - Estimate the area of the cells and the block
• Step 8 - Review the power grid
Feed-through or Over-the-cell (OTC) Routes
• Metal tracks routed over an RLM, datapath or custom block
• The block is neither the driver nor a receiver of the signals
• Feedthrus use up metal tracks, which impacts the internal signals of the block
• Carefully review datapath connectivity to account for them
[Figure: datapath stack with Sources, Receiver, Driver, Results, Bypass, ALU 0, ALU 1 and ALU 2, showing feedthrus for ALU0]
Step 2: Track Sharing
• Minimizes the number of unique tracks in layout by opportunistically sharing tracks where possible
• Often allows for the smallest possible bitpitch
• Allows for metal layers to be more efficiently utilized
• Can help improve performance by shortening distances
• Should always be explored to improve layout efficiency and performance
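One way to reason about track sharing is to treat each net as an interval along the routing direction: nets whose intervals do not overlap can sit on the same track. The sketch below uses the classic greedy left-edge packing from channel routing, which is a stand-in technique and not something the slides prescribe. The net names and spans are hypothetical.

```python
def assign_tracks(nets):
    """Greedy left-edge packing.

    nets: dict of net name -> (start, end) span along the routing direction.
    Returns a list of tracks, each a list of net names whose spans do not overlap.
    """
    tracks = []      # net names placed on each track
    track_end = []   # rightmost occupied coordinate of each track
    for name, (start, end) in sorted(nets.items(), key=lambda kv: kv[1][0]):
        for i, last_end in enumerate(track_end):
            if start > last_end:          # fits after the last net on this track
                tracks[i].append(name)
                track_end[i] = end
                break
        else:
            tracks.append([name])         # no room anywhere: open a new track
            track_end.append(end)
    return tracks


if __name__ == "__main__":
    # Hypothetical nets and spans (arbitrary units along the datapath).
    example = {
        "bypass_a": (0, 40),
        "src_c": (30, 70),
        "result_b": (50, 90),
        "fwd_d": (75, 100),
    }
    for i, members in enumerate(assign_tracks(example)):
        print(f"track {i}: {members}")
```

In this example four nets fit on two tracks instead of four, which is the kind of saving that can shrink the bitpitch.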
Step 2: Track Sharing
First, check outside your block to see if there are any candidates for track sharing
[Figure: datapath stack with Sources, Receiver, Driver, Results, Bypass$, ALU 0, ALU 1 and ALU 2]
Step 2: Track Sharing
Next, check inside your block to see if there are any candidates for track sharing
[Figure: Metal 2 and Metal 4 routes for IE_BYC_DATA<11:0>, IE_RF_DATA<11:0>, LRBL<11:0> and RRBL<11:0>]
Step 2: Track Sharing Example
Bit Pitch Defining Width of Chip
AMD K5
Step 3: Define the Bitpitch
• Fixed cell width chosen to allow easy assembly
• Most often determined by metal usage within the datapath
• Integration efficiency would prefer one bitpitch per project
• Architectures lend themselves to more unique bit pitches
[Figure: bit slices A<4> through A<0>, each one bitpitch wide. The tracks within a bit are Vdd, Sig0 through Sig5, Vss, shown for bits <4> and <1>]
Step 3: Define the Bitpitch
Ensure all blocks in a datapath stack follow the same bitpitch
[Figure: datapath stacks labeled Bitpitch #1 (Xµ) and Bitpitch #2 (Yµ). Blocks shown: WB Mux, Shifter, Bit Ops, System Uops, AGEN - LD / STA, ALU 1, Arith Flags, ALU 0, Bypass Cache, Integer Register File]
Bit Pitch Example: 3:2 Adder Bit Cell
[Figure: 3:2 adder bit cell layout with a 7.56u bitpitch; metal labels M1, M3 and M4]
Bit Pitch Example: 4 Bit Cells stacked
[Figure: four bit cells stacked at a bitpitch of 7.56u]
Bit Pitch Example: Tiled Datapath
Bit Pitch Example: Swizzle
Avoid swizzle channels: don’t mix and match bit pitches
[Figure: swizzle channel]
As buses get wider and the number of tracks per bit gets higher, the cost of swizzle channels grows
Step 3: Define the Bitpitch
• Wider bit pitches allow more upper level metal usage
• Narrower bit pitches allow shorter routes for orthogonal signals
• Balancing these conflicting objectives can be difficult
• Understand your local constraints and be aware of the tradeoffs
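Since bitpitch is most often set by metal usage, a quick track count is a reasonable first check. The sketch below assumes a single uniform track pitch; the 0.9 um value is a hypothetical placeholder, and the function names are illustrative. It relates the number of tracks a bit needs to the minimum bitpitch, and the reverse.

```python
import math


def min_bitpitch_um(signal_tracks, power_tracks, track_pitch_um):
    """Smallest bitpitch that fits the given tracks side by side (metal-limited case)."""
    return (signal_tracks + power_tracks) * track_pitch_um


def tracks_available(bitpitch_um, track_pitch_um):
    """Whole routing tracks that fit within one bitpitch."""
    # The small epsilon guards against floating-point results such as 7.999999.
    return math.floor(bitpitch_um / track_pitch_um + 1e-9)


if __name__ == "__main__":
    pitch = 0.9  # hypothetical track pitch in um
    # Six signal tracks plus Vdd and Vss, as in the per-bit track list on the slide.
    print("min bitpitch:", min_bitpitch_um(6, 2, pitch), "um")
    print("tracks in a 7.56 um bitpitch:", tracks_available(7.56, pitch))
```

Feedthrus and shielded clocks consume tracks from the same budget, so they belong in the `signal_tracks` count when sizing the pitch.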
Metal Planning
• Metal layer, width, spacing and shielding are negotiable
– “Negotiable” means you have to plead your case to the integration leaders
• All of these impose a physical constraint for layout
• For your first attempt at convergence
– M1, M2: Local routing
– M3, M4, M5, M6: Data and control
– M7, M8: Power, Ground, Clock, Reset, etc.
– Assume all nets are routed in M1 & M2 within your block
– Assume your only shielding is on clocks and reset
– Assume the routes are minimum
Metal Flow Planning
Avoid bi-directional dataflow
[Figure: BAD and GOOD arrangements of control (Cntl) and data (Data) flow]
Shielding
• Intentionally routing signals to control the effective line-to-line capacitance seen during switching.
• Requires designers to constrain the physical assembly done by routing tools or physical design specialists (PDSs).
• Falls into one of three categories:
– Physical shielding - signals are routed next to a power rail
– Logical shielding - signals are routed next to logically related signals
– Temporal shielding - signals are routed next to temporally distinct signals
Miller Coupling Factor
[Figure: adjacent wires A, B and C shown for each case]
MCF = 2.0  Both against
MCF = 1.5  One against, one quiet
MCF = 1.0  Both quiet
MCF = 0.5  One with, one quiet
MCF = 0.0  Both with
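The five MCF values above are consistent with a simple rule: give each neighbor a factor of 0 when it switches with the victim, 1 when it is quiet, and 2 when it switches against, then average the two sides. The sketch below encodes that reading. It is an interpretation that reproduces the slide's table, not a definition given in the source, and the names are hypothetical.

```python
# Per-neighbor coupling factor relative to the victim's transition.
NEIGHBOR_FACTOR = {"with": 0.0, "quiet": 1.0, "against": 2.0}

ANY = ("with", "quiet", "against")   # unconstrained signal neighbor
RAIL = ("quiet",)                    # a power or ground rail never switches


def mcf(left, right):
    """Miller coupling factor for a wire with two neighbors (average of both sides)."""
    return (NEIGHBOR_FACTOR[left] + NEIGHBOR_FACTOR[right]) / 2.0


def mcf_range(left_states, right_states):
    """(min, max) MCF over every allowed combination of neighbor behavior."""
    values = [mcf(l, r) for l in left_states for r in right_states]
    return min(values), max(values)


if __name__ == "__main__":
    print("no shield  :", mcf_range(ANY, ANY))
    print("half shield:", mcf_range(RAIL, ANY))
    print("full shield:", mcf_range(RAIL, RAIL))
```

Restricting what a neighbor is allowed to do narrows the min/max range, which is the effect each shielding style is after.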
No Shielding
• Signals are routed next to any neighboring signals
• Neighbors can slow down (max delay) or speed up (min delay) signal transitions through line-to-line coupling
• Variation can create design problems
• Most signals will not be shielded
No Shield: Max MCF 2.0, Min MCF 0.0
[Figure: Sig A, Sig B and Sig C routed side by side]
Physical Shielding
• Signals are routed next to at least one power rail
• Helps both min delay and max delay
• Can be expensive in terms of metal usage
• Typically limited to most critical nets and clocks
Full Shield (Vss, Sig A, Vss): Max MCF 1.0, Min MCF 1.0
Half Shield (Vss, Sig A, Sig B): Max MCF 1.5, Min MCF 0.5
Logical Shielding
• Signals are routed next to mutually exclusive neighbors
• Also helps min delay and max delay
• Comparable results to physical shielding but at lower cost
• Encouraged in mux structures and arrays
Max MCF 1.5, Min MCF 1.0
[Figure: mux with inputs A, B and C; select lines Sel A, Sel B and Sel C routed side by side]
Temporal Shielding
• Signals are routed next to signals that limit aggressors
• Can help max delay or min delay or both
• Lower cost than physical shielding, but more design effort
• Encouraged wherever possible, but tricky
Max MCF 1.0, Min MCF 0.0
[Figure: signals A, B and C with clock Ck; Sig A, Sig B and Sig C routed side by side]
Shielding Gotcha
• Tools may rely on the designer to override the default coupling assumptions
Max MCF 2.0, Min MCF 0.0
If you need temporal shielding to make your circuit meet timing, your circuit doesn’t meet timing. Do not rely on it.
[Figure: two stages labeled L with signals A and B and clock Ck]
Variations of Clock Tree distribution networks
Target: Metallization and Gate topology uniformity
Tapered H-Tree
Clock Routing
• Watch out for the clock; it’s your most critical net
• Make sure the physical design treats it accordingly
• Help reduce clock power by eliminating unnecessary load
• Make sure the clock has enough via coverage
• Leave room for decoupling capacitors and upsizing
• Don’t forget to account for clock routing overhead (full shield) in your metal planning
Clock Routing
Avoid unnecessary clock load to save active power
[Figure: BAD and GOOD clock routes; the BAD route carries unnecessary load]
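The saving from removing unnecessary clock load can be estimated with the usual dynamic power relation for a net that charges and discharges once per cycle, P = C · V² · f. This is a textbook formula rather than something stated on the slide, and every number in the sketch below is a hypothetical placeholder.

```python
def clock_power_w(load_f, vdd_v, freq_hz):
    """Dynamic power of a net that charges and discharges once per cycle: C * V^2 * f."""
    return load_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    vdd = 1.0            # volts (assumed)
    freq = 1.0e9         # hertz (assumed)
    useful = 2.0e-12     # farads of load the clock must drive (assumed)
    wasted = 0.5e-12     # farads of avoidable wire and gate load (assumed)

    bad = clock_power_w(useful + wasted, vdd, freq)
    good = clock_power_w(useful, vdd, freq)
    print(f"BAD  route: {bad * 1e3:.2f} mW")
    print(f"GOOD route: {good * 1e3:.2f} mW")
    print(f"saved     : {(bad - good) * 1e3:.2f} mW")
```

Because the clock toggles every cycle, each femtofarad removed from it saves more power than the same capacitance on a typical data net.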
Power/Clock Grid
• Clock grid is interleaved between VDD and VSS on metal6
[Figure: two-port array floorplan with LCBs distributed around the Port0 and Port1 Input Data Latches, Port0 and Port1 Decoders, Bitcell Arrays, Port0 and Port1 Read/Write Ckts, and Port0 and Port1 Output Latches]
Clock Routing
Make sure there are enough vias to get power through the clock network
[Figure: INSUFFICIENT VIA COVERAGE versus SUFFICIENT VIA COVERAGE]
Clock Routing
Remember to count clocks as ~5-7 tracks in your wire planning!
[Figure: shielded clock route with Vdd, Clock and Vss wires; dimension labels 1x, 2x, 1x, 1.5x and 1.5x]
Be careful with gated clocks. Fine-grain clock gating tends to drastically increase the number of unique clocks, significantly increasing the metal usage. No tools catch this before layout.
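A quick budget shows how fast gated clocks eat routing tracks. The sketch below assumes one reading of the figure's labels: 1x-wide Vdd and Vss shields, a 2x-wide clock wire, and 1.5x spaces on each side of the clock. That assignment is a guess at the drawing, and the function names are hypothetical. It also assumes each unique clock gets its own pair of shields, so the result is an upper bound when neighboring clocks can share a rail.

```python
import math


def clock_tracks(wire_width_x=2.0, shield_width_x=1.0, space_x=1.5):
    """Track equivalents used by one clock shielded on both sides.

    All arguments are multiples of the minimum track dimension
    (an assumed reading of the figure, not a stated rule).
    """
    return shield_width_x + space_x + wire_width_x + space_x + shield_width_x


def metal_tracks_for_clocks(unique_clocks, tracks_per_clock):
    """Tracks consumed when every unique (gated) clock needs its own shielded route."""
    return math.ceil(unique_clocks * tracks_per_clock)


if __name__ == "__main__":
    per_clock = clock_tracks()
    for n in (1, 4, 8):
        print(f"{n} unique clock(s): {metal_tracks_for_clocks(n, per_clock)} tracks")
```

Comparing this total against the tracks available per bitpitch is one way to catch a gating scheme that will not fit before layout starts.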
Cell Placement
• Start with the critical path!
– Place cells to limit the wire load on the critical path
– Move less critical blocks out of the way
• Place clock generators to limit clock wire load
– Again, place most critical clock LCBs first if area is tight
– Ideally there should be minimal side loads
• Consider track sharing opportunities when placing cells
– Cell placement can enable or disable track sharing
– Optimum placement generally follows data flow
Cell Placement
[Figure: placement example showing a short critical path and an LCB with no side load]
Cell Placement and Routing
Area Estimation
• All modules have an area budget in the floorplan
• That budget is only an educated guess
• Some guesses are high, and some are low
• You will need to enhance the quality of these estimates by more accurately estimating the area of your modules
• While doing this you will reduce the number of late surprises in the design and also reduce post-layout effort by converging with accurate parasitics
Area Estimation
• Custom cell area can be set in one of three ways
– Device-limited layout means the device sizes set the cell area
– Metal-limited layout means the wires set the cell area
– Pitch-matching means the cell area is set to match another cell
• Your first job is to figure out which your cell is
– Datapaths are metal limited in one direction (bitpitch)
– Arrays often are metal limited in both directions
– Control blocks often match a datapath or array
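The three cases above can be folded into a small estimator: each dimension is set by whichever of the device or metal requirement is larger, unless it is forced to match a neighbor's pitch. The sketch below is a rough model with hypothetical function names and made-up dimensions, not a course-provided tool.

```python
def limiting_factor(device_um, metal_um):
    """Name the requirement that sets one dimension of the cell."""
    return "device limited" if device_um >= metal_um else "metal limited"


def cell_dims_um(device_wh, metal_wh, match_w=None, match_h=None):
    """Estimate (width, height) of a cell in microns.

    device_wh, metal_wh: (width, height) required by devices and by wires.
    match_w, match_h: optional pitch-matched dimensions imposed by a neighbor.
    """
    need_w = max(device_wh[0], metal_wh[0])
    need_h = max(device_wh[1], metal_wh[1])
    width = need_w if match_w is None else match_w
    height = need_h if match_h is None else match_h
    if width < need_w or height < need_h:
        raise ValueError("pitch-matched size is smaller than the cell needs")
    return width, height


if __name__ == "__main__":
    # Hypothetical datapath bit cell: wires set the width, devices set the height.
    w, h = cell_dims_um(device_wh=(5.0, 12.0), metal_wh=(7.56, 8.0))
    print(f"datapath cell: {w} x {h} um, width is {limiting_factor(5.0, 7.56)}")
    # Hypothetical control block pitch-matched to the datapath height.
    w, h = cell_dims_um(device_wh=(20.0, 9.0), metal_wh=(15.0, 10.0), match_h=12.0)
    print(f"control block: {w} x {h} um = {w * h:.1f} um^2")
```

A pitch-matched cell usually ends up larger than it strictly needs, and that extra area should show up as white space in the block's budget.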
Die Size Estimation
Power Grid
• Delivers current from the C4 bumps to the transistors
• Designed to deliver typical current density to the devices
• Increasing current density by arraying large devices can cause you to exceed the power grid’s nominal design
• Doing this can cause performance and noise problems
Power Grid
Think of the grid as a straw between the C4 and the devices. Too many devices sucking through the same straw, or too narrow a straw, can cause devices to starve and the supply to dip or crater!
SAMPLE Power/Ground GRID
(Full Shielding, MCF = 1.0)
[Figure: grid cross-section spanning 48λ, with Sig tracks alternating with Vss, VSS and VDD tracks; width and space dimensions labeled λ, 2λ and 4λ]
* Where λ is minimum critical dimension for width/space
• Shielding takes up significant routing resources.
• Global M6 routes over the array should have minimal coupling noise to array bitlines.
Power Grid
[Figure: schematic view of a driver from A<31:0> to OUT<31:0>, relative cell placement from Bit 0 to Bit 31, and cell layout view]
Power Grid
When large, arrayed drivers pull on the same rail, supply bounce can occur, degrading performance and causing supply offset noise
[Figure: schematic view, relative cell placement (Bit 0 to Bit 31) and cell layout view of the A<31:0> to OUT<31:0> driver, showing the output current path between the Vdd and Vss rails]
Power Grid
• Be very careful arraying large drivers
• Follow the % power guidelines for the power grid
• Try to keep temporal relationships between arrayed drivers
• Consider the physical impact of your design on the grid
• Be prepared to make the grid more robust to compensate for marginal grids
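The risk of arraying large drivers on one rail can be illustrated with a simple IR-drop model. The sketch below assumes a rail fed from one end, identical drivers spaced one rail segment apart, and all of them drawing their peak current at the same instant. The current and resistance values are hypothetical, and a real grid is a two-dimensional mesh with many taps, so this is only a qualitative illustration.

```python
def rail_drops_v(n_drivers, i_per_driver_a, r_segment_ohm):
    """Voltage drop at each driver on a rail fed from one end.

    Simplified model: identical drivers switch together, spaced one rail
    segment apart, with the supply tap one segment before driver 0.
    """
    drops = []
    drop = 0.0
    for k in range(n_drivers):
        # Segment k carries the current of driver k and every driver beyond it.
        drop += r_segment_ohm * i_per_driver_a * (n_drivers - k)
        drops.append(drop)
    return drops


if __name__ == "__main__":
    i_peak = 1.0e-3   # amps per driver (assumed)
    r_seg = 0.05      # ohms per rail segment (assumed)
    for n in (8, 32):
        worst = rail_drops_v(n, i_peak, r_seg)[-1]
        print(f"{n:2d} drivers switching together: worst drop {worst * 1e3:.1f} mV")
```

In this model the worst-case drop grows roughly with the square of the number of simultaneously switching drivers, which is why staggering them in time or strengthening the rail helps.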
Summary
• Early design planning and layout can have a significant impact on processor design
– Die size, profit & power are impacted by layout density
– Schedule is impacted by implementation choices
• Floorplanning also significantly impacts circuit performance
– Shielding can help timing and noise sensitive circuits
– Carefully floorplanning critical paths can help reduce wire loads
– Reducing clock routing can reduce clock skew and clock power
Backup
Wire and Resistance Calculator
ALPHA 21364
PPC 603