Lecture 3 - The University of Texas at Austin
EE-382M-8 VLSI-II: Early Design Planning: Back End
Mark McDermott
EE 382M-8 VLSI-2, The University of Texas at Austin

Backend EDP Flow
• The project activities will include:
  – Determining the standard-cell and custom library elements needed to complete the design with APR (automatic place-and-route) tools
  – A detailed floorplan of the block-level components
  – A reasonably detailed top-level floorplan using the cluster abstracts
  – Approximate clock routing at the top level
  – Approximate power/ground routing at the top level

EDP and Layout in the Design Flow
[Figure: design-flow stages: Concept, Architecture, Technology Readiness, EDP, uArchitecture, Logic, Front-End Development, Circuits, Layout, Backend Design Execution, Silicon Ramp, Si Debug, Production]
• EDP encompasses planning from architecture through layout.

Standard Cell Library Effort
• The project will use a very minimal standard-cell library: ~80+ cells
  – Basic logic gates and buffers
  – One set-reset flip-flop
• The "CMOS65_SubVt.lib" file was derived from a scaled 65 nm .lib file
  – Validate the scaled numbers with HSPICE simulations
  – Validate the power-spreadsheet numbers with HSPICE:
    • Source-drain (S-D) leakage currents
    • Intrinsic power
  – Validate the area-spreadsheet numbers

Block Floorplanning Effort
• Objectives:
  – Minimize area
  – Determine the best shape for the block
  – Minimize total wire length
• Each team will do a detailed floorplan of its block.
• The output will be a spreadsheet analysis showing the contribution from each of the following:
  – Power grid
  – Clocking
  – Signal routing
  – Datapath area
  – Random logic area
  – White space

Integration Effort
• The integration team will be responsible for:
  – Doing a floorplan of the top level of the chip
  – Characterizing the top-level routing delays and determining the assertions and constraints for each cluster, working with each cluster to optimize those constraints
  – Designing the clock routing structure
  – Determining the clock generation implementation (block diagrams)
  – Determining the clock regeneration circuitry (block diagrams)
  – Determining the reset logic
  – Designing the power grid
  – Estimating the power of the global clock and signal routing
  – Generating the power budget for each cluster
  – Generating the area budget for each cluster

Layout Implementation Options
[Figure: SPARC-T1]

Layout Density & Die Size = Performance
[Figure: schematic of a path A → B → C and two floorplans, Layout #1 (A, B, C) and Layout #2 (A, B', C). The layout of block B affects the timing of the path from A to C.]
• Higher-density layout leads to smaller blocks
• Smaller blocks lead to shorter wires
• Shorter wires can lead to higher frequency
• Shorter wires can also raise IPC by requiring fewer transmission pipe stages

Layout Implementation Options
Design effort (and density) increases from RLM to SC to CD.
• Synthesis: Random Logic Macro (RLM)
  – Cell layout comes from a shared cell library
  – Automated cell selection and placement
  – Automated routing between cells
• Structured Custom (SC/SDP)
  – Cell layout comes from a shared cell library
  – Manual cell selection and placement
  – Automated routing between cells
• Custom Design (CD)
  – Cell layout is
    unique for each application
  – Manual cell selection and placement
  – Manual routing between cells

Layout Implementation Options: Comparison
(A = Automatic, M = Manual)

                      CD      SC      RLM
  RTL coding          M       M       M
  Logic minimization  M       M       A
  Cell placement      M       M       A
  Device sizing       M       A       A
  Layout              M       A       A

                      CD      SC      RLM
  Timing              Best    Better  Worst
  Density             Best    Better  Worst
  Design time         Worst   Better  Best

• RLM saves time in both circuit design and layout
• SC saves time in layout
• RLM and SC/SDP make revisions easier

Datapath and Block Floorplanning Procedures
[Figure: MIPS R10K]

Datapath and Block Floorplanning Procedure
• Step 1 - Identify feedthrus for RLM or SC/DP blocks
• Step 2 - Look for opportunities for track sharing
• Step 3 - Define the bitpitch of the block
• Step 4 - Review the metal plan within the cell
• Step 5 - Review and plan the clock routing and placement
• Step 6 - Plan the critical cell placement locations
• Step 7 - Estimate the area of the cells and the block
• Step 8 - Review the power grid

Feed-Through or Over-the-Cell (OTC) Routes
• Metal tracks routed over an RLM, datapath, or custom block
• The block is neither the driver nor a receiver of these signals
• Feedthrus use up metal tracks, which leaves fewer tracks for the block's internal signals
• Carefully review datapath connectivity to account for them
[Figure: datapath stack with Sources, Bypass, ALU 0, ALU 1, ALU 2, and Results, marking the driver, the receiver, and the feedthrus for ALU 0]
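Step 1 lends itself to a simple connectivity check. The sketch below assumes, hypothetically, that every net runs straight along the datapath stack between its outermost endpoint blocks; any block strictly inside that span that is neither the driver nor a receiver then carries a feedthru and loses a track per bit slice to it. The stack order and the net names (`alu0_result`, `src_a`, and so on) are illustrative, loosely modeled on the ALU figure, not taken from a real design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    name: str
    driver: str
    receivers: tuple[str, ...]


def find_feedthrus(stack: list[str], nets: list[Net]) -> dict[str, list[str]]:
    """Return, for each block in the stack, the nets that pass over it.

    Assumption (hypothetical): each net is routed straight along the stack
    from its first endpoint block to its last one.
    """
    row = {block: i for i, block in enumerate(stack)}
    feedthrus: dict[str, list[str]] = {block: [] for block in stack}
    for net in nets:
        endpoints = {net.driver, *net.receivers}
        rows = [row[block] for block in endpoints]
        # Blocks strictly between the outermost endpoints lie under the route.
        for block in stack[min(rows) + 1:max(rows)]:
            if block not in endpoints:  # neither driver nor receiver
                feedthrus[block].append(net.name)
    return feedthrus


# Hypothetical datapath stack, listed top to bottom.
stack = ["Sources", "Bypass", "ALU 0", "ALU 1", "ALU 2", "Results"]
nets = [
    Net("alu0_result", driver="ALU 0", receivers=("Results",)),
    Net("alu2_result", driver="ALU 2", receivers=("Results",)),
    Net("src_a", driver="Sources", receivers=("ALU 0", "ALU 1")),
]
for block, names in find_feedthrus(stack, nets).items():
    print(f"{block:8s} feedthrus={names}  tracks lost per bit={len(names)}")
```

In this toy stack, ALU 1 and ALU 2 each carry the ALU 0 result as a feedthru, and Bypass carries `src_a`; those tracks are no longer available to the blocks' own signals.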

Datapath and Block Floorplanning Procedure: Step 2 - Look for opportunities for track sharing

Step 2: Track Sharing
• Minimizes the number of unique tracks in the layout by opportunistically sharing tracks where possible
• Often allows the smallest possible bitpitch
• Lets the metal layers be used more efficiently
• Can help improve performance by shortening distances
• Should always be explored to improve layout efficiency and performance

Step 2: Track Sharing
[Figure: datapath stack (Sources, Bypass, ALU 0, ALU 1, ALU 2, Results) with driver and receiver marked]
• First, check outside your block for track-sharing candidates.

Step 2: Track Sharing
[Figure: Metal 2 and Metal 4 routes for IE_BYC_DATA<11:0>, IE_RF_DATA<11:0>, LRBL<11:0>, and RRBL<11:0>]
• Next, check inside your block for track-sharing candidates.

Step 2: Track Sharing Examples
[Figures: two layout examples]

Datapath and Block Floorplanning Procedure: Step 3 - Define the bitpitch of the block

Bit Pitch Defining Width of Chip
[Figure: AMD K5]

Step 3: Define the Bitpitch
• Fixed cell width chosen to allow easy assembly
• Most often determined by metal usage within the
  datapath
• Integration efficiency would prefer one bitpitch per project
• Architectures, however, tend to call for more unique bitpitches
[Figure: bits A<4> through A<0> stacked at a fixed bitpitch; each bit slice carries Vdd, Sig0 through Sig5, and Vss]

Step 3: Define the Bitpitch
[Figure: integer datapath stack (WB Mux, Shifter, Bit Ops, System Uops, AGEN - LD/STA, ALU 1, Arith Flags, ALU 0, Bypass Cache, Integer Register File) split into two regions, Bitpitch #1 (X µm) and Bitpitch #2 (Y µm)]
• Ensure all blocks in a datapath stack follow the same bitpitch

Bit Pitch Example: 3:2 Adder Bit Cell
[Figure: 3:2 adder bit-cell layout at a 7.56 µm bitpitch, showing M1, M3, and M4 routing]

Bit Pitch Example: 4 Bit Cells Stacked
[Figure: four bit cells stacked at a 7.56 µm bitpitch]

Bit Pitch Example: Tiled Datapath
[Figure]

Bit Pitch Example: Swizzle
[Figure: swizzle channel between blocks with different bitpitches]
• Don't mix and match bitpitches; mismatched pitches force swizzle channels
• As buses get wider and the number of tracks per bit grows, the cost of swizzle channels grows

Step 3: Define the Bitpitch
• Wider bitpitches allow more upper-level metal usage
• Narrower bitpitches allow shorter routes for orthogonal signals
• Balancing these conflicting objectives can be difficult
• Understand your local constraints and be aware of the tradeoffs
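Steps 2 and 3 are linked: the fewer unique tracks each bit slice needs, the narrower its bitpitch can be. The sketch below treats each per-bit signal as a vertical segment spanning a range of rows in the datapath stack and packs non-overlapping segments onto shared tracks with greedy interval partitioning (the idea behind the classic left-edge router; it is swapped in here as an illustration, since the lecture does not prescribe an algorithm). It then converts the track count into a bitpitch, assuming one Vdd and one Vss rail per bit and a single uniform track pitch. The signal names, row spans, and the 0.5 µm pitch are hypothetical.

```python
import heapq


def share_tracks(segments: list[tuple[str, int, int]]) -> dict[str, int]:
    """Assign vertical segments to the fewest shared tracks.

    Each segment is (name, first_row, last_row), where rows are blocks in the
    datapath stack (inclusive). Two segments may share a track only if their
    row ranges do not overlap. Greedy interval partitioning in order of first
    row uses the minimum possible number of tracks.
    """
    assignment: dict[str, int] = {}
    free: list[int] = []              # tracks whose last segment has ended
    busy: list[tuple[int, int]] = []  # (last_row, track) for occupied tracks
    next_track = 0
    for name, first, last in sorted(segments, key=lambda s: (s[1], s[2])):
        # Release every track whose segment ends before this one's first row.
        while busy and busy[0][0] < first:
            heapq.heappush(free, heapq.heappop(busy)[1])
        if free:
            track = heapq.heappop(free)
        else:
            track, next_track = next_track, next_track + 1
        assignment[name] = track
        heapq.heappush(busy, (last, track))
    return assignment


def bitpitch_um(signal_tracks: int, track_pitch_um: float, rail_tracks: int = 2) -> float:
    """Bitpitch = (signal tracks + power rails) * metal track pitch."""
    return (signal_tracks + rail_tracks) * track_pitch_um


# Hypothetical per-bit signals and the stack rows they span.
segments = [("A", 0, 2), ("B", 3, 5), ("C", 1, 4), ("D", 5, 6), ("E", 0, 1)]
tracks = share_tracks(segments)
used = len(set(tracks.values()))
print(tracks)                                          # {'E': 0, 'A': 1, 'C': 2, 'B': 0, 'D': 1}
print(used, bitpitch_um(used, 0.5))                    # 3 2.5
print(len(segments), bitpitch_um(len(segments), 0.5))  # 5 3.5
```

Here sharing cuts five signal tracks to three, shrinking the hypothetical bitpitch from 3.5 µm to 2.5 µm. A real metal plan also weighs wire widths, spacing rules, and shielding, which this model ignores.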

Datapath and Block Floorplanning Procedure: Step 4 - Review the metal plan within the cell

Metal Planning
• Metal layer, width, spacing, and shielding are negotiable
  – "Negotiable" means you have to plead your case to the integration leaders
• Each of these imposes a physical constraint on layout
• For your first attempt at convergence:
  – M1, M2: local routing
  – M3, M4, M5, M6: data and control
  – M7, M8: power, ground, clock, reset, etc.
  – Assume all nets within your block are routed in M1 and M2
  – Assume the only shielding is on clocks and reset
  – Assume routes use minimum dimensions

Metal Flow Planning
• Avoid bidirectional dataflow
[Figure: BAD vs. GOOD arrangements of control (Cntl) and data blocks]

Shielding
• Intentionally routing signals to control the effective line-to-line capacitance seen during switching
• Requires designers to constrain the physical assembly done by routing tools or physical design specialists (PDSs)
• Falls into one of three categories:
  – Physical shielding: signals are routed next to a power rail
  – Logical shielding: signals are routed next to logically related signals
  – Temporal shielding: signals are routed next to temporally distinct signals

Miller Coupling Factor
The MCF of a victim line B between neighbors A and C depends on how the neighbors switch relative to B:
  – MCF = 2.0: both neighbors switch against B
  – MCF = 1.5: one against, one quiet
  – MCF = 1.0: both quiet
  – MCF = 0.5: one with, one quiet
  – MCF = 0.0: both switch with B

No Shielding
• Signals are routed next to any neighboring signals
• Neighbors can slow down (max delay) or speed up (min delay) signal transitions through line-to-line coupling
• This variation can create design problems
• Most signals will not be shielded
• No shield: max MCF 2.0, min MCF 0.0

Physical Shielding
• Signals are routed next to at least one power rail
• Helps both min delay and max delay
• Can be expensive in metal usage
• Typically limited to the most critical nets and clocks
• Half shield (Vss, Sig A, Sig B): max MCF 1.5, min MCF 0.5
• Full shield (Vss, Sig A, Vss): max MCF 1.0, min MCF 1.0

Logical Shielding
• Signals are routed next to mutually exclusive neighbors
• Also helps both min delay and max delay
• Results comparable to physical shielding, at lower cost
• Encouraged in mux structures and arrays
• Max MCF 1.5, min MCF 1.0
[Figure: data lines A, B, C interleaved with their select lines Sel A, Sel B, Sel C]

Temporal Shielding
• Signals are routed next to signals that limit aggressors
• Can help max delay, min delay, or both
• Lower cost than physical shielding, but more design effort
• Encouraged wherever possible, but tricky
• Max MCF 1.0, min MCF 0.0
[Figure: signals A, B, C interleaved with clock (Ck) lines]
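The MCF values above follow a simple rule: each neighbor contributes a factor of 2 when it switches against the victim, 1 when it is quiet, and 0 when it switches with it, and the MCF is the mean over the two neighbors. The sketch below encodes that rule and recovers the max/min bounds quoted for no shield, half shield, and full shield by enumerating what each neighbor is allowed to do. Treating a rail as always quiet, and the equal-coupling capacitance helper, are simplifying assumptions; logical and temporal shielding depend on design-specific switching relationships and are not modeled.

```python
from itertools import product

# Per-neighbor Miller factor, relative to the victim's own transition.
NEIGHBOR_FACTOR = {"against": 2.0, "quiet": 1.0, "with": 0.0}

SIGNAL = ("against", "quiet", "with")  # an unshielded neighbor may do anything
RAIL = ("quiet",)                      # a Vdd/Vss shield never switches


def mcf(left: str, right: str) -> float:
    """MCF of a victim with two neighbors: the mean of the per-neighbor factors."""
    return (NEIGHBOR_FACTOR[left] + NEIGHBOR_FACTOR[right]) / 2


def mcf_bounds(left_options, right_options) -> tuple[float, float]:
    """(max MCF, min MCF) over every allowed pair of neighbor behaviors."""
    values = [mcf(a, b) for a, b in product(left_options, right_options)]
    return max(values), min(values)


def switched_cap_ff(c_ground_ff: float, c_couple_ff: float, left: str, right: str) -> float:
    """Effective capacitance seen by the victim, assuming equal coupling to each neighbor."""
    return c_ground_ff + c_couple_ff * (NEIGHBOR_FACTOR[left] + NEIGHBOR_FACTOR[right])


print("no shield  ", mcf_bounds(SIGNAL, SIGNAL))  # (2.0, 0.0)
print("half shield", mcf_bounds(RAIL, SIGNAL))    # (1.5, 0.5)
print("full shield", mcf_bounds(RAIL, RAIL))      # (1.0, 1.0)
```

For example, with a hypothetical 10 fF to ground and 5 fF to each neighbor, `switched_cap_ff(10.0, 5.0, "against", "quiet")` gives 25 fF, versus 20 fF when both neighbors are quiet. That spread is the delay variation shielding is meant to bound.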

Shielding Gotcha
• Tools may rely on the designer to override the default coupling assumptions
• If you need temporal shielding to make your circuit meet timing, your circuit doesn't meet timing. Do not rely on it.
[Figure: signals L A and L B routed beside clock (Ck) lines; max MCF 2.0, min MCF 0.0]

Datapath and Block Floorplanning Procedure: Step 5 - Review and plan the clock routing and placement

Variations of Clock Tree Distribution Networks
• Target: metallization and gate-topology uniformity
[Figure: tapered H-tree]

Clock Routing
• Watch out for the clock; it's your most critical net
• Make sure the physical design treats it accordingly
• Help reduce clock power by eliminating unnecessary load
• Make sure the clock has enough via coverage
• Leave room for decoupling capacitors and upsizing
• Don't forget to account for clock-routing overhead (full shield) in your metal planning

Clock Routing
• Avoid unnecessary clock load to save active power
[Figure: BAD vs. GOOD clock routes; the BAD route drives unnecessary load]

Power/Clock Grid
• The clock grid is interleaved between VDD and VSS on metal 6
[Figure: dual-port array floorplan with local clock buffers (LCBs) distributed around the Port0/Port1 input data latches, decoders, bitcell arrays, read/write circuits, and output latches]
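The clock-load guidance can be quantified with the standard dynamic-power relation P = αCV²f: every femtofarad hung on the clock is charged and discharged every cycle. The sketch compares a hypothetical GOOD route against a BAD one that carries extra wire and unnecessary side loads. All capacitances, the supply voltage, and the frequency are illustrative assumptions, not project values.

```python
def clock_power_w(c_wire_f: float, c_loads_f: list[float], vdd_v: float,
                  freq_hz: float, activity: float = 1.0) -> float:
    """Dynamic power P = activity * C * Vdd^2 * f of a clock net.

    A clock charges and discharges its whole capacitance every cycle, so the
    activity factor defaults to 1.0.
    """
    c_total = c_wire_f + sum(c_loads_f)
    return activity * c_total * vdd_v ** 2 * freq_hz


# Hypothetical numbers: 16 needed LCB loads at 5 fF each, 1.0 V, 1 GHz.
needed = [5e-15] * 16
good = clock_power_w(200e-15, needed, 1.0, 1e9)
# The hypothetical BAD route adds 60 fF of wire and 4 unnecessary 5 fF side loads.
bad = clock_power_w(260e-15, needed + [5e-15] * 4, 1.0, 1e9)
print(f"good {good * 1e6:.0f} uW, bad {bad * 1e6:.0f} uW")  # good 280 uW, bad 360 uW
```

Because the activity factor is 1, the 80 fF of avoidable capacitance costs power in direct proportion, about 29% more in this example.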

Clock Routing
• Make sure there are enough vias to get power through the clock network
[Figure: insufficient vs. sufficient via coverage]

Clock Routing
• Remember to count each clock as ~5-7 tracks in your wire planning!
[Figure: shielded clock route with Vdd, Clock, and Vss labeled 1x, 2x, and 1x, separated by 1.5x spacings]
• Be careful with gated clocks. Fine-grain clock gating tends to drastically increase the number of unique clocks, significantly increasing metal usage. No tools catch this before layout.

Datapath and Block Floorplanning Procedure: Step 6 - Plan the critical cell placement locations

Cell Placement
• Start with the critical path!
  – Place cells to limit the wire load on the critical path
  – Move less critical blocks out of the way
• Place clock generators to limit clock wire load
  – Again, place the most critical clock LCBs first if area is tight
  – Ideally there should be minimal side loads
• Consider track-sharing opportunities when placing cells
  – Cell placement can enable or prevent track sharing
  – Optimum placement generally follows the dataflow

Cell Placement
[Figure: placement with a short critical path and no side LCB load]

Cell Placement and Routing
[Figure]

Datapath and Block Floorplanning Procedure: Step 7 - Estimate the area of the cells and the block

Area Estimation
• Every module has an area budget in the floorplan
• That budget is only an educated guess
• Some guesses are high, and some are low
• You will need to improve these estimates by estimating the area of your modules more accurately
• Doing so reduces late surprises in the design, and it also reduces post-layout effort because the design converges with accurate parasitics

Area Estimation
• Custom cell area can be set in one of three ways:
  – Device-limited layout: the device sizes set the cell area
  – Metal-limited layout: the wires set the cell area
  – Pitch matching: the cell area is set to match another cell
• Your first job is to figure out which kind your cell is:
  – Datapaths are
    metal-limited in one direction (the bitpitch)
  – Arrays are often metal-limited in both directions
  – Control blocks often pitch-match a datapath or array

Die Size Estimation
[Figure]

Datapath and Block Floorplanning Procedure: Step 8 - Review the power grid

Power Grid
• Delivers current from the C4 bumps to the transistors
• Designed to deliver a typical current density to the devices
• Raising the local current density by arraying large devices can exceed the power grid's nominal design
• Doing so can cause performance and noise problems

Power Grid
• Think of the grid as a straw between the C4 bumps and the devices. Too many devices drawing through the same straw, or too narrow a straw, can starve the devices and make the supply dip or crater!

Sample Power/Ground Grid (Full Shielding, MCF = 1.0)
[Figure: track pattern interleaving Sig lines with Vss and VDD lines, dimensioned in multiples of λ (λ, 2λ, 4λ, 48λ), where λ is the minimum critical dimension for width/space]
• Shielding takes up significant routing resources.
• Global M6 routes over the array should have minimal coupling noise to the array bitlines.
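The straw analogy can be made concrete with a one-dimensional resistive model. Assuming, hypothetically, identical drivers tapping a rail at equal spacing and fed from a single C4 bump at one end, the segment nearest the bump carries every driver's current, so the worst-case drop grows as N(N+1)/2 rather than N. The driver current and segment resistance are illustrative, and a real grid is a 2-D mesh fed by many bumps, so treat this only as intuition for why arraying large drivers on one rail hurts.

```python
def rail_drop_v(n_drivers: int, i_driver_a: float, r_segment_ohm: float) -> list[float]:
    """Supply drop at each driver tap along a rail fed from one C4 bump.

    Simplified model: identical drivers tap the rail at equal spacing, and each
    rail segment between taps has resistance r_segment_ohm. Segment k (counted
    from the bump) carries the current of every driver at or beyond tap k.
    """
    drops = []
    v = 0.0
    for k in range(n_drivers):
        v += (n_drivers - k) * i_driver_a * r_segment_ohm
        drops.append(v)
    return drops


# Hypothetical 2 mA drivers on a rail with 0.05 ohm between taps.
for n in (8, 32):
    worst = rail_drop_v(n, 2e-3, 0.05)[-1]
    print(f"{n:2d} drivers: worst-case drop {worst * 1e3:.1f} mV")
```

In this model, quadrupling the driver count from 8 to 32 raises the far-end drop from about 3.6 mV to about 52.8 mV (more devices on the same straw), and doubling `r_segment_ohm` doubles every drop (a narrower straw).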

Power Grid
[Figure: driver array OUT<31:0> driven from A<31:0>, shown as a schematic view, a relative cell placement (Bit 0 through Bit 31), and a cell layout view]

Power Grid
• When large, arrayed drivers pull current from the same rail, supply bounce can occur, degrading performance and causing supply-offset noise
[Figure: the same driver array, showing current drawn from the shared Vdd and Vss rails]

Power Grid
• Be very careful when arraying large drivers
• Follow the % power guidelines for the power grid
• Try to keep temporal relationships between arrayed drivers
• Consider your design's physical impact on the grid
• Be prepared to make the grid more robust to compensate for marginal grids

Summary
• Early design planning and layout can significantly affect processor design
  – Die size, profit, and power are affected by layout density
  – Schedule is affected by implementation choices
• Floorplanning also significantly affects circuit performance
  – Shielding can help timing- and noise-sensitive circuits
  – Carefully floorplanning critical paths can help reduce wire loads
  – Reducing clock routing can reduce clock skew and clock power

Backup

Wire and Resistance Calculator
[Figure]

ALPHA 21364
[Figure]

PPC 603
[Figure]
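The backup slide's wire and resistance calculator survives only as a title, so its exact formulas are unknown. The sketch below is a generic stand-in, not that calculator: it computes wire resistance from sheet resistance and the number of squares, capacitance from an assumed per-unit-length value, and the Elmore delay 0.5·R·C of an unbuffered distributed wire. All parameter values are hypothetical. Because delay scales with length squared, it also illustrates the Layout Density slide's point that denser layout, and therefore shorter wires, can raise frequency.

```python
def wire_r_ohm(length_um: float, width_um: float, sheet_ohm_per_sq: float) -> float:
    """R = R_sheet * (L / W): sheet resistance times the number of squares."""
    return sheet_ohm_per_sq * length_um / width_um


def wire_c_f(length_um: float, cap_f_per_um: float) -> float:
    """Total capacitance, assuming a constant capacitance per unit length."""
    return cap_f_per_um * length_um


def wire_delay_s(length_um: float, width_um: float,
                 sheet_ohm_per_sq: float, cap_f_per_um: float) -> float:
    """Elmore delay of an unbuffered distributed RC wire: 0.5 * R * C.

    R and C both grow linearly with length, so delay grows with length squared.
    """
    r = wire_r_ohm(length_um, width_um, sheet_ohm_per_sq)
    c = wire_c_f(length_um, cap_f_per_um)
    return 0.5 * r * c


# Hypothetical wire: 0.5 um wide, 0.25 ohm/sq, 0.2 fF/um.
for length in (1000.0, 2000.0):
    print(f"{length:.0f} um: R = {wire_r_ohm(length, 0.5, 0.25):.0f} ohm, "
          f"delay = {wire_delay_s(length, 0.5, 0.25, 0.2e-15) * 1e12:.1f} ps")
```

Doubling the length from 1000 µm to 2000 µm doubles R (500 to 1000 ohm) but quadruples the delay (50 ps to 200 ps), which is why shrinking blocks pays off so strongly on long routes.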