A Pausible Bisynchronous FIFO for GALS Systems
Transcription
A Pausible Bisynchronous FIFO for GALS Systems
2015 Symposium on Asynchronous Circuits and Systems A Pausible Bisynchronous FIFO for GALS Systems Ben Keller*†, Ma@hew FojCk*, Brucek Khailany* *NVIDIA CorporaCon May 4, 2015 †University of California, Berkeley A Pausible Bisynchronous FIFO for GALS Systems 1 The Nightmare of Global Clocking • Tapeout signoff currently requires distribuCng balanced synchronous clocks to every flip-‐flop in a clock domain… – Made up of many par$$ons 1 Clock 5 domain 2 25mm 6 3 8 E 7 ParCCons • …then closing SETUP and 4 HOLD on all paths in the clock domain • Large chips have only a handful of clock domains • Each clock domain is many mm2, even in ≤28nm 25mm May 4, 2015 – Across dozens of PVT corners! A Pausible Bisynchronous FIFO for GALS Systems 2 Globally Asynchronous, Locally Synchronous! From this… May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 3 Globally Asynchronous, Locally Synchronous! Local Local Clock DVCO Generator Local Local Clock DVCO Generator …to this. Internal Logic Partition A May 4, 2015 Internal Logic Partition B A Pausible Bisynchronous FIFO for GALS Systems 4 Towards Fine-‐Grained GALS Local Local Clock DVCO Generator Internal Logic Partition A Local Local Clock DVCO Generator Internal Logic Partition B Clock Domain Crossing May 4, 2015 • No global clock! Local ring oscillator generates clock in each parCCon • No global Cming closure • Reduce Cming margin – Tracks local PVT variaCon – May improve tracking of high-‐frequency noise • But there’s a catch… A Pausible Bisynchronous FIFO for GALS Systems 5 Textbook Approach: Bisynchronous FIFO Data Out Data In tx_clk Valid Full FIFO Memory (Dual Port RAM) Read Pointer Write Pointer Write Pointer “Brute Force” Synchronizers Logic tx_clk Read Pointer Logic Ready rx_clk TX Interface May 4, 2015 Empty A Pausible Bisynchronous FIFO for GALS Systems RX Interface 6 The Brute Force Synchronizer MTBF ∝ e t • Add latency unCl metastability is most likely resolved • +3 cycles of latency at every parCCon boundary! • Need to do be@er… May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 7 Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Clock Data_Sync R2 OK Phase Delay Phase Data_Sync Sync_OK R1 G1 R2 G2 Clock Generator Sync_OK If data arrives when clock is high, let it through. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 8 Pausible Clocking Basics Metastability happens when Data_Sync transitions near the clock edge. Mutual exclusion circuit only allows G1 OR G2 to go high (never both). Clock Data_Sync R2 OK Phase Delay Phase Data_Sync Sync_OK R1 G1 R2 G2 Clock Generator Sync_OK If data arrives when clock is low, delay it until after the clock edge. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 9 Metastability in Pausible Clocks If the metastability lasts long enough, the next clock edge is delayed until it resolves safely. Data_Sync Clock R2 OK Phase Delay Phase Sync_OK R1 G1 R2 G2 Clock Generator Data_Sync Mutex Output May 4, 2015 metastability A Pausible Bisynchronous FIFO for GALS Systems 10 Clock Pauses Are Rare Mutex Delay (ps) 1E+05 1E+04 1E+03 Average impact on cycle time: <0.1% 99.99% of metastability events resolve in <100ps 96% of synchronizations have no metastability 1E+02 1E+01 1E-‐07 1E-‐06 1E-‐05 1E-‐04 1E-‐03 1E-‐02 1E-‐01 1E+00 1E+01 1E+02 1E+03 Difference in Arrival Times (ps) May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 11 SynchronizaCon With Pausible Clock Generators “Pausible Clock Generator” Synchronized signal arrives after (an average of) just 1 clock cycle! Mutex and pausible clock safely synchronize the signal. Clock Generator “Demystifying Data Driven and Pausible Clocking Schemes,” Mullins07 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 12 Pausible Clocks: Our ContribuCon • Prior work in pausible clocks: – Proposed pausible clocking as a technique for low-‐ latency asynchronous interface crossings (Yun96) – Integrated pausible clocks with asynchronous FIFOs to make GALS wrappers (Moore02, Fan09, others) • We propose a circuit that: – Pairs pausible clocking techniques with standard two-‐ ported synchronous FIFOs – Uses a novel flow-‐control scheme that enables low-‐ latency asynchronous boundary crossings – Integrates easily with standard toolflows May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 13 A Pausible Bisynchronous FIFO Data Out Data In tx_clk Valid Full Write Pointer Logic FIFO Memory (Dual Port RAM) Read Pointer Write Pointer tx_clk Read Pointer Logic Ready rx_clk TX Interface May 4, 2015 Empty A Pausible Bisynchronous FIFO for GALS Systems RX Interface 14 A Pausible Bisynchronous FIFO Data Out Data In tx_clk Valid Full Write Pointer Logic FIFO Memory (Dual Port RAM) Read Pointer Write Pointer Pausible Clock Generator tx_clk Pausible Clock Generator Read Pointer Logic Ready rx_clk TX Interface May 4, 2015 Empty A Pausible Bisynchronous FIFO for GALS Systems RX Interface 15 A Pausible Bisynchronous FIFO Pausible Clock Generator May 4, 2015 Pausible Clock Generator A Pausible Bisynchronous FIFO for GALS Systems Pointers are synchronized via two-phase increment and acknowledge signals. 16 Two-phase signals 5. Read pointer increment is toggled. 1. Valid data is written to the FIFO. 6. After the RX clock, write pointer acknowledge is toggled. 2. Write pointer increment is toggled. 3. The write pointer increment is synchronized. 7. Write pointer acknowledge is synchronized. The increment signal can now safely be reused. 4. Valid is asserted and data is read out of the FIFO. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 17 SimulaCon Results: Brute Force Synchronizer 8 Latency (RX Cycles) 7 Typical latency: 4 cycles 6 5 Brute Force Synchronizer 4 3 Pausible Synchronizer Typical latency: 1.25 cycles 2 1 0 0.5 1 2 4 TX Clock Period / RX Clock Period May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 18 Pausible Clocking Timing Constraints Minimum clock period: T ≥ tr 2 + tg2 + t fb 2 t fb RX Pointer Logic Maximum insertion delay: Tins ≤ T − t fb − tg2 tg2 Maximum same-cycle work: TCL ≤ t fb + tg2 + Tins May 4, 2015 tr 2 A Pausible Bisynchronous FIFO for GALS Systems tins 19 Pausible Clocking LimitaCons: Where to Put the Mutexes? Partition Pausible Interface Delay due to physical distance Pausible Interface Clock Generator Pausible Interface Pausible Interface May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 20 Pausible Clocking LimitaCons: Where to Put the Mutexes? OpAon 1: Mutexes at the interface – Reduces maximum clock rate OpAon 2: Mutexes at the center – Increases latency + Improves maximum clock rate + Can bake the mutex into a black box with the clock generator May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems Added Delay 21 Playing Nice with the Tools • Most of the design is synchronous and works fine with synthesis tools • FIFO is a standard dual-‐ ported memory • A single custom cell is needed • Custom Cming constraints at clock domain crossings May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 22 Synchronizer Comparison Design Average Latency (cycles) Area (um2) Power (mW) Energy (fJ/bit) Synchronous Interface 1 4968 4.08 25.5 Brute Force Synchronizer ~4 5005 6.03 37.7 Pausible Synchronizer ~1.5 4808 5.41 33.8 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 23 Summary • Pausible clocks have key advantages over standard synchronizers, including low latency and zero probability of failure • We designed a pausible bisynchronous FIFO that pairs pausible clocking techniques with standard two-‐ported synchronous FIFOs • Fast asynchronous interfaces that work well with standard toolflows are a key enabling technology of fine-‐grained GALS design May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 24 References R. Mullins and S. Moore, “DemysCfying Data-‐Driven and Pausible Clocking Schemes,” in Proc. IEEE Symposium on Asynchronous Cir-‐ cuits and Systems, 2007, pp. 175–185. K. Yun and R. Donohue, “Pausible clocking: a first step toward heterogeneous systems,” in Proc. IEEE Interna$onal Conference on Computer Design, 1996, pp. 118–123. S. Moore et al., “Point to point GALS interconnect,” in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2002, pp. 69–75. X. Fan et al., “Analysis and opCmizaCon of pausible clocking based GALS design,” IEEE Interna$onal Conference Computer Design, 2009. May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 25 QuesCons? May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 26 Backup Slides May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 27 A Pausible Bisynchronous FIFO data_in data_out tx_clk Dual-Port FIFO rptr wptr Write Pointer Logic wptr_inc valid Read Pointer rptr_inc Logic valid rptr_ack_sync wptr_ack_sync To TX Side LAT LAT LAT LAT w pt r _i nc _ sy From RX Side ready nc ready wptr_ack wptr_ack g1 r1 g2 C tx_clk_root r_pause_start t_pause_start tx_clk r2 MUTEX MUTEX MUTEX MUTEX May 4, 2015 r2 MUTEX MUTEX MUTEX MUTEX r1 t_pause_start r_pause_start rx_clk g1 g2 C rx_clk_root A Pausible Bisynchronous FIFO for GALS Systems 28 Mutex Circuit SR Latch Metastability Filter R1 R2 G1 G2 0 0 0 0 1 0 1 0 0 1 0 1 1 1 May 4, 2015 R1 first -‐> G1 high R2 first -‐> G2 high • Only G1 or G2 can be high at once (never both) • If R1 and R2 transiCon high simultaneously, metastability can result • The “metastability filter” keeps both outputs low unCl metastability resolves A Pausible Bisynchronous FIFO for GALS Systems 29 SimulaCon Setup TX Clock Domain Dummy Compute (TX) RX Clock Domain Interface Dummy Compute (RX) • Implemented interface in Verilog • Allows direct comparison to brute-‐force synchronizers via drop-‐in replacement May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 30 Minimum Clock Period Request arrives here t fb Clock R2 OK Phase Delay Phase Minimum clock period: T ≥ tr 2 + tg2 + t fb 2 May 4, 2015 tg2 tr 2 A Pausible Bisynchronous FIFO for GALS Systems 31 InserCon Delay Requests can arrive near the clock edge! Clock_root Clock tins LAT r2 R2 OK Phase Delay Phase Maximum insertion delay: Tins ≤ T − t fb − tg2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems Clock_root tins 32 CombinaConal Logic Delay TCL t fb Maximum same-cycle work: tg2 TCL ≤ t fb + tg2 + Tins tr 2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 33 Metastability Margin tm tm t fb Minimum clock period: T ≥ tr 2 + tg2 + t fb 2 tg2 Metastability margin: tm = T − (tr 2 + tg2 + t fb ) 2 May 4, 2015 tr 2 A Pausible Bisynchronous FIFO for GALS Systems 34 Average Latency Average latency: TL = T + Tins − tr 2 May 4, 2015 A Pausible Bisynchronous FIFO for GALS Systems 35