A Pausible Bisynchronous FIFO for GALS Systems

Transcription

2015 Symposium on Asynchronous Circuits and Systems
A Pausible Bisynchronous FIFO for GALS Systems
Ben Keller*†, Matthew Fojtik*, Brucek Khailany*
*NVIDIA Corporation, †University of California, Berkeley
May 4, 2015

The Nightmare of Global Clocking
•  Tapeout signoff currently requires distributing balanced synchronous clocks to every flip-flop in a clock domain…
–  Made up of many partitions
•  …then closing SETUP and HOLD on all paths in the clock domain
–  Across dozens of PVT corners!
•  Large chips have only a handful of clock domains
•  Each clock domain is many mm², even in ≤28nm
[Figure: 25mm × 25mm chip floorplan showing one clock domain divided into numbered partitions.]

Globally Asynchronous, Locally Synchronous!
From this…

Globally Asynchronous, Locally Synchronous!
…to this.
[Figure: Partition A and Partition B, each with internal logic and blocks labeled Local Clock Generator and Local DVCO.]
Towards Fine-Grained GALS
•  No global clock! Local ring oscillator generates clock in each partition
•  No global timing closure
•  Reduce timing margin
–  Tracks local PVT variation
–  May improve tracking of high-frequency noise
•  But there’s a catch…
[Figure: Partition A and Partition B, each with internal logic and blocks labeled Local Clock Generator and Local DVCO, joined by a clock domain crossing.]

Textbook Approach: Bisynchronous FIFO
[Figure: block diagram of a FIFO memory (dual-port RAM) between a TX interface (tx_clk) and an RX interface (rx_clk), with Data In, Data Out, Valid, Ready, Full, and Empty signals. Write pointer logic and read pointer logic exchange the write and read pointers through “brute force” synchronizers.]
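As a rough illustration of the textbook structure, the sketch below models the dual-port memory and its two pointers in Python. It is a behavioral model only. The class name `BisyncFifoModel`, the extra wrap bit on each pointer, and the sequential `write`/`read` calls are assumptions made for illustration, and the model ignores the synchronizer latency on the pointer paths.

```python
class BisyncFifoModel:
    """Behavioral sketch of a FIFO addressed by a write and a read pointer.

    Hypothetical model: each pointer counts modulo 2*depth, so the extra
    wrap bit distinguishes a full FIFO from an empty one.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem = [None] * depth
        self.wptr = 0  # advanced by the TX side
        self.rptr = 0  # advanced by the RX side

    def empty(self) -> bool:
        # Equal pointers (including the wrap bit) mean nothing is stored.
        return self.wptr == self.rptr

    def full(self) -> bool:
        # Pointers exactly `depth` apart mean every slot is occupied.
        return (self.wptr - self.rptr) % (2 * self.depth) == self.depth

    def write(self, data) -> bool:
        """TX side: store one word, or return False if the FIFO is full."""
        if self.full():
            return False
        self.mem[self.wptr % self.depth] = data
        self.wptr = (self.wptr + 1) % (2 * self.depth)
        return True

    def read(self):
        """RX side: return the oldest word, or None if the FIFO is empty."""
        if self.empty():
            return None
        data = self.mem[self.rptr % self.depth]
        self.rptr = (self.rptr + 1) % (2 * self.depth)
        return data
```

In hardware each side sees the other side's pointer only after it has crossed the clock domain, so the Full and Empty flags are computed from a delayed copy. This model leaves that delay out.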
The Brute Force Synchronizer
MTBF ∝ e^t
•  Add latency until metastability is most likely resolved
•  +3 cycles of latency at every partition boundary!
•  Need to do better…

Pausible Clocking Basics
Metastability happens when Data_Sync transitions near the clock edge.
Mutual exclusion circuit only allows G1 OR G2 to go high (never both).
If data arrives when clock is high, let it through.
[Figure: mutex (R1, G1, R2, G2) and clock generator with Data_Sync input and Sync_OK output; timing diagram of Clock, Data_Sync, R2, and Sync_OK showing the OK phase and the delay phase.]
Pausible Clocking Basics
Metastability happens when Data_Sync transitions near the clock edge.
Mutual exclusion circuit only allows G1 OR G2 to go high (never both).
If data arrives when clock is low, delay it until after the clock edge.
[Figure: mutex (R1, G1, R2, G2) and clock generator with Data_Sync input and Sync_OK output; timing diagram of Clock, Data_Sync, R2, and Sync_OK showing the OK phase and the delay phase.]
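The two cases above (let the data through when the clock is high, hold it when the clock is low) can be sketched as a small timing function. This is a minimal sketch under stated assumptions: a 50% duty cycle with the clock high during the first half of each period, zero mutex delay, and no metastability. The function name `grant_time` is hypothetical.

```python
import math


def grant_time(arrival: float, period: float) -> float:
    """Return the time at which a request arriving at `arrival` is let through.

    Assumes the clock is high (OK phase) for the first half of each period
    and low (delay phase) for the second half.
    """
    phase = arrival % period
    if phase < period / 2:
        # OK phase: the request passes immediately.
        return arrival
    # Delay phase: the request is held until the next clock edge.
    return (math.floor(arrival / period) + 1) * period
```

With a period of 10 time units, a request at t = 1 passes at once, while a request at t = 7 waits until the edge at t = 10.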
Metastability in Pausible Clocks
If the metastability lasts long enough, the next clock edge is delayed until it resolves safely.
[Figure: mutex (R1, G1, R2, G2) and clock generator; timing diagram of Data_Sync, Clock, R2, Sync_OK, and the mutex output, with the metastability interval marked against the OK phase and the delay phase.]
Clock Pauses Are Rare
•  96% of synchronizations have no metastability
•  99.99% of metastability events resolve in <100ps
•  Average impact on cycle time: <0.1%
[Figure: log-log plot of mutex delay (ps), 1E+01 to 1E+05, versus difference in arrival times (ps), 1E-07 to 1E+03.]

Synchronization With Pausible Clock Generators
Mutex and pausible clock safely synchronize the signal.
Synchronized signal arrives after (an average of) just 1 clock cycle!
[Figure: “Pausible Clock Generator”, a mutex combined with the clock generator.]
“Demystifying Data Driven and Pausible Clocking Schemes,” Mullins07
Pausible Clocks: Our Contribution
•  Prior work in pausible clocks:
–  Proposed pausible clocking as a technique for low-latency asynchronous interface crossings (Yun96)
–  Integrated pausible clocks with asynchronous FIFOs to make GALS wrappers (Moore02, Fan09, others)
•  We propose a circuit that:
–  Pairs pausible clocking techniques with standard two-ported synchronous FIFOs
–  Uses a novel flow-control scheme that enables low-latency asynchronous boundary crossings
–  Integrates easily with standard toolflows

A Pausible Bisynchronous FIFO
[Figure: bisynchronous FIFO block diagram with FIFO memory (dual-port RAM), write pointer logic, and read pointer logic between the TX interface (tx_clk) and the RX interface (rx_clk); signals Data In, Data Out, Valid, Ready, Full, and Empty. No synchronizers are shown on the pointer paths.]
A Pausible Bisynchronous FIFO
[Figure: bisynchronous FIFO block diagram with FIFO memory (dual-port RAM), write pointer logic, and read pointer logic between the TX interface (tx_clk) and the RX interface (rx_clk), with two pausible clock generators on the pointer paths.]

A Pausible Bisynchronous FIFO
Pointers are synchronized via two-phase increment and acknowledge signals.
[Figure: the two pausible clock generators.]
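Two-phase (transition) signaling means that every toggle of a wire, rising or falling, carries one event. The sketch below shows the idea for a single increment/acknowledge pair in Python. It is an illustration, not the circuit from the talk. The class `TwoPhaseChannel` and its method names are hypothetical, and it assumes at most one outstanding increment per wire pair.

```python
class TwoPhaseChannel:
    """One increment wire and one acknowledge wire using toggle signaling."""

    def __init__(self) -> None:
        self.inc = 0  # toggled by the sender for each pointer increment
        self.ack = 0  # toggled by the receiver once the increment is seen

    def can_send(self) -> bool:
        # The increment wire may be reused only after it has been acknowledged.
        return self.inc == self.ack

    def send(self) -> bool:
        """Sender side: signal one increment by toggling `inc`."""
        if not self.can_send():
            return False
        self.inc ^= 1
        return True

    def pending(self) -> bool:
        # A level mismatch means an increment has not been acknowledged yet.
        return self.inc != self.ack

    def receive(self) -> bool:
        """Receiver side: consume a pending increment by toggling `ack`."""
        if not self.pending():
            return False
        self.ack ^= 1
        return True
```

Because the levels return to equality after each acknowledge, no separate return-to-zero phase is needed, which is the usual motivation for two-phase over four-phase handshakes.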
Two-phase signals
1. Valid data is written to the FIFO.
2. Write pointer increment is toggled.
3. The write pointer increment is synchronized.
4. Valid is asserted and data is read out of the FIFO.
5. Read pointer increment is toggled.
6. After the RX clock, write pointer acknowledge is toggled.
7. Write pointer acknowledge is synchronized. The increment signal can now safely be reused.

Simulation Results: Brute Force Synchronizer
•  Brute Force Synchronizer: typical latency 4 cycles
•  Pausible Synchronizer: typical latency 1.25 cycles
[Figure: latency (RX cycles, 0 to 8) versus TX clock period / RX clock period (0.5, 1, 2, 4) for the two synchronizers.]

Pausible Clocking Timing Constraints
Minimum clock period:
$\frac{T}{2} \ge t_{r2} + t_{g2} + t_{fb}$
Maximum insertion delay:
$T_{ins} \le T - t_{fb} - t_{g2}$
Maximum same-cycle work:
$T_{CL} \le t_{fb} + t_{g2} + T_{ins}$
[Figure: RX pointer logic and clock generator loop annotated with t_r2, t_g2, t_fb, and t_ins.]
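The three constraints can be evaluated numerically with a few helper functions. The sketch below is illustrative only. It reads the minimum-period constraint as T/2 ≥ t_r2 + t_g2 + t_fb, which is an interpretation of the slide's layout, and the names `MutexTiming`, `min_clock_period`, `max_insertion_delay`, and `max_same_cycle_work` as well as the delay values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutexTiming:
    """Delays around the pausible clock loop, all in the same time unit."""
    t_r2: float  # delay labeled t_r2 on the slide
    t_g2: float  # delay labeled t_g2 on the slide
    t_fb: float  # delay labeled t_fb on the slide


def min_clock_period(m: MutexTiming) -> float:
    # Assumed reading: T/2 >= t_r2 + t_g2 + t_fb
    return 2.0 * (m.t_r2 + m.t_g2 + m.t_fb)


def max_insertion_delay(m: MutexTiming, period: float) -> float:
    # T_ins <= T - t_fb - t_g2
    return period - m.t_fb - m.t_g2


def max_same_cycle_work(m: MutexTiming, t_ins: float) -> float:
    # T_CL <= t_fb + t_g2 + T_ins
    return m.t_fb + m.t_g2 + t_ins
```

For example, with made-up delays of t_r2 = 50 ps, t_g2 = 60 ps, and t_fb = 90 ps, the minimum period is 400 ps. At T = 1000 ps the insertion delay may be at most 850 ps, and an insertion delay of 300 ps leaves at most 450 ps for same-cycle combinational logic.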
Pausible Clocking Limitations: Where to Put the Mutexes?
[Figure: a partition with a clock generator and several pausible interfaces; physical distance between them adds delay.]

Pausible Clocking Limitations: Where to Put the Mutexes?
Option 1: Mutexes at the interface
–  Reduces maximum clock rate
Option 2: Mutexes at the center
–  Increases latency
+  Improves maximum clock rate
+  Can bake the mutex into a black box with the clock generator
[Figure label: Added Delay]

Playing Nice with the Tools
•  Most of the design is synchronous and works fine with synthesis tools
•  FIFO is a standard dual-ported memory
•  A single custom cell is needed
•  Custom timing constraints at clock domain crossings
Synchronizer Comparison
Design                     Average Latency (cycles)   Area (µm²)   Power (mW)   Energy (fJ/bit)
Synchronous Interface      1                          4968         4.08         25.5
Brute Force Synchronizer   ~4                         5005         6.03         37.7
Pausible Synchronizer      ~1.5                       4808         5.41         33.8

Summary
•  Pausible clocks have key advantages over standard synchronizers, including low latency and zero probability of failure
•  We designed a pausible bisynchronous FIFO that pairs pausible clocking techniques with standard two-ported synchronous FIFOs
•  Fast asynchronous interfaces that work well with standard toolflows are a key enabling technology of fine-grained GALS design

References
R. Mullins and S. Moore, “Demystifying Data-Driven and Pausible Clocking Schemes,” in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2007, pp. 175–185.
K. Yun and R. Donohue, “Pausible clocking: a first step toward heterogeneous systems,” in Proc. IEEE International Conference on Computer Design, 1996, pp. 118–123.
S. Moore et al., “Point to point GALS interconnect,” in Proc. IEEE Symposium on Asynchronous Circuits and Systems, 2002, pp. 69–75.
X. Fan et al., “Analysis and optimization of pausible clocking based GALS design,” IEEE International Conference Computer Design, 2009.

Questions?

Backup Slides

A Pausible Bisynchronous FIFO
[Figure: detailed schematic. A dual-port FIFO (data_in, data_out, wptr, rptr) sits between write pointer logic and read pointer logic, each with valid and ready signals. The signals wptr_inc, wptr_inc_sync, rptr_inc, wptr_ack, wptr_ack_sync, and rptr_ack_sync cross between the TX side and the RX side through blocks labeled LAT and MUTEX. Each side has mutex ports r1, g1, r2, g2, a block labeled C, the signals r_pause_start and t_pause_start, and a root clock (tx_clk_root, rx_clk_root) driving tx_clk and rx_clk.]
Mutex Circuit
R1   R2   G1   G2
0    0    0    0
1    0    1    0
0    1    0    1
1    1    R1 first -> G1 high; R2 first -> G2 high
•  Only G1 or G2 can be high at once (never both)
•  If R1 and R2 transition high simultaneously, metastability can result
•  The “metastability filter” keeps both outputs low until metastability resolves
[Figure: mutex schematic, an SR latch followed by a metastability filter.]

Simulation Setup
•  Implemented interface in Verilog
•  Allows direct comparison to brute-force synchronizers via drop-in replacement
[Figure: TX clock domain with dummy compute (TX), the interface, and RX clock domain with dummy compute (RX).]

Minimum Clock Period
Minimum clock period:
$\frac{T}{2} \ge t_{r2} + t_{g2} + t_{fb}$
[Figure: timing diagram of Clock and R2 with the OK phase and the delay phase, annotated “Request arrives here” and with t_r2, t_g2, and t_fb.]

Insertion Delay
Requests can arrive near the clock edge!
Maximum insertion delay:
$T_{ins} \le T - t_{fb} - t_{g2}$
[Figure: timing diagram of Clock_root, Clock, r2, and R2 with the OK phase and the delay phase; t_ins separates Clock_root from Clock, and r2 passes through a block labeled LAT.]

Combinational Logic Delay
Maximum same-cycle work:
$T_{CL} \le t_{fb} + t_{g2} + T_{ins}$
[Figure: timing diagram annotated with T_CL, t_fb, t_g2, and t_r2.]

Metastability Margin
Minimum clock period:
$\frac{T}{2} \ge t_{r2} + t_{g2} + t_{fb}$
Metastability margin:
$t_m = \frac{T}{2} - (t_{r2} + t_{g2} + t_{fb})$
[Figure: timing diagram annotated with t_m, t_fb, t_g2, and t_r2.]

Average Latency
Average latency:
$T_L = T + T_{ins} - t_{r2}$
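As a worked example of the margin and latency expressions, the block below plugs in made-up values. The numbers are assumptions chosen only for illustration: T = 1000 ps, t_r2 = 50 ps, t_g2 = 60 ps, t_fb = 90 ps, and T_ins = 300 ps. The metastability margin is read as T/2 minus the loop delays, which is an interpretation of the slide's layout.

```latex
\begin{align*}
t_m &= \frac{T}{2} - (t_{r2} + t_{g2} + t_{fb})
     = 500 - (50 + 60 + 90) = 300\ \text{ps} \\
T_L &= T + T_{ins} - t_{r2}
     = 1000 + 300 - 50 = 1250\ \text{ps}
\end{align*}
```

Under these assumed values the insertion delay satisfies its bound (300 ps ≤ 1000 − 90 − 60 = 850 ps), and the average latency comes to 1.25 clock periods.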