Lecture 7: CMOS Proximity Wireless Communications for 3D Integration (2)

EE290c Spring 2007, Tues & Thurs 9:30-11:00, 212 Cory UCB
Tadahiro Kuroda
Visiting MacKay Professor
Department of EECS
University of California, Berkeley
http://bwrc.eecs.berkeley.edu/Classes/ee290c_s07
http://www.kuroda.elec.keio.ac.jp/
© T. Kuroda (1/42)
ISSCC2007
• Energy dissipation reduction to 0.14pJ/b
  - Pulse shaping
• Extension of communication range to 1.2mm for through-package link
  - Preamplifier and offset cancellation
Figure: Energy dissipation (power/bandwidth) [pJ/b] of published chip-to-chip links vs. year ('96 to '07), on a scale from 1000pJ/b down to 0.1pJ/b: Toshiba, NEC, NTT, TI, Intel, Rambus, Hitachi, Fujitsu, TeraChip, Sun, Keio, NC State, SFT, ARCES, and This Work (180nm and 90nm). Inset: bus probing with a probe IC (transceiver) and inductors in a flexible circuit board (FCB), coupling through the SSOP package to on-chip inductors of the target LSI on the PCB (TX, RX, CLK, glue, to/from in-circuit emulator).
Low-Power Design Needed
Figure: SiP products (HDTV camcorder, mobile phone, portable game) that integrate CPU, SRAM, DRAM and analog dies and need both high bandwidth and low power.
• SiP application: high performance and yet low power
• Example: H.264 video decoding for 1080 HDTV
  - Required bandwidth = 20Gb/s
  - Decoder power = 100mW (C.C. Lin, ISSCC’06)
  - Total IO power < 10mW
  - IO energy dissipation < 0.5pJ/b (10mW / 20Gb/s)
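The IO budget above is just power divided by bandwidth. A minimal Python sketch of that arithmetic (the function name is illustrative, not from the lecture):

```python
def energy_per_bit_pj(power_w: float, bitrate_bps: float) -> float:
    """Energy per bit in pJ/b: average power divided by data rate."""
    return power_w / bitrate_bps * 1e12


# H.264 1080 HDTV example: 10mW total IO power at 20Gb/s
budget_pj = energy_per_bit_pj(10e-3, 20e9)
print(f"IO energy budget: {budget_pj:.2f} pJ/b")  # 0.50 pJ/b
```

By the same arithmetic, spending the full 100mW decoder power on IO would allow 5pJ/b, so the 10mW budget is a tenth of that.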
Previous Works
• Capacitive coupling: 0.14pJ/b at d=2µm (A. Fazzi, CICC’05)
• Inductive coupling: 2.8pJ/b at d=20µm (N. Miura, ISSCC’06)
• This work: 0.14pJ/b inductive-coupling transceiver
  - Tx: digital pulse shaping; Rx: process scaling
  - No performance degradation
Data Transceiver Circuit
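Figure: The Tx pulse generator, driven by Txdata and Txclk (1.8V), produces the transmitter current IT in the Tx inductor; the voltage VR induced across the Rx inductor (biased at VB) is resolved by a comparator clocked by Rxclk into Rxdata. Simulated waveforms: IT [mA], VR [mV] (about ±50mV), and Txdata, Txclk, Rxclk, Rxdata (0 to 1.8V) over 0 to 6ns.
$E_{TX} = 2.2$ pJ/b
$E_{RX} = C V_{DD}^2 = 0.6$ pJ/b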
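Treating the whole receiver as one switched capacitance (a simplification; the slide only gives $E_{RX} = C V_{DD}^2$), the 0.6pJ/b at 1.8V implies an effective C of roughly 185fF. A sketch of that inversion:

```python
def effective_capacitance(e_rx_j: float, vdd_v: float) -> float:
    """Invert E_RX = C * VDD**2 for the effective switched capacitance C."""
    return e_rx_j / vdd_v ** 2


c_eff = effective_capacitance(0.6e-12, 1.8)
print(f"C_eff = {c_eff * 1e15:.0f} fF")  # 185 fF
```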
Energy Dissipation in Tx
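Figure: Txdata triggers the pulse generator (supply VDD), which drives a current pulse IT of peak IP and width τ into the Tx inductor; through the mutual inductance M, the Rx inductor sees VR, a bipolar pulse of amplitude ±VP.
$$V_R = M \frac{dI_T}{dt}$$
$$V_P = \frac{2 M I_P}{\tau} = 2 M S_P, \qquad S_P = \frac{I_P}{\tau}$$
$$E_{TX} = V_{DD} I_P \tau = V_{DD} S_P \tau^2$$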
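Reading the energy equation backwards gives the peak current the transmitter must deliver. As an illustration, this sketch plugs in the 180nm operating point reported later in the lecture (VDD = 1.8V, τ = 60ps, ETX = 0.13pJ/b); it takes the slide's $E_{TX} = V_{DD} I_P \tau$ as given and does not model the pulse shape in more detail.

```python
def peak_current(e_tx_j: float, vdd_v: float, tau_s: float) -> float:
    """Invert E_TX = VDD * I_P * tau for the peak pulse current I_P."""
    return e_tx_j / (vdd_v * tau_s)


def tx_energy(vdd_v: float, s_p: float, tau_s: float) -> float:
    """E_TX = VDD * S_P * tau**2, with S_P = I_P / tau."""
    return vdd_v * s_p * tau_s ** 2


tau = 60e-12
i_p = peak_current(0.13e-12, 1.8, tau)
print(f"I_P = {i_p * 1e3:.2f} mA")  # 1.20 mA
# Round trip: the S_P form of the equation returns the same energy
print(f"E_TX = {tx_energy(1.8, i_p / tau, tau) * 1e12:.2f} pJ/b")  # 0.13 pJ/b
```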
Energy Dissipation in Tx
Figure: the same circuit and equations as the previous slide, with τ/2 marked on the time axis of the current pulse IT.
Bathtub Curve
Figure: Measured BER (10^-3 to 10^-12) vs. sampling timing [ps] (250 to 400ps) at 1Gb/s; timing margin = 150ps. Inset: IT [mA] and VR [mV] (±60mV) waveforms, with 180ps marked.
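The timing margin is the width of the sampling window in which the BER stays below the target (10^-12 here). The sketch below extracts that width from a sampled bathtub curve. So that it runs on its own, it generates the curve from a simple model in which each eye edge has Gaussian jitter; the edge positions and the 5ps rms jitter are hypothetical, not measurements from the slide.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bathtub_ber(t_ps: float, left_ps: float, right_ps: float, sigma_ps: float) -> float:
    """Hypothetical model: BER at sampling time t with Gaussian jitter on both eye edges."""
    return q_function((t_ps - left_ps) / sigma_ps) + q_function((right_ps - t_ps) / sigma_ps)


def timing_margin(times_ps, bers, target=1e-12):
    """Width of the sampling window whose BER is below target (0 if none)."""
    passing = [t for t, ber in zip(times_ps, bers) if ber < target]
    return max(passing) - min(passing) if passing else 0


# Hypothetical eye: edges at 200ps and 400ps, 5ps rms jitter, 1ps sampling step
times = list(range(200, 401))
bers = [bathtub_ber(t, 200, 400, 5) for t in times]
print(f"timing margin: {timing_margin(times, bers)} ps")  # 128 ps
```

With measured (sampling time, BER) pairs in place of the model, `timing_margin` returns the margin directly. For this model the analytical answer is the eye width minus about 2 × 7.03σ, which the 1ps sampling grid rounds down slightly.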
Inter-Channel Skew
Figure: BER (10^-3 to 10^-12) vs. sampling timing [ps] (250 to 400ps) at 1Gb/s; the skew across a 64-channel array at 30µm pitch is 10ps, against a timing margin of 150ps.
Pulse Shaping Circuit
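Figure: Tx pulse shaping circuit. Txclk and a 4-phase clock (0º, 45º, 90º, 135º) feed 6-bit phase interpolators (PI) covering 0º~45º and 135º, which time the pulse launched for each Txdata bit. Three digital controls shape the pulse: pulse width (6bit, 4ps step), pulse slew rate (4bit, driver annotated 20w) and pulse amplitude (5bit, driver annotated 24w). The resulting current IT (width τ) in the Tx chip induces VR in the Rx chip, where it is sampled by Rxclk to give Rxdata.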
Simulated Waveforms
Figure: Simulated transmitter current IT [mA] and received voltage VR [mV] (±80mV) vs. time [ps] (0 to 400ps), in two panels: pulse amplitude control and pulse width control. Annotations: "Constant Slew Rate" and a 60ps pulse width; the IT axes reach 1.2mA and 1.5mA.
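Timing Control
Figure: Tx chip: a 4-phase clock (0º, 45º, 90º, 135º) and 6-bit phase interpolators (PI, 0º~45º and 135º) provide pulse width control of the data transmitter current IT, while the clock transmitter current ITC forms the clock link. Rx chip: the clock receiver (1bit) senses VRC and recovers Rxclk; a 4-phase clock and a 6-bit PI (0º~135º) provide sampling timing control of the data receiver, which resolves VR into Rxdata.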
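Clock Transceiver
Figure: The clock transmitter, driven by Txclk, produces ITC; the clock receiver (biases VB1 and VB2, supply VDD) senses VRC, and its internal node VSA regenerates Rxclk. Simulated waveforms over 0 to 5ns: Txclk [V] (0 to 1.8V, slew rate Sclk), ITC [mA] (±1mA), VRC [V] (±0.1V), VSA [V] (0.7 to 1.1V), Rxclk [V] (0 to 1.8V).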
Test Chip in 180nm CMOS
Figure: Chip photograph of the Rx chip stacked with the 10µm-thick Tx chip, showing the data transceiver (1Gb/s) and the clock transceiver (1GHz); scale marks 30µm and 200µm.
Clock Jitter Reduction
Figure: Clock receiver circuit (Txclk, ITC, VRC, biases VB1 and VB2, node VSA, Rxclk) and the measured 1GHz Rxclk waveform (scales 100ps and 200mV) with 4.8psrms jitter, ½ of [3]. Clock slew rate (Sclk) control: jitter [psrms] (4.8 to 5.6) vs. Sclk [mV/ps] (6 to 12); a 2psrms jitter level is also annotated.
Pulse Amplitude (VP) Control
Figure: BER (10^0 to 10^-12) vs. sampling timing [ps] (65 to 105ps) at τ=60ps, 1Gb/s, for VP = 20mV, 40mV, 60mV and 80mV. Inset: received pulse VR of amplitude VP and width τ.
Pulse Width (τ) Control
Figure: BER (10^0 to 10^-12) vs. sampling timing [ps] (20 to 120ps) at VP=60mV, 1Gb/s, for τ = 60ps (ETX=0.13pJ/b), 80ps (0.23pJ/b), 100ps (0.36pJ/b) and 120ps (0.53pJ/b); a 25ps window is annotated. Inset: received pulse VR of amplitude VP and width τ.
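Because the receiver sees $V_P = 2 M S_P$, holding VP at 60mV holds SP fixed, so $E_{TX} = V_{DD} S_P \tau^2$ should grow as τ². A quick check against the four measured points (a sketch; it assumes M and VDD are the same for all four curves):

```python
def scaled_tx_energy(e_ref_pj: float, tau_ref_ps: float, tau_ps: float) -> float:
    """E_TX scales as tau**2 when VDD and S_P are held constant."""
    return e_ref_pj * (tau_ps / tau_ref_ps) ** 2


measured_pj = {60: 0.13, 80: 0.23, 100: 0.36, 120: 0.53}  # from the slide
predicted_pj = {tau: scaled_tx_energy(0.13, 60, tau) for tau in measured_pj}
for tau, e_meas in measured_pj.items():
    print(f"tau={tau:3d}ps  predicted={predicted_pj[tau]:.2f}  measured={e_meas:.2f} pJ/b")
```

The predictions (0.13, 0.23, 0.36, 0.52pJ/b) stay within 0.01pJ/b of the measurements, so halving τ cuts Tx energy roughly fourfold.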
Supply Noise Immunity
Figure: BER (10^-3 to 10^-12) vs. supply noise [mV peak-to-peak] (0 to 600mV) at ETX=0.13pJ/b, 1Gb/s, for 1GHz and 50kHz load changes. Insets: supply noise from a 1GHz random load change (350mV, 50ns scale), and the measurement setup with Tx and Rx chips, Tx and Rx loads, and a probe on VDD, all on one board.
Test Chip in 90nm CMOS
Figure: Tx chip (10µm-thick) stacked on the Rx chip (750µm-thick), with a 3x3 channel array of metal inductors at a pitch of P=30µm.
Bathtub Curve in 90nm CMOS
Figure: BER (10^0 to 10^-12) vs. sampling timing [ps] (-40 to 20ps) at 1Gb/s with τ=60ps, ETX=0.11pJ/b, ERX=0.03pJ/b; timing margin = 30ps.
Performance Summary
Parameter                              This Work (90nm)      This Work (180nm)       Previous Work
Energy dissipation in Tx/Rx, ETOTAL    0.14pJ/b              0.33pJ/b                2.8pJ/b
Energy dissipation in Tx, ETX          0.11pJ/b              0.13pJ/b                2.2pJ/b
Energy dissipation in Rx, ERX          0.03pJ/b              0.2pJ/b                 0.6pJ/b
Process                                90nm CMOS (VDD=1V)    180nm CMOS (VDD=1.8V)

Other parameters: data rate 1Gb/s; bit error rate <10^-12; clock rate 1GHz; channel area 30µm x 30µm; distance 15µm.
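The total row is simply Tx plus Rx energy, and multiplying by the data rate gives per-channel power. A small sketch that checks the table and converts this work's 90nm figure to power at its 1Gb/s data rate:

```python
TABLE_PJ = {  # (E_TX, E_RX) in pJ/b, from the performance summary
    "This Work (90nm)": (0.11, 0.03),
    "This Work (180nm)": (0.13, 0.2),
    "Previous Work": (2.2, 0.6),
}


def total_energy_pj(e_tx: float, e_rx: float) -> float:
    """E_TOTAL = E_TX + E_RX."""
    return e_tx + e_rx


def power_mw(e_pj: float, rate_bps: float) -> float:
    """Average power in mW for a given energy per bit and data rate."""
    return e_pj * 1e-12 * rate_bps * 1e3


totals = {name: total_energy_pj(*e) for name, e in TABLE_PJ.items()}
for name, e in totals.items():
    print(f"{name}: E_TOTAL = {e:.2f} pJ/b")
print(f"90nm channel at 1Gb/s: {power_mw(totals['This Work (90nm)'], 1e9):.2f} mW")  # 0.14 mW
```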
World Lowest Energy (0.14pJ/b)
Figure: Energy dissipation [pJ/b] vs. year ('96 to '07) for Toshiba (350nm), NEC (250nm), NTT (250nm), TI (180nm), Intel (180nm), NEC (130nm), Rambus (90nm), Keio (350nm), Hitachi (250nm), Fujitsu (90nm), TeraChip (130nm), Sun (350nm), [2] Keio (250nm), Keio (180nm), [1] SFT (180nm), and This Work (180nm and 90nm, inductive). Reference levels mark the IO power for HDTV H.264/AVC (23.1Gb/s): wire bonding 200mW, µ-bump with interposer 20mW, inductive 2mW.
[20.2] “A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping,” [16] ISSCC’07, Keio Univ.
Summary: Energy Reduction
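• Rx: $Q = CV$, $E_{RX} = QV = CV^2$, which scales as a CMOS gate (lower voltage): 0.6pJ/b → 0.03pJ/b
• Tx: $E_{TX} = QV = V \int i_T\,dt \propto V \cdot \frac{\partial I}{\partial t} \cdot T_{pulse}^2$, where $\partial I/\partial t$ = const. because the received voltage $v_R = M\,\partial I/\partial t$ must stay detectable; Tx energy is therefore cut by shortening the pulse width (a timing issue): 2.2pJ/b → 0.11pJ/b
• Total: 2.8pJ/bit (180nm, 1.8V) → 0.14pJ/bit (90nm, 1.0V)
Figure: transmitter current iT, a pulse of width Tpulse and peak iT,max, and the received voltage vR vs. t.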
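The Rx term scales like a CMOS gate. Assuming, as a rough rule of thumb that is not stated on the slide, that the switched capacitance C shrinks in proportion to the feature size, the $C V^2$ law carries the 180nm receiver (0.2pJ/b at 1.8V) to about 0.031pJ/b at 90nm and 1.0V, close to the measured 0.03pJ/b:

```python
def scale_rx_energy(e_rx_pj, node_from_nm, node_to_nm, vdd_from_v, vdd_to_v):
    """E_RX = C * V**2, with C assumed proportional to the feature size."""
    c_ratio = node_to_nm / node_from_nm
    v_ratio = vdd_to_v / vdd_from_v
    return e_rx_pj * c_ratio * v_ratio ** 2


e_rx_90 = scale_rx_energy(0.2, 180, 90, 1.8, 1.0)
print(f"predicted 90nm E_RX: {e_rx_90:.3f} pJ/b")  # 0.031 pJ/b
```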
Background
• Pulse-based inductive-coupling technique
  - High-speed, low-power, and low-cost chip-to-chip communication in a SiP (BW > 1Tbps)
  - Communication range: 10µm to 100µm (ref. Miura et al., ISSCC2006, 23.4)
• New applications opened up by extending the communication distance to the millimeter range
  - Detachable high-speed wireless interfaces for:
    - Real-time on-chip bus monitoring
    - High-speed memory access
    - Durable contactless connectors, etc.
Target of This Study
• Wireless logic probing through the LSI package for firmware debugging
Merits
• Downsizing and cost reduction by eliminating package test pins and PCB patterns for debugging
• Flexibility enhancement by a detachable interface
• Security improvement by eliminating easily accessible test pins
• Electrical isolation by removing contacts
System Overview
Figure: A PC debugger connects via USB to the probe IC (amplifier, etc.) on the probe, a flexible circuit board (FCB) placed over the target µ-controller LSI on the PCB. Enlarged view: the inductive-coupling wireless interface between inductors in the probe and in the target µ-controller LSI.
Bus Probing for Debugging
Figure: A probe IC (transceiver) with inductors in a flexible circuit board (FCB) sits above the target LSI (in an SSOP package) on the PCB, which contains on-chip inductors; TX, RX and CLK links, glue, and a connection to/from the in-circuit emulator.
[20.3] “An Attachable Wireless Chip-Access Interface for Arbitrary Data Rate Using Pulse-Based Inductive-Coupling through LSI Package,” [18] ISSCC’07, Keio Univ.
Die Photograph
• Technology: 0.25µm CMOS standard digital process with embedded flash ROM, 3-layer Al
• Power supply: MCU core and transceiver, 2.5V
• Die size: 10.1mm²
Figure: die photograph showing the MCU core and the CLK, TX and RX blocks.
Block Diagram
Signaling
Tradeoff by Inductor Size
Inductor size               Large           Small
Cost                        High            Low
Self-resonant frequency     Low             High
Pulse width                 Must be long    Can be short
Attainable data rate        Low             High
Communication distance      Long            Short
TX/RX inductor alignment    Easy            Difficult
Attainable Communication Distance
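Figure: Coupling coefficient (1 down to 10^-8) vs. communication distance X (10µm to 10mm) for inductor sizes D = 10µm, 100µm and 1mm. Detectable levels are shown for a comparator alone and for a preamplifier plus comparator (30dB more sensitive), both above the noise floor; the target distance is 1mm.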
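In the plot, the preamplifier lowers the detectable coupling level by 30dB. For two coaxial coils separated by much more than their diameter, the mutual inductance (and hence the coupling coefficient) falls roughly as the inverse cube of distance (magnetic-dipole approximation), so a 30dB (about 31.6×) gain buys roughly a 3.2× longer range. This is a sketch under that far-field assumption only; closer than about one coil diameter the fall-off is gentler, and the plotted curves should be used for real numbers.

```python
def db_to_voltage_ratio(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)


def range_multiplier(gain_db: float, falloff_exponent: float = 3.0) -> float:
    """Distance extension if coupling falls as X**-falloff_exponent (assumed far field)."""
    return db_to_voltage_ratio(gain_db) ** (1 / falloff_exponent)


print(f"30dB preamp: {db_to_voltage_ratio(30):.1f}x lower detectable level")  # 31.6x
print(f"range multiplier: {range_multiplier(30):.2f}x")  # 3.16x
```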
Data Receiver
• Pre-amplifier for high sensitivity
• DAC for offset cancellation
• Delay line for decision timing adjustment
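The slide names an offset-cancellation DAC but not how its code is chosen. One common approach, shown here only as an illustration, is a successive-approximation (binary) search with no signal applied: find the DAC code at which the comparator decision flips, which leaves the residual offset within one DAC step. The comparator model, the 7.3mV offset, the 1mV step and the 6-bit code width are all hypothetical.

```python
OFFSET_MV = 7.3  # hypothetical comparator input offset
LSB_MV = 1.0     # hypothetical DAC step size
N_BITS = 6       # hypothetical DAC resolution
MID_CODE = 1 << (N_BITS - 1)  # code 32 applies zero correction


def residual_offset_mv(code: int) -> float:
    """Offset left after the DAC subtracts (code - MID_CODE) steps."""
    return OFFSET_MV - (code - MID_CODE) * LSB_MV


def comparator_reads_low(code: int) -> bool:
    """Comparator decision with zero input signal (noise-free model)."""
    return residual_offset_mv(code) <= 0


def calibrate_offset(n_bits: int = N_BITS) -> int:
    """Binary search for the first code at which the decision flips to low."""
    lo, hi = 0, (1 << n_bits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if comparator_reads_low(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


code = calibrate_offset()
print(code, f"{residual_offset_mv(code):+.1f} mV")  # 40 -0.7 mV
```

A real comparator is noisy near its trip point, so in practice each decision would typically be a majority vote over several samples.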
Interference Problem
• Switching noise from digital circuits and I/O buffers (mainly just after the clock edge)
  - Noise couples to the receiver via the substrate, power lines, ground lines and bonding wires
  - Malfunction of the asynchronous clock receiver
Our Solution
• Desensitize the clock receiver after the clock transition
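A behavioral sketch of this solution: once the clock receiver accepts an edge, it ignores further edges for a hold-off window, which is when the switching noise concentrates. This models the idea at the event level only and is not the author's circuit; the 20MHz clock, the 20ns window and the glitch times are hypothetical. The window has to be shorter than the clock period, or genuine edges would be dropped.

```python
def deglitch(edge_times_ns, holdoff_ns):
    """Keep a received clock edge only if it comes at least holdoff_ns after
    the previously accepted edge; edges inside that window are ignored."""
    accepted = []
    for t in sorted(edge_times_ns):
        if not accepted or t - accepted[-1] >= holdoff_ns:
            accepted.append(t)
    return accepted


# Hypothetical 20MHz clock (50ns period) with noise glitches 2 to 3ns after each edge
edges = [0.0, 2.0, 50.0, 52.5, 100.0, 103.0, 150.0]
print(deglitch(edges, holdoff_ns=20.0))  # [0.0, 50.0, 100.0, 150.0]
```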
Clock Receiver
Experimental Setup
Figure: Photograph of the setup: development kit (target µ-controller), wireless probe, debugger, and reference.
Measured Clock and Data Waveforms
Clock: transmitted (upper) and received (lower) waveforms
Data: transmitted (upper) and received (lower) waveforms
Received Pulse Waveform
Figure: Received pulse: signal amplitude (in DAC input value units, -30 to 30) vs. time (0 to 4ns).
Alignment Tolerance
Figure: BER (1 to 10^-10) vs. horizontal alignment error (-1 to 1mm) at a vertical distance of 1.2mm.
Chip Specification
Technology                       LSI: 0.25µm CMOS, 3-layer metal; Probe: 2-layer metal FCB
Chip size                        2.4mm x 4.2mm
Supply voltage                   MCU core and transceiver: 2.5V; I/O: 3.3V to 5.0V
Data rate                        20Mbps (full-duplex)
Communication distance           1.2mm (at BER < 10^-10)
Alignment tolerance              0.5mm (at BER < 10^-10)
Power dissipation (at 20Mbps)    CLK: TX 14.3mW, RX 10.4mW; DATA: TX 0.5mW, RX 8.1mW
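Normalizing the measured power to the 20Mbps rate gives energy per bit. This sketch assumes each figure serves a single 20Mb/s stream and charges the whole clock link to it; that accounting is an assumption, not the author's.

```python
RATE_BPS = 20e6  # 20Mbps in each direction (full duplex)
POWER_MW = {  # from the chip specification, at 20Mbps
    ("CLK", "TX"): 14.3,
    ("CLK", "RX"): 10.4,
    ("DATA", "TX"): 0.5,
    ("DATA", "RX"): 8.1,
}


def energy_per_bit_pj(power_mw: float, rate_bps: float) -> float:
    """Energy per bit in pJ/b from power in mW and data rate in b/s."""
    return power_mw * 1e-3 / rate_bps * 1e12


energies = {key: energy_per_bit_pj(p, RATE_BPS) for key, p in POWER_MW.items()}
for (link, side), e in energies.items():
    print(f"{link} {side}: {e:.0f} pJ/b")
print(f"total: {sum(POWER_MW.values()):.1f} mW")  # 33.3 mW
```

The results (715, 520, 25 and 405pJ/b) are orders of magnitude above the 0.14pJ/b of the 15µm link described earlier.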
Summary: Range Extension
• A wireless chip-access interface through the LSI package was realized for firmware debugging.
• The preamplifier and offset-cancellation DAC in the receiver extend the communication range to 1.2mm with sufficient alignment tolerance.
• A de-glitch circuit enables reliable clock transmission even in the presence of interference.
• The interface achieved 20Mbps and has a potential data rate of up to 500Mbps/ch.