On-Chip Optical Interconnects

6th International Conference of Soft Computing and Pattern Recognition, August 11-14, 2014, Tunis, Tunisia
On-Chip Optical Interconnects:
Prospects and Challenges
Abderazek Ben Abdallah
The University of Aizu
School of Computer Science and Engineering
Division of Computer Engineering
Adaptive Systems Laboratory
Aizu-Wakamatsu, Japan
E-mail: [email protected]
August 13, 2014
Agenda
• Motivation
• Optical Interconnect Prospects
• PHENIC Si-Photonics Network-on-Chip
• Technology Challenges
• Conclusion
High-performance computing today
Tianhe-2, #1 on the TOP500 list in Nov. 2013
[Photo: the switch backplane]
Features
• 16,000 nodes, each with 2 Intel Xeon Ivy Bridge CPUs and 3 Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores
• 33.9 Pflops (about 3.4% of the exascale target for 2020)
• 17.8 MW (89% of the 20 MW power limit)
A closer look at a computing system
[Figure: a CPU (2 nJ/Inst) and a GPU (200 pJ/Inst) connected off-chip by PCI Express (48/MB/s); the on-chip interconnect is marked as a bottleneck.]
If we consider exascale within 20 MW?
• We need 20 pJ/Instruction!
→ Today's machines are far from this target.
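The 20 pJ figure follows from dividing the power limit by the target throughput. A minimal sketch of that arithmetic, assuming "exascale" means 1e18 operations per second and reusing the Tianhe-2 figures quoted earlier (the function name is illustrative only):

```python
# Energy per operation = power / throughput.
# Assumption: "exascale" is taken as 1e18 operations per second.

def energy_per_op_pj(power_watts: float, ops_per_second: float) -> float:
    """Return the average energy per operation in picojoules."""
    return power_watts / ops_per_second * 1e12

# 20 MW power limit at 1e18 operations per second.
exascale_budget = energy_per_op_pj(20e6, 1e18)
# Tianhe-2 as quoted on the earlier slide: 17.8 MW at 33.9 Pflops.
tianhe2_actual = energy_per_op_pj(17.8e6, 33.9e15)

print(f"Exascale budget: {exascale_budget:.0f} pJ/op")
print(f"Tianhe-2:        {tianhe2_actual:.0f} pJ/op")
```

Under these assumptions the machine-level cost per operation is more than an order of magnitude above the exascale budget.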
Communications cost
Energy cost of data movement relative to the cost of a flop for current and 2018 systems. (Shalf et al., VECPAR 2010)
Challenges
• Preparing the operands costs more than performing computing on them!
• There is no Moore’s law for communications.
Gate vs. interconnect delays
Source: IDEAL Research
Agenda
• Motivation
• Optical Interconnect Prospects
• PHENIC Si-Photonics Network-on-Chip
• Technology Challenges
• Conclusion
The idea
Replace wires with waveguides and electrons with photons!
(Photo: Spectrum 2005, Paniccia)
Milestones
[Figure: interconnect distance scale from 1 mm to 1000 km (1 Mm), covering on-chip, chip-to-chip, board-to-board, rack-to-rack, LAN and long-haul links. Optical wires/waveguides serve the short-distance end, which is the Si-photonics target area; optical cables/fibers serve the long-distance end.]
A typical architecture today
[Figure: a multicore processor (CMP) connected to DDR3 DRAM over a copper link at 8.5 GBps and 30 mW/Gbps.]
Accelerating multi- and many-core
• Big cores for single-thread performance
• Small cores for multithread performance
• The copper link consumes a lot of power → an alternative approach is needed.
Photonics in computing system today
[Figure: a multicore processor (CMP) and DRAM connected through a receiver/transmitter and an optical link (transmission over fiber).]
• Uses monolithic integration, which reduces energy consumption
• Utilizes the standard bulk CMOS flow
• Cladding confines the light by total internal reflection, which reduces signal loss
Photonics in computing system today
[Figure: a multicore processor (CMP) and DRAM connected through a receiver/transmitter and a WDM/DWDM fiber channel carrying wavelengths λ1, λ2, λ3, …, λn at >1 TBps and <1 mW/Gbps.]
• Supports WDM, which improves bandwidth density
• DWDM can transport tens to hundreds of wavelengths per fiber
• Work on an integrated Tb/s optical link on a single chip is ongoing
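The per-bit figures quoted for the copper link (30 mW/Gbps) and the optical link (<1 mW/Gbps) can be turned into link power at equal bandwidth. A minimal sketch, assuming "8.5 GBps" means gigabytes per second and taking 1 mW/Gbps as an upper bound for the optical link (the function name is illustrative only):

```python
def link_power_watts(bandwidth_gbps: float, mw_per_gbps: float) -> float:
    """Power drawn by a link, given its bandwidth in Gb/s and its cost in mW per Gb/s."""
    return bandwidth_gbps * mw_per_gbps / 1000.0

# Assumption: 8.5 GBps is read as gigabytes per second, i.e. 68 Gb/s.
DDR3_LINK_GBPS = 8.5 * 8

copper_w = link_power_watts(DDR3_LINK_GBPS, 30.0)   # copper: 30 mW/Gbps
optical_w = link_power_watts(DDR3_LINK_GBPS, 1.0)   # optical: upper bound of <1 mW/Gbps

print(f"Copper:  {copper_w:.2f} W")
print(f"Optical: {optical_w:.3f} W (upper bound)")
```

Since 1 mW/Gbps equals 1 pJ/bit, the same numbers can be read directly as energy per transferred bit.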
(Si) Photonics benefits over electronics
• Low operating costs
• Low heating of components
• Low power consumption
• Possibility to integrate more optical functionalities in a single component
• Low manufacturing cost
• High integration
• High reliability
• Higher density of interconnects
Doubling the Data Rate Every 2 Years
[Figure: data rate in Gb/s]
Intel 50Gb/s WDM link
(A. Alduino et al., IPR 2010)
12.5 Gb/s x 4 channels = 50 Gb/s
(Intel Lab.)
Source: Semiconductor Today, Compounds & Advanced Silicon, Vol. 5, Issue 6, July/August 2010
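The 50 Gb/s figure is the per-channel rate multiplied by the number of wavelength channels. A minimal sketch of that relation and its inverse; the function names and the 1 Tb/s target are illustrative assumptions, not part of the Intel design:

```python
import math

def wdm_aggregate_gbps(channels: int, rate_per_channel_gbps: float) -> float:
    """Aggregate bandwidth of a WDM link: each wavelength carries its own data stream."""
    if channels < 1:
        raise ValueError("a WDM link needs at least one wavelength channel")
    return channels * rate_per_channel_gbps

def channels_needed(target_gbps: float, rate_per_channel_gbps: float) -> int:
    """Smallest number of wavelengths that reaches the target aggregate bandwidth."""
    return math.ceil(target_gbps / rate_per_channel_gbps)

intel_link = wdm_aggregate_gbps(4, 12.5)       # 4 channels x 12.5 Gb/s
for_one_tbps = channels_needed(1000.0, 12.5)   # hypothetical 1 Tb/s target

print(intel_link, for_one_tbps)
```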
Si-Photonics in computing system today
Si-Photonics interposer
• Optical I/Os for chip-to-chip and chip-to-board links (IBM, Intel, Fujitsu)
• E-O-E transceivers for Opto-Silicon Interposer
Channel technology
• Silicon waveguide
– Used on-chip
– Moderate loss, crossover issues
• Free space
– Uses air as the medium
– A set of micro-mirrors and micro-lenses guides the light around
– Used on-chip
• Hollow metal waveguide
– Used for slightly longer distances, at the board level
– Low loss, ease of fabrication
• Fiber optic cable
– Off-chip interconnect
Si-Photonics building blocks
[Figure: laser source (input), modulator, resonator and photodetectors along a waveguide, with N+ and P+ regions and a voltage Vm.]
Main components
• Laser source: injects the required laser light into the waveguide
• Modulators: modulate the laser light into ‘0’ and ‘1’ states
• Photodetectors: detect the laser light and convert it to an electrical signal
• Turn resonators: control the routing direction of the laser light
Si-Photonics building blocks
Raman Silicon Laser
• Based on Stimulated Raman Scattering (SRS)
• A reverse-biased p-i-n diode eliminates the free-carrier absorption (FCA) induced by two-photon absorption (TPA)
On-chip: Vertical Cavity Surface Emitting Laser (VCSEL)
• One of the highest-volume (and hence cheapest) lasers currently in use
• Is often integrated on-chip
• Enables “direct modulation” (the laser is turned ON/OFF directly in accordance with the data being transmitted)
• Not fully CMOS compatible
• Does not support DWDM
Si-Photonics building blocks
[Figure: 5 cm SOI nanowire carrying 1.28 Tb/s (32 λ x 40 Gb/s), IBM/Columbia]
• Germanium on SOI
• Silicon on Insulator (to 3.6 μm)
• Silicon on Sapphire (to 5.6 μm)
• Silicon on Nitride (to 6.7 μm)
Si Wire/Waveguide
• Silicon is transparent above 1100 nm
• Nearly all optical data links operate in the near-infrared wavelength range between 800 nm and 1600 nm
• We operate at 1310 nm (industry standard)
• SOI wafers cost about 10 times as much as conventional wafers
Transmission over Si Wire/Waveguide
Snell’s Law of Refraction:
$\frac{\sin\theta_1}{\sin\theta_2} = \frac{n_2}{n_1} = \frac{v_1}{v_2}$
[Figure: incident ray (angle θ1), reflected ray (angle θ1) and refracted ray (angle θ2) at the interface between media of index n1 and n2, shown for n2 > n1 and for n2 < n1.]
Total internal reflection in Si Wire/Waveguide
[Figure: incident ray at angle θ1 in medium n1, reflected ray at angle θ1, and refracted ray at angle θ2 in medium n2, with n2 < n1.]
Let $\theta_2 = \pi/2$. Then
$\sin\theta_1 = \sin\theta_c = \frac{n_2}{n_1}$, so $\theta_c = \sin^{-1}\left(\frac{n_2}{n_1}\right)$.
For $\theta_1 > \theta_c$, the light ray is completely reflected.
→ Total internal reflection
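The critical-angle condition can be checked numerically. A minimal sketch, assuming approximate near-infrared refractive indices of 3.48 for a silicon core and 1.44 for a silicon-dioxide cladding; these values and the function names are assumptions for illustration, not taken from the slides:

```python
import math

def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle from Snell's law with the refracted angle set to 90 degrees."""
    if n_cladding >= n_core:
        raise ValueError("total internal reflection requires n_cladding < n_core")
    return math.degrees(math.asin(n_cladding / n_core))

def is_totally_reflected(theta1_deg: float, n_core: float, n_cladding: float) -> bool:
    """True when the incidence angle exceeds the critical angle."""
    return theta1_deg > critical_angle_deg(n_core, n_cladding)

# Assumed, approximate indices near 1310 nm (not from the slides).
N_SI = 3.48
N_SIO2 = 1.44

theta_c = critical_angle_deg(N_SI, N_SIO2)
print(f"Critical angle: {theta_c:.1f} degrees")
print(is_totally_reflected(45.0, N_SI, N_SIO2))  # steep ray stays in the core
print(is_totally_reflected(10.0, N_SI, N_SIO2))  # shallow ray partly escapes
```

A large index contrast gives a small critical angle, so light stays confined over a wide range of incidence angles.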
Total internal reflection in Si Wire/Waveguide
[Figure: a core of index ncore between two cladding layers of index ncladding, with ncladding < ncore; the incident ray is reflected at the core/cladding interface. Image from Wikipedia.]
Total internal reflection keeps all optical energy within the core, even if the fiber bends.
Si-Photonics building blocks
Modulator
[Figure: Mach-Zehnder Interferometer (MZI). Source: Intel Lab.]
• Enables high-speed conversion from electrical (E) to optical (O) signals
• Encodes data on a single wavelength channel that is combined with other signals through WDM
• Microrings (MRs) are used for modulation because of their high modulation speed (10~20 Gbps), low power (47 fJ/bit) and small footprint (µm²)
Si-Photonics building blocks
Photodetectors
• The same microring used for modulation can serve as a wavelength-selective filter (photodetector) that extracts light from the waveguide, if the microring is doped with a photo-detecting material such as CMOS-compatible germanium.
• The resonant light is absorbed by the germanium and converted into an electrical signal.
What is needed for on-chip Si-Photonics interconnects?
There is still a problem of scaling!
Processor is scaling to many-core
• Processors are trending toward many-core architectures with a growing number of cores → they require an increasingly efficient, low-power communication infrastructure to achieve the desired level of bandwidth and connectivity.
• Si-photonic NoCs provide an effective solution to the power and bandwidth limitations of the existing electronic NoCs (E-NoCs) used within CMPs.
Bandwidth, pin count and power scaling
[Figure: bandwidth, pin count and power scaling, assuming 1 Byte/Flop and 8 Flops/core @ 5 GHz]
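The stated figure parameters imply a memory bandwidth demand that grows linearly with the core count. A minimal sketch, assuming "8 Flops/core" means 8 floating-point operations per core per cycle; the 256-core chip and the function name are hypothetical:

```python
def required_bandwidth_gbytes(cores: int,
                              flops_per_cycle: int = 8,
                              clock_ghz: float = 5.0,
                              bytes_per_flop: float = 1.0) -> float:
    """Memory bandwidth (GB/s) needed to feed every core at the stated byte-per-flop ratio."""
    gflops = cores * flops_per_cycle * clock_ghz
    return gflops * bytes_per_flop

per_core = required_bandwidth_gbytes(1)      # one core
many_core = required_bandwidth_gbytes(256)   # hypothetical 256-core chip

print(f"{per_core:.0f} GB/s per core, {many_core / 1000:.2f} TB/s for 256 cores")
```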
What is needed for on-chip Si-Photonics interconnects?
Critical Specs
• Size
• Bandwidth
• Power consumption
• Switching speed
• Insertion loss
• Differential loss
• Crosstalk
Si Photonics on-chip communication
[Figure: 16 cores (C) and four shared caches (Shared $) connected through a switch (X) with a switch controller.]
Merit #1: High bandwidth
• Bandwidth scales easily via WDM/DWDM (electronics scales only via bus width)
Si Photonics on-chip communication
Merit #2: Low power consumption
Si Photonics on-chip communication
Merit #3: High switching speed
• The goal is not to communicate as fast as possible, but as fast as needed, depending on the application (speed of light: 299,792 km/s)
Si Photonics on-chip communication
Merit #3: High switching speed
• The goal is not to communicate as fast as possible, but as fast as needed, depending on the application (Normal or Burst types).
Landscape of SiP on-Chip networks (PNoC)
[Figure: PNoC topologies. Mesh: [Shacham’07], [Petracca’08]. Crossbar and Clos: [Joshi’09a], [Pan’09].]
[1-21]
The basic PNoC building block
[Figure: 2x2 switch in the BAR state (in1 → out1, in2 → out2) and in the CROSS state (in1 → out2, in2 → out1).]
2x2 switch
• BAR state switch: data passes straight through
• CROSS state switch: data passes to the opposite port
• Typical wavelength range: 1260 ~ 1360 or 1510 ~ 1610 nm (mechanical switch)
Problems:
• Lack of processing at bit level in the optical domain
• Lack of efficient buffering in the optical domain
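The two switch states can be captured in a few lines. A minimal behavioural sketch; the class and function names are illustrative, and the model ignores loss, wavelength and timing:

```python
from enum import Enum

class SwitchState(Enum):
    BAR = "bar"      # in1 -> out1, in2 -> out2
    CROSS = "cross"  # in1 -> out2, in2 -> out1

def route_2x2(state: SwitchState, in1, in2):
    """Return the pair (out1, out2) for the given switch state.

    The switch only steers signals: nothing is buffered or
    inspected, matching the two problems listed above.
    """
    if state is SwitchState.BAR:
        return in1, in2
    return in2, in1

print(route_2x2(SwitchState.BAR, "A", "B"))
print(route_2x2(SwitchState.CROSS, "A", "B"))
```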
The basic PNoC building block
• Simply cascading 2x2 switches is inefficient and increases losses.
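Losses accumulate because insertion loss in decibels adds up along the path. A minimal sketch, assuming a hypothetical insertion loss of 0.5 dB per switch (this value and the function names are illustrative, not measured):

```python
def cascaded_loss_db(stages: int, loss_per_stage_db: float) -> float:
    """Total insertion loss of a chain of switches; losses in dB add."""
    return stages * loss_per_stage_db

def remaining_power_fraction(loss_db: float) -> float:
    """Fraction of the input optical power left after the given loss."""
    return 10 ** (-loss_db / 10.0)

# Hypothetical value: 0.5 dB per 2x2 switch.
LOSS_PER_SWITCH_DB = 0.5

for stages in (1, 4, 8, 16):
    total = cascaded_loss_db(stages, LOSS_PER_SWITCH_DB)
    print(f"{stages:2d} switches: {total:4.1f} dB, "
          f"{remaining_power_fraction(total):.1%} of the power remains")
```

Under this assumption, eight cascaded switches already leave less than half of the injected power.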
Agenda
• Motivation
• Optical Interconnect Prospects
• PHENIC Si-Photonics Network-on-Chip
• Technology Challenges
• Conclusion
PHENIC: Hybrid Si-Photonic NoC
via size < ~2 μm
Benefits
• Higher integration
• Shorter interconnect (important for Short message mode)
• Heterogeneous integration
• Reliability
• Short message mode & Large/Burst mode
Routing in Hybrid Si-Photonic NoC
[Figure: source (S) and destination (D) nodes in the network]
Routing in Hybrid Si-Photonic NoC
1. Reserve the path
2. ACK
3. Transmit data on the photonic layer
4. Release (tear-down)
[Figure: the four steps between source (S) and destination (D)]
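The four steps amount to circuit switching: the electronic layer sets up a path, and the photonic layer carries the data without buffering. A minimal sketch of that sequence; the class, method and node names are hypothetical and do not describe the PHENIC implementation:

```python
class HybridNoCSketch:
    """Toy model: electronic control reserves links, photonic layer carries data."""

    def __init__(self):
        self.reserved = set()  # directed links currently held by a transfer
        self.log = []

    @staticmethod
    def _links(path):
        # A path S -> R1 -> D uses the links (S, R1) and (R1, D).
        return set(zip(path, path[1:]))

    def reserve(self, path) -> bool:
        links = self._links(path)
        if links & self.reserved:
            return False  # some link is busy; no optical buffering is possible
        self.reserved |= links
        return True

    def release(self, path) -> None:
        self.reserved -= self._links(path)

    def send(self, path, payload) -> bool:
        if not self.reserve(path):                 # 1. reserve the path
            return False
        self.log.append("RESERVE")
        self.log.append("ACK")                     # 2. acknowledgement to the source
        self.log.append(f"DATA {payload} via {'->'.join(path)}")  # 3. photonic transfer
        self.release(path)                         # 4. release (tear-down)
        self.log.append("RELEASE")
        return True

noc = HybridNoCSketch()
ok = noc.send(["S", "R1", "D"], "burst")
print(ok, noc.log)
```

Reserving before transmitting is what lets the photonic layer work without optical buffers: a transfer starts only when every link on its path is free.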
Electrical router and control
OASIS-RV1 chip layout (45 nm CMOS process, 222.387 uW, 557 pins).
Major tasks
• Photonic route computation (path setting)
• Route computation for short messages on the electronic layer (network)
• Other control tasks for the photonic switch on the photonic layer (network)
Photonic wavelength switch
Major tasks
• Photonic data transmission
• Optical data cannot be stored (no optical buffers!)
• No computation performed
Bandwidth, power and latency
Agenda
• Motivation
• Optical Interconnect Prospects
• Case Study: PHENIC Si-Photonics Network-on-Chip
• Technology Challenges
• Conclusion
Electronics integration
• Intel 4004 (1971): 2,300 transistors, thousands of multiplications per second, 10 μm PMOS
• Intel Core i7 (2011): a billion transistors, billions of multiplications per second, 32 nm CMOS
Photonics integration
[Figure: photonics integration milestones: 1st semiconductor laser (~1962); single-channel transmitter; Luxtera’s CMOS 4x10 Gb/s WDM die (2007); Intel’s 50 Gb/s (4x12.5 Gb/s) transceiver (2012); CMOS sensor array.]
Challenges
• Wafer-scale fabrication is difficult
• Si does not support some functions
• Improvement of cost, space, power and reliability is needed
E-O-E Transceivers (Tx/Rx) – Multilayer option
[Figure: multi-chip option]
Challenges
▸ Single photonics platform (wafer-scale fabrication)
▸ Efficient E/O and O/E conversion
▸ CMOS-driven components
E-O-E Transceivers (Tx/Rx) – Si Photonics option
[Figure: Si-photonics option with lasers, modulators, MUX, detectors, PLC and optical I/Os]
Features
• Small photonics component footprint
• CMOS-compatible fabrication processes
• 3D connectivity to CMOS wafers for improved O-E performance
Compact on-chip optical wires/waveguides
• Requirements
– Performance -> loss ~1 dB/cm
– High density -> bending radius ~1 μm
• Challenges
– Achieve low loss despite Si sidewall imperfections
– Realize efficient I/O (fiber) coupling despite the large mode mismatch
Reliability Challenges & Vision
• Architecture techniques: macro solutions and micro solutions, e.g. redundant active/passive components (cores, routers, etc.), PBC, ECC
• Circuit techniques: cell creation, comp. param. reconfiguration
• Process techniques: state-of-the-art processes
• Moore’s law: the bit count increases exponentially, 2x every 2 years
Transient, intermittent, and permanent errors/faults are reliability challenges.
Agenda
• Motivation
• Optical Interconnect Prospects
• Case Study: PHENIC Si-Photonics Network-on-Chip
• Technology Challenges
• Conclusion
Concluding remarks
• Computer system interconnects are very complex micro-communication components
• Most important metrics
– Bandwidth density
– Energy efficiency
• A Si-photonics design approach can improve system throughput by 15-20x
• Many issues should be carefully handled
– Optimize network design (electrical switching, optical transport)
– Optimize physical mapping (layout) for low optical insertion loss
References
1. Achraf Ben Ahmed, A. Ben Abdallah, PHENIC: Towards Photonic 3D-Network-on-Chip Architecture for High-throughput Many-core Systems-on-Chip, IEEE Proceedings of the 14th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA'2013), Dec. 2013.
2. A. Ben Abdallah, PHENIC: Silicon Photonic 3D-Network-on-Chip Architecture for High-performance Heterogeneous Many-core System-on-Chip, Technical Report, Ref. PTR0901A0715-2013, September 1, 2013.
3. OASIS 3D-Router Hardware Physical Design, Technical Report, Adaptive Systems Laboratory, Division of Computer Engineering, University of Aizu, July 8, 2014.
4. Akram Ben Ahmed, A. Ben Abdallah, Graceful Deadlock-Free Fault-Tolerant Routing Algorithm for 3D Network-on-Chip Architectures, Journal of Parallel and Distributed Computing, 2014.
5. Akram Ben Ahmed, Achraf Ben Ahmed, A. Ben Abdallah, Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures, IEEE Proceedings of the 7th International Symposium on Embedded Multicore/Many-core SoCs (MCSoC-13), 2013.
6. Akram Ben Ahmed, A. Ben Abdallah, Architecture and Design of High-throughput, Low-latency and Fault Tolerant Routing Algorithm for 3D-Network-on-Chip, The Journal of Supercomputing, December 2013, Volume 66, Issue 3, pp. 1507-1532.
7. Akram Ben Ahmed, T. Ouchi, S. Miura, A. Ben Abdallah, Run-Time Monitoring Mechanism for Efficient Design of Application-specific NoC Architectures in Multi/Manycore Era, IEEE Proc. of the 6th International Workshop on Engineering Parallel and Multicore Systems (ePaMuS2013), July 2013.
8. Akram Ben Ahmed, T. Ouchi, S. Miura, A. Ben Abdallah, Run-Time Monitoring Mechanism for Efficient Design of Application-specific NoC Architectures in Multi/Manycore Era, Proc. IEEE 6th International Workshop on Engineering Parallel and Multicore Systems (ePaMuS2013), July 2013.
9. Akram Ben Ahmed, A. Ben Abdallah, Low-overhead Routing Algorithm for 3D Network-on-Chip, IEEE Proc. of the Third International Conference on Networking and Computing (ICNC'12), pp. 23-32, 2012.
10. Akram Ben Ahmed, A. Ben Abdallah, LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture, IEEE Proceedings of the 6th International Symposium on Embedded Multicore SoCs (MCSoC-12), pp. 167-174, 2012.
11. Akram Ben Ahmed, A. Ben Abdallah, ONoC-SPL Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications, IEEE Proceedings of the 4th International Conference on Awareness Science and Technology, pp. 257-262, 2012.
12. Kenichi Mori, A. Ben Abdallah, OASIS Network-on-Chip Prototyping on FPGA, Master's Thesis, The University of Aizu, Feb. 2012.
13. Ben Ahmed Akram, A. Ben Abdallah, On the Design of a 3D Network-on-Chip for Many-core SoC, Master's Thesis, The University of Aizu, Feb. 2012.
14. Shohei Miura, A. Ben Abdallah, Design of Parametrizable Network-on-Chip, Master's Thesis, The University of Aizu, Feb. 2012.
15. Ryuya Okada, A. Ben Abdallah, Architecture and Design of Core Network Interface for Distributed Routing in OASIS NoC, Graduation Thesis, The University of Aizu, Feb. 2012.
16. A. Ben Ahmed, A. Ben Abdallah, K. Kuroda, Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multicore SoC, IEEE Proc. of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA-2010), pp. 67-73, Nov. 2010. (best paper award)
17. Kenichi Mori, A. Ben Abdallah, OASIS Network-on-Chip Prototyping on FPGA, Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012.
18. K. Mori, A. Esch, A. Ben Abdallah, K. Kuroda, Advanced Design Issues for OASIS Network-on-Chip Architecture, IEEE Proc. of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA-2010), pp. 74-79, Nov. 2010.
19. T. Uesaka, OASIS NoC Topology Optimization with Short-Path Link, Technical Report, Systems Architecture Group, March 2011.
20. K. Mori, A. Ben Abdallah, OASIS NoC Architecture Design in Verilog HDL, Technical Report, TR-062010-OASIS, Adaptive Systems Laboratory, The University of Aizu, June 2010.
21. Shohei Miura, Abderazek Ben Abdallah, Kenichi Kuroda, PNoC: Design and Preliminary Evaluation of a Parameterizable NoC for MCSoC Generation and Design Space Exploration, The 19th Intelligent System Symposium (FAN 2009), pp. 314-317, Sep. 2009.
22. Kenichi Mori, Abderazek Ben Abdallah, Kenichi Kuroda, Design and Evaluation of a Complexity Effective Network-on-Chip Architecture on FPGA, The 19th Intelligent System Symposium (FAN 2009), pp. 318-321, Sep. 2009.
23. A. Ben Abdallah, T. Yoshinaga and M. Sowa, Mathematical Model for Multiobjective Synthesis of NoC Architectures, IEEE Proc. of the 36th International Conference on Parallel Processing, Sept. 4-8, 2007.
References
Multicore Systems-on-Chip: Practical Hardware/Software Design Issues
Hardcover, August 6, 2010
University of Aizu
Thank you.