On the Design of CMOS Vision Systems on Chip

Transcription

On the Design of CMOS Vision Systems on Chip
On the Design of
CMOS
Vision Systems on Chip
Angel Rodríguez-Vázquez
IEEE CAS Workshop 2008
Rio de Janeiro, August 27 - 2008
© ANAFOCUS 2007
1
Outline of the Talk
•
Some Basic Concepts:
◊
◊
◊
•
Architectures for Vision Systems
◊
◊
◊
•
Progressive Distributed Processing
Bio--Inspired Vision System Architectures
Bio
Shifting the Analog
Analog--toto-Digital Border
Topographical
T
hi l S
Sensor-Processors
SensorP
◊
◊
◊
•
Concept of Vision Systems and VSoCs
Conventional Vision System Architecture
Rationale
i
l for
f
Using
i
CMOS
C OS
Concept of Concurrent Sensory
Sensory--Processing
Multi--Functional Pixel
Multi
R ti
Rationale
l ffor Analog
A l
Pre
PreP -Processing
P
i
The EyeEye-RIS Vision Systems
◊
◊
◊
Architecture
The System in Operation
Some Prospects for Future Developments
2
Some Basic Concepts
Concept
p of Vision Systems
y
Conventional MV Architectures
Rationale for CMOS Implementation
© ANAFOCUS 2008
3
Concept and Ingredients of Vision
4
Interpretation and Decision
Image
g Processing
g
Image Acquisition
© ANAFOCUS 2008
4
The Concept of CMOS Vision System on-Chip
5
All functions for smart imaging and vision systems on chip
Memory
I
Image
Sensors
S
ADC
Single-Chip
Solution
Data Converters
Multi-Function
System
Digital Signal Processors
© ANAFOCUS 2008
5
Illustrating the Challenge of Autonomy
6
Systems capable to make autonomous
decisions for vision-based guidance
1
2
Voltage regulator
board
Motor controller
board
3
Battery
Image acquisition | Low-pass
filtering | tresholding |
skeletonization |morphological
filtering | signal recognition |
road-lanes recognition |
actuation on the robot wheels.
4
© ANAFOCUS 2008
6
Illustrating the Challenge of Speed
7
Systems to close the Perception-to-Action Loop
at Thousands Images-per-Second
Speed ~ 40 m /s
Up to 3,000 frames per second
Laser system
Illumination system
Image acquisition | Low-pass
filtering | activity detection |
motion estimation | object
tracking | Loop control |
position prediction |
coordinates translation |
actuation on the laser system.
© ANAFOCUS 2008
7
Goals of Artificial Vision Systems
8
Perceive lightness and colour under various illuminations
Detect intensity changes and perform 2-D segmentation
Infer 3-D structures from stereo or motion images
Organize surface and regions into objects of interest
G
Generate
t description
d
i ti
off objects
bj t and
d recognize
i
th
them
among
a potentially large class of objects
Make non-visual inferences about the scene based on
visual processing (abstraction)
[H.R. Myler, Fundamentals of Machine Vision]
© ANAFOCUS 2008
8
The Vision Processing Chain 9
Huge amount
Early Stages
of
data
at
RGB 8
8-bit
bit coded @ 512 x 512
implies 200Mbit/s data rate
Most data are useless
© ANAFOCUS 2008
9
Illustrating the Computational Demands
10
Per-Pixel
Per
Pixel 3x3 Linear Convolution
9 Products
+
If the image is
NxM
8 Sums
17xNxM Oper.
17 Operations
If the algorithm
contains
Ns
convs
convs.
17x NsxNxM Oper.
At F Frames per
second
17xFx NsxNxM OPS
10 convolutions on 8-bit gray-scale VGA images at 100 FPS requires
5 GOPS
Memory operations must be accounted for as well
[Original from Gustavo Liñán]
© ANAFOCUS 2008
10
Conventional Vision System Architecture11
Conventional systems are heterogeneous
Sensors, Memory, Processors,
Communications
Involve multiple technologies
control signals
F
F
F
ADC
F
RAM
Data Flow
CMOS / CCD sensor
High performance A/D Converters
High-performance
Advanced DSP & High Density Memory
Image Acquisition
Image Coding
Image Processing and Storage
© ANAFOCUS 2008
11
Troubles of Conventional MV Architectures12
Input Image
Physical separation:
S
Sensors
Camera
Processors
H t
Heterogeneous
t h l i
technologies
BOTTLENECK
BOTTLENECK
CCD or CMOS for sensing
CMOS: FPGA,
FPGA Power-PC,
Power PC
DSP, etc for processing
Bottlenecks
stages
DSP
Memory
at
different
Image coding
Image transmission
t ansmission
Image processing
DSP
DSP
© ANAFOCUS 2008
12
Rationale for CMOS
13
Photo-transduction
Photo
transduction Mechanisms
n
p
e-
h+
e- + -h h eh e eh+
p-
+ +
e-
h+
h+
e-
Iph
Si
Incident photons create electronhole pairs
The absorption length
upon the wavelength
depends
An electrical field separate the holeelectron pairs
© ANAFOCUS 2008
13
Rationale for CMOS
14
Pinned Photodiodes
VDD
F t
Features:
FD
TG
p+ n p- structure
RST
Low-dark current
VDD
SEL
P+
n+
n+
n
Operation similar to photogate
Decrease reset noise (Correlated
Double Sampling)
Better response to blue
P-
During integration phase, photo-generated majority carriers are stored in the
depletion
p
region.
g
(Device
(
is fully
y depleted)
p
)
© ANAFOCUS 2008
14
The Concept of CMOS Camera on Chip
15
Fossum´s
Fossum
s Camera on Chip
© ANAFOCUS 2008
15
Architectures for Sensory-Processing
Progressive
g
Distributed Processing
g
Bio-Inspired Vision Architectures
Shifting the Analog
Analog-to-Digital
to Digital Border
© ANAFOCUS 2008
16
Conceptual Architectural Choices
17
Conventional Architecture
Sensors
+
Analog Signal Conditioning
A/D Conversion
Digital
☺ Large
spatial
resolution
limited only by pixel SNR
☺ Large
circuit
operation
predictability and robustness
analog only at the ADCs
☺ Small
spurious
signal
interactions
☺ Large flexibility and programmability:
mostly digital circuits and
codes
Small computational power
and efficiency
Large memory requirements
Data transfer bottlenecks
© ANAFOCUS 2008
17
Conceptual Architectural Choices
18
Linear Array of Digital Processors
Sensors
+
Analog Signal Conditioning
A/D Conversion
Distributed Digital: LAP
Digital
Smaller
spatial
resolution:
trade-off with processing
☺ Large
circuit
operation
predictability and robustness:
analog only at the ADCs
☺ Small
spurious
signal
interactions
☺ Large
flexibility
and
pro
programmability:
mostly digital circuits and
codes
Larger computational power
and efficiency
S
Smaller
ll
memory requirements
i
Data transfer bottlenecks
© ANAFOCUS 2008
18
Conceptual Architectural Choices
19
A True LAP VSoC for Machine Vision
Pinned photodiode pixel
High-speed rolling and global electronic
shutter with programmable exposure time.
Programmable controls: gain, offset, frame
rate and frame size (RoI)
100MHz clock frequency
Per-column readout p
path p
permitting
g up
p to
500fps readout speeds
Multiple
I/Os
communications
and
high-speed
© ANAFOCUS 2008
19
Conceptual Architectural Choices
20
Linear Array of Analog
Analog-Digital
Digital Processors
Sensors
+
Analog Signal Conditioning
Distributed Analog
+
A/D Conversion
Distributed Digital: LAP
Digital
Smaller
spatial
resolution:
trade-off with processing
Smaller
circuit
operation
predictability and robustness:
analog also for processing
Larger
spurious
signal
interactions
Smaller flexibility and propro
grammability:
both analog and digital
circuits
Larger computational power
and efficiency
S
Smaller
ll
memory requirements
i
Smaller transfer bottlenecks
© ANAFOCUS 2008
20
21
How Does The Retina Work ?
Sensors
and
processors
are
merged
d
Control
Processing
and
sensing
i
are
simultaneous
Actuate
Si ifi
Significant
t data
d t
compression
is
achieved
© ANAFOCUS 2008
21
How Does Retina Work ? 22
[Roska & Werblin 2001]
© ANAFOCUS 2008
22
Shifting the Analogue-Digital Border 23
Sensors
+
Signal Coditioning
Sensors
A/D Conversion
C +
i
Mixed-Signal Cond. + Proc.
Digital
Digital
© ANAFOCUS 2008
23
Using the Bio-Inspiration Concept
24
control signals
F
F
F
F
ADC
RAM
Data Flow
CMOS / CCD sensor
High-performance A/D Converters
Advanced DSP & High Density Memory
Image Acquisition
Image Coding
Image Processing and Storage
control signals
f << F
F
f
ADC
Data Flow
f
RAM
F l Pl
Focal-Plane
Sensor-Processor
S
P
L P fil A/D C
Low-Profile
Converters
t
Si l DSP & LLow Memory
Simple
M
Image Acquisition & Pre-processing
Pre-processed Image Conversion
Image Post-Processing and Storage
© ANAFOCUS 2008
24
Topographic Sensor-Processors
C
Concept
t off Concurrent
C
t Sensory-Processing
S
P
i
Multi-Functional Pixels
Rationale for Analog Pre-Processing
© ANAFOCUS 2008
25
Concept of Concurrent Sensor-Processing
26
state/output
input
A
B
z
© ANAFOCUS 2008
26
Multiple Signal Representations
27
© ANAFOCUS 2008
27
Basic Spatial Interactions
28
CNN Single Layer Interactions
© ANAFOCUS 2008
28
Spatio-Temporal Interactions
29
CNN Multi
Multi-Layer
Layer Interactions
Layer 1
Inputs
b1u1 + z1
b2u2 + z2
Layer 2
τ2
A11y1
a12y1
x1,ij
f()
y1,ij
g( )
a12y1
Feedback
Layer 2
A22y2
y1 y2
Layer 3
τ3 <<τ1,τ 2
τ1 ∫
b1u1,ij + z1
Layer 1
τ1
∑
Inter-Layer
Coupling
p g
1
w1y1 + w2y2
Outputs
∑
b2 u 2 ,ij + z 2
1
τ2
∫
x2 ,ij
f()
y 2 ,ij
g( )
© ANAFOCUS 2008
29
Programming Reaction-Diffusion Equations
30
Multi-Layer
Multi
Layer Interactions
d
φi ( x, y, t ) − ci∇2φi ( x, y, t ) = α iφi ( x, y, t ) + βiφi ( x, y, t0 ) + γ ijφ j ( x, y, t )
dt
diffusion
reaction
Wave propagation in active media
Layer 1 (slow)
[Perona & Malik 1990]
Layer 2 (fast)
© ANAFOCUS 2008
30
Wave Generation
31
Multi-Layer
Multi
Layer Interactions
Traveling Waves
1 1 1
A1 = 1 1.5 1
1 1 1
0.6 0.8 0.6
A 2 = 0.8 − 0.6 0.8
0.6 0.8 0.6
a 21 = 4.5
a12 = −4.2
b1 =0 b2 =5
z1 = 0.6
z 2 = − 2 .7
τ 1 : τ 2 = 16 : 1
Layer 1 (slow)
Layer 2 (fast)
Spiral Wave
0.6 0.8 0.6
0.6 0.8 0.6


A1 = 0.8 −0.3 0.8 A2 = 0.8 −0.9 0.8
0.6 0.8 0.6
0.6 0.8 0.6
a21 = 6 a12 = −3
b1 = 0.0 b2 = 3.9
z1 = −4.5 z2 = −3.6
Only fast layer
τ1 : τ2 =8:1
© ANAFOCUS 2008
31
Retina Emulation
32
Multi-Layer
Multi
Layer Interactions
Long Distance Inhibition in the IPL
1 1 1
A1 = 1 1.5 1
1 1 1
0.6 0.8 0.6
A 2 = 0.8 − 0.6 0.8
0.6 0.8 0.6
a 21 = 3 .3 a12 = − 5 .4
b1 =0 b2 =3
z1 = 0 .6
z 2 = − 2 .7
τ 1 : τ 2 = 16 : 1
Layer
y 1 (slow)
(
)
Layer 2 (fast)
Spatial-Temporal Detection of Edges in the OPL
0.5 0.8 0.5
0 0 0


A1 = 0.8 −3.6 0.8 A2 = 0 0 0
0 0 0
0.5 0.8 0.5
a21 = 4.5 a12 = −5.7
b1 = 0.0 b2 = 3.0
z1 = −9 z2 = 0
Only fast layer
τ1 : τ2 = 6:1
© ANAFOCUS 2008
32
Conceptual Architectural Choices
33
Topographic ADCs + Digital Processing
Sensors
S
+
ADCs
+
Digital
+
Memory
Smaller
spatial
resolution:
trade-off with processing
Larger
system
operation
predictability and robustness:
analog only for ADCs
Smaller
spurious
interactions
Larger flexibility
grammability:
signal
and
pro-
only digital processing
Distributed Digital: LAP
Digital
Functionality and efficiency
compromised by ADC accuracy
Large
in-pixel
requirements
memory
© ANAFOCUS 2008
33
Conceptual Architectural Choices
34
Topographic Mixed
Mixed-Signal
Signal Processing
Sensors
+
Analog
+
Digital
+
M
Memory
A/D Conversion
Distributed Digital: LAP
Digital
Low
spatial
resolution:
trade-off with processing
Involved
circuit
operation
predictability and robustness:
analog also for processing
Larger
spurious
signal
interactions
Involved
flexibility
and
programability:
both analog and digital
circuits
☺ Large computational power
and efficiency
☺ Small
S
ll memory requirements
i
☺ Small transfer bottlenecks
© ANAFOCUS 2008
34
Rationale for CMOS
35
The Functional Power of MOST
© ANAFOCUS 2008
35
The Accuracy Requirements
36
Visual Inspection of Results
Adding random noise (σ=4LSB)
4LSB) to: 1) Input Image;
2) Coefficients of the convolution kernel at every
position; 3) Output Image
[Original from Gustavo Liñán]
© ANAFOCUS 2008
36
The Accuracy Requirements
37
Using Quality Assessment Methods
0.9087
[Original from Gustavo Liñán]
0.4815
0.4537
Z. Wang, et al., "Image quality assessment: From error visibility to structural similarity," IEEE
Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
© ANAFOCUS 2008
37
Multi-Functional Pixel Example
38
© ANAFOCUS 2008
38
The Resolution Compromise
39
Size of Sensory Pixel
Size of Sensory-Processing Pixel
How Important is Resolution ?
© ANAFOCUS 2008
39
How Important is Resolution ?
512 x 512
128 x 128
Vision is possible with 25 x 25
“pixels” (within limited field of
view)
)
Text can be read at 200
words/min (300 words/min
with normal vision)
St d t can navigate
Students
i t in
i
complex environments (maze)
with confidence
64 x 64
40
32 x 32
Resolution can be increased
through proper design
VGA and up to 1.3Mpixels
resolution for Surveillance
> 1Mpixels for Machine Vision
and Automotive
© ANAFOCUS 2008
40
Q-Eye in Operation
41
© ANAFOCUS 2008
41
The Eye-RIS system
Architecture
The System in Operation
© ANAFOCUS 2008
42
Multi-Functional Pixel Concept
43
© ANAFOCUS 2008
43
The Eye-RIS™ System
44
Configurable system which provides an efficient solution
for low-to-medium resolution and high frame rates
applications
li ti
Mixed-signal focalplane early processor
Data - Memory
(SRAM)
32bit RISC
Microprocessor
Program - Memory
(SRAM)
Ethernet / PCI /
USB 2,0
High-speed Bus
Multi channel
Multi-channel
DMA controller
Timing, PLL,
General I/O, …
Bus Bridge
Off-chip memory
Controller
Co
to e
Low-speed Bus
Ultra-high frame rate: up to 10,000fps@QCIF image resolution
Ultra-low power consumption: < 10mW@30fps@QCIF image resolution
© ANAFOCUS 2008
44
The Q-Eye Chip
45
UMC 0.18µm CMOS 1P5M (1.8V/3.3V) - mixed-signal.
High-performance Smart Image Sensor
176 x 144 grey-scale pixels with 29.1µm pitch
High-speed non-rolling electronic shutter.
Programmable exposure time (controlling
(
stepdown to 20ns)
Approximated sensitivity of 3V/lux-sec at 550nm
4 +1 (two banks) high
high-retention
retention analog and 4 binary
memories
Analog multiplexer for image shifting and analog
MAC unit
Programmable,3 x 3 neighbourhood pattern
matching with 1/0/d.n.c.pattern definition (fast
morphological functions)
On-chip bank of high-speed ADCs and DACs.
Multiple I/Os and high-speed communication ports
© ANAFOCUS 2008
45
Q-Eye Pixel
46
Generic analog
voltages
Direct address event
block
ADCs
LAMs
MAC
176 x 144 cell array
Grey optical
module
R-G-B
R
G B optical
modules
Morphological
operator
Buffers - DACs
Neigbourhood
multiplexer
Binary memories
Control unit + memory
LLU
Resistive ggrid
© ANAFOCUS 2008
46
The Eye-RIS™ v1.2
47
Main Features
Q-Eye focal-plane processor
▪
▪
▪
▪
176 x 144 spatial resolution
Monochrome
M
h
image
i
sensor with
ith 3.2V/(lux·sec)
3 2V/(l
) sensitivity
iti it
Maximum frame rate (sensing + processing) of over 10,000fps
Advanced pixel architecture combining image acquisition, image
processing and storage:
- Multi-mode image sensing, analogue & binary memories, analogue
multiplexor
lti l
ffor iimage shifting,
hifti
analogue
l
MAC unit,
it programmable
bl LUT,
LUT
resistive grid for controllable smoothing…
Digital control & post-processing
▪ ALTERA NIOS-II 32-bit RISC microprocessor
▪ 1.17 DMIPS/MHz performance at 70MHz operation frequency
▪ 32Mb SDRAM for program and image/data storage
Communications
▪ SPI port, UART, 2xPWM ports and GPIOs, USB 2.0, and GigE
▪ JTAG Controller
▪ 1.5W typical power consumption
Application development kit
Eye-RIS™ v1.2 vision system
▪ Application Development Kit including: Project builder, C compiler,
assembler,, linker,, and source-code debugger.
gg
▪ Image-processing library including basic routines such as: point-topoint operations, spatial filtering operations, morphological
operations and statistical operations
© ANAFOCUS 2008
47
The First True Bio-inspired VSoC: Eye-RIS™ v2.1
48
© ANAFOCUS 2008
48
The First True Bio-inspired VSoC: Eye-RIS™ v2.1
49
Main Features
Main Features
Q-Eye
Q
y focal-plane
p
p
processor
▪
▪
▪
▪
176 x 144 spatial resolution
Monochrome image sensor with 3.2V/(lux·sec) sensitivity
Maximum frame rate (sensing + processing) of over 10,000fps
Advanced pixel architecture combining image acquisition, image
processing and storage:
Q-Eye Focal-plane processor
- Multi-mode image sensing, analogue & binary memories, analogue
multiplexor for image shifting, analogue MAC unit, programmable LUT,
resistive grid for controllable smoothing…
Digital control & post-processing
▪ ALTERA NIOS-II
NIOS II 32-bit
32 bit RISC microprocessor
i
▪ 1.17 DMIPS/MHz performance at 100MHz operation frequency
Memory
NIOS-II uP
Memory
Communications
▪ SPI port, UART, 2xPWM ports and GPIOs, USB 2.0 interface, and
▪ JTAG C
Controller
t ll
▪ <700mW typical power consumption
Application development kit
▪ Application Development Kit including: Project builder, C compiler,
assembler linker,
assembler,
linker and source-code
source code debugger.
debugger
▪ Image-processing library including basic routines such as: point-topoint operations, spatial filtering operations, morphological
operations and statistical operations
© ANAFOCUS 2008
49
Processor Comparison
50
Benchmark: Each of the processors calculated 20 convolutions, 2
diffusions,, 3 means,, and 40 morphology
p
gy and 10 g
global ORs. Note:
Only the DSP and the pipe-line multi-core (FPGA) architectures
support trading between resolution and frame-rate
[Original from Csaba Reckeczky]
© ANAFOCUS 2008
50
Processor Comparison
51
© ANAFOCUS 2008
51
Processor Comparison
52
+ Texas Instrument DaVinci video processor (TMS320DM6443)
++ Xilinx Spartan 3A DSP FPGA (XC3SD3400A)
* due to data access
** due to pipe-line operations
*** no multiplication,
multiplication scaling with a few discrete values
**** these data intensive operators slow down significantly when the image does not fit to the
internal memory (typically above 128x128 for a DaVinci, which has 64kByte internal memory)
[Original from Csaba Reckeczky]
© ANAFOCUS 2008
52
3-D Parallel Processing Multi-Layer Architectures
Bonding wire
Sensor layer (320x240)
Mixed
signal
layer
Processor
array
(160x120)
Control bus
Pixel
parallel
53
Bonding wire
AD
DA?
32 bit data bus
(gray value)
Digital
proc.
layer
Digital
memory
layer
Processor array
256 cores
Processor
parallel
memory array
64kB
© ANAFOCUS 2008
53
3-D Integration
Topographic,
layered
sensingprocessing
54
AFRL BB Process
Memory
…
Bump bonding:
Proc
ADC – Proc No1
Sensor
3D stacking:
3D
CNN
CUBE
MITLL Low-Power
FDSOI
DSOI CMOS
C OS Process
54
© ANAFOCUS 2008
54
Visual Prosthesis
55
Retina Coder
+
Electrode Driving
implante
sub-retinal
+
Electrode Array
[K. Wise, D. Kipke,U. Michigan]
Sub-Retinal Implant
p
[Chow et. al 2001]
implante
epi-retinal
Epi-Retinal Implant
[Humayun et. al 1999]
© ANAFOCUS 2008
55
Discussion
VSoCs are still like science fiction toys for industry
Even smart CMOS sensors are not common yet
A real challenge for engineering !!
We can take advantage
g of nature through
g understanding
g
Parallel and concurrent sensorysensory-processing
Multi
Multi--scale representation
p
Signal coding
Close interaction and collaboration between analog and
digital circuitry is needed for efficient VSoC design.
© ANAFOCUS 2007
56