On the Design of CMOS Vision Systems on Chip
Transcription
On the Design of CMOS Vision Systems on Chip
On the Design of CMOS Vision Systems on Chip Angel Rodríguez-Vázquez IEEE CAS Workshop 2008 Rio de Janeiro, August 27 - 2008 © ANAFOCUS 2007 1 Outline of the Talk • Some Basic Concepts: ◊ ◊ ◊ • Architectures for Vision Systems ◊ ◊ ◊ • Progressive Distributed Processing Bio--Inspired Vision System Architectures Bio Shifting the Analog Analog--toto-Digital Border Topographical T hi l S Sensor-Processors SensorP ◊ ◊ ◊ • Concept of Vision Systems and VSoCs Conventional Vision System Architecture Rationale i l for f Using i CMOS C OS Concept of Concurrent Sensory Sensory--Processing Multi--Functional Pixel Multi R ti Rationale l ffor Analog A l Pre PreP -Processing P i The EyeEye-RIS Vision Systems ◊ ◊ ◊ Architecture The System in Operation Some Prospects for Future Developments 2 Some Basic Concepts Concept p of Vision Systems y Conventional MV Architectures Rationale for CMOS Implementation © ANAFOCUS 2008 3 Concept and Ingredients of Vision 4 Interpretation and Decision Image g Processing g Image Acquisition © ANAFOCUS 2008 4 The Concept of CMOS Vision System on-Chip 5 All functions for smart imaging and vision systems on chip Memory I Image Sensors S ADC Single-Chip Solution Data Converters Multi-Function System Digital Signal Processors © ANAFOCUS 2008 5 Illustrating the Challenge of Autonomy 6 Systems capable to make autonomous decisions for vision-based guidance 1 2 Voltage regulator board Motor controller board 3 Battery Image acquisition | Low-pass filtering | tresholding | skeletonization |morphological filtering | signal recognition | road-lanes recognition | actuation on the robot wheels. 4 © ANAFOCUS 2008 6 Illustrating the Challenge of Speed 7 Systems to close the Perception-to-Action Loop at Thousands Images-per-Second Speed ~ 40 m /s Up to 3,000 frames per second Laser system Illumination system Image acquisition | Low-pass filtering | activity detection | motion estimation | object tracking | Loop control | position prediction | coordinates translation | actuation on the laser system. © ANAFOCUS 2008 7 Goals of Artificial Vision Systems 8 Perceive lightness and colour under various illuminations Detect intensity changes and perform 2-D segmentation Infer 3-D structures from stereo or motion images Organize surface and regions into objects of interest G Generate t description d i ti off objects bj t and d recognize i th them among a potentially large class of objects Make non-visual inferences about the scene based on visual processing (abstraction) [H.R. Myler, Fundamentals of Machine Vision] © ANAFOCUS 2008 8 The Vision Processing Chain 9 Huge amount Early Stages of data at RGB 8 8-bit bit coded @ 512 x 512 implies 200Mbit/s data rate Most data are useless © ANAFOCUS 2008 9 Illustrating the Computational Demands 10 Per-Pixel Per Pixel 3x3 Linear Convolution 9 Products + If the image is NxM 8 Sums 17xNxM Oper. 17 Operations If the algorithm contains Ns convs convs. 17x NsxNxM Oper. At F Frames per second 17xFx NsxNxM OPS 10 convolutions on 8-bit gray-scale VGA images at 100 FPS requires 5 GOPS Memory operations must be accounted for as well [Original from Gustavo Liñán] © ANAFOCUS 2008 10 Conventional Vision System Architecture11 Conventional systems are heterogeneous Sensors, Memory, Processors, Communications Involve multiple technologies control signals F F F ADC F RAM Data Flow CMOS / CCD sensor High performance A/D Converters High-performance Advanced DSP & High Density Memory Image Acquisition Image Coding Image Processing and Storage © ANAFOCUS 2008 11 Troubles of Conventional MV Architectures12 Input Image Physical separation: S Sensors Camera Processors H t Heterogeneous t h l i technologies BOTTLENECK BOTTLENECK CCD or CMOS for sensing CMOS: FPGA, FPGA Power-PC, Power PC DSP, etc for processing Bottlenecks stages DSP Memory at different Image coding Image transmission t ansmission Image processing DSP DSP © ANAFOCUS 2008 12 Rationale for CMOS 13 Photo-transduction Photo transduction Mechanisms n p e- h+ e- + -h h eh e eh+ p- + + e- h+ h+ e- Iph Si Incident photons create electronhole pairs The absorption length upon the wavelength depends An electrical field separate the holeelectron pairs © ANAFOCUS 2008 13 Rationale for CMOS 14 Pinned Photodiodes VDD F t Features: FD TG p+ n p- structure RST Low-dark current VDD SEL P+ n+ n+ n Operation similar to photogate Decrease reset noise (Correlated Double Sampling) Better response to blue P- During integration phase, photo-generated majority carriers are stored in the depletion p region. g (Device ( is fully y depleted) p ) © ANAFOCUS 2008 14 The Concept of CMOS Camera on Chip 15 Fossum´s Fossum s Camera on Chip © ANAFOCUS 2008 15 Architectures for Sensory-Processing Progressive g Distributed Processing g Bio-Inspired Vision Architectures Shifting the Analog Analog-to-Digital to Digital Border © ANAFOCUS 2008 16 Conceptual Architectural Choices 17 Conventional Architecture Sensors + Analog Signal Conditioning A/D Conversion Digital ☺ Large spatial resolution limited only by pixel SNR ☺ Large circuit operation predictability and robustness analog only at the ADCs ☺ Small spurious signal interactions ☺ Large flexibility and programmability: mostly digital circuits and codes Small computational power and efficiency Large memory requirements Data transfer bottlenecks © ANAFOCUS 2008 17 Conceptual Architectural Choices 18 Linear Array of Digital Processors Sensors + Analog Signal Conditioning A/D Conversion Distributed Digital: LAP Digital Smaller spatial resolution: trade-off with processing ☺ Large circuit operation predictability and robustness: analog only at the ADCs ☺ Small spurious signal interactions ☺ Large flexibility and pro programmability: mostly digital circuits and codes Larger computational power and efficiency S Smaller ll memory requirements i Data transfer bottlenecks © ANAFOCUS 2008 18 Conceptual Architectural Choices 19 A True LAP VSoC for Machine Vision Pinned photodiode pixel High-speed rolling and global electronic shutter with programmable exposure time. Programmable controls: gain, offset, frame rate and frame size (RoI) 100MHz clock frequency Per-column readout p path p permitting g up p to 500fps readout speeds Multiple I/Os communications and high-speed © ANAFOCUS 2008 19 Conceptual Architectural Choices 20 Linear Array of Analog Analog-Digital Digital Processors Sensors + Analog Signal Conditioning Distributed Analog + A/D Conversion Distributed Digital: LAP Digital Smaller spatial resolution: trade-off with processing Smaller circuit operation predictability and robustness: analog also for processing Larger spurious signal interactions Smaller flexibility and propro grammability: both analog and digital circuits Larger computational power and efficiency S Smaller ll memory requirements i Smaller transfer bottlenecks © ANAFOCUS 2008 20 21 How Does The Retina Work ? Sensors and processors are merged d Control Processing and sensing i are simultaneous Actuate Si ifi Significant t data d t compression is achieved © ANAFOCUS 2008 21 How Does Retina Work ? 22 [Roska & Werblin 2001] © ANAFOCUS 2008 22 Shifting the Analogue-Digital Border 23 Sensors + Signal Coditioning Sensors A/D Conversion C + i Mixed-Signal Cond. + Proc. Digital Digital © ANAFOCUS 2008 23 Using the Bio-Inspiration Concept 24 control signals F F F F ADC RAM Data Flow CMOS / CCD sensor High-performance A/D Converters Advanced DSP & High Density Memory Image Acquisition Image Coding Image Processing and Storage control signals f << F F f ADC Data Flow f RAM F l Pl Focal-Plane Sensor-Processor S P L P fil A/D C Low-Profile Converters t Si l DSP & LLow Memory Simple M Image Acquisition & Pre-processing Pre-processed Image Conversion Image Post-Processing and Storage © ANAFOCUS 2008 24 Topographic Sensor-Processors C Concept t off Concurrent C t Sensory-Processing S P i Multi-Functional Pixels Rationale for Analog Pre-Processing © ANAFOCUS 2008 25 Concept of Concurrent Sensor-Processing 26 state/output input A B z © ANAFOCUS 2008 26 Multiple Signal Representations 27 © ANAFOCUS 2008 27 Basic Spatial Interactions 28 CNN Single Layer Interactions © ANAFOCUS 2008 28 Spatio-Temporal Interactions 29 CNN Multi Multi-Layer Layer Interactions Layer 1 Inputs b1u1 + z1 b2u2 + z2 Layer 2 τ2 A11y1 a12y1 x1,ij f() y1,ij g( ) a12y1 Feedback Layer 2 A22y2 y1 y2 Layer 3 τ3 <<τ1,τ 2 τ1 ∫ b1u1,ij + z1 Layer 1 τ1 ∑ Inter-Layer Coupling p g 1 w1y1 + w2y2 Outputs ∑ b2 u 2 ,ij + z 2 1 τ2 ∫ x2 ,ij f() y 2 ,ij g( ) © ANAFOCUS 2008 29 Programming Reaction-Diffusion Equations 30 Multi-Layer Multi Layer Interactions d φi ( x, y, t ) − ci∇2φi ( x, y, t ) = α iφi ( x, y, t ) + βiφi ( x, y, t0 ) + γ ijφ j ( x, y, t ) dt diffusion reaction Wave propagation in active media Layer 1 (slow) [Perona & Malik 1990] Layer 2 (fast) © ANAFOCUS 2008 30 Wave Generation 31 Multi-Layer Multi Layer Interactions Traveling Waves 1 1 1 A1 = 1 1.5 1 1 1 1 0.6 0.8 0.6 A 2 = 0.8 − 0.6 0.8 0.6 0.8 0.6 a 21 = 4.5 a12 = −4.2 b1 =0 b2 =5 z1 = 0.6 z 2 = − 2 .7 τ 1 : τ 2 = 16 : 1 Layer 1 (slow) Layer 2 (fast) Spiral Wave 0.6 0.8 0.6 0.6 0.8 0.6 A1 = 0.8 −0.3 0.8 A2 = 0.8 −0.9 0.8 0.6 0.8 0.6 0.6 0.8 0.6 a21 = 6 a12 = −3 b1 = 0.0 b2 = 3.9 z1 = −4.5 z2 = −3.6 Only fast layer τ1 : τ2 =8:1 © ANAFOCUS 2008 31 Retina Emulation 32 Multi-Layer Multi Layer Interactions Long Distance Inhibition in the IPL 1 1 1 A1 = 1 1.5 1 1 1 1 0.6 0.8 0.6 A 2 = 0.8 − 0.6 0.8 0.6 0.8 0.6 a 21 = 3 .3 a12 = − 5 .4 b1 =0 b2 =3 z1 = 0 .6 z 2 = − 2 .7 τ 1 : τ 2 = 16 : 1 Layer y 1 (slow) ( ) Layer 2 (fast) Spatial-Temporal Detection of Edges in the OPL 0.5 0.8 0.5 0 0 0 A1 = 0.8 −3.6 0.8 A2 = 0 0 0 0 0 0 0.5 0.8 0.5 a21 = 4.5 a12 = −5.7 b1 = 0.0 b2 = 3.0 z1 = −9 z2 = 0 Only fast layer τ1 : τ2 = 6:1 © ANAFOCUS 2008 32 Conceptual Architectural Choices 33 Topographic ADCs + Digital Processing Sensors S + ADCs + Digital + Memory Smaller spatial resolution: trade-off with processing Larger system operation predictability and robustness: analog only for ADCs Smaller spurious interactions Larger flexibility grammability: signal and pro- only digital processing Distributed Digital: LAP Digital Functionality and efficiency compromised by ADC accuracy Large in-pixel requirements memory © ANAFOCUS 2008 33 Conceptual Architectural Choices 34 Topographic Mixed Mixed-Signal Signal Processing Sensors + Analog + Digital + M Memory A/D Conversion Distributed Digital: LAP Digital Low spatial resolution: trade-off with processing Involved circuit operation predictability and robustness: analog also for processing Larger spurious signal interactions Involved flexibility and programability: both analog and digital circuits ☺ Large computational power and efficiency ☺ Small S ll memory requirements i ☺ Small transfer bottlenecks © ANAFOCUS 2008 34 Rationale for CMOS 35 The Functional Power of MOST © ANAFOCUS 2008 35 The Accuracy Requirements 36 Visual Inspection of Results Adding random noise (σ=4LSB) 4LSB) to: 1) Input Image; 2) Coefficients of the convolution kernel at every position; 3) Output Image [Original from Gustavo Liñán] © ANAFOCUS 2008 36 The Accuracy Requirements 37 Using Quality Assessment Methods 0.9087 [Original from Gustavo Liñán] 0.4815 0.4537 Z. Wang, et al., "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004. © ANAFOCUS 2008 37 Multi-Functional Pixel Example 38 © ANAFOCUS 2008 38 The Resolution Compromise 39 Size of Sensory Pixel Size of Sensory-Processing Pixel How Important is Resolution ? © ANAFOCUS 2008 39 How Important is Resolution ? 512 x 512 128 x 128 Vision is possible with 25 x 25 “pixels” (within limited field of view) ) Text can be read at 200 words/min (300 words/min with normal vision) St d t can navigate Students i t in i complex environments (maze) with confidence 64 x 64 40 32 x 32 Resolution can be increased through proper design VGA and up to 1.3Mpixels resolution for Surveillance > 1Mpixels for Machine Vision and Automotive © ANAFOCUS 2008 40 Q-Eye in Operation 41 © ANAFOCUS 2008 41 The Eye-RIS system Architecture The System in Operation © ANAFOCUS 2008 42 Multi-Functional Pixel Concept 43 © ANAFOCUS 2008 43 The Eye-RIS™ System 44 Configurable system which provides an efficient solution for low-to-medium resolution and high frame rates applications li ti Mixed-signal focalplane early processor Data - Memory (SRAM) 32bit RISC Microprocessor Program - Memory (SRAM) Ethernet / PCI / USB 2,0 High-speed Bus Multi channel Multi-channel DMA controller Timing, PLL, General I/O, … Bus Bridge Off-chip memory Controller Co to e Low-speed Bus Ultra-high frame rate: up to 10,000fps@QCIF image resolution Ultra-low power consumption: < 10mW@30fps@QCIF image resolution © ANAFOCUS 2008 44 The Q-Eye Chip 45 UMC 0.18µm CMOS 1P5M (1.8V/3.3V) - mixed-signal. High-performance Smart Image Sensor 176 x 144 grey-scale pixels with 29.1µm pitch High-speed non-rolling electronic shutter. Programmable exposure time (controlling ( stepdown to 20ns) Approximated sensitivity of 3V/lux-sec at 550nm 4 +1 (two banks) high high-retention retention analog and 4 binary memories Analog multiplexer for image shifting and analog MAC unit Programmable,3 x 3 neighbourhood pattern matching with 1/0/d.n.c.pattern definition (fast morphological functions) On-chip bank of high-speed ADCs and DACs. Multiple I/Os and high-speed communication ports © ANAFOCUS 2008 45 Q-Eye Pixel 46 Generic analog voltages Direct address event block ADCs LAMs MAC 176 x 144 cell array Grey optical module R-G-B R G B optical modules Morphological operator Buffers - DACs Neigbourhood multiplexer Binary memories Control unit + memory LLU Resistive ggrid © ANAFOCUS 2008 46 The Eye-RIS™ v1.2 47 Main Features Q-Eye focal-plane processor ▪ ▪ ▪ ▪ 176 x 144 spatial resolution Monochrome M h image i sensor with ith 3.2V/(lux·sec) 3 2V/(l ) sensitivity iti it Maximum frame rate (sensing + processing) of over 10,000fps Advanced pixel architecture combining image acquisition, image processing and storage: - Multi-mode image sensing, analogue & binary memories, analogue multiplexor lti l ffor iimage shifting, hifti analogue l MAC unit, it programmable bl LUT, LUT resistive grid for controllable smoothing… Digital control & post-processing ▪ ALTERA NIOS-II 32-bit RISC microprocessor ▪ 1.17 DMIPS/MHz performance at 70MHz operation frequency ▪ 32Mb SDRAM for program and image/data storage Communications ▪ SPI port, UART, 2xPWM ports and GPIOs, USB 2.0, and GigE ▪ JTAG Controller ▪ 1.5W typical power consumption Application development kit Eye-RIS™ v1.2 vision system ▪ Application Development Kit including: Project builder, C compiler, assembler,, linker,, and source-code debugger. gg ▪ Image-processing library including basic routines such as: point-topoint operations, spatial filtering operations, morphological operations and statistical operations © ANAFOCUS 2008 47 The First True Bio-inspired VSoC: Eye-RIS™ v2.1 48 © ANAFOCUS 2008 48 The First True Bio-inspired VSoC: Eye-RIS™ v2.1 49 Main Features Main Features Q-Eye Q y focal-plane p p processor ▪ ▪ ▪ ▪ 176 x 144 spatial resolution Monochrome image sensor with 3.2V/(lux·sec) sensitivity Maximum frame rate (sensing + processing) of over 10,000fps Advanced pixel architecture combining image acquisition, image processing and storage: Q-Eye Focal-plane processor - Multi-mode image sensing, analogue & binary memories, analogue multiplexor for image shifting, analogue MAC unit, programmable LUT, resistive grid for controllable smoothing… Digital control & post-processing ▪ ALTERA NIOS-II NIOS II 32-bit 32 bit RISC microprocessor i ▪ 1.17 DMIPS/MHz performance at 100MHz operation frequency Memory NIOS-II uP Memory Communications ▪ SPI port, UART, 2xPWM ports and GPIOs, USB 2.0 interface, and ▪ JTAG C Controller t ll ▪ <700mW typical power consumption Application development kit ▪ Application Development Kit including: Project builder, C compiler, assembler linker, assembler, linker and source-code source code debugger. debugger ▪ Image-processing library including basic routines such as: point-topoint operations, spatial filtering operations, morphological operations and statistical operations © ANAFOCUS 2008 49 Processor Comparison 50 Benchmark: Each of the processors calculated 20 convolutions, 2 diffusions,, 3 means,, and 40 morphology p gy and 10 g global ORs. Note: Only the DSP and the pipe-line multi-core (FPGA) architectures support trading between resolution and frame-rate [Original from Csaba Reckeczky] © ANAFOCUS 2008 50 Processor Comparison 51 © ANAFOCUS 2008 51 Processor Comparison 52 + Texas Instrument DaVinci video processor (TMS320DM6443) ++ Xilinx Spartan 3A DSP FPGA (XC3SD3400A) * due to data access ** due to pipe-line operations *** no multiplication, multiplication scaling with a few discrete values **** these data intensive operators slow down significantly when the image does not fit to the internal memory (typically above 128x128 for a DaVinci, which has 64kByte internal memory) [Original from Csaba Reckeczky] © ANAFOCUS 2008 52 3-D Parallel Processing Multi-Layer Architectures Bonding wire Sensor layer (320x240) Mixed signal layer Processor array (160x120) Control bus Pixel parallel 53 Bonding wire AD DA? 32 bit data bus (gray value) Digital proc. layer Digital memory layer Processor array 256 cores Processor parallel memory array 64kB © ANAFOCUS 2008 53 3-D Integration Topographic, layered sensingprocessing 54 AFRL BB Process Memory … Bump bonding: Proc ADC – Proc No1 Sensor 3D stacking: 3D CNN CUBE MITLL Low-Power FDSOI DSOI CMOS C OS Process 54 © ANAFOCUS 2008 54 Visual Prosthesis 55 Retina Coder + Electrode Driving implante sub-retinal + Electrode Array [K. Wise, D. Kipke,U. Michigan] Sub-Retinal Implant p [Chow et. al 2001] implante epi-retinal Epi-Retinal Implant [Humayun et. al 1999] © ANAFOCUS 2008 55 Discussion VSoCs are still like science fiction toys for industry Even smart CMOS sensors are not common yet A real challenge for engineering !! We can take advantage g of nature through g understanding g Parallel and concurrent sensorysensory-processing Multi Multi--scale representation p Signal coding Close interaction and collaboration between analog and digital circuitry is needed for efficient VSoC design. © ANAFOCUS 2007 56