Unified Detection and Recognition for Reading Text in Scene Images
Jerod Weinman
Computer Vision Laboratory
Department of Computer Science
University of Massachusetts Amherst
IBM’s Deep Blue
Herb Simon and Allen Newell, c. 1958
Garry Kasparov
Seven-Year-Old
Page Readers
Legal: 99.7%
Business: 99.4%
DOE: 98.4%
Annual Rpt.: 97.4%
Magazine: 96.6%
Rice et al., Optical Character Recognition, Kluwer, 1999.
Reading Signs
[Figure: sign image with OCR output "traditional Asi / Hcalipg Arts"]
What Assumptions?
Detection
• Text line detection
• Word segmentation
• Binarization
• Character segmentation
Recognition
• Small sample (< 50 chars)
Overview
• Detection
• Recognition
• Segmentation and Recognition
• Detection and Recognition
Sign and Text Detection
Context
Contextual Model
$\Pr(t \mid \text{Image}, \text{Experience})$
[Figure: graphical model over region labels $t_1$, $t_2$, $t_3$, $t_4$]
Mixed Labels
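Partially Labeled Data
$p(t_{\text{Sign}}, t_{\text{Bg}}, t_{\text{Mix}} \mid x; \theta)$
Marginal Likelihood:
$p(t_{\text{Sign}}, t_{\text{Bg}} \mid x; \theta) = \dfrac{p(t_{\text{Sign}}, t_{\text{Bg}}, t_{\text{Mix}} \mid x; \theta)}{p(t_{\text{Mix}} \mid t_{\text{Sign}}, t_{\text{Bg}}, x; \theta)}$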
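The marginal likelihood over the labeled regions can be checked numerically on a toy model. The sketch below is only an illustration: the binary labels, the `potential` function, and its numbers are assumptions, not the model used in the talk. It shows that dividing the joint by the conditional of the mixed label gives the same value as summing the joint over the mixed label, whichever value of the mixed label is plugged in.

```python
from itertools import product

LABELS = (0, 1)  # toy labels: 0 = background, 1 = sign


def potential(t_sign, t_bg, t_mix):
    """Hypothetical positive score for one joint labeling (made-up numbers)."""
    return 1.0 + 2.0 * t_sign + 1.0 * (1 - t_bg) + 1.5 * (t_sign == t_mix)


# Global normalizer over all eight joint labelings.
Z = sum(potential(*t) for t in product(LABELS, repeat=3))


def joint(t_sign, t_bg, t_mix):
    """p(t_sign, t_bg, t_mix | x; theta) for the toy model."""
    return potential(t_sign, t_bg, t_mix) / Z


def cond_mix(t_mix, t_sign, t_bg):
    """p(t_mix | t_sign, t_bg, x; theta), normalized over t_mix only."""
    norm = sum(potential(t_sign, t_bg, m) for m in LABELS)
    return potential(t_sign, t_bg, t_mix) / norm


def marginal_by_sum(t_sign, t_bg):
    """Marginal likelihood obtained by summing out the mixed label."""
    return sum(joint(t_sign, t_bg, m) for m in LABELS)


def marginal_by_ratio(t_sign, t_bg, t_mix):
    """Marginal likelihood obtained as joint divided by conditional."""
    return joint(t_sign, t_bg, t_mix) / cond_mix(t_mix, t_sign, t_bg)
```

In this toy setting the ratio form never needs the sum over the unlabeled variable at evaluation time, which is the practical appeal when only some regions carry labels.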
Detection Examples
Detection Comparison
Context: 80% DR, 0.9% FP
No Context: 63% DR, 0.6% FP
Detection Results
Sign Level              Local          Contextual
Detection Rate          86±1%          85±2%
Avg. Coverage           74±2%          88±1%
Median Coverage         81%            100%
False Pos Area/Image    0.97±0.01%     1.0±0.2%
Contributions
• Learned Layout Analysis
• Simultaneous Multi-Scale Detection
• Training with Partially Labeled Data
Overview
• Detection
• Recognition
• Segmentation and Recognition
• Detection and Recognition
Recognition Information
• Appearance
• Local Language
  $P(\text{TH} \mid \text{English}) = 39/1000$
  $P(\text{QU} \mid \text{English}) = 1.4/1000$
  $P(\text{IN} \mid \text{English}) = 21/1000$
  $P(\text{QA} \mid \text{English}) = 0.0001/1000$
• Lexicon
  a, Aaberg, Aachen, ..., zymurgy, Zyuganov, Zzz, ...
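The bigram probabilities listed under Local Language can be turned into a simple score for candidate strings. This is a minimal sketch, not the language model from the talk: it only knows the four bigrams quoted on the slide, and the `FLOOR` value for unlisted bigrams is an assumption made so that the function is defined for any input.

```python
import math

# Bigram probabilities quoted on the slide (occurrences per 1000 English bigrams).
BIGRAM_PROB = {
    "TH": 39 / 1000,
    "QU": 1.4 / 1000,
    "IN": 21 / 1000,
    "QA": 0.0001 / 1000,
}

# Assumed fallback probability for bigrams that the slide does not list.
FLOOR = 1e-9


def bigram_log_score(word):
    """Sum of log bigram probabilities over adjacent character pairs."""
    word = word.upper()
    return sum(
        math.log(BIGRAM_PROB.get(word[i:i + 2], FLOOR))
        for i in range(len(word) - 1)
    )
```

With these numbers, `bigram_log_score("TH")` is higher than `bigram_log_score("QA")`, so a reading that contains a common pair is preferred over one that contains a rare pair when the image evidence is otherwise ambiguous.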
Recognition Information
• Similarity
[Figure: similarity examples "ξαλλαλλαννα" and "Mississippi"]
Learning Similarity
[Figure: character image pairs labeled Same or Different]
All the Information
Similarity
Appearance
[Ohya et al. 94]
[Chen et al. 02]
[Kusachi et al. 04]
Language
Similarity
Appearance
[Riseman & Hanson 74]
[Jones et al. 91]
[McQueen & Mann 00]
[Thillou et al. 05]
Language
[Bazzi et al. 99]
[Brakensiek et al. 00]
Lexicon
All the Information
Similarity
Appearance
[Bledsoe & Browning 59]
[Shi & Pavlidis 97]
[Thillou et al. 07]
[Casey & Nagy 71]
[Hong & Hull 95]
[Manmatha et al. 96]
[Hobby & Ho 97]
[Breuel 01]
Language
Lexicon
Appearance
[Bazzi et al. 99]
[Zhang & Chang 03]
[Jacobs et al. 05]
[Edwards & Forsyth 06]
Language
[Weinman 07]
Recognition Model
$\Pr(y \mid \text{Image}, \text{Experience})$
All the Information: Appearance, Language, Similarity, Lexicon
[Weinman 06]
[Figure: model over character labels y1, y2, y3, y4, with w1 and Bias shown alongside the Lexicon]
Recognition Examples
[Figure: example images labeled Correct and Hard]
Recognition Results
[Figure: Char. Error Rate]
Shortening the Lexicon
AARA, ADDS, AMAL, ANYS, AVER, Acts, Alia, Anew, Arks, Arts
BABS, BEVY, BLOK, BUDS, Badr, Bice, Blvd, Bump, CANA, CION
COIT, CUYP, Cary, Clop, Cons, DALS, DELE, DONE, DRYE, Deck
Diba, Dreg, Dyed, ELNA, ERLS, Edan, Elia, Eurs, FANE, FLOR
FRIS, Figs, Fobs, GARD, GEUM, GRIZ, Gala, Glia, Grae, HAZE
HIGH, HYMN, Hats, Hood, Hutt, IRAs, Ibid, Ivah, JAVA, JUNE
Jebs, KABS, KEAN, KOLK, Kahl, Kind, Kora, LECH, LIDS, LOXS
LYND, Leus, Lirs, Lurs, MALT, MILK, MOLT, Mams, Mega, Mood
Mutt, NIKO, NUBS, Nils, Null, OORT, OTTI, Omar, Orva, PEEP
PIER, PROP, Paco, Pica, Poke, Pyms, RAHS, RICE, RODD, Rags
Redo, Roeg, Rude, SEER, SHIR, SOHS, SRTA, Sair, Seda, Sixs
Socs, Suns, TAGS, TICA, TOLD, Tabb, Tavi, Toff, Trev, URSI
Urea, VOLA, Vern, WEFT, WING, Wean, YENS, YWIS, ZIMA, Zerk
afro, alds, arch, assn, bark, bent, boat, brod, cati, choc
coos, cubs, demo, ding, duck, ease, emit, espy, febs, Fitz
fyns, gaye, goof, gulp, herb, hoke, idos, inss, jeep, join
keas, kiln, lads, leak, livy, lous, mara, meme, mope, myke
nine, numb, oort, otti, Peon, pipe, puce, race, rews, road
sabu, scop, sils, smog, sued, synd, thaw, tobe, twig, urga
vile, vows, webb, weys, wkly, wrap, yeti, yore, zinc, zuni
[Figure: Belief over Character for labels y1, y2, y3, y4]
Sparsity Impact
• Median number of characters: 4
• Median percent of words: 0.07%
• Median number of words: 16 (of 35,000)
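One way to picture how sparse character beliefs shorten the lexicon is the filter below. It is a simplified sketch under stated assumptions, not the procedure from the talk: the belief values, the threshold, and the function names `prune_beliefs` and `shorten_lexicon` are all hypothetical, and the small lexicon is a handful of the four-letter entries shown on the slide.

```python
def prune_beliefs(beliefs, threshold):
    """Keep, for each character position, the labels whose belief is at least threshold."""
    return [
        {char for char, belief in position.items() if belief >= threshold}
        for position in beliefs
    ]


def shorten_lexicon(lexicon, candidates):
    """Keep words of the right length whose every character survives pruning."""
    length = len(candidates)
    return [
        word
        for word in lexicon
        if len(word) == length
        and all(char in allowed for char, allowed in zip(word, candidates))
    ]


# Hypothetical beliefs for a four-character word image (made-up numbers).
beliefs = [
    {"H": 0.90, "N": 0.05, "K": 0.03, "M": 0.02},
    {"o": 0.50, "a": 0.30, "O": 0.15, "u": 0.05},
    {"o": 0.60, "t": 0.30, "c": 0.10},
    {"d": 0.50, "s": 0.30, "k": 0.20},
]

# A few four-letter entries taken from the slide's lexicon sample.
lexicon = ["HAZE", "HIGH", "HYMN", "Hats", "Hood", "Hutt", "Kind", "Mood"]

candidates = prune_beliefs(beliefs, threshold=0.1)
shortlist = shorten_lexicon(lexicon, candidates)
```

Here only "Hats" and "Hood" remain, which mirrors the slide's point that a few surviving characters per position leave a very small fraction of the 35,000-word lexicon to score.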
Recognition Examples
No Lexicon    Lexicon     Forced Lexicon
USED          USED        USED
BOOKI         BOOKS       BOOKS
HTOR          HOOK        HOOK
UP5           UPS         UPS
RELAmo        RELAmo      DELANOS
3             3           S
31            31          SI
BOLTWOOD      BOLTWOOD    BENTWOOD
Recognition Results
[Figure: Char. Error Rate]
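The slides report character error rates without defining the measure. A common definition, assumed here, is the total edit distance between each output and its reference divided by the total number of reference characters. The sketch below applies that definition to three of the No Lexicon outputs; the reference strings are an assumption (the Lexicon column is used as a stand-in for ground truth).

```python
def edit_distance(hyp, ref):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h_char in enumerate(hyp, start=1):
        curr = [i]
        for j, r_char in enumerate(ref, start=1):
            cost = 0 if h_char == r_char else 1
            curr.append(min(prev[j] + 1,          # delete from hyp
                            curr[j - 1] + 1,      # insert into hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(pairs):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(hyp, ref) for hyp, ref in pairs)
    total = sum(len(ref) for _, ref in pairs)
    return errors / total


# (output, assumed reference) pairs taken from the example table.
pairs = [("BOOKI", "BOOKS"), ("HTOR", "HOOK"), ("UP5", "UPS")]
cer = char_error_rate(pairs)
```

On these three pairs the distances are 1, 2, and 1 over 12 reference characters, so `cer` is one third.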
All the Information: Similarity, Appearance, Language, Lexicon
[Figure: Word Error Rate]
Contributions
• Unified model for recognition
• Learned similarity function for consistency
• Principled, fast lexicon integration
Overview
• Detection
• Recognition
• Segmentation and Recognition
• Detection and Recognition
Word Segmentation
Character Segmentation
Interpretation Graph
Recognition Comparison
[Figure: recognition outputs on example signs, including "LIBRFIRY" and "Tradiliorpal Asi’? HealiIN Arts"]
Results: Low Resolution Resumes
Method                  Char Err.
OmniPage                23.5%
OmniPage + Binarized    16.6%
Ours                    15.0%
Low Resolution
[Figure: low-resolution recognition outputs, including "Je eryAmherst", "JefferyAmherst", "JONryAmhent", "GkilaryAmhent", and ".WlervAmMnt", marked with *, †, and ✖]
Contributions
• Unified word segmentation and recognition
• Hybrid open and closed vocabulary modes
• Recognition bias prevention
Overview
• Detection
• Recognition
• Segmentation and Recognition
• Detection and Recognition
One Task, Many Problems
[Figure: image regions labeled by Detection (Characters, Cars, Faces, Background) and by Recognition (characters A, B, C; cars Erik's Honda, My Jeep; faces Allen, Andrew, Keith)]
Reading Problem
Which Features?
Detection
Recognition
Model Options: Independent, Flat, Factored
[Torralba et al. 04]
[Bar-Hillel & Weinshall 06]
Categorization Comparison
Recognition Comparison
Contributions
• Better performance, less time
• Hierarchical training objective
Overview
• Detection
• Recognition
• Segmentation and Recognition
• Detection and Recognition
The Result?
[Figure: recognition outputs on sign images, including "MILL ANTIQUES" and "FIREE DELIVEp", alongside OmniPage output]
Thank You
Support
NSF IIS-0326249
NSF IIS-0100851
NSA
CIA
Collaborators
Allen Hanson
Erik Learned-Miller
Andrew McCallum
Piyanuch Silapachote
Marwan Mattar
Richard Weiss
