Unified Detection and Recognition for Reading Text in Scene Images
Jerod Weinman
Computer Vision Laboratory, Department of Computer Science, University of Massachusetts Amherst

[Images: Herb Simon and Allen Newell, c. 1958; IBM's Deep Blue; Garry Kasparov]

Page Readers
Document types: Legal, Business, DOE, Annual Rpt., Magazine
Accuracies shown: 99.7%, 99.4%, 98.4%, 97.4%, 96.6%
Image labels: Seven-Year-Old; IBM's Deep Blue
Source: Rice et al., Optical Character Recognition, Kluwer, 1999.

Reading Signs
traditional Asi Hcalipg Arts

What Assumptions?
Detection
Text line detection
Word segmentation
Binarization
Character segmentation
Recognition
Small sample (< 50 chars)

Overview
Reading: Detection; Recognition; Segmentation and Recognition; Detection and Recognition

Sign and Text Detection

Context

Contextual Model
$\Pr(t \mid \text{Image}, \text{Experience})$, with label nodes $t_1, t_2, t_3, t_4$

Mixed Labels

Partially Labeled Data
Joint model: $p(t_{\text{Sign}}, t_{\text{Bg}}, t_{\text{Mix}} \mid x; \theta)$
Marginal Likelihood: $p(t_{\text{Sign}}, t_{\text{Bg}} \mid x; \theta) = \dfrac{p(t_{\text{Sign}}, t_{\text{Bg}}, t_{\text{Mix}} \mid x; \theta)}{p(t_{\text{Mix}} \mid t_{\text{Sign}}, t_{\text{Bg}}, x; \theta)}$

Detection Examples
[Figures: detection examples]

Detection Comparison
Context: 80% DR (detection rate), 0.9% FP (false positives)
No Context: 63% DR, 0.6% FP

Detection Results: Detection Comparison (sign level)
Measure | Local | Contextual
Detection Rate | 86 ± 1% | 85 ± 2%
Avg. Coverage | 74 ± 2% | 88 ± 1%
Median Coverage | 81% | 100%
False Pos Area/Image | 0.97 ± 0.01% | 1.0 ± 0.2%

Contributions
Learned Layout Analysis
Simultaneous Multi-Scale Detection
Training with Partially Labeled Data

Recognition Information: Appearance

Recognition Information: Local Language
P(TH | English) = 39/1000
P(QU | English) = 1.4/1000
P(IN | English) = 21/1000
P(QA | English) = 0.0001/1000

Recognition Information: Lexicon
a, Aaberg, Aachen, ..., zymurgy, Zyuganov, Zzz, ...

Recognition Information: Similarity
ξαλλαλλαννα
Mississippi
traditional Asi Hcalipg Arts

Learning Similarity
[Figure: pairs of character images labeled Same or Different]

All the Information
[Venn diagram of Appearance, Language, Lexicon, and Similarity, built up over several slides]
Prior work cited across the builds, in slide order:
[Ohya et al. 94] [Chen et al. 02] [Kusachi et al. 04]
[Riseman & Hanson 74] [Jones et al. 91] [McQueen & Mann 00] [Thillou et al. 05]
[Bazzi et al. 99] [Brakensiek et al. 00]
[Bledsoe & Browning 59] [Shi & Pavlidis 97] [Thillou et al. 07]
[Casey & Nagy 71] [Hong & Hull 95] [Manmatha et al. 96] [Hobby & Ho 97] [Breuel 01]
[Bazzi et al. 99] [Zhang & Chang 03] [Jacobs et al. 05] [Edwards & Forsyth 06]
[Weinman 06] [Weinman 07]

Recognition Model
$\Pr(y \mid \text{Image}, \text{Experience})$, combining Appearance, Language, Similarity, and Lexicon over character nodes $y_1, y_2, y_3, y_4$

Recognition Examples
[Figures: Correct and Hard examples]

Recognition Results
[Chart: Char. Error Rate]

Recognition Model
$\Pr(y \mid \text{Image}, \text{Experience})$
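The bigram figures on the Local Language slide suggest how language information can rank competing character hypotheses. The sketch below is only an illustration, not the model from the talk: it scores a string by summing log-probabilities of adjacent letter pairs, using the four values quoted on the slide. The floor value `UNSEEN_PROB` and the function names `bigram_log_score` and `rank_candidates` are hypothetical and were introduced here for the example.

```python
import math

# Bigram probabilities quoted on the "Local Language" slide.
BIGRAM_PROB = {
    "TH": 39 / 1000,
    "IN": 21 / 1000,
    "QU": 1.4 / 1000,
    "QA": 0.0001 / 1000,
}

# Assumption: a small floor for bigrams the slide does not list.
UNSEEN_PROB = 1e-6


def bigram_log_score(word: str) -> float:
    """Sum the log-probabilities of every adjacent letter pair in `word`."""
    letters = word.upper()
    pairs = [letters[i:i + 2] for i in range(len(letters) - 1)]
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN_PROB)) for pair in pairs)


def rank_candidates(candidates):
    """Order candidate strings from most to least likely under the bigram scores."""
    return sorted(candidates, key=bigram_log_score, reverse=True)


if __name__ == "__main__":
    for candidate in rank_candidates(["QA", "QU", "TH"]):
        print(candidate, round(bigram_log_score(candidate), 2))
```

With these values a hypothesis containing TH outranks one containing QU, which in turn outranks QA. In a full recognizer this language score would be one term alongside appearance, similarity, and lexicon information rather than the sole criterion.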
[Figure: model graph with word node w1, character nodes y1 to y4, and Appearance, Language, and Similarity factors]

Lexicon Bias

Shortening the Lexicon
[Chart: Belief vs. Character for character nodes y1 to y4]
Example lexicon entries:
AARA ADDS AMAL ANYS AVER Acts Alia Anew Arks Arts BABS BEVY BLOK BUDS Badr Bice Blvd Bump CANA CION COIT CUYP Cary Clop Cons DALS DELE DONE DRYE Deck Diba Dreg Dyed ELNA ERLS Edan Elia Eurs FANE FLOR FRIS Figs Fobs GARD GEUM GRIZ Gala Glia Grae HAZE HIGH HYMN Hats Hood Hutt IRAs Ibid Ivah JAVA JUNE Jebs KABS KEAN KOLK Kahl Kind Kora LECH LIDS LOXS LYND Leus Lirs Lurs MALT MILK MOLT Mams Mega Mood Mutt NIKO NUBS Nils Null OORT OTTI Omar Orva PEEP PIER PROP Paco Pica Poke Pyms RAHS RICE RODD Rags Redo Roeg Rude SEER SHIR SOHS SRTA Sair Seda Sixs Socs Suns TAGS TICA TOLD Tabb Tavi Toff Trev URSI Urea VOLA Vern WEFT WING Wean YENS YWIS ZIMA Zerk afro alds arch assn bark bent boat brod cati choc coos cubs demo ding duck ease emit espy febs Fitz fyns gaye goof gulp herb hoke idos inss jeep join keas kiln lads leak livy lous mara meme mope myke nine numb oort otti Peon pipe puce race rews road sabu scop sils smog sued synd thaw tobe twig urga vile vows webb weys wkly wrap yeti yore zinc zuni

Sparsity Impact
Median number of characters: 4
Median percent of words: 0.07%
Median number of words: 16 (of 35,000)

Recognition Examples
No Lexicon | Lexicon | Forced Lexicon
USED BOOKI | USED BOOKS | USED BOOKS
HTOR UP5 | HOOK UPS | HOOK UPS
RELAmo 3 31 | RELAmo 3 31 | DELANO S SI
BOLTWOOD | BOLTWOOD | BENTWOOD

Recognition Results
[Charts: Char. Error Rate; Word Error Rate]

Contributions
Unified model for recognition
Learned similarity function for consistency
Principled, fast lexicon integration

Word Segmentation

Character Segmentation

Interpretation Graph

Recognition Comparison
Sample outputs shown:
uucL,ass Free ehe in LIBRFIRY A),1HERbT o r t i TAVF n
Tradiliorpal Asi’? HealiIN Arts

Results: Low Resolution Resumes
System | Char Err. (Resumes)
OmniPage | 23.5%
OmniPage + Binarized | 16.6%
Ours | 15.0%

Low Resolution
Sample outputs shown:
WM/ I WS krauts 11.-4her tttA wms
Je eryAmherst
JefferyAmherst
JONryAmhent *
GkilaryAmhent †
.WlervAmMnt ✖
* Lit wryA An memo
† An wrrA Am wme
Jo 116 fp Ak 11_Se-

Contributions
Unified word segmentation and recognition
Hybrid open and closed vocabulary modes
Recognition bias prevention

Reading Problem
Which Features? Detection, Recognition

One Task, Many Problems
[Hierarchy diagram with levels labeled Detection and Recognition]
Image Region
Characters: A, B, C
Cars: Erik's Honda, My Jeep
Faces: Allen, Andrew, Keith
Background

Model Options
Independent, Flat, Factored
[Torralba et al. 04] [Bar-Hillel & Weinshall 06]

Categorization Comparison
[Chart]

Recognition Comparison
[Chart]

Contributions
Better performance, less time
Hierarchical training objective

The Result?
OmniPage
Sample outputs shown:
MILL ANTIQUES J L7:"? 4, M o fo o f we y l IP m wi
FIREE DELIVEp Ma OS

Thank You
Support: NSF IIS-0326249, NSF IIS-0100851, NSA, CIA
Collaborators: Allen Hanson, Erik Learned-Miller, Andrew McCallum, Piyanuch Silapachote, Marwan Mattar, Richard Weiss