Università di Modena e Reggio Emilia

First Workshop on Video Surveillance Research Projects in Italy
Giornata di Studio sui Progetti di Ricerca in Video Sorveglianza in Italia
(VISIT 2008)

22 May 2008
Facoltà di Ingegneria
Via Vignolese, 905 - Modena, Italy
http://imagelab.ing.unimore.it/visit2008
Contents

Contents .......... i
Foreword .......... iii
Organizers and Sponsors .......... iv

Video surveillance issues in relation to the work of the Police Forces
(Problematiche della videosorveglianza in rapporto all'attività delle Forze di Polizia) .......... 1
  dott. Elio Graziano, Questore di Modena

Video Biometric Surveillance and Forensic Image Analysis
(Tecnologie Biometriche per la Videosorveglianza ed Analisi delle Immagini per Applicazioni Forensi) .......... 7
  Gian Luca Marcialis (1), Fabio Roli (1), Pietro Andronico (2), Paolo Multineddu (2), Pietro Coli (3), Giovanni Delogu (3)
  (1) Dipartimento di Ingegneria Elettrica ed Elettronica, Università di Cagliari; (2) Vitrociset SpA; (3) Reparto Carabinieri Investigazioni Scientifiche di Cagliari

Vision-based technologies for surveillance and monitoring
(Tecnologie basate sulla visione per la sorveglianza e il monitoraggio) .......... 12
  Stefano Messelodi, Fondazione Bruno Kessler - IRST, Trento, Italy

Improved video surveillance platforms for critical events recognition
(Piattaforme evolute di videosorveglianza per il riconoscimento di eventi critici) .......... 16
  C. Distante (+), M. Leo (*), A. Leone (+), G. Diraco (+), T. D'Orazio (*), P. Spagnolo (*)
  (+) IMM-CNR, Istituto per la Microelettronica e Microsistemi, Lecce, Italy; (*) ISSIA-CNR, Istituto di Studi sui Sistemi Intelligenti per l'Automazione, Bari, Italy

SAMURAI - Small Area 24-hours Monitoring Using a netwoRk of cAmeras & sensors for sItuation awareness enhancement
(SAMURAI - Monitoraggio di ambienti mediante l'uso di reti di telecamere e sensori di supporto proattivo all'operatore) .......... 21
  Marco Cristani, Vittorio Murino, Dipartimento di Informatica, University of Verona

Threat Assessments in Reactive Surveillance Networks
(Reti di Sorveglianza reattive per il rilevamento di eventi) .......... 25
  Prof. Gian Luca Foresti and Dr. Christian Micheloni, Dept. of Computer Science, Università degli Studi di Udine

Multiple cameras surveillance systems for multiple people tracking and action analysis: the ImageLab solution
(Sistemi multi-camera di videosorveglianza per il tracking di più persone e il riconoscimento delle azioni: la soluzione di ImageLab) .......... 30
  Rita Cucchiara, Costantino Grana, Andrea Prati, Roberto Vezzani, ImageLab, Università di Modena e Reggio Emilia

Distant Humans Identification In Wide Areas
(Identificazione di persone in aree estese) .......... 34
  A. Del Bimbo, F. Dini, A. Grifoni, F. Pernici, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze

Video surveillance and bio-inspired embodied cognitive systems
(Videosorveglianza e sistemi cognitivi embodied bio-inspired) .......... 36
  Matteo Pinasco, Carlo S. Regazzoni, Department of Biophysical and Electronic Engineering, University of Genoa

Video Surveillance @ CVLab
(Video Sorveglianza presso il Computer Vision Laboratory) .......... 39
  Luigi Di Stefano, Stefano Mattoccia, Federico Tombari, Alessandro Lanza, DEIS-ARCES, University of Bologna

Integrating computer vision techniques and wireless sensor networks in video surveillance systems
(Integrazione di tecniche di visione artificiale e reti di sensori wireless in sistemi di video sorveglianza) .......... 42
  Edoardo Ardizzone, Marco La Cascia, Liliana Lo Presti, Dipartimento di Ingegneria Informatica, Università di Palermo

Challenges and developments in intelligent video surveillance
(Sfide e sviluppi nella video sorveglianza intelligente) .......... 47
  Virginio Cantoni, Luca Lombardi, Roberto Marmo, University of Pavia, Faculty of Engineering, Laboratory of Computer Vision
Foreword
VISIT 2008 is the first Workshop on Video Surveillance Research in Italy, organized by GIRPR, the Gruppo Italiano di Ricercatori nella Pattern Recognition. In recent years, research in Computer Vision and Pattern Recognition for video surveillance has grown at an extraordinary pace worldwide, as witnessed by the dozens of workshops, conferences and events organized abroad and sponsored by the major international scientific associations. This workshop was conceived to answer a shared need for scientific exchange at the national level as well, at a time when many Italian researchers in computer science, computer engineering and telecommunications are working on video surveillance and security, with projects and results of international excellence.

This volume collects the contributions of a dozen Italian research groups currently engaged in significant national and international research projects. Among them we count 4 European projects, 2 NATO projects, 3 international inter-university collaborations, 2 Projects of Relevant National Interest (PRIN) of the Ministry of University and Research, 1 project of the Ministry of Transport, and more than a dozen applied research projects funded by Italian public bodies and private companies. The projects range from the surveillance of public and private spaces, to traffic control and the analysis of people's behaviour, to the monitoring of abandoned objects and suspicious events, to biometrics for security and forensic analysis.

We hope this meeting will be the first of a long series of events promoting Italian research, exchange and collaboration in a field as socially and economically strategic as the automatic analysis of video and images for surveillance.

Prof. Rita Cucchiara
Modena, 22 May 2008
Università di Modena e Reggio Emilia

Organizers and Sponsors
Committees

General Chair: Prof. Rita Cucchiara (Università di Modena e Reggio Emilia)
Scientific Committee: Prof. Rita Cucchiara (Università di Modena e Reggio Emilia)
Prof. Marco Ferretti (Università di Pavia)
Prof. Gian Luca Foresti (Università di Udine)
Prof. Vittorio Murino (Università di Verona)
Program Chair: Dr. Andrea Prati (Università di Modena e Reggio Emilia)
Local Chair: Dr. Roberto Vezzani (Università di Modena e Reggio Emilia)
Under the patronage of:
Università di Modena e Reggio Emilia
Dipartimento di Ingegneria dell'Informazione
Dipartimento di Scienze e Metodi dell'Ingegneria
CRIS (Centro Interdipartimentale di Ricerca sulla Sicurezza)
With thanks to the sponsors.

Video surveillance issues in relation to the work of the Police Forces
(Problematiche della videosorveglianza in rapporto all'attività delle Forze di Polizia)

dott. Elio Graziano
Questore di Modena

Several municipalities in the province of Modena, including the provincial capital, have installed video surveillance systems to watch over their urban centres and at-risk areas, and others have already announced their intention to do so. Such systems are increasingly presented as a fundamental, even decisive, weapon for solving the security problems of cities, including the smaller ones. Certainly, given certain technical and organizational conditions that I will discuss below, they can play a valuable role in identifying people and monitoring facts and behaviour, activities that are fundamental both to the enforcement of criminal law and to the preventive function of the police. In Italy there are as yet no in-depth studies on the actual deterrent effect of these systems on the commission of crimes. In general terms, significant reductions in crime in a municipal area watched by cameras are not obtained quickly, as a recent study carried out in Great Britain also shows. This is all the more true in a period of sharply rising crime, as much of 2007 was in our province. In periods like this it is difficult to assess the effectiveness of any of the tools used by the prevention apparatus: the mere observation of an increase in crime does not justify a negative judgement on those tools, especially when the increase can be explained by factors external to the territory and beyond the control of the bodies responsible for public safety, as was certainly the case with the pardon (indulto) enacted at the end of 2006.
The presence of cameras generally has a positive effect on citizens' perception of security but, in practice, a limited and delayed deterrent effect on so-called widespread petty crime in areas of a certain size such as city centres. The deterrent effect is greater when cameras are placed inside small or well-delimited spaces, such as railway stations or sports facilities, or to watch over clearly identified sensitive targets (for example schools, places of worship, embassies), and it reaches its maximum effectiveness for at-risk activities or the custody of valuables carried out indoors. For these kinds of environments the optimum is to combine video surveillance with a biometric data acquisition system. In banks, in particular, the "virtuous" practice of making entry conditional on fingerprint capture is spreading: in these cases the perpetrators of robberies are identified most of the time. If such systems are further equipped with software that denies entry to the bank to anyone for whom the camera cannot capture a sufficient number of facial features, the entry of wrongdoers becomes truly unlikely; it is well known that a robber tends to hide his face from the cameras. I have examined software of this kind at Comerson, and a "filter" of this sort is certainly more effective than a metal detector, which exasperates customers if it is too sensitive and is practically useless if it is not, not to mention that many robberies are committed with simple box cutters that escape the metal detector. But let us return to video surveillance of cities.
It certainly cannot be considered a miracle cure for all the problems of urban disorder, but it can be a valuable aid to the Police Forces engaged in watching over urban centres and countering the many types of crime that afflict them: usually predatory crime and drug dealing, but at times also terrorism. What is needed, however, is an organic urban security plan, drawn up jointly by local authorities and police bodies, that combines video surveillance with multiple, differentiated measures. These can be police measures, such as territorial control by car or motorcycle patrols, the "proximity" policing of the neighbourhood officer, and the intelligence work of investigative bodies, but also urban-planning measures, such as street lighting, initiatives to revitalize run-down areas, and the discouragement and, where necessary, dismantling of "ghetto" settlements. To be effective, besides being part of a coherent set of measures, video surveillance must rely on technologically adequate equipment, both for capture (high-definition cameras, with sufficiently sensitive lenses equipped with backlight and low-light compensation and with pan-tilt and zoom capability) and for the transmission and recording of the images (recording that is triggered, for example by a motion detector, only when an object or a person enters the lens's field of view, thus facilitating the subsequent search and analysis of the footage; better still if the motion detector also drew the operator's attention to unusual events captured by the cameras).
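The triggered-recording policy described above (keep footage only when something enters the field of view) can be sketched with simple frame differencing. This is a minimal illustration, not a description of any system named in the text: the grayscale frame format, the per-pixel threshold and the changed-pixel count are all illustrative assumptions.

```python
# Minimal sketch of motion-triggered recording via frame differencing.
# Frames are grayscale numpy arrays; THRESH and MIN_CHANGED are assumptions.
import numpy as np

THRESH = 25        # per-pixel intensity change considered "motion"
MIN_CHANGED = 50   # number of changed pixels that triggers recording

def motion_detected(prev_frame, frame):
    """Return True when enough pixels changed between consecutive frames."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return int((diff > THRESH).sum()) >= MIN_CHANGED

def record_triggered(frames):
    """Keep only the indices of frames where motion is detected."""
    kept, prev = [], None
    for i, f in enumerate(frames):
        if prev is not None and motion_detected(prev, f):
            kept.append(i)
        prev = f
    return kept

# Tiny demo: a static background, then a bright "person" enters at frame 2.
bg = np.zeros((32, 32), dtype=np.uint8)
person = bg.copy()
person[8:24, 8:16] = 255
frames = [bg, bg, person, person]
print(record_triggered(frames))   # motion only at the entry frame -> [2]
```

A real deployment would add background-model maintenance and the operator alerting the text mentions; the point here is only the event-triggered storage that makes later search of the footage practical.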
Cameras, moreover, must be positioned so as to capture, in the foreground and sharply, the people and situations relevant to territorial control, if necessary through progressive repositioning suggested by the heads of the Police Forces on the basis of operational findings. Recorded images of sufficient sharpness are undoubtedly a precious aid in identifying the perpetrators of a crime through examination of their faces and of their other somatic and behavioural characteristics, as well as their clothing. Moreover, prompt viewing of the footage helps the forensic police, during the technical inspection of the scene, to find the traces left by the criminals and, at times, to reconstruct important details of the criminal dynamics. In the current procedural system, in which evidence is formed through the adversarial confrontation of the parties, objective data acquired through image analysis can lend particular strength to a prosecution hypothesis.

There is no specific legislation on video surveillance in Italy; the processing of personal data acquired through it must in any case comply with the principles of the personal data protection code. The Garante per la Privacy (Italian Data Protection Authority), within the powers conferred on it by art. 154 of legislative decree no. 196 of 30 June 2003, issued the "General provision on video surveillance" of 29 April 2004 which, extending and detailing the earlier "Decalogue of rules on video surveillance" of 29 November 2000, set out the obligations, guarantees and precautions already required under the principles of that code. A third significant provision was issued on 27 October 2005 concerning the limits and guarantees on the joint collection of fingerprints and images at banks.
In the "General provision on video surveillance" of 29 April 2004 the Garante asserts the requirement of proportionality between the means employed and the ends pursued, derived from art. 11, paragraph 1, letter d of legislative decree no. 196 of 30 June 2003, and prescribes verbatim that "video surveillance systems may be activated only when other measures have been carefully assessed as insufficient or impracticable" and that "the choice must not simply be the cheapest, the least complicated, or the quickest to implement". The Garante adds that one must evaluate, among other things, whether "for security purposes it is sufficient to capture images that do not make individual citizens identifiable, even through enlargement" and "whether collecting detailed images is really essential to the stated purposes". In the same document the Garante goes so far as to advocate video surveillance without recording for "jewellers, supermarkets, bank branches, post offices"; recording should be activated only in view of specific risk situations, predictable mostly on the basis of crimes that have already occurred. Now, there is no doubt that video surveillance without the possibility of identifying the subjects captured, or without recording at all, would in practice lose most of its deterrent function and all of its function as a tool for identifying those responsible for criminal acts. Moreover, what the Garante calls "other measures", to be truly effective, must very often be adopted together with video surveillance: they are not alternatives or counterposed to it, but should rather be integrated with it. This is above all true of the deployment of human resources, which obviously remains the main one.
Certainly, control-room staff can perform their duties better when assisted by video surveillance: the images reaching the monitors can supply valuable information to street patrols and to the neighbourhood police officer for more targeted vigilance and interventions. In the provision issued on 27 October 2005, the installation of a fingerprint reader associated with the cameras is made subject to even stricter conditions. According to the Garante, there must be "a concrete situation of elevated risk", resulting "also from concordant assessments by the bodies responsible for public order and security", which may derive in particular "from the location of the bank branch (for example, where it is situated in areas of high criminal density, or isolated, or in any case in the immediate proximity of escape routes). Consideration may also be given to whether the branch, or other branches in the same area, have suffered robberies. Other contingent circumstances that expose one or more specific branches to real danger may also be relevant (as observed in the past, for example, with regard to the greater liquidity at bank counters at the time of the introduction of the single European currency)". According to the Garante, "in the absence of specific elements proving a concrete situation of elevated risk", fingerprint capture entails "a disproportionate sacrifice of the sphere of liberty and the dignity of the persons concerned, also exposing them to the danger of abuse in relation to particularly delicate personal data such as fingerprints".
This last claim finds no confirmation in reality: whereas a face could in theory be recognized by the bank's security staff or by anyone who happens to review the recorded images, a fingerprint can lead back to a person's identity only if it is processed by the forensic police, which happens after a crime has been committed. Unless one intends to defend the privacy of thieves and robbers, it must be admitted that taking a fingerprint is far less intrusive than acquiring an image. In truth, the set of rules issued by the Garante betrays a distrust, in my opinion entirely unjustified, of tools that are by now indispensable for the Police Forces. Returning to the video surveillance of urban centres, an adequate cultural renewal is needed within the Police Forces, so that territorial control is organized and periodically recalibrated around these technologies and operators are trained to make the most of the opportunities they offer. What counts is not only the promptness of interventions in the areas watched by cameras: often even a prompt intervention does not immediately produce the identification or arrest of the offender, nor the resolution of a "critical" situation. What then becomes of primary importance is the ability to analyse the recorded images which, as already said, can greatly assist investigative work and enable the identification of offenders at a later time. One should think not only of so-called widespread crime but also of terrorism, as the investigative work carried out after the Madrid and London attacks demonstrates.
Images also assist investigators when they do not directly document the criminal act but only the movements of subjects who committed a crime elsewhere, as was the case in the investigation of the Biagi murder. The fact that a person was in a given place on a given day and at a given hour can in itself be decisive in discovering the perpetrator of a crime. In some cases it is not even conceivable to conduct an investigation meeting criteria of efficiency and economy without the aid of video surveillance: it would often be impossible, for example, to identify those responsible for criminally or administratively punishable conduct inside sports facilities. It is therefore useful and necessary that video surveillance be extended and strengthened. It would also be important to supplement the urban control system with the video surveillance systems of private property and businesses, particularly at-risk businesses such as banks, post offices, jewellers, pharmacies and supermarkets, requiring them to install at least one external camera near the entrance. It hardly needs saying that wrongdoers intending to commit a crime (the classic example is a bank robbery) must show their faces while still outside. Consider also the serious danger to public safety posed, in general terms, by the forcing of external ATMs with gas explosions, which certainly requires such machines to be watched by several external cameras. The Garante's 2004 directive moves in exactly the opposite direction, forbidding private parties to install cameras that could film citizens outside the premises being protected.
It should be amended as soon as possible: consider that, in compliance with it, many banks have removed their external cameras. But it is said that these systems would compress citizens' liberty and privacy too much. I believe that, if video surveillance is exercised in compliance with two fundamental prescriptions of the Garante contained in the cited provision of 29 April 2004, the dangers of violating the private sphere of citizens filmed despite themselves are merely theoretical. The prescriptions, which can be considered congruous and adequate to protect the right to privacy, are: 1) images must be deleted periodically and automatically; 2) only the Police Forces may extract and use the images, for investigative purposes, on the occasion of a criminal act, informing the Magistrate. The Garante sets one week as the maximum period before deletion of the images, except for particular needs raised by investigative bodies. That period is perhaps too short and should be extended: think of the need to identify criminals who carry out a bare-faced "reconnaissance" before robbing a bank. In France and Spain, where an ad hoc law exists, the period is one month. The truth is that highly advanced technologies for monitoring human behaviour, such as those used for video surveillance, are potentially dangerous in countries that do not offer sufficient guarantees of respect for democratic liberties. In democratic countries such as ours, conduct within the police apparatus is under the control of the citizens. If this is true, one cannot, in the name of a misconceived "safeguarding" of privacy, seek to prevent or obstruct beyond reason the spread of effective control tools such as video surveillance.
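The first prescription (periodic, automatic deletion, with a one-week maximum per the Garante and a one-month limit under the French and Spanish laws cited) amounts to a simple retention sweep with an exception for clips held at the request of investigators. The in-memory archive, the function name and the clip identifiers below are illustrative assumptions for the sketch.

```python
# Illustrative sketch of automatic retention-based deletion of recordings.
# The one-week default follows the Garante's limit cited in the text; the
# in-memory "archive" and all identifiers are assumptions for the example.
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)   # one month under the French/Spanish laws

def purge_expired(archive, now, retention=RETENTION, held=()):
    """Drop recordings older than the retention period.

    `held` lists clip ids preserved at the request of investigative
    bodies, the exception the provision allows."""
    return {clip_id: ts for clip_id, ts in archive.items()
            if clip_id in held or now - ts <= retention}

# Demo: three clips; one expired, one expired but held by investigators.
now = datetime(2008, 5, 22)
archive = {"cam1-0510": datetime(2008, 5, 10),
           "cam1-0512": datetime(2008, 5, 12),
           "cam1-0521": datetime(2008, 5, 21)}
print(sorted(purge_expired(archive, now, held={"cam1-0512"})))
# -> ['cam1-0512', 'cam1-0521']
```

Running such a sweep on a schedule, rather than relying on manual deletion, is what makes the "periodic and automatic" requirement auditable.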
On this point it is worth being clear: it is not true that citizens filmed by a camera are "automatically" identified, nor should one imagine that someone is permanently watching the monitors for that purpose. Identification of the subjects filmed is carried out by the Police, generally at a later time, only when crimes have been committed, and in any case never by interfacing the computer that stores the images with an external database. Certain "Orwellian" scenarios are pure fantasy. A legislative intervention would be desirable to regulate the whole matter organically, both with regard to the due respect for privacy and with regard to the minimum security requirements that at-risk premises such as banks must meet, laying down unambiguous rules in line with technological progress and the concrete needs of urban security. Perhaps one could also intervene, at least partially, through municipal building regulations imposing certain characteristics on premises where at-risk activities take place. It has already been said that video surveillance, to perform its function at its best, must be implemented with technologically adequate equipment and rely on specially trained Police operators. But that is not enough: it is also necessary that images of criminal acts be shared among several investigative bodies, and that the subjects depicted in them be identified scientifically as the perpetrators of crimes. The creation of a dedicated database shared by all the Police Forces, at least at regional level, would be desirable.
Indeed, the possibility of linking, through images, crimes committed in different places makes it possible to use all the elements that emerge (cars and weapons used, modus operandi, etc.), so that the identification of those responsible, even if not possible immediately, can be achieved later on the basis of the whole body of data collected and stored.

Comparing subjects filmed by the cameras with persons suspected of a given crime

A problem of considerable importance is the "courtroom robustness" of the identifications, made "empirically" by investigators, of subjects filmed by the cameras, which must be confirmed by scientifically grounded judgements. On the basis of image analysis and of other data emerging from the investigation, one or more persons may be singled out as suspected of a given crime. To express a true judgement of identity or non-identity between the persons depicted in the images and the suspects, a complex technical examination is needed, aimed at solving a problem of relative identity, i.e. whether or not two comparable terms originate from the same subject. The method used for this examination provides, first of all, for the careful and minute observation of each of the terms to be compared, taken singly, in search of sufficiently individualizing characteristics. Secondly, those elements are directly compared, in order to highlight significant correspondences and dissimilarities between them. The final judgement on the identity of the compared terms will range from possible, probable and highly probable to certain, depending on the number and identifying value of the correspondences found in each of them. The judgement will be one of non-identity if even a single significant dissimilarity is found. An emblematic example is the comparison between a partial fingerprint found at the crime scene and the print of one of the suspect's fingers, which can be aspects of the same reality while not being identical in an absolute sense. In the type of examination we are discussing, the terms compared are photographic images of the subjects filmed by the cameras, on one side, and of the suspects, on the other.

One proceeds, first of all, to describe the somatic and physiognomic characteristics of the robbers and of the suspects, using classification models developed by various scholars and employed by the operators of the Italian Forensic Police during identification procedures. The data on the robbers will be drawn from the frames that reproduce their appearance. As for the suspects, one will not stop at their existing mugshots: where possible, they will undergo "targeted photographic identification", so as to acquire photographs homogeneous with the frames of the robbers and therefore suited to highlighting the same morphological elements. Careful direct observation of the suspects will also reveal somatic and physiognomic details, as well as functional characteristics (such as gait), that can be compared with the elements emerging from the analysis of the robbery footage. If the data acquired from the robbers and from the suspects correspond, a direct comparison of the morphological details will be needed, through enlargements and image processing. To this end, if possible, the suspects should be taken to the premises where the robbery was committed and filmed by the equipment in use at the robbed institution, in positions coinciding with those of the robbers and at the very spots they occupied. For this operation a video mixer connected to two video recorders will be used, cross-fading on the monitor the images of the robbers, played from one of the two recorders, with the live images of the suspects.

By means of fine adjustments of the suspects' positions, their image can be made to coincide with that of the robbers, so that all the somatic and physiognomic elements can be compared more closely. Finally, the most significant images of the robbers will be compared with those of the suspects acquired in the previous phases, or otherwise available, with the help of dedicated software: using the photographic material, a direct comparison of the morphological details appreciable in both will be carried out through enlargements and image processing. Image comparison techniques, and the criteria for judging the results of the comparisons, vary considerably according to the identification system adopted among those developed by scholars of the field. In any case, the answer to the question of whether or not the persons depicted in the frames are the suspects cannot do without strongly characterizing physiognomic elements, such as salient features and distinguishing marks (scars, moles, mutilations, etc.), but must also rest on mathematical data, obtained via software, such as stature and the "facial indices" obtained by computing the ratios of the distances between certain landmark points traced on both faces. The acquisition of these data is of course heavily conditioned by the availability of sharp images, which depends on the quality of the cameras and of the recording system. Joint Forensic Police-University research is needed to improve the systems for the scientific recognition of filmed subjects. For example, using mugshot records, it could be useful to compute statistically, on a significant sample, how often certain physiognomic characteristics recur, such as the morphologies of the face, nose, chin, eyebrows, ear, etc., classified according to the typologies recognized in the literature. On the basis of this and of similar research, one could arrive, with the help of forensic police and forensic medicine experts, at a protocol of minimum requirements for expressing a judgement of identity that is scientifically grounded and accepted by the Judicial Authority.
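The "facial indices" described above (ratios of distances between landmark points, which make the comparison independent of image scale) and the stated judgment rule (a single significant dissimilarity forces non-identity; otherwise the verdict strengthens with the number of correspondences) can be combined in a small sketch. The landmark names, the tolerance, the veto threshold and the verdict cut-offs below are illustrative assumptions, not an established forensic protocol.

```python
# Sketch: compare two faces via ratio-based "facial indices" computed from
# landmark points, then apply the judgment rule stated in the text. The
# landmark set, tolerance, veto threshold and verdict scale are assumptions.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def facial_indices(lm):
    """Ratios of inter-landmark distances; scale-invariant by construction."""
    base = dist(lm["eye_l"], lm["eye_r"])          # normalizing distance
    return {
        "nose_len": dist(lm["nasion"], lm["nose_tip"]) / base,
        "face_len": dist(lm["nasion"], lm["chin"]) / base,
        "mouth_w":  dist(lm["mouth_l"], lm["mouth_r"]) / base,
    }

def identity_judgment(lm_frame, lm_suspect, tol=0.05, veto=0.20):
    """Verdict on the compared terms, per the rule stated in the text."""
    fa, fb = facial_indices(lm_frame), facial_indices(lm_suspect)
    deltas = [abs(fa[k] - fb[k]) for k in fa]
    if any(d > veto for d in deltas):       # one significant dissimilarity
        return "non-identity"
    matches = sum(d <= tol for d in deltas)
    return {0: "inconclusive", 1: "possible",
            2: "probable"}.get(matches, "highly probable")

# Demo: the same face at two scales yields identical indices, while a face
# with a much longer chin triggers the single-dissimilarity veto.
face = {"eye_l": (0, 0), "eye_r": (60, 0), "nasion": (30, 5),
        "nose_tip": (30, 40), "chin": (30, 90),
        "mouth_l": (12, 60), "mouth_r": (48, 60)}
scaled = {k: (2 * x, 2 * y) for k, (x, y) in face.items()}
other = dict(face, chin=(30, 150))
print(identity_judgment(face, scaled))   # -> highly probable
print(identity_judgment(face, other))    # -> non-identity
```

The ratio construction is what lets a low-resolution camera frame be compared with a mugshot taken at a different distance; a protocol of the kind the text calls for would fix the landmark set, the tolerances and the verdict scale on statistical grounds rather than by assumption.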
This workshop, by bringing together different experiences and professional skills, can itself offer a small contribution to promoting Police-University collaboration in research, as well as to the study of some legal and technical issues, in the specific fields of video surveillance and image analysis, that we consider particularly interesting.

Elio Graziano
Modena, 22 maggio 2008, Facoltà di Ingegneria

Prima Giornata di Studio sui Progetti di Ricerca in Video Sorveglianza in Italia 2008 / First Workshop on Video Surveillance projects in Italy 2008

Video Biometric Surveillance and Forensic Image Analysis
Tecnologie Biometriche per la Videosorveglianza ed Analisi
delle Immagini per Applicazioni Forensi
Gian Luca Marcialis1, Fabio Roli1, Pietro Andronico2, Paolo Multineddu2,
Pietro Coli3, Giovanni Delogu3
(1) Dipartimento di Ingegneria Elettrica ed Elettronica, Università di Cagliari; (2) Vitrociset SpA; (3) Reparto Carabinieri Investigazioni Scientifiche di Cagliari
Sommario
In questo articolo vengono descritti due progetti di ricerca condotti dal personale del Dipartimento di Ingegneria Elettrica ed Elettronica (DIEE) dell'Università di Cagliari, in collaborazione con l'azienda Vitrociset SpA e il Reparto Carabinieri Investigazioni Scientifiche di Cagliari, riguardanti le applicazioni della biometria per la video-sorveglianza ambientale e le scienze forensi. Nel primo, denominato "Procedura per il Riconoscimento di Oggetti ed elaborazione di immagini multi input" (PRiO), il DIEE è stato coinvolto come struttura consulente dell'azienda Vitrociset S.p.A. Il secondo, intitolato "Stato dell'arte sui metodi e gli algoritmi per l'analisi computerizzata di immagini di impronte digitali e per l'identificazione di impronte falsificate", viene svolto nell'ambito di una convenzione che il DIEE ha stipulato con il Raggruppamento Carabinieri Investigazioni Scientifiche (Ra.C.I.S.) dell'Arma dei Carabinieri. Il progetto PRiO è in particolare focalizzato sulla creazione di un sistema intelligente di sensori per il controllo di vaste aree riservate. I sensori, interagendo tra loro, sono in grado di segnalare la presenza di un oggetto in movimento nella scena (autoveicolo o persona), tracciare l'identità di un soggetto passato ad un varco gestito da un sistema biometrico di prossimità, e segnalare ad un operatore eventi sospetti. Il sistema è anche prototipalmente in grado di pianificare opportune contro-reazioni sulla base di una serie di "situazioni modello" gestite da un simulatore tattico. Il progetto Ra.C.I.S. ha coinvolto il DIEE nello sviluppo di una serie di moduli software per l'analisi di immagini di impronte digitali latenti utili al dattiloscopista, insieme con un sistema prototipale in grado di generare alcune misure che aiutino l'operatore a capire se l'impronta latente sia stata lasciata da un dito reale oppure da una replica artificiale.

Abstract
In this paper, we describe two research projects involving the Department of Electrical and Electronic Engineering (DIEE) of the University of Cagliari. They concern the application of biometrics to environmental video-surveillance and to the forensic sciences. In the first one, entitled "Procedure for Object Recognition and Processing of Multi-Input Images" (PRiO), the DIEE staff were involved as consultants of the company Vitrociset S.p.A. The second one, entitled "State of the art on methods and algorithms for automatic analysis of fingerprint images and for fake fingerprints identification", is carried out in the context of an agreement between the DIEE and the Scientific Investigation Office (Ra.C.I.S.) of the "Arma dei Carabinieri" (the militia maintained by the Italian government for police duties). The PRiO project focuses on the development of an intelligent system of sensors for the control of wide restricted areas. The sensors interact with each other in order to detect the presence of moving objects in the scene (cars or humans), to track the identity of a subject after he has passed through a proximity biometric verification system, and to signal unusual situations to a human operator. The system is also able to plan counter-measures on the basis of models managed by tactical simulation software. The role of the DIEE consists in the development of some fundamental modules of the project: a proximity biometric verification system based on faces and fingerprints, a module for the extraction of ancillary information from a tracked subject (height, gait speed), and a module to discriminate between cars and humans in a scene. The Ra.C.I.S. project consists in the development of a set of software modules aimed at processing and comparing latent fingerprint images, together with a prototype module which helps the human expert to discriminate latent fingerprints released by a live finger from those released by a fake finger.

1. Introduction
In this paper, we describe the main research projects involving the Department of Electrical and Electronic Engineering of the University of Cagliari in the context of video-surveillance and forensic applications. The described projects are: 1) "Procedure for Object Recognition and Processing of Multi-Input Images" (PRiO), in cooperation with Vitrociset S.p.A.; 2) "State of the art on methods and algorithms for automatic analysis of fingerprint images and for fake fingerprints identification", in cooperation with the Raggruppamento Carabinieri Investigazioni Scientifiche (Ra.C.I.S.). The following two Sections briefly describe these projects.

2. The "PRiO" research project
The aim of the "PRiO" research project is to develop a prototype, proof-of-concept intelligent system for environmental video-surveillance. In this system, multiple sensors are interconnected and share several types of information, mainly extracted from video-surveillance cameras. The architecture of the proposed system is shown in Figure 1. The first component of the system is a set of cameras which monitor the scene to be controlled.
[Figure 1 omitted: block diagram linking the cameras, the proximity biometric verification system, a LAN network for data and action communication, the data collection and analysis module (object tracking, car-person classification, ancillary information), a DB, the tactical simulator, and the human operator at the check-box.]

Fig. 1. Scheme summarizing the modules of the Intelligent System for Video-Surveillance applications

Four software modules, two of them wholly developed by the DIEE staff, receive the inputs from the cameras and interact with each other:
1. An object tracker.
2. A car-person classifier.
3. An ancillary information extractor.
4. A proximity biometric verification system.
The first module has been developed by Vitrociset S.p.A. using libraries developed by Technoaware s.r.l. The input is the video captured by a camera; the output is the so-called "blob" of each detected object (more than one object may be localized in the scene). The second module analyses the scene by determining the nature of each actor, i.e. whether it is a human being or a vehicle. The module uses two integrated classification systems, named here 'A' and 'B', both aimed at this task: the 'A' classifier has been developed by Technoaware, the 'B' classifier by the DIEE staff. This second classifier follows the method proposed in [1], where two features based on geometric characteristics of the blob are extracted from the detected blob. The decision is made, on the basis of the current feature values, by a nearest neighbour classifier applied to a training set of previously captured samples. This decision can be combined with the one produced by the Technoaware software in order to improve robustness: an "AND" rule is used, which also takes into account the decisions at previous frames within a temporal window set by the human operator (the most frequent decision is selected). Depending on the detected object (a car or a person), the alarms or signals forwarded to the human operator at the check-box can obviously differ.
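As a hedged illustration of the decision scheme just described (a 1-NN classifier on two geometric blob features, an AND rule between two classifiers, and a majority vote over a temporal window), the following sketch uses invented feature values and labels, not the actual PRiO code:

```python
import math
from collections import Counter, deque

def nn_classify(features, training_set):
    """1-NN decision: label of the training sample closest in feature space.
    training_set is a list of ((f1, f2), label) pairs of blob features."""
    return min(training_set, key=lambda s: math.dist(s[0], features))[1]

def and_rule(label_a, label_b):
    """AND rule: keep a decision only when both classifiers agree."""
    return label_a if label_a == label_b else "undecided"

class TemporalVote:
    """Smooth per-frame decisions: most frequent label over the last frames."""
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, label):
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

# Toy training set: invented (aspect ratio, normalized area) blob features.
training = [((0.4, 1.8), "person"), ((2.5, 1.0), "car")]
frame_decision = and_rule(nn_classify((0.5, 1.6), training), "person")
voter = TemporalVote(window=3)
```

The temporal vote is what keeps a single misclassified frame from flipping the alarm sent to the operator.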
In particular, if the blob has been classified as a person, the third module, wholly developed by the DIEE staff, extracts some ancillary information which can be useful to the human operator for deciding the best action strategy to adopt. The ancillary information selected comprises height and gait speed. Both measurements are extracted from people captured and tracked by a given video-camera, on the basis of a well-calibrated scene, using the algorithm proposed in [2]. We have observed that, if the calibration step of [2] is done well and little non-linear distortion affects the scene, height can be extracted easily, whilst gait speed can be computed from the "speed" of the blob's centre of mass. This last piece of ancillary information is important for a human operator: the action strategy to choose when monitoring a running man can differ greatly from that for a walking man. Height and gait speed are quantized into three categories (low, medium, high), since the exact value of height or gait speed would be redundant for a human operator. The accuracy of this module strongly depends on the accuracy of the blob extractor. The fourth module is the proximity biometric verification system. Biometrics [3] are physiological or behavioural characteristics of a person, such as the fingerprint, the face, or the signature. They have been proposed, in place of PINs or passwords, for recognising the identity of a person, owing to their uniqueness and the difficulty of reproducing them. However, they also pose the problem of computing discriminative features, and they depend on the environmental conditions under which they are acquired. Among others, face recognition [4] has to deal with background, illumination and expression changes, but an acceptable performance can be reached even with simple approaches if the working environment is strongly controlled. Moreover, recognising faces with near-infrared cameras has recently achieved good results [5]. Another biometric trait widely used in personal identity verification is the fingerprint [6], which exhibits a better performance than the face; moreover, sensors for capturing fingerprint images are readily available on the market.
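The ancillary-information module described above (gait speed as the speed of the blob's centre of mass on a calibrated ground plane, quantized to low/medium/high) can be sketched as follows; the thresholds and frame rate are illustrative assumptions, not values from the paper:

```python
import math

def gait_speed(track, fps):
    """Average speed (m/s) of the blob centroid over a track.
    track: list of (x, y) ground-plane positions in metres, one per frame."""
    if len(track) < 2:
        return 0.0
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return path * fps / (len(track) - 1)

def quantize(speed, low=0.8, high=2.5):
    """Map a speed value to the three categories shown to the operator."""
    if speed < low:
        return "low"
    if speed < high:
        return "medium"
    return "high"

# A person moving 0.06 m per frame at 25 fps, i.e. 1.5 m/s.
track = [(0.0, 0.0), (0.06, 0.0), (0.12, 0.0)]
category = quantize(gait_speed(track, fps=25))
```

The binning step reflects the remark in the text: the operator needs "walking vs. running", not a number with centimetre precision.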
Face and fingerprint do not require accurate training of the subject for their acquisition, and are acceptable to the user population, since they are commonly understood as the main media for recognizing people. Therefore, the proximity biometric verification system combines both fingerprint and face biometrics for person identity authentication. It also allows the face to be acquired with a near-infrared camera, switching the verification operation between visible and infrared acquisition. For fingerprints, a cheap optical scanner has been used. Face and fingerprint verification is based on well-known algorithms [7-8]. The system is sketched in Figure 2. The role of the proximity identity verification system is the following. A person asks to enter a reserved area and must first be authenticated by the system. The information about his identity is passed to the main tracking module, and as the person enters the area he is tracked, together with his identity, by the system. At the same time, the ancillary information module extracts the height and gait-speed measurements. All signals are forwarded to the human operator, who can decide on the best action strategy. Since the overall scene can be very complex, a set of possible "action models", given to a tactical simulator together with the information extracted by the cameras and by the developed software modules, helps the human operator in his final decision. This last part has only been drafted, for future research projects.

3. The "Ra.C.I.S." project
This research project aims to develop a set of software tools for helping the human expert analyse latent fingerprint images, and is conducted in close cooperation with the Ra.C.I.S. staff. The project is currently in progress. Three prototypes will be developed (some of them are about to be completed) and integrated with each other:
1. An image processing prototype.
2. A comparator prototype.
3. A fake-fingerprint analyser for latent images.
The first prototype implements a set of standard image processing tools (contrast enhancement, sharpening, etc.) as well as a set of "advanced" tools such as dedicated fingerprint enhancement (based on Gabor filters [6]) and extraction of the orientation field [6]. Parameters can be set by the human operator. The prototype is characterized by two important features involving the combination of multiple latent fingerprint images captured under different conditions, in order to obtain a fingerprint image with clearly detailed ridge flow: (1) under illumination at different wavelengths; (2) by focusing differently on several image locations. A denoising filter, obtained by combining several latent fingerprint images enhanced by infrared lighting, completes the available features. Figure 2 shows a snapshot from this module, reporting an enhanced image obtained by the above denoising filter on three infrared latent fingerprint images.

Fig. 2. A snapshot from the image processing prototype (Ra.C.I.S. research project). The denoising filter developed by the DIEE staff has been applied to three images of the same latent fingerprint enhanced under infrared wavelengths, in order to remove spots and improve the image quality.
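The combination of several registered captures of the same latent print can be illustrated, in its simplest form, as a pixel-wise average; the actual denoising filter is presumably more elaborate, and this sketch assumes the captures are already aligned and equally sized:

```python
def combine_captures(images):
    """Pixel-wise mean of several aligned greyscale captures (lists of rows).
    Uncorrelated sensor noise and illumination spots tend to average out,
    while the ridge pattern, present in every capture, is reinforced."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]

# Three toy 2x2 captures of the same print with different noise.
caps = [
    [[100, 200], [50, 250]],
    [[110, 190], [60, 240]],
    [[ 90, 210], [40, 260]],
]
merged = combine_captures(caps)
```

Averaging three captures here recovers the underlying values (100, 200, 50, 250) exactly, because the perturbations cancel; real infrared captures would only cancel approximately.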
The second prototype allows two latent fingerprints to be compared: re-scaling and rotation operations are available, as well as the possibility to manually mark and label minutiae (terminations and bifurcations of ridge lines [6]). Images can also be rotated using the minutiae locations, and any part of them can be extracted to be studied in detail. The third prototype extracts from two fingerprint images a set of measurements which can help the human expert discriminate a latent fingerprint left at the scene by a living finger from one left by a fake finger. By "fake finger" we mean an artificial stamp, made of silicone or gelatine, which reproduces the ridge flow of a given fingerprint. Several works in the literature have shown that reproducing fingerprints is possible, with and without the subject's cooperation [9]. This module extracts some useful measurements proposed in the literature, including by the DIEE staff [10], relying in particular on the ridge width, the minutiae intra-distances and the power spectrum of the fingerprint. This prototype is currently under development.

References
[1] A.J. Lipton, H. Fujiyoshi, R.S. Patil, "Moving Target Classification and Tracking from Real-time Video", Fourth IEEE Workshop on Applications of Computer Vision (WACV '98), p. 8, 1998.
[2] A. Criminisi, I. Reid, A. Zisserman, "Single View Metrology", Proc. of the Seventh IEEE Int. Conference on Computer Vision, Vol. 1, pp. 434-441, 1999.
[3] A.K. Jain, R. Bolle, S. Pankanti (Eds.), "BIOMETRICS – Personal Identification in Networked Society", Kluwer Academic Publishers, Boston/Dordrecht/London, 1999.
[4] S.Z. Li, A.K. Jain (Eds.), "Handbook of Face Recognition", Springer, 2005.
[5] S.G. Kong, J. Heo, B.R. Abidi, J. Paik, M.A. Abidi, "Recent advances in visual and infrared face recognition: a review", Computer Vision and Image Understanding, 97 (2005) 103-135.
[6] D. Maltoni, D. Maio, A.K. Jain, S. Prabhakar, "Handbook of Fingerprint Recognition", Springer Verlag, 2003.
[7] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Trans. on PAMI, 19 (7) 711-720, 1997.
[8] A.K. Jain, L. Hong, R. Bolle, "On-line Fingerprint Verification", IEEE Transactions on Pattern Analysis and Machine Intelligence, 19 (4) 302-314, 1997.
[9] T. Matsumoto, H. Matsumoto, K. Yamada, H. Hoshino, "Impact of artificial 'gummy' fingers on fingerprint systems", Proceedings of SPIE, Vol. 4677, 2002.
[10] P. Coli, G.L. Marcialis, F. Roli, "Vitality detection from fingerprint images: a critical survey", IEEE/IAPR 2nd International Conference on Biometrics (ICB 2007), August 27-29, 2007, Seoul (Korea), S.-W. Lee and S.Z. Li (Eds.), Springer LNCS 4642, pp. 722-731.

Vision-based technologies for surveillance and monitoring
Tecnologie basate sulla visione per la sorveglianza e il
monitoraggio
Stefano Messelodi
Fondazione Bruno Kessler – IRST
POVO - Via Sommarive, 18 – TRENTO - Italy
Sommario
Il presente contributo illustra alcune tecnologie di monitoraggio e di video-sorveglianza realizzate presso l'unità di ricerca Tecnologie della Visione (TeV) della Fondazione Bruno Kessler di Trento (ex ITC). Dopo una breve descrizione dell'attività di ricerca svolta in TeV, vengono illustrate due tecnologie per il controllo del traffico (Scoca) e per l'analisi di scene popolate da persone (SmarTrack). Tali tecnologie sono il risultato della ricerca condotta all'interno di svariati progetti, che hanno portato anche alla realizzazione di brevetti, sistemi dimostrativi e prototipi. Entrambe le tecnologie hanno riscontrato interesse di tipo industriale e sono in corso azioni mirate al loro trasferimento e sfruttamento. Abstract
This contribution presents the surveillance and monitoring technologies developed at the Technologies of Vision (TeV) research unit of the Fondazione Bruno Kessler in Trento (formerly ITC). A brief description of the research conducted at TeV is followed by the introduction of two technologies, devoted to traffic scene analysis (Scoca) and to people tracking (SmarTrack). They are the result of research carried out in the context of various projects, which has also produced patents, demonstration systems and prototypes. Several companies have expressed interest in both technologies, which are currently undergoing a transfer and exploitation process. Introduction
The Technologies of Vision (TeV) research unit at FBK-irst is facing two of the most challenging open problems in the signal processing and visual perception fields today, namely the understanding of complex dynamic scenes and semantic image labelling. We start from the following considerations:
- Technology maturation permits us to consider ever more unconstrained scenarios (e.g. multi-room, indoor/outdoor) where perception and multimedia interpretation (visual/acoustic) can be applied.
- The integration of visual information and knowledge coming from other sources in complex environments requires the design of novel approaches supporting perceptive adaptation to environmental conditions and to events taking place in the observed scene.
On this basis, we aim to investigate a novel statistical framework for the generation of a numerical representation of the real world and its dynamics, through a learning process that starts from sensory data and takes inspiration from nature. In our aims, the results of TeV research generate novel technologies and prototypes that show their applicability to real-world problems. In the context of the dynamic scene understanding research area, we have developed two main technologies devoted to surveillance and monitoring tasks: traffic scene analysis (Scoca) and people tracking (SmarTrack). In the following we provide a brief description of these technologies and of the past and current projects in which they have been realized and applied to different applicative scenarios.

Fig. 1: Scoca: track and classify vehicles

Scoca – a system for tracking and
classifying vehicles
http://tev.fbk.eu/TeV/Technologies/SCOCA.html
SCOCA is a real-time vision system that computes traffic parameters by analysing monocular image sequences coming from pole-mounted video-cameras at urban crossroads. The system uses a combination of segmentation and motion information to localize and track moving objects on the road plane, relying on robust background updating and a feature-based tracking method. It is able to describe the path of each detected vehicle, to estimate its speed, and to classify it into seven categories. The classification task relies on a model-based matching technique, refined by a feature-based one for distinguishing between classes with similar models, such as bicycles and motorcycles. The system is flexible with respect to the intersection geometry and the camera position. Experimental results demonstrate robust, real-time vehicle detection, tracking and classification over several hours of video taken under different illumination conditions. Scoca was originally developed as an experimental system for the automatic extraction of traffic parameters such as volume, density, occupancy and traffic composition, and for the computation of the origin-destination map at intersection level. It was then extended, in the context of the national project Progetto Pilota funded by the Italian Transport Ministry, to the extraction of traffic data supporting the measurement of traffic accident risk. The method is based on microscopic traffic data collected automatically by Scoca, along with new traffic parameters estimated from them, such as the distance between vehicles, illegal trajectories, etc. The benefit of the method is twofold: the risk level is computed without statistics on past accidents, and its computation is fully automated, i.e. it does not require manual collection of traffic data. The proposed risk index has been evaluated at an urban intersection, before and after the reorganization of its geometry, and appeared to reflect the expectations of traffic experts in evaluating the impact of interventions to improve the safety level of the intersection.

SmarTrack – a SmarT people Tracker
http://tev.fbk.eu/smartrack/
SmarTrack is a video tracking technology that provides accurate real-time information about the spatial location of people, using a number of time-stamped image streams. For each target entering the monitored scene, it acquires an appearance model comprising shape and colour information, and tracks the target across the monitored scene using probabilistic representations and, if available, context information such as a map indicating accessible areas or information about visual obstacles. An outstanding feature of SmarTrack is that it can handle large, persistent occlusions among several targets and known obstacles at an affordable computational cost. The probabilistic nature of the tracker allows for consistent handling of the ambiguity and uncertainty often inherent in the data in the form of clutter and occlusions, and the resolution of the probabilistic estimates is dynamically adapted by the system to its current performance. The use of descriptive models allows the target's identity to be preserved over time, a feature that is essential in many applications. SmarTrack also has the capability to handle, in a robust way, targets with similar appearance, such as several soccer players of the same team, by avoiding track coalescence. SmarTrack is thus a novel, enabling technology for a variety of applications in currently uncovered scenarios.

Fig. 2: the interface of the SmarTrack prototype

The technology has been developed in the context of two projects: Peach (funded by the Provincia Autonoma di Trento) and CHIL (funded by the European Community). Peach aimed at studying and experimenting with advanced technologies that can enhance cultural heritage appreciation by creating an interactive and personalized guide for museum visitors. The CHIL objective was to create environments in which computers serve humans who focus on interacting with other humans, as opposed to having to attend to the machines themselves.
Computer services have been designed that model humans and the state of their activities. Typical CHIL scenarios are the office and the lecture room, where people interact face to face, exchange information, collaborate to jointly solve problems, learn, or socialize, by whatever means they choose. Currently, SmarTrack technology is being used and extended within the projects described below. Netcarity (EU Sixth Framework Programme) aims at an integrated and multidisciplinary approach to the design and development of technologies for supporting the elderly living alone at home. The project will develop a light infrastructure to augment homes with sensors (audio, visual and other) capable of tracing an inhabitant's behaviour, in order to detect critical safety situations and to provide, proactively and reactively, the appropriate service at the right moment. TeV's specific objective is to design and develop technologies for adaptive visual monitoring of the environment and to classify its inhabitants' behaviour in terms of movements, postures, and activities.
My-e-Director 2012 (EU Seventh Framework Programme). The main goal is to research and develop an interactive broadcasting service enabling end-users to select focal actors and points of interest within real-time broadcast scenes. The service will resemble an automated ambient-intelligent director operating with minimal or even no human intervention. One of the key innovative features is the design and development of novel technologies for identifying and tracking targets in the scene of interest. These include an optimal blend of technologies for human actor detection, localization and tracking, as well as technologies for information fusion, automated context acquisition and scene analysis. TeV is addressing the localization and tracking of people in unconstrained indoor/outdoor environments, with uncalibrated, fixed and mobile cameras. FreeSurf (PRIN, in collaboration with the Universities of Modena, Firenze and Palermo). The project aims at developing new technologies for next-generation surveillance systems devoted to the real-time control of environments where events and actions involving people take place. The purpose is twofold: to do research in the computer vision and pattern recognition fields, and to apply the results to the development of effective and socially acceptable surveillance systems. TeV contributes to the project by developing a people tracker based on an architecture of active and distributed cameras, operating in indoor and outdoor environments, possibly exploiting the integration of the different approaches adopted by the project partners. Acube (funded by the Provincia Autonoma di Trento). The project aims at developing an advanced integrated infrastructure for intelligent monitoring in nursing homes and validating it in real-world scenarios.
The infrastructure encompasses sophisticated features targeted at supporting medical and assistance staff, in order to have a major impact both on the quality of life of the assisted and on the working conditions of caregivers. To this end, the project is facing a number of technological and research challenges: algorithms for intelligent sensing, based on the integration of visual, acoustic and environmental sensing; a middleware to support the integration of data streams from sensors; an intelligent architecture featuring configurability, learning abilities and robustness; and model-based reasoning techniques to represent the domain status, monitor the execution of activities in the environment, and deliberate the most appropriate reactions.
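The probabilistic tracking idea behind SmarTrack (Bayesian multi-target tracking with particle-based estimates and adaptive sample size) can be conveyed by a minimal bootstrap particle filter. This toy one-dimensional version is not the authors' algorithm; the random-walk motion model, Gaussian sensor model, and all numbers are invented for illustration:

```python
import math
import random
import statistics

def particle_filter_step(particles, measurement, motion_std=0.5, meas_std=1.0):
    """One predict/weight/resample cycle of a bootstrap particle filter for a
    1-D position, a toy stand-in for tracking a person on the ground plane."""
    # Predict: diffuse each particle with a random-walk motion model.
    moved = [p + random.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in moved]
    total = sum(weights)
    # Resample particles in proportion to their weights.
    return random.choices(moved, weights=[w / total for w in weights], k=len(moved))

random.seed(42)
particles = [random.uniform(-10.0, 10.0) for _ in range(1000)]
for z in (2.0, 2.1, 2.2):                  # noisy observations of a target
    particles = particle_filter_step(particles, z)
estimate = statistics.fmean(particles)     # posterior mean of the position
```

After a few observations the particle cloud concentrates around the target; representing the posterior by samples is what lets such trackers cope with the clutter, occlusions, and multimodal ambiguity mentioned above.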
Improved Videosurveillance platforms for critical events
recognition
Piattaforme evolute di videosorveglianza per il riconoscimento
di eventi critici
C. Distante+, M. Leo∗, A. Leone+, G. Diraco+, T. D’Orazio*, P. Spagnolo*
+ IMM-CNR, Istituto per la Microelettronica e Microsistemi, Lecce, Italy
* ISSIA-CNR, Istituto di Studi sui Sistemi Intelligenti per l'Automazione, Bari, Italy

Sommario
For some years now there has been lively interest in the study of intelligent systems for observing and understanding human behaviour and for modelling human appearance. Moreover, assistance and protection in indoor/outdoor contexts are becoming an important research topic both in video surveillance and in the development of advanced sensing technologies. In this paper we propose intelligent architectures operating in a video surveillance context. They employ two computer vision approaches with the aim of recognizing human behavioural events (critical and non-critical). The first video surveillance platform uses standard CCD technology to recognize unusual behaviours, illegal access to forbidden areas, abandoned objects and human gestures. The second platform uses state-of-the-art active 3D vision technology, which reconstructs the depth of the scene from the time of flight of the emitted radiation, and is used to recognize falls of people in a home-automation context.

Abstract

Over the past few years, research has been increasingly focusing on building systems for observing humans and understanding their appearance and activities. Furthermore, assistance and protection in indoor/outdoor contexts are becoming an important topic in video surveillance and, more generally, in the sensing field. In particular, in this paper two sensing technologies are presented in the video surveillance context, with the objective of recognizing interesting (critical or non-critical) human events. The first sensing technology makes use of standard CCD cameras in order to recognize unusual behaviours, illegal access to forbidden areas, abandoned objects and human gestures. The second sensing technology relies on the emerging 3D time-of-flight cameras, which give access to depth information, and is used in indoor contexts for human fall detection.

Introduction

Intelligent video surveillance systems are receiving a great deal of interest, especially in the fields of personal security and assistance. These systems are built to accomplish several tasks, ranging from the detection of human presence to the recognition of human activities. Depending on the scenario, the system must be specialized accordingly, reducing the number of false alarms often due to intrinsic sensor noise. The integrated functionalities of the systems are: moving object segmentation, object description for the subsequent recognition task, and object tracking. Detecting and describing moving objects is often a difficult task, both with standard CCD technology and with emerging 3D active cameras, for different reasons which will be described below. The detection of moving objects is not trivial, since several problems can arise, such as camera noise, reflections, occlusions and shadows [2,6]. The last of these is due to the occlusion of a light source by an object in the scene; it generally corrupts the segmentation stage, and hence the final decision of the system, especially when passive cameras are used. After segmentation, features are extracted and selected in order to describe the object of interest, which must then be recognized by a suitable classifier.

This paper is organized as follows. Section 1 describes an architecture for the recognition of unusual behaviours, illegal access to forbidden areas, removed or abandoned objects and human gestures. In Section 2 a platform for human fall detection using a 3D active camera is proposed.

1. 2D Imaging Architecture
We propose a new approach that uses the radiometric similarity between corresponding regions of successive frames to evaluate effective moving points, and also between the current image and the background reference image to segment foreground objects, making our approach more robust against noise and avoiding wrong detections of small, continuous movements (such as bushes or curtains blowing in the wind). Sudden light variations are handled by introducing a further motion evaluation step that prevents the system from producing erroneous segmentations in frames corresponding to a not-yet-updated background model [1]. Finally, we address the problem of shadow removal.

The proposed shadow removal approach starts from the assumption that a shadow is an abnormal illumination of a part of an image, due to the interposition of an opaque object with respect to a bright point-like illumination source. From this assumption, we can note that shadows do not have a fixed texture, as real objects do: they are half-transparent regions which retain the pattern of the underlying background surface. Therefore, our aim is to examine the parts of the image that have been detected as moving regions by the aforesaid segmentation step but whose texture is substantially unchanged with respect to the corresponding background. In the proposed approach, for all segmented moving regions characterized by a constant photometric gain, the correlation between pixels is calculated and compared with the same value calculated in the background image: regions whose correlation has not substantially changed are marked as shadow regions and removed [2] (see Fig. 1).

Fig. 1 Moving object segmentation and shadow removal outcome (on the right) on a sequence with varying light conditions due to the presence of large windows (on the left).

1.1 Multiview Object Tracking

In recent decades, the scientific community has been increasingly involved in developing visual target tracking systems, due to their numerous potential applications in important tasks such as video surveillance, human activity analysis, traffic monitoring and event detection. There are numerous situations involving people or objects moving and interacting in a particular domain, where the tracks of the targets over time provide a rich source of information for behaviour analysis. However, automatic visual tracking in situations with frequent interactions remains a challenging problem, even under favourable viewing conditions. In addition to all of the challenges inherent in single-target tracking, multitarget tracking must deal with multitarget occlusion, disambiguate targets and assign correct labels.

We present a method to integrate object trajectories from images taken by different static cameras located at different viewpoints. Object positions are detected in the individual cameras and the trajectories they form are transformed onto a virtual top view. To this end we use a perspective transformation, the homography, between the camera image and the top view (see Fig. 2). Finally, we use a graph matching method to deal with the label association of each player [3].

Fig. 2 From top to bottom: the image acquired at time t and the corresponding top view generated by homographic projection, where it is possible to see the trajectory of the person in the scene.

1.2 High Level Processing

The information extracted by the aforesaid segmentation and tracking steps is the input of different processing modules performing high level processing. The algorithms running in each module generally depend strongly on the application context.
In recent years we have developed:
o a procedure for recognizing unusual behaviour and illegal access to forbidden areas [4];
o a procedure to recognize removed or abandoned objects in the scene [4];
o a procedure to recognize human gestures [5].

1.2.1 Human Behavior Recognition

By analyzing the 2D trajectories in the virtual top view, common/expected human behaviours are modelled using statistical approaches. Unexpected human trajectories are then detected in real time by comparison with the expected ones, allowing suspicious behaviours to be pointed out. Moreover, object positions in the virtual plane are continuously compared with a list of forbidden positions, so that illegal accesses are immediately signalled and dangerous actions can be prevented. Working in the virtual plane also avoids false alarms generated by perspective projection (see Fig. 2).

1.2.2 Recognizing removed or abandoned objects in the scene

The implemented approach starts from the segmented image at each frame. If a blob is considered static for a certain period of time, it is passed to the module for removed/abandoned discrimination. By analyzing the edges, the system is able to classify the type of static region as an abandoned object (a static object left by a person) or a removed object (a scene object that has been moved). First, an edge operator is applied to the segmented binary image to find the edges of the detected blob. The same operator is applied to the current grey level image. To detect abandoned or removed objects, a matching procedure between the edge points of the resulting edge images is introduced (Fig. 3).

1.2.3 Recognizing human gestures

A colour-based hand detection procedure, particularly suitable for context-unaware environments (especially video surveillance), has been developed. The idea is to build a skin colour model for each person that enters the camera view, and to update this model during the person's movements in the scene.
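As a toy illustration of such a per-person skin colour model (our own simplified sketch: an independent Gaussian over normalized (r, g) chromaticity, not the clustering-plus-learning scheme of [5]; all pixel values below are made up):

```python
import math

def fit_skin_model(pixels):
    """Fit a per-person Gaussian over normalized (r, g) chromaticity
    from face-region pixels given as (R, G, B) tuples."""
    rg = []
    for R, G, B in pixels:
        s = (R + G + B) or 1          # guard against a black pixel
        rg.append((R / s, G / s))
    n = len(rg)
    mu = [sum(p[d] for p in rg) / n for d in (0, 1)]
    var = [max(sum((p[d] - mu[d]) ** 2 for p in rg) / n, 1e-6)
           for d in (0, 1)]
    return mu, var

def is_skin(pixel, model, n_sigma=2.5):
    """Accept a pixel if both chromaticity coordinates fall within
    n_sigma standard deviations of the person's model."""
    R, G, B = pixel
    s = (R + G + B) or 1
    mu, var = model
    r, g = R / s, G / s
    return (abs(r - mu[0]) <= n_sigma * math.sqrt(var[0])
            and abs(g - mu[1]) <= n_sigma * math.sqrt(var[1]))

# Fit on a few reddish "face" pixels detected in a hypothetical frame.
model = fit_skin_model([(180, 120, 100), (170, 110, 95), (190, 125, 105)])
```

In a real system the model would be refitted as new face pixels arrive, which is the "update during movement" described above.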
The procedure is based on the combination of a preliminary unsupervised clustering technique, which identifies the face region, with a learning method for skin colour modelling. The hand positions are then tracked over time in order to recognize human gestures. Different classification schemes can be used to adapt the procedure to different contexts: in our experiments we used a finite state machine modelled on image plane areas in order to recognize gestures such as opening/closing a cabinet (for example for monitoring the activity of elderly people in their house) and putting hands up (in case of a robbery) (Fig. 4).

Fig. 3 A person removes an object from the scene. The system recognizes the illegal activity, providing additional information about the object geometry.

Fig. 4 Recognizing a human gesture: opening a chest of drawers.

2. 3D Fall detector

The detector is based on the MESA SwissRanger 3000 time-of-flight camera. Although this camera technology has a reduced field of view, a wall-mounting setup has been chosen, and measurements of both the greyscale intensity and the depth of each pixel are available. The information coming from the camera is processed to extract the blobs (moving regions), detect moving people and track them in the scene. This step is realized by using statistical modelling, such as Mixtures of Gaussians (MoGs), on the depth information (although this information could be combined with the 2D visible stream). The detection of a person in the area is obtained through segmentation using a Bayesian approach (Fig. 5). The use of depth information provides important improvements in the segmentation process since, as opposed to the architecture presented in the previous section, it is not affected by the presence of shadows and/or mimetic appearances. However, other kinds of drawbacks have to be considered with 3D TOF cameras, such as aliasing, multi-path effects, object reflection properties and the limited field of view.

After segmentation, a tracking mechanism based on the Condensation algorithm is used to follow the person in the room and estimate their position when occlusions arise. Once the foreground (people's silhouettes) is extracted, the system analyzes the silhouettes to see if the condition of a fall accident is met: a crucial step is the extraction of exhaustive features in order to know the posture of the occupant, so that dangerous behaviours (falls) can be understood. In particular, for the proposed architecture, the distance of the centroid of the segmented blob from the ground floor is evaluated, calibrating the system by finding an appropriate transformation from the camera coordinate system to world coordinates.

Fig. 5 Segmentation results with depth information.

Fig. 6 Scene geometry for the definition of the centroid height h with respect to the ground plane.

Information regarding the height and tilt angle with respect to the ground plane is used to derive the transformation matrices for the calculation of the metric distance of the centroid C from the floor plane (Fig. 6). A-priori information is needed (distance of the 3D camera from the floor plane, orientation of the 3D camera). The proposed approach provides metric information about the spatial geometric configuration of the silhouette in the scene, allowing three macro postures to be discriminated (standing, slanting and flat [7]) according to a thresholding strategy.
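As a rough sketch of this camera-to-world step (our own illustration, not the authors' code; the mounting height and tilt are assumed parameters, and the camera frame is taken with y pointing down in the image and z along the optical axis):

```python
import math

def centroid_height(centroid_cam, cam_height, tilt_deg):
    """Metric height of a blob centroid above the floor plane.

    centroid_cam: (x, y, z) in the camera frame (metres), with z along
                  the optical axis and y pointing down in the image.
    cam_height:   wall-mounting height of the TOF camera (metres).
    tilt_deg:     downward tilt of the optical axis w.r.t. horizontal.
    """
    x, y, z = centroid_cam
    t = math.radians(tilt_deg)
    # Rotation about the camera x-axis: the vertical drop of the point
    # below the optical centre is y*cos(t) + z*sin(t).
    drop = y * math.cos(t) + z * math.sin(t)
    return cam_height - drop

# Camera mounted 2.4 m up and tilted 30 degrees down; a point 2 m away
# along the optical axis (image centre, y = 0) drops by 1 m:
h = centroid_height((0.0, 0.0, 2.0), 2.4, 30.0)   # about 1.4 m
```

The posture thresholds (standing/slanting/flat) would then be applied directly to heights computed this way.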
A fall is detected with a simple threshold on the distance of the centroid from the floor (Fig. 7). The analysis of the moving object centroid over time can be used to discriminate some interesting behaviour patterns.
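The thresholding rule can be sketched as follows (illustrative only; the 0.4 m threshold and the persistence window are assumed values, not taken from the paper):

```python
def detect_fall(heights, threshold=0.4, min_frames=5):
    """Signal a fall when the centroid height stays below `threshold`
    (metres) for at least `min_frames` consecutive frames, so that a
    single noisy depth measurement does not raise an alarm."""
    run = 0
    for h in heights:
        run = run + 1 if h < threshold else 0
        if run >= min_frames:
            return True
    return False

# Standing (~0.9 m), then a fall: the height collapses and stays low.
trace = [0.9, 0.92, 0.88, 0.3, 0.2, 0.15, 0.15, 0.18, 0.2]
```

Requiring persistence below the threshold is one simple way to separate an actual fall from a brief crouch or a segmentation glitch.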
References
[1] P.Spagnolo, T. D'Orazio, M. Leo, A. Distante “Moving
Object Segmentation by Background Subtraction and
Temporal Analysis” Image and Vision Computing 24
(2006) 411-423 ISSN: 0262-8856
[2] P. Spagnolo, T. D'Orazio, M. Leo, A. Distante
“Advances in Background Updating and Shadow
Removing for Motion Detection Algorithms” Lecture
Notes in Computer Science, vol. 3691, pp. 398-406,
2005
[3] G. Kayumbi, P.L. Mazzeo, P. Spagnolo, M. Taj, A.
Cavallaro “Distributed visual sensing for virtual top-view trajectory generation in football videos” In
Proceedings of the ACM International Conference on
Image and Video Retrieval, July 7-9, 2008 Sheraton
Fallsview Hotel, Niagara Falls, Canada
[4] M. Leo, P. Spagnolo,T. D’Orazio, P.L. Mazzeo,
A.Distante “Real Time Smart Surveillance Using
Motion Analysis” In Proceedings of the International
Conference on Computer Vision Theory and
Applications, 8-11 March 2007, Barcelona, Spain
[5] M. Leo, T. D’Orazio, A. Caroppo, P. Spagnolo, C.
Guaragnella, “Unsupervised Skin Colour Modelling for
Hand Segmentation” In Proceedings of the Fifth
IASTED Int. Conf. on Visualization, Imaging and Image
Processing (VIIP 2005), pp. 430-435
[6] Alessandro Leone, Cosimo Distante, Shadow detection
for moving objects based on texture analysis, Pattern
Recognition, Volume 40, Issue 4, pp. 1222-1233, 2007
[7] F. Buccolieri, C. Distante, A. Leone, Human Posture
Recognition Using Active Contours and Radial Basis
Function Neural Network, IEEE International
Conference on Advanced Video and Signal based
Surveillance (AVSS), Como, Sept. 2005.
Fig. 7 Typical pattern of the centroid height when a fall occurs. In the proposed fall-detector prototype a simple thresholding strategy is used to detect the fall event.

Samurai - Small Area 24-hours Monitoring
Using a netwoRk of cAmeras & sensors for sItuation
awareness enhancement
SAMURAI – Monitoraggio di ambienti mediante l’uso di reti di
telecamere e sensori di supporto proattivo all’operatore
Marco Cristani, Vittorio Murino
Dipartimento di Informatica, University of Verona
Sommario
The project addresses the objectives described in call SEC-2007-2.3-04 “Small area 24 hours surveillance” of the Security Theme within the Cooperation Work Programme, under the 7th EC RTD Framework Programme. The idea is to build a 24/7 computational surveillance platform able to monitor a small-to-medium environment, such as an airport or a railway/metro station, in order to identify abnormal or suspicious situations. Our solution represents a strong novelty among advanced video surveillance systems. The key element of our proposal is the presence of human agents as a substantial part of the surveillance structure; they communicate with an automatic surveillance infrastructure and with a human supervisor. In this way, the human ability to adapt to unforeseen situations and human intuition, which cannot be replicated synthetically, are augmented and supported by a network of reasoning modules and sensors that signal suspicious situations within the monitored environment.
This project will address the strategic objectives identified by call SEC-2007-2.3-04 “Small area 24 hours surveillance”, present in the Security Theme of the Cooperation Work Programme, under the 7th EC RTD Framework Programme. The idea is to create a computational framework able to monitor, 24 hours a day and 7 days a week, a well defined environment, such as an airport or a railway/metro station, in order to detect suspicious and/or abnormal human behaviour. Our proposed solution brings a substantially novel element with respect to existing automated surveillance system designs: our framework has as its key element the presence of human agents, who are in tight communication with an automated reasoning and monitoring structure and with a human supervisor. In this way, human intuition, which cannot be replicated synthetically, is augmented and supported by automatic reasoning and sensor modules that raise the attention level in case of suspicious situations.
1. We will develop an intelligent surveillance system for the permanent monitoring of an infrastructure (e.g., an airport, an industrial site, a shopping mall, a hospital) and its surrounding area.

2. The system will make intensive use of a distributed heterogeneous sensor network consisting of both fixed (standard, PTZ, possibly infrared) and mobile cameras, positioning sensors such as GPS units, and wearable audio/video sensors. These sensors work in a cooperative manner: fixed cameras with a wide view angle provide broad coverage of the area under surveillance, while the other sensors provide more specific details regarding the objects/events of interest.

Fig. 1a: overview of the system (infrastructure, agents, system, supervisor).

3. We will develop and integrate capabilities to enable robust detection, segmentation and alignment, categorisation and retrieval, and tracking of people (body appearance and faces), commodities, luggage and vehicles (appearance and number plate), across a distributed network of cameras, continuously over time and under different weather/lighting conditions.

4. The system will operate and cooperate with three fundamental entities: 1) the infrastructure, 2) the human supervisor(s) located in a central monitoring station, and 3) human agents located in and moving about the infrastructure. An effective and permanent multi-sensorial communication will be instantiated among these three entities and the surveillance system, as depicted by the black arrows in Fig. 1a.

5. The services made available by the surveillance system will produce a reliable information flow, as indicated by the green arrows depicted in Fig. 1b, where each labelled green arrow represents a final capability provided by the system. They are:

Fig. 1b: communications in the system.

a. The supervisor(s) can autonomously watch all the critical locations of the infrastructure, in order to discover suspicious objects, events or behaviours. The system will augment this human capability by detecting and predicting pre-codified abnormal behaviours such as abandoned luggage; unexpected or unauthorised presence of people/vehicles in restricted areas; unexpected periodic presence of the same person/vehicle in a given location, infrastructure or surrounding area; and abnormal behaviour among people. In all these cases, the system is able to provide the supervisor with the relevant video sequences, so that he can decide to: i) communicate the event to the agents, together with its location and the visual aspect of the actor(s) and the related location(s); ii) ask the system to track the objects involved in the event, by exploiting the sensors distributed inside and outside the infrastructure; iii) better inspect the scene in which the behaviour has been detected, by exploiting the sensor machinery (PTZ cameras) located around the scene. Moreover, the supervisor will be made aware of different kinds of hazards present in the area and will be alerted in case of shots or explosions.

b. The supervisor can tightly cooperate with the security agents. In particular, the supervisor knows the positions of the agents in the structure and what they see, so as to be aware of the visual scene observed by the agent(s). In this case, the supervisor can ask the agents to further inspect a particular object/person present in the observed scene, or directly exploit the sensors located in the observed area, in order to zoom in on a particular detail which is far away or unreachable by the agents. Alternatively, the supervisor can ask the agents to chase a person/vehicle through communications about its whereabouts and appearance.

6. In order to ease the monitoring and decision-making activities of the supervisor, intuitive and interactive visualization interfaces will be developed, which are useful for locating agents and suspicious civilians in the infrastructure more promptly. To this end, audio-visual wearable interfaces will also be built, aiming to locate and single out suspected civilians more quickly and effectively. If the agents are provided with wearable PDA-like devices, they can observe the focused scene directly on the small screens of the wearable device, on command of the control room, as well as a virtual map generated from different points of view when the same scene is sensed from several directions by the agents. Moreover, visualization tools will be developed for the more effective discovery of abnormal behaviours.

Motivation: why this problem is important

Many of the currently exploited defence systems against criminal or terrorist attacks typically focus on the weapons employed, from firearms to explosives, viruses and nukes. Such systems try to seek out these physical representations of danger with X-ray machines, explosive sniffers and radiation detectors. Even if these methods do work, a more effective alert of a criminal act consists in determining suspect behaviours. To this aim, multi-camera systems are becoming the workhorses of a novel proactive surveillance strategy; the increasing use of CCTV surveillance cameras in public and semi-public places across Europe is well documented, and has been studied in some detail by the Urbaneye Project [1]. Due to this massive presence of sensors, the amount of data to be analyzed by humans has increased rapidly, making an accurate and persistent survey a really difficult task to accomplish. IMS Research [2] identified that “after 22 minutes, supervisors miss up to 95% of all scene activity”. Therefore, different solutions for automatic machine support of robust behaviour extraction from video data have been pursued since the 1990s; the problem remains open, and is addressed by this project.

State of the art: what the limitations of the existing approaches are

The main goal of automatic surveillance systems is to create, in an autonomous way, an “expected” behaviour model, in order to detect unexpected (i.e., suspect) behaviours. Developing intelligent and robust surveillance systems has become the main focus of a considerable number of scientists, and there have been several outcomes. However, three main weaknesses are typically present in all existing approaches: 1) behaviour models are developed by inspecting spatially and temporally well-constrained scenes, such as single rooms or locations under daylight; therefore, the behaviour models arising from these situations are semantically poor and not sufficiently robust; 2) data acquisition is delegated to a fixed sensorial infrastructure, thus providing a limited inspection capability; 3) the typical working flow of an automatic surveillance system is to alert one or more supervisors as soon as possible about an occurring dangerous situation. None of these systems manages how to interact with a dangerous situation; for example, the human staff involved in maintaining safety has never been actively included in the operative inspection flow cited above.
project is better; what social and economic
impact it can bring to the EU
SAMURAI project will consist in a surveillance system, in which one or more supervisors will survey a small area 24/7 by means of advanced sensor machinery and an high performance reasoning module, that will inform about suspect human behaviours. In particular, SAMURAI will address the limitations reported above; more specifically, 1) SAMURAI will build human behaviour models by tracking subjects permanently across different locations, both indoor and outdoor; 2) human security agents and supervisors will be actively embedded in the SAMURAI system, making them inherit all the augmented potentialities offered by the sensor machinery located in the infrastructure, in order to effectively interact with suspicious situations, and not only to observe them; 3) high performance sensor machinery will be embedded in the infrastructure, such as PTZ , wireless, IR, and GPS cameras, in order to make easier and more robust a continuous surveillance, via data fusion. Human security agents will be equipped with wearable sensors, so that they can be considered as autonomous mobile sensors‐
actuators, communicating with the supervisor(s). Acknowledgments
Prof. Shaogang Gong of Queen Mary and Westfield College, University of London, the project coordinator, and Dr. Giovanni Garibotto of ELSAG Datamat S.p.A., Italy, the leading industrial partner, are gratefully acknowledged.

References
[1] Urbaneye Project - http://www.urbaneye.net/
[2] IMS Research, report on the Market for CCTV & Video Surveillance Equipment – Worldwide, 2005 - http://www.imsresearch.com/
Threat Assessments in Reactive Surveillance Networks
Reti di Sorveglianza reattive per il rilevamento di eventi
Prof. Gian Luca Foresti and Dr. Christian Micheloni
Dept. Computer Science, Università degli Studi di Udine
Sommario
Abstract
In recent years, modern urban areas have seen a continuous increase in sensors for territorial monitoring, from stations measuring pollution levels, to closed-circuit cameras for traffic monitoring, to security cameras inside stations, airports and banks. This growth is driven by the need to increase citizens' security and quality of life, but these objectives are generally unmet because of outdated technology. Current installations rely exclusively on human interpretation of the data, which, however, cannot cope with the enormous amount of information generated by the sensor network. The present work proposes the study and development of suitable algorithms for managing a network of heterogeneous sensors for the monitoring of public environments. A computerized approach would make it possible to optimize the procedures for integrating heterogeneous data so as to obtain an overall view of the monitored scenario; such information could come both from static sensors (e.g., surveillance cameras) and from mobile sentinels (e.g., police officers). The collected data can then be processed in order to infer high-level interpretations of the activities taking place in the observed environment (e.g., recognizing suspicious behaviours that lead to dangerous events). We also propose the automatic reconfiguration of the sensor network, so as to constantly guarantee that the network is in the optimal configuration for acquiring the data of interest. The involvement of different types of sensors also requires the study of suitable data transmission techniques that take into account the limitations imposed by heterogeneous transmission channels.
In recent years, a large number of sensors have been deployed in urban areas for remote monitoring of the environment, from sensors for pollution control to CCTV cameras for security in railway and metro stations, airports, banks, etc. The request to install large numbers of sensors stems from the need to improve the security and quality of life of citizens. However, these objectives are generally unmet because of the old technology used. The existing systems actually installed in urban environments are based exclusively on human understanding of the data; however, as the number of sensors increases, event monitoring by human operators becomes boring, tedious and error-prone. The current work proposes the study and development of new algorithms and techniques for the design of a network of heterogeneous sensors for the automatic monitoring of public environments. A computer-based approach can optimize the integration of heterogeneous data coming from both fixed sensors (e.g., video surveillance cameras) and mobile sensors (e.g., autonomous robots or police operators) in such a way as to augment the perception of the monitored environment. The collected data can then be processed in order to generate high-level interpretations of the human activities in the observed environment (e.g., detecting and understanding unusual behaviours related to the preparation of attacks). Moreover, the project proposes the automatic reconfiguration of the sensor network, in such a way that the network can assume the best possible configuration for the acquisition of the data of interest. The use of different types of sensors requires the study of appropriate data transmission techniques that take into account the specific bandwidth limitations imposed by heterogeneous transmission channels.
Introduction

Video surveillance systems proposed in the literature can be classified, from a technological perspective, as belonging to three successive generations. The three generations follow the evolution of communications, processing and storage, and they have evolved in recent years at the same increasing speed as these technologies. The main goal of the current third-generation surveillance systems (3GSS) is to provide “fully digital” solutions for the design of such systems, starting at the sensor level and going up to the presentation of mixed symbolic and visual information to the operators. In this sense, they take advantage of progress in low-cost, high-performance computing networks and of the availability of digital communications on heterogeneous, mobile and fixed broadband networks. From the point of view of augmenting human perception and monitoring capabilities, a 3GSS relieves the human operator from monitoring a collection of video monitors and, in addition, assists the operator in tasks that are rather cumbersome with traditional systems (i.e., that fall outside normal human spatial and temporal cognitive abilities). For instance, real-time person tracking in a crowded scene is a tough task for a human to perform with a single video displayed on the monitor. Another improvement of 3GSSs is that online tools can be built to assist humans with event management.

System Architecture

During the activity of the IST060-RTG026 NATO task group [1], several possible interesting scenarios have been analyzed (e.g., automatic face detection and recognition in public areas; remote monitoring of parking lots, bridges or tunnels), also in accordance with important NATO workshops [2][3]. This workgroup inherited the hierarchic architecture proposed by Micheloni et al. [6], where static and active sensors cooperate in a distributed manner. Some of its concepts, such as event understanding, communication as a consequence of semantic information, and network reconfiguration, have been deeply investigated and optimized by the task group and by the team working on the PRIN project funded by the Italian Ministry of Research [4] in 2007.

The resulting architecture considers Local Area Monitoring Networks (LAMNs), organised in a hierarchic way to provide the monitoring of small areas (see Fig. 3). Static Camera Systems (SCSs) are defined to detect and track all the objects moving within the assigned area, while Active Camera Systems (ACSs) are exploited for gazing at targets with higher resolution. The sensor placement is done by balancing two opposite needs: on the one hand, the possibility of covering the whole area while minimizing the number of sensors; on the other, the need for redundancy to counteract target occlusions by providing overlapping fields of view.

Fig. 3: System architecture. The entire sensor network is hierarchically organised in Local Area Monitoring Networks (LAMNs), each responsible for the surveillance of a restricted zone.

To achieve global awareness within such an architecture, each node is responsible for the detection and analysis of events of interest. Concerning behaviour analysis, we propose a method based on a statistical machine learning approach to automatically model regular and atypical (critical) behaviour scenarios. By employing machine learning, a-priori knowledge of scenario definitions is not required, since scenarios are learned by the system while acquiring behavioural information. In this way, the system is guaranteed to self-adapt to different contexts and situations (high flexibility). When detecting critical/irregular behaviours, the system can analyze them with more accuracy by segmenting the recognition process at the level of single entities (e.g., face recognition, OCR plate recognition, etc.). The system can automatically reconfigure the sensor network installed in the environment according to the detected events (behaviour analysis) in order to optimize data acquisition. The sensor configuration is also determined by the recognition module, which aims to focus the operator's and the system's attention on the detected critical situations and scenarios. As a consequence of these tasks, each node is required to provide information to the upper nodes. Thus, the network has a hierarchic structure, allowing raw data to be processed at the sensor level and only the meta-data needed for a global analysis of the events of interest to be transmitted to the higher nodes (Fig. 4).

In the current projects, we study innovative techniques to integrate data fed from a sensor network in order to automatically infer high-level behavioural models. These models have two targets: on the one hand, to allow automatic reconfiguration of the sensor network, and, on the other hand, to support the decisions taken by human operators. The development of an ad-hoc communication layer is needed to guarantee efficient data communication among the involved entities.

Reactive Event Analysis
For the early detection, localization and continuous tracking of individuals or groups carrying hazardous material (e.g., Improvised Explosive Devices, IEDs) within a flow of multiple people, the Hamlet project [5] has demonstrated the core functions of an indoor security assistance system for real-time decision support, using advanced sensors and multi-sensor fusion techniques. The video solution studied in the project provides a first level of computation concerning the extraction of moving objects (blobs) from video streams and the computation of relevant features for event analysis purposes. Moving objects are detected against the background (background modeling and foreground segmentation) and their movements are tracked on a 2D top-view map of the monitored scene [7]. Tracking is performed using a Kalman filter applied to the map positions, together with a mean-shift based tracking technique on the image plane. Object classification is performed by an adaptive high-order neural tree (AHNT) classifier [8], which distinguishes among the main classes (e.g., person, vehicle, luggage, etc.). At each time instant, a set of low-level features (i.e., object class, position, trajectory, mean speed, etc.) is extracted and maintained for every foreground object in the scene.

Fig. 4: Hierarchic structure of the sensor network with respect to the data/information flow.

Once all the necessary features have been extracted from the video stream, an approach based on explicit modeling of dangerous events is adopted to describe and understand the activities occurring inside the monitored environment. Two different types of events are considered: simple events, characterized by the motion (and behaviour) of a single object, and composite events, characterized by interactions among multiple objects.
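The map-plane tracking step described above can be sketched with a constant-velocity Kalman filter. The state layout, noise values and class name below are illustrative assumptions, not the Hamlet implementation (which also couples a mean-shift tracker on the image plane):

```python
import numpy as np

# Constant-velocity Kalman filter tracking an object's (x, y) position
# on the 2D top-view map. State vector: [x, y, vx, vy].
class MapKalmanTracker:
    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])   # state estimate
        self.P = np.eye(4)                      # state covariance
        self.F = np.eye(4)                      # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))               # we only measure position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                  # process noise
        self.R = r * np.eye(2)                  # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # z: (x, y) map position measured by foreground segmentation
        innovation = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Track a target moving diagonally at one map unit per frame.
trk = MapKalmanTracker(0.0, 0.0)
for t in range(1, 20):
    trk.predict()
    est = trk.update((float(t), float(t)))
```

After a few frames the filter locks onto the constant-velocity motion, so the position estimate closely follows the measurements.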
A simple event v is defined over a temporal interval [Ts, Tf] and contains a set of features F = [f1, ..., fm] belonging to a given object Oi observed over a sequence of n consecutive frames:

v(Ts, Tf) = { fk | fk ∈ Oi, k ∈ [1..m] }     (1)

Composite events are represented by a set of simple events that are spatially and/or temporally correlated. Hence, a composite event is defined over a wider temporal interval as a graph G = (V, E), where the set of vertexes V = { v1(Ts, Tf), ..., vn(Ts, Tf) } is the set of simple events and the set of edges E is the set of temporal and spatial associations between simple events. To recognize composite events, thus associating them to a predefined list of events of interest, we employ a graph matching technique [9]. Each event stored in a graph forest (the database of events of interest) is associated with an alarm level describing the degree of importance of that event in a specific context. We identified three different alarm levels, in increasing order of danger: a) normal, b) suspicious and c) critical. On the basis of the alarm level, the proposed system can decide the importance of the activity and the need to monitor a particular area. The system can thus assign different bandwidths for data transmission: in particular, it can reserve higher bandwidth for those sensors that are acquiring meaningful data and lower bandwidth for the others. With the same concept, the system can decide on a reconfiguration of the camera network in order to improve detection and recognition. The PRIN06 project focuses its studies on appropriate metric measures for analyzing the performance of both sensors and algorithms, in order to define a global quality function that allows the reconfiguration of the whole sensor network.

References
[1] NATO IST060-RTG026 on "Advanced Multisensor Surveillance Systems for Combating Terrorism" (AMSS-CT), 2004-2008.
[2] NATO Research and Technology Organization, Workshop on "Combating Terrorism (CT)", Washington, D.C., February 2002.
[3] RTO Inter-Panel Group (RTO IPG), workshop on "RTO Activities and Combating Terrorism", RTA Headquarters, Neuilly-sur-Seine (France), June 27, 2002.
[4] Italian Ministry of University and Scientific Research, within the framework of the project PRIN-06 "Ambient Intelligence: event analysis, sensor reconfiguration and multimodal interfaces" (2007-2008).
[5] EU-PARS-SEC6-SA-204400 HAMLeT project (2006-2008), Hazardous Material Localization & Tracking.
[6] C. Micheloni, G. Foresti, L. Snidaro, "A network of cooperative cameras for visual surveillance", IEE Visual, Image & Signal Processing, 152(2), pp. 205-212, 2005.
[7] G. L. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, and T. Ellis, "Active video-based surveillance systems", IEEE Signal Processing Magazine, vol. 22, no. 2, pp. 25-37, March 2005.
[8] G. Foresti and T. Dolso, "Adaptive high-order neural trees for pattern recognition", IEEE Transactions on Systems, Man and Cybernetics Part B, vol. 34, no. 2, pp. 988-996, Apr. 2004.
[9] D. Conte, P. Foggia, C. Sansone, and M. Vento, "Thirty years of graph matching in pattern recognition", International Journal of Pattern Recognition and Artificial Intelligence, vol. 18, no. 3, pp. 265-298, 2004.
Fig. 5: Example of cooperative IED-video network reconfiguration. In (a), three different people are detected near a signalling IED sensor. In (b), a person C, already tagged at a previous instant, is near another signalling IED sensor; its danger level is raised. In (c), person C is once more detected near a signalling IED sensor. At this point the network is reconfigured to focus on that person, assigning one camera to follow his movements and another to detect and recognize his face.

Fig. 6: Example of video event understanding. (a) A typical normal event recognised by the system and presented to the human operator with a green light. (b) A suspicious event has been recognised and signalled to the operator with a yellow light; the operator can look in the event log at the bottom of the interface to identify the suspicious entity and further investigate its behaviour with respect to the other sensors. (c) A critical event has been recognised and signalled to the operator with a red light; the operator can extract the identity of the entity from the event logs and task other personnel to investigate in depth.

Multiple cameras surveillance systems for multiple people
tracking and action analysis: the ImageLab solution
Sistemi multi-camera di videosorveglianza per il tracking di
più persone e il riconoscimento delle azioni:
la soluzione di ImageLab
Rita Cucchiara, Costantino Grana, Andrea Prati, Roberto Vezzani
Imagelab, Università di Modena e Reggio Emilia
Sommario
This article presents the research activity in pattern recognition and computer vision for video surveillance carried out at ImageLab of the Dipartimento di Ingegneria dell'Informazione in Modena. The research, funded by national and international projects from both public bodies and private companies, has led to several scientific results, with notable publications and working prototypes. This work briefly presents the architecture of the ImageLab solution and describes some of the latest results in multi-camera people tracking and in action and behavior recognition. Abstract
This paper describes the research activity in computer vision and pattern recognition for video surveillance carried out at ImageLab of the Dipartimento di Ingegneria dell'Informazione in Modena. The research, funded by many national and international projects, has produced several scientific results, outstanding publications and working prototypes. The architecture of the whole ImageLab solution will be presented, together with results on multiple people tracking from distributed cameras and on action and behavior recognition.

1. Introduction

ImageLab in Modena marks ten years of research activity in computer vision and pattern recognition for video surveillance. Its research focus ranges from vehicle surveillance in intelligent transportation systems [1] to object and people surveillance in indoor [2] and outdoor environments [3]. The main effort is devoted to novel algorithms and software solutions for multiple-object detection and tracking in cluttered and shadowed scenes [4]. These techniques have been implemented in robust systems with distributed cameras [5] and sensor integration [6], applied in real-time working prototypes. Recently, ImageLab's activity has focused on challenging safety and security problems: posture analysis for home security [2], people tracking and face obscuration for privacy [7], dangerous event recognition (in particular smoke detection [8]) and abnormal behavior and action recognition [9,10]. In this presentation, we briefly illustrate the ongoing research carried out within national and international projects, sponsored by companies and public institutions. The most significant ones are:¹

• LAICA (Lab. di ambient intelligence per una città amica) - Piano Telematico Emilia Romagna (2004-06)
• FREE SURF (FREE Surveillance in a privacy-respectful way) - Progetto di Rilevante Interesse Nazionale (PRIN) of MIUR (2006-08)
• SmokeWave (smoke detection with wavelet technology) - sponsored by WTI srl (2007-09)
• BE SAFE (Behavior lEarning in Surveilled Areas with Feature Extraction) - NATO "Science for Peace" project with the Hebrew University (Israel) (2007-08)
• "Automatic detection of infiltrated objects for security of airports and train stations" - Australian Council project with the University of Sydney (2007-09)
• VISOR (Video Surveillance Online Repository) - task of the EU STREP project VIDI-VIDEO (2007-09)

All these projects share the attempt to define innovative, robust and efficient techniques in real-time working solutions for the next generation of video surveillance systems.

¹ See http://imagelab.ing.unimore.it/projects

Fig. 1: The Sakbot-ImageLab solution.
2. The ImageLab solution

Over ten years we developed the modular architecture sketched in Fig. 1. The architecture exploits several libraries for video surveillance, written in C++ in the ImageLab library, which provide direct interoperability with OpenCV [11], the state-of-the-art open-source computer vision toolkit. Describing it bottom-up, the lowest level is the interface with the sensors. Traditionally, video surveillance employs fixed cameras and provides motion detection by means of background suppression. Accordingly, we defined the Statistical and Knowledge-Based Object detecTor (Sakbot) algorithm [3], which is very robust in many different conditions: it computes a selective temporal median for background modeling, and allows suppression of ghosts (i.e., apparent objects) and shadows, as well as moving-object validation [4]. In several benchmarks on real scenes of public parks it has been compared with MoG approaches, showing its effectiveness on mostly mono-modal backgrounds [11]. Sakbot segmentation, combined with a sophisticated Appearance-Based tracking module Handling Occlusions (Ad-Hoc) [5], has been adopted in many applications of vehicle and people tracking from single cameras.

Recent advances call for large coverage of the monitored areas, requiring multiple cameras. For cameras with partially overlapped fields of view (FoV) we proposed a new statistical and geometrical approach to solve the consistent labeling problem: people (or objects) detected by one camera module should maintain their identification when acquired by other camera modules, even in the case of overlaps, partial views, multiple people crossing the edges of the FoVs, and uncertainty. An automatic learning phase to reconstruct the homography of the ground plane and the epipolar lines is required. The approach, called HECOL [12] (Homography and Epipolar-based Consistent Labeling), has been employed for real-time monitoring of public parks in Reggio Emilia. Other experiments have also been carried out on the University campus, with four cameras and tens of people tracked in real time (Fig. 2), for the FREE-SURF project.
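The temporal-median idea at the core of Sakbot can be illustrated with a plain (non-selective) median background model. The function names, threshold value and synthetic frames below are our own illustrative choices; the real algorithm adds selectivity, shadow suppression and object validation:

```python
import numpy as np

# Background modeling via a temporal median: a pixel's background value
# is the median of its recent history, so short-lived moving objects do
# not contaminate the model. (Sakbot's selective median refines this.)
def median_background(frames):
    # frames: (n, H, W) stack of grayscale frames
    return np.median(np.asarray(frames), axis=0)

def foreground_mask(frame, background, thresh=25):
    # Pixels far from the background model are marked as foreground.
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

# Synthetic example on a tiny 4x4 frame stack: a nearly static scene,
# then a bright "object" pixel appears in the test frame.
rng = np.random.default_rng(0)
frames = [50 + rng.integers(-2, 3, size=(4, 4)) for _ in range(9)]
bg = median_background(frames)
test_frame = frames[0].copy()
test_frame[1, 1] = 200                 # moving-object pixel
mask = foreground_mask(test_frame, bg)
```

Only the artificially brightened pixel exceeds the threshold, so the mask isolates the moving object while sensor noise is absorbed by the median.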
The multi-camera framework also includes moving cameras: we have studied people tracking using cameras with constrained motion, such as PTZ sensors, and recently we have been addressing tracking with freely-moving, unconstrained cameras. The former has been addressed with dynamic mosaicing of the scene and snake-based edge tracking; the latter uses well-known algorithms such as CamShift and particle filters, improved with a graph-based model consistency check. The adoption of a multi-view acquisition system allows the best view of each detected and tracked person to be extracted for logging purposes, and provides enough resolution on the acquired person shape to extract the head and face regions. This approach [7] enables surveillance systems free from privacy constraints, since it provides automatic obscuration of face identity for applications, such as web delivery of surveillance data, where people's anonymity must be guaranteed.

An important chapter of new video surveillance research is action recognition and behavior analysis. We concentrate on single-person actions, each person being tracked by the previous modules. Action recognition refers to the extraction of reliable and meaningful spatio-temporal features to classify or recognize specific activities: it includes posture classification, gait recognition, actions and interactions. Behavior analysis is more general and should also take into account the related contexts, concepts and events, together with a reasoning system to cluster and classify events. In the framework of action recognition we explored different techniques for posture analysis with probability maps on projection histograms and HMMs [2], applied in indoor surveillance to detect people falling down. Now, in the context of the BE SAFE project, we are analyzing angular features of people's shapes, embodied in a single descriptor called an "Action Signature", to classify actions such as running, walking, crouching, etc. [10]. Behavior analysis starts from people's trajectories.
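The idea of describing a trajectory with circular statistics can be hinted at with a single-component summary (mean direction and circular variance of the step angles). The actual descriptor in [9] fits a Mixture of Von Mises distributions, so the function below only sketches the underlying intuition:

```python
import math

# Toy trajectory descriptor based on circular statistics: summarize a
# trajectory by the mean direction and circular variance of its steps.
def circular_descriptor(points):
    angles = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mean_dir = math.atan2(s, c)
    resultant = math.hypot(c, s)          # 1.0 = perfectly straight path
    return mean_dir, 1.0 - resultant      # (mean direction, circ. variance)

# A straight walk versus an erratic zig-zag over the same ground.
straight = [(t, 0) for t in range(10)]
zigzag = [(t, t % 2) for t in range(10)]
d1 = circular_descriptor(straight)
d2 = circular_descriptor(zigzag)
```

The straight trajectory yields near-zero circular variance, while the zig-zag one does not, so even this crude summary separates regular from erratic motion.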
By learning the normal paths of people, we can infer abnormal behaviors by detecting unusual trajectories that do not fit clusters of previously analyzed trajectories. The challenge is a reliable measure of trajectory-shape similarity in the large space covered by multiple cameras. In this field we recently proposed a new, effective trajectory descriptor based on circular statistics with a Mixture of Von Mises distributions [9].

Fig. 2: Experiments of multi-camera tracking.

Fig. 3: The MoSES streaming system.

The architecture of Fig. 1 also takes into account the storage, annotation and remote transmission of surveillance data. On this last point, a new effective streaming architecture based on MPEG4-AVC has been developed (the MoSES architecture), suitable for surveillance over low-bandwidth connections [13] with mobile devices (see Fig. 3). Finally, we recently developed the VISOR framework (Video Surveillance Online Repository) to provide a surveillance concept list and an annotation and querying platform containing hundreds of video clips (see Fig. 4); examples are videos containing smoke. We developed a module for smoke detection: the results, in XML or visual form, are stored in VISOR together with the videos used for testing [8]. VISOR is available for uploading and downloading videos, and we hope it will be used worldwide for video surveillance performance evaluation.

3. Conclusion

The activity and some results of video surveillance research at ImageLab have been presented. With this effort, ImageLab has contributed to the definition of the distributed architecture of the so-called "third generation of video surveillance systems", which integrates networks of digital sensors and processing elements, cognitive processes with machine-learning-based active tasks, computer vision paradigms fully exploiting 3D geometry, and multimedia interfaces for remote and mobile connections. We are glad to thank the tens of students and collaborators who made these results possible.

References
[1] R. Cucchiara, P. Mello, M. Piccardi, "Image Analysis and Rule-Based Reasoning for a Traffic Monitoring", IEEE Transactions on Intelligent Transportation Systems, vol. 1, n. 2, pp. 119-130, June 2000.
[2] R. Cucchiara, C. Grana, A. Prati, R. Vezzani, "Probabilistic Posture Classification for Human Behaviour Analysis", IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 35, n. 1, pp. 42-54, 2005.
[3] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, "Detecting Moving Objects, Ghosts and Shadows in Video Streams", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, n. 10, pp. 1337-1342, 2003.
[4] A. Prati, I. Mikic, M. M. Trivedi, R. Cucchiara, "Detecting Moving Shadows: Algorithms and Evaluation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, n. 7, pp. 918-923, 2003.
[5] S. Calderara, A. Prati, R. Cucchiara, "HECOL: Homography and Epipolar-based Consistent Labeling for Outdoor Park Surveillance", in press, Computer Vision and Image Understanding.
[6] R. Cucchiara, A. Prati, R. Vezzani, L. Benini, E. Farella, P. Zappi, "Using a Wireless Sensor Network to Enhance Video Surveillance", Journal of Ubiquitous Computing and Intelligence (JUCI), (1), pp. 1-11, 2006.
[7] R. Cucchiara, A. Prati, R. Vezzani, "A System for Automatic Face Obscuration for Privacy Purposes", Pattern Recognition Letters, vol. 27, n. 15, pp. 1809-1815, 2006.
[8] R. Vezzani, S. Calderara, P. Piccinini, R. Cucchiara, "Smoke detection in videosurveillance: the use of VISOR (Video Surveillance On-line Repository)", in press, Proceedings of the ACM International Conference on Image and Video Retrieval, Niagara Falls, Canada, July 7-9, 2008.
[9] A. Prati, S. Calderara, R. Cucchiara, "Using Circular Statistics for Trajectory Analysis", in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska (USA), June 24-26, 2008.
[10] S. Calderara, R. Cucchiara, A. Prati, "Action Signature: a Novel Holistic Representation for Action Recognition", in press, Proceedings of the IEEE International Conference on Advanced Video and Signal based Surveillance (IEEE AVSS 2008), New Mexico, Sept. 1-3, 2008.
[11] S. Calderara, R. Cucchiara, A. Prati, "Multimedia Surveillance: Content-based Retrieval with Multicamera People Tracking", in Proceedings of the Fourth ACM International Workshop on Video Surveillance and Sensor Networks (VSSN 2006), Santa Barbara (CA), USA, Oct. 27, 2006.
[12] S. Calderara, R. Cucchiara, A. Prati, "Bayesian-competitive Consistent Labeling for People Surveillance", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, n. 2, pp. 354-360, 2008.
[13] G. Gualdi, A. Prati, R. Cucchiara, "Video Streaming for Mobile Video Surveillance", in press, IEEE Transactions on Multimedia.

Fig. 4: Examples of VISOR video clips.
Distant Humans Identification in Wide Areas
Identificazione di persone in aree estese

A. Del Bimbo, F. Dini, A. Grifoni, F. Pernici
Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze
Via di Santa Marta 3, 50139 Firenze, Italy
Sommario

Identifying people in wide areas through the use of cameras involves two main problems: the first is detecting, tracking and recognizing the identities of the monitored people (typically through the face); the second is managing pan-tilt-zoom (PTZ) cameras. We present here two applications and research lines pursued by our research group at the Università degli Studi di Firenze.

Abstract
In order to identify humans at a distance, two main video surveillance building blocks are necessary: 1) a face identification module and 2) a methodology to manage active zooming (PTZ) cameras. Here we present the two main applications in this direction investigated by our research group.

1. Face Detection and Tracking for Concise Face Logging

For some video surveillance applications, face detection, tracking and recognition are critical tasks. High-resolution, high-quality frontal images of the face are needed in order to effectively recognize a person. In this section we present a face logger which collects time-stamped frontal face images, storing only the best-quality ones. A first application is a face logger prototype, capable of detecting, tracking and grabbing imagery of faces acquired by an IP camera. Targets are detected by a Viola-Jones face detector [1], and each detection is used both to initialize a new particle filter and as a measurement for tracking, together with a colour histogram [2]. Since no background maintenance is adopted, this approach can be used with either fixed or active cameras. Face images are evaluated in order to assign them a quality metric that can be compared with the highest quality reached so far, so that only the best-quality images (see Fig. 1) are stored in the face log (conciseness). In such a framework, face obfuscation is also possible, providing a solution to the legal problems related to respect for people's privacy.

Fig. 1: Collections of concise sets of time-stamped face images related to human identity, obtained by our face logger.

2. On-line Camera Cooperation to Acquire Human Head Imagery in Wide Areas
Another application takes the problem of grabbing high-quality images of the target one step further. A PTZ camera is used in conjunction with a fixed wide-view camera that detects and tracks the target in a surveilled area and controls the PTZ camera in order to get close-ups of it [3]. We consider the problem of estimating on-line the time-variant transformation relating a person's feet position in the image of a first, fixed camera to his head position in the image of a second, pan-tilt-zoom camera. The transformation allows high-resolution images to be acquired by steering the PTZ camera (slave camera) at targets detected in the fixed camera view (master camera). Assuming a planar scene and modeling humans as vertical segments, we present the development of an uncalibrated framework which does not require any known 3D location to be specified, and which takes into account both zooming-camera and target uncertainties [4]. Results (see Fig. 2) show good performance in slave-camera target head localization, degrading when a high zoom factor causes a lack of feature points in the slave camera. A particle-filter-based tracking method is used for two different tasks: first, to track the target in the fixed camera image plane; and second, to track the parameters that describe the time-variant homography mapping the fixed camera image plane onto the PTZ camera image plane. This builds a base framework upon which we will be able to develop sensor management solutions for future, advanced video surveillance applications.

Fig. 2: Some screenshots from the experimental results. On the left, the fixed camera view; on the right, the PTZ camera view. Time increases from top to bottom. The red crosses show the imaged feet and head locations. It can be seen that increasing the zoom factor may lead to inaccurate estimation of the homography, due to lack of features. The blue line is the imaged line orthogonal to the scene plane.

References

[1] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[2] A. Bagdanov, A. Del Bimbo, F. Dini, and W. Nunziati, "Improving the robustness of particle filter-based visual trackers using online parameter adaptation", in Proceedings of the IEEE International Conference on Advanced Video Surveillance (AVSS), London, England, September 2007.
[3] X. Zhou, R. Collins, T. Kanade, and P. Metes, "A master-slave system to acquire biometric imagery of humans at a distance", ACM SIGMM 2003 Workshop on Video Surveillance, pp. 113-120, 2003.
[4] A. Del Bimbo, F. Pernici, "Uncalibrated 3D Human Tracking with a PTZ Camera Viewing a Plane", in Proc. 3DTV International Conference: Capture, Transmission and Display of 3D Video (3DTV-CON 08), Istanbul, May 2008.
Video surveillance and bio-inspired embodied cognitive
systems
Videosorveglianza e sistemi cognitivi embodied bio-inspired
Matteo Pinasco
Carlo S. Regazzoni
{pinasco, carlo}@dibe.unige.it
Department of Biophysical and Electronic Engineering, University of Genoa
Genoa – Italy
Sommario
Over the last 20-30 years, video surveillance has received growing attention from public opinion because of society's demand for security. Accordingly, significant investments have been made by governments and industry in the development of video surveillance systems and in research projects. This work describes the state of the art of intelligent video surveillance systems, analyzing open problems and possible solutions. Finally, the results obtained in research projects carried out within the ISIP40 research group, in collaboration with other institutions, are illustrated. Abstract
Over the last 20-30 years, video surveillance has received increasing attention from public opinion because of society's demand for security. In response, significant funds, coming from both governments and industry, have been invested in the development of surveillance systems and in research projects. The state of the art of intelligent video surveillance systems will be described, analyzing open problems and possible solutions. Results from research projects developed by the ISIP40 research group and other institutions will then be shown. State of the art
From a research point of view, the evolution of video surveillance systems can be summarized in a number of generations [1].

First generation (1960-2000): The first generation of surveillance systems [2] simply extends human perception capabilities by collecting images of the scene through a set of cameras. Human operators analyse the video streams on a set of monitors where images from different cameras are presented in cyclic multiplexing. The scene understanding task is therefore performed solely by human operators, who nonetheless extend their ability to guard wider areas. The sensors employed are analog and there is no information processing. For some applications this kind of system is in itself a deterrent, discouraging dangerous situations merely by signalling its presence. The major drawback is related to the attention and processing limitations of humans [3]. Another drawback is the difficult and slow retrieval of information, since a large number of tapes has to be re-examined.

Second generation (1980-2015): The second generation of surveillance systems [4] represents the technology currently employed from a commercial point of view. This generation takes advantage of digital video acquisition, transmission, storage and processing. The most innovative and relevant aspect of second generation systems is the conversion of analog signals to a digital representation suitable for processing. At the same time, new video-processing and computer vision algorithms were developed to allow accurate scene interpretation. Digital information is thus processed by a centralized system to identify those scenes with a high probability of being associated with an event of interest. In this way, the human operator is expected to analyse only a restricted number of situations, focusing attention on the monitors showing potentially dangerous events.

Third generation (2000-2025): The third generation of surveillance systems extends digitalization to the entire system architecture, from the sensors to the information provided to operators. The most important characteristic introduced is the distribution of intelligence among the elements constituting the system framework [5], which are able to communicate with each other over heterogeneous networks (optical fibre, twisted pairs, wireless, etc.). Third generation systems thus take great advantage of the fast improvement of broadband and mobile communications, as well as of the development of so-called smart sensors [6], i.e., sensors with processing capabilities obtained by means of either dedicated DSPs or general purpose processors. As a consequence, the architecture can reach a high level of flexibility and, at the same time, of robustness. Moreover, alarm detection, semantic interpretation of the scene, and effective, fast information retrieval are characteristics that are continually improving. The distribution of intelligence also implies, as a natural consequence, the development of data fusion techniques [7], aiming at exploiting the multisensoriality of third generation systems.
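As a minimal illustration of the data fusion building block mentioned above, independent estimates from heterogeneous sensors can be combined by inverse-variance weighting; the sensor values below are invented for the example and do not come from any system described here:

```python
# Toy multisensor data fusion: combine independent estimates of the
# same quantity by inverse-variance weighting, a classic fusion rule.
def fuse(estimates):
    # estimates: list of (value, variance) pairs from different sensors
    weights = [1.0 / var for _, var in estimates]
    fused = sum(v * w for (v, _), w in zip(estimates, weights)) / sum(weights)
    fused_var = 1.0 / sum(weights)
    return fused, fused_var

# A camera (accurate, variance 0.25) and a coarser sensor (variance 4.0)
# observe the same target position along one axis.
fused, var = fuse([(10.2, 0.25), (9.0, 4.0)])
```

The fused estimate is pulled toward the more reliable sensor, and its variance is lower than either input's, which is exactly why multisensoriality pays off.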
Third generation systems, thanks to the availability and integration of powerful, heterogeneous devices, are also open to a larger domain of applications and can provide additional surveillance functionalities.

Towards the fourth generation (2005-?): Nowadays research is moving towards cognitive video surveillance systems. Systems of this kind aim to overcome the above limitations by interacting directly with the environment through actuators present in the area of interest [8]. Actions performed by the system must produce a modification of the external world, either to prevent undesired events from taking place or to handle their evolution directly and quickly. According to these requirements, it is clear that such systems must have the following "cognitive" capabilities: 1) semantic interpretation of events, 2) dynamic adaptability to changes, and 3) proactive or preventive interaction with the external world. Cognitive surveillance systems able to accomplish these goals can reasonably be designed by imitating and modelling brain skills. The cognitive cycle is a basic model representing the processes of a living being interacting with the external world. Four consecutive, inter-dependent stages can be identified: 1) sensing, 2) analysis, 3) decision, and 4) action [9]. A learning phase is usually included to support the decision with the memory of past events. These systems are capable of taking decisions, and then performing actions, autonomously, but they can also interact with human operators to reach a cooperative decision. In this way the decision process is distributed over several levels: lower levels, which represent reflex decisions, and higher levels, which concern cooperative decisions.

There are still many open problems under research. Some are related to third generation systems, others to fourth generation systems, and some concern both. For third generation systems, a very interesting problem is the robustness of the algorithms and the performance evaluation of the systems. In the fourth generation, learning algorithms play a very important role, and an attractive research field is the study of efficient and effective systems capable of learning from past experience. With the development of extended architectures, multisensor data fusion algorithms are very important in both the third and fourth generations, so improving their efficiency, in order to satisfy security requirements, is an open challenge.

Research projects
ISIP40 is a research group that has worked in the field of video surveillance for over twenty years and has collaborated in many research projects. In particular, the following projects are presented:

Joint Lab with Technoaware: This is a permanent project born with the aim of realizing robust libraries, and thus strong algorithms, for event detection, mainly in the security field. Following the research trend, not only third generation systems are developed, but ambient intelligence systems are also studied.

VICAST (VIsion-based Computer Aided Safe Transportation): The project aims at exploiting computer vision techniques together with data fusion and scene understanding algorithms to provide the driver (be it a train or a road vehicle driver) with an additional active safety tool. This project is carried out with CRF (Centro Ricerche FIAT) and is well contextualized among fourth generation systems.

Distributed and heterogeneous architectures for multi-sensor surveillance systems: The aim of this project is to realize a distributed automatic monitoring system for extended environments through a heterogeneous multi-sensor architecture. Some open problems of third generation systems are considered, such as reliability and performance evaluation metrics.

Context awareness autonomic network: This project is strictly related to a particular application of third generation systems, in which the aim is to estimate the pose of people in a security context. The system is capable of estimating when a person is standing, squatting, sitting or lying; in this way anomalous behaviours and emergency situations can be detected.

Video surveillance in industrial plants (Videosorveglianza in impianti industriali): This project concerns factory plant security. Its aim is the processing and automatic interpretation of video sequences to recognize events that may endanger people or transport safety. This project also targets some open problems of third generation systems.

SINTESIS (Sistema INTEgrato per la Sicurezza ad Intelligenza diStribuita): This project aims to create an innovative structure based on the concept of "Network Centric Operation", which is particularly suited to optimizing distribution, QoS, and "time critical" processes. The project studies the application of the cognitive model to the security field, with the goal of realizing a fourth generation architecture composed of nodes with reasoning capabilities similar to a human operator's, but with a more efficient analysis capability even in very complex situations and with a very small response time.

Other projects: Other projects from research groups working in the same field at PoliMi (Prof. Stefano Tubaro) and at the University of Siena (Prof. Alessandro Mecocci) are also presented. A common characteristic of these projects is the signal-oriented approach stemming from the groups' telecommunications background.

References
[1] G. L. Foresti, C. S. Regazzoni, and V. Ramesh, "Special issue on video communications, processing, and understanding for third generation surveillance systems," Proceedings of the IEEE, vol. 89, no. 10, pp. 1355-1359, October 2001.
[2] D. G. Chandler, "Applications for bidirectional broadband coaxial cable communication system," Journal of the Society of Motion Picture and Television Engineers, vol. 79, no. 9, p. 836, September 1970.
[3] G. J. D. Smith, "Behind the screens: Examining constructions of deviance and informal practices among CCTV control room operators in the UK," Surveillance and Society, vol. 2, pp. 376-395, 2004.
[4] C. S. Regazzoni, G. Vernazza, and G. Fabri, Eds., Advanced Video-Based Surveillance Systems. Norwell, MA, USA: Kluwer Academic Publishers, 1998.
[5] L. Marcenaro, L. Marchesotti, and C. S. Regazzoni, "Moving objects self-generated illumination variations removal for automatic videosurveillance systems," in Proceedings of Visual Communications and Image Processing, VCIP '03, 2003, pp. 414-421.
[6] A. Del Bue, D. Comaniciu, V. Ramesh, and C. S. Regazzoni, "Smart cameras with real-time video object generation," in Int. Conf. on Image Processing, ICIP '02, September 2002.
[7] D. L. Hall and J. Llinas, "An introduction to multisensor data fusion," Proceedings of the IEEE, vol. 85, no. 1, 1997.
[8] D. Vernon, G. Metta, and G. Sandini, "A survey of artificial cognitive systems: Implications for the autonomous development of mental capabilities in computational agents," IEEE Transactions on Evolutionary Computation, vol. 11, no. 2, pp. 151-180, 2007.
[9] L. Marchesotti, S. Piva, M. Gandetto, and C. Regazzoni, "Video and radio attributes extraction for identity and location estimation in Ambient Intelligence systems," Ambient Intelligence, 2004.
Video Surveillance @ CVLab
Video Sorveglianza presso il Computer Vision Laboratory
Luigi Di Stefano, Stefano Mattoccia, Federico Tombari, Alessandro Lanza
DEIS-ARCES, University of Bologna
Viale Risorgimento 2, 40136 Bologna
Sommario
This contribution provides an overview of the video surveillance research currently under way at the Computer Vision Lab (vision.deis.unibo.it) of the University of Bologna. The activities focus on robust motion detection and on the detection of potentially illicit events such as acts of vandalism or the simultaneous passage of several people through high-security entrances (e.g. the interlocking doors commonly used to control access to bank branches). On the first topic, work focuses on developing robust motion detection algorithms based on the background subtraction principle. In particular, the goal is motion detection that is both accurate and highly robust to disturbance factors such as slow and sudden changes in illumination conditions, changes in camera parameters (due to mechanisms such as automatic gain or exposure control), shadows and noise. In this context we have developed approaches based on robust visual correspondence measures, on non-parametric isotonic regression, on the fusion of multiple views registered with respect to the ground plane, and on the synergistic use of appearance and depth information. Within the second topic we addressed the detection of acts of vandalism that alter the appearance of the static surfaces of a reference scene. Examples of such acts are graffiti drawn on the walls of a building, as well as damaging or defacing a painting, a statue or even a seat on public transport. We therefore developed a system based on two synchronized cameras able to detect in real time both intrusions and the above-mentioned acts of vandalism. Finally, we built a system able to monitor a high-security entrance in order to detect the presence of a person (presence sensor) and prevent the simultaneous passage of several people through the entrance (anti piggy-backing, APB).

Abstract
The aim of this contribution is to provide an overview of the research activities related to visual surveillance currently under way at the Computer Vision Lab (vision.deis.unibo.it), University of Bologna. These concern robust extraction of motion cues as well as detection of illicit events such as acts of vandalism against public/private property and piggy-backing in secured entrance areas.

Robust detection of motion cues

We investigate novel techniques for change detection based on background subtraction. The major challenges for background subtraction algorithms are currently the handling of vacillating backgrounds and of sudden illumination changes in the scene. Our research focuses on the latter issue. We have recently developed [1] a novel change detection algorithm aimed at robust and accurate foreground segmentation under sudden illumination variations. The algorithm deploys a robust visual correspondence measure [2] based on the ordering assumption in order to detect, in each frame, those points that reliably belong to the background. Then, a tonal registration procedure [3] removes photometric distortions before background subtraction is carried out. Overall, this methodology yields accurate change masks even under heavy illumination changes.

Another approach relies on modelling the effects of disturbance factors on images as locally order-preserving transformations of pixel intensities plus additive noise. This allowed us to formalize the change detection problem within a Maximum Likelihood (ML) non-parametric isotonic regression framework [7]. The latest development of this approach led us to devise a novel method that, under the assumption of a noiseless background and noisy incoming frames (the "error in one image" regression problem), carries out robust change detection by means of an O(N) algorithm known as the Pool Adjacent Violators Algorithm [5].

Another strategy we have been investigating relies on a 3D sensor based on stereo vision to perform robust background subtraction [4]. The deployment of depth information on the monitored scene allows overcoming typical problems of traditional approaches, such as sudden illumination changes, camouflage and the presence of shadows. The basic assumption is that an intrusion in a scene must occur between the background and the camera position: hence the joint use of intensity and depth information helps improve the robustness of the system against these factors.

Robust change detection has also been addressed within a multi-view framework [8], in joint research with CVLab-EPFL (cvlab.epfl.ch), Lausanne. Global appearance changes due to disturbance factors such as global light changes and dynamic adjustments of camera parameters (e.g. auto-exposure and auto-gain control) are dealt with by a proper single-view change detection algorithm (i.e. [7]) run independently on each view. The single-view change masks are then fused into a synergy mask defined in a common virtual top view, obtained by estimating the homographies associated with the ground plane. This allows spurious local appearance changes due to physical points lying on the background surface (e.g. shadows cast by moving objects) to be effectively detected and filtered out.

Event detection

We investigate the use of change detection and 3D information to build a surveillance system able to perform real-time detection of acts of vandalism, such as graffiti and the dirtying or scarring of wall and object surfaces belonging to the background of the scene. The goal is to discriminate between changes occurring on the background of the scene (e.g. graffiti) and those occurring between the background and the sensor (intrusions). Obviously, the system must also be able to handle disturbance factors such as sudden illumination changes and camouflage.

A first approach performs efficient and reliable graffiti detection by means of 3D information obtained from a stereo camera. First, our method deploys a robust background subtraction algorithm [5] to detect structural changes occurring in the scene, thus filtering out changes due to sudden illumination variations. Then, an effective method that deploys stereo disparity information [6] is used to discriminate between changes occurring in the background and intrusions.

We have also investigated the use of depth information coming from a Time-of-Flight (TOF) camera to perform graffiti detection. TOF cameras are a very recent and growing technology which has already proved useful for several computer vision tasks. Preliminary results obtained on real image sequences have shown promising capabilities of our approach, with improvements expected as the technology matures.

A further event-detection application concerns access monitoring in interlocks and secured entrance areas. In particular, we have developed a video surveillance system which deploys two views in order to robustly perform intrusion detection and singularization (i.e. anti piggy-backing). The first stage of our approach is background subtraction, aimed at handling the real disturbance factors typically occurring in small interlocks, such as non-linear illumination changes, shadows and photometric distortions. Then, a feature extraction and classification stage reliably estimates the number of people currently occupying the monitored area. Our system is designed to operate in very small interlocks and can work in a substantially unstructured environment.
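Reference [5] builds on the Pool Adjacent Violators Algorithm mentioned above. For readers unfamiliar with it, here is a generic least-squares PAVA sketch in plain Python; this is the textbook form of the algorithm, not the paper's exact statistical formulation:

```python
def pava(y):
    """Pool Adjacent Violators Algorithm: least-squares non-decreasing fit
    to the sequence y, in O(N) amortized time."""
    blocks = []  # each block is [sum_of_values, count]; its mean is sum/count
    for v in y:
        blocks.append([float(v), 1])
        # Merge the last two blocks while they violate monotonicity,
        # i.e. while mean(blocks[-2]) > mean(blocks[-1]).
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

# A decreasing run is pooled into a single constant level:
# pava([1, 3, 2]) -> [1.0, 2.5, 2.5]
```

Each input value starts as its own block, and adjacent blocks whose means are out of order are pooled; since every merge removes a block, the total work is linear in N.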
References
[1] L. Di Stefano, F. Tombari, S. Mattoccia, E. De Lisi, "Robust and accurate change detection under sudden illumination variations", ACCV'07 Workshop on Multi-dimensional and Multi-view Image Processing, 2007.
[2] F. Tombari, L. Di Stefano, S. Mattoccia, "A robust measure for visual correspondence", ICIAP'07, Int. Conf. on Image Analysis and Processing, 2007.
[3] R. Gonzalez, R. Woods, Digital Image Processing, Prentice Hall, 2nd edition, 2002.
[4] F. Tombari, S. Mattoccia, L. Di Stefano, F. Tonelli, "Detecting motion by means of 2D and 3D information", ACCV'07 Workshop on Multi-dimensional and Multi-view Image Processing, 2007.
[5] A. Lanza, L. Di Stefano, "Statistical change detection using the Pool Adjacent Violators Algorithm", IEEE Trans. Pattern Analysis and Machine Intelligence (in preparation).
[6] Y. A. Ivanov, A. F. Bobick, J. Liu, "Fast Lighting Independent Background Subtraction", International Journal of Computer Vision, vol. 37, no. 2, pp. 199-207, 2000.
[7] A. Lanza, L. Di Stefano, "Detecting Changes in Grey Level Sequences by ML Isotonic Regression", AVSS '06, IEEE Conf. on Advanced Video and Signal Based Surveillance, 2006.
[8] A. Lanza, L. Di Stefano, J. Berclaz, F. Fleuret, P. Fua, "Robust Multi-View Change Detection", BMVC '07, British Machine Vision Conference, 2007.
Integrating computer vision techniques and wireless sensor
networks in video surveillance systems
Integrazione di tecniche di visione artificiale e reti di sensori
wireless in sistemi di video sorveglianza
Edoardo Ardizzone, Marco La Cascia, Liliana Lo Presti
Università di Palermo
Dipartimento di Ingegneria Informatica
Sommario
The research unit of the University of Palermo working on video surveillance within the FREE SURF project studies and develops methodologies and algorithms for intelligent video surveillance systems able to meet society's need to safeguard public and personal security while respecting individual privacy. An intelligent video surveillance system must be able to monitor dynamic environments in which several people interact with each other and with the environment itself, under conditions that change continuously over time. Such a system must raise appropriate alarms when the current situation is recognized as potentially dangerous on the basis of the system's past experience. In order to monitor wide areas without defacing the environment, it is also desirable to use a minimal set of distributed cameras able to interact wirelessly with remote servers, so as to guarantee distributed, high-level analysis of the acquired video sequences. The cameras can be coupled with a network of wireless sensors able to measure characteristic quantities such as temperature, humidity, etc., useful for a better understanding of the events taking place in the monitored site. Such a system must analyse video to automatically extract, in real time, information about the scene that is useful for detecting, localizing and tracking the people present, as well as for analysing and understanding their behaviour; the system must understand the most significant events that occur and detect any potentially dangerous objects introduced into the scene. Moreover, the system must be able to coordinate its cameras and properly fuse the information gathered by cameras placed at geographically different points and by the various sensors deployed in the site. Better control of the site could also come from a-posteriori analysis of the collected data. To this end, information retrieval techniques over local databases storing information about the scene currently "seen" by each camera can prove particularly useful and are currently under study.

Abstract
Nowadays video surveillance systems are essential tools to monitor sites and guarantee people's safety: automatic detection of moving objects in the scene and recognition of dangerous events are particularly interesting [1,2,3]. Our project aims to develop tools and techniques for video surveillance systems in outdoor environments that detect people automatically and in real time, without the direct control of a human operator. The reference framework consists of distributed stationary cameras coordinated with sensor networks. In particular, wireless sensors are used to sense characteristic quantities of the monitored site, such as variations in temperature, humidity, noise, vibrations, and so on, while cameras are used to obtain visual data on the site. These data are then processed locally to monitor interesting events concerning the safety of the monitored site and to recover information at a later time. A logical reasoning subsystem is responsible for the management of the whole system, allowing complex analyses and the inference of a higher-level representation of the outdoor environment. One of our research activities is the study of motion detection algorithms in a dynamically changing environment heavily subject to illumination changes and constrained by the limited computational power of the devices used in the wireless sensor network. In such a distributed system, methods to track moving objects across the several distributed cameras play a central role; these methods may establish geometrical relations among the different views of the cameras deployed in the site in order to identify objects univocally and solve potential partial/total occlusion problems (consistent labelling).
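The limited computational power of the sensor nodes mentioned above rules out heavy per-pixel models; as a toy sketch of the cheapest kind of per-pixel change test (illustrative threshold, not the project's actual algorithm):

```python
import numpy as np

def changed_pixels(prev_frame, frame, threshold=25):
    """Mark pixels whose grey level changed by more than `threshold`
    between consecutive frames: a few operations per pixel, cheap enough
    for a low-power node to run before transmitting anything."""
    # Cast to a signed type so the subtraction of uint8 frames cannot wrap.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Two synthetic 8-bit frames differing in one pixel:
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[2, 2] = 200
mask = changed_pixels(prev, curr)  # True only at (2, 2)
```

A node could transmit only the coordinates of the True pixels (or aggregate statistics over them), rather than the raw frames, which is the bandwidth-saving pattern the architecture relies on.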
Moreover, automatic annotation of events on stored information is very useful in a distributed video surveillance system, to provide a temporal representation of the events occurring in the site and to allow queries over wide databases. In particular, information retrieval techniques based on visual content are also part of our research activities, to automatically understand what happened in the site. Real-time video processing and automatic people detection, localization and tracking are intrinsically complex problems. In our project, new algorithms have been studied and developed for people detection, segmentation and tracking in videos by exploiting appearance information. Particular care has been taken in monitoring, in real time, situations that are potentially dangerous for the site, and in detecting, for example, new stationary objects introduced in the site, such as abandoned luggage. Wireless sensors are used to collect measurements from the environment such as variations in temperature, humidity, noise, vibrations, and so on. All these quantities may be used together with visual data to infer useful information on the dynamics of the site [4, 5]. Moreover, features describing the view acquired by the cameras may be annotated to represent the most interesting situations, useful for a-posteriori analyses, and thus provide effective support in a predictive way. Figure 1 shows the model of the proposed architecture.

Introduction
The automatic visual control of human presence and actions in an outdoor environment is a challenging problem. A large literature exists on surveillance in structured environments, but much research work is still needed to develop and migrate video surveillance algorithms onto architectures with low computational power and limited storage capability. We propose an architecture composed of a logical reasoning unit and a data-gathering network consisting of wireless sensors and cameras. The former unit must infer a high-level representation of the monitored site using the information gathered by the sensor networks; the cameras, coupled with sensor networks, must collect raw data from the environment, process them locally to extract high- or mid-level information and, finally, aggregate the data and send them to the logical reasoner.

Fig. 1: The proposed architecture for our WSN-based video surveillance system

Wireless Sensor Network
This project aims to realize a video surveillance system for wide outdoor environments such as archaeological sites. To reduce costs, we combine a minimal set of cameras with wireless sensor networks (WSNs), realizing a distributed video surveillance system in which sensors collect and process in real time the data gathered in the monitored site and then present the results of the analyses to the final user. In particular, the proposed architecture uses wireless sensor network technology in conjunction with traditional techniques for video processing and for the analysis of images coming from cameras deployed at strategic points in the monitored site. Wireless sensors are a class of tiny devices with programmable computing capabilities, equipped with sensing and communication features and characterized by a limited energy supply. A WSN consists of a large number of these heterogeneous devices; each node consists of a micro-controller, a radio communication device, a battery for its own power supply and one or more low-level sensors to sense the environment. Since these sensors have to work in hostile conditions without any human intervention after deployment, minimizing energy loss while keeping the reliability and efficiency of the network high is a typical design goal. The base station of the WSN collects information from the several nodes with which it communicates and acts as a gateway between the network nodes and the user. It is possible to share data among the sensors and to manipulate or merge it in order to reduce the number of transmissions, by aggregating the gathered data rather than forwarding the raw sensor readings. There are two types of sensor nodes:
- motes have low performance and limited processing and storage capability, and are used as leaf nodes to sense the monitored site (fig. 2);
- micro-servers are generally more efficient in processing data, having a higher computational capability; they work in strict conjunction with the cameras and are used to build the backbone of the hybrid sensor network, collecting and processing data from the peripheral nodes and coordinating them (fig. 3).

Fig. 2: Xbow MicaZ
Fig. 3: Intel Stargate

For our experiments we have chosen the Stargate micro-server produced by Crossbow Technology. Stargate is a powerful single-board computer with a high capacity for communicating and for processing the signals detected by the sensor nodes, and it can be properly used to realize applications oriented towards wireless sensor networks [5]. It uses an Intel XScale 400 MHz (PXA255) processor developed within the Intel Ubiquitous Computing research program.

Vision algorithms
A WSN-based video surveillance system has to identify in real time moving objects and scene changes, such as the introduction of new stationary objects in the site. The system should determine whether changes are temporary or permanent modifications and then choose and apply a strategy. It is important that low-level nodes detect objects of interest and send to the reasoner only the information really useful for understanding the scene. For example, low-level nodes have to classify image pixels as foreground or background in order to detect moving objects, and extract some features in order to characterize them. Moreover, to rapidly identify suspicious objects and highlight new stationary objects introduced in the scene, an additional class, called midground, can be used [6]. We have developed an algorithm [7] to highlight changes in the scene by distinguishing, at pixel level, between moving objects and objects becoming stationary in the scene. In particular, we have modelled the scene using three different memories, storing short-term, medium-term and long-term changes. In classifying pixels as foreground, midground and background we have taken advantage of the multi-modal nature of the pixels and their temporal evolution. In particular, the midground allows us to put in evidence the new stationary objects in the site, making possible the analysis of suspicious objects (for example, abandoned bags) appearing in the scene. In our algorithm, the background adapts only to permanent changes; the training phase of the models is carried out on-line, and the output precision becomes more and more accurate after some frames. The algorithm has been conceived with particular care to limit computational load and memory occupancy, so as to guarantee real-time operation on a low-level node of a WSN. We have tested our method on the Stargate platforms, and experimental results showed that it could be used in a wider system in which a wireless sensor network monitors the site. The developed technique reasonably deals with waving trees and foreground aperture issues, also considering the computational limits deriving from our approach, in which we chose to process video data on low-computational-power nodes and send only aggregated data over the network. Nevertheless, each "smart" camera provides only local information, so one of the main tasks of the logical reasoner is to merge and correlate the information coming from different geographical locations and infer a more global representation of the events in the site. The reasoner must assign the same identity to the several views of the same object acquired from different cameras and has to track an object when it moves from one camera's field of view to another. This problem of coordinating many cameras is known as "consistent labeling" and may be solved with several approaches. Our research activities aim to solve the problem by geometrical methods.
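The three-memory idea described earlier in this section (short-, medium- and long-term models, with a midground class for objects that become stationary) can be sketched as follows; the learning rates, threshold and labelling rule below are illustrative choices, not the published algorithm [7]:

```python
import numpy as np

class ThreeMemoryModel:
    """Toy per-pixel scene model with short-, medium- and long-term memories
    (running averages with decreasing learning rates)."""

    def __init__(self, first_frame, rates=(0.5, 0.05, 0.005), thr=25.0):
        self.models = [first_frame.astype(float) for _ in rates]
        self.rates = rates
        self.thr = thr

    def update(self, frame):
        f = frame.astype(float)
        for model, rate in zip(self.models, self.rates):
            model += rate * (f - model)  # in-place running average
        # The short-term model (index 0) is updated but unused in this
        # simplified labelling rule.
        medium, long_term = self.models[1], self.models[2]
        far_from_bg = np.abs(f - long_term) > self.thr  # departs from background
        stable = np.abs(f - medium) <= self.thr         # settled for a while
        labels = np.zeros(frame.shape, dtype=np.uint8)  # 0 = background
        labels[far_from_bg & stable] = 1                # 1 = midground (newly stationary)
        labels[far_from_bg & ~stable] = 2               # 2 = foreground (moving)
        return labels

# A pixel that jumps and then stays constant is first labelled foreground,
# then becomes midground as the medium-term memory catches up while the
# long-term one still lags behind.
```

The key design point is the ordering of the learning rates: a change must outlive the medium-term memory's time constant to be promoted from foreground to midground, and only the slow long-term memory ever absorbs it into the background.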
In our approach, at start-up the reasoner iteratively computes the homographic transformation between couples of cameras by matching the appearance models of the moving objects detected at the corresponding nodes. This is done by tracking the objects and using the resulting trajectories to determine the correspondence points in the processed views. Once the transformation is obtained, each correspondence between the two views is known, and it is possible to use this information to label the detected objects in a consistent way and to solve problems of partial/global occlusion or of split/merge of the detected objects. Another topic currently under investigation in our group is the a-posteriori analysis of sensed data. Ideally, the reasoner must be able to infer what is happening in the site from its own experience and knowledge of past events. To ensure this ability, we need to describe the dynamics of the events occurring in the site by maintaining many databases connected to groups of nodes. In our vision, the databases should contain only low-level visual information, so as to dynamically represent the site. The reasoner should then implement filtering and querying capabilities to execute data mining procedures. The information deduced by means of this high-level analysis can then be used, for instance, to detect abnormal or suspicious events in the scene or to determine what the normal response of the system to the occurring events should be.

Conclusions and future works
Intelligent video surveillance is an active research field, and many techniques are still far from the maturity needed for commercial products in sensitive applications. Our research activities are oriented towards the automatic understanding of people's behaviours and gesture recognition, using the aggregated data produced by the nodes of the WSN. Future work includes leveraging content-based image and video retrieval techniques for wide-area distributed video surveillance systems, integrating face detection and facial expression analysis capabilities, and developing algorithms to recognize objects introduced in the environment.

References
[1] T. P. Chen, H. Haussecker, A. Bovyrin, R. Belenov, K. Rodyushkin, A. Kuranov, and V. Eruhimov, "Computer vision workload analysis: case study of video surveillance systems", Intel Technology Journal (Compute-Intensive, Highly Parallel Applications and Uses), 9:109-118, 2005.
[2] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance", Proceedings of the IEEE, 89:1456-1477, 2001.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: real-time surveillance of people and their activities", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:809-830, 2000.
[4] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring", in WSNA '02: Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, pages 88-97, ACM, 2002.
[5] M. Rahimi, R. Baer, O. I. Iroezi, J. C. Garcia, J. Warrior, D. Estrin, and M. Srivastava, "Cyclops: in situ image sensing and interpretation in wireless sensor networks", in SenSys '05: Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems, pages 192-204, ACM, 2005.
[6] S. Apewokin, B. Valentine, L. Wills, S. Wills, and A. Gentile, "Midground object detection in real world video scenes", IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS '07), 2007.
[7] L. Lo Presti and M. La Cascia, "Real-time object detection in embedded video surveillance systems", 9th International Workshop on Image Analysis for Multimedia Interactive Services (in press).
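To make the start-up homography estimation described in this approach concrete, the following is a minimal direct linear transform (DLT) sketch: given point correspondences recovered from tracked object trajectories in two views, it estimates the 3x3 homography by SVD. This is a generic textbook method, not the project's code; the point data and function names are our own illustration.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (lists of (x, y),
    at least 4 non-collinear points) with the direct linear transform."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the arbitrary scale

def project(H, pt):
    """Map a 2-D point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Correspondences would come from tracked object positions in two camera
# views; here they are synthetic, generated from a known homography.
src = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 3)]
true_H = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
dst = [project(true_H, p) for p in src]

H = estimate_homography(src, dst)
print(np.round(H, 3))        # recovers true_H up to numerical precision
```

Once H is known, any detection in one view can be projected into the other to keep object labels consistent across nodes, as described above.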
Challenges and developments in intelligent video surveillance
Sfide e sviluppi nella video sorveglianza intelligente
Virginio Cantoni, Luca Lombardi, Roberto Marmo
University of Pavia, Faculty of Engineering, Laboratory of Computer Vision
Website: http://vision.unipv.it
Sommario
Il laboratorio di Visione Artificiale presso la Facoltà di Ingegneria a Pavia sviluppa le seguenti tematiche della videosorveglianza: 1. conteggio degli autoveicoli e calcolo della velocità; 2. riconoscimento del comportamento di una persona che chiede soccorso; 3. controllo degli accessi a tunnel ferroviari. Le attività di ricerca vengono condotte in collaborazione con aziende interessate a risolvere un preciso problema. Nel seguito verranno discusse, per ogni tematica, le immagini usate e gli algoritmi, con particolare riferimento alle prestazioni. I risultati più innovativi sono stati ottenuti in merito al punto 2, circa l'analisi della sagoma umana, il suo comportamento in un breve intervallo di tempo, l'analisi della comunicazione non verbale per chiedere aiuto e la creazione di tutti gli scenari possibili per la valutazione dei risultati.

Abstract
The Laboratory of Computer Vision of the Faculty of Engineering in Pavia is developing the following research topics in video surveillance: 1. vehicle counting and speed estimation; 2. visual SOS request; 3. unauthorized entry into railway tunnels; 4. a buyer's guide to CCTV. The research is carried out in collaboration with companies that need specific problems solved. In the following we discuss, for each topic, the images used and the algorithms, with particular attention to the evaluation step. The most innovative results were obtained for activity 2, regarding the analysis of the human shape, the nonverbal communication performed when a person issues an SOS request, the short-time behaviour of the human shape, and the performance evaluation derived from various task scenarios.

Vehicle counting
The first, grey-level camera analyzes the traffic flow; the second, colour camera stores video only while traffic is present; a tele-laser measures the velocity of the vehicles in the area. The first camera acts as a trigger for the activation and deactivation of the other systems (Fig. 1).

Fig. 1: an example of entry (lower) and exit (upper) areas used as triggers for vehicle counting.

Two regions of interest are placed on the image through the user interface to establish the entry and exit areas. We compute the mean grey level of the pixels as an activation threshold, and we remove the shadows cast by vehicles travelling in the opposite direction of the two-way street. The trigger signal is turned on when a vehicle finishes crossing the entry area, and the counter is incremented; the trigger can then alert the other systems, and the counter represents the number of vehicles between the two areas. The trigger signal is turned off when a vehicle finishes crossing the exit area, and the counter is decremented; when the counter reaches zero, all systems are turned off. A separate software module classifies vehicle shapes off-line on a few selected frames.

As future research, we will study vehicle classification using the pixel area and the Hough transform, extracting the vertical lines due to the rear structure of the vehicle and the diagonal lines due to its up-slanted boundaries.

Visual SOS request
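A rough sketch of the kind of pipeline this section describes: per-frame foreground masks from background subtraction, area and centroid features of the detected region, and a short-time rule over the centroid trajectory to flag the jumping behaviour. The toy masks, the threshold value and the oscillation rule below are our own simplified stand-ins, not the published algorithm.

```python
import numpy as np

def features(mask):
    """Area and centroid of the foreground region (our simplified version
    of the per-frame rectangle features)."""
    ys, xs = np.nonzero(mask)
    return len(xs), (xs.mean(), ys.mean())

def sos_detected(centroids, min_swing=5.0):
    """Illustrative rule (threshold is ours): flag an SOS request when the
    vertical centroid oscillates strongly over a short window, as happens
    when a person jumps repeatedly while shaking the arms."""
    ys = np.array([c[1] for c in centroids])
    return ys.max() - ys.min() >= min_swing

# Toy foreground masks: a 'person' blob that moves up and down (jumping).
frames = []
for offset in [0, 6, 0, 6, 0]:
    mask = np.zeros((40, 30), dtype=np.uint8)
    mask[20 - offset:30 - offset, 10:16] = 1   # background-subtraction output
    frames.append(mask)

cents = [features(f)[1] for f in frames]
print(sos_detected(cents))   # vertical swing of 6 pixels exceeds the threshold
```

A real system would of course also gate the rule on duration and on the person facing the camera, as the section explains.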
In paper [1] we propose an approach for detecting a person's SOS request in an intelligent video surveillance system, a form of behavioural analysis. We assume that a public outdoor environment contains many cameras, together with specific signs inviting people to look at the cameras, so we need to consider only the front-side view of the human body. The case studied is the following: when a person wants to attract attention for an SOS request, he or she has to look at the camera and perform a particular body movement, such as shaking the arms and jumping for a certain time. In this way a form of nonverbal communication based on body language takes place (Fig. 2). After simple background subtraction and shadow detection, we draw a rectangle around the human body.

Fig. 2: an example of a person shaking the arms and jumping to issue an SOS request.

Two geometric features are extracted from each rectangle: the area and the centroid, i.e. the cartesian coordinates of the centre of mass of the region. Particular care was taken over performance evaluation, based on 3487 colour images (24-bit RGB, 480x360 pixels) of two outdoor environments. Average accuracy is evaluated at 77%. We consider scenes with different persons (e.g. male, female, small, tall and so forth) at 5.7, 8.7, 11 and 13 metres from the camera. We also consider scenes with people walking; people walking in groups and speaking together; and three actors crossing the scene, performing an SOS request and leaving. As future research, we will study the movements of groups of people who jointly create an SOS request.

Unauthorized entry in tunnel
The purpose is to detect unauthorized intrusions into railway tunnels by vagrants, suicide attempts or even terrorist attacks. We automatically recognize the 3D shape features of a human body using template matching based on primary models of the head and shoulders (Fig. 3). A set of models is built by resizing the primary model. In this way it is possible to exclude train movements and noise such as tree branches or leaves moving in the wind. For each column we count the edges between the grey and black areas, and we count how many adjacent columns assume similar values. We compare these values with the corresponding measures on the model set, and the decision about the presence of a body is based on thresholds. As future research we will adopt models of animal heads.

Fig. 3: an example of head-and-shoulder templates (left) used to recognize the human body (right).

Buyer's guide CCTV
A wide variety of systems is available, so buyers are easily confused. We are writing a checklist of the key factors to think through when deciding which type of system and equipment is right for one's individual needs. A specific weight is associated with each key factor, so a weighted sum can return a measure of the convenience of each choice.

Acknowledgements
This work has been developed with the financial support of the MIUR-PRIN 2006 funding "Detection and classification of complex events in real environments using reconfigurable sensor networks".

References
[1] V. Cantoni, R. Marmo, and M. Zemblini, "Video Surveillance and SOS Request", 14th International Conference on Image Analysis and Processing (ICIAP 2007), pp. 560-565.
Under the patronage of the Università di Modena e Reggio Emilia
Dipartimento di Ingegneria dell'Informazione
Dipartimento di Scienze e Metodi dell'Ingegneria

Our thanks to the sponsors:
CRIS (Centro Interdipartimentale di Ricerca sulla Sicurezza)