Print Slides - IfIS - Technische Universität Braunschweig
Transcription
Print Slides - IfIS - Technische Universität Braunschweig
4/7/2016 0 Organizational Issues • Lecture – 07.04.2016 – 14.07.2016 – 09:45-12:15 (approx. 2 lecture hours with a break) – Exercises, detours, and homeworks Multimedia Databases • 5 Credits • Exams Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de – Oral exam – Achieving more than 50% in homework points is advised Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 0 Organizational Issues 0 Organizational Issues – Castelli/Bergman: Image Databases, Wiley, 2002 • Recommended literature – Schmitt: Ähnlichkeitssuche in Multimedia-Datenbanken, Oldenbourg, 2005 – Khoshafian/Baker: Multimedia and Imaging Databases, Morgan Kaufmann, 1996 – Steinmetz: Multimedia-Technologie: Grundlagen, Komponenten und Systeme, Springer, 1999 Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig – Sometimes: original papers (on our Web page) 3 0 Organizational Issues Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4 1 Introduction • Course Web page 1 Introduction – http://www.ifis.cs.tu-bs.de/teaching/ss-16/mmdb – Contains slides, exercises, related papers and a video of the lecture – Any questions? Just drop us an email… Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2 1.1 What are multimedia databases? 1.2 Multimedia database applications 1.3 Evaluation of retrieval techniques 5 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6 1 4/7/2016 1.1 Multimedia Databases 1.1 Basic Definitions • What are multimedia databases (MMDB)? • Multimedia – Databases + multimedia = MMDB – The concept of multimedia expresses the integration of different digital media types – The integration is usually performed in a document – Basic media types are text, image, vector graphics, audio and video • Key words: databases and multimedia • We already know databases, so what is multimedia? Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7 1.1 Data Types Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8 1.1 Documents • Text • Document types – Text data, Spreadsheets, E-Mail, … – Media objects are documents which are of only one type (not necessarily text) – Multimedia objects are general documents which allow an arbitrary combination of different types • Image – Photos (Bitmaps), Vector graphics, CAD, … • Audio – Speech- and music records, annotations, wave files, MIDI, MP3, … • Multimedia data is transferred through the use of a medium • Video – Dynamical image record, frame-sequences, MPEG, AVI, … Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9 1.1 Basic Definitions 10 1.1 Medium Example • Medium • Book – A medium is a carrier of information in a communication connection – It is independent of the transported information – The used medium can also be changed during information transfer Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig – Communication between author and reader – Independent from content – Hierarchically built on text and images – Reading out loud represents medium change to sound/audio 11 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12 2 4/7/2016 1.1 Medium Classification 1.1 Multimedia Databases • Based on receiver type • We now have seen… – Visual/optical medium – Acoustic mediums – Haptical medium – through tactile senses – Olfactory medium – through smell – Gustatory medium – through taste – …what multimedia is – …and how it is transported (through some medium) • But… why do we need databases? – Most important operations of databases are data storage and data retrieval • Based on time – Dynamic – Static Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1.1 Multimedia Databases 1.1 Multimedia Databases • Persistent storage of multimedia data, e.g.: • Stand-alone vs. database storage model? – Text documents – Vector graphics, CAD – Images, audio, video – Special retrieval functionality as well as corresponding optimization can be provided in both cases… – But in the second case we also get the general advantages of databases • Content-based retrieval – Efficient content-based search – Standardization of meta-data (e. g., MPEG-7, MPEG-21) Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15 1.1 Historical Overview Declarative query language Orthogonal combination of the query functionality Query optimization, Index structures Transaction management, recovery ... Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16 • Relational Databases use the data type BLOB (binary large object) Retrieval procedures for text documents (Information Retrieval) – Un-interpreted data – Retrieval through metadata like e.g., file name, size, author, … 1970 Relational Databases and SQL 1980 • Object-relational extensions feature enhanced retrieval functionality Presence of multimedia objects intensifies – Semantic search – IBM DB2 Extenders, Oracle Cartridges, … – Integration in DB through UDFs, UDTs, Stored Procedures, … SQL-92 introduces BLOBs First Multimedia-Databases 2000 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig • • • • • 1.1 Commercial Systems 1960 1990 14 17 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18 3 4/7/2016 1.1 Requirements 1.1 Requirements • Requirements for multimedia databases (Christodoulakis, 1985) • To comply with these requirements the following aspects need to be considered – Classical database functionality – Maintenance of unformatted data – Consideration of special storage and presentation devices Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig – Software architecture – new or extension of existing databases? – Content addressing – identification of the objects through content-based features – Performance – improvements using indexes, optimization, etc. 19 1.1 Requirements • Retrieval: choosing between data objects. Based on… – a SELECT condition (exact match) – or a defined similarity connection (best match) • Retrieval may also cover the delivery of the results to the user 21 1.1 Retrieval Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22 1.1 Content-based Retrieval • Closer look at the search functionality • “Retrieve all images showing a sunset !” – „Semantic“ search functionality – Orthogonal integration of classical and extended functionality – Search does not directly access the media objects – Extraction, normalization and indexing of contentbased features – Meaningful similarity/distance measures Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20 1.1 Retrieval – User interface – how should the user interact with the system? Separate structure from content! – Information extraction – (automatic) generation of content-based features – Storage devices – very large storage capacity, redundancy control and compression – Information retrieval – integration of some extended search functionality Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig • What exactly do these images have in common? 23 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24 4 4/7/2016 1.1 Schematic View 1.1 Detailed View Query • Usually 2 main steps – Example: image databases Image collection 3. Query preparation Image analysis and feature extraction Digitization Result 5. Result preparation Query plan & feature values Image database Result data 4. Similarity computation & query processing Creating the database Feature values Raw & relational data Querying the database Image analysis and feature extraction Digitization 2. Extraction of features Similarity search Search result Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig MM-Database 1. Insert into the database MM-Objects + relational data 25 1.1 More Detailed View Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26 1.2 Applications Result Query preparation Normalization Segmentation Feature extraction Optimization Feature values Query plan Relevance feedback Query Raw data Image query Similarity computation • Lots of multimedia content on the Web Result preparation – Social networking e.g., Facebook, MySpace, Hi5, etc. – Photo sharing e.g., Flickr, Photobucket, Instagram, Picasa, etc. – Video sharing e.g.,YouTube, Metacafe, blip.tv, Liveleak, etc. Medium transformation Format transformation Result data Query processing Feature values MM-Database Feature index BLOBs/CLOBs Relational DB Metadata Profile Structure data Feature extraction Pre-processing Feature recognition Feature preparation Decomposition Normalization Segmentation MM-Objects Relational data Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27 1.2 Applications Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28 1.2 Applications • Cameras are everywhere • Picasa face recognition – In London “there are at least 500,000 cameras in the city, and one study showed that in a single day a person could expect to be filmed 300 times” Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30 5 4/7/2016 1.2 Applications 1.2 Applications • Picasa, face recognition example Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig • 2007 :learning phase • 2015: ~ 98% correctness 31 1.2 Applications Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32 1.2 Sample Scenario • Consider a police investigation of a large-scale drug operation • Possible generated data: • Picasa example – Video data captured by surveillance cameras – Audio data captured – Image data consisting of still photographs taken by investigators – Structured relational data containing background information – Geographic information system data Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33 1.2 Sample Scenario Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34 1.2 Sample Scenario • Possible queries – Video Query: (Murder case) – Image query by keywords: police officer wants to examine pictures of “Tony Soprano” • The police assumes that the killer must have interacted with the victim in the near past • Query: “Find all video segments from last week in which Jerry appears” • Query: “retrieve all images from the image library in which ‘Tony Soprano’ appears" – Image query by example: the police officer has a photograph and wants to find the identity of the person in the picture • He hopes that someone else has already tagged another photo of this person • Query: “retrieve all images from the database in which the person appearing in the (currently displayed) photograph appears” Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36 6 4/7/2016 1.2 Sample Scenario 1.2 Characteristics – Heterogeneous Multimedia Query: • Find all individuals who have been photographed with “Tony Soprano” and who have been convicted of attempted murder in New Jersey and who have recently had electronic fund transfers made into their bank accounts from ABC Corp. 37 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1.2 Example • … so there are different types of queries … what about the MMDB characteristics? – Static: high number of search queries (read access), few modifications of the data – Dynamic: often modifications of the data – Passive: database reacts only at requests from outside – Active: the functionality of the database leads to operations at application level – Standard search: queries are answered through the use of metadata e.g., Google-image search – Retrieval functionality: content based search on the multimedia repository e.g., Picasa face recognition Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38 1.2 Example – Coat of arms: Possible hit in a multimedia database • Passive static retrieval – Art historical use case 39 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1.2 Example Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40 1.2 Example • Active dynamic retrieval • Standard search – Wetter warning through evaluation of satellite photos – Queries are answered through the use of metadata e.g., Google-image search Extraction Typhoon-Warning for the Philippines Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42 7 4/7/2016 1.2 Example 1.3 Retrieval Evaluation • Retrieval functionality • Basic evaluation of retrieval techniques – Content based e.g., Picasa face recognition – Efficiency of the system • Efficient utilization of system resources • Scalable also over big collections – Effectivity of the retrieval process • High quality of the result • Meaningful usage of the system • What is more important? An effective retrieval process or an efficient one? Depends on the application! Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43 1.3 Evaluating Efficiency Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44 1.3 Evaluating Effectivity • Characteristic values to measure efficiency are e.g.: • Measuring effectivity is more difficult and always depending on the query • We need to define some query-dependent evaluation measures! – Memory usage – CPU-time – Number of I/O-Operations – Response time – Objective quality metrics – Independent from the querying interface and the retrieval procedure • Depends on the (Hardware-) environment • Allows for comparing different systems/algorithms • Goal: the system should be efficient enough! Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45 1.3 Evaluating Effectivity 46 1.3 Relevance • Effectivity can be measured regarding an explicit query • Relevance as a measure for retrieval: each document will be binary classified as relevant or irrelevant with respect to the query – Main focus on evaluating the behavior of the system with respect to a query – Relevance of the result set – This classification is manually performed by “experts” – The response of the system to the query will be compared to this classification • But effectivity also needs to consider implicit information needs • Compare the obtained response with the “ideal” result – Main focus on evaluating the usefulness, usability and user friendliness of the system – Not relevant for this lecure! Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48 8 4/7/2016 1.3 Involved Sets 1.3 False Positives • Then apply the automatic retrieval system: • False positives: irrelevant documents, classified as relevant by the system collection collection – False alarms Experts say: this is relevant ca searched for fa found cd found (= query result) searched for (= relevant) fd • Needlessly increase the result set • Usually inevitable (ambiguity) • Can be easily eliminated by the user The automatic retrieval says: this is relevant 49 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1.3 False Negatives 1.3 Remaining Sets • False negatives: relevant documents classified by the system as irrelevant • Correct positives (correct alarms) – All documents correctly classified by the system as relevant collection – False dismissals 50 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig fd ca searched for fa found • Correct negatives (correct dismissals) • Dangerous, since they cd can’t be detected easily by the user – Are there “better” documents in the collection which the system didn’t return? – False alarms are usually not as bad as false dismissals – All documents correctly classified by the system as irrelevant • All sets are disjunctive and their reunion is the entire document collection collection fd ca searched for fa found cd Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51 1.3 Overview • Relevant results = fd + ca – Handpicked by experts! • Retrieved results = ca + fa – Retrieved by the system relevant irrelevant relevant ca fd irrelevant fa cd Userevaluation 52 1.3 Interpretation • Confusion matrix: visualizes the effectivity of an algorithm Systemevaluation Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig collection fd ca searched for fa found cd Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53 Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54 9 4/7/2016 1.3 Precision 1.3 Recall • Precision measures the ratio of correctly returned documents relative to all returned documents • Recall measures the ratio of correctly returned documents relative to all relevant documents – R = ca / (ca + fd) collection – P = ca / (ca + fa) collection fd fd ca searched for fa • Value between [0, 1] (1 representing the best value) • High number of false drops mean worse results found cd • Value between [0, 1] (1 representing the best value) • High number of false alarms mean worse results Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55 1.3 Precision-Recall Analysis • Both measures only make sense, if considered at the same time • Can be balanced by tuning the system – E.g., smaller result sets lead to better precision rates at the cost of recall cd 56 1.3 Actual Evaluation collection fd ca searched for – Precision is easy to calculate fa found cd • Dismissals (not returned elements) are not so trivial to divide in cd und fd, because the entire collection has to be classified – Recall is difficult to calculate • Standardized Benchmarks • Usually the average precision-recall of more queries is considered (macro evaluation) Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig fa found Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig • Alarms (returned elements) divided in ca and fa – E.g., get perfect recall by simply returning all documents, but then the precision is extremely low… ca searched for – Provided connections and queries – Annotated result sets 57 1.3 Example Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58 1.3 Representation • Precision-Recall-Curves Query fa ca fd cd P R Q1 8 2 6 4 0,2 0,25 Q2 2 8 2 8 0,8 0,8 0,5 0,525 Average Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig Average precision of the system 3 at a recall-level of 0,2 System 1 System 2 System 3 59 Which system is the best? What is more important: recall or precision? Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60 10 4/7/2016 Next lecture • • • • Retrieval of images by color Introduction to color spaces Color histograms Matching Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61 11