Balloon Extraction from Complex Comic Books Using Edge Detection and Histogram Scoring

João Miguel Coronha Correia

Submitted to the University of Beira Interior in partial fulfillment of the requirements for the degree of Master of Science in Tecnologia e Sistemas de Informação

Supervised by Prof. Dr. Abel João Padrão Gomes
Departamento de Informática
Universidade da Beira Interior
Covilhã, Portugal
http://www.di.ubi.pt

Acknowledgments

I would like to express my gratitude to my advisor, Prof. Dr. Abel João Padrão Gomes, for his scientific guidance during the development of this master thesis and related affairs.

João Correia
Covilhã, Portugal
October 22, 2012

Resumo

Over the years, several disruptive technologies have changed the way we access the content and information present in the various media and forms of cultural expression. Recently, with the growth of the market for portable devices such as smartphones, tablets, laptops and ultrabooks, it became necessary to adapt content to this reduced screen format. Adaptations of books, films or music were relatively fast and successful because their intrinsic characteristics could be modified for a correct representation on small, touch-operated screens.

Text books were particularly successful in these new formats. Blocks of text easily mold themselves to any space, and the font size can be adjusted to allow greater visibility. But some types of books did not adapt so well, namely comic books and comic magazines. Their structure, composed of art and text, is not malleable, and their complex forms are not as adaptable as those of text books. Nevertheless, there is an ever larger potential market of young people with access to smart devices who are, at the same time, the preferred target audience for comic books. It therefore becomes relevant to find a way to adapt the content to the devices as well as possible.

One of the biggest problems in the adaptation to mobile devices is the readability of comic book text when it is rendered smaller than normal. Since the text is represented as part of the drawing, inside balloons, how can only those relevant elements be found? How can the balloons be highlighted, making them more visible, or larger, without losing the visual context within the page? How can an excessive zoom level be avoided while still allowing the content to be discerned? How can this be done for comic books more complex than web comics with three panels and minimalist art styles?

This dissertation presents a possible solution to this problem, by introducing an algorithm for identifying balloons within comic book pages, with a procedure applicable to any type of comic book, even the most elaborate ones.

Abstract

Over the years, several disruptive technologies have changed the way we access content and information on various media and means of cultural expression. Recently, with the expansion of the market for hand held devices like smartphones, tablets, laptops and ultrabooks, it has become necessary to adapt content to this reduced screen size. Changes in books, movies or music were relatively fast and successful because their intrinsic characteristics could be modified so that they were correctly represented on small screens and handled with tactile interfaces.
Text books were particularly successful in these new formats. Text blocks are easily molded to any space, and the font size can be adjusted to allow for better visibility. But some types of books were not so easily adapted, namely comic books. Their structure, with art and text, is not adjustable, and their complex shapes are not as easy to change as text books. Nevertheless, there is a growing potential market of young people with access to smart devices who are, simultaneously, the preferential target audience for comic books. It is relevant, then, to find ways to adapt the content to the devices as well as possible. One of the greatest problems in the adaptation to mobile devices is the readability of comic book text when viewing it at smaller than normal sizes. Since the text is embedded in the art, inside balloons, how can just those relevant elements be found? How can the balloons be enhanced, making them more visible, or bigger, without losing visual context inside the page? How can enlarging the image with an excessive zoom level be avoided while still allowing the content to be understood? How can this be done for comic books more complex than three-panel web comics with minimalistic art styles? This dissertation presents a possible solution to this problem, by introducing an algorithm to identify balloons inside comic book pages, with a method that works for any type of comic book, even the most complex ones.

Contents

Resumo
Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation and Objectives
  1.2 Research Contribution
  1.3 Organization of the Thesis
  1.4 Publications
  1.5 Target Audience
2 Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring
  2.1 Introduction
  2.2 Page Layout and Terminology
  2.3 Balloon Extraction Algorithm
    2.3.1 Page Segmentation
      2.3.1.1 Conversion to gray scale
      2.3.1.2 Sobel-based edge detection
      2.3.1.3 Negative pages
      2.3.1.4 Flood fill and region extraction
    2.3.2 Balloon extraction
      2.3.2.1 Region culling
      2.3.2.2 Region scoring
      2.3.2.3 Region sorting
      2.3.2.4 Region filtering
  2.4 Experimental Results
    2.4.1 Analysis of results
    2.4.2 Comparison to other algorithms
    2.4.3 OCR
  2.5 Final Remarks
3 Conclusions
References

List of Figures

2.1 Alpha Flight 6, January 1984, page 9 [1].
2.2 Comic book page components.
2.3 (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right) Grayscale image.
2.4 (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized book page.
2.5 Flood fill method comparison. (left) Standard flood fill extraction; (right) modified flood fill extraction.
2.6 (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (non-balloon) region.
2.7 Balloon regions extracted from the book page shown in Fig. 2.3.
2.8 Special case balloons that were successfully recognized.

List of Tables

2.1 Criteria for size-based region culling.
2.2 Extraction results for [2]
2.3 Extraction results for [3]
2.4 Extraction results for [4]
2.5 Extraction results for [5]
2.6 Extraction results for [6]
2.7 Extraction results for [7]
2.8 Extraction results for [8]
2.9 Extraction results for [9]
2.10 Extraction results for [10]
2.11 Extraction results for [11]
2.12 Extraction results for [12]
2.13 Extraction results for [13]

Chapter 1
Introduction

A hand held device, or mobile device, or even hand held computer, is a small computing device defined by a small screen size and touch input or a minimal keyboard. The category comprises smartphones, tablets and PDAs. These types of devices started to appear in mass production at the turn of the century, driven by a need for always accessible communication and content. Early usage scenarios, beyond establishing phone calls, involved text messaging, e-mail or simple web browsing, but with the increase in network bandwidth, CPU power and battery life, those usage scenarios grew more complex and now encompass everything from book reading to watching movies or video chat. This device market has seen explosive growth in the last few years, particularly since the introduction of Apple's iPad. While some types of content, like text books, are perfect matches for these devices, others require some trade-offs when viewed on devices with reduced screen sizes. One of these content types is comic books.
Comic books, as an art form, have been present in human cultures for many years, with some possible representations as far back as the Middle Ages [14], and are a popular form of entertainment and storytelling around the world. With the convenience of hand held devices, it was logical to expect comic books to be widely used on those devices, given the almost perfect match between the target audiences for comic books and for hand held devices. Yet, despite the fact that the major publishers have recently begun to distribute digital versions of their comics [15][16], the adoption has been, at best, slow when compared to e-books, music or movies. This is, in no small measure, caused by the inadequacy of the traditional comic book for the small screen size. Currently, one of two things happens when you try to read a comic book on a hand held device:

- The digital comic book has been pre-processed and embedded information regarding text has been included in the file.

- The digital comic book has not been pre-processed and, in order to read it correctly, the reader has to manually zoom in on all the text elements of the comic to be able to read them and comprehend the story.

Both of these approaches have drawbacks. The first requires more pre-publishing time and more manual processing before new comic books reach the market, making them more expensive than necessary and delaying distribution; the second makes the reading experience very user-unfriendly and ultimately leads potential readers to not choose these devices. Since digital comic books have multiple advantages over traditional paper comic books, like easier storage, global availability, potentially lower prices and faster distribution, better searching and indexing and, even, better quality, it is quite distressing not to take advantage of them. Of no lesser importance is the fact that digital comic books allow younger (or new) readers access to comics that are no longer available in paper format [17].

As in other areas like automatic document digitization [18] or video processing [19], there are digital image processing techniques that can be applied to comic books to provide some of the lacking functionality, namely, correctly identifying the elements of the comic book image that contain text: the balloons. Relying on edge detection methods such as Sobel, together with flood fill and histogram analysis [20], the algorithm presented in this thesis answers the problem of how to extract the balloons from comic book pages, in order to improve usability, provide a better experience to the end user, and allow an adoption of digital comic books comparable to that of text e-books.

1.1 Motivation and Objectives

The main goal of this thesis is to demonstrate the feasibility of applying digital image processing techniques to comic book images in order to correctly identify and extract the balloons and, in doing so, make it possible to use small-screen devices to adequately read comic books, in the same way those devices are used to read traditional text books today. Existing proposed solutions to this problem have almost always focused on simple web comics, which traditionally have a much simpler art style and are much smaller in scope when compared to commercial comic books published by companies like Marvel Comics, DC Comics or Dark Horse.
Those approaches have, invariably, produced extremely good results for those types of web comics, but less than adequate results [21] for commercial comic books. Other solutions are targeted at specific types of comics, like manga, and exploit specific characteristics like top-to-bottom text flows for the identification of text balloons [22]. It has been, so far, impossible to find a solution that has an acceptable level of success extracting balloons across all types of comics.

This problem of identifying the balloons is not, by any means, a trivial task [18]. Balloons can have any shape or size, and can be in any location inside the comic book page. Artistic style can make the balloons spread over multiple panels, or overlap each other. Text inside the balloons can have any color or font face, sometimes even inside the same balloon [23][14].

Another important consideration is that hand held devices are battery powered, so any approach to this problem must account for the fact that CPU-intensive operations will drain the battery faster and, as such, result in a bad user experience. This would work against the adoption of these devices for comic book reading.

This has been an insurmountable problem, so much so that, despite the fact that the main publishers have digital distribution systems in place, the digital comic books they sell are pre-processed manually to identify the balloons. This is, in a way, comparable to the medieval practice of copying books by a scribe's hand, and it calls for a workable, automated solution that both enhances the digital comic book experience and allows for the automated indexing and processing expected in the digital age.

Irrespective of all those challenges, the main motivation for tackling this problem stems from the desire to find an enjoyable solution that combines the convenience of hand held devices able to store several thousand comic books with the pleasure of having comic books to read available almost anywhere, and finally to replace dead-tree comics with equally good digital versions.

1.2 Research Contribution

The major contribution of this thesis is an automatic balloon identification and extraction algorithm based on digital image processing techniques.

1.3 Organization of the Thesis

This document has the following chapters:

- Chapter 1. This is the current chapter, which introduces the subject of the thesis, its motivation and objectives, the problem solved in this thesis, the organization of the thesis, and the audience to which the thesis is addressed.

- Chapter 2. This chapter is the core of this thesis, because it describes in detail the algorithm to extract balloons from comic book pages. This algorithm is based on image processing techniques, and applies to any sort of comics.

- Chapter 3. This is the last chapter of this thesis, where relevant conclusions are drawn and possible future work is outlined.

1.4 Publications

Resulting from this work, the following paper has been submitted for publication:

J. Correia and A. Gomes. Complex Comic Book Balloon Extraction Using Edge Detection And Histogram Scoring.

1.5 Target Audience

The target audience of this thesis includes, but is not limited to, software developers for mobile platforms, comic book publishers and creators, digital imaging researchers, and mobile device content publishers and distributors.
Chapter 2
Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring

This chapter proposes an algorithm based on the Sobel operator to identify balloons in comic book pages. Unlike other approaches, this method works on colored and complex comic book images, as well as comic strips, without making any assumptions regarding the line continuity of the image, the orientation of the text or the color depth of the image. Each comic book page is input to the Sobel operator, then each closed region of the page is identified, and afterwards each region is subject to equalization and scoring based on the mean value of its histogram. Experimental test results show that our method significantly improves the rate of correctly detected balloons, and simultaneously decreases the number of false positives, when compared to other methods.

2.1 Introduction

Textual information present in comic books has so far been out of reach of automated indexing because there was no reliable way to extract it from complex comic books. Most existing methods have good results on simple web comics or comic strips, but have much lower success rates on colored and complex comic book pages.

The method presented in this chapter offers a possible approach to tackle this problem. It opens possibilities for information indexing, processing and retrieval similar to what already exists for standard electronic books, being thus also suited to tablets and smartphones. These types of devices typically have small screens, which make reading comic books either cumbersome (having to zoom in on each portion of the page) or tiresome (straining the eyesight to read the small text). By applying our method, applications only need to enhance or zoom in on the balloons, without loss of visual context, and without preprocessing requirements at the time of publishing.

Current commercial solutions for digital comic books build on proprietary formats that incorporate textual information manually introduced at the time of publishing. It is clear that this procedure increases file sizes, adds complexity and delays the time-to-market. Instead, by applying the method proposed here, the books can simply be packaged as a set of scanned images. This will make the transition to digital formats even faster, thus also fostering easier adoption of such digital formats on handheld devices.

But correctly identifying the text balloons in a complex comic book page is neither easy nor straightforward. As others [18] have also recognized, text extraction from comic book images is harder than text extraction from traditional images like pictures or documents, because the noise level and the image texture of the drawings (e.g., the sense of depth is not as noticeable as in photographs) are especially problematic. The vast disparity of possible shapes, colors and arrangements of the elements within a given comic book page makes the automated identification of text elements a very hard process. There is no technique or shape for the text elements that is always adhered to, and there can be no assumptions regarding positioning inside the panels or even regarding the shape of the panels themselves. To deal with this problem, the currently proposed solutions fall into three main categories, namely:

- edge based detection,
- blob (or region) based detection,
- connected component based detection.
Some other methods of text extraction from generic images, rather than specifically tailored for comic book images, have also been studied (see, for example, [24] [25]), but they are not so relevant for the problem at hand.

Edge based techniques rely on identifying the edges in the panels and balloons, including the edges of the text glyphs. Cao and Tan [18] presented a generic method for separating text from imagery that, while not specifically designed for comic book images, still provides some acceptable results for simple comic strip images. Such a method relies on the principle that the edges of text glyphs are smaller when compared to the edges or lines of the surrounding image, so that balloons are the regions in which the smaller edges are located. Yuan and Tan [26] used the same principle of text size to enclose the text of each balloon inside a bounding box. For that, they used a Canny edge detector and other image filters to separate the text out from the rest of the image.

It is clear that the above methods produce a number of false positives, that is, a number of non-textual edges are mistaken for textual edges. For example, hair strands growing out from the skin can be mistaken for text glyphs in a comic book page. Grover et al. [27] present a possible solution to this problem that consists in using a text extraction method based on the curvature of the edges. In fact, the accentuated curvature of the text elements allows us to distinguish between small non-textual edges and small textual edges, and this is something which could be adapted for extracting text from balloons in comic books.

Interestingly, all of those methods produce fairly good results for simple images such as those of web comics, but not for commercial comic books in which the artistry is complex. The assumption for them to work properly is that the edges are well delineated, but such an ideal is not found in the complex images of commercial comic books, where many crossing lines, as well as balloons overlapping across multiple panels and the gutter, may populate book pages. Also, there are no guarantees that the balloon boundary is completely enclosed and connected. Following [28], our method makes use of a Sobel edge detector, which allows us to overcome the problems mentioned above. More specifically, the Sobel operator produces delineated edges even when such edges are not well defined in the original image, this being achieved by identifying gradient changes and enhancing the boundaries of regions in the image.

On the other hand, blob detection methods depend on identifying shapes and detecting the contrast between text and background. Brahma [21] follows an approach that is based on shape identification, comparing the shapes of a dictionary of letters and symbols with the binarized images in order to locate the text elements. Brahma ends up concluding that the method does not produce results adequate enough for electronic comic books to replace traditional paper comics.

Figure 2.1: Alpha Flight 6, January 1984, page 9 [1].

In [22], blob identification is also based on the rejection of blobs with reference to their size, this screening resulting in a list of potential text blobs. Once again, these methods produce good results on simple comic book images, but fail in comics with balloons that span multiple panels or in images with oddly shaped balloon elements.
Finally, connected component based methods rely on the assumption of continuity between the outer edges of the image and every other colored pixel, with the exception of text pixels. The method works by tracking all pixels different from a background color (usually white), eliminating those background pixels from the edge of the image towards the center. Of course, the downside is that, even for simple images, there is only a small probability of all pixels being connected in this way, which makes it impossible to distinguish between text pixels and background pixels. From empirical evidence regarding complex and colored comic book pages, the probability of this connectivity holding in a comic book page is zero in practice.

Figure 2.2: Comic book page components.

But when one applies a connected component based method to even simple web comic images, some line noise (i.e., left-over lines) may occur in discontinuity areas. Guo et al. [29] suggest going over the left-over areas and applying a width/height threshold to the bounding boxes of the remaining connected components in order to identify text areas. But Guo et al. recognize that the result was acceptable for 80% of the samples taken into account, in spite of the persistent left-over noise artifacts. In [30], panels and text elements are extracted from comic pages by means of a connected component based method and histogram analysis to identify text areas. Their method depends strongly on text orientation, because it checks the horizontal pixel density against the vertical pixel density to identify text regions. In fact, if the text has any other orientation, the ratio of horizontal pixel density to vertical pixel density will fall outside of the pre-defined threshold.

In the context of image processing, Sushma and Padmaja [31] compare edge based methods and connected component methods on color images, and specifically test those methods under different lighting conditions. They conclude that edge based methods frequently produce more robust results and are considered superior to connected component methods, a conclusion that can also be extended to the problem at hand.

In short, those three types of approaches produce good results when they are applied to simple images like web comics, but not to complex images such as comic book pages. Edge detection methods depend heavily on the principle of perfect line separation, thus they fail when there are many intersections and text overlaps. Connected component methods produce noise artifacts which are often confused with text glyphs during OCR (optical character recognition) processing. With respect to region detection based methods, one assumes that the text glyphs are always of the same shape, which is not true, or that the text glyphs have only horizontal orientation and are not overlapped by other elements, which is not true either. Despite that, each category of algorithms has interesting techniques, like the histogram analysis of blob-based methods or the edge detection of edge-based methods. Edge detection is indeed adequate for the type of images in comic books because the drawings have defined edges between all color areas, either through steep gradient changes, easily detectable by derivative operators like Sobel, or through strong, thick black lines separating the elements in the drawing. This is a defining characteristic of the comic book medium.
Also, connected component techniques can be used, within each defined area, to identify all the contained pixels, which in turn can be used to calculate the bounding boxes used in region based detection. The method presented in this chapter ends up doing exactly that, taking the strong points of each of the methods described and producing a more robust extractor of text balloons that works on complex comic book images with a success rate comparable to, or even better than, that of each of the other methods applied to simpler images.

It is also worth noting that the method presented in this chapter does not intend to extract the text (that is, perform OCR), but rather to correctly find the text-containing elements of the comic book page, the balloons. This can be used as a first step for performing OCR, but it can also be useful on its own because, when viewing comics on small screen devices, it is often enough to simply zoom in on the balloons rather than actually getting the text inside them. When OCR is actually needed (for indexing, searching, etc.), the method presented here produces images that contain just the balloons and have no image noise, which then permits straightforward text extraction using readily available OCR processing engines.

2.2 Page Layout and Terminology

Since this method is aimed at comic book pages, the elements of each one of these pages are described here to allow a better understanding of the terminology used. For the purpose of this document, a comic book page and a comic vignette (like a web comic strip) are indistinguishable and interchangeable at will. Both contain the same basic elements and, as such, the method can be used on either without compromising the validity of the results.

A page in a comic book is composed of several distinct elements:

- Panel — This is the basic element of a comic book page. Usually, a panel contains a single illustration or drawing. Commonly, but not always, a panel is delimited by a clear boundary that separates it from other panels. Each panel represents a scene or moment in the story. A page can contain one or more panels of diverse shapes.

- Gutter — This is the space between panels. On older comic books or on vignettes, the gutter is usually white and clearly visible. On modern comic books it is barely present, but when it exists, it provides an empty space that separates each panel from its adjacent panels. Artistic freedom exists in modern comic books, so some art intrudes into the gutter, and even flows from one panel to another across it.

- Art — The art consists of the drawings, which are usually made inside panels. Nevertheless, there is artistic leeway that allows for its extension beyond the limits of the panels at the discretion of the artist. Drawings are used to convey the narrative and setting of the story. All colored drawing areas have very well defined borders, either through black contour lines or contrasting colors. In the case of black and white art, the regions are separated by black contour lines or different shades of gray.

- Balloons — These elements are enclosed areas where text is written. They represent speech, thoughts or narrative data. They are usually placed inside panels, but sometimes they extend outside those panels. Their shape is usually rounded for speech balloons and thought balloons, and usually rectangular for narrative data panels. Balloons usually have a tail that points towards the originator of the speech or the thought in the drawing.
- Splash Balloons — The main difference in relation to other balloons is the shape, which is jagged. Usually, a splash balloon conveys an exclamation or dramatic text, which is very common in manga books (Japanese comic books).

Traditionally, balloons have a white background and black text. In fact, this is the only feature common to all types of comic books. Some exceptions do exist, however, usually to lend a different quality to the voice or thought process of the comic book character. This element is created to be as clearly visible and legible as possible, as well as to draw the attention of the reader, who must interpret it quickly and without ambiguity. On a more practical level, this translates into text balloons that are brighter than ordinary regions in the page. Obviously, this helps the reader to follow the sequence of the story.

Regarding the overall composition, comic books have such a diversity of styles that there are no guarantees that all elements are present in a given book page. Also, no assumptions can be made concerning the shape, size or disposition of those elements. In fact, some comic books have whole pages with nothing in them as part of the narrative [1]. This formal freedom can show up in other forms, such as the drawing of panels within panels within panels, or deliberately overlapping balloons. Also, there are some highly distinctive styles among European, American and Japanese comic books. Those are the predominant types of comic books existing today, and each has specific ways of using the comic book elements. For example, American comic books have practically no gutter between panels, and the art elements may flow from one panel to another, while Japanese comics, or manga, have different panel layouts, with long, dramatic panels covering the width or height of the pages occurring more often.

2.3 Balloon Extraction Algorithm

Our algorithm belongs to the category of edge based algorithms, while at the same time using the histogram analysis common in blob-based algorithms. The balloon extraction algorithm described here is essentially a two-stage algorithm. The first stage is an image segmentation algorithm that divides each comic book page into regions. The second stage basically filters out the regions corresponding to balloons.

The only assumption made by the algorithm is that the balloons in the images have a white background (or a very bright background) and black text or, alternatively, that they are composed of a single background color and a single text color. In fact, only residual cases do not follow this rule. This is due to the fact that text balloons are made to be quite legible, i.e., easy for the reader to locate. Naturally, there are a few special cases that fall outside this definition, but we will see how well our algorithm applies to them in the results section.
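To make the two-stage structure concrete before the individual steps are detailed, the following is a compressed end-to-end sketch in Python, assuming OpenCV, NumPy and SciPy are available. The function name, the binarization threshold, the use of connected-component labelling in place of the flood fill, and the omission of the culling and hole-filling refinements are simplifications of what the next subsections describe, not part of the thesis itself.

```python
import cv2
import numpy as np
from scipy import ndimage

def find_balloon_regions(page_path, luminance_threshold=247):
    page = cv2.imread(page_path)                       # original colored page P_i
    gray = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)      # stage 1: grayscale page
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)             # Sobel derivatives
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    edge_mask = np.sqrt(gx ** 2 + gy ** 2) > 128       # illustrative edge threshold
    labels, count = ndimage.label(~edge_mask)          # closed regions between edges
    balloons = []
    for region_id in range(1, count + 1):              # stage 2: score each region
        mask = labels == region_id
        if gray[mask].mean() > luminance_threshold:    # bright regions are balloon candidates
            balloons.append(mask)
    return balloons
```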
2.3.1 Page Segmentation

Let $B = \{P_i\}$, $i = 0, \dots, n$, be a colored comic book with $n$ pages. A comic book page is composed of multiple closed regions, possibly with different colors, but separated by black outlines. The text balloons are particular cases of closed regions, being for that reason also delimited by black outlines. The segmentation of each comic book page $P_i$ into regions involves the following steps:

1. Convert $P_i$ to a grayscale page $P_i^G$.
2. Apply the Sobel operator to $P_i^G$ to get $P_i^S$.
3. Compute the negative page $P_i^N$ of $P_i^S$.
4. Apply an enhanced flood fill operator to all black pixels of $P_i^N$ to obtain all page regions.

By using a Sobel-based edge detector together with a flood fill operator, all regions in a comic book page can thus be separated and extracted.

2.3.1.1 Conversion to gray scale

To ensure that this method can be applied to color images as well as to black and white images, the first step is to apply an operator that converts color images into grayscale images. Since the overall goal is to be able to detect the brightest regions (i.e., balloons) in the image, the conversion to a grayscale representation can be achieved by calculating the luminance of each pixel from its RGB components.

Luminance is a measure of the brightness of a given pixel when adjusted for human vision. The human eye can detect brightness variations better than color variations [20], so the conversion should respect this feature. One way to achieve such a conversion is to use the RGB-to-YCbCr conversion as follows [32]:

\[
\begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix} =
\begin{pmatrix}
 0.2989 &  0.5866 &  0.1145 \\
-0.1687 & -0.3312 &  0.5000 \\
 0.5000 & -0.4183 & -0.0816
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix} .
\quad (2.1)
\]

This allows us, in a simple manner, to obtain color information in terms of luminance and chrominance (YCbCr), rather than just color (RGB). The luminance channel (Y) is simply a grayscale representation of the original image, such that brighter colors are transformed into brighter grays. The other channels (Cb and Cr) can be ignored for the purpose of our algorithm, so they do not have to be calculated; they represent the chrominance of blues and reds, respectively.

Obviously, if the original book pages are already in black and white, or in grayscale, this conversion step can be skipped. It is also clear that color space representations other than YCbCr might be used, as long as they include a channel for luminance [33]. We could even use distinct weights for the values of R, G, and B in the computation of luminance, but, as explained further ahead, this may lead to a need for applying histogram equalization to page regions.

The important thing to retain here is that the computation of luminance should allow us to mimic the way human vision delineates the contours of regions, so that the edges extracted by the algorithm correspond to edges identified by the human eye. Therefore, the grayscale conversion step is crucial to enable the Sobel operator to detect intensity changes (edges) more easily, and also to ensure that the detected edges are closer to what a human would perceive as a border when looking at the image.

Figure 2.3: (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right) Grayscale image.
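As a sketch of this step, assuming the page is held as an 8-bit RGB NumPy array, the luminance channel of (2.1) can be computed directly; the function name is illustrative and only the Y row of the matrix is used.

```python
import numpy as np

def to_luminance(page_rgb: np.ndarray) -> np.ndarray:
    """Convert an 8-bit RGB page to its luminance (Y) channel, per equation (2.1)."""
    r = page_rgb[..., 0].astype(np.float64)
    g = page_rgb[..., 1].astype(np.float64)
    b = page_rgb[..., 2].astype(np.float64)
    y = 0.2989 * r + 0.5866 * g + 0.1145 * b   # first row of the matrix in (2.1)
    return np.clip(np.round(y), 0, 255).astype(np.uint8)
```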
2.3.1.2 Sobel-based edge detection

From the grayscale page image, the edges must be identified and enhanced for further processing. In image processing, an edge is a group of pixels that have a significant difference of brightness in relation to neighbouring pixels. This can be seen as a sudden change (or discontinuity) in the color variation over a given area. These changes exist between regions of an image, or between foreground objects and background objects [28][20].

We used the Sobel operator for edge detection, since it has proved perfectly adequate to accentuate shape contours, not to mention its simplicity. In fact, in contrast with photographic images, art lines drawn in comic books are continuous, which guarantees that the edges found in book pages are precise, in particular those concerning boundaries found in the page image. This also means that there is no need for any kind of image pre-processing using a sharpening filter to better delineate the contours in the images.

The Sobel operator convolves two 3 x 3 kernels with the original image in order to compute approximations to the derivatives in the x-direction and y-direction, which are sensitive to vertical and horizontal intensity changes, respectively [20]. Thus, this operator slides each kernel over the book page, computing the product of the kernel matrix and the 3 x 3 neighborhood matrix of each pixel in the book page. When compared to other edge detection operators, the Sobel operator produces thicker edges [28], which is desirable for this type of images because it eases the detection of region contours, as needed when performing region extraction through the flood fill algorithm. Also, this operator traverses the complete image quickly, because it only requires knowledge of the 3 x 3 neighborhood of each page pixel.

As a side note, we have also tested a Laplacian operator, which is even faster than the Sobel operator, to detect and enhance the edges in the image, but it produced aliasing artifacts on the edges, causing continuity problems across page regions. More specifically, some regions get connected when they in fact lie separate in the page.

2.3.1.3 Negative pages

As shown above, edge detection tries to match the way the human eye perceives colors and brightness, so each book page is first transformed into a grayscale representation immediately before applying the Sobel operator, which increases the contrast of the edges, producing black lines on a white background. To facilitate processing, the negative of each book page is then calculated, resulting in white edges on a black background (right-hand side of Fig. 2.4).

Figure 2.4: (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized book page.
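A minimal sketch of the Sobel and negative steps, assuming an 8-bit grayscale page as input, is shown below. The binarization threshold (32 here) is an illustrative value, not one prescribed by the thesis; the output follows the convention described above of white edges on a black background.

```python
import cv2
import numpy as np

def sobel_negative(gray: np.ndarray, threshold: float = 32.0) -> np.ndarray:
    """Return a negative edge page: white (255) edge pixels on a black background."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # derivative in x (vertical edges)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # derivative in y (horizontal edges)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return np.where(magnitude > threshold, 255, 0).astype(np.uint8)
```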
2.3.1.4 Flood fill and region extraction

Once its negative has been found, each book page can be separated into closed regions. Each region is a set of pixels from the original book page delimited by a number of edges (white pixels in the negative book page).

Careful analysis revealed that the letters inside text balloons are also identified as edges. This is the expected behavior of an edge detection operator, since the text letters can be considered discontinuities (edges) in the image. However, this fact is detrimental to a proper extraction of the balloon regions, because such an extraction would produce the balloons without the letters; in other words, we would obtain balloon regions with holes left by the letters, as illustrated in the left-hand side of Fig. 2.5. Therefore, the flood fill must be adapted to overcome this problem.

Recall that a standard filling algorithm inundates the basin of a region in a progressive manner, for as long as its frontier is not reached by the water. A letter inside a balloon region works as a barrier to the progression of the water. Consequently, flooding a balloon region gives rise to a region with holes. Filling in these holes can be accomplished by first identifying the leftmost pixel (see red pixels in the right-hand side of Fig. 2.5) and the rightmost pixel (see green pixels in the left-hand side of Fig. 2.5) of each row of the flooded region, and then copying all the pixels of the corresponding row of the original colored book page onto a colored balloon region. Thus, at the end of this stage, the extracted regions of a book page are all colored regions.

Figure 2.5: Flood fill method comparison. (left) Standard flood fill extraction; (right) modified flood fill extraction.

2.3.2 Balloon extraction

The page segmentation described above has produced a set $\{R_j\}$, $j = 0, \dots, N$, of regions for each book page, some of which correspond to text balloons. The question then is how to filter out the regions containing text balloons, which corresponds to the second stage of our algorithm, i.e., balloon extraction, and consists of the following steps:

1. Discard regions that are either too large or too small.
2. Score the remaining (mid-sized) regions with respect to their grayscale histograms.
3. Sort the mid-sized regions according to their weighted average luminance.
4. Filter out the balloon regions using an empirical threshold based on the weighted average luminance.

2.3.2.1 Region culling

Having extracted all the regions present in the image, it is necessary to cull a significant number of regions in order to increase the overall performance of the algorithm. During testing, it was found that most comic book pages had between 500 and 1000 separate regions; for example, the book page shown in Figure 2.3 contains 661 regions. Moreover, most of them were either too small (just a couple of pixels) or too large (in height or width) to be considered balloon regions. The rather small regions usually concern art details or single letters, which are irrelevant to the outcome of the algorithm and can thus be discarded straight away. With respect to too large regions, we noted that they usually represent the gutter or extensive background segments, but not balloon regions, so they should also be discarded altogether.

Table 2.1: Criteria for size-based region culling.

                            Too small    Too large
  Relative region width     < 1.5%       > 50%
  Relative region height    < 1.5%       > 50%
  Relative region area      < 1%         > 20%

Therefore, as shown in Table 2.1, the culling of regions is based on one of the following criteria:

- Relative region width;
- Relative region height;
- Relative region area.

If the region width is less than 1.5% or greater than 50% of the page width, the region is considered discardable and thus excluded from the remaining set of regions. These culling percentages also apply to the relative height of each region. Additionally, any region whose area is smaller than 1% or larger than 20% of the page area is discarded. This usually removes 90% of the extracted regions, making the next step of the algorithm even faster.
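A minimal sketch of region extraction and size-based culling is given below, assuming the negative page from the previous step (white edges on black) and the original colored page. Connected-component labelling stands in for the flood fill, the bounding-box area is used as a simple proxy for the region area of Table 2.1, and the row-wise leftmost/rightmost copy fills the holes left by letters, as described above; all names and these simplifications are illustrative.

```python
import numpy as np
from scipy import ndimage

def extract_candidate_regions(negative: np.ndarray, page: np.ndarray):
    """Return colored, hole-filled regions that survive the Table 2.1 culling."""
    h, w = negative.shape
    labels, _ = ndimage.label(negative == 0)      # flood-fill-like labelling of non-edge pixels
    regions = []
    for label_id, obj in enumerate(ndimage.find_objects(labels), start=1):
        rh = obj[0].stop - obj[0].start
        rw = obj[1].stop - obj[1].start
        # Culling criteria from Table 2.1 (relative width, height and area).
        if not (0.015 * w <= rw <= 0.5 * w and 0.015 * h <= rh <= 0.5 * h):
            continue
        if not (0.01 * h * w <= rh * rw <= 0.2 * h * w):
            continue
        # Copy whole rows between the leftmost and rightmost region pixels so
        # that the "holes" left by letters are filled with the original page content.
        region = np.zeros_like(page)
        sub = labels[obj]
        for row in range(rh):
            cols = np.flatnonzero(sub[row] == label_id)
            if cols.size:
                r = obj[0].start + row
                c = obj[1].start
                region[r, c + cols[0]: c + cols[-1] + 1] = page[r, c + cols[0]: c + cols[-1] + 1]
        regions.append(region)
    return regions
```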
Afterwards, let us convert these color regions to grayscale regions because we intuitively know that brighter regions correspond to balloon regions. One way of ranking the remaining regions, and identify the ones most likely to be balloons, is generating a histogram for each region, taking into consideration the overall luminance of the region. Obviously, the luminance of a region depends on the luminance of each one of its pixels, which is given by the Y value in (2.1). Note that Y ∈ [0, 255] ⊂ R, so after calculating the Y -value of every pixel pk of a given region with N pixels, we have to map it into the discrete grayscale of the histogram as follows: G(pk ) = round(Y (pk )) (2.2) so, the corresponding histogram bin scores one more point; for example, if G(pk ) = 231, the bin numbered as 231 will be increased of 1, that is, h(231) = h(231) + 1. (2.3) where h(i), i ∈ [0, 255] ⊂ N, denotes the i-th bin of the histogram. 2.3.2.3 Region sorting So, taking into account that we are dealing once again with grayscale regions, the histogram of each region has a range of values G ∈ [0, 255] ⊂ N (dark – white). It is then clear that a histogram of a brighter region (i.e., a balloon region) has a peak closer to its right hand side 22 CHAPTER 2. BALLOON EXTRACTION FROM COMPLEX COMIC BOOKS Figure 2.6: (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (nonballoon) region. (or white) (Fig. 2.6 Top). Thus, a grayscale region with N pixels can be ranked according to the weighted average luminance: 255 ∑ i × h(i) L= i=0 N (2.4) Since the histogram range values vary from 0 (black) to 255 (white), regions with a higher percentage of white will score more, and darker regions will score less (Fig. 2.6 Bottom). Thus, the higher scoring regions are typically the balloons in the image. The regions whose histograms are shown in Fig. 2.6 represent a balloon (top) with L=254,00 and an ordinary region (bottom), with an L=84,25. 2.3.2.4 Region filtering Experimental results showed that the weighted average luminance L = 247 is the threshold above which a region is considered as a balloon; otherwise, the region will be discarded. Note that the white band [247, 255] represents only 3% of the histogram range. For example, 2.3. BALLOON EXTRACTION ALGORITHM 23 Figure 2.7: Balloon regions extracted from the book page shown in Fig. 2.3. Although not visible here, the regions have a balloon shape indeed. the regions shown in Fig. 2.7 concern the text balloons that were filtered in this manner from the book page depicted in Fig. 2.3. Interestingly, when the ranking produces no results above the threshold L = 247, a different method can be used to identify the correct regions, specifically, color counting. Using the histograms already created for each region, score each region by counting the number of spikes in the histogram. Any region with exactly two spikes is a candidate balloon region (foreground and background colors). Image artifacts like aliasing, bluring or sharpening can cause the number of spikes to change, so care should be taken not to use other image processing filters before creating the histogram. This alternative method deals with almost 24 CHAPTER 2. BALLOON EXTRACTION FROM COMPLEX COMIC BOOKS all cases where balloons are not the brightest areas on the page. Besides, it may happen that, with poor quality scanned images or in old, yellowed paper scanned images, the original image is too dark or has very low contrast. 
2.4 Experimental Results

The testing of our algorithm was performed on a PC powered by an Intel Core i7 860 processor (2.8 GHz clock) with 8 GB of RAM and an ATI 5670 graphics card, running the Windows 7 (64-bit) operating system.

To test and demonstrate the validity of the method presented, a test run with 12 comic books, from different publishers and with different artistic styles, was used. The comic books were scanned as lossless PNG files, with a resolution of 1024x1590 at 150 DPI, and only the first 10 story pages per book were used, making a total of 120 pages of comics analysed. Note that the page numbers in Tables 2.2-2.13 are relative to the actual page number of the book's story. For example, page number 1 might correspond, in a particular book, to page 4, discounting covers, publicity pages, and title or credits pages, since those have no text balloons and are not actually comic book images. Also, when two consecutive pages each show one half of a larger image (i.e., a page spread), they count as only one page, since that is what they actually represent. This was done purely for convenience; had it been done differently, it would not affect the results. As a side note, the average processing time of each book page was 1 to 2 seconds.

The balloons per page were counted manually, and this number was compared with the number of balloons correctly identified by our method. False positives, as well as missed balloons, were also counted. Recall that a false positive is an image region that the method reports as being a balloon containing text but in reality is not, and a missed balloon is a balloon existing in the image but not identified as such. Those two cases are presented separately because they represent the two possible points of failure (resulting usually from either under- or over-tuning of the thresholding parameter). In each table, the Percentage column expresses the false positives (respectively, the missed balloons) as a percentage of the total number of detections, and the Success column is its complement.

Table 2.2: Extraction results for Batman 670 [2].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        6   6   8  10   4   9   6   8   6   7      70
  Actual Balloons   6   6   7   6   4   7   7   6   6   7      62
  False Positives   0   0   1   4   0   2   0   2   0   0       9       12.86    87.14
  Missed Balloons   0   0   0   0   0   0   1   0   0   0       1        1.43    98.57

Table 2.3: Extraction results for Batman 671 [3].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        1   4   4   6  23   1   3   6   8   6      62
  Actual Balloons   1   5   4   6   8   4   4   4   8   6      50
  False Positives   0   0   0   0  15   0   1   2   0   1      19       30.65    69.35
  Missed Balloons   0   1   0   0   0   3   2   0   0   1       7       11.29    88.71

Table 2.4: Extraction results for Batman 672 [4].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        1   8   6   4   8   9   7   5   9   4      61
  Actual Balloons   1   8   6   4  11   9   7   5   9   4      64
  False Positives   1   0   0   0   0   0   0   0   0   0       1        1.64    98.36
  Missed Balloons   1   0   0   0   3   0   0   0   0   0       4        6.56    93.44

Table 2.5: Extraction results for Batman 673 [5].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        2   1   2   1   5   3   8   8   6   4      40
  Actual Balloons   2   1   2   0   2   2   8   8   6   4      35
  False Positives   0   0   0   1   3   1   0   0   0   0       5       12.50    87.50
  Missed Balloons   0   0   0   0   0   0   0   0   0   0       0        0.00   100.00
Table 2.6: Extraction results for Batman 674 [6].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        4  16   4   6   3   5   9  11   2   3      63
  Actual Balloons   4   5   4   6   3   5   9  11   2   3      52
  False Positives   0  11   0   0   0   0   0   0   0   0      11       17.46    82.54
  Missed Balloons   0   0   0   0   0   0   0   0   0   0       0        0.00   100.00

Table 2.7: Extraction results for Amazing Spider-Man 643 [7].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        8  12   9   5   8   7   4  19  10   7      89
  Actual Balloons   7  12   9   4   8   6   4  12  10   5      77
  False Positives   1   0   0   1   0   1   0   7   0   2      12       13.48    86.52
  Missed Balloons   0   0   0   0   0   0   0   0   0   0       0        0.00   100.00

Table 2.8: Extraction results for Amazing Spider-Man 644 [8].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections       10   8  10  11   8  10   9  11   7  15      99
  Actual Balloons   8   7  10  10   0   9   8   9   7  13      81
  False Positives   2   1   0   1   8   1   1   2   0   2      18       18.18    81.82
  Missed Balloons   0   0   0   0   0   0   0   0   0   0       0        0.00   100.00

Table 2.9: Extraction results for Amazing Spider-Man 645 [9].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        8  12   7   9   7   8  15  10  11  10      97
  Actual Balloons   8  10   7   9   6   8  15   7  11  10      91
  False Positives   0   3   0   0   1   0   0   3   0   1       8        8.25    91.75
  Missed Balloons   0   1   0   0   0   0   0   0   0   1       2        2.06    97.94

Table 2.10: Extraction results for Amazing Spider-Man 646 [10].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        4   4   9   9   2   6   7   5   7   8      61
  Actual Balloons   6   4   8   9   2   7   8   1   7   8      60
  False Positives   0   0   1   0   0   0   0   4   0   0       5        8.20    91.80
  Missed Balloons   2   0   0   0   0   1   1   0   0   0       4        6.56    93.44

Table 2.11: Extraction results for Amazing Spider-Man 647 [11].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        7   2  10  13  12   8   8   8   4   4      76
  Actual Balloons   8   2  10  13  13   7   9   7   4   3      76
  False Positives   0   0   0   0   0   1   0   1   0   1       3        3.95    96.05
  Missed Balloons   1   0   0   0   1   0   1   0   0   0       3        3.95    96.05

Table 2.12: Extraction results for Amazing Spider-Man 648 [12].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections        7   4  10  16  15  12  14  12  19  11     120
  Actual Balloons   7   4   9  11  13  12  14  12  16  11     109
  False Positives   0   0   1   5   3   0   0   1   3   0      13       10.83    89.17
  Missed Balloons   0   0   0   0   1   0   0   1   0   0       2        1.67    98.33

Table 2.13: Extraction results for Amazing Spider-Man 649 [13].

  Page              1   2   3   4   5   6   7   8   9  10   Total  Percentage  Success
  Detections       10  14  16   6   9  12   5  14  15   3     104
  Actual Balloons   9  11  16   6   7  11   4   8  14   3      89
  False Positives   2   3   1   0   2   1   1   4   1   0      15       14.42    85.58
  Missed Balloons   1   0   1   0   0   0   0   0   0   0       2        1.92    98.08

2.4.1 Analysis of results

Tables 2.2-2.13 show the results for the 12 comic books in our test set (the bookset), one table per book. More specifically, as mentioned above, each table presents test results for the first 10 pages of a single book.

Pages with no false positives and no missed balloons are classified as having optimal results, and this happens in 62 out of the 120 pages in the bookset, which, given the complexity of the pages, is a very high value. It means that balloon extraction is optimal, or completely automatic, in more than 50% of book pages. Also, only 25 out of the 846 balloons in the bookset were missed by the algorithm, which represents 2.95% of the total number of balloons. Taking into consideration that each book page has approximately a minimum of 500 regions, we end up processing 5000 regions for the first 10 pages of each book, for a total of 60,000 regions over the 12 books. Interestingly, by inspection of Tables 2.2-2.13, we note that the algorithm only produced 119 false positives, which represents about 0.2% of the total number of regions.
Interestingly, the page with a significant number of false positives is page 5 of Batman 671, which has 15 false positives. This happened because page 5 contains snow, which in some circumstances can easily be mistaken for a balloon region, since both are white. In fact, all false positives are created by very bright, single-color regions that are about the same size as balloons. Such false positives could be avoided by counting the peaks in the histogram, since all the false positives have only one color. Our false positives are visually similar to the false positives described in [30].

2.4.2 Comparison to other algorithms

Other authors have tackled the problem of extracting text balloons from comics (see, for example, [34] and [22]). However, they used simple comics like web comics and flat-color books (e.g., Asterix, Lucky Luke or Garfield), but not more complex comics like those published by Marvel or DC, such as Batman or Spider-Man. With respect to simple comics, we can say that our algorithm does not fail, that is, it produces no false positives and misses no balloons.

Moreover, because this algorithm does not concern itself with the text inside the balloons, but just with the balloons themselves, it can detect balloons whose text has any orientation, direction, alphabet (Cyrillic, Arabic, Chinese, etc.), font face or color. During testing, our algorithm successfully detected balloons with special characteristics like those shown in Fig. 2.8. All of the balloons depicted in Fig. 2.8 cause problems for other methods. Differently colored text inside the balloons would lead to false positive detections, wavy text would defeat horizontal text searches, and different font faces, with disconnected letters, would make other methods fail.

Figure 2.8: Special case balloons that were successfully recognized: balloons with text colors different from other balloons, different text colors inside the same balloon, shapes different from standard balloons, wavy text, and different font faces in the same balloon.

2.4.3 OCR

This algorithm does not attempt to perform OCR on the detected balloons, because that is not always the goal of balloon extraction and also because there are readily available solutions to detect text in images, provided that those images contain only text, as is the case of the balloons produced by this algorithm. On the other hand, when optical character recognition of the text in the extracted balloon images is necessary, we can use any traditional OCR engine that supports the font types used in comic books or, simply, train the OCR engine to recognize those font types. Traditionally, the fonts used belong to the ComicScript and Comicraft families for American and European comic books. Those are the font types of the textual elements, not those of onomatopoeia or drawn text elements (like street signs, billboards, etc.) drawn into the image.

During testing, using Google's Tesseract OCR engine, increasing the size of the extracted balloons by a factor of 4 gave better results than simply using the regions of the original image. This facilitated the correct recognition of individual letters that lie connected in the original book page.
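A minimal sketch of this OCR step is shown below, assuming the pytesseract wrapper around Google's Tesseract engine is installed and that the balloon image is an OpenCV array. The 4x scale factor follows the observation above; the choice of interpolation is an assumption, not something prescribed by the thesis.

```python
import cv2
import pytesseract

def balloon_text(balloon_bgr, scale: int = 4) -> str:
    """Enlarge an extracted balloon image and run Tesseract OCR on it."""
    enlarged = cv2.resize(balloon_bgr, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_CUBIC)
    return pytesseract.image_to_string(enlarged)
```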
2.5 Final Remarks

The method presented offers better results for complex comics than other methods, while at the same time having lower processing requirements. The results show that, for many pages, it is optimal, in the sense that it produces no false positives and correctly detects all balloons. It does, however, fail for specific pages with bright areas, or for balloons with uncommon background colors, such as black or other dark colors. These are corner cases at best, and do not represent any significant portion of existing comic books; it should also be noted that no other existing method deals with such corner cases successfully either. Future work will address finding those balloons and hardening the algorithm against the false positives that result from small bright areas in the image.

Chapter 3

Conclusions

The results achieved with the algorithm presented allow the conclusion that performing balloon extraction on comic book pages, in an automated fashion and without requiring intense CPU power, is possible. Those results show that this algorithm is both reliable and comprehensive in scope: it is not tailored to any specific style of comic book, but is generic enough to be applied to any comic book without compromising the results. It is true that there are still some corner cases that make total automation, for all comic books, inadvisable, but it is also observable from the results that this algorithm produces better and more consistent results than other methods. Experimental results show that optimal cases, that is, processed pages with no false positives and no missed balloons, are the majority, which is the ultimate goal of any algorithm for solving this problem.

The work presented can be expanded upon by looking at ways to double-check the regions identified as balloons, in order to reduce the still-present false positives. This could possibly be achieved by including OCR elements, such as character matching, and excluding regions without matches, but this has not yet been confirmed.
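Purely as an illustration of this unconfirmed idea, such a verification step could run OCR over each candidate region and discard regions in which no letters are recognized. The sketch below assumes the pytesseract wrapper and PIL images; the threshold of two recognized letters is an arbitrary illustrative choice, not a validated parameter.

from typing import Iterable, List
from PIL import Image
import pytesseract

def filter_regions_by_ocr(candidates: Iterable[Image.Image],
                          min_letters: int = 2) -> List[Image.Image]:
    """Keep only candidate balloon regions in which OCR finds some letters.

    Regions where Tesseract recognizes fewer than min_letters alphabetic
    characters are treated as probable false positives and dropped.
    """
    kept = []
    for region in candidates:
        text = pytesseract.image_to_string(region)
        letters = sum(ch.isalpha() for ch in text)
        if letters >= min_letters:
            kept.append(region)
    return kept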
Also, it would be interesting to explore a complete hardware implementation of the proposed algorithm, possibly on some type of SoC (system-on-chip), coupled with an e-paper device, for a complete comic book reading experience on a single sheet of paper.

References

[1] John Byrne. Alpha Flight. Number 6. January 1984.
[2] Grant Morrison and Tony Daniel. Batman. Number 670. DC Comics, December 2007.
[3] Grant Morrison and Tony Daniel. Batman. Number 671. DC Comics, January 2008.
[4] Grant Morrison and Tony Daniel. Batman. Number 672. DC Comics, February 2008.
[5] Grant Morrison and Tony Daniel. Batman. Number 673. DC Comics, March 2008.
[6] Grant Morrison and Tony Daniel. Batman. Number 674. DC Comics, April 2008.
[7] Paul Azaceta, Marcos Martin, Mark Waid, and Stan Lee. The Amazing Spider-Man. Number 643. Marvel Comics, November 2010.
[8] Marcos Martin, Mark Waid, and Stan Lee. The Amazing Spider-Man. Number 644. Marvel Comics, November 2010.
[9] Mathew Southworth, Mark Waid, and Paul Azaceta. The Amazing Spider-Man. Number 645. Marvel Comics, December 2010.
[10] Paul Azaceta and Mark Waid. The Amazing Spider-Man. Number 646. Marvel Comics, December 2010.
[11] Dan Slott, Fred Van Lente, Mark Waid, Zeb Wells, Max Fiumara, Karl Kesel, Paul Azaceta, Bob Gale, and Joe Kelly. The Amazing Spider-Man. Number 647. Marvel Comics, December 2010.
[12] Joe Quesada, Clayton Henry, Dan Slott, and Paul Tobin. The Amazing Spider-Man. Number 648. Marvel Comics, January 2011.
[13] Humberto Ramos and Dan Slott. The Amazing Spider-Man. Number 649. Marvel Comics, January 2011.
[14] Scott McCloud. Understanding Comics - The Invisible Art. Harper Collins, 1994.
[15] Marvel Comics. Marvel Digital Comics Shop. http://comicstore.marvel.com/ (accessed October 12, 2012).
[16] DC Comics. DC Comics Digital Comics Shop. http://www.readdcentertainment.com/ (accessed October 12, 2012).
[17] Digital Comic Museum. http://digitalcomicmuseum.com/ (accessed October 12, 2012).
[18] Ruini Cao and Chew Lim Tan. Separation of overlapping text from graphics. In Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR'01), pages 44-48, 2001.
[19] Trung Quy Phan, Palaiahnakote Shivakumara, and Chew Lim Tan. A Laplacian approach to multi-oriented text detection in video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:412-419, 2011.
[20] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing, Third Edition. Prentice Hall, 2007.
[21] Siddhartha Brahma. Text extraction using shape context matching, 2006.
[22] Kohei Arai and Herman Tolle. Method for real time text extraction of digital manga comic. International Journal of Image Processing (IJIP), 4(6):669-676, 2011.
[23] Will Eisner. Comics & Sequential Art. Poorhouse Press, 1985.
[24] M. Praneesh and R. Jaya Kumar. Novel approach for color based comic image segmentation for extraction of text using modify fuzzy possibilistic c-means clustering algorithm. Special Issue of International Journal of Computer Applications (0975-8887) on Information Processing and Remote Computing - IPRC, pages 16-18, August 2012.
[25] Keiichiro Hoashi, Chihiro Ono, Daisuke Ishii, and Hiroshi Watanabe. Automatic preview generation of comic episodes for digitized comic search. In Proceedings of the 19th International Conference on Multimedia 2011, 2011.
[26] Q. Yuan and C. L. Tan. Page segmentation and text extraction from gray scale image in microfilm format, 2001.
[27] Sachin Grover, Kushal Arora, and Suman K. Mitra. Text extraction from document images using edge information. In IEEE India Council Conference, INDICON 2009, Ahmedabad, 2009.
[28] Wenshuo Gao, Xiaoguang Zhang, Lei Yang, and Huizhong Liu. An improved Sobel edge detection. In Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, volume 5, pages 67-71, 2010.
[29] Qinqlian Guo, Kyoko Kato, Norio Sato, and Yuko Hoshino. An algorithm for extracting text strings from comic strips, 2006.
[30] Christophe Rigaud, Norbert Tsopze, Jean-Christophe Burie, and Jean-Marc Ogier. Extraction robuste des cases et du texte de bandes dessinées, 2012.
[31] J. Sushma and M. Padmaja. Text detection in color images. In IEEE IAMA 2009, 2009.
[32] ITU-R. Recommendation ITU-R BT.601-7, March 2011.
[33] R.W.G. Hunt. The Reproduction of Colour in Photography, Printing and Television. Fountain Press, 1987.
[34] Anh Khoi Ngo Ho, Jean-Christophe Burie, and Jean-Marc Ogier. Panel and speech balloon extraction from comic books. In 2012 10th IAPR International Workshop on Document Analysis Systems, 2012.