Balloon Extraction from Complex Comic Books Using Edge Detection and Histogram Scoring
João Miguel Coronha Correia
Submitted to the University of Beira Interior in partial fulfillment of the requirements for the degree of
Master of Science in Tecnologia e Sistemas de Informação
Supervised by Prof. Dr. Abel João Padrão Gomes
Departamento de Informática
Universidade da Beira Interior
Covilhã, Portugal
http://www.di.ubi.pt
Acknowledgments
I would like to express my gratitude to my advisor, Prof. Dr. Abel João Padrão Gomes, for
his scientific guidance during the development of this master thesis and related affairs.
João Correia
Covilhã, Portugal
October 22, 2012
Resumo
Over the years, several disruptive technologies have changed the way we access the content and information present in the various media and forms of cultural expression. Recently, with the growth of the market for portable devices such as smartphones, tablets, laptops and ultrabooks, it has become necessary to adapt content to this reduced screen format. Adaptations of books, movies or music were relatively quick and successful because their intrinsic characteristics could be modified for a correct representation on small, touch-operated screens. Text books were particularly successful in these new formats. Text blocks mold themselves easily to any space, and the font size can be adjusted to allow greater visibility. But some types of books did not adapt so well, namely comic books and comic magazines. Their structure, composed of art and text, is not moldable, and their complex shapes are not as adaptable as those of text books. However, there is an ever-growing potential market of young people with access to smart devices who are, simultaneously, the preferred target audience for comic books. It therefore becomes relevant to find a way to adapt the content to the devices as well as possible.
One of the biggest problems of the adaptation to mobile devices is the readability of comic book text when its representation is smaller than normal. Since the text is represented as part of the drawing, inside balloons, how can we find just those relevant elements? How can we highlight the balloons, making them more visible, or larger, without losing the visual context within the page? How can we avoid enlarging the image through an excessive zoom level and at the same time allow the content to be discerned? How can we do it for comic books more complex than web comics with three panels and minimalist art styles?
This dissertation presents a possible solution to this problem by presenting an algorithm for identifying balloons within comic book pages, with a procedure applicable to any type of comic book, even the most elaborate ones.
Abstract
Over the years, several disruptive technologies have changed the way we access content and information on various media and means of cultural expression. Recently, with the expansion of the market for handheld devices such as smartphones, tablets, laptops and ultrabooks, it has become necessary to adapt content to this reduced screen size. Changes in books, movies or music were relatively fast and successful because their intrinsic characteristics could be modified so that they were correctly represented on small screens and handled with tactile interfaces. Text books were particularly successful in these new formats. Text blocks are easily molded to any space, and font size can be adjusted to allow for better visibility. But some types of books were not so easily adapted, namely comic books. Their structure, combining art and text, is not adjustable, and their complex shapes are not as easy to change as those of text books. Nevertheless, there is a growing potential market of young people with access to smart devices who are, simultaneously, the preferential target audience for comic books. It is relevant, then, to find ways to adapt the content to the devices as well as possible.
One of the greatest problems in the adaptation to mobile devices is the readability of comic book text when viewing it in smaller than normal sizes. Since the text is embedded in the art, inside balloons, how can we find just those relevant elements? How can we enhance the balloons, making them more visible, or bigger, without losing visual context inside the page? How can we avoid enlarging the image with an excessive zoom level and at the same time allow the content to be understood? How can we do it for comic books more complex than three-panel web comics with minimalistic art styles?
This dissertation presents a possible solution to this problem by introducing an algorithm to identify balloons inside comic book pages, with a method that works for any type of comic book, even the most complex ones.
Contents

Resumo
Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation and Objectives
  1.2 Research Contribution
  1.3 Organization of the Thesis
  1.4 Publications
  1.5 Target Audience
2 Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring
  2.1 Introduction
  2.2 Page Layout and Terminology
  2.3 Balloon Extraction Algorithm
    2.3.1 Page Segmentation
      2.3.1.1 Conversion to gray scale
      2.3.1.2 Sobel-based edge detection
      2.3.1.3 Negative pages
      2.3.1.4 Flood fill and region extraction
    2.3.2 Balloon extraction
      2.3.2.1 Region culling
      2.3.2.2 Region scoring
      2.3.2.3 Region sorting
      2.3.2.4 Region filtering
  2.4 Experimental Results
    2.4.1 Analysis of results
    2.4.2 Comparison to other algorithms
    2.4.3 OCR
  2.5 Final Remarks
3 Conclusions
References
List of Figures

2.1 Alpha Flight 6, January 1984, page 9 [1].
2.2 Comic book page components.
2.3 (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right) Gray scale image.
2.4 (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized book page.
2.5 Flood fill method comparison. (left) standard flood fill extraction, (right) modified flood fill extraction.
2.6 (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (non-balloon) region.
2.7 Balloon regions extracted from the book page shown in Fig. 2.3. Although not visible here, the regions have a balloon shape indeed.
2.8 Special case balloons that were successfully recognized. Balloons having different text colors than other balloons, different text colors inside the same balloon, shapes different from standard balloons, wavy text, different font faces in the same balloon.
List of Tables

2.1 Criteria for size-based region culling.
2.2 Extraction results for [2]
2.3 Extraction results for [3]
2.4 Extraction results for [4]
2.5 Extraction results for [5]
2.6 Extraction results for [6]
2.7 Extraction results for [7]
2.8 Extraction results for [8]
2.9 Extraction results for [9]
2.10 Extraction results for [10]
2.11 Extraction results for [11]
2.12 Extraction results for [12]
2.13 Extraction results for [13]
Chapter 1
Introduction
A handheld device, or mobile device, or even handheld computer, is a small computing device defined by a small screen size and touch input or a minimal keyboard. The category comprises smartphones, tablets and PDAs. These types of devices started to appear in mass production at the turn of the century, driven by a need for always accessible communication and content. Early usage scenarios, beyond placing phone calls, involved text messaging, e-mail or simple web browsing, but with the increase in network bandwidth, CPU power and battery life, those usage scenarios grew more complex and now encompass everything from book reading to watching movies or video chat. This device market has seen explosive growth in the last few years, particularly since the introduction of Apple's iPad. While some types of content, like text books, are a perfect match for these devices, others require some trade-offs when viewed on reduced screen sizes. One of these content types is comic books.
Comic books, as an art form, have been present in human cultures for many years, with some possible representations as far back as the Middle Ages [14], and are a popular form of entertainment and storytelling around the world. With the convenience of handheld devices, it was logical to expect comic books to be widely read on those devices, given the almost perfect match between the target audiences of comic books and handheld devices. Yet, despite the fact that the major publishers have recently begun to distribute digital versions of their comics [15][16], adoption has been slow at best when compared to e-books, music or movies. This is, in no small measure, caused by the inadequacy of the traditional comic book for the small screen size. Currently, one of two things happens when
you try to read a comic book on a handheld device:
- The digital comic book has been pre-processed and embedded information regarding
text has been included in the file.
- The digital comic book has not been pre-processed and in order to correctly read it,
the reader has to manually zoom in on all the text elements of the comic to be able to
read them and comprehend the story.
Both of these approaches have drawbacks. The first requires more pre-publishing time for new comic books before they reach the market, and more manual processing, making them more expensive than necessary and delaying distribution; the second makes the reading experience very user-unfriendly and ultimately leads potential readers to not choose these devices. Since digital comic books have multiple advantages over traditional paper comic books, such as easier storage, global availability, potentially lower prices, faster distribution, better searching and indexing and even better quality, it is quite distressing not to take advantage of them. Of no lesser importance is the fact that digital comic books give younger (or new) readers access to comics that are no longer available in paper format [17].
As in other areas such as automatic document digitization [18] or video processing [19], there are digital image processing techniques that can be applied to comic books to provide some of the missing functionality, namely, correctly identifying the elements of the comic book image that contain text: the balloons. Relying on edge detection methods such as Sobel, flood fill and histogram analysis [20], the algorithm presented in this thesis describes an answer to the problem of how to extract the balloons from comic book pages, to improve usability, provide a better experience to the end user, and allow for an adoption of digital comic books comparable to that of text e-books.
1.1 Motivation and Objectives
The main goal of this thesis is to demonstrate the feasibility of applying digital image processing techniques to comic book images to correctly identify and extract the balloons, and in doing so, make it possible to use small-screen devices to adequately read comic books, in the same way those devices are used to read traditional text books today. Existing proposed solutions to this problem have almost always focused on simple web comics, which are, traditionally, comics with a much simpler art style and much smaller in scope when compared to commercial comic books published by companies like Marvel Comics, DC Comics or Dark Horse. Those approaches have invariably produced extremely good results for those types of web comics, but less than adequate ones [21] for commercial comic books. Other solutions are targeted at specific types of comics, like manga, and exploit specific characteristics, like top-to-bottom text flows, for the identification of text balloons [22]. It has so far been impossible to find a solution with an acceptable level of success extracting balloons across all types of comics.
This problem of identifying the balloons is not, by any means, a trivial task [18]. Balloons can have any shape or size, and can be in any location inside the comic book page. Artistic style can make balloons spread over multiple panels, or overlap each other. Text inside the balloons can have any color or font face, sometimes even inside the same balloon [23][14]. Another important consideration is that handheld devices are battery powered, so any approach to this problem must account for the fact that CPU-intensive operations will drain the battery faster and, as such, result in a bad user experience. This would work against the adoption of these devices for comic book reading.
This has been an insurmountable problem, so much so that, despite the fact that the main publishers have digital distribution systems in place, the actual digital comic books they sell are pre-processed manually to identify the balloons. This is, in a way, comparable to the medieval practice of copying books by a scribe's hand, and it calls for a workable, automated solution that allows both for enhancing the digital comic book experience and for the automated indexing and processing of the digital age.
Irrespective of all those challenges, the main motivation for tackling this problem stems from the desire to find an enjoyable solution that combines the convenience of handheld devices, which can store several thousand comic books, with the pleasure of having comic books to read available almost anywhere, and finally to replace dead-tree comics with equally good digital versions.
1.2 Research Contribution
The major contribution of this thesis is an automatic balloon identification and extraction algorithm using digital image processing techniques.
1.3 Organization of the Thesis
This document has the following chapters:
- Chapter 1. This is the current chapter, which introduces the subject of the thesis, its motivation and objectives, the problem solved in this thesis, the organization of the thesis, and the audience to which the thesis is addressed.
- Chapter 2. This chapter is the core of this thesis, because it describes in detail the algorithm to extract balloons from comic book pages. This algorithm is based on image processing techniques, and applies to any sort of comics.
- Chapter 3. This is the last chapter of this thesis, where relevant conclusions are drawn and possible future work is outlined.
1.4 Publications
Resulting from this work, the following paper has been submitted for publication:
J. Correia and A. Gomes. Complex Comic Book Balloon Extraction Using
Edge Detection And Histogram Scoring.
1.5 Target Audience
The target audience of this thesis includes, but is not limited to, software developers for
mobile platforms, comic book publishers and creators, digital imaging researchers, mobile
device content publishers and distributors.
Chapter 2
Balloon Extraction from Complex Comic Books Using Edge Detection And Histogram Scoring
This chapter proposes an algorithm based on the Sobel operator to identify balloons in comic book pages. Unlike other approaches, this method works on colored and complex comic book images, as well as comic strips, without making any assumptions regarding the line continuity of the image, the orientation of the text or the color depth of the image. Each comic book page is input to the Sobel operator, then each closed region of such a page is identified, and afterwards each region is subject to equalization and scoring using the mean value of its histogram. Experimental test results show that our method significantly improves the rate of correctly detected balloons, and simultaneously decreases the number of false positives, when compared to other methods.
2.1 Introduction
Textual information present in comic books has so far been out of reach of automated indexing because there was no reliable way to extract it from complex comic books. Most existing methods produce good results on simple web comics or comic strips, but have much lower success rates on colored and complex comic book pages.
The method presented in this paper offers a possible approach to tackle this problem. It opens possibilities for information indexing, processing and retrieval similar to what already exists for standard electronic books, making it also suited to tablets and smartphones. These types of devices typically have small screens, which make reading comic books either cumbersome (having to zoom in on each portion of the page) or tiresome (straining the eyesight to read the small text). By applying our method, applications only need to enhance or zoom in on the balloons, without loss of visual context, and without preprocessing requirements at the time of publishing.
Current commercial solutions for digital comic books build on proprietary formats that incorporate textual information manually introduced at the time of publishing. It is clear that this procedure increases file sizes, adds complexity and delays the time-to-market. Instead, by applying the method proposed here, the books can simply be packaged as a set of scanned images. This will make the transition to digital formats even faster, fostering an easier adoption of such digital formats on handheld devices.
But correctly identifying the text balloons in a complex comic book page is neither easy nor straightforward. As others [18] have also recognized, text extraction from comic book images is harder than text extraction from traditional images like pictures or documents, because the noise level and the image texture of the drawings (e.g., depth cues are not as noticeable as in pictures) are especially problematic.
The vast disparity of possible shapes, colors and arrangements of the elements within a given comic book page makes the automated identification of text elements a very hard process. There is no technique or shape for the text elements that is always adhered to, and there can be no assumptions regarding positioning inside the panels or even regarding the shape of the panels themselves. To deal with this problem, the currently proposed solutions fall into three
main categories, namely:
- edge based detection,
- blob (or region) based detection,
- connected component based detection.
Some other methods of text extraction from generic images, rather than methods specifically tailored to comic book images, have also been studied (see, for example, [24][25]), but they are not so relevant for the problem at hand.
Edge based techniques rely on identifying the edges in the panels and balloons, including edges of the text glyphs. [18] presented a generic method for separating text from imagery that, while not specifically designed for comic book images, still provides some acceptable results for simple comic strip images. Such a method relies on the principle that edges of text glyphs are smaller than the edges or lines of the surrounding image, so that balloons are the regions in which the smaller edges are located.
[26] used the same principle of text size to enclose text of each balloon inside a bounding
box. For that, they used a Canny edge detector and other image filters to separate the text
out from the rest of the image.
It is clear that the above methods produce a number of false positives, that is, a number of non-textual edges are mistaken for textual edges. For example, hair strands growing out from the skin can be mistaken for text glyphs in a comic book page. [27] presents a possible solution to this problem that consists in using a text extraction method based on the curvature of the edges. In fact, the accentuated curvature of the text elements allows us to distinguish between small non-textual edges and small textual edges, and this is something which could be adapted for extracting text from balloons in comic books.
Interestingly, all of those methods produce fairly good results for simple images such as those of web comics, but not for commercial comic books in which the artistry is complex. The assumption for them to work properly is that the edges are well delineated, but such an ideal is not found in complex images of commercial comic books, where many crossing lines and balloons overlapping across multiple panels and the gutter may populate book pages. Also, there is no guarantee that the balloon boundary is completely enclosed and connected.
Following [28], our method makes use of a Sobel edge detector, which allows us to overcome the problems mentioned above. More specifically, the Sobel operator produces delineated edges even when such edges are not well defined in the original image, which it achieves by identifying gradient changes and enhancing the boundaries of regions in the image.
On the other hand, blob detection methods depend on identifying shapes and detecting contrast between text and background. [21] follows an approach based on shape identification, comparing the shapes of a dictionary of letters and symbols against the binarized images in order to locate the text elements. Brahma ends up concluding that the method does not produce adequate results to replace traditional paper comics with electronic comic books. In [22], blob identification is also based on rejecting blobs with reference to their size, and this screening results in a list of potential text blobs. Once again, these methods produce good results on simple comic book images, but fail in comics with balloons that span multiple panels or in images with oddly shaped balloon elements.
Figure 2.1: Alpha Flight 6, January 1984, page 9 [1].
Finally, connected component based methods rely on the assumption of continuity between the outer edges of the images and every other colored pixel, with the exception of text pixels. The method works by tracking all pixels different from a background color (usually white), eliminating those background pixels from the edge of the image towards the center. Of course, the downside is that even for simple images there is only a very small probability that all background pixels are connected in this way, making it impossible in general to distinguish between text pixels and background pixels. From empirical evidence regarding complex and colored comic book pages, the probability of this happening in a comic book page is, in practice, zero. Moreover, when one applies a connected component based method to even simple web comic images, some line noise (i.e., left-over lines) may occur in discontinuity areas. [29] suggests going over the left-over areas and applying a width/height threshold to the bounding boxes of the remaining connected components in order to identify text areas. However, Guo et al. recognize that the result was acceptable for only 80% of the samples taken into account, in spite of the persistent left-over noise artifacts.
Figure 2.2: Comic book page components.
In [30], one tries to extract panels and text elements out of comic pages by means of a connected component based method and histogram analysis to identify text areas. Their method depends strongly on text orientation, because it checks the horizontal pixel density in relation to the vertical pixel density to identify text regions. In fact, if the text is in any other orientation, the ratio of horizontal pixel density to vertical pixel density falls outside of the pre-defined threshold.
In the context of image processing, [31] compares edge based methods and connected component methods on color images, and specifically tests those methods under different lighting conditions. Sushma and Padmaja conclude that edge based methods frequently produce more robust results and are considered superior to connected component methods, a conclusion that can also be extended to the problem at hand.
In short, those three types of approaches produce good results when they are applied to simple images like web comics, but not to complex images such as comic book pages. Edge detection methods depend heavily on the principle of perfect line separation, thus they fail when there are many intersections and text overlaps. Connected component methods produce noise artifacts which are often confused with text glyphs during OCR (optical character recognition) processing. With respect to region detection based methods, one assumes either that the text glyphs are always of the same shape, which is not true, or that the text glyphs have only horizontal orientation and are not overlapped by other elements, which is not true either. Despite that, each category of algorithms has interesting techniques, like histogram analysis in blob-based methods or edge detection in edge-based methods.
Edge detection is indeed adequate for the type of images found in comic books because the drawings have defined edges between all color areas, either through steep gradient changes, easily detectable by derivative operators like Sobel, or through strong, thick, black lines separating the elements in the drawing. This is a defining characteristic of the comic book medium. Also, connected component techniques can be used to identify, within each defined area, all the contained pixels, which in turn can be used to calculate the bounding boxes used in region based detection.
The method presented in this paper ends up doing just that, taking the strong points of each of the methods described and producing a more robust extractor of text balloons that works on complex comic book images with a result rate comparable to, or even better than, that of the other methods applied to simpler images. It is also worth noting that the method presented in this paper does not intend to extract the text (that is, perform OCR), but rather to correctly find the text-containing elements of the comic book page: the balloons. This can be used as a first step for performing OCR, but it is also useful on its own because, when viewing comics on small screen devices, it is often enough to simply zoom in on the balloons rather than actually getting the text inside them. When OCR is actually needed (for indexing, searching, etc.), the method presented here produces images that contain just the balloons and have no image noise, which then permits straightforward text extraction using readily available OCR processing engines.
2.2 Page Layout and Terminology
Since this method is aimed towards comic book pages, the elements of each one of these
pages are described here to allow a better understanding of the terminology used. For the
purpose of this document, a comic book page and a comic vignette (like a web comic strip)
are indistinguishable and exchangeable at will. Both contain the same basic elements and
as such, the method can be used on either without compromising the validity of the results.
A page in a comic book is composed of several distinct elements:
- Panel — This is the basic element of a comic book page. Usually, a panel contains a single illustration or drawing. Commonly, but not always, a panel is delimited by a clear boundary that separates it from other panels. Each panel represents a scene or moment in the story. A page can contain one or more panels of diverse shapes.
- Gutter — This is the space between panels. In older comic books or in vignettes, the gutter is usually white and clearly visible. In modern comic books, it is barely present, but when it exists, it provides an empty space that separates each panel from its adjacent panels. Artistic freedom exists in modern comic books, so some art intrudes into the gutter, and even flows from one panel to another across it.
- Art — The art consists of the drawings, which usually are made inside panels. Nevertheless, there is artistic leeway that allows for its extension beyond the limits of the
panels at the discretion of the artist. Drawings are used to convey the narrative and
setting of the story. All colored drawing areas have very well defined borders, either
through black contour lines or contrasting colors. In the case of black and white art,
the regions are separated by black contour lines or different shades of gray.
- Balloons — These elements are enclosed areas where text is written. They represent speech, thoughts or narrative data. They are usually placed inside panels,
but sometimes they extend outside those panels. Their shape is usually rounded
for speech balloons and thought balloons, and usually rectangular for narrative data
panels. Balloons usually have a tail that points towards the originator of the speech or
the thought in the drawing.
- Splash Balloons — The main difference in relation to other balloons is their jagged shape. Usually, a splash balloon conveys an exclamation or dramatic text, which is very common in manga books (Japanese comic books).
Traditionally, balloons have a white background and black text. In fact, this is the only feature common to all types of comic books. Some exceptions do exist, however, usually to lend a different quality to the voice or thought process of a comic book character. This element is created to be as clearly visible and legible as possible, as well as to draw the attention of the reader, who must interpret it quickly and without ambiguity. On a more practical level, this translates into text balloons that are brighter than ordinary regions in the page. Obviously, this helps the reader to follow the sequence of the story.
Regarding the overall composition, comic books have such a diversity of styles that there is no guarantee that all elements are present in a given book page. Also, no assumptions can be made concerning the shape, size or disposition of those elements. In fact, some comic books have whole pages with nothing in them as part of the narrative [1]. This formal freedom can show up in other forms, such as panels drawn within panels within panels, or balloons deliberately drawn so that they overlap.
Also, there are some highly distinctive styles among European, American and Japanese comic books. Those are the predominant types of comic books existing today, and each has specific ways of using the comic book elements. For example, American comic books have practically no gutter between panels, and the art elements may flow from one panel to another, while Japanese comics, or manga, have different panel layouts, with long, dramatic panels covering the width or height of the pages occurring more often.
2.3 Balloon Extraction Algorithm
Our algorithm belongs to the category of edge based algorithms, while at the same time using the histogram analysis common in blob-based algorithms. The balloon extraction algorithm described here is essentially a two-stage algorithm. The first stage is an image segmentation algorithm that divides each comic book page into regions. The second stage filters those regions, keeping the ones corresponding to balloons.
The only assumption made by the algorithm is that the balloons in the images have a white (or very bright) background and black text or, alternatively, that they are composed of a single background color and a single text color. In fact, only residual cases do not follow this rule. This is because text balloons are made to be quite legible, i.e., easy for the reader to locate. Naturally, there are a few special cases that fall outside this definition, but we will see how well our algorithm handles them in the results section.

Figure 2.3: (left) Original image. The Amazing Spider-Man 679, April 2012, story page 5; (right) Gray scale image.

Figure 2.4: (left) Book page after applying the Sobel edge detector; (right) Negative of the Sobelized book page.
2.3.1 Page Segmentation
Let B = {P_i}, i = 0, . . . , n, be a colored comic book with n pages. A comic book page is composed of multiple closed regions, possibly with different colors, but separated by black outlines. The text balloons are particular cases of closed regions, being, for that reason, also delimited by black outlines. The segmentation of each comic book page P_i into regions involves the following steps:
1. Convert P_i to a grayscale page P_i^G.
2. Apply the Sobel operator to P_i^G to get P_i^S.
3. Compute the negative page P_i^N of P_i^S.
4. Apply an enhanced flood fill operator to all black pixels of P_i^N to obtain all page regions.
By using a Sobel-based edge detector together with a flood fill operator, all regions in a comic book page can thus be separated and extracted.
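As an illustration only, the following is a minimal Python/NumPy sketch of the four steps above; the helper names to_grayscale, sobel_edges and extract_regions are hypothetical (sketched in the next subsections), not part of the thesis itself.

```python
import numpy as np

def segment_page(page_rgb: np.ndarray) -> list:
    """Split one comic book page (H x W x 3, uint8 RGB) into closed regions."""
    gray = to_grayscale(page_rgb)               # step 1: luminance conversion
    sobel = sobel_edges(gray)                   # step 2: Sobel edge page (dark lines on white)
    negative = 255 - sobel                      # step 3: negative page (white lines on black)
    return extract_regions(negative, page_rgb)  # step 4: flood fill every black pixel
```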
2.3.1.1 Conversion to gray scale
To ensure that this method can be applied both to color images and to black and white images, the first step is to apply an operator that converts color images into grayscale images. Since the overall goal is to be able to detect the brightest regions (i.e., balloons) in the image, the conversion to a grayscale representation can be achieved by calculating the luminance of each pixel from its RGB components.
Luminance is a measure of the brightness of a given pixel when adjusted for human vision.
The human eye can detect brightness variations better than color variations [20], so the
conversion should respect this feature. One way to achieve such a conversion is to utilize
the RGB-to-YCrCb conversion as follows [32]:

\[
\begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix}
=
\begin{pmatrix}
 0.2989 &  0.5866 &  0.1145 \\
-0.1687 & -0.3312 &  0.5000 \\
 0.5000 & -0.4183 & -0.0816
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\tag{2.1}
\]
This allows us, in a simple manner, to obtain color information in terms of luminance and
chrominance (YCrCb), rather than just color (RGB). The luminance channel (Y) is simply
a grayscale representation of the original image, such that brighter colors are transformed
into brighter grays. The other channels (Cb and Cr) can be ignored for the purpose of our
algorithm, so they do not have to be calculated. They represent the chrominance of blues and
reds. Obviously, if the original book pages are already in black and white, or in grayscale, this conversion step can be skipped.
It is also clear that color space representations other than YCrCb might be used, as long as they include a channel for luminance [33]. We could even use distinct weights for the values of R, G, and B in the computation of luminance, but, as explained further ahead, this may lead to a need for applying histogram equalization to page regions. The important thing to retain here is that the computation of luminance should mimic the way human vision delineates the contours of regions, so that the edges extracted by the algorithm correspond to the edges identified by the human eye.
Therefore, the grayscale conversion step is crucial to enable the Sobel operator to detect intensity changes (edges) more easily, and also to ensure that the detected edges are closer to what a human would perceive as a border when looking at the image.
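A minimal sketch of this conversion, assuming the page is held as an H x W x 3 uint8 RGB NumPy array (the name to_grayscale is a hypothetical helper, not from the thesis):

```python
import numpy as np

def to_grayscale(page_rgb: np.ndarray) -> np.ndarray:
    """Luminance (Y) channel of an RGB page, using the weights of Eq. (2.1).
    The Cb and Cr channels are never computed, since they are not needed."""
    r = page_rgb[..., 0].astype(np.float32)
    g = page_rgb[..., 1].astype(np.float32)
    b = page_rgb[..., 2].astype(np.float32)
    y = 0.2989 * r + 0.5866 * g + 0.1145 * b
    return np.clip(np.round(y), 0, 255).astype(np.uint8)
```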
2.3.1.2 Sobel-based edge detection
From the grayscale page image, the edges must be identified and enhanced for further processing. In image processing, an edge is a group of pixels that have a significant difference in brightness in relation to neighbouring pixels. This can be seen as a sudden change (or discontinuity) in the color variation over a given area. These changes exist between regions of an image, or between foreground objects and background objects [28][20].
We used the Sobel operator for edge detection, since it has proved perfectly adequate for accentuating shape contours, not to mention its simplicity. In fact, in contrast with photographic images, art lines drawn in comic books are continuous, which guarantees that the edges found in book pages are precise, in particular those concerning boundaries found in the page image. This also means that there is no need for any kind of image pre-processing with a sharpening filter to better delineate contours in images.
The Sobel operator convolves two 3 × 3 kernels with the original image in order to compute approximations to the derivatives in the x-direction and y-direction, which are sensitive to vertical and horizontal intensity changes, respectively [20]. Thus, this operator slides each kernel over the book page, computing the product of the kernel matrix and the 3 × 3 neighborhood matrix of each pixel in the book page. When compared to other edge detection operators, the Sobel operator produces thicker edges [28], which is desirable for this type of image because it eases the detection of region contours, as needed when performing region extraction with the flood fill algorithm.
Also, this operator traverses the complete image quickly, because it only requires knowledge of the 3 × 3 neighborhood of each page pixel. As a side note, we have also tested a Laplacian operator, which is even faster than the Sobel operator, to detect and enhance edges in the image, but it produced aliasing artifacts on the edges, originating continuity problems across page regions. More specifically, some regions get connected when they actually lie separate in the page.
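A possible sketch of this step, using SciPy for the 3 × 3 convolutions (a toolchain assumption; the thesis does not prescribe a library). The output is rendered as in Fig. 2.4 (left), i.e., dark edge lines on a white background, so that the negative computed in the next step yields white lines on black:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels: approximations of the x-derivative and y-derivative.
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.float32)
KY = KX.T

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Sobel edge page: gradient magnitude rendered as dark lines on white."""
    g = gray.astype(np.float32)
    gx = convolve(g, KX, mode="nearest")
    gy = convolve(g, KY, mode="nearest")
    mag = np.hypot(gx, gy)
    return 255 - np.clip(mag, 0, 255).astype(np.uint8)
```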
2.3.1.3 Negative pages
As shown above, edge detection tries to match the way the human eye perceives colors and brightness, so each book page is first transformed to a grayscale representation immediately before applying the Sobel operator, which increases the contrast of the edges, producing black lines on a white background. To facilitate processing, one calculates the negative of each book page, resulting in white edges on a black background (right hand side of Fig. 2.4).
2.3.1.4 Flood fill and region extraction
Once its negative has been computed, each book page can be separated into closed regions. Each region is a set of pixels from the original book page delimited by a number of edges (white pixels in the negative book page).
Careful analysis revealed that the letters inside text balloons are also identified as edges. This is the expected behavior of an edge detection operator, since the text letters can be considered discontinuities (edges) in the image. However, this fact is detrimental to a proper extraction of the balloon regions, because such an extraction would yield the balloons without the letters; in other words, we would obtain balloon regions with holes left by the letters, as illustrated in the left hand side of Fig. 2.5. Therefore, the flood fill must be adapted to overcome this problem.
Recall that a standard filling algorithm inundates the basin of a region in a progressive manner, until its frontier is reached by the water. A letter inside a balloon region works as a barrier to the progression of the water. Consequently, flooding a balloon region gives rise to a region with holes. Filling in these holes can be accomplished by first identifying the leftmost pixel (see red pixels in the right hand side of Fig. 2.5) and the rightmost pixel (see green pixels in the left hand side of Fig. 2.5) of each row of the flooded region, and then copying all the pixels of the corresponding row of the original colored book page onto a colored balloon region. Thus, at the end of this stage, the extracted regions of a book page are all colored regions.
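A minimal sketch of this modified flood fill, under the assumptions already used above (NumPy arrays, the hypothetical extract_regions name from the earlier sketch, and a fixed threshold separating black interior pixels from white edge pixels):

```python
import numpy as np
from collections import deque

def extract_regions(negative: np.ndarray, page_rgb: np.ndarray, edge_thresh: int = 128):
    """Flood fill every black (non-edge) pixel of the negative page; for each
    closed region, fill the holes left by letters by copying, for every row,
    the original pixels between the leftmost and rightmost flooded columns."""
    h, w = negative.shape
    visited = np.zeros((h, w), dtype=bool)
    regions = []
    for sy in range(h):
        for sx in range(w):
            if visited[sy, sx] or negative[sy, sx] >= edge_thresh:
                continue
            # Standard 4-connected flood fill bounded by white edge pixels.
            queue = deque([(sy, sx)])
            visited[sy, sx] = True
            spans = {}                       # row -> (leftmost, rightmost) column
            while queue:
                y, x = queue.popleft()
                lo, hi = spans.get(y, (x, x))
                spans[y] = (min(lo, x), max(hi, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not visited[ny, nx] \
                            and negative[ny, nx] < edge_thresh:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Copy whole row spans from the original page (this re-inserts the letters).
            mask = np.zeros((h, w), dtype=bool)
            region = np.zeros_like(page_rgb)
            for y, (lo, hi) in spans.items():
                mask[y, lo:hi + 1] = True
                region[y, lo:hi + 1] = page_rgb[y, lo:hi + 1]
            regions.append((mask, region))
    return regions
```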
2.3.2 Balloon extraction
The page segmentation described above has produced a set {R_j}, j = 0, . . . , N, of regions for each book page, some of which correspond to text balloons. The question then is how to single out the regions containing text balloons, which corresponds to the second stage of our algorithm, i.e., balloon extraction, consisting of the following steps:
1. Discard regions that are either too large or too small.
2. Score the remaining (mid-sized) regions with respect to their grayscale histograms.
3. Sort the mid-sized regions according to their weighted average luminance.
4. Filter the balloon regions using an empirical threshold based on the weighted average luminance.

Figure 2.5: Flood fill method comparison. (left) standard flood fill extraction, (right) modified flood fill extraction.
2.3.2.1 Region culling
Having extracted all the regions present in the image, it is first necessary to cull a significant number of regions in order to increase the overall performance of the algorithm. During testing, it was found that most comic book pages had between 500 and 1000 separate regions; for example, the book page shown in Figure 2.3 contains 661 regions. Moreover, most of them were too small (just a couple of pixels) or too large (in height or width) to be considered balloon regions. The rather small regions usually concern art details or single letters, which are irrelevant to the outcome of the algorithm and can thus be discarded straight away. With respect to very large regions, we noted that they usually represent the gutter or extensive background segments, but not balloon regions, so they should also be discarded altogether.
Table 2.1: Criteria for size-based region culling.

                          Too small   Too large
Relative region width     <1.5%       >50%
Relative region height    <1.5%       >50%
Relative region area      <1%         >20%
Therefore, as shown in Table 2.1, culling of regions is based on one of the following criteria:
- Relative region width;
- Relative region height;
- Relative region area.
If the region width is less than 1.5% or is greater than 50% relative to page width, it is
then considered as discardable and thus excluded from the remaining set of regions. These
culling percentages also apply to the relative height of each region. Additionally, any region
whose area is smaller than 1% or larger than 20% relative to page area is discarded. This
usually removes 90% of the extracted regions, making the next step of the algorithm even
faster.
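A small sketch of this culling test, applied to the region masks produced by the flood fill sketch above (the thresholds are those of Table 2.1; the function name keep_region is a hypothetical helper):

```python
def keep_region(mask, page_h: int, page_w: int) -> bool:
    """Size-based culling: reject regions that are too small or too large
    relative to the page dimensions (Table 2.1)."""
    ys, xs = mask.nonzero()
    rel_w = (xs.max() - xs.min() + 1) / page_w    # relative region width
    rel_h = (ys.max() - ys.min() + 1) / page_h    # relative region height
    rel_a = len(ys) / (page_h * page_w)           # relative region area
    if not (0.015 <= rel_w <= 0.50):
        return False
    if not (0.015 <= rel_h <= 0.50):
        return False
    return 0.01 <= rel_a <= 0.20
```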
2.3.2.2 Region scoring
Before proceeding any further, it may be convenient to remember that we still have about 10% of the negative regions left, and these need to be ranked in some way, since many of them will also be dropped. For this purpose, we first extract the color regions from the original colored book page that are homologous to those negative regions, using a pixel-wise copy operation.
Afterwards, we convert these color regions to grayscale regions, because we intuitively know that brighter regions correspond to balloon regions. One way of ranking the remaining regions, and of identifying the ones most likely to be balloons, is to generate a histogram for each region, taking into consideration the overall luminance of the region.
Obviously, the luminance of a region depends on the luminance of each one of its pixels, which is given by the Y value in (2.1). Note that Y ∈ [0, 255] ⊂ R, so after calculating the Y-value of every pixel p_k of a given region with N pixels, we have to map it into the discrete grayscale of the histogram as follows:

G(p_k) = round(Y(p_k))    (2.2)

so that the corresponding histogram bin scores one more point; for example, if G(p_k) = 231, the bin numbered 231 is increased by 1, that is,

h(231) = h(231) + 1    (2.3)

where h(i), i ∈ [0, 255] ⊂ N, denotes the i-th bin of the histogram.
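A short sketch of this scoring step, building on the earlier hypothetical helpers (to_grayscale, and the (mask, region) pairs returned by extract_regions):

```python
import numpy as np

def region_histogram(region_rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """256-bin grayscale histogram of a region, following Eqs. (2.2)-(2.3):
    each pixel's rounded luminance increments the corresponding bin."""
    gray = to_grayscale(region_rgb)            # luminance of every pixel
    values = gray[mask]                        # keep only pixels belonging to the region
    return np.bincount(values, minlength=256)  # h(0) ... h(255)
```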
2.3.2.3 Region sorting
So, taking into account that we are dealing once again with grayscale regions, the histogram of each region has a range of values G ∈ [0, 255] ⊂ N (dark to white). It is then clear that the histogram of a brighter region (i.e., a balloon region) has a peak closer to its right hand (white) side (Fig. 2.6, top). Thus, a grayscale region with N pixels can be ranked according to the weighted average luminance:

\[
L = \frac{\sum_{i=0}^{255} i \times h(i)}{N} \tag{2.4}
\]

Since the histogram values range from 0 (black) to 255 (white), regions with a higher percentage of white will score more, and darker regions will score less (Fig. 2.6, bottom). Thus, the higher scoring regions are typically the balloons in the image.
The regions whose histograms are shown in Fig. 2.6 represent a balloon (top), with L = 254.00, and an ordinary region (bottom), with L = 84.25.

Figure 2.6: (top) Typical histogram of a balloon region; (bottom) histogram of an ordinary (non-balloon) region.
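As a sketch, the score of Eq. (2.4) and the subsequent sorting can be written as follows (names hypothetical, continuing the earlier sketches):

```python
import numpy as np

def luminance_score(hist: np.ndarray) -> float:
    """Weighted average luminance L of Eq. (2.4)."""
    n = hist.sum()
    if n == 0:
        return 0.0
    return float((np.arange(256) * hist).sum() / n)

# Example: sort (mask, region) pairs by decreasing score, so that likely balloons come first.
# scored = sorted(regions, key=lambda mr: luminance_score(region_histogram(mr[1], mr[0])), reverse=True)
```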
2.3.2.4 Region filtering
Experimental results showed that the weighted average luminance L = 247 is the threshold above which a region is considered a balloon; otherwise, the region is discarded. Note that the white band [247, 255] represents only 3% of the histogram range. For example, the regions shown in Fig. 2.7 are the text balloons that were filtered in this manner from the book page depicted in Fig. 2.3.

Figure 2.7: Balloon regions extracted from the book page shown in Fig. 2.3. Although not visible here, the regions have a balloon shape indeed.
Interestingly, when the ranking produces no results above the threshold L = 247, a different method can be used to identify the correct regions, specifically, color counting. Using the histograms already created for each region, score each region by counting the number of spikes in the histogram. Any region with exactly two spikes is a candidate balloon region (foreground and background colors). Image artifacts like aliasing, blurring or sharpening can cause the number of spikes to change, so care should be taken not to use other image processing filters before creating the histogram. This alternative method deals with almost all cases where balloons are not the brightest areas on the page.
Besides, it may happen that, with poor quality scans or with scans of old, yellowed paper, the original image is too dark or has very low contrast, so that all the balloons fall below the threshold for balloon detection. In this case, an equalization of the histogram solves this problem by normalizing the color representation values, for the purpose of scoring only.
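A sketch of this filtering stage, combining the luminance threshold with the spike-counting fallback. The 10% spike heuristic and the function names are assumptions made for illustration, not values from the thesis:

```python
L_THRESHOLD = 247  # empirical threshold reported in the thesis

def select_balloons(hists):
    """Return the indices of candidate balloon regions, given one 256-bin
    histogram per surviving region."""
    scores = [luminance_score(h) for h in hists]
    selected = [i for i, s in enumerate(scores) if s >= L_THRESHOLD]
    if selected:
        return selected
    # Fallback: color counting. A "spike" is taken here to be a bin holding at
    # least 10% of the region's pixels (assumed heuristic). Exactly two spikes
    # suggest one background color plus one text color.
    candidates = []
    for i, h in enumerate(hists):
        n = h.sum()
        spikes = int((h >= 0.10 * n).sum()) if n else 0
        if spikes == 2:
            candidates.append(i)
    return candidates
```

For dark or low-contrast scans, a standard histogram equalization of each region's histogram can be applied before computing the score, as described above.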
2.4 Experimental Results
The testing of our algorithm was performed using a PC powered by an Intel Core i7 860 processor with a 2.8 GHz clock, 8 GB of RAM and an ATI 5670 graphics card, running the Windows 7 (64-bit) operating system.
To test and demonstrate the validity of the method presented, a test run with 12 comic books, from different publishers and with different artistic styles, was used. The comic books were scanned as lossless PNG files, with a resolution of 1024x1590 at 150 DPI, and only the first 10 story pages per book were used, which makes a total of 120 pages of comics analysed. Note that the page numbers in Tables 2.2-2.13 are relative to the actual page number of the book's story. For example, page number 1 might correspond, in a particular book, to page 4, discounting covers, publicity pages, and title or credits pages, since those have no text balloons and are not actually comic book images. Also, when two consecutive pages each show one half of a larger image (i.e., a page spread), they count as only one page, since that is what they actually represent. This was done purely for convenience; had it been done differently, it would not affect the results. As a side note, the average processing time of each book page was 1 to 2 seconds.
The balloons per page were counted manually, and this number was compared with the number of balloons correctly identified by our method. False positives, as well as missed balloons, were also counted. Recall that a false positive is an image region that the method presents as a balloon containing text but in reality is not, and a missed balloon is a balloon existing in the image but not identified as such. Those two cases are presented separately because they represent the two possible points of failure (resulting usually from either under- or over-tuning of the thresholding parameter).
Table 2.2: Extraction results for [2] (Batman 670)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 6 | 6 | 0 | 0
2 | 6 | 6 | 0 | 0
3 | 8 | 7 | 1 | 0
4 | 10 | 6 | 4 | 0
5 | 4 | 4 | 0 | 0
6 | 9 | 7 | 2 | 0
7 | 6 | 7 | 0 | 1
8 | 8 | 6 | 2 | 0
9 | 6 | 6 | 0 | 0
10 | 7 | 7 | 0 | 0
Total | 70 | 62 | 9 | 1
Percentage | | | 12.86 | 1.43
Success | | | 87.14 | 98.57

Table 2.3: Extraction results for [3] (Batman 671)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 1 | 1 | 0 | 0
2 | 4 | 5 | 0 | 1
3 | 4 | 4 | 0 | 0
4 | 6 | 6 | 0 | 0
5 | 23 | 8 | 15 | 0
6 | 1 | 4 | 0 | 3
7 | 3 | 4 | 1 | 2
8 | 6 | 4 | 2 | 0
9 | 8 | 8 | 0 | 0
10 | 6 | 6 | 1 | 1
Total | 62 | 50 | 19 | 7
Percentage | | | 30.65 | 11.29
Success | | | 69.35 | 88.71

Table 2.4: Extraction results for [4] (Batman 672)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 1 | 1 | 1 | 1
2 | 8 | 8 | 0 | 0
3 | 6 | 6 | 0 | 0
4 | 4 | 4 | 0 | 0
5 | 8 | 11 | 0 | 3
6 | 9 | 9 | 0 | 0
7 | 7 | 7 | 0 | 0
8 | 5 | 5 | 0 | 0
9 | 9 | 9 | 0 | 0
10 | 4 | 4 | 0 | 0
Total | 61 | 64 | 1 | 4
Percentage | | | 1.64 | 6.56
Success | | | 98.36 | 93.44

Table 2.5: Extraction results for [5] (Batman 673)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 2 | 2 | 0 | 0
2 | 1 | 1 | 0 | 0
3 | 2 | 2 | 0 | 0
4 | 1 | 0 | 1 | 0
5 | 5 | 2 | 3 | 0
6 | 3 | 2 | 1 | 0
7 | 8 | 8 | 0 | 0
8 | 8 | 8 | 0 | 0
9 | 6 | 6 | 0 | 0
10 | 4 | 4 | 0 | 0
Total | 40 | 35 | 5 | 0
Percentage | | | 12.50 | 0.00
Success | | | 87.50 | 100.00

Table 2.6: Extraction results for [6] (Batman 674)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 4 | 4 | 0 | 0
2 | 16 | 5 | 11 | 0
3 | 4 | 4 | 0 | 0
4 | 6 | 6 | 0 | 0
5 | 3 | 3 | 0 | 0
6 | 5 | 5 | 0 | 0
7 | 9 | 9 | 0 | 0
8 | 11 | 11 | 0 | 0
9 | 2 | 2 | 0 | 0
10 | 3 | 3 | 0 | 0
Total | 63 | 52 | 11 | 0
Percentage | | | 17.46 | 0.00
Success | | | 82.54 | 100.00

Table 2.7: Extraction results for [7] (Amazing Spider-Man 643)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 8 | 7 | 1 | 0
2 | 12 | 12 | 0 | 0
3 | 9 | 9 | 0 | 0
4 | 5 | 4 | 1 | 0
5 | 8 | 8 | 0 | 0
6 | 7 | 6 | 1 | 0
7 | 4 | 4 | 0 | 0
8 | 19 | 12 | 7 | 0
9 | 10 | 10 | 0 | 0
10 | 7 | 5 | 2 | 0
Total | 89 | 77 | 12 | 0
Percentage | | | 13.48 | 0.00
Success | | | 86.52 | 100.00

Table 2.8: Extraction results for [8] (Amazing Spider-Man 644)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 10 | 8 | 2 | 0
2 | 8 | 7 | 1 | 0
3 | 10 | 10 | 0 | 0
4 | 11 | 10 | 1 | 0
5 | 8 | 0 | 8 | 0
6 | 10 | 9 | 1 | 0
7 | 9 | 8 | 1 | 0
8 | 11 | 9 | 2 | 0
9 | 7 | 7 | 0 | 0
10 | 15 | 13 | 2 | 0
Total | 99 | 81 | 18 | 0
Percentage | | | 18.18 | 0.00
Success | | | 81.82 | 100.00

Table 2.9: Extraction results for [9] (Amazing Spider-Man 645)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 8 | 8 | 0 | 0
2 | 12 | 10 | 3 | 1
3 | 7 | 7 | 0 | 0
4 | 9 | 9 | 0 | 0
5 | 7 | 6 | 1 | 0
6 | 8 | 8 | 0 | 0
7 | 15 | 15 | 0 | 0
8 | 10 | 7 | 3 | 0
9 | 11 | 11 | 0 | 0
10 | 10 | 10 | 1 | 1
Total | 97 | 91 | 8 | 2
Percentage | | | 8.25 | 2.06
Success | | | 91.75 | 97.94

Table 2.10: Extraction results for [10] (Amazing Spider-Man 646)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 4 | 6 | 0 | 2
2 | 4 | 4 | 0 | 0
3 | 9 | 8 | 1 | 0
4 | 9 | 9 | 0 | 0
5 | 2 | 2 | 0 | 0
6 | 6 | 7 | 0 | 1
7 | 7 | 8 | 0 | 1
8 | 5 | 1 | 4 | 0
9 | 7 | 7 | 0 | 0
10 | 8 | 8 | 0 | 0
Total | 61 | 60 | 5 | 4
Percentage | | | 8.20 | 6.56
Success | | | 91.80 | 93.44

Table 2.11: Extraction results for [11] (Amazing Spider-Man 647)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 7 | 8 | 0 | 1
2 | 2 | 2 | 0 | 0
3 | 10 | 10 | 0 | 0
4 | 13 | 13 | 0 | 0
5 | 12 | 13 | 0 | 1
6 | 8 | 7 | 1 | 0
7 | 8 | 9 | 0 | 1
8 | 8 | 7 | 1 | 0
9 | 4 | 4 | 0 | 0
10 | 4 | 3 | 1 | 0
Total | 76 | 76 | 3 | 3
Percentage | | | 3.95 | 3.95
Success | | | 96.05 | 96.05

Table 2.12: Extraction results for [12] (Amazing Spider-Man 648)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 7 | 7 | 0 | 0
2 | 4 | 4 | 0 | 0
3 | 10 | 9 | 1 | 0
4 | 16 | 11 | 5 | 0
5 | 15 | 13 | 3 | 1
6 | 12 | 12 | 0 | 0
7 | 14 | 14 | 0 | 0
8 | 12 | 12 | 1 | 1
9 | 19 | 16 | 3 | 0
10 | 11 | 11 | 0 | 0
Total | 120 | 109 | 13 | 2
Percentage | | | 10.83 | 1.67
Success | | | 89.17 | 98.33

Table 2.13: Extraction results for [13] (Amazing Spider-Man 649)
Page | Detections | Actual Balloons | False Positives | Missed Balloons
1 | 10 | 9 | 2 | 1
2 | 14 | 11 | 3 | 0
3 | 16 | 16 | 1 | 1
4 | 6 | 6 | 0 | 0
5 | 9 | 7 | 2 | 0
6 | 12 | 11 | 1 | 0
7 | 5 | 4 | 1 | 0
8 | 14 | 8 | 4 | 0
9 | 15 | 14 | 1 | 0
10 | 3 | 3 | 0 | 0
Total | 104 | 89 | 15 | 2
Percentage | | | 14.42 | 1.92
Success | | | 85.58 | 98.08

2.4.1 Analysis of results
Tables 2.2-2.13 show the results for the 12 comic books (our bookset), one table per book. More specifically, as mentioned above, each table presents test results for the first 10 pages of a single book.
Pages with no false positives and no missed balloons are classified as having optimal results, and this happens in 62 out of the 120 pages in the bookset, which, given the complexity of the pages, is a very high value. It means that balloon extraction is optimal, i.e., completely automatic, in more than 50% of book pages.
Also, only 25 out of the 846 balloons in the bookset were missed by the algorithm, which represents 2.95% of the total number of balloons.
Taking into consideration that each book page has a minimum of approximately 500 regions, we end up processing about 5000 regions for the first 10 pages of each book, for a total of 60,000 regions over the 12 books. Interestingly, by inspection of Tables 2.2-2.13, we note that the algorithm only produced 119 false positives, which represents about 0.2% of the total number of regions.
Interestingly, the page with the most false positives is page 5 of Batman 671, with 15 occurrences. This happened because page 5 contains snow, which in some circumstances can easily be mistaken for a balloon region, since they are both white. In fact, all false positives are created by very bright single-color regions that are the same size as balloons. Such false positives could be avoided by counting the peaks in the histogram, since all the false positives have only one color. Our false positives are visually similar to the false positives described in [30].
2.4.2 Comparison to other algorithms
Other authors have tackled the problem of extracting text balloons from comics (see, for example, [34] and [22]). However, they used simple comics like web comics and flat color books (e.g., Asterix, Lucky Luke or Garfield), but not more complex comics like those published by Marvel or DC, such as Batman or Spider-Man. For such simple comics, we can say that our algorithm does not fail, that is, it produces no false positives and misses no balloons.
Moreover, because this algorithm does not concern itself with the text inside the balloons,
but only with the balloons themselves, it can detect balloons whose text has any orientation,
direction, alphabet (Cyrillic, Arabic, Chinese, etc.), font face or color. During testing, our
algorithm successfully detected balloons with special characteristics like those shown in
Fig. 2.8.
All of the balloons depicted in Fig. 2.8 cause problems for other methods. Differently
colored text inside the balloons would lead to false-positive detections, wavy text would
defeat horizontal text searches, and different font faces, with disconnected letters, would
make other methods fail.
2.4.3 OCR
This algorithm does not attempt to perform OCR on the detected balloons, because that is not
always the goal of balloon extraction and also because there are readily available solutions
to detect text in images, provided that those images contain only text, as is the case of the
balloon images produced by this algorithm.

Figure 2.8: Special-case balloons that were successfully recognized: balloons whose text color
differs from that of other balloons, balloons with more than one text color inside the same balloon,
balloons with non-standard shapes, wavy text, and different font faces within the same balloon.
On the other hand, when optical character recognition of the text in the extracted balloon
images is necessary, we can use any traditional OCR engine that supports the font types used
in comic books or, simply, train the OCR engine to recognize those font types. Traditionally,
the font types used in American and European comic books belong to the ComicScript and
Comicraft families. These are the font types of the textual elements, not those of
onomatopoeia or drawn text elements (such as street signs, billboards, etc.) drawn into the
image.
During testing with Google’s Tesseract OCR engine, enlarging each extracted balloon by a
factor of 4 gave better results than simply feeding Tesseract the regions of the original image.
The enlargement facilitated the correct recognition of individual letters that appear connected
in the original book page.
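As a minimal sketch of this upscale-then-recognize step, assuming the Pillow and pytesseract packages are available, the procedure could look as follows; the function name and the file name in the usage comment are placeholders, and only the scale factor of 4 comes from the text above.

    from PIL import Image
    import pytesseract

    def ocr_balloon(balloon_path, scale=4):
        """Enlarge an extracted balloon image and run Tesseract on it."""
        img = Image.open(balloon_path).convert("L")  # grayscale copy of the balloon
        img = img.resize((img.width * scale, img.height * scale), Image.LANCZOS)
        return pytesseract.image_to_string(img)

    # Hypothetical usage on one extracted balloon image:
    # print(ocr_balloon("balloon_03.png"))

The enlargement step matters because Tesseract segments characters more reliably when the strokes are several pixels wide, which is often not the case at the original page resolution.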
2.5 Final Remarks
The method presented offers better results for complex comics than other methods, and does
so with lower processing requirements. The results show that, for many pages, it is optimal,
in the sense that it produces no false positives and correctly detects all balloons. It does,
however, fail for specific pages with bright areas or with balloons that have uncommon
background colors, such as black or other dark colors. These are corner cases at best and do
not represent any significant portion of existing comic books. It should also be noted that no
other existing method deals with such corner cases successfully. Future work will address
finding those balloons and hardening the algorithm against the false positives that result from
small bright areas in the image.
Chapter 3
Conclusions
The results achieved with the algorithm presented allow us to conclude that balloon extraction
from comic book pages can be performed in an automated fashion and without requiring
intense CPU power. Those results show that this algorithm is both reliable and comprehensive
in scope, in that it is not tailored to any specific style of comic book, but rather generic enough
to be applied to any comic book without compromising the results. It is true that some corner
cases still make total automation, for all comic books, unadvisable, but it is also observable
from the results that this algorithm produces better and more consistent results than other
methods.
Experimental results show that optimal cases are the majority, that is, processed pages with
no false positives and no missed balloons, which is the ultimate goal of any algorithm for
solving this problem.
The work presented can be expanded upon by looking at ways to double-check the regions
identified as balloons, in order to reduce the still-present false positives. This could possibly
be achieved by including OCR elements such as character matching and excluding regions
without matches, as sketched below, but this has not yet been confirmed. It would also be
interesting to explore a complete hardware implementation of the proposed algorithm,
possibly as some type of SoC (system-on-chip), coupled with e-paper devices, for a complete
comic book reading experience on a single sheet of paper.
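A minimal sketch of such an OCR-based double check, again assuming the pytesseract package and using an illustrative letter-count threshold, could look as follows; it indicates a possible direction only, not a validated implementation, and the function name and threshold are hypothetical.

    import re
    import pytesseract

    def verify_balloons_with_ocr(candidate_images, min_letters=3):
        """Keep only the candidate regions in which Tesseract finds some letters.

        `candidate_images` is a list of PIL images of candidate balloon regions;
        `min_letters` is an illustrative threshold, not a value from the thesis.
        """
        confirmed = []
        for img in candidate_images:
            text = pytesseract.image_to_string(img)
            if len(re.findall(r"[A-Za-z]", text)) >= min_letters:
                confirmed.append(img)
        return confirmed

Such a filter would discard bright single-color regions, like the snow areas mentioned in Section 2.4, at the cost of the extra OCR pass over each candidate region.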
References
[1] John Byrne. Alpha Flight. Number 6. January 1984.
[2] Grant Morrison and Tony Daniel. Batman. Number 670. DC Comics, December 2007.
[3] Grant Morrison and Tony Daniel. Batman. Number 671. DC Comics, January 2008.
[4] Grant Morrison and Tony Daniel. Batman. Number 672. DC Comics, February 2008.
[5] Grant Morrison and Tony Daniel. Batman. Number 673. DC Comics, March 2008.
[6] Grant Morrison and Tony Daniel. Batman. Number 674. DC Comics, April 2008.
[7] Mark Waid, Stan Lee, Marcos Martin, and Paul Azaceta. The Amazing Spider-Man. Number
643. Marvel Comics, November 2010.
[8] Mark Waid, Stan Lee, and Marcos Martin. The Amazing Spider-Man. Number 644. Marvel
Comics, November 2010.
[9] Mark Waid, Paul Azaceta, and Mathew Southworth. The Amazing Spider-Man. Number
645. Marvel Comics, December 2010.
[10] Mark Waid and Paul Azaceta. The Amazing Spider-Man. Number 646. Marvel Comics,
December 2010.
[11] Bob Gale, Joe Kelly, Dan Slott, Fred Van Lente, Mark Waid, Zeb Wells, Max Fiumara,
Karl Kesel, and Paul Azaceta. The Amazing Spider-Man. Number 647. Marvel Comics,
December 2010.
[12] Dan Slott, Paul Tobin, Joe Quesada, and Clayton Henry. The Amazing Spider-Man. Number
648. Marvel Comics, January 2011.
[13] Dan Slott and Humberto Ramos. The Amazing Spider-Man. Number 649. Marvel Comics,
January 2011.
[14] Scott McCloud. Understanding Comics - The Invisible Art. Harper Collins, 1994.
[15] Marvel Comics. Marvel Digital Comics Shop (accessed October 12, 2012).
http://comicstore.marvel.com/.
[16] DC Comics. DC Comics Digital Comics Shop (accessed October 12, 2012).
http://www.readdcentertainment.com/.
[17] DigitalComicMuseum. Digital Comic Museum (accessed October 12, 2012).
http://digitalcomicmuseum.com/.
[18] Ruini Cao and Chew Lim Tan. Separation of overlapping text from graphics. In Proceedings
of the Sixth International Conference on Document Analysis and Recognition (ICDAR’01),
pages 44–48, 2001.
[19] Palaiahnakote Shivakumara, Trung Quy Phan, and Chew Lim Tan. A Laplacian approach
to multi-oriented text detection in video. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 33:412–419, 2011.
[20] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing, Third Edition.
Prentice Hall, 2007.
[21] Siddhartha Brahma. Text extraction using shape context matching, 2006.
[22] Kohei Arai and Herman Tolle. Method for real time text extraction of digital manga
comic. International Journal of Image Processing (IJIP), 4(6):669–676, 2011.
[23] Will Eisner. Comics & Sequential Art. Poorhouse Press, 1985.
[24] M. Praneesh and R. Jaya Kumar. Novel approach for color based comic image
segmentation for extraction of text using modify fuzzy possibilistic c-means clustering
algorithm. Special Issue of International Journal of Computer Applications (0975-8887)
on Information Processing and Remote Computing – IPRC, pages 16–18, August 2012.
[25] Keiichiro Hoashi, Chihiro Ono, Daisuke Ishii, and Hiroshi Watanabe. Automatic
preview generation of comic episodes for digitized comic search. In Proceedings of
the 19th International Conference on Multimedia 2011, 2011.
[26] Q. Yuan and C. L. Tan. Page segmentation and text extraction from gray scale image
in microfilm format, 2001.
[27] Sachin Grover, Kushal Arora, and Suman K. Mitra. Text extraction from document
images using edge information. In IEEE India Council Conference, INDICON 2009:
Ahmedabad, 2009.
[28] Wenshuo Gao, Xiaoguang Zhang, Lei Yang, and Huizhong Liu. An improved sobel
edge detection. In Computer Science and Information Technology (ICCSIT), 2010 3rd
IEEE International Conference on, volume 5, pages 67–71, 2010.
[29] Qinglian Guo, Kyoko Kato, Norio Sato, and Yuko Hoshino. An algorithm for extracting
text strings from comic strips, 2006.
[30] Christophe Rigaud, Norbert Tsopze, Jean-Christophe Burie, and Jean-Marc Ogier.
Extraction robuste des cases et du texte de bandes dessinées, 2012.
[31] J. Sushma and M. Padmaja. Text detection in color images. In IEEE IAMA 2009, 2009.
[32] ITU-R. Recommendation ITU-R BT.601-7, March 2011.
[33] R.W.G. Hunt. The Reproduction of Colour in Photography, Printing and Television.
Fountain Press, 1987.
[34] Anh Khoi Ngo Ho, Jean-Christophe Burie, and Jean-Marc Ogier. Panel and speech
balloon extraction from comic books. In 2012 10th IAPR International Workshop on
Document Analysis Systems, 2012.