Image Content in Shopping Recommender Systems for Mobile Users
by
Tranos Zuva
Submitted in fulfilment of the requirements for the degree
DOCTOR TECHNOLOGIAE
in the
Department of Computer Systems Engineering
FACULTY OF ICT
TSHWANE UNIVERSITY OF TECHNOLOGY
Supervisors:
Prof. Sunday O. Ojo
Prof. Oludayo O. Olugbara
Prof. Seleman M. Ngwira
August 2012
DECLARATION BY CANDIDATE
“I hereby declare that the dissertation/thesis submitted for the degree D Tech: Computer
Systems Engineering, at Tshwane University of Technology, is my original work and has
not previously been submitted to any other institution of higher education. I further declare
that all sources cited or quoted are indicated and acknowledged by means of a
comprehensive list of references”.
Tranos Zuva
Copyright © Tshwane University of Technology 2012
This study is dedicated to my late mother for the inspiration. Quote: “Never send anyone
to do something on your behalf if you want it done to your taste” - Shanangurayi Zuva
ACKNOWLEDGEMENTS
First and foremost, I would like to express my sincerest gratitude and appreciation to my
supervisors, namely Prof. Oludayo O. Olugbara, Prof. Sunday O. Ojo and Prof. Seleman M.
Ngwira, for their encouragement, guidance, patience, motivation and support during my
DTech research period. Their contribution to my work is immeasurable by any standard.
To Prof. Olugbara, thank you very much for unselfishly sharing with me your immense
knowledge of image processing. This enabled me to develop an understanding
of the subject. Truly, I could not have imagined having better supervisors for my DTech
study.
Besides my supervisors, I would like to thank my fellow members of staff and students
who were there for me when I needed help of any kind. They provided a conducive
environment for me to work in. Thanks guys.
I would also like to thank my family: my wife Keneilwe Zuva and kids (Unaludo, Tariro,
Nyasha and Trevor) for their support. I hope they will be rewarded for the joys they
sacrificed during the period of my study. I also thank my father, brothers, sisters,
in-laws, friends and their families for their support.
Last but not least, a big thank you to Tshwane University of Technology for the financial
support and for giving me the opportunity to further my studies.
CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
PUBLICATION LIST
ABSTRACT
CHAPTER 1
1 INTRODUCTION
1.1 Statement of Problem
1.2 Research Question
1.3 Goal and Objectives
1.4 Expected Contributions
1.5 Thesis structure
CHAPTER 2
2 RECOMMENDER SYSTEMS
2.1 Collaborative Filtering (CF)
2.1.1 User-based nearest neighbour
2.1.2 Item-based nearest neighbour
2.2 Content-Based Filtering
2.3 Knowledge based Recommender Systems
2.4 Hybrid Recommender Systems
2.5 Challenges of recommendation Techniques
2.6 Evaluation metrics for recommender systems
2.7 Mobile recommender systems
2.8 Motivation for mobile recommender systems
2.8.1 Recommendation systems for mobile users
2.9 Architecture of mobile recommendation system
CHAPTER 3
3 IMAGE SEGMENTATION, REPRESENTATION AND RETRIEVAL
3.1 Image Segmentation Techniques
3.1.1 Thresholding Method
3.1.2 Edge Based Methods
3.1.3 Region Based Methods
3.1.4 Performance Evaluation
3.1.5 Challenges and Future Directions
3.1.6 Segmentation techniques Summary
3.2 Image Shape Representation and Description Techniques
3.2.1 Classification of shape representation and description techniques
3.2.2 Boundary/Contour Based representation Techniques
3.2.3 Region/Whole based representation Techniques
3.2.4 Evaluation of Representation and Description Algorithms
3.2.5 Challenges and Future Directions
3.2.6 Image representation Summary
3.3 Image (dis)similarity measurement and Database access algorithms
3.3.1 (Dis)similarity Algorithms
3.3.2 The Relationship between (Dis)similarity algorithm and Database Indexing
3.4 Image (dis)similarity measurement and Database access algorithms Summary
3.5 Evaluation algorithm of Information Retrieval Systems
3.5.1 Techniques for evaluation of unranked retrieval results
3.5.2 Techniques for evaluation of ranked retrieval results
3.5.3 Relationship between ROC and P-R related measures
3.5.4 Conclusion
3.6 Chapter summary
CHAPTER 4
4 SHAPE IMAGE CONTENT FOR MOBILE RECOMMENDER SYSTEM
4.1 Image pre-processing
4.2 Segmentation methods
4.2.1 Active contour without edges
4.2.2 Robust Image Segmentation using Local Median
4.3 Image representation method
4.4 The 1-Dimensional Kernel Density Estimation
4.4.1 Kernel Functions
4.4.2 Kernel Density Estimator (Properties)
4.4.3 Bias of the Estimator
4.4.4 Variance of the Kernel Density Estimator
4.4.5 Mean-Square Error (MSE)
4.5 Finding Optimal Bandwidth
4.5.1 Asymptotically Optimal Bandwidth
4.5.2 Plug-in Bandwidth
4.5.3 Adaptive Kernel Density Estimate (AKDE)
4.6 The N-Dimensional Kernel Density Estimation
4.6.1 Kernel Density Estimator (Properties)
4.6.2 Asymptotic Mean Integrated Squared Error
4.7 Finding Optimal Bandwidth
4.7.1 Plug-in Bandwidth
4.8 Shape representation using Adaptive kernel density feature points estimator (AKDFPE)
4.8.1 Proposed calculation of the optimal bandwidth
4.8.2 AKDFPE algorithm steps
4.8.3 Example
4.9 Similarity matching
4.10 Evaluation
4.11 Datasets
4.11.1 MPEG 7
4.11.2 General shopping item images
4.12 Query images
4.13 Chapter Summary
CHAPTER 5
5 EXPERIMENTATION, RESULTS AND DISCUSSION
5.1 Experiments
5.2 Pre-processing, segmentation and (dis)similarity selection
5.2.1 Results for pre-processing and segmentation stages
5.2.2 Results for Selection of (dis)similarity method using AKDFPE
5.2.3 Results analysis of pre-processing, segmentation and (dis)similarity techniques
5.3 Effectiveness of KDFPE and other representation methods
5.3.1 Results for Comparison of effectiveness between KDFPE and other methods on standard datasets
5.3.2 Results for Comparison of effectiveness between KDFPE and DHFP on shopping items dataset
5.3.3 Results analysis for effectiveness between KDFPE and other methods
5.4 Image content for shopping items recommender system for mobile users
5.4.1 Results for retrieval system of shopping items for mobile users
5.4.2 Results for Image content for shopping items recommender system for mobile users
5.4.3 Results analysis
5.5 Overall results analysis
CHAPTER 6
6 CONCLUSION, CONTRIBUTION AND FUTURE WORK
6.1 Conclusion
6.2 Summary of contributions
6.3 Future work
References
LIST OF FIGURES
FIGURE 2-1: Classification of Recommender Systems
FIGURE 2-2: Cell phone screen size is small
FIGURE 2-3: Proposed Mobile Recommender System
FIGURE 2-4: Architecture of the Recommender System
FIGURE 3-1: An Overview of Shape Segmentation Techniques
FIGURE 3-2: Sobel Edge Detection Templates
FIGURE 3-3: Two Commonly used Laplacian kernels
FIGURE 3-4: Edge Based Method (Sobel)
FIGURE 3-5: Quadtree Structure for Split and Merge Method
FIGURE 3-6: Region Based Method (Chan & Vese)
FIGURE 3-7: An Overview of Evaluation Techniques
FIGURE 3-9.a: Contour Pixels (8-Connectivity)
FIGURE 3-9.b: Region Pixels (8-Connectivity)
FIGURE 3-10: Hierarchy of the Classification of Shape Representation and Description Techniques
FIGURE 3-11: Examples of Contour Based Techniques
FIGURE 3-12: Directions for 4-connectivity
FIGURE 3-13: 4-directional Chain Code Representation
FIGURE 3-14: Examples of Region Based techniques
FIGURE 3-15: (a) Convex hull and its Concavities (b) Concavity representation tree of the convex hull
FIGURE 3-16: Hierarchy of classification of evaluation techniques for IR systems
FIGURE 3-17: Set Diagram showing elements of Precision and Recall
FIGURE 3-18: Graphs for values in Table 1 and Table 2
FIGURE 3-19: Graphs illustrating the appearance of P-R and ROC curves
FIGURE 4-1: The framework of the retrieval system
FIGURE 4-2: The Image Retrieval Process
FIGURE 4-3: Shows the rings around the centroid of an image
FIGURE 4-4: Segmented object shape
FIGURE 4-5: Distinct images from the Internet
FIGURE 5-1: Samples of shopping items in each category in the dataset
FIGURE 5-2: (b) Sample results of pre-processing and segmentation of images in (a)
FIGURE 5-3: (Dis)similarity method Cosine on the left and Euclidean on the right (KDFPE)
FIGURE 5-4: (Dis)similarity method Cosine on the left and Euclidean on the right (KDFPE)
FIGURE 5-5: Segmented shapes that were considered similar by KDFPE using cosine similarity algorithm
FIGURE 5-6: Average precision-recall on Region Based Test Image Retrieval on 678 object shapes (MPEG 7 CE 2)
FIGURE 5-7: Ten retrieval results of KDFPE on the left and DHFP on the right (query at the top left of the figure)
FIGURE 5-8: Ten retrieval results of KDFPE on the left and DHFP on the right (query at the top left of the figure)
FIGURE 5-9: Average precision-recall chart on General Image Retrieval
FIGURE 5-10: 2-D images of a 3-D shopping item
FIGURE 5-11: a) set of images difficult to identify b) set of images easy to identify
FIGURE 5-12: Query image captured by a camera enabled mobile device
FIGURE 5-13: Ten retrieval results of KDFPE
FIGURE 5-14: Average precision-recall on General Image Retrieval (Query captured by cell phone)
FIGURE 5-15: Query Image
FIGURE 5-16: Results from the Shopping Recommender System
FIGURE 5-17: Query Image
FIGURE 5-18: Results from the Shopping Recommender System
FIGURE 5-20: Results from the Shopping Recommender System with GPS coordinates for Retailer
[...] Retailer
FIGURE 5-23: Evaluation of the recommender system
LIST OF TABLES
TABLE 2.1: User Rating Data Matrix R
TABLE 3.1: Segmentation techniques summary
TABLE 3.2: Representation techniques summary
TABLE 3.3: Interpretation of (dis)similarity values
TABLE 3.4: Non-metric classification
TABLE 3.5: Examples of metric access methods
TABLE 3.6: Showing the calculation of precision-recall coordinates
TABLE 3.7: 11-Point interpolated average precision
TABLE 3.8: Confusion matrix
TABLE 4.1: Plug-in values for h_rot
TABLE 4.2: Values of constant C_j(K, q)
TABLE 5.1: Comparison of Bull’s Eye Performance on MPEG 7 CE 1 dataset
TABLE 5.2: 6220c cellphone and its camera specifications
TABLE 5.3: Scores to measure satisfaction with performance of the system
List of Publications
During the three years of research, the following papers related to this work were
published or submitted.
Refereed Conference Papers
Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2012),
Introducing an Adaptive Kernel Density Feature Points Estimator for Image
Representation, International Conference on Computer Science, Engineering &
Technology (ICCSET), 2-3 June 2012, Zurich, Switzerland, pp 129-133.
Enhanced Density Histogram of Feature Points Representation Method, International
Conference on Information Retrieval & Knowledge Management (CAMP 12), 13~15,
2012, Kuala Lumpur, Malaysia, pp 209-213.
Object Shape Representation by Kernel Density Feature Points Estimator, First
International workshop on Signal and Image Processing (SIP 2012) January 3~ 4, 2012,
Bangalore, India, pp. 209-216.
Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Image
Shape Representation and Description Techniques, Classification of Available Techniques
and Open Issues, Proceedings of 2011 IEEE International Conference on Intelligent
Computing and Intelligent Systems (ICIS 2011) November 18-20, Guangzhou, China,
pp. 186-191.
Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), A
Review of Image Segmentation Techniques, Challenges and Future Directions,
International Conference on Materials Science and Computing Science (MSCS 2011)
August 13-14, Wuhan, China, ISSN: 1022-6680.
Refereed Journal Papers
Introducing an Adaptive Kernel Density Feature Points Estimator for Image
Representation: International Journal of Wireless Information Networks & Business
Information System (WINBIS) Vol. 3, June 2012, Pages: 124-130, (ISSN No: 2091-0266)
Content in Location-Based Shopping Recommender Systems for Mobile Users: Advanced
Computing: An International Journal (ACIJ), Vol.3, No.4, July 2012, Pages: 1-8, (ISSN:
2229 - 6727 [Online]; 2229 - 726X [Print])
Review of Image Shape Representation Methods, Challenges and Future Directions:
Canadian Journal on Image Processing and Computer Vision Vol. 3 No. 1, March 2012,
Pages: 32-37, ISSN: 1923-1717
Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Kernel
Density Feature Points Estimator for Content-based Image Retrieval: Signal & Image
Processing: An International Journal (SIPIJ), Vol.4 No.1, February 2012, Pages: 103-111,
ISSN 0975-5578 (Online) 0975-5934 (Print)
Segmentation, Available Techniques, Developments and Open Issues: Canadian Journal on
Image Processing and Computer Vision Vol. 2 No. 3, March 2011, Pages: 20-29, ISSN:
1923-1717
Abstract
Generating recommendations for the users of a recommender system is an arduous problem.
Generating recommendations for mobile users is more arduous still,
because of the limitations of the mobile devices on which the recommendations are to be
displayed. Mobile devices with integrated cameras can be used to offer online
services to the global community whenever and wherever users are located. The mobile user
expects to receive a limited number of probable recommendations from a shopping
recommender system within a few seconds, and these recommendations must closely match
the user’s needs. To achieve this objective, a proposed client-server architecture for an
image content based shopping recommender system framework over wireless mobile
devices was implemented. The image content shopping recommender system performed a
query by external image captured by the mobile device’s camera. It then generated a set of
recommendations that was viewed on the mobile device using the Internet browser. The
image content used to improve recommendation generation was the shape, extracted using
level sets and active contour without edges methods. An algorithm was developed to
represent the extracted shape content so that it is invariant to Euclidean and affine
transformations and robust to occlusion and clutter. The shape invariant content was then used
to characterise sales items for effective recommendation generation. A suitable distance
measure was applied to the content representation to evaluate image similarity for retrieval.
Experimental results were generated and analysed to test the efficacy of the
shape content representation and matching algorithm. Finally, the Image Content in
Recommender System for Mobile Users was simulated and evaluated by users.
CHAPTER 1
1 INTRODUCTION
This thesis reports on the development of a mobile recommendation system that intuitively
supports mobile users in generating recommendations. Recommendation systems belong to
the class of information search techniques that have recently been proposed to overcome
the information overload problem. The fundamental computational task of a recommender
system is to predict the subjective evaluation a user will assign to an item (Ricci, 2010).
This technology has been successful in providing targeted item recommendations to web
users, but only a few systems have been designed for mobile users (Ricci & Nguyen,
2006). Several mobile technologies, including mobile data networks (General
Packet Radio Service (GPRS) and Universal Mobile Telecommunications System
(UMTS)), Global Positioning Systems (GPS), mobile phones and Personal Digital
Assistants (PDAs), are in use to offer online services to the global community
whenever and wherever users are located (Lu & Weng, 2007). These types of services are
best suited to mobile users in places they have never visited before, where they have to
make a choice from a number of available options. Potential beneficiaries of these services
include tourists, long distance vehicle drivers, business travellers, nomads and individuals
who want to access important information on the move. In particular, the technology
benefits disabled people who find it difficult to shop around, because the mobile devices
they carry can help them locate items of choice from their current locations. Such users
want to know where they are, how to reach certain places, and where to find the item(s)
and/or activity(ies) of their choice. At lower cost, recommender systems can help this
group of people find places and/or item(s) to their taste where there are many options
to choose from (Olugbara, Ojo & Mphahlele, 2010). These users want to satisfy short term
needs, so when they search for information the output must be accurate, either
immediately or after a brief round of feedback. It is important that the user be satisfied
with the output from such applications. Most of the few web-based recommender systems
designed for mobile devices run only on PDAs (Palm or Pocket PC) and are not suitable for
the far more popular mobile smart phones (e.g. iPhone, BlackBerry). This is because
mobile smart phones have smaller screens and limited keypads, so entering text on such a
device is extremely difficult.
The use of text brings another problem: it has been noted (Boutemedjet, Ziou & Bouguila,
2007) that no two people describe the same place and/or item using the same words or
in the same way. This ambiguous way of querying a database floods the user with
many probable search outputs. Users take a long time to get what they want, and in many
cases they give up, especially mobile users who do not have time to screen several
outputs.
The aim of this research work is to find an effective way to represent and retrieve shopping
item images from a shopping database for use in Image Content in Recommender System
for Mobile Users. To achieve this, a proposed image content retrieval system that
uses images from a camera enabled mobile smart device as primary input was
implemented. The retrieval system was then incorporated into the recommender system.
This has the potential to encourage usage of the system by mobile users because it removes
ambiguities, reluctance to query due to spelling or sentence construction, and other
text related problems. The image contents usually used to describe the syntactic
component of an image and to search for images in a database are colour, texture, shape, and their
combination. The effectiveness and efficiency of image content in shopping recommender
systems for mobile users depend heavily on the content representation, the (dis)similarity
model used to evaluate the images’ similarity and the accessing method. It is
important that content be extracted and stored when the images are collected for the image
database used with the recommender system, because content extraction
can have high computational complexity (Chan & Vese, 2001; Sharma &
Aggarwal, 2010). This reduces processing time, especially for mobile users who
have very limited time to wait for retrieval results. In this work, the shape content
extracted using level sets and active contour without edges was used. An algorithm was
developed to represent the extracted shape content such that it is invariant to Euclidean
transform and affine transformation (rotation, scale and translation). The invariant content
was then used to characterise shopping items, the domain of interest.
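The split between offline content extraction and online matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors are made-up numbers standing in for a shape descriptor, the names `build_index`, `recommend` and `extract_features` are hypothetical, and cosine similarity is used simply as one common (dis)similarity measure, not as the method fixed by this thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero descriptor is treated as matching nothing
    return dot / (norm_u * norm_v)


def build_index(catalogue, extract_features):
    """Offline step: extract and store one descriptor per catalogue image."""
    return {item_id: extract_features(image) for item_id, image in catalogue.items()}


def recommend(query_features, index, top_k=3):
    """Online step: rank stored descriptors against the query descriptor."""
    scored = [(cosine_similarity(query_features, feats), item_id)
              for item_id, feats in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical toy catalogue: each "image" is already a 3-value descriptor,
# so the placeholder extractor below just copies it.
catalogue = {
    "kettle": [0.9, 0.1, 0.0],
    "mug": [0.7, 0.3, 0.0],
    "shoe": [0.0, 0.2, 0.8],
}
index = build_index(catalogue, extract_features=lambda image: list(image))
result = recommend([1.0, 0.1, 0.0], index, top_k=2)
print(result)
```

Because the catalogue descriptors are computed once when the index is built, a query only costs one descriptor extraction plus a pass over the stored vectors, which is the reason given above for storing extracted content ahead of time.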
1.1 STATEMENT OF PROBLEM
The greatest challenge in recommender systems today is to improve recommendation
accuracy and efficiency. This challenge has been compounded by the availability on the
market of smaller mobile devices, such as mobile phones and PDAs, with limited interface
screens, memory size and processing capability. Researchers are trying to migrate
content-based image recommender systems to these mobile devices. Traditionally, the
Internet has served as an information retrieval system that generates information for users;
however, the information generated can overload the user. It has been shown (Ricci, 2010)
that this is caused by ambiguity in the querying of information and/or the structure of the
database. With the great variety of mobile devices now available, all with limited interface
screens and memory sizes, it is imperative that recommender systems be very accurate and
efficient in processing queries to the satisfaction of the user.
Surely, without improvements in the retrieval systems then a very useful system will not
migrate smoothly to mobile users who so much need it. A mobile user is someone who is
limited in time, thus the output must be limited in number due to the constraints of the
device in use and accurate to reduce the feedback interaction time. This type of user can be
described as an impatient user so the input procedure for querying the database must not
delay the user. With this in mind, research work is being done to try to introduce
recommender systems because of their economic benefits to governments, companies and
individuals. This study is aimed at contributing to such efforts.
In our research group, we are focusing on a unified conceptualization of three main research endeavours (recommendation technology, image processing and mobile computing) to realise an effective recommender system for mobile users. This implies the effective management of three categories of problems, one from each endeavour: the problems of mobile computing, of image representation and retrieval, and of recommendation technology.
1.2 RESEARCH QUESTION
The research question addressed in this study stemmed from the above research problem
statement and is stated as follows:
How could the shopping recommender system be developed so that mobile users are
satisfied in terms of retrieval accuracy and retrieval time using query-by-external image?
To put this question in context, consider the following mobile retrieval problem.
“Suppose Nyasha leaves home for shopping with a location and camera enabled mobile device. At a nearby shop, she finds an item similar to one she really wants. She now faces the difficulty of either buying it now or continuing to window-shop in the hope of finding the exact item she wants. The dilemma is that if she does not buy now she might not find the item later, and if she does buy now she might find the one she really wants as she continues her window shopping. Consequently, the problem is: with the aid of the camera enabled mobile device carried by Nyasha, how can she be helped to decide whether or not to buy this item, assuming that the shops have online databases of shopping items?”
In order to answer the above main research question and give a solution to Nyasha’s
problem the following sub-questions need to be adequately answered:
1. How can image content extracted using Active Contour without Edges be
represented for effective use in a shopping recommender system for mobile
users?
2. How can camera enabled mobile devices facilitate efficient retrieval of a
shopping item image of interest for a mobile user from a shopping
database?
3. What (dis)similarity techniques can work effectively in matching similar
images in a recommender system being queried using images captured by
camera enabled mobile devices?
4. What is the effectiveness of the shopping recommender system for mobile
users?
1.3 GOAL AND OBJECTIVES
The goal of the research is to develop an efficient image content representation mechanism and retrieval algorithm for effectively matching sales items whose image content has been extracted by Active Contour without Edges in an image content based shopping recommender system for mobile users. This can be accomplished by meeting the following objectives:
1. To study and compare recommender algorithms, image segmentation, shape
representation and (dis)similarity methods
2. To highlight the challenges and open issues in the areas studied in 1 above
3. To propose an image representation technique for effective use in a
recommender system for mobile users
4. To measure the effectiveness of the shopping recommender system using
query images captured by a camera enabled mobile device
5. To measure the user satisfaction of the recommender system
1.4 EXPECTED CONTRIBUTIONS
This work makes the following research contributions:
• a novel approach to item representation based on image shape content
• a novel approach to recommender systems
• an image database query technique using images captured by a camera enabled mobile device
• a discussion of some of the challenges and open issues in the areas of image processing and recommender systems.
The use of image content in shopping recommender systems, with a mobile device providing the query by external image, is a novel idea. It is novel since most other recommender systems are not suitable for mobile users. The application of shape content extracted using level sets and active contours without edges in shopping recommender systems for mobile users is also novel. Most research work in image processing has been done in areas such as medicine, security and remote sensing, but not in e-commerce. This work also contributes by highlighting the problems, challenges and open issues that are still encountered in the area of image recommender systems for mobile users. The introduction of the novel region based image representation method is also a contribution of this research.
1.5 THESIS STRUCTURE
This thesis is structured as follows:
Chapter 2 reviews related work on recommender systems and discusses the theoretical framework for this research work.
Chapter 3 reviews related work on image segmentation, representation and retrieval techniques.
Chapter 4 discusses shape image content for the mobile recommender system, including the experimental designs for this research work.
Chapter 5 presents the experiments and results, and discusses the analysis of the experimental results.
Chapter 6 concludes the research work, discussing its contributions, achievements, shortfalls and future endeavours.
CHAPTER 2
2 RECOMMENDER SYSTEMS
Recommender systems belong to a class of personalized information filtering technologies
that aim to meaningfully suggest which items or products available might be of interest to a
particular user (Bogers & Bosch, 2009; Gunawardana & Meek, 2009). These systems make
recommendations using three fundamental steps: preferences acquisition (acquiring
preferences from the user’s input data), recommendation computation (computing
recommendations using proper methods) and recommendation presentation (presenting the
recommendation to the user) (Huang & Huang, 2009). Based on the techniques used in recommendation computation, existing recommender systems can be classified into four fundamental categories, shown in Figure 2-1: Collaborative Filtering (CF), Content-Based Filtering (CBF), Knowledge-Based Filtering (KBF) and Hybrid Filtering (HF).
FIGURE 2-1: Classification of Recommender Systems (RS) into Content-Based Filtering (CBF), Collaborative Filtering (CF), Hybrid Filtering (HF) and Knowledge-Based Filtering (KBF)
2.1 COLLABORATIVE FILTERING (CF)
CF systems obtain user feedback in the form of ratings in a given application domain, then exploit similarities and differences among the profiles of several users to generate recommendations (Olugbara et al., 2010). Algorithms for CF recommender systems can be grouped into two general classes: memory based (algorithms that require all ratings, items and users to be stored in memory) and model based (algorithms that periodically create a summary of rating patterns offline) (Chen, Jiang & Zhao, 2010; Schafer, Frankowski, Herlocker & Sen, 2007). Model based algorithms are the most commonly used because they reduce run-time complexity. CF techniques can also be grouped into non-probabilistic and probabilistic algorithms. Probabilistic CF algorithms are based on an underlying probabilistic model; non-probabilistic CF algorithms are not. The non-probabilistic CF algorithms are the most commonly used (Chen et al., 2010; Schafer et al., 2007; Su & Khoshgoftaar, 2009). Nearest neighbour algorithms are well-known non-probabilistic CF algorithms. There are two classes of nearest neighbour CF algorithms: user-based nearest neighbour and item-based nearest neighbour. CF algorithms use an $m \times n$ ratings matrix $R$ to represent the complete user-item data, where $m$ is the number of users and $n$ is the number of items. Each entry $R_{u,i}$ is the score of item $i$ rated by user $u$ within a certain numerical scale. The matrix is illustrated in Table 2-1 below.
TABLE 2-1: User Rating Data Matrix R

          Item 1    Item 2    ...    Item i    ...    Item n
User 1    R_{1,1}   R_{1,2}   ...    R_{1,i}   ...    R_{1,n}
User 2    R_{2,1}   R_{2,2}   ...    R_{2,i}   ...    R_{2,n}
...       ...       ...       ...    ...       ...    ...
User u    R_{u,1}   R_{u,2}   ...    R_{u,i}   ...    R_{u,n}
...       ...       ...       ...    ...       ...    ...
User m    R_{m,1}   R_{m,2}   ...    R_{m,i}   ...    R_{m,n}
This section will discuss the user-based nearest neighbour and item-based nearest
neighbour algorithms then the practical challenges of CF algorithms in general.
2.1.1 USER-BASED NEAREST NEIGHBOUR
In user-based nearest neighbour collaborative filtering, the prediction of how much an active user $u$ will like an item is based on ratings from similar users. These users are called the neighbours of $u$. User-based algorithms generate a prediction for an item $i$ by analysing the ratings for $i$ from users in the neighbourhood of $u$. Suppose we have a user-item rating matrix $R_{m \times n}$, where $m$ is the number of users, $n$ is the number of items and $R_{u,i}$ is the score of item $i$ rated by user $u$, showing the user’s degree of preference for the item, as in Table 2-1. The most significant step in the user-based nearest neighbour CF algorithm is to find the neighbours of the target user $u_t$, for which a similarity measure is used. The two most widely used similarity measures are cosine similarity and the Pearson correlation coefficient. The Pearson formula is given in equation 2-1.
\[
\mathrm{Usersim}(u_t, u) = \frac{\sum_{i \in I_{u,u_t}} (R_{u,i} - \bar{R}_u)(R_{u_t,i} - \bar{R}_{u_t})}{\sqrt{\sum_{i \in I_{u,u_t}} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in I_{u,u_t}} (R_{u_t,i} - \bar{R}_{u_t})^2}} \tag{2-1}
\]

where $\mathrm{Usersim}(u_t, u)$ represents the similarity between users $u$ and $u_t$, $I_{u,u_t} = I(u) \cap I(u_t)$ is the set of items rated by both $u$ and $u_t$, $R_{u,i}$ and $R_{u_t,i}$ are the scores of item $i$ rated by users $u$ and $u_t$ respectively, and $\bar{R}_u$ and $\bar{R}_{u_t}$ represent the average scores of users $u$ and $u_t$ respectively.
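The Pearson similarity of equation 2-1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the nested dictionary `ratings`, the user and item names, and the choice to average over all of a user's ratings (rather than only the co-rated items) are assumptions made for the example.

```python
from math import sqrt

def user_mean(ratings, user):
    """Average of all ratings given by `user` (the R-bar term; an assumption)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson_user_sim(ratings, u_t, u):
    """Pearson similarity over the items rated by both users."""
    common = set(ratings[u_t]) & set(ratings[u])   # I(u) intersected with I(u_t)
    if not common:
        return 0.0
    mean_t, mean_u = user_mean(ratings, u_t), user_mean(ratings, u)
    num = sum((ratings[u][i] - mean_u) * (ratings[u_t][i] - mean_t) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_t = sqrt(sum((ratings[u_t][i] - mean_t) ** 2 for i in common))
    if den_u == 0 or den_t == 0:
        return 0.0   # a user with constant ratings carries no correlation signal
    return num / (den_u * den_t)

# Hypothetical ratings: {user: {item: score}}
ratings = {
    "alice": {"shoe": 5, "bag": 3, "hat": 4},
    "bob":   {"shoe": 4, "bag": 2, "hat": 3},
    "carol": {"shoe": 1, "bag": 5, "hat": 3},
}
sim_ab = pearson_user_sim(ratings, "alice", "bob")
sim_ac = pearson_user_sim(ratings, "alice", "carol")
```

In this toy data, bob's ratings track alice's exactly (one point lower), so their similarity is close to 1, while carol's preferences are reversed and her similarity to alice is close to -1.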
In the last step, let $N_{u_t}$ denote the neighbour set of the target user $u_t$. The rating of $u_t$ for item $j$ is predicted using equation 2-2.

\[
P_{\text{userbased}}(u_t, j) = A_{u_t} + \frac{\sum_{u_n \in N_{u_t}} (R_{u_n,j} - \bar{R}_{u_n}) \cdot \mathrm{sim}(u_t, u_n)}{\sum_{u_n \in N_{u_t}} |\mathrm{sim}(u_t, u_n)|} \tag{2-2}
\]

where $A_{u_t}$ represents the average score of user $u_t$ over the rated items, $R_{u_n,j}$ is the score of item $j$ rated by neighbour user $u_n$, $\bar{R}_{u_n}$ is the average score of neighbour $u_n$ over the rated items, and $\mathrm{sim}(u_t, u_n)$ is the similarity between user $u_t$ and the neighbour $u_n$.
The predicted rating is then used to decide whether to recommend the item to the target user. For the cosine-based similarity algorithm, refer to Bigdeli (2008).
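The prediction step of equation 2-2 can be sketched as follows. This is an illustrative sketch only: the function name, the tuple layout for neighbours and the fallback to the target user's average when no neighbour is usable are assumptions, and the similarities are taken as already computed.

```python
def predict_user_based(target_mean, neighbours):
    """Mean-centred weighted prediction for one target user and one item j.

    neighbours: list of (similarity, neighbour_rating_for_j, neighbour_mean).
    """
    num = sum(sim * (r_j - mean) for sim, r_j, mean in neighbours)
    den = sum(abs(sim) for sim, _, _ in neighbours)
    if den == 0:
        return target_mean   # assumed fallback: no usable neighbours
    return target_mean + num / den

# Hypothetical neighbourhood: one neighbour rated j above their own average,
# the other rated it below.
prediction = predict_user_based(3.5, [(0.9, 5, 4.0), (0.5, 2, 3.0)])
```

Because each neighbour's rating is centred on that neighbour's own average, a generous rater and a strict rater contribute on a comparable scale.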
2.1.2 ITEM-BASED NEAREST NEIGHBOUR
Item-based nearest neighbour algorithms are the transpose of the user-based nearest neighbour algorithms. Item-based algorithms create predictions based on similarities between items (Schafer et al., 2007). There are many ways to calculate the similarity between items. Some of the most popular are cosine-based similarity, correlation-based similarity and adjusted cosine similarity. The formula for adjusted cosine similarity, which is the most popular and is believed to be the most accurate (Schafer et al., 2007; Zhang, Lin, Xiao & Zhang, 2009), is given in equation 2-3.
\[
\mathrm{Itemsim}(i, j) = \frac{\sum_{u \in U_{i,j}} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U_{i,j}} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U_{i,j}} (R_{u,j} - \bar{R}_u)^2}} \tag{2-3}
\]

where $R_{u,i}$ and $R_{u,j}$ represent the ratings of user $u$ on items $i$ and $j$ respectively, $\bar{R}_u$ is the mean of the $u$th user’s ratings and $U_{i,j}$ represents all users who have rated both items $i$ and $j$.
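As an illustration of equation 2-3, the adjusted cosine similarity can be sketched as below. The data layout (a dictionary of per-user rating dictionaries) and the item names are hypothetical; returning 0 when one item has no variation around the user means is an assumption made to avoid division by zero.

```python
from math import sqrt

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i not in user_ratings or j not in user_ratings:
            continue   # only users who rated both items belong to U_{i,j}
        mean_u = sum(user_ratings.values()) / len(user_ratings)
        d_i = user_ratings[i] - mean_u
        d_j = user_ratings[j] - mean_u
        num += d_i * d_j
        den_i += d_i * d_i
        den_j += d_j * d_j
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))

# Hypothetical ratings: {user: {item: score}}
ratings = {
    "alice": {"shoe": 5, "bag": 3, "hat": 4},
    "bob":   {"shoe": 4, "bag": 2, "hat": 3},
    "carol": {"shoe": 1, "bag": 5, "hat": 3},
}
sim_shoe_bag = adjusted_cosine(ratings, "shoe", "bag")
sim_shoe_hat = adjusted_cosine(ratings, "shoe", "hat")
```

Subtracting each user's mean before comparing items is what distinguishes the adjusted cosine from the plain cosine: it removes differences in how generously individual users rate.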
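The prediction for the item-based nearest neighbour algorithm for target user $u_t$ and item $j$ is calculated using equation 2-4 below.

\[
P_{\text{itembased}}(u_t, j) = \frac{\sum_{i \in R_{u_t}} \mathrm{Itemsim}(i, j) \cdot R_{u_t,i}}{\sum_{i \in R_{u_t}} \mathrm{Itemsim}(i, j)} \tag{2-4}
\]

where $R_{u_t}$ denotes the set of items rated by the target user $u_t$ and $R_{u_t,i}$ is that user’s rating of item $i$.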
If the predicted rating is high, the system recommends the item to the user. The item-based nearest neighbour algorithms are more accurate in predicting ratings than user-based nearest neighbour algorithms (Schafer et al., 2007).
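The item-based prediction can be sketched as a similarity-weighted average of the ratings the target user has already given. This sketch rests on two assumptions that are not stated in the text: the weights multiply the user's ratings of the already rated items $i$, and only positively similar items are kept, so that the denominator cannot vanish or change sign. All names are hypothetical.

```python
def predict_item_based(user_ratings, sims_to_j):
    """Similarity-weighted average of the target user's own ratings.

    user_ratings: {item: rating} for the target user.
    sims_to_j:    {item: Itemsim(item, j)} for the item j being predicted.
    """
    # Assumption: keep only rated items with positive similarity to j.
    rated = [i for i in user_ratings if sims_to_j.get(i, 0.0) > 0]
    den = sum(sims_to_j[i] for i in rated)
    if den == 0:
        return None   # no similar rated items: no prediction possible
    return sum(sims_to_j[i] * user_ratings[i] for i in rated) / den

predicted = predict_item_based({"shoe": 4, "bag": 2}, {"shoe": 0.8, "bag": 0.2})
```

Here the user's rating of the shoe dominates the prediction because the shoe is four times as similar to the target item as the bag is.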
2.2 CONTENT-BASED FILTERING
CBF approaches recommend items that are similar in content to the items the user liked in the past or that match the attributes of the user (Melville & Sindhwani, 2010; Pazzani & Billsus, 2007). In content based filtering recommender systems every item is represented by a feature vector or an attribute profile. The features hold numeric or nominal values representing certain aspects of the item such as colour, price, etc. A variety of (dis)similarity measures between the feature vectors may be used to compute the similarity of two items. The Euclidean and cosine (dis)similarity measures can be used; they are given in equations 2-5 and 2-6 respectively.
Euclidean dissimilarity

\[
\mathrm{dissim}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \| x - y \|_2 \tag{2-5}
\]

Cosine similarity

\[
\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i \, y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{2-6}
\]

where $x$ and $y$ are item feature vectors with $n$ elements each, and $\mathrm{dissim}(x, y)$ and $\mathrm{sim}(x, y)$ measure how far apart and how close the two items are, respectively.
The (dis)similarity values are then used to obtain a ranked list of recommended items.
These approaches are based on information retrieval because content associated with the
user’s preferences is treated as a query and unrated objects are scored with similarity to the
query. This approach can give recommendations in any domain. Content based
recommender systems work well if the items can be properly represented as a set of
features.
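The two measures, and the ranking step that follows from them, can be sketched as below. The feature vectors and item names are invented for the example, and treating a zero-length vector as having cosine similarity 0 is an assumption.

```python
from math import sqrt

def euclidean_dissim(x, y):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_sim(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def rank_items(query, catalogue):
    """Item names ordered from least to most dissimilar to the query."""
    return sorted(catalogue, key=lambda name: euclidean_dissim(query, catalogue[name]))

# Hypothetical two-feature item profiles
catalogue = {"boot": [1.0, 0.0], "sandal": [0.0, 1.0], "sneaker": [0.9, 0.1]}
ranked = rank_items([1.0, 0.0], catalogue)
```

With a dissimilarity measure the list is sorted in ascending order; with a similarity measure such as the cosine it would be sorted in descending order instead.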
2.3 KNOWLEDGE BASED RECOMMENDER SYSTEMS
Knowledge based systems use a knowledge structure to make inferences about the user’s needs and preferences (Ricci, 2010). Knowledge based approaches are distinguished by their functional knowledge: they have knowledge about how a particular item satisfies a particular user need, and can therefore reason about the relationship between a need and a possible recommendation (Gemmis, Iaquinta, Lops, Musto, Narducci & Semeraro, 2009). The user profile can be any knowledge structure that supports this inference.
2.4 HYBRID RECOMMENDER SYSTEMS
A hybrid is a combination of at least two techniques in order to overcome the deficiencies of a single method used in isolation (Pazzani & Billsus, 2007). One way is to combine content based and collaborative filtering algorithms so that they produce separate ranked lists of recommendations, which are then merged to make up the final recommendations (Melville & Sindhwani, 2010). Notable examples of hybrid recommender systems are weighted and switching hybrid recommender systems. A weighted hybrid recommender is one in which the score of a recommended item is calculated from the results of all of the available recommendation algorithms in the system. The simplest weighted hybrid, for example, is a linear combination of recommendation scores. A Switching Hybrid (SH) recommender system uses some criterion to switch between recommendation techniques. An example of an SH recommender system is the DailyLearner, which uses a content/collaborative hybrid in which a content based recommendation algorithm is employed first, followed by a collaborative one if the first results are not satisfactory (Burke, 2002; Ghazanfar & Prugel-Bennett, 2010).
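The two hybrid strategies can be sketched as follows. The score dictionaries, the weights and the `min_score` threshold used as the switching criterion are all hypothetical; in particular, what counts as "not satisfactory" is an assumption made for the example and is not the criterion used by the DailyLearner.

```python
def weighted_hybrid(scores_by_algorithm, weights):
    """Linear combination of per-algorithm scores for every item."""
    items = set().union(*[s.keys() for s in scores_by_algorithm.values()])
    return {
        item: sum(weights[name] * scores.get(item, 0.0)
                  for name, scores in scores_by_algorithm.items())
        for item in items
    }

def switching_hybrid(content_scores, collaborative_scores, min_score=0.5):
    """Use the content based results first; switch if none is good enough."""
    if content_scores and max(content_scores.values()) >= min_score:
        return content_scores
    return collaborative_scores

scores = {
    "cbf": {"shoe": 0.8, "bag": 0.2},
    "cf":  {"shoe": 0.4, "hat": 1.0},
}
combined = weighted_hybrid(scores, {"cbf": 0.5, "cf": 0.5})
chosen = switching_hybrid({"shoe": 0.3}, {"hat": 0.9})
```

An item missing from one algorithm's output is scored 0 by that algorithm here; other conventions (for example, renormalising the weights) are equally possible.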
2.5 CHALLENGES OF RECOMMENDATION TECHNIQUES
Collaborative filtering recommender systems have been very successful in the past, but their extensive use has exposed some real challenges. These include data sparsity, the cold start problem, scalability, synonymy, gray sheep, and fraud in the form of shilling attacks (Chen et al., 2010; Melville & Sindhwani, 2010; Sarwar, Karypis, Konstan & Riedl, 2002; Su & Khoshgoftaar, 2009).
Data Sparsity: In practice, many commercial recommender systems are used to evaluate very large item sets (e.g. Amazon.com, CDnow.com). In these systems, even active users may have purchased no more than one percent of the items (1% of two million books is 20 000 books). The user-item matrix used for CF will therefore be extremely sparse, and a recommender system based on nearest neighbour algorithms may be unable to make any item recommendations for a particular user, so the system becomes ineffective. Data sparsity also leads to reduced coverage and to the neighbour transitivity problem (Schafer et al., 2007; Su & Khoshgoftaar, 2009). Coverage can be defined as the percentage of items for which the system can provide recommendations. The reduced coverage problem arises when the number of user ratings is very small compared with the number of items in the system, so that the recommender system fails to generate recommendations for many items. Neighbour transitivity refers to a problem with sparse databases in which users with similar tastes may not be identified as such because they have not rated the same items. Content based approaches can alleviate these problems since they do not require ratings from other users.
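Sparsity and coverage can be made concrete with a small calculation. In this sketch the ratings, the item list and the rule that an item is "coverable" once it has at least `min_ratings` ratings are assumptions chosen for illustration.

```python
def sparsity(ratings, n_items):
    """Fraction of empty cells in the user-item matrix."""
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)

def coverage(ratings, all_items, min_ratings=1):
    """Percentage of items with enough ratings to be recommended (assumed rule)."""
    counts = {}
    for user_ratings in ratings.values():
        for item in user_ratings:
            counts[item] = counts.get(item, 0) + 1
    covered = [item for item in all_items if counts.get(item, 0) >= min_ratings]
    return 100.0 * len(covered) / len(all_items)

ratings = {"alice": {"shoe": 5}, "bob": {"shoe": 4, "bag": 2}}
all_items = ["shoe", "bag", "hat", "belt"]
matrix_sparsity = sparsity(ratings, len(all_items))   # 3 of 8 cells are filled
item_coverage = coverage(ratings, all_items)          # 2 of 4 items have ratings
```

For the book example in the text, a user who has rated 20 000 of two million items fills 1% of their row, so that row alone is 99% empty.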
Cold start: This describes a situation in which a recommender system is unable to make meaningful recommendations because of an initial lack of ratings. Cold start occurs when a new user or item has just entered the system and it is very difficult to find similar ones because of inadequate information. New items cannot be recommended until some users rate them. The new item problem affects collaborative filtering recommender systems. Since content based filtering recommender systems do not depend on ratings from other users, they can be used to produce recommendations for all items, provided the attributes of the items are available. New users are very unlikely to be given good recommendations because they lack a rating or purchase history. Research on the new user problem focuses on effectively selecting items to be rated by the user, so as to learn the user’s preferences quickly and improve recommendation performance (Melville & Sindhwani, 2010).
Scalability: When the numbers of existing users and items grow tremendously, traditional recommender system algorithms suffer serious scalability problems, with computational resource requirements going beyond practical or acceptable levels.
Synonymy: This arises when a number of the same or very similar items have different names and the recommender system fails to discover this latent association, so that it treats these products as different.
Gray Sheep and Black Sheep: Gray sheep are users whose opinions do not consistently agree or disagree with any group of people and who thus do not benefit from the system. The gray sheep problem is also responsible for an increased error rate in collaborative filtering recommender systems (Ghazanfar & Prugel-Bennett, 2011), which often results in failure of the recommender system. Black sheep are users who correlate with no or very few other people, which makes it very difficult to make recommendations for them (Gemmis et al., 2009).
Fraud: Recommender systems are increasingly being adopted by commercial websites because of their economic benefits to retailers and service providers. Unprincipled competing vendors have started to engage in different forms of fraud in order to manipulate recommender systems to their advantage. They attempt to inflate the perceived attractiveness of their own commodities (push attacks) or to lower the ratings of their rivals (nuke attacks). These attacks are also known as shilling attacks (Melville & Sindhwani, 2010; Su & Khoshgoftaar, 2009).
With all these challenges encountered in the use of recommender systems, there is a need to evaluate the performance of the developed systems. Evaluation makes it possible to determine the accuracy of the systems.
2.6 EVALUATION METRICS FOR RECOMMENDER SYSTEMS
The performance of a recommender system can be evaluated by comparing recommendations to a test set of known user ratings. These systems are commonly measured using predictive accuracy metrics, where the predicted ratings are directly compared to actual user ratings (Melville & Sindhwani, 2010). The commonly used metrics are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), as formulated in equations 2-7 and 2-8 respectively (Melville & Sindhwani, 2010).
\[
\mathrm{MAE} = \frac{\sum_{u,i} | P_{u,i} - R_{u,i} |}{N} \tag{2-7}
\]

\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{u,i} ( P_{u,i} - R_{u,i} )^2}{N}} \tag{2-8}
\]

where $P_{u,i}$ is the predicted rating for user $u$ on item $i$, $R_{u,i}$ is the actual rating and $N$ is the total number of ratings in the test set. Predictive accuracy metrics treat all items equally.
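Both metrics are straightforward to compute once predicted and actual ratings are paired up. The following is a minimal sketch; the two lists of ratings are made-up test data.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean absolute error over paired predicted/actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error over paired predicted/actual ratings."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]
test_mae = mae(predicted, actual)     # errors are -1, 0, 1, -2
test_rmse = rmse(predicted, actual)
```

Because the errors are squared before averaging, RMSE penalises the single two-point miss more heavily than MAE does, which is why it is larger than the MAE on this data.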
2.7 MOBILE RECOMMENDER SYSTEMS
With the ever-growing Information Communication Technology (ICT) market, several mobile technologies are available and accessible to mobile users, allowing them to stay connected to service networks while on the move. These devices are being used to offer online services to the global community wherever and whenever they are needed. Most of these technologies are handheld wireless devices. Among these devices, the cell phone is becoming a primary platform for information access for online mobile users (Gabbouj, Ahmad, Amin & Kiranyaz, 2005; Ricci, 2010). Mobile browsers are discouraged from shopping for products online when they have to browse pages and pages (categories and subcategories) of information from an e-shop in order to find the products of their choice. The more time the user spends browsing, the higher the cost in terms of time, money (for the wireless data network) and health (the screen is very small). The small screen size of these handheld wireless devices requires the user to scroll up and down looking for information. Recommender systems were introduced to solve some, if not all, of the problems encountered by mobile users. They must enable mobile users to have direct access to highly relevant information in order to minimize the connectivity duration, the time spent browsing for specific item(s) and user input.
Recommender systems are information filtering and decision support tools aimed at
addressing problems encountered by online browsers. Recommender systems have been
applied in many diverse areas including e-commerce, advertising, news, document
management and e-learning (Huang & Huang, 2009). They are one of the most popular
tools provided in e-commerce to accommodate customer shopping needs with merchant
offers (Yang, Cheng & Dia, 2008). Recommender systems enhance e-commerce sales in three ways: by changing browsers into buyers, by enabling cross-selling and by building loyalty (Schafer, Konstan & Riedl, 1999). Visitors or browsers often visit an e-commerce website without
the intention of buying anything. A recommender system that has been monitoring the
browser may catch the eye of the browser by recommending an item of browser’s interest
thus turning a casual browser into a buyer. A cross-sell can take place when a
recommender system recommends an additional item based on those products already in
the shopping cart. Recommender systems improve loyalty by creating a value added
relationship between the site and the customer. Customers usually return to the site that best matches their needs. The more the customer uses a recommender system, the more the recommender system learns about the customer, and a bond is created between the customer and the site. The customer becomes loyal to the site, which leads to more sales. To differentiate them from recommender systems that have been successful on Personal Computers (PC) (Ricci, 2010), recommender systems for mobile devices will be referred to as Mobile Recommender Systems (MRS). The rest of this chapter reviews the challenges and open issues of MRS and discusses the proposed MRS.
2.8 MOTIVATION FOR MOBILE RECOMMENDER SYSTEMS
A recommender system that utilises image retrieval techniques can be classified as a content based filtering recommender system. Image contents such as colour, shape, texture and motion are used for knowledge representation instead of related terms and keywords (Olugbara et al., 2010). Most existing recommender systems use a text-based interface for interaction and visualization of recommendations (Olugbara et al., 2010). Searching with an actual image would be ideal since the ambiguities of text queries are avoided. Images can have contents that text alone cannot adequately convey, which makes the integration of image retrieval and content-based filtering techniques suitable for addressing the deficiencies of text-based recommender systems. Content-based and collaborative recommender systems have achieved considerable success, but they do not take the location of the user into consideration (Yang et al., 2008). Now that mobile devices can connect to service networks through wireless networks, recommender systems are required to adapt to a mobile user environment. This is why there is a need for mobile recommender systems for mobile users. Mobile recommender systems can be categorized by positioning them along three basic dimensions: user mobility, device portability and wireless connectivity (Ricci, 2010). User mobility requires that the user has access to a mobile recommender system in different places. Device portability implies that the device used by the mobile user to access mobile information can be carried from one place to another without much trouble. Wireless connectivity implies that the device used by the mobile user to access the mobile information system is networked by means of a wireless technology such as Wi-Fi, Bluetooth or UMTS.
2.8.1 RECOMMENDATION SYSTEMS FOR MOBILE USERS
To enable the migration of recommender systems to the mobile environment, there are challenges that need to be taken into consideration. These include the limitations of mobile devices, the limitations of wireless networks, the impact of the external environment and the behavioural characteristics of mobile users (Ricci, 2010). Notwithstanding these challenges, mobile devices have capabilities that can be exploited. These include the ability to give the user’s physical position, for example through the Global Positioning System (GPS) and Radio-Frequency Identification (RFID), the ability to deliver information and services to mobile users whenever they are needed and wherever the user is (omnipresence) (Ricci, 2010), and the ability to capture images of interest.
Mobile computing can be defined as a form of human-computer interaction (between the mobile user and the mobile device) in which the computer is expected to be transported during usage. Three aspects of mobile computing can be distinguished: mobile communication, mobile hardware and mobile software. In this case, a mobile user accesses the recommender system with a mobile device connected to a wireless network.
As mentioned before, mobile phones are becoming the primary platform for information access for online mobile users. One limitation of these devices is the screen size, as can be appreciated in Figure 2-2. Recommendation sessions on a small screen can be daunting and very frustrating for users. The size of the display can therefore affect the user negatively. It is known that users are able to read and understand the information offered by these small interfaces, but they have to do extensive scrolling (Ricci, 2010). A user of a small screen is less effective in completing a task than a user of a large screen (Gabbouj et al., 2005; Ricci, 2010). These devices also have small keypads: most existing mobile phones have only a twelve-key numeric keypad, which makes them difficult to work with. The batteries of mobile devices (cell phones) have a limited operating period. Another limitation is the lack of system resources such as processing power and memory capacity.
FIGURE 2-2: Cell phone screen size is small
The wireless connection to these devices should be reliable enough for users to complete their search; otherwise, loss of connectivity can frustrate the user. In addition, the mobile Internet lacks (de facto) standardization of browsing tools (Ricci, 2010). These are some of the challenges of using mobile devices to access information online.
Researchers are proposing solutions to make mobile recommender systems a reality and acceptable to mobile users who use the mobile device as the primary platform for accessing the recommender system (Gabbouj et al., 2005; Ge, Xiong, Tuzhilin & Xiao, 2010; Guldogan & Gabbouj, 2005; Heijden, Kotsis & Kronsteiner, 2005; Olugbara et al., 2010; Ricci, 2010; Yang et al., 2008). The solutions to the above challenges include, among others, making mobile recommender systems energy efficient, effective and efficient.
2.9 ARCHITECTURE OF MOBILE RECOMMENDATION SYSTEM
In this research, the following problems of recommender systems for mobile users are addressed, while taking advantage of the position detection capability of mobile smart devices:
1. The problem of text usage in querying mobile recommender systems, addressed by taking
advantage of camera enabled mobile devices
2. The problem of finding a shopping item that can be of interest to the mobile
user.
An abstract illustration of the proposed system is shown in Figure 2-3.
FIGURE 2-3: Proposed Mobile Recommender System
In Figure 2-3, no text is used up to the time the mobile user gets the recommendation. The mobile user either selects one of the recommended item images or captures an image of interest from the shopping items available. The recommender system then uses the image sent by the user to recommend an item in the same category as the item sent. In Figure 2-3, the item finally recommended is a shoe.
The proposed overall architecture of the mobile recommender system is given in Figure 2-4.
FIGURE 2-4: Architecture of the Recommender System
The client-server architecture of the recommender system is shown in Figure 2-4. On the client side (mobile device side) there are two main components: the Location-Image manager and the Internet browser. The Internet browser is used by the client to request services from the server via the Internet (e.g. when the client needs the service of the recommender system). The Location-Image manager sends the location of the client and the image of interest to the recommender system. On the server side there is a recommender engine that consists of the on-line recommendation generator and an off-line interest profile generator. The off-line interest profile generator tracks the user’s purchases in order to generate the user’s interest profile. This enables the system to recommend items that are similar in content to the items the user liked in the past. The system thus keeps a database of users’ interest profiles. The on-line recommendation generator maintains the customer profile and retailer databases. The retailer database consists of retailer information such as the shopping items and the GPS coordinates of the retailer’s location. When a client initiates a request, the on-line generator recommends shopping items in the category of the image sent by the client, based on the client’s interest profile. The recommended items are then received by the client together with the GPS coordinates of the retailer. When the client clicks on the coordinates, the GPS gives directions to the retailer’s physical location. If the client is buying on-line, the retailer receives the request from the client together with the GPS position of the client in order to facilitate delivery.
The mobile recommender system will be simulated on a PC, but the content based retrieval component of the mobile recommender system will be implemented and tested on shopping items. For the recommender system to determine the category of the item image captured by the camera enabled mobile device, several processes take place in the background. These processes are:
1. Image Segmentation (Segmentation of the image selected or captured by the
camera enabled mobile device)
2. Image Representation (Representation of the image selected or captured by
the camera enabled mobile device)
3. Image Matching (Matching similarity of the image selected or captured by
the camera enabled mobile device to the shop image items in the database)
4. Image Ranking (Ranking the image items according to user profile).
The images are going to be segmented using level sets and active contours without edges,
and a new representation method will be proposed that is more accurate and effective in
representing images. A suitable similarity method will be selected from among the
metric and non-metric methods. The system will be implemented and tested on sample
data. The following chapter reviews the segmentation, representation and similarity
algorithms, and highlights the challenges and open issues of image processing.
CHAPTER 3
IMAGE SEGMENTATION, REPRESENTATION AND RETRIEVAL
In order to make effective and efficient use of image content in a shopping recommender
system for mobile users, it is imperative to select, create or improve image segmentation,
representation and retrieval algorithms that are suitable for the shopping items domain or
for a generic domain. There are numerous algorithms available in the literature to segment,
represent and retrieve images from an image database. In this chapter the algorithms for
image segmentation, representation and retrieval are reviewed in order to establish their
suitability for different applications. The classifications, advantages and disadvantages of
the algorithms, and the challenges and open issues in the areas of image segmentation,
representation and retrieval, are discussed. This chapter makes it possible to decide
whether to select, improve or create algorithms for use in the Image Content in
Shopping Recommender System for Mobile Users.
3.1
IMAGE SEGMENTATION TECHNIQUES
The prime goal of image segmentation is domain independent partitioning of an image into
a set of disjoint regions that are visually different, homogeneous and meaningful with
respect to some characteristics such as grey-level, texture or colour to enable easy image
analysis (object identification, classification and processing) (Freixenet, Munoz, Raba,
Marti & Cufi, 2002; Lucchese & Mitra, 2001; Wang, Guo & Zhu, 2007). The formal
definition for image segmentation is as follows (Lucchese & Mitra, 2001):
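Let the image domain be $\Omega$ and let $P_i$, $i = 1, \dots, n$, be partitions of $\Omega$ such that
\[
P_i \subseteq \Omega, \qquad \Omega = \bigcup_{i=1}^{n} P_i, \qquad H(P_i) = \text{true} \;\; \forall i, \qquad
H(P_i \cup P_j) = \text{false} \;\; \forall\, P_i \text{ and } P_j \text{ adjacent}
\tag{3-1}
\]
where $P_i \cap P_j = \emptyset$ for $i \neq j$, and each $P_i$ is connected. Here $H$ is the homogeneity predicate applied to a region.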
Discontinuity and similarity/homogeneity are two basic properties of the image pixels in
relation to their local neighbourhood used in many segmentation methods. The
segmentation methods that are based on discontinuity property of pixels are considered as
boundary or edges based techniques and those that are based on similarity or homogeneity
are region based techniques.
We have intentionally separated the thresholding technique from the region based techniques
because of its use of the histogram and its simplicity in application (Freixenet et al., 2002).
Hybrid techniques are derived by integrating edge and region based information
(Wang et al., 2007). Image segmentation surveys have been conducted, but
few of them show how researchers can evaluate one technique against
another on domain independent images, or how to evaluate the performance of their own
segmentation (Zhang, 2001; Min, Powell & Bowyer, 2004; Udupa, LeBlanc, Zhuge, Imielinska,
Schmidt, Currie, Hirsch & Woodburn, 2006). Many surveys have been directed at a single
application area of image segmentation, such as medical imaging, remote sensing or
image retrieval (Freixenet et al., 2002; Lucchese & Mitra, 2001; Deb, 2008). This
section is organized as follows: Thresholding Methods, Boundary/Edge Based methods,
Region based methods, Performance Evaluation and Summary.
FIGURE 3-1 indicates the classification of image segmentation techniques we have
considered in this chapter. Image segmentation is not an easy task because of: image noise,
weak object boundaries, inhomogeneous object region, weak contrast and many others that
affect images.
FIGURE 3-1 An Overview of Shape Segmentation Techniques
3.1.1 THRESHOLDING METHOD
Thresholding based image segmentation aims to partition an input image into pixels of two
or more values through comparison of pixel values with the predefined threshold value T
individually:
Let $I(i,j)$ be an image; then
\[
I(i,j) = \begin{cases} 0, & p(i,j) < T \\ 1, & p(i,j) \ge T \end{cases}
\tag{3-2}
\]
where $p(i,j)$ refers to the pixel value at the position $(i,j)$. Thresholding may be
implemented locally or globally. In global thresholding the image is partitioned into two
classes using a single threshold, as shown in Eq. 3-2. In local thresholding, the image is
subdivided into subimages and the threshold for each subimage is derived from the local
properties of its pixels. The
predefined value of T is the one that complicates this method. The determination of the
value T has been the point of interest in image segmentation research (Cheriet, Said &
Suen, 1998),(Dawoud & Kamel, 2004),(Hu, Hoffman & Reinhardt, 2001). There have
been many algorithms developed to generate better threshold value T to segment an image
(Dawoud & Kamel, 2004). These methods that use intensity value do not use spatial
morphological image information of an image and they usually fail to segment objects with
low contrast or noisy images with varying background (Rekik, Zribi, Hamida &
Benjelloun, 2009).
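The difference between global and local thresholding can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows; the function names, the 4x4 sample image and the use of the block mean as the local threshold are illustrative assumptions, not the rule used by any of the cited algorithms.

```python
def global_threshold(image, t):
    """Eq. 3-2: label a pixel 1 if its value is >= t, otherwise 0."""
    return [[1 if p >= t else 0 for p in row] for row in image]


def local_threshold(image, block):
    """Threshold each block x block subimage at its own mean (assumed rule)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, h))
                     for c in range(c0, min(c0 + block, w))]
            t = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = 1 if image[r][c] >= t else 0
    return out


# Hypothetical image: the background gets brighter from left to right, and
# two small objects are only slightly brighter than their own surroundings.
image = [
    [10, 10, 100, 100],
    [10, 60, 100, 150],
    [10, 60, 100, 150],
    [10, 10, 100, 100],
]
print(global_threshold(image, 80))  # one T: the bright background swamps the objects
print(local_threshold(image, 2))    # per-block T: both objects are separated
```

With a single threshold the whole right half is labelled as object and the left object is lost, which is the varying-background failure described above; the per-block threshold recovers both objects in this small case.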
Failure to find the most suitable algorithm to determine the threshold value might result in
one or all of the following:
- The segmented region might be smaller or larger than the actual
- The edges of the segmented region might not be connected
- Over or under-segmentation of the image (arising of pseudo edges or missing
edges)
3.1.2 EDGE BASED METHODS
Edge based segmentation is the location of pixels in the image that correspond to the
boundaries of the objects seen in the image. It is then assumed that since it is a boundary
of a region or an object then it is closed and that the number of objects of interest is equal
to the number of boundaries in an image. For precision of the segmentation, the perimeter
of the boundaries detected must be approximately equal to that of the object in the input
image.
To implement the above, an edge in an image first has to be defined. An
edge or a linear feature is manifested as an abrupt change or a discontinuity in the digital
number of pixels along a certain direction in an image. The manifestation becomes a
high gradient or an extreme of the first order derivative, or a zero crossing in the second derivative.
This leads to another assumption: every object of interest in an image has a boundary
that can be detected through the use of the gradient or the second derivative. Examples of edge
based segmentation algorithms are Sobel, Prewitt, Kirsch, Laplacian and active contour
methods, to mention just a few. These segmentation methods use the gradient, or templates
based on the gradient, the first derivative or the second derivative, to detect the boundaries in an
image (Chan & Vese, 2001; Kekre & Gharge, 2010).
Sobel, Prewitt and Kirsch use templates based on gradient to detect the edges of an image.
These operators use a pair of kernels to detect edges. For example, the Sobel operator
consists usually of a pair of 3X3 convolution kernels as shown in figure 3-2. Sobel edge
detection algorithm is suitable to detect boundaries along the horizontal and vertical axis
because of the structure of the templates used shown in figure 3-2.
     +1  +2  +1            -1   0  +1
      0   0   0            -2   0  +2
     -1  -2  -1            -1   0  +1
         Gx                    Gy
FIGURE 3-2: Sobel Edge Detection Templates
From the kernels, the result of the Sobel operator at an image pixel that falls in a region of
constant image intensity is the zero vector, and at a pixel on a boundary it is a vector that points
across the edge (Kekre & Gharge, 2010). Typically the Sobel algorithm is used to obtain the
approximate absolute gradient magnitude at each pixel of an input grayscale image. The
absolute gradient magnitude may be calculated using, for example, one of
equations 3-3 and 3-4 (Kekre & Gharge, 2010; Lakshmi & Sankaranarayanan, 2010).
\[ |G| = |G_x| + |G_y| \tag{3-3} \]
or
\[ |G| = \sqrt{G_x^2 + G_y^2} \tag{3-4} \]
In general this is how these gradient-based algorithms work.
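A minimal sketch of the Sobel gradient magnitude follows, assuming a grayscale image stored as a list of rows. The kernel names are chosen here by what each template responds to, because texts differ on which template is labelled Gx and which Gy; the magnitude is the same either way. Border pixels are simply left at zero, which is a simplification.

```python
import math

# The two 3x3 Sobel templates. The names below are illustrative assumptions.
K_HORIZONTAL_EDGES = [[+1, +2, +1], [0, 0, 0], [-1, -2, -1]]
K_VERTICAL_EDGES = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]]


def apply_kernel(image, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def sobel_magnitude(image, exact=False):
    """Gradient magnitude at interior pixels; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ga = apply_kernel(image, K_HORIZONTAL_EDGES, r, c)
            gb = apply_kernel(image, K_VERTICAL_EDGES, r, c)
            # exact=False: sum of absolute values; exact=True: Euclidean norm.
            out[r][c] = math.hypot(ga, gb) if exact else abs(ga) + abs(gb)
    return out


flat = [[5] * 4 for _ in range(4)]          # constant intensity
step = [[0, 0, 10, 10] for _ in range(4)]   # vertical step edge

print(sobel_magnitude(flat)[1][1])   # zero response in a constant region
print(sobel_magnitude(step)[1][1])   # strong response next to the edge
```

The absolute-value form is cheaper than the square-root form and gives the same value whenever one of the two components is zero, as in the step-edge example.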
The active contour models can also be classified as boundary based segmentation methods.
In active contour or deformable models, the user specifies an initial contour which is then
moved by image driven forces to the boundaries. Generally these methods can be defined
by a function $g(z)$ that acts as a stopping term when the object/region boundary has been
reached. The function $g(z)$ can be defined (Airouche, Bentabet & Zelmat, 2009; Liu, 2006)
as
\[ g(z) \ge 0 \quad \text{and} \quad \lim_{z \to \infty} g(z) = 0 \]
For instance
\[ g(|\nabla u(x,y)|) = \frac{1}{1 + |\nabla G_\sigma(x,y) * u(x,y)|^{p}}, \qquad p \ge 1 \tag{3-5} \]
where $G_\sigma * u$ is the convolution of the image $u$ with the Gaussian filter
\[ G_\sigma(x,y) = \sigma^{-1/2}\, e^{-|x^2 + y^2| / 4\sigma} \tag{3-6} \]
which results in a smoother version of the image $u$, where
\[ g(|\nabla u(x,y)|) \begin{cases} > 0, & \text{homogeneous region} \\ \approx 0, & \text{edge} \end{cases} \tag{3-7} \]
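The behaviour of such a stopping term can be checked numerically. The sketch below assumes the form g(z) = 1/(1 + z^p), where z stands for the gradient magnitude of the smoothed image; the function name and the sample values of z are illustrative.

```python
def edge_stop(z, p=2):
    """Stopping term g(z) = 1 / (1 + z**p) with p >= 1."""
    return 1.0 / (1.0 + z ** p)


# z = 0 corresponds to a homogeneous region; a large z corresponds to an edge.
for z in (0.0, 1.0, 10.0, 100.0):
    print(z, edge_stop(z))
```

The value is 1 where the image is flat, so the contour keeps moving, and it falls towards 0 as the gradient grows, so the contour slows down and stops at strong edges.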
Laplacian is a 2-D isotropic measure of the second spatial derivative of an image. The
Laplacian of an image highlights regions of rapid intensity change and thus used for
boundary detection (zero crossing edge detector). The Laplacian L( x, y) of an image with
pixel intensity values I ( x, y) is given by equation 3-8:
\[ L(x,y) = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \tag{3-8} \]
The change of grey level across a boundary gives a maximum or minimum of the first
directional derivative near the edge and a zero of the second
derivative. When using the Laplacian method, the aim is to find these zero
positions, which constitute the boundaries in the image (Huang & Jiang, 2009). Since the
input image is represented as a set of discrete pixels, a discrete convolution kernel that can
approximate the second partial derivatives in the definition of the Laplacian must be found.
The two commonly used kernels are shown in Figure 3-3.
     -1  -1  -1             0  -1   0
     -1   8  -1            -1   4  -1
     -1  -1  -1             0  -1   0
         (a)                   (b)
FIGURE 3-3: Two Commonly used Laplacian kernels
Using one of these kernels the Laplacian can be calculated using the convolution methods.
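As a minimal sketch of this step, the code below applies the 4-neighbour Laplacian kernel to a small image and locates the zero crossing along one row. The image is assumed to be a list of rows, border pixels are left at zero, and the helper names are illustrative.

```python
KERNEL_4 = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
KERNEL_8 = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def laplacian(image, kernel=KERNEL_4):
    """Kernel response at interior pixels; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def zero_crossings_in_row(row):
    """Indices c where the response changes sign between c and c + 1."""
    return [c for c in range(len(row) - 1) if row[c] * row[c + 1] < 0]


# Vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
response = laplacian(step)
print(response[2])                         # negative on one side, positive on the other
print(zero_crossings_in_row(response[2]))  # position of the sign change
```

The response is zero in the flat areas and changes sign exactly where the step lies, which is the zero crossing that marks the boundary.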
Researchers have been interested in several problems centred
on the use of the gradient to detect boundaries (Chan & Vese, 2001). For instance, these
methods have difficulty with images that are edge-less or very noisy, and with boundaries
that are very smooth or defined by texture. Other problems of these techniques emanate from the failure
to adjust/calibrate the gradient function accordingly, thus producing undesirable results such as:
- Over or under-segmentation of the image
FIGURE 3-4: Edge Based Method (Sobel)
FIGURE 3-4 illustrates some of the problems that are encountered in the use of edge
based methods. Some edges of FIGURE 3-4a are missing in FIGURE 3-4b, and this
causes problems in post-segmentation image processing, for instance in retrieval or
registration.
3.1.3 REGION BASED METHODS
Region based segmentation is the partitioning of an image into similar or homogeneous
areas of connected pixels through the application of homogeneity or similarity criteria
among candidate sets of pixels. The pixels in a region are similar with respect to
some characteristic or computed property such as colour, intensity and/or texture. The
assumption in these techniques is that the partitions that are formed correspond to objects
or meaningful parts of the image. According to Wang et al. (2007), the most commonly used
techniques are the following: Thresholding, Region Growing, Split and Merge, Classifiers
and Clustering.
Split and Merge segmentation methods have a common characteristic of starting with an
initial inhomogeneous partitioning of the image (usually the whole image). The main goal
of these methods is to distinguish the homogeneous parts of the image. The concept of split
and merge method is based on quadtree representation, which means each node of the tree
has four descendents and the root of the tree is the whole image as shown in figure 3-5.
I_R
  I_P1
  I_P2
  I_P3
  I_P4
    I_P41
    I_P42
    I_P43
    I_P44
FIGURE 3-5: Quadtree Structure for Split and Merge Method
In Figure 3-5, I_R represents the entire image region, which is subdivided into four descendants.
This process of splitting the regions of the image continues until homogeneous partitions
are obtained. After the splitting phase, the merging stage connects the fragmented
regions that satisfy the condition of homogeneity. After the merging phase the final segmented
image is produced (Sharma & Aggarwal, 2010).
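The splitting phase can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: the image is square with a side that is a power of two, a block counts as homogeneous when the spread between its brightest and darkest pixel is at most `tol`, and the merging phase is omitted. The function names and the sample image are hypothetical.

```python
def is_homogeneous(image, r0, c0, size, tol):
    """Assumed criterion: intensity spread within the block is at most tol."""
    values = [image[r][c]
              for r in range(r0, r0 + size)
              for c in range(c0, c0 + size)]
    return max(values) - min(values) <= tol


def split(image, r0, c0, size, tol):
    """Return the leaf blocks (row, col, size) of the quadtree under this block."""
    if size == 1 or is_homogeneous(image, r0, c0, size, tol):
        return [(r0, c0, size)]
    half = size // 2
    leaves = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += split(image, r0 + dr, c0 + dc, half, tol)
    return leaves


image = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 9, 9],
    [1, 1, 9, 1],
]
leaves = split(image, 0, 0, 4, tol=0)
print(leaves)
```

Only the fourth quadrant is inhomogeneous, so it alone is split again, giving the same tree shape as Figure 3-5. A merging pass would then join adjacent leaves with equal intensity, for example the three single-pixel leaves of value 9.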
Clustering image segmentation algorithms are usually unsupervised algorithms and do not
depend on training or training data. These methods create classes or partitions on an
image without any a priori knowledge. Clustering algorithms are commonly divided into two
general classes: hierarchical and partitional algorithms. Hierarchical clustering
techniques create a cluster tree by means of heuristic splitting or merging procedures.
Partitional clustering techniques divide the input data into a number of clusters that is
determined in advance. The whole process is driven by the minimization of a certain goal
function, for example a square error function (Malyszko & Wierzchon, 2007). The two
most popular clustering algorithms are K-means (or Hard C-means) and Fuzzy
C-means (Lucchese & Mitra, 2001; Sharma & Aggarwal, 2010). The K-means algorithm produces
results that correspond to a hard segmentation, while fuzzy C-means produces a soft
segmentation. A soft segmentation can be converted to a hard segmentation by assigning
each pixel to the cluster in which it has the maximum membership coefficient.
These two methods belong to the partitional algorithms that use a number of
“centres” to represent and group input data. The general iterative model for partitional
centre-based clustering algorithms has the following steps:
1. Initialize by assigning some values to the cluster centres
2. For each data point x i , compute its membership value m(c j | xi ) to all
clusters c j and its weight w( xi )
3. For each cluster centre c j , recalculate its location taking into account all
points x i assigned to this cluster according to the membership and weight
values:
\[ c_j = \frac{\sum_{i=1}^{n} m(c_j \mid x_i)\, w(x_i)\, x_i}{\sum_{i=1}^{n} m(c_j \mid x_i)\, w(x_i)} \tag{3-9} \]
4. Repeat steps 2 and 3 until some termination criteria are met (Samma &
Salam, 2009).
The K-means algorithm, which is one of the partitional algorithms, minimizes the objective
function in equation 3-10.
\[ KM(X, C) = \sum_{i=1}^{n} \min_{j \in \{1, \dots, k\}} \| x_i - c_j \|^2 \tag{3-10} \]
Here $w(x_i) = 1$ for all $i$, and the membership function is defined according to the “winner
takes all” rule, that is, an object belongs to the class with the nearest centre (Malyszko &
Wierzchon, 2007). The fuzzy k-means algorithm is based on the minimization of the objective
function in equation 3-11.
\[ FKM(X, C) = \sum_{i=1}^{n} \sum_{j=1}^{k} m_{ij}^{r}\, \| x_i - c_j \|^2 \tag{3-11} \]
The value of the parameter $r$ should be constrained to the values $r \ge 1$ (Vasuda &
Satheesh, 2010).
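The iterative centre-based model with hard ("winner takes all") membership and unit weights can be sketched as follows. This is a minimal illustration on one-dimensional grey levels; the function names, the initial centres and the sample data are assumptions made for the example.

```python
def nearest(x, centres):
    """Index of the centre closest to x ("winner takes all" membership)."""
    return min(range(len(centres)), key=lambda j: (x - centres[j]) ** 2)


def kmeans(points, centres, max_iter=100):
    """K-means on scalar values with w(x_i) = 1 for every point."""
    centres = list(centres)                        # step 1: initial centres
    for _ in range(max_iter):
        labels = [nearest(x, centres) for x in points]            # step 2
        new_centres = []
        for j, old in enumerate(centres):                         # step 3
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else old)
        if new_centres == centres:                                # step 4
            break
        centres = new_centres
    labels = [nearest(x, centres) for x in points]
    # Value of the K-means objective (sum of squared distances to nearest centre).
    objective = sum((x - centres[lab]) ** 2 for x, lab in zip(points, labels))
    return centres, labels, objective


# Hypothetical grey levels: a dark object and a bright background.
pixels = [10, 12, 11, 200, 205, 198]
centres, labels, objective = kmeans(pixels, centres=[0, 255])
print(centres, labels, objective)
```

With hard membership and unit weights the centre update reduces to the mean of the points assigned to each centre. A fuzzy variant would replace the 0/1 membership by coefficients raised to the power r.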
Region growing is a widely used classical segmentation technique. The basic idea of
region growing is to collect pixels with similar properties to form a region.
Commencing from some seed point, region growing methods segment images by
incrementally recruiting pixels to a region based on some predefined criteria. Two
important segmentation criteria are value similarity and spatial proximity (Kirbas & Quek,
2003; Tang, 2010). These region growing based segmentation models share the following
assumptions about the image pixel properties:
- The intensity values within each region/object conform to a Gaussian
distribution
- The mean intensity value for each region or object is different (global mean)
(Wang, He, Mishra & Li, 2009).
The Gaussian probability density function (pdf) for region $i$ is given as follows:
\[ p_{\mu_i, \sigma_i}(u) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left( -\frac{(u - \mu_i)^2}{2\sigma_i^2} \right) \tag{3-12} \]
where $\mu_i$ = mean and $\sigma_i^2$ = variance.
With this type of segmentation, the problems of discontinuous edges and no segmentation
of objects without edges have been eliminated. The boundary of an object can be identified
using the edge/boundary pixels of a region ensuring that the boundary is closed and the
segmentation of objects without edges can now be done.
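A minimal sketch of seeded region growing is given below. It assumes 4-connectivity for spatial proximity and, for value similarity, accepts a neighbouring pixel when its intensity is within `tol` of the current region mean; both choices, the function name and the sample image are illustrative assumptions.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a 4-connected region from seed; return the set of (row, col) pixels."""
    h, w = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]          # running sum for the region mean
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in region:
                # Value similarity: compare with the current region mean.
                if abs(image[nr][nc] - total / len(region)) <= tol:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    frontier.append((nr, nc))
    return region


image = [
    [10, 11, 50, 52],
    [12, 10, 51, 50],
    [11, 12, 49, 51],
]
left = region_grow(image, seed=(0, 0), tol=5)
print(sorted(left))
```

Because the region is a connected set of pixels, its boundary is closed by construction, which is the property that edge based methods cannot guarantee.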
One of the region based techniques, “Active Contours without Edges”, was introduced by
Chan & Vese and can detect contours with or without edges. The method is capable of
detecting and preserving boundaries without the need to smooth the input image, even
when it is very noisy. Images with smooth boundaries no longer cause any problems (Chan
& Vese, 2001).
Considerable effort has gone into refining these methods, and encouraging results have
been produced. For instance, Jundong Liu argued that the global mean used by Chan &
Vese in their model was not the best choice for medical images. The argument centred on the
Chan & Vese model, which defines an evolving curve $C$ in $\Omega$ and an energy
functional $F(c_1, c_2, C)$. The Chan & Vese model minimizes the energy functional defined as
follows:
\[
F(c_1, c_2, C) = \mu \cdot \text{Length}(C) + \nu \cdot \text{Area}(\text{inside}(C))
+ \lambda_1 \int_{\text{inside}(C)} |u - c_1|^2 \, dx\, dy
+ \lambda_2 \int_{\text{outside}(C)} |u - c_2|^2 \, dx\, dy
\tag{3-13}
\]
where $c_1$ and $c_2$ are the averages of $u$ inside $C$ and outside $C$ respectively. The values
$c_1$ and $c_2$ in the above energy functional are global values computed from the entire image $u$.
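To make the role of each term concrete, the sketch below evaluates a discrete version of this energy (a length term, an area term and two fitting terms against the inside and outside means) for a given candidate curve. The curve is represented by a binary mask of the pixels inside it, the length is approximated by counting neighbouring pixel pairs that straddle the curve, and the default weights are illustrative assumptions. It evaluates the energy only; it does not perform the curve evolution.

```python
def chan_vese_energy(image, inside, mu=1.0, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete two-phase energy for the region marked True in `inside`."""
    h, w = len(image), len(image[0])
    in_vals = [image[r][c] for r in range(h) for c in range(w) if inside[r][c]]
    out_vals = [image[r][c] for r in range(h) for c in range(w) if not inside[r][c]]
    # Global means inside and outside the curve.
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0
    # Length(C): neighbouring pixel pairs on opposite sides of the curve.
    length = sum(inside[r][c] != inside[r][c + 1]
                 for r in range(h) for c in range(w - 1))
    length += sum(inside[r][c] != inside[r + 1][c]
                  for r in range(h - 1) for c in range(w))
    area = len(in_vals)
    fit_in = sum((u - c1) ** 2 for u in in_vals)
    fit_out = sum((u - c2) ** 2 for u in out_vals)
    return mu * length + nu * area + lam1 * fit_in + lam2 * fit_out


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
on_object = [[0 < r < 3 and 0 < c < 3 for c in range(4)] for r in range(4)]
left_half = [[c < 2 for c in range(4)] for r in range(4)]

print(chan_vese_energy(image, on_object))   # curve on the object boundary
print(chan_vese_energy(image, left_half))   # curve cutting through the object
```

The curve that follows the object boundary has zero fitting error and therefore a much lower energy than the curve that cuts through the object, even though no gradient information is used.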
In his paper “Robust Image Segmentation using Local Median”, Liu reported that a
drawback present in most region based active contours had been overcome. The paper
indicates that the drawback originates from the assumptions that the intensity values
within each region conform globally to a Gaussian distribution and that the global mean is
sufficient as a discriminating measure. In order to improve region based
segmentation, Liu minimized the following energy functional:
\[
F(f_1, f_2, C) = \mu \cdot \text{Length}(C) + \nu \cdot \text{Area}(\text{inside}(C))
+ \lambda_1 \int_{\text{inside}(C)} |u - f_1|^2 \, dx\, dy
+ \lambda_2 \int_{\text{outside}(C)} |u - f_2|^2 \, dx\, dy
\tag{3-14}
\]
In this functional the global means $c_1$ and $c_2$ are replaced by the local medians $f_1$ and $f_2$
respectively, where
\[ f_1 = \text{median}(u * \text{inside}(C) * W) \tag{3-15} \]
\[ f_2 = \text{median}(u * \text{outside}(C) * W) \tag{3-16} \]
$W$ is a rectangular window that is used to define the neighbourhood pixels in an image. The
functions $f_1$ and $f_2$ calculate the two local medians for the neighbouring
pixels that are inside and outside the moving curve respectively on the image domain. Liu
emphasised the use of local information in an image instead of global information.
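The two local medians can be sketched for a single window position as follows. The sketch assumes the curve is given as a binary mask of inside pixels and that W is a square window of half-width `half` centred on the pixel of interest; the function name, the window shape and the sample values are illustrative and are not taken from Liu's paper.

```python
from statistics import median


def local_medians(image, inside, r, c, half=1):
    """Medians of the window pixels inside (f1) and outside (f2) the curve."""
    h, w = len(image), len(image[0])
    in_vals, out_vals = [], []
    for rr in range(max(0, r - half), min(h, r + half + 1)):
        for cc in range(max(0, c - half), min(w, c + half + 1)):
            (in_vals if inside[rr][cc] else out_vals).append(image[rr][cc])
    f1 = median(in_vals) if in_vals else None
    f2 = median(out_vals) if out_vals else None
    return f1, f2


image = [
    [10, 10, 80, 82],
    [11, 12, 81, 255],   # 255 is a hypothetical noise spike
    [10, 11, 79, 80],
]
inside = [[c >= 2 for c in range(4)] for r in range(3)]

print(local_medians(image, inside, 1, 2))
```

The inside median stays close to the typical inside intensity despite the noise spike, whereas the mean of the same six inside pixels would be pulled up to 109.5.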
Wang et al. (2009), in their paper “Active contour driven by local Gaussian distribution
fitting energy”, agree with Liu that local information of an image is very
important in segmentation. They indicated that the use of global information, as in “Active
contours without edges”, fails to adequately segment images with intensity
in-homogeneity. Most of the images that cause problems for segmentation techniques that
use global information come from the medical field, such as microscopy, computed
tomography (CT), ultrasound, magnetic resonance imaging (MRI), positron emission
tomography (PET) and mammography. Wang et al. used Gaussian distributions with
different means and variances to describe the local image intensities. They concluded
that their method was able to deal with both noise and intensity in-homogeneity, but has
a high computational time. The computational cost of these methods has been one of the limiting
factors in their usage (Ayed & Mitiche, 2008). These methods have to start with an initial
curve, and its placement on the image plays an important role in the final result of the
segmentation process. Chan & Vese indicated that in their method “Active Contours
without Edges” the initial curve can be placed anywhere in the image and the
segmentation of the image is still competitively good. This shows that researchers are keen to
make these methods domain independent.
Failure to adjust the homogeneity/similarity criteria accordingly will produce undesirable
results. The following are some of them:
- Over or under-segmentation of the image (arising of pseudo objects or missing
objects)
- Fragmentation (Varshney, Rajpal & Purwar, 2009).
FIGURE 3-6 Region Based Method (Chan & Vese)
FIGURE 3-6 indicates some of the problems that can be encountered when using region
based methods. It can be observed that parts have been added to and removed from the region of
interest. Again this will affect the post-segmentation image processing.
3.1.4 PERFORMANCE EVALUATION
Many image segmentation methods have been created, and more are being created, using many
distinct approaches and algorithms, but it is still very difficult to assess and compare the
performance of these segmentation techniques (Zhang, Fritts & Goldman, 2008).
Researchers evaluate their image segmentation techniques by using one or more of
the evaluation methods in Figure 3-7.
FIGURE 3-7 An Overview of Evaluation Techniques
A full description of the above evaluation methods can be found in Zhang et al.
(2008) and Polak, Zhang & Pi (2009). Ideally most of these methods should be domain
independent, but in reality they are domain dependent. It is generally believed that it is
difficult to develop a single model that applies to all image objects (Boucheron, Harvey &
Manjunath, 2007). Both subjective and objective evaluation approaches have been used
to evaluate segmentation techniques, but within a domain dependent environment (Zhang
et al., 2008). Whatever evaluation method is used in a specific domain has
therefore only been used to compare segmentation techniques within that domain. These methods have
been used to adjust the parameters of the segmentation techniques in order to solve the
following problems in the segmentation area:
- Over or under-segmentation of the image
Unfortunately, Hu et al. (2001) concluded that no segmentation method is
better than the others in all domains. We believe that with the use of universal evaluation
methods it will be possible to find segmentation techniques that can be said to be better
than others in all domains.
3.1.5 CHALLENGES AND FUTURE DIRECTIONS
Domain independent segmentation techniques can only be found when the
techniques are evaluated by domain independent evaluation methods using a domain independent image
database. For this to happen, a universal image database needs to be created so
that researchers can use it to evaluate their techniques. Whether a subjective or
an objective evaluation method is used, the image database must be the same and the images
must be ranked to enable comparison of segmentation techniques. Whenever researchers
segment the images in the database they must indicate the values of the parameters for each
image segmented, the computational time and the specification of the machine used. This
will enable easy selection of a segmentation technique for a particular area. Given the ad hoc
form of current research, this way of evaluating techniques will bring some order to
the segmentation field. There is still room for further improvement in each group of
segmentation methods, that is, Edge-based and Region-based.
3.1.6 SEGMENTATION TECHNIQUES SUMMARY
Segmentation is one of the important preliminary steps in image processing. The
choice of a suitable segmentation algorithm depends on the peculiar characteristics
of the individual problem. This section looked at the classification of segmentation
algorithms and the challenges being faced. The boundary based methods are
capable of giving good segmentation results in the absence of noise in the image. Noise
suppression techniques have been employed to improve the boundary based segmentation
results on noisy images. The problem with these noise suppression techniques is that in
reducing noise the edge strength may also be reduced, resulting in failure to detect the edge.
Region based segmentation algorithms solve this problem of missing edges. The
advantage of region based segmentation algorithms over edge-based segmentation
algorithms is that they do not use the gradient to detect boundaries. This allows region
based methods to segment colour and multi-spectral images where there are no defined
gradient boundaries. The region based algorithms are less sensitive to the location of the
initial contours. These algorithms are better at capturing concavities of object
images and are less sensitive to noise. For a better segmentation of an image one has to
decide whether to use global or local statistics because they affect the final segmentation of
an image. Through a proper performance evaluation of the segmentation algorithm over the
domain of interest one can get satisfactory segmentation results. The
segmentation techniques and performance evaluation methods reviewed here are
summarised in TABLE 3-1:
TABLE 3-1 Segmentation Techniques Summary

Segmentation Methods | Research interest | Known Problems in segmenting images
---------------------|-------------------|------------------------------------
Thresholding | Determine the value of T (threshold value) | Low contrast; Spatial morphological information
Edge Based | Determine the appropriate stopping gradient or other stopping criteria | Edge-less; Noisy images; Smooth boundaries; Texture boundaries
Region Based | Determine homogeneity criteria to decompose the image into regions; Determine how to deal with in-homogeneity in images | High computational time
All three of them (Thresholding, Edge Based, Region Based) | Determine performance evaluation of the techniques; Determine comparison criteria of the techniques | The segmented region might be smaller or larger than the actual; The edges of the segmented region might not be connected; Over or under-segmentation of the image (arising of pseudo edges or missing edges)
3.2
IMAGE SHAPE REPRESENTATION AND DESCRIPTION TECHNIQUES
With the vast collection of digital images on personal computers and on the Internet, the need
to find a particular image or a collection of images of interest has increased tremendously.
This has motivated researchers to seek efficient, effective and accurate
algorithms that are domain independent for the representation, description and retrieval of
image(s) of interest. It is a daunting task, and many algorithms have therefore been
developed to represent, describe and retrieve images using their visual features (shape,
colour, texture) (Rui & Huang, 1999; Li & Guan, 2006; Zheng, Sherrill-Mix & Gao,
2007b; Mingqiang, Kidiyo & Joseph, 2008). Visual feature representation and/or
description play(s) a very important role in image classification, recognition and retrieval.
A successful image representation, description and retrieval/recognition system depends on
the selection of suitable image feature(s) to encode, the quantification of these features and the
selection of the similarity measure.
This section gives a brief review of 2-dimensional (2D) shape representation and
description techniques. This area is receiving much attention because human
beings use shape as the basis of visual recognition (Zheng et al., 2007b). An accurate
image shape representation and description would enable machines to
compete with human beings in image recognition and retrieval. To be considered accurate,
an image representation and description must be invariant to translation, rotation and scale
(a change in position, a rotation through some angle, or shrinking or zooming of an image must
not affect its representation and description), resistant to noise (when the quality of the image is
compromised and the visibility of certain features is reduced or lost, the representation and
description must not be affected too much), affine invariant, and must precisely
quantify the chosen feature(s) (Mingqiang et al., 2008).
Image retrieval rate can be improved substantially through the use of an appropriate
similarity measure. The similarity matching techniques depend very much on the
representation and description technique applied. Usually the image shape representation
and description is a collection of numbers (commonly vectors) produced by a
representation and description algorithm in the process of quantifying an image shape in
ways that concur with human intuition. To enable efficient storage and retrieval, the
representation and description should fulfil the following:
- the vectors must not be very large
- the similarity distance calculation must be simple (to reduce execution time)
- the image object representation and description must be compact (Mingqiang et al., 2008).
Surveys and reviews give researchers an overview of developments, achievements,
direction and open issues within a given area. We arranged this section as follows:
- Classification of representation and description techniques
- Boundary/Contour based techniques
- Region/Whole based techniques
- Challenges and Future Directions
- Summary
3.2.1
CLASSIFICATION OF SHAPE REPRESENTATION AND DESCRIPTION
TECHNIQUES
Shape representation and description techniques can be grouped into Region based and Contour based
classes. These classes indicate which pixels are used in the representation and
description of the image. Region based means that all the pixels of the shape contribute to
the description, while contour/boundary based means that only the edge pixels are used in the
description of the image, as shown in FIGURE 3-9.b and FIGURE 3-9.a respectively.
FIGURE 3-9.a Contour Pixels (8-Connectivity)
FIGURE 3-9.b Region Pixels (8-Connectivity)
This classification was proposed by MPEG-7 and is widely used (Mingqiang et al., 2008). It must
be noted that when an image is represented and described using a contour based technique, the
segmentation of the image may be edge based or region based, whereas a region based representation and
description requires region based segmentation. Each of the groups above can be
further classified into Structural and Global subgroups. The structural sub-group
represents a shape by segments while the global sub-group represents the shape as a whole. It can be said
that the structural approach is a discrete form of image representation while the global approach is a continuous
form. For example, in boundary based representation the structural approach divides
the shape boundary into segments called primitives (Zhang & Lu, 2004). The global
sub-group focuses on the overall shape, for example using the integral boundary to describe the
shape. The techniques can also be classified into Space and Transform domains. This
classification indicates whether the shape features are derived from the spatial domain or
not. The spatial domain is the normal image space. The space domain approaches match shapes
on a point basis while transform domain approaches match shapes on a feature (vector)
basis. The last two classifications in Figure 3-10 are:
- Information preserving (IP) and
- Non-Information preserving (NIP).
Sometimes it is necessary to reconstruct the original image from its representation and
description. There are some techniques that enable the reconstruction of the original image
and others that do not. Unfortunately very few techniques are able to give sufficient
information for the reconstruction.
FIGURE 3-10 Hierarchy of the Classification of Shape Representation and Description
Techniques
3.2.2 BOUNDARY/CONTOUR BASED REPRESENTATION TECHNIQUES
Many of these techniques were described in Zhang & Lu (2004) and Mingqiang et al. (2008),
so here we list some of them, explore any improvements that have been made to
them and add new ones, so that we can predict the direction of
research within the contour based techniques. Figure 3-11 shows some techniques
classified as contour based techniques.
43
FIGURE 3-11 Examples of Contour Based Techniques
These techniques use the boundary of a shape to describe an object. It is commonly believed
that human beings can differentiate objects by their boundaries or contours (Zhang & Lu,
2002). Most objects form shapes with defined contours, which makes these techniques
appealing. The techniques can generally be applied to different application areas with
considerable success. They have a low computational complexity compared to region based
techniques. We observed that research has been taking place to improve contour based image
representation and description, as seen in (Zhang & Lu, 2002).
In structural contour based techniques a form must be constructed. For example, the chain
code technique describes an object shape by a sequence of unit sized straight line segments
based on 4- or 8-connectivity with a given direction (Mingqiang et al., 2008). A chain code
that uses the 4-connectivity directions of FIGURE 3-12 is illustrated in FIGURE 3-13.
FIGURE 3-12 Directions for 4-connectivity
FIGURE 3-13 4-directional Chain Code Representation
This means that, knowing the starting point, one can roughly reconstruct the object shape, so
this type of object representation is an information preserving technique. We can also
observe that the shape features are derived from the spatial domain. It is important to note
that any boundary disturbance, whether due to noise or to the segmentation algorithm used,
will cause the object shape to be represented incorrectly.
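The chain code idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the direction numbering (0 = east, 1 = north, 2 = west, 3 = south) is assumed and may differ from FIGURE 3-12, and the function names encode_chain and decode_chain are hypothetical.

```python
# Assumed direction codes: 0 = east, 1 = north, 2 = west, 3 = south.
# FIGURE 3-12 may number the directions differently.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}
CODES = {step: code for code, step in STEPS.items()}


def encode_chain(boundary):
    """Encode an ordered list of 4-connected boundary points as a chain code."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in CODES:
            raise ValueError("consecutive points are not 4-connected")
        codes.append(CODES[step])
    return codes


def decode_chain(start, codes):
    """Rebuild the boundary points from a starting point and a chain code."""
    points = [start]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


# A 2 x 2 square traced from its lower-left corner and closed on itself.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
chain = encode_chain(square)
```

Decoding the chain from the same starting point returns the original point list, which is the sense in which the representation is information preserving. Moving a single boundary point changes the code, which illustrates the sensitivity to boundary disturbances.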
The global contour based technique is one in which a function derived from the boundary of the
object shape is used to represent the shape, for example 1-dimensional Fourier descriptors.
The term global indicates that these algorithms use all the boundary pixels as one continuous
unit. The whole boundary is transformed by applying the Fourier transform to a signature that
is derived from the shape boundary coordinates (Mingqiang et al., 2008).
To parameterize an object shape boundary from 0 to π, given the boundary coordinates as
$$(x_i, y_i) = (x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}) \tag{3-17}$$
A periodic function can be constructed to represent the boundary as a series of coordinates
in the complex plane as:
$$s(t) = x(t) + j\,y(t) \tag{3-18}$$
The discrete Fourier transform of s(t) is given as:
$$F(u) = \frac{1}{N} \sum_{t=0}^{N-1} s(t)\, e^{-j 2\pi u t / N} \tag{3-19}$$
where u = 0, 1, ..., N-1 and the F(u) coefficients are the Fourier descriptors of the boundary.
It is possible to reconstruct the boundary of the object shape by using the inverse transform
of F(u). We call this an Information Preserving Contour based representation algorithm. In
this case the inverse transform will be:
$$s(t) = \sum_{u=0}^{N-1} F(u)\, e^{j 2\pi u t / N} \tag{3-20}$$
where t = 0, 1, ..., N-1.
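The forward and inverse transforms of equations 3-19 and 3-20 can be sketched directly with the Python standard library. This is a minimal illustration, not an optimised implementation (a fast Fourier transform would normally be used); the function names fourier_descriptors and reconstruct_boundary are hypothetical.

```python
import cmath


def fourier_descriptors(boundary):
    """F(u) = (1/N) * sum_t s(t) * exp(-j*2*pi*u*t/N), with s(t) = x(t) + j*y(t)."""
    n = len(boundary)
    s = [complex(x, y) for x, y in boundary]
    return [
        sum(s[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n)) / n
        for u in range(n)
    ]


def reconstruct_boundary(descriptors):
    """s(t) = sum_u F(u) * exp(j*2*pi*u*t/N), returned as (x, y) pairs."""
    n = len(descriptors)
    s = [
        sum(descriptors[u] * cmath.exp(2j * cmath.pi * u * t / n) for u in range(n))
        for t in range(n)
    ]
    return [(z.real, z.imag) for z in s]


# Eight boundary points of a 2 x 2 square (illustrative data).
boundary = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
F = fourier_descriptors(boundary)
rebuilt = reconstruct_boundary(F)
```

With all N coefficients kept, the rebuilt points agree with the originals up to floating point error. F(0) is the mean of the boundary points, so it carries the position of the shape.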
All the examples we have given preserve enough information about the object shape to enable
the reconstruction of its boundary. There are also object shape representation algorithms
that do not allow reconstruction. For example, with object shape signatures such as the area
function, the triangle-area representation and others, it is nearly impossible to reconstruct
the object shape. These object shape representation algorithms fall within the
Non-Information Preserving class.
3.2.2.1 MERITS AND DEMERITS OF CONTOUR BASED ALGORITHMS
Advantages

it uses few pixels of an image

a low computation complexity

optimal in high contrast image objects
Disadvantages

sensitive to noise (variations on the edge pixels would represent the same image
object differently)
3.2.3 REGION/WHOLE BASED REPRESENTATION TECHNIQUES
These object shape representation algorithms use every feature point of the object shape to
describe the shape. In using all the pixels of the object shape to describe a shape, these
techniques can be classified as structural or global. The structural is where a form is
constructed by segments/sections that we call primitives in the process of representing the
object shape. On the other hand global techniques use the pixels as a continuous unit in
representing the object shape (Mingqiang et al., 2008). We are going to describe examples
in each category of the classifications that are structural, global, space, transform,
information preserving and non-information preserving. Some examples of region based
techniques are shown in figure 3-14.
FIGURE 3-14 Examples of Region Based techniques
Region based Fourier descriptor is an example of a global, transform domain, and non
information preserving representation technique. The Generic Fourier descriptor (GFD) is
derived by applying a Modified Polar Fourier Transform (MPFT) on the object shape
(Mingqiang et al., 2008) that has been transformed into a normal 2-dimensional
rectangular polar image. For a given object shape image f(x, y), the MPFT is defined as
$$PF(\rho, \phi) = \sum_{r} \sum_{i} f(r, \theta_i)\, \exp\!\left[ j 2\pi \left( \frac{r}{R}\rho + \frac{2\pi i}{T}\phi \right) \right] \tag{3-21}$$
where $0 \le r = [(x - x_c)^2 + (y - y_c)^2]^{1/2} < R$ and $\theta_i = i(2\pi/T)$, $(0 \le i < T)$;
$(x_c, y_c)$ is the centre of mass of the shape; $0 \le \rho < R$, $0 \le \phi < T$. R and T are
the radial and angular resolutions.
The calculated Fourier coefficients are invariant to translation, but for them to be a good
representation of the object shape they should also be invariant to rotation and scaling.
The following normalisation achieves rotation and scaling invariance:
$$GFD = \left\{ \frac{|PF(0,0)|}{area}, \frac{|PF(0,1)|}{|PF(0,0)|}, \ldots, \frac{|PF(0,n)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,0)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,n)|}{|PF(0,0)|} \right\} \tag{3-22}$$
where area is the area of the bounding circle in which the shape resides; m is the maximum
number of radial frequencies selected and n is the maximum number of angular frequencies
selected. m and n can be adjusted to achieve the hierarchical coarse to fine representation
requirement.
Moments have been used widely in image representation. These include Invariant
moments, Algebraic moments, Zernike moments and Radial Chebyshev moments. They
belong to the space domain. A Zernike moment is classified as a region based, global and
information preserving representation technique. The Zernike moments representation
preserves enough information about the shape to enable the original object shape to be
reconstructed from the shape description (Maofu, Yanxiang & Bin, 2007).
For the Invariant moments (IM), the general form of a moment function $m_{pq}$ of order $p + q$
of an image function $f(x, y)$ is given by
$$m_{pq} = \int_x \int_y \psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \quad p, q = 0, 1, 2, \ldots, n \tag{3-23}$$
where $\psi_{pq}$ is known as the moment weighting kernel or the basis set.
For a digital image function $f(x, y)$ the equation above is written in discrete form as follows:
$$m_{pq} = \sum_x \sum_y \psi_{pq}(x, y)\, f(x, y) \tag{3-24}$$
For geometric moments the kernel is
$$\psi_{pq}(x, y) = x^p y^q \tag{3-25}$$
The moments that are invariant to translation are the central moments, which are defined as
follows:
$$\mu_{pq} = \sum_x \sum_y (x - x_c)^p (y - y_c)^q f(x, y), \quad p, q = 0, 1, 2, \ldots \tag{3-26}$$
where $x_c = m_{10}/m_{00}$ and $y_c = m_{01}/m_{00}$.
Note that the centroid $(x_c, y_c)$ moves with the image under translation, which is why the
central moments are invariant to translation.
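There are seven translation, rotation and scaling (TRS) invariant moments; as given by Hu
they are:
$$\phi_1 = \eta_{20} + \eta_{02} \tag{3-27}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \tag{3-28}$$
$$\phi_3 = \eta_{20}\eta_{02} - \eta_{11}^2 \tag{3-29}$$
$$\phi_4 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \tag{3-30}$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{3-31}$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{3-32}$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{3-33}$$
where $\eta_{pq} = \mu_{pq} / \mu_{00}^{(p+q+2)/2}$ (Flusser, Suk & Zitova, 2009), (Celebi & Aslandogan, 2005a;
Celebi & Aslandogan, 2005b).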
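As a rough illustration of the geometric, central and normalised moments, the following Python sketch computes the first two invariants, $\eta_{20} + \eta_{02}$ and $(\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$, for a small binary image and for a translated copy of it. The image is assumed to be a list of rows with f(x, y) = img[y][x]; the helper names and the test shape are hypothetical.

```python
def raw_moment(img, p, q):
    """Geometric moment m_pq = sum_x sum_y x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def central_moment(img, p, q):
    """Central moment mu_pq, taken about the centroid (x_c, y_c)."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def eta(img, p, q):
    """Normalised central moment eta_pq = mu_pq / mu_00^((p + q + 2) / 2)."""
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** ((p + q + 2) / 2)


def first_two_invariants(img):
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


def translate(img, dx, dy):
    """Shift the non-zero pixels by (dx, dy); the caller keeps them in bounds."""
    out = [[0] * len(img[0]) for _ in img]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:
                out[y + dy][x + dx] = v
    return out


shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
original = first_two_invariants(shape)
shifted = first_two_invariants(translate(shape, 2, 1))
```

The two results agree up to floating point error, because the centroid moves with the shape. The remaining invariants can be computed in the same way from the third order normalised moments.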
Invariant moments are a non information preserving representation algorithm. Invariant
moments have drawbacks such as:

Information redundancy

Noise sensitivity

Large variation in the dynamic range of values.
Region based Convex Hull is an example of a structural algorithm that segments the shape
into parts that are then used for image shape representation and description. It can further
be classified as a space domain and non information preserving method. A region R is
convex if and only if, for any two points $x, y \in R$, the whole line segment $\overline{xy}$ is inside
the region, that is $\overline{xy} \subseteq R$. The convex hull CH of a region R is the smallest convex
region C that fulfils the condition $R \subseteq C$. The convex deficiency CD is the difference
between the convex hull CH and the region R, as given in equation 3-34.
$$CD = CH - R \tag{3-34}$$
Computing the smallest convex shape that encloses a set of points, the convex hull CH, is the
real problem. The image shape is represented using a series of convex hulls. The convex hull
can be extracted using either boundary tracing algorithms or morphological algorithms
(Zhang & Lu, 2004). In order to decrease the effect of noise, irregular boundaries and
variations in segmentation, the usual practice is to smooth the boundary prior to
partitioning. The representation of the image shape may be obtained by a recursive
process which results in a concavity tree, as shown in figure 3-15 (Mingqiang et al., 2008).
FIGURE 3-15: (a) Convex hull and its Concavities (b) Concavity representation tree of the
convex hull
Figure 3-15 illustrates the convex hull of the object shape with its convex deficiencies, then
the convex hulls and deficiencies of those convex deficiencies; the process stops only when
all derived convex deficiencies are convex. From the figure it can be seen that
$s_1, s_2, s_3, s_4, s_5$ are convex deficiencies and that $s_2, s_3, s_4$ are already convex hulls.
The process continues on the convex deficiencies $s_1, s_5$ to produce the convex hulls
$s_{11}, s_{12}, s_{51}, s_{52}$, and then it stops. The object shape can then be represented as a
concavity tree. Each concavity can be described by its area, bridge length (the line that
connects the cut of the concavity), maximum curvature and so on. The matching between shapes
becomes a string or a graph matching.
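One level of this process can be sketched for a polygonal shape as follows. The hull is computed with Andrew's monotone chain algorithm, which is one of several possible choices and is not necessarily the boundary tracing or morphological method referred to above; the L-shaped polygon and the function names are illustrative assumptions.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its ordered vertices."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


# An L-shaped region R with a single concavity at its inner corner.
region = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(region)
deficiency_area = polygon_area(hull) - polygon_area(region)
```

Here the hull has five vertices and the convex deficiency is a single triangle, so the concavity tree would have one child under the root. Repeating the same step on each deficiency that is not convex would build the deeper levels of the tree.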
3.2.3.1 MERITS AND DEMERITS OF REGION/WHOLE BASED TECHNIQUES
Advantages

generic image representation and description (both boundary and internal pixels are
used)

not sensitive to noise (small variations on the image object would not affect the
representation and description of an image object so much)
Disadvantages

it uses all pixels of an image

a high computation complexity
3.2.4 EVALUATION OF REPRESENTATION AND DESCRIPTION
ALGORITHMS
MPEG-7 has set several principles to measure a shape descriptor such as:


Good retrieval accuracy
Compact features

General application

Low computational complexity

Robust retrieval performance

Hierarchical coarse to fine representation
Most authors evaluate their representation and description methods by comparing retrieval
efficiency against other methods (Tran & Ono, 2000), (Lecce & Guerriero, 1999).
This type of evaluation is not objective, since one author has to reconstruct another author's
system and then use an image database of his or her own choice for the comparison. In general,
authors evaluate whether their methods fulfil TRS invariance and noise resistance, and then
tabulate their retrieval performance on an image database of their choice (Tran & Ono, 2000),
(Mingqiang et al., 2008), (Muller, Michoux, Bandon & Geissbuhler, 2004). This form of
evaluation takes into consideration two of the MPEG-7 principles for measuring a shape
descriptor, namely good retrieval accuracy and robust retrieval performance. Few authors
evaluate their method using most of the stated MPEG-7 principles (Sheng & Xin, 2005).
The testing of TRS and affine invariance is objective, since anyone can prove its validity
analytically. Retrieval efficiency evaluates not only the representation and description
algorithm but also the similarity distance method used, so an improvement may come from
either of them. Since representation and description algorithms depend on the segmentation
method used, comparing retrieval results might not give an objective evaluation of one's
method. Robustness is also subjective, in the sense that the author is the one who selects the
noisy, distorted and defective images to use in the experiment. There are image databases
accessible to everyone on the Internet that authors can use in their evaluation experiments,
for example CE_Shape-1 (Latecki, Lakamper & Eckhardt, 2000). The evaluation nevertheless
remains subjective, because it is up to the author to choose which database to use. A
structured way of evaluating representation and description algorithms is therefore
necessary for the evaluation to be objective.
3.2.5 CHALLENGES AND FUTURE DIRECTIONS
To find representation and description algorithms of general application, we need ordered,
domain independent image databases on which to evaluate the algorithms. Authors should
allow other authors to use their program code, for the sake of those who wish to compare the
efficiency of different algorithms. This would require databases of image representation
program code. Authors would also need to indicate the database and the segmentation
technique used in their experiments so that the comparison can be objective. There must be a
way of evaluating computational complexity, which as of now is very subjective. Because the
evaluation of these algorithms is subjective, it is very difficult to select the better
algorithms for a particular area or for general use. An orderly way of evaluating algorithms
will give direction to research on these algorithms.
3.2.6 IMAGE REPRESENTATION SUMMARY
In this chapter some of the existing shape representation and description techniques have
been reviewed and classified. The evaluation of the algorithms, challenges and future
directions in this area have also been discussed. It was found that contour based approaches
are useful where the shape contour is of interest and the shape interior content is not
important. However, contour based algorithms have their limitations. They are generally
sensitive to noise and variations because they use only a small part of the shape
information. In some cases the shape contour information is not available, owing to problems
encountered during the preliminary stages of image processing or during the capturing of the
image. These limitations can be overcome by employing region based shape representation and
description algorithms. Region based algorithms are more robust because they utilize all
the shape information available. They have the advantages that they can be applied to
general applications and that they provide more accurate retrieval. These advantages stem
from the fact that they cope very well with shape defects. These methods are also
classified into global and structural approaches. Comparing the two, structural approaches
are too complex to implement. They have high indexing and matching complexities,
making them a family of unstable shape representation and description algorithms. The
structural algorithms do have the advantage that they can do partial matching. This is
useful when part of the boundary is missing or part of the shape is missing or occluded.
The algorithms are also classified into spatial domain and transform domain. Spatial
domain algorithms have the disadvantages of noise sensitivity and high dimension.
In general, region based algorithms give hope of finding a method that fulfils all six
principles set by MPEG-7: good retrieval accuracy, compact features, general application,
low computational complexity, robust retrieval performance and hierarchical coarse to fine
representation. The 'best' shape representation and description method can only be found
when there are standardized evaluation methods for the algorithms.
Table 3-2 below shows the research interests that we believe need to be pursued in the quest
to find complete, generic and effective image representation and description algorithms.
TABLE 3-2 Representation Techniques Summary
Representation and Description Algorithm | Known Problems in Representing and Describing Images
Contour Based Techniques | Sensitive to noise; not generic
Region Based Techniques | Computation intensive
Research Interest: domain independent algorithms; objective or orderly evaluation of
algorithms; an objective way to find a suitable method for similarity distance measurement
for different representation and description algorithms; calculation of computational time.
3.3 IMAGE (DIS)SIMILARITY MEASUREMENT AND DATABASE ACCESS ALGORITHMS
The rapid growth in collections of multimedia data such as images, audio, video and text
has prompted the need for efficient methods for the storage, retrieval and indexing of such
data. Content based image (dis)similarity measurement algorithms, if chosen correctly
for a particular multimedia database, will increase the efficiency and effectiveness of the
retrieval of data of interest. In this chapter we discuss the (dis)similarity measurement
algorithms for images represented using their visible features (shape, colour and texture),
together with retrieval algorithms.
3.3.1 (DIS)SIMILARITY ALGORITHMS
Similarity (s) can be defined as a quantitative measurement that indicates the strength of
the relationship (closeness) between two image objects. Dissimilarity (d) is also a
quantitative measurement, one that reflects the discrepancy (disorder, distance apart)
between two image objects. We formalise the definition of (dis)similarity in Definition 1.
Definition 1 (Dis)Similarity (s/d)
Let Y be a non-empty set and s/d be a function on Y such that
$$s/d : Y \times Y \to \mathbb{R}, \text{ where } \mathbb{R} \text{ is the set of real numbers}$$
This function is called a pair-wise similarity/dissimilarity function. A (dis)similarity space
is a pair (Y, s (d)) in which Y is a non-empty set and s (d) is a (dis)similarity on Y. The
s/d function is bounded. A relationship exists between similarity and dissimilarity that
allows us to derive similarity values from dissimilarity values. The relationship is given by
$$s_{ij} = 1 - d_{ij}, \quad s_{ij} \in [0, 1] \tag{3-35}$$
or
$$s_{ij} = 1 - 2 d_{ij}, \quad s_{ij} \in [-1, 1] \tag{3-36}$$
where $d_{ij}$ is a normalized dissimilarity value between objects i and j.
From equations 3-35 and 3-36 we obtain an equivalence relationship between dissimilarity and
similarity measurements. This equivalence relationship is shown below.
$$s_{ij} \ge s_{iz} \iff d_{ij} \le d_{iz}, \quad \forall i, j, z \in Y \tag{3-37}$$
Table 3-3 summarises the interpretation of the values of similarity and dissimilarity.
TABLE 3-3 Interpretation of (dis)similarity values
Given two objects i and j | Similarity value | Dissimilarity value
Using equation 3-35: exactly similar | 1 | 0
Using equation 3-35: very different | 0 | 1
Using equation 3-36: exactly similar | 1 | 0
Using equation 3-36: very different | -1 | 1
General interpretation | Higher value | Lower value
General interpretation | Lower value | Higher value
The (dis)similarity measurement algorithms can be grouped into metric and non-metric.
A metric is defined in Definition 2.
Definition 2 (Dis)similarity Metric (Frechet)
Let X be a non-empty set. A metric on X is a function d of $X \times X$ into $[0, \infty)$ that
satisfies the following conditions:
a) $d(x, y) \ge 0, \ \forall x, y \in X$ (Non-negativity)
b) $d(x, y) = 0$ if and only if $x = y$ (Reflexivity)
c) $d(x, y) = d(y, x), \ \forall x, y \in X$ (Symmetry)
d) $d(x, y) \le d(x, z) + d(z, y), \ \forall x, y, z \in X$ (Triangle inequality)
A metric space is a pair (X, d) in which X is a non-empty set and d is a metric on X.
An observation from the definition is that a metric is not bounded. In our case we need a
bounded metric, so we impose an upper bound that transforms it into a bounded metric.
Non-metric (dis)similarity algorithms fail to fulfil at least one of the metric conditions.
Depending on which metric condition(s) the non-metric (dis)similarity algorithm does not
fulfil, a distinguishing term is used as shown in TABLE 3-4 (Skopal & Bustos, 2010).
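The four metric conditions can be checked empirically on a finite sample of objects, which is a quick way of seeing which condition a candidate (dis)similarity function breaks. The sketch below is an assumption-laden illustration: passing on a sample does not prove that a function is a metric, and the name check_metric_axioms is hypothetical.

```python
from itertools import product


def check_metric_axioms(points, d, tol=1e-12):
    """Test the four metric conditions on every pair or triple of sample points."""
    pts = list(points)
    pairs = list(product(pts, repeat=2))
    return {
        "non_negativity": all(d(x, y) >= -tol for x, y in pairs),
        "reflexivity": all((abs(d(x, y)) <= tol) == (x == y) for x, y in pairs),
        "symmetry": all(abs(d(x, y) - d(y, x)) <= tol for x, y in pairs),
        "triangle_inequality": all(
            d(x, y) <= d(x, z) + d(z, y) + tol
            for x, y, z in product(pts, repeat=3)
        ),
    }


sample = [0.0, 1.0, 2.0]
absolute = check_metric_axioms(sample, lambda a, b: abs(a - b))
squared = check_metric_axioms(sample, lambda a, b: (a - b) ** 2)
```

On this sample the absolute difference passes all four checks. The squared difference fails only the triangle inequality, since 4 > 1 + 1 for the points 0, 1 and 2, which is the semi-metric case.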
TABLE 3-4 Non-metric Classification
Metric Conditions Fulfilled | Metric Condition not Fulfilled | Distinguishing Term
Reflexivity, Non-negativity, Symmetry | Triangle Inequality | Semi-Metric (Non-Metric)
Non-negativity, Symmetry, Triangle Inequality | Reflexivity | Pseudo-Metric (Non-Metric)
Reflexivity, Non-negativity, Triangle Inequality | Symmetry | Quasi-Metric (Non-Metric)
Reflexivity, Symmetry, Triangle Inequality | Non-negativity | ? (Non-Metric)
None | Reflexivity, Non-negativity, Symmetry, Triangle Inequality | Full-Non-Metric
These (dis)similarity algorithms have been used successfully to retrieve images of interest
(Antani, Lee, Long & Thoma, 2004), (Petrakis & Faloutsos, 1997), (Stejic, Takama & Hirota,
2003). What makes an algorithm suitable for a certain image database is the contribution it
makes to the effectiveness and efficiency of the content based image retrieval system.
Effectiveness of retrieval is usually measured by precision (the number of correct images
retrieved divided by the total number of images retrieved) and recall (the number of correct
images retrieved divided by the total number of possible correct images) (Zheng,
Sherrill-Mix & Gao, 2007a).
$$precision = \frac{A}{A + C} \tag{3-38}$$
$$recall = \frac{A}{N} \tag{3-39}$$
$$effectiveness = \begin{cases} \dfrac{A}{N} & \text{if } T > N \\ \dfrac{A}{T} & \text{if } T \le N \\ \dfrac{A}{A + C} & \text{(precision)} \end{cases} \tag{3-40}$$
where A is the number of relevant image objects retrieved, C is the number of non-relevant
image objects retrieved, T is the number of relevant images that the user requires from the
database and N is the total number of relevant images in the database.
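Precision and recall as defined above can be computed from two sets of image identifiers. The following is a small sketch in which the identifiers, the function name precision_recall and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)   # A: relevant items retrieved
    c = len(retrieved - relevant)   # C: non-relevant items retrieved
    n = len(relevant)               # N: all relevant items in the database
    precision = a / (a + c) if (a + c) else 0.0
    recall = a / n if n else 0.0
    return precision, recall


# Four images retrieved, two of which are among the five relevant ones.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 6, 8, 10})
```

In this example A = 2, C = 2 and N = 5, so precision is 2/4 and recall is 2/5.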
Efficiency of retrieval is the speed of retrieval (Skopal & Bustos, 2010). Metric and
non-metric (dis)similarity algorithms compete equally well in the effectiveness of retrieval.
Non-metric algorithms lag behind in the efficiency of retrieval, because the indexing of
databases is skewed in favour of metric (dis)similarity algorithms. It must be noted that an
effective retrieval system is useless on large databases if it is not efficient. In the next
sections we look at some metric and non-metric (dis)similarity algorithms.
3.3.1.1 METRIC (DIS)SIMILARITY (D/S) ALGORITHMS
Metric (D/S) algorithms have exhibited a high degree of effectiveness and efficiency in
retrieving images of interest from very large image databases. Many researchers who used
metric S/D algorithms reported high precision and recall (Tran & Ono, 2000), (Zhang & Lu,
2002), (Zheng et al., 2007b). The metric conditions can be used to index the image database
for highly efficient retrieval (Skopal & Bustos, 2010). The following are some of the most
commonly used metric S/D algorithms:
1. Minkowski Family $L_p$ ($p \ge 1$ where $p = 1, 2, 3, \ldots$)
$$d = \sqrt[p]{\sum_{i=1}^{n} |x_i - y_i|^p} \tag{3-41}$$
Within this family very few have been used in image retrieval; they are the Euclidean L2,
City block L1 (taxicab norm, Manhattan) and Chebyshev L∞ dissimilarity formulas. The
formulas are given in equations 3-42 to 3-44 below:
Euclidean L2
$$d = \sqrt{\sum_{i=1}^{n} |x_i - y_i|^2} \tag{3-42}$$
City block L1 (taxicab norm, Manhattan)
$$d = \sum_{i=1}^{n} |x_i - y_i| \tag{3-43}$$
Chebyshev L∞
$$d = \max_i |x_i - y_i| \tag{3-44}$$
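The three members of the Minkowski family named above can be sketched for two feature vectors of equal length as follows. The function names and the example vectors are illustrative assumptions.

```python
def minkowski(x, y, p):
    """L_p dissimilarity for p >= 1: (sum_i |x_i - y_i|^p)^(1/p)."""
    if p < 1:
        raise ValueError("p must be at least 1 for the metric Minkowski family")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """L_infinity dissimilarity: max_i |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0.0, 0.0), (3.0, 4.0)
city_block = minkowski(x, y, 1)
euclidean = minkowski(x, y, 2)
l_infinity = chebyshev(x, y)
```

For these two vectors the city block value is the largest and the Chebyshev value the smallest, which reflects the general ordering of the family: the value does not increase as p grows.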
3.3.1.2 NON-METRIC (DIS)SIMILARITY ALGORITHMS
Non-metric D/S algorithms have also been used and have produced a high degree of
effectiveness and efficiency in retrieval from very large databases. This is in part due to
the fact that researchers have created weak metric (dis)similarity algorithms from these
non-metric algorithms (Clarkson, 2005). We now look at some of the non-metric S/D algorithms.
1. Pearson Dissimilarity Family
$$r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma_x} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_y} \right) \tag{3-45}$$
where $r$, $\frac{X_i - \bar{X}}{\sigma_x}$, $\bar{X}$ and $\sigma_x$ are the Pearson correlation coefficient, the
standard score, the mean and the standard deviation respectively.
Pearson dissimilarity measure algorithms are given as
$$d = 1 - r, \text{ where } d \in [0, 2] \tag{3-46}$$
$$d = 1 - |r|, \text{ where } d \in [0, 1] \tag{3-47}$$
There are other (dis)similarity algorithms that use correlation; some of them are the
Spearman rank correlation, Kendall's $\tau$ and the uncentred correlation (Cha, 2007).
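The Pearson correlation coefficient and the two dissimilarities derived from it can be sketched as below. The sketch assumes the population standard deviation (division by n), in line with the 1/n factor in the correlation formula, and it does not handle vectors with zero variance; the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Mean of the products of standard scores (population standard deviation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # A constant vector has zero standard deviation and would divide by zero here.
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / n


def pearson_dissimilarity(xs, ys):
    """d = 1 - r, in [0, 2]."""
    return 1 - pearson_r(xs, ys)


def absolute_pearson_dissimilarity(xs, ys):
    """d = 1 - |r|, in [0, 1]."""
    return 1 - abs(pearson_r(xs, ys))


d_same = pearson_dissimilarity([1, 2, 3], [2, 4, 6])
d_opposite = pearson_dissimilarity([1, 2, 3], [3, 2, 1])
d_abs_opposite = absolute_pearson_dissimilarity([1, 2, 3], [3, 2, 1])
```

Perfectly correlated vectors give a dissimilarity of 0. Perfectly anti-correlated vectors give 2 under the first form and 0 under the second, so the second form treats them as alike.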
2. Minkowski Family $L_p$ ($p \in (0, 1)$, where p is a fraction)
$$d = \sqrt[p]{\sum_{i=1}^{n} |x_i - y_i|^p} \tag{3-48}$$
d is called the fractional dissimilarity.
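A short sketch shows why the fractional case is non-metric: for p between 0 and 1 the triangle inequality can fail. The three points and the function name fractional_dissimilarity are chosen only for illustration.

```python
def fractional_dissimilarity(x, y, p):
    """Minkowski form with 0 < p < 1: (sum_i |x_i - y_i|^p)^(1/p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
direct = fractional_dissimilarity(a, c, 0.5)
via_b = fractional_dissimilarity(a, b, 0.5) + fractional_dissimilarity(b, c, 0.5)
```

With p = 0.5 the direct value from a to c is 4, while going through b costs 1 + 1 = 2, so the direct value exceeds the detour and the triangle inequality does not hold.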
3. Shannon Entropy Family
The Kullback-Leibler, Jeffreys/J divergence, Jensen-Shannon and Jensen difference measures,
to mention a few, are some of the non-metric algorithms in this family that have been used in
image retrieval systems (Cha, 2007). The formulas are given in equations 3-49 to 3-52 below.
Kullback-Leibler
$$d = \sum_{i=1}^{n} x_i \ln \frac{x_i}{y_i} \tag{3-49}$$
Jeffreys/J divergence
$$d = \sum_{i=1}^{n} (x_i - y_i) \ln \frac{x_i}{y_i} \tag{3-50}$$
Jensen-Shannon
$$d = \frac{1}{2} \left[ \sum_{i=1}^{n} x_i \ln \left( \frac{2 x_i}{x_i + y_i} \right) + \sum_{i=1}^{n} y_i \ln \left( \frac{2 y_i}{x_i + y_i} \right) \right] \tag{3-51}$$
Jensen difference
$$d = \sum_{i=1}^{n} \left[ \frac{x_i \ln x_i + y_i \ln y_i}{2} - \left( \frac{x_i + y_i}{2} \right) \ln \left( \frac{x_i + y_i}{2} \right) \right] \tag{3-52}$$
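Three members of this family can be sketched for feature vectors with strictly positive entries, such as normalised histograms without empty bins. The handling of zero entries is deliberately left out, and the function names and example histograms are assumptions.

```python
from math import log


def kullback_leibler(x, y):
    """sum_i x_i * ln(x_i / y_i); assumes strictly positive entries."""
    return sum(a * log(a / b) for a, b in zip(x, y))


def jeffreys(x, y):
    """sum_i (x_i - y_i) * ln(x_i / y_i); symmetric in x and y."""
    return sum((a - b) * log(a / b) for a, b in zip(x, y))


def jensen_shannon(x, y):
    """Half the sum of the two divergences to the midpoint (x + y) / 2."""
    left = sum(a * log(2 * a / (a + b)) for a, b in zip(x, y))
    right = sum(b * log(2 * b / (a + b)) for a, b in zip(x, y))
    return 0.5 * (left + right)


x = [0.5, 0.5]
y = [0.9, 0.1]
kl_xy = kullback_leibler(x, y)
kl_yx = kullback_leibler(y, x)
```

The Kullback-Leibler value depends on the order of its arguments, so it breaks the symmetry condition. The Jeffreys divergence equals the sum of the two directed Kullback-Leibler values and is therefore symmetric, as is Jensen-Shannon.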
4. $\chi^2$ Family
The squared Euclidean, Pearson, Neyman, Clark and additive symmetric measures are some of the
members of this group (Cha, 2007). The formulas are given in equations 3-53 to 3-57 below.
Squared Euclidean
$$d = \sum_{i=1}^{n} (x_i - y_i)^2 \tag{3-53}$$
Pearson $\chi^2$
$$d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{y_i} \tag{3-54}$$
Neyman $\chi^2$
$$d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i} \tag{3-55}$$
Clark
$$d = \sqrt{\sum_{i=1}^{n} \left( \frac{|x_i - y_i|}{x_i + y_i} \right)^2} \tag{3-56}$$
Additive symmetric $\chi^2$
$$d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2 (x_i + y_i)}{x_i y_i} \tag{3-57}$$
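The first three formulas of this family can be sketched as follows, again assuming strictly positive feature values so that no division by zero occurs. The function names and example vectors are illustrative.

```python
def squared_euclidean(x, y):
    """sum_i (x_i - y_i)^2"""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pearson_chi2(x, y):
    """sum_i (x_i - y_i)^2 / y_i; assumes every y_i is non-zero."""
    return sum((a - b) ** 2 / b for a, b in zip(x, y))


def neyman_chi2(x, y):
    """sum_i (x_i - y_i)^2 / x_i; assumes every x_i is non-zero."""
    return sum((a - b) ** 2 / a for a, b in zip(x, y))


x = [1.0, 2.0]
y = [2.0, 2.0]
pearson_xy = pearson_chi2(x, y)
pearson_yx = pearson_chi2(y, x)
```

Swapping the arguments of the Pearson form gives the Neyman form, so neither is symmetric on its own. The squared Euclidean form is symmetric but does not satisfy the triangle inequality.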
5. Inner Product Family
The (dis)similarity measures of the inner product family include the inner product explicitly
in their formulas. In this family we look at only three formulas: the inner product, the
harmonic mean and the cosine. We are interested in the cosine because it is a normalised
inner product, which allows for physical comparison of the (dis)similarity measurements of
images. The formulas of the inner product family members are given in equations 3-58 to
3-60.
Inner Product
$$d = \sum_{i=1}^{n} x_i y_i \tag{3-58}$$
Harmonic Mean
$$d = 2 \sum_{i=1}^{n} \frac{x_i y_i}{x_i + y_i} \tag{3-59}$$
Cosine
$$d = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\, \sqrt{\sum_{i=1}^{n} y_i^2}} \tag{3-60}$$
3.3.2 THE RELATIONSHIP BETWEEN (DIS)SIMILARITY ALGORITHM AND DATABASE INDEXING
It is important to decide how the database is going to be accessed for speedy retrieval of
the image(s) of interest. The (dis)similarity algorithm used must fulfil certain properties
that can be used to index the image database for efficient retrieval. The metric axioms are
the most commonly known properties that a (dis)similarity must fulfil. Thus most databases
are modelled in metric space (Bustos, Kreft & Skopal, 2011). This has prompted the
development of Metric Access Methods that work efficiently with metric modelled
databases. Non-metric (dis)similarity algorithms have given domain experts the freedom to
find suitable (dis)similarity algorithms in their domain without having to worry about the
metric axioms. This has created another challenge, that of finding non-metric access methods
for efficient retrieval (Skopal & Bustos, 2010).
3.3.2.1 METRIC ACCESS METHODS (MAM)
Definition 3 Metric Access Method (MAM)
A set of algorithms and data structure(s) providing efficient (fast) similarity search under
the metric space model (Skopal, 2010).
The triangle inequality property is the fundamental principle that MAMs use to index the
objects of a database into different classes (Skopal & Bustos, 2010). This property is used
to create bounds (a lower bound and an upper bound) on a distance that is not known. Using
the lower and upper bounds, a query can be processed much faster. There are two ways of
making a metric (dis)similarity query: the range query and the k-nearest neighbours query.
The mathematical formulation is given as follows:
Let X be a set of objects (the database) and (X, d) a metric space, and let q be the query
object that is to be searched for in the database X. A range query (q, r) is defined as the
objects $x \in X$ that are within (dis)similarity r of q, that is $d(q, x) \le r$. A k-nearest
neighbour query reports the k objects closest to q. To use the triangle inequality property
to establish the lower and upper bounds of $d(q, x)$, we use an object $p \in X$ called a pivot,
for which the (dis)similarities $d(p, x)$ and $d(p, q)$ are known. Using the known values we
construct the following two triangle inequalities:
$$d(p, x) \le d(p, q) + d(q, x)$$
$$d(p, q) \le d(p, x) + d(x, q) \tag{3-61}$$
Thus we can deduce that the lower bound of $d(q, x)$ is
$$d(q, x) = d(x, q) \ge |d(p, x) - d(p, q)| = |d(p, q) - d(p, x)| \tag{3-62}$$
The upper bound of $d(q, x)$ is
$$d(q, x) \le d(q, p) + d(p, x) \tag{3-63}$$
so $d(q, x)$ is bounded as:
$$|d(p, q) - d(p, x)| \le d(q, x) \le d(q, p) + d(p, x) \tag{3-64}$$
When a range query is being processed, any object whose lower bound $|d(p, q) - d(p, x)|$
exceeds r can be discarded without computing $d(q, x)$, so the image retrieval becomes
efficient.
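The pivot based filtering can be sketched with a single pivot and a precomputed list of pivot distances. This is a simplified illustration rather than any particular published MAM: the Euclidean metric (math.dist, available from Python 3.8), the sample points and the function names are all assumptions.

```python
from math import dist  # Euclidean metric between two points (Python 3.8+)


def build_pivot_table(database, pivot, d):
    """Precompute d(p, x) for every object x in the database."""
    return [d(pivot, x) for x in database]


def range_query(database, pivot_dists, pivot, q, r, d):
    """Return the objects within r of q and the number of d(q, x) evaluations."""
    d_pq = d(pivot, q)
    results, computed = [], 0
    for x, d_px in zip(database, pivot_dists):
        # Lower bound |d(p, q) - d(p, x)| <= d(q, x): if it exceeds r, skip x.
        if abs(d_pq - d_px) > r:
            continue
        computed += 1
        if d(q, x) <= r:
            results.append(x)
    return results, computed


database = [(0, 0), (1, 1), (5, 5), (9, 9), (10, 10), (20, 0)]
pivot = (0, 0)
table = build_pivot_table(database, pivot, dist)
hits, evaluations = range_query(database, table, pivot, (9, 10), 2.0, dist)
brute_force = [x for x in database if dist((9, 10), x) <= 2.0]
```

The filtered search returns the same objects as the exhaustive scan while evaluating the distance to the query for only two of the six objects. Using several pivots tightens the lower bound further, at the cost of a larger table.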
MAMs can be classified as non-hierarchical and hierarchical. The non-hierarchical methods
use the above inequality directly in the search, while the hierarchical methods use it
indirectly. Some examples of MAMs are given in Table 3-5.
TABLE 3-5 Examples of Metric Access Methods
NON-HIERARCHICAL MAM | HIERARCHICAL MAM
Approximation and Elimination Search Algorithm (AESA) | Metric Tree (M-Tree)
Linear AESA | Geometric Near-Neighbour Access Tree (GNAT)
Pivot Table | D-Tree
 | vp-Tree
3.3.2.2 NON-METRIC ACCESS METHODS
Non-metric (dis)similarity algorithms face the big challenge of indexing a database without
structured properties that govern them. In fact most, if not all, of the non-metric
(dis)similarity algorithms are not full-non-metric. There has been a concerted effort to
transform them into metrics, or to make sure they fulfil the triangle inequality, so that
MAMs can be used to improve the retrieval rate (Clarkson, 2005). Properties that are
alternatives to the metric properties are also being used to index non-metric modelled
databases (Skopal & Bustos, 2010).
3.4 IMAGE (DIS)SIMILARITY MEASUREMENT AND DATABASE ACCESS ALGORITHMS SUMMARY
The choice of a (dis)similarity method for (dis)similarity search in a certain multimedia
domain can no longer depend on the effectiveness of retrieval alone; it must also depend on
the efficiency (speed) of retrieval. Domain experts used to worry only about effectiveness
because the databases were small. Nowadays, with the large volume of multimedia data in
virtually every field, there is a need to think about the efficiency of retrieval. The
(dis)similarity methods contribute immensely to the indexing of the database for efficient
retrieval. The database access methods depend on the properties of the (dis)similarity
methods. We have seen the most commonly used (dis)similarity methods and that they are
grouped into metric and non-metric. The choice of metric (dis)similarity methods for
searching for similar objects has many advantages, in that they are well supported by MAMs
and the databases are metric modelled. On the other hand, the non-metric (dis)similarity
methods lack concrete support owing to the scarcity of non-metric modelled databases and
non-metric access methods.
3.5 EVALUATION ALGORITHM OF INFORMATION RETRIEVAL SYSTEMS
Evaluation is a crucial and tedious task in information retrieval systems. There are many
retrieval models, algorithms and systems in the literature, so in order to identify the best
among them, choose one to use and improve it, there is a need to evaluate them. One way to
evaluate is to measure the effectiveness of the systems. The difficulty in measuring
effectiveness is that it is associated with the relevance of the retrieved items. This makes
relevance the foundation on which information retrieval evaluation stands, so it is
important to understand relevance. In order to support laboratory experimentation in the
early studies, relevance was considered to be topical relevance, a subject relationship
between item and query. According to (Rasmussen, 2002), relevance is seen as a
relationship between any one of a document, surrogate, item or information and a
problem, information need, request or query. Relevance from the human perspective is
subjective (depends upon a specific user's judgement), situational (relates to the user's
current needs), cognitive (depends on human perception) and dynamic (changes over time).
Given the problems associated with relevance, it is very difficult to implement user-oriented
evaluation of the system, and it requires many resources. This problem of relevance has
been researched in textual and non-textual environments (Choi & Rasmussen, 2002;
Rasmussen, 2002). As a result, information retrieval evaluation experiments attempt to
evaluate the system only (Mandl, 2008). An objective expert is then used to judge the
relevance of a document/item to one information need. There are many algorithms for
evaluating retrieval systems, and they can be classified into those used to evaluate ranked
retrieval results and those used to evaluate unranked retrieval results (Manning, Raghavan &
Schutze, 2008). They can also be regrouped into visual (graphical) techniques and scalar
(non-visual) evaluation methods (Hoshino, Coughtrey, Sivaraja, Volnyansky, Auer &
Trichtchenko, 2009). An overview of the classification of the techniques is shown in
FIGURE 3-16.
Evaluation Techniques for IR Systems
    Techniques for Evaluation of unranked Results
        Non-Graphical Representation Techniques
    Techniques for Evaluation of ranked Results
        Graphical Representation Techniques
        Non-Graphical Representation Techniques
FIGURE 3-16: Hierarchy of classification of evaluation techniques for IR systems
In this brief review of the evaluation techniques for information retrieval systems, the
following techniques are reviewed using the classification in FIGURE 3-16: Precision,
Recall, F-measure, Precision-Recall curve, Mean Average Precision, Receiver Operating
Characteristics (ROC) curve and Area Under ROC Curve (AUC). The merits and demerits
of these techniques are discussed, followed by criteria for choosing the appropriate
algorithm(s) in different situations. Finally, open issues are discussed, followed by a
conclusion.
3.5.1 TECHNIQUES FOR EVALUATION OF UNRANKED RETRIEVAL
RESULTS
The most frequently used and most important basic measures of information retrieval
effectiveness are precision and recall (Mandl, 2008; Manning et al., 2008). Precision is
the fraction of retrieved items that are relevant, or the probability that an item is
relevant given that it is retrieved. Recall is the fraction of relevant items in the
database that are retrieved, or the probability that an item is retrieved given that it is
relevant (Manning et al., 2008). These notions can be made clear by
examining the set diagram in FIGURE 3-17, which shows the most
important components of these measurements and from which the formulas can be derived.
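[Set diagram: set A contains the relevant items, set B the retrieved items, and their
intersection the items that are both retrieved and relevant]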
FIGURE 3-17: Set Diagram showing elements of Precision and Recall
The formulas for precision (P) and recall (R) in set notation are given below:
$$P = \frac{n(A \cap B)}{n(B)} \tag{3-65}$$
$$R = \frac{n(A \cap B)}{n(A)} \tag{3-66}$$
To the user, the scalar value of recall indicates the ability of the system to find the
relevant items for a query in the collection, and the value of precision indicates its
ability to return relevant items among those it outputs. In general the user is interested
in the relevant retrieved items, thus the measures of precision and recall concentrate the
evaluation on the relevant output of the system. Low values indicate poor performance of
the system, and the higher the values the more the user is encouraged to use the system in
anticipation of getting more of the relevant search items. These evaluation measures are
interdependent in that, as the number of retrieved items increases, precision usually
decreases while recall increases.
Other measures are derived from precision and recall. The F-measure is
one well-known example. It is a scalar quantity that
trades off precision against recall and is the weighted harmonic mean of precision and
recall. The formula is given in the equation below (Baeza-Yates & Ribeiro-Neto, 1999;
Zhou & Yao, 2010):
$$F = \frac{1}{\alpha \cdot \frac{1}{P} + (1 - \alpha) \cdot \frac{1}{R}} \tag{3-67}$$
where $\alpha \in [0, 1]$. The default balanced F-measure weights precision and recall equally,
which means setting $\alpha = \frac{1}{2}$. The weights can be varied as required.
It is important to note that precision, recall and the F-measure are set-oriented measures and
thus cannot adequately be used for systems that return ranked results (Mandl, 2008).
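The set-based definitions in equations 3-65 to 3-67 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the item identifiers are assumptions introduced here, not part of the system described in this thesis.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of item identifiers."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # n(A ∩ B)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (equation 3-67)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical item identifiers, chosen only for illustration.
relevant = {"I2", "I12", "I8", "I67", "I9"}
retrieved = {"I2", "I33", "I12", "I8"}
p, r = precision_recall(relevant, retrieved)
f = f_measure(p, r)
```

Here three of the four retrieved items are relevant and three of the five relevant items are retrieved, so precision is 3/4 and recall is 3/5. Varying `alpha` shifts the balance of the F-measure towards precision or recall.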
3.5.2 TECHNIQUES FOR EVALUATION OF RANKED RETRIEVAL RESULTS
This section describes techniques for the evaluation of ranked information retrieval results
that use precision and/or recall measures. Among these techniques are the Precision-Recall
curve (PR-curve), R-precision, Mean Average Precision (MAP) and Precision at k, to mention a
few.
Most current systems present ranked results, so to be able to use the precision and recall
measures they need to be paired at each rank position. Considering the first k retrieved
items, the precision and recall values can be calculated as long as the total number of
relevant items in the database is known. The following example illustrates the construction
of the precision-recall curve.
Table 3-6: Showing the calculation of precision-recall coordinates
Calculating Precision-Recall Points
Query Item = I56
Known number of relevant items in database = 5

Rp    ItemID    Relevance    Recall        Precision
1     I2        Yes          1/5 = 0,2     1/1 = 1,0
2     I33       No           1/5 = 0,2     1/2 = 0,5
3     I12       Yes          2/5 = 0,4     2/3 = 0,67
4     I8        Yes          3/5 = 0,6     3/4 = 0,75
5     I67       Yes          4/5 = 0,8     4/5 = 0,8
6     I99       No           4/5 = 0,8     4/6 = 0,67
7     I5        No           4/5 = 0,8     4/7 = 0,57
8     I1        No           4/5 = 0,8     4/8 = 0,5
9     I23       No           4/5 = 0,8     4/9 = 0,44
10    I3        No           4/5 = 0,8     4/10 = 0,4
11    I9        Yes          5/5 = 1,0     5/11 = 0,45
In Table 3-6, Rp is the ranked position of a retrieved item and ItemID is the item
identification. It can be observed that when the item at position Rp+1 is not relevant,
recall remains the same and precision decreases: at Rp+1 = 2, recall
remained 0,2 as it was at Rp = 1, while precision decreased from 1,0 to 0,5. When the item
at Rp+1 is relevant, recall increases and precision increases or remains the same. The
P-R graph is then plotted from the precision-recall values in Table 3-6. The graph can be
seen in FIGURE 3-18, with points marked using stars, and has a distinct saw-tooth shape. In
order to smooth the graph the interpolated precision is used. The interpolated
precision $\hat{P}$ at a certain recall level $r$ is defined as the maximum precision found
for any recall level $r' \ge r$, as in equation 3-68.
$$\hat{P}(r) = \max_{r' \ge r} p(r') \tag{3-68}$$
Interpolating a precision value for each standard recall level from Table 3-6 gives
Table 3-7, the 11-point interpolated average precision.
Table 3-7: 11-Point Interpolated Average Precision

r'                        0,2   0,2   0,4   0,6   0,8   0,8   0,8   0,8   0,8   0,8   1,0
r                         0,0   0,1   0,2   0,3   0,4   0,5   0,6   0,7   0,8   0,9   1,0
Interpolated precision    1,0   1,0   1,0   0,67  0,67  0,75  0,75  0,80  0,8   0,45  0,45
So the graph marked with stars is transformed to the graph marked with “Xs” in FIGURE
3-18.
FIGURE 3-18: Graphs for values in Table 3-6 and Table 3-7
For more variations of Precision-Recall curves consult Baeza-Yates & Ribeiro-Neto
(1999) and Manning et al. (2008).
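The construction of the precision-recall points and of the interpolated precision in equation 3-68 can be sketched as follows. This is a minimal sketch, assuming relevance judgements coded as 1 (relevant) or 0 (not relevant); the function names are illustrative and not taken from the thesis.

```python
def pr_points(relevance, n_relevant):
    """Return the (recall, precision) pair after each rank position."""
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        points.append((hits / n_relevant, hits / rank))
    return points


def interpolated_precision(points, levels=11):
    """Maximum precision at any recall >= each standard recall level."""
    result = []
    for step in range(levels):
        r = step / (levels - 1)
        # Small tolerance guards against floating-point rounding of recall values.
        candidates = [p for rec, p in points if rec >= r - 1e-9]
        result.append(max(candidates) if candidates else 0.0)
    return result


# Relevance judgements of the ranking in Table 3-6 (1 = relevant).
ranking = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]
points = pr_points(ranking, n_relevant=5)
interp = interpolated_precision(points)
```

Applied literally, equation 3-68 gives 0.8 at every recall level from 0.3 to 0.8 for this ranking, because the precision of 0.8 reached at recall 0.8 exceeds the values recorded before it. Readers comparing this output with Table 3-7 should note that the tabulated entries at levels 0,3 to 0,7 are lower.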
Among the non-graphical evaluation techniques related to precision and/or recall is
MAP, which has gained popularity among the Text Retrieval Conference (TREC) members
(Manning et al., 2008). MAP is one of the various ways of combining precision and recall
into a single scalar measure. It is defined as the mean of the average precision
values for a set of queries. Average precision is calculated by averaging the precision at
every position in the ranking at which a relevant item is retrieved. Relevant items not
retrieved by the cut-off depth are assigned a precision of zero. The scalar value obtained is
approximately equal to the area under the precision-recall curve. MAP expresses the
quality of the system in one number. The formula that is used to calculate the MAP is
given in equation 3-69 below; the sum runs over the ranked positions k of one query, and
the resulting values are averaged over the set of queries.
$$MAP = \frac{1}{n(Re)} \sum_{k \ge 1} Re_k \cdot \frac{\sum_{i=1}^{k} Re_i}{k} \tag{3-69}$$
where $n(Re)$ is the number of relevant items, and $Re_k$ and $Re_i$ take the value zero or
one, indicating not relevant or relevant at positions $k$ and $i$ respectively.
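Equation 3-69 can be sketched for a single query, with MAP obtained by averaging over queries. This is an illustrative sketch: the function names and the second query are assumptions introduced here for demonstration.

```python
def average_precision(relevance, n_relevant):
    """Equation 3-69 for one query: mean of precision at each relevant rank."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        hits += rel
        if rel:
            total += hits / k  # precision at rank k, counted only at relevant ranks
    return total / n_relevant if n_relevant else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    scores = [average_precision(rel, n) for rel, n in queries]
    return sum(scores) / len(scores) if scores else 0.0


table_3_6 = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # ranking of Table 3-6
ap = average_precision(table_3_6, n_relevant=5)
# The second query is hypothetical, added only to show the averaging step.
map_score = mean_average_precision([(table_3_6, 5), ([0, 1, 1], 2)])
```

For the ranking of Table 3-6 the average precision is (1/1 + 2/3 + 3/4 + 4/5 + 5/11) / 5. Dividing by the number of relevant items, rather than by the number of relevant items retrieved, is what assigns a precision of zero to relevant items missing from the ranking.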
Other measures that can be used include Precision at k and R-precision. Precision at
k, shortened to P@k, is the precision calculated at a cut-off point k. This measure does not
measure recall. It is criticized because the number of relevant items for a query has a lot
of influence on precision at k but is ignored. In order to alleviate this problem the
R-precision measure was introduced. In this measure the number of relevant items is known
and it becomes the cut-off point. The formula is given in equation 3-70 below:
$$R\text{-}Precision = \frac{1}{n(Re)} \sum_{k=1}^{n(Re)} Re_k \tag{3-70}$$
The R-precision measure is also called the break-even point, because at this cut-off
precision and recall are equal.
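Both cut-off measures can be sketched directly from their definitions. This is a minimal sketch with illustrative function names, again using the relevance judgements of Table 3-6.

```python
def precision_at_k(relevance, k):
    """Fraction of the first k ranked items that are relevant."""
    return sum(relevance[:k]) / k


def r_precision(relevance, n_relevant):
    """Precision at the cut-off equal to the number of relevant items (equation 3-70)."""
    return precision_at_k(relevance, n_relevant)


ranking = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # Table 3-6, 1 = relevant
p_at_3 = precision_at_k(ranking, 3)
r_prec = r_precision(ranking, 5)
```

With five relevant items in the database, the cut-off for R-precision is position 5, where four relevant items have been retrieved. Precision and recall are both 4/5 at that position.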
The Receiver Operating Characteristics curve is also used in the performance evaluation of
information retrieval systems. In order to illustrate how ROC works it is important to
understand the confusion matrix. A confusion matrix shows the differences between the true
and predicted classes (Bradley, 1997). The confusion matrix is shown in Table 3-8 below.
Table 3-8: Confusion Matrix

                      Actual Positive    Actual Negative    Total Predicted
Predicted Positive    TP                 FP                 TP+FP=TPP
Predicted Negative    FN                 TN                 FN+TN=TPN
Total Actual          TP+FN=TAP          FP+TN=TAN          N
where TP is true positive (items correctly labelled as similar to the query item), FP is
false positive (items incorrectly labelled as similar to the query item), FN is false
negative (items incorrectly labelled as not similar to the query item), TN is true negative
(items correctly labelled as not similar to the query item), TPP is total predicted
positive, TPN total predicted negative, TAP total actual positive, TAN total actual negative
and N = TAP+TAN = TPP+TPN.
From the confusion matrix, more meaningful measures can be derived to illustrate
performance criteria, as shown in equations 3-71 and 3-72 below (Davis & Goadrich, 2006;
Landgrebe, Paclik & Duin, 2006):
$$\text{TPR} = \text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN} = \frac{TP}{TAP} \tag{3-71}$$
$$\text{FPR} = 1 - \text{Specificity} = \frac{FP}{FP + TN} = \frac{FP}{TAN} \tag{3-72}$$
TPR (True Positive Rate) measures the fraction of all relevant items in the database that
have been correctly labelled similar to the query. FPR (False Positive Rate) measures the
fraction of all irrelevant items in the database that have been incorrectly labelled similar to
the query.
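The two rates follow directly from the four cells of the confusion matrix. The sketch below is illustrative: the function name and the counts are hypothetical and are not results from this study.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, FPR) from confusion-matrix counts (equations 3-71 and 3-72)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0  # TP / TAP
    fpr = fp / (fp + tn) if fp + tn else 0.0  # FP / TAN
    return tpr, fpr


# Hypothetical counts for a database of 20 items, 5 of which are relevant.
tpr, fpr = rates(tp=4, fp=1, fn=1, tn=14)
```

In this example four of the five relevant items are retrieved, and one of the fifteen irrelevant items is retrieved by mistake.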
These measures of performance are valid only for one particular operating point, an
operating point normally being chosen to minimize the probability of error. The ROC
curve is a plot of TPR versus FPR across different thresholds (Brodersen, Ong, Stephan &
Buhmann, 2010). The TPR is plotted on the y-axis and the FPR on the x-axis. Thus it offers
a threshold-independent way of evaluating information retrieval performance. A
ROC curve moves from the bottom left to the top right of the graph. The performance
of a model at a given threshold is represented as a point on the ROC curve. A good system
produces results that generate a graph that climbs steeply on the left side, as can be seen
in FIGURE 3-19 (right hand side graph). The point (0, 0) indicates that everything is
assigned to the negative class, (1, 1) that everything is assigned to the positive class,
and (0, 1) is the ideal situation. The diagonal
line indicates random guessing. Any point below the diagonal line predicts the opposite of
the true class, indicating a lower TPR and/or higher FPR (Drummond & Holte, 2000; Ferri,
Hernandez-Orallo & Salido, 2003).
FIGURE 3-19: Graphs illustrating the appearance of P-R and ROC curves
The ROC curve also gives rise to another measure of the performance of a system. This
measure is the ROC Area Under Curve (AUC), a simple scalar metric that summarises how an
algorithm performs over the whole space. The area can be calculated by summing the
trapezoidal areas formed between consecutive points of the ROC curve (Drummond & Holte,
2000). The AUC value lies in the range [0, 1]. One indicates ideal performance of a system,
0.5 random-guess performance and zero a system that never retrieves anything similar to the
query (Walter, 2002).
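The construction of a ROC curve and the trapezoidal AUC can be sketched as follows. This is a minimal sketch under stated assumptions: the scores and labels are hypothetical, all scores are distinct (tied scores would need to be grouped into a single step), and both classes are present.

```python
def roc_points(scores, labels):
    """ROC points obtained by lowering the threshold past each item in turn."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))  # (FPR, TPR)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical similarity scores and relevance labels (1 = relevant).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

The curve starts at (0, 0), where nothing is retrieved, and ends at (1, 1), where everything is retrieved. For this example the area is 8/9, which equals the fraction of relevant and irrelevant item pairs in which the relevant item receives the higher score.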
3.5.3 RELATIONSHIP BETWEEN ROC AND P-R RELATED MEASURES
The ROC and P-R curves are visual performance measures, as seen in FIGURE 3-19. In
Davis & Goadrich (2006) it is shown that a curve that dominates in ROC space also
dominates in P-R space and vice versa. This is illustrated in FIGURE 3-19: the
comparison of the two systems in P-R space and ROC space shows that the system represented
by the dashed line performs better in both spaces. From these graphs it can also be
appreciated that the areas under the curves in both spaces are approximately equal. In P-R
space the area under the curve is approximated by MAP and in ROC space it is the ROC-AUC.
The bigger the area, the better the system performs.
3.5.4 CONCLUSION
Performance evaluation is crucial at many stages of information retrieval system
development. At the end of the development process it is important to show that the final
retrieval system achieves an acceptable level of performance and that it represents a
significant improvement over existing retrieval systems. To evaluate a retrieval system,
there is a need to estimate the future performance of the system. The information retrieval
performance evaluation measures highlight different aspects of a model’s
classification performance, so selecting the most appropriate performance measure is
clearly application dependent (Landgrebe et al., 2006). The scalar measures are attractive
because they give a definitive answer as to which retrieval system is better, which gives
authors the authority to claim the superiority of their algorithm. A scalar measure gives
an overall value for the performance of the system and no other information. A visual
performance measure preserves all performance-related information about a retrieval
system and is capable of showing whether one system dominates
the other totally or partially.
The traditional binary evaluation methods have played a dominant role in the history of
information retrieval system evaluation. These methods include recall, precision, MAP,
precision at k and R-precision (Zhou & Yao, 2010). Precision-recall analysis has remained
the evaluation measure of choice in applications such as
database image retrieval. The Precision-Recall Curve (PRC), which plots precision against
recall across all thresholds, represents a more natural way of looking at classification
performance when searching for relevant items (information retrieval) in situations where
the available data is heavily imbalanced in favour of the negative class (Jarvelin &
Kekalainen, 2000). End-users relate to precision-recall curves as they indicate how many
true positives are likely to be found in a typical search. Evaluation at a single operating
point is suitable in a well-defined environment where class priors and misclassification
costs are known (Landgrebe et al., 2006; Rasmussen, 2002).
The ROC curve is helpful in assessing the performance of a system independently of any given
threshold. The ROC curve, which plots TPR against FPR, allows authors to quickly see whether
one method dominates another, and to use the convex hull to identify potentially optimal
methods without committing to a specific performance measure. The scalar measure related to
the ROC curve, the ROC Area Under Curve (ROC-AUC), is also used to measure a predictive
system’s performance.
Many other methods are suggested in the literature, and they all fall within these two
categories: scalar and visual measures. The few described above seem to be the most widely
used methods for evaluating the performance of information retrieval systems.
3.6
CHAPTER SUMMARY
In this chapter the algorithms for image segmentation, representation and retrieval were
reviewed. The review of image segmentation algorithms revealed that region based
segmentation techniques exhibit the qualities required of generic image segmentation
algorithms. These region based algorithms perform differently in different application
areas because of the statistics selected to model the regions to be segmented. For example,
attempting to model regions using global statistics when segmenting an image of a
heterogeneous object would not produce desirable results. Region based algorithms can
successfully segment images with or without smooth edges. These algorithms are less
sensitive to image noise and to the location of the initial contour. All these advantages
of region based image segmentation algorithms would benefit the recommender system (Image
Content in Shopping Recommender System for Mobile Users) if a region based segmentation
technique is incorporated into the system properly.
The review of image representation algorithms indicated that region based and global methods
have more advantages in representing images in a generic domain. Region based
algorithms are more robust because they utilize all the shape information available. They
also have the advantage that they can be applied to general applications and they provide
more accurate retrieval. The representation method to be used in the
recommender system (Image Content for Shopping Items Recommender System for
Mobile Users) will be chosen from the region based and global method classes.
The review of the (dis)similarity algorithms shows that the choice of technique
depends on how it affects the effectiveness and efficiency of the retrieval system. The
decision on the (dis)similarity algorithm will therefore be made after testing its effect
on the effectiveness and efficiency of the system.
In conclusion, the three kinds of algorithm, namely image segmentation, representation and
(dis)similarity algorithms, must be compatible with each other for an effective and efficient
recommender system (Image Content for Shopping Item Recommender System for Mobile
Users). The next chapter will reveal whether the algorithms will be selected from existing
methods or new ones will be created for the recommender system.
CHAPTER 4
4
SHAPE IMAGE CONTENT FOR MOBILE RECOMMENDER SYSTEM
The architecture of the Image Content in Shopping Items Recommender System for Mobile Users
proposed in chapter 2 and shown in figure 2-3 requires item and user
profiles in order to personalize the shopping recommendations. This section
provides detailed information about the modelling of item and user profile representations,
the recommendation process, user interaction modelling and the computation of the
recommendations list of a shopping recommender system for mobile users.
The goal of our item representation is to model features that are common to large classes of
items. This makes the system suitable as an e-Commerce recommender that can
be used for sales of many different item types. We achieve this goal by capturing essential
item information such as unique identifier, name, class, image (logo), price, payment
method, shop and location. Specific features that are unique to each item are also captured
and stored in the item database. Items in the database are identified by their logos (L). As
a result, the recommendation method compactly represents item information as a feature
vector of m values
$$i = (i_1, i_2, i_3, \ldots, i_m) \tag{4-1}$$
where each component $i_j$ may be numeric, nominal or a set of numbers.
A typical example of an item feature vector is:
$$i = (L_{id}, GPSs, L_{GPSs}) \quad \text{where} \quad L_{GPSs} = (L_{id}, \text{price-range}, \text{size-range}, \text{promotion}, \ldots) \tag{4-2}$$
where $L_{id}$ is the identifier of the logo, $GPSs$ is where the item can be found,
$L_{GPSs}$ is the set of features of the item at different locations, price-range is the
range of the item prices in its various sizes, size-range is the range of available sizes
of the item, and so forth.
The user profile is also modelled as a feature vector of n values
$$u = (u_1, u_2, u_3, \ldots, u_n) \tag{4-3}$$
where each component $u_j$ may be numeric, nominal or a set of numbers.
A typical example of a user feature vector is:
$$u = (GPS, I_{AFSs}) \quad \text{where} \quad I_{AFSs} = (\text{price-range}, \text{size-range}, \ldots) \tag{4-4}$$
where $GPS$ is where the user is located at the time of querying the system and $I_{AFSs}$
is the set of average features of the items previously bought by the user, that is
price-range, size-range and so forth.
When the mobile client sends a logo together with GPS coordinates to the system, the system
takes the following steps:
1 Searches for the logo in the database
2 Finds the logo similar to the query logo
3 Looks up the GPS coordinates of the locations where the item can be found
4 Calculates the distances between the mobile client and the retail locations
5 Ranks the locations according to the distances calculated in step 4
6 Calculates the similarity between $I_{AFSs}$ and $L_{GPSs}$ (acceptable distances)
7 Produces the final ranking, taking into consideration steps 5 and 6
The recommendation is sent to the mobile client with the GPS coordinates of the chosen
location, promotions and special offers.
In this research the distance in step 4 is calculated using the following formula (Adair
& Turnbull, 1974):
$$d_{uL} = 2 \arcsin\left( \sqrt{ \sin^2\left(\frac{lat_1 - lat_2}{2}\right) + \cos(lat_1) \cos(lat_2) \sin^2\left(\frac{lon_1 - lon_2}{2}\right) } \right) \tag{4-5}$$
where $(lat_1, lon_1)$ and $(lat_2, lon_2)$ are the GPS coordinates of the mobile client and
the retail location respectively, and lat and lon stand for latitude and
longitude. North latitudes and west longitudes are taken as positive and south
latitudes and east longitudes are taken as negative.
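The distance formula in equation 4-5 can be sketched in Python. This is an illustrative sketch under stated assumptions: coordinates are given in decimal degrees, the result is scaled by a mean Earth radius of 6371 km (the equation itself yields an angle in radians and the text does not specify a radius), and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not specified in the text


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    h = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon1 - lon2) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


# Hypothetical coordinates: one degree of longitude apart on the equator.
d = haversine(0.0, 0.0, 0.0, 1.0)
```

Because only differences of coordinates enter the formula, the result does not depend on which longitude direction is taken as positive, provided the same convention is used for both points.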
In step 6 the similarity is calculated using the Cosine similarity formula:
$$sim(I_{AFSs}, L_{GPSs}) = \frac{I_{AFSs} \cdot L_{GPSs}}{\|I_{AFSs}\|_2 \, \|L_{GPSs}\|_2} \tag{4-6}$$
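Equation 4-6 can be sketched for feature vectors that have already been encoded as numbers. This is a minimal sketch: the function name and the numeric encodings of the user and item features are hypothetical, and returning zero for an all-zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical numeric encodings of (price, size) features.
user_avg = [120.0, 42.0]    # average features of items bought previously
item_feat = [100.0, 40.0]   # features of the item at one location
sim = cosine_similarity(user_avg, item_feat)
```

Cosine similarity depends only on the direction of the vectors, so features on very different scales may need to be normalised before comparison.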
In step 7 the similarity values for ranking are calculated using the following formula:
R(sim)  sim( I AFSs , LGPSs ) *  uL
(4-7)
where  uL is normalized d uL and then transformed to range [0, 1] and  uL is calculated as:
 uL 
d uL 
1 
1
2 
d uL  a 
(4-8)
In step 8 the biggest R(sim) is the one that is recommended to the user.
The goal of the Image Content in Shopping Recommender System for Mobile Users is to
efficiently find a set of items that match the user’s desired item and to give the location
of the nearest vendor. For this goal to be achieved, the image retrieval component of the
recommender system should be very effective. The moment the system receives the image
from the user it must be able to identify the correct type of image and match it with the
user profile for retrieval of the user’s desired item. In chapter 2, figure 2-2, the user is
expected to query the system using images that are in the database or images captured by a
mobile device. In this chapter the components of the retrieval system are examined in
order to build an effective retrieval system for the recommender system.
We are going to develop a retrieval system that is capable of matching images in
the database against query images that are either taken from the database or captured by a
camera-enabled device, in which case they do not belong to the database but are compatible
with its images. The system will be implemented in the Matlab programming language, which
was chosen for its capabilities in image processing. The following framework is going to be
used in developing the system:
FIGURE 4-1: The framework of the retrieval system
In developing the retrieval system there are stages that are very crucial in contributing to
the success of the system. The following block diagram of the retrieval process clearly
shows the stages that we follow in implementing the system.
FIGURE 4-2: The Image Retrieval Process
We will explain what will be implemented in each stage in the block diagram figure 4-2.
4.1
IMAGE PRE-PROCESSING
Image pre-processing is the term for operations on images at the lowest level of
abstraction. The objective of pre-processing is to improve the image data by suppressing
undesirable distortions or enhancing image features relevant for further processing
and analysis tasks (Miljkovic, 2009). Images may have noise, geometric distortions, and
varying resolution and lighting conditions. Pre-processing these images helps to attenuate
noise, correct image orientation and increase contrast. Matlab provides pre-processing
techniques that can be used. Some of the techniques we
are going to experiment with are histogram equalization, image filtering, resizing and
morphological techniques. This will enable us to set up our system for automatic
pre-processing of all the images captured by camera-enabled devices.
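As an illustration of one of the techniques named above, histogram equalization can be sketched in Python. This is a sketch of the standard cumulative-histogram mapping, not the Matlab routine used in this study; the function name and the small test image are assumptions introduced here.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalise a grey image given as a list of rows of integers."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)  # cumulative number of pixels up to each grey level
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A hypothetical low-contrast 2x3 image.
image = [[100, 100, 101], [101, 102, 103]]
equalized = equalize_histogram(image)
```

The mapping spreads the four closely spaced grey levels of the example over the full range from 0 to 255 while preserving their order, which is the contrast increase referred to in the text.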
4.2
SEGMENTATION METHODS
Studying the types of images that constitute shop items, we found that some might not
have definite edges. We decided that segmentation methods that are capable of segmenting
images without edges would be suitable for our study. The methods that fall in this
category are Active Contour without Edges by Chan & Vese and Robust Image Segmentation
using Local Median by Jundong Liu. According to Chan & Vese (2001) and Liu (2006), these
methods have the ability to detect smooth boundaries, scale adaptivity and automatic change
of topology. Active Contour without Edges uses global statistics to model regions while
Robust Image Segmentation using Local Median uses local statistics to model regions
(Lankton & Tannenbaum, 2008; Liu, 2006). These are the characteristics that we are
looking for in the shopping item domain. The experiment that demonstrates the robustness of
Active Contour without Edges can be seen in (Chan & Vese, 2001). The two methods are
candidates for segmenting images whose boundaries are not necessarily defined by gradient.
We will describe these two methods comprehensively because we are going to experiment with
them in this study.
4.2.1
ACTIVE CONTOUR WITHOUT EDGES
The Chan & Vese active contour algorithm comes from the segmentation problem formulated by
Mumford & Shah, which is stated as follows.
Let $\Omega$ be a bounded open subset of $\mathbb{R}^2$, that is the 2-dimensional image
space, and let
$I : \Omega \to \mathbb{R}$
be a given gray image. In (Sezgin & Sankur, 2004), Mumford and Shah formulated the
image segmentation problem as follows: given an image $I$, find a contour $C$ which
divides the image into non-overlapping regions representing different objects. They
proposed the following energy functional:
$$F(u, C) = \int_{\Omega} (u - I)^2 \, dx + \mu \int_{\Omega \setminus C} |\nabla u|^2 \, dx + \nu |C| \tag{4-9}$$
where $|C|$ is the contour length and $\mu, \nu \ge 0$ are constants that balance the terms.
$u$ is an image that approximates the original image $I$ and is smooth within each region
inside or outside the contour $C$. The first term in (4-9), $\int_{\Omega} (u - I)^2 \, dx$,
is the data fitting term. The second term, $\mu \int_{\Omega \setminus C} |\nabla u|^2 \, dx$,
is the smoothing term. The third term, $\nu |C|$, regularizes the contour by penalizing
the arc length. The minimization of the Mumford-Shah functional results in an optimal
contour $C$ that segments the image $I$ into disjoint regions, and a smooth version of the
image $I$, that is the denoised image $u$. Equation (4-9) is not easy to solve owing to the
different dimensions of $u$ and $C$. $F(u, C)$ is not convex, so it may have multiple local
minima. In order to overcome these problems Chan & Vese proposed the Piecewise Constant
Model. The Chan & Vese method is described as follows.
Let $c_1$ and $c_2$ denote the average intensities of the image $u_0(x, y)$ inside and
outside an arbitrary closed curve $C$ respectively, and let $C_0$ denote the boundary
contour of the object region in $u_0$. A fitting term $F$ is defined as:
$$F_1(C) + F_2(C) = \int_{inside(C)} |u_0(x, y) - c_1|^2 \, dx\,dy + \int_{outside(C)} |u_0(x, y) - c_2|^2 \, dx\,dy \tag{4-10}$$
$F$ is the energy function. The minimum of $F$ is achieved only when $C$ is fitted onto
$C_0$, enclosing the object region. Chan and Vese minimized the fitting term together with
additional regularizing terms, such as the length of the curve $C$ and/or the area of the
region inside $C$. The energy functional $F(c_1, c_2, C)$ is defined as:
$$F(c_1, c_2, C) = \mu \cdot Length(C) + \nu \cdot Area(inside(C)) + \lambda_1 \int_{inside(C)} |u_0(x, y) - c_1|^2 \, dx\,dy + \lambda_2 \int_{outside(C)} |u_0(x, y) - c_2|^2 \, dx\,dy \tag{4-11}$$
where $c_1$ and $c_2$ are the averages of $u_0$ inside $C$ and outside $C$ respectively. The
curve $C$ is represented as the zero level set of a function $\phi(x, y)$ that is positive
inside $C$ and negative outside. Using the Heaviside function $H$ and the one-dimensional
Dirac measure $\delta_0$, defined respectively as
$$H(z) = \begin{cases} 1, & \text{if } z \ge 0 \\ 0, & \text{if } z < 0 \end{cases} \tag{4-12}$$
$$\delta_0(z) = \frac{d}{dz} H(z) \tag{4-13}$$
the terms in the energy function $F$ are expressed in the following way:
$$Length\{\phi = 0\} = \int_{\Omega} |\nabla H(\phi(x, y))| \, dx\,dy = \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy \tag{4-14}$$
$$Area\{\phi \ge 0\} = \int_{\Omega} H(\phi(x, y)) \, dx\,dy \tag{4-15}$$
and
$$\int_{\phi > 0} |u_0(x, y) - c_1|^2 \, dx\,dy = \int_{\Omega} |u_0(x, y) - c_1|^2 \, H(\phi(x, y)) \, dx\,dy \tag{4-16}$$
$$\int_{\phi < 0} |u_0(x, y) - c_2|^2 \, dx\,dy = \int_{\Omega} |u_0(x, y) - c_2|^2 \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-17}$$
Then the energy function can be written as:
$$F(c_1, c_2, \phi) = \mu \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy + \nu \int_{\Omega} H(\phi(x, y)) \, dx\,dy + \lambda_1 \int_{\Omega} |u_0(x, y) - c_1|^2 \, H(\phi(x, y)) \, dx\,dy + \lambda_2 \int_{\Omega} |u_0(x, y) - c_2|^2 \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-18}$$
where $\mu \ge 0$, $\nu \ge 0$ and $\lambda_1, \lambda_2 > 0$ are fixed parameters. This is
the function that Chan & Vese minimize. The calculation of $c_1$ and $c_2$ is carried out as:
$$c_1(\phi) = \frac{\int_{\Omega} u_0(x, y) \, H(\phi(x, y)) \, dx\,dy}{\int_{\Omega} H(\phi(x, y)) \, dx\,dy} \tag{4-19}$$
$$c_2(\phi) = \frac{\int_{\Omega} u_0(x, y) \, (1 - H(\phi(x, y))) \, dx\,dy}{\int_{\Omega} (1 - H(\phi(x, y))) \, dx\,dy} \tag{4-20}$$
These are global means (average intensity values), calculated over the entire image.
The following assumptions are made:
• Within each object the intensity values conform to a Gaussian distribution
• The global means (average intensity values) of different regions are distinct, and
therefore can be used in discriminating pixels.
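The role of the region means in equations 4-19 and 4-20 can be illustrated with a deliberately simplified sketch. It is not the level-set evolution of Chan & Vese: the length and area terms are dropped (an assumption equivalent to setting their weights to zero), the contour is replaced by a binary inside-mask, and the minimisation alternates between updating the two means and reassigning each pixel to the closer mean. All names and the small test image are illustrative.

```python
def region_means(image, inside):
    """c1 and c2 of equations 4-19 and 4-20 for a binary inside-mask."""
    in_vals = [p for row, m in zip(image, inside) for p, k in zip(row, m) if k]
    out_vals = [p for row, m in zip(image, inside) for p, k in zip(row, m) if not k]
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0
    return c1, c2


def fitting_energy(image, inside, lam1=1.0, lam2=1.0):
    """Data terms of equation 4-18; the length and area terms are omitted."""
    c1, c2 = region_means(image, inside)
    energy = 0.0
    for row, m in zip(image, inside):
        for p, k in zip(row, m):
            energy += lam1 * (p - c1) ** 2 if k else lam2 * (p - c2) ** 2
    return energy


def segment(image, inside, iterations=10):
    """Alternate between updating c1, c2 and reassigning pixels to the closer mean."""
    for _ in range(iterations):
        c1, c2 = region_means(image, inside)
        updated = [[(p - c1) ** 2 <= (p - c2) ** 2 for p in row] for row in image]
        if updated == inside:
            break
        inside = updated
    return inside


# Hypothetical 4x4 image: a bright 2x2 object on a dark background.
image = [[10, 10, 10, 10],
         [10, 200, 210, 10],
         [10, 190, 200, 10],
         [10, 10, 10, 10]]
# Initial guess: a 3x3 block that covers the object and some background.
initial = [[r >= 1 and c >= 1 for c in range(4)] for r in range(4)]
mask = segment(image, initial)
```

Starting from the oversized initial region, the mask shrinks onto the four bright pixels and the fitting energy decreases. The example works because the two region means are distinct, which is the second assumption listed above.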
4.2.2 ROBUST IMAGE SEGMENTATION USING LOCAL MEDIAN
This method is not very different from the one above. Instead of the global mean,
the local median is used. Two functions $f_1$ and $f_2$, both
defined on the image domain, are introduced to represent the median values of the local
pixels inside and outside the moving curve. Local here means that only neighbouring pixels
are considered. The “neighbourhood” is implemented by introducing a rectangular
window $W$ of size
$$W = (2k + 1) \times (2k + 1) \tag{4-21}$$
where $k$ is a constant integer. Thus
$$f_1 = median(u_0 * inside(C) * W) \tag{4-22}$$
$$f_2 = median(u_0 * outside(C) * W) \tag{4-23}$$
The functions $f_1$ and $f_2$ are defined on the entire image domain: $f_1(x, y)$ and
$f_2(x, y)$ are calculated for each point $(x, y)$ and take the median intensity value of
the neighbouring pixels that are inside and outside the moving curve respectively.
This method minimizes the following energy:
$$F(f_1, f_2, C) = \mu \cdot Length(C) + \nu \cdot Area(inside(C)) + \lambda_1 \int_{inside(C)} |u_0 - f_1| \, dx\,dy + \lambda_2 \int_{outside(C)} |u_0 - f_2| \, dx\,dy \tag{4-24}$$
Mapped to the level set framework, the new functional that Liu attempts to minimize is
$$F(f_1, f_2, \phi) = \mu \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy + \nu \int_{\Omega} H(\phi(x, y)) \, dx\,dy + \lambda_1 \int_{\Omega} |u_0 - f_1| \, H(\phi(x, y)) \, dx\,dy + \lambda_2 \int_{\Omega} |u_0 - f_2| \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-25}$$
Accordingly, $f_1$ and $f_2$ are calculated as follows:
$$f_1 = median(u_0 * H(\phi) * W) \tag{4-26}$$
$$f_2 = median(u_0 * (1 - H(\phi)) * W) \tag{4-27}$$
We are going to experiment with these two methods to investigate which one is suitable for
shopping items.
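The local median functions of equations 4-26 and 4-27 can be sketched for a binary inside-mask. This is an illustrative sketch only: the function name and test image are assumptions, the window is clipped at the image border, and a value of `None` is returned where the window contains no pixel of the region in question, a case the text does not address.

```python
from statistics import median


def local_medians(image, inside, k=1):
    """f1 and f2 (equations 4-26 and 4-27) over a (2k+1) x (2k+1) window."""
    rows, cols = len(image), len(image[0])
    f1 = [[None] * cols for _ in range(rows)]
    f2 = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            in_vals, out_vals = [], []
            # Collect the window pixels, clipped to the image border.
            for j in range(max(0, y - k), min(rows, y + k + 1)):
                for i in range(max(0, x - k), min(cols, x + k + 1)):
                    (in_vals if inside[j][i] else out_vals).append(image[j][i])
            f1[y][x] = median(in_vals) if in_vals else None
            f2[y][x] = median(out_vals) if out_vals else None
    return f1, f2


# Hypothetical 3x3 image with one outlier (90) inside the curve.
image = [[10, 12, 50],
         [11, 90, 52],
         [10, 13, 51]]
inside = [[True, True, False] for _ in range(3)]
f1, f2 = local_medians(image, inside, k=1)
```

At the centre pixel the inside median is 11.5, whereas the inside mean would be pulled up to about 24 by the single outlier. This insensitivity to outliers is the robustness that motivates replacing the mean with the median.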
4.3
IMAGE REPRESENTATION METHOD
We briefly describe the theory that we are going to use in formulating our
representation technique. We are going to use a non-parametric method to represent image
shapes. Representing an object shape using a non-parametric class of density estimators is
attractive because no assumptions are made about the distribution of the data set of the
object shape. This type of representation determines the density from the object shape data
itself. Examples of non-parametric methods are the histogram, kernel density
estimation and k-nearest neighbour estimation. One of the most widely used density
estimators is the histogram. In (Tran & Ono, 2000) the Density Histogram of Feature Points
(DHFP) was used to represent an object shape. In general, to create a histogram one needs
the starting point $x_0$ and the bin width $w$. The major problem of histograms is their
dependence on the width of the bins (bin size) (Shimazaki & Shinomoto, 2007). The frequency
distribution is smoothed out (over-smoothing) when the bin size is increased and discretized
(under-smoothing) when it is decreased. In most cases the bin size has been selected
subjectively by individual researchers (Shimazaki & Shinomoto, 2007). The choice of
bandwidth is often critical in the implementation of non-parametric methods. Other
problems with histograms are that they are not smooth and that they depend on the end points
of the bins. The histogram also suffers from the curse of dimensionality: the number of bins
grows exponentially with the dimension, so a finer resolution implies many bins, most of
which will be empty.
We try to solve these problems by using another non-parametric method, the Kernel Density
Estimator (KDE), to represent an object shape. We explore the theory of KDE and then
experiment with it to represent object shapes in an image database. We look for a good way
of calculating optimal bandwidths, and for this project the Gaussian kernel function is
used in representing the images.
4.4 THE 1-DIMENSIONAL KERNEL DENSITY ESTIMATION
Kernel density estimation is a way of estimating a probability density function from
observed data. Kernel density approaches exist for discrete and continuous data types. We
define the kernel density estimator as follows:
Definition 1 Kernel Density Estimator
Let $(x_1, x_2, \ldots, x_n)$ be an independent and identically distributed (i.i.d.) sample
drawn from some distribution with an unknown univariate density $f$. The kernel density
estimator for $f$ is
\[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \tag{4-28} \]
with kernel function $K(u)$ and bandwidth $h$.
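Equation 4-28 translates almost directly into code. The following minimal sketch assumes a
Gaussian kernel; the function names `gaussian_kernel` and `kde` are illustrative only.

```python
import math

def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x: float, sample: list, h: float, kernel=gaussian_kernel) -> float:
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at the point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)
```

A small `h` gives a spiky estimate that follows individual observations, while a large `h`
gives a smooth estimate that can hide structure, which is why the bandwidth selection
discussed later matters.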
4.4.1 KERNEL FUNCTIONS
A kernel function $K(u): \mathbb{R} \to \mathbb{R}$ is any function which satisfies
\[ \int_{-\infty}^{\infty} K(u) \, du = 1. \tag{4-29} \]
A non-negative kernel is a probability density function and satisfies
\[ K(u) \ge 0 \quad \forall u. \tag{4-30} \]
A symmetric kernel satisfies
\[ K(u) = K(-u) \quad \forall u. \tag{4-31} \]
The moments of a kernel are given in Equation 4-32,
\[ m_j(K) = \int_{-\infty}^{\infty} u^j K(u) \, du. \tag{4-32} \]
This enables us to define the order of a kernel as the order of its first non-zero moment.
We can conclude that all symmetric non-negative kernels are of second order, since
$m_1(K) = 0$ and the first non-zero moment is
\[ m_2(K) = \int_{-\infty}^{\infty} u^2 K(u) \, du = \sigma_K^2 > 0. \tag{4-33} \]
The order of a symmetric kernel is always even. A kernel is a higher-order kernel if $j > 2$.
Higher-order kernels are not probability densities because they have negative parts. We are
interested in second-order kernels (probability densities). Examples of second-order kernels
are given in Equations 4-34 to 4-36:
Gaussian
\[ K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2} \tag{4-34} \]
Uniform
\[ K(u) = \frac{1}{2} \, \mathbf{1}(|u| \le 1) \tag{4-35} \]
Epanechnikov
\[ K(u) = \frac{3}{4} (1 - u^2) \, \mathbf{1}(|u| \le 1). \tag{4-36} \]
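The three kernels and their defining properties can be checked numerically. The sketch
below uses a simple midpoint rule (an arbitrary choice made for this illustration) to
approximate the total mass, the first two moments and the roughness
$R(K) = \int K(u)^2 du$ that is used later for bandwidth selection.

```python
import math

def gaussian(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def uniform(u: float) -> float:
    return 0.5 if abs(u) <= 1.0 else 0.0

def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def integrate(g, lo: float, hi: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (i + 0.5) * width) for i in range(steps))

def kernel_summary(kernel, lo: float, hi: float) -> dict:
    """Total mass, first two moments and roughness R(K) of a kernel."""
    return {
        "mass": integrate(kernel, lo, hi),
        "m1": integrate(lambda u: u * kernel(u), lo, hi),
        "m2": integrate(lambda u: u * u * kernel(u), lo, hi),
        "roughness": integrate(lambda u: kernel(u) ** 2, lo, hi),
    }
```

For example, `kernel_summary(epanechnikov, -1.0, 1.0)` should report a mass close to 1, a
first moment close to 0 and a second moment close to 1/5, in line with a second-order
kernel.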
4.4.2 KERNEL DENSITY ESTIMATOR (PROPERTIES)
• The density estimator must integrate to one:
\[ \int_{-\infty}^{\infty} \hat{f}(x) \, dx = \int_{-\infty}^{\infty} \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) \, dx = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} \frac{1}{h} K\!\left(\frac{x - X_i}{h}\right) dx \tag{4-37} \]
Applying the change of variable $u = (X_i - x)/h$, which has Jacobian $h$, and using the
symmetry $K(-u) = K(u)$, we obtain
\[ \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} K(u) \, du. \]
Applying the property that the kernel function integrates to one, we obtain
\[ \frac{1}{n} \sum_{i=1}^{n} 1 = \frac{1}{n} \cdot n = 1. \tag{4-38} \]
We can conclude that $\hat{f}(x)$ is a valid density when $K(u) \ge 0$.
• The mean of the estimated density is
\[ \int_{-\infty}^{\infty} x \hat{f}(x) \, dx = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} x \, \frac{1}{h} K\!\left(\frac{X_i - x}{h}\right) dx \tag{4-39} \]
Applying the change of variable $u = (X_i - x)/h$, so that $x = X_i - uh$, we have
\[ = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} (X_i - uh) K(u) \, du \]
\[ = \frac{1}{n} \sum_{i=1}^{n} X_i \int_{-\infty}^{\infty} K(u) \, du - \frac{1}{n} \sum_{i=1}^{n} h \int_{-\infty}^{\infty} u K(u) \, du. \]
Applying $\int K(u) \, du = 1$ and $\int u K(u) \, du = 0$, we have
\[ = \frac{1}{n} \sum_{i=1}^{n} X_i. \]
We can conclude that the mean of the estimated density $\hat{f}(x)$ is equal to the sample
mean of the $X_i$.
• The variance of the estimated density $\hat{f}(x)$ can be calculated from its second
moment:
\[ \int_{-\infty}^{\infty} x^2 \hat{f}(x) \, dx = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} x^2 \, \frac{1}{h} K\!\left(\frac{X_i - x}{h}\right) dx \tag{4-40} \]
Applying the change of variable $u = (X_i - x)/h$ we have
\[ = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} (X_i - uh)^2 K(u) \, du. \]
Expanding $(X_i - uh)^2$ gives $X_i^2 - 2 X_i u h + u^2 h^2$, thus
\[ = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} \left( X_i^2 K(u) - 2 X_i u h K(u) + u^2 h^2 K(u) \right) du \]
\[ = \frac{1}{n} \sum_{i=1}^{n} X_i^2 \int_{-\infty}^{\infty} K(u) \, du - \frac{2}{n} \sum_{i=1}^{n} X_i h \int_{-\infty}^{\infty} u K(u) \, du + \frac{1}{n} \sum_{i=1}^{n} h^2 \int_{-\infty}^{\infty} u^2 K(u) \, du. \]
Applying $\int K(u) \, du = 1$ and $\int u K(u) \, du = 0$ we have
\[ = \frac{1}{n} \left( \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} h^2 \int_{-\infty}^{\infty} u^2 K(u) \, du \right) = \frac{1}{n} \sum_{i=1}^{n} X_i^2 + h^2 m_2(K). \]
Subtracting the square of the mean obtained above, the variance of the density $\hat{f}(x)$
is $\hat{\sigma}^2 + h^2 m_2(K)$, where $\hat{\sigma}^2$ is the sample variance.
4.4.3 BIAS OF THE ESTIMATOR
The bias of $\hat{f}(x)$ is defined as
\[ \mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x). \tag{4-41} \]
We derive it as follows:
\[ E\left[\frac{1}{h} K\!\left(\frac{X_i - x}{h}\right)\right] = \int_{-\infty}^{\infty} \frac{1}{h} K\!\left(\frac{z - x}{h}\right) f(z) \, dz. \]
Using the change of variable $u = (z - x)/h$ we have
\[ = \int_{-\infty}^{\infty} K(u) f(x + hu) \, du. \]
By linearity of the estimator we have
\[ E\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} E\left[\frac{1}{h} K\!\left(\frac{X_i - x}{h}\right)\right] = \int_{-\infty}^{\infty} K(u) f(x + hu) \, du. \]
We use a Taylor expansion of $f(x + hu)$ in the argument $hu$, which is valid as $h \to 0$.
For a $j$th-order kernel we take the expansion out to the $j$th term:
\[ f(x + hu) = f(x) + f^{(1)}(x) h u + \frac{1}{2} f^{(2)}(x) h^2 u^2 + \cdots + \frac{1}{j!} f^{(j)}(x) h^j u^j + o(h^j). \]
Then we have
\[ \int K(u) f(x + hu) \, du = \int K(u) f(x) \, du + \int K(u) f^{(1)}(x) h u \, du + \int K(u) \frac{1}{2} f^{(2)}(x) h^2 u^2 \, du + \cdots + \int K(u) \frac{1}{j!} f^{(j)}(x) h^j u^j \, du + \int K(u) \, o(h^j) \, du. \]
Using $\int K(u) \, du = 1$ and $\int u^j K(u) \, du = m_j(K)$ we obtain
\[ = f(x) + f^{(1)}(x) h m_1(K) + \frac{1}{2} f^{(2)}(x) h^2 m_2(K) + \cdots + \frac{1}{j!} f^{(j)}(x) h^j m_j(K) + o(h^j). \]
Assuming that the kernel is of order $j$, then $m_i(K) = 0$ for all $0 < i < j$, thus we have
\[ = f(x) + \frac{1}{j!} f^{(j)}(x) h^j m_j(K) + o(h^j) \]
and therefore
\[ \mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x) = \frac{1}{j!} f^{(j)}(x) h^j m_j(K) + o(h^j). \]
For a second-order kernel, which is the case we are interested in, we have
\[ \mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x) = \frac{1}{2} f^{(2)}(x) h^2 m_2(K) + o(h^2). \tag{4-42} \]
4.4.4 VARIANCE OF THE KERNEL DENSITY ESTIMATOR
Assuming $h \to 0$ and $n \to \infty$, the variance of the kernel density estimator is
\[ \mathrm{Var}(\hat{f}(x)) = \frac{1}{nh} f(x) \int_{-\infty}^{\infty} K(u)^2 \, du + o\!\left(\frac{1}{nh}\right). \tag{4-43} \]
We derive the variance of the kernel density estimator as follows. The kernel estimator is
a linear estimator and the terms $K\!\left(\frac{X_i - x}{h}\right)$ are independent and
identically distributed, so
\[ \mathrm{Var}(\hat{f}(x)) = \frac{1}{nh^2} \mathrm{Var}\!\left( K\!\left(\frac{X_i - x}{h}\right) \right) = \frac{1}{nh^2} E\!\left[ K\!\left(\frac{X_i - x}{h}\right)^{2} \right] - \frac{1}{n} \left( \frac{1}{h} E\!\left[ K\!\left(\frac{X_i - x}{h}\right) \right] \right)^{2}. \]
As observed in the bias derivation,
$\frac{1}{h} E\!\left[ K\!\left(\frac{X_i - x}{h}\right) \right] = f(x) + o(1)$, therefore the
second term is $O\!\left(\frac{1}{n}\right)$.
We expand the first term by writing the expectation as an integral, making a change of
variables and then applying a first-order Taylor expansion:
\[ \frac{1}{h} E\!\left[ K\!\left(\frac{X_i - x}{h}\right)^{2} \right] = \frac{1}{h} \int_{-\infty}^{\infty} K\!\left(\frac{z - x}{h}\right)^{2} f(z) \, dz = \int_{-\infty}^{\infty} K(u)^2 f(x + hu) \, du = \int_{-\infty}^{\infty} K(u)^2 \left( f(x) + O(h) \right) du = f(x) R(K) + O(h), \]
where $R(K) = \int_{-\infty}^{\infty} K(u)^2 \, du$ is the roughness of the kernel. We can
conclude that the variance of the kernel density estimator is
\[ \mathrm{Var}(\hat{f}(x)) = \frac{f(x) R(K)}{nh} + O\!\left(\frac{1}{n}\right). \tag{4-44} \]
The remainder $O\!\left(\frac{1}{n}\right)$ is of smaller order than the
$O\!\left(\frac{1}{nh}\right)$ leading term, since $\frac{1}{h} \to \infty$.
4.4.5 MEAN-SQUARE ERROR (MSE)
The mean square error is a local measure of the performance of the kernel density estimate
at the point $x$. It is the sum of the squared bias and the variance:
\[ \mathrm{MSE}(\hat{f}(x)) = E(\hat{f}(x) - f(x))^2 \tag{4-45} \]
\[ = \mathrm{Bias}(\hat{f}(x))^2 + \mathrm{Var}(\hat{f}(x)) \]
\[ \approx \left( \frac{1}{j!} f^{(j)}(x) h^j m_j(K) \right)^{2} + \frac{f(x) R(K)}{nh} \]
\[ = \mathrm{AMSE}(\hat{f}(x)). \]
Since this approximation is based on asymptotic expansions, it is called the Asymptotic
Mean Square Error (AMSE).
To obtain a global measure of performance over all values of $x$, we define the Integrated
Squared Error (ISE):
\[ \mathrm{ISE}(h) = \int_{-\infty}^{\infty} (\hat{f}(x) - f(x))^2 \, dx. \tag{4-46} \]
This is written as a function of $h$ to emphasize the dependence on the bandwidth. By taking
the expected value of the ISE we obtain the MISE as follows:
\[ \mathrm{MISE}(h) = E[\mathrm{ISE}(h)] = E\!\left[ \int_{-\infty}^{\infty} (\hat{f}(x) - f(x))^2 \, dx \right] = \int_{-\infty}^{\infty} E[(\hat{f}(x) - f(x))^2] \, dx = \int_{-\infty}^{\infty} \mathrm{MSE}(\hat{f}(x)) \, dx = \mathrm{IMSE}(h) \tag{4-47} \]
\[ \approx \int_{-\infty}^{\infty} \mathrm{AMSE}(\hat{f}(x)) \, dx = \mathrm{AMISE}(h) = \frac{m_j^2(K)}{(j!)^2} R(f^{(j)}) h^{2j} + \frac{R(K)}{nh}, \]
where $R(f^{(j)}) = \int_{-\infty}^{\infty} (f^{(j)}(x))^2 \, dx$ is the roughness of $f^{(j)}$.

4.5 FINDING OPTIMAL BANDWIDTH
There are many methods for bandwidth selection, including those based on the Mean Square
Error (MSE), the Mean Integrated Squared Error (MISE) and the asymptotic MISE, as well as
plug-in techniques and bootstrap methods, to mention a few. We briefly describe some
selected methods for finding the optimal bandwidth.
4.5.1 ASYMPTOTICALLY OPTIMAL BANDWIDTH
The optimal bandwidth minimizes the MISE. The value of $h$ that minimizes the asymptotic
approximation AMISE is called the asymptotically optimal bandwidth. It is found by
differentiating the AMISE with respect to $h$ and setting the derivative to zero:
\[ \frac{d}{dh} \mathrm{AMISE} = \frac{d}{dh} \left[ \frac{m_j^2(K)}{(j!)^2} R(f^{(j)}) h^{2j} + \frac{R(K)}{nh} \right] = 2j h^{2j-1} \frac{m_j^2(K)}{(j!)^2} R(f^{(j)}) - \frac{R(K)}{nh^2} = 0. \tag{4-48} \]
The solution is
\[ h_0 = C_j(K, f) \, n^{-\frac{1}{2j+1}} \tag{4-49} \]
\[ C_j(K, f) = R(f^{(j)})^{-\frac{1}{2j+1}} A_j(K) \tag{4-50} \]
\[ A_j(K) = \left( \frac{(j!)^2 R(K)}{2j \, m_j^2(K)} \right)^{\frac{1}{2j+1}}. \tag{4-51} \]
The optimal bandwidth is proportional to $n^{-\frac{1}{2j+1}}$, that is, it is of order
$O(n^{-\frac{1}{2j+1}})$. For the second-order kernels that we are interested in, the
optimal rate is $O(n^{-1/5})$.
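As a worked instance of the general solution, setting $j = 2$ gives $(j!)^2 = 4$ and
$2j = 4$, so the constant factor cancels and the asymptotically optimal bandwidth for a
second-order kernel reduces to the following (a direct specialisation, not an additional
result):

```latex
\[
h_0 = \left( \frac{R(K)}{m_2^2(K)\, R(f^{(2)})} \right)^{1/5} n^{-1/5},
\qquad
R(f^{(2)}) = \int_{-\infty}^{\infty} \bigl(f^{(2)}(x)\bigr)^2 \, dx .
\]
```

The only unknown on the right-hand side is $R(f^{(2)})$, which depends on the density
being estimated.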
4.5.2 PLUG-IN BANDWIDTH
A plug-in estimate for the bandwidth is a simple formula for $h_{rot}$ that depends on the
sample size $n$ and the sample standard deviation $s$. The optimal bandwidth formula is
given as
\[ h_0 = R(f^{(j)})^{-\frac{1}{2j+1}} \left( \frac{(j!)^2 R(K)}{2j \, m_j^2(K)} \right)^{\frac{1}{2j+1}} n^{-\frac{1}{2j+1}}. \tag{4-52} \]
In the above formula all items have known values except for
\[ R(f^{(j)}) = \int_{-\infty}^{\infty} (f^{(j)}(x))^2 \, dx. \tag{4-53} \]
A useful starting point is to assume that the unknown density $f(x)$ belongs to the family
of normal distributions with mean $\mu$ and variance $\sigma^2$. For a second-order kernel
($j = 2$) we then have
\[ \int_{-\infty}^{\infty} f^{(2)}(x)^2 \, dx = \frac{3}{8 \sqrt{\pi} \, \sigma^5} \approx \frac{0.2116}{\sigma^5}. \tag{4-54} \]
Then
\[ R(f^{(2)})^{-\frac{1}{5}} = \left( \frac{0.2116}{\sigma^5} \right)^{-\frac{1}{5}} = 1.3643 \, \sigma. \tag{4-55} \]
Therefore
\[ h_{rot} = 1.3643 \left( \frac{R(K)}{m_2^2(K)} \right)^{\frac{1}{5}} n^{-\frac{1}{5}} \sigma. \tag{4-56} \]
The above equation still has one unknown, $\sigma$, which is replaced by the sample
standard deviation $s$, so we have
\[ h_{rot} = 1.3643 \left( \frac{R(K)}{m_2^2(K)} \right)^{\frac{1}{5}} n^{-\frac{1}{5}} s. \tag{4-57} \]
That is how the plug-in approach works. There are other variations but the concept remains
the same. Table 4-1 shows the values required for the plug-in bandwidth selection $h_{rot}$.
TABLE 4-1 Plug-in values for hrot
Kernel          R(K)         m2(K)
Uniform         1/2          1/3
Epanechnikov    3/5          1/5
Gaussian        1/(2√π)      1
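The plug-in rule can be sketched in Python as follows. The constants are the table values,
with the Gaussian roughness taken as $1/(2\sqrt{\pi})$; the function name
`rule_of_thumb_bandwidth` and the dictionary layout are illustrative assumptions.

```python
import math
import statistics

# (R(K), m2(K)) per kernel; Gaussian roughness is 1 / (2 * sqrt(pi)).
KERNEL_CONSTANTS = {
    "uniform": (1.0 / 2.0, 1.0 / 3.0),
    "epanechnikov": (3.0 / 5.0, 1.0 / 5.0),
    "gaussian": (1.0 / (2.0 * math.sqrt(math.pi)), 1.0),
}

def rule_of_thumb_bandwidth(sample, kernel="gaussian"):
    """h_rot = 1.3643 * (R(K) / m2(K)^2)^(1/5) * n^(-1/5) * s."""
    roughness, m2 = KERNEL_CONSTANTS[kernel]
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, needs n >= 2
    return 1.3643 * (roughness / m2 ** 2) ** 0.2 * n ** -0.2 * s
```

For the Gaussian kernel the leading constant works out to roughly 1.06, which is the
familiar form $h \approx 1.06 \, s \, n^{-1/5}$ of this rule.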
4.5.3 ADAPTIVE KERNEL DENSITY ESTIMATE (AKDE)
The global bandwidth approach described above may result in under-smoothing in areas with
only sparse observations while at the same time over-smoothing in other areas. For this
reason there is a need to vary the bandwidth along the sample data so that more smoothing
is done where data is sparse and less where it is dense. Kernel density estimation methods
that rely on such a varying bandwidth are commonly referred to as adaptive kernel density
estimation. We also experiment with it in order to find out its effect on shopping item
images.
Most adaptive kernel density estimators can be grouped into two categories: balloon
estimators and sample point estimators. A balloon estimator selects a different smoothing
parameter for each estimation point $x$. A sample point estimator uses a distinct bandwidth
for each data point $x_i$.
4.5.3.1 UNIVARIATE BALLOON ESTIMATOR
The univariate balloon estimator is given as
\[ \hat{f}_B(x) = \frac{1}{n h(x)} \sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h(x)}\right). \tag{4-58} \]
The estimate $\hat{f}_B(x)$ is an average of identically scaled kernels centred at each
data point. The asymptotically best balloon estimator optimizes the AMSE pointwise; it
achieves a minimum where (Terrell & Scott, 1992)
\[ h_o(x) = \left( \frac{(j!)^2 f(x) R(K)}{2j \, m_j^2(K) \, (f^{(j)}(x))^2} \right)^{\frac{1}{2j+1}} n^{-\frac{1}{2j+1}}, \tag{4-59} \]
and
\[ \mathrm{AMSE}_o(x) = (2j+1) \left( \frac{f(x)^j \, f^{(j)}(x) \, m_j(K) \, R(K)^j}{(2j)^j \, j!} \right)^{\frac{2}{2j+1}} n^{-\frac{2j}{2j+1}}. \tag{4-60} \]
This is the general result for non-negative kernels. The most commonly used univariate
balloon estimator is the Loftsgaarden-Quesenberry $k$th nearest-neighbour kernel of the
form
\[ \hat{f}(x) = \frac{1}{n h_k(x)} \sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h_k(x)}\right), \tag{4-61} \]
where $h_k(x)$ is the distance from $x$ to its $k$th nearest data point. The number of
nearest neighbours $k$ controls the level of smoothing, with larger values of $k$
corresponding to more smoothing. The use of nearest neighbours results in more smoothing
in regions of low density and less smoothing in regions of high density.
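A minimal sketch of the $k$th nearest-neighbour balloon estimator is shown below. It
assumes a Gaussian kernel and takes $h_k(x)$ to be the distance from $x$ to its $k$th
nearest sample point; the function names are hypothetical.

```python
import math

def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def knn_balloon_kde(x: float, sample: list, k: int, kernel=gaussian_kernel) -> float:
    """Balloon estimate at x with bandwidth h_k(x) = k-th nearest-neighbour distance."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must lie between 1 and the sample size")
    distances = sorted(abs(x - xi) for xi in sample)
    h = distances[k - 1]
    if h == 0.0:
        raise ValueError("k-th neighbour distance is zero; use a larger k")
    n = len(sample)
    return sum(kernel((xi - x) / h) for xi in sample) / (n * h)
```

Because the bandwidth is recomputed at every evaluation point, the estimate widens
automatically where observations are far apart and narrows where they are crowded.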
4.5.3.2 SAMPLE SMOOTHING ESTIMATORS
The sample point estimator is given by
\[ \hat{f}_{SP}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(x_i)} K\!\left(\frac{x - x_i}{h(x_i)}\right). \tag{4-62} \]
The estimate $\hat{f}_{SP}(x)$ is an average of differently scaled kernels centred at each
observation. In this case $h(x_i)$ should vary inversely with the underlying density.
Consider taking
\[ h(x_i) = h_e \, f(x_i)^{-\frac{1}{2}}. \tag{4-63} \]
Thus we get
\[ \hat{f}_e(x) = \frac{1}{n h_e} \sum_{i=1}^{n} f(x_i)^{\frac{1}{2}} K\!\left(\frac{(x - x_i) \, f(x_i)^{\frac{1}{2}}}{h_e}\right). \tag{4-64} \]
This choice is advantageous because it gives an improved convergence rate of the MSE
(Simonoff, 1996).
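The sample point estimator with the square-root rule can be sketched as below. Since the
true density $f$ is unknown in practice, the sketch takes a `pilot_density` callable as a
stand-in for it; that argument, like the function names, is an assumption made for this
illustration.

```python
import math

def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def sample_point_kde(x, sample, h_e, pilot_density, kernel=gaussian_kernel):
    """Average of kernels with per-observation bandwidth h_e * f(x_i)^(-1/2)."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        root = math.sqrt(pilot_density(xi))  # f(x_i)^(1/2), must be positive
        total += root * kernel((x - xi) * root / h_e)
    return total / (n * h_e)
```

Observations in low-density regions get a small `root` and therefore a wide kernel, which
is the behaviour the text asks for.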
4.6 THE N-DIMENSIONAL KERNEL DENSITY ESTIMATION
The concept of N-dimensional KDE is largely an extension of 1-dimensional KDE. Suppose we
consider a $q$-dimensional random vector $X = (X_1, X_2, \ldots, X_q)^T$ where
$X_1, X_2, \ldots, X_q$ are one-dimensional random variables. For a random sample of size
$n$, we have $n$ observations for each of the $q$ random variables $X_1, X_2, \ldots, X_q$.
Our goal is to estimate the probability density of $X = (X_1, X_2, \ldots, X_q)^T$, which is
the joint probability density function $f$ of the random variables $X_1, X_2, \ldots, X_q$:
\[ f(x) = f(x_1, x_2, \ldots, x_q). \]
From the 1-dimensional case we adapt the KDE to the $q$-dimensional case as
\[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) = \frac{1}{n h^q} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \tag{4-65} \]
With a product kernel and a separate bandwidth for each dimension, the multivariate KDE
becomes
\[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \prod_{v=1}^{q} \frac{1}{h_v} K\!\left(\frac{x_v - X_{iv}}{h_v}\right). \tag{4-66} \]
As an example, the 2-dimensional KDE where $X = (X_1, X_2)^T$ is given as
\[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_1} \frac{1}{h_2} K\!\left(\frac{x_1 - X_{i1}}{h_1}, \frac{x_2 - X_{i2}}{h_2}\right) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_1} \frac{1}{h_2} K\!\left(\frac{x_1 - X_{i1}}{h_1}\right) K\!\left(\frac{x_2 - X_{i2}}{h_2}\right). \tag{4-67} \]
Each of the $n$ observations is of the form $(X_{i1}, X_{i2})$, where the first component
gives the value that the random variable $X_1$ takes on the $i$th observation and the
second component does the same for $X_2$.
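The product-kernel form generalises to any number of dimensions with one bandwidth per
coordinate. The following sketch assumes a Gaussian univariate kernel; `product_kde` is an
illustrative name.

```python
import math

def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def product_kde(x, data, bandwidths, kernel=gaussian_kernel):
    """Multivariate KDE: mean over observations of prod_v K((x_v - X_iv) / h_v) / h_v."""
    n = len(data)
    total = 0.0
    for row in data:
        term = 1.0
        for xv, xiv, hv in zip(x, row, bandwidths):
            term *= kernel((xv - xiv) / hv) / hv
        total += term
    return total / n
```

For a 2-dimensional query this could be called as
`product_kde((1.0, 2.0), observations, (h1, h2))`, where `observations` is a list of
`(X_i1, X_i2)` pairs.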
4.6.1 KERNEL DENSITY ESTIMATOR (PROPERTIES)
• The multivariate kernel satisfies
\[ \int K(u) \, (du) = \int \cdots \int K(u) \, du_1 \cdots du_q = 1, \tag{4-68} \]
where $K(u)$ takes the product form
\[ K(u) = k(u_1) \cdot k(u_2) \cdots k(u_q). \tag{4-69} \]
• Since $K(u)$ is a product kernel, the marginal densities of $\hat{f}(x)$ equal univariate
kernel density estimators with kernel function $k$ and bandwidths $h_v$.
• The variance of the estimator is
\[ \mathrm{Var}(\hat{f}(x)) = \frac{f(x) R(K)}{n |H|} + O\!\left(\frac{1}{n}\right) = \frac{f(x) R(k)^q}{n h_1 h_2 \cdots h_q} + O\!\left(\frac{1}{n}\right), \tag{4-70} \]
where $|H| = h_1 h_2 \cdots h_q$.
• The bias of the estimator is
\[ \mathrm{Bias}(\hat{f}(x)) = \frac{m_j(K)}{j!} \sum_{v=1}^{q} \frac{\partial^j}{\partial x_v^j} f(x) \, h_v^j + o(h_1^j + \cdots + h_q^j). \tag{4-71} \]
4.6.2 ASYMPTOTIC MEAN INTEGRATED SQUARED ERROR
The AMISE was derived for the univariate case, so here we only state it:
\[ \mathrm{AMISE}(\hat{f}(x)) = \frac{m_j^2(K)}{(j!)^2} \int \left( \sum_{v=1}^{q} \frac{\partial^j}{\partial x_v^j} f(x) \, h_v^j \right)^{2} (dx) + \frac{R(K)^q}{n h_1 h_2 \cdots h_q}. \tag{4-72} \]
There is no closed-form solution for the bandwidth vector which minimizes this
expression. The following observations can be made:
• The AMISE depends on the kernel function only through $R(K)$ and $m_j^2(K)$, so it is
clear that for any given $j$ the optimal kernel minimizes $R(K)$, which is the same as in
the univariate case.
• The optimal bandwidths will all be of order $n^{-\frac{1}{2j+q}}$ and the optimal AMISE of
order $n^{-\frac{2j}{2j+q}}$. These rates are slower than in the univariate case, that is,
when $q = 1$. The fact that dimension has an adverse effect on convergence rates is called
the CURSE OF DIMENSIONALITY.
4.7 FINDING OPTIMAL BANDWIDTH
There are many methods for bandwidth selection, including those based on the Mean Square
Error (MSE), the Mean Integrated Squared Error (MISE) and the asymptotic MISE, as well as
plug-in techniques and bootstrap methods, to mention a few. We briefly describe the plug-in
method for finding the optimal bandwidth.
4.7.1 PLUG-IN BANDWIDTH
To derive the rule-of-thumb, suppose that $h_1 = h_2 = \cdots = h_q = h$. Then
\[ \mathrm{AMISE}(\hat{f}(x)) = \frac{m_j^2(K) R(\nabla_j f)}{(j!)^2} h^{2j} + \frac{R(K)^q}{n h^q}, \tag{4-73} \]
where
\[ \nabla_j f(x) = \sum_{v=1}^{q} \frac{\partial^j}{\partial x_v^j} f(x). \]
We find that the optimal bandwidth is
\[ h_0 = \left( \frac{(j!)^2 \, q \, R(K)^q}{2j \, m_j^2(K) \, R(\nabla_j f)} \right)^{\frac{1}{2j+q}} n^{-\frac{1}{2j+q}}. \tag{4-74} \]
For a rule-of-thumb bandwidth, we substitute $f$ by the multivariate normal density $\phi$.
We calculate that
\[ R(\nabla_j \phi) = \frac{q}{\pi^{q/2} \, 2^{q+j}} \left( (2j-1)!! + (q-1)((j-1)!!)^2 \right). \tag{4-75} \]
After the substitution, we obtain
\[ h_0 = C_j(K, q) \, n^{-\frac{1}{2j+q}}, \tag{4-76} \]
where
\[ C_j(K, q) = \left( \frac{\pi^{q/2} \, 2^{q+j-1} \, (j!)^2 \, R(K)^q}{j \, m_j^2(K) \left( (2j-1)!! + (q-1)((j-1)!!)^2 \right)} \right)^{\frac{1}{2j+q}}. \tag{4-77} \]
We assumed that all variables had unit variance. Rescaling the bandwidths by the standard
deviation of each variable, we obtain the rule-of-thumb bandwidth for the $v$th variable:
\[ h_v = \hat{\sigma}_v \, C_j(K, q) \, n^{-\frac{1}{2j+q}}. \tag{4-78} \]
The values of the constant $C_j(K, q)$ are given in Table 4-2:
TABLE 4-2 Value of Constant $C_j(K, q)$
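The constant $C_j(K, q)$ can be evaluated directly from its closed form. The sketch below
is only an illustration: the function names are hypothetical, and $R(K)$ and $m_j(K)$ of
the univariate kernel are passed in as plain numbers.

```python
import math

def double_factorial(n: int) -> int:
    """n!! with the convention that n!! = 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def rule_of_thumb_constant(j: int, q: int, roughness: float, moment: float) -> float:
    """C_j(K, q) for kernel order j, dimension q, R(K) = roughness, m_j(K) = moment."""
    numerator = (math.pi ** (q / 2) * 2 ** (q + j - 1)
                 * math.factorial(j) ** 2 * roughness ** q)
    denominator = j * moment ** 2 * (
        double_factorial(2 * j - 1) + (q - 1) * double_factorial(j - 1) ** 2
    )
    return (numerator / denominator) ** (1.0 / (2 * j + q))

def rule_of_thumb_bandwidths(sigmas, n, j, roughness, moment):
    """h_v = sigma_v * C_j(K, q) * n^(-1 / (2j + q)) for each variable v."""
    q = len(sigmas)
    constant = rule_of_thumb_constant(j, q, roughness, moment)
    return [sigma * constant * n ** (-1.0 / (2 * j + q)) for sigma in sigmas]
```

For a second-order Gaussian kernel ($R(K) = 1/(2\sqrt{\pi})$, $m_2(K) = 1$) this gives a
constant of about 1.06 for $q = 1$, matching the univariate plug-in rule, and exactly 1
for $q = 2$.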
4.8 SHAPE REPRESENTATION USING ADAPTIVE KERNEL DENSITY FEATURE POINTS ESTIMATOR (AKDFPE)
This method describes the feature points within the rectangular boundary in an image grid.
Assume we have a silhouette object shape segmented by some means, such as the Chan & Vese
Active Contour without Edges, and let the feature point set $P(x, y)$ (intensity function)
of the object shape be defined as
\[ P(x, y) = \{ p_i(x, y) \} \text{ such that } i = 1, 2, \ldots, n \text{ where } n \in \mathbb{N}. \tag{4-79} \]
We find the centroid of the object shape. The following formulae are used to calculate
the centroid (Flusser et al., 2009), (Mukundan & Ramakrishnan, 1998):
\[ x_c = \frac{m_{1,0}}{m_{0,0}} \tag{4-80} \]
\[ y_c = \frac{m_{0,1}}{m_{0,0}} \tag{4-81} \]
where $m_{1,0}$, $m_{0,1}$, $m_{0,0}$ are derived from the silhouette moments given by
\[ m_{i,j} = \sum_{x} \sum_{y} x^i y^j P(x, y). \tag{4-82} \]
Thus for a silhouette image $P(x, y)$, the zero-order moment $m_{0,0}$ represents the
geometrical area of the image region, and the first-order moments $m_{1,0}$ and $m_{0,1}$
represent the intensity moments about the y-axis and x-axis of the image respectively. The
centroid $(x_c, y_c)$ gives the geometrical centre of the image region.
Suppose the size of the grid occupied by the object shape is $N \times N$. The dimension of
the vector representing the density of the object shape will be $N - 1$. The centroid
calculated by formulas 4-80 and 4-81 is $(x_c, y_c)$. From the centroid we count the number
of image pixels in the rings around the centroid. The number of image pixels in each ring
is given as the vector $x_i^m = (n_1, n_2, \ldots, n_m)$, where $m$ is the number of rings
around the centroid.
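The centroid and ring counts can be sketched as follows. The text does not spell out the
ring geometry, so this sketch makes two labelled assumptions: the centroid is rounded to
the nearest pixel, and ring $r$ consists of the pixels at Chebyshev distance $r$ from it
(square rings one pixel wide). The binary `mask[y][x]` input and the function names are
also hypothetical.

```python
def centroid(mask):
    """Return (x_c, y_c) = (m10 / m00, m01 / m00) of a binary silhouette mask."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        raise ValueError("mask contains no object pixels")
    return m10 / m00, m01 / m00

def ring_counts(mask):
    """Count object pixels in each one-pixel-wide ring around the centroid.

    Assumption: rings are square (Chebyshev distance) around the rounded centroid.
    """
    xc, yc = centroid(mask)
    cx, cy = round(xc), round(yc)
    counts = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                ring = max(abs(x - cx), abs(y - cy))
                if ring > 0:  # ring 0 is the centroid pixel itself
                    counts[ring] = counts.get(ring, 0) + 1
    return [counts.get(r, 0) for r in range(1, max(counts, default=0) + 1)]
```

A different ring definition (for example circular rings based on Euclidean distance)
would only change the `ring = ...` line.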
We now apply the Adaptive Kernel Density Feature Points Estimator (AKDFPE), using a
second-order KDE.
The AKDE uses the modified Loftsgaarden-Quesenberry $k$th nearest-neighbour kernel of the
form in equation 4-83:
\[ \hat{f}(x) = \frac{1}{n h_{k_c}(x)} \sum_{i=1}^{n} K\!\left(\frac{x_i^n - x}{h_{k_c}(x)}\right). \tag{4-83} \]
The number of nearest neighbours $k_c$ controls the level of smoothing of cluster $c$,
with $i = 1, 2, 3, \ldots, n$. $K(\cdot)$ is the kernel function, $n$ is the number of
rings and $h_{k_c}$ is the bandwidth per cluster. We calculate the optimal bandwidth
$h_{oc_j}$ per cluster. Then we recalculate the vector elements of the image using
equation 4-84:
\[ \hat{f}(x_i) = \frac{1}{h_{oc_j}} K\!\left(\frac{x_i^m - x}{h_{oc_j}}\right) \tag{4-84} \]
where $j = 1, 2, 3, \ldots, m$.
4.8.1 PROPOSED CALCULATION OF THE OPTIMAL BANDWIDTH
The first problem with the Kernel Density Estimator is deciding when to use a global or a
variable bandwidth. The next problem is how to find a suitable $k$th nearest neighbourhood
with which to calculate the optimal bandwidth. The number of nearest neighbours $k$
controls the level of smoothing. When $k$ is equal to the number of sample elements the
global bandwidth is calculated; otherwise the variable bandwidth is calculated. Once $k$
is known it is easy to calculate the optimal bandwidth. The question is: how do we find
$k$ for a given set of sample elements?
To address this problem, we take an image whose centroid has been calculated and denote
\[ D_m = D_c = 1. \tag{4-85} \]
This is the first density feature of the image and is equal to one, since the centroid is
one pixel that belongs to the image. The rest of the density feature points of the image
are given as
\[ D_m = n_1, n_2, n_3, n_4, \ldots, n_m \tag{4-86} \]
where $m = 1, 2, 3, \ldots, n$ indicates the number of rings from the centroid. Within a
given ring the image $I$ occupies a certain percentage of the ring area, $O(I)\%$. These
percentages indicate whether an image sparsely or densely occupies the ring. We calculate
this percentage as given in equation 4-87:
\[ O(I)\% = \frac{n_m}{2^{m+2}} \times 100. \tag{4-87} \]
The system then clusters all the consecutive rings that fulfil predefined conditions. For
example:
\[ 0 \le O(I)\% < 25 \]
\[ 25 \le O(I)\% < 50 \]
\[ 50 \le O(I)\% < 75 \]
\[ 75 \le O(I)\% \le 100 \]
To find the $k$th nearest neighbourhood the system counts the elements in each cluster,
and that count constitutes the $k_c$th nearest neighbourhood of the given cluster. There
are special cases to consider. If all rings fall in the same cluster, or every cluster
consists of one element, the system calculates the global bandwidth. When a cluster with
one element lies between two clusters, it is included in the neighbouring cluster whose
values are closest to it. In effect the method calculates a global bandwidth within each
cluster, which takes care of both sparse and dense observations. Figure 4-3 shows an image
with a calculated centroid $c$ and the rings around the centroid numbered
$1, 2, 3, 4, 5, \ldots, n$.
FIGURE 4-3: Shows the rings around the centroid of an image
Suppose the system clustered the rings as follows:
\[ c_1 = \{1, 2, 3\}, \quad c_2 = \{4, 5\} \text{ and so on.} \]
Thereafter the system calculates the bandwidth for each of the clusters as follows:
\[ h_{oc_1} = 1.3643 \left( \frac{R(k)}{m_2^2(k)} \right)^{\frac{1}{5}} 3^{-\frac{1}{5}} \, s \]
\[ h_{oc_2} = 1.3643 \left( \frac{R(k)}{m_2^2(k)} \right)^{\frac{1}{5}} 2^{-\frac{1}{5}} \, s \]
\[ \vdots \]
\[ h_{oc_n} = 1.3643 \left( \frac{R(k)}{m_2^2(k)} \right)^{\frac{1}{5}} x^{-\frac{1}{5}} \, s \]
where $x > 1$ is the number of elements in cluster $n$ and $s$ is the standard deviation
of the cluster in question.
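The clustering and per-cluster bandwidth steps can be sketched as below. The sketch makes
several assumptions that the text leaves open: the four occupancy bands are treated as
half-open quartiles, the special cases (merging single-element clusters, falling back to a
global bandwidth) are not implemented, and $s$ is computed as the sample standard
deviation of the ring values in the cluster. All function names are hypothetical.

```python
import statistics

def occupancy_band(percent: float) -> int:
    """Map an occupancy percentage to a band index 0..3 (quartile bands assumed)."""
    return min(int(percent // 25), 3)

def cluster_rings(percentages):
    """Group consecutive rings whose occupancy falls in the same band.

    Returns a list of clusters, each a list of ring indices (0-based).
    """
    clusters = []
    for index, percent in enumerate(percentages):
        band = occupancy_band(percent)
        if clusters and clusters[-1][0] == band:
            clusters[-1][1].append(index)
        else:
            clusters.append((band, [index]))
    return [members for _, members in clusters]

def cluster_bandwidth(values, roughness: float, m2: float) -> float:
    """1.3643 * (R(k) / m2(k)^2)^(1/5) * x^(-1/5) * s for a cluster of x >= 2 values."""
    size = len(values)
    s = statistics.stdev(values)  # raises StatisticsError for a single value
    return 1.3643 * (roughness / m2 ** 2) ** 0.2 * size ** -0.2 * s
```

For instance, rings with occupancies of 88%, 80%, 50% and 3% would be grouped as
`[[0, 1], [2], [3]]`; a full implementation would then apply the special-case rules to
the two single-ring clusters.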
4.8.2 AKDFPE ALGORITHM STEPS
1. Read image
2. Digitize the image
3. Find the centroid ( xc , yc )
4. Count image pixels in each circle around the centroid one pixel wide
5. Calculate the percentage of the image pixels in each circle
6. Cluster adjacent circles of the same percentage
7. Standardize the initial vector of the image
8. Find the optimum bandwidth for every cluster
9. Apply the kernel density estimator to every cluster
10. The resultant vector is the image representation
11. End
4.8.3 EXAMPLE
Suppose we have the following object shape features on a grid, as given in Figure 4-4:
1,0
2,0 3,0
4,0
1,1
2,1 3,1
4,1
0,2 1,2
2,2 3,2
4,2
3,3
4,3
2,4 3,4
4,4
FIGURE 4-4: Segmented object shape
The bold numbers are the image pixels. The size of the grid occupied by the object shape is
$5 \times 5$. The different colours indicate the rings of width one pixel around the
centroid $(3, 2)$. The ring counts are
\[ x_i^3 = (7, 8, 1). \]
In standardized form this vector is represented as
\[ x_i^3 = (28, 16, 1) \]
and the percentages are
\[ \%s = (88, 50, 3). \]
This means that the rings belong to three different clusters. In this case we calculate the
global bandwidth.
We now apply the Adaptive Kernel Density Estimator (AKDE), using a second-order AKDE.
The AKDE is given as
\[ \hat{f}_{h(x)}(x) = \frac{1}{3} \sum_{i=1}^{3} K_{h(x)}(x - x_i) = \frac{1}{3 h(x)} \sum_{i=1}^{3} K\!\left(\frac{x - x_i}{h(x)}\right). \tag{4-88} \]
We then calculate the optimal bandwidth $h_{oc}$ for each cluster in an image shape vector.
Then we recalculate the vector elements of the image, using the univariate balloon
estimator with the modified Loftsgaarden-Quesenberry $k$th nearest neighbourhood given in
equation 4-89:
\[ \hat{f}_B(x) = \frac{1}{n h(x)} \sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h(x)}\right). \tag{4-89} \]
The estimate $\hat{f}_B(x)$ is an average of identically scaled kernels centred at each
data point. For the example this gives
\[ \hat{f}(x_i) = \frac{1}{h_{oc_1}} K\!\left(\frac{x_i^3 - x}{h_{oc_1}}\right), \tag{4-90} \]
where $K(\cdot)$ is the kernel function.
This is how the images will be represented.
4.9 SIMILARITY MATCHING
We will experiment with the (dis)similarity methods given in equations 4-91, 4-92 and
4-93. The (dis)similarity methods in equations 4-91 and 4-92 are metric, which makes the
retrieval system efficient if used in a metric modelled database. The method in equation
4-93 is a non-metric (dis)similarity measure. We will compare the effectiveness of these
(dis)similarity methods.
The Euclidean and the city-block dissimilarity measures are given as
\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \| x - y \|_2 \tag{4-91} \]
\[ d(x, y) = \sum_{i=1}^{n} | x_i - y_i | = \| x - y \|_1 \tag{4-92} \]
respectively. The city-block measure takes fewer operations than the Euclidean
dissimilarity. Both of them are metric distances. The cosine similarity is given as
\[ d(p, q) = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \, \sqrt{\sum_{i=1}^{n} q_i^2}}. \tag{4-93} \]
The numerator of equation 4-93 is a dot product. To compare the effectiveness of the
(dis)similarity methods we use the retrieval effectiveness of the system when the
different methods are used. The system ranks the results and we evaluate the rankings
subjectively. Cosine similarity is a non-metric distance because it does not fulfil the
reflexivity property of the metric axioms:
\[ d(x, y) = 0 \text{ if and only if } x = y \quad \text{(Reflexivity)} \]
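The three measures are straightforward to implement. The sketch below uses plain Python
sequences for the feature vectors; the function names are illustrative.

```python
import math

def euclidean(x, y) -> float:
    """Euclidean (L2) dissimilarity between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cityblock(x, y) -> float:
    """City-block (L1) dissimilarity between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(p, q) -> float:
    """Dot product of p and q divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Note that `cosine_similarity((1, 2), (2, 4))` is 1 (up to rounding) even though the two
vectors differ, which illustrates why the cosine measure does not behave like a metric:
it compares direction only and ignores magnitude.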
4.10 EVALUATION
Visual evaluation of the system will be done using the Precision-Recall Curve (PRC). This
will be complemented by scalar evaluation of the system.
The effectiveness of the retrieval system will be measured by precision (the number of
correct images retrieved divided by the total number of images retrieved) and recall (the
number of correct images retrieved divided by the total number of possible correct
images):
\[ \mathrm{precision} = \frac{A}{A + C} \tag{4-94} \]
\[ \mathrm{recall} = \frac{A}{N} \tag{4-95} \]
\[ \mathrm{effectiveness} = \begin{cases} \dfrac{A}{N} & \text{if } T \ge N \\[2mm] \dfrac{A}{T} & \text{if } T < N \end{cases} \tag{4-96} \]
where $A$ is the number of relevant image objects retrieved, $C$ is the number of
non-relevant image objects retrieved, $T$ is the number of relevant images that the user
requires from the database and $N$ is the total number of relevant images in the database.
We will also measure the retrieval rate by the bull's eye score. The bull's eye score, as a
percentage, is the number of correct retrievals divided by the number of relevant items in
the dataset. Every shape in the database is compared to every other shape in the database.
For example, for the MPEG 7 dataset, where we have 70 distinct classes of 20 similar
shapes, the bull's eye percentage is calculated as follows:
\[ B = \frac{D}{P} \times 100 \tag{4-97} \]
where $B$ is the bull's eye score in percentage, $D$ is the total number of correct
retrievals and $P$ is the total number of possible correct retrievals.
We will also compare our method with other representation methods to assess its
robustness.
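The evaluation measures can be computed with a few one-line functions. This sketch
assumes that the effectiveness measure uses $A/N$ when $T \ge N$ and $A/T$ otherwise; the
function names are illustrative.

```python
def precision(a: int, c: int) -> float:
    """Relevant retrieved (A) over all retrieved (A + C)."""
    return a / (a + c)

def recall(a: int, n: int) -> float:
    """Relevant retrieved (A) over all relevant items in the database (N)."""
    return a / n

def effectiveness(a: int, t: int, n: int) -> float:
    """A / N when the user asks for at least N items (T >= N), otherwise A / T."""
    return a / n if t >= n else a / t

def bulls_eye(correct: int, possible: int) -> float:
    """Bull's eye score in percent: correct retrievals (D) over possible ones (P)."""
    return 100.0 * correct / possible
```

As a hypothetical usage example, with 70 classes of 20 shapes and every shape used as a
query, the total number of possible correct retrievals is $1400 \times 20 = 28000$, so
14000 correct retrievals would give `bulls_eye(14000, 28000)`, a score of 50%.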
4.11 DATASETS
MPEG 7 CE shapes and general shopping item images will be used. Using the MPEG 7 dataset
ensures that segmentation techniques do not influence the output of the retrieval system,
which makes the evaluation of the system objective. Our system must also work in the real
world, where different types of noise are introduced and the system has to deal with them;
hence the need to use the general shopping item images as well.
4.11.1 MPEG 7
The MPEG 7 contour shape CE-1 dataset consists of over 3400 images divided into three
parts. The objective of each part is as follows:
Part A: robustness to scaling and rotation (A1, A2)
Part B: performance of the similarity-based retrieval
Part C: robustness to changes caused by non-rigid motion.
Part A is a necessary condition for any shape descriptor. Sets A1 and A2 together consist
of eight hundred and forty (840) shapes that are organized into seventy (70) groups. The
two sets have an equal number of shapes, with six (6) similar shapes in each group, and
are used to test scale and rotation invariance. The MPEG 7 Part B database contains one
thousand four hundred (1400) binary shape images. It consists of seventy (70) distinct
classes of shapes, each class containing twenty (20) similar shapes. Set B is used to test
the overall robustness of the shape representation through similarity-based retrieval.
Set C contains one thousand five hundred (1500) shapes and is used to test robustness to
non-rigid deformations.
The MPEG 7 region shape CE-2 database consists of 3621 binary image shapes, mainly of
trademarks. The database is used to test performance on complex shapes consisting of
multiple disjoint regions. The classified test set contains two thousand eight hundred
(2800) trademark shapes: six hundred and seventy eight (678) object shapes are classified
into ten (10) groups on the basis of perceived region shape similarity. The groups consist
of a variable number of shapes. The remaining two thousand one hundred and twenty two
(2122) trademarks are unclustered. This database is organized in almost the same way as
the CE-1 database, and its shapes likewise test scaling, rotation and the robustness of
the shape representation through similarity-based retrieval, including subjective tests.
4.11.2 GENERAL SHOPPING ITEM IMAGES
General shopping item images will also be used to measure the overall effectiveness of the
retrieval system in this domain. A database of over four hundred (400) shapes will be
created from images from the Internet. The database will be organized into twenty (20)
groups with at least ten (10) similar shapes in each group. Some distinct items collected
from the Internet to make up the general shopping item image dataset are shown in
Figure 4-5.
FIGURE 4-5: Distinct images from the Internet
4.12 QUERY IMAGES
Images in the MPEG 7 databases and the general shopping item shapes database are all
possible query images. In addition to query images mentioned, images captured by camera
enabled devices will be used as query images to retrieve similar images from the general
shopping item shapes database.
4.13 CHAPTER SUMMARY
The following algorithms are going to be experimented with:
• Pre-processing stage – Histogram equalization, image filtering, resizing and
morphological techniques
• Segmentation stage – Active Contour without Edges and Robust Image Segmentation using
Local Median
• Representation stage – Non-parametric method
• Similarity matching stage – Euclidean and Cosine methods
These techniques were chosen due to their merits discussed in this chapter.
CHAPTER 5
5
EXPERIMENTATION, RESULTS AND DISCUSSION
This chapter describes the experiments that were conducted while building the
retrieval system, testing it and experimenting with the Image Content in
Shopping Recommender System for Mobile Users. It also presents and discusses the results
of the experiments. The ultimate purposes of the experiments were to:
• Measure the effectiveness of the retrieval system using the AKDFPE representation
method
• Incorporate the retrieval system into the Image Content in Shopping Recommender
System for Mobile Users
• Simulate the usage of the recommender system
• Measure the satisfaction of the users
5.1 EXPERIMENTS
The initial experiments ascertain the robustness of the retrieval system, which
entails making sure that all the stages perform at their optimum. With general images, in
this case shopping items, all stages of the retrieval system must work
appropriately to achieve a good retrieval rate. This means that the pre-processing and
segmentation stages need to be tested and adjusted to produce acceptable results. Testing
these stages requires test data so that they can be calibrated to suit
the image domain for automation or semi-automation of the system. A database was
created of over four hundred (400) shopping items such as televisions, shoes, beds and so
forth. Samples of some distinct image items are shown in Figure 5-1.
FIGURE 5-1: Samples of shopping items in each category in the dataset
The selection of the (dis)similarity algorithm between the metric (Euclidean) and the
non-metric (cosine) algorithm will be done using the Adaptive Kernel Density Feature Points
Estimator (AKDFPE) algorithm, since it is the method being proposed for the Image
Content for Shopping Items Recommender System for Mobile Users. After the
compatible (dis)similarity algorithm has been chosen, the effectiveness of the AKDFPE
representation algorithm is measured in comparison with other methods. The AKDFPE is a
region based representation method, so in theory it is a domain independent method
(generic algorithm). To test its generic form, it will be tested against contour based and
region based algorithms. To avoid reconstructing all the methods compared with AKDFPE,
standard datasets are used and the results of AKDFPE are compared with the results
obtained by other authors. After ascertaining that the AKDFPE method is effective and
performs better than the methods it is compared with, the retrieval system will be
incorporated into the recommender system.
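The difference between the two candidate measures can be sketched as follows. This is a minimal illustration, assuming that each image is already represented by a numeric feature vector of equal length; the feature values, the item names and the helper `rank` are hypothetical and are not taken from the AKDFPE implementation.

```python
import math


def euclidean_distance(a, b):
    """Metric dissimilarity: smaller values mean more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Non-metric similarity: larger values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, database, measure):
    """Return the database keys ordered from most to least similar."""
    if measure == "euclidean":
        return sorted(database,
                      key=lambda name: euclidean_distance(query, database[name]))
    if measure == "cosine":
        return sorted(database,
                      key=lambda name: cosine_similarity(query, database[name]),
                      reverse=True)
    raise ValueError("measure must be 'euclidean' or 'cosine'")


if __name__ == "__main__":
    query = [1.0, 2.0, 3.0]
    database = {"scaled": [2.0, 4.0, 6.0], "near": [1.0, 2.0, 4.0]}
    print(rank(query, database, "euclidean"))  # ['near', 'scaled']
    print(rank(query, database, "cosine"))     # ['scaled', 'near']
```

In this toy example the vector `scaled` has the same direction as the query but twice its magnitude, so the cosine method ranks it first while the Euclidean method ranks it last. The two measures can therefore order the same database differently, which is why the choice has to be made experimentally.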
The recommender system described in chapter 2 will then be simulated. The results of the
recommender system will be evaluated by a sample of people, after which the results
and the system will be analysed.
5.2 PRE-PROCESSING, SEGMENTATION AND (DIS)SIMILARITY SELECTION
Image pre-processing suppresses undesirable distortions or enhances some image features
in order to improve the quality of the image data for further processing. The selection of
good pre-processing techniques is therefore very significant in image processing.
Segmentation is one of the image processing techniques that depend directly on the
pre-processing stage, and the segmentation technique in turn impacts directly on the
effectiveness of the representation method. For effective retrieval, the (dis)similarity
algorithm must also be compatible with the representation method. The immense
contribution of pre-processing, segmentation and (dis)similarity techniques to the system
cannot be ignored, so experiments were performed to ascertain that these stages contribute
adequately to the system, and the following results were obtained.
5.2.1 RESULTS FOR PRE-PROCESSING AND SEGMENTATION STAGES
Figure 5-2b shows sample results of pre-processing and segmentation of the images in
Figure 5-2a. Subjectively, these were judged to be acceptable pre-processing and
segmentation results. The same settings were then applied to all the images in the
retrieval system.
(a)
(b)
FIGURE 5-2: (b) Sample results of pre-processing and segmentation of images in (a)
5.2.2 RESULTS FOR SELECTION OF (DIS)SIMILARITY METHOD USING
AKDFPE
At least one hundred shopping item images that belong to the database were used as query
images. The retrieval effectiveness of the (dis)similarity methods was evaluated using recall
and precision as well as subjective evaluation. Average precision was calculated only
at one hundred percent recall. Ranking was also evaluated subjectively. Figures 5-3 and 5-4
show the normal retrieval results of the retrieval system as the users experience them. In
these samples the query image is at the top left of each figure. Figure 5-5 shows the results
of the retrieval system as similar segmented images, which makes it possible for the
developers to evaluate the robustness of the representation and similarity methods used.
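The recall and precision measures used here can be sketched as below. This is an illustrative reading of the measures rather than the evaluation code of this research: the function names are hypothetical, the ranked list is assumed to contain no duplicates, and "precision at one hundred percent recall" is interpreted as the precision at the shallowest cut-off that contains every relevant image.

```python
def precision_recall(ranked, relevant, k):
    """Precision and recall of the first k results (k >= 1)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k, hits / len(relevant)


def precision_at_full_recall(ranked, relevant):
    """Precision at the first cut-off where all relevant items are retrieved.

    Returns 0.0 if the ranking never reaches one hundred percent recall.
    """
    found = 0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            found += 1
            if found == len(relevant):
                return len(relevant) / position
    return 0.0


def mean_precision_at_full_recall(runs):
    """Average the measure over (ranked, relevant) pairs, one per query."""
    values = [precision_at_full_recall(r, rel) for r, rel in runs]
    return sum(values) / len(values)


if __name__ == "__main__":
    ranked = ["a", "x", "b", "c", "y"]
    relevant = {"a", "b", "c"}
    print(precision_recall(ranked, relevant, 3))
    print(precision_at_full_recall(ranked, relevant))  # 0.75
```

In the example, the three relevant images are all retrieved by the fourth position, so the precision at full recall is 3/4.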
(AKDFPE)
(AKDFPE)
FIGURE 5-5: Segmented shapes that were considered similar by AKDFPE using cosine
similarity algorithm
5.2.3 RESULTS ANALYSIS OF PRE-PROCESSING, SEGMENTATION AND
(DIS)SIMILARITY TECHNIQUES
The results shown in Figure 5-2 enabled the setting of the pre-processing and segmentation
parameters for automation of these stages for general image shapes. The
sample results in Figure 5-3 show eighty percent precision for the cosine method and seventy
percent precision for the Euclidean method. The Figure 5-4 results show ninety percent for
cosine and eight percent for Euclidean methods. Overall, the results showed an average
precision of 93.05 percent for the cosine similarity method compared with 92.60 percent for
the Euclidean method. Subjectively, the cosine method was also judged superior to the
Euclidean method in ranking the image results. This prompted the selection of the cosine
similarity method for use with the AKDFPE representation method. Figure 5-5 shows the
segmented images that were considered similar by the system (AKDFPE and cosine);
subjectively, the system was judged to be working well, because
to human perception the images are similar to each other but with some distortions,
which the system was able to overcome.
5.3 EFFECTIVENESS OF AKDFPE AND OTHER REPRESENTATION METHODS
With the building of the retrieval system complete, the system needs to be tested for its
effectiveness and robustness against other retrieval systems. The results of the system will
be compared with the results of other systems that were tested on standardized datasets. This
stage will also verify the generic form of our system. Experiments will also be done on
general shopping item images and the results compared with the DHFP retrieval system.
5.3.1 RESULTS FOR COMPARISON OF EFFECTIVENESS BETWEEN
AKDFPE AND OTHER METHODS ON STANDARD DATASETS
The proposed system is now complete, with the pre-processing and segmentation stages
calibrated for the shopping items domain and the cosine similarity algorithm selected as
the optimum method for the AKDFPE representation algorithm. Firstly, the AKDFPE method
was tested for the necessary conditions, namely invariance to rotation, scaling and
translation. After satisfactory results were obtained, the retrieval system was tested for
effectiveness and robustness against other methods. The
representation methods AKDFPE and DHFP were subjected to experiments with the MPEG 7
datasets in order to ascertain their effectiveness and generic form. The benefit of using
standardized datasets such as the MPEG 7 datasets is that methods can be compared
without reconstructing the other authors’ methods, so authors can substantiate claims
about the robustness of their method(s) over others. The results of the experiments are
shown in table 5-1 and figure 5-6.
TABLE 5-1: Comparison of Bull’s Eye Performance on MPEG 7 CE 1 Dataset Part B
Method     BEP %
CSS        81.12 (Bai, Latecki & Tu, 2010)
IDSC       91.61 (Bai et al., 2010)
DHFP       92.18
KDFPE      92.70 (Zuva, Olugbara, Ojo & Ngwira, 2012)
AKDFPE     93.56
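For reference, the Bull's Eye Performance (BEP) reported in table 5-1 can be computed along the following lines. The sketch follows the convention commonly used for the MPEG 7 CE 1 Part B test, in which every shape serves as a query and the correct matches are counted among the most similar shapes in a window of twice the class size (the 40 most similar shapes for classes of 20). The function name and the data layout are assumptions made for the example, and the rankings are assumed to come from some retrieval system that includes the query itself in its own ranking.

```python
def bulls_eye_score(labels, rankings):
    """Bull's eye score in percent.

    labels[i]   -- class label of shape i
    rankings[i] -- indices of all shapes ordered from most to least similar
                   to shape i (the query itself is included)
    """
    # Number of shapes in each class.
    class_size = {}
    for label in labels:
        class_size[label] = class_size.get(label, 0) + 1

    hits = 0
    possible = 0
    for query, ranked in enumerate(rankings):
        size = class_size[labels[query]]
        window = ranked[:2 * size]          # twice the class size
        hits += sum(1 for i in window if labels[i] == labels[query])
        possible += size                    # best achievable count
    return 100.0 * hits / possible


if __name__ == "__main__":
    labels = ["a", "a", "b", "b", "c", "c"]
    rankings = [
        [0, 2, 3, 4, 1, 5],   # misses shape 1 inside the window of four
        [1, 0, 2, 3, 4, 5],
        [2, 3, 0, 1, 4, 5],
        [3, 2, 0, 1, 4, 5],
        [4, 5, 0, 1, 2, 3],
        [5, 4, 0, 1, 2, 3],
    ]
    print(round(bulls_eye_score(labels, rankings), 2))
```

In this toy example eleven of the twelve possible matches fall inside their windows, so the score is 11/12 of one hundred percent.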
FIGURE 5-6: Average precision-recall on Region Based Test Image Retrieval on 678
object shapes (MPEG 7 CE 2)
5.3.2 RESULTS FOR COMPARISON OF EFFECTIVENESS BETWEEN
AKDFPE AND DHFP ON SHOPPING ITEMS DATASET
Having obtained satisfactory results on the standard datasets, we then moved to the domain of
interest, the shopping items domain. Figures 5-7 and 5-8 show the normal retrieval results of
the retrieval system as the users experience them. Figure 5-9 shows the performance of
the retrieval system as a recall-precision graph, which helps in evaluating the
performance of the system in the domain of interest.
FIGURE 5-7: Ten retrieval results of AKDFPE on left and DHFP on the right (query at the
top left of the figure)
FIGURE 5-8: Ten retrieval results of AKDFPE on the left and DHFP on the right (query at
the top left of the figure)
FIGURE 5-9: Average precision-recall chart on General Image Retrieval
5.3.3 RESULTS ANALYSIS FOR EFFECTIVENESS BETWEEN AKDFPE AND
OTHER METHODS
Table 5-1 shows that AKDFPE has a better BEP score than the other methods
it was compared with. This experiment tested the performance of the representation methods
on a contour based standardized dataset. Figure 5-6 indicates that AKDFPE performed
better than DHFP when tested on a region based standardized dataset. These results, which
showed the robustness of the AKDFPE method over the others and its generic form, motivated
its comparison with DHFP on the shopping items database. The results in figure 5-7 and
figure 5-8 show that AKDFPE performs better than the DHFP method. The BEP scores
of 92.64% for AKDFPE and 90.87% for DHFP confirmed the superiority of the AKDFPE
method. These satisfactory results enabled the AKDFPE retrieval system to be incorporated in
the Image Content for Shopping Items Recommender System for Mobile Users.
5.4 IMAGE CONTENT FOR SHOPPING ITEMS RECOMMENDER SYSTEM
FOR MOBILE USERS
The mobile users captured query images with a camera enabled cell phone. A 3-D object
may have more than one 2-D image, as shown in figure 5-10. Some of these 2-D
images may be very difficult to use to identify the type of the object, as shown in figure
5-11a, while figure 5-11b shows 2-D images from which the object type seems easy to
identify. In this case, for the object in figure 5-10 only the images in figure 5-11b will be
included in the dataset. That means each object in the dataset will have more than two 2-D
images in the database if necessary. The shopping item images captured by the camera
enabled device must be compatible with those already in the database. This enabled
measurement of the performance of the retrieval system. The retrieval system was made
aware of the images that belong to the same 3-D object so that, when retrieving, only one
image of the object comes out. For soft shopping items such as clothes (for example dresses
and trousers), the images were taken while the items were on dolls. An example of a cell
phone and its camera specifications is given in table 5-2 (randomly chosen). Finally, we made
the system operate as a recommender system in which some shopping items were on
promotion and others on special offer. The recommender system was evaluated for its
performance, usefulness and satisfaction by a sample of fifty (50) randomly chosen users.
They rated the performance of the system by scoring their degree of satisfaction using the
scores in table 5-3. The system should retrieve images similar to the one queried by the user
and, at the same time, also bring other shopping items on special offer to the user. The
system also has dummy Global Positioning System (GPS) coordinates to enable the users
to find the retail shop.
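The rule that only one image of each 3-D object comes out of a retrieval can be sketched as follows. The sketch assumes that the database stores, for every 2-D view, the identifier of the 3-D object it belongs to; the names `one_result_per_object` and `view_to_object` and the sample identifiers are hypothetical.

```python
def one_result_per_object(ranked_views, view_to_object, limit=10):
    """Keep only the best-ranked 2-D view of each 3-D object.

    ranked_views   -- view identifiers, most similar first
    view_to_object -- maps each view identifier to its 3-D object identifier
    limit          -- maximum number of results to return
    """
    seen_objects = set()
    results = []
    for view in ranked_views:
        obj = view_to_object[view]
        if obj in seen_objects:
            continue                  # a better view of this object is kept
        seen_objects.add(obj)
        results.append(view)
        if len(results) == limit:
            break
    return results


if __name__ == "__main__":
    view_to_object = {
        "shoe_front": "shoe", "shoe_side": "shoe",
        "tv_front": "tv", "bed_top": "bed",
    }
    ranked = ["shoe_side", "shoe_front", "tv_front", "bed_top"]
    print(one_result_per_object(ranked, view_to_object))
    # ['shoe_side', 'tv_front', 'bed_top']
```

Because the ranked list is scanned from the most similar view downwards, the view that represents an object in the result is always the one that matched the query best.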
FIGURE 5-10: 2-D images of a 3-D shopping item
a)
b)
FIGURE 5-11: a) set of images difficult to identify b) set of images easy to identify
TABLE 5-2: 6220c cell phone and its camera specifications
TABLE 5-3: Scores to measure satisfaction with performance of the system
5.4.1 RESULTS FOR RETRIEVAL SYSTEM OF SHOPPING ITEMS FOR
MOBILE USERS
FIGURE 5-12: Query image captured by a camera enabled mobile device
FIGURE 5-13: Ten retrieval results of AKDFPE
FIGURE 5-14: Average precision-recall on General Image Retrieval (Query captured by
cell phone)
5.4.2 RESULTS FOR IMAGE CONTENT FOR SHOPPING ITEMS
RECOMMENDER SYSTEM FOR MOBILE USERS
FIGURE 5-15: Query Image
FIGURE 5-16: Results from the Shopping Recommender System
FIGURE 5-17: Query Image
FIGURE 5-18: Results from the Shopping Recommender System
Retailer
Retailer
FIGURE 5-23: Evaluation of the recommender system
5.4.3 RESULTS ANALYSIS
Figure 5-13 shows a highly effective retrieval of the shopping item images. At 100%
recall there is at least 60% average precision. This might be a result of how
the 3-D objects were represented using more than one 2-D image, as shown in figure 5-10.
Evaluation of the system was done as shown in figure 5-23. The reasons for the high scores
might be that retrieval of images is still novel to the students and that the system’s
performance is very high. The group of students was also influenced by the incorporation
of their preferences in the system. We can conclude that the incorporation of personal
preferences in the system has a positive effect on the user. The imitation of the shopping
recommender system was received positively by the evaluators. The results in figures 5-18,
5-20 and 5-22 are very interesting; they show the practicality of incorporating image
retrieval into recommender systems. Since at 10% recall the precision of the 3-D retrieval
system is almost 100%, the recommender system was made to retrieve at most three images
similar to the one queried by the user.
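The behaviour described above, at most three similar items together with items that are on promotion or on special offer, can be sketched as follows. The catalogue layout, the `status` field and the function name `recommend` are assumptions made for the illustration and do not describe the data model of the actual system.

```python
def recommend(ranked_items, catalogue, max_similar=3):
    """Combine the best retrieval results with promoted items.

    ranked_items -- item identifiers from the retrieval system, best first
    catalogue    -- maps item identifier to a dict with a 'status' entry
    max_similar  -- number of similar items to show (three in this chapter)
    """
    similar = ranked_items[:max_similar]
    offers = [
        item for item, info in catalogue.items()
        if item not in similar
        and info.get("status") in ("promotion", "special offer")
    ]
    return {"similar": similar, "offers": offers}


if __name__ == "__main__":
    catalogue = {
        "tv1": {"status": "regular"},
        "tv2": {"status": "promotion"},
        "shoe1": {"status": "special offer"},
        "bed1": {"status": "regular"},
    }
    ranked = ["tv1", "tv2", "bed1", "shoe1"]
    print(recommend(ranked, catalogue))
    # {'similar': ['tv1', 'tv2', 'bed1'], 'offers': ['shoe1']}
```

An item that is both similar to the query and on promotion is listed once, among the similar items.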
5.5 OVERALL RESULTS ANALYSIS
The proposed AKDFPE representation method has been studied and evaluated
in detail. It has been shown to fulfil the necessary conditions for image descriptors, namely
invariance to rotation and scaling. The more challenging of these is scaling, in the sense that
scaling objects to a relatively small size may result in significant distortion of their shape.
The AKDFPE method is a generic image representation method, which is why it was tested
on contour based and region based test datasets (MPEG 7 databases). The AKDFPE method
satisfies almost all of the requirements set by MPEG 7 for shape representation. The
requirements are good retrieval accuracy, compact features, general application, low
computation complexity, robust retrieval performance and hierarchical coarse to fine
representation. The results obtained indicate that the AKDFPE method can deal with errors
in segmentation and is robust to segmentation noise.
Where BEP was calculated, every image was considered as a query image
and every image contributed to the calculation of the performance measure. The
recall-precision performance of the method was calculated with randomly selected images
used as query images. In all these performance measurements AKDFPE has a high
retrieval performance and performs better than the compared methods. AKDFPE has
competitive retrieval performance on general shopping item shapes. It is important to note
that AKDFPE does not represent images by absolute values of the features but by estimates.
This makes it very effective in the retrieval of similar images.
Incorporation of the retrieval system into the recommender system shows that it has a
positive effect on the users. In this research only one aspect of the user’s preferences
was factored into the system; in a full recommender system almost all aspects of user
preferences would be factored in, making the user benefit more from the system. This type of
recommender system also has problems of scalability: as the number of images increases, the
effectiveness is reduced. Such systems can also turn a browser into a buyer, in the sense that
as soon as a user captures an image the recommender system is capable of making a
recommendation, so they do not have the cold start problem.
CHAPTER 6
6 CONCLUSION, CONTRIBUTION AND FUTURE WORK
This chapter gives the conclusion, contribution and future work of the research. In
chapter one the goal of the research is stated as follows: “The goal of the research is to
evolve image content representation algorithm for effectively matching sales item whose
image content has been extracted by Active Contour without Edges in an Image Content in
Shopping Recommender System for Mobile Users.” This chapter evaluates whether the
goal was achieved, assesses the contribution of the research work
and then discusses future work.
6.1 CONCLUSION
In this research, the Image Content in Shopping Recommender System for Mobile Users was
the main interest. An effective and efficient recommender system for mobile users entails
having an effective image retrieval system as a component of the recommender. The
fundamental components of an image retrieval system are image pre-processing, image
segmentation, image representation and image matching. In the endeavour to fulfil the
objective of this research work, the literature was reviewed, a new method was created and
systems were built and evaluated. The following section
elaborates on the work done to fulfil the primary objective of having an Image Content in
Shopping Recommender System for Mobile Users.
Shape representation, segmentation and similarity methods have been reviewed. The
purpose of the reviews and studies was to understand the problems and issues involved
in these techniques, and to identify the open issues, advantages and disadvantages of these
techniques as used in retrieval systems. Scientific methodologies have been used in the
studies. In this research standard datasets (the MPEG 7 datasets), a general shopping items
database and accepted performance measurement techniques have been used.
Image segmentation in the shopping item domain requires a region based method. This is
because some of the shopping items have smooth edges or are without edges,
making contour based methods less than ideal. Among the region based methods, those
that use global statistics to model the regions to be segmented are the most suitable.
The study shows that region based representation techniques are the future in generic
retrieval systems. Region based techniques are more accurate and robust than contour
based techniques. A new region based image representation method, the
AKDFPE, was developed. The AKDFPE shows that representation of images is best done by
estimation instead of exact quantities of shape features. The method is capable of
distinguishing dense and sparse areas in order to decide how to calculate the optimal
bandwidth for the shapes. The method is robust to noise due to its estimation characteristics.
The performance measurements show that it outperformed the contour based
and region based methods considered. It is a generic image representation method.
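The idea of adapting the bandwidth to dense and sparse areas can be illustrated with a generic one-dimensional adaptive kernel density estimator. The sketch below is not the AKDFPE algorithm itself: it uses a Gaussian kernel, a rule-of-thumb pilot bandwidth of 1.06·σ·n^(-1/5) and local bandwidth factors derived from a pilot estimate with sensitivity `alpha`, all of which are assumptions chosen for illustration. It requires at least two samples that are not all equal.

```python
import math
import statistics


def gaussian(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, samples, h):
    """Kernel density estimate at x with one global bandwidth h."""
    return sum(gaussian((x - s) / h) for s in samples) / (len(samples) * h)


def adaptive_kde(samples, alpha=0.5):
    """Return (density function, local bandwidths) for the samples.

    Samples in sparse areas receive a wider kernel and samples in dense
    areas a narrower one, based on a pilot estimate.
    """
    n = len(samples)
    # Rule-of-thumb pilot bandwidth (assumed here, not from the thesis).
    h = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    pilot = [fixed_kde(s, samples, h) for s in samples]

    # Geometric mean of the pilot densities normalizes the local factors.
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    local_h = [h * (p / g) ** (-alpha) for p in pilot]

    def density(x):
        return sum(gaussian((x - s) / b) / b
                   for s, b in zip(samples, local_h)) / n

    return density, local_h


if __name__ == "__main__":
    samples = [0.0, 0.1, 0.2, 0.3, 5.0]
    density, local_h = adaptive_kde(samples)
    print(local_h)
    print(density(0.15), density(2.5))
```

In the example the isolated sample at 5.0 lies in a sparse area and is therefore given a larger bandwidth than the samples in the dense cluster near zero.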
The (dis)similarity algorithm has to be chosen from the metric or non-metric classes. In doing
this one has to decide on the trade-offs between the effectiveness and efficiency of the system.
In the Image Content in Shopping Recommender System for Mobile Users the accuracy of
the recommendation is the most critical element of the system, although the efficiency of
the system must also be acceptable. In this case the cosine similarity algorithm proved to be
more effective than Euclidean dissimilarity. It is also possible to support it using metric
modelled databases.
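The following sketch illustrates both statements. The quantity 1 − cosine similarity violates the triangle inequality, so it is not a metric; yet for vectors normalized to unit length the squared Euclidean distance equals twice that quantity, so a Euclidean ranking of normalized vectors gives the same order as a cosine ranking. Normalizing the feature vectors is one possible way of supporting cosine similarity in a metric database; it is presented here as an assumption and not necessarily as the approach intended in this research. The function names are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm(a) * norm(b))


def unit(v):
    """Scale a non-zero vector to unit length."""
    length = norm(v)
    return [x / length for x in v]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def violates_triangle_inequality(a, b, c, dist):
    """True if going from a to c directly is 'longer' than via b."""
    return dist(a, c) > dist(a, b) + dist(b, c)


if __name__ == "__main__":
    a, b, c = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
    print(violates_triangle_inequality(a, b, c, cosine_distance))     # True
    print(violates_triangle_inequality(a, b, c, euclidean_distance))  # False
    # For unit vectors: squared Euclidean distance = 2 * cosine distance.
    print(euclidean_distance(unit(a), unit(b)) ** 2, 2 * cosine_distance(a, b))
```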
The retrieval system was finally built. The system was tested on 2-D image shapes. The
challenge comes when a query image is captured by a camera enabled device that translates
3-D objects to 2-D image shapes. To tackle this challenge, more than one 2-D image was
made to represent one 3-D object. The retrieval system performed very well. The
retrieval system was then incorporated into the recommender system, and the evaluators
were satisfied with the system. In conclusion the situational problem in chapter one is
revisited:
“Suppose Nyasha leaves home with a location and a camera enabled mobile device for
shopping. Getting to a nearby shop, she finds an item similar to an item she really wants.
Now she is faced with difficulty of either buying it now or continue doing window
shopping with the hope of finding the real item she wants. The dilemma is if she does not
buy now she might not get it later or if she does, she might get the one she wants, as she
continues window shopping. Consequently, the problem is, with the aid of a camera
enabled mobile device carried by Nyasha, how can she be helped to make the decision of
buying this item or not with the realization that the shops have databases of shopping items
online?”
Solution:
Nyasha must log in to the Image Content in Recommender System for Mobile Users and
then capture the image of the object. The camera enabled device then sends the image to the
recommender system, and the system returns the GPS coordinates of the retailer nearest to
Nyasha’s location that has the shopping item of interest. It also recommends other shopping
items that might interest Nyasha that are on promotion or on special offer. The goal of the
research was fulfilled.
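Finding the retailer nearest to the user from GPS coordinates can be sketched with a great circle distance. The sketch uses the haversine formula with a mean Earth radius of 6371 km, which may differ from the great circle procedure actually used in the system; the function names, retailer names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_retailer(user, retailers):
    """Return the name of the retailer closest to the user.

    user      -- (latitude, longitude) of the user in degrees
    retailers -- maps retailer name to (latitude, longitude) in degrees
    """
    return min(retailers,
               key=lambda name: great_circle_km(user[0], user[1],
                                                *retailers[name]))


if __name__ == "__main__":
    user = (-25.73, 28.16)                      # hypothetical position
    retailers = {"Retailer A": (-25.75, 28.19),
                 "Retailer B": (-26.20, 28.04)}
    print(nearest_retailer(user, retailers))    # Retailer A
```

In a full system only the retailers that stock the retrieved shopping item would be passed to `nearest_retailer`.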
6.2 SUMMARY OF CONTRIBUTIONS
The main contributions of this research are as follows:
• Comprehensive reviews of segmentation, representation and similarity
techniques were done, and challenges, open issues, advantages and disadvantages
were highlighted. Standard evaluations of these techniques were recommended
for easy comparison. Research papers for international conferences were written.
• A generic AKDFPE descriptor was proposed and comprehensively evaluated.
The technique is suitable for generic shape description and retrieval. The
technique outperformed contour and region based techniques by taking
advantage of the way it estimates the feature distribution of the shape and
calculates the optimal bandwidth. The automation of the calculation of the
optimal bandwidth was novel. The AKDFPE satisfies most of the principles set
by MPEG 7 for image retrieval. The technique allows changes to its kernel
function to suit a specific domain.
• The proposed AKDFPE has been applied to the MPEG 7 datasets and to a general
shopping item images database. The Image Content in Shopping Recommender
System for Mobile Users is novel, and user satisfaction in the utilization of the
system was noted.
• The proposed technique has been tested on a shopping item images database
queried with item images captured by a camera enabled mobile device.
• The usage of images as input to the system minimized the use of text, which is still
a challenge to mobile users due to the size of the mobile devices. This means our
research has contributed a solution to this problem.
• The research contributed by showing the practicality of incorporating image
retrieval into a mobile recommender system, which is a novel idea. This removes
the ambiguities that would arise when querying the system with keywords.
6.3 FUTURE WORK
Content based image retrieval in recommender systems for mobile users is a very
interesting area that is still under investigation. Reduction of text usage in
recommender systems for mobile users is still a challenge that needs to be addressed.
Retrieval of images that are on heterogeneous backgrounds is still a challenge. Objects are
in 3-D but their images are represented in 2-D. Representation of 3-D objects requires
many 2-D shape images, which makes it a difficult task. Optimizing the number of 2-D images
required to adequately represent a 3-D object is also a challenge. Incorporating users or their
tastes as part of the retrieval system is still an area of interest. The satisfaction of the user is
also a paramount goal of recommender systems.
Segmentation is also an area where so many challenges still exist. The ideal situation is
when automatic segmentation is possible in generic images but human intervention is still
necessary. In large datasets of generic images segmentation becomes a daunting task. Thus
it would be necessary to minimize human involvement in segmentation of generic images.
The proposed image representation technique AKDFPE has high retrieval
effectiveness, but its efficiency was not measured. Retrieval efficiency is
undoubtedly a critical factor in image retrieval for mobile users because of their limited
time. Further research is necessary to measure the efficiency of the representation
technique. The technique shows interesting
results when incorporated in the Image Content in Shopping Recommender System for
Mobile Users.
The most challenging goals are 100% precision at 100% recall and 100% user
satisfaction; there is therefore still a need for further investigation in the area of image
retrieval in mobile recommender systems.
Experiments using actual smart mobile devices on the market, such as smart phones
(iPhone, BlackBerry, etc.) and other smart mobile devices (iPad, iPod, BlackBerry
PlayBook, etc.), should be performed in future. This will make it possible to investigate how
best the system (Image Content in Shopping Recommender System for Mobile Users) can be
adapted to different smart mobile devices.
Extract:
“A mature science is governed by a single paradigm. The paradigm sets the standards for
legitimate work within the science it governs. By solving standard problems, performing
standard experiments and eventually by doing a piece of research under a supervisor who
is already a skilled practitioner within the paradigm, an aspiring scientist becomes
acquainted with the methods, the techniques and the standards of that paradigm.”
(Chalmers, 1999)
REFERENCES
ADAIR, J. B. & TURNBULL, M. 1974. A procedure for calculating great circle
distances between geographic locations. Council for Advanced transportation Studies, the
University of Texas at Austin.
AIROUCHE, M., BENTABET, L. & ZELMAT, M. 2009. Image Segmentation Using
Active Contour Model and Level Set Method Applied to Detect Oil Spills. Paper presented
at the Proceedings of the World Congress on Engineering (WCE 2009), London, UK.
ANTANI, S., LEE, D. J., LONG, L. R. & THOMA, G. R. 2004. Evaluation of shape
similarity measurement methods for spine X-ray images. J. Vis. Commun. Image R.
(Elsevier), 15:285-302.
AYED, I. B. & MITICHE, A. 2008. A Region Merging Prior for Variational Level Set
Image Segmentation. IEEE, 17(12):2301-2311.
BAEZA-YATES, R. & RIBEIRO-NETO, B. 1999. Modern Information Retrieval. New
York: ACM Press.
BAI, X., LATECKI, L. J. & TU, Z. 2010. Learning Context-Sensitive Shape Similarity
by Graph Transduction. IEEE Transactions on pattern analysis and machine intelligence,
32(5):861-874.
BIGDELI, E. 2008. Comparing accuracy of cosine-based similarity and correlation-based
similarity algorithms in tourism recommender systems. Paper presented at the 4th IEEE
International Conference on Management of Innovation and Technology, 2008.
BOGERS, T. & BOSCH, A. V. D. 2009. Collaborative and Content-based Filtering for
item Recommendation on Social Bookmarking Websites. Paper presented at the ACM
RecSys '09 Workshop on Recommender Systems and the Social Web, New York, USA.
BOUCHERON, L. E., HARVEY, N. R. & MANJUNATH, B. S. 2007. A quantitative
object-level metric for segmentation performance and its application to cell nuclei.
Springer-Verlag, 2007:208-219.
BOUTEMEDJE, S., ZIOU, D. & BOUGUILA, N. 2007. A graphical model for content-based
image suggestion and feature selection. Springer-Verlag, Berlin Heidelberg.
BRADLEY, A. P. 1997. The use of the area under the ROC curve in the evaluation of
machine learning algorithms. Pattern Recognition, 30(7):1145-1159.
BRODERSEN, K. H., ONG, C. S., STEPHAN, K. E. & BUHMANN, J. M. 2010. The
binormal assumption on precision-recall curves. Paper presented at the International
Conference on Pattern Recognition.
BURKE, R. 2002. Hybrid Recommender Systems: Survey and Experiments. User
Modeling and User-Adapted Interaction, 12(4):331-370.
BUSTOS, B., KREFT, S. & SKOPAL, T. 2011. Adaptive metric indexes for searching in
multi-metric spaces. In: Multimedia Tools and Applications. Springer.
CELEBI, E. & ASLANDOGAN, A. 2005a. A comparative study of three moment-based
shape descriptors. Paper presented at the IEEE proceedings of the International
Conference on Information Technology: Coding and Computing
CELEBI, E. M. & ASLANDOGAN, A. Y. 2005b. A comparative Study of Three
Moment-Based Shape Descriptors. Proceedings of the International Conference on
Information Technology: Coding and Computing.
CHA, S.-H. 2007. Comprehensive Survey on Distance/Similarity Measures between
Probability Density Functions. International Journal of Mathematical Models and
Methods in Applied Sciences, 1(4):300-307.
CHALMERS, A. F. 1999. What is this thing called Science? Third Edition ed.
Buckingham: Open University Press.
CHAN, T. F. & VESE, L. A. 2001. Active Contours Without Edges. IEEE, 10(2):266-277.
CHEN, Z., JIANG, Y. & ZHAO, Y. 2010. A Collaborative Filtering Recommendation
Algorithm Based on User Interest Change and Trust Evaluation. International Journal of
Digital Content Technology and its Applications, 4(9):106-113.
CHERIET, M., SAID, J. N. & SUEN, C. Y. 1998. A Recursive Thresholding Technique
for Image Segmentation. IEEE, 7(6).
CHOI, Y. & RASMUSSEN, E. 2002. User's relevance criteria in image retrieval in
American history. Information Processing and Management, 38(2002):695-726.
CLARKSON, K. L. 2005. Nearest-Neighbor Searching and Metric Space Dimensions.
In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice.
Cambridge, MA: MIT Press.
DAVIS, J. & GOADRICH, M. 2006. The Relationship Between Precision-Recall and
ROC Curves. Paper presented at the Proceedings of the 23rd International Conference on
Machine Learning, Pittsburgh, PA.
DAWOUD, A. & KAMEL, M. S. 2004. Iterative Multimodel Subimage Binarization for
Handwritten Character Segmentation. IEEE, 13(9):1223-1230.
DEB, S. 2008. Overview of image segmentation techniques and searching for future
directions of research in content-based image retrieval.
DRUMMOND, C. & HOLTE, R. C. 2000. Explicitly Representing Expected Cost: An
Alternative to ROC Representation. Paper presented at the Proceedings of the Sixth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining.
FERRI, C., HERNANDEZ-ORALLO, J. & SALIDO, M. A. 2003. Volume Under the
ROC surface for Multi-class Problems. Exact Computation and Evaluation of
Approximations. Paper presented at the Proc. of 14th European Conference on Machine
Learning.
FLUSSER, J., SUK, T. & ZITOVA, B. 2009. Moments and moment invariants in pattern
recognition. West Sussex: John Wiley & Sons Ltd.
FREIXENET, J., MUNOZ, X., RABA, D., MARTI, J. & CUFI, X. 2002. Yet Another
Survey on Image Segmentation: Region and Boundary Information Integration.
Springer:408 - 422.
GABBOUJ, M., AHMAD, I., AMIN, M. Y. & KIRANYAZ, S. 2005. Content based
Image Retrieval for Connected Mobile Devices. Paper presented at the Image Rochester,
New York.
GE, Y., XIONG, H., TUZHILIN, A. & XIAO, K. 2010. An Energy-Efficient Mobile
Recommender System. Paper presented at the Proceedings of the 16th ACM SIGKDD
international conference on knowledge discovery and data mining, New York.
GEMMIS, M. D., IAQUINTA, L., LOPS, P., MUSTO, C., NARDUCCI, F. &
SEMERARO, G. 2009. Preference Learning in Recommender Systems. Paper presented
at the European Conference on Machine Learning and Principles and Practice of
knowledge Discovery in Databases (ECML PKDD 2009), Bled, Slovenia.
GHAZANFAR, M. A. & PRUGEL-BENNETT, A. 2010. An Improved Switching Hybrid
Recommender System Using Naive Bayes Classifier and Collaborative Filtering. Paper
presented at the Proceedings of the International MultiConference of Engineers and
Computer Science (IMECS), Hong Kong.
GHAZANFAR, M. A. & PRUGEL-BENNETT, A. 2011. Fulfilling the Needs of Gray-Sheep
Users in Recommender Systems, A Clustering Solution. Paper presented at the
2011 International Conference on Information Systems and Computational Intelligence,
Harbin, China.
GULDOGAN, O. & GABBOUJ, M. 2005. Content-Based Image Indexing and Retrieval
framework on Symbian Based Mobile Platform. Paper presented at the European Signal
Processing Conference, EUSIPCO 2005.
GUNAWARDANA, A. & MEEK, C. 2009. A Unified Approach to Building Hybrid
Recommender Systems. Paper presented at the Proceedings of the 2009 ACM Conference
on Recommender Systems, New York.
HEIJDEN, H. V. D., KOTSIS, G. & KRONSTEINER, R. 2005. Mobile recommendation
systems for decision making 'on the go'. Paper presented at the Proceedings of the
International Conference on Mobile Business.
HOSHINO, R., COUGHTREY, D., SIVARAJA, S., VOLNYANSKY, I., AUER, S. &
TRICHTCHENKO, A. 2009. Applications and extensions of cost curves to marine
container inspection. Annals OR, 187(1):159-183.
HU, S., HOFFMAN, E. A. & REINHARDT, J. M. 2001. Automatic Lung Segmentation
for Accurate Quantitation of Volumetric X-Ray CT Images. IEEE, 20(6):490-498.
HUANG, C.-L. & HUANG, W.-L. 2009. Handling sequential pattern decay: Developing a
two-stage collaborative recommender system. Electronic Commerce Research and
Applications, 8(2009):117-129.
HUANG, H. & JIANG, J. 2009. Laplacian Operator Based Level Set Segmentation
Algorithm for Medical Images. Paper presented at the IEEE: Second International
Congress on Image and Signal Processing (CISP), Tianjin.
JARVELIN, K. & KEKALAINEN, J. 2000. IR evaluation methods for retrieving highly
relevant documents. Paper presented at the Proceedings of the 23rd Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval, New
York NY.
KEKRE, H. B. & GHARGE, S. M. 2010. Image Segmentation using Extended Edge
Operator for Mammographic Images. International Journal of Computer Science and
Engineering (IJCSE), 2(4):1086-1091.
KIRBAS, C. & QUEK, F. K. H. 2003. Vessel Extraction Techniques and Algorithms: A
Survey. Paper presented at the Proceedings of the Third IEEE Symposium on
BioInformatics and BioEngineering (BIBE'03).
LAKSHMI, S. & SANKARANARAYANAN, V. 2010. A study of Edge Detection
Techniques for Segmentation Computing Approaches. International Journal of Computer
Application (IJCA), Special Issue on CASCT, 1:35-41.
LANDGREBE, T. C. W., PACLIK, P. & DUIN, R. P. W. 2006. Precision-recall
operating characteristic (P-ROC) curves in imprecise environments. Paper presented at the
18th International Conference on Pattern Recognition (ICPR'06), Washington, DC.
LANKTON, S. & TANNENBAUM, A. 2008. Localizing Region-Based Active Contours.
IEEE Transactions on Image Processing, 17(11):2029-2039.
LATECKI, L. J., LAKAMPER, R. & ECKHARDT, U. 2000. Shape descriptors for
non-rigid shapes with a single closed contour. Paper presented at the IEEE Conference
proceedings on Computer Vision and Pattern Recognition.
LECCE, V. D. & GUERRIERO, A. 1999. An Evaluation of the Effectiveness of Image
Features for Image Retrieval. Visual Communication and Image Representation,
10:351-362.
LI, Y. & GUAN, L. 2006. An effective shape descriptor for the retrieval of natural image
collections. Paper presented at the Proceedings of the IEEE CCECE/CCGEI, Ottawa.
LIU, J. 2006. Robust Image Segmentation using Local Median. Paper presented at the
Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, Canada.
LU, D. & WENG, Q. 2007. A survey of image classification methods and techniques for
improving classification performance. International Journal of Remote Sensing,
28(5):860-870.
LUCCHESE, L. & MITRA, S. K. 2001. Colour image segmentation: A state-of-the-art
survey. 207 - 221.
MALYSZKO, D. & WIERZCHON, S. T. 2007. Standard and Genetic k-means Clustering
Techniques in Image Segmentation. Paper presented at the Sixth International Conference
on Computer Information Systems and Industrial Management Applications (CISIM'07),
Minneapolis, MN.
MANDL, T. 2008. Recent Developments in the Evaluation of Information Retrieval
Systems: Moving Towards Diversity and Practical Relevance. Informatica,
32(2008):27-38.
MANNING, C. D., RAGHAVAN, P. & SCHUTZE, H. 2008. Introduction to Information
Retrieval. Cambridge University Press.
MAOFU, L., YANXIANG, H. & BIN, Y. 2007. Image Zernike Moments Shape Feature
Evaluation Based on Image Reconstruction. Geo-spatial Information Science,
10(3):191-195.
MELVILLE, P. & SINDHWANI, V. 2010. Recommender Systems. In: VERLAG, S.
(Ed.). Encyclopedia of Machine Learning (1-9). Berlin: Springer.
MILJKOVIC, O. 2009. Image Pre-Processing. Kragujevac J Math, 32(2009):97-107.
MIN, J., POWELL, M. & BOWYER, K. W. 2004. Automated performance evaluation of
range image segmentation algorithms. IEEE, 34(1):263-271.
MINGQIANG, Y., KIDIYO, K. & JOSEPH, R. 2008. A survey of shape feature
extraction techniques. Pattern Recognition:43-90.
MUKUNDAN, R. & RAMAKRISHNAN, K. R. 1998. Moment functions in image
analysis: theory and applications. Singapore: World Scientific Publishing Co. Pte. Ltd.
MULLER, H., MICHOUX, N., BANDON, D. & GEISSBUHLER, A. 2004. A review of
content-based image retrieval systems in medical applications: clinical benefits and future
directions. International Journal of Medical Informatics, 73(1):1-23.
OLUGBARA, O. O., OJO, S. O. & MPHAHLELE, M. I. 2010. Exploiting Image Content
in Location-Based Shopping Recommender Systems for Mobile Users. International
Journal of Information Technology & Decision Making, 9(5):759-778.
PAZZANI, M. J. & BILLSUS, D. 2007. Content-based Recommendation Systems. Paper
presented at The Adaptive Web: Methods and Strategies of Web Personalization.
PETRAKIS, E. G. M. & FALOUTSOS, C. 1997. Similarity Searching in Medical Image
Databases. IEEE Transactions on Knowledge and Data Engineering, 9(3).
POLAK, M., ZHANG, H. & PI, M. 2009. An evaluation metric for image segmentation
of multiple objects. Image and Vision Computing, 27(8):1223-1227.
RASMUSSEN, E. 2002. Evaluation in Information Retrieval. Paper presented at the 3rd
International Conference on Music Information Retrieval, Paris, France.
REKIK, A., ZRIBI, M., HAMIDA, A. B. & BENJELLOUN, M. 2009. An Optimal
Unsupervised Satellite Image Segmentation Approach Based on Pearson System and
k-Means Clustering Algorithm Initialization. International Journal of Signal Processing,
5(1).
RICCI, F. 2010. Mobile Recommender Systems. IT & Tourism, 12(3):205-231.
RICCI, F. & NGUYEN, Q. N. 2006. Acquiring and revising preferences in a
critique-based mobile recommender system. IEEE Intelligent Systems, 22(3):22-29.
RUI, Y. & HUANG, T. S. 1999. Image Retrieval: Current Techniques, Promising
Directions, and Open Issues. Journal of Visual Communication and Image Representation,
10:39-62.
SAMMA, A. S. B. & SALAM, R. A. 2009. Adaptation of K-mean Algorithm for Image
Segmentation. International Journal of Information and Communication Engineering,
5(4):58-62.
SARWAR, B. M., KARYPIS, G., KONSTAN, J. & RIEDL, J. 2002. Recommender
Systems for Large-Scale E-Commerce: Scalable Neighborhood Formation Using
Clustering. Paper presented at the Proceedings of the Fifth International Conference on
Computer and Information Technology, Dhaka, Bangladesh.
SCHAFER, J. B., FRANKOWSKI, D., HERLOCKER, J. & SEN, S. 2007. Collaborative
Filtering Recommender Systems. In: SPRINGER-VERLAG (Ed.). The Adaptive web
(291-324). Berlin, Heidelberg.
SCHAFER, J. B., KONSTAN, J. & RIEDL, J. 1999. Recommender Systems in
e-Commerce. Paper presented at EC '99: Proceedings of the 1st ACM Conference on
Electronic Commerce, New York.
SEZGIN, M. & SANKUR, B. 2004. Survey over image thresholding techniques and
quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165.
SHARMA, N. & AGGARWAL, L. M. 2010. Automated Medical Image Segmentation
Techniques. Journal of Medical Physics, 35(1):3-14.
SHENG, C. & XIN, Y. 2005. Shape-based retrieval using shape matrix. International
Journal of signal processing:163-166.
SHIMAZAKI, H. & SHINOMOTO, S. 2007. A method for selecting the bin size of a
time histogram. Neural Computation, 19(6):1503-1527.
SIMONOFF, J. S. 1996. Smoothing Methods in Statistics. In: SPRINGER (Ed.).
Springer Series in Statistics.
SKOPAL, T. 2010, September. Where are you heading, metric access methods?: a
provocative survey. Paper presented at SISAP '10: Proceedings of the Third
International Conference on Similarity Search and Applications.
SKOPAL, T. & BUSTOS, B. 2010. On Nonmetric Similarity Search Problems in
Complex Domains. ACM Journal Name, V:1-56.
STEJIC, Z., TAKAMA, Y. & HIROTA, K. 2003. Genetic algorithm-based relevance
feedback for image retrieval using local similarity patterns. Information Processing and
Management, 39(1):1-23.
SU, X. & KHOSHGOFTAAR, T. M. 2009. A Survey of Collaborative Filtering
Techniques. Advances in Artificial Intelligence, 2009(2009):1-19.
TANG, J. 2010. A Color Image Segmentation Algorithm Based on Region Growing.
Paper presented at the Second International Conference on Computer Engineering and
Technology, Chengdu, China.
TERRELL, G. R. & SCOTT, D. W. 1992. Variable Kernel Density Estimation. The
Annals of Statistics, 20(3):1236-1265.
TRAN, D. C. & ONO, K. 2000. Content-based image retrieval: Object representation by
the Density of feature Points. 213-218.
UDUPA, J. K., LEBLANC, V. R., ZHUGE, Y., IMIELINSKA, C., SCHMIDT, H.,
CURRIE, L. M., et al. 2006. A framework for evaluating image segmentation algorithms.
Computerized Medical Imaging and Graphics, 30( ):75-87.
VARSHNEY, S. S., RAJPAL, N. & PURWAR, R. 2009. Comparative Study of Image
Segmentation Techniques and Object Matching using Segmentation. Paper presented at the
Methods and Models in Computer Science.
VASUDA, P. & SATHEESH, S. 2010. Improved Fuzzy C-means Algorithm for MR
Brain Image Segmentation. International Journal on Computer Science and Engineering,
2(5):1713-1715.
WALTER, S. D. 2002. Properties of the Summary Receiver Operating Characteristic
(SROC) curve for diagnostic test data. Statistics in Medicine, 21(9):1237-1256.
WANG, L., HE, L., MISHRA, A. & LI, C. 2009. Active contours driven by local
Gaussian distribution fitting energy. Signal Processing.
WANG, Y., GUO, Q. & ZHU, Y. 2007. Medical image segmentation based on
deformable models and its applications. Springer:209-260.
YANG, W.-S., CHENG, H.-C. & DIA, J.-B. 2008. A location-aware recommender
system for mobile shopping environments. Expert Systems with Applications,
34(2008):437-445.
ZHANG, D. & LU, G. 2002. Generic Fourier Descriptor for Shape-based Image
Retrieval. Paper presented at the IEEE Transactions on multimedia.
ZHANG, D. & LU, G. 2004. Review of shape representation and description techniques.
Pattern Recognition Society, 37:1-19.
ZHANG, H., FRITTS, J. E. & GOLDMAN, S. A. 2008. Image Segmentation Evaluation:
A survey of unsupervised methods. Computer Vision and Image Understanding,
10(2):260-280.
ZHANG, J., LIN, Z., XIAO, B. & ZHANG, C. 2009. An Optimized Item-Based
Collaborative Filtering Recommendation Algorithm. Paper presented at the IEEE
International Conference on Network Infrastructure and Digital Content (IC-NIDC),
Beijing.
ZHANG, Y. J. 2001. A review of recent evaluation methods for image segmentation.
Paper presented at the International Symposium on Signal Processing and its Applications
(ISSPA), Kuala Lumpur.
ZHENG, X., SHERRILL-MIX, S. A. & GAO, Q. 2007a. Perceptual shape-based natural
image representation and retrieval. Paper presented at the International Conference on
Semantic Computing.
ZHENG, X., SHERRILL-MIX, S. A. & GAO, Q. 2007b. Perceptual shape-based natural
image representation and retrieval. Paper presented at the Proceedings of the IEEE
International Conference on Semantic Computing.
ZHOU, B. & YAO, Y. 2010. Evaluation information retrieval system performance based
on user preference. Journal of Intelligent Information Systems, 34(3):227-248.
ZUVA, T., OLUGBARA, O. O., OJO, S. O. & NGWIRA, S. M. 2012. Kernel Density
Feature Points Estimator for Content-based Image Retrieval. Signal & Image Processing:
An International Journal (SIPIJ), 4(1):103-111.