Image Content in Shopping Recommender Systems for Mobile Users

by

Tranos Zuva

Submitted in fulfilment of the requirements for the degree

DOCTOR TECHNOLOGIAE

in the

Department of Computer Systems Engineering
FACULTY OF ICT
TSHWANE UNIVERSITY OF TECHNOLOGY

Supervisors:
Prof. Sunday O. Ojo
Prof. Oludayo O. Olugbara
Prof. Seleman M. Ngwira

August 2012

DECLARATION BY CANDIDATE

“I hereby declare that the dissertation/thesis submitted for the degree D Tech: Computer Systems Engineering, at Tshwane University of Technology, is my original work and has not previously been submitted to any other institution of higher education. I further declare that all sources cited or quoted are indicated and acknowledged by means of a comprehensive list of references”.

Tranos Zuva

Copyright © Tshwane University of Technology 2012

This study is dedicated to my late Mother for the inspiration.

Quote: “Never send anyone to do something on your behalf if you want it done to your taste” - Shanangurayi Zuva

ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincerest gratitude and appreciation to my supervisors, namely Prof. Oludayo O. Olugbara, Prof. Sunday O. Ojo and Prof. Seleman M. Ngwira, for their encouragement, guidance, patience, motivation and support during my DTech research period. Their contribution to my work is immeasurable by any standard. To Prof. Olugbara, thank you very much for unselfishly sharing with me your immense knowledge in this area (image processing). This enabled me to develop an understanding of the subject. Truly, I could not have imagined having better supervisors for my DTech study.

Besides my supervisors, I would like to thank my fellow members of staff and students who were there for me when I needed help of any kind. They provided a conducive environment for me to work in. Thanks guys.

I would also like to thank my family: my wife Keneilwe Zuva and kids (Unaludo, Tariro, Nyasha and Trevor) for their support.
I wish they would be rewarded for the joys they have sacrificed during the period of my study. Not forgetting my father, brothers, sisters, my in-laws, friends and their families for their support. Last but not least, a big thank you to Tshwane University of Technology for the financial support and for giving me the opportunity to further my studies.

CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
PUBLICATION LIST
ABSTRACT
CHAPTER 1
1 INTRODUCTION
1.1 Statement of Problem
1.2 Research Question
1.3 Goal and Objectives
1.4 Expected Contributions
1.5 Thesis structure
CHAPTER 2
2 RECOMMENDER SYSTEMS
2.1 Collaborative Filtering (CF)
2.1.1 User-based nearest neighbour
2.1.2 Item-based nearest neighbour
2.2 Content-Based Filtering
2.3 Knowledge based Recommender Systems
2.4 Hybrid Recommender Systems
2.5 Challenges of recommendation Techniques
2.6 Evaluation metrics for recommender systems
2.7 Mobile recommender systems
2.8 Motivation for mobile recommender systems
2.8.1 Recommendation systems for mobile users
2.9 Architecture of mobile recommendation system
CHAPTER 3
3 IMAGE SEGMENTATION, REPRESENTATION AND RETRIEVAL
3.1 Image Segmentation Techniques
3.1.1 Thresholding Method
3.1.2 Edge Based Methods
3.1.3 Region Based Methods
3.1.4 Performance Evaluation
3.1.5 Challenges and Future Directions
3.1.6 Segmentation techniques Summary
3.2 Image Shape Representation and Description Techniques
3.2.1 Classification of shape representation and description techniques
3.2.2 Boundary/Contour Based representation Techniques
3.2.3 Region/Whole based representation Techniques
3.2.4 Evaluation of Representation and Description Algorithms
3.2.5 Challenges and Future Directions
3.2.6 Image representation Summary
3.3 Image (dis)similarity measurement and Database access algorithms
3.3.1 (Dis)similarity Algorithms
3.3.2 The Relationship between (Dis)similarity algorithm and Database Indexing
3.4 Image (dis)similarity measurement and Database access algorithms Summary
3.5 Evaluation algorithm of Information Retrieval Systems
3.5.1 Techniques for evaluation of unranked retrieval results
3.5.2 Techniques for evaluation of ranked retrieval results
3.5.3 Relationship between ROC and P-R related measures
3.5.4 Conclusion
3.6 Chapter summary
CHAPTER 4
4 SHAPE IMAGE CONTENT FOR MOBILE RECOMMENDER SYSTEM
4.1 Image pre-processing
4.2 Segmentation methods
4.2.1 Active contour without edges
4.2.2 Robust Image Segmentation using Local Median
4.3 Image representation method
4.4 The 1-Dimensional Kernel Density Estimation
4.4.1 Kernel Functions
4.4.2 Kernel Density Estimator (Properties)
4.4.3 Bias of the Estimator
4.4.4 Variance of the Kernel Density Estimator
4.4.5 Mean-Square Error (MSE)
4.5 Finding Optimal Bandwidth
4.5.1 Asymptotically Optimal Bandwidth
4.5.2 Plug-in Bandwidth
4.5.3 Adaptive Kernel Density Estimate (AKDE)
4.6 The N-Dimensional Kernel Density Estimation
4.6.1 Kernel Density Estimator (Properties)
4.6.2 Asymptotic Mean Integrated Squared Error
4.7 Finding Optimal Bandwidth
4.7.1 Plug-in Bandwidth
4.8 Shape representation using Adaptive kernel density feature points estimator (AKDFPE)
4.8.1 Proposed calculation of the optimal bandwidth
4.8.2 AKDFPE algorithm steps
4.8.3 Example
4.9 Similarity matching
4.10 Evaluation
4.11 Datasets
4.11.1 MPEG 7
4.11.2 General shopping item images
4.12 Query images
4.13 Chapter Summary
CHAPTER 5
5 EXPERIMENTATION, RESULTS AND DISCUSSION
5.1 Experiments
5.2 Pre-processing, segmentation and (dis)similarity selection
5.2.1 Results for pre-processing and segmentation stages
5.2.2 Results for Selection of (dis)similarity method using AKDFPE
5.2.3 Results analysis of pre-processing, segmentation and (dis)similarity techniques
5.3 Effectiveness of KDFPE and other representation methods
5.3.1 Results for Comparison of effectiveness between KDFPE and other methods on standard datasets
5.3.2 Results for Comparison of effectiveness between KDFPE and DHFP on shopping items dataset
5.3.3 Results analysis for effectiveness between KDFPE and other methods
5.4 Image content for shopping items recommender system for mobile users
5.4.1 Results for retrieval system of shopping items for mobile users
5.4.2 Results for Image content for shopping items recommender system for mobile users
5.4.3 Results analysis
5.5 Overall results analysis
CHAPTER 6
6 CONCLUSION, CONTRIBUTION AND FUTURE WORK
6.1 Conclusion
6.2 Summary of contributions
6.3 Future work
References

LIST OF FIGURES

FIGURE 2-1: Classification of Recommender Systems
FIGURE 2-2: Cell phone screen size is small
FIGURE 2-3: Proposed Mobile Recommender System
FIGURE 2-4: Architecture of the Recommender System
FIGURE 3-1: An Overview of Shape Segmentation Techniques
FIGURE 3-2: Sobel Edge Detection Templates
FIGURE 3-3: Two Commonly used Laplacian kernels
FIGURE 3-4: Edge Based Method (Sobel)
FIGURE 3-5: Quadtree Structure for Split and Merge Method
FIGURE 3-6: Region Based Method (Chan & Vese)
FIGURE 3-7: An Overview of Evaluation Techniques
FIGURE 3-9.a: Contour Pixels (8-Connectivity)
FIGURE 3-9.b: Region Pixels (8-Connectivity)
FIGURE 3-10: Hierarchy of the Classification of Shape Representation and Description Techniques
FIGURE 3-11: Examples of Contour Based Techniques
FIGURE 3-12: Directions for 4-connectivity
FIGURE 3-13: 4-directional Chain Code Representation
FIGURE 3-14: Examples of Region Based techniques
FIGURE 3-15: (a) Convex hull and its Concavities (b) Concavity representation tree of the convex hull
FIGURE 3-16: Hierarchy of classification of evaluation techniques for IR systems
FIGURE 3-17: Set Diagram showing elements of Precision and Recall
FIGURE 3-18: Graphs for values in Table 1 and Table 2
FIGURE 3-19: Graphs illustrating the appearance of P-R and ROC curves
FIGURE 4-1: The framework of the retrieval system
FIGURE 4-2: The Image Retrieval Process
FIGURE 4-3: Shows the rings around the centroid of an image
FIGURE 4-4: Segmented object shape
FIGURE 4-5: Distinct images from the Internet
FIGURE 5-1: Samples of shopping items in each category in the dataset
FIGURE 5-2: (b) Sample results of pre-processing and segmentation of images in (a)
FIGURE 5-3: (Dis)similarity method Cosine on the left and Euclidean on the right (KDFPE)
FIGURE 5-4: (Dis)similarity method Cosine on the left and Euclidean on the right (KDFPE)
FIGURE 5-5: Segmented shapes that were considered similar by KDFPE using cosine similarity algorithm
FIGURE 5-6: Average precision-recall on Region Based Test Image Retrieval on 678 object shapes (MPEG 7 CE 2)
FIGURE 5-7: Ten retrieval results of KDFPE on the left and DHFP on the right (query at the top left of the figure)
FIGURE 5-8: Ten retrieval results of KDFPE on the left and DHFP on the right (query at the top left of the figure)
FIGURE 5-9: Average precision-recall chart on General Image Retrieval
FIGURE 5-10: 2-D images of a 3-D shopping item
FIGURE 5-11: (a) set of images difficult to identify (b) set of images easy to identify
FIGURE 5-12: Query image captured by a camera enabled mobile device
FIGURE 5-13: Ten retrieval results of KDFPE
FIGURE 5-14: Average precision-recall on General Image Retrieval (Query captured by cell phone)
FIGURE 5-15: Query Image
FIGURE 5-16: Results from the Shopping Recommender System
FIGURE 5-17: Query Image
FIGURE 5-18: Results from the Shopping Recommender System
FIGURE 5-19: Query image captured by a camera enabled mobile device
FIGURE 5-20: Results from the Shopping Recommender System with GPS coordinates for Retailer
FIGURE 5-21: Query image captured by a camera enabled mobile device
FIGURE 5-22: Results from the Shopping Recommender System with GPS coordinates for Retailer
FIGURE 5-23: Evaluation of the recommender system
LIST OF TABLES

TABLE 2.1: User Rating Data Matrix R
TABLE 3.1: Segmentation techniques summary
TABLE 3.2: Representation techniques summary
TABLE 3.3: Interpretation of (dis)similarity values
TABLE 3.4: Non-metric classification
TABLE 3.5: Examples of metric access methods
TABLE 3.6: Showing the calculation of precision-recall coordinates
TABLE 3.7: 11-Point interpolated average precision
TABLE 3.8: Confusion matrix
TABLE 4.1: Plug-in values for h_rot
TABLE 4.2: Values of constant C_j(K, q)
TABLE 5.1: Comparison of Bull’s Eye Performance on MPEG 7 CE 1 dataset
TABLE 5.2: 6220c cellphone and its camera specifications
TABLE 5.3: Scores to measure satisfaction with performance of the system

List of Publications

During the three years of research, the following papers related to the research work were published or submitted.

Refereed Conference Papers

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2012), Introducing an Adaptive Kernel Density Feature Points Estimator for Image Representation, International Conference on Computer Science, Engineering & Technology (ICCSET), 2-3 June 2012, Zurich, Switzerland, pp. 129-133.

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M.
Ngwira (2012), Enhanced Density Histogram of Feature Points Representation Method, International Conference on Information Retrieval & Knowledge Management (CAMP 12), 13-15, 2012, Kuala Lumpur, Malaysia, pp. 209-213.

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2012), Object Shape Representation by Kernel Density Feature Points Estimator, First International Workshop on Signal and Image Processing (SIP 2012), January 3-4, 2012, Bangalore, India, pp. 209-216.

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Image Shape Representation and Description Techniques, Classification of Available Techniques and Open Issues, Proceedings of 2011 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2011), November 18-20, Guangzhou, China, pp. 186-191.

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), A Review of Image Segmentation Techniques, Challenges and Future Directions, International Conference on Materials Science and Computing Science (MSCS 2011), August 13-14, Wuhan, China, ISSN: 1022-6680.

Refereed Journal Papers

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2012), Introducing an Adaptive Kernel Density Feature Points Estimator for Image Representation: International Journal of Wireless Information Networks & Business Information System (WINBIS), Vol. 3, June 2012, Pages: 124-130, ISSN: 2091-0266

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2012), Image Content in Location-Based Shopping Recommender Systems for Mobile Users: Advanced Computing: An International Journal (ACIJ), Vol. 3, No. 4, July 2012, Pages: 1-8, ISSN: 2229-6727 (Online); 2229-726X (Print)

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Review of Image Shape Representation Methods, Challenges and Future Directions: Canadian Journal on Image Processing and Computer Vision Vol. 3 No.
1, March 2012, Pages: 32-37, ISSN: 1923-1717

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Kernel Density Feature Points Estimator for Content-based Image Retrieval: Signal & Image Processing: An International Journal (SIPIJ), Vol. 4 No. 1, February 2012, Pages: 103-111, ISSN: 0975-5578 (Online); 0975-5934 (Print)

Tranos Zuva, Oludayo O. Olugbara, Sunday O. Ojo and Seleman M. Ngwira (2011), Image Segmentation, Available Techniques, Developments and Open Issues: Canadian Journal on Image Processing and Computer Vision Vol. 2 No. 3, March 2011, Pages: 20-29, ISSN: 1923-1717

Abstract

The general problem of generating recommendations from a recommender system for users is an arduous one. More arduous still is the generation of recommendations for mobile users, because of the limitations of the mobile devices on which the recommendations are to be presented. Mobile devices with an integrated camera can be used to offer online services to the global community whenever and wherever users are located. The mobile user expects to receive, within a few seconds, a limited number of probable recommendations from a shopping recommender system, and these recommendations must be reasonably accurate with respect to the user’s needs. To achieve this objective, a proposed client-server architecture for an image content based shopping recommender system framework over wireless mobile devices was implemented. The image content shopping recommender system performed a query by external image captured by the mobile device’s camera. It then generated a set of recommendations that is viewed on the mobile device using the Internet browser. The image content used to improve recommendation generation is the shape extracted using level sets and active contour without edges methods. An algorithm was developed to represent the extracted shape content such that it is invariant to Euclidean transform and affine transformation, and robust to occlusion and clutter.
The shape invariant content was then used to characterise sales items for effective recommendation generation. A suitable distance measure was applied to the content representation to evaluate the images’ similarity for retrieval purposes. Experimental results were generated and analyzed to test the efficacy of the shape content representation and matching algorithm. Finally, the Image Content in Recommender System for Mobile Users was simulated and evaluated by users.

CHAPTER 1

1 INTRODUCTION

This thesis reports on the development of a mobile recommendation system to intuitively support mobile users in recommendation generation. Recommendation systems belong to the class of information search techniques that have recently been proposed to overcome information overload problems. The fundamental computational task of a recommender system is to predict the subjective evaluation a user will assign to an item (Ricci, 2010). This technology has been successful for web users in providing targeted item recommendations, but only a few such systems have been designed for mobile users (Ricci & Nguyen, 2006).

There are several mobile technologies, including mobile data networks (General Packet Radio Service (GPRS) and Universal Mobile Telecommunications System (UMTS)), Global Positioning Systems (GPS), mobile phones and Personal Digital Assistants (PDAs), that are in use to offer online services to the global community whenever and wherever users are located (Lu & Weng, 2007). These types of services are best suited to mobile users in places they have never been to before, where the user has to make a choice from a number of available options. Potential beneficiaries of these services include tourists, long distance vehicle drivers, business travellers, nomads and individuals who want to access important information on the move.
In particular, the technology is beneficial to disabled people who find it difficult to shop around: with the help of the mobile devices they carry, they can locate items of choice from their current locations. These users want to know where they are, how to get to certain places, and where to find items and/or activities of their choice. At a cheaper cost, recommender systems can be of great help to this group of people in finding places and/or items to their taste where there are many options to choose from (Olugbara, Ojo & Mphahlele, 2010). These users want to satisfy their short term needs, so it is imperative that when they search for information, the output be accurate either immediately or after a short period of feedback. It is important that the user be satisfied with the output from such applications.

Most of the few web-based recommender systems designed for mobile devices run only on PDAs (Palm or Pocket PC), and they are not suitable for the much more popular mobile smart phones (e.g. iPhone, BlackBerry). This is because mobile smart phones have smaller screens and limited keypads, so texting on such a device is extremely difficult. The use of text brings another problem: it has been noted (Boutemedje, Ziou & Bouguila, 2007) that no two people describe the same place and/or item using the same words or in the same way. This ambiguous way of querying a database floods the user with many probable search outputs. Users then take a long time to get what they want, and in many cases they give up, especially mobile users who do not have time to screen several outputs.

The aim of this research work is to find an effective way to represent and retrieve shopping item images from a shopping database for use in the Image Content in Recommender System for Mobile Users.
For this to be achieved, a proposed image content retrieval system that uses images from a camera enabled mobile smart device as the primary input was implemented. The retrieval system was then incorporated into the recommender system. This has the potential to encourage usage of the system by mobile users because it removes ambiguities, reluctance to query due to spelling or sentence construction, and other text related problems.

Image contents that are usually used to describe the syntactic component of an image and to search for images in a database are colour, texture, shape, and their combinations. The effectiveness and efficiency of image content in shopping recommender systems for mobile users depend largely on the content representation, the (dis)similarity model used to evaluate the images’ similarity, and the access method. When images are collected for an image database to be used with the recommender system, it is important that the content be extracted and stored at that point, because content extraction can have high computational complexity (Chan & Vese, 2001; Sharma & Aggarwal, 2010). This reduces processing time at query time, which matters especially for mobile users who have very limited time to wait for retrieval results. In this work, the shape content extracted using level sets and active contour without edges was used. An algorithm was developed to represent the extracted shape content such that it is invariant to Euclidean transform and affine transformation (rotation, scale and translation). The invariant content was then used to characterise shopping items, which is the domain of interest.

1.1 STATEMENT OF PROBLEM

The greatest challenge in recommender systems today is to improve recommendation accuracy and efficiency. This challenge has been compounded by the availability on the market of smaller mobile devices, such as mobile phones and PDAs, with limited interface screen, memory size and processing capability.
Researchers are trying to migrate content-based image recommender systems to these mobile devices. Traditionally, the Internet has served as an information retrieval system that generates information for users; however, the information generated can overload the user. It has been shown (Ricci, 2010) that this is caused by ambiguity in querying and/or by the structure of the database. With the great variety of mobile devices now available, all with limited screen and memory sizes, it is imperative that recommender systems be accurate and efficient in processing queries to the satisfaction of the user. Without improvements in the retrieval systems, a very useful technology will not migrate smoothly to the mobile users who need it most. A mobile user is someone who is limited in time, so the output must be limited in number, because of the constraints of the device in use, and accurate, to reduce the feedback interaction time. This type of user can be described as an impatient user, so the input procedure for querying the database must not delay the user. With this in mind, research is being done to introduce such recommender systems because of their economic benefits to governments, companies and individuals. This study aims to contribute to such efforts. Our research group focuses on a unified conceptualization of three main research endeavours, namely recommendation technology, image processing and mobile computing, to realise an effective recommender system for mobile users. This implies the effective management of three categories of problems: those of mobile computing, those of image representation and retrieval, and those of recommendation technology.
1.2 RESEARCH QUESTION

The research question addressed in this study stems from the above problem statement and is stated as follows:

How could the shopping recommender system be developed so that mobile users are satisfied in terms of retrieval accuracy and retrieval time using query-by-external image?

To put this question in context, we illustrate it with the following mobile retrieval problem.

“Suppose Nyasha leaves home for shopping with a location and camera enabled mobile device. At a nearby shop, she finds an item similar to the item she really wants. She now faces the difficulty of either buying it immediately or continuing to window shop in the hope of finding the item she really wants. The dilemma is that if she does not buy now she might not find it again later, and if she does buy now she might find the item she really wants as she continues window shopping. Consequently, the problem is: with the aid of the camera enabled mobile device carried by Nyasha, how can she be helped to decide whether or not to buy this item, assuming that the shops have online databases of shopping items?”

In order to answer the main research question and give a solution to Nyasha’s problem, the following sub-questions need to be adequately answered:

1. How can image content extracted using Active Contour without Edges be represented for effective use in a shopping recommender system for mobile users?
2. How can camera enabled mobile devices facilitate efficient retrieval of a shopping item image of interest for a mobile user from a shopping database?
3. What (dis)similarity techniques can work effectively in matching similar images in a recommender system queried with images captured by camera enabled mobile devices?
4. What is the effectiveness of the shopping recommender system for mobile users?
1.3 GOAL AND OBJECTIVES

The goal of the research is to evolve an efficient image content representation mechanism and retrieval algorithm for effectively matching sales items whose image content has been extracted by Active Contour without Edges in an Image Content in Shopping Recommender System for Mobile Users. This can be accomplished by implementing the following objectives:

1. To study and compare recommender algorithms, image segmentation, shape representation and (dis)similarity methods
2. To highlight the challenges and open issues in the areas studied in 1 above
3. To propose an image representation technique for effective use in a recommender system for mobile users
4. To measure the effectiveness of the shopping recommender system using query images captured by a camera enabled mobile device
5. To measure user satisfaction with the recommender system

1.4 EXPECTED CONTRIBUTIONS

This work makes the following research contributions:

- a novel approach for item representation based on image shape content
- a novel approach to recommender systems
- an image database query technique using images captured by a camera enabled mobile device
- a highlight of some of the challenges and open issues in the area of image processing and recommender systems.

The use of image content in shopping recommender systems, together with the use of a mobile device to provide the query by external image to the system, is a novel idea. It is novel because most other recommender systems are not suitable for mobile users. The application of shape content extracted using level sets and active contour without edges in shopping recommender systems for mobile users is also novel. Most research in image processing has been done in areas such as medicine, security and remote sensing, but not in e-commerce. This work also contributes by highlighting the problems, challenges and open issues that are still encountered in the area of image recommender systems for mobile users.
The introduction of the novel region based image representation method is also a contribution of this research.

1.5 THESIS STRUCTURE

This thesis is structured as follows:

Chapter 2: Review of related work on recommender systems and discussion of the theoretical framework for this research work.
Chapter 3: Review of related work on image segmentation, representation and retrieval techniques.
Chapter 4: Discussion of shape image content for the mobile recommender system, including the experimental designs for this research work.
Chapter 5: Presentation of the experiments and results, and discussion of the analysis of the experimental results.
Chapter 6: Conclusion, contribution and future work. The achievements, shortfalls and future endeavours are discussed.

CHAPTER 2

2 RECOMMENDER SYSTEMS

Recommender systems belong to a class of personalized information filtering technologies that aim to meaningfully suggest which of the available items or products might be of interest to a particular user (Bogers & Bosch, 2009; Gunawardana & Meek, 2009). These systems make recommendations in three fundamental steps: preference acquisition (acquiring preferences from the user’s input data), recommendation computation (computing recommendations using proper methods) and recommendation presentation (presenting the recommendation to the user) (Huang & Huang, 2009). Based on the techniques used in recommendation computation, existing recommendation systems can be classified into the four fundamental categories shown in Figure 2-1, that is, Collaborative Filtering (CF), Content-Based Filtering (CBF), Knowledge-Based Filtering (KBF) and Hybrid Filtering (HF).
FIGURE 2-1: Classification of Recommender Systems (RS) into Content-Based Filtering (CBF), Collaborative Filtering (CF), Hybrid Filtering (HF) and Knowledge-Based Filtering (KBF)

2.1 COLLABORATIVE FILTERING (CF)

CF systems obtain user feedback in the form of ratings in a given application domain and then exploit similarities and differences among the profiles of several users to generate recommendations (Olugbara et al., 2010). Algorithms for CF recommender systems can be grouped into two general classes: memory based (algorithms that require all ratings, items and users to be stored in memory) and model based (algorithms that periodically create a summary of rating patterns offline) (Chen, Jiang & Zhao, 2010; Schafer, Frankowski, Herlocker & Sen, 2007). Model based algorithms are the most commonly used because they reduce run-time complexity. CF techniques can also be grouped into non-probabilistic and probabilistic algorithms. Probabilistic CF algorithms are based on an underlying probabilistic model; non-probabilistic CF algorithms are not. The non-probabilistic CF algorithms are the most commonly used (Chen et al., 2010; Schafer et al., 2007; Su & Khoshgoftaar, 2009). Nearest neighbour algorithms are well-known non-probabilistic CF algorithms, and they fall into two classes: user-based nearest neighbour and item-based nearest neighbour. CF algorithms use an $m \times n$ ratings matrix $R$ to represent the complete user-item data, where $m$ is the number of users and $n$ is the number of items. Each entry $R_{u,i}$ is the score of item $i$ rated by user $u$ within a certain numerical scale. The matrix is illustrated in Table 2-1 below.

TABLE 2-1: User Rating Data Matrix R

            Item 1      Item 2      ...    Item i      ...    Item n
User 1      R_{1,1}     R_{1,2}     ...    R_{1,i}     ...    R_{1,n}
User 2      R_{2,1}     R_{2,2}     ...    R_{2,i}     ...    R_{2,n}
...         ...         ...         ...    ...         ...    ...
User u      R_{u,1}     R_{u,2}     ...    R_{u,i}     ...    R_{u,n}
...         ...         ...         ...    ...         ...    ...
User m      R_{m,1}     R_{m,2}     ...    R_{m,i}     ...    R_{m,n}

This section discusses the user-based nearest neighbour and item-based nearest neighbour algorithms, followed by the practical challenges of CF algorithms in general.

2.1.1 USER-BASED NEAREST NEIGHBOUR

In user-based nearest neighbour collaborative filtering recommendation systems, the prediction of how much an active user $u$ will like an item is based on ratings from similar users. These users are called the neighbours of $u$. User-based algorithms generate a prediction for an item $i$ by analyzing the ratings for $i$ from users in $u$’s neighbourhood. Suppose we have a user-item rating matrix $R_{m \times n}$, where $m$ is the number of all users, $n$ is the number of all items and $R_{u,i}$ is the score of item $i$ rated by user $u$, showing the user’s degree of preference for the item as in Table 2-1. The most significant step in the user-based nearest neighbour CF algorithm is the search for the neighbours of the target user $u_t$. To find these neighbours, a similarity algorithm is used. The two most widely used similarity methods are cosine similarity and Pearson correlation coefficient similarity. The formula for the Pearson correlation is given in equation 2-1.

\[
\mathrm{Usersim}(u_t, u) = \frac{\sum_{i \in I_{u,u_t}} \left(R_{u,i} - \bar{R}_u\right)\left(R_{u_t,i} - \bar{R}_{u_t}\right)}{\sqrt{\sum_{i \in I_{u,u_t}} \left(R_{u,i} - \bar{R}_u\right)^2}\; \sqrt{\sum_{i \in I_{u,u_t}} \left(R_{u_t,i} - \bar{R}_{u_t}\right)^2}} \tag{2-1}
\]

where $\mathrm{Usersim}(u_t, u)$ represents the similarity between users $u$ and $u_t$, $I_{u,u_t} = I(u) \cap I(u_t)$ is the set of items rated by both $u$ and $u_t$, $R_{u,i}$ and $R_{u_t,i}$ are the scores of item $i$ rated by users $u$ and $u_t$ respectively, and $\bar{R}_u$ and $\bar{R}_{u_t}$ are the average scores of users $u$ and $u_t$ respectively. In the last step, let $N_{u_t}$ denote the neighbour set of the target user $u_t$. To predict the rating of $u_t$ for item $j$, the following equation 2-2 is used.
9 R R un , j u n * sim(u t , u n ) P (u t , j ) Aut userbased | sim(ut u n ) | (2-2) u nN u t where Aut represents the average score for user u t for the rated items, Run , j is the score of item j rated by neighbour user u n , R un means the average score of neighbour u n for the rated items, sim(ut , u n ) means the similarity between user u t and the neighbour u n . This will be used to recommend an item to target user. For cosine based similarity algorithm refer to (Bigdeli, 2008). 2.1.2 ITEM-BASED NEAREST NEIGHBOUR Item-based nearest neighbour algorithms are transpose of the user-based nearest neighbour algorithms. Item-based algorithms create predictions based on similarities between items (Schafer et al., 2007). There are many ways to calculate the similarity between items. Some of the most popular algorithms are cosine based similarity, correlation based similarity and adjusted-cosine similarity. The formula for Adjusted-based cosine which is the most popular and believed to be the most accurate (Schafer et al., 2007; Zhang, Lin, Xiao & Zhang, 2009) is given in equation 2.3. (R u ,i Itemsim(i, j ) R u )( Ru , j R u ) u Ui, j (R u ,i u Ui, j Ru ) 2 R u, j (2-3) Ru ) 2 u Ui, j where Ru ,i and Ru , j represents the rating of user u on items i and j respectively, R u is the mean of the u th user’s ratings and U i , j represents all users who have rated items i and j . 10 The prediction calculation for item based nearest neighbour algorithm for user u and item j is carried out using formula 2-4 below. Itemsim(i, j ) * R P (u t , j ) item based iRu t ut , j (2-4) Itemsim(i, j ) i Ru t If the predicted rating is high then the system recommends the item to user. The item-based nearest neighbour algorithms are more accurate in predicting ratings than user based nearest neighbour algorithms (Schafer et al., 2007). 
2.2 CONTENT-BASED FILTERING

CBF approaches recommend items that are similar in content to items the user liked in the past, or that match the attributes of the user (Melville & Sindhwani, 2010; Pazzani & Billsus, 2007). In content based filtering recommender systems every item is represented by a feature vector or an attribute profile. The features hold numeric or nominal values representing certain aspects of the item such as colour, price, etc. A variety of (dis)similarity measures between the feature vectors may be used to compute the similarity of two items. The Euclidean and cosine (dis)similarity algorithms can be used; they are given in equations 2-5 and 2-6 respectively.

Euclidean dissimilarity

\[
\mathrm{dissim}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \lVert x - y \rVert_2 \tag{2-5}
\]

Cosine similarity

\[
\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i \, y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\; \sqrt{\sum_{i=1}^{n} y_i^2}} \tag{2-6}
\]

where $x$ and $y$ are item vectors with $n$ elements each, and $\mathrm{dissim}(x, y)$ and $\mathrm{sim}(x, y)$ measure the distance apart and the closeness respectively. The (dis)similarity values are then used to obtain a ranked list of recommended items. These approaches are rooted in information retrieval, because the content associated with the user’s preferences is treated as a query and unrated objects are scored by their similarity to the query. This approach can give recommendations in any domain. Content based recommender systems work well if the items can be properly represented as a set of features.

2.3 KNOWLEDGE BASED RECOMMENDER SYSTEMS

Knowledge based systems use a knowledge structure to make inferences about the user’s needs and preferences (Ricci, 2010). Knowledge based approaches are distinguished by their functional knowledge: they have knowledge about how a particular item satisfies a particular user need, and can therefore reason about the relationship between a need and a possible recommendation (Gemmis, Iaquinta, Lops, Musto, Narducci & Semeraro, 2009). The user profile can be any knowledge structure that supports this inference.
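Looking back at the content-based measures of Section 2.2, the Euclidean dissimilarity of equation 2-5 and the cosine similarity of equation 2-6 can be sketched as follows. This is an illustrative sketch only: the function names, the two-element feature vectors and the item names are hypothetical, and feature vectors are assumed to be plain lists of numbers of equal length.

```python
from math import sqrt

def euclidean_dissim(x, y):
    """Euclidean dissimilarity (equation 2-5): smaller means more alike."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_sim(x, y):
    """Cosine similarity (equation 2-6): larger means more alike."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_x * norm_y)

def rank_by_dissimilarity(query, items):
    """Return item names ordered from most to least similar to the query vector."""
    return sorted(items, key=lambda name: euclidean_dissim(query, items[name]))

# Hypothetical item profiles: name -> feature vector
items = {"sandal": [0.9, 0.25], "handbag": [0.1, 0.9]}
print(rank_by_dissimilarity([1.0, 0.2], items))
```

Ranking by Euclidean dissimilarity is sensitive to the magnitude of the feature values, whereas cosine similarity compares only the direction of the vectors, so the two measures can order the same items differently.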
2.4 HYBRID RECOMMENDER SYSTEMS

A hybrid is a combination of at least two techniques, intended to overcome the deficiencies of a single method used in isolation (Pazzani & Billsus, 2007). One way is to combine content based and collaborative filtering algorithms so that they produce separate ranked lists of recommendations, which are then merged to make up the final recommendations (Melville & Sindhwani, 2010). Some notable examples are the Weighted and Switching hybrid recommender systems. A weighted hybrid recommender is one in which the score of a recommended item is calculated from the results of all the recommendation algorithms available in the system. For example, the simplest weighted hybrid recommender system would be a linear combination of recommendation scores. A Switching Hybrid (SH) recommender system uses some criterion to switch between recommendation techniques. An example of an SH recommender system is the DailyLearner, which uses a content/collaborative hybrid. In this hybrid the content based recommendation algorithm is employed first, followed by the collaborative algorithm if the first results are not satisfactory (Burke, 2002; Ghazanfar & Prugel-Bennett, 2010).

2.5 CHALLENGES OF RECOMMENDATION TECHNIQUES

Collaborative filtering recommender systems have been very successful in the past, but their extensive use has exposed some real challenges. These include data sparsity, the cold start problem, fraud, scalability, gray sheep, shilling attacks and synonymy (Chen et al., 2010; Melville & Sindhwani, 2010; Sarwar, Karypis, Konstan & Riedl, 2002; Su & Khoshgoftaar, 2009).

Data Sparsity: In practice, many commercial recommender systems are used to evaluate very large item sets (e.g. Amazon.com, CDnow.com). In these systems, even active users may have purchased less than one percent of the items (1% of two million books is 20 000 books).
The user-item matrix used for CF will therefore be extremely sparse, and a recommender system based on nearest neighbour algorithms may be unable to make any item recommendations for a particular user. The system becomes very ineffective. Data sparsity also leads to reduced coverage and to the neighbour transitivity problem (Schafer et al., 2007; Su & Khoshgoftaar, 2009). Coverage can be defined as the percentage of items for which the system can provide recommendations. The reduced coverage problem arises when the number of users’ ratings is very small compared with the large number of items in the system, so that the recommender system may fail to generate recommendations for those items. Neighbour transitivity refers to a problem with sparse databases in which users with similar tastes may not be identified if they have not rated the same items. Content based approaches can mitigate these problems since they do not require ratings from other users.

Cold Start Problem: This describes a situation in which a recommender system is unable to make meaningful recommendations due to an initial lack of ratings. Cold start occurs when a new user or item has just entered the system and it is very difficult to find similar ones because of inadequate information. New items cannot be recommended until some users rate them. The new item problem affects collaborative filtering recommender systems. Since content based filtering recommender systems do not depend on ratings from other users, they can be used to produce recommendations for all items, provided the attributes of the items are available. New users are very unlikely to be given good recommendations because they lack a rating or purchase history. Research on the new user problem focuses on effectively selecting items for the user to rate, so that the user’s preferences are learned quickly and recommendation performance improves (Melville & Sindhwani, 2010).
Scalability: When the population of existing users and items grows tremendously, traditional recommender system algorithms suffer serious scalability problems, with computational resource requirements going beyond practical or acceptable levels.

Synonymy: This arises when a number of identical or very similar items have different names and the recommender system fails to discover this latent association, and so treats these products differently.

Gray Sheep and Black Sheep: Gray sheep are users whose opinions do not consistently agree or disagree with any group of people and who thus do not benefit from the system. The gray sheep problem is also responsible for an increased error rate in collaborative filtering recommender systems (Ghazanfar & Prugel-Bennett, 2011), which often results in the failure of recommender systems. Black sheep are users who correlate with no, or very few, other people. This makes it very difficult to make recommendations for them (Gemmis et al., 2009).

Fraud: Recommender systems are increasingly being adopted by commercial websites because of their economic benefits to retailers and service providers. Unprincipled competing vendors have started to engage in different forms of fraud in order to manipulate recommender systems to their advantage. They attempt to inflate the perceived attractiveness of their own commodities (push attacks) or to reduce the ratings of their rivals (nuke attacks). These attacks are also known as shilling attacks (Melville & Sindhwani, 2010; Su & Khoshgoftaar, 2009).

With all these challenges encountered in the use of recommendation systems, there is a need to evaluate the performance of the developed systems. Evaluation makes it possible to determine the accuracy of the systems.

2.6 EVALUATION METRICS FOR RECOMMENDER SYSTEMS

The performance of a recommender system can be evaluated by comparing its recommendations to a test set of known user ratings.
These systems are commonly measured using predictive accuracy metrics, where the predicted ratings are directly compared to actual user ratings (Melville & Sindhwani, 2010). The commonly used metrics are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), as formulated in equations 2-7 and 2-8 respectively (Melville & Sindhwani, 2010).

\[
\mathrm{MAE} = \frac{\sum_{u,i} \left| P_{u,i} - R_{u,i} \right|}{N} \tag{2-7}
\]

\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{u,i} \left( P_{u,i} - R_{u,i} \right)^2}{N}} \tag{2-8}
\]

where $P_{u,i}$ is the predicted rating of user $u$ for item $i$, $R_{u,i}$ is the actual rating and $N$ is the total number of ratings in the test set. Predictive accuracy metrics treat all items equally.

2.7 MOBILE RECOMMENDER SYSTEMS

With the ever-growing Information Communication Technology (ICT) market, several mobile technologies are available and accessible to mobile users, allowing them to stay connected to service networks while on the move. These devices are being used to offer online services to the global community wherever and whenever users are located. Most of these technologies are handheld wireless devices. Among these devices, the cell phone is becoming a primary platform for information access for online mobile users (Gabbouj, Ahmad, Amin & Kiranyaz, 2005; Ricci, 2010). Mobile users are discouraged from shopping for products online when they have to browse pages and pages (categories and subcategories) of information from an e-shop in order to find the products of their choice. The more time the user spends browsing, the higher the cost in terms of time, money (for the wireless data network) and health (the screen is very small). The small screen size of these handheld wireless devices requires the user to scroll up and down looking for information. Recommender systems were introduced to solve some, if not all, of the problems encountered by mobile users. To do so they must give mobile users direct access to highly relevant information in order to minimize the connection time, the time spent browsing for specific item(s) and the amount of user input.
Recommender systems are information filtering and decision support tools aimed at addressing problems encountered by online browsers. Recommender systems have been applied in many diverse areas including e-commerce, advertising, news, document management and e-learning (Huang & Huang, 2009). They are one of the most popular tools provided in e-commerce to match customer shopping needs with merchant offers (Yang, Cheng & Dia, 2008). Recommender systems enhance e-commerce sales in three ways: by converting browsers into buyers, by enabling cross-selling and by building loyalty (Schafer, Konstan & Riedl, 1999). Visitors often browse an e-commerce website without the intention of buying anything. A recommender system that has been monitoring the browser may catch the browser’s eye by recommending an item of interest, thus turning a casual browser into a buyer. A cross-sell can take place when a recommender system recommends an additional item based on the products already in the shopping cart. Recommender systems improve loyalty by creating a value added relationship between the site and the customer. Customers usually return to the site that best matches their needs. The more the customer uses a recommender system, the more the recommender system learns about the customer, and a bond is created between the customer and the site. The customer becomes loyal to the site, which leads to more sales. To differentiate them from the recommender systems that have been successful on Personal Computers (PC) (Ricci, 2010), the recommender systems for mobile devices will be referred to as Mobile Recommender Systems (MRS). The rest of this chapter reviews the challenges and open issues of MRS and discusses the proposed MRS.

2.8 MOTIVATION FOR MOBILE RECOMMENDER SYSTEMS

A recommender system that utilises image retrieval techniques can be classified as a content based filtering recommender system.
Image content such as colour, shape, texture and motion is used for knowledge representation instead of related terms and keywords (Olugbara et al., 2010). Most existing recommender systems use a text-based interface for interaction and for the visualization of recommendations (Olugbara et al., 2010). Searching with an actual image would be ideal, since the ambiguities of text are removed. Images can have content that text alone cannot adequately convey, which makes the integration of image retrieval and content-based filtering techniques suitable for addressing the deficiencies of text-based recommender systems. Content-based and collaborative recommender systems have achieved considerable success, but they do not take the location of the user into consideration (Yang et al., 2008). Now that mobile devices can connect to service networks through wireless networks, recommender systems are required to adapt to a mobile user environment. This is why there is a need for mobile recommender systems for mobile users. Mobile recommender systems can be categorized by positioning them along three basic dimensions, that is, user mobility, device portability and wireless connectivity (Ricci, 2010). User mobility requires that the user has access to a mobile recommender system in different places. Device portability implies that the device used by the mobile user to access mobile information can be carried from one place to another without much trouble. Wireless connectivity implies that the device used by the mobile user to access the mobile information system is networked by means of a wireless technology such as Wi-Fi, Bluetooth or UMTS.

2.8.1 RECOMMENDATION SYSTEMS FOR MOBILE USERS

To enable the migration of recommender systems to the mobile environment, there are challenges that need to be taken into consideration.
These include the limitations of the mobile devices, the limitations of the wireless networks, the impact of the external environment and the behavioural characteristics of the mobile users (Ricci, 2010). Notwithstanding these challenges, there are capabilities of these mobile devices that can be exploited. These include the capability of giving the user’s physical position, for example through the Global Positioning System (GPS) and Radio-Frequency Identification (RFID); the ability to deliver information and services to mobile users whenever they are needed and wherever the user is (omnipresence) (Ricci, 2010); and the ability to capture images of interest. Mobile computing can be defined as a form of human (mobile user)-computer (mobile device) interaction in which the computer is expected to be transported during usage. Three aspects of mobile computing can be distinguished: mobile communication, mobile hardware and mobile software. In this case, a mobile user accesses the recommender system with a mobile device connected to a wireless network. As mentioned before, mobile phones are becoming the primary platform for information access for online mobile users. A limitation of these devices is the screen size, as can be appreciated in Figure 2-2. Recommendation sessions on a small screen can be a daunting task and very frustrating for the users. The size of the display can have a negative impact on the user. It is known that users are capable of reading and understanding the information offered by these small interfaces, but they have to do extensive scrolling (Ricci, 2010). By comparison, a user of a small screen is less effective in completing a task than a user of a large screen (Gabbouj et al., 2005; Ricci, 2010). These devices also have small keypads. Most existing mobile phones have only a twelve-key numeric keypad, which is difficult to work with. The batteries of mobile devices (cell phones) have a limited operation period.
Another limitation is the lack of system resources such as processing power and memory capacity.

FIGURE 2-2: Cell phone screen size is small

The wireless connection to these devices should be reliable enough for users to complete their search; otherwise loss of connectivity can frustrate the user. In addition, the mobile Internet lacks (de facto) standardization of browsing tools (Ricci, 2010). These are some of the challenges of using mobile devices to access information online. Researchers are proposing solutions to make mobile recommender systems a reality and acceptable to mobile users who use the mobile device as the primary platform for accessing the recommender system (Gabbouj et al., 2005; Ge, Xiong, Tuzhilin & Xiao, 2010; Guldogan & Gabbouj, 2005; Heijden, Kotsis & Kronsteiner, 2005; Olugbara et al., 2010; Ricci, 2010; Yang et al., 2008). The directions pursued include energy efficient mobile recommender systems, effective mobile recommender systems and efficient recommender systems, to mention but a few.

2.9 ARCHITECTURE OF MOBILE RECOMMENDATION SYSTEM

In this research, the following problems are addressed in recommender systems for mobile users, while taking advantage of the position detection capability of mobile smart devices:

1. The problem of text usage in querying mobile recommender systems, addressed by taking advantage of camera enabled mobile devices
2. The problem of finding a shopping item that can be of interest to the mobile user.

An abstract illustration is shown in Figure 2-3.

FIGURE 2-3: Proposed Mobile Recommender System

In Figure 2-3 no text is used up to the time the mobile user gets the recommendation. The mobile user either selects one of the recommended item images or decides to capture an image of interest from the shopping items available. The recommender system then uses the image sent by the user to recommend an item in the category of that image.
In Figure 2-3 the item that was finally recommended is a shoe. The proposed overall architecture of the mobile recommender system for mobile users is given in Figure 2-4.

FIGURE 2-4: Architecture of the Recommender System

The client-server architecture of the recommender system is shown in Figure 2-4. On the client side (the mobile device side) there are two main components: the Location-Image manager and the Internet browser. The Internet browser is used by the client to request service from the server via the Internet (e.g. when the client needs the service of the recommender system). The Location-Image manager sends the location of the client and the image of interest to the recommender system. On the server side there is a recommender engine that consists of the on-line recommendation generator and an off-line interest profile generator. The off-line interest profile generator tracks the user’s purchases in order to generate the user interest profile. This enables the system to recommend items that are similar in content to the items the user liked in the past. Thus the system keeps a database of users’ interest profiles. The on-line recommendation generator maintains the customer profile and retailer databases. The retailer database consists of the retailer’s information, such as the shopping items and the GPS coordinates of the retailer’s location. When a client initiates a request, the on-line generator recommends shopping items in the category of the image sent by the client, based on the client’s interest profile. The client then receives the recommended items together with the GPS coordinates of the retailer. When the client clicks on the coordinates, the GPS gives directions to the retailer’s physical location. If the client is buying on-line, the retailer receives the request from the client together with the GPS position of the client in order to facilitate delivery.
The mobile recommender system will be simulated on a PC, but the content based retrieval component of the mobile recommender system will be implemented and tested on shopping items. For the recommender system to know the category of the item image captured by the camera enabled mobile device, a number of background processes take place. These processes are:

1. Image Segmentation (segmentation of the image selected or captured by the camera enabled mobile device)
2. Image Representation (representation of the image selected or captured by the camera enabled mobile device)
3. Image Matching (matching the image selected or captured by the camera enabled mobile device against the shop item images in the database)
4. Image Ranking (ranking the image items according to the user profile)

The images will be segmented using level sets and active contours without edges, and a new representation method will be proposed that is more accurate and effective in representing images. A suitable similarity method will be selected from among the metric and non-metric methods. The system will be implemented and tested on sample data. The following chapter reviews the segmentation, representation and similarity algorithms. The challenges and open issues of image processing are highlighted.

CHAPTER 3

3 IMAGE SEGMENTATION, REPRESENTATION AND RETRIEVAL

In order to have effective and efficient image content in a shopping recommender system for mobile users, it is imperative to select, create or improve image segmentation, representation and retrieval algorithms that are suitable for the shopping items domain or for a generic domain. There are numerous algorithms available in the literature to segment, represent and retrieve images from image databases. In this chapter the algorithms for image segmentation, representation and retrieval are reviewed in order to determine their suitability for different applications.
The classifications, advantages and disadvantages of the algorithms, as well as the challenges and open issues in the areas of image segmentation, representation and retrieval, are discussed. This chapter makes it possible to decide whether to select, improve or create algorithms for use in the Image Content in Shopping Recommender System for Mobile Users.

3.1 IMAGE SEGMENTATION TECHNIQUES

The prime goal of image segmentation is the domain independent partitioning of an image into a set of disjoint regions that are visually different, homogeneous and meaningful with respect to some characteristics such as grey-level, texture or colour, to enable easy image analysis (object identification, classification and processing) (Freixenet, Munoz, Raba, Marti & Cufi, 2002; Lucchese & Mitra, 2001; Wang, Guo & Zhu, 2007). The formal definition of image segmentation is as follows (Lucchese & Mitra, 2001). Let the image domain be $\Omega$ and let $P_i$, $i = 1, \dots, n$, be partitions of $\Omega$ such that

$$P_i \neq \emptyset, \qquad \bigcup_{i=1}^{n} P_i = \Omega, \qquad H(P_i) = \text{true} \;\; \forall i, \qquad H(P_i \cup P_j) = \text{false for } P_i \text{ and } P_j \text{ adjacent} \tag{3-1}$$

where $P_i \cap P_j = \emptyset$ for $i \neq j$, each $P_i$ is connected, and $H$ denotes the homogeneity predicate.

Discontinuity and similarity/homogeneity are two basic properties of image pixels in relation to their local neighbourhood that are used in many segmentation methods. The segmentation methods that are based on the discontinuity property of pixels are considered boundary or edge based techniques, and those that are based on similarity or homogeneity are region based techniques. We have intentionally separated the thresholding technique from the region based techniques because of its use of the histogram and its simplicity in application (Freixenet et al., 2002). Hybrid techniques are derived from the integration of edge and region based information (Wang et al., 2007).
Image segmentation surveys have been conducted, but few have presented how researchers can evaluate one technique against another on domain independent images or evaluate the performance of their segmentation (Zhang, 2001; Min, Powell & Bowyer, 2004; Udupa, LeBlanc, Zhuge, Imielinska, Schmidt, Currie, Hirsch & Woodburn, 2006). Many surveys have been directed to a single application area of image segmentation, such as medical imaging, remote sensing or image retrieval (Freixenet et al., 2002; Lucchese & Mitra, 2001; Deb, 2008). This section is organized as follows: Thresholding Methods, Boundary/Edge Based Methods, Region Based Methods, Performance Evaluation and Summary. FIGURE 3-1 indicates the classification of image segmentation techniques considered in this chapter. Image segmentation is not an easy task because of image noise, weak object boundaries, inhomogeneous object regions, weak contrast and many other factors that affect images.

FIGURE 3-1 An Overview of Shape Segmentation Techniques

3.1.1 THRESHOLDING METHOD

Thresholding based image segmentation aims to partition an input image into pixels of two or more values by comparing each pixel value individually with a predefined threshold value $T$. Let $I(i, j)$ be the thresholded image; then

$$I(i, j) = \begin{cases} 0, & p(i, j) < T \\ 1, & p(i, j) \geq T \end{cases} \tag{3-2}$$

where $p(i, j)$ refers to the pixel value at the position $(i, j)$. Thresholding may be implemented locally or globally. In global thresholding the image is partitioned into two classes as shown in Eq. 3-2. In local thresholding the image is subdivided into subimages and the threshold for each subimage is derived from the local properties of its pixels. The choice of the predefined value of $T$ is what complicates this method. The determination of the value of $T$ has been a point of interest in image segmentation research (Cheriet, Said & Suen, 1998; Dawoud & Kamel, 2004; Hu, Hoffman & Reinhardt, 2001).
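The difference between global and local thresholding can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, pixels at or above $T$ are mapped to 1, and the local threshold is taken to be the mean of each subimage, which is just one of many possible local properties.

```python
def global_threshold(image, T):
    """Binarise a greyscale image (list of rows) with one threshold T.

    Assumption: pixels at or above T map to 1, all others to 0.
    """
    return [[1 if p >= T else 0 for p in row] for row in image]


def local_threshold(image, block):
    """Binarise each block x block subimage with its own threshold.

    Assumption: the local threshold is the mean pixel value of the subimage.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Collect the coordinates of the current subimage.
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            T = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = 1 if image[r][c] >= T else 0
    return out
```

With a dark left half and a bright right half, a single global threshold labels each half uniformly, whereas the local version separates the darker and brighter pixels inside each half.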
Many algorithms have been developed to generate a better threshold value $T$ for segmenting an image (Dawoud & Kamel, 2004). Methods that use only intensity values ignore the spatial morphological information of an image, and they usually fail to segment objects with low contrast or noisy images with a varying background (Rekik, Zribi, Hamida & Benjelloun, 2009). Failure to find the most suitable algorithm to determine the threshold value might result in one or all of the following:

- The segmented region might be smaller or larger than the actual region
- The edges of the segmented region might not be connected
- Over or under-segmentation of the image (arising of pseudo edges or missing edges)

3.1.2 EDGE BASED METHODS

Edge based segmentation is the location of pixels in the image that correspond to the boundaries of the objects seen in the image. It is assumed that, since a boundary encloses a region or an object, it is closed and that the number of objects of interest is equal to the number of boundaries in an image. For the segmentation to be precise, the perimeter of the boundaries detected must be approximately equal to that of the object in the input image. To implement the above, an edge in an image needs to be defined. An edge or a linear feature is manifested as an abrupt change or a discontinuity in the digital number of pixels along a certain direction in an image. It appears as a high gradient or an extreme of the first order derivative, or as a zero crossing of the second derivative. This leads to another assumption: that every object of interest in an image has a boundary that can be detected through the use of the gradient or the second derivative. Examples of edge based segmentation algorithms are the Sobel, Prewitt, Kirsch, Laplacian and active contour methods, to mention a few.
These segmentation methods use the gradient, or templates based on the first or second derivative, to detect the boundaries of an image (Chan & Vese, 2001; Kekre & Gharge, 2010).

Sobel, Prewitt and Kirsch use templates based on the gradient to detect the edges of an image. These operators use a pair of kernels to detect edges. For example, the Sobel operator usually consists of a pair of 3×3 convolution kernels as shown in Figure 3-2. The Sobel edge detection algorithm is suitable for detecting boundaries along the horizontal and vertical axes because of the structure of the templates used.

$$G_x = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}$$

FIGURE 3-2: Sobel Edge Detection Templates

From the kernels, the result of the Sobel operator at an image pixel that falls in a region of constant image intensity is a zero vector, and at a pixel on the boundary it is a vector that points across the edge (Kekre & Gharge, 2010). Typically the Sobel algorithm is used to obtain the approximate absolute gradient magnitude at each pixel of an input greyscale image. The absolute gradient magnitude may be calculated using, for example, either Equation 3-3 or Equation 3-4 (Kekre & Gharge, 2010; Lakshmi & Sankaranarayanan, 2010):

$$|G| = |G_x| + |G_y| \tag{3-3}$$

or

$$|G| = \sqrt{G_x^2 + G_y^2} \tag{3-4}$$

In general this is how these gradient based algorithms work.

The active contour models can also be classified as boundary based segmentation methods. In active contour or deformable models, the user specifies an initial contour which is then moved by image driven forces towards the boundaries. Generally these methods can be defined by a function g(x) that acts as a stopping term when the object/region boundary has been reached.
The function $g$ can be defined (Airouche, Bentabet & Zelmat, 2009; Liu, 2006) as a function satisfying

$$g(z) > 0 \quad \text{and} \quad \lim_{z \to \infty} g(z) = 0.$$

For instance,

$$g(|\nabla u(x, y)|) = \frac{1}{1 + |\nabla G_\sigma(x, y) * u(x, y)|^{p}}, \qquad p \geq 1 \tag{3-5}$$

where $G_\sigma * u$ is the convolution of the image $u$ with the Gaussian filter $G_\sigma$, which results in a smoother version of the image $u$, where

$$G_\sigma(x, y) = \sigma^{-1/2} \, e^{-|x^2 + y^2| / (4\sigma)} \tag{3-6}$$

so that

$$g(|\nabla u(x, y)|) \begin{cases} > 0, & \text{homogeneous region} \\ \approx 0, & \text{edge} \end{cases} \tag{3-7}$$

The Laplacian is a 2-D isotropic measure of the second spatial derivative of an image. The Laplacian of an image highlights regions of rapid intensity change and is thus used for boundary detection (zero crossing edge detector). The Laplacian $L(x, y)$ of an image with pixel intensity values $I(x, y)$ is given by Equation 3-8:

$$L(x, y) = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \tag{3-8}$$

The change of the grey level on the boundary of an image gives a maximum or minimum value of the first partial directional derivative near the image edge and a second partial derivative of zero. When using the Laplacian method the aim is to find the zero positions, which constitute the boundaries of the image (Huang & Jiang, 2009). Since the input image is represented as a set of discrete pixels, a discrete convolution kernel that approximates the second partial derivatives in the definition of the Laplacian must be found. The two commonly used kernels are shown in Figure 3-3.

$$\text{(a)} \; \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \qquad \text{(b)} \; \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

FIGURE 3-3: Two Commonly Used Laplacian Kernels

Using one of these kernels, the Laplacian can be calculated by convolution. The problems that have been of interest to researchers are centred on the use of the gradient to detect the boundaries (Chan & Vese, 2001). For instance, these methods have problems with images that are edge-less or very noisy, and with boundaries that are very smooth or defined by texture.
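The kernel based operators of this section can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names are hypothetical, the image is a plain list of rows, only interior pixels are handled, and the kernels are applied as a weighted neighbourhood sum without flipping (which leaves the gradient magnitude and the symmetric Laplacian kernel unaffected).

```python
import math

# Sobel templates, labelled as in Figure 3-2.
SOBEL_GX = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
SOBEL_GY = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
# 4-neighbour Laplacian kernel from Figure 3-3.
LAPLACIAN_4 = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]


def apply_kernel(image, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on interior pixel (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def gradient_magnitude(image, r, c, exact=False):
    """Sobel gradient magnitude at (r, c).

    exact=False uses |Gx| + |Gy| (Eq. 3-3); exact=True uses the
    square root of Gx^2 + Gy^2 (Eq. 3-4).
    """
    gx = apply_kernel(image, SOBEL_GX, r, c)
    gy = apply_kernel(image, SOBEL_GY, r, c)
    return math.hypot(gx, gy) if exact else abs(gx) + abs(gy)


def laplacian(image, r, c):
    """Discrete Laplacian response at (r, c) using the 4-neighbour kernel."""
    return apply_kernel(image, LAPLACIAN_4, r, c)
```

On a region of constant intensity both responses are zero, while a pixel next to a vertical step edge gives a non-zero gradient magnitude, in line with the behaviour described for the Sobel operator.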
Other problems of these techniques emanate from the failure to adjust or calibrate the gradient function accordingly, which produces undesirable results such as:

- The segmented region might be smaller or larger than the actual region
- The edges of the segmented region might not be connected
- Over or under-segmentation of the image

FIGURE 3-4: Edge Based Method (Sobel)

FIGURE 3-4 illustrates some of the problems that are encountered in the use of edge based methods. Some edges of FIGURE 3-4a are missing in FIGURE 3-4b, and this causes problems in post-segmentation image processing, for instance in retrieval or registration.

3.1.3 REGION BASED METHODS

Region based segmentation is the partitioning of an image into similar or homogeneous areas of connected pixels through the application of homogeneity or similarity criteria among candidate sets of pixels. The pixels in a region are similar with respect to some characteristic or computed property such as colour, intensity and/or texture. The assumption in these techniques is that the partitions that are formed correspond to objects or meaningful parts of the image. According to Wang et al. (2007), the most commonly used techniques are the following: Thresholding, Region Growing, Split and Merge, Classifiers and Clustering.

Split and Merge segmentation methods have the common characteristic of starting with an initial inhomogeneous partitioning of the image (usually the whole image). The main goal of these methods is to distinguish the homogeneous parts of the image. The concept of the split and merge method is based on the quadtree representation, in which each node of the tree has four descendants and the root of the tree is the whole image, as shown in Figure 3-5.

[Quadtree: root $I_R$ with descendants $I_{P1}$, $I_{P2}$, $I_{P3}$, $I_{P4}$; $I_{P4}$ is further split into $I_{P41}$, $I_{P42}$, $I_{P43}$, $I_{P44}$]

FIGURE 3-5: Quadtree Structure for Split and Merge Method

In Figure 3-5, $I_R$ represents the entire image region, which is subdivided into four descendants.
This process of splitting the regions of the image continues until homogeneous partitions are obtained. After the splitting phase, the merging stage starts to connect the fragmented regions that satisfy the condition of homogeneity. After the merging phase the final segmented image is produced (Sharma & Aggarwal, 2010).

Clustering image segmentation algorithms are usually unsupervised and do not depend on training or training data. These methods create classes or partitions on an image without any a priori knowledge. Clustering algorithms are commonly divided into two general classes: hierarchical and partitional algorithms. Hierarchical clustering techniques create a cluster tree by means of heuristic splitting or merging procedures. Partitional clustering techniques divide the input data into a number of clusters that is determined in advance. The whole process is driven by the minimization of a certain goal function, for example a square error function (Malyszko & Wierzchon, 2007). The two most popular clustering algorithms are K-means (or hard C-means) and fuzzy C-means (Lucchese & Mitra, 2001; Sharma & Aggarwal, 2010). The K-means algorithm produces results that correspond to a hard segmentation, while fuzzy C-means produces a soft segmentation. A soft segmentation can be converted to a hard segmentation by assigning each pixel to the cluster in which it has the maximum membership coefficient. These two methods belong to the partitional algorithms that use a number of "centres" to represent and group the input data. The general iterative model for partitional centre-based clustering algorithms has the following steps:

1. Initialize by assigning some values to the cluster centres.
2. For each data point $x_i$, compute its membership value $m(c_j | x_i)$ to all clusters $c_j$ and its weight $w(x_i)$.
3. For each cluster centre $c_j$, recalculate its location taking into account all points $x_i$ assigned to this cluster according to the membership and weight values:

$$c_j = \frac{\sum_{i=1}^{n} m(c_j | x_i) \, w(x_i) \, x_i}{\sum_{i=1}^{n} m(c_j | x_i) \, w(x_i)} \tag{3-9}$$

4. Repeat steps 2 and 3 until some termination criteria are met (Samma & Salam, 2009).

The K-means algorithm, which is one of the partitional algorithms, minimizes the objective function in Equation 3-10:

$$KM(X, C) = \sum_{i=1}^{n} \min_{j \in \{1, \dots, k\}} \| x_i - c_j \|^2 \tag{3-10}$$

Here $w(x_i) = 1$ for all $i$, and the membership function is defined according to the "winner takes all" rule, that is, an object belongs to the class with the nearest centre (Malyszko & Wierzchon, 2007). The fuzzy K-means algorithm is based on the minimization of the objective function in Equation 3-11:

$$FKM(X, C) = \sum_{i=1}^{n} \sum_{j=1}^{k} m_{ij}^{r} \, \| x_i - c_j \|^2 \tag{3-11}$$

The value of the parameter $r$ should be constrained to $r \geq 1$ (Vasuda & Satheesh, 2010).

Region growing is a widely used classical segmentation technique. The basic idea of region growing is to collect pixels with similar properties to form a region. Commencing from some seed point, region growing methods segment images by incrementally recruiting pixels to a region based on some predefined criteria. Two important segmentation criteria are value similarity and spatial proximity (Kirbas & Quek, 2003; Tang, 2010). Region growing based segmentation models share the following assumptions about the image pixel properties (Wang, He, Mishra & Li, 2009):

- The intensity values within each region/object conform to a Gaussian distribution
- The mean intensity value for each region or object is different (global mean)

The Gaussian probability distribution function (pdf) for region $i$ is given as follows:

$$p_{\mu_i, \sigma_i}(u) = \frac{1}{\sqrt{2 \pi \sigma_i^2}} \, e^{-\frac{(u - \mu_i)^2}{2 \sigma_i^2}} \tag{3-12}$$

where $\mu_i$ is the mean and $\sigma_i^2$ is the variance. With this type of segmentation, the problems of discontinuous edges and of failing to segment objects without edges are eliminated.
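The four steps of the partitional centre-based model, specialised to K-means (unit weights and "winner takes all" membership), can be sketched for scalar pixel intensities as follows. This is a hedged illustration: the function names are hypothetical, the initial centres are supplied by the caller, and termination is assumed to occur when the centres stop changing or after a fixed number of iterations.

```python
def kmeans_1d(values, centres, max_iter=100):
    """K-means on scalar intensities; returns (centres, labels)."""
    centres = list(centres)                      # step 1: initial centres
    labels = []
    for _ in range(max_iter):
        # Step 2: winner-takes-all membership (nearest centre).
        labels = [min(range(len(centres)), key=lambda j: (x - centres[j]) ** 2)
                  for x in values]
        # Step 3: Eq. 3-9 with w(x_i) = 1 reduces to the mean of the members.
        new_centres = []
        for j in range(len(centres)):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        # Step 4: stop when the centres no longer move.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


def km_objective(values, centres):
    """Objective of Eq. 3-10: sum of squared distances to the nearest centre."""
    return sum(min((x - c) ** 2 for c in centres) for x in values)
```

For two well separated groups of intensities the centres settle on the group means after one update, and the labels give the corresponding hard segmentation.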
The boundary of an object can be identified using the edge/boundary pixels of a region, which ensures that the boundary is closed and makes the segmentation of objects without edges possible. One of the region based techniques, "Active Contour without Edges" introduced by Chan & Vese, can detect contours with or without edges. These methods are capable of detecting and preserving boundaries without the need to smooth the input image, even when it is very noisy. Images with smooth boundaries no longer cause any problems (Chan & Vese, 2001). A great deal of interest has been shown in perfecting these methods and encouraging results have been produced. For instance, Jundong Liu argued that the global mean used by Chan & Vese in their model was not the best choice for medical images. The argument centred on the Chan & Vese model, which defines an evolving curve $C$ in $\Omega$ and an energy functional $F$. The Chan & Vese model minimizes the energy functional defined as follows:

$$F(c_1, c_2, C) = \mu \cdot \text{Length}(C) + \nu \cdot \text{Area}(\text{inside}(C)) + \lambda_1 \int_{\text{inside}(C)} |u - c_1|^2 \, dx \, dy + \lambda_2 \int_{\text{outside}(C)} |u - c_2|^2 \, dx \, dy \tag{3-13}$$

where $c_1$ and $c_2$ are the averages of $u$ inside $C$ and outside $C$ respectively. The values $c_1$ and $c_2$ of the above energy functional are global values computed from the entire image $u$. In his paper "Robust Image Segmentation using Local Median", Liu argued that the drawbacks that existed in most region based active contours were overcome. The paper indicates that the drawbacks originate from the assumptions that the intensity values within each region globally conform to a Gaussian distribution and that the global mean is sufficient as a discriminating measure. In order to improve the region based segmentation, Liu minimized the following energy functional:

$$F(f_1, f_2, C) = \mu \cdot \text{Length}(C) + \nu \cdot \text{Area}(\text{inside}(C)) + \lambda_1 \int_{\text{inside}(C)} |u - f_1|^2 \, dx \, dy + \lambda_2 \int_{\text{outside}(C)} |u - f_2|^2 \, dx \, dy \tag{3-14}$$

In this functional the global means $c_1$ and $c_2$ are replaced by the local medians $f_1$ and $f_2$ respectively.
The local medians are defined as

$$f_1 = \operatorname{median}(u * \text{inside}(C) * W) \tag{3-15}$$

$$f_2 = \operatorname{median}(u * \text{outside}(C) * W) \tag{3-16}$$

where $W$ is a rectangular window that is used to define the neighbourhood pixels in an image. The functions $f_1$ and $f_2$ calculate the two local medians for the neighbouring pixels that are inside and outside the moving curve respectively on the image domain. Liu emphasised the use of local information in an image instead of global information. Wang et al. (2009), in "Active contour driven by local Gaussian distribution fitting energy", tend to agree with Liu that the local information of an image is very important in segmentation. They indicated that the use of global information, as in the "Active contours without edges" segmentation, fails to adequately segment images with intensity inhomogeneity. Most of the images that cause problems for segmentation techniques that use the global information of an image are from the medical field, such as microscopy, computed tomography (CT), ultrasound, magnetic resonance imaging (MRI), positron emission tomography (PET) and mammography. Wang et al. used Gaussian distributions with different means and variances to describe the local image intensities. They concluded that their method was able to deal with both noise and intensity inhomogeneity, but at a high computational cost. The computational cost of these methods has been one of the limiting factors in their usage (Ayed & Mitiche, 2008). These methods have to start with an initial curve, and its placement on the image plays an important role in the final result of the segmentation process. Chan & Vese indicated that in their method "Active Contour without Edges" the initial curve can be placed anywhere in the image and the segmentation of the image is still competitively good. This shows that researchers are keen to make these methods domain independent.

Failure to adjust the homogeneity/similarity criteria accordingly will produce undesirable results.
The following are some of them (Varshney, Rajpal & Purwar, 2009):

- The segmented region might be smaller or larger than the actual region
- Over or under-segmentation of the image (arising of pseudo objects or missing objects)
- Fragmentation

FIGURE 3-6 Region Based Method (Chan & Vese)

FIGURE 3-6 indicates some of the problems that can be encountered when using region based methods. It can be observed that parts have been added to and removed from the region of interest. Again this affects the post-segmentation image processing.

3.1.4 PERFORMANCE EVALUATION

Many image segmentation methods have been created, and are still being created, using many distinct approaches and algorithms, but it remains very difficult to assess and compare the performance of these segmentation techniques (Zhang, Fritts & Goldman, 2008). Researchers evaluate their image segmentation techniques by using one or more of the evaluation methods in Figure 3-7.

FIGURE 3-7 An Overview of Evaluation Techniques

The full description of the above evaluation methods can be found in Zhang et al. (2008) and Polak, Zhang & Pi (2009). Ideally most of these methods should be domain independent, but in reality they are domain dependent. It is generally believed that it is difficult to develop a single model that applies to all image objects (Boucheron, Harvey & Manjunath, 2007). Both the subjective and objective evaluation approaches have been used to evaluate segmentation techniques, but within a domain dependent environment (Zhang et al., 2008). It can be appreciated that whatever method is used in a specific domain has been used to compare the segmentation techniques in that domain.
These methods have been used to adjust the parameters of the segmentation techniques in order to solve the following problems in the segmentation area:

- The segmented region might be smaller or larger than the actual region
- The edges of the segmented region might not be connected
- Over or under-segmentation of the image

It is unfortunate that Hu et al. (2001) concluded that no segmentation method is better than the others in all domains. We believe that with the use of universal evaluation methods it will be possible to find segmentation techniques that can be said to be better than others in all domains.

3.1.5 CHALLENGES AND FUTURE DIRECTIONS

Domain independent segmentation techniques can only be found when the techniques can be evaluated by domain independent evaluation methods using a domain independent image database. For this to happen, a universal image database needs to be created so that researchers can use it to evaluate their techniques. Whether a subjective or an objective evaluation method is used, the image database must be the same and the images must be ranked to enable comparison of segmentation techniques. Whenever researchers segment the images in the database they must indicate the parameter values used for each image segmented, the computational time and the specification of the machine used. This will enable easy selection of a segmentation technique for a particular area. Since research in this area is currently conducted in an ad hoc manner, this way of evaluating techniques will bring some order to the segmentation field. There is still room for further improvement in each group of segmentation methods, that is, Edge based and Region based.

3.1.6 SEGMENTATION TECHNIQUES SUMMARY

Segmentation is one of the important preliminary steps in image processing. As can be appreciated, the choice of a suitable segmentation algorithm depends on the peculiar characteristics of the individual problem.
This chapter looked at the classification of segmentation algorithms and the challenges being faced. The boundary based methods are capable of giving good segmentation results in the absence of noise in the image. Noise suppression techniques have been employed to improve the boundary based segmentation results on noisy images. The problem with these noise suppression techniques is that in reducing noise the edge strength may also be reduced, resulting in failure to detect the edge. Region based segmentation algorithms solve this problem of missing edges. The advantage of region based segmentation algorithms over edge based segmentation algorithms is that they do not use the gradient to detect boundaries. This allows the region based methods to segment colour and multi-spectral images where there are no defined gradient boundaries. The region based algorithms are less sensitive to the location of the initial contours. These algorithms are better at capturing the concavities of object images and are less sensitive to noise. For a better segmentation of an image one has to decide whether to use global or local statistics, because they affect the final segmentation of an image. Through a proper performance evaluation of the segmentation algorithm over the domain of interest one can get satisfactory segmentation results. We have looked at the segmentation techniques and performance evaluation methods, and we give the following summary in TABLE 3-1.

TABLE 3-1 Segmentation Techniques Summary

| Segmentation Methods | Research interest | Known Problems in segmenting images |
|---|---|---|
| Thresholding | Determine the value of T (threshold value) | Low contrast; Spatial morphological information |
| Edge Based | Determine the appropriate stopping gradient or other stopping criteria | Edge-less; Noisy images; Smooth boundaries; Texture boundaries |
| Region Based | Determine homogeneity criteria to decompose the image into regions; Determine how to deal with in-homogeneity in images | High computational time |
| All three of them: Thresholding, Edge Based, Region Based | Determine performance evaluation of the techniques; Determine comparison criteria of the techniques | The segmented region might be smaller or larger than the actual; The edges of the segmented region might not be connected; Over or under-segmentation of the image (arising of pseudo edges or missing edges) |

3.2 IMAGE SHAPE REPRESENTATION AND DESCRIPTION TECHNIQUES

With the vast collection of digital images on personal computers and on the Internet, the need to find a particular image or a collection of images of interest has increased tremendously. This has motivated researchers to search for efficient, effective and accurate domain independent algorithms for the representation, description and retrieval of image(s) of interest. It is a daunting task, and thus many algorithms have been developed to represent, describe and retrieve images using their visual features (shape, colour, texture) (Rui & Huang, 1999; Li & Guan, 2006; Zheng, Sherrill-Mix & Gao, 2007b; Mingqiang, Kidiyo & Joseph, 2008). Visual feature representation and/or description plays a very important role in image classification, recognition and retrieval. A successful image representation, description and retrieval/recognition system depends on the selection of suitable image feature(s) to encode, the quantification of these features and the selection of the similarity measure. This section gives a brief review of 2-dimensional (2D) shape representation and description techniques. This area is receiving so much attention because human beings use shape as the basis of visual recognition (Zheng et al., 2007b). An accurate image shape representation and description would enable machines to compete very well with human beings in image recognition and retrieval.
To be considered accurate, image representation and description must be invariant to translation, rotation and scale (a change in location or position, a rotation through a certain angle, or the shrinking or zooming of an image must not affect its representation and description), resistant to noise (when the quality of the image is compromised and the visibility of certain features is reduced or lost, the representation and description of the image must not be affected too much), affine invariant, and must precisely quantify the chosen feature(s) (Mingqiang et al., 2008). The image retrieval rate can be improved substantially through the use of an appropriate similarity measure. The similarity matching techniques depend very much on the representation and description technique applied. Usually the image shape representation and description is a collection of numbers (commonly vectors) produced by a representation and description algorithm in the process of quantifying an image shape in ways that concur with human intuition. To enable efficient storage and retrieval, the representation and description should fulfil the following (Mingqiang et al., 2008):

- the vectors must not be very large
- the similarity distance calculation must be simple (to reduce execution time)
- the image object representation and description must be compact

Surveys and reviews give researchers an overview of developments, achievements, direction and open issues within a given area. This section is arranged as follows:

- Classification of representation and description techniques
- Boundary/Contour based techniques
- Region/Whole based techniques
- Challenges and Future Directions
- Summary

3.2.1 CLASSIFICATION OF SHAPE REPRESENTATION AND DESCRIPTION TECHNIQUES

Shape representation and description can be grouped into Region based and Contour based classes. These classes indicate which pixels are being used in the representation and description of the image.
Region based means that all the pixels of the shape contribute to the description, while contour/boundary based means that only the edge pixels are used in the description of the image, as shown in FIGURE 3-9.b and FIGURE 3-9.a respectively.

FIGURE 3-9.a Contour Pixels (8-Connectivity)

FIGURE 3-9.b Region Pixels (8-Connectivity)

MPEG-7 proposed this classification and it is widely used (Mingqiang et al., 2008). It must be noted that when an image is represented and described using a contour based technique, the segmentation of the image may be edge based or region based, whereas a region based representation and description requires region based segmentation.

Each of the groups above can be reclassified into Structural and Global subgroups. The structural sub-group represents a shape by segments while the global sub-group represents the shape as a whole. It can be said that the structural approach is a discrete form of image representation while the global approach is a continuous form. For example, in boundary based representation the structural approach divides the shape boundary into segments called primitives (Zhang & Lu, 2004). The global sub-group focuses on the overall shape; for example, the integral boundary is used to describe the shape.

The techniques can also be classified into Space and Transform domains. This classification indicates whether or not the shape features are derived from the spatial domain. The spatial domain is the normal image space. The space domain approaches match shapes on a point basis while the transform domain approaches match shapes on a feature (vector) basis.

The last two classifications in Figure 3-10 are Information Preserving (IP) and Non-Information Preserving (NIP). Sometimes it is necessary to reconstruct the original image from its representation and description. Some techniques enable the reconstruction of the original image and others do not. Unfortunately very few techniques give sufficient information for the reconstruction.
FIGURE 3-10 Hierarchy of the Classification of Shape Representation and Description Techniques

3.2.2 BOUNDARY/CONTOUR BASED REPRESENTATION TECHNIQUES

Many of these techniques were described in Zhang & Lu (2004) and Mingqiang et al. (2008), so here we list some of them, explore any improvements that have been made to them, and then add new ones so that we can predict the direction of research within the contour based techniques. Figure 3-11 shows some techniques classified as contour based techniques.

FIGURE 3-11 Examples of Contour Based Techniques

These techniques use the boundary of a shape to describe an object. It is commonly believed that human beings can differentiate objects by their boundaries or contours (Zhang & Lu, 2002). Most objects form shapes with defined contours, which makes the use of these techniques most appealing. The techniques can generally be applied to different application areas with considerable success. They have a low computational complexity compared to region based techniques. We observed that research has been taking place to improve contour based image representation and description, as seen in Zhang & Lu (2002).

In structural contour based techniques a form must be constructed. For example, the chain code technique describes an object shape by a sequence of unit sized straight line segments based on 4 or 8-connectivity with a given direction (Mingqiang et al., 2008). An illustration of a chain code using the 4-connectivity directions of FIGURE 3-12 can be seen in FIGURE 3-13.

FIGURE 3-12 Directions for 4-connectivity

FIGURE 3-13 4-directional Chain Code Representation

This means that, knowing the starting point, one can roughly reconstruct the object shape. This type of object representation is therefore an information preserving technique. We can also observe that the shape features are derived from the spatial domain.
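The chain code idea, and the claim that the shape can be rebuilt from the starting point, can be sketched as follows. The direction numbering is an assumption made for illustration only (0 = right, 1 = up, 2 = left, 3 = down), since the numbering of FIGURE 3-12 is not reproduced here, and the function names are hypothetical.

```python
# Assumed 4-connectivity numbering: 0 = right, 1 = up, 2 = left, 3 = down.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}
CODES = {step: code for code, step in STEPS.items()}


def chain_code(points):
    """Encode a boundary given as consecutive unit steps between (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each step must be a unit move along one axis (4-connectivity).
        codes.append(CODES[(x1 - x0, y1 - y0)])
    return codes


def decode(start, codes):
    """Rebuild the boundary points from the starting point and the chain code."""
    points = [start]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```

Encoding a closed unit square and decoding it again from its starting point returns the same sequence of boundary points, which is what makes the representation information preserving.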
It is important to note that when the boundary is disturbed, whether by noise or by the segmentation algorithm used, the chain code will not represent the object shape correctly.

In a global contour based technique a function derived from the boundary of the object shape is used to represent the shape, for example 1-dimensional Fourier descriptors. The term global indicates that these algorithms use all the boundary pixels as one continuous unit. The whole boundary is transformed by applying the Fourier transform to a signature that is derived from the shape boundary coordinates (Mingqiang et al., 2008). To parameterize an object shape boundary from 0 to π, the boundary coordinates are given as

\[ (x_i, y_i) = (x_0, y_0), (x_1, y_1), \ldots, (x_{N-1}, y_{N-1}) \tag{3-17} \]

A periodic function can be constructed to represent the boundary as a series of coordinates in the complex plane as:

\[ s(t) = x(t) + j\,y(t) \tag{3-18} \]

The discrete Fourier transform of s(t) is given as:

\[ F(u) = \frac{1}{N} \sum_{t=0}^{N-1} s(t)\, e^{-j 2\pi u t / N} \tag{3-19} \]

where u = 0, 1, ..., N-1 and the coefficients F(u) are the Fourier descriptors of the boundary. It is possible to reconstruct the boundary of the object shape by using the inverse transform of F(u), so we call this an Information Preserving contour based representation algorithm. In this case the inverse transform is:

\[ s(t) = \sum_{u=0}^{N-1} F(u)\, e^{j 2\pi u t / N} \tag{3-20} \]

where t = 0, 1, ..., N-1.

All the examples given so far preserve enough information about the object shape to enable a reconstruction of its boundary. There are also object shape representation algorithms that do not allow reconstruction. For example, with object shape signatures such as the Area function, the Triangle-area representation and others, it is nearly impossible to reconstruct the object shape. These object shape representation algorithms are classified as Non-Information Preserving.
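Equations (3-19) and (3-20) can be illustrated with a small pure-Python sketch. The function names `fourier_descriptors` and `reconstruct_boundary` are hypothetical, and the direct O(N^2) summation is used only for readability; a practical system would more likely use an FFT routine.

```python
import cmath


def fourier_descriptors(boundary):
    """Equation (3-19): F(u) = (1/N) * sum_t s(t) * exp(-j*2*pi*u*t/N)."""
    n = len(boundary)
    s = [complex(x, y) for x, y in boundary]  # s(t) = x(t) + j*y(t), equation (3-18)
    return [
        sum(s[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n)) / n
        for u in range(n)
    ]


def reconstruct_boundary(descriptors):
    """Equation (3-20): s(t) = sum_u F(u) * exp(+j*2*pi*u*t/N)."""
    n = len(descriptors)
    s = [
        sum(descriptors[u] * cmath.exp(2j * cmath.pi * u * t / n) for u in range(n))
        for t in range(n)
    ]
    return [(z.real, z.imag) for z in s]


# Four corner points of a square, used as a tiny example boundary.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
descriptors = fourier_descriptors(square)
restored = reconstruct_boundary(descriptors)
```

With the 1/N factor placed on the forward transform as in (3-19), the coefficient F(0) is the centroid of the boundary points, and the inverse transform recovers the boundary up to floating point rounding.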
3.2.2.1 MERITS AND DEMERITS OF CONTOUR BASED ALGORITHMS

Advantages
- it uses few pixels of an image
- a low computational complexity
- optimal for high contrast image objects

Disadvantages
- sensitive to noise (variations in the edge pixels would represent the same image object differently)

3.2.3 REGION/WHOLE BASED REPRESENTATION TECHNIQUES

These object shape representation algorithms use every feature point of the object shape to describe the shape. Like the contour based techniques, they can be classified as structural or global. In structural techniques a form is constructed from segments/sections, called primitives, in the process of representing the object shape. Global techniques, on the other hand, use the pixels as a continuous unit in representing the object shape (Mingqiang et al., 2008). We describe examples in each category of the classification: structural, global, space, transform, information preserving and non-information preserving. Some examples of region based techniques are shown in figure 3-14.

FIGURE 3-14 Examples of Region Based Techniques

The region based Fourier descriptor is an example of a global, transform domain, non-information preserving representation technique. The Generic Fourier descriptor (GFD) is derived by applying a Modified Polar Fourier Transform (MPFT) to the object shape (Mingqiang et al., 2008) after it has been transformed into a normal 2-dimensional rectangular polar image. For a given object shape image f(x, y), the MPFT is defined as

\[ PF(\rho, \phi) = \sum_{r} \sum_{i} f(r, \theta_i) \exp\left[ j 2\pi \left( \frac{r}{R}\rho + \frac{2\pi i}{T}\phi \right) \right] \tag{3-21} \]

where \( 0 \le r = [(x - x_c)^2 + (y - y_c)^2]^{1/2} < R \) and \( \theta_i = i\,(2\pi / T) \), \( 0 \le i < T \); \( (x_c, y_c) \) is the centre of mass of the shape; \( 0 \le \rho < R \), \( 0 \le \phi < T \). R and T are the radial and angular resolutions.

The calculated Fourier coefficients are invariant to translation, but to be a good representation of the object shape they should also be invariant to rotation and scaling.
Rotation and scaling invariance are achieved by the following normalisation:

\[ GFD = \left\{ \frac{|PF(0,0)|}{area}, \frac{|PF(0,1)|}{|PF(0,0)|}, \ldots, \frac{|PF(0,n)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,0)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,n)|}{|PF(0,0)|} \right\} \tag{3-22} \]

where area is the area of the bounding circle in which the shape resides, m is the maximum number of radial frequencies selected and n is the maximum number of angular frequencies selected. m and n can be adjusted to meet the hierarchical coarse to fine representation requirement.

Moments have been used widely in image representation. These include Invariant moments, Algebraic moments, Zernike moments and Radial Chebyshev moments. They belong to the space domain. Zernike moments are classified as a region based, global and information preserving representation technique: they preserve enough information about the shape for the original object shape to be reconstructed from the shape description (Maofu, Yanxiang & Bin, 2007).

For the Invariant moments (IM), the general form of a moment function \( m_{pq} \) of order \( p + q \) of an image function \( f(x, y) \) is given by

\[ m_{pq} = \iint \psi_{pq}(x, y)\, f(x, y)\, dx\, dy, \quad p, q = 0, 1, 2, \ldots, n \tag{3-23} \]

where \( \psi_{pq} \) is known as the moment weighting kernel or the basis set. For a digital image function \( f(x, y) \) the equation above is written in discrete form as follows:

\[ m_{pq} = \sum_{x} \sum_{y} \psi_{pq}(x, y)\, f(x, y) \tag{3-24} \]

For Geometric moments the kernel is

\[ \psi_{pq}(x, y) = x^{p} y^{q} \tag{3-25} \]

The moments that are invariant to translation are the central moments, defined as follows:

\[ \mu_{pq} = \sum_{x} \sum_{y} (x - x_c)^{p} (y - y_c)^{q} f(x, y), \quad p, q = 0, 1, 2, \ldots \tag{3-26} \]

where \( x_c = m_{10} / m_{00} \) and \( y_c = m_{01} / m_{00} \). The centroid \( (x_c, y_c) \) moves with the image under translation, which is why the central moments are invariant to translation.
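The translation invariance of the central moments in equation (3-26) can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming a binary image stored as a list of rows indexed as `image[y][x]`; the function names and the two test images are hypothetical.

```python
def geometric_moment(image, p, q):
    """Equations (3-24) and (3-25): m_pq = sum_x sum_y x^p * y^q * f(x, y)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def central_moment(image, p, q):
    """Equation (3-26): moments taken about the centroid (xc, yc)."""
    m00 = geometric_moment(image, 0, 0)
    xc = geometric_moment(image, 1, 0) / m00
    yc = geometric_moment(image, 0, 1) / m00
    return sum(
        ((x - xc) ** p) * ((y - yc) ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


# A small L-shaped object and the same object translated by (2, 2).
shape = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

mu20_shape = central_moment(shape, 2, 0)
mu20_shifted = central_moment(shifted, 2, 0)
```

The geometric moment m10 changes when the object moves, whereas the central moments of the two images agree up to rounding, because the centroid moves with the object.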
There are seven (7) moments that are invariant to translation, rotation and scaling (TRS). As given by Hu, they are:

\[ \phi_1 = \eta_{20} + \eta_{02} \tag{3-27} \]

\[ \phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \tag{3-28} \]

\[ \phi_3 = \eta_{20}\eta_{02} - \eta_{11}^2 \tag{3-29} \]

\[ \phi_4 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \tag{3-30} \]

\[ \phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{3-31} \]

\[ \phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{3-32} \]

\[ \phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{3-33} \]

where \( \eta_{pq} = \mu_{pq} / (\mu_{00})^{(p+q+2)/2} \) (Flusser, Suk & Zitova, 2009; Celebi & Aslandogan, 2005a; Celebi & Aslandogan, 2005b).

Invariant moments are a non-information preserving representation algorithm. They have drawbacks such as:
- Information redundancy
- Noise sensitivity
- Large variation in the dynamic range of values

The region based Convex Hull is an example of a structural algorithm that segments the shape into parts, which are then used for image shape representation and description. It can further be classified as a space domain and non-information preserving method. A region R is convex if and only if for any two points \( x, y \in R \) the whole line segment xy lies inside the region, \( xy \subseteq R \). The convex hull CH of a region R is the smallest convex region C that fulfils the condition \( R \subseteq C \). The convex deficiency CD is the difference between the convex hull CH and the region R, as given in equation 3-34.

\[ CD = CH - R \tag{3-34} \]

The real problem is computing the convex hull CH, the smallest convex shape that encloses a set of points. The image shape is represented using a series of convex hulls. Convex hulls can be extracted with both boundary tracing algorithms and morphological algorithms (Zhang & Lu, 2004). To decrease the effect of noise, irregular boundaries and variations in segmentation, the usual practice is to smooth the boundary before partitioning. The representation of the image shape may be obtained by a recursive process which results in a concavity tree, as shown in figure 3-15 (Mingqiang et al., 2008).
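As a small worked illustration of the convex hull and of equation (3-34), the sketch below computes the hull of a polygonal region with Andrew's monotone chain algorithm and reports the area of the convex deficiency. This is an assumption-laden sketch: the source does not say which hull algorithm is used, the region is taken to be a simple polygon given by its vertices, and the area difference is only a scalar summary of the set difference CH - R.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


# L-shaped region R: a 4 x 4 square with its top-right 2 x 2 corner removed.
region = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
hull = convex_hull(region)
deficiency_area = polygon_area(hull) - polygon_area(region)
```

For this region the hull cuts across the missing corner, so the single concavity is a triangle. A concavity tree would be obtained by repeating the same computation on each concavity until every derived concavity is convex.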
FIGURE 3-15: (a) Convex hull and its Concavities (b) Concavity representation tree of the convex hull

Figure 3-15 illustrates the convex hull of the object shape with its convex deficiencies, followed by the convex hulls and deficiencies of those convex deficiencies. The process stops only when all derived convex deficiencies are convex. From the figure it can be seen that \( s_1, s_2, s_3, s_4, s_5 \) are convex deficiencies and that \( s_2, s_3, s_4 \) are already convex. The process continues on the convex deficiencies \( s_1 \) and \( s_5 \) to produce \( s_{11}, s_{12}, s_{51}, s_{52} \), and then stops. The object shape can then be represented as a concavity tree. Each concavity can be described by its area, its bridge length (the line that connects the cut of the concavity), its maximum curvature and so on. Matching between shapes then becomes string or graph matching.

3.2.3.1 MERITS AND DEMERITS OF REGION/WHOLE BASED TECHNIQUES

Advantages
- generic image representation and description (both boundary and internal pixels are used)
- not sensitive to noise (small variations in the image object do not greatly affect its representation and description)

Disadvantages
- it uses all pixels of an image
- a high computational complexity

3.2.4 EVALUATION OF REPRESENTATION AND DESCRIPTION ALGORITHMS

MPEG-7 has set several principles for measuring a shape descriptor:
- Good retrieval accuracy
- Compact features
- General application
- Low computational complexity
- Robust retrieval performance
- Hierarchical coarse to fine representation

Most authors evaluate their representation and description methods by comparing their retrieval efficiency against other methods (Tran & Ono, 2000), (Lecce & Guerriero, 1999). This type of evaluation is not objective, since one author has to reconstruct another author's system and then use an image database of his or her own choice to do the comparison.
In general authors evaluate whether their methods fulfil TRS invariance and noise resistance, then tabulate their retrieval performance on an image database of their choice (Tran & Ono, 2000), (Mingqiang et al., 2008), (Muller, Michoux, Bandon & Geissbuhler, 2004). This form of evaluation takes into consideration only two of the MPEG-7 principles for measuring a shape descriptor, namely good retrieval accuracy and robust retrieval performance. Few authors evaluate their method using most of the stated MPEG-7 principles (Sheng & Xin, 2005). The testing of TRS and affine invariance is objective, since anyone can prove its validity analytically. Retrieval efficiency evaluates not only the representation and description algorithm but also the similarity distance method used, so any improvement may come from either the representation and description algorithm or the similarity method. Since representation and description algorithms depend on the segmentation method used, comparing retrieval results might not give an objective evaluation of one's method. Robustness is also subjective, in the sense that the author is the one who selects the noisy, distorted and defective images to use in the experiment. There are image databases accessible to everyone on the Internet that authors can use in their evaluation experiments, for example CE_Shape-1 (Latecki, Lakamper & Eckhardt, 2000), but the evaluation remains subjective in that it is up to the author to choose which database to use. A structured way of evaluating representation and description algorithms is therefore necessary for the evaluation to be objective.

3.2.5 CHALLENGES AND FUTURE DIRECTIONS

To find representation and description algorithms for general application, we need ordered, domain independent image databases with which to evaluate the algorithms. Authors should allow other authors to use their program code, for the sake of those who wish to compare the efficiency of different algorithms.
This would require repositories of program code for image representation algorithms. Authors would also need to indicate the database and the segmentation technique used in their experiments, so that comparison can be objective. There must be a way of evaluating computational complexity, which as of now is very subjective. Because of the subjectivity in evaluating these algorithms, it is very difficult to select the better algorithms for a particular application area or for general application. An orderly way of evaluating algorithms will give direction to research on these algorithms.

3.2.6 IMAGE REPRESENTATION SUMMARY

In this section some of the existing shape representation and description techniques have been reviewed and classified. The evaluation of the algorithms, and the challenges and future directions in this area, have also been discussed. It was found that contour based approaches are useful where the shape contour is of interest and the shape interior content is not important. However, contour based algorithms have their limitations. They are generally sensitive to noise and variations because they use only a small part of the shape information. In some cases the shape contour information is not available, owing to problems encountered during the preliminary stages of image processing or during the capturing of the image. These limitations can be overcome by employing region based shape representation and description algorithms. Region based algorithms are more robust because they utilize all the shape information available. Their advantages are that they can be applied to general applications and that they provide more accurate retrieval. These advantages stem from the fact that they cope very well with shape defects. These methods are also classified into global and structural approaches. Comparing the two, structural approaches are too complex to implement.
They have high indexing and matching complexities, making them a family of unstable shape representation and description algorithms. The structural algorithms nevertheless have the advantage that they can do partial matching. This is useful when part of the boundary is missing, or part of the shape is missing or occluded. The algorithms are also classified into spatial domain and transform domain. Spatial domain algorithms have the disadvantages of noise sensitivity and high dimension. In general, region based algorithms give hope of finding a method that fulfils all six principles set by MPEG-7: good retrieval accuracy, compact features, general application, low computational complexity, robust retrieval performance and hierarchical coarse to fine representation. The 'best' shape image representation and description method can only be found once there are standardized evaluation methods for the algorithms. Table 3-2 below shows the research interests that we believe need to be pursued in the quest for complete, generic and effective image representation and description algorithms.

TABLE 3-2 Representation Techniques Summary

| Representation and Description Algorithm | Research Interest | Known Problems in Representing and Describing Images |
|---|---|---|
| Contour Based Techniques | Domain independent algorithms; objective or orderly evaluation of algorithms; an objective way to find a suitable similarity distance measurement method for different representation and description algorithms; calculation of computational time | Sensitive to noise; not generic |
| Region Based Techniques | As for contour based techniques | Computation intensive |

3.3 IMAGE (DIS)SIMILARITY MEASUREMENT AND DATABASE ACCESS ALGORITHMS

The rapid growth in collections of multimedia data such as images, audio, video and text has prompted the need for efficient methods for the storage, retrieval and indexing of such data.
Content based image (dis)similarity measurement algorithms, if chosen correctly for a particular multimedia database, will increase the efficiency and effectiveness of retrieving the data of interest. In this section we discuss (dis)similarity measurement algorithms for images represented by their visible features (shape, colour and texture), together with retrieval algorithms.

3.3.1 (DIS)SIMILARITY ALGORITHMS

Similarity (s) can be defined as a quantitative measurement that indicates the strength of the relationship (closeness) between two image objects. Dissimilarity (d) is likewise a quantitative measurement, reflecting the discrepancy (disorder, distance apart) between two image objects. We formalise the definition of (dis)similarity in Definition 1.

Definition 1 (Dis)Similarity (s/d)
Let Y be a non-empty set and s/d be a function on Y such that

\[ s/d : Y \times Y \rightarrow \mathbb{R} \]

where \( \mathbb{R} \) is the set of real numbers. This function is called a pair-wise similarity/dissimilarity function. A (dis)similarity space is a pair (Y, s (d)) in which Y is a non-empty set and s (d) is a (dis)similarity on Y.

The s/d function is bounded, and a relationship exists between similarity and dissimilarity that allows us to derive similarity values from dissimilarity values. The relationship is given by

\[ s_{ij} = 1 - d_{ij}, \quad s_{ij} \in [0, 1] \tag{3-35} \]

or

\[ s_{ij} = 1 - 2 d_{ij}, \quad s_{ij} \in [-1, 1] \tag{3-36} \]

where \( d_{ij} \) is a normalized dissimilarity value between objects i and j. From equations 3-35 and 3-36 we obtain an equivalence relationship between dissimilarity and similarity measurements:

\[ s_{ij} \ge s_{iz} \iff d_{ij} \le d_{iz}, \quad \forall\, i, j, z \in Y \tag{3-37} \]

Table 3-3 summarises the interpretation of the values of similarity and dissimilarity.
TABLE 3-3 Interpretation of (dis)similarity values

| Given two objects i and j | Similarity value | Dissimilarity value |
|---|---|---|
| Using equation 3-35: exactly similar | 1 | 0 |
| Using equation 3-35: very different | 0 | 1 |
| Using equation 3-36: exactly similar | 1 | 0 |
| Using equation 3-36: very different | -1 | 1 |
| General interpretation: similar objects | Higher value | Lower value |
| General interpretation: different objects | Lower value | Higher value |

The (dis)similarity measurement algorithms can be grouped into metric and non-metric. A metric is defined in Definition 2.

Definition 2 (Dis)similarity Metric (Frechet)
Let X be a non-empty set. A metric on X is a function d of \( X \times X \) into \( [0, \infty) \) that satisfies the following conditions:
a) \( d(x, y) \ge 0, \ \forall x, y \in X \) (Non-negativity)
b) \( d(x, y) = 0 \) if and only if \( x = y \) (Reflexivity)
c) \( d(x, y) = d(y, x), \ \forall x, y \in X \) (Symmetry)
d) \( d(x, y) \le d(x, z) + d(z, y), \ \forall x, y, z \in X \) (Triangle inequality)
A metric space is a pair (X, d) in which X is a non-empty set and d is a metric on X.

Note from the definition that a metric is not bounded. In our case we need a bounded metric, so we impose an upper bound to transform it into a bounded metric. Non-metric (dis)similarity algorithms fail to fulfil at least one of the metric conditions. Depending on which metric condition(s) the non-metric (dis)similarity algorithm does not fulfil, a distinguishing term is used, as shown in TABLE 3-4 (Skopal & Bustos, 2010).

TABLE 3-4 Non-metric Classification

| Metric Condition Fulfilled | Metric Condition not Fulfilled | Distinguishing Term |
|---|---|---|
| Reflexivity, Non-negativity, Symmetry | Triangle Inequality | Semi-Metric (Non-Metric) |
| Non-negativity, Symmetry, Triangle Inequality | Reflexivity | Pseudo-Metric (Non-Metric) |
| Reflexivity, Non-negativity, Triangle Inequality | Symmetry | Quasi-Metric (Non-Metric) |
| Reflexivity, Symmetry, Triangle Inequality | Non-negativity | ? (Non-Metric) |
| None | Reflexivity, Non-negativity, Symmetry, Triangle Inequality | Full-Non-Metric |

These (dis)similarity algorithms have been used successfully to retrieve images of interest (Antani, Lee, Long & Thoma, 2004), (Petrakis & Faloutsos, 1997), (Stejic, Takama & Hirota, 2003). What makes an algorithm suitable for a certain image database is its contribution to the effectiveness and efficiency of the content based image retrieval system. Effectiveness of retrieval is usually measured by precision (the number of correct images retrieved divided by the total number of images retrieved) and recall (the number of correct images retrieved divided by the total number of possible correct images) (Zheng, Sherrill-Mix & Gao, 2007a).

\[ precision = \frac{A}{A + C} \tag{3-38} \]

\[ recall = \frac{A}{N} \tag{3-39} \]

\[ effectiveness = \begin{cases} recall = \dfrac{A}{N} & \text{if } T \ge N \\[2mm] \dfrac{A}{T} & \text{if } T < N \\[2mm] precision = \dfrac{A}{A + C} \end{cases} \tag{3-40} \]

where A is the number of relevant image objects retrieved, C is the number of non-relevant image objects retrieved, T is the number of relevant images that the user requires from the database and N is the total number of relevant images in the database.

Efficiency of retrieval is the speed of retrieval (Skopal & Bustos, 2010). Metric and non-metric (dis)similarity algorithms compete equally well in the effectiveness of retrieval, but non-metric algorithms lag behind in efficiency. This is because the indexing of databases is skewed in favour of metric (dis)similarity algorithms. It must be noted that an effective retrieval system is useless on large databases if it is not efficient. In the next sections we look at some metric and non-metric (dis)similarity algorithms.

3.3.1.1 METRIC (DIS)SIMILARITY (D/S) ALGORITHMS

Metric D/S algorithms have exhibited a high degree of effective and efficient retrieval of images of interest from very large image databases. Many researchers who used metric D/S algorithms reported high precision and recall (Tran & Ono, 2000), (Zhang & Lu, 2002), (Zheng et al., 2007b).
The metric conditions can be used to index the image database for highly efficient retrieval (Skopal & Bustos, 2010). The following are some of the most commonly used metric D/S algorithms:

1. Minkowski Family \( L_p \) (\( p \ge 1 \), where p = 1, 2, 3, ...)

\[ d = \sqrt[p]{\sum_{i=1}^{n} |x_i - y_i|^{p}} \tag{3-41} \]

Within this family very few have been used in image retrieval; they are the Euclidean \( L_2 \), City block \( L_1 \) (taxicab norm, Manhattan) and Chebyshev \( L_\infty \) dissimilarity formulas. The formulas are given in equations 3-42 to 3-44 below.

Euclidean \( L_2 \)

\[ d = \sqrt{\sum_{i=1}^{n} |x_i - y_i|^{2}} \tag{3-42} \]

City block \( L_1 \) (taxicab norm, Manhattan)

\[ d = \sum_{i=1}^{n} |x_i - y_i| \tag{3-43} \]

Chebyshev \( L_\infty \)

\[ d = \max_{i} |x_i - y_i| \tag{3-44} \]

3.3.1.2 NON-METRIC (DIS)SIMILARITY ALGORITHMS

Non-metric D/S algorithms have also been used and have produced a high degree of effective and efficient retrieval from very large databases. This is in part because researchers have created weak metric (dis)similarity algorithms from these non-metric algorithms (Clarkson, 2005). We now look at some of the non-metric D/S algorithms.

1. Pearson Dissimilarity Family

\[ r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma_x} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_y} \right) \tag{3-45} \]

where \( r \), \( (X - \bar{X}) / \sigma_x \), \( \bar{X} \) and \( \sigma_x \) are the Pearson correlation coefficient, the standard score, the mean and the standard deviation respectively. The Pearson dissimilarity measures are given as

\[ d = 1 - r, \quad d \in [0, 2] \tag{3-46} \]

\[ d = 1 - |r|, \quad d \in [0, 1] \tag{3-47} \]

Other (dis)similarity algorithms that use correlation include Spearman rank correlation, Kendall's \( \tau \) and uncentred correlation (Cha, 2007).

2. Minkowski Family \( L_p \) (\( p \in (0, 1) \), where p is a fraction)

\[ d = \sqrt[p]{\sum_{i=1}^{n} |x_i - y_i|^{p}} \tag{3-48} \]

Here d is called the fractional dissimilarity.

3. Shannon Entropy Family

This family of (dis)similarity algorithms includes Kullback-Leibler, Jeffreys/J divergence, Jensen-Shannon and Jensen difference, to mention a few of the non-metric algorithms that have been used in image retrieval systems (Cha, 2007). The formulas are given in equations 3-49 to 3-52 below.
Kullback-Leibler

\[ d = \sum_{i=1}^{n} x_i \ln \frac{x_i}{y_i} \tag{3-49} \]

Jeffreys/J divergence

\[ d = \sum_{i=1}^{n} (x_i - y_i) \ln \frac{x_i}{y_i} \tag{3-50} \]

Jensen-Shannon

\[ d = \frac{1}{2} \left[ \sum_{i=1}^{n} x_i \ln \frac{2 x_i}{x_i + y_i} + \sum_{i=1}^{n} y_i \ln \frac{2 y_i}{x_i + y_i} \right] \tag{3-51} \]

Jensen difference

\[ d = \sum_{i=1}^{n} \left[ \frac{x_i \ln x_i + y_i \ln y_i}{2} - \left( \frac{x_i + y_i}{2} \right) \ln \left( \frac{x_i + y_i}{2} \right) \right] \tag{3-52} \]

4. \( \chi^2 \) Family

Squared Euclidean, Pearson, Neyman, Clark and additive symmetric are some of the members of this group (Cha, 2007). The formulas are given in equations 3-53 to 3-57 below.

Squared Euclidean

\[ d = \sum_{i=1}^{n} (x_i - y_i)^2 \tag{3-53} \]

Pearson \( \chi^2 \)

\[ d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{y_i} \tag{3-54} \]

Neyman \( \chi^2 \)

\[ d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i} \tag{3-55} \]

Clark

\[ d = \sqrt{\sum_{i=1}^{n} \left( \frac{|x_i - y_i|}{x_i + y_i} \right)^2} \tag{3-56} \]

Additive symmetric \( \chi^2 \)

\[ d = \sum_{i=1}^{n} \frac{(x_i - y_i)^2 (x_i + y_i)}{x_i y_i} \tag{3-57} \]

5. Inner Product Family

The (dis)similarity measurements of the inner product family include the inner product explicitly in their formulas. In this family we look at only three formulas: inner product, harmonic mean and cosine. We are interested in cosine since it is a normalised inner product, which allows for physical comparison of the (dis)similarity measurements of images. The formulas of the inner product family members are given in equations 3-58 to 3-60.

Inner Product

\[ d = \sum_{i=1}^{n} x_i y_i \tag{3-58} \]

Harmonic Mean

\[ d = 2 \sum_{i=1}^{n} \frac{x_i y_i}{x_i + y_i} \tag{3-59} \]

Cosine

\[ d = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\, \sqrt{\sum_{i=1}^{n} y_i^2}} \tag{3-60} \]

3.3.2 THE RELATIONSHIP BETWEEN (DIS)SIMILARITY ALGORITHM AND DATABASE INDEXING

It is important to decide how the database is going to be accessed for speedy retrieval of the image(s) of interest. The (dis)similarity algorithm used must fulfil certain properties that can be used to index the image database for efficient retrieval. The metric axioms are the most commonly known properties that a (dis)similarity must fulfil, and most databases are therefore modelled in metric space (Bustos, Kreft & Skopal, 2011). This has prompted the development of Metric Access Methods that work efficiently with metric modelled databases.
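Whether a given (dis)similarity fulfils the triangle inequality can be probed empirically on a handful of points. The sketch below is illustrative only: the helper names and the sample points are assumptions, and a check on a finite sample can expose a violation but cannot prove that a function is a metric.

```python
import itertools


def minkowski(x, y, p):
    """Equations (3-41) and (3-48): the p-th root of the summed p-th powers."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def squared_euclidean(x, y):
    """Equation (3-53)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def violates_triangle_inequality(dist, points, tol=1e-12):
    """True if some triple has d(x, y) > d(x, z) + d(z, y) on the sample points."""
    return any(
        dist(x, y) > dist(x, z) + dist(z, y) + tol
        for x, y, z in itertools.permutations(points, 3)
    )


sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]

euclidean_fails = violates_triangle_inequality(lambda x, y: minkowski(x, y, 2), sample)
fractional_fails = violates_triangle_inequality(lambda x, y: minkowski(x, y, 0.5), sample)
squared_fails = violates_triangle_inequality(squared_euclidean, sample)
```

On these points the Euclidean distance shows no violation, while the fractional Minkowski dissimilarity with p = 0.5 and the squared Euclidean dissimilarity both do, which is consistent with their listing among the non-metric algorithms.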
The non-metric (dis)similarity algorithms have given domain experts the freedom to find suitable (dis)similarity algorithms in their domain without worrying about the metric axioms. This has created another challenge, that of finding non-metric access methods for efficient retrieval (Skopal & Bustos, 2010).

3.3.2.1 METRIC ACCESS METHODS (MAM)

Definition 3 Metric Access Method (MAM)
A set of algorithms and data structure(s) providing efficient (fast) similarity search under the metric space model (Skopal, 2010).

The triangle inequality property is the fundamental principle that MAMs use to index the objects of a database into different classes (Skopal & Bustos, 2010). This property is used to create bounds (a lower bound and an upper bound) on a distance that is not known. Using the lower and upper bounds, a query can be processed much faster. There are two ways of making a metric (dis)similarity query: the Range Query and the k-nearest neighbours Query. The mathematical formulation is as follows. Let X be a set of objects (the database) and (X, d) a metric space, and let q be the query object to be searched for in the database X. A range query (q, r) is defined as the objects \( x \in X \) that are within (dis)similarity r of q, that is \( d(q, x) \le r \). A k-nearest neighbour query reports the k objects closest to q.

To establish the lower and upper bounds of \( d(q, x) \) using the triangle inequality, we use an object \( p \in X \) called a pivot, for which the (dis)similarities \( d(p, x) \) and \( d(p, q) \) are known. Using these known values we construct the following two triangle inequalities:

\[ d(p, x) \le d(p, q) + d(q, x), \qquad d(p, q) \le d(p, x) + d(x, q) \tag{3-61} \]

Thus we can deduce that the lower bound of \( d(q, x) \) is

\[ d(q, x) = d(x, q) \ge |d(p, q) - d(p, x)| \tag{3-62} \]

The upper bound of \( d(q, x) \) is

\[ d(q, x) \le d(q, p) + d(p, x) \tag{3-63} \]

so \( d(q, x) \) is bounded as:

\[ |d(p, q) - d(p, x)| \le d(q, x) \le d(q, p) + d(p, x) \tag{3-64} \]

When a range query is being processed, objects whose lower bound already exceeds r cannot satisfy the query and are discarded without computing \( d(q, x) \), so image retrieval becomes efficient. MAMs can be classified as non-hierarchical and hierarchical. The non-hierarchical methods use the above inequality directly in the search, while the hierarchical methods use it indirectly. Some examples of MAMs are given in Table 3-5.

TABLE 3-5 Examples of Metric Access Methods

| NON-HIERARCHICAL MAM | HIERARCHICAL MAM |
|---|---|
| Approximation and Elimination Search Algorithm (AESA) | Metric Tree (M-Tree) |
| Linear AESA | Geometric Near-Neighbour Access Tree (GNAT) |
| Pivot Table | D-Tree |
| | vp-Tree |

3.3.2.2 NON-METRIC ACCESS METHODS

Non-metric (dis)similarity algorithms face the big challenge of indexing a database without structured properties that govern them. In fact most, if not all, non-metric (dis)similarity algorithms are not full-non-metric. There has been a concerted effort to transform them into metrics, or at least to make sure they fulfil the triangle inequality, so that MAMs can be used to improve the retrieval rate (Clarkson, 2005). Properties that are alternatives to the metric properties are also being used to index non-metric modelled databases (Skopal & Bustos, 2010).

3.4 IMAGE (DIS)SIMILARITY MEASUREMENT AND DATABASE ACCESS ALGORITHMS SUMMARY

The choice of a (dis)similarity method for (dis)similarity search in a given multimedia domain can no longer depend on the effectiveness of retrieval alone; it must also consider the efficiency (speed) of retrieval. Domain experts used to be concerned only with effectiveness because the databases were small. Nowadays, with large volumes of multimedia data in virtually every field, the efficiency of retrieval has to be considered. The (dis)similarity methods contribute immensely to the indexing of the database for efficient retrieval.
The database access methods depend on the properties of the (dis)similarity methods. We have seen the most commonly used (dis)similarity methods and that they are grouped into metric and non-metric. Choosing metric (dis)similarity methods for searching similar objects has many advantages, in that they are well supported by MAMs and the databases are metric modelled. The non-metric (dis)similarity methods, on the other hand, lack concrete support owing to the scarcity of non-metric modelled databases and non-metric access methods.

3.5 EVALUATION ALGORITHM OF INFORMATION RETRIEVAL SYSTEMS

Evaluation is a crucial and tedious task in information retrieval. There are many retrieval models, algorithms and systems in the literature, so in order to identify the best among them, or to choose one to use and improve, there is a need to evaluate them. One way to evaluate is to measure the effectiveness of the systems. The difficulty in measuring effectiveness is that it is tied to the relevance of the retrieved items. This makes relevance the foundation on which information retrieval evaluation stands, so it is important to understand relevance. In order to support laboratory experimentation, early studies considered relevance to be topical relevance, a subject relationship between item and query. According to (Rasmussen, 2002), relevance is a relationship between any one of a document, surrogate, item or information and a problem, information need, request or query. Relevance from the human perspective is subjective (it depends upon a specific user's judgement), situational (it relates to the user's current needs), cognitive (it depends on human perception) and dynamic (it changes over time). Given these problems associated with relevance, user-oriented evaluation of a system is very difficult to implement and requires many resources.
This problem of relevance has been researched in textual and non-textual environments (Choi & Rasmussen, 2002; Rasmussen, 2002). As a result, information retrieval evaluation experiments attempt to evaluate the system only (Mandl, 2008). An objective expert is then used to judge the relevance of a document/item to an information need. There are many algorithms for evaluating retrieval systems, and they can be classified into those used to evaluate ranked retrieval results and those used to evaluate unranked retrieval results (Manning, Raghavan & Schutze, 2008). They can also be regrouped into visual (graphical) and scalar (non-visual) evaluation methods (Hoshino, Coughtrey, Sivaraja, Volnyansky, Auer & Trichtchenko, 2009). An overview of the classification of the techniques is shown in FIGURE 3-16, which divides evaluation techniques for IR systems into techniques for the evaluation of unranked results and techniques for the evaluation of ranked results, and further into graphical and non-graphical representation techniques.

FIGURE 3-16: Hierarchy of classification of evaluation techniques for IR systems

In this brief review of evaluation techniques for information retrieval systems, the following techniques are reviewed using the classification in FIGURE 3-16: Precision, Recall, F-measure, Precision-Recall curve, Mean Average Precision, Receiver Operating Characteristics (ROC) curve and Area Under ROC Curve (AUC). The merits and demerits of these techniques are discussed, followed by criteria for choosing the appropriate algorithm(s) in different situations. Finally, open issues are discussed and a conclusion is given.

3.5.1 TECHNIQUES FOR EVALUATION OF UNRANKED RETRIEVAL RESULTS

The most frequently used and most important basic measures of information retrieval effectiveness are precision and recall (Mandl, 2008; Manning et al., 2008).
Precision can be defined as the fraction of retrieved items that are relevant, or the probability that an item is relevant given that it is retrieved. Recall can be defined as the fraction of the relevant items in the database that are retrieved, or the probability that an item is retrieved given that it is relevant (Manning et al., 2008). These notions can be made clear by examining the set diagram in FIGURE 3-17, which shows the most important components of these measurements and from which the formulas can be derived.

[Figure: set diagram with overlapping sets A (Relevant) and B (Retrieved); the intersection is labelled Retrieved & Relevant.]

FIGURE 3-17: Set Diagram showing elements of Precision and Recall

The formulas for precision (P) and recall (R) in set notation are:

$$ P = \frac{n(A \cap B)}{n(B)} \tag{3-65} $$

$$ R = \frac{n(A \cap B)}{n(A)} \tag{3-66} $$

To the user, the scalar value of recall indicates the ability of the system to find the items relevant to a query in a collection of different items, and precision indicates its ability to output relevant items at the top of the ranking for that query. In general the user is interested in the relevant retrieved items, so the measures of precision and recall concentrate the evaluation on the relevant output of the system. Low values indicate poor performance of the system, and the higher the values, the more the user is encouraged to use the system in anticipation of getting more of the relevant search items. These evaluation measures are interdependent: as the number of retrieved items increases, precision usually decreases while recall increases. Other measures are derived from them. The F-measure is one well-known measure derived from precision and recall. It is a scalar quantity that trades off precision against recall, namely the weighted harmonic mean of precision and recall. The formula is given in the equation below (Baeza-Yates & Ribeiro-Neto, 1999; Zhou & Yao, 2010):

$$ F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} \tag{3-67} $$

where $\alpha \in [0, 1]$.
The default balanced F measure weights precision and recall equally, which means setting $\alpha = \frac{1}{2}$. The weights can be varied as required. It is important to note that precision, recall and F measure are set-oriented measures and thus cannot adequately be used for systems with ranked results (Mandl, 2008).

3.5.2 TECHNIQUES FOR EVALUATION OF RANKED RETRIEVAL RESULTS

This section describes techniques for the evaluation of ranked information retrieval results that use precision and/or recall measures. Among these techniques are the Precision-Recall curve (PR-curve), R-precision, Mean Average Precision (MAP) and Precision at k, to mention a few. Most current systems present ranked results, so to use the precision and recall measures they need to be paired at each rank position. Considering the first k retrieved items, the precision and recall values can be calculated as long as the total number of relevant items in the database is known. The following example illustrates the construction of the precision-recall curve.

Table 3-6: Calculation of precision-recall coordinates (Query Item = I56; known number of relevant items in database = 5)

| Rp | ItemID | Relevance | Recall | Recall Value | Precision | Precision Value |
|----|--------|-----------|--------|--------------|-----------|-----------------|
| 1  | I2     | Yes       | 1/5    | 0,2          | 1/1       | 1,0             |
| 2  | I33    | No        | 1/5    | 0,2          | 1/2       | 0,5             |
| 3  | I12    | Yes       | 2/5    | 0,4          | 2/3       | 0,67            |
| 4  | I8     | Yes       | 3/5    | 0,6          | 3/4       | 0,75            |
| 5  | I67    | Yes       | 4/5    | 0,8          | 4/5       | 0,8             |
| 6  | I99    | No        | 4/5    | 0,8          | 4/6       | 0,67            |
| 7  | I5     | No        | 4/5    | 0,8          | 4/7       | 0,57            |
| 8  | I1     | No        | 4/5    | 0,8          | 4/8       | 0,5             |
| 9  | I23    | No        | 4/5    | 0,8          | 4/9       | 0,44            |
| 10 | I3     | No        | 4/5    | 0,8          | 4/10      | 0,4             |
| 11 | I9     | Yes       | 5/5    | 1,0          | 5/11      | 0,45            |

In Table 3-6, Rp is the ranked position of a retrieved item and ItemID is the item identification. When the item at position Rp+1 is not relevant, recall remains the same and precision decreases: at Rp+1 = 2, recall remained 0,2 as it was at Rp = 1, while precision decreased from 1,0 to 0,5. When the item at position Rp+1 is relevant, recall increases and precision increases or remains the same.
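The construction of Table 3-6 can be sketched in a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical, the relevance flags are copied from Table 3-6, and the balanced F value (equation 3-67 with the weight set to one half) is added per rank purely for illustration.

```python
# Relevance of each ranked item from Table 3-6 (True = relevant to the query).
RELEVANCE = [True, False, True, True, True, False, False, False, False, False, True]
TOTAL_RELEVANT = 5  # known number of relevant items in the database


def precision_recall_points(relevance, total_relevant):
    """Return one (rank, recall, precision) tuple per rank position."""
    points = []
    hits = 0  # relevant items seen so far
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((rank, hits / total_relevant, hits / rank))
    return points


def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (equation 3-67)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


POINTS = precision_recall_points(RELEVANCE, TOTAL_RELEVANT)

if __name__ == "__main__":
    for rank, recall, precision in POINTS:
        print(f"Rp={rank:2d}  recall={recall:.2f}  precision={precision:.2f}  "
              f"F={f_measure(precision, recall):.2f}")
```

Each printed row corresponds to one column of Table 3-6; a non-relevant item leaves recall unchanged and lowers precision, as described above.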
The P-R graph is then plotted from the precision-recall values in Table 3-6. The graph can be seen in FIGURE 3-18, with points marked using stars, and has a distinct saw-tooth shape. To smooth the graph, interpolated precision is used: the interpolated precision P at a certain recall level r is defined as the maximum precision found for any recall level $r' \geq r$, as in equation 3-68.

$$ P(r) = \max_{r' \geq r} p(r') \tag{3-68} $$

Interpolating a precision value for each standard recall level from Table 3-6 gives the 11-point interpolated average precision shown in Table 3-7.

Table 3-7: 11-Point Interpolated Average Precision

| r'   | 0,2 | 0,2 | 0,4 | 0,6  | 0,8  | 0,8  | 0,8  | 0,8  | 0,8 | 0,8  | 1,0  |
|------|-----|-----|-----|------|------|------|------|------|-----|------|------|
| R    | 0,0 | 0,1 | 0,2 | 0,3  | 0,4  | 0,5  | 0,6  | 0,7  | 0,8 | 0,9  | 1,0  |
| p(r) | 1,0 | 1,0 | 1,0 | 0,67 | 0,67 | 0,75 | 0,75 | 0,80 | 0,8 | 0,45 | 0,45 |

So the graph marked with stars is transformed into the graph marked with "X"s in FIGURE 3-18.

FIGURE 3-18: Graphs for values in Table 3-6 and Table 3-7

For more variations of Precision-Recall curves, consult (Baeza-Yates & Ribeiro-Neto, 1999; Manning et al., 2008). Among the non-graphical evaluation techniques related to precision and/or recall is MAP, which has gained popularity among the Text Retrieval Conference (TREC) members (Manning et al., 2008). MAP is one of the ways of combining precision and recall into a single scalar measure; it is defined as the mean of the average precision values for a set of queries. Average precision is calculated by averaging the precision at every position in the ranking at which a relevant item is retrieved. Relevant items not retrieved by the cutoff depth are assigned a precision of zero. The scalar value obtained is approximately equal to the area under the precision-recall curve. MAP expresses the quality of the system in one number. The formula used to calculate MAP is given in equation 3-69 below.
$$ MAP = \frac{1}{n_{Re}} \sum_{k} \left( Re_k \cdot \frac{1}{k} \sum_{1 \leq i \leq k} Re_i \right) \tag{3-69} $$

where $n_{Re}$ is the number of relevant items, and $Re_k$ and $Re_i$ take the value zero or one, indicating not relevant or relevant at positions k and i respectively. There are other measures, such as Precision at k and R-precision, that can be used. Precision at k, shortened to P@k, is the precision calculated at a cut-off point k. This measure does not measure recall. It is criticized because the number of relevant items for a query has a strong influence on precision at k but is ignored. To alleviate this problem, the R-precision measure is introduced. In this measure the number of relevant items is known and becomes the cutoff point. The formula is given in equation 3-70 below:

$$ R\text{-}Precision = \frac{1}{n_{Re}} \sum_{k=1}^{n_{Re}} Re_k \tag{3-70} $$

The R-precision measure is also called the break-even point, because at this cutoff the number of retrieved items equals the number of relevant items, so precision and recall are equal. The Receiver Operating Characteristics (ROC) curve is also used in the performance evaluation of information retrieval systems. To illustrate how ROC works, it is important to understand the confusion matrix. A confusion matrix shows the differences between the true and predicted classes (Bradley, 1997). The confusion matrix is shown in Table 3-8 below.

Table 3-8: Confusion Matrix

|                    | Actual Positive | Actual Negative | Total predicted |
|--------------------|-----------------|-----------------|-----------------|
| Predicted Positive | TP              | FP              | TP+FP=TPP       |
| Predicted Negative | FN              | TN              | FN+TN=TPN       |
| Total Actual       | TP+FN=TAP       | FP+TN=TAN       | N               |

where TP is true positive (items correctly labelled as similar to the query item), FP false positive (items incorrectly labelled as similar to the query item), FN false negative (items incorrectly labelled as not similar to the query item), TN true negative (items correctly labelled as not similar to the query item), TPP total predicted positive, TPN total predicted negative, TAP total actual positive, TAN total actual negative and N = TAP+TAN = TPP+TPN.
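The ranked measures of equations 3-69 and 3-70 and the confusion matrix of Table 3-8 can all be computed from the same ranked relevance list. The sketch below is illustrative only: the function names are hypothetical, the relevance flags are taken from Table 3-6, and the collection size of 100 items is an assumption needed solely to fill in the TN cell. It computes average precision for a single query; MAP would be the mean of this value over a set of queries.

```python
RELEVANCE = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # Re_k for ranks 1..11 (Table 3-6)
N_RELEVANT = 5        # n_Re: relevant items in the database
DATABASE_SIZE = 100   # hypothetical collection size, used only for TN


def precision_at_k(relevance, k):
    """P@k: fraction of the first k retrieved items that are relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance, n_relevant):
    """Equation 3-69 for one query: sum P@k over the ranks where Re_k = 1,
    divided by n_Re, so relevant items never retrieved contribute zero."""
    total = sum(precision_at_k(relevance, k)
                for k, rel in enumerate(relevance, start=1) if rel)
    return total / n_relevant


def r_precision(relevance, n_relevant):
    """Equation 3-70: precision at the cutoff k = n_Re."""
    return precision_at_k(relevance, n_relevant)


def confusion_at_k(relevance, k, n_relevant, database_size):
    """Cells of Table 3-8 when the first k ranked items are labelled positive."""
    tp = sum(relevance[:k])
    fp = k - tp
    fn = n_relevant - tp
    tn = database_size - k - fn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


if __name__ == "__main__":
    print("AP          =", round(average_precision(RELEVANCE, N_RELEVANT), 4))
    print("P@3         =", round(precision_at_k(RELEVANCE, 3), 4))
    print("R-precision =", r_precision(RELEVANCE, N_RELEVANT))
    print("confusion@5 =", confusion_at_k(RELEVANCE, 5, N_RELEVANT, DATABASE_SIZE))
```

Dividing by the number of relevant items rather than by the number of hits is what assigns a precision of zero to relevant items that are never retrieved.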
From the confusion matrix, more meaningful measures can be derived to illustrate performance criteria, as shown in equations 3-71 and 3-72 below (Davis & Goadrich, 2006; Landgrebe, Paclik & Duin, 2006):

$$ TPR \text{ or Sensitivity or recall} = \frac{TP}{TP + FN} = \frac{TP}{TAP} \tag{3-71} $$

$$ FPR \text{ or } 1 - \text{Specificity} = \frac{FP}{FP + TN} = \frac{FP}{TAN} \tag{3-72} $$

TPR (True Positive Rate) measures the fraction of all relevant items in the database that have been correctly labelled similar to the query. FPR (False Positive Rate) measures the fraction of all irrelevant items in the database that have been incorrectly labelled similar to the query. These measures of performance are valid only for one particular operating point, an operating point normally being chosen to minimize the probability of error. The ROC curve is a plot of TPR versus FPR across different thresholds (Brodersen, Ong, Stephan & Buhmann, 2010). TPR is plotted on the y-axis and FPR on the x-axis. Thus it offers a threshold-independent way of evaluating information retrieval performance. A ROC curve moves from the bottom left to the top right of the graph. The performance of a model at one threshold is represented as a point on the ROC curve. A good system produces a curve that climbs steeply on the left side, as can be seen in FIGURE 3-19 (right-hand graph). The point (0, 0) corresponds to labelling everything as the negative class, (1, 1) to labelling everything as the positive class, and (0, 1) is the ideal situation. The diagonal line indicates random guessing. Any point below the diagonal line predicts the opposite of the true class, indicating a lower TPR and/or higher FPR (Drummond & Holte, 2000; Ferri, Hernandez-Orallo & Salido, 2003).

FIGURE 3-19: Graphs illustrating the appearance of P-R and ROC curves

The ROC curve also yields another measure of the performance of a system. This measure is the ROC Area Under Curve (AUC), a simple scalar metric that summarises how an algorithm performs over the whole space.
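The ROC construction described above can be sketched by sweeping the cutoff down a ranked list and recording one (FPR, TPR) point per cutoff. This is a minimal sketch under stated assumptions: the names are hypothetical, the 11 ranked items of Table 3-6 are treated as the entire database (5 relevant, 6 irrelevant) so that the curve reaches (1, 1), and the area is accumulated with the trapezoidal rule.

```python
def roc_points(relevance, n_relevant, n_irrelevant):
    """Return (FPR, TPR) points obtained by sweeping the cutoff over a ranking.

    relevance: 1/0 flags in ranked order, assumed to cover the whole database.
    """
    points = [(0.0, 0.0)]  # cutoff 0: everything labelled negative
    tp = fp = 0
    for rel in relevance:
        if rel:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_irrelevant, tp / n_relevant))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


RANKING = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # relevance flags from Table 3-6
ROC = roc_points(RANKING, n_relevant=5, n_irrelevant=6)

if __name__ == "__main__":
    print("AUC =", round(auc_trapezoid(ROC), 4))
```

A ranking that places every relevant item ahead of every irrelevant one would produce a curve through (0, 1) and an area of one.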
The area can be calculated by summing the trapezoidal areas created between successive ROC curve points (Drummond & Holte, 2000). The AUC value range is [0, 1]. One indicates ideal performance of a system, 0.5 random-guess performance and zero a system that never retrieves anything similar to the query (Walter, 2002).

3.5.3 RELATIONSHIP BETWEEN ROC AND P-R RELATED MEASURES

The ROC and P-R curves are visual performance measures, as seen in FIGURE 3-19. In (Davis & Goadrich, 2006) it is shown that a curve that dominates in ROC space also dominates in P-R space and vice versa. This is illustrated in FIGURE 3-19: the comparison of the two systems in P-R space and ROC space shows that the system represented with the dashed line performs better in both spaces. It can also be seen from these graphs that the areas under the curves in the two spaces are approximately equal. In P-R space the area under the curve is called MAP and in ROC space ROC-AUC. The bigger the area, the better the system performs.

3.5.4 CONCLUSION

Performance evaluation is crucial at many stages of information retrieval system development. At the end of the development process it is important to show that the final retrieval system achieves an acceptable level of performance and that it represents a significant improvement over existing retrieval systems. To evaluate a retrieval system, its future performance needs to be estimated. The different performance evaluation measures highlight different aspects of a model's classification performance, so selecting the most appropriate performance measure is clearly application dependent (Landgrebe et al., 2006). Scalar measures are attractive because they give a definitive answer to which retrieval system is better, which gives authors the authority to claim the superiority of their algorithm. A scalar measure gives an overall value of the performance of the system and no other information.
The visual performance measure preserves all performance-related information about a retrieval system and can show whether one system dominates another totally or partially. The traditional binary evaluation methods have played a dominant role in the history of information retrieval system evaluation. These methods include recall, precision, MAP, precision at k and R-precision (Zhou & Yao, 2010). Precision-recall analysis has remained the performance measure of choice in applications such as database image retrieval. The Precision-Recall Curve (PRC), which plots precision against recall across all thresholds, is a more natural way of looking at classification performance when searching for relevant items (information retrieval) in situations where the available data is heavily imbalanced in favour of the negative class (Jarvelin & Kekalainen, 2000). End-users relate to precision-recall curves because they indicate how many true positives are likely to be found in a typical search. Evaluation at a single operating point is suitable in a well-defined environment where class priors and misclassification costs are known (Landgrebe et al., 2006; Rasmussen, 2002). The ROC curve is helpful in assessing the performance of a system independently of any given threshold. The ROC curve, which plots TPR against FPR, allows authors to see quickly whether one method dominates another and, using the convex hull, to identify potentially optimal methods without committing to a specific performance measure. A scalar measure related to the ROC curve, the ROC Area Under Curve (ROC-AUC), is also used to measure a predictive system's performance. Many other methods are suggested in the literature, but they all fall within these two categories: scalar and visual measures. The few described above seem to be the most widely used methods for evaluating the performance of information retrieval systems.
3.6 CHAPTER SUMMARY

In this chapter the algorithms for image segmentation, representation and retrieval were reviewed. The review of image segmentation algorithms revealed that region based segmentation techniques exhibit the qualities required of generic image segmentation algorithms. These region based algorithms perform differently in different application areas because of the statistics selected to model the regions to be segmented. For example, attempting to model regions using global statistics when segmenting an image of a heterogeneous object would not produce desirable results. Region based algorithms can successfully segment images with or without smooth edges. These algorithms are less sensitive to image noise and to the location of the initial contour. All these advantages of region based image segmentation algorithms would benefit the recommender system (Image Content in Shopping Recommender System for Mobile Users) if a region based segmentation technique is incorporated into the system properly.

The review of image representation algorithms indicated that region based and global methods have more advantages in representing images in a generic domain. Region based algorithms are more robust because they utilize all the shape information available. They also have the advantages that they can be applied to general applications and that they provide more accurate retrieval. The representation method to be used in the recommender system (Image Content for Shopping Items Recommender System for Mobile Users) will be chosen from the region based and global method classes.

The review of the (dis)similarity algorithms shows that the choice of technique depends on how it affects the effectiveness and efficiency of the retrieval system. The decision on the (dis)similarity algorithm will therefore be made after testing its effect on the effectiveness and efficiency of the system.
In conclusion, the three algorithms, that is the image segmentation, representation and (dis)similarity algorithms, must be compatible with each other for an effective and efficient recommender system (Image Content for Shopping Item Recommender System for Mobile Users). The next chapter will reveal whether the algorithms will be selected from existing methods or whether new ones will be created for the recommender system.

CHAPTER 4

4 SHAPE IMAGE CONTENT FOR MOBILE RECOMMENDER SYSTEM

The architecture of the Image Content in Shopping Items Recommender System for Mobile Users, proposed in chapter 2 and shown in figure 2-3, requires item and user profiles in order to personalize the shopping recommendations. This chapter provides detailed information about the modelling of item and user profile representations, the recommendation process, user interaction modelling and the computation of the recommendation list in a shopping recommender system for mobile users. The goal of our item representation is to model features that are common to large classes of items. This makes the system suitable as an e-Commerce recommender that can be used for sales of many different item types. We achieve this goal by capturing essential item information such as unique identifier, name, class, image (logo), price, payment method, shop and location. Specific features that are unique to each item are also captured and stored in the item database. Items in the database are identified by their logos (L). As a result, the recommendation method compactly represents item information as a feature vector of m values:

$$ i = (i_1, i_2, i_3, \ldots, i_m) \tag{4-1} $$

where each component $i_k$ may be numeric, nominal or a set of numbers.
A typical example of an item feature vector is:

$$ i = (L_{id}, GPSs, L_{GPSs}) \quad \text{where} \quad L_{GPSs} = (L_{id}, \text{price-range}, \text{size-range}, \text{promotion}, \ldots) \tag{4-2} $$

where $L_{id}$ is the identifier of the logo, $GPSs$ is where the item can be found, $L_{GPSs}$ is the set of features of the item at the different locations, price-range is the range of the item's prices across its various sizes, size-range is the range of available sizes of the item, and so forth. The user profile is also modelled as a feature vector of n values

$$ u = (u_1, u_2, u_3, \ldots, u_n) \tag{4-3} $$

where each $u_i$ may be numeric, nominal or a set of numbers. A typical example of a user feature vector is:

$$ u = (GPS, I_{AFSs}) \quad \text{where} \quad I_{AFSs} = (\text{price-range}, \text{size-range}, \ldots) \tag{4-4} $$

where $GPS$ is where the user is located at the time of querying the system and $I_{AFSs}$ is the set of average features of the items previously bought by the user, that is price-range, size-range and so forth. When the mobile client sends a logo together with GPS coordinates to the system, the following steps are taken:

1. Search for the logo in the database
2. Find the logo similar to the query logo
3. Look up the GPS coordinates of the locations where the item can be found
4. Calculate the distances between the mobile client and the retail locations
5. Rank the locations according to the distances calculated in step 4
6. Calculate the similarity between $I_{AFSs}$ and $L_{GPSs}$ (acceptable distances)
7. Produce the final ranking, taking steps 5 and 6 into consideration

The recommendation is sent to the mobile client with the GPS coordinates of the chosen location, promotions and special offers. In this research the distance in step 4 is calculated using the following formula (Adair & Turnbull, 1974):

$$ d_{uL} = 2 \arcsin\left( \sqrt{ \sin^2\left(\frac{lat_1 - lat_2}{2}\right) + \cos(lat_1) \cos(lat_2) \sin^2\left(\frac{lon_1 - lon_2}{2}\right) } \right) \tag{4-5} $$

where $(lat_1, lon_1)$ and $(lat_2, lon_2)$ are the GPS coordinates of the mobile client and the retail location respectively. It is important to note that lat and lon stand for latitude and longitude respectively.
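Steps 4 and 5 can be sketched as follows. This is a hedged illustration rather than the system's implementation: the function names, the shop tuples and the coordinates are hypothetical, and the mean Earth radius of 6371 km is an assumption introduced here to turn the central angle of equation 4-5 (which is in radians) into a distance. Because only coordinate differences and cosines of latitudes enter the formula, any sign convention for latitude and longitude works as long as it is applied consistently to both points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the text


def central_angle(lat1, lon1, lat2, lon2):
    """Equation 4-5: great-circle central angle in radians (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi1 - phi2
    dlam = math.radians(lon1 - lon2)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, h)))  # clamp guards rounding error


def distance_km(lat1, lon1, lat2, lon2):
    """Distance along the Earth's surface between two coordinates."""
    return EARTH_RADIUS_KM * central_angle(lat1, lon1, lat2, lon2)


def rank_locations(user, locations):
    """Steps 4 and 5: sort (name, lat, lon) tuples by distance from the user."""
    return sorted(locations,
                  key=lambda loc: distance_km(user[0], user[1], loc[1], loc[2]))


if __name__ == "__main__":
    user = (0.0, 0.0)  # hypothetical mobile client position
    shops = [("A", 0.0, 2.0), ("B", 0.0, 1.0), ("C", 1.0, 5.0)]  # hypothetical
    for name, lat, lon in rank_locations(user, shops):
        print(name, round(distance_km(user[0], user[1], lat, lon), 1), "km")
```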
North latitudes and west longitudes are taken as positive, and south latitudes and east longitudes are taken as negative.

In step 6 the similarity is calculated using the cosine similarity formula:

$$ sim(I_{AFSs}, L_{GPSs}) = \frac{I_{AFSs} \cdot L_{GPSs}}{\lVert I_{AFSs} \rVert_2 \, \lVert L_{GPSs} \rVert_2} \tag{4-6} $$

In step 7 the similarity values for ranking are calculated using the following formula:

$$ R(sim) = sim(I_{AFSs}, L_{GPSs}) \cdot \delta_{uL} \tag{4-7} $$

where $\delta_{uL}$ is $d_{uL}$ normalized and then transformed to the range [0, 1], and $\delta_{uL}$ is calculated as:

uL d uL 1 1 2 d uL a (4-8)

Finally, the location with the largest R(sim) is the one that is recommended to the user.

The goal of the Image Content in Shopping Recommender System for Mobile Users is to efficiently find a set of items that match the user's desired item and to give the location of the nearest vendor. For this goal to be achieved, the image retrieval component of the recommender system should be very effective. The moment the system receives the image from the user, it must be able to identify the correct type of image and match it with the user profile in order to retrieve the user's desired item. In chapter 2, figure 2-2, the user queries the system using images that are in the database or images captured by a mobile device. In this chapter the components of the retrieval system are examined in order to build an effective retrieval system for the recommender system. We are going to develop a retrieval system that is capable of matching images in the database with query images that are either in the database or captured by a camera-enabled device and, although not in the database, compatible with its images. The system will be implemented in the Matlab programming language, which was chosen for its image processing capabilities. The following framework is going to be used in developing the system:

FIGURE 4-1: The framework of the retrieval system

In developing the retrieval system there are stages that are crucial to the success of the system.
The following block diagram of the retrieval process shows the stages that we follow in implementing the system.

FIGURE 4-2: The Image Retrieval Process

We will explain what will be implemented in each stage of the block diagram in figure 4-2.

4.1 IMAGE PRE-PROCESSING

Image pre-processing is the term for operations on images at the lowest level of abstraction. The objective of pre-processing is to improve the image data by suppressing undesirable distortions or enhancing image features that are relevant for further processing and analysis (Miljkovic, 2009). Images may have noise, geometric distortions, and varying resolution and lighting conditions. Pre-processing these images helps to attenuate noise, correct image orientation and increase image contrast. Matlab provides pre-processing techniques that can be used. The techniques we are going to experiment with include histogram equalization, image filtering, resizing and morphological techniques. This will enable us to set up our system for automatic pre-processing of all images captured by a camera-enabled device.

4.2 SEGMENTATION METHODS

Studying the types of images that constitute shop items, we find that some might not have definite edges. We decided that segmentation methods capable of segmenting images without edges would be suitable for our study. The methods that fall into this category are Active Contour without Edges by Chan & Vese and Robust Image Segmentation using Local Median by Jundong Liu. According to (Chan & Vese, 2001; Liu, 2006), Active Contour without Edges and Robust Image Segmentation using Local Median are able to detect smooth boundaries, are scale adaptive and change topology automatically.
Active Contour without Edges uses global statistics to model regions, while Robust Image Segmentation using Local Median uses local statistics to model regions (Lankton & Tannenbaum, 2008; Liu, 2006). These are the characteristics that we are looking for in the shopping item domain. Experiments that justify the robustness of Active Contour without Edges can be seen in (Chan & Vese, 2001). The two methods are candidates for segmenting images whose boundaries are not necessarily defined by gradient. We describe these two methods comprehensively because we are going to experiment with them in this study.

4.2.1 ACTIVE CONTOUR WITHOUT EDGES

The Chan & Vese active contour algorithm comes from the segmentation problem formulated by Mumford & Shah. Their formulation is as follows. Let $\Omega$ be a bounded open subset of $\mathbb{R}^2$, that is the 2-dimensional image space, and let $I: \Omega \to \mathbb{R}$ be a given gray image. As described in (Sezgin & Sankur, 2004), Mumford and Shah formulated the image segmentation problem as follows: given an image $I$, find a contour $C$ which divides the image into non-overlapping regions representing different objects. They proposed the following energy functional:

$$ F(u, C) = \int_{\Omega} (u - I)^2 \, dx + \mu \int_{\Omega \setminus C} |\nabla u|^2 \, dx + \nu |C| \tag{4-9} $$

where $|C|$ is the contour length and $\mu, \nu \geq 0$ are constants that balance the terms. $u$ is an image that approximates the original image $I$ and is smooth within each region inside or outside the contour $C$. The first term of equation 4-9, $\int_{\Omega} (u - I)^2 \, dx$, is the data-fitting term. The second term, $\mu \int_{\Omega \setminus C} |\nabla u|^2 \, dx$, is the smoothing term. The third term, $\nu |C|$, regularizes the contour by penalizing its arc length. Minimizing the Mumford-Shah functional yields an optimal contour $C$ that segments the image into disjoint regions, and a smooth, denoised version $u$ of the image. Equation 4-9 is not easy to solve because $u$ and $C$ have different dimensions. $F(u, C)$ is also not convex, so it may have multiple local minima.
To overcome these problems, Chan & Vese proposed the Piecewise Constant Model. The Chan & Vese method is described as follows. Let $c_1$ and $c_2$ denote the average intensities of the image $u_0(x, y)$ inside and outside an arbitrary closed curve $C$ respectively, and let $C_0$ denote the boundary contour of an object region in $u_0$. A fitting term $F$ is defined as:

$$ F = F_1(C) + F_2(C) = \int_{inside(C)} |u_0(x, y) - c_1|^2 \, dx\,dy + \int_{outside(C)} |u_0(x, y) - c_2|^2 \, dx\,dy \tag{4-10} $$

$F$ is the energy function. The minimum of $F$ is achieved only when $C$ is fitted onto $C_0$, enclosing the object region. Chan and Vese minimized the fitting term and added regularizing terms, namely the length of the curve $C$ and/or the area of the region inside $C$. The energy functional $F(c_1, c_2, C)$ is defined as:

$$ F(c_1, c_2, C) = \mu \cdot Length(C) + \nu \cdot Area(inside(C)) + \lambda_1 \int_{inside(C)} |u_0(x, y) - c_1|^2 \, dx\,dy + \lambda_2 \int_{outside(C)} |u_0(x, y) - c_2|^2 \, dx\,dy \tag{4-11} $$

where $c_1$ and $c_2$ are the averages of $u_0$ inside $C$ and outside $C$ respectively. The curve $C$ is represented as the zero level set of a function $\phi$, and the Heaviside function $H$ and the one-dimensional Dirac measure $\delta_0$ are defined respectively as:

$$ H(z) = \begin{cases} 1, & \text{if } z \geq 0 \\ 0, & \text{if } z < 0 \end{cases} \tag{4-12} $$

$$ \delta_0(z) = \frac{d}{dz} H(z) \tag{4-13} $$

The terms in the energy function $F$ are then expressed in the following way:

$$ Length\{\phi = 0\} = \int_{\Omega} |\nabla H(\phi(x, y))| \, dx\,dy = \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy \tag{4-14} $$

$$ Area\{\phi \geq 0\} = \int_{\Omega} H(\phi(x, y)) \, dx\,dy \tag{4-15} $$

and

$$ \int_{\phi > 0} |u_0(x, y) - c_1|^2 \, dx\,dy = \int_{\Omega} |u_0(x, y) - c_1|^2 \, H(\phi(x, y)) \, dx\,dy \tag{4-16} $$

$$ \int_{\phi < 0} |u_0(x, y) - c_2|^2 \, dx\,dy = \int_{\Omega} |u_0(x, y) - c_2|^2 \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-17} $$

The energy function can then be written as:

$$ F(c_1, c_2, \phi) = \mu \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy + \nu \int_{\Omega} H(\phi(x, y)) \, dx\,dy + \lambda_1 \int_{\Omega} |u_0(x, y) - c_1|^2 \, H(\phi(x, y)) \, dx\,dy + \lambda_2 \int_{\Omega} |u_0(x, y) - c_2|^2 \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-18} $$

where $\mu \geq 0$, $\nu \geq 0$, $\lambda_1, \lambda_2 > 0$ are fixed parameters. This is the function that Chan & Vese minimize. The calculation of $c_1$ and $c_2$ is carried out as:

$$ c_1(\phi) = \frac{\int_{\Omega} u_0(x, y) \, H(\phi(x, y)) \, dx\,dy}{\int_{\Omega} H(\phi(x, y)) \, dx\,dy} \tag{4-19} $$

$$ c_2(\phi) = \frac{\int_{\Omega} u_0(x, y) \, (1 - H(\phi(x, y))) \, dx\,dy}{\int_{\Omega} (1 - H(\phi(x, y))) \, dx\,dy} \tag{4-20} $$

These are global means (average intensity values), calculated over the entire image.
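On a pixel grid the integrals in equations 4-19 and 4-20 become sums, and the two fitting terms of equation 4-18 become sums of squared differences. The sketch below illustrates only this part of the model: the function names and the tiny 4x4 test image are hypothetical, the binary mask stands in for $H(\phi)$, and the length and area terms and the curve evolution itself are deliberately omitted.

```python
def region_means(image, inside):
    """Discrete equations 4-19 and 4-20: mean intensity inside and outside C.

    image:  2-D list of intensities u0(x, y)
    inside: 2-D list of 0/1 values playing the role of H(phi(x, y))
    """
    sum_in = sum_out = 0.0
    n_in = n_out = 0
    for image_row, mask_row in zip(image, inside):
        for value, h in zip(image_row, mask_row):
            if h:
                sum_in += value
                n_in += 1
            else:
                sum_out += value
                n_out += 1
    c1 = sum_in / n_in if n_in else 0.0
    c2 = sum_out / n_out if n_out else 0.0
    return c1, c2


def fitting_energy(image, inside, lambda1=1.0, lambda2=1.0):
    """Discrete fitting terms of equation 4-18 (length and area terms omitted)."""
    c1, c2 = region_means(image, inside)
    energy = 0.0
    for image_row, mask_row in zip(image, inside):
        for value, h in zip(image_row, mask_row):
            if h:
                energy += lambda1 * (value - c1) ** 2
            else:
                energy += lambda2 * (value - c2) ** 2
    return energy


# Hypothetical image: a bright 2x2 object (9) on a dark background (1).
IMAGE = [[1, 1, 1, 1],
         [1, 9, 9, 1],
         [1, 9, 9, 1],
         [1, 1, 1, 1]]
ON_OBJECT = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
SHIFTED = [[0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 0]]

if __name__ == "__main__":
    print("contour on object:", fitting_energy(IMAGE, ON_OBJECT))
    print("contour shifted  :", round(fitting_energy(IMAGE, SHIFTED), 3))
```

The mask that coincides with the object gives a fitting energy of zero, while the shifted mask gives a strictly larger value, which is the behaviour the fitting term relies on when the minimum is reached at the object boundary.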
Two assumptions were taken into consideration:

- Within each object the intensity values conform to a Gaussian distribution.
- The global means (average intensity values) of different regions are distinct and can therefore be used to discriminate pixels.

4.2.2 ROBUST IMAGE SEGMENTATION USING LOCAL MEDIAN

This method is not very different from the one above. Instead of the global mean, it uses the local median. Two functions $f_1$ and $f_2$, both defined on the image domain, are introduced to represent the median values of the local pixels inside and outside the moving curve. Local in this case means that only neighbouring pixels are considered. This neighbourhood is implemented by introducing a rectangular window $W$ of size

$$ W = (2k + 1) \times (2k + 1) \tag{4-21} $$

where $k$ is a constant integer. Thus

$$ f_1 = median(u_0 * inside(C) * W) \tag{4-22} $$

$$ f_2 = median(u_0 * outside(C) * W) \tag{4-23} $$

The functions $f_1$ and $f_2$ are defined on the entire image domain. $f_1(x, y)$ and $f_2(x, y)$ are calculated for each point $(x, y)$ and take the median intensity value of the neighbouring pixels that are inside and outside the moving curve respectively. This method minimizes the following energy:

$$ F(f_1, f_2, C) = \mu \cdot Length(C) + \nu \cdot Area(inside(C)) + \lambda_1 \int_{inside(C)} |u_0 - f_1| \, dx\,dy + \lambda_2 \int_{outside(C)} |u_0 - f_2| \, dx\,dy \tag{4-24} $$

Mapped to the level set framework, the new functional that Liu attempts to minimize is

$$ F(f_1, f_2, \phi) = \mu \int_{\Omega} \delta_0(\phi(x, y)) \, |\nabla \phi(x, y)| \, dx\,dy + \nu \int_{\Omega} H(\phi(x, y)) \, dx\,dy + \lambda_1 \int_{\Omega} |u_0 - f_1| \, H(\phi(x, y)) \, dx\,dy + \lambda_2 \int_{\Omega} |u_0 - f_2| \, (1 - H(\phi(x, y))) \, dx\,dy \tag{4-25} $$

Accordingly, $f_1$ and $f_2$ are calculated as follows:

$$ f_1 = median(u_0 * H(\phi) * W) \tag{4-26} $$

$$ f_2 = median(u_0 * (1 - H(\phi)) * W) \tag{4-27} $$

We are going to experiment with these two methods to investigate which one is suitable for shopping items.

4.3 IMAGE REPRESENTATION METHOD

We will briefly describe the theory that we are going to use in formulating our representation technique. We are going to use a non-parametric method to represent image shapes.
Representing an object shape using a non-parametric class of density estimators is attractive because no assumptions are made about the distribution of the object shape data; the density is determined from the object shape data itself. Examples of non-parametric methods are the histogram, kernel density estimation and k-nearest neighbour estimation. The most widely used density estimator is the histogram. In (Tran & Ono, 2000) the Density Histogram of Feature Points (DHFP) was used to represent an object shape. In general, to create a histogram one needs the starting point $x_0$ and the bin width $w$. The major problem of histograms is their dependence on the width of the bins (bin size) (Shimazaki & Shinomoto, 2007). The frequency distribution is smoothed out (over-smoothing) when the bin size is increased and discretized (under-smoothing) when it is decreased. In most cases the bin size has been selected subjectively by individual researchers (Shimazaki & Shinomoto, 2007). The choice of bandwidth is often critical in the implementation of non-parametric methods. Other problems with histograms are that they are not smooth and that they depend on the end points of the bins. The histogram also suffers from the curse of dimensionality: the number of bins grows exponentially with the dimension, so a finer resolution implies many bins, most of which will be empty. We can try to solve these problems by using another non-parametric method, the Kernel Density Estimator (KDE), to represent an object shape. We will explore the theory of KDE and then experiment to represent object shapes in an image database optimally using KDE. We will find a way of calculating optimal bandwidths, and for this project the Gaussian kernel function will be used in experiments on representing the images.
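The dependence of the histogram on its starting point can be illustrated with a small sketch, set beside a Gaussian kernel density estimate of the same data. Everything in the block is an assumption made for illustration: the function names, the six sample values and the bandwidth of 0.5 are hypothetical, and the kernel estimate uses the standard form $\frac{1}{nh}\sum_i K\left(\frac{x - x_i}{h}\right)$ with a Gaussian $K$.

```python
import math


def histogram_density(sample, x, origin, width):
    """Histogram density estimate at x: count in x's bin / (n * bin width)."""
    bin_index = math.floor((x - origin) / width)
    count = sum(1 for xi in sample
                if math.floor((xi - origin) / width) == bin_index)
    return count / (len(sample) * width)


def gaussian_kde(sample, x, h):
    """Kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (len(sample) * h * math.sqrt(2.0 * math.pi))


SAMPLE = [1.0, 1.2, 1.4, 1.6, 2.8, 3.0]  # hypothetical 1-D shape feature values

if __name__ == "__main__":
    # Same data, same bin width, same query point: only the origin x0 differs.
    print("histogram, x0 = 0.0:", round(histogram_density(SAMPLE, 1.5, 0.0, 1.0), 4))
    print("histogram, x0 = 0.5:", round(histogram_density(SAMPLE, 1.5, 0.5, 1.0), 4))
    # The kernel estimate has no bin origin; it depends only on the bandwidth.
    print("Gaussian KDE, h = 0.5:", round(gaussian_kde(SAMPLE, 1.5, 0.5), 4))
```

Moving the origin from 0.0 to 0.5 changes the histogram estimate at the same point by a factor of four for this sample, whereas the kernel estimate is a smooth function of x with only the bandwidth left to choose.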
4.4 THE 1-DIMENSIONAL KERNEL DENSITY ESTIMATION

Kernel density estimation is a way of estimating a probability density function from observed data. Kernel density approaches exist for discrete and continuous data types. We define the kernel density estimator as follows.

Definition 1 (Kernel Density Estimator). Let $(x_1, x_2, \ldots, x_n)$ be an independent and identically distributed (i.i.d.) sample drawn from some distribution with an unknown univariate density $f$. The kernel density estimator for $f$ is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \tag{4-28}$$

with kernel function $K(u)$ and bandwidth $h$.

4.4.1 KERNEL FUNCTIONS

A kernel function $K(u): \mathbb{R} \to \mathbb{R}$ is any function which satisfies

$$\int K(u)\,du = 1. \tag{4-29}$$

A non-negative kernel is a probability density function and satisfies

$$K(u) \ge 0 \quad \forall u. \tag{4-30}$$

A symmetric kernel satisfies

$$K(u) = K(-u) \quad \forall u. \tag{4-31}$$

The moments of a kernel are given in Equation 4-32,

$$m_j(K) = \int u^j K(u)\,du. \tag{4-32}$$

This enables us to define the order of a kernel as the order of its first non-zero moment. All symmetric non-negative kernels are of second order, since $m_1(K) = 0$ and the first non-zero moment is

$$m_2(K) = \int u^2 K(u)\,du = \sigma_K^2 > 0. \tag{4-33}$$

The order of a symmetric kernel is always even. A kernel is a higher-order kernel if $j > 2$. Higher-order kernels are not probability densities because they have negative parts. We are interested in second-order kernels (probability densities). Examples of second-order kernels are given in Equations 4-34 to 4-36:

$$\text{Gaussian:}\quad K(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2} \tag{4-34}$$

$$\text{Uniform:}\quad K(u) = \frac{1}{2}\,\mathbf{1}(|u| \le 1) \tag{4-35}$$

$$\text{Epanechnikov:}\quad K(u) = \frac{3}{4}\,(1 - u^2)\,\mathbf{1}(|u| \le 1). \tag{4-36}$$

4.4.2 KERNEL DENSITY ESTIMATOR (PROPERTIES)

In the derivations that follow the sample points are written $X_i$. The density estimator must integrate to one:

$$\int \hat{f}(x)\,dx = \int \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\,dx = \frac{1}{n}\sum_{i=1}^{n}\int \frac{1}{h}K\!\left(\frac{x - X_i}{h}\right)dx. \tag{4-37}$$

Applying the change of variable $u = (x - X_i)/h$, which has Jacobian $h$, we obtain

$$\frac{1}{n}\sum_{i=1}^{n}\int K(u)\,du.$$

Applying the property that the kernel function integrates to one, we obtain

$$\frac{1}{n}\sum_{i=1}^{n} 1 = 1. \tag{4-38}$$

We can conclude that $\hat{f}(x)$ is a valid density when $K(u) \ge 0$.

The mean of the estimated density is

$$\int x\,\hat{f}(x)\,dx = \frac{1}{n}\sum_{i=1}^{n}\int x\,\frac{1}{h}K\!\left(\frac{x - X_i}{h}\right)dx. \tag{4-39}$$

Applying the change of variable $u = (x - X_i)/h$, so that $x = X_i + uh$, we have

$$\frac{1}{n}\sum_{i=1}^{n}\int (X_i + uh)K(u)\,du = \frac{1}{n}\sum_{i=1}^{n} X_i\int K(u)\,du + \frac{1}{n}\sum_{i=1}^{n} h\int uK(u)\,du.$$

Applying $\int K(u)\,du = 1$ and $\int uK(u)\,du = 0$, we have

$$\frac{1}{n}\sum_{i=1}^{n} X_i.$$

We can conclude that the mean of the estimated density $\hat{f}(x)$ is equal to the sample mean of the $X_i$.

The variance of the estimated density $\hat{f}(x)$ can be calculated from its second moment:

$$\int x^2\hat{f}(x)\,dx = \frac{1}{n}\sum_{i=1}^{n}\int x^2\,\frac{1}{h}K\!\left(\frac{x - X_i}{h}\right)dx. \tag{4-40}$$

Applying the same change of variable we have

$$\frac{1}{n}\sum_{i=1}^{n}\int (X_i + uh)^2 K(u)\,du.$$

Expanding $(X_i + uh)^2 = X_i^2 + 2X_i uh + u^2h^2$ gives

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2\int K(u)\,du + \frac{2}{n}\sum_{i=1}^{n} X_i h\int uK(u)\,du + \frac{1}{n}\sum_{i=1}^{n} h^2\int u^2K(u)\,du.$$

Applying $\int K(u)\,du = 1$ and $\int uK(u)\,du = 0$, we have

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 + h^2 m_2(K).$$

Subtracting the square of the mean, the variance of the density $\hat{f}(x)$ is $\hat{\sigma}^2 + h^2 m_2(K)$, where $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2$ is the sample variance.

4.4.3 BIAS OF THE ESTIMATOR

The bias of $\hat{f}(x)$ is defined as

$$\mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x). \tag{4-41}$$

We derive it as follows:

$$E\left[\frac{1}{h}K\!\left(\frac{X_i - x}{h}\right)\right] = \int \frac{1}{h}K\!\left(\frac{z - x}{h}\right)f(z)\,dz.$$

Using the change of variable $u = (z - x)/h$ we have

$$\int K(u)f(x + hu)\,du.$$

By linearity of the estimator we have

$$E\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n}E\left[\frac{1}{h}K\!\left(\frac{X_i - x}{h}\right)\right] = \int K(u)f(x + hu)\,du.$$

We use a Taylor expansion of $f(x + hu)$ in the argument $hu$, which is valid as $h \to 0$. For a $j$th-order kernel we take the expansion out to the $j$th term:

$$f(x + hu) = f(x) + f^{(1)}(x)hu + \frac{1}{2}f^{(2)}(x)h^2u^2 + \cdots + \frac{1}{j!}f^{(j)}(x)h^ju^j + o(h^j).$$

Then we have

$$\int K(u)f(x + hu)\,du = f(x)\int K(u)\,du + f^{(1)}(x)h\int uK(u)\,du + \frac{1}{2}f^{(2)}(x)h^2\int u^2K(u)\,du + \cdots + \frac{1}{j!}f^{(j)}(x)h^j\int u^jK(u)\,du + o(h^j).$$

Using $\int K(u)\,du = 1$ and $\int u^jK(u)\,du = m_j(K)$ we obtain

$$f(x) + f^{(1)}(x)hm_1(K) + \frac{1}{2}f^{(2)}(x)h^2m_2(K) + \cdots + \frac{1}{j!}f^{(j)}(x)h^jm_j(K) + o(h^j).$$

Assuming that the kernel is of order $j$, $m_i(K) = 0$ for all $0 < i < j$, thus we have

$$E\hat{f}(x) = f(x) + \frac{1}{j!}f^{(j)}(x)h^jm_j(K) + o(h^j)$$

and

$$\mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x) = \frac{1}{j!}f^{(j)}(x)h^jm_j(K) + o(h^j).$$

For a second-order kernel, which is the case we are interested in, the $h^3$ term vanishes by symmetry and we have

$$\mathrm{Bias}(\hat{f}(x)) = E\hat{f}(x) - f(x) = \frac{1}{2}f^{(2)}(x)h^2m_2(K) + O(h^4). \tag{4-42}$$

4.4.4 VARIANCE OF THE KERNEL DENSITY ESTIMATOR

Assuming $h \to 0$ and $n \to \infty$, the variance of the kernel density estimator is

$$\mathrm{Var}(\hat{f}(x)) = \frac{1}{nh}f(x)\int K(u)^2\,du + o\!\left(\frac{1}{nh}\right). \tag{4-43}$$

We derive the variance as follows. The kernel estimator is a linear estimator and the $K\!\left(\frac{X_i - x}{h}\right)$ are independent and identically distributed, so

$$\mathrm{Var}(\hat{f}(x)) = \frac{1}{nh^2}\mathrm{Var}\!\left(K\!\left(\frac{X_i - x}{h}\right)\right) = \frac{1}{nh^2}E\!\left[K\!\left(\frac{X_i - x}{h}\right)^2\right] - \frac{1}{n}\left(\frac{1}{h}E\,K\!\left(\frac{X_i - x}{h}\right)\right)^2.$$

As observed in the bias derivation, $\frac{1}{h}E\,K\!\left(\frac{X_i - x}{h}\right) = f(x) + o(1)$, therefore the second term is $O\!\left(\frac{1}{n}\right)$.

We expand the first term by writing the expectation as an integral, making a change of variables and then a first-order Taylor expansion:

$$\frac{1}{h}E\!\left[K\!\left(\frac{X_i - x}{h}\right)^2\right] = \frac{1}{h}\int K\!\left(\frac{z - x}{h}\right)^2 f(z)\,dz = \int K(u)^2 f(x + hu)\,du = \int K(u)^2\,(f(x) + O(h))\,du = f(x)R(K) + O(h),$$

where $R(K) = \int K(u)^2\,du$ is the roughness of the kernel. We can conclude that the variance of the kernel density estimator is

$$\mathrm{Var}(\hat{f}(x)) = \frac{f(x)R(K)}{nh} + O\!\left(\frac{1}{n}\right). \tag{4-44}$$

The remainder $O\!\left(\frac{1}{n}\right)$ is of smaller order than the $O\!\left(\frac{1}{nh}\right)$ leading term, since $h^{-1} \to \infty$.

4.4.5 MEAN-SQUARE ERROR (MSE)

The mean square error is a local measure of the performance of the kernel density estimate at a point $x$, and it is the sum of the squared bias and the variance:

$$\mathrm{MSE}(\hat{f}(x)) = E(\hat{f}(x) - f(x))^2 = \mathrm{Bias}(\hat{f}(x))^2 + \mathrm{Var}(\hat{f}(x)) \approx \left(\frac{1}{j!}f^{(j)}(x)h^jm_j(K)\right)^2 + \frac{f(x)R(K)}{nh} = \mathrm{AMSE}(\hat{f}(x)). \tag{4-45}$$

Since this approximation is based on asymptotic expansions, it is called the Asymptotic Mean Square Error (AMSE). To obtain a global measure of performance over all values of $x$, we define the Integrated Square Error (ISE).
$$\mathrm{ISE}(h) = \int (\hat{f}(x) - f(x))^2\,dx. \tag{4-46}$$

This is written as a function of $h$ to emphasize the dependence on the bandwidth. By taking the expected value of the ISE we obtain the Mean Integrated Squared Error (MISE):

$$\mathrm{MISE}(h) = E[\mathrm{ISE}(h)] = E\!\left[\int (\hat{f}(x) - f(x))^2\,dx\right] = \int E\!\left[(\hat{f}(x) - f(x))^2\right]dx = \int \mathrm{MSE}(\hat{f}(x))\,dx \approx \int \mathrm{AMSE}(\hat{f}(x))\,dx, \tag{4-47}$$

and the last integral is the asymptotic MISE,

$$\mathrm{AMISE}(h) = \frac{m_j^2(K)}{(j!)^2}R(f^{(j)})\,h^{2j} + \frac{R(K)}{nh},$$

where $R(f^{(j)}) = \int (f^{(j)}(x))^2\,dx$ is the roughness of $f^{(j)}$.

4.5 FINDING OPTIMAL BANDWIDTH

There are many methods for bandwidth selection. These include those based on the Mean Square Error (MSE), the Mean Integrated Squared Error (MISE) and the asymptotic MISE, plug-in techniques and bootstrap methods, to mention a few. We briefly describe some selected methods for choosing the optimal bandwidth.

4.5.1 ASYMPTOTICALLY OPTIMAL BANDWIDTH

The optimal bandwidth minimizes the MISE. The value of $h$ that minimizes the AMISE is called the asymptotically optimal bandwidth. It is found by differentiating the AMISE with respect to $h$ and setting the derivative to zero:

$$\frac{d}{dh}\mathrm{AMISE}(h) = \frac{d}{dh}\left[\frac{m_j^2(K)}{(j!)^2}R(f^{(j)})\,h^{2j} + \frac{R(K)}{nh}\right] = 2j\,h^{2j-1}\frac{m_j^2(K)}{(j!)^2}R(f^{(j)}) - \frac{R(K)}{nh^2} = 0. \tag{4-48}$$

The solution is

$$h_0 = C_j(K, f)\,n^{-1/(2j+1)} \tag{4-49}$$

$$C_j(K, f) = R(f^{(j)})^{-1/(2j+1)}A_j(K) \tag{4-50}$$

$$A_j(K) = \left(\frac{(j!)^2R(K)}{2j\,m_j^2(K)}\right)^{1/(2j+1)}. \tag{4-51}$$

The optimal bandwidth is proportional to $n^{-1/(2j+1)}$, that is, it is of order $O(n^{-1/(2j+1)})$. For the second-order kernels that we are interested in, the optimal rate is $O(n^{-1/5})$.

4.5.2 PLUG-IN BANDWIDTH

A plug-in estimate for the bandwidth is a simple formula for $h_{rot}$ that depends on the sample size $n$ and the sample standard deviation $s$. The optimal bandwidth formula is

$$h_o = R(f^{(j)})^{-1/(2j+1)}\left(\frac{(j!)^2R(K)}{2j\,m_j^2(K)}\right)^{1/(2j+1)}n^{-1/(2j+1)}. \tag{4-52}$$

In this formula every item has a known value except

$$R(f^{(j)}) = \int (f^{(j)}(x))^2\,dx. \tag{4-53}$$

A useful starting point is to assume that the unknown density $f(x)$ belongs to the family of normal distributions with mean $\mu$ and variance $\sigma^2$. For the second-order case we then have

$$\int (f^{(2)}(x))^2\,dx = \frac{3}{8\sqrt{\pi}\,\sigma^5} \approx 0.2116\,\sigma^{-5}. \tag{4-54}$$

Then

$$R(f^{(2)})^{-1/5} = \left(0.2116\,\sigma^{-5}\right)^{-1/5} = 1.3643\,\sigma. \tag{4-55}$$

Therefore, with $j = 2$,

$$h_{rot} = 1.3643\left(\frac{R(K)}{m_2^2(K)}\right)^{1/5}n^{-1/5}\,\sigma. \tag{4-56}$$

The above equation still has one unknown, $\sigma$, which is replaced by the sample standard deviation $s$, so we have

$$h_{rot} = 1.3643\left(\frac{R(K)}{m_2^2(K)}\right)^{1/5}n^{-1/5}\,s. \tag{4-57}$$

That is how the plug-in method works. There are other variations but the concept remains the same. Table 4-1 shows the values required for the plug-in bandwidth selection $h_{rot}$.

TABLE 4-1 Plug-in values for $h_{rot}$

Kernel          R(K)              m2(K)
Uniform         1/2               1/3
Epanechnikov    3/5               1/5
Gaussian        1/(2√π)           1

4.5.3 ADAPTIVE KERNEL DENSITY ESTIMATE (AKDE)

The global bandwidth approach described above may result in under-smoothing in areas with only sparse observations while at the same time over-smoothing in other areas. For this reason there is a need to vary the bandwidth along the sample data, so that more smoothing is done where data is sparse and less where it is dense. Kernel density estimation methods that rely on such a varying bandwidth are commonly referred to as adaptive kernel density estimation. We also experiment with it in order to find out its effect on shopping item images. Most adaptive kernel density estimators can be grouped into two categories: balloon estimators and sample point estimators. A balloon estimator selects a different smoothing parameter for each estimation point $x$. A sample point estimator uses a distinct bandwidth for each data point $x_i$.
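Returning to the global estimator, the rule-of-thumb bandwidth of Equation 4-57 and the estimator of Equation 4-28 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the sample values are hypothetical, the function names are illustrative, the sample standard deviation uses the $n - 1$ divisor, and the Gaussian roughness is taken as $1/(2\sqrt{\pi})$.

```python
import math

# (R(K), m2(K)) for the kernels of Table 4-1.
# Assumption: the Gaussian roughness is 1 / (2 * sqrt(pi)).
KERNEL_CONSTANTS = {
    "uniform": (1 / 2, 1 / 3),
    "epanechnikov": (3 / 5, 1 / 5),
    "gaussian": (1 / (2 * math.sqrt(math.pi)), 1.0),
}

def sample_std(sample):
    """Sample standard deviation s (n - 1 divisor, an assumption)."""
    n = len(sample)
    mean = sum(sample) / n
    return math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

def rule_of_thumb_bandwidth(sample, kernel="gaussian"):
    """h_rot = 1.3643 * (R(K) / m2(K)^2)^(1/5) * n^(-1/5) * s (Equation 4-57)."""
    r_k, m2_k = KERNEL_CONSTANTS[kernel]
    n = len(sample)
    return 1.3643 * (r_k / m2_k ** 2) ** 0.2 * n ** -0.2 * sample_std(sample)

def gaussian_kernel(u):
    """Second-order Gaussian kernel (Equation 4-34)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, sample, h):
    """Fixed-bandwidth kernel density estimate at x (Equation 4-28)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

sample = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 5.2, 6.1]   # hypothetical data
h = rule_of_thumb_bandwidth(sample)
density_at_3 = kde(3.0, sample, h)
```

For the Gaussian kernel the constant $1.3643\,(R(K)/m_2^2(K))^{1/5}$ evaluates to roughly 1.06, so the sketch reduces to the familiar form $h_{rot} \approx 1.06\,s\,n^{-1/5}$.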
4.5.3.1 UNIVARIATE BALLOON ESTIMATOR

The univariate balloon estimator is given as

$$\hat{f}_B(x) = \frac{1}{nh(x)}\sum_{i=1}^{n}K\!\left(\frac{x_i - x}{h(x)}\right). \tag{4-58}$$

The estimate $\hat{f}_B(x)$ is an average of identically scaled kernels centred at each data point. The asymptotically best balloon estimator optimizes the AMSE pointwise; it achieves a minimum where (Terrell & Scott, 1992)

$$h_o(x) = \left[\frac{(j!)^2 f(x)R(K)}{2j\,m_j^2(K)\,(f^{(j)}(x))^2}\right]^{1/(2j+1)}n^{-1/(2j+1)}, \tag{4-59}$$

and

$$\mathrm{AMSE}_o(x) = (2j+1)\left[\frac{\left|m_j(K)f^{(j)}(x)\right|\,(f(x)R(K))^{j}}{j!\,(2jn)^{j}}\right]^{2/(2j+1)}. \tag{4-60}$$

This is the general result for non-negative kernels. The most commonly used univariate balloon estimator is the Loftsgaarden-Quesenberry $k$th nearest-neighbour estimator of the form

$$\hat{f}(x) = \frac{1}{nh_k(x)}\sum_{i=1}^{n}K\!\left(\frac{x - x_i}{h_k(x)}\right), \tag{4-61}$$

where $h_k(x)$ is the distance from $x$ to its $k$th nearest data point. The number of nearest neighbours $k$ controls the level of smoothing, with larger values of $k$ corresponding to more smoothing. The use of nearest neighbours results in more smoothing in regions of low density and less smoothing in regions of high density.

4.5.3.2 SAMPLE SMOOTHING ESTIMATORS

The sample point estimator is given by

$$\hat{f}_{SP}(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h(x_i)}K\!\left(\frac{x - x_i}{h(x_i)}\right). \tag{4-62}$$

The estimate of $f(x)$ is an average of differently scaled kernels centred at each observation. In this case $h(x_i)$ should vary inversely with the underlying density. Consider taking

$$h(x_i) = h_e\,f(x_i)^{-1/2}. \tag{4-63}$$

Thus we get

$$\hat{f}_e(x) = \frac{1}{nh_e}\sum_{i=1}^{n}f(x_i)^{1/2}\,K\!\left(\frac{(x - x_i)\,f(x_i)^{1/2}}{h_e}\right). \tag{4-64}$$

This choice is advantageous because it gives an improved convergence rate of the MSE (Simonoff, 1996).

4.6 THE N-DIMENSIONAL KERNEL DENSITY ESTIMATION

The N-dimensional KDE is a direct extension of the 1-dimensional KDE. Consider a $q$-dimensional random vector $X = (X_1, X_2, \ldots, X_q)^T$, where $X_1, X_2, \ldots, X_q$ are one-dimensional random variables. A random sample of size $n$ means that we have $n$ observations for each of the $q$ random variables $X_1, X_2, \ldots, X_q$.
Our goal is to estimate the probability density of $X = (X_1, X_2, \ldots, X_q)^T$, which is the joint probability density function $f$ of the random variables $X_1, X_2, \ldots, X_q$:

$$f(x) = f(x_1, x_2, \ldots, x_q).$$

From the 1-dimensional case we adapt the KDE to the $q$-dimensional case as

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h^q}K\!\left(\frac{x - X_i}{h}\right) = \frac{1}{nh^q}\sum_{i=1}^{n}K\!\left(\frac{x - X_i}{h}\right). \tag{4-65}$$

With a product kernel and a separate bandwidth $h_v$ for each coordinate, the multivariate KDE becomes

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n}\prod_{v=1}^{q}\frac{1}{h_v}K\!\left(\frac{x_v - X_{iv}}{h_v}\right). \tag{4-66}$$

As an example, the 2-dimensional KDE, where $X = (X_1, X_2)^T$, is given as

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_1}\frac{1}{h_2}K\!\left(\frac{x_1 - X_{i1}}{h_1}, \frac{x_2 - X_{i2}}{h_2}\right) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_1}\frac{1}{h_2}K\!\left(\frac{x_1 - X_{i1}}{h_1}\right)K\!\left(\frac{x_2 - X_{i2}}{h_2}\right). \tag{4-67}$$

Each of the $n$ observations is of the form $(X_{i1}, X_{i2})$, where the first component gives the value that the random variable $X_1$ takes on the $i$th observation and the second component does the same for $X_2$.

4.6.1 KERNEL DENSITY ESTIMATOR (PROPERTIES)

A multivariate kernel satisfies

$$\int K(u)\,(du) = \int\!\cdots\!\int K(u)\,du_1\cdots du_q = 1, \tag{4-68}$$

where $K(u)$ takes the product form

$$K(u) = k(u_1)\,k(u_2)\cdots k(u_q). \tag{4-69}$$

Since $K(u)$ is a product kernel, the marginal densities of $\hat{f}(x)$ equal univariate kernel density estimators with kernel function $k$ and bandwidths $h_v$.

The variance of the estimator is

$$\mathrm{Var}(\hat{f}(x)) = \frac{f(x)R(K)}{n|H|} + O\!\left(\frac{1}{n}\right) = \frac{f(x)R(k)^q}{nh_1h_2\cdots h_q} + O\!\left(\frac{1}{n}\right), \tag{4-70}$$

where $|H| = h_1h_2\cdots h_q$. The bias of the estimator is

$$\mathrm{Bias}(\hat{f}(x)) = \frac{m_j(K)}{j!}\sum_{v=1}^{q}\frac{\partial^j f(x)}{\partial x_v^j}\,h_v^j + o(h_1^j + \cdots + h_q^j). \tag{4-71}$$

4.6.2 ASYMPTOTIC MEAN INTEGRATED SQUARED ERROR

We derived the AMISE in the univariate case, so here we only state it:

$$\mathrm{AMISE}(\hat{f}(x)) = \frac{m_j^2(K)}{(j!)^2}\int\left(\sum_{v=1}^{q}\frac{\partial^j f(x)}{\partial x_v^j}\,h_v^j\right)^2(dx) + \frac{R(K)^q}{nh_1h_2\cdots h_q}. \tag{4-72}$$

There is no closed-form solution for the bandwidth vector which minimizes this expression. The following observations can be made.

The AMISE depends on the kernel function only through $R(K)$ and $m_j^2(K)$, so it is clear that, for any given $j$, the optimal kernel minimizes $R(K)$, which is the same as in the univariate case.
The optimal bandwidths will all be of order $n^{-1/(2j+q)}$ and the optimal AMISE of order $n^{-2j/(2j+q)}$. These rates are slower than in the univariate case, that is, when $q = 1$. The fact that dimension has an adverse effect on convergence rates is called the curse of dimensionality.

4.7 FINDING OPTIMAL BANDWIDTH

As in the univariate case, there are many methods for bandwidth selection, including those based on the Mean Square Error (MSE), the Mean Integrated Squared Error (MISE) and the asymptotic MISE, plug-in techniques and bootstrap methods. Here we briefly describe the plug-in method.

4.7.1 PLUG-IN BANDWIDTH

To derive the rule-of-thumb, suppose that $h_1 = h_2 = \cdots = h_q = h$. Then

$$\mathrm{AMISE}(\hat{f}(x)) = \frac{m_j^2(K)}{(j!)^2}R(\nabla^j f)\,h^{2j} + \frac{R(K)^q}{nh^q}, \tag{4-73}$$

where

$$\nabla^j f(x) = \sum_{v=1}^{q}\frac{\partial^j f(x)}{\partial x_v^j}.$$

We find that the optimal bandwidth is

$$h_o = \left(\frac{(j!)^2\,q\,R(K)^q}{2j\,m_j^2(K)\,R(\nabla^j f)}\right)^{1/(2j+q)}n^{-1/(2j+q)}. \tag{4-74}$$

For a rule-of-thumb bandwidth, we substitute $f$ by the multivariate normal density $\phi$. We calculate that

$$R(\nabla^j\phi) = \frac{q}{\pi^{q/2}\,2^{q+j}}\left((2j-1)!! + (q-1)((j-1)!!)^2\right). \tag{4-75}$$

After the substitution we obtain

$$h_0 = C_j(K, q)\,n^{-1/(2j+q)}, \tag{4-76}$$

where

$$C_j(K, q) = \left(\frac{\pi^{q/2}\,2^{q+j-1}\,(j!)^2\,R(K)^q}{j\,m_j^2(K)\left((2j-1)!! + (q-1)((j-1)!!)^2\right)}\right)^{1/(2j+q)}. \tag{4-77}$$

We assumed that all variables had unit variance. Rescaling the bandwidths by the standard deviation of each variable, we obtain the rule-of-thumb bandwidth for the $v$th variable:

$$h_v = \hat{\sigma}_v\,C_j(K, q)\,n^{-1/(2j+q)}. \tag{4-78}$$

The values of the constant $C_j(K, q)$ are given in Table 4-2.

TABLE 4-2 Value of Constant $C_j(K, q)$

4.8 SHAPE REPRESENTATION USING ADAPTIVE KERNEL DENSITY FEATURE POINTS ESTIMATOR (AKDFPE)

This method describes the feature points within the rectangular boundary of an object in an image grid. Assume we have a silhouette object shape segmented by some means, such as the Chan & Vese Active Contour without Edges, and let the feature point set $P(x, y)$ (intensity function) of the object shape be defined as

$$P(x, y) = \{p_i(x, y)\}, \quad i = 1, 2, \ldots, n, \tag{4-79}$$

where $n$ is the number of feature points.

We find the centroid of the object shape. The following formulae are used to calculate the centroid (Flusser et al., 2009), (Mukundan & Ramakrishnan, 1998):

$$x_c = \frac{m_{1,0}}{m_{0,0}} \tag{4-80}$$

$$y_c = \frac{m_{0,1}}{m_{0,0}} \tag{4-81}$$

where $m_{1,0}$, $m_{0,1}$ and $m_{0,0}$ are derived from the silhouette moments given by

$$m_{i,j} = \sum_{x}\sum_{y}x^i y^j P(x, y). \tag{4-82}$$

Thus, for a silhouette image $P(x, y)$, the moment of zero order $m_{0,0}$ represents the geometrical area of the image region, and the moments of first order $m_{1,0}$ and $m_{0,1}$ represent the intensity moments about the y-axis and the x-axis of the image respectively. The centroid $(x_c, y_c)$ gives the geometrical centre of the image region.

Suppose the size of the grid occupied by the object shape is $N \times N$. The dimension of the vector representing the density of the object shape will be $N - 1$. From the centroid $(x_c, y_c)$ calculated by Equations 4-80 and 4-81, we count the number of image pixels in the rings around the centroid. The number of image pixels in each ring is given as the vector $x_{im} = (n_1, n_2, \ldots, n_m)$, where $m$ is the number of rings around the centroid.

From here we apply the Adaptive Kernel Density Feature Points Estimator (AKDFPE). We use a second-order KDE. The AKDE uses the modified Loftsgaarden-Quesenberry $k$th nearest-neighbour kernel of the form in Equation 4-83:

$$\hat{f}(x) = \frac{1}{nh_{k_c}(x)}\sum_{i=1}^{n}K\!\left(\frac{x_i - x}{h_{k_c}(x)}\right). \tag{4-83}$$

The number of nearest neighbours $k_c$ controls the level of smoothing of cluster $c$, with $i = 1, 2, 3, \ldots, n$. $K(\cdot)$ is the kernel function, $n$ is the number of rings and $h_{k_c}$ is the bandwidth per cluster. We calculate the optimal bandwidth $h_{oc_j}$ per cluster. Then we recalculate the vector elements of the image using Equation 4-84:

$$\hat{f}(x_i) = \frac{1}{h_{oc_j}}\sum^{m}K\!\left(\frac{x_i - x}{h_{oc_j}}\right), \tag{4-84}$$

where $j = 1, 2, 3, \ldots, m$.

4.8.1 PROPOSED CALCULATION OF THE OPTIMAL BANDWIDTH

The first practical problem with the Kernel Density Estimator is deciding when to use a global bandwidth and when to use a variable bandwidth.
The next problem is how to find a suitable $k$th nearest neighbourhood from which to calculate the optimal bandwidth. The size $k$ of the nearest neighbourhood controls the level of smoothing: when $k$ is equal to the number of sample elements the global bandwidth is calculated, otherwise a variable bandwidth is calculated. Once $k$ is known it is easy to calculate the optimal bandwidth. The question is how to find $k$ for a given set of sample elements.

To address this problem, we take an image whose centroid has been calculated, and denote the density feature at the centroid as

$$D_m = D_c = 1. \tag{4-85}$$

This is the first density feature of the image and is equal to one, since the centroid is a single pixel that belongs to the image. The rest of the density feature points of the image are given as

$$D_m = \{n_1, n_2, n_3, n_4, \ldots, n_m\}, \tag{4-86}$$

where $m = 1, 2, 3, \ldots, n$ indicates the number of the ring counted from the centroid. Within a given ring the image $I$ occupies a certain percentage $O(I)\%$ of the ring area. These percentages indicate whether the image occupies the ring sparsely or densely. We calculate this percentage as given in Equation 4-87:

$$O(I)\% = \frac{n_m \times 100}{2^{m+2}}. \tag{4-87}$$

The system then clusters all the consecutive rings that fulfil the same predefined condition, for example

$$0 \le O(I)\% < 25, \qquad 25 \le O(I)\% < 50, \qquad 50 \le O(I)\% < 75, \qquad 75 \le O(I)\% \le 100.$$

To find the $k$th nearest neighbourhood, the system counts the elements in each cluster; this count constitutes the $k_c$th nearest neighbourhood of that cluster. There are special cases to consider. If all rings fall in the same cluster, or if every cluster consists of one element, the system calculates the global bandwidth. When a cluster with one element lies between two clusters, it is included in the neighbouring cluster that is nearer to it in terms of cluster values. In effect the method calculates a global bandwidth within each cluster, which takes care of both sparse and dense observations.
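The centroid and clustering steps described so far can be sketched as follows. This is an illustrative sketch, not the system's implementation: the function names are hypothetical, the ring capacity in Equation 4-87 is assumed to be $2^{m+2}$ (which reproduces the percentages quoted for the ring counts (7, 8, 1) in the example of Section 4.8.3), the band boundaries are assumed to be closed on the left, and the rule for merging a single-ring cluster into a neighbouring cluster is left out.

```python
def centroid(pixels):
    """Centroid (x_c, y_c) from the moments m00, m10, m01 of a binary
    silhouette given as a list of (x, y) foreground coordinates."""
    m00 = len(pixels)                     # zero-order moment: area
    m10 = sum(x for x, _ in pixels)       # first-order moment in x
    m01 = sum(y for _, y in pixels)       # first-order moment in y
    return m10 / m00, m01 / m00

def occupancy_percent(ring_counts):
    """O(I)% = n_m * 100 / 2**(m + 2) for rings m = 1, 2, ... (assumed form)."""
    return [n_m * 100 / 2 ** (m + 2)
            for m, n_m in enumerate(ring_counts, start=1)]

def band(percent):
    """Occupancy band index: 0 for [0, 25), 1 for [25, 50), 2 for [50, 75),
    3 for 75 and above."""
    return min(int(percent // 25), 3)

def cluster_rings(ring_counts):
    """Group consecutive rings whose occupancy falls in the same band."""
    clusters = []
    previous_band = None
    for m, pct in enumerate(occupancy_percent(ring_counts), start=1):
        current_band = band(pct)
        if current_band == previous_band:
            clusters[-1].append(m)        # extend the running cluster
        else:
            clusters.append([m])          # start a new cluster
        previous_band = current_band
    return clusters

def use_global_bandwidth(clusters):
    """Special cases: a single cluster, or every cluster holding one ring."""
    return len(clusters) == 1 or all(len(c) == 1 for c in clusters)

clusters = cluster_rings([7, 8, 1])       # ring counts from Section 4.8.3
```

For the ring counts (7, 8, 1) every ring lands in its own band, so `use_global_bandwidth(clusters)` is true and a single bandwidth would be calculated for the whole vector. The size of each cluster returned by `cluster_rings` plays the role of $k_c$.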
Figure 4-3 shows an image with a calculated centroid $c$ and the rings around the centroid numbered $1, 2, 3, 4, 5, \ldots, n$.

FIGURE 4-3: The rings around the centroid of an image

Suppose the system clustered the rings as follows: $c_1 = \{1, 2, 3\}$, $c_2 = \{4, 5\}$ and so on. Thereafter the system calculates the bandwidth for each of the clusters as follows:

$$h_{oc_1} = 1.3643\left(\frac{R(k)}{m_2^2(k)}\right)^{1/5}3^{-1/5}\,s$$

$$h_{oc_2} = 1.3643\left(\frac{R(k)}{m_2^2(k)}\right)^{1/5}2^{-1/5}\,s$$

$$\vdots$$

$$h_{oc_n} = 1.3643\left(\frac{R(k)}{m_2^2(k)}\right)^{1/5}x^{-1/5}\,s$$

where $x \ge 1$ is the number of elements in cluster $n$ and $s$ is the sample standard deviation of the cluster in question, as in Equation 4-57.

4.8.2 AKDFPE ALGORITHM STEPS

1. Read the image
2. Digitize the image
3. Find the centroid $(x_c, y_c)$
4. Count the image pixels in each circle, one pixel wide, around the centroid
5. Calculate the percentage of image pixels in each circle
6. Cluster adjacent circles of the same percentage band
7. Standardize the initial vector of the image
8. Find the optimum bandwidth for every cluster
9. Apply the kernel density estimator to every cluster
10. The resultant vector is the image representation
11. End

4.8.3 EXAMPLE

Suppose we have the following object shape features on a grid, as given in Figure 4-4.

1,0 2,0 3,0 4,0 1,1 2,1 3,1 4,1 0,2 1,2 2,2 3,2 4,2 3,3 4,3 2,4 3,4 4,4

FIGURE 4-4: Segmented object shape

The bold numbers are the image pixels. The size of the grid occupied by the object shape is $5 \times 5$. The different colours indicate the rings of width one pixel around the centroid (3, 2). The ring counts are

$$x_i^3 = (7, 8, 1).$$

This vector is represented in the standardized way as

$$x_1^3 = (28, 16, 1)$$

and the percentages are

$$\%s = (88, 50, 3).$$

This means that the rings belong to three different clusters. In this case we calculate the global bandwidth. From here we apply the Adaptive Kernel Density Estimator (AKDE). We are using a second-order AKDE.
The AKDE is given as

$$\hat{f}_{h(x)}(x) = \frac{1}{3}\sum_{i=1}^{3}K_{h(x)}(x - x_i) = \frac{1}{3h(x)}\sum_{i=1}^{3}K\!\left(\frac{x - x_i}{h(x)}\right). \tag{4-88}$$

We then calculate the optimal bandwidth $h_{oc}$ for each cluster in the image shape vector. Then we recalculate the vector elements of the image using the univariate balloon estimator with the modified Loftsgaarden-Quesenberry $k$th nearest neighbourhood given in Equation 4-89:

$$\hat{f}_B(x) = \frac{1}{nh(x)}\sum_{i=1}^{n}K\!\left(\frac{x_i - x}{h(x)}\right). \tag{4-89}$$

The estimate $\hat{f}_B(x)$ is an average of identically scaled kernels centred at each data point. For the example vector this gives

$$\hat{f}(x) = \frac{1}{h_{oc_1}}\sum_{1}^{3}K\!\left(\frac{x_1^3 - x}{h_{oc_1}}\right), \tag{4-90}$$

where $K(\cdot)$ is the kernel function. This is how the images will be represented.

4.9 SIMILARITY MATCHING

We will experiment with the (dis)similarity methods in Equations 4-91, 4-92 and 4-93. The methods in Equations 4-91 and 4-92 are metrics, which makes the retrieval system efficient if used with a metric-modelled database. The method in Equation 4-93 is a non-metric (dis)similarity measure. We will compare the effectiveness of these (dis)similarity methods.

The Euclidean and the city-block dissimilarity measures are given as

$$d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} = \|x - y\|_2 \tag{4-91}$$

$$d(x, y) = \sum_{i=1}^{n}|x_i - y_i| = \|x - y\|_1 \tag{4-92}$$

respectively. The city-block measure takes fewer operations than the Euclidean dissimilarity. Both of them are metric distances. The cosine similarity is given as

$$d(x, y) = \frac{\sum_{i=1}^{n}x_i\,y_i}{\sqrt{\sum_{i=1}^{n}x_i^2}\;\sqrt{\sum_{i=1}^{n}y_i^2}}. \tag{4-93}$$

The numerator of Equation 4-93 is a dot product. To compare the effectiveness of the (dis)similarity methods we use the retrieval effectiveness of the system when the different methods are used. The system ranks the results and the ranking is evaluated subjectively. Cosine similarity is a non-metric measure because it does not fulfil the reflexivity property of the metric axioms, stated in a).
a) $d(x, y) = 0$ if and only if $x = y$ (reflexivity).

4.10 EVALUATION

Visual evaluation of the system will be done using the Precision-Recall Curve (PRC). This will be complemented by scalar evaluation of the system. The effectiveness of the retrieval system will be measured by precision (the number of correct images retrieved divided by the total number of images retrieved) and recall (the number of correct images retrieved divided by the total number of possible correct images):

$$\text{precision} = \frac{A}{A + C} \tag{4-94}$$

$$\text{recall} = \frac{A}{N} \tag{4-95}$$

$$\text{effectiveness} = \begin{cases}\dfrac{A}{N} & \text{if } T \ge N\\[2ex] \dfrac{A}{T} & \text{if } T < N\end{cases} \tag{4-96}$$

where $A$ is the number of relevant image objects retrieved, $C$ is the number of non-relevant image objects retrieved, $T$ is the number of relevant images that the user requires from the database and $N$ is the total number of relevant images in the database.

We also measure the retrieval rate by the bull's eye score. The bull's eye score, as a percentage, is the number of correct retrievals divided by the number of relevant items in the dataset. Every shape in the database is compared to every other shape in the database. For example, for the MPEG 7 dataset, where we have 70 distinct classes of 20 similar shapes, the bull's eye percentage is calculated as

$$B = \frac{D \times 100}{P} \tag{4-97}$$

where $B$ is the bull's eye score in percent, $D$ is the total number of correct retrievals and $P$ is the total number of possible correct retrievals. We will also compare our method with other representation methods to demonstrate its robustness.

4.11 DATASETS

MPEG7-CE shapes and general shopping item images will be used. Because the MPEG 7 shapes are already segmented, the segmentation technique does not influence the output of the retrieval system, which makes the evaluation of the system objective. Our system must also work in the real world, where different types of noise are introduced and have to be dealt with; hence the need to use the general shopping item images as well.
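The (dis)similarity measures of Section 4.9 and the evaluation measures of Section 4.10 are simple enough to sketch directly. The function names below are illustrative, the vectors in the usage note are hypothetical, and the case split in `effectiveness` assumes that $A/N$ applies when $T \ge N$ and $A/T$ otherwise.

```python
import math

def euclidean(x, y):
    """Euclidean dissimilarity (Equation 4-91)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """City-block dissimilarity (Equation 4-92)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine similarity (Equation 4-93): dot product over the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def precision(a, c):
    """A relevant and C non-relevant images retrieved (Equation 4-94)."""
    return a / (a + c)

def recall(a, n):
    """A relevant images retrieved out of N relevant in the database (4-95)."""
    return a / n

def effectiveness(a, t, n):
    """Equation 4-96; T is the number of relevant images the user requires."""
    return a / n if t >= n else a / t

def bulls_eye(d, p):
    """Bull's eye score in percent (Equation 4-97)."""
    return d * 100 / p
```

As a usage note, `cosine_similarity((1, 2), (2, 4))` is 1 up to rounding although the two vectors differ, which illustrates why the cosine measure does not satisfy the reflexivity axiom, whereas `euclidean` and `city_block` return zero only for identical vectors.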
4.11.1 MPEG 7

The MPEG 7 contour shape dataset CE-1 contains over 3400 images divided into three parts. The objective of each part is as follows:

- Part A: robustness to scaling and rotation (sets A1 and A2)
- Part B: performance of similarity-based retrieval
- Part C: robustness to changes caused by non-rigid motion

Part A is a necessary condition for any shape descriptor. Sets A1 and A2 together consist of eight hundred and forty (840) shapes organized into seventy (70) groups. The two sets have an equal number of shapes, with six (6) similar shapes in each group, and are used to test scale and rotation invariance. The MPEG 7 Part B database contains one thousand four hundred (1400) binary shape images. It consists of seventy (70) distinct classes of shapes, each class containing twenty (20) similar shapes. Set B is used to test the overall robustness of the shape representation through similarity-based retrieval. Set C contains one thousand five hundred (1500) shapes and is used to test robustness to non-rigid deformations.

The MPEG 7 region shape database CE-2 consists of 3621 binary image shapes, mainly trademarks. The database is used to test performance on complex shapes consisting of multiple disjoint regions. The classified test set contains two thousand eight hundred (2800) trademark shapes: six hundred and seventy eight (678) object shapes are classified into ten (10) groups on the basis of perceived region shape similarity, with a variable number of shapes per group, and the remaining two thousand one hundred and twenty two (2122) trademarks are unclustered. The database is organized in almost the same way as the CE-1 database, and its shapes are likewise used to test scaling, rotation and the robustness of the shape representation through similarity-based retrieval, including subjective tests.

4.11.2 GENERAL SHOPPING ITEM IMAGES

General shopping item images will also be used to measure the overall effectiveness of the retrieval system in this domain.
The database of over four hundred (400) shapes will be created from images from the Internet. The database will be organized into twenty (20) groups with at least ten (10) similar shapes in each group. Some distinct items collected from the Internet to make up the general shopping item image dataset are shown in Figure 4-5.

FIGURE 4-5: Distinct images from the Internet

4.12 QUERY IMAGES

All images in the MPEG 7 databases and in the general shopping item shapes database are possible query images. In addition, images captured by camera-enabled devices will be used as query images to retrieve similar images from the general shopping item shapes database.

4.13 CHAPTER SUMMARY

The following algorithms are going to be experimented with:

- Pre-processing stage: histogram equalization, image filtering, resizing and morphological techniques
- Segmentation stage: Active Contour without Edges and Robust Image Segmentation using local median
- Representation stage: non-parametric method
- Similarity matching stage: Euclidean and cosine methods

These techniques were chosen due to their merits discussed in this chapter.

CHAPTER 5

5 EXPERIMENTATION, RESULTS AND DISCUSSION

This chapter describes the experiments that were conducted while building the retrieval system, testing the retrieval system and experimenting with the Image Content in Shopping Recommender System for Mobile Users. It also gives and discusses the results of the experiments. The ultimate purposes of the experiments were to:

- Measure the effectiveness of the retrieval system using the AKDFPE representation method
- Incorporate the retrieval system into the Image Content in Shopping Recommender System for Mobile Users
- Simulate the usage of the recommender system
- Measure the satisfaction of the users

5.1 EXPERIMENTS

The initial experiments are to ascertain the robustness of the retrieval system.
This entails making sure that all the stages perform at their optimum. Using general images, in this case the shopping items, requires all stages of the retrieval system to work appropriately in order to achieve a good retrieval rate. This means that the pre-processing and segmentation stages need to be tested and adjusted to produce acceptable results. To test these stages, test data is needed so that the stages can be calibrated to suit the image domain for automation or semi-automation of the system. A database was created of over four hundred (400) shopping items such as televisions, shoes, beds and so forth. Samples of some distinct image items are shown in Figure 5-1.

FIGURE 5-1: Samples of shopping items in each category in the dataset

The selection between the metric (Euclidean) and the non-metric (cosine) (dis)similarity algorithm is done using the Adaptive Kernel Density Feature Points Estimator (AKDFPE) algorithm, since it is the method being proposed for the Image Content for Shopping Items Recommender System for Mobile Users. After choosing the compatible (dis)similarity algorithm, the effectiveness of the AKDFPE representation algorithm is measured in comparison with other methods. AKDFPE is a region-based representation method, so in theory it is a domain-independent method (generic algorithm). To test the generic form of the algorithm, it is tested against contour-based and region-based algorithms. In order not to have to reconstruct all the methods compared with AKDFPE, standard datasets are used and the results of AKDFPE are compared with the results obtained by other authors. After ascertaining that the AKDFPE method is effective and performs better than the methods it is compared with, the retrieval system is incorporated into the recommender system. The recommender system as shown in Chapter 2 is then simulated.
The results of the recommender system are evaluated by a sample of people, followed by an analysis of the results and of the system.

5.2 PRE-PROCESSING, SEGMENTATION AND (DIS)SIMILARITY SELECTION

Image pre-processing suppresses undesirable distortions or enhances some image features in order to improve the quality of the image data for further processing. The selection of good pre-processing techniques is very significant in image processing. Segmentation is one of the image processing techniques that depend directly on the pre-processing stage. For effective retrieval, the (dis)similarity algorithm must be compatible with the representation method. The segmentation technique impacts directly on the effectiveness of the representation method. The contribution of the pre-processing, segmentation and (dis)similarity techniques to the system cannot be ignored. Experiments were therefore performed to ascertain that these stages contribute adequately to the system, and the following results were obtained.

5.2.1 RESULTS FOR PRE-PROCESSING AND SEGMENTATION STAGES

Figure 5-2b shows samples of the results of pre-processing and segmentation of the images in Figure 5-2a. Subjectively, these were judged to be acceptable pre-processing and segmentation results. The same settings were then used for all the images in the retrieval system.

FIGURE 5-2: (b) Sample results of pre-processing and segmentation of the images in (a)

5.2.2 RESULTS FOR SELECTION OF (DIS)SIMILARITY METHOD USING AKDFPE

At least one hundred shopping item images that belong to the database were used as query images. The retrieval effectiveness of the (dis)similarity methods was evaluated using recall and precision as well as subjective evaluation. Average precision was calculated only at one hundred percent recall. Ranking was also evaluated subjectively. Figures 5-3 and 5-4 show the normal retrieval results of the retrieval system as the users experience them. In these samples the query image is at the top left of each figure.
Figure 5-5 shows the results of the retrieval system as similar segmented images. From Figure 5-5 it is possible for the developers to evaluate the robustness of the representation and similarity methods used.

FIGURE 5-3: (Dis)similarity method cosine on the left and Euclidean on the right (AKDFPE)

FIGURE 5-4: (Dis)similarity method cosine on the left and Euclidean on the right (AKDFPE)

FIGURE 5-5: Segmented shapes that were considered similar by AKDFPE using the cosine similarity algorithm

5.2.3 RESULTS ANALYSIS OF PRE-PROCESSING, SEGMENTATION AND (DIS)SIMILARITY TECHNIQUES

The results shown in Figure 5-2 enabled the setting of the pre-processing and segmentation parameters for automation of these stages for general image shapes. The sample results in Figure 5-3 show eighty percent precision for the cosine method and seventy percent precision for the Euclidean method. The results in Figure 5-4 show ninety percent for the cosine method and eight percent for the Euclidean method. The overall results showed an average precision of 93.05 percent for the cosine similarity method compared to 92.60 percent for the Euclidean method. Subjectively, the cosine method was also judged superior to the Euclidean method in ranking the image results. This prompted the selection of the cosine similarity method for use with the AKDFPE representation method. For the segmented images in Figure 5-5 that were considered similar by the system (AKDFPE and cosine), subjective evaluation found that the system is working well: to human perception the images are similar to each other, but with some distortions which the system was able to overcome.

5.3 EFFECTIVENESS OF AKDFPE AND OTHER REPRESENTATION METHODS

With the building of the retrieval system complete, the system needs to be tested for its effectiveness and robustness against other retrieval systems.
The results of the system will be compared with the published results of other systems that were tested on standardized datasets. This stage will also verify the generic form of our system. Experiments will also be done on general shopping item images and the results compared with the DHFP retrieval system.

5.3.1 RESULTS FOR COMPARISON OF EFFECTIVENESS BETWEEN AKDFPE AND OTHER METHODS ON STANDARD DATASETS

The proposed system is now complete, with the pre-processing and segmentation stages calibrated for the shopping items domain and the cosine similarity algorithm selected as the optimum method for the AKDFPE representation algorithm. First, the AKDFPE method was tested for the necessary conditions, namely rotation, scaling and translation. After satisfactory results were obtained, the retrieval system was tested for effectiveness and robustness against other methods. The results of the experiments are in the figures below. The representation methods AKDFPE and DHFP were subjected to experiments with MPEG 7 datasets in order to ascertain their effectiveness and generic form. The benefit of using standardized datasets such as the MPEG 7 datasets is that methods can be compared without reconstructing other authors' methods, so authors can support claims about the robustness of their method(s) over others. The results of the experiments are shown in Table 5-1 and Figure 5-6.

TABLE 5-1: Comparison of Bull's Eye Performance on MPEG 7 CE 1 Dataset Part B

Method    BEP %
CSS       81.12 (Bai, Latecki & Tu, 2010)
IDSC      91.61 (Bai et al., 2010)
DHFP      92.18
KDFPE     92.70 (Zuva, Olugbara, Ojo & Ngwira, 2012)
AKDFPE    93.56

FIGURE 5-6: Average precision-recall on Region Based Test Image Retrieval on 678 object shapes (MPEG 7 CE 2)

5.3.2 RESULTS FOR COMPARISON OF EFFECTIVENESS BETWEEN AKDFPE AND DHFP ON SHOPPING ITEMS DATASET

Having obtained satisfactory results on standard datasets, we then moved to the domain of interest, the shopping items domain.
Figures 5-7 and 5-8 show the normal retrieval results of the retrieval system as the users experience them. Figure 5-9 shows the performance of the retrieval system using the recall-precision graph. This helps in evaluating the performance of the system in the domain of interest.

FIGURE 5-7: Ten retrieval results of AKDFPE on the left and DHFP on the right (query at the top left of the figure)

FIGURE 5-8: Ten retrieval results of AKDFPE on the left and DHFP on the right (query at the top left of the figure)

FIGURE 5-9: Average precision-recall chart on General Image Retrieval

5.3.3 RESULTS ANALYSIS FOR EFFECTIVENESS BETWEEN AKDFPE AND OTHER METHODS

Table 5-1 shows that AKDFPE has a better BEP score than all the other methods it was compared with. This experiment tested the performance of the representation methods on a contour based standardized dataset. Figure 5-6 indicates that AKDFPE also performed better than DHFP when tested on a region based standardized dataset. These results, which showed the robustness of the AKDFPE method over the others and its generic form, motivated its comparison with DHFP on the shopping items database. The results in Figures 5-7 and 5-8 show that AKDFPE performs better than the DHFP method. The BEP scores of 92.64% for AKDFPE and 90.87% for DHFP confirmed the superiority of the AKDFPE method. These satisfactory results enabled the AKDFPE retrieval system to be incorporated in the Image Content for Shopping Items Recommender System for Mobile Users.

5.4 IMAGE CONTENT FOR SHOPPING ITEMS RECOMMENDER SYSTEM FOR MOBILE USERS

The mobile users captured query images with a camera enabled cell phone. A 3-D object may have more than one 2-D image, as shown in Figure 5-10. Some of the 2-D images in Figure 5-10 are very difficult to use to identify the type of the object, as shown in Figure 5-11a. Figure 5-11b shows 2-D images from which the object type is easy to identify.
In this case, for the object in Figure 5-10, only the images in Figure 5-11b will be included in the dataset. That means each object in the dataset will have more than two 2-D images in the database if necessary. The shopping item images captured by the camera enabled device must be compatible with those already in the database; this made it possible to measure the performance of the retrieval system. The retrieval system was made aware of the images that belong to the same 3-D object, so that only one image of each object is returned during retrieval. Soft shopping items such as clothes (for example dresses and trousers) were photographed while on dolls. An example of a cell phone and its camera specifications (randomly chosen) is given in Table 5-2. Finally, we made the system operate as a recommender system in which some shopping items were placed on promotion and others on special offer. The recommender system was evaluated for its performance, usefulness and satisfaction by a sample of fifty (50) randomly chosen users. They rated the performance of the system by scoring their degree of satisfaction using the scores in Table 5-3. The system should retrieve images similar to the one queried by the user and, at the same time, present other shopping items on special offer to the user. The system also has dummy Global Positioning System (GPS) coordinates to enable the users to find the retail shop.
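The behaviour described above can be sketched as follows: the ranked retrieval results are collapsed to one result per 3-D object, items on special offer are appended, and the retailer nearest to the user is selected from stored GPS coordinates. This is a minimal sketch under stated assumptions, not the implemented system. The function names, the catalogue layout, the coordinates and the use of the haversine formula for the great-circle distance are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres between two GPS points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def recommend(ranked_images, image_to_item, catalogue, user_location, max_similar=3):
    """Combine retrieval results, special offers and the nearest retailer."""
    similar = []
    for image_id in ranked_images:            # best match first
        item = image_to_item[image_id]
        if item not in similar:               # one result per 3-D object
            similar.append(item)
        if len(similar) == max_similar:
            break
    # Items on special offer that are not already among the similar items.
    offers = [item for item, info in catalogue.items()
              if info["on_offer"] and item not in similar]
    nearest = None
    if similar:
        lat, lon = user_location
        nearest = min(catalogue[similar[0]]["retailers"],
                      key=lambda r: great_circle_km(lat, lon, r[1], r[2]))
    return {"similar": similar, "offers": offers, "nearest_retailer": nearest}


# Hypothetical data: several 2-D images may map to the same 3-D item.
image_to_item = {
    "kettle_front": "kettle", "kettle_side": "kettle",
    "toaster_front": "toaster", "iron_top": "iron", "mug_side": "mug",
}
shops = [("Shop A", -25.75, 28.19), ("Shop B", -26.20, 28.04)]
catalogue = {
    "kettle": {"on_offer": False, "retailers": shops},
    "toaster": {"on_offer": False, "retailers": shops},
    "iron": {"on_offer": True, "retailers": shops},
    "mug": {"on_offer": True, "retailers": shops},
    "blender": {"on_offer": True, "retailers": shops},
}
ranked_images = ["kettle_front", "kettle_side", "toaster_front", "iron_top", "mug_side"]

result = recommend(ranked_images, image_to_item, catalogue, user_location=(-25.73, 28.16))
print(result)
```

The cap of three similar items follows the choice reported in the results analysis of this chapter. Everything else, including how offers are ordered, is a design choice of the sketch.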
FIGURE 5-10: 2-D images of a 3-D shopping item

FIGURE 5-11: a) set of images difficult to identify b) set of images easy to identify

TABLE 5-2: 6220c cell phone and its camera specifications

TABLE 5-3: Scores to measure satisfaction with performance of the system

5.4.1 RESULTS FOR RETRIEVAL SYSTEM OF SHOPPING ITEMS FOR MOBILE USERS

FIGURE 5-12: Query image captured by a camera enabled mobile device

FIGURE 5-13: Ten retrieval results of AKDFPE

FIGURE 5-14: Average precision-recall on General Image Retrieval (Query captured by cell phone)

5.4.2 RESULTS FOR IMAGE CONTENT FOR SHOPPING ITEMS RECOMMENDER SYSTEM FOR MOBILE USERS

FIGURE 5-15: Query Image

FIGURE 5-16: Results from the Shopping Recommender System

FIGURE 5-17: Query Image

FIGURE 5-18: Results from the Shopping Recommender System

FIGURE 5-19: Query image captured by a camera enabled mobile device

FIGURE 5-20: Results from the Shopping Recommender System with GPS coordinates for Retailer

FIGURE 5-21: Query image captured by a camera enabled mobile device

FIGURE 5-22: Results from the Shopping Recommender System with GPS coordinates for Retailer

FIGURE 5-23: Evaluation of the recommender system

5.4.3 RESULTS ANALYSIS

Figure 5-13 shows a highly effective retrieval of the shopping item images. At 100% recall there is at least 60% average precision. This might be a consequence of representing each 3-D object by more than one 2-D image, as shown in Figure 5-10. The system was evaluated as shown in Figure 5-23. The high scores might be explained by the fact that image retrieval is still novel to the students and that the system's performance is very high. The group of students was also influenced by the incorporation of their preferences in the system. We can conclude that including personal preferences in the system has a positive effect on the user. The imitation of the shopping recommender system was accepted positively by the evaluators.
The results in Figures 5-18, 5-20 and 5-22 are very interesting; they show the practicality of incorporating image retrieval into recommender systems. Since at 10% recall the precision of the 3-D retrieval system is almost 100%, the recommender system was made to retrieve at most three images similar to the one queried by the user.

5.5 OVERALL RESULTS ANALYSIS

The proposed AKDFPE representation method has been extensively studied and evaluated in detail. It has been shown to fulfil the necessary conditions for image descriptors, namely rotation and scaling. Scaling is the most challenging, in the sense that scaling objects to a relatively small size may result in significant distortion of their shape. The AKDFPE method is a generic image representation method, which is why it was tested on contour based and region based test datasets (MPEG 7 databases). The AKDFPE method satisfies almost all of the requirements set by MPEG 7 for shape representation. The requirements are good retrieval accuracy, compact features, general application, low computation complexity, robust retrieval performance and hierarchical coarse to fine representation. The results obtained indicate that the AKDFPE method can deal with errors in segmentation and is robust to segmentation noise. Where BEP was calculated, every image was considered as a query image and every image contributed to the calculation of the performance measure. The recall-precision performance of the method was calculated using randomly selected images as query images. In all these performance measurements AKDFPE has a high retrieval performance and performs better than the compared methods. AKDFPE also has competitive retrieval performance on general shopping item shapes. It is important to note that AKDFPE does not represent images by absolute values of the features but by estimates. This makes it very effective in the retrieval of similar images.
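The BEP calculation mentioned above, in which every image serves as a query, can be sketched as follows. The sketch assumes the commonly used bull's eye definition: for each query, the top 2c ranked images are inspected, where c is the number of images in the query's class, and the hits from the same class are counted, with the query itself included. The labels, features and function names are hypothetical toy values, not the MPEG 7 data or the AKDFPE features.

```python
from collections import Counter


def bulls_eye_score(labels, distance):
    """Bull's eye performance over a labelled collection.

    labels   -- class label of each image, indexed 0..n-1
    distance -- function (i, j) -> dissimilarity between images i and j
    """
    n = len(labels)
    class_size = Counter(labels)
    hits = 0
    possible = 0
    for q in range(n):                       # every image is used as a query
        size = class_size[labels[q]]
        ranking = sorted(range(n), key=lambda j: distance(q, j))
        top = ranking[: 2 * size]            # inspect twice the class size
        hits += sum(1 for j in top if labels[j] == labels[q])
        possible += size                     # best case: whole class retrieved
    return hits / possible


# Hypothetical one-dimensional features for six images in three classes.
features = [0.0, 0.1, 1.0, 1.1, 2.0, 9.0]
labels = ["a", "a", "b", "b", "c", "c"]


def toy_distance(i, j):
    return abs(features[i] - features[j])


score = bulls_eye_score(labels, toy_distance)
print(round(100 * score, 2))
```

In the toy data one image of class "c" lies far from its class mate, so one query misses it within the inspected window and the score falls below 100%. Averaging over every image as a query is what makes the measure insensitive to the choice of query images.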
Incorporation of the retrieval system into the recommender system shows that it has a positive effect on the users. In this research only one aspect of the user's preferences was factored into the system; in full recommender systems almost all aspects of user preferences would be factored in, so that the user benefits more from the system. This type of recommender system also has scalability problems: as the number of images increases, its effectiveness is reduced. Such systems can also turn a browser into a buyer, in the sense that as soon as a user captures an image the recommender system is capable of making a recommendation. They therefore do not have the cold start problem.

CHAPTER 6

6 CONCLUSION, CONTRIBUTION AND FUTURE WORK

This chapter gives the conclusion, contribution and future work of the research work. In chapter one the goal of the research is stated as follows: “The goal of the research is to evolve image content representation algorithm for effectively matching sales item whose image content has been extracted by Active Contour without Edges in an Image Content in Shopping Recommender System for Mobile Users.” There is a need to evaluate whether the goal was achieved, to evaluate the contribution of the research work and then to discuss future work.

6.1 CONCLUSION

In this research, the Image Content in Shopping Recommender System for Mobile Users was the main interest. An effective and efficient recommender system for mobile users entails having an effective image retrieval system as a component of the recommender. The fundamental components of an image retrieval system are image pre-processing, image segmentation, image representation and image matching. In the endeavour to fulfil the objective of this research work, the literature was reviewed, a new method was created, and systems were built and evaluated.
The following section elaborates on the work done to fulfil the primary objective of having an Image Content in Shopping Recommender System for Mobile Users. Shape representation, segmentation and similarity methods have been reviewed. The purpose of the reviews and studies was to understand the problems and issues involved in these techniques, and to identify the open issues, advantages and disadvantages of the techniques used in retrieval systems. Scientific methodologies have been used in the studies. In this research, standard datasets (the MPEG 7 datasets), a general shopping items database and accepted performance measurement techniques have been used. Image segmentation in the shopping item domain requires a region based method. This is because some shopping items have smooth edges or no edges at all, which makes contour based methods less than ideal. Among the region based methods, those that use global statistics to model the regions to be segmented are the most suitable.

The study shows that region based representation techniques are the future of generic retrieval systems. Region based techniques are more accurate and robust than contour based techniques. A new region based image representation method, the AKDFPE, was developed. The AKDFPE shows that images are best represented by estimates rather than by exact quantities of shape features. The system is capable of visualizing dense and sparse areas in order to decide how to calculate the optimal bandwidth for the shapes. The method is robust to noise because of its estimation characteristics. The performance measurements show that the method outperformed the contour based and region based methods considered. This method is a generic image representation method. The (dis)similarity algorithm has to be chosen from the metric or non-metric classes. In doing this one has to decide on the tradeoffs between the effectiveness and the efficiency of the system.
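The distinction between metric and non-metric (dis)similarity classes can be made concrete with a small check of the triangle inequality. The sketch below is illustrative only: the three 2-D points and the function names are assumptions, and "cosine distance" here means one minus the cosine similarity. The angular distance is included as one known way of turning the cosine measure into a metric.

```python
import math
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def cosine_distance(a, b):
    """One minus cosine similarity; simple, but not a metric."""
    return 1.0 - cosine_similarity(a, b)


def angular_distance(a, b):
    """Normalised angle between the vectors; this form is a metric."""
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard rounding errors
    return math.acos(cos) / math.pi


def violates_triangle_inequality(dist, points, tol=1e-12):
    """True if some triple has d(a, c) > d(a, b) + d(b, c)."""
    for a, b, c in permutations(points, 3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return True
    return False


# Hypothetical feature vectors at 0, 45 and 90 degrees.
points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(violates_triangle_inequality(euclidean, points))         # False
print(violates_triangle_inequality(cosine_distance, points))   # True
print(violates_triangle_inequality(angular_distance, points))  # False
```

A measure that violates the triangle inequality cannot be used directly with index structures that rely on it for pruning, which is one source of the effectiveness and efficiency tradeoff mentioned above.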
In the Image Content in Shopping Recommender System for Mobile Users, the accuracy of the recommendation is the most critical element of the system, although the efficiency of the system must also be acceptable. In this case the cosine similarity algorithm proved to be more effective than the Euclidean dissimilarity. It is also possible to support it using metric modelled databases.

The retrieval system was finally built and tested on 2-D image shapes. The challenge arises when a query image is captured by a camera enabled device, which translates 3-D objects into 2-D image shapes. To tackle this challenge, more than one 2-D image was used to represent each 3-D object. The retrieval system performed very well. The retrieval system was then incorporated into the recommender system, and the evaluators were satisfied with the system.

In conclusion the situational problem in chapter one is revisited: “Suppose Nyasha leaves home with a location and a camera enabled mobile device for shopping. Getting to a nearby shop, she finds an item similar to an item she really wants. Now she is faced with difficulty of either buying it now or continue doing window shopping with the hope of finding the real item she wants. The dilemma is if she does not buy now she might not get it later or if she does, she might get the one she wants, as she continues window shopping. Consequently, the problem is, with the aid of a camera enabled mobile device carried by Nyasha, how can she be helped to make the decision of buying this item or not with the realization that the shops have databases of shopping items online?”

Solution: Nyasha must log in to the Image Content in Recommender System for Mobile Users and then capture the image of the object. The camera enabled device then sends the image to the recommender system, and the system returns the GPS coordinates of the retailer nearest to Nyasha’s location that has the shopping item of interest.
It also recommends other shopping items that might interest Nyasha and that are on promotion or on special offer. The goal of the research was fulfilled.

6.2 SUMMARY OF CONTRIBUTIONS

The main contributions of this research are as follows:

Comprehensive reviews of segmentation, representation and similarity techniques were done, and their challenges, open issues, advantages and disadvantages were highlighted. Standard evaluations of these techniques were recommended for easy comparison. Research papers for international conferences were written.

A generic AKDFPE descriptor was proposed and comprehensively evaluated. The technique is suitable for generic shape description and retrieval. The technique outperformed contour and region based techniques by taking advantage of the way it estimates the feature distribution of the shape and calculates the optimal bandwidth. The automation of the calculation of the optimal bandwidth was novel. The AKDFPE satisfies most of the principles set by MPEG 7 for image retrieval. The technique allows its kernel function to be changed to suit a specific domain.

The proposed AKDFPE has been applied to the MPEG 7 datasets and to a database of general shopping item images.

The Image Content in Shopping Recommender System for Mobile Users is novel, and user satisfaction with the system was noted.

The proposed technique has been tested on a shopping item image database queried with item images captured by a camera enabled mobile device. Using images as input to the system minimized the use of text, which is still a challenge for mobile users because of the size of mobile devices. Our research has therefore contributed a solution to this problem.

The research contributed by showing the practicality of incorporating image retrieval into a mobile recommender system, which is a novel idea. Querying with images removes the ambiguities that would arise when querying the system with keywords.
6.3 FUTURE WORK

Content based image retrieval in recommender systems for mobile users is a very interesting area that is still under investigation. Reducing text usage in recommender systems for mobile users is still a challenge that needs to be addressed. Retrieval of images with heterogeneous backgrounds is also still a challenge. Objects are 3-D but their images are represented in 2-D. Representing a 3-D object requires many 2-D shape images, which makes it a difficult task. Optimizing the number of 2-D images required to adequately represent a 3-D object is also a challenge. Incorporating users or their tastes as part of the retrieval system is still an area of interest. The satisfaction of the user is also a paramount goal of recommender systems.

Segmentation is also an area where many challenges still exist. The ideal situation is fully automatic segmentation of generic images, but human intervention is still necessary. In large datasets of generic images segmentation becomes a daunting task. It would therefore be necessary to minimize human involvement in the segmentation of generic images.

The proposed image representation technique, AKDFPE, has high retrieval effectiveness, but its efficiency was not measured. Retrieval efficiency is undoubtedly a critical factor in image retrieval for mobile users, who have limited time. Further research is necessary to measure the efficiency of the representation technique. The technique shows interesting results when incorporated in the Image Content in Shopping Recommender System for Mobile Users. The greatest challenge is achieving 100% precision at 100% recall together with 100% user satisfaction; there is therefore still a need for further investigation in the area of image retrieval in mobile recommender systems.
Experiments using actual smart mobile devices on the market, such as smart phones (iPhone, BlackBerry, etc.) and other smart mobile devices (iPad, iPod, BlackBerry PlayBook, etc.), should be performed in future. This will make it possible to investigate how best the system (Image Content in Shopping Recommender System for Mobile Users) can be adapted to different smart mobile devices.

Extract: “A mature science is governed by a single paradigm. The paradigm sets the standards for legitimate work within the science it governs. By solving standard problems, performing standard experiments and eventually by doing a piece of research under a supervisor who is already a skilled practitioner within the paradigm, an aspiring scientist becomes acquainted with the methods, the techniques and the standards of that paradigm.” (Chalmers, 1999)

REFERENCES

ADAIR, J. B. & TURNBULL, M. 1974. A procedure for calculating great circle distances between geographic locations. Council for Advanced Transportation Studies, the University of Texas at Austin.
AIROUCHE, M., BENTABET, L. & ZELMAT, M. 2009. Image Segmentation Using Active Contour Model and Level Set Method Applied to Detect Oil Spills. Paper presented at the Proceedings of the World Congress on Engineering (WCE 2009), London, UK.
ANTANI, S., LEE, D. J., LONG, L. R. & THOMA, G. R. 2004. Evaluation of shape similarity measurement methods for spine X-ray images. J. Vis. Commun. Image R. (Elsevier), 15:285-302.
AYED, I. B. & MITICHE, A. 2008. A Region Merging Prior for Variational Level Set Image Segmentation. IEEE, 17(12):2301-2311.
BAEZA-YATES, R. & RIBEIRO-NETO, B. 1999. Modern Information Retrieval. New York: ACM Press.
BAI, X., LATECKI, L. J. & TU, Z. 2010. Learning Context-Sensitive Shape Similarity by Graph Transduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):861-874.
BIGDELI, E. 2008.
Comparing accuracy of cosine-based similarity and correlation-based similarity algorithms in tourism recommender systems. Paper presented at the 4th IEEE International Conference on Management of Innovation and Technology, 2008.
BOGERS, T. & BOSCH, A. V. D. 2009. Collaborative and Content-based Filtering for item Recommendation on Social Bookmarking Websites. Paper presented at the ACM RecSys '09 Workshop on Recommender Systems and the Social Web, New York, USA.
BOUCHERON, L. E., HARVEY, N. R. & MANJUNATH, B. S. 2007. A quantitative object-level metric for segmentation performance and its application to cell nuclei. Springer-Verlag 2007:208-219.
BOUTEMEDJE, S., ZIOU, D. & BOUGUILA, N. 2007. A graphical model for content-based image suggestion and feature selection. Springer-Verlag, Berlin Heidelberg.
BRADLEY, A. P. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145-1159.
BRODERSEN, K. H., ONG, C. S., STEPHAN, K. E. & BUHMANN, J. M. 2010. The binormal assumption on precision-recall curves. Paper presented at the International Conference on Pattern Recognition.
BURKE, R. 2002. Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4):331-370.
BUSTOS, B., KREFT, S. & SKOPAL, T. 2011. Adaptive metric indexes for searching in multi-metric spaces. In: Multimedia Tools and Applications. Springer.
CELEBI, E. & ASLANDOGAN, A. 2005a. A comparative study of three moment-based shape descriptors. Paper presented at the IEEE proceedings of the International Conference on Information Technology: Coding and Computing.
CELEBI, E. M. & ASLANDOGAN, A. Y. 2005b. A comparative Study of Three Moment-Based Shape Descriptors. Proceedings of the International Conference on Information Technology: Coding and Computing.
CHA, S.-H. 2007. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions.
International Journal of Mathematical Models and Methods in Applied Sciences, 1(4):300-307.
CHALMERS, A. F. 1999. What is this thing called Science? Third Edition ed. Buckingham: Open University Press.
CHAN, T. F. & VESE, L. A. 2001. Active Contours Without Edges. IEEE, 10(2):266-277.
CHEN, Z., JIANG, Y. & ZHAO, Y. 2010. A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation. International Journal of Digital Content Technology and its Applications, 4(9):106-113.
CHERIET, M., SAID, J. N. & SUEN, C. Y. 1998. A Recursive Thresholding Technique for Image Segmentation. IEEE, 7(6).
CHOI, Y. & RASMUSSEN, E. 2002. User's relevance criteria in image retrieval in American history. Information Processing and Management, 38(2002):695-726.
CLARKSON, K. L. 2005. Nearest-Neighbor Searching and Metric Space Dimensions. In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice. Cambridge, MA: MIT Press.
DAVIS, J. & GOADRICH, M. 2006. The Relationship Between Precision-Recall and ROC Curves. Paper presented at the Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.
DAWOUD, A. & KAMEL, M. S. 2004. Iterative Multimodel Subimage Binarization for Handwritten Character Segmentation. IEEE, 13(9):1223-1230.
DEB, S. 2008. Overview of image segmentation techniques and searching for future directions of research in content-based image retrieval.
DRUMMOND, C. & HOLTE, R. C. 2000. Explicitly Representing Expected Cost: An Alternative to ROC Representation. Paper presented at the Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
FERRI, C., HERNANDEZ-ORALLO, J. & SALIDO, M. A. 2003. Volume Under the ROC surface for Multi-class Problems. Exact Computation and Evaluation of Approximations. Paper presented at the Proc. of 14th European Conference on Machine Learning.
FLUSSER, J., SUK, T. & ZITOVA, B. 2009.
Moments and moment invariants in pattern recognition. West Sussex: John Wiley & Sons Ltd.
FREIXENET, J., MUNOZ, X., RABA, D., MARTI, J. & CUFI, X. 2002. Yet Another Survey on Image Segmentation: Region and Boundary Information Integration. Springer:408-422.
GABBOUJ, M., AHMAD, I., AMIN, M. Y. & KIRANYAZ, S. 2005. Content based Image Retrieval for Connected Mobile Devices. Paper presented at the Image Rochester, New York.
GE, Y., XIONG, H., TUZHILIN, A. & XIAO, K. 2010. An Energy-Efficient Mobile Recommender System. Paper presented at the Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, New York.
GEMMIS, M. D., IAQUINTA, L., LOPS, P., MUSTO, C., NARDUCCI, F. & SEMERARO, G. 2009. Preference Learning in Recommender Systems. Paper presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2009), Bled, Slovenia.
GHAZANFAR, M. A. & PRUGEL-BENNETT, A. 2010. An Improved Switching Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering. Paper presented at the Proceedings of the International MultiConference of Engineers and Computer Science (IMECS), Hong Kong.
GHAZANFAR, M. A. & PRUGEL-BENNETT, A. 2011. Fulfilling the Needs of Gray-Sheep Users in Recommender Systems, A Clustering Solution. Paper presented at the 2011 International Conference on Information Systems and Computational Intelligence, Harbin, China.
GULDOGAN, O. & GABBOUJ, M. 2005. Content-Based Image Indexing and Retrieval framework on Symbian Based Mobile Platform. Paper presented at the European Signal Processing Conference, EUSIPCO 2005.
GUNAWARDANA, A. & MEEK, C. 2009. A Unified Approach to Building Hybrid Recommender Systems. Paper presented at the Proceedings of the 2009 ACM Conference on Recommender Systems, New York.
HEIJDEN, H. V. D., KOTSIS, G. & KRONSTEINER, R. 2005. Mobile recommendation systems for decision making 'on the go'.
Paper presented at the Proceeding of the International Conference on Mobile Business.
HOSHINO, R., COUGHTREY, D., SIVARAJA, S., VOLNYANSKY, I., AUER, S. & TRICHTCHENKO, A. 2009. Applications and extensions of cost curves to marine container inspection. Annals OR, 187(1):159-183.
HU, S., HOFFMAN, E. A. & REINHARDT, J. M. 2001. Automatic Lung Segmentation for Accurate Quantitation of Volumetric X-Ray CT Images. IEEE, 20(6):490-498.
HUANG, C.-L. & HUANG, W.-L. 2009. Handling sequential pattern decay: Developing a two-stage collaborative recommender system. Electronic Commerce Research and Applications, 8(2009):117-129.
HUANG, H. & JIANG, J. 2009. Laplacian Operator Based Level Set Segmentation Algorithm for Medical Images. Paper presented at the IEEE: Second International Congress on Image and Signal Processing (CISP), Tianjin.
JARVELIN, K. & KEKALAINEN, J. 2000. IR evaluation methods for retrieving highly relevant documents. Paper presented at the Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York NY.
KEKRE, H. B. & GHARGE, S. M. 2010. Image Segmentation using Extended Edge Operator for Mammographic Images. International Journal of Computer Science and Engineering (IJCSE), 2(4):1086-1091.
KIRBAS, C. & QUEK, F. K. H. 2003. Vessel Extraction Techniques and Algorithms: A Survey. Paper presented at the Proceedings of the third IEEE Symposium on BioInformatics and BioEngineering (BIBE'03).
LAKSHMI, S. & SANKARANARAYANAN, V. 2010. A study of Edge Detection Techniques for Segmentation Computing Approaches. International Journal of Computer Application (IJCA), Special Issue on CASCT, 1:35-41.
LANDGREBE, T. C. W., PACLIK, P. & DUIN, R. P. W. 2006. Precision-recall operating characteristic (P-ROC) curves in imprecise environments. Paper presented at the 18th International Conference on Pattern Recognition (ICPR'06), Washington, DC.
LANKTON, S. & TANNENBAUM, A. 2008.
Localizing Region-Based Active Contours. IEEE Transactions on Image Processing, 17(11):2029-2039.
LATECKI, L. J., LAKAMPER, R. & ECKHARDT, U. 2000. Shape descriptors for non-rigid shapes with a single closed contour. Paper presented at the IEEE Conference proceedings on Computer Vision and Pattern Recognition.
LECCE, V. D. & GUERRIERO, A. 1999. An Evaluation of the Effectiveness of Image Features for Image Retrieval. Visual Communication and Image Representation, 10:351-362.
LI, Y. & GUAN, L. 2006. An effective shape descriptor for the retrieval of natural image collections. Paper presented at the Proceedings of the IEEE CCECE/CCGEI, Ottawa.
LIU, J. 2006. Robust Image Segmentation using Local Median. Paper presented at the Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, Canada.
LU, D. & WENG, Q. 2007. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28(5):860-870.
LUCCHESE, L. & MITRA, S. K. 2001. Colour image segmentation: A state-of-the-art survey. 207-221.
MALYSZKO, D. & WIERZCHON, S. T. 2007. Standard and Genetic k-means Clustering Techniques in Image Segmentation. Paper presented at the Sixth International Conference on Computer Information Systems and Industrial Management Applications (CISIM'07), Minneapolis, MN.
MANDL, T. 2008. Recent Developments in the Evaluation of Information Retrieval System: Moving Towards Diversity and Practical Relevance. Informatica, 32(2008):27-38.
MANNING, C. D., RAGHAVAN, P. & SCHUTZE, H. 2008. Introduction to Information Retrieval. Cambridge University Press.
MAOFU, L., YANXIANG, H. & BIN, Y. 2007. Image Zernike Moments Shape Feature Evaluation Based on Image Reconstruction. Geo-spatial Information Science, 10(3):191-195.
MELVILLE, P. & SINDHWANI, V. 2010. Recommender Systems. In: VERLAG, S. (Ed.). Encyclopedia of Machine Learning (1-9). Berlin: Springer.
MILJKOVIC, O. 2009. Image Pre-Processing.
Kragujevac J Math, 32(2009):97-107.
MIN, J., POWELL, M. & BOWYER, K. W. 2004. Automated performance evaluation of range image segmentation algorithms. IEEE, 34(1):263-271.
MINGQIANG, Y., KIDIYO, K. & JOSEPH, R. 2008. A survey of shape feature extraction techniques. Pattern Recognition:43-90.
MUKUNDAN, R. & RAMAKRISHNAN, K. R. 1998. Moment functions in image analysis: theory and applications. Singapore: World Scientific Publishing Co. Pte. Ltd.
MULLER, H., MICHOUX, N., BANDON, D. & GEISSBUHLER, A. 2004. A review of content-based image retrieval systems in medical applications-clinical benefits and future directions. International Journal of Medical Informatics, 73(1):1-23.
OLUGBARA, O. O., OJO, S. O. & MPHAHLELE, M. I. 2010. Exploiting Image Content in Location-Based Shopping Recommender Systems for Mobile Users. International Journal of Information Technology & Decision Making, 9(5):759-778.
PAZZANI, M. J. & BILLSUS, D. 2007. Content-based Recommendation Systems. Paper presented at The Adaptive Web, Methods and Strategies of Web Personalization.
PETRAKIS, E. G. M. & FALOUTSOS, C. 1997. Similarity Searching in Medical Image Databases. IEEE Transactions on Knowledge and Data Engineering, 9(3).
POLAK, M., ZHANG, H. & PI, M. 2009. An evaluation metric for image segmentation of multiple objects. Image and Vision Computing, 27(8):1223-1227.
RASMUSSEN, E. 2002. Evaluation in Information Retrieval. Paper presented at the 3rd International Conference on Music Information Retrieval, Paris, France.
REKIK, A., ZRIBI, M., HAMIDA, A. B. & BENJELLOUN, M. 2009. An Optimal Unsupervised Satellite image Segmentation Approach Based on Pearson System and k-Means Clustering Algorithm Initialization. International Journal of Signal Processing, 5(1).
RICCI, F. 2010. Mobile Recommender Systems. IT & Tourism, 12(3):205-231.
RICCI, F. & NGUYEN, Q. N. 2006. Acquiring and revising preferences in a critique-based mobile recommender system. IEEE Intelligent System, 22(3):22-29.
RUI, Y.
& HUANG, T. S. 1999. Image Retrieval: Current Techniques, Promising Directions, and Open Issues. Journal of Visual Communication and Image Representation, 10:39-62. SAMMA, A. S. B. & SALAM, R. A. 2009. Adaptation of K-mean Algorithm for Image Segmentation. International Journal of Information and Communication Engineering, 5(4):58-62. SARWAR, B. M., KARYPIS, G., KONSTAN, J. & RIEDL, J. 2002. Recommender Systems for Large-Scale E-Commerce: Scalable Neighborhood Formation Using Clustering. Paper presented at the In Proceedings of the Fifth International Conference on Computer and Information Technology, Dhaka, Bangladesh. SCHAFER, J. B., FRANKOWSKI, D., HERLOCKER, J. & SEN, S. 2007. Collaborative Filtering Recommender Systems. In: SPRINGER-VERLAG (Ed.). The Adaptive web (291-324). Berlin, Heidelberg. SCHAFER, J. B., KONSTAN, J. & RIEDL, J. 1999. Recommender Systems in eCommerce. Paper presented at the In '99: Proceedings of the 1st ACM Conference on Electronic Commerce New York. 149 SEZGIN, M. & SANKUR, B. 2004. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165. SHARMA, N. & AGGARWAL, L. M. 2010. Automated Medical Image Segmentation Techniques. Journal of Medical Physics, 35(1):3-14. SHENG, C. & XIN, Y. 2005. Shape-based retrieval using shape matrix. International Journal of signal processing:163-166. SHIMAZAKI, H. & SHINOMOTO, S. 2007. A method for selecting the bin size of a time hostogram Neural Computation, 19(6):1503-1527. SIMONOFF, J. S. 1996. Smoothing Methods in Statistics. Springer Series in Statistics. In: SPRINGER (Ed.). SKOPAL, T. 2010, September Where are you heading, metric access methods?: a provocative survey. Paper presented at the SISAP '10: Proceeding of the Third International Conference on Similarity Search and Application. SKOPAL, T. & BUSTOS, B. 2010. On Nonmetric Similarity Search Problems in Complex Domains. ACM Journal Name, V:1-56. STEJIC, Z., TAKAMA, Y. 
& HIROTA, K. 2003. Genetic algorthm-based relevance feedback for image retrieval using local similarity patterns. Information Processing and Management, 39(1):1-23. SU, X. & KHOSHGOFTAAR, T. M. 2009. A Survey of Collaborative Filtering Techniques. Advances in Artificial Intelligence, 2009(2009):1-19. TANG, J. 2010. A Color Image Segmentation Algorithm Based on Region Growing. Paper presented at the Second International Conference on Computer Engineering and Technology, Chengdu, China. 150 TERRELL, G. R. & SCOTT, D. W. 1992. Variable Kernel Density Estimation. The Annals of Statistics, 20(3):1236-1265. TRAN, D. C. & ONO, K. 2000. Content-based image retrieval: Object representation by the Density of feature Points.213-218. UDUPA, J. K., LEBLANC, V. R., ZHUGE, Y., IMIELINSKA, C., SCHMIDT, H., CURRIE, L. M., et al. 2006. A framework for evaluating image segmentation algorithms. Computerized Medical Imaging and Graphics, 30( ):75-87. VARSHNEY, S. S., RAJPAL, N. & PURWAR, R. 2009. Comparative Study of Image Segmentation Techniques and Object Matching using Segmentation. Paper presented at the Methods and Models in Computer Science. VASUDA, P. & SATHEESH, S. 2010. Improved Fuzzy C-means Algorithm for MR Brain Image Segmentation. International Journal on Computer Science and Engineering, 2(5):1713-1715. WALTER, S. D. 2002. Properties of the Summary Receiver Operating Characteristic (SROC) curve for diagnostic test data. Statistics in Medicine, 21(9):1237-1256. WANG, L., HE, L., MISHRA, A. & LI, C. 2009. Active contours driven by local Gaussian distribution fitting energy. Signal Processing. WANG, Y., GUO, Q. & ZHU, Y. 2007. Medical image segmentation based on deformable models and its applications Springer:209-260. YANG, W.-S., CHENG, H.-C. & DIA, J.-B. 2008. A location-aware recommender system for mobile shopping environments. Expert Systems with Applications, 34(2008):437-445. ZHANG, D. & LU, G. 2002. Generic Fourier Descriptor for Shape-based Image Retrieval. 
Paper presented at the IEEE Transactions on multimedia. 151 ZHANG, D. & LU, G. 2004. Review of shape representation and description techniques. Pattern Recognition Society, 37:1-19. ZHANG, H., FRITTS, J. E. & GOLDMAN, S. A. 2008. Image Segmentation Evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding, 10(2):260-280. ZHANG, J., LIN, Z., XIAO, B. & ZHANG, C. 2009. An Optimized Item-Based Collaborative Filtering Recommendation Algorithm. Paper presented at the IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC), Beijing. ZHANG, Y. J. 2001. A review of recent evaluation methods for image segmentation. Paper presented at the International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur. ZHENG, X., SHERRILL-MIX, S. A. & GAO, Q. 2007a. Perceptual shape-based natural image representation and retrieval Paper presented at the International Conference on Semantic Computing. ZHENG, X., SHERRILL-MIX, S. A. & GAO, Q. 2007b. Perceptual shape-based natural image representation and retrieval. Paper presented at the Proceedings of the IEEE International Conference on Semantic Computing. ZHOU, B. & YAO, Y. 2010. Evaluation information retrieval system performance based on user preference. Journal of Intelligent Information Systems, 34(3):227-248. ZUVA, T., OLUGBARA, O. O., OJO, S. O. & NGWIRA, S. M. 2012. Kernel Density Feature Points Estimator for Content-based Image Retrieval. Signal & Image Processing: An International Journal (SIPIJ), 4(1):103-111. 152