A Novel Approach for Filtering Junk Images from Google Search Results
Yuli Gao¹, Jianping Fan¹,*, Hangzai Luo¹, and Shin’ichi Satoh²

¹ Department of Computer Science, UNC-Charlotte, USA
{jfan,ygao,hluo}@uncc.edu
² National Institute of Informatics, Tokyo, Japan
[email protected]
Abstract. Keyword-based image search engines such as Google Images are now very popular for accessing the large amounts of images on the Internet. Because only the text information that is directly or indirectly linked to the images is used for image indexing and retrieval, most existing image search engines such as Google Images may return large numbers of junk images which are irrelevant to the given queries. To filter out the junk images from Google Images, we have developed a kernel-based image clustering technique to partition the images returned by Google Images into multiple visually similar clusters. In addition, users are allowed to provide feedback for updating the underlying kernels, so as to achieve a more accurate characterization of the diverse visual similarities between the images. To help users assess the goodness of the image kernels and the relevance of the returned images, a novel framework is developed to achieve a more intuitive visualization of large numbers of returned images according to their visual similarity. Experiments with diverse queries on Google Images have shown that our proposed algorithm can filter out the junk images effectively. An online demo is also released for public evaluation at: http://www.cs.uncc.edu/~jfan/google-demo/.

Keywords: Junk image filtering, similarity-preserving image projection.
1 Introduction
As online image sharing and personal journalism become more and more popular, there is an urgent need to develop more effective image search engines, so that users can successfully retrieve the large amounts of images on the Internet. Text-based image search engines such as Google Images have achieved great success in exploiting text information to index and retrieve large-scale online image collections. Even though Google has the most powerful text search engine in the world, Google Images is still unsatisfactory because of the relatively low precision rate of the top-ranked images [6-10]. One of the major reasons for this phenomenon is that Google simplifies the image search problem to a purely text-based search problem, under the assumption that the image semantics are directly related to the text terms extracted from the associated documents. Unfortunately, such an oversimplified online image indexing approach completely ignores that the linkages between the image semantics and the text terms (that can be extracted from the associated text documents) may not be one-to-one correspondences; they could be one-to-many, many-to-one, or many-to-many relationships, or there may even be no exact correspondence between the image semantics and the associated text terms. This is the major reason why Google Images may return large numbers of junk images which are irrelevant to the given keyword-based queries. In addition, many real-world settings, such as photo-sharing websites, may only be able to provide biased and noisy text tags, which may further mislead text-based image search engines such as Google Images. Therefore, there is an urgent need to develop new algorithms to support junk image filtering from Google Images [6-10].

* Corresponding author.

S. Satoh, F. Nack, and M. Etoh (Eds.): MMM 2008, LNCS 4903, pp. 1–12, 2007.
© Springer-Verlag Berlin Heidelberg 2007
With the increasing computational power of modern computers, it is possible to incorporate image analysis algorithms into text-based image search engines such as Google Images without degrading their response speed significantly. Recent advances in computer vision and multimedia computing also allow us to take advantage of the rich visual information (embedded in the images) for image semantics interpretation. Some pioneering works have been proposed to improve Google Images [6-10].
By integrating multi-modal information (visual similarity, associated text, and users’ feedback) for image semantics interpretation, we have developed a novel framework to filter out the junk images from Google Images. Our scheme takes the following major steps for junk image filtering: (a) a Google Images search is first performed to obtain the large amounts of images returned for a given text-based query; (b) our feature extraction algorithm is then performed to extract both the global and local visual features for image similarity characterization; (c) the diverse visual similarities between the images are characterized jointly by using multiple kernels, and the returned images are partitioned into multiple clusters according to their kernel-based visual similarities; (d) a hyperbolic visualization algorithm is developed to achieve a more understandable assessment of the relevance between the returned images and the users’ real query intentions; (e) if necessary, users can be involved to select a few relevant images or junk images, and such user feedback is automatically transformed to update the kernels for image similarity characterization; (f) the updated kernel is further used to create a new presentation of the returned images adaptively according to the users’ personal preferences.
2 Image Content Representation
The visual properties of the images are very important for users to assess the relevance between the images returned by keyword-based queries and their real query intentions [1-2]. Unfortunately, Google Images fully ignores such important characteristics of the images. In this paper, we have developed a new framework to seamlessly integrate keyword-based image search with traditional content-based image search. To avoid the pitfalls of image segmentation tools, image segmentation is not performed for feature extraction. To characterize the diverse visual properties of images efficiently and effectively, both global visual features and local visual features are extracted for image similarity characterization. The global visual features, such as the color histogram, provide global image statistics and the general perceptual properties of entire images [11]. On the other hand, the local visual features, obtained via the wavelet image transformation,
can characterize the most significant information of the underlying image structures
effectively [12].
To filter out the junk images from Google Images, the basic question is how to define suitable similarity functions to accurately characterize the diverse visual similarities between the images returned by the keyword-based queries. Recently, the use of kernel functions for data similarity characterization has come to play an important role in the statistical learning framework, where the kernel functions must satisfy some mathematical requirements and can possibly capture some domain knowledge.
In this paper, we have proposed two basic image descriptors to characterize various visual and geometrical properties of images [11-12]: (a) global color histogram; (b)
texture histogram via wavelet filter bank. The diverse visual similarities between the images can be characterized more effectively and efficiently by using a linear combination
of their basic image kernels (i.e., mixture-of-kernels):
    K̂(x, y) = Σ_{i=1}^{κ} α_i K_i(x, y),    Σ_{i=1}^{κ} α_i = 1        (1)
where α_i ≥ 0 is the importance factor of the ith basic image kernel K_i(x, y) for image similarity characterization. The rule for combining multiple kernels (i.e., the selection of the values of the importance factors α) depends on two key issues: (a) the relative importance of the various visual features for image similarity characterization; (b) the users’ preferences. In this paper, we have developed an iterative algorithm to determine the values of the importance factors by seamlessly integrating both the importance of the visual features and the users’ preferences.
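The mixture-of-kernels of Eq. (1) can be sketched as follows. The RBF basic kernel, the descriptor dimensions, and the α values are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) basic kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mixture_kernel(feature_sets, alphas, gamma=1.0):
    """Eq. (1): K_hat = sum_i alpha_i * K_i with alpha_i >= 0, sum = 1.

    feature_sets: list of (N, d_i) arrays, one per basic descriptor
    (e.g. a color histogram and a wavelet texture histogram).
    """
    alphas = np.asarray(alphas, dtype=float)
    assert np.all(alphas >= 0) and np.isclose(alphas.sum(), 1.0)
    return sum(a * rbf_kernel(F, F, gamma)
               for a, F in zip(alphas, feature_sets))

# Toy example: 5 images described by two hypothetical descriptors.
rng = np.random.default_rng(0)
color = rng.random((5, 16))    # stand-in for a global color histogram
texture = rng.random((5, 8))   # stand-in for a wavelet texture histogram
K = mixture_kernel([color, texture], alphas=[0.6, 0.4])
```

Because each basic RBF kernel has a unit diagonal and the α values sum to one, the mixture K̂ is again a valid kernel with a unit diagonal.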
3 Kernel-Based Image Clustering
The images returned by the same keyword-based search are automatically partitioned into multiple clusters according to their kernel-based visual similarities. Our kernel-based image clustering algorithm is able to capture the most significant global distribution structures of the returned images. By using multiple kernels for diverse image similarity characterization, our kernel-based image clustering algorithm is able to handle high-dimensional visual features effectively.
The optimal partition of the returned images is obtained by minimizing the trace of the within-cluster scatter matrix S_w^φ. The scatter matrix is given by:

    S_w^φ = (1/N) Σ_{l=1}^{τ} Σ_{i=1}^{N} β_{li} [ K̂(x_i, x_i) − (1/N_l) Σ_{j=1}^{N} β_{lj} K̂(x_i, x_j) ]        (2)
where K̂(·,·) is the mixture kernel function, N is the number of returned images, τ is the number of clusters, and N_l is the number of images in the lth cluster. Searching for the optimal values of the elements β_{li} that minimize the trace can be done effectively by an iterative procedure. One major problem for kernel-based image clustering is that it may require huge memory space to store the kernel matrix when large amounts of images come into view. Some pioneering work has been done on
reducing the memory cost, such as chunking, Sequential Minimal Optimization (SMO), SVMlight, and Mixture of Experts. One common shortcoming of these decomposition-based approaches is that global optimization is not performed.
Rather than following these decomposition-based approaches, we have developed a new algorithm for reducing the memory cost by seamlessly integrating parallel computing with global decision optimization. Our new algorithm takes the following key steps: (a) Users are allowed to define the maximum number of returned images which they want to see and assess, which reduces the memory cost significantly. In addition, the returned images are partitioned into multiple smaller subsets. (b) Our kernel-based image clustering algorithm is then performed on all these smaller image subsets to obtain a within-subset partition of the images according to their diverse visual similarities. (c) The support vectors for each image subset are validated against the other image subsets by testing the Karush-Kuhn-Tucker (KKT) conditions. The support vectors which violate the KKT conditions are integrated to update the decision boundaries of the corresponding image subset incrementally. This process is repeated until the global optimum is reached and an optimal partition of the large amounts of images under the same image topic is obtained accurately.
Our kernel-based image clustering algorithm has the following advantages: (1) It can seamlessly integrate multiple kernels to characterize the diverse visual similarities between the images more accurately. Thus it can provide good insight into large amounts of images by determining their global distribution structures (i.e., the image clusters and their distributions) accurately, and such global image distribution structures can further be integrated to achieve more effective image visualization for query result assessment. (2) Only the most representative images (the support vectors) are stored and validated against the other image subsets, so it requires far less memory space. The redundant images (the non-support vectors) are eliminated early, which significantly accelerates kernel-based image clustering. (3) Because the support vectors for each subset are validated by the other subsets, our algorithm can handle outliers and noise effectively and can generate more robust clustering results.
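A minimal single-machine sketch of the clustering step is given below: kernel k-means minimizing the trace of the within-cluster scatter of Eq. (2), operating purely on the kernel matrix. The parallel subset partitioning and KKT-based support-vector validation described above are omitted here:

```python
import numpy as np

def kernel_kmeans(K, tau, n_iter=50, seed=0):
    """Kernel k-means: minimizes the trace of the within-cluster scatter
    in the implicit feature space, using only the kernel matrix K."""
    N = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, tau, size=N)
    diag = np.diag(K)
    for _ in range(n_iter):
        # squared feature-space distance of point i to centroid of cluster l:
        # ||phi(x_i) - m_l||^2 = K_ii - (2/N_l) sum_{j in l} K_ij
        #                        + (1/N_l^2) sum_{j,j' in l} K_jj'
        dist = np.empty((N, tau))
        for l in range(tau):
            idx = labels == l
            Nl = max(idx.sum(), 1)
            dist[:, l] = (diag
                          - 2.0 * K[:, idx].sum(1) / Nl
                          + K[np.ix_(idx, idx)].sum() / Nl ** 2)
        new = dist.argmin(1)
        if np.array_equal(new, labels):
            break  # converged: assignments no longer change
        labels = new
    return labels
```

With a well-separated block-structured kernel (e.g. an RBF kernel over two distant groups of images), the assignments converge to the block structure in a few iterations.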
4 Kernel Selection for Similarity-Preserving Image Projection
When the majority of the images returned by Google Images are relevant to the given keyword-based query, there should be an intrinsic clustering structure within the corresponding kernel matrix, i.e., the kernel matrix should take the form of a perturbed block-diagonal matrix, where each block corresponds to one particular visual category, and the other entries of the kernel matrix (which correspond to outliers or wrong returns) are close to zero.
Based on this understanding, it seems reasonable to apply some meaningful “clusterness” measurement, such as the sum of squared within-cluster distances, to estimate the relative importance of the various basic image kernels, and such a clusterness measurement could further be used as the criterion for kernel selection. However, this naive approach may actually yield a faulty decision for the following reasons: (a) The majority assumption may not hold true. In the study conducted by Fergus et al. [6-7], it is reported that the images returned by Google Images contain more “junk
images” than “good images” for more than half of the queries they studied. (b) A high “clusterness” measurement may not directly imply a good kernel matrix, i.e., the reverse statement about the clusterness of the kernel matrix is not true. A trivial kernel matrix, with ones in all its entries, always yields the best clusterness score for all the queries. However, such a trivial kernel matrix is certainly meaningless in revealing the true clustering structure. (c) Text-based search engines unavoidably suffer from the problem of semantic ambiguity. When users submit a query via a keyword, text-based image search engines such as Google Images may not know a priori which word sense corresponds to the user’s request. Therefore, even if the ideal kernel matrix were available, the text-based search engines could not possibly know which image clusters are most relevant to the users’ real needs.
Because the system may not know the real needs of the users (i.e., which image cluster is relevant or irrelevant to a given keyword-based query), it is very hard to define suitable criteria to evaluate the goodness of the kernel matrix and achieve automatic kernel selection for junk image filtering; i.e., without the users’ input, it is very hard if not impossible to identify which image clusters correspond to the junk images. One potential solution to these difficulties is to allow users to interactively provide additional information for junk image filtering. Obviously, such interaction should not place a huge burden on the users.
In order to capture the users’ feedback for junk image filtering, it is very important to enable similarity-based visualization of the large amounts of images returned by Google Images, so that users can quickly judge the relevance of an image to their real query intentions. It is well known that the diverse visual similarities between the images can be characterized more effectively and efficiently by using different types of visual features and different types of kernels. Therefore, these basic image kernels may play different roles in characterizing the similarity of the images returned by Google Images, and the optimal kernel for image similarity characterization can be approximated more effectively by using a linear combination of these basic image kernels with different importance factors. Obviously, such an optimal combination of these basic image kernels for image similarity characterization also depends on the users’ preferences.
To allow users to assess the relevance between the returned images and their real query intentions, it is very important to achieve similarity-based visualization of the large amounts of returned images by selecting an optimal combination of the basic image kernels. Instead of trying to find an optimal combination of these basic image kernels at the beginning, we have developed an iterative approach that starts from a single most suitable basic image kernel for generating the image clusters and creating the hyperbolic visualization of the returned images; the users’ feedback is then integrated to obtain the most accurate combination of the basic image kernels iteratively.
We adopt a semi-supervised paradigm for kernel combination and selection, where the most suitable basic image kernel is first used to generate the visually similar image clusters and create the similarity-based visualization of the returned images. The users are then allowed to choose a couple of relevant/junk images. Such user feedback is then transformed and integrated for updating the underlying image kernels incrementally, re-clustering the returned images, and creating a new presentation and
visualization of the returned images. Through such an iterative procedure, the most suitable image kernels can be selected and combined to effectively characterize the diverse image similarities and filter out the junk images from Google Images.
To select the most suitable image kernel to start this iterative procedure, the S measure is used. A given basic image kernel K can be turned into a distance matrix D, where the distance D(x, y) between two images with the visual features x and y is given by:

    D(x, y) = ||φ(x) − φ(y)|| = √( K̂(x, x) + K̂(y, y) − 2K̂(x, y) )        (3)
where we use φ(x) to denote the implicit feature-space representation of the image with the visual features x. We then rank all of these basic image kernels by their S scores, defined as:

    S = ( Σ_{i=1}^{m} Σ_{j=i+1}^{m} D(x_i, x_j) − Σ_{i=1}^{m} Σ_{j=1}^{n} D(x_i, y_j) ) / median(D)        (4)

where median(D) gives the median of all the pairwise distances among the image samples, and {x_i | i = 1, ···, m} and {y_j | j = 1, ···, n} (m + n ≥ 2)
are the image pairs. Intuitively, the S measure favors the basic image kernels which yield higher similarity between the relevant image pairs and lower similarity between the irrelevant image pairs. The smaller the S score, the better the characterization of the image similarity. Therefore, the basic image kernel with the lowest S score is first selected as the initial kernel to achieve an initial partition (clustering) of the large amounts of images returned by Google Images and to create an initial hyperbolic visualization of the returned images according to their kernel-based visual similarity, so that the users can easily assess the relevance between the returned images and their query intentions. In addition, the users can provide their feedback interactively according to their personal preferences.
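Eqs. (3) and (4) can be sketched as follows; the index lists for the two image sets are an assumed interface, standing in for the user-marked relevant and junk images:

```python
import numpy as np

def kernel_to_distance(K):
    """Eq. (3): D(x, y) = sqrt(K(x, x) + K(y, y) - 2 K(x, y))."""
    d = np.diag(K)
    D2 = d[:, None] + d[None, :] - 2.0 * K
    return np.sqrt(np.clip(D2, 0.0, None))  # clip guards tiny negatives

def s_score(K, rel_idx, junk_idx):
    """Eq. (4): lower S means the kernel keeps the relevant pairs close
    and pushes them away from the junk images."""
    D = kernel_to_distance(K)
    # first sum: unordered pairs within the relevant set (j > i)
    within = sum(D[i, j] for a, i in enumerate(rel_idx)
                 for j in rel_idx[a + 1:])
    # second sum: all relevant-vs-junk pairs
    across = D[np.ix_(rel_idx, junk_idx)].sum()
    return (within - across) / np.median(D)
```

For a kernel that places two relevant images close together and far from a junk image, `within` is small and `across` is large, so the score comes out negative, i.e., good.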
To preserve the similarity relationships between the returned images, the images returned by Google Images are projected onto a 2D hyperbolic coordinate system by using Kernel Principal Component Analysis (KPCA) according to the selected basic image kernel [13]. The kernel PCA is obtained by solving the eigenvalue equation:
    Kv = λMv        (5)

where λ = [λ_1, ···, λ_M] denotes the eigenvalues, v = [v_1, ···, v_M] denotes the corresponding complete set of eigenvectors, M is the number of returned images, and K is the kernel matrix.
The optimal KPCA-based image projection can be obtained by:
    min Σ_{i=1}^{M} Σ_{j=1}^{M} | K̂(x_i, x_j) − d(x_i, x_j) |²        (6)

    x_i = Σ_{l=1}^{M} α_l K̂(x, x_l),    x_j = Σ_{l=1}^{M} α_l K̂(x_l, x_j)
where K̂(x_i, x_j) is the original kernel-based similarity distance between the images with the visual features x_i and x_j, and d(x_i, x_j) is their location distance on the display unit disk obtained by using kernel PCA to achieve similarity-preserving image projection. Thus the visually similar images (i.e., images with smaller kernel-based similarity distances) can be visualized close together on the display unit disk. The suitable kernels for similarity-preserving image projection can be chosen automatically to make the most representative images from different clusters spatially distinct.
Our mixture-kernel function can characterize the diverse visual similarities between the images more accurately than the weighted distance functions used in multidimensional scaling (MDS); thus our KPCA-based projection framework can achieve better similarity-based image visualization than the MDS-based projection approaches. Therefore, the KPCA-based image projection algorithm can preserve the similarity relationships between the images effectively.
5 Hyperbolic Image Visualization for Hypothesis Assessment
After such a similarity-based image projection is obtained by using KPCA, the Poincaré disk model [15] is used to map the returned images from their feature space (i.e., the images represented by their visual features) onto a 2D display coordinate system. The Poincaré disk model maps the entire Euclidean space into an open unit circle, and produces a non-uniform mapping of the Euclidean distance to the hyperbolic space.
Formally, let ρ be the hyperbolic distance and r the Euclidean distance of a given image A to the center of the unit circle; the relationship between their derivatives is described by:

    dρ = 2 / (1 − r²) · dr        (7)
Intuitively, this projection makes a unit Euclidean distance correspond to a longer hyperbolic distance as it approaches the rim of the unit circle. In other words, if the images are of fixed size, they appear larger when they are closer to the origin of the unit circle and smaller when they are farther away. This property makes the mapping very suitable for visualizing large amounts of images, because the non-uniform distance mapping emphasizes the images which are in the current focus while de-emphasizing those images that are farther from the focus point.
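Integrating Eq. (7) gives ρ = 2 artanh(r), so a layout point at hyperbolic radius ρ lands at disk radius r = tanh(ρ/2). A sketch of this mapping (the `scale` parameter is an assumed layout knob, not from the paper):

```python
import numpy as np

def to_poincare(coords, scale=1.0):
    """Map 2D layout points into the open unit (Poincare) disk.

    From Eq. (7), dρ = 2 dr / (1 - r²), integration gives ρ = 2 artanh(r),
    hence r = tanh(ρ/2): points near the focus keep their spacing while
    far-away points are compressed toward the rim."""
    rho = np.linalg.norm(coords, axis=1) * scale
    r = np.tanh(rho / 2.0)  # stays inside the unit disk
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.maximum(norms, 1e-12)  # direction (origin stays put)
    return unit * r[:, None]
```

The mapping is monotone in the radius, so the relative ordering of distances from the focus is preserved while everything fits inside the display disk.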
In practice, it is often difficult to achieve an optimal kernel at the first guess. Therefore, it is desirable to allow users to provide feedbacks to the system, e.g., how closely
the current image layouts correspond to their real needs. On the other hand, it is also
very important to guarantee that the system can capture such users’ feedbacks effectively and transform them for updating the underlying kernel matrix and creating new
presentation and visualization of large amount of returned images.
In this paper, we have explored the usage of pair-wise constraints which can be obtained from users’ feedbacks automatically. In order to incorporate the users’ feedbacks
for improving kernel-based image clustering and projection, we have proposed an iterative algorithm that can directly translate the constraints (derived from the relevant and
junk images given by the users) into the kernel transformation of input space (feature
space) to generate more accurate kernels for image clustering and projection.
Fig. 1. Our online system for filtering junk images from Google Images, where the keyword “red
flower” is used for Google image search and most junk images are projected on the left side
One naive method is to generalize the vector-based kernel by introducing a weight w = (w_1, w_2, ..., w_N) on each feature dimension of the input vector space, i.e., φ(x) = (w_1 x_1, w_2 x_2, ..., w_N x_N). Suppose we encode the pair-wise constraints between two feature vectors x, y in a constraint matrix C, where C(x, y) = 1 for must-link image pairs (relevant image pairs), −1 for cannot-link image pairs (junk image pairs), and 0 for non-constrained image pairs (image pairs which are not selected by the users); the weight w_i can then be updated as:

    w̃_i = w_i · e^{−γ |x_i − y_i| · C(x, y)}        (8)
where γ is a learning rate specifiable by the users. This reweighting process corresponds to a dimension-wise rescaling of the input space such that the must-link image pairs become close (in terms of the norm distance) to each other, and the cannot-link image pairs move far apart. The resulting weight w also has an intuitive interpretation: the dimensions associated with large weights are more discriminative. For example, when the feature vectors are represented as color histograms, a large weight for a certain dimension (color bin) means that the proportion of the image area associated with that quantized color plays a more important role in characterizing the image similarity. If we have m constraints to satisfy, the original input space can be transformed by a sequence of localized functions f^1, f^2, ···, f^m, and the final transformation of the input space is given by φ(x) = f^m(f^{m−1}(··· f^2(f^1(x)) ···)).
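The update of Eq. (8) can be sketched as follows; the toy vectors are illustrative, and the exponent uses |x_i − y_i| per dimension as reconstructed above:

```python
import numpy as np

def update_weights(w, x, y, c, gamma=0.1):
    """Eq. (8): w_i' = w_i * exp(-gamma * |x_i - y_i| * c), where
    c = +1 for a must-link pair, -1 for cannot-link, 0 otherwise.
    Must-link pairs shrink the dimensions where the pair differs;
    cannot-link pairs expand them."""
    return w * np.exp(-gamma * np.abs(x - y) * c)

w = np.ones(3)
x = np.array([0.2, 0.9, 0.5])
y = np.array([0.8, 0.1, 0.5])
w_ml = update_weights(w, x, y, c=+1)  # must-link: weights shrink
w_cl = update_weights(w, x, y, c=-1)  # cannot-link: weights grow
```

Note that a dimension where the pair agrees (here the third one, with |x_i − y_i| = 0) is left untouched, which matches the intuition that only the discriminating dimensions are rescaled.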
However, the major limitation of this simple dimension-wise rescaling algorithm is that the scale factor along the full range of the respective dimension is uniform. If the must-link constraints and the cannot-link constraints place the same rescaling demand on a dimension, the two rescalings would cancel each other out.
Fig. 2. Our online system for filtering junk images from Google Images, where the keyword
“sunset” is used for Google image search and most junk images are projected on the right-bottom
corner
To address this conflict, we have introduced two operators: shrinkage and expansion, whose rescaling effects are limited to a local neighborhood. In this work, we use
a piecewise linear function to achieve localized expansion and shrinkage. Obviously,
other localized functions may also be applicable.
As indicated above, the transformation is now of the form f^k(x) = (f_1(x_1), f_2(x_2), ···, f_N(x_N)), where f_i(x_i), i = 1, ···, N, are non-linear functions with localized transformations, and x = (x_1, ···, x_N) is the N-dimensional feature vector. Given a pair of vectors u, v, the ith component of the transformation is updated as:
    f_i(x_i) = { x_i,                          if x_i < u_i
               { a · (x_i − u_i) + u_i,        if x_i ∈ [u_i, v_i]        (9)
               { x_i + (a − 1) · (v_i − u_i),  if x_i > v_i
where v_i > u_i, and a is a constant that satisfies a > 1 for the expansion operation and 0 < a < 1 for the shrinkage operation. We set a = 1/γ for the must-link constraints and a = γ for the cannot-link constraints, where γ > 1 reflects the learning rate. This constrained rescaling is used in the hyperbolic visualization, which is iteratively rescaled until the best kernel for junk image filtering is obtained according to the users’ personal preferences.
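The piecewise-linear operator of Eq. (9) is small enough to write out directly; this scalar sketch applies it to one dimension:

```python
def local_rescale(x, u, v, a):
    """Eq. (9): rescale values only inside [u, v]. Values below u are
    untouched; values above v are shifted by (a - 1)(v - u) so the map
    stays continuous. a > 1 expands the interval (cannot-link),
    0 < a < 1 shrinks it (must-link)."""
    assert v > u and a > 0
    if x < u:
        return x
    if x <= v:
        return a * (x - u) + u
    return x + (a - 1.0) * (v - u)
```

At x = v both upper branches give a·(v − u) + u, so the function is continuous, and outside [u, v] it is a pure identity or shift, which is what keeps the rescaling local.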
Although this rescaling is done piecewise-linearly in the input space, it can be a non-linear mapping in the feature space if non-linear kernels such as the RBF kernel are used. It can be proved that the new kernel satisfies Mercer’s conditions, because K̃(x, y) = K(φ(x), φ(y)), where φ(x): R^N → R^N.
Through such an iterative kernel updating algorithm, an optimal kernel matrix is obtained by seamlessly integrating both the visual consistency between the relevant images and the constraints derived from the user feedback, and our kernel-based image clustering algorithm is then performed to partition the returned images into multiple visual categories. The image cluster that is selected as relevant is returned to the user as the final result. The images in this cluster are then ranked in ascending order according to their kernel-based similarity distances from the images selected by the users.
6 System Evaluation
For a given text-based image query, our system can automatically generate a 2D hyperbolic visualization of the returned images according to their diverse kernel-based visual similarities. In Figs. 1, 2 and 3, the junk image filtering results for several keyword-based Google searches are given. From these experimental results, one can observe that our proposed system can filter out the junk images effectively. In addition, users are allowed to provide the must-link and cannot-link constraints by clicking on the relevant images and the junk images. The constraints given by the users are automatically incorporated to update the underlying image kernels, generate a new clustering, and create a new presentation and visualization of the returned images, as shown in Fig. 4. One can observe that most junk images are filtered out after the first round of feedback. In order to invite more people to participate in evaluating our junk image filtering system, we have released our system at: http://www.cs.uncc.edu/~jfan/google-demo/.
To evaluate the effectiveness of our proposed algorithms for kernel selection and updating, the accuracy of the underlying image clustering kernel is calculated for each user-system interaction. Given the confusion matrix C for image clustering, the accuracy is defined as:
Fig. 3. Our online system for filtering junk images from Google Images, where the keyword “blue
sky” is used for Google image search and most junk images are projected on the left side
Fig. 4. The filtering results for the keyword-based search “red flower”, where the images whose boundaries are in red are the ones selected as relevant by the users.
[Fig. 5 plot: clustering accuracy (y-axis, 0.5–0.95) versus number of relevance feedback iterations (x-axis, 1–5)]
Fig. 5. Clustering accuracy as a function of the number of feedbacks provided by users. The solid
line represents the average clustering accuracy while the error bar shows the standard deviation
over all 500 queries.
    Accuracy = Σ_{i=1}^{c} C(i, i) / Σ_{i=1}^{c} Σ_{j=1}^{c} C(i, j)        (10)
where c = 2 is the number of clusters (i.e., the relevant versus the junk cluster). As shown in Fig. 5, the performance of our kernel-based image clustering algorithm generally increases with the number of constraints provided by the users, but it becomes stable after 4 iterations. On average, our kernel-based image clustering algorithm can achieve over 75% accuracy after filtering the junk images from Google Images. Compared to the original 58% average accuracy of Google Images, our proposed junk image filtering algorithm achieves a significant improvement.
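Eq. (10) reduces to the trace of the confusion matrix over its total; the counts in the example below are illustrative, not the paper's data:

```python
import numpy as np

def clustering_accuracy(C):
    """Eq. (10): fraction of images on the diagonal of the clustering
    confusion matrix C (here c = 2: relevant vs. junk)."""
    C = np.asarray(C, dtype=float)
    return np.trace(C) / C.sum()

# e.g. 40 relevant images kept and 35 junk images filtered,
# out of 100 returned images
acc = clustering_accuracy([[40, 10], [15, 35]])  # -> 0.75
```

Here 75 of the 100 images are correctly assigned, matching the ~75% post-filtering accuracy reported above.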
7 Conclusions
In this paper, we have presented an interactive kernel learning algorithm to filter out the junk images returned by Google Images or a similar image search engine. The interaction between the users and the system can be done quickly and effectively through a hyperbolic visualization tool based on the Poincaré disk model. Supplied with user-given constraints, our kernel learning algorithm can incrementally update the underlying hypotheses (the margin between the relevant images and the junk images) to approximate the underlying image relevance more effectively and efficiently, and the returned images are then partitioned into multiple visual categories according to the learned kernel matrix automatically. We have tested our kernel learning algorithm and the relevance feedback mechanism on a variety of queries submitted to Google Images. Experiments have shown good results regarding the effectiveness of this system. This work shows how a straightforward interactive visualization tool, coupled tightly with image clustering methods and designed carefully so that the complex image clustering results are presented to the user in an understandable manner, can greatly improve and generalize the quality of image filtering.
References
1. Fan, J., Gao, Y., Luo, H.: Multi-level annotation of natural scenes using dominant image
compounds and semantic concepts. ACM Multimedia (2004)
2. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. on PAMI (2000)
3. He, X., Ma, W.-Y., King, O., Li, M., Zhang, H.J.: Learning and inferring a semantic space
from user’s relevance feedback. ACM Multimedia (2002)
4. Tong, S., Chang, E.Y.: Support vector machine active learning for image retrieval. ACM
Multimedia, 107–118 (2001)
5. Rui, Y., Huang, T.S., Ortega, M., Mehrotra, S.: Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Trans. on CSVT 8(5), 644–655 (1998)
6. Fergus, R., Perona, P., Zisserman, A.: A Visual Category Filter for Google Images. In: Pajdla,
T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, Springer, Heidelberg (2004)
7. Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google’s
image search. In: IEEE CVPR (2006)
8. Cai, D., He, X., Li, Z., Ma, W.-Y., Wen, J.-R.: Hierarchical clustering of WWW image search
results using visual, textual, and link information. ACM Multimedia (2004)
9. Wang, X.-J., Ma, W.-Y., Xue, G.-R., Li, X.: Multi-modal similarity propagation and its application for web image retrieval. ACM Multimedia (2004)
10. Gao, B., Liu, T.-Y., Qin, T., Zhang, X., Cheng, Q.-S., Ma, W.-Y.: Web image clustering by
consistent utilization of visual features and surrounding texts. ACM Multimedia (2005)
11. Ma, W.-Y., Manjunath, B.S.: Texture features and learning similarity. IEEE CVPR, 425–430
(1996)
12. Fan, J., Gao, Y., Luo, H., Satoh, S.: New approach for hierarchical classifier training and
multi-level image annotation, MMM, Kyoto (2008)
13. Scholkopf, B., Smola, A.J., Muller, K.-R.: Kernel principal component analysis. Neural
Computation 10(5), 1299–1319 (1998)
14. Vendrig, J., Worring, M., Smeulders, A.W.M.: Filter image browsing: Interactive image retrieval by using database overviews. Multimedia Tools and Applications 15, 83–103 (2001)
15. Fan, J., Gao, Y., Luo, H.: Hierarchical classification for automatic image annotation. ACM
SIGIR (2007)