BASICS OF MACHINE LEARNING
Alexey Melnikov, Institute for Theoretical Physics, University of Innsbruck
Quantum computing, control, and learning. March 16, 2016

OUTLINE
๏ Supervised Learning
๏ Unsupervised Learning
๏ Reinforcement Learning

MACHINE LEARNING. PREVIOUS INTRODUCTORY TALK
[Figure: an intelligent agent interacts with an environment, receiving percepts and performing actions.]

MACHINE LEARNING
output = f(θ, input)
The agent (machine learning model) is defined as a function f that maps inputs to outputs; θ are the agent's parameters. The goal of learning is to modify the agent's parameters such that the agent produces desired outputs. Depending on the structure of the input, several types of learning can be defined.

1. SUPERVISED LEARNING (SL)
A teacher provides the SL agent with training data {x_i, y_i}, i = 1...M; the agent makes the prediction y′ = f(θ, x).
Teacher provides training data:
๏ M data points
‣ Input object x (number, vector, photo, ...)
‣ Correct output y
Learning model constructs:
๏ A map f from x to y′, such that y_i′ ≈ y_i for all i

2. UNSUPERVISED LEARNING (USL)
A teacher provides the USL agent with training data {x_i}, i = 1...M; the agent makes the prediction y′ = f(θ, x).
Teacher provides training data:
๏ M data points
‣ Input object x (number, vector, photo, ...); no correct outputs are given
Learning model constructs:
๏ A map f from x to y′

3. REINFORCEMENT LEARNING (RL)
The environment provides the RL agent with a percept (s, r); the agent performs the action a = f(θ, s).
Environment provides:
‣ Input state s (number, vector, photo, ...)
‣ Reward r (number)
Learning model constructs:
๏ A map f from s to a
The goal of an RL agent is to maximize the total reward.

SUPERVISED LEARNING
(Recap of the SL setting above: training data {x_i, y_i}, i = 1...M, prediction y′ = f(θ, x).)

Applications
‣ Speech recognition
‣ Optical character recognition
‣ Face recognition
‣ Spam detection
‣ Netflix suggestions
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

SL algorithms
‣ k-nearest neighbors (a minimal sketch follows below)
‣ Support Vector Machines (3. “SVM” talk; 8. “Quantum SVM” talk)
‣ Artificial Neural Networks (4. “ANN” talk)
‣ Learning Classifier Systems (5. “LCS and boosting” talk)
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.
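As a minimal illustration of the first algorithm in this list, the following Python sketch classifies a point by a majority vote among its k nearest training points (the toy data, k = 3, and the Euclidean distance are arbitrary choices made for the example, not taken from the talk):

```python
import numpy as np

def knn_predict(x, train_x, train_y, k=3):
    """Classify x by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_x - x, axis=1)    # distances to all M stored points
    nearest = np.argsort(distances)[:k]                # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # the class that appears most times

# Toy 2D training set: M = 6 labelled points of classes 1 and 2
train_x = np.array([[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
                    [0.8, 0.9], [0.9, 0.8], [0.7, 0.7]])
train_y = np.array([1, 1, 1, 2, 2, 2])

print(knn_predict(np.array([0.25, 0.2]), train_x, train_y))  # -> 1
print(knn_predict(np.array([0.85, 0.8]), train_x, train_y))  # -> 2
```

Note that all M training points have to be stored and searched for every prediction, which is exactly the space and computational complexity of k-NN discussed a few slides below.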
SUPERVISED LEARNING
Training: the teacher provides the training data {x_i, y_i}, i = 1...M, and the SL agent learns the model parameters θ.
Testing: a tester provides an input x, and the SL agent returns the prediction y′ = f(θ, x).
Two types of tasks
๏ Classification (output y is discrete)
๏ Regression (output y is continuous)

SUPERVISED LEARNING. CLASSIFICATION TASK
The teacher provides 10 data points {x, y}, e.g. {0.8, 1}, ..., {0.95, 1}, {0.5, 1}, {0.3, 2}. The classifier is
  y′ = f(θ, x) = 1 for x > θ, 2 for x ≤ θ,
and the agent has to learn the threshold θ. With θ = 0.4, for example, the data points {0.3, 2}, {0.5, 1}, {0.95, 1} are classified correctly.

SUPERVISED LEARNING. 1D CLASSIFICATION TASK
[Figure: the classifier f(x) separates the two classes along the x axis; points such as {0.3, 2}, {0.5, 1}, {0.95, 1} lie on either side of the threshold.]
  y′ = f(θ, x) = 1 for x > 0.4, 2 for x ≤ 0.4.
(A sketch of this threshold classifier, fitted by empirical risk minimisation, follows after the PAC theory slide below.)

SUPERVISED LEARNING. 2D CLASSIFICATION TASK
Input is a vector x = (x_1, x_2), output is a class {1, 2}. A linear classifier:
  y′ = f(θ, x) = 1 for x_2 > θ_1 x_1 + θ_2, 2 for x_2 ≤ θ_1 x_1 + θ_2.
Empirical risk (training error):
  R = (1/M) Σ_i L(y_i′, y_i),  L(y_i′, y_i) = 0 or 1.
In general the separating surface is a hyperplane: Support Vector Machines (next talk).

SUPERVISED LEARNING. 2D CLASSIFICATION TASK. MORE COMPLEX MODEL
Input is a vector x = (x_1, x_2), output is a class. A more complex classifier:
  f(θ, x) = 1 for x_2 > θ_1 sin(θ_2 x_1) + θ_3, 2 for x_2 ≤ θ_1 sin(θ_2 x_1) + θ_3.
This model classified all training samples correctly (empirical risk R = 0), but is more complex. Complex models lead to overfitting error.

SUPERVISED LEARNING. ERRORS VS. MODEL COMPLEXITY
[Figure: test error as a function of model complexity, decomposed into a training error that decreases with complexity and an overfitting error that grows with complexity.]
Tradeoff between training error and overfitting error.

SUPERVISED LEARNING. VAPNIK–CHERVONENKIS (VC) THEORY
SVM (next talk), ANN (the talk after next).
  P( test error ≤ training error + √[ (h (log(2M/h) + 1) − log(η/4)) / M ] ) = 1 − η,
where h is the VC dimension, which corresponds to the model complexity.
๏ If h is large, we cannot say anything about the expected error on a test set
๏ If h is small, the error on a training set will be large, but we can bound the error on a test set

SUPERVISED LEARNING. K-NN ALGORITHM. SPACE COMPLEXITY
[Figure: to classify a new 2D point, look at its k nearest neighbors and choose the color (class) that appears most times.]
k is the only parameter of the model. But large memory is required, proportional to M (space complexity). Moreover, the model is computationally complex.

SUPERVISED LEARNING. PROBABLY APPROXIMATELY CORRECT (PAC) THEORY
The size of the training set should be
  M ≥ (1/ε) (log(1/δ) + log h),  0 ≤ ε ≤ 1/2,  0 ≤ δ ≤ 1/2,
where h is the VC dimension, ε is the target error, and δ is the probability of having a (1 − ε)-correct model.
O. Maimon and L. Rokach. Introduction to Supervised Methods, Data Mining and Knowledge Discovery Handbook. Springer, 2010.
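To make the classification example concrete, here is a minimal Python sketch of the 1D threshold classifier above, choosing θ by empirical risk minimisation over a grid; only the three data points {0.3, 2}, {0.5, 1}, {0.95, 1} appear on the slides, so the remaining seven points are invented to complete a toy training set of M = 10:

```python
import numpy as np

def f(theta, x):
    """Threshold classifier: class 1 for x > theta, class 2 for x <= theta."""
    return 1 if x > theta else 2

def empirical_risk(theta, xs, ys):
    """R = (1/M) * sum_i L(y_i', y_i) with the 0/1 loss."""
    return np.mean([f(theta, x) != y for x, y in zip(xs, ys)])

# Toy training set of M = 10 points {x, y}
xs = [0.3, 0.5, 0.95, 0.8, 0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
ys = [2,   1,   1,    1,   2,   2,   2,    1,   1,   1  ]

# Empirical risk minimisation over a grid of candidate thresholds
candidates = np.linspace(0.0, 1.0, 101)
best_theta = min(candidates, key=lambda t: empirical_risk(t, xs, ys))
print(best_theta, empirical_risk(best_theta, xs, ys))  # a theta around 0.35-0.4 with R = 0.0
```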
SUPERVISED LEARNING. REGRESSION TASK
[Figure: scattered data points in the (x, y) plane; which curve should fit them?]
1D input, 1D output. A smooth fit y′ = f(x) gives a moderate training error, and a moderate test error is expected.
Spring analogy: imagine each data point connected to the curve by a spring that is deformed from its equilibrium length L = 0; the strain energy is ~ (y − y′)².
Empirical risk (training error):
  R = (1/M) Σ_i L(y_i′, y_i),
with the mean squared error L(y_i′, y_i) = (y_i − y_i′)².
Empirical risk minimisation corresponds to the principle of minimum energy.

SUPERVISED LEARNING. REGRESSION TASK. MORE COMPLEX MODEL
[Figure: a wiggly curve y′ = f(x) passing exactly through every data point; the springs have zero deformation, so the training error is zero.]
๏ Zero training error
๏ Huge test error is expected

UNSUPERVISED LEARNING
(Recap of the USL setting above: training data {x_i}, i = 1...M, prediction y′ = f(θ, x); no correct outputs are given.)

Applications
‣ Clustering
‣ Finding new concepts
‣ Dimensionality reduction
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

USL algorithms
‣ k-means
‣ expectation maximization
‣ singular value decomposition
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

UNSUPERVISED LEARNING. CLUSTERING
[Figures: 2D input points with binary output, first with M = 4 and then with M = 23 data points; the agent groups the points into two clusters.]

UNSUPERVISED LEARNING. DIMENSIONALITY REDUCTION
[Figures: 2D input with M = 10 data points; a new axis x is found along the data, and the 2D data is mapped to 1D data.]

REINFORCEMENT LEARNING
(Recap of the RL setting above: the environment provides a percept (s, r), the agent performs the action a = f(θ, s); the goal of an RL agent is to maximize the total reward.)

Applications
‣ Games (12. “Deep (convolution) neural networks and Google AI” talk)
‣ Robotics (13. “Projective simulation and Robotics in Innsbruck” talk)
‣ Quantum experiments (14. “Machine learning for quantum experiments and information processing” talk)
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

RL algorithms
‣ Projective simulation (6. “PS” talk; 11. “Quantum speed-up of PS agents” talk; 13. “PS and Robotics in Innsbruck” talk; 14. “Machine learning for quantum experiments and information processing” talk)
‣ Q-learning
‣ SARSA
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

REINFORCEMENT LEARNING. MARKOV DECISION PROCESS (MDP)
The RL agent perceives one out of 3 states (S0, S1, S2) and can choose one out of 2 actions (a0, a1). With some probability, after perceiving the state S_i and taking the action a_k, the agent sees the state S_j and receives the reward r(S_i, a_k, S_j). This process can be described by a Markov chain with actions and rewards, i.e. an MDP (a small sketch of such an MDP as a data structure follows below).
https://en.wikipedia.org/wiki/Markov_decision_process
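The slide above does not list concrete transition probabilities or rewards, so the following sketch writes down a hypothetical 3-state, 2-action MDP (all numbers invented) simply to show how Pr(s′ | s, a) and r(s, a, s′) can be represented and sampled:

```python
import random

# mdp[s][a] is a list of (probability, next_state, reward r(s, a, s')) triples;
# the probabilities for each state-action pair sum to 1.
mdp = {
    "S0": {"a0": [(0.7, "S0", 0.0), (0.3, "S1", 1.0)],
           "a1": [(1.0, "S2", 0.0)]},
    "S1": {"a0": [(1.0, "S0", 0.0)],
           "a1": [(0.4, "S1", 2.0), (0.6, "S2", -1.0)]},
    "S2": {"a0": [(0.5, "S0", 5.0), (0.5, "S2", 0.0)],
           "a1": [(1.0, "S1", 0.0)]},
}

def step(state, action):
    """Sample the next state and reward according to Pr(s'|s,a) and r(s,a,s')."""
    transitions = mdp[state][action]
    weights = [p for p, _, _ in transitions]
    _, next_state, reward = random.choices(transitions, weights=weights)[0]
    return next_state, reward

print(step("S0", "a0"))  # e.g. ('S1', 1.0)
```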
REINFORCEMENT LEARNING. BASIC SCENARIO: MULTI-ARMED BANDIT
[Figure: a single state S and n slot machines, 1, 2, ..., n.] Which slot machine should we choose, i.e. which action should we choose?
https://en.wikipedia.org/wiki/Multi-armed_bandit

Which slot machine gives the largest reward? To estimate the average reward of a machine, one has to play it many times. It would be very useful to have an average reward for each machine: once we have these statistics, we choose the machine with the largest average reward.
https://en.wikipedia.org/wiki/Multi-armed_bandit
http://research.microsoft.com/en-us/projects/bandits/

REINFORCEMENT LEARNING. DEFINITIONS OF STANDARD RL
๏ Policy π(s, a) ≡ Pr(a | s): the probability of taking the action a given the state s
๏ State value function V^π(s) ≡ E[r | s]: the expected future reward given the state s,
  V^π(s) ≡ E[r | s] = Σ_a π(s, a) Σ_{s′} P^a_{ss′} [ r^a_{ss′} + γ V^π(s′) ]
                    = Σ_a Pr(a | s) Σ_{s′} Pr(s′ | s, a) [ r(s, a, s′) + γ V^π(s′) ]
๏ Optimal policy: the best way to react to the state s,
  π*(s, a) = argmax_π V^π(s)
๏ Optimal state value function: the true value of the state s,
  V*(s) ≡ V^{π*}(s) = Σ_a π*(s, a) Σ_{s′} P^a_{ss′} [ r^a_{ss′} + γ V^{π*}(s′) ]

REINFORCEMENT LEARNING. VALUE FUNCTION
A consequence of the previous definitions: if we find the optimal value function, the task is solved. One way to find this function is to approximate it using the knowledge gained by obtaining the rewards:
  V_k(s) = (1/k) Σ_{i=1...k} r_i,
  V_{k+1}(s) = (1/(k+1)) Σ_{i=1...k+1} r_i = V_k(s) + (1/(k+1)) (r_{k+1} − V_k(s)) = V_k(s) + α (r_{k+1} − V_k(s)),
where α is the learning rate.

REINFORCEMENT LEARNING. Q-LEARNING AND SARSA ALGORITHMS
๏ Q-learning algorithm (off-policy learning):
  Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (r_{t+1} + γ max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t)),
where γ is the discount factor.
๏ SARSA algorithm (on-policy learning):
  Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (r_{t+1} + γ Q_t(s_{t+1}, a_{t+1}) − Q_t(s_t, a_t)).
(A minimal Q-learning sketch follows after the summary.)

SUMMARY
๏ Supervised Learning (SL)
‣ Classification: risks; model, space, computational and sample complexities
‣ Regression
๏ Unsupervised Learning (USL)
‣ Clustering
‣ Dimensionality reduction
๏ Reinforcement Learning (RL)
‣ MDP
‣ Value function, policy
‣ Q-learning, SARSA
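As referenced above, here is a minimal tabular Q-learning sketch in Python; the two-state environment, the ε-greedy exploration, and the values of α, γ and ε are all invented for illustration, not taken from the talk:

```python
import random
from collections import defaultdict

def env_step(state, action):
    """Hypothetical toy environment: from state 0, action 1 gives reward +1 and leads
    to state 1; every other transition leads back to state 0 with reward 0."""
    if state == 0 and action == 1:
        return 1, 1.0          # next state, reward
    return 0, 0.0

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
actions = [0, 1]
Q = defaultdict(float)                  # tabular Q[(s, a)], initialised to 0

state = 0
for t in range(5000):
    # epsilon-greedy policy: mostly exploit the current Q-values, sometimes explore
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = env_step(state, action)
    # Q-learning update (off-policy): bootstrap on max_a Q_t(s_{t+1}, a)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print({sa: round(q, 2) for sa, q in sorted(Q.items())})
```

Replacing max_a Q_t(s_{t+1}, a) with the Q-value of the action actually chosen in s_{t+1} would turn this update into the on-policy SARSA rule from the same slide.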