BASICS OF MACHINE LEARNING
Alexey Melnikov, Institute for Theoretical Physics, University of Innsbruck
Quantum computing, control, and learning. March 16, 2016

OUTLINE
๏ Supervised Learning
๏ Unsupervised Learning
๏ Reinforcement Learning

MACHINE LEARNING. PREVIOUS INTRODUCTORY TALK
[Figure: an intelligent agent interacts with an environment, receiving percepts and performing actions.]

MACHINE LEARNING
output = f(θ, input)
The agent (machine learning model) is defined as a function f that maps inputs to outputs; θ are the agent's parameters. The goal of learning is to modify the agent's parameters such that the agent produces desired outputs. Depending on the structure of the input, several types of learning can be defined.

1. SUPERVISED LEARNING (SL)
A teacher provides the SL agent with training data {x_i, y_i}, i = 1...M; the agent makes the prediction y′ = f(θ, x).
Teacher provides training data:
๏ M data points
‣ Input object x (number, vector, photo, ...)
‣ Correct output y
Learning model constructs:
๏ A map f from x to y′, such that y_i′ ≈ y_i for all i

2. UNSUPERVISED LEARNING (USL)
A teacher provides the USL agent with training data {x_i}, i = 1...M; the agent makes the prediction y′ = f(θ, x).
Teacher provides training data:
๏ M data points
‣ Input object x (number, vector, photo, ...); no correct outputs are given
Learning model constructs:
๏ A map f from x to y′

3. REINFORCEMENT LEARNING (RL)
The environment provides the RL agent with a percept (s, r); the agent performs the action a = f(θ, s).
Environment provides:
‣ Input state s (number, vector, photo, ...)
‣ Reward r (number)
Learning model constructs:
๏ A map f from s to a
The goal of an RL agent is to maximize the total reward.

SUPERVISED LEARNING
(Recap of the SL setting above: training data {x_i, y_i}, i = 1...M, prediction y′ = f(θ, x).)

Applications
‣ Speech recognition
‣ Optical character recognition
‣ Face recognition
‣ Spam detection
‣ Netflix suggestions
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

SL algorithms
‣ k-nearest neighbors (a minimal sketch follows below)
‣ Support Vector Machines (3. “SVM” talk; 8. “Quantum SVM” talk)
‣ Artificial Neural Networks (4. “ANN” talk)
‣ Learning Classifier Systems (5. “LCS and boosting” talk)
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.
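As a minimal illustration of the first algorithm in this list, the following Python sketch classifies a point by a majority vote among its k nearest training points (the toy data, k = 3, and the Euclidean distance are arbitrary choices made for the example, not taken from the talk):

```python
import numpy as np

def knn_predict(x, train_x, train_y, k=3):
    """Classify x by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_x - x, axis=1)    # distances to all M stored points
    nearest = np.argsort(distances)[:k]                # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # the class that appears most times

# Toy 2D training set: M = 6 labelled points of classes 1 and 2
train_x = np.array([[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
                    [0.8, 0.9], [0.9, 0.8], [0.7, 0.7]])
train_y = np.array([1, 1, 1, 2, 2, 2])

print(knn_predict(np.array([0.25, 0.2]), train_x, train_y))  # -> 1
print(knn_predict(np.array([0.85, 0.8]), train_x, train_y))  # -> 2
```

Note that all M training points have to be stored and searched for every prediction, which is exactly the space and computational complexity of k-NN discussed a few slides below.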
SUPERVISED LEARNING
Training: the teacher provides the training data {x_i, y_i}, i = 1...M, and the SL agent learns the model parameters θ.
Testing: a tester provides an input x, and the SL agent returns the prediction y′ = f(θ, x).
Two types of tasks
๏ Classification (output y is discrete)
๏ Regression (output y is continuous)

SUPERVISED LEARNING. CLASSIFICATION TASK
The teacher provides 10 data points {x, y}, e.g. {0.8, 1}, ..., {0.95, 1}, {0.5, 1}, {0.3, 2}. The classifier is
  y′ = f(θ, x) = 1 for x > θ, 2 for x ≤ θ,
and the agent has to learn the threshold θ. With θ = 0.4, for example, the data points {0.3, 2}, {0.5, 1}, {0.95, 1} are classified correctly.

SUPERVISED LEARNING. 1D CLASSIFICATION TASK
[Figure: the classifier f(x) separates the two classes along the x axis; points such as {0.3, 2}, {0.5, 1}, {0.95, 1} lie on either side of the threshold.]
  y′ = f(θ, x) = 1 for x > 0.4, 2 for x ≤ 0.4.
(A sketch of this threshold classifier, fitted by empirical risk minimisation, follows after the PAC theory slide below.)

SUPERVISED LEARNING. 2D CLASSIFICATION TASK
Input is a vector x = (x_1, x_2), output is a class {1, 2}. A linear classifier:
  y′ = f(θ, x) = 1 for x_2 > θ_1 x_1 + θ_2, 2 for x_2 ≤ θ_1 x_1 + θ_2.
Empirical risk (training error):
  R = (1/M) Σ_i L(y_i′, y_i),  L(y_i′, y_i) = 0 or 1.
In general the separating surface is a hyperplane: Support Vector Machines (next talk).

SUPERVISED LEARNING. 2D CLASSIFICATION TASK. MORE COMPLEX MODEL
Input is a vector x = (x_1, x_2), output is a class. A more complex classifier:
  f(θ, x) = 1 for x_2 > θ_1 sin(θ_2 x_1) + θ_3, 2 for x_2 ≤ θ_1 sin(θ_2 x_1) + θ_3.
This model classified all training samples correctly (empirical risk R = 0), but is more complex. Complex models lead to overfitting error.

SUPERVISED LEARNING. ERRORS VS. MODEL COMPLEXITY
[Figure: test error as a function of model complexity, decomposed into a training error that decreases with complexity and an overfitting error that grows with complexity.]
Tradeoff between training error and overfitting error.

SUPERVISED LEARNING. VAPNIK–CHERVONENKIS (VC) THEORY
SVM (next talk), ANN (the talk after next).
  P( test error ≤ training error + √[ (h (log(2M/h) + 1) − log(η/4)) / M ] ) = 1 − η,
where h is the VC dimension, which corresponds to the model complexity.
๏ If h is large, we cannot say anything about the expected error on a test set
๏ If h is small, the error on a training set will be large, but we can bound the error on a test set

SUPERVISED LEARNING. K-NN ALGORITHM. SPACE COMPLEXITY
[Figure: to classify a new 2D point, look at its k nearest neighbors and choose the color (class) that appears most times.]
k is the only parameter of the model. But large memory is required, proportional to M (space complexity). Moreover, the model is computationally complex.

SUPERVISED LEARNING. PROBABLY APPROXIMATELY CORRECT (PAC) THEORY
The size of the training set should be
  M ≥ (1/ε) (log(1/δ) + log h),  0 ≤ ε ≤ 1/2,  0 ≤ δ ≤ 1/2,
where h is the VC dimension, ε is the target error, and δ is the probability of having a (1 − ε)-correct model.
O. Maimon and L. Rokach. Introduction to Supervised Methods, Data Mining and Knowledge Discovery Handbook. Springer, 2010.
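To make the classification example concrete, here is a minimal Python sketch of the 1D threshold classifier above, choosing θ by empirical risk minimisation over a grid; only the three data points {0.3, 2}, {0.5, 1}, {0.95, 1} appear on the slides, so the remaining seven points are invented to complete a toy training set of M = 10:

```python
import numpy as np

def f(theta, x):
    """Threshold classifier: class 1 for x > theta, class 2 for x <= theta."""
    return 1 if x > theta else 2

def empirical_risk(theta, xs, ys):
    """R = (1/M) * sum_i L(y_i', y_i) with the 0/1 loss."""
    return np.mean([f(theta, x) != y for x, y in zip(xs, ys)])

# Toy training set of M = 10 points {x, y}
xs = [0.3, 0.5, 0.95, 0.8, 0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
ys = [2,   1,   1,    1,   2,   2,   2,    1,   1,   1  ]

# Empirical risk minimisation over a grid of candidate thresholds
candidates = np.linspace(0.0, 1.0, 101)
best_theta = min(candidates, key=lambda t: empirical_risk(t, xs, ys))
print(best_theta, empirical_risk(best_theta, xs, ys))  # a theta around 0.35-0.4 with R = 0.0
```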
SUPERVISED LEARNING. REGRESSION TASK
[Figure: scattered data points in the (x, y) plane; which curve should fit them?]
1D input, 1D output. A smooth fit y′ = f(x) gives a moderate training error, and a moderate test error is expected.
Spring analogy: imagine each data point connected to the curve by a spring that is deformed from its equilibrium length L = 0; the strain energy is ~ (y − y′)².
Empirical risk (training error):
  R = (1/M) Σ_i L(y_i′, y_i),
with the mean squared error L(y_i′, y_i) = (y_i − y_i′)².
Empirical risk minimisation corresponds to the principle of minimum energy.

SUPERVISED LEARNING. REGRESSION TASK. MORE COMPLEX MODEL
[Figure: a wiggly curve y′ = f(x) passing exactly through every data point; the springs have zero deformation, so the training error is zero.]
๏ Zero training error
๏ Huge test error is expected

UNSUPERVISED LEARNING
(Recap of the USL setting above: training data {x_i}, i = 1...M, prediction y′ = f(θ, x); no correct outputs are given.)

Applications
‣ Clustering
‣ Finding new concepts
‣ Dimensionality reduction
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

USL algorithms
‣ k-means
‣ expectation maximization
‣ singular value decomposition
‣ ...
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2009.

UNSUPERVISED LEARNING. CLUSTERING
[Figures: 2D input points with binary output, first with M = 4 and then with M = 23 data points; the agent groups the points into two clusters.]

UNSUPERVISED LEARNING. DIMENSIONALITY REDUCTION
[Figures: 2D input with M = 10 data points; a new axis x is found along the data, and the 2D data is mapped to 1D data.]

REINFORCEMENT LEARNING
(Recap of the RL setting above: the environment provides a percept (s, r), the agent performs the action a = f(θ, s); the goal of an RL agent is to maximize the total reward.)

Applications
‣ Games (12. “Deep (convolution) neural networks and Google AI” talk)
‣ Robotics (13. “Projective simulation and Robotics in Innsbruck” talk)
‣ Quantum experiments (14. “Machine learning for quantum experiments and information processing” talk)
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

RL algorithms
‣ Projective simulation (6. “PS” talk; 11. “Quantum speed-up of PS agents” talk; 13. “PS and Robotics in Innsbruck” talk; 14. “Machine learning for quantum experiments and information processing” talk)
‣ Q-learning
‣ SARSA
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

REINFORCEMENT LEARNING. MARKOV DECISION PROCESS (MDP)
The RL agent perceives one out of 3 states (S0, S1, S2) and can choose one out of 2 actions (a0, a1). With some probability, after perceiving the state S_i and taking the action a_k, the agent sees the state S_j and receives the reward r(S_i, a_k, S_j). This process can be described by a Markov chain with actions and rewards, i.e. an MDP (a small sketch of such an MDP as a data structure follows below).
https://en.wikipedia.org/wiki/Markov_decision_process
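The slide above does not list concrete transition probabilities or rewards, so the following sketch writes down a hypothetical 3-state, 2-action MDP (all numbers invented) simply to show how Pr(s′ | s, a) and r(s, a, s′) can be represented and sampled:

```python
import random

# mdp[s][a] is a list of (probability, next_state, reward r(s, a, s')) triples;
# the probabilities for each state-action pair sum to 1.
mdp = {
    "S0": {"a0": [(0.7, "S0", 0.0), (0.3, "S1", 1.0)],
           "a1": [(1.0, "S2", 0.0)]},
    "S1": {"a0": [(1.0, "S0", 0.0)],
           "a1": [(0.4, "S1", 2.0), (0.6, "S2", -1.0)]},
    "S2": {"a0": [(0.5, "S0", 5.0), (0.5, "S2", 0.0)],
           "a1": [(1.0, "S1", 0.0)]},
}

def step(state, action):
    """Sample the next state and reward according to Pr(s'|s,a) and r(s,a,s')."""
    transitions = mdp[state][action]
    weights = [p for p, _, _ in transitions]
    _, next_state, reward = random.choices(transitions, weights=weights)[0]
    return next_state, reward

print(step("S0", "a0"))  # e.g. ('S1', 1.0)
```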
REINFORCEMENT LEARNING. BASIC SCENARIO: MULTI-ARMED BANDIT
[Figure: a single state S and n slot machines, 1, 2, ..., n.] Which slot machine should we choose, i.e. which action should we choose?
https://en.wikipedia.org/wiki/Multi-armed_bandit

Which slot machine gives the largest reward? To estimate the average reward of a machine, one has to play it many times. It would be very useful to have an average reward for each machine: once we have these statistics, we choose the machine with the largest average reward.
https://en.wikipedia.org/wiki/Multi-armed_bandit
http://research.microsoft.com/en-us/projects/bandits/

REINFORCEMENT LEARNING. DEFINITIONS OF STANDARD RL
๏ Policy π(s, a) ≡ Pr(a | s): the probability of taking the action a given the state s
๏ State value function V^π(s) ≡ E[r | s]: the expected future reward given the state s,
  V^π(s) ≡ E[r | s] = Σ_a π(s, a) Σ_{s′} P^a_{ss′} [ r^a_{ss′} + γ V^π(s′) ]
                    = Σ_a Pr(a | s) Σ_{s′} Pr(s′ | s, a) [ r(s, a, s′) + γ V^π(s′) ]
๏ Optimal policy: the best way to react to the state s,
  π*(s, a) = argmax_π V^π(s)
๏ Optimal state value function: the true value of the state s,
  V*(s) ≡ V^{π*}(s) = Σ_a π*(s, a) Σ_{s′} P^a_{ss′} [ r^a_{ss′} + γ V^{π*}(s′) ]

REINFORCEMENT LEARNING. VALUE FUNCTION
A consequence of the previous definitions: if we find the optimal value function, the task is solved. One way to find this function is to approximate it using the knowledge gained by obtaining the rewards:
  V_k(s) = (1/k) Σ_{i=1...k} r_i,
  V_{k+1}(s) = (1/(k+1)) Σ_{i=1...k+1} r_i = V_k(s) + (1/(k+1)) (r_{k+1} − V_k(s)) = V_k(s) + α (r_{k+1} − V_k(s)),
where α is the learning rate.

REINFORCEMENT LEARNING. Q-LEARNING AND SARSA ALGORITHMS
๏ Q-learning algorithm (off-policy learning):
  Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (r_{t+1} + γ max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t)),
where γ is the discount factor.
๏ SARSA algorithm (on-policy learning):
  Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (r_{t+1} + γ Q_t(s_{t+1}, a_{t+1}) − Q_t(s_t, a_t)).
(A minimal Q-learning sketch follows after the summary.)

SUMMARY
๏ Supervised Learning (SL)
‣ Classification: risks; model, space, computational and sample complexities
‣ Regression
๏ Unsupervised Learning (USL)
‣ Clustering
‣ Dimensionality reduction
๏ Reinforcement Learning (RL)
‣ MDP
‣ Value function, policy
‣ Q-learning, SARSA
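As referenced above, here is a minimal tabular Q-learning sketch in Python; the two-state environment, the ε-greedy exploration, and the values of α, γ and ε are all invented for illustration, not taken from the talk:

```python
import random
from collections import defaultdict

def env_step(state, action):
    """Hypothetical toy environment: from state 0, action 1 gives reward +1 and leads
    to state 1; every other transition leads back to state 0 with reward 0."""
    if state == 0 and action == 1:
        return 1, 1.0          # next state, reward
    return 0, 0.0

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
actions = [0, 1]
Q = defaultdict(float)                  # tabular Q[(s, a)], initialised to 0

state = 0
for t in range(5000):
    # epsilon-greedy policy: mostly exploit the current Q-values, sometimes explore
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = env_step(state, action)
    # Q-learning update (off-policy): bootstrap on max_a Q_t(s_{t+1}, a)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print({sa: round(q, 2) for sa, q in sorted(Q.items())})
```

Replacing max_a Q_t(s_{t+1}, a) with the Q-value of the action actually chosen in s_{t+1} would turn this update into the on-policy SARSA rule from the same slide.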