Ch 13 - Oncourse
Machine Learning: Probabilistic

13.0 Stochastic and Dynamic Models of Learning
13.1 Hidden Markov Models (HMMs)
13.2 Dynamic Bayesian Networks and Learning
13.3 Stochastic Extensions to Reinforcement Learning
13.4 Epilogue and References
13.5 Exercises

George F Luger
ARTIFICIAL INTELLIGENCE 6th edition
Structures and Strategies for Complex Problem Solving

Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009

DEFINITION

HIDDEN MARKOV MODEL

A graphical model is called a hidden Markov model (HMM) if it is a Markov model whose states are not directly observable but are hidden by a further stochastic system interpreting their output. More formally, given a set of states S = s1, s2, ..., sn, and given a set of state transition probabilities A = a11, a12, ..., a1n, a21, a22, ..., ann, there is a set of observation likelihoods, O = pi(ot), each expressing the probability of an observation ot (at time t) being generated by a state st.

Figure 13.1 A state transition diagram of a hidden Markov model of two states designed for the coin flipping problem. The aij are determined by the elements of the 2 x 2 transition matrix, A. [Diagram labels: states S1 and S2; transitions a11, a12 = 1 - a11, a22, a21 = 1 - a22; S1 emits p(H) = b1, p(T) = 1 - b1; S2 emits p(H) = b2, p(T) = 1 - b2.]

Figure 13.2 The state transition diagram for a three state hidden Markov model of coin flipping. Each coin/state, Si, has its bias, bi. [Diagram labels: states S1, S2, S3; transitions a11, a12, a13, a21, a22, a23, a31, a32, a33; state Si emits p(H) = bi, p(T) = 1 - bi.]

Figure 13.3 a. The hidden, S, and observable, O, states of the AR-HMM, where p(Ot | St, Ot-1). b. The values of the hidden St of the example AR-HMM: safe, unsafe, and faulty.
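The HMM definition and the two-state coin model of Figure 13.1 can be made concrete with a small sketch. The Python below builds the transition matrix A and the emission probabilities from the four free parameters a11, a22, b1, and b2, then computes the probability of an observed H/T sequence by summing over every hidden state path. The function names, the parameter values, and the uniform initial state distribution are assumptions made for illustration; the slides do not specify them. The brute-force sum is used only because it follows the definition directly; it is exponential in the sequence length.

```python
from itertools import product


def coin_hmm(a11, a22, b1, b2):
    """Build the two-state coin HMM of Figure 13.1 from its four free parameters."""
    # Row i of A holds the transition probabilities out of state S(i+1).
    A = [[a11, 1.0 - a11],
         [1.0 - a22, a22]]
    # B[i] holds the observation likelihoods p(H) and p(T) for state S(i+1).
    B = [{"H": b1, "T": 1.0 - b1},
         {"H": b2, "T": 1.0 - b2}]
    return A, B


def sequence_likelihood(obs, A, B, initial):
    """Probability of a non-empty observation string, summed over all hidden paths."""
    n_states = len(A)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        # Probability of starting in path[0] and emitting the first observation.
        p = initial[path[0]] * B[path[0]][obs[0]]
        # Each later step multiplies in one transition and one emission.
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Illustrative values only; none of these numbers come from the source.
A, B = coin_hmm(a11=0.9, a22=0.8, b1=0.5, b2=0.9)
initial = [0.5, 0.5]  # assumed uniform start distribution
likelihood = sequence_likelihood("HHT", A, B, initial)
print(likelihood)
```

A quick sanity check on such a model is that the likelihoods of all possible sequences of a fixed length sum to 1.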
Figure 13.4 A selection of the real-time data across multiple time periods, with one time slice expanded, for the AR-HMM.

Figure 13.5 The time-series data of Figure 13.4 processed by a fast Fourier transform on the frequency domain. This was the data submitted to the AR-HMM for each time period.

Figure 13.6 An auto-regressive factorial HMM, where the observable state Ot, at time t is dependent on multiple (St) subprocess, Sit, and Ot-1.

Figure 13.7 A PFSM representing a set of phonetically related English words. The probability of each word occurring is below that word. Adapted from Jurafsky and Martin (2008). [Diagram labels: Start #, End #; words and probabilities: neat .00013, need .00056, new .001, knee .000024; arc probabilities shown: .48, .52, .11, .89, .36, .64.]

Start = 1.0             #     n                           iy                          #
neat .00013   2 paths   1.0   1.0 x .00013 = .00013       .00013 x 1.0 = .00013       .00013 x .52 = .000067
need .00056   2 paths   1.0   1.0 x .00056 = .00056       .00056 x 1.0 = .00056       .00056 x .11 = .000062
new .001      2 paths   1.0   1.0 x .001 = .001           .001 x .36 = .00036         .00036 x 1.0 = .00036
knee .000024  1 path    1.0   1.0 x .000024 = .000024     .000024 x 1.0 = .000024     .000024 x 1.0 = .000024
end                     Total best .00036

Figure 13.8 A trace of the Viterbi algorithm on several of the paths of Figure 13.7. Rows report the maximum value for Viterbi on each word for each input value (top row). Adapted from Jurafsky and Martin (2008).
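The arithmetic in the trace of Figure 13.8 can be reproduced with a short sketch. The Python below is not a full Viterbi implementation: it assumes each word has already been reduced to the single path shown in its row of the trace, multiplies the probabilities along that path, and picks the word with the largest final score. The names PATHS, trace, and best_word are hypothetical; the numbers are the ones appearing in Figures 13.7 and 13.8.

```python
# One entry per word: the probabilities multiplied in its row of Figure 13.8,
# in order (word prior applied at "n", the arc taken at "iy", the arc into the
# final "#").
PATHS = {
    "neat": [0.00013, 1.0, 0.52],
    "need": [0.00056, 1.0, 0.11],
    "new": [0.001, 0.36, 1.0],
    "knee": [0.000024, 1.0, 1.0],
}


def trace(path_probs, start=1.0):
    """Return the running product after each input symbol, starting from 1.0."""
    scores = [start]
    for p in path_probs:
        scores.append(scores[-1] * p)
    return scores


def best_word(paths):
    """Return the word whose path has the highest final score, with that score."""
    final = {word: trace(probs)[-1] for word, probs in paths.items()}
    winner = max(final, key=final.get)
    return winner, final[winner]


word, score = best_word(PATHS)
print(word, score)
```

Under these assumptions the winning word is "new", matching the "Total best" entry of the trace.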
function Viterbi(Observations of length T, Probabilistic FSM)
begin
    N := number of states in FSM
    create probability matrix viterbi[R = N + 2, C = T + 2];
    viterbi[0, 0] := 1.0;
    for each time step (observation) t from 0 to T do
        for each state si from i = 0 to N do
            for each transition from si to sj in the Probabilistic FSM do
            begin
                new-count := viterbi[si, t] x path[si, sj] x p(sj | si);
                if ((viterbi[sj, t + 1] = 0) or (new-count > viterbi[sj, t + 1]))
                then
                begin
                    viterbi[sj, t + 1] := new-count
                    append back-pointer [sj, t + 1] to back-pointer list
                end
            end;
    return viterbi[R, C] and the back-pointer list
end.

Figure 13.9 A DBN example of two time slices. The set Q of random variables are hidden, the set O observed, t indicates time.

Figure 13.10 A Bayesian belief net for the burglar alarm, earthquake, burglary example.

Figure 13.11 A Markov random field reflecting the potential functions of the random variables in the BBN of Figure 13.10, together with the two observations about the system.

Figure 13.12 A learnable node L is added to the Markov random field of Figure 13.11. The Markov random field iterates across three time periods. For simplicity, the EM iteration is only indicated at time 1.

DEFINITION

A MARKOV DECISION PROCESS, or MDP

A Markov Decision Process is a tuple <S, A, P, R> where:

S is a set of states, and

A is a set of actions.

pa(st, st+1) = p(st+1 | st, at = a) is the probability that if the agent executes action a ∈ A from state st at time t, it results in state st+1 at time t+1.
Since the probability, pa ∈ P, is defined over the entire state-space of actions, it is often represented with a transition matrix.

R(s) is the reward received by the agent when in state s.

DEFINITION

A PARTIALLY OBSERVABLE MARKOV DECISION PROCESS, or POMDP

A Partially Observable Markov Decision Process is a tuple <S, A, O, P, R> where:

S is a set of states, and

A is a set of actions.

O is the set of observations denoting what the agent can see about its world. Since the agent cannot directly observe its current state, the observations are probabilistically related to the underlying actual state of the world.

pa(st, o, st+1) = p(st+1, ot = o | st, at = a) is the probability that when the agent executes action a from state st at time t, it results in an observation o that leads to an underlying state st+1 at time t+1.

R(st, a, st+1) is the reward received by the agent when it executes action a in state st and transitions to state st+1.

st      st+1    at          pa(st, st+1)    Ra(st, st+1)
high    high    search      α               R_search
high    low     search      1 - α           R_search
low     high    search      1 - β           -3
low     low     search      β               R_search
high    high    wait        1               R_wait
high    low     wait        0               R_wait
low     high    wait        0               R_wait
low     low     wait        1               R_wait
low     high    recharge    1               0
low     low     recharge    0               0

Table 13.1 Transition probabilities and expected rewards for the finite MDP of the recycling robot example. The table contains all possible combinations of the current state, st, the next state, st+1, and the actions, at, and rewards possible from the current state.

Figure 13.13 The transition graph for the recycling robot. The state nodes are the large circles and the action nodes are the small black states.
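The recycling robot MDP of Table 13.1 can be written down as a small data structure, which makes it easy to check that each row of transition probabilities is a proper distribution and to compute one-step expected rewards. The Python below is a sketch under stated assumptions: alpha and beta name the two probability parameters in the pa column of the table, and the numeric values chosen for them and for R_search and R_wait are illustrative only. The function names are hypothetical.

```python
def recycling_robot(alpha, beta, r_search, r_wait):
    """Return Table 13.1 as {(state, action): [(next_state, probability, reward), ...]}."""
    return {
        ("high", "search"): [("high", alpha, r_search), ("low", 1.0 - alpha, r_search)],
        ("low", "search"): [("high", 1.0 - beta, -3.0), ("low", beta, r_search)],
        ("high", "wait"): [("high", 1.0, r_wait), ("low", 0.0, r_wait)],
        ("low", "wait"): [("high", 0.0, r_wait), ("low", 1.0, r_wait)],
        ("low", "recharge"): [("high", 1.0, 0.0), ("low", 0.0, 0.0)],
    }


def expected_reward(mdp, state, action):
    """One-step expected reward: sum over next states of probability times reward."""
    return sum(p * r for _next, p, r in mdp[(state, action)])


def is_valid(mdp):
    """Check that the outcomes of every (state, action) pair sum to probability 1."""
    return all(
        abs(sum(p for _next, p, _r in outcomes) - 1.0) < 1e-12
        for outcomes in mdp.values()
    )


# Illustrative parameter values; the source leaves all four symbolic.
mdp = recycling_robot(alpha=0.75, beta=0.5, r_search=2.0, r_wait=1.0)
print(is_valid(mdp))
print(expected_reward(mdp, "low", "search"))
```

With these assumed values, searching from the low state has a negative expected reward because half of the probability mass falls on the -3 outcome, while waiting or recharging does not carry that risk.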