PROACTIVE MDP-BASED COLLISION AVOIDANCE ALGORITHM FOR AUTONOMOUS CAR

D. Osipychev, D. Tran, W. Sheng, and G. Chowdhary
Oklahoma State University
(denis.osipychev, duyt, weihua.sheng, girish.chowdhary)@okstate.edu

OBJECTIVES

Autonomous driving in the presence of other road users, with the following key aspects:
• Drive safely
• Avoid collisions
• Interact with other cars
• Use the intentions of others
• Assume full knowledge via V2V communication
• Make smart decisions

The objective is to develop a proactive collision avoidance system that complements existing reactive safety features, and to demonstrate its efficiency through simulations of interaction with both modeled and real human-driven cars.

BACKGROUND

The classic path planning problem in the time and XY domains has existing solutions [1]:
• Purely reactive methods [2]
• Roadmap methods [3]
• Sampling search methods [4]
• Sequential methods [5]

However, these methods suffer from disadvantages such as false alarms, harsh driving that annoys the end user, or an inability to use probabilistic prediction. This work aims to:
• Propose an MDP-based algorithm that reduces the number of false alarms
• Optimize the time required to pass an intersection
• Take into account transition uncertainty and the probable intentions of other drivers

METHODOLOGY

The action space consists of the following actions and costs:

No.  Action             Cost
 1   Keep going            0
 2   Slow acceleration     0
 3   Slow brake            0
 4   Slow turn left        0
 5   Slow turn right       0
 6   Emergency stop     -100
 7   Acceleration        -20
 8   Brake               -20
 9   Turn left           -30
10   Turn right          -30

The collision avoidance algorithm (Figure 1) loops through the following steps:
1. Get the location and velocity of all cars.
2. Predict the motion of the other cars based on a human intention model.
3. Find the probability of each state being occupied by others.
4. Program the reward function R(s, a, s') using these probabilities and the action costs.
5. If a policy already exists for this R, execute the action given by π(s); otherwise, solve value iteration for R first.

Figure 1: Collision avoidance algorithm.
Figure 2: State space representation.

The MDP tuple (S, A, T, R) is solved for the solution (V, π). Each state s is represented by the set {time, loc_x, loc_y, velocity}. The velocity-dependent transition probability model shown in Figure 3 was learned through explicit simulations of the car dynamics function.

Figure 3: Velocity-dependent transition probability.

The reward R is defined by the probability of collision and the cost of the action:

R(s, s'_collision, a) = -1000
R(s, s_final, a)      = 0
R(s, s', a)           = Cost(a) otherwise

The value of each state is given by the Bellman equation [6]:

V(s) = max_{a∈A} Σ_{s'∈S} T(s, a, s') · (R(s, a, s') + γ V(s'))
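To make the pipeline above concrete, here is a minimal value-iteration sketch in Python. The grid size, the random stand-ins for the learned transition model T(s, a, s') and for the predicted occupancy probabilities, and the discount factor γ = 0.95 are illustrative assumptions, not values from the poster; only the action costs and the reward structure follow the definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 64, 10   # hypothetical sizes; the poster's state is
                               # the set {time, loc_x, loc_y, velocity}
gamma = 0.95                   # assumed discount factor (not on the poster)

# Action costs Cost(a) from the table above, in the same order.
action_cost = np.array([0, 0, 0, 0, 0, -100, -20, -20, -30, -30], dtype=float)

# T[a, s, s']: the poster learns this from explicit simulations of the car
# dynamics; a random row-stochastic tensor stands in here.
T = rng.random((n_actions, n_states, n_states))
T /= T.sum(axis=2, keepdims=True)

# P(s' is occupied by another car), produced by the intention-prediction
# step; random numbers stand in for the predicted occupancy probabilities.
p_occupied = rng.random(n_states) * 0.2
goal = n_states - 1

# Reward per the poster: -1000 weighted by the collision probability of s',
# plus Cost(a); 0 for reaching the final state.
R = np.empty((n_actions, n_states, n_states))
R[:] = action_cost[:, None, None] - 1000.0 * p_occupied[None, None, :]
R[:, :, goal] = 0.0

def value_iteration(T, R, gamma, tol=1e-6):
    """V(s) = max_a sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V(s'))."""
    V = np.zeros(T.shape[1])
    while True:
        Q = (T * (R + gamma * V)).sum(axis=2)    # Q[a, s]
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)       # value function and policy
        V = V_new

V, policy = value_iteration(T, R, gamma)
print("V(s_0) = %.1f, pi(s_0) = action %d" % (V[0], policy[0] + 1))
```

In the actual system, T would be the learned velocity-dependent model of Figure 3 and the occupancy probabilities would be refreshed on every loop of Figure 1; the precomputed policy is then reused until the reward scenario changes.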
RESULTS

Figure 4: Agent ("Car1") and human ("Car2") velocities in a random example; the simulation stops when the agent passes the intersection.
Figure 5: Maximum acceleration used and travel time, compared for the MDP and reactive methods. The higher variance of the MDP results is due to the variety of solutions found.

CONCLUSION

+ Allows the use of a probabilistic human intention model
+ Solves the problem in an optimization framework
+ Highly generalizable solution concept
- Can be computationally intensive
- The solution must be recomputed for every reward scenario

ONGOING RESEARCH

• Long-term human intention prediction
  – Interact with human drivers and use their intentions
  – Reduce additional policy computations
  – Allows the policy to be switched less often
• Computationally efficient problem decomposition (a minimal sketch follows the references)
  – Dynamic resolution change
  – Decompose car allocations
  – Solve a separate MDP for each car

REFERENCES

[1] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006.
[2] F. Belkhouche, "Reactive path planning in a dynamic environment," IEEE Transactions on Robotics, vol. 25, no. 4, pp. 902–911, 2009.
[3] N. M. Amato and Y. Wu, "A randomized roadmap method for path and manipulation planning," in Proc. IEEE International Conference on Robotics and Automation, vol. 1, 1996, pp. 113–120.
[4] N. K. Yilmaz, C. Evangelinos, P. F. Lermusiaux, and N. M. Patrikalakis, "Path planning of autonomous underwater vehicles for adaptive sampling using mixed integer linear programming," IEEE Journal of Oceanic Engineering, vol. 33, no. 4, pp. 522–537, 2008.
[5] S. Brechtel, T. Gindele, and R. Dillmann, "Probabilistic MDP-behavior planning for cars," in Proc. 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), 2011, pp. 1537–1542.
[6] R. Bellman, "A Markovian decision process," DTIC Document, Tech. Rep., 1957.
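As a closing illustration of the decomposition idea listed under Ongoing Research, the sketch below solves one small MDP per surrounding car and combines the resulting value functions. Everything here is an assumption made for illustration: the shared placeholder transition model, the random per-car occupancy maps, and in particular the pessimistic element-wise-minimum combination rule are not taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 64, 10, 0.95   # hypothetical sizes, as before

def solve_mdp(T, R, tol=1e-6):
    """Plain value iteration, as in the methodology sketch."""
    V = np.zeros(T.shape[1])
    while True:
        V_new = (T * (R + gamma * V)).sum(axis=2).max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# One shared (placeholder) transition model, and one predicted occupancy
# map per surrounding car instead of a single joint map.
T = rng.random((n_actions, n_states, n_states))
T /= T.sum(axis=2, keepdims=True)
per_car_occupancy = [rng.random(n_states) * 0.2 for _ in range(3)]

# Solve one small MDP per car (action costs omitted for brevity), then
# combine the value functions pessimistically with an element-wise minimum.
values = []
for p_occ in per_car_occupancy:
    R = np.broadcast_to(-1000.0 * p_occ, (n_actions, n_states, n_states))
    values.append(solve_mdp(T, R))
V_combined = np.minimum.reduce(values)
print("combined V(s_0) = %.1f" % V_combined[0])
```

Solving three 64-state MDPs is far cheaper than one MDP over the joint configuration of all cars, which is the motivation behind this research direction.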