2-7 Triple Draw Poker
Playing Draw Poker with Convolutional Neural Nets
Nikolai Yakovenko
4/22/15 for EE6894

What is Draw Poker?
• Five-card poker with one exchange
• 10,000 machines in Las Vegas alone
• Pays out $0.95 to $1.007 per dollar*…
  *with perfect play
• Add photo of payout table
100% payout with perfect play.

The Machine’s Edge
What do you do here?
Rule #7: Draw 3 to a Royal Flush!
Worth $0.80 on average
Worth $1.80 on average

Easy to get 99.5% payout
Just follow these 25 easy rules...
No wonder people make mistakes
Typical human plays 5% below expectation
• Four of a kind, straight flush, royal flush
• 4 to a royal flush
• Three of a kind, straight, flush, full house
• 4 to a straight flush
• Two pair
• High pair
• 3 to a royal flush
• 4 to a flush
• Low pair
• 4 to an outside straight
• 3 to a straight flush (type 1)
• AKQJ unsuited
• 2 suited high cards
• 4 to an inside straight with 3 high cards
• 3 to a straight flush (type 2)
• KQJ unsuited
• QJ unsuited
• JT suited
• KQ, KJ unsuited
• QT suited
• AK, AQ, AJ unsuited
• KT suited
• One high card
• 3 to a straight flush (type 3)
• Discard everything

Can learning do better?

Data Representation
[4d,4h,Jh,5d,4c]
  x23456789TJQKA
  c..1..........
  d..11.........
  h..1......1...
  s.............
4x13 binary matrix, each card
32-length vector for all draws

All 32 sim results for hand [Kh,Ad,Kc,8h,Qc]:
  []:2000 sample: 0.27 ave 6.00 max
  [Kh]:20000 sample: 0.33 ave 9.00 max
  [Ad]:20000 sample: 0.45 ave 25.00 max
  [Kc]:20000 sample: 0.33 ave 25.00 max
  [8h]:2000 sample: 0.35 ave 25.00 max
  [Qc]:20000 sample: 0.44 ave 50.00 max
  [Kh,Ad]:2000 sample: 0.41 ave 9.00 max
  [Kh,Kc]:2000 sample: 1.54 ave 25.00 max
  …
  [Kc,Qc]:20000 sample: 0.66 ave 976.00 max
  [8h,Qc]:2000 sample: 0.34 ave 9.00 max
  [Kh,Ad,Kc]:2000 sample: 1.38 ave 25.00 max
  …
  [Kh,Kc,8h]:2000 sample: 1.42 ave 25.00 max
  [Kh,Kc,Qc]:2000 sample: 1.45 ave 25.00 max
  …
  [Kh,Ad,Kc,8h]:2000 sample: 1.23 ave 3.00 max
  [Kh,Ad,Kc,Qc]:2000 sample: 1.21 ave 3.00 max
  [Kh,Ad,8h,Qc]:2000 sample: 0.18 ave 1.00 max
  [Kh,Kc,8h,Qc]:2000 sample: 1.21 ave 3.00 max
  [Ad,Kc,8h,Qc]:2000 sample: 0.16 ave 1.00 max
  [Kh,Ad,Kc,8h,Qc]:2000 sample: 1.00 ave 1.00 max
best result: [Kh,Kc]:2000 sample: 1.54 ave 25.00 max

Why Convolutional Network?
• Card games are visual
• Learn properties like pairs, flushes, straights
• Proximity in the inputs matter
[Kh,Qh,4h,3c,Jh]
  x23456789TJQKA
  c.1...........
  d.............
  h..1......111.
  s.............
[2d,2h,7c,6c,5c]
  x23456789TJQKA
  c...111.......
  d1............
  h1............
  s.............
Copy image

ConvNet best practices
Karen Simonyan & Andrew Zisserman explain…
http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf

5-Card Draw Poker: Network Shape
input layer shape 100 x 5 x 17 x 17
convolution layer l_conv1. Shape (100, 16, 15, 15)
convolution layer l_conv1_1. Shape (100, 16, 13, 13)
maxPool layer l_pool1. Shape (100, 16, 7, 7)
convolution layer l_conv2. Shape (100, 32, 5, 5)
convolution layer l_conv2_2. Shape (100, 32, 3, 3)
maxPool layer l_pool2. Shape (100, 32, 2, 2)
hidden layer l_hidden1. Shape (100, 1024)
dropout layer l_hidden1_dropout. Shape (100, 1024)
final layer l_out, into 32 dimension. Shape (100, 32)

Training
Simple
• Learn all 32 outputs
• Loss = mean squared error
• Update with Nesterov Momentum
  – Easily gets to 67% accurate moves
Nuanced
• Bias toward rare cases
• Round off large values
  – 99% [0.0 – 4.0]
  – <1% [10.0 – 900.0]
• Switch to adaptive learning
  – AdaDelta works well
  – Initialize working model

Results!
Model                                               Real Return (100k hands)   Validation %
15x15 same-shape, 50k training size                 $0.930                     73%
17x17 valid-shape, 90k training size                $0.944                     70%
17x17 valid-shape, 150k training size               $0.955                     78%
17x17 valid-shape, 90k training (longer training)   $0.983                     77%
Don’t trust the averages, since hugely asymmetric payoff.

Study your mistakes
Biggest Errors
500 hands took 2397.96s
406 no error
52 tiny error
25 small error
17 big error
biggest errors:
(1.14, '[4s,7h,Ah,3h,Jh]', '[7h,Ah,3h,Jh]', 1.26, '[4s,7h,Ah,Jh]', 0.12)
(1.12, '[Ts,Kc,Jc,3h,Qc]', '[Kc,Jc,Qc]', 2.015, '[Ts,Kc,Jc,Qc]', 0.897)
(0.85, '[Tc,2s,9c,8d,8c]', '[8d,8c]', 0.852, '[Tc,2s,9c,8c]', 0.0)
(0.83, '[3d,Td,9s,Ad,8d]', '[3d,Td,Ad,8d]', 1.272, '[Td,Ad,8d]', 0.442)
(0.82, '[6d,2c,3c,Tc,9c]', '[2c,3c,Tc,9c]', 1.167, '[]', 0.340)
(0.81, '[Jc,9h,Kh,Ah,2h]', '[9h,Kh,Ah,2h]', 1.278, '[Jc,Kh,Ah]', 0.465)
(0.75, '[2h,Qs,Jh,Kh,8h]', '[2h,Jh,Kh,8h]', 1.261, '[Jh,Kh]', 0.513)
(0.71, '[9s,Th,As,4s,6s]', '[9s,As,4s,6s]', 1.229, '[As]', 0.511)
…
(0.41, '[2s,Jh,Kd,Tc,Qs]', '[Jh,Kd,Tc,Qs]', 0.872, '[Jh,Kd,Qs]', 0.463)
Struggling with straights, flushes, straight flushes.
Will it learn with more time? With better examples?

Lessons Learned
Do
• Keep network lean
  – Deep but simple
• Use adaptive learning
  – Initialize with working model
• Train for a long time
• Bias toward difficult data
Don’t do
• Endlessly fiddle with network shape
• Fiddle with learning rate
• Permute input data
  – Better to get fresh samples

Digital or Analog?
DeepMind’s Atari AI, attempting to approximate an exact value.
Neural Nets do bad “Exact Math” for games

Obvious Improvements
• Train on errors
  – Directly, or look for similar cases
  – Generate more data, permute known cases
• Run much longer, on more data
  – Total training: 400k cases, down-sampled to 150k
• Train multiple models, and vote on result

Beyond Draw Video Poker
• Different video payout
  – Start training on current model
• Triple Draw
  – 3 rounds, so train 3 models
  – Same network shape, same output
  – One big model?
• Incorporate betting, opponent hand information

Questions
• Different network shape?
• How to handle input padding?
• Retrain on rare cases?
  – Or a specialty network?
  – (Backgammon AIs include 2-5 different networks)

Thank you!

Bibliography
• GitHub: https://github.com/moscow25/deep_draw
  – Ping me if you want to run it. Needs bit of cleanup.
• Lasagne: http://lasagne.readthedocs.org/en/latest/
• Network shape (for images)
  – http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf
  – http://vision.stanford.edu/teaching/cs231n/slides/lecture8.pdf
• AdaDelta: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
• DeepMind Atari: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
• PokerSnowie: https://www.pokersnowie.com/about/weaknesses.html
• Wizard of Odds: http://wizardofodds.com/games/video-poker/tables/jacks-or-better/
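
Appendix sketch: data representation
The Data Representation slide describes each hand as a 4x13 binary matrix and each training target as a 32-length vector, one value per possible set of cards to hold. The following is a minimal Python sketch of that encoding, not the code from the deep_draw repository. The helper names (encode_hand, render, all_keeps, best_keep) are hypothetical, the row and column order is taken from the printed matrices in the slides, and the padding to 17x17 and the 5 input channels from the Network Shape slide are deliberately left out.

```python
from itertools import combinations

RANKS = "23456789TJQKA"  # column order as printed in the slides
SUITS = "cdhs"           # row order as printed in the slides


def encode_hand(cards):
    """Return a 4x13 list of lists with a 1 for every card in the hand.

    Cards are two-character strings such as "Kh" (rank, then suit).
    """
    matrix = [[0] * len(RANKS) for _ in SUITS]
    for card in cards:
        rank, suit = card[0], card[1]
        matrix[SUITS.index(suit)][RANKS.index(rank)] = 1
    return matrix


def render(matrix):
    """Format a matrix the way the slides print it ('.' for 0, '1' for 1)."""
    lines = ["x" + RANKS]
    for suit, row in zip(SUITS, matrix):
        lines.append(suit + "".join("1" if cell else "." for cell in row))
    return "\n".join(lines)


def all_keeps(cards):
    """Enumerate the 2**5 = 32 subsets of cards to hold, smallest first."""
    keeps = []
    for size in range(len(cards) + 1):
        for subset in combinations(cards, size):
            keeps.append(list(subset))
    return keeps


def best_keep(values):
    """Index of the highest estimated value in an output vector."""
    return max(range(len(values)), key=lambda i: values[i])


# Example: the hand shown on the Data Representation slide.
print(render(encode_hand(["4d", "4h", "Jh", "5d", "4c"])))

# Example: the 32 hold options for the simulated hand, in list order.
keeps = all_keeps(["Kh", "Ad", "Kc", "8h", "Qc"])
print(len(keeps), keeps[7])
```

Enumerating subsets by size with itertools.combinations happens to reproduce the ordering visible in the simulation listing (empty hold first, then single cards, pairs, and so on), so an index into the 32-length vector maps directly back to a set of cards to hold.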