Applying Game Theory to Poker, Network Traffic and Combinatorial
Transcription
Applying Game Theory to Poker, Network Traffic and Combinatorial
My Projects Using Game Theory Robert Holte © 2006 R. C. Holte, AICML 1 CMPUT 366, November, 2006 Poker © 2006 R. C. Holte, AICML 2 CMPUT 366, November, 2006 The Challenges • • • • • Large game tree Stochastic element Imperfect information (during hand, and after) Variable number of players (2–10) Aim is not just to win, but to maximize winnings • © 2006 R. C. Holte, AICML Need to exploit opponent weaknesses 3 CMPUT 366, November, 2006 2-player, limit, Texas Hold’em 1,624,350 O(1018) Initial 9 of 19 Bet Sequence 17,296 Flop 9 of 19 Bet Sequence 45 9 of 19 Turn 2 private cards to each player 3 community cards 1 community card Bet Sequence 44 River 19 Bet Sequence 1 community card Game-Theoretic Approach © 2006 R. C. Holte, AICML 5 CMPUT 366, November, 2006 Linear Programming • • • • A 2-player, zero-sum game with chance events, mixed strategies, and imperfect information can be formulated as a linear program (LP). The LP can be solved in polynomial time to produce Nash strategies for P1 and P2. Guaranteed to minimize losses against the strongest possible opponent. “Sequence form” – the LP is linear in the size of the game tree (Koller, Megiddo, and von Stengel) © 2006 R. C. Holte, AICML 6 CMPUT 366, November, 2006 Linear Programming • A 2-player, zero-sum game with chance events, mixed strategies, and imperfect information can be formulated as a linear 18 program (LP). The LP can be solved in polynomial time to produce Nash strategies for P1 and P2. Guaranteed to minimize losses against the strongest possible opponent. “Sequence form” – the LP is linear in the size of the game tree. 10 !!! • • • © 2006 R. C. Holte, AICML 7 CMPUT 366, November, 2006 PsOpti (Sparbot) – IJCAI’03 • • • • • © 2006 R. C. Holte, AICML Abstract game tree of size 107 Bluffing, slow play, etc. fall out from the mathematics. Best 2-player program to date Has held its own against 2 world-class humans Won the AAAI’06 poker-bot competitions 8 CMPUT 366, November, 2006 PsOpti2 vs. “theCount” © 2006 R. C. Holte, AICML 9 CMPUT 366, November, 2006 PsOpti’s Weaknesses • The equilibrium strategy for the highly abstract game is far from perfect. • No opponent modelling. • Nash equilibrium not the best strategy: • • • Non-adaptive Defensive Even the best humans have weaknesses that should be exploited © 2006 R. C. Holte, AICML 10 CMPUT 366, November, 2006 http://www.poker-academy.com Network Routing when Traffic Demands are Uncertain Recent Ph.D. thesis by Yuxi Li © 2006 R. C. Holte, AICML 12 CMPUT 366, November, 2006 Example A TRAFFIC DEMAND To A B C From 100 A B C B 100 100 C © 2006 R. C. Holte, AICML • 0 80 20 80 0 10 20 10 0 A routing is a set of fractions for all meaningful combinations of H, O, D, N in: At node H, fH,O,D,N of the packets going from node O to node D should be sent next to node N 13 CMPUT 366, November, 2006 Treat this as a 2-player game • Objective function: • • • • lifetime of an energy-constrained network sensor network) (e.g. Player 1 (max): choose a routing Player 2 (min): choose a traffic demand matrix The Nash equilibrium of this game is a routing that is resistant to adversarial attacks © 2006 R. C. Holte, AICML 14 CMPUT 366, November, 2006 Behaviour under Attack © 2006 R. C. Holte, AICML 15 CMPUT 366, November, 2006 Combinatorial Auctions © 2006 R. C. Holte, AICML 16 CMPUT 366, November, 2006 Rodent auction Flopsy Rodent auction Flopsy Mopsy Rodent auction Flopsy Mopsy Jack Rodent auction Flopsy Mopsy C1: will pay $5 for any one C3: will pay $12 for all three C2: will pay $9 for a breeding pair (Flopsy and one of the others) Jack Combinatorial Auctions • • • • • © 2006 R. C. Holte, AICML Single Unit C.A. – one copy of each item Auction all items simultaneously Bid specifies a price and a set of items (“all or nothing”) Exclusive-OR can be achieved by having a “dummy item” representing the bidder Multi-round or single-round 21 CMPUT 366, November, 2006 $12 for all three $12 F © 2006 R. C. Holte, AICML M 22 J CMPUT 366, November, 2006 $9 for a breeding pair F © 2006 R. C. Holte, AICML $9 $9 M J 23 CMPUT 366, November, 2006 $5 for any one © 2006 R. C. Holte, AICML $5 $5 $5 F M J 24 C1 CMPUT 366, November, 2006 Applications • • • • © 2006 R. C. Holte, AICML FCC spectrum auctions Goods distribution routes Airport gates, parcels of land eBay 25 CMPUT 366, November, 2006 Example Application SPOT: earth observation satellite Requests are made for photographs. Each photograph can be taken at several times by several different instruments/settings, but quality (profit) may vary. Using one instrument/setting at a given time may prevent the use of another at the same or adjacent times. © 2006 R. C. Holte, AICML 26 CMPUT 366, November, 2006 Example Photograph 1 • • • $4 if taken on instrument A at time T1 $3 if taken in instrument A at time T2 $12 is taken on instrument B at time T2 Photograph 2 • $7 if taken on instrument B at time T2 If B is being used, A cannot used at the same time. © 2006 R. C. Holte, AICML 27 CMPUT 366, November, 2006 Formulation as a C.A. Each instrument/setting/time is a separate item for auction. Each photograph is a bidder, with XOR bids for each of the different ways of achieving the photograph. If “items” A and B are mutually exclusive, any bid for A also includes B. © 2006 R. C. Holte, AICML 28 CMPUT 366, November, 2006 Example as a C.A. $7 $12 B2 © 2006 R. C. Holte, AICML $3 A2 29 $4 A1 P1 CMPUT 366, November, 2006 Winner Determination Problem: how to determine who wins ? Choose a set of bids that are feasible (disjoint) and maximize the auctioneer’s profit. NP-complete (set packing problem) © 2006 R. C. Holte, AICML 30 CMPUT 366, November, 2006 Multi-unit Combinatorial Auctions • There are bi copies of item i • A bid specifies a quantity for each item (is a vector length m if there are m items) • © 2006 R. C. Holte, AICML A set of bids is feasible if its total demand for each item does not exceed the number of available copies of the item 31 CMPUT 366, November, 2006 Manufacturing Application 1 • • Inventory consists of m types of parts, with bi instances of type i. A customer order requests a certain quantity of each part, and offers a price. Determine which orders to fill to maximize profit. © 2006 R. C. Holte, AICML 32 CMPUT 366, November, 2006 Auctions and Knapsacks Winner determination in a multi-unit combinatorial auction with a single item is the classic NPcomplete Knapsack problem. With more than one item, it is the multidimensional Knapsack problem (MDKP) – much less studied. © 2006 R. C. Holte, AICML 33 CMPUT 366, November, 2006 The Knapsack Problem • • • • An “easy” NP-complete problem Garey & Johnson: knapsack is considered “solved” by many (by branch-and-bound) n (#bids) can be reduced by decomposition and preprocessing Solvable in pseudo-polynomial time by dynamic programming © 2006 R. C. Holte, AICML 34 CMPUT 366, November, 2006 Knapsack Algorithms • Dave Pisinger’s PhD (1995) - publicly available, very fast code (d.p.) • There is a very simple greedy algorithm guaranteed to be ½ optimal or better © 2006 R. C. Holte, AICML 35 CMPUT 366, November, 2006 The Multidimensional Knapsack Problem (MDKP) • • • An hard NP-complete problem Not solvable in pseudo-polynomial time. No greedy (polynomial) algorithm can guarantee an approximation that is better than OPT/k½ where k is the sum of the bi. © 2006 R. C. Holte, AICML 36 CMPUT 366, November, 2006 Why Try Hill-climbing ? • It is part of branch&bound search and some heuristic search algorithms • Try simple, generic search algorithms before complex ones and problem-specific variants © 2006 R. C. Holte, AICML 37 CMPUT 366, November, 2006 Deterministic Hill-Climbing Start with the empty bid-set REPEAT: • Consider adding one bid to the current bid-set • Prune bid-sets that are infeasible • Add the bid with the highest “score” UNTIL pruning eliminates all alternatives Report the highest price seen during search, not the price of the final local “score” maximum © 2006 R. C. Holte, AICML 38 CMPUT 366, November, 2006 Deterministic Hill-Climbing Start with the empty bid-set REPEAT: • Consider adding one bid to the current bid-set • Prune bid-sets that are infeasible or cannot be extended to improve the best price seen so far • Add the bid with the highest “score” UNTIL pruning eliminates all alternatives Report the highest price seen during search, not the price of the final local “score” maximum © 2006 R. C. Holte, AICML 39 CMPUT 366, November, 2006 Scoring functions • • Price N2norm = Price/size • • Single-unit, size = (# items in the bid)½ Multi-unit, size = (∑f(i)2)½ f(i) is the fraction of the remaining quantity of item i required by the bid • KO = Price/(price of contending bids KO’d by the bid) © 2006 R. C. Holte, AICML 40 CMPUT 366, November, 2006 Randomized Hill-climbing Instead of adding the bid with the best score, choose among the alternative bids (after pruning) randomly, with probability proportional to score. Restart (with the empty bid-set) several times on a given problem and report best price seen on any restart. © 2006 R. C. Holte, AICML 41 CMPUT 366, November, 2006 Single-Unit Test Problems CATS test suite from Stanford • new • Problem generators for 5 different scenarios e.g. airport takeoff and landing time-slots • • Realistic (e.g. airports are Chicago, LaGuardia, etc.) Numerous Parameters – defaults used except • • 3 variations on “regions” (default + 2 others) scaled down “paths” and “scheduling” © 2006 R. C. Holte, AICML 42 CMPUT 366, November, 2006 Experimental Setup • For each problem type randomly generate 100 different problems • On each problem run • the 3 deterministic hill-climbers “Best DHC” = best of these prices • the 3 randomized hill-climbers (20 restarts each) “Best RHC” = best of these prices © 2006 R. C. Holte, AICML 43 CMPUT 366, November, 2006 Average Solution Quality (% of optimal) problem type © 2006 R. C. Holte, AICML best DHC best RHC path 98 98 match 99 99 sched 96 98 r75P 83 92 r90P 90 96 r90N 89 96 arb 87 95 44 CMPUT 366, November, 2006 Observations • • Randomized (20 restarts) better than deterministic Randomized finds very good solutions (always > 80%, average > 92%) • Problem ratings: easy: path, match, sched harder: r90P, r90N, arb hardest: r75P • On the easy problems, deterministic finds very good solutions (almost as good as randomized) © 2006 R. C. Holte, AICML 45 CMPUT 366, November, 2006 Which Scoring Function ? N2norm and KO about the same, better than Price. © 2006 R. C. Holte, AICML 46 CMPUT 366, November, 2006 Which Scoring Function ? N2norm and KO about the same, better than Price. But are they better than chance ? © 2006 R. C. Holte, AICML 47 CMPUT 366, November, 2006 Blind Hill-climbing Choose bid randomly with uniform probability • • still prunes still reports best price seen throughout search Repeat 200 times on each problem to measure its solution quality distribution © 2006 R. C. Holte, AICML 48 CMPUT 366, November, 2006 Percentage of Blind HC solutions worse than Deterministic solutions problem type © 2006 R. C. Holte, AICML N2norm KO path 100 100 match 100 100 sched 99 99 r75P 76 63 r90P 16 7 r90N 23 6 arb 40 20 49 CMPUT 366, November, 2006 Single-unit CA – Conclusions • hard for Blind HC, easy for DHC and RHC • • • • problems of this type are solved well by HC success is due to the scoring functions not good testbeds for comparative experiments allowing suboptimal solutions easy for Blind HC, hard for DHC • • • scoring functions alone no better than chance numerous good solutions throughout the search space good testbeds as long as the Blind HC baseline is taken into account © 2006 R. C. Holte, AICML 50 CMPUT 366, November, 2006 Multidimensional Knapsack Test Problems ORLIB test suite from J. Beasley • mknap1, mknap2 • • • • real-world, optimal values known widely used now considered “too easy” artificial, larger, harder problems • • only the smallest is solvable by CPLEX best known solutions very close to LP-optimal © 2006 R. C. Holte, AICML 51 CMPUT 366, November, 2006 Average Solution Quality (% of optimal or of best known) test set Price N2norm KO Blind mknap1 90 98.99 83 84 mknap2 94 99.00 79 58 mknapcb1 89 98.94 85 82 mknapcb2 89 99.03 85 83 mknapcb3 89 99.21 85 83 mknapcb7 93 98.35 85 81 © 2006 R. C. Holte, AICML 52 CMPUT 366, November, 2006 Conclusion – MDKP • deterministic hill-climbing with N2norm competitive with all previous work, and much better than previous greedy approaches • other scoring functions relatively poor • randomized hill-climbing only better on the easy problems © 2006 R. C. Holte, AICML 53 CMPUT 366, November, 2006