Applying Game Theory to Poker, Network Traffic and Combinatorial

Transcription

Applying Game Theory to Poker, Network Traffic and Combinatorial
My Projects Using Game Theory
Robert Holte
© 2006 R. C. Holte, AICML
1
CMPUT 366, November, 2006
Poker
© 2006 R. C. Holte, AICML
2
CMPUT 366, November, 2006
The Challenges
•
•
•
•
•
Large game tree
Stochastic element
Imperfect information (during hand, and
after)
Variable number of players (2–10)
Aim is not just to win, but to maximize
winnings
•
© 2006 R. C. Holte, AICML
Need to exploit opponent weaknesses
3
CMPUT 366, November, 2006
2-player, limit, Texas Hold’em
1,624,350
O(1018)
Initial
9 of 19
Bet
Sequence
17,296
Flop
9 of 19
Bet
Sequence
45
9 of 19
Turn
2 private cards to each player
3 community cards
1 community card
Bet
Sequence
44
River
19
Bet
Sequence
1 community card
Game-Theoretic Approach
© 2006 R. C. Holte, AICML
5
CMPUT 366, November, 2006
Linear Programming
•
•
•
•
A 2-player, zero-sum game with chance
events, mixed strategies, and imperfect
information can be formulated as a linear
program (LP).
The LP can be solved in polynomial time to
produce Nash strategies for P1 and P2.
Guaranteed to minimize losses against the
strongest possible opponent.
“Sequence form” – the LP is linear in the
size of the game tree
(Koller, Megiddo, and von Stengel)
© 2006 R. C. Holte, AICML
6
CMPUT 366, November, 2006
Linear Programming
•
A 2-player, zero-sum game with chance
events, mixed strategies, and imperfect
information can be formulated as a linear
18
program (LP).
The LP can be solved in polynomial time to
produce Nash strategies for P1 and P2.
Guaranteed to minimize losses against the
strongest possible opponent.
“Sequence form” – the LP is linear in the
size of the game tree.
10 !!!
•
•
•
© 2006 R. C. Holte, AICML
7
CMPUT 366, November, 2006
PsOpti (Sparbot) – IJCAI’03
•
•
•
•
•
© 2006 R. C. Holte, AICML
Abstract game tree of size 107
Bluffing, slow play, etc. fall out from the
mathematics.
Best 2-player program to date
Has held its own against 2 world-class
humans
Won the AAAI’06 poker-bot competitions
8
CMPUT 366, November, 2006
PsOpti2 vs. “theCount”
© 2006 R. C. Holte, AICML
9
CMPUT 366, November, 2006
PsOpti’s Weaknesses
•
The equilibrium strategy for the highly abstract
game is far from perfect.
•
No opponent modelling.
•
Nash equilibrium not the best strategy:
•
•
•
Non-adaptive
Defensive
Even the best humans have weaknesses that
should be exploited
© 2006 R. C. Holte, AICML
10
CMPUT 366, November, 2006
http://www.poker-academy.com
Network Routing when Traffic
Demands are Uncertain
Recent Ph.D. thesis by Yuxi Li
© 2006 R. C. Holte, AICML
12
CMPUT 366, November, 2006
Example
A
TRAFFIC DEMAND
To A B C
From
100
A
B
C
B
100
100
C
© 2006 R. C. Holte, AICML
•
0 80 20
80 0 10
20 10 0
A routing is a set of fractions for all meaningful
combinations of H, O, D, N in:
At node H, fH,O,D,N of the packets going from
node O to node D should be sent next to node N
13
CMPUT 366, November, 2006
Treat this as a 2-player game
•
Objective function:
•
•
•
•
lifetime of an energy-constrained network
sensor network)
(e.g.
Player 1 (max): choose a routing
Player 2 (min): choose a traffic demand matrix
The Nash equilibrium of this game is a routing
that is resistant to adversarial attacks
© 2006 R. C. Holte, AICML
14
CMPUT 366, November, 2006
Behaviour under Attack
© 2006 R. C. Holte, AICML
15
CMPUT 366, November, 2006
Combinatorial Auctions
© 2006 R. C. Holte, AICML
16
CMPUT 366, November, 2006
Rodent auction
Flopsy
Rodent auction
Flopsy
Mopsy
Rodent auction
Flopsy
Mopsy
Jack
Rodent auction
Flopsy
Mopsy
C1: will pay $5 for any one
C3: will pay $12 for all three
C2: will pay $9 for a breeding pair
(Flopsy and one of the others)
Jack
Combinatorial Auctions
•
•
•
•
•
© 2006 R. C. Holte, AICML
Single Unit C.A. – one copy of each item
Auction all items simultaneously
Bid specifies a price and a set of items
(“all or nothing”)
Exclusive-OR can be achieved by having a
“dummy item” representing the bidder
Multi-round or single-round
21
CMPUT 366, November, 2006
$12 for all three
$12
F
© 2006 R. C. Holte, AICML
M
22
J
CMPUT 366, November, 2006
$9 for a breeding pair
F
© 2006 R. C. Holte, AICML
$9
$9
M
J
23
CMPUT 366, November, 2006
$5 for any one
© 2006 R. C. Holte, AICML
$5
$5
$5
F
M
J
24
C1
CMPUT 366, November, 2006
Applications
•
•
•
•
© 2006 R. C. Holte, AICML
FCC spectrum auctions
Goods distribution routes
Airport gates, parcels of land
eBay
25
CMPUT 366, November, 2006
Example Application
SPOT: earth observation satellite
Requests are made for photographs.
Each photograph can be taken at several
times by several different
instruments/settings, but quality (profit)
may vary.
Using one instrument/setting at a given time
may prevent the use of another at the
same or adjacent times.
© 2006 R. C. Holte, AICML
26
CMPUT 366, November, 2006
Example
Photograph 1
•
•
•
$4 if taken on instrument A at time T1
$3 if taken in instrument A at time T2
$12 is taken on instrument B at time T2
Photograph 2
•
$7 if taken on instrument B at time T2
If B is being used, A cannot used at the
same time.
© 2006 R. C. Holte, AICML
27
CMPUT 366, November, 2006
Formulation as a C.A.
Each instrument/setting/time is a separate
item for auction.
Each photograph is a bidder, with XOR bids
for each of the different ways of achieving
the photograph.
If “items” A and B are mutually exclusive, any
bid for A also includes B.
© 2006 R. C. Holte, AICML
28
CMPUT 366, November, 2006
Example as a C.A.
$7
$12
B2
© 2006 R. C. Holte, AICML
$3
A2
29
$4
A1
P1
CMPUT 366, November, 2006
Winner Determination
Problem: how to determine who wins ?
Choose a set of bids that are feasible
(disjoint) and maximize the auctioneer’s
profit.
NP-complete (set packing problem)
© 2006 R. C. Holte, AICML
30
CMPUT 366, November, 2006
Multi-unit Combinatorial Auctions
•
There are bi copies of item i
•
A bid specifies a quantity for each item
(is a vector length m if there are m items)
•
© 2006 R. C. Holte, AICML
A set of bids is feasible if its total
demand for each item does not exceed
the number of available copies of the
item
31
CMPUT 366, November, 2006
Manufacturing Application 1
•
•
Inventory consists of m types of parts, with bi
instances of type i.
A customer order requests a certain quantity of
each part, and offers a price.
Determine which orders to fill to maximize profit.
© 2006 R. C. Holte, AICML
32
CMPUT 366, November, 2006
Auctions and Knapsacks
Winner determination in a multi-unit combinatorial
auction with a single item is the classic NPcomplete Knapsack problem.
With more than one item, it is the
multidimensional Knapsack problem (MDKP) –
much less studied.
© 2006 R. C. Holte, AICML
33
CMPUT 366, November, 2006
The Knapsack Problem
•
•
•
•
An “easy” NP-complete problem
Garey & Johnson: knapsack is considered
“solved” by many (by branch-and-bound)
n (#bids) can be reduced by decomposition and
preprocessing
Solvable in pseudo-polynomial time by dynamic
programming
© 2006 R. C. Holte, AICML
34
CMPUT 366, November, 2006
Knapsack Algorithms
•
Dave Pisinger’s PhD (1995) - publicly
available, very fast code (d.p.)
•
There is a very simple greedy algorithm
guaranteed to be ½ optimal or better
© 2006 R. C. Holte, AICML
35
CMPUT 366, November, 2006
The Multidimensional Knapsack
Problem (MDKP)
•
•
•
An hard NP-complete problem
Not solvable in pseudo-polynomial time.
No greedy (polynomial) algorithm can guarantee
an approximation that is better than OPT/k½ where
k is the sum of the bi.
© 2006 R. C. Holte, AICML
36
CMPUT 366, November, 2006
Why Try Hill-climbing ?
•
It is part of branch&bound search and some
heuristic search algorithms
•
Try simple, generic search algorithms before
complex ones and problem-specific variants
© 2006 R. C. Holte, AICML
37
CMPUT 366, November, 2006
Deterministic Hill-Climbing
Start with the empty bid-set
REPEAT:
•
Consider adding one bid to the current bid-set
•
Prune bid-sets that are infeasible
•
Add the bid with the highest “score”
UNTIL pruning eliminates all alternatives
Report the highest price seen during search, not the price of
the final local “score” maximum
© 2006 R. C. Holte, AICML
38
CMPUT 366, November, 2006
Deterministic Hill-Climbing
Start with the empty bid-set
REPEAT:
•
Consider adding one bid to the current bid-set
•
Prune bid-sets that are infeasible or cannot be
extended to improve the best price seen so far
•
Add the bid with the highest “score”
UNTIL pruning eliminates all alternatives
Report the highest price seen during search, not the price of
the final local “score” maximum
© 2006 R. C. Holte, AICML
39
CMPUT 366, November, 2006
Scoring functions
•
•
Price
N2norm = Price/size
•
•
Single-unit, size = (# items in the bid)½
Multi-unit, size = (∑f(i)2)½
f(i) is the fraction of the remaining quantity of item i
required by the bid
•
KO = Price/(price of contending bids KO’d by the bid)
© 2006 R. C. Holte, AICML
40
CMPUT 366, November, 2006
Randomized Hill-climbing
Instead of adding the bid with the best score, choose
among the alternative bids (after pruning) randomly,
with probability proportional to score.
Restart (with the empty bid-set) several times on a given
problem and report best price seen on any restart.
© 2006 R. C. Holte, AICML
41
CMPUT 366, November, 2006
Single-Unit Test Problems
CATS test suite from Stanford
• new
• Problem generators for 5 different scenarios
e.g. airport takeoff and landing time-slots
•
•
Realistic (e.g. airports are Chicago, LaGuardia, etc.)
Numerous Parameters – defaults used except
•
•
3 variations on “regions” (default + 2 others)
scaled down “paths” and “scheduling”
© 2006 R. C. Holte, AICML
42
CMPUT 366, November, 2006
Experimental Setup
•
For each problem type randomly generate
100 different problems
•
On each problem run
•
the 3 deterministic hill-climbers
“Best DHC” = best of these prices
•
the 3 randomized hill-climbers (20 restarts each)
“Best RHC” = best of these prices
© 2006 R. C. Holte, AICML
43
CMPUT 366, November, 2006
Average Solution Quality
(% of optimal)
problem type
© 2006 R. C. Holte, AICML
best DHC best RHC
path
98
98
match
99
99
sched
96
98
r75P
83
92
r90P
90
96
r90N
89
96
arb
87
95
44
CMPUT 366, November, 2006
Observations
•
•
Randomized (20 restarts) better than deterministic
Randomized finds very good solutions
(always > 80%, average > 92%)
•
Problem ratings:
easy: path, match, sched
harder: r90P, r90N, arb
hardest: r75P
•
On the easy problems, deterministic finds very
good solutions (almost as good as randomized)
© 2006 R. C. Holte, AICML
45
CMPUT 366, November, 2006
Which Scoring Function ?
N2norm and KO about the same, better than Price.
© 2006 R. C. Holte, AICML
46
CMPUT 366, November, 2006
Which Scoring Function ?
N2norm and KO about the same, better than Price.
But are they better than chance ?
© 2006 R. C. Holte, AICML
47
CMPUT 366, November, 2006
Blind Hill-climbing
Choose bid randomly with uniform probability
•
•
still prunes
still reports best price seen throughout search
Repeat 200 times on each problem to measure its
solution quality distribution
© 2006 R. C. Holte, AICML
48
CMPUT 366, November, 2006
Percentage of Blind HC solutions
worse than Deterministic solutions
problem type
© 2006 R. C. Holte, AICML
N2norm
KO
path
100
100
match
100
100
sched
99
99
r75P
76
63
r90P
16
7
r90N
23
6
arb
40
20
49
CMPUT 366, November, 2006
Single-unit CA – Conclusions
•
hard for Blind HC, easy for DHC and RHC
•
•
•
•
problems of this type are solved well by HC
success is due to the scoring functions
not good testbeds for comparative experiments allowing
suboptimal solutions
easy for Blind HC, hard for DHC
•
•
•
scoring functions alone no better than chance
numerous good solutions throughout the search space
good testbeds as long as the Blind HC baseline is taken into
account
© 2006 R. C. Holte, AICML
50
CMPUT 366, November, 2006
Multidimensional Knapsack
Test Problems
ORLIB test suite from J. Beasley
• mknap1, mknap2
•
•
•
•
real-world, optimal values known
widely used
now considered “too easy”
artificial, larger, harder problems
•
•
only the smallest is solvable by CPLEX
best known solutions very close to LP-optimal
© 2006 R. C. Holte, AICML
51
CMPUT 366, November, 2006
Average Solution Quality
(% of optimal or of best known)
test set
Price
N2norm
KO
Blind
mknap1
90
98.99
83
84
mknap2
94
99.00
79
58
mknapcb1
89
98.94
85
82
mknapcb2
89
99.03
85
83
mknapcb3
89
99.21
85
83
mknapcb7
93
98.35
85
81
© 2006 R. C. Holte, AICML
52
CMPUT 366, November, 2006
Conclusion – MDKP
•
deterministic hill-climbing with N2norm
competitive with all previous work, and much
better than previous greedy approaches
•
other scoring functions relatively poor
•
randomized hill-climbing only better on the
easy problems
© 2006 R. C. Holte, AICML
53
CMPUT 366, November, 2006