Optimal exit time from casino gambling: Why a lucky coin and a good memory matter∗
Xue Dong He†
Sang Hu‡
Jan Obłój§
Xun Yu Zhou¶
First version: May 24, 2013. This version: May 13, 2015
Abstract
We consider the dynamic casino gambling model initially proposed by Barberis
(2012) and study the optimal stopping strategy of a pre-committing gambler with
cumulative prospect theory (CPT) preferences. We develop a systematic and analytical approach to finding the gambler’s optimal strategy. We illustrate how the
strategies computed in Barberis (2012) can be strictly improved by reviewing the betting history or by tossing an independent coin, and we explain that the improvement
generated by using randomized strategies results from the lack of quasi-convexity of
CPT preferences. Finally, we show that any path-dependent strategy is equivalent to
a randomization of path-independent strategies.
Key words: casino gambling; cumulative prospect theory; path-dependence;
randomized strategies; quasi-convexity; optimal stopping; Skorokhod embedding
∗ The main results in this paper are contained in the Ph.D. thesis of Sang Hu, Optimal Exit Strategies of
Behavioral Gamblers (Hu, 2014), which was submitted to the Chinese University of Hong Kong (CUHK)
in September 2014. The authors thank Nick Barberis for his helpful comments on an earlier version of the
paper. The results were also announced and presented at the Trimester Seminar “Stochastic Dynamics in
Economics and Finance” held at the Hausdorff Research Institute for Mathematics in August 2013, as well
as at the “Second NUS Workshop on Risk & Regulation” held at the National University of Singapore in
January 2014. Comments from the participants of these events are gratefully acknowledged.
† Department of Industrial Engineering and Operations Research, Columbia University, S. W. Mudd
Building, 500 W. 120th Street, New York, NY 10027, USA. Email: [email protected]. He acknowledges support from a start-up fund at Columbia University.
‡ Risk Management Institute, National University of Singapore, 21 Heng Mui Keng Terrace, Singapore 119613. Email: [email protected].
§ Mathematical Institute, The Oxford-Man Institute of Quantitative Finance and St John’s College, University of Oxford, Oxford, UK. Email: [email protected]. Part of this research was completed while this author was visiting CUHK in March 2013, and he is grateful for the support of that
institution. He also gratefully acknowledges support from ERC Starting Grant RobustFinMath 335421.
¶ Mathematical Institute, The University of Oxford, Woodstock Road, OX2 6GG Oxford, UK, and
Oxford–Man Institute of Quantitative Finance, University of Oxford. Email: [email protected].
This author acknowledges financial support from research funds at the University of Oxford and the Oxford–
Man Institute of Quantitative Finance.
1 Introduction
Gambling is the wagering of money on an event of which the outcome is uncertain. Casino
gambling is a type of gambling that enjoys huge popularity. By its very nature, casino
gambling is an act of risk seeking because a typical casino bet has at most zero expected
value. Such risk-seeking behavior cannot be explained by models in the expected utility
framework with concave utility functions. By contrast, models in behavioral economics
have the potential to shed meaningful light on this behavior.1
Barberis (2012) was the first to employ the cumulative prospect theory (CPT) of Tversky and Kahneman (1992) to model and study casino gambling. In his model, a gambler
comes to a casino at time 0 and is offered a bet with an equal chance to win or lose $1.
If the gambler accepts, the bet is then played out and he either gains $1 or loses $1 at
time 1. Then, the gambler is offered the same bet, and he can choose to leave the casino
or to continue gambling. If he continues, the second bet is played out and the gambler is
offered another bet at time 2, and so on. The problem is to determine whether the gambler
will enter the casino to play the game at all and, if yes, what is the best time for him
to stop playing. Clearly, the answer depends on the gambler’s preferences. In Barberis
(2012), the gambler’s risk preferences are represented by CPT. In this theory, individuals’
preferences are determined by an S-shaped utility function and two inverse-S-shaped probability weighting functions. The latter effectively overweight the tails of a distribution, so
a gambler with CPT preferences overweights very large gains of small probability and thus
may decide to play in the casino.
Barberis (2012) compares the strategies of three types of gamblers: naive gamblers,
sophisticated gamblers with pre-commitment, and sophisticated gamblers without pre-commitment. Because of probability weighting, the casino gambling problem is time-inconsistent in the sense that the optimal strategy designed by the gambler at time 0
is no longer optimal at a future time if and when the gambler reconsiders the problem.
A naive gambler does not realize this inconsistency and thus keeps changing his strategy
over time. A sophisticated gambler with pre-commitment is able to commit himself in
the future to the strategy that is set up at time 0 through some commitment device. A
sophisticated gambler without pre-commitment realizes the inconsistency but is unable to
commit himself to the optimal strategy at time 0, and thus takes this inconsistency into
1 Another type of model, similar to the one in Conlisk (1993), tries to explain gambling by introducing an additional utility of gambling and appending it to the expected utility model.
2
account when making decisions today. With reasonable parameter values, Barberis (2012)
finds that sophisticated gamblers with pre-commitment tend to take loss-exit strategies, i.e., to stop playing at a certain loss level (e.g., a $100 loss) and to continue when in a gain position. Naive gamblers, by contrast, end up with gain-exit strategies. Finally,
sophisticated agents without pre-commitment choose not to play at all.
CPT is a descriptive model for individuals’ preferences. A crucial contribution in Barberis (2012) lies in showing that the optimal strategy of a gambler with CPT preferences
is consistent with several commonly observed gambling behaviors such as the popularity of
casino gambling and the implementation of gain-exit and loss-exit strategies. However, the
setting of Barberis (2012) was restrictive as it assumed that the gambler can only choose
among simple (path-independent) strategies. Moreover, the author proceeded by means of
an exhaustive search and only found the optimal strategy for an example with five periods.
Barberis (2012) leaves a number of natural and important open questions. First, if the set
of strategies is not restricted, could CPT also explain other commonly observed gambling
patterns? For instance, Thaler and Johnson (1990) find the house money effect, i.e., that
gamblers become more risk seeking in the presence of a prior gain. This is an example of
a strategy which is not path-independent but takes into account the history of winnings.
Similarly, it has been observed that individuals may use a random device, such as a coin
flip, to aid their choices in various contexts. More examples and discussions of randomization can be found in, e.g., Agranov and Ortoleva (2013), Dwenger et al. (2013), and the references therein.2 Thus, it would be interesting to see whether CPT gamblers sometimes
prefer to use path-dependent and randomized strategies and to understand why this is the
case. Second, instead of relying on enumeration, it seems desirable to develop a general
methodology for solving the casino gambling problem with CPT preferences.
Here we answer both of the above questions. We consider the casino gambling problem without additional restrictions on the time horizon or the set of available strategies. Our
first main contribution is to study different classes of strategies and explain which features of
the CPT preferences lead to behaviors consistent with empirical findings. This contribution
can be summarized in three parts.
2 In Dwenger et al. (2013), several real-life examples exploiting randomization are given: surprise menus at restaurants, last-minute holiday booking desks at airports, movie sneak previews that do not advertise the movie titles, surprise-me features of internet services, and home delivery of produce bins with variable content. Another example is the practice of “drawing divination sticks”, popular in Chinese culture even today, in which people who are reluctant or unable to make their own important decisions go to a temple, pray and draw divination sticks, and follow whatever the words on the sticks tell them to do.
3
First, through a numerical example and assuming reasonable parameter values for the
gambler’s CPT preferences, we find that the gambler may strictly prefer path-dependent
strategies over path-independent strategies. Likewise, by tossing an independent (possibly
biased) coin at some point, the gambler may further strictly improve his preference value.
Secondly, we study, at a theoretical level, why the gambler prefers randomized strategies.3 Consider an agent who wants to optimally stop a Markov process. We find
that the distribution of the process at a randomized stopping time is a convex combination
of the distributions of the process at some non-randomized stopping times. Therefore, if
the agent’s objective is to maximize his preference value of the distribution of the process
at the stopping time and the preference is quasi-convex, he dislikes randomization. In the
casino gambling problem, the gambler has CPT preferences, which are not quasi-convex; so
it becomes possible for the gambler to strictly improve performance by using randomized
strategies.
Thirdly, we show that any path-dependent strategy is equivalent to a randomization of
path-independent strategies. This result implies that agents with quasi-convex preferences
do not need to choose path-dependent strategies. In particular, the gambler in the casino
problem strictly prefers path-dependent strategies only because CPT is not quasi-convex.
This is no longer true if randomization is allowed: randomized path-independent strategies
always perform better than path-dependent but non-randomized strategies. Sometimes, as
shown by examples, they perform strictly better. It also follows that it is always enough
to consider randomized, path-independent strategies since the more complex randomized
and path-dependent strategies cannot further improve the gambler’s preference value.
Our second main contribution is to develop a systematic approach to solving the casino
gambling problem. Because of probability weighting, classical approaches to optimal stopping problems such as the martingale method and the dynamic programming principle do
not apply here. By proving a suitable Skorokhod embedding theorem, we show that the
casino gambling problem is equivalent to an infinite-dimensional optimization problem in
which the distribution of the cumulative gain and loss at the stopping time becomes the decision variable. Therefore, to find the optimal strategy of the gambler, we only need to solve
the infinite-dimensional optimization problem for the optimal distribution and then use the
Skorokhod embedding theorem to find the stopping time leading to this distribution.
In the present paper, we focus on the optimal strategy for sophisticated agents with
3 In game-theoretic language, the non-randomized and randomized strategies correspond respectively to pure and mixed strategies; see Section 4.1 below for further discussion.
4
pre-commitment. We do so for a number of reasons. First, this allows us to concentrate on
studying the optimal behavior resulting from CPT preferences, describe features matching
empirically observed patterns and investigate theoretical properties underpinning them.
Second, we believe that such sophisticated agents with pre-commitment are themselves an
important type of agents, and many gamblers belong to this type with the aid of some
commitment devices. Finally, these agents are key to studying any other types of agents.
In particular, once we have found the optimal strategy for them, the realized strategy for
naive agents can also be computed. We propose to study it in detail in a subsequent work.
The remainder of this paper is organized as follows: In Section 2, we briefly review CPT
and then formulate the casino gambling model. In Section 3, we offer numerical examples
to show that path-dependent and randomized strategies strictly outperform, respectively,
path-independent and non-randomized ones. We then explain, in Section 4, the reasons
underlying this out-performance. In Section 5, we propose a systematic approach to the
casino gambling problem and carry out a numerical study. Finally, Section 6 concludes the
paper. All proofs are presented in Appendix A.
2 The Model
2.1 Cumulative Prospect Theory
In the classical expected utility theory (EUT), individuals’ preferences are represented
by the expected utility of random payoffs. However, this theory cannot explain many
commonly observed behaviors when individuals make decisions under risk.4 One of the
most notable alternatives to EUT is the cumulative prospect theory (CPT) proposed by
Kahneman and Tversky (1979) and Tversky and Kahneman (1992). In this theory, instead
of evaluating random payoffs directly, individuals evaluate random gains and losses relative
to a benchmark which is termed the reference point. More precisely, after a reference point
B is specified, individuals code each random payoff Y into the corresponding gain or loss
X = Y − B. Then, the preference for X is represented by the functional
V(X) := ∫_0^∞ u(x) d[−w+(1 − FX(x))] + ∫_{−∞}^0 u(x) d[w−(FX(x))],   (1)
4 See a detailed discussion in Starmer (2000).
where FX (·) is the cumulative distribution function (CDF) of X. The function u(·), which is
strictly increasing, is called the utility function and w± (·), two strictly increasing mappings
from [0, 1] onto [0, 1], are probability weighting (or distortion) functions on gains and losses,
respectively.5
Empirical studies reveal that u(·) is typically S-shaped. In other words, both u+ (x) :=
u(x), x ≥ 0, the utility of the gain, and u− (x) := −u(−x), x ≥ 0, the disutility of the
loss, are concave functions. On the other hand, w± (·) are inverse-S-shaped, representing
individuals’ tendency to overweight (relative to EUT) the tails of payoff distributions. The
S-shaped utility function and inverse-S-shaped probability weighting functions together
result in a fourfold pattern of risk attitudes that is consistent with empirical findings:
that individuals tend to be risk averse and risk seeking with respect to gains and losses
of moderate or high probability, respectively; individuals are risk seeking and risk averse
regarding gains and losses of small probability, respectively (Tversky and Kahneman, 1992).
Finally, loss aversion, the tendency to be more sensitive to losses than to gains, can also
be modeled in CPT by choosing u− (·) to be steeper than u+ (·). See Figures 1 and 2 for an
illustration of the utility and probability weighting functions, where the functions take the
following parametric forms proposed by Tversky and Kahneman (1992):
u(x) = x^{α+} for x ≥ 0, u(x) = −λ(−x)^{α−} for x < 0, and w±(p) = p^{δ±}/(p^{δ±} + (1 − p)^{δ±})^{1/δ±}.   (2)
2.2 The Casino Gambling Problem
We consider the casino gambling problem model proposed by Barberis (2012). At time 0, a
gambler is offered a fair bet, e.g., an idealized black or red bet on a roulette wheel: win or
lose one dollar with equal probability.6 If he declines this bet, he does not enter the casino
to gamble. Otherwise, he starts the game and the outcome of the bet is played out at time
1, at which time he either wins or loses one dollar. The gambler is then offered the same
bet and he can again choose to play or not, and so forth.
5 The utility function is called the value function in Kahneman and Tversky’s terminology. In the present paper, we use the term utility function to distinguish it from value functions of optimization problems.
6 In this paper we consider the case of a fair game because, while it is an idealization of reality, it is already rich enough to convey the main economic insights. The case of an unfair game is technically more involved without significantly new conceptual insights, and we hope to study it in a future paper.
Figure 1: S-shaped utility function. The function takes the form in (2) with α+ = α− = 0.5 and λ = 2.5.
Figure 2: Inverse-S-shaped probability weighting function. The function takes the form in (2), with the dotted line corresponding to δ = 0.65, the dashed line to δ = 0.4, and the solid line to δ = 1.
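The parametric forms in (2) are simple to evaluate numerically. The sketch below is our own illustration (function names are ours, not from the paper); it implements the Tversky–Kahneman utility and weighting functions and can be used to check the inverse-S property: small probabilities are overweighted while moderate ones are underweighted.

```python
def tk_weight(p: float, delta: float) -> float:
    """Tversky-Kahneman weighting w(p) = p^d / (p^d + (1-p)^d)^(1/d)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p ** delta / (p ** delta + (1.0 - p) ** delta) ** (1.0 / delta)

def tk_utility(x: float, alpha_plus: float = 0.5, alpha_minus: float = 0.5,
               lam: float = 2.5) -> float:
    """S-shaped utility of a gain/loss x, as in (2); lam > 1 captures loss aversion."""
    return x ** alpha_plus if x >= 0 else -lam * (-x) ** alpha_minus
```

For instance, with δ = 0.65 one finds tk_weight(0.01, 0.65) > 0.01 but tk_weight(0.5, 0.65) < 0.5, matching the inverse-S shape plotted in Figure 2; with δ = 1 the weighting is the identity.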
At time 0, the gambler decides whether to enter the casino and, if yes, the optimal time
to leave the casino. We call such a decision a strategy. As in Barberis (2012), we assume
CPT preferences for the gambler; so the decision criterion is to maximize the CPT value
of his wealth at the time when he leaves the casino. Because the bet is fair, the cumulative
gain or loss of the agent, while he continues to play, is a standard symmetric random walk
Sn , n ≥ 0, on Z, the set of integers; see the representation in Figure 3. We further assume
that the gambler uses his initial wealth as the reference point, so he perceives Sn as his
cumulative gain or loss after playing the bet n times.
For any stopping time τ , the CPT value of Sτ is
V(Sτ) = Σ_{n=1}^∞ u+(n)[w+(P(Sτ ≥ n)) − w+(P(Sτ > n))] − Σ_{n=1}^∞ u−(n)[w−(P(Sτ ≤ −n)) − w−(P(Sτ < −n))],   (3)
with the convention that +∞ − ∞ = −∞, so that V (Sτ ) is always well-defined. The
gambler’s problem faced at time 0 is then to find the stopping time τ in a set of admissible
strategies to maximize V (Sτ ). The choice of a specific admissible set is critical to forming
the optimal strategy, and we will specify a few important types below.
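When Sτ has finite support, the sum in (3) can be evaluated directly. The following is a minimal sketch of our own (helper names are ours; the default parameters α = 0.9, λ = 2.25, δ = 0.4 are those used in Section 3) that computes V(Sτ) from the distribution of Sτ:

```python
def u_plus(n, alpha=0.9):
    # Utility of a gain of n: u_+(n) = n^alpha.
    return n ** alpha

def u_minus(n, alpha=0.9, lam=2.25):
    # Disutility of a loss of n: u_-(n) = lam * n^alpha.
    return lam * n ** alpha

def w(p, delta=0.4):
    # Tversky-Kahneman probability weighting, with the endpoints clamped.
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p ** delta / (p ** delta + (1.0 - p) ** delta) ** (1.0 / delta)

def cpt_value(dist, alpha=0.9, lam=2.25, d_plus=0.4, d_minus=0.4):
    """CPT value (3) of a finitely supported integer gain/loss distribution.

    dist maps an outcome n (a possibly negative integer) to its probability.
    Terms with n outside the support vanish, so summing over the support suffices."""
    v = 0.0
    for n in sorted(m for m in dist if m > 0):
        p_ge = sum(q for m, q in dist.items() if m >= n)
        p_gt = sum(q for m, q in dist.items() if m > n)
        v += u_plus(n, alpha) * (w(p_ge, d_plus) - w(p_gt, d_plus))
    for n in sorted(-m for m in dist if m < 0):
        p_le = sum(q for m, q in dist.items() if m <= -n)
        p_lt = sum(q for m, q in dist.items() if m < -n)
        v -= u_minus(n, alpha, lam) * (w(p_le, d_minus) - w(p_lt, d_minus))
    return v
```

For example, cpt_value({1: 0.5, -1: 0.5}) evaluates the strategy of leaving after a single round; it is negative, since loss aversion makes the isolated fair bet unattractive.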
2.3 Types of Stopping Strategies
A simple class of strategies, considered in Barberis (2012), is given by rules τ , which assign
a binary decision, to stop or not to stop, to each node in the gain/loss process. In other
words, for any t ≥ 0, {τ = t} (conditional on {τ ≥ t}) is determined by (t, St ). We call
such strategies path-independent or Markovian, and we denote the set of these strategies
as AM .
Extending the above, we can consider path-dependent strategies which base the decision
not only on the current state but also on the entire betting history. In other words, we
consider strategies τ such that for any t ≥ 0, {τ = t} is determined by the information set
Ft = σ(Su : u ≤ t). The binary decision—to stop or not to stop—depends not only on the
current node of the gain/loss process but also on how the gambler arrived at the current
node. Mathematically speaking, τ is a stopping time relative to {Ft }t≥0 . We denote the
set of path-dependent strategies as AD .
Next, we consider strategies that can make use of independent coin tosses. Imagine that
Figure 3: Gain/loss process represented as a binomial tree. Each node is marked by a pair (n, x), where n stands for the time and x the amount of (cumulative) gain/loss. For example, the node (2, −2) signifies a total loss, relative to the initial wealth, of $2 at time 2.
sometimes the gambler may have difficulty deciding whether or not to leave the casino, so
he tosses a coin to help with his decision making. More precisely, we consider the following
simple strategies: at each step the gambler tosses a coin and decides to leave the casino
if the outcome is tails and to continue otherwise. To describe this more formally, consider a family {ξt,x }t≥0,x∈Z of coin tosses that are mutually independent and independent of S: ξt,x = 0 stands for tails and ξt,x = 1 stands for heads. The gambler’s behaviour
is represented by the following strategy:
τ := inf{t ≥ 0 | ξt,St = 0}.   (4)
Note that the information available to the gambler includes not only his prior gains
and losses but also the coin toss outcomes in the history; i.e., the information at t becomes
Gt := σ(Su , ξu,Su , u ≤ t). Clearly, τ in (4) is a stopping time with respect to this enlarged
information flow. However, τ defined by (4) is path-independent in that {τ = t} depends
only on St and ξt,St (conditional on {τ ≥ t}). Let us stress that we do not specify the
distribution of these random coins, i.e., the value of rt,x := P(ξt,x = 0) = 1 − P(ξt,x = 1);
these numbers are determined by the gambler as part of his gambling strategy. We denote
by AC the set of such randomized, path-independent strategies generated by tossing coins,
i.e.,7
AC := {τ | τ is defined as in (4) for some {ξt,x }t≥0,x∈Z that are independent 0-1 random variables and are independent of {St }t≥0 }.
Note that AM ⊂ AC since non-randomized Markovian strategies correspond simply to the
case when rt,x ∈ {0, 1}.
Finally, we could also consider randomized path-dependent strategies; i.e., at each time
t and for each realization {xt } of {St }, an independent random coin ξt,xt∧· is tossed and the
agent stops at the first time ξt,St∧· = 0.8 Interestingly, as will be seen in Proposition 1 below,
using randomized path-dependent strategies does not improve the gambler’s preference
value compared to using randomized but path-independent strategies. Put differently, the
added complexity of randomized path-dependent strategies, when compared to randomized
7 One can also introduce randomization through mutually independent, uniformly distributed random variables.
8 These may also be represented as a random (independent of S) choice of strategy in AD. In game-theoretic language, these are mixed strategies with the set of pure strategies given by AD; see the discussion in Section 4.1.
Figure 4: Optimal path-independent strategy. The CPT value is V = 0.250440. Black nodes stand for “stop” and white nodes stand for “continue.”
Markovian strategies in AC , has no benefit for the gambler. In consequence, the randomized
path-dependent strategies will not be considered in the following analysis.
3 Comparison of Strategies
In this section we compare the three types of strategy defined in Section 2.3. We demonstrate, through the following example, that using path-dependent strategies can strictly
increase the optimal value achieved by using the Markovian ones and that using randomized strategies can further strictly increase the optimal value over path-dependent ones.
Here we consider Tversky and Kahneman (1992) utility and probability weighting functions in (2) with α+ = α− = α, δ+ = δ− = δ. Consider a 6-period horizon, α = 0.9,
δ = 0.4, and λ = 2.25, which are reasonable parameter values; see for instance Tversky and
Kahneman (1992).9
3.1 Optimal Path-Independent Strategy
First, consider the case in which only path-independent strategies are allowed. The optimal
strategy can be found through exhaustive search. The optimal CPT value is V = 0.250440.
9 Our results are not specific to the choice of probability weighting functions. For instance, we obtain the same results by using the probability weighting function proposed by Prelec (1998), i.e., w±(p) = e^{−γ(−ln p)^δ}, with γ = 0.5 and δ = 0.4.
Figure 5: Optimal path-dependent strategy. The CPT value is V = 0.250693. Black nodes stand for “stop” and white nodes stand for “continue.” The half-black-half-white node (5, 1) means that if the previous two nodes are (3, 3) and (4, 2), then the gambler stops; if the previous two nodes are (3, 1) and (4, 2), then he continues.
Figure 6: Randomized, path-independent strategy. The CPT value is V = 0.250702. Black nodes stand for “stop” and white nodes stand for “continue.” The grey node (5, 3) means that if node (5, 3) is reached, then the gambler tosses a biased coin with (heads : 31/32, tails : 1/32). If the result is heads, then the gambler continues; otherwise he stops. The grey node (5, 1) means that if node (5, 1) is reached, then the gambler tosses a coin with (heads : 1/2, tails : 1/2) and continues only if the coin turns up heads.
Figure 4 shows the optimal path-independent strategy, where black nodes stand for “stop”
and white nodes stand for “continue.” Observe that this strategy is to stop at any loss state and to continue at any gain state until the end of the 6th period, except at node (5, 1).
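The exhaustive search over path-independent strategies can be organized by forward induction on the binomial tree: given a stop/continue decision at each node, propagate the probability mass forward and collect the mass that stops. A sketch of the distribution computation (our own code; the rule encoding is ours, not the paper's):

```python
def stopped_distribution(stop, T):
    """Distribution of S_tau under a path-independent rule.

    stop(t, x) -> True to stop at node (t, x); stopping is forced at t = T."""
    dist = {}          # outcome -> probability that the gambler stops there
    alive = {0: 1.0}   # probability mass still playing, keyed by current gain/loss
    for t in range(T + 1):
        nxt = {}
        for x, q in alive.items():
            if t == T or stop(t, x):
                dist[x] = dist.get(x, 0.0) + q
            else:
                # Fair bet: half the mass moves up, half moves down.
                nxt[x + 1] = nxt.get(x + 1, 0.0) + q / 2.0
                nxt[x - 1] = nxt.get(x - 1, 0.0) + q / 2.0
        alive = nxt
    return dist
```

Evaluating (3) on the returned distribution gives the rule's CPT value, and enumerating all stop/continue assignments over the finite tree amounts to the exhaustive search described above.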
3.2 Optimal Path-Dependent Strategy
Next, we consider path-dependent strategies. Again, the optimal strategy can be found
through exhaustive search. The optimal CPT value is strictly improved: V = 0.250693.
Figure 5 shows this optimal path-dependent strategy. Node (5, 1) is half-black-half-white,
meaning that if the previous two nodes the path has gone through are (3, 3) and (4, 2),
then the gambler stops; if the previous two nodes are (3, 1) and (4, 2), then he continues.10
Notice that even though (5, 1) is the only node that differs from the optimal path-independent strategy, the optimal CPT value is strictly increased; i.e., the optimal CPT value over path-independent strategies is strictly improved by using path-dependent strategies.
3.3 Randomized, Path-Independent Strategy
Now, we introduce randomization into nodes (5, 3) and (5, 1), which is shown in grey in
Figure 6. Once node (5, 3) is reached, the gambler tosses a biased coin with (heads :
31/32, tails : 1/32), i.e., the probability of the coin turning up heads is 31/32, while the
probability of tails is 1/32. If the result is heads, then the gambler continues; otherwise
he stops. Similarly, once node (5, 1) is reached, the gambler tosses a coin with (heads :
1/2, tails : 1/2) and continues only if the coin turns up heads. The CPT value of such
a strategy is increased to V = 0.250702. In particular, the optimal CPT value achieved
using path-dependent strategies is strictly improved by using randomized but path-independent strategies.
The previous examples show that, starting with Markovian strategies, it is possible to
strictly increase the gambler’s CPT value by allowing for path-dependent strategies and
then to further increase the value by switching to randomized strategies in AC . This is a
10 The optimal path-dependent strategy is not unique: the gambler can also choose to continue if the path leading to node (5, 1) goes through nodes (3, 3) and (4, 2) and stop if it goes through nodes (3, 1) and (4, 2).
general fact and the following chain of inequalities holds:
sup_{τ∈AM} V(Sτ) ≤ sup_{τ∈AD} V(Sτ) ≤ sup_{τ∈AC} V(Sτ).   (5)
The inequality sup_{τ∈AM} V(Sτ) ≤ sup_{τ∈AD} V(Sτ) is obvious because the set of path-dependent strategies contains that of Markovian strategies. Similarly, sup_{τ∈AM} V(Sτ) ≤ sup_{τ∈AC} V(Sτ) is also trivial because Markovian strategies are special (deterministic) examples of strategies in AC, which in general rely on more information (i.e., coin tosses). The inequality sup_{τ∈AD} V(Sτ) ≤ sup_{τ∈AC} V(Sτ) (with strict inequality possible) is not trivial, and we will show it in Section 4.2.
4 Why Lucky Coins and Good Memory Matter
Preference for randomization has indeed been observed in the literature in a variety of contexts; see the recent empirical evidence provided in Agranov and Ortoleva (2013), Dwenger
et al. (2013) and the references therein. The preference for path dependence, however, has
not been studied extensively in the literature. In what follows, we explain theoretically, via
a general optimal stopping problem, the reasons underlying the observed strict preferences
for randomization and for path-dependence in the casino gambling model.
We consider a general discrete-time Markov chain {Xt }t≥0 .11 Without loss of generality,
we assume that Xt takes values in Z and that X0 = 0. As in the casino gambling problem,
we denote by AM, AD, and AC the sets of Markovian stopping times, path-dependent stopping times, and randomized (through coin tossing) yet Markovian stopping times, respectively.
Suppose an agent wants to choose a stopping time τ to maximize his preference value of Xτ. In the following, we study when the agent likes or dislikes randomization or path-dependent strategies.
4.1 Preference for Randomization
Any randomized stopping time in AC can be viewed as a special case of full randomization.
A fully randomized strategy is implemented by randomly choosing from a given set of
non-randomized stopping times, e.g., from AM , using a random device. Such a device is
11 The gain/loss process in the casino gambling problem is a symmetric random walk and thus is a Markov chain.
represented by a random mapping σ from Ω to AM that is independent of the Markov process {Xt }t≥0 . Suppose the outcome of σ is τ, a stopping time in AM; the agent then chooses τ as his strategy. To visualize this, imagine an uneven die with an infinite but
then chooses τ as his strategy. To visualize this, imagine an uneven die with an infinite but
countable number of faces, each corresponding to a strategy in AM . Using game-theoretical
language, fully randomized strategies correspond to mixed strategies while elements in AM
represent pure strategies.
Any τ̃ ∈ AC, i.e., τ̃ = inf{t ≥ 0 | ξt,Xt = 0} for a given set of mutually independent 0-1 random variables {ξt,x }t≥0,x∈Z that are independent of {Xt }t≥0 , can be viewed as a special case of full randomization. Indeed, for any A := {at,x }t≥0,x∈Z such that at,x ∈ {0, 1}, define τA := inf{t ≥ 0 | at,Xt = 0}. Then, τA ∈ AM. Define σ by σ(ω) := τA with A = {ξt,x (ω)}t≥0,x∈Z . Then, σ is a random mapping taking values in AM. Moreover, because {ξt,x }t≥0,x∈Z is independent of {Xt }t≥0 , so is σ. Thus, σ is a fully randomized strategy and, clearly, σ = τ̃.12
We now study the problem as to when the agent dislikes full randomization, and hence,
in particular, will not use randomized strategies in AC . We first show that the stopped
distribution of {Xt }t≥0 at a randomized stopping time σ is a convex combination of the
stopped distributions at a set of nonrandomized stopping times. To see this, suppose the
possible values that σ can take are τi ’s with probabilities pi ’s, respectively. Then,
FXσ(x) = P(Xσ ≤ x) = E[P(Xσ ≤ x | σ)] = Σ_i p_i P(Xτi ≤ x | σ = τi) = Σ_i p_i P(Xτi ≤ x) = Σ_i p_i FXτi(x),
where the third equality holds because of the independence of σ and {Xt }t≥0 . Therefore,
full randomization effectively convexifies the set of state distributions at non-randomized
stopping times.
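This convexification can be checked by direct computation on the gain/loss tree. The sketch below is our own code (not from the paper): it computes the stopped distribution of a coin-tossing strategy in AC by forward induction and lets one verify that randomizing at a node yields exactly the convex combination of the two pure rules obtained by fixing the coin's outcome.

```python
def randomized_distribution(r_stop, T):
    """Distribution of S_tau under a randomized Markovian rule in A_C.

    r_stop(t, x) is the probability of stopping at node (t, x), i.e., the
    bias of the coin tossed there; stopping is forced at t = T."""
    dist, alive = {}, {0: 1.0}
    for t in range(T + 1):
        nxt = {}
        for x, q in alive.items():
            r = 1.0 if t == T else r_stop(t, x)
            if r > 0.0:
                dist[x] = dist.get(x, 0.0) + q * r       # mass that stops here
            if r < 1.0 and t < T:
                # Surviving mass splits evenly over the fair bet's outcomes.
                nxt[x + 1] = nxt.get(x + 1, 0.0) + q * (1.0 - r) / 2.0
                nxt[x - 1] = nxt.get(x - 1, 0.0) + q * (1.0 - r) / 2.0
        alive = nxt
    return dist
```

Because the coins are independent of the walk, tossing a coin with bias r at a single node gives the same stopped distribution as choosing, at time 0 with probability r, the pure rule that stops there; the design choice of propagating fractional mass simply encodes that mixture.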
Now, suppose the agent’s preference for Xτ is law-invariant, i.e., the preference value
of Xτ is V(FXτ ) for some functional V on distributions. We say V is quasi-convex if
V(pF1 + (1 − p)F2) ≤ max{V(F1), V(F2)},   ∀F1, F2, ∀p ∈ [0, 1].   (6)
12 The converse is also true: for any fully randomized strategy σ, we can find a strategy τ in AC such that the distributions of Sτ and Sσ are the same; see the discussion after Theorem 2.
If, further, V is continuous, then we can conclude that13
V(Σ_{i=1}^∞ p_i F_i) ≤ max_{i≥1} V(F_i),   ∀F_i, ∀p_i ≥ 0, i ≥ 1, with Σ_{i=1}^∞ p_i = 1.
Because the state distribution at a randomized stopping time is a convex combination of
the state distributions at nonrandomized stopping times, we can conclude that the agent
dislikes any type of randomization if his preference representation V is quasi-convex.
It has been noted in the literature that individuals with quasi-convex preferences dislike
randomization; see for instance Machina (1985), Camerer and Ho (1994) and Blavatskyy
(2006). It is easy to see that a sufficient condition for quasi-convexity is betweenness: for any
two distributions F1 and F2 such that V(F1) ≥ V(F2) and any p ∈ (0, 1), V(F1) ≥ V(pF1 +
(1 − p)F2) ≥ V(F2). Betweenness is further implied by independence: for any distributions
F1, F2, G and any p ∈ (0, 1), if V(F1) ≥ V(F2), then V(pF1 + (1 − p)G) ≥ V(pF2 + (1 − p)G).¹⁴
Because expected utility theory is underpinned by the independence axiom, agents with EU
preferences dislike randomization. Similarly, agents with preferences satisfying betweenness
or quasi-convexity also dislike randomization.¹⁵
On the other hand, preferences involving probability weighting, such as rank-dependent
utility (Quiggin, 1982), prospect theory (Kahneman and Tversky, 1979, Tversky and Kahneman, 1992), and dual theory of choice (Yaari, 1987), are in general not quasi-convex; see
Camerer and Ho (1994). With these preferences, randomization can be strictly preferred,
as shown in our casino gambling model.
Empirically, individuals have been observed to violate quasi-convexity and, as a result,
to prefer randomization; see for instance Camerer and Ho (1994), Blavatskyy (2006), and
the references therein.
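This failure of quasi-convexity is easy to exhibit numerically. The sketch below is our own toy example, not from the paper: it uses a bare-bones rank-dependent functional V(F) = ∑_n (u(n) − u(n − 1)) w(P(X ≥ n)) with u(n) = n and the concave weighting w(p) = √p, and finds a 50/50 mixture whose value strictly exceeds the value of both components.

```python
import math

def rdu(decumulative, w=math.sqrt):
    """Rank-dependent value of a lottery on {0, 1, 2, ...} with u(n) = n,
    given its decumulative probabilities x_n = P(X >= n), n = 1, 2, ..."""
    return sum(w(x) for x in decumulative)

# Lottery F1: X = 2 w.p. 0.25, X = 0 w.p. 0.75  ->  (P(X>=1), P(X>=2)) = (0.25, 0.25)
# Lottery F2: X = 1 w.p. 1                      ->  (1.0, 0.0)
F1 = (0.25, 0.25)
F2 = (1.0, 0.0)

# Mixing the two distributions mixes the decumulatives coordinatewise.
mix = tuple(0.5 * a + 0.5 * b for a, b in zip(F1, F2))  # (0.625, 0.125)

v1, v2, v_mix = rdu(F1), rdu(F2), rdu(mix)
# Quasi-convexity would force v_mix <= max(v1, v2); with w(p) = sqrt(p) it fails.
assert v1 == 1.0 and v2 == 1.0
assert v_mix > max(v1, v2)
```

The strict improvement of the mixture over both components is exactly the mechanism by which randomized strategies can help a CPT gambler.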
13 Quasi-convexity implies V(∑_{i=1}^{n} pi Fi) ≤ max_{1≤i≤n} V(Fi), and the continuity is needed to pass with n to infinity.
14 Indeed, by setting G in the definition of the independence axiom to be F1 and F2, respectively, we immediately conclude that independence leads to betweenness.
15 Examples of preferences satisfying betweenness include disappointment theory (Gul, 1991), the weighted utility model (Chew, 1983, Chew and MacCrimmon, 1979, Fishburn, 1983), the skew-symmetric bilinear model (Fishburn, 1981), expected utility with suspicion (Bordley and Hazen, 1991), and utility models with betweenness only (Chew, 1989, Dekel, 1986). Further, some models in the quadratic class of utilities proposed by Machina (1982) and Chew et al. (1991) satisfy quasi-convexity.
4.2 Preference for Path-Dependent Strategies
It is well known that if the agent's preferences are represented by expected utility, i.e.,
V(F) = ∫ u(x) dF(x) for some utility function u, then the agent's optimal stopping time
must be Markovian. This result is a consequence of dynamic programming. For dynamic
decision problems with non-EUT preferences, however, dynamic programming may not
hold if the problems are time inconsistent, so it is unclear whether the optimal stopping
time is still Markovian in this case. In the following, we show that any path-dependent
strategy can be viewed as a randomization of Markovian strategies, so the agent does not
strictly prefer path-dependent strategies if his preference for Xτ is quasi-convex.
Proposition 1 For any τ ∈ AD, there exists τ̃ ∈ AC such that (Xτ, τ) has the same
distribution as (Xτ̃, τ̃). More generally, for any randomized path-dependent strategy τ′,
there exists τ̃ ∈ AC such that (Xτ′, τ′) has the same distribution as (Xτ̃, τ̃).
Proposition 1 shows that for any path-dependent stopping time τ , Xτ is identically
distributed as Xτ˜ for some τ˜ ∈ AC , which is a randomization of Markovian stopping times.
This result has three implications for an agent whose preference for Xτ is represented by
V(FXτ ). First, the agent is indifferent between τ and τ˜. Consequently, we must have
sup_{τ∈AC} V(FXτ) ≥ sup_{τ∈AD} V(FXτ),
which explains the second inequality in (5). In other words, randomized path-independent
strategies always perform no worse than non-randomized path-dependent strategies. The
inequality may be strict as seen in our examples in Section 3 above.
Second, for any randomized path-dependent strategy τ′, there also exists τ̃ ∈ AC such
that Xτ′ has the same distribution as Xτ̃. Therefore, using randomized path-dependent
strategies cannot improve the gambler's preference value compared to simply using randomized but path-independent strategies. This explains why we consider only the latter in the
casino gambling problem.
Third, if the agent's preference representation V is quasi-convex, then it is optimal for
him to use non-randomized and path-independent strategies only. Indeed, we have already
concluded that he dislikes any type of randomization. Further, by Proposition 1, any path-dependent strategy is equivalent to a randomization of path-independent strategies and is
thus less preferred than some path-independent strategy. In the casino gambling problem,
the gambler can improve his CPT value by considering path-dependent strategies only
because CPT is not quasi-convex.
4.3 Discounting and Time-Dependent Preferences
For any full randomization σ, say taking possible values τi ’s with respective probabilities
pi ’s, we have
P(σ ≤ t, Xσ ≤ x) = E[P(σ ≤ t, Xσ ≤ x | σ)] = ∑_{i≥1} pi P(τi ≤ t, Xτi ≤ x | σ = τi) = ∑_{i≥1} pi P(τi ≤ t, Xτi ≤ x),
i.e., the joint distribution of (σ, Xσ ) is a convex combination of the joint distributions of
(τi , Xτi ), i ≥ 1. Furthermore, Proposition 1 shows that for any path-dependent stopping
time τ (randomized or not), (τ, Xτ) is identically distributed as (τ̃, Xτ̃) for some randomized
but path-independent strategy τ̃. Therefore, the conclusions in Sections 4.1 and 4.2 remain
true if the agent’s preferences are represented by a functional V˜ of the joint distribution
of (τ, Xτ ). In particular, if V˜ is quasi-convex, then the agent dislikes randomization and
path-dependent strategies.
Suppose the agent’s preferences are represented as V(FH(τ,Xτ ) ), a functional of the distribution of H(τ, Xτ ) for some function H. Then, this preference representation can also
be viewed as a functional of the joint distribution of (τ, Xτ ). Furthermore, if V(FH(τ,Xτ ) )
is quasi-convex in FH(τ,Xτ ) , it is also quasi-convex in the joint distribution Fτ,Xτ . In this
case, the agent dislikes randomization and path-dependent strategies. A simple example
of function H is H(t, x) = e−rt x, where r is a discount factor. Therefore, if the agent has
law-invariant and quasi-convex preferences for the discounted value e−rτ Xτ , he will choose
only Markovian strategies.
5 Solving the Optimal Gambling Problem

In this section, we provide a systematic approach to finding the optimal randomized stopping time τ ∈ AC to maximize the gambler's CPT value (3).
Classical approaches to optimal stopping problems include the martingale method and
the dynamic programming principle, which depend respectively on the linearity of mathematical expectation and time consistency. Both of these approaches fail in the casino
gambling problem due to the presence of probability weighting functions. Observe that
the probability distribution of Sτ is the direct carrier of the gambler’s CPT value; this
observation motivates us to take this distribution as the decision variable. This idea of
changing the decision variable from τ to the distribution or quantile function of Sτ was
first applied by Xu and Zhou (2012) to solve a continuous-time optimal stopping problem
with probability weighting. In our discrete-time setting, however, quantile functions are
not a suitable choice for the decision variable because they are integer-valued and thus form
a non-convex feasible set. Here, we choose distribution functions as the decision variable.
The procedure for solving the casino gambling problem can be divided into three steps.
First, change the decision variable from stopping time τ to the probability distribution of
Sτ . Second, solve an infinite-dimensional program to obtain the optimal distribution of Sτ .
Third, provide a construction of an optimal stopping strategy from the optimal probability
distribution. The key in carrying out this three-step procedure is to characterize the set
of distributions of Sτ and to recover the stopping time τ such that Sτ has the desired
optimal distribution, and these prerequisites are achieved using the Skorokhod embedding
techniques.
To illustrate this idea, we consider the casino gambling problem on an infinite time
horizon.¹⁶ In this setting, we need to exclude doubling strategies, e.g., τ = inf{t ≥ 0 | St ≥ b}
for some b ∈ R. One way to exclude such strategies is to restrict ourselves to stopping times
τ such that {Sτ∧t}t≥0 is uniformly integrable.¹⁷ Therefore, the feasible set of stopping times
under consideration becomes
T := {τ ∈ AC |{Sτ ∧t }t≥0 is uniformly integrable},
and the gambler’s problem can be formulated as
max_{τ∈T} V(Sτ).   (7)
Next, we characterize the distribution of Sτ for τ ∈ T. Because {Sτ∧t}t≥0 is uniformly
integrable, we conclude that Sτ is integrable and that E[Sτ] = 0. Thus, the following is a
natural candidate for the set of feasible distributions of Sτ:

M0(Z) = { probability measure µ on Z : ∑_{n∈Z} |n| · µ({n}) < ∞, ∑_{n∈Z} n · µ({n}) = 0 }.   (8)

16 Considering an infinite time horizon greatly reduces the technical difficulties. In the finite time horizon case, which we hope to study in a separate work, the characterization of the feasible distributions of Sτ is much more involved.
17 Another approach would be to impose a finite credit line, e.g., St ≥ −L, t ≥ 0, for some L > 0.
The following theorem confirms that M0 (Z) fully characterizes the feasible set.
Theorem 2 For any µ ∈ M0 (Z), there exists {ri }i∈Z such that Sτ follows µ where τ is
defined as in (4) with P(ξt,i = 0) = ri , i ∈ Z. Furthermore, {Sτ ∧t }t≥0 is uniformly integrable
and does not visit states outside any interval that contains the support of µ. Conversely,
for any τ ∈ AC such that {Sτ ∧t }t≥0 is uniformly integrable, the distribution of Sτ belongs
to M0 (Z).
Note that M0 (Z) is a convex set and contains the distribution of Sτ for any τ ∈ AD
with uniform integrability. Consequently, the distribution of Sσ for any full randomization
σ of possibly path-dependent strategies also lies in M0 (Z) and Theorem 2 implies that it
may be achieved using a strategy in AC .
With xn := P(Sτ ≥ n) and yn := P(Sτ ≤ −n), n ≥ 1, we have
V(Sτ) = ∑_{n=1}^{∞} u+(n) (w+(xn) − w+(xn+1)) − ∑_{n=1}^{∞} u−(n) (w−(yn) − w−(yn+1))
      = ∑_{n=1}^{∞} (u+(n) − u+(n − 1)) w+(xn) − ∑_{n=1}^{∞} (u−(n) − u−(n − 1)) w−(yn),

where the second equality is due to Fubini's theorem. Recall that V(Sτ) := −∞ if the
second sum is infinite. With the notations x := (x1, x2, . . .), y := (y1, y2, . . .), we denote

U(x, y) := ∑_{n=1}^{∞} (u+(n) − u+(n − 1)) w+(xn) − ∑_{n=1}^{∞} (u−(n) − u−(n − 1)) w−(yn).   (9)
Again, U (x, y) := −∞ if the second sum is infinite.
In view of Theorem 2, problem (7) can be translated into the following optimization
problem:

max_{x,y}  U(x, y)
subject to  1 ≥ x1 ≥ x2 ≥ ... ≥ xn ≥ ... ≥ 0,
            1 ≥ y1 ≥ y2 ≥ ... ≥ yn ≥ ... ≥ 0,            (10)
            x1 + y1 ≤ 1,
            ∑_{n=1}^{∞} xn = ∑_{n=1}^{∞} yn.
Note that xn stands for P(Sτ ≥ n), so we must have the first constraint in (10). Similarly,
yn stands for P(Sτ ≤ −n), hence the second constraint in (10). In addition,
x1 + y1 = P(Sτ ≥ 1) + P(Sτ ≤ −1) = 1 − P(Sτ = 0) ≤ 1,
which explains the third constraint. Finally, the last constraint is the translation of E[Sτ ] =
0.
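The constraints in (10) and the objective (9) can be checked mechanically for any candidate distribution. Below is a minimal sketch assuming the power specification u±(n) = n^{α±} (with the loss factor λ folded into u−) and w±(z) = z^{δ±}; the helper names and the toy zero-mean distribution are our own, not from the paper.

```python
def decumulatives(mu, N=50):
    """x_n = P(S >= n), y_n = P(S <= -n) for a distribution mu given as a dict on Z."""
    x = [sum(p for s, p in mu.items() if s >= n) for n in range(1, N + 1)]
    y = [sum(p for s, p in mu.items() if s <= -n) for n in range(1, N + 1)]
    return x, y

def feasible(x, y, tol=1e-12):
    """Check the four constraints of problem (10)."""
    monotone = all(a >= b - tol for a, b in zip([1.0] + x, x + [0.0]))
    monotone &= all(a >= b - tol for a, b in zip([1.0] + y, y + [0.0]))
    # sum(x) = E[S+] and sum(y) = E[S-]; their equality encodes E[S] = 0.
    return monotone and x[0] + y[0] <= 1 + tol and abs(sum(x) - sum(y)) < tol

def U(x, y, alpha_p=0.5, alpha_m=0.9, delta_p=0.52, delta_m=0.52, lam=2.25):
    """Objective (9) with u+(n) = n**alpha_p, u-(n) = lam * n**alpha_m, w(z) = z**delta."""
    up = lambda n: n ** alpha_p
    um = lambda n: lam * n ** alpha_m
    gains = sum((up(n) - up(n - 1)) * x[n - 1] ** delta_p for n in range(1, len(x) + 1))
    losses = sum((um(n) - um(n - 1)) * y[n - 1] ** delta_m for n in range(1, len(y) + 1))
    return gains - losses

# Toy zero-mean distribution: P(S=1) = 0.4, P(S=0) = 0.4, P(S=-2) = 0.2.
mu = {1: 0.4, 0: 0.4, -2: 0.2}
x, y = decumulatives(mu)
assert feasible(x, y)
print(U(x, y))   # negative for this mu: the lambda-weighted losses dominate
```

Any candidate solution of (10) can be screened this way before attempting the embedding of Theorem 2.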
Because of Theorem 2, problems (7) and (10) are equivalent in terms of the optimal
value and the existence of the optimal solution. If we can find an optimal solution to
problem (10), which represents a distribution on Z, then Theorem 2 shows that we can
find a stopping time τ ∈ T such that Sτ follows this distribution. Consequently, τ is the
optimal stopping time for the gambling problem (7). The details of the construction of τ
are provided in Appendix B.
Therefore, to find the gambler's optimal strategy, we only need to solve (10). To model
individuals' tendency to overweight the tails of a distribution, w±(z) should be concave
in z in the neighbourhood of 0. Consequently, U(x, y) is neither concave nor convex in
(x, y), and thus it is hard to solve (10) analytically. Assuming power probability weighting
functions w±(z) = z^{δ±}, we are able to find a complete solution to (10).¹⁸ Because the
rigorous analysis leading to this complete solution is complex and lengthy, we develop and
present the analysis in a companion paper Authors (2014). Here, we only take a numerical
example from that paper to illustrate the procedure of solving the gambling problem set
out above. In this example, with α+ = 0.5, α− = 0.9, δ± = 0.52, and λ = 2.25, the optimal
distribution of Sτ is

P(Sτ = n) =
  0.1360 [ (n^{0.5} − (n − 1)^{0.5})^{1/0.48} − ((n + 1)^{0.5} − n^{0.5})^{1/0.48} ],   n ≥ 2,
  0.0933,   n = 1,
  0.8851,   n = −1,
  0,        n = 0 or n ≤ −2.

18 Assuming concave power probability weighting functions is reasonable because these functions are concave in the neighbourhood of 0. For general weighting functions, we are not able to find a complete solution to (10).
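As a sanity check, the probabilities in this display (which are rounded to four digits) can be summed numerically; the n ≥ 2 terms telescope, so a moderate truncation suffices. A small sketch:

```python
def p(n):
    """P(S_tau = n) for the numerical example above (probabilities as displayed)."""
    if n >= 2:
        a = (n ** 0.5 - (n - 1) ** 0.5) ** (1 / 0.48)
        b = ((n + 1) ** 0.5 - n ** 0.5) ** (1 / 0.48)
        return 0.1360 * (a - b)
    if n == 1:
        return 0.0933
    if n == -1:
        return 0.8851
    return 0.0

# The n >= 2 part telescopes to 0.1360 * (2**0.5 - 1)**(1/0.48) ~ 0.0217,
# so the total mass is one up to the rounding of the displayed constants.
total = p(1) + p(-1) + sum(p(n) for n in range(2, 5000))
assert abs(total - 1.0) < 1e-3
assert all(p(n) >= 0 for n in range(-3, 100))
```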
We now construct a stopping time that achieves this distribution. In view of Theorem
2, we only need to determine the ri's, i.e., the probabilities of the random coins showing tails.
Following the construction in Appendix B, we obtain
rn = 1, n ≤ −1, r0 = 0, r1 = 0.0571, r2 = 0.0061, r3 = 0.0025, r4 = 0.0014, . . . .
Therefore, the gambler’s strategy is as follows: stop once reaching −1; never stop at 0; toss
a coin with tails probability 0.0571 when reaching 1 and stop only if the toss outcome is
tails; toss a coin with tails probability 0.0061 when reaching 2 and stop only if the toss
outcome is tails; and so on. Note that this strategy requires a coin toss at every node.¹⁹
6 Conclusion
This paper considers the dynamic casino gambling model with CPT preferences that was
initially proposed by Barberis (2012). There, by studying a specific example, it was shown
that CPT is able to explain several empirically observed gambling behaviors. Because of the
restrictive set of strategies used in Barberis (2012), it was not possible to consider therein
two other commonly observed phenomena, namely: use of path-dependent strategies and
use of an independent randomization for decision making. Our first contribution here
was to show that CPT, as a descriptive model for individuals’ preferences, accounts also
for these two types of gambling behavior. We have illustrated, via simple examples, that
path-dependent stopping strategies and randomized stopping strategies strictly outperform
path-independent strategies. Moreover, we studied in detail the relation between different
classes of strategies and what features of preferences favour which strategies. We have
shown that agents with quasi-convex preferences have no incentive to use path-dependent
or randomized strategies. In particular, the improvement in performance brought by these
strategies in the casino gambling problem is a consequence of the lack of quasi-convexity of
CPT preferences.

19 In our companion paper, a different type of randomized strategy, one that is path-dependent but requires only a limited number of coin tosses, will be presented.
As mentioned before, the analysis in Barberis (2012) proceeds by enumerating all the
(Markovian) strategies in the 5-period model. Our second main contribution here was to
develop a systematic approach to solving the casino gambling problem analytically. Our
method involves changing the decision variables, establishing a new Skorokhod embedding
result, and solving an infinite-dimensional program. The solution of the infinite-dimensional
program is highly involved, so we leave it to a companion paper Authors (2014). In addition,
we have studied here only the infinite-horizon setting, which allowed us to focus on the main
features of the problem. Adapting our approach to a finite horizon case, which we hope to
pursue in future work, will involve new technical challenges. Finally, we focus only on one
type of gambler, i.e., sophisticated gamblers with pre-commitment. This is the crucial and
most involved type of agent which also highlights best the features of CPT preferences.
The other two types of gamblers addressed in Barberis (2012), i.e., naive gamblers and
sophisticated gamblers without pre-commitment, can be studied using the results of this
paper and we intend to do so in our future work.
Note. Recently, and independently of our work, Henderson et al. (2014) observed that
randomized strategies may be necessary for optimal gambling strategies. This observation
emerged in the course of a conversation between one of those authors and two of the
authors of the present paper at the SIAM Financial Mathematics meeting in November
2014 in Chicago. The other paper was subsequently posted on SSRN.
A Proofs

A.1 Proof of Proposition 1
For any τ ∈ AD , let r(t, x) = P(τ = t|Xt = x, τ ≥ t), t ≥ 0, x ∈ Z. Take mutually
independent random variables {ξt,x }t≥0,x∈Z , which are also independent of {Xt }t≥0 , such
that P(ξt,x = 0) = r(t, x) = 1 − P(ξt,x = 1). Define τ˜ = inf{t ∈ N : ξt,Xt = 0} ∈ AC . We
will show that (Xτ̃, τ̃) is identically distributed as (Xτ, τ), i.e., for any s ≥ 0, x ∈ Z,

P(Xτ = x, τ = s) = P(Xτ̃ = x, τ̃ = s).   (11)
We prove this by mathematical induction.
We first show that (11) is true for s = 0. Indeed, for any x ∈ Z, we have
P(Xτ = x, τ = 0) = P(X0 = x, τ = 0, τ ≥ 0) = P(X0 = x, τ ≥ 0)P(τ = 0|X0 = x, τ ≥ 0)
= P(X0 = x)r(0, x) = P(X0 = x)P(ξ0,x = 0)
= P(X0 = x, ξ0,x = 0) = P(X0 = x, τ˜ = 0)
= P(Xτ˜ = x, τ˜ = 0),
where the fifth equality is due to the independence of ξ0,x and {Xt }t≥0 and the sixth equality
follows from the definition of τ˜.
Next, we suppose that (11) is true for s ≤ t and show that it is also true for s = t + 1.
First, note that {Xt }t≥0 is Markovian with respect both to the filtration generated by itself
and to the filtration enlarged by randomization {ξt,x }t≥0,x∈Z . Furthermore, τ and τ˜ are
stopping times with respect to these two filtrations, respectively. As a result, for any s < t,
given Xs, events {τ = s} and {τ̃ = s} are independent of Xt. Then, we have
P(Xt = x, τ ≤ t) = ∑_{s≤t} ∑_y P(Xt = x | Xs = y, τ = s) P(Xs = y, τ = s)
= ∑_{s≤t} ∑_y P(Xt = x | Xs = y) P(Xs = y, τ = s)
= ∑_{s≤t} ∑_y P(Xt = x | Xs = y) P(Xs = y, τ̃ = s)
= ∑_{s≤t} ∑_y P(Xt = x | Xs = y, τ̃ = s) P(Xs = y, τ̃ = s)
= P(Xt = x, τ̃ ≤ t),
where the third equality is the case because (11) holds for s ≤ t by mathematical induction.
Consequently,
P(Xτ = x, τ = t + 1)
=P(τ = t + 1|Xt+1 = x, τ ≥ t + 1)P(Xt+1 = x, τ ≥ t + 1)
X
=r(t + 1, x)
P(Xt+1 = x, Xt = y, τ ≥ t + 1)
y
=r(t + 1, x)
X
P(Xt+1 = x|Xt = y, τ ≥ t + 1)P(Xt = y, τ ≥ t + 1)
y
=r(t + 1, x)
X
=r(t + 1, x)
X
=r(t + 1, x)
X
P(Xt+1 = x|Xt = y)P(Xt = y, τ ≥ t + 1)
y
P(Xt+1 = x|Xt = y) (P(Xt = y) − P(Xt = y, τ ≤ t))
y
P(Xt+1 = x|Xt = y) (P(Xt = y) − P(Xt = y, τ˜ ≤ t))
y
=r(t + 1, x)
X
P(Xt+1 = x|Xt = y)P(Xt = y, τ˜ ≥ t + 1)
y
=r(t + 1, x)
X
=r(t + 1, x)
X
P(Xt+1 = x|Xt = y, τ˜ ≥ t + 1)P(Xt = y, τ˜ ≥ t + 1)
y
P(Xt+1 = x, Xt = y, τ˜ ≥ t + 1)
y
=P(˜
τ = t + 1|Xt+1 = x, τ˜ ≥ t + 1)P(Xt+1 = x, τ˜ ≥ t + 1)
=P(Xτ˜ = x, τ˜ = t + 1).
Here, the fourth and eighth equalities hold because of the Markovian property of {Xt }t≥0
and the tenth equality is the case because of the definition of τ˜.
By mathematical induction, (11) holds for any t and x.
Finally, consider a randomized, path-dependent strategy τ′, which is constructed as

τ′ = inf{t ≥ 0 | ξ_{t, X_{t∧·}} = 0},

where {ξ_{t, x_{t∧·}}} is a sequence of independent 0-1 random variables which are independent
of {Xt}. Denote by {G′t} the filtration generated by {Xt} and these random variables,
i.e., G′t := σ(Xu, ξ_{u, X_{u∧·}}, u ≤ t). Then, {Xt} is still a Markov process with respect
to {G′t} and τ′ is a (possibly path-dependent) stopping time with respect to the same
filtration. As a result, the above proof is still valid if we replace τ with τ′, so (τ′, Xτ′) is
identically distributed as (τ̃, Xτ̃) for some τ̃ ∈ AC. Q.E.D
A.2 Proof of Theorem 2
It is straightforward to see that for any τ ∈ AC such that {Sτ ∧t }t≥0 is uniformly integrable,
the distribution of Sτ belongs to M0 (Z). We only need to show that for any µ ∈ M0 (Z), we
can construct τ ∈ AC such that Sτ is distributed as µ and {Sτ ∧t }t≥0 is uniformly integrable.
In the following, we denote r := {ri }i∈Z , where ri ∈ [0, 1]. Given r, consider independent
0-1 random variables {ξt,i }t≥0,i∈Z that are also independent of {St }t≥0 and satisfy P(ξt,i =
0) = 1 − P(ξt,i = 1) = ri , i ∈ Z. Denote
τ (r) := inf{t ≥ 0 : ξt,St = 0}.
Note that {St }t≥0 is still a symmetric random walk with respect to {Gt }t≥0 , where Gt :=
σ(Su , ξu,Su : u ≤ t), and τ (r) is a stopping time with respect to {Gt }t≥0 . Furthermore, the
distribution of Sτ (r) depends on r but is independent of the selection of {ξt,i }t≥0,i∈Z .
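The stopping time τ(r) is straightforward to simulate. Below is a minimal Monte Carlo sketch (our own illustration, not part of the proof) with the coins r−1 = r1 = 1 and r0 = 0, which embeds the uniform distribution on {−1, 1}:

```python
import random

random.seed(1)

def tau_r_sample(r, default=1.0):
    """One draw of (tau(r), S_tau) for stopping probabilities r = {state: r_i}.
    States absent from r use the default stopping probability 1."""
    s, t = 0, 0
    while True:
        if random.random() < r.get(s, default):   # coin xi_{t,s} shows 0: stop
            return t, s
        s += random.choice((-1, 1))
        t += 1

r = {-1: 1.0, 0: 0.0, 1: 1.0}
draws = [tau_r_sample(r)[1] for _ in range(100_000)]
freq_up = draws.count(1) / len(draws)
assert set(draws) == {-1, 1}          # never stops at 0 or beyond +-1
assert abs(freq_up - 0.5) < 0.01      # S_tau is (approximately) uniform on {-1, +1}
```

Richer targets are embedded the same way once the coins ri are chosen as in Appendix B.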
The proof of Theorem 2 is divided into several steps. We first consider the case in which
µ has finite support, i.e., µ([−B, A]) = 1 and µ({A}) > 0, µ({−B}) > 0 for some A, B ∈ Z.
Because µ is centered, i.e., it has zero mean, we must have A, B ≥ 0. We want to find r
such that τ (r) embeds the distribution µ, i.e., Sτ (r) has distribution µ.
Let H[−B,A] = inf{t ≥ 0|St ≥ A or St ≤ −B}. Consider the set of r which embed less
probability mass than prescribed by µ on (−B, A), i.e.,
Rµ = {r ∈ [0, 1]^Z | ri = 1 if i ∉ (−B, A) and P(Sτ(r) = i) ≤ µ({i}) if i ∈ (−B, A)}.

By choosing ri = 0 for i ∈ (−B, A) and ri = 1 for i ∉ (−B, A), we have P(Sτ(r) = i) = 0
for i ∈ (−B, A), so Rµ is non-empty. Furthermore, by definition, τ (r) ≤ H[−B,A] for any
r ∈ Rµ .
Proposition 3 If r, r′ ∈ Rµ, then so is their maximum, i.e., r̃ ∈ Rµ, where r̃i = ri ∨ r′i,
i ∈ Z.
Proof Fix r, r′ ∈ Rµ and define r̃ = {r̃i}i∈Z with r̃i = ri ∨ r′i. Denote {ξt,i}t≥0,i∈Z as the
sequence of 0-1 random variables used to construct τ (r). Construct another sequence of
0-1 random variables {εt,i }t≥0,i∈Z that are independent of each other and of {St }t≥0 , and
satisfy

P(εt,i = 0) = 1 − P(εt,i = 1) = ((r′i − ri)+ / (1 − ri)) 1{ri<1},   t ≥ 0, i ∈ Z.
Define ξ̃t,i := ξt,i εt,i. Then,

P(ξ̃t,i = 0) = P(ξt,i = 0) + P(ξt,i = 1) P(εt,i = 0) = ri + (1 − ri) ((r′i − ri)+ / (1 − ri)) 1{ri<1} = ri ∨ r′i.
Therefore, τ(r̃) can be constructed as follows:

τ(r̃) = inf{t ≥ 0 | ξ̃t,St = 0} = inf{t ≥ 0 | ξt,St εt,St = 0}.

One can easily see that τ(r̃) ≤ τ(r).
Now, consider any i ∈ Z ∩ (−B, A) such that ri ≥ r′i, i.e., r̃i = ri. In this case,
P(εt,i = 0) = 0, i.e., εt,i = 1. Note that for any t ≥ 0,
{Sτ (˜r) = i, τ (˜r) = t} = {ξu,Su εu,Su = 1, u ≤ t − 1, ξt,i εt,i = 0, St = i}
⊆ {ξu,Su = 1, u ≤ t − 1, ξt,i εt,i = 0, St = i}
= {ξu,Su = 1, u ≤ t − 1, ξt,i = 0, St = i}
= {Sτ (r) = i, τ (r) = t},
where the second equality is the case because εt,i = 1. Therefore, we conclude {Sτ (˜r) =
i} ⊆ {Sτ (r) = i}, so P(Sτ (˜r) = i) ≤ P(Sτ (r) = i) ≤ µ({i}).
Similarly, for any i ∈ Z ∩ (−B, A) such that ri < r′i, we also have P(Sτ(r̃) = i) ≤ µ({i}).
Furthermore, it is obvious that r̃i = 1 for i ∉ (−B, A). Thus, r̃ ∈ Rµ. Q.E.D
The following lemma is useful:
Lemma 1 The distribution of Sτ(r) is continuous in r ∈ R := {r = {ri}i∈Z | ri = 1, i ∉ (−B, A)}.
Proof We need to prove that P(Sτ (r) = i) is continuous in r for any i ∈ Z. For any t ≥ 0,
P(Sτ(r) = i, τ(r) = t) = P(ξu,Su = 1, u ≤ t − 1, ξt,St = 0, St = i)
= E[ E[1{ξu,Su = 1, u ≤ t−1, ξt,St = 0, St = i} | σ(Su : u ≤ t)] ].
For each realization of Su , u ≤ t, the above conditional probability is obviously continuous
in r. Because the number of realizations of Su , u ≤ t is finite, we conclude that P(Sτ (r) =
i, τ (r) = t) is continuous in r for any t ≥ 0.
Finally, we have τ (r) ≤ H[−B,A] for any r ∈ R, so
sup_{r∈R} P(Sτ(r) = i, τ(r) ≥ t) ≤ P(H[−B,A] ≥ t),
which goes to zero as t goes to infinity. Therefore, we conclude that P(Sτ (r) = i) is
continuous in r ∈ R. Q.E.D
Define r^max = {ri^max}i∈Z with ri^max := sup{s | there exists r ∈ Rµ with ri = s}.
Proposition 4 r^max is the maximal element of Rµ, i.e., r^max ∈ Rµ and ri^max ≥ ri, i ∈ Z, for
any r = {ri}i∈Z ∈ Rµ. Furthermore, Sτ(r^max) follows distribution µ and {Sτ(r^max)∧t}t≥0 is
uniformly integrable.
Proof By definition, ri^max = 1 for i ∉ (−B, A). For i ∈ (−B, A), there exist r^{n,i} =
{rj^{n,i}}j∈Z ∈ Rµ such that lim_{n→∞} ri^{n,i} = ri^max. Define r̃^n := {r̃j^n}j∈Z with r̃j^n := max_{i∈(−B,A)} rj^{n,i}.
Then, by Proposition 3, r̃^n ∈ Rµ and, furthermore, we can assume that r̃j^n is increasing in
n for each j. Moreover, by construction, lim_{n→∞} r̃i^n = ri^max for each i ∈ (−B, A). Because
of Lemma 1, we conclude that r^max ∈ Rµ. By definition, it is clear that r^max is the maximal
element of Rµ.
Next, {Sτ(r^max)∧t}t≥0 is uniformly integrable because τ(r^max) ≤ H[−B,A] and {S_{H[−B,A]}∧t}t≥0
is uniformly integrable. Consequently, Sτ(r^max) has zero mean.
Finally, we show that Sτ(r^max) follows distribution µ. By definition, P(Sτ(r^max) = i) = 0 =
µ({i}) for i ∉ [−B, A]. We claim that P(Sτ(r^max) = i) = µ({i}) for i ∈ (−B, A). Otherwise,
there exists i0 ∈ (−B, A) such that P(Sτ(r^max) = i0) < µ({i0}). Consider r^ε with ri^ε = ri^max
for i ≠ i0 and ri0^ε = ri0^max + ε. It follows that P(Sτ(r^ε) = i) ≤ P(Sτ(r^max) = i) ≤ µ({i}) for
i ≠ i0. Further, by Lemma 1, P(Sτ(r^ε) = i0) ≤ µ({i0}) for sufficiently small ε. Therefore,
r^ε ∈ Rµ, contradicting the fact that r^max is the maximal element of Rµ. Because both Sτ(r^max) and
µ have zero mean and they agree on all states i except for −B and A, we conclude that
they have the same distribution. Q.E.D
Proposition 4 establishes Theorem 2 for µ with bounded support. The final step is to
extend the result to general µ ∈ M0(Z) through a limiting and coupling procedure. Let us fix
µ ∈ M0(Z) without bounded support and, without loss of generality, assume that the support
is unbounded on the right, i.e., µ((−∞, n]) < 1 for all n ≥ 0.
For each integer n ≥ 1, we want to construct µn such that µn has support in [−ℓn, n], is
the same as µ on (−ℓn, n), and has zero mean, where the positive integer ℓn will be determined
later. To meet the constraints, we must have

µn({−ℓn}) + µ((−ℓn, n)) + µn({n}) = 1,
−ℓn µn({−ℓn}) + n µn({n}) + ∑_{i=−ℓn+1}^{n−1} i µ({i}) = 0,
from which we can solve
"
#
n−1
X
1
n(1 − µ((−n, n))) −
iµ({i}) ,
µn ({n}) =
n+n
i=−n+1
"
#
n−1
X
1
µn ({−n}) =
n(1 − µ((−n, n))) +
iµ({i}) .
n+n
i=−n+1
(12)
We need to choose ℓn such that µn is truly a probability measure, i.e., such that µn({n}) ≥ 0
and µn({−ℓn}) ≥ 0. We define ℓn as

ℓn := min{k ∈ N | f(n, k) > 0},   where   f(n, k) := k (1 − µ((−k, n))) − ∑_{i=−k+1}^{n−1} i µ({i}).

For each fixed n, one can see that f(n, k) is increasing in k and lim_{k→∞} f(n, k) = +∞
because µ((−∞, n]) < 1. Therefore, ℓn is well-defined. By definition, µn({n}) defined as in
(12) is positive. Furthermore, ℓn > 0 because f(n, 0) ≤ 0.
On the other hand,

n (1 − µ((−ℓn, n))) + ∑_{i=−ℓn+1}^{n−1} i µ({i}) = −f(n, ℓn − 1) + (n + ℓn − 1)(1 − µ((−ℓn, n))) > 0,

where the inequality is the case because f(n, ℓn − 1) ≤ 0 by the definition of ℓn, µ((−∞, n)) < 1,
n ≥ 1, and ℓn > 0. Therefore, µn({−ℓn}) defined as in (12) is positive.
Finally, because f(n, k) is decreasing in n for each k, we conclude that ℓn is increasing
in n. We claim that −ℓn converges to the left end of the support of µ as n goes to infinity,
i.e., lim_{n→∞} ℓn = sup{i ∈ Z | µ((−∞, −i]) > 0}. Otherwise, there exists a positive integer
m < sup{i ∈ Z | µ((−∞, −i]) > 0} such that ℓn ≤ m for any n. Because f(n, k) is increasing
in k, we conclude that f(n, m) > 0 for any n. However,

lim_{n→∞} f(n, m) = m (1 − µ((−m, +∞))) − ∑_{i=−m+1}^{∞} i µ({i}) < 0,

where the last inequality is the case because ∑_{i=−∞}^{∞} i µ({i}) = 0 and m < sup{i ∈ Z | µ((−∞, −i]) > 0}.
To conclude, for each n ≥ 1, we construct a positive integer ℓn and a measure µn such that
µn has support [−ℓn, n] and is the same as µ on (−ℓn, n), and −ℓn converges to the left end
of the support of µ as n goes to infinity.
Denote r^n = {ri^n}i∈Z as the maximal element of Rµn as in Proposition 4. We show that ri^n
is decreasing in n for each i. To this end, we only need to show that ri^n ≥ ri^{n+1} for any
i ∈ (−ℓn, n). If this is not the case, there exists i0 ∈ (−ℓn, n) such that ri0^n < ri0^{n+1}. We define
r̃ = {r̃i}i∈Z with r̃i := 1 for i ∉ (−ℓn, n) and r̃i := ri^{n+1} for i ∈ (−ℓn, n). It is obvious that
for any i ∈ (−ℓn, n),

P(Sτ(r̃) = i) ≤ P(Sτ(r^{n+1}) = i) = µ({i}).

Therefore, r̃ ∈ Rµn, but this contradicts the maximality of r^n in Rµn.
Because ri^n is decreasing in n for each i, we can define ri := lim_{n→∞} ri^n and denote
r = {ri}i∈Z. We now construct a version of the τ(r^n)'s. Construct 0-1 random variables
{ε^n_{t,i}}_{t≥0, n≥1, i∈Z} that are independent of each other and of {St}t≥0 and satisfy

P(ε^1_{t,i} = 1) = ri^1,    P(ε^n_{t,i} = 1) = (ri^n / ri^{n−1}) 1{ri^{n−1} > 0},   n ≥ 2.

Define ξ^n_{t,i} = 1 − ∏_{k=1}^{n} ε^k_{t,i}, n ≥ 1. Then, it is straightforward to verify that P(ξ^n_{t,i} = 0) = ri^n.
Because for each n, the ξ^n_{t,i}, t ≥ 0, i ∈ Z, are independent of each other and of {St}t≥0, we can
define τ(r^n) = inf{t ≥ 0 | ξ^n_{t,St} = 0}. By definition, τ(r^n) is increasing in n and

lim_{n→∞} τ(r^n) = inf{t ≥ 0 | ξ_{t,St} = 0},

where ξ_{t,i} := lim_{n→∞} ξ^n_{t,i}. Note that the ξ_{t,i}, t ≥ 0, i ∈ Z, are well-defined 0-1 random variables,
independent of each other, and independent of {St }t≥0 . Furthermore,
P(ξ_{t,i} = 0) = lim_{n→∞} P(ξ^n_{t,i} = 0) = lim_{n→∞} ri^n = ri,
so inf{t ≥ 0|ξt,St = 0} is a version of τ (r). With this version, we have τ (r) = limn→∞ τ (rn ).
P
We now show that τ (r) < ∞ almost surely. To this end, let Ft := |St | − t−1
j=0 1Sj =0 ,
t ∈ N. One can check that {Ft }t≥0 is a martingale. By the optional sampling theorem,
using τ (rn ) ≤ H[−n,n] , we conclude E[Fτ (rn ) ] = 0. This yields the first equality in the
following computation:
τ (rn )−1
E[
X
1Sj =0 ] = E[|Sτ (rn ) |] =
j=0
X
|i|µn ({i}) ≤
X
|i|µ({i}) < ∞.
(13)
i∈Z
i∈Z
P (r)−1
1Sj =0 ],
By the monotone convergence theorem, the left-hand side of (13) converges to E[ τj=0
and it follows that τ (r) < ∞ almost surely. Otherwise, this expectation would be infinite
by the recurrence of symmetric random walk.
Therefore, τ(r^n) converges increasingly to τ(r) and Sτ(r^n) converges to
Sτ(r), almost surely, as n → ∞. Note that for each i ≠ inf{j ∈ Z | µ({j}) > 0},
we have P(Sτ(r^n) = i) = µn({i}) = µ({i}) for sufficiently large n, so P(Sτ(r) = i) =
lim_{n→∞} P(Sτ(r^n) = i) = µ({i}). Hence Sτ(r) follows distribution µ. Furthermore,
∑_{i∈Z} |i| µ({i}) = E[|Sτ(r)|] ≤ lim inf_{n→∞} E[|Sτ(r^n)|] ≤ lim sup_{n→∞} E[|Sτ(r^n)|] ≤ ∑_{i∈Z} |i| µ({i}) < ∞,
where the first inequality is due to Fatou’s lemma and the third inequality is because of (13).
Therefore, we have lim_{n→∞} E[|Sτ(r^n)|] = E[|Sτ(r)|] < ∞. It follows by Scheffé's lemma (see
e.g., Williams, 1991) that Sτ(r^n) → Sτ(r) in L1, and hence E[Sτ(r) | Fτ(r^n)] = Sτ(r^n) almost
surely. By the martingale convergence theorem and the tower property of conditional
expectation,
E[Sτ(r) | Fτ(r)∧t] = lim_{n→∞} E[Sτ(r) | Fτ(r^n)∧t] = lim_{n→∞} E[ E[Sτ(r) | Fτ(r^n)] | Fτ(r^n)∧t ]
= lim_{n→∞} E[Sτ(r^n) | Fτ(r^n)∧t] = lim_{n→∞} Sτ(r^n)∧t = Sτ(r)∧t.
Therefore, we conclude that {Sτ (r)∧t }t≥0 is uniformly integrable.
Finally, by construction, it is easy to see that ri = 1 for any i outside the support
of µ and at the boundaries of the support. Therefore, {Sτ(r)∧t}t≥0 never visits
states outside any interval that contains the support of µ. Q.E.D
B Construction of Random Coins
In this section, we work under the assumptions of Theorem 2 and provide an algorithmic
method for computing {ri }i∈Z obtained therein. We let τ = τ (r) and gi denote the expected
number of visits of {St }t≥0 to state i strictly before τ , i.e.,
gi := ∑_{t=0}^{∞} P(τ > t, St = i).   (14)
An argument similar to the one in (13) shows that gi < ∞. We may then compute, writing
pi := µ({i}) = P(Sτ = i) = ∑_{t=0}^{∞} P(τ = t, St = i), as follows:
gi + pi = ∑_{t=0}^{∞} P(τ ≥ t, St = i) = 1{i=S0} + ∑_{t=0}^{∞} P(τ ≥ t + 1, St+1 = i)
= 1{i=S0} + ∑_{t=0}^{∞} P(τ > t, St+1 = i, St = i − 1) + ∑_{t=0}^{∞} P(τ > t, St+1 = i, St = i + 1)
= 1{i=S0} + ∑_{t=0}^{∞} P(τ > t, St = i − 1) P(St+1 = i | St = i − 1, τ > t) + ∑_{t=0}^{∞} P(τ > t, St = i + 1) P(St+1 = i | St = i + 1, τ > t)
= 1{i=S0} + ∑_{t=0}^{∞} (1/2) P(τ > t, St = i − 1) + ∑_{t=0}^{∞} (1/2) P(τ > t, St = i + 1)
= 1{i=S0} + (1/2) gi−1 + (1/2) gi+1,
where the fifth equality is the case because {St }t≥0 is Markovian. In other words, we have
the following identity:
gi + pi − 1{i=S0} = (1/2) gi−1 + (1/2) gi+1,   i ∈ Z.   (15)
Note that given a probability vector {pi }i∈Z , (15) may have an infinite number of solutions unless we specify a boundary condition. In general, we have gi → 0 as i → ±∞.
Further, if the support of µ is bounded from above or below, then g_i is zero for i above or below the respective bound of the support. In that case, (15) admits a unique solution, and the solution is finite. This allows us to recover g_i from p_i.
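When the support of µ is bounded, the recovery of g_i from p_i amounts to solving a finite tridiagonal linear system: (15) at the interior states, with g_i = 0 at and beyond the support boundaries. The sketch below (our own illustration, not code from the paper) solves it with the standard Thomas algorithm; the example distribution µ({−1}) = 1/2, µ({0}) = µ({2}) = 1/4 with S_0 = 0 is an assumption chosen so that µ has mean S_0, as the martingale property requires.

```python
def solve_g(p, lo, hi, s0=0):
    """Solve (15) for the expected visit counts g_i on a bounded support.

    p      : dict mapping state i to p_i = mu({i}); support contained in [lo, hi]
    lo, hi : bounds of the support (boundary condition g_lo = g_hi = 0); hi > lo + 1
    s0     : starting state of the walk
    Returns a dict {i: g_i} for lo <= i <= hi.
    """
    states = list(range(lo + 1, hi))     # interior states carry the unknowns
    n = len(states)
    # (15) rearranged: -g_{i-1}/2 + g_i - g_{i+1}/2 = 1_{i=s0} - p_i
    d = [(1.0 if i == s0 else 0.0) - p.get(i, 0.0) for i in states]
    sub, diag, sup = -0.5, [1.0] * n, -0.5
    # forward elimination (Thomas algorithm for a tridiagonal system)
    for k in range(1, n):
        m = sub / diag[k - 1]
        diag[k] -= m * sup
        d[k] -= m * d[k - 1]
    # back substitution
    x = [0.0] * n
    x[-1] = d[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        x[k] = (d[k] - sup * x[k + 1]) / diag[k]
    out = {lo: 0.0, hi: 0.0}
    out.update(dict(zip(states, x)))
    return out

g = solve_g({-1: 0.5, 0: 0.25, 2: 0.25}, lo=-1, hi=2)   # g[0] = 1.0, g[1] = 0.5
```

One can check by hand that g_0 = 1 and g_1 = 1/2 satisfy (15) at every state for this µ, including the boundary equations at i = −1 and i = 2.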
Finally, to construct r, we need to connect r with g_i and p_i. Note that
\begin{align*}
p_i = P\big(S_{\tau(r)} = i\big) &= \sum_{t=0}^{\infty} P(\tau(r) = t, S_t = i) \\
&= \sum_{t=0}^{\infty} P(\xi_{u,S_u} = 1,\ u = 0,1,\dots,t-1,\ \xi_{t,S_t} = 0,\ S_t = i) \\
&= \sum_{t=0}^{\infty} P(\xi_{u,S_u} = 1,\ u = 0,1,\dots,t-1,\ S_t = i)\, P(\xi_{t,S_t} = 0 \mid \xi_{u,S_u} = 1,\ u = 0,1,\dots,t-1,\ S_t = i) \\
&= \sum_{t=0}^{\infty} P(\xi_{u,S_u} = 1,\ u = 0,1,\dots,t-1,\ S_t = i)\, r_i \\
&= r_i \sum_{t=0}^{\infty} P(\tau(r) \ge t, S_t = i) = r_i\, (g_i + p_i).
\end{align*}
Therefore, if p_i + g_i > 0, we must have r_i = p_i/(p_i + g_i). If p_i + g_i = 0, then r_i can take any value in [0, 1]; because we take r to be the maximal element of R_µ, we set r_i = 1 in that case. To summarize, we have
\[
r_i = \frac{p_i}{p_i + g_i}\, 1_{\{p_i + g_i > 0\}} + 1_{\{p_i + g_i = 0\}}, \qquad i \in \mathbb{Z}. \tag{16}
\]
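Formula (16) then yields the biased coins directly. The sketch below is our own illustration, not code from the paper: for the example target µ({−1}) = 1/2, µ({0}) = µ({2}) = 1/4 with S_0 = 0 (for which solving (15) gives g_0 = 1, g_1 = 1/2, and g_i = 0 elsewhere), it computes r_i from (16) and verifies by simulation that stopping the walk whenever the state-i coin shows "stop" reproduces µ. Note that r_0 = 1/5 is a genuinely randomized coin.

```python
import random

def coin_probs(p, g):
    """r_i from (16): r_i = p_i/(p_i + g_i) when p_i + g_i > 0, and r_i = 1 otherwise."""
    r = {}
    for i in set(p) | set(g):
        tot = p.get(i, 0.0) + g.get(i, 0.0)
        r[i] = p.get(i, 0.0) / tot if tot > 0 else 1.0
    return r

# example target mu (with mean S_0 = 0, as the martingale property requires)
# and the g_i obtained by solving (15) for this mu
p = {-1: 0.5, 0: 0.25, 2: 0.25}
g = {-1: 0.0, 0: 1.0, 1: 0.5, 2: 0.0}
r = coin_probs(p, g)            # r = {-1: 1.0, 0: 0.2, 1: 0.0, 2: 1.0}

# simulate tau(r): at each time, toss the coin of the current state and stop on "0"
random.seed(0)
N = 20_000
counts = {}
for _ in range(N):
    s = 0
    while random.random() >= r.get(s, 1.0):   # continue with probability 1 - r_s
        s += random.choice((-1, 1))           # fair +/-1 step of the walk
    counts[s] = counts.get(s, 0) + 1
freq = {i: c / N for i, c in counts.items()}  # approximates the target p
```

The empirical distribution of S_{τ(r)} matches µ up to Monte Carlo error, consistent with p_i = r_i(g_i + p_i).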
References
Agranov, M. and Ortoleva, P. (2013). Stochastic choice and hedging, Technical report, Mimeo, California Institute of Technology.
Authors (2014). Stopping strategies of behavioral gamblers in infinite horizon. Working
Paper.
Barberis, N. (2012). A model of casino gambling, Management Science 58(1): 35–51.
Blavatskyy, P. R. (2006). Violations of betweenness or random errors?, Economics Letters
91(1): 34–38.
Bordley, R. and Hazen, G. B. (1991). SSB and weighted linear utility as expected utility
with suspicion, Management Science 37(4): 396–408.
Camerer, C. F. and Ho, T.-H. (1994). Violations of the betweenness axiom and nonlinearity
in probability, Journal of Risk and Uncertainty 8(2): 167–196.
Chew, S.-H. (1983). A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox, Econometrica 51(4): 1065–1092.
Chew, S. H. (1989). Axiomatic utility theories with the betweenness property, Annals of
Operations Research 19(1): 273–298.
Chew, S. H., Epstein, L. G. and Segal, U. (1991). Mixture symmetry and quadratic utility,
Econometrica pp. 139–163.
Chew, S. H. and MacCrimmon, K. R. (1979). Alpha-nu choice theory: a generalization of
expected utility theory. Working Paper.
Conlisk, J. (1993). The utility of gambling, Journal of Risk and Uncertainty 6(3): 255–275.
Dekel, E. (1986). An axiomatic characterization of preferences under uncertainty: Weakening the independence axiom, Journal of Economic Theory 40(2): 304–318.
Dwenger, N., Kübler, D. and Weizsäcker, G. (2013). Flipping a coin: Theory and evidence.
Working Paper.
URL: http://ssrn.com/abstract=2353282
Fishburn, P. (1981). An axiomatic characterization of skew-symmetric bilinear functionals,
with applications to utility theory, Economics Letters 8(4): 311–313.
Fishburn, P. C. (1983). Transitive measurable utility, Journal of Economic Theory 31(2): 293–317.
Gul, F. (1991). A theory of disappointment aversion, Econometrica 59(3): 667–686.
Henderson, V., Hobson, D. and Tse, A. (2014). Randomized strategies and prospect theory
in a dynamic context. SSRN: 2531457.
Hu, S. (2014). Optimal Exit Strategies of Behavioral Gamblers, PhD thesis, The Chinese
University of Hong Kong.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under
risk, Econometrica 47(2): 263–291.
Machina, M. J. (1982). “Expected utility” analysis without the independence axiom, Econometrica 50(2): 277–322.
Machina, M. J. (1985). Stochastic choice functions generated from deterministic preferences
over lotteries, Economic Journal 95(379): 575–594.
Prelec, D. (1998). The probability weighting function, Econometrica 66(3): 497–527.
Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic Behavior and
Organization 3: 323–343.
Starmer, C. (2000). Developments in non-expected utility theory: The hunt for a descriptive
theory of choice under risk, Journal of Economic Literature 38(2): 332–382.
Thaler, R. H. and Johnson, E. J. (1990). Gambling with the house money and trying
to break even: The effects of prior outcomes on risky choice, Management Science
36(6): 643–660.
Tversky, A. and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty, Journal of Risk and Uncertainty 5(4): 297–323.
Williams, D. (1991). Probability with Martingales, Cambridge University Press.
Xu, Z. Q. and Zhou, X. Y. (2012). Optimal stopping under probability distortion, Annals
of Applied Probability 23(1): 251–282.
Yaari, M. E. (1987). The dual theory of choice under risk, Econometrica 55(1): 95–115.