illustrating the link between modern portfolio theory and kelly

Transcription

illustrating the link between modern portfolio theory and kelly
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO
THEORY AND KELLY
TOM STARKE
0
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY
1
1. Introduction
This article was hugely inspired by Ernie Chan’s work, so thanks Ernie for
providing us with so much insight.
If you are active in the algorithmic trading field you will have come across
Ernie Chan’s books1. He has a sizeable section dedicated to portfolio management
and describes how to use the Kelly formula to apply the optimum leverage in
order to maximise returns. This approach was first applied and popularised by
Thorpe2. Recently, Ernie has posted a very interesting piece of work on his blog3
which unifies the portfolio management approaches of Kelly and Markowitz and
his modern portfolio theory. This post is based on his work, please have a look at
his post now, before you read on.
Markowitz is generally regarded as a ’serious’ academic theory whereas Thorpe
was just a mathematically-minded Black-Jack player who also made a couple of
hundred millions on the stock market. The Kelly formula works on the assumption
that, with respect to your bankroll, there is an optimum fraction you can bet to
maximise your profits. However, both theories work on the (mostly inaccurate)
assumption that returns are normally distributed. In reality we often have fatter
tails, which means that we should at least reduce the amount of leverage that we
use. Even though we may maximise our profits with straight-Kelly, such a strategy
can also suffer from enormous drawdowns and anyone who experienced the darker
side of trading psychology knows what that means.
If you had a look at his post you may have realised that there is a lot of algebra
happening and some of you may want to know how they can use this in practical
terms. In this post I will walk you through his work and provide you with code
examples that illustrate the concepts. I hope you enjoy it and get a little more
enlightened in the process. In this post I used random data rather than actual
stock data, which will hopefully help you to get a sense of how to use modelling
and simulation to improve your understanding of the theoretical concepts. Don‘t
forget that the skill of an algo-trader is to put mathematical models into code and
this example is great practise.
As a final note I would like to point out that from my experience there are
a lot of people with IT experience reading these articles who do not always
have the in-depth mathematical background. For that reason I try not to jump
too many steps to increase the educational value. So let‘s start with importing a few modules, which we need later and produce a series of normally distributed returns. CVXOPT is a convex solver which you can easily download with
sudo pip install cvxopt.
1
http://epchan.blogspot.com.au
http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Thorpe_
KellyCriterion2007.pdf
3
http://epchan.blogspot.com.au/2014/08/kelly-vs-markowitz-portfolio.html
2
2
TOM STARKE
from numpy i mp ort ∗
from p y l a b im por t p l o t , show , x l a b e l , y l a b e l
from cv xo pt imp ort m a t r i x
from cv xo pt . b l a s i mpo rt dot
from cv xo pt . s o l v e r s im por t qp
im po rt numpy a s np
from s c i p y i mpo rt o p t i m i z e
n = 5 # number o f a s s e t s i n t h e p o r t f o l i o
pbar = [ ] ; r e t v e c = [ ]
f o r i in range (n) :
r e t v e c . append ( random . randn ( 1 0 0 0 ) + 0. 01 )
pbar . append ( mean ( r e t v e c [ − 1 ] ) )
Listing 1. Producing normally distributed return series.
These return series can be used to create a wide range of portfolios, which all
have different returns and risks (standard deviation). We can produce a wide range
of random weight vectors and plot those portfolios. In this example I introduce
an upward trending bias by adding 0.01, which makes things easier for illustration
but you can set it to zero and see what happens. You can also see that there is
a filter that only allows to plot portfolios with a standard deviation of < 1.2 for
better illustration.
def rand weight (n) :
k = random . r a n d i n t ( −101 ,101 , n )
r e t u r n k/ f l o a t ( sum ( k ) )
def calc mean ret ( ret vec , n , f ) :
means = [ ] ; s t d s = [ ]
f o r i in range (3000) :
w = f (n)
i f s t d ( a r r a y ( r e t v e c ) . T∗w) < 1 . 2 :
means . append ( dot ( m a t r i x ( pbar ) , m a t r i x (w) ) )
s t d s . append ( s q r t ( dot ( m a t r i x (w) , m a t r i x ( cov ( a r r a y ( r e t v e c ) ) ) ∗ m a t r i x (w) ) ) )
r e t u r n means , s t d s
means , s t d s = c a l c m e a n r e t ( r e t v e c , n , r a n d w e i g h t )
p l o t ( s t d s , means ,
’o ’ )
y l a b e l ( ’ mean ’ )
xlabel ( ’ std ’ )
Listing 2. Producing normally distributed return series.
Upon plotting those you will observe that they form a characteristic parabolic
shape called the ‘Markowitz bullet‘ with the boundaries being called the ‘efficient
frontier‘, where we have the lowest variance for a given expected return as shown
in Figure 1. In the code you will notice the calculation of the return with:
(1)
R = pT w
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY
3
where R is the expected return, pT is the transpose of the vector for the mean
returns for each time series and w is the weight vector of the portfolio. p is a Nx1
column vector, so pT turns into a 1xN row vector which can be multiplied with the
Nx1 weight (column) vector w to give a scalar result. This is equivalent to the dot
product used in the code. Keep in mind that Python has a reversed definition of
rows and columns and the accurate Numpy version of the previous equation would
be R = matrix(w)*matrix(p).T
Next, we calculate the standard deviation with
√
(2)
σ = wT Cw
where C is the covariance matrix of the returns which is a NxN matrix. Please
note that if we simply calculated the simple standard deviation with the appropriate weighting using std(array(ret_vec).T*w) we would get a slightly different
’bullet’. This is because the simple standard deviation calculation would not take
covariances into account. In the covariance matrix, the values of the diagonal
represent the simple variances of each asset while the off-diagonals are the variances between the assets. By using ordinary std() we effectively only regard the
diagonal and miss the rest. A small but significant difference.
Figure 1. Simulation of portfolios with varying weight, plot of the
efficient frontier and the Kelly portfolios with varying θ.
Once we have a good representation of our portfolios as the blue dots in Figure
1 show we can calculate the efficient frontier Markowitz-style. This is done by
minimising
(3)
wT Cw
4
TOM STARKE
for w on the expected portfolio return RT w whilst keeping the sum of all the
weights equal to 1:
X
(4)
wi = 1
i
Here we parametrically run through RT w = µ and find the minimum variance
for different µ‘s. This can be done with scipy.optimise.minimize but we have
to define quite a complex problem with bounds, constraints and a Lagrange multiplier. Conveniently, the cvxopt package, a convex solver, does that all for us.
I used one of their examples4 with some modifications as shown below. You will
notice that there are some conditioning expressions in the code. They are simply
needed to set up the problem. For more information please have a look at the
cvxopt example.
The mus vector produces a series of expected return values µ in a non-linear and
more appropriate way. We will see later that we don‘t need to calculate a lot of
these as they perfectly fit a parabola, which can safely be extrapolated for higher
values.
## CONVERSION TO MATRIX
k = array ( r e t v e c )
S = m a t r i x ( cov ( k ) )
pbar = m a t r i x ( pbar )
## CONDITIONING OPTIMIZER
G = matrix ( 0 . 0 , (n , n ) )
G [ : : n+1] = −1.0
h = matrix ( 0 . 0 , (n , 1 ) )
A = matrix ( 1 . 0 , ( 1 , n ) )
b = matrix ( 1 . 0 )
## CALCULATING EXPECTED RETURNS
N = 100
mus = [ 1 0 ∗ ∗ ( 5 . 0 ∗ t /N−1.0) f o r t i n r a n g e (N) ]
## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIER
p o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b ) [ ’ x ’ ] f o r mu i n mus ]
## CALCULATING RISKS AND RETURNS FOR FRONTIER
r e t s = [ dot ( pbar , x ) f o r x i n p o r t f o l i o s ]
r i s = [ s q r t ( dot ( x , S∗x ) ) f o r x i n p o r t f o l i o s ]
p l o t ( r i s , r e t s , ’ y−o ’ )
Listing 3. Calculating the efficient frontier.
You can see the result in Figure 1 as the parabolic fitting curve to the Markowitz
bullet. So far, there is no magic here. But now comes the interesting part: Ernie
Chan shows in his article after some interesting algebra that the Markowitz optimal
portfolio is proportional to the Kelly optimal leverage portfolio by a factor which
4http://cvxopt.org/examples/book/portfolio.html
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY
5
I will call θ. When I read this article I was intrigued and wanted to see this for
myself, which gave me the motivation for this article.
So let‘s start with calculating the optimal Kelly portfolio. This is very simply
done by the equation:
w = C −1 R
(5)
which in pythonic form look like this:
ww = np . l i n a l g . i n v ( np . m a t r i x ( S ) ) ∗np . m a t r i x ( pbar )
Listing 4. Calculation of optimal Kelly portfolio.
Now all we have to do is to apply a factor θ and see how we go with this. In my
little example below I actually plotted the lines for both θ and −θ, which is quite
interesting. The result of this can be seen as the black, dotted lines in Figure 1 and
it can be seen that the upper line is exactly the tangent of our efficient frontier.
Bingo! By getting that plot we can see that somewhere along the black Kelly line
we will find the optimal Markowitz portfolio. Since we know that all the points
on the efficient frontier have a leverage of 1 we can see that Ernie Chan is indeed
correct with his calculations. The black line with the negative slope can also be a
Kelly-tangent since the tangent to the efficient frontier has actually two solutions
as we will see below.
rks = [ ] ; res = [ ] ;
f o r theta in arange ( 0 . 0 5 , 2 0 , 0 . 0 0 0 1 ) :
w = ww/ t h e t a
r k s . append ( dot ( pbar , m a t r i x (w) ) )
r e s . append ( s q r t ( dot ( m a t r i x (w) , S∗ m a t r i x (w) ) ) )
p l o t ( r e s , r k s , ’ ko ’ , m a r k e r s i z e =3)
p l o t ( r e s , a r r a y ( r k s ) ∗ −1 , ’ ko ’ , m a r k e r s i z e =3)
show ( )
Listing 5. Calculation of the Kelly tangent.
Once we know the optimal Kelly portfolio, we can readily find the weights for
Markowitz with Pwwi . However, the diligent investigator may wonder how accurate
i
that Kelly-derived weight vector actually is. In order to work that out we first
have to extend our efficient frontier a little by doing a second-order polynomial fit
as shown in the code example below. At the same time we are also fitting the line
for the Kelly portfolios. An example of these fits is shown in Figure 2. For ease of
calculation I have reversed the axis here.
## FITTING THE MARKOWITZ PARABOLA
m1=p o l y f i t ( r e t s , r i s , 2 )
x = l i n s p a c e ( min ( r e t s ) ,max( r e t s ) ∗ 2 , 1 0 0 0 )
y = p o l y v a l (m1 , x )
## FITTING THE KELLY WEIGHTS
m2 = p o l y f i t ( r e s , r k s , 1 )
6
TOM STARKE
Listing 6. Fitting the efficient frontier.
Figure 2. Fitting the efficient frontier and the Kelly-tangent (blue
crosses) and plotting the optimum portfolio determined with Kelly
(green) and the and using the tangent calculation (black). The actual tangent to the efficient frontier is shown as solid black line. Note
that the axis of risk and return are reverse to Figure 1
Next, we calculate the tangent to the efficient frontier and the point where
tangent and parabola meet. You can easily calculate this yourself knowing that the
derivative of the point where both lines touch is equal to the slope of the tangent.
The equation for the efficient frontier can be modelled by a simple polynomial:
(6)
f (x) = ax2 + bx + c
where the slope of the tangent is:
(7)
f 0 (x) = 2ax + b
Since we know that our tangent goes through the origin, it‘s equation will be
t(x) = xf 0 (x) which gives us
(8)
t(x) = (2ax + b)x
We set this equal to the efficient frontier:
(9)
(2ax + b) = ax2 + bx + c
and after some algebra we can calculate the x-value for the intercept:
r
c
(10)
x=±
a
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY
7
and the slope of the tangent at this point:
√
(11)
m = ±2 ca + b
Notice that the for the tangent we obtain 2 solutions and my code is written to
obtain the positive one. If you change the bias on the returns to negative you
will notice that the bullet will also shift towards the negative tangent and you can
calculate the solutions for this case.
p l o t ( x , ( 2 ∗ ( s q r t (m1 [ 0 ] ∗ m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ )
x1 = s q r t (m1 [ 2 ] / m1 [ 0 ] )
y1 = m1 [ 0 ] ∗ x1 ∗∗2 + m1 [ 1 ] ∗ x1 + m1 [ 2 ]
p l o t ( [ x1 ] , [ y1 ] , ’ ko ’ , m a r k e r s i z e =10)
Listing 7. Plotting tangent and interception point.
When we compare the slopes of the Kelly and the real tangent to the parabola
we find that they are slightly different. I do not have a satisfying explanation for
this right now other then attributing it to rounding errors, particularly when we
are dealing with portfolios where most of the expected returns are close to zero.
However, in most cases the curves are close enough to conclude that Ernie Chan‘s
derivations are correct.
Finally, we compare the optimum portfolio point obtained from the real tangent
with that from the Kelly-tangent. For this we use cvxopt again to calculate the
weights at the optimum Kelly point are a bit different but relatively close.
wt = qp ( x1 ∗S , −pbar , G, h , A, b ) [ ’ x ’ ]
p l o t ( dot ( pbar , m a t r i x ( wt ) ) , s q r t ( dot ( m a t r i x ( wt ) , S∗ m a t r i x ( wt ) ) ) , ’ go ’ , m a r k e r s i z e =10)
Listing 8. Calculating the interception point based on the Kellytangent.
Well done, if you have followed me up to this point. I hope you got a little bit
enlightened and please let me know if you have any questions or suggestions. To
make it really easy for you I finally put it all together in one big block of code.
But remember, just running the code does not help to understand the depth of
the problem described here.
from numpy imp ort ∗
from p y l a b im por t p l o t , show , x l a b e l , y l a b e l
from cv xo pt imp ort m a t r i x
from cv xo pt . b l a s i mpo rt dot
from cv xo pt . s o l v e r s im por t qp
im po rt numpy a s np
## NUMBER OF ASSETS
n = 4
## PRODUCING RANDOM WEIGHTS
def rand weight (n) :
k = random . r a n d i n t ( −101 ,101 , n )
8
TOM STARKE
r e t u r n k/ f l o a t ( sum ( k ) )
## PRODUCING NORMALLY DISTRIBUTED RETURNS
pbar = [ ] ; r e t v e c = [ ]
f o r i in range (n) :
r e t v e c . append ( random . randn ( 1 0 0 0 ) + 0. 01 )
pbar . append ( mean ( r e t v e c [ − 1 ] ) )
## CREATING PORTFOLIOS FUNCTION
def calc mean ret ( ret vec , n , f ) :
means = [ ] ; s t d s = [ ]
f o r i in range (3000) :
w = f (n)
i f s t d ( a r r a y ( r e t v e c ) . T∗w) < 1 . 2 :
means . append ( dot ( m a t r i x ( pbar ) , m a t r i x (w) ) )
s t d s . append ( s q r t ( dot ( m a t r i x (w) , m a t r i x ( cov ( a r r a y ( r e t v e c ) ) ) ∗ m a t r i x (w) ) ) )
r e t u r n means , s t d s
## RUNNING AND PLOTTING PORTFOLIOS
means , s t d s = c a l c m e a n r e t ( r e t v e c , n , r a n d w e i g h t )
p l o t ( s t d s , means ,
’o ’ )
y l a b e l ( ’ mean ’ )
xlabel ( ’ std ’ )
## CONVERSION TO MATRIX
k = array ( r e t v e c )
S = m a t r i x ( cov ( k ) )
pbar = m a t r i x ( pbar )
## CONDITIONING
G = matrix ( 0 . 0 ,
G [ : : n+1] = −1.0
h = matrix ( 0 . 0 ,
A = matrix ( 1 . 0 ,
b = matrix ( 1 . 0 )
OPTIMIZER
(n , n) )
(n , 1 ) )
(1 ,n) )
## CALCULATING EXPECTED RETURNS
N = 100
mus = [ 1 0 ∗ ∗ ( 5 . 0 ∗ t /N−1.0) f o r t i n r a n g e (N) ]
## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIER
p o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b ) [ ’ x ’ ] f o r mu i n mus ]
## CALCULATING RISKS AND RETURNS FOR FRONTIER
r e t s = [ dot ( pbar , x ) f o r x i n p o r t f o l i o s ]
r i s = [ s q r t ( dot ( x , S∗x ) ) f o r x i n p o r t f o l i o s ]
p l o t ( r i s , r e t s , ’ y−o ’ )
## PLOTTING THE KELLY PORTFOLIOS
ww = np . l i n a l g . i n v ( np . m a t r i x ( S ) ) ∗np . m a t r i x ( pbar )
rks = [ ] ; res = [ ] ;
f o r i in arange ( 0 . 0 5 , 2 0 , 0 . 0 0 0 1 ) :
w = ww/ i
r k s . append ( dot ( pbar , m a t r i x (w) ) )
r e s . append ( s q r t ( dot ( m a t r i x (w) , S∗ m a t r i x (w) ) ) )
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY
9
## PLOTTING THE KELLY PORTFOLIOS FOR DIFF LEVERAGE
p l o t ( r e s , r k s , ’ ko ’ , m a r k e r s i z e =3)
p l o t ( r e s , a r r a y ( r k s ) ∗ −1 , ’ ko ’ , m a r k e r s i z e =3)
show ( )
## FITTING THE MARKOWITZ PARABOLA
m1=p o l y f i t ( r e t s , r i s , 2 )
x = l i n s p a c e ( min ( r e t s ) ,max( r e t s ) ∗ 2 , 1 0 0 0 )
y = p o l y v a l (m1 , x )
## FITTING THE KELLY WEIGHTS
m2 = p o l y f i t ( r e s , r k s , 1 )
## m1 I S THE PARABOLIC FIT AND m2 THE LINEAR FIT
p r i n t m1 , m2 , ww
## PLOTTING THE TANGENT TO THE PARABOLA
p l o t ( x , ( 2 ∗ ( s q r t (m1 [ 0 ] ∗ m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ )
## PLOTTING THE KELLY PORTFOLIOS FOR DIFFERENT FACTORS
p l o t ( rks , res , ’ x ’ )
## PLOTTING THE MARKOWITZ PARABOLA
p l o t ( r e t s , r i s , ’ r−o ’ )
## PLOTTING THE FITTING FUNCTION FOR THE MARKOWITZ PARABOLA
plot (x , y)
##
x1
x2
y1
y2
FINDING OPTIMAL MARKOWITZ POINTS
= s q r t (m1 [ 2 ] / m1 [ 0 ] )
= −s q r t (m1 [ 2 ] / m1 [ 0 ] )
= m1 [ 0 ] ∗ x1 ∗∗2 + m1 [ 1 ] ∗ x1 + m1 [ 2 ]
= m1 [ 0 ] ∗ x2 ∗∗2 + m1 [ 1 ] ∗ x2 + m1 [ 2 ]
## PLOTTING THE OPTIMUM MARKOWITZ POINT
p l o t ( [ x1 ] , [ y1 ] , ’ ko ’ , m a r k e r s i z e =10)
x l a b e l ( ’ mean ’ )
ylabel ( ’ std ’ )
## FINDING OPTIMUM PORTFOLIO WITH CVXOPT
wt = qp ( x1 ∗S , −pbar , G, h , A, b ) [ ’ x ’ ]
p l o t ( dot ( pbar , m a t r i x ( wt ) ) , s q r t ( dot ( m a t r i x ( wt ) , S∗ m a t r i x ( wt ) ) ) , ’ go ’ , m a r k e r s i z e =10)
show ( )
Listing 9. Putting it all together.