illustrating the link between modern portfolio theory and kelly
Transcription
illustrating the link between modern portfolio theory and kelly
ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY TOM STARKE 0 ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 1 1. Introduction This article was hugely inspired by Ernie Chan’s work, so thanks Ernie for providing us with so much insight. If you are active in the algorithmic trading field you will have come across Ernie Chan’s books1. He has a sizeable section dedicated to portfolio management and describes how to use the Kelly formula to apply the optimum leverage in order to maximise returns. This approach was first applied and popularised by Thorpe2. Recently, Ernie has posted a very interesting piece of work on his blog3 which unifies the portfolio management approaches of Kelly and Markowitz and his modern portfolio theory. This post is based on his work, please have a look at his post now, before you read on. Markowitz is generally regarded as a ’serious’ academic theory whereas Thorpe was just a mathematically-minded Black-Jack player who also made a couple of hundred millions on the stock market. The Kelly formula works on the assumption that, with respect to your bankroll, there is an optimum fraction you can bet to maximise your profits. However, both theories work on the (mostly inaccurate) assumption that returns are normally distributed. In reality we often have fatter tails, which means that we should at least reduce the amount of leverage that we use. Even though we may maximise our profits with straight-Kelly, such a strategy can also suffer from enormous drawdowns and anyone who experienced the darker side of trading psychology knows what that means. If you had a look at his post you may have realised that there is a lot of algebra happening and some of you may want to know how they can use this in practical terms. In this post I will walk you through his work and provide you with code examples that illustrate the concepts. I hope you enjoy it and get a little more enlightened in the process. In this post I used random data rather than actual stock data, which will hopefully help you to get a sense of how to use modelling and simulation to improve your understanding of the theoretical concepts. Don‘t forget that the skill of an algo-trader is to put mathematical models into code and this example is great practise. As a final note I would like to point out that from my experience there are a lot of people with IT experience reading these articles who do not always have the in-depth mathematical background. For that reason I try not to jump too many steps to increase the educational value. So let‘s start with importing a few modules, which we need later and produce a series of normally distributed returns. CVXOPT is a convex solver which you can easily download with sudo pip install cvxopt. 1 http://epchan.blogspot.com.au http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Thorpe_ KellyCriterion2007.pdf 3 http://epchan.blogspot.com.au/2014/08/kelly-vs-markowitz-portfolio.html 2 2 TOM STARKE from numpy i mp ort ∗ from p y l a b im por t p l o t , show , x l a b e l , y l a b e l from cv xo pt imp ort m a t r i x from cv xo pt . b l a s i mpo rt dot from cv xo pt . s o l v e r s im por t qp im po rt numpy a s np from s c i p y i mpo rt o p t i m i z e n = 5 # number o f a s s e t s i n t h e p o r t f o l i o pbar = [ ] ; r e t v e c = [ ] f o r i in range (n) : r e t v e c . append ( random . randn ( 1 0 0 0 ) + 0. 01 ) pbar . append ( mean ( r e t v e c [ − 1 ] ) ) Listing 1. Producing normally distributed return series. These return series can be used to create a wide range of portfolios, which all have different returns and risks (standard deviation). We can produce a wide range of random weight vectors and plot those portfolios. In this example I introduce an upward trending bias by adding 0.01, which makes things easier for illustration but you can set it to zero and see what happens. You can also see that there is a filter that only allows to plot portfolios with a standard deviation of < 1.2 for better illustration. def rand weight (n) : k = random . r a n d i n t ( −101 ,101 , n ) r e t u r n k/ f l o a t ( sum ( k ) ) def calc mean ret ( ret vec , n , f ) : means = [ ] ; s t d s = [ ] f o r i in range (3000) : w = f (n) i f s t d ( a r r a y ( r e t v e c ) . T∗w) < 1 . 2 : means . append ( dot ( m a t r i x ( pbar ) , m a t r i x (w) ) ) s t d s . append ( s q r t ( dot ( m a t r i x (w) , m a t r i x ( cov ( a r r a y ( r e t v e c ) ) ) ∗ m a t r i x (w) ) ) ) r e t u r n means , s t d s means , s t d s = c a l c m e a n r e t ( r e t v e c , n , r a n d w e i g h t ) p l o t ( s t d s , means , ’o ’ ) y l a b e l ( ’ mean ’ ) xlabel ( ’ std ’ ) Listing 2. Producing normally distributed return series. Upon plotting those you will observe that they form a characteristic parabolic shape called the ‘Markowitz bullet‘ with the boundaries being called the ‘efficient frontier‘, where we have the lowest variance for a given expected return as shown in Figure 1. In the code you will notice the calculation of the return with: (1) R = pT w ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 3 where R is the expected return, pT is the transpose of the vector for the mean returns for each time series and w is the weight vector of the portfolio. p is a Nx1 column vector, so pT turns into a 1xN row vector which can be multiplied with the Nx1 weight (column) vector w to give a scalar result. This is equivalent to the dot product used in the code. Keep in mind that Python has a reversed definition of rows and columns and the accurate Numpy version of the previous equation would be R = matrix(w)*matrix(p).T Next, we calculate the standard deviation with √ (2) σ = wT Cw where C is the covariance matrix of the returns which is a NxN matrix. Please note that if we simply calculated the simple standard deviation with the appropriate weighting using std(array(ret_vec).T*w) we would get a slightly different ’bullet’. This is because the simple standard deviation calculation would not take covariances into account. In the covariance matrix, the values of the diagonal represent the simple variances of each asset while the off-diagonals are the variances between the assets. By using ordinary std() we effectively only regard the diagonal and miss the rest. A small but significant difference. Figure 1. Simulation of portfolios with varying weight, plot of the efficient frontier and the Kelly portfolios with varying θ. Once we have a good representation of our portfolios as the blue dots in Figure 1 show we can calculate the efficient frontier Markowitz-style. This is done by minimising (3) wT Cw 4 TOM STARKE for w on the expected portfolio return RT w whilst keeping the sum of all the weights equal to 1: X (4) wi = 1 i Here we parametrically run through RT w = µ and find the minimum variance for different µ‘s. This can be done with scipy.optimise.minimize but we have to define quite a complex problem with bounds, constraints and a Lagrange multiplier. Conveniently, the cvxopt package, a convex solver, does that all for us. I used one of their examples4 with some modifications as shown below. You will notice that there are some conditioning expressions in the code. They are simply needed to set up the problem. For more information please have a look at the cvxopt example. The mus vector produces a series of expected return values µ in a non-linear and more appropriate way. We will see later that we don‘t need to calculate a lot of these as they perfectly fit a parabola, which can safely be extrapolated for higher values. ## CONVERSION TO MATRIX k = array ( r e t v e c ) S = m a t r i x ( cov ( k ) ) pbar = m a t r i x ( pbar ) ## CONDITIONING OPTIMIZER G = matrix ( 0 . 0 , (n , n ) ) G [ : : n+1] = −1.0 h = matrix ( 0 . 0 , (n , 1 ) ) A = matrix ( 1 . 0 , ( 1 , n ) ) b = matrix ( 1 . 0 ) ## CALCULATING EXPECTED RETURNS N = 100 mus = [ 1 0 ∗ ∗ ( 5 . 0 ∗ t /N−1.0) f o r t i n r a n g e (N) ] ## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIER p o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b ) [ ’ x ’ ] f o r mu i n mus ] ## CALCULATING RISKS AND RETURNS FOR FRONTIER r e t s = [ dot ( pbar , x ) f o r x i n p o r t f o l i o s ] r i s = [ s q r t ( dot ( x , S∗x ) ) f o r x i n p o r t f o l i o s ] p l o t ( r i s , r e t s , ’ y−o ’ ) Listing 3. Calculating the efficient frontier. You can see the result in Figure 1 as the parabolic fitting curve to the Markowitz bullet. So far, there is no magic here. But now comes the interesting part: Ernie Chan shows in his article after some interesting algebra that the Markowitz optimal portfolio is proportional to the Kelly optimal leverage portfolio by a factor which 4http://cvxopt.org/examples/book/portfolio.html ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 5 I will call θ. When I read this article I was intrigued and wanted to see this for myself, which gave me the motivation for this article. So let‘s start with calculating the optimal Kelly portfolio. This is very simply done by the equation: w = C −1 R (5) which in pythonic form look like this: ww = np . l i n a l g . i n v ( np . m a t r i x ( S ) ) ∗np . m a t r i x ( pbar ) Listing 4. Calculation of optimal Kelly portfolio. Now all we have to do is to apply a factor θ and see how we go with this. In my little example below I actually plotted the lines for both θ and −θ, which is quite interesting. The result of this can be seen as the black, dotted lines in Figure 1 and it can be seen that the upper line is exactly the tangent of our efficient frontier. Bingo! By getting that plot we can see that somewhere along the black Kelly line we will find the optimal Markowitz portfolio. Since we know that all the points on the efficient frontier have a leverage of 1 we can see that Ernie Chan is indeed correct with his calculations. The black line with the negative slope can also be a Kelly-tangent since the tangent to the efficient frontier has actually two solutions as we will see below. rks = [ ] ; res = [ ] ; f o r theta in arange ( 0 . 0 5 , 2 0 , 0 . 0 0 0 1 ) : w = ww/ t h e t a r k s . append ( dot ( pbar , m a t r i x (w) ) ) r e s . append ( s q r t ( dot ( m a t r i x (w) , S∗ m a t r i x (w) ) ) ) p l o t ( r e s , r k s , ’ ko ’ , m a r k e r s i z e =3) p l o t ( r e s , a r r a y ( r k s ) ∗ −1 , ’ ko ’ , m a r k e r s i z e =3) show ( ) Listing 5. Calculation of the Kelly tangent. Once we know the optimal Kelly portfolio, we can readily find the weights for Markowitz with Pwwi . However, the diligent investigator may wonder how accurate i that Kelly-derived weight vector actually is. In order to work that out we first have to extend our efficient frontier a little by doing a second-order polynomial fit as shown in the code example below. At the same time we are also fitting the line for the Kelly portfolios. An example of these fits is shown in Figure 2. For ease of calculation I have reversed the axis here. ## FITTING THE MARKOWITZ PARABOLA m1=p o l y f i t ( r e t s , r i s , 2 ) x = l i n s p a c e ( min ( r e t s ) ,max( r e t s ) ∗ 2 , 1 0 0 0 ) y = p o l y v a l (m1 , x ) ## FITTING THE KELLY WEIGHTS m2 = p o l y f i t ( r e s , r k s , 1 ) 6 TOM STARKE Listing 6. Fitting the efficient frontier. Figure 2. Fitting the efficient frontier and the Kelly-tangent (blue crosses) and plotting the optimum portfolio determined with Kelly (green) and the and using the tangent calculation (black). The actual tangent to the efficient frontier is shown as solid black line. Note that the axis of risk and return are reverse to Figure 1 Next, we calculate the tangent to the efficient frontier and the point where tangent and parabola meet. You can easily calculate this yourself knowing that the derivative of the point where both lines touch is equal to the slope of the tangent. The equation for the efficient frontier can be modelled by a simple polynomial: (6) f (x) = ax2 + bx + c where the slope of the tangent is: (7) f 0 (x) = 2ax + b Since we know that our tangent goes through the origin, it‘s equation will be t(x) = xf 0 (x) which gives us (8) t(x) = (2ax + b)x We set this equal to the efficient frontier: (9) (2ax + b) = ax2 + bx + c and after some algebra we can calculate the x-value for the intercept: r c (10) x=± a ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 7 and the slope of the tangent at this point: √ (11) m = ±2 ca + b Notice that the for the tangent we obtain 2 solutions and my code is written to obtain the positive one. If you change the bias on the returns to negative you will notice that the bullet will also shift towards the negative tangent and you can calculate the solutions for this case. p l o t ( x , ( 2 ∗ ( s q r t (m1 [ 0 ] ∗ m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ ) x1 = s q r t (m1 [ 2 ] / m1 [ 0 ] ) y1 = m1 [ 0 ] ∗ x1 ∗∗2 + m1 [ 1 ] ∗ x1 + m1 [ 2 ] p l o t ( [ x1 ] , [ y1 ] , ’ ko ’ , m a r k e r s i z e =10) Listing 7. Plotting tangent and interception point. When we compare the slopes of the Kelly and the real tangent to the parabola we find that they are slightly different. I do not have a satisfying explanation for this right now other then attributing it to rounding errors, particularly when we are dealing with portfolios where most of the expected returns are close to zero. However, in most cases the curves are close enough to conclude that Ernie Chan‘s derivations are correct. Finally, we compare the optimum portfolio point obtained from the real tangent with that from the Kelly-tangent. For this we use cvxopt again to calculate the weights at the optimum Kelly point are a bit different but relatively close. wt = qp ( x1 ∗S , −pbar , G, h , A, b ) [ ’ x ’ ] p l o t ( dot ( pbar , m a t r i x ( wt ) ) , s q r t ( dot ( m a t r i x ( wt ) , S∗ m a t r i x ( wt ) ) ) , ’ go ’ , m a r k e r s i z e =10) Listing 8. Calculating the interception point based on the Kellytangent. Well done, if you have followed me up to this point. I hope you got a little bit enlightened and please let me know if you have any questions or suggestions. To make it really easy for you I finally put it all together in one big block of code. But remember, just running the code does not help to understand the depth of the problem described here. from numpy imp ort ∗ from p y l a b im por t p l o t , show , x l a b e l , y l a b e l from cv xo pt imp ort m a t r i x from cv xo pt . b l a s i mpo rt dot from cv xo pt . s o l v e r s im por t qp im po rt numpy a s np ## NUMBER OF ASSETS n = 4 ## PRODUCING RANDOM WEIGHTS def rand weight (n) : k = random . r a n d i n t ( −101 ,101 , n ) 8 TOM STARKE r e t u r n k/ f l o a t ( sum ( k ) ) ## PRODUCING NORMALLY DISTRIBUTED RETURNS pbar = [ ] ; r e t v e c = [ ] f o r i in range (n) : r e t v e c . append ( random . randn ( 1 0 0 0 ) + 0. 01 ) pbar . append ( mean ( r e t v e c [ − 1 ] ) ) ## CREATING PORTFOLIOS FUNCTION def calc mean ret ( ret vec , n , f ) : means = [ ] ; s t d s = [ ] f o r i in range (3000) : w = f (n) i f s t d ( a r r a y ( r e t v e c ) . T∗w) < 1 . 2 : means . append ( dot ( m a t r i x ( pbar ) , m a t r i x (w) ) ) s t d s . append ( s q r t ( dot ( m a t r i x (w) , m a t r i x ( cov ( a r r a y ( r e t v e c ) ) ) ∗ m a t r i x (w) ) ) ) r e t u r n means , s t d s ## RUNNING AND PLOTTING PORTFOLIOS means , s t d s = c a l c m e a n r e t ( r e t v e c , n , r a n d w e i g h t ) p l o t ( s t d s , means , ’o ’ ) y l a b e l ( ’ mean ’ ) xlabel ( ’ std ’ ) ## CONVERSION TO MATRIX k = array ( r e t v e c ) S = m a t r i x ( cov ( k ) ) pbar = m a t r i x ( pbar ) ## CONDITIONING G = matrix ( 0 . 0 , G [ : : n+1] = −1.0 h = matrix ( 0 . 0 , A = matrix ( 1 . 0 , b = matrix ( 1 . 0 ) OPTIMIZER (n , n) ) (n , 1 ) ) (1 ,n) ) ## CALCULATING EXPECTED RETURNS N = 100 mus = [ 1 0 ∗ ∗ ( 5 . 0 ∗ t /N−1.0) f o r t i n r a n g e (N) ] ## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIER p o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b ) [ ’ x ’ ] f o r mu i n mus ] ## CALCULATING RISKS AND RETURNS FOR FRONTIER r e t s = [ dot ( pbar , x ) f o r x i n p o r t f o l i o s ] r i s = [ s q r t ( dot ( x , S∗x ) ) f o r x i n p o r t f o l i o s ] p l o t ( r i s , r e t s , ’ y−o ’ ) ## PLOTTING THE KELLY PORTFOLIOS ww = np . l i n a l g . i n v ( np . m a t r i x ( S ) ) ∗np . m a t r i x ( pbar ) rks = [ ] ; res = [ ] ; f o r i in arange ( 0 . 0 5 , 2 0 , 0 . 0 0 0 1 ) : w = ww/ i r k s . append ( dot ( pbar , m a t r i x (w) ) ) r e s . append ( s q r t ( dot ( m a t r i x (w) , S∗ m a t r i x (w) ) ) ) ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 9 ## PLOTTING THE KELLY PORTFOLIOS FOR DIFF LEVERAGE p l o t ( r e s , r k s , ’ ko ’ , m a r k e r s i z e =3) p l o t ( r e s , a r r a y ( r k s ) ∗ −1 , ’ ko ’ , m a r k e r s i z e =3) show ( ) ## FITTING THE MARKOWITZ PARABOLA m1=p o l y f i t ( r e t s , r i s , 2 ) x = l i n s p a c e ( min ( r e t s ) ,max( r e t s ) ∗ 2 , 1 0 0 0 ) y = p o l y v a l (m1 , x ) ## FITTING THE KELLY WEIGHTS m2 = p o l y f i t ( r e s , r k s , 1 ) ## m1 I S THE PARABOLIC FIT AND m2 THE LINEAR FIT p r i n t m1 , m2 , ww ## PLOTTING THE TANGENT TO THE PARABOLA p l o t ( x , ( 2 ∗ ( s q r t (m1 [ 0 ] ∗ m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ ) ## PLOTTING THE KELLY PORTFOLIOS FOR DIFFERENT FACTORS p l o t ( rks , res , ’ x ’ ) ## PLOTTING THE MARKOWITZ PARABOLA p l o t ( r e t s , r i s , ’ r−o ’ ) ## PLOTTING THE FITTING FUNCTION FOR THE MARKOWITZ PARABOLA plot (x , y) ## x1 x2 y1 y2 FINDING OPTIMAL MARKOWITZ POINTS = s q r t (m1 [ 2 ] / m1 [ 0 ] ) = −s q r t (m1 [ 2 ] / m1 [ 0 ] ) = m1 [ 0 ] ∗ x1 ∗∗2 + m1 [ 1 ] ∗ x1 + m1 [ 2 ] = m1 [ 0 ] ∗ x2 ∗∗2 + m1 [ 1 ] ∗ x2 + m1 [ 2 ] ## PLOTTING THE OPTIMUM MARKOWITZ POINT p l o t ( [ x1 ] , [ y1 ] , ’ ko ’ , m a r k e r s i z e =10) x l a b e l ( ’ mean ’ ) ylabel ( ’ std ’ ) ## FINDING OPTIMUM PORTFOLIO WITH CVXOPT wt = qp ( x1 ∗S , −pbar , G, h , A, b ) [ ’ x ’ ] p l o t ( dot ( pbar , m a t r i x ( wt ) ) , s q r t ( dot ( m a t r i x ( wt ) , S∗ m a t r i x ( wt ) ) ) , ’ go ’ , m a r k e r s i z e =10) show ( ) Listing 9. Putting it all together.