A Linked Linear Regression - Data Envelopment Analysis Model for
Transcription
A Linked Linear Regression - Data Envelopment Analysis Model for
Case Study: A Linked Linear Regression Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance Cindy Chao, Greg Higgins Michael McGee Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY Dallas, Texas May 14, 1994 LINKED LINEAR RECRESSION - DEA MODEL Contents Abstract.............................................................................................................................3 Introduction......................................................................................................................3 Whatis a Mutual Fund? .................................................................................................4 DEAFactors......................................................................................................................4 Model Development and Validation.............................................................................6 Examinationof Resiilts ...................................................................................................9 InvestmentPerformance............................................................................................... 10 Conclusion...................................................................................................................... 11 References....................................................................................................................... 12 ProfessionalReferences................................................................................................. 13 Appendix........................................................................................................................ 14 DEAFormulation .............................................................................................. 15 DEACode ........................................................................................................... 16 ForcastingCode ................................................................................................. 19 2 Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275 Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence. he current mutual fund market current assessment of a fund's T contains nearly $1.4 trillion management. DEA is a linear dollars; this rapid growth of mutual programming methodology that funds as a long—term investment computes relative efficiency scores instrument has led to an increased among a group of homogeneous demand for evaluation and forecasting decision—making units (DMUs). techniques. The performance of a Combined with linear regression, DEA mutual fund is directly linked to the becomes a prediction tool for longperformance of the management staff. term forecasting. The linked model not Data Envelopment Analysis (DEA) can only evaluates return on investment be used to evaluate managerial but also the quality of the fund's efficiency; however, this is only, a managerial team on a long—term basis. 3 LINKED LINEAR REGRESSION - DEA MODEL What is a Mutual Fund? goal of the linked model is forecasting of long-term performance, growth mutual funds were selected because their objectives emphasize the accumulation of capital gains, and they have the greatest long-term growth potential. A mutual fund is an investment company that pools the money of many institutional and individual investors into one professionally managed investment account. The fund manager creates a portfolio for the shareholders by buying a variety of "true" financial instruments, such as corporate stocks and bonds, and government and municipal securities, in accordance with the fund's particular investment objectives. Fund managers are professional investors who know the financial market and follow it daily with extensive research capabilities to effectively investigate all investment opportunities. The fund is always monitored, with individual securities constantly being bought, sold, and traded in order to optimize the investment objectives. The beauty of investing in sometimes twenty different securities lies in the protection of the investment through portfolio diversification. Mutual funds are divided into six major categories according to their investment objectives, which are outlined in the fund's prospectus, such as goals, level of risk, and the type and frequency of return desired. Since the DEA Factors The inputs and outputs of the envelopment analysis were made up of the main characteristics of a mutual fund. They were selected because they are important factors to investors and are directly linked to the performance of the fund, manager. The inputs selected were beta - a measure of risk, expense ratio, and loads & other fees. The outputs used were net asset value (NAV) per share, dividends, and capital gains. Inputs Beta is a measure of the fund portfolio's average volatility,or risk, relative to the S&P 500. The beta value of the market is a constant equal to 1.00. A fund with a beta value greater than 1.00 is more volatile than the market. It is expected to outperform the "average fund" in a bull (rising) market and bring in greater returns, while it is expected to fall more than 4 LINKED LINEAR REGRESSION the average fund in times of a bear (declining) market. A fund with a beta value less than 1.00 Possesses the characteristics of the converse si tuation. For example, a portfolio with a beta value of 0.75 is expected to be 25% less responsive to the general economic trend than the average fund and return 25% more conservative than the average fund. DEA MODEL brokerage costs, and transaction fees. This value is a Percentage of the Purchase price taken off either before the money is invested (front-end load) or after investment when the shareholder redeems the shares (backend load). The amount of the load is established by the managerial company, but is paid to the sales and transaction team. Some funds have a load of 0.00%, Expense ratio is a measure which meaning that they have the same relates expenses to the average net benefits as load funds, except they are assets for a period. It is calculated by sold directly by the fund company taking the annual expenses paid by the rather than through a sales broker so fund (including management and tha t no loads or other fees are incurred. i nvestment advisory fees, custodial With no-load funds, the investor must and transfer agency charges, legal fees, seek out and contact the investment and dis tribution costs) divided by the firm directly. There is a distinct average shares outstanding for the a dvantage to investing in no-load period. The greater the expense ratio, funds. For example, if we had invested the larger the percentage of the 1000 in a particular portfolio with a shareholder's investment gets paid out 7.50% front-end load, $75 would be directly to the management company. i :aken off the top, and we would be left The expense ratio does not include vith only $925 worth of shares in the loads or commissions paid for sales or f und. Had this been a no-load fund, the tra nsaction fees. ftill $1000 would have been utilized. Loads and other fees represent all C )utputs other charges made to the shareholder Net asset value per share represents which are not a ccounted for in the h DW much each share is worth. It is expense ratio. A load is a portion of the c Ll Culated by taking the total market offering price of the fund that pays for v due of a fund's shares (securities plus costs such as sales commissions, sh plus accrued earnings minus 5 LINKED LINEAR REGRESSION - DEA MODEL liabilities) divided by the number of in different ways. Shareholders can shares outstanding. No profits or losses reinvest dividends and capital are realized through NAV until the distributions if they elect to do so. fund is sold. Interest lies in the amount of increase or decrease in NAV from Model Development and Validation time of purchase to time of sale. The d ecision—making units (DMUs) Dividends are payments declared in this analysis were the 50 growth by a corporation's board of directors mutual funds. A prerequisite of DEA is and made to shareholders, usually on a homogeneous DMUs. By selecting the quarterly basis. A fund's dividends funds from the same sector (growth come directly from the dividends paid funds), this requirement is met. out by the individual securities within The DEA model uses the dual the portfolio. They are paid out equally formulation of the linear program as among the shareholders on a described by Charnes, Cooper and scheduled time frame. Rhodes (1978). The objective function Capital gains on a mutual fund are of this form is to minimize the the profits made from the sale of a objective value of the current DM10. If capital asset (such as a security, stock, a DM10 is efficient the optimal value is or bond) within the portfolio. The 1, while an inefficient DM10 is driven number used for capital gains is towards zero. The first step in the adjusted for losses and stock splits. The development of the linked model is the sale of an asset occurs when the generation of DEA scores for the manager decides that a security needs mutual funds being studied. The data to be sold for the benefit of the collected for the mutual funds is used portfolio. Capital gains are distributed as the inputs and outputs for the DEA to shareholders when they are realized code which is documented in the (when a security is sold and a profit is appendix. The DEA code is written made) and are paid out as capital using the SAS/OR package allowing distributions. :he DEA scores to be fed directly into Dividends and capital gains are t he regression phase of the model. treated as separate outputs because DEA scores are generated for the they are incurred, paid out, and taxed fifty mutual funds in the study using LINKED LINEAR REGRESSION - DEA MODEL IflI &A A*Iy.I. •_. the 1990 data collected from Morningstar Mutual Funds. At this point DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six DEA factors previously described. This regression model is unique from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DIv[LT. Previous work by Thanassoulis (1993) compared the DEA methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 = 0.8239). The full factor model included three non-significant factors at the 5% level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant with an R2 = 0.8213. As stated before, this model appears to be a good predictor; however, linear I S.. •.I I 5.7 5.5 •.4 .,a,.t.. V.,.. - *fl II •.5 1.2 P5(5 Figure 1: Residual vs. Predicted DEA Score regression models must be validated through diagnostic tests. One assumption in regression analysis underpins the entire methodology errors are a randomly distributed variable following a normal distribution with mean equal to zero, N(0,a2). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables. These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear relationship between the independent and dependent variables. This requires the examination of univariate statistics 7 LINKED LINEAR REGRESSION — DEA MODEL for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the distribution of each variable. From the univariate procedures, some of the variables demonstrated an exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the inputs and outputs was created by taking the natural log of each of the exponentially distributed variables. Once normality was confirmed through univarjate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R 2 = 0.8163 and was validated through the residual plot shown in Figure 2. Using the SAS regression report, the regression equation was created using the coefficients of the significant factors: beta, expense ratio, net asset value, and dividend. The next step is the creation of predicted efficiency scores. The 8 •I I,: I • , PfldItt*O V.,.. • .i — It LOLA •.S P.O AS POLO Figure 2: Residual vs. Predicted DEA Score—after linear traa common method for forecast model verification is the prediction of the past. The linked model is verified by using the input and output data for 1993 in the regression equation created from the 1990 data. The predicted values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on the mean difference. The test is conducted at the 1% level of significance. H0 : mean difference = 0; Ha: mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not significantly different from the actual DEA scores with 99% confidence. LINKED LINEAR REGRESSION - DEA MODEL Fund DEA 1990 1.00000 FTRNX Predicted 1993 1.00000 DEA 1993 0.62320 Error -0.37680 PRWAX 1.00000 0.78983 0.29415 -0.49568 G1NLX 1.00000 0.33207 0.23691 -0.09516 ACRNX 1.00000 1.00000 1.00000 0.00000 FDETX 0.41944 0.98900 1.00000 0.01100 CFIMX 0.73415 0.98032 1.00000 0.01968 LOMCX 0.32965 0.68927 1.00000 0.31073 Table 1: Efficient MUtUI t-unas Examination of Results In analyzing the results of the sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment, providing stable low-risk growth. A potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993. Table 1 shows the 1990 DEA, Predicted 1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993). Of the first four funds listed (all receiving efficient ratings in 1990), the future efficiency of three funds was correctly predicted. Our linked model predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the magnitude of inefficiency, only the state of its efficiency. Of the last four funds in the table (all receiving efficient ratings in 1993), three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both received a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different from 1.00. Thus our model predicts the funds that will be efficient in the future with 75% accuracy. LINKED LINEAR REGRESSION - DEA MODEL Table 2: Investment Simulation Results Investment Performance Since investors are concerned with the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the development of investment strategies. However, combined with linear regression, DEA gives us the ability to estimate the long-term efficiency of a fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual funds. The results are given in Table 2. The percent of appreciation we received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of funds recommended by the linked model at 60.41%. 10 LINKED LINEAR REGRESSION - DEA MODEL Conclusion The linked model allows us to broaden the range of applications for DEA. The linked model enables the prediction of future efficiency, which is of great benefit to investors because it provides more insight into the longterm viability of a mutual fund. The model also requires little historical data for each DM0 unlike other forecasting methods. Another benefit is the relative compactness of the model; very few predictors are required to explain a high degree of variability and achieve accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency of mutual funds. Only one fund that was efficient in 1990 remained in the efficient set in 1993. Linear regression and DEA combine to give us an accurate prediction for the future behavior of DMUs. An immediate extension to this research is the definition of an "ideal type" mutual fund. An optimization algorithm will be used to generate values for the regression factors and record the combinations which yield and efficient fund prediction. Once this benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the accuracy of the predictions through further modifications to the regression model using a weighted regression model or the inclusion of adaptive error control. 11 LINKED LINEAR REGRESSION - DEA MODEL References Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers. Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company. Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering. Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc. Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc. Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons. SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc. Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences. Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd. 12 LINKED LINEAR REGRESSION - DEA MODEL Professional References Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas. Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas. Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas. Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas. 13 LINKED LINEAR REGRESSION - DEA MODEL Appendix DEA Formulation Mat hematical formuation of the DEA model used in the Linked Model. DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files . One file contains the input and output factor data while the second file contains a listing of the mutualfrnd ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis. Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model. 14 LINKED LINEAR REGRESSION - DEA MODEL DEA Formulation A Inputs ou + tputs Slacks = e - 10 [X A. Mi n E) A-A:!^O A7 e^!o Mat heinaticalformulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, A,to construct the linear programming constraints. Minimize 0 Subject to: Inputs Beta: X,DMUl+XDMU2+... +xODMU5O-x51(0i'l):5OiO Load: xIDMUI+xDMU2+... +xo'DMU50-X5(0i1)!90O E_Pafio: xiDMUi +x2DMU2+... +x5oDMUso-x51(0i l):50 O Output$ NAV: X1DMUI+XDMU2+... +X5ODMU5O-X51(00) 011 Cap_Gain: xrDMUr +x2 DMU2+ ... +x5ODMU5Ox5(0rO) 0i • 1 Dividend: xrDMUr Mere +)(2 DMU2 + ... + X50 DMUso - X51 (0 0) All Variables 2: 0 a Is current ciecislon-maldng unit (DMU) Linear program solved by SAS for each DM11. 15 1 LINKED LINEAR REGRESSION —DEA MODEL DEA Code Part 1: Initialization This section reads the data files into a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two datasets are perfectly matched - each right hand side has exactly one corresponding theta vector. There is one right hand side and one theta vector for each DMU in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation. MICHAEL MCGEE SENIOR DESIGN GROUP '94 DEA ANALYZER FILENAME NDT MENAME CTAA'; FILENAME DAflAF9O DATAA'; OPTIONS UNESIZE.72; %LET SYMBOLS = ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX FPPTX FDE1X FDESI( FCN1( FBGRX FDVLX SRFCX FRGRX BRW1X SAMYX FDFFX FAGOX MLSVXALTFX ANEFX GPAFX PRWAX CGRWX OVOPX WTMGX NYVTX SEOGX BVALX BARAXAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGM( INNSI( TWHIX DWM( LMITX JHNX SRSPX CFIMX cWBX BOSSX DATA TICK; INFILENDT; ARRAY TICKR(50) $5; INPUT TICKRr); r MUTUAL FUND TICKER SYMBOLS DATA ONE; INFILE DAT; ARRAY FUND (50) INPUT JD_$ FUND() _TYPE_ $ _RHS.; r GENERATE THETA COLUMNS 1 DATA NEW-COL; SET ONE; ARRAY THETA (SO); ARRAY FUND(50); DO I ITO W. IF N_ = I THEN THETA(I) I; ELSE IF_N== 2 &_N_ <--4 THEN THETA(I .FUNO(I); ELSETHETA(I) =0; END DROP I; P GENERATE RIGHT HAND SIDE VECTORS DATA RANGE; ARRAY RH(50); ARRAY FUND(50); 1 SET ONE; DO I = ITO SE; IF _N_ = I THEN RH(I) MISSING; ELSE IF _N_ 4 THEN RH(I)= FUND([), ELSE RH(I) 0; END; DROP MISSING I FUNDI.FUND50; DATA PARAM; MERGE NEW_COL RANGE; P RENAMES THE FUNDSTO THE PROPER NASDAQTICKER SYMBOL! PROC DATASETS; MODIFY ONE; RENAME FUNDI =ACRNX FUND2=LOMCX FUND3= HRTVX FUND4 STCSX FUNDS=BEONX FUNDO=GINLX FUNDT=PARNX FUND8=VCAGA FUND9=DEVLC FUNDIO=AGBBX FUNDII= FPPIX FUNDI2= FDEIX 16 LINKED LINEAR REGRESSION - DEA MODEL FUNDI3 FDESX FUND14 FCNTX FUNDI5= FBGFO( FUND16 FDVLX FUND17 SRFCX FUNDI8 FRGRX FUNDI9 BRW1X FUND20SAMVX FUND21 FDFFX FUND22 FAGOX FUND23=MLSVX FUND24.ALTFX FUND26ANEFX EUND26 GPAFX FUND27. PRWAX FUND28=CGRWX FUND29. QVOPX FUND30.WTMGX FUND3I NWTX FUND SEX3X FUND33 BVALX FUND34BARAX FUND35=AFOPX FUND36= FTRNX FUND37AMRGX FUND38= DNLDX FUND39 HCRPX FUND40=NBSSX FUND41 PRGY( FUND42= INNDX FUND43 TWHIX FUND44=DWIVX FUND45=LMVTX FUND46=JHNGX FUND47 SRSPX FUND48CFX FUND49 CLMBX FUND5O= BOSSX; Part 2: DEA Macro This macro accepts one parameter: the index of the DMIJ to evaluate. The macro merges the input/output dataset with the appropriate theta and right hand side vectors. This dataset now contains the LP formulation for SAS to analyze. The procedure generates detailed printed reports for each of the NAME DEA DEC: AUTOMATES THE CREATION OF PROBLEM DATASET. MERGES THE CURRENT THETAAND RHS INTO THE COEFFICIENT MATRIX AND EXECUTES THE 1?, PAI*A: NDMU -THE INDEX NUMBER OF THE DMU TO BE ANALYZED %MACRO DEA(NOMU); DATA —NULL—; ARRAY TICKR(50) $S; SEE TICK: CALL S'PUT (FUND)(',PUT(TICKR{&NDMU}$S.)); RUN; DATA CUR—U; ARRAY THETA(50); ARRAY RH{5O}; IMIDMU; SET PARAM; CTHETA THETA(I); RHS_ RH{I}; KEEP CTHETA_RHS_; DATA PROBLEM; MERGE ONE CUR_DMU; TIELEI DATA ENVELOPMENT ANALYSIS'; TITLE2 CURREN1 E(AU= &FUNDX'; P EXECUTE THE LP' / PROC LP DATAPROBLEM POUT a &FUNDX; %MEND DEA 17 LINKED LINEAR REGRESSION - DEA MODEL Part 3: Loop Macro ' In order to iterate the execution of a SAS procedure without repetitive statements a macro was written to loop around the calls to the DEA macro. The loop macro accepts one parameter: the number of DMUs in study. Part 4: Extract Macro In addition to the written reports generated by the DEA macro, datasets were created in memory containing the results of the LPs. This macro combines the reports for each of the DMUs and extracts the DEA scores into a dataset which is then fed to the forecasting model. NAME: LOOP DEM. ITERATIVELY CALLS THE DEA MACRO PA4: LIMIT - NUMBER OF OMUS TO BE PROCESSED %MACRO LOOP(LIMIT) %DO INDEX I %TO &LIIrr; %DEA(&INDEX); %ENE; %MEND LOOE; NAME: EXTRACT DEW. EXTRACTS THE OBJECT WE VALUES FROM THE INDIVIDUAL FUND REPORTS AND CREATES DATASET CONTAINING THE IN FORMATION %MACRO EXTRACT; P MERGE LP OUTPUT DATASETS AND REMOVE UNIMPORtANT VARIABLES V DATA REEl; SEE &SYMBOLS; KEEP _VAR_ _VALUE. P EXTRACT OBJECTIVE FUNCTION VALUES i DATA RES2; SEE RESI; IF _VAft. OBJECr THEN SCORE = -VALUE-; ELSE DELETE; KEEP SCORE P GENERATE LISTING OF MUTUAL FUNDS DATA RESE; SEE TICK; ARRAY TIOKR(50) $5; DO I ITO 50 PCHANGE THIS TO 50 IN PRODUCTION RUN 1 NAME TICKR(I); OUTPUT; END; KEEP NAME; P MERGE CNTASEI' INTO ANAL FORM! DATA RESULTS; MERGE RESE RESE; PROC PRINT DATA=RESULTS; WEND EXTRACT; r EXECUTABLE CODE %LOOP(50); %EXTRACT; 18 LINKED LINEAR REGRESSION - DEA MODEL Forcasting Code P Mutual Fund Data Files 1 filename dati 'mfs93 data a'; filename dat2 'mfname data a'; filename dat5 'mfs90 data a'; proc rag; tle2 'Minimum Spec Model; model dea93 = E_Ratio NAV CAP_GAIN; P DEA Score Data Files 1 filename dat3 'score93 data a'; filename dat4 'score90 data a'; /* data info9o; infile dat5; array val(50}; input name $ val{}; option linesize 72 nonumber nodate; P Read 1993 Mutual Fund Data */ data one; mule dati; array val(50); input name $ val{}; proc transpose; P Transpose Matrix from IODMU to DMUIO run; proc transpose; P Transpose Matrix: from lO*DMU to DMUIO 1 run; proc datasets; P Assign proper names to lOs modify data 1; rename colt =Beta co12 = Load co13 = E_Rabe col4=NAV col5 = Cap-Gain co16 Dividend _Name_= Fund; 1990 Data Analysis */ 1 data scr9o; infile dat4; input dea90; data funds; P Create List of Fund Names / infile dat2; input Fund $; data scr93; r Read 1993 DEA scores from file infile dat3; input dea93; proc datasets; P Assign proper names to lOs modify data2; rename coil = Beta col2 Load co13 E_Rabo col4=NAV col5 = Cap-Gain coI6 Dividend _Name_= Fund; data comp90; merge data2 funds scr90; 1 tie1 '1993 DEA Analysis'; data comp93; set datal; merge datal funds scr93; tiflel 1990 DEA Analysis'; proc reg data=comp9o; tille2 'All Possible Regressions'; model dea9o= Beta Load E_Ratio NAV Cap-Gain Dividend /selection rsquare; proc reg data=cornp9o; tifle2 'Full Model'; model dea90= Beta Load E_Ratio NAV Cap- Gain Dividend; proc rag data=comp9o; tide2 'Minimum Spec'; model dea90= E_Ratio NAV CAP-GAIN; plot residual.(predicted. E_Ratio NAV Cap-gain); Run All possible regression models 1 proc reg; tle2 'All Possible Regression Models'; model dea93=Beta Load E_Ratio NAV Cap- Gain Dividend/ selecben=Rsquare; / /* Full Fit Model */ proc rag; title2'Full Model'; model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend; Forecasting Validation Model 1 P Create predicted values using forecast model compare to actual 1993 values. /* 19 1 LINKED LINEAR REGRESSION - DEA MODEL model Ldea = Beta Load E_Ratio LNAV LCG LDiv plot residuat(predicted.); data predict; set datal; Exp_dea=-0.177220E_Rao + 0.016477NAV + 0.152344Cap_Gain+0.279313; if Exp_dea> 1 then Exp_dea = 1; proc reg data = linear; model Idea = Beta E_Rao LNAV LDIV; data forecast; merge predict funds scr93 scr9o; Err = dea93 - exp_dea; keep fund dea9O exp_dea dea93 err; / Linear Regression Equation / data predict2; set trans93; PDEA 0.6874598eta - 0.525658E_Ratio + 0.753563LNAV + O.229900LD1V -2.930810; Exp_dea = exp(PDEA); if Exp_dea> I then Exp_dea = 1; tine 'Regression Predictions'; tiUe2'Non-Linear Model'; proc pnnt data=forecast; var fund dea90 dec93 exp_dea err; / Compute Forecast Errors data frcs; merge predict2 funds scr93 scr90; Err = dea93 - exp_dea; keep fund dea90 exp_dea dea93 err; proc plot; plot exp_dea * dea93; / added 4/18/94'1 P Plot 93 Prediction vs 90 Actual data fcas merge scr90 forecast; keep exp_dea dea90 dea93 fund; title 'Regression Predictions'; tille2'Linear Model'; proc print data=frcst2; var fund dea90 dea93 exp_dea err; tillel 'Predicted 93 vs. Actual 1990'; tiUe2 'Non-Linear Model'; proc plot data=fcas plot exp_dea * dea90; P Linear transform to improve predictions data linear; set comp90; Ldea = log(dea90); LNAV = log(NAV); it Cap—Gain = 0 then LCG = 0; else LCG log(Cap_Gain); LEA log(E_Ratio); if Dividend = 0 then LDIV = 0; else LDIV = LOG(Dividend); P Transform forecast inputs to linear model / data trans93; set comp93; Ldea log(dea93); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0; else LCG = log(Cap_Gain); LER = log(E_Ratio); if Dividend = 0 then LDIV = 0; else LDIV = LOG(Dividend); Title 'Linear Forecast Model'; title2 'Model Verification'; proc rag data = linear; 20 Case Study: A Linked Linear Regression Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance Cindy Chao, Greg Higgins Michael McGee Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY Dallas, Texas May 14, 1994 C LINKED LINEAR REGRESSION - DEA MODEL Contents Abstract.............................................................................................................................3 Introduction...................................................................................................................... 3 Whatis a Mutual Fund? .................................................................................................4 DEAFactors......................................................................................................................4 Model Development and Validation............................................................................. 6 Examinationof Results ...................................................................................................9 InvestmentPerformance............................................................................................... 10 Conclusion...................................................................................................................... 11 References....................................................................................................................... 12 ProfessionalReferences................................................................................................. 13 Appendix........................................................................................................................ 14 DEAFormulation .............................................................................................. 15 DEACode ........................................................................................................... 16 ForcastingCode ................................................................................................. 19 2 Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275 Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence. he current mutual fund market T contains nearly $1.4 trillion dollars; this rapid growth of mutual funds as a long-term investment instrument has led to an increased demand for evaluation and forecasting techniques. The performance of a mutual fund is directly linked to the performance of the management staff. Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency; however, this is only a current assessment of a fund's management. DEA is a linear programming methodology that computes relative efficiency scores among a group of homogeneous decision-making units (DMU5). Combined with linear regression, DEA becomes a prediction tool for longterm forecasting. The linked model not only evaluates return on investment but also the quality of the fund's managerial team on a long-term basis. '3 LINKED LINEAR REGRESSION - DEA MODEL What is a Mutual Fund? goal of the linked model is forecasting A mutual fund is an investment of long-term performance, growth company that pools the money of mutual funds were selected because many institutional and individual their objectives emphasize the accumulation of capital gains, and they investors into one professionally have the greatest long-term growth managed investment account. The fund manager creates a portfolio for potential. the shareholders by buying a variety of "true" financial instruments, such as DEA Factors The inputs and outputs of the corporate stocks and bonds, and government and municipal securities, envelopment analysis were made up of in accordance with the fund's the main characteristics of a mutual particular investment objectives. Fund fund. They were selected because they managers are professional investors are important factors to investors and are directly linked to the performance who know the financial market and follow it daily with extensive research of the fund manager. The inputs capabilities to effectively investigate all selected were beta - a measure of risk, investment opportunities. The fund is expense ratio, and loads & other fees. always monitored, with individual The outputs used were net asset value securities constantly being bought, (NAV) per share, dividends, and sold, and traded in order to optimize capital gains. the investment objectives. The beauty Inputs of investing in sometimes twenty Beta is a measure of the fund different securities lies in the protection portfolio's average volatility, or risk, of the investment through portfolio relative to the S&P 500. The beta value diversification. of the market is a constant equal to Mutual funds are divided into six 1.00. A fund with a beta value greater major categories according to their than 1.00 is more volatile than the market. It is expected to outperform investment objectives, which are outlined in the fund's prospectus, such the "average fund" in a bull (rising) as goals, level of risk, and the type and market and bring in greater returns, frequency of return desired. Since the while it is expected to fall more than 4 LINKED LINEAR REGRESSION - DEA MODEL Ll brokerage costs, and transaction fees. the average fund in times of a bear (declining) market. A fund with a beta This value is a percentage of the purchase price taken off either before value less than 1.00 possesses the characteristics of the converse the money is invested (front-end load) situation. For example, a portfolio with or after investment when the a beta value of 0.75 is expected to be shareholder redeems the shares (backend load). The amount of the load is 25% less responsive to the general economic trend than the average fund established by the managerial company, and return 25% more conservative than but is paid to the sales and transaction the average fund. team. Some funds have a load of 0.00%, Expense ratio is a measure which meaning that they have the same relates expenses to the average net benefits as load funds, except they are assets for a period. It is calculated by sold directly by the fund company rather than through a sales broker so taking the annual expenses paid by the that no loads or other fees are incurred. fund (including management and With no-load funds, the investor must investment advisory fees, custodial and transfer agency charges, legal fees, seek out and contact the investment and distribution costs) divided by the firm directly. There is a distinct advantage to investing in no-load average shares outstanding for the period. The greater the expense ratio, funds. For example, if we had invested the larger the percentage of the $1000 in a particular portfolio with a shareholder's investment gets paid out 7.50% front-end load, $75 would be taken off the top, and we would be left directly to the management company. The expense ratio does not include with only $925 worth of shares in the loads or commissions paid for sales or fund. Had this been a no-load fund, the transaction fees. full $1000 would have been utilized. Loads and other fees represent all Outputs Net asset value per share represents other charges made to the shareholder how much each share is worth. It is which are not accounted for in the calculated by taking the total market expense ratio. A load is a portion of the value of a fund's shares (securities plus offering price of the fund that pays for cash plus accrued earnings minus costs such as sales commissions, LINKED LINEAR REGRESSION - DEA MODEL in different ways. Shareholders can liabilities) divided by the number of shares outstanding. No profits or losses reinvest dividends and capital distributions if they elect to do so. are realized through NAV until the fund is sold. Interest lies in the amount Model Development and Validation of increase or decrease in NAV from The decision—making units (DMUs) time of purchase to time of sale. in this analysis were the 50 growth Dividends are payments declared mutual funds. A prerequisite of DEA is by a corporation's board of directors and made to shareholders, usually on a homogeneous DMUs. By selecting the funds from the same sector (growth quarterly basis. A fund's dividends funds), this requirement is met. come directly from the dividends paid The DEA model uses the dual out by the individual securities within the portfolio. They are paid out equally formulation of the linear program as described by Charnes, Cooper and among the shareholders on a Rhodes (1978). The objective function scheduled time frame. of this form is to minimize the Capital gains on a mutual fund are objective value of the current DMU. If the profits made from the sale of a a DMU is efficient the optimal value is capital asset (such as a security, stock, 1, while an inefficient DMU is driven or bond) within the portfolio. The towards zero. The first step in the number used for capital gains is adjusted for losses and stock splits. The development of the linked model is the generation of DEA scores for the sale of an asset occurs when the mutual funds being studied. The data manager decides that a security needs collected for the mutual funds is used to be sold for the benefit of the portfolio. Capital gains are distributed as the inputs and outputs for the DEA to shareholders when they are realized code which is documented in the appendix. The DEA code is written (when a security is sold and a profit is using the SAS/OR package allowing made) and are paid out as capital the DEA scores to be fed directly into distributions. the regression phase of the model. Dividends and capital gains are DEA scores are generated for the treated as separate outputs because fifty mutual funds in the study using they are incurred, paid out, and taxed n LINKED LINEAR REGRESSION - . S S S the 1990 data collected from Morningstar Mutual Funds. At this point DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six DEA factors previously described. This regression model is unique from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DMtJ. Previous work by Thanassoulis (1993) compared the DEA methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 = 0.8239). The full factor model included three non-significant factors at the 5% level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant with an R2 = 0.8213. As stated before, this model appears to be a good predictor; however, linear fl fLAW flED Figure 1: Residual vs. Predicted DEA Score regression models must be validated through diagnostic tests. One assumption in regression analysis underpins the entire methodology errors are a randomly distributed variable following a normal distribution with mean equal to zero, N(0,(Y). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables. These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear relationship between the independent and dependent variables. This requires the examination of univariate statistics S 7 0 DEA MODEL LINKED LINEAR REGRESSION - DEA MODEL S . S S S S for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the distribution of each variable. From the univariate procedures, some of the variables demonstrated an exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the inputs and outputs was created by taking the natural log of each of the exponentially distributed variables. Once normality was confirmed through univariate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R2 = 0.8163 and was validated through the residual plot shown in Figure 2. Using the SAS regression report, the regression equation was created using the coefficients of the significant factors: beta, expense ratio, net asset value, and dividend. The next step is the creation of predicted efficiency scores. The Figure 2: Residual vs. Predicted DEA Score—after linear transform common method for forecast model verification is the prediction of the past. The linked model is verified by using the input and output data for 1993 in the regression equation created from the 1990 data. The predicted values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on the mean difference. The test is conducted at the 1% level of significance. H0: mean difference = 0; Ha: mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not significantly different from the actual DEA scores with 99% confidence. 8 0 LINKED LINEAR REGRESSION - flEA 1990 1.00000 FTRNX Fund DEA MODEL Predicted 1993 1.00000 flEA 1993 Error 0.62320 -0.37680 PRWAX GINLX 1.00000 1.00000 0.78983 0.33207 0.29415 0.23691 -0.49568 -0.09516 ACRNX 1.00000 1.00000 1.00000 0.00000 FDETX CFIMX 0.41944 0.73415 1.00000 1.00000 0.01100 0.01968 LOMCX 0.32965 0.98900 0.98032 0.68927 1.00000 10.31073 Table 1: Efficient Mutual Funds Examination of Results In analyzing the results of the sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment, providing stable low-risk growth. A potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993. Table 1 shows the 1990 DEA, Predicted 1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993). Of the first four funds listed (all receiving efficient ratings in 1990), the future efficiency of three funds was correctly predicted. Our linked model predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the magnitude of inefficiency, only the state of its efficiency. Of the last four funds in the table (all receiving efficient ratings in 1993), three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both received' a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different from 1.00. Thus our model predicts the funds that will be efficient in the future with 75% accuracy. LINKED LINEAR REGRESSION - DEA MODEL Table 2: Investment Simulation Results Investment Performance Since investors are concerned with the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the development of investment strategies. However, combined with linear regression, DEA gives us the ability to estimate the long-term efficiency of a fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual funds. The results are given in Table 2. The percent of appreciation we received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of funds recommended by the linked model at 60.41%. 10 LINKED LINEAR REGRESSION - Q Conclusion The linked model allows us to broaden the range of applications for DEA. The linked model enables the prediction of future efficiency, which is of great benefit to investors because it provides more insight into the longterm viability of a mutual fund. The model also requires little historical data for each DMIJ unlike other forecasting methods. Another benefit is the relative compactness of the model; very few predictors are required to explain a high degree of variability and achieve accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency of mutual funds. Only one fund that was efficient in 1990 remained in the efficient set in 1993. Linear regression and DEA combine to give us an accurate prediction for the future behavior of DMUs. DEA MODEL An immediate extension to this research is the definition of an "ideal type" mutual fund. An optimization algorithm will be used to generate values for the regression factors and record the combinations which yield and efficient fund prediction. Once this benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the accuracy of the predictions through further modifications to the regression model using a weighted regression model or the inclusion of adaptive error control. 11 LINKED LINEAR REGRESSION - DEA MODEL References Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers. Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company. Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering. Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc. Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc. Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons. SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc. Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences. Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd. 12 LINKED LINEAR REGRESSION - [ii DEA MODEL Professional References Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas. Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas. Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas. Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas. 13 LINKED LINEAR REGRESSION - DEA MODEL Appendix DEA Formulation Mat hematicalformuation of the DEA model used in the Linked Model. DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files. One file contains the input and output factor data while the second file contains a listing of the mutualfund ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis. Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model. 14 F] LINKED LINEAR REGRESSION - DEA MODEL DEA Formulation A 0 Inputs = -------- ] S Ex + Slacks = = outputs A. Mine A7-A!^O A7 e^!o Mat heinatical formulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, 2,to construct the linear programming constraints. Minimize 0 Subject to: Inputs Beta: xi'DMUi +xDMU2+... +xsoDMUso - xs(E 1)<-E 0 Load: xIDMUI+x'DMU2+...+x5oDMU50-x5I(&1):50O E_Patio: XIDMUI + X2 'DMU2 +... + x5a • DMU50 - Xsi (& 1)--^0 '0 Outputs NAV: xDMU +X'DMU2+... +XDMU50-x51(00) 01 Cap_Gain: xDMUi + X2 • DMU2 + + Xo DMU5O - Xti (0 0) 0 * 1 Dividend: xiDMUi + X2 • DMU2 +... + x50 DMU5O X51 (e 0) 0 1 All Variables ^! 0 where 01$ current declslon-inaldng unit (OMU) Linear program solved by SAS for each DM11. 15 fl LINKED LINEAR REGRESSION - S DEA Code Part 1: Initialization MICHAEL MCGEE SENIOR DESIGN GROUP '94 DEA ANALYZER This section reads the data files into S DEA MODEL a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two datasets are perfectly matched - each right hand side has exactly one corresponding theta vector. There is one . right hand side and one theta vector for each DMtJ in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation. FILENAME NOT MFNAME DATAA'; FILENAME DAT F9O DATA A'; OPTIONS LINESIZE72; %LEI SYMBOLS ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX FPPTX FDETX FDESX FCND( FBGRX FDVLX SRFCX FRGRX BIRWIX SAMVX FDFFX FAGOX MLSVX ALTFX ANEFX GPAFX PRWAX CGRWI( OVOPX WTMGX NWTX SEOCX BVALX BARANAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGWI( INNDX TWHIX DW1X LMVTX JHNGX SRSPX CFIMX CLMBX BOSSX DATA TICK; WILE NOT; ARMY TICKR(so)ss; INPUT TICKR?); rU MUTUAL FUND TICKER SYMBOLS •1 DATA ONE; INFILE DAT; ARRAY FUND (50) INPUT _ID_ $ FUND (} _TYPE_ $ _RHS_; P GENERATE THETACOLUMNS DATA NEW-COL; SET ONE; ARRAY THETA (SO); ARRAY FUND(5O); DOIITO5 IF _N_ = I THEN THETA(I) = I; ELSE IF _N_ >=2 & _N_ <=4 THEN THETA{I}= -FUND(I); ELSE THETA(I) 0; END DROP I; P GENERATE RIGHT HAND SIDE VECTORS DATA RANGE; ARRAY RH(50); ARRAY FUND(50); SET ONE; DO I = ITO 5 IF _N_ I THEN RH(I} = MISSING; ELSE IF _N_>4 THEN RH(I)= FUND{I}; ELSE RH(I) 0; END; DROP MISSING I FUNDI-FUND5O; DATA PARAM; MERGE NEW-COL RANGE; P RENAMES THE FUNDS TO THE PROPER NASDAQ TICKER SYMBOL / PROC DATASETS; MODIFY ONE; RENAME FUNDI ACRNX FUNO2 LOMCX FUNDS HRT\ FUND4 = STCSX FUNDS = BEONX FUND6 = GINLX FUND7= PARNX FUND8 = VCAGA FUNDS = DEVLC FUNDIO= AGBBX FUNDII= FPPT)( FUNDI2= FDETX S iri LINKED LINEAR REGRESSION - DEA MODEL FUNDI3= FDESX FUNDl4 FCN1X FUNDIS= FBGRX FUNDI6= FDVLX FUNDI7= SRFCX FUND118 FRGRX FUNDI9= BRWIX FUND20= SAM'fX FUND2I FDFFX FUND22 FAGOX RJND23= MLSVX FUND24ALTFX FUND25=ANEFX FUND26= GPAFX FUND27= PRWAX FUND28 CGRY( FUND29 QVOPX FUND30= WTMGX FUND3I= NYVTX FUND32 SEcGX FUND33 BVALX FUND34= BARAX FUND35=AFGPX FUND36 FTRNX FUND37 AMRGX FUND38DNLDX FUND39= HCRPX FUND40=NBSSX FUND4I= PRGWX FUND42 INNDX FUND431WHD( FUND44 DWIvX FUND45= LMVD( FUND46= JHNGX FUND47= SRSPX FUND48= CFX FUND49= CLMB)( FUNDSO. BOSSX; Part 2: DEA Macro 'NAME: DEA DESC: AUTOMATES THE CREATION OF PROBLEM DATASET. MERGES THE CURRENT THETAAND RHS INTOTHE COEFFICIENT MATRIXAND EXECUTESTHE LP. PAPM:NDMU-THEINDEXNUMBEROFTHEDMUTOBEANALYZED This macro accepts one parameter: the index of the DM1.1 to evaluate. The macro merges the input/output dataset with the appropriate theta and right hand side vectors. This dataset now contains the LP formulation for SAS to analyze. The procedure generates detailed printed reports for each of the Dtv[(Js. •1 %MACRODEA(NDMU); DATA _NULL_; ARMY TICKR(So)ss; SErTK; CALL SYMPUT ('FUNDX',PUr(TICKR(&NDMU},SS.)); RUN; DATA CURU; ARRAY THETA{50}; ARRAYRH(5O); h&NDMU; SET PARM; CTHEIA=A TH ErA{I}; _RHS_ = RH{I}; KEEP CTHETA_RHS_; DATA PROBLEM; MERGE ONE CU R_DMU; TITLEI 'DATA ENVELOPMENT ANALYSIS'; TITLE2 CURRENT OMU= &FUNDX'; / EXECUTETHE LP '/ PROC LP DATA=PROBLEM POUT &FUNDX; %MEND DEA; 17 LINKED LINEAR REGRESSION - DEA MODEL Part 3: Loop Macro In order to iterate the execution of a SAS procedure without repetitive statements a macro was written to loop around the calls to the DEA macro. The loop macro accepts one parameter: the number of DMUs in study. - P NAME LOOP DESC: ITERATIVELY CALLS THE DEA MACRO PAPM: LIMIT • NUMBER OFDMU5TOBEPROCESSED %WCROLOOLIM %DO INDEX I%TO8LAIIT; %DEA(AJNDEX); %EN %MENDLOO Part 4: Extract Macro In addition to the written reports generated by the DEA macro, datasets were created in memory containing the results of the LPs. This macro combines the reports for each of the DMUs and extracts the DEA scores into a dataset which is then fed to the forecasting -1 1 model. "NAME EXTRACT DESC: EXTRACTS THE OBJECTIVE VALUES FROM THE INDIVIDUAL FUND REPORTSANDCREATESA DATASET CONTAINING THE INFORMATION -/.MACRO EXTRACT; P MERGE LP OUTPUT DATASETSAND REMOVE UNIMPORTANT VARIABLES 'I DATA RESI; SET &SYMBOLS; KEEP VARVALUE; P EXTRACT OBJECTFIE FUNCTION VALUES / DATARES2; SET RESI; IF -VAR OBJECT'THEN SCORE =_VALUE_; ELSE DELETE; KEEP SCORE; P GENERATE LISTING OF MUTUAL FUNDS/ DATA RES3; SET TICK; ARRAY TICKR{SO} $5; ITO 501, PCHANGE THIS TO 50 IN PRODUCTION RUN 'I DO NAME TICKR{l); OUTPUT; END; KEEP NAME; P MERGE DATASET INTO FINAL FORM •/ DATA RESULTS; MERGE RES3 RES2; PROC PRINT DATMRESULTS; %MEND EXTRACT; P EXECUTABLE CODE %LOOP(50); %EXTRACT; 18 LINKED LINEAR REGRESSION - DEA MODEL Forcasting Code 1 Mutual Fund Data Files 'I filename datl 'm1s93 data a'; filename daf2 'mfname data a'; filename dat5 'mfs go data a'; proc reg; tle2 'Minimum Spec Model'; model dea93 = E_Ratio NAV CAP_GAIN; 1 DEA Score Data Files 1 filename dat3 'score93 data a'; filename dat4 'score90 data a'; 1* 1990 Data Analysis / option linesize = 72 nonumber nodate; data info9o; intile dat5; array val(50); input name $ val{}; /* Read 1993 Mutual Fund Data •/ data one; del I; infile array val(50); input name $val{}; proc transpose; run; proc transpose; / Transpose Matrix: from IODMU to DMU*IO *1 run; J* Transpose Matrix: from IODMU to DMUIO proc datasets; /* Assign proper names to lOs 1 modify data2; rename coil = Beta co12 = Load co13 = E_Ratio col4=NAV col5 Cap—Gain co16 = Dividend _Name— = Fund; proc datasets; P Assign proper names to lOs 1 modify data 1; rename coil = Beta co12 = Load co13 = E_Ratio col4=NAV co15 = Cap—Gain co16 = Dividend _Name— = Fund; data scr90; infile dat4; input dea9o; data funds; P Create List of Fund Names 1 infile dat2; input Fund $; data comp9o; merge data2 funds scr9o; data scr93; / Read 1993 DEA scores from file 1 infile dat3; input dea93; title 1 '1990 DEA Analysis'; proc reg dala=comp9o; tifle2 'All Possible Regressions'; model dea90= Beta Load E_Ratio NAV Cap —Gain Dividend /selection = rsquare; titlel '1993 DEAAnalysis'; data comp93; set datal; merge data 1 funds scr93; proc reg data=comp90; tie2 'Full Model'; model dea9o= Beta Load E_Ratio NAV Cap —Gain Dividend; 1 Run All possible regression models 1 proc reg data=comp90; title2 'Minimum Spec'; model dea9o= E_Ratio NAV CAP_GAIN; plot residual.(predicted. E_Ratio NAV Cap—gain); proc reg; title2 'All Possible Regression Models'; model dea93=Beta Load E_Ratio NAV Cap_Gain Dividend/ selection=Rsquare; 1 Full Fit Model 1 proc reg; title2'Full Model'; model dea93=Beta Load E_Ratio NAV Cap—Gain Dividend; P Forecasting Validation Model / P Create predicted values using forecast model compare to actual 1993 values. 19 I LINKED LINEAR REGRESSION - DEA MODEL data predict; set data 11; Exp_dea=-0.1 77220E_Rao + 0.016477NAV + 0. 152344*Cap_Gain+0.279313; model Ldea = Beta Load E_Ratio LNAV LCG LDiv; plot residual.(predicted.); proc reg data = linear; model Idea = Beta E_Rao LNAV LDIV; if Exp_dea> 1 then Exp_dea = 1; data forecast; merge predict funds scr93 scr9o; Err = dea93 - exp_dea; keep fund dea90 exp_dea dea93 err; S I' Linear Regression Equation 1 data predict2; set trans93; PDEA = 0.687459Beta - 0.525658E —Ratio + 0.753563LNAV + 0.229900LD1V -2.930810; Exp_dea = exp(PDEA); if Exp_dea> 1 then Exp_dea = 1; title 'Regression Predictions'; title2'Non-Linear Model'; proc print data=forecast; var fund dea90 dea93 exp_dea err; 1 Compute Forecast Errors 1 data frcs; merge predict2 funds scr93 scr9o; Err dea93 - exp_dea; keep fund dea90 exp_dea dea93 err; proc plot; plot exp_dea dea93; 1 added 4/18/94/ 1 Plot 93 Prediction vs 90 Actual 1 data fcast merge scr90 forecast; keep exp_dea dea90 dea93 fund; title Regression Predictions'; title2'Linear Model'; proc print data=frcst2; var fund dea90 dea93 exp_dea err; title 1 'Predicted 93 vs. Actual 1990'; tifle2 'Non-Linear Model'; proc plot data=fcast plot exp_dea * dea90; /* Linear transform to improve predictions 1 data linear; set comp90; Ldea = log(deago); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0; else LCG = log(Cap_Gain); LEA = log(E_Ratio); if Dividend = 0 then LDIV = 0; else LDIV = LOG(Dividend); / Transform forecast inputs to linear model */ data trans93; set comp93; Ldea = log(dea93); LNAV log(NAV); if Cap—Gain = 0 then LCG = 0; else LCG = Iog(Cap_Gain); LEA = log(E_Aatio); if Dividend = 0 then LDIV = 0; else LDIV LOG(Dividend); Title 'Linear Forecast Model'; title2 'Model Verification'; proc reg data = linear; 20