Problem Set #2 - Agricultural and Resource Economics
Department of Agricultural and Resource Economics
University of California at Berkeley
Steve Buck and Sylvan Herskowitz
Spring 2015

ENV ECON 118 / IAS 118 – Introductory Applied Econometrics
Assignment 2
Due February 24, at beginning of class

This assignment should be completed using Stata, and you are encouraged to use a .do file to write your code for the exercise. To write notes in the .do file that Stata will not read as commands, type an “*” at the beginning of each line in which you’ve written a comment. This will help you keep track of the purpose of each command, which question you are trying to answer, etc.

The first thing you should do in your .do file is to change directories so that Stata knows where to find the data that you downloaded and saved to your computer. To do this, you will use a command that is something like the following (but your file path will vary):

    cd "C:\....\EEP_118\PS2_Liberia"

Then turn on a log file:

    log using PS_2.txt, text replace

This time you are opening a Stata-formatted “.dta” file, so you will use the “use” command instead of “insheet”, which was needed to open a .csv spreadsheet in the last assignment. Your command should look like this:

    use "PS2_LiberiaData_S2015.dta", clear

Finally, don’t forget to close your log file at the end of the do-file by including the line “log close”.

Note: For Exercise 1 you need to submit your log file from your Stata work in addition to your written answers.

Exercise 1:

The data for this exercise come from a research project Sylvan was working on in Monrovia, Liberia. The sample consists entirely of women who came to apply for a factory job at the beginning of 2014. Women who were eligible to be hired were then given a survey covering a wide range of topics, including family composition, education background, household consumption, time use, and earnings. We have drawn a sample from the total of 720 women in the survey.
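The setup steps described before Exercise 1 can be collected into a do-file skeleton. The following is only a sketch: the path is left elided exactly as in the assignment and must be replaced with your own, and the `capture log close` line is an added assumption (a common way to avoid an error when a log from an earlier run is still open), not something the assignment requires.

```stata
* PS2 do-file skeleton (a sketch; replace the elided path with your own)
cd "C:\....\EEP_118\PS2_Liberia"

* Assumption, not required by the assignment:
* close any log left open by an earlier run, ignoring the error if none is open
capture log close

* Start the log that must be submitted with Exercise 1
log using PS_2.txt, text replace

* Load the Stata-formatted data set
use "PS2_LiberiaData_S2015.dta", clear

* ... commands for Exercise 1 go here ...

* Close the log at the end of the do-file
log close
```

Every line starting with “*” is a comment, as described above, so the skeleton can be annotated with the question number each command answers.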
The PS2_LiberiaData_S2015.dta file includes the following variables:

    age: Age of respondent
    secondary: Whether the respondent has completed secondary school (1=yes, 0=no)
    hoh: Respondent’s status as head of household (1=household head, 0=otherwise)
    hhmbrs: Number of household members
    children: Number of children in household under age 14
    elderly: Number of elderly in household above the age of 64
    exptot: Total monthly household expenditures (in Liberian Dollars)
    expclothes: Monthly expenditures spent on clothing (in Liberian Dollars)
    expfood: Monthly food expenditures per household (in Liberian Dollars)
    expnonfood: Monthly non-food expenditures per household (in Liberian Dollars)

Also, note that at the time of the survey the exchange rate was 80 Liberian Dollars per US Dollar.

1. First, write a short paragraph that describes your data. In particular:

a) How many women are in your data set? What is their average age? What is their full age range? How many women have completed secondary school? How many are head of their household?

b) Construct a variable exptotpc equal to total expenditures per capita in US$ using the given exchange rate. Plot a histogram of this constructed variable. Include this histogram in your solutions. What is the range of household total expenditures per capita? Hint: it may be easier to first generate a variable for expenditures per capita in Liberian Dollars and then convert it into US dollars.

c) For each household, calculate the proportion of household expenditures spent on clothes. What is the mean proportion of household expenditures on clothes? What is the median? How does this proportion differ between the households of women who have completed secondary school and those who have not?

d) Construct a variable expfoodpc equal to total expenditures on food in US$ per person in the household. Plot a scatter diagram of expfoodpc on exptotpc, the variable constructed in part (b). Include this in your solutions.
How does this amount spent on food relate to spending overall?

Stata tips: To count observations, use the command count. To create a new variable named var1, use the command generate var1 = exp, where exp is the expression that defines it. Open the “Data Editor (Browse)” to see and check what you have done. To create a scatter plot of variables y on x, use scatter y x, and to create a histogram of a variable x, use the command histogram x. The command tabulate lists all values a variable takes in the sample and the number of times it takes each value. To summarize data for a specified subset of the observations, you can use summarize along with an “if” statement.

2. Estimate the following model of food and total expenditures:

    (1)  $\text{expfood} = \beta_0 + \beta_1\,\text{exptotal} + u$

a) Interpret your $\hat{\beta}_1$, remembering the triplet S(ign), S(ize), and S(ignificance), though you don’t need to comment on significance in this problem set.

b) How much of the variation in food expenditures is explained by variations in total expenditures?

c) What is the predicted level of food expenditure for a household with total monthly expenditures of US$175?

3. Now estimate the following models of food and non-food expenditures. Note that you will first need to generate new, logged versions of the variables in the regression model (Stata hint: this can be done by using generate and the ln() function):

    (2)  $\log(\text{expfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + u$

    (3)  $\log(\text{expnonfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + u$

a) In this sample, what are the elasticities of food and non-food expenditures with respect to total expenditures? Comparing the estimates of $\hat{\beta}_1$ in equations (2) and (3), do your results seem reasonable? (Hint: What does it mean for an elasticity to be greater or less than 1?)

b) Using the results from (2), how would you expect food expenditure to change if total expenditures decreased by 15%?

4. We will now explore the role of household size in food consumption:

    (4)  $\log(\text{expfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + \beta_2\,\text{hhmbrs} + u$

a) Estimate equation (4) and interpret your results.
Is it a better statistical relationship than equation (2)?

b) How did your estimate of $\hat{\beta}_1$ change between equation (2) and equation (4)? Without performing any calculations, what information does this give you about the correlation between total expenditures and household size? (Explain your reasoning in no more than 4 sentences.)

c) Predict the expected value of food expenditures of a household with 4 members and total expenditures of US$150 per month using your results from (4).

5. A country’s dependency ratio is the ratio of old and young dependents (dependents are those not in the labor force) to the working-age population. A similar measure could be constructed for the household:

    (5)  $\text{hhdr} = \dfrac{\text{hh members under 14 or over 64}}{\text{hh members age 15–64}}$

Equation (4) doesn’t quite capture how the composition of a household, i.e. the characteristics of the members, is associated with expenditure. You suspect that the log of total expenditures is negatively correlated with the log of the household dependency ratio, controlling for household size.

a) Write an equation you could estimate that would test this hypothesis.

b) Estimate the equation in part (a). Does the evidence from the regression support or contradict the hypothesis? Why? What might be driving this correlation?

c) How many observations are in the regression estimated in 5(b)? Why is this different from what was estimated in 4(a)?

Exercise 2

Population growth is a critical factor affecting a country’s growth, development, and ability to manage its natural resources. A researcher is interested in the relationship between different personal and household characteristics and women’s fertility outcomes. She has information from a survey of women in a country she is interested in.
She has the following pieces of data in her data set:

    children: number of children that the respondent has
    income: household income in thousands of dollars per year
    ageatmarriage: age of the woman when she first got married
    education: number of years of education for the respondent
    parentsincome: parents’ income of respondent

In order to explore this further, the researcher runs a few regressions. She gets the following results from her regressions:

(1) children = 2.45 – 1.57 income – 0.76 ageatmarriage
              (0.33) (0.14)        (0.04)
    R^2 = 0.37

(2) children = 2.42 – 0.93 income – 0.52 ageatmarriage – 0.33 education
              (0.28) (0.09)        (0.02)               (0.11)
    R^2 = 0.53

(3) children = 2.48 – 0.65 income – 0.38 ageatmarriage – 0.22 education – 0.53 parentsincome
              (1.74) (0.58)        (0.54)               (0.18)           (0.18)
    R^2 = 0.55

Remember that the numbers in parentheses beneath the regression equation are the standard errors for the estimated parameter value. The R^2 is also reported for each regression model.

a) Looking at the results in the first regression (1), do the signs of the coefficients on income and ageatmarriage make sense? Are they statistically significant? Do you trust the estimated magnitude of these coefficients? Why or why not?

b) The researcher then decides to add the variable education to the regression model and estimates equation (2). Comparing the results from equation (2) and the original ones from equation (1), what problem do you think the first model may have had? How can you tell?

c) The researcher then decides to go further and add parentsincome to the other variables and estimates equation (3). She notes that the R^2 has improved slightly over the R^2 in equation (2). If there are other notable advantages or disadvantages of equation (3) relative to (2) then point them out. Overall, do you think model (3) is an improvement over the one in equation (2)? Why or why not?
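As a reminder of how the reported standard errors can be used to judge significance, the ratio below is the usual t-statistic for the null hypothesis that a coefficient is zero. The numerical illustration uses the intercept of equation (1), which none of the questions ask about. The rule of thumb that an absolute value above roughly 2 indicates significance at the 5% level in a large sample is an assumption added here, not something stated in the assignment.

```latex
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)},
\qquad
\text{e.g. intercept in (1): } t = \frac{2.45}{0.33} \approx 7.4
```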