- Consider a regression of y on x1, x2 and x3. You are told that x1 and x3 are positively
correlated but x2 is uncorrelated with the other two variables.
(a) [3] What, if anything, can you say about the relative magnitudes of the estimated
coefficients on each of the three explanatory variables?
(b) [6] What, if anything, can you say about the precision with which we can estimate
these coefficients?
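For part (b), a useful reference point is the textbook OLS variance formula under the Gauss-Markov assumptions (homoskedastic errors): Var(β̂_j) = σ² / [SST_j (1 − R_j²)], where SST_j is the total sample variation in x_j and R_j² comes from regressing x_j on the other explanatory variables. The Python sketch below simply encodes that formula. The function names and the numbers in the usage lines are hypothetical.

```python
def coef_variance(sigma2: float, sst_j: float, r2_j: float) -> float:
    """Sampling variance of the OLS slope on x_j: sigma^2 / (SST_j * (1 - R_j^2)).

    sigma2 -- error variance (assumed homoskedastic)
    sst_j  -- total sample variation in x_j, sum((x_ij - mean(x_j))**2)
    r2_j   -- R-squared from regressing x_j on the other explanatory variables
    """
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1); R_j^2 = 1 means perfect collinearity")
    return sigma2 / (sst_j * (1.0 - r2_j))


def vif(r2_j: float) -> float:
    """Variance inflation factor: how much collinearity multiplies Var(beta_j hat)."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical inputs: same sigma^2 and SST_j, different degrees of collinearity.
print(coef_variance(1.0, 100.0, 0.0))  # regressor uncorrelated with the others
print(coef_variance(1.0, 100.0, 0.8))  # regressor correlated with another regressor
print(vif(0.8))
```

Holding σ² and SST_j fixed, the variance rises with R_j². An R_j² of zero leaves it at its smallest value for that σ² and SST_j.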
- Consider a regression of y on two explanatory variables, x1 and x2, which are potentially
correlated (though not perfectly). Say that x1 can take on any value between 1 and …
A researcher draws a random sample of observations, with information on y, x1
and x2. She runs a regression on this sample, which we refer to as regression A.
She then takes the subset of the data where x1 is restricted to only take values between
1 and 50, but there is no restriction on x2. She runs another regression, which we refer
to as regression B.
(a) [4] Do you expect the estimated coefficients to differ between regressions A and
B? Explain.
(b) [5] Do you expect any difference in the precision of the estimated coefficients
between regressions A and B? Explain.
- [5] Consider a regression on three explanatory variables, x1, x2 and x3. Consider two
possible F-tests for
(a) the joint significance of x2 and x3.
(b) the joint significance of x1, x2 and x3.
In which of these is the null hypothesis more likely to be rejected? Provide both an
intuitive and a mathematical explanation.
- [10] A researcher has data on a number of different airline routes. In particular, for
each route, he observes the average fare on the route, avfare, and the number of airlines
operating on the route, carriers. The researcher would like to obtain a measure of
consumer demand on each route, but is unable to access this variable. He is concerned
that the lack of this variable will cause a simple regression of avfare on carriers to
generate an estimate that is biased downward. Do you agree?
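The direction of omitted-variable bias follows from an exact in-sample identity: the slope from the short regression (avfare on carriers alone) equals the long-regression slope on carriers plus the omitted variable's coefficient times the slope from regressing the omitted variable on carriers. The Python sketch below checks that identity on simulated data. The data-generating process, including treating demand as if it were observed and the signs and sizes of every parameter, is a hypothetical placeholder, not an answer to the economic question.

```python
import random


def demeaned_cross(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def ovb_decomposition(y, x1, x2):
    """Return (short slope, long slope on x1, long slope on x2, delta1).

    short slope -- OLS slope from regressing y on x1 alone
    long slopes -- OLS slopes from regressing y on x1 and x2 together
    delta1      -- OLS slope from regressing x2 on x1
    """
    s11, s22, s12 = demeaned_cross(x1, x1), demeaned_cross(x2, x2), demeaned_cross(x1, x2)
    s1y, s2y = demeaned_cross(x1, y), demeaned_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return s1y / s11, b1, b2, s12 / s11


# Hypothetical data-generating process (placeholder signs, not real airline markets).
random.seed(0)
n = 500
demand = [random.gauss(0, 1) for _ in range(n)]
carriers = [3 + 0.8 * d + random.gauss(0, 1) for d in demand]
avfare = [300 - 10 * c + 25 * d + random.gauss(0, 5) for c, d in zip(carriers, demand)]

short, b1, b2, delta1 = ovb_decomposition(avfare, carriers, demand)
print(f"short {short:.3f} = long {b1:.3f} + {b2:.3f} * {delta1:.3f}")
```

The sign of the bias is the sign of b2 × delta1, so the argument rests on the assumed sign of demand's effect on fares and of its correlation with the number of carriers.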
In order to answer this question you will need to make assumptions based on economic
analysis. State your assumptions clearly, even if you are unsure about them, and then
explain whether the slope coefficient will indeed be biased, and if so, how.
- Suppose you need to estimate a regression using matrix algebra. A potential X matrix
of explanatory variables is as shown below. There is also a [5 × 1] vector, y, of the
dependent variable that is not shown.
    x1  x2  x3  x4  x5
     2   4   8  52  44
     2   7  14  47  48
     3   2   4  51  23
     6   0   0  49  47
     8   6  12  47  58
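Whether (X′X)⁻¹ exists depends on whether the chosen columns are linearly independent, that is, whether the submatrix has full column rank. The Python sketch below computes the column rank of each candidate subset exactly, using rational arithmetic. It uses the columns exactly as printed and does not add a column of ones for an intercept, which is an assumption. The helper names `rank` and `check` are hypothetical.

```python
from fractions import Fraction

# Columns of the X matrix shown above (5 observations each).
X_COLS = {
    "x1": [2, 2, 3, 6, 8],
    "x2": [4, 7, 2, 0, 6],
    "x3": [8, 14, 4, 0, 12],
    "x4": [52, 47, 51, 49, 47],
    "x5": [44, 48, 23, 47, 58],
}


def rank(columns):
    """Column rank via exact Gauss-Jordan elimination on Fractions."""
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    r = 0
    for c in range(len(columns)):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the columns already pivoted
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def check(names):
    """Return (rank, number of columns) for the named subset of X."""
    cols = [X_COLS[name] for name in names]
    return rank(cols), len(cols)


for names in (["x1", "x2", "x3"], ["x1", "x2", "x4"], ["x1", "x2", "x4", "x5"]):
    r, k = check(names)
    print(names, "rank", r, "of", k, "columns")
```

A rank below the number of columns means X′X is singular for that subset, so the usual (X′X)⁻¹X′y formula cannot be applied to it.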
(a) [4] Suppose you want to regress y on only x1, x2 and x3. Explain what problems,
if any, you would encounter in doing so.
(b) [4] Suppose you want to regress y on x1, x2 and x4. Explain whether you would
encounter any problem doing so. If not, describe in detail how you would go
about using matrix algebra to do so.
(c) [4] Suppose you want to regress y on x1, x2, x4 and x5. However, a colleague points
out that this is not a square matrix. Explain whether this is a valid concern. If
not, describe in detail how you would go about using matrix algebra to do so.
- [5] Consider the results from regressing the log of wages, lwage, on years of education,
educ, and years spent in the workforce, exper:
$$\widehat{\mathit{lwage}} = 0.532 + 0.094\,\mathit{educ} + 0.026\,\mathit{exper} \qquad (1)$$
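Under the constraint posed in the question (each extra year of education removes one year of experience), the change in predicted lwage is the education coefficient minus the experience coefficient, and the exact percentage effect on wages is 100·(exp(Δ) − 1). The Python sketch below does that arithmetic with the estimates from equation (1); the function name is hypothetical.

```python
import math

# Estimates from equation (1).
B_EDUC = 0.094
B_EXPER = 0.026


def exact_pct_effect(delta_log: float) -> float:
    """Exact percentage change in wages implied by a change in predicted lwage."""
    return 100.0 * (math.exp(delta_log) - 1.0)


# One more year of education and, by the stated constraint, one fewer year of experience.
delta = B_EDUC * 1 + B_EXPER * (-1)
print(f"approximate: {100 * delta:.2f}%, exact: {exact_pct_effect(delta):.2f}%")
```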
Suppose that each additional year of education must necessarily reduce workforce experience by one year. What is the marginal effect of an additional year’s education on
wages? [Use the exact, rather than the approximate, percentage interpretation.]
- [5] Consider using a one-tailed as well as a two-tailed test of a null hypothesis regarding an estimated regression coefficient. For the same significance level, which test is
more stringent? In other words, for which test does rejection of the null hypothesis
automatically imply rejection in the other test as well, but not vice versa? Explain.
- Explain whether, and how, the critical value of a t-test, for a given significance level,
is affected by:
(a) [3] The number of observations.
(b) [3] The number of explanatory variables.
- Consider the following OLS model of women’s labour force participation:
$$\mathit{inlf} = \beta_0 + \beta_1\,\mathit{kids0\_2} + \beta_2\,\mathit{kids2\_6} + \beta_3\,\mathit{educ} + \beta_4\,\mathit{faminc}$$
where inlf is a dummy variable for whether the woman is in the labour force, kids0_2
is the number of children aged between 0 and 2, kids2_6 is the number of children
between the ages of 2 and 6, educ is the number of years of education of the woman
and faminc is the family’s total income.
Suppose a researcher is interested in testing whether the presence of infants (aged 0 to 2)
has the same effect on labour force participation as the presence of children aged between
2 and 6. The alternative hypothesis is one-sided: infants have a greater disincentive
effect than slightly older children.
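A test of β1 = β2 needs the standard error of the difference β̂1 − β̂2, and that standard error involves the estimated covariance between the two estimates, not only their individual standard errors. The Python sketch below encodes the formula; every number in it is hypothetical, and the function name is illustrative.

```python
import math


def t_stat_difference(b1: float, b2: float, se1: float, se2: float, cov12: float) -> float:
    """t statistic for H0: beta1 = beta2, i.e. (b1 - b2) / se(b1 - b2).

    se(b1 - b2) = sqrt(se1^2 + se2^2 - 2 * cov12), where cov12 is the estimated
    covariance between the two coefficient estimates.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("implied variance of b1 - b2 must be positive")
    return (b1 - b2) / math.sqrt(var_diff)


# Hypothetical estimates: identical coefficients and standard errors,
# two different covariances, hence two different t statistics.
t_zero_cov = t_stat_difference(-0.30, -0.10, 0.05, 0.04, 0.0)
t_pos_cov = t_stat_difference(-0.30, -0.10, 0.05, 0.04, 0.001)
print(t_zero_cov, t_pos_cov)
```

Because the same coefficients and standard errors can yield different t statistics depending on cov12, the coefficient table alone does not pin down the test. Under the one-sided alternative β1 < β2, a sufficiently negative t statistic is evidence against the null.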
(a) [4] Write down the null and alternative hypotheses formally.
(b) [8] Can you directly test the null hypothesis if given the coefficients and standard
errors from this regression? If so, explain how. If not, explain what you would
need to do instead.
- Suppose you have data on a sample of recent house sales, for each of which you observe
the house price, price, the square footage of the house, sqrft, and a dummy variable for
whether the house has a garage, garage. You want to run a regression of the
log of the house price on the other two variables.
(a) [4] Briefly explain what signs you expect to find on the two slope coefficients.
(b) [3] Do you expect a positive or negative correlation between sqrft and garage?
Explain your reasoning.
(c) [6] Suppose you run the regression, and also include an interaction between the
two right-hand side variables, and obtain the following results (standard errors
not shown):
$$\widehat{\log(\mathit{price})} = 2.465 + 0.64\log(\mathit{sqrft}) + 0.08\,\mathit{garage} + 0.011\log(\mathit{sqrft})\times\mathit{garage} \qquad (2)$$
Calculate the marginal effect of having a garage for a 2000 square foot house.
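Because garage is a dummy, its effect in equation (2) depends on house size through the interaction: the change in predicted log(price) from garage = 0 to garage = 1 is 0.08 + 0.011·log(sqrft). The Python sketch below evaluates this at 2,000 square feet. It assumes "log" denotes the natural logarithm, and it converts log points to an exact percentage with 100·(exp(·) − 1); the function name is hypothetical.

```python
import math

# Coefficients read from equation (2); "log" is assumed to be the natural log.
B_GARAGE = 0.08
B_INTERACT = 0.011


def garage_effect(sqrft: float) -> float:
    """Change in predicted log(price) when garage goes from 0 to 1, at a given sqrft."""
    return B_GARAGE + B_INTERACT * math.log(sqrft)


effect = garage_effect(2000)
pct = 100.0 * (math.exp(effect) - 1.0)  # exact percentage change in price
print(f"{effect:.4f} log points, {pct:.2f}% exact")
```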
(d) [6] Suppose you also now obtain data on coveredlot, which is the total built-upon
area of the lot. In other words, this is the sum of the square footage of the house
and of the garage. Would it be sensible to add this variable to the regression
above? What problems, if any, might you encounter in estimating the magnitude
or precision of the coefficient on this new variable?
- [8] You have monthly data on gasoline prices in two cities, Vancouver and Toronto,
for the years 2006–2010. In each month of each year, you observe the average price
of gasoline in each city. Prices in Vancouver are usually higher than in Toronto, but
the cities follow similar price trends, as prices rise in the summer months and respond
similarly to demand and cost shocks. However, there are month-to-month fluctuations
for various reasons.
Starting from January 1, 2008, Vancouver imposed a carbon tax which was expected
to be reflected in higher gasoline prices. Explain how you would use a difference-in-differences framework to estimate the effect of the carbon tax. Carefully define any
new variables you need based on the data provided. Then, write down a line of R
code which will run the regression you need. Make sure you point out which regression
coefficient is the desired estimate.
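A difference-in-differences estimate compares the before/after change in Vancouver with the before/after change in Toronto. The Python sketch below computes this 2×2 contrast from monthly rows. The indicator definitions (treated = Vancouver, post = from January 2008 on) follow the question, but the prices are entirely hypothetical and are constructed so that the true effect is 7. In a regression of price on the two indicators and their interaction, the OLS coefficient on the interaction reproduces this same difference of cell means.

```python
import math
from statistics import mean


def did_estimate(rows):
    """2x2 difference-in-differences from (city, year, month, price) rows.

    treated = 1 for Vancouver; post = 1 for months from January 2008 onward.
    Returns (Vancouver post - pre) - (Toronto post - pre) in mean prices.
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for city, year, _month, price in rows:
        treated = 1 if city == "Vancouver" else 0
        post = 1 if year >= 2008 else 0
        cells[(treated, post)].append(price)
    m = {cell: mean(prices) for cell, prices in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical monthly prices, 2006-2010: a shared trend and seasonal cycle,
# a constant Vancouver premium, and a built-in tax effect of 7 from 2008 on.
rows = []
for year in range(2006, 2011):
    for month in range(1, 13):
        common = 100 + 2 * (year - 2006) + 5 * math.sin(2 * math.pi * month / 12)
        rows.append(("Toronto", year, month, common))
        tax = 7 if year >= 2008 else 0
        rows.append(("Vancouver", year, month, common + 10 + tax))

did = did_estimate(rows)
print(round(did, 6))
```

The shared trend, the seasonal cycle and the fixed Vancouver premium all cancel in the double difference, which is the parallel-trends logic the question describes.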