Consider a regression of y on x1, x2 and x3. You are told that x1 and x3 are positively

correlated but x2 is uncorrelated with the other two variables.

(a) [3] What, if anything, can you say about the relative magnitudes of the estimated

coefficients on each of the three explanatory variables?

(b) [6] What, if anything, can you say about the precision with which we can estimate

these coefficients?
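
A standard textbook expression may help organize thinking about part (b). It is a sketch under assumptions the question does not state: a linear model with homoskedastic errors of variance \sigma^2, conditioning on the sample values of the regressors. Here SST_j is the total sample variation in x_j, and R_j^2 is the R-squared from regressing x_j on the other explanatory variables; both symbols are introduced for illustration only.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{SST_j \,\bigl(1 - R_j^2\bigr)},
\qquad
SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2,
\qquad j = 1, 2, 3 .
```

The expression depends on the correlation among the regressors only through R_j^2, which is where the information given about x1, x2 and x3 enters.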

- Consider a regression of y on two explanatory variables, x1 and x2, which are potentially

correlated (though not perfectly). Say that x1 can take on any value between 1 and - A researcher draws a random sample of observations, with information on y, x1

and x2. She runs a regression on this sample, which we refer to as regression A.

She then takes the subset of the data where x1 is restricted to only take values between

1 and 50, but there is no restriction on x2. She runs another regression, which we refer

to as regression B.

(a) [4] Do you expect the estimated coefficients to differ between regressions A and

B? Explain.

(b) [5] Do you expect any difference in the precision of the estimated coefficients

between regressions A and B? Explain.

- [5] Consider a regression on three explanatory variables, x1, x2 and x3. Consider two

possible F-tests for

(a) the joint significance of x2 and x3.

(b) the joint significance of x1, x2 and x3.

In which of these is the null hypothesis more likely to be rejected? Provide both an

intuitive and a mathematical explanation.

- [10] A researcher has data on a number of different airline routes. In particular, for

each route, he observes the average fare on the route, avfare and the number of airlines

operating on the route, carriers. The researcher would like to obtain a measure of

consumer demand on each route, but is unable to access this variable. He is concerned

that the lack of this variable will cause a simple regression of avfare on carriers to

generate an estimate that is biased downward. Do you agree?

In order to answer this question you will need to make assumptions based on economic

analysis. State your assumptions clearly, even if you are unsure about them, and then

explain whether the slope coefficient will indeed be biased, and if so, how.

- Suppose you need to estimate a regression using matrix algebra. A potential X matrix

of explanatory variables is as shown below. There is also a [5 × 1] vector, y, of the

dependent variable that is not shown.

x1   x2   x3   x4   x5
 2    4    8   52   44
 2    7   14   47   48
 3    2    4   51   23
 6    0    0   49   47
 8    6   12   47   58
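
One way to inspect the matrix above before attempting any regression is to compute the column rank of a candidate set of regressors. The sketch below is only an illustration: the dictionary `X` and the helper `column_rank` are hypothetical names, not part of the question, and exact fractions are used so that no rounding tolerance is needed.

```python
from fractions import Fraction

# Columns of the X matrix exactly as given in the table above.
X = {
    "x1": [2, 2, 3, 6, 8],
    "x2": [4, 7, 2, 0, 6],
    "x3": [8, 14, 4, 0, 12],
    "x4": [52, 47, 51, 49, 47],
    "x5": [44, 48, 23, 47, 58],
}


def column_rank(names):
    """Rank of the matrix whose columns are the named variables.

    Hypothetical helper: plain Gaussian elimination on exact fractions.
    """
    n_obs = len(X[names[0]])
    rows = [[Fraction(X[name][i]) for name in names] for i in range(n_obs)]
    rank = 0
    for col in range(len(names)):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next(
            (r for r in range(rank, n_obs) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # this column adds nothing new
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, n_obs):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


for subset in (["x1", "x2", "x3"], ["x1", "x2", "x4"], ["x1", "x2", "x4", "x5"]):
    print(subset, "columns:", len(subset), "rank:", column_rank(subset))
```

Comparing the rank with the number of columns in each subset indicates whether the corresponding X'X matrix can be inverted.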

(a) [4] Suppose you want to regress y on only x1, x2 and x3. Explain what problems,

if any, you would encounter in doing so.

(b) [4] Suppose you want to regress y on x1, x2 and x4. Explain whether you would

encounter any problem doing so. If not, describe in detail how you would go

about using matrix algebra to do so.

(c) [4] Suppose you want to regress y on x1, x2, x4 and x5. However, a colleague points

out that this is not a square matrix. Explain whether this is a valid concern. If

not, describe in detail how you would go about using matrix algebra to do so.

- [5] Consider the results from regressing the log of wages, lwage, on years of education,

educ and years spent in the workforce, exper :

lwage = 0.532 + 0.094 educ + 0.026 exper    (1)

Suppose that each additional year of education must necessarily reduce workforce experience by one year. What is the marginal effect of an additional year’s education on

wages? [Use the exact, rather than the approximate, percentage interpretation.]

- [5] Consider using a one-tailed as well as a two-tailed test of a null hypothesis regarding an estimated regression coefficient. For the same significance level, which test is

more stringent? In other words, for which test does rejection of the null hypothesis

automatically imply rejection in the other test as well, but not vice versa? Explain.

- Explain whether, and how, the critical value of a t-test, for a given significance level,

is affected by:

(a) [3] The number of observations.

(b) [3] The number of explanatory variables.

- Consider the following OLS model of women’s labour force participation:

inlf = β0 + β1 kids0_2 + β2 kids2_6 + β3 educ + β4 faminc

where inlf is a dummy variable for whether the woman is in the labour force, kids0_2

is the number of children aged between 0 and 2, kids2_6 is the number of children

between the ages of 2 and 6, educ is the number of years of education of the woman

and faminc is the family’s total income.

Suppose a researcher is interested in testing whether the presence of infants (age 0 to 2)

has the same effect on being in the labour force as the presence of children aged between

2 and 6. The alternative hypothesis is one-sided: infants have a greater disincentive

effect than slightly older children.

(a) [4] Write down the null and alternative hypotheses formally.

(b) [8] Can you directly test the null hypothesis if given the coefficients and standard

errors from this regression? If so, explain how. If not, explain what you would

need to do instead.

- Suppose you have data on a sample of recent house sales, for each of which you observe

the house price, price, the square feet of the house, sqrft, and a dummy variable for

whether the house has a garage or not, garage. You want to run a regression of the

log of the house price, with the other two variables as explanatory variables.

(a) [4] Briefly explain what signs you expect to find on the two slope coefficients.

(b) [3] Do you expect a positive or negative correlation between sqrft and garage?

Explain your reasoning.

(c) [6] Suppose you run the regression, and also include an interaction between the

two right-hand side variables, and obtain the following results (standard errors

not shown):

log(price) = 2.465 + 0.64 log(sqrft) + 0.08 garage + 0.011 log(sqrft) × garage    (2)

Calculate the marginal effect of having a garage for a 2000 square foot house.

(d) [6] Suppose you also now obtain data on coveredlot, which is the total built upon

area of the lot. In other words this is the sum of the square footage of the house

and of the garage. Would it be sensible to add this variable to the regression

above? What problems, if any, might you encounter in estimating the magnitude

or precision of the coefficient on this new variable?

- [8] You have monthly data on gasoline prices in two cities, Vancouver and Toronto,

for the years 2006–2010. In each month of each year, you observe the average price

of gasoline in each city. Prices in Vancouver are usually higher than in Toronto, but

the cities follow similar price trends, as prices rise in the summer months and respond

similarly to demand and cost shocks. However, there are month-to-month fluctuations

for various reasons.

Starting from January 1, 2008, Vancouver imposed a carbon tax which was expected

to be reflected in higher gasoline prices. Explain how you would use a difference-in-differences framework to estimate the effect of the carbon tax. Carefully define any

new variables you need based on the data provided. Then, write down a line of R

code which will run the regression you need. Make sure you point out which regression

coefficient is the desired estimate.
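
The arithmetic behind a difference-in-differences estimate can be sketched in a few lines. This is a minimal illustration, not the requested R answer: the prices below are made up for the example (they are not data from the question), and the names `rows`, `did_estimate`, `treated` and `post` are assumptions introduced here. The sketch builds a city dummy and a post-2008 dummy, averages prices in each of the four city-by-period cells, and takes the difference of the two before/after differences. In a regression of price on the two dummies and their product, the coefficient on the product term equals this same quantity.

```python
from statistics import mean

# Hypothetical monthly prices (cents per litre); NOT data from the question.
# Each record is (city, year, month, price).
rows = [
    ("Vancouver", 2007, 6, 110.0),
    ("Vancouver", 2007, 12, 104.0),
    ("Vancouver", 2008, 6, 131.0),
    ("Vancouver", 2008, 12, 125.0),
    ("Toronto", 2007, 6, 100.0),
    ("Toronto", 2007, 12, 94.0),
    ("Toronto", 2008, 6, 116.0),
    ("Toronto", 2008, 12, 110.0),
]


def did_estimate(rows, treated_city="Vancouver", start_year=2008):
    """Difference-in-differences from the four cell means.

    treated = 1 for the city that imposed the tax, 0 otherwise.
    post    = 1 for months on or after January of start_year, 0 otherwise.
    """
    cells = {}
    for city, year, month, price in rows:
        treated = int(city == treated_city)
        post = int(year >= start_year)
        cells.setdefault((treated, post), []).append(price)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(1, 1)] - m[(1, 0)]  # Vancouver: after minus before
    change_control = m[(0, 1)] - m[(0, 0)]  # Toronto: after minus before
    return change_treated - change_control


print(did_estimate(rows))
```

With these made-up numbers Vancouver's average price rises by 21 and Toronto's by 16, so the sketch attributes the remaining 5 to the tax, under the usual assumption that the two cities would otherwise have followed parallel trends.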