
INTRODUCTORY ECONOMETRICS – P12205

MODULE OUTLINE - 2010/2011

Module Convenor: Dr. Saileshsingh Gunessee Date: February 2011

Contents

I. Module Details
II. Module Aims, Objectives and Outcomes
III. Module Assessment
IV. Module Content
V. Module Text
VI. Access to Resources
VII. Assessed Lab Workshops
VIII. Module Topics & Readings
IX. Teaching Plan
X. Exam Paper and Feedback
XI. Exam Formula Sheet
XII. Exercise Classes Questions
XIII. MCQ Sheet

I. Module Details

Module Convenor: Dr Saileshsingh Gunessee

Semester: Second

Number of Credits: 10

Contact Details:
Office: Room AB378
E-mail: saileshsingh.gunessee@nottingham.edu.cn
Office Hours: Wednesday 14.00-15.00

Programme Classes: Lectures + Computer Workshops

Lectures: Monday 9:00-10:00, Venue: TB 329
Wednesday 10:00-11:00, Venue: TB 329

Computer Workshops: See Timetable

Teaching Material: Accessed through WebCT

II. Module Aims, Objectives and Outcomes

The main aims of this module are: to introduce students to the principles, uses and interpretation of the regression analysis most commonly employed in applied economics; and to provide participants with sufficient knowledge of regression methods to critically evaluate and interpret empirical research.

On completion of this module students should be able to: demonstrate understanding of the assumptions and properties underlying regression analysis and the principle of ‘least squares’; interpret and manipulate the coefficients of multiple regression and performance criteria; conduct diagnostic checks of the validity of regression equations and their coefficients; and appreciate the problems of misspecification, multicollinearity, heteroscedasticity and autocorrelation.

III. Module Assessment

The module will be assessed by a mixture of project work and examination. The breakdown is: 5 assessed computer lab workshops – 20%; exam – 80%.

To get an idea of the examined part of the module, see the previous exam papers (with feedback) that I have posted on WebCT. For more information on the assessed lab workshops, see Section VII below.

IV. Module Content

The following is the list of topics we are going to cover:

1. Simple Regression Analysis
2. Multiple Regression Analysis
3. Dummy Variables
4. Heteroscedasticity
5. Autocorrelation

The first two topics constitute the core of the principles of econometrics, and we will spend the bulk of our lectures on their sub-topics. The ‘Module Topics & Readings’ section provides more information. In addition, the timing of the classes (lectures and computer workshops) can be found under the ‘Teaching Plan’.

V. Module Text

The main text for this module is

Dougherty, C. (2007). Introduction to Econometrics, third edition, Oxford.

My lectures will follow this text fairly closely. However, for some parts of the module content I may refer to the older 2nd edition and other texts. When I do so, students will be informed, and some of these readings will be provided on WebCT (see Module Topics & Readings for more information). The book by Dougherty is somewhat mathematically oriented. For a gentler introduction to the principles of econometrics, students may want to consult

Gujarati, D.N. and Porter, D.C. (2009). Essentials of Econometrics, 4th edition, McGraw-Hill. {You can also consult the 3rd edition by Gujarati, D.N. (2006).}

Other texts which you may find worth consulting as additional reading are:

• Gujarati, D.N. and Porter, D.C. (2009). Basic Econometrics, 5th edition, McGraw-Hill, New York. (Good and extensive introductory book with applications.) {You could also try the 4th edition.}

• Ramanathan, R. (2002). Introductory Econometrics with Applications, 5th edition, The Dryden Press, Fort Worth. (Good introductory book with applications.)

• Wooldridge, J. (2009). Introductory Econometrics: A Modern Approach, 4th edition, South Western, Cengage Learning, Mason, OH. (Alternative to Dougherty.) {The 3rd edition can also be used.}

• Hill, R.C., Griffiths, W.E. and Lim, G.C. (2008). Principles of Econometrics, 3rd edition, John Wiley & Sons, New York. (Alternative to Dougherty.)

• Ashenfelter, O., Levine, P.B. and Zimmerman, D.J. (2003). Statistics and Econometrics: Methods and Applications, J. Wiley, New York. (Basic introduction.)

Of course, you are encouraged not to limit your reading to these texts. The Library has a good range of econometrics texts. In addition, there are several texts on Stata, from basic to more advanced level. In our case, however, we need introductory-level texts, for example:

• Hamilton, L. (2009). Statistics with Stata, Brooks/Cole, Cengage Learning, Belmont, California.

• Adkins, L.C. and Hill, R.C. (2008). Using Stata for Principles of Econometrics, 3rd edition, Wiley, Hoboken, NJ.

All the books are available in the library in the ordinary loan, reference and 24-hour loan sections. Students are encouraged to read beyond the lecture handouts and the main reading. Success in this module, and in particular gaining a first-class mark, depends very much on this.

VI. Access to Resources

Lectures for this course begin in the week beginning Monday 21st February.

Various course materials, including lecture slides, will be available on WebCT.

You will not be registered on WebCT for the course until enrolment lists are finalised, and this may not happen until several weeks into the semester. Before then, however, you may self-register as follows:

SELF REGISTRATION ON WEBCT

• On the internet, go to: webct.nottingham.ac.uk

• Click on: course list (NB: do not log in first)

• Click on: China

• Click to expand UNNC: Business School

• Scroll down to Introductory Econometrics (P12205)

• Click on the icon to open the registration page

• Enter Username and Password (these are your university username and password)

• Click on: Register

VII. Assessed Lab Workshops

As indicated above, there are 5 assessed computer lab workshops. At the outset, let me state that because the computer workshops are assessed, attending them and completing the assigned problems at the workshop is compulsory if you want the marks. These sessions will be held from timetable week 33 (teaching week 7) up to and including timetable week 37 (teaching week 11). The Academic Service Office will already have allocated you to a particular group (see your timetable for this). According to the timetable there are four groups: IE-1, IE-2, IE-3 and IE-4. Make sure you attend the group you have been allotted to; there is no excuse for not doing so.

Each group has a session once per week in the same lab and at the same time in timetable weeks 33-37. The exception is group IE-1: because of the public holiday on 2nd May, that group's class will be held on Tuesday instead (again, see your timetable). The first session (CW1) will be an introductory session on how to use the econometric software STATA, which will help you complete several problems in the following sessions. At each of the remaining sessions (CW2-CW5) you will receive a ‘problem set’ and an ‘MCQ sheet’, and your task will be to answer the questions on the MCQ sheet (these MCQ sheets are the ones you are familiar with, and a sample is available at the end of the outline). Note that the assessment for the first session will be carried out in the second session, so you will be given two assessment exercises in session 2 (CW2). The CWs will test your knowledge of the course as well as your ability to use econometric software. Instructors are there to assist, but will obviously not tell you the answers, nor confirm whether your proposed answer is correct. You may work individually or with other members of the group, as you wish. The tentative CW plan is as follows (you will be informed of any changes):

Computer Workshop   Topics [Lectures]                              Demonstration (D) / Assessment (A)
CW1                 All                                            D
CW2                 Core: Simple & Multiple Regressions [L1-L8]    A (CW1 + CW2)
CW3                 Multicollinearity + Wald Test + Core           A (CW3)
CW4                 Dummy Variables + Variables Omission + Core    A (CW4)
CW5                 Heteroscedasticity + Autocorrelation + Core    A (CW5)

It should go without saying that if you prepare the topics beforehand you will work faster and more accurately, and will therefore be more likely to complete the questions correctly and comfortably in the time allowed. You must hand both the MCQ sheet and the problem set to the instructor before you leave at the end of the session, and your sheet will be marked. Each session will end promptly at 10 minutes to the hour, as per University conventions. If you do not hand in your sheet before the end of the session, your work will not be marked, i.e. you will get a mark of zero.

So make sure you are available for each of your group's 5 sessions. Note that instructors will only take to the labs enough ‘problem sets’ and ‘MCQ sheets’ for the individuals on the register for that session, and the instructor will check this at the start. Therefore, if you do not attend the session you are allotted to, you will not be able to complete an MCQ sheet and you will not be awarded a mark for that session. To be clear: if you do not turn up at your allotted time, you will receive a mark of zero for that session. It therefore follows that you should check your priorities carefully for those particular dates in those 5 weeks (so check these dates in your diary). If you have valid extenuating circumstances that cause you to miss one or more of your allotted sessions, you will need to contact the NUBS-China Senior Tutor (Mr Craig Fleming). If, and when, he approves your claim, he will notify me as module convenor, and I will then get in touch with you to arrange a replacement session. Replacement sessions will take place separately from the scheduled lab sessions, most likely in timetable week 37. If you do not have valid extenuating circumstances and you miss your designated class, you will receive a mark of zero for that class. There will be no opportunity to retake that class at an alternative date.

The sessions are mostly open-book sessions, so you can bring any material that may help you answer the questions set. As a ‘to bring’ list, I recommend: the main textbook; lecture handouts; the first lab session handout (this will be provided in the first lab workshop, CW1, and will give details on how to use STATA, with examples); a calculator; and a pencil (for the MCQ sheet).

Let me repeat: the computer workshops are assessed, and non-attendance or non-completion of the ‘problem sets’ for these computer workshops will result in a mark of zero.

VIII. Module Topics & Readings

Topic 1: Simple Regression Analysis

Lectures: L1-L6

Topic content: Purpose of econometrics; principles of regression analysis; various approaches to understanding & estimating a relationship (graphical, correlation and regression) and the differences between them; derivation of linear regression coefficients (using the OLS method); interpretation of a regression equation; assumptions of the classical linear regression model; regression coefficients as random variables; BLUE properties of OLS estimators (demonstration of linearity & unbiasedness); sampling distribution and variances of regression coefficients; statistical inference on regression coefficients (confidence intervals & hypothesis testing using both the significance and p-value approaches); goodness of fit; different functional forms & transformation of variables.

What you need to know?

- What is econometrics about?

- Differences between graphical, correlation and regression analyses in understanding & estimating an economic relationship.

- You are expected to understand the principle behind econometric analysis and hence the intuition behind the ordinary least squares method.

- You need to know & understand how to derive the OLS estimators and thus need to know the formulae for the regression coefficients. You also need to be able to apply these formulae for a given dataset.

- You need to know the assumptions that make up the CLRM and the intuition behind them.

- You need to understand how to derive the unbiasedness property of OLS estimators and (only) understand the linearity & minimum variance properties. You also need to understand why the BLUE properties are useful.

- You need to understand how a regression coefficient is a random variable and how this gives the sampling distribution of the regression coefficient. Furthermore, you need to appreciate how the shape of the distribution is determined.

- You should be able to understand the principle behind hypothesis testing and be able to perform a test. That is, you should know what exactly we are testing for with regard to a regression equation, how to formulate the null & alternative hypotheses carefully, and how to interpret your findings.

- You are expected to be able to construct confidence intervals for regression coefficients and understand the principle behind them.

- You need to understand how the R-squared provides a measure of goodness of fit and hence how to interpret it for an estimated regression.

- You need to be able to transform a non-linear regression equation into a linear one. You also need to understand concepts such as elasticity and marginal effects, and how to compute them for different functional forms.
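The OLS formulae and inference steps listed above can be made concrete with a short Python sketch. It computes b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b1 = Ȳ − b2X̄, then the standard error of b2, the t statistic for H0: β2 = 0, a confidence interval, R², and the elasticity of a linear function at a point. The function names and the five-observation dataset are hypothetical illustrations, not course material, and the critical t value is assumed to come from statistical tables (3.182 for 3 degrees of freedom, 5% two-sided). In the workshops STATA's `regress` command reports these quantities directly; the sketch only makes the arithmetic explicit.

```python
from math import sqrt


def ols_simple(x, y):
    """Return (b1, b2) for the fitted line y_hat = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx             # slope
    b1 = y_bar - b2 * x_bar    # intercept: the fitted line passes through (x_bar, y_bar)
    return b1, b2


def inference(x, y, t_crit):
    """Slope standard error, t statistic for H0: beta2 = 0, confidence interval and R-squared.

    t_crit is the critical t value for n - 2 degrees of freedom, read from tables.
    """
    n = len(x)
    b1, b2 = ols_simple(x, y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in residuals)          # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    s2 = rss / (n - 2)                            # unbiased estimator of sigma^2
    se_b2 = sqrt(s2 / sum((xi - x_bar) ** 2 for xi in x))
    return {
        "b1": b1,
        "b2": b2,
        "se_b2": se_b2,
        "t": b2 / se_b2,
        "ci": (b2 - t_crit * se_b2, b2 + t_crit * se_b2),
        "r2": 1 - rss / tss,
    }


def elasticity_linear(b2, x0, y_hat0):
    """Elasticity of Y with respect to X in a linear model, evaluated at (x0, y_hat0).

    For a double-log model the elasticity would simply be b2.
    """
    return b2 * x0 / y_hat0


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    res = inference(x, y, t_crit=3.182)   # 3 degrees of freedom, 5% two-sided
    print(res)
    print(elasticity_linear(res["b2"], 3, res["b1"] + res["b2"] * 3))
```

With these made-up data the sketch gives b1 = 2.2, b2 = 0.6, R² = 0.6 and se(b2) = √0.08 ≈ 0.283, so t ≈ 2.12 < 3.182 and H0 is not rejected at 5%; correspondingly the 95% interval, roughly (−0.30, 1.50), contains zero. At X = 3 (fitted Ŷ = 4.0) the elasticity is 0.6 × 3 / 4.0 = 0.45.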

Readings

L1-L2: Dougherty – Chapter 1, all sections except Section 1.7

L3: Dougherty (3rd edition) – Chapter 2.2

• Dougherty (2nd edition)
  – Section 3.1: Regression coefficients as random variables (To be provided)
  – Section 3.4: Unbiasedness of regression coefficients (To be provided)

• Gujarati-Porter (2009). Basic Econometrics
  – Section 3.2: CLRM assumptions (Available in library)
  – Section 3.4: Gauss-Markov theorem (Available in library)

• Koutsoyannis, A. (1977). Theory of Econometrics.
  – Section 6.2.4: Importance of BLUE properties (To be provided)

L4: Dougherty (3rd edition). Chapter 2.8

• Dougherty (2nd edition). Chapter 3.5
  – Precision of regression coefficients (To be provided)

L5: Dougherty (3rd edition)

• Chapter 2.8: p-value
• Chapter 2.9: Confidence interval
• Chapter 1.7: Goodness of fit – R²

L6: Ramanathan, R. Introductory Econometrics. Chapter 6 (To be provided on WebCT, or see the book in the library)

• Dougherty. Chapter 4.2
• Gujarati, D. Essentials of Econometrics. Chapter 9.2 (Available in library)
• Gujarati-Porter. Basic Econometrics. Chapters 6.4, 6.5, 6.6, 6.8 (Available in library)
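The unbiasedness derivation listed among the Topic 1 requirements can be condensed into three lines. This sketch assumes the model Yi = β1 + β2Xi + ui with the CLRM assumptions that the Xi are nonstochastic and E(ui) = 0; the weights ai are our own shorthand and Dougherty's notation may differ, so treat it as a study aid rather than the lecture derivation. Writing b2 as a weighted sum of the ui also shows the linearity property.

```latex
\begin{aligned}
b_2 &= \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2},
\qquad Y_i-\bar{Y} = \beta_2 (X_i-\bar{X}) + (u_i-\bar{u}) \\
b_2 &= \beta_2 + \frac{\sum_{i}(X_i-\bar{X})(u_i-\bar{u})}{\sum_{i}(X_i-\bar{X})^2}
     = \beta_2 + \sum_{i} a_i u_i,
\qquad a_i = \frac{X_i-\bar{X}}{\sum_{j}(X_j-\bar{X})^2},
\quad \text{since } \textstyle\sum_i (X_i-\bar{X})\,\bar{u} = 0 \\
E(b_2) &= \beta_2 + \sum_{i} a_i\,E(u_i) = \beta_2
\qquad (\text{the } a_i \text{ are nonstochastic and } E(u_i) = 0)
\end{aligned}
```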

Topic 2: Multiple Regression Analysis

Lectures: L7-L10

Topic content: Estimation and interpretation; hypothesis testing (t test, F test and Wald F test); multicollinearity; omission of important variables.

What you need to know?

- You are expected to understand how the regression coefficients in the multiple regression are obtained in principle, rather than how to derive them.

- You need to appreciate the difference between the estimated regression coefficients under simple and multiple regressions.

- You should be able to interpret the slope coefficients, noting the importance of knowing their statistical significance first.

- You need to understand how the shape (i.e. the variance) of the sampling distribution of a regression coefficient is determined by the correlation between regressors, on top of the factors discussed for simple regression.

- You should be able to appreciate the similarity between performing partial inference on a regression coefficient using a t test and using a confidence interval.

- You ought to know a bit about model selection criteria and the use of R² and adjusted R².

- You should be able to comment on the overall fit of the regression and test for its overall significance using the F test, appreciating the link between R² and the F test.

- You need to demonstrate that you can apply the Wald test to test the joint significance of several coefficients and/or linear restrictions on regression models, and thus know the steps of the test.

- You are expected to be able to explain what multicollinearity is and how it affects the regression coefficients. You should be able to explain this using the Ballentine representation, combined with the usual comments on its consequences.

- You should also know how to use the Ballentine to explain a bit about regression analysis.

- You need to understand when to omit variables and when not to. You need to be able to discuss the consequences of deleting important variables. Here again you should be able to use the Ballentine to explain things.

- You need to know about inclusion of an irrelevant variable and its consequences.
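The link between R² and the overall F test, and the residual-sum-of-squares form of the Wald test, can each be written as a one-line Python function. This is a sketch assuming k counts all parameters including the intercept; the function names and the Wald numbers in the usage are hypothetical. As a check, the Stata output in the sample exam paper (Section X) reports Model SS = 28222.8502 and Total SS = 114916.961 with 540 observations and 4 parameters, and the R² form reproduces its F(3, 536) = 58.16.

```python
def f_overall(r2, n, k):
    """F statistic for H0: all slope coefficients are zero, computed from R-squared.

    k counts all parameters including the intercept; degrees of freedom are (k - 1, n - k).
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


def f_wald(rss_restricted, rss_unrestricted, m, n, k):
    """Wald F statistic for m linear restrictions.

    k is the number of parameters in the unrestricted model; degrees of freedom are (m, n - k).
    """
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))


if __name__ == "__main__":
    # Check against the sample exam output: Model SS / Total SS gives R-squared.
    r2 = 28222.8502 / 114916.961
    print(round(f_overall(r2, n=540, k=4), 2))
    # Hypothetical Wald example with two restrictions.
    print(f_wald(120.0, 100.0, m=2, n=54, k=4))
```

The hypothetical Wald case (RSS_R = 120, RSS_U = 100, m = 2, n = 54, k = 4) gives F = (20/2)/(100/50) = 5.0, which would then be compared with the critical F(2, 50) value from tables.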

Readings

L7: Dougherty (3rd edition)
  – Chapter 3.1: Illustration [+ Appendix slide: WebCT]
  – Chapter 3.2: Interpretation (you can omit the proof of the derivation; just try to understand the intuition)
  – Chapter 3.3: Precision of multiple regression coefficients & t tests + confidence intervals
  – Chapter 3.5: Goodness of fit + example + adjusted R²

L8: Chapter 3.5: F tests
  Wald test:
  – Ramanathan pp. 156-157 (To be provided)
  Testing linear restrictions:
  – Dougherty: Chapter 6.5, pp. 216-219

L9: Gujarati-Porter. Chapter 10 (especially the introduction & consequences of multicollinearity)
  – Dougherty: Chapter 3, Section 3.4
  – Kennedy, P. (1981). The Ballentine: A graphical aid for Econometrics, Australian Economic Papers (To be provided)
  – Kennedy, P. (2003). A Guide to Econometrics.

L10: Model misspecification: introduction & consequences
  – Dougherty: Chapters 6.1 + 6.2 + 6.3
  – Gujarati-Porter. Basic Econometrics. Chapter 13.9
  Ballentine:
  – Kennedy, P. (1981). The Ballentine: A graphical aid for Econometrics (To be provided)
  – Kennedy, P. (2003). A Guide to Econometrics, p. 115 (To be provided)
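The omitted-variable bias discussed under Topic 2 follows from the same substitution used for unbiasedness. This sketch assumes a true model with two regressors, X2 and X3, of which X3 is wrongly omitted, with nonstochastic regressors and E(ui) = 0; the notation is ours and may differ from Dougherty's.

```latex
\begin{aligned}
&\text{True model: } Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i,
\qquad \text{fitted with } X_3 \text{ omitted: } \hat{Y}_i = b_1 + b_2 X_{2i} \\
&Y_i - \bar{Y} = \beta_2 (X_{2i}-\bar{X}_2) + \beta_3 (X_{3i}-\bar{X}_3) + (u_i-\bar{u}) \\
&b_2 = \frac{\sum_i (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})}{\sum_i (X_{2i}-\bar{X}_2)^2}
     = \beta_2
     + \beta_3 \frac{\sum_i (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_i (X_{2i}-\bar{X}_2)^2}
     + \frac{\sum_i (X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum_i (X_{2i}-\bar{X}_2)^2} \\
&E(b_2) = \beta_2 + \beta_3 \frac{\sum_i (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_i (X_{2i}-\bar{X}_2)^2}
\end{aligned}
```

The bias term vanishes only if β3 = 0 (X3 is in fact irrelevant) or X2 and X3 are uncorrelated in the sample; otherwise its sign is the sign of β3 times the sign of that correlation.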

Topic 3: Dummy Variables

Lectures: L11

Topic content: Definition of a dummy variable; intercept shift only; slope shift only; intercept & slope shift; event dummies; interpretation of dummy variables in regression.

What you need to know?

- The intuition behind introducing a dummy variable in a regression, and how to interpret and test it.

- How to introduce dummy variables where the regression changes through the constant term only.

- How to introduce a dummy variable in interaction with a quantitative variable, and understand what this means when interpreting a regression in which we allow the slope of the explanatory variable to vary.

- How to allow both the intercept and the slope to change, and be able to interpret the result.

- How to use event dummies in a regression and interpret them.

- Understand the dummy variable trap.
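For a model with both an intercept dummy and a slope dummy, Y = β1 + δD + β2X + γ(D·X) + u, the implied line for each group and the dummy variable trap can be illustrated in a few lines of Python. This is a hypothetical sketch: the function names, the coefficient values and the data rows in the usage are invented for illustration.

```python
def group_lines(b1, b2, delta, gamma):
    """Intercept and slope for each group in Y = b1 + delta*D + b2*X + gamma*(D*X).

    D = 0 is the reference group; delta shifts the intercept and gamma shifts the slope
    for the group with D = 1.
    """
    return {
        "D=0": (b1, b2),
        "D=1": (b1 + delta, b2 + gamma),
    }


def dummy_trap(rows):
    """Return True if, in every row, the constant equals the sum of the dummies.

    Each row is (constant, dummy_1, ..., dummy_m). If this holds for all rows the
    regressors are perfectly collinear (the dummy variable trap) and OLS cannot
    separate the intercept from the dummy coefficients.
    """
    return all(row[0] == sum(row[1:]) for row in rows)


if __name__ == "__main__":
    print(group_lines(b1=10.0, b2=2.0, delta=3.0, gamma=0.5))
    # Constant, male, female: a full set of gender dummies plus a constant.
    print(dummy_trap([(1, 1, 0), (1, 0, 1), (1, 1, 0)]))  # True: trap
    # Constant and male only: female is the omitted reference category.
    print(dummy_trap([(1, 1), (1, 0), (1, 1)]))           # False
```

With the invented coefficients, the reference group has intercept 10 and slope 2, while the D = 1 group has intercept 13 and slope 2.5; a test of δ = 0 or γ = 0 is then an ordinary t test on that coefficient.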

Readings Ramanathan (2002). Chapter 7

• Dummy variables: Section 7.1 (without proof) pp. 290-293; Section 7.3 (only shifts in slope + intercept & slope) [To be provided]

Ashenfelter, Levine and Zimmerman (2003). Chapter 12

• Sections 12.1 + 12.2 [To be provided]

Readings on Dummy Variable Trap:

• Gujarati. Essentials of Econometrics. pp.294-295

Topic 4: Heteroscedasticity

Lectures: L12

Topic content: Definition of heteroscedasticity; causes of heteroscedasticity; detection of heteroscedasticity; consequences of heteroscedasticity; solutions to heteroscedasticity.

What you need to know?

- What is homoscedasticity and what is heteroscedasticity?

- What may cause heteroscedasticity to arise?

- What methods are available to detect or test for heteroscedasticity, and how to apply them. These include the graphical method, the White test and several other tests*.

- How to resolve the problem of heteroscedasticity by transforming the regression and using the weighted least squares method instead of the OLS technique.

- How we can use White's heteroscedasticity-consistent variances and standard errors to allow for heteroscedasticity.
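The White test statistic, White's heteroscedasticity-consistent standard error for a simple regression slope, and the weighted least squares transformation can be sketched as follows. Assumptions: the auxiliary-regression R² has already been obtained (e.g. from STATA), the robust standard error shown is the basic form without a small-sample correction, and for WLS the error standard deviation is taken to be proportional to a known variable z. The function names and the numbers in the usage are hypothetical.

```python
from math import sqrt


def white_statistic(n, r2_aux):
    """White test statistic n * R^2 from the auxiliary regression of squared residuals.

    Under H0 (homoscedasticity) it is asymptotically chi-squared, with degrees of freedom
    equal to the number of regressors in the auxiliary regression (excluding the constant).
    """
    return n * r2_aux


def white_se_slope(x, residuals):
    """White heteroscedasticity-consistent standard error of the slope in a simple regression.

    var(b2) = sum((x_i - x_bar)^2 * e_i^2) / (sum((x_i - x_bar)^2))^2, with no
    small-sample correction.
    """
    n = len(x)
    x_bar = sum(x) / n
    dev2 = [(xi - x_bar) ** 2 for xi in x]
    num = sum(d * e ** 2 for d, e in zip(dev2, residuals))
    return sqrt(num / sum(dev2) ** 2)


def wls_transform(y, x, z):
    """Divide Y, the constant and X by z_i, assuming sd(u_i) is proportional to z_i.

    Returns (y/z, 1/z, x/z); OLS of y/z on 1/z and x/z (without a constant)
    then gives the weighted least squares estimates of beta1 and beta2.
    """
    return (
        [yi / zi for yi, zi in zip(y, z)],
        [1 / zi for zi in z],
        [xi / zi for xi, zi in zip(x, z)],
    )
```

For example, n = 100 with an auxiliary R² of 0.12 gives nR² = 12.0, to be compared with the χ² critical value for the number of auxiliary regressors. Dividing through by z turns Y = β1 + β2X + u into Y/z = β1(1/z) + β2(X/z) + u/z, whose disturbance is homoscedastic under the stated assumption.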

Readings

- Dougherty (2007, 3rd edition). Chapter 7

- Supplementary readings:

  - Causes and consequences: Gujarati-Porter – 11.1; 11.4
  - Detection: Gujarati-Porter – 11.5
  - White test: Gujarati-Porter pp. 386-3 + Asteriou-Hall (2007) pp. 116-117
  - Breusch-Pagan-Godfrey test: Gujarati-Porter pp. 385-386
  - Remedial measures: Gujarati-Porter – 11.6

• OLS estimators, pp. 53-57

• Multicollinearity, p. 212 (To be provided)

* The Goldfeld-Quandt test is no longer part of the syllabus.

Topic 5: Autocorrelation

Lectures: L13

Topic content: Definition of autocorrelation; causes of autocorrelation; detection of autocorrelation; consequences of autocorrelation; solutions to autocorrelation.

What you need to know?

- What is serial correlation and how does it arise?

- How autocorrelation can be detected using the Durbin-Watson d test, and be able to discuss the steps of this test. You should be able to compute the d statistic if data are available.

- You need to appreciate the limitations of the d test and how one can apply more sophisticated tests such as the Breusch-Godfrey test.

- You need to know how to resolve serial correlation when ρ is known and unknown. When unknown you need to be able to discuss ways to estimate ρ (such as through the Cochrane-Orcutt Iterative procedure).

- Alternatively, the Newey-West method can be used to derive HAC (heteroscedasticity- and autocorrelation-consistent) standard errors, also known as Newey-West standard errors.
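Computing d from a set of residuals, and the building blocks of the Cochrane-Orcutt procedure, can be sketched in Python. Assumptions: the residuals come from an OLS regression that has already been run; ρ is estimated by regressing et on et−1 without a constant; and the approximation ρ ≈ 1 − d/2 is a large-sample one. The function names and example residuals are hypothetical. The iterative procedure repeats the cycle of estimating ρ, quasi-differencing Y and X, re-estimating, and recomputing the residuals until the estimate of ρ settles.

```python
def durbin_watson(residuals):
    """d = sum_{t=2}^{T} (e_t - e_{t-1})^2 / sum_{t=1}^{T} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def rho_from_d(d):
    """Rough estimate of rho implied by d, using d being approximately 2(1 - rho)."""
    return 1 - d / 2


def estimate_rho(residuals):
    """Estimate rho by regressing e_t on e_{t-1} through the origin (one Cochrane-Orcutt step)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    return num / den


def quasi_difference(series, rho):
    """Generalised difference z_t - rho * z_{t-1}; the first observation is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


if __name__ == "__main__":
    e = [1.0, -1.0, 1.0, -1.0]          # hypothetical residuals that alternate in sign
    d = durbin_watson(e)
    print(d, rho_from_d(d), estimate_rho(e))
    print(quasi_difference([1.0, 2.0, 3.0], 0.5))
```

The alternating residuals give d = 3.0, well above 2, which points towards negative autocorrelation. In the quasi-differenced regression of Yt − ρYt−1 on Xt − ρXt−1 the intercept estimates β1(1 − ρ), so β1 is recovered by dividing by (1 − ρ).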

Readings

- Dougherty (2006). Chapters 12.3, 12.4

Supplementary readings:

– Causes and consequences:

• Gujarati-Porter – 12.1; 12.2; 12.3; 12.4

– Detection:

• Gujarati-Porter – 12.6

– Durbin-Watson d test:

• Gujarati-Porter – pp.434-438

• Ramanathan. pp. 386-3 [Limitations]

– Other tests:

• Gujarati-Porter - pp. 438-440

– Solutions: Gujarati-Porter – 12.7; 12.8; 12.9; 12.10

IX. Teaching Plan

Timetable week   Teaching week   Lectures         Computer Workshops
27               1               L1 & L2          -
28               2               L3 & L4          -
29               3               L5 & L6          -
30               4               L7 & L8          -
31               5               L9 & L10         -
32               6               L11 & L12        -
33               7               L13              CW1 + CW1(R2) + CW1(R3) + CW1(R4)
34               8               L14 [EC]         CW2 + CW2(R2) + CW2(R3) + CW2(R4)
35               9               L15 [EC]         CW3 + CW3(R2) + CW3(R3) + CW3(R4)
36               10              Review Lecture   CW4 + CW4(R2) + CW4(R3) + CW4(R4)
37               11              -                CW5 + CW5(R2) + CW5(R3) + CW5(R4)

Notes:
1) CW = Computer workshop. There will be 5 CWs (CW1-CW5), each repeated (R2-R4), so there are four sessions for each CW. You will attend the session you are assigned to. The maximum number of students per lab session is around 60.
2) EC = Exercise classes.
3) There will be only one lecture per week in weeks 33-36, which coincide with the start of the computer workshops.

X. Exam Paper & Feedback – Sample

The University of Nottingham Ningbo, China

BUSINESS SCHOOL

A LEVEL 2 MODULE, SPRING SEMESTER 2009-2010

INTRODUCTORY ECONOMETRICS

Time allowed TWO HOURS

Answer ALL questions from Section A and TWO questions from Section B

Section A accounts for one third of the total marks available for this examination. Each Section B question carries equal weight.

Percentage figures following each part indicate the proportionate weighting for that part within the question.

ADDITIONAL MATERIAL: Statistical Tables, Formula Sheet

SECTION A

Answer ALL questions

1. Consider the following regression output for an earnings regression of hourly earnings, e (measured in $), on years of schooling (s), years of experience (x) and gender (where the variable male takes the value 1 if the individual is male and 0 otherwise).

. reg e s x male

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   58.16
       Model |  28222.8502     3  9407.61674           Prob > F      =  0.0000
    Residual |  86694.1112   536  161.742745           R-squared     =  0.2456
-------------+------------------------------           Adj R-squared =  0.2414
       Total |  114916.961   539   213.20401           Root MSE      =  12.718

------------------------------------------------------------------------------
           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   2.587587   .2258699    11.46   0.000     2.143888    3.031286
           x |   .4679175   .1357672     3.45   0.001     .2012165    .7346185
        male |   6.378477   1.109314     5.75   0.000     4.199341    8.557613
       _cons |  -26.79575   4.394197    -6.10   0.000     -35.4277   -18.16379
------------------------------------------------------------------------------

Interpret the coefficients on the regressors. [100%]

2. Outline the reasons why a one-sided test might be preferred to a two-sided test for hypothesis testing on an individual coefficient. [100%]

3. A researcher who has investigated how hourly earnings E (measured in $) are affected by years of schooling (S) and years of experience (X) for a sample of 540 individuals has obtained the following fitted regression (standard errors in parentheses):

$$\hat{E}_i = \underset{(4.523)}{-26.93} + \underset{(0.232)}{2.67}\,S_i + \underset{(0.138)}{0.594}\,X_i \qquad R^2 = 0.1991$$

Derive and interpret the elasticity of earnings with respect to schooling for the above regression for an individual with 12 years of schooling and 10 years of experience. [100%]

4. Given a sample of 50 observations and 4 explanatory variables, test for autocorrelation at the 1% level using the Durbin-Watson test when d = 1.05, stating the hypotheses and interpreting your finding. [100%]

SECTION B

Answer TWO questions

5. (a) What factors determine the ‘shape’ (i.e. variance) of the sampling distribution of a regression coefficient? [40%]

(b) Consider the regression model

$$Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad i = 1, \dots, n,$$

where $E(u_i) = 0$ and $E(u_i^2) = \sigma^2$. Show that the OLS estimator of $\beta_2$ is unbiased. [40%]

(c) Explain what is meant by saying that the OLS estimators are Best Linear Unbiased Estimators (BLUE), and what the relevance of these properties is. [20%]

6. Collinearity between regressors and omission of an important variable can create problems.

(a) Explain how the OLS estimators provide estimates that control for the effects of all included explanatory variables, and outline the effect of collinear regressors on estimation. [50%]

(b) Prove that the regression coefficient will be biased if an important variable is omitted. [25%]

(c) Using the ‘Ballentine’ representation, explain why regression coefficients would be biased if an important variable has been omitted but would still be unbiased if an irrelevant variable has been added. [25%]

7. A labour economist wishes to examine the effects of schooling and experience on earnings. She considers the following regression model:

$$\ln E = \beta_1 + \beta_2 S + \beta_3 X + u$$

where ln E is the natural logarithm of earnings, S is the number of years of schooling and X is the number of years of experience. Using cross-section data, she obtained the following regression result (N is the number of sample observations; standard errors are in parentheses):

$$\widehat{\ln E} = \underset{(0.176)}{0.430} + \underset{(0.009)}{0.122}\,S + \underset{(0.005)}{0.041}\,X \qquad R^2 = 0.266 \qquad F(2, 537) = 97.51 \qquad N = 540$$

(a) Stating the null and the alternative hypotheses, test the hypothesis that ‘schooling has no effect on earnings’. What do you conclude? [30%]

(b) Using the F test, test for the ‘overall significance of the regression’. Interpret your results. [30%]

(c) The labour economist decides to re-run the above regression, adding the square of experience, X². She obtains the following estimated regression:

$$\widehat{\ln E} = \underset{(0.248)}{0.880} + \underset{(0.009)}{0.126}\,S - \underset{(0.029)}{0.031}\,X + \underset{(0.001)}{0.002}\,X^2 \qquad R^2 = 0.275 \qquad \text{RSS} = 137.812 \qquad N = 540$$

She finds that the coefficient on X is insignificant and decides to test the joint significance of X and X² on earnings. She therefore runs the simple regression without the experience variables and obtains:

$$\widehat{\ln E} = \underset{(0.126)}{1.425} + \underset{(0.009)}{0.100}\,S \qquad R^2 = 0.186 \qquad \text{RSS} = 154.873 \qquad N = 540$$

Formulating the (un)restricted models and your hypotheses, use the Wald test to find out whether ‘experience has a significant effect on earnings’. [40%]

8. A researcher who wishes to investigate the relationship between investment, I, government expenditure, G, and national income, Y, comes up with the following model:

$$I = \beta_1 + \beta_2 G + \beta_3 Y + u$$

Using data for a sample of 30 countries, with I, G and Y measured in US$ billion, he fits the regression (standard errors in parentheses):

$$\hat{I} = \underset{(7.79)}{18.10} + \underset{(0.14)}{1.07}\,G + \underset{(0.02)}{0.36}\,Y \qquad R^2 = 0.99 \qquad (1)$$

Suspecting the presence of heteroscedasticity, he also regresses I/Y on G/Y and 1/Y:

$$\widehat{I/Y} = \underset{(0.04)}{0.39} - \underset{(0.22)}{0.93}\,\frac{G}{Y} + \underset{(0.42)}{0.03}\,\frac{1}{Y} \qquad R^2 = 0.78 \qquad (2)$$

Finally, he regresses log I on log G and log Y:

$$\widehat{\log I} = \underset{(0.26)}{-2.44} - \underset{(0.12)}{0.63}\,\log G + \underset{(0.12)}{1.60}\,\log Y \qquad R^2 = 0.86 \qquad (3)$$

It is very likely that regressions (2) and (3) do not suffer from heteroscedasticity.

(a) Explain carefully what is meant by heteroscedasticity. [30%]

(b) The researcher uses specification (1) and regresses the square of the residuals on G, Y, their squares and their product. He obtains an R² of 0.9878 for this regression. Perform a White test for heteroscedasticity. [30%]

(c) Discuss the merits of specifications (2) and (3) and explain what you can conclude about the effect of government expenditure, G, on investment, I. [40%]

Module Code: P12205 Module Title: INTRODUCTORY ECONOMETRICS Credits: 10 Module Convenor(s): Dr. Saileshsingh Gunessee

General Comments:

Overall, we had a distribution of marks slightly skewed on the upper side with an average mark of 65.54 (standard deviation of 10.58). To sum up the distribution of marks:

First 2:1 2:2 Pass Fail Max: 82 70+ 60-69 50-59 40-49 <40 Min: 0

Numbers 273 107 111 29 18 8 Mean: 65.54 % 100 39.2 40.7 10.6 6.6 2.9 SD: 10.58

Generally, I got a good set of scripts showing knowledge and understanding of the principles of econometrics testified by the lop-sided distribution dominated by scripts in the first-class and 2:1

bands. I was genuinely pleased to see the relatively good number of firsts (around 39.2%). Even if on the low side as always it’s unfortunate to see there were a few fails. Yet the number of overall fails for the module was a miserly 2 (when we account for the coursework). Worth emphasising this module is examined at 80% of the overall marks and the coursework accounts for 20% of the marks.

Similar story as in previous years, I noticed students (irrespective of class) being more at ease with derivation-type or numerical questions and less so with discursive-type questions. For such questions I had less discussion and more scribbling of equations or graphs (with little in way of explanation) in some answers. This was so for instance in questions 5a and 6a where some students scribbled some equations thinking they would explain themselves to me. Such answers with little in way of

explanation didn’t go down well with me. As always if students want higher marks they need to work on this aspect.

Given that most errors are repeated year after year, I hope that students next year reading these comments will take these comments constructively and reflect on them as they prepare for the module. These are:

1) Timing. As a repetition from last year, I think students should note that though Section A is

compulsory, it carries only 1/3 of the marks and that Section B carries 2/3 of the marks. You need to devote your time accordingly as failing to answer even one question from Section B, because you run out of time, could prove disastrous to your aspiration to a first. So time

yourself well in the exam. In addition, I would add that marks allotted should provide a guide to how much you write. I had many instances like in Q5 and Q6 where students wrote very little in say Q6a (when 50% was being allotted) or Q5c compared to Q5a/Q5b where I had some answers trying to prove unbiasedness or linearity when the question clearly required definition/explanation of each concept and their relevance.

2) Workings. An improvement on previous years: most (though not all) students got the message and gave the required detail in their answers. As stressed in previous years, it is important that you show all your workings, by which I mean all the intermediate steps of a derivation or a calculation. For instance, for Q5b, Q6b and Q7 I expected students to show in detail how they reached their final answers. Answers lacking intermediate steps did not fare well. Not surprisingly, most students who did show all their workings got a very high first.

INTRODUCTORY ECONOMETRICS EXAM FEEDBACK 2009-10

3) Care in what you say. What I am looking for in this exam is for you to show me the knowledge of econometrics you have learnt over the course of this module. However, saying things that demonstrate that you either did not understand or failed to comprehend an important concept will cast doubt in the mind of the examiner. For example, in Q4 I had many answers stating ‘so we reject H0 and conclude there is no autocorrelation’. How can that be, when the null being rejected is a null of ‘no autocorrelation’ against an alternative of ‘presence of autocorrelation’, which you would be accepting if you reject the null? There were many such instances in the scripts I got this year.
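The logic behind point 3 can be made explicit with the usual bounds-based Durbin-Watson decision rule. The Python sketch below is illustrative only: the function name is hypothetical, and the bounds dL and dU in the usage lines are made-up placeholders, not table values. In an exam they must be read from the D-W table for the given sample size n and number of explanatory variables k′ at the stated significance level.

```python
def dw_decision(d: float, d_lower: float, d_upper: float) -> str:
    """Bounds-based Durbin-Watson decision for H0: rho = 0 (no autocorrelation).

    Assumes the first-order process u_t = rho * u_{t-1} + e_t and table
    bounds d_lower (dL) and d_upper (dU) for the relevant n and k'.
    """
    if not 0 < d_lower < d_upper < 2:
        raise ValueError("expected 0 < dL < dU < 2")
    if d < d_lower:
        # Small d: reject H0 in favour of positive autocorrelation
        return "reject H0: positive autocorrelation"
    if d > 4 - d_lower:
        # Large d: reject H0 in favour of negative autocorrelation
        return "reject H0: negative autocorrelation"
    if d_upper < d < 4 - d_upper:
        # d close to 2: no evidence against H0
        return "do not reject H0: no autocorrelation detected"
    # Remaining regions: dL <= d <= dU or 4 - dU <= d <= 4 - dL
    return "inconclusive"


# Hypothetical bounds, for illustration only
print(dw_decision(0.90, d_lower=1.20, d_upper=1.60))
print(dw_decision(2.05, d_lower=1.20, d_upper=1.60))
```

Note that every branch which rejects H0 concludes that autocorrelation is present; ‘no autocorrelation’ can only be the outcome when H0 is not rejected.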

4) Small things matter. In line with point 3, the small things you put in your answer do matter. A common mistake was not being able to write the restricted and unrestricted models properly in Q7c. An appreciable number of scripts combined the fitted/estimated regression with the error term, or the fitted value of Y with the population parameters. Using the ‘hat’ (^) in a regression written with the population parameters is wrong: the ^ denotes fitted values of the dependent variable computed from sample data, and so belongs in a regression written with the estimators. This year, confusing the population parameters βs with the estimators bs was less common than in previous years, but it still appeared in a few scripts. They are different, and when you write your equation you have to understand that a population regression function and a sample regression function are different. The βs are unknown (so we cannot write numerical values for them), and we therefore estimate them using the estimators bs. So the numerical values in an estimated equation are the estimates, NOT the population parameters βs. To a lesser extent this year as well, some people tried to test the null hypothesis of ‘variables or estimators being equal to zero’ [it is the population parameters/coefficients of the regression that we set to zero, not the variables per se]. A few scripts confusingly used the term dependent variable instead of independent variable. This was the case in Q5a when explaining how the variance of the independent variable influences the shape of the distribution.

5) Degrees of freedom and critical values. Many students failed to get the critical values right because they got the degrees of freedom wrong. This occurred in Q4, Q7 and Q8b. In Q4, the question on autocorrelation, a number of people subtracted the 4 explanatory variables from the sample of 50 observations, or subtracted 1 from the 4 independent variables. Reading the D-W tables would have revealed what you needed to find the critical values (the sample size and the number of explanatory variables). In this question I specifically asked you to ‘test at the 1% level’, but a few answers used the 5% level. In Q7a a few scripts set up a two-tailed test but used a one-tailed critical value. One script mistakenly thought that the values in brackets were p-values, when the question clearly states that they were standard errors. Many answers to this part failed to show how they obtained the critical value, simply stating that it was smaller than the computed t-value. This goes back to point 2: intermediate steps are important, and this part of the testing procedure was being missed. I also got a few answers which found that 540 − 3 was 536!! One answer confusingly computed the degrees of freedom as k − 1 (the number of parameters minus 1). In Q7b, and partly in Q7c, many answers mistook the given F-ratio of 97.51 for the critical value, when it is the computed value (which also meant you did not have to compute it, since it was already given to you!!). These answers therefore did not look up the correct critical value (4.605 at the 1% level), which was much lower, and wrongly concluded that we failed to reject the null. Checking the degrees of freedom would have revealed this. More importantly, with a ‘good knowledge of econometrics’ you would have noticed that in part (a) you had just concluded that ‘schooling had an effect on earnings’, so if the F-test appeared to tell you something different (such as not rejecting the null that ‘schooling and experience have no joint effect’), something must be wrong. To me such a failure demonstrates a lack of knowledge. In Q7b and Q7c a few answers also got the degrees of freedom wrong. If the reported F-value is F(2, 537), the numerator has 2 degrees of freedom and the denominator 537. Many who got this wrong in the first place failed to pick up on this ‘hint’, betraying a further lack of knowledge. In Q8b many answers used 2 degrees of freedom, from where I do not know, when it should have been 5, since KA − 1 = 6 − 1, where KA is the number of parameters in the auxiliary regression. These may seem small mistakes to you, but to the examiner they betray a lack of understanding, and the marks allocated reflected this. Showing that you cannot read simple statistical tables is not a good signal to send to the examiner. Importantly, a few answers also failed to report their degrees of freedom calculations and the critical values they found clearly. Do not assume the examiner knows this; you have to state it.

6) Expectations. There was no dedicated question on dummy variables this year. However, in Q1 you could have shown, when interpreting the coefficient on the gender variable ‘male’, how it gives the difference in mean salary between males and females. While this year I did not get the terrible ‘Earnings[Male=1]’ in place of the correct conditional expectation E[Earnings|…], I did get a few scripts that did not write the conditional expectation out in full and properly. If you are using the fitted values, earnings should carry ‘a hat’ and there should be no error term.
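To make point 6 concrete, here is one way of writing the conditional expectations out in full. This is a sketch assuming the Q1 specification with schooling S, experience X and the dummy MALE, with β4 the parameter on MALE and E(u | S, X, MALE) = 0; the notation is mine and not necessarily that of the exam paper.

```latex
\begin{aligned}
\text{PRF:}\quad & Earnings = \beta_1 + \beta_2 S + \beta_3 X + \beta_4 MALE + u \\
& E(Earnings \mid S, X, MALE = 1) = \beta_1 + \beta_2 S + \beta_3 X + \beta_4 \\
& E(Earnings \mid S, X, MALE = 0) = \beta_1 + \beta_2 S + \beta_3 X \\
& E(Earnings \mid S, X, MALE = 1) - E(Earnings \mid S, X, MALE = 0) = \beta_4 \\
\text{SRF (fitted):}\quad & \widehat{Earnings} = b_1 + b_2 S + b_3 X + b_4 MALE
\end{aligned}
```

So β4 is the difference in mean earnings between men and women with the same schooling and experience, and b4 is its estimate. The fitted line carries the hat and the bs and has no error term.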

7) Explaining graphs. Despite my suggestions last year, I still got a few scripts with graphs that were not explained, as if they would explain themselves to me. Again, such graphs do not count when I am awarding marks. Instead of helping your cause they may hurt it, as they leave doubts in my mind as to whether you have understood the graph: to me an unexplained graph may indicate that you did not understand it, which is why you provided no explanation. So be careful about this aspect of your answer. DO EXPLAIN YOUR GRAPHS. Do not assume your graphs are self-explanatory!! They clearly are not.

8) Readings beyond my lectures. Unfortunately, some answers were a pure and simple reproduction of my lecture material (sometimes even reproducing the mistakes in the handouts!!). This shows that no attempt had been made to read beyond my lectures and communicate this to me. If you wish to earn high marks, and especially to do really well on the discursive questions (such as Q2, Q3, Q5a, Q5c, Q6a and Q6c), you have to demonstrate that you have done the additional reading. These questions need to be explained and discussed in some depth, but many students failed to do so. If you wanted a mark above 80 on these questions, this is what you had to demonstrate: that you have read beyond my lectures and can communicate this clearly in your answers. I should also mention that I habitually combine computation and interpretation/discussion in all of my questions. This is reflected throughout the exam paper and compels the student who wants a high mark to WORK to achieve it!!

All of the students who paid attention to these suggestions and avoided these mistakes got higher marks. All in all, I would urge next year's students, and soon-to-be fourth-year students who intend to take Applied Econometrics in their fourth year, to take these points on board.

Question Specific Comments:

Question 1:

Probably the best-answered question, with the highest incidence of top marks, mostly in the 80s. For this question you had to interpret the coefficients on the regressors, that is, the slope coefficients, excluding the intercept/constant term. Too many scripts spent too much time and space discussing the constant term, which, as I explained, is pointless, and that is why I asked about the interpretation of the slope coefficients in the first place. If I had wanted you to discuss the intercept term I would have asked you to ‘interpret all coefficients’. Even worse were answers that tried to test the ‘significance of the intercept term’, which makes no sense at all since, as I explained in class, the constant term does not vary and thus has no effect per se on your dependent variable. Some even gave the same interpretation as in my lecture handouts, ‘that people have to pay to work’. I couldn't believe what I was reading!! To me it demonstrates a great deal of ignorance about econometrics!! The interpretation had to note that one variable was a binary variable. Interpretations of S and X using the usual ceteris paribus condition (holding the other variables constant) were quite common and correct. The better answers used ‘holding experience constant and for both genders [male and female]’. One mistake in a few scripts was the omission of the ceteris paribus condition when interpreting the variables. The male variable was mostly interpreted as ‘men earn $6.38/hour more than women with the same level of schooling and experience’ (some answers had the ‘more’ missing). A few answers missed the ceteris paribus condition in this interpretation, and some mistakenly interpreted the male variable as an ordinary variable, writing ‘an increase in male would lead to…’. Note that the interpretation of a dummy variable differs from that of a non-binary variable: it shows the difference in the dependent variable between two categories, in this case the difference in earnings across gender. For this interpretation it was fine if answers used the ‘ceteris paribus’ condition. Also note that it is a mistake to comment that the positive sign on this variable, or its significance, implies a ‘positive relationship with, or effect on, earnings’. This interpretation is incorrect: what it means is that there is a difference in salary between males and females, and that this difference is positive and significant. So there is NO ‘effect of male on earnings’ (an incorrect statement I kept getting in some scripts). Common mistakes also included using ‘units of schooling’ instead of ‘years of schooling’. In addition, the best answers commented on the type of relationship or difference for the dummy variable (positive or negative); on the significance of each coefficient (including overall significance and fit) using the p-values and R-squared; wrote down the estimated regression and predicted earnings for some assumed values of S and X; commented on the standard errors and confidence interval of each coefficient; and derived the interpretation of the population parameter on MALE (normally β4) from the population regression function in the general sense before interpreting the estimated value. Answers with this additional information were awarded higher marks.

Question 2:

Not a favourite question among students, as the low average mark on this question testifies. You were required to explain why a one-sided test could be preferred to a two-sided test. The three reasons given in class were mostly provided: it can reduce the Type II error without increasing the Type I error; theory may dictate our choice; and it can give more conclusive evidence about the direction of the effect of the explanatory variable on the dependent variable. The best answers spent a whole page ‘explaining’ these reasons. Answers awarded lower marks only listed the reasons without explaining the reasoning; I was expecting a short explanation of each reason. The best answers that discussed Type I and II errors defined these concepts and explained graphically how, by shifting from a two-sided to a one-sided test, we can reduce the Type II error without increasing the Type I error, noting that under normal circumstances you would expect a trade-off between the two. A graphical explanation, if provided, had to explain this process carefully. Unfortunately, some answers were not as clear and comprehensible as they should have been. This goes back to my point about students being uneasy with discursive questions.

Question 3:

For this calculation I had a mixture of answers: some got everything right (both calculation and interpretation), some got only the calculation right but not the interpretation, some got the calculation slightly wrong, and some did not get the question at all. In the last case, scripts contained either nothing or attempts to derive confidence intervals and perform t-tests for reasons unknown to me. These students would have used their time better answering other questions rather than writing answers that earned no marks. The best answers started by defining elasticity both algebraically and in words. They then computed earnings and the slope of earnings with respect to schooling (dE/dS, to be exact) at the given years of schooling and experience (some answers did not show this step clearly and so did not get the marks awarded for these computations). These answers then proceeded to interpret the elasticity of earnings with respect to schooling. Most answers did not get this part; some attempted an interpretation but did not get it right. The best answers discussed whether earnings were elastic or inelastic, i.e. how responsive earnings were to schooling. Going further, they interpreted the derived elasticity in terms of percentage changes, noting that it was for someone with 12 years of schooling and 10 years of experience: the elasticity is not constant in this regression but depends on those two variables. Some answers used the ceteris paribus condition instead of the ‘specific years’ interpretation. Although not specifically asked for, some answers also derived the elasticity of earnings with respect to experience. I awarded marks for this only if the part on schooling was complete and some comparison with schooling was made.

Question 4:

A relatively straightforward question, and thus well answered by all but a few.

Common mistakes included confusing k′, the number of explanatory variables, with k, the number of parameters, when finding the critical values. Relatedly, a few thought k′ = 3; I do not know where they got that. Another common mistake was getting the null and alternative hypotheses mixed up or wrong. Another was not properly drawing the D-W decision table or distribution (where drawn). And, as mentioned in point 5, some got the critical values wrong because they got the degrees of freedom wrong. There were also some answers that, despite getting the hypotheses and even the graph right, could not properly infer whether to reject the null. I had answers saying ‘since d < dL we fail to reject the null’; I do not know where this came from. I was expecting a clear formulation of the hypotheses (as the question clearly asks for it). Many answers had ‘H0: no autocorrelation and H1: autocorrelation’ or ‘H0: ρ = 0 and H1: ρ ≠ 0’. I was expecting a bit more than that: if nothing else is said, I would ask where ρ comes from. For the latter, I expected answers at least to start by saying that the disturbances were generated by a first-order process, $u_t = \rho u_{t-1} + \varepsilon_t$. I was also expecting a clear conclusion, and the best answers stated the decision rules before explaining their own conclusion. A clear formulation of hypotheses and inferences was needed. Some good answers provided a definition of autocorrelation and laid out some of the assumptions of the D-W test (especially those relating to the AR process). Some good answers also said a few words on the drawbacks of the test. One common mistake was writing the wrong regression equation, with one, two or three explanatory variables, when the question clearly says there are 4 explanatory variables.

Question 5:

The second most popular question in Section B. There were three parts. Part (a) was about the factors that affect the variance, and thus the shape of the distribution, of the estimated regression coefficient. The best answers first outlined why the regression coefficient has a probability distribution and what this means. They then set out the equations for the variances of the regression coefficients and explained how each factor affects the variance of the coefficient. Some answers, as extra information, discussed how multicollinearity, heteroscedasticity and the omission of important variables can affect the variance. It is worth noting that these answers were not just repeating my lecture handouts or stating that ‘such a factor is inversely related…’, but were making a genuine attempt to explain ‘how’ these factors affect the variance. These answers were rightly rewarded with high marks. Answers that failed to ‘carefully explain’ the link (merely stating that there is one) did not fare as well. In some cases I also had graphs with little in the way of explanation. Part (b) was a straightforward derivation which had to include all the steps. It required a proof of unbiasedness, E(b2) = β2. The common mistake was proving unbiasedness without applying the expectation operator E[.], so that some answers eventually told me that b2 = β2. The highest marks were given to answers that showed their workings clearly and explained what they were doing. In short, intermediate steps do count and need to be explained. The most common missing step was not showing how b2 = β2 + Cov(X,u)/Var(X) was obtained and starting the proof from there. To me an important part of the proof was missing, which led to lower marks; indeed, the fewer the intermediate steps, the lower the marks. You had to start from the equation for the estimator b2 given in the formula sheet, i.e. b2 = Cov(X,Y)/Var(X), and then proceed step by step, telling me what you were doing, e.g. ‘substituting for Y = …, we obtain …’. Also, quite a few answers did not say enough about which covariance rules were being used to obtain b2 = β2 + Cov(X,u)/Var(X). This is where you apply expectations and then carefully explain that, given E(u) = 0, the proof is complete. Not many noticed that E(u) = 0 was given in the question (and hence could be used rather than assumed). I was not sure why some answers, when stating ‘since E(u) = 0’, also wrote ‘E(u²) = σ²’. The latter is not used in the proof of unbiasedness, and such statements can only convey your ignorance. A few answers started with a derivation of b2; I am not sure why, since b2 is provided in the formula sheet and can be used from there, but this was fine as long as the proof of unbiasedness was correct. Part (c) had two sub-parts: the first asked what BLUE means and the second how relevant these properties are. The second sub-part was not well answered by most, and some people did not answer it at all. Using the allocated marks as a guide, I was expecting at least a short, definition-like explanation of each concept, not a proof of these properties or answers with only equations and no explanation. This is not good enough. The question starts with ‘Explain’, and an answer that does not explain tells me you do not understand what the question is asking. The answer had to be mostly discursive, using and explaining an equation and/or a graph where appropriate. Quite a few got linearity wrong. They confused ‘linearity of the estimators’ with the first assumption of the classical linear regression model, ‘linear in the parameters’. They are completely different. In this question we were talking about estimators, not parameters [surprisingly, the word parameters kept appearing in many answers]!! Linearity here means that the OLS estimators, in the form they are computed, are linear functions of Y, the dependent variable. (Some answers said they were linear in both Y and X. How can they be linear in X when X appears in the denominator!!) I also had answers telling me that it was the ‘disturbance term which had minimum variance’.

Question 6:

There were three parts to this question. Part (a) required an explanation of how OLS ‘estimates’ the unique effect of one explanatory variable on the dependent variable, controlling for the effects of the other variables, and thus estimates a regression coefficient using uniquely attributable information. An explanation could use the partial derivative (∂Y/∂X) and the ceteris paribus condition, and/or the minimisation of the RSS by OLS (∂RSS/∂b, partial derivatives of course), and/or the formula for the OLS estimator in multiple regression, discussing the part of the formula that subtracts the common effect. Another way was to use the Ballentine. A second sub-part of (a) was about the consequences of multicollinearity for ESTIMATION. Note that I was not asking anything about hypothesis testing; I am not sure how many noticed this. Yet quite a few spent too much time and space discussing the consequences for statistical inference, which the question was not asking about. Some answers also failed to explain properly how OLS works to ‘control for the effects of all included explanatory variables’, focusing instead on multicollinearity. What is more, too many answers to the latter sub-part were trivial, just reproducing equations from my lecture handouts. That is not good enough: you had to explain these consequences. The best answers did so and were rewarded with high marks. I was also surprised that no one bothered to offer a definition of multicollinearity. Some used the Ballentine to explain part of this question. Furthermore, some answers offered little discussion for a part worth 50%. Part (b) required a straightforward proof that OLS estimators are biased when important variables are omitted. This was relatively well done, but again the better answers were the ones with all the steps. Part (c) required a comparison, using the Ballentine, of how ‘excluding an important variable’ differs from ‘adding an irrelevant one’. Again this was relatively well done by those who answered Q6, though some answers missed the explanation of the irrelevant-variable case.

Question 7:

The most popular question in Section B. As far as the first two parts were concerned, it was generally well answered. The most common mistake, as discussed in point 5, was the failure to get the correct degrees of freedom and hence the correct critical values. However, most answers were able to formulate the null and alternative hypotheses carefully and thus draw meaningful conclusions. The best answers proceeded carefully, step by step: they explained why we need a t-test (as it was a test on an individual coefficient); stated the hypotheses (carefully outlined, not just the usual H0: β2 = 0); computed the test statistic (under the null, carefully noting that it follows a t-distribution, etc.); explained how the degrees of freedom and critical values were found, stating the level of significance chosen (not just df = 540 − 3 = 537 or df = 537); explained the decision rules; explained what they found (tcomputed > tcritical); and drew their inferences. Some of the best answers then interpreted the coefficient carefully, noting that this was a log-linear equation and so multiplying 0.122 by 100 before interpreting it. Unfortunately there were very few such answers, and some scripts that attempted an interpretation forgot to do this. Most answers provided the bare minimum and sometimes missed intermediate steps in the process. A rare problem was people trying to test H0: b2 = 0 instead of H0: β2 = 0. Some got the conclusions or the hypotheses wrong. Part (b) used the F-test of ‘overall significance’. It was important that you formulated the hypotheses, explaining what they mean rather than just stating that the betas = 0, and drew a conclusion. This was well done in general. The intermediate steps were also important. Here you had to show the test statistic you were using and the distribution it follows, compute its value, and briefly state the degrees of freedom and how they give the critical value at the chosen α level of significance, which you then write down. The latter was missing from a few answers. Surprisingly, quite a few got the critical values wrong. This is an important criterion I look for when I mark: ‘knowing how to find the critical values’ is a simple but important skill to demonstrate. Failure to do so leaves me unimpressed, and the marks reflect this. Another common mistake was trying to test H0: β1 = β2 = β3 = 0, i.e. including the intercept term β1. As explained in point 5 above, a few answers mistook the provided F(2, 537) = 97.51 for the critical value. I could not believe what I was reading (why would I provide the critical value with the regression output!!). As I explained above, this demonstrated poor knowledge of econometrics, and marks that would otherwise have been allocated were not. A few answers also had the wrong hypotheses H0: b2 = b3 = 0. The estimators take a single value for this sample, so you cannot be testing them. Part (c) required the Wald test to test whether experience had an effect on earnings. The hypothesis being tested was whether experience in level form (X) and experience squared (X²) belonged in this regression [H0: β3 = β4 = 0]. As the question clearly states, you had to specify the unrestricted and restricted models and the hypotheses. I was expecting answers to specify the Population Regression Function (PRF) with population parameters (not the equations in their fitted forms, as I already know those from the question!!). If you did not provide this, I could legitimately ask where H0: β3 = β4 = 0 comes from. Many answers could not write these two equations down properly: some wrote UR and R, which mean little if not explained, while others combined the PRF and the SRF, with hats and bs (making a mess in the process). So very few got high marks here. As above, you also had to explain the intermediate steps leading to the final answer (especially as this part carried 40% of the marks). While most of you got the answer right, the intermediate steps and small things like getting the equations right were a common failing of most answers.

Question 8:

The least popular question. Of the few who attempted it, the majority did not do a good job, I am afraid. Part (a) required a simple definition and explanation of heteroscedasticity. While most answers could at least define heteroscedasticity in one sentence, some then started discussing its consequences. I have no problem with that as long as heteroscedasticity was carefully explained in the first place, which was not always the case. You had to spend at least a few lines explaining heteroscedasticity rather than a single sentence. Part (b) was not well answered at all. While most computed the test statistic correctly, the majority of the few who attempted this part got the degrees of freedom and critical values wrong. The best answers carefully outlined the hypotheses and the auxiliary regression used by the White test,

$$\hat{u}^2 = \alpha_1 + \alpha_2 G + \alpha_3 Y + \alpha_4 G^2 + \alpha_5 Y^2 + \alpha_6 GY + v,$$

and computed the statistic. The test statistic of the White test is $nR^2 \sim \chi^2(K_A - 1)$, where $K_A$ is the number of parameters in the auxiliary regression. The sample size was 30 countries. Some failed to find the degrees of freedom, ν, which is 5 ($K_A - 1 = 6 - 1$). Many thought it was 2 (beats me how they got that!!), so the critical values obtained were at times wrong. The best answer pointed out the problem of using the White test here, as it is a large-sample test. Part (c) was not well answered either. You had to explain carefully why the estimated regressions 2 and 3 are better than regression 1, and then interpret the effect of G on I from these two equations.
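The degrees-of-freedom step in Q8b can be sketched as follows. The auxiliary regression has a constant plus G, Y, G², Y² and GY, so K_A = 6 and ν = K_A − 1 = 5. The auxiliary R² used here (0.45) is a hypothetical number, not the exam value, and 11.07 is the standard 5% critical value of χ²(5) from statistical tables; the function name is also hypothetical.

```python
def white_test(n, r2_aux, k_aux):
    """Return (n * R^2, degrees of freedom K_A - 1) for the White test."""
    return n * r2_aux, k_aux - 1


# Parameters in the auxiliary regression: constant, G, Y, G^2, Y^2, G*Y
aux_terms = ["const", "G", "Y", "G^2", "Y^2", "G*Y"]
stat, nu = white_test(n=30, r2_aux=0.45, k_aux=len(aux_terms))  # 0.45 is hypothetical
chi2_crit_5pct = 11.07  # chi-square(5) at the 5% level, from standard tables
print(stat, nu, stat > chi2_crit_5pct)
```

With only 30 observations, any such conclusion should be treated cautiously, since the White test is a large-sample test.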


________________________________________________________________________End


XI. Exam Formula Sheet

Residual sum of squares (RSS)

The RSS is given by $RSS = \sum e_i^2$, where $e_i$ denotes the regression residual.

Ordinary Least Squares (OLS) Estimators

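Simple regression: $Y = \beta_1 + \beta_2 X + u$

Multiple regression: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$

For the above regressions (with the subscript i dropped) the OLS estimators b1 and b2 are given by

Simple regression:

$$b_1 = \bar{Y} - b_2\bar{X}, \qquad b_2 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{Cov(X,Y)}{Var(X)}$$

Multiple regression:

$$b_1 = \bar{Y} - (b_2\bar{X}_2 + b_3\bar{X}_3), \qquad b_2 = \frac{Cov(X_2,Y)\,Var(X_3) - Cov(X_3,Y)\,Cov(X_2,X_3)}{Var(X_2)\,Var(X_3) - [Cov(X_2,X_3)]^2}$$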
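The proof of unbiasedness discussed in the Q5 feedback starts from the simple-regression estimator b2 = Cov(X,Y)/Var(X). A sketch of the intermediate steps follows, assuming (as in the simple regression model with nonstochastic regressors) that X is fixed and E(u) = 0, and using the covariance rules Cov(X, a) = 0 for a constant a, Cov(X, aX) = a Var(X) and Cov(X, V + W) = Cov(X, V) + Cov(X, W), with Cov(X,u) = (1/n)Σ(Xi − X̄)(ui − ū).

```latex
\begin{aligned}
b_2 &= \frac{Cov(X,Y)}{Var(X)} = \frac{Cov(X,\ \beta_1 + \beta_2 X + u)}{Var(X)} \\
    &= \frac{Cov(X,\beta_1) + Cov(X,\beta_2 X) + Cov(X,u)}{Var(X)}
     = \frac{0 + \beta_2\,Var(X) + Cov(X,u)}{Var(X)} \\
    &= \beta_2 + \frac{Cov(X,u)}{Var(X)} \\
E(b_2) &= \beta_2 + \frac{1}{Var(X)}\,E\big[Cov(X,u)\big]
        = \beta_2 + \frac{1}{Var(X)} \cdot \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})\,E(u_i - \bar{u})
        = \beta_2
\end{aligned}
```

The last step uses only E(ui) = 0 for every i, so that E(ui − ū) = 0; E(u²) = σ² plays no role in unbiasedness.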

Standard Error of a Regression Coefficient

The standard error for the OLS estimator b2 in the simple and multiple regressions is given as

Simple regression:

$$s.e.(b_2) = \sqrt{\frac{S^2}{n\,Var(X)}}$$

Multiple regression:

$$s.e.(b_2) = \sqrt{\frac{S^2}{n\,Var(X_2)} \times \frac{1}{1 - r^2_{X_2X_3}}}$$

where

$$S^2 = \frac{\sum e_i^2}{n-k}$$

represents the sample variance with n – k degrees of freedom, such that n denotes the number of observations and k denotes the number of parameters in the regression, and $r_{X_2X_3}$ represents the correlation between explanatory variables X2 and X3. In the simple regression k = 2, while k = 3 in the multiple regression.
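The simple-regression formulas for b1, b2, the RSS, S² and s.e.(b2) can be checked numerically. The sketch below uses only the Python standard library and a small made-up dataset (not from the module); `ols_simple` and `se_b2_multiple` are hypothetical helper names. Cov and Var use divisor n, matching b2 = Cov(X,Y)/Var(X) and the n Var(X) term in the standard error.

```python
import math


def ols_simple(x, y):
    """OLS for Y = b1 + b2*X + e using the formula-sheet expressions."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Sample covariance and variance with divisor n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    b2 = cov_xy / var_x
    b1 = y_bar - b2 * x_bar
    # Residuals e_i = Y_i - fitted value; RSS = sum of squared residuals
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in residuals)
    k = 2  # number of parameters in the simple regression
    s2 = rss / (n - k)
    se_b2 = math.sqrt(s2 / (n * var_x))
    return b1, b2, rss, se_b2


def se_b2_multiple(s2, n, var_x2, r_x2x3):
    """s.e.(b2) with two regressors; the 1/(1 - r^2) factor inflates it."""
    return math.sqrt(s2 / (n * var_x2) * 1 / (1 - r_x2x3 ** 2))


x = [1, 2, 3, 4, 5]  # made-up data for illustration
y = [2, 4, 5, 4, 5]
b1, b2, rss, se = ols_simple(x, y)
print(b1, b2, rss, se)  # approximately 2.2, 0.6, 2.4, 0.2828
```

With r = 0 the multiple-regression standard error collapses to the simple one; as r approaches 1 it grows without bound, which is the multicollinearity effect on estimation discussed under Q6.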

Confidence Interval for β

The confidence interval for βj is given by

$$b_j - s.e.(b_j) \times t_{crit} \le \beta_j \le b_j + s.e.(b_j) \times t_{crit}$$

where j denotes the jth parameter (e.g. when j = 2 we are looking at β2 and b2) and t_crit represents the critical value obtained from the t-tables.
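As a check on the confidence-interval formula, the sketch below reproduces the 95% interval for the coefficient on s reported in Exercise Class 1(b). The critical value 1.9644 is an assumption: it is the t value for 537 degrees of freedom implied by the Stata output itself (interval half-width divided by the standard error) and would normally be read from t-tables (about 1.96 for large samples). `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(b_j, se_j, t_crit):
    """Return (lower, upper) = b_j -/+ s.e.(b_j) * t_crit."""
    half_width = se_j * t_crit
    return b_j - half_width, b_j + half_width


# Coefficient on s in the log-earnings regression (Exercise Class 1(b))
lower, upper = confidence_interval(0.1216032, 0.0090323, t_crit=1.9644)
print(f"[{lower:.7f}, {upper:.7f}]")
```

The result agrees with the Stata interval [.1038601, .1393462] to about six decimal places; the small gap comes from rounding t_crit.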

t-statistic

For the test of the null hypothesis that the coefficient βj equals some hypothesised value, denoted $\beta_j^0$, the statistic

$$t = \frac{b_j - \beta_j^0}{s.e.(b_j)}$$

follows the t distribution with n – k degrees of freedom.
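The t-statistic formula, applied to the same coefficient on s, is sketched below. With n = 540 and k = 3 the degrees of freedom are n − k = 537, the calculation the feedback repeatedly asks to see written out. The second call uses a hypothesised value of 0.10, chosen purely for illustration; `t_statistic` is a hypothetical name.

```python
def t_statistic(b_j, se_j, beta_j0=0.0):
    """t = (b_j - beta_j0) / s.e.(b_j); follows t(n - k) under H0."""
    return (b_j - beta_j0) / se_j


n, k = 540, 3
df = n - k                                        # 537 degrees of freedom
t_zero = t_statistic(0.1216032, 0.0090323)        # H0: beta_2 = 0
t_010 = t_statistic(0.1216032, 0.0090323, 0.10)   # hypothetical H0: beta_2 = 0.10
print(df, round(t_zero, 2), round(t_010, 2))      # 537 13.46 2.39
```

The first value matches the t of 13.46 in the Stata output, which always refers to H0: βj = 0; any other hypothesised value must be computed by hand.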

F-test for the overall significance of the regression

In the multiple regression model a test of the significance of the regression can be conducted using the statistic

$$F = \frac{ESS/(k-1)}{RSS/(n-k)}$$

which follows the F distribution with degrees of freedom ν1 = k – 1 and ν2 = n – k, and where ESS denotes the explained sum of squares.
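The overall F-test in Exercise Class 1(b) can be reproduced from the sums of squares, which also makes the Q7b point concrete: 97.51 is the computed statistic, not the critical value. The sketch assumes ESS = TSS − RSS. For ν1 = 2 the F distribution has the closed-form tail probability P(F > x) = (1 + 2x/ν2)^(−ν2/2), so the critical value can be computed exactly without tables; this shortcut works only for two numerator degrees of freedom. Function names are hypothetical.

```python
def overall_f(ess, rss, n, k):
    """F = [ESS/(k-1)] / [RSS/(n-k)], distributed F(k-1, n-k) under H0."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_crit_nu1_2(nu2, alpha):
    """Upper-alpha critical value of F(2, nu2), obtained by solving
    (1 + 2x/nu2) ** (-nu2/2) = alpha for x."""
    return (nu2 / 2) * (alpha ** (-2 / nu2) - 1)


n, k = 540, 3
tss, rss = 190.175455, 139.511374  # Total and Residual SS from the output
ess = tss - rss
f_stat = overall_f(ess, rss, n, k)
f_crit = f_crit_nu1_2(n - k, 0.01)
print(round(f_stat, 2), round(f_crit, 3), f_stat > f_crit)
```

The computed 1% critical value is about 4.64, slightly above the 4.605 quoted in the feedback; 4.605 is the limiting F(2, ∞) table entry, which the formula approaches as ν2 grows. Either way 97.51 lies far above it, so the null that the slope coefficients are jointly zero is rejected.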

Wald F-test for Linear Restrictions

Given a set of linear restrictions on the parameters of a regression model, if the restrictions are valid, the statistic

$$F = \frac{(RSS_R - RSS_{UR})/(k-m)}{RSS_{UR}/(n-k)} = \frac{(R^2_{UR} - R^2_R)/(k-m)}{(1 - R^2_{UR})/(n-k)}$$

follows the F distribution with degrees of freedom k – m in the numerator and degrees of freedom n – k in the denominator, where $RSS_R$ ($R^2_R$) is the RSS (R²) from the restricted regression model, $RSS_{UR}$ ($R^2_{UR}$) is the RSS (R²) from the unrestricted regression model, k is the number of parameters in the unrestricted regression and m is the number of parameters in the restricted model.
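The two forms of the Wald statistic give the same number because both models share the same total sum of squares. The sketch below illustrates this, and shows that the overall F-test is the special case in which all slope coefficients are restricted to zero. It assumes the restricted model is the intercept-only regression (m = 1, so RSS_R equals the total sum of squares and R²_R = 0) and uses the sums of squares from Exercise Class 1(b); the helper names are hypothetical.

```python
def wald_f_rss(rss_r, rss_ur, n, k, m):
    """F = [(RSS_R - RSS_UR)/(k - m)] / [RSS_UR/(n - k)]."""
    return ((rss_r - rss_ur) / (k - m)) / (rss_ur / (n - k))


def wald_f_r2(r2_ur, r2_r, n, k, m):
    """F = [(R2_UR - R2_R)/(k - m)] / [(1 - R2_UR)/(n - k)]."""
    return ((r2_ur - r2_r) / (k - m)) / ((1 - r2_ur) / (n - k))


n, k, m = 540, 3, 1           # unrestricted: lne on s and x; restricted: constant only
tss, rss_ur = 190.175455, 139.511374
rss_r = tss                   # intercept-only model leaves all variation unexplained
r2_ur, r2_r = 1 - rss_ur / tss, 0.0
f_rss = wald_f_rss(rss_r, rss_ur, n, k, m)
f_r2 = wald_f_r2(r2_ur, r2_r, n, k, m)
print(round(f_rss, 2), round(f_r2, 2))  # both reproduce the reported F(2, 537)
```

For the Q7c test the same functions apply with the restricted model dropping X and X² only; the relevant RSS values would then come from the two estimated regressions.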

Durbin-Watson d statistic

Given the error process ut = ρut-1 + εt, where εt is a white noise error term, the Durbin-Watson statistic which follows the Durbin-Watson distribution with number of observations n and number of explanatory variables k′ is given by

n

(et−et−1)2

d=t=2n et2 t=1

where et denotes the regression residual.


XII. Exercise Classes Questions

Exercise Class 1

EC1-Question 1: Earnings regression

(a) Consider the following regression output for a regression of hourly earnings, e (measured in $), on years of schooling (s), years of experience (x) and gender (the variable male takes the value 1 if the individual is male and 0 otherwise).

. reg e s x male

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   58.16
       Model |  28222.8502     3  9407.61674           Prob > F      =  0.0000
    Residual |  86694.1112   536  161.742745           R-squared     =  0.2456
-------------+------------------------------           Adj R-squared =  0.2414
       Total |  114916.961   539   213.20401           Root MSE      =  12.718

------------------------------------------------------------------------------
           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   2.587587   .2258699    11.46   0.000     2.143888    3.031286
           x |   .4679175   .1357672     3.45   0.001     .2012165    .7346185
        male |   6.378477   1.109314     5.75   0.000     4.199341    8.557613
       _cons |  -26.79575   4.394197    -6.10   0.000     -35.4277   -18.16379
------------------------------------------------------------------------------

Write down the population regression function. Interpret the slope coefficients.

(b) Consider the following regression output for the log of hourly earnings (ln e) on years of schooling (s) and years of experience (x), where e is measured in $.

. reg lne s x

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   97.51
       Model |  50.6640808     2  25.3320404           Prob > F      =  0.0000
    Residual |  139.511374   537  .259797717           R-squared     =  0.2664
-------------+------------------------------           Adj R-squared =  0.2637
       Total |  190.175455   539  .352830158           Root MSE      =   .5097

------------------------------------------------------------------------------
         lne |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .1216032   .0090323    13.46   0.000     .1038601    .1393462
           x |   .0412912   .0053697     7.69   0.000      .030743    .0518395
       _cons |    .429947   .1761078     2.44   0.015     .0840024      .77515
------------------------------------------------------------------------------

Comment on the coefficients on the regressors. What are the predicted hourly earnings of someone with 12 years of schooling and 5 years of experience?
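For part (b), the prediction comes from plugging the values into the fitted log equation and exponentiating. Below is a minimal Python sketch using the coefficients from the Stata output above. Exponentiating the fitted log ignores the error term; under normally distributed errors it estimates the conditional median rather than the mean. Also, 100·b_s ≈ 12.2% is only an approximation to the exact percentage effect 100(exp(b_s) − 1).

```python
import math

# Coefficients from the Stata output for ln e on s and x (EC1-Question 1(b)).
b1, b_s, b_x = 0.429947, 0.1216032, 0.0412912

ln_e_hat = b1 + b_s * 12 + b_x * 5               # s = 12 years of schooling, x = 5 years of experience
e_hat = math.exp(ln_e_hat)                       # naive back-transformation to $ per hour
pct_per_year_school = 100 * (math.exp(b_s) - 1)  # exact % change in e for one more year of s

print(f"ln e = {ln_e_hat:.4f}, e = ${e_hat:.2f} per hour")
print(f"one more year of schooling: about {pct_per_year_school:.1f}%")
```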

(c) A labour economist wishes to examine the effects of schooling and experience on earnings. She considers the following regression model:

$$ \ln E = \beta_1 + \beta_2 S + \beta_3 X + \beta_4 X^2 + u $$

Her estimated regression is:

$$ \widehat{\ln E} = \underset{(0.248)}{0.880} + \underset{(0.009)}{0.126}\,S - \underset{(0.029)}{0.031}\,X + \underset{(0.001)}{0.002}\,X^2, \qquad R^2 = 0.275, \quad RSS = 137.812, \quad N = 540 $$

She finds that the coefficient on X is insignificant and decides to test the joint significance of X and X² for earnings. She therefore estimates the following simple regression without the experience variables:

$$ \widehat{\ln E} = \underset{(0.126)}{1.425} + \underset{(0.009)}{0.100}\,S, \qquad R^2 = 0.186, \quad RSS = 154.873, \quad N = 540 $$

Formulating the (un-)restricted models and your hypotheses, use the Wald test to find out if ‘experience has a significant effect on earnings’.
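One way to check the arithmetic for part (c) is sketched below in Python. It uses the figures reported for the two regressions. The 5% critical value of about 3.01 for F(2, 536) is our approximation from F tables, not a figure given in the question.

```python
def wald_f_from_rss(rss_r: float, rss_ur: float, n: int, k: int, m: int) -> float:
    """F = [(RSS_R - RSS_UR)/(k - m)] / [RSS_UR/(n - k)]."""
    return ((rss_r - rss_ur) / (k - m)) / (rss_ur / (n - k))


def wald_f_from_r2(r2_ur: float, r2_r: float, n: int, k: int, m: int) -> float:
    """Equivalent R^2 form (valid here because both models explain ln E)."""
    return ((r2_ur - r2_r) / (k - m)) / ((1 - r2_ur) / (n - k))


# Unrestricted: ln E on S, X and X^2 (k = 4); restricted under H0: beta3 = beta4 = 0 (m = 2).
n, k, m = 540, 4, 2
f_rss = wald_f_from_rss(rss_r=154.873, rss_ur=137.812, n=n, k=k, m=m)
f_r2 = wald_f_from_r2(r2_ur=0.275, r2_r=0.186, n=n, k=k, m=m)

F_CRIT_5PCT = 3.01  # approximate 5% critical value of F(2, 536), read from tables

print(f"F from RSS = {f_rss:.2f}, F from R^2 = {f_r2:.2f}")
print("reject H0" if f_rss > F_CRIT_5PCT else "do not reject H0")
```

The two values differ slightly (about 33.18 versus 32.90). The gap arises only because the reported R² values are rounded to three decimals.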

(d) Consider the regression equation from part (a). Explain how you would modify this regression to allow for another dummy variable for the attribute ethnicity, describing either a white employee or a black employee, and find the difference in the mean earnings between a black female employee and a white female employee.

(e) Suppose that ethnicity could instead take one of three categories, namely BLACK, HISPANIC or WHITE, rather than black or non-black. Using BLACK as your reference group, clearly define your dummy variables. Explain the modification you would make to the regression from part (a) to allow for these three ethnic attributes. Then find, for the same level of schooling and experience, the difference in mean earnings (i) between a black employee and a Hispanic employee, and (ii) between a black employee and a white employee.

EC1-Question 2: Phillips Curve

The Phillips curve in its basic formulation relates inflation to unemployment, stipulating a negative relationship between the inflation rate and the unemployment rate. This formulation can be extended to account for the expected value of inflation. The expectations-augmented Phillips curve takes the form

$$ I_t = \beta_1 + \beta_2 U_t + \beta_3 E_t + u_t $$

where $I_t$ is the actual inflation rate (%) at time t, $U_t$ is the unemployment rate (%) at time t, $E_t$ is the expected value at time t of future inflation (%) and $u_t$ is a normally distributed disturbance term with zero mean and constant variance.

Using annual U.K. data for the period 1960-2002 the following regression equation, with standard errors in parentheses, was obtained

$$ \hat{I}_t = \underset{(0.78)}{5.00} - \underset{(0.41)}{1.45}\,U_t + \underset{(0.21)}{1.85}\,E_t, \qquad R^2 = 0.877, \quad F(2, 40) = 142.6 $$

For this model,

(d) Construct and comment on a 99% confidence interval for β2.

(e) Stating the null and alternative hypotheses, test whether ‘unemployment has an effect on actual inflation’ and interpret your results at the 1% level.

(f) What are the null and alternative hypotheses in an F test for the ‘significance of the regression’? At the 1% significance level, test this null hypothesis using the F statistic reported above.
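The calculations for parts (d) to (f) can be checked with the short Python sketch below. The critical values 2.704 (t, 40 df, two-sided 1%) and 5.18 (F(2, 40), 1%) are our inputs read from standard tables; the question does not supply them. The F statistic is also recomputed from R² as a consistency check.

```python
# 43 annual observations (1960-2002) and k = 3 parameters give n - k = 40 df.
n, k = 43, 3
b2, se_b2 = -1.45, 0.41
r2, f_reported = 0.877, 142.6

# Critical values from statistical tables (our inputs, not given in the question).
T_CRIT_1PCT_40DF = 2.704   # two-sided 1% level, t distribution with 40 df
F_CRIT_1PCT_2_40DF = 5.18  # 1% level, F distribution with (2, 40) df

# (d) 99% confidence interval: b2 -/+ s.e.(b2) * t_crit
half_width = se_b2 * T_CRIT_1PCT_40DF
ci_99 = (b2 - half_width, b2 + half_width)

# (e) t test of H0: beta2 = 0 against H1: beta2 != 0
t_b2 = b2 / se_b2

# (f) overall F recomputed from R^2 as a consistency check on the reported value
f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f"99% CI for beta2: ({ci_99[0]:.3f}, {ci_99[1]:.3f})")
print(f"t = {t_b2:.2f}; reject H0 at 1%: {abs(t_b2) > T_CRIT_1PCT_40DF}")
print(f"F = {f_reported} (from R^2: {f_from_r2:.1f}); reject H0 at 1%: {f_reported > F_CRIT_1PCT_2_40DF}")
```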

Exercise Class 2

EC2-Question 1: Omission of Important Variables

Using 25 annual observations for the US, a researcher obtains a regression of food expenditure (Y) on disposable personal income (X) and prices (P):

$$ \hat{Y} = \underset{(9.6)}{116.7} - \underset{(0.114)}{0.739}\,P + \underset{(0.003)}{0.0112}\,X, \qquad R^2 = 0.99, \quad F(2, 22) = 10.00 \tag{1} $$

where Y and X are both measured in $bn at constant 1972 prices and P is a price index of food relative to the consumer price index (1972 = 100). Assume that model (1) is correctly specified, but that the researcher mistakenly drops disposable income (X) from the regression and obtains:

$$ \hat{Y} = \underset{(42.1)}{-125.9} + \underset{(0.407)}{2.462}\,P, \qquad R^2 = 0.62, \quad F(1, 23) = 37.53 \tag{2} $$

(a) Provide an explanation for the differences in the results of the two models, carefully outlining the various consequences of omitting an explanatory variable.

(b) If instead specification (2) were the true model and we added X (an irrelevant variable), what would the consequences be?
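For part (a), a natural starting point is the standard omitted-variable-bias result, written here in this question's notation. The auxiliary coefficients $\delta_1, \delta_2$ and the error $v$ are symbols we introduce for illustration; they do not appear in the question.

$$
\begin{aligned}
&\text{True model:} && Y = \beta_1 + \beta_2 P + \beta_3 X + u \\
&\text{Auxiliary relation:} && X = \delta_1 + \delta_2 P + v \\
&\text{Short regression of } Y \text{ on } P\text{:} && E(\tilde{b}_2) = \beta_2 + \beta_3\,\delta_2
\end{aligned}
$$

In a given sample the OLS analogue $\tilde{b}_2 = b_2 + b_3\,\hat{\delta}_2$ holds exactly, where $b_2, b_3$ come from (1), $\tilde{b}_2$ from (2), and $\hat{\delta}_2$ is the slope from regressing X on P. With the rounded estimates above, $2.462 = -0.739 + 0.0112\,\hat{\delta}_2$ implies $\hat{\delta}_2 \approx 286$. A strong positive association between P and X therefore produces a large positive bias that reverses the sign of the price coefficient. For part (b), with two regressors $\operatorname{Var}(b_2) = \sigma^2 / [\sum (P_t - \bar{P})^2 (1 - r_{PX}^2)]$. Including an irrelevant X thus leaves $b_2$ unbiased but inflates its variance whenever P and X are correlated.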

EC2-Question 2: Heteroscedasticity

A researcher who wishes to investigate the relationship between investment, I, government expenditure, G, and national income, Y, specifies the following model:

$$ I = \beta_1 + \beta_2 G + \beta_3 Y + u $$

Using data for a sample of 30 countries, with I, G and Y measured in US$ billion, he fits the regression (standard errors in parentheses):

$$ \hat{I} = \underset{(7.79)}{18.10} + \underset{(0.14)}{1.07}\,G + \underset{(0.02)}{0.36}\,Y, \qquad R^2 = 0.99 \tag{1} $$

Suspecting the presence of heteroscedasticity, he also regresses I/Y on G/Y and 1/Y:

$$ \widehat{I/Y} = \underset{(0.04)}{0.39} - \underset{(0.22)}{0.93}\,\frac{G}{Y} + \underset{(0.42)}{0.03}\,\frac{1}{Y}, \qquad R^2 = 0.78 \tag{2} $$

Finally, he regresses log I on log G and log Y:

$$ \widehat{\log I} = \underset{(0.26)}{-2.44} - \underset{(0.12)}{0.63}\,\log G + \underset{(0.12)}{1.60}\,\log Y, \qquad R^2 = 0.86 \tag{3} $$

It is very likely that regressions (2) and (3) do not suffer from heteroscedasticity.

(a) The researcher takes specification (1) and regresses the squared residuals on G, Y, their squares and their product. He obtains an R² of 0.9878 for this regression. Perform a White test for heteroscedasticity, specifying the various steps of the test (Hint: hypotheses, auxiliary regression, test statistic, decision rule).

(b) Discuss the merits of specifications (2) and (3) and explain what you can conclude about the effect of government expenditure, G, on investment, I.
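The arithmetic of the White test in part (a) is short enough to check in a few lines of Python. Under the null of homoscedasticity, n·R² from the auxiliary regression is asymptotically chi-squared. Its degrees of freedom equal the number of auxiliary regressors excluding the constant: here 5 (G, Y, G², Y² and G·Y). The 5% critical value 11.07 is our input from chi-squared tables.

```python
n = 30                      # countries in the sample
r2_aux = 0.9878             # R^2 of the auxiliary regression of squared residuals
p = 5                       # auxiliary regressors: G, Y, G^2, Y^2, G*Y (constant excluded)
CHI2_CRIT_5PCT_5DF = 11.07  # chi-squared critical value, 5 df, 5% level (from tables)

white_stat = n * r2_aux     # asymptotically chi-squared(p) under H0: homoscedasticity
print(f"nR^2 = {white_stat:.3f} on {p} df; reject homoscedasticity: {white_stat > CHI2_CRIT_5PCT_5DF}")
```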


XIII. MCQ sheet

