Lecture 5

Simple Linear Regression Continued

In the previous notes on simple linear regression we introduced the definition of the linear model, how to calculate the least squares estimates, and how to compute R-square. Now we will develop the concepts for constructing confidence interval estimates of the model coefficients and statistical tests of the model parameters.

First we need to establish the sampling distribution of the slope estimate.

1. Sampling Distribution of b₁: relative frequency distribution of the sample estimate b₁ over repeated samples

For large sample sizes, the central limit theorem applies, and b₁ will be approximately normally distributed.

β₁ = E[ b₁ ] = Mean of the sampling distribution of b₁, where β₁ denotes the true (population) slope

Var[ b₁ ] = { Var[ e_i ] / (N-1) }{1/Var[ X_i ] }

= variance of the sampling distribution of b₁

Sample estimate for Var[b₁] = [MSE / (N-1)] (1/s_x²)

MSE = SSE/(N-2)

s_x² = Var(X_i) = sample variance of X

Std error of b₁:

std(b₁) = (s_e / s_x )/Sqrt(N-1)

where s_e = sqrt(MSE)

This is a measure of the precision of the estimate of the slope: the smaller std(b₁), the more precise the estimate.
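The formulas above can be checked numerically. The following is a minimal sketch in plain Python; the five data points are hypothetical and chosen only so the arithmetic is easy to verify by hand, and the variable names are illustrative.

```python
import math

# Hypothetical toy data, chosen only to keep the arithmetic easy to check.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates b1 (slope) and b0 (intercept).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# MSE = SSE / (N - 2) and s_x^2 = sample variance of X.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
var_x = s_xx / (n - 1)

# Sample estimate of Var[b1], then std(b1) = (s_e / s_x) / sqrt(N - 1).
var_b1 = (mse / (n - 1)) * (1.0 / var_x)
s_e = math.sqrt(mse)
s_x = math.sqrt(var_x)
std_b1 = (s_e / s_x) / math.sqrt(n - 1)

print(f"b1 = {b1:.4f}, MSE = {mse:.4f}, std(b1) = {std_b1:.4f}")
```

For this toy data the script prints b1 = 0.6000, MSE = 0.8000, std(b1) = 0.2828, and std(b1) equals the square root of the sample estimate of Var[b₁], as expected.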

Now we will derive the confidence interval for the slope for large N, using the z-distribution.

2. Confidence Intervals with Large N

z_b = (b₁ - E[b₁]) / std(b₁)

is approximately normal(0,1)

Derivation of the 95% Confidence Interval

Define w as the half-width of the confidence interval (the distance from b₁ to either bound).

Lower Bound = LB = b₁ - w

Upper Bound = UB = b₁+w

Choose the width such that

Pr[ E[b₁] < LB ] = .025

Pr[ E[b₁] > UB ] = .025

Pr[ E[b₁] < LB ]

= Pr[ E[b₁] < b₁ - w ]

= Pr[ w < b₁- E[b₁] ]

= Pr[ w/std[b₁] < z_b ] ( define z_c = w/std[b₁] )

= Pr[ z_b > z_c ] = .025

Using the z-Table, we look for the z value that has an area below equal to .975 (equivalently, an area above equal to .025). The answer is z_c = 1.96.

z_c = 1.96 = w/std(b₁)

w = (1.96)std(b₁)

This is the answer to our problem.
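As a small illustration of the result w = (1.96)std(b₁), the sketch below computes the large-N interval in Python. It uses NormalDist from the standard library statistics module in place of the z-Table; the function name large_n_ci and the input values for b₁ and std(b₁) are hypothetical.

```python
from statistics import NormalDist

def large_n_ci(b1, std_b1, level=0.95):
    """Return (LB, UB, w) for the slope using the z-distribution."""
    # Area below z_c is 1 - (1 - level)/2, e.g. .975 for a 95% interval.
    z_c = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    w = z_c * std_b1
    return b1 - w, b1 + w, w

# Hypothetical slope estimate and standard error, for illustration only.
lb, ub, w = large_n_ci(b1=0.60, std_b1=0.05)
print(f"95% CI: [{lb:.3f}, {ub:.3f}], w = {w:.3f}")
```

With these assumed inputs the interval is roughly [0.502, 0.698], with w about 0.098.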

Now we will derive the answer for small sample sizes using the t-distribution.

3. Confidence intervals for small N

For Small N and assuming Y is normal

t_b = (b₁ - E[b₁])/std(b₁)

has a t-distribution with df = N-2

95% Confidence Interval for small N:

Lower Bound = LB = b₁ - w

Upper Bound = UB = b₁+w

w = (t_c )Std(b₁)

t_c is obtained from t-Table

row corresponding to df = N-2

col corresponding to Area below = .975

So we use this formula for w to determine the bounds b₁ ± w of the 95% confidence interval.
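The table lookup can be sketched in Python as follows. The dictionary T_975 is a short excerpt of standard t-Table values for area below = .975, typed in by hand; the function name small_n_ci and the inputs b₁ = 0.6, std(b₁) = 0.2828, N = 5 are hypothetical.

```python
# Excerpt of a t-Table: critical values with area below = .975, keyed by df.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def small_n_ci(b1, std_b1, n):
    """Return (LB, UB, w) for the slope using the t-distribution."""
    t_c = T_975[n - 2]      # row: df = N - 2, col: area below = .975
    w = t_c * std_b1        # w = (t_c) std(b1)
    return b1 - w, b1 + w, w

# Hypothetical inputs: b1 = 0.6, std(b1) = 0.2828, N = 5 (so df = 3).
lb, ub, w = small_n_ci(0.6, 0.2828, 5)
print(f"95% CI: [{lb:.3f}, {ub:.3f}], w = {w:.3f}")
```

Note how much wider the interval is than a z-based one: with df = 3 the multiplier is 3.182 rather than 1.96.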

Now we will derive a method for statistically testing the slope.

4. Hypothesis Testing:

H₀ : Null Hypothesis = null model is correct

Y_i' = b₀, E[b₁] = 0

H₁: Alternative Hyp = linear model is correct

Y_i' = b₀ + b₁X_i , E[b₁] > 0 or E[b₁] < 0.

Decision Table

	True State
Decision	H₀ true	H₁ true
Choose H₀	1-alpha	beta = Pr[type 2 error]
Choose H₁	alpha = Pr[type 1 error]	power = 1-beta

Procedure I: Set alpha = significance level = .05

1. t* = b₁ / std(b₁)

2. Look up t_c from Table using

row: df = N-2

col : Area below = .975 (1-alpha/2)

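3. reject H₀ if |t*| > t_c (two tail test)

or reject H₀ if t* > t_c (one tail test, with t_c taken instead from the column Area below = .95 (1-alpha), so that the test still has significance level alpha)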
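Procedure I for the two tail test can be sketched in Python as follows. The table excerpt T_975 is typed in by hand from a standard t-Table, and the function name and input values are hypothetical.

```python
# Excerpt of a t-Table: critical values with area below = .975, keyed by df.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def slope_t_test(b1, std_b1, n):
    """Two tail test of H0: E[b1] = 0 at alpha = .05."""
    t_star = b1 / std_b1            # step 1
    t_c = T_975[n - 2]              # step 2: df = N - 2, area below = .975
    reject = abs(t_star) > t_c      # step 3
    return t_star, t_c, reject

# Hypothetical inputs: b1 = 0.6, std(b1) = 0.2828, N = 5.
t_star, t_c, reject = slope_t_test(0.6, 0.2828, 5)
print(f"t* = {t_star:.3f}, t_c = {t_c}, reject H0: {reject}")
```

Here t* is about 2.12, which is below t_c = 3.182, so H₀ is not rejected at the .05 level for these assumed values.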

Procedure II:

1. t* = b₁ / std(b₁)

2. Find p = Pr[ |t| > |t*| | H₀ ] (two tail p-value, from printout)

3. reject H₀ if p < .05 (two tail test)
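When no printout is available, the two tail p-value can be approximated directly. The sketch below integrates the t density numerically with Simpson's rule using only the Python standard library; this is an illustrative substitute for the printout, not the method the statistical package uses, and the function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    # math.gamma overflows for very large arguments, so keep df modest.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_p(t_star, df, steps=2000):
    """Approximate p = Pr[ |t| > |t*| | H0 ] with Simpson's rule."""
    a = abs(t_star)
    h = a / steps                       # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3         # Pr[ 0 < t < |t*| ]
    return 2 * (0.5 - area_0_to_a)      # both tails beyond |t*|

# Hypothetical inputs: t* = 2.1213 with df = N - 2 = 3.
p = two_tail_p(2.1213, 3)
print(f"p = {p:.3f}, reject H0: {p < 0.05}")
```

As a consistency check with Procedure I, two_tail_p(3.182, 3) comes out very close to .05, which is what the t-Table entry for df = 3 implies.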

F-test (again assuming Y is normal)

H₀ : E[b₁] = 0

Source	Sum of Squares	Mean Sq	F
Linear	SSR = TSS-SSE	MSR = SSR/1	F* = MSR/MSE
Residual	SSE	MSE = SSE/(N-2)
Null	TSS	s_y² = TSS/(N-1)

Procedure I:

1. Look up F_c from Table

row section: den df = N-2

row subsection: A = 1-significance level = 1-.05 = .95

column: num df = 1

2. Reject H₀ if F* > F_c

Procedure II:

Reject H₀ if

p = Pr[ F > F* | H₀ ] < .05 = significance level.

(from printout)
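The F-test table and Procedure I can be sketched in Python as follows. The data are hypothetical toy values. Because the numerator df is 1, the sketch obtains F_c by squaring the two tail t critical value for df = N-2 (a standard identity between the two tables), rather than reading an F-Table.

```python
# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Sums of squares for the table.
tss = sum((yi - y_bar) ** 2 for yi in y)                       # Null
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # Residual
ssr = tss - sse                                                # Linear

# Mean squares and the F statistic.
msr = ssr / 1
mse = sse / (n - 2)
f_star = msr / mse

# With num df = 1, F_c at A = .95 equals t_c squared at area below = .975.
t_c = 3.182            # t-Table value for df = N - 2 = 3
f_c = t_c ** 2
reject = f_star > f_c

print(f"F* = {f_star:.3f}, F_c = {f_c:.3f}, reject H0: {reject}")
```

For this data F* is 4.5, which is the square of the t* obtained from the same data, so the F-test and the two tail t test lead to the same decision.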