Simple Linear Regression Continued

 

In the previous notes on simple linear regression we introduced the definition of the linear model, how to calculate the least squares estimates, and how to compute R-square. Now we will develop the concepts for constructing confidence interval estimates of the model coefficients and statistical tests of the model parameters.

 

First we need to establish the sampling distribution of the slope estimate.

 

1. Sampling Distribution of b1: relative frequency distribution of the sample estimate b1 over repeated samples

For large sample sizes, the central limit theorem applies, and b1 will be approximately normally distributed.

E[ b1 ] = mean of the sampling distribution of b1 (the true slope, which the sample estimate b1 estimates)

Var[ b1 ] = { Var[ ei ] / (N-1) }{1/Var[ Xi ] }

        = variance of the sampling distribution of b1

 

Sample estimate for Var[b1] = [MSE / (N-1)] (1/sx^2)

MSE = SSE/(N-2)

sx^2 = sample variance of the Xi

 

Std error of b1:

std(b1) = (se / sx )/Sqrt(N-1)

where se  = sqrt(MSE)

This is a measure of the precision of the estimate of the slope: the smaller std(b1), the more precise the estimate.
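The formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name slope_standard_error and the five data points are made up for the example and do not come from the notes.

```python
import math

def slope_standard_error(x, y):
    """Return (b1, MSE, std(b1)) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least squares estimates of slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # MSE = SSE/(N-2)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    # std(b1) = (se / sx) / sqrt(N-1), with se = sqrt(MSE)
    se = math.sqrt(mse)
    sx = math.sqrt(sxx / (n - 1))
    std_b1 = (se / sx) / math.sqrt(n - 1)
    return b1, mse, std_b1

# Hypothetical data, chosen only so the arithmetic is easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, mse, std_b1 = slope_standard_error(x, y)
print(b1, mse, std_b1)
```

For these made-up data, SSE = 2.4 and N = 5, so MSE = 0.8 and the estimated variance of b1 is (0.8/4)(1/2.5) = 0.08.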

 

Now we will derive the confidence interval for large N using the z-distribution.

 

2. Confidence Intervals with Large N

zb = (b1 - E[b1]) / std(b1)

is approximately normal(0,1)

 

 

Derivation of the 95% Confidence Interval

Define w as the half-width of the confidence interval, so that the full interval from LB to UB spans 2w.

Lower Bound = LB = b1 - w

Upper Bound = UB = b1+w

 

Choose the width such that

Pr[ E[b1] < LB ] = .025

Pr[ E[b1] > UB ] = .025 

Pr[ E[b1] < LB ]

= Pr[ E[b1] < b1 - w ]

= Pr[ w < b1 - E[b1] ]

= Pr[ w/std[b1] < zb ] ( define zc = w/std[b1] )

= Pr[ zb > zc ] = .025

 

Using the z-table, we look for the z value that has an area below it equal to .975 (equivalently, an area above it equal to .025). The answer is zc = 1.96.

zc = 1.96 = w/std(b1)

w = (1.96)std(b1)

This is the answer to our problem: the 95% confidence interval runs from b1 - (1.96)std(b1) to b1 + (1.96)std(b1).
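As a rough numerical check of the large-N result, the sketch below computes the interval from summary statistics. The function name large_sample_ci and the input values (b1 = 0.6, se = 2.0, sx = 1.0, N = 401) are hypothetical, picked so that std(b1) comes out to 0.1. The critical value is taken from Python's standard normal distribution instead of a printed z-table.

```python
import math
from statistics import NormalDist

def large_sample_ci(b1, se, sx, n, confidence=0.95):
    """Large-N confidence interval for the slope using the z-distribution."""
    # std(b1) = (se / sx) / sqrt(N-1)
    std_b1 = (se / sx) / math.sqrt(n - 1)
    # zc has area below equal to 1 - (1 - confidence)/2, i.e. .975 for 95%
    zc = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    w = zc * std_b1
    return b1 - w, b1 + w

# Hypothetical summary statistics: std(b1) = (2.0/1.0)/sqrt(400) = 0.1
lb, ub = large_sample_ci(b1=0.6, se=2.0, sx=1.0, n=401)
print(round(lb, 3), round(ub, 3))
```

With std(b1) = 0.1 the half-width is about (1.96)(0.1) = 0.196, so the interval is roughly 0.404 to 0.796.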

 

Now we will derive the answer for small sample sizes using the t-distribution.

 

3. Confidence intervals for small N

For Small N and assuming Y is normal

tb = (b1 - E[b1])/std(b1)

has a t-distribution with N-2 degrees of freedom

 

95% Confidence Interval for small N:

Lower Bound = LB = b1 - w

Upper Bound = UB = b1+w

w = (tc )Std(b1)

tc is obtained from t-Table 

row corresponding to df = N-2

col corresponding to Area below = .975

So we use this formula for w to determine the half-width of the 95% confidence interval, which then runs from b1 - w to b1 + w.
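The table lookup can be mimicked with a small dictionary. In the sketch below, the critical values are the usual entries of a standard t-table for area below = .975, rounded to three decimals. The name small_sample_ci and the inputs (b1 = 0.6, std(b1) = sqrt(0.08), N = 5) are hypothetical.

```python
import math

# Excerpt of a t-table: column "Area below = .975", indexed by df = N-2
T_TABLE_975 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def small_sample_ci(b1, std_b1, n):
    """95% confidence interval for the slope using the t-distribution."""
    tc = T_TABLE_975[n - 2]   # row: df = N-2
    w = tc * std_b1           # w = (tc) std(b1)
    return b1 - w, b1 + w

# Hypothetical estimates from a sample of N = 5 points
lb, ub = small_sample_ci(b1=0.6, std_b1=math.sqrt(0.08), n=5)
print(round(lb, 3), round(ub, 3))
```

Here df = 3, so tc = 3.182, which is much larger than 1.96. With so few observations the interval is wide, roughly -0.3 to 1.5, and it includes zero.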

 

 

Now we will derive a method for statistically testing the slope.

 

4. Hypothesis Testing:

H0 : Null Hypothesis = null model is correct

Yi' = b0, E[b1] = 0

 

H1: Alternative Hyp = linear model is correct

Yi' = b0 + b1Xi , E[b1] > 0 or E[b1] < 0.

 

 

Decision Table

                       True State
Decision        H0 true                H1 true
Choose H0       1-alpha                beta = Pr[type 2]
Choose H1       alpha = Pr[type 1]     power = 1-beta

 

Procedure I: Set alpha = significance level = .05

1. t* = b1 / std(b1)

2. Look up tc from Table using

row: df = N-2

col : Area below = .975 (1-alpha/2)

3. reject H0 if |t*| > tc (two tail test)

or, for a one tail test of E[b1] > 0 at the same alpha, reject H0 if t* > tc, with tc taken instead from col: Area below = .95 (1-alpha)
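Procedure I for the two tail test can be sketched as follows. The critical values are again an excerpt of a standard t-table (area below = .975). The function name two_tail_t_test and the inputs are hypothetical.

```python
import math

# Excerpt of a t-table: column "Area below = .975", indexed by df = N-2
T_TABLE_975 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def two_tail_t_test(b1, std_b1, n):
    """Two tail t-test of H0: E[b1] = 0 at alpha = .05."""
    t_star = b1 / std_b1          # step 1
    tc = T_TABLE_975[n - 2]       # step 2: row df = N-2, col .975
    reject = abs(t_star) > tc     # step 3
    return t_star, tc, reject

# Hypothetical estimates from a sample of N = 5 points
t_star, tc, reject = two_tail_t_test(b1=0.6, std_b1=math.sqrt(0.08), n=5)
print(round(t_star, 3), tc, reject)
```

For these numbers t* is about 2.12, which is below tc = 3.182, so H0 is not rejected.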

 

Procedure II:

1. t* = b1 / std(b1)

2. Find p = Pr[ |t| > |t*| | H0 ] (two tail probability, from printout)

3. reject H0 if p < .05 (two tail test)
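When no printout is available, the p-value can be approximated numerically. The sketch below assumes the printout reports the two tail probability Pr[ |t| > |t*| | H0 ] and integrates the t density with Simpson's rule. The function names and the step count are choices made for this illustration, not part of the notes.

```python
import math

def t_density(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_p_value(t_star, df, steps=2000):
    """Approximate Pr[|t| > |t*|] under H0 (steps must be even)."""
    upper = abs(t_star)
    h = upper / steps
    # Simpson's rule for the area between 0 and |t*|
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = (h / 3) * total
    # The density is symmetric, so the central area is twice this
    return 1 - 2 * area_zero_to_t

# Hypothetical test statistic with df = N-2 = 3
p = two_tail_p_value(2.1213, df=3)
print(round(p, 3), p < 0.05)
```

For t* of about 2.12 with df = 3 the p-value is about .12, so H0 is not rejected at the .05 level. As a sanity check, a t* equal to the table value 2.306 with df = 8 gives p close to .05.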

 

F-test (again assuming Y is normal)

H0 : E[b1] = 0

Source       Sum of Squares     Mean Sq               F
Linear       SSR = TSS-SSE      MSR = SSR/1           F* = MSR/MSE
Residual     SSE                MSE = SSE/(N-2)
Null         TSS                sy^2 = TSS/(N-1)

 

 

Procedure I:

1. Look up Fc from Table

row section: den df = N-2

row subsection: A = 1-significance level = 1-.05 = .95

column: num df = 1

2. Reject H0 if F* > Fc
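The F-test table and Procedure I can be sketched together. The critical values are an excerpt of a standard F-table (num df = 1, area below = .95). The function name f_test and the data are hypothetical.

```python
# Excerpt of an F-table: num df = 1, A = .95, indexed by den df = N-2
F_TABLE_95_NUM_DF_1 = {
    1: 161.4, 2: 18.51, 3: 10.13, 4: 7.71, 5: 6.61,
    6: 5.99, 7: 5.59, 8: 5.32, 9: 5.12, 10: 4.96,
}

def f_test(x, y):
    """F-test of H0: E[b1] = 0 at significance level .05."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    tss = sum((yi - y_bar) ** 2 for yi in y)                        # null model
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # residual
    ssr = tss - sse                                                 # linear
    msr = ssr / 1
    mse = sse / (n - 2)
    f_star = msr / mse
    fc = F_TABLE_95_NUM_DF_1[n - 2]
    return f_star, fc, f_star > fc

# Hypothetical data with N = 5
f_star, fc, reject = f_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_star, 3), fc, reject)
```

For these data TSS = 6, SSE = 2.4 and SSR = 3.6, so F* = 3.6/0.8 = 4.5, which is below Fc = 10.13, and H0 is not rejected. In simple linear regression F* equals the square of the t statistic for the slope, so this test agrees with the two tail t-test.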

 

Procedure II:

Reject H0 if

p = Pr[ F > F* | H0 ] < .05 = significance level.

(from printout)