Simple Linear Regression Continued
In the previous notes on simple linear regression we introduced the definition of the linear model, how to calculate the least squares estimates, and how to compute R-square. Now we will develop the concepts for constructing confidence interval estimates of the model coefficients and statistical tests of the model parameters.
First we need to establish the sampling distribution of the slope estimate.
1. Sampling Distribution of b1: the relative frequency distribution of the sample estimate b1
For large sample sizes, the central limit
theorem applies, and b1 will be approximately normally distributed.
E[ b1 ] = beta1 = mean of the sampling distribution of b1, where beta1 is the true slope
Var[ b1 ] = { Var[ ei ] / (N-1) } { 1 / Var[ Xi ] } = variance of the sampling distribution of b1
Sample estimate for Var[ b1 ] = [ MSE / (N-1) ] (1 / sx^2)
where MSE = SSE/(N-2) and sx^2 = sample variance of the Xi
Std error of b1:
std(b1) = (se / sx) / sqrt(N-1)
where se = sqrt(MSE). This is a measure of the precision of the estimate of the slope.
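The standard error formula above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name slope_standard_error and the five data points are made up for illustration and do not come from these notes.

```python
import math
import statistics

def slope_standard_error(x, y):
    """Return (b1, MSE, std_b1) for the least squares line through (x, y)."""
    n = len(x)
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                 # equals (N-1) * sx^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                           # slope estimate
    b0 = y_bar - b1 * x_bar                                  # intercept estimate
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                                      # MSE = SSE/(N-2)
    se = math.sqrt(mse)                                      # se = sqrt(MSE)
    sx = math.sqrt(sxx / (n - 1))                            # sample std dev of X
    std_b1 = (se / sx) / math.sqrt(n - 1)                    # std error of b1
    return b1, mse, std_b1

# Illustrative data only (an assumption, not data from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, mse, std_b1 = slope_standard_error(x, y)
print(f"b1 = {b1:.4f}, MSE = {mse:.4f}, std(b1) = {std_b1:.4f}")
```

For these five points SSE = 2.4, so MSE = 0.8, and std(b1) works out to sqrt(0.08), which is the same value as sqrt(MSE / sum of squared deviations of X).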
Now we will derive the confidence interval for large N using the z-distribution.
2. Confidence Intervals with Large N
zb = (b1 - E[b1]) / std(b1)
is approximately normal(0,1)
Derivation of the 95% Confidence Interval
Define w as the width of the confidence interval on each side of b1.
Lower Bound = LB = b1 - w
Upper Bound = UB = b1 + w
Choose the width such that
Pr[ E[b1] < LB ] = .025
Pr[ E[b1] > UB ] = .025
Pr[ E[b1] < LB ]
= Pr[ E[b1] < b1 - w ]
= Pr[ w < b1 - E[b1] ]
= Pr[ w/std(b1) < zb ]     (define zc = w/std(b1))
= Pr[ zb > zc ] = .025
Using the z-table, we look for the z value that has an area below equal to .975. The answer is zc = 1.96.
zc = 1.96 = w/std(b1)
w = (1.96) std(b1)
So the 95% confidence interval for large N is b1 - (1.96) std(b1) to b1 + (1.96) std(b1).
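As a rough illustration of the large-N result, the sketch below computes zc from the standard normal distribution instead of a printed z-table and then forms LB and UB. The function name and the input values b1 = 0.6 and std(b1) = 0.1 are hypothetical.

```python
from statistics import NormalDist

def large_n_confidence_interval(b1, std_b1, level=0.95):
    """Return (LB, UB, zc) for the large-N confidence interval of the slope."""
    tail = (1 - level) / 2                 # .025 in each tail for a 95% interval
    zc = NormalDist().inv_cdf(1 - tail)    # z value with area below = .975
    w = zc * std_b1                        # w = zc * std(b1)
    return b1 - w, b1 + w, zc

# Hypothetical slope estimate and standard error, for illustration only
lb, ub, zc = large_n_confidence_interval(b1=0.6, std_b1=0.1)
print(f"zc = {zc:.3f}, LB = {lb:.3f}, UB = {ub:.3f}")
```

Passing a different level, such as 0.90, changes only the tail area and therefore zc; the rest of the calculation is unchanged.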
Now we will derive the answer for small sample sizes using the t-distribution.
3. Confidence Intervals for Small N
For small N, and assuming Y is normal,
tb = (b1 - E[b1]) / std(b1)
has a t-distribution with N-2 degrees of freedom.
95% Confidence Interval for small N:
Lower Bound = LB = b1 - w
Upper Bound = UB = b1 + w
w = (tc) std(b1)
tc is obtained from the t-table:
row corresponding to df = N-2
col corresponding to Area below = .975
So we use this formula for w to determine the width of the 95% confidence interval.
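The table lookup for tc can also be approximated numerically. The sketch below is one possible approach, not the method used in these notes: it integrates the t density with Simpson's rule and finds tc by bisection, using only the standard library. The helper names t_pdf, t_cdf and t_critical are made up here, and the inputs b1 = 0.6, std(b1) = 0.2828 and N = 5 are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(x, df):
    """Area below x, by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    n = 2 * max(100, int(a * 100))          # even number of subintervals
    h = a / n
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(area_below, df):
    """Find tc with t_cdf(tc, df) = area_below, for 0.5 <= area_below < 1."""
    if not 0.5 <= area_below < 1:
        raise ValueError("area_below must be in [0.5, 1)")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < area_below:       # widen the bracket until it contains tc
        hi *= 2
    for _ in range(60):                     # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_below:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative values only: N = 5 observations, so df = N-2 = 3
b1, std_b1, n_obs = 0.6, 0.2828, 5
tc = t_critical(0.975, n_obs - 2)
w = tc * std_b1
print(f"tc = {tc:.3f}, LB = {b1 - w:.3f}, UB = {b1 + w:.3f}")
```

If SciPy is available, scipy.stats.t.ppf(0.975, df) returns the same critical value directly; the hand-rolled version is shown only to keep the example free of third-party dependencies.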
Now we will derive a method for statistically testing the slope.
4. Hypothesis Testing
H0: Null Hypothesis = null model is correct
Yi' = b0, E[b1] = 0
H1: Alternative Hypothesis = linear model is correct
Yi' = b0 + b1 Xi, E[b1] > 0 or E[b1] < 0
Decision Table
Decision  | H0 true              | H1 true
Choose H0 | 1-alpha              | beta = Pr[type 2]
Choose H1 | alpha = Pr[type 1]   | power = 1-beta
Procedure I: Set alpha = significance level = .05
1. t* = b1 / std(b1)
2. Look up tc from the t-table using
row: df = N-2
col: Area below = .975 (1-alpha/2) for a two-tail test, or .95 (1-alpha) for a one-tail test
3. Reject H0 if |t*| > tc (two-tail test)
or reject H0 if t* > tc (one-tail test of E[b1] > 0)
Procedure II:
1. t* = b1 / std(b1)
2. Find p = Pr[ |t| > |t*| | H0 ] (the two-tail p-value, from the printout)
3. Reject H0 if p < .05 (two-tail test)
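Procedure II can be sketched in code when no printout is at hand. The example below is an approximation under stated assumptions: it obtains the two-tail p-value by integrating the t density with Simpson's rule, and the names t_pdf, two_tail_p_value and slope_t_test, along with the inputs b1 = 0.6, std(b1) = 0.2828 and N = 5, are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tail_p_value(t_star, df):
    """p = Pr[ |t| > |t*| ] under H0, using Simpson's rule on [0, |t*|]."""
    a = abs(t_star)
    n = 2 * max(100, int(a * 100))            # even number of subintervals
    h = a / n
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3          # Pr[ -|t*| < t < |t*| ]
    return max(0.0, 1.0 - central_area)

def slope_t_test(b1, std_b1, n_obs, alpha=0.05):
    """Return (t*, p, reject_H0) for the two-tail test of H0: E[b1] = 0."""
    t_star = b1 / std_b1                      # step 1
    p = two_tail_p_value(t_star, n_obs - 2)   # step 2, df = N-2
    return t_star, p, p < alpha               # step 3

# Illustrative values only
t_star, p, reject = slope_t_test(b1=0.6, std_b1=0.2828, n_obs=5)
print(f"t* = {t_star:.3f}, p = {p:.3f}, reject H0: {reject}")
```

With these inputs t* is about 2.12 on 3 degrees of freedom, which falls short of the tabled tc = 3.182, so both procedures agree that H0 is not rejected at the .05 level.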
F-test (again assuming Y is normal)
H0: E[b1] = 0
Source   | Sum of Squares | Mean Sq            | F
Linear   | SSR = TSS-SSE  | MSR = SSR/1        | F* = MSR/MSE
Residual | SSE            | MSE = SSE/(N-2)    |
Null     | TSS            | sy^2 = TSS/(N-1)   |
Procedure I:
1. Look up Fc from the F-table
row section: den df = N-2
row subsection: A = 1 - significance level = 1 - .05 = .95
column: num df = 1
2. Reject H0 if F* > Fc
Procedure II:
Reject H0 if p = Pr[ F > F* | H0 ] < .05 = significance level (p is taken from the printout).
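The table entries and Procedure I for the F-test can be sketched as follows. This is a minimal illustration with made-up data; the function name anova_table is hypothetical, and the critical value Fc = 10.13 is the standard F-table entry for num df = 1, den df = 3, A = .95, hard-coded here because the example has N = 5.

```python
def anova_table(x, y):
    """Return the sums of squares, mean squares and F* for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                   # least squares slope
    b0 = y_bar - b1 * x_bar                          # least squares intercept
    tss = sum((yi - y_bar) ** 2 for yi in y)         # null model sum of squares
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = tss - sse                                  # SSR = TSS - SSE
    msr = ssr / 1                                    # num df = 1
    mse = sse / (n - 2)                              # den df = N-2
    return {"SSR": ssr, "SSE": sse, "TSS": tss,
            "MSR": msr, "MSE": mse, "F": msr / mse}

# Illustrative data only (an assumption, not data from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = anova_table(x, y)

F_CRITICAL = 10.13   # F-table value for num df = 1, den df = 3, A = .95
reject = table["F"] > F_CRITICAL
print(f"F* = {table['F']:.3f}, Fc = {F_CRITICAL}, reject H0: {reject}")
```

For these data F* = 3.6 / 0.8 = 4.5, which equals the square of t* = b1 / std(b1) = 0.6 / sqrt(0.08). With one numerator degree of freedom the F-test and the two-tail t-test of the slope always lead to the same decision.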