General Linear Model (Matrix Form)

I. Bivariate Regression Model Example

yi = observed score on criterion for subject i

yi* = predicted score on criterion for subject i

ei = (yi - yi*) = error score for subject i

X1i = score on predictor variable X1 for subject i

X2i = score on predictor variable X2 for subject i

Scalar form:

y1* = b0 + b1X11 + b2X21

y2* = b0 + b1X12 + b2X22

y3* = b0 + b1X13 + b2X23

...

yN* = b0 + b1X1N + b2X2N

Y = N x 1 column vector of criterion scores:

    [ y1 ]
    [ y2 ]
    [ ... ]
    [ yN ]

X = N x 3 matrix of predictor variable scores (the leading column of 1s carries the intercept b0):

    [ 1  X11  X21 ]
    [ 1  X12  X22 ]
    [ ...          ]
    [ 1  X1N  X2N ]

b = 3 x 1 column vector of regression coefficients:

    [ b0 ]
    [ b1 ]
    [ b2 ]

Matrix Form:

Y* = Xb

E = (Y-Y*)
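
A concrete MATLAB sketch of this setup, with made-up scores for N = 4 subjects and an arbitrary trial coefficient vector (every number here is illustrative, not from the text):

y  = [3; 5; 4; 7];          % N x 1 criterion vector Y
X1 = [1; 2; 2; 4];          % scores on predictor X1
X2 = [2; 1; 3; 5];          % scores on predictor X2
X  = [ones(4,1) X1 X2];     % N x 3 design matrix: column of 1s, X1, X2
b  = [1; 0.5; 0.8];         % trial 3 x 1 coefficient vector (b0, b1, b2)
Ystar = X*b;                % Y* = Xb, the N x 1 vector of predictions
E  = y - Ystar;             % E = Y - Y*, the N x 1 vector of error scores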
 

II. General Estimation Problem: 

Y is N x 1, X is N x p, Y* = Xb, E = Y - Y*

Find b that minimizes SSE = E'E (the squared length of E)

To solve this we need to find the projection Y* = Xb of the point Y onto the column space of X. The shortest distance is achieved when the difference E = (Y - Y*) is orthogonal to that space, i.e., X'E = X'(Y - Xb) = 0.

Proof: Suppose b satisfies (Y-Xb)'X = 0, and let c be any other coefficient vector with Xc ≠ Xb.

Define D = Xc - Xb, so that Xc = Xb + D. Then

(Y-Xc)'(Y-Xc) = [Y - (Xb + D)]'[Y - (Xb + D)] = [(Y-Xb) - D]'[(Y-Xb) - D]

= (Y-Xb)'(Y-Xb) - D'(Y-Xb) - (Y-Xb)'D + D'D = (Y-Xb)'(Y-Xb) + D'D > (Y-Xb)'(Y-Xb)

because D'(Y-Xb) = (c-b)'X'(Y-Xb) = 0 = (Y-Xb)'X(c-b) = (Y-Xb)'D, and D'D > 0 since D ≠ 0. QED

Now that we have proved that the b satisfying X'(Y-Xb) = 0 minimizes SSE, we can use this fact to solve for b:

X'(Y-Xb) = 0 implies (X'Y) = (X'X)b, so

General Solution: b = [(X'X)^-1X']Y, or b = PY with P = (X'X)^-1X'
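
A minimal MATLAB sketch of this solution on simulated data (the sizes are illustrative). In practice the backslash operator is preferred to forming the inverse explicitly:

N = 20;
X = [ones(N,1) randn(N,2)];   % N x 3 design matrix
Y = randn(N,1);               % N x 1 criterion vector
P = (X'*X)\X';                % P = (X'X)^-1 X', without forming the inverse
b = P*Y;                      % b = PY, the least squares solution
E = Y - X*b;                  % residual vector
disp(X'*E)                    % orthogonality check: X'E should be ~ 0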


III. General Linear Model (Univariate)

Y = Xb + e

X is the fixed design matrix

b is the population regression coefficient vector 

e ~ Normal(0, σ^2 I) implies Y ~ Normal(Xb, σ^2 I)

Some Properties of Least Squares Estimates: 

b = [(X'X)^-1X']Y = PY

E[b] = E[PY] = E[(X'X)^-1X'(Xb + e)] = b + (X'X)^-1X'E[e] = b.

Cov(b,b) = Cov(PY, PY) = P Cov(Y,Y) P' = σ^2 PP' = σ^2 (X'X)^-1.
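
Both properties can be checked by simulation. A minimal MATLAB sketch, holding X fixed and drawing e ~ Normal(0, σ^2 I) repeatedly (the sample size, σ, and true coefficients are illustrative):

N = 50; sigma = 2; reps = 5000;
X = [ones(N,1) randn(N,2)];          % fixed design matrix
btrue = [1; -0.5; 0.3];              % true population coefficients
bhat = zeros(3, reps);
for k = 1:reps
    Y = X*btrue + sigma*randn(N,1);  % Y ~ Normal(Xb, sigma^2 I)
    bhat(:,k) = X\Y;                 % least squares estimate for this sample
end
disp(mean(bhat,2))                   % ~ btrue (unbiasedness)
disp(cov(bhat'))                     % ~ sigma^2 (X'X)^-1
disp(sigma^2*inv(X'*X))              % theoretical covariance for comparison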

 

IV. General Linear Model (Multivariate)

Y = N x p matrix of scores from N subjects on p criterion variables

X = N x q matrix of scores from N subjects on q predictor variables

B = (X'X)^-1(X'Y) = q x p matrix of regression coefficients

Y* = XB = N x p matrix of predictions

E = Y - Y* = N x p matrix of residuals
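
The multivariate fit in MATLAB, on simulated data (the sizes N, p, q are illustrative):

N = 30; p = 4; q = 3;
X = [ones(N,1) randn(N,q-1)];   % N x q predictor matrix
Y = randn(N,p);                 % N x p criterion matrix
B = (X'*X)\(X'*Y);              % q x p matrix of regression coefficients
Ystar = X*B;                    % N x p matrix of predictions
E = Y - Ystar;                  % N x p matrix of residuals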

Hypothesis Testing:

C = g x q pre-multiplying contrast matrix (between-subjects contrasts)

A = p x u post-multiplying contrast matrix (within-subjects contrasts)

D = CBA = g x u matrix of contrast estimates


H0: E[D] = 0

Qe = A'[E'E]A

Qh = D'[C(X'X)^-1C']^-1D

Λ = det(Qe) / det(Qe + Qh),

Λ ~ Wilks' Lambda with df = (u, g, N-q)

F = [(1 - Λ^(1/t)) / Λ^(1/t)][dfD / dfN]

dfN = gu

dfD = rt - 2w

r = (N - q) - (u - g + 1)/2

w = (gu - 2)/4

t = sqrt[(g^2 u^2 - 4)/(g^2 + u^2 - 5)] if (g^2 + u^2 - 5) > 0, t = 1 otherwise.
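
A minimal MATLAB sketch of the whole test, using simulated null data and hypothetical contrast matrices C and A (here g = 1 and u = 2, so g^2 + u^2 - 5 = 0, t = 1, and the F approximation is exact):

N = 40; p = 3; q = 2;
X = [ones(N,1) randn(N,q-1)];      % N x q design matrix
Y = randn(N,p);                    % N x p criterion matrix (H0 true here)
B = (X'*X)\(X'*Y);                 % q x p coefficient matrix
E = Y - X*B;                       % N x p residual matrix
C = [0 1];                         % g x q between-subjects contrast, g = 1
A = [1 -1 0; 0 1 -1]';             % p x u within-subjects contrasts, u = 2
D = C*B*A;                         % g x u matrix of contrast estimates
Qe = A'*(E'*E)*A;                  % error SSCP in contrast space
M  = C*((X'*X)\C');                % C (X'X)^-1 C'
Qh = D'*(M\D);                     % hypothesis SSCP
Lam = det(Qe)/det(Qe + Qh);        % Wilks' Lambda
g = size(C,1); u = size(A,2);
if g^2 + u^2 - 5 > 0
    t = sqrt((g^2*u^2 - 4)/(g^2 + u^2 - 5));
else
    t = 1;
end
r = (N - q) - (u - g + 1)/2;
w = (g*u - 2)/4;
dfN = g*u; dfD = r*t - 2*w;
F = ((1 - Lam^(1/t))/Lam^(1/t))*(dfD/dfN)   % no semicolon, so F is displayed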


Model Comparison View of Qh 

Assume A = I

Define:

Y* = XB   (predictions from complete model)

YR* = XBR (BR is restricted to satisfy the constraint CBR = 0 imposed by H0: E[CB] = 0)

Then

Qh = [(Y*)'(Y*) - (YR*)'(YR*)] = (CB)'[C(X'X)^-1C']^-1(CB)
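
A numeric check of this identity in MATLAB. It uses the standard restricted least squares formula BR = B - (X'X)^-1 C'[C(X'X)^-1 C']^-1 (CB), which enforces CBR = 0; the data and contrast are illustrative:

N = 30; p = 2; q = 3;
X = [ones(N,1) randn(N,q-1)];         % N x q design matrix
Y = randn(N,p);                       % N x p criterion matrix
B = (X'*X)\(X'*Y);                    % complete-model coefficients
C = [0 1 0; 0 0 1];                   % g x q contrast matrix, g = 2
M = C*((X'*X)\C');                    % C (X'X)^-1 C'
BR = B - ((X'*X)\C')*(M\(C*B));       % restricted coefficients: C*BR = 0
Qh1 = (X*B)'*(X*B) - (X*BR)'*(X*BR);  % model comparison form of Qh
Qh2 = (C*B)'*(M\(C*B));               % contrast form of Qh
disp(norm(Qh1 - Qh2))                 % ~ 0: the two forms agree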

 

SAS and MATLAB Programs

Example Repeated Measures ANOVA MATLAB Program

Example Specific Contrast