
Discriminant Function Analysis

Purpose:
Reduce the original p raw score criterion variables to a smaller number q of discriminant scores such that
a) the discriminant scores are linear combinations of the original variables
b) they are orthogonal
c) together they maximize the multivariate F-statistic for some treatment effect

1. Review of GLM:
Y = N x p matrix of scores from N subjects on p criteria
X = N x q matrix of scores from N subjects on q predictor variables
B = (X'X)^-1(X'Y) (matrix of coefficients)
Y* = XB (predictions)
E = Y - Y* (Residuals)
D = CB (Treatment effects)
Q_h = D'[ C(X'X)^-1C' ]^-1D
Q_e = E'E
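
The quantities above can be computed directly. The following is a minimal NumPy sketch, not a definitive implementation. The data, the dummy coding of X (an intercept plus two group indicators), and the contrast matrix C are all hypothetical choices made for illustration.

```python
import numpy as np

def glm_sscp(Y, X, C):
    """Return B, Q_h and Q_e for the model Y = XB + E and the effect D = CB."""
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y                                  # coefficients
    E = Y - X @ B                                          # residuals
    D = C @ B                                              # treatment effects
    Q_h = D.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ D       # hypothesis SSCP
    Q_e = E.T @ E                                          # error SSCP
    return B, Q_h, Q_e

# Hypothetical data: 3 groups of 10 subjects, p = 2 criteria
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 10)
X = np.column_stack([np.ones(30), groups == 1, groups == 2]).astype(float)
Y = rng.normal(size=(30, 2)) + np.array([[0, 0], [1, 0], [0, 2]])[groups]

# Hypothetical contrast: both group effects are tested at once
C = np.array([[0., 1., 0.],
              [0., 0., 1.]])

B, Q_h, Q_e = glm_sscp(Y, X, C)
```

With this particular coding and contrast, Q_h + Q_e equals the total sums of squares and cross products about the grand mean, which is a convenient sanity check.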

2. Goal:
Choose a post contrast matrix A = [a₁ , a₂ , ... , a_p]'
to maximize F = [(A'Q_hA)/(A'Q_eA)][df_D/ df_N]

In other words, compute discriminant scores, Z = YA
which produce the largest F ratio when Z is used as the dependent variable.
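
To see why the ratio in the F formula is the right target, the sketch below checks that, for a single arbitrary contrast vector a, the matrix expression gives the same F as an ordinary univariate one-way ANOVA on z = Ya. It assumes a one-way layout, where Q_e is the within-group SSCP matrix and Q_h is the between-group SSCP matrix. The data and the vector a are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 8)           # hypothetical: 3 groups of 8 subjects
Y = rng.normal(size=(24, 2))               # p = 2 criteria
Y[groups == 2] += [1.0, 0.5]               # shift one group so an effect exists

# One-way layout (assumption): within-group and between-group SSCP matrices
grand = Y.mean(axis=0)
Q_e = np.zeros((2, 2))
Q_h = np.zeros((2, 2))
for g in np.unique(groups):
    Yg = Y[groups == g]
    dev = Yg - Yg.mean(axis=0)
    Q_e += dev.T @ dev
    m = (Yg.mean(axis=0) - grand)[:, None]
    Q_h += len(Yg) * (m @ m.T)

a = np.array([1.0, -1.0])                  # arbitrary post contrast vector
df_N, df_D = 3 - 1, 24 - 3                 # numerator and denominator df
F_matrix = (a @ Q_h @ a) / (a @ Q_e @ a) * df_D / df_N

# Same F from a univariate ANOVA on the composite score z = Ya
z = Y @ a
ss_b = sum(np.sum(groups == g) * (z[groups == g].mean() - z.mean()) ** 2
           for g in range(3))
ss_w = sum(((z[groups == g] - z[groups == g].mean()) ** 2).sum()
           for g in range(3))
F_univariate = (ss_b / df_N) / (ss_w / df_D)
```

Since df_D/df_N does not depend on a, maximizing F over a is the same as maximizing the ratio (a'Q_h a)/(a'Q_e a).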

3. Solution:

Compute the eigenvectors and eigenvalues of the matrix product [Q_e^-1Q_h]

l₁ = the largest eigenvalue
P₁ = the eigenvector corresponding to the largest eigenvalue.

A = P₁ is chosen for the post contrast matrix,
Z = YP₁ defines the scores on the first discriminant variable.
l₁ = (P₁'Q_hP₁ )/(P₁'Q_eP₁ )
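
A minimal NumPy sketch of this step follows. The matrices Q_h and Q_e are small hypothetical examples (Q_e positive definite), and the function name discriminant_axes is made up for illustration.

```python
import numpy as np

def discriminant_axes(Q_h, Q_e):
    """Eigenvalues (descending) and eigenvectors of Q_e^-1 Q_h."""
    # solve(Q_e, Q_h) forms Q_e^-1 Q_h without an explicit inverse
    evals, evecs = np.linalg.eig(np.linalg.solve(Q_e, Q_h))
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], evecs.real[:, order]

# Hypothetical hypothesis and error SSCP matrices
Q_h = np.array([[10., 4.],
                [4., 8.]])
Q_e = np.array([[20., 5.],
                [5., 30.]])

lams, P = discriminant_axes(Q_h, Q_e)
P1 = P[:, 0]                                   # first discriminant vector

# The ratio for A = P1 reproduces the largest eigenvalue
ratio = (P1 @ Q_h @ P1) / (P1 @ Q_e @ P1)
```

Given a score matrix Y, the first discriminant scores would then be Y @ P1. The scaling of P1 is arbitrary, because the ratio is unchanged when P1 is multiplied by a constant.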

4. We can extract a second discriminant variable orthogonal to the first by setting

l₂ = the second largest eigenvalue
P₂ = the eigenvector corresponding to the second largest eigenvalue.
Z = YP₂ defines the scores on the second discriminant variable
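
The sense in which the second variable is orthogonal to the first can be checked numerically. In the sketch below (hypothetical Q_h and Q_e, distinct eigenvalues assumed), the two eigenvectors satisfy P₁'Q_eP₂ = 0 and P₁'Q_hP₂ = 0, so the two sets of discriminant scores are uncorrelated in the error and hypothesis cross products. The vectors P₁ and P₂ themselves need not satisfy P₁'P₂ = 0, because Q_e^-1Q_h is generally not symmetric.

```python
import numpy as np

# Hypothetical hypothesis and error SSCP matrices
Q_h = np.array([[10., 4.],
                [4., 8.]])
Q_e = np.array([[20., 5.],
                [5., 30.]])

evals, evecs = np.linalg.eig(np.linalg.solve(Q_e, Q_h))
order = np.argsort(evals.real)[::-1]           # largest eigenvalue first
lams = evals.real[order]
P1 = evecs.real[:, order[0]]
P2 = evecs.real[:, order[1]]

cross_error = P1 @ Q_e @ P2                    # error cross product of the two scores
cross_hypothesis = P1 @ Q_h @ P2               # hypothesis cross product
ratio2 = (P2 @ Q_h @ P2) / (P2 @ Q_e @ P2)     # reproduces the second eigenvalue
```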

5. The extraction of discriminant variables can continue until we reach the rank of [Q_e^-1Q_h].

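rank(Q_e^-1Q_h) = rank(Q_h) = rank(C) = number of rows in C used to define D = CB
(assuming the rows of C are linearly independent and p is at least as large as the number of rows in C).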
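
The rank limit can be illustrated with a hypothetical example: three groups, p = 3 criteria, and a contrast matrix C with two rows. Only two eigenvalues should be nonzero, so only two discriminant variables can be extracted. The data, the dummy coding, and the tolerance 1e-8 are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 12)              # hypothetical: 3 groups of 12
X = np.column_stack([np.ones(36), groups == 1, groups == 2]).astype(float)
Y = rng.normal(size=(36, 3))                   # p = 3 criteria
Y[groups == 1] += [1.0, 0.0, 0.5]
Y[groups == 2] += [0.0, 2.0, 0.5]
C = np.array([[0., 1., 0.],
              [0., 0., 1.]])                   # 2 rows: both group effects

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y
E = Y - X @ B
D = C @ B
Q_h = D.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ D
Q_e = E.T @ E

# Eigenvalues of Q_e^-1 Q_h, largest first
lams = np.sort(np.linalg.eigvals(np.linalg.solve(Q_e, Q_h)).real)[::-1]

rank_Qh = np.linalg.matrix_rank(Q_h, tol=1e-8)  # numerical rank of Q_h
n_nonzero = int(np.sum(lams > 1e-8))            # usable discriminant variables
```

Here rank_Qh and n_nonzero both equal the number of rows in C, even though there are three criteria.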