Principal Components Analysis

Purpose: Reduce a large set of p correlated variables [X1 X2 … Xp] to a smaller set of q < p uncorrelated components [Q1 Q2 … Qq] such that the components
a. are formed by linear combinations of the original variables
Definitions:
X is an N x p observed data matrix with N rows of observations on p columns of variables (expressed in deviation scores so that the mean of each column is zero).
$S_X = X'X/(N-1)$ is the p x p sample variance-covariance matrix.
$S_X = PDP'$ with $P'P = PP' = I$ and $D = \mathrm{Diag}[d_1, d_2, \dots, d_q, \dots, d_p]$.
$\mathrm{Std} = \mathrm{Diag}(S_X)^{1/2}$ is the diagonal matrix of standard deviations.
The eigenvalues are ordered in magnitude so that $d_1 > d_2 > \dots > d_p$.
$P = [\,P_q \mid P_{p-q}\,]$, where $P_q$ is the p x q matrix formed from the first q eigenvector columns of P.
$D_q = \mathrm{Diag}[d_1, d_2, \dots, d_q]$ is the diagonal matrix of the first q eigenvalues.
PC Model: $X = QB + E$
Q is an N x q matrix with N rows of scores on q columns of components; $Q = XA$ for some full column rank p x q matrix A, and $Q'Q$ is diagonal.
B is a q x p matrix of coefficients used to reproduce X from Q.
E is an N x p matrix of residuals.
Goal: Find A and B such that $\mathrm{Trace}[E'E]$ is minimized.
Solution: $X = QB + E = (XA)B + E$ with $A = P_q$ and $B = P_q'$.
$$S_Q = \frac{Q'Q}{N-1} = \frac{P_q'X'XP_q}{N-1} = P_q'PDP'P_q = D_q$$
Proportion Reproduced:
$$R^2 = 1 - \frac{\mathrm{Trace}[E'E]}{\mathrm{Trace}[X'X]} = \frac{d_1 + d_2 + \dots + d_q}{d_1 + d_2 + \dots + d_p}$$
$$\mathrm{Cov}[X,Q] = \frac{X'Q}{N-1} = \frac{X'XP_q}{N-1} = PDP'P_q = P_qD_q$$
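The identities in the solution can be checked numerically. The following is a minimal MATLAB sketch on simulated data; the sizes N, p, q, the random seed, and all variable names are illustrative assumptions, not part of the original notes.

```matlab
% Sketch: check S_Q = Dq, the proportion reproduced, and Cov[X,Q] = Pq*Dq.
rng(1);                                  % fixed seed (arbitrary choice)
N = 200; p = 5; q = 2;                   % hypothetical sizes
X = randn(N, p) * randn(p, p);           % simulated correlated variables
X = X - mean(X, 1);                      % deviation scores (implicit expansion, R2016b+)

S = X' * X / (N - 1);                    % sample covariance matrix
S = (S + S') / 2;                        % remove rounding asymmetry before eig
[P, D] = eig(S);
[d, idx] = sort(diag(D), 'descend');     % order eigenvalues d1 >= ... >= dp
P = P(:, idx);

Pq = P(:, 1:q);                          % A = Pq and B = Pq'
Q  = X * Pq;                             % component scores
E  = X - Q * Pq';                        % residuals

SQ       = Q' * Q / (N - 1);             % should match Diag[d1, ..., dq]
R2_resid = 1 - trace(E' * E) / trace(X' * X);
R2_eig   = sum(d(1:q)) / sum(d);         % should match R2_resid
covXQ    = X' * Q / (N - 1);             % should match Pq * Dq
```

If the derivation holds, `SQ` is diagonal with the first q eigenvalues on its diagonal, and the two versions of the proportion reproduced agree up to rounding error.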
Alternative Form of Solution:
$$X = QB + E = (XP_q)P_q' + E = (XP_qD_q^{-1/2})(D_q^{1/2}P_q') + E = VW + E$$
$V = X(P_qD_q^{-1/2}) = (X\,\mathrm{Std}^{-1})(\mathrm{Std}\,P_qD_q^{-1/2})$
$C = \mathrm{Std}\,P_qD_q^{-1/2}$
$W = D_q^{1/2}P_q'$, or $W' = P_qD_q^{1/2}$
$$S_V = \frac{V'V}{N-1} = \frac{D_q^{-1/2}P_q'X'XP_qD_q^{-1/2}}{N-1} = D_q^{-1/2}P_q'PDP'P_qD_q^{-1/2} = D_q^{-1/2}D_qD_q^{-1/2} = I$$
Now the components are orthonormal. SPSS prints out W' for the component matrix and C for the component score coefficient matrix.
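As a rough illustration of the rescaled solution, the MATLAB sketch below builds V, W, and C from simulated data and checks that the rescaled components have identity covariance. The data, sizes, and the helper name Z (for the standardized scores X Std^-1) are assumptions made for the example.

```matlab
% Sketch: orthonormal components V = X*Pq*Dq^(-1/2), with W and C as defined above.
rng(2);                                  % fixed seed (arbitrary choice)
N = 150; p = 4; q = 2;                   % hypothetical sizes
X = randn(N, p) * randn(p, p);           % simulated correlated variables
X = X - mean(X, 1);                      % deviation scores

S = X' * X / (N - 1);
S = (S + S') / 2;                        % remove rounding asymmetry before eig
[P, D] = eig(S);
[d, idx] = sort(diag(D), 'descend');     % order eigenvalues d1 >= ... >= dp
Pq  = P(:, idx(1:q));
dq  = d(1:q);
Std = sqrt(diag(diag(S)));               % diagonal matrix of standard deviations

V = X * Pq * diag(1 ./ sqrt(dq));        % V = X Pq Dq^(-1/2)
W = diag(sqrt(dq)) * Pq';                % W = Dq^(1/2) Pq'
C = Std * Pq * diag(1 ./ sqrt(dq));      % component score coefficient matrix

Z  = X / Std;                            % standardized scores, X * inv(Std)
SV = V' * V / (N - 1);                   % should be the q x q identity
```

Under these definitions `Z * C` reproduces `V`, which is why C is applied to standardized scores, and `V * W` equals the rank-q reconstruction `X * Pq * Pq'`.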
Residuals from PC Analysis
$$E = X - QB = X - XP_qP_q' = X(I - P_qP_q') = XP_{p-q}P_{p-q}'$$
$$\frac{E'E}{N-1} = \frac{(P_{p-q}P_{p-q}')'\,X'X\,(P_{p-q}P_{p-q}')}{N-1} = (P_{p-q}P_{p-q}')\,PDP'\,(P_{p-q}P_{p-q}') = P_{p-q}D_{p-q}P_{p-q}'$$
As required,
$$S_X = PDP' = \sum_j d_j P_j P_j' = P_qD_qP_q' + P_{p-q}D_{p-q}P_{p-q}'$$
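The residual decomposition can be checked in the same way. In this MATLAB sketch the names Pr and Dr stand in for the last p-q eigenvectors and eigenvalues; the data and sizes are again simulated assumptions.

```matlab
% Sketch: residual covariance equals Pr*Dr*Pr', and S splits into two parts.
rng(3);                                  % fixed seed (arbitrary choice)
N = 120; p = 5; q = 2;                   % hypothetical sizes
X = randn(N, p) * randn(p, p);           % simulated correlated variables
X = X - mean(X, 1);                      % deviation scores

S = X' * X / (N - 1);
S = (S + S') / 2;                        % remove rounding asymmetry before eig
[P, D] = eig(S);
[d, idx] = sort(diag(D), 'descend');     % order eigenvalues d1 >= ... >= dp
P  = P(:, idx);
Pq = P(:, 1:q);      Dq = diag(d(1:q));      % retained part
Pr = P(:, q+1:p);    Dr = diag(d(q+1:p));    % discarded part

E  = X * (eye(p) - Pq * Pq');            % residuals E = X(I - Pq*Pq')
SE = E' * E / (N - 1);                   % should match Pr * Dr * Pr'
S_rebuilt = Pq * Dq * Pq' + Pr * Dr * Pr';   % should match S
```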
Elements of Factor Analysis

1. Common Factor Model
X is an N x p matrix of scores from N subjects on p variables.
The Common Factor model is based on four assumptions:
a. $X = FA + e$ (Linearity)
Then $E[S_X] = E[X'X/(N-1)]$.
If $C = I$, where $C = E[F'F/(N-1)]$ is the covariance matrix of the factors, then the factors are uncorrelated and $E[S_X] = A'A + U$.
2. Principal Axis Extraction
Extract eigenvalues and eigenvectors from the reduced covariance matrix $E[S_X] - U = PDP'$, where U is the diagonal uniqueness matrix.
If we choose to use q factors, then we assume
$$E[S_X] = (P_qD_q^{1/2})(D_q^{1/2}P_q') + U$$
where $A = D_q^{1/2}P_q'$ and $D_q^{1/2} = \mathrm{Diag}[\sqrt{d_1}, \sqrt{d_2}, \dots, \sqrt{d_q}]$.
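A minimal MATLAB sketch of the extraction step follows. It assumes the uniqueness matrix U is known, which is rarely the case in practice, and it builds a population covariance matrix from a hypothetical loading matrix A0; both are assumptions made only to show the algebra.

```matlab
% Sketch: principal axis extraction from a reduced covariance matrix with known U.
rng(4);                                  % fixed seed (arbitrary choice)
p = 6; q = 2;                            % hypothetical sizes
A0 = randn(q, p);                        % hypothetical q x p loading matrix
U  = diag(0.5 + rand(p, 1));             % hypothetical diagonal uniqueness matrix
Sigma = A0' * A0 + U;                    % plays the role of E[S_X]

R = Sigma - U;                           % reduced covariance matrix
R = (R + R') / 2;                        % remove rounding asymmetry before eig
[P, D] = eig(R);
[d, idx] = sort(diag(D), 'descend');     % order eigenvalues d1 >= ... >= dp
Pq = P(:, idx(1:q));
A  = diag(sqrt(d(1:q))) * Pq';           % A = Dq^(1/2) Pq'

Sigma_fit = A' * A + U;                  % should reproduce Sigma
```

The extracted A generally differs from A0 by an orthonormal transformation, since only A'A is determined by the reduced matrix.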
3. Rotation: Linear Transformation of Factors
T is a q x q full rank matrix used for rotation.
$$E[S_X] = A'A + U \quad \text{(orthogonal solution)}$$
$$= A'(TT^{-1})(T'^{-1}T')A + U = (A'T)(T^{-1}T'^{-1})(T'A) + U = A^{*\prime}CA^{*} + U$$
$A^{*} = T'A$ are the rotated factor loadings.
If T is orthonormal, then $C = T^{-1}T'^{-1} = I$ and $E[S_X] = A^{*\prime}A^{*} + U$.
$$\mathrm{Cov}[X,F] = E\!\left[\frac{X'F}{N-1}\right] = E\!\left[\frac{(FA^{*} + e)'F}{N-1}\right] = E\!\left[\frac{A^{*\prime}F'F + e'F}{N-1}\right] = E\!\left[A^{*\prime}\,\frac{F'F}{N-1}\right] = A^{*\prime}C = A'T(T^{-1}T'^{-1}) = A'T'^{-1}$$
$\mathrm{Cov}(F,X) = T^{-1}A$
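The rotation algebra can be illustrated with two small transformation matrices. In the MATLAB sketch below, the loading matrix A, the rotation angle, and the oblique matrix T2 are arbitrary assumptions chosen for illustration.

```matlab
% Sketch: rotated loadings A* = T'A leave the common part A'A unchanged.
rng(5);                                  % fixed seed (arbitrary choice)
p = 5; q = 2;                            % hypothetical sizes
A = randn(q, p);                         % hypothetical unrotated loadings

% Orthonormal T (a plane rotation): C = I
theta = pi / 6;
T = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Astar  = T' * A;                         % rotated loadings
C_orth = inv(T) * inv(T');               % should be the identity

% Full rank but non-orthonormal T2: C is no longer the identity
T2 = [1 0.4; 0 1];
Astar2 = T2' * A;
C2     = inv(T2) * inv(T2');             % factor covariance implied by T2
common = Astar2' * C2 * Astar2;          % should match A' * A
covXF  = Astar2' * C2;                   % should match A' * inv(T2')
```

With the orthonormal T, `Astar' * Astar` matches `A' * A` directly; with T2, the factor covariance matrix `C2` is needed to recover it.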
References:
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.
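Code for PC analysis in MATLAB (assume X is a deviation-score matrix):
n = size(X, 1);                        % number of observations (rows of X)
S = X'*X/(n-1);                        % S is the covariance matrix
Std = sqrt(diag(diag(S)));             % diagonal matrix of standard deviations
[P, D] = eig(S);                       % P = eigenvectors, D = diagonal matrix of eigenvalues
[d, idx] = sort(diag(D), 'descend');   % eig does not guarantee descending order
P = P(:, idx);                         % reorder eigenvectors to match
D = diag(d);                           % eigenvalues with d1 >= d2 >= ... >= dp
WT = P*sqrt(D);                        % WT = component matrix in the SPSS printout
C = Std*P*inv(sqrt(D));                % C = component score coefficient matrix
% For a q-component solution, keep the first q columns of WT and C.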