Estimation and Statistical Tests of Parameters
I. Objective Functions
Suppose we have a data matrix comprised of K = 100 rows and 2 columns:

    T1    R1
    ...   ...
    Ti    Ri
    ...   ...
    TK    RK

T is a column vector of K time intervals (Ti = the ith time interval). R is a column vector of K retention scores (Ri = the proportion retained at time interval i). N is the sample size for each proportion (e.g., N = 250). We wish to estimate the parameters θ = (b, c, d)' for the retention model

    P(Ti | θ) = b / exp[(c·Ti)^d] + error

This is a nonlinear model because the prediction P is a nonlinear function of the unknown parameters θ = (b, c, d)'. We wish to choose the θ = (b, c, d)' that minimizes lack of fit. The function used to define lack of fit is called the objective function.
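For concreteness, here is a minimal sketch of the prediction equation in Python (NumPy and the function name predict are choices made for this illustration, not part of the notes):

```python
import numpy as np

def predict(theta, T):
    """Retention model: P(Ti | theta) = b / exp[(c*Ti)**d]."""
    b, c, d = theta
    return b / np.exp((c * T) ** d)

# e.g. predict((0.9, 0.5, 0.8), np.array([1.0, 2.0, 4.0]))
```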
There are two common ways to define the objective function:

A. Weighted sum of squared error:

    WSSE(θ) = Σ wi (Ri − Pi)²,  wi = 1/Var(Ri)

For proportions, Var(Ri) = Pi(1 − Pi)/N. For homogeneous normal scores, Var(Ri) = σ².

B. Log likelihood ratio:

    G(θ) = −2 (Lc − Lu)

where Lc is the log likelihood of the constrained model and Lu is the log likelihood of the unconstrained estimate. For proportions,

    Lc = N Σ [Ri ln(Pi) + (1 − Ri) ln(1 − Pi)]
    Lu = N Σ [Ri ln(Ri) + (1 − Ri) ln(1 − Ri)]
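Both objective functions can be written directly from these definitions. The sketch below reuses predict from the sketch above and assumes proportion data with every Ri strictly between 0 and 1 (so the logs are defined):

```python
import numpy as np  # predict() as defined in the sketch above

def wsse(theta, T, R, N):
    """A. Weighted sum of squared error, wi = 1/Var(Ri) = N / [Pi(1 - Pi)]."""
    P = predict(theta, T)
    w = N / (P * (1.0 - P))
    return np.sum(w * (R - P) ** 2)

def g_statistic(theta, T, R, N):
    """B. Log likelihood ratio G = -2(Lc - Lu) for proportions."""
    P = predict(theta, T)
    Lc = N * np.sum(R * np.log(P) + (1.0 - R) * np.log(1.0 - P))
    Lu = N * np.sum(R * np.log(R) + (1.0 - R) * np.log(1.0 - R))
    return -2.0 * (Lc - Lu)
```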
II. Minimization of Objective Functions

Reference: Fletcher, R. (1980) Practical Methods of Optimization, Vol. 1. New York: Wiley.

Newton-Raphson method: Suppose we choose to use G(θ) and assume that it is a "smooth" function of each parameter. Consider the multivariate Taylor series approximation of G near the true (yet unknown) minimum:

    G(θ) ≈ G(θ*) + (θ − θ*)'∇(θ*) + (θ − θ*)' H(θ*) (θ − θ*)/2 + …

where ∇(θ*) = [∂G/∂θi] evaluated at θ* (the gradient, or direction of steepest ascent), and H(θ*) = [∂²G/∂θi∂θj] evaluated at θ* (the Hessian matrix, reflecting covariances among the parameters). Using this quadratic approximation at some point θ*, solve for the minimum:

    ∂G/∂θ = 0  ⇒  ∇(θ*) + H(θ*)(θ − θ*) = 0  ⇒  θ = θ* − H⁻¹(θ*)∇(θ*)

This yields the updating algorithm

    θ = θ* − s H⁻¹(θ*) ∇(θ*)

where s is the step size (determined by a line search method). Inverting H on each iteration is too expensive, so modified Newton-Raphson techniques employ simpler methods to update H⁻¹ on each iteration. This iterative search continues until some criterion is reached, such as the change in parameters falling below some threshold in magnitude. Let θ* denote the final estimate of this search.
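A minimal sketch of this update, with finite-difference derivatives standing in for analytic ones and a fixed step size s standing in for a real line search (both simplifications are mine):

```python
import numpy as np

def numerical_grad(f, theta, h=1e-5):
    """Central-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

def numerical_hess(f, theta, h=1e-4):
    """Finite-difference Hessian of f at theta."""
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        e = np.zeros(k); e[i] = h
        H[:, i] = (numerical_grad(f, theta + e) - numerical_grad(f, theta - e)) / (2 * h)
    return H

def newton_raphson(f, theta0, s=1.0, tol=1e-8, max_iter=100):
    """Iterate theta = theta - s * H^-1 grad until the step is tiny."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(numerical_hess(f, theta), numerical_grad(f, theta))
        theta = theta - s * step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# e.g. theta_star = newton_raphson(lambda th: g_statistic(th, T, R, N),
#                                  theta0=np.array([0.9, 0.5, 0.8]))
```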
Standard Errors and Statistical Tests of Parameters

If H is full rank, then the model is identified. If H is singular, then one or more of the parameters are functionally dependent. Note, however, that H(θ) is computed at one point in the parameter space; full rank at one point does not imply full rank at all points. For a large sample size, the final estimate of H⁻¹(θ*) is used to estimate the variance-covariance matrix of the parameter estimates. Maximum likelihood estimates are consistent, the most efficient (minimum variance) of the consistent estimators, asymptotically unbiased, and asymptotically normally distributed.
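A sketch of the standard-error computation, following the notes' H⁻¹ prescription (numerical_hess and g_statistic are from the sketches above; theta_star, T, R, N are assumed to be in scope):

```python
import numpy as np

H = numerical_hess(lambda th: g_statistic(th, T, R, N), theta_star)
if np.linalg.matrix_rank(H) < len(theta_star):
    print("H is singular at theta_star: parameters are functionally dependent")
cov = np.linalg.inv(H)          # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))      # asymptotic standard errors of b, c, d
```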
Statistical Tests: The log likelihood statistic G is asymptotically (as N → ∞) chi-square distributed. This is used to statistically test the difference between nested models. Suppose we wish to test the null hypothesis H0: b = 1 and d = 1.

Model B: θB = (b, c, d)', producing GB, with qB = 3 free parameters.
Model A: θA = (1, c, 1)', producing GA, with qA = 1 free parameter.

Then the difference GA − GB has a chi-square distribution with (qB − qA) = (3 − 1) = 2 degrees of freedom, assuming that the null hypothesis is true.
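A sketch of this nested-model test, assuming GA and GB have already been computed with g_statistic at the two fitted estimates (SciPy is an added dependency):

```python
from scipy.stats import chi2

def lr_test(G_A, G_B, q_A, q_B):
    """G_A - G_B ~ chi-square with q_B - q_A df under the null hypothesis."""
    diff = G_A - G_B
    df = q_B - q_A
    return diff, df, chi2.sf(diff, df)

# e.g. lr_test(G_A, G_B, q_A=1, q_B=3) tests H0: b = 1 and d = 1
```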
III. Nonlinear Regression Methods

Reference: Gallant, A. R. (1987) Nonlinear Statistical Models. New York: Wiley.

Gauss-Newton method: Suppose we choose WSSE as our objective function. (You can still use the Newton-Raphson search method described above, but the Gauss-Newton method described below is more efficient in this case.) Assume that the predictions P(Ti | θ) are a "smooth" function of each parameter. Consider the multivariate Taylor series approximation of P near the true (yet unknown) minimum:

    P(Ti | θ) ≈ P(Ti | θ*) + Ji(θ*)'(θ − θ*) + …

where J = [∂P(Ti | θ)/∂θj] is the Jacobian matrix of P and Ji is its ith row. Now we use this first-order linear approximation of P as the prediction: J serves as the matrix of predictor values, R − P(θ*) serves as the column of criterion values, and (θ − θ*) serves as the parameters of the linear model. We perform a standard linear least squares regression to yield the linear approximation to the solution:

    (θ − θ*) = (J'WJ)⁻¹ J'W [R − P(θ*)],  where W = diag[wi]

So the updating algorithm becomes

    θ = θ* + s (J'WJ)⁻¹ J'W [R − P(θ*)]

where s is the step size (determined by a line search). This iterative search continues until some criterion is reached, such as the change in parameters falling below some threshold in magnitude. Modified Gauss-Newton methods (e.g., Levenberg-Marquardt) are based on "ridge" adjustments of the diagonal of the (J'WJ) matrix.
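A minimal sketch of the Gauss-Newton iteration with the proportion weights, again using a finite-difference Jacobian and a fixed step size in place of a line search (predict is from the first sketch):

```python
import numpy as np

def jacobian(theta, T, h=1e-6):
    """Finite-difference Jacobian: J[i, j] = dP(Ti | theta) / d theta_j."""
    J = np.zeros((len(T), len(theta)))
    for j in range(len(theta)):
        e = np.zeros(len(theta)); e[j] = h
        J[:, j] = (predict(theta + e, T) - predict(theta - e, T)) / (2 * h)
    return J

def gauss_newton(theta0, T, R, N, s=1.0, tol=1e-8, max_iter=100):
    """Update: theta = theta* + s (J'WJ)^-1 J'W [R - P(theta*)]."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        P = predict(theta, T)
        W = np.diag(N / (P * (1.0 - P)))   # wi = 1/Var(Ri) for proportions
        J = jacobian(theta, T)
        step = np.linalg.solve(J.T @ W @ J, J.T @ W @ (R - P))
        theta = theta + s * step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```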
Standard Errors and Statistical Tests of Parameters

If (J'WJ) is full rank, then the parameters are identified. If (J'WJ) is singular, then one or more of the parameters are functionally dependent. Note, however, that (J'WJ) is computed at one point in the parameter space; full rank at one point does not imply full rank at all points. For a large sample size, the final estimate of (J'WJ)⁻¹ is used to estimate the variance-covariance matrix of the parameters.
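These quantities fall out of the final iteration; a sketch (jacobian and predict from the sketches above; theta_star, T, N are assumed to be in scope):

```python
import numpy as np

P = predict(theta_star, T)
W = np.diag(N / (P * (1.0 - P)))
J = jacobian(theta_star, T)
JWJ = J.T @ W @ J
if np.linalg.matrix_rank(JWJ) < len(theta_star):
    print("(J'WJ) is singular: parameters are functionally dependent")
cov = np.linalg.inv(JWJ)        # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))      # standard errors of b, c, d
```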
Statistical Tests: Tests of parameters can be performed using either the Wald statistic or the F statistic for large sample sizes. For example, consider H0: C'θ = Z, where Z is a vector that represents the null hypothesis, C' is a contrast matrix with c rows, and q is the number of parameters in θ:

    θ = [b c d]'

    C' = [ 1 0 0
           0 0 1 ]

    C'θ = [b d]',  Z = [1 1]'

A test of H0 can be performed using the Wald statistic:

    W² = (C'θ* − Z)' [C'(J'WJ)⁻¹C]⁻¹ (C'θ* − Z)

Asymptotically, W² is chi-square distributed with c = 2 degrees of freedom in this example. For normal scores,

    F = (W²/c) / (WSSE/(NK − q))

provides an F test with dfn = c and dfd = NK − q, where c = 2, q = 3, K = 100, and N = 250 in this example.
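A sketch of both tests under the running example's numbers (the array C below is the notes' C', written as a 2 x 3 matrix; cov is the (J'WJ)⁻¹ estimate from the sketch above):

```python
import numpy as np
from scipy.stats import chi2, f as f_dist

def wald_test(theta_star, cov, C, Z):
    """W2 = (C theta* - Z)' [C cov C']^-1 (C theta* - Z), chi-square with c df."""
    diff = C @ theta_star - Z
    W2 = diff @ np.linalg.solve(C @ cov @ C.T, diff)
    return W2, chi2.sf(W2, C.shape[0])

C = np.array([[1.0, 0.0, 0.0],    # row picking out b
              [0.0, 0.0, 1.0]])   # row picking out d
Z = np.array([1.0, 1.0])          # H0: b = 1 and d = 1
# W2, p = wald_test(theta_star, cov, C, Z)

# F version for normal scores, with c = 2, q = 3, K = 100, N = 250:
# F = (W2 / 2) / (wsse(theta_star, T, R, N) / (250 * 100 - 3))
# p_F = f_dist.sf(F, 2, 250 * 100 - 3)
```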