Estimation and Statistical Tests of Parameters

References

Kelley, C. T. (1999). Iterative Methods for Optimization. Philadelphia: SIAM.

Eliason, S. R. (1993). Maximum Likelihood Estimation: Logic and Practice. Sage.

Burnham, K. P., & Anderson, D. R. (1998). Model Selection and Inference. Springer.

 

I. Objective Functions

Suppose we have a data matrix comprised of K = 100 rows and 2 columns:

    T_1    R_1
    ...    ...
    T_i    R_i
    ...    ...
    T_K    R_K

T is a column vector of K time intervals (T_i = the ith time interval).

R is a column vector of K retention scores (R_i = proportion retained at time interval i).

N = sample size for each proportion (e.g., N = 250).

We wish to estimate the parameters θ = (b, c, d)' of the retention model

R_i = P(T_i | θ) + error_i,    where P(T_i | θ) = b / exp[ (c·T_i)^d ] = b·exp[ −(c·T_i)^d ].

This is a nonlinear model because the prediction P is a nonlinear function of the unknown parameters θ = (b, c, d)'.

We wish to choose θ = (b, c, d)' to minimize lack of fit. The function used to define lack of fit is called the objective function.

 

 

 

There are two common ways to define the objective function:

A. Weighted Sum of Squared Error:

WSSE(θ) = Σ_i w_i (R_i − P_i)²,    where P_i = P(T_i | θ)

w_i = 1/Var(R_i)

For proportions, Var(R_i) = P_i(1 − P_i)/N.

For homogeneous normal scores, Var(R_i) = σ².
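For illustration, here is a minimal Python (NumPy) sketch of the WSSE objective for the Weibull retention model defined above. The data values, starting parameter vector, and function names (retention_prob, wsse) are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def retention_prob(T, theta):
    """Weibull retention model: P(T | theta) = b * exp(-(c*T)**d)."""
    b, c, d = theta
    return b * np.exp(-(c * T) ** d)

def wsse(theta, T, R, N):
    """Weighted sum of squared error with binomial weights w_i = 1 / Var(R_i)."""
    P = retention_prob(T, theta)
    w = N / (P * (1.0 - P))            # w_i = 1 / [P_i (1 - P_i) / N]
    return np.sum(w * (R - P) ** 2)

# Illustrative data: K = 5 time intervals, N = 250 observations per proportion.
T = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
R = np.array([0.82, 0.70, 0.55, 0.40, 0.28])
print(wsse(np.array([0.9, 0.15, 0.7]), T, R, N=250))
```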

B. Log Likelihood Ratio:

G(θ) = −2 ( L_c − L_u )

L_c = log likelihood of the constrained (fitted) model

L_u = log likelihood of the unconstrained (saturated) model

For proportions,

L_c = N Σ_i [ R_i ln(P_i) + (1 − R_i) ln(1 − P_i) ]

L_u = N Σ_i [ R_i ln(R_i) + (1 − R_i) ln(1 − R_i) ]
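A matching sketch of the log likelihood ratio objective for binomial proportions. Again the model function and data are illustrative assumptions; with all R_i and P_i strictly between 0 and 1, the logarithms are well defined.

```python
import numpy as np

def retention_prob(T, theta):
    """Weibull retention model: P(T | theta) = b * exp(-(c*T)**d)."""
    b, c, d = theta
    return b * np.exp(-(c * T) ** d)

def G(theta, T, R, N):
    """Log likelihood ratio G(theta) = -2 (Lc - Lu) for binomial proportions."""
    P = retention_prob(T, theta)
    Lc = N * np.sum(R * np.log(P) + (1 - R) * np.log(1 - P))  # constrained (model) log likelihood
    Lu = N * np.sum(R * np.log(R) + (1 - R) * np.log(1 - R))  # unconstrained (saturated) log likelihood
    return -2.0 * (Lc - Lu)

T = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
R = np.array([0.82, 0.70, 0.55, 0.40, 0.28])
print(G(np.array([0.9, 0.15, 0.7]), T, R, N=250))
```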


II. Minimization of Objective Functions

Reference: Fletcher, R. (1980). Practical Methods of Optimization, Vol. 1. New York: Wiley.

Newton-Raphson Method:

Suppose we choose to use G(θ) and assume that it is a "smooth" function of each parameter.

Consider the multivariate Taylor series approximation of G near the true (yet unknown) minimum θ*:

G(θ) ≈ G(θ*) + (θ − θ*)'∇G(θ*) + (θ − θ*)'H(θ*)(θ − θ*)/2 + …

where ∇G(θ*) = [ ∂G/∂θ_i ] evaluated at θ*

(the gradient, or direction of steepest ascent)

H(θ*) = [ ∂²G/∂θ_i ∂θ_j ] evaluated at θ*

(the Hessian matrix, which reflects the covariances among the parameters)

Using this quadratic approximation at some point θ*, solve for the minimum:

∂G/∂θ = 0  =>  ∇G(θ*) + H(θ*)(θ − θ*) = 0  =>  θ = θ* − H⁻¹(θ*)∇G(θ*)

This yields the updating algorithm

θ_new = θ* − s·H⁻¹(θ*)∇G(θ*)

where s is the step size (determined by a line search method).

Computing H⁻¹ at each iteration is too expensive. Modified Newton-Raphson (quasi-Newton) techniques employ simpler methods to update an approximation to H⁻¹ on each iteration.

This iterative search continues until some criterion is reached such as the change in parameters falling below some threshold in magnitude.

Let θ* denote the final estimate produced by this search.
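A sketch of the Newton-Raphson search, using central finite differences for ∇G and H instead of analytic derivatives, and a fixed step size s in place of a line search; the tolerances and iteration limit are arbitrary illustrations.

```python
import numpy as np

def num_grad(f, theta, h=1e-5):
    """Central-difference approximation to the gradient of a scalar function f."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta)); e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

def num_hess(f, theta, h=1e-4):
    """Finite-difference approximation to the Hessian (differences of gradients)."""
    q = len(theta)
    H = np.zeros((q, q))
    for j in range(q):
        e = np.zeros(q); e[j] = h
        H[:, j] = (num_grad(f, theta + e) - num_grad(f, theta - e)) / (2 * h)
    return (H + H.T) / 2.0            # symmetrize

def newton_raphson(f, theta0, s=1.0, tol=1e-8, max_iter=100):
    """Iterate theta <- theta - s * H^(-1) * grad until the parameter change is small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(num_hess(f, theta), num_grad(f, theta))
        theta_new = theta - s * step
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# Example (uses G, T, R from the earlier sketch):
# theta_star = newton_raphson(lambda th: G(th, T, R, 250), [0.9, 0.15, 0.7])
```

In practice a line search on s and an analytic or quasi-Newton Hessian would make the search faster and more robust.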

 

 

Standard Errors and Statistical Tests of Parameters

If H is full rank, then the model is identified. If H is singular, then one or more of the parameters are functionally dependent. Note, however, that H(θ) is computed at one point in the parameter space; full rank at one point does not imply full rank at all points.

For a large sample size, the final estimate of H⁻¹(θ*) is used to estimate the variance-covariance matrix of the parameter estimates.
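Following the statement above, standard errors are the square roots of the diagonal of H⁻¹(θ*). The Hessian below is a made-up 3 x 3 example, not output from a real fit.

```python
import numpy as np

# Illustrative final Hessian H(theta*) for theta = (b, c, d)' (not from a real fit).
H = np.array([[412.0, -35.0,  18.0],
              [-35.0, 880.0, -62.0],
              [ 18.0, -62.0, 150.0]])

cov = np.linalg.inv(H)            # estimated variance-covariance matrix of (b, c, d)
se = np.sqrt(np.diag(cov))        # standard errors of the parameter estimates
print(se)
```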

Maximum likelihood estimates are consistent, asymptotically efficient (minimum variance among consistent estimators), asymptotically unbiased, and asymptotically normally distributed.

Statistical Tests:

The log likelihood ratio statistic G is asymptotically (as N → ∞) chi-square distributed. It is used to test the difference between nested models.

Suppose we wish to test the null hypothesis:

H0: b = 1 and d = 1.

Model B: θ_B = (b, c, d)', producing G_B, with q_B = 3 free parameters.

Model A: θ_A = (1, c, 1)', producing G_A, with q_A = 1 free parameter.

Then the difference G_A − G_B has a chi-square distribution with q_B − q_A = 3 − 1 = 2 degrees of freedom, assuming that the null hypothesis is true.
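Once G_A and G_B have been minimized (e.g., with the Newton-Raphson sketch above), the nested-model test reduces to a chi-square tail probability. The two G values below are made-up numbers used only to show the computation.

```python
from scipy.stats import chi2

G_B = 112.4   # unconstrained Model B, q_B = 3 parameters (illustrative value)
G_A = 119.1   # constrained Model A with b = d = 1, q_A = 1 parameter (illustrative value)

df = 3 - 1
p_value = chi2.sf(G_A - G_B, df)   # P(chi-square with 2 df > observed difference)
print(f"G_A - G_B = {G_A - G_B:.2f}, df = {df}, p = {p_value:.4f}")
```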

 

 

 

 

III. Nonlinear Regression Methods

Reference: Gallant, A. R. (1987). Nonlinear Statistical Models. New York: Wiley.

Gauss-Newton Method:

Suppose we choose WSSE as our objective function. (You can still use the Newton-Raphson search method described above, but the Gauss-Newton method described below is more efficient in this case.)

Assume that the predictions P(T_i | θ) are a "smooth" function of each parameter.

Consider the multivariate Taylor series approximation of P near the true (yet unknown) minimum θ*:

P(T_i | θ) ≈ P(T_i | θ*) + J_i(θ*)'(θ − θ*) + …

where J = [ ∂P(T_i | θ)/∂θ_j ], evaluated at θ*, is the K x q Jacobian matrix of P, and J_i' is its ith row.

Now we use this first order linear approximation of P as the prediction.

J serves as the matrix of predictor values.

R − P(θ*) serves as the column of criterion values.

(θ − θ*) serves as the parameters of the linear model.

We perform a standard weighted linear least squares regression to yield the linear approximation to the solution:

(θ − θ*) = (J'WJ)⁻¹ J'W [ R − P(θ*) ]

where W = diag[ w_i ].

So the updating algorithm becomes

θ_new = θ* + s·(J'WJ)⁻¹ J'W [ R − P(θ*) ]

s is the step size (determined by a line search).

This iterative search continues until some criterion is reached such as the change in parameters falling below some threshold in magnitude.

Modified Gauss-Newton methods (e.g., Levenberg-Marquardt) are based on "ridge" adjustments of the diagonal of the (J'WJ) matrix.
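A sketch of the weighted Gauss-Newton iteration with a finite-difference Jacobian for the Weibull retention model. The data, starting values, and tolerances are illustrative assumptions; the ridge ("Modified Gauss-Newton") adjustment is omitted, and the weights use the observed proportions as a practical stand-in for the predicted P_i(1 − P_i)/N.

```python
import numpy as np

def retention_prob(T, theta):
    """Weibull retention model: P(T | theta) = b * exp(-(c*T)**d)."""
    b, c, d = theta
    return b * np.exp(-(c * T) ** d)

def jacobian(T, theta, h=1e-6):
    """Finite-difference Jacobian J[i, j] = dP(T_i | theta) / d theta_j."""
    J = np.zeros((len(T), len(theta)))
    for j in range(len(theta)):
        e = np.zeros(len(theta)); e[j] = h
        J[:, j] = (retention_prob(T, theta + e) - retention_prob(T, theta - e)) / (2 * h)
    return J

def gauss_newton(T, R, W, theta0, s=1.0, tol=1e-8, max_iter=200):
    """Iterate theta <- theta + s * (J'WJ)^(-1) J'W (R - P) until the change is small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        P = retention_prob(T, theta)
        J = jacobian(T, theta)
        delta = np.linalg.solve(J.T @ W @ J, J.T @ W @ (R - P))
        theta = theta + s * delta
        if np.max(np.abs(s * delta)) < tol:
            break
    return theta

# Illustrative data and inverse-variance weights built from the observed proportions.
T = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
R = np.array([0.81, 0.72, 0.59, 0.41, 0.22])
N = 250
W = np.diag(N / (R * (1.0 - R)))

theta_star = gauss_newton(T, R, W, theta0=[0.9, 0.12, 0.7])
cov = np.linalg.inv(jacobian(T, theta_star).T @ W @ jacobian(T, theta_star))
print(theta_star, np.sqrt(np.diag(cov)))     # estimates and their standard errors
```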

 

Standard Errors and Statistical Tests of Parameters

If (J'WJ) is full rank, then the parameters are identified. If (J'WJ) is singular, then one or more of the parameters are functionally dependent. Note, however, that (J'WJ) is computed at one point in the parameter space; full rank at one point does not imply full rank at all points.

For a large sample size, the final estimate of (J'WJ)⁻¹ is used to estimate the variance-covariance matrix of the parameter estimates.

Statistical Tests:

Statistical tests of parameters can be performed using either the Wald statistic or the F statistic for large sample sizes.

For example,

H0: C'θ = Z, where Z is a vector of null-hypothesis values, C' is a contrast matrix with c rows, and q is the number of parameters in θ.

θ = [ b c d ]'

C' = [ 1 0 0
       0 0 1 ]

C'θ = [ b d ]'

Z = [ 1 1 ]'

A test of H0 can be performed using the Wald statistic:

W² = (C'θ* − Z)' [ C'(J'WJ)⁻¹C ]⁻¹ (C'θ* − Z)

Asymptotically, W² is chi-square distributed with c = 2 degrees of freedom in this example.

For normal scores,

F = (W²/c) / [ WSSE/(NK − q) ]

provides an F test with dfn = c and dfd = NK − q,

where c = 2, q = 3, K = 100, and N = 250 in this example.
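A sketch of the Wald and F computations for H0: b = 1 and d = 1. The parameter estimates, covariance matrix, and WSSE value below are made-up numbers standing in for θ* and (J'WJ)⁻¹ from a real fit; scipy.stats supplies the chi-square and F reference distributions.

```python
import numpy as np
from scipy.stats import chi2, f

# Made-up final estimates theta* = (b, c, d)' and covariance matrix (J'WJ)^(-1).
theta_star = np.array([0.92, 0.11, 0.85])
cov = np.array([[0.0016, 0.0002, 0.0004],
                [0.0002, 0.0009, 0.0001],
                [0.0004, 0.0001, 0.0025]])

Cp = np.array([[1.0, 0.0, 0.0],           # C' picks out b and d
               [0.0, 0.0, 1.0]])
Z = np.array([1.0, 1.0])                   # H0: b = 1 and d = 1

diff = Cp @ theta_star - Z
W2 = diff @ np.linalg.inv(Cp @ cov @ Cp.T) @ diff    # Wald statistic
p_chi2 = chi2.sf(W2, df=2)                           # asymptotic chi-square test, c = 2 df

# F version for normal scores: c = 2, q = 3, K = 100, N = 250.
c, q, K, N = 2, 3, 100, 250
WSSE = 24150.0                                       # made-up minimized WSSE
F_stat = (W2 / c) / (WSSE / (N * K - q))
p_F = f.sf(F_stat, c, N * K - q)
print(W2, p_chi2, F_stat, p_F)
```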