Fits a generalized linear mixed model (S.J. Welham).

PRINT = *string token*

What output to display (model, monitoring,
components, vcovariance, means, backmeans, effects, waldtests); default mode,
moni, comp, vcov, mean, back, effe

DISTRIBUTION = *string token*

Error distribution (binomial, poisson,
normal, gamma, negativebinomial); default bino

LINK = *string token*

Link function (identity, logarithm, logit,
reciprocal, probit, complementaryloglog, logratio); default * gives the
canonical link

DISPERSION = *scalar*

Value at which to fix the residual variance, if missing
the variance is estimated; default 1

RANDOM = *formula*

Random model *excluding* bottom stratum; this must be
set

FIXED = *formula*

Fixed model; default *

ABSORB = *factor*

Absorbing factor to be used at the REML step of the
iterations

CONSTANT = *string token*

Whether to estimate or omit constant term in fixed
model (omit, estimate); default esti

FACTORIAL = *scalar*

Limit on number of factors/covariates in a model term;
default 3

PTERMS = *formula*

Formula specifying fixed terms for which means or back-transformed means are to be printed; default * prints all the fixed model terms

PSE = *string token*

Standard errors to print with tables of means
(differences, estimates, alldifferences, allestimates, vcovariance); default
diff, vcov

MVINCLUDE = *string tokens*

Whether to include units with missing values in
the explanatory factors and variates and/or the y-variates (explanatory, yvariate);
default * i.e. omit units with missing values in either explanatory factors or variates or
y-variates

MAXCYCLE = *scalar*

Maximum number of iterations of the GLMM algorithm;
default 20

TOLERANCE = *scalar*

Convergence criterion for iterative procedure; default
0.0001

FMETHOD = *string token*

Specifies fitting method (all, fixed): all indicates
the method of Schall (1991); fixed indicates the marginal method of Breslow &
Clayton (1993) ; default all

OFFSET = *variate*

Variate holding values to be used as an offset on the linear
predictor scale; default *

CADJUST = *string token*

AGGREGATION = *scalar*

Fixed parameter for negative binomial distribution
(parameter *k* as in variance function var = mean + mean^{2}/*k*); default 1

KLOGRATIO = *scalar*

Parameter *k* for logratio link, in form log(mean / (mean +
*k*)); default as set in AGGREGATION option

OWNDIST = *text*

For non-standard distributions only: text specifying the
variance function to be used with dummy variable DUM, e.g. OWNDIST='DUM'

OWNLINK = *text*

For non-standard link functions only: text specifying 3
functions using dummy variable DUM - the link function, its inverse and its derivative,
e.g. OWNLINK = !T('log(DUM)','exp(DUM)','1/DUM')

CDEFINITIONS = *text*

Statements to execute to define correlation models;
default * i.e. none

CVECTORS = *pointer*

Data structures involved in the correlation models

WORKSPACE = *scalar*

Number of blocks of internal memory to be set up for
use by the REML algorithm; default 1

Y = *variates*

Dependent variates

NBINOMIAL = *scalars* or *variates*

Number of binomial trials for each unit (must
be set if DISTRIBUTION=binomial)

FITTEDVALUES = *variates*

Variates to save fitted values

COMPONENTS = *variates*

Variate to save estimated variance components

VCOVARIANCE = *symmetric matrices*

Variance-covariance matrix for the variance
components

MEANS = *pointers*

Pointer to save tables of means for each Y variate

VARMEANS = *pointers*

Pointer to save covariance matrices of tables of means for
each Y variate

BACKMEANS = *pointers*

Pointer to save tables of back-transformed means for
each Y variate

ITERATIVEWEIGHTS = *variates*

Saves the iterative weights from the generalized
linear model fitting

INITIALFITTEDVALUES = *variates*

Defines initial values for the fitted values; if
unset, these are formed automatically

Saves details of the REML
analysis used to fit the model

Procedure GLMM estimates the parameters of a generalized linear mixed model using either the method of Schall (1991) or the marginal method of Breslow & Clayton (1993), as described in the Methods Section.

The procedure assumes a generalized linear mixed model, that is a generalized linear model with both fixed and Normally-distributed random effects on the scale of the linear predictor. The procedure estimates the fixed effects together with the variance components associated with the random effects.

The DISTRIBUTION option sets the error distribution; the default is to assume a binomial distribution but the poisson, gamma and negative-binomial distributions are also available. Other distributions can be used via the OWNDIST option; this should be set to a text containing the formula for calculating the variance function for the required distribution, in terms of dummy variable DUM. The link can be set using the LINK option; the default takes the canonical link. Identity, logarithm, logit, reciprocal, probit, complementaryloglog or logratio link functions are also provided, and alternative link functions can be used via the OWNLINK option. In this case, OWNLINK must be set to a text with three values containing formulae (in terms of dummy variable DUM) for calculating the link function, its inverse and its first derivative. For example, instead of specifying a Poisson distribution with log link, the OWNDIST and OWNLINK options could be set as

OWNDIST='DUM'; OWNLINK=!T(LOG(DUM),EXP(DUM),'1/DUM')

Where necessary, these expressions should be constructed so that invalid results (eg. divide by zero or log(zero)) are avoided.

The AGGREGATION option supplies the aggregation parameter for the negative-binomial
distribution; default 1. The KLOGRATIO option supplies the parameter *k* to be used in the
logratio link, and takes its default from AGGREGATION.

The dispersion parameter is assumed to be 1 unless otherwise specified by the DISPERSION option. Setting DISPERSION=* requests that the dispersion parameter be estimated.

The fixed and random models are specified by the FIXED and RANDOM options. The
number of factors in the terms of the fixed model can be limited using the FACTORIAL
option. The ABSORB option can specify an absorbing factor for use in the REML
steps of the GLMM algorithm. However, if the absorbing factor appears in any of the terms
of the FIXED model, no estimates of error will be available for these terms (see the *Guide
to GenStat*, Part 2, Sections 5.3.3 and 5.3.7). By default, a constant term is included in the
model; this can be suppressed by setting option CONSTANT=omit. An offset can be
included in the linear predictor by setting option OFFSET. By default any covariates are
centred for the REML fitting by subtracting their means, weighted according
to the iterative weights of the generalized linear model. You can save the iterative weights
using the ITERATIVEWEIGHTS parameter, or you can set option CADJUST=none to request
that the uncentred covariates are used instead.

It is also possible to define correlation models on the random terms, although the results should be used with caution as their properties are not yet well understood. To do this, you should set the CDEFINITIONS option to a text containing the GenStat statements required to define the models (e.g. using VSTRUCTURE). You also need to set the CVECTORS option to a pointer containing the data structures involved in the statements. Then, in the statements themselves, you should refer to each of these as CVECTORS[n], where n is the position of the relevant data structure in the pointer. For example:

TEXT cdef; VALUE=\

'VSTRUCTURE [CVECTORS[1].CVECTORS[2]] ar,ar; FACTOR=CVECTORS[1,2]; ORDER=1'

GLMM [DISTRIBUTION=gamma; LINK=log; FIXED=variety;\

RANDOM=fieldrow*fieldcolumn; CDEFINITION=cdef;\

CVECTORS=!p(fieldrow,fieldcolumn)] yield

The MVINCLUDE option allows the inclusion of units with missing values, as in the REML directive. By default, units where there is a missing value in the y-variate or in any of the factors or variates in the model terms are excluded. The setting explanatory allows units with missing values in factors or variates in the model to be included. For missing covariate values, this is equivalent to substituting the mean value. The setting yvariate includes units with missing values in the y-variate. This can be useful to retain the balanced structure of the data for use with direct product covariance matrices (see VSTRUCTURE), or to produce predictions of data values for given values of explanatory factors and/or variates.

The FMETHOD option specifies the method used to form the fitted values and therefore determines the fitting method to be used. The default setting all specifies that both fixed and random terms should be used to form fitted values which gives the method of Schall (1991); setting fixed indicates that only fixed terms are used to form fitted values which gives the marginal method of Breslow & Clayton (1993).

Output is controlled by options PRINT, PTERMS and PSE. PRINT allows printing of the current model, monitoring information, estimates of the variance components, their variance-covariance matrix, Wald tests, tables of means on the scale of the linear predictor (with standard errors), tables of back-transformed means (i.e. on the original scale) and tables of effects. If there is an offset, the predicted means are for an offset value of zero. Option PTERMS can select which tables of fixed effect means are to be printed; by default, tables of means are produced for all the terms in the fixed model. Option PSE controls the standard errors that are printed with tables of means: differences produces a summary of standard errors of differences between means; estimates produces a summary of standard errors of the means; allestimates produces a standard error for every mean; vcovariance produces the variance-covariance matrix for the table; alldifferences produces the full matrix of standard errors of differences between means. Setting PSE=* alone suppresses printing of error estimates. More than one setting can be used and, by default, a summary of seds and the variance covariance matrix are printed for each table.

Some control over the iterative GLMM algorithm is provided by option MAXCYCLE which sets the maximum number of iterations (default 20), and by option TOLERANCE which specifies the criterion for determining convergence of the algorithm (default 0.0001). Convergence is judged to have been attained once the maximum change in the ratio (variance component)/(residual variance) and the change in the residual variance are less than the specified TOLERANCE.

The dependent variate is specified using the Y parameter. The NBINOMIAL parameter must be set when DISTRIBUTION=binomial to specify the total number of trials on each unit, as a variate if the number varies from unit to unit or as a scalar if it is constant over all the units.

The other parameters are used to save results. The variance components and residual variance can be saved in a variate using parameter VCOMPONENTS, with their variance-covariance matrix stored in a symmetric matrix specified by parameter VCOVARIANCE. The tables of means to be saved are determined by the setting of PTERMS. The tables are stored in a pointer specified by parameter MEANS, in the order in which they appear in the FIXED model. Their variance matrices and tables of back-transformed means are stored similarly in pointers specified by parameters VARMEANS and BACKMEANS.

VDISPLAY and VKEEP can be used after procedure GLMM to redisplay or store other results from the internal REML estimation. You can use the SAVE parameter to save the associated REML save structure, so that the information will still be available if REML is used for another analysis in the interim.

Options: PRINT, DISTRIBUTION, LINK, DISPERSION, RANDOM, FIXED, ABSORB, CONSTANT, FACTORIAL, PTERMS, PSE, MVINCLUDE, MAXCYCLE, TOLERANCE, FMETHOD, OFFSET, CADJUST, AGGREGATION, KLOGRATIO, OWNDIST, OWNLINK, CDEFINITIONS, CVECTORS, WORKSPACE.

Parameters: Y, NBINOMIAL, FITTEDVALUES, COMPONENTS, VCOVARIANCE, MEANS, VARMEANS, BACKMEANS, ITERATIVEWEIGHTS, INITIALFITTEDVALUES, SAVE.

GLMM estimates the parameters of the Generalized Linear Mixed Model using either the method of Schall (1991) or the marginal method of Breslow & Clayton (1993). The method used is determined by the setting of option FMETHOD.

The data *y* arises from some specified distribution with variance function *sV* and
expected value μ. The link function g (with inverse h) is such that

g(μ) = η = *X a* + *Z b*

where *X* is the design matrix for the vector *a* of fixed effects and *Z* is the design matrix for
the vector *b* of random effects. The random effects *b* can be attributed to *c* random factors
which are assumed to have zero mean and to be uncorrelated with each other and with
*e*:

Cov(*b*) = *D* = Diag{ σ_{1}^{2} *I*_{1} ... σ_{c}^{2} *I _{c}* }

The method used by Schall (1991) develops the algorithm by analogy with the algorithm
for estimating conventional generalized linear models. The link function applied to the data
is linearized about μ to give the adjusted dependent variate *z*,

*z* = *X a* + *Z b* + *e* g′(μ)

where e=*y*-μ and g′ = dg/dμ.

Then

E(*z*) = *X a*; Cov(*b*) = *D*;

Cov(*e* g′(μ)) = *sV*(μ) × (dη/dμ) × (dη/dμ) = *s* × W(μ)^{-1}

where *s* is the dispersion parameter. Hence

Cov(*z*) = *s* × W(μ)^{-1} + *Z D Z*′.

This has the same form as the general linear mixed model, and the fixed effects and
variance components can be estimated by REML with (iterative) weights *W*.

This leads to the following algorithm:

Step 1) Using initial estimates of the variance components and of μ, calculate the adjusted
variate *z* and weights *W*.

Step 2) Get new estimates of the variance components and of μ by REML on adjusted
variate *z* with weights *W*.

Step 3) Convergence in estimates ⇒ exit algorithm.

Step 4) Use new estimates to update adjusted variate *z* and weights vector *W*.

Step 5) Go to Step 2.

The marginal model used by Breslow and Clayton is derived from a first order
approximation (linearisation about *Xa*) to give

*y* ∼ h(*Xa*) + h′(*Xa*)*Zb* + *e*

where ∼ indicates approximation, h is the inverse of the link fuction g and *e* is *y*-μ. They
then work in terms of the marginal mean, *M*=h(*Xa*). Quasi-likelihood estimation leads to
an algorithm similar to the one above, but the working variate becomes

*z* = *Xa* + (*y*-*M*)g′(*M*) = *Xa* + *E*g′(*M*)

where *E*=*y*-*M*. The working variate *z* then has variance

Cov(*z*) = *s* × W(*M*)^{-1} + *Z D Z*′.

The same algorithm is used to fit the model, replacing μ by *M* and *e* by *E*.

The only difference between the two algorithms is then in the method used to form the
mean μ or *M* and the "error" variate *e* or *E*. The option RMETHOD of REML
controls the method of forming fitted values after REML estimation (i.e. including just fixed
terms, or all terms except the residual) and this option is used inside the procedure to
determine which of the models is fitted.

Initial values for the variance components are calculated by REML estimation using the fixed and random models on the data transformed by the link function. Initial values for the fixed effects are calculated by fitting the fixed model only to a generalized linear model with the specified link and error distribution. The WORKSPACE option specifies the number of blocks of internal memory to be set up for use by the REML algorithm; see the REML directive for more details.

If the Y-variate is restricted, only the units not excluded by the restriction will be analysed.

Breslow, N.E. & Clayton, D.G. (1993). Approximate inference in generalized linear mixed
models. *Journal of the American Statistical Association*, 88, 421, 9-25.

McCullagh, P. & Nelder, J.A. (1989). *Generalized Linear Models (second edition)*. Chapman
& Hall, London.

Schall, R. (1991) Estimation in generalized linear models with random effects. *Biometrika*,
78, 719-727.

Commands for: Regression analysis.

CAPTION 'GLMM example',\ !t('Data from McCullagh & Nelder (1989, Table 14.4),',\ 'also see Schall (1991).'); STYLE=meta,plain FACTOR [NVALUES=120; LEVELS=20] Female, Male & [LEVELS=4; LABELS=!t(RR,RW,WR,WW)] Cross VARIATE [NVALUES=120] Mate1 READ Cross,Male,Female; FREPRESENTATION=labels,2(levels) RR 1 1 RW 14 1 RR 5 1 RW 11 1 RR 4 1 RW 15 1 RR 5 2 RW 15 2 RR 3 2 RW 13 2 RR 1 2 RW 12 2 RR 2 3 RW 11 3 RR 1 3 RW 14 3 RR 3 3 RW 13 3 RR 4 4 RW 12 4 RR 2 4 RW 15 4 RR 5 4 RW 14 4 RR 3 5 RW 13 5 RR 4 5 RW 12 5 RR 2 5 RW 11 5 RW 19 6 RR 9 6 RW 20 6 RR 7 6 RW 16 6 RR 8 6 RW 18 7 RR 8 7 RW 19 7 RR 9 7 RW 17 7 RR 6 7 RW 16 8 RR 6 8 RW 17 8 RR 10 8 RW 20 8 RR 9 8 RW 20 9 RR 7 9 RW 18 9 RR 6 9 RW 19 9 RR 10 9 RW 17 10 RR 10 10 RW 16 10 RR 8 10 RW 18 10 RR 7 10 WR 9 11 WW 19 11 WR 7 11 WW 20 11 WR 10 11 WW 18 11 WR 7 12 WW 16 12 WR 9 12 WW 17 12 WR 6 12 WW 20 12 WR 8 13 WW 17 13 WR 6 13 WW 19 13 WR 7 13 WW 16 13 WR 10 14 WW 20 14 WR 8 14 WW 18 14 WR 9 14 WW 19 14 WR 6 15 WW 18 15 WR 10 15 WW 16 15 WR 8 15 WW 17 15 WW 15 16 WR 2 16 WW 13 16 WR 4 16 WW 12 16 WR 1 16 WW 14 17 WR 1 17 WW 15 17 WR 2 17 WW 11 17 WR 5 17 WW 11 18 WR 4 18 WW 12 18 WR 5 18 WW 15 18 WR 3 18 WW 13 19 WR 3 19 WW 11 19 WR 1 19 WW 14 19 WR 4 19 WW 12 20 WR 5 20 WW 14 20 WR 3 20 WW 13 20 WR 2 20: READ Mate1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 1 1 0 : GLMM [DISTRIBUTION=binomial; LINK=logit; FIXED=Cross; RANDOM=Female+Male]\ Mate1; NBINOMIAL=1