The individual approach
$ \DeclareMathOperator{\argmin}{arg\,min} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\tilde{\varepsilon}} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} $
Models and methods
Model for continuous data
A model for continuous data is described by a system:
- \(\begin{align} y_{j} &= f(t_j ; \psi) + \varepsilon_j \quad ; \quad 1\leq j \leq n \\ &= f(t_j ; \psi) + g(t_j ; \psi) \tilde{\varepsilon_j} \end{align}\)
where
- $f$ corresponds to the structural model
- $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$ is the vector of parameters
- $(t_1,t_2,\ldots , t_n)$ is the vector of the observation times
- $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ are the residual errors ($\mathbb{E}(\varepsilon_j) = 0$)
- $g$ indicates the residual error model
- $(\tilde{\varepsilon}_1, \tilde{\varepsilon}_2, \ldots, \tilde{\varepsilon}_n)$ are the normalized residual errors (${\rm Var}(\tilde{\varepsilon}_j) = 1$)
There are several residual error models, each defined by a specific form of $g$:
- constant error model: $y=f+a\tilde{\varepsilon}$, with $g=a$
- proportional error model: $y=f+bf\tilde{\varepsilon}$, with $g=b\, f$
- combined error model 1: $y=f+(a+bf)\tilde{\varepsilon}$, with $g=a+b f$
- combined error model 2: $y=f+\sqrt{a^2+b^2f^2}\tilde{\varepsilon}$, with $g^2=a^2+b^2f^2$
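The four error models listed above can be written as small R functions of the prediction $f$. This is only a minimal sketch: the function names and the numerical values are illustrative assumptions and do not come from the original code.

```r
# Residual error models g, written as functions of the prediction f.
# The function names are hypothetical, chosen for this sketch only.
g_constant     <- function(f, a)    rep(a, length(f))
g_proportional <- function(f, b)    b * f
g_combined1    <- function(f, a, b) a + b * f
g_combined2    <- function(f, a, b) sqrt(a^2 + b^2 * f^2)

# Simulate observations y = f + g * eps for made-up predictions f
set.seed(1)
f   <- c(0.5, 1, 2, 4)
eps <- rnorm(length(f))              # normalized residual errors
y   <- f + g_combined1(f, a = 0.1, b = 0.2) * eps
```

With the combined error model 1, the standard deviation of the residual error grows linearly with the prediction, from 0.2 at $f=0.5$ to 0.9 at $f=4$ in this example.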
Modelling a vector of observations $(y_j)$ requires several tasks. First, we have to estimate the vector of parameters $\psi$ for a given model. Then, we need to select the structural model $f$ and the residual error model $g$ to apply to the observations. Finally, the selected model should be assessed and validated.
Maximum likelihood estimation
The maximum likelihood estimate $\hat{\psi}$ of the parameters maximizes the likelihood $L(\psi ; y_1,y_2,\ldots,y_n)$, where
- \( L(\psi ; y_1,y_2,\ldots,y_n) {\overset{def}{=}} p_Y( y_1,y_2,\ldots,y_n ; \psi) \)
If we assume that $\tilde{\varepsilon}_j \sim_{i.i.d} {\cal N}(0,1)$, then the $y_j$'s are independent and
- \( y_{j} \sim {\cal N}(f(t_j ; \psi) , g(t_j ; \psi)^2) \)
Thus the p.d.f. of $(y_1, y_2, \ldots, y_n)$ is:
- \(\begin{align} p_Y(y_1, y_2, \ldots, y_n ; \psi) &= \prod_{j=1}^n p_{Y_j}(y_j ; \psi) \\ &= \frac{e^{-\frac{1}{2} \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} \right)^2}}{\prod_{j=1}^n \sqrt{2\pi g(t_j ; \psi)^2}} \end{align}\)
Maximizing the likelihood is equivalent to minimizing the deviance (-2 $\times$ log-likelihood) which plays here the role of the objective function:
- \(\begin{align} \hat{\psi} = \argmin_{\psi} \left\{ \sum_{j=1}^n \log(g(t_j ; \psi)^2) + \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi) }\right)^2 \right \} \end{align}\)
and the minimized deviance is therefore
- \(\begin{align} -2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_n) = \sum_{j=1}^n \log(g(t_j ; \hat{\psi})^2) + \sum_{j=1}^n \left(\frac{y_j - f(t_j ; \hat{\psi})}{g(t_j ; \hat{\psi})}\right)^2 +n\log(2\pi) \end{align}\)
This minimization problem usually has no analytical solution for a nonlinear model. A numerical optimization procedure must then be used.
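As an illustration, the deviance can be coded as an R function and minimized numerically with `nlm`. The sketch below makes several assumptions that are not in the original text: a mono-exponential structural model `f_model`, a combined error model $g=a+bf$, and simulated data with made-up parameter values.

```r
# Hypothetical structural model (an assumption for this sketch):
# mono-exponential decay with phi = (A, k)
f_model <- function(t, phi) phi[1] * exp(-phi[2] * t)

# Deviance (-2 x log-likelihood) for a combined error model g = a + b*f,
# with psi = (A, k, a, b); abs() keeps a and b nonnegative during the search
deviance_fn <- function(psi, y, t) {
  f <- f_model(t, psi[1:2])
  g <- abs(psi[3]) + abs(psi[4]) * f
  sum(log(g^2)) + sum(((y - f) / g)^2) + length(y) * log(2 * pi)
}

# Simulated data (made-up parameter values)
set.seed(123)
t <- seq(0.5, 12, by = 0.5)
psi_true <- c(10, 0.3, 0.1, 0.1)
f_true <- f_model(t, psi_true[1:2])
y <- f_true + (psi_true[3] + psi_true[4] * f_true) * rnorm(length(t))

# Numerical minimization of the deviance from a rough initial guess
psi_init <- c(8, 0.2, 0.2, 0.2)
fit <- nlm(deviance_fn, psi_init, y = y, t = t)
psi_hat <- fit$estimate
psi_hat[3:4] <- abs(psi_hat[3:4])   # a and b are only identified up to sign
fit$minimum                         # minimized deviance
```

The result depends on the initial guess: as with any local optimizer, it is prudent to try several starting points.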
Some specific models have simpler solutions. For instance, for a constant error model:
- \( y_{j} = f(t_j ; \phi) + a \, \tilde{\varepsilon}_j\)
we have
- \(\begin{align} \hat{\phi} &= \argmin_{\phi} \sum_{j=1}^n \left( y_j - f(t_j ; \phi)\right)^2 \\ \hat{a}^2 &= \frac{1}{n}\sum_{j=1}^n \left( y_j - f(t_j ; \hat{\phi})\right)^2 \\ -2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_n) &= n \log(\hat{a}^2) + n + n\log(2\pi) \end{align}\)
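The constant error case can be checked numerically: $\hat{\phi}$ is a least-squares estimate, $\hat{a}^2$ is the mean squared residual, and the deviance follows from the closed-form expression. The sketch below assumes a mono-exponential structural model and simulated data (both hypothetical), and uses `nls` for the least-squares step.

```r
# Simulated data under a hypothetical model f(t; phi) = A * exp(-k * t)
set.seed(1)
dat <- data.frame(time = seq(0.5, 12, by = 0.5))
n <- nrow(dat)
dat$conc <- 10 * exp(-0.3 * dat$time) + 0.2 * rnorm(n)

# phi_hat: least squares on the structural model
fit <- nls(conc ~ A * exp(-k * time), data = dat, start = list(A = 8, k = 0.2))
phi_hat <- coef(fit)

# a_hat^2: mean of the squared residuals (divisor n, not n - d)
a2_hat <- mean(residuals(fit)^2)

# Minimized deviance from the closed-form expression
dev_hat <- n * log(a2_hat) + n + n * log(2 * pi)

# Cross-check against the log-likelihood reported by R for the same fit
all.equal(dev_hat, as.numeric(-2 * logLik(fit)))
```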
For a linear model with a constant error model:
- \( y_{j} = F_j \, \phi + a \, \tilde{\varepsilon}_j \quad ; \quad 1\leq j \leq n \)
Let
- \( F = \left(\begin{array}{c} F_1 \\ F_2 \\ \vdots \\F_n \\ \end{array} \right) \quad ; \quad y = \left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\y_n \\ \end{array} \right) \)
Then
- \(\begin{align} \hat{\phi} &= (F^\prime F)^{-1} F^\prime y \\ \hat{a}^2 &= \frac{1}{n}\sum_{j=1}^n \left( y_j - F_j \hat{\phi}\right)^2 \end{align}\)
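The closed-form solution of the linear case is easy to verify in R. The following sketch uses a made-up design matrix `Fmat` (intercept and slope) and simulated data; it compares the explicit formula with the coefficients returned by `lm`.

```r
# Simulated linear model y_j = F_j phi + a * eps_j (made-up values)
set.seed(42)
n <- 20
x <- seq_len(n)
Fmat <- cbind(1, x)                  # n x 2 design matrix, row j is F_j
phi_true <- c(2, 0.5)
y <- as.vector(Fmat %*% phi_true) + 0.3 * rnorm(n)

# phi_hat = (F'F)^{-1} F'y, computed by solving the normal equations
phi_hat <- solve(t(Fmat) %*% Fmat, t(Fmat) %*% y)

# a_hat^2 = mean squared residual
a2_hat <- mean((y - as.vector(Fmat %*% phi_hat))^2)

# Same estimate of phi obtained with lm()
fit <- lm(y ~ x)
all.equal(as.vector(phi_hat), unname(coef(fit)))
```

Solving the normal equations with `solve(A, b)` avoids forming the inverse matrix explicitly, which is usually preferable numerically.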
Computing the Fisher Information Matrix
Let $\psis$ be the true unknown value of $\psi$, and let $\hat{\psi}$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and
- \( I_n(\psis)^{\frac{1}{2}}(\hat{\psi}-\psi{^\star}) \limite{n\to \infty}{} {\mathcal N}(0,{\rm Id}) \)
where
- \( I_n(\psis)= - \partial^2_\psi LL(\psis;y_1,y_2,\ldots,y_n) \)
is the observed Fisher information matrix. Thus, an estimate of the covariance of $\hpsi$ is the inverse of the observed Fisher information matrix:
- \( C(\hpsi) = \left(- \partial^2_\psi LL(\hpsi ; y_1,y_2,\ldots,y_n) \right)^{-1} \)
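In practice, the observed Fisher information matrix can be obtained from the Hessian returned by the optimizer. Since the objective function is the deviance $-2LL$, its Hessian must be divided by 2 before inversion. The sketch below assumes a hypothetical mono-exponential model with a constant error model and simulated data.

```r
# Hypothetical structural model and deviance for a constant error model,
# with psi = (A, k, a)
f_model <- function(t, phi) phi[1] * exp(-phi[2] * t)
deviance_fn <- function(psi, y, t) {
  f <- f_model(t, psi[1:2])
  a <- psi[3]
  length(y) * log(a^2) + sum(((y - f) / a)^2) + length(y) * log(2 * pi)
}

# Simulated data (made-up parameter values)
set.seed(7)
t <- seq(0.5, 12, by = 0.5)
y <- f_model(t, c(10, 0.3)) + 0.2 * rnorm(length(t))

# Minimize the deviance and request the numerical Hessian at the optimum
fit <- nlm(deviance_fn, c(9, 0.25, 0.3), y = y, t = t, hessian = TRUE)

# Hessian of the deviance = 2 x observed Fisher information matrix
I_obs <- fit$hessian / 2
C <- solve(I_obs)          # estimated covariance matrix of psi_hat
se <- sqrt(diag(C))        # estimated standard errors
```

The Hessian computed by `nlm` is a finite-difference approximation, so the resulting standard errors should be read as approximate.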
Deriving confidence intervals for the parameters
Let $\psi_k$ be the $k$th component of $\psi$, $k=1,2,\ldots,d$. $\psi_k$ is estimated by $\hpsi_k$, the $k$th component of $\hpsi$.
$\hpsi_k$ is a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$.
An estimator of its variance is the $k$th element of the diagonal of $C(\hpsi)$:
- \( \widehat{\rm Var}(\hpsi_k) = C_{kk}(\hpsi) \)
We can then derive an estimator of its standard error
- \( \widehat{\rm s.e}(\hpsi_k) = \sqrt{C_{kk}(\hpsi)} \)
and a confidence interval for $\psi_k^\star$ constructed at a confidence level $\alpha$:
- \( {\rm CI}(\psi_k) = [\hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1-\alpha)/2) , \hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1+\alpha)/2)] \)
where $q(u)$ is the quantile of order $u$ of a ${\cal N}(0,1)$ distribution.
Remark:
- The normal distribution for $(\hpsi_k - \psi_k^\star)/\widehat{\rm s.e}(\hpsi_k)$ is a "good" approximation only when the number of observations $n$ is large.
- A better approximation should be used for small $n$.
- In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated, up to a scaling factor, by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, where $d_\phi$ is the size of $\phi$.
- The quantiles of the normal distribution can then be replaced by the quantiles of a $t$-distribution with $(n-d_\phi)$ df.
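Both versions of the confidence interval can be sketched in a few lines of R. The helper names `ci_normal` and `ci_student`, as well as the estimates and standard errors used below, are hypothetical.

```r
# Confidence interval based on normal quantiles (large n)
ci_normal <- function(est, se, level = 0.90) {
  cbind(lower = est + se * qnorm((1 - level) / 2),
        upper = est + se * qnorm((1 + level) / 2))
}

# Confidence interval based on Student t quantiles (small n)
ci_student <- function(est, se, df, level = 0.90) {
  cbind(lower = est + se * qt((1 - level) / 2, df),
        upper = est + se * qt((1 + level) / 2, df))
}

# Made-up estimates and standard errors, for illustration only
est <- c(p1 = 0.8, p2 = 6.2)
se  <- c(0.1, 0.5)
ci_normal(est, se)
ci_student(est, se, df = 9)
```

For a given level, the $t$-based interval is always wider than the normal one, and the difference vanishes as the number of degrees of freedom increases.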
Deriving confidence intervals for the predictions
The structural model (or regression model) can be estimated for any $t$ by
- \( \hat{f}(t,\phi) = f(t ; \hphi) \)
It is then possible for any $t$ to derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hphi$.
Indeed, as a first approximation, we have:
- \( f(t ; \hphi) \simeq f(t ; \phis) + \nabla f (t,\phis) (\hphi - \phis) \)
where $\nabla f(t,\phis)$ is the gradient of $f$, i.e. the vector of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$, evaluated at $\phi=\phis$.
$\nabla f(t,\phis)$ can be estimated by $\nabla f(t,\hphi)$. We can then estimate the variance of $f(t ; \hphi)$ by
- \( \widehat{\rm Var}\left(f(t ; \hphi)\right) \simeq \nabla f (t,\hphi)\widehat{\rm Var}(\hphi) \left(\nabla f (t,\hphi) \right)^\prime \)
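This linearization (often called the delta method) is straightforward to implement with a numerical gradient. The sketch below assumes a hypothetical mono-exponential model; the estimate `phi_hat` and its covariance matrix `C_phi` are made-up values standing in for the output of a previous fit.

```r
# Hypothetical structural model with phi = (A, k)
f_model <- function(t, phi) phi[1] * exp(-phi[2] * t)

# Central finite-difference gradient of f(t; phi) with respect to phi
num_grad <- function(t, phi, h = 1e-6) {
  sapply(seq_along(phi), function(k) {
    e <- replace(numeric(length(phi)), k, h)
    (f_model(t, phi + e) - f_model(t, phi - e)) / (2 * h)
  })
}

# Made-up estimate and covariance matrix (illustration only)
phi_hat <- c(10, 0.3)
C_phi <- matrix(c(0.04, 0.001, 0.001, 0.0001), nrow = 2)

# Variance of the prediction at time t0 and 90% confidence interval
t0 <- 2
grad <- num_grad(t0, phi_hat)
var_f <- as.numeric(t(grad) %*% C_phi %*% grad)
ci_f <- f_model(t0, phi_hat) + sqrt(var_f) * qnorm(c(0.05, 0.95))
```

Repeating the computation over a grid of times gives a pointwise confidence band around the predicted curve.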
Estimating confidence intervals by Monte Carlo
The Monte Carlo method estimates a distribution by simulation and does not require any approximation of the model.
We can accurately estimate the distribution of $\hpsi$ by simulating a large number $M$ of independent vectors of observations $(y^{(m)},1\leq m \leq M)$ using the estimated distribution of the observed vector $y=(y_j)$:
- \( y^{(m)}_j = f(t_j ;\hpsi) + g(t_j ;\hpsi)\teps^{(m)}_j \)
where $\teps^{(m)}_j \sim_{i.i.d.} {\cal N}(0,1)$.
We can then compute the MLE of $\psi$ for each of these replicates to derive an empirical estimation of the distribution of $\hpsi$. In particular, we can estimate any quantile of the distribution of $\hpsi_k$ using the empirical quantiles of $(\hpsi^{(m)}_k ,1\leq m \leq M)$.
Any confidence interval for $\psi_k$ (resp. $f(t ; \psi)$) can then be approximated by a prediction interval for $\hpsi_k$ (resp. $f(t ; \hpsi)$).
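The Monte Carlo procedure described above can be sketched as follows. The structural model, the constant error model and the value of `psi_hat` (which stands in for an MLE obtained beforehand) are assumptions made for this example; $M=200$ replicates are used to keep the run time short.

```r
# Hypothetical structural model and deviance (constant error model),
# psi = (A, k, a); the constant n*log(2*pi) is omitted since it does not
# change the location of the minimum
f_model <- function(t, phi) phi[1] * exp(-phi[2] * t)
deviance_fn <- function(psi, y, t) {
  f <- f_model(t, psi[1:2])
  length(y) * log(psi[3]^2) + sum(((y - f) / psi[3])^2)
}

set.seed(2024)
t <- seq(0.5, 12, by = 0.5)
psi_hat <- c(10, 0.3, 0.2)      # made-up value standing in for the MLE
M <- 200                        # number of simulated replicates

PSI <- matrix(NA_real_, nrow = M, ncol = length(psi_hat))
for (m in seq_len(M)) {
  # Simulate a vector of observations under the estimated model
  y_m <- f_model(t, psi_hat[1:2]) + psi_hat[3] * rnorm(length(t))
  # Re-estimate psi on the simulated data, starting from psi_hat
  est <- nlm(deviance_fn, psi_hat, y = y_m, t = t)$estimate
  est[3] <- abs(est[3])         # a is only identified up to sign
  PSI[m, ] <- est
}

# Empirical 5% and 95% quantiles give a 90% interval for each parameter
ci_mc <- apply(PSI, 2, quantile, probs = c(0.05, 0.95))
```

Each column of `PSI` holds the $M$ simulated estimates of one component of $\psi$, so any other summary (mean, standard deviation, other quantiles) can be computed the same way.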
A PK example
Let us consider a dose $D=50$ mg of a drug that is administered orally to a patient at time 0. The concentrations of the drug are measured at times (0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 8.0, 10.0, 12.0, 16.0, 20.0, 24.0) hours.
Two structural models, $f_1$ and $f_2$, are considered.
We then define models ${\cal M}_1$ and ${\cal M}_2$ by assuming constant error models:
- \(\begin{align} {\cal M}_1 : \quad y_j &= f_1(t_j ; \phi_1) + a_1\teps_j \\ {\cal M}_2 : \quad y_j &= f_2(t_j ; \phi_2) + a_2\teps_j \end{align}\)
We can fit these two models to our data by computing $\hpsi_1=(\hphi_1,\hat{a}_1)$ and $\hpsi_2=(\hphi_2,\hat{a}_2)$, the MLEs of $\psi$ under models ${\cal M}_1$ and ${\cal M}_2$.
The estimated parameters $\hphi_1$ and $\hphi_2$ can then be used for computing predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models and for any time $t$.
We clearly see here that a much better fit is obtained with model ${\cal M}_2$, i.e. assuming a zero order absorption process.
tc=seq(from=0,to=25,by=0.1)
fc1=predc1(tc,phi1)
fc2=predc2(tc,phi2)
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",...
Another useful goodness of fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hpsi)$ given by the model.
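# fit model 1 by minimizing its objective function
pk.nlm1=nlm(fmin1,c(0.3,6,0.2,1),y,t,hessian=TRUE)
# predictions at the observation times under both models
f1=predc1(t,phi1)
f2=predc2(t,phi2)
# observations versus predictions, with the identity line
par(mfrow= c(1,2))
plot(f1,y,xlim=c(0,4),ylim=c(0,4),main="model 1")
abline(a=0,b=1,lty=1)
plot(f2,y,xlim=c(0,4),ylim=c(0,4),main="model 2")
abline(a=0,b=1,lty=1)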
The Bayesian information criterion (BIC) confirms that the zero-order absorption process should be selected.
bic2=deviance2+log(n)*length(psi2)
> cat(" bic1 =",bic1,"\n\n")
bic1 = 24.10972
> cat(" bic2 =",bic2,"\n\n")
bic2 = 11.24769
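The criterion used here is the deviance penalized by $\log(n)$ times the number of parameters (the structural parameters plus the error parameter). As a hedged illustration on simulated data, the sketch below computes it by hand for two hypothetical candidate models and checks the result against R's built-in `BIC` function; the helper `bic_manual` and both models are assumptions for this example.

```r
# Simulated data from a hypothetical mono-exponential model
set.seed(3)
dat <- data.frame(time = seq(0.5, 12, by = 0.5))
dat$conc <- 10 * exp(-0.3 * dat$time) + 0.2 * rnorm(nrow(dat))

# BIC = deviance + log(n) * number of parameters
bic_manual <- function(fit) {
  dev <- as.numeric(-2 * logLik(fit))
  n_par <- length(coef(fit)) + 1      # structural parameters plus a
  dev + log(nobs(fit)) * n_par
}

# Candidate 1: mono-exponential decay; candidate 2: straight line
fit1 <- nls(conc ~ A * exp(-k * time), data = dat, start = list(A = 8, k = 0.2))
fit2 <- lm(conc ~ time, data = dat)

c(bic1 = bic_manual(fit1), bic2 = bic_manual(fit2))
```

The model with the smallest BIC is preferred; here the data were simulated from the first candidate, so it should be the one selected.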
So far we have only considered constant error models. Nevertheless, the "observations vs predictions" plot seems to indicate that the amplitude of the residual errors increases with the prediction. We will therefore consider four different residual error models associated with the same structural model $f_2$.
- ${\cal M}_2$, constant error model: $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
- ${\cal M}_3$, proportional error model: $y_j=f_2(t_j;\phi_3)+b_3f_2(t_j;\phi_3)\teps_j$
- ${\cal M}_4$, combined error model: $y_j=f_2(t_j;\phi_4)+(a_4+b_4f_2(t_j;\phi_4))\teps_j$
- ${\cal M}_5$, exponential error model: $\log(y_j)=\log(f_2(t_j;\phi_5)) + a_5\teps_j$
fmin3=function(x,y,t) {
}
fmin4=function(x,y,t) {
}
fmin5=function(x,y,t) {
}
We can now compute $\hpsi_3=(\hphi_3,\hat{b}_3)$, $\hpsi_4=(\hphi_4,\hat{a}_4,\hat{b}_4)$ and $\hpsi_5=(\hphi_5,\hat{a}_5)$, the MLEs of $\psi$ under models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$.
#-------- MLE --------#
pk.nlm3=nlm(fmin3,c(phi2,0.1),y,t,hessian=TRUE)
psi4[c(4,5)]=abs(psi4[c(4,5)])
pk.nlm5=nlm(fmin5, c(phi2,0.1),y,t,hessian=TRUE)
bic3=deviance3 + log(n)*length(psi3)
bic5=deviance5 + log(n)*length(psi5)
> cat(" bic3 =",bic3,"\n\n")
bic3 = 3.443607
> cat(" bic4 =",bic4,"\n\n")
bic4 = 3.475841
> cat(" bic5 =",bic5,"\n\n")
bic5 = 4.108521
There is no obvious difference between these three error models. We will use the combined error model ${\cal M}_4$ in the following.
A 90% confidence interval for $\psi_4$ can be derived from the Hessian matrix of the objective function.
alpha=0.9
df=n-length(phi4)
ci4s=matrix(nrow=dpsi,ncol=2)
for (k in seq(1,dpsi)) {
}
m4s=colMeans(PSI)
sd4s=apply(PSI,2,sd)
cifc4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
}
ciy4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
}
par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col="blue")
lines(tc,fc4, type="l", col="red", lwd=2)
lines(tc,cifc4s[,1], type="l", col="red", lwd=1, lty=3)
lines(tc,cifc4s[,2], type="l", col="red",lwd=1,lty=3)
lines(tc,ciy4s[,1], type="l", col="green",lwd=1,lty=3)