The individual approach

From Popix
Revision as of 18:15, 8 February 2013 by Admin (talk | contribs)
Jump to navigation Jump to search

$ \DeclareMathOperator{\argmin}{arg\,min} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\tilde{\varepsilon}} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} $


Models and methods

Model for continuous data

A model for continuous data can be described by one of the two equations (according to the definition of a residual error model joint to the model):


\(\begin{align} y_{j} &= f(t_j ; \psi) + \varepsilon_j \quad ; \quad 1\leq j \leq n \\ &= f(t_j ; \psi) + g(t_j ; \psi) \tilde{\varepsilon_j} \end{align}\)


where:


  • $f$ corresponds to the structural model
  • $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$ is the vector of parameters
  • $(t_1,t_2,\ldots , t_n)$ is the vector of the observation times
  • $(\varepsilon_j, \varepsilon_2, \ldots, \varepsilon_n)$ are the residual errors ($\Epsilon({\varepsilon_j}) =0$)
  • $g$ indicates the residual error model
  • $(\tilde{\varepsilon_1}, \tilde{\varepsilon_2}, \ldots, \tilde{\varepsilon_n})$ are the normalized residual errors $(Var(\tilde{\varepsilon_j}) =1)$


There are several residual error models; each model is represented by a specific formula and depends by different parameters. Hereafter there is a list of the main residual error models with relative


- constant error model $y=f+a\tilde{\varepsilon}$, $g=a$
- proportional error model $y=f+bf\tilde{\varepsilon}$, $g=b\, f$
- combined error model 1 $y=f+(a+bf)\tilde{\varepsilon}$, $g=a+b f$
- combined error model 2 $y=f+\sqrt{a^2+b^2f^2}\tilde{\varepsilon}$, $g^2=a^2+b^2f^2$


Modelling a vector of observations $(y_j)$ requires several tasks. First of all we have to estimate the vector of parameters $\psi$ for a given model. Then we need to select the structural model $f$ and the residual error model $g$ to apply to the observations. Finally the selected model should be assessed and validated.


Maximum likelihood estimation

The maximum likelihood estimation of the parameters can be computed using the formula: $\hat{\psi}$ maximizes $L(\psi ; y_1,y_2,\ldots,y_j)$, where


\( L(\psi ; y_1,y_2,\ldots,y_j) {\overset{def}{=}} p_Y( y_1,y_2,\ldots,y_j ; \psi) \)


If we assume that $\bar{\varepsilon_i} \sim_{i.i.d} {\cal N}(0,1)$, then, the $y_i$'s are independent and


\( y_{j} \sim {\cal N}(f(t_j ; \psi) , g(t_j ; \psi)^2) \)


As a consequence, the p.d.f of $(y_1, y_2, \ldots y_n)$ can be computed via the equations:


\(\begin{align} p_Y(y_1, y_2, \ldots y_n ; \psi) &= \prod_{j=1}^n p_{Y_j}(y_j ; \psi) \\ \\ &= \frac{e^{-\frac{1}{2} \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} \right)^2}}{\prod_{j=1}^n \sqrt{2\pi g(t_j ; \psi)}} \end{align}\)


Maximizing the likelihood is equivalent to minimizing the deviance (-2 $\times$ log-likelihood) which plays here the role of the objective function:

\(\begin{align} \hat{\psi} = \argmin_{\psi} \left\{ \sum_{j=1}^n \log(g(t_j ; \psi)^2) + \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi) }\right)^2 \right \} \end{align}\)


and the deviance is therefore


\(\begin{align} -2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_j) = \sum_{j=1}^n \log(g(t_j ; \hat{\psi})^2) + \sum_{j=1}^n \left(\frac{y_j - f(t_j ; \hat{\psi})}{g(t_j ; \hat{\psi})}\right)^2 +n\log(2\pi) \end{align}\)


This minimization problem usually does not have an analytical solution for a non linear model. Some optimization procedures should be used.

Some specific models have specific solutions. For instance, for a constant error model described by the equation

\( y_{j} = f(t_j ; \phi) + a \, \bar{\varepsilon_j}\)


we have

\(\begin{align} \hat{\phi} &= \argmin_{\psi} \sum_{j=1}^n \left( y_j - f(t_j ; \phi)\right)^2 \\ \\ \hat{a}&= \frac{1}{n}\sum_{j=1}^n \left( y_j - f(t_j ; \hat{\phi})\right)^2 \\ \\ -2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_j) &= \sum_{j=1}^n \log(\hat{a}^2) + n +n\log(2\pi) \end{align}\)


In case of a linear error model represented by

\( y_{j} = F_j \, \phi + a \, \bar{\varepsilon_j} \quad ; \quad 1\leq j \leq n \)


Let

\( F = \left(\begin{array}{c} F_1 \\ F_2 \\ \vdots \\F_n \\ \end{array} \right) \quad ; \quad y = \left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\y_n \\ \end{array} \right) \)


Then the solution is:


\(\begin{align} \hat{\phi} &= (F^\prime F)^{-1} F^\prime y \\ \hat{a}&= \frac{1}{n}\sum_{j=1}^n \left( y_j - F \hat{\phi})\right)^2 \end{align}\)





Computing the Fisher Information Matrix

Let $\psis$ be the true unknown value of $\psi$, and let $\hat{\psi}$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and

\( I_n(\psis)^{\frac{1}{2}}(\hat{\psi}-\psi{^\star}) \limite{n\to \infty}{} {\mathcal N}(0,{\rm Id}) \)


where

\( I_n(\psi{^\star})= - \DDt LL(\psis;y_1,y_2,\ldots,y_n) \)


is the observed Fisher information matrix. Thus, an estimate of the covariance of $\hpsi$ is the inverse of the observed Fisher information matrix as expressed by the formula:

\( C(\hpsi) = \left(- \DDt LL(\hpsi ; y_1,y_2,\ldots,y_n) \right)^{-1} \)



Deriving confidence intervals for the parameters

Let $\psi_k$ be the $k$th component of $\psi$, $k=1,2,\ldots,d$. $\psi_k$ is estimated by $\hpsi_k$, the $k$th component of $\hpsi$. $\hpsi_k$ is a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$.

An estimator of its variance is the $k$th element of the diagonal of $C(\hpsi)$:

\( \widehat{\rm Var}(\hpsi_k) = C_{kk}(\hpsi) \)


We can then derive an estimator of its standard error

\( \widehat{\rm s.e}(\hpsi_k) = \sqrt{C_{kk}(\hpsi)} \)

and a confidence interval for $\psi_k^\star$ constructed at a confidence level $\alpha$:

\( {\rm CI}(\psi_k) = [\hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1-\alpha)/2) , \hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1+\alpha)/2)] \)

where $q(\alpha)$ is the quantile of order $\alpha$ of a ${\cal N}(0,1)$ distribution.


Remarks:

The normal distribution for $\hpsi/\widehat{\rm s.e}(\hpsi_k)$ is a "good" approximation only when the number of observations $n$ is large.
Better approximation should be used for small $n$.
In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, :where $d_\phi$ is the size of $\phi$.
The quantiles of the normal distribution can then be replaced by the quantiles of a $t$-distribution with $(n-d_\phi)$ df.



Deriving confidence intervals for the predictions:

The regression model (or structural model) can be predicted for any $t$ by

\( \hat{f}(t,\phi) = f(t ; \hphi) \)


It is then possible for any $t$ to derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hphi$. Indeed, as a first approximation, we have:


\( f(t ; \hphi) \simeq f(t ; \phi) + J_f(\phi) (\hphi - \phi) \)


where $J_f(t,\phi)$ is the Jacobian matrix of $f$, i.e. the matrix of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$ (the $k$th row of $J_f(t,\phi)$ is )


\( f(t ; \hphi) \simeq f(t ; \phis) + \nabla f (t,\phis) (\hphi - \phis) \)


where $\nabla f(t,\phis)$ is the gradient of $f$, i.e. the vector of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$, obtained with $\phi=\phis$.

$\nabla f(t,\phis)$ can be estimated by $\nabla f(t,\hphi)$. We can then estimate the variance of $f(t ; \hphi)$ by


\( \widehat{\rm Var}\left(f(t ; \hphi)\right) \simeq \nabla f (t,\hphi)\widehat{\rm Var}(\hphi) \left(\nabla f (t,\hphi) \right)^\prime \)


Estimating confidence intervals by Monte Carlo:

Estimating any distribution using Monte Carlo method does not require any approximation of the model.

We can easily estimate accurately the distribution of $\hpsi$ by simulating a large number $M$ of independent vectors of observations $(y^{(m)},1\leq m \leq M)$ using the estimated distribution of the observed vector $y=(y_j)$:

\( y^{(m)}_j = f(t_j ;\hpsi) + g(t_j ;\hpsi)\teps^{(m)}_j \)


where $\teps^{(m)}_j \sim_{i.i.d.} {\cal N}(0,1)$.

We can then compute the MLE of $\psi$ for each of these replicates to derive an empirical estimation of the distribution of $\hpsi$. In particular, we can estimate any quantile of the distribution of $\hpsi_k$ using the empirical quantiles of $(\hpsi^{(m)}_k ,1\leq m \leq M)$.

Any confidence interval for $\psi_k$ (resp. $f(t,\psi_k)$) can then be approximated by a prediction interval for $\hpsi_k$ (resp. $f(t,\hpsi_k)$) .




A PK example

Let us consider a dose $D=50$ mg of a drug that is administrated orally to a patient at time 0. The concentrations of the drug are measured at times (0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 8.0, 10.0, 12.0, 16.0, 20.0, 24.0).

Individual1.png

pk1=read.table("example1_data.txt",header=T) 
t=pk1$time  
y=pk1$concentration
plot(t,y,xlab="time(hour)",ylab="concentration(mg/l)",col="blue")   


There are two structural models that we can consider in order to evaluate the effects of such administration:

1. One compartment model, first order absorption and linear elimination


\(\begin{align} \phi_1 &= (k_a, V, k_e) \\ f_1(t ; \phi_1) &= \frac{D\, k_a}{V(k_a-k_e)} \left( e^{-k_e \, t} - e^{-k_a \, t}\right) \end{align} \)


2. One compartment model, zero order absorption and linear elimination


\(\begin{align} \phi_2 &= (T_{k0}, V, k_e) \\ f_2(t ; \phi_2) &= \left\{ \begin{array}{ll} \frac{D}{V \,T_{k0} \, k_e} \left( 1- e^{-k_e \, t} \right) & {\rm if } \ t\leq T_{k0} \\[0.3cm] \frac{D}{V \,T_{k0} \, k_e} \left( 1- e^{-k_e \, T_{k0}} \right)e^{-k_e \, (t- T_{k0})} & {\rm otherwise} \end{array} \right. \end{align}\)
predc1=function(t,x){ 
A=50*x[1]/x[2]/(x[1]-x[3]) 
f=A*(exp(-x[3]*t)-exp(-x[1]*t)) 
}

predc2=function(t,x){ 
A=50/x[1]/x[2]/x[3] 
ff=A*(1-exp(-x[3]*t)) 
ff[t>x[1]]=A*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t>x[1]]-x[1])) 
f=ff 
}



We define then models ${\cal M}_1$ and ${\cal M}_2$ by assuming constant error models:

\(\begin{align} {\cal M}_1 : \quad y_j & = & f_1(t_j ; \phi_1) + a_1\teps_j \\ {\cal M}_2 : \quad y_j & = & f_2(t_j ; \phi_2) + a_2\teps_j \end{align}\)


We can fit these two models to our data by computing $\hpsi_1=(\hphi_1,\hat{a}_1)$ and $\hpsi_2=(\hphi_2,\hat{a}_2)$, the MLEs of $\psi$ under models ${\cal M}_1$ and ${\cal M}_2$.


>>psi1 
[1]    0.3240916 6.001204 0.3239337 0.4366948

>>psi2 
[1]    3.203111 8.999746 0.229977 0.2555242
fmin1=function(x,y,t) 
{f=predc1(t,x) 
g=x[4] 
e=sum( ((y-f)/g)^2 + log(g^2))}

fmin2=function(x,y,t) 
{f=predc2(t,x) 
g=x[4] 
e=sum( ((y-f)/g)^2 + log(g^2))}

pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian="true") 
psi1=pk.nlm1$estimate 

pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian="true") 
psi2=pk.nlm2$estimate


Therefore, the estimated parameters $\hphi_1$ and $\hphi_2$ can be used for computing predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models and for any time $t$. We clearly see here that a much better fit is obtained with model ${\cal M}_2$, i.e. assuming a zero order absorption process.


Individual2.png

tc=seq(from=0,to=25,by=0.1) 
fc1=predc1(tc,phi1) 
fc2=predc2(tc,phi2) 
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",col="blue") 
lines(tc,fc1,type="l",col="green",lwd=2)
lines(tc,fc2,type="l",col="red",lwd=2)
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","first order absorption","zero order absorption"), 
       lty=c(-1,1,1),pch=c(1,-1,-1),lwd=2,col=c("blue","green","red"))


Another useful goodness of fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hpsi)$ given by the model:

Individual3.png

pk.nlm1=nlm(fmin1,c(0.3,6,0.2,1),y,t,hessian="true")
f2=predc2(t,phi2)
par(mfrow= c(1,2))
plot(f1,y,xlim=c(0,4),ylim=c(0,4),main="model 1")
abline(a=0,b=1,lty=1)
plot(f2,y,xlim=c(0,4),ylim=c(0,4),main="model 2")
abline(a=0,b=1,lty=1)


As we can see in the following paragraph, the Bayesian Information Criteria confirm the zero order absorption process should be selected.

>>bic1
[1]    24.10972

>>bic2 
[1]    11.24769
deviance1=pk.nlm1$minimum + n*log(2*pi)
bic1=deviance1+log(n)*length(psi1)
deviance2=pk.nlm2$minimum + n*log(2*pi) 
bic2=deviance2+log(n)*length(psi2)




We have only considered for the moment constant error models. Nevertheless, the graphic "observations vs predictions" seems to indicate that the amplitude of the residual errors increase with the prediction. We will then consider four different residual error models associated with the same structural model $f_2$.


${\cal M}_2$, constant error model: $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
${\cal M}_3$, proportional error model: $y_j=f_2(t_j;\phi_3)+b_3f_2(t_j;\phi_3)\teps_j$
${\cal M}_4$, combined error model: $y_j=f_2(t_j;\phi_4)+(a_4+b_4f_2(t_j;\phi_4))\teps_j$
${\cal M}_5$, exponential error model: $\log(y_j)=\log(f_2(t_j;\phi_5)) + a_5\teps_j$
fmin3=function(x,y,t)
{f=predc2(t,x) 
g=x[4]*f
e=sum( ((y-f)/g)^2 + log(g^2))} 

fmin4=function(x,y,t) 
{f=predc2(t,x) 
g=abs(x[4])+abs(x[5])*f
e=sum( ((y-f)/g)^2 + log(g^2))} 

fmin5=function(x,y,t)
{f=predc2(t,x) 
g=x[4]
e=sum( ((log(y)-log(f))/g)^2 + log(g^2))}


We can now compute $\hpsi_3=(\hphi_3,\hat{b}_3)$, $\hpsi_4=(\hphi_4,\hat{a}_4,,\hat{b}_4)$ and $\hpsi_5=(\hphi_5,\hat{a}_5)$, the MLEs of $\psi$ under models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$.

>>psi3
[1]    2.642409 11.44113 0.1838779 0.2189221

>>psi4
[1]    2.890066 10.16836 0.2068221 0.02741416 0.1456332

>>psi5
[1]    2.710984 11.2744 0.188901 0.2310001
pk.nlm3=nlm(fmin3,c(phi2,0.1),y,t,hessian="true") 
psi3=pk.nlm3$estimate
pk.nlm4=nlm(fmin4,c(phi2,1,0.1),y,t,hessian="true")
psi4=pk.nlm4$estimate
psi4[c(4,5)]=abs(psi4[c(4,5)])
pk.nlm5=nlm(fmin5, c(phi2,0.1),y,t,hessian="true")
psi5=pk.nlm5$estimate

As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$ are very similar

Individual4.png

tc=seq(from=0,to=25,by=0.1) 
fc1=predc1(tc,phi1)
fc2=predc2(tc,phi2) 
plot(t,y,ylim=c(0,4.1),xlab="time (hour)", 
     ylab="concentration (mg/l)",col = "blue") 
lines(tc,fc1, type = "l", col = "green", lwd=2)
lines(tc,fc2, type = "l", col = "red", lwd=2)
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","first order absorption","zero order absorption"), 
       lty=c(-1,1,1), pch=c(1,-1,-1),lwd=2, col=c("blue","green","red"))

BIC confirms that a residual error model including a proportional component should be selected.

>> bic3
[1]    3.443607

>> bic4
[1]    3.475841 

>> bic5
[1]    4.108521
deviance3=pk.nlm3$minimum + n*log(2*pi)
bic3=deviance3 + log(n)*length(psi3)
deviance4=pk.nlm4$minimum + n*log(2*pi)
bic4=deviance4 + log(n)*length(psi4)
deviance5=pk.nlm5$minimum + 2*sum(log(y)) + n*log(2*pi)
bic5=deviance5 + log(n)*length(psi5)

There is no obvious difference between these three error models. We will use the combined error model ${\cal M}_4$ in the following.

A 90% confidence interval for $\psi_4$ can derived from the Hessian matrix of the objective function.

>>ci4
            [,1]        [,2]
[1,]  2.22576690  3.55436561
[2,]  7.93442421 12.40228967
[3,]  0.16628224  0.24736196
[4,] -0.02444571  0.07927403
[5,]  0.04119983  0.25006660
alpha=0.9
df=n-length(phi4)
I4=pk.nlm4$hessian/2
H4=solve(I4)
s4=sqrt(diag(H4)*n/df)
delta4=s4*qt(0.5+alpha/2,df)
ci4=matrix(c(psi4-delta4,psi4+delta4),ncol=2)

as well as a confidence interval for $f_4(t)$ using the Central Limit Theorem

Individual6.png

nlpredci=function(phi,f,H)
{dphi=length(phi)
nf=length(f)
H=H*n/(n-dphi)
S=H[seq(1,dphi),seq(1,dphi)]
G=matrix(nrow=nf,ncol=dphi)
for (k in seq(1,dphi)) {
   dk=phi[k]*(1e-5)
   phid=phi 
   phid[k]=phi[k] + dk
   fd=predc2(tc,phid)
   G[,k]=(f-fd)/dk }
M=rowSums((G%*%S)*G)
deltaf=sqrt(M)*qt(0.5+alpha/2,df)}

deltafc4=nlpredci(phi4,fc4,H4)

par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
lines(tc,fc4, type = "l", col = "red", lwd=2)
lines(tc,fc4-deltafc4, type = "l", col = "red", lwd=1, lty=3)
lines(tc,fc4+deltafc4, type = "l", col = "red", lwd=1, lty=3)
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations","predicted concentration","IC for predicted concentration"), 
       lty=c(-1,1,3), pch=c(1,-1,-1), lwd=c(2,2,1), col=c("blue","red","red")) 

Alternatively, prediction intervals for $\hpsi_4$, $\hat{f}_4(t;\hpsi_4)$ and new observations at any time can be estimated by Monte-Carlo

>>ci4s
             [,1]        [,2]
[1,] 2.412131e+00  3.54008925
[2,] 8.554283e+00 11.89304332
[3,] 1.829453e-01  0.23922393
[4,] 1.558052e-08  0.08006378
[5,] 1.233954e-02  0.19552977

Individual7.png

f=predc2(t,phi4)
a4=psi4[4]
b4=psi4[5]
g=a4+b4*f
dpsi=length(psi4)
nc=length(tc)
N=1000
qalpha=c(0.5 - alpha/2,0.5 + alpha/2)
PSI=matrix(nrow=N,ncol=dpsi)
FC=matrix(nrow=N,ncol=nc)
Y=matrix(nrow=N,ncol=nc)
for (k in seq(1,N)) {
   eps=rnorm(n)
   ys=f+g*eps
   pk.nlm=nlm(fmin4, psi4, ys, t)
   psie=pk.nlm$estimate
   psie[c(4,5)]=abs(psie[c(4,5)])
   PSI[k,]=psie
   fce=predc2(tc,psie[c(1,2,3)])
   FC[k,]=fce
   gce=a4+b4*fce
   Y[k,]=fce + gce*rnorm(1)
}

ci4s=matrix(nrow=dpsi,ncol=2)
for (k in seq(1,dpsi)){
   ci4s[k,]=quantile(PSI[,k],qalpha,names=FALSE)}

cifc4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
   cifc4s[k,]=quantile(FC[,k],qalpha,names=FALSE)}

ciy4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
   ciy4s[k,]=quantile(Y[,k],qalpha,names=FALSE)}
par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
lines(tc,fc4, type = "l", col = "red", lwd=2)
lines(tc,cifc4s[,1], type = "l", col = "red", lwd=1, lty=3)
lines(tc,cifc4s[,2], type = "l", col = "red", lwd=1, lty=3)
lines(tc,ciy4s[,1], type = "l", col = "green", lwd=1, lty=3)
lines(tc,ciy4s[,2], type = "l", col = "green", lwd=1, lty=3)
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations","predicted concentration","IC for predicted concentration",
       "IC for observed concentrations"),lty=c(-1,1,3,3), pch=c(1,-1,-1,-1), lwd=c(2,2,1,1), 
       col=c("blue","red","red","green")) 


The R code used in this section to illustrate a basic PK example can be downloaded using the following link.