Difference between revisions of "The individual approach"

From Popix
Jump to navigation Jump to search
m (The data)
m (Selecting the error model)
 
(241 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<!-- some LaTeX macros we want to use: -->
 
$
 
\DeclareMathOperator{\argmin}{arg\,min}
 
\DeclareMathOperator{\argmax}{arg\,max}
 
\newcommand{\psis}{\psi{^\star}}
 
\newcommand{\phis}{\phi{^\star}}
 
\newcommand{\hpsi}{\hat{\psi}}
 
\newcommand{\hphi}{\hat{\phi}}
 
\newcommand{\teps}{\varepsilon}
 
\newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}}
 
\newcommand{\DDt}[1]{\partial^2_\theta #1}
 
\def\aref{a^\star}
 
\def\kref{k^\star}
 
\def\model{M}
 
\def\hmodel{m}
 
\def\mmodel{\mu}
 
\def\imodel{H}
 
 
\def\Imax{\text{\it Imax}}
 
\def\id{{\rm Id}}
 
\def\teta{\tilde{\eta}}
 
\newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}
 
 
\newcommand{\deriv}[1]{\frac{d}{dt}#1(t)}
 
 
\newcommand{\pred}[1]{\tilde{#1}}
 
\def\phis{\phi{^\star}}
 
\def\hphi{\tilde{\phi}}
 
\def\hw{\tilde{w}}
 
\def\hpsi{\tilde{\psi}}
 
\def\hatpsi{\hat{\psi}}
 
\def\hatphi{\hat{\phi}}
 
\def\psis{\psi{^\star}}
 
\def\transy{u}
 
\def\psipop{\psi_{\rm pop}}
 
\newcommand{\psigr}[1]{\hat{\bpsi}_{#1}}
 
\newcommand{\Vgr}[1]{\hat{V}_{#1}}
 
 
\def\psig{\psi}
 
\def\psigprime{\psig^{\prime}}
 
\def\psigiprime{\psig_i^{\prime}}
 
\def\psigk{\psig^{(k)}}
 
\def\psigki{{\psig_i^{(k)}}}
 
\def\psigkun{\psig^{(k+1)}}
 
\def\psigkuni{\psig_i^{(k+1)}}
 
\def\psigi{{\psig_i}}
 
\def\psigil{{\psig_{i,\ell}}}
 
\def\phig{{\phi}}
 
\def\phigi{{\phig_i}}
 
\def\phigil{{\phig_{i,\ell}}}
 
\def\etagi{{\eta_i}}
 
\def\IIV{{\Omega}}
 
\def\IIVk{{\IIV_{k}}}
 
\def\IIVkun{{\IIV_{k+1}}}
 
\def\sigmakun{\sigma^2_{{k+1}}}
 
\def\thetag{{\theta}}
 
\def\thetagk{{\theta_k}}
 
\def\thetagkun{{\theta_{k+1}}}
 
\def\thetagkunm{\theta_{k-1}}
 
\def\sgk{s_{k}}
 
\def\sgkun{s_{k+1}}
 
\def\yg{y}
 
\def\xg{x}
 
\def\varprior{{V_\star}}
 
\def\thetaprior{{\theta_\star}}
 
\def\dprior{{d_\star}}
 
 
\def\neta{{n_\eta}}
 
\def\ncov{M}
 
\def\npsi{n_\psig}
 
\def\nreg{{n_x}}
 
 
\def\bu{\boldsymbol{u}}
 
\def\bt{\boldsymbol{t}}
 
\def\bT{\boldsymbol{T}}
 
\def\vt{{t}}
 
 
\def\by{\boldsymbol{y}}
 
\def\bx{\boldsymbol{x}}
 
\def\bc{\boldsymbol{c}}
 
\def\bw{\boldsymbol{w}}
 
\def\bz{\boldsymbol{z}}
 
\def\bpsi{\boldsymbol{\psi}}
 
\def\bbeta{\beta}
 
\def\beeta{\eta}
 
 
\def\logit{\rm logit}
 
\def\transy{u}
 
\def\so{O}
 
 
\def\one{\mathbb 1}
 
\newcommand{\prob}[1]{ \mathbb{P}\!\left(#1\right)}
 
\newcommand{\probs}[2]{ \mathbb{P}_{#1}\!\left(#2\right)}
 
\newcommand{\esp}[1]{\mathbb{E}\left(#1\right)}
 
\newcommand{\esps}[2]{\mathbb{E}_{#1}\left(#2\right)}
 
\newcommand{\var}[1]{\mbox{Var}\left(#1\right)}
 
\newcommand{\vars}[2]{\mbox{Var}_{#1}\left(#2\right)}
 
\newcommand{\std}[1]{\mbox{sd}\left(#1\right)}
 
\newcommand{\stds}[2]{\mbox{sd}_{#1}\left(#2\right)}
 
\newcommand{\corr}[1]{\mbox{Corr}\left(#1\right)}
 
 
\def\pmacro{\mathbf{p}}
 
\def\py{\pmacro}
 
\def\pt{\pmacro}
 
\def\pc{\pmacro}
 
\def\pu{\pmacro}
 
\def\pyi{\pmacro}
 
\def\pyj{\pmacro}
 
\def\ppsi{\pmacro}
 
\def\ppsii{\pmacro}
 
\def\pcpsith{\pmacro}
 
\def\pth{\pmacro}
 
\def\pypsi{\pmacro}
 
\def\pcypsi{\pmacro}
 
\def\ppsic{\pmacro}
 
\def\pcpsic{\pmacro}
 
\def\pypsic{\pmacro}
 
\def\pypsit{\pmacro}
 
\def\pcypsit{\pmacro}
 
\def\pypsiu{\pmacro}
 
\def\pcypsiu{\pmacro}
 
\def\pypsith{\pmacro}
 
\def\pypsithcut{\pmacro}
 
\def\pypsithc{\pmacro}
 
\def\pcypsiut{\pmacro}
 
\def\pcpsithc{\pmacro}
 
\def\pcthy{\pmacro}
 
\def\pyth{\pmacro}
 
\def\pcpsiy{\pmacro}
 
\def\pz{\pmacro}
 
\def\pw{\pmacro}
 
\def\pcwz{\pmacro}
 
\def\pw{\pmacro}
 
\def\pcyipsii{\pmacro}
 
\def\pyipsii{\pmacro}
 
\def\pypsiij{\pmacro}
 
\def\pyipsi1{\pmacro}
 
\def\ptypsiij{\pmacro}
 
\def\pcyzipsii{\pmacro}
 
\def\pczipsii{\pmacro}
 
\def\pcyizpsii{\pmacro}
 
\def\pcyijzpsii{\pmacro}
 
\def\pcyi1zpsii{\pmacro}
 
\def\pcypsiz{\pmacro}
 
\def\pccypsiz{\pmacro}
 
\def\pypsiz{\pmacro}
 
\def\pcpsiz{\pmacro}
 
\def\peps{\pmacro}
 
$
 
  
 
== Overview ==
 
== Overview ==
  
Before we start looking at modeling a whole population at the same time, we are going to consider only one individual from that population. Much of the basic methodology for modeling one individual follows through to population modeling. We will see that when stepping up from one individual to a population, the difference is that some parameters shared by individuals are considered to be drawn from a probability distribution.
+
Before we start looking at modeling a whole population at the same time, we are going to consider only one individual from that population. Much of the basic methodology for modeling one individual follows through to population modeling. We will see that when stepping up from one individual to a population, the difference is that some parameters shared by individuals are considered to be drawn from a [http://en.wikipedia.org/wiki/Probability_distribution probability distribution].
  
 
Let us begin with a simple  example.
 
Let us begin with a simple  example.
Line 157: Line 8:
 
concentration of a marker in the bloodstream is measured and plotted against time:
 
concentration of a marker in the bloodstream is measured and plotted against time:
  
 
+
::[[File:New_Individual1.png|link=]]
::[[File:individual1.png]]
 
 
 
  
 
We aim to find a mathematical model to describe what we see in the figure. The eventual goal is then to extend this approach to the ''simultaneous modeling'' of a whole population.
 
We aim to find a mathematical model to describe what we see in the figure. The eventual goal is then to extend this approach to the ''simultaneous modeling'' of a whole population.
Line 165: Line 14:
  
 
<br>
 
<br>
 +
 
== Model and methods for the individual approach ==
 
== Model and methods for the individual approach ==
  
Line 171: Line 21:
  
 
In our example, the concentration is a ''continuous'' variable, so we will  try to use continuous functions to model it.
 
In our example, the concentration is a ''continuous'' variable, so we will  try to use continuous functions to model it.
Different types of data  (e.g., count data, categorical data, time-to-event data, etc.) require different types of models. All of these data types will be considered in due time, but for now let us concentrate on a continuous data model.
+
Different types of data  (e.g., [http://en.wikipedia.org/wiki/Count_data count data], [http://en.wikipedia.org/wiki/Categorical_data categorical data], [http://en.wikipedia.org/wiki/Survival_analysis time-to-event data], etc.) require different types of models. All of these data types will be considered in due time, but for now let us concentrate on a continuous data model.
  
 
A model for continuous data can be represented mathematically as follows:
 
A model for continuous data can be represented mathematically as follows:
Line 188: Line 38:
 
* $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$  is a vector of $d$ parameters that influences the value of $f$.
 
* $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$  is a vector of $d$ parameters that influences the value of $f$.
  
* $(e_1, e_2, \ldots, e_n)$  are called the ''residual errors''. Usually, we suppose that they come from some centered probability distribution: $\esp{e_j} =0$. Notice in the figure that even though the concentration is generally decreasing with time, sometimes it appears to increase (e.g., from $t= 7$ to $t=8$). This may be due to measurement errors rather than a true concentration increase, and is one reason we include the possibility of residual errors.
+
* $(e_1, e_2, \ldots, e_n)$  are called the ''residual errors''. Usually, we suppose that they come from some centered probability distribution: $\esp{e_j} =0$.  
  
  
Line 197: Line 47:
 
y_{j} = f(t_j ; \psi) + g(t_j ; \psi)\teps_j  , \quad \quad  1\leq j \leq n,
 
y_{j} = f(t_j ; \psi) + g(t_j ; \psi)\teps_j  , \quad \quad  1\leq j \leq n,
 
</math></div>
 
</math></div>
|reference=(1.1) }}
+
|reference=(1) }}
  
 
where now:
 
where now:
  
  
 +
<ul>
 
* $g$  is called the ''residual error model''. It may be a function of the time $t_j$ and parameters $\psi$.
 
* $g$  is called the ''residual error model''. It may be a function of the time $t_j$ and parameters $\psi$.
  
 
* $(\teps_1, \teps_2, \ldots, \teps_n)$  are the ''normalized'' residual errors. We suppose that these come from a probability distribution which is centered and has unit variance: $\esp{\teps_j} = 0$ and $\var{\teps_j} =1$.
 
* $(\teps_1, \teps_2, \ldots, \teps_n)$  are the ''normalized'' residual errors. We suppose that these come from a probability distribution which is centered and has unit variance: $\esp{\teps_j} = 0$ and $\var{\teps_j} =1$.
 
+
</ul>
 
 
  
 
<br>
 
<br>
Line 213: Line 63:
  
  
The choice of a residual error model $g$ is very flexible, and allows us to account for many different hypotheses we may have on the error's distribution. Let $f_j=f(t_j;\psi)$. Here are some simple error models,
+
The choice of a residual error model $g$ is very flexible, and allows us to account for many different hypotheses we may have on the error's distribution. Let $f_j=f(t_j;\psi)$. Here are some simple error models.
  
  
 +
<ul>
 
* ''Constant error model'': $g=a$. That is,  $y_j=f_j+a\teps_j$.
 
* ''Constant error model'': $g=a$. That is,  $y_j=f_j+a\teps_j$.
  
* ''Proportional error model'': $g=b\,f$. That is, $y_j=f_j+bf_j\teps_j$. This is for when we think the magnitude of the error is proportional to the value of the predicted value $f$.
+
 
 +
* ''Proportional error model'': $g=b\,f$. That is, $y_j=f_j+bf_j\teps_j$. This is for when we think the magnitude of the error is proportional to the value of the predicted value $f$.
 +
 
  
 
* ''Combined error model'': $g=a+b f$. Here, $y_j=f_j+(a+bf_j)\teps_j$.
 
* ''Combined error model'': $g=a+b f$. Here, $y_j=f_j+(a+bf_j)\teps_j$.
 +
  
 
* ''Alternative combined error model'': $g^2=a^2+b^2f^2$. Here, $y_j=f_j+\sqrt{a^2+b^2f_j^2}\teps_j$.
 
* ''Alternative combined error model'': $g^2=a^2+b^2f^2$. Here, $y_j=f_j+\sqrt{a^2+b^2f_j^2}\teps_j$.
 +
  
 
* ''Exponential error model'': here, the model is instead $\log(y_j)=\log(f_j) + a\teps_j$, that is, $g=a$. It is exponential in the sense that if we exponentiate, we end up with $y_j = f_j e^{a\teps_j}$.
 
* ''Exponential error model'': here, the model is instead $\log(y_j)=\log(f_j) + a\teps_j$, that is, $g=a$. It is exponential in the sense that if we exponentiate, we end up with $y_j = f_j e^{a\teps_j}$.
 +
</ul>
  
  
 +
<br>
  
<br>
 
 
===Tasks===
 
===Tasks===
  
 
To model a vector of observations $y = (y_j,\, 1\leq j \leq n$) we must perform several tasks:
 
To model a vector of observations $y = (y_j,\, 1\leq j \leq n$) we must perform several tasks:
  
 +
<ul>
 +
* Select a structural model $f$ and a residual error model $g$.
  
* Select a structural model $f$ and a residual error model $g$.
 
  
 
* Estimate the model's parameters $\psi$.
 
* Estimate the model's parameters $\psi$.
 +
  
 
* ''Assess and validate'' the selected model.
 
* ''Assess and validate'' the selected model.
 +
</ul>
  
  
Line 245: Line 104:
 
=== Selecting structural and residual error models ===
 
=== Selecting structural and residual error models ===
  
As we are interested in parametric modeling, we must choose parametric structural and residual error models. In the absence of biological (or other) information, we might suggest possible structural models just by looking at the graphs of time-evolution of the data. For example, if $y_j$ is increasing with time, we might suggest an affine, quadratic or logarithmic model, depending on the approximate trend of the data. If $y_j$ is instead decreasing ever slower to zero, an exponential model might be appropriate.
+
As we are interested in [http://en.wikipedia.org/wiki/Parametric_model parametric modeling], we must choose parametric structural and residual error models. In the absence of biological (or other) information, we might suggest possible structural models just by looking at the graphs of time-evolution of the data. For example, if $y_j$ is increasing with time, we might suggest an affine, quadratic or logarithmic model, depending on the approximate trend of the data. If $y_j$ is instead decreasing ever slower to zero, an exponential model might be appropriate.
  
However, often  we have biological (or other) information to help us make our choice. For instance, if we have a system of differential equations describing how the drug is eliminated from the body, its solution may provide the formula (i.e., structural model) we are looking for.
+
However, often  we have biological (or other) information to help us make our choice. For instance, if we have a system of [http://en.wikipedia.org/wiki/Differential_equation differential equations] describing how the drug is eliminated from the body, its solution may provide the formula (i.e., structural model) we are looking for.
  
 
As for the residual error model, if it is not immediately obvious which one to choose, several can be tested in conjunction with one or several possible structural models. After parameter estimation, each structural and residual error model pair can be assessed, compared against the others, and/or validated in various ways.
 
As for the residual error model, if it is not immediately obvious which one to choose, several can be tested in conjunction with one or several possible structural models. After parameter estimation, each structural and residual error model pair can be assessed, compared against the others, and/or validated in various ways.
Line 258: Line 117:
  
  
Given the observed data and the choice of a parametric model to describe it, our goal becomes to find the "best" parameters for the model. A traditional framework to solve this kind of problem is called ''maximum likelihood estimation'' or MLE, in which the "most likely" parameters are found, given the data that was observed.
+
Given the observed data and the choice of a parametric model to describe it, our goal becomes to find the "best" parameters for the model. A traditional framework to solve this kind of problem is called [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimation] or MLE, in which the "most likely" parameters are found, given the data that was observed.
  
 
The likelihood $L$ is a function defined as:
 
The likelihood $L$ is a function defined as:
Line 265: Line 124:
 
|equation=<math> L(\psi ; y_1,y_2,\ldots,y_n) \ \ \eqdef \ \ \py( y_1,y_2,\ldots,y_n; \psi) , </math> }}
 
|equation=<math> L(\psi ; y_1,y_2,\ldots,y_n) \ \ \eqdef \ \ \py( y_1,y_2,\ldots,y_n; \psi) , </math> }}
  
i.e., the conditional joint density function of $(y_j)$ given the parameters $\psi$, but looked at as if the data are known and the parameters not. The $\hat{\psi}$ which maximizes $L$ is known as the ''maximum likelihood estimator''.
+
i.e., the conditional [http://en.wikipedia.org/wiki/Joint_probability_distribution joint density function] of $(y_j)$ given the parameters $\psi$, but looked at as if the data are known and the parameters not. The $\hat{\psi}$ which maximizes $L$ is known as the ''maximum likelihood estimator''.
  
Suppose that we have chosen a structural model $f$ and residual error model $g$. If we assume for instance that $ \teps_j \sim_{i.i.d} {\cal N}(0,1)$, then the $y_j$ are independent of each other and [[#cont|(1.1)]] means that:
+
Suppose that we have chosen a structural model $f$ and residual error model $g$. If we assume for instance that $ \teps_j \sim_{i.i.d} {\cal N}(0,1)$, then the $y_j$ are independent of each other and [[#cont|(1)]] means that:
  
 
{{Equation1
 
{{Equation1
Line 277: Line 136:
 
|equation=<math>\begin{eqnarray}
 
|equation=<math>\begin{eqnarray}
 
\py(y_1, y_2, \ldots y_n ; \psi) &=& \prod_{j=1}^n \pyj(y_j ; \psi) \\ \\
 
\py(y_1, y_2, \ldots y_n ; \psi) &=& \prod_{j=1}^n \pyj(y_j ; \psi) \\ \\
& = & \displaystyle{ \frac{1}{\prod_{j=1}^n \sqrt{2\pi} g(t_j ; \psi)} }\ e^{\displaystyle{ -\frac{1}{2} } \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} \right)^2 } .  
+
& = & \frac{1}{\prod_{j=1}^n \sqrt{2\pi} g(t_j ; \psi)} \  {\rm exp}\left\{-\frac{1}{2} \sum_{j=1}^n \left( \displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2\right\} .
 
\end{eqnarray}</math> }}
 
\end{eqnarray}</math> }}
  
Line 288: Line 147:
 
\sum_{j=1}^n \log\left(g(t_j ; \psi)^2\right)  + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2 \right\} .  
 
\sum_{j=1}^n \log\left(g(t_j ; \psi)^2\right)  + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2 \right\} .  
 
\end{eqnarray}</math></div>
 
\end{eqnarray}</math></div>
|reference=(1.2) }}
+
|reference=(2) }}
  
<!-- %and the deviance is therefore -->
 
<!-- %$$-2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_j) = \sum_{j=1}^n \log(g(t_j ; \hat{\psi})^2)  + \sum_{j=1}^n \left(\frac{y_j - f(t_j ; \hat{\psi})}{g(t_j ; \hat{\psi})}\right)^2 +n\log(2\pi)$$ -->
 
  
This minimization problem does not usually have an analytical solution for nonlinear models, so an optimization procedure needs to be used.
+
This minimization problem does not usually have an [http://en.wikipedia.org/wiki/Analytical_expression analytical solution] for nonlinear models, so an [http://en.wikipedia.org/wiki/Mathematical_optimization optimization] procedure needs to be used.
However, for a few specific models, analytic solutions do exist.
+
However, for a few specific models, analytical solutions do exist.
  
For instance, suppose we have a constant error model: $y_{j} = f(t_j ; \psi)  + a \, \teps_j,\,\,  1\leq j \leq n,$ that is: $g(t_j;\psi) = a$. In practice, $f$ is not itself a function of $a$, so we can write $\psi = (\phi,a)$ and therefore: $y_{j} = f(t_j ; \phi)  + a \, \teps_j.$ Thus, [[#LLL|(1.2)]] simplifies to:
+
For instance, suppose we have a constant error model: $y_{j} = f(t_j ; \psi)  + a \, \teps_j,\,\,  1\leq j \leq n,$ that is: $g(t_j;\psi) = a$. In practice, $f$ is not itself a function of $a$, so we can write $\psi = (\phi,a)$ and therefore: $y_{j} = f(t_j ; \phi)  + a \, \teps_j.$ Thus, [[#LLL|(2)]] simplifies to:
  
 
{{Equation1
 
{{Equation1
Line 311: Line 168:
 
\end{eqnarray} </math> }}
 
\end{eqnarray} </math> }}
  
where $\hat{a}^2$ is found by setting the partial derivative of $-2LL$ to zero.
+
where $\hat{a}^2$ is found by setting the [http://en.wikipedia.org/wiki/Partial_derivative partial derivative] of $-2LL$ to zero.
  
 
Whether this has an analytical solution or not depends on the form of $f$. For example, if $f(t_j;\phi)$ is just a linear function of the components of the vector $\phi$, we can represent it as a matrix $F$ whose $j$th row gives the coefficients at time $t_j$. Therefore, we have the matrix equation $y = F \phi + a \teps$.
 
Whether this has an analytical solution or not depends on the form of $f$. For example, if $f(t_j;\phi)$ is just a linear function of the components of the vector $\phi$, we can represent it as a matrix $F$ whose $j$th row gives the coefficients at time $t_j$. Therefore, we have the matrix equation $y = F \phi + a \teps$.
Line 322: Line 179:
 
\hat{a}^2&=& \frac{1}{n}\sum_{j=1}^n \left( y_j - F_j \hat{\phi}\right)^2 . \\
 
\hat{a}^2&=& \frac{1}{n}\sum_{j=1}^n \left( y_j - F_j \hat{\phi}\right)^2 . \\
 
\end{eqnarray}</math> }}
 
\end{eqnarray}</math> }}
 +
  
  
 
<br>
 
<br>
 
 
===Computing the Fisher information matrix===
 
===Computing the Fisher information matrix===
  
The Fisher information is a way of measuring the amount of information that an observable random variable carries about an unknown parameter upon which its probability distribution depends.
+
The [http://en.wikipedia.org/wiki/Fisher_information Fisher information] is a way of measuring the amount of information that an observable random variable carries about an unknown parameter upon which its probability distribution depends.
  
 
Let $\psis $ be the true unknown value of $\psi$, and let $\hatpsi$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and
 
Let $\psis $ be the true unknown value of $\psi$, and let $\hatpsi$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and
Line 336: Line 193:
 
I_n(\psis)^{\frac{1}{2} }(\hatpsi-\psis) \limite{n\to \infty}{} {\mathcal N}(0,\id) ,
 
I_n(\psis)^{\frac{1}{2} }(\hatpsi-\psis) \limite{n\to \infty}{} {\mathcal N}(0,\id) ,
 
</math></div>
 
</math></div>
|reference=(1.3) }}
+
|reference=(3) }}
  
where
+
where $I_n(\psis)$ is (minus) the Hessian (i.e., the matrix of the second derivatives) of the log-likelihood:
  
 
{{Equation1
 
{{Equation1
|equation=<math>I_n(\psis)=-  \displaystyle{ \frac{\partial}{\partial \psi} } LL(\psis;y_1,y_2,\ldots,y_n)
+
|equation=<math>I_n(\psis)=-  \displaystyle{ \frac{\partial^2}{\partial \psi \partial \psi^\prime} } LL(\psis;y_1,y_2,\ldots,y_n)
 
</math> }}
 
</math> }}
  
Line 350: Line 207:
 
{{Equation1
 
{{Equation1
 
|equation=<math>C(\hatpsi) = - I_n(\hatpsi)^{-1} . </math> }}
 
|equation=<math>C(\hatpsi) = - I_n(\hatpsi)^{-1} . </math> }}
 +
  
  
 
<br>
 
<br>
 +
===Deriving confidence intervals for parameters===
  
===Deriving confidence intervals for parameters===
 
 
Let $\psi_k$ be the $k$th of $d$ components of $\psi$. Imagine that we have estimated $\psi_k$ with $\hatpsi_k$, the $k$th component of the MLE $\hatpsi$, that is, a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$ under very general conditions.
 
Let $\psi_k$ be the $k$th of $d$ components of $\psi$. Imagine that we have estimated $\psi_k$ with $\hatpsi_k$, the $k$th component of the MLE $\hatpsi$, that is, a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$ under very general conditions.
  
Line 362: Line 220:
 
|equation=<math>\widehat{\rm Var}(\hatpsi_k) = C_{kk}(\hatpsi) .</math> }}
 
|equation=<math>\widehat{\rm Var}(\hatpsi_k) = C_{kk}(\hatpsi) .</math> }}
  
We can thus derive an estimator of its standard error:
+
We can thus derive an estimator of its [http://en.wikipedia.org/wiki/Standard_error standard error]:
 
{{Equation1
 
{{Equation1
 
|equation=<math>\widehat{\rm s.e.}(\hatpsi_k) = \sqrt{C_{kk}(\hatpsi)} ,</math> }}
 
|equation=<math>\widehat{\rm s.e.}(\hatpsi_k) = \sqrt{C_{kk}(\hatpsi)} ,</math> }}
  
and a confidence interval of level $1-\alpha$ for $\psi_k^\star$:
+
and a [http://en.wikipedia.org/wiki/Confidence_interval confidence interval] of level $1-\alpha$ for $\psi_k^\star$:
  
 
{{Equation1
 
{{Equation1
 
|equation=<math>{\rm CI}(\psi_k^\star) = \left[\hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(\frac{\alpha}{2}\right), \ \hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(1-\frac{\alpha}{2}\right)\right] , </math> }}
 
|equation=<math>{\rm CI}(\psi_k^\star) = \left[\hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(\frac{\alpha}{2}\right), \ \hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(1-\frac{\alpha}{2}\right)\right] , </math> }}
  
where $q(w)$ is the quantile of order $w$ of a ${\cal N}(0,1)$ distribution.
+
where $q(w)$ is the [http://en.wikipedia.org/wiki/Quantile quantile] of order $w$ of a ${\cal N}(0,1)$ distribution.
  
  
 
{{Remarks
 
{{Remarks
 
|title=Remarks
 
|title=Remarks
|text= Approximating the fraction $\hatpsi/\widehat{\rm s.e}(\hatpsi_k)$ by the normal distribution is a "good" approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, where $d_\phi$ is the dimension of $\phi$. The quantiles of the normal distribution can then be replaced by those of a Student's $t$-distribution with $(n-d_\phi)$ degrees of freedom.
+
|text= Approximating the fraction $\hatpsi/\widehat{\rm s.e}(\hatpsi_k)$ by the normal distribution is a "good" approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a [http://en.wikipedia.org/wiki/Chi-squared_distribution chi-squared distribution] with $(n-d_\phi)$ [http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29 degrees of freedom], where $d_\phi$ is the dimension of $\phi$. The quantiles of the normal distribution can then be replaced by those of a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] with $(n-d_\phi)$ degrees of freedom.
 
<!-- %$${\rm CI}(\psi_k) = [\hatpsi_k - \widehat{\rm s.e}(\hatpsi_k)q((1-\alpha)/2,n-d) , \hatpsi_k + \widehat{\rm s.e}(\hatpsi_k)q((1+\alpha)/2,n-d)]$$ -->
 
<!-- %$${\rm CI}(\psi_k) = [\hatpsi_k - \widehat{\rm s.e}(\hatpsi_k)q((1-\alpha)/2,n-d) , \hatpsi_k + \widehat{\rm s.e}(\hatpsi_k)q((1+\alpha)/2,n-d)]$$ -->
 
<!--  %where $q(\alpha,\nu)$ is the quantile of order $\alpha$ of a $t$-distribution with $\nu$ degrees of freedom. -->
 
<!--  %where $q(\alpha,\nu)$ is the quantile of order $\alpha$ of a $t$-distribution with $\nu$ degrees of freedom. -->
Line 387: Line 245:
  
 
The structural model $f$ can be predicted for any $t$ using the estimated value $f(t; \hatphi)$. For that $t$, we can then derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hatphi$. Indeed, as a first approximation we have:
 
The structural model $f$ can be predicted for any $t$ using the estimated value $f(t; \hatphi)$. For that $t$, we can then derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hatphi$. Indeed, as a first approximation we have:
<!-- %$$  f(t ; \hatphi) &\simeq& f(t ; \phi) + J_f(\phi) (\hatphi - \phi) $$ -->
+
 
<!-- %where $J_f(t,\phi)$ is the Jacobian matrix of $f$, {\it i.e.} the matrix of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$ (the $k$th row of $J_f(t,\phi)$ is ) -->
 
  
 
{{Equation1
 
{{Equation1
Line 408: Line 265:
 
{{Equation1
 
{{Equation1
 
|equation=<math>{\rm CI}(f(t ; \phi^\star)) = \left[f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(\frac{\alpha}{2}\right), \ f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(1-\frac{\alpha}{2}\right)\right].</math> }}
 
|equation=<math>{\rm CI}(f(t ; \phi^\star)) = \left[f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(\frac{\alpha}{2}\right), \ f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(1-\frac{\alpha}{2}\right)\right].</math> }}
 +
  
  
 
<br>
 
<br>
 
 
===Estimating confidence intervals using Monte Carlo simulation===
 
===Estimating confidence intervals using Monte Carlo simulation===
  
The use of Monte Carlo methods to estimate a distribution does not require any approximation of the  model.
+
The use of [http://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo methods] to estimate a distribution does not require any approximation of the  model.
  
 
We proceed in the following way. Suppose we have found a MLE $\hatpsi$ of $\psi$. We then simulate a data vector $y^{(1)}$ by first randomly generating the vector $\teps^{(1)}$ and then calculating for $1 \leq j \leq n$,
 
We proceed in the following way. Suppose we have found a MLE $\hatpsi$ of $\psi$. We then simulate a data vector $y^{(1)}$ by first randomly generating the vector $\teps^{(1)}$ and then calculating for $1 \leq j \leq n$,
Line 430: Line 287:
 
|equation=<math> [\hat{\psi}_{k,([\frac{\alpha}{2} M])} \ , \ \hat{\psi}_{k,([ (1-\frac{\alpha}{2})M])} ], </math> }}
 
|equation=<math> [\hat{\psi}_{k,([\frac{\alpha}{2} M])} \ , \ \hat{\psi}_{k,([ (1-\frac{\alpha}{2})M])} ], </math> }}
  
where $[\cdot]$ denotes the integer part and  $(\psi_{k,(m)},\ 1 \leq m \leq M)$ the order statistic, i.e., the parameters $(\hatpsi_k^{(m)}, 1 \leq m \leq M)$ reordered so that $\hatpsi_{k,(1)} \leq \hatpsi_{k,(2)} \leq \ldots \leq \hatpsi_{k,(M)}$.
+
where $[\cdot]$ denotes the [http://en.wikipedia.org/wiki/Floor_and_ceiling_functions integer part] and  $(\psi_{k,(m)},\ 1 \leq m \leq M)$ the order statistic, i.e., the parameters $(\hatpsi_k^{(m)}, 1 \leq m \leq M)$ reordered so that $\hatpsi_{k,(1)} \leq \hatpsi_{k,(2)} \leq \ldots \leq \hatpsi_{k,(M)}$.
  
  
Line 436: Line 293:
  
 
<br>
 
<br>
 
 
==A PK  example ==
 
==A PK  example ==
  
Line 445: Line 301:
 
===The data===
 
===The data===
  
 +
This modeling process is illustrated in detail in the following [http://en.wikipedia.org/wiki/Pharmacokinetics PK] example. Let us consider a dose D=50mg of a drug administered orally to a patient at time $t=0$. The concentration of the drug in the bloodstream is then measured at times $(t_j) = (0.5, 1,\,1.5,\,2,\,3,\,4,\,8,\,10,\,12,\,16,\,20,\,24).$ Here is the file {{Verbatim|individualFitting_data.txt}} with the data:
  
This modeling process is illustrated in detail in the following PK example. Let us consider a dose D=50mg of a drug administrated orally to a patient at time $t=0$. The concentration of the drug in the bloodstream is then measured at times $(t_j) = (0.5, 1,\,1.5,\,2,\,3,\,4,\,8,\,10,\,12,\,16,\,20,\,24).$
 
  
Here is the file <span style="font-family:courier new; font-size:12pt">individualFitting_data.txt</span> with the data:
+
{| class="wikitable" align="center" style="width: 30%;margin-left:15em"
 
+
!|      Time   ||    Concentration
{| class="wikitable" align="left" margin-left="5em" style="text-align:center; font-size:9pt; width: 450px; background-color:#F5F5DC;"
 
!|      time   ||    concentration
 
 
|-
 
|-
 
|0.5     ||      0.94
 
|0.5     ||      0.94
Line 479: Line 333:
  
  
We are going to perform the analyses for this example with the free statistical software <span style="font-family:courier new; font-size:12pt">R</span>. First, we import the data and plot it to have a look:
+
We are going to perform the analyses for this example with the free statistical software [http://www.r-project.org/  {{Verbatim|R}}]. First, we import the data and plot it to have a look:
 
+
{| cellpadding="5" cellspacing="0"  
 
+
| style="width: 50%" |  
{| cellpadding="5" cellspacing="5"
+
[[File:NewIndividual1.png|link=]]
| style="width: 600px;" |  
+
| style="width: 50%" | {{RcodeForTable
[[File:individual1.png]]
 
| style="width: 600px;" |
 
{{Rcode
 
 
|name=
 
|name=
 
|code=
 
|code=
Line 493: Line 344:
 
t=pk1$time   
 
t=pk1$time   
 
y=pk1$concentration
 
y=pk1$concentration
plot(t,y,xlab="time(hour)",ylab="concentration(mg/l)",col="blue")  </pre> }}
+
plot(t, y, xlab="time(hour)",
 +
    ylab="concentration(mg/l)", col="blue")   
 +
</pre> }}
 
|}
 
|}
 
  
  
Line 501: Line 353:
  
 
===Fitting two PK models===
 
===Fitting two PK models===
 +
 
We are going to consider two possible structural models that may describe the observed time-course of the concentration:
 
We are going to consider two possible structural models that may describe the observed time-course of the concentration:
  
  
* A one compartment model with first-order absorption and linear elimination:
+
<ul>
 +
* A [http://en.wikipedia.org/wiki/Multi-compartment_model#Single-compartment_model one compartment model] with first-order [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption] and linear elimination:
  
 
{{Equation1
 
{{Equation1
Line 520: Line 374:
 
f_2(t ; \phi_2) &=& \left\{  \begin{array}{ll}
 
f_2(t ; \phi_2) &=& \left\{  \begin{array}{ll}
 
\displaystyle{ \frac{D}{V \,T_{k0} \, k_e} }\left( 1- e^{-k_e \, t} \right) & {\rm if }\ t\leq T_{k0} \\
 
\displaystyle{ \frac{D}{V \,T_{k0} \, k_e} }\left( 1- e^{-k_e \, t} \right) & {\rm if }\ t\leq T_{k0} \\
\displaystyle{ \frac{D}{V \,T_{k0} \, k_e} } \left( 1- e^{-k_e \, T_{k0}} \right)e^{-k_e \, (t- T_{k0})} & {\rm otherwise} .
+
\displaystyle{ \frac{D}{V \,T_{k0} \, k_e} } \left( 1- e^{-k_e \, T_{k0} } \right)e^{-k_e \, (t- T_{k0})} & {\rm otherwise} .
 
\end{array}
 
\end{array}
 
\right.
 
\right.
 
\end{eqnarray}</math> }}
 
\end{eqnarray}</math> }}
 +
</ul>
  
  
We define each of these functions in <span style="font-family:courier new; font-size:12pt">R</span>:
+
We define each of these functions in {{Verbatim|R}}:
  
  
::{{Rcode
+
{{Rcode
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
 
predc1=function(t,x){
 
predc1=function(t,x){
f=50*x[1]/x[2]/(x[1]-x[3])*(exp(-x[3]*t)-exp(-x[1]*t)) }
+
  f=50*x[1]/x[2]/(x[1]-x[3])*(exp(-x[3]*t)-exp(-x[1]*t))
 +
return(f)}
  
 
predc2=function(t,x){
 
predc2=function(t,x){
ff=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*t))
+
  f=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*t))
ff[t>x[1]]=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t>x[1]]-x[1]))
+
  f[t>x[1]]=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t>x[1]]-x[1]))
f=ff} </pre>
+
return(f)} </pre>
 
}}
 
}}
 
  
 
We then define two models ${\cal M}_1$ and ${\cal M}_2$ that assume (for now)  constant residual error models:
 
We then define two models ${\cal M}_1$ and ${\cal M}_2$ that assume (for now)  constant residual error models:
Line 553: Line 408:
 
We can fit these two models to our data by computing the MLE $\hatpsi_1=(\hatphi_1,\hat{a}_1)$ and $\hatpsi_2=(\hatphi_2,\hat{a}_2)$ of $\psi$  under each model:
 
We can fit these two models to our data by computing the MLE $\hatpsi_1=(\hatphi_1,\hat{a}_1)$ and $\hatpsi_2=(\hatphi_2,\hat{a}_2)$ of $\psi$  under each model:
  
 
+
{| cellpadding="10" cellspacing="10"
::{{Rcode
+
| style="width:50%" |
 +
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
fmin1=function(x,y,t)
+
fmin1=function(x,y,t){
{f=predc1(t,x)
+
  f=predc1(t,x)
g=x[4]
+
  g=x[4]
e=sum( ((y-f)/g)^2 + log(g^2))
+
  e=sum( ((y-f)/g)^2 + log(g^2))
}
+
return(e)}
  
fmin2=function(x,y,t)
+
fmin2=function(x,y,t){
{f=predc2(t,x)
+
  f=predc2(t,x)
g=x[4]
+
  g=x[4]
e=sum( ((y-f)/g)^2 + log(g^2))
+
  e=sum( ((y-f)/g)^2 + log(g^2))
}
+
return(e)}
  
 
#--------- MLE --------------------------------
 
#--------- MLE --------------------------------
Line 579: Line 435:
 
</pre>
 
</pre>
 
}}
 
}}
 +
| style="width:50%" |
 +
:Here are the parameter estimation results:
  
  
 
+
{{JustCodeForTable
Here are the parameter estimation results:
 
 
 
::{{JustCode
 
 
|code=
 
|code=
<pre style="background-color: #EFEFEF; border:none">
+
<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> cat(" psi1 =",psi1,"\n\n")
 
> cat(" psi1 =",psi1,"\n\n")
 
  psi1 = 0.3240916 6.001204 0.3239337 0.4366948
 
  psi1 = 0.3240916 6.001204 0.3239337 0.4366948
Line 592: Line 447:
 
> cat(" psi2 =",psi2,"\n\n")
 
> cat(" psi2 =",psi2,"\n\n")
 
  psi2 = 3.203111 8.999746 0.229977 0.2555242
 
  psi2 = 3.203111 8.999746 0.229977 0.2555242
}}
+
</pre> }}
 +
|}
  
  
 
<br>
 
<br>
 +
 
===Assessing and selecting the PK model===
 
===Assessing and selecting the PK model===
 +
 
The estimated parameters $\hatphi_1$ and $\hatphi_2$ can then be used for computing the predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models at any time $t$. These curves can then be plotted over the original data and compared:
 
The estimated parameters $\hatphi_1$ and $\hatphi_2$ can then be used for computing the predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models at any time $t$. These curves can then be plotted over the original data and compared:
  
 
+
{| cellpadding="5" cellspacing="0"  
{| cellpadding="5" cellspacing="10" align="left"
+
| style="width:50%" |  
| style="width:550px;" |  
+
[[File:New_Individual2.png|link=]]
::[File:individual2.png]]
+
| style="width:50%" |
| style="width:550px;" |
+
{{RcodeForTable
{{Rcode
 
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
 
tc=seq(from=0,to=25,by=0.1)
 
tc=seq(from=0,to=25,by=0.1)
 +
phi1=psi1[c(1,2,3)]
 
fc1=predc1(tc,phi1)
 
fc1=predc1(tc,phi1)
 +
phi2=psi2[c(1,2,3)]
 
fc2=predc2(tc,phi2)
 
fc2=predc2(tc,phi2)
  
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
+
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",  
 +
          ylab="concentration (mg/l)",col = "blue")
 
lines(tc,fc1, type = "l", col = "green", lwd=2)
 
lines(tc,fc1, type = "l", col = "green", lwd=2)
 
lines(tc,fc2, type = "l", col = "red", lwd=2)
 
lines(tc,fc2, type = "l", col = "red", lwd=2)
 
abline(a=0,b=0,lty=2)
 
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","first order absorption","zero order absorption"),
+
legend(13,4,c("observations","first order absorption",  
 +
          "zero order absorption"),
 
lty=c(-1,1,1), pch=c(1,-1,-1), lwd=2, col=c("blue","green","red"))
 
lty=c(-1,1,1), pch=c(1,-1,-1), lwd=2, col=c("blue","green","red"))
 
</pre> }}
 
</pre> }}
 
|}
 
|}
 
  
 
We clearly see that a much better fit is obtained with model ${\cal M}_2$, i.e., the one assuming a zero-order absorption process.
 
We clearly see that a much better fit is obtained with model ${\cal M}_2$, i.e., the one assuming a zero-order absorption process.
 
  
 
Another useful goodness-of-fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hatpsi)$ given by the models:
 
Another useful goodness-of-fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hatpsi)$ given by the models:
  
 
+
{| cellpadding="5" cellspacing="0"  
{| cellpadding="5" cellspacing="10" align="left"
+
| style="width:50%" |  
| style="width:550px;" |  
+
[[File:individual3.png|link=]]
::[File:individual3.png]]
+
| style="width:50%" |
| style="width:550px;" |
+
{{RcodeForTable
{{Rcode
 
 
|name=
 
|name=
 
|code=
 
|code=
Line 640: Line 498:
  
 
par(mfrow= c(1,2))
 
par(mfrow= c(1,2))
plot(f1,y,xlim=c(0,4),ylim=c(0,4), main="model 1")
+
plot(f1,y,xlim=c(0,4),ylim=c(0,4),main="model 1")
 
abline(a=0,b=1,lty=1)
 
abline(a=0,b=1,lty=1)
plot(f2,y,xlim=c(0,4),ylim=c(0,4), main="model 2")
+
plot(f2,y,xlim=c(0,4),ylim=c(0,4),main="model 2")
 
abline(a=0,b=1,lty=1)
 
abline(a=0,b=1,lty=1)
 
</pre> }}
 
</pre> }}
 
|}
 
|}
 +
  
  
 
<br>
 
<br>
 +
 
===Model selection===
 
===Model selection===
  
  
Again, ${\cal M}_2$ would seem to have a slight edge. This can be tested more analytically using the Bayesian Information Criteria (BIC):
+
Again, ${\cal M}_2$ would seem to have a slight edge. This can be tested more analytically using the [http://en.wikipedia.org/wiki/Bayesian_information_criterion Bayesian Information Criteria] (BIC):
 
 
  
{| cellpadding="5" cellspacing="10" align="left"
+
{| cellpadding="10" cellspacing="10"  
| style="width:550px;" |  
+
| style="width:50%" |  
{{Rcode
+
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
Line 666: Line 525:
 
bic2=deviance2+log(n)*length(psi2)
 
bic2=deviance2+log(n)*length(psi2)
 
</pre> }}
 
</pre> }}
| style="width:550px;" |  
+
| style="width:50%" |  
{{JustCode
+
{{JustCodeForTable
 
|code=
 
|code=
<pre style="background-color: #EFEFEF; border:none">
+
<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> cat(" bic1 =",bic1,"\n\n")
 
> cat(" bic1 =",bic1,"\n\n")
 
  bic1 = 24.10972
 
  bic1 = 24.10972
Line 679: Line 538:
  
 
A smaller BIC is better. Therefore, this also suggests that model ${\cal M}_2$ should be selected.
 
A smaller BIC is better. Therefore, this also suggests that model ${\cal M}_2$ should be selected.
 +
  
  
 
<br>
 
<br>
 +
 
===Fitting different error models===
 
===Fitting different error models===
  
For the moment, we have only considered  constant error models. However, the ``observations vs predictions'' figure seen earlier hints that the amplitude of the residual errors may increase with the size of the predicted value. Let us therefore take a closer look at four different residual error models, each of which we will associate with the ``best'' structural model $f_2$:
 
  
{| cellpadding="15" cellspacing="30" align="left" left-margin="3em" style="text-align:center;"
+
For the moment, we have only considered  constant error models. However, the "observations vs predictions" figure hints that the amplitude of the residual errors may increase with the size of the predicted value. Let us therefore take a closer look at four different residual error models, each of which we will associate with the "best" structural model $f_2$:
 +
 
 +
{| cellpadding="2" cellspacing="8" style="text-align:left; margin-left:4%"
 
|${\cal M}_2$ || Constant error model: || $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
 
|${\cal M}_2$ || Constant error model: || $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
 
|-
 
|-
Line 696: Line 558:
 
|}
 
|}
  
 +
The three new ones need to be entered into {{Verbatim|R}}:
  
The three new ones need to be entered into <span style="font-family:courier new; font-size:12pt">R</span>:
 
  
::{{Rcode
+
{{Rcode
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
fmin3=function(x,y,t)
+
fmin3=function(x,y,t){
{f=predc2(t,x)
+
  f=predc2(t,x)
g=x[4]*f
+
  g=x[4]*f
e=sum( ((y-f)/g)^2 + log(g^2))
+
  e=sum( ((y-f)/g)^2 + log(g^2))
}
+
return(e)}
  
fmin4=function(x,y,t)
+
fmin4=function(x,y,t){
{f=predc2(t,x)
+
  f=predc2(t,x)
g=abs(x[4])+abs(x[5])*f
+
  g=abs(x[4])+abs(x[5])*f
e=sum( ((y-f)/g)^2 + log(g^2))
+
  e=sum( ((y-f)/g)^2 + log(g^2))
}
+
return(e)}
  
fmin5=function(x,y,t)
+
fmin5=function(x,y,t){
{f=predc2(t,x)
+
  f=predc2(t,x)
g=x[4]
+
  g=x[4]
e=sum( ((log(y)-log(f))/g)^2 + log(g^2))
+
  e=sum( ((log(y)-log(f))/g)^2 + log(g^2))
}
+
return(e)}
 
</pre> }}
 
</pre> }}
  
Line 725: Line 587:
 
We can now compute the MLE $\hatpsi_3=(\hatphi_3,\hat{b}_3)$, $\hatpsi_4=(\hatphi_4,\hat{a}_4,\hat{b}_4)$ and $\hatpsi_5=(\hatphi_5,\hat{a}_5)$ of $\psi$  under models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$:
 
We can now compute the MLE $\hatpsi_3=(\hatphi_3,\hat{b}_3)$, $\hatpsi_4=(\hatphi_4,\hat{a}_4,\hat{b}_4)$ and $\hatpsi_5=(\hatphi_5,\hat{a}_5)$ of $\psi$  under models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$:
  
 
+
{| cellpadding="10" cellspacing="10"
{| cellpadding="5" cellspacing="10"  
+
|style="width:50%" |
|style="width:550xp" |
+
{{RcodeForTable
{{Rcode
 
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
fmin3=function(x,y,t)
 
{f=predc2(t,x)
 
g=x[4]*f
 
e=sum( ((y-f)/g)^2 + log(g^2))
 
}
 
 
fmin4=function(x,y,t)
 
{f=predc2(t,x)
 
g=abs(x[4])+abs(x[5])*f
 
e=sum( ((y-f)/g)^2 + log(g^2))
 
}
 
 
fmin5=function(x,y,t)
 
{f=predc2(t,x)
 
g=x[4]
 
e=sum( ((log(y)-log(f))/g)^2 + log(g^2))
 
}
 
 
 
#----------------  MLE  -------------------
 
#----------------  MLE  -------------------
  
pk.nlm3=nlm(fmin3, c(phi2,0.1), y, t, hessian="true")
+
pk.nlm3=nlm(fmin3, c(phi2,0.1), y, t,  
 +
      hessian="true")
 
psi3=pk.nlm3$estimate
 
psi3=pk.nlm3$estimate
  
pk.nlm4=nlm(fmin4, c(phi2,1,0.1), y, t, hessian="true")
+
pk.nlm4=nlm(fmin4, c(phi2,1,0.1), y, t,
 +
      hessian="true")
 
psi4=pk.nlm4$estimate
 
psi4=pk.nlm4$estimate
 
psi4[c(4,5)]=abs(psi4[c(4,5)])
 
psi4[c(4,5)]=abs(psi4[c(4,5)])
  
pk.nlm5=nlm(fmin5, c(phi2,0.1), y, t, hessian="true")
+
pk.nlm5=nlm(fmin5, c(phi2,0.1), y, t,
 +
      hessian="true")
 
psi5=pk.nlm5$estimate   
 
psi5=pk.nlm5$estimate   
 
</pre> }}
 
</pre> }}
|style="width:550xp" |
+
|style="width:50%" |
{{JustCode
+
{{JustCodeForTable
|code=<pre style="background-color: #EFEFEF; border:none">
+
|code=<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> cat(" psi3 =",psi3,"\n\n")
 
> cat(" psi3 =",psi3,"\n\n")
 
  psi3 = 2.642409 11.44113 0.1838779 0.2189221
 
  psi3 = 2.642409 11.44113 0.1838779 0.2189221
Line 778: Line 624:
  
 
<br>
 
<br>
 +
 
===Selecting the error model===
 
===Selecting the error model===
  
Line 783: Line 630:
  
  
{| cellspacing="10" cellpadding="5"
+
{| cellpadding="5" cellspacing="0"  
|style="width=550px"|
+
|style="width=50%"|
::[[File:individual4.png]]
+
[[File:New_Individual4.png|link=]]
|style="width=550px"|
+
|style="width=50%"|
{{Rcode
+
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
tc=seq(from=0,to=25,by=0.1)
+
phi3=psi3[c(1,2,3)]
fc1=predc1(tc,phi1)
+
fc3=predc2(tc,phi3)
fc2=predc2(tc,phi2)
+
phi4=psi4[c(1,2,3)]
 +
fc4=predc2(tc,phi4)
 +
phi5=psi5[c(1,2,3)]
 +
fc5=predc2(tc,phi5)
  
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
+
par(mfrow= c(1,1))
lines(tc,fc1, type = "l", col = "green", lwd=2)
+
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",
 +
        col = "blue")
 
lines(tc,fc2, type = "l", col = "red", lwd=2)
 
lines(tc,fc2, type = "l", col = "red", lwd=2)
 +
lines(tc,fc3, type = "l", col = "green", lwd=2)
 +
lines(tc,fc4, type = "l", col = "cyan", lwd=2)
 +
lines(tc,fc5, type = "l", col = "magenta", lwd=2)
 
abline(a=0,b=0,lty=2)
 
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","first order absorption","zero order absorption"),
+
legend(13,4,c("observations","constant error model",
lty=c(-1,1,1), pch=c(1,-1,-1), lwd=2, col=c("blue","green","red"))
+
        "proportional error model","combined error model","exponential error model"),
 +
lty=c(-1,1,1,1,1), pch=c(1,-1,-1,-1,-1), lwd=2,  
 +
        col=c("blue","red","green","cyan","magenta"))
 
</pre> }}
 
</pre> }}
|}
+
|}  
 +
 
  
 
As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$ are quite similar. We now calculate the BIC for each:
 
As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$ are quite similar. We now calculate the BIC for each:
  
{| cellspacing="10" cellpadding="5"
+
 
|style="width=550px"|
+
{| cellpadding="10" cellspacing="10"  
{{Rcode
+
|style="width=50%"|
 +
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
Line 819: Line 677:
 
bic5=deviance5 + log(n)*length(psi5)
 
bic5=deviance5 + log(n)*length(psi5)
 
</pre> }}
 
</pre> }}
|style="width=550px"|
+
|style="width=50%"|
{{JustCode
+
{{JustCodeForTable
 
|code=
 
|code=
<pre style="background-color: #EFEFEF; border:none">
+
<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> cat(" bic3 =",bic3,"\n\n")
 
> cat(" bic3 =",bic3,"\n\n")
 
  bic3 = 3.443607
 
  bic3 = 3.443607
Line 832: Line 690:
 
  bic5 = 4.108521
 
  bic5 = 4.108521
 
</pre> }}
 
</pre> }}
|}
+
|}  
  
 
All of these BIC are lower than the constant residual error one. BIC selects the residual error model ${\cal M}_3$ with a proportional component.
 
All of these BIC are lower than the constant residual error one. BIC selects the residual error model ${\cal M}_3$ with a proportional component.
Line 838: Line 696:
 
There is not a large difference between these three error models, though the proportional and combined error models give the smallest and essentially identical BIC.  We decide to use the combined error model ${\cal M}_4$ in the following (the same types of analysis could be done with the proportional error model).
 
There is not a large difference between these three error models, though the proportional and combined error models give the smallest and essentially identical BIC.  We decide to use the combined error model ${\cal M}_4$ in the following (the same types of analysis could be done with the proportional error model).
  
A 90% confidence interval for $\psi_4$ can derived from the Hessian (i.e., the square matrix of second-order partial derivatives)  of the objective function (i.e. -2 $\times \ LL$):
+
A 90% confidence interval for $\psi_4$ can derived from the Hessian (i.e., the square matrix of second-order partial derivatives)  of the objective function (i.e., -2 $\times \ LL$):
  
  
{| cellspacing="10" cellpadding="5"
+
{| cellpadding="10" cellspacing="10"  
|style="width=550px"|
+
|style="width=50%"|
{{Rcode
+
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
Line 852: Line 710:
 
H4=solve(I4)
 
H4=solve(I4)
 
s4=sqrt(diag(H4)*n/df)
 
s4=sqrt(diag(H4)*n/df)
delta4=s4*qt(0.5+ialpha/2,df)
+
delta4=s4*qt(0.5+ialpha/2, df)
 
ci4=matrix(c(psi4-delta4,psi4+delta4),ncol=2)
 
ci4=matrix(c(psi4-delta4,psi4+delta4),ncol=2)
 
</pre> }}
 
</pre> }}
|style="width=550px"|
+
|style="width=50%"|
{{JustCode
+
{{JustCodeForTable
 
|code=
 
|code=
<pre style="background-color: #EFEFEF; border:none">
+
<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> ci4
 
> ci4
 
             [,1]        [,2]
 
             [,1]        [,2]
Line 869: Line 727:
 
|}
 
|}
  
We can also calculate a $90%$ confidence interval for $f_4(t)$ using the Central Limit Theorem (see \eqref{intro_individualCLT}):
 
  
::{{Rcode
+
We can also calculate a 90% confidence interval for $f_4(t)$ using the [http://en.wikipedia.org/wiki/Central_limit_theorem Central Limit Theorem] (see [[#intro_individualCLT|(3)]]):
 +
 
 +
 
 +
{{Rcode
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
nlpredci=function(phi,f,H)
+
nlpredci=function(phi,f,H){
{
+
  dphi=length(phi)
dphi=length(phi)
+
  nf=length(f)
nf=length(f)
+
  H=H*n/(n-dphi)
H=H*n/(n-dphi)
+
  S=H[seq(1,dphi),seq(1,dphi)]
S=H[seq(1,dphi),seq(1,dphi)]
+
  G=matrix(nrow=nf,ncol=dphi)
G=matrix(nrow=nf,ncol=dphi)
+
  for (k in seq(1,dphi)) {
for (k in seq(1,dphi)) {
+
    dk=phi[k]*(1e-5)
  dk=phi[k]*(1e-5)
+
    phid=phi
  phid=phi
+
    phid[k]=phi[k] + dk
  phid[k]=phi[k] + dk
+
    fd=predc2(tc,phid)
  fd=predc2(tc,phid)
+
    G[,k]=(f-fd)/dk
  G[,k]=(f-fd)/dk
+
  }
}
+
  M=rowSums((G%*%S)*G)
M=rowSums((G%*%S)*G)
+
  deltaf=sqrt(M)*qt(0.5+alpha/2,df)
deltaf=sqrt(M)*qt(0.5+ialpha/2,df)
+
return(deltaf)}
}
 
  
 
deltafc4=nlpredci(phi4,fc4,H4)
 
deltafc4=nlpredci(phi4,fc4,H4)
Line 898: Line 757:
 
This can then be plotted:
 
This can then be plotted:
  
{| cellspacing="10" cellpadding="5"
+
 
|style="width=550px"|
+
{| cellpadding="5" cellspacing="0"  
::[[File:individual6.png]]
+
|style="width=50%"|
|style="width=550px"|
+
[[File:NewIndividual6.png|link=]]
{{Rcode
+
|style="width=50%"|
 +
{{RcodeForTable
 
|name=
 
|name=
 
|code=
 
|code=
 
<pre style="background-color: #EFEFEF; border:none">
 
<pre style="background-color: #EFEFEF; border:none">
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
+
plot(t,y,ylim=c(0,4.5), xlab="time (hour)",  
lines(tc,fc4, type = "l", col = "red", lwd=2)
+
      ylab="concentration (mg/l)", col="blue")
lines(tc,fc4-deltafc4, type = "l", col = "red", lwd=1, lty=3)
+
lines(tc,fc4, type = "l",col = "red",lwd=2)
lines(tc,fc4+deltafc4, type = "l", col = "red", lwd=1, lty=3)
+
lines(tc, fc4-deltafc4, type = "l",
 +
      col = "red" ,lwd=1, lty=3)
 +
lines(tc,fc4+deltafc4,type = "l",
 +
      col = "red", lwd=1, lty=3)
 
abline(a=0,b=0,lty=2)
 
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations","predicted concentration",
+
legend(10.5,4.5,c("observed concentrations",
 +
      "predicted concentration",  
 
       "CI for predicted concentration"),
 
       "CI for predicted concentration"),
lty=c(-1,1,3), pch=c(1,-1,-1), lwd=c(2,2,1), col=c("blue","red","red"))
+
      lty=c(-1,1,3),pch=c(1,-1,-1),lwd=c(2,2,1),
 +
      col=c("blue","red","red"))
 
</pre> }}
 
</pre> }}
|}
+
|}  
 
 
  
 
Alternatively, prediction intervals for $\hatpsi_4$, $\hat{f}_4(t;\hatpsi_4)$ and new observations for any time $t$ can be estimated by Monte Carlo simulation:
 
Alternatively, prediction intervals for $\hatpsi_4$, $\hat{f}_4(t;\hatpsi_4)$ and new observations for any time $t$ can be estimated by Monte Carlo simulation:
  
  
::{{Rcode
+
{{Rcode
 
|name=
 
|name=
 
|code=
 
|code=
Line 967: Line 831:
  
 
par(mfrow= c(1,1))
 
par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
+
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",
 +
      ylab="concentration (mg/l)",col = "blue")
 
lines(tc,fc4, type = "l", col = "red", lwd=2)
 
lines(tc,fc4, type = "l", col = "red", lwd=2)
 
lines(tc,cifc4s[,1], type = "l", col = "red", lwd=1, lty=3)
 
lines(tc,cifc4s[,1], type = "l", col = "red", lwd=1, lty=3)
Line 974: Line 839:
 
lines(tc,ciy4s[,2], type = "l", col = "green", lwd=1, lty=3)
 
lines(tc,ciy4s[,2], type = "l", col = "green", lwd=1, lty=3)
 
abline(a=0,b=0,lty=2)
 
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations","predicted concentration",
+
legend(10.5,4.5,c("observed concentrations", "predicted concentration",  
"CI for predicted concentration", "CI for observed concentrations"), lty=c(-1,1,3,3),
+
      "CI for predicted concentration", "CI for observed concentrations"),  
pch=c(1,-1,-1,-1), lwd=c(2,2,1,1), col=c("blue","red","red","green"))
+
      lty=c(-1,1,3,3), pch=c(1,-1,-1,-1), lwd=c(2,2,1,1), col=c("blue","red","red","green"))
 
</pre> }}
 
</pre> }}
  
{| cellspacing="10" cellpadding="5"
+
 
|style="width=550px"|
+
{| cellpadding="5" cellspacing="0"  
::[[File:individual7.png]]
+
|style="width=50%"|
|style="width=550px"|
+
[[File:NewIndividual7.png|link=]]
{{JustCode
+
|style="width=50%"|
 +
{{JustCodeForTable
 
|code=
 
|code=
<pre style="background-color: #EFEFEF; border:none">
+
<pre style="background-color: #EFEFEF; border:none; color:blue">
 
> ci4s
 
> ci4s
 
             [,1]        [,2]
 
             [,1]        [,2]
Line 993: Line 859:
 
[4,] 5.445459e-09  0.08819339
 
[4,] 5.445459e-09  0.08819339
 
[5,] 1.563625e-02  0.19638889
 
[5,] 1.563625e-02  0.19638889
}}
+
</pre> }}
 
|}
 
|}
  
  
 +
The R code and input data used in this section can be downloaded here: {{filepath:R_IndividualFitting.rar}}.
 
<br>
 
<br>
 +
 
==Bibliography==
 
==Bibliography==
  
  
 +
<bibtex>
 +
@book{buonaccorsi2010measurement,
 +
  title={Measurement Error: Models, Methods, and Applications},
 +
  author={Buonaccorsi, J.P.},
 +
  isbn={9781420066586},
 +
  lccn={2009048849},
 +
  series={Chapman & Hall/CRC Interdisciplinary Statistics},
 +
  url={http://books.google.fr/books?id=QVtVmaCqLHMC},
 +
  year={2010},
 +
  publisher={Taylor & Francis}
 +
}
 +
</bibtex><bibtex>
 +
@book{carroll2010measurement,
 +
  title={Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition},
 +
  author={Carroll, R.J. and Ruppert, D. and Stefanski, L.A. and Crainiceanu, C.M.},
 +
  isbn={9781420010138},
 +
  lccn={2006045485},
 +
  series={Chapman & Hall/CRC Monographs on Statistics & Applied Probability},
 +
  url={http://books.google.fr/books?id=9kBx5CPZCqkC},
 +
  year={2010},
 +
  publisher={Taylor & Francis}
 +
}
 +
</bibtex>
  
 
+
<bibtex>
 
+
@book{fitzmaurice2004applied,
 
+
  title={Applied Longitudinal Analysis},
 
+
  author={Fitzmaurice, G.M. and Laird, N.M. and Ware, J.H.},
 
+
  isbn={9780471214878},
 
+
  lccn={04040891},
 
+
  series={Wiley Series in Probability and Statistics},
 
+
  url={http://books.google.fr/books?id=gCoTIFejMgYC},
 
+
  year={2004},
 
+
  publisher={Wiley}
 
+
}
 
+
</bibtex>
 
+
<bibtex>
 
+
@book{gallant2009nonlinear,
 
+
  title={Nonlinear Statistical Models},
 
+
  author={Gallant, A.R.},
 
+
  isbn={9780470317372},
 
+
  series={Wiley Series in Probability and Statistics},
 
+
  url={http://books.google.fr/books?id=imv-NMozseEC},
 
+
  year={2009},
==Models and methods==
+
  publisher={Wiley}
 
+
}
 
+
</bibtex>
 
+
<bibtex>
 
+
@book{huet2003statistical,
 
+
  title={Statistical tools for nonlinear regression: a practical guide with S-PLUS and R examples},
===Model for continuous data===
+
  author={Huet, S. and Bouvier, A. and Poursat, M.A. and Jolivet, E.},
 
+
  year={2003},
 
+
  publisher={Springer}
+
}
A model for continuous data can be described by one of the two equations (according to the definition of a residual error model joint to the model):
+
</bibtex>
 
+
<bibtex>
 
+
@book{ritz2008nonlinear,
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
+
  title={Nonlinear regression with R},
y_{j} &= f(t_j ; \psi) + \varepsilon_j \quad ; \quad  1\leq j \leq n
+
   author={Ritz, C. and Streibig, J.C.},
\\
+
  volume={33},
&= f(t_j ; \psi) + g(t_j ; \psi) \tilde{\varepsilon_j}
+
  year={2008},
\end{align}</math></div>
+
  publisher={Springer New York}
 
+
}
 
+
</bibtex>
where:
+
<bibtex>
 
+
@book{ross1990nonlinear,
 
+
  title={Nonlinear estimation},
* $f$ corresponds to the structural model
+
  author={Ross, G.J.S.},
* $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$  is the vector of parameters
+
  isbn={9780387972787},
* $(t_1,t_2,\ldots , t_n)$ is the vector of the observation times
+
  lccn={90032797},
* $(\varepsilon_j, \varepsilon_2, \ldots, \varepsilon_n)$  are the residual errors ($\Epsilon({\varepsilon_j}) =0$)
+
  series={Springer series in statistics},
* $g$ indicates the residual error model
+
  url={http://books.google.fr/books?id=7LkyzdLMghIC},
* $(\tilde{\varepsilon_1}, \tilde{\varepsilon_2}, \ldots, \tilde{\varepsilon_n})$ are the normalized residual errors $(Var(\tilde{\varepsilon_j}) =1)$
+
  year={1990},
 
+
  publisher={Springer-Verlag}
 
+
}
 
+
</bibtex>
There are several residual error models; each model is represented by a specific formula and depends by different parameters. Hereafter there is a list of the main residual error models with the corresponding formulas:
+
<bibtex>
 
+
@book{seber2003nonlinear,
 
+
  title={Nonlinear Regression},
{| {| align=left; style="width: 600px" cellpadding="8" cellspacing="0"
+
  author={Seber, G.A.F. and Wild, C.J.},
| - constant error model ||  $y=f+a\tilde{\varepsilon}$, || $g=a$ 
+
  isbn={9780471471356},
|-
+
  lccn={88017194},
| - proportional error model || $y=f+bf\tilde{\varepsilon}$, || $g=b\, f$
+
  series={Wiley Series in Probability and Statistics},
|-
+
  url={http://books.google.fr/books?id=YBYlCpBNo\_cC},
| - combined error model 1 || $y=f+(a+bf)\tilde{\varepsilon}$, || $g=a+b f$
+
  year={2003},
|-
+
  publisher={Wiley}
| - combined error model 2 || $y=f+\sqrt{a^2+b^2f^2}\tilde{\varepsilon}$, || $g^2=a^2+b^2f^2$
 
|-
 
|- exponential error model || $\log(y)=\log(f) + a\tilde{\varepsilon}$ || $g=a$
 
|}
 
 
 
 
 
 
 
Modelling a vector of observations $(y_j)$ requires several tasks. First of all we have to estimate the vector of parameters $\psi$ for a given model. Then we need to select the structural model $f$ and the residual error model $g$ to apply to the observations. Finally the selected model should be assessed and validated.
 
<br><br><br>
 
 
 
===Maximum likelihood estimation===
 
 
 
 
 
The maximum likelihood estimation of the parameters can be computed using the formula: $\hat{\psi}$ maximizes $L(\psi ; y_1,y_2,\ldots,y_j)$, where
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
L(\psi ; y_1,y_2,\ldots,y_j) {\overset{def}{=}} p_Y( y_1,y_2,\ldots,y_j ; \psi)
 
</math></div>
 
 
 
 
 
If we assume that $\bar{\varepsilon_i} \sim_{i.i.d} {\cal N}(0,1)$, then, the $y_i$'s are independent and
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
y_{j} \sim {\cal N}(f(t_j ; \psi) , g(t_j ; \psi)^2)
 
</math></div>
 
 
 
 
 
As a consequence, the ''p.d.f'' of $(y_1, y_2, \ldots y_n)$ can  be computed via the equations:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
 
p_Y(y_1, y_2, \ldots y_n ; \psi) &= \prod_{j=1}^n p_{Y_j}(y_j ; \psi) \\ \\
 
&= \frac{e^{- \displaystyle{\frac{1}{2}} \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} \right)^2}}{\prod_{j=1}^n \sqrt{2\pi g(t_j ; \psi)}}
 
\end{align}</math></div>
 
 
 
 
 
 
 
Maximizing the likelihood is equivalent to minimizing the deviance (-2 $\times$ log-likelihood) which plays here the role of the objective function:
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
 
\hat{\psi} = \argmin_{\psi} \left\{  \sum_{j=1}^n \log(g(t_j ; \psi)^2) + \sum_{j=1}^n \left( \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi) }\right)^2 \right \}  
 
\end{align}</math></div>
 
 
 
 
 
and the deviance is therefore
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
 
-2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_j) = \sum_{j=1}^n \log(g(t_j ; \hat{\psi})^2)  + \sum_{j=1}^n \left(\frac{y_j - f(t_j ; \hat{\psi})}{g(t_j ; \hat{\psi})}\right)^2 +n\log(2\pi)
 
\end{align}</math></div>
 
 
 
 
 
This minimization problem usually does not have an analytical solution for a non linear model and the optimization procedures should be used.
 
However, some specific models have specific solutions. For instance, in case of a ''constant error model'' described by the generic equation:
 
 
 
::<div style="text-align: left;font-size: 12pt"><math> y_{j} = f(t_j ; \phi)  + a \, \bar{\varepsilon_j}</math></div>
 
 
 
 
 
we have
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
 
\hat{\phi} &= \argmin_{\psi}  \sum_{j=1}^n \left( y_j - f(t_j ; \phi)\right)^2 \\ \\
 
\hat{a}&= \frac{1}{n}\sum_{j=1}^n \left( y_j - f(t_j ; \hat{\phi})\right)^2 \\ \\
 
-2 LL(\hat{\psi} ; y_1,y_2,\ldots,y_j) &=  \sum_{j=1}^n \log(\hat{a}^2) + n +n\log(2\pi)
 
\end{align}</math></div>
 
 
 
 
 
 
 
In case of a ''linear error model'' represented by
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
y_{j} = F_j \, \phi  + a \, \bar{\varepsilon_j} \quad ; \quad  1\leq j \leq n
 
</math></div>
 
 
 
 
 
Let
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
F = \left(\begin{array}{c} F_1 \\ F_2 \\ \vdots \\F_n \\ \end{array} \right)
 
\quad ; \quad
 
y = \left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\y_n \\ \end{array} \right)    
 
</math></div>
 
 
 
 
 
Then the solution is:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
 
\hat{\phi} &= (F^\prime F)^{-1} F^\prime y \\
 
\hat{a}&= \frac{1}{n}\sum_{j=1}^n \left( y_j - F \hat{\phi})\right)^2
 
\end{align}</math></div>
 
<br><br><br>
 
 
 
===Computing the Fisher Information Matrix===
 
 
 
Let $\psis$ be the true unknown value of $\psi$, and let $\hat{\psi}$ be the maximum likelihood estimate of $\psi$.
 
If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
I_n(\psis)^{\frac{1}{2}}(\hat{\psi}-\psi{^\star}) \limite{n\to \infty}{} {\mathcal N}(0,{\rm Id})
 
</math></div>
 
 
 
 
 
where
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
I_n(\psi{^\star})= - \DDt LL(\psis;y_1,y_2,\ldots,y_n)
 
</math></div>
 
 
 
 
 
is the observed Fisher information matrix. Thus, an estimate of the covariance of $\hpsi$ is the inverse of the observed Fisher information matrix as expressed by the formula:
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
C(\hpsi) = \left(- \DDt LL(\hpsi ; y_1,y_2,\ldots,y_n) \right)^{-1}
 
</math></div>
 
 
 
 
 
 
 
 
 
===Deriving confidence intervals for the parameters===
 
 
 
Let $\psi_k$ be the $k$th component of $\psi$, with $k=1,2,\ldots,d$.
 
$\psi_k$ is estimated by $\hpsi_k$, the $k$th component of $\hpsi$, that is a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$.
 
 
 
An estimator of its variance is the $k$th element of the diagonal of the covariance matrix $C(\hpsi)$:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
\widehat{\rm Var}(\hpsi_k) = C_{kk}(\hpsi)
 
</math></div>
 
 
 
 
 
We can then derive an estimator of its standard error
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
\widehat{\rm s.e}(\hpsi_k) = \sqrt{C_{kk}(\hpsi)}
 
</math></div>
 
 
 
 
 
and a confidence interval for $\psi_k^\star$ constructed at a confidence level $\alpha$:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
{\rm CI}(\psi_k) = [\hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1-\alpha)/2) , \hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1+\alpha)/2)]
 
</math></div>
 
 
 
 
 
where $q(\alpha)$ is the quantile of order $\alpha$ of a ${\cal N}(0,1)$ distribution.
 
 
 
 
 
<u>''Remarks:''</u> <br>
 
:The normal distribution for $\hpsi/\widehat{\rm s.e}(\hpsi_k)$ is a "good" approximation only when the number of observations $n$ is large.
 
:Better approximation should be used for small $n$.
 
:In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, :where $d_\phi$ is the size of $\phi$.
 
:The quantiles of the normal distribution can then be replaced by the quantiles of a $t$-distribution with $(n-d_\phi)$ df.
 
<!--$${\rm CI}(\psi_k) = [\hpsi_k - \widehat{\rm s.e}(\hpsi_k)q((1-\alpha)/2,n-d) , \hpsi_k + \widehat{\rm s.e}(\hpsi_k)q((1+\alpha)/2,n-d)]$$-->
 
<!--where $q(\alpha,\nu)$ is the quantile of order $\alpha$ of a $t$-distribution with $\nu$ degrees of freedom.-->
 
 
 
===Deriving confidence intervals for the predictions===
 
 
 
The regression model (or structural model) can be predicted for any $t$ using the following expression:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
\hat{f}(t,\phi) = f(t ; \hphi)
 
</math></div>
 
 
 
 
 
It is then possible for any $t$ to derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hphi$.
 
Indeed, as a first approximation, we have:
 
 
 
<!-- ::<div style="text-align: left;font-size: 11pt"><math> -->
 
<!-- f(t ; \hphi) \simeq f(t ; \phi) + J_f(\phi) (\hphi - \phi) -->
 
<!-- </math></div> -->
 
<!-- where $J_f(t,\phi)$ is the Jacobian matrix of $f$, ''i.e.'' the matrix of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$ (the $k$th row of $J_f(t,\phi)$ is ) -->
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
f(t ; \hphi) \simeq f(t ; \phis) + \nabla f (t,\phis) (\hphi - \phis)
 
</math></div>
 
 
 
 
 
where $\nabla f(t,\phis)$ is the gradient of $f$, ''i.e.'' the vector of all first-order partial derivatives of $f(t,\phi)$ with respect to the components of $\phi$, obtained with $\phi=\phis$.
 
 
 
 
 
$\nabla f(t,\phis)$ can be estimated by $\nabla f(t,\hphi)$. We can then estimate the variance of $f(t ; \hphi)$ by
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
\widehat{\rm Var}\left(f(t ; \hphi)\right) \simeq \nabla f (t,\hphi)\widehat{\rm Var}(\hphi) \left(\nabla f (t,\hphi) \right)^\prime
 
</math></div>
 
<br><br>
 
 
 
===Estimating confidence intervals by Monte Carlo===
 
 
 
The utilization of Monte Carlo methods to estimate a distribution does not require any approximation of the corresponding model.
 
 
 
We can easily estimate the distribution of $\hpsi$ by simulating a large number $M$ of independent vectors of observations $(y^{(m)},1\leq m \leq M)$  using the predicted distribution of the observed vector $y=(y_j)$. The result obtained is accurate:
 
 
 
 
 
::<div style="text-align: left;font-size: 11pt"><math>
 
y^{(m)}_j = f(t_j ;\hpsi) + g(t_j ;\hpsi)\teps^{(m)}_j
 
</math></div>
 
 
 
 
 
here $\teps^{(m)}_j  \sim_{i.i.d.} {\cal N}(0,1)$.
 
 
 
We can then compute the MLE of $\psi$ for each of these replicates to derive an empirical estimation of the distribution of $\hpsi$.
 
In particular, we can estimate any quantile of the distribution of $\hpsi_k$ using the empirical quantiles of $(\hpsi^{(m)}_k ,1\leq m \leq M)$.
 
 
 
Any confidence interval for $\psi_k$ (resp. $f(t,\psi_k)$) can then be approximated by a prediction interval for $\hpsi_k$ (resp. $f(t,\hpsi_k)$) .
 
 
 
==A PK  example==
 
 
 
 
 
 
 
Let us consider a dose $D=50$ mg of a drug that is administrated orally to a patient at time 0. The concentrations of the drug are measured at different times (0.5, 1.0,  1.5,  2.0,  3.0,  4.0,  8.0, 10.0, 12.0, 16.0, 20.0, 24.0).
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
[[File:individual1.png]]
 
| style="width: 600px;" |
 
<pre style="background-color:#EFEFEF; font-family:'courier new';font-size:10pt;  border: 1px solid darkgray; border-radius:1em;">
 
pk1=read.table("example1_data.txt",header=T)
 
t=pk1$time 
 
y=pk1$concentration
 
plot(t,y,xlab="time(hour)",ylab="concentration(mg/l)",col="blue")  </pre>
 
|}
 
 
 
 
 
There are two structural models that we can consider in order to evaluate the effects of such administration:
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
: 1. One compartment model, first order absorption and linear elimination
 
 
 
 
 
::<div style="font-size: 11pt"><math>\begin{align}
 
\phi_1 &= (k_a, V, k_e) \\
 
f_1(t ; \phi_1) &= \frac{D\, k_a}{V(k_a-k_e)} \left( e^{-k_e \, t} - e^{-k_a \, t}\right) \end{align}
 
</math></div>
 
 
 
 
 
 
 
: 2. One compartment model, zero order absorption and linear elimination
 
 
 
 
 
::<div style="font-size: 11pt"><math>\begin{align}
 
\phi_2 &= (T_{k0}, V, k_e) \\
 
f_2(t ; \phi_2) &= \left\{  \begin{array}{ll}
 
                        \frac{D}{V \,T_{k0} \, k_e} \left( 1- e^{-k_e \, t} \right) & {\rm if } \ t\leq T_{k0} \\[0.3cm]
 
                        \frac{D}{V \,T_{k0} \, k_e} \left( 1- e^{-k_e \, T_{k0}} \right)e^{-k_e \, (t- T_{k0})} & {\rm otherwise}
 
                        \end{array}
 
\right.
 
\end{align}</math></div>
 
| style="width: 600px;" |
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
predc1=function(t,x){
 
A=50*x[1]/x[2]/(x[1]-x[3])
 
f=A*(exp(-x[3]*t)-exp(-x[1]*t))
 
 
}
 
}
 
+
</bibtex><bibtex>
predc2=function(t,x){  
+
@article{serroyen2009nonlinear,
A=50/x[1]/x[2]/x[3]
+
  title={Nonlinear models for longitudinal data},
ff=A*(1-exp(-x[3]*t))
+
  author={Serroyen, J. and Molenberghs, G. and Verbeke, G. and Davidian, M. },
ff[t>x[1]]=A*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t>x[1]]-x[1]))
+
  journal={The American Statistician},
f=ff
+
  volume={63},
 +
  number={4},
 +
  pages={378-388},
 +
  year={2009},
 +
  publisher={Taylor & Francis}
 
}
 
}
</pre>  
+
</bibtex>
|}
+
<bibtex>
 
+
@book{wolberg2006data,
 
+
  title={Data analysis using the method of least squares: extracting the most information from experiments},
 
+
  author={Wolberg, J.R.},
 
+
   volume={1},
We define then the models ${\cal M}_1$ and ${\cal M}_2$ by assuming  constant error models:
+
  year={2006},
::<div style="text-align: left;font-size: 11pt"><math>\begin{align}
+
  publisher={Springer Berlin, Germany}
{\cal M}_1  : \quad y_j & = & f_1(t_j ; \phi_1) + a_1\teps_j \\
 
{\cal M}_2  : \quad y_j & = & f_2(t_j ; \phi_2) + a_2\teps_j
 
\end{align}</math></div>
 
 
 
 
 
We can fit these two models to our data by computing $\hpsi_1=(\hphi_1,\hat{a}_1)$ and $\hpsi_2=(\hphi_2,\hat{a}_2)$, the MLEs of $\psi$  under models ${\cal M}_1$ and ${\cal M}_2$.
 
 
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>>psi1
 
[1]    0.3240916 6.001204 0.3239337 0.4366948
 
 
 
>>psi2
 
[1]    3.203111 8.999746 0.229977 0.2555242
 
</pre>
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
fmin1=function(x,y,t)
 
{f=predc1(t,x)
 
g=x[4]
 
e=sum( ((y-f)/g)^2 + log(g^2))}
 
 
 
fmin2=function(x,y,t)
 
{f=predc2(t,x)
 
g=x[4]
 
e=sum( ((y-f)/g)^2 + log(g^2))}
 
 
 
pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian="true")
 
psi1=pk.nlm1$estimate
 
 
 
pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian="true")
 
psi2=pk.nlm2$estimate
 
</pre>
 
|}
 
 
 
 
 
Therefore, the estimated parameters $\hphi_1$ and $\hphi_2$ can be used for computing predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models and for any time $t$.
 
We clearly see here that a much better fit is obtained with model ${\cal M}_2$, ''i.e.'' assuming a zero order absorption process.
 
 
 
 
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
[[File:individual2.png]]
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
tc=seq(from=0,to=25,by=0.1)
 
fc1=predc1(tc,phi1)
 
fc2=predc2(tc,phi2)
 
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",col="blue")
 
lines(tc,fc1,type="l",col="green",lwd=2)
 
lines(tc,fc2,type="l",col="red",lwd=2)
 
abline(a=0,b=0,lty=2)
 
legend(13,4,c("observations","first order absorption","zero order absorption"),
 
      lty=c(-1,1,1),pch=c(1,-1,-1),lwd=2,col=c("blue","green","red"))
 
</pre>
 
|}
 
 
 
 
 
Another useful goodness of fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hpsi)$ given by the model:
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
[[File:individual3.png]]
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
pk.nlm1=nlm(fmin1,c(0.3,6,0.2,1),y,t,hessian="true")
 
f2=predc2(t,phi2)
 
par(mfrow= c(1,2))
 
plot(f1,y,xlim=c(0,4),ylim=c(0,4),main="model 1")
 
abline(a=0,b=1,lty=1)
 
plot(f2,y,xlim=c(0,4),ylim=c(0,4),main="model 2")
 
abline(a=0,b=1,lty=1)</pre>
 
|}
 
 
 
 
 
As we can see in the following paragraph, the Bayesian Information Criteria confirm the zero order absorption process should be selected.
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>>bic1
 
[1]    24.10972
 
 
 
>>bic2
 
[1]    11.24769
 
</pre>
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
deviance1=pk.nlm1$minimum + n*log(2*pi)
 
bic1=deviance1+log(n)*length(psi1)
 
deviance2=pk.nlm2$minimum + n*log(2*pi)
 
bic2=deviance2+log(n)*length(psi2)
 
</pre>
 
|}
 
 
 
 
 
 
 
 
 
 
 
 
 
We have only considered for the moment constant error models. Nevertheless, the graphic "observations vs predictions" seems to indicate that the amplitude of the residual errors increase with the prediction. We will then consider four different residual error models associated with the same structural model $f_2$.
 
 
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
{| align=left; cellpadding="3"
 
| ${\cal M}_2$, || constant error model:    ||  $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
 
|-
 
| ${\cal M}_3$, ||    proportional error model: || $y_j=f_2(t_j;\phi_3)+b_3f_2(t_j;\phi_3)\teps_j$
 
|-
 
| ${\cal M}_4$, ||   combined error model:    || $y_j=f_2(t_j;\phi_4)+(a_4+b_4f_2(t_j;\phi_4))\teps_j$
 
|-
 
|${\cal M}_5$, ||  exponential error model:  || $\log(y_j)=\log(f_2(t_j;\phi_5)) + a_5\teps_j$
 
|}
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
fmin3=function(x,y,t)
 
{f=predc2(t,x)
 
g=x[4]*f
 
e=sum( ((y-f)/g)^2 + log(g^2))}
 
 
 
fmin4=function(x,y,t)
 
{f=predc2(t,x)
 
g=abs(x[4])+abs(x[5])*f
 
e=sum( ((y-f)/g)^2 + log(g^2))}
 
 
 
fmin5=function(x,y,t)
 
{f=predc2(t,x)
 
g=x[4]
 
e=sum( ((log(y)-log(f))/g)^2 + log(g^2))}
 
</pre>
 
|}
 
 
 
 
 
We can now compute $\hpsi_3=(\hphi_3,\hat{b}_3)$, $\hpsi_4=(\hphi_4,\hat{a}_4,,\hat{b}_4)$ and $\hpsi_5=(\hphi_5,\hat{a}_5)$, the MLEs of $\psi$  under models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$.
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>>psi3
 
[1]    2.642409 11.44113 0.1838779 0.2189221
 
 
 
>>psi4
 
[1]    2.890066 10.16836 0.2068221 0.02741416 0.1456332
 
 
 
>>psi5
 
[1]    2.710984 11.2744 0.188901 0.2310001
 
</pre>
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
pk.nlm3=nlm(fmin3,c(phi2,0.1),y,t,hessian="true")
 
psi3=pk.nlm3$estimate
 
pk.nlm4=nlm(fmin4,c(phi2,1,0.1),y,t,hessian="true")
 
psi4=pk.nlm4$estimate
 
psi4[c(4,5)]=abs(psi4[c(4,5)])
 
pk.nlm5=nlm(fmin5, c(phi2,0.1),y,t,hessian="true")
 
psi5=pk.nlm5$estimate
 
</pre>
 
|}
 
 
 
As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal M}_4$  and ${\cal M}_5$ are very similar
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
[[File:individual4.png]]
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
tc=seq(from=0,to=25,by=0.1)
 
fc1=predc1(tc,phi1)
 
fc2=predc2(tc,phi2)
 
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",
 
    ylab="concentration (mg/l)",col = "blue")
 
lines(tc,fc1, type = "l", col = "green", lwd=2)
 
lines(tc,fc2, type = "l", col = "red", lwd=2)
 
abline(a=0,b=0,lty=2)
 
legend(13,4,c("observations","first order absorption","zero order absorption"),
 
      lty=c(-1,1,1), pch=c(1,-1,-1),lwd=2, col=c("blue","green","red"))
 
</pre>
 
|}
 
 
 
BIC confirms that a residual error model including a proportional component should be selected.
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>> bic3
 
[1]    3.443607
 
 
 
>> bic4
 
[1]    3.475841
 
 
 
>> bic5
 
[1]    4.108521
 
</pre>
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
deviance3=pk.nlm3$minimum + n*log(2*pi)
 
bic3=deviance3 + log(n)*length(psi3)
 
deviance4=pk.nlm4$minimum + n*log(2*pi)
 
bic4=deviance4 + log(n)*length(psi4)
 
deviance5=pk.nlm5$minimum + 2*sum(log(y)) + n*log(2*pi)
 
bic5=deviance5 + log(n)*length(psi5)
 
</pre>
 
|}
 
 
 
There is no obvious difference between these three error models. We will use the combined error model ${\cal M}_4$ in the following.
 
 
 
A 90% confidence interval for $\psi_4$ can derived from the Hessian matrix of the objective function.
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>>ci4
 
            [,1]        [,2]
 
[1,]  2.22576690  3.55436561
 
[2,]  7.93442421 12.40228967
 
[3,]  0.16628224  0.24736196
 
[4,] -0.02444571  0.07927403
 
[5,]  0.04119983  0.25006660
 
</pre>
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
alpha=0.9
 
df=n-length(phi4)
 
I4=pk.nlm4$hessian/2
 
H4=solve(I4)
 
s4=sqrt(diag(H4)*n/df)
 
delta4=s4*qt(0.5+alpha/2,df)
 
ci4=matrix(c(psi4-delta4,psi4+delta4),ncol=2)
 
</pre>
 
|}
 
 
 
as well as a confidence interval for $f_4(t)$ using the Central Limit Theorem
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
[[File:individual6.png]]
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
nlpredci=function(phi,f,H)
 
{dphi=length(phi)
 
nf=length(f)
 
H=H*n/(n-dphi)
 
S=H[seq(1,dphi),seq(1,dphi)]
 
G=matrix(nrow=nf,ncol=dphi)
 
for (k in seq(1,dphi)) {
 
  dk=phi[k]*(1e-5)
 
  phid=phi
 
  phid[k]=phi[k] + dk
 
  fd=predc2(tc,phid)
 
  G[,k]=(f-fd)/dk }
 
M=rowSums((G%*%S)*G)
 
deltaf=sqrt(M)*qt(0.5+alpha/2,df)}
 
 
 
deltafc4=nlpredci(phi4,fc4,H4)
 
 
 
par(mfrow= c(1,1))
 
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
 
lines(tc,fc4, type = "l", col = "red", lwd=2)
 
lines(tc,fc4-deltafc4, type = "l", col = "red", lwd=1, lty=3)
 
lines(tc,fc4+deltafc4, type = "l", col = "red", lwd=1, lty=3)
 
abline(a=0,b=0,lty=2)
 
legend(10.5,4.5,c("observed concentrations","predicted concentration","IC for predicted concentration"),
 
      lty=c(-1,1,3), pch=c(1,-1,-1), lwd=c(2,2,1), col=c("blue","red","red"))
 
</pre>
 
|}
 
 
 
Alternatively, prediction intervals for $\hpsi_4$, $\hat{f}_4(t;\hpsi_4)$ and new observations at any time can  be estimated by Monte-Carlo
 
 
 
{| cellpadding="5" cellspacing="5"
 
| style="width: 600px;" |
 
<pre style="font-family:'courier new';font-size:11pt; color:blue;">
 
>>ci4s
 
            [,1]        [,2]
 
[1,] 2.412131e+00  3.54008925
 
[2,] 8.554283e+00 11.89304332
 
[3,] 1.829453e-01  0.23922393
 
[4,] 1.558052e-08  0.08006378
 
[5,] 1.233954e-02  0.19552977
 
</pre>
 
[[ File:individual7.png]]
 
|style="width: 600px;"|
 
<pre style=" background-color:#EFEFEF; font-family:'courier new';font-size:10pt;border: 1px solid darkgray; border-radius:1em;">
 
f=predc2(t,phi4)
 
a4=psi4[4]
 
b4=psi4[5]
 
g=a4+b4*f
 
dpsi=length(psi4)
 
nc=length(tc)
 
N=1000
 
qalpha=c(0.5 - alpha/2,0.5 + alpha/2)
 
PSI=matrix(nrow=N,ncol=dpsi)
 
FC=matrix(nrow=N,ncol=nc)
 
Y=matrix(nrow=N,ncol=nc)
 
for (k in seq(1,N)) {
 
  eps=rnorm(n)
 
  ys=f+g*eps
 
  pk.nlm=nlm(fmin4, psi4, ys, t)
 
  psie=pk.nlm$estimate
 
  psie[c(4,5)]=abs(psie[c(4,5)])
 
  PSI[k,]=psie
 
  fce=predc2(tc,psie[c(1,2,3)])
 
  FC[k,]=fce
 
  gce=a4+b4*fce
 
  Y[k,]=fce + gce*rnorm(1)
 
 
}
 
}
 
+
</bibtex>
ci4s=matrix(nrow=dpsi,ncol=2)
 
for (k in seq(1,dpsi)){
 
  ci4s[k,]=quantile(PSI[,k],qalpha,names=FALSE)}
 
 
 
cifc4s=matrix(nrow=nc,ncol=2)
 
for (k in seq(1,nc)){
 
  cifc4s[k,]=quantile(FC[,k],qalpha,names=FALSE)}
 
 
 
ciy4s=matrix(nrow=nc,ncol=2)
 
for (k in seq(1,nc)){
 
  ciy4s[k,]=quantile(Y[,k],qalpha,names=FALSE)}
 
par(mfrow= c(1,1))
 
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",ylab="concentration (mg/l)",col = "blue")
 
lines(tc,fc4, type = "l", col = "red", lwd=2)
 
lines(tc,cifc4s[,1], type = "l", col = "red", lwd=1, lty=3)
 
lines(tc,cifc4s[,2], type = "l", col = "red", lwd=1, lty=3)
 
lines(tc,ciy4s[,1], type = "l", col = "green", lwd=1, lty=3)
 
lines(tc,ciy4s[,2], type = "l", col = "green", lwd=1, lty=3)
 
abline(a=0,b=0,lty=2)
 
legend(10.5,4.5,c("observed concentrations","predicted concentration","IC for predicted concentration",
 
      "IC for observed concentrations"),lty=c(-1,1,3,3), pch=c(1,-1,-1,-1), lwd=c(2,2,1,1),
 
      col=c("blue","red","red","green"))
 
</pre>  
 
|}
 
 
 
  
  
The R code and the input data used in this section to illustrate a basic PK example can be downloaded using the following link : {{filepath:R_IndividualFitting.rar}}.
+
{{Back&Next
 +
|linkBack=Overview
 +
|linkNext=What is a model? A joint probability distribution! }}

Latest revision as of 14:58, 28 August 2013

Overview

Before we start looking at modeling a whole population at the same time, we are going to consider only one individual from that population. Much of the basic methodology for modeling one individual follows through to population modeling. We will see that when stepping up from one individual to a population, the difference is that some parameters shared by individuals are considered to be drawn from a probability distribution.

Let us begin with a simple example. An individual receives 100mg of a drug at time $t=0$. At that time and then every hour for fifteen hours, the concentration of a marker in the bloodstream is measured and plotted against time:

New Individual1.png

We aim to find a mathematical model to describe what we see in the figure. The eventual goal is then to extend this approach to the simultaneous modeling of a whole population.



Model and methods for the individual approach


Defining a model

In our example, the concentration is a continuous variable, so we will try to use continuous functions to model it. Different types of data (e.g., count data, categorical data, time-to-event data, etc.) require different types of models. All of these data types will be considered in due time, but for now let us concentrate on a continuous data model.

A model for continuous data can be represented mathematically as follows:

\( y_{j} = f(t_j ; \psi) + e_j, \quad \quad 1\leq j \leq n, \)

where:


  • $f$ is called the structural model. It corresponds to the basic type of curve we suspect the data is following, e.g., linear, logarithmic, exponential, etc. Sometimes, a model of the associated biological processes leads to equations that define the curve's shape.
  • $(t_1,t_2,\ldots , t_n)$ is the vector of observation times. Here, $t_1 = 0$ hours and $t_n = t_{16} = 15$ hours.
  • $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$ is a vector of $d$ parameters that influences the value of $f$.
  • $(e_1, e_2, \ldots, e_n)$ are called the residual errors. Usually, we suppose that they come from some centered probability distribution: $\esp{e_j} =0$.


In fact, we usually state a continuous data model in a slightly more flexible way:

\( y_{j} = f(t_j ; \psi) + g(t_j ; \psi)\teps_j , \quad \quad 1\leq j \leq n, \)
(1)

where now:


    • $g$ is called the residual error model. It may be a function of the time $t_j$ and parameters $\psi$.
    • $(\teps_1, \teps_2, \ldots, \teps_n)$ are the normalized residual errors. We suppose that these come from a probability distribution which is centered and has unit variance: $\esp{\teps_j} = 0$ and $\var{\teps_j} =1$.


Choosing a residual error model

The choice of a residual error model $g$ is very flexible, and allows us to account for many different hypotheses we may have on the error's distribution. Let $f_j=f(t_j;\psi)$. Here are some simple error models.


    • Constant error model: $g=a$. That is, $y_j=f_j+a\teps_j$.
    • Proportional error model: $g=b\,f$. That is, $y_j=f_j+bf_j\teps_j$. This is for when we think the magnitude of the error is proportional to the value of the predicted value $f$.
    • Combined error model: $g=a+b f$. Here, $y_j=f_j+(a+bf_j)\teps_j$.
    • Alternative combined error model: $g^2=a^2+b^2f^2$. Here, $y_j=f_j+\sqrt{a^2+b^2f_j^2}\teps_j$.
    • Exponential error model: here, the model is instead $\log(y_j)=\log(f_j) + a\teps_j$, that is, $g=a$. It is exponential in the sense that if we exponentiate, we end up with $y_j = f_j e^{a\teps_j}$.



Tasks

To model a vector of observations $y = (y_j,\, 1\leq j \leq n$) we must perform several tasks:

    • Select a structural model $f$ and a residual error model $g$.
    • Estimate the model's parameters $\psi$.
    • Assess and validate the selected model.



Selecting structural and residual error models

As we are interested in parametric modeling, we must choose parametric structural and residual error models. In the absence of biological (or other) information, we might suggest possible structural models just by looking at the graphs of time-evolution of the data. For example, if $y_j$ is increasing with time, we might suggest an affine, quadratic or logarithmic model, depending on the approximate trend of the data. If $y_j$ is instead decreasing ever slower to zero, an exponential model might be appropriate.

However, often we have biological (or other) information to help us make our choice. For instance, if we have a system of differential equations describing how the drug is eliminated from the body, its solution may provide the formula (i.e., structural model) we are looking for.

As for the residual error model, if it is not immediately obvious which one to choose, several can be tested in conjunction with one or several possible structural models. After parameter estimation, each structural and residual error model pair can be assessed, compared against the others, and/or validated in various ways.

Now we can have a first look at parameter estimation, and further on, model assessment and validation.



Parameter estimation

Given the observed data and the choice of a parametric model to describe it, our goal becomes to find the "best" parameters for the model. A traditional framework to solve this kind of problem is called maximum likelihood estimation or MLE, in which the "most likely" parameters are found, given the data that was observed.

The likelihood $L$ is a function defined as:

\( L(\psi ; y_1,y_2,\ldots,y_n) \ \ \eqdef \ \ \py( y_1,y_2,\ldots,y_n; \psi) , \)

i.e., the conditional joint density function of $(y_j)$ given the parameters $\psi$, but looked at as if the data are known and the parameters not. The $\hat{\psi}$ which maximizes $L$ is known as the maximum likelihood estimator.

Suppose that we have chosen a structural model $f$ and residual error model $g$. If we assume for instance that $ \teps_j \sim_{i.i.d} {\cal N}(0,1)$, then the $y_j$ are independent of each other and (1) means that:

\( y_{j} \sim {\cal N}\left(f(t_j ; \psi) , g(t_j ; \psi)^2\right), \quad \quad 1\leq j \leq n .\)

Due to this independence, the pdf of $y = (y_1, y_2, \ldots, y_n)$ is the product of the pdfs of each $y_j$:

\(\begin{eqnarray} \py(y_1, y_2, \ldots y_n ; \psi) &=& \prod_{j=1}^n \pyj(y_j ; \psi) \\ \\ & = & \frac{1}{\prod_{j=1}^n \sqrt{2\pi} g(t_j ; \psi)} \ {\rm exp}\left\{-\frac{1}{2} \sum_{j=1}^n \left( \displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2\right\} . \end{eqnarray}\)

This is the same thing as the likelihood function $L$ when seen as a function of $\psi$. Maximizing $L$ is equivalent to minimizing the deviance, i.e., -2 $\times$ the $\log$-likelihood ($LL$):

\(\begin{eqnarray} \hat{\psi} &=& \argmin{\psi} \left\{ -2 \,LL \right\}\\ &=& \argmin{\psi} \left\{ \sum_{j=1}^n \log\left(g(t_j ; \psi)^2\right) + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2 \right\} . \end{eqnarray}\)
(2)


This minimization problem does not usually have an analytical solution for nonlinear models, so an optimization procedure needs to be used. However, for a few specific models, analytical solutions do exist.

For instance, suppose we have a constant error model: $y_{j} = f(t_j ; \psi) + a \, \teps_j,\,\, 1\leq j \leq n,$ that is: $g(t_j;\psi) = a$. In practice, $f$ is not itself a function of $a$, so we can write $\psi = (\phi,a)$ and therefore: $y_{j} = f(t_j ; \phi) + a \, \teps_j.$ Thus, (2) simplifies to:

\( (\hat{\phi},\hat{a}) \ \ = \ \ \argmin{(\phi,a)} \left\{ n \log(a^2) + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \phi)}{a} }\right)^2 \right\} . \)

The solution is then:

\(\begin{eqnarray} \hat{\phi} &=& \argmin{\phi} \sum_{j=1}^n \left( y_j - f(t_j ; \phi)\right)^2 \\ \hat{a}^2&=& \frac{1}{n}\sum_{j=1}^n \left( y_j - f(t_j ; \hat{\phi})\right)^2 , \end{eqnarray} \)

where $\hat{a}^2$ is found by setting the partial derivative of $-2LL$ to zero.

Whether this has an analytical solution or not depends on the form of $f$. For example, if $f(t_j;\phi)$ is just a linear function of the components of the vector $\phi$, we can represent it as a matrix $F$ whose $j$th row gives the coefficients at time $t_j$. Therefore, we have the matrix equation $y = F \phi + a \teps$.

The solution for $\hat{\phi}$ is thus the least-squares one, and for $\hat{a}^2$ it is the same as before:

\(\begin{eqnarray} \hat{\phi} &=& (F^\prime F)^{-1} F^\prime y \\ \hat{a}^2&=& \frac{1}{n}\sum_{j=1}^n \left( y_j - F_j \hat{\phi}\right)^2 . \\ \end{eqnarray}\)



Computing the Fisher information matrix

The Fisher information is a way of measuring the amount of information that an observable random variable carries about an unknown parameter upon which its probability distribution depends.

Let $\psis $ be the true unknown value of $\psi$, and let $\hatpsi$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and

\( I_n(\psis)^{\frac{1}{2} }(\hatpsi-\psis) \limite{n\to \infty}{} {\mathcal N}(0,\id) , \)
(3)

where $I_n(\psis)$ is (minus) the Hessian (i.e., the matrix of the second derivatives) of the log-likelihood:

\(I_n(\psis)=- \displaystyle{ \frac{\partial^2}{\partial \psi \partial \psi^\prime} } LL(\psis;y_1,y_2,\ldots,y_n) \)

is the observed Fisher information matrix. Here, "observed" means that it is a function of observed variables $y_1,y_2,\ldots,y_n$.

Thus, an estimate of the covariance of $\hatpsi$ is the inverse of the observed Fisher information matrix as expressed by the formula:

\(C(\hatpsi) = - I_n(\hatpsi)^{-1} . \)



Deriving confidence intervals for parameters

Let $\psi_k$ be the $k$th of $d$ components of $\psi$. Imagine that we have estimated $\psi_k$ with $\hatpsi_k$, the $k$th component of the MLE $\hatpsi$, that is, a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$ under very general conditions.

An estimator of its variance is the $k$th element of the diagonal of the covariance matrix $C(\hatpsi)$:

\(\widehat{\rm Var}(\hatpsi_k) = C_{kk}(\hatpsi) .\)

We can thus derive an estimator of its standard error:

\(\widehat{\rm s.e.}(\hatpsi_k) = \sqrt{C_{kk}(\hatpsi)} ,\)

and a confidence interval of level $1-\alpha$ for $\psi_k^\star$:

\({\rm CI}(\psi_k^\star) = \left[\hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(\frac{\alpha}{2}\right), \ \hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(1-\frac{\alpha}{2}\right)\right] , \)

where $q(w)$ is the quantile of order $w$ of a ${\cal N}(0,1)$ distribution.


Remarks

Approximating the fraction $\hatpsi/\widehat{\rm s.e}(\hatpsi_k)$ by the normal distribution is a "good" approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-squared distribution with $(n-d_\phi)$ degrees of freedom, where $d_\phi$ is the dimension of $\phi$. The quantiles of the normal distribution can then be replaced by those of a Student's $t$-distribution with $(n-d_\phi)$ degrees of freedom.



Deriving confidence intervals for predictions

The structural model $f$ can be predicted for any $t$ using the estimated value $f(t; \hatphi)$. For that $t$, we can then derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hatphi$. Indeed, as a first approximation we have:


\( f(t ; \hatphi) \simeq f(t ; \phis) + \nabla f (t,\phis) (\hatphi - \phis) ,\)

where $\nabla f(t,\phis)$ is the gradient of $f$ at $\phis$, i.e., the vector of the first-order partial derivatives of $f$ with respect to the components of $\phi$, evaluated at $\phis$. Of course, we do not actually know $\phis$, but we can estimate $\nabla f(t,\phis)$ with $\nabla f(t,\hatphi)$. The variance of $f(t ; \hatphi)$ can then be estimated by

\( \widehat{\rm Var}\left(f(t ; \hatphi)\right) \simeq \nabla f (t,\hatphi)\widehat{\rm Var}(\hatphi) \left(\nabla f (t,\hatphi) \right)^\prime . \)

We can then derive an estimate of the standard error of $f (t,\hatphi)$ for any $t$:

\(\widehat{\rm s.e.}(f(t ; \hatphi)) = \sqrt{\widehat{\rm Var}\left(f(t ; \hatphi)\right)} , \)

and a confidence interval of level $1-\alpha$ for $f(t ; \phi^\star)$:

\({\rm CI}(f(t ; \phi^\star)) = \left[f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(\frac{\alpha}{2}\right), \ f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(1-\frac{\alpha}{2}\right)\right].\)



Estimating confidence intervals using Monte Carlo simulation

The use of Monte Carlo methods to estimate a distribution does not require any approximation of the model.

We proceed in the following way. Suppose we have found a MLE $\hatpsi$ of $\psi$. We then simulate a data vector $y^{(1)}$ by first randomly generating the vector $\teps^{(1)}$ and then calculating for $1 \leq j \leq n$,

\( y^{(1)}_j = f(t_j ;\hatpsi) + g(t_j ;\hatpsi)\teps^{(1)}_j . \)

In a sense, this gives us an example of "new" data from the "same" model. We can then compute a new MLE $\hat{\psi}^{(1)}$ of $\psi$ using $y^{(1)}$.

Repeating this process $M$ times gives $M$ estimates of $\psi$ from which we can obtain an empirical estimation of the distribution of $\hatpsi$, or any quantile we like.

Any confidence interval for $\psi_k$ (resp. $f(t,\psi_k)$) can then be approximated by a prediction interval for $\hatpsi_k$ (resp. $f(t,\hatpsi_k)$). For instance, a two-sided confidence interval of level $1-\alpha$ for $\psi_k^\star$ can be estimated by the prediction interval

\( [\hat{\psi}_{k,([\frac{\alpha}{2} M])} \ , \ \hat{\psi}_{k,([ (1-\frac{\alpha}{2})M])} ], \)

where $[\cdot]$ denotes the integer part and $(\psi_{k,(m)},\ 1 \leq m \leq M)$ the order statistic, i.e., the parameters $(\hatpsi_k^{(m)}, 1 \leq m \leq M)$ reordered so that $\hatpsi_{k,(1)} \leq \hatpsi_{k,(2)} \leq \ldots \leq \hatpsi_{k,(M)}$.




A PK example

In the real world, it is often not enough to look at the data, choose one possible model and estimate the parameters. The chosen structural model may or may not be "good" at representing the data. It may be good but the chosen residual error model bad, meaning that the overall model is poor, and so on. That is why in practice we may want to try out several structural and residual error models. After performing parameter estimation for each model, various assessment tasks can then be performed in order to conclude which model is best.



The data

This modeling process is illustrated in detail in the following PK example. Let us consider a dose D=50mg of a drug administered orally to a patient at time $t=0$. The concentration of the drug in the bloodstream is then measured at times $(t_j) = (0.5, 1,\,1.5,\,2,\,3,\,4,\,8,\,10,\,12,\,16,\,20,\,24).$ Here is the file individualFitting_data.txt with the data:


Time Concentration
0.5 0.94
1.0 1.30
1.5 1.64
2.0 3.38
3.0 3.72
4.0 3.29
8.0 1.31
10.0 0.80
12.0 0.39
16.0 0.31
20.0 0.10
24.0 0.09


We are going to perform the analyses for this example with the free statistical software R. First, we import the data and plot it to have a look:

NewIndividual1.png

Rstudio.png
R

pk1=read.table("individualFitting_data.txt",header=T) 
t=pk1$time  
y=pk1$concentration
plot(t, y, xlab="time(hour)",
     ylab="concentration(mg/l)", col="blue")   



Fitting two PK models

We are going to consider two possible structural models that may describe the observed time-course of the concentration:


    \(\begin{eqnarray} \phi_1 &=& (k_a, V, k_e) \\ f_1(t ; \phi_1) &=& \frac{D\, k_a}{V(k_a-k_e)} \left( e^{-k_e \, t} - e^{-k_a \, t} \right). \end{eqnarray}\)


    • A one compartment model with zero-order absorption and linear elimination:

    \(\begin{eqnarray} \phi_2 &=& (T_{k0}, V, k_e) \\ f_2(t ; \phi_2) &=& \left\{ \begin{array}{ll} \displaystyle{ \frac{D}{V \,T_{k0} \, k_e} }\left( 1- e^{-k_e \, t} \right) & {\rm if }\ t\leq T_{k0} \\ \displaystyle{ \frac{D}{V \,T_{k0} \, k_e} } \left( 1- e^{-k_e \, T_{k0} } \right)e^{-k_e \, (t- T_{k0})} & {\rm otherwise} . \end{array} \right. \end{eqnarray}\)


We define each of these functions in R:


Rstudio.png
R


predc1=function(t,x){
  f=50*x[1]/x[2]/(x[1]-x[3])*(exp(-x[3]*t)-exp(-x[1]*t))
return(f)}

predc2=function(t,x){
  f=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*t))
  f[t>x[1]]=50/x[1]/x[2]/x[3]*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t>x[1]]-x[1]))
return(f)} 


We then define two models ${\cal M}_1$ and ${\cal M}_2$ that assume (for now) constant residual error models:

\(\begin{eqnarray} {\cal M}_1 : \quad y_j & = & f_1(t_j ; \phi_1) + a_1\teps_j \\ {\cal M}_2 : \quad y_j & = & f_2(t_j ; \phi_2) + a_2\teps_j . \end{eqnarray}\)

We can fit these two models to our data by computing the MLE $\hatpsi_1=(\hatphi_1,\hat{a}_1)$ and $\hatpsi_2=(\hatphi_2,\hat{a}_2)$ of $\psi$ under each model:

Rstudio.png
R

fmin1=function(x,y,t){
  f=predc1(t,x)
  g=x[4]
  e=sum( ((y-f)/g)^2 + log(g^2))
return(e)}

fmin2=function(x,y,t){
  f=predc2(t,x)
  g=x[4]
  e=sum( ((y-f)/g)^2 + log(g^2))
return(e)}

#--------- MLE --------------------------------

pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian="true")
psi1=pk.nlm1$estimate

pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian="true")
psi2=pk.nlm2$estimate
Here are the parameter estimation results:


> cat(" psi1 =",psi1,"\n\n")
 psi1 = 0.3240916 6.001204 0.3239337 0.4366948

> cat(" psi2 =",psi2,"\n\n")
 psi2 = 3.203111 8.999746 0.229977 0.2555242



Assessing and selecting the PK model

The estimated parameters $\hatphi_1$ and $\hatphi_2$ can then be used for computing the predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models at any time $t$. These curves can then be plotted over the original data and compared:

New Individual2.png

Rstudio.png
R

tc=seq(from=0,to=25,by=0.1)
phi1=psi1[c(1,2,3)]
fc1=predc1(tc,phi1)
phi2=psi2[c(1,2,3)]
fc2=predc2(tc,phi2)

plot(t,y,ylim=c(0,4.1),xlab="time (hour)", 
          ylab="concentration (mg/l)",col = "blue")
lines(tc,fc1, type = "l", col = "green", lwd=2)
lines(tc,fc2, type = "l", col = "red", lwd=2)
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","first order absorption", 
          "zero order absorption"),
lty=c(-1,1,1), pch=c(1,-1,-1), lwd=2, col=c("blue","green","red"))

We clearly see that a much better fit is obtained with model ${\cal M}_2$, i.e., the one assuming a zero-order absorption process.

Another useful goodness-of-fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hatpsi)$ given by the models:

Individual3.png

Rstudio.png
R

f1=predc1(t,phi1)
f2=predc2(t,phi2)

par(mfrow= c(1,2))
plot(f1,y,xlim=c(0,4),ylim=c(0,4),main="model 1")
abline(a=0,b=1,lty=1)
plot(f2,y,xlim=c(0,4),ylim=c(0,4),main="model 2")
abline(a=0,b=1,lty=1)



Model selection

Again, ${\cal M}_2$ would seem to have a slight edge. This can be tested more analytically using the Bayesian Information Criteria (BIC):

Rstudio.png
R

deviance1=pk.nlm1$minimum + n*log(2*pi)
bic1=deviance1+log(n)*length(psi1)
deviance2=pk.nlm2$minimum + n*log(2*pi)
bic2=deviance2+log(n)*length(psi2)
> cat(" bic1 =",bic1,"\n\n")
 bic1 = 24.10972

> cat(" bic2 =",bic2,"\n\n")
 bic2 = 11.24769

A smaller BIC is better. Therefore, this also suggests that model ${\cal M}_2$ should be selected.



Fitting different error models

For the moment, we have only considered constant error models. However, the "observations vs predictions" figure hints that the amplitude of the residual errors may increase with the size of the predicted value. Let us therefore take a closer look at four different residual error models, each of which we will associate with the "best" structural model $f_2$:

${\cal M}_2$ Constant error model: $y_j=f_2(t_j;\phi_2)+a_2\teps_j$
${\cal M}_3$ Proportional error model: $y_j=f_2(t_j;\phi_3)+b_3f_2(t_j;\phi_3)\teps_j$
${\cal M}_4$ Combined error model: $y_j=f_2(t_j;\phi_4)+(a_4+b_4f_2(t_j;\phi_4))\teps_j$
${\cal M}_5$ Exponential error model: $\log(y_j)=\log(f_2(t_j;\phi_5)) + a_5\teps_j$.

The three new ones need to be entered into R:


Rstudio.png
R


fmin3=function(x,y,t){
  f=predc2(t,x)
  g=x[4]*f
  e=sum( ((y-f)/g)^2 + log(g^2))
return(e)}

fmin4=function(x,y,t){
  f=predc2(t,x)
  g=abs(x[4])+abs(x[5])*f
  e=sum( ((y-f)/g)^2 + log(g^2))
return(e)}

fmin5=function(x,y,t){
  f=predc2(t,x)
  g=x[4]
  e=sum( ((log(y)-log(f))/g)^2 + log(g^2))
return(e)}


We can now compute the MLE $\hatpsi_3=(\hatphi_3,\hat{b}_3)$, $\hatpsi_4=(\hatphi_4,\hat{a}_4,\hat{b}_4)$ and $\hatpsi_5=(\hatphi_5,\hat{a}_5)$ of $\psi$ under models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$:

Rstudio.png
R

#----------------  MLE  -------------------

pk.nlm3=nlm(fmin3, c(phi2,0.1), y, t, 
       hessian="true")
psi3=pk.nlm3$estimate

pk.nlm4=nlm(fmin4, c(phi2,1,0.1), y, t,  
       hessian="true")
psi4=pk.nlm4$estimate
psi4[c(4,5)]=abs(psi4[c(4,5)])

pk.nlm5=nlm(fmin5, c(phi2,0.1), y, t,  
       hessian="true")
psi5=pk.nlm5$estimate  
> cat(" psi3 =",psi3,"\n\n")
 psi3 = 2.642409 11.44113 0.1838779 0.2189221

> cat(" psi4 =",psi4,"\n\n")
 psi4 = 2.890066 10.16836 0.2068221 0.02741416 0.1456332

> cat(" psi5 =",psi5,"\n\n")
 psi5 = 2.710984 11.2744 0.188901 0.2310001



Selecting the error model

As before, these curves can be plotted over the original data and compared:


New Individual4.png

Rstudio.png
R

phi3=psi3[c(1,2,3)]
fc3=predc2(tc,phi3)
phi4=psi4[c(1,2,3)]
fc4=predc2(tc,phi4)
phi5=psi5[c(1,2,3)]
fc5=predc2(tc,phi5)

par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.1),xlab="time (hour)",ylab="concentration (mg/l)",
        col = "blue")
lines(tc,fc2, type = "l", col = "red", lwd=2)
lines(tc,fc3, type = "l", col = "green", lwd=2)
lines(tc,fc4, type = "l", col = "cyan", lwd=2)
lines(tc,fc5, type = "l", col = "magenta", lwd=2)
abline(a=0,b=0,lty=2)
legend(13,4,c("observations","constant error model",
        "proportional error model","combined error model","exponential error model"),
 lty=c(-1,1,1,1,1), pch=c(1,-1,-1,-1,-1), lwd=2, 
        col=c("blue","red","green","cyan","magenta"))


As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$ are quite similar. We now calculate the BIC for each:


Rstudio.png
R

deviance3=pk.nlm3$minimum + n*log(2*pi)
bic3=deviance3 + log(n)*length(psi3)
deviance4=pk.nlm4$minimum + n*log(2*pi)
bic4=deviance4 + log(n)*length(psi4)
deviance5=pk.nlm5$minimum + 2*sum(log(y)) + n*log(2*pi)
bic5=deviance5 + log(n)*length(psi5)
> cat(" bic3 =",bic3,"\n\n")
 bic3 = 3.443607

> cat(" bic4 =",bic4,"\n\n")
 bic4 = 3.475841

> cat(" bic5 =",bic5,"\n\n")
 bic5 = 4.108521

All of these BIC are lower than the constant residual error one. BIC selects the residual error model ${\cal M}_3$ with a proportional component.

There is not a large difference between these three error models, though the proportional and combined error models give the smallest and essentially identical BIC. We decide to use the combined error model ${\cal M}_4$ in the following (the same types of analysis could be done with the proportional error model).

A 90% confidence interval for $\psi_4$ can derived from the Hessian (i.e., the square matrix of second-order partial derivatives) of the objective function (i.e., -2 $\times \ LL$):


Rstudio.png
R

ialpha=0.9
df=n-length(phi4)
I4=pk.nlm4$hessian/2
H4=solve(I4)
s4=sqrt(diag(H4)*n/df)
delta4=s4*qt(0.5+ialpha/2, df)
ci4=matrix(c(psi4-delta4,psi4+delta4),ncol=2)
> ci4
            [,1]        [,2]
[1,]  2.22576690  3.55436561
[2,]  7.93442421 12.40228967
[3,]  0.16628224  0.24736196
[4,] -0.02444571  0.07927403
[5,]  0.04119983  0.25006660


We can also calculate a 90% confidence interval for $f_4(t)$ using the Central Limit Theorem (see (3)):


Rstudio.png
R


nlpredci=function(phi,f,H){
  dphi=length(phi)
  nf=length(f)
  H=H*n/(n-dphi)
  S=H[seq(1,dphi),seq(1,dphi)]
  G=matrix(nrow=nf,ncol=dphi)
  for (k in seq(1,dphi)) {
    dk=phi[k]*(1e-5)
    phid=phi
    phid[k]=phi[k] + dk
    fd=predc2(tc,phid)
    G[,k]=(f-fd)/dk
  }
  M=rowSums((G%*%S)*G)
  deltaf=sqrt(M)*qt(0.5+alpha/2,df)
return(deltaf)}

deltafc4=nlpredci(phi4,fc4,H4)


This can then be plotted:


NewIndividual6.png

Rstudio.png
R

plot(t,y,ylim=c(0,4.5), xlab="time (hour)", 
       ylab="concentration (mg/l)", col="blue")
lines(tc,fc4, type = "l",col = "red",lwd=2)
lines(tc, fc4-deltafc4, type = "l",
       col = "red" ,lwd=1, lty=3)
lines(tc,fc4+deltafc4,type = "l",
       col = "red", lwd=1, lty=3)
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations",
       "predicted concentration", 
       "CI for predicted concentration"),
       lty=c(-1,1,3),pch=c(1,-1,-1),lwd=c(2,2,1),
       col=c("blue","red","red"))

Alternatively, prediction intervals for $\hatpsi_4$, $\hat{f}_4(t;\hatpsi_4)$ and new observations for any time $t$ can be estimated by Monte Carlo simulation:


Rstudio.png
R


f=predc2(t,phi4)
a4=psi4[4]
b4=psi4[5]
g=a4+b4*f
dpsi=length(psi4)
nc=length(tc)
N=1000
qalpha=c(0.5 - alpha/2,0.5 + alpha/2)
PSI=matrix(nrow=N,ncol=dpsi)
FC=matrix(nrow=N,ncol=nc)
Y=matrix(nrow=N,ncol=nc)
for (k in seq(1,N)) {
   eps=rnorm(n)
   ys=f+g*eps
   pk.nlm=nlm(fmin4, psi4, ys, t)
   psie=pk.nlm$estimate
   psie[c(4,5)]=abs(psie[c(4,5)])
   PSI[k,]=psie
   fce=predc2(tc,psie[c(1,2,3)])
   FC[k,]=fce
   gce=a4+b4*fce
   Y[k,]=fce + gce*rnorm(1)
}

ci4s=matrix(nrow=dpsi,ncol=2)
for (k in seq(1,dpsi)){
   ci4s[k,]=quantile(PSI[,k],qalpha,names=FALSE)
}
m4s=colMeans(PSI)
sd4s=apply(PSI,2,sd)

cifc4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
   cifc4s[k,]=quantile(FC[,k],qalpha,names=FALSE)
}

ciy4s=matrix(nrow=nc,ncol=2)
for (k in seq(1,nc)){
   ciy4s[k,]=quantile(Y[,k],qalpha,names=FALSE)
}

par(mfrow= c(1,1))
plot(t,y,ylim=c(0,4.5),xlab="time (hour)",
       ylab="concentration (mg/l)",col = "blue")
lines(tc,fc4, type = "l", col = "red", lwd=2)
lines(tc,cifc4s[,1], type = "l", col = "red", lwd=1, lty=3)
lines(tc,cifc4s[,2], type = "l", col = "red", lwd=1, lty=3)
lines(tc,ciy4s[,1], type = "l", col = "green", lwd=1, lty=3)
lines(tc,ciy4s[,2], type = "l", col = "green", lwd=1, lty=3)
abline(a=0,b=0,lty=2)
legend(10.5,4.5,c("observed concentrations", "predicted concentration", 
       "CI for predicted concentration", "CI for observed concentrations"), 
       lty=c(-1,1,3,3), pch=c(1,-1,-1,-1), lwd=c(2,2,1,1), col=c("blue","red","red","green"))


NewIndividual7.png

> ci4s
             [,1]        [,2]
[1,] 2.350653e+00  3.53526320
[2,] 8.350764e+00 12.04910579
[3,] 1.818431e-01  0.24156832
[4,] 5.445459e-09  0.08819339
[5,] 1.563625e-02  0.19638889


The R code and input data used in this section can be downloaded here: https://wiki.inria.fr/wikis/popix/images/a/a1/R_IndividualFitting.rar.

Bibliography

Buonaccorsi, J.P. - Measurement Error: Models, Methods, and Applications

Taylor & Francis,2010
http://books.google.fr/books?id=QVtVmaCqLHMC
Bibtex
Author : Buonaccorsi, J.P.
Title : Measurement Error: Models, Methods, and Applications
In : -
Address :
Date : 2010

Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M. - Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition

Taylor & Francis,2010
http://books.google.fr/books?id=9kBx5CPZCqkC
Bibtex
Author : Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.
Title : Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition
In : -
Address :
Date : 2010

Fitzmaurice, G.M., Laird, N.M., Ware, J.H. - Applied Longitudinal Analysis

Wiley,2004
http://books.google.fr/books?id=gCoTIFejMgYC
Bibtex
Author : Fitzmaurice, G.M., Laird, N.M., Ware, J.H.
Title : Applied Longitudinal Analysis
In : -
Address :
Date : 2004

Gallant, A.R. - Nonlinear Statistical Models

Wiley,2009
http://books.google.fr/books?id=imv-NMozseEC
Bibtex
Author : Gallant, A.R.
Title : Nonlinear Statistical Models
In : -
Address :
Date : 2009

Huet, S., Bouvier, A., Poursat, M.A., Jolivet, E. - Statistical tools for nonlinear regression: a practical guide with S-PLUS and R examples

Springer,2003
Bibtex
Author : Huet, S., Bouvier, A., Poursat, M.A., Jolivet, E.
Title : Statistical tools for nonlinear regression: a practical guide with S-PLUS and R examples
In : -
Address :
Date : 2003

Ritz, C., Streibig, J.C. - Nonlinear regression with R

Vol. 33, Springer New York,2008
Bibtex
Author : Ritz, C., Streibig, J.C.
Title : Nonlinear regression with R
In : -
Address :
Date : 2008

Ross, G.J.S. - Nonlinear estimation

Springer-Verlag,1990
http://books.google.fr/books?id=7LkyzdLMghIC
Bibtex
Author : Ross, G.J.S.
Title : Nonlinear estimation
In : -
Address :
Date : 1990

Seber, G.A.F., Wild, C.J. - Nonlinear Regression

Wiley,2003
http://books.google.fr/books?id=YBYlCpBNo\_cC
Bibtex
Author : Seber, G.A.F., Wild, C.J.
Title : Nonlinear Regression
In : -
Address :
Date : 2003

Serroyen, J., Molenberghs, G., Verbeke, G., Davidian, M. - Nonlinear models for longitudinal data

The American Statistician 63(4):378-388,2009
Bibtex
Author : Serroyen, J., Molenberghs, G., Verbeke, G., Davidian, M.
Title : Nonlinear models for longitudinal data
In : The American Statistician -
Address :
Date : 2009

Wolberg, J.R. - Data analysis using the method of least squares: extracting the most information from experiments

Vol. 1, Springer Berlin, Germany,2006
Bibtex
Author : Wolberg, J.R.
Title : Data analysis using the method of least squares: extracting the most information from experiments
In : -
Address :
Date : 2006


Back.png
Forward.png