Gaussian models, wiki.inria.fr/wikis/popix, revision of 2013-06-25T08:36:02Z<p>Brocco: /* The normal distribution */</p>
<hr />
<div><!-- Menu for the Individual Parameters chapter --><br />
<sidebarmenu><br />
+[[Modeling the individual parameters]]<br />
*[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The normal distribution ==<br />
<br />
Gaussian models have several advantages, including the ability to easily describe both the predicted value of a random variable and its fluctuations around this value. Indeed, if we consider a Gaussian random variable $\psi$ with mean $\mu$ and standard deviation $\omega$, we can work with two entirely equivalent mathematical representations:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian1"><math> \begin{eqnarray}<br />
\psi &\sim& {\cal N}(\mu , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(1) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian2"><math> \begin{eqnarray}<br />
\psi &=& \mu + \eta, \quad {\rm where }\ \quad \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math></div><br />
|reference=(2) }}<br />
<br />
The form [[#indiv_gaussian1|(1)]] provides an explicit description of the distribution of $\psi$ from which we can deduce the [http://en.wikipedia.org/wiki/Probability_density_function pdf] and other characteristics such as the [http://en.wikipedia.org/wiki/Median median], [http://en.wikipedia.org/wiki/Mode_%28statistics%29 mode] and [http://en.wikipedia.org/wiki/Quantile quantiles]. The figure below shows the [http://en.wikipedia.org/wiki/Probability_density_function pdf] of a normal distribution with [http://en.wikipedia.org/wiki/Mean mean] $\mu$ and [http://en.wikipedia.org/wiki/Standard_deviation standard deviation] $\omega$. <br />
Each vertical band contains 10% of the distribution.<br />
<br />
<br />
:{{ImageWithCaption|image=Ndistrib.png|caption=The ${\cal N}(\mu,\omega^2)$ distribution}}<br />
<br />
<br />
This type of graphical representation is powerful: it helps us visualize the range of values the random variable can take and which values are more likely than others.<br />
<br />
Examples of normal distributions with various parameters are shown in the next figure.<br />
<br />
<br />
{{ImageWithCaption|image=distrib1.png|caption=Normal distributions}}<br />
<br />
<br />
Representation [[#indiv_gaussian2|(2)]] lets us separate the random and non-random components of $\psi$. If we define the predicted value as the value obtained in the absence of randomness ($\eta=0$), then $\hat{\psi}=\mu$. In the particular case of a normal distribution, this predicted value is the mean, median and mode of $\psi$. We can therefore rewrite equations [[#indiv_gaussian1|(1)]] and [[#indiv_gaussian2|(2)]] using $\hpsi$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &\sim& {\cal N}(\hpsi , \omega^2) \\<br />
\psi &=& \hpsi + \eta, \quad {\rm where } \quad \ \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math> }}<br />
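The equivalence of representations (1) and (2) can be checked by simulation. A minimal Python sketch (the values chosen for $\mu$ and $\omega$ are purely illustrative):<br />

```python
import random
from statistics import mean, stdev

# Illustrative parameter values (not from the text).
mu, omega = 0.5, 0.2
rng = random.Random(42)

# Representation (1): draw psi directly from N(mu, omega^2).
psi1 = [rng.gauss(mu, omega) for _ in range(50_000)]

# Representation (2): draw eta from N(0, omega^2), then set psi = mu + eta.
psi2 = [mu + rng.gauss(0.0, omega) for _ in range(50_000)]

# Both samples target the same distribution: compare empirical moments.
print(round(mean(psi1), 2), round(mean(psi2), 2))   # both close to mu
print(round(stdev(psi1), 2), round(stdev(psi2), 2)) # both close to omega
```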
<br />
<br />
<br><br />
<br />
== Extensions of the normal distribution == <br />
<br />
Clearly, not all distributions are Gaussian. To begin with, the normal distribution has support $\Rset$, whereas many parameters take values in restricted ranges: some variables take only positive values (e.g., concentrations and volumes) and others are restricted to bounded intervals (e.g., bioavailability).<br />
<br />
Furthermore, the [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distribution] is symmetric, which is not a property shared by all distributions. One way to extend the use of [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distributions] is to consider that some transform of the parameters we are interested in is Gaussian,<br />
i.e., assume the existence of a monotonic function $h$ such that $h(\psi)$ is normally distributed. Then, there exists some $\mu$ and $\omega$ such that $h(\psi) \sim {\cal N}(\mu , \omega^2)$.<br />
<br />
For a given transformation $h$, we can parametrize using $\hat{\psi}$, the predicted value of $\psi$. Indeed, the predicted value of $h(\psi)$ is $\mu=h(\hat{\psi})$, and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian3"><math>\begin{eqnarray}<br />
h(\psi) &\sim& {\cal N}(h(\hat{\psi}) , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian4"><math>\begin{eqnarray}<br />
h(\psi) &=& h(\hat{\psi}) + \eta , \quad {\rm where } \quad \ \eta \sim {\cal N}(0,\omega^2). <br />
\end{eqnarray}</math></div><br />
|reference=(4) }}<br />
<br />
It is possible to derive the pdf of $\psi$ from [[#indiv_gaussian3|(3)]]:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian5"><math><br />
\ppsi(\psi)=\displaystyle{ \frac{h^\prime(\psi)}{\sqrt{2 \pi \omega^2} } } \ \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi) - h(\hpsi))^2 \right\}. </math></div> <br />
|reference=(5) }}<br />
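Formula (5) can be sanity-checked numerically: for any monotonic $h$, the density should integrate to 1 over the support of $\psi$. A sketch taking $h=\log$ as the transformation, with illustrative values for $\hpsi$ and $\omega$:<br />

```python
import math

# Change-of-variables pdf (5), here with h = log (an assumed choice);
# psi_hat and omega are illustrative values.
h = math.log
h_prime = lambda x: 1.0 / x
psi_hat, omega = 2.0, 0.5

def ppsi(x):
    return (h_prime(x) / math.sqrt(2 * math.pi * omega**2)
            * math.exp(-(h(x) - h(psi_hat))**2 / (2 * omega**2)))

# The density should integrate to 1 over (0, +inf); a simple midpoint
# rule on a fine grid covering essentially all the mass is enough.
a, b, n = 1e-6, 60.0, 200_000
step = (b - a) / n
total = sum(ppsi(a + (i + 0.5) * step) for i in range(n)) * step
print(round(total, 4))
```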
<br />
Let us now see some examples of transformed normal pdfs:<br />
<br />
<br />
<br><br />
===Log-normal distribution===<br />
<br />
The log-normal distribution is widely used for describing the distribution of PK/PD parameters. This choice is usually justified by the fact that it ensures non-negative values, and rarely because it is shown to properly describe the population distribution of the parameter of interest.<br />
<br />
Let $\psi$ be a log-normally distributed random variable with parameters $(\mu,\omega)$:<br />
<br />
{{Equation1<br />
|equation=<math>\log(\psi) \sim {\cal N}( \mu, \omega^2). </math> }}<br />
<br />
This distribution can also be parameterized with $(m,\omega)$, where $m = e^{\mu} = \hat{\psi}$ is the predicted value (and median) of $\psi$. Then, $\log(\psi) \sim {\cal N}( \log(m), \omega^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\ppsi(\psi)=\displaystyle{ \frac{1}{\psi \, \sqrt{2 \pi \omega^2} } }\ \exp\left\{- \displaystyle{\frac{1}{2 \, \omega^2} (\log(\psi) - \log(m))^2} \right\}.<br />
</math> }}<br />
<br />
We display below some log-normal pdfs obtained with different parameters $(m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib2.png|caption=Log-normal distributions}}<br />
<br />
<br />
We see that for a given standard deviation $\omega$, the pdfs obtained for different $m$ are simply rescaled.<br />
<!-- {{Equation1|equation=<math> f_{\alpha m,\omega}(x) = \frac{f_{m,\omega}(x/\alpha)}{\alpha} </math> }} --><br />
On the other hand, for a given $m$ the asymmetry of the distribution increases when the standard deviation $\omega$ increases.<br />
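The rescaling claim can be stated precisely: multiplying $m$ by a factor $\alpha$ stretches the density, $f_{\alpha m,\omega}(x) = f_{m,\omega}(x/\alpha)/\alpha$. A quick numerical check (parameter values are illustrative):<br />

```python
import math

# Log-normal pdf parameterized by (m, omega), as in the text.
def lognormal_pdf(x, m, omega):
    return (1.0 / (x * math.sqrt(2 * math.pi * omega**2))
            * math.exp(-(math.log(x) - math.log(m))**2 / (2 * omega**2)))

# Check f_{alpha*m, omega}(x) == f_{m, omega}(x/alpha) / alpha pointwise.
m, omega, alpha = 2.0, 0.4, 3.0
xs = [0.5, 1.0, 2.0, 5.0, 10.0]
lhs = [lognormal_pdf(x, alpha * m, omega) for x in xs]
rhs = [lognormal_pdf(x / alpha, m, omega) / alpha for x in xs]
print(all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs)))
```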
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text=<br />
Note that the log-normal distribution takes its values in $(0,+\infty)$. It is straightforward to define a shifted distribution on $(a,+\infty)$ by translating it:<br />
<br />
{{Equation1<br />
|equation= <br />
<math>\begin{eqnarray}<br />
\log(\psi-a) &\sim& {\cal N}( \log(m-a), \omega^2).<br />
\end{eqnarray}</math> }}<br />
}}<br />
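Sampling from this shifted log-normal is immediate: draw the log-normal part and add the shift. A sketch with illustrative values of $a$, $m$ and $\omega$:<br />

```python
import math
import random

# Shifted log-normal: log(psi - a) ~ N(log(m - a), omega^2),
# so the support is (a, +infinity). Values below are illustrative.
a, m, omega = 1.0, 3.0, 0.4
rng = random.Random(1)
draws = [a + math.exp(rng.gauss(math.log(m - a), omega)) for _ in range(20_000)]
print(min(draws) > a)   # every draw lies above the shift a
```

The sample median lands near $m$, since the median of $\exp({\cal N}(\log(m-a),\omega^2))$ is $m-a$.<br />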
<br />
<br />
<br><br />
<br />
===Power-normal (or Box-Cox) distribution===<br />
<br />
<br />
This is the distribution of a random variable $\psi$ for which the Box-Cox transformation of $\psi$,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
h(\psi) = \displaystyle{ \frac{\psi^\lambda -1}{\lambda} }<br />
\end{eqnarray}</math> }}<br />
<br />
(with $\lambda > 0$) follows a normal distribution ${\cal N}( \mu, \omega^2)$ truncated such that $h(\psi) > -1/\lambda$, i.e., such that $\psi>0$. It therefore takes its values in $(0,+\infty)$.<br />
The distribution converges to the log-normal distribution when $\lambda \to 0$ and to a truncated normal distribution when $\lambda \to 1$.<br />
The main interest of a power-normal distribution is its ability to represent a distribution "between" the log-normal distribution and the normal distribution.<br />
<br />
Here, $m = \hat{\psi} = (\lambda \mu + 1)^{1/\lambda}$.<br />
We display below several power-normal pdfs obtained with various parameter sets $(\lambda,m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib3.png|caption=Power-normal distributions }}<br />
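A sampling sketch for the power-normal distribution: draw $h(\psi) \sim {\cal N}(\mu,\omega^2)$ with a simple rejection step for the truncation, then invert the Box-Cox transform. Parameter values are illustrative; by construction the sample median should land near $m$:<br />

```python
import random

# Power-normal sampling sketch; lam, m, omega are illustrative values.
lam, m, omega = 0.5, 2.0, 0.3
mu = (m**lam - 1.0) / lam          # so that m = (lam*mu + 1)**(1/lam)
rng = random.Random(0)

def sample_power_normal():
    while True:
        h = rng.gauss(mu, omega)   # h(psi) ~ N(mu, omega^2), truncated
        if h > -1.0 / lam:         # keep only draws with psi > 0
            return (lam * h + 1.0) ** (1.0 / lam)  # invert Box-Cox

draws = [sample_power_normal() for _ in range(50_000)]
print(min(draws) > 0)              # support is (0, +infinity)
```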
<br />
<br />
<br><br />
===Logit-normal and probit-normal distributions===<br />
<br />
A random variable $\psi$ with a logit-normal distribution takes its values in $(0,1)$. The logit of $\psi$ is normally distributed, i.e.,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\logit(\psi) &= &\log \left(\displaystyle{ \frac{\psi}{1-\psi} }\right) \<br />
\sim \ \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \displaystyle{ \frac{1}{1+e^{-\mu} } }.<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\logit(m)$.<br />
<br />
A random variable $\psi$ with a probit-normal distribution also takes its values in $(0,1)$. Then, the <balloon title="The probit function is the inverse cumulative distribution function (quantile function) &Phi;^(-1) associated with the standard normal distribution N(0,1)." style="color:#177245">probit</balloon> of $\psi$ is normally distributed:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probit(\psi) &= &\Phi^{-1}(\psi) \<br />
\sim \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \Phi(\mu).<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\probit(m)$.<br />
<br />
We can see in the figures below that the pdfs of the logit and probit distributions with the same $m$ and well-chosen $\omega$ are very similar.<br />
Thus, these two distributions can be used interchangeably for modeling the distribution of a parameter that takes its values in $(0,1)$.<br />
<br />
<br />
{{ImageWithCaption|image=distribution4.png|caption=Logit-normal and probit-normal distributions }}<br />
<br />
<br />
Logit and probit transformations can be generalized to any interval $(a,b)$ by setting<br />
<br />
{{Equation1<br />
|equation=<math> \psi = a + (b-a)\tilde{\psi}, </math> }}<br />
<br />
where $\tilde{\psi}$ is a random variable that takes its values in $(0,1)$ with a logit (or probit) distribution.<br />
<br />
Furthermore, it is easy to show that the probit-normal distribution with $m=0.5$ and $\omega=1$ is the uniform distribution on $(0,1)$.<br />
Thus, any uniform distribution can easily be derived from the probit-normal distribution.<br />
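This is the probability integral transform: with $m=0.5$ we get $\mu=\probit(0.5)=0$, so $\psi = \Phi(Z)$ with $Z \sim {\cal N}(0,1)$, which is uniform on $(0,1)$. A simulation check (stdlib only; `NormalDist` supplies $\Phi$ and $\Phi^{-1}$):<br />

```python
import random
from statistics import NormalDist

# Probit-normal: psi = Phi(mu + omega * Z). With m = 0.5 (mu = probit(0.5) = 0)
# and omega = 1, psi = Phi(Z) should be uniform on (0, 1).
std = NormalDist()                  # standard normal: cdf = Phi, inv_cdf = probit
mu, omega = std.inv_cdf(0.5), 1.0
rng = random.Random(7)
draws = [std.cdf(mu + omega * rng.gauss(0.0, 1.0)) for _ in range(50_000)]

# Empirical quantiles of a U(0,1) sample should sit near their nominal values.
draws.sort()
print([round(draws[int(q * len(draws))], 2) for q in (0.1, 0.5, 0.9)])
```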
<br />
<br />
<br><br />
=== Extension to transformed Student's $t$-distributions ===<br />
These extensions (log-$t$, power-$t$, etc.) can be obtained simply by replacing the normal distribution of the random effects with a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student $t$-distribution]. Such extensions can be useful for modeling heavy-tailed distributions.<br />
Several [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distributions] with different degrees of freedom (d.f.) are displayed below. The [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] converges to the normal distribution as the d.f. increases, whereas heavy tails are obtained for small d.f.<br />
<br />
<br />
{{ImageWithCaption|image=student.png|caption=Standardized normal and Student's $t$ probability density functions }}<br />
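As a sketch of such an extension, a log-$t$ variable can be simulated by replacing the normal random effect with a Student's $t$ one. Since the standard library has no $t$ sampler, the classic construction $t_\nu = Z/\sqrt{\chi^2_\nu/\nu}$ is used below; all parameter values are illustrative:<br />

```python
import math
import random

# Log-t sketch: psi = m * exp(omega * T), T ~ Student's t with nu d.f.
# nu, m, omega are illustrative values, not taken from the text.
nu, m, omega = 3, 2.0, 0.3
rng = random.Random(3)

def student_t(df):
    # t variate as Z / sqrt(chi2_df / df), chi2 built from df squared normals.
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

draws = [m * math.exp(omega * student_t(nu)) for _ in range(50_000)]
print(min(draws) > 0)   # still supported on (0, +infinity), but heavier tails
```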
<br />
<br />
<br><br />
<br />
== $\mlxtran$ for the Gaussian model ==<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example<br />
|title2=<br />
|text=<br />
|equation=<math>\begin{eqnarray}<br />
\logit(F_i) &\sim& {\cal N}(\logit(F_{\rm pop}), \omega_F^2) \\<br />
\log(ka_i) &\sim& {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br />
V_i &\sim& {\cal N}(V_{\rm pop}, \omega_V^2) \\<br />
\displaystyle{\frac{Cl_i^{\lambda_{Cl} } - 1}{\lambda_{Cl} } } &\sim& {\cal N}(\frac{Cl_{\rm pop}^{\lambda_{Cl} } - 1}{\lambda_{Cl} }, \omega_{Cl}^2) <br />
\end{eqnarray}</math> <br />
|code= <br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl, <br />
omega_F, omega_ka, omega_V, omega_Cl}<br />
<br />
DEFINITION:<br />
F = {distribution=logitnormal,reference=F_pop,sd=omega_F}<br />
ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br />
V = {distribution=normal,reference=V_pop,sd=omega_V}<br />
Cl = {distribution=powernormal,<br />
reference=Cl_pop,power=lambda_Cl,sd=omega_Cl}<br />
</pre> }}<br />
<br />
}}<br />
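The same four individual-parameter definitions can be mirrored in plain Python, which makes the transformations explicit. The population values and standard deviations below are purely illustrative:<br />

```python
import math
import random

# Illustrative population parameters (not from the text).
F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl = 0.8, 1.2, 10.0, 4.0, 0.4
omega_F, omega_ka, omega_V, omega_Cl = 0.4, 0.3, 1.5, 0.2

rng = random.Random(123)

def draw_individual():
    # logit-normal: logit(F_i) ~ N(logit(F_pop), omega_F^2), so F_i in (0,1)
    logit_F = math.log(F_pop / (1 - F_pop)) + rng.gauss(0, omega_F)
    F = 1 / (1 + math.exp(-logit_F))
    # log-normal: log(ka_i) ~ N(log(ka_pop), omega_ka^2), so ka_i > 0
    ka = ka_pop * math.exp(rng.gauss(0, omega_ka))
    # normal: V_i ~ N(V_pop, omega_V^2)
    V = V_pop + rng.gauss(0, omega_V)
    # power-normal: Box-Cox of Cl_i is normal, truncated so that Cl_i > 0
    h_Cl = (Cl_pop**lambda_Cl - 1) / lambda_Cl + rng.gauss(0, omega_Cl)
    while h_Cl <= -1 / lambda_Cl:
        h_Cl = (Cl_pop**lambda_Cl - 1) / lambda_Cl + rng.gauss(0, omega_Cl)
    Cl = (lambda_Cl * h_Cl + 1) ** (1 / lambda_Cl)
    return F, ka, V, Cl

F, ka, V, Cl = draw_individual()
print(0 < F < 1, ka > 0, Cl > 0)
```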
<br />
{{Back&Next<br />
|linkBack=Modeling the individual parameters<br />
|linkNext=Model with covariates }}</div>
<hr />
<div><!-- Menu for the Individual Parameters chapter --><br />
<sidebarmenu><br />
+[[Modeling the individual parameters]]<br />
*[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The normal distribution ==<br />
<br />
Gaussian models have several advantages, including the capacity of describing with ease both the predicted value of a random variable and its fluctuations around this value. Indeed, if we consider a Gaussian random variable $\psi$ with mean $\mu$ and standard deviation $\omega$, we can work with two entirely equivalent mathematical representations:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian1"><math> \begin{eqnarray}<br />
\psi &\sim& {\cal N}(\mu , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(1) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian2"><math> \begin{eqnarray}<br />
\psi &=& \mu + \eta, \quad {\rm where }\ \quad \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math></div><br />
|reference=(2) }}<br />
<br />
The form [[#indiv_gaussian1|(1)]] provides an explicit description of the distribution of $\psi$ from which we can deduce the [http://en.wikipedia.org/wiki/Probability_density_function pdf] and other characteristics such as the [http://en.wikipedia.org/wiki/Median median], [http://en.wikipedia.org/wiki/Mode_%28statistics%29 mode] and [http://en.wikipedia.org/wiki/Quantile quantiles]. The figure below shows the [http://en.wikipedia.org/wiki/Probability_density_function pdf] of a normal distribution with mean $\mu$ and standard deviation $\omega$. <br />
Each vertical band contains 10% of the distribution.<br />
<br />
<br />
:{{ImageWithCaption|image=Ndistrib.png|caption=The ${\cal N}(\mu,\omega^2)$ distribution}}<br />
<br />
<br />
This type of graphical representation is powerful and helps us to better visualize the types of values the random variable can take and those values that are more likely than others.<br />
<br />
Examples of normal distributions with various parameters are shown in the next figure.<br />
<br />
<br />
{{ImageWithCaption|image=distrib1.png|caption=Normal distributions}}<br />
<br />
<br />
Representation [[#indiv_gaussian2|(2)]] lets us separate the random and non-random components of $\psi$. If we define as the predicted value the value obtained in the absence of randomness ($\eta=0$), we get that $\hat{\psi}=\mu$. In the particular case of a normal distribution, this predicted value is the mean, median and mode of $\psi$. We can therefore rewrite equations [[#indiv_gaussian1|(1)]] and [[#indiv_gaussian2|(2)]] using $\hpsi$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &\sim& {\cal N}(\hpsi , \omega^2) \\<br />
\psi &=& \hpsi + \eta, \quad {\rm where } \quad \ \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<br><br />
<br />
== Extensions of the normal distribution == <br />
<br />
Clearly, not all distributions are Gaussian. To begin with, the normal distribution has the support $\Rset$, unlike many parameters that take values in precise ranges; some variables take only positive values (e.g., concentrations and volumes) and others are restricted to bounded intervals (e.g., bioavailability).<br />
<br />
Furthermore, the [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distribution] is symmetric, which is not a property shared by all distributions. One way to extend the use of [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distributions] is to consider that some transform of the parameters we are interested in is Gaussian,<br />
i.e., assume the existence of a monotonic function $h$ such that $h(\psi)$ is normally distributed. Then, there exists some $\mu$ and $\omega$ such that $h(\psi) \sim {\cal N}(\mu , \omega^2)$.<br />
<br />
For a given transformation $h$, we can parametrize using $\hat{\psi}$, the predicted value of $\psi$. Indeed, the predicted value of $h(\psi)$ is $\mu=h(\hat{\psi})$, and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian3"><math>\begin{eqnarray}<br />
h(\psi) &\sim& {\cal N}(h(\hat{\psi}) , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian4"><math>\begin{eqnarray}<br />
h(\psi) &=& h(\hat{\psi}) + \eta , \quad {\rm where } \quad \ \eta \sim {\cal N}(0,\omega^2). <br />
\end{eqnarray}</math></div><br />
|reference=(4) }}<br />
<br />
It is possible to derive the pdf of $\psi$ from [[#indiv_gaussian3|(4)]]:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian5"><math><br />
\ppsi(\psi)=\displaystyle{ \frac{h^\prime(\psi)}{\sqrt{2 \pi \omega^2} } } \ \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi) - h(\hpsi))^2 \right\}. </math></div> <br />
|reference=(5) }}<br />
<br />
Let us now see some examples of transformed normal pdfs:<br />
<br />
<br />
<br><br />
===Log-normal distribution===<br />
<br />
The log-normal distribution is widely used for describing the distribution of PK/PD parameters. This choice is usually justified by the fact that it ensures non-negative values, and rarely because it is shown to properly describe the population distribution of the parameter of interest.<br />
<br />
Let $\psi$ be a log-normally distributed random variable with parameters $(\mu,\omega)$:<br />
<br />
{{Equation1<br />
|equation=<math>\log(\psi) \sim {\cal N}( \mu, \omega). </math> }}<br />
<br />
This distribution can be also parameterized with $(m,\omega)$, where $m = \mu = \hat{\psi}$. Then, $\log(\psi) \sim {\cal N}( \log(m), \omega)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\ppsi(\psi)=\displaystyle{ \frac{1}{\psi \, \sqrt{2 \pi \omega^2} } }\ \exp\left\{- \displaystyle{\frac{1}{2 \, \omega^2} (\log(\psi) - \log(m))^2} \right\}.<br />
</math> }}<br />
<br />
We display below some log-normal pdfs obtained with different parameters $(m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib2.png|caption=Log-normal distributions}}<br />
<br />
<br />
We see that for a given standard deviation $\omega$, the pdfs obtained for different $m$ are simply rescaled.<br />
<!-- {{Equation1|equation=<math> f_{\alpha m,\omega}(x) = \frac{f_{m,\omega}(x/\alpha)}{\alpha} </math> }} --><br />
On the other hand, for a given $m$ the asymmetry of the distribution increases when the standard deviation $\omega$ increases.<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text=<br />
Note that the log-normal distribution takes its values in $(0,+\infty)$. It is straightforward to define a rescaled distribution in $(a,+\infty)$ by shifting it:<br />
<br />
{{Equation1<br />
|equation= <br />
<math>\begin{eqnarray}<br />
\log(\psi-a) &\sim& {\cal N}( \log(m-a), \omega^2).<br />
\end{eqnarray}</math> }}<br />
}}<br />
<br />
<br />
<br><br />
<br />
===Power-normal (or Box-Cox) distribution===<br />
<br />
<br />
This is the distribution of a random variable $\psi$ for which the Box-Cox transformation of $\psi$,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
h(\psi) = \displaystyle{ \frac{\psi^\lambda -1}{\lambda} }<br />
\end{eqnarray}</math> }}<br />
<br />
(with $\lambda > 0$) follows a normal distribution ${\cal N}( \mu, \omega^2)$ truncated such that $h(\psi)>0$. It therefore takes its values in $(0,+\infty)$.<br />
The distribution converges to the log-normal distribution when $\lambda \to 0$ and a truncated normal distribution when $\lambda \to 1$.<br />
The main interest of a power-normal distribution is its ability to represent a distribution "between" the log-normal distribution and the normal distribution.<br />
<br />
Here, $m = \hat{\psi} = (\lambda \mu + 1)^{1/\lambda}$.<br />
We display below several power-normal pdfs obtained with various parameter sets $(\lambda,m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib3.png|caption=Power-normal distributions }}<br />
<br />
<br />
<br><br />
===Logit-normal and probit-normal distributions.===<br />
<br />
A random variable $\psi$ with a logit-normal distribution takes its values in $(0,1)$. The logit of $\psi$ is normally distributed, i.e.,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\logit(\psi) &= &\log \left(\displaystyle{ \frac{\psi}{1-\psi} }\right) \<br />
\sim \ \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \displaystyle{ \frac{1}{1+e^{-\mu} } }.<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\logit(m)$.<br />
<br />
A random variable $\psi$ with a probit-normal distribution also takes its values in $(0,1)$. Then, the <balloon title="The probit function is the inverse cumulative distribution function (quantile function) 1/&Phi; associated with the standard normal distribution N(0,1)." style="color:#177245">probit</balloon> of $\psi$ is normally distributed:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probit(\psi) &= &\Phi^{-1}(\psi) \<br />
\sim \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \Phi(\mu).<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\probit(m)$.<br />
<br />
We can see in the figures below that the pdfs of the logit and probit distributions with the same $m$ and well-chosen $\omega$ are very similar.<br />
Thus, these two distributions can be used interchangeably for modeling the distribution of a parameter that takes its values in $(0,1)$.<br />
<br />
<br />
{{ImageWithCaption|image=distribution4.png|caption=Logit-normal and probit-normal distributions }}<br />
<br />
<br />
Logit and probit transformations can be generalized to any interval $(a,b)$ by setting<br />
<br />
{{Equation1<br />
|equation=<math> \psi = a + (b-a)\tilde{\psi}, </math> }}<br />
<br />
where $\tilde{\psi}$ is a random variable that takes its values in $(0,1)$ with a logit (or probit) distribution.<br />
<br />
Furthermore, it is easy to show that the probit-normal distribution with $m=0.5$ and $\omega=1$ is the uniform distribution on $(0,1)$.<br />
Thus, any uniform distribution can easily be derived from the probit-normal distribution.<br />
<br />
<br />
<br><br />
=== Extension to transformed Student's $t$-distributions ===<br />
These extensions (log-$t$, power-$t$, etc.) can be obtained simply by replacing the normal distribution of the random effects with a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student $t$-distribution]. Such extensions can be useful for modeling heavy-tailed distributions.<br />
Several [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distributions] with different degrees of freedom (d.f.) are displayed below. The [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] converges to the normal distribution as the d.f. increases, whereas heavy tails are obtained for small d.f.<br />
<br />
<br />
{{ImageWithCaption|image=student.png|caption=Standardized normal and Student's $t$ probability distribution functions }}<br />
<br />
<br />
<br><br />
<br />
== $\mlxtran$ for the Gaussian model== <br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example<br />
|title2=<br />
|text=<br />
|equation=<math>\begin{eqnarray}<br />
\logit(F_i) &\sim& {\cal N}(\logit(F_{\rm pop}), \omega_F^2) \\<br />
\log(ka_i) &\sim& {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br />
V_i &\sim& {\cal N}(V_{\rm pop}, \omega_V^2) \\<br />
\displaystyle{\frac{Cl_i^{\lambda_{Cl} } - 1}{\lambda_{Cl} } } &\sim& {\cal N}(\frac{Cl_{\rm pop}^{\lambda_{Cl} } - 1}{\lambda_{Cl} }, \omega_{Cl}^2) <br />
\end{eqnarray}</math> <br />
|code= <br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl, <br />
omega_F, omega_ka, omega_V, omega_Cl}<br />
<br />
DEFINITION:<br />
F = {distribution=logitnormal,reference=F_pop,sd=omega_F}<br />
ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br />
V = {distribution=normal,reference=V_pop,sd=omega_V}<br />
Cl = {distribution=powernormal,<br />
reference=Cl_pop,power=lambda_Cl,sd=omega_Cl}<br />
</pre> }}<br />
<br />
}}<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the individual parameters<br />
|linkNext=Model with covariates }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Gaussian_models&diff=7405Gaussian models2013-06-25T08:33:33Z<p>Brocco: /* The normal distribution */</p>
<hr />
<div><!-- Menu for the Individual Parameters chapter --><br />
<sidebarmenu><br />
+[[Modeling the individual parameters]]<br />
*[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The normal distribution ==<br />
<br />
Gaussian models have several advantages, including the capacity of describing with ease both the predicted value of a random variable and its fluctuations around this value. Indeed, if we consider a Gaussian random variable $\psi$ with mean $\mu$ and standard deviation $\omega$, we can work with two entirely equivalent mathematical representations:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian1"><math> \begin{eqnarray}<br />
\psi &\sim& {\cal N}(\mu , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(1) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian2"><math> \begin{eqnarray}<br />
\psi &=& \mu + \eta, \quad {\rm where }\ \quad \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math></div><br />
|reference=(2) }}<br />
<br />
The form [[#indiv_gaussian1|(1)]] provides an explicit description of the distribution of $\psi$ from which we can deduce the [http://en.wikipedia.org/wiki/Probability_density_function pdf] and other characteristics such as the [http://en.wikipedia.org/wiki/Median median], [http://en.wikipedia.org/wiki/Mode_%28statistics%29 mode] and [http://en.wikipedia.org/wiki/Quantile quantiles]. The figure below shows the pdf of a normal distribution with mean $\mu$ and standard deviation $\omega$. <br />
Each vertical band contains 10% of the distribution.<br />
<br />
<br />
:{{ImageWithCaption|image=Ndistrib.png|caption=The ${\cal N}(\mu,\omega^2)$ distribution}}<br />
<br />
<br />
This type of graphical representation is powerful and helps us to better visualize the types of values the random variable can take and those values that are more likely than others.<br />
<br />
Examples of normal distributions with various parameters are shown in the next figure.<br />
<br />
<br />
{{ImageWithCaption|image=distrib1.png|caption=Normal distributions}}<br />
<br />
<br />
Representation [[#indiv_gaussian2|(2)]] lets us separate the random and non-random components of $\psi$. If we define as the predicted value the value obtained in the absence of randomness ($\eta=0$), we get that $\hat{\psi}=\mu$. In the particular case of a normal distribution, this predicted value is the mean, median and mode of $\psi$. We can therefore rewrite equations [[#indiv_gaussian1|(1)]] and [[#indiv_gaussian2|(2)]] using $\hpsi$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &\sim& {\cal N}(\hpsi , \omega^2) \\<br />
\psi &=& \hpsi + \eta, \quad {\rm where } \quad \ \ \eta \sim {\cal N}(0,\omega^2) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<br><br />
<br />
== Extensions of the normal distribution == <br />
<br />
Clearly, not all distributions are Gaussian. To begin with, the normal distribution has the support $\Rset$, unlike many parameters that take values in precise ranges; some variables take only positive values (e.g., concentrations and volumes) and others are restricted to bounded intervals (e.g., bioavailability).<br />
<br />
Furthermore, the [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distribution] is symmetric, which is not a property shared by all distributions. One way to extend the use of [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distributions] is to consider that some transform of the parameters we are interested in is Gaussian,<br />
i.e., assume the existence of a monotonic function $h$ such that $h(\psi)$ is normally distributed. Then, there exists some $\mu$ and $\omega$ such that $h(\psi) \sim {\cal N}(\mu , \omega^2)$.<br />
<br />
For a given transformation $h$, we can parametrize using $\hat{\psi}$, the predicted value of $\psi$. Indeed, the predicted value of $h(\psi)$ is $\mu=h(\hat{\psi})$, and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian3"><math>\begin{eqnarray}<br />
h(\psi) &\sim& {\cal N}(h(\hat{\psi}) , \omega^2) <br />
\end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian4"><math>\begin{eqnarray}<br />
h(\psi) &=& h(\hat{\psi}) + \eta , \quad {\rm where } \quad \ \eta \sim {\cal N}(0,\omega^2). <br />
\end{eqnarray}</math></div><br />
|reference=(4) }}<br />
<br />
It is possible to derive the pdf of $\psi$ from [[#indiv_gaussian3|(4)]]:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_gaussian5"><math><br />
\ppsi(\psi)=\displaystyle{ \frac{h^\prime(\psi)}{\sqrt{2 \pi \omega^2} } } \ \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi) - h(\hpsi))^2 \right\}. </math></div> <br />
|reference=(5) }}<br />
<br />
Let us now see some examples of transformed normal pdfs:<br />
<br />
<br />
<br><br />
===Log-normal distribution===<br />
<br />
The log-normal distribution is widely used for describing the distribution of PK/PD parameters. This choice is usually justified by the fact that it ensures non-negative values, and rarely because it is shown to properly describe the population distribution of the parameter of interest.<br />
<br />
Let $\psi$ be a log-normally distributed random variable with parameters $(\mu,\omega)$:<br />
<br />
{{Equation1<br />
|equation=<math>\log(\psi) \sim {\cal N}( \mu, \omega). </math> }}<br />
<br />
This distribution can be also parameterized with $(m,\omega)$, where $m = \mu = \hat{\psi}$. Then, $\log(\psi) \sim {\cal N}( \log(m), \omega)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\ppsi(\psi)=\displaystyle{ \frac{1}{\psi \, \sqrt{2 \pi \omega^2} } }\ \exp\left\{- \displaystyle{\frac{1}{2 \, \omega^2} (\log(\psi) - \log(m))^2} \right\}.<br />
</math> }}<br />
<br />
We display below some log-normal pdfs obtained with different parameters $(m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib2.png|caption=Log-normal distributions}}<br />
<br />
<br />
We see that for a given standard deviation $\omega$, the pdfs obtained for different $m$ are simply rescaled.<br />
<!-- {{Equation1|equation=<math> f_{\alpha m,\omega}(x) = \frac{f_{m,\omega}(x/\alpha)}{\alpha} </math> }} --><br />
On the other hand, for a given $m$ the asymmetry of the distribution increases when the standard deviation $\omega$ increases.<br />
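The role of $m$ as the predicted value can be checked by simulation: for a log-normal distribution, $m=e^{\mu}$ is the median, while the mean is $m\,e^{\omega^2/2}$. A short Python sketch (standard library only, illustrative values):

```python
import math, random, statistics

random.seed(0)
m, omega = 2.0, 0.5
# psi = exp(log(m) + omega * eta), eta ~ N(0,1): log-normal with median m
samples = [math.exp(math.log(m) + omega * random.gauss(0.0, 1.0))
           for _ in range(100000)]
assert abs(statistics.median(samples) - m) < 0.05           # m is the median
assert abs(statistics.mean(samples)
           - m * math.exp(omega**2 / 2)) < 0.05             # the mean exceeds m
```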
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text=<br />
Note that the log-normal distribution takes its values in $(0,+\infty)$. It is straightforward to define a shifted distribution that takes its values in $(a,+\infty)$:<br />
<br />
{{Equation1<br />
|equation= <br />
<math>\begin{eqnarray}<br />
\log(\psi-a) &\sim& {\cal N}( \log(m-a), \omega^2).<br />
\end{eqnarray}</math> }}<br />
}}<br />
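The shift can be illustrated numerically (a hypothetical Python sketch; the values of $a$, $m$ and $\omega$ are arbitrary):

```python
import math, random

random.seed(1)
# Shifted log-normal on (a, +inf): log(psi - a) ~ N(log(m - a), omega^2)
a, m, omega = 5.0, 8.0, 0.3
samples = sorted(a + math.exp(math.log(m - a) + omega * random.gauss(0.0, 1.0))
                 for _ in range(100000))
assert min(samples) > a                               # support is (a, +inf)
assert abs(samples[len(samples) // 2] - m) < 0.05     # m is still the median
```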
<br />
<br />
<br><br />
<br />
===Power-normal (or Box-Cox) distribution===<br />
<br />
<br />
This is the distribution of a random variable $\psi$ for which the Box-Cox transformation of $\psi$,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
h(\psi) = \displaystyle{ \frac{\psi^\lambda -1}{\lambda} }<br />
\end{eqnarray}</math> }}<br />
<br />
(with $\lambda > 0$) follows a normal distribution ${\cal N}( \mu, \omega^2)$ truncated so that $h(\psi)>-1/\lambda$. It therefore takes its values in $(0,+\infty)$.<br />
The distribution converges to the log-normal distribution when $\lambda \to 0$ and to a truncated normal distribution when $\lambda \to 1$.<br />
The main interest of a power-normal distribution is its ability to represent a distribution "between" the log-normal distribution and the normal distribution.<br />
<br />
Here, $m = \hat{\psi} = (\lambda \mu + 1)^{1/\lambda}$.<br />
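A small Python sketch (standard library only) of the Box-Cox transformation, checking both the $\lambda \to 0$ limit and the relation $m=h^{-1}(\mu)$:

```python
import math

def box_cox(psi, lam):
    # Box-Cox transform; its lam -> 0 limit is log(psi)
    return (psi**lam - 1.0) / lam if lam != 0 else math.log(psi)

# lambda -> 0: the transform approaches the log
assert abs(box_cox(3.0, 1e-8) - math.log(3.0)) < 1e-6

# Predicted value: m = h^{-1}(mu) = (lam * mu + 1)^(1/lam)
lam, mu = 0.5, 1.2
m = (lam * mu + 1.0)**(1.0 / lam)
assert abs(box_cox(m, lam) - mu) < 1e-10
```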
We display below several power-normal pdfs obtained with various parameter sets $(\lambda,m,\omega)$.<br />
<br />
<br />
{{ImageWithCaption|image=distrib3.png|caption=Power-normal distributions }}<br />
<br />
<br />
<br><br />
===Logit-normal and probit-normal distributions===<br />
<br />
A random variable $\psi$ with a logit-normal distribution takes its values in $(0,1)$. The logit of $\psi$ is normally distributed, i.e.,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\logit(\psi) &= &\log \left(\displaystyle{ \frac{\psi}{1-\psi} }\right) \<br />
\sim \ \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \displaystyle{ \frac{1}{1+e^{-\mu} } }.<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\logit(m)$.<br />
<br />
A random variable $\psi$ with a probit-normal distribution also takes its values in $(0,1)$. Then, the <balloon title="The probit function is the inverse cumulative distribution function (quantile function) &Phi;^-1 associated with the standard normal distribution N(0,1)." style="color:#177245">probit</balloon> of $\psi$ is normally distributed:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probit(\psi) &= &\Phi^{-1}(\psi) \<br />
\sim \ {\cal N}( \mu, \omega^2) \\<br />
m &=& \Phi(\mu).<br />
\end{eqnarray}</math> }}<br />
<br />
This means that $\mu=\probit(m)$.<br />
<br />
We can see in the figures below that the pdfs of the logit-normal and probit-normal distributions with the same $m$ and well-chosen $\omega$ are very similar.<br />
Thus, these two distributions can be used interchangeably for modeling the distribution of a parameter that takes its values in $(0,1)$.<br />
<br />
<br />
{{ImageWithCaption|image=distribution4.png|caption=Logit-normal and probit-normal distributions }}<br />
<br />
<br />
Logit and probit transformations can be generalized to any interval $(a,b)$ by setting<br />
<br />
{{Equation1<br />
|equation=<math> \psi = a + (b-a)\tilde{\psi}, </math> }}<br />
<br />
where $\tilde{\psi}$ is a random variable that takes its values in $(0,1)$ with a logit (or probit) distribution.<br />
<br />
Furthermore, it is easy to show that the probit-normal distribution with $m=0.5$ and $\omega=1$ is the uniform distribution on $(0,1)$.<br />
Thus, any uniform distribution can easily be derived from the probit-normal distribution.<br />
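These relations can be checked with a short Python sketch (standard library only; `Phi` implements the standard normal cdf via `math.erf`):

```python
import math, random, statistics

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def Phi(x):
    # standard normal cdf, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# m = inv_logit(mu) for the logit-normal, so mu = logit(m)
assert abs(logit(inv_logit(0.7)) - 0.7) < 1e-12

# Probit-normal with mu = 0, omega = 1: psi = Phi(eta), eta ~ N(0,1),
# which is uniform on (0,1) (probability integral transform)
random.seed(2)
samples = [Phi(random.gauss(0.0, 1.0)) for _ in range(200000)]
assert abs(statistics.mean(samples) - 0.5) < 0.01
assert abs(sum(s < 0.25 for s in samples) / len(samples) - 0.25) < 0.01
```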
<br />
<br />
<br><br />
=== Extension to transformed Student's $t$-distributions ===<br />
These extensions (log-$t$, power-$t$, etc.) can be obtained simply by replacing the normal distribution of the random effects with a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student $t$-distribution]. Such extensions can be useful for modeling heavy-tailed distributions.<br />
Several [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distributions] with different degrees of freedom (d.f.) are displayed below. The [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] converges to the normal distribution as the d.f. increases, whereas heavy tails are obtained for small d.f.<br />
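Heavy tails can be illustrated by simulation, building a Student's $t$ variate from standard normals (a Python sketch with illustrative sample sizes):

```python
import math, random

random.seed(3)

def student_t(df):
    # T = Z / sqrt(chi2_df / df), built from standard normal draws
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0)**2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

n = 100000
# Heavy tails: P(|T| > 3) with 3 d.f. is far larger than for the normal
tail_t3 = sum(abs(student_t(3)) > 3.0 for _ in range(n)) / n
tail_norm = sum(abs(random.gauss(0.0, 1.0)) > 3.0 for _ in range(n)) / n
assert tail_t3 > 3 * tail_norm
```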
<br />
<br />
{{ImageWithCaption|image=student.png|caption=Standardized normal and Student's $t$ probability distribution functions }}<br />
<br />
<br />
<br><br />
<br />
== $\mlxtran$ for the Gaussian model== <br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example<br />
|title2=<br />
|text=<br />
|equation=<math>\begin{eqnarray}<br />
\logit(F_i) &\sim& {\cal N}(\logit(F_{\rm pop}), \omega_F^2) \\<br />
\log(ka_i) &\sim& {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br />
V_i &\sim& {\cal N}(V_{\rm pop}, \omega_V^2) \\<br />
\displaystyle{\frac{Cl_i^{\lambda_{Cl} } - 1}{\lambda_{Cl} } } &\sim& {\cal N}(\frac{Cl_{\rm pop}^{\lambda_{Cl} } - 1}{\lambda_{Cl} }, \omega_{Cl}^2) <br />
\end{eqnarray}</math> <br />
|code= <br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl, <br />
omega_F, omega_ka, omega_V, omega_Cl}<br />
<br />
DEFINITION:<br />
F = {distribution=logitnormal,reference=F_pop,sd=omega_F}<br />
ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br />
V = {distribution=normal,reference=V_pop,sd=omega_V}<br />
Cl = {distribution=powernormal,<br />
reference=Cl_pop,power=lambda_Cl,sd=omega_Cl}<br />
</pre> }}<br />
<br />
}}<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the individual parameters<br />
|linkNext=Model with covariates }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Modeling_the_individual_parameters&diff=7404Modeling the individual parameters2013-06-25T08:28:07Z<p>Brocco: </p>
<hr />
<div><!-- Menu for the Individual Parameters chapter --><br />
<sidebarmenu><br />
+[[Modeling the individual parameters]]<br />
*[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
In the [[The individual approach]] section we introduced the modeling approach for a single individual whose response variable depended on a parameter $\psi$. In the population approach, we now suppose that each individual $i$ has its own "individual" parameter $\psi_i$ and, more importantly, that this $\psi_i$ comes from some probability distribution $\qpsii$.<br />
<br />
In this chapter, we are interested in the description, representation and implementation of these individual parameter distributions $\qpsii$.<br />
Generally speaking, we assume that individuals are independent. This means that in the following analysis, it suffices to take a closer look at the distribution $\qpsii$ of a single individual $i$.<br />
<br />
If $\qpsii$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $c_i=(c_{i,1} , c_{i,2},\ldots, c_{i,L})$, this dependence can be stated explicitly:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="modelindiv1"><math><br />
\psi_i \sim \qpsii(\, \cdot \, ;c_i,\theta) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The distribution $\qpsii$ plays a fundamental role since it describes the ''inter-individual variability'' of the individual parameter $\psi_i$.<br />
It achieves two things:<br />
<br />
<br />
<ul><br />
* Definition of a ''predicted'' value $\hpsi_i$ of $\psi_i$ for a given vector of covariates $c_i$ and a given population parameter $\theta$, i.e., a "typical" value of the individual parameter $\psi_i$ for individuals who share the same covariates in a given population.<br />
<br><br />
<br />
* A description of how the individual parameter $\psi_i$ fluctuates around its predicted value $\hpsi_i$. In other words, it describes the distribution of the individual parameters for individuals who share the same covariates $c_i$.<br />
</ul><br />
<br />
<br />
This means that modeling the individual parameters reduces to describing these two properties of the distribution $\qpsii$. We can imagine all sorts of discrete or continuous distributions and linear or nonlinear covariate models to define $\hpsi_i$. Nevertheless, we must remember that in the modeling context, the parameters $\psi_i$ are not actually going to be themselves observed. This means that we are going to prefer certain types of models with a structure that lets them be both identifiable and interpretable.<br />
<br />
Examples of distributions derived from the normal distribution are proposed in the [[Gaussian models]] section, and continuous and categorical covariate models are presented in the [[The_covariate_model|The covariate model]] section.<br />
<br />
Rather than defining $\psi_i$ using a probability distribution as in [[#modelindiv1|(1)]], we can instead use equations:<br />
<br />
{{EquationWithRef<br />
|equation= <div id="modelindiv2"><math><br />
\psi_i = \model(\bbeta,c_i,\eta_i) , <br />
</math></div><br />
|reference=(2) }}<br />
<br />
where $\bbeta$ is a vector of ''fixed effects'' and $\eta_i$ a vector of ''random effects'', i.e., a vector of zero-mean random variables: $\esp{\eta_i}=0$.<br />
The predicted value $\hpsi_i$ is then seen as the value of $\psi_i$ with the random effects set to zero:<br />
<br />
{{EquationWithRef<br />
|equation= <div id="modelindiv3"><math><br />
\hpsi_i = \model(\bbeta,c_i,\eta_i \equiv 0) .<br />
</math></div><br />
|reference=(3) }}<br />
The pros and cons of the two approaches are discussed in the [[Description, representation and implementation of a model]] section.<br />
We will show that both representations can be used with the various models presented in the [[Gaussian models]] and [[The_covariate_model|The covariate model]] sections.<br />
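As an illustration of representations (2) and (3), here is a minimal Python sketch for a hypothetical log-normal model with a single covariate (the model and numerical values are purely illustrative, not taken from this wiki):

```python
import math

# Hypothetical instance of (2): log(psi_i) = log(psi_pop) + beta * c_i + eta_i
def model(beta, psi_pop, c_i, eta_i):
    return math.exp(math.log(psi_pop) + beta * c_i + eta_i)

psi_pop, beta, c_i = 10.0, 0.75, 0.4
# Predicted value (3): the random effect is set to zero
psi_hat = model(beta, psi_pop, c_i, eta_i=0.0)
assert abs(psi_hat - psi_pop * math.exp(beta * c_i)) < 1e-12
```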
<br />
A multivariate representation of the distribution of $\psi_i$ is given in the [[Extension to multivariate distributions]] section for when the random effects vector $\eta_i$ is Gaussian. In this case, under fairly general hypotheses, we can explicitly calculate the [http://en.wikipedia.org/wiki/Likelihood_function likelihood function]<br />
<br />
<br />
{{EquationWithBorder<br />
|equation= <math> {\like}(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i ; c_i , \theta). </math> }} <br />
<br />
<br />
Here, the distribution of the vector of random effects is completely defined by its variance-covariance matrix $\Omega$. Then, the vector of population parameters $\theta$ contains the vector $\bbeta$ of fixed effects and the variance-covariance matrix $\Omega$.<br />
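For the simplest instance of this likelihood, with $\psi_i \sim {\cal N}(\mu,\omega^2)$ and no covariates, the log-likelihood can be computed directly (an illustrative Python sketch; the data values are arbitrary):

```python
import math

def log_likelihood(psi, mu, omega):
    # sum of log N(psi_i; mu, omega^2) over independent individuals
    return sum(-0.5 * math.log(2.0 * math.pi * omega**2)
               - (p - mu)**2 / (2.0 * omega**2) for p in psi)

psi = [9.1, 10.4, 11.2, 9.8]
mu_hat = sum(psi) / len(psi)           # the MLE of mu is the sample mean
assert log_likelihood(psi, mu_hat, 1.0) >= log_likelihood(psi, 9.0, 1.0)
assert log_likelihood(psi, mu_hat, 1.0) >= log_likelihood(psi, 11.0, 1.0)
```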
<br />
Several extensions are possible:<br />
<br />
* We can suppose that the individual parameters of a given individual can fluctuate over time. Here, the model needs to describe the ''intra-individual variability'' of the individual parameters.<br />
<br />
* We can also suppose that the individuals are not in fact independent. The model then requires us to provide the inter-individual dependencies of the individual parameters.<br />
<br />
<br />
Some of these models that incorporate differing types of variability are presented in the [[Additional_levels_of_variability|Additional levels of variability]] section.<br />
<br />
<br />
{{Back&Next<br />
|linkBack= Introduction & notation {{!}} Introduction to Models<br />
|linkNext= Gaussian models }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Introduction_%26_notation&diff=7403Introduction & notation2013-06-25T08:25:22Z<p>Brocco: </p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
Models are attempts to describe observations in a logical, simple way, involving the relationship between measurements, parameters, covariates and so on. If working in a probabilistic framework - as we are here - there will be randomness in the model, involving random variables, probability distributions, errors and more.<br />
<br />
Because of this, we are going to make the following definition of a model in this context: [[What is a model? A joint probability distribution! | '''a model is a joint probability distribution''']].<br />
<br />
Therefore, defining a model means defining a [http://en.wikipedia.org/wiki/Joint_probability_distribution joint probability distribution], which can then be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_distribution conditional distributions] we can perform tasks on: estimation, model selection, simulation, etc.<br />
<br />
This chapter is therefore about defining appropriate probability distributions. We start by introducing some general notation and conventions.<br />
<br />
<br />
* We will call $y_i$ the set of observations recorded on subject $i$, and $\by$ the combined set of observations for all the $N$ individuals: $\by = (y_1, ...,y_N)$. In general, we will use '''bold''' text (like for $\by$) when a variable regroups several individuals. Thus, we write $\psi_i$ for the parameter vector for individual $i$ and $\bpsi$ the parameter vector of a set of individuals, $\bpsi = (\psi_1,\ldots,\psi_N)$.<br />
<br />
<br />
* We denote by $\qy$ and $\qpsi$ the distributions of $\by$ and $\bpsi$ respectively, $\qcypsi$ the conditional distribution of $\by$ given $\bpsi$, and $\qypsi$ the joint distribution of $\by$ and $\bpsi$. In these (and other distributions), we have placed the variable described by the distribution in the index.<br />
<br />
<br />
* We use the same "$p$" notation for the distribution of a random variable as for its probability density function (pdf).<br />
<br />
<br />
* When there is no ambiguity when working with whole equations, to simplify notation we may omit the indices and simply use the symbol $\pmacro$. For instance, $\py(\by)$, the pdf of $\by$, becomes $\pmacro(\by)$; both are equivalent. The symbol $\pmacro$ has no meaning on its own, it is completely defined by its arguments.<br />
<br />
<br />
* When the distribution of the individual parameters $\psi_i$ of subject $i$ depends on a vector of individual covariates $c_i$ and a population parameter $\theta$, we may choose to explicitly show this dependence by writing the distribution of $\psi_i$ as $\ppsii(\psi_i;c_i,\theta)$.<br />
<br />
<br />
* When the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcyipsii$ of the observations $y_i=(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ depends on regression variables $x_i=(x_{ij}, 1\leq j \leq n_i)$ and source terms $u_i$, (i.e., inputs of a dynamical system such as doses in a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I | pharmacokinetic model]]), we may choose to explicitly show this dependence, writing the conditional distribution as $\pcyipsii(y_i | \psi_i;x_i,u_i)$.<br />
<br />
<br />
There are two important pieces to the puzzle: the observations $\by$ whose distribution $\qy$ depends on the individual parameters, and the individual parameters $\bpsi$ themselves with distribution $\qpsi$. In the population approach, the base distribution is the joint distribution $\qypsi$ of the observations and individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi).<br />
</math> }}<br />
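This decomposition can be illustrated with a toy discrete model (hypothetical probabilities, chosen only for illustration):

```python
# Toy decomposition p(y, psi) = p(y | psi) * p(psi), discrete case
p_psi = {"low": 0.4, "high": 0.6}
p_y_given_psi = {"low": {0: 0.7, 1: 0.3}, "high": {0: 0.2, 1: 0.8}}

p_joint = {(y, s): p_y_given_psi[s][y] * p_psi[s]
           for s in p_psi for y in (0, 1)}
assert abs(sum(p_joint.values()) - 1.0) < 1e-12   # a valid joint distribution

# Marginalizing over psi recovers p(y = 1)
p_y1 = sum(p_joint[(1, s)] for s in p_psi)
assert abs(p_y1 - (0.3 * 0.4 + 0.8 * 0.6)) < 1e-12
```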
<br />
In this chapter, we concentrate essentially on these two components: the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcypsi$ of the observations, and the distribution $\qpsi$ of the individual parameters.<br />
<br />
Depending on the required complexity of the model, its other components such as [http://en.wikipedia.org/wiki/Covariate covariates], population parameters and design can also be modeled as [http://en.wikipedia.org/wiki/Random_variable random variables], but we will not go into such detail in this chapter.<br />
<br />
For each model, we aim to precisely identify the minimal amount of information needed to represent it mathematically, so that it remains possible to implement and analyze. To do this, we will be able to use $\mlxtran$, a powerful formal declarative language that allows us to describe complicated structural and statistical models in a straightforward, intuitive way.<br />
<br />
{{Next<br />
|link=Modeling the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Continuous_data_models&diff=7400Continuous data models2013-06-24T07:47:17Z<p>Brocco: /* Censored data */</p>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of a viral load in $\Rset$, and an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
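A quick simulation confirms that $f$ and $g$ play the role of conditional mean and standard deviation in model (1) (a Python sketch with Gaussian residuals and illustrative values):

```python
import random, statistics

random.seed(4)
# Model (1): y = f + g * eps, with standardized residuals eps ~ N(0,1)
f_val, g_val = 8.0, 1.5
ys = [f_val + g_val * random.gauss(0.0, 1.0) for _ in range(200000)]
assert abs(statistics.mean(ys) - f_val) < 0.02    # E[y | psi] = f
assert abs(statistics.pstdev(ys) - g_val) < 0.02  # sd(y | psi) = g
```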
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we decide to model with the structural model $f=A\exp\left(-\alpha t \right)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
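The structural model $f(t)=A e^{-\alpha t}$ of the example can be written and checked in a few lines (a Python sketch; the half-life relation $t_{1/2}=\log(2)/\alpha$ follows from the exponential decay):

```python
import math

def f(t, A, alpha):
    # structural model of the example: mono-exponential decay
    return A * math.exp(-alpha * t)

A, alpha = 12.0, 0.25
assert abs(f(0.0, A, alpha) - A) < 1e-12
# half-life: f drops to A/2 at t = log(2) / alpha
t_half = math.log(2.0) / alpha
assert abs(f(t_half, A, alpha) - A / 2.0) < 1e-12
```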
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$, where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* Another alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
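The four error models above differ only in how the standard deviation $g$ depends on the prediction $f$. A minimal sketch, with illustrative values of $a$ and $b$:

```python
import math

def g_constant(f, a):
    return a                                  # constant amplitude

def g_proportional(f, b):
    return b * f                              # grows linearly with f

def g_combined(f, a, b):
    return a + b * f                          # affine in f

def g_combined_alt(f, a, b):
    return math.sqrt(a**2 + b**2 * f**2)      # alternative combination, not affine

a, b = 0.5, 0.15
preds = [1.0, 5.0, 10.0]
constant     = [g_constant(f, a) for f in preds]
proportional = [g_proportional(f, b) for f in preds]
combined     = [g_combined(f, a, b) for f in preds]
alt          = [g_combined_alt(f, a, b) for f in preds]
```

For positive $a$, $b$ and $f$, the alternative combined model always lies strictly below the affine one, since $\sqrt{a^2+b^2f^2} < a+bf$.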
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary [http://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model ARMA] (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}. </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for all $(i,j)$, then $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &=& \rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
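A stationary AR(1) residual sequence with standardized marginal distribution and autocorrelation (4) can be simulated directly. A sketch with an assumed value of $\rho_i$:

```python
import math
import random

def simulate_ar1(n, rho, rng):
    # Stationary AR(1) residuals with mean 0 and variance 1:
    # eps_{j+1} = rho * eps_j + sqrt(1 - rho^2) * e_{j+1},  e_{j+1} ~ N(0, 1),
    # so that corr(eps_j, eps_{j+tau}) = rho ** tau, as in (4) with t_ij = j.
    scale = math.sqrt(1.0 - rho * rho)
    eps = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + scale * rng.gauss(0.0, 1.0))
    return eps

def lag_corr(x, tau):
    # Empirical autocorrelation at lag tau
    n = len(x)
    m = sum(x) / n
    num = sum((x[j] - m) * (x[j + tau] - m) for j in range(n - tau))
    den = sum((v - m) ** 2 for v in x)
    return num / den

rng = random.Random(1)
eps = simulate_ar1(100_000, rho=0.6, rng=rng)
```

With $\rho_i=0.6$, the empirical autocorrelations at lags 1 and 2 are close to $0.6$ and $0.6^2=0.36$ respectively.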
<br />
<br />
<br />
<br><br />
<br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br />
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors $(\teps_{ij})$.<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
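The conditional log-likelihood corresponding to (6) is straightforward to evaluate numerically. A sketch for a single individual, using the structural model $f=A\exp(-\alpha t)$ with a constant error model; the parameter and data values are assumptions for illustration:

```python
import math

def loglikelihood(y, times, psi, f, g):
    # Log of the conditional density (6): a sum of Gaussian log-densities
    # with mean f(t_ij, psi_i) and standard deviation g(t_ij, psi_i)
    ll = 0.0
    for y_ij, t_ij in zip(y, times):
        mu, sd = f(t_ij, psi), g(t_ij, psi)
        ll += -math.log(sd * math.sqrt(2.0 * math.pi)) - 0.5 * ((y_ij - mu) / sd) ** 2
    return ll

f = lambda t, psi: psi["A"] * math.exp(-psi["alpha"] * t)   # structural model
g = lambda t, psi: psi["a"]                                 # constant error model
psi = {"A": 10.0, "alpha": 0.3, "a": 0.5}

times = [1.0, 2.0, 4.0]
y = [7.2, 5.3, 3.1]
ll = loglikelihood(y, times, psi, f, g)
```

An observation lying exactly at its prediction contributes $-\tfrac{1}{2}\log(2\pi a^2)$, the maximum possible per-observation term.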
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
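The back-transformed expressions in the examples above can be checked numerically: applying the transformation to the simulated $y_{ij}$ must recover $\transy(f)+g\teps$. A sketch with illustrative values:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Model (7): u(y) = u(f) + g * eps, hence y = u^{-1}( u(f) + g * eps )
def y_log_model(f, g, eps):
    return f * math.exp(g * eps)              # log transform

def y_logit_model(f, g, eps):
    return inv_logit(logit(f) + g * eps)      # logit transform

f_pred, g_val, eps = 0.7, 0.2, 1.3
y1 = y_log_model(f_pred, g_val, eps)
y2 = y_logit_model(f_pred, g_val, eps)
```

For the logit case, `y2` also matches the closed form $f/(f+(1-f)e^{-g\teps})$, and it stays in $(0,1)$ whatever the value of $\teps$.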
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
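This density can be verified numerically: by the change-of-variable formula in (8), it is the normal density of $\log y$ with mean $\log f$, multiplied by $\transy^\prime(y)=1/y$, and it integrates to one. A sketch with assumed values of $f$ and $a$:

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def lognormal_conditional_pdf(y, f_pred, a):
    # Change of variable u(y) = log(y):
    # p(y) = u'(y) * q(u(y)) = (1/y) * N(log y ; log f, a^2)
    return normal_pdf(math.log(y), math.log(f_pred), a) / y

f_pred, a = 3.0, 0.4
p = lognormal_conditional_pdf(2.5, f_pred, a)

# crude trapezoidal check that the density integrates to 1 over (0, 60)
grid = [1e-4 + k * 1e-3 for k in range(60_000)]
vals = [lognormal_conditional_pdf(v, f_pred, a) for v in grid]
total = sum((vals[k] + vals[k + 1]) * 0.5 * 1e-3 for k in range(len(vals) - 1))
```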
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower [http://en.wikipedia.org/wiki/Detection_limit limit of detection] (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower [http://en.wikipedia.org/wiki/Detection_limit limit of quantification] (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper [http://en.wikipedia.org/wiki/Detection_limit limit of quantification] (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||log-VL || cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
<br />
<br />
* '''Interval censoring:''' if a data point lies in an interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||CONC. || llimit || cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png|link=]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||VOLUME || CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other two types as special cases: take $I=(-\infty,L]$ for left censoring and $I=[U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
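The two kinds of contribution in (9) can be sketched for Gaussian residuals and left censoring at a limit $L$; the numerical values used below are illustrative assumptions:

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def contribution(f_pred, g, y=None, L=None):
    # Likelihood contribution of one observation under (9), Gaussian residuals:
    #  - observed value y:      the usual density N(y ; f, g^2)
    #  - left-censored at L:    P(y_ij < L | psi_i) = Phi((L - f) / g)
    if y is None:
        return normal_cdf((L - f_pred) / g)
    return math.exp(-0.5 * ((y - f_pred) / g) ** 2) / (g * math.sqrt(2.0 * math.pi))

dens = contribution(f_pred=1.4, g=0.5, y=1.1)      # uncensored observation
cens = contribution(f_pred=1.4, g=0.5, L=-0.30)    # left-censored observation
```

When the prediction sits exactly at the limit, a censored point contributes $\Phi(0)=1/2$; when the prediction is far above the limit, the censored contribution vanishes.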
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
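The decomposition into inter-measurement and inter-replicate components can be simulated directly: the inter-measurement deviation is shared by all replicates taken at the same time point, so replicates are correlated. All numerical values below are illustrative assumptions:

```python
import random

def simulate_replicates(f_pred, g_im, g_ir, K, rng):
    # y_k = f + g_IM * eps_IM + g_IR * eps_IR_k :
    # eps_IM is drawn once per measurement occasion and shared by the
    # K replicates; each replicate adds its own eps_IR_k.
    eps_im = rng.gauss(0.0, 1.0)
    return [f_pred + g_im * eps_im + g_ir * rng.gauss(0.0, 1.0) for _ in range(K)]

rng = random.Random(2)
reps = simulate_replicates(f_pred=5.0, g_im=0.3, g_ir=0.05, K=4, rng=rng)

# Monte Carlo check: Var(y_k) = g_IM^2 + g_IR^2 and Cov(y_k, y_k') = g_IM^2
M = 50_000
pairs = [simulate_replicates(0.0, 0.3, 0.4, 2, rng) for _ in range(M)]
var = sum(p[0] ** 2 for p in pairs) / M
cov = sum(p[0] * p[1] for p in pairs) / M
```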
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
*The observation times $(t_{ij})$<br />
<br />
*Possibly some additional regression variables $(x_{ij})$<br />
<br />
*Possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
<br />
*Possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
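The MLXTran block above defines the model for the $\mlxtran$ compiler; the same structural model can be sketched in Python for quick exploration. The parameter values are illustrative assumptions:

```python
import math

def f_biexp(t, A, alpha, B, beta):
    # Structural model of Example 1: f(t) = A e^{-alpha t} + B e^{-beta t}
    return A * math.exp(-alpha * t) + B * math.exp(-beta * t)

A, alpha, B, beta = 6.0, 0.9, 4.0, 0.15
curve = [f_biexp(t, A, alpha, B, beta) for t in (0.0, 1.0, 2.0, 4.0, 8.0)]
```

At $t=0$ the model returns $A+B$, and for positive rate constants the curve decays monotonically to zero.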
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
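The initial conditions in Example 2 are the steady state of the system when $\nu=\rho=0$, which gives a useful numerical check on any implementation. A sketch using a simple Euler step; the parameter values are illustrative assumptions, not those of any real study:

```python
def derivatives(state, p):
    # Right-hand side of the ODE system of Example 2
    N, I, V = state
    dN = p["s"] - p["beta"] * (1 - p["nu"]) * N * V - p["d"] * N
    dI = p["beta"] * (1 - p["nu"]) * N * V - p["delta"] * I
    dV = p["p"] * (1 - p["rho"]) * I - p["c"] * V
    return (dN, dI, dV)

def initial_state(p):
    # Initial conditions used for t < t0
    N0 = p["delta"] * p["c"] / (p["beta"] * p["p"])
    I0 = (p["s"] - p["d"] * N0) / p["delta"]
    V0 = p["p"] * I0 / p["c"]
    return (N0, I0, V0)

def euler_step(state, p, dt):
    return tuple(x + dt * dx for x, dx in zip(state, derivatives(state, p)))

params = {"delta": 0.7, "c": 3.0, "beta": 1e-5, "p": 100.0,
          "s": 10000.0, "d": 0.01, "nu": 0.0, "rho": 0.0}
s0 = initial_state(params)
```

With $\nu=\rho=0$, all three derivatives vanish at `s0`, so an Euler step leaves the state unchanged; turning on the treatment effects ($\nu,\rho>0$) perturbs the system away from this equilibrium.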
<br />
<br><br><br />
<br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, and an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we decide to model with the structural model $f=A\exp\left(-\alpha t \right)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are at $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity line $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
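These error models are easy to simulate. The sketch below (Python; it reuses the structural model $f=A\exp(-\alpha t)$ from above, with illustrative values for $A$, $\alpha$, $a_i$ and $b_i$) builds the four standard deviation functions $g$ and draws observations for the combined model:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, A=10.0, alpha=0.3):
    # structural model: mono-exponential decay (illustrative parameter values)
    return A * np.exp(-alpha * t)

t = np.linspace(1.0, 12.0, 10)
eps = rng.standard_normal(t.shape)   # standardized residual errors, N(0,1)
a, b = 0.5, 0.15                     # illustrative values of a_i and b_i

# the four residual error models for g(t_ij, psi_i)
g_constant     = a * np.ones_like(t)
g_proportional = b * f(t)
g_combined     = a + b * f(t)
g_combined2    = np.sqrt(a**2 + b**2 * f(t)**2)

# observations y_ij = f + g * eps, here with the combined error model
y = f(t) + g_combined * eps
```

Plotting `y` against `t` for each choice of `g` reproduces the qualitative behavior of the intervals shown above: constant amplitude, amplitude proportional to $f$, and affine amplitude.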
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary [http://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model ARMA] (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}, </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for any $(i,j)$, then $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &=& \rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
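With regularly spaced times ($t_{ij}=j$), a stationary AR(1) residual sequence can be simulated recursively; the $\sqrt{1-\rho^2}$ factor keeps the stationary variance equal to 1, so the $\teps_{ij}$ remain standardized. (Python sketch; the sample size is chosen only so that the autocorrelation can be checked empirically.)

```python
import numpy as np

def ar1_residuals(n, rho, rng):
    # stationary AR(1): mean 0, variance 1, corr(eps_j, eps_{j+tau}) = rho**tau
    eps = np.empty(n)
    eps[0] = rng.standard_normal()
    for j in range(1, n):
        eps[j] = rho * eps[j - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    return eps

rng = np.random.default_rng(1)
eps = ar1_residuals(100_000, rho=0.6, rng=rng)
gamma1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]   # empirical lag-1 autocorrelation
```

For $\rho_i=0.6$ the empirical lag-1 autocorrelation `gamma1` is close to 0.6, as in the middle panel of the figure above.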
<br />
<br />
<br />
<br><br />
<br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br />
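The heavier tails are easy to check by simulation. In this Python sketch ($\nu=3$ and the sample size are illustrative choices), residuals larger than 3 in absolute value occur far more often under $t(3)$ than under ${\cal N}(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(3)
nu = 3                                    # illustrative degrees of freedom
eps_t = rng.standard_t(nu, size=200_000)  # t-distributed residuals
eps_n = rng.standard_normal(200_000)      # Gaussian residuals

# frequency of large residuals (|eps| > 3) under each distribution
p_t = np.mean(np.abs(eps_t) > 3)
p_n = np.mean(np.abs(eps_n) > 3)
```

Here `p_t` is roughly twenty times larger than `p_n`, which is the practical meaning of being "more prone to producing values that fall far from the prediction".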
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors $(\teps_{ij})$.<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
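In practice, (6) is evaluated on the log scale to avoid numerical underflow when the product runs over many observations. A minimal sketch for one individual (Python; `f_vals` and `g_vals` stand for $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ evaluated at the observation times, and all numerical values are illustrative):

```python
import numpy as np

def gaussian_cond_loglik(y, f_vals, g_vals):
    # log p(y | psi) for y_ij = f + g*eps with eps ~ N(0,1); cf. equation (6)
    z = (y - f_vals) / g_vals
    return np.sum(-0.5 * np.log(2.0 * np.pi) - np.log(g_vals) - 0.5 * z**2)

y      = np.array([9.8, 8.2, 6.9])   # observations
f_vals = np.array([10.0, 8.0, 7.0])  # predictions f(t_ij, psi_i)
g_vals = np.array([0.5, 0.5, 0.5])   # constant error model, a_i = 0.5
ll = gaussian_cond_loglik(y, f_vals, g_vals)
```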
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
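The factor $1/y_{ij}$ in this density is the Jacobian term $\transy^\prime(y_{ij})$ of the log transformation. A sketch of this log-normal, constant-error likelihood (Python; the observation and prediction values are illustrative):

```python
import numpy as np

def lognormal_cond_loglik(y, f_vals, a):
    # log p(y | psi) when log(y_ij) = log(f) + a*eps with eps ~ N(0,1)
    # the -log(y) term is the Jacobian u'(y) = 1/y of the log transform
    z = (np.log(y) - np.log(f_vals)) / a
    return np.sum(-0.5 * np.log(2.0 * np.pi * a**2) - np.log(y) - 0.5 * z**2)

y      = np.array([1.2, 2.5])   # observations
f_vals = np.array([1.0, 2.0])   # predictions f(t_ij, psi_i)
ll = lognormal_cond_loglik(y, f_vals, a=0.3)
```

This coincides with the density of a log-normal distribution with median $f(t_{ij},\psi_i)$ and log-scale standard deviation $a$.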
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower [http://en.wikipedia.org/wiki/Detection_limit limit of detection] (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower [http://en.wikipedia.org/wiki/Detection_limit limit of quantification] (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||log-VL || cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
<br />
<br />
* '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||CONC. || llimit || cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png|link=]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||VOLUME || CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other types as special cases: set $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } },<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du . </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
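For Gaussian residual errors, the censored term $\prob{y_{ij} \in I_{ij} | \psi_i}$ is a difference of normal cdfs, so the likelihood remains cheap to evaluate. A sketch for left-censored data at a limit $L$, loosely patterned on the viral load example above (Python with SciPy; the predictions `f_vals` and the error level are illustrative):

```python
import numpy as np
from scipy.stats import norm

def loglik_left_censored(y, cens, f_vals, g_vals, L):
    # cens[j] = 1 if y[j] is left-censored at L, 0 otherwise; cf. equation (9)
    obs = norm.logpdf(y, loc=f_vals, scale=g_vals)   # density term for observed points
    cen = norm.logcdf(L, loc=f_vals, scale=g_vals)   # log P(y < L | psi) for censored ones
    return np.sum(np.where(cens == 1, cen, obs))

y      = np.array([0.26, -0.13, -0.30, -0.30])   # reported values (limit if censored)
cens   = np.array([0, 0, 1, 1])
f_vals = np.array([0.20, -0.10, -0.40, -0.50])   # predictions f(t_ij, psi_i)
g_vals = 0.1 * np.ones(4)                         # constant error model
ll = loglik_left_censored(y, cens, f_vals, g_vals, L=-0.30)
```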
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
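This decomposition can be sketched by simulation: the inter-measurement component is shared by all $K$ replicates taken at $t_{ij}$, while the inter-replicate components are drawn independently. (Python; constant error models $g_{I\!M}=a_{I\!M}$ and $g_{I\!R}=a_{I\!R}$ and all numerical values are assumptions for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 8, 3                  # n observation times, K replicates at each time
a_IM, a_IR = 0.4, 0.2        # constant inter-measurement / inter-replicate errors
f_vals = 10.0 * np.exp(-0.3 * np.arange(1, n + 1))   # structural model values

eps_IM = rng.standard_normal((n, 1))   # one draw per time, shared by the K replicates
eps_IR = rng.standard_normal((n, K))   # one independent draw per replicate
y = f_vals[:, None] + a_IM * eps_IM + a_IR * eps_IR
```

Replicates at the same time differ only through the inter-replicate component, while the inter-measurement component shifts all $K$ replicates together.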
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
- the observation times $(t_{ij})$<br />
<br />
- possibly some additional regression variables $(x_{ij})$<br />
<br />
- possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
<br />
- possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(\vl(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
<br />
<br><br><br />
<br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we decide to model with the structural model $f=A\exp\left(-\alpha t \right)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(x_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N<br />
\quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm$ 1 standard deviations: $[\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identify $y=\hat{y}$, the green lines represent an interval of $\pm 1$ standard deviations around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$, and thus the amplitude of the intervals, is now an affine function of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
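These four error models differ only in how the standard deviation $g$ scales with the prediction $f$. A quick plain-Python sketch (the values of $a_i$, $b_i$ and of the predictions are arbitrary illustrations, not taken from the figures) recovers the constant, linear, affine and square-root-of-quadratic behavior of the empirical standard deviation of $e_{ij}$:

```python
import math
import random

random.seed(0)

def empirical_sd_of_error(f_pred, g, n=20000):
    """Empirical standard deviation of e = y - f when y = f + g(f)*eps, eps ~ N(0,1)."""
    errors = [g(f_pred) * random.gauss(0.0, 1.0) for _ in range(n)]
    return math.sqrt(sum(e * e for e in errors) / n)

a, b = 0.5, 0.2                      # illustrative values of a_i and b_i
predictions = [1.0, 5.0, 10.0]       # three values of f(t_ij, psi_i)

error_models = {
    "constant":     lambda f: a,                                # sd = a
    "proportional": lambda f: b * f,                            # sd = b*f
    "combined":     lambda f: a + b * f,                        # sd = a + b*f
    "combined_alt": lambda f: math.sqrt(a**2 + b**2 * f**2),    # sd = sqrt(a^2 + b^2 f^2)
}

for name, g in error_models.items():
    sds = [round(empirical_sd_of_error(f, g), 2) for f in predictions]
    print(name, sds)
```

Each row shows how the amplitude of the $\pm 1$ standard deviation band changes (or not) with the prediction, which is exactly what distinguishes the figures above.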
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary [http://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model ARMA] (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}, </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for all $(i,j)$, then $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &=& \rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
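The AR(1) construction is easy to simulate directly. The sketch below (plain Python, with $t_{ij}=j$ so the lag is simply $\tau$, and an illustrative value of $\rho_i$) builds a standardized AR(1) sequence and recovers $\gamma(\tau)=\rho^\tau$ empirically:

```python
import math
import random

random.seed(1)

def ar1_residuals(rho, n):
    """Standardized AR(1) residuals: eps_j = rho*eps_{j-1} + sqrt(1-rho^2)*w_j,
    w_j ~ N(0,1), so that Var(eps_j) = 1 and corr(eps_j, eps_{j+tau}) = rho**tau."""
    eps = [random.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + scale * random.gauss(0.0, 1.0))
    return eps

def empirical_autocorr(x, tau):
    """Sample autocorrelation at lag tau."""
    m = sum(x) / len(x)
    num = sum((x[j] - m) * (x[j + tau] - m) for j in range(len(x) - tau))
    den = sum((v - m) ** 2 for v in x)
    return num / den

eps = ar1_residuals(rho=0.6, n=100000)
for tau in (1, 2, 3):
    print(tau, round(empirical_autocorr(eps, tau), 2))   # close to 0.6**tau
```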
<br />
<br />
<br />
<br><br />
<br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br />
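The heavier tails of the $t$-distribution are easy to check by simulation. The sketch below (plain Python, no statistics library assumed) draws $t(\nu)$ variates from their classical representation as a standard normal divided by $\sqrt{\chi^2_\nu/\nu}$, and compares tail probabilities with the normal case:

```python
import math
import random

random.seed(2)

def t_sample(nu):
    """Draw from Student's t(nu) as N(0,1) / sqrt(chi2_nu / nu),
    with chi2_nu built from nu independent squared standard normals."""
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return random.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

n = 100000
tail_normal = sum(abs(random.gauss(0.0, 1.0)) > 3.0 for _ in range(n)) / n
tail_t3 = sum(abs(t_sample(3)) > 3.0 for _ in range(n)) / n
print(tail_normal, tail_t3)   # the t(3) tail probability is an order of magnitude larger
```

With $\nu=3$, values more than 3 standard deviations from the center are roughly twenty times more frequent than under the normal distribution, which is what "more prone to producing values that fall far from its prediction" means in practice.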
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(y_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors ($\teps_{ij}$).<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
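Formula (6) translates directly into code. The sketch below is a minimal plain-Python version (the mono-exponential structural model used in the check is a hypothetical example, not taken from the text); it accumulates the logarithm of (6) rather than the product itself, which is how such likelihoods are computed in practice to avoid numerical underflow:

```python
import math

def cond_loglike(y, t, psi, f, g):
    """Log of the conditional density (6): for each individual i and time t_ij,
    y_ij ~ N(f(t_ij, psi_i), g(t_ij, psi_i)^2), all terms assumed independent."""
    ll = 0.0
    for y_i, t_i, psi_i in zip(y, t, psi):
        for y_ij, t_ij in zip(y_i, t_i):
            f_ij, g_ij = f(t_ij, psi_i), g(t_ij, psi_i)
            ll += -0.5 * math.log(2.0 * math.pi) - math.log(g_ij) \
                  - 0.5 * ((y_ij - f_ij) / g_ij) ** 2
    return ll

# toy check: one individual, mono-exponential f, constant error model
f = lambda t, p: p[0] * math.exp(-p[1] * t)     # p = (A, alpha, a)
g = lambda t, p: p[2]
ll = cond_loglike([[9.0, 4.0]], [[0.0, 1.0]], [(10.0, 1.0, 2.0)], f, g)
print(round(ll, 4))
```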
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
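The three inverse formulas can be checked numerically in a few lines (plain Python; the values of $f$, $A$, $B$ and of the product $g\teps$ are arbitrary illustrations). Each assertion verifies that applying $\transy$ to the inverted expression recovers $\transy(f) + g\teps$, which also pins down the sign of the exponent in each case:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

f, g_eps = 0.3, 0.6      # illustrative values of f(t_ij,psi_i) and g(t_ij,psi_i)*eps_ij

# 1. log transform: log(y) = log(f) + g*eps  <=>  y = f * exp(g*eps)
y = f * math.exp(g_eps)
assert abs(math.log(y) - (math.log(f) + g_eps)) < 1e-12

# 2. logit transform: logit(y) = logit(f) + g*eps  <=>  y = f / (f + (1-f)*exp(-g*eps))
y = f / (f + (1.0 - f) * math.exp(-g_eps))
assert abs(logit(y) - (logit(f) + g_eps)) < 1e-12

# 3. interval [A,B]: tau(y) = log((y-A)/(B-y))
A, B, fab = 1.0, 4.0, 2.0
y = A + (B - A) * (fab - A) / ((fab - A) + (B - fab) * math.exp(-g_eps))
assert abs(math.log((y - A) / (B - y))
           - (math.log((fab - A) / (B - fab)) + g_eps)) < 1e-12

print("all three inversions consistent")
```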
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
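This closed form is just the change-of-variables recipe (8) applied to the log transform: the $1/y_{ij}$ factor is the Jacobian $\transy^\prime(y_{ij})$. A quick numerical cross-check (plain Python, illustrative values of $y$, $f$ and $a$):

```python
import math
from statistics import NormalDist

def lognormal_cond_density(y, f_pred, a):
    """Density of y_ij when log(y_ij) = log(f(t_ij,psi_i)) + a*eps_ij, eps_ij ~ N(0,1):
    the closed form above, with the 1/y_ij Jacobian of the log transform."""
    return math.exp(-0.5 * ((math.log(y) - math.log(f_pred)) / a) ** 2) \
           / (math.sqrt(2.0 * math.pi * a * a) * y)

y, f_pred, a = 3.0, 2.5, 0.4   # illustrative values

# cross-check against (8): tau'(y) * (normal density of tau(y) = log(y))
via_change_of_variables = (1.0 / y) * NormalDist(math.log(f_pred), a).pdf(math.log(y))
assert abs(lognormal_cond_density(y, f_pred, a) - via_change_of_variables) < 1e-12

print(round(lognormal_cond_density(y, f_pred, a), 6))
```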
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||log-VL || cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
<br />
<br />
* '''Interval censoring:''' if a data point lies in an interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||CONC. || llimit || cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png|link=]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||VOLUME || CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other types as special cases: set $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
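For a Gaussian residual model, the two kinds of contribution in (9) are simply a normal pdf and a normal cdf. A minimal sketch for the left-censored case (stdlib Python; the values of $f$, $g$ and $L$ are hypothetical, loosely inspired by the left-censoring table above):

```python
from statistics import NormalDist

def contribution(y_obs, f_pred, g_sd, L):
    """Likelihood contribution of one observation under left censoring at L,
    assuming a Gaussian residual model: the density if y_ij is observed,
    P(y_ij < L | psi_i) -- the integral of that density over (-inf, L) -- otherwise."""
    dist = NormalDist(f_pred, g_sd)
    if y_obs is None:              # reported only as "below L"
        return dist.cdf(L)
    return dist.pdf(y_obs)

# illustrative values: prediction 0, standard deviation 0.5, limit L = -0.30
print(round(contribution(-0.13, 0.0, 0.5, -0.30), 4))   # observed point: a density
print(round(contribution(None, 0.0, 0.5, -0.30), 4))    # censored point: Phi(-0.6)
```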
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
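This decomposition implies that replicates taken at the same time are correlated through the shared inter-measurement term $\vari{\teps}{ij}{I\!M}$, while the inter-replicate terms are independent. A quick stdlib-Python simulation (arbitrary illustrative values for $f$, $g_{I\!M}$ and $g_{I\!R}$) makes the induced covariance $g_{I\!M}^2$ between two replicates visible:

```python
import random

random.seed(3)

def simulate_replicates(f_pred, g_im, g_ir, K, n_times=50000):
    """y_jk = f + g_IM * eps_j^{IM} + g_IR * eps_jk^{IR}: the inter-measurement term
    is shared by the K replicates of a time point, the inter-replicate term is not."""
    rows = []
    for _ in range(n_times):
        shared = g_im * random.gauss(0.0, 1.0)
        rows.append([f_pred + shared + g_ir * random.gauss(0.0, 1.0)
                     for _ in range(K)])
    return rows

rows = simulate_replicates(f_pred=10.0, g_im=1.0, g_ir=0.5, K=2)
m0 = sum(r[0] for r in rows) / len(rows)
m1 = sum(r[1] for r in rows) / len(rows)
cov = sum((r[0] - m0) * (r[1] - m1) for r in rows) / len(rows)
print(round(cov, 2))   # close to g_im**2 = 1: replicates correlate through eps^{IM}
```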
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
*The observation times $(t_{ij})$<br />
*Possibly some additional regression variables $(x_{ij})$<br />
*Possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
*Possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
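For readers who prefer to see the same data-generating process outside $\mlxtran$, Example 1 can be sketched in plain Python (the individual parameter values and observation times below are hypothetical illustrations):

```python
import math
import random

random.seed(4)

def f(t, A, alpha, B, beta):
    """Structural model of Example 1: a sum of two exponentials."""
    return A * math.exp(-alpha * t) + B * math.exp(-beta * t)

def simulate_individual(times, psi):
    """y_ij = f(t_ij, psi_i) + a * eps_ij, eps_ij ~ N(0,1) (constant error model)."""
    A, alpha, B, beta, a = psi
    return [f(t, A, alpha, B, beta) + a * random.gauss(0.0, 1.0) for t in times]

psi_i = (6.0, 1.5, 4.0, 0.2, 0.3)           # hypothetical individual parameters
y_i = simulate_individual([0.5, 1.0, 2.0, 4.0, 8.0], psi_i)
print([round(v, 2) for v in y_i])
```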
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(\vl(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
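The structural part of Example 2 can be explored with a simple forward-Euler sketch (plain Python; the parameter values are hypothetical and the integrator is a rough illustration, not the solver $\mlxtran$ would use). One useful sanity check: with $\nu=\rho=0$ the stated initial conditions are an equilibrium of the system, so the viral load should not move.

```python
def viral_load(delta, c, beta, p, s, d, nu, rho, t_end, dt=0.001):
    """Forward-Euler sketch of the ODE system of Example 2, started from the
    pre-treatment steady state given for t < t0."""
    N = delta * c / (beta * p)      # non-infected target cells
    I = (s - d * N) / delta         # infected cells
    V = p * I / c                   # viral load
    t = 0.0
    while t < t_end:
        dN = s - beta * (1.0 - nu) * N * V - d * N
        dI = beta * (1.0 - nu) * N * V - delta * I
        dV = p * (1.0 - rho) * I - c * V
        N, I, V = N + dt * dN, I + dt * dI, V + dt * dV
        t += dt
    return V

# hypothetical parameter values; with nu = rho = 0 the steady state is preserved
delta, c, beta, p, s, d = 0.5, 3.0, 1e-5, 100.0, 1e4, 0.01
N0 = delta * c / (beta * p)
V0 = p * ((s - d * N0) / delta) / c
V1 = viral_load(delta, c, beta, p, s, d, nu=0.0, rho=0.0, t_end=1.0)
print(abs(V1 - V0) / V0 < 1e-6)   # True
```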
<br />
<br><br><br />
<br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Introduction_%26_notation&diff=7397Introduction & notation2013-06-24T07:30:18Z<p>Brocco: </p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
Models are attempts to describe observations in a logical, simple way, involving the relationship between measurements, parameters, covariates and so on. If working in a probabilistic framework - as we are here - there will be randomness in the model, involving random variables, probability distributions, errors and more.<br />
<br />
Because of this, we are going to make the following definition of a model in this context: '''a model is a joint probability distribution'''.<br />
<br />
Therefore, [[What is a model? A joint probability distribution! | defining a model means defining a joint probability distribution]], which can then be decomposed into a product of conditional distributions we can perform tasks on: estimation, model selection, simulation, etc.<br />
<br />
This chapter is therefore about defining appropriate probability distributions. We start by introducing some general notation and conventions.<br />
<br />
<br />
* We will call $y_i$ the set of observations recorded on subject $i$, and $\by$ the combined set of observations for all the $N$ individuals: $\by = (y_1, ...,y_N)$. In general, we will use '''bold''' text (like for $\by$) when a variable regroups several individuals. Thus, we write $\psi_i$ for the parameter vector for individual $i$ and $\bpsi$ the parameter vector of a set of individuals, $\bpsi = (\psi_1,\ldots,\psi_N)$.<br />
<br />
<br />
* We note $\qy$ and $\qpsi$ the distributions of $\by$ and $\bpsi$ respectively, $\qcypsi$ the conditional distribution of $\by$ given $\bpsi$, and $\qypsi$ the joint distribution of $\by$ and $\bpsi$. In these (and other distributions), we have placed the variable described by the distribution in the index.<br />
<br />
<br />
* We use the same "$p$" notation for the distribution of a random variable as for its probability density function (pdf).<br />
<br />
<br />
* When there is no ambiguity when working with whole equations, to simplify notation we may omit the indices and simply use the symbol $\pmacro$. For instance, $\qy(\by)$, the pdf of $\by$, becomes $\py(\by)$; both are equivalent. The symbol $\pmacro$ has no meaning on its own, it is completely defined by its arguments.<br />
<br />
<br />
* When the distribution of the individual parameters $\psi_i$ of subject $i$ depends on a vector of individual covariates $c_i$ and a population parameter $\theta$, we may choose to explicitly show this dependence by writing the distribution of $\psi_i$ as $\ppsii(\psi_i;c_i,\theta)$.<br />
<br />
<br />
* When the conditional distribution $\qcyipsii$ of the observations $y_i=(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ depends on regression variables $x_i=(x_{ij}, 1\leq j \leq n_i)$ and source terms $u_i$, (i.e., inputs of a dynamical system such as doses in a pharmacokinetic model), we may choose to explicitly show this dependence, writing the conditional distribution as $\pcyipsii(y_i | \psi_i;x_i,u_i)$.<br />
<br />
<br />
There are two important pieces to the puzzle: the observations $\by$ whose distribution $\qy$ depends on the individual parameters, and the individual parameters $\bpsi$ themselves with distribution $\qpsi$. In the population approach, the base distribution is the joint distribution $\qypsi$ of the observations and individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi).<br />
</math> }}<br />
<br />
In this chapter, we concentrate essentially on these two components: the conditional distribution $\qcypsi$ of the observations, and the distribution $\qpsi$ of the individual parameters.<br />
<br />
Depending on the required complexity of the model, its other components such as covariates, population parameters and design can also be modeled as random variables, but we will not go into such detail in this chapter.<br />
<br />
For each model, we aim to precisely identify the minimal amount of information needed to represent it mathematically, so that it remains possible to implement and analyze. To do this, we will be able to use $\mlxtran$, a powerful formal declarative language that allows us to describe complicated structural and statistical models in a straightforward, intuitive way.<br />
<br />
{{Next<br />
|link=Modeling the individual parameters }}</div>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
Models are attempts to describe observations in a logical, simple way, involving the relationship between measurements, parameters, covariates and so on. If working in a probabilistic framework - as we are here - there will be randomness in the model, involving random variables, probability distributions, errors and more.<br />
<br />
Because of this, we are going to make the following definition of a model in this context: '''a model is a joint probability distribution'''.<br />
<br />
Therefore, [[What is a model? A joint probability distribution! defining a model means defining a joint probability distribution]], which can then be decomposed into a product of conditional distributions we can perform tasks on: estimation, model selection, simulation, etc.<br />
<br />
This chapter is therefore about defining appropriate probability distributions. We start by introducing some general notation and conventions.<br />
<br />
<br />
* We will call $y_i$ the set of observations recorded on subject $i$, and $\by$ the combined set of observations for all the $N$ individuals: $\by = (y_1, ...,y_N)$. In general, we will use '''bold''' text (like for $\by$) when a variable regroups several individuals. Thus, we write $\psi_i$ for the parameter vector for individual $i$ and $\bpsi$ the parameter vector of a set of individuals, $\bpsi = (\psi_1,\ldots,\psi_N)$.<br />
<br />
<br />
* We note $\qy$ and $\qpsi$ the distributions of $\by$ and $\bpsi$ respectively, $\qcypsi$ the conditional distribution of $\by$ given $\bpsi$, and $\qypsi$ the joint distribution of $\by$ and $\bpsi$. In these (and other distributions), we have placed the variable described by the distribution in the index.<br />
<br />
<br />
* We use the same "$p$" notation for the distribution of a random variable as for its probability density function (pdf).<br />
<br />
<br />
* When working with whole equations and there is no ambiguity, we may omit the indices to simplify notation and simply use the symbol $\pmacro$. For instance, $\qy(\by)$, the pdf of $\by$, becomes $\py(\by)$; both are equivalent. The symbol $\pmacro$ has no meaning on its own; it is completely defined by its arguments.<br />
<br />
<br />
* When the distribution of the individual parameters $\psi_i$ of subject $i$ depends on a vector of individual covariates $c_i$ and a population parameter $\theta$, we may choose to explicitly show this dependence by writing the distribution of $\psi_i$ as $\ppsii(\psi_i;c_i,\theta)$.<br />
<br />
<br />
* When the conditional distribution $\qcyipsii$ of the observations $y_i=(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ depends on regression variables $x_i=(x_{ij}, 1\leq j \leq n_i)$ and source terms $u_i$ (i.e., inputs of a dynamical system, such as doses in a pharmacokinetic model), we may choose to explicitly show this dependence, writing the conditional distribution as $\pcyipsii(y_i | \psi_i;x_i,u_i)$.<br />
<br />
<br />
There are two important pieces to the puzzle: the observations $\by$ whose distribution $\qy$ depends on the individual parameters, and the individual parameters $\bpsi$ themselves with distribution $\qpsi$. In the population approach, the base distribution is the joint distribution $\qypsi$ of the observations and individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi).<br />
</math> }}<br />
<br />
In this chapter, we concentrate essentially on these two components: the conditional distribution $\qcypsi$ of the observations, and the distribution $\qpsi$ of the individual parameters.<br />
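As a toy illustration (a Python sketch with made-up parameter values, not part of the wiki's Matlab or $\mlxtran$ tooling), the factorization $\pypsi(\by,\bpsi) = \pcypsi(\by | \bpsi)\ppsi(\bpsi)$ can be checked numerically for a single individual with a Gaussian parameter $\psi$ and a Gaussian observation $y$ given $\psi$:

```python
import math

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# hypothetical one-individual model: psi ~ N(mu_pop, omega^2), y | psi ~ N(psi, a^2)
mu_pop, omega, a = 1.0, 0.5, 0.2
psi, y = 1.3, 1.1

p_psi = normal_pdf(psi, mu_pop, omega)    # p(psi)
p_y_given_psi = normal_pdf(y, psi, a)     # p(y | psi)
p_joint = p_y_given_psi * p_psi           # p(y, psi) = p(y | psi) p(psi)
print(p_joint)
```

Every quantity of interest (likelihood, conditional distributions, etc.) can then be derived from this joint density.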
<br />
Depending on the required complexity of the model, its other components such as covariates, population parameters and design can also be modeled as random variables, but we will not go into such detail in this chapter.<br />
<br />
For each model, we aim to precisely identify the minimal amount of information needed to represent it mathematically, so that it remains possible to implement and analyze. To do this, we will use $\mlxtran$, a powerful formal declarative language that allows us to describe complicated structural and statistical models in a straightforward, intuitive way.<br />
<br />
{{Next<br />
|link=Modeling the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Visualization&diff=7395Visualization2013-06-21T09:45:30Z<p>Brocco: </p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
== Introduction ==<br />
<br />
Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect relationships between variables, visually compare data from different groups, etc. The development of such visual exploration tools poses no methodological problems: it is simple to write Matlab or R code for one's own needs. To illustrate the data visualization part of this chapter, we have created a small Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br />
<br />
It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br />
<br />
Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the resolution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br />
<br />
<br />
<ul><br />
* [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br />
<br><br />
<br />
* [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br />
</ul><br />
<br />
<br />
<br><br />
<br />
== Data exploration ==<br />
<br />
<br />
The following example involves 80 individuals who each receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds; if this happens, we also record the times at which these events occur. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br />
<br />
<br />
<ul><br />
'''id''' the ID number of the patient<br />
<br><br><br />
'''time''' dose administration and observation times<br />
<br><br><br />
'''amt''' the amount of drug administered<br />
<br><br><br />
'''y''' the observations (concentrations and events)<br />
<br><br><br />
'''ytype''' the type of observation: 1=concentration, 2=event<br />
<br><br><br />
'''weight''' a continuous individual covariate<br />
<br><br><br />
'''gender''' a categorical individual covariate (F or M)<br />
<br><br><br />
'''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br />
</ul><br />
<br />
<br />
{{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br />
<br />
<br />
We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
datafile.name='pkrtte_data.csv';<br />
datafile.format='csv'; % can be "csv", "space", "tab" or ";"<br />
<br />
info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br />
info.observation.name={'concentration','hemorrhaging'};<br />
info.observation.type={'continuous','event'};<br />
info.observation.unit={'mg/l',''};<br />
info.covariate.unit={'kg',''};<br />
info.time.unit='h';<br />
<br />
data=readdatapx(datafile,info);<br />
</pre> }}<br />
<br />
<br />
How we graphically represent data depends on the type of data. Often for continuous data we use "spaghetti plots", where all of the observations are given on the same plot, and those for each individual are joined up using line segments. Time-to-event data are usually represented using [https://en.wikipedia.org/wiki/Kaplan-Meier_survival_curve Kaplan-Meier plots], i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
>>exploredatapx(data)<br />
</pre> }}<br />
<br />
<br />
{{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br />
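The Kaplan-Meier estimate used above for the time-to-event data can be computed by hand: at each observed event time, multiply the current survival estimate by one minus the fraction of at-risk subjects who had the event. A minimal sketch (in Python rather than the $\popixplore$ Matlab toolbox, with hypothetical event and censoring times):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function for the first event.
    times  : observation time for each individual
    events : 1 if the event was observed at that time, 0 if censored
    Returns the list of (event time, estimated survival) steps."""
    surv = 1.0
    steps = []
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)  # events at t
        n = sum(1 for ti in times if ti >= t)                               # still at risk
        if d > 0:
            surv *= (1 - d / n)
            steps.append((t, surv))
    return steps

# hypothetical data for 8 subjects (times in h)
print(kaplan_meier([5, 8, 8, 12, 15, 20, 20, 24], [1, 1, 0, 1, 0, 1, 1, 0]))
```

Plotting these steps as a staircase gives the usual Kaplan-Meier curve; censored subjects reduce the at-risk count without producing a step.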
<br />
<br />
When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br />
<br />
<br />
{{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br />
<br />
<br />
{| cellpadding="10" cellspacing="0"<br />
|style = "width:50%"| [[File:exploredata3a.png]] <br />
|style = "width:50%"| [[File:exploredata3b.png]]<br />
|-<br />
|cellspan="2" align="center" style="text-align:center"| ''Distribution of weight and gender per dose group'' <br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text=The data file {{Verbatim|pkrtte_data.csv}} and the Matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Model exploration==<br />
<br />
===Exploring the structural model===<br />
<br />
Suppose that we now want to visualize the following joint model, which can be used for simultaneously modeling PK and time-to-event data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k&=&Cl/V \\<br />
\deriv{A_d} &=& - k_a \, A_d(t) \\<br />
\deriv{A_c} &=& k_a \, A_d(t) - k \, A_c(t) \\<br />
Cc(t) &=& A_c(t)/V \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) .<br />
\end{eqnarray} </math> }}<br />
<br />
Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging for instance). The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br />
We assume that the drug can be administered both intravenously and orally, meaning that the drug can be administered to both the depot and the central compartment.<br />
<br />
We first need to implement this model using $\mlxtran$:<br />
<br />
<br />
{{MLXTran<br />
|name=joint1_model.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
<br />
PK:<br />
depot(type=1,target=Ad)<br />
depot(type=2,target=Ac)<br />
<br />
EQUATION:<br />
k = Cl/V<br />
ddt_Ad = -ka*Ad<br />
ddt_Ac = ka*Ad - k*Ac<br />
Cc = Ac/V<br />
h = h0*exp(gamma*Cc)<br />
</pre>}}<br />
<br />
<br />
Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br />
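For a single oral dose at $t=0$, this two-compartment ODE system has a well-known closed-form solution, which makes it easy to sanity-check what $\mlxplore$ displays. A sketch in Python (the parameter values mirror the project file below; the 50 mg dose is the one used in the example):

```python
import math

def Cc(t, D=50.0, ka=0.5, V=10.0, Cl=0.5):
    # one-compartment model with first-order absorption, single oral dose at t=0:
    # Cc(t) = D*ka / (V*(ka-k)) * (exp(-k*t) - exp(-ka*t)), with k = Cl/V
    k = Cl / V
    return (D * ka) / (V * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))

def hazard(t, h0=0.01, gamma=0.5):
    # hazard driven by the predicted concentration: h(t) = h0 * exp(gamma * Cc(t))
    return h0 * math.exp(gamma * Cc(t))

# concentration (mg/l) and hazard (1/h) four hours after the dose
print(Cc(4.0), hazard(4.0))
```

The hazard follows the concentration profile: it rises during absorption and decays back towards the baseline $h_0$ as the drug is eliminated.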
<br />
The tasks, i.e., how the model is to be used, are then coded as an [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] project:<br />
<br />
<br />
{{MLXPlore<br />
|name=joint1_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint1_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
ka = 0.5<br />
V = 10<br />
Cl = 0.5<br />
h0 = 0.01<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. We can then change the values of these parameters with the sliders to see their impact on the two functions.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br />
<br />
<br />
We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50 mg every 24 hours and a treatment with repeated doses of 25 mg every 12 hours. Only the section {{Verbatim|<DESIGN>}} needs to be modified:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint2_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=0:12:144, amount=25,type=1}<br />
</pre> }}<br />
|image=[[File:exploremodel2.png]] }}<br />
<br />
<br />
We can combine different administrations (oral and intravenous for instance) into one global treatment:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint3_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=6:48:150, amount=25,type=2}<br />
<br />
[TREATMENT]<br />
trt1={adm1, adm2}<br />
</pre> }}<br />
|image= [[File:exploremodel3.png]]<br />
}}<br />
<br />
===Exploring the statistical model===<br />
<br />
One of the main advantages of [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br />
<br />
<br />
{{MLXTran<br />
|name=joint2_model.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br />
<br />
DEFINITION:<br />
V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br />
Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br />
h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br />
<br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
.<br />
.<br />
.<br />
</pre> }}<br />
<br />
<br />
The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br />
<br />
<br />
{{MLXTran<br />
|name=joint4_project.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint2_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
V_pop = 10<br />
Cl_pop = 0.5<br />
h0_pop=0.01<br />
omega_V = 0.2<br />
omega_Cl = 0.3<br />
omega_h0 = 0.2<br />
ka = 0.5<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br />
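These prediction bands can be approximated by simple Monte Carlo simulation: draw the log-normal parameters, compute the prediction for each draw, and take empirical quantiles. A sketch in Python (the closed-form concentration for a single 50 mg oral dose is repeated so the block is self-contained; the population values match the project file above, while the time point, sample size and 90% level are arbitrary choices):

```python
import math, random

random.seed(0)  # reproducible draws

def Cc(t, D=50.0, ka=0.5, V=10.0, Cl=0.5):
    # single oral dose, first-order absorption; k = Cl/V
    k = Cl / V
    return (D * ka) / (V * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))

# log-normal individual parameters: param = pop value * exp(omega * eta), eta ~ N(0,1)
V_pop, Cl_pop = 10.0, 0.5
omega_V, omega_Cl = 0.2, 0.3

t = 10.0
draws = []
for _ in range(2000):
    V = V_pop * math.exp(omega_V * random.gauss(0, 1))
    Cl = Cl_pop * math.exp(omega_Cl * random.gauss(0, 1))
    draws.append(Cc(t, V=V, Cl=Cl))
draws.sort()

median = draws[len(draws) // 2]
lo, hi = draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))]
print(median, (lo, hi))  # median and a 90% prediction interval for Cc at t = 10 h
```

Repeating this over a grid of time points and shading the successive quantile bands reproduces the kind of display shown below.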
<br />
<br />
{{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br />
<br />
<br />
It is possible to introduce covariates into the statistical model, for example by letting the volume depend on weight, and to treat these covariates themselves as random variables. This is useful if, for example, we want to visualize how much of the variation in concentration is due to variation in weight, and how much remains unexplained, attributable to random effects.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br />
<br />
<br />
The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@ARTICLE{popixplore,<br />
author = {POPIX Inria team},<br />
  title = {Popixplore 1.1},<br />
url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br />
}<br />
</bibtex><br />
<bibtex><br />
@ARTICLE{MLXplore,<br />
author = {Lixoft},<br />
title = {MLXPlore 1.0},<br />
url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{macey2000berkeley,<br />
title={Berkeley Madonna user’s guide},<br />
author={Macey, R. and Oster, G. and Zahnley, T.},<br />
journal={Berkeley (CA): University of California},<br />
year={2000}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{chatterjee2009sensitivity,<br />
title={Sensitivity analysis in linear regression},<br />
author={Chatterjee, S. and Hadi, A. S.},<br />
volume={327},<br />
year={2009},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{sensibilité2013,<br />
title={Analyse de sensibilité et exploration de modèles},<br />
  author={Faivre, R. and Iooss, B. and Mahévas, S. and Makowski, D. and Monod, H.},<br />
year={2013},<br />
publisher={Editions Quae}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2000sensitivity,<br />
title={Sensitivity analysis},<br />
author={Saltelli, A. and Chan, K. and Scott, E. M. and others},<br />
volume={134},<br />
year={2000},<br />
publisher={Wiley New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2008global,<br />
title={Global sensitivity analysis: the primer},<br />
author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br />
year={2008},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2004sensitivity,<br />
title={Sensitivity analysis in practice: a guide to assessing scientific models},<br />
author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Next<br />
|link=Modeling}}</div>Brocco
title={Sensitivity analysis in linear regression},<br />
author={Chatterjee, S. and Hadi, A. S.},<br />
volume={327},<br />
year={2009},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{sensibilité2013,<br />
title={Analyse de sensibilité et exploration de modèles},<br />
author={Faivre R. and Looss B. and Mah&eacute;vas, S. and Makowski, D. and Monod, H.},<br />
year={2013},<br />
publisher={Editions Quae}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2000sensitivity,<br />
title={Sensitivity analysis},<br />
author={Saltelli, A. and Chan, K. and Scott, E. M. and others},<br />
volume={134},<br />
year={2000},<br />
publisher={Wiley New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2008global,<br />
title={Global sensitivity analysis: the primer},<br />
author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br />
year={2008},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2004sensitivity,<br />
title={Sensitivity analysis in practice: a guide to assessing scientific models},<br />
author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Next<br />
|link=Modeling}}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Visualization&diff=7393Visualization2013-06-21T09:19:18Z<p>Brocco: /* Introduction */</p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formulas, you can either try another browser, or use this link, which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
== Introduction ==<br />
<br />
Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, where we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect relationships between variables, visually compare data from different groups, etc. Developing such visual exploration tools poses no methodological problems: it is simple to write Matlab or R code for one's own needs. To illustrate the data visualization part of this chapter, we have created a small Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br />
<br />
It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br />
<br />
Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the resolution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br />
<br />
<br />
<ul><br />
* $\mlxplore$ uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br />
<br><br />
<br />
* $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br />
</ul><br />
<br />
<br />
<br><br />
<br />
== Data exploration ==<br />
<br />
<br />
The following example involves 80 individuals who each receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds; if this happens, we also record the times at which it occurs. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br />
<br />
<br />
<ul><br />
'''id''' the ID number of the patient<br />
<br><br><br />
'''time''' dose administration and observation times<br />
<br><br><br />
'''amt''' the amount of drug administered<br />
<br><br><br />
'''y''' the observations (concentrations and events)<br />
<br><br><br />
'''ytype''' the type of observation: 1=concentration, 2=event<br />
<br><br><br />
'''weight''' a continuous individual covariate<br />
<br><br><br />
'''gender''' a categorical individual covariate (F or M)<br />
<br><br><br />
'''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br />
</ul><br />
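To make this layout concrete, here is a small sketch of the same column structure in Python (using only the standard library rather than the Matlab toolbox; the rows and the "." missing-value marker are made-up illustrations, not the contents of the actual file):<br />

```python
import csv, io

# Hypothetical rows mimicking the layout of pkrtte_data.csv: dose records
# carry an amount (amt) and no observation, while observation records carry
# y and its type (1 = concentration, 2 = event). "." marks a missing value.
raw = """id,time,amt,y,ytype,weight,gender,group
1,0,40,.,.,66.7,M,A
1,0.5,.,0.36,1,66.7,M,A
1,4,.,1.52,1,66.7,M,A
1,12,.,1,2,66.7,M,A
2,0,60,.,.,59.1,F,B
2,1,.,0.81,1,59.1,F,B
"""
records = list(csv.DictReader(io.StringIO(raw)))

# Split the records by their role, as a reader like readdatapx must do
doses = [r for r in records if r["amt"] != "."]
conc = [r for r in records if r["ytype"] == "1"]    # continuous observations
events = [r for r in records if r["ytype"] == "2"]  # time-to-event records
```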
<br />
<br />
{{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br />
<br />
<br />
We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
datafile.name='pkrtte_data.csv';<br />
datafile.format='csv'; % can be "csv", "space", "tab" or ";"<br />
<br />
info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br />
info.observation.name={'concentration','hemorrhaging'};<br />
info.observation.type={'continuous','event'};<br />
info.observation.unit={'mg/l',''};<br />
info.covariate.unit={'kg',''};<br />
info.time.unit='h';<br />
<br />
data=readdatapx(datafile,info);<br />
</pre> }}<br />
<br />
<br />
How we graphically represent data depends on the type of data. For continuous data we often use "spaghetti plots", where all of the observations are shown on the same plot and those of each individual are joined up by line segments. Time-to-event data are usually represented using Kaplan-Meier plots, i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br />
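The Kaplan-Meier estimate mentioned above is easy to compute directly. A sketch in Python (the event times below are made up for illustration):<br />

```python
# Product-limit (Kaplan-Meier) estimate of the survival function S(t) for
# right-censored time-to-event data: at each distinct event time t, S is
# multiplied by (1 - d/n), where d is the number of events at t and n the
# number of subjects still at risk just before t.
def kaplan_meier(times, observed):
    """times: event or censoring times; observed: True if event, False if censored."""
    pairs = sorted(zip(times, observed))
    n_at_risk = len(pairs)
    surv, curve = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = sum(1 for tt, obs in pairs if tt == t and obs)  # events at t
        m = sum(1 for tt, _ in pairs if tt == t)            # subjects leaving at t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= m
        i += m
    return curve

# Hypothetical first-hemorrhage times (h); False marks a censored subject
curve = kaplan_meier([2, 3, 3, 5, 8, 8], [True, True, False, True, True, False])
```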
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
>>exploredatapx(data)<br />
</pre> }}<br />
<br />
<br />
{{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br />
<br />
<br />
When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br />
<br />
<br />
{{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br />
<br />
<br />
{| cellpadding="10" cellspacing="0"<br />
|style = "width:50%"| [[File:exploredata3a.png]] <br />
|style = "width:50%"| [[File:exploredata3b.png]]<br />
|-<br />
|cellspan="2" align="center" style="text-align:center"| ''Distribution of weight and gender per dose group'' <br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text=The data file {{Verbatim|pkrtte_data.csv}} and the matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Model exploration==<br />
<br />
===Exploring the structural model===<br />
<br />
Suppose that we now want to visualize the following joint model which is one that can be used for simultaneously modeling PK and time-to-event data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k&=&Cl/V \\<br />
\deriv{A_d} &=& - k_a \, A_d(t) \\<br />
\deriv{A_c} &=& k_a \, A_d(t) - k \, A_c(t) \\<br />
Cc(t) &=& A_c(t)/V \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) .<br />
\end{eqnarray} </math> }}<br />
<br />
Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging, for instance). The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br />
We assume that the drug can be administered both intravenously and orally, meaning that the drug can be administered to both the depot and the central compartment.<br />
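For a single oral dose this ODE system has a well-known closed-form solution, which gives a quick way to sanity-check a numerical implementation. A sketch in Python (the parameter values are the ones used in the $\mlxplore$ project below; the choice of Python rather than $\mlxtran$ is purely for illustration):<br />

```python
import math

# Closed-form solution of the first-order absorption model after a single
# oral dose D given at t = 0:
#   Ac(t) = D*ka/(ka - k) * (exp(-k*t) - exp(-ka*t)),   with k = Cl/V
def concentration(t, D, ka, V, Cl):
    k = Cl / V
    return (D * ka / (V * (ka - k))) * (math.exp(-k * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    # exponential link between concentration and event risk: h = h0*exp(gamma*Cc)
    return h0 * math.exp(gamma * concentration(t, D, ka, V, Cl))

# Parameter values from the joint1_project.txt example
D, ka, V, Cl, h0, gamma = 50.0, 0.5, 10.0, 0.5, 0.01, 0.5
cc = [concentration(t, D, ka, V, Cl) for t in range(0, 101)]
t_max = cc.index(max(cc))   # time of peak concentration (on a 1 h grid)
```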
<br />
We first need to implement this model using $\mlxtran$:<br />
<br />
<br />
{{MLXTran<br />
|name=joint1_model.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
<br />
PK:<br />
depot(type=1,target=Ad)<br />
depot(type=2,target=Ac)<br />
<br />
EQUATION:<br />
k = Cl/V<br />
ddt_Ad = -ka*Ad<br />
ddt_Ac = ka*Ad - k*Ac<br />
Cc = Ac/V<br />
h = h0*exp(gamma*Cc)<br />
</pre>}}<br />
<br />
<br />
Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br />
<br />
The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br />
<br />
<br />
{{MLXPlore<br />
|name=joint1_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint1_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
ka = 0.5<br />
V = 10<br />
Cl = 0.5<br />
h0 = 0.01<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. We can then change the values of these parameters with the sliders to see their impact on the two functions.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br />
<br />
<br />
We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50mg every 24 hours to a treatment with repeated doses of 25mg every 12 hours. Only the section {{Verbatim|<DESIGN>}} needs to be modified:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint2_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=0:12:144, amount=25,type=1}<br />
</pre> }}<br />
|image=[[File:exploremodel2.png]] }}<br />
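Because this model is linear in the dose, the multiple-dose concentration can also be obtained by superposition of single-dose profiles, which gives an independent cross-check of the comparison between the two regimens. A sketch in Python, reusing the closed-form one-compartment solution (an assumption of this illustration, not part of the $\mlxplore$ workflow):<br />

```python
import math

def single_dose_cc(t, D, ka, V, Cl):
    # Closed-form concentration after one oral dose D given at time 0
    if t < 0:
        return 0.0
    k = Cl / V
    return (D * ka / (V * (ka - k))) * (math.exp(-k * t) - math.exp(-ka * t))

def regimen_cc(t, doses, ka, V, Cl):
    # Superposition principle: sum the contributions of every past dose (tau, D)
    return sum(single_dose_cc(t - tau, D, ka, V, Cl) for tau, D in doses)

ka, V, Cl = 0.5, 10.0, 0.5
reg_a = [(tau, 50.0) for tau in range(0, 145, 24)]   # 50 mg every 24 h
reg_b = [(tau, 25.0) for tau in range(0, 145, 12)]   # 25 mg every 12 h

# Same daily dose, but the 12-hourly regimen fluctuates less near steady state
cc_a = [regimen_cc(t, reg_a, ka, V, Cl) for t in range(100, 145)]
cc_b = [regimen_cc(t, reg_b, ka, V, Cl) for t in range(100, 145)]
swing_a = max(cc_a) - min(cc_a)   # peak-trough swing, regimen A
swing_b = max(cc_b) - min(cc_b)   # peak-trough swing, regimen B
```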
<br />
<br />
We can combine different administrations (oral and intravenous for instance) into one global treatment:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint3_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=6:48:150, amount=25,type=2}<br />
<br />
[TREATMENT]<br />
trt1={adm1, adm2}<br />
</pre> }}<br />
|image= [[File:exploremodel3.png]]<br />
}}<br />
<br />
===Exploring the statistical model===<br />
<br />
One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br />
<br />
<br />
{{MLXTran<br />
|name=joint2_model.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br />
<br />
DEFINITION:<br />
V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br />
Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br />
h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br />
<br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
.<br />
.<br />
.<br />
</pre> }}<br />
<br />
<br />
The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$, together with the parameters $k_a$ and $\gamma$, which have no inter-individual variability.<br />
<br />
<br />
{{MLXTran<br />
|name=joint4_project.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint2_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
V_pop = 10<br />
Cl_pop = 0.5<br />
h0_pop=0.01<br />
omega_V = 0.2<br />
omega_Cl = 0.3<br />
omega_h0 = 0.2<br />
ka = 0.5<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br />
<br />
<br />
Covariates can also be introduced into the statistical model, for example by letting the volume depend on the weight and treating the covariates themselves as random variables. This can be important if, for example, we want to visualize how much of the variation in concentration is due to variation in weight, and how much remains unexplained and is attributed to random effects.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br />
<br />
<br />
The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@ARTICLE{popixplore,<br />
author = {POPIX Inria team},<br />
title = {Popixplore 1.0},<br />
url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br />
}<br />
</bibtex><br />
<bibtex><br />
@ARTICLE{MLXplore,<br />
author = {Lixoft},<br />
title = {MLXPlore 1.0},<br />
url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{macey2000berkeley,<br />
title={Berkeley Madonna user’s guide},<br />
author={Macey, R. and Oster, G. and Zahnley, T.},<br />
journal={Berkeley (CA): University of California},<br />
year={2000}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{chatterjee2009sensitivity,<br />
title={Sensitivity analysis in linear regression},<br />
author={Chatterjee, S. and Hadi, A. S.},<br />
volume={327},<br />
year={2009},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{sensibilité2013,<br />
title={Analyse de sensibilité et exploration de modèles},<br />
author={Faivre R. and Looss B. and Mah&eacute;vas, S. and Makowski, D. and Monod, H.},<br />
year={2013},<br />
publisher={Editions Quae}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2000sensitivity,<br />
title={Sensitivity analysis},<br />
author={Saltelli, A. and Chan, K. and Scott, E. M. and others},<br />
volume={134},<br />
year={2000},<br />
publisher={Wiley New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2008global,<br />
title={Global sensitivity analysis: the primer},<br />
author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br />
year={2008},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2004sensitivity,<br />
title={Sensitivity analysis in practice: a guide to assessing scientific models},<br />
author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Next<br />
|link=Modeling}}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=The_SAEM_algorithm_for_estimating_population_parameters&diff=7392The SAEM algorithm for estimating population parameters2013-06-21T09:17:28Z<p>Brocco: </p>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful tool for nonlinear mixed-effects models (NLMEM), known to accurately estimate population parameters and to have good theoretical properties: it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_1^{(k)})$, ..., $ (\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will give a general description of the algorithm, highlighting its connection with the EM algorithm, and show by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also give some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize convergence by using several [http://en.wikipedia.org/wiki/Markov_chain Markov chains] per individual. Also, a simulated annealing version of SAEM improves the chances of converging to the global maximum of the likelihood rather than to a local maximum.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem, since an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a stationary point of the observed likelihood under mild regularity conditions.<br />
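For a model where the E-step is tractable, EM can be written in a few lines. A sketch in Python, with a deliberately simple linear Gaussian model $y_i = \psi_i + \varepsilon_i$, $\psi_i \sim {\cal N}(\mu,\omega^2)$, $\varepsilon_i \sim {\cal N}(0,\sigma^2)$, where $\omega$ and $\sigma$ are assumed known so that both steps have closed forms:<br />

```python
# EM for y_i = psi_i + eps_i with psi_i ~ N(mu, omega^2), eps_i ~ N(0, sigma^2);
# omega^2 and sigma^2 are known, only mu is estimated.
def em_mu(y, omega2, sigma2, mu0=0.0, n_iter=100):
    mu = mu0
    w = omega2 / (omega2 + sigma2)            # shrinkage weight
    for _ in range(n_iter):
        # E-step: conditional expectation of each psi_i given y_i and mu
        psi_hat = [mu + w * (yi - mu) for yi in y]
        # M-step: maximize Q_k(mu), which here is the mean of the psi_hat
        mu = sum(psi_hat) / len(psi_hat)
    return mu

y = [1.2, 0.7, 2.1, 1.6, 0.9]
mu_hat = em_mu(y, omega2=1.0, sigma2=0.25)
```

For this model the MLE of $\mu$ is simply the empirical mean of the $y_i$ (since $y_i \sim {\cal N}(\mu, \omega^2+\sigma^2)$), and the EM iterations converge geometrically to it.<br />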
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in closed form, it can be approximated by simulation. For instance:<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
<br />
<br />
<br><br />
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $ { \psi_i^{(k,1)} } ,\ldots , { \psi_i^{(k,L)} } $ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The approximation step of SAEM simplifies to a general Robbins-Monro-type scheme for approximating this conditional expectation:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
<br />
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. A well-known alternative consists of using the Metropolis-Hastings algorithm: introduce a transition probability whose unique invariant distribution is the conditional distribution we want to simulate.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N} s_{k,2} - \mu_k \mu_k^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
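The stochastic approximation and maximization steps above reduce to a few lines of array arithmetic. Below is a minimal sketch (not the $\monolix$ implementation), assuming the simulated transformed parameters $h(\psi_i^{(k)})$ returned by the simulation step are stacked as the rows of an array `H`:

```python
import numpy as np

def saem_update(H, s1, s2, gamma_k):
    """One stochastic-approximation + maximization step for (mu, Omega).

    H: (N, d) array whose rows are the h(psi_i^(k)) drawn in the simulation step.
    (s1, s2): current approximations of the two sufficient statistics.
    """
    N = H.shape[0]
    s1 = s1 + gamma_k * (H.sum(axis=0) - s1)      # approximates sum_i h(psi_i)
    s2 = s2 + gamma_k * (H.T @ H - s2)            # approximates sum_i h(psi_i) h(psi_i)^T
    mu = s1 / N
    Omega = s2 / N - np.outer(mu, mu)             # M-step: mean and covariance
    return s1, s2, mu, Omega

# With gamma_k = 1 and zero initial statistics, a single update returns the
# empirical mean and (biased) covariance of the simulated h(psi_i^(k)):
H = np.random.default_rng(0).normal(size=(50, 3))
s1, s2, mu, Omega = saem_update(H, np.zeros(3), np.zeros((3, 3)), 1.0)
```

With smaller $\gamma_k$, the same function blends the new simulation into the running statistics instead of replacing them.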
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ of the random effects is positive-definite, then, writing $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function both of $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
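As a sketch, the additional statistic and the resulting update of $a^2$ can be coded as follows, assuming all observations and the corresponding predictions $f(t_{ij},\psi_i^{(k)})$ have been flattened into two arrays:

```python
import numpy as np

def update_residual_variance(y, f_pred, s3, gamma_k):
    """Stochastic approximation of S3 and resulting update of a^2.

    y, f_pred: flat arrays of all y_ij and all f(t_ij, psi_i^(k));
    y.size is the total number of observations, sum_i n_i.
    """
    S3 = np.sum((y - f_pred) ** 2)        # tilde S_3(y, psi^(k))
    s3 = s3 + gamma_k * (S3 - s3)         # stochastic approximation
    a2 = s3 / y.size                      # M-step for the residual variance
    return s3, a2

# With gamma_k = 1 this reduces to the usual residual-variance estimate:
y = np.array([1.0, 2.0, 3.0, 4.0])
f_pred = np.array([1.5, 2.0, 2.5, 4.0])
s3, a2 = update_residual_variance(y, f_pred, s3=0.0, gamma_k=1.0)
```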
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
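The recommended step-size schedule can be sketched as a one-line function ($K_1$ is an assumed tuning constant chosen by the user):

```python
def step_size(k, K1, alpha=1.0):
    """gamma_k = 1 during the first K1 iterations, then (k - K1)^(-alpha)."""
    return 1.0 if k <= K1 else float(k - K1) ** (-alpha)
```

With `K1 = 40` this reproduces the third setting of the example below: $\gamma_k=1$ for $k\leq 40$ and $\gamma_k = 1/(k-40)$ afterwards.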
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the "solution". It is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$ for $k=1, \ldots, 40$ and $\gamma_k = 1/(k-40)$ for $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in case 1, but the standard deviation of its fluctuations is reduced by a factor $\sqrt{10}$; in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S(\bpsi)} ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
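Everything in this toy model is explicit, so the constant-step behavior is easy to reproduce numerically. Here is a minimal simulation (the values of $N$, $\omega^2$, $\sigma^2$, the initial guess and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: psi_i ~ N(theta, omega^2), y_i ~ N(psi_i, sigma^2)
N, omega2, sigma2 = 1000, 1.0, 1.0
y = rng.normal(rng.normal(1.0, np.sqrt(omega2), N), np.sqrt(sigma2))
theta_hat = y.mean()                            # the explicit MLE

a = (1 / omega2) / (1 / sigma2 + 1 / omega2)    # shrinkage coefficient
gam2 = 1.0 / (1 / sigma2 + 1 / omega2)          # conditional variance

# SAEM with gamma_k = 1: an AR(1) chain fluctuating around theta_hat
theta, trace = 5.0, []
for k in range(500):
    psi_k = rng.normal(a * theta + (1 - a) * y, np.sqrt(gam2))  # simulation step
    theta = psi_k.mean()                                        # maximization step
    trace.append(theta)

stationary = np.array(trace[100:])              # discard the burn-in
```

The chain forgets the deliberately poor initial value $\theta_0=5$ geometrically (at rate $a$) but never settles down: it keeps oscillating around $\hat{\theta}$ with the stationary variance given above.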
<br />
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ behaves during the first $K_1$ iterations like a homogeneous Markov chain fluctuating around $\hat{\theta}$, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
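This combined schedule can be checked numerically on the same toy model (again a sketch; the constants $K_1=20$, $K_2=480$ and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: psi_i ~ N(theta, omega^2), y_i ~ N(psi_i, sigma^2)
N, omega2, sigma2 = 1000, 1.0, 1.0
y = rng.normal(rng.normal(1.0, np.sqrt(omega2), N), np.sqrt(sigma2))
theta_hat = y.mean()                            # target: the explicit MLE

a = (1 / omega2) / (1 / sigma2 + 1 / omega2)
gam2 = 1.0 / (1 / sigma2 + 1 / omega2)

theta, K1, K2 = 5.0, 20, 480
for k in range(1, K1 + K2 + 1):
    gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)                # combined strategy
    psi_k = rng.normal(a * theta + (1 - a) * y, np.sqrt(gam2))  # simulation step
    theta += gamma_k * (psi_k.mean() - theta)   # stochastic approximation + M-step
```

The $K_1$ constant-step iterations move $\theta_k$ quickly into a neighborhood of $\hat{\theta}$; the decreasing steps then average out the simulation noise.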
<br />
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T\geq0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ defines the same type of model, but with the residual error variance $a^2$ replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects replaced by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (which implicitly contain the initial temperature $T_0$) and setting $ a^2_k = \max(\tau \, a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)})) $ and $ \Omega_k = \max(\tau \, \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)})) $ during the first iterations. Here, $0\leq\tau\leq 1$.<br />
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
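A sketch of this annealed update, assuming (as one possible reading of the $\max$ for the matrix case, which the text leaves implicit) that it is applied elementwise to the diagonal variances of $\Omega$:

```python
import numpy as np

def annealed_update(a2_prev, omega2_prev, a2_hat, omega2_hat, tau):
    """Annealing step: no variance may shrink by more than a factor tau per iteration.

    omega2_prev, omega2_hat: 1-D arrays holding the diagonal of Omega
    (the elementwise treatment of the max is an assumption, not stated in the text).
    """
    a2 = max(tau * a2_prev, a2_hat)
    omega2 = np.maximum(tau * omega2_prev, omega2_hat)
    return a2, omega2

# Large initial variances decay geometrically until the current estimates take over:
a2_k, omega2_k = annealed_update(4.0, np.array([9.0, 9.0]),
                                 a2_hat=1.0, omega2_hat=np.array([0.5, 10.0]),
                                 tau=0.95)
```

In this call, the first random-effect variance is held at $0.95\times 9 = 8.55$ (the estimate $0.5$ is too small to be accepted yet), while the second jumps directly to its estimate $10$.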
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance $= -2\,\log {\like} = 816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
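The equivalence of the two parametrizations used in this example can be checked numerically (a minimal sketch; the dose $D$ and the time grid are arbitrary assumed values):

```python
import numpy as np

def f(t, ka, V, ke, D=100.0):
    """One-compartment oral-administration prediction; the dose D is an assumed value."""
    return D * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0.1, 24.0, 50)
ka, V, ke = 1.0, 8.0, 0.25

# Exchanging (ka, V, ke) for (ke, V*ke/ka, ka) leaves the prediction unchanged,
# which is exactly what creates the second (local) maximum of the likelihood:
same = np.allclose(f(t, ka, V, ke), f(t, ke, V * ke / ka, ka))
```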
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641--678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and its interfaces},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentr&eacute;, F.},<br />
journal={Computational statistics & data analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=The_SAEM_algorithm_for_estimating_population_parameters&diff=7391The SAEM algorithm for estimating population parameters2013-06-21T09:16:49Z<p>Brocco: </p>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has proven to be a very powerful tool for nonlinear mixed-effects models (NLMEM): it accurately estimates population parameters and has good theoretical properties; in fact, it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_1^{(k)})$, ..., $ (\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and present by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also give some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize the algorithm's convergence by using several [http://en.wikipedia.org/wiki/Markov_chain Markov chains] per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to local maxima.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem: an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta) $.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a<br />
stationary point of the observed likelihood under mild regularity conditions.<br />
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in a closed-form, it can be approximated by simulation. For instance,<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
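The contrast between the two approximations can be sketched abstractly. Here `complete_loglik` is a placeholder for $\log \pmacro(\by,\bpsi;\theta)$ evaluated at a given $\theta$, and the draws stand for simulated values of the non-observed $\bpsi$:

```python
import numpy as np

def mcem_E_step(complete_loglik, psi_draws):
    """MCEM: Monte Carlo average over many independent draws of psi."""
    return np.mean([complete_loglik(psi) for psi in psi_draws])

def saem_E_step(Q_prev, complete_loglik, psi_k, gamma_k):
    """SAEM: a single draw psi^(k) combined with a stochastic approximation."""
    return Q_prev + gamma_k * (complete_loglik(psi_k) - Q_prev)

# Toy placeholder: with complete_loglik(psi) = psi, MCEM averages its draws,
# while SAEM moves Q only a fraction gamma_k toward the newly simulated value.
Q_mcem = mcem_E_step(lambda psi: psi, [1.0, 2.0, 3.0])
Q_saem = saem_E_step(Q_prev=0.0, complete_loglik=lambda psi: psi,
                     psi_k=4.0, gamma_k=0.5)
```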
<br />
<br />
<br><br />
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $ \psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)} $ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The stochastic approximation step of SAEM then simplifies to a standard Robbins-Monro-type scheme for approximating the conditional expectation of this sufficient statistic:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
<br />
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear models or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition probability which has as unique invariant distribution the conditional distribution we want to simulate.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N} s_{k,2} - \frac{1}{N^2}\, s_{k,1}s_{k,1}^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
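These two updating steps are easy to express in code. The sketch below (Python with NumPy; the function and variable names are ours, not from $\monolix$) implements the stochastic approximation and maximization steps for given draws $h(\psi_i^{(k)})$; the simulation step itself would be supplied by a Metropolis-Hastings sampler.

```python
import numpy as np

def sa_and_m_step(s1, s2, h_psi, gamma):
    """One stochastic-approximation + maximization step of SAEM.

    h_psi : (N, d) array of transformed parameters h(psi_i^{(k)})
            produced by the simulation (Metropolis-Hastings) step.
    s1    : (d,)   running approximation of sum_i h(psi_i)
    s2    : (d, d) running approximation of sum_i h(psi_i) h(psi_i)^T
    gamma : step-size gamma_k
    """
    N = h_psi.shape[0]
    s1 = s1 + gamma * (h_psi.sum(axis=0) - s1)
    s2 = s2 + gamma * (h_psi.T @ h_psi - s2)
    mu = s1 / N
    Omega = s2 / N - np.outer(s1, s1) / N**2   # = s2/N - mu mu^T
    return s1, s2, mu, Omega

# With gamma = 1, a single step returns the empirical mean and the
# (biased) empirical covariance of the current draws:
rng = np.random.default_rng(0)
h = rng.normal(size=(100, 3))
s1, s2, mu, Omega = sa_and_m_step(np.zeros(3), np.zeros((3, 3)), h, 1.0)
```

With $\gamma_k<1$ the returned $(\mu,\Omega)$ is instead a smoothed combination of the current and past draws, which is what makes the algorithm a *stochastic approximation*.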
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then noting $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function of both $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
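The extra statistic and the resulting update of $a^2$ can be sketched in the same way (Python with NumPy; names are ours):

```python
import numpy as np

def residual_variance_step(s3, y, f_pred, gamma):
    """Stochastic approximation of S3 and maximization step for a^2.

    y, f_pred : flattened arrays of all observations y_ij and model
                predictions f(t_ij, psi_i^{(k)}); their common length
                is sum_i n_i.
    """
    S3 = np.sum((y - f_pred) ** 2)      # tilde{S}_3(y, psi^{(k)})
    s3 = s3 + gamma * (S3 - s3)
    a2 = s3 / y.size
    return s3, a2

# With gamma = 1, a^2 is simply the mean squared residual:
rng = np.random.default_rng(1)
f_pred = rng.uniform(0, 10, size=500)
y = f_pred + 0.2 * rng.normal(size=500)
s3, a2 = residual_variance_step(0.0, y, f_pred, 1.0)
```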
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
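The recommended two-phase schedule can be written as a small helper (Python; names are ours):

```python
def step_sizes(K1, K2, alpha=1.0):
    """SAEM step-size sequence: gamma_k = 1 for k <= K1 (fast move toward
    a neighborhood of the maximum), then gamma_k = (k - K1)**(-alpha)
    for the next K2 iterations (almost sure convergence when alpha = 1)."""
    return [1.0] * K1 + [(k - K1) ** (-alpha) for k in range(K1 + 1, K1 + K2 + 1)]

gammas = step_sizes(K1=40, K2=60)
```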
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and for $L$, the number of replicates of the individual parameters simulated at each iteration) when estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the "solution". The sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$ for $k=1,\ldots,40$ and $\gamma_k = 1/(k-40)$ for $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and does so quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$ and $\gamma_k = 1$ for $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in setting 1, but with variance reduced by a factor of 10 (i.e., standard deviation reduced by $\sqrt{10}$); in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to try and compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S(\bpsi)} ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
<br />
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, by combining the two strategies, the sequence $(\theta_k)$ is a Markov chain that quickly reaches a neighborhood of $\hat{\theta}$ and oscillates around it during the first $K_1$ iterations, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
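This toy model is easy to simulate end to end. The sketch below (Python with NumPy; the sizes, seed and initial guess are arbitrary choices of ours) runs SAEM with the combined step-size schedule and checks that $\theta_k$ ends up close to the explicit estimator $\hat{\theta}=\bar{y}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, omega2, sigma2, theta_true = 1000, 1.0, 1.0, 2.0
psi = theta_true + np.sqrt(omega2) * rng.normal(size=N)
y = psi + np.sqrt(sigma2) * rng.normal(size=N)
theta_hat = y.mean()                        # explicit MLE: the empirical mean

# Conditional distribution: psi_i | y_i ~ N(a*theta + (1-a)*y_i, gam2)
a = (1 / omega2) / (1 / sigma2 + 1 / omega2)
gam2 = 1 / (1 / sigma2 + 1 / omega2)

theta, K1, K2 = 0.0, 20, 200                # deliberately poor initial guess
for k in range(1, K1 + K2 + 1):
    gamma = 1.0 if k <= K1 else 1.0 / (k - K1)
    psi_k = a * theta + (1 - a) * y + np.sqrt(gam2) * rng.normal(size=N)  # S-step
    theta = theta + gamma * (psi_k.mean() - theta)                        # SA + M-step
```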
<br />
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T>0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ then corresponds to the same normal model, with the residual error variance $a^2$ replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects replaced by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (which implicitly include the initial temperature $T_0$) and setting $ a^2_k = \max(\tau \, a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}))$ and $ \Omega_k = \max(\tau \, \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}))$ during the first iterations, where $0\leq\tau\leq 1$.<br />
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
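The update rule can be sketched for the scalar residual variance (Python; the target value 0.04 and $\tau=0.95$ below are illustrative, and we assume the same rule is applied componentwise to the diagonal of $\Omega$):

```python
def annealed_update(prev, estimate, tau):
    """Simulated-annealing variance update used during the first SAEM
    iterations: the variance may not shrink faster than the factor tau."""
    return max(tau * prev, estimate)

# Starting from a deliberately large variance, the sequence decays
# geometrically until it reaches the level supported by the data:
a2, path = 10.0, []
for _ in range(150):
    a2 = annealed_update(a2, 0.04, tau=0.95)
    path.append(a2)
```

The slow geometric decay is precisely what keeps the conditional distributions flat during the early iterations, letting $(\theta_k)$ escape local maxima.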
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
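The flip-flop identity used in this example can be checked numerically (Python; the dose $D$ is not specified in the text, so the value below is arbitrary):

```python
import math

def f(t, D, ka, V, ke):
    """One-compartment oral model prediction."""
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

D = 100.0                                   # arbitrary dose (not given in the text)
ka, V, ke = 1.0, 8.0, 0.25                  # population parametrization
ka2, V2, ke2 = ke, V * ke / ka, ka          # flip-flop reparametrization

max_gap = max(abs(f(t, D, ka, V, ke) - f(t, D, ka2, V2, ke2))
              for t in (0.5, 1.0, 2.0, 5.0, 10.0, 24.0))
```

The two parametrizations give identical predictions at every time point, which is why the likelihood has two symmetric maxima.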
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641--678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and its interfaces},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentr&eacute;, F.},<br />
journal={Computational statistics & data analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>

Hidden Markov models (Brocco, 2013-06-21)
<hr />
<div><!-- Menu for the Extensions chapter --><br />
<sidebarmenu><br />
+[[Extensions]]<br />
*[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
[http://en.wikipedia.org/wiki/Markov_chain Markov chains] are a useful tool for analyzing categorical longitudinal data. However, sometimes the [https://en.wikipedia.org/wiki/Markov_process Markov process] cannot be directly observed, though some output, dependent on the<br />
(hidden) state, is visible. More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br />
HMMs can be applied in many domains and have proved particularly pertinent in biological contexts. For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br />
<br />
Here, we will consider a parametric framework with [http://en.wikipedia.org/wiki/Markov_chain Markov chains] in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br />
<br />
<br />
<br><br />
<br />
==Mixed hidden Markov models==<br />
<br />
<br />
HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successive visited states are unknown and a set of observations is the only available information to describe the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states. An HMM is thus defined as a pair of processes $(z_j,y_j, j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br />
<br />
<br />
{{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br />
<br />
<br />
In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs.<br />
Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br />
<br />
We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:hmm1"><math><br />
\pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
For each individual $i$, $z_i$ is a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] whose probability distribution is defined by<br />
<br />
<br />
<ul><br />
* the distribution $ \pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br />
<br />
{{Equation1<br />
|equation=<math> \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . </math> }}<br />
<br />
<br />
* the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br />
</ul><br />
<br />
<br />
{{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br />
<br />
<br />
The conditional distribution $\qcyizpsii$ depends on the model chosen for the observations: the distribution of each observation $y_{ij}$ is determined by the current hidden state $z_{ij}$. Let us look at some examples:<br />
<br />
<br />
<br><br />
=== Examples ===<br />
<br />
<br />
1. In a continuous data model, one possibility is to let the residual error model switch randomly between $K$ possible error models, according to a hidden Markov chain.<br />
<br />
<br />
{{Example<br />
|title=Example 1<br />
|text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br />
y_{ij} &=& \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black). Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br />
<br />
<br />
::[[File:hmm1bis.png|link=]]<br />
<br />
}}<br />
<br />
<br />
<br />
2. In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. Such models have been used for describing the evolution of seizures in epileptic patients:<br />
<br />
<br />
{{Example<br />
|title=Example 2<br />
|text= Instead of assuming a single Poisson distribution for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br />
y_{ij} &\sim& {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
:: [[File:hmm2bis.png|link=]]<br />
<br />
}}<br />
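Simulating from such a model is straightforward: first simulate the hidden chain, then the observations given the states. A Python sketch (the transition matrix, intensities and initial distribution below are illustrative values of ours, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values: transition matrix Q, Poisson intensities lam for
# the low/high susceptibility states, initial distribution pi1.
Q = np.array([[0.95, 0.05],
              [0.10, 0.90]])
lam = np.array([1.0, 8.0])
pi1 = np.array([0.5, 0.5])

n = 500
z = np.empty(n, dtype=int)
z[0] = rng.choice(2, p=pi1)
for j in range(1, n):
    z[j] = rng.choice(2, p=Q[z[j - 1]])   # hidden Markov chain
y = rng.poisson(lam[z])                   # counts given the hidden states
```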
<br />
<br />
<br />
<br><br />
<br />
==Distributions of observations==<br />
<br />
<br />
Assuming that the $N$ individuals are independent, the joint pdf is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sdepdf"><math><br />
\pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires integration of the joint conditional distribution $\qcyzipsii$ over the states:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \sum_{z_i \in \mathbf{K}^{n_i} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br />
&=& \sum_{z_i \in \mathbf{K}^{n_i} } \pczipsii(z_i {{!}} \psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br />
&=& \sum_{z_i \in \mathbf{K}^{n_i} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
Though this sum contains $K^{n_i}$ terms, the forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] computes it numerically in only $O(n_i K^2)$ operations.<br />
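The forward recursion is only a few lines of code. The sketch below (Python with NumPy; all names are ours) computes the likelihood and checks it, on a short series, against the brute-force sum over all state sequences; the emission probabilities are random placeholders.

```python
import numpy as np
from itertools import product

def hmm_likelihood(emis, pi1, Q):
    """Forward recursion (the alpha-pass of Baum-Welch).
    emis[j, k] = p(y_j | z_j = k); pi1 = distribution of z_1;
    Q[l, k] = p(z_j = k | z_{j-1} = l).
    Returns p(y_1, ..., y_n) in O(n K^2) operations."""
    alpha = pi1 * emis[0]
    for j in range(1, emis.shape[0]):
        alpha = (alpha @ Q) * emis[j]
    return alpha.sum()

def brute_force_likelihood(emis, pi1, Q):
    """Naive sum over all K**n state sequences, for checking."""
    n, K = emis.shape
    total = 0.0
    for path in product(range(K), repeat=n):
        p = pi1[path[0]] * emis[0, path[0]]
        for j in range(1, n):
            p *= Q[path[j - 1], path[j]] * emis[j, path[j]]
        total += p
    return total

rng = np.random.default_rng(3)
emis = rng.uniform(0.1, 1.0, size=(6, 2))   # placeholder emissions p(y_j | z_j = k)
pi1 = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3],
              [0.2, 0.8]])
lik = hmm_likelihood(emis, pi1, Q)
```

For long series the recursion is run on log-probabilities (or with scaling) to avoid numerical underflow.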
<br />
<br />
<br />
<br><br />
<br />
== Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{Albert1991,<br />
title = "A two state Markov mixture model for a time series of epileptic seizure counts",<br />
author = "Albert, P. S.",<br />
journal = "Biometrics",<br />
volume = "47",<br />
year = "1991",<br />
pages = "1371-1381"}<br />
</bibtex><br />
<bibtex><br />
@article{Altman2007,<br />
title = "Mixed hidden Markov models : an extension of the hidden Markov model to the longitudinal data setting",<br />
author = "Altman, R. M.",<br />
journal = "Journal of the American Statistical Association",<br />
volume = "102",<br />
year = "2007",<br />
pages = "201-210"}<br />
</bibtex><br />
<bibtex><br />
@article{Anisimov2007,<br />
title = "Analysis of responses in migraine modelling using hidden Markov models",<br />
author = "Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.",<br />
journal = "Statistics in Medicine",<br />
volume = "26",<br />
year = "2007",<br />
pages = "4163-4178"}<br />
</bibtex><br />
<bibtex><br />
@book{Cappe2005,<br />
author = "Capp&eacute;, O. and Moulines, E. and Ryd&eacute;n, T.",<br />
title = "Inference in hidden Markov models",<br />
year = "2005",<br />
publisher= "Springer Series in Statistics"}<br />
</bibtex><br />
<bibtex><br />
@article{ChaubertPereira2011,<br />
title = "Markov and Semi-Markov Switching Linear Mixed Models Used to Identify<br />
Forest Tree Growth Components",<br />
author = "Chaubert-Pereira, F. and Gu&eacute;don, Y. and Lavergne, C. and Trottier, C.",<br />
journal = "Biometrics",<br />
volume = "66",<br />
year = "2011",<br />
pages = "753-762"}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012analysis,<br />
title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br />
author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br />
journal={Journal of pharmacokinetics and pharmacodynamics},<br />
pages={1-9},<br />
year={2012},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Maruotti2009,<br />
title = "A semiparametric approach to hidden Markov models under longitudinal<br />
observations",<br />
author = "Maruotti, A. and Ryd&eacute;n, T.",<br />
journal = "Statistics and Computing",<br />
volume = "19",<br />
year = "2009",<br />
pages = "381-393"}<br />
</bibtex><br />
<bibtex><br />
@article{Rabiner1989,<br />
title = "A tutorial on Hidden Markov Models and selected applications in speech recognition",<br />
author = "Rabiner, L. R.",<br />
journal = "Proceedings of the IEEE",<br />
volume = "77",<br />
year = "1989",<br />
pages = "257-286"}<br />
</bibtex><br />
<bibtex><br />
@article{Rijmen2008,<br />
title = "Qualitative longitudinal analysis of symptoms in patients with primary<br />
and metastatic brain tumours",<br />
author = "Rijmen, F. and Ip, E. H. and Rapp, S. and Shaw, E. G.",<br />
journal = "Journal of the Royal Statistical Society - Series A.",<br />
volume = "171, Part 3",<br />
year = "2008",<br />
pages = "739-753"}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack= Mixture models<br />
|linkNext= Stochastic differential equations based models }}</div>

Models for count data (Brocco, 2013-06-21)
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Count data is a special type of statistical data that takes only non-negative integer values $\{0, 1, 2,\ldots\}$ arising from counting something, e.g., the number of seizures, hemorrhages or lesions in a given time period. More precisely, the data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$, where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.<br />
<br />
For the moment, let us assume that all the intervals have the same length. This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.<br />
<br />
We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables taking their values in $\{ 0, 1, 2,\ldots\}$.<br />
<br />
If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br />
<br />
Here, we will only consider parametric distributions. In this context, building a model means defining:<br />
<br />
<br />
<ul><br />
* the parameter function (or "intensity") $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for any individual $i$ that depends on individual parameters $\psi_i$ and possibly the time $t_{ij}$.<br><br />
<br />
* the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br />
</ul><br />
<br />
<br />
The conditional distribution of the observations is therefore written:<br />
<br />
{{Equation1<br />
|equation = <math> \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. </math> }} <br />
<br />
<br />
{{Example<br />
|title=Example<br />
<br />
|text= Let us illustrate this approach for the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution].<br />
A [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with intensity $\lambda$ is defined by its probability mass function:<br />
<br />
{{Equation1|equation=<math> \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. </math>}}<br />
<br />
<br />
::[[File:poisson1.png|link=]]<br />
<br />
<br />
One of the main properties of the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is that $\lambda$ is both the mean and the variance of the distribution:<br />
<br />
{{Equation1|equation=<math>\esp{y} = \var{y} = \lambda </math>}}<br />
<br />
All that remains is to define the Poisson intensity function $ \lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,<br />
<br />
{{Equation1<br />
|equation=<math>\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. </math>}}<br />
}}<br />
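This property can be checked numerically. The following is an illustrative Python sketch (not part of the wiki's $\mlxtran$ code): it computes the Poisson probability mass function and verifies that the total mass is 1 and that the mean and variance both equal $\lambda$.

```python
import math

def poisson_pmf(k, lam):
    """P(y = k) for a Poisson distribution with intensity lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4.0
ks = range(60)                      # truncation; the tail beyond 60 is negligible for lam = 4
probs = [poisson_pmf(k, lam) for k in ks]

total = sum(probs)
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean)**2 * p for k, p in zip(ks, probs))

print(round(total, 6), round(mean, 6), round(var, 6))
```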
<br />
<br />
There are many variations of the Poisson model:<br />
<br />
<br />
<ul><br />
* ''Homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br />
<br><br><br />
* ''Non-homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.<br />
<br><br><br />
* ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an "Imax" model:<br />
<br />
{{Equation1|equation=<math> <br />
\lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br />
</math> }}<br />
<br />
: where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br />
<br />
: This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$ in order to combine a drug effect model with a disease model for instance.<br><br />
<br />
<br />
* Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming for example that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$ is the probability function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br />
<br><br><br />
<br />
* If $y_{ij}$ is the number of a given type of events (seizures, hemorrhages, etc.) in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then $y_{ij}$ is a non-homogeneous Poisson process with Poisson intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see [[Models for time-to-event data]] section).<br />
</ul><br />
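The intensity parameterizations listed above can be sketched as plain Python functions. This is an illustrative sketch only; the parameter values ($\lambda_i=2$, $a_i=0.3$, $\Imax_i=0.8$, $IC_{50,i}=5$) are invented for the example.

```python
# Hypothetical intensity functions for one individual i (illustrative values).

def lambda_homogeneous(t, lam_i=2.0):
    # homogeneous Poisson: constant intensity
    return lam_i

def lambda_linear(t, lam_i=2.0, a_i=0.3):
    # non-homogeneous Poisson: intensity increases linearly with time (months)
    return lam_i + a_i * t

def lambda_imax(t, C, lam_i=2.0, imax_i=0.8, ic50_i=5.0):
    # "Imax" model: drug concentration C at time t reduces the baseline intensity
    return lam_i * (1.0 - imax_i * C / (ic50_i + C))

print(lambda_linear(10))        # intensity after 10 months
print(lambda_imax(10, C=5.0))   # intensity at concentration C = IC50
```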
<br />
<br />
Let us see now some other examples of distributions for count data:<br />
<br />
<br />
<ul><br />
* The inflated [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br />
p_0 + (1-p_0)e^{-\lambda} & {\rm if } \ k=0 \\<br />
(1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } & {\rm if } \ k>0 .<br />
\end{array}<br />
\right.<br />
</math>}}<br />
<br />
:where $0\leq p_0 <1$. This is useful when the data seem generally to follow a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] except for an excess of zeros ($k=0$):<br />
<br />
<br />
::[[File:poisson2.png|link=]]<br />
<br />
<br />
* The [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial distribution] is:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br />
</math>}}<br />
<br />
:with $0\leq p \leq 1$ and $r>0$. If $r$ is an integer, then the [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial (NB) distribution] with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success $p$ before $r$ failures occur.<br />
<br />
<br />
::[[File:poisson3.png|link=]]<br />
<br />
<br />
* The generalized [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is: <br />
<br />
{{Equation1<br />
|equation=<math><br />
\prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br />
</math> }}<br />
<br />
:with $\lambda>0$ and $0\leq \delta <1$.<br />
:The generalized Poisson (GP) distribution includes the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson when $\delta>0$. Indeed, the variance-to-mean ratio is $1/(1-\delta)^2$, which exceeds 1:<br />
<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray} \esp{y} &=& \frac{\lambda}{1-\delta} \\<br />
\var{y} &=& \frac{\lambda}{(1-\delta)^3}.<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
::[[File:poisson4.png|link=]]<br />
</ul><br />
<br />
<br><br><br />
-----------------<br />
<br><br><br />
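These probability mass functions are easy to check numerically. The following illustrative Python sketch verifies that the zero-inflated Poisson sums to 1, that the generalized Poisson reduces to the Poisson when $\delta=0$, and that its mean and variance are $\lambda/(1-\delta)$ and $\lambda/(1-\delta)^3$; the values $\lambda=2$, $\delta=0.4$, $p_0=0.3$ are arbitrary.

```python
import math

def zip_pmf(k, lam, p0):
    """Zero-inflated Poisson: extra probability mass p0 at k = 0."""
    base = lam**k * math.exp(-lam) / math.factorial(k)
    return p0 + (1 - p0) * base if k == 0 else (1 - p0) * base

def gp_pmf(k, lam, delta):
    """Generalized Poisson; reduces to Poisson(lam) when delta = 0."""
    return lam * (lam + k * delta)**(k - 1) * math.exp(-lam - k * delta) / math.factorial(k)

lam, delta = 2.0, 0.4
ks = range(120)                       # truncation; the tail decays geometrically
probs = [gp_pmf(k, lam, delta) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean)**2 * p for k, p in zip(ks, probs))

print(round(mean, 4), round(lam / (1 - delta), 4))     # mean vs. lambda/(1-delta)
print(round(var, 4), round(lam / (1 - delta)**3, 4))   # var  vs. lambda/(1-delta)^3
```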
<br />
{{Summary<br />
|title=Summary<br />
|text=<br />
For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br />
<br />
<br />
<ul><br />
* the probability mass function used to represent the distribution of the data in a given time interval<br />
<br><br><br />
* a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br />
</ul><br />
}}<br />
<br />
<br />
<br><br />
== $\mlxtran$ for count data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1= Example 1: <br />
|title2= Poisson model with time varying intensity<br />
|text=<br />
<br />
|equation=<math> \begin{eqnarray}<br />
\psi_i &=& (\alpha_i,\beta_i) \\[0.3cm]<br />
\lambda(t,\psi_i) &=& \alpha_i + \beta_i\,t \\[0.3cm]<br />
\prob{y_{ij}=k} &=& \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style=" background-color:#EFEFEF; border: none;"> <br />
INPUT:<br />
input = {alpha, beta}<br />
<br />
EQUATION:<br />
lambda = alpha + beta*t<br />
<br />
DEFINITION:<br />
y ~ poisson(lambda)<br />
</pre> }}<br />
}}<br />
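For illustration, the model of Example 1 can be simulated in a few lines of Python. This is a sketch, not the wiki's $\mlxtran$ code: it uses Knuth's multiplication method to draw Poisson variates, and the parameter values $\alpha=1$, $\beta=0.5$ are invented.

```python
import math
import random

random.seed(1234)

def sample_poisson(lam):
    """Draw one Poisson(lam) variate (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# Example 1: lambda(t) = alpha + beta*t, with illustrative parameter values
alpha, beta = 1.0, 0.5
times = range(0, 11)
counts = [sample_poisson(alpha + beta * t) for t in times]
print(counts)  # one simulated sequence of daily counts
```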
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1= Example 2: <br />
|title2= generalized Poisson model<br />
|text=<br />
<br />
|equation=<math> \begin{eqnarray}<br />
\psi_i &=& (\lambda_i,\delta_i) \\<br />
\log\left( \prob{y_{ij}=k} \right) &=& \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br />
&& -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style=" background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {dlt, lbd}<br />
<br />
DEFINITION:<br />
Y = {<br />
type = count,<br />
log(P(Y=k)) = log(lambda)<br />
+ (k-1)*log(lambda+k*delta)<br />
- lambda -k*delta - factln(k)<br />
} </pre> }}<br />
}}<br />
<br />
<br />
<br />
<br><br />
<br />
== Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{blundell2002individual,<br />
title={Individual effects and dynamics in count data models},<br />
author={Blundell, R. and Griffith, R. and Windmeijer, F.},<br />
journal={Journal of Econometrics},<br />
volume={108},<br />
number={1},<br />
pages={113-131},<br />
year={2002},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{bolker2009generalized,<br />
title={Generalized linear mixed models: a practical guide for ecology and evolution},<br />
author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br />
journal={Trends in ecology & evolution},<br />
volume={24},<br />
number={3},<br />
pages={127-135},<br />
year={2009},<br />
publisher={Elsevier Science}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{cameron1998regression,<br />
title={Regression analysis of count data},<br />
author={Cameron, A. C. and Trivedi, P. K.},<br />
volume={30},<br />
year={1998},<br />
publisher={Cambridge University Press}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{christensen2002bayesian,<br />
title={Bayesian prediction of spatial count data using generalized linear mixed models},<br />
author={Christensen, O. F. and Waagepetersen, R.},<br />
journal={Biometrics},<br />
volume={58},<br />
number={2},<br />
pages={280-286},<br />
year={2002},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{fahrmeir1994multivariate,<br />
title={Multivariate statistical modelling based on generalized linear models},<br />
author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br />
volume={2},<br />
year={1994},<br />
publisher={Springer New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{hall2004zero,<br />
title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br />
author={Hall, D. B.},<br />
journal={Biometrics},<br />
volume={56},<br />
number={4},<br />
pages={1030-1039},<br />
year={2004},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{heilbron2007zero,<br />
title={Zero-Altered and other Regression Models for Count Data with Added Zeros},<br />
author={Heilbron, D. C.},<br />
journal={Biometrical Journal},<br />
volume={36},<br />
number={5},<br />
pages={531-547},<br />
year={2007},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lawless1987negative,<br />
title={Negative binomial and mixed Poisson regression},<br />
author={Lawless, J. F.},<br />
journal={Canadian Journal of Statistics},<br />
volume={15},<br />
number={3},<br />
pages={209-225},<br />
year={1987},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lee2006multi,<br />
title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br />
author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G. J.},<br />
journal={Statistical Methods in Medical Research},<br />
volume={15},<br />
number={1},<br />
pages={47-61},<br />
year={2006},<br />
publisher={SAGE Publications}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mcculloch2011generalized,<br />
title={Generalized, Linear, and Mixed Models},<br />
author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br />
isbn={9781118209967},<br />
series={Wiley Series in Probability and Statistics},<br />
url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br />
year={2011},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{min2005random,<br />
title={Random effect models for repeated measures of zero-inflated count data},<br />
author={Min, Y. and Agresti, A.},<br />
journal={Statistical Modelling},<br />
volume={5},<br />
number={1},<br />
pages={1-19},<br />
year={2005},<br />
publisher={SAGE Publications}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{molenberghs2005models,<br />
title={Models for discrete longitudinal data},<br />
author={Molenberghs, G. and Verbeke, G.},<br />
year={2005},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{mullahy1998heterogeneity,<br />
title={Heterogeneity, excess zeros, and the structure of count data models},<br />
author={Mullahy, J.},<br />
journal={Journal of Applied Econometrics},<br />
volume={12},<br />
number={3},<br />
pages={337-350},<br />
year={1998},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{savic2009performance,<br />
title={Performance in population models for count data, part ii: A new saem algorithm},<br />
author={Savic, R. and Lavielle, M.},<br />
journal={Journal of pharmacokinetics and pharmacodynamics},<br />
volume={36},<br />
number={4},<br />
pages={367-379},<br />
year={2009},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{thall1988mixed,<br />
title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br />
author={Thall, P. F.},<br />
journal={Biometrics},<br />
pages={197-209},<br />
year={1988},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{thall1990some,<br />
title={Some covariance models for longitudinal count data with overdispersion},<br />
author={Thall, P. F. and Vail, S. C.},<br />
journal={Biometrics},<br />
pages={657-671},<br />
year={1990},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{tempelman1996mixed,<br />
title={A mixed effects model for overdispersed count data in animal breeding},<br />
author={Tempelman, R. J. and Gianola, D.},<br />
journal={Biometrics},<br />
pages={265-279},<br />
year={1996},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{winkelmann2008econometric,<br />
title={Econometric analysis of count data},<br />
author={Winkelmann, R.},<br />
year={2008},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wolfinger1993generalized,<br />
title={Generalized linear mixed models a pseudo-likelihood approach},<br />
author={Wolfinger, R. and O'Connell, M.},<br />
journal={Journal of statistical Computation and Simulation},<br />
volume={48},<br />
number={3-4},<br />
pages={233-243},<br />
year={1993},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{yau2003zero,<br />
title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br />
author={Yau, K. K. W. and Wang, K. and Lee, A. H.},<br />
journal={Biometrical Journal},<br />
volume={45},<br />
number={4},<br />
pages={437-452},<br />
year={2003},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{zeileis2008regression,<br />
title={Regression models for count data in R},<br />
author={Zeileis, A. and Kleiber, C. and Jackman, S.},<br />
journal={Journal of Statistical Software},<br />
volume={27},<br />
number={8},<br />
pages={1-25},<br />
year={2008}<br />
}<br />
</bibtex><br />
<br />
{{Back&Next<br />
|linkBack=Continuous data models<br />
|linkNext=Model for categorical data }}</div>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, and an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we choose the structural model $f(t) = A\exp\left(-\alpha t \right)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(x_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm$ 1 standard deviations: $[-\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$, the green lines represent an interval of $\pm 1$ standard deviations around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* Another alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
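The four error models above differ only in how the standard deviation $g$ depends on the prediction $f$. As an illustrative Python sketch (the values $a_i=0.5$, $b_i=0.15$ are invented), they can be written as:

```python
import math

# Standard deviation g of the residual error as a function of the prediction f
# (illustrative parameter values).
a_i, b_i = 0.5, 0.15

def g_constant(f):      return a_i                                # constant
def g_proportional(f):  return b_i * f                            # proportional
def g_combined(f):      return a_i + b_i * f                      # combined (affine)
def g_combined2(f):     return math.sqrt(a_i**2 + (b_i * f)**2)   # alternative combined

for f in (1.0, 10.0):
    print(f, g_constant(f), g_proportional(f), g_combined(f), round(g_combined2(f), 3))
```

Note how only the constant model gives the same standard deviation for small and large predictions.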
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}, </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for any $(i,j)$, then $t_{i,j+1}-t_{ij}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &= &\rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
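An AR(1) residual sequence with stationary variance 1 can be simulated as follows. This is an illustrative Python sketch: scaling the innovations by $\sqrt{1-\rho^2}$ keeps the marginal variance of the residuals equal to 1, so the lag-1 empirical autocorrelation should be close to $\rho$.

```python
import math
import random

random.seed(42)

def ar1_residuals(n, rho):
    """Standardized AR(1) residuals: corr(eps_j, eps_{j+tau}) = rho**tau."""
    eps = [random.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho**2)   # keeps the stationary variance at 1
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + scale * random.gauss(0.0, 1.0))
    return eps

eps = ar1_residuals(100000, rho=0.6)
lag1 = sum(e1 * e2 for e1, e2 in zip(eps, eps[1:])) / (len(eps) - 1)
print(round(lag1, 2))  # empirical lag-1 autocorrelation, close to 0.6
```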
<br />
<br />
<br />
<br><br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br />
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors ($\teps_{ij}$).<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
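As an illustration, the Gaussian conditional log-likelihood for one individual can be computed directly. This Python sketch uses the concentration data for individual 1 from the table above; the structural model $f=A e^{-\alpha t}$, constant error model $g=a$, and parameter values are chosen for the example only.

```python
import math

def log_likelihood(y, t, psi, f, g):
    """Gaussian conditional log-likelihood log p(y | psi) for one individual.

    y, t : observations and observation times
    f, g : structural and error models, called as f(t_j, psi)
    """
    ll = 0.0
    for y_j, t_j in zip(y, t):
        mu, sd = f(t_j, psi), g(t_j, psi)
        ll += -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((y_j - mu) / sd)**2
    return ll

# Illustrative use: f = A*exp(-alpha*t), constant error model g = a
f = lambda t, psi: psi["A"] * math.exp(-psi["alpha"] * t)
g = lambda t, psi: psi["a"]
psi = {"A": 10.0, "alpha": 0.3, "a": 0.5}
print(round(log_likelihood([9.84, 8.19, 6.91], [1.0, 2.0, 4.0], psi, f, g), 3))
```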
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
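The equivalence between the transformed and untransformed representations in Examples 1 and 2 can be verified numerically. In this illustrative Python sketch the values of $f$, $g$ and $\teps$ are arbitrary: we apply the transformation, add the error, invert, and compare against the closed-form expression.

```python
import math

def logit(x):  return math.log(x / (1 - x))
def expit(x):  return 1 / (1 + math.exp(-x))

f_ij, g_ij, eps = 0.7, 0.2, 1.3   # illustrative prediction, sd and residual

# log transform: log(y) = log(f) + g*eps  is equivalent to  y = f * exp(g*eps)
y_log = math.exp(math.log(f_ij) + g_ij * eps)
assert abs(y_log - f_ij * math.exp(g_ij * eps)) < 1e-12

# logit transform: logit(y) = logit(f) + g*eps  inverts to
# y = f / (f + (1 - f) * exp(-g*eps)), which stays in (0, 1)
y_logit = expit(logit(f_ij) + g_ij * eps)
assert abs(y_logit - f_ij / (f_ij + (1 - f_ij) * math.exp(-g_ij * eps))) < 1e-12

print(round(y_log, 4), round(y_logit, 4))
```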
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
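As a quick numerical check, this conditional density can be evaluated directly. The sketch below (plain NumPy; the values of $f$ and $a$ are illustrative assumptions, not taken from the text) verifies that it integrates to 1 over $y>0$.<br />

```python
import numpy as np

def lognormal_error_density(y, f, a):
    """Conditional density p(y | psi) when log(y) ~ N(log f, a^2),
    i.e. the log-normal observation model with constant error shown
    above (f and a are assumed given scalars, chosen for illustration)."""
    return np.exp(-((np.log(y) - np.log(f)) ** 2) / (2.0 * a ** 2)) \
        / (np.sqrt(2.0 * np.pi * a ** 2) * y)
```

A density on $(0,\infty)$ should integrate to 1, which can be checked with a simple Riemann sum on a fine grid.<br />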
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
! ID !! TIME !! log-VL !! cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
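The reporting rule used in this table can be sketched in a few lines; the limit $L=-0.30$ and the {{Verbatim|cens}} encoding mirror the example above, and the helper name is hypothetical.<br />

```python
# Sketch of the reporting rule in the table above: a measured
# log-viral load below the limit L is reported as L with cens=1,
# otherwise reported as-is with cens=0 (L = -0.30 as in the example).
def report_left_censored(y, L=-0.30):
    return (L, 1) if y < L else (y, 0)
```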
<br />
<br />
* '''Interval censoring:''' if a data point lies in an interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
! ID !! TIME !! CONC. !! llimit !! cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png|link=]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
! ID !! TIME !! VOLUME !! CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other two types as special cases: set $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
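For a normal residual model, both kinds of contribution in (9) can be computed with nothing more than the error function. The sketch below (a hypothetical helper, standard library only) handles the left-censored case with limit $L$.<br />

```python
import math

def loglik_term(y, f, g, censored=False, L=None):
    """One observation's log-likelihood contribution under
    y = f + g*eps with eps ~ N(0,1): the density term if the point is
    observed, log P(y < L) = log Phi((L - f)/g) if it is left-censored
    (a sketch; f, g, L are assumed given scalars)."""
    if not censored:
        z = (y - f) / g
        return -0.5 * z * z - math.log(g * math.sqrt(2.0 * math.pi))
    z = (L - f) / g
    # standard normal cdf via erf
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
```

When the prediction $f$ sits exactly at the limit $L$, half the probability mass lies below it, so the censored contribution is $\log(1/2)$.<br />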
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
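The inter-measurement / inter-replicate decomposition can be simulated directly; the sketch below assumes scalar values of $f$ and of the two error parameters (all names and values are hypothetical), sharing one inter-measurement draw across the $K$ replicates.<br />

```python
import numpy as np

def simulate_replicates(f, g_im, g_ir, K, rng):
    """Simulate K replicates of one measurement following the last
    equation above: eps_IM is shared by all replicates taken at that
    time, eps_IR varies per replicate (f, g_im, g_ir assumed scalars)."""
    eps_im = rng.standard_normal()      # inter-measurement component, shared
    eps_ir = rng.standard_normal(K)     # inter-replicate components, one each
    return f + g_im * eps_im + g_ir * eps_ir
```

Setting the inter-replicate parameter to zero makes all $K$ replicates identical, which is a convenient sanity check of the decomposition.<br />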
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
*The observation times $(t_{ij})$<br />
*Possibly some additional regression variables $(x_{ij})$<br />
*Possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
*Possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
<br />
<br><br><br />
<br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@book{davidian1995,<br />
author = {Davidian, M. and Giltinan, D.M. },<br />
title = {Nonlinear Models for Repeated Measurements Data },<br />
publisher = {Chapman & Hall.},<br />
address = {London},<br />
edition = {},<br />
year = {1995}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{demidenko2005mixed,<br />
title={Mixed Models: Theory and Applications},<br />
author={Demidenko, E.},<br />
isbn={9780471726135},<br />
series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&redir_esc=y}, <br />
year={2005}, publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{fitzmaurice2008longitudinal,<br />
title={Longitudinal Data Analysis},<br />
author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br />
isbn={9781420011579},<br />
lccn={2008020681},<br />
series={Chapman & Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br />
year={2008},publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{jiang2007,<br />
author = {Jiang, J.},<br />
title = {Linear and Generalized Linear Mixed Models and Their Applications},<br />
publisher = {Springer},<br />
series = {Springer Series in Statistics},<br />
year = {2007},<br />
address = {New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{laird1982,<br />
author = {Laird, N.M. and Ware, J.H.},<br />
title = {Random-Effects Models for Longitudinal Data},<br />
journal = {Biometrics},<br />
volume = {38},<br />
pages = {963-974},<br />
year = {1982}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lindstrom1990Nonlinear,<br />
author = {Lindstrom, M.J. and Bates, D.M. },<br />
title = {Nonlinear mixed-effects models for repeated measures},<br />
journal = {Biometrics},<br />
volume = {46},<br />
pages = {673-687},<br />
year = {1990}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{littell2006sas,<br />
title={SAS for mixed models},<br />
author={Littell, R.C.},<br />
year={2006},<br />
publisher={SAS institute}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mcculloch2011generalized,<br />
title={Generalized, Linear, and Mixed Models},<br />
author={McCulloch, C.E. and Searle, S.R.},<br />
isbn={9781118209967},<br />
series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&redir_esc=y}, year={2004}, publisher={Wiley & Sons} <br />
}<br />
</bibtex><br />
<bibtex><br />
@book{verbeke2009linear,<br />
title={Linear Mixed Models for Longitudinal Data},<br />
author={Verbeke, G. and Molenberghs, G.},<br />
isbn={9781441902993},<br />
lccn={2010483807},<br />
series={Springer Series in Statistics},<br />
url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br />
year={2009},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{west2006linear,<br />
title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br />
author={West, B. and Welch, K.B. and Galecki, A.T.},<br />
isbn={9781584884804},<br />
lccn={2006045440},year={2006},publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>
<hr />
<div>= What is a model? A joint probability distribution! =<br />
==Introduction==<br />
<br />
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br />
<br />
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br />
<br />
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br />
<br />
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- A model is a joint probability distribution. <br />
<br />
- A submodel is a conditional distribution derived from this joint distribution. <br />
<br />
- A task is a specific use of this distribution. <br />
}}<br />
<br />
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br />
<br />
<br />
<br><br />
<br />
==An illustrative example==<br />
<br />
<br><br />
===A model for the observations of a single individual===<br />
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br />
<br />
We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br />
<br />
If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.<br />
<br />
By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are non-random parameters or variables.<br />
When there is no risk of confusion, the non-random terms can be left out of the notation.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
-In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. <br><br />
-The inputs of the model are the parameters $\psi$ and the design $\vt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br />
<br />
{{Equation1<br />
|equation=<math> f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , </math> }}<br />
<br />
where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br />
<br />
{{Equation1<br />
|equation=<math> y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . </math> }}<br />
<br />
Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba1" ><math><br />
y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. </math></div><br />
|reference=(1) }}<br />
<br />
Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br />
As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br />
<br />
{{Equation1<br />
|equation=<math> \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br />
</math> }}<br />
<br />
where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br />
}}<br />
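This single-patient model can be simulated in a few lines. The sketch below follows (1): 500 mg IV bolus, structural model $f(t;V,k)=\frac{500}{V}e^{-kt}$, additive normal residual error; the numerical values of $V$, $k$, $a$ and the sampling times are illustrative assumptions.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of model (1): 500 mg IV bolus, one-compartment elimination,
# additive normal residual error with sd a (V, k, a are illustrative).
V, k, a = 10.0, 0.2, 0.5
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0])

f = 500.0 / V * np.exp(-k * t)            # structural model f(t; V, k)
y = f + a * rng.standard_normal(t.size)   # observations y_j
```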
<br />
<br />
<br />
<br><br />
<br />
=== A model for several individuals ===<br />
<br />
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .</math> }}<br />
<br />
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br />
Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{in_i})$ in the design, and the number of measurements $n_i$ can be different for each individual.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the joint distribution of the observations and the individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math> \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . </math>}}<br />
<br />
- The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br />
:$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Let us suppose $ N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $ V_i$ and $ k_i$ and potentially its own residual error parameter $ a_i$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2a"><math> <br />
y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a_i &=& a \end{eqnarray}</math> }}<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2b"><math>\begin{eqnarray}<br />
\log(V_i) &\sim_{i.i.d.}& {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\log(k_i) &\sim_{i.i.d.}& {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}</math> }}<br />
<br />
where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br />
}}<br />
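The population model (2)-(3) can be sketched as a small simulator: log-normal individual parameters, with weight as a covariate on $V$. All numerical values below are illustrative assumptions, not values from the text.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of (2)-(3): log-normal distributions for V and k, with
# log(weight/70) as covariate on log(V) (values are illustrative).
V_pop, k_pop, beta = 30.0, 0.1, 0.75
omega_V, omega_k = 0.2, 0.2

def draw_individual(w):
    """Draw (V_i, k_i) for a patient of weight w (kg)."""
    V = np.exp(np.log(V_pop) + beta * np.log(w / 70.0)
               + omega_V * rng.standard_normal())
    k = np.exp(np.log(k_pop) + omega_k * rng.standard_normal())
    return V, k
```

A useful property of the log-normal model, visible here, is that the drawn PK parameters are automatically positive.<br />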
<br />
<br><br />
===A model for the population parameters===<br />
<br />
In some cases it can be useful or important to consider the population parameter $\theta$ itself as random rather than fixed. There are various reasons for this, such as wanting to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is involved.<br />
<br />
If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3a"><math><br />
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text= <ol><br />
<li> The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. </li><br><br />
<br />
<li>If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.</li><br><br />
<br />
<li>Not necessarily all of the components of $\theta$ need be random. If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes </li><br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3b"><math><br />
\pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br />
</math></div><br />
|reference=(5) }} <br />
</ol>}}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). </math> }}<br />
<br />
<li> The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $ V_{\rm pop}$ and $k_{\rm pop}$: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba3"><math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) <br />
\end{eqnarray}</math></div><br />
|reference=(6) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}</math> }}<br />
<br />
As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br />
}}<br />
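The hierarchical decomposition (4) with the priors (6) suggests a three-stage simulation: draw the random population parameters first, then an individual's parameters, then the observations. The sketch below follows that order; the values of the fixed parameters $\omega_V$, $\omega_k$, $a$ and the sampling times are illustrative assumptions.<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Stage 1: random population parameters from the priors (6).
V_pop = 30.0 + 3.0 * rng.standard_normal()    # V_pop ~ N(30, 3^2)
k_pop = 0.1 + 0.01 * rng.standard_normal()    # k_pop ~ N(0.1, 0.01^2)
omega_V, omega_k, a = 0.2, 0.2, 0.5           # fixed (illustrative)

# Stage 2: individual parameters given the population parameters.
V_i = np.exp(np.log(V_pop) + omega_V * rng.standard_normal())
k_i = np.exp(np.log(k_pop) + omega_k * rng.standard_normal())

# Stage 3: observations given the individual parameters.
t = np.array([1.0, 2.0, 4.0, 8.0])
y_i = 500.0 / V_i * np.exp(-k_i * t) + a * rng.standard_normal(t.size)
```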
<br />
<br />
<br><br />
<br />
===A model for the covariates===<br />
<br />
<br />
Another scenario is to suppose that it is in fact the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba4"><math><br />
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
<li>In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . </math> }}<br />
<br />
<li>The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We could assume a normal distribution as a prior for the weights: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba4" ><math> <br />
w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). </math></div><br />
|reference=(8) }}<br />
<br />
Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Now, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the measurement times===<br />
<br />
Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not in the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those recorded in the data set), then the "true" measurement times $\bt$ at which the measurements were actually made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br />
In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. }}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br />
</math> }}<br />
<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br />
}}<br />
<br />
{{Example<br />
|title=Example:<br />
|text= Let us assume as prior a normal distribution around the nominal times: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba5" ><math> <br />
t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). </math></div><br />
|reference=(9) }}<br />
<br />
Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $ \bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br />
The inputs of the model are the population parameters $ \theta$, the weights $ (w_i)$ and the nominal measurement times $ \nominal{\bt}$.<br />
}}<br />
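To make this concrete, the sampling in (9) can be sketched in a few lines of Python (the nominal times below are invented for illustration; this sketch is not part of the wiki's $\mlxtran$ models):<br />

```python
import random

random.seed(0)

def simulate_times(nominal_times, sd=0.03):
    # Draw "true" measurement times t_ij ~ N(nominal t_ij, sd^2), as in (9)
    return [random.gauss(t_nom, sd) for t_nom in nominal_times]

nominal = [0.5, 1.0, 2.0, 4.0, 8.0]   # hypothetical nominal design
times = simulate_times(nominal)
```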
<br />
<br />
<br><br />
<br />
===A model for the dose regimen===<br />
<br />
If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br />
<br />
Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br />
<br />
If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br />
</math> }}<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. Then, it is easy to see that<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6b" ><math> <br />
y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), </math></div><br />
|reference=(10) }}<br />
<br />
where<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6a" ><math> <br />
f(t;V_i,k_i) = \sum_{k, \tau_{ik}<t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br />
</math></div><br />
|reference=(11) }}<br />
<br />
The "real" dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br />
<br />
We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6c"><math>\begin{eqnarray}<br />
\tau_{ik} &\sim_{i.i.d.}& {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br />
\end{eqnarray}</math></div><br />
|reference=(12) }}<br />
<br />
and non-compliance (here meaning that a dose is not taken):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6d"><math>\begin{eqnarray}<br />
\pi &=& \prob{d_{ik} = 0} \nonumber \\ &=& 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br />
\end{eqnarray}</math></div><br />
|reference=(13) }}<br />
<br />
Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
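The dose regimen model (11)&ndash;(13) can likewise be sketched in Python; the doses, probability of non-compliance and PK parameters below are assumed values, for illustration only:<br />

```python
import math, random

random.seed(1)

def simulate_regimen(nominal_doses, nominal_taus, pi=0.1, sd=0.02):
    # Non-compliance (13): each dose is 0 with probability pi;
    # administration times jittered around the nominal times as in (12)
    doses = [0.0 if random.random() < pi else d for d in nominal_doses]
    taus = [random.gauss(t, sd) for t in nominal_taus]
    return doses, taus

def f(t, V, k, doses, taus):
    # Multiple-dose prediction (11): superposed one-compartment bolus terms
    return sum(d / V * math.exp(-k * (t - tk))
               for d, tk in zip(doses, taus) if tk < t)

doses, taus = simulate_regimen([100.0] * 4, [0.0, 12.0, 24.0, 36.0])
conc = f(48.0, V=10.0, k=0.1, doses=doses, taus=taus)
```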
<br />
<br />
<br><br />
<br />
===A complete model===<br />
<br />
We have now seen the variety of ways in which the variables in a model can play the role either of random variables whose distribution is defined by the model, or of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). </math> }}<br />
<br />
<li> The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the "hyper-parameters" $\varphi$ and $\gamma$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Using the model for executing tasks==<br />
<br />
<br />
In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br />
<br />
<br><br />
===Simulation===<br />
<br />
<br />
By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br />
<br />
Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br />
<br />
<br />
# The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br />
# The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br />
# The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br />
# The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br />
# The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. $\qx$).<br />
# Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br />
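The six steps above can be sketched as follows, using the distributions of the running example; all numerical values are assumed for illustration:<br />

```python
import math, random

random.seed(2)

# Steps 1, 2, 4, 5: population parameters, covariates and the design are
# taken as given here (illustrative values).
theta = dict(V_pop=30.0, k_pop=0.1, beta=1.0, omega_V=0.2, omega_k=0.2, a=0.5)
weights = [55.0, 70.0, 85.0]                 # covariates c_i
times = [1.0, 2.0, 4.0, 8.0]                 # measurement times t_ij

def draw_psi(w):
    # Step 3: psi_i ~ p(psi | theta, c_i), log-normal as in (3)
    V = math.exp(random.gauss(math.log(theta["V_pop"])
                              + theta["beta"] * math.log(w / 70.0),
                              theta["omega_V"]))
    k = math.exp(random.gauss(math.log(theta["k_pop"]), theta["omega_k"]))
    return V, k

def draw_y(V, k):
    # Step 6: y_ij ~ N(f(t_ij; V_i, k_i), a^2), with f from (2)
    return [random.gauss(500.0 / V * math.exp(-k * t), theta["a"]) for t in times]

observations = [draw_y(*draw_psi(w)) for w in weights]
```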
<br />
<br />
{{OutlineText<br />
|text=<br />
Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br />
<br />
<br />
<ul><br />
* a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br />
* the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br />
* an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br />
</ul><br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=<br />
- Imagine first that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br />
<br />
<br />
<ul><br />
* The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br />
* The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br />
* The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br />
</ul><br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . </math> }}<br />
<br />
The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br />
<br />
<br />
- Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable. The other components of $z$ can be treated as nonrandom variables. Here,<br />
<br />
<br />
<ul><br />
* The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br />
* The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br />
* The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br />
</ul><br />
}}<br />
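As a minimal illustration of the MCMC idea, here is a random-walk Metropolis sketch that samples $\bpsi$ from its conditional distribution for a single individual of the running example. It exploits the fact that $p(\bpsi{{!}}\by)$ is proportional to the joint density $p(\by,\bpsi)$; all numerical values are assumed, and practical tools use more elaborate samplers:<br />

```python
import math, random

random.seed(3)

# Toy setup (values assumed): one individual, 500 mg bolus, observations
# simulated from known parameters, log-normal priors on V and k.
times = [1.0, 2.0, 4.0, 8.0]
theta = dict(V_pop=30.0, k_pop=0.1, omega=0.3, a=0.5)
V_true, k_true = 28.0, 0.12
y = [random.gauss(500.0 / V_true * math.exp(-k_true * t), theta["a"]) for t in times]

def log_joint(lV, lk):
    # log p(y | psi) + log p(psi; theta), up to an additive constant;
    # this suffices because p(psi | y) is proportional to p(y, psi).
    V, k = math.exp(lV), math.exp(lk)
    ll = sum(-0.5 * ((yj - 500.0 / V * math.exp(-k * t)) / theta["a"]) ** 2
             for yj, t in zip(y, times))
    lp = (-0.5 * ((lV - math.log(theta["V_pop"])) / theta["omega"]) ** 2
          - 0.5 * ((lk - math.log(theta["k_pop"])) / theta["omega"]) ** 2)
    return ll + lp

# Random-walk Metropolis on (log V, log k)
lV, lk = math.log(theta["V_pop"]), math.log(theta["k_pop"])
samples = []
for _ in range(2000):
    lV_new, lk_new = lV + random.gauss(0, 0.1), lk + random.gauss(0, 0.1)
    if math.log(random.random()) < log_joint(lV_new, lk_new) - log_joint(lV, lk):
        lV, lk = lV_new, lk_new
    samples.append((lV, lk))
```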
<br />
<br><br />
<br />
===Estimation of the population parameters===<br />
<br />
<br />
In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br />
<br />
Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br />
<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\theta ; \by,\bc,\bu,\bt) &\eqdef& \py(\by ; \bc,\bu,\bt,\theta) \\<br />
&=& \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br />
</math></div><br />
|reference=(14) }}<br />
<br />
<br />
{{OutlineText<br />
|text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br />
}}<br />
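The Monte Carlo route to the observed likelihood can be sketched on a toy linear-Gaussian model where the integral over $\bpsi$ is easy to check; the grid search below stands in for the dedicated optimisers used in practice, and all values are assumed:<br />

```python
import math, random

random.seed(4)

# Toy model (values assumed): psi_i ~ N(theta, 1) and y_i ~ N(psi_i, 1),
# so that integrating psi out gives y_i ~ N(theta, 2) exactly and the
# maximum likelihood estimate should land near the mean of y.
y = [random.gauss(1.5, math.sqrt(2.0)) for _ in range(50)]

def log_like(theta, n_mc=500):
    # log L(theta; y) = sum_i log int p(y_i | psi) p(psi; theta) d psi,
    # each integral estimated by averaging p(y_i | psi) over draws of psi
    total = 0.0
    for yi in y:
        draws = [random.gauss(theta, 1.0) for _ in range(n_mc)]
        total += math.log(sum(math.exp(-0.5 * (yi - p) ** 2) for p in draws) / n_mc)
    return total

# Crude grid maximisation; real tools use dedicated stochastic optimisers.
grid = [i * 0.15 for i in range(21)]          # 0.0, 0.15, ..., 3.0
theta_hat = max(grid, key=log_like)
```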
<br />
<br />
''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &=& \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br />
&=& \frac{\displaystyle{ \int \pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
{{outlineText<br />
|text= Bayesian estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br />
}}<br />
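To see the objects involved, here is a conjugate toy sketch where the conditional distribution of $\theta$ is normal and available in closed form; for nonlinear models this is exactly where MCMC and optimisation tools come in (all values below are assumed):<br />

```python
import math, random

random.seed(5)

# Conjugate toy sketch (values assumed): theta ~ N(mu0, s0^2) and, with psi
# integrated out, y_i ~ N(theta, s^2). Then p(theta | y) is normal, and both
# its mean and its mode (the two Bayesian estimates above) are explicit.
mu0, s0, s = 0.0, 10.0, 2.0
y = [random.gauss(3.0, s) for _ in range(40)]

# Precisions add: posterior precision = prior precision + n * data precision
post_prec = 1.0 / s0**2 + len(y) / s**2
post_mean = (mu0 / s0**2 + sum(y) / s**2) / post_prec
post_sd = math.sqrt(1.0 / post_prec)
```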
<br />
<br />
<br><br />
<br />
===Estimation of the individual parameters===<br />
<br />
<br />
When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\bpsi ; \by,\bu,\bt) &\eqdef& \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &=& \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
Estimation of the individual parameters $\bpsi$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br />
}}<br />
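In the linear-Gaussian toy case these estimators have closed forms, which makes the roles of the prior and the data transparent; for the nonlinear PK model the same maximisation must be done numerically (all values below are assumed):<br />

```python
import math, random

random.seed(6)

# Toy case (values assumed): psi ~ N(m, omega^2) and y_j ~ N(psi, a^2).
# The conditional distribution of psi is then normal, so the MAP and the
# conditional mean coincide: a precision-weighted average of the prior
# mean and the data (shrinkage).
m, omega, a = 0.0, 1.0, 0.5
psi_true = 1.2
y = [random.gauss(psi_true, a) for _ in range(10)]

prec = 1.0 / omega**2 + len(y) / a**2
psi_map = (m / omega**2 + sum(y) / a**2) / prec
```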
<br />
<br />
<br><br />
===Model selection===<br />
<br />
<br />
Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br />
<br />
<br />
{{outlineText<br />
|text=<br />
Computing the observed likelihood and information criteria requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br />
}}<br />
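Once the observed log-likelihoods are available, the information criteria themselves are simple penalized transforms; the fitted log-likelihood values below are hypothetical:<br />

```python
import math

# Hypothetical fitted log-likelihoods for two candidate models.
n_obs = 120
models = {"M1": dict(loglik=-250.3, n_params=4),
          "M2": dict(loglik=-247.9, n_params=6)}

def aic(m):
    # AIC = -2 log L + 2 * (number of parameters)
    return -2.0 * m["loglik"] + 2.0 * m["n_params"]

def bic(m):
    # BIC = -2 log L + (number of parameters) * log(number of observations)
    return -2.0 * m["loglik"] + m["n_params"] * math.log(n_obs)

best_aic = min(models, key=lambda name: aic(models[name]))
best_bic = min(models, key=lambda name: bic(models[name]))
```

Note that BIC penalizes the extra parameters more heavily here, so the two criteria need not select the same model.<br />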
<br />
<br />
<br><br />
<br />
===Optimal design===<br />
<br />
In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br />
<br />
{{EquationWithRef<br />
|equation=<div id="efim_intro3"><math><br />
\efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br />
</math></div><br />
|reference=(15) }}<br />
<br />
where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for minimum variance estimation requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* a vector of population parameters $\theta$.<br />
* a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br />
* an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br />
}}<br />
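A scalar toy case shows the objects involved: for a linear model $y_j \sim {\cal N}(\theta\, t_j, a^2)$, the observed Fisher information is $\sum_j t_j^2/a^2$, which does not depend on $\by$, so the expectation in (15) is trivial and two candidate designs can be compared directly. In nonlinear mixed effects models, the expectation must instead be estimated, e.g., by averaging the observed FIM over simulated $\by$. All values below are assumed:<br />

```python
# Toy design comparison (values assumed): residual standard deviation a,
# and the scalar expected FIM of the linear model y_j ~ N(theta * t_j, a^2).
a = 0.5

def efim(times):
    # sum_j t_j^2 / a^2: here the observed and expected FIM coincide
    return sum(t * t for t in times) / a**2

# In one dimension, maximising a D-type criterion reduces to maximising
# this scalar over candidate designs.
design_1 = [1.0, 2.0, 3.0]
design_2 = [0.1, 0.2, 0.3]
best = max([design_1, design_2], key=efim)
```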
<br />
<br />
In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. This may include optimizing safety and efficacy criteria, such as the probability of reaching a [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response].<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for clinical trials requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br />
* a vector of population parameters $\theta$.<br />
* a target ${\cal A}$.<br />
* an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br />
}}<br />
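Such a target probability is naturally estimated by simulation. In the sketch below the target ${\cal A}$, the variability model and all numerical values are assumed for illustration:<br />

```python
import math, random

random.seed(8)

# Monte Carlo estimate of a target probability (all values assumed): the
# target A is "concentration 24 h after a single bolus is above 1", under
# log-normal between-patient variability in V and k.
V_pop, k_pop, omega = 30.0, 0.1, 0.25

def conc_24h(dose):
    V = math.exp(random.gauss(math.log(V_pop), omega))
    k = math.exp(random.gauss(math.log(k_pop), omega))
    return dose / V * math.exp(-k * 24.0)

def p_target(dose, n=5000):
    # P((y, psi) in A ; u) estimated over n simulated hypothetical patients
    return sum(conc_24h(dose) > 1.0 for _ in range(n)) / n

# A larger dose should raise the probability of hitting this target.
p_low, p_high = p_target(250.0), p_target(1000.0)
```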
<br />
<br />
<br><br />
<br />
==Implementing models and running tasks==<br />
<br />
<br />
===Example 1 ===<br />
<br />
Consider first the model defined by the joint distribution <br />
<br />
{{Equation1<br />
|equation= <math>\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \ppsi(\bpsi ; \theta),</math>}}<br />
<br />
where as in our running example, <br />
<br />
<br />
<ul><br />
* $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br />
<br />
* $ \bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $ \psi_i=(V_i,k_i,a_i)$<br />
<br />
* $ \theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br />
<br />
* $ \bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br />
</ul><br />
<br />
<br />
We aim to define a joint model for $\by$ and $\bpsi$. To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%"|<br />
{{Equation2 <br />
|name=<math> \pypsi(\by,\bpsi ; \theta, \bt) </math> <br />
|equation= }}<br />
{{Equation2<br />
|name= <math> \ppsi(\bpsi ; \theta)</math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
\log(V_i) &\sim& {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name= <math>\pcypsi(\by {{!}} \bpsi; \bt) </math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style = "width:50%" |<br />
{{MLXTranForTable<br />
|name=Example 1<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k}<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
<br />
We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br />
<br />
It is important to remember that $\mlxtran$ is not a "function" that calculates an output. It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to perform which use $\mlxtran$ like a function, "requesting" it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br />
<br />
<br />
<br><br />
<br />
===Example 2===<br />
<br />
Consider now a model defined by the joint distribution<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \pc(\bc) ,<br />
</math> }}<br />
<br />
where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br />
<br />
We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br />
<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%" |<br />
{{Equation2 <br />
|name= <math>\pypsithc(\by,\bpsi, \theta, \bc ; \bt)</math><br />
|equation= }}<br />
{{Equation2<br />
|name=<math>\pth(\theta)</math><br />
|equation=<math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) \\<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pc(\bc)</math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
w_i &\sim& {\cal N}\left(70,10^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcpsic(\bpsi {{!}}\bc;\theta)</math><br />
|equation=<math><br />
\begin{eqnarray}<br />
\hat{V}_i &=& V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br />
\log(V_i) &\sim& {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcypsi(\by {{!}} \bpsi; \bt) </math><br />
|equation=<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style="width:50%"|<br />
{{MLXTranForTable<br />
|name=jointModel2.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none"><br />
[POPULATION PARAMETER]<br />
<br />
DEFINITION:<br />
V_pop = {distribution=normal, mean=30, sd=3}<br />
k_pop = {distribution=normal, mean=0.1, sd=0.01}<br />
<br />
<br />
[COVARIATE]<br />
<br />
DEFINITION:<br />
weight = {distribution=normal, mean=70, sd=10}<br />
<br />
<br />
<br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br />
<br />
EQUATION:<br />
V_pred = V_pop*(weight/70)^beta<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br />
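One pass of the corresponding joint simulation can be sketched in Python; the values of $\beta$, $\omega_V$, $\omega_k$, $a$ and the design are assumed for illustration, and this sketch is not a substitute for the declarative model above:<br />

```python
import math, random

random.seed(10)

# One simulation pass through the joint model of Example 2 (beta, omega_V,
# omega_k, a and the measurement times are assumed illustrative values).
beta, omega_V, omega_k, a = 0.75, 0.2, 0.2, 0.5
times = [1.0, 4.0, 8.0]

V_pop = random.gauss(30.0, 3.0)                      # theta_R: population parameters
k_pop = random.gauss(0.1, 0.01)
w = random.gauss(70.0, 10.0)                         # covariate (weight)
V_pred = V_pop * (w / 70.0) ** beta                  # covariate model
V = math.exp(random.gauss(math.log(V_pred), omega_V))
k = math.exp(random.gauss(math.log(k_pop), omega_k))
y = [random.gauss(500.0 / V * math.exp(-k * t), a) for t in times]
```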
<br />
<!--<br />
==Bibliography==<br />
<br />
--><br />
<br />
{{Back&Next<br />
|linkBack=The individual approach<br />
|linkNext=Description, representation and implementation of a model }}</div><br />
<hr />
<div>==Introduction==<br />
<br />
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br />
<br />
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br />
<br />
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br />
<br />
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- A model is a joint probability distribution. <br />
<br />
- A submodel is a conditional distribution derived from this joint distribution. <br />
<br />
- A task is a specific use of this distribution. <br />
}}<br />
<br />
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br />
<br />
<br />
<br><br />
<br />
==An illustrative example==<br />
<br />
<br><br />
===A model for the observations of a single individual===<br />
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br />
<br />
We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br />
<br />
If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.<br />
<br />
By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are non-random parameters or variables.<br />
When there is no risk of confusion, the non-random terms can be left out of the notation.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. <br><br />
- The inputs of the model are the parameters $\psi$ and the design $\vt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br />
<br />
{{Equation1<br />
|equation=<math> f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , </math> }}<br />
<br />
where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br />
<br />
{{Equation1<br />
|equation=<math> y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . </math> }}<br />
<br />
Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba1" ><math><br />
y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. </math></div><br />
|reference=(1) }}<br />
<br />
Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br />
As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br />
<br />
{{Equation1<br />
|equation=<math> \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br />
</math> }}<br />
<br />
where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br />
}}<br />
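Since the $y_j$ are independent, the log of this joint pdf is just a sum of normal log-densities, one per measurement time, which can be checked numerically (the parameter values below are assumed):<br />

```python
import math, random

random.seed(11)

# Log of the joint pdf p(y; psi, t) for the bolus example: a sum (product
# on the natural scale) of normal log-densities, one per measurement time.
V, k, a = 30.0, 0.1, 0.5
times = [1.0, 2.0, 4.0, 8.0]

def f(t):
    return 500.0 / V * math.exp(-k * t)

y = [random.gauss(f(t), a) for t in times]

def log_pdf(y_obs, t_obs):
    return sum(-0.5 * math.log(2 * math.pi * a**2) - 0.5 * ((yj - f(t)) / a)**2
               for yj, t in zip(y_obs, t_obs))

ll = log_pdf(y, times)
```

By construction, the log-pdf is largest when each observation sits exactly on the prediction $f(t_j)$.<br />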
<br />
<br />
<br />
<br><br />
<br />
=== A model for several individuals ===<br />
<br />
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .</math> }}<br />
<br />
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br />
Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{in_i})$ in the design, and $n_i$ can be different for each individual.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the joint distribution of the observations and the individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math> \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . </math>}}<br />
<br />
- The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br />
:$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Let us suppose that $ N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each patient has their own individual PK parameters $ V_i$ and $ k_i$ and potentially their own residual error parameter $ a_i$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2a"><math> <br />
y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a_i &=& a \end{eqnarray}</math> }}<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2b"><math>\begin{eqnarray}<br />
\log(V_i) &\sim_{i.i.d.}& {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\log(k_i) &\sim_{i.i.d.}& {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}</math> }}<br />
<br />
where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br />
}}<br />
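To make the hierarchy concrete, here is a minimal Python sketch (not part of the original example) that draws the individual parameters from [[#ex_proba2b|(3)]] and then the concentrations from [[#ex_proba2a|(2)]]; the function name and calling convention are ours.

```python
import math
import random

def simulate_pk_population(theta, weights, times, seed=0):
    """Draw (psi_i, y_i) for each patient from the joint model:
    log-normal V_i and k_i as in (3), then normal concentrations as in (2)."""
    V_pop, k_pop, omega_V, omega_k, beta, a = theta
    rng = random.Random(seed)
    psi, y = [], []
    for w_i, t_i in zip(weights, times):
        # individual parameters: log(V_i) depends on the weight covariate
        V_i = math.exp(rng.gauss(math.log(V_pop) + beta * math.log(w_i / 70.0), omega_V))
        k_i = math.exp(rng.gauss(math.log(k_pop), omega_k))
        psi.append((V_i, k_i))
        # conditional distribution of the observations, equation (2)
        y.append([rng.gauss(500.0 / V_i * math.exp(-k_i * t), a) for t in t_i])
    return psi, y
```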
<br />
<br><br />
===A model for the population parameters===<br />
<br />
In some cases it is useful or important to consider the population parameter $\theta$ itself as random rather than fixed. There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being studied.<br />
<br />
If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3a"><math><br />
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text= <ol><br />
<li> The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. </li><br><br />
<br />
<li>If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.</li><br><br />
<br />
<li>Not all of the components of $\theta$ need be random. If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes </li><br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3b"><math><br />
\pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br />
</math></div><br />
|reference=(5) }} <br />
</ol>}}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). </math> }}<br />
<br />
<li> The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $ V_{\rm pop}$ and $k_{\rm pop}$: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba3"><math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) <br />
\end{eqnarray}</math></div><br />
|reference=(6) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}</math> }}<br />
<br />
As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br />
}}<br />
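To make the hierarchy concrete, the decomposition [[#proba3a|(4)]] can be sampled ancestrally: first draw $\theta_R=(V_{\rm pop},k_{\rm pop})$ from the priors [[#ex_proba3|(6)]], then the individual parameters given $\theta_R$ as in [[#ex_proba2b|(3)]]. The Python sketch below is illustrative only; the function name and the call signature are ours, not part of any tool.

```python
import math
import random

def simulate_psi_with_random_theta(theta_F, weights, seed=0):
    """Ancestral sampling along decomposition (5): first draw
    theta_R = (V_pop, k_pop) from the priors (6), then each psi_i
    given theta_R and the weight w_i, as in (3)."""
    rng = random.Random(seed)
    omega_V, omega_k, beta = theta_F
    V_pop = rng.gauss(30.0, 3.0)     # V_pop ~ N(30, 3^2)
    k_pop = rng.gauss(0.1, 0.01)     # k_pop ~ N(0.1, 0.01^2)
    psi = []
    for w in weights:
        log_V = rng.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V)
        log_k = rng.gauss(math.log(k_pop), omega_k)
        psi.append((math.exp(log_V), math.exp(log_k)))
    return (V_pop, k_pop), psi
```

Note that the same code for the second stage serves both $\ppsi(\bpsi;\theta)$ and $\pcpsith(\bpsi|\theta)$: only the status of $\theta$ (given vs. drawn on the first two lines) changes.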
<br />
<br />
<br><br />
<br />
===A model for the covariates===<br />
<br />
<br />
Another scenario is to suppose that it is the covariates $\bc$ that are random, rather than the population parameters. This arises for instance when we want to simulate individuals, or when modeling and we want to take into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba4"><math><br />
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
<li>In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . </math> }}<br />
<br />
<li>The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We could assume a normal distribution as a prior for the weights: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba4" ><math> <br />
w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). </math></div><br />
|reference=(8) }}<br />
<br />
Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Now, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the measurement times===<br />
<br />
Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ rather than in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the "true" measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br />
In such situations, the measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. }}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br />
</math> }}<br />
<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br />
}}<br />
<br />
{{Example<br />
|title=Example:<br />
|text= Let us assume as prior a normal distribution around the nominal times: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba5" ><math> <br />
t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). </math></div><br />
|reference=(9) }}<br />
<br />
Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $ \bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br />
The inputs of the model are the population parameters $ \theta$, the weights $ (w_i)$ and the nominal measurement times $ \nominal{\bt}$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the dose regimen===<br />
<br />
If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br />
<br />
Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br />
<br />
If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br />
</math> }}<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. Then, it is easy to see that<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6b" ><math> <br />
y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), </math></div><br />
|reference=(10) }}<br />
<br />
where<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6a" ><math> <br />
f(t;V_i,k_i) = \sum_{k, \tau_{ik}<t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br />
</math></div><br />
|reference=(11) }}<br />
<br />
The "real" dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br />
<br />
We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6c"><math>\begin{eqnarray}<br />
\tau_{ik} &\sim_{i.i.d.}& {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br />
\end{eqnarray}</math></div><br />
|reference=(12) }}<br />
<br />
and non-compliance (here meaning that a dose is not taken):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6d"><math>\begin{eqnarray}<br />
\pi &=& \prob{d_{ik} = 0} \nonumber \\ &=& 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br />
\end{eqnarray}</math></div><br />
|reference=(13) }}<br />
<br />
Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
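Equation [[#ex_proba6a|(11)]] is a plain superposition of exponentially decaying dose contributions. The following Python sketch (function name ours) implements it; non-compliance as in [[#ex_proba6d|(13)]] is simply represented by $d_{ik}=0$, which contributes nothing to the sum.

```python
import math

def concentration(t, V, k, doses, tau):
    """Superposition model of equation (11): each dose d_ik given at
    time tau_ik contributes (d_ik/V) * exp(-k*(t - tau_ik)) once
    tau_ik < t; a skipped dose (non-compliance, (13)) is d_ik = 0."""
    return sum(d / V * math.exp(-k * (t - s))
               for d, s in zip(doses, tau) if s < t)
```

With a single dose of 500 at time 0, this reduces to the one-dose prediction $f(t)=\frac{500}{V}e^{-kt}$ used in [[#ex_proba2a|(2)]].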
<br />
<br />
<br><br />
<br />
===A complete model===<br />
<br />
We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). </math> }}<br />
<br />
<li> The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the "hyper-parameters" $\varphi$ and $\gamma$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Using the model for executing tasks==<br />
<br />
<br />
In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br />
<br />
<br><br />
===Simulation===<br />
<br />
<br />
By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br />
<br />
Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br />
<br />
<br />
# The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br />
# The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br />
# The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br />
# The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br />
# The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. $\qx$).<br />
# Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br />
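The six steps above can be sketched as a single pass of ancestral sampling, drawing each variable once its parents are available. The Python code below is purely illustrative: the numerical values for $\theta$, $\bu$ and $\bt$ are stand-ins we chose, and the covariate and observation distributions are those of the running example.

```python
import math
import random

def simulate_complete_model(N=5, seed=0):
    """One pass through simulation steps 1-6; the point is the
    ordering (parents drawn before children), not the stand-in values."""
    rng = random.Random(seed)
    # 1. population parameters (here: given, illustrative values)
    V_pop, k_pop, omega_V, omega_k, beta, a = 30.0, 0.1, 0.2, 0.2, 1.0, 0.5
    # 2. covariates: weights w_i
    c = [rng.gauss(70.0, 10.0) for _ in range(N)]
    # 3. individual parameters given theta and c
    psi = [(math.exp(rng.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V)),
            math.exp(rng.gauss(math.log(k_pop), omega_k))) for w in c]
    # 4. dose regimen (here: given, one 500 mg dose at t = 0)
    u = [500.0] * N
    # 5. measurement times (here: given)
    t = [[1.0, 2.0, 4.0, 8.0]] * N
    # 6. observations given psi, u and t
    y = [[rng.gauss(u[i] / V * math.exp(-k * tij), a) for tij in t[i]]
         for i, (V, k) in enumerate(psi)]
    return c, psi, y
```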
<br />
<br />
{{OutlineText<br />
|text=<br />
Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br />
<br />
<br />
<ul><br />
* a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br />
* the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br />
* an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br />
</ul><br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=<br />
- Imagine that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and that we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br />
<br />
<br />
<ul><br />
* The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br />
* The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br />
* The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br />
</ul><br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . </math> }}<br />
<br />
The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br />
<br />
<br />
- Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context, for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the given variables are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we treat $\by$ as a random variable; the other components of $z$ can be treated as nonrandom variables. Then:<br />
<br />
<br />
<ul><br />
* The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br />
* The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br />
* The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br />
</ul><br />
}}<br />
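To illustrate the second case, here is a minimal random-walk Metropolis sampler in Python. It is our own generic sketch, not the algorithm of any particular tool; its key feature is that it only needs the conditional density up to a constant, i.e., $\log \pcypsi(\by|\bpsi) + \log \ppsi(\bpsi;\theta)$. It is applied to a toy scalar problem whose posterior is known in closed form.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iter=20000, step=0.8, seed=0):
    """Random-walk Metropolis: needs the target only up to a constant."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_target(prop)
        # accept with probability min(1, target(prop) / target(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# toy conditional: prior psi ~ N(0,1), one observation y = 2 with unit
# noise; the posterior is N(1, 1/2), so draws should average near 1
log_post = lambda psi: -0.5 * psi ** 2 - 0.5 * (2.0 - psi) ** 2
draws = metropolis_hastings(log_post, x0=0.0)
posterior_mean = sum(draws[2000:]) / len(draws[2000:])
```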
<br />
<br><br />
<br />
===Estimation of the population parameters===<br />
<br />
<br />
In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br />
<br />
Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br />
<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\theta ; \by,\bc,\bu,\bt) &\eqdef& \py(\by ; \bc,\bu,\bt,\theta) \\<br />
&=& \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br />
</math></div><br />
|reference=(14) }}<br />
<br />
<br />
{{OutlineText<br />
|text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br />
}}<br />
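The integral $\int \pypsi(\by,\bpsi ; \theta) \, d \bpsi$ can be approximated by naive Monte Carlo: average $\pcypsi(\by|\bpsi_m)$ over draws $\bpsi_m \sim \ppsi(\,\cdot\,;\theta)$. The Python sketch below uses a toy scalar model of our choosing, for which the marginal is known exactly, so the estimate can be checked.

```python
import math
import random

def mc_likelihood(y, theta, M=100000, seed=0):
    """Monte Carlo estimate of p(y; theta) = E_{psi ~ N(theta,1)}[p(y|psi)]
    for the toy model psi ~ N(theta, 1), y | psi ~ N(psi, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        psi = rng.gauss(theta, 1.0)
        total += math.exp(-0.5 * (y - psi) ** 2) / math.sqrt(2.0 * math.pi)
    return total / M

# for this toy model the exact marginal is y ~ N(theta, 2)
y_obs, theta = 1.5, 0.0
exact = math.exp(-0.25 * (y_obs - theta) ** 2) / math.sqrt(4.0 * math.pi)
approx = mc_likelihood(y_obs, theta)
```

In real population models this naive scheme is inefficient, which is why dedicated algorithms (importance sampling, linearization, etc.) are used instead; the sketch only shows what quantity they all approximate.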
<br />
<br />
''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &=& \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br />
&=& \frac{\displaystyle{ \int \pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
{{outlineText<br />
|text= Bayesian estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===Estimation of the individual parameters===<br />
<br />
<br />
When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\bpsi ; \by,\bu,\bt) &\eqdef& \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &=& \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
Estimation of the individual parameters $\bpsi$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br />
}}<br />
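For a scalar toy model with a conjugate normal prior, the MAP estimate can be computed by direct maximization of $\log \pcypsi(\by|\psi) + \log \ppsi(\psi)$ and checked against the closed-form mode. The grid search below is only a sketch of ours; real tools use proper optimizers.

```python
import math

def map_estimate(y, m, omega, a, lo=-10.0, hi=10.0, n_grid=20001):
    """MAP of a scalar psi with prior N(m, omega^2) and observations
    y_j ~ N(psi, a^2): maximize the log-joint over a fine grid."""
    def log_joint(psi):
        lp = -0.5 * ((psi - m) / omega) ** 2                  # log-prior (up to a constant)
        lp += sum(-0.5 * ((yj - psi) / a) ** 2 for yj in y)   # log-likelihood
        return lp
    grid = (lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid))
    return max(grid, key=log_joint)

y = [1.0, 2.0, 1.5]
psi_map = map_estimate(y, m=0.0, omega=1.0, a=1.0)
# conjugate normal model: mode = (m/omega^2 + sum(y)/a^2) / (1/omega^2 + n/a^2)
psi_exact = (0.0 + sum(y)) / (1.0 + len(y))
```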
<br />
<br />
<br><br />
===Model selection===<br />
<br />
<br />
Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$ .<br />
<br />
<br />
{{outlineText<br />
|text=<br />
Computing the observed likelihood and information criteria requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br />
}}<br />
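Once the observed log-likelihood is available, the information criteria are simple formulas, e.g., ${\rm BIC} = -2\log {\like} + k\log n$ where $k$ is the number of parameters and $n$ the number of observations. The Python sketch below (toy Gaussian models and simulated data of our own) compares a one-parameter and a two-parameter fit of the same data.

```python
import math
import random

def gaussian_loglik(y, mu, sigma):
    """Exact log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2)
               - 0.5 * ((v - mu) / sigma) ** 2 for v in y)

def bic(loglik, n_params, n_obs):
    """BIC = -2 log L(theta_hat) + k log(n); lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]
n = len(y)
# M1: mu fixed at 0, sigma estimated (1 parameter)
s1 = math.sqrt(sum(v ** 2 for v in y) / n)
bic1 = bic(gaussian_loglik(y, 0.0, s1), 1, n)
# M2: mu and sigma both estimated (2 parameters)
mu2 = sum(y) / n
s2 = math.sqrt(sum((v - mu2) ** 2 for v in y) / n)
bic2 = bic(gaussian_loglik(y, mu2, s2), 2, n)
```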
<br />
<br />
<br><br />
<br />
===Optimal design===<br />
<br />
In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br />
<br />
{{EquationWithRef<br />
|equation=<div id="efim_intro3"><math><br />
\efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br />
</math></div><br />
|reference=(15) }}<br />
<br />
where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for minimum variance estimation requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* a vector of population parameters $\theta$.<br />
* a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br />
* an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br />
}}<br />
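For a toy model $y_i \sim {\cal N}(\theta,\sigma^2)$ with $\sigma$ known, the observed Fisher information equals $n/\sigma^2$ for every $\by$, so the expected information [[#efim_intro3|(15)]] is $n/\sigma^2$ as well. The Python sketch below (our own) checks this numerically with a finite-difference second derivative.

```python
import math

def observed_fim(loglik, theta, h=1e-3):
    """Observed Fisher information (14) for scalar theta: minus the second
    derivative of the log-likelihood, via a central finite difference."""
    return -(loglik(theta + h) - 2.0 * loglik(theta) + loglik(theta - h)) / h ** 2

# toy model y_i ~ N(theta, sigma^2), sigma known: the observed information
# is n / sigma^2 whatever y is, so the expected information is too
sigma = 2.0
y = [1.0, 3.0, -0.5, 2.5]
loglik = lambda th: sum(-0.5 * ((v - th) / sigma) ** 2 for v in y)
fim = observed_fim(loglik, theta=1.5)
```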
<br />
<br />
In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. This may mean optimizing safety and efficacy criteria, such as the probability of reaching a [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response].<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for clinical trials requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br />
* a vector of population parameters $\theta$.<br />
* a target ${\cal A}$.<br />
* an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Implementing models and running tasks==<br />
<br />
<br />
===Example 1 ===<br />
<br />
Consider first the model defined by the joint distribution <br />
<br />
{{Equation1<br />
|equation= <math>\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \ppsi(\bpsi ; \theta),</math>}}<br />
<br />
where as in our running example, <br />
<br />
<br />
<ul><br />
* $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br />
<br />
* $ \bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $ \psi_i=(V_i,k_i,a_i)$<br />
<br />
* $ \theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br />
<br />
* $ \bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br />
</ul><br />
<br />
<br />
We aim to define a joint model for $\by$ and $\bpsi$. To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%"|<br />
{{Equation2 <br />
|name=<math> \pypsi(\by,\bpsi ; \theta, \bt) </math> <br />
|equation= }}<br />
{{Equation2<br />
|name= <math> \ppsi(\bpsi ; \theta)</math><br />
|equation=<br />
<math>\begin{eqnarray} <br />
\log(V_i) &\sim& {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name= <math>\pcypsi(\by{{!}}\bpsi; \bt) </math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style = "width:50%" |<br />
{{MLXTranForTable<br />
|name=Example 1<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k}<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
<br />
We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br />
<br />
It is important to remember that $\mlxtran$ is not a "function" that calculates an output. It is not an imperative language but rather a declarative one, allowing us to describe a model. It is then the tasks we choose to run that use $\mlxtran$ like a function, "requesting" it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br />
<br />
<br />
<br><br />
<br />
===Example 2===<br />
<br />
Consider now a model defined by the joint distribution<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \pc(\bc) ,<br />
</math> }}<br />
<br />
where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br />
<br />
We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br />
<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%" |<br />
{{Equation2 <br />
|name= <math>\pypsithc(\by,\bpsi, \theta, \bc ; \bt)</math><br />
|equation= }}<br />
{{Equation2<br />
|name=<math>\pth(\theta)</math><br />
|equation=<math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) \\<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pc(\bc)</math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
w_i &\sim& {\cal N}\left(70,10^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcpsic(\bpsi {{!}}\bc;\theta)</math><br />
|equation=<math><br />
\begin{eqnarray}<br />
\hat{V}_i &=& V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br />
\log(V_i) &\sim& {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcypsi(\by{{!}}\bpsi; \bt) </math><br />
|equation=<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style="width:50%"|<br />
{{MLXTranForTable<br />
|name=jointModel2.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none"><br />
[POPULATION PARAMETER]<br />
<br />
DEFINITION:<br />
V_pop = {distribution=normal, mean=30, sd=3}<br />
k_pop = {distribution=normal, mean=0.1, sd=0.01}<br />
<br />
<br />
[COVARIATE]<br />
<br />
DEFINITION:<br />
weight = {distribution=normal, mean=70, sd=10}<br />
<br />
<br />
<br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br />
<br />
EQUATION:<br />
V_pred = V_pop*(weight/70)^beta<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br />
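As an illustration of the first of these tasks, here is a sketch (in Python, with hypothetical values for $\theta_F=(\omega_V,\omega_k,\beta,a)$) that simulates $(\by,\bpsi,\bc,\theta_R)$ by drawing each level of the hierarchy in turn: the random population parameters from their priors, then the covariates, then the individual parameters, and finally the observations.

```python
import math
import random

def simulate_population(N, beta, omega_V, omega_k, a, times, rng):
    """Simulate (theta_R, c, psi, y) from the joint model of Example 2,
    drawing each level of the hierarchy in turn."""
    # Priors on the random population parameters theta_R
    V_pop = rng.gauss(30.0, 3.0)
    k_pop = rng.gauss(0.1, 0.01)
    individuals = []
    for _ in range(N):
        w = rng.gauss(70.0, 10.0)              # covariate: weight (kg)
        V_hat = V_pop * (w / 70.0) ** beta     # covariate model for V
        V = math.exp(rng.gauss(math.log(V_hat), omega_V))
        k = math.exp(rng.gauss(math.log(k_pop), omega_k))
        y = [500.0 / V * math.exp(-k * t) + rng.gauss(0.0, a)
             for t in times]
        individuals.append((w, V, k, y))
    return V_pop, k_pop, individuals

# Hypothetical fixed parameters theta_F = (omega_V, omega_k, beta, a)
V_pop, k_pop, individuals = simulate_population(
    N=5, beta=0.75, omega_V=0.3, omega_k=0.2, a=0.5,
    times=[1.0, 2.0, 4.0, 8.0], rng=random.Random(0))
```

The ordering of the draws mirrors the decomposition of the joint distribution into a product of conditional and marginal distributions.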
<br />
<br />
<!--<br />
==Bibliography==<br />
TO DO<br />
--><br />
<br />
<br />
{{Back&Next<br />
|linkBack=The individual approach<br />
|linkNext=Description, representation and implementation of a model }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Overview&diff=7385Overview2013-06-21T08:34:46Z<p>Brocco: </p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formulas, you can either try another browser, or use this link, which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
$<br />
\def\simulix{\mathsf{simulix} } <br />
$<br />
<br />
The desire to model a biological or physical phenomenon often arises when we are able to record some observations arising from that phenomenon. Nothing would be more natural therefore than to begin this introduction by looking at some observed data.<br />
<br />
<br />
{{ExampleWithImage<br />
|text= This first plot displays the [http://en.wikipedia.org/wiki/Viral_load viral load] of four patients with [http://en.wikipedia.org/wiki/Hepatitis_C hepatitis C] who started a treatment at time $t=0$.<br />
|image = NEWintro1.png<br />
}} <br />
<br />
<br />
{{ExampleWithImage<br />
|text=This second example involves weight data for rats measured over 14 weeks, for a sub-chronic [http://en.wikipedia.org/wiki/Toxicity toxicity] study related to the question of [http://en.wikipedia.org/wiki/Genetically_modified_maize genetically modified corn].<br />
|image = NEWintro2.png}}<br />
<br />
<br />
{{ExampleWithImage<br />
|text= In this third example, data are [http://en.wikipedia.org/wiki/Fluorescence fluorescence] intensities measured over time in a cellular biology experiment.<br />
|image=NEWintro3.png }}<br />
<br />
<br />
{{ExampleWithImage<br />
|text= Note that repeated measurements are not necessarily always functions of time.<br />
For example, we may be interested in corn production as a function of fertilizer quantity.<br />
|image= NEWintro4.png}}<br />
<br />
<br />
Even though these examples come from quite different domains, in each case the data is made up of repeated measurements on several individuals from a population. What we will call a "population approach" is therefore relevant for characterizing and modeling this data. The modeling goal is thus twofold: first, to characterize the biological or physical phenomenon observed for each individual, and second, to characterize the variability seen between individuals.<br />
<br />
In the example with the rats, the model needs to integrate a growth model that describes how a rat's weight increases with time, and a statistical model that describes why these kinetics can vary from one rat to another. The goal is thus to end up with a "typical" curve for the population (in red) and to be able to explain the variability of the individual curves (in green) around this population curve.<br />
<br />
<br />
::[[File:NEWintro5.png|link=]]<br />
<br />
<br />
The model will explain some of this variability by individual [http://en.wikipedia.org/wiki/Covariate covariates] such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but some of the variability will remain unexplained and will be considered as random. Integrating into the same model effects considered fixed and others considered random leads naturally to the use of [http://en.wikipedia.org/wiki/Mixed_model mixed-effects models].<br />
<br />
An alternative yet equivalent approach considers this model as a [http://en.wikipedia.org/wiki/Multilevel_model hierarchical] one: each curve is described by a single model, and the variability between individual models is described by a population model. In the case of [http://en.wikipedia.org/wiki/Parametric_model parametric models], this means that the observations for a given individual are described by a model of the observations that depends on a vector of individual parameters: this is the classic individual approach. The population approach is then a direct extension of [[The individual approach|the individual approach]]: we add a component to the model that describes the variability of the individual parameters within the population.<br />
<br />
A model can thus be seen as a [[What is a model? A joint probability distribution! | joint probability distribution]], which can easily be extended to the case where other variables in the model are considered as random variables: covariates, population parameters, the design, etc. The hierarchical structure of the model leads to a natural decomposition of the joint distribution into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional] and [http://en.wikipedia.org/wiki/Marginal_distribution marginal] distributions.<br />
<br />
Models for [[Modeling the individual parameters |individual parameters]] and models for [[Modeling the observations | observations]] are described in the [[Introduction_%26_notation|Models]] chapter. In particular, models for [[Continuous data models|continuous observations]], [[Model for categorical data|categorical data]], [[Models for count data|count data]] and [[ Models for time-to-event data | survival data]] are presented and illustrated by various examples. Extensions for [[ Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]] and [[Stochastic differential equations based models| stochastic differential equation based models]] are also presented.<br />
<br />
The Tasks & Tools chapter presents practical examples of using these models: [[Visualization|exploration and visualization]], [[Estimation|estimation]], [[Model evaluation#Model diagnostics|model diagnostics]], [[Model evaluation#Model selection|model selection]] and [[Simulation|simulation]]. All approaches and proposed methods are rigorously detailed in the [[Introduction and notation|Methods]] chapter.<br />
<br />
The main purpose of a model is to be used. Mathematical modeling and statistics remain useful tools for many disciplines (biology, agronomy, environmental studies, pharmacology, etc.), but it is important that these tools are used properly. The various software packages used in this wiki have been developed with this in mind: they serve the modeler well, while fully complying with a coherent mathematical formalism and using well-known and theoretically justified methods.<br />
<br />
Tools for model exploration ($\mlxplore$), modeling ($\monolix$) and simulation ($\simulix$) use the same model coding language $\mlxtran$. This allows us to define a complete workflow using the same model implementation, i.e., to run several different tasks based on the same model.<br />
<br />
$\mlxtran$ is extremely flexible and well-adapted to implementing complex mixed-effects models.<br />
With $\mlxtran$ we can easily write ODE-based models, implement [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic models]] with complex administration schedules, include inter-individual variability in parameters, define statistical models for covariates, etc.<br />
Another crucial property of $\mlxtran$ is that it rigorously adopts the model representation formalism proposed in $\wikipopix$. In other words, the model implementation is fully consistent with its mathematical representation.<br />
<br />
$\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach. We can visualize for instance the impact of covariates and inter-individual variability of model parameters on predictions. $\mlxplore$ is an ideal tool for teaching or discovering what a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic model]] is, for example.<br />
<br />
The algorithms implemented in $\monolix$ ([http://en.wikipedia.org/wiki/Stochastic_approximation Stochastic Approximation] of EM, [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo MCMC], [http://en.wikipedia.org/wiki/Simulated_Annealing Simulated Annealing], [http://en.wikipedia.org/wiki/Importance_sampling Importance Sampling], etc.) are extremely efficient for a wide variety of complex models. Furthermore, convergence of [[The SAEM algorithm for estimating population parameters|SAEM]] and its extensions ([[Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]], [[Stochastic differential equations based models|SDE-based models]], censored data, etc.) has been rigorously proved and published in statistical journals.<br />
<br />
$\simulix$ is a model computation engine which enables us to simulate a $\mlxtran$ model from within various environments. $\simulix$ is now available for the Matlab and R platforms, allowing any user to combine the flexibility of R and Matlab scripts with the power of $\mlxtran$ in order to easily encode complex models and simulate data.<br />
<br />
For these reasons, $\wikipopix$ and these tools can be used with confidence for training and teaching. This is even more the case because $\mlxplore$, $\monolix$ and $\simulix$ are free for academic research and education purposes.<br />
<br />
<br />
{{Next<br />
|link=The individual approach }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7378What is a model? A joint probability distribution!2013-06-19T12:08:25Z<p>Brocco: </p>
<hr />
<div>==Introduction==<br />
<br />
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br />
<br />
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br />
<br />
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br />
<br />
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- A model is a joint probability distribution. <br />
<br />
- A submodel is a conditional distribution derived from this joint distribution. <br />
<br />
- A task is a specific use of this distribution. <br />
}}<br />
<br />
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br />
<br />
<br />
<br><br />
<br />
==An illustrative example==<br />
<br />
<br><br />
===A model for the observations of a single individual===<br />
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br />
<br />
We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br />
<br />
If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.<br />
<br />
By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are non-random parameters or variables.<br />
When there is no risk of confusion, the non-random terms can be left out of the notation.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
-In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. <br><br />
-The inputs of the model are the parameters $\psi$ and the design $\vt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br />
<br />
{{Equation1<br />
|equation=<math> f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , </math> }}<br />
<br />
where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br />
<br />
{{Equation1<br />
|equation=<math> y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . </math> }}<br />
<br />
Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba1" ><math><br />
y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. </math></div><br />
|reference=(1) }}<br />
<br />
Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br />
As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br />
<br />
{{Equation1<br />
|equation=<math> \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br />
</math> }}<br />
<br />
where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br />
}}<br />
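Since the $y_j$ are independent, the log of the joint pdf is simply a sum of normal log-densities. A minimal Python sketch of how $\log \py(y ; \psi,\vt)$ can be evaluated (the model is the one of this example; any numerical values used with it are hypothetical):

```python
import math

def log_likelihood(y, times, V, k, a):
    """log p(y ; psi, t) for y_j ~ N(f(t_j; V, k), a^2),
    with f(t; V, k) = 500/V * exp(-k*t); independence turns the
    product of marginal pdfs into a sum of log-densities."""
    ll = 0.0
    for yj, tj in zip(y, times):
        f = 500.0 / V * math.exp(-k * tj)
        ll += -0.5 * math.log(2.0 * math.pi * a ** 2) \
              - (yj - f) ** 2 / (2.0 * a ** 2)
    return ll
```

Maximizing this function over $\psi=(V,k,a)$ is one possible task using the model; evaluating it at a given $\psi$ is another.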
<br />
<br />
<br />
<br><br />
<br />
=== A model for several individuals ===<br />
<br />
Now let us move to $N$ individuals. It is natural to suppose that each individual is represented by the same basic parametric model, but not necessarily with the same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .</math> }}<br />
<br />
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br />
Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the joint distribution of the observations and the individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math> \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . </math>}}<br />
<br />
- The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br />
:$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Let us suppose $ N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $ V_i$ and $ k_i$ and potentially its own residual error parameter $ a_i$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2a"><math> <br />
y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a_i &=& a \end{eqnarray}</math> }}<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2b"><math>\begin{eqnarray}<br />
\log(V_i) &\sim_{i.i.d.}& {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\log(k_i) &\sim_{i.i.d.}& {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}</math> }}<br />
<br />
where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br />
}}<br />
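The log-normal covariate model (3) can equivalently be written in multiplicative form, $V_i = V_{\rm pop}(w_i/70)^\beta e^{\eta_i}$ with $\eta_i\sim{\cal N}(0,\omega_V^2)$. The following Python sketch (with hypothetical parameter values) checks this equivalence numerically by reusing the same underlying normal draw:

```python
import math
import random

# Hypothetical values: V_pop, beta, omega_V and one individual's weight w
V_pop, beta, omega_V, w = 30.0, 0.75, 0.3, 80.0

# Form (3): draw log(V_i) directly from a normal distribution
rng = random.Random(1)
log_V = rng.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V)
V_form_a = math.exp(log_V)

# Equivalent multiplicative form: V_i = V_pop * (w/70)^beta * exp(eta_i)
rng = random.Random(1)                    # same seed -> same normal draw
eta = rng.gauss(0.0, omega_V)
V_form_b = V_pop * (w / 70.0) ** beta * math.exp(eta)
```

Both forms describe the same distribution for $V_i$; which one is more convenient depends on the task at hand.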
<br />
<br><br />
===A model for the population parameters===<br />
<br />
In some cases, it may be useful or important to consider the population parameter $\theta$ itself as random rather than fixed. There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when several populations are being studied rather than a single one.<br />
<br />
If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3a"><math><br />
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text= <ol><br />
<li> The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. </li><br><br />
<br />
<li>If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.</li><br><br />
<br />
<li>Not necessarily all of the components of $\theta$ need be random. If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes </li><br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3b"><math><br />
\pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br />
</math></div><br />
|reference=(5) }} <br />
</ol>}}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). </math> }}<br />
<br />
<li> The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $ V_{\rm pop}$ and $k_{\rm pop}$: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba3"><math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) <br />
\end{eqnarray}</math></div><br />
|reference=(6) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}</math> }}<br />
<br />
As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the covariates===<br />
<br />
<br />
Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba4"><math><br />
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
<li>In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . </math> }}<br />
<br />
<li>The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We could assume a normal distribution as a prior for the weights: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba4" ><math> <br />
w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). </math></div><br />
|reference=(8) }}<br />
<br />
Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Now, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the measurement times===<br />
<br />
Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the "true" measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br />
In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. }}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br />
</math> }}<br />
<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br />
}}<br />
<br />
{{Example<br />
|title=Example:<br />
|text= Let us assume as prior a normal distribution around the nominal times: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba5" ><math> <br />
t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). </math></div><br />
|reference=(9) }}<br />
<br />
Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $ \bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br />
The inputs of the model are the population parameters $ \theta$, the weights $ (w_i)$ and the nominal measurement times $ \nominal{\bt}$.<br />
}}<br />
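A sketch of how the random design (9) can be simulated, here in Python; the nominal design in the usage line is hypothetical:

```python
import random

def perturb_times(nominal_times, sd, rng):
    """Draw the 'true' measurement times around the nominal design:
    t_ij ~ N(t_nominal_ij, sd^2), independently for each individual i
    and each sampling time j."""
    return [[rng.gauss(t, sd) for t in row] for row in nominal_times]

# Hypothetical nominal design for two individuals; sd = 0.03 as in (9)
nominal = [[1.0, 2.0, 4.0], [1.0, 3.0]]
perturbed = perturb_times(nominal, 0.03, random.Random(0))
```

The rest of the model is unchanged: the perturbed times simply replace $\nominal{\bt}$ wherever the design enters the conditional distribution of the observations.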
<br />
<br />
<br><br />
<br />
===A model for the dose regimen===<br />
<br />
If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br />
<br />
Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, the times of administration and/or the doses may not have been exactly respected or recorded. There may also have been non-compliance, i.e., certain doses that were not taken by the patient.<br />
<br />
If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br />
</math> }}<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. Then, it is easy to see that<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6b" ><math> <br />
y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), </math></div><br />
|reference=(10) }}<br />
<br />
where<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6a" ><math> <br />
f(t;V_i,k_i) = \sum_{k, \tau_{ik}<t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br />
</math></div><br />
|reference=(11) }}<br />
<br />
The "real" dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br />
<br />
We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6c"><math>\begin{eqnarray}<br />
\tau_{ik} &\sim_{i.i.d.}& {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br />
\end{eqnarray}</math></div><br />
|reference=(12) }}<br />
<br />
and non-compliance (here meaning that a dose is not taken):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6d"><math>\begin{eqnarray}<br />
\pi &=& \prob{d_{ik} = 0} \nonumber \\ &=& 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br />
\end{eqnarray}</math></div><br />
|reference=(13) }}<br />
<br />
Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
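The multiple-dose prediction (11) and the dose-regimen model (12)-(13) can be sketched in code; a minimal Python illustration (function names and numerical values are assumptions for this sketch, not part of the example):

```python
import math
import random

def concentration(t, doses, times, V, k):
    """Sum of one-compartment bolus contributions from doses given before t (eq. 11)."""
    return sum(d / V * math.exp(-k * (t - tau))
               for d, tau in zip(doses, times) if tau < t)

def simulate_regimen(nominal_doses, nominal_times, pi=0.1, sd=0.02, rng=random):
    """Perturb nominal administration times (eq. 12); drop each dose with probability pi (eq. 13)."""
    times = [rng.gauss(tau, sd) for tau in nominal_times]
    doses = [0.0 if rng.random() < pi else d for d in nominal_doses]
    return doses, times

rng = random.Random(1234)
nominal_doses = [500.0] * 5
nominal_times = [0.0, 12.0, 24.0, 36.0, 48.0]
doses, times = simulate_regimen(nominal_doses, nominal_times, rng=rng)
c = concentration(60.0, doses, times, V=30.0, k=0.1)
```

The "real" regimen $(d_{ik}, \tau_{ik})$ is thus drawn around the nominal one, and the prediction $f$ is evaluated with the real regimen.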
<br />
<br />
<br><br />
<br />
===A complete model===<br />
<br />
We have now seen the variety of ways in which the variables in a model can play the role either of random variables whose distribution is defined by the model, or of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). </math> }}<br />
<br />
<li> The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the "hyper-parameters" $\varphi$ and $\gamma$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Using the model for executing tasks==<br />
<br />
<br />
In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br />
<br />
<br><br />
===Simulation===<br />
<br />
<br />
By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br />
<br />
Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br />
<br />
<br />
# The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br />
# The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br />
# The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br />
# The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br />
# The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. $\qx$).<br />
# Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br />
<br />
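Steps 1-6 above can be sketched as follows for the running PK example (a hypothetical illustration: all numerical values are assumed, and the population parameters and design are taken as given rather than simulated):

```python
import math
import random

rng = random.Random(42)

# Steps 1-2: population parameters theta are given; covariates c_i are simulated
theta = {"V_pop": 30.0, "k_pop": 0.1, "omega_V": 0.2, "omega_k": 0.2,
         "beta": 1.0, "a": 0.5}
weights = [rng.gauss(70.0, 10.0) for _ in range(10)]

# Step 3: individual parameters psi_i drawn from p(psi | theta, c)
def draw_psi(w):
    V = math.exp(math.log(theta["V_pop"]) + theta["beta"] * math.log(w / 70.0)
                 + rng.gauss(0.0, theta["omega_V"]))
    k = math.exp(math.log(theta["k_pop"]) + rng.gauss(0.0, theta["omega_k"]))
    return V, k

psis = [draw_psi(w) for w in weights]

# Steps 4-5: dose regimen (single 500 mg bolus) and measurement times are given
t = [1.0, 2.0, 4.0, 8.0, 12.0]

# Step 6: observations y_ij ~ N(f(t_ij; V_i, k_i), a^2)
def draw_y(V, k):
    return [500.0 / V * math.exp(-k * tj) + rng.gauss(0.0, theta["a"]) for tj in t]

ys = [draw_y(V, k) for V, k in psis]
```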
<br />
{{OutlineText<br />
|text=<br />
Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br />
<br />
<br />
<ul><br />
* a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br />
* the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br />
* an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br />
</ul><br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=<br />
- Imagine that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br />
<br />
<br />
<ul><br />
* The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br />
* The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br />
* The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br />
</ul><br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . </math> }}<br />
<br />
The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br />
<br />
<br />
- Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable; the other components of $z$ can be treated as nonrandom variables. Then:<br />
<br />
<br />
<ul><br />
* The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br />
* The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br />
* The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br />
</ul><br />
}}<br />
<br />
<br><br />
<br />
===Estimation of the population parameters===<br />
<br />
<br />
In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br />
<br />
Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br />
<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\theta ; \by,\bc,\bu,\bt) &\eqdef& \py(\by ; \bc,\bu,\bt,\theta) \\<br />
&=& \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br />
</math></div><br />
|reference=(14) }}<br />
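For a scalar $\theta$, (14) can be approximated by a central finite-difference second derivative of the log-likelihood. A minimal sketch, using an i.i.d. normal sample with known variance so that the information has the closed form $n/\sigma^2$ for checking:

```python
import math

def log_likelihood(theta, y, sigma=1.0):
    """Log-likelihood of i.i.d. N(theta, sigma^2) observations."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((yi - theta) ** 2 for yi in y) / (2 * sigma ** 2))

def observed_fisher_info(theta, y, h=1e-4):
    """-d^2/dtheta^2 of the log-likelihood, by central finite differences (cf. (14))."""
    ll = lambda th: log_likelihood(th, y)
    return -(ll(theta + h) - 2 * ll(theta) + ll(theta - h)) / h ** 2

y = [1.2, 0.8, 1.5, 0.9, 1.1]
theta_mle = sum(y) / len(y)                 # MLE of the mean
info = observed_fisher_info(theta_mle, y)   # closed form here: n / sigma^2 = 5
se = 1.0 / math.sqrt(info)                  # standard error, hence confidence intervals
```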
<br />
<br />
{{OutlineText<br />
|text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br />
}}<br />
<br />
<br />
''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &=& \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br />
&=& \frac{\displaystyle{ \int \pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
{{outlineText<br />
|text= Bayesian estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br />
}}<br />
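A random-walk Metropolis sketch of sampling from $\pcthy$ (a deliberately simple, hypothetical case: normal likelihood with known variance and a normal prior on a scalar $\theta$, so that the posterior is known in closed form and can serve as a check):

```python
import math
import random

rng = random.Random(0)
y = [1.0, 1.4, 0.6, 1.2, 0.8]
sigma = 1.0                       # known observation sd
mu0, tau0 = 0.0, 10.0             # normal prior on theta

def log_post(theta):
    """log p(theta | y) up to a constant: log-likelihood + log-prior."""
    ll = -sum((yi - theta) ** 2 for yi in y) / (2 * sigma ** 2)
    lp = -(theta - mu0) ** 2 / (2 * tau0 ** 2)
    return ll + lp

theta, chain = 0.0, []
for _ in range(20000):
    prop = theta + rng.gauss(0.0, 0.5)                 # symmetric random-walk proposal
    if math.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop                                   # accept
    chain.append(theta)

post_mean = sum(chain[5000:]) / len(chain[5000:])      # discard burn-in
```

Here the exact posterior mean is close to 1, which the chain average should recover up to Monte Carlo error.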
<br />
<br />
<br><br />
<br />
===Estimation of the individual parameters===<br />
<br />
<br />
When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\bpsi ; \by,\bu,\bt) &\eqdef& \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &=& \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
Estimation of the individual parameters $\bpsi$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br />
}}<br />
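A MAP sketch for one individual of the running example, maximizing $\log \pcypsi(\by | \psi) + \log \ppsi(\psi;\theta)$ by grid search over $\log k$ ($V$ is held fixed for simplicity; all numerical values are assumptions for this sketch):

```python
import math

# Assumed values for illustration
V, a = 30.0, 0.5                          # volume (fixed here) and residual sd
k_pop, omega_k = 0.1, 0.3                 # log-normal population distribution of k
t = [1.0, 2.0, 4.0, 8.0, 12.0]
y = [15.0, 13.5, 11.0, 7.5, 5.0]          # observed concentrations

def log_joint(log_k):
    """log p(y | psi) + log p(psi; theta), up to additive constants."""
    k = math.exp(log_k)
    f = [500.0 / V * math.exp(-k * tj) for tj in t]
    ll = -sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / (2 * a ** 2)
    lp = -(log_k - math.log(k_pop)) ** 2 / (2 * omega_k ** 2)
    return ll + lp

# Grid search for the mode of the conditional distribution (the MAP estimate)
grid = [math.log(0.01) + i * (math.log(1.0) - math.log(0.01)) / 999 for i in range(1000)]
k_map = math.exp(max(grid, key=log_joint))
```

In practice gradient-based optimizers replace the grid; the objective is the same.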
<br />
<br />
<br><br />
===Model selection===<br />
<br />
<br />
Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$ .<br />
<br />
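Given the maximized observed log-likelihoods of candidate models, the criteria reduce to simple formulas; a sketch with made-up log-likelihood values (AIC and BIC penalize the number of parameters differently, so they can disagree):

```python
import math

def aic(loglik, n_params):
    return -2 * loglik + 2 * n_params

def bic(loglik, n_params, n_obs):
    return -2 * loglik + n_params * math.log(n_obs)

# Hypothetical fits of three candidate models to the same data (n_obs = 100)
fits = {"M1": (-250.3, 4), "M2": (-247.1, 6), "M3": (-246.8, 9)}
best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(fits[m][0], fits[m][1], 100))
```

With these numbers AIC prefers M2 while the stronger BIC penalty prefers M1.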
<br />
{{outlineText<br />
|text=<br />
Computing the observed likelihood and information criteria requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===Optimal design===<br />
<br />
In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br />
<br />
{{EquationWithRef<br />
|equation=<div id="efim_intro3"><math><br />
\efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br />
</math></div><br />
|reference=(15) }}<br />
<br />
where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for minimum variance estimation requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* a vector of population parameters $\theta$.<br />
* a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br />
* an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br />
}}<br />
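For the running one-compartment model with known residual sd, the sensitivities $\partial f/\partial V$ and $\partial f/\partial k$ are available in closed form, so the expected Fisher information for $(V,k)$ can be computed directly. A sketch comparing two candidate sets of measurement times by D-optimality, i.e., the determinant of $\efim$ (all numerical values are assumptions):

```python
import math

D, V, k, a = 500.0, 30.0, 0.1, 0.5   # assumed dose, parameters, residual sd

def fim(times):
    """Expected Fisher information for (V, k): (1/a^2) * sum_j grad f grad f^T."""
    I = [[0.0, 0.0], [0.0, 0.0]]
    for t in times:
        e = math.exp(-k * t)
        dV = -D / V ** 2 * e          # df/dV
        dk = -t * D / V * e           # df/dk
        g = (dV, dk)
        for r in range(2):
            for c in range(2):
                I[r][c] += g[r] * g[c] / a ** 2
    return I

def det2(I):
    return I[0][0] * I[1][1] - I[0][1] * I[1][0]

early = [0.5, 1.0, 1.5, 2.0]          # all measurements early
spread = [1.0, 4.0, 12.0, 24.0]       # measurements spread over the decay
better = "spread" if det2(fim(spread)) > det2(fim(early)) else "early"
```

Spreading the measurement times over the decay separates the information about $V$ (early) from that about $k$ (late), which the determinant criterion rewards.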
<br />
<br />
In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. This may include optimizing safety and efficacy, or the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], for instance.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for clinical trials requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br />
* a vector of population parameters $\theta$.<br />
* a target ${\cal A}$.<br />
* an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Implementing models and running tasks==<br />
<br />
<br />
===Example 1 ===<br />
<br />
Consider first the model defined by the joint distribution <br />
<br />
{{Equation1<br />
|equation= <math>\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \ppsi(\bpsi ; \theta),</math>}}<br />
<br />
where as in our running example, <br />
<br />
<br />
<ul><br />
* $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br />
<br />
* $ \bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $ \psi_i=(V_i,k_i,a_i)$<br />
<br />
* $ \theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br />
<br />
* $ \bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br />
</ul><br />
<br />
<br />
We aim to define a joint model for $\by$ and $\bpsi$. To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%"|<br />
{{Equation2 <br />
|name=<math> \pypsi(\by,\bpsi ; \theta, \bt) </math> <br />
|equation= }}<br />
{{Equation2<br />
|name= <math> \ppsi(\bpsi ; \theta)</math><br />
|equation=<br />
<math>\begin{array}{c} <br />
\log(V_i) &\sim& {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{array}</math> }}<br />
{{Equation2<br />
|name= <math>\pcypsi(\by{{!}}\bpsi; \bt) </math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style = "width:50%" |<br />
{{MLXTranForTable<br />
|name=Example 1<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k}<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
<br />
We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br />
<br />
It is important to remember that $\mlxtran$ is not a "function" that calculates an output. It is not an imperative language but rather a declarative one that allows us to describe a model. It is then the tasks we choose to perform that use $\mlxtran$ like a function, "requesting" it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br />
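This separation between the declarative model and the tasks that use it can be illustrated in Python (a hypothetical sketch of three tasks (prediction, simulation, pdf evaluation) for the model of Example 1, with assumed parameter values):

```python
import math
import random

rng = random.Random(7)
V, k, a = 30.0, 0.1, 0.5                 # assumed individual parameters
t = [1.0, 2.0, 4.0, 8.0]

def predict(tj):
    """Task 1: prediction f(t; V, k)."""
    return 500.0 / V * math.exp(-k * tj)

def simulate():
    """Task 2: draw y ~ N(f, a^2)."""
    return [predict(tj) + rng.gauss(0.0, a) for tj in t]

def logpdf(y):
    """Task 3: log p(y | psi; t), the conditional pdf of the observations."""
    return sum(-0.5 * math.log(2 * math.pi * a ** 2)
               - (yj - predict(tj)) ** 2 / (2 * a ** 2)
               for yj, tj in zip(y, t))

y = simulate()
ll = logpdf(y)
```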
<br />
<br />
<br><br />
<br />
===Example 2===<br />
<br />
Consider now a model defined by the joint distribution<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta) \pc(\bc) ,<br />
</math> }}<br />
<br />
where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br />
<br />
We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br />
<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%" |<br />
{{Equation2 <br />
|name= <math>\pypsithc(\by,\bpsi, \theta, \bc ; \bt)</math><br />
|equation= }}<br />
{{Equation2<br />
|name=<math>\pth(\theta)</math><br />
|equation=<math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) \\<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pc(\bc)</math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
w_i &\sim& {\cal N}\left(70,10^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcpsithc(\bpsi {{!}}\theta,\bc)</math><br />
|equation=<math><br />
\begin{eqnarray}<br />
\hat{V}_i &=& V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br />
\log(V_i) &\sim& {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcypsi(\by{{!}}\bpsi; \bt) </math><br />
|equation=<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style="width:50%"|<br />
{{MLXTranForTable<br />
|name=jointModel2.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none"><br />
[POPULATION PARAMETER]<br />
<br />
DEFINITION:<br />
V_pop = {distribution=normal, mean=30, sd=3}<br />
k_pop = {distribution=normal, mean=0.1, sd=0.01}<br />
<br />
<br />
[COVARIATE]<br />
<br />
DEFINITION:<br />
weight = {distribution=normal, mean=70, sd=10}<br />
<br />
<br />
<br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br />
<br />
EQUATION:<br />
V_pred = V_pop*(weight/70)^beta<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br />
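A hypothetical Python counterpart of jointModel2.txt, simulating top-down through the hierarchy (the fixed parameters $\theta_F=(\beta,\omega_V,\omega_k,a)$ and the design are given; all numerical values are illustrative):

```python
import math
import random

rng = random.Random(3)
beta, omega_V, omega_k, a = 1.0, 0.2, 0.2, 0.5   # fixed inputs theta_F
t = [1.0, 2.0, 4.0, 8.0]

# [POPULATION PARAMETER]: theta_R = (V_pop, k_pop) drawn from their priors
V_pop = rng.gauss(30.0, 3.0)
k_pop = rng.gauss(0.1, 0.01)

def simulate_individual():
    # [COVARIATE]
    weight = rng.gauss(70.0, 10.0)
    # [INDIVIDUAL PARAMETER]
    V_pred = V_pop * (weight / 70.0) ** beta
    V = math.exp(rng.gauss(math.log(V_pred), omega_V))
    k = math.exp(rng.gauss(math.log(k_pop), omega_k))
    # [OBSERVATION]
    y = [500.0 / V * math.exp(-k * tj) + rng.gauss(0.0, a) for tj in t]
    return weight, V, k, y

individuals = [simulate_individual() for _ in range(20)]
```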
<br />
<br />
<br />
<br><br />
<br />
<!--<br />
==Bibliography==<br />
TO DO<br />
--><br />
<br />
<br />
{{Back&Next<br />
|linkBack=The individual approach<br />
|linkNext=Description, representation and implementation of a model }}</div>
<hr />
<div>= What is a model? A joint probability distribution! =

==Introduction==<br />
<br />
A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br />
<br />
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br />
<br />
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br />
<br />
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- A model is a joint probability distribution. <br />
<br />
- A submodel is a conditional distribution derived from this joint distribution. <br />
<br />
- A task is a specific use of this distribution. <br />
}}<br />
<br />
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br />
<br />
<br />
<br><br />
<br />
==An illustrative example==<br />
<br />
<br><br />
===A model for the observations of a single individual===<br />
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br />
<br />
We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br />
<br />
If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.<br />
<br />
By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are non-random parameters or variables.<br />
When there is no risk of confusion, the non-random terms can be left out of the notation.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. <br><br />
- The inputs of the model are the parameters $\psi$ and the design $\vt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br />
<br />
{{Equation1<br />
|equation=<math> f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , </math> }}<br />
<br />
where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br />
<br />
{{Equation1<br />
|equation=<math> y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . </math> }}<br />
<br />
Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba1" ><math><br />
y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. </math></div><br />
|reference=(1) }}<br />
<br />
Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br />
As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br />
<br />
{{Equation1<br />
|equation=<math> \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br />
</math> }}<br />
<br />
where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br />
}}<br />
<br />
<br />
<br />
<br><br />
<br />
=== A model for several individuals ===<br />
<br />
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .</math> }}<br />
<br />
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br />
Each individual $i$ has a potentially unique set of measurement times $t_i=(t_{i1},\ldots,t_{i\,n_i})$ in the design, and $n_i$ can be different for each individual.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
- In this context, the model is the joint distribution of the observations and the individual parameters:<br />
<br />
{{Equation1<br />
|equation=<math> \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . </math>}}<br />
<br />
- The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br />
:$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Let us suppose $ N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $ V_i$ and $ k_i$ and potentially its own residual error parameter $ a_i$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2a"><math> <br />
y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a_i &=& a \end{eqnarray}</math> }}<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba2b"><math>\begin{eqnarray}<br />
\log(V_i) &\sim_{i.i.d.}& {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}</math></div><br />
|reference=(3) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\log(k_i) &\sim_{i.i.d.}& {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}</math> }}<br />
<br />
where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br />
}}<br />
<br />
<br><br />
===A model for the population parameters===<br />
<br />
In some cases it can be useful or important to consider the population parameter $\theta$ itself as random rather than fixed. There are various reasons for this, such as wanting to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br />
<br />
If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3a"><math><br />
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text= <ol><br />
<li> The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. </li><br><br />
<br />
<li>If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.</li><br><br />
<br />
<li>Not all of the components of $\theta$ need be random. If $\theta$ can be decomposed into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes </li><br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba3b"><math><br />
\pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br />
</math></div><br />
|reference=(5) }} <br />
</ol>}}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). </math> }}<br />
<br />
<li> The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $ V_{\rm pop}$ and $k_{\rm pop}$: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba3"><math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) <br />
\end{eqnarray}</math></div><br />
|reference=(6) }}<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}</math> }}<br />
<br />
As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br />
}}<br />
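This two-level sampling can be sketched in a few lines of Python. This is a hedged illustration only: the priors on $(V_{\rm pop},k_{\rm pop})$ come from (6) and the weight effect $(w_i/70)^\beta$ from the covariate model, but the values chosen for $\theta_F$ and the weights below are hypothetical.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw the random population parameters theta_R from the priors (6),
# then individual volumes given the weights, as in the covariate model.
beta, omega_V = 0.75, 0.3                 # fixed parameters theta_F (illustrative)
V_pop = rng.normal(30.0, 3.0)             # V_pop ~ N(30, 3^2)
k_pop = rng.normal(0.1, 0.01)             # k_pop ~ N(0.1, 0.01^2)

w = np.array([55.0, 70.0, 90.0])          # weights w_i (inputs, not simulated here)
V_hat = V_pop * (w / 70.0) ** beta        # individual predicted volumes
V = np.exp(rng.normal(np.log(V_hat), omega_V))  # log V_i ~ N(log Vhat_i, omega_V^2)
```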
<br />
<br />
<br><br />
<br />
===A model for the covariates===<br />
<br />
<br />
Another scenario is to suppose that it is the covariates $\bc$ that are random, not the population parameters, either because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="proba4"><math><br />
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text= <br />
<li>In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . </math> }}<br />
<br />
<li>The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We could assume a normal distribution as a prior for the weights: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba4" ><math> <br />
w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). </math></div><br />
|reference=(8) }}<br />
<br />
Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===A model for the measurement times===<br />
<br />
Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ rather than in the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the "true" measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br />
In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. }}<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br />
</math> }}<br />
<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br />
}}<br />
<br />
{{Example<br />
|title=Example:<br />
|text= Let us assume as prior a normal distribution around the nominal times: <br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba5" ><math> <br />
t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). </math></div><br />
|reference=(9) }}<br />
<br />
Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $ \bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br />
The inputs of the model are the population parameters $ \theta$, the weights $ (w_i)$ and the nominal measurement times $ \nominal{\bt}$.<br />
}}<br />
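Simulating such jittered times as in (9) is direct; in this sketch the nominal schedule is hypothetical, while the standard deviation 0.03 is the one used in the example.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of (9): "true" times fluctuate around the nominal times.
t_nominal = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical nominal schedule
t = rng.normal(t_nominal, 0.03)                  # t_ij ~ N(t_nominal_ij, 0.03^2)
```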
<br />
<br />
<br><br />
<br />
===A model for the dose regimen===<br />
<br />
If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br />
<br />
Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br />
<br />
If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br />
<br />
{{Equation1<br />
|equation=<math>\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br />
</math> }}<br />
<br />
<li> The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. Then, it is easy to see that<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6b" ><math> <br />
y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), </math></div><br />
|reference=(10) }}<br />
<br />
where<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6a" ><math> <br />
f(t;V_i,k_i) = \sum_{k, \tau_{ik}<t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br />
</math></div><br />
|reference=(11) }}<br />
<br />
The "real" dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br />
<br />
We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6c"><math>\begin{eqnarray}<br />
\tau_{ik} &\sim_{i.i.d.}& {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br />
\end{eqnarray}</math></div><br />
|reference=(12) }}<br />
<br />
and non-compliance (here meaning that a dose is not taken):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ex_proba6d"><math>\begin{eqnarray}<br />
\pi &=& \prob{d_{ik} = 0} \nonumber \\ &=& 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br />
\end{eqnarray}</math></div><br />
|reference=(13) }}<br />
<br />
Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br />
}}<br />
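Equations (10)-(13) can be sketched for one patient as follows. All numerical values (doses, dosing schedule, individual parameters, non-compliance probability) are hypothetical; only the structure of the equations is taken from the example.<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical repeated-dose regimen for one patient.
tau_nominal = np.array([0.0, 12.0, 24.0, 36.0])  # nominal dosing times
d_nominal = np.full(4, 100.0)                    # nominal doses
pi = 0.1                                         # P(dose not taken), as in (13)

tau = rng.normal(tau_nominal, 0.02)              # jittered times, as in (12)
d = np.where(rng.random(4) < pi, 0.0, d_nominal) # non-compliance, as in (13)

V_i, k_i = 10.0, 0.2                             # illustrative individual parameters

def f(t):
    """Sum the contributions of all doses administered before t, as in (11)."""
    past = tau < t
    return np.sum(d[past] / V_i * np.exp(-k_i * (t - tau[past])))

# A concentration observation then follows (10): y_ij ~ N(f(t_ij), a_i^2).
```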
<br />
<br />
<br><br />
<br />
===A complete model===<br />
<br />
We have now seen the variety of ways in which the variables of a model can play either the role of random variables, whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
<li> In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). </math> }}<br />
<br />
<li> The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the "hyper-parameters" $\varphi$ and $\gamma$.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Using the model for executing tasks==<br />
<br />
<br />
In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br />
<br />
<br><br />
===Simulation===<br />
<br />
<br />
By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br />
<br />
Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br />
<br />
<br />
# The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br />
# The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br />
# The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br />
# The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br />
# The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. $\qx$).<br />
# Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br />
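The six steps above can be sketched in code for the running example. Only the conditional structure comes from the model; every numerical value below (parameter values, number of individuals, the single 500 unit dose, the measurement times) is illustrative.<br />

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5                                         # number of individuals

# Steps 1-2: population parameters taken as given; covariates simulated as in (8).
theta = dict(V_pop=30.0, k_pop=0.1, omega_V=0.3, omega_k=0.2, beta=0.75, a=0.5)
w = rng.normal(70.0, 10.0, size=N)

# Step 3: individual parameters psi_i drawn given theta and w_i.
V_hat = theta["V_pop"] * (w / 70.0) ** theta["beta"]
V = np.exp(rng.normal(np.log(V_hat), theta["omega_V"]))
k = np.exp(rng.normal(np.log(theta["k_pop"]), theta["omega_k"], size=N))

# Steps 4-5: dose regimen (a single 500 unit dose at t=0) and times given.
t = np.array([1.0, 2.0, 4.0, 8.0])

# Step 6: observations y_ij drawn given psi_i, the dose and the times.
f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
y = rng.normal(f, theta["a"])
```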
<br />
<br />
{{OutlineText<br />
|text=<br />
Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br />
<br />
<br />
<ul><br />
* a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br />
* the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br />
* an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br />
</ul><br />
}}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=<br />
- Imagine first that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br />
<br />
<br />
<ul><br />
* The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br />
* The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br />
* The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br />
</ul><br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . </math> }}<br />
<br />
The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br />
<br />
<br />
- Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable; the other components of $z$ can be treated as nonrandom variables. Then:<br />
<br />
<br />
<ul><br />
* The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br />
* The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br />
* The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br />
</ul><br />
}}<br />
<br />
<br><br />
<br />
===Estimation of the population parameters===<br />
<br />
<br />
In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br />
<br />
Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br />
<br />
<br />
''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\theta ; \by,\bc,\bu,\bt) &\eqdef& \py(\by ; \bc,\bu,\bt,\theta) \\<br />
&=& \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
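The integral over $\bpsi$ generally has no closed form. As a hedged sketch of how it can be approximated, consider a toy one-individual Gaussian model (not the document's PK example): the likelihood is estimated by averaging p(y|psi) over draws of psi from p(psi;theta), and in this linear Gaussian case an exact value is available for comparison.<br />

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(4)

# Toy model: y_j ~ N(psi, a^2) with a scalar random parameter psi ~ N(mu, omega^2).
y = np.array([0.8, 1.1, 0.9])
mu, omega, a = 1.0, 0.5, 0.2                 # theta = (mu, omega, a), illustrative

# L(theta; y) = E_psi[ p(y | psi) ], estimated by plain Monte Carlo.
M = 10_000
psi = rng.normal(mu, omega, size=M)
L_mc = norm.pdf(y[None, :], loc=psi[:, None], scale=a).prod(axis=1).mean()

# Here the integral is tractable, which lets us check the estimate:
# y is jointly Gaussian with mean mu*1 and covariance a^2 I + omega^2 11'.
cov = a**2 * np.eye(3) + omega**2 * np.ones((3, 3))
L_exact = multivariate_normal.pdf(y, mean=np.full(3, mu), cov=cov)
```

For realistic nonlinear models the exact value is unavailable and plain Monte Carlo can be inefficient, which is why dedicated algorithms (importance sampling, SAEM, etc.) are used instead.<br />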
<br />
The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br />
</math></div><br />
|reference=(14) }}<br />
<br />
<br />
{{OutlineText<br />
|text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br />
}}<br />
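Once $\thmle$ has been found, the observed Fisher information (14) can be approximated numerically. A minimal sketch for a toy scalar model ($y_i \sim {\cal N}(\theta,1)$, chosen because the MLE is the sample mean and the observed information is exactly $n$, so the result can be checked) uses a central finite difference for the second derivative:<br />

```python
import numpy as np

# Toy model y_i ~ N(theta, 1): the MLE is the sample mean and the
# observed Fisher information (14) is exactly n, so se = 1/sqrt(n).
y = np.array([0.2, -0.5, 1.3, 0.7, 0.1])     # illustrative data
theta_hat = y.mean()

def loglik(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

# Central finite difference for -(d^2/dtheta^2) log L at theta_hat.
h = 1e-4
ofim = -(loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
se = 1.0 / np.sqrt(ofim)                     # standard error of theta_hat
```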
<br />
<br />
''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &=& \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br />
&=& \frac{\displaystyle{ \int \pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
{{outlineText<br />
|text= Bayesian estimation of the population parameter $\theta$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br />
* inputs $\by$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br />
}}<br />
<br />
<br />
<br><br />
<br />
===Estimation of the individual parameters===<br />
<br />
<br />
When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\like}(\bpsi ; \by,\bu,\bt) &\eqdef& \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &=& \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br />
\end{eqnarray}</math> }}<br />
<br />
The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br />
<br />
<br />
{{OutlineText<br />
|text=<br />
Estimation of the individual parameters $\bpsi$ requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br />
}}<br />
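As a hedged sketch of MAP estimation for a single individual, consider a toy conjugate Gaussian model (illustrative values, not the PK example), where the posterior mode has a closed form against which the optimizer can be checked:<br />

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy individual model: y_j ~ N(psi, a^2), psi ~ N(mu, omega^2).
y = np.array([1.2, 0.9, 1.1])
mu, omega, a = 0.5, 0.4, 0.2

def neg_log_post(psi):
    # -log p(y|psi) - log p(psi; theta), up to additive constants.
    return np.sum((y - psi) ** 2) / (2 * a**2) + (psi - mu) ** 2 / (2 * omega**2)

psi_map = minimize_scalar(neg_log_post).x

# In this conjugate Gaussian case the MAP is a precision-weighted mean.
prec = len(y) / a**2 + 1.0 / omega**2
psi_closed = (np.sum(y) / a**2 + mu / omega**2) / prec
```

For nonlinear models no such closed form exists, and the same maximization is carried out with general-purpose optimizers, as the outline above states.<br />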
<br />
<br />
<br><br />
===Model selection===<br />
<br />
<br />
Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$ .<br />
<br />
<br />
{{outlineText<br />
|text=<br />
Computing the observed likelihood and information criteria requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br />
* an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br />
}}<br />
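Given the maximized log-likelihoods, the criteria themselves are straightforward. The sketch below uses hypothetical log-likelihood values and numbers of parameters; note that for mixed-effects models conventions differ on which sample size enters the BIC penalty (here the number of subjects, a common choice):<br />

```python
import numpy as np

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_subjects):
    return -2.0 * loglik + n_params * np.log(n_subjects)

# Hypothetical maximized log-likelihoods (and parameter counts) for two models.
candidates = {"M1": (-250.3, 5), "M2": (-248.9, 7)}
N = 40                                        # hypothetical number of subjects
scores = {m: (aic(ll, p), bic(ll, p, N)) for m, (ll, p) in candidates.items()}
best_bic = min(scores, key=lambda m: scores[m][1])   # model with smallest BIC
```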
<br />
<br />
<br><br />
<br />
===Optimal design===<br />
<br />
In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br />
<br />
{{EquationWithRef<br />
|equation=<div id="efim_intro3"><math><br />
\efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br />
</math></div><br />
|reference=(15) }}<br />
<br />
where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for minimum variance estimation requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br />
* a vector of population parameters $\theta$.<br />
* a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br />
* an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br />
}}<br />
<br />
<br />
In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. This may include optimizing safety and efficacy, and things like the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], etc.<br />
<br />
<br />
{{OutlineText<br />
|text=Optimal design for clinical trials requires:<br />
<br />
<br />
* a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br />
* a vector of population parameters $\theta$.<br />
* a target ${\cal A}$.<br />
* an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br />
}}<br />
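Such a target probability is typically estimated by simulating many virtual trials and counting how often the target is hit. The sketch below uses a hypothetical one-compartment population model and a hypothetical efficacy target (trough concentration above a threshold); every numerical value is illustrative.<br />

```python
import numpy as np

rng = np.random.default_rng(5)

def prob_target(dose, t_trough, n_sim=2000):
    """Monte Carlo estimate of P(concentration at t_trough > 5) under an
    illustrative population PK model (all numbers hypothetical)."""
    V = np.exp(rng.normal(np.log(30.0), 0.3, size=n_sim))
    k = np.exp(rng.normal(np.log(0.1), 0.2, size=n_sim))
    conc = dose / V * np.exp(-k * t_trough)
    return np.mean(conc > 5.0)

p_low = prob_target(250.0, 12.0)
p_high = prob_target(1000.0, 12.0)
# A design algorithm would then search over the dose regimen (and the
# measurement times) to maximize such a probability.
```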
<br />
<br />
<br><br />
<br />
==Implementing models and running tasks==<br />
<br />
<br />
===Example 1 ===<br />
<br />
Consider first the model defined by the joint distribution <br />
<br />
{{Equation1<br />
|equation= <math>\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \ppsi(\bpsi ; \theta),</math>}}<br />
<br />
where as in our running example, <br />
<br />
<br />
<ul><br />
* $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br />
<br />
* $ \bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $ \psi_i=(V_i,k_i)$<br />
<br />
* $ \theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br />
<br />
* $ \bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br />
</ul><br />
<br />
<br />
We aim to define a joint model for $\by$ and $\bpsi$. To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%"|<br />
{{Equation2 <br />
|name=<math> \pypsi(\by,\bpsi ; \theta, \bt) </math> <br />
|equation= }}<br />
{{Equation2<br />
|name= <math> \ppsi(\bpsi ; \theta)</math><br />
|equation=<br />
<math>\begin{array}{c} <br />
\log(V_i) &\sim& {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{array}</math> }}<br />
{{Equation2<br />
|name= <math>\pcypsi(y{{!}}\bpsi; \bt) </math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style = "width:50%" |<br />
{{MLXTranForTable<br />
|name=Example 1<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k}<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
<br />
We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br />
<br />
It is important to remember that $\mlxtran$ is not a "function" that calculates an output. It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, "requesting" it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br />
<br />
<br />
<br><br />
<br />
===Example 2===<br />
<br />
Consider now a model defined by the joint distribution<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \pc(\bc) ,<br />
</math> }}<br />
<br />
where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br />
<br />
We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br />
<br />
<br />
{| cellspacing="10" cellpadding="10"<br />
|style="width:50%" |<br />
{{Equation2 <br />
|name= <math>\pypsithc(\by,\bpsi, \theta, \bc ; \bt)</math><br />
|equation= }}<br />
{{Equation2<br />
|name=<math>\pth(\theta)</math><br />
|equation=<math>\begin{eqnarray}<br />
V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) \\<br />
k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pc(\bc)</math><br />
|equation=<br />
<math>\begin{eqnarray}<br />
w_i &\sim& {\cal N}\left(70,10^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcpsic(\bpsi {{!}}\bc;\theta)</math><br />
|equation=<math><br />
\begin{eqnarray}<br />
\hat{V}_i &=& V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br />
\log(V_i) &\sim& {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br />
\log(k_i) &\sim& {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br />
\end{eqnarray}</math> }}<br />
{{Equation2<br />
|name=<math>\pcypsi(y{{!}}\bpsi; \bt) </math><br />
|equation=<math>\begin{eqnarray}<br />
f(t;V_i,k_i) &=& \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br />
y_{ij} &\sim& {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br />
\end{eqnarray}</math> }}<br />
<br />
|style="width:50%"|<br />
{{MLXTranForTable<br />
|name=jointModel2.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none"><br />
[POPULATION PARAMETER]<br />
<br />
DEFINITION:<br />
V_pop = {distribution=normal, mean=30, sd=3}<br />
k_pop = {distribution=normal, mean=0.1, sd=0.01}<br />
<br />
<br />
[COVARIATE]<br />
<br />
DEFINITION:<br />
weight = {distribution=normal, mean=70, sd=10}<br />
<br />
<br />
<br />
[INDIVIDUAL PARAMETER]<br />
input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br />
<br />
EQUATION:<br />
V_pred = V_pop*(weight/70)^beta<br />
<br />
DEFINITION:<br />
V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br />
k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br />
<br />
<br />
[OBSERVATION]<br />
input={V,k,a}<br />
<br />
EQUATION:<br />
f = 500/V*exp(-k*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, sd=a}<br />
</pre> }}<br />
|}<br />
<br />
We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br />
<br />
<br />
<br />
<br><br />
<br />
<!--<br />
==Bibliography==<br />
TO DO<br />
--><br />
<br />
<br />
{{Back&Next<br />
|linkBack=The individual approach<br />
|linkNext=Description, representation and implementation of a model }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=The_SAEM_algorithm_for_estimating_population_parameters&diff=7312The SAEM algorithm for estimating population parameters2013-06-17T10:08:15Z<p>Brocco: </p>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful tool for NLMEM, accurately estimating population parameters while also having good theoretical properties: it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span> $(\psi_1^{(k)})$, ..., $ (\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and present by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also give some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize the algorithm's convergence by using several <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span> per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to a local maximum.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ are observed, estimation is not thwarted by any particular problem because an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta) $.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces the complete-data log-likelihood by its conditional expectation given the observations. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a stationary point of the observed likelihood under mild regularity conditions.<br />
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in a closed-form, it can be approximated by simulation. For instance,<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
<br />
<br />
<br><br />
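The two strategies are closely related: with step size $\gamma_k=1/k$, the stochastic approximation recursion reproduces exactly the running Monte Carlo average. A minimal sketch of this equivalence in Python (the draws are an arbitrary illustrative sample, not tied to any particular model):<br />

```python
import random

random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(10000)]

# MCEM-style estimate: one large Monte Carlo average over many draws
mc_avg = sum(draws) / len(draws)

# SAEM-style estimate: stochastic approximation, one draw per iteration,
# with step size gamma_k = 1/k
s = 0.0
for k, d in enumerate(draws, start=1):
    s += (1.0 / k) * (d - s)

# With gamma_k = 1/k, the recursion s_k = s_{k-1} + (d_k - s_{k-1})/k
# is exactly the running mean of the first k draws.
```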
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM) thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span> for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ values $ \psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)} $ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The approximation step of SAEM then simplifies to a general Robbins-Monro-type scheme for approximating the conditional expectation of the sufficient statistics:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
<br />
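In the exponential-family case, one SAEM iteration therefore only needs to update the vector of sufficient statistics. A sketch of the Robbins-Monro update (function and values are illustrative, not from any particular package):<br />

```python
def sa_update(s_prev, s_complete, gamma_k):
    """One stochastic approximation step:
    s_k = s_{k-1} + gamma_k * (S(y, psi^(k)) - s_{k-1}),
    applied componentwise to the vector of sufficient statistics."""
    return [sp + gamma_k * (sc - sp) for sp, sc in zip(s_prev, s_complete)]

# With gamma_k = 1 the statistic is simply replaced by its new simulated value
s1 = sa_update([0.0, 0.0], [2.0, 4.0], 1.0)
# With gamma_k = 0.5 the new value is averaged with the previous statistic
s2 = sa_update(s1, [4.0, 8.0], 0.5)
```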
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly from the conditional distribution at each iteration. Unfortunately, for most nonlinear or non-Gaussian models, the unobserved data cannot be simulated exactly from this conditional distribution. A well-known alternative is the Metropolis-Hastings algorithm: introduce a transition kernel whose unique invariant distribution is the conditional distribution we want to simulate from.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chain]</span> Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span> $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$ to $\theta_k$. By default in $\monolix$, three transition kernels are each used twice, successively, in each SAEM iteration.<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N}\left( s_{k,2} - s_{k,1}s_{k,1}^\transpose \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then, denoting $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function both of $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
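This two-phase schedule is straightforward to write down; a sketch (the cut-off $K_1$ is a tuning choice, and the resulting sequence satisfies $\gamma_1=1$, $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$):<br />

```python
def gamma(k, K1=40):
    """Step size: gamma_k = 1 for k <= K1 (fast exploration phase),
    then gamma_k = 1/(k - K1) (almost-sure convergence phase)."""
    return 1.0 if k <= K1 else 1.0 / (k - K1)
```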
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the "solution". The sequence $(\theta_{k})$ is a homogeneous Markov Chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$, $k=1$, ...,$40$, $\gamma_k = 1/(k-40)$, $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in Example 1, but the random fluctuations around the limit are reduced by a factor $\sqrt{10}$, since the simulation variance is divided by $L=10$; in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to try and compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S(\bpsi)} ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
<br />
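This behavior is easy to reproduce numerically. A sketch of SAEM with $\gamma_k=1$ on this toy model (all numerical values are hypothetical choices): the chain $(\theta_k)$ reaches a neighborhood of $\hat{\theta}$ quickly but keeps oscillating around it.<br />

```python
import random

random.seed(1)
N, theta_true, omega2, sigma2 = 100, 2.0, 1.0, 1.0
y = [random.gauss(theta_true, (omega2 + sigma2) ** 0.5) for _ in range(N)]
theta_hat = sum(y) / N                               # the MLE of theta

a = (1.0 / omega2) / (1.0 / sigma2 + 1.0 / omega2)   # shrinkage weight
gam2 = 1.0 / (1.0 / sigma2 + 1.0 / omega2)           # conditional variance gamma^2

theta = 0.0                                          # initial guess theta_0
trace = []
for k in range(2000):
    # Simulation step: psi_i^(k) ~ N(a*theta_{k-1} + (1-a)*y_i, gamma^2)
    psi = [random.gauss(a * theta + (1.0 - a) * yi, gam2 ** 0.5) for yi in y]
    # Maximization step with gamma_k = 1: theta_k = mean of the simulated psi_i
    theta = sum(psi) / N
    trace.append(theta)

# The tail of the chain fluctuates around theta_hat (stationary AR(1) regime)
mean_tail = sum(trace[100:]) / len(trace[100:])
```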
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ is a Markov chain that quickly reaches a neighborhood of $\hat{\theta}$ and fluctuates around it during the first $K_1$ iterations, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
<br />
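The combined schedule can be sketched the same way: with $\gamma_k=1$ for $k\leq K_1$ and $\gamma_k = 1/(k-K_1)$ afterwards, the chain first jumps into a neighborhood of $\hat{\theta}$, then freezes onto it (numerical values hypothetical):<br />

```python
import random

random.seed(2)
N, K1, K = 100, 20, 2000
omega2, sigma2 = 1.0, 0.5
y = [random.gauss(2.0, (omega2 + sigma2) ** 0.5) for _ in range(N)]
theta_hat = sum(y) / N                               # the MLE of theta

a = (1.0 / omega2) / (1.0 / sigma2 + 1.0 / omega2)
gam2 = 1.0 / (1.0 / sigma2 + 1.0 / omega2)

theta, s = 10.0, 0.0                                 # deliberately poor theta_0
for k in range(1, K + 1):
    gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)
    # Simulation step: draw psi_i^(k) from the conditional distribution
    psi = [random.gauss(a * theta + (1.0 - a) * yi, gam2 ** 0.5) for yi in y]
    # Stochastic approximation of the sufficient statistic S(psi) = sum_i psi_i
    s += gamma_k * (sum(psi) - s)
    # Maximization step: theta_k = s_k / N
    theta = s / N
```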
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T>0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
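The temperature sequence is a simple exponential decay floored at 1; a sketch ($T_0$ and $\tau$ are tuning choices):<br />

```python
def temperatures(T0=100.0, tau=0.9, K=100):
    """Annealing schedule T_k = max(1, tau * T_{k-1}):
    large during the first iterations, then decays exponentially to 1."""
    T, seq = T0, []
    for _ in range(K):
        seq.append(T)
        T = max(1.0, tau * T)
    return seq
```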
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ will also be a normal distribution whose residual error variance $a^2$ is replaced by $T a^2$ and variance matrix $\Omega$ for the random effects by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (which implicitly include the initial temperature $T_0$) and setting $ a^2_k = \max(\tau \, a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}) ) $ and $ \Omega_k = \max(\tau \, \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}) ) $ during the first iterations, where $0\leq\tau\leq 1$.<br />
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641--678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and its interfaces},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentré, F.},<br />
journal={Computational Statistics & Data Analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=The_SAEM_algorithm_for_estimating_population_parameters&diff=7300The SAEM algorithm for estimating population parameters2013-06-17T09:16:06Z<p>Brocco: </p>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful NLMEM tool, known to accurately estimate population parameters as well as having good theoretical properties. In fact, it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span> $(\psi_1^{(k)})$, ..., $ (\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and present by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also give some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize the algorithm's convergence by using several Markov chains per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to a local one.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem: an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a stationary point of the observed likelihood under mild regularity conditions.<br />
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in a closed-form, it can be approximated by simulation. For instance,<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
<br />
<br />
<br><br />
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ Markov chains for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $\psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)}$ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
In this case, the stochastic approximation step of SAEM reduces to a standard Robbins-Monro-type scheme on the sufficient statistics:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
<br />
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition kernel whose unique invariant distribution is the conditional distribution we want to simulate from.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a Markov chain Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the Markov chains $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$ to $\theta_k$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N}\left( s_{k,2} - s_{k,1}s_{k,1}^\transpose \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
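The iteration above can be sketched in a few lines of Python. This is a minimal illustration of the stochastic approximation and maximization steps only: the draws $h(\psi_i^{(k)})$ are replaced by hypothetical simulated values, since the MH simulation step depends on the specific model $\pcyipsii$.<br />

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 3
# stand-in for the transformed parameters h(psi_i^(k)) produced by the MH step
h_psi = rng.normal(size=(N, d))

gamma_k = 1.0                 # step-size at iteration k
s1 = np.zeros(d)              # running statistic s_{k,1} ~ sum_i h(psi_i)
s2 = np.zeros((d, d))         # running statistic s_{k,2} ~ sum_i h(psi_i) h(psi_i)^T

# stochastic approximation step
s1 = s1 + gamma_k * (h_psi.sum(axis=0) - s1)
s2 = s2 + gamma_k * (h_psi.T @ h_psi - s2)

# maximization step
mu_k = s1 / N
Omega_k = s2 / N - np.outer(mu_k, mu_k)
```

With $\gamma_k=1$ and statistics initialized at zero, the M-step reduces to the empirical mean and (biased) empirical covariance of the current draws, as expected.<br />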
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then noting $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function both of $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
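This additional update can be sketched as follows, with hypothetical residuals standing in for the quantities $y_{ij} - f(t_{ij},\psi_i^{(k)})$ (the subject sizes $n_i$ below are arbitrary):<br />

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical residuals y_ij - f(t_ij, psi_i^(k)) for 3 subjects with n_i = 4, 6, 5
residuals = [rng.normal(0.0, 0.5, size=n_i) for n_i in (4, 6, 5)]

gamma_k, s3 = 1.0, 0.0
S3 = sum(float(np.sum(r ** 2)) for r in residuals)   # S3(y, psi^(k)): total squared error
s3 = s3 + gamma_k * (S3 - s3)                        # stochastic approximation
a2_k = s3 / sum(len(r) for r in residuals)           # maximization step: a_k^2
```

With $\gamma_k=1$, $a_k^2$ is simply the mean squared residual over all observations.<br />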
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
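The recommended two-phase schedule can be written down directly; the values of $K_1$ and $K_2$ are tuning parameters:<br />

```python
def step_sizes(K1, K2):
    """Two-phase SAEM step-size schedule: gamma_k = 1 for the first K1
    iterations, then gamma_k = 1/(k - K1) for the next K2 iterations."""
    return [1.0] * K1 + [1.0 / k for k in range(1, K2 + 1)]
```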
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the solution. It is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$, $k=1$, ...,$40$, $\gamma_k = 1/(k-40)$, $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in Example 1, but with variance reduced by a factor of 10 (i.e., standard deviation reduced by $\sqrt{10}$); in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ given $y_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S(\bpsi)} ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
<br />
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ behaves during the first $K_1$ iterations like a homogeneous Markov chain fluctuating around $\hat{\theta}$, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
<br />
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
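This toy example can be reproduced with a short Python simulation. The numerical values below ($N$, $\theta$, $\omega$, $\sigma$, $K_1$, $K_2$ and the initial guess) are arbitrary illustrative choices, not taken from the text:<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: psi_i ~ N(theta, omega^2), y_i ~ N(psi_i, sigma^2); omega, sigma known.
N, theta_true, omega, sigma = 1000, 3.0, 1.0, 1.0
psi = rng.normal(theta_true, omega, N)
y = rng.normal(psi, sigma)

theta_hat = y.mean()  # explicit MLE: the empirical mean of the y_i

# Exact conditional p(psi_i | y_i; theta) = N(a*theta + (1-a)*y_i, gamma2):
a = (1 / omega**2) / (1 / sigma**2 + 1 / omega**2)
gamma2 = 1.0 / (1 / sigma**2 + 1 / omega**2)

def saem(K1=20, K2=200, theta0=-5.0):
    theta, s = theta0, 0.0
    for k in range(1, K1 + K2 + 1):
        gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)                  # two-phase step-size
        psi_k = rng.normal(a * theta + (1 - a) * y, np.sqrt(gamma2))  # simulation step
        s = s + gamma_k * (psi_k.sum() - s)                           # stochastic approximation
        theta = s / N                                                 # maximization step
    return theta

theta_saem = saem()
```

After the $K_1$ iterations with $\gamma_k=1$, the iterates fluctuate around $\hat{\theta}$; the decreasing steps in the second phase then drive $\theta_k$ to $\hat{\theta}$ itself.<br />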
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T\geq0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ also defines a normal model, in which the residual error variance $a^2$ is replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (which implicitly include the initial temperature $T_0$) and setting $ a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)})) $ and $ \Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)})) $ during the first iterations, where $0\leq\tau\leq 1$.<br />
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
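The tempered update of the residual variance can be sketched as follows; the numerical values, including the rate $\tau=0.95$, the initial variance and the constant empirical estimate, are purely illustrative:<br />

```python
def annealed_var(var_prev, var_hat, tau=0.95):
    # the variance may decrease by at most a factor tau per iteration
    return max(tau * var_prev, var_hat)

a2 = 50.0  # deliberately large initial variance ("high temperature")
trajectory = []
for _ in range(150):
    a2 = annealed_var(a2, 0.2)  # 0.2: hypothetical empirical estimate at iteration k
    trajectory.append(a2)
```

The variance decays geometrically at rate $\tau$ until it reaches the empirical estimate, after which it is no longer constrained by the annealing scheme.<br />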
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641-678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and Its Interface},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={The Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentré, F.},<br />
journal={Computational Statistics & Data Analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=The_SAEM_algorithm_for_estimating_population_parameters&diff=7299The SAEM algorithm for estimating population parameters2013-06-17T09:15:31Z<p>Brocco: </p>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful NLMEM tool, known to accurately estimate population parameters as well as having good theoretical properties. In fact, it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ <span class="plainlinks">[http://en.wikipedia.org/wiki/Markov_chain Markov chains]</span>$(\psi_1^{(k)})$, ..., $(\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and present by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also present some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize convergence by using several Markov chains per individual, and a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to a local one.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem: an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that, under mild regularity conditions, the EM sequence $(\theta_k)$ converges to a stationary point of the observed likelihood.<br />
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in a closed-form, it can be approximated by simulation. For instance,<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
<br />
<br />
<br><br />
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ Markov chains for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $ \psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)} $ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The approximation step of SAEM then simplifies to a general Robbins-Monro-type scheme for approximating the conditional expectation of this sufficient statistic:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
<br />
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. A well-known alternative consists in using the Metropolis-Hastings algorithm: introducing a transition kernel whose unique invariant distribution is the conditional distribution we want to simulate.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a Markov chain Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the Markov chains $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br />
}}<br />
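To make the simulation step concrete, here is a minimal sketch of a random-walk Metropolis kernel targeting $\pmacro(\psi_i | y_i ;\theta)$ for a toy model where $\psi_i \sim {\cal N}(\mu,\omega^2)$ and $y_i | \psi_i \sim {\cal N}(\psi_i,\sigma^2)$. The function names and tuning constants are ours, not those of $\monolix$:<br />

```python
import math
import random

# Toy model: psi ~ N(mu, omega^2), y | psi ~ N(psi, sigma^2).
# log p(y, psi; theta), up to an additive constant.
def log_joint(psi, y, mu, omega2, sigma2):
    return -(y - psi) ** 2 / (2 * sigma2) - (psi - mu) ** 2 / (2 * omega2)

def mh_step(psi, y, mu, omega2, sigma2, step=1.0):
    """One random-walk Metropolis iteration targeting p(psi | y; theta)."""
    prop = psi + step * random.gauss(0.0, 1.0)
    log_alpha = log_joint(prop, y, mu, omega2, sigma2) - log_joint(psi, y, mu, omega2, sigma2)
    return prop if math.log(random.random() + 1e-300) < log_alpha else psi

random.seed(1)
y_i, mu, omega2, sigma2 = 2.0, 0.0, 1.0, 1.0
psi, draws = mu, []
for k in range(20000):
    psi = mh_step(psi, y_i, mu, omega2, sigma2)
    if k >= 1000:                      # discard burn-in
        draws.append(psi)
# The exact posterior here is N((y_i + mu)/2, 1/2), so the mean should be close to 1.
post_mean = sum(draws) / len(draws)
```

In SAEM, a few such iterations per individual (rather than full convergence of the chain) are enough before each update of $\theta$.<br />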
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N}\left( s_{k,2} - \frac{1}{N} s_{k,1}s_{k,1}^\transpose \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
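As a sketch, the approximation and maximization steps above amount to a few lines of linear algebra. The draws $h(\psi_i^{(k)})$ are faked here with arbitrary numbers for illustration; in practice they come from the MH simulation step:<br />

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 2
# Stand-in for the simulation step: pretend these h(psi_i^{(k)}) were drawn by MH.
h_psi = rng.normal(size=(N, d)) + np.array([1.0, -0.5])

def sa_update(s1, s2, h_psi, gamma):
    """Stochastic approximation of the two sufficient statistics."""
    s1 = s1 + gamma * (h_psi.sum(axis=0) - s1)
    s2 = s2 + gamma * (h_psi.T @ h_psi - s2)
    return s1, s2

def m_step(s1, s2, N):
    """Closed-form maximization step for (mu, Omega)."""
    mu = s1 / N
    Omega = s2 / N - np.outer(mu, mu)
    return mu, Omega

s1, s2 = np.zeros(d), np.zeros((d, d))
s1, s2 = sa_update(s1, s2, h_psi, gamma=1.0)   # gamma_1 = 1 at the first iteration
mu, Omega = m_step(s1, s2, N)
```

With $\gamma_1=1$ and a single batch of draws, the M-step returns the empirical mean and (biased) covariance of the $h(\psi_i^{(k)})$, as expected.<br />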
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is now constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then noting $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function both of $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\phi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
 s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
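For instance, the two-phase schedule used in example 3 below ($\gamma_k = 1$ for $k \leq 40$, then $1/(k-40)$) can be written as follows (the name {{Verbatim|step_size}} is ours):<br />

```python
def step_size(k, K1):
    """Two-phase SAEM step size: gamma_k = 1 for k <= K1, then 1/(k - K1)."""
    return 1.0 if k <= K1 else 1.0 / (k - K1)

# Schedule for 100 iterations with K1 = 40 "exploration" iterations
gammas = [step_size(k, K1=40) for k in range(1, 101)]
```

This sequence satisfies the required conditions restricted to the decreasing phase: the terms sum to infinity while their squares have a finite sum.<br />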
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the "solution". The sequence $(\theta_{k})$ is a homogeneous Markov Chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$, $k=1$, ...,$40$, $\gamma_k = 1/(k-40)$, $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in example 1, but the random fluctuations (standard deviation) are reduced by a factor $\sqrt{10}$; in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to try and compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S(\bpsi)} ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
<br />
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, by combining the two strategies, the sequence $(\theta_k)$ behaves during the first $K_1$ iterations like a Markov chain fluctuating randomly around $\hat{\theta}$, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
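This experiment is easy to reproduce. Below is a self-contained sketch (the variable names and the choices $N=200$, $K_1=20$, $K_2=200$ are ours) that combines the two strategies and converges to $\hat{\theta}=\bar{y}$:<br />

```python
import numpy as np

rng = np.random.default_rng(42)
N, theta_true, omega2, sigma2 = 200, 1.0, 1.0, 1.0
psi_true = rng.normal(theta_true, np.sqrt(omega2), N)
y = rng.normal(psi_true, np.sqrt(sigma2))

theta_hat = y.mean()                           # the MLE, since y_i ~ N(theta, omega2 + sigma2)
a = (1 / omega2) / (1 / sigma2 + 1 / omega2)   # shrinkage weight in the conditional mean
gam2 = 1.0 / (1 / sigma2 + 1 / omega2)         # conditional variance gamma^2

K1, K2 = 20, 200
theta = 5.0                                    # deliberately poor initial guess
for k in range(1, K1 + K2 + 1):
    gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)
    # Simulation step: exact draw from p(psi_i | y_i; theta_{k-1})
    psi = rng.normal(a * theta + (1 - a) * y, np.sqrt(gam2))
    # Stochastic approximation + maximization: theta_k = theta_{k-1} + gamma_k (S/N - theta_{k-1})
    theta = theta + gamma_k * (psi.mean() - theta)
```

During the first $K_1$ iterations, $\theta_k$ moves quickly into a neighborhood of $\hat{\theta}$; the decreasing step sizes then remove the random fluctuations.<br />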
<br />
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T\geq0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ will also be a normal distribution whose residual error variance $a^2$ is replaced by $T a^2$ and variance matrix $\Omega$ for the random effects by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (that implicitly include the initial temperature $T_0$) and setting $ a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}) ) $ and $ \Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}) ) $ during the first iterations, where $0\leq\tau\leq 1$.<br />
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
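The annealed variance update can be sketched as follows; for illustration we freeze the empirical estimate $\hat{a}^2(\by,\bpsi^{(k)})$ at a fixed value, whereas in practice it is recomputed at each iteration (the values of $\tau$, $a^2_0$ and the stand-in estimate are ours):<br />

```python
tau = 0.95
a2 = 10.0        # large initial residual variance a_0^2 (implicitly, a high temperature T_0)
a2_hat = 0.04    # frozen stand-in for the empirical estimate hat{a}^2(y, psi^(k))
trace = []
for k in range(200):
    a2 = max(tau * a2, a2_hat)   # a_k^2 = max(tau * a_{k-1}^2, hat{a}^2)
    trace.append(a2)
```

The variance decreases geometrically from its large initial value until it reaches the empirical estimate, which then takes over: the temperature has cooled to $T=1$ and the usual SAEM updates apply.<br />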
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641--678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and its interfaces},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentré, F.},<br />
journal={Computational statistics & data analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>
<hr />
<div>==Introduction ==<br />
<br />
<br />
The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful NLMEM tool, known to accurately estimate population parameters as well as having good theoretical properties. In fact, it converges to the MLE under very general hypotheses.<br />
<br />
SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br />
<br />
Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br />
</math> }}<br />
<br />
<br />
SAEM is an iterative algorithm that essentially consists of constructing $N$ [en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_1^{(k)})$, ..., $(\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and present by way of a simple example how to implement SAEM and use it in practice.<br />
<br />
We will also present some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize the algorithm's convergence by using several Markov chains per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to a local maximum.<br />
<br />
<br />
<br><br />
==The EM algorithm==<br />
<br />
<br />
We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem: an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta) $.<br />
<br />
However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br />
<br />
<br />
* $\textbf{E-step:}$ evaluate the quantity<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .</math> }}<br />
<br />
<br />
* $\textbf{M-step:}$ update the estimation of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br />
</math> }}<br />
<br />
<br />
It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a<br />
stationary point of the observed likelihood under mild regularity conditions.<br />
<br />
Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in closed form, it can be approximated by simulation. For instance,<br />
<br />
<br />
* The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br />
<br />
* The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br />
<br />
<br />
<br><br />
<br />
==The SAEM algorithm==<br />
<br />
At iteration $k$ of SAEM:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br />
</math> }}<br />
<br />
where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.<br />
<br />
<br />
* $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br />
<br />
{{Equation1<br />
|equation=<math> \theta_{k} = \argmax{\theta} \, Q_k(\theta) .</math> }}<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text= &#32;<br />
* Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . </math> }}<br />
<br />
: This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br />
<br />
<br />
* When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ Markov chains for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $ \psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)} $ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br />
<br />
{{Equation1<br />
|equation=<math> Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br />
</math> }}<br />
<br />
: By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br />
}}<br />
<br />
<br />
Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br />
<br />
{{Equation1<br />
|equation=<math> \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , </math> }}<br />
<br />
where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_stat"><math><br />
\tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
The stochastic approximation step of SAEM then simplifies to a general Robbins-Monro-type scheme for approximating the conditional expectation of this sufficient statistic:<br />
<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k$ according to<br />
<br />
{{Equation1<br />
|equation=<math><br />
s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . </math> }}<br />
<br />
<br />
Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br />
<br />
Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br />
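To see why this stochastic approximation behaves well, note that with $\gamma_k=1/k$ the recursion is exactly a running average of the simulated sufficient statistics, which converges to their expectation by the law of large numbers. The following sketch (illustrative only; the Gaussian statistic and its mean are arbitrary choices, not from the text) checks this numerically:<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Robbins-Monro recursion s_k = s_{k-1} + gamma_k * (S_k - s_{k-1})
# with gamma_k = 1/k: s_k is exactly the running mean of S_1, ..., S_k.
s = 0.0
draws = []
for k in range(1, 5001):
    S_k = rng.normal(loc=2.0, scale=1.0)   # simulated sufficient statistic
    draws.append(S_k)
    s += (1.0 / k) * (S_k - s)

assert abs(s - np.mean(draws)) < 1e-9      # the recursion is the running average
assert abs(s - 2.0) < 0.1                  # close to the expectation E[S] = 2
```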
<br />
Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear models or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition probability which has as unique invariant distribution the conditional distribution we want to simulate.<br />
<br />
In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br />
Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a Markov chain Monte Carlo procedure.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= Convergence of the Markov chains $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$ to $\theta_k$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Implementing SAEM ==<br />
<br />
Implementation of SAEM can be difficult to describe in full generality for complex statistical models such as mixture models, models with inter-occasion variability, etc. We therefore limit ourselves to some basic models in order to illustrate how SAEM can be implemented.<br />
<br />
<br><br />
===SAEM for general hierarchical models===<br />
<br />
Consider first a very general model for any type (continuous, categorical, survival, etc.) of data $(y_i)$:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega),<br />
\end{eqnarray}</math> }}<br />
<br />
where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br />
<br />
We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br />
$\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{S}_1(\bpsi) &= & \sum_{i=1}^N h(\psi_i) \\<br />
\tilde{S}_2(\bpsi) &= & \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br />
\end{eqnarray}</math> }}<br />
<br />
At iteration $k$ of SAEM, we have:<br />
<br />
<br />
* $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br />
<br />
* $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
s_{k,1} &=& s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br />
s_{k,2} &=& s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
* $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\mu_{k} &=& \frac{1}{N} s_{k,1} \\<br />
\Omega_k &=& \frac{1}{N}\left( s_{k,2} - \frac{1}{N}\, s_{k,1}s_{k,1}^\transpose \right) .<br />
\end{eqnarray}</math> }}<br />
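As a minimal sketch (not from the text), the stochastic approximation and maximization steps above can be written as follows, taking $h$ to be the identity; with $\gamma_k=1$ and zero initial statistics, $\mu_k$ and $\Omega_k$ reduce to the empirical mean and (biased) covariance of the simulated parameters:<br />

```python
import numpy as np

def saem_sa_m_step(h_psi, s1, s2, gamma):
    """Stochastic approximation + maximization for the hierarchical model.

    h_psi : (N, d) array of simulated transformed parameters h(psi_i^{(k)})
    s1, s2: sufficient statistics from the previous iteration
    gamma : step size gamma_k
    """
    N = h_psi.shape[0]
    s1 = s1 + gamma * (h_psi.sum(axis=0) - s1)   # running sum of h(psi_i)
    s2 = s2 + gamma * (h_psi.T @ h_psi - s2)     # running sum of h h^T
    mu = s1 / N
    Omega = s2 / N - np.outer(mu, mu)            # empirical covariance (MLE)
    return s1, s2, mu, Omega

h = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 0.0]])   # N = 3 fake draws, d = 2
s1, s2, mu, Omega = saem_sa_m_step(h, np.zeros(2), np.zeros((2, 2)), 1.0)
```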
<br />
<br />
What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br />
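This point can be illustrated with a toy random-walk Metropolis-Hastings step (a sketch under assumed values, with a hypothetical one-observation model $y_i|\psi_i \sim {\cal N}(\psi_i,\sigma^2)$ and prior $\psi_i \sim {\cal N}(\mu,\omega^2)$): the acceptance ratio only requires evaluating $\pcyipsii(y_i | \psi_i)$ and the prior density. Since this toy model is conjugate, the chain can be checked against the known posterior mean.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: y | psi ~ N(psi, sigma^2), prior psi ~ N(mu, omega^2).
mu, omega, sigma, y = 1.0, 1.0, 0.5, 2.0

def log_joint(psi):
    # log p(y | psi) + log p(psi), up to an additive constant
    return -0.5 * (y - psi) ** 2 / sigma**2 - 0.5 * (psi - mu) ** 2 / omega**2

# Random-walk Metropolis-Hastings targeting p(psi | y)
psi, chain = mu, []
for _ in range(20000):
    prop = psi + 0.5 * rng.normal()
    if np.log(rng.uniform()) < log_joint(prop) - log_joint(psi):
        psi = prop
    chain.append(psi)

# Exact posterior mean: a*mu + (1-a)*y with a = (1/omega^2)/(1/sigma^2 + 1/omega^2)
a = (1 / omega**2) / (1 / sigma**2 + 1 / omega**2)
assert abs(np.mean(chain[2000:]) - (a * mu + (1 - a) * y)) < 0.1
```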
<br />
<br />
<br><br />
<br />
===SAEM for continuous data models===<br />
Consider now a continuous data model in which the residual error variance is constant:<br />
<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& f(t_{ij},\phi_i) + a \teps_{ij} \\<br />
h(\psi_i) &\sim& {\cal N}( \mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then, writing $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br />
</math> }}<br />
<br />
The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function of both $\by$ and $\bpsi$:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
s_{k,3} &=& s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br />
a_k^2 &=& \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br />
\end{eqnarray}</math> }}<br />
<br />
The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br />
Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br />
almost sure convergence of the algorithm to the maximum likelihood estimator.<br />
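Such a two-phase schedule can be sketched as follows ($K_1$, $K_2$ and the exact decay rate are tuning choices, not prescriptions from the text):<br />

```python
def saem_step_sizes(K1, K2):
    """Two-phase SAEM step sizes: gamma_k = 1 for the first K1 iterations
    (fast move toward a neighborhood of the MLE), then gamma_k = 1/(k - K1)
    for the next K2 iterations (almost sure convergence)."""
    return [1.0] * K1 + [1.0 / k for k in range(1, K2 + 1)]

gammas = saem_step_sizes(K1=40, K2=60)
assert gammas[:40] == [1.0] * 40        # exploration phase
assert gammas[40] == 1.0 and gammas[41] == 0.5   # 1/(k - K1) afterwards
```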
<br />
<br />
<br />
{{Example<br />
|title=Example<br />
|text= Consider a simple model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br />
\log(A_i)&\sim&{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br />
\log(k_i)&\sim&{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br />
Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br />
<br />
<br />
1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the "solution". It is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br />
<br />
[[File:saem1.png|link=]]<br />
<br />
<br />
2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br />
<br />
[[File:saem2.png|link=]]<br />
<br />
<br />
3. $\gamma_k = 1$ for $k=1,\ldots,40$ and $\gamma_k = 1/(k-40)$ for $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br />
<br />
[[File:saem3.png|link=]]<br />
<br />
<br />
4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in Example 1, but its standard deviation is reduced by a factor $\sqrt{10}$; in this case, SAEM behaves like EM. <br />
<br />
[[File:saem4.png|link=]]<br />
}}<br />
<br />
<br />
<br><br />
<br />
==A simple example to understand why SAEM converges in practice==<br />
<br />
<br />
Let us look at a very simple Gaussian model, with only one observation per individual:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi_i &\sim& {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br />
y_i &\sim& {\cal N}(\psi_i,\sigma^2).<br />
\end{eqnarray}</math> }}<br />
<br />
We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br />
<br />
Here, the maximum likelihood estimator $ \hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br />
<br />
{{Equation1<br />
|equation=<math> \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br />
</math>}}<br />
<br />
We now propose to try and compute $\hat{\theta}$ using SAEM instead. The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br />
</math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
a &= & \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br />
\gamma^2 &= &\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br />
\end{eqnarray}</math> }}<br />
<br />
The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br />
<br />
{{Equation1<br />
|equation=<math> {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. </math> }}<br />
<br />
Then,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\tilde{\theta}({\cal S}(\bpsi) ) &=& \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br />
&=& \frac{ {\cal S}(\bpsi)}{N}.<br />
\end{eqnarray}</math> }}<br />
<br />
Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2). $<br />
<br />
* Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br />
<br />
<br />
It can be shown that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br />
</math> }}<br />
<br />
where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br />
<br />
{{Equation1<br />
|equation=<math>\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br />
</math> }}<br />
<br />
<br />
{{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br />
<br />
<br />
Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br />
<br />
<br />
* Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2) $<br />
<br />
* Maximization step:<br />
<br />
{{Equation1<br />
|equation= <math>\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br />
</math> }}<br />
<br />
<br />
: Here, we can show that:<br />
<br />
{{Equation1<br />
|equation=<math> \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br />
</math> }}<br />
<br />
: where $e_k \sim {\cal N}(0, \gamma^2 /N)$. Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br />
<br />
<br />
{{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br />
<br />
<br />
Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ behaves like a random walk around $\hat{\theta}$ during the first $K_1$ iterations, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br />
<br />
<br />
{{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br />
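The whole experiment is easy to reproduce. The sketch below (with illustrative values for $N$, $\theta$, $\omega$, $\sigma$, $K_1$ and $K_2$) runs SAEM on this toy model with the combined schedule; since the conditional distribution of $\psi_i$ is Gaussian, the simulation step is exact, and the final estimate should land on $\hat{\theta} = \bar{y}$:<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: psi_i ~ N(theta, omega^2), y_i ~ N(psi_i, sigma^2),
# with omega and sigma known; the MLE is theta_hat = mean(y).
N, theta_true, omega, sigma = 200, 3.0, 1.0, 1.0
y = theta_true + np.sqrt(omega**2 + sigma**2) * rng.normal(size=N)

a = (1 / omega**2) / (1 / sigma**2 + 1 / omega**2)   # shrinkage weight
gam2 = 1 / (1 / sigma**2 + 1 / omega**2)             # conditional variance gamma^2

K1, K2 = 20, 200
theta = -5.0                                         # deliberately poor initial guess
for k in range(1, K1 + K2 + 1):
    gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)
    # Simulation step: exact draw from psi_i | y_i ~ N(a*theta + (1-a)*y_i, gamma^2)
    psi = a * theta + (1 - a) * y + np.sqrt(gam2) * rng.normal(size=N)
    # Stochastic approximation + maximization step
    theta = theta + gamma_k * (psi.mean() - theta)

assert abs(theta - y.mean()) < 0.1   # SAEM lands on the explicit MLE
```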
<br />
<br />
{{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br />
<br />
<!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --><br />
<br />
==A simulated annealing version of SAEM==<br />
<br />
<br />
Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br />
<br />
To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any "temperature" $T\geq0$, we consider the complete model<br />
<br />
{{Equation1<br />
|equation=<math>\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br />
</math> }}<br />
<br />
where $C_T(\theta)$ is still a normalizing constant.<br />
<br />
We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease with an exponential rate to 1: $ T_k = \max(1, \tau \ T_{k-1}) $.<br />
<br />
Consider for example the following model for continuous data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br />
h(\psi_i) &\sim& {\cal N}(\mu , \Omega) .<br />
\end{eqnarray}</math> }}<br />
<br />
Here, $\theta = (\mu,\Omega,a^2)$ and<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega (h(\psi_i)-\mu) \right\},<br />
</math> }}<br />
<br />
where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br />
<br />
<br />
We see that $\pmacro_T(\by,\bpsi;\theta)$ also defines a normal model, in which the residual error variance $a^2$ is replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects by $T\Omega$.<br />
In other words, a model with a "large temperature" is a model with large variances.<br />
<br />
The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (that include the initial temperature $T_0$ implicitly) and setting $ a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)})) $ and $ \Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)})) $ during the first iterations. Here, $0\leq\tau\leq 1$.<br />
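A minimal sketch of this annealed variance update (illustrative values for $\tau$, the initial variance and the estimated variance): the variance follows the geometric decay $\tau\, a^2_{k-1}$ until it is caught up by the current estimate $\hat{a}^2$, after which the usual update takes over.<br />

```python
def annealed_variance(a2_prev, a2_hat, tau):
    """Simulated-annealing update: keep an inflated variance as long as the
    geometric decay tau * a2_prev exceeds the current estimate a2_hat."""
    return max(tau * a2_prev, a2_hat)

# Large initial variance, slow decay (tau = 0.95), current estimate a2_hat = 0.04
a2 = 10.0
for _ in range(150):
    a2 = annealed_variance(a2, 0.04, 0.95)

assert a2 == 0.04   # eventually locks onto the estimated variance
```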
<br />
These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to "escape" from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br />
After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br />
<br />
<br />
{{Remarks<br />
|title= Remark<br />
|text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1<1$ and $\tau_2>1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br />
}}<br />
<br />
<br />
{{Example<br />
|title=A PK example<br />
|text= <br />
<br />
Consider a simple one-compartment model for oral administration:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:saem_sa"><math><br />
f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
We then simulate PK data from 80 patients using the following population PK parameters:<br />
<br />
{{Equation1<br />
|equation=<math> ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .</math> }}<br />
<br />
We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br />
<br />
{{Equation1<br />
|equation=<math> \tilde{ka} = ke, \quad \tilde{V}=V \times ke/ka, \quad \tilde{ke}=ka . </math> }}<br />
<br />
We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br />
<br />
The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br />
<br />
:{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br />
<br />
Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br />
<br />
:{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br />
}}<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@article{allassonniere2010construction,<br />
title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br />
author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br />
journal={Bernoulli},<br />
volume={16},<br />
number={3},<br />
pages={641--678},<br />
year={2010},<br />
publisher={Bernoulli Society for Mathematical Statistics and Probability}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
volume={56},<br />
pages={2073-2085}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2013sde,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and its interfaces},<br />
year={2013},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delyon1999convergence,<br />
title={Convergence of a stochastic approximation version of the EM algorithm},<br />
author={Delyon, B. and Lavielle, M. and Moulines, E.},<br />
journal={Annals of Statistics},<br />
pages={94-128},<br />
year={1999},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{dempster1977maximum,<br />
title={Maximum likelihood from incomplete data via the EM algorithm},<br />
author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br />
journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br />
pages={1-38},<br />
year={1977},<br />
publisher={JSTOR}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kuhn2004coupling,<br />
title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br />
author={Kuhn, E. and Lavielle, M.},<br />
journal={ESAIM: Probability and Statistics},<br />
volume={8},<br />
pages={115-131},<br />
year={2004},<br />
publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lavielle2013improved,<br />
title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br />
author={Lavielle, M. and Mbogning, C.},<br />
journal={Statistics and Computing},<br />
year={2013},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mclachlan2007algorithm,<br />
title={The EM algorithm and extensions},<br />
author={McLachlan, G.J. and Krishnan, T.},<br />
volume={382},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{samson2006extension,<br />
title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br />
author={Samson, A. and Lavielle, M. and Mentr&eacute;, F.},<br />
journal={Computational statistics & data analysis},<br />
volume={51},<br />
number={3},<br />
pages={1562-1574},<br />
year={2006},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wei1990monte,<br />
title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br />
author={Wei, G. and Tanner, M.},<br />
journal={Journal of the American Statistical Association},<br />
volume={85},<br />
number={411},<br />
pages={699-704},<br />
year={1990},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wu1983convergence,<br />
title={On the convergence properties of the EM algorithm},<br />
author={Wu, C.F.},<br />
journal={The Annals of Statistics},<br />
volume={11},<br />
number={1},<br />
pages={95-103},<br />
year={1983},<br />
publisher={Institute of Mathematical Statistics}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Introduction and notation<br />
|linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Stochastic_differential_equations_based_models&diff=7296Stochastic differential equations based models2013-06-07T14:04:35Z<p>Brocco: /* Mixed-effects diffusion models */</p>
<hr />
<div><!-- Menu for the Extensions chapter --><br />
<sidebarmenu><br />
+[[Extensions]]<br />
*[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Diffusion models are known to be a relevant tool for modeling stochastic dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control.<br />
In a population approach, a mixed-effects diffusion model describes each individual series of observations using a system of stochastic differential equations (SDE) while also taking into account variability between individuals.<br />
<br />
For the sake of simplicity we will consider first a diffusion model for a single individual, and illustrate it with a very general dynamical system with linear transfers and PK examples. We will then show that the extension to mixed diffusion models is fairly straightforward.<br />
<br />
Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closed-form expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i|\psi_i)$, can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter ''Kalman filter'' (KF)]. When the system is not linear, the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter ''extended Kalman filter'' (EKF)] provides an approximation of the conditional pdf.<br />
<br />
<br />
<br><br />
<br />
==Diffusion model==<br />
<br />
<br />
We assume that one diffusion trajectory is observed with noise at discrete time points $t_1<\ldots<t_j<\ldots<t_n$. Let us denote by $(X(t),t>0) \in \Rset^d$ the underlying dynamical process and by $y_j \in \Rset$ a noisy function of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:SDEmodel"><math><br />
\left\{<br />
\begin{array}{lll}<br />
dX(t) &=& b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]<br />
y_{j} &=& c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]<br />
\varepsilon_{j} &\underset{i.i.d.}{\sim}& \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,<br />
\end{array}<br />
\right. </math></div><br />
|reference=(1) }}<br />
<br />
with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t>0)$ is a standard [http://en.wikipedia.org/wiki/Wiener_process Wiener process] in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$. The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.<br />
<br />
We can in fact consider an SDE-based model as an ODE-based one with a stochastic component.<br />
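A generic way to simulate trajectories from model [[#eq:SDEmodel|(1)]] is the Euler-Maruyama scheme; a minimal sketch, where the drift, diffusion, observation function and step sizes are illustrative choices rather than part of the model above:

```python
import numpy as np

def euler_maruyama(b, gamma, x0, times, rng, n_sub=50):
    """Simulate one trajectory of dX = b(X)dt + gamma(X)dW with the
    Euler-Maruyama scheme, returning X at the requested observation times."""
    x = np.array(x0, dtype=float)
    d = x.size
    path = [x.copy()]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / n_sub            # fine sub-grid between observation times
        for _ in range(n_sub):
            dW = np.sqrt(dt) * rng.standard_normal(d)
            x = x + b(x) * dt + gamma(x) @ dW
        path.append(x.copy())
    return np.array(path)

# Illustrative 1-d case: the additive model dA = -k A dt + g dW
rng = np.random.default_rng(0)
X = euler_maruyama(lambda x: -4.0 * x,
                   lambda x: np.array([[2.0]]),
                   x0=[10.0], times=np.linspace(0.0, 1.0, 11), rng=rng)
y = X[:, 0] + 0.1 * rng.standard_normal(len(X))   # y_j = c(X(t_j)) + eps_j with c = identity
```

The scheme only approximates the diffusion; a finer sub-grid (`n_sub`) reduces the discretization error.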
<br />
<br />
{{Example1<br />
|title1=Example: <br />
|title2= &#32; IV bolus with linear elimination<br />
<br />
|text= The ordinary differential equation <br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:ode1"><math> <br />
dA_c(t) = -k A_c(t) dt<br />
</math></div><br />
|reference=(2) }}<br />
<br />
is commonly used to describe the kinetics of a drug administered by rapid injection (IV bolus) into plasma. In bolus-specific compartmental models, plasma is treated as the single compartment of the human body. $A_c(t)$ represents the amount of the drug in plasma at time $t$ after injection, and $k$ is the elimination rate constant. The figure below displays the typical evolution of the amount found in the central compartment when $k=4$.<br />
<br />
{{ImageWithCaption|image=sde0.png|caption=Drug amount evolution for the ODE example }}<br />
<br />
<br />
Imagine now that we aim to describe the evolution of the drug amount over time by means of stochastic differential equations rather than ordinary differential equations, in order to better describe the ''intra-individual variability'' of the observed process. We can assume for example that the system [[#eq:ode1|(2)]] is randomly perturbed by an additive [http://en.wikipedia.org/wiki/Wiener_process Wiener process]:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sde1"><math><br />
dA_c(t) = -k A_c(t) dt + \gamma dW(t). <br />
</math></div><br />
|reference=(3) }}<br />
<br />
The figure below displays four kinetics for the amount in the central compartment, simulated from this model with $k=4$ and $\gamma=2$.<br />
<br />
<br />
{{ImageWithCaption|image=sde1.png|caption=Drug amount evolution for the SDE example }}<br />
<br />
}}<br />
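A quick Euler-Maruyama simulation of [[#eq:sde1|(3)]] makes the limitation concrete (the initial amount and step size here are illustrative): with purely additive noise, simulated amounts readily become negative.

```python
import numpy as np
rng = np.random.default_rng(0)

k, g = 4.0, 2.0                    # k = 4 and gamma = 2, as in the figure
dt, n_steps, n_paths = 0.001, 1000, 4
A = np.full(n_paths, 1.0)          # illustrative initial amount
paths = [A.copy()]
for _ in range(n_steps):
    # Euler-Maruyama step for dA_c = -k A_c dt + g dW
    A = A - k * A * dt + g * np.sqrt(dt) * rng.standard_normal(n_paths)
    paths.append(A.copy())
paths = np.array(paths)

# Additive noise ignores the positivity constraint: some amounts go negative
assert (paths < 0).any()
```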
<br />
<br />
These kinetics are clearly stochastic. Nevertheless, they are not realistic because:<br />
<br />
<br />
* they give an overly erratic description of the evolution of the drug concentration within the compartments of the human body.<br />
<br />
* they do not comply with certain constraints on biological dynamics (sign, monotonicity).<br />
<br />
<br />
A more relevant model might consider that some parameters of the model randomly fluctuate over time, rather than the observed variable itself, modeling for example the elimination rate "constant" $k$ as a stochastic process $k(t)$ that randomly varies around a typical value $k^\star$.<br />
<br />
More generally, we can describe the fluctuations within a linear dynamical system by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.<br />
<br />
<br />
<br />
<br><br />
==Diffusion models for dynamical systems with linear transfers==<br />
<br />
<br />
Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics. Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:linearTransferODEModel"><math><br />
dA(t) = K\, A(t)dt,<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the state of the $l^{\textrm{th}}$ entity at time $t$ and $K=(K_{l,l^\prime}, \ 1\leq l , l^\prime \leq d)$ is a deterministic matrix defined as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:K"><math><br />
\left\{<br />
\begin{array}{ll}<br />
K_{l,l^\prime} = k_{l^\prime,l} & \textrm{if} \quad l \neq l^\prime\\<br />
K_{l,l} = - k_{l,0} - \sum_{l^\prime \neq l} k_{l,l^\prime} ,<br />
\end{array}<br />
\right.<br />
</math></div><br />
|reference=(5) }}<br />
<br />
where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.<br />
<br />
<br />
{{ImageWithCaption|image=linear.png|caption=A dynamical system with $3$ components (circles) and linear transfers between components (arrows) }}<br />
<br />
<br />
In this particular example, matrix $K$ would be defined as<br />
<br />
{{Equation1<br />
|equation= <math><br />
K = \begin{pmatrix}<br />
-k_{10} -k_{12} -k_{13} & k_{21} & k_{31}\\<br />
k_{12} & -k_{20} -k_{21} -k_{23} & k_{32}\\<br />
k_{13} & k_{23} & -k_{30} -k_{31} -k_{32}<br />
\end{pmatrix}.<br />
</math> }}<br />
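Following the structure shown in this example matrix, $K$ can be assembled programmatically from a table of transfer and elimination rates. A sketch with made-up rate values, including a mass-balance sanity check (each column of $K$ sums to minus the corresponding elimination rate):

```python
import numpy as np

# Hypothetical transfer rates k[(l, l')] (from l to l') and elimination rates k0[l]
k = {(1, 2): 0.5, (1, 3): 0.2, (2, 1): 0.3, (2, 3): 0.1, (3, 1): 0.4, (3, 2): 0.25}
k0 = {1: 0.15, 2: 0.05, 3: 0.10}
d = 3

K = np.zeros((d, d))
for (l, lp), rate in k.items():
    K[lp - 1, l - 1] = rate               # off-diagonal: inflow into l' from l
for l in range(1, d + 1):
    # diagonal: elimination plus all outflows from compartment l
    K[l - 1, l - 1] = -k0[l] - sum(r for (a, _), r in k.items() if a == l)

# Mass balance: each column of K sums to minus the elimination rate
assert np.allclose(K.sum(axis=0), [-k0[l] for l in range(1, d + 1)])
```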
<br />
The model defined by equations [[#eq:linearTransferODEModel|(4)]] and [[#eq:K|(5)]] is a deterministic model which assumes that transfers take place at the same rate at all times. This is often a restrictive assumption since in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:linearTransferSDEModel"><math><br />
dA(t) = K(t)A(t)dt,<br />
</math></div><br />
|reference=(6) }}<br />
<br />
where $K(t)$ has the same structure as in [[#eq:K|(5)]] but now some components $k_{l,l^\prime}(t)$ are stochastic processes which take non-negative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.<br />
<br />
Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.<br />
<br />
<br />
{{Example1<br />
|title1=Example 1: <br />
|title2= &#32; IV bolus administration with stochastic linear elimination<br />
<br />
|text= We first extend the ODE-based model defined in [[#eq:ode1|(2)]] by assuming that $k$ is a diffusion process which takes positive values and fluctuates around a typical value $k^\star$.<br />
In this example, positivity of $k(t)$ is ensured by modeling the logarithm of the transfer rate as an Ornstein-Uhlenbeck diffusion process:<br />
<br />
{{Equation1<br />
|equation=<math> d\log k(t) = - \alpha \left( \log k(t) - \log k^\star \right) dt + \gamma d W(t), </math> }}<br />
<br />
where $W$ is a standard one-dimensional [http://en.wikipedia.org/wiki/Wiener_process Wiener process]. This results in the following diffusion system:<br />
<br />
{{Equation1<br />
|equation=<math> dX(t) = b(X(t))dt + \gamma(X(t))dW(t), </math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> <br />
X(t) = \begin{pmatrix} A_c(t) \\ \log k(t) \end{pmatrix}, \ \ \ \<br />
b(x) = \begin{pmatrix} -x_1 \exp(x_2) \\ -\alpha (x_2-\log k^{\star}) \end{pmatrix}, \ \ \ \<br />
\gamma(x) = \begin{pmatrix} 0 & 0 \\ 0 & \gamma \end{pmatrix}.<br />
</math> }}<br />
<br />
Note that in this specific example, the Jacobian matrix of the drift function $b$ has a simple form: <br />
<br />
{{Equation1<br />
|equation=<math> B(x)=\begin{pmatrix} - \exp(x_2) & -x_1 \exp(x_2)\\ 0 & -\alpha \end{pmatrix}. </math> }}<br />
<br />
The two figures below display four simulated processes $k(t)$ and the associated amount processes $A_c(t)$.<br />
<br />
<br />
::[[File:sde2.png|link=]]<br />
<br />
:::[[File:sde3.png|link=]]<br />
<br />
<br />
We measure the concentration at times $(t_{j}, 1\leq j \leq n)$:<br />
<br />
{{Equation1<br />
|equation= <math>y_j = \displaystyle{\frac{A_c(t_{j})}{V} } + a \, \teps_j . </math> }}<br />
<br />
The parameter vector of the model is therefore $\psi = (V, k^\star, \alpha, \gamma, a)$. We see in this example that the simulated kinetics are much more realistic than those obtained with the previous model, because:<br />
<br />
<br />
* the elimination rate process $k(t)$ is a stochastic process that takes non-negative values,<br />
<br />
* even though the amount process is stochastic, it is smooth and decreases monotonically with time.<br />
}}<br />
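This construction can be sketched numerically: simulate the Ornstein-Uhlenbeck process on $\log k(t)$ with an Euler-Maruyama scheme and integrate the amount equation alongside it. Parameter values and the initial amount below are illustrative, and the amount update uses an exact exponential step for the current value of $k(t)$:

```python
import numpy as np
rng = np.random.default_rng(0)

k_star, alpha, g = 4.0, 1.0, 2.0           # illustrative values for k*, alpha, gamma
dt, n = 0.001, 1000
log_k = np.log(k_star)
A = 100.0                                   # illustrative initial amount
A_path = [A]
for _ in range(n):
    # Euler-Maruyama step for the OU process on log k(t): k(t) stays positive
    log_k += -alpha * (log_k - np.log(k_star)) * dt + g * np.sqrt(dt) * rng.standard_normal()
    # A_c(t) follows dA_c = -k(t) A_c dt given k(t): smooth, positive, decreasing
    A *= np.exp(-np.exp(log_k) * dt)
    A_path.append(A)
A_path = np.array(A_path)

assert np.all(A_path > 0) and np.all(np.diff(A_path) < 0)
```

The two bullet points above are exactly what the assertions check: positivity of the amount and monotone decrease.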
<br />
<br />
<br />
{{Example1<br />
|title1=Example 2: <br />
|title2= &#32; Oral administration with first-order absorption and stochastic linear elimination<br />
<br />
|text=Oral PK models with first-order absorption and linear elimination are widely used to describe the time-course of an orally administered drug. The drug is administered into a depot compartment, absorbed into the central compartment with absorption rate $k_a$, and eliminated with elimination rate $k_e$. Such a model is described by the following system of ODEs:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:oral1"><math><br />
\displaystyle{ \frac{d}{dt} } \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix} \ \ = \ \ \begin{pmatrix} -k_a & 0\\ k_a & -k_e\end{pmatrix} \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix},<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $A_d(t)$ and $A_c(t)$ respectively represent the amounts of drug at time $t$ in the depot and central compartments. Assume now that the elimination rate is driven by a stochastic process, solution of the stochastic differential equation<br />
<br />
{{Equation1<br />
|equation=<math> d k_e(t) = - \alpha (k_e(t) - k_e^\star ) dt + \gamma \sqrt{k_e(t)} dW(t),<br />
</math> }}<br />
<br />
where $W$ is a standard one-dimensional [http://en.wikipedia.org/wiki/Wiener_process Wiener process]. Then [[#eq:oral1|(7)]] becomes:<br />
<br />
{{Equation1<br />
|equation=<math> dX(t) = b(X(t))dt + \gamma(X(t))dW(t). </math> }}<br />
<br />
Here,<br />
<br />
{{Equation1<br />
|equation=<math><br />
X(t)= \begin{pmatrix} A_d(t) \\ A_c(t) \\ k_e(t) \end{pmatrix}, \ \ \ \<br />
b(x) = \begin{pmatrix} -k_a x_1 \\ k_a x_1 -x_3 x_2 \\ -\alpha(x_3-k_e^\star ) \end{pmatrix}, \ \ \ \<br />
\gamma(x) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0\\ 0 & 0 & \gamma \sqrt{x_3}\end{pmatrix} ,<br />
</math> }}<br />
<br />
and the parameter vector of the model is $\psi = (V, k_a, k_e^\star, \alpha, \gamma, a) .$<br />
}}<br />
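Under the same illustrative approach, the full system of Example 2 can be simulated with an Euler-Maruyama step. The $\max(\cdot,0)$ guard is a standard practical fix for the square-root diffusion, whose Euler discretization can otherwise step below zero; all parameter values here are made up:

```python
import numpy as np
rng = np.random.default_rng(1)

ka, ke_star, alpha, g = 1.0, 4.0, 1.0, 0.5   # illustrative parameter values
dt, n = 0.001, 2000
Ad, Ac, ke = 100.0, 0.0, ke_star             # dose in the depot compartment at t = 0
for _ in range(n):
    dW = np.sqrt(dt) * rng.standard_normal()
    Ad, Ac, ke = (
        Ad - ka * Ad * dt,                                    # depot: first-order absorption
        Ac + (ka * Ad - ke * Ac) * dt,                        # central: absorption minus elimination
        # square-root diffusion; max(., 0) guards Euler steps that undershoot zero
        ke - alpha * (ke - ke_star) * dt + g * np.sqrt(max(ke, 0.0)) * dW,
    )

assert Ad > 0 and Ac > 0 and np.isfinite(ke)
```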
<br />
In both examples, the diffusion model can easily be extended to a population approach by defining the system parameters $\psi$ as individual random vectors.<br />
<br />
<br />
<br />
<br><br />
<br />
==Mixed-effects diffusion models==<br />
<br />
Let us now consider model [[#eq:SDEmodel|(1)]] with observations coming from several subjects. An adequate adaptation of model [[#eq:SDEmodel|(1)]] in this context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way as to correctly reflect the variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and denote by $n_i$ the number of observations for individual $i$, so that $t_{i1}<\ldots<t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t>0) \in \Rset^d$ and $y_{ij} \in \Rset$ will respectively denote individual $i$'s diffusion and a noisy observation of $X_i(t_{ij})$. The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$, are governed by a mixed-effects model based on a $d$-dimensional real-valued system of stochastic differential equations with the general form:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:SDEmixedModel"><math><br />
\left\{<br />
\begin{array}{l}<br />
dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br />
y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br />
\teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br />
\end{array}<br />
\right.<br />
</math></div><br />
|reference=(8) }}<br />
<br />
with initial condition $X_i(t_{i1}) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$ are unobserved independent $p$-dimensional random subject-specific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$; $(W_1(t),t>0), \ldots, (W_N(t),t>0)$ are independent standard [http://en.wikipedia.org/wiki/Wiener_process Wiener processes], and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors, such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent.<br />
The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br />
<br />
Assuming that the $N$ individuals are independent, the joint pdf is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sdepdf"><math><br />
\pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br />
</math></div><br />
|reference=(9) }}<br />
<br />
Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires computing the conditional distribution of each observation given the past:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1} , \psi_i) .<br />
\end{eqnarray}</math> }}<br />
<br />
Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1} , \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$.<br />
When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter Kalman filter]. When the system is not linear, a first solution consists in approximating this density by a Gaussian density and using the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter extended Kalman filter] for quickly computing the mean and the variance of this density. On the other hand, particle filters do not make any approximations of the transition density, but are very demanding in terms of simulation volume and computation time.<br />
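When the system is linear and Gaussian, each term of this prediction-error decomposition is a Gaussian density that the Kalman filter computes in closed form. A minimal scalar sketch of the log-likelihood computation, where the state-space matrices are illustrative scalars rather than a particular PK model:

```python
import numpy as np

def kalman_loglik(y, F, Q, H, R, m0, P0):
    """Exact log-likelihood of y under the scalar linear Gaussian model
    x_{t+1} = F x_t + w_t, w_t ~ N(0,Q);  y_t = H x_t + e_t, e_t ~ N(0,R)."""
    m, P, ll = m0, P0, 0.0
    for yt in y:
        # one-step prediction of y_t given the past: N(H m, S)
        S = H * P * H + R
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - H * m) ** 2 / S)
        # measurement update (filtering)
        Kg = P * H / S
        m, P = m + Kg * (yt - H * m), (1 - Kg * H) * P
        # time update for the next transition
        m, P = F * m, F * P * F + Q
    return ll

# Simulate data from the same model and evaluate the likelihood
rng = np.random.default_rng(0)
x, ys = 0.0, []
for _ in range(50):
    ys.append(x + 0.5 * rng.standard_normal())    # y_t = x_t + e_t, R = 0.25
    x = 0.9 * x + 0.3 * rng.standard_normal()     # x_{t+1} = 0.9 x_t + w_t, Q = 0.09
ll = kalman_loglik(ys, F=0.9, Q=0.09, H=1.0, R=0.25, m0=0.0, P0=1.0)
```

The EKF replaces $F$ and $H$ with Jacobians of the drift and measurement functions at the current state estimate, at the price of an approximation.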
<br />
<br />
<br />
<br><br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{delattre2013sii,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and Its Interface},<br />
year={2013}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Ditlevsen2005,<br />
title = {Mixed Effects in Stochastic Differential Equation Models},<br />
author = {Ditlevsen, S. and De Gaetano, A.},<br />
journal = {REVSTAT Statistical Journal},<br />
volume = {3},<br />
year = {2005},<br />
pages = {137-153}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Donnet2008,<br />
title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br />
author = {Donnet, S. and Samson, A.},<br />
journal = {ESAIM: Probability and Statistics},<br />
volume = {12},<br />
year = {2008},<br />
pages = {196-218}<br />
}<br />
</bibtex><br />
<bibtex><br />
@inproceedings{doucet2011tutorial,<br />
title={A tutorial on particle filtering and smoothing: Fifteen years later},<br />
author={Doucet, A. and Johansen, A. M.},<br />
booktitle={Oxford Handbook of Nonlinear Filtering},<br />
year={2011},<br />
organization={Citeseer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Klim2009,<br />
author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br />
title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br />
journal = {Computer methods and programs in biomedicine},<br />
volume = {94},<br />
pages = {279-289},<br />
year = {2009}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Kristensen2005,<br />
title = {Using Stochastic Differential Equations for PK/PD Model Development},<br />
author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {109-141}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Mazzoni2008,<br />
title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br />
author = {Mazzoni, T.},<br />
journal = {Computational Statistics},<br />
volume = {23},<br />
year = {2008},<br />
pages = {519-39}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{PSM,<br />
title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br />
author = {Mortensen, S. and Klim, S.}, <br />
year = {2008},<br />
url = {http://www2.imm.dtu.dk/projects/psm/},<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Mortensen2007,<br />
title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br />
author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {34},<br />
year = {2007},<br />
pages = {623-642}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Overgaard2005,<br />
title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br />
author = {Overgaard, R. V. and Jonsson, N. and Torn&oslash;e, C. W. and Madsen, H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {85-107}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2010,<br />
title = {Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br />
journal = {Scandinavian Journal of Statistics},<br />
volume = {37},<br />
year = {2010},<br />
pages = {67-90}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2011,<br />
title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and Ditlevsen, S.},<br />
journal = {Computational Statistics and Data Analysis},<br />
volume = {55},<br />
number = {3},<br />
year = {2011},<br />
pages = {1426-1444}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Tornoe2005,<br />
title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br />
author = {Torn&oslash;e, C. W. and Overgaard, R. V. and Agers&oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br />
journal = {Pharmaceutical Research},<br />
volume = {22},<br />
year = {2005},<br />
pages = {1247-1258}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back<br />
|link=Hidden Markov models }}</div>
<br />
{{EquationWithRef<br />
|equation=<div id="eq:SDEmixedModel"><math><br />
\left\{<br />
\begin{array}{l}<br />
dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br />
y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br />
\teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br />
\end{array}<br />
\right.<br />
</math></div><br />
|reference=(8) }}<br />
<br />
with initial condition $X_i(t_1) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent $d$-dimensional random subject-specific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$, $(W_1(t),t>0), \ldots, (W_N(t),t>0)$ are standard independent [http://en.wikipedia.org/wiki/Wiener_process Wiener processes], and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent.<br />
The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br />
<br />
Assuming that the $N$ individuals are independent, the joint pdf is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sdepdf"><math><br />
\pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br />
</math></div><br />
|reference=(9) }}<br />
<br />
Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires here to compute the conditional distribution of each observation given the past:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} p(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1} {{!}} \psi_i) .<br />
\end{eqnarray}</math> }}<br />
<br />
Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1} | \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$.<br />
When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, a first solution consists in approximating this density by a Gaussian density and using the extended Kalman filter for quickly computing the mean and the variance of this density. On the other hand, particle filters do not make any approximations of the transition density, but are very demanding in terms of simulation volume and computation time.<br />
<br />
<br />
<br />
<br><br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{delattre2013sii,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and Its Interface},<br />
year={2013}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Ditlevsen2005,<br />
title = {Mixed Effects in Stochastic Differential Equation Models},<br />
author = {Ditlevsen, S. and De Gaetano, A.},<br />
journal = {REVSTAT Statistical Journal},<br />
volume = {3},<br />
year = {2005},<br />
pages = {137-153}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Donnet2008,<br />
title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br />
author = {Donnet, S. and Samson, A.},<br />
journal = {ESAIM: Probability and Statistics},<br />
volume = {12},<br />
year = {2008},<br />
pages = {196-218}<br />
}<br />
</bibtex><br />
<bibtex><br />
@inproceedings{doucet2011tutorial,<br />
title={A tutorial on particle filtering and smoothing: Fifteen years later},<br />
author={Doucet, A. and Johansen, A. M.},<br />
booktitle={Oxford Handbook of Nonlinear Filtering},<br />
year={2011},<br />
organization={Citeseer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Klim2009,<br />
author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br />
title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br />
journal = {Computer methods and programs in biomedicine},<br />
volume = {94},<br />
pages = {279-289},<br />
year = {2009}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Kristensen2005,<br />
title = {Using Stochastic Differential Equations for PK/PD Model Development},<br />
author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {109-141}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Mazzoni2008,<br />
title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br />
author = {Mazzoni, T.},<br />
journal = {Computational Statistics},<br />
volume = {23},<br />
year = {2008},<br />
pages = {519-39}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{PSM,<br />
title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br />
author = {Mortensen, S. and Klim, S.}, <br />
year = {2008},<br />
url = {http://www2.imm.dtu.dk/projects/psm/},<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Mortensen2007,<br />
title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br />
author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {34},<br />
year = {2007},<br />
pages = {623-642}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Overgaard2005,<br />
title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br />
author = {Overgaard, R. V. and Jonsson, N. and Torn&oslash;e, C. W. and Madsen, H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {85-107}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2010,<br />
title = {Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br />
journal = {Scandinavian Journal of Statistics},<br />
volume = {37},<br />
year = {2010},<br />
pages = {67-90}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2011,<br />
title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and Ditlevsen, S.},<br />
journal = {Computational Statistics and Data Analysis},<br />
volume = {55},<br />
number = {3},<br />
year = {2011},<br />
pages = {1426-1444}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Tornoe2005,<br />
title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br />
author = {Torn&oslash;e, C. W. and Overgaard, R. V. and Agers&oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br />
journal = {Pharmaceutical Research},<br />
volume = {22},<br />
year = {2005},<br />
pages = {1247-1258}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back<br />
|link=Hidden Markov models }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Stochastic_differential_equations_based_models&diff=7294Stochastic differential equations based models2013-06-07T14:00:37Z<p>Brocco: /* Introduction */</p>
<hr />
<div><!-- Menu for the Extensions chapter --><br />
<sidebarmenu><br />
+[[Extensions]]<br />
*[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Diffusion models are known to be a relevant tool for modeling stochastic dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control.<br />
In a population approach, a mixed-effects diffusion model describes each individual series of observations using a system of stochastic differential equations (SDE) while also taking into account variability between individuals.<br />
<br />
For the sake of simplicity, we will first consider a diffusion model for a single individual, illustrating it with a very general dynamical system with linear transfers and with PK examples. We will then show that the extension to mixed-effects diffusion models is fairly straightforward.<br />
<br />
Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closed-form expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i|\psi_i)$, can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter ''Kalman filter'' (KF)]. When the system is not linear, the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter ''extended Kalman filter'' (EKF)] provides an approximation of the conditional pdf.<br />
<br />
<br />
<br><br />
<br />
==Diffusion model==<br />
<br />
<br />
We assume that one diffusion trajectory is observed with noise at discrete time points $t_1<\ldots<t_j<\ldots<t_n$. We denote by $(X(t),t>0) \in \Rset^d$ the underlying dynamical process and by $y_j \in \Rset$ a noisy function of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:SDEmodel"><math><br />
\left\{<br />
\begin{array}{lll}<br />
dX(t) &=& b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]<br />
y_{j} &=& c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]<br />
\varepsilon_{j} &\underset{i.i.d.}{\sim}& \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,<br />
\end{array}<br />
\right. </math></div><br />
|reference=(1) }}<br />
<br />
with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t>0)$ is a standard Wiener process in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$. The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.<br />
<br />
We can in fact consider an SDE-based model as an ODE-based one with a stochastic component.<br />
<br />
<br />
{{Example1<br />
|title1=Example: <br />
|title2= &#32; IV bolus with linear elimination<br />
<br />
|text= The ordinary differential equation <br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:ode1"><math> <br />
dA_c(t) = -k A_c(t) dt<br />
</math></div><br />
|reference=(2) }}<br />
<br />
is usually used to describe the kinetics of a drug administered by rapid injection (IV bolus) into plasma. In bolus-specific compartmental models, plasma is treated as the single compartment of the human body. $A_c(t)$ represents the amount of a drug ingredient in plasma at time $t$ after injection, and $k$ is the elimination rate constant. The figure below displays the typical evolution of the amount found in the central compartment when $k=4$.<br />
<br />
{{ImageWithCaption|image=sde0.png|caption=Typical evolution of the drug amount for the ODE model }}<br />
<br />
<br />
Imagine now that we aim to describe the evolution of the drug amount over time by means of stochastic differential equations rather than ordinary differential equations, in order to better describe the ''intra-individual variability'' of the observed process. We can assume for example that the system [[#eq:ode1|(2)]] is randomly perturbed by an additive Wiener process:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sde1"><math><br />
dA_c(t) = -k A_c(t) dt + \gamma dW(t). <br />
</math></div><br />
|reference=(3) }}<br />
<br />
The figure below displays four kinetics for the amount in the central compartment, simulated from this model with $k=4$ and $\gamma=2$.<br />
<br />
<br />
{{ImageWithCaption|image=sde1.png|caption=Four simulated kinetics of the drug amount for the SDE model }}<br />
<br />
}}<br />
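As a concrete illustration, trajectories of model (3) can be simulated with a basic Euler-Maruyama scheme. The sketch below uses the parameter values quoted above ($k=4$, $\gamma=2$); the initial amount, step size and random seed are illustrative assumptions not taken from the text.

```python
import math
import random

def euler_maruyama_bolus(A0, k, gamma, dt, n_steps, rng):
    """Simulate dA_c(t) = -k A_c(t) dt + gamma dW(t) with the Euler-Maruyama scheme."""
    path = [A0]
    for _ in range(n_steps):
        A = path[-1]
        dW = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment ~ N(0, dt)
        path.append(A - k * A * dt + gamma * dW)
    return path

# Illustrative values: k = 4 and gamma = 2 as in the text, A0 and dt assumed.
path = euler_maruyama_bolus(A0=10.0, k=4.0, gamma=2.0, dt=0.001, n_steps=2000,
                            rng=random.Random(1))
```

Setting `gamma=0` recovers the deterministic ODE (2), which gives a quick sanity check of the scheme: the simulated path then stays close to $A_0 e^{-kt}$.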
<br />
<br />
These kinetics are clearly stochastic. Nevertheless, they are not realistic because:<br />
<br />
<br />
* they give an overly erratic description of the evolution of the drug amount within the compartments of the human body,<br />
<br />
* they do not comply with certain constraints on biological dynamics (sign, monotonicity).<br />
<br />
<br />
A more relevant model might instead let some parameters of the model fluctuate randomly over time, rather than the observed variable itself, for example modeling the elimination rate "constant" $k$ as a stochastic process $k(t)$ that varies randomly around a typical value $k^\star$.<br />
<br />
More generally, we can describe the fluctuations within a linear dynamical system by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.<br />
<br />
<br />
<br />
<br><br />
==Diffusion models for dynamical systems with linear transfers==<br />
<br />
<br />
Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics. Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:linearTransferODEModel"><math><br />
dA(t) = K\, A(t)dt,<br />
</math></div><br />
|reference=(4) }}<br />
<br />
<br />
where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the state of the $l^{\textrm{th}}$ entity at time $t$, and $K=(K_{l,l^\prime}, \ 1\leq l , l^\prime \leq d)$ is a deterministic matrix defined as:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:K"><math><br />
\left\{<br />
\begin{array}{ll}<br />
K_{l,l^\prime} = k_{l^\prime,l} & \textrm{if} \quad l \neq l^\prime\\<br />
K_{l,l} = - k_{l,0} - \sum_{l^\prime \neq l} k_{l,l^\prime} ,<br />
\end{array}<br />
\right.<br />
</math></div><br />
|reference=(5) }}<br />
<br />
where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.<br />
<br />
<br />
{{ImageWithCaption|image=linear.png|caption=A dynamical system with $3$ components (circles) and linear transfers between components (arrows) }}<br />
<br />
<br />
In this particular example, matrix $K$ would be defined as<br />
<br />
{{Equation1<br />
|equation= <math><br />
K = \begin{pmatrix}<br />
-k_{10} -k_{12} -k_{13} & k_{21} & k_{31}\\<br />
k_{12} & -k_{20} -k_{21} -k_{23} & k_{32}\\<br />
k_{13} & k_{23} & -k_{30} -k_{31} -k_{32}<br />
\end{pmatrix}.<br />
</math> }}<br />
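The assembly of $K$ from the transfer and elimination rates can be sketched in code. The helper below reproduces the structure of the $3$-component example above; the numerical rate values are arbitrary assumptions chosen only for illustration.

```python
def transfer_matrix(d, transfer, elimination):
    """Assemble K for dA(t) = K A(t) dt from transfer rates {(l, l'): k_ll'}
    (entity l -> entity l') and elimination rates {l: k_l0}."""
    K = [[0.0] * d for _ in range(d)]
    for (l, lp), rate in transfer.items():
        K[lp - 1][l - 1] += rate       # flow into entity l' from entity l
        K[l - 1][l - 1] -= rate        # the same flow leaves entity l
    for l, rate in elimination.items():
        K[l - 1][l - 1] -= rate        # elimination from entity l
    return K

# Arbitrary illustrative rates for the 3-compartment system of the figure.
transfer = {(1, 2): 0.5, (1, 3): 0.2, (2, 1): 0.3,
            (2, 3): 0.1, (3, 1): 0.4, (3, 2): 0.6}
elimination = {1: 1.0, 2: 0.7, 3: 0.2}
K = transfer_matrix(3, transfer, elimination)
# Mass balance: each column of K sums to minus the elimination rate of that entity.
```

This column-sum property (only elimination removes mass from the system) is a convenient check that the matrix has been assembled consistently with the example above.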
<br />
The model defined by equations [[#eq:linearTransferODEModel|(4)]] and [[#eq:K|(5)]] is a deterministic model which assumes that transfers take place at the same rate at all times. This is often a restrictive assumption since in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:linearTransferSDEModel"><math><br />
dA(t) = K(t)A(t)dt,<br />
</math></div><br />
|reference=(6) }}<br />
<br />
where $K$ has the same structure as in [[#eq:K|(5)]] but now some components $k_{l,l^\prime}$ are stochastic processes which take non-negative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.<br />
<br />
Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.<br />
<br />
<br />
{{Example1<br />
|title1=Example 1: <br />
|title2= &#32; IV bolus administration with stochastic linear elimination<br />
<br />
|text= We will first extend the ODE-based model defined in [[#eq:ode1|(2)]] by assuming that $k$ is a diffusion process which takes non-negative values and fluctuates around a typical value $k^\star$.<br />
In this example, non-negativity of $k(t)$ is ensured by defining the logarithm of the transfer rate as an Ornstein-Uhlenbeck diffusion process:<br />
<br />
{{Equation1<br />
|equation=<math> d\log k(t) = - \alpha \left( \log k(t) - \log k^\star \right) dt + \gamma d W(t), </math> }}<br />
<br />
where $W$ is a standard one-dimensional Wiener process. This results in the following diffusion system:<br />
<br />
{{Equation1<br />
|equation=<math> dX(t) = b(X(t))dt + \gamma(X(t))dW(t), </math> }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> <br />
X(t) = \begin{pmatrix} A_c(t) \\ \log k(t) \end{pmatrix}, \ \ \ \<br />
b(x) = \begin{pmatrix} -x_1 \exp(x_2) \\ -\alpha (x_2-\log k^{\star}) \end{pmatrix}, \ \ \ \<br />
\gamma(x) = \begin{pmatrix} 0 & 0 \\ 0 & \gamma \end{pmatrix}.<br />
</math> }}<br />
<br />
Note that in this specific example, the Jacobian matrix of the drift function $b$ has a simple form: <br />
<br />
{{Equation1<br />
|equation=<math> B(x)=\begin{pmatrix} - \exp(x_2) & -x_1 \exp(x_2)\\ 0 & -\alpha \end{pmatrix}. </math> }}<br />
<br />
The two figures below display four simulated processes $k(t)$ and the associated amount processes $A_c(t)$.<br />
<br />
<br />
::[[File:sde2.png|link=]]<br />
<br />
:::[[File:sde3.png|link=]]<br />
<br />
<br />
We measure the concentration at times $(t_{j}, 1\leq j \leq n)$:<br />
<br />
{{Equation1<br />
|equation= <math>y_j = \displaystyle{\frac{A_c(t_{j})}{V} } + a \, \teps_j . </math> }}<br />
<br />
The parameter vector of the model is therefore $\psi = (V, k^\star, \alpha, \gamma, a)$. We see in this example that the simulated kinetics are much more realistic than those obtained with the previous model, because:<br />
<br />
<br />
* the elimination rate process $k(t)$ is a stochastic process that takes non-negative values,<br />
<br />
* even though the amount process is stochastic, it is smooth and decreases monotonically with time.<br />
}}<br />
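A minimal simulation sketch of Example 1 couples an Euler-Maruyama step for $\log k(t)$ with an Euler step for $A_c(t)$, and illustrates the two properties listed above: $k(t)$ stays positive and $A_c(t)$ decreases monotonically. The parameter values, step size and seed are illustrative assumptions.

```python
import math
import random

def simulate_example1(A0, k_star, alpha, gamma, dt, n_steps, rng):
    """Euler-Maruyama on the Ornstein-Uhlenbeck process log k(t),
    coupled with an Euler step for dA_c(t) = -k(t) A_c(t) dt."""
    log_k = math.log(k_star)
    A = A0
    ks, As = [math.exp(log_k)], [A]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))
        log_k += -alpha * (log_k - math.log(k_star)) * dt + gamma * dW
        A += -math.exp(log_k) * A * dt
        ks.append(math.exp(log_k))
        As.append(A)
    return ks, As

ks, As = simulate_example1(A0=10.0, k_star=4.0, alpha=2.0, gamma=1.0,
                           dt=0.001, n_steps=2000, rng=random.Random(2))
# k(t) = exp(log k(t)) is positive by construction; each Euler step multiplies
# A_c by (1 - k dt) < 1, so the simulated amount decreases smoothly to 0.
```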
<br />
<br />
<br />
{{Example1<br />
|title1=Example 2: <br />
|title2= &#32; Oral administration with first-order absorption and stochastic linear elimination<br />
<br />
|text=Oral PK models with first-order absorption and linear elimination are widely used to describe the time-course of a drug administered orally, treating the body as a single compartment. The drug is administered into a depot compartment, absorbed by the central compartment with absorption rate $k_a$ and eliminated with elimination rate $k_e$. Such a model is described by the following system of ODEs:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:oral1"><math><br />
\displaystyle{ \frac{d}{dt} } \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix} \ \ = \ \ \begin{pmatrix} -k_a & 0\\ k_a & -k_e\end{pmatrix} \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix},<br />
</math></div><br />
|reference=(7) }}<br />
<br />
where $A_d(t)$ and $A_c(t)$ respectively represent the amounts of drug at time $t$ in the depot and central compartments. Assume now that the elimination rate is a stochastic process, the solution of the stochastic differential equation<br />
<br />
{{Equation1<br />
|equation=<math> d k_e(t) = - \alpha (k_e(t) - k_e^\star ) dt + \gamma \sqrt{k_e(t)} dW(t),<br />
</math> }}<br />
<br />
where $W$ is a standard one-dimensional Wiener process. Then [[#eq:oral1|(7)]] becomes:<br />
<br />
{{Equation1<br />
|equation=<math> dX(t) = b(X(t))dt + \gamma(X(t))dW(t). </math> }}<br />
<br />
Here,<br />
<br />
{{Equation1<br />
|equation=<math><br />
X(t)= \begin{pmatrix} A_d(t) \\ A_c(t) \\ k_e(t) \end{pmatrix}, \ \ \ \<br />
b(x) = \begin{pmatrix} -k_a x_1 \\ k_a x_1 -x_3 x_2 \\ -\alpha(x_3-k_e^\star ) \end{pmatrix}, \ \ \ \<br />
\gamma(x) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0\\ 0 & 0 & \gamma \sqrt{x_3}\end{pmatrix} ,<br />
</math> }}<br />
<br />
and the parameter vector of the model is $\psi = (V, k_a, k_e^\star, \alpha, \gamma, a) .$<br />
}}<br />
<br />
In both examples, the diffusion model can be easily extended to a population approach by defining the system's parameters $\psi$ as an individual random vector.<br />
<br />
<br />
<br />
<br><br />
<br />
==Mixed-effects diffusion models==<br />
<br />
Let us now consider model [[#eq:SDEmodel|(1)]] with observations coming from several subjects. An adequate adaptation of model [[#eq:SDEmodel|(1)]] to this context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way as to correctly reflect the variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and denote by $n_i$ the number of observations for individual $i$, so that $t_{i1}<\ldots<t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t>0) \in \Rset^d$ and $y_{ij} \in \Rset$ will respectively denote individual $i$'s diffusion process and a noisy measurement of $X_i(t_{ij})$. The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$, are governed by a mixed-effects model based on a $d$-dimensional real-valued system of stochastic differential equations with the general form:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:SDEmixedModel"><math><br />
\left\{<br />
\begin{array}{l}<br />
dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br />
y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br />
\teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br />
\end{array}<br />
\right.<br />
</math></div><br />
|reference=(8) }}<br />
<br />
with initial condition $X_i(t_1) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent $p$-dimensional random subject-specific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$. $(W_1(t),t>0), \ldots, (W_N(t),t>0)$ are standard independent Wiener processes, and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors; the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent.<br />
The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br />
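A common way to instantiate this population model is to draw each subject's positive parameters from independent log-normal distributions. The sketch below is a hypothetical illustration of that step for the parameter vector of Example 1; the population medians and standard deviations are arbitrary assumptions.

```python
import math
import random

def draw_individual_parameters(N, pop_median, omega, rng):
    """Draw psi_i for N subjects, assuming independent log-normal components:
    log psi_i ~ N(log pop_median, omega^2), a common choice for positive parameters."""
    individuals = []
    for _ in range(N):
        psi = {name: math.exp(rng.gauss(math.log(m), omega[name]))
               for name, m in pop_median.items()}
        individuals.append(psi)
    return individuals

# Illustrative population values for psi = (V, k_star, alpha, gamma, a).
pop_median = {"V": 10.0, "k_star": 4.0, "alpha": 2.0, "gamma": 1.0, "a": 0.1}
omega = {"V": 0.2, "k_star": 0.3, "alpha": 0.1, "gamma": 0.1, "a": 0.0}
subjects = draw_individual_parameters(100, pop_median, omega, random.Random(3))
# Each subject then gets its own SDE system in (8), driven by an independent W_i.
```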
<br />
Assuming that the $N$ individuals are independent, the joint pdf is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sdepdf"><math><br />
\pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br />
</math></div><br />
|reference=(9) }}<br />
<br />
Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires computing the conditional distribution of each observation given the past:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} p(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1} , \psi_i) .<br />
\end{eqnarray}</math> }}<br />
<br />
Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1} , \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$.<br />
When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, a first solution consists of approximating this density by a Gaussian density, using the extended Kalman filter to quickly compute its mean and variance. Particle filters, on the other hand, make no approximation of the transition density, but are very demanding in terms of simulation volume and computation time.<br />
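For a scalar Gaussian linear state-space model, the prediction-error decomposition of the likelihood above can be computed exactly with the standard Kalman filter recursions. The sketch below assumes a state following an exactly discretized Ornstein-Uhlenbeck process observed with additive noise; all numerical values are illustrative assumptions.

```python
import math
import random

def kalman_loglik(y, a, b, q, r, m0, P0):
    """Log-likelihood of y under x_{j+1} = a x_j + b + N(0, q), y_j = x_j + N(0, r),
    via the prediction-error (innovation) decomposition of the Kalman filter."""
    m, P = m0, P0
    loglik = 0.0
    for yj in y:
        S = P + r                                    # innovation variance
        loglik += -0.5 * (math.log(2 * math.pi * S) + (yj - m) ** 2 / S)
        gain = P / S                                 # Kalman gain
        m, P = m + gain * (yj - m), (1 - gain) * P   # filtering update
        m, P = a * m + b, a * a * P + q              # one-step prediction
    return loglik

# Exact discretization of dx = -alpha (x - x_star) dt + gamma dW over step dt.
alpha, x_star, gam, dt, r = 2.0, 1.0, 0.5, 0.1, 0.01
a = math.exp(-alpha * dt)
b = x_star * (1 - a)
q = gam ** 2 * (1 - a * a) / (2 * alpha)

rng = random.Random(4)
x, y = x_star, []
for _ in range(200):
    y.append(x + rng.gauss(0.0, math.sqrt(r)))       # noisy observation
    x = a * x + b + rng.gauss(0.0, math.sqrt(q))     # exact state transition

ll = kalman_loglik(y, a, b, q, r, m0=x_star, P0=q / (1 - a * a))
```

Here `P0` is the stationary variance of the discretized state, $\gamma^2/(2\alpha)$, a natural choice of initial condition for a process started at equilibrium.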
<br />
<br />
<br />
<br><br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{delattre2013sii,<br />
title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Statistics and Its Interface},<br />
year={2013}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Ditlevsen2005,<br />
title = {Mixed Effects in Stochastic Differential Equation Models},<br />
author = {Ditlevsen, S. and De Gaetano, A.},<br />
journal = {REVSTAT Statistical Journal},<br />
volume = {3},<br />
year = {2005},<br />
pages = {137-153}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Donnet2008,<br />
title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br />
author = {Donnet, S. and Samson, A.},<br />
journal = {ESAIM: Probability and Statistics},<br />
volume = {12},<br />
year = {2008},<br />
pages = {196-218}<br />
}<br />
</bibtex><br />
<bibtex><br />
@inproceedings{doucet2011tutorial,<br />
title={A tutorial on particle filtering and smoothing: Fifteen years later},<br />
author={Doucet, A. and Johansen, A. M.},<br />
booktitle={Oxford Handbook of Nonlinear Filtering},<br />
year={2011},<br />
organization={Citeseer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Klim2009,<br />
author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br />
title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br />
journal = {Computer Methods and Programs in Biomedicine},<br />
volume = {94},<br />
pages = {279-289},<br />
year = {2009}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Kristensen2005,<br />
title = {Using Stochastic Differential Equations for PK/PD Model Development},<br />
author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {109-141}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Mazzoni2008,<br />
title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br />
author = {Mazzoni, T.},<br />
journal = {Computational Statistics},<br />
volume = {23},<br />
year = {2008},<br />
pages = {519-539}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{PSM,<br />
title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br />
author = {Mortensen, S. and Klim, S.}, <br />
year = {2008},<br />
url = {http://www2.imm.dtu.dk/projects/psm/},<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Mortensen2007,<br />
title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br />
author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {34},<br />
year = {2007},<br />
pages = {623-642}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Overgaard2005,<br />
title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br />
author = {Overgaard, R. V. and Jonsson, N. and Torn&oslash;e, C. W. and Madsen, H.},<br />
journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br />
volume = {32},<br />
year = {2005},<br />
pages = {85-107}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2010,<br />
title = {Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br />
journal = {Scandinavian Journal of Statistics},<br />
volume = {37},<br />
year = {2010},<br />
pages = {67-90}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Picchini2011,<br />
title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br />
author = {Picchini, U. and Ditlevsen, S.},<br />
journal = {Computational Statistics and Data Analysis},<br />
volume = {55},<br />
number = {3},<br />
year = {2011},<br />
pages = {1426-1444}<br />
}<br />
</bibtex><br />
<bibtex><br />
@Article{Tornoe2005,<br />
title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br />
author = {Torn&oslash;e, C. W. and Overgaard, R. V. and Agers&oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br />
journal = {Pharmaceutical Research},<br />
volume = {22},<br />
year = {2005},<br />
pages = {1247-1258}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back<br />
|link=Hidden Markov models }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Hidden_Markov_models&diff=7293Hidden Markov models2013-06-07T13:58:57Z<p>Brocco: /* Distributions of observations */</p>
<hr />
<div><!-- Menu for the Extensions chapter --><br />
<sidebarmenu><br />
+[[Extensions]]<br />
*[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Markov chains are a useful tool for analyzing categorical longitudinal data. However, sometimes the Markov process cannot be directly observed, though some output, dependent on the<br />
(hidden) state, is visible. More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br />
HMMs can be applied in many contexts and have turned out to be particularly pertinent in several biological contexts. For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br />
<br />
Here, we will consider a parametric framework with [http://en.wikipedia.org/wiki/Markov_chain Markov chains] in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br />
<br />
<br />
<br><br />
<br />
==Mixed hidden Markov models==<br />
<br />
<br />
HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successive visited states are unknown and a set of observations is the only available information to describe the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states. An HMM is thus defined as a pair of processes $(z_j,y_j, j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a Markov chain and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br />
<br />
<br />
{{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br />
<br />
<br />
In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs.<br />
Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br />
<br />
We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:hmm1"><math><br />
\pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
For each individual $i$, $z_i$ is a Markov chain whose probability distribution is defined by<br />
<br />
<br />
<ul><br />
* the distribution $ \pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br />
<br />
{{Equation1<br />
|equation=<math> \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . </math> }}<br />
<br />
<br />
* the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br />
</ul><br />
<br />
<br />
{{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br />
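As a concrete illustration of these two ingredients (a minimal sketch, not part of the original text: the initial distribution `pi1` and the single $3 \times 3$ transition matrix `Q` below are arbitrary, and a time-homogeneous chain is assumed), a state sequence can be simulated state by state:

```python
import numpy as np

def simulate_markov_chain(pi1, Q, n, seed=None):
    """Simulate z_1, ..., z_n (states coded 0, ..., K-1) from the
    initial distribution pi1 and a fixed transition matrix Q."""
    rng = np.random.default_rng(seed)
    z = np.empty(n, dtype=int)
    z[0] = rng.choice(len(pi1), p=pi1)
    for j in range(1, n):
        # row z[j-1] of Q gives P(z_j = k | z_{j-1} = z[j-1])
        z[j] = rng.choice(Q.shape[1], p=Q[z[j - 1]])
    return z

# arbitrary 3-state example, in the spirit of the figure above
pi1 = np.array([0.5, 0.3, 0.2])
Q = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
z = simulate_markov_chain(pi1, Q, n=100, seed=0)
```

Per-time transition matrices $Q_{i,j}$ would simply replace the fixed `Q` inside the loop.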
<br />
<br />
The conditional distribution $\qcyizpsii$ depends on the model for the observations: in each state, the observation $y_{ij}$ follows a state-specific distribution. Let us look at some examples:<br />
<br />
<br />
<br><br />
=== Examples ===<br />
<br />
<br />
1. In a continuous data model, one possibility is that the residual error model randomly switches between $K$ possible residual error models, driven by a hidden Markov chain.<br />
<br />
<br />
{{Example<br />
|title=Example 1<br />
|text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &=& \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br />
y_{ij} &=& \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black). Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br />
<br />
<br />
::[[File:hmm1bis.png|link=]]<br />
<br />
}}<br />
<br />
<br />
<br />
2. In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. Such models have been used for describing the evolution of seizures in epileptic patients:<br />
<br />
<br />
{{Example<br />
|title=Example 2<br />
|text= Instead of assuming a single Poisson distribution for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
y_{ij} &\sim& {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br />
y_{ij} &\sim& {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
:: [[File:hmm2bis.png|link=]]<br />
<br />
}}<br />
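This 2-state Poisson mixed-HMM can be simulated for one individual directly from its definition. Below is a minimal sketch with assumed values: the intensities `lam`, the initial distribution and the transition matrix are illustrative choices, not taken from the text.

```python
import numpy as np

def simulate_poisson_hmm(pi1, Q, lam, n, seed=None):
    """Simulate one individual's hidden states z_j and counts
    y_j ~ Poisson(lam[z_j]) for a 2-state Markov chain."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam)
    z = np.empty(n, dtype=int)
    z[0] = rng.choice(len(pi1), p=pi1)
    for j in range(1, n):
        z[j] = rng.choice(Q.shape[1], p=Q[z[j - 1]])
    y = rng.poisson(lam[z])   # state-dependent seizure counts
    return z, y

# state 0: low susceptibility, state 1: high susceptibility (assumed values)
z, y = simulate_poisson_hmm(pi1=[0.9, 0.1],
                            Q=np.array([[0.95, 0.05],
                                        [0.10, 0.90]]),
                            lam=[1.0, 8.0], n=200, seed=1)
```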
<br />
<br />
<br />
<br><br />
<br />
==Distributions of observations==<br />
<br />
<br />
Assuming that the $N$ individuals are independent, the joint pdf is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="eq:sdepdf"><math><br />
\pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br />
</math></div><br />
|reference=(2) }}<br />
<br />
Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires summing the joint conditional distribution $\qcyzipsii$ over all possible sequences of hidden states:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \sum_{z_i \in \mathbf{K}^{n_i} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br />
&=& \sum_{z_i \in \mathbf{K}^{n_i} } \pczipsii(z_i {{!}} \psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br />
&=& \sum_{z_i \in \mathbf{K}^{n_i} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
Though this looks complicated, it turns out that forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] provides a quick way to numerically compute it.<br />
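A minimal sketch of this forward recursion for one individual (not from the original text): it assumes a time-homogeneous transition matrix `Q` and precomputed emission probabilities `emis[j, k]` $= \pcyijzpsii(y_{i,j} | z_{i,j}=k, \psi_i)$, with arbitrary numerical values, and reduces the $K^{n_i}$-term sum to $O(n_i K^2)$ operations.

```python
import numpy as np

def forward_likelihood(pi1, Q, emis):
    """Forward recursion: p(y_1, ..., y_n) for an HMM with initial
    distribution pi1, transition matrix Q and emission probabilities
    emis[j, k] = p(y_j | z_j = k)."""
    alpha = pi1 * emis[0]               # alpha_1(k) = pi_1(k) p(y_1 | z_1 = k)
    for j in range(1, emis.shape[0]):
        # alpha_j(k) = sum_l alpha_{j-1}(l) q(l, k) p(y_j | z_j = k)
        alpha = (alpha @ Q) * emis[j]
    return alpha.sum()

# arbitrary 2-state, 3-observation example
pi1 = np.array([0.6, 0.4])
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emis = np.array([[0.5, 0.1],
                 [0.2, 0.9],
                 [0.3, 0.3]])
lik = forward_likelihood(pi1, Q, emis)   # equals the sum over all 2**3 state paths
```

In practice the recursion is run with scaling or in the log domain to avoid numerical underflow for long series.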
<br />
<br />
<br />
<br><br />
<br />
== Bibliography==<br />
<br />
<br />
<bibtex><br />
@article{Albert1991,<br />
title = "A two state Markov mixture model for a time series of epileptic seizure counts",<br />
author = "Albert, P. S.",<br />
journal = "Biometrics",<br />
volume = "47",<br />
year = "1991",<br />
pages = "1371-1381"}<br />
</bibtex><br />
<bibtex><br />
@article{Altman2007,<br />
title = "Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting",<br />
author = "Altman, R. M.",<br />
journal = "Journal of the American Statistical Association",<br />
volume = "102",<br />
year = "2007",<br />
pages = "201-210"}<br />
</bibtex><br />
<bibtex><br />
@article{Anisimov2007,<br />
title = "Analysis of responses in migraine modelling using hidden Markov models",<br />
author = "Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.",<br />
journal = "Statistics in Medicine",<br />
volume = "26",<br />
year = "2007",<br />
pages = "4163-4178"}<br />
</bibtex><br />
<bibtex><br />
@book{Cappe2005,<br />
author = "Capp&eacute;, O. and Moulines, E. and Ryd&eacute;n, T.",<br />
title = "Inference in hidden Markov models",<br />
year = "2005",<br />
publisher= "Springer Series in Statistics"}<br />
</bibtex><br />
<bibtex><br />
@article{ChaubertPereira2011,<br />
title = "Markov and Semi-Markov Switching Linear Mixed Models Used to Identify<br />
Forest Tree Growth Components",<br />
author = "Chaubert-Pereira, F. and Gu&eacute;don, Y. and Lavergne, C. and Trottier, C.",<br />
journal = "Biometrics",<br />
volume = "66",<br />
year = "2011",<br />
pages = "753-762"}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012maximum,<br />
title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br />
author={Delattre, M. and Lavielle, M.},<br />
journal={Computational Statistics & Data Analysis},<br />
year={2012},<br />
publisher={Elsevier}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{delattre2012analysis,<br />
title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br />
author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br />
journal={Journal of Pharmacokinetics and Pharmacodynamics},<br />
pages={1-9},<br />
year={2012},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{Maruotti2009,<br />
title = "A semiparametric approach to hidden Markov models under longitudinal<br />
observations",<br />
author = "Maruotti, A. and Ryd&eacute;n, T.",<br />
journal = "Statistics and Computing",<br />
volume = "19",<br />
year = "2009",<br />
pages = "381-393"}<br />
</bibtex><br />
<bibtex><br />
@article{Rabiner1989,<br />
title = "A tutorial on Hidden Markov Models and selected applications in speech recognition",<br />
author = "Rabiner, L. R.",<br />
journal = "Proceedings of the IEEE",<br />
volume = "77",<br />
year = "1989",<br />
pages = "257-286"}<br />
</bibtex><br />
<bibtex><br />
@article{Rijmen2008,<br />
title = "Qualitative longitudinal analysis of symptoms in patients with primary<br />
and metastatic brain tumours",<br />
author = "Rijmen, F. and Ip, E. H. and Rapp, S. and Shaw, E. G.",<br />
journal = "Journal of the Royal Statistical Society - Series A.",<br />
volume = "171, Part 3",<br />
year = "2008",<br />
pages = "739-753"}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack= Mixture models<br />
|linkNext= Stochastic differential equations based models }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Models_for_time-to-event_data&diff=7291Models for time-to-event data2013-06-07T13:48:49Z<p>Brocco: </p>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Here, observations are the "times at which events occur". An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, metro strike).<br />
<br />
<br><br />
==Single event==<br />
<br />
<br />
To begin with, we will consider a one-off event.<br />
Depending on the application, the length of time to this event may be called the ''survival'' time (until death), ''failure'' time (until hardware fails), etc. To be general, we can just say ''event'' time.<br />
<br />
The random variable representing the event time for subject $i$ is typically written $T_i$. Several situations are then possible for defining the observations:<br />
<br />
<br />
<ul><br />
* The event time is exactly observed.<br />
<br />
<br />
::[[File:survival1.png|link=]]<br />
<br />
<br />
: Then, the observation for individual $i$ is $y_i = t_i$, where $t_i$ is a realization of the random variable $T_i$.<br />
<br><br />
<br />
* We may know the event has happened in an interval $I_i$ but not know the exact time $t_i$. This is ''interval censoring''. For example, at a routine check-up, cancer recurrence may be detected, and we only know that it has occurred at some point in time since the last check-up.<br />
<br />
<br />
::[[File:survival3.png|link=]]<br />
<br />
<br />
: The observation for individual $i$ is the event: $y_i = $ "$a_i < t_i \leq b_i$".<br />
<br><br />
<br />
* If we assume that the trial ends at time $\tstop$, then the event may happen after the end of the trial period. This is ''right censoring''.<br />
<br />
<br />
::[[File:survival2.png|link=]]<br />
<br />
<br />
: There are several variations of this for defining what the observations are:<br />
<br><br />
<br />
* If events (before $\tstop$) are exactly observed, then for $i=1,2,\ldots, N$,<br />
<br />
{{Equation1|<br />
equation=<math><br />
y_i = \left\{<br />
\begin{array}{ll}<br />
t_i & {\rm if \quad} t_i \leq \tstop \\<br />
{\rm t_i > \tstop \quad} & {\rm otherwise. \quad}<br />
\end{array} \right.<br />
</math>}}<br />
</ul><br />
<br />
<br />
{{ExampleWithText&Table<br />
|title1=Example:<br />
|title2=<br />
|equation=<br />
Assume that a trial starts at $\tstart=0$ and ends at $\tstop=5$, and that we obtain the following observations from 4 individuals: <br />
<br />
$y_1 = 3.2$ <br />
<br />
$y_2=$ "$t_2>5$"<br />
<br />
$y_3= 2.7$ <br />
<br />
$y_4 =$ "$t_4>5$"<br />
<br />
<br />
These observations can be stored in a data file as shown in the table on the right.<br />
<br />
Here, "event=0" at time $t$ means that the event happened after $t$ while "event=1" means that the event happened at time $t$. <br />
<br />
The lines with $t=0$ are used to state the trial start time $\tstart=0$.<br />
<br />
|table=<br />
{{{!}} class="wikitable" align="center" style="width: 75%"<br />
!{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 3.2 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}3 {{!}}{{!}} 2.7 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br />
{{!}}} <br />
}}<br />
<br />
<br />
<ul><br />
* If events before $\tstop$ are interval censored, then for $i=1,2,\ldots, N$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_i = \left\{<br />
\begin{array}{ll}<br />
{\rm a_i < t_i \quad \leq \quad b_i} & {\rm if \quad} t_i\leq \tstop \\<br />
{\rm t_i > \tstop \quad} & {\rm otherwise.}<br />
\end{array}<br />
\right.<br />
</math> }}<br />
</ul><br />
<br />
<br />
{{ExampleWithText&Table<br />
|title1=Example:<br />
|title2=<br />
|equation=<br />
Assume that we have censoring intervals of length 1: <br />
<br />
<br />
$(0,1],(1,2],\ldots,(4,5]$.<br />
<br />
<br />
For the same four individuals as the previous example, we now have the following observations: <br />
<br />
<br />
$y_1=$ "$3 < t_1 \leq 4$", <br />
<br />
$y_2=$ "$t_2>5$", <br />
<br />
$y_3=$ "$2< t_3 \leq 3$", <br />
<br />
$y_4=$ "$t_4>5$". <br />
<br />
<br />
These observations can be stored in a data file as shown in the table on the right.<br />
<br />
Here "event=0" at time $t$ means that the event happened after $t$ while "event=1" means that the event happened before time $t$.<br />
|table=<br />
{{{!}} class="wikitable" align="center" style="width: 75%"<br />
!{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 4 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}3 {{!}}{{!}} 2 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}3 {{!}}{{!}} 3 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br />
{{!}}}<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Probability distributions == <br />
<br />
<br />
Several functions play key roles in time-to-event analysis: the survival function, the hazard function and the cumulative hazard function.<br />
We are still working under a population approach, so these functions, detailed below, are individual functions: each subject has its own. As we are using parametric models, these functions depend on the individual parameters $(\psi_i)$.<br />
<br />
<br />
<ul><br />
* The '''survival function''' $S(t; \psi_i)$ gives the probability that the event happens to individual $i$ after time $t>t_{start}$:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
S(t; \psi_i) \ \ \eqdef \ \ \prob{T_i>t ; \psi_i} .<br />
</math> }}<br />
<br />
<br />
<br />
* The '''hazard function''' $\hazard(t;\psi_i)$ is defined for individual $i$ as the instantaneous rate of the event at time $t$, given that the event has not already occurred:<br />
<br />
{{Equation1<br />
|equation=<math> <br />
\hazard(t;\psi_i) \ \ \eqdef \ \ \lim_{dt\to 0} \displaystyle{\frac{S(t;\psi_i) - S(t + dt;\psi_i)}{ S(t;\psi_i) \, dt} }. <br />
</math> }}<br />
<br />
: This is equivalent to: <br />
<br />
{{Equation1<br />
|equation=<div id="HazardSurvival" ><math> <br />
\hazard(t;\psi_i) \ \ = \ \ -\displaystyle{ \frac{d}{dt} } \log{S(t;\psi_i)}. <br />
</math></div><br />
|reference=(1)<br />
}} <br />
<br />
<br />
* Another useful quantity is the '''cumulative hazard function''' $\cumhaz(a,b;\psi_i)$, defined for individual $i$ as:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\cumhaz(a,b;\psi_i) \ \ \eqdef \ \ \displaystyle{\int_a^b \hazard(t;\psi_i) \, dt }.<br />
</math>}}<br />
<br />
: Note that [[#HazardSurvival|(1)]] implies that:<br />
<br />
{{Equation1<br />
|equation=<math><br />
S(t;\psi_i) \ \ = \ \ e^{-\cumhaz(t_{start},t;\psi_i)}.<br />
</math> }}<br />
</ul><br />
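These relations can be checked numerically. The sketch below (an illustrative choice, not from the text) assumes a Weibull hazard $h(t;\psi_i) = (k/\lambda)(t/\lambda)^{k-1}$, for which $H(0,t) = (t/\lambda)^k$, and compares $e^{-H}$ computed by numerical integration of the hazard with the exact survival function:

```python
import numpy as np

# assumed Weibull hazard: h(t) = (k/lam) * (t/lam)**(k-1), so H(0,t) = (t/lam)**k
k, lam = 1.5, 4.0

def hazard(t):
    return (k / lam) * (t / lam) ** (k - 1)

t = 3.0
grid = np.linspace(0.0, t, 200_001)
h = hazard(grid)
# trapezoidal approximation of the cumulative hazard H(0, t)
H_num = np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(grid))
S_num = np.exp(-H_num)              # S(t) = exp(-H(0, t))
S_exact = np.exp(-(t / lam) ** k)   # closed form for the Weibull model
```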
<br />
<br />
Equation [[#HazardSurvival|(1)]] shows that the hazard function $\hazard(t;\psi_i)$ characterizes the problem, because knowing it is the same as knowing the survival function $S(t;\psi_i)$. The probability distribution of survival data is therefore completely defined by the hazard function.<br />
Let $\qcyipsii$ be the conditional distribution of the observation $y_i$ given the vector of individual parameters $\psi_i$. Its pdf can be easily computed for the various censoring situations discussed above:<br />
<br />
<br />
<ol><br />
<li>If the event is exactly observed with $y_i=t_i$, the density is the derivative of the cumulative distribution function, i.e., the derivative of $1 - S(t_i;\psi_i)$:</li><br />
<br />
{{Equation1<br />
|equation=<math><br />
\begin{eqnarray}\pcyipsii(y_i {{!}} \psi_i) &=& \frac{d}{dt_i}\left(1 - e^{-\cumhaz(t_{start},t_i;\psi_i)}\right)\\<br />
%&=& \left(\frac{d}{dt_i} \int_{t_{start} }^{t_i} \hazard(u;\psi_i) \, du \right) e^{-\cumhaz(t_{start},t_i;\psi_i)}\\<br />
&=&\hazard(t_i;\psi_i)e^{-\cumhaz(t_{start},t_i;\psi_i)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li>If the event is interval-censored with $y_i=\,$ "$a_i<t_i\leq b_i$":</li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \prob{T_i \in (a_i,b_i]\,{{!}} \,\psi_i} \\<br />
%&=& \prob{T_i \leq b_i {{!}} \psi_i} - \prob{T_i \leq a_i {{!}} \psi_i} \\<br />
%&=& (1-S( b_i ; \psi_i)) - (1-S( a_i ; \psi_i)) \\<br />
&=& e^{-\cumhaz(t_{start},a_i;\psi_i)} - e^{-\cumhaz(t_{start},b_i;\psi_i)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li>If the event is right-censored with $y_i= \,$ "$t_i>t_{stop}$":</li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcyipsii(y_i {{!}} \psi_i) &=& \prob{T_i > t_{stop} {{!}} \psi_i} \\<br />
%&=& S( t_{stop} ; \psi_i) \\<br />
&=& e^{-\cumhaz(t_{start},t_{stop};\psi_i)} .<br />
\end{eqnarray}</math> }}<br />
</ol><br />
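For the simplest case of a constant hazard $\hazard(t;\psi_i) = \lambda$ (the exponential model, used here purely as an illustration), $\cumhaz(a,b;\psi_i) = \lambda(b-a)$ and the three densities above take closed forms, sketched below with $t_{start}=0$:

```python
import math

lam = 0.3   # assumed constant hazard

def pdf_exact_time(t, t_start=0.0):
    """Case 1: exactly observed event, h(t) * exp(-H(t_start, t))."""
    return lam * math.exp(-lam * (t - t_start))

def pdf_interval_censored(a, b, t_start=0.0):
    """Case 2: event in (a, b], exp(-H(t_start, a)) - exp(-H(t_start, b))."""
    return math.exp(-lam * (a - t_start)) - math.exp(-lam * (b - t_start))

def pdf_right_censored(t_stop, t_start=0.0):
    """Case 3: event after t_stop, exp(-H(t_start, t_stop))."""
    return math.exp(-lam * (t_stop - t_start))

# sanity check: the event is either in (0, t_stop] or after t_stop
p_total = pdf_interval_censored(0.0, 5.0) + pdf_right_censored(5.0)
```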
<br />
<br />
<br><br><br />
<br />
==Repeated events==<br />
<br />
<br />
<br />
Sometimes, an event can potentially happen again and again, e.g., epileptic seizures, heart attacks, etc.<br />
For any given hazard function $\hazard$, the survival function $S$ for individual $i$ now represents survival since the previous event at $t_{i,j-1}$, written here in terms of the cumulative hazard from $t_{i,j-1}$ to $t_{i,j}$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
S(t_{i,j} {{!}} t_{i,j-1};\psi_i) &=& \prob{T_{i,j} > t_{i,j}\, {{!}} \,T_{i,j-1} = t_{i,j-1};\psi_i} \\<br />
&=& e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \\<br />
&=& \exp\left({-\int_{t_{i,j-1} }^{t_{i,j} } \hazard(t;\psi_i) \, dt}\right) .<br />
\end{eqnarray}</math> }}<br />
<br />
<!--%In the most simple case, $y_i$ is a vector of known event times: $y_i = (t_{i1},t_{i2},\ldots,t_{i\,n_i}).$ --><br />
<br />
<br />
<br><br />
==Censoring and probability distributions==<br />
<br />
<br />
Taking into account censoring for repeated events is slightly more complicated than for one-off events.<br />
First, let us assume that a trial starts at time $t_{start}$ and ends at time $t_{stop}$. Let $(T_{i1}, T_{i2}, \ldots )$ be random event times after $t_{start}$. Then, we can distinguish between the two following situations:<br />
<br />
<br />
<br />
<ul><br />
1. ''Exactly observed events:'' A sequence of $n_i$ event times is precisely observed before $\tstop$, i.e., $y_i = (t_{i,1},t_{i,2},\ldots,t_{i,n_i})$ with $t_{i,n_i+1}>\tstop$. <br />
<br />
: The conditional pdf of $y_i$ is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="repeatcensor" ><math> <br />
\pcyipsii(y_i {{!}} \psi_i) = \left(\prod_{j=1}^{n_i}\hazard(t_{i,j};\psi_i)e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \right)e^{-\cumhaz(t_{i,n_i},\tstop;\psi_i)} ,<br />
</math></div><br />
|reference=(1) }}<br />
<br />
: where $t_{i,0}=\tstart$.<br />
</ul><br />
<br />
{{ExampleWith2Tables<br />
|title1=Example<br />
|title2=<br />
|text=<br />
Suppose that for individual $i=1$ we know there were 8 events but only 7 of them occurred before $\tstop$. Here is a graphic showing the events that were exactly observed:<br />
<br />
<br />
::[[File:survival4.png|link=]]<br />
<br />
<br />
This data is then stored in the table on the left below. The final row has "EVENT = 0" at time $\tstop = 18$, indicating that the 8th event had not yet occurred by the end of the observation period. In the table on the right, we show the contribution of each observation to the conditional pdf of $y_1$. Indeed, equation [[#repeatcensor|(1)]] means that the pdf of $y_1=(y_{1,1}, \ldots, y_{1,8})$ is the product of the conditional pdfs given in the right table.<br />
<br />
<br />
|table1=<br />
{{{!}} class="wikitable" align="center" style="width:120%; margin-left:10%;margin-right:10%"<br />
!{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 1.4 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 3.5 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 4.4 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 5.6 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 9.7 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 11.4 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 15.8 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 0 <br />
{{!}}}<br />
<br />
|table2 =<br />
{{{!}} class="wikitable" align="center" style="width:200%; margin-right:10%; margin-left:10%"<br />
!{{!}} pdf <br />
{{!}}-<br />
{{!}} 1 <br />
{{!}}-<br />
{{!}} $\hazard(1.4;\psi_1)e^{-\cumhaz(0,1.4;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(3.5;\psi_1)e^{-\cumhaz(1.4,3.5;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(4.4;\psi_1)e^{-\cumhaz(3.5,4.4;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(5.6;\psi_1)e^{-\cumhaz(4.4,5.6;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(9.7;\psi_1)e^{-\cumhaz(5.6,9.7;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(11.4;\psi_1)e^{-\cumhaz(9.7,11.4;\psi_1)}$<br />
{{!}}-<br />
{{!}} $\hazard(15.8;\psi_1)e^{-\cumhaz(11.4,15.8;\psi_1)}$<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(15.8,18;\psi_1)}$<br />
{{!}}}<br />
}}<br />
<br />
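As a sanity check of [[#repeatcensor|(1)]]: for a constant hazard $\lambda$ the exponential terms telescope, giving $\log \pcyipsii(y_i | \psi_i) = n_i \log\lambda - \lambda(\tstop-\tstart)$. A minimal sketch (hypothetical function names; the constant hazard is an assumption for illustration), using the event times of the example:<br />

```python
import math

def loglik_repeated_exact(hazard, cumhaz, times, t_start, t_stop):
    """Log of equation (1): sum of log h(t_j) - H(t_{j-1}, t_j) over the
    observed events, plus the final survival term -H(t_{n_i}, t_stop)."""
    ll = 0.0
    prev = t_start
    for t in times:
        ll += math.log(hazard(t)) - cumhaz(prev, t)
        prev = t
    ll -= cumhaz(prev, t_stop)  # no event between the last event and t_stop
    return ll

# Event times from the example table, with a constant hazard lam
lam = 0.4
events = [1.4, 3.5, 4.4, 5.6, 9.7, 11.4, 15.8]
ll = loglik_repeated_exact(lambda t: lam, lambda a, b: lam * (b - a), events, 0.0, 18.0)
```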
<br />
<ul><br />
2. ''Interval-censored events:'' Let $(b_{0}, b_1], (b_{1}, b_2], \ldots , (b_{K-1}, b_K]$ be a sequence of successive intervals with $\tstart=b_0<b_1<b_2 < \ldots <b_K = \tstop$. We do not know the exact event times, but a sequence $(m_{ik}; \, 1 \leq k \leq K)$ is observed, where $m_{ik}$ is the number of events that occurred for individual $i$ in interval $(b_{k-1}, b_k]$.<br />
<br />
: We can show that the conditional pdf of $y_i$ is given by:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="pdf_mult_int" ><math><br />
\pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K} e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \displaystyle{\frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} } .<br />
</math></div><br />
|reference=(2) }}<br />
<br />
: In other words, the number of events in interval $(b_{k-1}, b_k]$ for individual $i$ is Poisson-distributed with mean $\cumhaz(b_{k-1}, b_k;\psi_i)$: the events follow a (possibly non-homogeneous) Poisson process with intensity $\hazard(t;\psi_i)$.<br />
<br />
<br />
{{ExampleWith2Tables<br />
|title1=Example<br />
|title2=<br />
<br />
|text= Here is a graphic that shows an example of the interval boundaries and the number of events that occurred in each interval for individual $i=1$.<br />
<br />
<br />
::[[File:survival5.png|link=]]<br />
<br />
<br />
The table on the left below shows the same data. Using [[#pdf_mult_int|(2)]] we see that the conditional pdf of $y_1=(y_{1,1}, \ldots, y_{1,6})$ is the product of the conditional pdfs given in the table on the right.<br />
<br />
<br />
|table1=<br />
{{{!}} class="wikitable" align="center" style="width:120%; margin-left:10%;margin-right:10%"<br />
!{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 1 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 6 {{!}}{{!}} 3 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 9 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 12 {{!}}{{!}} 2 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 15 {{!}}{{!}} 0 <br />
{{!}}-<br />
{{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 1 <br />
{{!}}}<br />
<br />
|table2=<br />
{{{!}} class="wikitable" align="center" style="width:200%; margin-right:10%; margin-left:10% "<br />
!{{!}} pdf <br />
{{!}}-<br />
{{!}} 1 <br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(0,3;\psi_1)}\cumhaz(0,3;\psi_1) $<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(3,6;\psi_1)} {\cumhaz^{3}(3,6;\psi_1)}/{3!} $<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(6,9;\psi_1)}$<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(9,12;\psi_1)} {\cumhaz^{2}(9,12;\psi_1)}/{2!} $<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(12,15;\psi_1)}$<br />
{{!}}-<br />
{{!}} $e^{-\cumhaz(15,18;\psi_1)}\cumhaz(15,18;\psi_1) $<br />
{{!}}}<br />
}}<br />
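The example can be reproduced with a short sketch of [[#pdf_mult_int|(2)]] (hypothetical code; the constant hazard, so that each 3-unit interval has mean $3\lambda$, is an assumption for illustration):<br />

```python
import math

def loglik_interval_counts(cumhaz, bounds, counts):
    """Log of equation (2): independent Poisson counts per interval,
    with mean H(b_{k-1}, b_k) in interval (b_{k-1}, b_k]."""
    ll = 0.0
    for a, b, m in zip(bounds[:-1], bounds[1:], counts):
        mean = cumhaz(a, b)
        ll += m * math.log(mean) - mean - math.log(math.factorial(m))
    return ll

# Interval boundaries and event counts m_{1k} from the example table
lam = 0.2
bounds = [0, 3, 6, 9, 12, 15, 18]
counts = [1, 3, 0, 2, 0, 1]
ll = loglik_interval_counts(lambda a, b: lam * (b - a), bounds, counts)
```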
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= If the total number $n_i$ of (observed and unobserved) events for individual $i$ is known to be finite, then formula [[#pdf_mult_int|(2)]] is slightly modified when the last event occurs before $\tstop$ ($t_{i,n_i}<\tstop$).<br />
Assume that the last event for individual $i$ occurs in the $K_i$-th interval. Let $s_{i} = \sum_{k=1}^{K_i-1} m_{ik}$ be the number of events that occurred before this interval. Then, we can show that<br />
<br />
{{EquationWithRef_Special<br />
|equation=<div id="pdf_mult_int2"><math><br />
\pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K_i-1} \left( \displaystyle{ \frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} }e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \right)<br />
\!\times \!\left(1 - \sum_{\ell=0}^{n_i-s_{i} } \displaystyle{ \frac{\cumhaz^{\ell}(b_{K_i -1},b_{K_i};\psi_i)}{\ell!} } e^{-\cumhaz(b_{K_i -1},b_{K_i};\psi_i)}\right) . </math></div><br />
|reference=(3) }}<br />
}}<br />
<br />
<br />
<br><br />
<br />
== Examples of hazard functions==<br />
<br />
<br />
<br />
<ul><br />
* ''Constant hazard model:'' <br />
: The most simple case is that of a constant hazard function: $\hazard(t;\psi_i) = \hazard_i \in \Rset$. Here, $\psi_i=\hazard_i$. <br />
<br><br />
<br />
<br />
* ''Proportional hazards model:''<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hazard(t;\psi_i) = \hazard_0(t;\alpha_i) \, e^{ \langle \beta , c_i \rangle}.<br />
</math>}}<br />
<br />
: Here, the hazard is decomposed into two terms: a baseline function $\hazard_0$ of $t$, and an "individual" term that is a function of individual covariates $c_i$. The notation $ \langle \beta , c_i \rangle$ denotes a scalar product, i.e., a linear combination of the components of $c_i$. In a proportional hazards model, a unit increase in the value of a covariate has a multiplicative effect on the hazard.<br />
<br />
: In the usual proportional hazards model, $\alpha_i$ is a population constant ($\alpha_i=\alpha$). Then, $\psi_i$ can be decomposed into a set of population parameters $\alpha$ and an individual term $ \langle \beta , c_i \rangle$. A straightforward extension consists of assuming that $\alpha_i$ is also an individual parameter.<br />
<br><br />
<br />
<br />
* ''Extended proportional hazards model:''<br />
<br />
: Another possible extension assumes that the hazard function is a (possibly nonlinear) function $u$ of a regression variable $x_i$:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hazard(t;\psi_i) = \hazard_0(t;\alpha_{i}) \, e^{ u(\beta_i,x_i(t))} .<br />
</math> }}<br />
<br />
:Consider for example that $x_i(t)$ is the plasma concentration of a drug at time $t$ for individual $i$. Then, $u(\beta_i,x_i(t))$ is the term that represents (i.e., models) the effect of the drug on the hazard, while $\hazard_0(t;\alpha_i)$ might model the effect of disease progression on the hazard.<br />
<!--%We consider here parametric functions that possibly depend on individual parameters.--><br />
<br />
: In this example, $x_i(t)$ is the "true" plasma concentration for subject $i$ at time $t$, a continuous function of time. In practice, however, the concentration is only measured at discrete times, so a longitudinal model for it is needed to provide a value at every $t$.<br />
:Therefore, in practice we need to develop a ''joint model'' in order to simultaneously model time-to-events data and longitudinal data. Such an approach is introduced in the [[Joint models]] section.<br />
<br><br />
<br />
<br />
* ''Accelerated failure time (AFT) model:''<br />
<br />
:Unlike proportional hazards models, the AFT model supposes that a change in a covariate has a multiplicative effect not on the hazard but on the ''predicted event time''. This can be written as:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\log(T_i) = \langle \psi_i , c_i \rangle + \xi_i<br />
</math><br />
}}<br />
<br />
: where $\xi_i$ is a zero-mean random variable, e.g., normally distributed with mean zero. Usually, the parameters are fixed effects: $\psi_i=\psi$ for each subject $i$.<br />
: To calculate the hazard function, let us denote by $p_{\xi_i}$ the pdf and $F_{\xi_i}$ the cdf of $\xi_i$, and to simplify, let $\mu_i = \langle \psi_i , c_i \rangle$ be the mean of $\log(T_i)$. We begin by calculating the survival function:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
S(t;\psi_i) &=& \prob{\log{T_i} > \log{t} ; \psi_i} \\<br />
&=& \int_{\log{t}-\mu_i}^{\infty} p_{\xi_i}(u; \psi_i) \, du \\<br />
&=& 1 - F_{\xi_i}(\log{t}-\mu_i ; \psi_i) .<br />
\end{eqnarray}</math> }}<br />
<br />
:Applying [[#HazardSurvival|(1)]] then gives the hazard function:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hazard(t;\psi_i) = \displaystyle{ \frac{p_{\xi_i}(\log{t} - \mu_i; \psi_i)}{t\left(1- F_{\xi_i}(\log{t} - \mu_i; \psi_i)\right)} } .<br />
</math> }}<br />
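To make this concrete, suppose $\xi_i \sim {\cal N}(0,\sigma^2)$, so that $T_i$ is log-normal (a hypothetical choice; all function names below are assumptions for illustration). The hazard formula above can then be checked against a finite-difference approximation of $-\frac{d}{dt}\log S(t;\psi_i)$:<br />

```python
import math

def norm_pdf(x, s):
    # density of N(0, s^2)
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

def norm_cdf(x, s):
    # cdf of N(0, s^2)
    return 0.5 * (1 + math.erf(x / (s * math.sqrt(2))))

def aft_survival(t, mu, s):
    # S(t) = 1 - F_xi(log t - mu)
    return 1 - norm_cdf(math.log(t) - mu, s)

def aft_hazard(t, mu, s):
    # h(t) = p_xi(log t - mu) / (t * (1 - F_xi(log t - mu)))
    return norm_pdf(math.log(t) - mu, s) / (t * aft_survival(t, mu, s))

mu, s, t = 1.0, 0.5, 2.0
h = aft_hazard(t, mu, s)
# finite-difference check: h(t) should equal -(d/dt) log S(t)
eps = 1e-6
h_fd = -(math.log(aft_survival(t + eps, mu, s))
         - math.log(aft_survival(t - eps, mu, s))) / (2 * eps)
```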
<br />
<br><br><br />
-------<br />
<br><br><br />
<br />
{{Summary <br />
|title=Summary<br />
|text=<br />
For a given vector of individual parameters $\psi_i$, a model for (repeated) time-to-event data is completely defined by<br />
<br />
<br />
<ol><br />
<li> the hazard function $\hazard(t ; \psi_i)$, or the survival function $S(t ; \psi_i)$ </li><br />
<br />
<li> (possibly) the interval and/or right censoring process </li><br />
<br />
<li> (possibly) the maximum number of possible events </li><br />
</ol> }}<br />
<br />
<br />
<br><br />
<br />
<!--<br />
==$\mlxtran$ for time-to-event data models==<br />
--><br />
<br />
<br><br />
<br />
==Bibliography==<br />
<br />
<bibtex><br />
@book{aalen2008,<br />
author = {Aalen, O. and Borgan, O. and Gjessing, H.},<br />
 title = {Survival and Event History Analysis},<br />
publisher = {Springer},<br />
address = {New York},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{andersen2006survival,<br />
title={Survival analysis},<br />
author={Andersen, P. K.},<br />
year={2006},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{diggle1994,<br />
author = {Diggle, P. and Kenward, M. G.},<br />
title = {Informative drop-out in longitudinal data analysis.},<br />
journal = {Appl. Stats},<br />
volume = {43},<br />
pages = {49-93},<br />
year = {1994}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{duchateau2008,<br />
author = {Duchateau, L. and Janssen, P.},<br />
 title = {The Frailty Model},<br />
 series = {Statistics for Biology and Health},<br />
 publisher = {Springer},<br />
 address = {New York},<br />
 year = {2008}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{fleming2011counting,<br />
title={Counting processes and survival analysis},<br />
author={Fleming, T. R. and Harrington, D. P.},<br />
volume={169},<br />
year={2011},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{huang2007,<br />
author = {Huang, X. and Liu, L.},<br />
title = {A joint frailty model for survival and gap times between recurrent events.},<br />
journal = {Biometrics},<br />
volume = {63},<br />
pages = {389-397},<br />
year = {2007}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{ibrahim2005bayesian,<br />
title={Bayesian survival analysis},<br />
author={Ibrahim, J. G. and Chen, M.-H. and Sinha, D.},<br />
year={2005},<br />
publisher={Wiley Online Library}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{kalbfleisch2011statistical,<br />
title={The statistical analysis of failure time data},<br />
author={Kalbfleisch, J. D. and Prentice, R. L.},<br />
year={2011},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{kelly2000,<br />
 author = {Kelly, P. J. and Lim, L. L.},<br />
 title = {Survival analysis for recurrent event data: an application to childhood infectious diseases},<br />
journal = {Statistics in Medicine},<br />
volume = {19},<br />
number = {1},<br />
pages = {13-33},<br />
year = {2000}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{klein2003survival,<br />
title={Survival analysis: techniques for censored and truncated data},<br />
author={Klein, J. P. and Moeschberger, M. L.},<br />
year={2003},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{klein1997,<br />
author = {Klein, J. P. and Moeschberger, M. L.},<br />
 title = {Survival Analysis: Techniques for Censored and Truncated Data},<br />
 publisher = {Springer-Verlag},<br />
 address = {New York},<br />
 year = {1997}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{kleinbaum2011survival,<br />
title={Survival analysis},<br />
author={Kleinbaum, D. G.},<br />
year={2011},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{littell2006sas,<br />
title={SAS for mixed models},<br />
author={Littell, R. C.},<br />
year={2006},<br />
publisher={SAS institute}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{miller2011survival,<br />
title={Survival analysis},<br />
author={Miller Jr, R. G.},<br />
year={2011},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{wienke2010frailty,<br />
title={Frailty models in survival analysis},<br />
author={Wienke, A.},<br />
volume={37},<br />
year={2010},<br />
publisher={Chapman & Hall}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Back&Next<br />
|linkBack=Model for categorical data<br />
|linkNext=Joint models }}</div>Brocco, https://wiki.inria.fr/wikis/popix/index.php?title=Model_for_categorical_data&diff=7290 (Model for categorical data, 2013-06-07T13:47:19Z)<p>Brocco: /* Continuous time Markov chains */</p>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Overview == <br />
<br />
Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$.<br />
Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.<br />
<br />
For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.<br />
<br />
In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="categorical1" ><math> <br />
\prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. </math></div><br />
|reference=(1) }}<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= Suppose we want to model binary data, i.e., data where $y_{ij} \in \{0,1\}$.<br />
<br />
Let $\psi_i=(\alpha_i,\beta_i)$ and let $a_1(t,\psi_i)=0$ and $a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br />
<br />
{{Equation1|equation= <math><br />
\prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br />
\ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. <br />
</math>}}<br />
}}<br />
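A minimal sketch of [[#categorical1|(1)]] (hypothetical names and parameter values): the softmax of the $a_k$ values always defines a valid probability distribution, and with $a_1=0$, $a_2=\alpha_i+\beta_i\,t$ it reduces to the logistic probabilities of the example:<br />

```python
import math

def category_probs(a_values):
    """Equation (1): softmax of the a_k(t, psi) values."""
    exps = [math.exp(a) for a in a_values]
    total = sum(exps)
    return [e / total for e in exps]

# Binary example: a_1 = 0, a_2 = alpha + beta * t (hypothetical values)
alpha, beta, t = -1.0, 0.3, 4.0
p0, p1 = category_probs([0.0, alpha + beta * t])
```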
<br />
<br />
Such parametrizations are extremely flexible and easy to interpret in simple situations.<br />
In the previous example for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br />
<br />
<br />
<br><br />
== Ordinal data ==<br />
<br />
<br />
Ordinal data further assumes that the categories are ordered, i.e., there exists an order $\prec$ such that<br />
<br />
{{Equation1|equation=<math><br />
c_1 \prec c_2 \prec \ldots \prec c_K .<br />
</math>}}<br />
<br />
We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br />
<br />
Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br />
Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br />
<br />
{{Equation1|equation=<math><br />
0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br />
</math> }}<br />
<br />
Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $<$ on $\Rset$.<br />
Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. We assume that there exist $\alpha_{i,1}$ and $\alpha_{i,2}\geq 0, \ldots , \alpha_{i,K-1}\geq 0$ such that for $k=1,2,\ldots,K-1$,<br />
<br />
{{EquationWithRef<br />
|equation=<div id="propodds_model"><math> \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{im}\right) + \beta_i \, x(t_{ij}) ,<br />
</math></div><br />
|reference=(2) }}<br />
<br />
where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$.<br />
<br />
Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in [[#propodds_model|(2)]] can also be expressed as<br />
<br />
{{Equation1|equation=<math><br />
\prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left(\sum_{m=1}^k \alpha_{im}\right) - \beta_i \, x(t_{ij})} } }.<br />
</math>}} <br />
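A small sketch (hypothetical names and parameter values) showing how category probabilities follow from the cumulative logits of [[#propodds_model|(2)]]: the nonnegative increments $\alpha_{i,2},\ldots,\alpha_{i,K-1}$ make the cumulative probabilities nondecreasing, so their differences form a valid distribution:<br />

```python
import math

def cumulative_probs(alphas, beta, x):
    """P(y <= c_k), k = 1..K, from the cumulative logits of (2):
    logit(P(y <= c_k)) = sum(alphas[:k]) + beta * x, and P(y <= c_K) = 1."""
    probs = []
    for k in range(1, len(alphas) + 1):
        logit = sum(alphas[:k]) + beta * x
        probs.append(1 / (1 + math.exp(-logit)))
    probs.append(1.0)
    return probs

def category_probs(alphas, beta, x):
    # category probabilities are successive differences of cumulative ones
    cum = cumulative_probs(alphas, beta, x)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# K = 3 categories; alpha_{i,1} unconstrained, alpha_{i,2} >= 0 (hypothetical)
p = category_probs([-0.5, 1.2], beta=0.8, x=1.0)
```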
<br />
<br />
{{Example<br />
|title=Example:<br />
|text= We give to patients a drug which is supposed to decrease the level of a given type of pain. <br />
The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model with the constraint that $\alpha_{i2}\geq 0$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &=& \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br />
\logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &=& \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br />
\prob{y_{ij} \leq 3 {{!}} \psi_i} &=& 1,<br />
\end{eqnarray}</math> }} <br />
<br />
where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. The model parameters are quite easy to explain:<br />
<br />
<br />
* $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br />
* $\beta_{i,1}<0$ (resp. $\beta_{i,1}>0$) means that the pain tends to increase (resp. decrease) over time.<br />
* $\beta_{i,2}=0$ means that the drug has no effect on pain.<br />
* $\beta_{i,2}>0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}<0$ means that pain is an adverse drug effect.<br />
}}<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
|text= Exclusive use of linear models (or generalized linear models) has no real justification today since very efficient tools are available for nonlinear models.<br />
Model [[#propodds_model|(2)]] can be easily extended to a nonlinear model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="propodds_model2"><math> \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , </math></div><br />
|reference=(3) }}<br />
<br />
where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br />
<br />
<br />
<br />
<br><br />
== Markovian dependence ==<br />
<br />
<br />
For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br />
<br />
We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is however possible to introduce dependency between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k=1,2,\ldots ,K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
\prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br />
</math> }}<br />
<br />
<br />
<br><br />
=== Discrete time Markov chains ===<br />
<br />
If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br />
<br />
<br />
<ul><br />
* the distribution $ \pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br />
<br />
{{Equation1<br />
|equation=<math> \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} </math> }}<br />
<br />
<br />
* the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
q_{i,j}^{\ell,k} &=& \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br />
\sum_{k=1}^{K}q_{i,j}^{\ell,k} &=& 1 \quad \text{ for all } \ell.<br />
\end{eqnarray}</math> }}<br />
</ul><br />
<br />
<br />
The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br />
</math> }}<br />
<br />
For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br />
<br />
<br />
::[[File:markov_1.png|link=]]<br />
<br />
<br />
Our model must therefore give, for each individual $i$, the distribution of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.<br />
<br />
The figure below shows several examples of simulated sequences coming from a model with 2 states defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\logit\left(q_{i,j}^{1,2}\right) &=& a_i+b_i \, t_j \\<br />
\logit\left(q_{i,j}^{2,1}\right) &=& c_i+d_i \, t_j \\<br />
\prob{y_{i,1}=1} &=& 0.5 ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $t_j = j$.<br />
<br />
[[File:markov_2.png|link=]]<br />
<br />
In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br />
Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small with $q^{1,2}<q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution, i.e., the left eigenvector of the transition matrix associated with the eigenvalue 1, is $\prob{y_{ij}=1}=0.611$ and $ \prob{y_{ij}=2}=0.389$.<br />
The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (center and bottom left).<br />
<br />
In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.<br />
<br />
In the third example (right), now $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and also more frequent from 2 to 1.<br />
Note that the value of $a_i$ (resp. $c_i$) can be seen as the transition probability from state 1 to 2 (resp. 2 to 1) at time $t=0$.<br />
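The numbers quoted for the first example can be verified directly (a minimal sketch; the variable names are ours, not from the source):<br />

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

# First example: constant logits (b_i = d_i = 0), with a_i = -2.5, c_i = -2.0
q12 = logistic(-2.5)   # transition probability from state 1 to state 2
q21 = logistic(-2.0)   # transition probability from state 2 to state 1

# Stationary distribution of the 2-state chain solves pi1 * q12 = pi2 * q21
pi1 = q21 / (q12 + q21)
pi2 = q12 / (q12 + q21)
```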
<br />
Different choices can be made for defining an initial distribution $\pi_{i,1}$:<br />
<br />
<br />
<ul><br />
* The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.<br />
<br><br />
<br />
* More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $ k=1,2,\ldots , K$.<br />
<br><br />
<br />
* If a transition matrix $Q_{i,1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pi_{i,1} = \pi_{i,1} Q_{i,1} .<br />
</math> }}<br />
</ul><br />
<br />
<br />
<br><br />
<br />
=== Continuous time Markov chains ===<br />
<br />
<br />
<br />
The previous situation can be extended to the case where observation times are irregular, by modeling the<br />
sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br />
<br />
{{Equation1 <br />
|equation=<math><br />
\prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br />
</math> }}<br />
<br />
The probability that no transition happens between $t$ and $t+h$ is, with the convention $\rho_{i}^{\ell,\ell}(t) = -\sum_{k \neq \ell} \rho_{i}^{\ell,k}(t)$,<br />
<br />
{{Equation1 <br />
|equation=<math><br />
\prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . <br />
</math> }}<br />
<br />
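For a 2-state chain with rates held constant over a window of length $h$ (an assumption for illustration; the function below is hypothetical), the transition probabilities over the window have a simple closed form:<br />

```python
import math

def ctmc_2state_transition(rho12, rho21, h):
    """Transition probability matrix over a window of length h for a
    2-state continuous-time Markov chain with constant rates."""
    r = rho12 + rho21
    decay = math.exp(-r * h)
    p11 = rho21 / r + (rho12 / r) * decay
    p22 = rho12 / r + (rho21 / r) * decay
    return [[p11, 1 - p11], [1 - p22, p22]]

P = ctmc_2state_transition(0.3, 0.5, 2.0)
# Staying in state 1 with no transition at all has probability exp(-rho12*h),
# which is smaller than p11 since the chain may also leave and return.
stay1 = math.exp(-0.3 * 2.0)
```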
<br />
<br><br><br />
------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary<br />
|text= <br />
A model for independent categorical data is completely defined by:<br />
<br />
<ul><br />
<li>The probability mass functions $\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$<br />
<li> (or) the cumulative probability functions $\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$ for ordinal data<br />
<li> (or) the cumulative logits $\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$ for a proportional odds model<br />
</ul><br />
<br />
<br />
A model for categorical data with Markovian dependency is completely defined by:<br />
<br />
<br />
<ol><br />
<li> the probability transitions in the case of a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]</li><br />
<br />
<li> (or) the transition rates in the case of a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]</li><br />
<br />
<li> the probability distribution of the initial states</li><br />
</ol><br />
}}<br />
<br />
<br />
<br />
<br />
<br><br />
<br />
== $\mlxtran$ for categorical data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2= $ \quad y_{ij} \in \{0, 1, 2\}$<br />
|text=<br />
<br />
|equation=<math> \begin{eqnarray}<br />
\psi_i &=& (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br />
D &=& 100 \\<br />
C(t,\psi_i) &=& \frac{D}{V_i} e^{-k_i \, t} \\[0.2cm]<br />
\prob{y_{ij}\leq 0} &=& \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br />
\prob{y_{ij}\leq 1} &=& \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br />
\prob{y_{ij}\leq 2} &=& 1<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style=" background-color:#EFEFEF; border:none;"><br />
INPUT:<br />
input = {V, k, alpha0, alpha1, gamma}<br />
<br />
EQUATION:<br />
D = 100<br />
C = D/V*exp(-k*t)<br />
p0 = alpha0 + gamma*C<br />
p1 = p0 + alpha1<br />
<br />
DEFINITION:<br />
y = {type=categorical,<br />
categories={0, 1, 2},<br />
P(y<=0)=p0,<br />
P(y<=1)=p1<br />
}<br />
</pre> }}<br />
}}<br />
<br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2= $\quad$ 2-state discrete-time Markov chain<br />
|text=<br />
<br />
|equation=<math> \begin{eqnarray}<br />
\psi_i &=& (a_i,b_i,c_i,d_i) \\[0.2cm]<br />
\logit(p_{ij}^{12}) &=& a_i+b_i \, t_{ij} \\<br />
\logit(p_{ij}^{21}) &=& c_i+d_i \, t_{ij} \\<br />
\prob{y_{i,1}=1} &=& 0.5<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style=" background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
input = {a, b, c, d}<br />
<br />
DEFINITION:<br />
Y = { type = categorical,<br />
categories = {1, 2},<br />
dependence = Markov<br />
P(Y_1=1) = 0.5<br />
logit(P(Y=2 | Y_p=1)) = a + b*t<br />
logit(P(Y=1 | Y_p=2)) = c + d*t<br />
}<br />
</pre> }}<br />
}}<br />
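The discrete-time chain of Example 2 can be simulated directly: the state at each observation time is drawn from the transition probabilities of the previous state. A short Python sketch, with arbitrary illustrative parameter values:

```python
import math
import random

# Hypothetical parameters for one individual (illustrative values only)
a, b, c, d = -1.0, 0.1, -0.5, 0.05

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_chain(times, rng):
    """Simulate y_1, ..., y_n of the 2-state chain on a grid of times."""
    states = [1 if rng.random() < 0.5 else 2]     # P(y_1 = 1) = 0.5
    for t in times[1:]:
        p12 = inv_logit(a + b * t)   # P(y_j = 2 | y_{j-1} = 1)
        p21 = inv_logit(c + d * t)   # P(y_j = 1 | y_{j-1} = 2)
        if states[-1] == 1:
            states.append(2 if rng.random() < p12 else 1)
        else:
            states.append(1 if rng.random() < p21 else 2)
    return states
```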
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 3:<br />
|title2= $\quad$ 2-state continuous-time Markov chain<br />
|text=<br />
<br />
|equation=<math> \begin{eqnarray}<br />
\psi_i &=& (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br />
q_{i}^{12}(t) &=& e^{a_i+b_i \, t} \\<br />
q_{i}^{21}(t) &=& e^{c_i+d_i \, t} \\<br />
\prob{y_{i,1}=1} &=& \pi_i<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style=" background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
input = {a, b, c, d, pi}<br />
<br />
DEFINITION:<br />
Y = { type = categorical,<br />
categories = {1, 2},<br />
dependence = Markov<br />
P(Y_1=1) = pi<br />
transitionRate(1,2) = exp(a + b*t)<br />
transitionRate(2,1) = exp(c + d*t)<br />
}<br />
</pre> }}<br />
}}<br />
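The continuous-time chain of Example 3 can be approximated by noting that, over a short interval $dt$, the probability of leaving the current state is approximately the transition rate times $dt$. A Python sketch under that assumption (the step size and parameter values are arbitrary illustrations):

```python
import math
import random

# Hypothetical parameters (illustrative only)
a, b, c, d, pi = -2.0, 0.05, -2.5, 0.0, 0.4

def simulate_ctmc(t_end, dt, rng):
    """Euler-type simulation: over a short interval dt, the probability of
    leaving state s is approximately q(s -> s') * dt."""
    state = 1 if rng.random() < pi else 2        # P(y_1 = 1) = pi
    t = 0.0
    path = [(t, state)]
    while t < t_end:
        q = math.exp(a + b * t) if state == 1 else math.exp(c + d * t)
        if rng.random() < q * dt:
            state = 3 - state                    # switch 1 <-> 2
        t += dt
        path.append((t, state))
    return path
```

The step $dt$ must be small enough that $q \, dt \ll 1$; an exact simulation would instead draw the jump times from the time-varying rates.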
<br />
== Bibliography==<br />
<br />
<bibtex><br />
@book{agresti2010analysis,<br />
title={Analysis of ordinal categorical data},<br />
author={Agresti, A.},<br />
volume={656},<br />
year={2010},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{agresti2007introduction,<br />
title={An introduction to categorical data analysis},<br />
author={Agresti, A.},<br />
volume={423},<br />
year={2007},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{bolker2009generalized,<br />
title={Generalized linear mixed models: a practical guide for ecology and evolution},<br />
author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br />
journal={Trends in ecology & evolution},<br />
volume={24},<br />
number={3},<br />
pages={127-135},<br />
year={2009},<br />
publisher={Elsevier Science}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{davidian1995,<br />
 author = {Davidian, M. and Giltinan, D. M.},<br />
 title = {Nonlinear Models for Repeated Measurement Data},<br />
 publisher = {Chapman & Hall},<br />
 address = {London},<br />
 year = {1995}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{jiang2007,<br />
 author = {Jiang, J.},<br />
 title = {Linear and Generalized Linear Mixed Models and Their Applications},<br />
 publisher = {Springer},<br />
 series = {Springer Series in Statistics},<br />
 address = {New York},<br />
 year = {2007}<br />
}<br />
<br />
</bibtex><br />
<bibtex><br />
@book{littell2006sas,<br />
title={SAS for mixed models},<br />
author={Littell, R. C.},<br />
year={2006},<br />
publisher={SAS institute}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mcculloch2011generalized,<br />
title={Generalized, Linear, and Mixed Models},<br />
author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br />
isbn={9781118209967},<br />
series={Wiley Series in Probability and Statistics},<br />
year={2011},<br />
 publisher={Wiley},<br />
 url={http://books.google.fr/books?id=kyvgyK\_sBlkC}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{molenberghs2005models,<br />
title={Models for discrete longitudinal data},<br />
author={Molenberghs, G. and Verbeke, G.},<br />
year={2005},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{powers2008statistical,<br />
title={Statistical methods for categorical data analysis},<br />
author={Powers, D. A. and Xie, Y.},<br />
year={2008},<br />
publisher={Emerald Group Publishing}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{wolfinger1993generalized,<br />
title={Generalized linear mixed models: a pseudo-likelihood approach},<br />
author={Wolfinger, R. and O'Connell, M.},<br />
journal={Journal of statistical Computation and Simulation},<br />
volume={48},<br />
number={3-4},<br />
pages={233-243},<br />
year={1993},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Models for count data<br />
|linkNext=Models for time-to-event data }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Continuous_data_models&diff=7289Continuous data models2013-06-07T13:39:51Z<p>Brocco: /* Censored data */</p>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of a viral load in $\Rset$, and an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
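These two identities are easy to check by Monte Carlo simulation. A Python sketch with an arbitrary illustrative choice of $f$ and $g$ (not the models used later in the text):

```python
import math
import random

def f(t):
    """Illustrative structural model (arbitrary choice)."""
    return 10.0 * math.exp(-0.3 * t)

def g(t):
    """Illustrative error model (arbitrary combined form)."""
    return 0.5 + 0.1 * f(t)

def simulate_y(t, n, rng):
    """Draw n replicates of y = f(t) + g(t)*eps with standardized eps,
    whose sample mean and standard deviation should approach f(t) and g(t)."""
    return [f(t) + g(t) * rng.gauss(0.0, 1.0) for _ in range(n)]
```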
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we choose the structural model $f(t) = A\exp\left(-\alpha t \right)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are at $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$, where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity line $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
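The four error models above differ only in the function $g$. Written out as a Python sketch (function names and the argument layout are illustrative; $a$ and $b$ are the individual error parameters):

```python
import math

# Each function maps the structural prediction f(t, psi) to the residual
# standard deviation g(t, psi) for one of the error models discussed above.
def g_constant(f_val, a):
    return a                                    # constant error model (2)

def g_proportional(f_val, b):
    return b * f_val                            # proportional error model (3)

def g_combined(f_val, a, b):
    return a + b * f_val                        # combined model, affine in f

def g_combined_alt(f_val, a, b):
    return math.sqrt(a**2 + b**2 * f_val**2)    # alternative combined model
```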
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}. </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for all $(i,j)$, then $t_{i,j+1}-t_{ij}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &=& \rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
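The relation $\gamma(\tau)=\rho_i^{\tau}$ can be checked by simulating a stationary AR(1) sequence with unit variance and comparing the empirical autocorrelation to $\rho^\tau$. A Python sketch (the sequence length and $\rho$ are arbitrary):

```python
import math
import random

def simulate_ar1(n, rho, rng):
    """Stationary AR(1) residuals with unit marginal variance, so that
    corr(eps_j, eps_{j+tau}) = rho**tau."""
    eps = [rng.gauss(0.0, 1.0)]
    s = math.sqrt(1.0 - rho**2)   # keeps the marginal variance equal to 1
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + s * rng.gauss(0.0, 1.0))
    return eps

def empirical_autocorr(x, tau):
    """Sample autocorrelation of x at lag tau."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[j] - mean) * (x[j + tau] - mean) for j in range(n - tau)) / (n - tau)
    return cov / var
```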
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
<br />
<br />
<br />
<br><br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's t-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized Student's $t$-distribution.<br />
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors ($\teps_{ij}$).<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
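In practice, (6) is evaluated on the log scale for numerical stability. A Python sketch of the conditional log-likelihood for one individual, with $f$ and $g$ passed in as user-supplied functions (the model used below is only an illustration):

```python
import math

def log_likelihood(y, t, psi, f, g):
    """Log of equation (6) for one individual: sum over observations of
    log N(y_ij; f(t_ij, psi), g(t_ij, psi)^2)."""
    ll = 0.0
    for yij, tij in zip(y, t):
        fij = f(tij, psi)
        gij = g(tij, psi)
        ll += (-0.5 * math.log(2.0 * math.pi) - math.log(gij)
               - 0.5 * ((yij - fij) / gij) ** 2)
    return ll
```

For example, with $f(t,\psi)=A e^{-\alpha t}$ and a constant error model $g=a$, one would call `log_likelihood(y, t, (A, alpha, a), f, g)`.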
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
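A quick way to see what the log transformation does is to simulate from $y = f\, e^{g \teps}$ and check that $f$ is the median, but not the mean, of the simulated observations, which also stay positive. A Python sketch with arbitrary illustrative values of $f$ and $g$:

```python
import math
import random

def simulate_log_transformed(f_val, g_val, n, rng):
    """Draw n observations y = f * exp(g * eps), i.e., the log-transformed
    model of Example 1 with standardized Gaussian residuals."""
    return [f_val * math.exp(g_val * rng.gauss(0.0, 1.0)) for _ in range(n)]
```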
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||log-VL || cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
<br />
<br />
* '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png|link=]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||CONC. || llimit || cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png|link=]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||VOLUME || CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other two types as special cases: take $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du . </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
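With Gaussian residuals, the censored contribution $\prob{y_{ij} \in I_{ij} | \psi_i}$ reduces to a normal cumulative distribution function. A Python sketch for the left-censoring case (the helper names are illustrative; interval and right censoring follow the same pattern by integrating the density over $I_{ij}$):

```python
import math

def normal_cdf(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def contribution(y_obs, censored, L, f_val, g_val):
    """Likelihood contribution of one observation under left censoring at L,
    assuming Gaussian residuals: the density if observed, P(y < L | psi)
    if censored."""
    if censored:
        return normal_cdf((L - f_val) / g_val)
    z = (y_obs - f_val) / g_val
    return math.exp(-0.5 * z * z) / (g_val * math.sqrt(2.0 * math.pi))
```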
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
- the observation times $(t_{ij})$<br />
<br />
- possibly some additional regression variables $(x_{ij})$<br />
<br />
- possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
<br />
- possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
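The same biexponential model can be written in a few lines of Python, which is convenient for simulating data sets like the ones shown earlier (the parameter values used in any simulation are arbitrary illustrations, not fitted values):

```python
import math
import random

def f(t, A, B, alpha, beta):
    """Biexponential structural model of Example 1."""
    return A * math.exp(-alpha * t) + B * math.exp(-beta * t)

def simulate(times, A, B, alpha, beta, a, rng):
    """Constant error model: y_ij = f(t_ij, psi_i) + a * eps_ij."""
    return [f(t, A, B, alpha, beta) + a * rng.gauss(0.0, 1.0) for t in times]
```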
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
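The ODE system of Example 2 can be integrated numerically, starting from the steady state used for $t<t_0$. A forward-Euler Python sketch (a fixed small step is assumed; parameter values passed to it are illustrative, and a stiff regime would call for a smaller step or a higher-order scheme):

```python
def simulate_viral_load(t_end, dt, delta, c, beta, p, s, d, nu, rho):
    """Forward-Euler integration of the target-cell model of Example 2,
    started from the pre-treatment steady state used for t < t0."""
    N = delta * c / (beta * p)    # non-infected target cells
    I = (s - d * N) / delta       # infected cells
    V = p * I / c                 # viral load
    t = 0.0
    while t < t_end:
        dN = s - beta * (1 - nu) * N * V - d * N
        dI = beta * (1 - nu) * N * V - delta * I
        dV = p * (1 - rho) * I - c * V
        N, I, V = N + dt * dN, I + dt * dI, V + dt * dV
        t += dt
    return N, I, V
```

With $\nu=\rho=0$ the system stays at its steady state, while $\rho>0$ makes the viral load fall, which matches the role of the treatment-effect parameters.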
<br />
<br><br><br />
<br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@book{davidian1995,<br />
 author = {Davidian, M. and Giltinan, D. M.},<br />
 title = {Nonlinear Models for Repeated Measurement Data},<br />
 publisher = {Chapman & Hall},<br />
 address = {London},<br />
 year = {1995}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{demidenko2005mixed,<br />
title={Mixed Models: Theory and Applications},<br />
author={Demidenko, E.},<br />
isbn={9780471726135},<br />
series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&redir_esc=y}, <br />
year={2005}, publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{fitzmaurice2008longitudinal,<br />
title={Longitudinal Data Analysis},<br />
author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br />
isbn={9781420011579},<br />
lccn={2008020681},<br />
series={Chapman & Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br />
year={2008},publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{jiang2007,<br />
author = {Jiang, J.},<br />
title = {Linear and Generalized Linear Mixed Models and Their Applications},<br />
publisher = {Springer Series in Statistics},<br />
year = {2007},<br />
address = {New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{laird1982,<br />
author = {Laird, N.M. and Ware, J.H.},<br />
title = {Random-Effects Models for Longitudinal Data},<br />
journal = {Biometrics},<br />
volume = {38},<br />
pages = {963-974},<br />
year = {1982}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lindstrom1990Nonlinear,<br />
author = {Lindstrom, M.J. and Bates, D.M. },<br />
title = {Nonlinear mixed-effects models for repeated measures},<br />
journal = {Biometrics},<br />
volume = {46},<br />
pages = {673-687},<br />
year = {1990}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{littell2006sas,<br />
title={SAS for mixed models},<br />
author={Littell, R.C.},<br />
year={2006},<br />
publisher={SAS institute}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mcculloch2011generalized,<br />
title={Generalized, Linear, and Mixed Models},<br />
author={McCulloch, C.E. and Searle, S.R.},<br />
isbn={9781118209967},<br />
series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&redir_esc=y}, year={2004}, publisher={Wiley & Sons} <br />
}<br />
</bibtex><br />
<bibtex><br />
@book{verbeke2009linear,<br />
title={Linear Mixed Models for Longitudinal Data},<br />
author={Verbeke, G. and Molenberghs, G.},<br />
isbn={9781441902993},<br />
lccn={2010483807},<br />
series={Springer Series in Statistics},<br />
url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br />
year={2009},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{west2006linear,<br />
title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br />
author={West, B. and Welch, K.B. and Galecki, A.T.},<br />
isbn={9781584884804},<br />
lccn={2006045440},year={2006},publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>
<p>Continuous data models, 2013-06-07. Brocco: /* Transforming the data */</p>
<hr />
<div><!-- Menu for the Observations chapter --><br />
<sidebarmenu><br />
+[[Modeling the observations]]<br />
*[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The data ==<br />
<br />
Continuous data can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of a viral load in $\Rset$, and an effect expressed as a percentage in $[0,100]$.<br />
<br />
The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.<br />
<br />
<br />
{| cellpadding="0" cellspacing="0" <br />
| style="width:60%" align="center"| <br />
:[[File:continuous_graf0a_1.png]]<br />
| style="width: 40%" align="left"| <br />
:{| class="wikitable" style="width: 70%;"<br />
!| ID || TIME ||CONCENTRATION<br />
|- <br />
|1 || 1.0 || 9.84 <br />
|-<br />
|1 || 2.0 || 8.19 <br />
|-<br />
|1 || 4.0 || 6.91 <br />
|-<br />
|1 || 8.0 || 3.71 <br />
|-<br />
|1 || 12.0 || 1.25 <br />
|-<br />
|2 || 1.0 || 17.23 <br />
|-<br />
|2 || 3.0 || 11.14 <br />
|-<br />
|2 || 5.0 || 4.35 <br />
|-<br />
|2 || 10.0 || 2.92 <br />
|-<br />
|3 || 2.0 || 9.78 <br />
|-<br />
|3 || 3.0 || 10.40 <br />
|-<br />
|3 || 4.0 || 7.67 <br />
|-<br />
|3 || 6.0 || 6.84 <br />
|-<br />
|3 || 11.0 || 1.10 <br />
|-<br />
|4 || 4.0 || 8.78 <br />
|-<br />
|4 || 6.0 || 3.87 <br />
|-<br />
|4 || 12.0 || 1.85 <br />
|}<br />
|}<br />
<br />
<br />
Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br />
<br />
<br />
::[[File:continuous_graf0b_1.png]]<br />
<br />
<br />
<br><br />
<br />
== The model ==<br />
<br />
<br />
For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br />
<br />
{{EquationWithRef<br />
|equation=<div id="nlme" ><math>y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(1)<br />
}}<br />
<br />
where $g(t_{ij},\psi_i)\geq 0$.<br />
<br />
Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br />
In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &=& f(t_{ij},\psi_i) \\ <br />
\std{y_{ij} {{!}} \psi_i} &=& g(t_{ij},\psi_i).<br />
\end{eqnarray}</math>}}<br />
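These two conditional moments are easy to check by simulation; a minimal NumPy sketch, assuming (purely for illustration) a mono-exponential structural model and a constant error model:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical structural model and error model for one individual
A, alpha, a = 10.0, 0.3, 0.5          # psi_i = (A, alpha); constant error std a
t = 2.0                                # a single observation time

f = A * np.exp(-alpha * t)             # prediction f(t, psi_i)
eps = rng.standard_normal(200_000)     # standardized residual errors
y = f + a * eps                        # model (1) with g(t, psi_i) = a

print(y.mean(), y.std())               # close to f and a, respectively
```

The empirical mean and standard deviation of the simulated $y$ match $f(t,\psi_i)$ and $g(t,\psi_i)$, as stated above.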
<br />
<br />
<br><br />
<br />
== The structural model == <br />
<br />
<br />
$f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br />
<br />
In the current example, we choose the structural model $f(t,\psi)=A\exp\left(-\alpha t \right)$, with $\psi=(A,\alpha)$.<br />
Here are some example curves for various combinations of $A$ and $\alpha$:<br />
<br />
<br />
::[[File:continuous_graf1bis.png|link=]]<br />
<br />
<br />
Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br />
<br />
<br />
<br />
<br><br />
== The residual error model ==<br />
<br />
<br />
For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$. The residual error model can take many forms. For example,<br />
<br />
<br />
<ul><br />
* A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme1" ><math>y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. </math></div><br />
|reference=(2) }}<br />
<br />
:The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br />
<br />
<br />
::[[File: continuous_graf2a1.png|link=]]<br />
<br />
<br />
:Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br />
<br />
<br />
<ul><br />
::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are at $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$, where $\std{e_{ij}}=a_i=0.5$. <br />
<br><br />
::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br />
</ul><br />
<br />
<br />
::[[File:continuous_graf2a2.png|link=]]<br />
<br />
<br />
:These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br />
<br />
<br />
* A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br />
<br />
<br />
{{EquationWithRef <br />
|equation=<div id="nlme2"><math> y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br />
\quad \ 1 \leq j \leq n_i . </math></div><br />
|reference=(3) }}<br />
<br />
:The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br />
<br />
<br />
::[[File:continuous_graf2b.png|link=]]<br />
<br />
<br />
* A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br />
<br />
<br />
::[[File:continuous_graf2c.png|link=]]<br />
<br />
<br />
* Another alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br />
<br />
<br />
::[[File:continuous_graf2d.png|link=]]<br />
</ul><br />
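The four residual error models above differ only in how the standard deviation $g$ depends on the prediction $f$; a small sketch, with arbitrary illustrative values for the parameters $a$ and $b$:

```python
import math

a, b = 0.5, 0.15   # arbitrary error-model parameters

def g_constant(f):     return a                            # g = a
def g_proportional(f): return b * f                        # g = b*f
def g_combined(f):     return a + b * f                    # g = a + b*f
def g_combined_alt(f): return math.sqrt(a**2 + (b*f)**2)   # g = sqrt(a^2 + b^2 f^2)

# the constant model ignores f; the proportional model vanishes at f = 0;
# the two combined models are close for large f but differ near f = 0
for f in (0.0, 2.0, 10.0):
    print(f, g_constant(f), g_proportional(f), g_combined(f), g_combined_alt(f))
```

Note that at $f=0$ the first combined model gives $g=a$ while the second gives $g=\sqrt{a^2}=a$ as well; they differ for intermediate $f$, where the second is no longer affine.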
<br />
<br />
<br><br />
<br />
== Extension to autocorrelated errors == <br />
<br />
<br />
For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br />
For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="autocorr1"><math> {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}, </math></div><br />
|reference=(4) }}<br />
<br />
where $0\leq \rho_i <1$ for each individual $i$.<br />
If we assume that $t_{ij}=j$ for all $(i,j)$, then $t_{i\,j+1}-t_{ij}=1$ and the autocorrelation function $\gamma$ is given by:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &= &\rho_i^{\tau} .<br />
\end{eqnarray}</math> }}<br />
<br />
The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br />
<br />
<br />
::[[File:continuousGraf3.png|link=]]<br />
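With unit-spaced observation times, such a sequence can be simulated with the standard AR(1) recursion $\teps_{i\,j+1} = \rho_i\, \teps_{ij} + \sqrt{1-\rho_i^2}\, z_j$, which keeps the errors standardized; a NumPy sketch (the value of $\rho$ is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 100_000

eps = np.empty(n)
eps[0] = rng.standard_normal()
z = rng.standard_normal(n - 1)
for j in range(n - 1):
    # AR(1) recursion: keeps mean 0, variance 1, corr(eps_j, eps_{j+1}) = rho
    eps[j + 1] = rho * eps[j] + np.sqrt(1 - rho**2) * z[j]

lag1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]
print(lag1)   # close to rho = 0.6
```

The empirical lag-1 correlation recovers $\rho$, and the lag-$\tau$ correlation decays as $\rho^\tau$, as in the autocorrelation function above.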
<br />
<br />
<br />
<br><br />
== Distribution of the standardized residual errors ==<br />
<br />
<br />
The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br />
Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$.<br />
Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br />
<br />
A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br />
<br />
Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's t-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br />
<br />
<br />
::[[File:continuous_graf4_bis.png|link=]]<br />
<br />
<br />
If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized Student's $t$-distribution.<br />
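The heavier tails can be seen empirically: draws from a $t$-distribution with few degrees of freedom land far from zero much more often than standard normal draws. An illustrative simulation with $\nu=3$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

eps_normal = rng.standard_normal(n)
eps_t = rng.standard_t(df=3, size=n)   # nu = 3 degrees of freedom

# fraction of residuals more than 3 "standard units" away from 0
frac_normal = np.mean(np.abs(eps_normal) > 3)   # about 0.003 for N(0,1)
frac_t = np.mean(np.abs(eps_t) > 3)             # an order of magnitude larger for t(3)
print(frac_normal, frac_t)
```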
<br />
<br />
<br />
<br><br />
<br />
== The conditional likelihood ==<br />
<br />
<br />
The conditional likelihood for given observations $\by$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), </math> }}<br />
<br />
where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br />
If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model1"><math> \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \bpsi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br />
\end{eqnarray} </math></div><br />
|reference=(5) }}<br />
<br />
where $\qeps$ is the pdf of the i.i.d. residual errors ($\teps_{ij}$).<br />
<br />
For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br />
<br />
{{EquationWithRef <br />
|equation=<div id="likeN_model2" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = &<br />
\prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br />
\end{eqnarray} </math></div><br />
|reference=(6) }}<br />
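In practice this likelihood is evaluated on the log scale for numerical stability; a direct sketch for one individual, where the mono-exponential structural model and constant error model are again hypothetical choices for illustration:

```python
import math

def log_conditional_likelihood(y, t, psi, f, g):
    """log p(y_i | psi_i) for one individual, Gaussian residuals (eq. (6))."""
    ll = 0.0
    for y_ij, t_ij in zip(y, t):
        g_ij = g(t_ij, psi)
        r = (y_ij - f(t_ij, psi)) / g_ij
        ll += -0.5 * math.log(2 * math.pi) - math.log(g_ij) - 0.5 * r**2
    return ll

# hypothetical mono-exponential model with constant error std a
f = lambda t, psi: psi["A"] * math.exp(-psi["alpha"] * t)
g = lambda t, psi: psi["a"]

psi = {"A": 10.0, "alpha": 0.3, "a": 0.5}
print(log_conditional_likelihood([9.8, 8.2], [1.0, 2.0], psi, f, g))
```

As a check, a single observation that falls exactly on its prediction contributes $-\tfrac{1}{2}\log(2\pi) - \log a$ to the log-likelihood.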
<br />
<br />
<br />
<br><br />
<br />
== Transforming the data==<br />
<br />
<br />
The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br />
<br />
Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br />
<br />
{{EquationWithRef <br />
|equation=<div id="def_t" ><math> \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} </math></div><br />
|reference=(7) }}<br />
<br />
where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br />
As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br />
<br />
<br />
<br />
{{Example<br />
|title=Examples: <br />
| text=<br />
1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:<br />
<br />
<!-- Therefore, $y=f e^{g\teps}$. --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br />
y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File: continuous_graf5a.png|link=]]<br />
<br />
<br />
2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br />
<!-- %\begin{eqnarray*}<br />
%\transy(y)&=&\log(y/(1-y)) \\<br />
% y&=&\frac{f}{f+(1-f) e^{-g\teps}} .<br />
%\end{eqnarray*} --><br />
<br />
{{Equation1<br />
|equation= <math> \begin{eqnarray}<br />
\logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br />
y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br />
\end{eqnarray}</math><br />
}}<br />
<br />
<br />
::[[File:continuous_graf5b.png|link=]]<br />
<br />
<br />
3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
\transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\<br />
y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br />
\end{eqnarray}</math><br />
}}<br />
<!-- [[File:continuous_graf5c.png]] --><br />
}}<br />
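These inversion formulas can be checked numerically: given $f$, $g$ and a residual $\teps$, the $y$ recovered from the logit formula must satisfy $\logit(y)=\logit(f)+g\teps$ (the sign of the exponent follows from solving this equation for $y$). A minimal check with arbitrary values:

```python
import math

def logit(x):
    return math.log(x / (1 - x))

f, g, eps = 0.3, 0.5, 1.2   # arbitrary prediction, error std, residual

# logit error model: y = f / (f + (1 - f) * exp(-g * eps))
y = f / (f + (1 - f) * math.exp(-g * eps))

print(y, logit(y) - (logit(f) + g * eps))   # y in (0,1); difference ~ 0
```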
<br />
<br />
Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br />
<br />
{{EquationWithRef<br />
|equation= <div id="likeN_model3" ><math> \begin{eqnarray}<br />
\pcypsi(\by {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br />
& = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(8) }}<br />
<br />
For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br />
<br />
{{Equation1<br />
|equation=<math> \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br />
</math> }} <br />
<br />
<br />
<br><br />
<br />
== Censored data ==<br />
<br />
<br />
Censoring occurs when the value of a measurement or observation is only partially known.<br />
For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.<br />
<br />
For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.<br />
<br />
Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br />
<br />
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br />
<br />
We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br />
<br />
<br />
* '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br />
<br />
{{Equation1<br />
|equation = <math> <br />
\repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij} \geq L \\<br />
y_{ij} < L & {\rm otherwise.}<br />
\end{array} \right. </math> }} <br />
<br />
<blockquote>In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6a.png]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||log-VL || cens<br />
|- <br />
| 1 || 1.0 || 0.26 || 0<br />
|-<br />
| 1 || 2.0 || 0.02 || 0<br />
|-<br />
| 1 || 3.0 || -0.13 || 0<br />
|-<br />
| 1 || 4.0 || -0.13 || 0<br />
|-<br />
| 1 || 5.0 || -0.30 || 1<br />
|-<br />
| 1 || 6.0 || -0.30 || 1<br />
|-<br />
| 1 || 7.0 || -0.25 || 0<br />
|-<br />
| 1 || 8.0 || -0.30 || 1<br />
|-<br />
| 1 || 9.0 || -0.29 || 0<br />
|-<br />
| 1 || 10.0 || -0.30 || 1<br />
|}<br />
|}<br />
<br />
<br />
* '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br />
<br />
{{Equation1<br />
|equation=<math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\notin I \\<br />
y_{ij} \in I & {\rm otherwise.}<br />
\end{array} \right. </math> }}<br />
<br />
<blockquote>For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.</blockquote><br />
<br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6b.png]]<br />
| style="width=40%" align="right"|<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||CONC. || llimit || cens<br />
|-<br />
| 1 || 0.3 || 1.20 || . || 0<br />
|-<br />
| 1 || 0.5 || 1.93 || . || 0<br />
|-<br />
| 1 || 1.0 || 3.38 || . || 0<br />
|-<br />
| 1 || 2.0 || 3.88 || . || 0<br />
|-<br />
| 1 || 4.0 || 3.24 || . || 0<br />
|-<br />
| 1 || 6.0 || 1.82 || . || 0<br />
|-<br />
| 1 || 8.0 || 1.07 || . || 0<br />
|-<br />
| 1 || 12.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 16.0 || 1.00 || 0.00 || 1<br />
|-<br />
| 1 || 20.0 || 1.00 || 0.00 || 1<br />
|}<br />
|}<br />
<br />
<br />
<br />
* '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br />
<br />
{{Equation1<br />
|equation= <math> \repy_{ij} = \left\{ \begin{array}{cc}<br />
y_{ij} & {\rm if } \ y_{ij}\leq U \\<br />
y_{ij} > U & {\rm otherwise.}<br />
\end{array} \right. <br />
</math> }}<br />
<br />
<blockquote>Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br />
</blockquote><br />
<br />
{| cellspacing="0" cellpadding="0" <br />
| style="width=60%" |<br />
[[File:continuous_graf6c.png]]<br />
| style="width=40%" align="right" |<br />
{| class="wikitable" style="width: 150%"<br />
!| ID || TIME ||VOLUME || CENS<br />
|-<br />
| 1 || 2.0 || 1.85 || 0<br />
|-<br />
| 1 || 7.0 || 2.40 || 0<br />
|-<br />
| 1 || 12.0 || 3.27 || 0<br />
|-<br />
| 1 || 17.0 || 3.28 || 0<br />
|-<br />
| 1 || 22.0 || 3.62 || 0<br />
|- <br />
| 1 || 27.0 || 3.02 || 0<br />
|-<br />
| 1 || 32.0 || 3.80 || -1<br />
|-<br />
| 1 || 37.0 || 3.80 || -1<br />
|-<br />
| 1 || 42.0 || 3.80 || -1<br />
|-<br />
| 1 || 47.0 || 3.80 || -1<br />
|}<br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remarks<br />
<br />
|text= &#32;<br />
* Different censoring limits and intervals can be in play at different times and for different individuals.<br />
* Interval censoring covers the other two types as special cases: set $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.<br />
}}<br />
<br />
<br />
The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,<br />
<br />
{{EquationWithRef<br />
|equation = <div id="likeN_model4"><math> <br />
\begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br />
\end{eqnarray}<br />
</math></div><br />
|reference=(9) }}<br />
<br />
where<br />
<br />
{{Equation1<br />
|equation=<math> \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du </math> }}<br />
<br />
We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br />
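For left censoring with Gaussian residuals, the censored contribution in (9) is simply a normal CDF. A sketch using only the standard library, assuming (for illustration) a constant error model with standard deviation $a$:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def contribution(y, cens, f, a, L):
    """Likelihood contribution of one observation, left-censored at L."""
    if cens:  # censored: P(y_ij < L | psi_i) = Phi((L - f)/a)
        return normal_cdf((L - f) / a)
    # uncensored: Gaussian density at the observed value y
    return math.exp(-0.5 * ((y - f) / a) ** 2) / (a * math.sqrt(2 * math.pi))

# if the prediction sits exactly at the limit, a censored point contributes 1/2
print(contribution(None, True, f=-0.30, a=0.2, L=-0.30))   # 0.5
```

The product of these contributions over all $(i,j)$, as in (9), gives the conditional likelihood in the presence of left-censored data.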
<br />
<br />
<br><br />
<br />
== Extensions to multidimensional continuous observations == <br />
<br />
<br />
<ul><br />
* Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is a now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i.<br />
</math>}}<br />
<br />
: It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br />
<br />
<br />
* Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br />
\ \ 1 \leq j \leq n_i .<br />
</math> }}<br />
<br />
: Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br />
<br />
{{Equation1<br />
|equation=<math><br />
y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br />
</math> }}<br />
</ul><br />
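This decomposition can be checked by simulation: with independent standardized components, the total residual variance is $g_{I\!M}^2+g_{I\!R}^2$ and replicates within an occasion share the inter-measurement term. A NumPy sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)
n_rep, n_sim = 3, 100_000           # K replicates, simulated measurement occasions

g_im, g_ir = 0.4, 0.3               # inter-measurement and inter-replicate stds
f = 5.0                             # prediction at some time t_ij (illustrative)

eps_im = rng.standard_normal((n_sim, 1))        # shared within a measurement occasion
eps_ir = rng.standard_normal((n_sim, n_rep))    # one per replicate

y = f + g_im * eps_im + g_ir * eps_ir           # y_ijk

print(y.var())   # close to g_im**2 + g_ir**2 = 0.25
```

The shared $\vari{\teps}{ij}{I\!M}$ term makes replicates of the same measurement positively correlated, exactly as inter-occasion random effects correlate observations within an occasion.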
<br><br><br />
-----------------------------------------------<br />
<br><br><br />
<br />
{{Summary<br />
|title=Summary <br />
|text= <br />
A model for continuous data is completely defined by:<br />
<br />
*The structural model $f$<br />
*The residual error model $g$<br />
*The probability distribution of the residual errors $(\teps_{ij})$<br />
*Possibly a transformation $\transy$ of the data<br />
<br />
<br />
The model is associated with a design which includes:<br />
<br />
<br />
- the observation times $(t_{ij})$<br />
<br />
- possibly some additional regression variables $(x_{ij})$<br />
<br />
- possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br />
<br />
- possibly a censoring process $(I_{ij})$<br />
<br />
}}<br />
<br />
<br />
== $\mlxtran$ for continuous data models == <br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
<br />
|text= <br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& (A,\alpha,B,\beta, a) \\<br />
f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br />
y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij}<br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {A, B, alpha, beta, a}<br />
<br />
EQUATION:<br />
f = A*exp(-alpha*t) + B*exp(-beta*t)<br />
<br />
DEFINITION:<br />
y = {distribution=normal, prediction=f, std=a}</pre><br />
}}<br />
<br />
}}<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 2:<br />
|title2=<br />
<br />
|text=<br />
|equation= <math> \begin{eqnarray}<br />
\psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br />
t_0 &=&0 \\[0.2cm]<br />
{\rm if \quad t<t_0} \\[0.2cm]<br />
\quad \nitc &=& \delta \, c/( \beta \, p) \\<br />
\quad \itc &=& (s - d\,\nitc) / \delta \\<br />
\quad \vl &=& p \, \itc / c. \\[0.2cm] <br />
{\rm else \quad \quad }\\[0.2cm] <br />
\quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br />
\quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br />
\quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br />
\quad \log(y_{ij}) &= &\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br />
\end{eqnarray}</math><br />
|code=<br />
{{MLXTranForTable<br />
|name=<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
INPUT:<br />
input = {delta, c, beta, p, s, d, nu, rho, a}<br />
<br />
EQUATION:<br />
t0=0<br />
N_0 = delta*c/(beta*p)<br />
I_0 = (s - d*N_0)/delta<br />
V_0 = p*I_0/c<br />
ddt_N = s - beta*(1-nu)*N*V - d*N<br />
ddt_I = beta*(1-nu)*N*V - delta*I<br />
ddt_V = p*(1-rho)*I - c*V<br />
<br />
DEFINITION:<br />
y = {distribution=logNormal, prediction=V, std=a}<br />
</pre> }} <br />
}}<br />
<br />
<br><br><br />
<br />
<br />
==Bibliography==<br />
<br />
<br />
<bibtex><br />
@book{davidian1995,<br />
author = {Davidian, M. and Giltinan, D.M. },<br />
title = {Nonlinear Models for Repeated Measurements Data },<br />
publisher = {Chapman & Hall.},<br />
address = {London},<br />
edition = {},<br />
year = {1995}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{demidenko2005mixed,<br />
title={Mixed Models: Theory and Applications},<br />
author={Demidenko, E.},<br />
isbn={9780471726135},<br />
series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&redir_esc=y}, <br />
year={2005}, publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{fitzmaurice2008longitudinal,<br />
title={Longitudinal Data Analysis},<br />
author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br />
isbn={9781420011579},<br />
lccn={2008020681},<br />
series={Chapman & Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br />
year={2008},publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{jiang2007,<br />
author = {Jiang, J.},<br />
title = {Linear and Generalized Linear Mixed Models and Their Applications},<br />
publisher = {Springer},<br />
series = {Springer Series in Statistics},<br />
year = {2007},<br />
address = {New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{laird1982,<br />
author = {Laird, N.M. and Ware, J.H.},<br />
title = {Random-Effects Models for Longitudinal Data},<br />
journal = {Biometrics},<br />
volume = {38},<br />
pages = {963-974},<br />
year = {1982}<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{lindstrom1990Nonlinear,<br />
author = {Lindstrom, M.J. and Bates, D.M. },<br />
title = {Nonlinear mixed-effects models for repeated measures},<br />
journal = {Biometrics},<br />
volume = {46},<br />
pages = {673-687},<br />
year = {1990}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{littell2006sas,<br />
title={SAS for Mixed Models},<br />
author={Littell, R.C.},<br />
year={2006},<br />
publisher={SAS Institute}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{mcculloch2011generalized,<br />
title={Generalized, Linear, and Mixed Models},<br />
author={McCulloch, C.E. and Searle, S.R.},<br />
isbn={9781118209967},<br />
series={Wiley Series in Probability and Statistics},<br />
url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&redir_esc=y},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{verbeke2009linear,<br />
title={Linear Mixed Models for Longitudinal Data},<br />
author={Verbeke, G. and Molenberghs, G.},<br />
isbn={9781441902993},<br />
lccn={2010483807},<br />
series={Springer Series in Statistics},<br />
url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br />
year={2009},<br />
publisher={Springer}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{west2006linear,<br />
title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br />
author={West, B. and Welch, K.B. and Galecki, A.T.},<br />
isbn={9781584884804},<br />
lccn={2006045440},<br />
year={2006},<br />
publisher={Taylor & Francis}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Modeling the observations <br />
|linkNext=Models for count data }}</div>
<!-- Extension to multivariate distributions (Brocco, 2013-06-07) -->
<hr />
<div><!-- Menu for the Individual Parameters chapter --><br />
<sidebarmenu><br />
+[[Modeling the individual parameters]]<br />
*[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== The Gaussian model ==<br />
<br />
We would now like to extend the model defined for a unique individual scalar parameter $\psi_i$ to the case where $\psi_i$ is a vector $(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$ of individual parameters.<br />
<br />
To begin with, we simply generalize the basic model to each component of $\psi_i$. To this end, we suppose that there exists a vector of covariates $c_i = (c_{i,1}, \ldots, c_{i,L})$ and:<br />
<br />
<br />
<ul><br />
* $d$ monotonic transformations $h_1$, $h_2$, $\ldots$, $h_d$<br />
<br />
* $d$ vectors of fixed coefficients $\bbeta_1$, $\bbeta_2, \ldots, \bbeta_d$<br />
<br />
* $d$ functions $\hmodel_1$, $\hmodel_2$, $\ldots$, $\hmodel_d$<br />
<br />
* a vector of random effects $\beeta_i = (\eta_{i,1},\eta_{i,2},\ldots , \eta_{i,d})$,<br />
</ul><br />
<br />
<br />
such that, for each $\iparam=1,2,\ldots,d$,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hpsi_{i,\iparam} &=& \hmodel_\iparam(\bbeta_\iparam,c_i) \\<br />
h_\iparam(\psi_{i,\iparam}) & =& h_\iparam(\hpsi_{i,\iparam}) +\eta_{i,\iparam} \\<br />
& =& \mmodel_\iparam(\bbeta_\iparam,c_i) +\eta_{i,\iparam}.<br />
\end{eqnarray} </math> }}<br />
<br />
For instance, a linear covariate model supposes that for each $\iparam=1,2,\ldots,d$, we have:<br />
<br />
{{Equation1<br />
|equation=<math><br />
h_\iparam(\hpsi_{i,\iparam}) = h_\iparam(\psi_{ {\rm pop},\iparam})+ \bbeta_{\iparam,1}(c_{i,1} - c_{\rm pop,1}) + \bbeta_{\iparam,2}(c_{i,2} - c_{\rm pop,2}) + \ldots + \bbeta_{\iparam,L}(c_{i,L} - c_{\rm pop,L}) .<br />
</math> }}<br />
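As a quick numerical illustration, here is a minimal Python sketch of this linear covariate model on the transformed scale, for a single log-normally distributed parameter with one covariate (all names and values are hypothetical, not taken from the text):<br />

```python
import math

def predicted_parameter(h, h_inv, psi_pop, beta, c_i, c_pop):
    # h(psi_hat_i) = h(psi_pop) + sum_l beta_l * (c_{i,l} - c_{pop,l})
    t = h(psi_pop) + sum(b * (c - c0) for b, c, c0 in zip(beta, c_i, c_pop))
    return h_inv(t)

# Hypothetical example: log-normal volume V, weight (kg) as single covariate
V_pop, beta_w, w_pop, w_i = 10.0, 0.01, 70.0, 80.0
V_hat = predicted_parameter(math.log, math.exp, V_pop, [beta_w], [w_i], [w_pop])
# On the log scale: log(V_hat) = log(V_pop) + beta_w * (w_i - w_pop)
```

With a log transformation, a covariate effect that is additive on the transformed scale becomes multiplicative on the natural scale: here $\hat{V}_i = V_{\rm pop}\, e^{\beta_w (w_i - w_{\rm pop})}$.<br />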
<br />
Dependency can be introduced between parameters by supposing that the random effects $(\eta_{i,\iparam})$ are not independent. In the special case where the random effects are Gaussian, this means considering them to be correlated, i.e., we suppose there exists a $d\times d$ variance-covariance matrix $\Omega$ such that<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\beeta_i}&=&0 \\<br />
\esp{\beeta_i \beeta_i^\prime} &=& \Omega .<br />
\end{eqnarray} </math> }}<br />
<br />
Here,<br />
<br />
{{Equation1<br />
|equation=<math><br />
\Omega = \left(<br />
\begin{array}{cccc}<br />
\omega_1^2 & \omega_{1,2} & \ldots & \omega_{1,d} \\<br />
\omega_{1,2} & \omega_2^2 & \ldots & \omega_{2,d} \\<br />
\vdots & \vdots & \ddots & \vdots \\<br />
\omega_{1,d} & \omega_{2,d} & \ldots & \omega_d^2<br />
\end{array}<br />
\right), <br />
</math> }}<br />
<br />
where $\omega_\iparam^2$ is the variance of $\eta_{i,\iparam}$ and $\omega_{\iparam,\iparam^\prime}$ the covariance between $\eta_{i,\iparam}$ and $\eta_{i,\iparam^\prime}$.<br />
<br />
It will be useful in the following to have a diagonal decomposition of $\Omega$. To this end, let us define the correlation matrix $R=(R_{\iparam,\iparam^\prime}, 1 \leq \iparam,\iparam^\prime \leq d)$ of the vector $\eta_i$:<br />
<br />
{{Equation1<br />
|equation=<math> R_{\iparam,\iparam^\prime} = \left\{<br />
\begin{array}{ll}<br />
1 & {\rm if \quad } \iparam=\iparam^\prime \\<br />
\rho_{\iparam,\iparam^\prime}=\frac{\omega_{\iparam,\iparam^\prime} }{\omega_{\iparam}\omega_{\iparam^\prime} } & \hbox{otherwise,}<br />
\end{array}<br />
\right.<br />
</math> }}<br />
<br />
and let $D=(D_{\iparam,\iparam^\prime})$ be a diagonal matrix which contains the standard deviations $(\omega_\iparam)$:<br />
<br />
{{Equation1<br />
|equation=<math>D_{\iparam,\iparam^\prime} = \left\{<br />
\begin{array}{ll}<br />
\omega_{\iparam} & {\rm if \quad } \iparam=\iparam^\prime \\<br />
0 & {\rm otherwise.}<br />
\end{array}<br />
\right.<br />
</math> }}<br />
<br />
Then we have the diagonal decomposition: $\Omega = D \, R \, D$.<br />
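This decomposition is easy to verify numerically; the following sketch uses hypothetical standard deviations and correlations:<br />

```python
import numpy as np

# Hypothetical standard deviations (omega_1, ..., omega_d) and correlations
omega = np.array([0.3, 0.2, 0.4])
R = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])     # correlation matrix of eta_i
D = np.diag(omega)                  # diagonal matrix of standard deviations
Omega = D @ R @ D                   # variance-covariance matrix

# The diagonal of Omega holds the variances omega_l^2, and each off-diagonal
# entry is omega_l * omega_l' * rho_{l,l'}
```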
<br />
<br />
<br />
{{Example<br />
|title=Example:<br />
|text=Consider a PK model with three PK parameters: the absorption rate constant $ka$, the volume $V$ and the clearance $Cl$. Here, $\psi_i=(ka_i,V_i,Cl_i)$.<br />
<br />
<br />
<li> If we make the assumption that $\eta_{i,V}$ and $\eta_{i,Cl}$ are correlated, it means that the log-volume and the log-clearance are linearly correlated, with correlation:<br />
<br />
{{Equation1<br />
|equation= <math>\begin{eqnarray}<br />
\rho_{V,Cl} & = & \corr{\eta_{i,V},\eta_{i,Cl} } \\<br />
& = & \corr{\log(V_i),\log(Cl_i)} .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Assuming that $ka$ is fixed in the population means that $ka_i = ka_{\rm pop}$ for any $i$, which implies that $\eta_{i,ka}=0$, and thus $\omega_{ka}=0$.<br />
<br />
The correlation matrix $R$ and the variance covariance matrix $\Omega$ of $(\eta_{i,ka}, \eta_{i,V}, \eta_{i,Cl})$ are therefore<br />
<br />
{{Equation1<br />
|equation=<math> R = \left(<br />
\begin{array}{ccc}<br />
1 & 0 & 0 \\<br />
0 & 1 & \rho_{V,Cl} \\<br />
0 & \rho_{V,Cl} & 1 \\<br />
\end{array}<br />
\right)<br />
, \quad \quad<br />
\Omega = DRD = \left(<br />
\begin{array}{ccc}<br />
0 & 0 & 0 \\<br />
0 & \omega_V^2 & \omega_V\omega_{Cl}\, \rho_{V,Cl} \\<br />
0 & \omega_V\omega_{Cl}\, \rho_{V,Cl} & \omega_{Cl}^2 \\<br />
\end{array}<br />
\right) .<br />
</math> }}<br />
}}
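Continuing this example numerically (with hypothetical values for $\omega_V$, $\omega_{Cl}$ and $\rho_{V,Cl}$), setting $\omega_{ka}=0$ makes $\Omega$ singular, i.e., not positive-definite:<br />

```python
import numpy as np

# ka fixed in the population: omega_ka = 0 (other values are hypothetical)
omega_ka, omega_V, omega_Cl, rho = 0.0, 0.3, 0.25, 0.6
D = np.diag([omega_ka, omega_V, omega_Cl])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, rho],
              [0.0, rho, 1.0]])
Omega = D @ R @ D

# The first row and column of Omega are null, so Omega is singular
smallest_eigenvalue = np.linalg.eigvalsh(Omega)[0]
```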
<br />
<br />
<br />
<br><br />
<br />
== The probability distribution function ==<br />
<br />
<br />
<br />
We now have all the elements needed to compute the pdf of $\psi_i=(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$.<br />
Here, $\theta = (\psi_{{\rm pop},1}, \ldots, \psi_{{\rm pop},d}, \bbeta_1, \ldots,\bbeta_d,\Omega)$.<br />
<br />
<br />
<ul><br />
* If $\Omega$ is a positive-definite matrix, it can be inverted and a straightforward extension of the pdf proposed in [[Covariate_models#indiv_cov6|(9) of The covariate model]] for a scalar variable gives<br />
<br />
{{EquationWithRef<br />
|equation=<div id="indiv_multi1"><math><br />
\ppsii(\psi_i;c_i,\theta )= \left( \prod_{\iparam=1}^d h_\iparam^\prime(\psi_{i,\iparam}) \right)<br />
(2 \pi)^{-\frac{d}{2} } {{!}}\Omega{{!}}^{-\frac{1}{2} }<br />
{\rm exp} \left\{-\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^\prime \Omega^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) ) \right\} ,<br />
</math></div><br />
|reference=(1) }}<br />
<br />
: where $h(\psi_i)$ is the column vector $(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots, h_d(\psi_{i,d}))^\prime$ and $\mmodel(\bbeta,c_i)$ the column vector $(h_1(\hpsi_{i,1}), h_2(\hpsi_{i,2}), \ldots, h_d(\hpsi_{i,d}))^\prime$.<br />
<br />
<br />
* If the variance of some of the random effects is null, $\Omega$ is not positive-definite. The pdf in [[#indiv_multi1|(1)]] then no longer applies to the complete $d$-vector $\psi_i$, but only to the $d_1$-vector subset $\psi_i^{(1)}$ of $\psi_i$ whose variance matrix $\Omega_1$ is positive-definite. The distribution of the remaining fixed parameters $\psi_i^{(0)}$ is a Dirac delta distribution. Let $I_0$ be the indices of the parameters $\psi_i^{(0)}$ and $I_1$ those of the parameters $\psi_i^{(1)}$, i.e., $\omega_\iparam =0$ if $\iparam \in I_0$ and $\omega_\iparam >0$ if $\iparam \in I_1$. Then,<br />
<br />
{{Equation1<br />
|equation=<math> \ppsii(\psi_i;c_i,\theta ) = \pmacro(\psi_i^{(0)};c_i,\theta )\,\,\pmacro(\psi_i^{(1)};c_i,\theta ) ,<br />
</math> }}<br />
<br />
: where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\psi_i^{(0)};c_i,\theta ) &= & \prod_{\iparam \in I_0} \delta_{ \{ h(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i) \} } \\<br />
\pmacro(\psi_i^{(1)};c_i,\theta )&=& \left( \prod_{\iparam \in I_1} h_\iparam^\prime(\psi_{i,\iparam}) \right)<br />
(2 \pi)^{-\frac{d_1}{2} } {{!}}\Omega_1{{!}}^{-\frac{1}{2} }<br />
{\rm exp} \left\{ -\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)^\prime} \Omega_1^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)} \right\},<br />
\end{eqnarray}</math> }}<br />
<br />
: with $( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)}$ the same as $( h(\psi_i) - \mmodel(\bbeta,c_i) )$ but with the $I_0$ entries removed.<br />
<br />
<br />
* There exist other situations where $\Omega$ is not positive-definite. This is the case for instance when two random effects are equal: $\eta_{i,\iparam} \equiv \eta_{i,\iparam^\prime}$. In this case, we can still write a joint distribution for the two parameters and their shared random effect:<br />
<br />
{{Equation1<br />
|equation=<math><br />
\pmacro(\psi_{i,\iparam}, \psi_{i,\iparam^\prime},\eta_{i,\iparam}; \bbeta_{\iparam},\bbeta_{\iparam^\prime},\omega^2_\iparam ,c_i ) = \pmacro(\psi_{i,\iparam} {{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) \<br />
\pmacro(\psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) \<br />
\pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) , </math> }}<br />
<br />
: where<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pmacro(\psi_{i,\iparam}{{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) &=& \delta_{\{h(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i)+\eta_{i,\iparam} \} } \\<br />
\pmacro( \psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) &=& \delta_{\{h(\psi_{i,\iparam^\prime})=\mmodel_{\iparam^\prime}(\bbeta_{\iparam^\prime},c_i)+\eta_{i,\iparam} \} } \\<br />
\pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) &=& \displaystyle{ \frac{ 1}{\sqrt{2 \, \pi \omega_\iparam^2 } } }\ \exp\left\{-\displaystyle{ \frac{\eta_{i,\iparam}^2}{2 \omega_\iparam^2} }\right\}.<br />
\end{eqnarray}</math> }}<br />
</ul><br />
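The change-of-variables formula (1) can be checked numerically in a simple case. The sketch below (hypothetical values) implements (1) for two log-normal components with a diagonal $\Omega$, in which case the joint pdf must factorize into two univariate log-normal densities:<br />

```python
import numpy as np

def pdf_psi(psi, h, h_prime, m, Omega):
    # Formula (1): Jacobian of the transformation times the multivariate
    # normal density of h(psi) with mean m and covariance Omega
    d = len(psi)
    z = np.array([h[l](psi[l]) for l in range(d)]) - m
    jac = np.prod([h_prime[l](psi[l]) for l in range(d)])
    quad = z @ np.linalg.solve(Omega, z)
    const = (2.0 * np.pi) ** (-d / 2.0) / np.sqrt(np.linalg.det(Omega))
    return jac * const * np.exp(-0.5 * quad)

# Hypothetical check: two log-normal components with diagonal Omega
h = [np.log, np.log]
h_prime = [lambda x: 1.0 / x, lambda x: 1.0 / x]
m = np.array([np.log(2.0), np.log(5.0)])
Omega = np.diag([0.09, 0.04])
psi = np.array([1.8, 5.5])

def lognorm_pdf(x, mu, w2):
    # Univariate log-normal density with log-scale mean mu and variance w2
    return np.exp(-(np.log(x) - mu) ** 2 / (2.0 * w2)) / (x * np.sqrt(2.0 * np.pi * w2))

p_joint = pdf_psi(psi, h, h_prime, m, Omega)
p_prod = lognorm_pdf(psi[0], m[0], 0.09) * lognorm_pdf(psi[1], m[1], 0.04)
```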
<br />
<br />
All kinds of combinations are possible, including parameters with and without variability, algebraic relationships between random effects, etc. In every case, an adequate decomposition can be found that lets us characterize a pdf. This pdf plays a fundamental role in tasks such as maximum likelihood estimation of the population parameters, where we start with the observations $\by = (y_i , 1\leq i \leq N)$ while the individual parameters $(\psi_i)$ are not observed.<br />
<br />
<br />
<br><br />
<br />
<!--<br />
== $\mlxtran$ for the covariance model ==<br />
<br />
<br />
<br />
{{ExampleWithCode<br />
|title1=Example 1:<br />
|title2=<br />
|text= TO DO<br />
|equation=<br />
|code =<br />
}}<br />
--><br />
<br />
{{Back&Next<br />
|linkBack=Model with covariates<br />
|linkNext=Additional levels of variability }}</div>
<!-- Estimation (Brocco, 2013-06-07) -->
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and any a priori information.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
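For most models this integral has no closed form and must be approximated. A toy Monte Carlo sketch follows (a deliberately simple linear Gaussian model with hypothetical values, chosen so that the marginal likelihood is also available in closed form for comparison):<br />

```python
import math, random

random.seed(1)

def npdf(x, mu, var):
    # Univariate normal density
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Toy model: one observation y of one individual,
#   y | psi ~ N(psi, sigma2),  psi ~ N(theta, omega2)
# so that, marginally, y ~ N(theta, omega2 + sigma2)
theta, omega2, sigma2, y = 1.0, 0.5, 0.2, 1.3

# Monte Carlo approximation of L(theta; y) = int p(y | psi) p(psi; theta) dpsi
M = 200000
mc = sum(npdf(y, random.gauss(theta, math.sqrt(omega2)), sigma2)
         for _ in range(M)) / M
exact = npdf(y, theta, omega2 + sigma2)
```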
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a unique data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based model, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.) the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, but $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ since $h$ is nonlinear. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $t$ such that $h(t)\in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
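In the log-normal case, the three measures of central tendency have simple closed forms, which makes the point above concrete (hypothetical values):<br />

```python
import math

# log(psi) ~ N(log(psi_pop), omega^2): closed forms for a log-normal variable
psi_pop, omega = 2.0, 0.5

median = psi_pop                              # h increasing => psi_pop is the median
mean = psi_pop * math.exp(omega ** 2 / 2.0)   # E[psi] > psi_pop
mode = psi_pop * math.exp(-omega ** 2)        # mode < psi_pop
```

Only in the limit $\omega \to 0$ do the three quantities coincide with $\psi_{\rm pop}$.<br />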
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a unique dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
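The same structural model can also be transcribed directly in Python for quick exploration; the dose and parameter values below are purely illustrative:<br />

```python
import math

def Cc(t, D, ka, V, Cl):
    # One-compartment model, first-order absorption, single dose D at t = 0
    ke = Cl / V
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    # Hazard for the event of interest, driven by the predicted concentration
    return h0 * math.exp(gamma * Cc(t, D, ka, V, Cl))

# Illustrative values (not estimates)
D, ka, V, Cl, h0, gamma = 100.0, 1.0, 7.0, 2.0, 0.01, 0.5
c6 = Cc(6.0, D, ka, V, Cl)
```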
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and variance of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to graphically represent these distributions in order to see and understand them better. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents, and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br />
}}<br />
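The quality of the approximation ${\rm cv}(\psi) \approx \omega_{\psi}$ can be checked against the exact coefficient of variation of a log-normal variable, $\sqrt{e^{\omega^2}-1}$:<br />

```python
import math

def cv_lognormal(omega):
    # Exact coefficient of variation of a log-normal variable whose
    # log has standard deviation omega
    return math.sqrt(math.exp(omega ** 2) - 1.0)

# The approximation cv(psi) ~ omega is only good for small omega
small = cv_lognormal(0.1)   # close to 0.1
large = cv_lognormal(1.0)   # far from 1.0
```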
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then the MAP estimate satisfies<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior depends directly on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer the MAP estimate is to $\theta_0$. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
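This trade-off is transparent in the simplest conjugate case. Assuming i.i.d. observations $y_i \sim {\cal N}(\theta,\sigma^2)$ and the normal prior above, the MAP estimate has a closed form: a weighted average of the MLE $\bar{y}$ and the prior mean $\theta_0$ (all numeric values below are hypothetical):<br />

```python
# MAP estimate of a normal mean with a normal prior N(theta0, gamma2):
# a weighted average of the MLE (sample mean) and the prior mean theta0
def map_normal_mean(y, sigma2, theta0, gamma2):
    n = len(y)
    ybar = sum(y) / n                        # maximum likelihood estimate
    w = (n / sigma2) / (n / sigma2 + 1.0 / gamma2)
    return w * ybar + (1.0 - w) * theta0     # shrinkage towards theta0

y = [1.2, 0.8, 1.1, 0.9]                     # sample mean 1.0
loose = map_normal_mean(y, 1.0, 3.0, 100.0)  # weak prior: MAP close to the MLE
tight = map_normal_mean(y, 1.0, 3.0, 1e-4)   # strong prior: MAP close to theta0
```

As $\gamma^2 \to \infty$ the MAP estimate tends to the MLE; as $\gamma^2 \to 0$ it tends to $\theta_0$.<br />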
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data, for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
* a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br />
<br><br />
<br />
* estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br />
<br><br />
<br />
* estimated by introducing a prior and calculating the MAP estimate.<br />
<br><br />
<br />
* estimated by introducing a prior and then estimating the posterior distribution.<br />
</ul><br />
<br />
<br />
We put aside dealing with the fixed components of $\theta$ in the following. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:32%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior as the standard deviation $\gamma$ of the prior decreases, while the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ as $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square roots of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
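The mechanics behind these standard errors can be sketched for a model simple enough to check by hand: $n$ i.i.d. observations $y_i \sim {\cal N}(\mu,\sigma^2)$. This is a minimal illustration, not the $\monolix$ implementation; the observed F.I.M. is approximated by a finite-difference Hessian of the negative log-likelihood, inverted, and turned into s.e. and r.s.e. values:

```python
import math
import random

def neg_loglik(mu, sigma, y):
    # -log L(mu, sigma; y) for i.i.d. N(mu, sigma^2) observations
    n = len(y)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) \
        + sum((yi - mu) ** 2 for yi in y) / (2 * sigma ** 2)

def observed_fim(mu, sigma, y, h=1e-5):
    # Central finite-difference Hessian of -log L at (mu, sigma)
    f = lambda a, b: neg_loglik(a, b, y)
    d2_mu = (f(mu + h, sigma) - 2 * f(mu, sigma) + f(mu - h, sigma)) / h ** 2
    d2_sig = (f(mu, sigma + h) - 2 * f(mu, sigma) + f(mu, sigma - h)) / h ** 2
    d2_cross = (f(mu + h, sigma + h) - f(mu + h, sigma - h)
                - f(mu - h, sigma + h) + f(mu - h, sigma - h)) / (4 * h ** 2)
    return [[d2_mu, d2_cross], [d2_cross, d2_sig]]

def standard_errors(fim):
    # s.e. = square roots of the diagonal of the inverse of the observed F.I.M.
    (a, b), (c, d) = fim
    det = a * d - b * c
    return [math.sqrt(d / det), math.sqrt(a / det)]

random.seed(1)
y = [random.gauss(2.0, 0.5) for _ in range(200)]
mu_hat = sum(y) / len(y)
sigma_hat = math.sqrt(sum((yi - mu_hat) ** 2 for yi in y) / len(y))  # MLE of sigma
se = standard_errors(observed_fim(mu_hat, sigma_hat, y))
rse = [100 * se[0] / abs(mu_hat), 100 * se[1] / sigma_hat]  # r.s.e. in %
```

For this model the closed-form values are ${\rm se}(\hat{\mu})=\hat{\sigma}/\sqrt{n}$ and ${\rm se}(\hat{\sigma})=\hat{\sigma}/\sqrt{2n}$, which the numerical recipe reproduces.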
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow us to estimate these two parameters well.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (such as the determinant or trace).<br />
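As a simple illustration of these criteria (a sketch for a linear model, not a population design tool), consider choosing sampling times for $y = a + b\,t$ with i.i.d. residual errors: the covariance matrix of the estimator is $\sigma^2 (X^\prime X)^{-1}$, and well-separated sampling times improve both the determinant (D-optimality) and trace (A-optimality) criteria:

```python
def covariance(times, sigma2=1.0):
    # Linear model y = a + b*t with i.i.d. N(0, sigma2) errors:
    # F.I.M. = X'X / sigma2 and the covariance of the estimator is its inverse
    n = len(times)
    s1 = sum(times)
    s2 = sum(t * t for t in times)
    det_fim = (n * s2 - s1 * s1) / sigma2 ** 2
    return [[(s2 / sigma2) / det_fim, (-s1 / sigma2) / det_fim],
            [(-s1 / sigma2) / det_fim, (n / sigma2) / det_fim]]

def d_criterion(cov):  # determinant of the covariance matrix (smaller is better)
    return cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]

def a_criterion(cov):  # trace of the covariance matrix (smaller is better)
    return cov[0][0] + cov[1][1]

spread = covariance([0.0, 0.0, 1.0, 1.0])      # two well-separated sampling times
clustered = covariance([0.0, 0.0, 0.1, 0.1])   # nearly identical sampling times
```

Both criteria prefer the spread design, reflecting the fact that clustered sampling times carry little information about the slope.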
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
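The Metropolis-Hastings idea can be sketched on a toy model where the conditional distribution is known in closed form, so the sampler can be checked (an illustration, not the $\monolix$ implementation): $\psi \sim {\cal N}(\mu, \omega^2)$ and $y | \psi \sim {\cal N}(\psi, a^2)$.

```python
import math
import random

def mh_conditional(y, mu, omega, a, n_iter=20000, step=1.0, seed=42):
    # Random-walk Metropolis-Hastings targeting p(psi | y) for the toy model
    #   psi ~ N(mu, omega^2),   y | psi ~ N(psi, a^2)
    rng = random.Random(seed)
    def log_target(psi):
        return -(psi - mu) ** 2 / (2 * omega ** 2) - (y - psi) ** 2 / (2 * a ** 2)
    psi, draws = mu, []
    for _ in range(n_iter):
        prop = psi + rng.gauss(0.0, step)          # symmetric proposal
        if math.log(rng.random()) < log_target(prop) - log_target(psi):
            psi = prop                             # accept
        draws.append(psi)
    return draws[n_iter // 5:]                     # discard burn-in

draws = mh_conditional(y=2.0, mu=0.0, omega=1.0, a=1.0)
cond_mean = sum(draws) / len(draws)
# Closed form for this toy model: E[psi | y] = (y/a^2 + mu/omega^2) / (1/a^2 + 1/omega^2) = 1.0
```

The conditional mode could similarly be approximated by keeping the draw that maximizes `log_target`, or by direct numerical maximization of the conditional pdf.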
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Model evaluation]] Section).<br />
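For reference, here is a hedged sketch of how the estimated log-likelihood feeds these criteria (standard definitions; the BIC penalty shown uses the number of subjects, one common convention for mixed effects models, and the likelihood ratio test shown assumes nested models differing by one parameter):

```python
import math

def aic(loglik, n_params):
    # Akaike information criterion: -2 log-likelihood + 2 * (number of parameters)
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_subjects):
    # Bayesian information criterion, with a penalty based on the number of subjects
    return -2.0 * loglik + n_params * math.log(n_subjects)

def lrt_pvalue_1df(ll_full, ll_reduced):
    # Likelihood ratio test for nested models differing by one parameter:
    # LR = 2*(ll_full - ll_reduced) is compared to a chi-square with 1 d.f.
    lr = 2.0 * (ll_full - ll_reduced)
    return math.erfc(math.sqrt(lr) / math.sqrt(2.0))
```

When comparing candidate models fitted to the same data, lower AIC or BIC values are preferred.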
<br />
<br />
<br><br />
<br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year={2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D. and the R Core team},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
The estimation tasks are classical ones in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and possibly some a priori information.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
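This integral rarely has a closed form, but it can be approximated by Monte Carlo by sampling $\bpsi$ from its population distribution. Here is a minimal sketch for a toy model with a single scalar parameter, chosen so that the integral is known exactly and the estimate can be checked: $\psi \sim {\cal N}(\mu,\omega^2)$ and $y|\psi \sim {\cal N}(\psi,a^2)$, so that marginally $y \sim {\cal N}(\mu,\omega^2+a^2)$.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mc_likelihood(y, mu, omega2, a2, n_mc=100000, seed=0):
    # Monte Carlo estimate of L(theta; y) = integral of p(y|psi) p(psi) d psi,
    # sampling psi from the population distribution N(mu, omega2)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_mc):
        psi = rng.gauss(mu, math.sqrt(omega2))
        total += normal_pdf(y, psi, a2)
    return total / n_mc

estimate = mc_likelihood(y=1.0, mu=0.0, omega2=1.0, a2=0.5)
exact = normal_pdf(1.0, 0.0, 1.5)   # closed form here: y ~ N(mu, omega2 + a2)
```

For realistic nonlinear mixed effects models no such closed form exists, which is why dedicated estimation algorithms are needed.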
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a unique data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$, including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.), have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.) the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, but $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ since $h$ is nonlinear. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $t\in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s.. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
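This property is easy to check by simulation in the log-normal case, i.e., $h=\log$ (a quick numerical sketch): if $\psi_i = e^{\mu + \omega \eta_i}$ then the median of $\psi_i$ is $e^{\mu}$, while the mean is $e^{\mu+\omega^2/2}$ and the mode is $e^{\mu-\omega^2}$.

```python
import math
import random
import statistics

mu, omega = math.log(1.5), 0.6     # psi = exp(mu + omega*eta): log-normal
rng = random.Random(123)
psi = [math.exp(mu + omega * rng.gauss(0.0, 1.0)) for _ in range(200000)]

median_est = statistics.median(psi)          # theory: exp(mu) = 1.5
mean_est = statistics.fmean(psi)             # theory: exp(mu + omega**2/2) > 1.5
mode_theory = math.exp(mu - omega ** 2)      # theory: exp(mu - omega**2) < 1.5
```

The sample median matches $e^{\mu} = 1.5$, while the mean sits above it and the mode below it, as the argument above predicts.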
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
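For quick exploration outside $\monolix$, the same structural model can be coded directly. This is a hedged sketch (the parameter values used in the call below are illustrative only, and $ka \neq ke$ is assumed):

```python
import math

def concentration(t, dose, ka, V, Cl):
    # One-compartment model with first-order absorption, single dose at t = 0
    # (assumes ka != ke, as in the structural model above)
    ke = Cl / V
    return dose * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, dose, ka, V, Cl, h0, gamma):
    # Hazard for the event of interest, driven by the predicted concentration
    return h0 * math.exp(gamma * concentration(t, dose, ka, V, Cl))

cc_1h = concentration(1.0, 100.0, 0.974, 7.07, 2.0)   # illustrative values
```

Note that $Cc(0)=0$, so the hazard at $t=0$ reduces to the baseline $h_0$.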
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and variance of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to graphically represent these distributions in order to better understand them. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents, and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation for $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ if this is fairly small. In effect, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br />
}}<br />
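The quality of the approximation can be quantified in the log-normal case, where the exact coefficient of variation is ${\rm cv}(\psi)=\sqrt{e^{\omega^2}-1}$ (a quick numerical check):

```python
import math

def cv_lognormal(omega):
    # Exact coefficient of variation of psi = psi_pop * exp(eta), eta ~ N(0, omega^2):
    # cv(psi) = std/mean = sqrt(exp(omega^2) - 1)
    return math.sqrt(math.exp(omega ** 2) - 1.0)

for omega in (0.1, 0.3, 0.6):
    print(omega, cv_lognormal(omega))
```

For $\omega=0.1$ the exact cv is about 0.100, for $\omega=0.3$ about 0.307, and for $\omega=0.6$ about 0.658: the approximation ${\rm cv}(\psi) \approx \omega$ is only reasonable for small $\omega$.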
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics of interest (posterior mean, standard deviation, percentiles, etc.), or compute the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is given by<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP is. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
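This trade-off can be written in closed form when the log-likelihood itself is quadratic, e.g., $n$ observations $y_i \sim {\cal N}(\theta,\sigma^2)$ with the normal prior above (a minimal sketch): the MAP estimate is a precision-weighted compromise between the MLE $\bar{y}$ and the prior mean $\theta_0$.

```python
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    # y_i ~ N(theta, sigma2), i = 1..n, with prior theta ~ N(theta0, gamma2):
    # the penalized log-likelihood is quadratic, so the MAP has a closed form
    w_data = n / sigma2        # precision carried by the data
    w_prior = 1.0 / gamma2     # precision carried by the prior
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)

# The smaller gamma2 is, the closer the MAP estimate is to theta0
estimates = [map_estimate(ybar=2.0, n=20, sigma2=1.0, theta0=1.0, gamma2=g2)
             for g2 in (1e-6, 0.01, 1.0, 1e6)]
```

As $\gamma^2 \to 0$ the estimate collapses to $\theta_0$; as $\gamma^2 \to \infty$ it approaches the MLE $\bar{y}$.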
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
* a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br />
<br><br />
<br />
* estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br />
<br><br />
<br />
* estimated by introducing a prior and calculating the MAP estimate.<br />
<br><br />
<br />
* estimated by introducing a prior and then estimating the posterior distribution.<br />
</ul><br />
<br />
<br />
We put aside dealing with the fixed components of $\theta$ in the following. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:32%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square-root of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. In effect, if the model is poorly identifiable, certain estimators will be quite correlated and the F.I.M. will therefore be poorly conditioned and difficult to inverse. Suppose for example that we want to fit a two compartment PK model to the same data as before. The output is shown below. The large values for the relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow us to estimate well these two parameters.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (such as the determinant or the trace).<br />
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year={2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D. and the R Core team},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Estimation&diff=7284Estimation2013-06-07T13:15:50Z<p>Brocco: </p>
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and possibly a priori information.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$, including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.), have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$. But since $h$ is nonlinear, $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ in general. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $t\in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s.. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
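As a quick numerical check of this remark, the following sketch (Python, with an illustrative value of $\omega$, not $\monolix$ output) compares the median, mean and mode of a log-normal distribution: the median equals the reference value $\psi_{\rm pop}$, while the mean and mode do not.<br />

```python
import math

# Illustrative log-normal parameter: log(psi) ~ N(log(psi_pop), omega^2).
# psi_pop and omega are assumed values, chosen to echo omega_ka above.
psi_pop, omega = 1.0, 0.668

mu = math.log(psi_pop)
median = math.exp(mu)               # = psi_pop: the MLE is the median
mean = math.exp(mu + omega**2 / 2)  # log-normal mean, larger than the median
mode = math.exp(mu - omega**2)      # log-normal mode, smaller than the median

print(median, mean, mode)           # three different central-tendency measures
```

The larger $\omega$ is, the further apart these three quantities are; for a normal distribution they would coincide.<br />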
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values of the individual parameters and the variances of the random effects.<br />
<br />
In addition to these numbers, it is important to represent these distributions graphically in order to better understand them. Indeed, the interpretation of certain parameters is not always straightforward. Of course, we know what a normal distribution represents and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid for log-normal distributions with small $\omega$; it does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br />
}}<br />
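The quality of this approximation can be checked numerically. The sketch below (Python, illustrative values) compares the exact coefficient of variation of a log-normal variable, $\sqrt{e^{\omega^2}-1}$, with the approximation $\omega$:<br />

```python
import math

# Exact cv of psi = psi_pop * exp(eta), eta ~ N(0, omega^2):
#   E[psi]  = psi_pop * exp(omega^2 / 2)
#   sd[psi] = E[psi] * sqrt(exp(omega^2) - 1)
# so cv(psi) = sqrt(exp(omega^2) - 1), whatever the value of psi_pop.
for omega in (0.1, 0.3, 0.668):
    cv_exact = math.sqrt(math.exp(omega**2) - 1)
    print(f"omega = {omega:.3f}   exact cv = {cv_exact:.3f}   approximation = {omega:.3f}")
```

For $\omega=0.1$ the two values agree to three decimal places, while for $\omega=0.668$ the exact cv is already about 0.75, illustrating that the approximation only holds for small $\omega$.<br />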
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any of its statistics (posterior mean, standard deviation, percentiles, etc.), or compute the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate satisfies<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP estimate is. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
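This trade-off can be made concrete in a toy setting where everything is Gaussian, so that the MAP estimate has a closed form. In the sketch below (Python; all numerical values are assumptions chosen for illustration), the MAP estimate moves from $\theta_0$ towards the MLE as $\gamma$ grows:<br />

```python
# Toy model: n i.i.d. observations y_j ~ N(theta, sigma2), prior theta ~ N(theta0, gamma2).
# The penalized log-likelihood is quadratic in theta, so the MAP estimate is the
# precision-weighted average of the MLE (here the sample mean ybar) and theta0.
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    w_data, w_prior = n / sigma2, 1.0 / gamma2
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)

ybar, n, sigma2, theta0 = 1.05, 50, 0.25, 1.5   # illustrative values
for gamma in (0.01, 0.05, 0.1, 1.0):
    print(f"gamma = {gamma:5.2f}   MAP = {map_estimate(ybar, n, sigma2, theta0, gamma**2):.3f}")
```

As $\gamma \to 0$ the estimate collapses onto $\theta_0$; as $\gamma \to +\infty$ it coincides with the MLE, mirroring the table in the PK example below.<br />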
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
<li> a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.</li><br />
<br><br />
<br />
<li> estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.</li><br />
<br><br />
<br />
<li> estimated by introducing a prior and calculating the MAP estimate.</li><br />
<br><br />
<br />
<li> estimated by introducing a prior and then estimating the posterior distribution.</li><br />
</ul><br />
<br />
<br />
In what follows, we set aside the fixed components of $\theta$. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:30%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. The standard error (s.e.) of each component of $\thmle$ is its estimated standard deviation, i.e., the square root of the corresponding diagonal element of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
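The computation behind the s.e. and r.s.e. columns can be sketched in a few lines (Python with NumPy; the 2&times;2 information matrix and estimates are made-up values for illustration, not the actual F.I.M. of this model):<br />

```python
import numpy as np

theta_hat = np.array([0.974, 7.07])     # parameter estimates, e.g. (ka, V)
fim = np.array([[150.0, 12.0],          # observed Fisher information matrix
                [12.0,   9.0]])         # (illustrative, symmetric positive definite)

cov = np.linalg.inv(fim)                # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))              # standard errors
rse = 100 * se / theta_hat              # relative standard errors, in %
print("s.e.  :", se)
print("r.s.e.:", rse)
```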
<br />
The F.I.M. can be used to detect overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be highly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data do not allow these two parameters to be estimated well.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
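The link between correlated estimators and a poorly conditioned F.I.M. can be made concrete with a small numerical sketch (Python with NumPy; the matrices are illustrative, not taken from this model):<br />

```python
import numpy as np

# When two estimators are nearly collinear, the F.I.M. is almost singular and
# its condition number explodes, which makes its inversion numerically unreliable.
well = np.array([[10.0, 1.0],
                 [1.0,  8.0]])
poor = np.array([[10.0,  9.999],
                 [9.999, 10.0]])

for name, fim in (("well conditioned  ", well), ("poorly conditioned", poor)):
    print(name, "condition number =", np.linalg.cond(fim))
```

A very large condition number is the numerical counterpart of the huge relative standard errors seen above for $Q$ and $V_2$.<br />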
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (such as the determinant or the trace).<br />
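For example, two candidate designs can be compared through the D-criterion (the determinant of the F.I.M., to be maximized) or the A-criterion (the trace of its inverse, i.e., the total variance of the estimators, to be minimized). A minimal sketch with illustrative information matrices (Python with NumPy):<br />

```python
import numpy as np

# Two hypothetical designs summarized by their information matrices.
design_a = np.array([[20.0, 2.0],
                     [2.0, 15.0]])
design_b = np.array([[12.0, 8.0],
                     [8.0, 12.0]])

for name, fim in (("design A", design_a), ("design B", design_b)):
    d_crit = np.linalg.det(fim)            # D-optimality: larger is better
    a_crit = np.trace(np.linalg.inv(fim))  # A-optimality: smaller is better
    print(f"{name}: det = {d_crit:.1f}, trace of inverse = {a_crit:.4f}")
```

Here design A dominates on both criteria, so it would be preferred regardless of which summary statistic is chosen.<br />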
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
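The mechanics of the Metropolis-Hastings step can be illustrated with a deliberately simple stand-in model: $y_{ij} \sim {\cal N}(\psi_i, a^2)$ with $\psi_i \sim {\cal N}(\psi_{\rm pop}, \omega^2)$. All numerical values below are assumptions, but the accept/reject logic is that of the algorithm (Python):<br />

```python
import math
import random

random.seed(0)
y_i = [1.2, 0.9, 1.1]               # observations for individual i (illustrative)
psi_pop, omega, a = 1.0, 0.5, 0.3   # assumed estimated population parameters

def log_target(psi):
    # log p(psi | y_i) up to an additive constant: log-prior + log-likelihood
    lp = -0.5 * ((psi - psi_pop) / omega) ** 2
    lp += sum(-0.5 * ((y - psi) / a) ** 2 for y in y_i)
    return lp

psi, chain = psi_pop, []
for _ in range(20000):
    cand = psi + random.gauss(0.0, 0.2)   # symmetric random-walk proposal
    if math.log(random.random()) < log_target(cand) - log_target(psi):
        psi = cand                        # accept the candidate
    chain.append(psi)

burned = chain[5000:]                     # discard the burn-in
cond_mean = sum(burned) / len(burned)
print("estimated conditional mean:", cond_mean)
```

Any other summary statistic of $\pmacro(\psi_i | y_i ; \hat{\theta})$ (standard deviation, quantiles, etc.) can be estimated from the same chain.<br />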
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
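The idea behind these estimation methods can be illustrated with a naive Monte Carlo sketch for a single individual (Python; the toy model and values are assumptions). Drawing $\psi^k$ from the population distribution gives $\like(\theta;y_i) \approx \frac{1}{K}\sum_k \pmacro(y_i | \psi^k)$; the methods used in practice improve on this with importance sampling:<br />

```python
import math
import random

random.seed(1)
y_i = [1.2, 0.9, 1.1]               # observations for one individual (illustrative)
psi_pop, omega, a = 1.0, 0.5, 0.3   # assumed population parameters

def lik_cond(psi):
    # p(y_i | psi) for the toy model y_ij ~ N(psi, a^2)
    out = 1.0
    for y in y_i:
        out *= math.exp(-0.5 * ((y - psi) / a) ** 2) / (a * math.sqrt(2 * math.pi))
    return out

K = 100_000                         # number of simulated individual parameters
est = sum(lik_cond(random.gauss(psi_pop, omega)) for _ in range(K)) / K
print("estimated log-likelihood:", math.log(est))
```

The full observed log-likelihood is then the sum of such terms over the $N$ individuals, since the $y_i$ are independent given the model.<br />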
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year={2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D.},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Estimation&diff=7283Estimation2013-06-07T13:15:23Z<p>Brocco: </p>
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and any a priori information at hand.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
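This marginal likelihood generally has no closed form for nonlinear models, but its meaning can be illustrated on a toy linear-Gaussian model, where the integral over $\bpsi$ can be approximated by plain Monte Carlo and checked against a known closed form. A minimal Python sketch (the model and all numerical values are illustrative assumptions, not taken from the text):

```python
import math
import random

random.seed(0)

# Toy model: psi ~ N(theta, omega^2), y | psi ~ N(psi, sigma^2)
theta, omega, sigma = 2.0, 0.5, 0.3   # illustrative values
y = 2.4                                # one observation for one individual

def norm_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# Monte Carlo: L(theta; y) = E_psi[ p(y | psi) ] with psi drawn from N(theta, omega^2)
M = 200_000
mc = sum(norm_pdf(y, random.gauss(theta, omega), sigma) for _ in range(M)) / M

# For this linear-Gaussian toy model the integral is known in closed form:
# y ~ N(theta, omega^2 + sigma^2), which lets us check the estimate.
exact = norm_pdf(y, theta, math.sqrt(omega**2 + sigma**2))
```

For a nonlinear mixed effects model no such closed form exists, which is why dedicated algorithms are needed for both maximization and likelihood estimation.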
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$, as we are entirely satisfied with both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$, including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.), have been published in statistical journals. Furthermore, the convergence of SAEM has been rigorously proven.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By invariance of the MLE, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, i.e., $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$. Since $h$ is nonlinear, $\esp{h(\psi_i)} \neq h(\esp{\psi_i})$, and therefore $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By the change-of-variables formula, for any $t\in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
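For the log-normal case ($h=\log$), the mean, mode and median of the estimated distribution are all available in closed form, which makes the remark easy to check numerically. A small sketch (the values of $\psi_{\rm pop}$ and $\omega$ are illustrative):

```python
import math

psi_pop, omega = 1.5, 0.4          # illustrative reference value and sd of eta

# psi = psi_pop * exp(eta), eta ~ N(0, omega^2), so log(psi) ~ N(log(psi_pop), omega^2)
mu = math.log(psi_pop)
median = math.exp(mu)              # equals psi_pop: the MLE is the median
mean   = math.exp(mu + omega**2 / 2)
mode   = math.exp(mu - omega**2)
```

The median equals $\psi_{\rm pop}$ exactly, while the mean sits above it and the mode below it, as the remark states.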
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
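As a cross-check of the structural model, $Cc(t)$ and $h(t)$ can also be evaluated directly outside $\mlxtran$; a minimal Python sketch (the dose and parameter values below are purely illustrative):

```python
import math

def concentration(t, D, ka, V, Cl):
    """One-compartment model with first-order absorption, single dose D at t=0."""
    ke = Cl / V
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    """Hazard of the event grows exponentially with the concentration."""
    return h0 * math.exp(gamma * concentration(t, D, ka, V, Cl))

# Purely illustrative dose and parameter values (the formula requires ka != ke)
D, ka, V, Cl, h0, gamma = 100.0, 1.0, 8.0, 2.0, 0.01, 0.5
cc = concentration(2.0, D, ka, V, Cl)
h = hazard(2.0, D, ka, V, Cl, h0, gamma)
```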
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and variance of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to represent these distributions graphically in order to understand them better. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents, in particular its mean, median and mode, which coincide (see the distribution of $Cl$ below for instance). These measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
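For a log-normally distributed parameter, such quantiles follow directly from those of the underlying normal distribution. A quick sketch using the estimates of $ka$ reported above:

```python
import math

ka_pop, omega_ka = 0.974, 0.668      # estimates reported above
z95 = 1.6449                          # 95% quantile of the standard normal

# ka = ka_pop * exp(eta), eta ~ N(0, omega_ka^2), so the alpha-quantile of ka
# is ka_pop * exp(z_alpha * omega_ka)
q05 = ka_pop * math.exp(-z95 * omega_ka)
q95 = ka_pop * math.exp( z95 * omega_ka)
```

The interval is strongly asymmetric around $ka_{\rm pop}$, which is exactly why $\omega_{ka}$ alone is hard to interpret; the quantiles are symmetric only on the log scale.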
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br />
}}<br />
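The quality of this approximation is easy to quantify, since for a log-normal distribution the coefficient of variation has the exact closed form $\sqrt{e^{\omega^2}-1}$. A quick numerical comparison (the values of $\omega$ are illustrative):

```python
import math

def cv_lognormal(omega):
    # Exact coefficient of variation of psi = psi_pop * exp(eta), eta ~ N(0, omega^2)
    return math.sqrt(math.exp(omega**2) - 1.0)

# For small omega the exact cv is very close to omega itself...
small = cv_lognormal(0.1)
# ...but the approximation degrades as omega grows.
large = cv_lognormal(0.7)
```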
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is given by<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior depends directly on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer the MAP is to $\theta_0$. In the limiting case $\gamma^2=0$, the prior fixes $\theta$ at $\theta_0$ and it no longer needs to be estimated.<br />
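This trade-off can be made explicit on a toy model where the likelihood itself is Gaussian: the MAP estimate is then a precision-weighted average of the MLE and the prior mean $\theta_0$. A sketch under these simplifying assumptions (all values illustrative):

```python
# Toy model: theta is the mean of n i.i.d. N(theta, sigma^2) observations;
# prior: theta ~ N(theta0, gamma^2). Maximizing the penalized log-likelihood
# gives a precision-weighted average of the MLE (the sample mean) and theta0.

def map_estimate(ybar, n, sigma2, theta0, gamma2):
    w_data = n / sigma2             # precision carried by the data
    w_prior = 1.0 / gamma2          # precision carried by the prior
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)

ybar, n, sigma2, theta0 = 1.0, 10, 1.0, 0.0   # illustrative values
wide   = map_estimate(ybar, n, sigma2, theta0, gamma2=100.0)  # vague prior: ~ MLE
narrow = map_estimate(ybar, n, sigma2, theta0, gamma2=1e-4)   # tight prior: ~ theta0
```

As $\gamma^2 \to 0$ the estimate collapses onto $\theta_0$, and as $\gamma^2 \to \infty$ it approaches the MLE, mirroring the limiting behavior described above.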
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data, for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
<li> a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.</li><br />
<br><br />
<br />
<li> estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.</li><br />
<br><br />
<br />
<li> estimated by introducing a prior and calculating the MAP estimate.</li><br />
<br><br />
<br />
<li> estimated by introducing a prior and then estimating the posterior distribution.</li><br />
</ul><br />
<br />
<br />
In what follows, we set aside the fixed components of $\theta$. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:28%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square roots of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
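The mechanics of this computation can be checked on a model where everything is known in closed form: for $n$ i.i.d. ${\cal N}(\theta,\sigma^2)$ observations with $\sigma$ known, the observed Fisher information is $n/\sigma^2$ and the s.e. of the estimated mean is $\sigma/\sqrt{n}$. A Python sketch approximating (1) by a finite difference (all values illustrative):

```python
import math
import random

random.seed(1)
sigma, n = 2.0, 100
y = [random.gauss(5.0, sigma) for _ in range(n)]   # simulated observations

def loglike(theta):
    return sum(-0.5 * ((yi - theta) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for yi in y)

theta_hat = sum(y) / n   # MLE of the mean

# Observed F.I.M.: minus the second derivative of the log-likelihood,
# approximated here by a central finite difference at the MLE.
step = 1e-4
fim = -(loglike(theta_hat + step) - 2 * loglike(theta_hat)
        + loglike(theta_hat - step)) / step**2

se = 1.0 / math.sqrt(fim)           # standard error of theta_hat
rse = 100 * se / abs(theta_hat)     # relative standard error in %
```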
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow these two parameters to be estimated well.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (such as the determinant or trace).<br />
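The determinant and trace mentioned here correspond to the classical D- and A-optimality criteria; a small sketch comparing two hypothetical $2\times 2$ covariance matrices (the matrices are invented for illustration):

```python
# D-criterion: determinant of the covariance matrix of the estimators
# (proportional to the squared volume of the confidence ellipsoid);
# A-criterion: its trace (sum of the estimators' variances).
# Smaller is better for both. The two matrices below are invented for illustration.

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace2(m):
    return m[0][0] + m[1][1]

cov_good = [[0.04, 0.01], [0.01, 0.09]]   # a design giving precise estimates
cov_poor = [[4.0, 1.0], [1.0, 9.0]]       # a design giving imprecise estimates
```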
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
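The sampler can be sketched on a toy linear-Gaussian model with a single observation per individual, for which the conditional distribution is known in closed form and can serve as a check. A minimal random-walk Metropolis-Hastings sketch (all values are illustrative assumptions):

```python
import math
import random

random.seed(2)

# Toy model (illustrative): psi_i ~ N(theta, omega^2), y_i | psi_i ~ N(psi_i, sigma^2)
theta, omega, sigma = 2.0, 0.5, 0.3
y_i = 2.4

def log_target(psi):
    # log p(psi | y_i) up to an additive constant: log p(y_i | psi) + log p(psi)
    return -0.5 * ((y_i - psi) / sigma) ** 2 - 0.5 * ((psi - theta) / omega) ** 2

psi, chain = theta, []
for _ in range(50_000):
    prop = psi + random.gauss(0.0, 0.5)        # random-walk proposal
    if math.log(random.random()) < log_target(prop) - log_target(psi):
        psi = prop                             # accept; otherwise keep current value
    chain.append(psi)

cond_mean = sum(chain[5_000:]) / len(chain[5_000:])   # discard burn-in

# Closed form for this toy model: psi | y_i ~ N(m, v)
v = 1.0 / (1.0 / omega**2 + 1.0 / sigma**2)
m = v * (theta / omega**2 + y_i / sigma**2)
```

For a nonlinear mixed effects model no such closed form exists, and the chain itself is the only handle on the conditional distribution.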
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
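Given an estimate of the observed log-likelihood, AIC and BIC are simple transformations of it; a sketch (the log-likelihood value, the number of parameters and the sample size below are illustrative, not results from this example):

```python
import math

loglike_hat = -632.4   # estimated observed log-likelihood (illustrative value)
k = 11                 # number of estimated population parameters (illustrative)
n = 100                # sample size used in the BIC penalty (illustrative)

aic = -2 * loglike_hat + 2 * k
bic = -2 * loglike_hat + k * math.log(n)
```

Note that for mixed effects models the appropriate sample size to use in the BIC penalty (number of subjects versus total number of observations) is itself a subtle question.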
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year={2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D.},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Estimation&diff=7282Estimation2013-06-07T13:15:03Z<p>Brocco: </p>
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and any a priori information at hand.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables are typically stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$, including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.), have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, i.e., $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$. Since $h$ is nonlinear, $\esp{h(\psi_i)} \neq h(\esp{\psi_i})$ in general, and therefore $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By the change-of-variables formula, for any $t \in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: Since $h(\psi_i)$ is normally distributed with mean $h(\hat{\psi}_{\rm pop})$, the mode of $f_h$ is $h(\hat{\psi}_{\rm pop})$, and thus $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
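The remark above can be checked numerically in the log-normal case $h = \log$: the mean, median and mode of a log-normal distribution have closed forms, and only the median equals $\psi_{\rm pop}$. The values used below ($\psi_{\rm pop}=2$, $\omega=0.5$) are arbitrary illustrative choices:<br />

```python
import math
import random

# Numerical check of the remark for h = log, i.e., a log-normal psi:
# if log(psi) ~ N(mu, omega^2) with mu = log(psi_pop), then psi_pop is
# the median of psi, while its mean and mode are different.
# psi_pop = 2 and omega = 0.5 are arbitrary illustrative values.
mu, omega = math.log(2.0), 0.5
median = math.exp(mu)                  # = psi_pop
mean = math.exp(mu + omega ** 2 / 2)   # > psi_pop
mode = math.exp(mu - omega ** 2)       # < psi_pop

# Empirical confirmation of the median from simulated values of psi:
rng = random.Random(1)
sample = sorted(math.exp(rng.gauss(mu, omega)) for _ in range(100001))
empirical_median = sample[50000]
```
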
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a unique dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
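As a quick sanity check, the same structural model can also be evaluated directly, here in Python, using the population estimates reported below on this page ($ka = 0.974$, $V = 7.07$, $Cl = 2.00$, $h_0 = 0.0102$, $\gamma = 0.485$) and an arbitrary dose $D = 100$ given at time 0:<br />

```python
import math

# Direct evaluation of the structural model above, using the population
# estimates reported below on this page (ka = 0.974, V = 7.07, Cl = 2.00,
# h0 = 0.0102, gamma = 0.485) and an arbitrary dose D = 100 at time 0.

def concentration(t, D, ka, V, Cl):
    ke = Cl / V
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    return h0 * math.exp(gamma * concentration(t, D, ka, V, Cl))

D, ka, V, Cl, h0, gamma = 100.0, 0.974, 7.07, 2.00, 0.0102, 0.485
cc2 = concentration(2.0, D, ka, V, Cl)   # concentration at t = 2
```
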
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and variance of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to graphically represent these distributions in order to understand them better. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents, and in particular that its mean, median and mode are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
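For transformed-normal distributions these quantiles have the closed form $q_p = h^{-1}(h(\psi_{\rm pop}) + \omega \, z_p)$, with $z_p$ the standard normal quantile of order $p$. The sketch below applies this to the estimates above ($ka_{\rm pop}=0.974$, $\omega_{ka}=0.668$; $\gamma_{\rm pop}=0.485$, $\omega_{\gamma}=0.0901$):<br />

```python
import math

# Closed-form 5% / 95% quantiles of transformed-normal distributions:
# q_p = h^{-1}( h(psi_pop) + omega * z_p ), z_p the standard normal quantile.
Z95 = 1.6449  # standard normal quantile of order 0.95

def lognormal_quantiles(psi_pop, omega):
    return psi_pop * math.exp(-omega * Z95), psi_pop * math.exp(omega * Z95)

def logitnormal_quantiles(psi_pop, omega):
    logit = math.log(psi_pop / (1.0 - psi_pop))
    inv_logit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return inv_logit(logit - omega * Z95), inv_logit(logit + omega * Z95)

ka_lo, ka_hi = lognormal_quantiles(0.974, 0.668)   # SAEM estimates above
g_lo, g_hi = logitnormal_quantiles(0.485, 0.0901)
```

The resulting 90% range for $ka$ is wide (roughly a factor of 9 between the two quantiles), while the range for $\gamma$ stays narrow, which is exactly the kind of information the raw $\omega$ values do not convey directly.<br />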
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br />
}}<br />
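The quality of this approximation is easy to quantify, since for a log-normal distribution the coefficient of variation is exactly $\sqrt{e^{\omega^2}-1}$, whatever the value of $\psi_{\rm pop}$:<br />

```python
import math

# Exact coefficient of variation of a log-normal psi:
# cv = sqrt(exp(omega^2) - 1), independent of psi_pop.
# For small omega this is close to omega itself, which is the
# approximation discussed above.

def cv_lognormal(omega):
    return math.sqrt(math.exp(omega ** 2) - 1.0)

cv_small = cv_lognormal(0.1)  # approximation cv ~ omega works well here
cv_large = cv_lognormal(1.0)  # approximation breaks down
```
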
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics of interest (posterior mean, standard deviation, percentiles, etc.), or compute the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is given by<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior depends directly on its variance: the smaller $\gamma^2$ is, the closer the MAP estimate is to $\theta_0$. In the limiting case $\gamma^2=0$, the prior reduces to a point mass: $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
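This trade-off can be made concrete with a toy scalar example, unrelated to the PK model of this page: for $n$ observations $y_i \sim {\cal N}(\theta,1)$ and a ${\cal N}(\theta_0,\gamma^2)$ prior, the penalized log-likelihood is maximized in closed form by a precision-weighted average of the empirical mean and $\theta_0$. All numbers below are arbitrary:<br />

```python
# Hypothetical scalar example (NOT the PK model of this page):
# n observations y_i ~ N(theta, 1) and a prior theta ~ N(theta0, gamma^2).
# The penalized log-likelihood
#   -(n/2) (ybar - theta)^2 - (theta - theta0)^2 / (2 gamma^2) + const
# is maximized by a precision-weighted average of ybar and theta0.

def map_estimate(ybar, n, theta0, gamma2):
    return (n * ybar + theta0 / gamma2) / (n + 1.0 / gamma2)

ybar, n, theta0 = 1.05, 20, 1.5                      # arbitrary numbers
weak_prior = map_estimate(ybar, n, theta0, 100.0)    # ~ MLE = ybar
strong_prior = map_estimate(ybar, n, theta0, 1e-4)   # ~ theta0
```
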
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
<li> a fixed constant, if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.</li><br />
<br />
<li> estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.</li><br />
<br />
<li> estimated by introducing a prior and calculating the MAP estimate.</li><br />
<br />
<li> estimated by introducing a prior and then estimating the posterior distribution.</li><br />
</ul><br />
<br />
<br />
We put aside dealing with the fixed components of $\theta$ in the following. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:25%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square roots of the diagonal elements of this covariance matrix. $\monolix$ also displays the estimated relative standard errors (r.s.e.), i.e., the estimated standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
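The definition (1) can be turned into a numerical recipe: approximate the second derivative of the log-likelihood by a central finite difference and invert it. The sketch below does this for a toy model $y_i \sim {\cal N}(\theta,\sigma^2)$ with $\sigma$ known, for which the exact information $n/\sigma^2$ is available as a check; it is not the scheme used by $\monolix$, and the data are made up:<br />

```python
import math

# Observed Fisher information by a central finite difference of the
# log-likelihood, for a toy model y_i ~ N(theta, sigma^2) with sigma known.
# The exact answer n / sigma^2 provides a check; this is only a sketch,
# not the scheme used by Monolix, and the data are made up.

def loglik(theta, y, sigma):
    n = len(y)
    return (-n / 2.0 * math.log(2.0 * math.pi * sigma ** 2)
            - sum((yi - theta) ** 2 for yi in y) / (2.0 * sigma ** 2))

def observed_fim(theta, y, sigma, h=1e-4):
    # central second difference: -(f(t+h) - 2 f(t) + f(t-h)) / h^2
    return -(loglik(theta + h, y, sigma) - 2.0 * loglik(theta, y, sigma)
             + loglik(theta - h, y, sigma)) / h ** 2

y = [0.9, 1.1, 1.3, 0.8, 1.0, 1.2]
sigma = 0.3
fim = observed_fim(1.05, y, sigma)  # exact value here: 6 / 0.3^2
se = 1.0 / math.sqrt(fim)           # standard error of the estimate
```
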
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow these two parameters to be estimated well.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
Fisher information is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (the determinant or trace, for instance).<br />
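As an illustration of such summary statistics (with made-up information matrices, not output from any software), a nearly singular F.I.M. has a small determinant and a large trace of its inverse, i.e., poor D- and A-criteria:<br />

```python
# Made-up 2x2 information matrices illustrating design criteria:
# D-optimality looks at the determinant of the F.I.M., A-optimality at the
# trace of its inverse (the sum of the estimators' variances).

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_inv2(m):
    # trace of the inverse of a 2x2 matrix: (a + d) / det
    return (m[0][0] + m[1][1]) / det2(m)

well_conditioned = [[10.0, 1.0], [1.0, 8.0]]
nearly_singular = [[10.0, 8.9], [8.9, 8.0]]  # strongly correlated estimators
```
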
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
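The mechanics of such a Metropolis-Hastings sampler can be sketched on a toy conjugate model $y_i \sim {\cal N}(\psi_i, \sigma^2)$, $\psi_i \sim {\cal N}(\mu, \omega^2)$, for which the exact conditional mean is known and can serve as a check. This shows only the generic random-walk accept/reject step, not the $\monolix$ implementation, and all numerical values are arbitrary:<br />

```python
import math
import random

# Random-walk Metropolis-Hastings targeting p(psi_i | y_i), on a toy
# conjugate model y_i ~ N(psi_i, sigma^2), psi_i ~ N(mu, omega^2), so the
# exact conditional mean is available as a check. This shows only the
# generic accept/reject step, not the Monolix implementation.

def log_target(psi, y, mu, omega, sigma):
    # log p(psi | y) up to an additive constant
    return (-((y - psi) ** 2) / (2 * sigma ** 2)
            - ((psi - mu) ** 2) / (2 * omega ** 2))

def mh_sample(y, mu, omega, sigma, n_iter=20000, step=0.5, seed=2):
    rng = random.Random(seed)
    psi, chain = mu, []
    for _ in range(n_iter):
        cand = psi + rng.gauss(0.0, step)              # random-walk proposal
        delta = (log_target(cand, y, mu, omega, sigma)
                 - log_target(psi, y, mu, omega, sigma))
        if delta >= 0 or rng.random() < math.exp(delta):
            psi = cand                                  # accept the candidate
        chain.append(psi)
    return chain[n_iter // 2:]                          # discard burn-in

y, mu, omega, sigma = 2.0, 1.0, 0.5, 0.4
chain = mh_sample(y, mu, omega, sigma)
post_mean = sum(chain) / len(chain)
# Conjugacy gives the exact conditional mean for comparison:
exact_mean = (y / sigma ** 2 + mu / omega ** 2) / (1 / sigma ** 2 + 1 / omega ** 2)
```
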
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
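Once an estimate of ${\llike}(\hat{\theta};\by)$ is available, the information criteria follow directly from their definitions ${\rm AIC} = -2{\llike} + 2P$ and ${\rm BIC} = -2{\llike} + \log(N)\,P$, with $P$ the number of estimated parameters and, with the convention used here, $N$ the number of subjects. The numbers below are placeholders, not output from this page's example:<br />

```python
import math

# Information criteria from an estimated log-likelihood:
#   AIC = -2 LL + 2 P,   BIC = -2 LL + log(N) P,
# with P the number of estimated parameters and N the number of subjects.
# The numbers below are placeholders, not output from this page's example.

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_subjects):
    return -2.0 * loglik + math.log(n_subjects) * n_params

ll, P, N = -632.4, 11, 100
```

Because $\log(N) > 2$ as soon as $N > 7$, BIC penalizes model size more heavily than AIC in typical population studies.<br />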
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year = {2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={the R Core team (2009) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D.},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>Brocco
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and possibly a priori information that is available.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a unique data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based model, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the tranformation of a normal one (log-normal, logit-normal, etc.) the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously derivable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, but $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ since $h$ is nonlinear. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $h(t)\in \mathbb{R}$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s.. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
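These three properties are easy to check numerically. The sketch below (written in Python; the examples in this wiki use Matlab and $\mlxtran$) simulates the log-normal case, i.e., $h=\log$, with hypothetical values $\hat{\psi}_{\rm pop}=1.5$ and $\omega=0.5$: the empirical median matches $\hat{\psi}_{\rm pop}$, while the empirical mean is close to $\exp(\mu+\omega^2/2) \approx 1.70$.<br />

```python
import math
import random

# log-normal case: psi = exp(eta), eta ~ N(mu, omega^2), with mu = log(psi_pop)
random.seed(0)
mu, omega = math.log(1.5), 0.5
samples = sorted(math.exp(random.gauss(mu, omega)) for _ in range(100001))

psi_pop = math.exp(mu)                 # reference parameter, here 1.5
median = samples[len(samples) // 2]    # empirical median: close to psi_pop
mean = sum(samples) / len(samples)     # empirical mean: close to exp(mu + omega^2/2)
print(round(median, 2), round(mean, 2))
```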
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
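Before looking at the estimation results, the closed-form structural model above can be evaluated directly. The following Python sketch (the parameter values are purely illustrative) computes the concentration and the hazard at a given time:<br />

```python
import math

def structural_model(t, D, ka, V, Cl, h0, gamma):
    """One-compartment model with first-order absorption, plus a
    hazard that increases exponentially with the concentration."""
    ke = Cl / V
    Cc = D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))
    h = h0 * math.exp(gamma * Cc)
    return Cc, h

# illustrative values (of the same order as typical PK estimates)
Cc, h = structural_model(t=2.0, D=50.0, ka=1.0, V=7.0, Cl=2.0, h0=0.01, gamma=0.5)
print(round(Cc, 3), round(h, 4))
```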
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and the variances of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to represent these distributions graphically in order to better understand them. Indeed, the interpretation of certain parameters is not always straightforward. Of course, we know what a normal distribution represents, and in particular that its mean, median and mode are equal (see the distribution of $Cl$ below for instance). But these measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid for log-normal distributions when $\omega$ is small; it does not carry over to other distributions. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at a graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of the dispersion of the parameters $h0$ and $\gamma$.<br />
}}<br />
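The quality of this approximation is easy to quantify: for a log-normal variable $\psi = \psi_{\rm pop}\,e^{\eta}$ with $\eta \sim {\cal N}(0,\omega^2)$, the exact coefficient of variation is $\sqrt{\exp(\omega^2)-1}$. The Python sketch below (a standalone illustration, outside any $\monolix$ workflow) shows that this is close to $\omega$ only when $\omega$ is small; for $\omega=0.668$, the value of $\omega_{ka}$ estimated above, the exact cv is already about 0.75.<br />

```python
import math

# exact coefficient of variation of a log-normal variable
# psi = psi_pop * exp(eta), eta ~ N(0, omega^2):  cv(psi) = sqrt(exp(omega^2) - 1)
cv = {omega: math.sqrt(math.exp(omega**2) - 1.0) for omega in (0.1, 0.3, 0.668)}
for omega, v in cv.items():
    print(omega, round(v, 3))   # cv ~ omega only while omega is small
```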
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then,<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP estimate is. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
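This trade-off can be made concrete in the simplest conjugate setting. The Python sketch below assumes (purely for illustration) that $\by$ consists of $n$ independent normal observations with known variance $\sigma^2$ and sample mean $\bar{y}$; the penalized likelihood then has a closed-form maximizer, a weighted average of the MLE $\bar{y}$ and the prior mean $\theta_0$:<br />

```python
# MAP estimate for a normal mean with a normal prior N(theta0, gamma^2):
# with n observations of known variance sigma^2 and sample mean ybar, the
# penalized log-likelihood is maximized at
#   theta_MAP = (n*ybar/sigma^2 + theta0/gamma^2) / (n/sigma^2 + 1/gamma^2)
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    if gamma2 == 0.0:            # degenerate prior: theta is fixed at theta0
        return theta0
    w = n / sigma2
    return (w * ybar + theta0 / gamma2) / (w + 1.0 / gamma2)

theta0, ybar = 1.5, 1.05
for gamma2 in (0.0, 0.01, 1.0, 1e6):
    print(gamma2, round(map_estimate(ybar, n=20, sigma2=1.0, theta0=theta0, gamma2=gamma2), 3))
```

As $\gamma^2$ grows, the MAP estimate moves monotonically from the prior mean $\theta_0$ towards the MLE $\bar{y}$, which is the behaviour illustrated in the PK example below.<br />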
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
* a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br />
<br><br />
<br />
* estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br />
<br><br />
<br />
* estimated by introducing a prior and calculating the MAP estimate.<br />
<br><br />
<br />
* estimated by introducing a prior and then estimating the posterior distribution.<br />
</ul><br />
<br />
<br />
We put aside dealing with the fixed components of $\theta$ in the following. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:15%; margin-right:15%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. The standard error (s.e.) of each component of $\thmle$ is its standard deviation, i.e., the square root of the corresponding diagonal element of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
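The computation behind these columns can be sketched in a few lines of Python. The F.I.M. entries below are hypothetical, chosen only so that the resulting s.e. and r.s.e. for $(ka, V)$ are of the same order as in the output above:<br />

```python
# standard errors from an observed Fisher information matrix (illustrative):
#   cov(theta_hat) ~ FIM^{-1};  s.e. = sqrt(diag);  r.s.e. = 100 * s.e. / estimate
theta_hat = [0.974, 7.07]                     # e.g. (ka, V)
fim = [[150.0, 12.0],
       [12.0, 9.0]]                           # hypothetical observed F.I.M.
det = fim[0][0]*fim[1][1] - fim[0][1]*fim[1][0]
cov = [[ fim[1][1]/det, -fim[0][1]/det],
       [-fim[1][0]/det,  fim[0][0]/det]]      # explicit 2x2 inverse
se = [cov[0][0]**0.5, cov[1][1]**0.5]
rse = [100.0*s/t for s, t in zip(se, theta_hat)]
print([round(s, 3) for s in se], [round(r, 1) for r in rse])
```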
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow us to estimate these two parameters well.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design. Indeed, minimizing the variance of an estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (such as the determinant or trace).<br />
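As a small illustration of such summary statistics, the sketch below compares two hypothetical designs through the determinant of their F.I.M. (the D-optimality criterion: a larger determinant means a smaller confidence ellipsoid). The numbers are invented for illustration only:<br />

```python
# D-optimality compares designs through det(FIM):
# a larger determinant means a smaller volume of the confidence ellipsoid
def det2(m):
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

fim_design_a = [[150.0, 12.0], [12.0, 9.0]]      # hypothetical F.I.M. of design A
fim_design_b = [[150.0, 36.0], [36.0, 9.0]]      # stronger correlation: less informative
print(det2(fim_design_a), det2(fim_design_b))    # 1206.0 54.0
```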
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
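A minimal sketch of how the estimated log-likelihood feeds into these information criteria (the log-likelihood value, parameter count and number of subjects below are purely illustrative, not outputs of the project above):<br />

```python
import math

# information criteria from an estimated observed log-likelihood:
#   AIC = -2*LL + 2*P,   BIC = -2*LL + log(N)*P
# with P the number of population parameters and N the number of subjects
def aic(loglike, n_params):
    return -2.0*loglike + 2.0*n_params

def bic(loglike, n_params, n_subjects):
    return -2.0*loglike + math.log(n_subjects)*n_params

print(round(aic(-700.0, 11), 1), round(bic(-700.0, 11, 80), 1))
```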
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year = {2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={the R Core team (2009) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D.},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
== Introduction ==<br />
<br />
Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect relationships between variables, visually compare data from different groups, etc. Developing such visual exploration tools poses no methodological problems: it is simple to write Matlab or R code for one's own needs. To illustrate the data visualization part of this chapter, we have created a little Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br />
<br />
It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br />
<br />
Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the resolution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br />
<br />
<br />
<ul><br />
* $\mlxplore$ uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br />
<br><br />
<br />
* $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br />
</ul><br />
<br />
<br />
<br><br />
<br />
== Data exploration ==<br />
<br />
<br />
The following example involves 80 individuals who each receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasmatic concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds; if this happens, we also record the times at which they occur. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br />
<br />
<br />
<ul><br />
'''id''' the ID number of the patient<br />
<br><br><br />
'''time''' dose administration and observation times<br />
<br><br><br />
'''amt''' the amount of drug administered<br />
<br><br><br />
'''y''' the observations (concentrations and events)<br />
<br><br><br />
'''ytype''' the type of observation: 1=concentration, 2=event<br />
<br><br><br />
'''weight''' a continuous individual covariate<br />
<br><br><br />
'''gender''' a categorical individual covariate (F or M)<br />
<br><br><br />
'''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br />
</ul><br />
<br />
<br />
{{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br />
<br />
<br />
We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
datafile.name='pkrtte_data.csv';<br />
datafile.format='csv'; % can be "csv", "space", "tab" or ";"<br />
<br />
info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br />
info.observation.name={'concentration','hemorrhaging'};<br />
info.observation.type={'continuous','event'};<br />
info.observation.unit={'mg/l',''};<br />
info.covariate.unit={'kg',''};<br />
info.time.unit='h';<br />
<br />
data=readdatapx(datafile,info);<br />
</pre> }}<br />
<br />
<br />
How we graphically represent data depends on the type of data. Often for continuous data we use "spaghetti plots", where all of the observations are given on the same plot, and those for each individual are joined up using line segments. Time-to-event data are usually represented using Kaplan-Meier plots, i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
>>exploredatapx(data)<br />
</pre> }}<br />
<br />
<br />
{{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br />
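For reference, the Kaplan-Meier estimate used for such plots can be computed in a few lines. This Python sketch (the wiki's toolbox itself is in Matlab) handles right-censored first-event data, with the invented sample below purely for illustration:<br />

```python
# Kaplan-Meier estimate of the survival function for the first event
# (event=1: event observed at that time, event=0: right-censored)
def kaplan_meier(times, events):
    event_times = sorted(set(t for t, e in zip(times, events) if e == 1))
    surv, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        s *= 1.0 - d / at_risk          # product-limit update
        surv.append((t, s))
    return surv

times  = [2, 3, 3, 5, 8, 8, 9, 12]      # invented event/censoring times
events = [1, 1, 0, 1, 1, 0, 0, 1]
print(kaplan_meier(times, events))
```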
<br />
<br />
When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br />
<br />
<br />
{{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br />
<br />
<br />
{| cellpadding="10" cellspacing="0"<br />
|style = "width:50%"| [[File:exploredata3a.png]] <br />
|style = "width:50%"| [[File:exploredata3b.png]]<br />
|-<br />
|cellspan="2" align="center" style="text-align:center"| ''Distribution of weight and gender per dose group'' <br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text=The data file {{Verbatim|pkrtte_data.csv}} and the matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Model exploration==<br />
<br />
===Exploring the structural model===<br />
<br />
Suppose that we now want to visualize the following joint model which is one that can be used for simultaneously modeling PK and time-to-event data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k&=&Cl/V \\<br />
\deriv{A_d} &=& - k_a \, A_d(t) \\<br />
\deriv{A_c} &=& k_a \, A_d(t) - k \, A_c(t) \\<br />
Cc(t) &=& {Ac(t)}/{V} \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) .<br />
\end{eqnarray} </math> }}<br />
<br />
Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging for instance). The parameters of the model are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br />
We assume that the drug can be administered both intravenously and orally, meaning that the drug can be administered to both the depot and the central compartment.<br />
<br />
We first need to implement this model using $\mlxtran$:<br />
<br />
<br />
{{MLXTran<br />
|name=joint1_model.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
<br />
PK:<br />
depot(type=1,target=Ad)<br />
depot(type=2,target=Ac)<br />
<br />
EQUATION:<br />
k = Cl/V<br />
ddt_Ad = -ka*Ad<br />
ddt_Ac = ka*Ad - k*Ac<br />
Cc = Ac/V<br />
h = h0*exp(gamma*Cc)<br />
</pre>}}<br />
<br />
<br />
Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br />
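As a rough numerical check of these dynamics, the following Python sketch integrates the two ODEs with a forward Euler scheme (a crude method, used here only for illustration) for a single oral dose, using the same parameter values as the $\mlxplore$ project below; at $t=24\,h$ it recovers the analytical concentration of about 1.67 mg/l:<br />

```python
import math

# forward-Euler integration of the depot/central ODE system (a rough sketch;
# a real ODE solver or the analytical solution would be used in practice)
def simulate(D_oral, ka, V, Cl, h0, gamma, t_end=24.0, dt=0.01):
    k = Cl / V
    Ad, Ac = D_oral, 0.0           # oral dose placed in the depot compartment at t=0
    t = 0.0
    while t < t_end:
        dAd = -ka * Ad
        dAc = ka * Ad - k * Ac
        Ad += dt * dAd
        Ac += dt * dAc
        t += dt
    Cc = Ac / V
    return Cc, h0 * math.exp(gamma * Cc)

Cc, h = simulate(D_oral=50.0, ka=0.5, V=10.0, Cl=0.5, h0=0.01, gamma=0.5)
print(round(Cc, 3), round(h, 4))
```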
<br />
The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br />
<br />
<br />
{{MLXPlore<br />
|name=joint1_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint1_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
ka = 0.5<br />
V = 10<br />
Cl = 0.5<br />
h0 = 0.01<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$, every $0.1\,h$, for a given set of parameters. We can then change the values of these parameters with the sliders to see their impact on the two functions.<br />
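The displayed hazard relates to event probabilities through the survival function $S(t)=\exp\left(-\int_0^t h(u)\,du\right)$. A minimal Python sketch of this computation on the same kind of regular grid (illustrative only; $\mlxplore$ performs such computations internally):<br />

```python
import math

def survival(h, t_end, dt=0.1):
    """Approximate S(t) = exp(-integral of h from 0 to t) on a regular grid,
    using trapezoidal integration of the hazard function h."""
    n = int(round(t_end / dt))
    times = [i * dt for i in range(n + 1)]
    H = 0.0       # cumulative hazard
    S = [1.0]     # survival starts at 1
    for i in range(1, n + 1):
        H += 0.5 * (h(times[i - 1]) + h(times[i])) * dt
        S.append(math.exp(-H))
    return times, S

# with a constant hazard h0 = 0.01, S(t) = exp(-0.01 t)
times, S = survival(lambda t: 0.01, 100.0)
```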
<br />
<br />
{{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br />
<br />
<br />
We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50 mg every 24 hours and a treatment with repeated doses of 25 mg every 12 hours. Only the section {{Verbatim|<DESIGN>}} needs to be modified:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint2_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=0:12:144, amount=25,type=1}<br />
</pre> }}<br />
|image=[[File:exploremodel2.png]] }}<br />
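Because the PK part of the model is linear, the concentration under repeated dosing is simply the sum of time-shifted single-dose profiles (the superposition principle). A small Python sketch of this idea (illustrative only; the dose counts follow the time vectors above, i.e., 7 doses at 24 h intervals versus 13 doses at 12 h intervals):<br />

```python
import math

def cc_single(t, dose, ka=0.5, V=10.0, Cl=0.5):
    """Concentration after one oral dose given at time 0 (zero before the dose)."""
    if t < 0:
        return 0.0
    k = Cl / V
    return dose * ka / (V * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))

def cc_regimen(t, dose, interval, n_doses, **pk):
    """Superposition: for a linear model, repeated doses add time-shifted copies."""
    return sum(cc_single(t - i * interval, dose, **pk) for i in range(n_doses))

# 50 mg every 24 h (doses at t = 0, 24, ..., 144) vs 25 mg every 12 h
c50 = cc_regimen(100.0, 50.0, 24.0, 7)
c25 = cc_regimen(100.0, 25.0, 12.0, 13)
```

The two regimens deliver the same daily amount, but the 12-hourly schedule produces smaller fluctuations around the same average level.<br />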
<br />
<br />
We can combine different administrations (oral and intravenous for instance) into one global treatment:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint3_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=6:48:150, amount=25,type=2}<br />
<br />
[TREATMENT]<br />
trt1={adm1, adm2}<br />
</pre> }}<br />
|image= [[File:exploremodel3.png]]<br />
}}<br />
<br />
===Exploring the statistical model===<br />
<br />
One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the model file:<br />
<br />
<br />
{{MLXTran<br />
|name=joint2_model.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br />
<br />
DEFINITION:<br />
V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br />
Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br />
h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br />
<br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
.<br />
.<br />
.<br />
</pre> }}<br />
<br />
<br />
The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$, together with the parameters $k_a$ and $\gamma$, which have no inter-individual variability.<br />
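A log-normally distributed parameter can equivalently be written as $V = V_{\rm pop}\, e^{\eta_V}$ with $\eta_V \sim {\cal N}(0, \omega_V^2)$, so $V$ is always positive and its median is $V_{\rm pop}$. A Python sketch of this sampling scheme (illustrative only; the values are those of the project below):<br />

```python
import math
import random

def draw_individual(V_pop=10.0, Cl_pop=0.5, h0_pop=0.01,
                    omega_V=0.2, omega_Cl=0.3, omega_h0=0.2, rng=random):
    """Draw one individual's parameters: X = X_pop * exp(eta), eta ~ N(0, omega^2),
    i.e., log(X) is normal and the median of X is X_pop."""
    return {
        "V":  V_pop  * math.exp(rng.gauss(0.0, omega_V)),
        "Cl": Cl_pop * math.exp(rng.gauss(0.0, omega_Cl)),
        "h0": h0_pop * math.exp(rng.gauss(0.0, omega_h0)),
    }

rng = random.Random(123)
sample = [draw_individual(rng=rng) for _ in range(10000)]
```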
<br />
<br />
{{MLXPlore<br />
|name=joint4_project.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint2_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
V_pop = 10<br />
Cl_pop = 0.5<br />
h0_pop=0.01<br />
omega_V = 0.2<br />
omega_Cl = 0.3<br />
omega_h0 = 0.2<br />
ka = 0.5<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br />
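These prediction intervals are pointwise empirical quantiles computed across many simulated profiles. A minimal Python sketch of the idea on a toy set of "simulated" profiles (illustrative only; $\mlxplore$ computes the bands internally):<br />

```python
def quantile_bands(profiles, probs=(0.1, 0.5, 0.9)):
    """Pointwise empirical quantiles across simulated profiles.

    profiles: list of equal-length lists, one per simulated individual.
    Uses a simple order-statistic approximation (index = int(p * n)).
    """
    n_t = len(profiles[0])
    bands = {p: [] for p in probs}
    for j in range(n_t):
        col = sorted(prof[j] for prof in profiles)  # values at time point j
        for p in probs:
            idx = min(int(p * len(col)), len(col) - 1)
            bands[p].append(col[idx])
    return bands

# toy input: 100 "simulations", 5 time points each
profiles = [[float(i + j) for j in range(5)] for i in range(100)]
bands = quantile_bands(profiles)
```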
<br />
<br />
{{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br />
<br />
<br />
It is possible to introduce covariates into the statistical model, for example by assuming that the volume depends on the weight, and that these covariates are themselves random variables. This can be important if, for example, we want to visualize how much of the variation in concentration is due to variation in weight, and how much remains unaccounted for, i.e., caused by random effects.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br />
<br />
<br />
The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@ARTICLE{popixplore,<br />
author = {POPIX Inria team},<br />
title = {Popixplore 1.1},<br />
url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br />
}<br />
</bibtex><br />
<bibtex><br />
@ARTICLE{MLXplore,<br />
author = {Lixoft},<br />
title = {MLXPlore 1.0},<br />
url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{macey2000berkeley,<br />
title={Berkeley Madonna user’s guide},<br />
author={Macey, R. and Oster, G. and Zahnley, T.},<br />
journal={Berkeley (CA): University of California},<br />
year={2000}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{chatterjee2009sensitivity,<br />
title={Sensitivity analysis in linear regression},<br />
author={Chatterjee, S. and Hadi, A. S.},<br />
volume={327},<br />
year={2009},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{sensibilité2013,<br />
title={Analyse de sensibilité et exploration de modèles},<br />
author={Faivre, R. and Iooss, B. and Mahévas, S. and Makowski, D. and Monod, H.},<br />
year={2013},<br />
publisher={Editions Quae}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2000sensitivity,<br />
title={Sensitivity analysis},<br />
author={Saltelli, A. and Chan, K. and Scott, E. M. and others},<br />
volume={134},<br />
year={2000},<br />
publisher={Wiley New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2008global,<br />
title={Global sensitivity analysis: the primer},<br />
author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br />
year={2008},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2004sensitivity,<br />
title={Sensitivity analysis in practice: a guide to assessing scientific models},<br />
author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Next<br />
|link=Modeling}}</div>
Broccohttps://wiki.inria.fr/wikis/popix/index.php?title=Visualization&diff=7278Visualization2013-06-07T13:01:39Z<p>Brocco: </p>
<hr />
<div><div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formulas, you can either try another browser, or use this link, which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
== Introduction ==<br />
<br />
Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect relationships between variables, visually compare data from different groups, etc. Developing such visual exploration tools poses no methodological problems: it is simple to write Matlab or R code for one's own needs. To<br />
illustrate the data visualization part of this chapter, we have created a small Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br />
<br />
It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br />
<br />
Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the numerical solution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br />
<br />
<br />
<ul><br />
* $\mlxplore$ uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br />
<br><br />
<br />
* $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br />
</ul><br />
<br />
<br />
<br><br />
<br />
== Data exploration ==<br />
<br />
<br />
The following example involves 80 individuals who each receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds; if these occur, we also record the times at which they happen. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br />
<br />
<br />
<ul><br />
'''id''' the ID number of the patient<br />
<br><br><br />
'''time''' dose administration and observation times<br />
<br><br><br />
'''amt''' the amount of drug administered<br />
<br><br><br />
'''y''' the observations (concentrations and events)<br />
<br><br><br />
'''ytype''' the type of observation: 1=concentration, 2=event<br />
<br><br><br />
'''weight''' a continuous individual covariate<br />
<br><br><br />
'''gender''' a categorical individual covariate (F or M)<br />
<br><br><br />
'''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br />
</ul><br />
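This long format (one row per dose or observation, with {{Verbatim|ytype}} distinguishing the observation types) is easy to manipulate in any language. A Python sketch on a toy extract (the values here are invented, and we assume {{Verbatim|.}} marks a missing entry):<br />

```python
import csv
import io

# A toy extract in the same long format: one row per dose or observation,
# with ytype separating concentrations (1) from events (2). Values invented.
raw = """id,time,amt,y,ytype,weight,gender,group
1,0,40,.,.,62,F,A
1,1,.,1.8,1,62,F,A
1,24,.,0.9,1,62,F,A
1,30,.,1,2,62,F,A
"""

rows = list(csv.DictReader(io.StringIO(raw)))
conc   = [r for r in rows if r["ytype"] == "1"]     # continuous observations
events = [r for r in rows if r["ytype"] == "2"]     # event times
doses  = [r for r in rows if r["amt"] not in (".", "")]  # dosing rows
```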
<br />
<br />
{{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br />
<br />
<br />
We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br />
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
datafile.name='pkrtte_data.csv';<br />
datafile.format='csv'; % can be "csv", "space", "tab" or ";"<br />
<br />
info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br />
info.observation.name={'concentration','hemorrhaging'};<br />
info.observation.type={'continuous','event'};<br />
info.observation.unit={'mg/l',''};<br />
info.covariate.unit={'kg',''};<br />
info.time.unit='h';<br />
<br />
data=readdatapx(datafile,info);<br />
</pre> }}<br />
<br />
<br />
How we graphically represent data depends on its type. For continuous data we often use "spaghetti plots", where all of the observations are displayed on the same plot and those from each individual are joined by line segments. Time-to-event data are usually represented using Kaplan-Meier plots, i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead plot the average cumulative number of events per individual.<br />
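For reference, the Kaplan-Meier estimator multiplies together, at each distinct event time, the fraction of at-risk individuals who do not experience the event. A minimal Python sketch (illustrative only; the toolbox used here is Matlab-based):<br />

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of the survival function for the first event.

    times: time of first event or censoring for each individual.
    observed: True if an event was observed, False if censored.
    Returns a list of (event_time, S(t)) pairs at each distinct event time.
    """
    pairs = sorted(zip(times, observed))
    n_at_risk = len(pairs)
    S = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = c = 0  # events and censorings at time t
        j = i
        while j < len(pairs) and pairs[j][0] == t:
            if pairs[j][1]:
                d += 1
            else:
                c += 1
            j += 1
        if d > 0:
            S *= 1.0 - d / n_at_risk  # survival drops at each event time
            curve.append((t, S))
        n_at_risk -= d + c
        i = j
    return curve

# five individuals: events at t=2, 3, 5; censored at t=3 and t=8
curve = kaplan_meier([2.0, 3.0, 3.0, 5.0, 8.0], [True, True, False, True, False])
```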
<br />
<br />
{{MATLABcode<br />
|name=<br />
|code=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
>>exploredatapx(data)<br />
</pre> }}<br />
<br />
<br />
{{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br />
<br />
<br />
When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br />
<br />
<br />
{{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br />
<br />
<br />
{| cellpadding="10" cellspacing="0"<br />
|style = "width:50%"| [[File:exploredata3a.png]] <br />
|style = "width:50%"| [[File:exploredata3b.png]]<br />
|-<br />
|cellspan="2" align="center" style="text-align:center"| ''Distribution of weight and gender per dose group'' <br />
|}<br />
<br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text=The data file {{Verbatim|pkrtte_data.csv}} and the Matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br />
}}<br />
<br />
<br />
<br><br />
<br />
==Model exploration==<br />
<br />
===Exploring the structural model===<br />
<br />
Suppose that we now want to visualize the following joint model, which can be used for simultaneously modeling PK and time-to-event data:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
k&=&Cl/V \\<br />
\deriv{A_d} &=& - k_a \, A_d(t) \\<br />
\deriv{A_c} &=& k_a \, A_d(t) - k \, A_c(t) \\<br />
Cc(t) &=& {A_c(t)}/{V} \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) .<br />
\end{eqnarray} </math> }}<br />
<br />
Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ is the concentration in the central compartment and $h$ is the hazard function for the event of interest (hemorrhaging, for instance). The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br />
We assume that the drug can be administered both intravenously and orally, i.e., to either the central or the depot compartment.<br />
<br />
We first need to implement this model using $\mlxtran$:<br />
<br />
<br />
{{MLXTran<br />
|name=joint1_model.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
<br />
PK:<br />
depot(type=1,target=Ad)<br />
depot(type=2,target=Ac)<br />
<br />
EQUATION:<br />
k = Cl/V<br />
ddt_Ad = -ka*Ad<br />
ddt_Ac = ka*Ad - k*Ac<br />
Cc = Ac/V<br />
h = h0*exp(gamma*Cc)<br />
</pre>}}<br />
<br />
<br />
Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br />
<br />
The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br />
<br />
<br />
{{MLXPlore<br />
|name=joint1_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint1_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
ka = 0.5<br />
V = 10<br />
Cl = 0.5<br />
h0 = 0.01<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. We can then change the values of these parameters with the sliders to see their impact on the two functions.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br />
<br />
<br />
We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50 mg every 24 hours to a treatment with repeated doses of 25 mg every 12 hours. Only the section {{Verbatim|<DESIGN>}} needs to be modified:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint2_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=0:12:144, amount=25,type=1}<br />
</pre> }}<br />
|image=[[File:exploremodel2.png]] }}<br />
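Because the kinetics are linear in the administered amounts, the concentration under a repeated-dose regimen is the superposition of shifted single-dose solutions. A sketch comparing the two regimens above, using the closed-form single-dose solution of the one-compartment model with first-order absorption and the example parameter values (the helper names are ours):<br />

```python
import math

ka, V, Cl = 0.5, 10.0, 0.5    # parameter values from the example project
ke = Cl / V

def cc_single(t, dose):
    """Concentration at time t after a single oral dose given at time 0."""
    if t < 0:
        return 0.0
    return dose * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def cc_regimen(t, dose, interval, last):
    """Superpose repeated doses given at times 0, interval, ..., last."""
    return sum(cc_single(t - tau, dose) for tau in range(0, last + 1, interval))

# 50 mg every 24 h vs 25 mg every 12 h, as in joint2_project.txt
t = 150.0
cc_q24 = cc_regimen(t, 50.0, 24, 144)
cc_q12 = cc_regimen(t, 25.0, 12, 144)
```

Both regimens deliver the same daily amount, so they reach the same average steady-state concentration; the 12-hourly regimen simply fluctuates less around it.<br />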
<br />
<br />
We can combine different administrations (oral and intravenous for instance) into one global treatment:<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXPloreForTable<br />
|name=joint3_project.txt<br />
|text=<br />
<pre style="background-color: #EFEFEF; border:none"><br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0:24:144, amount=50,type=1}<br />
adm2={time=6:48:150, amount=25,type=2}<br />
<br />
[TREATMENT]<br />
trt1={adm1, adm2}<br />
</pre> }}<br />
|image= [[File:exploremodel3.png]]<br />
}}<br />
<br />
===Exploring the statistical model===<br />
<br />
One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br />
<br />
<br />
{{MLXTran<br />
|name=joint2_model.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
[INDIVIDUAL]<br />
input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br />
<br />
DEFINITION:<br />
V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br />
Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br />
h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br />
<br />
[PREDICTION]<br />
input={ka, V, Cl, h0, gamma}<br />
.<br />
.<br />
.<br />
</pre> }}<br />
<br />
<br />
The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$, together with the parameters $k_a$ and $\gamma$, which have no inter-individual variability.<br />
<br />
<br />
{{MLXTran<br />
|name=joint4_project.txt<br />
|text=<pre style="background-color: #EFEFEF; border:none"><br />
<MODEL><br />
file='joint2_model.txt'<br />
<br />
<DESIGN><br />
[ADMINISTRATION]<br />
adm1={time=0, amount=50,type=1}<br />
<br />
<PARAMETER><br />
V_pop = 10<br />
Cl_pop = 0.5<br />
h0_pop=0.01<br />
omega_V = 0.2<br />
omega_Cl = 0.3<br />
omega_h0 = 0.2<br />
ka = 0.5<br />
gamma = 0.5<br />
<br />
<OUTPUT><br />
list={Cc, h}<br />
grid=0:0.1:100<br />
</pre> }}<br />
<br />
<br />
When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br />
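These prediction bands can be reproduced by brute-force simulation: draw the individual parameters from their log-normal distributions, compute the predicted concentration, and take empirical quantiles. A minimal sketch assuming the closed-form single-dose solution and the population values of the project above (the sample size and time point are arbitrary choices):<br />

```python
import math, random

random.seed(0)

# Population parameters from joint4_project.txt
V_pop, Cl_pop = 10.0, 0.5
omega_V, omega_Cl = 0.2, 0.3
ka = 0.5

def cc(t, dose, V, Cl):
    """Single-dose concentration for individual parameters (V, Cl)."""
    ke = Cl / V
    return dose * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

# Draw individual parameters from their log-normal distributions and
# collect the predicted concentration at t = 10 h after a 50 mg dose.
samples = []
for _ in range(5000):
    V = V_pop * math.exp(random.gauss(0.0, omega_V))
    Cl = Cl_pop * math.exp(random.gauss(0.0, omega_Cl))
    samples.append(cc(10.0, 50.0, V, Cl))
samples.sort()

median = samples[len(samples) // 2]
q10 = samples[int(0.10 * len(samples))]   # 10% quantile
q90 = samples[int(0.90 * len(samples))]   # 90% quantile
```

Repeating this on a grid of times gives exactly the kind of shaded quantile bands displayed by $\mlxplore$.<br />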
<br />
<br />
Covariates can be introduced into the statistical model, for example by letting the volume depend on the weight, and by treating these covariates themselves as random variables. This may be important if we want, for example, to visualize how much of the variation in concentration is explained by variation in weight, and how much remains unaccounted for and is attributed to random effects.<br />
<br />
<br />
{{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br />
<br />
<br />
The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br />
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<br />
<bibtex><br />
@ARTICLE{popixplore,<br />
author = {POPIX Inria team},<br />
title = {Popixplore 1.0},<br />
url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br />
}<br />
</bibtex><br />
<bibtex><br />
@ARTICLE{MLXplore,<br />
author = {Lixoft},<br />
title = {MLXPlore 1.0},<br />
url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br />
}<br />
</bibtex><br />
<bibtex><br />
@article{macey2000berkeley,<br />
title={Berkeley Madonna user’s guide},<br />
author={Macey, R. and Oster, G. and Zahnley, T.},<br />
journal={Berkeley (CA): University of California},<br />
year={2000}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{chatterjee2009sensitivity,<br />
title={Sensitivity analysis in linear regression},<br />
author={Chatterjee, S. and Hadi, A. S.},<br />
volume={327},<br />
year={2009},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{sensibilité2013,<br />
title={Analyse de sensibilité et exploration de modèles},<br />
author={Faivre, R. and Iooss, B. and Mahévas, S. and Makowski, D. and Monod, H.},<br />
year={2013},<br />
publisher={Editions Quae}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2000sensitivity,<br />
title={Sensitivity analysis},<br />
author={Saltelli, A. and Chan, K. and Scott, E. M. and others},<br />
volume={134},<br />
year={2000},<br />
publisher={Wiley New York}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2008global,<br />
title={Global sensitivity analysis: the primer},<br />
author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br />
year={2008},<br />
publisher={Wiley-Interscience}<br />
}<br />
</bibtex><br />
<bibtex><br />
@book{saltelli2004sensitivity,<br />
title={Sensitivity analysis in practice: a guide to assessing scientific models},<br />
author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br />
year={2004},<br />
publisher={Wiley}<br />
}<br />
</bibtex><br />
<br />
<br />
<br />
{{Next<br />
|link=Modeling}}</div>
<hr />
<div><!-- Menu for the Extensions chapter --><br />
<sidebarmenu><br />
+[[Extensions]]<br />
*[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br />
</sidebarmenu><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<div style="color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1">[[Image:attention4.jpg|45px|left|link=]] <br />
(If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br />
</div><br />
<br />
We have so far reviewed the most frequently used models for describing both the individual parameters $(\psi_i)$ and the observations $(y_i)$, but several extensions can be considered.<br />
<br />
For instance, if we assume that a population consists of several homogeneous sub-populations, mixture models can be very useful for describing different types of mixtures, such as mixtures of distributions, mixtures of structural models and mixtures of residual models (see [[Mixture models|Mixture models]]).<br />
<br />
A stochastic component can also be introduced into the model by assuming some underlying stochastic dynamics, characterized either by a hidden Markov model (see [[Hidden Markov models|Hidden Markov models]]) or a system of stochastic differential equations (see [[Stochastic differential equations based models]]).<br />
<br />
Although we restrict ourselves to these extensions in this document, it should be noted that other extensions mentioned in the introduction (see [[What is a model? A joint probability distribution!|What is a model? A joint probability distribution!]]) could also have been addressed:<br />
<br />
<br />
<ul><br />
* Population parameter models: introduce a priori information in an estimation context, or to model inter-population variability.<br />
* Covariate models: mainly relevant in the context of wanting to simulate virtual individuals.<br />
* Design models: measurement times, dose regimens, etc.<br />
</ul><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Joint models<br />
|linkNext=Mixture models }}</div>
<hr />
<div>== Introduction ==<br />
<br />
In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br />
<br />
Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br />
<br />
Estimation tasks are common ones seen in statistics:<br />
<br />
<br />
<ol><br />
<li> Estimate the population parameter $\theta$ using the available observations and possibly some a priori information.</li><br />
<br />
<li>Evaluate the precision of the proposed estimates.</li><br />
<br />
<li>Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. </li><br />
<br />
<li>Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.</li><br />
</ol><br />
<br />
<br />
<br><br />
<br />
== Maximum likelihood estimation of the population parameters== <br />
<br />
<br><br />
=== Definitions ===<br />
<br />
<br />
''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\like(\theta ; \by) &\eqdef& \py(\by ; \theta) \\<br />
&=& \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br />
\end{eqnarray}</math> }}<br />
<br />
Maximum likelihood estimation of the population parameter $\theta$ requires:<br />
<br />
<blockquote><br />
* A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. <br><br><br />
* Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables tend to be stored in a unique data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). <br><br><br />
* An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: <br><br><br />
** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based model, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.<br><br><br />
** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.<br><br><br />
** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br />
</blockquote><br />
<br />
<br />
{{Remarks<br />
|title=Remark<br />
|text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br />
<br />
To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br />
<br />
<br />
* First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, but $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ since $h$ is nonlinear. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By the change-of-variables formula, for any $t$,<br />
<br />
{{Equation1<br />
|equation=<math><br />
f(t) = h^\prime(t)f_h(h(t)) . </math> }}<br />
<br />
: Thus,<br />
<br />
{{Equation1<br />
|equation=<math> <br />
f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br />
</math> }}<br />
<br />
: By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s.. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br />
<br />
<br />
* Now we show that it is the median. Since $h$ is a strictly increasing function,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &=& \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br />
&=& 0.5 .<br />
\end{eqnarray}</math> }} <br />
<br />
: In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br />
}}<br />
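The mean/median/mode distinction of the remark is easy to check numerically for a log-normal distribution (a toy sketch with arbitrary values $\mu=0$, $\omega=0.5$):<br />

```python
import math, random

random.seed(1)
mu, omega = math.log(1.0), 0.5   # psi = exp(eta), eta ~ N(mu, omega^2)

# Closed-form characteristics of the log-normal distribution
median = math.exp(mu)                    # the MLE-based reference value
mean   = math.exp(mu + omega**2 / 2)     # strictly larger than the median
mode   = math.exp(mu - omega**2)         # strictly smaller than the median

# Monte Carlo check that exp(mu) is indeed the median
draws = sorted(math.exp(random.gauss(mu, omega)) for _ in range(20001))
empirical_median = draws[10000]
```

The three measures of central tendency only coincide when $\omega \to 0$, i.e., when the distribution degenerates to a point mass.<br />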
<br />
<br />
<br><br />
<br />
=== Example ===<br />
<br />
Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
ke&=&Cl/V \\<br />
Cc(t) &=& \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br />
h(t) &=& h_0 \, \exp(\gamma\, Cc(t)) ,<br />
\end{eqnarray}</math> }}<br />
<br />
where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br />
<br />
<br />
{{MLXTran<br />
|name=joint1est_model.txt<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INPUT:<br />
parameter = {ka, V, Cl, h0, gamma}<br />
<br />
EQUATION:<br />
ke=Cl/V<br />
Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br />
h = h0*exp(gamma*Cc)<br />
<br />
OBSERVATION:<br />
Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br />
Hemorrhaging = {type=event, hazard=h}<br />
<br />
OUTPUT:<br />
output = {Concentration, Hemorrhaging}<br />
</pre> }}<br />
<br />
<br />
Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br />
<br />
The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br />
<br />
<br />
{{ExampleWithCode&Image<br />
|title=<br />
|text=<br />
|code={{MLXTranForTable<br />
|name=<br />
|text=<pre style="background-color:#EFEFEF; border:none;"> <br />
INDIVIDUAL:<br />
ka = {distribution=logNormal, iiv=yes}<br />
V = {distribution=logNormal, iiv=yes}<br />
Cl = {distribution=normal, iiv=yes}<br />
h0 = {distribution=probitNormal, iiv=yes}<br />
gamma = {distribution=logitNormal, iiv=yes}<br />
</pre> }}<br />
|image=<br />
[[File:Vsaem1.png]]<br />
}}<br />
<br />
<br />
Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br />
it is possible to perform a preliminary sensitivity analysis in order to select "good" initial values.<br />
<br />
<br />
{{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br />
<br />
<br />
<br />
Then, when we run SAEM, it converges easily and quickly to the MLE:<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter<br />
ka : 0.974<br />
V : 7.07<br />
Cl : 2.00<br />
h0 : 0.0102<br />
gamma : 0.485<br />
<br />
omega_ka : 0.668<br />
omega_V : 0.365<br />
omega_Cl : 0.588<br />
omega_h0 : 0.105<br />
omega_gamma : 0.0901<br />
<br />
a_1 : 0.345<br />
</pre> }}<br />
<br />
<br />
Parameter estimation can therefore be seen as estimating the reference values and variance of the random effects.<br />
<br />
In addition to these numbers, it is important to be able to graphically represent these distributions in order to see them and therefore understand them better. In effect, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can be different among themselves for other asymmetric distributions such as the log-normal (see the distribution of $ka$).<br />
<br />
Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br />
<br />
<br />
{{Remarks <br />
|title=Remarks<br />
|text=<br />
For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation for $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ if this is fairly small. In effect, when $\omega_{\psi}$ is small,<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\psi &=& \psi_{\rm pop} e^{\eta} \\<br />
&\approx & \psi_{\rm pop}(1+ \eta) .<br />
\end{eqnarray}</math> }}<br />
<br />
Thus<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\esp{\psi} &\approx& \psi_{\rm pop} \\<br />
\std{\psi} &\approx & \psi_{\rm pop}\omega_{\psi},<br />
\end{eqnarray}</math> }}<br />
<br />
and<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
{\rm cv}(\psi) &=& \frac{\std{\psi} }{\esp{\psi} } \\<br />
&\approx & \omega_{\psi} .<br />
\end{eqnarray}</math> }}<br />
<br />
Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of the dispersion in the parameters $h_0$ and $\gamma$.<br />
}}<br />
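The quality of the approximation ${\rm cv}(\psi) \approx \omega_{\psi}$ can be quantified using the exact coefficient of variation of a log-normal variable, $\sqrt{e^{\omega^2}-1}$. A quick sketch:<br />

```python
import math

def lognormal_cv(omega):
    """Exact coefficient of variation of a log-normal variable
    whose log has standard deviation omega: sqrt(exp(omega^2) - 1)."""
    return math.sqrt(math.exp(omega**2) - 1.0)

# The approximation cv ~ omega is only good for small omega:
# relative error ~0.3% at omega=0.1, but ~24% at omega=1.
for omega in (0.1, 0.3, 0.5, 1.0):
    exact = lognormal_cv(omega)
    rel_err = abs(exact - omega) / exact
```

For $\omega_{ka}=0.668$ as estimated above, the exact cv is already noticeably larger than $\omega_{ka}$ itself, which is why quantiles are a safer description of dispersion.<br />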
<br />
<br />
{{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br />
<br />
<br />
<br />
<br><br />
<br />
==Bayesian estimation of the population parameters==<br />
<br />
The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\pcthy(\theta {{!}} \by ) &=& \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br />
&=& \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br />
\end{eqnarray}</math> }}<br />
<br />
We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}^{\rm MAP} &=& \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br />
&=& \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br />
\end{eqnarray}</math> }}<br />
<br />
The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is given by<br />
<br />
{{Equation1<br />
|equation=<math><br />
\hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br />
</math> }}<br />
<br />
The MAP estimate is a trade-off between the MLE, which maximizes ${\llike}(\theta ; \by)$, and $\theta_0$, which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP estimate is. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br />
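For the simple case of estimating a normal mean, this trade-off has a closed form: the MAP estimate is a precision-weighted average of the MLE (the sample mean) and the prior mean. A sketch with arbitrary toy numbers:<br />

```python
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    """MAP estimate of a normal mean with a N(theta0, gamma2) prior:
    a precision-weighted average of the MLE (ybar) and the prior mean."""
    if gamma2 == 0.0:
        return theta0          # degenerate prior: theta is fixed at theta0
    w_data = n / sigma2        # precision carried by the data
    w_prior = 1.0 / gamma2     # precision carried by the prior
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)

# The MAP moves from theta0 toward the MLE as gamma^2 grows
ybar, n, sigma2, theta0 = 2.0, 20, 1.0, 0.0
estimates = {g2: map_estimate(ybar, n, sigma2, theta0, g2)
             for g2 in (0.0, 0.01, 0.1, 1.0, 100.0)}
```

This mirrors the behavior seen in the PK example below: a tight prior pins the estimate at $\theta_0$, a diffuse one recovers the MLE.<br />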
<br />
Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br />
<br />
We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br />
<br />
All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br />
<br />
Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br />
<br />
$\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br />
<br />
<br />
<ul><br />
* a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br />
<br><br />
<br />
* estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br />
<br><br />
<br />
* estimated by introducing a prior and calculating the MAP estimate.<br />
<br><br />
<br />
* estimated by introducing a prior and then estimating the posterior distribution.<br />
</ul><br />
<br />
<br />
We put aside dealing with the fixed components of $\theta$ in the following. Here are some possible situations:<br />
<br />
<br />
<ol><br />
<li> ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
(\hat{\theta}_E , \hat{\theta}_{M} ) &=& \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br />
&=& \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br />
\end{eqnarray}</math> }}<br />
<br />
where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br />
<br />
<br />
<li> ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: </li><br />
<br />
<br />
<ol style="list-style-type:lower-roman"><br />
<li> Compute the maximum likelihood of $\theta_E$: </li><br />
<br />
{{Equation1<br />
|equation=<math>\begin{eqnarray}<br />
\hat{\theta}_E &=& \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br />
&=& \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br />
\end{eqnarray}</math> }}<br />
<br />
<br />
<li> Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. </li><br />
</ol><br />
<br />
<br />
It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br />
</ol><br />
<br />
<br />
{{Example1<br />
|title1=Example<br />
|title2=A PK example<br />
|text=<br />
In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br />
<br />
{{Equation1<br />
|equation=<math> \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . </math> }}<br />
<br />
$\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br />
<br />
<br />
<div style="margin-left:17%; margin-right:17%; align:center"><br />
{{{!}} class="wikitable" align="center" style="width:100%"<br />
{{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br />
{{!}}-<br />
{{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br />
{{!}}}</div><br />
<br />
{{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br />
<br />
<br />
As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br />
}}<br />
<br />
<br />
<br><br />
== Estimation of the Fisher information matrix ==<br />
<br />
The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br />
<br />
{{EquationWithRef<br />
|equation=<div id="ofim_intro3"><math><br />
\ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br />
</math></div><br />
|reference=(1) }}<br />
<br />
Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square-root of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (s.a.) r.s.e.(%)<br />
ka : 0.974 0.082 8<br />
V : 7.07 0.35 5<br />
Cl : 2 0.07 4<br />
h0 : 0.0102 0.0014 14<br />
gamma : 0.485 0.015 3<br />
<br />
omega_ka : 0.668 0.064 10<br />
omega_V : 0.365 0.037 10<br />
omega_Cl : 0.588 0.055 9<br />
omega_h0 : 0.105 0.032 30<br />
omega_gamma : 0.0901 0.044 49<br />
<br />
a_1 : 0.345 0.012 3<br />
</pre> }}<br />
<br />
The F.I.M. can be used for detecting overparametrization of the structural model. Indeed, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data do not allow these two parameters to be estimated with any precision.<br />
<br />
<br />
{{JustCode<br />
|code=<pre style="background-color:#EFEFEF; border:none;">Estimation of the population parameters<br />
<br />
parameter s.e. (lin) r.s.e.(%)<br />
ka : 0.246 0.0081 3<br />
Cl : 1.9 0.075 4<br />
V1 : 1.71 0.14 8<br />
Q : 0.000171 0.024 1.43e+04<br />
V2 : 0.00673 3.1 4.62e+04<br />
<br />
omega_ka : 0.171 0.026 15<br />
omega_Cl : 0.293 0.026 9<br />
omega_V1 : 0.621 0.062 10<br />
omega_Q : 5.72 1.4e+03 2.41e+04<br />
omega_V2 : 4.61 1.8e+04 3.94e+05<br />
<br />
a : 0.136 0.0073 5<br />
</pre> }}<br />
<br />
<br />
The Fisher information matrix is also widely used in optimal experimental design, since minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at summary statistics of the covariance matrix (such as the determinant or the trace).<br />
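A possible diagnostic along these lines can be sketched in Python. The F.I.M. below is made up for illustration: two parameters (called $Q$ and $V_2$ by analogy with the example above) carry almost the same information, so the matrix is poorly conditioned and the corresponding estimators are almost perfectly correlated:<br />

```python
import numpy as np

def diagnose_fim(fim, labels, corr_threshold=0.95):
    # Condition number of the F.I.M.: very large values indicate
    # near-singularity, i.e. a poorly identifiable model
    cond = np.linalg.cond(fim)
    cov = np.linalg.inv(fim)               # covariance matrix of the estimators
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)          # correlations between estimators
    flagged = [(labels[i], labels[j], corr[i, j])
               for i in range(len(labels)) for j in range(i)
               if abs(corr[i, j]) > corr_threshold]
    return cond, flagged

# Hypothetical F.I.M.: the rows/columns for Q and V2 are nearly identical
fim = np.array([[100.0,  0.0,  0.0],
                [  0.0, 50.0, 49.9],
                [  0.0, 49.9, 50.0]])
cond, flagged = diagnose_fim(fim, ["ka", "Q", "V2"])
```

Here the condition number is large and the pair $(Q, V_2)$ is flagged, mirroring the huge relative standard errors seen in the output above.<br />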
<br />
<br><br />
== Estimation of the individual parameters ==<br />
<br />
Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]]. For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and which can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br />
<br />
The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br />
<br />
The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the "most likely" values of the individual parameters are the most suited for computing the "most likely" predictions.<br />
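The Metropolis-Hastings step itself is simple to sketch. The following Python example uses a deliberately simple, hypothetical individual model (a linear regression with a Gaussian population distribution, with made-up values for the population parameters), so that the target $\pmacro(\psi_i | y_i ; \hat{\theta})$ is known up to a normalizing constant:<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical individual model: y_j = psi * t_j + e_j, e_j ~ N(0, a^2),
# with population distribution psi ~ N(mu, omega^2);
# a, mu, omega play the role of the estimated population parameters
t = np.array([0.5, 1.0, 2.0, 4.0])
a, mu, omega = 0.3, 1.0, 0.5
y = 1.2 * t + rng.normal(0, a, size=t.size)      # simulated observations

def log_cond(psi):
    # log p(psi | y) up to an additive constant: log-likelihood + log-prior
    return (-0.5 * np.sum((y - psi * t) ** 2) / a**2
            - 0.5 * (psi - mu) ** 2 / omega**2)

# Random-walk Metropolis-Hastings
psi, chain = mu, []
for k in range(20000):
    prop = psi + rng.normal(0, 0.2)              # symmetric proposal
    if np.log(rng.uniform()) < log_cond(prop) - log_cond(psi):
        psi = prop                               # accept the move
    chain.append(psi)
chain = np.array(chain[2000:])                   # discard burn-in

cond_mean, cond_sd = chain.mean(), chain.std()   # summary statistics
```

Because this toy model is linear and Gaussian, the conditional distribution is also Gaussian and available in closed form, which is convenient for checking the sampler; in a nonlinear mixed effects model no such closed form exists and the simulated sequence is the practical handle on $\pmacro(\psi_i | y_i ; \hat{\theta})$.<br />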
<br />
<br />
{{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br />
<br />
<br><br />
<br />
== Estimation of the observed log-likelihood ==<br />
<br />
<br />
Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br />
<br />
{{Equation1<br />
|equation=<math> \begin{eqnarray}<br />
{\llike} (\hat{\theta};\by) &=& \log({\like}(\hat{\theta};\by)) \\<br />
&\eqdef& \log(\py(\by;\hat{\theta})) .<br />
\end{eqnarray}</math> }}<br />
<br />
The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br />
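The simplest such method, plain Monte Carlo integration of $\pmacro(\by | \psi)$ over the distribution of the individual parameters, can be sketched as follows (same kind of hypothetical toy model as above, with made-up values; practical implementations rely on importance sampling for efficiency, but the pattern is the same):<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model: y_j = psi * t_j + e_j, e_j ~ N(0, a^2), psi ~ N(mu, omega^2)
t = np.array([0.5, 1.0, 2.0, 4.0])
a, mu, omega = 0.3, 1.0, 0.5
y = np.array([0.7, 1.1, 2.3, 4.9])               # made-up observations

def loglike_given_psi(psi):
    # log p(y | psi), vectorized over a sample of psi values
    r = y[None, :] - np.outer(psi, t)
    return (-0.5 * np.sum(r ** 2, axis=1) / a**2
            - y.size * np.log(a * np.sqrt(2 * np.pi)))

# Monte Carlo: log p(y) = log E_{psi ~ prior}[ p(y | psi) ],
# accumulated with log-sum-exp for numerical stability
M = 200_000
psis = rng.normal(mu, omega, size=M)
ll_hat = np.logaddexp.reduce(loglike_given_psi(psis)) - np.log(M)
```

For this linear Gaussian toy model the marginal distribution of $y$ is itself Gaussian, so the Monte Carlo estimate can be compared with an exact value; for a genuinely nonlinear model only the stochastic estimate is available.<br />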
<br />
<br />
<br><br />
== Bibliography ==<br />
<br />
<bibtex><br />
@article{Monolix,<br />
author = {Lixoft},<br />
title = {Monolix 4.2},<br />
year={2012},<br />
journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{comets2011package,<br />
title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br />
author={Comets, E. and Lavenu, A. and Lavielle, M.},<br />
journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{nlmefitsa,<br />
title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br />
author={The MathWorks},<br />
journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br />
year={2013}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{beal1992nonmem,<br />
title={NONMEM users guides},<br />
author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br />
journal={San Francisco, NONMEM Project Group, University of California},<br />
year={1992}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@book{pinheiro2000mixed,<br />
title={Mixed effects models in S and S-PLUS},<br />
author={Pinheiro, J.C. and Bates, D.M.},<br />
year={2000},<br />
publisher={Springer Verlag}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{pinheiro2010r,<br />
title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br />
author={Pinheiro, J. and Bates, D. and DebRoy, S. and Sarkar, D. and the R Core team},<br />
journal={R Foundation for Statistical Computing, Vienna},<br />
year={2010}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@article{spiegelhalter2003winbugs,<br />
title={WinBUGS user manual},<br />
author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br />
journal={Cambridge: MRC Biostatistics Unit},<br />
year={2003}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSPSS,<br />
title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br />
author = {SPSS},<br />
year = {2002},<br />
note={Technical Report}<br />
}<br />
</bibtex><br />
<br />
<bibtex><br />
@Manual{docSAS,<br />
title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br />
chapter = {61},<br />
pages = {4337--4435},<br />
author = {SAS},<br />
year = {2008}<br />
}<br />
</bibtex><br />
<br />
<br />
{{Back&Next<br />
|linkBack=Visualization<br />
|linkNext=Model evaluation }}</div>Brocco