https://wiki.inria.fr/wikis/popix/api.php?action=feedcontributions&user=Bleakley&feedformat=atom Popix - User contributions [en] 2022-07-03T10:51:53Z User contributions MediaWiki 1.32.6 https://wiki.inria.fr/wikis/popix/index.php?title=Modeling_the_individual_parameters&diff=7459 Modeling the individual parameters 2013-07-01T08:52:12Z <p>Bleakley: </p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> &lt;div style=&quot;color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em&quot;&gt;[[Image:attention4.jpg|45px|left|link=]] <br /> (If you are experiencing problems with the display of the mathematical formulas, you can either try another browser, or use this link, which should work smoothly: http://popix.lixoft.net)<br /> &lt;/div&gt;<br /> <br /> In the [[The individual approach]] section, we introduced the modeling approach for a single individual whose response variable depended on the parameter $\psi$. In the population approach, we now suppose that each individual $i$ has its own &quot;individual&quot; parameter $\psi_i$ and, more importantly, that this $\psi_i$ comes from some probability distribution $\qpsii$.<br /> <br /> In this chapter, we are interested in the description, representation and implementation of these individual parameter distributions $\qpsii$.<br /> Generally speaking, we assume that individuals are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent].
This means that in the following analysis, it suffices to take a closer look at the [http://en.wikipedia.org/wiki/Probability_distribution distribution] $\qpsii$ of a single individual $i$.<br /> <br /> If $\qpsii$ is a [http://en.wikipedia.org/wiki/Parametric_statistics parametric] distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $c_i=(c_{i,1}, c_{i,2},\ldots, c_{i,L})$, this dependence can be stated explicitly:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;modelindiv1&quot;&gt;&lt;math&gt;<br /> \psi_i \sim \qpsii(\, \cdot \, ;c_i,\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> The distribution $\qpsii$ plays a fundamental role since it describes the ''inter-individual variability'' of the individual parameter $\psi_i$.<br /> It achieves two things:<br /> <br /> <br /> &lt;ul&gt;<br /> * The definition of a ''predicted'' value $\hpsi_i$ of $\psi_i$ for a given vector of covariates $c_i$ and a given population parameter $\theta$, i.e., a &quot;typical&quot; value of the individual parameter $\psi_i$ for individuals who share the same covariates in a given population.<br /> &lt;br&gt;<br /> <br /> * A description of how the individual parameter $\psi_i$ fluctuates around its predicted value $\hpsi_i$. In other words, it describes the distribution of the individual parameters for individuals who share the same covariates $c_i$.<br /> &lt;/ul&gt;<br /> <br /> <br /> This means that modeling the individual parameters reduces to describing these two properties of the distribution $\qpsii$. We can imagine all sorts of discrete or continuous distributions and linear or nonlinear covariate models to define $\hpsi_i$. Nevertheless, we must remember that in the modeling context, the parameters $\psi_i$ are not actually going to be observed themselves.
This means that we will prefer certain types of models whose structure makes them both [http://en.wikipedia.org/wiki/Identifiability identifiable] and interpretable.<br /> <br /> Example distributions based on the normal distribution are presented in the [[Gaussian models]] section, and continuous and categorical covariate models are presented in the [[The covariate model]] section.<br /> <br /> Rather than defining $\psi_i$ using a probability distribution as in [[#modelindiv1|(1)]], we can instead use equations:<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;modelindiv2&quot;&gt;&lt;math&gt;<br /> \psi_i = \model(\bbeta,c_i,\eta_i) , <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $\bbeta$ is a vector of ''fixed effects'' and $\eta_i$ a vector of ''random effects'', i.e., a vector of zero-mean random variables: $\esp{\eta_i}=0$.<br /> The predicted value $\hpsi_i$ is then seen as the value of $\psi_i$ with the random effects set to zero:<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;modelindiv3&quot;&gt;&lt;math&gt;<br /> \hpsi_i = \model(\bbeta,c_i,\eta_i \equiv 0) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> The pros and cons of the two approaches are discussed in the [[Description, representation and implementation of a model]] section.<br /> We will show that both representations can be used with the various models presented in the [[Gaussian models]] and [[The_covariate_model|The covariate model]] sections.<br /> <br /> A multivariate representation of the distribution of $\psi_i$ is given in the [[Extension to multivariate distributions]] section for the case where the random effects vector $\eta_i$ is Gaussian.
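To make representations (1)-(3) concrete, here is a minimal Python sketch (a hypothetical illustration, not code from this wiki or from $\mlxtran$): a log-normally distributed volume parameter whose median depends on a weight covariate, with a Gaussian random effect. All names and parameter values are purely illustrative.

```python
import math
import random

def h(beta, c_i, eta_i):
    """Equation (2): psi_i = h(beta, c_i, eta_i).

    Hypothetical model: a log-normal volume V_i whose median depends on
    a weight covariate, log V_i = beta_0 + beta_1*log(w_i/70) + eta_i.
    """
    beta_0, beta_1 = beta
    return math.exp(beta_0 + beta_1 * math.log(c_i["weight"] / 70.0) + eta_i)

beta = (math.log(10.0), 1.0)   # fixed effects (illustrative values)
c_i = {"weight": 70.0}         # individual covariates
omega = 0.3                    # standard deviation of the random effect eta_i

# Equation (3): the predicted value is psi_i with the random effect set to zero.
psi_hat = h(beta, c_i, eta_i=0.0)   # typical value for a 70 kg individual

# Equation (1): psi_i ~ q(. ; c_i, theta), here with theta = (beta, omega).
random.seed(1)
psi_i = h(beta, c_i, eta_i=random.gauss(0.0, omega))
```

In this parametrization, $\theta=(\bbeta,\omega)$ plays the role of the population parameter vector: `psi_hat` is the predicted value $\hpsi_i$ for individuals sharing the covariates $c_i$, and the spread of repeated draws of `psi_i` around it describes the inter-individual variability.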
In this case, under fairly general hypotheses, we can explicitly calculate the [http://en.wikipedia.org/wiki/Likelihood_function likelihood function]:<br /> <br /> {{EquationWithBorder<br /> |equation= &lt;math&gt; {\like}(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i ; c_i , \theta). &lt;/math&gt; }} <br /> <br /> Here, the distribution of the vector of random effects is completely defined by its variance-covariance matrix $\Omega$. The vector of population parameters $\theta$ then contains the vector $\bbeta$ of fixed effects and the variance-covariance matrix $\Omega$.<br /> <br /> Several extensions are possible:<br /> <br /> * We can suppose that the individual parameters of a given individual fluctuate over time. Here, the model needs to describe the ''intra-individual variability'' of the individual parameters.<br /> <br /> * We can also suppose that the individuals are not in fact independent. The model then needs to describe the inter-individual dependencies of the individual parameters.<br /> <br /> Some of these models that incorporate different types of variability are presented in the [[Additional_levels_of_variability|Additional levels of variability]] section.<br /> <br /> {{Back&amp;Next<br /> |linkBack= Introduction &amp; notation {{!}} Introduction to Models<br /> |linkNext= Gaussian models }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Introduction_%26_notation&diff=7456 Introduction & notation 2013-07-01T08:36:55Z <p>Bleakley: </p> <hr /> <div>&lt;div style=&quot;color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em&quot;&gt;[[Image:attention4.jpg|45px|left|link=]] <br /> (If you are experiencing problems with the display of the mathematical formulas, you can either try another browser, or use this link, which should work smoothly: http://popix.lixoft.net)<br /> &lt;/div&gt;<br /> <br /> Models are attempts
to describe observations in a logical, simple way, involving the relationship between measurements, parameters, covariates and so on. If working in a probabilistic framework, as we are here, there will be randomness in the model, involving random variables, probability distributions, errors and more.<br /> <br /> Because of this, we adopt the following definition of a model in this context: [[What is a model? A joint probability distribution! | '''a model is a joint probability distribution''']].<br /> <br /> Therefore, defining a model means defining a [http://en.wikipedia.org/wiki/Joint_probability_distribution joint probability distribution], which can then be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_distribution conditional distributions] on which we can perform tasks: estimation, model selection, simulation, etc.<br /> <br /> This chapter is therefore about defining appropriate probability distributions. We start by introducing some general notation and conventions.<br /> <br /> <br /> * We will call $y_i$ the set of observations recorded on subject $i$, and $\by$ the combined set of observations for all $N$ individuals: $\by = (y_1,\ldots,y_N)$. In general, we will use '''bold''' text (as for $\by$) when a variable regroups several individuals. Thus, we write $\psi_i$ for the parameter vector of individual $i$ and $\bpsi$ for the parameter vector of a set of individuals, $\bpsi = (\psi_1,\ldots,\psi_N)$.<br /> <br /> <br /> * We denote by $\qy$ and $\qpsi$ the distributions of $\by$ and $\bpsi$ respectively, by $\qcypsi$ the conditional distribution of $\by$ given $\bpsi$, and by $\qypsi$ the joint distribution of $\by$ and $\bpsi$.
In these (and other) distributions, the variable described by the distribution is placed in the index.<br /> <br /> <br /> * We use the same &quot;$p$&quot; notation for the [http://en.wikipedia.org/wiki/Probability_distribution distribution] of a random variable as for its [http://en.wikipedia.org/wiki/Probability_density_function probability density function] (pdf).<br /> <br /> <br /> * When there is no ambiguity in whole equations, to simplify notation we may omit the indices and simply use the symbol $\pmacro$. For instance, $\qy(\by)$, the pdf of $\by$, becomes $\py(\by)$; both are equivalent. The symbol $\pmacro$ has no meaning on its own; it is completely defined by its arguments.<br /> <br /> <br /> * When the distribution of the individual parameters $\psi_i$ of subject $i$ depends on a vector of individual [http://en.wikipedia.org/wiki/Covariate covariates] $c_i$ and a population parameter $\theta$, we may choose to show this dependence explicitly by writing the distribution of $\psi_i$ as $\ppsii(\psi_i;c_i,\theta)$.<br /> <br /> <br /> * When the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcyipsii$ of the observations $y_i=(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ depends on regression variables $x_i=(x_{ij}, 1\leq j \leq n_i)$ and source terms $u_i$ (i.e., inputs of a dynamical system, such as doses in a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I | pharmacokinetic model]]), we may choose to show this dependence explicitly, writing the conditional distribution as $\pcyipsii(y_i | \psi_i;x_i,u_i)$.<br /> <br /> <br /> There are two important pieces to the puzzle: the observations $\by$, whose distribution $\qy$ depends on the individual parameters, and the individual parameters $\bpsi$ themselves, with distribution $\qpsi$.
In the population approach, the base distribution is the joint distribution $\qypsi$ of the observations and individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi).<br /> &lt;/math&gt; }}<br /> <br /> In this chapter, we concentrate on these two components: the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcypsi$ of the observations, and the distribution $\qpsi$ of the individual parameters.<br /> <br /> Depending on the required complexity of the model, its other components, such as [http://en.wikipedia.org/wiki/Covariate covariates], population parameters and design, can also be modeled as [http://en.wikipedia.org/wiki/Random_variable random variables], but we will not go into such detail in this chapter.<br /> <br /> For each model, we aim to identify precisely the minimal amount of information needed to represent it mathematically, so that it remains possible to implement and analyze it.
To do this, we will use $\mlxtran$, a powerful formal declarative language that allows us to describe complicated structural and statistical models in a straightforward, intuitive way.<br /> <br /> {{Next<br /> |link=Modeling the individual parameters }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Description,_representation_and_implementation_of_a_model&diff=7384 Description, representation and implementation of a model 2013-06-21T08:09:11Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model.
Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width:500px&quot;| 1. First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;/span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b}}<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the information required. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, the models that we want to implement can be decomposed into two components: the structural model and the statistical model. Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look in more detail at these three things.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists of describing a model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a [http://en.wikipedia.org/wiki/Pharmacokinetics PK] model that describes drug concentration as a function of time.
We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut, to the central compartment - the bloodstream), linear transfers between the central and the peripheral compartment, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can be represented by a diagram like the one shown in the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2. ''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> &lt;!--%\deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - \frac{V_m}{V\,K_m + A_c(t)} A_c(t) \\--&gt;<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point in time. 
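The two-compartment system above can be integrated numerically. The following Python sketch uses the explicit Euler method; every numerical value (rate constants, dose, time step, horizon) is an illustrative assumption rather than a value from the text:

```python
# Explicit-Euler integration of the two-compartment model with
# first-order absorption and linear elimination described above.
# All numerical values are illustrative assumptions.

def simulate(ka=1.0, k12=0.5, k21=0.3, ke=0.2,
             dose=100.0, dt=0.001, t_end=24.0):
    Ad, Ac, Ap = dose, 0.0, 0.0   # depot, central, peripheral amounts
    eliminated = 0.0              # cumulative amount eliminated
    peak_Ac = 0.0
    for _ in range(int(t_end / dt)):
        dAd = -ka * Ad
        dAc = ka * Ad - k12 * Ac + k21 * Ap - ke * Ac
        dAp = k12 * Ac - k21 * Ap
        eliminated += dt * ke * Ac
        Ad += dt * dAd
        Ac += dt * dAc
        Ap += dt * dAp
        peak_Ac = max(peak_Ac, Ac)
    return Ad, Ac, Ap, eliminated, peak_Ac
```

Because each transfer term appears with opposite signs in two equations, the total amount `Ad + Ac + Ap + eliminated` remains equal to the dose throughout the integration, which provides a convenient sanity check.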
On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption], [http://en.wikipedia.org/wiki/Distribution_%28pharmacology%29 distribution] and elimination, these are not explicitly represented as they are in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1. ''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2. ''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order to make development simple and the code easy to parse. 
Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is essential to be able to use several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are fixed in number. For instance, the PK macros in $\mlxtran$ allow us to code linear and nonlinear ([http://en.wikipedia.org/wiki/Michaelis%E2%80%93Menten_kinetics Michaelis-Menten]) elimination, but no macro exists that can combine the two types of elimination. In contrast, such processes can easily be written using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - ke*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true for instance for [http://en.wikipedia.org/wiki/Dynamical_system dynamical systems] with source terms, such as PK models with repeated oral doses and zero-order absorption. In that example, the absorption rate is a piecewise-constant function.<br /> It is not easy to code this model using equations, and not worth it when we can quickly use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. 
The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the one (that would be) generated by a script using a system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed into two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate. The first step consists of describing with extreme precision the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70 kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it. Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique. 
We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is the reference volume in the population for a reference individual weighing 70 kg, and $\Vpop (w_i/70)^\beta$ is the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. The equations define variables algebraically, while the definitions characterize random variables via their probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows the direct usage of the same definitions and equations with a language very close to the mathematical one. 
The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(W/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(W/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Representations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model. This means that when any of them are written in text or on a slide, anyone with some basic knowledge in statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task. 
It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this on our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a normal random variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a normal random variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i = \exp\left(\log(V_i)\right)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. Thus, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters. In other words, to derive the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computation of the pdf of $V_i$ or a certain function of it. 
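The two simulation routes described above can be sketched in Python. Route (1) draws $\eta_i$ and transforms it, while routes (2)/(3) draw $\log(V_i)$ directly; the population parameters and the weight used below are illustrative assumptions:

```python
# Sketch of the two simulation routes for V_i. Vpop, beta, omega and
# the weight w are illustrative assumptions, not values from the text.
import math
import random

Vpop, beta, omega = 10.0, 0.75, 0.2   # illustrative population parameters
w = 56.0                              # illustrative individual weight (kg)

def simulate_via_eta(rng):
    # Representation (1): V_i = Vpop * (w/70)^beta * exp(eta_i)
    eta = rng.gauss(0.0, omega)
    return Vpop * (w / 70.0) ** beta * math.exp(eta)

def simulate_via_logV(rng):
    # Representations (2)/(3): log(V_i) ~ N(log(Vpop) + beta*log(w/70), omega^2)
    log_v = rng.gauss(math.log(Vpop) + beta * math.log(w / 70.0), omega)
    return math.exp(log_v)
```

Feeding both routes the same random draws shows that they define the same distribution draw by draw.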
Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ are equal up to the constant factor $\prod_i V_i$: indeed, $L_2 = L_1 \prod_{i=1}^N V_i$. Whichever form the likelihood is based on, it is nonetheless necessary to provide some information about the pdf of $V_i$ for computing the likelihood. 
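These likelihood computations are easy to check numerically. The Python sketch below evaluates the log-likelihood based on the log-normal pdf of $V_i$ ($L_1$) and on the normal pdf of $\log(V_i)$ ($L_2$); the data and parameter values are made up for illustration:

```python
# Numerical evaluation of the two log-likelihoods. The volumes,
# weights and parameter values are made-up illustrative data.
import math

def log_L1(Vpop, beta, omega, volumes, weights):
    # Sum of log of the log-normal pdf of each V_i
    ll = 0.0
    for v, wt in zip(volumes, weights):
        mu = math.log(Vpop) + beta * math.log(wt / 70.0)
        ll += -math.log(math.sqrt(2.0 * math.pi) * omega * v) \
              - (math.log(v) - mu) ** 2 / (2.0 * omega ** 2)
    return ll

def log_L2(Vpop, beta, omega, volumes, weights):
    # Sum of log of the normal pdf of each log(V_i)
    ll = 0.0
    for v, wt in zip(volumes, weights):
        mu = math.log(Vpop) + beta * math.log(wt / 70.0)
        ll += -math.log(math.sqrt(2.0 * math.pi) * omega) \
              - (math.log(v) - mu) ** 2 / (2.0 * omega ** 2)
    return ll
```

On any data set the two log-likelihoods differ exactly by $\sum_i \log V_i$, the constant factor mentioned above, so they lead to the same parameter estimates.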
In this very basic example, the minimal information about the model that needs to be passed to the software via code to be able to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The likelihood can then be easily computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter able to &quot;understand&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we can visually inspect a plot of the (predicted or simulated) log-volume against the log-weight and see whether the linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness-of-fit plot and running an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship. 
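As an illustration of such a check, the Python sketch below simulates volumes from the model and fits the slope of log-volume against log-weight by ordinary least squares; the fitted slope should recover $\beta$. All numerical values are illustrative assumptions:

```python
# Covariate-model check: simulate volumes from the model, then regress
# log(V) on log(w/70) by ordinary least squares. The fitted slope
# should be close to beta. All numerical values are illustrative.
import math
import random

Vpop, beta, omega = 10.0, 0.75, 0.1
rng = random.Random(42)

weights = [rng.uniform(50.0, 100.0) for _ in range(2000)]
x = [math.log(wt / 70.0) for wt in weights]                       # log-weight
y = [math.log(Vpop) + beta * xi + rng.gauss(0.0, omega) for xi in x]  # log-volume

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
intercept = y_bar - slope * x_bar
```

The point is not the regression itself but that running it requires the covariate relationship to be known explicitly, which is exactly the information the `model=linear` declaration makes available.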
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spells out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{w_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega^2),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Description,_representation_and_implementation_of_a_model&diff=7382 Description, representation and implementation of a model 2013-06-21T08:00:33Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model. Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width=500px&quot;| 1. 
First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;/span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b}}<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the required information. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, the models we want to implement can be decomposed into two components: the structural model and the statistical model. Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look at these three steps in more detail.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists of describing the model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a [http://en.wikipedia.org/wiki/Pharmacokinetics PK] model that describes drug concentration as a function of time. 
We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut - to the central compartment - the bloodstream), linear transfers between the central and peripheral compartments, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can be represented by a diagram like the one shown in the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2. ''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> &lt;!--%\deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - \frac{V_m}{V\,K_m + A_c(t)} A_c(t) \\--&gt;<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point in time. 
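As an illustration, this system can be integrated numerically with any standard ODE solver. Here is a minimal sketch in Python using SciPy; the rate constants and the single 100 mg depot dose are purely illustrative values, not taken from the text:

```python
import numpy as np
from scipy.integrate import solve_ivp

def pk_rhs(t, A, ka, k12, k21, ke):
    """Two-compartment model with first-order absorption and linear elimination."""
    Ad, Ac, Ap = A  # amounts in the depot, central and peripheral compartments
    return [-ka * Ad,
            ka * Ad - k12 * Ac + k21 * Ap - ke * Ac,
            k12 * Ac - k21 * Ap]

# illustrative rate constants (1/h) and a single 100 mg dose in the depot
ka, k12, k21, ke = 1.0, 0.5, 0.3, 0.2
sol = solve_ivp(pk_rhs, (0.0, 24.0), [100.0, 0.0, 0.0],
                args=(ka, k12, k21, ke), dense_output=True,
                rtol=1e-8, atol=1e-10)

Ad6, Ac6, Ap6 = sol.sol(6.0)  # amounts (mg) at t = 6 h
```

The depot equation has the closed-form solution $A_d(t) = A_d(0)\,e^{-k_a t}$, which provides a quick sanity check of the numerical solution.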
On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption], [http://en.wikipedia.org/wiki/Distribution_%28pharmacology%29 distribution] and elimination, these are not explicitly represented as they are in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1. ''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2. ''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order to make development simple and the code easy to parse. 
Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is fundamental to be able to use several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are limited in number. For instance, the PK macros in $\mlxtran$ allow us to code linear and nonlinear ([http://en.wikipedia.org/wiki/Michaelis%E2%80%93Menten_kinetics Michaelis-Menten]) elimination, but no macro exists that can combine the two types of elimination. In contrast, such processes can easily be written using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - k*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true, for instance, for dynamical systems with source terms, such as PK models with repeated oral doses and zero-order absorption. In that example, the absorption rate is a piecewise constant function.<br /> It is not easy to code this model using equations, and not worth it when we can quickly use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. 
The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the code that would be generated by a script using the corresponding system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed into two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate. The first step consists of describing with extreme precision the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70 kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it. Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique. 
We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is a reference value of volume in the population for a reference individual of 70 kg, and $\Vpop (w_i/70)^\beta$ is the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. The equations define variables algebraically, while the definitions characterize random variables via their probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows us to directly use the same definitions and equations, with a language very close to the mathematical one. 
The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(W/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(W/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Representations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model. This means that when any of them are written in text or on a slide, anyone with some basic knowledge in statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task. 
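The fact that (1), (2) and (3) define the same probabilistic model can also be checked empirically by simulation. Here is a sketch in Python with NumPy; the parameter values and the weight are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
Vpop, beta, omega = 10.0, 0.75, 0.2   # illustrative population parameters
w = 56.0                              # weight (kg) of one individual
n = 200_000                           # number of simulated replicates

# Representation (1): simulate eta ~ N(0, omega^2), then transform
eta = rng.normal(0.0, omega, size=n)
V1 = Vpop * (w / 70.0) ** beta * np.exp(eta)

# Representations (2)/(3): simulate log(V) directly, then exponentiate
logV = rng.normal(np.log(Vpop) + beta * np.log(w / 70.0), omega, size=n)
V2 = np.exp(logV)
```

Both samples are log-normal with log-mean $\log(\Vpop) + \beta \log(w/70)$ and log-standard deviation $\omega$, so their empirical moments agree up to Monte Carlo error.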
It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this with our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a random normal variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a random normal variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i = \exp\left(\log(V_i)\right)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. Thus, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters. In other words, deriving the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computing the pdf of $V_i$, or of some function of it. 
Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2}\,V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2} } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ are equal up to a constant: $L_2 = \left(\prod_{i=1}^N V_i\right) L_1$. Whichever form the likelihood is based on, it is nonetheless necessary to provide some information about the pdf of $V_i$ in order to compute it. 
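As a numerical check of the relationship between $L_1$ and $L_2$, both log-likelihoods can be computed in a few lines. The data and parameter values below are illustrative assumptions:

```python
import numpy as np

Vpop, beta, omega = 10.0, 0.75, 0.2    # illustrative parameter values
w = np.array([55.0, 70.0, 90.0])       # weights (kg) of N = 3 individuals
V = np.array([8.2, 10.5, 12.1])        # their observed volumes

mu = np.log(Vpop * (w / 70.0) ** beta)  # mean of log(V_i)

# log L1: sum of the log-normal log-pdfs of the V_i
logL1 = np.sum(-np.log(np.sqrt(2.0 * np.pi) * omega * V)
               - (np.log(V) - mu) ** 2 / (2.0 * omega ** 2))

# log L2: sum of the normal log-pdfs of the log(V_i)
logL2 = np.sum(-np.log(np.sqrt(2.0 * np.pi) * omega)
               - (np.log(V) - mu) ** 2 / (2.0 * omega ** 2))
```

On the log scale the multiplicative constant becomes additive: $\log L_2 - \log L_1 = \sum_i \log(V_i)$.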
In this very basic example, the minimal information about the model that needs to be passed to the software, via code, in order to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed;<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$;<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, the likelihood can easily be computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter that &quot;understands&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we might use a visual diagnostic check: plot the (predicted or simulated) log-volume against the log-weight and see whether the linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness-of-fit plot and using an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship. 
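Such a diagnostic can be sketched numerically: simulate log-volumes from the model and regress them on the log-weight; the fitted slope should recover $\beta$. The Python code below uses NumPy with illustrative, assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
Vpop, beta, omega = 10.0, 0.75, 0.2      # illustrative population parameters
w = rng.uniform(40.0, 100.0, size=5000)  # simulated weights (kg)
logV = rng.normal(np.log(Vpop) + beta * np.log(w / 70.0), omega)

# least-squares fit of log-volume on log-weight (centred at 70 kg):
# the slope estimates beta and the intercept estimates log(Vpop)
slope, intercept = np.polyfit(np.log(w / 70.0), logV, 1)
```

A scatter plot of `logV` against `np.log(w / 70.0)` with this fitted line is the visual counterpart of the same check.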
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spell out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{W_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Description,_representation_and_implementation_of_a_model&diff=7381 Description, representation and implementation of a model 2013-06-21T07:57:12Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model. Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width=500px&quot;| 1. 
First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of the representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the information required. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, models that we want to implement can be decomposed into two components: the structural model and the statistical model. Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look in more detail at these three things.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists in describing a model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a [http://en.wikipedia.org/wiki/Pharmacokinetics PK] model that describes drug concentration as a function of time. 
We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut, to the central compartment - the bloodstream), linear transfers between the central and the peripheral compartment, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can by represented by a diagram like the one shown the the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2. ''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> &lt;!--%\deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - \frac{V_m}{V\,K_m + A_c(t)} A_c(t) \\--&gt;<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point of time. 
On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption], [http://en.wikipedia.org/wiki/Distribution_%28pharmacology%29 distribution] and elimination, these are not explicitly represented like in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1. ''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2. ''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order make development simple and the code easy to parse. 
Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is fundamental to have the possibility of using several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are fixed in number by default. For instance, the PK macros in $\mlxtran$ allow us to code linear and non-linear (Michaelis-Menten) elimination, but no macro exists that can combine the two types of elimination. In contrast, such processes can be easily input using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - k*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true for instance for dynamical systems with source terms such as PK models with repeated oral doses and zero-order absorption. In that example, the absorption rate is a piecewise constant function.<br /> It is not easy to code this model using equations, and not worth it when we can quickly use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. 
The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the one (that would be) generated by a script using a system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed in two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate. The first step consists of describing with extreme precision the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it. Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique. 
We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is a reference value of volume in the population for a reference individual of 70kg and $\Vpop (w_i/70)^\beta$ the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. The equations allow us to define the variables via algebraic equations, while the definitions characterize the random variables via probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows the direct usage of the same definitions and equations with a language very close to the mathematical one. 
The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(w/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(w/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Representations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model. This means that when any of them are written in text or on a slide, anyone with some basic knowledge of statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task.
It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this on our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a random normal variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a random normal variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i = \exp\left(\log(V_i)\right)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. Thus, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters. In other words, deriving the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computing the pdf of $V_i$ or a certain function of it.
Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega\,V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ are equal up to the multiplicative constant $\prod_i V_i$, which does not depend on $\theta$. Whichever definition of the likelihood is used, it is nonetheless necessary to provide some information about the pdf of $V_i$ for computing the likelihood.
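The relationship between $L_1$ and $L_2$ can be checked numerically. The following Python sketch (a self-contained illustration with made-up parameter values, not $\mlxtran$) simulates volumes from representation (1) and evaluates the two log-likelihoods corresponding to (4) and (5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for illustration
Vpop, beta, omega = 10.0, 1.0, 0.3

# Simulate N individuals from representation (1): V_i = Vpop*(w_i/70)^beta * exp(eta_i)
N = 1000
w = rng.uniform(50, 90, size=N)          # weights (kg)
eta = rng.normal(0.0, omega, size=N)     # eta_i ~ N(0, omega^2)
V = Vpop * (w / 70) ** beta * np.exp(eta)

mu = np.log(Vpop) + beta * np.log(w / 70)   # mean of log(V_i)

# log L1: sum of log-normal log-densities of the V_i, as in (4)
logL1 = np.sum(-np.log(V * omega * np.sqrt(2 * np.pi))
               - (np.log(V) - mu) ** 2 / (2 * omega ** 2))

# log L2: sum of normal log-densities of the log(V_i), as in (5)
logL2 = np.sum(-np.log(omega * np.sqrt(2 * np.pi))
               - (np.log(V) - mu) ** 2 / (2 * omega ** 2))

# L2 = L1 * prod(V_i), i.e., log L2 = log L1 + sum(log V_i)
assert np.isclose(logL2, logL1 + np.sum(np.log(V)))
```

The final assertion verifies that the two likelihoods differ only by the constant $\prod_i V_i$, which is irrelevant when maximizing over $\theta$.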
In this very basic example, the minimal information about the model that needs to be passed to the software via code to be able to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, the likelihood can be easily computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter to &quot;understand&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we can visually inspect a plot of the (predicted or simulated) log-volume against the log-weight and check that the linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness-of-fit plot and using an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship.
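As a concrete illustration of such a diagnostic, the following Python sketch (again illustrative only, with made-up parameter values) simulates volumes from the model and recovers the linear covariate relationship by least squares; this is exactly the kind of computation the software can only run automatically when the linear covariate model has been declared to it explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values, for illustration only
Vpop, beta, omega = 10.0, 0.75, 0.2

# Simulate log-volumes for N individuals: log V_i ~ N(log(Vpop) + beta*log(w_i/70), omega^2)
N = 2000
w = rng.uniform(40, 100, size=N)
logV = rng.normal(np.log(Vpop) + beta * np.log(w / 70), omega)

# Least-squares fit of log(V) against log(w/70):
# the slope estimates beta, the intercept estimates log(Vpop)
slope, intercept = np.polyfit(np.log(w / 70), logV, 1)
# slope should be close to beta, exp(intercept) close to Vpop
```

Plotting `logV` against `np.log(w / 70)` together with the fitted line is the visual check described above; the fitted slope and intercept feed the formal linearity tests.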
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spells out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{w_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega^2),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Description,_representation_and_implementation_of_a_model&diff=7380 Description, representation and implementation of a model 2013-06-21T07:54:26Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model. Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width:500px&quot;| 1.
First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;/span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b}}<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of the representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the information required. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, models that we want to implement can be decomposed into two components: the structural model and the statistical model. Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look in more detail at these three things.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists in describing a model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a [http://en.wikipedia.org/wiki/Pharmacokinetics PK] model that describes drug concentration as a function of time.
We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut, to the central compartment - the bloodstream), linear transfers between the central and the peripheral compartment, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can be represented by a diagram like the one shown in the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2. ''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point in time. On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of absorption, distribution and elimination, these are not explicitly represented as they are in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1.
''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2. ''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order to make development simple and the code easy to parse. Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is fundamental to have the possibility of using several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are fixed in number by default. For instance, the PK macros in $\mlxtran$ allow us to code linear and non-linear (Michaelis-Menten) elimination, but no macro exists that can combine the two types of elimination.
In contrast, such processes can be easily input using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - ke*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true for instance for dynamical systems with source terms such as PK models with repeated oral doses and zero-order absorption. In that example, the absorption rate is a piecewise constant function.<br /> It is not easy to code this model using equations, and not worth it when we can quickly use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the code that would be generated by a script using a system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed into two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate.
The first step consists of describing with extreme precision the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it. Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique. 
We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is a reference value of the volume in the population for a reference individual of 70kg, and $\Vpop (w_i/70)^\beta$ is the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. The equations define the variables algebraically, while the definitions characterize the random variables via probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows direct use of the same definitions and equations with a language very close to the mathematical one.
The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(w/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(w/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Representations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model. This means that when any of them are written in text or on a slide, anyone with some basic knowledge of statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task.
It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this on our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a random normal variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a random normal variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i = \exp\left(\log(V_i)\right)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. Thus, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters. In other words, deriving the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computing the pdf of $V_i$ or a certain function of it.
Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega\,V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi}\,\omega } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ are equal up to the multiplicative constant $\prod_i V_i$, which does not depend on $\theta$. Whichever definition of the likelihood is used, it is nonetheless necessary to provide some information about the pdf of $V_i$ for computing the likelihood.
In this very basic example, the minimal information about the model that needs to be passed to the software via code to be able to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, the likelihood can be easily computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter to &quot;understand&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we can visually inspect a plot of the (predicted or simulated) log-volume against the log-weight and check that the linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness-of-fit plot and using an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship.
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spells out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{w_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega^2),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=Description,_representation_and_implementation_of_a_model&diff=7379 Description, representation and implementation of a model 2013-06-21T07:52:43Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model. Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width:500px&quot;| 1.
First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of the representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the information required. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, models that we want to implement can be decomposed into two components: the structural model and the statistical model. Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look in more detail at these three things.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists in describing a model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a PK model that describes drug concentration as a function of time. 
We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut, to the central compartment - the bloodstream), linear transfers between the central and the peripheral compartment, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can by represented by a diagram like the one shown the the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2. ''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> &lt;!--%\deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - \frac{V_m}{V\,K_m + A_c(t)} A_c(t) \\--&gt;<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point of time. On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of absorption, distribution and elimination, these are not explicitly represented like in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1. 
''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2. ''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order make development simple and the code easy to parse. Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is fundamental to have the possibility of using several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are fixed in number by default. For instance, the PK macros in $\mlxtran$ allow us to code linear and non-linear (Michaelis-Menten) elimination, but no macro exists that can combine the two types of elimination. 
In contrast, such processes can easily be written using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - ke*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true for instance for dynamical systems with source terms, such as PK models with repeated oral doses and zero-order absorption. In that example, the absorption rate is a piecewise constant function.<br /> It is not easy to code this model using equations, and not worth it when we can simply use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the code that would be generated by a script using the equivalent system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed into two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate.
The first step consists of precisely describing the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70 kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it. Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique.
We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is a reference value of the volume in the population for a reference individual of 70 kg, and $\Vpop (w_i/70)^\beta$ is the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. The equations define variables algebraically, while the definitions characterize random variables via their probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows the direct usage of the same definitions and equations, with a language very close to the mathematical one.
The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(w/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(w/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Equations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model. This means that when any of them is written in text or on a slide, anyone with a basic knowledge of statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task.
It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this with our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a random normal variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a random normal variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i$ is obtained by exponentiating this simulated value.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. In conclusion, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters. In other words, deriving the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computation of the pdf of $V_i$ or of a certain function of it.
Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2}\,V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2} } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ are equal up to the multiplicative constant $\prod_i V_i$, which does not depend on $\theta$. Whichever definition of the likelihood is used, it is nonetheless necessary to provide some information about the pdf of $V_i$ in order to compute the likelihood.
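As a quick numerical sanity check, the following pure-Python sketch (with hypothetical parameter values, not taken from the text) simulates volumes from representation (1) and verifies that the log-likelihoods corresponding to (4) and (5) differ exactly by $\sum_i \log(V_i)$, i.e., that $L_1$ and $L_2$ agree up to the factor $\prod_i V_i$:

```python
import math
import random

# Hypothetical population parameters (for illustration only)
Vpop, beta, omega = 10.0, 0.75, 0.3

random.seed(1)
weights = [random.uniform(50, 90) for _ in range(20)]
# Simulate volumes from representation (1): V_i = Vpop*(w_i/70)^beta * exp(eta_i)
volumes = [Vpop * (w / 70) ** beta * math.exp(random.gauss(0, omega))
           for w in weights]

def log_L1(V, w):
    """Log-likelihood based on the log-normal pdf of V_i, cf. (4)."""
    total = 0.0
    for Vi, wi in zip(V, w):
        mu = math.log(Vpop * (wi / 70) ** beta)
        total += -math.log(math.sqrt(2 * math.pi) * omega * Vi) \
                 - (math.log(Vi) - mu) ** 2 / (2 * omega ** 2)
    return total

def log_L2(V, w):
    """Log-likelihood based on the normal pdf of log(V_i), cf. (5)."""
    total = 0.0
    for Vi, wi in zip(V, w):
        mu = math.log(Vpop * (wi / 70) ** beta)
        total += -math.log(math.sqrt(2 * math.pi) * omega) \
                 - (math.log(Vi) - mu) ** 2 / (2 * omega ** 2)
    return total

# log L2 = log L1 + sum_i log(V_i), i.e. L2 = L1 * prod_i V_i
gap = log_L2(volumes, weights) - log_L1(volumes, weights)
assert abs(gap - sum(math.log(Vi) for Vi in volumes)) < 1e-9
```

Since this factor does not depend on $\theta$, maximizing either likelihood yields the same parameter estimates.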
In this very basic example, the minimal information about the model that needs to be passed to the software via code to be able to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, the likelihood can be easily computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter to &quot;understand&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we might plot the (predicted or simulated) log-volume against the log-weight and visually check whether this linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness of fit plot and using an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship.
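To sketch what such a diagnostic could look like (a pure-Python illustration on hypothetical simulated data, not a description of any specific software), one can regress the simulated log-volume on the log-weight and compare the fitted slope with $\beta$:

```python
import math
import random

# Hypothetical data simulated from the covariate model:
# log(V_i) = log(Vpop) + beta*log(w_i/70) + eta_i
random.seed(2)
Vpop, beta, omega = 10.0, 0.75, 0.1
w = [random.uniform(50, 90) for _ in range(200)]
x = [math.log(wi / 70) for wi in w]                       # log-weight, centered at 70 kg
y = [math.log(Vpop) + beta * xi + random.gauss(0, omega)  # log-volume
     for xi in x]

# Ordinary least-squares fit of log-volume on log-weight
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
intercept = ybar - slope * xbar

# If the linear covariate model holds, the fitted slope should be close
# to beta and the intercept close to log(Vpop).
assert abs(slope - beta) < 0.2
assert abs(intercept - math.log(Vpop)) < 0.2
```

A scatter plot of `y` against `x` with the fitted line overlaid is the corresponding visual check.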
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spells out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{w_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega^2),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7375 What is a model? A joint probability distribution! 2013-06-19T09:37:31Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution.
The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution can be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependence with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables.
Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. 
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. 
One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. There are various reasons for this, such as if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model an inter-population variability if the model is not looking at only one given population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. 
As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not necessarily all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play the role either of random variables, whose distribution is defined by the model, or of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
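In the running PK example, that distribution is the pdf of the observations with the individual parameters integrated out, $\int \pypsi(\by,\bpsi ; \theta) \, d\bpsi$, which has no closed form and must in practice be approximated. The following is a minimal, hypothetical Python sketch of a plain Monte Carlo approximation for a single individual; all numerical values are invented for illustration and are not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for the running PK example (one individual, 500 mg bolus).
t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])              # measurement times (h)
V_true, k_true, a_true = 20.0, 0.3, 0.5
y = 500.0 / V_true * np.exp(-k_true * t) + rng.normal(0.0, a_true, t.size)

def log_likelihood(theta, y, t, n_mc=100_000):
    """Monte Carlo approximation of log of int p(y | psi) p(psi; theta) dpsi.

    psi = (V, k) is integrated out by sampling from its log-normal
    population distribution; theta = (V_pop, k_pop, omega_V, omega_k, a).
    """
    V_pop, k_pop, omega_V, omega_k, a = theta
    V = np.exp(rng.normal(np.log(V_pop), omega_V, n_mc))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, n_mc))
    f = 500.0 / V[:, None] * np.exp(-np.outer(k, t))   # (n_mc, len(t))
    log_p = np.sum(-0.5 * ((y - f) / a) ** 2
                   - np.log(a * np.sqrt(2.0 * np.pi)), axis=1)
    m = log_p.max()                                    # stabilized log-mean-exp
    return m + np.log(np.mean(np.exp(log_p - m)))

print(log_likelihood((20.0, 0.3, 0.2, 0.2, 0.5), y, t))
```

Population values close to the (hypothetical) true ones yield a larger approximated log-likelihood than distant ones; this is the quantity a maximum likelihood algorithm maximizes over $\theta$.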
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
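To make the idea concrete, here is a hypothetical Python sketch comparing two sampling-time designs through the D-criterion, the log-determinant of an information matrix. For simplicity it uses the Fisher information of the individual (fixed-effects) PK model $y_j = (500/V)e^{-k t_j} + e_j$ rather than the full population-level expected information, which is more involved; all numerical values are invented:

```python
import numpy as np

def individual_fim(times, V=20.0, k=0.3, a=0.5, dose=500.0):
    """Fisher information matrix for the fixed-effects PK model
    y_j = (dose / V) * exp(-k * t_j) + e_j,  e_j ~ N(0, a^2):
    FIM = J^T J / a^2, where J is the Jacobian of f with respect to (V, k)."""
    t = np.asarray(times, dtype=float)
    e = np.exp(-k * t)
    J = np.column_stack([-dose / V**2 * e,       # df/dV
                         -dose / V * t * e])     # df/dk
    return J.T @ J / a**2

def d_criterion(times):
    """D-optimality criterion: log-determinant of the information matrix."""
    _, logdet = np.linalg.slogdet(individual_fim(times))
    return logdet

# A design spread over the elimination phase identifies (V, k) better
# than one clustered at early times.
print(d_criterion([1, 2, 4, 8]), d_criterion([0.25, 0.5, 0.75, 1.0]))
```

An optimal design algorithm would maximize such a criterion with respect to the sampling times.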
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
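Such a probability rarely has a closed form, but it can be approximated by simulating the joint model many times. A hypothetical Python sketch for the running PK example, with the target ${\cal A}$ taken to be a measured concentration at 12h lying inside an invented therapeutic window (all numerical values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_target(V_pop=20.0, k_pop=0.3, omega_V=0.2, omega_k=0.2, a=0.5,
                dose=500.0, t_obs=12.0, window=(0.5, 5.0), n_sim=200_000):
    """Monte Carlo estimate of P((y, psi) in A ; u, t, theta), obtained by
    simulating the model hierarchy: psi from p(psi; theta), then y from
    p(y | psi).  The target A (a concentration window at t_obs) is purely
    illustrative."""
    V = np.exp(rng.normal(np.log(V_pop), omega_V, n_sim))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, n_sim))
    f = dose / V * np.exp(-k * t_obs)                  # structural prediction
    y = f + rng.normal(0.0, a, n_sim)                  # measured concentration
    lo, hi = window
    return float(np.mean((y >= lo) & (y <= hi)))

print(prob_target())
```

Optimizing the trial design then amounts to maximizing such an estimate with respect to the dose regimen $\bu$ and the measurement times $\bt$.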
This may include optimizing safety and efficacy, and things like the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], etc.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7374 What is a model? A joint probability distribution! 2013-06-19T09:35:03Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for doing this: for example, to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ necessarily need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
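As a sketch, this hierarchical model could be simulated top-down: first draw $\theta_R=(V_{\rm pop},k_{\rm pop})$ from (6), then the individual parameters from (3), then the concentrations from (2). In the following Python sketch, the number of patients, the measurement times, the given weights and the values of $\theta_F=(\omega_V,\omega_k,\beta,a)$ are all illustrative assumptions, not part of the model above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                         # number of patients (illustrative)
t = np.linspace(1, 24, 8)      # measurement times in hours (illustrative)
w = np.linspace(50.0, 90.0, N) # given individual weights in kg (illustrative)

# Draw the random population parameters theta_R from their priors, as in (6)
V_pop = rng.normal(30, 3)
k_pop = rng.normal(0.1, 0.01)

# Fixed population parameters theta_F (illustrative values)
omega_V, omega_k, beta, a = 0.2, 0.2, 0.75, 0.5

# Draw the individual PK parameters, as in (3)
V = np.exp(rng.normal(np.log(V_pop) + beta * np.log(w / 70), omega_V))
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))

# Draw the concentrations, as in (2): y_ij ~ N(500/V_i * exp(-k_i t_ij), a^2)
f = 500 / V[:, None] * np.exp(-k[:, None] * t[None, :])
y = rng.normal(f, a)
```

Each run first generates a new population value of $(V_{\rm pop},k_{\rm pop})$, so repeated runs simulate different populations.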
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
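As an illustration, the dose-regimen perturbations (12) and (13) and the prediction (11) could be simulated as follows. This is a Python sketch for a single patient; the nominal regimen, the value of $\pi$ and the values of $V_i$ and $k_i$ passed to $f$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Nominal regimen: 100 mg every 12 h over 3 days (illustrative)
tau_nom = np.arange(0.0, 72.0, 12.0)   # nominal administration times (h)
d_nom = np.full_like(tau_nom, 100.0)   # nominal doses (mg)
pi = 0.1                               # probability a dose is not taken, as in (13)

# (12): the actual administration times fluctuate around the nominal times
tau = rng.normal(tau_nom, 0.02)

# (13): each dose is either taken in full or skipped entirely
d = np.where(rng.random(tau_nom.shape) < pi, 0.0, d_nom)

# (11): predicted concentration at time t for given V_i and k_i
def f(t, V, k):
    past = tau < t
    return np.sum(d[past] / V * np.exp(-k * (t - tau[past])))
```

A full simulation would then draw each observation $y_{ij}$ around $f(t_{ij};V_i,k_i)$ as in (10).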
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we treat $\by$ as a random variable, while the other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
This may include optimizing safety and efficacy, and things like the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], etc.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7373 What is a model? A joint probability distribution! 2013-06-19T09:34:04Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables, while those which appear after it are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
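Before doing so, it can help to see the single-individual model in action. The following Python sketch simulates observations according to (1); the values chosen for $\psi=(V,k,a)$ and the measurement times are purely illustrative:

```python
import numpy as np

# Simulate the single-individual model: f(t; V, k) = (500 / V) * exp(-k t),
# y_j ~ N(f(t_j; V, k), a^2). The values of psi = (V, k, a) and the design
# are illustrative, not taken from the text.
rng = np.random.default_rng(0)

V, k, a = 10.0, 0.2, 0.5                    # psi = (V, k, a)
t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])    # design: measurement times t_j

f = 500.0 / V * np.exp(-k * t)              # structural PK model f(t_j; V, k)
y = f + a * rng.normal(size=t.shape)        # observations with residual error
print(np.round(y, 2))
```

Each run with a different seed gives a new realization of $y$, which is exactly what the distribution $\qy(\, \cdot \, ; \psi,\vt)$ describes.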
It is natural to suppose that each individual is represented by the same basic parametric model, but not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ necessarily need to be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. The inputs of the model are now the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play either the role of random variables, whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
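Both methods discussed next are built on densities derived from the joint distribution $\qypsi$. As a concrete anchor, the following Python sketch evaluates $\log \pypsi(\by,\bpsi ; \theta)$ pointwise for a single individual of the running PK example with no covariates; the function names, parameter ordering and numerical values are hypothetical, and the individual-parameter density is written in terms of $\log V$ and $\log k$:

```python
import numpy as np

def norm_logpdf(x, mean, sd):
    """Elementwise log density of N(mean, sd^2)."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - (x - mean) ** 2 / (2.0 * sd**2)

def joint_logpdf(y, t, V, k, theta):
    """log p(y, psi; theta) = log p(y | psi; t) + log p(psi; theta) for one
    individual of the running PK example, psi = (V, k), no covariates.
    theta = (V_pop, k_pop, omega_V, omega_k, a); the psi-density is expressed
    in terms of log V and log k (both Gaussian)."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    f = 500.0 / V * np.exp(-k * t)                      # structural model
    ll_y = norm_logpdf(y, f, a).sum()                   # log p(y | psi; t)
    ll_psi = (norm_logpdf(np.log(V), np.log(V_pop), omega_V)
              + norm_logpdf(np.log(k), np.log(k_pop), omega_k))
    return ll_y + ll_psi

theta = (10.0, 0.2, 0.3, 0.3, 0.5)     # illustrative population parameters
t = np.array([1.0, 2.0, 4.0])
y = np.array([40.9, 33.5, 22.5])       # close to f(t; V=10, k=0.2)
print(joint_logpdf(y, t, 10.0, 0.2, theta))
```

Maximizing this quantity over $\theta$ after integrating out $\bpsi$ gives the maximum likelihood estimator described below.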
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
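To make "optimizing some statistical criteria" concrete, here is a minimal Python sketch (not part of the original wiki; the parameter values and candidate sampling times are illustrative assumptions) that evaluates a D-optimality criterion, the log-determinant of the Fisher information matrix, for the one-compartment bolus model $f(t;V,k)=\frac{500}{V}e^{-k\,t}$ of the running example, treating $(V,k)$ as fixed parameters with a known residual standard deviation $a$:

```python
import numpy as np

def fisher_information(t, V, k, a):
    # For y_j ~ N(f(t_j; V, k), a^2) with f(t) = (500/V) * exp(-k t) and known a,
    # the Fisher information matrix is (1/a^2) * J^T J, where J is the Jacobian
    # of f with respect to (V, k) evaluated at the sampling times t.
    dV = -500.0 / V**2 * np.exp(-k * t)      # df/dV at each time point
    dk = -t * 500.0 / V * np.exp(-k * t)     # df/dk at each time point
    J = np.column_stack([dV, dk])
    return J.T @ J / a**2

def d_criterion(t, V, k, a):
    # D-optimality criterion: log-determinant of the Fisher information matrix
    # (the matrix is positive definite here, so the sign from slogdet is +1).
    sign, logdet = np.linalg.slogdet(fisher_information(t, V, k, a))
    return logdet

# Illustrative (assumed) parameter values and two candidate designs.
V, k, a = 10.0, 0.2, 1.0
early = np.array([0.5, 1.0, 2.0, 4.0])    # sampling times clustered early
spread = np.array([1.0, 4.0, 8.0, 16.0])  # times spread over the elimination phase
print(d_criterion(early, V, k, a), d_criterion(spread, V, k, a))
```

Comparing the criterion across candidate time vectors is a much simplified stand-in for maximizing ${\cal D}(\bu,\bt)$: in a full population model the expectation over the random effects makes the expected Fisher information matrix considerably harder to compute.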
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
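Such a target probability can be estimated by straightforward Monte Carlo simulation from the joint model. Below is a minimal Python sketch (not from the original wiki; covariates are omitted, and the population values, observation time and threshold defining ${\cal A}$ are illustrative assumptions) for the running PK example, where the target is that the observed concentration at a given time exceeds a threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def target_probability(n_sim, t, threshold, V_pop=30.0, k_pop=0.1,
                       omega_V=0.3, omega_k=0.3, a=0.5, dose=500.0):
    # Simulate individual parameters from log-normal population distributions,
    # then observations y ~ N(f(t; V_i, k_i), a^2) with f(t) = (dose/V) exp(-k t),
    # and estimate P(y > threshold) as a Monte Carlo average over individuals.
    V = V_pop * np.exp(omega_V * rng.standard_normal(n_sim))
    k = k_pop * np.exp(omega_k * rng.standard_normal(n_sim))
    f = dose / V * np.exp(-k * t)
    y = f + a * rng.standard_normal(n_sim)
    return np.mean(y > threshold)

p = target_probability(100_000, t=24.0, threshold=1.0)
print(p)
```

In an optimal design setting, such a simulation would be repeated for each candidate design $(\bu,\bt)$ and the design maximizing the estimated probability retained.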
This may include optimizing safety and efficacy criteria, such as the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response].<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \ppsi(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \ppsi(\bpsi ; \theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7372 What is a model? A joint probability distribution! 2013-06-19T09:32:32Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for doing so: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is involved.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need to be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of roles that the variables in a model can play: random variables whose distribution is defined by the model, nonrandom variables, or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
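To make this concrete before looking at specific methods, the observed likelihood $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ can, in simple cases, be approximated by plain Monte Carlo: draw individual parameters from the population distribution and average the conditional density of the observations. The sketch below does this for a single individual of the running one-compartment example; it is only an illustration, and every function name and numerical value in it is an assumption, not something defined in this chapter.

```python
import numpy as np

def observed_loglik(y, t, theta, M=10_000, seed=None):
    """Monte Carlo estimate of log p(y; theta) = log E[p(y | psi)] for one
    individual, under the toy one-compartment model (an assumed example):
      f(t; V, k) = 500/V * exp(-k t),   y_j ~ N(f(t_j; V, k), a^2)
      log V ~ N(log V_pop, omega_V^2),  log k ~ N(log k_pop, omega_k^2)
    """
    rng = np.random.default_rng(seed)
    V_pop, k_pop, omega_V, omega_k, a = theta
    # 1) draw M individual parameters psi^(m) from p(psi; theta)
    V = np.exp(rng.normal(np.log(V_pop), omega_V, size=M))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, size=M))
    # 2) conditional log-density log p(y | psi^(m)) for each draw
    f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
    ll = (-0.5 * np.sum(((y[None, :] - f) / a) ** 2, axis=1)
          - t.size * np.log(a * np.sqrt(2.0 * np.pi)))
    # 3) average p(y | psi^(m)) over the draws (log-sum-exp for stability)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))

# Simulate one individual, then compare the likelihood at two values of theta
rng = np.random.default_rng(0)
t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])
V_i, k_i, a = 28.0, 0.11, 0.5
y = 500.0 / V_i * np.exp(-k_i * t) + rng.normal(0.0, a, size=t.size)
theta_true = (30.0, 0.1, 0.3, 0.3, 0.5)
theta_off = (60.0, 0.3, 0.3, 0.3, 0.5)
ll_true = observed_loglik(y, t, theta_true, seed=1)
ll_off = observed_loglik(y, t, theta_off, seed=1)
print(ll_true > ll_off)  # parameters close to the truth should fit better
```

Such a naive estimator becomes impractical as the dimension of $\bpsi$ grows, which is one reason dedicated algorithms exist for this task; the point here is only that the quantity being maximized is an integral over the individual parameters.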
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
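As a small illustration of what optimizing such a criterion can look like, the sketch below scores two candidate sets of sampling times for a simple exponential-decay model through the determinant of a Fisher information matrix (D-optimality). It is a deliberate simplification of the population setting: random effects are ignored, the parameter values are taken as known, and all names and numbers are assumptions made for this example.

```python
import numpy as np

def fim(times, V=30.0, k=0.1, a=0.5):
    """Fisher information matrix for (V, k) in the toy model
    f(t) = 500/V * exp(-k t) with i.i.d. N(0, a^2) residual errors.
    (Individual-level approximation: random effects are ignored.)"""
    t = np.asarray(times, dtype=float)
    f = 500.0 / V * np.exp(-k * t)
    # sensitivity of f to (V, k) at each sampling time: df/dV, df/dk
    J = np.column_stack([-f / V, -t * f])
    return J.T @ J / a**2

def d_criterion(times):
    """D-optimality criterion: determinant of the information matrix."""
    return np.linalg.det(fim(times))

# Two candidate designs with the same number of samples
clustered = [0.5, 1.0, 1.5, 2.0]    # all samples early
spread = [0.5, 4.0, 10.0, 20.0]     # early and late samples
print(d_criterion(spread) > d_criterion(clustered))
```

Spreading the sampling times over the elimination phase makes the two sensitivity columns less collinear, so the spread design carries more information about $(V,k)$.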
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
This may include optimizing safety and efficacy, for example the probability of reaching a [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response].<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7371 What is a model? A joint probability distribution! 2013-06-19T09:30:05Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution.
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals.
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: for example, to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is being studied.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because, when modeling, we want to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
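To make the notion of a source term concrete, a dose regimen can be encoded as a list of bolus events feeding a one-compartment model with linear elimination. The following Python sketch is purely illustrative (the dose amounts, times and parameter values are invented, not taken from any real protocol): each administered bolus contributes to the concentration by superposition.

```python
import math

def concentration(t, regimen, V, k):
    """Concentration at time t for a one-compartment model with linear
    elimination, the source term being a list of IV bolus events.
    Each administered bolus (d, tau) with tau <= t contributes
    (d / V) * exp(-k * (t - tau)) to the concentration."""
    return sum(d / V * math.exp(-k * (t - tau))
               for (d, tau) in regimen if tau <= t)

# Illustrative source term u_i: two 500 mg boluses, 12 h apart
u_i = [(500.0, 0.0), (500.0, 12.0)]
c_before = concentration(6.0, u_i, V=10.0, k=0.2)   # only the first dose acts
c_after = concentration(18.0, u_i, V=10.0, k=0.2)   # both doses contribute
```

An infusion or oral administration would simply use a different input function, but the principle is the same: the regimen enters the structural model as a known input.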
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model play the role either of random variables, whose distribution is defined by the model, or of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable. The other components of $z$ can be treated as nonrandom variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
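To make this concrete in the running PK example, the joint density such methods rely on can be evaluated directly, since it decomposes into a product of conditional distributions. The following Python sketch (all numerical values are illustrative) computes, for a single individual with $\psi=(V,k)$, the joint log-density log p(y, psi; theta) = log p(y | psi) + log p(psi; theta), with log-normal individual parameters and f(t) = 500/V * exp(-k t):

```python
import math

def norm_logpdf(x, mean, sd):
    """Log-density of a normal distribution N(mean, sd^2)."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def joint_logpdf(y, t, V, k, theta):
    """log p(y, psi; theta) = log p(y | psi) + log p(psi; theta) for one
    individual of the running example, with psi = (V, k) log-normally
    distributed and y_j ~ N(f(t_j; V, k), a^2), f(t) = 500/V * exp(-k t)."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    # log p(psi; theta): V and k are log-normal (the -log(V) term is the Jacobian)
    lp = norm_logpdf(math.log(V), math.log(V_pop), omega_V) - math.log(V)
    lp += norm_logpdf(math.log(k), math.log(k_pop), omega_k) - math.log(k)
    # log p(y | psi): independent normal observations around the prediction f
    for y_j, t_j in zip(y, t):
        f = 500.0 / V * math.exp(-k * t_j)
        lp += norm_logpdf(y_j, f, a)
    return lp

lp = joint_logpdf(y=[40.0, 33.0], t=[1.0, 2.0], V=10.0, k=0.2,
                  theta=(10.0, 0.2, 0.3, 0.3, 1.0))
```

Maximum likelihood then integrates this joint density over $\bpsi$, while Bayesian methods further multiply it by a prior on $\theta$.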
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
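Such probabilities are rarely available in closed form, but they can be estimated by simulating from the joint model. Here is a minimal Monte Carlo sketch for the running example; the target (a concentration above a threshold at a given time) and all numerical values are illustrative assumptions, not part of any real study design:

```python
import math
import random

def estimate_target_prob(theta, t, threshold=1.0, n_sim=10_000, seed=0):
    """Monte Carlo estimate of P((y, psi) in A; t, theta) for the running
    example, with the illustrative target A = {the concentration y measured
    at time t exceeds `threshold`}: simulate psi from p(psi; theta), then y
    from p(y | psi), and count how often the target is reached."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        V = math.exp(rng.gauss(math.log(V_pop), omega_V))  # psi ~ p(psi; theta)
        k = math.exp(rng.gauss(math.log(k_pop), omega_k))
        y = rng.gauss(500.0 / V * math.exp(-k * t), a)     # y ~ p(y | psi)
        hits += y > threshold
    return hits / n_sim

p_hat = estimate_target_prob((10.0, 0.2, 0.3, 0.3, 1.0), t=12.0)
```

Optimizing the design would then amount to repeating this estimation for candidate values of the design variables and keeping the best one.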
This may include optimizing safety and efficacy, and things like the probability of reaching [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], etc.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
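As an illustration of the simulation task, the same hierarchical model can be sampled outside $\mlxtran$, e.g., with R or Matlab as mentioned above. Here is an equivalent Python sketch; the population values, the design and the number of individuals are illustrative choices:

```python
import math
import random

def simulate_example1(theta, times, N, seed=1):
    """Draw (psi_i, y_ij) from the joint model of Example 1:
    log V_i ~ N(log V_pop, omega_V^2), log k_i ~ N(log k_pop, omega_k^2),
    y_ij ~ N(f(t_ij; V_i, k_i), a^2) with f(t; V, k) = 500/V * exp(-k t)."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    rng = random.Random(seed)
    psi, y = [], []
    for _ in range(N):
        V = math.exp(rng.gauss(math.log(V_pop), omega_V))
        k = math.exp(rng.gauss(math.log(k_pop), omega_k))
        psi.append((V, k))
        y.append([rng.gauss(500.0 / V * math.exp(-k * t), a) for t in times])
    return psi, y

# Illustrative design: 5 individuals sampled at 4 time points each
psi, y = simulate_example1((10.0, 0.2, 0.3, 0.3, 0.5),
                           times=[1.0, 2.0, 4.0, 8.0], N=5)
```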
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7370 What is a model? A joint probability distribution! 2013-06-19T09:28:22Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
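Before doing so, the single-individual model above can be sketched numerically. The following Python fragment is illustrative only: the values of $V$, $k$, $a$ and the design are hypothetical, not taken from the text. Simulating observations according to [[#ex_proba1|(1)]] is nothing more than drawing from the model's distribution:

```python
import math
import random

random.seed(42)

# Hypothetical parameter values and design, for illustration only.
V, k, a = 10.0, 0.2, 0.5                         # volume (L), elimination rate (1/h), error SD
times = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0]    # measurement times t_j (h)

def f(t, V, k, dose=500.0):
    """Structural PK model: concentration after a 500 mg IV bolus."""
    return dose / V * math.exp(-k * t)

# Equation (1): y_j ~ N(f(t_j; V, k), a^2), independently over j
y = [random.gauss(f(t, V, k), a) for t in times]
```

Because the $y_j$ are independent, the joint pdf of $y$ is simply the product of the $n$ normal densities, as stated above.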
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
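The hierarchical model of the example above, equations [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]], translates directly into a two-stage simulation, and treating $\theta$ as random would simply add one more sampling stage on top. A minimal Python sketch, with hypothetical values for $\theta$ and the weights (not estimates from the text):

```python
import math
import random

random.seed(1)

# Hypothetical population parameters theta and covariates, for illustration only.
V_pop, k_pop, omega_V, omega_k, beta, a = 30.0, 0.1, 0.3, 0.2, 1.0, 0.5
weights = [55.0, 70.0, 90.0]      # covariates w_i (kg)
times = [1.0, 2.0, 4.0, 8.0]      # a common design, for brevity

data = []
for w in weights:
    # Equation (3): log(V_i) ~ N(log(V_pop) + beta*log(w_i/70), omega_V^2)
    V_i = math.exp(random.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V))
    # log(k_i) ~ N(log(k_pop), omega_k^2)
    k_i = math.exp(random.gauss(math.log(k_pop), omega_k))
    # Equation (2): y_ij ~ N(500/V_i * exp(-k_i * t_ij), a^2)
    y_i = [random.gauss(500.0 / V_i * math.exp(-k_i * t), a) for t in times]
    data.append((V_i, k_i, y_i))
```

To make $\theta$ random as well, one would draw $V_{\rm pop}$ and $k_{\rm pop}$ from their own distribution before this loop.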
There are various reasons for this, such as if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model an inter-population variability if the model is not looking at only one given population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not necessarily all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because, when modeling, we want to take into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
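Before detailing dose regimens, note that the time-uncertainty and dropout models just described each reduce to a single sampling step. A short Python sketch (illustrative only; the exponential distribution for the dropout time $T_i$ is an assumption, not taken from the text):

```python
import random

random.seed(7)

nominal_times = [1.0, 2.0, 4.0, 8.0, 12.0]   # nominal design, as reported in the data set

# Equation (9): "true" times fluctuate around the nominal ones
true_times = [random.gauss(t, 0.03) for t in nominal_times]

# Dropout: keep only the nominal times occurring before a random dropout time T_i
# (an exponential T_i with mean 10 is purely illustrative).
T_i = random.expovariate(1.0 / 10.0)
observed_times = [t for t in nominal_times if t <= T_i]
```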
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it is a random variable. The other components of $z$ can be treated as non random variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
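The integral $\int \pypsi(\by,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$ required above can be approximated by plain Monte Carlo: draw $\bpsi^{(m)}$ from $\ppsi(\, \cdot \, ; \theta)$ and average the conditional densities of $\by$. A Python sketch for a single individual, with made-up numerical values (for illustration only; real tools use more refined methods, as noted above):

```python
import math
import random

random.seed(3)

times = [1.0, 2.0, 4.0, 8.0]
y_obs = [40.0, 33.0, 25.0, 15.0]      # made-up observations
theta = (12.0, 0.15, 0.3, 0.2, 2.0)   # hypothetical (V_pop, k_pop, omega_V, omega_k, a)

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def observed_likelihood(theta, M=2000):
    """Monte Carlo estimate of L(theta; y) = E[ p(y | psi) ] under psi ~ p(psi; theta)."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    total = 0.0
    for _ in range(M):
        # psi^(m) ~ p(psi; theta): log-normal V and k
        V = math.exp(random.gauss(math.log(V_pop), omega_V))
        k = math.exp(random.gauss(math.log(k_pop), omega_k))
        # p(y | psi^(m)): product of normal densities around the structural model
        p = 1.0
        for t, y in zip(times, y_obs):
            p *= normal_pdf(y, 500.0 / V * math.exp(-k * t), a)
        total += p
    return total / M

L = observed_likelihood(theta)
```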
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
The target may combine safety and efficacy criteria, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi ; \theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by {{!}} \bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools to execute different tasks: for example, with $\mlxplore$ for model exploration, with $\monolix$ for modeling, or with R or Matlab for simulation.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7369 What is a model? A joint probability distribution! 2013-06-19T09:26:32Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables, while those that appear after it are nonrandom parameters or variables.<br /> When there is no risk of confusion, the nonrandom terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$: $V$ and $k$ are the PK parameters of the structural PK model and $a$ is the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each individual $i$ has a potentially unique set of measurement times $t_i=(t_{i1},\ldots,t_{in_i})$ in the design, and the number of measurements $n_i$ can differ between individuals.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for doing so: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being studied.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
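This &quot;fluctuation around nominal values&quot; mechanism is simple to sketch in code. In the following illustrative snippet (the standard deviation and the dropout time are hypothetical values chosen for the example, not taken from the text), measurement times are drawn around the nominal ones and then truncated at a dropout time $T_i$:

```python
import random

def observed_times(nominal_times, sd=0.03, dropout_time=float("inf")):
    """Draw "true" measurement times fluctuating around the nominal ones,
    t_ij ~ N(nominal_t_ij, sd^2), keeping only those before the dropout
    time T_i. Values of sd and dropout_time are illustrative."""
    times = [random.gauss(t, sd) for t in nominal_times]
    return [t for t in times if t <= dropout_time]

random.seed(1)
# Four nominal sampling times; the individual drops out at T_i = 5,
# so the measurement nominally scheduled at t = 8 is lost.
t_i = observed_times([1.0, 2.0, 4.0, 8.0], dropout_time=5.0)
```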
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or the dosage were not followed or recorded exactly. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
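As a rough sketch, drawing one realization of such a perturbed regimen according to (12) and (13) could look as follows (the non-compliance probability $\pi$ is an illustrative value; the 0.02 standard deviation matches (12)):

```python
import random

def sample_dose_regimen(nominal_doses, nominal_times, pi=0.1, sd=0.02):
    """Draw a "real" dose regimen around the prescribed one:
    administration times fluctuate as tau_ik ~ N(nominal_tau_ik, sd^2),
    cf. (12), and each dose is skipped (d_ik = 0, non-compliance) with
    probability pi, cf. (13). The value of pi here is hypothetical."""
    doses = [0.0 if random.random() < pi else d for d in nominal_doses]
    times = [random.gauss(t, sd) for t in nominal_times]
    return doses, times

random.seed(0)
# Three prescribed 500 mg doses at t = 0, 12 and 24 (as in the running example)
doses, times = sample_dose_regimen([500.0] * 3, [0.0, 12.0, 24.0])
```

Each call returns one realization of the &quot;real&quot; regimen $u_i$ around the prescribed $\nominal{u}_i$. 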
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
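(Going back for a moment to the simulation task above: the successive draws of $\bc$, $\bpsi$ and $\by$ can be sketched in a few lines of code. The sketch follows the running PK example, but all numerical values, the number of individuals and the design are illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters theta = (V_pop, k_pop, beta, omega_V, omega_k, a);
# the numerical values are assumptions chosen for this sketch.
V_pop, k_pop, beta = 30.0, 0.1, 1.0
omega_V, omega_k, a = 0.2, 0.2, 0.5

N = 100                               # number of individuals (illustrative)
t = np.linspace(1.0, 24.0, 8)         # measurement times, common to all individuals

# Draw the individual covariates c_i (here, weights) from q_c
w = rng.normal(70.0, 10.0, size=N)

# Draw the individual parameters psi_i = (V_i, k_i) from p(psi | c ; theta):
# log-normal distributions around the covariate-dependent predictions
V = np.exp(rng.normal(np.log(V_pop) + beta * np.log(w / 70.0), omega_V))
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))

# Draw the observations y_ij from p(y | psi ; t): structural model + residual error
f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
y = rng.normal(f, a)

print(y.shape)   # one row of simulated concentrations per individual
```

Simulating $\theta$ as well (step 1) would simply add a draw from $\qth$ before these lines.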
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
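As a toy illustration of such criterion-based optimization, consider choosing a single sampling time for the PK model $f(t;V,k)=\frac{500}{V}e^{-k \, t}$ when only $k$ is unknown; the criterion used below (the Fisher information of one measurement) and all parameter values are illustrative assumptions.

```python
import numpy as np

# Toy design problem (illustrative assumptions): for f(t) = (500/V) * exp(-k*t)
# with additive error of variance a^2 and only k unknown, the Fisher information
# contributed by one measurement at time t is I(t) = (df/dk)^2 / a^2.
# "Optimal design" here simply means choosing t to maximize this criterion.
V, k, a = 30.0, 0.1, 0.5                             # assumed parameter values

t_grid = np.linspace(0.01, 50.0, 5000)               # candidate sampling times
dfdk = -(500.0 / V) * t_grid * np.exp(-k * t_grid)   # sensitivity df/dk at each t
info = dfdk**2 / a**2                                # information criterion I(t)

t_opt = t_grid[np.argmax(info)]
print(t_opt)   # the criterion peaks near t = 1/k
```

The sensitivity $\partial f/\partial k$ vanishes at $t=0$ and for large $t$, so the information peaks at an intermediate time, here $t=1/k$.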
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
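In practice, such a probability can be approximated by Monte Carlo simulation from the joint model: simulate many replicates of $(\bpsi,\by)$ under the design and count how often the target is reached. In the sketch below, the model, the parameter values and the target (a hypothetical therapeutic window at 24 h) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumptions: PK model y = (500/V) exp(-k t) + e evaluated at
# t = 24 h, log-normal individual parameters, and a target A defined as the
# concentration at 24 h lying inside a hypothetical therapeutic window.
V_pop, k_pop, omega_V, omega_k, a = 30.0, 0.1, 0.2, 0.2, 0.5
M, t = 100_000, 24.0                       # Monte Carlo replicates, sampling time

V = np.exp(rng.normal(np.log(V_pop), omega_V, size=M))   # psi drawn from q_psi
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=M))
y = rng.normal(500.0 / V * np.exp(-k * t), a)            # y drawn from p(y | psi)

p_hat = np.mean((y > 0.5) & (y < 5.0))     # Monte Carlo estimate of P((y,psi) in A)
print(p_hat)
```

An outer optimization loop over the design $(\bu,\bt)$ would then maximize this estimated probability.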
This may include optimizing safety and efficacy, for example through the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output.
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7368 What is a model? A joint probability distribution! 2013-06-19T09:24:05Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution.
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals.
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this, such as if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when the model covers more than one population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ necessarily need to be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may either be in the context of wanting to simulate individuals, or when modeling and wanting to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$.
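As an aside, the dose-regimen model of [[#ex_proba6a{{!}}(11)]]&ndash;[[#ex_proba6d{{!}}(13)]] is straightforward to simulate. The following Python sketch is only an illustration, not part of the original example; the regimen (100 mg every 12 h) and the non-compliance probability $\pi=0.1$ are hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal dose regimen for one patient (all values hypothetical):
# 100 mg every 12 hours, 5 administrations
tau_nom = np.arange(5) * 12.0   # nominal administration times (h)
d_nom = np.full(5, 100.0)       # nominal doses (mg)

# (12): administration times fluctuate around the nominal times
tau = rng.normal(tau_nom, 0.02)

# (13): each dose is skipped (non-compliance) with probability pi
pi = 0.1
d = np.where(rng.random(5) < pi, 0.0, d_nom)

# (11): predicted concentration at time t, summing over doses given before t
def f(t, V, k):
    past = tau < t
    return np.sum(d[past] / V * np.exp(-k * (t - tau[past])))

print(f(30.0, V=10.0, k=0.2))   # concentration 30 h after the first dose
```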
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play the role either of random variables, whose distribution is defined by the model, or of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we treat $\by$ as a random variable, while the other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
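Before turning to estimation, note that the first simulation scenario above (drawing $\bc$, then $\bpsi$, then $\by$) can be sketched numerically for the running PK example. The Python code below is only an illustration; all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

N, n = 100, 6                                    # patients, samples per patient
t = np.tile(np.linspace(1.0, 24.0, n), (N, 1))   # design: same times for all (h)

# hypothetical population parameters theta
V_pop, k_pop, omega_V, omega_k, beta, a = 10.0, 0.2, 0.3, 0.2, 1.0, 0.5

# 1. draw the covariates c (here, the weights) from their distribution
w = rng.normal(70.0, 10.0, size=N)

# 2. draw the individual parameters psi from p(psi | c ; theta), as in (3)
V = np.exp(rng.normal(np.log(V_pop) + beta * np.log(w / 70.0), omega_V))
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))

# 3. draw the observations y from p(y | psi ; t), as in (2)
fpred = 500.0 / V[:, None] * np.exp(-k[:, None] * t)
y = rng.normal(fpred, a)

print(y.shape)   # one row of simulated concentrations per patient
```

Each line of code draws from one conditional distribution in the decomposition, which is exactly why the factorization into submodels makes simulation straightforward.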
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
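In general such probabilities have no closed form, but they can be approximated by simulating the joint model for a candidate design. The following Python sketch is only an illustration; the target ${\cal A}$ (trough concentration above 1 mg/L) and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

M = 10_000                           # Monte Carlo replicates
t = np.linspace(1.0, 24.0, 6)        # candidate design (h)

# hypothetical population parameters (no covariates, for simplicity)
V_pop, k_pop, omega_V, omega_k, a = 10.0, 0.2, 0.3, 0.2, 0.5

# simulate (psi, y) from the joint model, one replicate per row
V = np.exp(rng.normal(np.log(V_pop), omega_V, size=M))
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=M))
f = 500.0 / V[:, None] * np.exp(-k[:, None] * t)
y = rng.normal(f, a)

# hypothetical target A: the last concentration stays above 1 mg/L
p_hat = np.mean(y[:, -1] > 1.0)
print(p_hat)   # Monte Carlo estimate of P((y, psi) in A ; t, theta)
```

An optimization algorithm would then repeat this estimation for different candidate designs $(\bu,\bt)$ and retain the best one.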
This may include optimizing safety and efficacy, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output.
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to carry out which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7367 What is a model? A joint probability distribution! 2013-06-19T09:22:27Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution.
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals.
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
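To make the hierarchy in [[#ex_proba2a|(2)]]-[[#ex_proba2b|(3)]] concrete, here is a Python sketch that first draws individual parameters from their log-normal distributions (with the weight covariate entering the model for $V_i$) and then simulates concentrations; every numerical value of $\theta$ below is an illustrative assumption:

```python
import numpy as np

# Illustrative values for theta = (V_pop, k_pop, omega_V, omega_k, beta, a); all assumptions.
rng = np.random.default_rng(1)
N = 100
V_pop, k_pop, omega_V, omega_k, beta, a = 30.0, 0.1, 0.3, 0.2, 1.0, 0.5

w = rng.normal(70.0, 10.0, size=N)   # weights (kg); drawn here only to have covariates

# (3): log-normal individual parameters, weight entering the model for V
V_i = np.exp(rng.normal(np.log(V_pop) + beta * np.log(w / 70.0), omega_V))
k_i = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))

# (2): conditional distribution of the concentrations given the individual parameters
t = np.array([1.0, 2.0, 4.0, 8.0])                       # same design for all individuals here
pred = 500.0 / V_i[:, None] * np.exp(-k_i[:, None] * t)  # N x n matrix of predictions
y = pred + rng.normal(0.0, a, size=pred.shape)           # observed concentrations y_ij
```

In this sketch $\theta$ is a fixed input; treating it as random instead would simply mean drawing it from a prior distribution before the individual parameters are sampled.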
There are various reasons for this: for example, to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is under study.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need necessarily be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be the case when we want to simulate individuals, or when modeling requires taking into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen a variety of ways in which the variables in a model can play the role of either random variables, whose distribution is defined by the model, or nonrandom variables and parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable. The other components of $z$ can be treated as nonrandom variables. In this case,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
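Returning to the second example above — sampling $\bpsi$ from its conditional distribution given the observations — a minimal random-walk Metropolis-Hastings sketch for a single individual of our running PK example might look as follows; the data, prior settings and tuning constants are all illustrative assumptions, not values from the text:

```python
import numpy as np

# Toy conditional sampling of psi = (V, k) given y for one individual.
# Data, population means/sds and the proposal scale are illustrative assumptions.
rng = np.random.default_rng(2)
t = np.array([1.0, 2.0, 4.0, 8.0])
a = 0.5
y = np.array([41.0, 33.5, 22.0, 9.5])        # "observed" concentrations

mu = np.array([np.log(30.0), np.log(0.1)])   # population means of (log V, log k)
om = np.array([0.3, 0.2])                    # corresponding standard deviations

def log_target(phi):
    """log p(y | psi) + log p(psi), up to an additive constant; phi = (log V, log k)."""
    V, k = np.exp(phi)
    pred = 500.0 / V * np.exp(-k * t)
    log_lik = -np.sum((y - pred) ** 2) / (2.0 * a**2)
    log_prior = -np.sum((phi - mu) ** 2 / (2.0 * om**2))
    return log_lik + log_prior

phi = mu.copy()                               # start from the population mean
draws = []
for _ in range(2000):
    prop = phi + rng.normal(0.0, 0.05, size=2)           # random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(phi):
        phi = prop                                        # accept
    draws.append(phi)
chain = np.exp(np.array(draws))               # back to the (V, k) scale
print(chain[1000:].mean(axis=0))              # crude posterior means after burn-in
```

The estimation methods described next are built on exactly these distributions.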
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criterion. 
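The Monte Carlo approach mentioned just above for computing $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$ can be sketched for one individual of our running PK example; the observations and all numerical values below are illustrative assumptions, and the residual parameter $a$ is held fixed for simplicity:

```python
import numpy as np

# Monte Carlo estimate of the observed log-likelihood for one individual:
# log L(theta; y) = log E_psi[ p(y | psi) ], with psi = (V, k) drawn from its
# log-normal population distribution. All numerical values are assumptions.
rng = np.random.default_rng(3)
t = np.array([1.0, 2.0, 4.0, 8.0])
a = 0.5
y = np.array([41.0, 33.5, 22.0, 9.5])

def mc_log_likelihood(theta, M=20000):
    V_pop, k_pop, omega_V, omega_k = theta
    V = np.exp(rng.normal(np.log(V_pop), omega_V, size=M))   # draws of V
    k = np.exp(rng.normal(np.log(k_pop), omega_k, size=M))   # draws of k
    pred = 500.0 / V[:, None] * np.exp(-k[:, None] * t)      # M x n predictions
    log_p = (-0.5 * t.size * np.log(2.0 * np.pi * a**2)
             - np.sum((y - pred) ** 2, axis=1) / (2.0 * a**2))
    m = log_p.max()                                          # log-mean-exp for stability
    return m + np.log(np.mean(np.exp(log_p - m)))

print(mc_log_likelihood((10.0, 0.2, 0.3, 0.2)))   # a theta that fits these data well
print(mc_log_likelihood((30.0, 0.1, 0.3, 0.2)))   # a theta that fits them poorly
```

Information criteria such as AIC and BIC are simple functions of such log-likelihood values, and the same likelihood machinery underlies the design criteria discussed below.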
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
This may include optimizing safety and efficacy criteria, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximizes a likelihood, etc.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative language but a declarative one, allowing us to describe a model. It is then the tasks we choose to perform that use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7366 What is a model? A joint probability distribution! 2013-06-19T09:20:32Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model then allows it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be more precise, we can make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables, while those that appear after it are nonrandom parameters or variables.<br /> When there is no risk of confusion, the nonrandom terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> -In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> -The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is administered by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$: $V$ and $k$ are the parameters of the structural PK model and $a$ is the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
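Before doing so, the single-individual model can be sketched numerically. The following Python fragment (an illustration only; the parameter and design values are arbitrary assumptions, not values from the text) simulates observations according to [[#ex_proba1|(1)]] and evaluates the joint log-pdf $\log \py(y ; \psi,\vt)$:

```python
import numpy as np

# Illustrative sketch of model (1): y_j ~ N(f(t_j; V, k), a^2) with
# f(t; V, k) = 500/V * exp(-k*t). All numeric values are arbitrary assumptions.
rng = np.random.default_rng(0)

V, k, a = 10.0, 0.2, 0.5                                # psi = (V, k, a)
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0])     # design

def f(t, V, k):
    return 500.0 / V * np.exp(-k * t)                   # structural PK model

y = f(t, V, k) + rng.normal(0.0, a, size=t.shape)       # simulated observations

def log_likelihood(y, t, V, k, a):
    # log p_y(y; psi, t): sum of independent normal log-densities
    resid = y - f(t, V, k)
    return np.sum(-0.5 * np.log(2.0 * np.pi * a**2) - resid**2 / (2.0 * a**2))

print(log_likelihood(y, t, V, k, a))
```

Since the $y_j$ are independent, the joint log-density is simply the sum of the $n$ marginal normal log-densities, mirroring the product formula above.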
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
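Before turning to a random $\theta$, the population model of the previous example can be sketched numerically. The following Python fragment (an illustration only; every numeric value is an arbitrary assumption, not a value from the text) first draws the individual parameters according to [[#ex_proba2b|(3)]], then the concentrations according to [[#ex_proba2a|(2)]]:

```python
import numpy as np

# Illustrative sketch of the hierarchical model (2)-(3); all numeric values
# below are arbitrary assumptions chosen for demonstration only.
rng = np.random.default_rng(1)

N = 200
V_pop, k_pop, omega_V, omega_k, beta, a = 10.0, 0.2, 0.3, 0.2, 1.0, 0.5
w = rng.normal(70.0, 10.0, size=N)            # weights (covariates), in kg
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # a common design, for simplicity

# log(V_i) ~ N(log(V_pop) + beta*log(w_i/70), omega_V^2)
V = np.exp(np.log(V_pop) + beta * np.log(w / 70.0) + omega_V * rng.standard_normal(N))
# log(k_i) ~ N(log(k_pop), omega_k^2)
k = np.exp(np.log(k_pop) + omega_k * rng.standard_normal(N))

# y_ij ~ N(f(t_ij; V_i, k_i), a^2) with f(t; V, k) = 500/V * exp(-k*t)
f = 500.0 / V[:, None] * np.exp(-np.outer(k, t))
y = f + a * rng.standard_normal(f.shape)

print(y.shape)   # one row of simulated concentrations per individual
```

Each row of <code>y</code> plays the role of one individual's observation vector $y_i$.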
There are various reasons for treating $\theta$ as random: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. The inputs of the model are now the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote by $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can either be random variables, whose distribution is defined by the model, or nonrandom variables and parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine first that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
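Returning briefly to the second simulation example above: sampling from such a conditional distribution can be sketched with a simple random-walk Metropolis algorithm, a basic MCMC method. In the following Python fragment (an illustration only; the data, $\theta$ values and tuning constants are arbitrary assumptions, not taken from the text), one individual's parameters $(V_i,k_i)$ are sampled from $p(\psi_i | y_i ; \theta)$ in the running PK example:

```python
import numpy as np

# Hedged sketch (illustrative values only): random-walk Metropolis sampling of
# one individual's parameters psi_i = (V_i, k_i) from p(psi_i | y_i ; theta).
rng = np.random.default_rng(2)

V_pop, k_pop, omega_V, omega_k, a = 10.0, 0.2, 0.3, 0.2, 0.5  # assumed theta
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0])                      # design
y = np.array([40.1, 32.6, 22.0, 10.3, 2.5])                   # fictitious data

def log_post(phi):
    # phi = (log V, log k); log p(y | psi) + log p(psi ; theta), up to a constant
    V, k = np.exp(phi)
    f = 500.0 / V * np.exp(-k * t)
    log_lik = -np.sum((y - f) ** 2) / (2.0 * a ** 2)
    log_prior = (-(phi[0] - np.log(V_pop)) ** 2 / (2.0 * omega_V ** 2)
                 - (phi[1] - np.log(k_pop)) ** 2 / (2.0 * omega_k ** 2))
    return log_lik + log_prior

phi = np.array([np.log(V_pop), np.log(k_pop)])   # start at the prior mode
chain = []
for _ in range(5000):
    prop = phi + rng.normal(0.0, 0.05, size=2)   # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(phi):
        phi = prop                               # accept
    chain.append(phi)
samples = np.exp(np.array(chain))                # back to (V_i, k_i)

print(samples[1000:].mean(axis=0))               # conditional means of V_i, k_i
```

Sampling is performed on $(\log V_i, \log k_i)$, for which the log-normal distributions become normal; exponentiating the chain returns samples of $(V_i,k_i)$.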
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
This may include optimizing safety and efficacy, the probability of reaching a sustained virologic response, etc.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7365 What is a model? A joint probability distribution! 2013-06-19T09:13:38Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this, such as wanting to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is being studied.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need necessarily be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
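This hierarchy can be simulated top-down: first draw $\theta_R=(V_{\rm pop},k_{\rm pop})$ from the prior [[#ex_proba3|(6)]], then the individual parameters given $\theta_R$, then the concentrations given the individual parameters. Here is a minimal sketch in Python with NumPy; the values chosen for the fixed parameters $\theta_F$, the weights and the measurement times are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values for the fixed parameters theta_F = (omega_V, omega_k, beta, a)
omega_V, omega_k, beta, a = 0.3, 0.3, 1.0, 0.5
N = 5
w = rng.normal(70.0, 10.0, size=N)        # weights (covariates), taken as given
t = np.array([1.0, 2.0, 4.0, 8.0])        # common measurement times

# 1) draw the random population parameters theta_R = (V_pop, k_pop) from the prior (6)
V_pop = rng.normal(30.0, 3.0)
k_pop = rng.normal(0.1, 0.01)

# 2) draw the individual parameters given theta_R, as in (3)
V = np.exp(rng.normal(np.log(V_pop) + beta * np.log(w / 70.0), omega_V))
k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))

# 3) draw the observations given psi, as in (2)
f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
y = f + rng.normal(0.0, a, size=f.shape)
```

Each of the three draws corresponds to one factor in the decomposition (4) of the joint distribution.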
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may either be in the context of wanting to simulate individuals, or when modeling and wanting to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$.
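Such a dose regimen is straightforward to simulate. The Python sketch below draws a &quot;real&quot; regimen from the nominal one, combining normally jittered administration times as in (12) with random non-compliance as in (13), and then evaluates the concentration (11); all numerical values (nominal regimen, $\pi$, $V_i$, $k_i$) are purely illustrative and not part of the wiki's example.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal regimen for one patient (illustrative values)
tau_nom = np.array([0.0, 12.0, 24.0, 36.0])   # nominal dosing times (h)
d_nom = np.full(4, 500.0)                     # nominal doses (mg)
pi = 0.1                                      # P(dose not taken), illustrative
V_i, k_i = 10.0, 0.2                          # individual PK parameters

# "Real" regimen: jittered times as in (12), random non-compliance as in (13)
tau = rng.normal(tau_nom, 0.02)
d = np.where(rng.random(d_nom.size) < pi, 0.0, d_nom)

def f(t, V, k, d, tau):
    # Concentration (11): superposition of all doses given before time t
    past = tau < t
    return np.sum(d[past] / V * np.exp(-k * (t - tau[past])))

c_30h = f(30.0, V_i, k_i, d, tau)  # predicted concentration at t = 30 h
```

Repeating such draws many times gives an empirical picture of the distribution $\qu$ and of its impact on the predicted concentrations.<br />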
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen that each variable in a model can play the role of either a random variable whose distribution is defined by the model, or a nonrandom variable or parameter. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable, while the other components of $z$ can be treated as nonrandom variables. Then,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
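In the running PK example, for instance, the distribution in question is the pdf of the observations, obtained by integrating the individual parameters out of the joint distribution; for a single individual this integral can be approximated by plain Monte Carlo, drawing $\psi$ repeatedly from the population distribution. The Python sketch below is purely illustrative (hypothetical parameter values and simulated data), not part of the wiki's examples.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# One individual of the running example: IV bolus, dose 500, f(t) = 500/V * exp(-k t)
t = np.array([1.0, 2.0, 4.0, 8.0])   # measurement times (h)
theta = dict(V_pop=10.0, k_pop=0.2, omega_V=0.3, omega_k=0.3, a=0.5)  # illustrative

def f(t, V, k):
    return 500.0 / V * np.exp(-k * t)

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def mc_likelihood(y, t, theta, M=20000):
    # L(theta; y) = int p(y | psi) p(psi; theta) dpsi
    #            ~= (1/M) sum_m p(y | psi_m),  with psi_m drawn from p(psi; theta)
    V = np.exp(rng.normal(np.log(theta["V_pop"]), theta["omega_V"], M))
    k = np.exp(rng.normal(np.log(theta["k_pop"]), theta["omega_k"], M))
    pred = f(t[None, :], V[:, None], k[:, None])           # (M, n) predictions
    return normal_pdf(y[None, :], pred, theta["a"]).prod(axis=1).mean()

y = f(t, 10.0, 0.2) + rng.normal(0.0, 0.5, t.size)  # simulated observations
L = mc_likelihood(y, t, theta)
```

Whatever the statistical method, computations of this kind on the model's probability distributions are exactly what the estimation algorithms must provide.<br />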
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
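One classical choice is the $D$-optimality criterion, which maximizes the determinant (the product of the eigenvalues) of the information matrix defined below. As a toy illustration, the Python sketch below compares two candidate sampling schedules for a fixed-effects version of the one-compartment model $f(t;V,k)=500e^{-kt}/V$ with additive error, where the information matrix is approximated by $J'J/a^2$ with $J$ the matrix of sensitivities of $f$; all values are illustrative and this individual-level simplification ignores the random effects.<br />

```python
import numpy as np

# Toy D-optimality sketch for the structural model f(t;V,k) = 500/V * exp(-k t)
# with additive error sd a. Individual-level information matrix ~ J'J / a^2,
# where J holds the sensitivities df/dV and df/dk. Values are illustrative.
V, k, a = 10.0, 0.2, 0.5

def sensitivities(t):
    f = 500.0 / V * np.exp(-k * t)
    return np.column_stack([-f / V,      # df/dV
                            -t * f])     # df/dk

def d_criterion(t):
    J = sensitivities(np.asarray(t, dtype=float))
    fim = J.T @ J / a**2
    return np.linalg.det(fim)

early = [0.5, 1.0, 1.5, 2.0]     # all samples taken early
spread = [0.5, 2.0, 8.0, 16.0]   # samples spread over the elimination phase
```

Here the spread design yields the larger determinant: spreading the sampling times over the elimination phase is more informative about $k$ than concentrating them early.<br />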
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
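Such probabilities of reaching a target can be estimated by simulating from the joint model. The Python sketch below estimates, for the running PK example with purely illustrative values, the probability that the concentration of a randomly drawn individual is still above a threshold at $t=12$h, for two candidate doses; here the dose plays the role of the design variable $\bu$, and residual error is ignored for simplicity.<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative target A: concentration above 1 at t = 12 h for a random
# individual drawn from the population model (all values are hypothetical)
V_pop, k_pop, omega_V, omega_k = 10.0, 0.2, 0.3, 0.3

def prob_target(dose, t=12.0, threshold=1.0, M=100_000):
    # Monte Carlo estimate of P((y, psi) in A ; u, t, theta)
    V = np.exp(rng.normal(np.log(V_pop), omega_V, M))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, M))
    c = dose / V * np.exp(-k * t)
    return float(np.mean(c > threshold))

p_250 = prob_target(250.0)
p_500 = prob_target(500.0)
```

Maximizing such an estimate over the candidate doses (and sampling times) is then an optimization task in the sense described here.<br />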
Such targets may involve safety and efficacy, for example the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output.
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7364 What is a model? A joint probability distribution! 2013-06-19T08:57:33Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution.
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are nonrandom parameters or variables.<br /> When there is no risk of confusion, the nonrandom terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals.
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this, for example if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
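This two-level structure — first draw $\theta_R=(V_{\rm pop},k_{\rm pop})$ from the priors in (6), then draw each individual's parameters from the conditional log-normal model in (3) — can be sketched in a few lines of Python. The values of $\omega_V$, $\omega_k$, $\beta$ and the weights below are illustrative only, not taken from the text:

```python
import math
import random

random.seed(0)

# Hierarchical simulation sketch: first draw the random population
# parameters theta_R = (V_pop, k_pop) from their priors (6), then draw
# each individual's parameters from the conditional log-normal model (3).
# omega_V, omega_k, beta and the weights w_i are illustrative values.
omega_V, omega_k, beta = 0.2, 0.3, 1.0
weights = [60.0, 70.0, 85.0]

# Level 1: inter-population variability, equation (6)
V_pop = random.gauss(30.0, 3.0)
k_pop = random.gauss(0.1, 0.01)

# Level 2: inter-individual variability given theta_R, equation (3)
individual_params = []
for w in weights:
    log_V = random.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V)
    log_k = random.gauss(math.log(k_pop), omega_k)
    individual_params.append((math.exp(log_V), math.exp(log_k)))

for V_i, k_i in individual_params:
    print(f"V_i = {V_i:.2f}, k_i = {k_i:.4f}")
```

Each run first perturbs the population values, then spreads individuals around them, which is exactly the decomposition of the joint distribution into $\pth(\theta_R)$ and the conditional distribution of $\bpsi$ given $\theta_R$.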
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because, when modeling, we want to take into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
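A brief Python sketch of how one realization of this dose regimen model could be simulated for a single patient; the nominal regimen, the PK parameters and the non-compliance probability $\pi$ below are illustrative values, not taken from the text:

```python
import math
import random

random.seed(1)

# Illustrative nominal regimen and parameters (not from the text):
# 500 mg every 12 time units, 4 doses; PK parameters V_i, k_i; non-compliance pi.
nominal_doses = [500.0] * 4
nominal_times = [0.0, 12.0, 24.0, 36.0]
V_i, k_i, pi = 10.0, 0.2, 0.1

# Equation (12): administration times fluctuate around the nominal times.
tau = [random.gauss(t, 0.02) for t in nominal_times]
# Equation (13): each dose is taken with probability 1 - pi, otherwise skipped.
d = [0.0 if random.random() < pi else dk for dk in nominal_doses]

# Equation (11): superposition of the doses actually received before time t.
def f(t, V, k):
    return sum(dk / V * math.exp(-k * (t - tk))
               for dk, tk in zip(d, tau) if tk < t)

print(f(48.0, V_i, k_i))
```

Adding noise to `f(t_ij)` with standard deviation $a_i$, as in (10), would then produce the simulated concentrations $y_{ij}$.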
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ could be reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it is a random variable. The other components of $z$ can be treated as non random variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\psi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
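Before detailing these methods, note that the observed likelihood below involves an integral over $\bpsi$ that generally has no closed form; it can be approximated, for instance, by plain Monte Carlo, drawing $\bpsi$ from the population model. A rough Python sketch for one individual of the simple PK example follows; all numerical values (observation times, data, $\theta$) are illustrative, not taken from the text:

```python
import math
import random

random.seed(2)

# Monte Carlo sketch of the observed likelihood for one individual:
# L(theta; y) ~ (1/M) * sum_m p(y | psi_m), with psi_m drawn from p_psi(. ; theta).
# All numerical values below are illustrative.
t_obs = [1.0, 2.0, 4.0, 8.0]
y_obs = [40.0, 30.0, 18.0, 7.0]
V_pop, k_pop, omega_V, omega_k, a = 10.0, 0.2, 0.2, 0.2, 2.0

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def likelihood(M=5000):
    total = 0.0
    for _ in range(M):
        # draw psi = (V, k) from the log-normal population model
        V = math.exp(random.gauss(math.log(V_pop), omega_V))
        k = math.exp(random.gauss(math.log(k_pop), omega_k))
        # conditional density p(y | psi): independent normal residuals
        p = 1.0
        for t, y in zip(t_obs, y_obs):
            p *= normal_pdf(y, 500.0 / V * math.exp(-k * t), a)
        total += p
    return total / M

print(likelihood())
```

More efficient schemes (importance sampling, linearization) are used in practice; this sketch only illustrates that the observed likelihood is a well-defined function of $\theta$ obtained by integrating $\bpsi$ out of the joint distribution.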
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
This may include optimizing safety and efficacy, and things like the probability of reaching sustained virologic response, etc.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to perform that use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7363 What is a model? A joint probability distribution! 2013-06-18T08:34:49Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution can be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables. Those that appear after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals.
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i\,n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did.
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each patient has their own individual PK parameters $V_i$ and $k_i$ and potentially their own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it is useful or important to consider the population parameter $\theta$ itself to be random rather than fixed.
There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is under study.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may either be in the context of wanting to simulate individuals, or when modeling and wanting to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume a normal prior distribution centered on the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment.
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$.
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play either the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine first that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) .
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., its MAP).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(15)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
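Such a probability can be approximated by straightforward Monte Carlo simulation. As an illustration, here is a minimal Python sketch for the running PK example (a 500 mg bolus with $f(t;V,k) = 500/V \, e^{-kt}$, log-normal individual parameters and additive normal residual error); the target ${\cal A}$ (a trough concentration above a given threshold at the last sampling time) and all numerical values are hypothetical, chosen for illustration only:

```python
import math
import random

def target_probability(theta, t, threshold=1.0, n_sim=10000, seed=1234):
    """Monte Carlo estimate of P((y, psi) in A ; t, theta) for the running
    PK example: draw psi_i = (V_i, k_i) from log-normal population
    distributions, then draw the observation at the last sampling time with
    additive normal residual error. The target A is the (hypothetical)
    event "trough concentration above `threshold`"."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        V = math.exp(rng.gauss(math.log(V_pop), omega_V))  # V_i ~ log-normal
        k = math.exp(rng.gauss(math.log(k_pop), omega_k))  # k_i ~ log-normal
        f = 500.0 / V * math.exp(-k * t[-1])               # f(t; V_i, k_i)
        y = rng.gauss(f, a)                                # y ~ N(f, a^2)
        if y > threshold:
            hits += 1
    return hits / n_sim

# estimated probability under one (hypothetical) design and parameter set
p = target_probability(theta=(10.0, 0.2, 0.3, 0.3, 0.5), t=[1, 4, 8, 12])
```

Optimal design then consists of maximizing such a criterion with respect to the design variables, here the sampling times and, more generally, the dose regimen $\bu$.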
Such targets may involve safety and efficacy criteria, for example the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi ; \theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by {{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7362 What is a model? A joint probability distribution! 2013-06-18T08:33:39Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be the case when we want to simulate individuals, or, when modeling, if we want to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play the role either of random variables, whose distribution is defined by the model, or of nonrandom variables and parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. In this case,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
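The successive draws described in the simulation example above (drawing $\bc$, then $\bpsi$, then $\by$) can be sketched in a few lines of Python. The distributional forms are those of the running PK example (a weight covariate, log-normal individual parameters, normal residual error around $f(t)=500/V\,e^{-kt}$); all numerical values below are invented for illustration:

```python
import math
import random

random.seed(42)

# Hypothetical population parameters theta (all values invented for illustration)
V_pop, k_pop, beta = 10.0, 0.2, 1.0   # typical volume, elimination rate, weight effect
omega_V, omega_k, a = 0.3, 0.2, 0.5   # inter-individual and residual standard deviations
t_obs = [1.0, 2.0, 4.0, 8.0]          # measurement times (given, nonrandom here)

def draw_individual():
    # 1. draw the covariate c_i (here a weight w_i) from its distribution
    w = random.gauss(70.0, 10.0)
    # 2. draw psi_i = (V_i, k_i) given w_i: log-normal, with a weight covariate on V
    V = math.exp(random.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V))
    k = math.exp(random.gauss(math.log(k_pop), omega_k))
    # 3. draw the observations y_i given psi_i: normal around f(t) = 500/V * exp(-k t)
    y = [500.0 / V * math.exp(-k * t) + random.gauss(0.0, a) for t in t_obs]
    return w, (V, k), y

sample = [draw_individual() for _ in range(100)]
```

Each entry of `sample` is one simulated individual $(w_i, \psi_i, y_i)$; replacing a simulation step by a fixed value corresponds to treating that variable as nonrandom.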
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
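The integral $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$ mentioned in the model selection outline above can be approximated by plain Monte Carlo: draw $\bpsi_m$ from the distribution of the individual parameters and average the conditional density of $\by$ given each draw. Here is a toy single-individual Python sketch of this, using the PK model of the running example without covariates; the data and parameter values are invented:

```python
import math
import random

random.seed(0)

# Invented data for one individual: times and observed concentrations
t_obs = [1.0, 2.0, 4.0, 8.0]
y_obs = [40.0, 33.0, 21.0, 9.0]

# Hypothetical population parameters theta
V_pop, k_pop, omega_V, omega_k, a = 10.0, 0.2, 0.3, 0.2, 2.0

def log_p_y_given_psi(V, k):
    # log of the conditional density of y given psi: independent normal residuals
    # around the structural prediction f(t) = 500/V * exp(-k t)
    lp = 0.0
    for t, y in zip(t_obs, y_obs):
        f = 500.0 / V * math.exp(-k * t)
        lp += -0.5 * math.log(2.0 * math.pi * a * a) - (y - f) ** 2 / (2.0 * a * a)
    return lp

def mc_log_likelihood(M=5000):
    # log( (1/M) * sum_m p(y | psi_m) ), psi_m drawn from p(psi; theta),
    # combined with a log-sum-exp trick for numerical stability
    logs = []
    for _ in range(M):
        V = math.exp(random.gauss(math.log(V_pop), omega_V))
        k = math.exp(random.gauss(math.log(k_pop), omega_k))
        logs.append(log_p_y_given_psi(V, k))
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs) / M)

loglike = mc_log_likelihood()
```

Larger `M` reduces the Monte Carlo error; comparing the resulting log-likelihoods across candidate models is the basis of the information criteria mentioned above.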
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
This may include optimizing safety and efficacy criteria, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7361 What is a model? A joint probability distribution! 2013-06-18T08:32:14Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are nonrandom parameters or variables.<br /> When there is no risk of confusion, the nonrandom terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
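As a quick illustration, the single-individual model [[#ex_proba1|(1)]] above can be simulated directly: compute the structural prediction $f$ and add normal residual errors. A minimal Python sketch follows; the values of $V$, $k$, $a$ and the sampling times are invented:

```python
import math
import random

random.seed(1)

# Hypothetical values for psi = (V, k, a) and the design
V, k, a = 10.0, 0.2, 0.5                  # volume, elimination rate constant, residual sd
t_obs = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0]   # measurement times (h)

def f(t):
    # structural PK model after a 500 mg intravenous bolus at time 0
    return 500.0 / V * math.exp(-k * t)

# y_j ~ N(f(t_j), a^2): structural prediction plus additive residual error
y = [f(t) + random.gauss(0.0, a) for t in t_obs]
```

Re-running with other values of $V$, $k$ or $a$ changes both the prediction $f$ and the spread of the simulated concentrations around it.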
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: for example, we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is under study.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because, when modeling, we want to take into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Here, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. The inputs of the model are now the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$, rather than in the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, the measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ij})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
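This perturbed dose-regimen model can be sketched in Python, combining the superposition formula [[#ex_proba6a{{!}}(11)]] with the time jitter [[#ex_proba6c{{!}}(12)]] and the non-compliance mechanism [[#ex_proba6d{{!}}(13)]]; $\pi$ and all numerical values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dose_regimen(tau_nom, d_nom, pi=0.1, sd_tau=0.02):
    """Draw a 'real' dose regimen around the nominal one:
    administration times jittered as in (12), and each dose
    skipped (set to 0) with probability pi as in (13)."""
    tau = rng.normal(tau_nom, sd_tau)
    taken = rng.random(len(d_nom)) >= pi      # False => dose not taken
    d = np.where(taken, d_nom, 0.0)
    return tau, d

def f(t, V, k, tau, d):
    """Concentration (11): superposition of one-compartment bolus
    contributions from all doses administered before time t."""
    past = tau < t
    return np.sum(d[past] / V * np.exp(-k * (t - tau[past])))

# illustrative regimen: 500 mg every 12 h (values are assumptions)
tau_nom = np.array([0.0, 12.0, 24.0])
d_nom = np.array([500.0, 500.0, 500.0])
tau, d = simulate_dose_regimen(tau_nom, d_nom, pi=0.1)
conc = f(30.0, V=8.0, k=0.2, tau=tau, d=d)
```

Skipped doses simply contribute nothing to the sum in $f$, which is why setting $d_{ik}=0$ captures non-compliance.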
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distribution $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable, while the other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
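As an aside, the successive draws $\bc \rightarrow \bpsi \rightarrow \by$ of the first scenario above can be sketched in Python for the running PK example, using the covariate model [[#ex_proba2b|(3)]] and the observation model [[#ex_proba2a|(2)]]; all numerical values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# population parameters theta (all values are illustrative assumptions)
V_pop, k_pop, beta = 10.0, 0.2, 0.75
omega_V, omega_k, a = 0.3, 0.2, 0.5

def simulate(N, t):
    """Draw c, then psi | c, then y | psi, following the decomposition
    p(c) p(psi | c; theta) p(y | psi; t) of the joint distribution."""
    w = rng.normal(70.0, 10.0, size=N)                      # covariates c
    V = np.exp(np.log(V_pop) + beta * np.log(w / 70.0)
               + rng.normal(0.0, omega_V, size=N))          # psi | c
    k = np.exp(np.log(k_pop) + rng.normal(0.0, omega_k, size=N))
    f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
    y = f + rng.normal(0.0, a, size=f.shape)                # y | psi
    return w, V, k, y

t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])
w, V, k, y = simulate(N=100, t=t)
```

Each line implements one conditional distribution of the decomposition, which is exactly why the hierarchical structure makes simulation straightforward.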
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$ .<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
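In practice, such a target probability is typically approximated by Monte Carlo simulation: draw from the joint model and count how often the target is met. A Python sketch for the running PK example follows; the target (last concentration above a threshold) and all numerical values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def prob_target(M, t, threshold=1.0):
    """Monte Carlo estimate of P((y, psi, c) in A; u, t, theta):
    simulate M individuals from the joint model and count how often
    the target (here: last concentration above `threshold`) is met."""
    w = rng.normal(70.0, 10.0, size=M)                  # covariates c
    V = np.exp(np.log(10.0) + 0.75 * np.log(w / 70.0)
               + rng.normal(0.0, 0.3, size=M))          # individual parameters
    k = np.exp(np.log(0.2) + rng.normal(0.0, 0.2, size=M))
    f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
    y = f + rng.normal(0.0, 0.5, size=f.shape)          # observations
    return np.mean(y[:, -1] > threshold)

p_hat = prob_target(M=10_000, t=np.array([1.0, 4.0, 12.0]))
```

Optimizing the design then amounts to maximizing this estimated probability with respect to $\bu$ and $\bt$.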
This may include safety and efficacy targets, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{ccc} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop, sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop, sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: for example, with $\mlxplore$ for model exploration, with $\monolix$ for modeling, or with R or Matlab for simulation.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to perform that use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7360 What is a model? A joint probability distribution! 2013-06-18T08:30:22Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution is decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables, while those that appear after it are nonrandom parameters or variables.<br /> When there is no risk of confusion, the nonrandom terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
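Before doing so, note that the single-individual model [[#ex_proba1|(1)]] just defined can be simulated directly; a minimal Python sketch, where the parameter values $\psi=(V,k,a)$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_individual(t, V, k, a):
    """Draw y_j ~ N(f(t_j; V, k), a^2) as in (1),
    with f(t; V, k) = 500/V * exp(-k t)."""
    f = 500.0 / V * np.exp(-k * t)
    return f + rng.normal(0.0, a, size=t.shape)

# illustrative parameter values psi = (V, k, a)
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = simulate_individual(t, V=10.0, k=0.3, a=0.5)
```

Because the $y_j$ are independent given $\psi$, one vectorized draw over the whole design $\vt$ suffices.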
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: for example, we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being studied.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take uncertainty in the covariate values into account when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$, rather than in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$, and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$.
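These two perturbation mechanisms are easy to simulate directly. As a rough sketch (in Python rather than $\mlxtran$; the function name and the default values of the non-compliance probability and the time jitter are illustrative assumptions, not part of the model above):

```python
import numpy as np

def perturb_regimen(nominal_times, nominal_doses, pi=0.1, sd=0.02, rng=None):
    """Draw a "real" dose regimen u_i around the nominal one:
    administration times are jittered, tau_ik ~ N(tau*_ik, sd^2),
    and each dose is skipped (set to 0) with probability pi."""
    if rng is None:
        rng = np.random.default_rng()
    times = rng.normal(nominal_times, sd)          # random administration times
    taken = rng.random(len(nominal_doses)) >= pi   # dose taken with probability 1 - pi
    doses = np.where(taken, nominal_doses, 0.0)
    return times, doses
```

Simulating concentrations then simply means plugging the perturbed regimen $(d_{ik},\tau_{ik})$ into the structural model above.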
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distribution $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we treat $\by$ as a random variable, while the other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
This may include optimizing safety and efficacy, or criteria such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{array}{c} <br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{array}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation...<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative language but rather a declarative one, which allows us to describe a model. It is then the tasks we choose to perform which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7359 What is a model? A joint probability distribution! 2013-06-18T08:26:29Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution.
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0.
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasmatic] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Dependent_and_independent_variables independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. 
There are various reasons for this: we may want to model uncertainty in its value, to introduce a priori information in an estimation context, or to model inter-population variability when more than one population is under study.<br /> <br /> If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random.
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. The inputs of the model are now the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or doses were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses were not taken by the patient.<br /> <br /> If we denote by $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distribution $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. We will treat $\by$ as a random variable; the other components of $z$ can be treated as nonrandom variables. Here,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
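For instance, the pdf of the observations for the running PK example, $\py(\by ; \theta) = \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi$, can be approximated by simple Monte Carlo integration over simulated individual parameters. The Python sketch below does this for a single individual; the data, design and parameter values are illustrative assumptions, and practical estimation algorithms use far more efficient schemes than plain Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal_pdf(x, mean, sd):
    # log of the N(mean, sd^2) density, applied elementwise
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def mc_log_py(y_i, t_i, theta, M=20000):
    """Monte Carlo estimate of log p(y_i ; theta) = log E_psi[ p(y_i | psi) ],
    drawing psi_i = (V_i, k_i) from its log-normal distribution and averaging
    the conditional density of y_i given psi_i."""
    V_pop, k_pop, omega_V, omega_k, a = theta
    V = np.exp(rng.normal(np.log(V_pop), omega_V, size=M))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, size=M))
    f = 500.0 / V[:, None] * np.exp(-k[:, None] * t_i[None, :])   # (M, n_i)
    log_cond = log_normal_pdf(y_i[None, :], f, a).sum(axis=1)     # log p(y_i | psi^(m))
    m = log_cond.max()                                            # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(log_cond - m)))

# Illustrative data and parameter values (assumptions, not from the text)
t_i = np.array([1.0, 2.0, 4.0, 8.0])
y_i = np.array([40.0, 33.0, 22.0, 10.0])
theta = (10.0, 0.2, 0.3, 0.3, 0.5)   # (V_pop, k_pop, omega_V, omega_k, a)
ll = mc_log_py(y_i, t_i, theta)
print(ll)
```

Summing such log-pdf terms over independent individuals gives the observed log-likelihood that maximum likelihood estimation maximizes with respect to $\theta$.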
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{OutlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ of reaching some predefined target ${\cal A}$. 
This may include optimizing safety and efficacy criteria, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \ppsi(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$. 
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \ppsi(\bpsi ; \theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop, sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop, sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, or with R or Matlab for simulation.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output. 
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to run that use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> Bleakley https://wiki.inria.fr/wikis/popix/index.php?title=What_is_a_model%3F_A_joint_probability_distribution!&diff=7358 What is a model? A joint probability distribution! 2013-06-18T08:24:20Z <p>Bleakley: </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution to be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. 
<br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. 
We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters of the structural PK model and $a$ is the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. 
It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it can be useful or important to consider the population parameter $\theta$ itself as random, rather than fixed. 
There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when more than one population is being considered.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may either be in the context of wanting to simulate individuals, or when modeling and wanting to take into account uncertainty in the covariate values. If we denote $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right). 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$. 
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume a normal prior distribution centered on the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dosage regimen stated in the protocol and given in the data set, and the dosage regimen that was actually administered. For example, it might be that the times of administration and/or doses were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses not being taken by the patient.<br /> <br /> If $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ denotes the nominal dose regimens (those reported in the data set), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$. 
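This dose-regimen model can be simulated directly. The sketch below is our own illustration, not part of the original example: the regimen (100 units every 12 hours) and the values of $V_i$, $k_i$, $a_i$ and $\pi$ are assumptions. It draws perturbed administration times as in (12), skips each dose with probability $\pi$ as in (13), and evaluates the superposition model (11) to produce a noisy observation as in (10):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, V, k, doses, times):
    """Superposition model (11): sum over all doses administered
    before time t of (d/V) * exp(-k * (t - tau))."""
    doses = np.asarray(doses, dtype=float)
    times = np.asarray(times, dtype=float)
    active = times < t
    return float(np.sum((doses[active] / V) * np.exp(-k * (t - times[active]))))

# Nominal regimen (illustrative values): 100 units every 12 h, 5 doses
tau_nom = np.arange(0.0, 60.0, 12.0)
d_nom = np.full(tau_nom.shape, 100.0)

# (12): administered times fluctuate around the nominal times
tau = rng.normal(tau_nom, 0.02)

# (13): each dose is skipped (set to 0) with probability pi
pi = 0.1
d = np.where(rng.uniform(size=d_nom.size) < pi, 0.0, d_nom)

# (10): noisy concentration measurement at time t_ij
V_i, k_i, a_i = 10.0, 0.2, 0.5
y_ij = rng.normal(f(30.0, V_i, k_i, d, tau), a_i)
```

With the nominal regimen and $V_i=10$, $k_i=0.2$, only the doses at times 0, 12 and 24 contribute at $t=30$, giving a predicted concentration $f(30)\approx 3.31$.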
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distribution $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable; the other components of $z$ can be treated as nonrandom variables. Then,<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. 
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte-Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria. 
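For instance, the D-criterion maximizes the log-determinant of the Fisher information matrix. The sketch below is our own toy illustration (not the PK model of this chapter): a simple linear regression model, where the information matrix has the closed form $X^\top X/\sigma^2$ and the best design can be found by exhaustive search over a candidate grid:

```python
import numpy as np
from itertools import combinations

def fim_linear(times, sigma=1.0):
    """Expected Fisher information for the toy linear model
    y_j = theta_0 + theta_1 * t_j + eps_j, eps_j ~ N(0, sigma^2):
    here I(theta) = X^T X / sigma^2 does not depend on theta."""
    X = np.column_stack([np.ones(len(times)), np.asarray(times, dtype=float)])
    return X.T @ X / sigma**2

def d_criterion(times):
    """D-optimality criterion: log-determinant of the expected FIM."""
    sign, logdet = np.linalg.slogdet(fim_linear(times))
    return logdet if sign > 0 else -np.inf

# Exhaustive search for the best 3-point design on a candidate grid
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
best = max(combinations(grid, 3), key=d_criterion)
```

As expected for a straight-line model, the selected design includes both endpoints of the candidate grid, where observations are most informative about the slope.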
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$. 
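Such a target probability can be estimated by plain Monte Carlo simulation of the joint model: draw the individual parameters, then the observations, and average the indicator of the target. The sketch below is our own illustration; the one-compartment bolus model, the lognormal individual parameters and the trough-concentration target with all its numerical values are assumptions, not part of the original example:

```python
import numpy as np

rng = np.random.default_rng(42)

def estimate_target_prob(n_draws, dose=100.0, t_trough=24.0, level=0.05):
    """Monte Carlo estimate of P(trough concentration > level):
    simulate the hierarchical model n_draws times and average."""
    # Individual parameters: lognormal around population values (illustrative)
    V = 10.0 * np.exp(rng.normal(0.0, 0.2, n_draws))
    k = 0.2 * np.exp(rng.normal(0.0, 0.2, n_draws))
    # Structural model (one-compartment bolus) plus residual error
    f = (dose / V) * np.exp(-k * t_trough)
    y = rng.normal(f, 0.1)
    return float(np.mean(y > level))

p_hat = estimate_target_prob(20000)
```

In practice, the same estimator would be embedded in an optimization loop over the design $(\bu,\bt)$.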
This may include optimizing safety and efficacy criteria, such as the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Co