Testing A joint probability
$ \DeclareMathOperator{\argmin}{arg\,min} \DeclareMathOperator{\argmax}{arg\,max} \newcommand{\nominal}[1]{#1^{\star}} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\varepsilon} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} \def\aref{a^\star} \def\kref{k^\star} \def\model{M} \def\hmodel{m} \def\mmodel{\mu} \def\imodel{H} \def\like{\cal L} \def\thmle{\hat{\theta}} \def\ofim{I^{\rm obs}} \def\Imax{\text{\it Imax}} \def\id{ {\rm Id}} \def\teta{\tilde{\eta}} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\deriv}[1]{\frac{d}{dt}#1(t)} \newcommand{\pred}[1]{\tilde{#1}} \def\phis{\phi{^\star}} \def\hphi{\tilde{\phi}} \def\hw{\tilde{w}} \def\hpsi{\tilde{\psi}} \def\hatpsi{\hat{\psi}} \def\hatphi{\hat{\phi}} \def\psis{\psi{^\star}} \def\transy{u} \def\psipop{\psi_{\rm pop}} \newcommand{\psigr}[1]{\hat{\bpsi}_{#1}} \newcommand{\Vgr}[1]{\hat{V}_{#1}} \def\psig{\psi} \def\psigprime{\psig^{\prime}} \def\psigiprime{\psig_i^{\prime}} \def\psigk{\psig^{(k)}} \def\psigki{ {\psig_i^{(k)}}} \def\psigkun{\psig^{(k+1)}} \def\psigkuni{\psig_i^{(k+1)}} \def\psigi{ {\psig_i}} \def\psigil{ {\psig_{i,\ell}}} \def\phig{ {\phi}} \def\phigi{ {\phig_i}} \def\phigil{ {\phig_{i,\ell}}} \def\etagi{ {\eta_i}} \def\IIV{ {\Omega}} \def\thetag{ {\theta}} \def\thetagk{ {\theta_k}} \def\thetagkun{ {\theta_{k+1}}} \def\thetagkunm{\theta_{k1}} \def\sgk{s_{k}} \def\sgkun{s_{k+1}} \def\yg{y} \def\xg{x} \def\qy{p_{_y}} \def\qt{p_{_t}} \def\qc{p_{_c}} \def\qu{p_{_u}} \def\qyi{p_{_{y_i}}} \def\qyj{p_{_{y_j}}} \def\qpsi{p_{_{\psi}}} \def\qpsii{p_{_{\psi_i}}} \def\qcpsith{p_{_{\psi\theta}}} \def\qth{p_{_{\theta}}} \def\qypsi{p_{_{y,\psi}}} \def\qcypsi{p_{_{y\psi}}} \def\qpsic{p_{_{\psi,c}}} \def\qcpsic{p_{_{\psic}}} \def\qypsic{p_{_{y,\psi,c}}} \def\qypsit{p_{_{y,\psi,t}}} \def\qcypsit{p_{_{y\psi,t}}} 
\def\qypsiu{p_{_{y,\psi,u}}} \def\qcypsiu{p_{_{y\psi,u}}} \def\qypsith{p_{_{y,\psi,\theta}}} \def\qypsithcut{p_{_{y,\psi,\theta,c,u,t}}} \def\qypsithc{p_{_{y,\psi,\theta,c}}} \def\qcypsiut{p_{_{y\psi,u,t}}} \def\qcpsithc{p_{_{\psi\theta,c}}} \def\qcthy{p_{_{\theta  y}}} \def\qyth{p_{_{y,\theta}}} \def\qcpsiy{p_{_{\psiy}}} \def\qz{p_{_z}} \def\qw{p_{_w}} \def\qcwz{p_{_{wz}}} \def\qw{p_{_w}} \def\qcyipsii{p_{_{y_i\psi_i}}} \def\qyipsii{p_{_{y_i,\psi_i}}} \def\qypsiij{p_{_{y_{ij}\psi_{i}}}} \def\qyipsi1{p_{_{y_{i1}\psi_{i}}}} \def\qtypsiij{p_{_{\transy(y_{ij})\psi_{i}}}} \def\qcyzipsii{p_{_{z_i,y_i\psi_i}}} \def\qczipsii{p_{_{z_i\psi_i}}} \def\qcyizpsii{p_{_{y_iz_i,\psi_i}}} \def\qcyijzpsii{p_{_{y_{ij}z_{ij},\psi_i}}} \def\qcyi1zpsii{p_{_{y_{i1}z_{i1},\psi_i}}} \def\qcypsiz{p_{_{y,\psiz}}} \def\qccypsiz{p_{_{y\psi,z}}} \def\qypsiz{p_{_{y,\psi,z}}} \def\qcpsiz{p_{_{\psiz}}} \def\qeps{p_{_{\teps}}} \def\neta{ {n_\eta}} \def\ncov{M} \def\npsi{n_\psig} \def\bu{\boldsymbol{u}} \def\bt{\boldsymbol{t}} \def\bT{\boldsymbol{T}} \def\by{\boldsymbol{y}} \def\bx{\boldsymbol{x}} \def\bc{\boldsymbol{c}} \def\bw{\boldsymbol{w}} \def\bz{\boldsymbol{z}} \def\bpsi{\boldsymbol{\psi}} \def\bbeta{\beta} \def\beeta{\eta} \def\logit{\rm logit} \def\transy{u} \def\so{O} \def\one{\mathbb 1} \newcommand{\prob}[1]{ \mathbb{P}\!\left(#1\right)} \newcommand{\probs}[2]{ \mathbb{P}_{#1}\!\left(#2\right)} \newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\esps}[2]{\mathbb{E}_{#1}\left(#2\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\vars}[2]{\mbox{Var}_{#1}\left(#2\right)} \newcommand{\std}[1]{\mbox{sd}\left(#1\right)} \newcommand{\stds}[2]{\mbox{sd}_{#1}\left(#2\right)} \newcommand{\corr}[1]{\mbox{Corr}\left(#1\right)} \def\pmacro{\mathbf{p}} \def\py{\pmacro} \def\pt{\pmacro} \def\pc{\pmacro} \def\pu{\pmacro} \def\pyi{\pmacro} \def\pyj{\pmacro} \def\ppsi{\pmacro} \def\ppsii{\pmacro} \def\pcpsith{\pmacro} \def\pth{\pmacro} \def\pypsi{\pmacro} \def\pcypsi{\pmacro} 
\def\ppsic{\pmacro} \def\pcpsic{\pmacro} \def\pypsic{\pmacro} \def\pypsit{\pmacro} \def\pcypsit{\pmacro} \def\pypsiu{\pmacro} \def\pcypsiu{\pmacro} \def\pypsith{\pmacro} \def\pypsithcut{\pmacro} \def\pypsithc{\pmacro} \def\pcypsiut{\pmacro} \def\pcpsithc{\pmacro} \def\pcthy{\pmacro} \def\pyth{\pmacro} \def\pcpsiy{\pmacro} \def\pz{\pmacro} \def\pw{\pmacro} \def\pcwz{\pmacro} \def\pw{\pmacro} \def\pcyipsii{\pmacro} \def\pyipsii{\pmacro} \def\pypsiij{\pmacro} \def\pyipsi1{\pmacro} \def\ptypsiij{\pmacro} \def\pcyzipsii{\pmacro} \def\pczipsii{\pmacro} \def\pcyizpsii{\pmacro} \def\pcyijzpsii{\pmacro} \def\pcyi1zpsii{\pmacro} \def\pcypsiz{\pmacro} \def\pccypsiz{\pmacro} \def\pypsiz{\pmacro} \def\pcpsiz{\pmacro} \def\peps{\pmacro} \def\vt{ {t} } \def\mlxtran{\mathbb MLXtran} $
Introduction
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, and design variables. The model represents the relationships between these variables.
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model then allows it to be decomposed into submodels, i.e., the joint distribution can be written as a product of conditional distributions.
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.
An illustrative example
A model for the observations of a single individual
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider the $y_j$ to be random variables and denote by $\qy$ the distribution (or pdf) of $y$. If we assume a parametric model, then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.
We can then make this dependence on $\psi$ explicit by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.
To be more precise still, we can make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.
By convention, the variables that appear before the symbol ";" are random variables. Those that appear after the ";" are nonrandom parameters or variables. When there is no risk of confusion, the nonrandom terms can be left out of the notation.
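As an illustration of the notation $p_y(\, \cdot \, ; \psi, t)$, here is a minimal Python sketch. The structural model (an exponential decay) and the additive Gaussian residual error are assumptions made for this example only; they are not taken from the text, and the names `structural_model` and `log_pdf_y` are hypothetical.

```python
import math

def structural_model(t, A, k):
    """Hypothetical structural model: exponential decay f(t) = A * exp(-k * t)."""
    return A * math.exp(-k * t)

def log_pdf_y(y, psi, t):
    """Log of p_y(y ; psi, t), assuming independent Gaussian residual errors.

    y   : observations (the random variables, written before the ';')
    psi : (A, k, a), nonrandom parameters; a is the residual standard deviation
    t   : measurement times (the design), nonrandom
    """
    A, k, a = psi
    if len(y) != len(t):
        raise ValueError("y and t must have the same length")
    total = 0.0
    for y_j, t_j in zip(y, t):
        resid = y_j - structural_model(t_j, A, k)
        total += -0.5 * math.log(2.0 * math.pi * a * a) - 0.5 * (resid / a) ** 2
    return total
```

The function takes `y` as its random argument, while `psi` and `t` play the role of the nonrandom quantities written after the ";".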
A model for the population parameters
In some cases, it is useful or important to consider that the population parameter $\theta$ is itself random rather than fixed. Reasons for doing so include modeling uncertainty in its value, introducing a priori information in an estimation context, or modeling inter-population variability when the model is not restricted to a single population.
If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:
\(
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by | \bpsi;\bt) \, \pcpsith(\bpsi | \theta;\bc) \, \pth(\theta) .
\)

(1.7) 
A model for several individuals
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as the product of the conditional distribution of $\by$ given $\bpsi$ and the distribution $\qpsi$ of $\bpsi$:
\(
\pypsi(\by,\bpsi) = \pcypsi(\by | \bpsi) \, \ppsi(\bpsi) .
\)
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of population parameters and a set of individual covariates $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.
Each individual $i$ has a potentially unique set of measurement times $t_i=(t_{i1},\ldots,t_{i n_i})$ in the design, and the number of observations $n_i$ can differ between individuals.
A model for the covariates
Another scenario is to suppose that it is the covariates $\bc$ that are random, not the population parameters. This may be useful when simulating individuals, or when modeling while taking into account uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:
\(
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi | \bc;\theta) \, \pc(\bc) \ ,
\)

(1.10) 
where $p_{_{\psi|c}}$ is the conditional distribution of $\bpsi$ given $\bc$.
A model for the measurement times
Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the "true" measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.
Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$. In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.
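The dropout mechanism described above can be sketched in a few lines of Python. The truncation rule follows the text; the exponential distribution used for the dropout time $T_i$ is an assumption chosen only for illustration, and the function names are hypothetical.

```python
import random

def observed_times(nominal_times, dropout_time):
    """Keep the nominal times that are not later than the dropout time T_i."""
    return [t for t in nominal_times if t <= dropout_time]

def simulate_times_with_dropout(nominal_times, hazard, rng):
    """Draw T_i from an assumed Exponential(hazard) dropout model, then truncate.

    Returns the simulated dropout time and the resulting measurement times.
    """
    dropout_time = rng.expovariate(hazard)
    return dropout_time, observed_times(nominal_times, dropout_time)
```

Because $T_i$ is random, the number of measurement times returned for an individual is random as well, even though the nominal design is fixed.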
A model for the dose regimen
If the structural model is a dynamical system (e.g., defined by a system of ordinary differential equations), the source terms $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (IV bolus, infusion, oral, etc.).
Here again, there may be differences between the nominal dosage regimen stated in the protocol and given in the data set, and the dosage regimen that was actually administered. For example, the times of administration and/or the doses may not have been exactly respected or recorded. There may also have been noncompliance, i.e., certain doses that were not taken by the patient.
If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.
A complete model
We have now seen the variety of ways in which the variables in a model play either the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.
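For this particular combination, the joint distribution could be decomposed as in the following sketch. It assumes, for illustration only, that $\theta$, $\bc$, $\bu$ and $\bt$ are mutually independent, that $\bpsi$ depends only on $\theta$ and $\bc$, and that $\by$ depends only on $\bpsi$, $\bu$ and $\bt$; it reuses the macros already defined for this page.

```latex
\(
\pypsithcut(\by,\bpsi,\theta,\bc,\bu,\bt ; \nominal{\bu},\nominal{\bt},\varphi,\gamma)
  = \pcypsiut(\by | \bpsi,\bu,\bt) \,
    \pcpsithc(\bpsi | \theta,\bc) \,
    \pth(\theta;\varphi) \,
    \pc(\bc;\gamma) \,
    \pu(\bu;\nominal{\bu}) \,
    \pt(\bt;\nominal{\bt}) .
\)
```

Each factor is one of the submodels introduced in the previous sections, with the nonrandom quantities written after the ";".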
Using the model for executing tasks
In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.
Simulation
By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.
Consider for example that the variables involved in the model are those introduced in the previous section:
1. The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.
2. The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.
3. The individual parameters $\bpsi$ can be simulated from the conditional distribution $p_{_{\psi|\theta,c}}$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.
4. The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.
5. The measurement times $\bt$ (or regression variables $\bx$) can either be given, or simulated from the distribution $\qt$.
6. Lastly, the observations $\by$ can be simulated from the conditional distribution $p_{_{y|\psi,u,t}}$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained in steps 3, 4 and 5.
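The sequence of steps above can be sketched in Python as follows. Every distribution in this example is an assumption chosen for illustration (a log-normal body weight as covariate, a log-normal elimination rate as individual parameter, a one-compartment decay with additive Gaussian error); none of them comes from the text. Here $\theta$, the dose and the times are taken as given, so only steps 2, 3 and 6 involve random draws.

```python
import math
import random

def simulate_study(n_individuals, theta, nominal_times, dose, rng):
    """Simulate (c, psi, y) by sampling each conditional distribution in turn.

    theta = (k_pop, beta, omega, a) is taken as given (step 1).
    """
    k_pop, beta, omega, a = theta
    individuals = []
    for _ in range(n_individuals):
        # Step 2: covariate c_i (body weight) from an assumed log-normal p_c.
        weight = 70.0 * math.exp(rng.gauss(0.0, 0.15))
        # Step 3: individual parameter psi_i given theta and c_i
        # (log-normal elimination rate with a covariate effect).
        log_k = math.log(k_pop) + beta * math.log(weight / 70.0) + rng.gauss(0.0, omega)
        k_i = math.exp(log_k)
        # Steps 4 and 5: dose regimen u_i and times t_i are given in this sketch.
        # Step 6: observations y_i given psi_i, u_i, t_i, with additive Gaussian error.
        y_i = [dose * math.exp(-k_i * t) + rng.gauss(0.0, a) for t in nominal_times]
        individuals.append({"c": weight, "psi": k_i, "t": list(nominal_times), "y": y_i})
    return individuals
```

A call such as `simulate_study(5, (0.2, 0.75, 0.3, 0.5), [0.5, 1, 2, 4], 100.0, random.Random(0))` returns one dictionary per simulated individual. Making $\bu$ or $\bt$ random would only require replacing the fixed values in steps 4 and 5 by draws from the corresponding distributions.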
Estimation of the population parameters
In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.
Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution. Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.
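Maximum likelihood estimation consists in maximizing with respect to $\theta$ the observed likelihood, defined as the pdf of the observations $\by$, which is obtained by integrating the joint distribution of $\by$ and $\bpsi$ over the individual parameters:
\(
{\like}(\theta ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ \py(\by ; \theta,\bc,\bu,\bt) = \int \pypsi(\by,\bpsi ; \theta,\bc,\bu,\bt) \, d\bpsi .
\)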
The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):
\(
\ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \  - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .
\)

(1.17) 
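To make these two quantities concrete, here is a minimal Python sketch on a toy model that is an assumption of this example, not a model from the text: $y_{ij} = \psi_i + \varepsilon_{ij}$ with $\psi_i \sim \mathcal{N}(\theta, \omega^2)$ and $\varepsilon_{ij} \sim \mathcal{N}(0, a^2)$, where $\omega$ and $a$ are known. In this case the integral over $\psi_i$ has a closed form: the mean $\bar{y}_i$ of individual $i$ is Gaussian with mean $\theta$ and variance $\omega^2 + a^2/n_i$. The observed Fisher information is computed as minus the second derivative of the log-likelihood, here by finite differences.

```python
def log_likelihood(theta, y, omega, a):
    """Observed log-likelihood of theta, up to a constant that does not depend on theta.

    y is a list of per-individual observation lists; psi_i has been integrated out.
    """
    total = 0.0
    for y_i in y:
        n_i = len(y_i)
        mean_i = sum(y_i) / n_i
        v_i = omega ** 2 + a ** 2 / n_i  # marginal variance of the individual mean
        total += -0.5 * (mean_i - theta) ** 2 / v_i
    return total

def mle(y, omega, a):
    """Maximizer of the observed likelihood: a precision-weighted mean."""
    weights = [1.0 / (omega ** 2 + a ** 2 / len(y_i)) for y_i in y]
    means = [sum(y_i) / len(y_i) for y_i in y]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

def observed_fim(theta, y, omega, a, h=1e-3):
    """Minus the second derivative of the log-likelihood (central differences)."""
    f = lambda th: log_likelihood(th, y, omega, a)
    return -(f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h ** 2
```

In this toy model the estimated standard error of the estimator is `observed_fim(theta_hat, y, omega, a) ** -0.5`. For models where the integral has no closed form, the likelihood itself has to be approximated, which this sketch does not address.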
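Bayesian estimation consists in estimating and/or maximizing the conditional distribution of $\theta$ given the observations:
\(
\pcthy(\theta | \by ; \bc,\bu,\bt) = \frac{\pyth(\by,\theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} .
\)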
Estimation of the individual parameters
When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:
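Maximum likelihood estimation consists of maximizing with respect to $\bpsi$ the conditional likelihood
\(
{\like}(\bpsi ; \by,\bu,\bt) \ \ \eqdef \ \ \pcypsi(\by | \bpsi ; \bu,\bt) .
\)
The maximum a posteriori (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the conditional distribution
\(
\pcpsiy(\bpsi | \by ; \theta,\bc,\bu,\bt) = \frac{\pcypsi(\by | \bpsi ; \bu,\bt) \, \ppsi(\bpsi ; \theta,\bc)}{\py(\by ; \theta,\bc,\bu,\bt)} .
\)
The conditional mean of $\bpsi$ is defined as the mean of the conditional distribution $p_{_{\psi|y}}(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.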
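The three estimators can be compared on a toy model, assumed here for illustration only: $y_{ij} = \psi_i + \varepsilon_{ij}$ with $\psi_i \sim \mathcal{N}(\theta, \omega^2)$ and $\varepsilon_{ij} \sim \mathcal{N}(0, a^2)$, with $\theta$, $\omega$ and $a$ given. In this Gaussian case the conditional distribution of $\psi_i$ given $y_i$ is itself Gaussian, so the MAP estimate and the conditional mean coincide; this is a property of the toy model, not of the general case. The function name is hypothetical.

```python
def individual_estimates(y_i, theta, omega, a):
    """Estimators of psi_i for one individual in the assumed Gaussian toy model."""
    n_i = len(y_i)
    mean_i = sum(y_i) / n_i
    # Maximum likelihood: maximizes p(y_i | psi_i) and ignores the population model.
    psi_ml = mean_i
    # Conditional distribution of psi_i given y_i: Gaussian, with precision equal
    # to the sum of the data precision and the population precision.
    precision = n_i / a ** 2 + 1.0 / omega ** 2
    psi_map = (n_i * mean_i / a ** 2 + theta / omega ** 2) / precision
    return {
        "ml": psi_ml,
        "map": psi_map,            # mode of the conditional distribution
        "cond_mean": psi_map,      # mean of the conditional distribution (same here)
        "cond_var": 1.0 / precision,
    }
```

The MAP estimate is pulled from the individual mean toward the population value $\theta$, and the pull is stronger when the individual has few observations or when $\omega$ is small relative to $a$.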