# Testing A joint probability

$\DeclareMathOperator{\argmin}{arg\,min} \DeclareMathOperator{\argmax}{arg\,max} \newcommand{\nominal}[1]{#1^{\star}} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\varepsilon} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} \def\aref{a^\star} \def\kref{k^\star} \def\model{M} \def\hmodel{m} \def\mmodel{\mu} \def\imodel{H} \def\Imax{\text{\it Imax}} \def\id{ {\rm Id}} \def\teta{\tilde{\eta}} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\deriv}[1]{\frac{d}{dt}#1(t)} \newcommand{\pred}[1]{\tilde{#1}} \def\phis{\phi{^\star}} \def\hphi{\tilde{\phi}} \def\hw{\tilde{w}} \def\hpsi{\tilde{\psi}} \def\hatpsi{\hat{\psi}} \def\hatphi{\hat{\phi}} \def\psis{\psi{^\star}} \def\transy{u} \def\psipop{\psi_{\rm pop}} \newcommand{\psigr}[1]{\hat{\bpsi}_{#1}} \newcommand{\Vgr}[1]{\hat{V}_{#1}} \def\psig{\psi} \def\psigprime{\psig^{\prime}} \def\psigiprime{\psig_i^{\prime}} \def\psigk{\psig^{(k)}} \def\psigki{ {\psig_i^{(k)}}} \def\psigkun{\psig^{(k+1)}} \def\psigkuni{\psig_i^{(k+1)}} \def\psigi{ {\psig_i}} \def\psigil{ {\psig_{i,\ell}}} \def\phig{ {\phi}} \def\phigi{ {\phig_i}} \def\phigil{ {\phig_{i,\ell}}} \def\etagi{ {\eta_i}} \def\IIV{ {\Omega}} \def\thetag{ {\theta}} \def\thetagk{ {\theta_k}} \def\thetagkun{ {\theta_{k+1}}} \def\thetagkunm{\theta_{k-1}} \def\sgk{s_{k}} \def\sgkun{s_{k+1}} \def\yg{y} \def\xg{x} \def\qy{p_{_y}} \def\qt{p_{_t}} \def\qc{p_{_c}} \def\qu{p_{_u}} \def\qyi{p_{_{y_i}}} \def\qyj{p_{_{y_j}}} \def\qpsi{p_{_{\psi}}} \def\qpsii{p_{_{\psi_i}}} \def\qcpsith{p_{_{\psi|\theta}}} \def\qth{p_{_{\theta}}} \def\qypsi{p_{_{y,\psi}}} \def\qcypsi{p_{_{y|\psi}}} \def\qpsic{p_{_{\psi,c}}} \def\qcpsic{p_{_{\psi|c}}} \def\qypsic{p_{_{y,\psi,c}}} \def\qypsit{p_{_{y,\psi,t}}} \def\qcypsit{p_{_{y|\psi,t}}} \def\qypsiu{p_{_{y,\psi,u}}} \def\qcypsiu{p_{_{y|\psi,u}}} 
\def\qypsith{p_{_{y,\psi,\theta}}} \def\qypsithcut{p_{_{y,\psi,\theta,c,u,t}}} \def\qypsithc{p_{_{y,\psi,\theta,c}}} \def\qcypsiut{p_{_{y|\psi,u,t}}} \def\qcpsithc{p_{_{\psi|\theta,c}}} \def\qcthy{p_{_{\theta | y}}} \def\qyth{p_{_{y,\theta}}} \def\qcpsiy{p_{_{\psi|y}}} \def\qz{p_{_z}} \def\qw{p_{_w}} \def\qcwz{p_{_{w|z}}} \def\qw{p_{_w}} \def\qcyipsii{p_{_{y_i|\psi_i}}} \def\qyipsii{p_{_{y_i,\psi_i}}} \def\qypsiij{p_{_{y_{ij}|\psi_{i}}}} \def\qyipsi1{p_{_{y_{i1}|\psi_{i}}}} \def\qtypsiij{p_{_{\transy(y_{ij})|\psi_{i}}}} \def\qcyzipsii{p_{_{z_i,y_i|\psi_i}}} \def\qczipsii{p_{_{z_i|\psi_i}}} \def\qcyizpsii{p_{_{y_i|z_i,\psi_i}}} \def\qcyijzpsii{p_{_{y_{ij}|z_{ij},\psi_i}}} \def\qcyi1zpsii{p_{_{y_{i1}|z_{i1},\psi_i}}} \def\qcypsiz{p_{_{y,\psi|z}}} \def\qccypsiz{p_{_{y|\psi,z}}} \def\qypsiz{p_{_{y,\psi,z}}} \def\qcpsiz{p_{_{\psi|z}}} \def\qeps{p_{_{\teps}}} \def\neta{ {n_\eta}} \def\ncov{M} \def\npsi{n_\psig} \def\bu{\boldsymbol{u}} \def\bt{\boldsymbol{t}} \def\bT{\boldsymbol{T}} \def\by{\boldsymbol{y}} \def\bx{\boldsymbol{x}} \def\bc{\boldsymbol{c}} \def\bw{\boldsymbol{w}} \def\bz{\boldsymbol{z}} \def\bpsi{\boldsymbol{\psi}} \def\bbeta{\beta} \def\beeta{\eta} \def\logit{\rm logit} \def\transy{u} \def\so{O} \def\one{\mathbb 1} \newcommand{\prob}[1]{ \mathbb{P}\!\left(#1\right)} \newcommand{\probs}[2]{ \mathbb{P}_{#1}\!\left(#2\right)} \newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\esps}[2]{\mathbb{E}_{#1}\left(#2\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\vars}[2]{\mbox{Var}_{#1}\left(#2\right)} \newcommand{\std}[1]{\mbox{sd}\left(#1\right)} \newcommand{\stds}[2]{\mbox{sd}_{#1}\left(#2\right)} \newcommand{\corr}[1]{\mbox{Corr}\left(#1\right)} \def\pmacro{\mathbf{p}} \def\py{\pmacro} \def\pt{\pmacro} \def\pc{\pmacro} \def\pu{\pmacro} \def\pyi{\pmacro} \def\pyj{\pmacro} \def\ppsi{\pmacro} \def\ppsii{\pmacro} \def\pcpsith{\pmacro} \def\pth{\pmacro} \def\pypsi{\pmacro} \def\pcypsi{\pmacro} \def\ppsic{\pmacro} \def\pcpsic{\pmacro} 
\def\pypsic{\pmacro} \def\pypsit{\pmacro} \def\pcypsit{\pmacro} \def\pypsiu{\pmacro} \def\pcypsiu{\pmacro} \def\pypsith{\pmacro} \def\pypsithcut{\pmacro} \def\pypsithc{\pmacro} \def\pcypsiut{\pmacro} \def\pcpsithc{\pmacro} \def\pcthy{\pmacro} \def\pyth{\pmacro} \def\pcpsiy{\pmacro} \def\pz{\pmacro} \def\pw{\pmacro} \def\pcwz{\pmacro} \def\pw{\pmacro} \def\pcyipsii{\pmacro} \def\pyipsii{\pmacro} \def\pypsiij{\pmacro} \def\pyipsi1{\pmacro} \def\ptypsiij{\pmacro} \def\pcyzipsii{\pmacro} \def\pczipsii{\pmacro} \def\pcyizpsii{\pmacro} \def\pcyijzpsii{\pmacro} \def\pcyi1zpsii{\pmacro} \def\pcypsiz{\pmacro} \def\pccypsiz{\pmacro} \def\pypsiz{\pmacro} \def\pcpsiz{\pmacro} \def\peps{\pmacro} \def\vt{ {t} } \def\mlxtran{\mathbb MLXtran}$

## Introduction

A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.

If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.

Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.

Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.

- A model is a joint probability distribution.

- A submodel is a conditional distribution derived from this joint distribution.

- A task is a specific use of this distribution.

We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.

## An illustrative example

### A model for the observations of a single individual

Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider the $y_j$ to be random variables and denote by $\qy$ the distribution (or pdf) of $y$. If we assume a parametric model for $y$, then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.

We can then make this dependence on $\psi$ explicit by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.

If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.

By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are non-random parameters or variables. When there is no risk of confusion, the non-random terms can be left out of the notation.

- In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$.

- The inputs of the model are the parameters $\psi$ and the design $\vt$.

Example

500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasma concentration of the drug over time is described by the pharmacokinetic (PK) model

$$f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} ,$$

where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:

$$y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n .$$

Assuming that the residual errors $(e_j)$ are independent and normally distributed with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and

 $$y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n.$$ (1.4)

Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter. As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:

$$\py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,$$

where $\pyj$ is the normal distribution defined in (1.4).
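The product form can be made concrete with a minimal Python sketch that evaluates $\log \py(y;\psi,\vt)$ for this example. The function names `pk_model` and `log_pdf_y`, as well as the numerical values chosen for the design and the parameters, are illustrative assumptions and not part of the model definition.

```python
import math

def pk_model(t, V, k, dose=500.0):
    """Structural model f(t; V, k) = dose / V * exp(-k t)."""
    return dose / V * math.exp(-k * t)

def log_pdf_y(y, times, V, k, a):
    """log p_y(y; psi, t): sum over j of the normal log-density of y_j,
    with mean f(t_j; V, k) and standard deviation a, as in (1.4)."""
    total = 0.0
    for y_j, t_j in zip(y, times):
        resid = y_j - pk_model(t_j, V, k)
        total += -0.5 * math.log(2.0 * math.pi * a**2) - resid**2 / (2.0 * a**2)
    return total

# Illustrative (assumed) design and parameter values.
times = [1.0, 2.0, 4.0, 8.0]
V, k, a = 30.0, 0.1, 0.5

# Observations lying exactly on the predicted curve, used as a simple check.
y_exact = [pk_model(t, V, k) for t in times]
print(log_pdf_y(y_exact, times, V, k, a))
```

Because the $y_j$ are independent, the log of the product is a sum, which is why the function simply accumulates one term per observation.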

### A model for the population parameters

In some cases, it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when the model does not concern a single given population.

If so, let us denote $\qth$ the distribution of $\theta$. As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:

 $$\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by |\bpsi;\bt) \, \pcpsith(\bpsi|\theta;\bc) \, \pth(\theta) .$$ (1.7)

Remarks:

1. The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi|\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi | \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing a randomly generated value of $\theta$.

2. If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.

3. Not all components of $\theta$ need to be random. If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition (1.7) becomes

 $$\pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by |\bpsi;\bt) \, \pcpsith(\bpsi|\theta_R;\theta_F,\bc) \, \pth(\theta_R).$$ (1.8)

### A model for several individuals

Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:

$$\pypsi(\by,\bpsi) = \pcypsi(\by | \bpsi) \, \ppsi(\bpsi) .$$

If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of population parameters and a set of individual covariates $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$. Each individual $i$ has a potentially unique set of measurement times $t_i=(t_{i1},\ldots,t_{i n_i})$ in the design, and $n_i$ can differ between individuals.

Example

Let us suppose $N$ patients received the same treatment as the single patient did. We now have the same PK model (1.4) for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:
 $$y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).$$ (1.5)

Here, $\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:

$$\begin{eqnarray} a_i &=& a \end{eqnarray}$$

 $$\begin{eqnarray} \log(V_i) &\sim_{i.i.d.}& {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}$$ (1.6)

$$\begin{eqnarray} \log(k_i) &\sim_{i.i.d.}& {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}$$

where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in (1.5) and the distribution of the individual PK parameters defined in (1.6). The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.
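Since the model is the product $\pcypsi(\by|\bpsi)\,\ppsi(\bpsi)$, one way to use it is to simulate in two stages: first the individual parameters from (1.6), then the concentrations from (1.5). The following Python sketch does this with the standard library only. The function names, the weights, the design and the numerical values of $\theta$ are assumptions made for illustration.

```python
import math
import random

def simulate_individual_parameters(weights, V_pop, k_pop, omega_V, omega_k, beta, rng):
    """Draw (V_i, k_i) for each patient from the log-normal model (1.6)."""
    params = []
    for w in weights:
        log_V = rng.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V)
        log_k = rng.gauss(math.log(k_pop), omega_k)
        params.append((math.exp(log_V), math.exp(log_k)))
    return params

def simulate_observations(params, times, a, rng, dose=500.0):
    """Draw y_ij from (1.5), with the same residual sd a_i = a for everyone."""
    return [[rng.gauss(dose / V_i * math.exp(-k_i * t), a) for t in t_i]
            for (V_i, k_i), t_i in zip(params, times)]

rng = random.Random(0)                         # seeded for reproducibility
weights = [60.0, 70.0, 85.0]                   # assumed covariate values (kg)
times = [[1.0, 4.0], [1.0, 2.0, 8.0], [2.0]]   # n_i differs between patients
theta = dict(V_pop=30.0, k_pop=0.1, omega_V=0.2, omega_k=0.2, beta=1.0)

psi = simulate_individual_parameters(weights, rng=rng, **theta)
y = simulate_observations(psi, times, a=0.5, rng=rng)
```

The inputs of the two functions are exactly the inputs of the model listed above: the population parameters, the weights and the design.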

- In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:

$$\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by |\bpsi;\bt) \, \pcpsith(\bpsi|\theta;\bc) \, \pth(\theta)$$

- The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.

Example:

We can introduce prior distributions in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$:
 $$\begin{eqnarray} V_{\rm pop} &\sim& {\cal N}\left(30,3^2\right) \end{eqnarray}$$ (1.9)

$$\begin{eqnarray} k_{\rm pop} &\sim& {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}$$

As before, the conditional distribution of the concentration is given by (1.5). Now, (1.6) is the conditional distribution of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of $\theta_R$ is defined in (1.9). Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.
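Decomposition (1.8) suggests sampling from the top of the hierarchy downwards: first $\theta_R$, then the individual parameters given $\theta_R$. A minimal Python sketch of these two stages follows (the last stage, drawing $\by$ given $\bpsi$, is unchanged and omitted). The function name `sample_hierarchy` and the values used for $\theta_F$ and the weights are illustrative assumptions.

```python
import math
import random

def sample_hierarchy(weights, omega_V, omega_k, beta, rng):
    """Ancestral sampling of (theta_R, psi) following decomposition (1.8)."""
    # Stage 1: theta_R = (V_pop, k_pop) drawn from the priors (1.9).
    # Note: a normal prior can in principle return a non-positive value,
    # which would make the logarithm below fail; with these means and
    # standard deviations that is about 10 sd away from the mean.
    V_pop = rng.gauss(30.0, 3.0)
    k_pop = rng.gauss(0.1, 0.01)

    # Stage 2: psi_i given theta_R, with theta_F and the weights as inputs (1.6).
    psi = []
    for w in weights:
        V_i = math.exp(rng.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V))
        k_i = math.exp(rng.gauss(math.log(k_pop), omega_k))
        psi.append((V_i, k_i))
    return (V_pop, k_pop), psi

rng = random.Random(42)
theta_R, psi = sample_hierarchy([60.0, 70.0, 85.0],
                                omega_V=0.2, omega_k=0.2, beta=1.0, rng=rng)
```

Here $V_{\rm pop}$ and $k_{\rm pop}$ are no longer arguments of the function: they have moved from the inputs of the model to its random variables.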

### A model for the covariates

Another scenario is to suppose that it is the covariates $\bc$ that are random, not the population parameters. This is useful either when we want to simulate individuals, or when we want the model to account for uncertainty in the covariate values. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:

 $$\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi | \bc;\theta) \, \pc(\bc) \ ,$$ (1.10)

where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.

Example:

We could assume a normal distribution as a prior for the weights:
 $$w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right)$$ (1.11)
Once more, (1.5) defines the conditional distribution of the concentrations. Now, (1.6) is the conditional distribution of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in (1.11). Now, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.
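
The factorization (1.10) means that the log-density of the pair (individual parameter, covariate) is a sum of two terms. The Python sketch below evaluates it for one patient, restricted to the volume and written on the scale of $\log(V_i)$ so that both factors are normal densities. The function names and the numerical values are assumptions for illustration.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log-density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd**2) - (x - mean)**2 / (2.0 * sd**2)

def log_joint_logV_w(log_V, w, V_pop, beta, omega_V):
    """log p(log V_i, w_i; theta) = log p(log V_i | w_i; theta) + log p(w_i)."""
    log_p_w = normal_logpdf(w, 70.0, 10.0)                   # covariate model (1.11)
    mean_log_V = math.log(V_pop) + beta * math.log(w / 70.0)
    log_p_V_given_w = normal_logpdf(log_V, mean_log_V, omega_V)  # from (1.6)
    return log_p_V_given_w + log_p_w

# Assumed values: a 70 kg patient whose volume equals the population value.
value = log_joint_logV_w(math.log(30.0), 70.0, V_pop=30.0, beta=1.0, omega_V=0.2)
```

The weight appears twice: once as the argument of its own density and once in the mean of the conditional density of $\log(V_i)$.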

### A model for the measurement times

Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the "true" measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.

Randomness with respect to time can also appear in the presence of dropout, i.e., individuals that prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$. In such situations, measurement times are therefore random and can be thought to come from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.
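
The dropout mechanism can be sketched in a few lines of Python: draw $T_i$, then keep the nominal times that do not exceed it. The text does not specify a distribution for $T_i$; the exponential distribution with mean 10 used here, the nominal times and the function name `observed_times` are all assumptions made for illustration.

```python
import random

def observed_times(nominal_times, dropout_time):
    """Keep the nominal times that are less than or equal to T_i."""
    return [t for t in nominal_times if t <= dropout_time]

rng = random.Random(0)
nominal = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0]   # assumed nominal design

# Assumption: T_i is exponentially distributed with mean 10.
T_i = rng.expovariate(1.0 / 10.0)
t_i = observed_times(nominal, T_i)
```

Both the number of measurement times and their values therefore depend on the random draw of $T_i$.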

Remark:

If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.

Example:

Let us assume as prior a normal distribution around the nominal times:

 $$t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right)$$ (1.12)

Here, (1.12) defines the distribution of the now random variable $\bt$. The other components of the model defined in (1.5) and (1.6) remain unchanged. The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.
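
Sampling from (1.12) takes only a few lines. The following Python sketch draws "true" times around a nominal design; the function name `sample_true_times` and the nominal times are assumptions for illustration.

```python
import random

def sample_true_times(nominal_times, sd=0.03, rng=random):
    """Draw t_ij ~ N(t*_ij, sd^2) independently for every i and j, as in (1.12)."""
    return [[rng.gauss(t_star, sd) for t_star in t_i] for t_i in nominal_times]

rng = random.Random(0)
nominal = [[1.0, 2.0, 4.0], [1.0, 8.0]]     # assumed nominal design, two patients
true_times = sample_true_times(nominal, rng=rng)
```

Nothing in this sketch prevents a sampled time from being negative or from overtaking the next one. With a standard deviation of 0.03 this only matters when nominal times are very close to zero or to each other.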

### A model for the dose regimen

If the structural model is a dynamical system (e.g., defined by a system of ordinary differential equations), the source terms $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (IV bolus, infusion, oral, etc.).

Here again, there may be differences between the nominal dosage regimen stated in the protocol and given in the data set, and the dosage regimen that was actually administered. For example, the times of administration and/or the doses may not have been exactly respected or recorded. There may also have been non-compliance, i.e., certain doses that were not taken by the patient.

If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the "real" dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.
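
As a rough illustration of what a distribution $\qu(\, \cdot \, ; \nominal{\bu})$ can look like, the Python sketch below draws a "real" regimen for one patient from a nominal one. It assumes, purely for illustration, that each dose is skipped independently with some probability and that administration times are normally scattered around the nominal times. The function name, the argument names and the numerical values are hypothetical.

```python
import random

def sample_dose_regimen(nominal_doses, nominal_times, p_missed, time_sd, rng):
    """Draw a 'real' regimen (doses, times) around the nominal regimen."""
    doses, times = [], []
    for d_star, tau_star in zip(nominal_doses, nominal_times):
        # Assumption: a dose is not taken (amount 0) with probability p_missed.
        doses.append(0.0 if rng.random() < p_missed else d_star)
        # Assumption: the administration time is normal around the nominal time.
        times.append(rng.gauss(tau_star, time_sd))
    return doses, times

rng = random.Random(0)
nominal_doses = [500.0, 500.0, 500.0]       # assumed nominal amounts (mg)
nominal_times = [0.0, 12.0, 24.0]           # assumed nominal times
doses, times = sample_dose_regimen(nominal_doses, nominal_times,
                                   p_missed=0.1, time_sd=0.02, rng=rng)
```

Setting `p_missed=0` and `time_sd=0` recovers the usual situation in which the regimen is fixed and known.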

{{Example |title=Example: |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. Then, it is easy to see that

 $$y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right),$$ (1.13)

where

 $$f(t;V_i,k_i) = \sum_{k, \, \tau_{ik} < t} \frac{d_{ik}}{V_i} \, e^{-k_i (t-\tau_{ik})} .$$ (1.14)

The "real" dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$. We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:

 $$\begin{eqnarray} \tau_{ik} &\sim_{i.i.d.}& {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , \end{eqnarray}$$ (1.15)

and non-compliance (here meaning that a dose is not taken):

 $$\begin{eqnarray} \pi &=& \prob{d_{ik} = 0} \nonumber \\ &=& 1 - \prob{d_{ik}= \nominal{d}_{ik} }. \end{eqnarray}$$ (1.16)