Testing A joint probability
$ \DeclareMathOperator{\argmin}{arg\,min} \DeclareMathOperator{\argmax}{arg\,max} \newcommand{\nominal}[1]{#1^{\star}} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\varepsilon} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} \def\aref{a^\star} \def\kref{k^\star} \def\model{M} \def\hmodel{m} \def\mmodel{\mu} \def\imodel{H} \def\Imax{\text{\it Imax}} \def\id{ {\rm Id}} \def\teta{\tilde{\eta}} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\deriv}[1]{\frac{d}{dt}#1(t)} \newcommand{\pred}[1]{\tilde{#1}} \def\phis{\phi{^\star}} \def\hphi{\tilde{\phi}} \def\hw{\tilde{w}} \def\hpsi{\tilde{\psi}} \def\hatpsi{\hat{\psi}} \def\hatphi{\hat{\phi}} \def\psis{\psi{^\star}} \def\transy{u} \def\psipop{\psi_{\rm pop}} \newcommand{\psigr}[1]{\hat{\bpsi}_{#1}} \newcommand{\Vgr}[1]{\hat{V}_{#1}} \def\psig{\psi} \def\psigprime{\psig^{\prime}} \def\psigiprime{\psig_i^{\prime}} \def\psigk{\psig^{(k)}} \def\psigki{ {\psig_i^{(k)}}} \def\psigkun{\psig^{(k+1)}} \def\psigkuni{\psig_i^{(k+1)}} \def\psigi{ {\psig_i}} \def\psigil{ {\psig_{i,\ell}}} \def\phig{ {\phi}} \def\phigi{ {\phig_i}} \def\phigil{ {\phig_{i,\ell}}} \def\etagi{ {\eta_i}} \def\IIV{ {\Omega}} \def\thetag{ {\theta}} \def\thetagk{ {\theta_k}} \def\thetagkun{ {\theta_{k+1}}} \def\thetagkunm{\theta_{k-1}} \def\sgk{s_{k}} \def\sgkun{s_{k+1}} \def\yg{y} \def\xg{x} \def\qy{p_{_y}} \def\qt{p_{_t}} \def\qc{p_{_c}} \def\qu{p_{_u}} \def\qyi{p_{_{y_i}}} \def\qyj{p_{_{y_j}}} \def\qpsi{p_{_{\psi}}} \def\qpsii{p_{_{\psi_i}}} \def\qcpsith{p_{_{\psi|\theta}}} \def\qth{p_{_{\theta}}} \def\qypsi{p_{_{y,\psi}}} \def\qcypsi{p_{_{y|\psi}}} \def\qpsic{p_{_{\psi,c}}} \def\qcpsic{p_{_{\psi|c}}} \def\qypsic{p_{_{y,\psi,c}}} \def\qypsit{p_{_{y,\psi,t}}} \def\qcypsit{p_{_{y|\psi,t}}} \def\qypsiu{p_{_{y,\psi,u}}} \def\qcypsiu{p_{_{y|\psi,u}}} 
\def\qypsith{p_{_{y,\psi,\theta}}} \def\qypsithcut{p_{_{y,\psi,\theta,c,u,t}}} \def\qypsithc{p_{_{y,\psi,\theta,c}}} \def\qcypsiut{p_{_{y|\psi,u,t}}} \def\qcpsithc{p_{_{\psi|\theta,c}}} \def\qcthy{p_{_{\theta | y}}} \def\qyth{p_{_{y,\theta}}} \def\qcpsiy{p_{_{\psi|y}}} \def\qz{p_{_z}} \def\qw{p_{_w}} \def\qcwz{p_{_{w|z}}} \def\qw{p_{_w}} \def\qcyipsii{p_{_{y_i|\psi_i}}} \def\qyipsii{p_{_{y_i,\psi_i}}} \def\qypsiij{p_{_{y_{ij}|\psi_{i}}}} \def\qyipsi1{p_{_{y_{i1}|\psi_{i}}}} \def\qtypsiij{p_{_{\transy(y_{ij})|\psi_{i}}}} \def\qcyzipsii{p_{_{z_i,y_i|\psi_i}}} \def\qczipsii{p_{_{z_i|\psi_i}}} \def\qcyizpsii{p_{_{y_i|z_i,\psi_i}}} \def\qcyijzpsii{p_{_{y_{ij}|z_{ij},\psi_i}}} \def\qcyi1zpsii{p_{_{y_{i1}|z_{i1},\psi_i}}} \def\qcypsiz{p_{_{y,\psi|z}}} \def\qccypsiz{p_{_{y|\psi,z}}} \def\qypsiz{p_{_{y,\psi,z}}} \def\qcpsiz{p_{_{\psi|z}}} \def\qeps{p_{_{\teps}}} \def\neta{ {n_\eta}} \def\ncov{M} \def\npsi{n_\psig} \def\bu{\boldsymbol{u}} \def\bt{\boldsymbol{t}} \def\bT{\boldsymbol{T}} \def\by{\boldsymbol{y}} \def\bx{\boldsymbol{x}} \def\bc{\boldsymbol{c}} \def\bw{\boldsymbol{w}} \def\bz{\boldsymbol{z}} \def\bpsi{\boldsymbol{\psi}} \def\bbeta{\beta} \def\beeta{\eta} \def\logit{\rm logit} \def\transy{u} \def\so{O} \def\one{\mathbb 1} \newcommand{\prob}[1]{ \mathbb{P}\!\left(#1\right)} \newcommand{\probs}[2]{ \mathbb{P}_{#1}\!\left(#2\right)} \newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\esps}[2]{\mathbb{E}_{#1}\left(#2\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\vars}[2]{\mbox{Var}_{#1}\left(#2\right)} \newcommand{\std}[1]{\mbox{sd}\left(#1\right)} \newcommand{\stds}[2]{\mbox{sd}_{#1}\left(#2\right)} \newcommand{\corr}[1]{\mbox{Corr}\left(#1\right)} \def\pmacro{\mathbf{p}} \def\py{\pmacro} \def\pt{\pmacro} \def\pc{\pmacro} \def\pu{\pmacro} \def\pyi{\pmacro} \def\pyj{\pmacro} \def\ppsi{\pmacro} \def\ppsii{\pmacro} \def\pcpsith{\pmacro} \def\pth{\pmacro} \def\pypsi{\pmacro} \def\pcypsi{\pmacro} \def\ppsic{\pmacro} \def\pcpsic{\pmacro} 
\def\pypsic{\pmacro} \def\pypsit{\pmacro} \def\pcypsit{\pmacro} \def\pypsiu{\pmacro} \def\pcypsiu{\pmacro} \def\pypsith{\pmacro} \def\pypsithcut{\pmacro} \def\pypsithc{\pmacro} \def\pcypsiut{\pmacro} \def\pcpsithc{\pmacro} \def\pcthy{\pmacro} \def\pyth{\pmacro} \def\pcpsiy{\pmacro} \def\pz{\pmacro} \def\pw{\pmacro} \def\pcwz{\pmacro} \def\pw{\pmacro} \def\pcyipsii{\pmacro} \def\pyipsii{\pmacro} \def\pypsiij{\pmacro} \def\pyipsi1{\pmacro} \def\ptypsiij{\pmacro} \def\pcyzipsii{\pmacro} \def\pczipsii{\pmacro} \def\pcyizpsii{\pmacro} \def\pcyijzpsii{\pmacro} \def\pcyi1zpsii{\pmacro} \def\pcypsiz{\pmacro} \def\pccypsiz{\pmacro} \def\pypsiz{\pmacro} \def\pcpsiz{\pmacro} \def\peps{\pmacro} \def\vt{ {t} } \def\mlxtran{\mathbb MLXtran} $
Introduction
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates and design variables. The model allows us to represent the relationships between these variables.
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model then allows it to be decomposed into submodels, i.e., the joint distribution can be decomposed into a product of conditional distributions.
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.
An illustrative example
A model for the observations of a single individual
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider the $y_j$ to be random variables and denote by $\qy$ the distribution (or pdf) of $y$. If we assume that $\qy$ belongs to a parametric family, then there exists a vector of parameters $\psi$ that completely defines it.
We can then make this dependence on $\psi$ explicit by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.
If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $ \qy(\, \cdot \, ; \psi,\vt)$ instead.
By convention, the variables that appear before the symbol ";" are random variables. Those that appear after the ";" are non-random parameters or variables. When there is no risk of confusion, the non-random terms can be left out of the notation.
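As a minimal sketch of this notation, the Python function below evaluates $\log \qy(y ; \psi,\vt)$ for one individual. Everything specific in it is an assumption made for illustration and does not come from the text: a hypothetical prediction function $f(t;\psi)=A e^{-kt}$, independent Gaussian residual errors, and a parameter vector $\psi=(A,k,\sigma)$. The random quantity `y` is the first argument; the non-random `psi` and `t` follow it, mirroring the ";" convention.

```python
import math


def predict(t_j, psi):
    """Hypothetical structural model f(t; psi) = A * exp(-k * t)."""
    A, k, _sigma = psi
    return A * math.exp(-k * t_j)


def log_pdf_y(y, psi, t):
    """log p_y(y; psi, t), assuming independent Gaussian residual errors."""
    if len(y) != len(t):
        raise ValueError("y and t must have the same length")
    sigma = psi[2]
    total = 0.0
    for y_j, t_j in zip(y, t):
        resid = y_j - predict(t_j, psi)
        # Log-density of a normal distribution with sd sigma, evaluated at resid.
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * (resid / sigma) ** 2
    return total


# Design (non-random), parameters (non-random) and observations (random).
t = [0.0, 1.0, 2.0]
psi = (10.0, 0.5, 1.0)  # illustrative values for (A, k, sigma)
y_exact = [predict(t_j, psi) for t_j in t]  # observations with zero residuals
print(log_pdf_y(y_exact, psi, t))
```

Changing the design `t` or the parameters `psi` changes the distribution itself, whereas changing `y` only changes the point at which that distribution is evaluated.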
A model for the population parameters
In some cases, it may be useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when the model does not describe only one given population.
If so, let us denote by $\qth$ the distribution of $\theta$. As the status of $\theta$ has changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:
\[
\pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by |\bpsi;\bt) \, \pcpsith(\bpsi|\theta;\bc) \, \pth(\theta) . \tag{1.7}
\]
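The decomposition (1.7) translates directly into a sum of three log-densities. The sketch below is only an illustration under assumed distributions that are not part of the text: Gaussian residuals around a hypothetical prediction $\psi_i e^{-k t_{ij}}$, Gaussian individual parameters centered at $\theta + \beta c_i$, and a Gaussian distribution for $\theta$. The function names and the default values of `sigma`, `k`, `beta`, `omega`, `m0` and `s0` are all hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def log_p_y_given_psi(y, psi, t, sigma=1.0, k=0.1):
    """log p(y | psi; t): assumed Gaussian residuals around psi_i * exp(-k * t_ij)."""
    return sum(
        log_normal_pdf(y_ij, psi_i * math.exp(-k * t_ij), sigma)
        for y_i, psi_i, t_i in zip(y, psi, t)
        for y_ij, t_ij in zip(y_i, t_i)
    )


def log_p_psi_given_theta(psi, theta, c, beta=0.5, omega=1.0):
    """log p(psi | theta; c): assumed Gaussian around theta + beta * c_i."""
    return sum(
        log_normal_pdf(psi_i, theta + beta * c_i, omega)
        for psi_i, c_i in zip(psi, c)
    )


def log_p_theta(theta, m0=10.0, s0=2.0):
    """log p(theta): assumed Gaussian distribution for the population parameter."""
    return log_normal_pdf(theta, m0, s0)


def log_joint(y, psi, theta, t, c):
    """log p(y, psi, theta; t, c) as the sum of the three factors in (1.7)."""
    return (
        log_p_y_given_psi(y, psi, t)
        + log_p_psi_given_theta(psi, theta, c)
        + log_p_theta(theta)
    )


# Two individuals with different designs (inputs t and c are non-random).
t = [[0.0, 1.0], [0.0]]
c = [0.0, 2.0]
theta = 10.0
psi = [10.0, 11.0]
y = [[10.0 * math.exp(-0.1 * 0.0), 10.0 * math.exp(-0.1 * 1.0)], [11.0]]
print(log_joint(y, psi, theta, t, c))
```

Each factor only needs the variables it is conditioned on, which is what makes the hierarchical structure convenient: a submodel can be replaced without touching the other two.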
A model for several individuals
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. Using basic probability, this can be written as:
\[
\pypsi(\by,\bpsi) = \pcypsi(\by | \bpsi) \, \ppsi(\bpsi) .
\]
If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of population parameters and a set of individual covariates $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.
Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i n_i})$ in the design, and $n_i$ can be different for each individual.
{{OutlineTextL text= - In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:
\( \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by |\bpsi;\bt) \, \pcpsith(\bpsi|\theta;\bc) \, \pth(\theta) . \)
- The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.
}}
A model for the covariates
Another scenario is to suppose that it is the covariates $\bc$ that are random, not the population parameters. This may arise when we want to simulate individuals, or when we want to take uncertainty in the covariate values into account while modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:
\[
\ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi | \bc;\theta) \, \pc(\bc) \ , \tag{1.10}
\]
where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.
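For simulation, the decomposition (1.10) suggests drawing the covariates first and then the individual parameters conditionally on them. The following sketch assumes, purely for illustration, that the covariate is a body weight drawn from a normal distribution and that $\psi_i$ is normally distributed around a linear function of the centered weight. The function names, the reference weight of 70 and the default values of `beta` and `omega` are hypothetical.

```python
import random


def sample_covariates(n, rng, mean_weight=70.0, sd_weight=10.0):
    """Draw c ~ p_c: assumed normal distribution for a weight covariate."""
    return [rng.gauss(mean_weight, sd_weight) for _ in range(n)]


def sample_psi_given_c(c, theta, rng, beta=0.02, omega=0.3):
    """Draw psi | c ~ p(psi | c; theta): assumed linear covariate model."""
    return [rng.gauss(theta + beta * (c_i - 70.0), omega) for c_i in c]


def sample_psi_and_c(n, theta, seed=0):
    """Draw (psi, c) from the joint distribution by sampling c, then psi given c."""
    rng = random.Random(seed)  # local generator so that draws are reproducible
    c = sample_covariates(n, rng)
    psi = sample_psi_given_c(c, theta, rng)
    return psi, c


psi, c = sample_psi_and_c(n=5, theta=1.0, seed=1)
for psi_i, c_i in zip(psi, c):
    print(f"c_i = {c_i:.1f}, psi_i = {psi_i:.3f}")
```

Here $\theta$ stays a fixed input, consistent with its position after the ";" in (1.10), while both `psi` and `c` are outputs of the simulation.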