https://wiki.inria.fr/popix/Simulation Simulation 2013-06-03T11:30:34Z <p>Admin</p> <hr /> <div>== Simulation of a hierarchical model ==<br /> <br /> In the [[What is a model? A joint probability distribution!]] section we proposed to consider a model as a joint probability distribution of random variables. Furthermore, the hierarchical structure of a model allows us to decompose this joint distribution into a product<br /> of conditional and marginal distributions. <br /> <br /> Thus, if a model includes observations $\by$, individual parameters $\bpsi$, a design made up of observation times $\bt$ and source terms (e.g., doses) $\bu$, individual covariates $\bc$ and population parameters $\theta$, the joint distribution of these random variables can be written:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt)=\pcypsiut(\by {{!}} \bpsi,\bu,\bt) \, \pcpsithc(\bpsi {{!}} \theta,\bc) \, \pth(\theta) \, \pc(\bc) \, \pu(\bu) \, \pt(\bt) .<br /> &lt;/math&gt; }}<br /> <br /> Simulating this model means successively drawing the variables of the model using their associated submodels:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; The population parameters $\theta$ from the distribution $\qth$. &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; The individual covariates $\bc$ from the distribution $\qc$. &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; The individual parameters $\bpsi$ from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2. &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; The dose regimen $\bu$ from the distribution $\qu$. 
&lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; The measurement times $\bt$ from the distribution $\qt$. &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; Lastly, observations $\by$ from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained in steps 3, 4 and 5. &lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> Of course, if certain variables are not random but fixed and known, the associated simulation step is simply skipped.<br /> <br /> We have already seen that for other tasks (model exploration, estimation) $\mlxtran$ is well-adapted to implementing this type of hierarchical model because it can deal with this structure by defining the various submodels in easy-to-read blocks.<br /> It is also possible to use $\mlxtran$ as a function from within R and Matlab. This capability means that we can write our own R or Matlab script in which the design is defined, the simulation function is called and the results are used for plotting figures, performing tests, etc. This solution combines the flexibility of the R and Matlab environments with the ability of $\mlxtran$ to easily encode complex models.<br /> <br /> <br /> &lt;br&gt;<br /> == Examples ==<br /> <br /> The joint model ${\cal M}_2$ detailed in the [[Model evaluation#Model selection| Model selection]] section can be implemented with $\mlxtran$. We suppose that there can be single or multiple drug administrations, either oral ({{Verbatim|adm{{-}}1}}) or<br /> intravenous ({{Verbatim|adm{{-}}2}}). 
The model is &quot;ready&quot; to accept these different dose regimens, which are defined either within the data (for estimation) or within the simulation script.<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input = {ka_pop, V_pop, Cl_pop, h0_pop, omega_ka, omega_V, omega_Cl, <br /> omega_h0, beta_V, beta_Cl, weight}<br /> <br /> EQUATION:<br /> lw70=log(weight/70)<br /> <br /> DEFINITION:<br /> ka = {distribution=lognormal, reference=ka_pop, sd=omega_ka}<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V, <br /> covariate=lw70, coefficient=beta_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl, <br /> covariate=lw70, coefficient=beta_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> <br /> [OBSERVATION]<br /> input = {ka, V, Cl, h0, gamma, a, b}<br /> <br /> PK:<br /> depot(adm=1, target=Ad)<br /> depot(adm=2, target=Ac)<br /> <br /> EQUATION:<br /> ke=Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - ke*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> <br /> DEFINITION:<br /> Concentration = {type=continuous, prediction=Cc, errorModel=combined1(a,b)}<br /> Hemorrhaging = {type=event, hazard=h}<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here is an example of a Matlab script that uses the {{Verbatim|joint1_model.txt}} model to simulate data for 10 patients.<br /> In this example, a dose of 50 mg is administered every 24 hours over a 7-day period. The simulated data is stored in a cell array. 
We can then use {{Verbatim|exploredatapx}} to visualize it.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> %% dose regimen <br /> admin1 = struct('adm',1,'amount',50,'time',0:24:144); <br /> <br /> %% group<br /> group1 = struct('size',10);<br /> <br /> %% parameter<br /> parameterName = {'ka_pop','V_pop','Cl_pop','h0_pop','omega_ka','omega_V',...<br /> 'omega_Cl','omega_h0','beta_V','beta_Cl','gamma','a','b'};<br /> parameterValue = [1, 8, 2, 0.01, 0.4, 0.2, 0.2, 0.1, 1, 0.75, 0.5, 0.1, 0.15];<br /> % braces around the cell arrays keep struct() from expanding them into struct arrays<br /> param1 = struct('name',{parameterName},'value',parameterValue);<br /> <br /> %% output<br /> outputName = {'Cc','h','Concentration','Hemorrhaging'};<br /> outputTime = {0:1:200, 0:1:200, [1 2 4 6 12 18 28 52 76 100 124 148 172]};<br /> output1 = struct('name',{outputName},'time',{outputTime});<br /> <br /> %% tasks<br /> data = simulix('model','joint1_model.txt','administration',admin1,'group',group1,...<br /> 'parameter',param1,'output',output1);<br /> <br /> exploredatapx(data)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> If we preferred, we could instead use R to run exactly the same simulations using exactly the same $\mlxtran$ model:<br /> <br /> <br /> {{Rcode<br /> |name=<br /> |code=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> ## dose regimen<br /> admin1 &lt;- list(adm = 1, amount = 50, time = seq(0, 144, by=24))<br /> <br /> ## group<br /> group1 &lt;- list(size=10)<br /> <br /> ## parameter<br /> parameterName &lt;- c(&quot;ka_pop&quot;,&quot;V_pop&quot;,&quot;Cl_pop&quot;,&quot;h0_pop&quot;,&quot;omega_ka&quot;,&quot;omega_V&quot;,<br /> &quot;omega_Cl&quot;,&quot;omega_h0&quot;,&quot;beta_V&quot;,&quot;beta_Cl&quot;,&quot;gamma&quot;,&quot;a&quot;,&quot;b&quot;)<br /> parameterValue &lt;- c(1, 8, 2, 0.01, 0.4, 0.2, 0.2, 0.1, 1, 0.75, 0.5, 0.1, 0.15)<br /> param1 &lt;- list(name=parameterName, value=parameterValue)<br /> <br /> ## output<br /> 
outputName &lt;- c(&quot;Cc&quot;,&quot;h&quot;,&quot;Concentration&quot;,&quot;Hemorrhaging&quot;)<br /> outputTime &lt;- list(seq(0,200), seq(0,200), c(1,2,4,6,12,18,28,52,76,100,124,148,172))<br /> output1 &lt;- list(name=outputName, time=outputTime)<br /> <br /> ## tasks<br /> data &lt;- simulix(model=&quot;joint1_model.txt&quot;, administration=admin1, group=group1,<br /> parameter=param1, output=output1)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> It is also possible to include in the model a statistical model for the population parameters or the covariates (here, the weight) by adding sections [POPULATION] and/or [COVARIATE] to the $\mlxtran$ model. For example, we could add the following code to the previous $\mlxtran$ code:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [POPULATION]<br /> input = {V_0, Cl_0, gamma_V, gamma_Cl}<br /> <br /> DEFINITION:<br /> V_pop = {distribution=lognormal, reference=V_0, sd=gamma_V}<br /> Cl_pop = {distribution=lognormal, reference=Cl_0, sd=gamma_Cl}<br /> <br /> <br /> [COVARIATE]<br /> input = {omega_w}<br /> <br /> DEFINITION:<br /> weight = {distribution=normal, mean=70, sd=omega_w}<br /> &lt;/pre&gt;}}<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The {{Verbatim|simulix}} function and necessary R and Matlab connectors are still under development and will be available soon.<br /> }}<br /> <br /> <br /> {{Back<br /> |link=Model evaluation }}</div> https://wiki.inria.fr/popix/Model_evaluation Model evaluation 2013-06-03T09:26:59Z <p>Admin</p> <hr /> <div>== Introduction ==<br /> <br /> Defining the expression &quot;model evaluation&quot; is a harder task than it may appear at first glance. Intuitively, it would seem to suggest evaluating the performance of a model based on the observed data, the same data that was used to build the model. 
Fair enough, but what then do we mean by the &quot;performance&quot; of a model?<br /> <br /> Do we mean the ability of the model to characterize and explain the phenomena being studied, in which case the goal is to use the model to understand the phenomena? Or do we mean the model's predictive performance when the model is used to predict the phenomena's behavior, either in the future or under new experimental conditions?<br /> <br /> What it comes down to is this: do we want to use the model to understand or to predict? This is the key question to ask before even starting to think about what tools to use and tasks to execute.<br /> <br /> Here, we will be for the most part focused on the ability of a model to explain the phenomena and data. Therefore, the first goal will be to check whether the data are in agreement with the model, and vice versa. In this process, model diagnostics can be used to eliminate model candidates that do not seem capable of reproducing the observed data.<br /> As is usually the case in statistics, the fact that a model has not been rejected does not mean that it is the &quot;true&quot; one. All that we can say is that the experimental data does not allow us to reject this model. It is merely one of perhaps many models that cannot be rejected. Indeed, we can usually find several models that get past this first diagnostic step and are therefore not rejected.<br /> <br /> What to do, then, when several possible models are retained? Well, we can try to select the &quot;best&quot; one (or best ones if no single model stands out from the rest). This means developing a model selection process which allows us to compare the models to each other. Compare models? But with what criteria?<br /> <br /> In a purely explanatory context, [http://en.wikipedia.org/wiki/Occam%27s_razor Occam's razor] is a useful parsimony principle which states that among competing hypotheses, the one with the fewest assumptions should be selected. 
In a modeling context, this means that among valid competing models, the simplest one should be selected.<br /> <br /> Model diagnostic tools are for the most part graphical or visual: we &quot;see&quot; when something is not right between a chosen model and the data it is hypothesized to describe. Model selection tools, on the other hand, are analytical: we calculate the value of some criterion that allows us to compare the models with each other. However, it is absolutely critical to keep in mind the limits of these tools. These are not decision-making tools. It is not a $p$-value or an information criterion that can automatically decide which model to choose. It is always the modeler who must have the last word! This person uses the model diagnostic and selection tools in order to guide their decision, but in the end, it is they who must make the final decision. There is nothing more dangerous than rules applied without thinking and arbitrary cut-off values used blindly without reflection.<br /> <br /> Here, we will not look at model selection techniques based on the various models' predictive performances. One such approach consists of splitting the data into three sets: a ''learning set'' is used for fitting the model, a ''validation set'' for choosing between models and a ''test set'' to assess the quality of the predictions made by the chosen model.<br /> <br /> Very few model diagnostic and selection tools exist for mixed-effects models. One of the most complete is [http://xpose.sourceforge.net Xpose], an R-based model-building aid for population analysis using [http://www.iconplc.com/technology/products/nonmem/ NONMEM] that facilitates model diagnostics, candidate covariate identification and model comparison. 
Here we will use $\monolix$ and illustrate these techniques with the example used previously for model exploration and parameter estimation.<br /> <br /> &lt;br&gt;<br /> <br /> == Model diagnostics ==<br /> <br /> &lt;br&gt;<br /> === Model diagnostics and statistical tests ===<br /> <br /> Suppose first that a model has been entirely defined by the modeler, and that its parameters have either been chosen or estimated.<br /> What we call the &quot;model&quot; is therefore a joint probability distribution along with some parameter values.<br /> <br /> <br /> Let ${\cal M}_0$ denote the model we wish to evaluate. We adopt the framework of statistical testing, and would like to perform the following hypothesis test:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; H_0: \quad {\cal M}={\cal M}_0 \quad vs \quad H_1: \quad {\cal M}\neq{\cal M}_0. &lt;/math&gt;}}<br /> <br /> &quot;Passing&quot; the test does not mean that we accept $H_0$ but rather that we do not reject it. We will use the same point of view for model diagnostics whereby we eliminate model candidates that do not seem capable of reproducing the observed data, i.e., models for which we conclude that ${\cal M}\neq{\cal M}_0$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=Running a statistical test only makes sense when we have doubts about whether to retain a model.<br /> If it is clear from the beginning that, for example, the structural model is totally misspecified (e.g., we use a linear function of time even though a curvature is clearly visible in the data), any basic goodness-of-fit plot (individual fits, observation vs prediction, residuals, etc.) 
will detect this misspecification without any doubt and without the need to evaluate the probability of making a mistake.<br /> }}<br /> <br /> <br /> To put into practice such a statistical test, we usually construct a test statistic $T(\by)$ which is a function of the observations and for which we are able to calculate a distribution under the null hypothesis $H_0$.<br /> <br /> For a given significance level $\alpha$, we then define a rejection region $R_\alpha$ such that:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\probs{H_0}{T(\by) \in R_\alpha} = \alpha . &lt;/math&gt; }}<br /> <br /> Thus, $\alpha$ is the probability of incorrectly rejecting the null hypothesis $H_0$.<br /> <br /> The difficulty in creating and using such tests comes from two main things:<br /> <br /> <br /> &lt;ul&gt;<br /> * We need to be capable of calculating the distribution of the test statistic $T(\by)$ under $H_0$ in order to carefully track the significance level, i.e., ensure that the probability of incorrectly rejecting $H_0$ is indeed $\alpha$.<br /> &lt;br&gt;<br /> <br /> * Being able to control the type I error $\alpha$ is of no interest if the test has low power, i.e., if the probability of correctly rejecting $H_0$ is low.<br /> &lt;/ul&gt; <br /> <br /> <br /> In the present context, the first point is clearly a problem. Due to the complexity of the models we are interested in, it is impossible to analytically calculate the distribution of a function of the observations, even for something as simple as the empirical mean $\overline{\by}=\sum_{i,j}y_{ij}/\sum_i{n_i}$.<br /> Using limit theorems to approximate such distributions is also more or less hopeless. 
Perhaps the most powerful, general and precise solution we have available to us is [http://en.wikipedia.org/wiki/Monte_Carlo_simulation Monte Carlo simulation]:<br /> <br /> <br /> &lt;ul&gt;<br /> * Generate independent values $\by^{(1)}, \by^{(2)}, \ldots , \by^{(L)}$ under the model ${\cal M}_0$ using the same design and covariates as in the original data. <br /> &lt;br&gt;<br /> <br /> * Calculate the $L$ statistics $T(\by^{(1)}), T(\by^{(2)}), \ldots , T(\by^{(L)})$.<br /> &lt;br&gt;<br /> <br /> * Estimate the distribution of $T(\by)$ under ${\cal M}_0$ with the empirical distribution of the $T(\by^{(\ell)})$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The estimation error essentially depends on the number $L$ of simulated data sets. We must therefore choose $L$ large enough so that this error is negligible.<br /> <br /> This solves part of the problem. But it remains hard to define a rejection region $R_\alpha$ if the test statistic is multidimensional. We can of course calculate relevant prediction intervals for each component of $T(\by)$ individually, but this does not really help us to define a true multidimensional rejection region.<br /> <br /> Consequently, diagnostic methods are essentially visual: we compare the observed statistic $T(\by)$ with the expected distribution under ${\cal M}_0$ by graphically displaying (for example) 90% or 95% prediction intervals for each component of $T(\by)$.<br /> <br /> The second point is also delicate because we need to decide what ${\cal M}\neq{\cal M}_0$ means, i.e., what it means for $H_0$ to be false.<br /> Does it mean that the structural model is misspecified? Or the distribution of the random effects, the residual error model, the covariate model? There are so many ways in which a model can be misspecified that we cannot realistically expect to be able to create one unique statistic sufficiently powerful to detect all of these at once. 
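The three Monte Carlo steps described above can be sketched in a few lines. The following Python fragment is an illustrative stand-in only: the &quot;model&quot; simulated under ${\cal M}_0$ is an i.i.d. lognormal sample rather than a mixed-effects model, and the statistic $T$ is simply the empirical mean.

```python
# Illustrative sketch of the Monte Carlo procedure above (plain Python).
# The "model" M0 (i.i.d. lognormal observations) and the statistic T
# (the empirical mean) are placeholders, not Monolix output.
import random
import statistics

random.seed(1234)

def simulate_under_M0(n=100):
    # Stand-in for "generate y^(l) under M0 with the original design"
    return [random.lognormvariate(0.0, 1.0) for _ in range(n)]

def T(y):
    # The test statistic; here simply the empirical mean
    return statistics.mean(y)

# Steps 1 and 2: simulate L data sets and compute the L statistics
L = 2000
stats = sorted(T(simulate_under_M0()) for _ in range(L))

# Step 3: the empirical distribution of T(y) under M0, summarized here
# by a 90% prediction interval
lower, upper = stats[int(0.05 * L)], stats[int(0.95 * L) - 1]

# A model becomes suspect when the observed statistic falls outside
t_obs = T(simulate_under_M0())  # pretend these are the real observations
outside = not (lower <= t_obs <= upper)
```

Since the &quot;observed&quot; statistic here is itself drawn under ${\cal M}_0$, it should fall inside the interval about 90% of the time; in a real application $T(\by)$ is multidimensional and no single choice of $T$ is powerful against every kind of misspecification.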
We therefore prefer to construct several different test statistics, i.e., several graphical diagnostic tools, each good at dealing with one particular type of misspecification. It is then the combination of all these tools that will make up our test; we can reasonably hope that a misspecified model will not succeed in passing through this filter.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Diagnostic plots using individual parameters===<br /> <br /> Diagnostic plots constructed using only the observations are useful for looking at the distribution $\qy$ of the observations,<br /> but do not help with testing hypotheses made about the unobserved individual parameters (their distribution, the covariate model, etc.).<br /> <br /> One possible solution is to estimate the individual parameters (using for example the conditional mode) and then use these estimates to create new diagnostic tools. This strategy is only useful when the individual parameters have been estimated well.<br /> <br /> If instead the data does not contain enough information to estimate certain individual parameters well, the individual estimates are all shrunk towards the same (population) value; this is the mode (resp. mean) of the population distribution of the parameter if we use the conditional mode (resp. conditional mean). For a parameter $\psi_i$ which is a function of a random effect $\eta_i$, we can quantify this phenomenon by defining the so-called $\eta$-shrinkage as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \eta\text{-shrinkage} = 1 - \displaystyle{\frac{\var{\hat{\eta}_i} }{\var{\eta_i} } }, &lt;/math&gt; }}<br /> <br /> where $\hat{\eta}_i$ is an estimate of $\eta_i$ (the conditional mode, conditional mean, etc.).<br /> <br /> This shrinkage phenomenon is simple to understand because the conditional distribution $\qcetaiyi$ of $\eta_i$ is defined by the product $\pmacro(y_i|\eta_i)\pmacro(\eta_i)$. 
Saying that the observations $y_i$ provide little information about $\eta_i$ means that the conditional distribution of $y_i$ plays a reduced role in the construction of $\qcetaiyi$. The mode (resp. mean) of $\qcetaiyi$ will therefore be close to 0, which is both the mode and mean of $\qetai$. The result is a high level of shrinkage (close to 1) whenever $\var{\hat{\eta}_i}\ll\var{\eta_i}$.<br /> <br /> Estimates of the $\psi_i$ are therefore biased because they do not correctly reflect the marginal distribution $\qpsii$ (in particular, their variance is much smaller). A particularly effective solution is to simulate the individual parameters $\psi_i$ from the conditional distribution $\qcpsiiyi$ rather than taking its mode. The resulting estimator is unbiased in the following sense:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\psi_i) &amp;=&amp; \displaystyle{ \int \pmacro(\psi_i {{!}} y_i )\pmacro( y_i ) \, dy_i }\\<br /> &amp;=&amp; \esps{y_i}{\pmacro(\psi_i {{!}} y_i )} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This relationship is a fundamental one when considering inverse problems, incomplete data models, mixed-effects models, etc. So what does it imply exactly? Well, if we randomly draw a vector $y_i$ of observations for an individual in a population and then generate a vector $\psi_i$ using the conditional distribution $\qcpsiiyi$, then the distribution of $\psi_i$ is the population distribution $\qpsii$. 
In other words, even if each $\psi_i$ is simulated using its own conditional distribution, the fact of pooling them allows us to look at them as if they were a sample from $\qpsii$, i.e., the marginal distribution $\qpsii$ is a mixture of conditional distributions $\qcpsiiyi$.<br /> <br /> The procedure is therefore as follows: we generate several values from each conditional distribution $\qcpsiiyi$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]], and use them in addition to the observations in order to build various diagnostic plots.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === An example ===<br /> <br /> We are going to use the same model used for model exploration in the [[Visualization]] section and parameter estimation in the [[Estimation#Maximum likelihood estimation of the population parameters | Maximum likelihood estimation of the population parameters]] chapter.<br /> <br /> <br /> The structural model that defines the concentration in the central compartment and the hazard function for the events (hemorrhaging) is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> Cc(t) &amp;=&amp; \displaystyle{ \frac{D \, ka}{V(ka-Cl/V)} }\left(e^{-(Cl/V)\,t} - e^{-ka\,t} \right) \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The statistical model assumes that $ka_i$ and $V_i$ are log-normally distributed, $Cl_i$ normal, $h0_i$ probit-normal and $\gamma$ logit-normal. No covariates are used in this model. Lastly, we suppose a constant residual error model. 
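The structural model above is straightforward to transcribe for a quick numerical check. Here is a small Python sketch (plain Python, not $\mlxtran$; the dose and parameter values are illustrative only, not estimates from the data):

```python
# Direct transcription of the structural model above (single oral dose D at
# t=0). Parameter values are illustrative only, not estimates from the data.
import math

def Cc(t, D=50.0, ka=1.0, V=8.0, Cl=2.0):
    # Concentration in the central compartment (assumes ka != Cl/V)
    ke = Cl / V
    return (D * ka) / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def h(t, h0=0.01, gamma=0.5, **pk):
    # Hazard of a hemorrhaging event, driven by the concentration
    return h0 * math.exp(gamma * Cc(t, **pk))

concentrations = [Cc(t) for t in (0.0, 1.0, 2.0, 4.0, 8.0)]
```

Note that the closed-form expression for $Cc(t)$ is only valid when $ka \neq Cl/V$, and that at $t=0$ the concentration is zero, so the hazard reduces to the baseline $h_0$.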
Now we are going to review several different diagnostic plots and look at the conclusions that can be made using them.<br /> <br /> <br /> 1) &lt;u&gt;Individual fits.&lt;/u&gt;<br /> <br /> In the continuous data model $y_{ij}=f(t_{ij};\psi_i) + g(t_{ij};\psi_i)\teps_{ij}$, estimation of the population parameters $\psi_{\rm pop}$ and individual parameters $\psi_{i}$ allows us to compute for each individual:<br /> <br /> &lt;ul&gt;<br /> * $f(t ; \hat{\psi}_{\rm pop})$, the predicted profile given by the estimated population model<br /> &lt;br&gt;<br /> <br /> * $f(t ; \hat{\psi}_{i})$, the predicted profile given by the estimated individual model.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWithImage<br /> |text=The figure plotting for each individual the two curves for the predicted concentration shows evidence of inter-individual variability in the kinetics, and furthermore does not allow us to reject the proposed PK model since the fits seem acceptable.<br /> |image=indfits1.png<br /> }}<br /> <br /> <br /> 2) &lt;u&gt;Observations vs predictions.&lt;/u&gt;<br /> <br /> The population and individual models also allow us to calculate for each individual predictions at the observation times<br /> $f(t_{ij} ; \hat{\psi}_{\rm pop})$ and $f(t_{ij} ; \hat{\psi}_{i})$.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= The figure showing observations vs predictions reveals no obvious misspecification. In particular it does not allow us to reject the constant residual error model.<br /> |image=obspred1.png<br /> }}<br /> <br /> <br /> 3) &lt;u&gt;Residuals.&lt;/u&gt;<br /> <br /> Several types of residuals can be defined: ''population weighted residuals'' $({\rm PWRES}_{ij})$, ''individual weighted residuals'' $({\rm IWRES}_{ij})$, ''normalised prediction distribution errors'' $({\rm NPDE}_{ij})$, etc. 
The first two are defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\rm PWRES}_{ij} &amp;=&amp; \frac{y_{ij} - \hat{\mathbb{E} }(y_{ij})}{\hat{\rm std}(y_{ij})} \\ {\rm IWRES}_{ij} &amp;=&amp; \frac{y_{ij} - f(t_{ij} ; \hat{\psi}_{i})}{g(t_{ij} ; \hat{\psi}_{i})},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $\hat{\mathbb{E}}(y_{ij})$ and $\hat{\rm std}(y_{ij})$ are the mean and standard deviation of $y_{ij}$ estimated by Monte Carlo.<br /> ${\rm NPDE}_{ij}$ is a nonparametric version of ${\rm PWRES}_{ij}$, based on a rank statistic. See [http://www.npde.biostat.fr npde] for more details.<br /> <br /> Statistics useful for summarizing the residuals are the 10%, 50% (median) and 90% quantiles. The procedure described earlier for estimating prediction intervals of these quantiles using Monte Carlo can again be used.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= Both the individual residuals and the NPDEs suggest that the model is misspecified. Indeed, under ${\cal M}_0$ residuals are expected to behave as i.i.d. ${\cal N}(0,1)$ random variables, which is clearly not the case here. It is nevertheless difficult to identify the reasons for this misspecification using only these figures.<br /> |image=residual1.png<br /> }}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= These plots show at each observation time the 10%, 50% and 90% quantiles of the IWRES and NPDE.<br /> The 90% prediction intervals are also displayed. These plots are more informative than the original residual plots.<br /> We can now reasonably conclude that the behavior of the three quantiles is not the one expected under ${\cal M}_0$. 
In particular, a proportional component in the residual error model appears not to have been taken into account.<br /> |image=residual2.png<br /> }}<br /> <br /> <br /> 4) &lt;u&gt;The distributions of the individual parameters.&lt;/u&gt;<br /> <br /> The hypotheses we have made about the distributions of the individual parameters can be tested by visually comparing the pdf of the pre-selected distribution of each parameter with the empirical distribution of that parameter. We are going to see that using the estimated individual parameters does not allow us to construct a pertinent diagnostic plot, and that we must rather use parameters simulated with the conditional distribution $\qcpsiiyi$ for each individual.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= The plots show for each model parameter the pdf of the estimated population distribution and the empirical distribution, shown as a histogram, of the estimated individual parameters (here, the estimated parameters are the modes of the conditional distributions).<br /> |image=distparam0.png<br /> }}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= Instead of histograms, $\monolix$ can also display the empirical distribution of a continuous variable using nonparametric density estimation. This is a better way to represent continuous distributions than a histogram.<br /> &lt;br&gt;<br /> It is also possible to display the $\eta$-shrinkage for each parameter. As expected, $\eta$-shrinkage is large for the parameters associated with the time-to-event process. That does not mean that the statistical models for $h_0$ and $\gamma$ are misspecified, but rather that the data does not allow us to correctly recover these individual parameters.<br /> |image=distparam1.png<br /> }}<br /> <br /> {{ExampleWithImage<br /> |text= The simulated individual parameters now allow us to construct a diagnostic tool for the distributions of the individual parameters. 
Only the distribution of the clearance $Cl$ would appear to be rejected. Though the model assumed a normal distribution, the simulated parameters seem to suggest an asymmetric distribution. This diagnostic plot leads us to think about testing the hypothesis of a log-normal distribution for $Cl$.<br /> |image=distparam2.png<br /> }}<br /> <br /> <br /> 5) &lt;u&gt;Covariate model.&lt;/u&gt;<br /> <br /> The model assumes that for a given individual parameter $\psi_i$, there exists a function $h$ such that $h(\psi_i) = h(\psi_{\rm pop}) + \eta_i$, where the random effects are i.i.d. Gaussian variables. We can then graphically display the random effects simulated with the conditional distributions as a function of the various covariates in order to see whether this hypothesis is valid.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= These plots clearly show that the simulated random effects for $V$ and $Cl$ are correlated with weight and have different distributions depending on gender. The assumption that volume $V$ and clearance $Cl$ are independent of weight should be rejected. The statistical model also needs to take into account the fact that both predicted volume and clearance increase with weight.<br /> |image=Eval_covariate2.png <br /> }}<br /> <br /> <br /> 6) &lt;u&gt;The correlation model.&lt;/u&gt;<br /> <br /> The model assumes that for a given individual, the random effects associated with each individual parameter are independent.<br /> We can plot each pair of random effects simulated with the conditional distributions against each other to see if this hypothesis is valid.<br /> <br /> <br /> {{ExampleWithImage<br /> |text=The various point clouds show no correlation between random effects except for $(\eta_V,\eta_{Cl})$ and perhaps $(\eta_{ka},\eta_V)$.<br /> |image=correl1.png<br /> }}<br /> <br /> <br /> 7) &lt;u&gt;Visual predictive checks.&lt;/u&gt;<br /> <br /> A VPC is a diagnostic tool well suited to continuous data. 
It allows us to summarize in the same figure the structural and statistical models. The VPC shown uses the 10%, 50% and 90% quantiles of the observations after grouping them into bins over successive time intervals. Then, prediction intervals for these quantiles under ${\cal M}_0$ are estimated using Monte Carlo.<br /> <br /> <br /> {{ExampleWithImage<br /> |text=To make it easier to interpret VPCs, we represent in red the zones where the observed quantiles are outside the prediction intervals. Here, the structural model seems to be adequate, but the statistical model exhibits some inconsistencies. In particular, the three quantiles obtained using the observations appear much closer together than the model ${\cal M}_0$ would suggest. This adds weight to the suggestion that a proportional component should be added to the error model.<br /> |image=vpc4.png<br /> }}<br /> <br /> <br /> 8) &lt;u&gt;Kaplan-Meier plots.&lt;/u&gt;<br /> <br /> Special diagnostic plots need to be defined for non-continuous observations.<br /> For survival (time-to-event) data, we can use Kaplan-Meier plots (for the first event) as a statistic, and/or the average cumulative number of events per individual (i.e., the mean number of observed events before time $t$). The prediction intervals for these statistics can be estimated by Monte Carlo.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= The shapes of the curves seem correct. 
The model appears to slightly overestimate the survival function after the 15 hr mark, but it is difficult at this point to decide whether this comes from the time-to-event model itself, the statistical model or the model for the concentration.<br /> |image=km2.png<br /> }}<br /> <br /> <br /> {{OutlineText<br /> |text= In summary, this ensemble of diagnostic plots has suggested to us that we should suppose:<br /> <br /> <br /> * a log-normal distribution for $Cl$<br /> <br /> * a combined residual error model, e.g., $y=f+(a+b*f)\teps$<br /> <br /> * a statistical model for $Cl$ and $V$ which incorporates weight as a covariate, assuming for instance a linear relationship between $\log(V)$ and $\log({\rm weight})$ and a linear relationship between $\log(Cl)$ and $\log({\rm weight})$<br /> <br /> *a linear correlation between $\log(Cl)$ and $\log(V)$.<br /> }}<br /> <br /> <br /> This new model can be easily implemented in $\monolix$. Population parameters and individual parameters are then estimated anew and new diagnostic plots drawn. 
A few of these are displayed below and clearly show that the new model is better than the previous one, and can likely be retained as a candidate model.<br /> <br /> ::[[File:diagnostic.png]]<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Model selection == <br /> <br /> &lt;br&gt;<br /> === Statistical tools for model selection ===<br /> <br /> <br /> Statistical tools for model selection include information criteria (AIC and BIC) and hypothesis tests such as the Wald test and likelihood ratio test (LRT).<br /> <br /> The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are defined by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> AIC &amp;=&amp; - 2 {\llike}(\theta;\by) + 2 P \\<br /> BIC &amp;=&amp; - 2 {\llike}(\theta;\by) + \log(N) P ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $P$ is the total number of parameters to be estimated and $N$ the number of subjects. The models being compared using AIC or BIC need not be nested, unlike the case when models are being compared using an F-test or likelihood ratio test.<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &amp;#32;<br /> * Surprisingly, the formula for calculating the BIC differs from one software package to another. The reason is that the effective sample size is not clearly defined in the context of mixed-effects models. The question is whether we should use the number of subjects $N$ or the total number of observations $n_{\mathrm{tot} }$ in the penalty term.
The penalty using $n_{\mathrm{tot} }$ is implemented in the R package nlme &lt;!--[http://cran.r-project.org/web/packages/nlme/nlme.pdf nlme]--&gt; and the SPSS procedure MIXED &lt;!--[http://www.spss.ch/upload/1126184451_Linear%20Mixed%20Effects%20Modeling%20in%20SPSS.pdf MIXED]--&gt;, while $N$ is used in $\monolix$, in the R package saemix &lt;!--[http://cran.r-project.org/web/packages/saemix/saemix.pdf saemix]--&gt; and in SAS proc NLMIXED&lt;!--[http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#nlmixed_toc.htm NLMIXED]--&gt;.<br /> <br /> : An appropriate decomposition of the complete log-likelihood combined with the Laplace approximation can be used to derive the asymptotic BIC approximation. This leads to an optimal BIC penalty based on two terms proportional to $\log N$ and $\log n_{\mathrm{tot} }$ that adapts to the mixed-effects structure of the model. This new approach is not yet implemented in any software.<br /> <br /> <br /> * AIC and BIC are justified based on asymptotic criteria (the AIC heuristic uses Wilks' theorem and BIC uses Bayesian statistics), i.e., when the number of individuals increases and the model dimension stays fixed. In the alternative, non-asymptotic approach, the model size can increase freely. The form of the penalty can differ from one model to the next in this framework. It can be shown for example that for certain Gaussian models, the penalty term has the form $c_1P + c_2P\log(N/P)$. The problem then becomes calibrating the coefficients $c_1$ and $c_2$ in order to obtain an optimal penalty, which is not necessarily a simple task, making it harder to use this approach in real applications.<br /> }}<br /> <br /> <br /> The observed log-likelihood ${\llike}(\theta;\by) = \log(\py(\by;\theta))$ cannot be computed in closed form for nonlinear mixed-effects models.
It can be estimated by Monte Carlo using the importance sampling algorithm described in the [[Estimation of the log-likelihood]] section.<br /> <br /> When comparing two nested models ${\cal M}_0$ and ${\cal M}_1$ with dimensions $P_0$ and $P_1$ (with $P_1&gt;P_0$), the ''likelihood ratio test'' (LRT) uses the test statistic<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> LRT = 2 ( {\llike}(\hthetag_1;\by) - {\llike}(\hthetag_0;\by) ) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\hthetag_0$ and $\hthetag_1$ are the MLEs of $\theta$ under ${\cal M}_0$ and ${\cal M}_1$.<br /> <br /> Depending on the hypotheses, the limit distribution of $LRT$ is either a $\chi ^2$ distribution or a mixture of a $\chi^2$ distribution and a Dirac $\delta$ distribution. For example:<br /> <br /> <br /> &lt;ul&gt;<br /> &lt;li&gt; Testing whether some fixed effects are null and assuming the same covariance structure for the random effects implies that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; LRT \limite{N\to \infty}{} \chi^2(P_1-P_0) .<br /> &lt;/math&gt; }}<br /> &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; Testing whether some correlations in the covariance matrix $\IIV$ are null and assuming the same covariate model implies that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> LRT \limite{N\to \infty}{} \chi^2(P_1-P_0) .<br /> &lt;/math&gt; }}<br /> &lt;/li&gt;<br /> <br /> <br /> &lt;li&gt; Testing whether the variance of one of the random effects is zero and assuming the same covariate model implies that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> LRT \limite{N\to \infty}{} \displaystyle{ \frac{1}{2} }\chi^2(1) + \displaystyle{ \frac{1}{2} }\delta_0 .<br /> &lt;/math&gt; }}<br /> &lt;/li&gt;<br /> &lt;/ul&gt;<br /> <br /> <br /> Statistical tests, as is the case for BIC, help us decide whether the difference between two models is statistically significant.<br /> Suppose that we want to test whether a fixed model parameter $\beta$ is null:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> H_0: \quad
\beta=0 \quad \mathrm{vs.} \quad H_1: \quad \beta\neq0 .<br /> &lt;/math&gt; }} <br /> <br /> We construct a test statistic $T$ whose distribution under $H_0$ allows us to calculate a $p$-value, i.e., the probability that $T$ is at least as big as the value observed under $H_0$. A small $p$-value leads us to reject $H_0$ with high confidence. It is usual to use the arbitrary cutoff of 5% to make this decision: we frequently read statements such as &quot;a decrease in the objective function of at least 3.84 was required to identify a significant covariate&quot;.<br /> In the same way we could select models based on their BIC values under $H_0$ and $H_1$ by providing an arbitrary decision rule. It is sometimes suggested to choose $H_1$ if the difference $BIC_{H_1} -BIC_{H_0}$ is less than a certain arbitrary cutoff.<br /> <br /> These approaches seem to simplify the modeler's life because they provide decision rules that can be applied systematically without thinking, and thus justify decisions. But a rule, whatever it is, should never stop us asking why we are applying it and whether it is applicable in the present case. Remember that even a very small difference will be statistically significant if the sample size is large enough. The question is thus not whether a difference is statistically significant, but whether it is physically or biologically significant. We must therefore look carefully at the size of an effect and its real impact, both for understanding the model and for assessing the model's predictive capacities.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === An example ===<br /> <br /> We are going to continue using our joint model for concentration data and hemorrhaging events. The structural model is chosen to be the one used earlier for the diagnostic tests.
We would now like to compare several statistical models:<br /> <br /> &lt;blockquote&gt;<br /> ${\cal M}_1$: all the individual parameters are log-normally distributed, there is no correlation between individual parameters, the residual error model is a combined one ($y=f+(a+b*f)*\teps$), both $\log(V)$ and $\log(Cl)$ are linear functions of $\log({\rm weight})$. <br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> ${\cal M}_2$: model ${\cal M}_1$, assuming that $\log(V_i)$ and $\log(Cl_i)$ are linearly correlated. <br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> ${\cal M}_3$: model ${\cal M}_2$, assuming that $\gamma_i=\gamma_{\rm pop}$ (i.e., $\omega_\gamma = 0$). <br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> ${\cal M}_4$: model ${\cal M}_2$, assuming that $Cl$ is not a function of the weight. <br /> &lt;/blockquote&gt;<br /> <br /> <br /> The results for these four models are as follows:<br /> <br /> <br /> {| class=&quot;wikitable&quot; cellpadding=&quot;20&quot; cellspacing=&quot;20&quot; style=&quot;margin-left:20%; margin-right:20%; width:50%&quot;<br /> ! Model || $-2\times {\llike}$ || BIC <br /> |- <br /> | ${\cal M}_1$ || 1390 || 1451 <br /> |-<br /> | ${\cal M}_2$ || 1370 || 1436 <br /> |-<br /> | ${\cal M}_3$ || 1370 || 1432 <br /> |-<br /> | ${\cal M}_4$ || 1446 || 1477 <br /> |} <br /> <br /> <br /> Clearly models ${\cal M}_2$ and ${\cal M}_3$ are the best in terms of BIC.
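As a sketch of how such a comparison can be reproduced, the snippet below computes BIC with the subject-based penalty $\log(N)\,P$ and the LRT $p$-value for the single extra correlation parameter of ${\cal M}_2$ versus ${\cal M}_1$. Only the $-2\log {\llike}$ values are taken from the table above; the parameter counts `P` and the number of subjects `N` are hypothetical placeholders.

```python
import math

def bic(minus_two_loglik, n_params, n_subjects):
    """BIC with the subject-based penalty: -2*loglik + log(N) * P."""
    return minus_two_loglik + math.log(n_subjects) * n_params

# -2*log-likelihood values from the table; P and N are hypothetical
minus_two_ll = {"M1": 1390.0, "M2": 1370.0}
P = {"M1": 12, "M2": 13}  # hypothetical counts (M2 = M1 + 1 correlation)
N = 100                    # hypothetical number of subjects

scores = {m: bic(minus_two_ll[m], P[m], N) for m in minus_two_ll}
best = min(scores, key=scores.get)

# LRT for M1 (null) vs M2: the drop in -2*log-likelihood,
# asymptotically chi^2 with 1 degree of freedom here
lrt = minus_two_ll["M1"] - minus_two_ll["M2"]
p_value = math.erfc(math.sqrt(lrt / 2.0))  # survival function of chi^2(1)
```

Here the drop of 20 in $-2\log {\llike}$ is highly significant for a one-parameter difference, in line with the BIC ranking above.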
We can therefore envisage selecting a model that both includes a correlation between $\log(V)$ and $\log(Cl)$ and supposes that the volume $V$ is a function of the weight.<br /> <br /> Rather than a purely statistical criterion (LRT or BIC), it is above all the estimated value of the standard deviation of $\gamma$ ($\hat{\omega}_\gamma = 0.002$) under the model ${\cal M}_2$ that would lead us to conclude that the inter-individual variability of $\gamma$ is negligible.<br /> <br /> It is also important to try to evaluate the impact that a bad decision could have. Retaining $\hat{\omega}_\gamma = 0.002$ would have little impact on predictions because $\gamma_i$ is log-normally distributed, representing a variability of around 0.2%. Models ${\cal M}_2$ and ${\cal M}_3$ are thus practically identical and there is no particular advantage in selecting one over the other.<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{akaike1983information,<br /> title={Information measures and model selection},<br /> author={Akaike, H.},<br /> journal={Bulletin of the International Statistical Institute},<br /> volume={50},<br /> number={1},<br /> pages={277-291},<br /> year={1983}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{arlot2010survey,<br /> title={A survey of cross-validation procedures for model selection},<br /> author={Arlot, S. and Celisse, A.},<br /> journal={Statistics Surveys},<br /> volume={4},<br /> pages={40-79},<br /> year={2010},<br /> publisher={The author, under a Creative Commons Attribution License}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{berger1996intrinsic,<br /> title={The intrinsic Bayes factor for model selection and prediction},<br /> author={Berger, J.O.
and Pericchi, L.R.},<br /> journal={Journal of the American Statistical Association},<br /> volume={91},<br /> number={433},<br /> pages={109-122},<br /> year={1996},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bergstrand2011prediction,<br /> title={Prediction-corrected visual predictive checks for diagnosing nonlinear mixed-effects models},<br /> author={Bergstrand, M. and Hooker, A.C. and Wallin, J.E. and Karlsson, M.O.},<br /> journal={The AAPS journal},<br /> volume={13},<br /> number={2},<br /> pages={143-151},<br /> year={2011},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{burnham2002model,<br /> title={Model selection and multi-model inference: a practical information-theoretic approach},<br /> author={Burnham, K.P. and Anderson, D.R.},<br /> year={2002},<br /> publisher={Springer Verlag}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{comets2010model,<br /> title={Model evaluation in nonlinear mixed effect models, with applications to pharmacokinetics},<br /> author={Comets,E. and Brendel,K.},<br /> journal={Journal de la Soci&amp;eacute;t&amp;eacute; Fran&amp;ccedil;aise de Statistique},<br /> volume={151},<br /> number={1},<br /> pages={106-128},<br /> year={2010}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{comets2008computing,<br /> title={Computing normalised prediction distribution errors to evaluate nonlinear mixed-effect models: the npde add-on package for R},<br /> author={Comets, E. and Brendel, K. and Mentr&amp;eacute;, F.},<br /> journal={Computer methods and programs in biomedicine},<br /> volume={90},<br /> number={2},<br /> pages={154-166},<br /> year={2008},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hausman1984specification,<br /> title={Specification tests for the multinomial logit model},<br /> author={Hausman, J. 
and McFadden, D.},<br /> journal={Econometrica: Journal of the Econometric Society},<br /> pages={1219-1240},<br /> year={1984},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @ARTICLE{Hooker2009Xpose,<br /> author = {Hooker, A. and Karlsson, M.O. and Jonsson, E.N.},<br /> title = {Model diagnostic using XPOSE},<br /> url = {http://xpose.sourceforge.net/generic\_chm/xpose.VPC.html},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @INPROCEEDINGS{KarlssonHolford2008Tutorial,<br /> author = {Karlsson, M.O. and Holford, N.},<br /> title = {A Tutorial on Visual Predictive Checks},<br /> booktitle = {PAGE 2008},<br /> year = {2008},<br /> owner = {kb},<br /> timestamp = {2011.04.18},<br /> url = {http://www.page-meeting.org/pdf\_assets/8694-Karlsson_Holford_VPC_Tutorial_hires.pdf}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lavielle2011automatic,<br /> title={Automatic data binning for improved visual diagnosis of pharmacometric models},<br /> author={Lavielle, M. and Bleakley, K.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={38},<br /> number={6},<br /> pages={861-871},<br /> year={2011},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @misc{massart2007concentration,<br /> title={Concentration Inequalities and Model Selection, Ecole d’Et&amp;eacute; de Probabilit&amp;eacute;s de Saint-Flour XXXIII-2003 Lecture Notes in Mathematics 1896},<br /> author={Massart, P.},<br /> year={2007},<br /> publisher={Springer-Verlag, Berlin}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{neyman1992problem,<br /> title={On the problem of the most efficient tests of statistical hypotheses},<br /> author={Neyman, J. 
and Pearson, E.S.},<br /> year={1992},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{schwarz1978estimating,<br /> title={Estimating the dimension of a model},<br /> author={Schwarz, G.},<br /> journal={The Annals of Statistics},<br /> volume={6},<br /> number={2},<br /> pages={461-464},<br /> year={1978},<br /> publisher={Institute of Mathematical Statistics}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{shimodaira1999multiple,<br /> title={Multiple comparisons of log-likelihoods with applications to phylogenetic inference},<br /> author={Shimodaira, H. and Hasegawa, M.},<br /> journal={Molecular Biology and Evolution},<br /> volume={16},<br /> pages={1114-1116},<br /> year={1999},<br /> publisher={University of Chicago Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{vaida2005conditional,<br /> title={Conditional Akaike information for mixed-effects models},<br /> author={Vaida, F. and Blanchard, S.},<br /> journal={Biometrika},<br /> volume={92},<br /> number={2},<br /> pages={351-370},<br /> year={2005},<br /> publisher={Biometrika Trust}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wald1943tests,<br /> title={Tests of statistical hypotheses concerning several parameters when the number of observations is large},<br /> author={Wald, A.},<br /> journal={Transactions of the American Mathematical Society},<br /> volume={54},<br /> number={3},<br /> pages={426-482},<br /> year={1943},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wilks1938large,<br /> title={The large-sample distribution of the likelihood ratio for testing composite hypotheses},<br /> author={Wilks, S.S.},<br /> journal={The Annals of Mathematical Statistics},<br /> volume={9},<br /> number={1},<br /> pages={60-62},<br /> year={1938},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zhao2006model,<br />
title={On model selection consistency of Lasso},<br /> author={Zhao, P. and Yu, B.},<br /> journal={The Journal of Machine Learning Research},<br /> volume={7},<br /> pages={2541-2563},<br /> year={2006},<br /> publisher={JMLR.org}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Visualization<br /> |linkNext=Simulation }}</div> Admin https://wiki.inria.fr/popix/Modeling Modeling 2013-05-30T08:04:35Z <p>Admin: /* Bayesian estimation of the population parameters */</p> <hr /> <div>== Introduction ==<br /> <br /> In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br /> <br /> Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br /> <br /> The estimation tasks are classical ones in statistics:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; Estimate the population parameter $\theta$ using the available observations and any a priori information that may be available.&lt;/li&gt;<br /> <br /> &lt;li&gt;Evaluate the precision of the proposed estimates.&lt;/li&gt;<br /> <br /> &lt;li&gt;Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$.
&lt;/li&gt;<br /> <br /> &lt;li&gt;Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Maximum likelihood estimation of the population parameters == <br /> <br /> &lt;br&gt;<br /> === Definitions ===<br /> <br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\theta ; \by) &amp;\eqdef&amp; \py(\by ; \theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> &lt;blockquote&gt;<br /> * A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the [http://en.wikipedia.org/wiki/GUI GUI] for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. &lt;br&gt;&lt;br&gt;<br /> * Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables are typically stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). &lt;br&gt;&lt;br&gt;<br /> * An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations.
We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$, since we are entirely satisfied with both its theoretical and practical qualities: &lt;br&gt;&lt;br&gt;<br /> ** The algorithms implemented in $\monolix$ including SAEM and its extensions ([[Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]], [[Stochastic differential equations based models|SDE-based models]], [http://en.wikipedia.org/wiki/Censored_data censored data], etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied its theoretical and practical properties in detail.<br /> &lt;/blockquote&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br /> <br /> To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br /> <br /> <br /> * First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, but $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$ implies that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$ since $h$ is nonlinear.
In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $t\in \mathbb{R}$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> f(t) = h^\prime(t)f_h(h(t)) . &lt;/math&gt; }}<br /> <br /> : Thus,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br /> &lt;/math&gt; }}<br /> <br /> : By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Now we show that it is the median. Since $h$ is a strictly increasing function,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &amp;=&amp; \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br /> &amp;=&amp; 0.5 .<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> : In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Example ===<br /> <br /> Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section.
For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> ke&amp;=&amp;Cl/V \\<br /> Cc(t) &amp;=&amp; \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Assuming a constant error model for the concentration, the model for the observations can easily be implemented using $\mlxtran$.<br /> <br /> <br /> {{MLXTran<br /> |name=joint1est_model.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {ka, V, Cl, h0, gamma}<br /> <br /> EQUATION:<br /> ke=Cl/V<br /> Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br /> h = h0*exp(gamma*Cc)<br /> <br /> OBSERVATION:<br /> Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br /> Hemorrhaging = {type=event, hazard=h}<br /> <br /> OUTPUT:<br /> output = {Concentration, Hemorrhaging}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br /> <br /> The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.
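As a cross-check, the same structural model can be sketched directly in Python; the parameter values used in the usage lines below are illustrative placeholders, not estimates from this study.

```python
import math

def concentration(t, D, ka, V, Cl):
    """Cc(t) for a one-compartment model with first-order absorption
    and a single dose D at t = 0; requires ka != ke = Cl/V."""
    ke = Cl / V
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    """h(t) = h0 * exp(gamma * Cc(t)): the hazard of hemorrhaging
    increases exponentially with the concentration."""
    return h0 * math.exp(gamma * concentration(t, D, ka, V, Cl))

# Illustrative values: at t = 0 the concentration is 0, so h(0) = h0
cc = concentration(2.0, D=100.0, ka=1.0, V=10.0, Cl=2.0)
h = hazard(0.0, D=100.0, ka=1.0, V=10.0, Cl=2.0, h0=0.01, gamma=0.5)
```

Note the singularity at $ka = ke$: the expression above is only valid when the absorption and elimination rate constants differ, exactly as in the $\mlxtran$ code.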
The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXTranForTable<br /> |name=<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INDIVIDUAL:<br /> ka = {distribution=logNormal, iiv=yes}<br /> V = {distribution=logNormal, iiv=yes}<br /> Cl = {distribution=normal, iiv=yes}<br /> h0 = {distribution=probitNormal, iiv=yes}<br /> gamma = {distribution=logitNormal, iiv=yes}<br /> &lt;/pre&gt; }}<br /> |image=<br /> [[File:Vsaem1.png]]<br /> }}<br /> <br /> <br /> Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br /> it is possible to perform a preliminary sensitivity analysis in order to select &quot;good&quot; initial values.<br /> <br /> <br /> {{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br /> <br /> <br /> <br /> Then, when we run SAEM, it converges easily and quickly to the MLE:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter<br /> ka : 0.974<br /> V : 7.07<br /> Cl : 2.00<br /> h0 : 0.0102<br /> gamma : 0.485<br /> <br /> omega_ka : 0.668<br /> omega_V : 0.365<br /> omega_Cl : 0.588<br /> omega_h0 : 0.105<br /> omega_gamma : 0.0901<br /> <br /> a_1 : 0.345<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Parameter estimation can therefore be seen as estimating the reference values and the variances of the random effects.<br /> <br /> In addition to these numbers, it is important to be able to graphically represent these distributions in order to see them and therefore understand them better.
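For example, since the estimated distribution of $ka$ is log-normal, its quantiles follow directly from the normal quantiles by applying $\exp$. A short sketch using the estimates above (median $0.974$, $\omega_{ka}=0.668$):

```python
import math
from statistics import NormalDist

ka_pop, omega_ka = 0.974, 0.668  # estimates from the SAEM output above

def ka_quantile(p):
    """Quantile of order p of the log-normal distribution of ka:
    exp(log(ka_pop) + omega_ka * z_p), with z_p the standard normal quantile."""
    return math.exp(math.log(ka_pop) + omega_ka * NormalDist().inv_cdf(p))

q05, med, q95 = ka_quantile(0.05), ka_quantile(0.5), ka_quantile(0.95)
```

The resulting 90% range (roughly 0.32 to 2.9 here) conveys the dispersion of $ka$ far more directly than the value $\omega_{ka}=0.668$ alone.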
Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents, and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). But for asymmetric distributions such as the log-normal, these measures of central tendency can all differ (see the distribution of $ka$).<br /> <br /> Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text=<br /> For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ if this is fairly small. Indeed, when $\omega_{\psi}$ is small,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; \psi_{\rm pop} e^{\eta} \\<br /> &amp;\approx &amp; \psi_{\rm pop}(1+ \eta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\psi} &amp;\approx&amp; \psi_{\rm pop} \\<br /> \std{\psi} &amp;\approx &amp; \psi_{\rm pop}\omega_{\psi},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\rm cv}(\psi) &amp;=&amp; \frac{\std{\psi} }{\esp{\psi} } \\<br /> &amp;\approx &amp; \omega_{\psi} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution.
Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br /> }}<br /> <br /> <br /> {{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bayesian estimation of the population parameters==<br /> <br /> The [http://en.wikipedia.org/wiki/Bayesian_probability ''Bayesian approach''] considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ) &amp;=&amp; \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br /> &amp;=&amp; \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called [http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation ''Maximum a Posteriori'' (MAP) estimate] of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}^{\rm MAP} &amp;=&amp; \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br /> &amp;=&amp; \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. 
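As a toy numerical illustration of this equivalence, consider i.i.d. normal observations with a normal prior on their mean (an assumed conjugate model chosen for this sketch, not the mixed-effects model of this chapter); here the penalized maximum has a closed form and moves from the MLE towards the prior mean as the prior variance shrinks:

```python
def map_estimate(y, sigma2, theta0, gamma2):
    """Maximizer of loglik(theta; y) - (theta - theta0)^2 / (2 * gamma2)
    for y_i ~ N(theta, sigma2): a precision-weighted average of the
    sample mean (the MLE) and the prior mean theta0."""
    n, ybar = len(y), sum(y) / len(y)
    w_data, w_prior = n / sigma2, 1.0 / gamma2
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)

y = [1.1, 0.9, 1.3, 0.7]                 # sample mean (MLE) is 1.0
loose = map_estimate(y, 1.0, 0.0, 1e6)   # vague prior: MAP close to the MLE
tight = map_estimate(y, 1.0, 0.0, 1e-6)  # tight prior: MAP close to theta0
```

This reproduces, in miniature, the trade-off discussed next: the smaller the prior variance, the closer the MAP estimate is to the prior mean.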
Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. Then, the MAP estimate is given by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br /> &lt;/math&gt; }}<br /> <br /> The MAP estimate is a trade-off between the [http://en.wikipedia.org/wiki/Maximum_likelihood_estimation MLE] which maximizes ${\llike}(\theta ; \by)$ and $\theta_0$ which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP is. In the limiting case $\gamma^2=0$, the prior fixes $\theta$ at $\theta_0$, which then no longer needs to be estimated.<br /> <br /> Both the Bayesian and [http://en.wikipedia.org/wiki/Frequentist_probability frequentist] approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br /> <br /> We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo MCMC methods] allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br /> <br /> All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it.
This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br /> <br /> So of course we can use the information available to us, but it needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br /> <br /> $\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br /> <br /> <br /> &lt;ul&gt;<br /> * a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br /> &lt;br&gt;<br /> <br /> * estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and calculating the MAP estimate.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and then estimating the posterior distribution.<br /> &lt;/ul&gt;<br /> <br /> <br /> In what follows, we put aside the fixed components of $\theta$.
Here are some possible situations:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> (\hat{\theta}_E , \hat{\theta}_{M} ) &amp;=&amp; \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br /> <br /> <br /> &lt;li&gt; ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: &lt;/li&gt;<br /> <br /> <br /> &lt;ol style=&quot;list-style-type:lower-roman&quot;&gt;<br /> &lt;li&gt; Compute the maximum likelihood of $\theta_E$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}_E &amp;=&amp; \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. 
&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK example<br /> |text=<br /> In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . &lt;/math&gt; }}<br /> <br /> $\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br /> <br /> <br /> &lt;div style=&quot;margin-left:15%; margin-right:32%; align:center&quot;&gt;<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:100%&quot;<br /> {{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br /> {{!}}-<br /> {{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br /> {{!}}}&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br /> <br /> <br /> As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. 
Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation of the Fisher information matrix ==<br /> <br /> The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. The standard error (s.e.) of each component of $\thmle$ is its standard deviation, i.e., the square root of the corresponding diagonal element of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (s.a.) r.s.e.(%)<br /> ka : 0.974 0.082 8<br /> V : 7.07 0.35 5<br /> Cl : 2 0.07 4<br /> h0 : 0.0102 0.0014 14<br /> gamma : 0.485 0.015 3<br /> <br /> omega_ka : 0.668 0.064 10<br /> omega_V : 0.365 0.037 10<br /> omega_Cl : 0.588 0.055 9<br /> omega_h0 : 0.105 0.032 30<br /> omega_gamma : 0.0901 0.044 49<br /> <br /> a_1 : 0.345 0.012 3<br /> &lt;/pre&gt; }}<br /> <br /> The F.I.M. can be used for detecting overparametrization of the structural model.
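Before turning to overparametrization, the chain from the observed F.I.M. of (1) to a table of standard errors can be sketched numerically. The Python fragment below uses a toy model with a closed-form MLE (ordinary i.i.d. data, not a mixed-effects model; all values made up) and approximates the Hessian by central finite differences, as a real implementation might:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=200)            # simulated data

def negloglike(theta):
    # -log-likelihood of i.i.d. N(mu, sigma^2) data, parametrized by (mu, log sigma)
    mu, log_sigma = theta
    return 0.5 * np.sum(((y - mu) / np.exp(log_sigma)) ** 2) + len(y) * log_sigma

# MLE, available in closed form for this toy model
theta_hat = np.array([np.mean(y), 0.5 * np.log(np.mean((y - np.mean(y)) ** 2))])

# Observed F.I.M. = Hessian of -log-likelihood at the MLE, by central differences
h, k = 1e-4, theta_hat.size
fim = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
        fim[i, j] = (negloglike(theta_hat + ei + ej) - negloglike(theta_hat + ei - ej)
                     - negloglike(theta_hat - ei + ej) + negloglike(theta_hat - ei - ej)) / (4 * h * h)

cov = np.linalg.inv(fim)                      # estimated covariance of the MLE
se = np.sqrt(np.diag(cov))                    # standard errors
rse = 100 * se / np.abs(theta_hat)            # relative standard errors (%)
print(se, rse)
```

When the model is poorly identifiable, `fim` is nearly singular and this inversion produces huge standard errors, which is exactly the symptom discussed next.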
In effect, if the model is poorly identifiable, certain estimators will be strongly correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large values of the relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow us to estimate these two parameters well.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (lin) r.s.e.(%)<br /> ka : 0.246 0.0081 3<br /> Cl : 1.9 0.075 4<br /> V1 : 1.71 0.14 8<br /> Q : 0.000171 0.024 1.43e+04<br /> V2 : 0.00673 3.1 4.62e+04<br /> <br /> omega_ka : 0.171 0.026 15<br /> omega_Cl : 0.293 0.026 9<br /> omega_V1 : 0.621 0.062 10<br /> omega_Q : 5.72 1.4e+03 2.41e+04<br /> omega_V2 : 4.61 1.8e+04 3.94e+05<br /> <br /> a : 0.136 0.0073 5<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Fisher information is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at summary statistics of the covariance matrix (such as the determinant or the trace).<br /> <br /> &lt;br&gt;<br /> == Estimation of the individual parameters ==<br /> <br /> Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]].
For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br /> <br /> The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br /> <br /> The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the &quot;most likely&quot; values of the individual parameters are the most suited for computing the &quot;most likely&quot; predictions.<br /> <br /> <br /> {{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation of the observed log-likelihood ==<br /> <br /> <br /> Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> {\llike} (\hat{\theta};\by) &amp;=&amp; \log({\like}(\hat{\theta};\by)) \\<br /> &amp;\eqdef&amp; \log(\py(\by;\hat{\theta})) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. 
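To fix ideas on what is being estimated, the quantity can be sketched on a toy model where it is also available in closed form. The Python fragment below (a hypothetical one-individual linear model, not this wiki's PK example; all names and values are made up) approximates the observed likelihood by averaging the conditional density of the data over draws of the individual parameter:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model for ONE individual (all values hypothetical):
#   psi ~ N(psi_pop, omega^2),   y_j | psi ~ N(psi, a^2),   j = 1..n
psi_pop, omega, a = 1.0, 0.5, 0.3
n = 5
psi_true = rng.normal(psi_pop, omega)
y = rng.normal(psi_true, a, size=n)

# Monte Carlo estimate of the observed likelihood p(y) = E_psi[ p(y | psi) ]
M = 500_000
psi_draws = rng.normal(psi_pop, omega, size=M)
cond_dens = np.prod(
    np.exp(-0.5 * ((y[None, :] - psi_draws[:, None]) / a) ** 2) / (a * np.sqrt(2 * np.pi)),
    axis=1,
)
loglike_mc = np.log(np.mean(cond_dens))

# Exact value: y is multivariate normal, mean psi_pop, covariance a^2 I + omega^2 11'
cov = a**2 * np.eye(n) + omega**2 * np.ones((n, n))
resid = y - psi_pop
_, logdet = np.linalg.slogdet(cov)
loglike_exact = -0.5 * (n * np.log(2 * np.pi) + logdet + resid @ np.linalg.solve(cov, resid))
print(loglike_mc, loglike_exact)
```

Plain averaging over the population distribution can be very inefficient for rich designs; that is why practical estimators sample from a distribution closer to the conditional distribution of $\psi$ instead.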
The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Model evaluation]] Section).<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography ==<br /> <br /> &lt;bibtex&gt;<br /> @article{Monolix,<br /> author = {Lixoft},<br /> title = {Monolix 4.2},<br /> year={2012}<br /> journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{comets2011package,<br /> title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br /> author={Comets, E. and Lavenu, A. and Lavielle, M.},<br /> journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{nlmefitsa,<br /> title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br /> author={The MathWorks},<br /> journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{beal1992nonmem,<br /> title={NONMEM users guides},<br /> author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br /> journal={San Francisco, NONMEM Project Group, University of California},<br /> year={1992}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @book{pinheiro2000mixed,<br /> title={Mixed effects models in S and S-PLUS},<br /> author={Pinheiro, J.C. and Bates, D.M.},<br /> year={2000},<br /> publisher={Springer Verlag}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{pinheiro2010r,<br /> title={the R Core team (2009) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br /> author={Pinheiro, J. and Bates, D. and DebRoy, S. 
and Sarkar, D.},<br /> journal={R Foundation for Statistical Computing, Vienna},<br /> year={2010}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{spiegelhalter2003winbugs,<br /> title={WinBUGS user manual},<br /> author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br /> journal={Cambridge: MRC Biostatistics Unit},<br /> year={2003}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @Manual{docSPSS,<br /> title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br /> author = {SPSS},<br /> year = {2002},<br /> note={Technical Report}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @Manual{docSAS,<br /> title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br /> chapter = {61},<br /> pages = {4337--4435},<br /> author = {SAS},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Visualization<br /> |linkNext=Model evaluation }}</div> Admin https://wiki.inria.fr/popix/Visualization Visualization 2013-05-27T15:04:17Z <p>Admin: /* Introduction */</p> <hr /> <div>== Introduction ==<br /> <br /> Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect if there are relationships between variables, visually compare data from different groups, etc.
Development of such visual exploration tools poses no methodological problems. It is simple to write [http://www.mathworks.fr/products/matlab/ Matlab] or [http://www.r-project.org/ R] code for one's own needs. To illustrate the data visualization part of this chapter, we have created a little Matlab toolbox called [[Media:popixplore.pdf| $\popixplore$]] ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br /> <br /> It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br /> <br /> Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], specialized in the analysis of dynamical systems and the numerical solution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br /> <br /> <br /> &lt;ul&gt;<br /> * $\mlxplore$ uses the [http://www.lixoft.com/wp-content/resources/docs/modelMLXTRANtutorial.pdf $\mlxtran$] language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc.
Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br /> &lt;br&gt;<br /> <br /> * $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Data exploration ==<br /> <br /> <br /> The following example involves 80 individuals who receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds. If this happens, we also record the times at which it occurs. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}.
In this example, the columns are:<br /> <br /> <br /> &lt;ul&gt;<br /> '''id''' the ID number of the patient<br /> &lt;br&gt;&lt;br&gt;<br /> '''time''' dose administration and observation times<br /> &lt;br&gt;&lt;br&gt;<br /> '''amt''' the amount of drug administered<br /> &lt;br&gt;&lt;br&gt;<br /> '''y''' the observations (concentrations and events)<br /> &lt;br&gt;&lt;br&gt;<br /> '''ytype''' the type of observation: 1=concentration, 2=event<br /> &lt;br&gt;&lt;br&gt;<br /> '''weight''' a continuous individual covariate<br /> &lt;br&gt;&lt;br&gt;<br /> '''gender''' a categorical individual covariate (F or M)<br /> &lt;br&gt;&lt;br&gt;<br /> '''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br /> <br /> <br /> We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> datafile.name='pkrtte_data.csv';<br /> datafile.format='csv'; % can be &quot;csv&quot;, &quot;space&quot;, &quot;tab&quot; or &quot;;&quot;<br /> <br /> info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br /> info.observation.name={'concentration','hemorrhaging'};<br /> info.observation.type={'continuous','event'};<br /> info.observation.unit={'mg/l',''};<br /> info.covariate.unit={'kg',''};<br /> info.time.unit='h';<br /> <br /> data=readdatapx(datafile,info);<br /> &lt;/pre&gt; }}<br /> <br /> <br /> How we graphically represent data depends on the type of data. Often for continuous data we use &quot;spaghetti plots&quot;, where all of the observations are given on the same plot, and those for each individual are joined up using line segments. 
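For readers who prefer R or Python to the Matlab toolbox used here, such a spaghetti plot takes only a few lines. The Python sketch below simulates long-format data mimicking the {{Verbatim|id}}, {{Verbatim|time}} and {{Verbatim|y}} columns described above (the values and the decay model are made up for illustration) and draws one line per individual:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # non-interactive backend, writes to file only
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Simulated long-format data: one row per concentration measurement
ids = np.repeat(np.arange(1, 11), 8)                        # 10 individuals, 8 times each
times = np.tile(np.array([0.5, 1, 2, 4, 8, 12, 16, 24]), 10)
conc = 5 * np.exp(-0.15 * times) * rng.lognormal(0, 0.2, size=times.size)

# Spaghetti plot: one line per individual, all on the same axes
fig, ax = plt.subplots()
for i in np.unique(ids):
    sel = ids == i
    ax.plot(times[sel], conc[sel], "-o", markersize=3)
ax.set_xlabel("time (h)")
ax.set_ylabel("concentration (mg/l)")
fig.savefig("spaghetti.png")
```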
Time-to-event data are usually represented using [https://en.wikipedia.org/wiki/Kaplan-Meier_survival_curve Kaplan-Meier plots], i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &gt;&gt;exploredatapx(data)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br /> <br /> <br /> When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br /> <br /> <br /> {{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;0&quot;<br /> |style = &quot;width:50%&quot;| [[File:exploredata3a.png]] <br /> |style = &quot;width:50%&quot;| [[File:exploredata3b.png]]<br /> |-<br /> |colspan=&quot;2&quot; align=&quot;center&quot; style=&quot;text-align:center&quot;| ''Distribution of weight and gender per dose group'' <br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The data file {{Verbatim|pkrtte_data.csv}} and the Matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Model exploration==<br /> <br /> ===Exploring the structural model===<br /> <br /> Suppose that we now want to visualize the following joint model, which can be used for simultaneously modeling PK and time-to-event data:<br /> <br /> {{Equation1<br /> 
|equation=&lt;math&gt;\begin{eqnarray}<br /> k&amp;=&amp;Cl/V \\<br /> \deriv{A_d} &amp;=&amp; - k_a \, A_d(t) \\<br /> \deriv{A_c} &amp;=&amp; k_a \, A_d(t) - k \, A_c(t) \\<br /> Cc(t) &amp;=&amp; {A_c(t)}/{V} \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging for instance). The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br /> We assume that the drug can be administered both orally and intravenously, i.e., to either the depot or the central compartment.<br /> <br /> We first need to implement this model using $\mlxtran$:<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> <br /> PK:<br /> depot(type=1,target=Ad)<br /> depot(type=2,target=Ac)<br /> <br /> EQUATION:<br /> k = Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - k*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here, an administration of type 1 (resp. 2) is an oral (resp.
iv) administration.<br /> <br /> The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br /> <br /> <br /> {{MLXPlore<br /> |name=joint1_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint1_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> ka = 0.5<br /> V = 10<br /> Cl = 0.5<br /> h0 = 0.01<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. We can then change the values of these parameters with the sliders to see what the impact on the two functions is.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br /> <br /> <br /> We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we want now to compare a treatment with repeated doses of 50mg every 24 hours and a treatment with repeated doses of 25mg every 12 hours. 
Only the section {{Verbatim|&lt;DESIGN&gt;}} needs to be modified:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint2_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=0:12:144, amount=25,type=1}<br /> &lt;/pre&gt; }}<br /> |image=[[File:exploremodel2.png]] }}<br /> <br /> <br /> We can combine different administrations (oral and intravenous for instance) into one global treatment:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint3_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=6:48:150, amount=25,type=2}<br /> <br /> [TREATMENT]<br /> trt1={adm1, adm2}<br /> &lt;/pre&gt; }}<br /> |image= [[File:exploremodel3.png]]<br /> }}<br /> <br /> ===Exploring the statistical model===<br /> <br /> One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. 
To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br /> <br /> DEFINITION:<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> .<br /> .<br /> .<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br /> <br /> <br /> {{MLXTran<br /> |name=joint4_project.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint2_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> V_pop = 10<br /> Cl_pop = 0.5<br /> h0_pop=0.01<br /> omega_V = 0.2<br /> omega_Cl = 0.3<br /> omega_h0 = 0.2<br /> ka = 0.5<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br /> <br /> <br /> It is possible to introduce covariates into the statistical model by considering for example that the volume depends on the weight, 
and considering that these covariates are themselves random variables. This may be important if we are for example looking to visualize the amount of variation in concentration due to variation in weight, and the variation in concentration which remains unaccounted for, caused by random effects.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br /> <br /> <br /> The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @ARTICLE{popixplore,<br /> author = {POPIX Inria team},<br /> title = {Popixplore 1.0},<br /> url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @ARTICLE{MLXplore,<br /> author = {Lixoft},<br /> title = {MLXPlore 1.0},<br /> url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{macey2000berkeley,<br /> title={Berkeley Madonna user’s guide},<br /> author={Macey, R. and Oster, G. and Zahnley, T.},<br /> journal={Berkeley (CA): University of California},<br /> year={2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{chatterjee2009sensitivity,<br /> title={Sensitivity analysis in linear regression},<br /> author={Chatterjee, S. and Hadi, A. S.},<br /> volume={327},<br /> year={2009},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{sensibilité2013,<br /> title={Analyse de sensibilité et exploration de modèles},<br /> author={Faivre R. and Looss B. and Mah&amp;eacute;vas, S. and Makowski, D. and Monod, H.},<br /> year={2013},<br /> publisher={Editions Quae}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2000sensitivity,<br /> title={Sensitivity analysis},<br /> author={Saltelli, A. and Chan, K. 
and Scott, E. M. and others},<br /> volume={134},<br /> year={2000},<br /> publisher={Wiley New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2008global,<br /> title={Global sensitivity analysis: the primer},<br /> author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br /> year={2008},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2004sensitivity,<br /> title={Sensitivity analysis in practice: a guide to assessing scientific models},<br /> author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br /> year={2004},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin https://wiki.inria.fr/popix/Introduction_to_PK_modeling_using_$%5Cmlxplore$ Introduction to PK modeling using $\mlxplore$ 2013-05-27T09:30:21Z <p>Admin: /* Introduction */</p> <hr /> <div>== Introduction ==<br /> This is an introductory tutorial for describing and visualizing simple and more complex [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) models.<br /> <br /> We will present several PK model examples and visualize the processes of [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption], [http://en.wikipedia.org/wiki/Distribution_%28pharmacology%29 distribution] and [http://en.wikipedia.org/wiki/Elimination_%28pharmacology%29 elimination] that characterize them. <br /> <br /> We will suppose in all these examples that a single dose is administered at time t=0.<br /> In each example, the modeling goal is defined. Then, the model and requests for graphical outputs are coded in [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ MLXPlore], a new graphical and interactive software for the exploration and visualization of complex [http://en.wikipedia.org/wiki/Pharmacometrics pharmacometric] models. 
MLXPlore uses the easy and intuitive [http://www.lixoft.com/wp-content/resources/docs/modelMLXTRANtutorial.pdf MLXtran] model coding language, popularized by the [http://ww35.monolix.org/ Monolix] software. <br /> <br /> MLXPlore is used here for computing the predicted amount in the central compartment. In [[ Introduction to PK modeling using MLXPlore - Part II | Part II]] we further display the predicted amount in the depot compartment, along with the MLXPlore project used to compute it.<br /> <br /> &lt;br&gt;<br /> <br /> == Absorption ==<br /> <br /> <br /> === First-order and zero-order absorption ===<br /> <br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption1a_script:}}&lt;/span&gt; this computes and displays the amount (Ac) in the central compartment when the drug is absorbed with a first-order or zero-order absorption process. <br /> &lt;blockquote&gt;<br /> '''Left:''' In the right-hand side window, the two (first-order and zero-order) models are described using the MLXtran coding language. In the left-hand side window, the structural model, experimental design, parameters and requested graphical output are defined.
<br /> &lt;br&gt;&lt;br&gt;<br /> '''Right:''' The graphical output of MLXPlore, which was requested to display the amount Ac in the central compartment with respect to time for zero-order (red) and first-order (blue) absorption.<br /> &lt;/blockquote&gt;<br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot;<br /> |style=&quot;width:50%&quot;| <br /> [[File:Absorption1a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption1a_bis.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> See the [[Introduction to PK modeling using MLXPlore - Part II | Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project.<br /> <br /> &lt;br&gt;<br /> <br /> === First-order, zero-order and $\alpha$-order absorption ===<br /> <br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption2a_script:}}&lt;/span&gt; we compute and display the amounts in the central and depot compartments when the drug is transferred from the depot to the central compartment with a first-order, zero-order or $\alpha$-order absorption process. <br /> <br /> Note that the amount in the depot compartment satisfies $\dot{A}d(t) \, = \, -ka \, \times \, Ad^{\alpha}(t).$ Zero-order absorption is obtained with $\alpha=0$ and first-order absorption with $\alpha=1$. <br /> The green curves correspond to the $\alpha$-order absorption process. <br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot; <br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption2a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption2a.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> See the [[ Introduction to PK modeling using MLXPlore - Part II | Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project.
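The $\alpha$-order absorption model is also easy to explore outside $\mlxplore$. The sketch below (plain Python rather than the wiki's MLXtran scripts; the dose $D$ and the rate constants $ka$ and $k$ are illustrative values, not those of the scripts) integrates $\dot{A}d = -ka \times Ad^{\alpha}$ together with linear elimination from the central compartment:

```python
# Sketch only: Euler integration of the alpha-order absorption model.
# D, ka, k, dt and t_end are illustrative values, not those of the wiki scripts.

def simulate(alpha, D=10.0, ka=1.0, k=0.5, dt=0.001, t_end=10.0):
    """Return the central amount Ac at times 0, dt, 2*dt, ..., t_end."""
    Ad, Ac = D, 0.0
    acs = [Ac]
    for _ in range(int(round(t_end / dt))):
        rate = ka * Ad ** alpha if Ad > 0 else 0.0  # absorption rate from the depot
        rate = min(rate, Ad / dt)                   # the depot cannot go negative
        Ad -= rate * dt
        Ac += (rate - k * Ac) * dt                  # input minus linear elimination
        acs.append(Ac)
    return acs

ac_first = simulate(alpha=1.0)  # first-order absorption
ac_zero  = simulate(alpha=0.0)  # zero-order absorption
```

With $\alpha=1$ the depot empties exponentially; with $\alpha=0$ it empties at the constant rate $ka$ until exhausted, which is why the guard on the depot amount is needed.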
<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === First-order, zero-order and sequential zero-order/first-order absorption ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption3a_script:}}&lt;/span&gt; we compute and display the amount in the central compartment when the drug is transferred from the depot to the central compartment with a first-order, zero-order or sequential zero-order/first-order absorption process. <br /> <br /> Here, $r0$ is the absorption rate for the zero-order process and $F0$ the fraction of the dose absorbed by the zero-order process.<br /> The green curves refer to the sequential zero-order/first-order absorption process. <br /> <br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption3a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption3a.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> See the [[ Introduction to PK modeling using MLXPlore - Part II | Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project. <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === First-order and saturated absorption ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption4_script:}}&lt;/span&gt; we compute and display the amount in the central compartment when the drug is transferred from the depot to the central compartment with a first-order or saturated (Michaelis-Menten) absorption process.<br /> The red curve is now for the saturated absorption process.
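The effect of saturation on the depot can be checked numerically, independently of $\mlxplore$. The Python sketch below (not the wiki's MLXtran script; $Vm$, $Km$, $ka$ and the dose are illustrative values) compares the Michaelis-Menten depot equation $\dot{A}d = -Vm \times Ad/(Km+Ad)$ with first-order absorption:

```python
# Sketch only (not the wiki's MLXtran script): depot amount under saturated
# (Michaelis-Menten) vs first-order absorption, Euler integration.
# Vm, Km, ka and the dose D are illustrative values.

Vm, Km, ka = 2.0, 1.0, 1.0

def depot_amount(rate_fn, D=100.0, dt=0.001, t_end=10.0):
    """Integrate dAd/dt = -rate_fn(Ad) and return Ad at each time step."""
    Ad = D
    out = [Ad]
    for _ in range(int(round(t_end / dt))):
        Ad = max(Ad - rate_fn(Ad) * dt, 0.0)
        out.append(Ad)
    return out

ad_sat   = depot_amount(lambda Ad: Vm * Ad / (Km + Ad))  # rate caps near Vm when Ad >> Km
ad_first = depot_amount(lambda Ad: ka * Ad)              # rate proportional to Ad
```

While $Ad \gg Km$ the saturated rate is pinned near $Vm$, so the depot empties almost linearly: over the first time unit it loses close to $Vm$, whereas the first-order depot has already lost about 63% of the dose.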
<br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption4a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption4a.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> See the [[ Introduction to PK modeling using MLXPlore - Part II| Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project.<br /> <br /> === Lag-time and transit compartments ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption5_script:}}&lt;/span&gt; we compute and display the amount in the central compartment when a lag time or a transit compartment model is used. <br /> <br /> Here, the blue curve is for first-order absorption without lag-time, the red curve for the lag-time model and the green one for the transit compartment model. The number of transit compartments $Ntr$ is related to the mean transit time $Mtt$ and the transfer rate constant $Ktr$ through $Mtt=(Ntr+1)/Ktr$. When $Mtt=Tlag$, the transit compartment model can be seen as a smooth version of the lag-time model. It converges to the lag-time model when the number of compartments increases (i.e., when the transfer rate constant $Ktr$ increases). <br /> <br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption5a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption5a.png]]<br /> |} &lt;/div&gt;<br /> <br /> See the [[ Introduction to PK modeling using MLXPlore - Part II| Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project.
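The convergence of the transit compartment model to the lag-time model can be verified numerically. The Python sketch below is not the wiki's script; it assumes the common convention $Ktr=(Ntr+1)/Mtt$ and illustrative parameter values. It integrates the chain $\dot{A}_0=-Ktr\,A_0$, $\dot{A}_j=Ktr\,(A_{j-1}-A_j)$ and tracks the fraction of a unit dose that has reached the central compartment:

```python
# Sketch only: Euler integration of a transit compartment chain. Assumes the
# common convention Ktr = (Ntr + 1)/Mtt; Mtt, dt and the compartment counts
# are illustrative, not taken from the wiki script.

def fraction_through_chain(n_transit, Mtt=2.0, dt=0.001, t_end=4.0):
    """Fraction of a unit dose that has reached the central compartment."""
    Ktr = (n_transit + 1) / Mtt
    A = [1.0] + [0.0] * n_transit       # depot followed by the transit compartments
    absorbed = [0.0]
    for _ in range(int(round(t_end / dt))):
        flows = [Ktr * a for a in A]    # outflow of each compartment
        for j in range(len(A)):
            A[j] += ((flows[j - 1] if j > 0 else 0.0) - flows[j]) * dt
        absorbed.append(absorbed[-1] + flows[-1] * dt)
    return absorbed

few  = fraction_through_chain(2)    # input spread out over time
many = fraction_through_chain(50)   # close to a sharp lag at t = Mtt
```

With 2 transit compartments the input is spread out in time; with 50 it is close to an all-or-nothing arrival around $t \approx Mtt$, i.e., the lag-time model.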
<br /> <br /> &lt;br&gt;<br /> <br /> === Summary ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|absorption6a_script:}}&lt;/span&gt; we compute and display the amount in the central compartment for all of the different absorption models presented in the previous examples.<br /> <br /> In the figure, abs1 is first-order absorption, abs2 is $\alpha$-order absorption, abs3 is saturated absorption, abs4 is zero-order absorption and abs5 is sequential zero-order/first-order absorption.<br /> <br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption6a_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Absorption6a.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> See the [[ Introduction to PK modeling using MLXPlore - Part II | Part II]] for the corresponding amounts in the depot compartment and the related $\mlxplore$ project.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Distribution ==<br /> <br /> &lt;br&gt;<br /> === One, two and three compartment models ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|distribution1_script:}}&lt;/span&gt; we compute and display the amount in the central and peripheral compartments when the drug is distributed assuming one, two or three compartment models.<br /> <br /> Here, $Ap$ and $Aq$ are the amounts in the first and second peripheral compartments and $lAc$ the log-amount in the central compartment. 
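The characteristic signature of multi-compartment distribution, a log-amount curve that is no longer a straight line, can be reproduced with a short simulation. The Python sketch below (illustrative rate constants, IV bolus, simple Euler integration; not the wiki's MLXtran script) compares a one- and a two-compartment model:

```python
# Sketch only: one- vs two-compartment distribution after an IV bolus.
# Rate constants k, k12, k21 and the dose are illustrative values.
import math

def log_central(k, k12, k21, D=100.0, dt=0.001, t_end=20.0):
    """Euler integration; returns the log-amount in the central compartment."""
    Ac, Ap = D, 0.0
    lac = [math.log(Ac)]
    for _ in range(int(round(t_end / dt))):
        Ac, Ap = (Ac + (-(k + k12) * Ac + k21 * Ap) * dt,
                  Ap + (k12 * Ac - k21 * Ap) * dt)
        lac.append(math.log(Ac))
    return lac

lac_2cpt = log_central(k=0.3, k12=0.5, k21=0.2)  # bi-exponential decline
lac_1cpt = log_central(k=0.3, k12=0.0, k21=0.0)  # mono-exponential: a straight line
```

For the one-compartment model the log-amount decreases at the constant rate $k$; with a peripheral compartment the decline is steep at first and then settles on a slower terminal slope.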
<br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Distribution1_script.png]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Distribution1.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Elimination ==<br /> <br /> &lt;br&gt;<br /> === Linear, nonlinear and combined elimination ===<br /> <br /> &lt;span style=&quot;color:#993300&quot;&gt;{{Verbatim|elimination1_script:}}&lt;/span&gt; we compute and display the amount in the central compartment and the rate of elimination when the drug is eliminated with a linear, nonlinear (Michaelis-Menten) or combined elimination process (linear when $\alpha=1$ and Michaelis-Menten when $\alpha=0$).<br /> <br /> Here, $lAc$ is the log-amount in the central compartment and $lre$ the log-rate of elimination of the drug. By definition, $lre$ is a linear function of time for a linear elimination process.<br /> <br /> <br /> &lt;div style=&quot;overflow-x:auto&quot;&gt;<br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> [[File:Elimination1_script.png|460px]]<br /> |style=&quot;width:50%&quot;|<br /> [[File:Plot_Elimination1.png]]<br /> |} &lt;/div&gt;<br /> <br /> <br /> <br /> All the projects shown in this session can be downloaded here: {{filepath:Pk mlxplore.zip}}.<br /> <br /> {{Next<br /> |link=Introduction to PK modeling using MLXPlore - Part II }}</div> Admin https://wiki.inria.fr/popix/Admin/vector.css Admin/vector.css 2013-05-21T14:13:36Z <p>Admin: Page created with « *{ color:green; } »</p> <hr /> <div>*{ color:green; }</div> Admin https://wiki.inria.fr/popix/Admin/Vector.css Admin/Vector.css 2013-05-21T14:11:07Z <p>Admin: </p> <hr /> <div>*{<br /> color: green;<br /> }</div> Admin https://wiki.inria.fr/popix/LauraTest LauraTest 2013-05-17T10:24:41Z <p>Admin: </p> <hr /> <div>This is a test <br /> <br /> &lt;math&gt; a = 1 +
3 &lt;/math&gt;<br /> <br /> $a + b = log(C)$</div> Admin https://wiki.inria.fr/popix/Animations_%26_Videos Animations & Videos 2013-05-14T12:57:07Z <p>Admin: /* Introduction to PK modeling */</p> <hr /> <div><br /> == Introduction to the population approach == <br /> <br /> The goal of this animation is to show that the population approach is relevant to many fields of application (biology, agronomy, toxicology, pharmacology, etc.) and to present a PK modeling application in a bit more detail.<br /> <br /> {{ShowVideo_NoCaption|image=IntroductionPA.png|size=500px|video=http://team.inria.fr/popix/files/2012/01/Populations.swf}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Introduction to PK modeling ==<br /> <br /> This animation describes how complex biological phenomena can be approximated by simplified models represented by mathematical equations.<br /> <br /> {{ShowVideo_NoCaption|image=IntroductionPK.png |size=500px|video=https://team.inria.fr/popix/files/2013/02/PKmodelling.swf }}<br /> <br /> <br /> Examples of PK modeling using $\mlxplore$ can be visualized [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|here]].<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> == The SAEM algorithm in practice ==<br /> <br /> This video shows how the SAEM algorithm works, as described in [[The SAEM algorithm for estimating population parameters]] section. 
<br /> <br /> <br /> {{ShowVideo_NoCaption|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4}}</div> Admin https://wiki.inria.fr/popix/Code Code 2013-05-14T12:26:42Z <p>Admin: /* R code */</p> <hr /> <div>=== R code ===<br /> <br /> The code and the input data used in the examples of the [[The individual approach]] section can be downloaded here: {{filepath: individualFitting.zip}}.<br /> <br /> === Matlab ===<br /> <br /> The MATLAB Toolbox $\popixplore$, used in the [[Visualization]] section, can be downloaded here: {{filepath: popixplore 1.1.zip}}.<br /> <br /> === $\monolix$, $\mlxtran$ &amp; $\mlxplore$ ===<br /> <br /> The scripts and the projects used to explore and visualize PK models ([[Introduction to PK modeling using MLXPlore - Part I|Part I]] and [[Introduction to PK modeling using MLXPlore - Part II|Part II]]) can be downloaded here: {{filepath:pk mlxplore.zip}}.</div> Admin https://wiki.inria.fr/popix/Introduction_to_models Introduction to models 2013-05-13T13:47:29Z <p>Bleakley: </p> <hr /> <div>&lt;div style=&quot;color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em&quot;&gt;[[Image:attention4.jpg|45px|left|link=]] <br /> (If you are experiencing problems with the display of the mathematical formulas, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br /> &lt;/div&gt;<br /> <br /> Models are attempts to describe observations in a logical, simple way, involving the relationship between measurements, parameters, covariates and so on. If working in a probabilistic framework - as we are here - there will be randomness in the model, involving random variables, probability distributions, errors and more.<br /> <br /> Because of this, we are going to make the following definition of a model in this context: [[What is a model? A joint probability distribution!
| '''a model is a joint probability distribution''']].<br /> <br /> Therefore, defining a model means defining a [http://en.wikipedia.org/wiki/Joint_probability_distribution joint probability distribution], which can then be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_distribution conditional distributions] we can perform tasks on: estimation, model selection, simulation, etc.<br /> <br /> This chapter is therefore about defining appropriate probability distributions. We start by introducing some general notation and conventions.<br /> <br /> <br /> * We will call $y_i$ the set of observations recorded on subject $i$, and $\by$ the combined set of observations for all the $N$ individuals: $\by = (y_1, ...,y_N)$. In general, we will use '''bold''' text (like for $\by$) when a variable regroups several individuals. Thus, we write $\psi_i$ for the parameter vector for individual $i$ and $\bpsi$ the parameter vector of a set of individuals, $\bpsi = (\psi_1,\ldots,\psi_N)$.<br /> <br /> <br /> * We note $\qy$ and $\qpsi$ the distributions of $\by$ and $\bpsi$ respectively, $\qcypsi$ the conditional distribution of $\by$ given $\bpsi$, and $\qypsi$ the joint distribution of $\by$ and $\bpsi$. In these (and other distributions), we have placed the variable described by the distribution in the index.<br /> <br /> <br /> * We use the same &quot;$p$&quot; notation for the [http://en.wikipedia.org/wiki/Probability_distribution distribution] of a random variable as for its [http://en.wikipedia.org/wiki/Probability_density_function probability density function] (pdf).<br /> <br /> <br /> * When there is no ambiguity when working with whole equations, to simplify notation we may omit the indices and simply use the symbol $\pmacro$. For instance, $\qy(\by)$, the pdf of $\by$, becomes $\py(\by)$; both are equivalent. 
The symbol $\pmacro$ has no meaning on its own, it is completely defined by its arguments.<br /> <br /> <br /> * When the distribution of the individual parameters $\psi_i$ of subject $i$ depends on a vector of individual [http://en.wikipedia.org/wiki/Covariate covariates] $c_i$ and a population parameter $\theta$, we may choose to explicitly show this dependence by writing the distribution of $\psi_i$ as $\ppsii(\psi_i;c_i,\theta)$.<br /> <br /> <br /> * When the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcyipsii$ of the observations $y_i=(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ depends on regression variables $x_i=(x_{ij}, 1\leq j \leq n_i)$ and source terms $u_i$, (i.e., inputs of a dynamical system such as doses in a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I | pharmacokinetic model]]), we may choose to explicitly show this dependence, writing the conditional distribution as $\pcyipsii(y_i | \psi_i;x_i,u_i)$.<br /> <br /> <br /> There are two important pieces to the puzzle: the observations $\by$ whose distribution $\qy$ depends on the individual parameters, and the individual parameters $\bpsi$ themselves with distribution $\qpsi$. 
In the population approach, the base distribution is the joint distribution $\qypsi$ of the observations and individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi).<br /> &lt;/math&gt; }}<br /> <br /> In this chapter, we concentrate essentially on these two components: the [http://en.wikipedia.org/wiki/Conditional_distribution conditional distribution] $\qcypsi$ of the observations, and the distribution $\qpsi$ of the individual parameters.<br /> <br /> Depending on the required complexity of the model, its other components such as [http://en.wikipedia.org/wiki/Covariate covariates], population parameters and design can also be modeled as [http://en.wikipedia.org/wiki/Random_variable random variables], but we will not go into such detail in this chapter.<br /> <br /> For each model, we aim to precisely identify the minimal amount of information needed to represent it mathematically, so that it remains possible to implement and analyze. 
To do this, we will be able to use $\mlxtran$, a powerful formal declarative language that allows us to describe complicated structural and statistical models in a straightforward, intuitive way.<br /> <br /> {{Next<br /> |link=Modeling the individual parameters }}</div> Admin https://wiki.inria.fr/popix/Estimation_of_the_log-likelihood_via_importance_sampling Estimation of the log-likelihood via importance sampling 2013-04-30T10:00:57Z <p>Admin: </p> <hr /> <div>== Estimation using importance sampling ==<br /> <br /> The observed log-likelihood ${\llike}(\theta;\by)=\log({\like}(\theta;\by))$ can be estimated without requiring approximation of the model, using a Monte Carlo approach.<br /> <br /> Since<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\llike}(\theta;\by) &amp;\eqdef&amp; \log(\py(\by;\theta)) \\<br /> &amp;=&amp; \sum_{i=1}^{N} \log(\pyi(y_i;\theta)),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> we can estimate $\log(\pyi(y_i;\theta))$ for each individual and derive an estimate of the log-likelihood as the sum of these individual log-likelihoods. We will now explain how to estimate $\log(\pyi(y_i;\theta))$ for any individual $i$.<br /> <br /> Using the $\phi$-representation of the model, notice first that $\pyi(y_i;\theta)$ can be decomposed as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyi(y_i;\theta) &amp;=&amp;<br /> \displaystyle{ \int \pyipsii(y_i,\phi_i;\theta)\,d\phi_i }\\<br /> &amp;=&amp; \displaystyle{\int \pcyiphii(y_i {{!}} \phi_i;\theta)\pphii(\phi_i;\theta)\,d\phi_i } \\<br /> &amp;=&amp; \esps{\qphii}{\pcyiphii(y_i {{!}} \phi_i;\theta)}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus, $\pyi(y_i;\theta)$ is expressed as a mean. It can therefore be approximated by an empirical mean using a Monte Carlo procedure:<br /> <br /> <br /> &lt;blockquote&gt;<br /> 1. 
Draw $M$ independent values $\phi_i^{(1)}$, $\phi_i^{(2)}$, ..., $\phi_i^{(M)}$ from the normal distribution $\qphii(\, \cdot \, ; \theta)$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2. Estimate $\pyi(y_i;\theta)$ with<br /> &lt;/blockquote&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{p}_{i,M} = \displaystyle{\frac{1}{M} }\sum_{m=1}^{M}\pcyiphii(y_i {{!}} \phi_i^{(m)};\theta) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> By construction, this estimator is unbiased:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\hat{p}_{i,M} }&amp;=&amp; \esps{\qphii}{\pcyiphii(y_i {{!}} \phi_i^{(m)};\theta)} \\<br /> &amp;=&amp; \pyi(y_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Furthermore, it is consistent since its variance decreases as $1/M$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \var{\hat{p}_{i,M} }= \displaystyle{ \frac{1}{M} }\vars{\qphii}{\pcyiphii(y_i {{!}} \phi_i^{(m)};\theta)}.<br /> &lt;/math&gt; }}<br /> <br /> We could consider ourselves satisfied with this estimator since we &quot;only&quot; have to select $M$ large enough to get an estimator with a small variance. 
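The two steps above can be tried out on a toy model for which $\pyi(y_i;\theta)$ is known in closed form, for instance $\phi_i \sim {\cal N}(0,\omega^2)$ and $y_i \,|\, \phi_i \sim {\cal N}(\phi_i,\sigma^2)$, whose marginal is ${\cal N}(0,\omega^2+\sigma^2)$. A Python sketch (illustrative values, not $\monolix$ code):

```python
# Toy sketch of the plain Monte Carlo estimator, with a model simple enough
# that p(y_i; theta) is known exactly: phi ~ N(0, omega^2) and
# y | phi ~ N(phi, sigma^2), so the marginal of y is N(0, omega^2 + sigma^2).
# All numerical values are illustrative.
import math
import random

random.seed(0)
omega, sigma, y = 1.0, 0.5, 1.3

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

M = 100_000
phis = [random.gauss(0.0, omega) for _ in range(M)]         # step 1: draw from p(phi)
p_hat = sum(normal_pdf(y, phi, sigma) for phi in phis) / M  # step 2: empirical mean

p_exact = normal_pdf(y, 0.0, math.sqrt(omega ** 2 + sigma ** 2))
```

In this easy one-dimensional setting the relative error is already small with $M=10^5$; for a high-dimensional $\phi_i$, or an observation lying in the tail of the marginal distribution, a much larger $M$ would be required.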
Nevertheless, we will see now that it is possible to improve the statistical properties of this estimator.<br /> <br /> For any distribution $\tqphii$ that is absolutely continuous with respect to the marginal distribution $\qphii$, we can write<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyi(y_i;\theta) &amp;=&amp;<br /> \displaystyle{ \int \pyiphii(y_i,\phi_i;\theta)\,d\phi_i }\\<br /> &amp;=&amp; \displaystyle{\int \pcyiphii(y_i {{!}} \phi_i;\theta)\frac{\pphii(\phi_i;\theta)}{\tpphii(\phi_i;\theta)}\tpphii(\phi_i;\theta)\,d\phi_i } \\<br /> &amp;=&amp; \esps{\tqphii}{\pcyiphii(y_i {{!}} \phi_i;\theta)\displaystyle{\frac{\pphii(\phi_i;\theta)}{\tpphii(\phi_i;\theta)} } }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can now approximate $\pyi(y_i;\theta)$ via an ''importance sampling'' integration method using $\tqphii$ as a proposal distribution:<br /> <br /> <br /> &lt;blockquote&gt;<br /> 1. Draw $M$ independent values $\phi_i^{(1)}$, $\phi_i^{(2)}$, ..., $\phi_i^{(M)}$ from the proposal distribution $\tqphii(\, \cdot \, ; \theta)$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2. 
Estimate $\pyi(y_i;\theta)$ with<br /> &lt;/blockquote&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{p}_{i,M} = \displaystyle{\frac{1}{M} }\sum_{m=1}^{M}\pcyiphii(y_i {{!}} \phi_i^{(m)};\theta) \displaystyle{\frac{\pphii(\phi_i^{(m)};\theta)}{\tpphii(\phi_i^{(m)};\theta)} }.<br /> &lt;/math&gt; }}<br /> <br /> <br /> By construction, this new estimator is also unbiased and its variance also decreases as $1/M$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \var{\hat{p}_{i,M} }= \displaystyle{\frac{1}{M} }\vars{\tqphii}{\pcyiphii(y_i {{!}} \phi_i^{(m)};\theta) \displaystyle{\frac{\pphii(\phi_i^{(m)};\theta)}{\tpphii(\phi_i^{(m)};\theta)} } }.<br /> &lt;/math&gt; }}<br /> <br /> There exists an infinite number of possible proposal distributions $\tpphii$ which all provide the same rate of convergence $1/M$.<br /> The trick is to reduce the variance of the estimator by selecting a proposal distribution so that the numerator is as small as possible.<br /> <br /> Imagine that we use the conditional distribution $\qcphiiyi$ as the proposal. Then, for any $m=1,2,\ldots,M$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyiphii(y_i {{!}} \phi_i^{(m)};\theta) \displaystyle{\frac{\pphii(\phi_i^{(m)};\theta)}{\tpphii(\phi_i^{(m)};\theta)} }<br /> &amp;=&amp; \pcyiphii(y_i {{!}} \phi_i^{(m)};\theta) \displaystyle{\frac{\pphii(\phi_i^{(m)};\theta)}{\pcphiiyi(\phi_i^{(m)} {{!}} y_i;\theta)} }\\<br /> &amp;=&amp; \pyi(y_i;\theta),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> which means that $\hat{p}_{i,M}=\pyi(y_i;\theta)$! Such an estimator is optimal since its variance is null and only one realization of $\qcphiiyi$ is required to exactly compute $\pyi(y_i;\theta)$. 
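This zero-variance property is easy to verify numerically in a toy conjugate model where the conditional distribution is available in closed form: with $\phi_i \sim {\cal N}(0,\omega^2)$ and $y_i\,|\,\phi_i \sim {\cal N}(\phi_i,\sigma^2)$, the conditional $p(\phi_i | y_i)$ is normal and, by Bayes' theorem, every importance-sampling term collapses to $\pyi(y_i;\theta)$ exactly. A Python sketch (illustrative values):

```python
# Toy sketch of the optimal (zero-variance) proposal: in the conjugate model
# phi ~ N(0, omega^2), y | phi ~ N(phi, sigma^2), the conditional p(phi | y)
# is normal with known mean and variance, so it can be used as the proposal.
# All numerical values are illustrative.
import math
import random

random.seed(1)
omega, sigma, y = 1.0, 0.5, 1.3

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

v_post = 1.0 / (1.0 / omega ** 2 + 1.0 / sigma ** 2)   # conditional variance
mu_post = v_post * y / sigma ** 2                      # conditional mean
p_exact = normal_pdf(y, 0.0, math.sqrt(omega ** 2 + sigma ** 2))

# every importance-sampling term p(y|phi) p(phi) / p_tilde(phi) equals p(y)
terms = []
for _ in range(5):
    phi = random.gauss(mu_post, math.sqrt(v_post))
    w = (normal_pdf(y, phi, sigma) * normal_pdf(phi, 0.0, omega)
         / normal_pdf(phi, mu_post, math.sqrt(v_post)))
    terms.append(w)
```

With any other proposal the terms fluctuate from draw to draw, and the variance of the estimator is governed by how far the proposal is from this conditional distribution.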
The problem is that it is not possible to generate the $\phi_i^{(m)}$ with this conditional distribution, since that would require computing a normalizing constant, which here is precisely $\pyi(y_i;\theta)$.<br /> <br /> Nevertheless, this conditional distribution can be estimated using the Metropolis-Hastings algorithm described in the [[The Metropolis-Hastings algorithm for simulating the individual parameters]] section, and a practical proposal &quot;close&quot; to the optimal proposal $\qcphiiyi$ can be derived. We can then expect to get a very accurate estimate with a relatively small Monte Carlo size $M$.<br /> <br /> In $\monolix$, the mean and variance of the conditional distribution $\qcphiiyi$ are estimated by Metropolis-Hastings for each individual $i$. Then, the $\phi_i^{(m)}$ are drawn with a noncentral $t$ distribution:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \phi_i^{(m)} = \mu_i + \sigma_i \times T_{i,m} ,<br /> &lt;/math&gt; }}<br /> <br /> where $\mu_i$ and $\sigma^2_i$ are estimates of $\esp{\phi_i|y_i;\theta}$ and $\var{\phi_i|y_i;\theta}$, and $(T_{i,m})$ is a sequence of i.i.d. random variables distributed with a Student's $t$ distribution with $\nu$ degrees of freedom. <br /> <br /> $\monolix$ uses the default value $\nu=5$. It is also possible to automatically test different degrees of freedom from the set $\{2, 5, 10, 20\}$ and to select the one that provides the smallest empirical variance for $\widehat{ {\llike} }_M(\theta;\by) = \sum_{i=1}^{N}\log(\hat{p}_{i,M})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=Even if $\widehat{ {\like} }_M(\theta;\by) = \prod_{i=1}^{N}\hat{p}_{i,M}$ is an unbiased estimator of ${\like}(\theta;\by)$, $\widehat{ {\llike} }_M(\theta;\by)$ is a biased estimator of $\log({\like}(\theta;\by))$.
Indeed, by Jensen's inequality, we have that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \esp{\log(\widehat{ {\like} }_M(\theta;\by))} \leq \log \esp{\widehat{ {\like} }_M(\theta;\by)} = \log ( {\like}(\theta;\by)).<br /> &lt;/math&gt; }}<br /> <br /> However, the bias decreases as $M$ increases, and also if $\widehat{ {\like} }_M(\theta;\by)$ is close to ${\like}(\theta;\by)$. It is therefore highly recommended to use a proposal as close as possible to the conditional distribution $\qcphiiyi$, which means having to estimate this conditional distribution before estimating the log-likelihood.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= A mixture model is used for this example. The goal here is not to provide details about the model and the formulas used for implementation. We only want to highlight the importance of the choice of the proposal distribution for estimating the likelihood in complex models. The three figures below display estimates of the deviance ($-2\times {\llike} (\theta;\by)$) by importance sampling using different proposal distributions. The estimated deviance is displayed as a function of the Monte Carlo size.<br /> <br /> In the first example, the conditional distributions of the individual parameters $\phi_i$ were estimated using the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]], and noncentral $t$ distributions with 5 d.f. were used as proposal distributions. There is no bias and an accurate estimate is obtained with a small Monte Carlo size. The estimated deviance is 14386.8 (s.e. = 0.7). 
Here, only 100 Metropolis-Hastings iterations were required to correctly estimate the conditional mean and variance of the $\phi_i$.<br /> <br /> <br /> {{ImageWithCaption|image=ll2.png|caption= }}<br /> <br /> <br /> In the second example, only the last 30 iterations of [[The SAEM algorithm for estimating population parameters|SAEM]] were used for estimating the conditional mean and variance of the $\phi_i$. Then, noncentral $t$ distributions with 5 d.f. were used as proposal distributions. Now there is a bias, which decreases very slowly with the Monte Carlo size and which is non-negligible even with $10^5$ simulations. The estimated deviance is 14397.0 (s.e. = 2.7).<br /> <br /> <br /> {{ImageWithCaption|image=ll1.png|caption= }}<br /> <br /> <br /> In the third example, normal distributions were used as proposals. The parameters of these distributions were the mean and variance of the conditional distributions of the $\phi_i$ estimated using Metropolis-Hastings. Results are similar to those obtained with a $t$ distribution (estimated deviance = 14386.9, s.e. = 1) but here, more simulations are required to eliminate the bias. <br /> <br /> <br /> {{ImageWithCaption|image=ll3.png|caption= }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation using linearization ==<br /> <br /> For continuous data models, an alternative to the importance sampling approach is to use a linearization of the model like that proposed in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter to approximate the observed Fisher Information Matrix. Indeed, the marginal distribution of a continuous vector of observations $y_i$ can be approximated by a normal distribution. It is then straightforward to derive the associated likelihood.
All of these calculations are described in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter.<br /> <br /> This method can be much faster than importance sampling. It should be used by modelers for model selection purposes during the initial runs, when the goal is to identify significant differences between models. Importance sampling should be used when a more precise evaluation of the log-likelihood is required. <br /> <br /> <br /> {{Back<br /> |link=Estimation of the observed Fisher information matrix }}</div> Admin https://wiki.inria.fr/popix/Estimation_of_the_observed_Fisher_information_matrix Estimation of the observed Fisher information matrix 2013-04-29T15:08:25Z <p>Admin: </p> <hr /> <div>$\def\hphi{\tilde{\phi}}$<br /> ==Estimation using stochastic approximation==<br /> <br /> The ''observed'' Fisher information matrix (F.I.M.) is a function of $\theta$ defined as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq_fim1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> I(\theta) &amp;=&amp; -\DDt{\log ({\like}(\theta;\by))} \\<br /> &amp;=&amp; -\DDt{\log (\py(\by;\theta))} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Due to the likelihood being quite complex, $I(\theta)$ usually has no closed form expression. 
It is however possible to estimate it using a stochastic approximation procedure based on Louis' formula:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta},<br /> &lt;/math&gt; }}<br /> <br /> where <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &amp;=&amp;<br /> \esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\<br /> &amp;&amp; - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm.<br /> <br /> We can then draw a sequence $(\psi_i^{(k)})$ using a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] and estimate the observed F.I.M. online.
At iteration $k$ of the algorithm:<br /> <br /> <br /> * '''Simulation step''': for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters| The Metropolis-Hastings algorithm]] section with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution.<br /> <br /> <br /> * '''Stochastic approximation''': update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \Delta_k &amp; = &amp; \Delta_{k-1} + \gamma_k \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - \Delta_{k-1} \right) \\<br /> D_k &amp; = &amp; D_{k-1} + \gamma_k \left(\DDt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - D_{k-1} \right)\\<br /> G_k &amp; = &amp; G_{k-1} + \gamma_k \left((\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})^\transpose -G_{k-1} \right),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $\sum_{k=1}^{\infty} \gamma_k = \infty$, and $\sum_{k=1}^{\infty} \gamma_k^2 &lt; \infty$.<br /> <br /> <br /> * '''Estimation step''': update the estimate $H_k$ of the F.I.M. according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;H_k = D_k + G_k - \Delta_k \Delta_k^{\transpose}. 
&lt;/math&gt; }} <br /> <br /> <br /> <br /> Implementing this algorithm therefore requires computation of the first and second derivatives of<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\log (\pmacro(\by,\bpsi;\theta))=\sum_{i=1}^{N} \log (\pmacro(y_i,\psi_i;\theta)).&lt;/math&gt; }}<br /> <br /> Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_dec1&quot;&gt;&lt;math&gt;<br /> \pypsi(\by,\bpsi;\theta) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi;\theta).<br /> &lt;/math&gt;&lt;/div&gt; <br /> |reference=(2) }}<br /> <br /> This assumption means that for any $i=1,2,\ldots,N$, all of the components of $\psi_i$ are random and there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$. It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ in order to estimate the F.I.M. This can be done relatively simply in closed form when the individual parameters are normally distributed (or a transformation $h$ of them is).<br /> <br /> If some component of $\psi_i$ has no variability, [[#eq:fim_dec1|(2)]] no longer holds, but we can decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pyipsii(y_i,\psi_i;\theta) = \pcyipsii(y_i {{!}} \psi_i ; \theta_y)\ppsii(\psi_i;\theta_\psi).<br /> &lt;/math&gt; }}<br /> <br /> We then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that do not have a closed form expression can be obtained using central differences.<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> 1. Using $\gamma_k=1/k$ for $k \geq 1$ means that each term is approximated with an empirical mean obtained from $(\bpsi^{(k)}, k \geq 1)$. 
For instance,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_Delta1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \Delta_k<br /> &amp;=&amp; \Delta_{k-1} + \displaystyle{ \frac{1}{k} } \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};\theta))} - \Delta_{k-1} \right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_Delta2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> &amp;=&amp; \displaystyle{ \frac{1}{k} }\sum_{j=1}^{k} \Dt{\log (\pmacro(\by,\bpsi^{(j)};\theta))} . <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> [[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] instead of [[#eq:fim_Delta2|(4)]] avoids having to store all simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$.<br /> <br /> <br /> 2. This approach is used for computing the F.I.M. $I(\hat{\theta})$ in practice, where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$. The only difference with the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] used for SAEM is that the population parameter $\theta$ is not updated and remains fixed at $\hat{\theta}$.<br /> }}<br /> <br /> <br /> {{OutlineText<br /> |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta})$ consists of:<br /> <br /> &lt;blockquote&gt;<br /> 1. For $i=1,2,\ldots,N$, run a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i {{!}}y_i ;\hat{\theta})$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2.
At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\log (\pypsi(\by,\bpsi^{(k)};\hat{\theta}))$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 3. Update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M.<br /> &lt;/blockquote&gt;<br /> }} <br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=Consider the model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_i {{!}} \psi_i &amp;\sim&amp; \pcyipsii(y_i {{!}} \psi_i) \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $\Omega = {\rm diag}(\omega_1^2,\omega_2^2,\ldots,\omega_d^2)$ is a diagonal matrix and $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^{\transpose}$.<br /> The vector of population parameters is $\theta = (\psi_{\rm pop} , \Omega)=(\psi_{ {\rm pop},1},\ldots,\psi_{ {\rm pop},d},\omega_1^2,\ldots,\omega_d^2)$.<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \log (\pyipsii(y_i,\psi_i;\theta)) = \log (\pcyipsii(y_i {{!}} \psi_i)) + \log (\ppsii(\psi_i;\theta)).<br /> &lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \Dt{\log (\pyipsii(y_i,\psi_i;\theta))} &amp;=&amp; \Dt{\log (\ppsii(\psi_i;\theta))} \\<br /> \DDt{\log (\pyipsii(y_i,\psi_i;\theta))} &amp;=&amp; \DDt{\log (\ppsii(\psi_i;\theta))} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> More precisely,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log (\ppsii(\psi_i;\theta)) &amp;=&amp; -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam}))<br /> -\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2)<br /> -\sum_{\iparam=1}^d \displaystyle{ \frac{1}{2\, \omega_\iparam^2} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam})
)^2 \\<br /> \partial \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} &amp;=&amp;<br /> \displaystyle{\frac{1}{\omega_\iparam^2} }h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) ) \\<br /> \partial \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} &amp;=&amp;<br /> -\displaystyle{ \frac{1}{2\omega_\iparam^2} }<br /> +\displaystyle{\frac{1}{2\, \omega_\iparam^4} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \psi_{ {\rm pop},\jparam} &amp;=&amp;<br /> \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % \frac{1}{\omega_\iparam^2} --&gt;<br /> \left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 &amp; {\rm if \quad } \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise}<br /> \end{array}<br /> \right.<br /> \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} \partial \omega^2_{\jparam} &amp;=&amp; \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % \frac{1}{2\omega_\iparam^4} - \frac{1}{\omega_\iparam^6} --&gt;<br /> 1/(2\omega_\iparam^4) -<br /> ( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 &amp; {\rm if \quad} \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise}<br /> \end{array}<br /> \right.<br /> \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \omega^2_{\jparam} &amp;=&amp; \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % -\frac{1}{\omega_\iparam^4} --&gt;<br /> -h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 &amp; {\rm if \quad} \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> 
\end{eqnarray}&lt;/math&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= We consider the same model for continuous data, assuming a constant error model and that the variance $a^2$ of the residual error has no variability:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} {{!}} \psi_i &amp;\sim&amp; {\cal N}(f(t_{ij}, \psi_i) \ , \ a^2), \ \ 1 \leq j \leq n_i \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, $\theta_y=a^2$, $\theta_\psi=(\psi_{\rm pop},\Omega)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(\pyipsii(y_i,\psi_i;\theta)) = \log(\pcyipsii(y_i {{!}} \psi_i ; a^2)) + \log(\ppsii(\psi_i;\psi_{\rm pop},\Omega)),<br /> &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(\pcyipsii(y_i {{!}} \psi_i ; a^2))<br /> =-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i))^2 .<br /> &lt;/math&gt; }}<br /> <br /> Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; a^2))$ with respect to $a^2$ are straightforward to compute. 
Derivatives of $\log(\ppsii(\psi_i;\psi_{\rm pop},\Omega))$ with respect to $\psi_{\rm pop}$ and $\Omega$ remain unchanged.<br /> }}<br /> <br /> <br /> <br /> <br /> {{Example<br /> |title=Example 3<br /> |text= Consider again the same model for continuous data, assuming now that a subset $\xi$ of the parameters of the structural model has no variability:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} {{!}} \psi_i &amp;\sim&amp; {\cal N}(f(t_{ij}, \psi_i,\xi) \ , \ a^2), \ \ 1 \leq j \leq n_i \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let $\psi$ denote the subset of individual parameters with variability. Here, $\theta_y=(\xi,a^2)$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \log(\pcyipsii(y_i {{!}} \psi_i ; \xi,a^2))<br /> =-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2 a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i,\xi))^2 .<br /> &lt;/math&gt; }}<br /> <br /> Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; \xi, a^2))$ with respect to $\xi$ require computation of the derivative of $f$ with respect to $\xi$. These derivatives usually cannot be computed in closed form.
One possibility is to numerically approximate them using finite differences.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation using linearization of the model == <br /> <br /> Consider here a model for continuous data that uses a $\phi$-parametrization for the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;= &amp; f(t_{ij} , \phi_i) + g(t_{ij} , \phi_i)\teps_{ij} \\<br /> \phi_i &amp;=&amp; \phi_{\rm pop} + \eta_i .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let $\hphi_i$ be some predicted value of $\phi_i$, such as the estimated mean or mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$.<br /> <br /> We can then choose to linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around the vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$.
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\simeq&amp; f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_i - \hphi_i) + g(t_{ij} , \hphi_i)\teps_{ij} \\<br /> &amp;\simeq&amp; f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i)<br /> + \Dphi{f(t_{ij} , \hphi_i)} \, \eta_i + g(t_{ij} , \hphi_i)\teps_{ij} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then approximate the marginal distribution of the vector $y_i$ by a normal distribution:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_approx&quot;&gt;&lt;math&gt;<br /> y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,<br /> \Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{i} , \hphi_i)^{\transpose} \right),<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\Sigma_{n_i}$ is the variance-covariance matrix of $\teps_{i,1},\ldots,\teps_{i,n_i}$. If the $\teps_{ij}$ are i.i.d., then<br /> $\Sigma_{n_i}$ is the identity matrix.<br /> <br /> We can equivalently use the original $\psi$-parametrization and the fact that $\phi_i=h(\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \Dphi{f(t_{i} , \hphi_i)} = \Dpsi{f(t_{i} , \hpsi_i)} J_h(\hpsi_i)^{\transpose} , &lt;/math&gt; }}<br /> <br /> where $J_h$ is the Jacobian of $h$.<br /> <br /> We can then approximate the observed log-likelihood ${\llike}(\theta) = \log(\like(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation. We can also derive the F.I.M. by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.<br /> <br /> Except for very simple models, computing these second-order partial derivatives in closed form is not straightforward. In such cases, finite differences can be used for numerically approximating them.
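As an illustrative sketch of this finite-difference approach (the function name {{Verbatim|central_diff_hessian}} and the generic {{Verbatim|loglik}} argument are ours, not part of any Popix tool), the matrix of second-order partial derivatives of ${\llike}(\theta)$ can be approximated by central differences in a few lines of Python:

```python
import numpy as np

def central_diff_hessian(loglik, theta, nu=1e-3):
    """Approximate the matrix of second-order partial derivatives of
    `loglik` at `theta` by central differences with step `nu`."""
    m = len(theta)
    H = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            nu_j = np.zeros(m); nu_j[j] = nu   # the perturbation vector nu^(j)
            nu_k = np.zeros(m); nu_k[k] = nu   # the perturbation vector nu^(k)
            H[j, k] = (loglik(theta + nu_j + nu_k) - loglik(theta + nu_j - nu_k)
                       - loglik(theta - nu_j + nu_k) + loglik(theta - nu_j - nu_k)) / (4 * nu**2)
    return H
```

Here {{Verbatim|loglik}} stands for any function returning the observed log-likelihood for a given parameter vector; for a quadratic log-likelihood the scheme is exact up to rounding error, and in general the step $\nu$ must be chosen small enough.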
For instance, we can use a central difference approximation of the second derivative of $\llike(\theta)$. To this end, let $\nu&gt;0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \nu^{(j)}_{k} = \left\{<br /> \begin{array}{ll}<br /> \nu &amp; {\rm if \quad j= k} \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> Then, for $\nu$ small enough,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \partial_{\theta_j}{ {\llike}(\theta)} &amp;\approx&amp; \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)})- {\llike}(\theta-\nu^{(j)})}{2\nu} } \\<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_diff&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &amp;\approx&amp; \displaystyle{\frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})- {\llike}(\theta+\nu^{(j)}-\nu^{(k)})<br /> -{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } . <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{OutlineText<br /> |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta})$ using a linear approximation of the model consists of:<br /> <br /> &lt;blockquote&gt;<br /> 1. For $i=1,2,\ldots,N$, obtain some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (for example, by averaging the terms of the sequence $(\psi_i^{(k)})$ drawn during the final iterations of the [[The SAEM algorithm for estimating population parameters| SAEM algorithm]]).<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2.
For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and ${\llike}(\theta)$ using this normal approximation.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 3. Use [[#eq:fim_diff|(6)]] to approximate the matrix of second-order derivatives of ${\llike}(\theta)$.<br /> &lt;/blockquote&gt;<br /> }}<br /> <br /> {{Back&amp;Next<br /> |linkNext=Estimation of the log-likelihood<br /> |linkBack=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div> Admin https://wiki.inria.fr/popix/The_SAEM_algorithm_for_estimating_population_parameters The SAEM algorithm for estimating population parameters 2013-04-29T13:26:26Z <p>Brocco: </p> <hr /> <div>==Introduction ==<br /> <br /> <br /> The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has been shown to be a very powerful NLMEM tool, known to accurately estimate population parameters as well as having good theoretical properties. In fact, it converges to the MLE under very general hypotheses.<br /> <br /> SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab statistics toolbox as the function {{Verbatim|nlmefitsa.m}}.<br /> <br /> Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. 
By definition, the maximum likelihood estimator of $\theta$ maximizes<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> SAEM is an iterative algorithm that essentially consists of constructing $N$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_1^{(k)})$, ..., $(\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$. We will present a general description of the algorithm highlighting the connection with the EM algorithm, and show, by way of a simple example, how to implement SAEM and use it in practice.<br /> <br /> We will also give some extensions of the base algorithm that allow us to improve the convergence properties of the algorithm. For instance, it is possible to stabilize the algorithm's convergence by using several [http://en.wikipedia.org/wiki/Markov_chain Markov chains] per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to local maxima.<br /> <br /> <br /> &lt;br&gt;<br /> ==The EM algorithm==<br /> <br /> <br /> We first remark that if the individual parameters $\bpsi=(\psi_i)$ are observed, estimation poses no particular problem, since an estimator can be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br /> <br /> However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation.
Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br /> <br /> <br /> * $\textbf{E-step:}$ evaluate the quantity<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{M-step:}$ update the estimate of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a stationary point of the observed likelihood under mild regularity conditions.<br /> <br /> Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in closed form, it can be approximated by simulation.
For instance,<br /> <br /> <br /> * The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br /> <br /> * The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==The SAEM algorithm==<br /> <br /> At iteration $k$ of SAEM:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br /> &lt;/math&gt; }}<br /> <br /> where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $\sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 &lt; \infty$.<br /> <br /> <br /> * $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .&lt;/math&gt; }}<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text= &amp;#32;<br /> * Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . 
&lt;/math&gt; }}<br /> <br /> : This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br /> <br /> <br /> * When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $\psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)}$ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br /> &lt;/math&gt; }}<br /> <br /> : By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br /> }}<br /> <br /> <br /> Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , &lt;/math&gt; }}<br /> <br /> where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$.
Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_stat&quot;&gt;&lt;math&gt;<br /> \tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> The approximation step of SAEM simplifies to a general Robbins-Monro-type scheme for approximating this conditional expectation:<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . &lt;/math&gt; }}<br /> <br /> <br /> Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br /> <br /> Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br /> <br /> Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear models or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. 
A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition kernel whose unique invariant distribution is the conditional distribution we want to simulate.<br /> <br /> In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the Metropolis-Hastings (MH) algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] Monte Carlo procedure.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= Convergence of the [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Implementing SAEM ==<br /> <br /> Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br /> <br /> &lt;br&gt;<br /> ===SAEM for general hierarchical models===<br /> <br /> Consider first a very general model for any type (continuous, categorical, survival, etc.)
of data $(y_i)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} y_i {{!}} \psi_i &amp;\sim&amp; \pcyipsii(y_i {{!}} \psi_i) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br /> <br /> We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br /> $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{S}_1(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) \\<br /> \tilde{S}_2(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> At iteration $k$ of SAEM, we have:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> s_{k,1} &amp;=&amp; s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br /> s_{k,2} &amp;=&amp; s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_{k} &amp;=&amp; 
\frac{1}{N} s_{k,1} \\<br /> \Omega_k &amp;=&amp; \frac{1}{N}\left( s_{k,2} - s_{k,1}s_{k,1}^\transpose \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===SAEM for continuous data models===<br /> Consider now a continuous data model in which the residual error variance is constant:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij},\phi_i) + a \teps_{ij} \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ is positive-definite, then, writing $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br /> &lt;/math&gt; }}<br /> <br /> The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function of both $\by$ and $\bpsi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2.
&lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> s_{k,3} &amp;=&amp; s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi) - s_{k-1,3} ) \\<br /> a_k^2 &amp;=&amp; \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br /> Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br /> almost sure convergence of the algorithm to the maximum likelihood estimator.<br /> <br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Consider a simple model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br /> \log(A_i)&amp;\sim&amp;{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br /> \log(k_i)&amp;\sim&amp;{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br /> Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br /> <br /> <br /> 1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the &quot;solution&quot;. 
The sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br /> <br /> [[File:saem1.png|link=]]<br /> <br /> <br /> 2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br /> <br /> [[File:saem2.png|link=]]<br /> <br /> <br /> 3. $\gamma_k = 1$ for $k=1,\ldots,40$ and $\gamma_k = 1/(k-40)$ for $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br /> <br /> [[File:saem3.png|link=]]<br /> <br /> <br /> 4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in case 1, but the standard deviation is reduced by a factor $\sqrt{10}$; in this case, SAEM behaves like EM. <br /> <br /> [[File:saem4.png|link=]]<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==A simple example to understand why SAEM converges in practice==<br /> <br /> <br /> Let us look at a very simple Gaussian model, with only one observation per individual:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi_i &amp;\sim&amp; {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br /> y_i &amp;\sim&amp; {\cal N}(\psi_i,\sigma^2).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br /> <br /> Here, the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br /> &lt;/math&gt;}}<br /> <br /> We now propose to try and compute $\hat{\theta}$ using SAEM instead. 
The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br /> &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a &amp;= &amp; \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br /> \gamma^2 &amp;= &amp;\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. &lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{\theta}({\cal S(\bpsi)} ) &amp;=&amp; \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \frac{ {\cal S}(\bpsi)}{N}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2).$<br /> <br /> * Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br /> <br /> <br /> It can be shown that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br /> &lt;/math&gt; }}<br /> <br /> where $e_k \sim {\cal N}(0, \gamma^2 /N)$. 
Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br /> <br /> <br /> Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2)$<br /> <br /> * Maximization step:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br /> &lt;/math&gt; }}<br /> <br /> <br /> : Here, we can show that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = \displaystyle{ \frac{k-a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br /> &lt;/math&gt; }}<br /> <br /> : where $e_k \sim {\cal N}(0, \gamma^2 /N)$. 
Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br /> <br /> <br /> {{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br /> <br /> <br /> Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ quickly reaches a neighborhood of $\hat{\theta}$ during the first $K_1$ iterations, behaving like a stationary Markov chain that fluctuates around $\hat{\theta}$, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br /> <br /> <br /> {{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br /> <br /> <br /> {{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br /> <br /> &lt;!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --&gt;<br /> <br /> ==A simulated annealing version of SAEM==<br /> <br /> <br /> Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. 
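In the unimodal toy example above there is no such difficulty, and the two-phase step-size strategy can be checked numerically. The following is a minimal Python sketch (not $\monolix$ code; the sample size, variances and iteration counts are arbitrary illustrative choices) that runs SAEM on the toy Gaussian model with $\gamma_k=1$ for the first $K_1$ iterations and $\gamma_k=1/(k-K_1)$ afterwards:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: psi_i ~ N(theta, omega^2), y_i ~ N(psi_i, sigma^2),
# with omega^2 and sigma^2 known (all numerical values are illustrative).
N, theta_true, omega2, sigma2 = 1000, 3.0, 1.0, 1.0
y = rng.normal(rng.normal(theta_true, np.sqrt(omega2), N), np.sqrt(sigma2))

theta_hat = y.mean()  # explicit MLE, since y_i ~ N(theta, omega^2 + sigma^2)

# Exact conditional: psi_i | y_i ~ N(a*theta + (1-a)*y_i, gamma^2)
a = (1 / omega2) / (1 / sigma2 + 1 / omega2)
gamma2 = 1 / (1 / sigma2 + 1 / omega2)

def saem(theta0, K1=20, K2=200):
    """SAEM with gamma_k = 1 for k <= K1, then gamma_k = 1/(k - K1)."""
    theta = theta0
    for k in range(1, K1 + K2 + 1):
        gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)
        # simulation step: draw psi^(k) from the exact conditional distribution
        psi_k = rng.normal(a * theta + (1 - a) * y, np.sqrt(gamma2))
        # stochastic approximation / maximization step
        theta += gamma_k * (psi_k.mean() - theta)
    return theta

print(theta_hat, saem(theta0=-5.0))  # both should be close to theta_true
```

Whatever the initial guess $\theta_0$, the constant-step phase brings $\theta_k$ into a neighborhood of $\hat{\theta}$ and the decreasing step sizes then freeze it there; with several local maxima this behavior is no longer guaranteed, which motivates the simulated annealing variant.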
A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br /> <br /> To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any &quot;temperature&quot; $T&gt;0$, we consider the complete model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br /> &lt;/math&gt; }}<br /> <br /> where $C_T(\theta)$ is still a normalizing constant.<br /> <br /> We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). 
The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then decrease at an exponential rate to 1: $T_k = \max(1, \tau \ T_{k-1})$.<br /> <br /> Consider for example the following model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}(\mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, $\theta = (\mu,\Omega,a^2)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br /> &lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br /> <br /> <br /> We see that $\pmacro_T(\by,\bpsi;\theta)$ also defines a normal model, in which the residual error variance $a^2$ is replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects by $T\Omega$.<br /> In other words, a model with a &quot;large temperature&quot; is a model with large variances.<br /> <br /> The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (that include the initial temperature $T_0$ implicitly) and setting $a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}))$ and $\Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}))$ during the first iterations. 
Here, $0\leq\tau\leq 1$.<br /> <br /> These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to &quot;escape&quot; from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br /> After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1&lt;1$ and $\tau_2&gt;1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=A PK example<br /> |text= <br /> <br /> Consider a simple one-compartment model for oral administration:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_sa&quot;&gt;&lt;math&gt;<br /> f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> We then simulate PK data from 80 patients using the following population PK parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .&lt;/math&gt; }}<br /> <br /> We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{ka} = ke, \quad \tilde{V}=V \times
ke/ka, \quad \tilde{ke}=ka . &lt;/math&gt; }}<br /> <br /> We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br /> <br /> The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br /> <br /> :{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br /> <br /> Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br /> <br /> :{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{allassonniere2010construction,<br /> title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br /> author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br /> journal={Bernoulli},<br /> volume={16},<br /> number={3},<br /> pages={641--678},<br /> year={2010},<br /> publisher={Bernoulli Society for Mathematical Statistics and Probability}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. 
and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> volume={56},<br /> pages={2073-2085}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2013sde,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and its interfaces},<br /> year={2013},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delyon1999convergence,<br /> title={Convergence of a stochastic approximation version of the EM algorithm},<br /> author={Delyon, B. and Lavielle, M. and Moulines, E.},<br /> journal={Annals of Statistics},<br /> pages={94-128},<br /> year={1999},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{dempster1977maximum,<br /> title={Maximum likelihood from incomplete data via the EM algorithm},<br /> author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br /> journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br /> pages={1-38},<br /> year={1977},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{kuhn2004coupling,<br /> title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br /> author={Kuhn, E. and Lavielle, M.},<br /> journal={ESAIM: Probability and Statistics},<br /> volume={8},<br /> pages={115-131},<br /> year={2004},<br /> publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lavielle2013improved,<br /> title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br /> author={Lavielle, M. 
and Mbogning, C.},<br /> journal={Statistics and Computing},<br /> year={2013},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mclachlan2007algorithm,<br /> title={The EM algorithm and extensions},<br /> author={McLachlan, G.J. and Krishnan, T.},<br /> volume={382},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{samson2006extension,<br /> title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br /> author={Samson, A. and Lavielle, M. and Mentr&amp;eacute;, F.},<br /> journal={Computational statistics &amp; data analysis},<br /> volume={51},<br /> number={3},<br /> pages={1562-1574},<br /> year={2006},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wei1990monte,<br /> title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br /> author={Wei, G. 
and Tanner, M.},<br /> journal={Journal of the American Statistical Association},<br /> volume={85},<br /> number={411},<br /> pages={699-704},<br /> year={1990},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu1983convergence,<br /> title={On the convergence properties of the EM algorithm},<br /> author={Wu, C.F.},<br /> journal={The Annals of Statistics},<br /> volume={11},<br /> number={1},<br /> pages={95-103},<br /> year={1983},<br /> publisher={Institute of Mathematical Statistics}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Introduction and notation<br /> |linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div> Admin https://wiki.inria.fr/popix/The_Metropolis-Hastings_algorithm_for_simulating_the_individual_parameters The Metropolis-Hastings algorithm for simulating the individual parameters 2013-04-29T12:14:35Z <p>Admin: </p> <hr /> <div>We consider a joint model for the observations and individual parameters of individual $i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pyipsii(y_i,\psi_i) = \pcyipsii(y_i {{!}} \psi_i) \ppsii(\psi_i) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qcyipsii$ is the conditional distribution of the observations of individual $i$ (see [[Modeling the observations]]) &lt;!--section in Chapter [[Models]])--&gt; and $\ppsii(\psi_i)$ the distribution of the individual parameters of individual $i$ (see [[Modeling the individual parameters]]). <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= This distribution depends on a vector of population parameters $\theta$ and possibly covariates $c_i$, regression variables $x_i$ and inputs $u_i$. We suppose that all of these components of the model are given, so it is not necessary to explicitly write them each time. 
}}<br /> <br /> <br /> Our goal here is to generate values from the conditional distribution $\qcpsiiyi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pcpsiiyi(\psi_i{{!}}y_i) = \displaystyle{ \frac{\pyipsii(y_i,\psi_i)}{\pyi(y_i)} } .<br /> &lt;/math&gt; }}<br /> <br /> This distribution cannot usually be computed in closed-form when the model is not a linear Gaussian one. However, we will see that the Metropolis-Hastings (MH) algorithm allows us to draw a sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ which converges (in distribution) to the target distribution $\qcpsiiyi$.<br /> <br /> We will consider a very general model for the individual parameters:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:mh1&quot;&gt;&lt;math&gt;<br /> \psi_i = M(\beta,c_i,\eta_i) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> where $\beta$ is a vector of fixed effects, $c_i$ a vector of (observed) individual covariates and $\eta_i$ a vector of random effects whose probability distribution is denoted $\qetai$.<br /> <br /> The MH algorithm is used to simulate a sequence of random effects $(\eta_i^{(\imh)}, \imh=1,2,\ldots)$ with the target distribution being the conditional distribution $\qcetaiyi$ of the random effects $\eta_i$. We will then obtain the sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ using [[#eq:mh1|(1)]].<br /> <br /> The MH algorithm is iterative and requires an initial value $\eta_i^{(0)}$. 
Then, at iteration $\imh$, we<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;Draw a new value $\ceta_i^{(\imh)}$ with some ''proposal distribution'' $q_{\imh}(\cdot \, ; \eta_i^{(\imh-1)})$&lt;/li&gt;<br /> <br /> &lt;li&gt; Compute $\cpsi_i^{(\imh)} = M(\beta,c_i,\ceta_i^{(\imh)})$&lt;/li&gt;<br /> <br /> &lt;li&gt;Accept this new value, that is let $\eta_i^{(\imh)}=\ceta_i^{(\imh)}$, with probability&lt;/li&gt;<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) &amp;=&amp; \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qcetaiyi(\ceta_i^{(\imh)}{{!}}y_i)}<br /> {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qcetaiyi(\eta_i^{(\imh-1)}{{!}}y_i) } } \\<br /> &amp;=&amp; \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i {{!}} \cpsi_i^{(\imh)})}<br /> {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i {{!}} \psi_i^{(\imh-1)}) } } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &amp;#32;<br /> * In order to run this algorithm, we need to be able to calculate the transition density $q_{\imh}(\ceta_i;\eta_i)$, the random effects density $\petai(\eta_i)$ (which poses no problem if $\eta_i$ is a Gaussian vector for example), and in particular the conditional distribution $\pyipsii(y_i{{!}}\psi_i)$. This is why this calculation is explicitly performed in the various examples provided in [[Modeling the observations]]. 
<br /> <br /> <br /> * We denote $q_{\imh}$ the proposal distribution used at iteration $\imh$ of the algorithm because different proposals can be used at different iterations.<br /> }}<br /> <br /> <br /> Several proposal distributions are used in $\monolix$:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} )<br /> = \displaystyle{\frac{ \qcyipsii(y_i {{!}} \cpsi_i^{(\imh)})}{\qcyipsii(y_i {{!}} \psi_i^{(\imh-1)}) } } .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$ is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} , &lt;/math&gt; }}<br /> <br /> where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is calibrated in order to reach an optimal acceptance rate $\alpha^\star$ ($\monolix$ uses $\alpha^\star = 0.3$ as default). 
Here, the transition kernel is symmetrical and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) =<br /> \displaystyle{\frac{ \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i {{!}} \cpsi_i^{(\imh)})}<br /> {\qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i {{!}} \psi_i^{(\imh-1)}) } }.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; $q^{(3,\Meta)}$, for $\Meta \subset \{1,2,\ldots,d\}$ is the multidimensional Gaussian random walk for the vector $\eta_\Meta = (\eta_\ieta , \ieta\in \Meta)$:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \ceta_{i,\Meta}^{(\imh)} = \eta_{i,\Meta}^{(\imh-1)} + \xi_{i,\Meta}^{(\imh)} , &lt;/math&gt; }}<br /> <br /> where $\xi_{i,\Meta}^{(\imh)}=(\xi_{i,\ieta}^{(\imh)}, \ieta\in \Meta)$ is a Gaussian vector with diagonal variance matrix $\Upsilon_\Meta^{(\imh)}$. Here as well, the variance $\Upsilon_\Meta^{(\imh)}$ of this random walk is adjusted in order to reach the optimal acceptance rate $\alpha^\star$. Different subsets $\Meta$ are chosen at each iteration.<br /> &lt;/ol&gt;<br /> <br /> <br /> The MH algorithm then consists in successively using these different proposals for $i=1,2,\ldots , N$.<br /> <br /> The variances $(\upsilon_\ieta,\, \ieta=1,2,\ldots, d)$ for proposal $q^{(2,\ieta)}$ are updated at iteration $\imh$ as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\upsilon_\ieta^{(\imh)}=\upsilon_\ieta^{(\imh-1)}(1+\delta(\overline{\alpha}_\ieta^{(\imh-1)}-\alpha^\star)),<br /> &lt;/math&gt; }}<br /> <br /> where $0&lt;\delta&lt;1$ is a constant and $\overline{\alpha}_\ieta^{(\imh)}$ the empirical acceptance rate at iteration $\imh$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \overline{\alpha}_\ieta^{(\imh)} = \frac{1}{N} \sum_{i=1}^N \one_{\eta_i^{(\imh)}=\ceta_i^{(\imh)} } .<br /> &lt;/math&gt; }}<br /> <br /> Indeed, too small (resp. large) a variance for the random walk leads to too large (resp. small) an acceptance probability. 
The strategy proposed here therefore allows us to adaptively correct the variance for a given acceptance probability $\alpha^\star$. A small value of $\delta$ allows us to smooth out the sequence $(\upsilon_\ieta^{(\imh)})$. $\monolix$ uses $\delta = 0.4$ as a default value.<br /> <br /> We can use the same strategy for updating the diagonal variance matrices $\Upsilon_\Meta,\ \Meta \subset \{1,2,\ldots,d\}$ for the kernel $q^{(3,\Meta)}$.<br /> <br /> The simulated sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ can then be used for empirically estimating the conditional distribution $\qcpsiiyi$ and the conditional mean $\esp{F(\psi_i)| y_i}$ of any function $F$ such that $\esp{F^2(\psi_i)| y_i}&lt;+\infty$. The accuracy of the estimation obviously depends on the length $K$ of the sequence $(\psi_i^{(\imh)})$ used for the estimation, since the variance of the estimator only decreases as $1/K$. Estimation is also improved if the sequence starts from a point which is representative of the equilibrium distribution (burn-in is one method of finding a good starting point) or from a point known to have reasonably high probability (the conditional mode or the last value obtained with [[The SAEM algorithm for estimating population parameters|SAEM]], for instance).<br /> <br /> {{Back&amp;Next<br /> |linkBack=The SAEM algorithm for estimating population parameters<br /> |linkNext=Estimation of the observed Fisher information matrix }}</div> Admin https://wiki.inria.fr/popix/Methods Methods 2013-04-29T11:17:44Z <p>Admin: </p> <hr /> <div>&lt;div style=&quot;font-size:12pt; font-family:garamond&quot;&gt;<br /> &lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Methods]]<br /> *[[Methods| Introduction ]] | [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm ]] | [[The SAEM algorithm for estimating population parameters|The SAEM algorithm]] | [[Estimation of the observed Fisher information matrix|Observed Fisher 
information matrix]] |[[Estimation of the log-likelihood via importance sampling| Log-likelihood via importance sampling]]<br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> TO DO <br /> <br /> &lt;/div&gt;</div> Admin https://wiki.inria.fr/popix/TestMarc6 TestMarc6 2013-04-26T11:31:20Z <p>Admin: Page créée avec « &lt;div style=&quot;font-size:12pt;font-family:Helvetica&quot;&gt; &lt;!-- some LaTeX macros we want to use: --&gt; $\newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } \... »</p> <hr /> <div>&lt;div style=&quot;font-size:12pt;font-family:Helvetica&quot;&gt;<br /> &lt;!-- some LaTeX macros we want to use: --&gt;<br />$<br /> \newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } <br /> \newcommand{\nominal}{#1^{\star}}<br /> \newcommand{\psis}{\psi{^\star}}<br /> \newcommand{\phis}{\phi{^\star}}<br /> \newcommand{\hpsi}{\hat{\psi}}<br /> \newcommand{\hphi}{\hat{\phi}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\limite}{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}}<br /> \newcommand{\DDt}{\partial^2_\theta #1}<br /> <br /> \def\bu{\boldsymbol{u}}<br /> \def\bt{\boldsymbol{t}}<br /> \def\bT{\boldsymbol{T}}<br /> \def\by{\boldsymbol{y}}<br /> \def\bx{\boldsymbol{x}}<br /> \def\bc{\boldsymbol{c}}<br /> \def\bw{\boldsymbol{w}}<br /> \def\bz{\boldsymbol{z}}<br /> \def\bpsi{\boldsymbol{\psi}}<br /> \def\bbeta{\beta}<br /> <br /> \def\aref{a^\star}<br /> \def\kref{k^\star}<br /> \def\model{M}<br /> \def\hmodel{m}<br /> \def\mmodel{\mu}<br /> \def\imodel{H}<br /> \def\like{\cal L}<br /> \def\thmle{\hat{\theta}}<br /> \def\ofim{I^{\rm obs}}<br /> \def\efim{I^{\star}}<br /> <br /> \def\Imax{\rm Imax}<br /> \def\probit{\rm probit}<br /> \def\vt{t}<br /> \def\id{\rm Id}<br /> \def\teta{\tilde{\eta}}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \newcommand{\deriv}{\frac{d}{dt}#1(t)}<br /> <br /> \newcommand{\pred}{\tilde{#1}}<br /> 
\def\phis{\phi{^\star}}<br /> \def\hphi{\tilde{\phi}}<br /> \def\hw{\tilde{w}}<br /> \def\hpsi{\tilde{\psi}}<br /> \def\hatpsi{\hat{\psi}}<br /> \def\hatphi{\hat{\phi}}<br /> \def\psis{\psi{^\star}}<br /> \def\transy{u}<br /> \def\psipop{\psi_{\rm pop}}<br /> \newcommand{\psigr}{\hat{\bpsi}_{#1}}<br /> \newcommand{\Vgr}{\hat{V}_{#1}}<br /> <br /> %\def\pmacro{\mathcrm{p}}<br /> %\def\pmacro{\verb!p!}<br /> \def\pmacro{\text{p}}<br /> \def\py{\pmacro}<br /> \def\pt{\pmacro}<br /> \def\pc{\pmacro}<br /> \def\pu{\pmacro}<br /> \def\pyi{\pmacro}<br /> \def\pyj{\pmacro}<br /> \def\ppsi{\pmacro}<br /> \def\ppsii{\pmacro}<br /> \def\pcpsith{\pmacro}<br /> \def\pth{\pmacro}<br /> \def\pypsi{\pmacro}<br /> \def\pcypsi{\pmacro}<br /> \def\ppsic{\pmacro}<br /> \def\pcpsic{\pmacro}<br /> \def\pypsic{\pmacro}<br /> \def\pypsit{\pmacro}<br /> \def\pcypsit{\pmacro}<br /> \def\pypsiu{\pmacro}<br /> \def\pcypsiu{\pmacro}<br /> \def\pypsith{\pmacro}<br /> \def\pypsithcut{\pmacro}<br /> \def\pypsithc{\pmacro}<br /> \def\pcypsiut{\pmacro}<br /> \def\pcpsithc{\pmacro}<br /> \def\pcthy{\pmacro}<br /> \def\pyth{\pmacro}<br /> \def\pcpsiy{\pmacro}<br /> \def\pz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcwz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcyipsii{\pmacro}<br /> \def\pyipsii{\pmacro}<br /> \def\pypsiij{\pmacro}<br /> \def\pyipsiONE{\pmacro}<br /> \def\ptypsiij{\pmacro}<br /> \def\pcyzipsii{\pmacro}<br /> \def\pczipsii{\pmacro}<br /> \def\pcyizpsii{\pmacro}<br /> \def\pcyijzpsii{\pmacro}<br /> \def\pcyiONEzpsii{\pmacro}<br /> \def\pcypsiz{\pmacro}<br /> \def\pccypsiz{\pmacro}<br /> \def\pypsiz{\pmacro}<br /> \def\pcpsiz{\pmacro}<br /> \def\peps{\pmacro}<br /> <br /> \def\psig{\psi}<br /> \def\psigprime{\psig^{\prime}}<br /> \def\psigiprime{\psig_i^{\prime}}<br /> \def\psigk{\psig^{(k)}}<br /> \def\psigki{\psig_i^{(k)}}<br /> \def\psigkun{\psig^{(k+1)}}<br /> \def\psigkuni{\psig_i^{(k+1)}}<br /> \def\psigi{\psig_i}<br /> \def\psigil{\psig_{i,\ell}}<br /> \def\phig{\phi}<br /> 
\def\phigi{\phig_i}<br /> \def\phigil{\phig_{i,\ell}}<br /> <br /> <br /> \def\etagi{\eta_i}<br /> \def\IIV{\Omega}<br /> \def\thetag{\theta}<br /> \def\thetagk{\theta_k}<br /> \def\thetagkun{\theta_{k+1}}<br /> \def\thetagkunm{\theta_{k-1}}<br /> \def\sgk{s_{k}}<br /> \def\sgkun{s_{k+1}}<br /> \def\yg{y}<br /> \def\xg{x}<br /> <br /> \def\qx{p_x}<br /> \def\qy{p_y}<br /> \def\qt{p_t}<br /> \def\qc{p_c}<br /> \def\qu{p_u}<br /> \def\qyi{p_{y_i}}<br /> \def\qyj{p_{y_j}}<br /> \def\qpsi{p_{\psi}}<br /> \def\qpsii{p_{\psi_i}}<br /> \def\qcpsith{p_{\psi|\theta}}<br /> \def\qth{p_{\theta}}<br /> \def\qypsi{p_{y,\psi}}<br /> \def\qcypsi{p_{y|\psi}}<br /> \def\qpsic{p_{\psi,c}}<br /> \def\qcpsic{p_{\psi|c}}<br /> \def\qypsic{p_{y,\psi,c}}<br /> \def\qypsit{p_{y,\psi,t}}<br /> \def\qcypsit{p_{y|\psi,t}}<br /> \def\qypsiu{p_{y,\psi,u}}<br /> \def\qcypsiu{p_{y|\psi,u}}<br /> \def\qypsith{p_{y,\psi,\theta}}<br /> \def\qypsithcut{p_{y,\psi,\theta,c,u,t}}<br /> \def\qypsithc{p_{y,\psi,\theta,c}}<br /> \def\qcypsiut{p_{y|\psi,u,t}}<br /> \def\qcpsithc{p_{\psi|\theta,c}}<br /> \def\qcthy{p_{\theta | y}}<br /> \def\qyth{p_{y,\theta}}<br /> \def\qcpsiy{p_{\psi|y}}<br /> \def\qz{p_z}<br /> \def\qw{p_w}<br /> \def\qcwz{p_{w|z}}<br /> \def\qw{p_w}<br /> \def\qcyipsii{p_{y_i|\psi_i}}<br /> \def\qyipsii{p_{y_i,\psi_i}}<br /> \def\qypsiij{p_{y_{ij}|\psi_{i}}}<br /> \def\qyipsi1{p_{y_{i1}|\psi_{i}}}<br /> \def\qtypsiij{p_{\transy(y_{ij})|\psi_{i}}}<br /> \def\qcyzipsii{p_{z_i,y_i|\psi_i}}<br /> \def\qczipsii{p_{z_i|\psi_i}}<br /> \def\qcyizpsii{p_{y_i|z_i,\psi_i}}<br /> \def\qcyijzpsii{p_{y_{ij}|z_{ij},\psi_i}}<br /> \def\qcyi1zpsii{p_{y_{i1}|z_{i1},\psi_i}}<br /> \def\qcypsiz{p_{y,\psi|z}}<br /> \def\qccypsiz{p_{y|\psi,z}}<br /> \def\qypsiz{p_{y,\psi,z}}<br /> \def\qcpsiz{p_{\psi|z}}<br /> \def\qeps{p_{\teps}}<br /> <br /> \def\neta{n_\eta}<br /> \def\ncov{M}<br /> \def\npsi{n_\psig}<br /> <br /> \def\beeta{\eta}<br /> <br /> \def\logit{\rm logit}<br /> \def\transy{u}<br /> \def\so{O}<br 
/> <br /> \newcommand{\prob}{ \mathbb{P}\left(#1\right)}<br /> \newcommand{\probs}{ \mathbb{P}_{#1}\left(#2\right)}<br /> \newcommand{\esp}{\mathbb{E}\left(#1\right)}<br /> \newcommand{\esps}{\mathbb{E}_{#1}\left(#2\right)}<br /> \newcommand{\var}{\mbox{Var}\left(#1\right)}<br /> \newcommand{\vars}{\mbox{Var}_{#1}\left(#2\right)}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\stds}{\mbox{sd}_{#1}\left(#2\right)}<br /> \newcommand{\corr}{\mbox{Corr}\left(#1\right)}<br /> \newcommand{\Rset}{\mbox{$\mathbb{R}$}}<br /> \newcommand{\Yr}{\mbox{$\mathcal{Y}$}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\like}{\cal L}<br /> \newcommand{\logit}{\rm logit}<br /> \newcommand{\transy}{u}<br /> \newcommand{\repy}{y^{(r)}}<br /> \newcommand{\brepy}{\boldsymbol{y}^{(r)}}<br /> \newcommand{\vari}{#1_{#2}^{{#3}}}<br /> \newcommand{\dA}{\dot{#1}_{#2}(t)}<br /> \newcommand{\nitc}{N}<br /> \newcommand{\itc}{I}<br /> \newcommand{\vl}{V}<br /> \newcommand{tstart}{t_{start}}<br /> \newcommand{tstop}{t_{stop}}<br /> \newcommand{\one}{\mathbb{1}}<br /> \newcommand{\hazard}{h}<br /> \newcommand{\cumhaz}{H}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \def\cpop{c_{\rm pop}}<br /> \def\Vpop{V_{\rm pop}}<br /> \def\iparam{l}<br /> \newcommand{\trcov}{#1}<br /> <br /> \def\mlxtran{\mathbb{MLXtran} }<br /> \def\monolix{\Bbb{Monolix}}<br /> $<br /> <br /> ==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. 
The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution is decomposed into a product of conditional distributions.<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineTextL<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of ''observations'' obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or pdf) of $y$. 
If we assume a ''parametric model'', then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasma concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . 
&lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are independent and normally distributed with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1.4)]]. <br /> }}<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. 
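The two levels of this hierarchical model can be simulated in sequence: first draw each individual's parameters $\psi_i=(V_i,k_i)$ from a population distribution, then draw the observations $y_{ij}$ given those parameters, as in the bolus example above. A minimal sketch in Python/NumPy; the log-normal population model and all numerical values are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- assumed, illustrative population-level values ---
V_pop, k_pop = 10.0, 0.2        # typical volume (L) and elimination rate (1/h)
omega_V, omega_k = 0.3, 0.25    # standard deviations of the random effects
a = 0.5                         # residual error standard deviation
dose = 500.0                    # IV bolus dose (mg), as in the example
t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])  # measurement times (h)
N = 3                           # number of individuals

def f(t, V, k):
    """Structural PK model: concentration after an IV bolus."""
    return dose / V * np.exp(-k * t)

# Level 1: draw individual parameters psi_i = (V_i, k_i) from the
# population model (here assumed log-normal around the typical values).
V_i = V_pop * np.exp(omega_V * rng.standard_normal(N))
k_i = k_pop * np.exp(omega_k * rng.standard_normal(N))

# Level 2: draw observations y_ij ~ N(f(t_ij; V_i, k_i), a^2).
y = f(t, V_i[:, None], k_i[:, None]) + a * rng.standard_normal((N, t.size))

print(y.shape)  # (3, 5): one row of observations per individual
```

If some variables are fixed rather than random (e.g., a known design $\bt$), the corresponding sampling step is simply skipped, as described in the simulation section.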
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineTextL<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}</div> Admin https://wiki.inria.fr/popix/TestMarc5 TestMarc5 2013-04-26T11:30:46Z <p>Admin: </p> <hr /> <div>[test page: duplicate of the entry above]</div> Admin https://wiki.inria.fr/popix/TestMarc4 TestMarc4 2013-04-26T11:21:51Z <p>Admin: Page created with « &lt;div style=&quot;font-size:12pt;font-family:DejaVu Sans&quot;&gt; &lt;!-- some LaTeX macros we want to use: --&gt; $ \newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } ... 
»</p> <hr /> <div>[test page: duplicate of the entry above]</div> Admin https://wiki.inria.fr/popix/TestMarc3 TestMarc3 2013-04-26T11:20:54Z <p>Admin: Page created with « &lt;div style=&quot;font-size:12pt;font-family:Calibri&quot;&gt; &lt;!-- some LaTeX macros we want to use: --&gt; $\newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } \ne... 
»</p> <hr /> <div>&lt;div style=&quot;font-size:12pt;font-family:Calibri&quot;&gt;<br /> &lt;!-- some LaTeX macros we want to use: --&gt;<br />$<br /> \newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } <br /> \newcommand{\nominal}{#1^{\star}}<br /> \newcommand{\psis}{\psi{^\star}}<br /> \newcommand{\phis}{\phi{^\star}}<br /> \newcommand{\hpsi}{\hat{\psi}}<br /> \newcommand{\hphi}{\hat{\phi}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\limite}{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}}<br /> \newcommand{\DDt}{\partial^2_\theta #1}<br /> <br /> \def\bu{\boldsymbol{u}}<br /> \def\bt{\boldsymbol{t}}<br /> \def\bT{\boldsymbol{T}}<br /> \def\by{\boldsymbol{y}}<br /> \def\bx{\boldsymbol{x}}<br /> \def\bc{\boldsymbol{c}}<br /> \def\bw{\boldsymbol{w}}<br /> \def\bz{\boldsymbol{z}}<br /> \def\bpsi{\boldsymbol{\psi}}<br /> \def\bbeta{\beta}<br /> <br /> \def\aref{a^\star}<br /> \def\kref{k^\star}<br /> \def\model{M}<br /> \def\hmodel{m}<br /> \def\mmodel{\mu}<br /> \def\imodel{H}<br /> \def\like{\cal L}<br /> \def\thmle{\hat{\theta}}<br /> \def\ofim{I^{\rm obs}}<br /> \def\efim{I^{\star}}<br /> <br /> \def\Imax{\rm Imax}<br /> \def\probit{\rm probit}<br /> \def\vt{t}<br /> \def\id{\rm Id}<br /> \def\teta{\tilde{\eta}}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \newcommand{\deriv}{\frac{d}{dt}#1(t)}<br /> <br /> \newcommand{\pred}{\tilde{#1}}<br /> \def\phis{\phi{^\star}}<br /> \def\hphi{\tilde{\phi}}<br /> \def\hw{\tilde{w}}<br /> \def\hpsi{\tilde{\psi}}<br /> \def\hatpsi{\hat{\psi}}<br /> \def\hatphi{\hat{\phi}}<br /> \def\psis{\psi{^\star}}<br /> \def\transy{u}<br /> \def\psipop{\psi_{\rm pop}}<br /> \newcommand{\psigr}{\hat{\bpsi}_{#1}}<br /> \newcommand{\Vgr}{\hat{V}_{#1}}<br /> <br /> %\def\pmacro{\mathcrm{p}}<br /> %\def\pmacro{\verb!p!}<br /> \def\pmacro{\text{p}}<br /> \def\py{\pmacro}<br /> \def\pt{\pmacro}<br /> \def\pc{\pmacro}<br /> \def\pu{\pmacro}<br /> \def\pyi{\pmacro}<br /> 
\def\pyj{\pmacro}<br /> \def\ppsi{\pmacro}<br /> \def\ppsii{\pmacro}<br /> \def\pcpsith{\pmacro}<br /> \def\pth{\pmacro}<br /> \def\pypsi{\pmacro}<br /> \def\pcypsi{\pmacro}<br /> \def\ppsic{\pmacro}<br /> \def\pcpsic{\pmacro}<br /> \def\pypsic{\pmacro}<br /> \def\pypsit{\pmacro}<br /> \def\pcypsit{\pmacro}<br /> \def\pypsiu{\pmacro}<br /> \def\pcypsiu{\pmacro}<br /> \def\pypsith{\pmacro}<br /> \def\pypsithcut{\pmacro}<br /> \def\pypsithc{\pmacro}<br /> \def\pcypsiut{\pmacro}<br /> \def\pcpsithc{\pmacro}<br /> \def\pcthy{\pmacro}<br /> \def\pyth{\pmacro}<br /> \def\pcpsiy{\pmacro}<br /> \def\pz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcwz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcyipsii{\pmacro}<br /> \def\pyipsii{\pmacro}<br /> \def\pypsiij{\pmacro}<br /> \def\pyipsiONE{\pmacro}<br /> \def\ptypsiij{\pmacro}<br /> \def\pcyzipsii{\pmacro}<br /> \def\pczipsii{\pmacro}<br /> \def\pcyizpsii{\pmacro}<br /> \def\pcyijzpsii{\pmacro}<br /> \def\pcyiONEzpsii{\pmacro}<br /> \def\pcypsiz{\pmacro}<br /> \def\pccypsiz{\pmacro}<br /> \def\pypsiz{\pmacro}<br /> \def\pcpsiz{\pmacro}<br /> \def\peps{\pmacro}<br /> <br /> \def\psig{\psi}<br /> \def\psigprime{\psig^{\prime}}<br /> \def\psigiprime{\psig_i^{\prime}}<br /> \def\psigk{\psig^{(k)}}<br /> \def\psigki{\psig_i^{(k)}}<br /> \def\psigkun{\psig^{(k+1)}}<br /> \def\psigkuni{\psig_i^{(k+1)}}<br /> \def\psigi{\psig_i}<br /> \def\psigil{\psig_{i,\ell}}<br /> \def\phig{\phi}<br /> \def\phigi{\phig_i}<br /> \def\phigil{\phig_{i,\ell}}<br /> <br /> <br /> \def\etagi{\eta_i}<br /> \def\IIV{\Omega}<br /> \def\thetag{\theta}<br /> \def\thetagk{\theta_k}<br /> \def\thetagkun{\theta_{k+1}}<br /> \def\thetagkunm{\theta_{k-1}}<br /> \def\sgk{s_{k}}<br /> \def\sgkun{s_{k+1}}<br /> \def\yg{y}<br /> \def\xg{x}<br /> <br /> \def\qx{p_x}<br /> \def\qy{p_y}<br /> \def\qt{p_t}<br /> \def\qc{p_c}<br /> \def\qu{p_u}<br /> \def\qyi{p_{y_i}}<br /> \def\qyj{p_{y_j}}<br /> \def\qpsi{p_{\psi}}<br /> \def\qpsii{p_{\psi_i}}<br /> 
\def\qcpsith{p_{\psi|\theta}}<br /> \def\qth{p_{\theta}}<br /> \def\qypsi{p_{y,\psi}}<br /> \def\qcypsi{p_{y|\psi}}<br /> \def\qpsic{p_{\psi,c}}<br /> \def\qcpsic{p_{\psi|c}}<br /> \def\qypsic{p_{y,\psi,c}}<br /> \def\qypsit{p_{y,\psi,t}}<br /> \def\qcypsit{p_{y|\psi,t}}<br /> \def\qypsiu{p_{y,\psi,u}}<br /> \def\qcypsiu{p_{y|\psi,u}}<br /> \def\qypsith{p_{y,\psi,\theta}}<br /> \def\qypsithcut{p_{y,\psi,\theta,c,u,t}}<br /> \def\qypsithc{p_{y,\psi,\theta,c}}<br /> \def\qcypsiut{p_{y|\psi,u,t}}<br /> \def\qcpsithc{p_{\psi|\theta,c}}<br /> \def\qcthy{p_{\theta | y}}<br /> \def\qyth{p_{y,\theta}}<br /> \def\qcpsiy{p_{\psi|y}}<br /> \def\qz{p_z}<br /> \def\qw{p_w}<br /> \def\qcwz{p_{w|z}}<br /> \def\qw{p_w}<br /> \def\qcyipsii{p_{y_i|\psi_i}}<br /> \def\qyipsii{p_{y_i,\psi_i}}<br /> \def\qypsiij{p_{y_{ij}|\psi_{i}}}<br /> \def\qyipsi1{p_{y_{i1}|\psi_{i}}}<br /> \def\qtypsiij{p_{\transy(y_{ij})|\psi_{i}}}<br /> \def\qcyzipsii{p_{z_i,y_i|\psi_i}}<br /> \def\qczipsii{p_{z_i|\psi_i}}<br /> \def\qcyizpsii{p_{y_i|z_i,\psi_i}}<br /> \def\qcyijzpsii{p_{y_{ij}|z_{ij},\psi_i}}<br /> \def\qcyi1zpsii{p_{y_{i1}|z_{i1},\psi_i}}<br /> \def\qcypsiz{p_{y,\psi|z}}<br /> \def\qccypsiz{p_{y|\psi,z}}<br /> \def\qypsiz{p_{y,\psi,z}}<br /> \def\qcpsiz{p_{\psi|z}}<br /> \def\qeps{p_{\teps}}<br /> <br /> \def\neta{n_\eta}<br /> \def\ncov{M}<br /> \def\npsi{n_\psig}<br /> <br /> \def\beeta{\eta}<br /> <br /> \def\logit{\rm logit}<br /> \def\transy{u}<br /> \def\so{O}<br /> <br /> \newcommand{\prob}{ \mathbb{P}\left(#1\right)}<br /> \newcommand{\probs}{ \mathbb{P}_{#1}\left(#2\right)}<br /> \newcommand{\esp}{\mathbb{E}\left(#1\right)}<br /> \newcommand{\esps}{\mathbb{E}_{#1}\left(#2\right)}<br /> \newcommand{\var}{\mbox{Var}\left(#1\right)}<br /> \newcommand{\vars}{\mbox{Var}_{#1}\left(#2\right)}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\stds}{\mbox{sd}_{#1}\left(#2\right)}<br /> \newcommand{\corr}{\mbox{Corr}\left(#1\right)}<br /> 
\newcommand{\Rset}{\mbox{$\mathbb{R}$}}<br /> \newcommand{\Yr}{\mbox{$\mathcal{Y}$}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\like}{\cal L}<br /> \newcommand{\logit}{\rm logit}<br /> \newcommand{\transy}{u}<br /> \newcommand{\repy}{y^{(r)}}<br /> \newcommand{\brepy}{\boldsymbol{y}^{(r)}}<br /> \newcommand{\vari}{#1_{#2}^{{#3}}}<br /> \newcommand{\dA}{\dot{#1}_{#2}(t)}<br /> \newcommand{\nitc}{N}<br /> \newcommand{\itc}{I}<br /> \newcommand{\vl}{V}<br /> \newcommand{tstart}{t_{start}}<br /> \newcommand{tstop}{t_{stop}}<br /> \newcommand{\one}{\mathbb{1}}<br /> \newcommand{\hazard}{h}<br /> \newcommand{\cumhaz}{H}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \def\cpop{c_{\rm pop}}<br /> \def\Vpop{V_{\rm pop}}<br /> \def\iparam{l}<br /> \newcommand{\trcov}{#1}<br /> <br /> \def\mlxtran{\mathbb{MLXtran} }<br /> \def\monolix{\Bbb{Monolix}}<br /> $<br /> <br /> ==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineTextL<br /> |text= <br /> - A model is a joint probability distribution. 
<br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of ''observations'' obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or pdf) of $y$. If we assume a ''parametric model'', then there exists a vector of parameters $\psi$ that completely defines $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. 
We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where$V$is the volume of distribution and$k$the elimination rate constant. The concentration is measured at times$(t_j, 1\leq j \leq n)$with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$\qyj$is the normal distribution defined in [[#ex_proba1|(1.4)]]. <br /> }}<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to$N$individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual$i$has parameters$\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the$\psi_i$as if they were random vectors. 
As both$\by=(y_i , 1\leq i \leq N)$and$\bpsi=(\psi_i , 1\leq i \leq N)$are random, the model is now a joint distribution:$\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If$\qpsi$is a parametric distribution that depends on a vector$\theta$of ''population parameters'' and a set of ''individual covariates''$\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing$\qpsi(\, \cdot \,;\theta,\bc)$for the pdf of$\bpsi$.<br /> Each$i$has a potentially unique set of times$t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$in the design, and$n_i$can be different for each individual.<br /> <br /> <br /> {{OutlineTextL<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters$\theta$, the individual covariates$\bc=(c_i , 1\leq i \leq N)$and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}</div> Admin https://wiki.inria.fr/popix/TestMarc2 TestMarc2 2013-04-26T11:19:51Z <p>Admin: </p> <hr /> <div>&lt;div style=&quot;font-size:12pt;font-family:Arial&quot;&gt;<br /> <br /> ==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. 
The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of ''observations'' obtained at times $t=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $p_y$ the distribution (or pdf) of $y$. 
If we assume a ''parametric model'', then there exists a vector of parameters $\psi$ that completely defines $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $p_y( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $t$, and write $p_y(\, \cdot \, ; \psi, t)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $p_y(\, \cdot \, ; \psi, t)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $t$.<br /> }}<br /> <br /> {{Example1<br /> |title1=Example:<br /> |title2=<br /> |text=500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . 
&lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; p(y ; \psi, t) = \prod_{j=1}^n p_j(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$p_j$is the normal distribution defined in [[#ex_proba1|(1.4)]]. }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where$V$is the volume of distribution and$k$the elimination rate constant. The concentration is measured at times$(t_j, 1\leq j \leq n)$with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; p(y ; \psi, t) = \prod_{j=1}^n p_j(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$p_j$is the normal distribution defined in [[#ex_proba1|(1.4)]]. }}<br /> <br /> <br /> {{Example2<br /> |title=Example:<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where$V$is the volume of distribution and$k$the elimination rate constant. The concentration is measured at times$(t_j, 1\leq j \leq n)$with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; p(y ; \psi, t) = \prod_{j=1}^n p_j(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $p_j$ is the normal distribution defined in [[#ex_proba1|(1.4)]]. }}<br /> <br /> <br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as if they were random vectors. As both ${\bf y}=(y_i , 1\leq i \leq N)$ and ${\bf \psi}=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $p_{y,\psi}$. 
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> {\mathrm p}({\bf y},{\bf \psi}) = \mathrm{p}({\bf y} {{!}} {\bf \psi}) \, \mathrm{p}({\bf \psi}) .&lt;/math&gt; }}<br /> <br /> If $p_\psi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' ${\bf c}=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $p_\psi(\, \cdot \,;\theta,{\bf c})$ for the pdf of ${\bf \psi}$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\mathrm p}({\bf y} , {\bf \psi}; \theta, {\bf c},{\bf t})= {\mathrm p}({\bf y} {{!}} {\bf \psi};{\bf t}) \, \mathrm{p}({\bf \psi};\theta,{\bf c}) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates ${\bf c}=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :${\bf t}=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Approximating the fraction $\hat{\psi}_k/\widehat{\rm s.e}(\hat{\psi}_k)$ by the normal distribution is a &quot;good&quot; approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\varepsilon_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, where $d_\phi$ is the dimension of $\phi$. 
The quantiles of the normal distribution can then be replaced by those of a Student's $t$-distribution with $(n-d_\phi)$ degrees of freedom.<br /> &lt;!-- %$${\rm CI}(\psi_k) = [\hat{\psi}_k - \widehat{\rm s.e}(\hat{\psi}_k)q((1-\alpha)/2,n-d) , \hat{\psi}_k + \widehat{\rm s.e}(\hat{\psi}_k)q((1+\alpha)/2,n-d)]$$ --&gt;<br /> &lt;!-- %where $q(\alpha,\nu)$ is the quantile of order $\alpha$ of a $t$-distribution with $\nu$ degrees of freedom. --&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> == Examples With Equations/Code/Tables ==<br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{ccc}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \mathbb{P}(y_{ij}=k) &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {{ExampleWithTable1bis<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{ccc}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \mathbb{P}(y_{ij}=k) &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha 
+ beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> {{ExampleWithTable1ter<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{ccc}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \mathbb{P}(y_{ij}=k) &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {{ExampleWithTable_4<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{ccc}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \mathbb{P}(y_{ij}=k) &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;5&quot; <br /> | style=&quot;width:550px;&quot; | <br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> fmin1=function(x,y,t)<br /> {f=predc1(t,x)<br /> g=x<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br
/> }<br /> <br /> fmin2=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=x<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> #--------- MLE --------------------------------<br /> <br /> pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian=TRUE)<br /> psi1=pk.nlm1$estimate<br /> <br /> pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian=TRUE)<br /> psi2=pk.nlm2$estimate<br /> &lt;/pre&gt;<br /> }}<br /> | style=&quot;width:550px;&quot; | <br /> :Here are the parameter estimation results:<br /> <br /> <br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; psi1 =&quot;,psi1,&quot;\n\n&quot;)<br /> psi1 = 0.3240916 6.001204 0.3239337 0.4366948<br /> <br /> &gt; cat(&quot; psi2 =&quot;,psi2,&quot;\n\n&quot;)<br /> psi2 = 3.203111 8.999746 0.229977 0.2555242<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> == Equations == <br /> <br /> <br /> Here are some examples of these various types of data:<br /> <br /> <br /> * Continuous data with a normal distribution:<br /> <br /> {{EquationWithBorder| &lt;math&gt;y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)&lt;/math&gt; }}<br /> <br /> :Here, $\lambda(t_{ij},\psi_i)=\left(f(t_{ij},\psi_i),\,g(t_{ij},\psi_i)\right)$, where $f(t_{ij},\psi_i)$ is the mean and $g(t_{ij},\psi_i)$ the standard deviation of $y_{ij}$.<br /> <br /> <br /> <br /> * Categorical data with a Bernoulli distribution:<br /> <br /> {{EquationWithBorder|&lt;math&gt; y_{ij} \sim {\cal B}\left(\lambda(t_{ij},\psi_i)\right) &lt;/math&gt; }}<br /> <br /> :Here, $\lambda(t_{ij},\psi_i)$ is the probability that $y_{ij}$ takes the value 1.<br /> <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;myRef&quot;&gt;&lt;math&gt;<br /> y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2.1) }}<br /> <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;y_{ij} \sim 
{\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)&lt;/math&gt; }}<br /> <br /> <br /> <br /> {{EquationWithBorder<br /> |equation= &lt;math&gt; {\cal L}(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ = \ \ \prod_{i=1}^{N}{\mathrm p}(\psi_i ; c_i , \theta). &lt;/math&gt; }} <br /> <br /> <br /> {{ImageWithCaption|individual4.png|caption=This is the caption of the figure}}</div> Admin https://wiki.inria.fr/popix/TestMarc1 TestMarc1 2013-04-26T11:16:08Z <p>Admin: </p> <hr /> <div>&lt;div style=&quot;font-size:12pt;font-family:Segoe UI&quot;&gt;<br /> &lt;!-- some LaTeX macros we want to use: --&gt;<br />$<br /> \newcommand{\argmin}{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } <br /> \newcommand{\nominal}{#1^{\star}}<br /> \newcommand{\psis}{\psi{^\star}}<br /> \newcommand{\phis}{\phi{^\star}}<br /> \newcommand{\hpsi}{\hat{\psi}}<br /> \newcommand{\hphi}{\hat{\phi}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\limite}{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}}<br /> \newcommand{\DDt}{\partial^2_\theta #1}<br /> <br /> \def\bu{\boldsymbol{u}}<br /> \def\bt{\boldsymbol{t}}<br /> \def\bT{\boldsymbol{T}}<br /> \def\by{\boldsymbol{y}}<br /> \def\bx{\boldsymbol{x}}<br /> \def\bc{\boldsymbol{c}}<br /> \def\bw{\boldsymbol{w}}<br /> \def\bz{\boldsymbol{z}}<br /> \def\bpsi{\boldsymbol{\psi}}<br /> \def\bbeta{\beta}<br /> <br /> \def\aref{a^\star}<br /> \def\kref{k^\star}<br /> \def\model{M}<br /> \def\hmodel{m}<br /> \def\mmodel{\mu}<br /> \def\imodel{H}<br /> \def\like{\cal L}<br /> \def\thmle{\hat{\theta}}<br /> \def\ofim{I^{\rm obs}}<br /> \def\efim{I^{\star}}<br /> <br /> \def\Imax{\rm Imax}<br /> \def\probit{\rm probit}<br /> \def\vt{t}<br /> \def\id{\rm Id}<br /> \def\teta{\tilde{\eta}}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \newcommand{\deriv}{\frac{d}{dt}#1(t)}<br /> <br /> \newcommand{\pred}{\tilde{#1}}<br /> \def\phis{\phi{^\star}}<br /> \def\hphi{\tilde{\phi}}<br /> \def\hw{\tilde{w}}<br /> 
\def\hpsi{\tilde{\psi}}<br /> \def\hatpsi{\hat{\psi}}<br /> \def\hatphi{\hat{\phi}}<br /> \def\psis{\psi{^\star}}<br /> \def\transy{u}<br /> \def\psipop{\psi_{\rm pop}}<br /> \newcommand{\psigr}{\hat{\bpsi}_{#1}}<br /> \newcommand{\Vgr}{\hat{V}_{#1}}<br /> <br /> %\def\pmacro{\mathcrm{p}}<br /> %\def\pmacro{\verb!p!}<br /> \def\pmacro{\text{p}}<br /> \def\py{\pmacro}<br /> \def\pt{\pmacro}<br /> \def\pc{\pmacro}<br /> \def\pu{\pmacro}<br /> \def\pyi{\pmacro}<br /> \def\pyj{\pmacro}<br /> \def\ppsi{\pmacro}<br /> \def\ppsii{\pmacro}<br /> \def\pcpsith{\pmacro}<br /> \def\pth{\pmacro}<br /> \def\pypsi{\pmacro}<br /> \def\pcypsi{\pmacro}<br /> \def\ppsic{\pmacro}<br /> \def\pcpsic{\pmacro}<br /> \def\pypsic{\pmacro}<br /> \def\pypsit{\pmacro}<br /> \def\pcypsit{\pmacro}<br /> \def\pypsiu{\pmacro}<br /> \def\pcypsiu{\pmacro}<br /> \def\pypsith{\pmacro}<br /> \def\pypsithcut{\pmacro}<br /> \def\pypsithc{\pmacro}<br /> \def\pcypsiut{\pmacro}<br /> \def\pcpsithc{\pmacro}<br /> \def\pcthy{\pmacro}<br /> \def\pyth{\pmacro}<br /> \def\pcpsiy{\pmacro}<br /> \def\pz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcwz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcyipsii{\pmacro}<br /> \def\pyipsii{\pmacro}<br /> \def\pypsiij{\pmacro}<br /> \def\pyipsiONE{\pmacro}<br /> \def\ptypsiij{\pmacro}<br /> \def\pcyzipsii{\pmacro}<br /> \def\pczipsii{\pmacro}<br /> \def\pcyizpsii{\pmacro}<br /> \def\pcyijzpsii{\pmacro}<br /> \def\pcyiONEzpsii{\pmacro}<br /> \def\pcypsiz{\pmacro}<br /> \def\pccypsiz{\pmacro}<br /> \def\pypsiz{\pmacro}<br /> \def\pcpsiz{\pmacro}<br /> \def\peps{\pmacro}<br /> <br /> \def\psig{\psi}<br /> \def\psigprime{\psig^{\prime}}<br /> \def\psigiprime{\psig_i^{\prime}}<br /> \def\psigk{\psig^{(k)}}<br /> \def\psigki{\psig_i^{(k)}}<br /> \def\psigkun{\psig^{(k+1)}}<br /> \def\psigkuni{\psig_i^{(k+1)}}<br /> \def\psigi{\psig_i}<br /> \def\psigil{\psig_{i,\ell}}<br /> \def\phig{\phi}<br /> \def\phigi{\phig_i}<br /> \def\phigil{\phig_{i,\ell}}<br /> <br /> <br /> 
\def\etagi{\eta_i}<br /> \def\IIV{\Omega}<br /> \def\thetag{\theta}<br /> \def\thetagk{\theta_k}<br /> \def\thetagkun{\theta_{k+1}}<br /> \def\thetagkunm{\theta_{k-1}}<br /> \def\sgk{s_{k}}<br /> \def\sgkun{s_{k+1}}<br /> \def\yg{y}<br /> \def\xg{x}<br /> <br /> \def\qx{p_x}<br /> \def\qy{p_y}<br /> \def\qt{p_t}<br /> \def\qc{p_c}<br /> \def\qu{p_u}<br /> \def\qyi{p_{y_i}}<br /> \def\qyj{p_{y_j}}<br /> \def\qpsi{p_{\psi}}<br /> \def\qpsii{p_{\psi_i}}<br /> \def\qcpsith{p_{\psi|\theta}}<br /> \def\qth{p_{\theta}}<br /> \def\qypsi{p_{y,\psi}}<br /> \def\qcypsi{p_{y|\psi}}<br /> \def\qpsic{p_{\psi,c}}<br /> \def\qcpsic{p_{\psi|c}}<br /> \def\qypsic{p_{y,\psi,c}}<br /> \def\qypsit{p_{y,\psi,t}}<br /> \def\qcypsit{p_{y|\psi,t}}<br /> \def\qypsiu{p_{y,\psi,u}}<br /> \def\qcypsiu{p_{y|\psi,u}}<br /> \def\qypsith{p_{y,\psi,\theta}}<br /> \def\qypsithcut{p_{y,\psi,\theta,c,u,t}}<br /> \def\qypsithc{p_{y,\psi,\theta,c}}<br /> \def\qcypsiut{p_{y|\psi,u,t}}<br /> \def\qcpsithc{p_{\psi|\theta,c}}<br /> \def\qcthy{p_{\theta | y}}<br /> \def\qyth{p_{y,\theta}}<br /> \def\qcpsiy{p_{\psi|y}}<br /> \def\qz{p_z}<br /> \def\qw{p_w}<br /> \def\qcwz{p_{w|z}}<br /> \def\qw{p_w}<br /> \def\qcyipsii{p_{y_i|\psi_i}}<br /> \def\qyipsii{p_{y_i,\psi_i}}<br /> \def\qypsiij{p_{y_{ij}|\psi_{i}}}<br /> \def\qyipsi1{p_{y_{i1}|\psi_{i}}}<br /> \def\qtypsiij{p_{\transy(y_{ij})|\psi_{i}}}<br /> \def\qcyzipsii{p_{z_i,y_i|\psi_i}}<br /> \def\qczipsii{p_{z_i|\psi_i}}<br /> \def\qcyizpsii{p_{y_i|z_i,\psi_i}}<br /> \def\qcyijzpsii{p_{y_{ij}|z_{ij},\psi_i}}<br /> \def\qcyi1zpsii{p_{y_{i1}|z_{i1},\psi_i}}<br /> \def\qcypsiz{p_{y,\psi|z}}<br /> \def\qccypsiz{p_{y|\psi,z}}<br /> \def\qypsiz{p_{y,\psi,z}}<br /> \def\qcpsiz{p_{\psi|z}}<br /> \def\qeps{p_{\teps}}<br /> <br /> \def\neta{n_\eta}<br /> \def\ncov{M}<br /> \def\npsi{n_\psig}<br /> <br /> \def\beeta{\eta}<br /> <br /> \def\logit{\rm logit}<br /> \def\transy{u}<br /> \def\so{O}<br /> <br /> \newcommand{\prob}{ \mathbb{P}\left(#1\right)}<br /> 
\newcommand{\probs}{ \mathbb{P}_{#1}\left(#2\right)}<br /> \newcommand{\esp}{\mathbb{E}\left(#1\right)}<br /> \newcommand{\esps}{\mathbb{E}_{#1}\left(#2\right)}<br /> \newcommand{\var}{\mbox{Var}\left(#1\right)}<br /> \newcommand{\vars}{\mbox{Var}_{#1}\left(#2\right)}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\stds}{\mbox{sd}_{#1}\left(#2\right)}<br /> \newcommand{\corr}{\mbox{Corr}\left(#1\right)}<br /> \newcommand{\Rset}{\mbox{$\mathbb{R}$}}<br /> \newcommand{\Yr}{\mbox{$\mathcal{Y}$}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\like}{\cal L}<br /> \newcommand{\logit}{\rm logit}<br /> \newcommand{\transy}{u}<br /> \newcommand{\repy}{y^{(r)}}<br /> \newcommand{\brepy}{\boldsymbol{y}^{(r)}}<br /> \newcommand{\vari}{#1_{#2}^{{#3}}}<br /> \newcommand{\dA}{\dot{#1}_{#2}(t)}<br /> \newcommand{\nitc}{N}<br /> \newcommand{\itc}{I}<br /> \newcommand{\vl}{V}<br /> \newcommand{tstart}{t_{start}}<br /> \newcommand{tstop}{t_{stop}}<br /> \newcommand{\one}{\mathbb{1}}<br /> \newcommand{\hazard}{h}<br /> \newcommand{\cumhaz}{H}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \def\cpop{c_{\rm pop}}<br /> \def\Vpop{V_{\rm pop}}<br /> \def\iparam{l}<br /> \newcommand{\trcov}{#1}<br /> <br /> \def\mlxtran{\mathbb{MLXtran} }<br /> \def\monolix{\Bbb{Monolix}}<br /> $<br /> <br /> ==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. 
The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of ''observations'' obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or pdf) of $y$. If we assume a ''parametric model'', then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables.
Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> {{Example1<br /> |title1=Example:<br /> |title2=<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are independent and normally distributed with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$\qyj$is the normal distribution defined in [[#ex_proba1|(1.4)]]. }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where$V$is the volume of distribution and$k$the elimination rate constant. The concentration is measured at times$(t_j, 1\leq j \leq n)$with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$\qyj$is the normal distribution defined in [[#ex_proba1|(1.4)]]. <br /> }}<br /> <br /> <br /> {{Example2<br /> |title=Example:<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where$V$is the volume of distribution and$k$the elimination rate constant. The concentration is measured at times$(t_j, 1\leq j \leq n)$with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors$(e_j)$are independent and normally distributed with constant variance$a^2$, the observed values$(y_j)$are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters$\psi$is$(V,k,a)$.$V$and$k$are the PK parameters for the structural PK model and$a$the residual error parameter.<br /> As the$y_j$are independent, the joint distribution of$y$is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where$\qyj$is the normal distribution defined in [[#ex_proba1|(1.4)]]. }}<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to$N$individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual$i$has parameters$\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the$\psi_i$as if they were random vectors. As both$\by=(y_i , 1\leq i \leq N)$and$\bpsi=(\psi_i , 1\leq i \leq N)$are random, the model is now a joint distribution:$\qypsi$. 
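Such a hierarchical model can be simulated in two steps: first draw each individual's parameters $\psi_i$ from the population distribution, then draw the observations $y_i$ given $\psi_i$. Below is a minimal Python sketch (the wiki's own examples use R and MLXtran; the log-normal choices, the weight covariate and all numerical values are illustrative assumptions):

```python
import math
import random

def simulate_population(n_id, times, theta, weights, rng):
    """Simulate a hierarchical model: psi_i ~ p(psi|theta,c_i), then y_i ~ p(y|psi_i).

    theta = (V_pop, beta, omega_V, k_pop, omega_k, a); f is the 500 mg bolus
    model f(t;V,k) = 500/V * exp(-k*t) from the example above.
    """
    V_pop, beta, omega_V, k_pop, omega_k, a = theta
    individuals = []
    for i in range(n_id):
        # individual parameters: log-normal around the population values,
        # with body weight w_i as a covariate on V (illustrative choice)
        V_i = math.exp(rng.gauss(math.log(V_pop) + beta * math.log(weights[i] / 70), omega_V))
        k_i = math.exp(rng.gauss(math.log(k_pop), omega_k))
        # observations: y_ij ~ N(f(t_ij; V_i, k_i), a^2)
        y_i = [500 / V_i * math.exp(-k_i * t) + rng.gauss(0, a) for t in times]
        individuals.append({"V": V_i, "k": k_i, "y": y_i})
    return individuals

rng = random.Random(1234)
times = [0.5, 1, 2, 4, 8, 12]                # common design for all individuals
theta = (10.0, 1.0, 0.2, 0.15, 0.3, 0.5)     # illustrative population values
weights = [60, 70, 85, 95]                   # covariates c_i (weight in kg)
sim = simulate_population(4, times, theta, weights, rng)
print(len(sim), len(sim[0]["y"]))
```

Fixing the $\psi_i$ instead of drawing them in the first step recovers the single-individual model of the previous section.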
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each individual $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Approximating the fraction $\hatpsi_k/\widehat{\rm s.e}(\hatpsi_k)$ by the normal distribution is a &quot;good&quot; approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a chi-square distribution with $(n-d_\phi)$ degrees of freedom, where $d_\phi$ is the dimension of $\phi$. 
The quantiles of the normal distribution can then be replaced by those of a Student's$t$-distribution with$(n-d_\phi)$degrees of freedom.<br /> &lt;!-- %$${\rm CI}(\psi_k) = [\hatpsi_k - \widehat{\rm s.e}(\hatpsi_k)q((1-\alpha)/2,n-d) , \hatpsi_k + \widehat{\rm s.e}(\hatpsi_k)q((1+\alpha)/2,n-d)]$$ --&gt;<br /> &lt;!-- %where$q(\alpha,\nu)$is the quantile of order$\alpha$of a$t$-distribution with$\nu$degrees of freedom. --&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> == Examples With Equations/Code/Tables ==<br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {{ExampleWithTable1bis<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> 
DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> {{ExampleWithTable1ter<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {{ExampleWithTable_4<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;5&quot; <br /> | style=&quot;width:550px;&quot; | <br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> fmin1=function(x,y,t)<br /> {f=predc1(t,x)<br /> g=x<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> 
fmin2=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=x<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> #--------- MLE --------------------------------<br /> <br /> pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian=TRUE)<br /> psi1=pk.nlm1$estimate<br /> <br /> pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian=TRUE)<br /> psi2=pk.nlm2$estimate<br /> &lt;/pre&gt;<br /> }}<br /> | style=&quot;width:550px;&quot; | <br /> :Here are the parameter estimation results:<br /> <br /> <br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; psi1 =&quot;,psi1,&quot;\n\n&quot;)<br /> psi1 = 0.3240916 6.001204 0.3239337 0.4366948<br /> <br /> &gt; cat(&quot; psi2 =&quot;,psi2,&quot;\n\n&quot;)<br /> psi2 = 3.203111 8.999746 0.229977 0.2555242<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> == Equations == <br /> <br /> <br /> Here are some examples of these various types of data:<br /> <br /> <br /> * Continuous data with a normal distribution:<br /> <br /> {{EquationWithBorder| &lt;math&gt;y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)&lt;/math&gt; }}<br /> <br /> :Here, $\lambda(t_{ij},\psi_i)=\left(f(t_{ij},\psi_i),\,g(t_{ij},\psi_i)\right)$, where $f(t_{ij},\psi_i)$ is the mean and $g(t_{ij},\psi_i)$ the standard deviation of $y_{ij}$.<br /> <br /> <br /> <br /> * Categorical data with a Bernoulli distribution:<br /> <br /> {{EquationWithBorder|&lt;math&gt; y_{ij} \sim {\cal B}\left(\lambda(t_{ij},\psi_i)\right) &lt;/math&gt; }}<br /> <br /> :Here, $\lambda(t_{ij},\psi_i)$ is the probability that $y_{ij}$ takes the value 1.<br /> <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;myRef&quot;&gt;&lt;math&gt;<br /> y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2.1) }}<br /> <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;y_{ij} \sim {\cal 
N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)&lt;/math&gt; }}<br /> <br /> {{EquationWithBorder|equation=&lt;math&gt; \like(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i ; c_i , \theta). &lt;/math&gt; }} <br /> <br /> <br /> {{EquationWithBorder<br /> |equation= &lt;math&gt; \like(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i ; c_i , \theta). &lt;/math&gt; }} <br /> <br /> <br /> {{EquationWithBorder<br /> |equation= &lt;math&gt; {\like}(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i ; c_i , \theta). &lt;/math&gt; }} <br /> <br /> <br /> {{ImageWithCaption|individual4.png|caption=This is the caption of the figure}}</div> Admin https://wiki.inria.fr/popix/Test_balloons Test balloons 2013-04-23T09:34:39Z <p>Admin: </p> <hr /> <div><br /> <br /> aaaa<br /> <br /> &lt;balloon title=&quot;Hello I am a$4$tooltip!&quot;&gt;&lt;span style=&quot;color:red;font-size:16pt&quot;&gt;Mouse over me!&lt;/span&gt;&lt;/balloon&gt;<br /> <br /> <br /> &lt;balloon title=&quot;Hello I am a Wiki tooltip!&quot; style=&quot;color:blue&quot;&gt;<br /> Text: Mouse over this text!<br /> &lt;/balloon&gt;<br /> <br /> &lt;balloon title=&quot;these values were not obtained ''by chance'', they were estimated using &lt;math&gt;\monolix&lt;/math&gt;, but that's another story...&quot; style=&quot;color:#177245&quot;&gt; parameters&lt;/balloon&gt;<br /> <br /> <br /> {{tooltip | base text | tooltip text}}</div> Admin https://wiki.inria.fr/popix/Common.css Common.css 2013-04-22T08:34:30Z <p>Admin: </p> <hr /> <div>#bodyContent{<br /> font-size: 18px;<br /> }</div> Admin https://wiki.inria.fr/popix/Testing_The_Individual_approach Testing The Individual approach 2013-04-19T14:10:18Z <p>Admin: /* Example 2 */</p> <hr /> <div>&lt;!-- some LaTeX macros we want to use: --&gt;<br />$<br /> \DeclareMathOperator{\argmin}{arg\,min}<br /> \DeclareMathOperator{\argmax}{arg\,max}<br /> 
\newcommand{\nominal}{#1^{\star}}<br /> \newcommand{\psis}{\psi{^\star}}<br /> \newcommand{\phis}{\phi{^\star}}<br /> \newcommand{\hpsi}{\hat{\psi}}<br /> \newcommand{\hphi}{\hat{\phi}}<br /> \newcommand{\teps}{\varepsilon}<br /> \newcommand{\limite}{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}}<br /> \newcommand{\DDt}{\partial^2_\theta #1}<br /> \def\aref{a^\star}<br /> \def\kref{k^\star}<br /> \def\model{M}<br /> \def\hmodel{m}<br /> \def\mmodel{\mu}<br /> \def\imodel{H}<br /> \def\like{\cal L}<br /> \def\thmle{\hat{\theta}}<br /> \def\ofim{I^{\rm obs}}<br /> \def\efim{I^{\star}}<br /> <br /> \def\Imax{\text{\it Imax}}<br /> \def\id{ {\rm Id}}<br /> \def\teta{\tilde{\eta}}<br /> \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}}<br /> <br /> \newcommand{\deriv}{\frac{d}{dt}#1(t)}<br /> <br /> \newcommand{\pred}{\tilde{#1}}<br /> \def\phis{\phi{^\star}}<br /> \def\hphi{\tilde{\phi}}<br /> \def\hw{\tilde{w}}<br /> \def\hpsi{\tilde{\psi}}<br /> \def\hatpsi{\hat{\psi}}<br /> \def\hatphi{\hat{\phi}}<br /> \def\psis{\psi{^\star}}<br /> \def\transy{u}<br /> \def\psipop{\psi_{\rm pop}}<br /> \newcommand{\psigr}{\hat{\bpsi}_{#1}}<br /> \newcommand{\Vgr}{\hat{V}_{#1}}<br /> <br /> \def\psig{\psi}<br /> \def\psigprime{\psig^{\prime}}<br /> \def\psigiprime{\psig_i^{\prime}}<br /> \def\psigk{\psig^{(k)}}<br /> \def\psigki{ {\psig_i^{(k)}}}<br /> \def\psigkun{\psig^{(k+1)}}<br /> \def\psigkuni{\psig_i^{(k+1)}}<br /> \def\psigi{ {\psig_i}}<br /> \def\psigil{ {\psig_{i,\ell}}}<br /> \def\phig{ {\phi}}<br /> \def\phigi{ {\phig_i}}<br /> \def\phigil{ {\phig_{i,\ell}}}<br /> \def\etagi{ {\eta_i}}<br /> \def\IIV{ {\Omega}}<br /> \def\thetag{ {\theta}}<br /> \def\thetagk{ {\theta_k}}<br /> \def\thetagkun{ {\theta_{k+1}}}<br /> \def\thetagkunm{\theta_{k-1}}<br /> \def\sgk{s_{k}}<br /> \def\sgkun{s_{k+1}}<br /> \def\yg{y}<br /> \def\xg{x}<br /> <br /> \def\qy{p_{_y}}<br /> \def\qt{p_{_t}}<br /> \def\qc{p_{_c}}<br /> \def\qu{p_{_u}}<br /> \def\qyi{p_{_{y_i}}}<br /> 
\def\qyj{p_{_{y_j}}}<br /> \def\qpsi{p_{_{\psi}}}<br /> \def\qpsii{p_{_{\psi_i}}}<br /> \def\qcpsith{p_{_{\psi|\theta}}}<br /> \def\qth{p_{_{\theta}}}<br /> \def\qypsi{p_{_{y,\psi}}}<br /> \def\qcypsi{p_{_{y|\psi}}}<br /> \def\qpsic{p_{_{\psi,c}}}<br /> \def\qcpsic{p_{_{\psi|c}}}<br /> \def\qypsic{p_{_{y,\psi,c}}}<br /> \def\qypsit{p_{_{y,\psi,t}}}<br /> \def\qcypsit{p_{_{y|\psi,t}}}<br /> \def\qypsiu{p_{_{y,\psi,u}}}<br /> \def\qcypsiu{p_{_{y|\psi,u}}}<br /> \def\qypsith{p_{_{y,\psi,\theta}}}<br /> \def\qypsithcut{p_{_{y,\psi,\theta,c,u,t}}}<br /> \def\qypsithc{p_{_{y,\psi,\theta,c}}}<br /> \def\qcypsiut{p_{_{y|\psi,u,t}}}<br /> \def\qcpsithc{p_{_{\psi|\theta,c}}}<br /> \def\qcthy{p_{_{\theta | y}}}<br /> \def\qyth{p_{_{y,\theta}}}<br /> \def\qcpsiy{p_{_{\psi|y}}}<br /> \def\qz{p_{_z}}<br /> \def\qw{p_{_w}}<br /> \def\qcwz{p_{_{w|z}}}<br /> \def\qw{p_{_w}}<br /> \def\qcyipsii{p_{_{y_i|\psi_i}}}<br /> \def\qyipsii{p_{_{y_i,\psi_i}}}<br /> \def\qypsiij{p_{_{y_{ij}|\psi_{i}}}}<br /> \def\qyipsi1{p_{_{y_{i1}|\psi_{i}}}}<br /> \def\qtypsiij{p_{_{\transy(y_{ij})|\psi_{i}}}}<br /> \def\qcyzipsii{p_{_{z_i,y_i|\psi_i}}}<br /> \def\qczipsii{p_{_{z_i|\psi_i}}}<br /> \def\qcyizpsii{p_{_{y_i|z_i,\psi_i}}}<br /> \def\qcyijzpsii{p_{_{y_{ij}|z_{ij},\psi_i}}}<br /> \def\qcyi1zpsii{p_{_{y_{i1}|z_{i1},\psi_i}}}<br /> \def\qcypsiz{p_{_{y,\psi|z}}}<br /> \def\qccypsiz{p_{_{y|\psi,z}}}<br /> \def\qypsiz{p_{_{y,\psi,z}}}<br /> \def\qcpsiz{p_{_{\psi|z}}}<br /> \def\qeps{p_{_{\teps}}}<br /> <br /> \def\neta{ {n_\eta}}<br /> \def\ncov{M}<br /> \def\npsi{n_\psig}<br /> <br /> \def\bu{\boldsymbol{u}}<br /> \def\bt{\boldsymbol{t}}<br /> \def\bT{\boldsymbol{T}}<br /> <br /> \def\by{\boldsymbol{y}}<br /> \def\bx{\boldsymbol{x}}<br /> \def\bc{\boldsymbol{c}}<br /> \def\bw{\boldsymbol{w}}<br /> \def\bz{\boldsymbol{z}}<br /> \def\bpsi{\boldsymbol{\psi}}<br /> \def\bbeta{\beta}<br /> \def\beeta{\eta}<br /> <br /> \def\logit{\rm logit}<br /> \def\transy{u}<br /> \def\so{O}<br /> <br /> 
\def\one{\mathbb 1}<br /> \newcommand{\prob}{ \mathbb{P}\!\left(#1\right)}<br /> \newcommand{\probs}{ \mathbb{P}_{#1}\!\left(#2\right)}<br /> \newcommand{\esp}{\mathbb{E}\left(#1\right)}<br /> \newcommand{\esps}{\mathbb{E}_{#1}\left(#2\right)}<br /> \newcommand{\var}{\mbox{Var}\left(#1\right)}<br /> \newcommand{\vars}{\mbox{Var}_{#1}\left(#2\right)}<br /> \newcommand{\std}{\mbox{sd}\left(#1\right)}<br /> \newcommand{\stds}{\mbox{sd}_{#1}\left(#2\right)}<br /> \newcommand{\corr}{\mbox{Corr}\left(#1\right)}<br /> <br /> \def\pmacro{\mathbf{p}}<br /> \def\py{\pmacro}<br /> \def\pt{\pmacro}<br /> \def\pc{\pmacro}<br /> \def\pu{\pmacro}<br /> \def\pyi{\pmacro}<br /> \def\pyj{\pmacro}<br /> \def\ppsi{\pmacro}<br /> \def\ppsii{\pmacro}<br /> \def\pcpsith{\pmacro}<br /> \def\pth{\pmacro}<br /> \def\pypsi{\pmacro}<br /> \def\pcypsi{\pmacro}<br /> \def\ppsic{\pmacro}<br /> \def\pcpsic{\pmacro}<br /> \def\pypsic{\pmacro}<br /> \def\pypsit{\pmacro}<br /> \def\pcypsit{\pmacro}<br /> \def\pypsiu{\pmacro}<br /> \def\pcypsiu{\pmacro}<br /> \def\pypsith{\pmacro}<br /> \def\pypsithcut{\pmacro}<br /> \def\pypsithc{\pmacro}<br /> \def\pcypsiut{\pmacro}<br /> \def\pcpsithc{\pmacro}<br /> \def\pcthy{\pmacro}<br /> \def\pyth{\pmacro}<br /> \def\pcpsiy{\pmacro}<br /> \def\pz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcwz{\pmacro}<br /> \def\pw{\pmacro}<br /> \def\pcyipsii{\pmacro}<br /> \def\pyipsii{\pmacro}<br /> \def\pypsiij{\pmacro}<br /> \def\pyipsi1{\pmacro}<br /> \def\ptypsiij{\pmacro}<br /> \def\pcyzipsii{\pmacro}<br /> \def\pczipsii{\pmacro}<br /> \def\pcyizpsii{\pmacro}<br /> \def\pcyijzpsii{\pmacro}<br /> \def\pcyi1zpsii{\pmacro}<br /> \def\pcypsiz{\pmacro}<br /> \def\pccypsiz{\pmacro}<br /> \def\pypsiz{\pmacro}<br /> \def\pcpsiz{\pmacro}<br /> \def\peps{\pmacro}<br /> \def\vt{ {t} }<br /> \def\mlxtran{\mathbb MLXtran}<br /> $<br /> <br /> ==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variable, such as measurements, 
individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let$y=(y_j, 1\leq j \leq n)$be a vector of ''observations'' obtained at times$\vt=(t_j, 1\leq j \leq n)$. We consider that the$y_j$are random variables and we denote$\py$the distribution (or pdf) of$y$. 
If we assume a ''parametric model'', then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables. Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$.<br /> <br /> <br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasmatic concentration of the drug over time is described by the pharmacokinetic (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . 
&lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are independent and normally distributed with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.4) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\pyj$ is the normal distribution defined in [[#ex_proba1|(1.4)]]. <br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the population parameters===<br /> <br /> In some cases it may be useful or important to consider the population parameter $\theta$ as random rather than fixed. There are various reasons for this: we may want to model uncertainty in its value, introduce a priori information in an estimation context, or model inter-population variability when the model covers more than one population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. 
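Making $\theta$ random simply adds one level at the top of the simulation hierarchy: draw $\theta$ from $\qth$, then the individual parameters given $\theta$, then the observations. A schematic Python sketch (the wiki's own examples use R and MLXtran; every distribution and numerical value here is an illustrative assumption):

```python
import math
import random

rng = random.Random(2013)

# level 1: population parameters theta ~ p(theta) (illustrative log-normal "prior")
V_pop = math.exp(rng.gauss(math.log(10.0), 0.10))
k_pop = math.exp(rng.gauss(math.log(0.15), 0.10))

# level 2: individual parameters psi_i ~ p(psi | theta)
omega_V, omega_k, a = 0.2, 0.3, 0.5
V = [math.exp(rng.gauss(math.log(V_pop), omega_V)) for _ in range(3)]
k = [math.exp(rng.gauss(math.log(k_pop), omega_k)) for _ in range(3)]

# level 3: observations y_ij ~ N(f(t_ij; psi_i), a^2) for the 500 mg bolus model
times = [1, 2, 4, 8]
y = [[500 / V[i] * math.exp(-k[i] * t) + rng.gauss(0, a) for t in times]
     for i in range(3)]
print(len(y), len(y[0]))
```

If only some components of $\theta$ are random, only those components are drawn at the first level and the rest stay fixed inputs.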
As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.7) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks:<br /> |text=<br /> 1. The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after $\theta$ has been randomly generated.<br /> <br /> 2. If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.<br /> <br /> 3. Not all of the components of $\theta$ need be random. If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(1.7)]] becomes<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.8) }}<br /> }}<br /> <br /> ===A model for several individuals===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. 
If we consider that individuals are randomly selected from the population, then we can treat the$\psi_i$as if they were random vectors. As both$\by=(y_i , 1\leq i \leq N)$and$\bpsi=(\psi_i , 1\leq i \leq N)$are random, the model is now a joint distribution:$\qypsi$. Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> <br /> If$\qpsi$is a parametric distribution that depends on a vector$\theta$of ''population parameters'' and a set of ''individual covariates''$\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing$\qpsi(\, \cdot \,;\theta,\bc)$for the pdf of$\bpsi$.<br /> Each$i$has a potentially unique set of times$t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$in the design, and$n_i$can be different for each individual.<br /> <br /> <br /> {{OutlineTextL<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters$\theta$, the individual covariates$\bc=(c_i , 1\leq i \leq N)$and the measurement times$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose$ N$patients received the same treatment as the single patient did. 
We now have the same PK model [[#ex_proba1|(1.4)]] for each patient, except that each has its own individual PK parameters$ V_i$and$ k_i$and potentially its own residual error parameter$ a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.5) }}<br /> <br /> Here,$\psi_i = (V_i,k_i,a_i)$. One possible model is then to assume the same residual error model for all patients, and log-normal distributions for$V$and$k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider,$w_i$, is the weight (in kg) of patient$i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(1.5)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(1.6)]]. 
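The two levels of this model can be simulated in sequence. The following Python sketch is an illustration only: the numerical values chosen for the population parameters, the weights and the measurement times are assumptions, not values from the text. It draws individual parameters according to (1.6) and then concentrations according to (1.5):

```python
import math
import random

random.seed(42)

# Assumed values for the population parameters theta (the text leaves them unspecified)
V_pop, k_pop, beta = 10.0, 0.2, 0.75   # typical volume (L), elimination rate (1/h), weight exponent
omega_V, omega_k, a = 0.3, 0.2, 0.5    # standard deviations of the random effects and residual error
weights = [62.0, 75.0, 88.0]           # covariates w_i (kg), assumed
times = [1.0, 2.0, 4.0, 8.0]           # design t_ij (h), assumed common to all patients

def simulate_patient(w):
    # (1.6): log-normal individual parameters; the weight enters the mean of log(V_i)
    V_i = math.exp(random.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V))
    k_i = math.exp(random.gauss(math.log(k_pop), omega_k))
    # (1.5): y_ij ~ N(500/V_i * exp(-k_i * t_ij), a^2), for a single 500 mg dose
    y_i = [random.gauss(500.0 / V_i * math.exp(-k_i * t), a) for t in times]
    return V_i, k_i, y_i

simulated = [simulate_patient(w) for w in weights]
```

Each call first draws $\psi_i$ from its population distribution and only then draws the observations, mirroring the hierarchical decomposition of the joint distribution.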
The inputs of the model are the population parameters$\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight)$(w_i, 1\leq i \leq N)$, and the design$\bt$.<br /> }}<br /> <br /> <br /> {{OutlineTextL<br /> |text=<br /> - In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) &lt;/math&gt; }}<br /> <br /> <br /> - The inputs of the model are the individual covariates$\bc=(c_i , 1\leq i \leq N)$and the measurement times$\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce prior distributions in order to model the inter-population variability of the population parameters$ V_{\rm pop}$and$k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.9) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(1.5)]]. Now, [[#ex_proba2b|(1.6)]] is the ''conditional distribution'' of the individual PK parameters, given$\theta_R=(V_{\rm pop},k_{\rm pop})$. The distribution of$\theta_R$is defined in [[#ex_proba3|(1.9)]]. 
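With these priors, simulation simply gains one extra level: $\theta_R$ is drawn first, and the individual parameters are then drawn conditionally on it. A minimal Python sketch, in which the values of the fixed parameters (omega_V, omega_k, beta) and the weights are assumptions for illustration:

```python
import math
import random

random.seed(1)

# (1.9): draw the random population parameters theta_R = (V_pop, k_pop) from their priors
V_pop = random.gauss(30.0, 3.0)
k_pop = random.gauss(0.1, 0.01)

# Assumed values for the fixed population parameters (the error parameter a is not needed here)
beta, omega_V, omega_k = 0.75, 0.3, 0.2
weights = [62.0, 75.0, 88.0]   # covariates w_i (kg), assumed

# (1.6), now read as a conditional distribution given the simulated theta_R
V = [math.exp(random.gauss(math.log(V_pop) + beta * math.log(w / 70.0), omega_V))
     for w in weights]
k = [math.exp(random.gauss(math.log(k_pop), omega_k)) for _ in weights]
```

Repeating the whole draw, including the first two lines, propagates the inter-population variability of (1.9) down to the individual level.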
Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that in fact it is the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate new individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \ ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.10) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineTextL<br /> |text=<br /> - In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> <br /> - The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right)&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.11) }}<br /> <br /> Once more, [[#ex_proba2a|(1.5)]] defines the conditional distribution of the concentrations.
Now, [[#ex_proba2b|(1.6)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(1.11)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$ and not the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark:<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> {{OutlineTextL<br /> |text=<br /> - In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> - The inputs of the model are the population parameters$\theta$, the individual covariates$\bc$and the nominal design$\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right) &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.12) }}<br /> <br /> Here, [[#ex_proba5|(1.12)]] defines the distribution of the now random variable$ \bt$. The other components of the model defined in [[#ex_proba2a|(1.5)]] and [[#ex_proba2b|(1.6)]] remain unchanged. <br /> The inputs of the model are the population parameters$ \theta$, the weights$ (w_i)$and the nominal measurement times$ \nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of ordinary differential equations), the ''source terms''$\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (IV bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was actually administered. For example, it might be that the times of administration and/or doses were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote by $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineTextL<br /> |text=<br /> - In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$.
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.13) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\frac{d_{ik} }{V_i} \, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.14) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.15) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.16) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(1.13)]] and [[#ex_proba6a{{!}}(1.14)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(1.6)]] defines the distribution of $\bpsi$, and [[#ex_proba6c{{!}}(1.15)]] and [[#ex_proba6d{{!}}(1.16)]] define the distribution of $\bu$.
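This dose-regimen submodel can be simulated directly. The following Python sketch is an illustration only: the nominal regimen, the non-compliance probability and the individual PK parameters are all assumed values. It draws administration times as in (1.15), doses as in (1.16), and evaluates the prediction as in (1.14):

```python
import math
import random

random.seed(7)

# Assumed nominal regimen for one patient: 100 mg every 12 h, four doses
tau_nom = [0.0, 12.0, 24.0, 36.0]   # nominal administration times (h)
d_nom = 100.0                        # nominal dose (mg)
pi = 0.2                             # probability that a dose is not taken (assumed value)
V_i, k_i = 10.0, 0.2                 # individual PK parameters (assumed)

# (1.15): actual administration times fluctuate around the nominal times
tau = [random.gauss(t, 0.02) for t in tau_nom]
# (1.16): each dose is 0 with probability pi, and the nominal dose otherwise
d = [0.0 if random.random() < pi else d_nom for _ in tau]

def f(t):
    # (1.14): superposition of all doses administered before time t
    return sum(d_k / V_i * math.exp(-k_i * (t - tau_k))
               for d_k, tau_k in zip(d, tau) if tau_k < t)
```

Observations would then be drawn around f as in (1.13), exactly as in the single-dose case.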
The inputs are the population parameters$\theta$, the weights$(w_i)$, the measurement times$\bt$and the nominal dose regimens$\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model either play the role of random variables whose distribution is defined by the model, or that of nonrandom variables or parameters. Any combination is possible, depending on the context. For instance, the population parameters$\theta$and covariates$\bc$could be random with parametric probability distributions$\qth(\, \cdot \,;\varphi)$and$\qc(\, \cdot \,;\gamma)$, and the dose regimen$\bu$and measurement times$\bt$reported with uncertainty and therefore modeled as random variables with distribution$\qu$and$\qt$.<br /> <br /> {{OutlineTextL<br /> |text=<br /> - In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}). 
&lt;/math&gt; }}<br /> <br /> <br /> - The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In a modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (or regression variables $\bx$) can either be given, or simulated from the distribution $\qt$.<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineTextL<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br
/> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . &lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable.
The other components of$z$can be treated as non random variables. Here,<br /> <br /> <br /> * The model is the conditional distribution$\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$of$\psi$.<br /> * The inputs required for the simulation are the values of$(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample$\bpsi$from the conditional distribution$\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. Markov Chain Monte Carlo (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations$\by$and the measurement times$\bt$. There may also be individual covariates$\bc$, and in pharmacological applications the dose regimen$\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters$\theta$will be based on some specific probability distribution. 
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to$\theta$the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator$\thmle$and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations$\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log(\like(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.17) }}<br /> <br /> <br /> {{OutlineTextL<br /> |text=Maximum likelihood estimation of the population parameter$\theta$requires:<br /> <br /> <br /> * a model, i.e., a joint distribution$\qypsi$.<br /> * inputs$\by$,$\bc$,$\bu$and$\bt$.<br /> * an algorithm which allows us to maximize$\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$with respect to$\theta$and to compute$\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int \pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) 
\, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{OutlineTextL<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineTextL<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu,
\bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria (BIC, AIC) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, $\ldots$, $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2, \ldots, {\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> {{OutlineTextL<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criterion.
Common optimality criteria are functionals of the eigenvalues of the expected Fisher information matrix:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1.18) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(1.17)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineTextL<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a clinical trial context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
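In practice, such a probability can be estimated by Monte Carlo simulation of the joint model. The following Python sketch is an illustration only: the model, the parameter values and the target (a trough concentration above a threshold) are assumptions, not taken from the text.

```python
import math
import random

random.seed(3)

# Assumed population parameters theta, and a single trough measurement at t = 12 h
V_pop, k_pop, omega_V, omega_k, a = 10.0, 0.2, 0.3, 0.2, 0.5
dose, t_trough, n_sim = 500.0, 12.0, 5000

hits = 0
for _ in range(n_sim):
    # one draw of (psi, y) from the joint model
    V = math.exp(random.gauss(math.log(V_pop), omega_V))
    k = math.exp(random.gauss(math.log(k_pop), omega_k))
    y = random.gauss(dose / V * math.exp(-k * t_trough), a)
    hits += y > 1.0            # target A: trough concentration above 1 (assumed)

p_hat = hits / n_sim           # Monte Carlo estimate of P(A ; u, t, theta)
```

Here the target ${\cal A}$ is purely illustrative; in a real study ${\cal A}$ encodes the clinical objective, and an optimization routine would vary the design $(\bu,\bt)$ to maximize the estimated probability.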
Such a target may involve safety and efficacy criteria, or the probability of reaching a sustained virologic response.<br /> <br /> <br /> {{OutlineTextL<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ==Implementing models with $\mlxtran$ and running tasks==<br /> <br /> <br /> ===Example 1===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \ppsi(\bpsi ; \theta)&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$ <br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:550px&quot; |<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \ppsi(\bpsi ;\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(\by {{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style=&quot;width:550px&quot; |<br /> {{MLXTran<br /> |name=jointModel.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,<br /> sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,<br /> sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> We could then use this model for simulation, using for example the {{Verbatim|R}} function&lt;ref&gt;or a Matlab function&lt;/ref&gt; {{Verbatim|simulate}}:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;color:blue&quot;&gt;<br /> &gt;res=simulate(model=&quot;jointModel.txt&quot;,input=data,output,simulSettings)<br /> &lt;/pre&gt;}}<br
/> <br /> where {{Verbatim|data}} is an {{Verbatim |R}} list (or a Matlab structure) which contains the design and the input variables, {{Verbatim |output}} contains the names of the variables to simulate, and {{Verbatim |simulSettings}} the settings that may be required for the simulation. Here are some examples and the relevant settings required:<br /> <br /> <br /> * If we want to simulate both $\by$ and $\bpsi$ with the joint distribution $\qypsi(\, \cdot \, ; \theta , \bt)$, {{Verbatim |data}} contains the measurement times $\bt$ and the population parameter $\theta$, and {{Verbatim |output = c(&quot;V&quot;,&quot;k&quot;,&quot;y&quot;)}}. The settings are for example the seed used for generating random numbers.<br /> * If we want to simulate $\bpsi$ with the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta, \bt)$, {{Verbatim |data}} contains the measurement times $\bt$, the observations $\by$ and the population parameter $\theta$, and {{Verbatim |output = c(&quot;V&quot;,&quot;k&quot;)}}. Here, an MCMC algorithm can be used for simulating this conditional distribution. Then, {{Verbatim |simulSettings}} are the settings used for the MCMC (number of iterations, transition kernels, etc.).<br /> <br /> <br /> The same model can be used for computing any pdf (and hence any log-likelihood) with the function {{Verbatim |computepdf}}:<br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;color:blue&quot;&gt;<br /> &gt;res=computepdf(model=&quot;jointModel.txt&quot;,input=data,output,pdfSettings)<br /> &lt;/pre&gt;}}<br /> <br /> If we want to compute the observed log-likelihood for a given value of $\theta$, we could use {{Verbatim |computepdf}} for computing the pdf of the observations $\py(\by ; \theta,\bt)$. In this case, {{Verbatim |data}} contains the measurement times $\bt$ and the population parameter $\theta$, and {{Verbatim |output = &quot;y&quot;}}.
Here, {{Verbatim |pdfSettings}} contains, for example, the settings of the Monte Carlo method used for estimating $\py(\by ; \theta,\bt)$.<br /> <br /> The same model could also be used for maximizing a pdf (and thus computing an estimate of certain parameters) with the function {{Verbatim |maximizepdf}}:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;color:blue&quot;&gt;<br /> &gt;res=maximizepdf(model=&quot;jointModel.txt&quot;,input=data,output,variable,estimSettings)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> This function can be used for computing the maximum likelihood estimate of $\theta$: {{Verbatim |data}} contains the measurement times $\bt$ and the observations $\by$, {{Verbatim |output}} is the name of the variable whose pdf is computed, &lt;br&gt;{{Verbatim |variable=c(&quot;V_pop&quot;,&quot;k_pop&quot;,&quot;omega_V&quot;,&quot;omega_k&quot;,&quot;a&quot;)}} is the list of population parameters to estimate, and {{Verbatim |estimSettings}} contains the settings for an algorithm that stochastically approximates the EM algorithm.<br /> <br /> <br /> &lt;br&gt;<br /> ===Example 2===<br /> <br /> Consider now a model defined by the joint distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \, \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \, \pc(\bc) ,<br /> &lt;/math&gt; }}<br /> <br /> where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, \ 1\leq i \leq N)$. 
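Concretely, simulating a hierarchical model like this one just means drawing each level in turn: population parameters, covariates, individual parameters, then observations. The following standalone Python sketch illustrates that cascade for this joint model; it is only an illustration (the function name and the values taken for $\beta$, $\omega_V$, $\omega_k$ and $a$ are our own assumptions), not the wiki's $\mlxtran$/R interface.

```python
import numpy as np

rng = np.random.default_rng(1234)

def simulate_joint_model(N, t, beta=1.0, omega_V=0.2, omega_k=0.2, a=0.5):
    """Draw each level of the hierarchy in turn, as in the simulation steps above."""
    # population parameters theta
    V_pop = rng.normal(30.0, 3.0)
    k_pop = rng.normal(0.1, 0.01)
    # individual covariates: weights
    w = rng.normal(70.0, 10.0, size=N)
    # individual parameters, log-normally distributed around their predictions
    V_pred = V_pop * (w / 70.0) ** beta
    V = np.exp(rng.normal(np.log(V_pred), omega_V))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))
    # observations, normally distributed around the regression function f
    f = 500.0 / V[:, None] * np.exp(-k[:, None] * t[None, :])
    y = rng.normal(f, a)
    return dict(V_pop=V_pop, k_pop=k_pop, w=w, V=V, k=k, y=y)

res = simulate_joint_model(N=100, t=np.linspace(1, 24, 8))
print(res["y"].shape)  # one row of observations per individual
```

The returned dictionary plays the role of the simulation output above: one draw of the population parameters, covariates, individual parameters and observations.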
The other variables and parameters are those already defined in the previous example.<br /> <br /> We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:550px&quot; |<br /> {{Equation2 <br /> |name= &lt;math&gt;\pypsithc(\by,\bpsi, \theta, \bc ; \bt)&lt;/math&gt;<br /> |equation= }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pth(\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) \\<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pc(\bc)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> w_i &amp;\sim&amp; {\cal N}\left(70,10^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcpsic(\bpsi {{!}}\bc;\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;<br /> \begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcypsi(\by {{!}} \bpsi; \bt) &lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style=&quot;width:550px&quot;|<br /> {{MLXTran<br /> |name=jointModel2.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none&quot;&gt;<br /> [POPULATION PARAMETER]<br /> <br /> DEFINITION:<br /> V_pop = {distribution=normal, mean=30, sd=3}<br /> k_pop = {distribution=normal, mean=0.1, sd=0.01}<br /> <br /> <br /> 
[COVARIATE]<br /> <br /> DEFINITION:<br /> weight = {distribution=normal, mean=70, sd=10}<br /> <br /> <br /> <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br /> <br /> EQUATION:<br /> V_pred = V_pop*(weight/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pred,<br /> sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,<br /> sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP estimate.<br /> <br /> == References==<br /> &lt;references/&gt;<br /> <br /> <br /> &lt;br&gt;<br /> ==Bibliography==</div> Admin https://wiki.inria.fr/popix/TestTable TestTable 2013-04-18T09:50:47Z <p>Admin: </p> <hr /> <div>{| class=&quot;wikitable&quot; style=&quot;text-align:center; font-size:10pt; width: 500px; background-color:#F5F5DC;&quot;<br /> !| ID ||$A_0$||$A_1$||$A_2$||$A_3$||$\lambda_1$||$\lambda_2$||$\lambda_3$||$\tau$<br /> |-<br /> | 1 || 92 ||$-$||$-$||$-$||$-$||$-$||$-$||$-$<br /> |-<br /> | 2 ||$-$|| 66 || 5 ||$-$|| 0.14 ||$2\times10^{-5}$||$-$||$-$<br /> |-<br /> | 3 ||$-$|| 53 || 6 ||28 ||0.15 ||$1.5\times10^{-5}$||0.15 ||200<br /> |-<br /> | 4 ||$-$|| 77 || 10 ||100 ||0.1 ||$1.5\times10^{-5}$||0.013 ||270 <br /> |}</div> Admin https://wiki.inria.fr/popix/Stochastic_Differential_Equations_based_models Stochastic Differential Equations based models 2013-04-18T08:35:15Z <p>Admin: /* Diffusion models for dynamical systems with linear transfers */</p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> Diffusion models are known to be a relevant tool for modeling [http://en.wikipedia.org/wiki/Stochastic stochastic] dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control. In a population approach, a mixed-effects diffusion model describes each individual series of observations using a system of [http://en.wikipedia.org/wiki/Stochastic_differential_equations stochastic differential equations] (SDE), while also taking into account variability between individuals.<br /> <br /> For the sake of simplicity, we will first consider a diffusion model for a single individual, and illustrate it with a very general [http://en.wikipedia.org/wiki/Dynamical_system dynamical system] with linear transfers and some PK examples. We will then show that the extension to mixed-effects diffusion models is fairly straightforward.<br /> <br /> Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closed-form expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i|\psi_i)$, can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter ''Kalman filter'' (KF)]. 
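To make the KF computation concrete, here is a minimal illustrative Python sketch for a scalar, already-discretized linear Gaussian model $x_{j+1}=\phi x_j+\eta_j$, $y_j=x_j+\varepsilon_j$; this toy model, the function name and the parameter values are our own assumptions, not part of the wiki's tooling. The log-likelihood is accumulated from the prediction-error decomposition.

```python
import numpy as np

def kalman_loglik(y, phi, q, r, x0=0.0, p0=1.0):
    """Log-likelihood of y under x_{j+1} = phi*x_j + N(0,q), y_j = x_j + N(0,r),
    with x_1 ~ N(x0, p0), via the prediction-error decomposition."""
    x, p, ll = x0, p0, 0.0
    for obs in y:
        # prediction error and its variance
        s = p + r
        e = obs - x
        ll += -0.5 * (np.log(2 * np.pi * s) + e * e / s)
        # measurement update
        gain = p / s
        x = x + gain * e
        p = (1 - gain) * p
        # time update (propagate the state one step forward)
        x = phi * x
        p = phi * phi * p + q
    return ll

y = np.array([0.1, -0.2, 0.3, 0.0])
ll = kalman_loglik(y, phi=0.9, q=0.1, r=0.2)
```

The same recursion, with vector states and matrix covariances, underlies the KF and EKF computations discussed here.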
When the system is not linear, the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter ''extended Kalman filter'' (EKF)] provides an approximation of the conditional pdf.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Diffusion model==<br /> <br /> <br /> We assume that one diffusion trajectory is observed with noise at discrete time points $t_1&lt;\ldots&lt;t_j&lt;\ldots&lt;t_n$. Let us denote by $(X(t),t&gt;0) \in \Rset^d$ the underlying dynamical process and by $y_j \in \Rset$ a noisy function of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmodel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{lll}<br /> dX(t) &amp;=&amp; b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]<br /> y_{j} &amp;=&amp; c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]<br /> \varepsilon_{j} &amp;\underset{i.i.d.}{\sim}&amp; \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,<br /> \end{array}<br /> \right. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t&gt;0)$ is a standard [http://en.wikipedia.org/wiki/Wiener_process Wiener process] in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$. The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.<br /> <br /> We can in fact consider an SDE-based model as an [http://en.wikipedia.org/wiki/Ordinary_differential_equation ODE]-based one with a stochastic component.<br /> <br /> <br /> {{Example1<br /> |title1=Example: <br /> |title2= &amp;#32; IV bolus with linear elimination<br /> <br /> |text= The ordinary differential equation <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ode1&quot;&gt;&lt;math&gt; <br /> dA_c(t) = -k A_c(t) dt<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> is usually used to describe the kinetics of a drug administered by rapid injection (IV bolus) into plasma. In bolus-specific compartmental models, plasma is treated as the single compartment of the human body. $A_c(t)$ represents the amount of drug in plasma at time $t$ after injection, and $k$ is the elimination rate constant. The figure below displays the typical evolution of the amount found in the central compartment when $k=4$.<br /> <br /> {{ImageWithCaption|image=sde0.png|caption=Evolution of the drug amount for the ODE example }}<br /> <br /> <br /> Imagine now that we aim to describe the evolution of the drug amount over time by means of stochastic differential equations rather than ordinary differential equations, in order to better describe the ''intra-individual variability'' of the observed process. 
We can assume for example that the system [[#eq:ode1|(2)]] is randomly perturbed by an additive Wiener process:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sde1&quot;&gt;&lt;math&gt;<br /> dA_c(t) = -k A_c(t) dt + \gamma dW(t). <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> The figure below displays four kinetics for the amount in the central compartment, simulated from this model with $k=4$ and $\gamma=2$.<br /> <br /> <br /> {{ImageWithCaption|image=sde1.png|caption=Evolution of the drug amount for the SDE example }}<br /> <br /> }}<br /> <br /> <br /> These kinetics are clearly stochastic. Nevertheless, they are not realistic because:<br /> <br /> <br /> * they give an overly erratic description of the evolution of the drug concentration within the compartments of the human body;<br /> <br /> * they do not comply with certain constraints on biological dynamics (sign, monotonicity).<br /> <br /> <br /> A more relevant model might consider that some parameters of the model randomly fluctuate over time, rather than the observed variable itself, modeling for example the elimination rate &quot;constant&quot; $k$ as a stochastic process $k(t)$ that randomly varies around a typical value $k^\star$.<br /> <br /> More generally, we can describe the fluctuations within a linear dynamical system by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ==Diffusion models for dynamical systems with linear transfers==<br /> <br /> <br /> Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics. 
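For simulation, a randomly perturbed system such as [[#eq:sde1|(3)]] can be discretized with the Euler-Maruyama scheme. The following Python sketch is purely illustrative (the function name, step size, initial amount and parameter values are our own choices, not part of the wiki's tooling):

```python
import numpy as np

def euler_maruyama(A0, k, gamma, T, n_steps, rng):
    """Simulate dA = -k*A dt + gamma dW on [0, T] with the Euler-Maruyama scheme."""
    dt = T / n_steps
    A = np.empty(n_steps + 1)
    A[0] = A0
    for j in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over one step
        A[j + 1] = A[j] - k * A[j] * dt + gamma * dW
    return A

rng = np.random.default_rng(0)
path = euler_maruyama(A0=100.0, k=4.0, gamma=2.0, T=1.0, n_steps=1000, rng=rng)
```

With `gamma=0` the scheme reduces to the explicit Euler scheme for the ODE [[#eq:ode1|(2)]], which gives a quick sanity check.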
Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferODEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K\, A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the state of the $l^{\textrm{th}}$ entity at time $t$, and $K=(K_{l,l^\prime}, \ 1\leq l , l^\prime \leq d)$ is a deterministic matrix defined as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:K&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{ll}<br /> K_{l,l^\prime} = k_{l^\prime,l} &amp; \textrm{if} \quad l \neq l^\prime\\<br /> K_{l,l} = - k_{l,0} - \sum_{l^\prime \neq l} k_{l,l^\prime} ,<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.<br /> <br /> <br /> {{ImageWithCaption|image=linear.png|caption=A dynamical system with $3$ components (circles) and linear transfers between components (arrows) }}<br /> <br /> <br /> In this particular example, the matrix $K$ is defined as<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;<br /> K = \begin{pmatrix}<br /> -k_{10} -k_{12} -k_{13} &amp; k_{21} &amp; k_{31}\\<br /> k_{12} &amp; -k_{20} -k_{21} -k_{23} &amp; k_{32}\\<br /> k_{13} &amp; k_{23} &amp; -k_{30} -k_{31} -k_{32}<br /> \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> The model defined by equations [[#eq:linearTransferODEModel|(4)]] and [[#eq:K|(5)]] is a deterministic model which assumes that transfers take place at the same rate at all times. 
This is often a restrictive assumption since in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferSDEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K(t)A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> where $K(t)$ has the same structure as in [[#eq:K|(5)]] but now some components $k_{l,l^\prime}(t)$ are stochastic processes which take non-negative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.<br /> <br /> Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.<br /> <br /> <br /> {{Example1<br /> |title1=Example 1: <br /> |title2= &amp;#32; IV bolus administration with stochastic linear elimination<br /> <br /> |text= We will first extend the ODE-based model defined in [[#eq:ode1|(2)]] by assuming that $k$ is a diffusion process which takes non-negative values and fluctuates around a typical value $k^\star$. In this example, non-negativity of $k(t)$ is ensured by defining the logarithm of the transfer rate as an [http://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process Ornstein-Uhlenbeck diffusion process]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d\log k(t) = - \alpha \left( \log k(t) - \log k^\star \right) dt + \gamma d W(t), &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional Wiener process. This results in the following diffusion system:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t), &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> X(t) = \begin{pmatrix} A_c(t) \\ \log k(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -x_1 \exp(x_2) \\ -\alpha (x_2-\log k^{\star}) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 \\ 0 &amp; \gamma \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> Note that in this specific example, the Jacobian matrix of the drift function $b$ has a simple form: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; B(x)=\begin{pmatrix} - \exp(x_2) &amp; -x_1 \exp(x_2)\\ 0 &amp; -\alpha \end{pmatrix}. &lt;/math&gt; }}<br /> <br /> The two figures below display four simulated elimination processes $k(t)$ and the associated amount processes $A_c(t)$.<br /> <br /> <br /> ::[[File:sde2.png|link=]]<br /> <br /> :::[[File:sde3.png|link=]]<br /> <br /> <br /> We measure the concentration at times $(t_{j}, \ 1\leq j \leq n)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;y_j = \displaystyle{\frac{A_c(t_{j})}{V} } + a \, \teps_j . &lt;/math&gt; }}<br /> <br /> The parameter vector of the model is therefore $\psi = (V, k^\star, \alpha, \gamma, a)$. 
We see in this example that the simulated kinetics are much more realistic than those obtained with the previous model, because:<br /> <br /> <br /> * the elimination rate process $k(t)$ is a stochastic process that takes non-negative values;<br /> <br /> * even though the amount process is stochastic, it is smooth and decreases monotonically with time.<br /> }}<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example 2: <br /> |title2= &amp;#32; Oral administration with first-order absorption and stochastic linear elimination<br /> <br /> |text=Oral PK models with first-order absorption and linear elimination are widely used to describe the time-course of a drug orally administered to a single compartment of the human body. The drug is administered into a depot compartment, absorbed by the central compartment with absorption rate $k_a$ and eliminated with elimination rate $k_e$. Such a model is described by the following system of ODEs:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:oral1&quot;&gt;&lt;math&gt;<br /> \displaystyle{ \frac{d}{dt} } \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix} \ \ = \ \ \begin{pmatrix} -k_a &amp; 0\\ k_a &amp; -k_e\end{pmatrix} \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix},<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $A_d(t)$ and $A_c(t)$ respectively represent the amounts of drug at time $t$ in the depot and central compartments. Assume now that the elimination constant is driven by a stochastic process, the solution to the stochastic differential equation<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d k_e(t) = - \alpha (k_e(t) - k_e^\star ) dt + \gamma \sqrt{k_e(t)} dW(t),<br /> &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional Wiener process. Then [[#eq:oral1|(7)]] becomes:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t). &lt;/math&gt; }}<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> X(t)= \begin{pmatrix} A_d(t) \\ A_c(t) \\ k_e(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -k_a x_1 \\ k_a x_1 -x_3 x_2 \\ -\alpha(x_3-k_e^\star ) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0\\ 0 &amp; 0 &amp; \gamma \sqrt{x_3}\end{pmatrix} ,<br /> &lt;/math&gt; }}<br /> <br /> and the parameter vector of the model is $\psi = (V, k_a, k_e^\star, \alpha, \gamma, a) .$<br /> }}<br /> <br /> In both examples, the diffusion model can be easily extended to a population approach by defining the system's parameters $\psi$ as an individual random vector.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed-effects diffusion models==<br /> <br /> Let us now consider model [[#eq:SDEmodel|(1)]] with observations coming from several subjects. An adequate adaptation of model [[#eq:SDEmodel|(1)]] in such a context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way as to correctly reflect the variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and denote by $n_i$ the number of observations for individual $i$, so that $t_{i1}&lt;\ldots&lt;t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t&gt;0) \in \Rset^d$ and $y_{ij} \in \Rset$ will respectively denote individual $i$'s diffusion process and a noisy observation of $X_i(t_{ij})$. 
The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$, are governed by a mixed-effects model based on a $d$-dimensional real-valued system of stochastic differential equations with the general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmixedModel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{l}<br /> dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br /> y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br /> \teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> with initial conditions $X_i(t_{i1}) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent $p$-dimensional random subject-specific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$, $(W_1(t),t&gt;0), \ldots, (W_N(t),t&gt;0)$ are standard independent Wiener processes, and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors, such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent. The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br /> <br /> Assuming that the $N$ individuals are independent, the conditional pdf of the observations is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires computing the conditional distribution of each observation given the past:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} p(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1} , \psi_i) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1}, \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$. When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, a first solution consists of approximating this density by a Gaussian density and using the extended Kalman filter to quickly compute its mean and variance. Particle filters, on the other hand, do not approximate the transition density, but are very demanding in terms of simulation volume and computation time.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{delattre2013sii,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and Its Interface},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Ditlevsen2005,<br /> title = {Mixed Effects in Stochastic Differential Equation Models},<br /> author = {Ditlevsen, S. and De Gaetano, A.},<br /> journal = {REVSTAT Statistical Journal},<br /> volume = {3},<br /> year = {2005},<br /> pages = {137-153}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Donnet2008,<br /> title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br /> author = {Donnet, S. 
and Samson, A.},<br /> journal = {ESAIM: Probability and Statistics},<br /> volume = {12},<br /> year = {2008},<br /> pages = {196-218}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @inproceedings{doucet2011tutorial,<br /> title={A tutorial on particle filtering and smoothing: Fifteen years later},<br /> author={Doucet, A. and Johansen, A. M.},<br /> booktitle={Oxford Handbook of Nonlinear Filtering},<br /> year={2011},<br /> organization={Citeseer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Klim2009,<br /> author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br /> title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br /> journal = {Computer methods and programs in biomedicine},<br /> volume = {94},<br /> pages = {279-289},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Kristensen2005,<br /> title = {Using Stochastic Differential Equations for PK/PD Model Development},<br /> author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {109-141}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Mazzoni2008,<br /> title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br /> author = {Mazzoni, T.},<br /> journal = {Computational Statistics},<br /> volume = {23},<br /> year = {2008},<br /> pages = {519-39}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{PSM,<br /> title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br /> author = {Mortensen, S. 
and Klim, S.}, <br /> year = {2008},<br /> url = {http://www2.imm.dtu.dk/projects/psm/},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Mortensen2007,<br /> title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br /> author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {34},<br /> year = {2007},<br /> pages = {623-642}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Overgaard2005,<br /> title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br /> author = {Overgaard, R. V. and Jonsson, N. and Torn&amp;oslash;e, C. W. and Madsen, H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {85-107}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2010,<br /> title = {Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br /> journal = {Scandinavian Journal of Statistics},<br /> volume = {37},<br /> year = {2010},<br /> pages = {67-90}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2011,<br /> title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and Ditlevsen, S.},<br /> journal = {Computational Statistics and Data Analysis},<br /> volume = {55},<br /> number = {3},<br /> year = {2011},<br /> pages = {1426-1444}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Tornoe2005,<br /> title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br /> author = {Torn&amp;oslash;e, C. 
W. and Overgaard, R. V. and Agers&amp;oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br /> journal = {Pharmaceutical Research},<br /> volume = {22},<br /> year = {2005},<br /> pages = {1247-1258}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back<br /> |link=Hidden Markov models }}</div> Admin https://wiki.inria.fr/popix/Hidden_Markov_Model Hidden Markov Model 2013-04-15T11:43:26Z <p>Admin: /* Mixed hidden Markov models */</p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> [http://en.wikipedia.org/wiki/Markov_chain Markov chains] are a useful tool for analyzing categorical longitudinal data. However, sometimes the [https://en.wikipedia.org/wiki/Markov_process Markov process] cannot be directly observed, though some output, dependent on the<br /> (hidden) state, is visible. More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br /> HMMs can be applied in many contexts and have turned out to be particularly pertinent in several biological contexts. 
For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br /> <br /> Here, we will consider a parametric framework with Markov chains in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed hidden Markov models==<br /> <br /> <br /> HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successively visited states are unknown and a set of observations is the only available information for describing the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states. An HMM is thus defined as a pair of processes $(z_j,y_j, \ j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a Markov chain and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br /> <br /> <br /> {{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br /> <br /> <br /> In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs. Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br /> <br /> We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:hmm1&quot;&gt;&lt;math&gt;<br /> \pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> For each individual $i$, $z_i$ is a Markov chain whose probability distribution is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $ \pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br /> <br /> <br /> The conditional distribution $\qcyizpsii$ depends on the model for the observations: for each state, the observation $y_{ij}$ has a specific distribution. Here are some examples:<br /> <br /> <br /> &lt;br&gt;<br /> === Examples ===<br /> <br /> <br /> 1. In a continuous data model, one possibility is that the residual error model is governed by a hidden Markov chain that can randomly switch between $K$ possible residual error models.<br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black). Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br /> <br /> <br /> ::[[File:hmm1bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> 2. 
In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. Such models have been used for describing the evolution of seizures in epileptic patients:<br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= Instead of assuming a single [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> :: [[File:hmm2bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Distributions of observations==<br /> <br /> <br /> Assuming that the $N$ individuals are independent, the joint pdf is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires summing the joint conditional distribution $\qcyzipsii$ over all possible sequences of states $z_i \in \mathbf{K}^{n_i}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \pczipsii(z_i {{!}} \psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( 
q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Though this looks complicated, it turns out that forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] provides a quick way to numerically compute it.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{Albert1991,<br /> title = &quot;A two state Markov mixture model for a time series of epileptic seizure counts&quot;,<br /> author = &quot;Albert, P. S.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;47&quot;,<br /> year = &quot;1991&quot;,<br /> pages = &quot;1371-1381&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Altman2007,<br /> title = &quot;Mixed hidden Markov models : an extension of the hidden Markov model to the longitudinal data setting&quot;,<br /> author = &quot;Altman, R. M.&quot;,<br /> journal = &quot;Journal of the American Statistical Association&quot;,<br /> volume = &quot;102&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;201-210&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Anisimov2007,<br /> title = &quot;Analysis of responses in migraine modelling using hidden Markov models&quot;,<br /> author = &quot;Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.&quot;,<br /> journal = &quot;Statistics in Medicine&quot;,<br /> volume = &quot;26&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;4163-4178&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{Cappe2005,<br /> author = &quot;Capp&amp;eacute;e, O. and Moulines, E. 
and Ryd&amp;eacute;en, T.&quot;,<br /> title = &quot;Inference in hidden Markov models&quot;,<br /> year = &quot;2005&quot;,<br /> publisher= &quot;Springer Series in Statistics&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ChaubertPereira2011,<br /> title = &quot;Markov and Semi-Markov Switching Linear Mixed Models Used to Identify<br /> Forest Tree Growth Components&quot;,<br /> author = &quot;Chaubert-Pereira, F. and Gu&amp;eacute;don, Y. and Lavergne, C. and Trottier, C.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;66&quot;,<br /> year = &quot;2011&quot;,<br /> pages = &quot;753-762&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012analysis,<br /> title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br /> author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> pages={1-9},<br /> year={2012},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Maruotti2009,<br /> title = &quot;A semiparametric approach to hidden Markov models under longitudinal<br /> observations&quot;,<br /> author = &quot;Maruotti, A. 
and Ryd&amp;eacute;en, T.&quot;,<br /> journal = &quot;Statistics and Computing&quot;,<br /> volume = &quot;19&quot;,<br /> year = &quot;2009&quot;,<br /> pages = &quot;381-393&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rabiner1989,<br /> title = &quot;A tutorial on Hidden Markov Models and selected applications in speech recognition&quot;,<br /> author = &quot;Rabiner, L. R.&quot;,<br /> journal = &quot;Proceedings of the IEEE&quot;,<br /> volume = &quot;77&quot;,<br /> year = &quot;1989&quot;,<br /> pages = &quot;257-286&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rijmen2008,<br /> title = &quot;Qualitative longitudinal analysis of symptoms in patients with primary<br /> and metastatic brain tumours&quot;,<br /> author = &quot;Rijmen, F. and Ip, E. H. and Rapp, S. and Shaw, E. G.&quot;,<br /> journal = &quot;Journal of the Royal Statistical Society - Series A.&quot;,<br /> volume = &quot;171, Part 3&quot;,<br /> year = &quot;2008&quot;,<br /> pages = &quot;739-753&quot;}<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack= Mixture models<br /> |linkNext= Stochastic differential equations based models }}</div> Admin https://wiki.inria.fr/popix/Mixture_models Mixture models 2013-04-15T11:20:25Z <p>Admin: /* Mixtures of mixed-effects models */</p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> Mixed-effects models are frequently used for modeling longitudinal data when data is obtained from different individuals from the same population. 
These models allow us to take into account between-subject variability.<br /> One complicating factor arises when data is obtained from a population with some underlying heterogeneity. If we assume that the population consists of several homogeneous sub-populations, a straightforward extension of mixed-effects models is a finite mixture of mixed-effects models.<br /> <br /> As an example, the use of a mixture of mixed effects models is particularly relevant when the response of patients to a drug therapy is heterogeneous. In any clinical efficacy trial, patients who respond, partially respond or do not respond at all can be considered different sub-populations with quite different profiles.<br /> <br /> The introduction of a categorical covariate (e.g., sex, [http://en.wikipedia.org/wiki/Genotype genotype], treatment, status, etc.) into such a model already supposes that the whole population can be decomposed into sub-populations. The covariate then serves as a ''label'' for assigning each individual to a sub-population. In practice, the covariate can either be known or not.<br /> <br /> Mixture models usually refer to models for which the categorical covariate is unknown, but whatever the case, the joint model that brings together all the parts (observations, individual parameters, covariates, labels, design, etc.) is the same. The difference appears when having to perform certain tasks and in the methods needed to implement them. 
For instance, the task of simulation makes no distinction between the two situations because all the variables are simulated, whereas model construction differs depending on whether the labels are known or unknown: we have supervised learning if the labels are known and unsupervised learning otherwise.<br /> <br /> There exist several types of mixture models that are useful in the context of mixed-effects models, e.g., mixtures of distributions, mixtures of residual error models, and mixtures of structural models.<br /> Indeed, heterogeneity in the response variable cannot always be adequately explained by inter-patient variability of certain parameters alone. It may therefore be necessary to introduce diversity into the structural models themselves:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Between-subject model mixtures'' assume that there exist sub-populations of individuals. Here, various structural models describe the response of the different sub-populations, and each subject belongs to one sub-population. One can imagine for example different structural models for responders, non-responders and partial responders to a given treatment.<br /> &lt;br&gt;<br /> <br /> * ''Within-subject model mixtures'' assume that there exist sub-populations (of cells, viruses, etc.) within each patient. Again, differing structural models describe the response of the different sub-populations, but the proportion of each sub-population depends on the patient.<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;<br /> <br /> == Mixtures of mixed-effects models ==<br /> <br /> <br /> For the sake of simplicity, we will consider a basic model that involves individual parameters $\bpsi=(\psi_i,1\leq i \leq N)$ and observations $\by=(y_i,1\leq i \leq N)$, where $y_i=(y_{ij},1\leq j \leq n_i)$. 
Then, the simplest way to define a finite mixture model is to introduce a label sequence $\bz=(z_i ; 1\leq i \leq N)$ that takes its values in $\{1,2,\ldots,M\}$ and is such that $z_i=m$ if subject $i$ belongs to sub-population $m$.<br /> <br /> In some situations, the label set $\bz$ is known and can then be used as a categorical covariate in the model.<br /> If $\bz$ is known and if we consider $\bz$ to be the realization of a random vector, the model is the conditional distribution<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;label{eq:mixt1}&quot;&gt;&lt;math&gt;<br /> \pcypsiz(\by,\bpsi {{!}} \bz;\theta) = \pccypsiz(\by {{!}} \bpsi , \bz)\pcpsiz(\bpsi {{!}} \bz;\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> If $\bz$ is unknown, it is modeled as a random vector and the model is the joint distribution<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;label{eq:mixt2}&quot;&gt;&lt;math&gt;<br /> \pypsiz(\by,\bpsi, \bz;\theta) = \pcypsiz(\by,\bpsi {{!}}\bz;\theta)\pz(\bz;\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> We therefore consider that $\bz=(z_i)$ is a set of independent random variables taking values in $\{1,2,\ldots,M\}$: for $i=1,2,\ldots, N$, there exist $\pw_{i,1},\pw_{i,2},\ldots,\pw_{i,M}$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{z_i = m} = \pw_{i,m} . &lt;/math&gt; }}<br /> <br /> A simple model might assume that the $(z_i)$ are identically distributed: $\pw_{i,m} = \pw_{m}$ for $m=1,\ldots,M$.<br /> But more complex models can be considered, assuming for instance that an individual's probabilities depend on its covariate values.<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text=<br /> The [http://en.wikipedia.org/wiki/Hepatitis_C_virus Hepatitis C virus] (HCV) can be divided into six distinct genotypes. 
Genotype 1 is the most difficult to treat, whereas individuals with genotypes 2 and 3 are almost three times more likely to respond to combination therapy with [http://en.wikipedia.org/wiki/Alpha_interferon alpha interferon] and [http://en.wikipedia.org/wiki/Ribavirin ribavirin]. <br /> <br /> Suppose we want to divide patients infected with HCV into three outcome groups: patients who respond, partially respond or do not respond. It is valid to assume that an individual's probabilities of ending up in each of these groups depend on their value for the genotype covariate. }}<br /> <br /> In its most general form, a mixture of mixed-effects models assumes that there exist $M$ joint distributions $\pyipsii_{1}$, ..., $\pyipsii_{M}$ and vectors of parameters $\theta_1$, ..., $\theta_M$ such that for any individual $i$, the joint distribution of $y_i$ and $\psi_i$ becomes<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pyipsii(y_i,\psi_i;\theta) = \sum_{m=1}^M \prob{z_i = m} \pyipsii_{m}(y_i,\psi_i;\theta_m) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\pyipsii_{m}$ is the joint distribution of $(y_i,\psi_i)$ in group $m$ and where $\theta=(\theta_1,\ldots,\theta_M)$.<br /> <br /> The distribution of the observations $y_i$ is therefore itself a mixture of $M$ distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{array}{c}<br /> \pyi(y_i;\theta) &amp;=&amp; \int \pyipsii(y_i,\psi_i;\theta) \, d \psi_i \end{array}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:mixt3&quot;&gt;&lt;math&gt;\begin{array}{c}<br /> &amp; = &amp; \sum_{m=1}^M \prob{z_i = m} \left( \int \pyipsii_{m}(y_i,\psi_i;\theta_m) \, d \psi_i \right) <br /> \end{array}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{array}{c}<br /> &amp; = &amp; \sum_{m=1}^M \prob{z_i = m} \pyi_{m}(y_i;\theta_m) . 
\end{array}&lt;/math&gt; }}<br /> <br /> The mixture can then be looked at via the distribution of the individual parameters $\qpsii$ and/or the conditional distribution of the observations $\qcyipsii$.<br /> Let us now see some examples of such mixture models.<br /> <br /> <br /> &lt;ul&gt;<br /> * A latency structure can be introduced at the individual parameter level:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp; =&amp; \pcyipsii(y_i {{!}} \psi_i)\ppsii(\psi_i;\theta) \\<br /> &amp; =&amp; \pcyipsii(y_i {{!}} \psi_i) \left(\sum_{m=1}^M \prob{z_i = m} \ppsii_{m}(\psi_i;\theta_m) \right) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : where $\ppsii_{m}(\psi_i;\theta_m)$ is the distribution of the individual parameters in group $m$. For example, a mixture of linear Gaussian models for the individual parameters assumes that there exist $M$ population parameters $\psi_{{\rm pop},1}, \ldots, \psi_{{\rm pop},M}$, vectors of coefficients $\beta_{1}, \ldots, \beta_{M}$, variance matrices $\Omega_{1}, \ldots, \Omega_{M}$ and transformations $h_1,\ldots,h_M$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> h_m(\psi_i) \ {{!}} \ z_i=m \ \ \sim \ \ {\cal N}(\mu_m , \Omega_m),<br /> &lt;/math&gt; }}<br /> <br /> : where $\mu_m = h_m(\psi_{ {\rm pop},m})+ \langle \beta_m , c_i \rangle$.<br /> <br /> : This is the most general representation possible because it allows the transformation, population parameters, covariate model and variance-covariance structure of the random effects all to vary from one group to the next. 
A simpler representation would have one or all of these fixed across the groups.<br /> &lt;br&gt;<br /> <br /> * A latency structure can also be introduced at the level of the conditional distribution of the observations $(y_{ij})$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp; =&amp; \pcyipsii(y_i {{!}} \psi_i)\ppsii(\psi_i;\theta) \\<br /> &amp; =&amp; \left(\sum_{m=1}^M \prob{z_i = m} \pcyipsii_{m}(y_i{{!}}\psi_i) \right) \ppsii(\psi_i;\theta) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : where $\pcyipsii_{m}$ is the conditional distribution of the observations in group $m$. For example, the model for continuous data<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;mixturey&quot;&gt;&lt;math&gt;<br /> y_{ij} = f\left( t_{ij};\psi_i,z_i \right) + g\left( t_{ij};\psi_i,z_i \right)\teps_{ij}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> : with $\teps_{ij} \sim {\cal N}(0,1)$, can be equivalently represented as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ij} {{!}} \,z_i=m \ \ \sim \ \ {\cal N}(f_m( t_{ij};\psi_i) , \ g_m( t_{ij};\psi_i)^2)<br /> &lt;/math&gt; }}<br /> <br /> : for each $m=1,\ldots,M$. 
A mixture of conditional distributions therefore reduces to a mixture of structural models and/or residual errors.<br /> <br /> : To give a precise example, a mixture of constant error models would assume that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;= &amp; f\left( t_{ij};\psi_i \right) + \left( \sum_{m=1}^M \one_{z_i = m} a_m \right) \varepsilon_{ij} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> :Alternatively, between-subject model mixtures (BSMM) assume that the structural model is a mixture of $M$ different structural models:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;bsmm&quot;&gt;&lt;math&gt;<br /> f\left( t_{ij};\psi_i,z_i \right) = \sum_{m=1}^M \one_{z_i = m} f_m\left( t_{ij};\psi_i \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> &lt;/ul&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=It may be too simplistic to assume that each individual is represented by only one well-defined model from the mixture. For instance, in a pharmacological setting there may be sub-populations of cells or viruses ''within each patient'' that react differently to a drug treatment. In this case, it makes sense to consider that the mixture of models happens ''within'' each individual. Such within-subject model mixtures (WSMM) therefore require additional vectors of individual parameters $\pi_i=(\pi_{i,1},\ldots, \pi_{i,M})$ representing proportions of the $M$ models within each individual $i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;wsmm&quot;&gt;&lt;math&gt;<br /> f\left( t_{ij};\psi_i \right) = \sum_{m=1}^M \pi_{i,m} f_m\left( t_{ij};\psi_i \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> The proportions $(\pi_{i,m})$ are now individual parameters in the model and the problem is transformed into a standard NLMEM.<br /> These proportions are assumed to be positive and to sum to $1$ for each patient. 
We can then define$\pi_{i,m}$in order to satisfy these constraints. One possible way to do this is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pi_{i,m} =\displaystyle{ \frac{\gamma_{i,m} }{\sum_{\ell=1}^M \gamma_{i,\ell} } }, &lt;/math&gt; }}<br /> <br /> where$\log(\gamma_{i,m}) \sim {\cal N}(\log(\gamma_{ {\rm pop},m}), \omega^2_m)$.<br /> }}<br /> <br /> <br /> <br /> ==Example 1: Mixtures of normal distributions==<br /> <br /> We consider here a simple PK model for a single oral administration:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> f(t ; ka,V,ke) &amp;=&amp; \frac{D\, k_a}{V(k_a-k_e)} \left( e^{-k_e \, t} - e^{-k_a \, t} \right).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the PK parameters are the absorption rate constant$ka$, the elimination rate constant$ke$and the volume of distribution$V$.<br /> <br /> We can model the PK parameters$\psi_i=(ka_i,V_i, ke_i)$of individual$i$randomly chosen from the population as a vector of independent random parameters.<br /> <br /> The figure shows the final distribution obtained for the volume when given as a mixture of two log-normal distributions:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\log(V_i ) \sim 0.35 \ {\cal N}(\log(70) , 0.3^2) + 0.65 \ {\cal N}(\log(42) , 0.3^2). &lt;/math&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=pkmixt.png|caption= 2 log-normal distributions$p_1$and$p_2$for the volume and a mixture of these two distributions}}<br /> <br /> <br /> Here, the structural model$f$is a function of time and$f( t ; \psi_i)$is the predicted concentration of the drug in individual$i$at time$t$.<br /> Then,$f( \, \cdot \, ; \psi_i)$is a random function because it depends on a random parameter$\psi_i$.<br /> The probability distribution of$f( \, \cdot \, ; \psi_i)$is therefore that of the concentration predicted by the model. 
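This two-component structure is easy to check by direct simulation. Below is a minimal sketch in Python (not $\mlxtran$ code), using only the mixture weights and log-normal parameters given above; the sample size is an arbitrary choice:

```python
import math
import random

random.seed(0)

# Two-component log-normal mixture for the volume (values from the text):
#   log(V_i) ~ 0.35 * N(log 70, 0.3^2) + 0.65 * N(log 42, 0.3^2)
def draw_volume():
    if random.random() < 0.35:
        return math.exp(random.gauss(math.log(70.0), 0.3))
    return math.exp(random.gauss(math.log(42.0), 0.3))

volumes = [draw_volume() for _ in range(100_000)]
empirical_mean = sum(volumes) / len(volumes)

# The mean of a log-normal with parameters (mu, sigma) is exp(mu + sigma^2/2),
# so the mixture mean is (0.35*70 + 0.65*42) * exp(0.3^2/2)
theoretical_mean = (0.35 * 70.0 + 0.65 * 42.0) * math.exp(0.3 ** 2 / 2)
```

Each simulated volume, combined with draws of the remaining PK parameters, yields one realization of the random function $f( \, \cdot \, ; \psi_i)$; repeating such draws gives a Monte Carlo approximation of the probability distribution of the predicted concentration.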
It represents the inter-individual variability of the drug's pharmacokinetics in the population.<br /> <br /> The figure below displays prediction intervals for the concentration $f( \, \cdot \, ; \psi_i)$ for one individual $i$ randomly chosen from the population, where $\psi_i=(ka_i,V_i,ke_i)$ are the PK parameters. In other words, this plot allows us to visualize the impact of the inter-individual variability of the individual PK parameters on the exposure to the drug.<br /> Here, $V_i$ is a mixture of two log-normal distributions as described above, while $ka_i$ and $ke_i$ have log-normal distributions:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(ka_i ) &amp;\sim&amp; {\cal N}(\log(1) , 0.3^2) \\<br /> \log(ke_i ) &amp;\sim&amp; {\cal N}(\log(4) , 0.3^2) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=pkmixture1.png|caption=Prediction intervals for the predicted concentration kinetics $f( \, \cdot \, ; \psi_i)$}}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> Here, the distribution of $f( \, \cdot \, ; \psi_i)$ cannot be computed in closed form because the model is non-linear, but it can easily be estimated by Monte Carlo simulation.<br /> }}<br /> <br /> <br /> Here, the distribution of $f( \, \cdot \, ; \psi_i)$ is itself a mixture of 2 distributions, since the distribution of $\psi_i$ is a mixture of distributions due to $V_i$. It is interesting to see the distribution of the predicted concentration in each subpopulation. 
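This can be illustrated by conditioning the simulation on the mixture component before drawing $\psi_i$. The Python sketch below (again, not $\mlxtran$ code) uses the distributions given above; the dose $D$ and the observation time are arbitrary illustrative values, since they are not specified here:

```python
import math
import random

random.seed(1)
D = 100.0  # hypothetical dose, chosen for illustration only

def f(t, ka, V, ke):
    # One-compartment model with first-order absorption (Example 1)
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def draw_psi(component):
    # log(ka) ~ N(log 1, 0.3^2) and log(ke) ~ N(log 4, 0.3^2) as above;
    # the volume is drawn from the log-normal of the chosen component
    ka = math.exp(random.gauss(math.log(1.0), 0.3))
    ke = math.exp(random.gauss(math.log(4.0), 0.3))
    V = math.exp(random.gauss(math.log(70.0 if component == 1 else 42.0), 0.3))
    return ka, V, ke

def mean_concentration(component, t=1.0, n=20_000):
    return sum(f(t, *draw_psi(component)) for _ in range(n)) / n

# Since f is inversely proportional to V, the subpopulation with the
# smaller median volume (42 vs 70) shows higher mean concentrations
c_high_V, c_low_V = mean_concentration(1), mean_concentration(2)
```

Splitting the simulated individuals in this way mirrors the split shown in the figure below: each subpopulation has its own concentration distribution, and the overall distribution is recovered by sampling the component with probabilities 0.35 and 0.65.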
Indeed, any individual $i$ will either have a log-volume from ${\cal N}(\log(70) , 0.3^2)$ (with probability 0.35) or a log-volume from ${\cal N}(\log(42) , 0.3^2)$ (with probability 0.65), so in order to visualize what really happens to a single individual $i$, we need to split the data into two plots: 35% of the individuals will have concentration kinetics distributed like those on the left, and 65% like those on the right.<br /> <br /> <br /> {{ImageWithCaption|image=pkmixture2.png|caption=The probability distribution of the predicted concentration kinetics $f( \, \cdot \, ; \psi_i)$ in the two subpopulations}}<br /> <br /> <br /> &lt;br&gt;<br /> ==Example 2: Mixtures of structural models==<br /> <br /> Here we are interested in a study of treated HIV-infected patients. The output data is the [http://en.wikipedia.org/wiki/Viral_load viral load] evolution for these patients.<br /> The figure below gives examples of patients with one of three &quot;characteristic&quot; viral load progressions:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Non-responders'' (1) show no decline in viral load.<br /> <br /> * ''Responders'' (2) exhibit a sustained viral load decline.<br /> <br /> * ''Rebounders'' (3 and 4) exhibit an initial drop in viral load, then a rebound to higher viral load levels.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=hiv1.png|caption= Viral load progression for 4 HIV-infected patients. &lt;br&gt; (1) non-responder; (2) responder; (3) and (4) are rebounders. Red points indicate data below the level of quantification.}}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks:<br /> |text= &amp;#32;<br /> * Since viral loads generally evolve exponentially over time, they are most commonly expressed on a logarithmic scale.<br /> <br /> * There is a detection limit at $50$ HIV RNA copies/ml, corresponding to a log-viral load of $1.7$, i.e., data are left-censored. 
These points are shown in red.<br /> }}<br /> <br /> <br /> Within a few months of HIV infection, patients typically enter a steady state of chronic infection and have a stabilized concentration of HIV-1 in [http://en.wikipedia.org/wiki/Blood_plasma blood plasma]. This concentration is modeled by an individual constant$A_{i,0}$. When [http://en.wikipedia.org/wiki/Anti-retroviral anti-retroviral treatment] starts, the viral load of patients who respond shows an initial rapid [http://en.wikipedia.org/wiki/Exponential_decay exponential decay], usually followed by a slower second phase of exponential decay.<br /> This two-phase decay in viral load can be approximated by the bi-exponential model:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;A_{1}e^{-\lambda_{1}t} +A_{2}e^{-\lambda_{2}t} .<br /> &lt;/math&gt; }}<br /> <br /> After the decrease in viral load level, some subjects show a rebound, which can be due to several factors (non-adherence to the therapy, emergence of drug-resistant virus strains, etc.).<br /> We propose to extend the bi-exponential model to these patients by adding a third phase, characterized by a logistic growth process$A_{3}/({1+e^{-\lambda_{3}(t-\tau)}})$, where$\tau$is the inflection point of this growth process.<br /> <br /> We can then describe the log-transformed viral load with a BSMM with three simple models, corresponding to each of three characteristic viral load progressions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> f_1(t_{ij},\psi_i) &amp;=&amp; A_{i,0} \\<br /> f_2(t_{ij},\psi_i) &amp;=&amp;A_{i,1}e^{-\lambda_{i,1}t_{ij} } +A_{i,2}e^{-\lambda_{i,2}t_{ij} } \\<br /> f_3(t_{ij},\psi_i) &amp;=&amp;A_{i,1}e^{-\lambda_{i,1}t_{ij} } +A_{i,2}e^{-\lambda_{i,2}t_{ij} }<br /> + \displaystyle{ \frac{A_{i,3} }{1+e^{-\lambda_{i,3}(t_{ij}-\tau_i) } } } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The log-transformed viral load can then be modeled by:<br /> <br /> {{Equation1<br /> 
|equation=&lt;math&gt;\log(y_{ij} ) = \sum_{m=1}^3\one_{z_i=m}\log(f_m(t_{ij},\psi_i) ) + \varepsilon_{ij},<br /> &lt;/math&gt; }}<br /> <br /> where $y_{ij}$ is the viral load for subject $i$ at time $t_{ij}$ and $\psi_i=(A_{i,0},A_{i,1},A_{i,2},A_{i,3},\lambda_{i,1},\lambda_{i,2},\lambda_{i,3},\tau_i)$ is the vector of individual parameters.<br /> <br /> The figure below displays the predicted viral loads for the 4 patients using model $f_1$ for patient 1, $f_2$ for patient 2 and $f_3$ for patients 3 and 4, with the &lt;balloon title=&quot;these values were not obtained ''by chance'', they were estimated using Monolix, but that's another story...&quot; style=&quot;color:#177245&quot;&gt; parameters&lt;/balloon&gt; given to the right of the figure:<br /> <br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;5&quot; <br /> || <br /> {{ImageWithCaption_special|image=hiv2.png|caption=Observed and predicted viral load progression for 4 HIV-infected patients}}<br /> || <br /> {| class=&quot;wikitable&quot; align=&quot;right&quot; style=&quot;width: 40%&quot;<br /> !| ID ||$A_0$||$A_1$||$A_2$||$A_3$||$\lambda_1$||$\lambda_2$||$\lambda_3$||$\tau$<br /> |-<br /> | 1 || 92 ||$-$||$-$||$-$||$-$||$-$||$-$||$-$<br /> |-<br /> | 2 ||$-$|| 66 || 5 ||$-$|| 0.14 ||$2\times10^{-5}$||$-$||$-$<br /> |-<br /> | 3 ||$-$|| 53 || 6 ||28 ||0.15 ||$1.5\times10^{-5}$||0.15 ||200<br /> |-<br /> | 4 ||$-$|| 77 || 10 ||100 ||0.1 ||$1.5\times10^{-5}$||0.013 ||270 <br /> |}<br /> |}<br /> <br /> <br /> Not all observed viral load progressions fall so easily into one of the three classes, as for example the patients shown in the next figure.<br /> <br /> <br /> {{ImageWithCaption|image=hiv3.png|caption= Viral load data for 4 patients with ambiguous progressions}}<br /> <br /> <br /> In these cases, it does not seem quite so reasonable to model the data under the BSMM assumption that each patient must belong uniquely to one class. 
Instead, it is perhaps more natural to suppose that each patient is partially responding, partially non-responding and partially rebounding to the given drug treatment. The goal becomes to find the relative strength of each process in each patient, and a WSMM is an ideal tool to do this. Without going further into the details, here are the resulting observed and predicted viral loads for these 4 individuals when each individual represents a mixture of the three viral load progressions.<br /> <br /> <br /> {{ImageWithCaption|image=hiv4.png|caption= Observed and predicted viral load progression for 4 patients using WSMM}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{biernacki2006,<br /> author = {Biernacki, C. and Celeux, G. and Govaert, G. and Langrognet, F.},<br /> title = {Model-Based Cluster and Discriminant Analysis with the MIXMOD Software},<br /> journal = {Computational Statistics and Data Analysis},<br /> volume = {51},<br /> number = {2},<br /> pages = {587-600},<br /> year = {2006}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{celeux2000a,<br /> author = {Celeux, G. and Hurn, M. and Robert, C.},<br /> title = {Computational and inferential difficulties with mixture posterior distributions},<br /> journal = {J. American Statist. Assoc.},<br /> volume = {95},<br /> number = {3},<br /> pages = {957-979},<br /> year = {2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delacruz2008,<br /> author = {De la Cruz, R. and Quintana, F. A. 
and Marshall, G.},<br /> title = {Model Based Clustering for Longitudinal Data},<br /> journal = {Computational Statistics and Data Analysis},<br /> volume = {52},<br /> number = {3},<br /> pages = {1441-1457},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fruhwirth2006,<br /> author = {Fr&amp;uuml;hwirth-Schnatter, S.},<br /> title = {Finite Mixture and Markov Switching Models},<br /> publisher = {Springer},<br /> pages = {},<br /> series = {},<br /> address = {New York},<br /> editor = {},<br /> year = {2006}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hou2008,<br /> author = {Hou, W. and Li, H. and Zhang, B. and Huang, M. and Wu, R.},<br /> title = {A nonlinear mixed-effect mixture model for functional mapping of dynamic traits},<br /> journal = {Heredity},<br /> volume = {101},<br /> pages = {321-328},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ketchum2012,<br /> author = {Ketchum, J. M. and Best, A. M. and Ramakrishnan, V.},<br /> title = {A Within-Subject Normal-Mixture Model with Mixed-Effects for Analyzing Heart Rate Variability},<br /> journal = {J. Biomet Biostat},<br /> volume = {S7:013},<br /> year = {2012}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lavielle2013mixture,<br /> title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br /> author={Lavielle, M. and Mbogning, C.},<br /> journal={Statistics &amp; Computing (to appear)},<br /> volume={},<br /> number={},<br /> pages={},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mbogning2012between,<br /> title={Between-subject and within-subject model mixtures for classifying HIV treatment response},<br /> author={Mbogning, C. and Bleakley, K. 
and Lavielle, M.},<br /> journal={Progress in Applied Mathematics},<br /> volume={4},<br /> number={2},<br /> pages={148-166},<br /> year={2012}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mclachland2000,<br /> author = {McLachlan, G. J. and Peel, D.},<br /> title = {Finite Mixture Models},<br /> publisher = {Wiley-Interscience},<br /> year = {2000},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{muthen1999finite,<br /> title={Finite mixture modeling with mixture outcomes using the EM algorithm},<br /> author={Muth&amp;eacute;n, B. and Shedden, K.},<br /> journal={Biometrics},<br /> volume={55},<br /> number={2},<br /> pages={463-469},<br /> year={1999},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ng2006,<br /> author = {Ng, S. K. and McLachlan, G. J. and Wang, K. and Ben-Tovim, L. and Ng, S. W.},<br /> title = {A mixture model with mixed effects components for clustering correlated gene-expression profiles},<br /> journal = {Bioinformatics},<br /> volume = {22},<br /> pages = {1745-1752},<br /> year = {2006}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{rosner1997,<br /> author = {Rosner, G. L. and M&amp;uuml;ller, P.},<br /> title = {Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models},<br /> journal = {J. Pharmacokin. Biopharm.},<br /> volume = {25},<br /> pages = {209-233},<br /> year = {1997}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{verbeke1996linear,<br /> title={A linear mixed-effects model with heterogeneity in the random-effects population},<br /> author={Verbeke, G. 
and Lesaffre, E.},<br /> journal={Journal of the American Statistical Association},<br /> volume={91},<br /> number={433},<br /> pages={217-221},<br /> year={1996},<br /> publisher={Taylor &amp; Francis Group}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wang2007,<br /> author = {Wang, X. and Schumitzky, A. and D'Argenio, D. Z.},<br /> title = {Nonlinear random effects mixture models: Maximum likelihood estimation via the EM algorithm},<br /> journal = {Comput. Stat. Data Anal.},<br /> volume = {51},<br /> pages = {6614-6623},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Extensions <br /> |linkNext=Hidden Markov models }}</div> Admin https://wiki.inria.fr/popix/Extensions Extensions 2013-04-15T11:15:53Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> &lt;div style=&quot;color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding-top:1em&quot;&gt;[[Image:attention4.jpg|45px|left|link=]] <br /> (If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br /> &lt;/div&gt;<br /> <br /> We have so far reviewed the most frequently used models for describing both the individual parameters $(\psi_i)$ and the observations $(y_i)$, but several extensions can be considered.<br /> <br /> For instance, if we assume that a population consists of several homogeneous sub-populations, mixture models can be very useful for describing different types of mixtures, such as mixtures of
structural models and mixtures of residual models (see [[Mixture models|Mixture models]]).<br /> <br /> A stochastic component can also be introduced into the model by assuming some underlying stochastic dynamics, characterized either by a [http://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov model] (see [[Hidden Markov models|Hidden Markov models]]) or a system of [http://en.wikipedia.org/wiki/Stochastic_differential_equation stochastic differential equations] (see [[Stochastic differential equations based models]]).<br /> <br /> Although we restrict ourselves to these extensions in this document, it should be noted that other extensions mentioned in the introduction (see [[What is a model? A joint probability distribution!|What is a model? A joint probability distribution!]]) could also have been addressed:<br /> <br /> <br /> &lt;ul&gt;<br /> * Population parameter models: introduce a priori information in an estimation context, or to model inter-population variability.<br /> * Covariate models: mainly relevant in the context of wanting to simulate virtual individuals.<br /> * Design models: measurement times, dose regimens, etc.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Joint models<br /> |linkNext=Mixture models }}</div> Admin https://wiki.inria.fr/popix/Additional_levels_of_variability Additional levels of variability 2013-04-12T13:28:26Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Modeling different levels of variability ==<br /> <br /> <br /> Up to now, the distribution$\qpsii$or 
equivalently the equation<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level1&quot;&gt;&lt;math&gt;<br /> \psi_i = \model(\bbeta,c_i,\eta_i),<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> only describes the inter-individual variability of the individual parameters $(\psi_i)$. This model therefore assumes that:<br /> <br /> <br /> &lt;ul&gt;<br /> * the individual parameter $\psi_i$ for individual $i$ remains constant during the whole study<br /> <br /> <br /> * the $N$ individuals in the study are independent, i.e., the parameters $(\psi_i, 1\leq i \leq N)$ are mutually independent.<br /> &lt;!-- %\item the variability of the individual parameters is completely explained by the individual vector of covariates $\bc_i$ and the individual vector of random effects $\eta_i$. --&gt;<br /> &lt;/ul&gt;<br /> <br /> <br /> We will now see that these hypotheses can be weakened by considering additional levels of variability.<br /> <br /> Let us look at the first assumption, and now consider introducing ''intra-individual variability'' of individual parameters in the model. <br /> A first simple model consists of splitting the study into $K$ time periods or &quot;occasions&quot;, and assuming that individual parameters can vary from occasion to occasion but remain constant within each occasion. We then try to explain part of the ''intra-individual'' variability of the individual parameters by [http://en.wikipedia.org/wiki/Piecewise-constant piecewise-constant] covariates, i.e., &quot;occasion-dependent&quot; or &quot;occasion-varying&quot; ones (varying from occasion to occasion and constant within an occasion). The remaining part must then be described by random effects.<br /> <br /> We will need some additional notation for describing this new statistical model.
Let<br /> <br /> <br /> &lt;ul&gt;<br /> *$\psi_{ik}$be the vector of individual parameters of individual$i$for occasion$k$, where$1\leq i \leq N$and$1\leq k \leq K$.<br /> <br /> <br /> *${c}_{ik}$be the vector of covariates of individual$i$for occasion$k$. Some of these covariates remain constant (gender, group treatment, ethnicity, etc.) and others can vary (weight, treatment, etc.).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let$\bpsi_i = (\psi_{i1}, \psi_{i2}, \ldots , \psi_{iK})$be the sequence of$K$individual parameters for individual$i$. The model for$\bpsi_i$is now a [http://en.wikipedia.org/wiki/Joint_distribution joint distribution]:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level2&quot;&gt;&lt;math&gt;<br /> \bpsi_i \sim \qpsii( \ \cdot \ ; c_{i1}, c_{i2}, \ldots, c_{iK}, \theta).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> We also need to define:<br /> <br /> <br /> &lt;ul&gt;<br /> *$\vari{\eta}{i}{0}$, the vector of random effects which describes the random ''inter-individual variability'' of the individual parameters.<br /> <br /> <br /> *$\vari{\eta}{ik}{1}$, the vector of random effects which describes the random ''intra-individual variability'' of the individual parameters in occasion$k$, for each$1\leq k \leq K$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Here and in the following, superscript${(0)}$is used to represent ''inter-individual variability'', i.e., variability at the individual (&quot;reference&quot;) level,<br /> while superscript${(1)}$represents ''inter-occasion variability'', i.e., variability at the &quot;occasion&quot; level for each individual.<br /> <br /> Then, for any individual$i$and occasion$k$, model [[#indiv_level1|(1)]] becomes<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level2bis&quot;&gt;&lt;math&gt;<br /> \psi_{ik} = \model(\bbeta,c_{ik},\vari{\eta}{i}{0},\vari{\eta}{ik}{1}).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> As 
before, the prediction$\hpsi_{ik}$of$\psi_{ik}$is obtained in the absence of random effects:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hpsi_{ik} &amp;=&amp; \model(\bbeta,c_{ik},\vari{\eta}{i}{0}\equiv 0,\vari{\eta}{ik}{1} \equiv 0) \\<br /> &amp;=&amp; \hmodel(\bbeta,c_{ik}). <br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> If$\vari{\eta}{i}{0}\neq 0$, then the parameters$\psi_{ik}$defined in [[#indiv_level2bis|(3)]] are no longer independent because they all depend on the same random effect$\vari{\eta}{i}{0}$. The joint distribution$\qpsii$will therefore depend on the model$\model$and in particular on the way in which the model integrates the random effects$\vari{\eta}{i}{0}$and$\vari{\eta}{ik}{1}$.<br /> Let us now develop this further.<br /> <br /> <br /> &lt;ol&gt; <br /> &lt;li&gt;Assume first an [http://en.wikipedia.org/wiki/Additive_model additive model] for the random effects. Here,$\vari{\eta}{i}{0}$and$\vari{\eta}{ik}{1}$can be grouped into a random effect$\eta_{ik}$, where&lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level3&quot;&gt;&lt;math&gt;<br /> \eta_{ik} = \vari{\eta}{i}{0} + \vari{\eta}{ik}{1}.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> Here is an example for an individual with three time periods, where the random effect is additive.<br /> <br /> <br /> {{ImageWithCaption_special|image=iov1.png|caption=Possible decomposition of the random effects of a single subject over 3 time periods}}<br /> <br /> <br /> If we also assume that$\vari{\eta}{i}{0}$and$\vari{\eta}{ik}{1}$are normally distributed with variance-covariance matrices$\vari{\Omega}{i}{0}$and$\vari{\Omega}{ik}{1}$, then$\eta_{ik}$is also normally distributed and the covariance between$\eta_{ik}$and$\eta_{ik^\prime}$is<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level4&quot;&gt;&lt;math&gt;<br /> {\rm Cov} \left( \eta_{ik} , \eta_{ik^\prime} \right) = \left\{ 
\begin{array}{ll}<br /> \vari{\Omega}{i}{0} + \vari{\Omega}{ik}{1} &amp; {\rm if } \ k=k^\prime \\<br /> \vari{\Omega}{i}{0} &amp; {\rm otherwise} . \end{array} \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> Model [[#indiv_level2bis|(3)]] then reduces to$\psi_{ik} = \model(\bbeta,c_{ik},\eta_{ik})$, where now the$\psi_{ik}$are not independent.<br /> <br /> <br /> &lt;li&gt;Assume now a Gaussian model of the form&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> h(\psi_{ik})&amp; = &amp; h(\hpsi_{ik})+ \eta_{ik} \\<br /> &amp; = &amp; h(\hpsi_{ik})+ \vari{\eta}{i}{0} + \vari{\eta}{ik}{1} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the$h(\psi_{i1}), \ldots , h(\psi_{iK})$are correlated Gaussian vectors whose variance-covariance structure is that of the$(\eta_{ik})$defined in [[#indiv_level4|(5)]].<br /> <br /> <br /> &lt;li&gt;Assume furthermore a linear covariate model. For the sake of simplicity, we consider a unique covariate. Extension to multiple covariates, including categorical and continuous covariates, is straightforward. An initial covariate model deduced from our basic linear model proposed in [[Model with covariates#indiv_cov2|(4) of Model with covariates]] is written &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level5&quot;&gt;&lt;math&gt;<br /> h(\psi_{ik}) = h(\psi_{\rm pop})+ \beta(c_{ik} - c_{\rm pop}) + \vari{\eta}{i}{0} + \vari{\eta}{ik}{1}. <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Consider our model for the volume of distribution introduced in the [[Model with covariates]] section, which assumes a linear relationship between the log-weight and the log-volume. 
If the weight varies from occasion to occasion, we can consider the following model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_levelex1&quot;&gt;&lt;math&gt;<br /> \log(V_{ik}) = \log(V_{\rm pop}) + \beta \, \log(w_{ik}/70) + \eta_{ik} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> }}<br /> <br /> <br /> We can then decompose the part of the variability explained by the covariate$c$into inter-individual and intra-individual components, exactly as we did with the random effects. Let$\cpop$be the reference value of the covariate$c$in the population as before, and also let$c_i$be some reference (or typical) value for individual$i$. Then we can write<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level6&quot;&gt;&lt;math&gt; <br /> \begin{array}{ccccc}<br /> c_{ik}-\cpop &amp;= &amp;(c_i - \cpop) &amp; + &amp;( c_{ik} - c_i) \\<br /> &amp;= &amp;\vari{d}{i}{0} &amp; + &amp; \vari{d}{ik}{1} ,\\<br /> \end{array}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> where$\vari{d}{i}{0}$describes the variability of the reference individual value$c_i$around the reference population value$\cpop$, and$\vari{d}{ik}{1}$the fluctuations of the sequence of individual covariate values$(c_{ik})$around$c_i$. 
Here is an illustration of this for one individual and three time periods.<br /> <br /> <br /> {{ImageWithCaption|image=iov2.png|caption=Decomposition of a time-varying covariate of a single subject over three time periods }}<br /> <br /> <br /> It is instructive to now write model [[#indiv_level5|(6)]] with the following decomposition:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level7&quot;&gt;&lt;math&gt; <br /> h(\psi_{ik})= h(\psi_{\rm pop})+ \left( \beta(c_{i} - c_{\rm pop}) + \vari{\eta}{i}{0} \right) +<br /> \left( \beta(c_{ik} - c_{i}) + \vari{\eta}{ik}{1} \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> On the right-hand side, the first term gives the inter-individual variability, whereas the second gives the inter-occasion (i.e., intra-individual) variability for<br /> this individual. <br /> If a covariate $c$ does not vary between occasions, this is the same as saying that for each $k$, $c_{ik} =c_i$.<br /> It may also be that some random effects do not exhibit inter-occasion variability, i.e.,<br /> $\vari{\eta}{ik}{1}=0$.
Then, an individual parameter $\psi_i$ does not exhibit inter-occasion variability if and only if, for all $k$, both $c_{ik}-c_i =0$ and $\vari{\eta}{ik}{1}=0$.<br /> <br /> In general, the goal is to construct a model based on the perceived variability of each of these two terms:<br /> <br /> <br /> &lt;ul&gt;<br /> * The inter-individual variability (IIV) model: choosing the model for the covariates that do not change from occasion to occasion, and a variance-covariance structure for the random effects $\vari{\eta}{i}{0}$.<br /> <br /> <br /> * The inter-occasion variability (IOV) model: choosing the model for the covariates that change from occasion to occasion, and a variance-covariance structure between the random effects $\vari{\eta}{ik}{1}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Model [[#indiv_level7|(9)]] assumes that the inter-individual and intra-individual variability of the covariate have the same magnitude of effect on the parameter, i.e., an increase of 1 unit of $c_{ik}$ with respect to $c_i$ has the same effect as an increase of 1 unit of $c_i$ with respect to $\cpop$. If we would rather not make this hypothesis, we can weight the covariates $(c_i - \cpop)$ and $(c_{ik} - c_i)$ differently:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level8&quot;&gt;&lt;math&gt;<br /> h(\psi_{ik})= h(\psi_{\rm pop})+ \beta(c_{i} - c_{\rm pop}) + \gamma(c_{ik} - c_{i}) + \vari{\eta}{i}{0} + \vari{\eta}{ik}{1}.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> <br /> <br /> {{Example<br /> |title= Example:<br /> |text= Consider a model that supposes a linear relationship between income and happiness. Denote by $\psi_{ik}$ the happiness score (on some scale) for subject $i$ in year $k$, and by $c_{ik}$ their income (in K&amp;euro;) in year $k$.
Model [[#indiv_level7|(9)]] supposes that, for each subject and year $k$, a difference in annual income of 1K&amp;euro; with respect to the reference income in the population generates an increase of $\beta$ in the happiness score.<br /> There is no reason to expect here that the intra-individual variability (the fluctuation of one individual's salary over time) has the same effect on happiness. Indeed, a 1K&amp;euro; increase in an individual's own salary might generate more happiness than simply having a salary 1K&amp;euro; above the population reference.<br /> Model [[#indiv_level8|(10)]] lets us take this into account, by assuming for example that $\gamma&gt;\beta$.<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions to multi-level variability ==<br /> <br /> <br /> Extension of the proposed approach to nested levels of variability is straightforward. We illustrate this with several examples.<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; Suppose that an occasion can be split into several sub-occasions. For instance, imagine that the same study (lasting several days) is repeated each year. In this case, we might want to take into account year-by-year variability ''and'' day-by-day variability. To do this, we can introduce an additional level of intra-individual (or inter-occasion) variability into the model:&lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level9&quot;&gt;&lt;math&gt;<br /> \psi_{i,k, l} = \model(\bbeta,c_{i,k,l},\vari{\eta}{i}{0},\vari{\eta}{i,k}{1},\vari{\eta}{i,k,l}{2}) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> Here, $c_{i,k,l}$ is the value of covariate $c$ for subject $i$ during sub-occasion $l$ of occasion $k$, and $\vari{\eta}{i}{0}$, $\vari{\eta}{i,k}{1}$ and $\vari{\eta}{i,k,l}{2}$ describe different levels of random variability of the parameter.
As in [[#indiv_level3|(4)]], we can assume an additive model in which the different levels of random effects are grouped into a single one, $\eta_{i,k,l}$. Now both the explained and the unexplained parts of the variability can be decomposed into an inter-individual component and two levels of intra-individual components:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level10&quot;&gt;&lt;math&gt;<br /> \begin{array}{ccccccc}<br /> c_{i,k,l}-\cpop &amp;= &amp;(c_{i} - \cpop) &amp; + &amp;( c_{i,k} - c_{i})&amp; +&amp; (c_{i,k,l} - c_{i,k}) \\<br /> &amp;= &amp;\vari{d}{i}{0} &amp; + &amp; \vari{d}{i,k}{1} &amp;+&amp; \vari{d}{i,k,l}{2} \\<br /> \eta_{i,k,l} &amp;= &amp;\vari{\eta}{i}{0} &amp;+&amp; \vari{\eta}{i,k}{1} &amp;+&amp; \vari{\eta}{i,k,l}{2} \,.<br /> \end{array}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> <br /> &lt;li&gt;We can instead consider that the individuals are allocated to different centers or studies. Then, possible variability between centers or studies should also be taken into account by the statistical model.&lt;/li&gt;<br /> <br /> Let $\ell=1,2,\ldots,L$ index the subgroups or studies. To keep things simple, first consider the case where there is only one occasion. Then, if individual $i$ is allocated to study $\ell$, its vector of individual parameters $\psi_{\ell,i}$ is described by a model that takes into account the ''inter-study'' variability:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_level11&quot;&gt;&lt;math&gt;<br /> \psi_{\ell,i} = \model(\bbeta,c_{\ell,i},\vari{\eta}{\ell}{-1},\vari{\eta}{\ell,i}{0}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> where $c_{\ell,i}$ is the vector of covariates of individual $i$ from group $\ell$, and $\vari{\eta}{\ell}{-1}$ and $\vari{\eta}{\ell,i}{0}$ the random effects that describe the random components of the ''inter-study'' and ''inter-individual'' variability (within the same study).
Note that some components of$c_{\ell,i}$might be specific to study$\ell$and have no dependence on the given individual. <br /> <br /> <br /> <br /> &lt;li&gt; We can include in the model any combination of ''inter-group'', ''inter-individual'' and ''inter-occasion'' variability with any combination of interactions. &lt;/li&gt;<br /> Consider for example a cross-over study with$K$occasions performed in$L$centers, and assume the following levels of random variability:<br /> <br /> <br /> &lt;ul&gt;<br /> * Center:$\vari{\eta}{\ell}{-1}$<br /> <br /> * Center and individual:$\vari{\eta}{\ell,i}{0}$<br /> <br /> * Center, individual and occasion:$\vari{\eta}{\ell,i,k}{1}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, if we decide to assume an additive model for the random effects, all the random components of the variability can be combined into a unique vector of random effects:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \eta_{\ell,i,k} = \vari{\eta}{\ell}{-1} + \vari{\eta}{\ell,i}{0} + \vari{\eta}{\ell,i,k}{1} .&lt;/math&gt; }}<br /> <br /> For example, if we were considering an animal study, we might want to group animals with the same father (equivalent to &quot;center&quot;), and then try and characterize variability of some animal feature by a &quot;father&quot; effect, an &quot;animal&quot; effect, and an &quot;occasion&quot; effect.<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> ==$\mlxtran$for multiple levels of variability models ==<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 1:<br /> |title2=<br /> |text = TO DO<br /> |equation=<br /> |code = <br /> }}<br /> --&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack= Extension to multivariate distributions<br /> |linkNext= Modeling the observations}}</div> Admin https://wiki.inria.fr/popix/The_covariate_models The covariate models 2013-04-11T14:27:02Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> 
&lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Introduction ==<br /> <br /> For the moment, we still assume that for each subject $i$ there is only one scalar parameter $\psi_i$. The covariate model then consists of defining the prediction $\hpsi_i$ as a function of the subject's covariates $\trcov{c}_i$ and the fixed effects $\bbeta$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_covar&quot;&gt;&lt;math&gt; <br /> \hpsi_i = \hmodel(\bbeta,\trcov{c}_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> We take a statistical approach here. The goal is not necessarily to construct a ''causal'' model that supposes a cause-effect relationship between<br /> covariates and the parameter, but one where the covariates partially describe the variability of the parameter.<br /> <br /> Consider for example a very simple model that posits a linear relationship between the height $h_i$ of subject $i$ and their weight $w_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hw_i \ = \ \hmodel(\bbeta,h_i) \ = \ \beta_0 + \beta_1 \, h_i. &lt;/math&gt; }}<br /> <br /> The parameters $\bbeta = (\beta_0, \beta_1)$ are [http://en.wikipedia.org/wiki/Population_parameter population parameters], which may vary from one population to the next, but are considered fixed within the same homogeneous population. In this model, the height is a covariate that:<br /> <br /> <br /> &lt;ul&gt;<br /> * helps to predict the weight. For an individual of height $h_i$, we predict the weight $\beta_0 + \beta_1 \, h_i$.
We use this model without necessarily supposing that there is a cause-effect relationship between height and weight. We merely assume that having information about height gives us some information about weight.<br /> &lt;br&gt;<br /> <br /> * helps to describe the [http://en.wikipedia.org/wiki/Statistical_dispersion variability] of the weight. Suppose that we make the arbitrary choice of a &quot;reference individual&quot; in the population who has height$h_{\rm pop}$and weight$w_{\rm pop}$. Then the model lets us show the link between the reference height and weight:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; w_{\rm pop} = \beta_0 + \beta_1 \, h_{\rm pop}. &lt;/math&gt; }}<br /> <br /> :Then we can more clearly look at the variability in weight around the reference weight as a function of the variation in height around the reference height:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pred{w}_i - w_{\rm pop} = \beta_1 \, (h_i - h_{\rm pop}). &lt;/math&gt; }}<br /> <br /> :If the weight is in kg and the height in cm, then for an individual who is 1cm taller than the reference height, we predict a weight of$\beta_1 \,kg$above the reference weight.<br /> &lt;/ul&gt;<br /> <br /> <br /> In more general examples, there is a vector of reference covariates$\trcov{c}_{\rm pop}$. A reference individual is one who would personally have these covariate values. 
Consequently,$\psi_{\rm pop}=\hmodel(\bbeta,\trcov{c}_{\rm pop})$is the predicted value of the individual parameter for this virtual individual.<br /> The covariate model therefore describes how$\hpsi_i$falls around$\psi_{\rm pop}$as$\trcov{c}_i$varies around$\trcov{c}_{\rm pop}$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;phiPop&quot;&gt;&lt;math&gt;<br /> \hpsi_i - \psi_{\rm pop} = \hmodel(\bbeta,\trcov{c}_i) - \hmodel(\bbeta,\trcov{c}_{\rm pop}).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> &lt;!-- %In summary,$\hpsi_i$is function of covariates and fixed effects --&gt;<br /> &lt;!-- %\begin{equation} --&gt;<br /> &lt;!-- %\hpsi_i = \hmodel(\bbeta,\trcov{c}_i) \label{indiv_cov0} --&gt;<br /> &lt;!-- %\end{equation} --&gt;<br /> &lt;!-- %The fixed effects possibly include a reference value$\psi_{\rm pop}$defined as$\psi_{\rm pop} = \hmodel(\bbeta,\trcov{c}_{\rm pop})$. --&gt;<br /> For clarity, in the following we distinguish between linear and nonlinear continuous covariate models, and [http://en.wikipedia.org/wiki/Categorical_variable categorical variable] models.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Linear models for continuous covariates==<br /> <br /> <br /> In its most simple form, a linear model is one where the individual parameter is modeled as a [http://en.wikipedia.org/wiki/Linear_combination linear combination] of the covariates, i.e.,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \hpsi_i \ \ = \ \ \langle \bbeta , \trcov{c}_i \rangle \ \ = \ \ \sum_{\ell=1}^{L} \beta_{\ell}\, \trcov{c}_{i\ell} \, .<br /> &lt;/math&gt; }}<br /> <br /> Here, the function$\hmodel$is the inner product of$\bbeta$and$\trcov{c}_i$.<br /> With respect to a reference individual, this can be rewritten as in [[#phiPop|(2)]]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \hpsi_i \ \ = \ \ \psi_{\rm pop} + \langle \bbeta , \trcov{c}_i - \trcov{c}_{\rm pop} \rangle. 
&lt;/math&gt; }}<br /> <br /> More generally, we usually suppose that linearity holds with respect to some transformation $h$ of $\hpsi_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_cov1&quot;&gt;&lt;math&gt; <br /> h(\hpsi_i) \ \ = \ \ h(\psi_{\rm pop})+ \langle \bbeta , \trcov{c}_i - \trcov{c}_{\rm pop} \rangle .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> $h$ is the transform described in [[Gaussian_models#indiv_gaussian3|(4) of Gaussian models]], such that $h(\psi_i)$ can be supposed Gaussian. As well as covariates such as height and age, $\trcov{c}_i$ may also include transformed ones, e.g., $\log$-weight, weight/(height$^2$), etc.<br /> <br /> By combining [[Gaussian_models#indiv_gaussian3|(4) of Gaussian models]] and [[#indiv_cov1|(3)]], we thus obtain the following equivalent representations of $\psi_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_cov2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi_i) &amp; = &amp; h(\psi_{\rm pop})+ \langle \bbeta , \trcov{c}_i - \trcov{c}_{\rm pop} \rangle + \eta_i \,, \quad \eta_i \sim {\cal N}(0,\omega^2) \\ <br /> h(\psi_i) &amp; \sim &amp; {\cal N}(h(\psi_{\rm pop})+ \langle \bbeta , \trcov{c}_i - \trcov{c}_{\rm pop} \rangle , \omega^2). <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> This model gives a clear and easily interpreted decomposition of the variability of $h(\psi_i)$ around $h(\psi_{\rm pop})$, i.e., of<br /> $\psi_i$ around $\psi_{\rm pop}$: <br /> <br /> <br /> &lt;ul&gt;<br /> i) The '''fixed component''' $\langle \bbeta , (\trcov{c}_i - \trcov{c}_{\rm pop}) \rangle$ describes part of this variability by way of covariates $\trcov{c}_i$ that fluctuate around $\trcov{c}_{\rm pop}$.
&lt;br&gt;<br /> <br /> <br /> ii) The '''random component'''$\eta_i$describes the remaining variability, i.e., variability between subjects that have the same covariate values.<br /> &lt;/ul&gt;<br /> <br /> <br /> By definition, a mixed-effects model combines these two components: [http://en.wikipedia.org/wiki/Fixed_effects fixed] and [http://en.wikipedia.org/wiki/Random_effects random effects]. In linear covariate models, these two effects combine additively.<br /> <br /> Here, the vector of population parameters is$\theta = (\psi_{\rm pop},\bbeta,\omega^2)$. We can then use [[Gaussian_models#indiv_gaussian5|(5) of Gaussian models]] to give the pdf of$\psi_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_cov3&quot;&gt;&lt;math&gt;<br /> \ppsii(\psi_i;\trcov{c}_i ,\theta)= \displaystyle{ \frac{h^\prime(\psi_i)}{\sqrt{2 \pi \omega^2} } }\exp \left\{-\frac{1}{2 \, \omega^2} (h(\psi_i) - h(\psi_{\rm pop}) - \langle \bbeta , \trcov{c}_i - \trcov{c}_{\rm pop} \rangle)^2 \right\} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> and the likelihood function:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\theta ; \psi_1,\psi_2,\ldots,\psi_N) \ \ \eqdef \ \ \prod_{i=1}^{N}\ppsii(\psi_i;\trcov{c}_i ,\theta). &lt;/math&gt; }}<br /> <br /> The [http://en.wikipedia.org/wiki/Maximum_likelihood Maximum Likelihood Estimate] (MLE) of$\theta$has a closed form here since the model is linear. 
Let<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \xi &amp;=&amp; \left(<br /> \begin{array}{c}<br /> h(\psi_{\rm pop}) \\<br /> \beta_1 \\<br /> \vdots \\<br /> \beta_L \\<br /> \end{array}<br /> \right)<br /> , \quad<br /> h(\bpsi) = \left(<br /> \begin{array}{c}<br /> h(\psi_1) \\<br /> h(\psi_2) \\<br /> \vdots \\<br /> h(\psi_N) \\<br /> \end{array}<br /> \right)<br /> , \quad<br /> C = \left(<br /> \begin{array}{cccc}<br /> 1 &amp; \trcov{c}_{1,1}- \trcov{c}_{\rm pop,1} &amp; \ldots &amp; \trcov{c}_{1,L}- \trcov{c}_{\rm pop,L} \\<br /> 1 &amp; \trcov{c}_{2,1}- \trcov{c}_{\rm pop,1} &amp; \ldots &amp; \trcov{c}_{2,L}- \trcov{c}_{\rm pop,L} \\<br /> \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\<br /> 1 &amp; \trcov{c}_{N,1}- \trcov{c}_{\rm pop,1} &amp; \ldots &amp; \trcov{c}_{N,L}- \trcov{c}_{\rm pop,L} \\<br /> \end{array}<br /> \right).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Then, $\hat{\xi} \ = \ (C^\prime \, C)^{-1} \, C^\prime \, h(\bpsi)$, and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\omega}^2 &amp;=&amp; \frac{1}{N} \{{!}} h(\bpsi) - C \hat{\xi} \{{!}}^2 \\<br /> &amp;=&amp; \frac{1}{N} \sum_{i=1}^{N} \left(h(\psi_i) - h(\hpsi_{\rm pop}) - \sum_{\ell=1}^{L} \hat{\beta}_\ell (\trcov{c}_{i,\ell} - \trcov{c}_{\rm pop,\ell}) \right)^2 .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;div id=&quot;rem:covariate&quot;&gt;<br /> {{Remarks<br /> |title=Remarks<br /> |text= <br /> 1. Let $ d_{i,\ell} = \trcov{c}_{i,\ell} - \trcov{c}_{\rm pop,\ell}$ and $\teta_i = \omega^{-1}\eta_i$. Then [[#indiv_cov2|(4)]] can be written<br /> <br /> :{{EquationWithRef_Special<br /> |equation=&lt;div id=&quot;indiv_cov2b&quot;&gt;&lt;math&gt;<br /> h(\psi_i) = h(\psi_{\rm pop}) + \beta_1 d_{i,1} + \beta_2 d_{i,2} + \ldots<br /> + \beta_L d_{i,L} + \omega \teta_i , \quad \teta_i \sim {\cal N}(0,1).
<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> Here,$d_{i,1}$,$d_{i,2}$, ...,$d_{i,L}$and$\teta_i$represent the effects that contribute to the fluctuations of$h(\psi_i)$around$h(\psi_{\rm pop})$. Coefficients$\beta_1$,$\beta_2$,$\ldots$,$\beta_L$and$\omega$represent the magnitude of these effects. If the$\ell$-th coefficient is zero, this means that the$\ell$-th covariate has no effect. Similarly,$\omega =0$signifies that there is no random effect.<br /> <br /> <br /> 2. The$d_{i,\ell}$and the random effect$\teta_i$play similar roles. The difference is essentially that the$d_{i,\ell}$are &quot;known&quot; in the modeling context, unlike$\teta_i$.<br /> If the context is simulation, all of them are random variables with their own specified distributions. We can therefore consider a random effect like a covariate that is not observed.<br /> }}<br /> <br /> <br /> <br /> {{Example<br /> |title= Example 1:<br /> |text =<br /> In this example, the individual parameter$\psi_i$is the ''volume of distribution''$V_i$, which we could assume to be$\log$-normally distributed. The weight$w_i$(kg) can be used to explain part of the variability of the volume between individuals:<br /> <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_cov4&quot;&gt;&lt;math&gt;<br /> \log(V_i) = \log (V_{\rm pop}) + \beta (\log(w_i) -\log(70)) + \eta_{i}, <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where$\eta_{i} \sim {\cal N}(0, \omega_V^2)$.<br /> <br /> Here, the covariate used in the statistical model is the log-weight and the reference weight that we decide to choose is$70$kg.<br /> Of course, it would be absolutely equivalent to define the covariate as$c_i=\log(w_i/70)$. 
Then, the reference value of this covariate would become$c_{\rm pop}=0$for an individual of 70kg, and model [[#indiv_cov4|(7)]] can instead be written<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(V_i) = \log (V_{\rm pop}) + \beta \, \log(w_i/70) + \eta_{i}. &lt;/math&gt; }}<br /> <br /> The same model can be expressed in different ways. For instance, taking the exponential gives a model in terms of$V_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; V_i = \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^{\beta} \, e^{\eta_{i} }. &lt;/math&gt; }}<br /> <br /> Here, the predicted volume for an individual with weight$w_i$is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pred{V}_i = \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^{\beta}. &lt;/math&gt; }}<br /> The right-hand side panel of the figure shows how the predicted volume$\pred{V}$increases with weight$w$for different values of$\beta$. Here,$\Vpop$has been set at 10. For$\beta$not equal to 0 or 1, the model is not linear. However, the predicted$\log$-volume (left-hand side panel) does increase linearly with the$\log$-weight:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(\pred{V}_i) = \log(\Vpop) + \beta \, \log(w_i/70). &lt;/math&gt; }}<br /> <br /> <br /> [[File:covariate1b.png|link=]]<br /> <br /> <br /> Of course this model is not unique: there exist several possible transformations of the weight that ensure that the predicted volume increases with weight. Setting for example$c_i=w_i-70$assumes that the predicted log-volume increases linearly with the weight. 
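Example 1 can be sketched numerically. The following is a minimal illustration, not part of the original model library: the values $V_{\rm pop}=10$, $\beta=1$ and $\omega=0.2$, and all variable names, are hypothetical and chosen only for illustration. We simulate log-volumes from model (7) and recover the population parameters with the closed-form least-squares estimator described earlier for linear covariate models:

```python
import math
import random

# Hypothetical values chosen only for illustration
V_pop, beta, omega = 10.0, 1.0, 0.2

random.seed(1)
N = 10_000
w = [random.uniform(50, 90) for _ in range(N)]        # body weights (kg)

# Simulate model (7): log(V_i) = log(V_pop) + beta*(log(w_i) - log(70)) + eta_i
y = [math.log(V_pop) + beta * math.log(wi / 70) + random.gauss(0, omega) for wi in w]

# Closed-form least squares for (log V_pop, beta), regressing y on x_i = log(w_i/70)
x = [math.log(wi / 70) for wi in w]
xbar, ybar = sum(x) / N, sum(y) / N
beta_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))
mu_hat = ybar - beta_hat * xbar                        # estimate of log(V_pop)

# MLE of omega^2: mean squared residual
omega2_hat = sum((yi - mu_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / N

print(math.exp(mu_hat), beta_hat, math.sqrt(omega2_hat))
```

With $N$ this large, the printed estimates land close to the simulated values; because the covariate is already centered at the reference weight ($c_{\rm pop}=\log(70/70)=0$), the regression intercept estimates $\log(V_{\rm pop})$ directly.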
These two covariate models give very similar predictions for$\beta$close to 1 (which is a typical value for [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|PK applications]]).<br /> <br /> <br /> [[File:covariate2b.png|link=]]<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example 2:<br /> |text=<br /> In this second example, we suppose that the bioavailability$F_i$has a logit-normal distribution, and age$a_i$(years) is used as a covariate with a reference age of 40 years:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit(F_i) &amp;=&amp; \logit (\pred{F}_i) + \eta_{F,i} \\<br /> &amp;=&amp; \logit (F_{\rm pop}) + \beta (a_i-40) + \eta_{F,i} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where$\eta_{F,i} \sim {\cal N}(0, \omega_F^2)$. The predicted logit-bioavailability for an individual of age$a_i$is then<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\logit(\pred{F}_i) = \logit (F_{\rm pop}) + \beta (a_i-40). &lt;/math&gt;}}<br /> <br /> We can derive from this equation an expression for$\pred{F}_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pred{F}_i = \displaystyle{ \frac{F_{\rm pop} }{F_{\rm pop} + (1- F_{\rm pop})e^{-\beta (a_i-40)} } }. &lt;/math&gt; }}<br /> <br /> We see in this example how it is much easier to define a model for the transformed parameter$\logit({F}_i)$than for$F_i$itself. 
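The equivalence between the logit-scale linear model and the closed-form expression for $\pred{F}_i$ can be checked numerically. This is a minimal sketch; $F_{\rm pop}=0.8$ and $\beta=-0.05$ are hypothetical values chosen only for illustration:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    # inverse of the logit function
    return 1 / (1 + math.exp(-x))

# Hypothetical population values chosen only for illustration
F_pop, beta = 0.8, -0.05

def F_pred(age):
    # closed-form predicted bioavailability derived in the text
    return F_pop / (F_pop + (1 - F_pop) * math.exp(-beta * (age - 40)))

# Same prediction as inverting the linear model on the logit scale
for age in (20, 40, 60):
    assert abs(F_pred(age) - expit(logit(F_pop) + beta * (age - 40))) < 1e-12

print(F_pred(40))  # reference age: returns F_pop
```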
Furthermore, as the [http://en.wikipedia.org/wiki/Logit_transformation logit transform] is strictly increasing, both vary in the same direction with respect to changes in $a_i$.<br /> <br /> This figure shows how $\pred{F}_i$ and $\logit(\pred{F}_i)$ vary with age for several values of $\beta$.<br /> <br /> <br /> [[File:covariate3b.png|link=]]<br /> <br /> }}<br /> <br /> <br /> &lt;br&gt;&lt;br&gt;<br /> <br /> ==Nonlinear models for continuous variables==<br /> <br /> <br /> Nonlinear models allow for much more general relationships between the covariate vector $\trcov{c}_i$ and the prediction $\hpsi_i$.<br /> Instead of equation [[#indiv_covar|(1)]], we now only assume that there exists some function $m$ and reference value $\psi_{\rm pop}$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hpsi_i &amp;=&amp; \hmodel(\bbeta,\trcov{c}_i) \\<br /> \psi_{\rm pop} &amp;=&amp; \hmodel(\bbeta,\trcov{c}_{\rm pop}).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can either assume that we are still in the Gaussian case, or not.<br /> <br /> <br /> &lt;ul&gt;<br /> * If we hypothesize that we are still working with Gaussian models, then extending the linear model in [[#indiv_cov2|(4)]] is straightforward: we suppose that there exists a [http://en.wikipedia.org/wiki/Monotone_transformation monotone transformation] $h$ such that<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;indiv_cov5&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi_i) &amp;=&amp; h(\hpsi_i)+ \eta_i \\<br /> &amp;=&amp; \mmodel(\bbeta,\trcov{c}_i)+ \eta_i , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> : where $\mmodel(\bbeta,\trcov{c}_i)=h(\hmodel(\bbeta,\trcov{c}_i))$ is the prediction of $h(\psi_i)$.
We can then derive the pdf of $\psi_i$ using [[Gaussian_models#indiv_gaussian5|(5) of Gaussian models]] as before:<br /> <br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;indiv_cov6&quot;&gt;&lt;math&gt;<br /> \ppsii(\psi_i;\trcov{c}_i , \theta )=\displaystyle{ \frac{h^\prime(\psi_i)}{\sqrt{2 \pi \omega^2} } } \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi_i) - \mmodel(\bbeta,\trcov{c}_i))^2 \right\}, <br /> &lt;/math&gt;&lt;/div&gt; <br /> |reference=(9) }}<br /> <br /> : where $\theta=(\bbeta,\omega^2)$. The only difference from the Gaussian linear model is that there is no longer an explicit form available for the MLE of $\theta$. Instead, it is defined as the solution of an [http://en.wikipedia.org/wiki/Optimization_problem optimization problem]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\bbeta} &amp;= &amp;\argmin{\bbeta} \left\{ \sum_{i=1}^{N} \left( h(\psi_i) - \mmodel(\bbeta,\trcov{c}_i) \right)^2 \right\} \\<br /> \hat{\omega}^2 &amp;=&amp; \frac{1}{N} \sum_{i=1}^{N} \left( h(\psi_i) - \mmodel(\hat{\bbeta},\trcov{c}_i) \right)^2 .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> <br /> {{Example<br /> |title=Example 3:<br /> |text=<br /> Consider the following model for $\psi_i$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \psi_i = \displaystyle{ \frac{\beta_1\, e^{\eta_i} }{1 + \beta_2 \, \trcov{c}_i} }, &lt;/math&gt; }}<br /> <br /> where $\eta_i \sim {\cal N}(0, \omega^2)$. We suppose that the $\log$ of $\psi_i$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(\psi_i) \ \ = \ \ \log\left( \displaystyle{ \frac{\beta_1}{1 + \beta_2 \, \trcov{c}_i} } \right) + \eta_i \, , &lt;/math&gt; }}<br /> is Gaussian.
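The transformed-Gaussian pdf (9) can be sanity-checked numerically for this model. In the sketch below, the parameter values $\beta_1=8$, $\beta_2=0.5$, $c=2$ and $\omega=0.3$ are hypothetical, chosen only for illustration; with $h=\log$, the density should integrate to 1 over $(0,+\infty)$:

```python
import math

# Hypothetical parameter values chosen only for illustration
beta1, beta2, c, omega = 8.0, 0.5, 2.0, 0.3
m = math.log(beta1 / (1 + beta2 * c))      # prediction of log(psi_i)

def pdf(psi):
    # transformed-Gaussian pdf (9) with h = log, so h'(psi) = 1/psi
    h, hprime = math.log(psi), 1 / psi
    return hprime / math.sqrt(2 * math.pi * omega ** 2) \
        * math.exp(-(h - m) ** 2 / (2 * omega ** 2))

# Composite trapezoid rule on (0, +inf); the mass is concentrated around exp(m) = 4
a, b, n = 1e-6, 50.0, 100_000
step = (b - a) / n
total = step * (sum(pdf(a + k * step) for k in range(1, n)) + (pdf(a) + pdf(b)) / 2)
print(total)  # should be very close to 1
```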
Here,$h$is the$\log$function,$\hpsi_i = \beta_1/(1 + \beta_2 \, \trcov{c}_i)$and$\psi_{\rm pop}=\beta_1/(1 + \beta_2 \, \trcov{c}_{\rm pop})$.<br /> <br /> Therefore,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \log(\psi_i) &amp;\sim&amp; {\cal N}\left( \log\left( \displaystyle{ \frac{\beta_1}{1 + \beta_2 \, \trcov{c}_i} } \right) , \omega^2\right) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and the optimization problem to solve for the MLE is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> (\hat{\beta_1},\hat{\beta_2})&amp; =&amp; \argmin{\beta_1,\beta_2} \sum_{i=1}^{N} \left(\log(\psi_i) - \log\left( \displaystyle{ \frac{\beta_1}{1 + \beta_2 \, \trcov{c}_i} }\right) \right)^2 \\<br /> \hat{\omega}^2 &amp;=&amp; \frac{1}{N} \sum_{i=1}^{N} \left(\log(\psi_i) - \log\left( \displaystyle{ \frac{\hat{\beta}_1}{1 + \hat{\beta}_2 \, \trcov{c}_i} } \right) \right)^2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> }}<br /> <br /> <br /> &lt;ul&gt;<br /> * For more general distributions of$\psi_i$, we can simply define$\psi_i$as a function of fixed and random effects:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;notGauss&quot;&gt;&lt;math&gt;<br /> \psi_i = \model(\bbeta,\trcov{c}_i,\eta_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> : The prediction$\hpsi_i$is obtained when setting$\eta_i \equiv 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hpsi_i = \model(\bbeta,\trcov{c}_i,\eta_i\equiv 0),<br /> &lt;/math&gt; }}<br /> <br /> : and the population value of$\psi$when$c_i = c_{\rm pop}$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;<br /> \psi_{\rm pop} = \model(\bbeta,\trcov{c}_i \equiv \trcov{c}_{\rm pop},\eta_i\equiv 0). <br /> &lt;/math&gt; }}<br /> <br /> : If the random effects are supposed Gaussian, there always exists an underlying Gaussian model which describes the distribution of$\psi_i$. 
Let $\imodel$ be the function obtained by rearranging [[#notGauss|(10)]] as a function of $\eta_i$:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;\eta_i = \imodel(\bbeta,\trcov{c}_i,\psi_i). &lt;/math&gt; }}<br /> : We can then derive the pdf of $\psi_i$ from that of $\eta_i$,<br /> <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_cov7&quot;&gt;&lt;math&gt;<br /> \ppsii(\psi_i;\trcov{c}_i, \theta )=\displaystyle{ \frac{ \partial}{\partial\psi} }\imodel(\bbeta,\trcov{c}_i,\psi_i) \displaystyle{ \frac{1}{\sqrt{2 \pi \omega^2} } }\exp\left\{-\displaystyle{ \frac{\imodel^2(\bbeta,\trcov{c}_i,\psi_i)}{2 \, \omega^2} } \right\}, <br /> &lt;/math&gt;&lt;/div&gt; <br /> |reference=(11) }}<br /> <br /> : where $\theta=(\bbeta,\omega^2)$ is the vector of population parameters of the model. We can then state the [http://en.wikipedia.org/wiki/Likelihood_function likelihood function]:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\theta ; \psi_1,\psi_2,\ldots,\psi_N) \ \eqdef \ \prod_{i=1}^{N}\ppsii(\psi_i;\trcov{c}_i, \theta ).<br /> &lt;/math&gt; }}<br /> <br /> : The distribution $\qpsii$ and the likelihood ${\like}$ have closed forms if and only if the inverse function $\imodel$ can be computed in closed form, which is not always the case.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{Example<br /> |title= Example 4:<br /> |text=<br /> Suppose now that we use the model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \psi_i = \displaystyle {\frac{\beta_1 \, e^{\eta_i} }{1 + \beta_2 \, \trcov{c}_i \, e^{\eta_i} } },<br /> &lt;/math&gt; }}<br /> <br /> where $\eta_i \sim {\cal N}(0, \omega^2)$.<br /> As before, $\hpsi_i$ is obtained when $\eta_i$ is set to 0: $\hpsi_i = \beta_1/(1 + \beta_2 \, \trcov{c}_i)$, and $\psi_{\rm pop}=\beta_1/(1 + \beta_2 \, \trcov{c}_{\rm pop})$.<br /> <br /> In this example, it is possible to rearrange the formula for $\eta_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \eta_i \ \ = \log \left(
\displaystyle{ \frac{\psi_i}{\beta_1 - \beta_2 \, \trcov{c}_i \, \psi_i} } \right) \ \ \sim \ \ {\cal N}( 0 , \omega^2). <br /> &lt;/math&gt; }}<br /> <br /> It is therefore possible to explicitly give the distribution of $\psi_i$ and the likelihood ${\like}$ using [[#indiv_cov7|(11)]] with $\imodel(\bbeta,\trcov{c}_i,\psi_i)= \log \left( \psi_i/(\beta_1 - \beta_2 \, \trcov{c}_i \, \psi_i) \right)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example 5:<br /> |text = Let us now propose a small modification of the previous model:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \psi_i = \displaystyle{ \frac{\beta_1 + \eta_i}{1 + \beta_2 \, \trcov{c}_i \, e^{\eta_i} } }.<br /> &lt;/math&gt; }}<br /> <br /> The predictions $\hpsi_i$ and $\psi_{\rm pop}$ can both be described as before, but it is no longer possible to explicitly invert the formula in order to express $\eta_i$ as a function of $\psi_i$. Therefore, we cannot explicitly write the likelihood ${\like}$ in this case.<br /> }}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> Even though the great flexibility of such models appears attractive at first glance, we must remain attentive to what we want to use them for and the tasks we want to perform. In a modeling context, remember that the individual parameters are not observed. Choosing a complex model for such variables can pose several problems for model identification and parameter estimation. Even though software like NONMEM allows us to define nonlinear models of covariates, it is not realistic to think that it can correctly estimate them, since the FO and FOCE algorithms implemented in NONMEM are based on linearization of the model.
In this way, a user can precisely define a model, but has little control over the quality of the output.<br /> <br /> <br /> The linear model proposed in [[#indiv_cov1|(3)]] has certain limits, since it cannot represent every imaginable model, but it remains sufficiently flexible (because we can choose the parameter transform $h$ and covariate transforms) and robust (see the Tasks &amp; Tools section) to be successfully used in most situations.<br /> <br /> $\mlxtran$ allows us to write any linear or nonlinear covariate model. Such a model can then easily be used for simulation (using the R/Matlab function {{Verbatim|simulmlx}} for instance). On the other hand, only linear covariate models can be used for estimation with $\monolix$. <br /> }} <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == A model for categorical covariates ==<br /> <br /> <br /> <br /> Categorical variables take a finite number of values from some set that is not necessarily numerical or even ordered, e.g., gender, country and ethnicity.<br /> <br /> The approach taken for continuous covariates extends easily to categorical ones.<br /> For simplicity's sake, let us consider a single covariate $\trcov{c}_i$ that takes its values in $\{ a_1, a_2, \ldots, a_K\}$, and a single parameter $\psi_i$. A reference covariate value $\trcov{c}_{\rm pop}$ here is a reference category, i.e., a specific element $a_{\kref}$ of $\{ a_1, a_2, \ldots, a_K\}$. The prediction of $\psi_i$ is thus given by the following model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;lin_cov3&quot;&gt;&lt;math&gt; <br /> h(\hpsi_i) = h(\psi_{\rm pop})+ \beta_1 \one_{\trcov{c}_i=a_1} + \beta_2 \one_{\trcov{c}_i=a_2} + \ldots + \beta_K \one_{\trcov{c}_i=a_K}, <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> where $\one$ is the indicator function ($\one_A = 1$ if $A$ is true, 0 otherwise) and with $\beta_{\kref} = 0$.
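Model (12) can be sketched numerically. In the minimal sketch below, the categories A, B, C, the value $\psi_{\rm pop}=5$ and the coefficients are hypothetical and chosen only for illustration; A plays the role of the reference category $a_{\kref}$, so its coefficient is fixed to 0, and we take $h=\log$:

```python
import math

# Hypothetical setup chosen only for illustration; "A" is the reference
# category (its coefficient is fixed to 0), and h = log.
psi_pop = 5.0
beta = {"A": 0.0, "B": 0.3, "C": -0.2}

def psi_pred(category):
    # h(psi_pred) = h(psi_pop) + beta_k  =>  psi_pred = psi_pop * exp(beta_k)
    return math.exp(math.log(psi_pop) + beta[category])

print(psi_pred("A"))  # reference category: psi_pop itself
print(psi_pred("B"))  # psi_pop scaled by exp(0.3)
```

With $h=\log$, each non-reference category simply rescales the reference prediction by $e^{\beta_k}$.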
Then, [[#lin_cov3|(12)]] is equivalent to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> h(\hpsi_i) = \left\{<br /> \begin{array}{ll}<br /> h(\psi_{\rm pop}) &amp; {\rm if \quad} \trcov{c}_i=a_{\kref} \\<br /> h(\psi_{\rm pop}) + \beta_k &amp; {\rm if \quad } \trcov{c}_i=a_k \neq a_{\kref} <br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> We see that if the covariate has$K$categories, then$K-1$coefficients$(\beta_k)$are required for defining the covariate model.<br /> <br /> <br /> {{Example<br /> |title=Example 6:<br /> |text = <br /> Assume that the individual [http://en.wikipedia.org/wiki/Clearance_%28medicine%29 clearance] (of a drug) depends on gender. Here, the gender$g_i$of individual$i$can either be female or male. We arbitrarily choose female as reference gender. Assuming a log-normal distribution for clearance, the model can be written as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(Cl_i) = \log(Cl_{\rm pop}) + \beta \one_{g_i={\rm male} } + \eta_i , &lt;/math&gt; }}<br /> <br /> and the predicted clearance is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pred{Cl}_i = \left\{<br /> \begin{array}{ll}<br /> Cl_{\rm pop} &amp; {\rm if \quad g_i= female} \\<br /> Cl_{\rm pop} \, e^\beta &amp; {\rm if \quad g_i= male}.<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{Example<br /> |title=Example 7:<br /> |text= We want to model the variation in weight between individuals of three countries: India, US and China.<br /> Assuming a normal distribution for weight and India as the reference country, we have:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; w_i = w_{\rm pop} + \beta_1 \one_{o_i={\rm US} } + \beta_2 \one_{o_i={\rm China} } + \eta_i. 
&lt;/math&gt; }}<br /> <br /> The predicted weight is therefore<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pred{w}_i = \left\{<br /> \begin{array}{ll}<br /> w_{\rm pop} &amp; {\rm if \quad o_i= India} \\<br /> w_{\rm pop} + \beta_1 &amp; {\rm if \quad o_i= US} \\<br /> w_{\rm pop} + \beta_2 &amp; {\rm if \quad o_i= China}.<br /> \end{array}<br /> \right. &lt;/math&gt; }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for covariate models ==<br /> <br /> <br /> {{ExampleWithCode_bis<br /> |title1=Example 1 (linear model):<br /> |title2= <br /> |text= <br /> Two covariates:<br /> <br /> &lt;blockquote&gt;<br /> * weight $w_i$: continuous covariate,<br /> <br /> * gender $g_i$: categorical covariate, $g_i\in\{ {\rm F, M}\}$.<br /> &lt;/blockquote&gt;<br /> <br /> <br /> Vector of individual parameters: $\psi_i = (ka_i, V_i, Cl_i)$,<br /> <br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(ka_i) &amp;\sim&amp; {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\log(V_{\rm pop}) + \beta_{V,w}\log(w_i/70), \omega_V^2) \\<br /> \log(Cl_i) &amp;\sim&amp; {\cal N}(\log(Cl_{\rm pop}) + \beta_{Cl,w}\log(w_i/70) + \\<br /> &amp; &amp; \beta_{Cl,g}\one_{g_i=M} , \omega_{Cl}^2) \\<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={ka_pop, V_pop, Cl_pop, beta_V, beta1_Cl, beta2_Cl, <br /> omega_ka, omega_V, omega_Cl, weight, gender}<br /> <br /> EQUATION:<br /> lw70=log(weight/70)<br /> <br /> DEFINITION:<br /> ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br /> V = {distribution=lognormal,reference=V_pop, <br /> covariate=lw70,coefficient=beta_V,sd=omega_V}<br /> Cl = {distribution=lognormal,reference=Cl_pop, <br /> covariate={lw70,gender},<br /> coefficient={beta1_Cl,beta2_Cl},sd=omega_Cl}<br /> &lt;/pre&gt;}}<br /> <br /> <br /> |textAgain= (It is
assumed here that {{Verbatim|gender}} has been previously defined as a categorical covariate with two categories {F, M} and F as reference category).<br /> }}<br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2 (nonlinear model):<br /> |title2=<br /> |text=<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \log(\psi_i) &amp;\sim&amp; {\cal N}\left( \log\left( \displaystyle{ \frac{\beta_1}{1 + \beta_2 \, \trcov{c}_i} }\right) , \omega^2\right) <br /> \end{eqnarray}&lt;/math&gt; <br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={beta1, beta2, omega, c}<br /> <br /> EQUATION:<br /> predpsi = beta1/(1+beta2*c)<br /> <br /> DEFINITION:<br /> psi = {distribution=lognormal,prediction=predpsi,sd=omega}<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Gaussian models<br /> |linkNext=Extension to multivariate distributions }}</div> Admin https://wiki.inria.fr/popix/Gaussian_models Gaussian models 2013-04-11T09:59:10Z <p>Admin: /* Extensions of the normal distribution */</p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The normal distribution ==<br /> <br /> Gaussian models have several advantages, including the capacity to describe with ease both the predicted value of a random variable and its fluctuations around this value.
Indeed, if we consider a [http://en.wikipedia.org/wiki/Gaussian_random_variable Gaussian random variable]$\psi$with [http://en.wikipedia.org/wiki/Mean mean]$\mu$and [http://en.wikipedia.org/wiki/Standard_deviation standard deviation]$\omega$, we can work with two entirely equivalent mathematical representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian1&quot;&gt;&lt;math&gt; \begin{eqnarray}<br /> \psi &amp;\sim&amp; {\cal N}(\mu , \omega^2) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian2&quot;&gt;&lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; \mu + \eta, \quad {\rm where }\ \quad \ \eta \sim {\cal N}(0,\omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> The form [[#indiv_gaussian1|(1)]] provides an explicit description of the distribution of$\psi$from which we can deduce the [http://en.wikipedia.org/wiki/Probability_density_function pdf] and other characteristics such as the [http://en.wikipedia.org/wiki/Median median], [http://en.wikipedia.org/wiki/Mode_%28statistics%29 mode] and [http://en.wikipedia.org/wiki/Quantile quantiles]. The figure below shows the pdf of a [http://en.wikipedia.org/wiki/Normal_distribution normal distribution] with mean$\mu$and standard deviation$\omega$. 
<br /> Each vertical band contains 10% of the distribution.<br /> <br /> <br /> :{{ImageWithCaption|image=Ndistrib.png|caption=The ${\cal N}(\mu,\omega^2)$ distribution}}<br /> <br /> <br /> This type of graphical representation is powerful and helps us to visualize the values that the random variable can take and which of them are more likely than others.<br /> <br /> Examples of normal distributions with various parameters are shown in the next figure.<br /> <br /> <br /> {{ImageWithCaption|image=distrib1.png|caption=Normal distributions}}<br /> <br /> <br /> Representation [[#indiv_gaussian2|(2)]] lets us separate the random and non-random components of $\psi$. If we define the predicted value as the value obtained in the absence of randomness ($\eta=0$), then $\hat{\psi}=\mu$. In the particular case of a normal distribution, this predicted value is the mean, median and mode of $\psi$. We can therefore rewrite equations [[#indiv_gaussian1|(1)]] and [[#indiv_gaussian2|(2)]] using $\hpsi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;\sim&amp; {\cal N}(\hpsi , \omega^2) \\<br /> \psi &amp;=&amp; \hpsi + \eta, \quad {\rm where } \quad \ \ \eta \sim {\cal N}(0,\omega^2) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions of the normal distribution == <br /> <br /> Clearly, not all distributions are Gaussian.
To begin with, the normal distribution has the support$\Rset$, unlike many parameters that take values in precise ranges; some variables take only positive values (e.g., [http://en.wikipedia.org/wiki/Concentration concentrations] and [http://en.wikipedia.org/wiki/Volume volumes]) and others are restricted to bounded intervals (e.g., [http://en.wikipedia.org/wiki/Bioavailability bioavailability]).<br /> <br /> Furthermore, the [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distribution] is symmetric, which is not a property shared by all distributions. One way to extend the use of [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distributions] is to consider that some transform of the parameters we are interested in is Gaussian, i.e., assume the existence of a [http://en.wikipedia.org/wiki/Monotonic_function monotonic function]$h$such that$h(\psi)$is normally distributed. Then, there exists some$\mu$and$\omega$such that$h(\psi) \sim {\cal N}(\mu , \omega^2)$.<br /> <br /> For a given transformation$h$, we can parametrize using$\hat{\psi}$, the predicted value of$\psi$. Indeed, the predicted value of$h(\psi)$is$\mu=h(\hat{\psi})$, and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi) &amp;\sim&amp; {\cal N}(h(\hat{\psi}) , \omega^2) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian4&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi) &amp;=&amp; h(\hat{\psi}) + \eta , \quad {\rm where } \quad \ \eta \sim {\cal N}(0,\omega^2). 
<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is possible to derive the pdf of $\psi$ from [[#indiv_gaussian3|(3)]]:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian5&quot;&gt;&lt;math&gt;<br /> \ppsi(\psi)=\displaystyle{ \frac{h^\prime(\psi)}{\sqrt{2 \pi \omega^2} } } \ \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi) - h(\hpsi))^2 \right\}. &lt;/math&gt;&lt;/div&gt; <br /> |reference=(5) }}<br /> <br /> Let us now see some examples of transformed normal pdfs:<br /> <br /> <br /> &lt;br&gt;<br /> ===Log-normal distribution===<br /> <br /> The [http://en.wikipedia.org/wiki/Log-normal_distribution log-normal distribution] is widely used for describing the distribution of PK/PD parameters. This choice is usually justified by the fact that it ensures non-negative values, and rarely because it is shown to properly describe the population distribution of the parameter of interest.<br /> <br /> Let $\psi$ be a log-normally distributed random variable with parameters $(\mu,\omega)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\log(\psi) \sim {\cal N}( \mu, \omega^2). &lt;/math&gt; }}<br /> <br /> This distribution can also be parameterized with $(m,\omega)$, where $m = e^{\mu} = \hat{\psi}$.
Then, $\log(\psi) \sim {\cal N}( \log(m), \omega^2)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \ppsi(\psi)=\displaystyle{ \frac{1}{\psi \, \sqrt{2 \pi \omega^2} } }\ \exp\left\{- \displaystyle{\frac{1}{2 \, \omega^2} (\log(\psi) - \log(m))^2} \right\}.<br /> &lt;/math&gt; }}<br /> <br /> We display below some log-normal pdfs obtained with different parameters $(m,\omega)$.<br /> <br /> <br /> {{ImageWithCaption|image=distrib2.png|caption=Log-normal distributions}}<br /> <br /> <br /> We see that for a given standard deviation $\omega$, the pdfs obtained for different $m$ are simply rescaled:<br /> {{Equation1|equation=&lt;math&gt; f_{\alpha m,\omega}(x) = \frac{f_{m,\omega}(x/\alpha)}{\alpha} . &lt;/math&gt; }}<br /> On the other hand, for a given $m$ the asymmetry of the distribution increases when the standard deviation $\omega$ increases.<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> Note that the log-normal distribution takes its values in $(0,+\infty)$. It is straightforward to define a shifted distribution in $(a,+\infty)$:<br /> <br /> {{Equation1<br /> |equation= <br /> &lt;math&gt;\begin{eqnarray}<br /> \log(\psi-a) &amp;\sim&amp; {\cal N}( \log(m-a), \omega^2).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Power-normal (or Box-Cox) distribution===<br /> <br /> <br /> This is the distribution of a random variable $\psi$ for which the [http://en.wikipedia.org/wiki/Box-Cox_transformation Box-Cox transformation] of $\psi$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> h(\psi) = \displaystyle{ \frac{\psi^\lambda -1}{\lambda} }<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> (with $\lambda &gt; 0$) follows a normal distribution ${\cal N}( \mu, \omega^2)$ truncated such that $h(\psi) &gt; -1/\lambda$.
It therefore takes its values in $(0,+\infty)$.<br /> The distribution converges to the log-normal distribution when $\lambda \to 0$ and to a truncated normal distribution when $\lambda \to 1$.<br /> The main interest of a power-normal distribution is its ability to represent a distribution &quot;between&quot; the log-normal distribution and the normal distribution.<br /> <br /> Here, $m = \hat{\psi} = (\lambda \mu + 1)^{1/\lambda}$.<br /> We display below several power-normal pdfs obtained with various parameter sets $(\lambda,m,\omega)$.<br /> <br /> <br /> {{ImageWithCaption|image=distrib3.png|caption=Power-normal distributions }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Logit-normal and probit-normal distributions===<br /> <br /> A random variable $\psi$ with a [http://en.wikipedia.org/wiki/Logit-normal_distribution logit-normal distribution] takes its values in $(0,1)$. The [http://en.wikipedia.org/wiki/Logit logit] of $\psi$ is normally distributed, i.e.,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit(\psi) &amp;= &amp;\log \left(\displaystyle{ \frac{\psi}{1-\psi} }\right) \<br /> \sim \ \ {\cal N}( \mu, \omega^2) \\<br /> m &amp;=&amp; \displaystyle{ \frac{1}{1+e^{-\mu} } }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This means that $\mu=\logit(m)$.<br /> <br /> A random variable $\psi$ with a probit-normal distribution also takes its values in $(0,1)$.
Then, the &lt;balloon title=&quot;The probit function is the inverse cumulative distribution function (quantile function) &amp;Phi;^-1 associated with the standard normal distribution N(0,1).&quot; style=&quot;color:#177245&quot;&gt;probit&lt;/balloon&gt; of $\psi$ is normally distributed:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probit(\psi) &amp;= &amp;\Phi^{-1}(\psi) \<br /> \sim \ {\cal N}( \mu, \omega^2) \\<br /> m &amp;=&amp; \Phi(\mu).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This means that $\mu=\probit(m)$.<br /> <br /> We can see in the figures below that the pdfs of the logit-normal and [http://en.wikipedia.org/wiki/Probit probit]-normal distributions with the same $m$ and well-chosen $\omega$ are very similar. Thus, these two distributions can be used interchangeably for modeling the distribution of a parameter that takes its values in $(0,1)$.<br /> <br /> <br /> {{ImageWithCaption|image=distribution4.png|caption=Logit-normal and probit-normal distributions }}<br /> <br /> <br /> Logit and probit transformations can be generalized to any interval $(a,b)$ by setting<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \psi = a + (b-a)\tilde{\psi}, &lt;/math&gt; }}<br /> <br /> where $\tilde{\psi}$ is a random variable that takes its values in $(0,1)$ with a logit-normal (or probit-normal) distribution.<br /> <br /> Furthermore, it is easy to show that the probit-normal distribution with $m=0.5$ and $\omega=1$ is the uniform distribution on $(0,1)$.<br /> Thus, any uniform distribution can easily be derived from the probit-normal distribution.<br /> <br /> <br /> &lt;br&gt;<br /> === Extension to transformed Student's $t$-distributions ===<br /> <br /> These extensions (log-$t$, power-$t$, etc.) can be obtained simply by replacing the normal distribution of the [http://en.wikipedia.org/wiki/Random_effect random effects] with a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student $t$-distribution].
Such extensions can be useful for modeling heavy-tailed distributions. Several [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distributions] with different degrees of freedom (d.f.) are displayed below. The [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] converges to the normal distribution as the d.f. increases, whereas heavy tails are obtained for small d.f.<br /> <br /> <br /> {{ImageWithCaption|image=student.png|caption=Standardized normal and Student's $t$ probability distribution functions }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for the Gaussian model == <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example<br /> |title2=<br /> |text=<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit(F_i) &amp;\sim&amp; {\cal N}(\logit(F_{\rm pop}), \omega_F^2) \\<br /> \log(ka_i) &amp;\sim&amp; {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br /> V_i &amp;\sim&amp; {\cal N}(V_{\rm pop}, \omega_V^2) \\<br /> \displaystyle{\frac{Cl_i^{\lambda_{Cl} } - 1}{\lambda_{Cl} } } &amp;\sim&amp; {\cal N}(\frac{Cl_{\rm pop}^{\lambda_{Cl} } - 1}{\lambda_{Cl} }, \omega_{Cl}^2) <br /> \end{eqnarray}&lt;/math&gt; <br /> |code= <br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl, <br /> omega_F, omega_ka, omega_V, omega_Cl}<br /> <br /> DEFINITION:<br /> F = {distribution=logitnormal,reference=F_pop,sd=omega_F}<br /> ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br /> V = {distribution=normal,reference=V_pop,sd=omega_V}<br /> Cl = {distribution=powernormal,<br /> reference=Cl_pop,power=lambda_Cl,sd=omega_Cl}<br /> &lt;/pre&gt; }}<br /> <br /> }}<br /> <br /> {{Back&amp;Next<br /> |linkBack=Modeling the individual parameters<br /> |linkNext=Model with covariates }}</div> Admin https://wiki.inria.fr/popix/Joint_data_models Joint data models 2013-04-09T09:21:25Z <p>Admin: /* Independent observations */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> An important goal of longitudinal studies is to characterize relationships between different types of response data.<br /> <br /> For instance, in a PKPD population study, we may be interested in the relationship between certain pharmacokinetics (absorption, distribution, [http://en.wikipedia.org/wiki/Metabolism metabolism] and [http://en.wikipedia.org/wiki/Elimination_%28pharmacology%29 excretion]) and pharmacodynamics (biochemical and physiological effects) of a drug.
To do this, we need to measure both types of response data for several individuals from the same population, and then try to characterize the relationship between them.<br /> <br /> Alternatively, many [http://en.wikipedia.org/wiki/Clinical_trial clinical trials] and reliability studies generate both longitudinal and survival ([[Models for time-to-event data |time-to-event]]) data. For example, in HIV clinical trials the viral load and the concentration of [http://en.wikipedia.org/wiki/CD4%2B_cells CD4] cells are widely used as [http://en.wikipedia.org/wiki/Biomarker biomarkers] for progression to AIDS when studying the efficacy of drugs to treat HIV-infected patients. We might then be interested in the relationship between these variables and events such as [http://en.wikipedia.org/wiki/Seroconversion seroconversion] or death.<br /> <br /> Therefore, in general a ''joint model'' is one that allows us to simultaneously describe the distribution of different types of observations made on the same individual.
As usual, we consider this in the population context.<br /> <br /> Suppose that we have $L$ different types of observations for individual $i$: $y_i^{(1)}=(y_{ij}^{(1)},1\leq j \leq n_{i,1})$, $y_i^{(2)}=(y_{ij}^{(2)},1\leq j \leq n_{i,2})$, ..., $y_i^{(L)}=(y_{ij}^{(L)},1\leq j \leq n_{i,L})$, where $n_{i,\ell}$ is the number of observations of type $\ell$ made on individual $i$.<br /> Note that for a given individual, $n_{i,\ell}$ may differ from one observation type $\ell$ to another, as may the observation times $(t_{ij}^{(\ell)})$.<br /> <br /> Denote by $y_i$ the set of observations for individual $i$: $ y_i = (y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)})$.<br /> For each individual, the joint probability distribution of the observations $y_i$ and the individual parameters $\psi_i$ can be decomposed as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray} <br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pcyipsii(y_i {{!}} \psi_i) \, \ppsii(\psi_i;\theta) \\<br /> &amp; =&amp; \pcyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)} {{!}} \psi_i) \, \ppsii(\psi_i;\theta) .
<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then distinguish between three types of dependency between observations: independence, conditional independence and conditional dependence.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Independent observations ==<br /> <br /> Suppose first that the vector of individual parameters $\psi_i$ can be decomposed into $L$ independent sub-vectors $\psi_i^{(1)}$, $\psi_i^{(2)}$, ..., $\psi_i^{(L)}$ such that $y_i^{(\ell)}$ depends only on $\psi_i^{(\ell)}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii\left(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i^{(1)}, \psi_i^{(2)}, \ldots , \psi_i^{(L)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)},\psi_i^{(\ell)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)} {{!}} \psi_i^{(\ell)}\right) \pmacro\left(\psi_i^{(\ell)};\theta\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, joint modeling does not bring anything new to the picture because all of the information on $\psi_i^{(\ell)}$ is contained in the related set of observations $y_i^{(\ell)}$. We can therefore model each set of observations separately.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK and PD model for [http://en.wikipedia.org/wiki/Warfarin warfarin] data<br /> |text=<br /> Here, 32 healthy volunteers received a 1.5 mg/kg single oral dose of warfarin, an anticoagulant normally used in the prevention of [http://en.wikipedia.org/wiki/Thrombosis thrombosis].
The warfarin plasma concentration $C$ and the [http://en.wikipedia.org/wiki/Prothrombin prothrombin] complex activity (PCA) $E$ were then measured at various times for these individuals.<br /> The figure represents the PK data (on the left) and the PD data (on the right).<br /> <br /> <br /> {{ImageWithCaption|image=warf0.png|caption= warfarin PK and PD data }}<br /> <br /> <br /> First, we consider two entirely independent parametric models for the PK and PD data: a simple one-compartment model $f_1$ for the PK and a rebound model $f_2$ for the PD. For any $t&gt;0$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; 100\left(\displaystyle{ \frac{\beta}{1+\beta} } e^{-\alpha \, t} + \displaystyle{ \frac{1}{1+\beta \, e^{-\gamma \, t} } }\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then model the observations by supposing, for example, a combined error model for the PK data and an additive one for the PD data:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> y_{ij}^{(1)} &amp;=&amp; C(t_{ij}^{(1)} ; \psi_i^{(1)}) + (a_1 + b_1\,C(t_{ij}^{(1)};\psi_i^{(1)}))\teps_{ij}^{(1)} \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> y_{ij}^{(2)} &amp;=&amp; E(t_{ij}^{(2)} ; \psi_i^{(2)}) + a_2 \, \teps_{ij}^{(2)} , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $\psi_i^{(1)}=(ka_i,V_i, ke_i)$ and $\psi_i^{(2)}=(\alpha_i,\beta_i,\gamma_i)$ are independent individual parameter vectors that we suppose to be log-normally distributed.<br /> <br /> Now that the two models have been defined, we can jointly model the two data types.
Since they are independent, we can simply use the PK model to fit the concentration data and the PD model to fit the PCA data. The figure shows the observed data and the individual predictions given by the two models for the &lt;balloon title=&quot;Monolix was used to fit the models. Note that the PD model is for illustrative purposes only; even though it fits the data well, it has no biological interpretation&quot; style=&quot;color:#177245&quot;&gt;4 individuals&lt;/balloon&gt;. <br /> <br /> <br /> &lt;div style=&quot;padding-left:4em&quot;&gt;[[File:warfpkfit1.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit1.png|caption=Jointly fitted PK and PD warfarin data for 4 individuals using two independent models }}<br /> }}<br /> <br /> <br /> In the same way that we jointly modeled these two types of independent continuous data, we can construct joint models that combine several types of data, i.e., various combinations of continuous, categorical, count and survival data, provided they are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> |text=<br /> Consider the following joint model for survival and longitudinal data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i^{(1)}) + g(t_{ij} ;\psi_i^{(1)})\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; \psi_i^{(2)}) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The continuous outcome $y_{ij}$ and the time to event $T_i$ are independent if $\psi_i^{(1)}$ and $\psi_i^{(2)}$ are independent.<br /> <br /> <br /> {{Remarks <br /> |title=Remark<br /> |text= If the event is ''drop-out'', it is sometimes called [http://en.wikipedia.org/wiki/Missing_completely_at_random MCAR] (missing completely at random). This means that the continuous outcome does not provide any information about drop-out.
}}<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally independent observations ==<br /> <br /> In this case, the various observation types no longer depend only on disjoint (i.e., independent) individual parameters. We therefore write $\psi_i$ for the overall set of (partially or fully shared)<br /> individual parameters. Observations are nevertheless assumed to be independent when conditioning on $\psi_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \left( \prod_{\ell=1}^{L} \pmacro(y_i^{(\ell)} {{!}} \psi_i) \right) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> In such cases, each type of observation provides information on the individual parameter vector $\psi_i$.<br /> <br /> This is the most common case when we are simultaneously modeling different types of longitudinal data of the form:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij}^{(1)} &amp;=&amp; f_1(t_{ij}^{(1)} ; \psi_i) + g_1(t_{ij}^{(1)};\psi_i)\teps_{ij}^{(1)} \\<br /> y_{ij}^{(2)} &amp;=&amp; f_2(t_{ij}^{(2)} ; \psi_i) + g_2(t_{ij}^{(2)};\psi_i)\teps_{ij}^{(2)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the predictions $f_1$ and $f_2$ both depend on the same vector of individual parameters, which induces dependency between the observations $y_{i}^{(1)}$ and $y_{i}^{(2)}$. However, these observations are ''conditionally independent'' if the residual errors $\teps_{ij}^{(1)}$ and $\teps_{ij}^{(2)}$ are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A joint PKPD model for warfarin data<br /> <br /> |text=<br /> Pertinent PKPD models aim to establish a link between a drug's concentration and its effect.<br /> An indirect response model assumes that a drug does not instantaneously affect the PD response.
Instead, the drug affects a precursor which then influences the PD measure. Here, as warfarin levels increase, prothrombin synthesis is inhibited, which in turn has anti-coagulant effects. Such phenomena can be approximated with a very simple ODE-based mathematical model for the PD component (we use the same one compartment model for the PK component):<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; \displaystyle{ \frac{k_{in} }{ k_{out} } }, \ \ \ \ t\leq 0 \\<br /> \displaystyle{ \frac{d}{dt} }E(t) &amp;=&amp; k_{in}\left( 1 - \displaystyle{ \frac{C(t)}{IC_{50} + C(t)} } \right) - k_{out}\,E(t), \ \ \ \ t &gt;0 .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We could then use the same residual error models [[#eq:warf1|(1)]] and [[#eq:warf2|(2)]] given in the previous example.<br /> <br /> We can also suppose that the vectors$\psi_i^{(1)}=(ka_i,V_i, ke_i)$and$\psi_i^{(2)}=(IC_{50,i},k_{in,i},k_{out,i})$are independent, but the fact that the effect$E$predicted by the model is a function<br /> of the concentration$C$introduces dependence between the two observation types because both depend on the PK parameters$\psi_i^{(1)}$.<br /> <br /> If the residual errors$(\teps_{ij}^{(1)})$and$(\teps_{ij}^{(2)})$are independent, then the observations are conditionally independent, i.e., when the predicted concentration$C(t)$is given, the observed concentrations$\by^{(1)}$do not bring any further information on the distribution of the PD observations$\by^{(2)}$.<br /> <br /> This joint model can be used to model the same warfarin data as before (again, using$\monolix$).<br /> The figure shows the resulting individual predictions.<br /> <br /> <br /> &lt;div style=&quot;margin-left:4.2em&quot;&gt;[[File:warfpkfit2.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit2.png|caption=Fitted PK and PD 
warfarin data for 4 individuals using a conditionally independent joint model}}<br /> }}<br /> <br /> <br /> We can extend this framework to different types of data, considering for example categorical observations $y_i^{(2)}$ for which the probabilities $\prob{y_{ij}^{(2)} = k}$ depend on $f_1(t_{ij}^{(2)};\psi_i)$ and consequently on $\psi_i$. We can also consider survival data for which the hazard function depends on $f_1$.<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> <br /> |text=Consider a joint model for survival and longitudinal data, assuming now that the hazard function (or equivalently the survival function) depends on the continuous data prediction:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; f(t ; \psi_i)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> If for instance $(y_{ij})$ is the measured viral load of an HIV-infected patient, we can assume that the probability of events such as death, seroconversion or drop-out depends on the &quot;true&quot; viral load $f(t ; \psi_i)$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If the event is ''drop-out'', it is sometimes called MAR (missing at random). This means that the probability of drop-out depends on some of the individual parameters, but that the observed value of the continuous outcome itself does not provide any additional information. In our example, this means that the probability that a patient leaves the study depends on their true state (i.e., their true but unknown viral load), and not on the measured viral load values.
}}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally dependent observations ==<br /> <br /> In this case, there is a dependency structure between types of observation that no longer allows us to decompose the joint model into a product of models with only one type of observation in each.<br /> <br /> This kind of dependency occurs when several types of longitudinal data are obtained at the same times, with correlated measurement errors. The joint conditional distribution$\qcyipsii$of the observations is<br /> Gaussian if the residual errors are. The dependency structure between observations can then be characterized by a variance-covariance matrix for the errors.<br /> <br /> We can also consider a natural decomposition of this joint distribution into a product of conditional distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \pmacro(y_i^{(1)} {{!}} \psi_i;\theta) \pmacro(y_i^{(2)} {{!}} y_i^{(1)}, \psi_i;\theta)\ldots \pmacro(y_i^{(L)} {{!}} y_i^{(1)},\ldots,y_i^{(L-1)}, \psi_i;\theta) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the distribution of$y_i^{(2)}$depends on the observation$y_i^{(1)}$, the distribution of$y_i^{(3)}$depends on$y_i^{(1)}$and$y_i^{(2)}$, etc.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A longitudinal data and drop-out model<br /> <br /> |text= Consider a joint model for longitudinal data and drop-out, assuming now that the hazard function (or equivalently the survival function) depends on the observed data itself:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; (y_{ij}, t_{ij}&lt;t)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This drop-out mechanism 
is sometimes called MNAR (missing not at random).<br /> In this example where$(y_{ij}, t_{ij}&lt;t)$is the sequence of measured viral loads before time$t$, MNAR means that the probability that a patient leaves the study depends on their previously-measured viral concentrations. <br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> ==$\mlxtran$for joint models==<br /> <br /> TO DO<br /> --&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{albert2004modeling,<br /> title={Modeling repeated count data subject to informative dropout},<br /> author={Albert, P. S. and Follmann, D. A.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={3},<br /> pages={667-677},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{chi2006joint,<br /> title={Joint models for multivariate longitudinal and multivariate survival data},<br /> author={Chi, Y.-Y. and Ibrahim, J. G.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={2},<br /> pages={432-445},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{de1994modelling,<br /> title={Modelling progression of CD4-lymphocyte count and its relationship to survival time},<br /> author={De Gruttola, V. and Tu, X. M.},<br /> journal={Biometrics},<br /> pages={1003-1014},<br /> year={1994},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{henderson2000joint,<br /> title={Joint modelling of longitudinal measurements and event time data.},<br /> author={Henderson, R. and Diggle, P. 
and Dobson, A.},<br /> journal={Biostatistics},<br /> volume={1},<br /> number={4},<br /> pages={465-480},<br /> year={2000},<br /> publisher={Biometrika Trust}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hsieh2006joint,<br /> title={Joint modeling of survival and longitudinal data: likelihood approach revisited},<br /> author={Hsieh, F. and Tseng, Y.-K. and Wang, J.-L.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={4},<br /> pages={1037-1043},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hu2003joint,<br /> title={A joint model for nonlinear longitudinal data with informative dropout},<br /> author={Hu, C. and Sale, M. E.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={30},<br /> number={1},<br /> pages={83-103},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{liu2009,<br /> author = {Liu, L. and Huang, X. },<br /> title = {Joint analysis of correlated repeated measures and recurrent events processes in the presence of a dependent terminal event},<br /> journal = {J. ROY. STAT. SOC. C-APP.},<br /> volume = {58},<br /> pages = {65-81},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{rizopoulos2012,<br /> author = {Rizopoulos, D. },<br /> title = {Joint Models for Longitudinal and Time-to-Event Data. With Applications in R.},<br /> publisher = {Chapman &amp; Hall/CRC Biostatistics},<br /> address = {Boca Raton},<br /> year = {2012}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{rondeau2007,<br /> author = {Rondeau, V. and Mathoulin-Pelissier, S. and Jacqmin-Gadda, H. and Brouste, V. and Soubeyran, P. 
},<br /> title = {Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events.},<br /> journal = {Biostatistics},<br /> volume = {8},<br /> pages = {708-721},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{song2004semiparametric,<br /> title={A Semiparametric Likelihood Approach to Joint Modeling of Longitudinal and Time-to-Event Data},<br /> author={Song, X. and Davidian, M. and Tsiatis, A. A.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={4},<br /> pages={742-753},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tsiatis2004joint,<br /> title={Joint modeling of longitudinal and time-to-event data: an overview},<br /> author={Tsiatis, A. A. and Davidian, M.},<br /> journal={Statistica Sinica},<br /> volume={14},<br /> number={3},<br /> pages={809-834},<br /> year={2004}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu2002joint,<br /> title={A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies},<br /> author={Wu, L.},<br /> journal={Journal of the American Statistical Association},<br /> volume={97},<br /> number={460},<br /> pages={955-964},<br /> year={2002},<br /> publisher={American Statistical Association}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wulfsohn1997joint,<br /> title={A joint model for survival and longitudinal data measured with error},<br /> author={Wulfsohn, M. S. and Tsiatis, A.
A.},<br /> journal={Biometrics},<br /> pages={330-339},<br /> year={1997},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkNext=Extensions<br /> |linkBack=Models for time-to-event data }}</div> Admin https://wiki.inria.fr/popix/Time-to-event_data_models Time-to-event data models 2013-03-28T11:10:33Z <p>Admin: /* Repeated events */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Here, observations are the &quot;times at which events occur&quot;. An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, metro strike).<br /> <br /> &lt;br&gt;<br /> ==Single event==<br /> <br /> <br /> To begin with, we will consider a one-off event.<br /> Depending on the application, the length of time to this event may be called the ''survival'' time (until death), ''failure'' time (until hardware fails), etc. To be general, we can just say ''event'' time.<br /> <br /> The random variable representing the event time for subject$i$is typically written$T_i$. Several situations are then possible to define the observations:<br /> <br /> <br /> &lt;ul&gt;<br /> * The event time is exactly observed.<br /> <br /> <br /> ::[[File:survival1.png|link=]]<br /> <br /> <br /> : Then, the observation for individual$i$is$y_i = t_i$, where$t_i$is a realization of the random variable$T_i$.<br /> &lt;br&gt;<br /> <br /> * We may know the event has happened in an interval$I_i$but not know the exact time$t_i$. This is ''interval censoring''. 
For example, at a routine check-up, cancer recurrence may be detected, and we only know that it has occurred at some point in time since the last check-up.<br /> <br /> <br /> ::[[File:survival3.png|link=]]<br /> <br /> <br /> : The observation for individual$i$is the event:$y_i = $&quot;$a_i &lt; t_i \leq b_i$&quot;.<br /> &lt;br&gt;<br /> <br /> * If we assume that the trial ends at time$\tstop$, then the event may happen after the end of the trial period. This is ''right censoring''.<br /> <br /> <br /> ::[[File:survival2.png|link=]]<br /> <br /> <br /> : There are several variations of this for defining what the observations are:<br /> &lt;br&gt;<br /> <br /> * If events (before$\tstop$) are exactly observed, then for$i=1,2,\ldots, N$,<br /> <br /> {{Equation1|<br /> equation=&lt;math&gt;<br /> y_i = \left\{<br /> \begin{array}{ll}<br /> t_i &amp; {\rm if \quad} t_i \leq \tstop \\<br /> {\rm t_i &gt; \tstop \quad} &amp; {\rm otherwise. \quad}<br /> \end{array} \right.<br /> &lt;/math&gt;}}<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWithText&amp;Table<br /> |title1=Example:<br /> |title2=<br /> |equation=<br /> Assume that a trial starts at$\tstart=0$and ends at$\tstop=5$, and that we obtain the following observations from 4 individuals: <br /> <br />$y_1 = 3.2$<br /> <br />$y_2=$&quot;$t_2&gt;5$&quot;<br /> <br />$y_3= 2.7$<br /> <br />$y_4 =$&quot;$t_4&gt;5$&quot;<br /> <br /> <br /> These observations can be stored in a data file as shown in the table on the right.<br /> <br /> Here, &quot;event=0&quot; at time$t$means that the event happened after$t$while &quot;event=1&quot; means that the event happened at time$t$. 
<br /> <br /> The lines with $t=0$ are used to state the trial start time $\tstart=0$.<br /> <br /> |table=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width: 75%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3.2 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 2.7 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}} <br /> }}<br /> <br /> <br /> &lt;ul&gt;<br /> * If events before $\tstop$ are interval censored, then for $i=1,2,\ldots, N$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_i = \left\{<br /> \begin{array}{ll}<br /> a_i &lt; t_i \leq b_i &amp; {\rm if \quad} t_i\leq \tstop \\<br /> t_i &gt; \tstop &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWithText&amp;Table<br /> |title1=Example:<br /> |title2=<br /> |equation=<br /> Assume that we have censoring intervals of length 1: <br /> <br /> <br /> $(0,1],(1,2],\ldots,(4,5]$.<br /> <br /> <br /> For the same four individuals as the previous example, we now have the following observations: <br /> <br /> <br /> $y_1=$&quot;$3 &lt; t_1 \leq 4$&quot;, <br /> <br /> $y_2=$&quot;$t_2&gt;5$&quot;, <br /> <br /> $y_3=$&quot;$2&lt; t_3 \leq 3$&quot;, <br /> <br /> $y_4=$&quot;$t_4&gt;5$&quot;.
<br /> <br /> <br /> These observations can be stored in a data file as shown in the table on the right.<br /> <br /> Here, &quot;event=0&quot; at time $t$ means that the event happened after $t$, while &quot;event=1&quot; means that the event happened before time $t$.<br /> |table=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width: 75%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 2 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 3 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Probability distributions == <br /> <br /> <br /> Several functions play key roles in time-to-event analysis: the [http://en.wikipedia.org/wiki/Survival_function survival function], the [http://en.wikipedia.org/wiki/Survival_analysis#Hazard_function_and_cumulative_hazard_function hazard function] and the [http://en.wikipedia.org/wiki/Survival_analysis#Hazard_function_and_cumulative_hazard_function cumulative hazard function].<br /> We are still working under a population approach here, and so these functions, detailed below, are individual functions, i.e., each subject has its own.
As we are using parametric models, this means that these functions depend on the individual parameters $(\psi_i)$.<br /> <br /> <br /> &lt;ul&gt;<br /> * The '''survival function''' $S(t; \psi_i)$ gives the probability that the event happens to individual $i$ after time $t&gt;t_{start}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> S(t; \psi_i) \ \ \eqdef \ \ \prob{T_i&gt;t ; \psi_i} .<br /> &lt;/math&gt; }}<br /> <br /> <br /> <br /> * The '''hazard function''' $\hazard(t;\psi_i)$ is defined for individual $i$ as the instantaneous rate of the event at time $t$, given that the event has not already occurred:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \hazard(t;\psi_i) \ \ \eqdef \ \ \lim_{dt\to 0} \displaystyle{\frac{S(t;\psi_i) - S(t + dt;\psi_i)}{ S(t;\psi_i) \, dt} }. <br /> &lt;/math&gt; }}<br /> <br /> : This is equivalent to: <br /> <br /> {{Equation1<br /> |equation=&lt;div id=&quot;HazardSurvival&quot; &gt;&lt;math&gt; <br /> \hazard(t;\psi_i) \ \ = \ \ -\displaystyle{ \frac{d}{dt} } \log{S(t;\psi_i)}. <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }} <br /> <br /> <br /> * Another useful quantity is the '''cumulative hazard function''' $\cumhaz(a,b;\psi_i)$, defined for individual $i$ as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \cumhaz(a,b;\psi_i) \ \ \eqdef \ \ \displaystyle{\int_a^b \hazard(t;\psi_i) \, dt }.<br /> &lt;/math&gt;}}<br /> <br /> : Note that [[#HazardSurvival|(1)]] implies that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> S(t;\psi_i) \ \ = \ \ e^{-\cumhaz(t_{start},t;\psi_i)}.<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> Equation [[#HazardSurvival|(1)]] shows that the hazard function $\hazard(t;\psi_i)$ characterizes the problem, because knowing it is the same as knowing the survival function $S(t;\psi_i)$.
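These relations between the hazard, cumulative hazard and survival functions can be checked numerically. The following Python sketch is illustrative only: the Weibull hazard and the parameter values are assumptions made here for the check, not part of the model above. It reconstructs the survival function from the numerically integrated hazard and compares it with the known closed-form Weibull survival function $e^{-(t/\lambda)^k}$.

```python
import math

# Assumed Weibull hazard for one individual; psi_i = (lam, k) is a
# hypothetical parameterization chosen only for this illustration.
def hazard(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1)

def cumulative_hazard(a, b, lam, k, n=100000):
    """H(a, b; psi_i): trapezoidal integration of the hazard over (a, b)."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t0 = a + i * dt
        t1 = t0 + dt
        total += 0.5 * (hazard(t0, lam, k) + hazard(t1, lam, k)) * dt
    return total

def survival(t, lam, k):
    """S(t; psi_i) = exp(-H(t_start, t; psi_i)), with t_start = 0."""
    return math.exp(-cumulative_hazard(0.0, t, lam, k))

# The survival function rebuilt from the cumulative hazard should coincide
# with the closed-form Weibull survival function exp(-(t/lam)^k).
lam, k = 2.0, 2.0
print(survival(3.0, lam, k), math.exp(-(3.0 / lam) ** k))
```

Any other positive hazard function can be plugged in; only the closed-form comparison is Weibull-specific.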
The probability distribution of survival data is therefore completely defined by the hazard function.<br /> Let $\qcyipsii$ be the conditional distribution of the observation $y_i$ given the vector of individual parameters $\psi_i$. Its pdf can be easily computed for the various censoring situations discussed above:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;If the event is exactly observed with $y_i=t_i$, the density is the derivative of the cumulative distribution function, i.e., the derivative of $1 - S(t_i;\psi_i)$:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \begin{eqnarray}\pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \frac{d}{dt_i}\left(1 - e^{-\cumhaz(t_{start},t_i;\psi_i)}\right)\\<br /> %&amp;=&amp; \left(\frac{d}{dt_i} \int_{t_{start} }^{t_i} \hazard(u;\psi_i) \, du \right) e^{-\cumhaz(t_{start},t_i;\psi_i)}\\<br /> &amp;=&amp;\hazard(t_i;\psi_i)e^{-\cumhaz(t_{start},t_i;\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt;If the event is interval-censored with $y_i=\,$&quot;$a_i&lt;t_i\leq b_i$&quot;:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \prob{T_i \in (a_i,b_i]\,{{!}} \,\psi_i} \\<br /> %&amp;=&amp; \prob{T_i \leq b_i {{!}} \psi_i} - \prob{T_i \leq a_i {{!}} \psi_i} \\<br /> %&amp;=&amp; (1-S( b_i ; \psi_i)) - (1-S( a_i ; \psi_i)) \\<br /> &amp;=&amp; e^{-\cumhaz(t_{start},a_i;\psi_i)} - e^{-\cumhaz(t_{start},b_i;\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt;If the event is right-censored with $y_i= \,$&quot;$t_i&gt;t_{stop}$&quot;:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \prob{T_i &gt; t_{stop} {{!}} \psi_i} \\<br /> %&amp;=&amp; S( t_{stop} ; \psi_i) \\<br /> &amp;=&amp; e^{-\cumhaz(t_{start},t_{stop};\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;&lt;br&gt;&lt;br
/> <br /> ==Repeated events==<br /> <br /> <br /> <br /> Sometimes, an event can potentially happen again and again, e.g., [http://en.wikipedia.org/wiki/Epileptic_seizure epileptic seizures], heart attacks, etc.<br /> For any given hazard function$\hazard$, the survival function$S$for individual$i$now represents survival since the previous event at$t_{i,j-1}$, written here in terms of the cumulative hazard from$t_{i,j-1}$to$t_{i,j}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> S(t_{i,j} {{!}} t_{i,j-1};\psi_i) &amp;=&amp; \prob{T_{i,j} &gt; t_{i,j}\, {{!}} \,T_{i,j-1} = t_{i,j-1};\psi_i} \\<br /> &amp;=&amp; e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \\<br /> &amp;=&amp; \exp\left({-\int_{t_{i,j-1} }^{t_{i,j} } \hazard(t;\psi_i) \, dt}\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> &lt;!--%In the most simple case,$y_i$is a vector of known event times:$y_i = (t_{i1},t_{i2},\ldots,t_{i\,n_i}).$--&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Censoring and probability distributions==<br /> <br /> <br /> Taking into account censoring for repeated events is slightly more complicated than for one-off events.<br /> First, let us assume that a trial starts at time$t_{start}$and ends at time$t_{stop}$. Let$(T_{i1}, T_{i2}, \ldots )$be random event times after$t_{start}$. Then, we can distinguish between the two following situations:<br /> <br /> <br /> <br /> &lt;ul&gt;<br /> 1. ''Exactly observed events:'' A sequence of$n_i$event times is precisely observed before$t_{stop}$, i.e.,${\rm y_i = (t_{i,1},t_{i,2},\ldots,t_{i,n_i}, \quad t_{i,n_i+1}&gt;\tstop)}$. 
<br /> <br /> : The conditional pdf of$y_i$is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;repeatcensor&quot; &gt;&lt;math&gt; <br /> \pcyipsii(y_i {{!}} \psi_i) = \left(\prod_{j=1}^{n_i}\hazard(t_{ij};\psi_i)e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \right)e^{-\cumhaz(t_{n_i},\tstop;\psi_i)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> : where$t_{i0}=\tstart$.<br /> &lt;/ul&gt;<br /> <br /> {{ExampleWith2Tables<br /> |title1=Example<br /> |title2=<br /> |text=<br /> Suppose that for individual$i=1$we know there were 8 events but only 7 of them occurred before$\tstop$. Here is a graphic showing the events that were exactly observed:<br /> <br /> <br /> ::[[File:survival4.png|link=]]<br /> <br /> <br /> This data is then stored in the table on the left below. We see that the 8th and final event is noted &quot;event = 0&quot; with time$\tstop = 18$, indicating that the event was not observed at the end of the time period$\tstop$. In the table on the right, we show the contributions of each observation to the conditional pdf of$y_1$. 
Indeed, equation [[#repeatcensor|(1)]] means that the pdf of $y_1=(y_{1,1}, \ldots, y_{1,8})$ is the product of the conditional pdfs given in the right table.<br /> <br /> <br /> |table1=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:120%; margin-left:10%;margin-right:10%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 1.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3.5 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 4.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 5.6 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 9.7 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 11.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 15.8 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 0 <br /> {{!}}}<br /> <br /> |table2 =<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:200%; margin-right:10%; margin-left:10%&quot;<br /> !{{!}} pdf <br /> {{!}}-<br /> {{!}} 1 <br /> {{!}}-<br /> {{!}}$\hazard(1.4;\psi_1)e^{-\cumhaz(0,1.4;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(3.5;\psi_1)e^{-\cumhaz(1.4,3.5;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(4.4;\psi_1)e^{-\cumhaz(3.5,4.4;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(5.6;\psi_1)e^{-\cumhaz(4.4,5.6;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(9.7;\psi_1)e^{-\cumhaz(5.6,9.7;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(11.4;\psi_1)e^{-\cumhaz(9.7,11.4;\psi_1)}$<br /> {{!}}-<br /> {{!}}$\hazard(15.8;\psi_1)e^{-\cumhaz(11.4,15.8;\psi_1)}$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(15.8,18;\psi_1)}$<br /> {{!}}}<br /> }}<br /> <br /> <br /> &lt;ul&gt;<br /> 2. ''Interval-censored events:'' Let $(b_{0}, b_1], (b_{1}, b_2], \ldots , (b_{K-1}, b_K]$ be a sequence of successive intervals with $\tstart=b_0&lt;b_1&lt;b_2 &lt; \ldots &lt;b_K = \tstop$. 
We do not know the exact event times, but a sequence $(m_{ik}; \, 1 \leq k \leq K)$ is observed, where $m_{ik}$ is the number of events that occurred for individual $i$ in interval $(b_{k-1}, b_k]$.<br /> <br /> : We can show that the conditional pdf of $y_i$ is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;pdf_mult_int&quot; &gt;&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K} e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \displaystyle{\frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} } .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> : In other words, the number of events for individual $i$ in interval $(b_{k-1}, b_k]$ is a Poisson random variable with mean $\cumhaz(b_{k-1}, b_k;\psi_i)$: the events come from a (possibly non-homogeneous) Poisson process.<br /> <br /> <br /> {{ExampleWith2Tables<br /> |title1=Example<br /> |title2=<br /> <br /> |text= Here is a graphic that shows an example of the interval boundaries and the number of events that occurred in each interval for individual $i=1$.<br /> <br /> <br /> ::[[File:survival5.png|link=]]<br /> <br /> <br /> The table on the left below shows the same data. 
Using [[#pdf_mult_int|(2)]], we see that the conditional pdf of $y_1=(y_{1,1}, \ldots, y_{1,6})$ is the product of the conditional pdfs given in the table on the right.<br /> <br /> <br /> |table1=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:120%; margin-left:10%;margin-right:10%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 6 {{!}}{{!}} 3 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 9 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 12 {{!}}{{!}} 2 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 15 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 1 <br /> {{!}}}<br /> <br /> |table2=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:200%; margin-right:10%; margin-left:10% &quot;<br /> !{{!}} pdf <br /> {{!}}-<br /> {{!}} 1 <br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(0,3;\psi_1)}\cumhaz(0,3;\psi_1)$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(3,6;\psi_1)} {\cumhaz^{3}(3,6;\psi_1)}/{6}$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(6,9;\psi_1)}$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(9,12;\psi_1)} {\cumhaz^{2}(9,12;\psi_1)}/{2}$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(12,15;\psi_1)}$<br /> {{!}}-<br /> {{!}}$e^{-\cumhaz(15,18;\psi_1)}\cumhaz(15,18;\psi_1)$<br /> {{!}}}<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If the total number $n_i$ of (observed and unobserved) events for individual $i$ is known to be finite, then formula [[#pdf_mult_int|(2)]] is slightly modified when the last event occurs before $\tstop$ ($t_{n_i}&lt;\tstop$).<br /> Assume that the last event for individual $i$ occurs in the $K_i$-th interval. Let $s_{i} = \sum_{k=1}^{K_i-1} m_{ik}$ be the number of events that occurred before this interval. 
Then, we can show that<br /> <br /> {{EquationWithRef_Special<br /> |equation=&lt;div id=&quot;pdf_mult_int2&quot;&gt;&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K_i-1} \left( \displaystyle{ \frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} }e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \right)<br /> \!\times \!\left(1 - \sum_{\ell=0}^{n_i-s_{i} } \displaystyle{ \frac{\cumhaz^{\ell}(b_{K_i -1},b_{K_i};\psi_i)}{\ell!} } e^{-\cumhaz(b_{K_i -1},b_{K_i};\psi_i)}\right) . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Examples of hazard functions==<br /> <br /> <br /> <br /> &lt;ul&gt;<br /> * ''Constant hazard model:'' <br /> : The simplest case is a constant hazard function: $\hazard(t;\psi_i) = \hazard_i &gt; 0$. Here, $\psi_i=\hazard_i$. <br /> &lt;br&gt;<br /> <br /> <br /> * ''Proportional hazards model:''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\psi_i) = \hazard_0(t;\alpha_i) \, e^{ \langle \beta , c_i \rangle}.<br /> &lt;/math&gt;}}<br /> <br /> : Here, the hazard is decomposed into two terms: a baseline function $\hazard_0$ of $t$, and an &quot;individual&quot; term that is a function of individual covariates $c_i$. $ \langle \beta , c_i \rangle$ denotes a scalar product, i.e., a linear function of $c_i$. In a proportional hazards model, a unit increase in the value of a covariate has a multiplicative effect on the hazard.<br /> <br /> : In the usual proportional hazards model, $\alpha_i$ is a population constant ($\alpha_i=\alpha$). Then, $\psi_i$ can be decomposed into a set of population parameters $\alpha$ and an individual parameter $ \langle \beta , c_i \rangle$. 
A straightforward extension consists in assuming that $\alpha_i$ is also an individual parameter.<br /> &lt;br&gt;<br /> <br /> <br /> * ''Extended proportional hazards model:''<br /> <br /> : Another possible extension assumes that the hazard function is a (possibly nonlinear) function $u$ of a regression variable $x_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\bpsi_i) = \hazard_0(t;\alpha_{i}) \, e^{ u(\beta_i,x_i(t))} .<br /> &lt;/math&gt; }}<br /> <br /> :Consider for example that $x_i(t)$ is the plasma concentration of a drug at time $t$ for individual $i$. Then, $u(\beta_i,x_i(t))$ is the term that models the effect of the drug on the hazard, while $\hazard_0(t;\alpha_i)$ might model the effect of disease progression on the hazard.<br /> &lt;!--%We consider here parametric functions that possibly depend on individual parameters.--&gt;<br /> <br /> : In this example, $x_i(t)$ is the &quot;true&quot; plasma concentration for subject $i$ at time $t$, and it is a continuous function of time. However, in practice it is only measured at a finite set of times, so a longitudinal model for the plasma concentration is needed to give a concentration value for each $t$.<br /> :Therefore, in practice we need to develop a ''joint model'' in order to simultaneously model time-to-event data and longitudinal data. Such an approach is introduced in the [[Joint models]] section.<br /> &lt;br&gt;<br /> <br /> <br /> * ''Accelerated failure time (AFT) model:''<br /> <br /> :Unlike proportional hazards models, the AFT model supposes that a change in a covariate has a multiplicative effect not on the hazard but on the ''predicted event time''. This can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(T_i) = \langle \psi_i , c_i \rangle + \xi_i<br /> &lt;/math&gt;<br /> }}<br /> <br /> : where $\xi_i$ is a zero-mean random variable, e.g., one with a centered normal distribution. 
Usually, parameters are fixed effects:$\psi_i=\psi$for each subject$i$.<br /> : To calculate the hazard function, let us first denote$p_{\xi_i}$the density and$F_{\xi_i}$the cdf of$\xi_i$, and to simplify, denote$\mu_i = \langle \psi_i , c_i \rangle$the mean of$\log(T_i)$. We begin by calculating the survival function:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> S(t;\psi_i) &amp;=&amp; \prob{\log{T_i} &gt; \log{t} ; \bpsi_i} \\<br /> &amp;=&amp; \int_{\log{t}-\mu_i}^{\infty} p_{\xi_i}(u; \psi_i) \, du \\<br /> &amp;=&amp; 1 - F_{\xi_i}(\log{t}-\mu_i ; \psi_i) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> :Calculating [[#HazardSurvival|(1)]] then gives the hazard function:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\psi_i) = \displaystyle{ \frac{p_{\xi_i}(\log{t} - \mu_i; \psi_i)}{t(1- F_{\xi_i}(\log{t} - \mu_i; \psi_i))} }\,<br /> &lt;/math&gt; }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary <br /> |title=Summary<br /> |text=<br /> For a given vector of individual parameters$\psi_i$, a model for (repeated) time-to-event data is completely defined by<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the hazard function$\hazard(t ; \psi_i)$, or the survival function$S(t ; \psi_i)$&lt;/li&gt;<br /> <br /> &lt;li&gt; (possibly) the interval and/or right censoring process &lt;/li&gt;<br /> <br /> &lt;li&gt; (possibly) the maximum number of possible events &lt;/li&gt;<br /> &lt;/ol&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> ==$\mlxtran$for time-to-event data models==<br /> --&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> <br /> &lt;bibtex&gt;<br /> @book{aalen2008,<br /> author = {Aalen, O. and Borgan, O. and Gjessing, H.},<br /> title = {Survival and Event History Analysis. 
},<br /> publisher = {Springer},<br /> address = {New York},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{andersen2006survival,<br /> title={Survival analysis},<br /> author={Andersen, P. K.},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{diggle1994,<br /> author = {Diggle, P. and Kenward, M. G.},<br /> title = {Informative drop-out in longitudinal data analysis.},<br /> journal = {Appl. Stats},<br /> volume = {43},<br /> number = {},<br /> pages = {49-93},<br /> year = {1994}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @book{duchateau2008,<br /> author = {Duchateau, L. and Janssen, P.},<br /> title = {The Frailty Model. Statistics for Biology and Health },<br /> publisher = {Springer.},<br /> volume = {},<br /> pages = {},<br /> year = {2008},<br /> series = {},<br /> address = {New York},<br /> edition = {},<br /> month = {}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fleming2011counting,<br /> title={Counting processes and survival analysis},<br /> author={Fleming, T. R. and Harrington, D. P.},<br /> volume={169},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{huang2007,<br /> author = {Huang, X. and Liu, L.},<br /> title = {A joint frailty model for survival and gap times between recurrent events.},<br /> journal = {Biometrics},<br /> volume = {63},<br /> number = {},<br /> pages = {389-397},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{ibrahim2005bayesian,<br /> title={Bayesian survival analysis},<br /> author={Ibrahim, J. G. and Chen, M.-H. and Sinha, D.},<br /> year={2005},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{kalbfleisch2011statistical,<br /> title={The statistical analysis of failure time data},<br /> author={Kalbfleisch, J. D. 
and Prentice, R. L.},<br /> year={2011},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{kelly2000,<br /> author = {Kelly, P. J. and Jim, L. L.},<br /> title = {Survival analysis for recurrent event data: an application to childhood infectious disease.},<br /> journal = {Statistics in Medicine},<br /> volume = {19},<br /> number = {1},<br /> pages = {13-33},<br /> year = {2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{klein2003survival,<br /> title={Survival analysis: techniques for censored and truncated data},<br /> author={Klein, J. P. and Moeschberger, M. L.},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{klein1997,<br /> author = {Klein, J. P. and Moeschberger, M. L.},<br /> title = { Survival Analysis - Techniques for Censored and Truncated Data. },<br /> publisher = {Springer-Verlag},<br /> volume = {},<br /> pages = {},<br /> year = {1997},<br /> series = {},<br /> address = {New York},<br /> edition = {},<br /> month = {}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{kleinbaum2011survival,<br /> title={Survival analysis},<br /> author={Kleinbaum, D. G.},<br /> year={2011},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{miller2011survival,<br /> title={Survival analysis},<br /> author={Miller Jr, R. 
G.},<br /> year={2011},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{wienke2010frailty,<br /> title={Frailty models in survival analysis},<br /> author={Wienke, A.},<br /> volume={37},<br /> year={2010},<br /> publisher={Chapman &amp; Hall}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Model for categorical data<br /> |linkNext=Joint models }}</div> Admin https://wiki.inria.fr/popix/Categorical_data_models Categorical data models 2013-03-27T13:41:24Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Overview == <br /> <br /> Assume now that the observed data takes its values in a fixed and finite set of nominal categories$\{c_1, c_2,\ldots , c_K\}$.<br /> Considering the observations$(y_{ij}, 1 \leq j \leq n_i)$of any individual$i$as a sequence of independent random variables, the model is completely defined by the probability mass functions$\prob{y_{ij}=c_k | \psi_i}$, for$k=1,\ldots, K$and$1 \leq j \leq n_i$.<br /> <br /> For a given$(i,j)$, the sum of the$K$probabilities is 1, so in fact only$K-1$of them need to be defined.<br /> <br /> In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each$k$,$\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and$\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. 
For instance, we could define$K$time-dependent parametric functions$a_1$,$a_2$, ...,$a_K$and set for any individual$i$, time$t_{ij}$and$k \in \{1,\ldots,K\}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;categorical1&quot; &gt;&lt;math&gt; <br /> \prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Suppose we want to model binary data, i.e., data where$y_{ij} \in \{0,1\}$.<br /> <br /> Let$\psi_i=(\alpha_i,\beta_i)$and let$a_1(t,\psi_i)=0$and$a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br /> <br /> {{Equation1|equation= &lt;math&gt;<br /> \prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br /> \ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. 
<br /> &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> Such parametrizations are extremely flexible and easy to interpret in simple situations.<br /> In the previous example, for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br /> <br /> <br /> &lt;br&gt;<br /> == Ordinal data ==<br /> <br /> <br /> Ordinal data models further assume that the categories are ordered, i.e., that there exists an order $\prec$ such that<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> c_1 \prec c_2 \prec \ldots \prec c_K .<br /> &lt;/math&gt;}}<br /> <br /> We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br /> <br /> Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br /> Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> 0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br /> &lt;/math&gt; }}<br /> <br /> Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $&lt;$ on $\Rset$.<br /> <br /> Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. 
We assume that there exist $\alpha_{i,1} \in \Rset$ and $\alpha_{i,2}\geq 0, \ldots , \alpha_{i,K-1}\geq 0$ such that for $k=1,2,\ldots,K-1$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{i,m}\right) + \beta_i \, x(t_{ij}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$.<br /> <br /> Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in [[#propodds_model|(2)]] can also be expressed as<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> \prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left(\sum_{m=1}^k \alpha_{i,m}\right) - \beta_i \, x(t_{ij})} } }.<br /> &lt;/math&gt;}} <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We give patients a drug that is supposed to decrease the level of a given type of pain. <br /> The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model with the constraint that $\alpha_{i,2}\geq 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \prob{y_{ij} \leq 3 {{!}} \psi_i} &amp;=&amp; 1,<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. 
The model parameters are quite easy to interpret:<br /> <br /> <br /> * $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br /> * $\beta_{i,1}&lt;0$ (resp. $\beta_{i,1}&gt;0$) means that the pain tends to increase (resp. decrease) over time.<br /> * $\beta_{i,2}=0$ means that the drug has no effect on pain.<br /> * $\beta_{i,2}&gt;0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}&lt;0$ means that pain is an adverse drug effect.<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Exclusive use of linear models (or generalized linear models) has no real justification today, since very efficient tools are available for nonlinear models.<br /> Model [[#propodds_model|(2)]] can easily be extended to a nonlinear model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model2&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Markovian dependence ==<br /> <br /> <br /> For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br /> <br /> We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is, however, possible to introduce dependency between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$, 
i.e., for all $k=1,2,\ldots ,K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> === Discrete time Markov chains ===<br /> <br /> If the observation times are regularly spaced (a constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete-time Markov chain. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $ \pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> q_{i,j}^{\ell,k} &amp;=&amp; \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br /> \sum_{k=1}^{K}q_{i,j}^{\ell,k} &amp;=&amp; 1 \quad \text{ for all } \ell.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br /> &lt;/math&gt; }}<br /> <br /> For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br /> <br /> <br /> ::[[File:markov_1.png|link=]]<br /> <br /> <br /> Our model must therefore give, for each individual $i$, the distribution
of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.<br /> <br /> The figure below shows several examples of simulated sequences coming from a model with 2 states defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_j \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_j \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5 ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $t_j = j$.<br /> <br /> [[File:markov_2.png|link=]]<br /> <br /> In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br /> Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small with $q^{1,2}&lt;q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution, i.e., the left eigenvector of the transition matrix $Q$ associated with the eigenvalue 1, is $\prob{y_{ij}=1}=0.611$ and $\prob{y_{ij}=2}=0.389$.<br /> The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (center and bottom left).<br /> <br /> In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, as do transitions from 2 to 1.<br /> <br /> In the third example (right), $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, as do transitions from 2 to 1.<br /> <br /> Note that $a_i$ (resp. $c_i$) is the logit of the transition probability from state 1 to 2 (resp. 
2 to 1) at time$t=0$.<br /> <br /> Different choices can be made for defining an initial distribution$\pi_{i,1}$:<br /> <br /> <br /> &lt;ul&gt;<br /> * The initial state can be defined arbitrarily:$y_{i,1}=k_0$. This means that$\pi_{i,1}^{k_0} = 1$and$\pi_{i,1}^{k} = 0$for$k\neq k_0$.<br /> &lt;br&gt;<br /> <br /> * More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution$\pi_{i,1}^{k} = 1/K$for$ k=1,2,\ldots , K$.<br /> &lt;br&gt;<br /> <br /> * If a transition matrix$Q_{i1} $has been defined at time$t_1$, we might consider using its stationary distribution, i.e., taking for$\pi_{i,1}$the solution to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pi_{i,1} = \pi_{i,1} Q_{i1} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Continuous time Markov chains ===<br /> <br /> <br /> <br /> The previous situation can be extended to the case where observation times are irregular, by modeling the<br /> sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br /> &lt;/math&gt; }}<br /> <br /> The probability that no transition happens between$t$and$t+h$is<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . 
<br /> &lt;/math&gt; }}<br /> <br /> Furthermore, for any individual$i$and any time$t$, the transition rates$\rho_{i}^{\ell,k}(t)$satisfy<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \sum_{k=1}^K \rho_{i}^{\ell,k}(t) = 0 .<br /> &lt;/math&gt; }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text= <br /> A model for independent categorical data is completely defined by:<br /> <br /> &lt;ul&gt;<br /> &lt;li&gt;The probability mass functions$\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$<br /> &lt;li&gt; (or) the cumulative probability functions$\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$for ordinal data<br /> &lt;li&gt; (or) the cumulative logits$\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$for a proportional odds model<br /> &lt;/ul&gt;<br /> <br /> <br /> A model for categorical data with Markovian dependency is completely defined by:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the probability transitions in the case of a discrete-time Markov chain&lt;/li&gt;<br /> <br /> &lt;li&gt; (or) the transition rates in the case of a continuous-time Markov process&lt;/li&gt;<br /> <br /> &lt;li&gt; the probability distribution of the initial states&lt;/li&gt;<br /> &lt;/ol&gt;<br /> }}<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==$\mlxtran$for categorical data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=$ \quad y_{ij} \in \{0, 1, 2\}$<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br /> D &amp;=&amp;100 \\<br /> C(t,\psi_i) &amp;=&amp; \frac{D_i}{V_i} e^{-k_i \, t} \\[0.2cm]<br /> \prob{y_{ij}\leq 0} &amp;=&amp; \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 1} &amp;=&amp; \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> 
\prob{y_{ij}\leq 2} &amp;=&amp; 1<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt;<br /> INPUT:<br /> input = {V, k, alpha0, alpha1, gamma}<br /> <br /> EQUATION:<br /> D = 100<br /> C = D/V*exp(-k*t)<br /> p0 = alpha0 + gamma*C<br /> p1 = p0 + alpha1<br /> <br /> DEFINITION:<br /> y = {type=categorical,<br /> categories={0, 1, 2},<br /> P(y&lt;=0)=p0,<br /> P(y&lt;=1)=p1<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2=$\quad$2-state discrete-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i) \\[0.2cm]<br /> \logit(p_{ij}^{12}) &amp;=&amp; a_i+b_i \, t_{ij} \\<br /> \logit(p_{ij}^{21}) &amp;=&amp; c_i+d_i \, t_{ij} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = 0.5<br /> logit(P(Y=2 | Y_p=1)) = a + b*t<br /> logit(P(Y=1 | Y_p=2)) = c + d*t<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 3:<br /> |title2=$\quad$2-state continuous-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br /> q_{i}^{12}(t) &amp;=&amp; e^{a_i+b_i \, t} \\<br /> q_{i}^{21}(t) &amp;=&amp; e^{c_i+d_i \, t} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; \pi_i<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> 
input = {a, b, c, d, pi}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = pi<br /> transitionRate(1,2) = exp(a + b*t)<br /> transitionRate(2,1) = exp(c + d*t)<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> == Bibliography==<br /> <br /> &lt;bibtex&gt;<br /> @book{agresti2010analysis,<br /> title={Analysis of ordinal categorical data},<br /> author={Agresti, A.},<br /> volume={656},<br /> year={2010},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{agresti2007introduction,<br /> title={An introduction to categorical data analysis},<br /> author={Agresti, A.},<br /> volume={423},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D. 
M.},<br /> title = {Nonlinear Models for Repeated Measurement Data},<br /> publisher = {Chapman &amp; Hall},<br /> address = {London},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer},<br /> series = {Springer Series in Statistics},<br /> year = {2007},<br /> address = {New York}<br /> }<br /> <br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. C.},<br /> year={2006},<br /> publisher={SAS Institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> year={2011},<br /> publisher={Wiley},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{powers2008statistical,<br /> title={Statistical methods for categorical data analysis},<br /> author={Powers, D. A. and Xie, Y.},<br /> year={2008},<br /> publisher={Emerald Group Publishing}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R.
and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Models for count data<br /> |linkNext=Models for time-to-event data }}</div> Admin https://wiki.inria.fr/popix/Count_data_models Count data models 2013-03-27T09:20:34Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Count data is a special type of statistical data that can only take non-negative integer values$\{0, 1, 2,\ldots\}$that come from counting something, e.g., the number of [http://en.wikipedia.org/wiki/Seizures seizures], [http://en.wikipedia.org/wiki/Hemorrhages hemorrhages] or lesions in each given time period. More precisely, data from individual$i$is the sequence$y_i=(y_{ij},1\leq j \leq n_i)$where$y_{ij}$is the number of events observed in the$j$th time interval$I_{ij}$.<br /> <br /> For the moment, let us assume that all the intervals have the same length. 
This is the case, for instance, if data are daily seizure counts:$I_{ij}$is the$j$th day after the start of the experiment and$y_{ij}$the number of seizures observed during that day.<br /> <br /> We will then model the sequence$y_i=(y_{ij},1\leq j \leq n_i)$as a sequence of random variables that take their values in$\{ 0, 1, 2,\ldots\}$.<br /> <br /> If we assume that these random variables are independent, then the model is completely defined by the [http://en.wikipedia.org/wiki/Probability_mass_function probability mass functions]$\prob{y_{ij}=k}$, for$k \geq 0$and$1 \leq j \leq n_i$. Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> From now on, we will only consider [http://en.wikipedia.org/wiki/Parametric_model parametric distributions]. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;)$\lambda_{ij} = \lambda(t_{ij},\psi_i)$for any individual$i$, which depends on individual parameters$\psi_i$and possibly the time$t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function$\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. &lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the Poisson distribution.<br /> A Poisson distribution with intensity$\lambda$is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }.
&lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main properties of the Poisson distribution is that$\lambda$is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function$ \lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous Poisson distribution:'' this assumes a constant intensity$\lambda_i$for each individual$i$. Here,$\psi_i = \lambda_i$and$\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous Poisson distribution:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using$\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where$t_{ij} = j$(months). Here,$\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration$C$to the value of$\lambda$at time$t_{ij}$using, for instance, an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where$\lambda_{i}$is the baseline intensity and$0\leq \Imax_i\leq 1$.
Here,$\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline$\lambda_{i}(t)$, in order to combine a drug effect model with a disease model, for instance.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming, for example, that$\lambda_{ij}$is a function of$y_{i,j-1}$. Then,$\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$is the probability function of a Poisson random variable with parameter$\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If$y_{ij}$is the number of a given type of events (seizures, hemorrhages, etc.) in a given time interval$I_{ij}$, and if$h_i(t)=h(t,\psi_i)$is the hazard function associated with this sequence of events for individual$i$, then$y_{ij}$is a non-homogeneous Poisson process with Poisson intensity$\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$in interval$I_{ij}$(see the [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us now look at some other examples of distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The zero-inflated Poisson distribution is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 .<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> :where$0\leq p_0 &lt;1$.
This is useful when data seem generally to follow a Poisson distribution except for having an overly large quantity of cases when$k=0$:<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The negative binomial distribution is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> :with$0\leq p \leq 1$and$r&gt;0$. If$r$is an integer, then the negative binomial (NB) distribution with parameters$(p,r)$is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success$p$before$r$failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized Poisson distribution is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> :with$\lambda&gt;0$and$0\leq \delta &lt;1$.<br /> :The generalized Poisson (GP) distribution includes the Poisson distribution as a special case$(\delta=0)$, and is over-dispersed relative to the Poisson. 
Indeed, the variance-to-mean ratio exceeds 1:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{(1-\delta)^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design$\bx_{i}$and a given vector of parameters$\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> - the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ==$\mlxtran$for count data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time-varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= Generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp; (\lambda_i,\delta_i)
\\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {lambda, delta}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda - k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{blundell2002individual,<br /> title={Individual effects and dynamics in count data models},<br /> author={Blundell, R. and Griffith, R. and Windmeijer, F.},<br /> journal={Journal of Econometrics},<br /> volume={108},<br /> number={1},<br /> pages={113-131},<br /> year={2002},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{cameron1998regression,<br /> title={Regression analysis of count data},<br /> author={Cameron, A. C. and Trivedi, P. K.},<br /> volume={30},<br /> year={1998},<br /> publisher={Cambridge University Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{christensen2002bayesian,<br /> title={Bayesian prediction of spatial count data using generalized linear mixed models},<br /> author={Christensen, O. F.
and Waagepetersen, R.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={2},<br /> pages={280-286},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fahrmeir1994multivariate,<br /> title={Multivariate statistical modelling based on generalized linear models},<br /> author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br /> volume={2},<br /> year={1994},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hall2004zero,<br /> title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br /> author={Hall, D. B.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={4},<br /> pages={1030-1039},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{heilbron2007zero,<br /> title={Zero-Altered and other Regression Models for Count Data with Added Zeros},<br /> author={Heilbron, D. C.},<br /> journal={Biometrical Journal},<br /> volume={36},<br /> number={5},<br /> pages={531-547},<br /> year={2007},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lawless1987negative,<br /> title={Negative binomial and mixed Poisson regression},<br /> author={Lawless, J. F.},<br /> journal={Canadian Journal of Statistics},<br /> volume={15},<br /> number={3},<br /> pages={209-225},<br /> year={1987},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lee2006multi,<br /> title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br /> author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G.
J.},<br /> journal={Statistical Methods in Medical Research},<br /> volume={15},<br /> number={1},<br /> pages={47-61},<br /> year={2006},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{min2005random,<br /> title={Random effect models for repeated measures of zero-inflated count data},<br /> author={Min, Y. and Agresti, A.},<br /> journal={Statistical Modelling},<br /> volume={5},<br /> number={1},<br /> pages={1-19},<br /> year={2005},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mullahy1998heterogeneity,<br /> title={Heterogeneity, excess zeros, and the structure of count data models},<br /> author={Mullahy, J.},<br /> journal={Journal of Applied Econometrics},<br /> volume={12},<br /> number={3},<br /> pages={337-350},<br /> year={1998},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{savic2009performance,<br /> title={Performance in population models for count data, part ii: A new saem algorithm},<br /> author={Savic, R. 
and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={36},<br /> number={4},<br /> pages={367-379},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1988mixed,<br /> title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br /> author={Thall, P. F.},<br /> journal={Biometrics},<br /> pages={197-209},<br /> year={1988},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1990some,<br /> title={Some covariance models for longitudinal count data with overdispersion},<br /> author={Thall, P. F. and Vail, S. C.},<br /> journal={Biometrics},<br /> pages={657-671},<br /> year={1990},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tempelman1996mixed,<br /> title={A mixed effects model for overdispersed count data in animal breeding},<br /> author={Tempelman, R. J. and Gianola, D.},<br /> journal={Biometrics},<br /> pages={265-279},<br /> year={1996},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{winkelmann2008econometric,<br /> title={Econometric analysis of count data},<br /> author={Winkelmann, R.},<br /> year={2008},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R. and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{yau2003zero,<br /> title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br /> author={Yau, K. K. W. and Wang, K. and Lee, A. 
H.},<br /> journal={Biometrical Journal},<br /> volume={45},<br /> number={4},<br /> pages={437-452},<br /> year={2003},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zeileis2008regression,<br /> title={Regression models for count data in R},<br /> author={Zeileis, A. and Kleiber, C. and Jackman, S.},<br /> journal={Journal of Statistical Software},<br /> volume={27},<br /> number={8},<br /> pages={1-25},<br /> year={2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div> Admin https://wiki.inria.fr/popix/Introduction_on_Models Introduction on Models 2013-03-26T10:21:26Z <p>Admin: </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> &lt;div style=&quot;color: #2E5894; padding-left: 1.4em; padding-right:2.2em; padding-bottom:0.8em; padding:top:1&quot;&gt;[[Image:attention4.jpg|45px|left|link=]] <br /> (If you are experiencing problems with the display of the mathematical formula, you can either try to use another browser, or use this link which should work smoothly: http://popix.lixoft.net)<br /> &lt;/div&gt;<br /> <br /> We
focus in this section on the model for the observations$\by=(y_i, \ 1\leq i \leq N)$when the individual parameters$\bpsi=(\psi_i, \ 1\leq i \leq N)$are given, i.e., the [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional probability distributions]$({p_{_{y_i|\psi_i}}}, \ 1\leq i \leq N)$, where<br /> <br /> <br /> &lt;ul&gt;<br /> *$N$is the number of subjects. &lt;br&gt;<br /> <br /> *$y_i = (y_{ij}, \ 1\leq j \leq n_i)$are the$n_i$observations for individual$i$. Here,$y_{ij}$is the measurement made on individual$i$at time$t_{ij}$.&lt;br&gt;<br /> <br /> *$\psi_i$is the vector of individual parameters for subject$i$.&lt;br&gt;<br /> &lt;/ul&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &amp;#32;<br /> * We suppose that the model we will use to describe the observations is a function of regression variables$x_i = (x_{ij}, \ 1\leq j \leq n_i)$. Each$x_{ij}$is made up of the time$t_{ij}$and perhaps other variables that vary with time. For example, a [http://en.wikipedia.org/wiki/Pharmacokinetics#Pharmacokinetic_models pharmacokinetic model] can depend on time and weight:$x_{ij} = (t_{ij},w_{ij})$where$w_{ij}$is the weight of individual$i$at time$t_{ij}$, whereas a [http://en.wikipedia.org/wiki/Pharmacodynamics pharmacodynamic model] can depend on time and concentration:$x_{ij} = (t_{ij},c_{ij})$.<br /> <br /> <br /> * The model for individual$i$can also depend on ''input terms''$u_i$. For example, a pharmacokinetic model includes the dose regimen administered to the patients:$u_i$is made up of the dose(s) given to patient$i$, the time(s) of administration, and their type (IV bolus, infusion, oral, etc.). If the structural model is a [http://en.wikipedia.org/wiki/Dynamical_system dynamical system] (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ODEs]), the input terms$(u_i)$are also called ''source terms''.
}}<br /> <br /> <br /> In our framework, observations$\by$are longitudinal. So, for a given individual$i$, the model has to describe the change in$y_i=(y_{ij})$over time. To do this, we suppose that each observation$y_{ij}$comes from a probability distribution, one that evolves with time. As we have decided to work with parametric models, we suppose that there exists a function$\lambda$such that the distribution of$y_{ij}$depends on$\lambda(t_{ij},\psi_i)$. Implicitly, this includes the time-varying variables$x_{ij}$mentioned above.<br /> <br /> The time-dependence in$\lambda$helps us to describe the change with time of each$y_i$, while the fact that it depends on the vector of individual parameters$\psi_i$helps us to describe the inter-individual variability in$y_i$.<br /> <br /> We will distinguish in the following between [[Continuous data models|continuous]] data models, discrete data models (including [[Model for categorical data|categorical]] and [[Count data models|count]] data) and [[Models for time-to-event data|time-to-event]] (or survival) models.<br /> <br /> Here are some examples of these various types of data:<br /> <br /> <br /> &lt;ul&gt;<br /> * [[Continuous data models|Continuous data with a normal distribution]]:<br /> <br /> {{EquationWithBorder| &lt;math&gt;y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)&lt;/math&gt; }}<br /> <br /> :Here,$\lambda(t_{ij},\psi_i)=\left(f(t_{ij},\psi_i),\,g(t_{ij},\psi_i)\right)$, where$f(t_{ij},\psi_i)$is the mean and$g(t_{ij},\psi_i)$the standard deviation of$y_{ij}$.<br /> <br /> <br /> <br /> * [[Model for categorical data|Categorical data with a Bernoulli distribution]]:<br /> <br /> {{EquationWithBorder|&lt;math&gt; y_{ij} \sim {\cal B}\left(\lambda(t_{ij},\psi_i)\right) &lt;/math&gt; }}<br /> <br /> :Here,$\lambda(t_{ij},\psi_i)$is the probability that$y_{ij}$takes the value 1.<br /> <br /> <br /> <br /> * [[Models for count data|Count data with a Poisson distribution]]:<br /> <br /> 
{{EquationWithBorder|&lt;math&gt; y_{ij} \sim {\cal P}\left(\lambda(t_{ij},\psi_i)\right) &lt;/math&gt; }}<br /> <br /> :Here,$\lambda(t_{ij},\psi_i)$is the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson] parameter, i.e., the expected value of$y_{ij}$.<br /> <br /> <br /> <br /> * [[Models for time-to-event data | Time-to-event data]]:<br /> <br /> {{EquationWithBorder<br /> |equation = &lt;math&gt; \begin{eqnarray} \prob { y_{i} &gt;t} &amp; = &amp; S( t,\psi_i) \\[6pt] <br /> - \displaystyle{\frac{d}{dt} } \log S(t,\psi_i) &amp; = &amp; \hazard(t,\psi_i)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> :Here,$\lambda(t,\psi_i) = \hazard(t,\psi_i)$is known as the hazard function.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, defining a model for the observations means choosing a (parametric) distribution. Then, a model must be chosen for the parameters of this distribution.<br /> <br /> {{Back&amp;Next<br /> |linkBack=Additional levels of variability<br /> |linkNext= Continuous data models }}</div> Admin https://wiki.inria.fr/popix/Modelling_the_Observations Modelling the Observations 2013-03-22T13:09:04Z <p>Admin: /* Extension to autocorrelated errors */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The data ==<br /> <br /> Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in$\Rset^+$, the log of the viral load in$\Rset$, an effect expressed as a percentage in$[0,100]$.<br /> <br /> The data can be stored in a table and represented graphically.
Here is some simple [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I | pharmacokinetics data]] involving four individuals.<br /> <br /> <br /> {| cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:60%&quot; align=&quot;center&quot;| <br /> :[[File:continuous_graf0a_1.png]]<br /> | style=&quot;width: 40%&quot; align=&quot;left&quot;| <br /> :{| class=&quot;wikitable&quot; style=&quot;width: 70%;&quot;<br /> ! ID !! TIME !! CONCENTRATION<br /> |- <br /> |1 || 1.0 || 9.84 <br /> |-<br /> |1 || 2.0 || 8.19 <br /> |-<br /> |1 || 4.0 || 6.91 <br /> |-<br /> |1 || 8.0 || 3.71 <br /> |-<br /> |1 || 12.0 || 1.25 <br /> |-<br /> |2 || 1.0 || 17.23 <br /> |-<br /> |2 || 3.0 || 11.14 <br /> |-<br /> |2 || 5.0 || 4.35 <br /> |-<br /> |2 || 10.0 || 2.92 <br /> |-<br /> |3 || 2.0 || 9.78 <br /> |-<br /> |3 || 3.0 || 10.40 <br /> |-<br /> |3 || 4.0 || 7.67 <br /> |-<br /> |3 || 6.0 || 6.84 <br /> |-<br /> |3 || 11.0 || 1.10 <br /> |-<br /> |4 || 4.0 || 8.78 <br /> |-<br /> |4 || 6.0 || 3.87 <br /> |-<br /> |4 || 12.0 || 1.85 <br /> |}<br /> |}<br /> <br /> <br /> Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br /> <br /> <br /> ::[[File:continuous_graf0b_1.png]]<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The model ==<br /> <br /> <br /> For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;nlme&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }}<br /> <br /> where $g(t_{ij},\psi_i)\geq 0$.<br /> <br /> Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br /> In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &amp;=&amp; f(t_{ij},\psi_i) \\ <br /> \std{y_{ij} {{!}} \psi_i} &amp;=&amp; g(t_{ij},\psi_i).<br /> \end{eqnarray}&lt;/math&gt;}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The structural model == <br /> <br /> <br /> $f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there were no error ($\teps_{ij}=0$).<br /> <br /> In the current example, we choose the structural model $f=A\exp\left(-\alpha t \right)$.<br /> Here are some example curves for various combinations of $A$ and $\alpha$:<br /> <br /> <br /> ::[[File:continuous_graf1bis.png|link=]]<br /> <br /> <br /> Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations.
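The general model above is easy to simulate and check empirically. Here is a minimal Python sketch (illustrative only, with arbitrary parameter values not taken from the wiki) combining the exponential structural model $f=A\exp(-\alpha t)$ with standardized Gaussian residual errors and, as an assumption for simplicity, a constant error model $g=a$:

```python
import math
import random

# Illustrative sketch only (arbitrary values): simulate the general model
#   y_ij = f(t_ij, psi_i) + g(t_ij, psi_i) * eps_ij
# with the exponential structural model f = A*exp(-alpha*t), standardized
# residual errors eps_ij ~ N(0,1), and a constant error model g = a.
def f(t, A=10.0, alpha=0.3):
    return A * math.exp(-alpha * t)

def simulate_y(t, a=0.5, n=200_000, seed=1):
    rng = random.Random(seed)
    return [f(t) + a * rng.gauss(0.0, 1.0) for _ in range(n)]

y = simulate_y(t=2.0)
mean_y = sum(y) / len(y)
std_y = (sum((v - mean_y) ** 2 for v in y) / len(y)) ** 0.5
# Empirically, mean_y is close to f(2) and std_y is close to a,
# matching E[y|psi] = f and SD[y|psi] = g above.
```

Repeating the simulation at several times $t$ traces out the blue structural curve with a constant-width band of noise around it.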
Real-life examples are found in the study of [http://en.wikipedia.org/wiki/HIV HIV], [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetics] and [http://en.wikipedia.org/wiki/Tumor_growth_factor tumor growth].<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The residual error model ==<br /> <br /> <br /> For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$. The residual error model can take many forms. For example,<br /> <br /> <br /> &lt;ul&gt;<br /> * A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme1&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> :The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br /> <br /> <br /> ::[[File: continuous_graf2a1.png|link=]]<br /> <br /> <br /> :Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br /> <br /> <br /> &lt;ul&gt;<br /> ::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are at $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br /> &lt;br&gt;<br /> ::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$.
The pink line is the identity $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br /> &lt;/ul&gt;<br /> <br /> <br /> ::[[File:continuous_graf2a2.png|link=]]<br /> <br /> <br /> :These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br /> <br /> <br /> * A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model [[#nlme|(1)]] then becomes<br /> <br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme2&quot;&gt;&lt;math&gt; y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> :The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br /> <br /> <br /> ::[[File:continuous_graf2b.png|link=]]<br /> <br /> <br /> * A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i&gt;0$ and $b_i&gt;0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br /> <br /> <br /> ::[[File:continuous_graf2c.png|link=]]<br /> <br /> <br /> * Another alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$.
This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br /> <br /> <br /> ::[[File:continuous_graf2d.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extension to autocorrelated errors == <br /> <br /> <br /> For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary [http://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model ARMA] (Autoregressive Moving Average) process.<br /> For example, an [http://en.wikipedia.org/wiki/Autoregressive_model autoregressive process] of order 1, AR(1), assumes that [http://en.wikipedia.org/wiki/Autocorrelation autocorrelation] decreases exponentially:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;autocorr1&quot;&gt;&lt;math&gt; {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}, &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> where $0\leq \rho_i &lt;1$ for each individual $i$.<br /> If we assume that $t_{ij}=j$ for any $(i,j)$, then $t_{i\,j+1}-t_{ij}=1$ and the autocorrelation function $\gamma$ is given by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \gamma(\tau) &amp;=&amp; {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &amp;= &amp;\rho_i^{\tau} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$.
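Such AR(1) residual sequences can also be generated directly and their autocorrelation checked empirically. A minimal Python sketch (illustrative only; it assumes unit-spaced times $t_{ij}=j$ as above, and Gaussian innovations, which is one convenient choice):

```python
import random

# Illustrative sketch: a stationary AR(1) sequence
#   eps_j = rho * eps_{j-1} + sqrt(1 - rho^2) * eta_j,  eta_j ~ N(0,1),
# has mean 0, variance 1 and corr(eps_j, eps_{j+tau}) = rho^tau.
def ar1(rho, n, seed=0):
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0)]          # stationary start: eps_0 ~ N(0,1)
    scale = (1.0 - rho * rho) ** 0.5     # keeps the variance equal to 1
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + scale * rng.gauss(0.0, 1.0))
    return eps

def autocorr(x, tau):
    """Empirical autocorrelation of x at lag tau."""
    n = len(x) - tau
    m = sum(x) / len(x)
    num = sum((x[j] - m) * (x[j + tau] - m) for j in range(n)) / n
    den = sum((v - m) ** 2 for v in x) / len(x)
    return num / den

eps = ar1(rho=0.6, n=100_000)
# autocorr(eps, 1) is close to 0.6 and autocorr(eps, 2) close to 0.36,
# matching gamma(tau) = rho^tau.
```

Multiplying such a sequence by $g(t_{ij},\psi_i)$ and adding it to $f(t_{ij},\psi_i)$ gives observations with autocorrelated residual errors.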
The autocorrelation functions $\gamma(\tau)$ are also displayed.<br /> <br /> <br /> ::[[File:continuousGraf3.png|link=]]<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Distribution of the standardized residual errors ==<br /> <br /> <br /> The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br /> Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}&lt;-u}=\prob{\teps_{ij}&gt;u}$ for all $u\in \Rset$.<br /> Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}&gt;f(t_{ij},\psi_i)} = \prob{y_{ij}&lt;f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br /> <br /> A widely used bell-shaped distribution for modeling residual errors is the normal distribution.
If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br /> <br /> Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution), which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br /> <br /> <br /> ::[[File:continuous_graf4_bis.png|link=]]<br /> <br /> <br /> If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The conditional likelihood ==<br /> <br /> <br /> The conditional likelihood for given observations $\by$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), &lt;/math&gt; }}<br /> <br /> where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br /> If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model1&quot;&gt;&lt;math&gt; \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\qeps$ is the pdf of the i.i.d.
residual errors ($\teps_{ij}$).<br /> <br /> For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $\qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model2&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp;<br /> \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Transforming the data==<br /> <br /> <br /> The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br /> <br /> Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;def_t&quot; &gt;&lt;math&gt; \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br /> As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Examples: <br /> | text=<br /> 1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$.
We can then present the model with one of two equivalent representations:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \log(y_{ij})&amp;=&amp;\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br /> y_{ij}&amp;=&amp;f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File: continuous_graf5a.png|link=]]<br /> <br /> <br /> 2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \logit(y_{ij})&amp;=&amp;\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br /> y_{ij}&amp;=&amp; \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File:continuous_graf5b.png|link=]]<br /> <br /> <br /> 3.
The logit transformation can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \transy(y_{ij})&amp;=&amp;\log((y_{ij}-A)/(B-y_{ij})), \\<br /> y_{ij}&amp;=&amp;A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> }}<br /> <br /> <br /> Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;likeN_model3&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right).<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t_{ij},\psi_i)=a$), then<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br /> &lt;/math&gt; }} <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Censored data ==<br /> <br /> <br /> Censoring occurs when the value of a measurement or observation is only partially known.<br /> For continuous data measurements in the longitudinal context, censoring refers to the values of the
measurements, not the times at which they were taken.<br /> <br /> For example, in analytical chemistry, the lower [http://en.wikipedia.org/wiki/Detection_limit limit of detection] (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the &quot;measurement&quot; is not a number but the information that the quantity is less than the LLOD.<br /> <br /> Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit, referred to as the lower [http://en.wikipedia.org/wiki/Detection_limit limit of quantification] (LLOQ), are so low that their reliability is considered suspect. A measuring device can also have an upper [http://en.wikipedia.org/wiki/Detection_limit limit of quantification] (ULOQ) such that any value above this limit cannot be measured and reported.<br /> <br /> As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br /> <br /> We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br /> <br /> <br /> * '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; <br /> \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij} \geq L \\<br /> y_{ij} &lt; L &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }} <br /> <br /> &lt;blockquote&gt;In the figures below, the &quot;data&quot; below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset.
An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}). The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6a.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||log-VL || cens<br /> |- <br /> | 1 || 1.0 || 0.26 || 0<br /> |-<br /> | 1 || 2.0 || 0.02 || 0<br /> |-<br /> | 1 || 3.0 || -0.13 || 0<br /> |-<br /> | 1 || 4.0 || -0.13 || 0<br /> |-<br /> | 1 || 5.0 || -0.30 || 1<br /> |-<br /> | 1 || 6.0 || -0.30 || 1<br /> |-<br /> | 1 || 7.0 || -0.25 || 0<br /> |-<br /> | 1 || 8.0 || -0.30 || 1<br /> |-<br /> | 1 || 9.0 || -0.29 || 0<br /> |-<br /> | 1 || 10.0 || -0.30 || 1<br /> |}<br /> |}<br /> <br /> <br /> * '''Interval censoring:''' if a data point is in an interval $I$, its exact value is not known:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\notin I \\<br /> y_{ij} \in I &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as &quot;$y_{ij} \in [0,1)$&quot;.
In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6b.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||CONC. || llimit || cens<br /> |-<br /> | 1 || 0.3 || 1.20 || . || 0<br /> |-<br /> | 1 || 0.5 || 1.93 || . || 0<br /> |-<br /> | 1 || 1.0 || 3.38 || . || 0<br /> |-<br /> | 1 || 2.0 || 3.88 || . || 0<br /> |-<br /> | 1 || 4.0 || 3.24 || . || 0<br /> |-<br /> | 1 || 6.0 || 1.82 || . || 0<br /> |-<br /> | 1 || 8.0 || 1.07 || . || 0<br /> |-<br /> | 1 || 12.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 16.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 20.0 || 1.00 || 0.00 || 1<br /> |}<br /> |}<br /> <br /> <br /> <br /> * '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\leq U \\<br /> y_{ij} &gt; U &amp; {\rm otherwise.}<br /> \end{array} \right.
<br /> &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br /> &lt;/blockquote&gt;<br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6c.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot; |<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||VOLUME || CENS<br /> |-<br /> | 1 || 2.0 || 1.85 || 0<br /> |-<br /> | 1 || 7.0 || 2.40 || 0<br /> |-<br /> | 1 || 12.0 || 3.27 || 0<br /> |-<br /> | 1 || 17.0 || 3.28 || 0<br /> |-<br /> | 1 || 22.0 || 3.62 || 0<br /> |- <br /> | 1 || 27.0 || 3.02 || 0<br /> |-<br /> | 1 || 32.0 || 3.80 || -1<br /> |-<br /> | 1 || 37.0 || 3.80 || -1<br /> |-<br /> | 1 || 42.0 || 3.80 || -1<br /> |-<br /> | 1 || 47.0 || 3.80 || -1<br /> |}<br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> <br /> |text= &amp;#32;<br /> * Different censoring limits and intervals can be in play at different times and for different individuals.<br /> * Interval censoring covers any type of censoring, i.e., setting $I=(-\infty,L]$ for left censoring and $I=[U,+\infty)$ for right censoring.<br /> }}<br /> <br /> <br /> The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$.
Then,<br /> <br /> {{EquationWithRef<br /> |equation = &lt;div id=&quot;likeN_model4&quot;&gt;&lt;math&gt; <br /> \begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } },<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du . &lt;/math&gt; }}<br /> <br /> We see that if $y_{ij}$ is not censored (i.e., $\mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions to multidimensional continuous observations == <br /> <br /> <br /> &lt;ul&gt;<br /> * Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i.<br /> &lt;/math&gt;}}<br /> <br /> : It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br /> <br /> <br /> * Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$.
Then, the model becomes, for $1 \leq k \leq K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i .<br /> &lt;/math&gt; }}<br /> <br /> : Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> &lt;br&gt;&lt;br&gt;<br /> -----------------------------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary <br /> |text= <br /> A model for continuous data is completely defined by:<br /> <br /> *The structural model $f$<br /> *The residual error model $g$<br /> *The probability distribution of the residual errors $(\teps_{ij})$<br /> *Possibly a transformation $\transy$ of the data<br /> <br /> <br /> The model is associated with a design which includes:<br /> <br /> <br /> - the observation times $(t_{ij})$<br /> <br /> - possibly some additional regression variables $(x_{ij})$<br /> <br /> - possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br /> <br /> - possibly a censoring process $(I_{ij})$<br /> <br /> }}<br /> <br /> <br /> == $\mlxtran$ for continuous data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> <br /> |text= <br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; (A,\alpha,B,\beta, a) \\<br /> f(t,\psi) &amp;=&amp; A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br /> y_{ij} &amp;=&amp; f(t_{ij} , \psi_i) + a\, \teps_{ij}<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color:
#EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {A, B, alpha, beta, a}<br /> <br /> EQUATION:<br /> f = A*exp(-alpha*t) + B*exp(-beta*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, std=a}&lt;/pre&gt;<br /> }}<br /> <br /> }}<br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2=<br /> <br /> |text=<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br /> t_0 &amp;=&amp;0 \\[0.2cm]<br /> {\rm if \quad t&lt;t_0} \\[0.2cm]<br /> \quad \nitc &amp;=&amp; \delta \, c/( \beta \, p) \\<br /> \quad \itc &amp;=&amp; (s - d\,\nitc) / \delta \\<br /> \quad \vl &amp;=&amp; p \, \itc / c. \\[0.2cm] <br /> {\rm else \quad \quad }\\[0.2cm] <br /> \quad \dA{\nitc}{} &amp; =&amp; s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br /> \quad \dA{\itc}{} &amp; = &amp;\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br /> \quad \dA{\vl}{} &amp; = &amp;p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br /> \quad \log(y_{ij}) &amp;= &amp;\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {delta, c, beta, p, s, d, nu, rho, a}<br /> <br /> EQUATION:<br /> t0=0<br /> N_0 = delta*c/(beta*p)<br /> I_0 = (s - d*N_0)/delta<br /> V_0 = p*I_0/c<br /> ddt_N = s - beta*(1-nu)*N*V - d*N<br /> ddt_I = beta*(1-nu)*N*V - delta*I<br /> ddt_V = p*(1-rho)*I - c*V<br /> <br /> DEFINITION:<br /> y = {distribution=logNormal, prediction=V, std=a}<br /> &lt;/pre&gt; }} <br /> }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> <br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D.M. 
},<br /> title = {Nonlinear Models for Repeated Measurements Data },<br /> publisher = {Chapman &amp; Hall.},<br /> address = {London},<br /> edition = {},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{demidenko2005mixed,<br /> title={Mixed Models: Theory and Applications},<br /> author={Demidenko, E.},<br /> isbn={9780471726135},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&amp;redir_esc=y}, <br /> year={2005}, publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fitzmaurice2008longitudinal,<br /> title={Longitudinal Data Analysis},<br /> author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br /> isbn={9781420011579},<br /> lccn={2008020681},<br /> series={Chapman &amp; Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br /> year={2008},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer Series in Statistics},<br /> year = {2007},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{laird1982,<br /> author = {Laird, N.M. and Ware, J.H.},<br /> title = {Random-Effects Models for Longitudinal Data},<br /> journal = {Biometrics},<br /> volume = {38},<br /> pages = {963-974},<br /> year = {1982}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lindstrom1990Nonlinear,<br /> author = {Lindstrom, M.J. and Bates, D.M. 
},<br /> title = {Nonlinear mixed-effects models for repeated measures},<br /> journal = {Biometrics},<br /> volume = {46},<br /> pages = {673-687},<br /> year = {1990}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R.C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C.E. and Searle, S.R.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&amp;redir_esc=y}, year={2004}, publisher={Wiley &amp; Sons} <br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{verbeke2009linear,<br /> title={Linear Mixed Models for Longitudinal Data},<br /> author={Verbeke, G. and Molenberghs, G.},<br /> isbn={9781441902993},<br /> lccn={2010483807},<br /> series={Springer Series in Statistics},<br /> url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{west2006linear,<br /> title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br /> author={West, B. and Welch, K.B. and Galecki, A.T.},<br /> isbn={9781584884804},<br /> lccn={2006045440},year={2006},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Modeling the observations <br /> |linkNext=Models for count data }}</div> Admin