Difference between revisions of "The Metropolis-Hastings algorithm for simulating the individual parameters"
m |
m |
||
Line 308: | Line 308: | ||
<blockquote> | <blockquote> | ||
− | 1. | + | 1. $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is |
</blockquote> | </blockquote> | ||
Line 317: | Line 317: | ||
<blockquote> | <blockquote> | ||
− | + | 2. $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$ is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$: | |
</blockquote> | </blockquote> | ||
Line 323: | Line 323: | ||
|equation=<math> \ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} , </math> }} | |equation=<math> \ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} , </math> }} | ||
+ | <blockquote> | ||
: where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is adjusted in order to reach an optimal acceptance rate $\alpha^\star$ (\monolix uses $\alpha^\star = 0.3$ as default). Here, the transition kernel is symmetrical and | : where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is adjusted in order to reach an optimal acceptance rate $\alpha^\star$ (\monolix uses $\alpha^\star = 0.3$ as default). Here, the transition kernel is symmetrical and | ||
+ | </blockquote> | ||
{{Equation1 | {{Equation1 |
Revision as of 11:00, 30 April 2013
$ \def\ieta{m} \def\Meta{M} \def\imh{\ell} \def\ceta{\tilde{\eta}} \def\cpsi{\tilde{\psi}} \newcommand{\argmin}[1]{ \mathop{\rm arg} \mathop{\rm min}\limits_{#1} } \newcommand{\nominal}[1]{#1^{\star}} \newcommand{\psis}{\psi{^\star}} \newcommand{\phis}{\phi{^\star}} \newcommand{\hpsi}{\hat{\psi}} \newcommand{\hphi}{\hat{\phi}} \newcommand{\teps}{\varepsilon} \newcommand{\limite}[2]{\mathop{\longrightarrow}\limits_{\mathrm{#1}}^{\mathrm{#2}}} \newcommand{\DDt}[1]{\partial^2_\theta #1} \def\cpop{c_{\rm pop}} \def\Vpop{V_{\rm pop}} \def\iparam{l} \newcommand{\trcov}[1]{#1} \def\bu{\boldsymbol{u}} \def\bt{\boldsymbol{t}} \def\bT{\boldsymbol{T}} \def\by{\boldsymbol{y}} \def\bx{\boldsymbol{x}} \def\bc{\boldsymbol{c}} \def\bw{\boldsymbol{w}} \def\bz{\boldsymbol{z}} \def\bpsi{\boldsymbol{\psi}} \def\bbeta{\beta} \def\aref{a^\star} \def\kref{k^\star} \def\model{M} \def\hmodel{m} \def\mmodel{\mu} \def\imodel{H} \def\thmle{\hat{\theta}} \def\ofim{I^{\rm obs}} \def\efim{I^{\star}} \def\Imax{\rm Imax} \def\probit{\rm probit} \def\vt{t} \def\id{\rm Id} \def\teta{\tilde{\eta}} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\deriv}[1]{\frac{d}{dt}#1(t)} \newcommand{\pred}[1]{\tilde{#1}} \def\phis{\phi{^\star}} \def\hphi{\tilde{\phi}} \def\hw{\tilde{w}} \def\hpsi{\tilde{\psi}} \def\hatpsi{\hat{\psi}} \def\hatphi{\hat{\phi}} \def\psis{\psi{^\star}} \def\transy{u} \def\psipop{\psi_{\rm pop}} \newcommand{\psigr}[1]{\hat{\bpsi}_{#1}} \newcommand{\Vgr}[1]{\hat{V}_{#1}} \def\pmacro{\text{p}} \def\py{\pmacro} \def\pt{\pmacro} \def\pc{\pmacro} \def\pu{\pmacro} \def\pyi{\pmacro} \def\pyj{\pmacro} \def\ppsi{\pmacro} \def\ppsii{\pmacro} \def\pcpsith{\pmacro} \def\pcpsiiyi{\pmacro} \def\pth{\pmacro} \def\pypsi{\pmacro} \def\pcypsi{\pmacro} \def\ppsic{\pmacro} \def\pcpsic{\pmacro} \def\pypsic{\pmacro} \def\pypsit{\pmacro} \def\pcypsit{\pmacro} \def\pypsiu{\pmacro} \def\pcypsiu{\pmacro} \def\pypsith{\pmacro} \def\pypsithcut{\pmacro} \def\pypsithc{\pmacro} \def\pcypsiut{\pmacro} \def\pcpsithc{\pmacro} \def\pcthy{\pmacro} \def\pyth{\pmacro} \def\pcpsiy{\pmacro} \def\pz{\pmacro} \def\pw{\pmacro} \def\pcwz{\pmacro} \def\pw{\pmacro} \def\pcyipsii{\pmacro} \def\pyipsii{\pmacro} \def\pcetaiyi{\pmacro} \def\pypsiij{\pmacro} \def\pyipsiONE{\pmacro} \def\ptypsiij{\pmacro} \def\pcyzipsii{\pmacro} \def\pczipsii{\pmacro} \def\pcyizpsii{\pmacro} \def\pcyijzpsii{\pmacro} \def\pcyiONEzpsii{\pmacro} \def\pcypsiz{\pmacro} \def\pccypsiz{\pmacro} \def\pypsiz{\pmacro} \def\pcpsiz{\pmacro} \def\peps{\pmacro} \def\petai{\pmacro} \def\psig{\psi} \def\psigprime{\psig^{\prime}} \def\psigiprime{\psig_i^{\prime}} \def\psigk{\psig^{(k)}} \def\psigki{\psig_i^{(k)}} \def\psigkun{\psig^{(k+1)}} \def\psigkuni{\psig_i^{(k+1)}} \def\psigi{\psig_i} \def\psigil{\psig_{i,\ell}} \def\phig{\phi} \def\phigi{\phig_i} \def\phigil{\phig_{i,\ell}} \def\etagi{\eta_i} \def\IIV{\Omega} \def\thetag{\theta} \def\thetagk{\theta_k} \def\thetagkun{\theta_{k+1}} \def\thetagkunm{\theta_{k-1}} \def\sgk{s_{k}} \def\sgkun{s_{k+1}} \def\yg{y} \def\xg{x} \def\qx{p_x} \def\qy{p_y} \def\qt{p_t} \def\qc{p_c} \def\qu{p_u} \def\qyi{p_{y_i}} \def\qyj{p_{y_j}} \def\qpsi{p_{\psi}} \def\qpsii{p_{\psi_i}} \def\qcpsith{p_{\psi|\theta}} \def\qth{p_{\theta}} \def\qypsi{p_{y,\psi}} \def\qcypsi{p_{y|\psi}} \def\qpsic{p_{\psi,c}} \def\qcpsic{p_{\psi|c}} \def\qypsic{p_{y,\psi,c}} \def\qypsit{p_{y,\psi,t}} \def\qcypsit{p_{y|\psi,t}} \def\qypsiu{p_{y,\psi,u}} \def\qcypsiu{p_{y|\psi,u}} \def\qypsith{p_{y,\psi,\theta}} \def\qypsithcut{p_{y,\psi,\theta,c,u,t}} \def\qypsithc{p_{y,\psi,\theta,c}} \def\qcypsiut{p_{y|\psi,u,t}} \def\qcpsithc{p_{\psi|\theta,c}} \def\qcthy{p_{\theta | y}} \def\qyth{p_{y,\theta}} \def\qcpsiy{p_{\psi|y}} \def\qcpsiiyi{p_{\psi_i|y_i}} \def\qcetaiyi{p_{\eta_i|y_i}} \def\qz{p_z} \def\qw{p_w} \def\qcwz{p_{w|z}} \def\qw{p_w} \def\qcyipsii{p_{y_i|\psi_i}} \def\qyipsii{p_{y_i,\psi_i}} \def\qypsiij{p_{y_{ij}|\psi_{i}}} \def\qyipsi1{p_{y_{i1}|\psi_{i}}} \def\qtypsiij{p_{\transy(y_{ij})|\psi_{i}}} \def\qcyzipsii{p_{z_i,y_i|\psi_i}} \def\qczipsii{p_{z_i|\psi_i}} \def\qcyizpsii{p_{y_i|z_i,\psi_i}} \def\qcyijzpsii{p_{y_{ij}|z_{ij},\psi_i}} \def\qcyi1zpsii{p_{y_{i1}|z_{i1},\psi_i}} \def\qcypsiz{p_{y,\psi|z}} \def\qccypsiz{p_{y|\psi,z}} \def\qypsiz{p_{y,\psi,z}} \def\qcpsiz{p_{\psi|z}} \def\qeps{p_{\teps}} \def\qetai{p_{\eta_i}} \def\neta{n_\eta} \def\ncov{M} \def\npsi{n_\psig} \def\beeta{\eta} \def\logit{\rm logit} \def\transy{u} \def\so{O} \newcommand{\prob}[1]{ \mathbb{P}\left(#1\right)} \newcommand{\probs}[2]{ \mathbb{P}_{#1}\left(#2\right)} \newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\esps}[2]{\mathbb{E}_{#1}\left(#2\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\vars}[2]{\mbox{Var}_{#1}\left(#2\right)} \newcommand{\std}[1]{\mbox{sd}\left(#1\right)} \newcommand{\stds}[2]{\mbox{sd}_{#1}\left(#2\right)} \newcommand{\corr}[1]{\mbox{Corr}\left(#1\right)} \newcommand{\Rset}{\mbox{$\mathbb{R}$}} \newcommand{\Yr}{\mbox{$\mathcal{Y}$}} \newcommand{\teps}{\varepsilon} \newcommand{\like}{\cal L} \newcommand{\logit}{\rm logit} \newcommand{\transy}{u} \newcommand{\repy}{y^{(r)}} \newcommand{\brepy}{\boldsymbol{y}^{(r)}} \newcommand{\vari}[3]{#1_{#2}^{{#3}}} \newcommand{\dA}[2]{\dot{#1}_{#2}(t)} \newcommand{\nitc}{N} \newcommand{\itc}{I} \newcommand{\vl}{V} \newcommand{tstart}{t_{start}} \newcommand{tstop}{t_{stop}} \newcommand{\one}{\mathbb{1}} \newcommand{\hazard}{h} \newcommand{\cumhaz}{H} \newcommand{\std}[1]{\mbox{sd}\left(#1\right)} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \def\mlxtran{\text{MLXtran}} \def\monolix{\text{Monolix}} $
We consider a joint model for the observations and individual parameters of individual $i$:
where $\qcyipsii$ is the conditional distribution of the observations of individual $i$ (see Modeling the observations) and $\ppsii(\psi_i)$ the distribution of the individual parameters of individual $i$ (see Modeling the individual parameters).
Our goal here is to generate values from the conditional distribution $\qcpsiiyi$:
This distribution cannot usually be computed in closed-form when the model is not a linear Gaussian one. However, we will see that the Metropolis-Hastings (MH) algorithm allows us to draw a sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ which converges (in distribution) to the target distribution $\qcpsiiyi$.
We will consider a very general model for the individual parameters:
\(
\psi_i = M(\beta,c_i,\eta_i) ,
\)
|
(1) |
where $\beta$ is a vector of fixed effects, $c_i$ a vector of (observed) individual covariates and $\eta_i$ a vector of random effects whose probability distribution is denoted $\qetai$.
The MH algorithm is used to simulate a sequence of random effects $(\eta_i^{(\imh)}, \imh=1,2,\ldots)$ with the target distribution being the conditional distribution $\qcetaiyi$ of the random effects $\eta_i$. We will then obtain the sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ using (1).
The M-H algorithm is an iterative algorithm which requires an initial value $\eta_i^{(0)}$. Then, at iteration $\imh$, we
1.draw a new value $\ceta_i^{(\imh)}$ with a proposal distribution $q_{\imh}(\cdot \, ; \eta_i^{(\imh-1)})$
2. compute $\cpsi_i^{(\imh)} = M(\beta,c_i,\ceta_i^{(\imh)})$
3. accept this new value, that is set $\eta_i^{(\imh)}=\ceta_i^{(\imh)}$, with probability
Several proposal distributions are used in $\monolix$:
1. $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is
2. $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$ is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$:
- where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is adjusted in order to reach an optimal acceptance rate $\alpha^\star$ (\monolix uses $\alpha^\star = 0.3$ as default). Here, the transition kernel is symmetrical and
- $q^{(3,\Meta)}$, for $\Meta \subset \{1,2,\ldots,d\}$ is the multidimensional Gaussian random walk for the vector $\eta_\Meta = (\eta_\ieta , \ieta\in \Meta)$:
- where $\xi_{i,\Meta}^{(\imh)}=(\xi_{i,\ieta}^{(\imh)}, \ieta\in \Meta)$ is a Gaussian vector with diagonal variance matrix $\Upsilon_\Meta^{(\imh)}$. Here as well, the variance $\Upsilon_\Meta^{(\imh)}$ of this random walk is adjusted in order to reach the optimal acceptation rate $\alpha^\star$. Different subsets $\Meta$ are chosen at each iteration.
The Metropolis-Hasting algorithm consists then in using successively these different proposals for $i=1,2,\ldots , N$.
The variances $(\upsilon_\ieta, \ieta=1,2,\ldots d)$ for proposal $q^{(2,\ieta)}$ are updated at iteration $\imh$ as follows:
where $0<\delta<1$ is a constant and where $\overline{\alpha}_\ieta^{(\imh)}$ is the empirical acceptation rate at iteration $\imh$:
Indeed, a too small (resp. large) variance for the random walk leads to a too large (resp. small) acceptance probability. The proposed strategy therefore allows us to adaptively correct the variance for a given acceptance probability $\alpha^\star$. A small value of $a$ allows us to smooth out the sequence $(\upsilon_\ieta^{(\imh)})$. $\monolix$ uses $a = 0.4$ as a default value.
We can use the same strategy for updating the diagonal variance matrices $\Upsilon_\Meta,\ \Meta \subset \{1,2,,\ldots,d\}$ for the kernel $q^{(3,M)}$.
The simulated sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ can then be used for estimating empirically the conditional distribution $\qcpsiiyi$ and the conditional mean $\esp{F(\psi_i)| y_i}$ of any function $F$ such that $\esp{F^2(\psi_i)| y_i}<+\infty$. The accuracy of the estimation obviously depends on the length $K$ of the sequence $(\psi_i^{(\imh)})$ used for the estimation, since the variance of the estimator merely decreases as $1/K$. The estimation is also improved if the sequence starts from a point which is representative of the equilibrium distribution (burn-in is one method of finding a good starting point), or which is known to have reasonably high probability (the conditional mode or the last value obtained with SAEM for instance).