The Metropolis-Hastings algorithm for simulating the individual parameters


We consider a joint model for the observations and individual parameters of individual $i$:

\(\pyipsii(y_i,\psi_i) = \pcyipsii(y_i | \psi_i) \ppsii(\psi_i) , \)

where $\qcyipsii$ is the conditional distribution of the observations of individual $i$ (see Modeling the observations) and $\ppsii(\psi_i)$ the distribution of the individual parameters of individual $i$ (see Modeling the individual parameters).


Remark

This distribution depends on a vector of population parameters $\theta$ and possibly covariates $c_i$, regression variables $x_i$ and inputs $u_i$. We suppose that all of these components of the model are given, so it is not necessary to explicitly write them each time.


Our goal here is to generate values from the conditional distribution $\qcpsiiyi$:

\( \pcpsiiyi(\psi_i|y_i) = \displaystyle{ \frac{\pyipsii(y_i,\psi_i)}{\pyi(y_i)} } . \)

This distribution usually cannot be computed in closed form when the model is not linear and Gaussian. However, we will see that the Metropolis-Hastings (MH) algorithm allows us to draw a sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ which converges in distribution to the target distribution $\qcpsiiyi$.

We will consider a very general model for the individual parameters:

\( \psi_i = M(\beta,c_i,\eta_i) , \)
(1)

where $\beta$ is a vector of fixed effects, $c_i$ a vector of (observed) individual covariates and $\eta_i$ a vector of random effects whose probability distribution is denoted $\qetai$.
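
As a purely illustrative example (an assumption for this sketch, not taken from the text), the transformation $M$ for a single log-normally distributed parameter with one covariate could be coded as:

import numpy as np

def M(beta, c_i, eta_i):
    # Hypothetical transformation: log(psi_i) = beta[0] + beta[1]*c_i + eta_i,
    # i.e. psi_i is log-normal with one covariate effect. The actual M is
    # whatever the model for the individual parameters defines.
    return np.exp(beta[0] + beta[1] * c_i + eta_i)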

The MH algorithm is used to simulate a sequence of random effects $(\eta_i^{(\imh)}, \imh=1,2,\ldots)$ with the target distribution being the conditional distribution $\qcetaiyi$ of the random effects $\eta_i$. We will then obtain the sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ using (1).

The MH algorithm is iterative and requires an initial value $\eta_i^{(0)}$. Then, at iteration $\imh$, we


  1. Draw a new value $\ceta_i^{(\imh)}$ with some proposal distribution $q_{\imh}(\cdot \, ; \eta_i^{(\imh-1)})$.
  2. Compute $\cpsi_i^{(\imh)} = M(\beta,c_i,\ceta_i^{(\imh)})$.
  3. Accept this new value, that is, let $\eta_i^{(\imh)}=\ceta_i^{(\imh)}$, with probability

     \(\begin{eqnarray} \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qcetaiyi(\ceta_i^{(\imh)}|y_i)} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qcetaiyi(\eta_i^{(\imh-1)}|y_i) } } \\ &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } } . \end{eqnarray}\)
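
The following is a minimal, purely illustrative sketch of one such iteration for a single individual, written in Python. The function and argument names (`sample_q`, `log_q`, `log_prior`, `log_cond_lik`) are assumptions made for this sketch, standing in for the proposal sampler, the proposal density $q_{\imh}$, the density of $\qetai$ and the conditional density $\pcyipsii$; it is not the $\monolix$ implementation.

import numpy as np

def mh_step(eta_prev, y_i, beta, c_i, M,
            sample_q, log_q, log_prior, log_cond_lik, rng):
    """One Metropolis-Hastings update of the random effects eta_i.

    sample_q(eta_prev)       draws a candidate eta_c ~ q(. ; eta_prev)
    log_q(eta_to, eta_from)  evaluates log q(eta_to ; eta_from)
    log_prior(eta)           evaluates the log density of eta, e.g. N(0, Omega)
    log_cond_lik(y, psi)     evaluates log p(y | psi)
    """
    eta_c = sample_q(eta_prev)              # 1. propose a new value
    psi_c = M(beta, c_i, eta_c)             # 2. map it to individual parameters
    psi_prev = M(beta, c_i, eta_prev)

    # 3. acceptance probability, computed on the log scale for stability
    log_alpha = (log_q(eta_prev, eta_c) + log_prior(eta_c) + log_cond_lik(y_i, psi_c)
                 - log_q(eta_c, eta_prev) - log_prior(eta_prev) - log_cond_lik(y_i, psi_prev))

    if np.log(rng.uniform()) < log_alpha:   # accept with probability min(1, alpha)
        return eta_c
    return eta_prev                         # otherwise keep the previous value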


Remarks

  • In order to run this algorithm, we need to be able to calculate the transition density $q_{\imh}(\ceta_i;\eta_i)$, the random effects density $\petai(\eta_i)$ (which poses no problem if $\eta_i$ is a Gaussian vector, for example), and in particular the conditional density $\pcyipsii(y_i|\psi_i)$. This is why this calculation is explicitly performed in the various examples provided in Modeling the observations.


  • We denote by $q_{\imh}$ the proposal distribution used at iteration $\imh$ of the algorithm, because different proposals can be used at different iterations.


Several proposal distributions are used in $\monolix$:


  1. $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is

     \( \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qcyipsii(y_i | \cpsi_i^{(\imh)})}{\qcyipsii(y_i | \psi_i^{(\imh-1)}) } } . \)

  2. $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$, is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$:

     \( \ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} , \)

     where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is calibrated in order to reach an optimal acceptance rate $\alpha^\star$ ($\monolix$ uses $\alpha^\star = 0.3$ by default). Here, the transition kernel is symmetrical and

     \( \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {\qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } }. \)

  3. $q^{(3,\Meta)}$, for $\Meta \subset \{1,2,\ldots,d\}$, is the multidimensional Gaussian random walk for the vector $\eta_\Meta = (\eta_\ieta , \ieta\in \Meta)$:

     \( \ceta_{i,\Meta}^{(\imh)} = \eta_{i,\Meta}^{(\imh-1)} + \xi_{i,\Meta}^{(\imh)} , \)

     where $\xi_{i,\Meta}^{(\imh)}=(\xi_{i,\ieta}^{(\imh)}, \ieta\in \Meta)$ is a Gaussian vector with diagonal variance matrix $\Upsilon_\Meta^{(\imh)}$. Here as well, the variance $\Upsilon_\Meta^{(\imh)}$ of this random walk is adjusted in order to reach the optimal acceptance rate $\alpha^\star$. Different subsets $\Meta$ are chosen at each iteration.
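
As an illustration, candidates for these three kernels could be drawn as follows in Python; the function names, the NumPy generator `rng` and the arguments are assumptions for this sketch, not $\monolix$ code. For $q^{(1)}$ the proposal density cancels with $\qetai$ in the acceptance ratio, and the two random walks are symmetric kernels, which yields the simplified acceptance probabilities shown above.

import numpy as np

def propose_marginal(Omega, rng):
    # Kernel q^(1): draw from the marginal N(0, Omega), independently of the
    # current value of eta_i.
    return rng.multivariate_normal(np.zeros(Omega.shape[0]), Omega)

def propose_component_rw(eta_prev, k, upsilon_k, rng):
    # Kernel q^(2,k): Gaussian random walk on component k only,
    # with variance upsilon_k.
    eta_c = eta_prev.copy()
    eta_c[k] += rng.normal(0.0, np.sqrt(upsilon_k))
    return eta_c

def propose_block_rw(eta_prev, block, upsilon_block, rng):
    # Kernel q^(3,M): Gaussian random walk on the components listed in `block`,
    # with the diagonal of the covariance matrix given by the vector upsilon_block.
    eta_c = eta_prev.copy()
    eta_c[block] += rng.normal(0.0, np.sqrt(np.asarray(upsilon_block)))
    return eta_c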


The MH algorithm then consists of successively using these different proposals for $i=1,2,\ldots , N$.

The variances $(\upsilon_\ieta,\, \ieta=1,2,\ldots, d)$ for proposal $q^{(2,\ieta)}$ are updated at iteration $\imh$ as follows:

\(\upsilon_\ieta^{(\imh)}=\upsilon_\ieta^{(\imh-1)}(1+\delta(\overline{\alpha}_\ieta^{(\imh-1)}-\alpha^\star)), \)

where $0<\delta<1$ is a constant and $\overline{\alpha}_\ieta^{(\imh)}$ the empirical acceptance rate at iteration $\imh$:

\( \overline{\alpha}_\ieta^{(\imh)} = \frac{1}{N} \sum_{i=1}^N \one_{\eta_i^{(\imh)}=\ceta_i^{(\imh)} } . \)

Indeed, a random-walk variance that is too small (resp. too large) leads to an acceptance probability that is too large (resp. too small). The strategy proposed here therefore allows us to adaptively correct the variance for a given acceptance probability $\alpha^\star$. A small value of $\delta$ allows us to smooth out the sequence $(\upsilon_\ieta^{(\imh)})$. $\monolix$ uses $\delta = 0.4$ as a default value.
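
A sketch of this adaptive update; the defaults below simply reuse the values quoted in the text, and the function name is an assumption:

def update_variance(upsilon_prev, accept_rate_prev, alpha_star=0.3, delta=0.4):
    # Inflate the random-walk variance when the previous empirical acceptance
    # rate is above the target alpha_star, shrink it when it is below.
    return upsilon_prev * (1.0 + delta * (accept_rate_prev - alpha_star))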

We can use the same strategy for updating the diagonal variance matrices $\Upsilon_\Meta,\ \Meta \subset \{1,2,\ldots,d\}$, for the kernel $q^{(3,\Meta)}$.

The simulated sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ can then be used to empirically estimate the conditional distribution $\qcpsiiyi$ and the conditional mean $\esp{F(\psi_i)| y_i}$ of any function $F$ such that $\esp{F^2(\psi_i)| y_i}<+\infty$. The accuracy of the estimation obviously depends on the length $K$ of the sequence $(\psi_i^{(\imh)})$ used for the estimation, since the variance of the estimator only decreases as $1/K$. Estimation is also improved if the sequence starts from a point which is representative of the equilibrium distribution (burn-in is one method of finding a good starting point) or which is known to have reasonably high probability (the conditional mode or the last value obtained with SAEM, for instance).
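
For illustration, this empirical estimate of $\esp{F(\psi_i)|y_i}$ could be computed as follows; the burn-in argument and the names are assumptions for this sketch:

import numpy as np

def conditional_mean(psi_chain, F, burn_in=0):
    # Empirical mean of F(psi_i^(k)) over the chain, after discarding the
    # first `burn_in` iterations. psi_chain has shape (K, dim_psi).
    kept = psi_chain[burn_in:]
    return np.mean([F(psi) for psi in kept], axis=0)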

Back: The SAEM algorithm for estimating population parameters | Next: Estimation of the observed Fisher information matrix