The Metropolis-Hastings algorithm for simulating the individual parameters


We consider a joint model for the observations and individual parameters of individual $i$:

\(\pyipsii(y_i,\psi_i) = \pcyipsii(y_i | \psi_i) \ppsii(\psi_i) , \)

where $\qcyipsii$ is the conditional distribution of the observations of individual $i$ (see Modeling the observations) and $\ppsii(\psi_i)$ is the distribution of the individual parameters of individual $i$ (see Modeling the individual parameters).


Remark

This distribution depends on a vector of population parameters $\theta$, and possibly on covariates $c_i$, regression variables $x_i$ and inputs $u_i$. We assume that all of these components of the model are given, so there is no need to write them explicitly each time.


Our goal here is to generate values from the conditional distribution $\qcpsiiyi$:

\( \pcpsiiyi(\psi_i|y_i) = \displaystyle{ \frac{\pyipsii(y_i,\psi_i)}{\pyi(y_i)} } . \)

In general, this distribution cannot be computed in closed form when the model is not linear and Gaussian. However, we will see that the Metropolis-Hastings (MH) algorithm allows us to draw a sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ which converges in distribution to the target distribution $\qcpsiiyi$.

We will consider a very general model for the individual parameters:

\( \psi_i = M(\beta,c_i,\eta_i) , \)
(1)

where $\beta$ is a vector of fixed effects, $c_i$ a vector of (observed) individual covariates and $\eta_i$ a vector of random effects whose probability distribution is denoted $\qetai$.
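For instance, a log-normally distributed parameter with a single covariate could be written as (this particular parametrization is only an illustration, not part of the model above):

\( \psi_i = \beta_1 \, e^{\beta_2 \, c_i + \eta_i} , \quad \eta_i \sim {\cal N}(0,\omega^2) , \)

so that here $M(\beta,c_i,\eta_i) = \beta_1 e^{\beta_2 c_i + \eta_i}$ with $\beta=(\beta_1,\beta_2)$.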

The MH algorithm is used to simulate a sequence of random effects $(\eta_i^{(\imh)}, \imh=1,2,\ldots)$ with the target distribution being the conditional distribution $\qcetaiyi$ of the random effects $\eta_i$. We will then obtain the sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ using (1).

The MH algorithm is an iterative procedure which requires an initial value $\eta_i^{(0)}$. Then, at iteration $\imh$, we:

1. draw a new value $\ceta_i^{(\imh)}$ from a proposal distribution $q_{\imh}(\cdot \, ; \eta_i^{(\imh-1)})$

2. compute $\cpsi_i^{(\imh)} = M(\beta,c_i,\ceta_i^{(\imh)})$

3. accept this new value, i.e., set $\eta_i^{(\imh)}=\ceta_i^{(\imh)}$, with probability $\min\big(1, \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)})\big)$, where

\(\begin{eqnarray} \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qcetaiyi(\ceta_i^{(\imh)}|y_i)} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qcetaiyi(\eta_i^{(\imh-1)}|y_i) } } \\ &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } } . \end{eqnarray}\)
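As an illustration, here is a minimal sketch in Python of one such iteration. The structural model, the parameter transformation and every variable name below are assumptions made for the example only; they are not part of the model above, nor of any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative structural model: a single exponential decay y = A*exp(-k*t), with
# psi_i = (A_i, k_i) = exp(beta + eta_i), i.e. log-normal individual parameters, no covariate.
def f(t, psi):
    return psi[0] * np.exp(-psi[1] * t)

def M(beta, c, eta):
    return np.exp(beta + eta)

def log_cond_lik(y, t, psi, a):
    """log p(y_i | psi_i) for a constant Gaussian error model with standard deviation a."""
    res = (y - f(t, psi)) / a
    return -0.5 * np.sum(res ** 2 + np.log(2 * np.pi * a ** 2))

def log_prior_eta(eta, Omega):
    """log p(eta_i) for eta_i ~ N(0, Omega)."""
    return -0.5 * (eta @ np.linalg.solve(Omega, eta)
                   + np.log(np.linalg.det(2 * np.pi * Omega)))

def mh_step(eta, y, t, c, beta, Omega, a, propose, log_q):
    """One MH iteration: propose(eta) draws a candidate, log_q(x, cond) returns log q(x ; cond)."""
    eta_c = propose(eta)
    log_alpha = (log_q(eta, eta_c) + log_prior_eta(eta_c, Omega)
                 + log_cond_lik(y, t, M(beta, c, eta_c), a)
                 - log_q(eta_c, eta) - log_prior_eta(eta, Omega)
                 - log_cond_lik(y, t, M(beta, c, eta), a))
    # accept the candidate with probability min(1, alpha)
    return eta_c if np.log(rng.uniform()) < log_alpha else eta
```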


Remarks

  • In order to run this algorithm, we need to be able to compute the transition density $q_{\imh}(\ceta_i;\eta_i)$, the random effects density $\petai(\eta_i)$ (which poses no problem if $\eta_i$ is a Gaussian vector, for example) and, above all, the conditional density $\pcyipsii(y_i | \psi_i)$. This is why this calculation is performed explicitly in the various examples provided in Modeling the observations.


  • We denote by $q_{\imh}$ the proposal distribution used at iteration $\imh$ of the algorithm, since different proposals can be used at different iterations.


Several proposal distributions are used in $\monolix$:

1. $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is

\( \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qcyipsii(y_i | \cpsi_i^{(\imh)})}{\qcyipsii(y_i | \psi_i^{(\imh-1)}) } } . \)


2. $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$ is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$:

\( \ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} , \)

where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is adjusted in order to reach an optimal acceptance rate $\alpha^\star$ ($\monolix$ uses $\alpha^\star = 0.3$ by default). Here, the transition kernel is symmetric and

\( \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {\qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } }. \)


3. $q^{(3,\Meta)}$, for $\Meta \subset \{1,2,\ldots,d\}$ is the multidimensional Gaussian random walk for the vector $\eta_\Meta = (\eta_\ieta , \ieta\in \Meta)$:

\( \ceta_{i,\Meta}^{(\imh)} = \eta_{i,\Meta}^{(\imh-1)} + \xi_{i,\Meta}^{(\imh)} , \)

where $\xi_{i,\Meta}^{(\imh)}=(\xi_{i,\ieta}^{(\imh)}, \ieta\in \Meta)$ is a Gaussian vector with diagonal variance matrix $\Upsilon_\Meta^{(\imh)}$. Here as well, the variance $\Upsilon_\Meta^{(\imh)}$ of this random walk is adjusted in order to reach the optimal acceptance rate $\alpha^\star$. Different subsets $\Meta$ are chosen at each iteration.

The Metropolis-Hastings algorithm then consists in successively using these different proposals for each individual $i=1,2,\ldots,N$.
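As an illustration, the three kernels can be sketched as follows, reusing the illustrative functions `M`, `log_cond_lik`, `log_prior_eta` and `rng` from the sketch above; again, this is only a sketch under those assumptions, not the actual $\monolix$ implementation.

```python
# Kernel q^(1): independence proposal from the marginal N(0, Omega).
# The prior and proposal densities cancel, leaving only the likelihood ratio.
def step_marginal(eta, y, t, c, beta, Omega, a):
    L = np.linalg.cholesky(Omega)
    eta_c = L @ rng.standard_normal(eta.size)          # draw eta_c ~ N(0, Omega)
    la = (log_cond_lik(y, t, M(beta, c, eta_c), a)
          - log_cond_lik(y, t, M(beta, c, eta), a))
    accepted = np.log(rng.uniform()) < la
    return (eta_c if accepted else eta), accepted

# Kernels q^(2,l) and q^(3,M): symmetric Gaussian random walks on a subset of components
# (a single index for q^(2,l), several indices for q^(3,M)); the proposal density cancels.
def step_random_walk(eta, subset, variances, y, t, c, beta, Omega, a):
    eta_c = eta.copy()
    eta_c[subset] += np.sqrt(variances[subset]) * rng.standard_normal(len(subset))
    la = (log_prior_eta(eta_c, Omega) + log_cond_lik(y, t, M(beta, c, eta_c), a)
          - log_prior_eta(eta, Omega) - log_cond_lik(y, t, M(beta, c, eta), a))
    accepted = np.log(rng.uniform()) < la
    return (eta_c if accepted else eta), accepted      # the flag feeds the variance adaptation
```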

The variances $(\upsilon_\ieta, \ieta=1,2,\ldots d)$ for proposal $q^{(2,\ieta)}$ are updated at iteration $\imh$ as follows:

\( \upsilon_\ieta^{(\imh)}=\upsilon_\ieta^{(\imh-1)}\left(1+\delta\,(\overline{\alpha}_\ieta^{(\imh-1)}-\alpha^\star)\right) , \)

where $0<\delta<1$ is a constant and $\overline{\alpha}_\ieta^{(\imh)}$ is the empirical acceptance rate at iteration $\imh$:

\( \overline{\alpha}_\ieta^{(\imh)} = \frac{1}{N} \sum_{i=1}^N \one_{\eta_i^{(\imh)}=\ceta_i^{(\imh)} } . \)

Indeed, too small (resp. too large) a variance for the random walk leads to too large (resp. too small) an acceptance probability. The proposed strategy therefore allows us to adaptively adjust the variance in order to reach a given target acceptance probability $\alpha^\star$. A small value of $\delta$ smooths out the sequence $(\upsilon_\ieta^{(\imh)})$. $\monolix$ uses $\delta = 0.4$ as the default value.

We can use the same strategy for updating the diagonal variance matrices $\Upsilon_\Meta$, $\Meta \subset \{1,2,\ldots,d\}$, for the kernels $q^{(3,\Meta)}$.
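A minimal sketch of this adaptation rule, with the default values quoted above and acceptance flags collected from the random-walk steps over the $N$ individuals (again purely illustrative):

```python
# Adapt the random-walk variance of one component toward the target acceptance rate alpha_star.
# 'accepted' is a length-N boolean array of acceptance flags for that component at this iteration.
def update_variance(upsilon_l, accepted, alpha_star=0.3, delta=0.4):
    acc_rate = np.mean(accepted)       # empirical acceptance rate over the N individuals
    return upsilon_l * (1.0 + delta * (acc_rate - alpha_star))
```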

The simulated sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ can then be used for estimating empirically the conditional distribution $\qcpsiiyi$ and the conditional mean $\esp{F(\psi_i)| y_i}$ of any function $F$ such that $\esp{F^2(\psi_i)| y_i}<+\infty$. The accuracy of the estimation obviously depends on the length $K$ of the sequence $(\psi_i^{(\imh)})$ used for the estimation, since the variance of the estimator merely decreases as $1/K$. The estimation is also improved if the sequence starts from a point which is representative of the equilibrium distribution (burn-in is one method of finding a good starting point), or which is known to have reasonably high probability (the conditional mode or the last value obtained with SAEM for instance).
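For instance, a conditional mean can be estimated by a simple average over the chain after discarding a burn-in period; the chain layout, the burn-in length and the function $F$ below are arbitrary choices made for the sketch:

```python
# Estimate E[F(psi_i) | y_i] from the simulated sequence psi_chain of shape (K_total, dim).
def conditional_mean(psi_chain, F=lambda psi: psi, burn_in=200):
    kept = psi_chain[burn_in:]                         # drop the burn-in iterations
    return np.mean([F(psi) for psi in kept], axis=0)   # Monte Carlo error decreases as 1/sqrt(K)
```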
