Estimation of the observed Fisher information matrix
$ \def\hphi{\tilde{\phi}} $

==Estimation using stochastic approximation==
The ''observed'' Fisher information matrix (F.I.M.) is a function of $\theta$ defined as

{{EquationWithRef
|equation=<div id="eq_fim1"><math>\begin{eqnarray}
I(\theta) &=& -\DDt{\log ({\like}(\theta;\by))} \\
&=& -\DDt{\log (\py(\by;\theta))} .
\end{eqnarray}</math></div>
|reference=(1) }}
Because the likelihood is quite complex, $I(\theta)$ usually has no closed form expression. It is however possible to estimate it using a stochastic approximation procedure based on <balloon title="Kuhn05: put here the reference!!!" style="color:#177245">Louis' formula</balloon>:
{{Equation1
|equation=<math>\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta},
</math> }}

where

{{Equation1
|equation=<math>\begin{eqnarray}
\cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &=&
\esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\
&& - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose} .
\end{eqnarray}</math> }}
Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm.

We can then draw a sequence $(\psi_i^{(k)})$ using a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] and estimate the observed F.I.M. online. At iteration $k$ of the algorithm:

* '''Simulation step''': for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] section, with $\pmacro(\psi_i |y_i ;{\theta})$ as limit distribution.

* '''Stochastic approximation''': update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations:
{{Equation1
|equation=<math>\begin{eqnarray}
\Delta_k & = & \Delta_{k-1} + \gamma_k \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - \Delta_{k-1} \right) \\
D_k & = & D_{k-1} + \gamma_k \left(\DDt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - D_{k-1} \right) \\
G_k & = & G_{k-1} + \gamma_k \left((\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})^\transpose -G_{k-1} \right),
\end{eqnarray}</math> }}

: where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$, and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.

* '''Estimation step''': update the estimate $H_k$ of the F.I.M. according to

{{Equation1
|equation=<math>H_k = D_k + G_k - \Delta_k \Delta_k^{\transpose}. </math> }}
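The three steps can be illustrated with a minimal Python sketch on a hypothetical toy model (not one of this article's examples): $y_i = \psi_i + \varepsilon_i$ with $\psi_i \sim {\cal N}(\theta, \omega^2)$, $\varepsilon_i \sim {\cal N}(0,\sigma^2)$ and a scalar $\theta$. Because the conditional distribution of $\psi_i$ given $y_i$ is Gaussian here, it is sampled exactly instead of with Metropolis-Hastings; the recurrences and the final combination are exactly those above, with $\gamma_k=1/k$. All variable names are illustrative.

```python
import numpy as np

# Toy model: y_i = psi_i + eps_i, psi_i ~ N(theta, omega2), eps_i ~ N(0, sigma2).
rng = np.random.default_rng(0)
N, theta, omega2, sigma2 = 50, 0.0, 1.0, 1.0
y = rng.normal(theta, np.sqrt(omega2 + sigma2), size=N)

# Exact conditional: psi_i | y_i ~ N(m_i, v); this replaces the M-H sampler.
v = 1.0 / (1.0 / sigma2 + 1.0 / omega2)
m = v * (y / sigma2 + theta / omega2)

Delta, D, G = 0.0, 0.0, 0.0
for k in range(1, 20001):
    psi = rng.normal(m, np.sqrt(v))           # simulation step
    score = np.sum(psi - theta) / omega2      # d/dtheta log p(y, psi; theta)
    hess = -N / omega2                        # d2/dtheta2 log p(y, psi; theta)
    gamma = 1.0 / k                           # gamma_k = 1/k
    Delta += gamma * (score - Delta)          # stochastic approximation step
    D += gamma * (hess - D)
    G += gamma * (score ** 2 - G)

H = D + G - Delta ** 2       # estimates d2/dtheta2 log p(y; theta) (Louis' formula)
fim_est = -H                 # observed F.I.M. estimate (scalar here)
fim_exact = N / (omega2 + sigma2)   # exact, since y_i ~ N(theta, omega2 + sigma2)
```

On this toy model the marginal distribution is known, so the observed F.I.M. is available exactly and provides a check of the procedure; note that $H_k$ approximates the second derivative of the log-likelihood, so the F.I.M. estimate is $-H_k$.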
Implementing this algorithm therefore requires computation of the first and second derivatives of

{{Equation1
|equation=<math>\log (\pmacro(\by,\bpsi;\theta))=\sum_{i=1}^{N} \log (\pmacro(y_i,\psi_i;\theta)).</math> }}
Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as

{{EquationWithRef
|equation=<div id="eq:fim_dec1"><math>
\pypsi(\by,\bpsi;\theta) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi;\theta).
</math></div>
|reference=(2) }}

This assumption means that for any $i=1,2,\ldots,N$, all the components of $\psi_i$ are random and there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$. It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ in order to estimate the F.I.M. This can be done relatively simply in closed form when the individual parameters are normally distributed (or a transformation $h$ of them is).
If some component of $\psi_i$ has no variability, [[#eq:fim_dec1|(2)]] no longer holds, but we can decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that

{{Equation1
|equation=<math>
\pypsi(\by,\bpsi;\theta) = \pcypsi(\by {{!}} \bpsi ; \theta_y)\ppsi(\bpsi;\theta_\psi).
</math> }}

We then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that have no closed form expression can be approximated using central differences.
{{Remarks
|title=Remarks
|text=
1. Using $\gamma_k=1/k$ for $k \geq 1$ means that each term is approximated with an empirical mean obtained from $(\bpsi^{(k)}, k \geq 1)$. For instance,
{{EquationWithRef
|equation=<div id="eq:fim_Delta1"><math>\begin{eqnarray}
\Delta_k
&=& \Delta_{k-1} + \displaystyle{ \frac{1}{k} } \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};\theta))} - \Delta_{k-1} \right)
\end{eqnarray}</math></div>
|reference=(3) }}
{{EquationWithRef
|equation=<div id="eq:fim_Delta2"><math>\begin{eqnarray}
&=& \displaystyle{ \frac{1}{k} }\sum_{j=1}^{k} \Dt{\log (\pmacro(\by,\bpsi^{(j)};\theta))} .
\end{eqnarray}</math></div>
|reference=(4) }}
[[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] rather than [[#eq:fim_Delta2|(4)]] avoids having to store all the simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$.

2. This approach is used in practice for computing the F.I.M. $I(\hat{\theta})$, where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$. The only difference with the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] used for SAEM is that the population parameter $\theta$ is not updated but remains fixed at $\hat{\theta}$.
}}
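The equivalence of the online recursion and the empirical mean noted in remark 1 is easy to see in code; in this sketch a generic sequence `x` stands in for the successive gradient terms (illustrative names only):

```python
import numpy as np

# With gamma_k = 1/k, the online update Delta_k = Delta_{k-1} + (x_k - Delta_{k-1})/k
# reproduces the offline empirical mean of x_1, ..., x_K without storing the draws.
rng = np.random.default_rng(1)
x = rng.normal(size=100)

delta = 0.0
for k, xk in enumerate(x, start=1):
    delta += (xk - delta) / k
```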
{{OutlineText
|text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher information matrix $I(\hat{\theta})$ consists of:

<blockquote>
1. For $i=1,2,\ldots,N$, run a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i {{!}}y_i ;\hat{\theta})$.
</blockquote>
<blockquote>
2. At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\pypsi(\by,\bpsi^{(k)};\hat{\theta})$.
</blockquote>
<blockquote>
3. Update $\Delta_k$, $G_k$, $D_k$ and compute the estimate $H_k$ of the F.I.M.
</blockquote>
}}
{{Example
|title=Example 1
|text=Consider the model

{{Equation1
|equation=<math>\begin{eqnarray}
y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega),
\end{eqnarray}</math> }}

where $\Omega = {\rm diag}(\omega_1^2,\omega_2^2,\ldots,\omega_d^2)$ is a diagonal matrix and $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^{\transpose}$.
The vector of population parameters is $\theta = (\psi_{\rm pop} , \Omega)=(\psi_{ {\rm pop},1},\ldots,\psi_{ {\rm pop},d},\omega_1^2,\ldots,\omega_d^2)$.
Here,

{{Equation1
|equation=<math>
\log (\pyipsii(y_i,\psi_i;\theta)) = \log (\pcyipsii(y_i {{!}} \psi_i)) + \log (\ppsii(\psi_i;\theta)).
</math> }}

Since $\pcyipsii(y_i {{!}} \psi_i)$ does not depend on $\theta$,

{{Equation1
|equation=<math>\begin{eqnarray}
\Dt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \Dt{\log (\ppsii(\psi_i;\theta))} \\
\DDt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \DDt{\log (\ppsii(\psi_i;\theta))} .
\end{eqnarray}</math> }}
The derivatives of $\log (\ppsii(\psi_i;\theta))$ can then be computed in closed form:

{{Equation1
|equation=<math>\begin{eqnarray}
\log (\ppsii(\psi_i;\theta)) &=& -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam}))
-\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2)
-\sum_{\iparam=1}^d \displaystyle{ \frac{1}{2\, \omega_\iparam^2} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\
\partial \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} &=&
\displaystyle{\frac{1}{\omega_\iparam^2} }h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) ) \\
\partial \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} &=&
-\displaystyle{ \frac{1}{2\omega_\iparam^2} }
+\displaystyle{\frac{1}{2\, \omega_\iparam^4} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \psi_{ {\rm pop},\jparam} &=&
\left\{
\begin{array}{ll}
\left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 & {\rm if \quad } \iparam=\jparam \\
0 & {\rm otherwise}
\end{array}
\right.
\\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} \partial \omega^2_{\jparam} &=& \left\{
\begin{array}{ll}
1/(2\omega_\iparam^4) -
( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 & {\rm if \quad} \iparam=\jparam \\
0 & {\rm otherwise}
\end{array}
\right.
\\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \omega^2_{\jparam} &=& \left\{
\begin{array}{ll}
-h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 & {\rm if \quad} \iparam=\jparam \\
0 & {\rm otherwise.}
\end{array}
\right.
\end{eqnarray}</math> }}
}}
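The first-derivative formulas of Example 1 can be checked numerically. Here is a sketch for the scalar case $d=1$ with $h=\log$ (a log-normally distributed parameter); the numerical values of $\psi_i$, $\psi_{\rm pop}$ and $\omega^2$ are arbitrary and the names are illustrative.

```python
import numpy as np

# d = 1, h = log; hp is h'
h, hp = np.log, lambda x: 1.0 / x
psi_i, psi_pop, w2 = 1.7, 2.0, 0.09

def logp(psi_pop, w2):
    # log p(psi_i; theta) as given in the example, for d = 1
    return (-0.5 * np.log(2 * np.pi) + np.log(hp(psi_i)) - 0.5 * np.log(w2)
            - (h(psi_i) - h(psi_pop)) ** 2 / (2 * w2))

# closed-form first derivatives from the example
d_pop = hp(psi_pop) * (h(psi_i) - h(psi_pop)) / w2
d_w2 = -1.0 / (2 * w2) + (h(psi_i) - h(psi_pop)) ** 2 / (2 * w2 ** 2)

# central-difference approximations of the same derivatives
eps = 1e-6
num_pop = (logp(psi_pop + eps, w2) - logp(psi_pop - eps, w2)) / (2 * eps)
num_w2 = (logp(psi_pop, w2 + eps) - logp(psi_pop, w2 - eps)) / (2 * eps)
```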
{{Example
|title=Example 2
|text=We consider the same model for continuous data, assuming a constant error model and that the variance $a^2$ of the residual error has no variability:

{{Equation1
|equation=<math>\begin{eqnarray}
y_{ij} {{!}} \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i) \ , \ a^2), \ \ 1 \leq j \leq n_i \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega).
\end{eqnarray}</math> }}
Here, $\theta_y=a^2$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and

{{Equation1
|equation=<math>
\log(\pyipsii(y_i,\psi_i;\theta)) = \log(\pcyipsii(y_i {{!}} \psi_i ; a^2)) + \log(\ppsii(\psi_i;\psi_{\rm pop},\Omega)),
</math> }}

where

{{Equation1
|equation=<math>
\log(\pcyipsii(y_i {{!}} \psi_i ; a^2))
=-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i))^2 .
</math> }}
Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; a^2))$ with respect to $a^2$ are straightforward to compute. Derivatives of $\log(\ppsii(\psi_i;\psi_{\rm pop},\Omega))$ with respect to $\psi_{\rm pop}$ and $\Omega$ remain unchanged.
}}
{{Example
|title=Example 3
|text=Consider again the same model for continuous data, assuming now that a subset $\xi$ of the parameters of the structural model has no variability:

{{Equation1
|equation=<math>\begin{eqnarray}
y_{ij} {{!}} \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i,\xi) \ , \ a^2), \ \ 1 \leq j \leq n_i \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega).
\end{eqnarray}</math> }}
Let $\psi$ denote the subset of individual parameters with variability. Here, $\theta_y=(\xi,a^2)$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and

{{Equation1
|equation=<math>
\log(\pcyipsii(y_i {{!}} \psi_i ; \xi,a^2))
=-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2 a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i,\xi))^2 .
</math> }}
Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; \xi, a^2))$ with respect to $\xi$ require the derivatives of $f$ with respect to $\xi$. These usually have no closed form expression; one possibility is to approximate them numerically using finite differences.
}}

<br>

== Estimation using linearization of the model ==

Consider here a model for continuous data that uses a $\phi$-parametrization for the individual parameters:
{{Equation1
|equation=<math>\begin{eqnarray}
y_{ij} &= & f(t_{ij} , \phi_i) + g(t_{ij} , \phi_i)\teps_{ij} \\
\phi_i &=& \phi_{\rm pop} + \eta_i .
\end{eqnarray}</math> }}
Let $\hphi_i$ be some predicted value of $\phi_i$, such as the estimated mean or the estimated mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$.
We can then linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around this vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$. Then,
{{Equation1
|equation=<math>\begin{eqnarray}
y_{ij} &\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_i - \hphi_i) + g(t_{ij} , \hphi_i)\teps_{ij} \\
&\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i)
+ \Dphi{f(t_{ij} , \hphi_i)} \, \eta_i + g(t_{ij} , \hphi_i)\teps_{ij} .
\end{eqnarray}</math> }}
Then, we can approximate the marginal distribution of the vector $y_i$ by a normal distribution:

{{EquationWithRef
|equation=<div id="eq:fim_approx"><math>
y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,
\Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{i} , \hphi_i)^{\transpose} \right),
</math></div>
|reference=(5) }}

where $\Sigma_{n_i}$ is the variance-covariance matrix of $(\teps_{i,1},\ldots,\teps_{i,n_i})$. If the $\teps_{ij}$ are i.i.d., then $\Sigma_{n_i}$ is the identity matrix.
We can equivalently use the original $\psi$-parametrization and the fact that $\phi_i=h(\psi_i)$. Then,

{{Equation1
|equation=<math>
\Dphi{f(t_{i} , \hphi_i)} = \Dpsi{f(t_{i} , \hpsi_i)} J_h(\hpsi_i)^{\transpose} ,
</math> }}

where $J_h$ is the Jacobian of $h$.
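A minimal sketch of how the normal approximation (5) can be evaluated for one individual, assuming a constant error model $g=a$ and i.i.d. residuals ($\Sigma_{n_i}$ the identity). The gradient of $f$ with respect to $\phi$ is obtained by central finite differences; all names are illustrative, not from any particular library.

```python
import numpy as np

def linearized_loglik_i(y, t, f, phi_hat, phi_pop, Omega, a, eps=1e-6):
    # Normal approximation (5): mean and covariance of y_i after linearizing
    # the structural model f around phi_hat.
    F = f(t, phi_hat)
    J = np.column_stack([(f(t, phi_hat + eps * e) - f(t, phi_hat - eps * e)) / (2 * eps)
                         for e in np.eye(len(phi_hat))])   # d f / d phi at phi_hat
    mean = F + J @ (phi_pop - phi_hat)
    cov = J @ Omega @ J.T + a ** 2 * np.eye(len(y))
    r = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))

# For a structural model linear in phi the linearization is exact, so the
# value should not depend on the expansion point phi_hat:
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.2, 1.9, 3.1])
f = lambda t, phi: phi[0] + phi[1] * t
Omega = np.diag([0.1, 0.2])
phi_pop = np.array([0.0, 1.0])
ll1 = linearized_loglik_i(y, t, f, np.zeros(2), phi_pop, Omega, 0.5)
ll2 = linearized_loglik_i(y, t, f, np.array([1.0, 2.0]), phi_pop, Omega, 0.5)
```

Summing such terms over $i=1,\ldots,N$ gives the approximate observed log-likelihood used below.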
We can then approximate the observed log-likelihood ${\llike}(\theta) = \log(\like(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation, and derive the F.I.M. by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.

Except for very simple models, computing these second-order partial derivatives in closed form is not straightforward. In such cases, finite differences can be used to approximate them numerically. We can use for instance a central difference approximation of the second derivatives of $\llike(\theta)$. To this end, let $\nu>0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that
{{Equation1
|equation=<math>
\nu^{(j)}_{k} = \left\{
\begin{array}{ll}
\nu & {\rm if \quad j= k} \\
0 & {\rm otherwise.}
\end{array}
\right.
</math> }}
Then, for $\nu$ small enough,

{{Equation1
|equation=<math>\begin{eqnarray}
\partial_{\theta_j}{ {\llike}(\theta)} &\approx& \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)})- {\llike}(\theta-\nu^{(j)})}{2\nu} }
\end{eqnarray}</math> }}

{{EquationWithRef
|equation=<div id="eq:fim_diff"><math>\begin{eqnarray}
\partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &\approx& \displaystyle{\frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})- {\llike}(\theta+\nu^{(j)}-\nu^{(k)})
-{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } .
\end{eqnarray}</math></div>
|reference=(6) }}
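The scheme (6) transcribes directly into code. The sketch below (illustrative names; NumPy assumed available) checks it on a quadratic function, whose matrix of second derivatives is known exactly.

```python
import numpy as np

def hessian_central(ll, theta, nu=1e-3):
    # Central-difference approximation (6) of the matrix of second-order
    # partial derivatives of ll at theta; E[j] plays the role of nu^(j).
    m = len(theta)
    H = np.zeros((m, m))
    E = nu * np.eye(m)
    for j in range(m):
        for k in range(m):
            H[j, k] = (ll(theta + E[j] + E[k]) - ll(theta + E[j] - E[k])
                       - ll(theta - E[j] + E[k]) + ll(theta - E[j] - E[k])) / (4 * nu ** 2)
    return H

# sanity check on a quadratic 0.5 * theta' A theta, whose Hessian is A itself
A = np.array([[2.0, 0.5], [0.5, 1.0]])
H = hessian_central(lambda th: 0.5 * th @ A @ th, np.array([0.3, -0.2]))
```

In practice `ll` would be the approximate log-likelihood $\llike$ obtained from the normal approximation above, evaluated at $\hat{\theta}$.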
<br><br>
------
<br><br>
{{OutlineText
|text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher information matrix $I(\hat{\theta})$ using a linear approximation of the model consists of:

<blockquote>
1. For $i=1,2,\ldots,N$, obtain some estimate $\hpsi_i$ of the individual parameters $\psi_i$, for example by averaging the final terms of the sequence $(\psi_i^{(k)})$ drawn during the last iterations of the [[The SAEM algorithm for estimating population parameters|SAEM algorithm]].
</blockquote>
<blockquote>
2. For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and ${\llike}(\theta)$ using this normal approximation.
</blockquote>
<blockquote>
3. Use [[#eq:fim_diff|(6)]] to approximate the matrix of second-order derivatives of ${\llike}(\theta)$.
</blockquote>
}}
{{Back&Next
|linkNext=Estimation of the log-likelihood
|linkBack=The Metropolis-Hastings algorithm for simulating the individual parameters }}
Latest revision as of 12:09, 28 August 2013