Difference between revisions of "Estimation of the observed Fisher information matrix"
Latest revision as of 12:09, 28 August 2013
$ \def\hphi{\tilde{\phi}} $
Estimation using stochastic approximation
The observed Fisher information matrix (F.I.M.) is a function of $\theta$ defined as
\(\begin{eqnarray}
I(\theta) &=& -\DDt{\log ({\like}(\theta;\by))} \\
&=& -\DDt{\log (\py(\by;\theta))} .
\end{eqnarray}\)

(1)
Because the likelihood is quite complex, $I(\theta)$ usually has no closed-form expression. It is however possible to estimate it using a stochastic approximation procedure based on Louis' formula:

\(
\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} | \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta},
\)

where

\(\begin{eqnarray}
\cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta} &=&
\esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} | \by ; \theta} \\
&& - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta}^{\transpose} .
\end{eqnarray}\)
Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm.
We can then draw a sequence $(\psi_i^{(k)})$ using a Metropolis-Hastings algorithm and estimate the observed F.I.M. online. At iteration $k$ of the algorithm:
- Simulation step: for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in The Metropolis-Hastings algorithm section with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution.
- Stochastic approximation: update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations:
- where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$, and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.
- Estimation step: update the estimate $H_k$ of the F.I.M. according to
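The procedure above can be sketched on a toy model. Everything in this example is illustrative, not from the text: we take scalar individual parameters $\psi_i \sim {\cal N}(\theta, \omega^2)$ and observations $y_{ij} \sim {\cal N}(\psi_i, \sigma^2)$ with $\omega^2$ and $\sigma^2$ known, so the observed F.I.M. for $\theta$ is known in closed form, $N/(\omega^2 + \sigma^2/n)$, and can be used as a check. Because this toy model is conjugate, the conditional distribution of $\psi_i$ given $y_i$ is Gaussian and we draw from the limit distribution exactly, standing in for the Metropolis-Hastings iterations of the simulation step.

```python
import numpy as np

# Toy model (illustrative): psi_i ~ N(theta, omega^2), y_ij ~ N(psi_i, sigma^2),
# omega^2 and sigma^2 known. Observed FIM for theta is N / (omega^2 + sigma^2/n).
rng = np.random.default_rng(0)
N, n = 50, 5
theta_true, omega2, sigma2 = 1.0, 1.0, 1.0
psi = rng.normal(theta_true, np.sqrt(omega2), N)
y = rng.normal(psi[:, None], np.sqrt(sigma2), (N, n))

theta = y.mean()            # point estimate at which the FIM is evaluated
ybar = y.mean(axis=1)

# Conditional psi_i | y_i is Gaussian here (conjugate model), so we sample it
# exactly; in general these draws would come from Metropolis-Hastings.
v = 1.0 / (n / sigma2 + 1.0 / omega2)
m = v * (n * ybar / sigma2 + theta / omega2)

Delta, G, D = 0.0, 0.0, 0.0
K = 3000
for k in range(1, K + 1):
    # Simulation step: draw the individual parameters
    psi_k = m + np.sqrt(v) * rng.standard_normal(N)
    grad = np.sum(psi_k - theta) / omega2     # d/dtheta log p(y, psi; theta)
    hess = -N / omega2                        # d2/dtheta2 log p(y, psi; theta)
    # Stochastic approximation step, gamma_k = 1/k
    gamma = 1.0 / k
    D = D + gamma * (hess - D)                # approximates E[d2 | y]
    G = G + gamma * (grad ** 2 - G)           # approximates E[d d^T | y]
    Delta = Delta + gamma * (grad - Delta)    # approximates E[d | y]
    # Estimation step: Louis' formula, scalar case
    H_k = D + G - Delta ** 2

fim_est = -H_k
fim_true = N / (omega2 + sigma2 / n)
print(fim_est, fim_true)
```

With $\gamma_k = 1/k$ the recurrences reduce to plain Monte Carlo averages over the draws, and the estimate settles close to the closed-form value.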
Implementing this algorithm therefore requires computation of the first and second derivatives of $\log (\pmacro(\by,\bpsi;\theta))$.
Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as
\(
\pypsi(\by,\bpsi;\theta) = \pcypsi(\by | \bpsi)\ppsi(\bpsi;\theta).
\)

(2)
This assumption means that for any $i=1,2,\ldots,N$, all of the components of $\psi_i$ are random and there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$. It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ in order to estimate the F.I.M. This can be done relatively simply in closed form when the individual parameters are normally distributed (or a transformation $h$ of them is).
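As an illustration (scalar case, not spelled out in the text), suppose $\psi_i \sim {\cal N}(\theta, \omega^2)$. The required derivatives of $\log (\ppsii(\psi_i;\theta))$ are then immediate:

\(\begin{eqnarray}
\log (\ppsii(\psi_i;\theta)) &=& -\frac{1}{2}\log(2\pi\omega^2) - \frac{(\psi_i-\theta)^2}{2\omega^2} \\
\partial_\theta \log (\ppsii(\psi_i;\theta)) &=& \frac{\psi_i-\theta}{\omega^2} , \quad
\partial^2_\theta \log (\ppsii(\psi_i;\theta)) = -\frac{1}{\omega^2} \\
\partial_{\omega^2} \log (\ppsii(\psi_i;\theta)) &=& -\frac{1}{2\omega^2} + \frac{(\psi_i-\theta)^2}{2\omega^4} .
\end{eqnarray}\)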
If some component of $\psi_i$ has no variability, (2) no longer holds, but we can decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that

\(
\pypsi(\by,\bpsi;\theta) = \pcypsi(\by | \bpsi ; \theta_y)\ppsi(\bpsi;\theta_\psi).
\)
We then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that do not have a closed form expression can be obtained using central differences.
Estimation using linearization of the model
Consider here a model for continuous data that uses a $\phi$-parametrization for the individual parameters:
Let $\hphi_i$ be some predicted value of $\phi_i$, such as for instance the estimated mean or estimated mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$.
We can then choose to linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around the vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$. Then,
Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution:
\(
y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,
\Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{i} , \hphi_i)^{\transpose} \right),
\)

(5)
where $\Sigma_{n_i}$ is the variance-covariance matrix of $\teps_{i,1},\ldots,\teps_{i,n_i}$. If the $\teps_{ij}$ are i.i.d., then $\Sigma_{n_i}$ is the identity matrix.
We can equivalently use the original $\psi$-parametrization and the fact that $\phi_i=h(\psi_i)$. Then,
where $J_h$ is the Jacobian of $h$.
We then can approximate the observed log-likelihood ${\llike}(\theta) = \log(\like(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation. We can also derive the F.I.M. by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.
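This normal approximation is straightforward to evaluate numerically. The sketch below builds the mean and covariance of (5) for one individual and computes the corresponding Gaussian log-likelihood term. The structural model $f$ (an exponential decay), the parameter values, and the observations are all hypothetical stand-ins, and the residual error model is taken constant with i.i.d. errors, so $\Sigma_{n_i}$ is the identity matrix.

```python
import numpy as np

# Hypothetical structural model: f(t, phi) = phi1 * exp(-phi2 * t),
# constant residual error g = a with iid errors (Sigma_{n_i} = I).
def f(t, phi):
    return phi[0] * np.exp(-phi[1] * t)

def dphi_f(t, phi):
    # rows: observation times; columns: derivatives w.r.t. (phi1, phi2)
    e = np.exp(-phi[1] * t)
    return np.column_stack([e, -phi[0] * t * e])

t_i = np.array([0.5, 1.0, 2.0, 4.0])      # observation times (illustrative)
phi_pop = np.array([10.0, 0.8])           # population parameters
phi_hat = np.array([9.5, 0.9])            # predicted individual parameters
Omega = np.diag([1.0, 0.04])              # random-effect covariance
a = 0.5                                   # residual standard deviation

# Mean and covariance of the normal approximation (5)
J = dphi_f(t_i, phi_hat)
mean = f(t_i, phi_hat) + J @ (phi_pop - phi_hat)
cov = J @ Omega @ J.T + a ** 2 * np.eye(len(t_i))

def normal_loglik(y, mean, cov):
    # log-density of a multivariate normal, via slogdet / solve
    r = y - mean
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet
                   + r @ np.linalg.solve(cov, r))

y_i = np.array([6.2, 4.1, 1.9, 0.4])      # made-up observations
print(normal_loglik(y_i, mean, cov))
```

Summing such terms over the $N$ individuals gives the approximate observed log-likelihood ${\llike}(\theta)$ used in the rest of this section.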
Except for very simple models, computing these second-order partial derivatives in closed form is not straightforward. In such cases, finite differences can be used to approximate them numerically. We can use for instance a central difference approximation of the second derivative of $\llike(\theta)$. To this end, let $\nu>0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that $\nu^{(j)}_{k} = \nu$ if $k=j$ and $\nu^{(j)}_{k} = 0$ otherwise.
Then, for $\nu$ small enough,
\(\begin{eqnarray}
\partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &\approx& \displaystyle{\frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})- {\llike}(\theta+\nu^{(j)}-\nu^{(k)})
-{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } .
\end{eqnarray}\)

(6)
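Formula (6) translates directly into code. The sketch below applies the four-point central difference to a quadratic test function, for which the formula is exact up to floating-point error; `loglik` here is an arbitrary stand-in for ${\llike}(\theta)$, not a model from the text.

```python
import numpy as np

# Central-difference approximation (6) of the Hessian of a log-likelihood.
def hessian_central(loglik, theta, nu=1e-3):
    m = len(theta)
    H = np.empty((m, m))
    for j in range(m):
        for k in range(m):
            ej = np.zeros(m); ej[j] = nu   # the vector nu^(j)
            ek = np.zeros(m); ek[k] = nu   # the vector nu^(k)
            H[j, k] = (loglik(theta + ej + ek) - loglik(theta + ej - ek)
                       - loglik(theta - ej + ek)
                       + loglik(theta - ej - ek)) / (4 * nu ** 2)
    return H

# Quadratic stand-in for L(theta); its Hessian is -A exactly.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
quad = lambda th: -0.5 * th @ A @ th
H = hessian_central(quad, np.array([0.3, -0.7]))
print(H)
```

In practice $\nu$ trades truncation error (too large) against round-off error (too small); values around the square root of machine precision times the parameter scale are a common rule of thumb.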