|
|
(22 intermediate revisions by 2 users not shown) |
Line 1: |
Line 1: |
− | <div style="font-size:12pt; font-family:garamond">
| + | $ \def\hphi{\tilde{\phi}} $ |
− | | + | ==Estimation using stochastic approximation== |
− | ==Estimation of the observed F.I.M. using a stochastic approximation== | |
| | | |
| The ''observed'' Fisher information matrix (F.I.M.) is a function of $\theta$ defined as | | The ''observed'' Fisher information matrix (F.I.M.) is a function of $\theta$ defined as |
Line 15: |
Line 14: |
| | | |
| {{Equation1 | | {{Equation1 |
− | |equation=<math>\DDt{\log (\pmacro(\by,\bpsi;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}, | + | |equation=<math>\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}, |
| </math> }} | | </math> }} |
| | | |
Line 22: |
Line 21: |
| {{Equation1 | | {{Equation1 |
| |equation=<math>\begin{eqnarray} | | |equation=<math>\begin{eqnarray} |
− | \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &=& | + | \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &=& |
− | \esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\ | + | \esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\ |
| && - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose} . | | && - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose} . |
| \end{eqnarray}</math> }} | | \end{eqnarray}</math> }} |
| | | |
− | Thus, $\DDt{\log (\pmacro(\by,\bpsi;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm. | + | Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm. |
| | | |
− | We can then draw a sequence $(\psi_i^{(k)})$ using a Metropolis-Hasting algorithm and estimate the observed F.I.M. online. | + | We can then draw a sequence $(\psi_i^{(k)})$ using a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hasting algorithm]] and estimate the observed F.I.M. online. At iteration $k$ of the algorithm: |
− | At iteration $k$ of the algorithm: | |
| | | |
| | | |
− | * '''Simulation step''': for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters| The Metropolis-Hastings algorithm section]] with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution. | + | * '''Simulation step''': for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters| The Metropolis-Hastings algorithm]] section with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution. |
| | | |
| | | |
| * '''Stochastic approximation''': update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations: | | * '''Stochastic approximation''': update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations: |
| + | |
| | | |
| {{Equation1 | | {{Equation1 |
Line 97: |
Line 96: |
| |reference=(4) }} | | |reference=(4) }} |
| | | |
− | : [[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm.
| + | [[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] instead of [[#eq:fim_Delta2|(4)]] avoids having to store all simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$. |
− | Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] instead of [[#eq:fim_Delta2|(4)]] avoids having to store all simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$. | |
| | | |
| | | |
Line 108: |
Line 106: |
| |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta)}$ consists of: | | |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta)}$ consists of: |
| | | |
− | | + | <blockquote> |
| 1. For $i=1,2,\ldots,N$, run a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i {{!}}y_i ;\hat{\theta})$. | | 1. For $i=1,2,\ldots,N$, run a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i {{!}}y_i ;\hat{\theta})$. |
− | | + | </blockquote> |
− | | + | <blockquote> |
| 2. At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\pypsi(\by,\bpsi^{(k)};\hat{\theta})$. | | 2. At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\pypsi(\by,\bpsi^{(k)};\hat{\theta})$. |
− | | + | </blockquote> |
− | | + | <blockquote> |
| 3.Update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M. | | 3.Update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M. |
| + | </blockquote> |
| }} | | }} |
| | | |
Line 150: |
Line 149: |
| | | |
| {{Equation1 | | {{Equation1 |
− | |rquation=<math>\begin{eqnarray} | + | |equation=<math>\begin{eqnarray} |
| \log (\ppsii(\psi_i;\theta)) &=& -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam})) | | \log (\ppsii(\psi_i;\theta)) &=& -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam})) |
| -\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2) | | -\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2) |
Line 163: |
Line 162: |
| \begin{array}{ll} | | \begin{array}{ll} |
| <!-- % \frac{1}{\omega_\iparam^2} --> | | <!-- % \frac{1}{\omega_\iparam^2} --> |
− | \left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 & {\rm if \quand \iparam=\jparam} \\ | + | \left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 & {\rm if \quad } \iparam=\jparam \\ |
| 0 & {\rm otherwise} | | 0 & {\rm otherwise} |
| \end{array} | | \end{array} |
Line 172: |
Line 171: |
| <!-- % \frac{1}{2\omega_\iparam^4} - \frac{1}{\omega_\iparam^6} --> | | <!-- % \frac{1}{2\omega_\iparam^4} - \frac{1}{\omega_\iparam^6} --> |
| 1/(2\omega_\iparam^4) - | | 1/(2\omega_\iparam^4) - |
− | ( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 & {\rm if \quad \iparam=\jparam} \\ | + | ( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 & {\rm if \quad} \iparam=\jparam \\ |
− | 0 & \rm otherwise} | + | 0 & {\rm otherwise} |
| \end{array} | | \end{array} |
| \right. | | \right. |
Line 180: |
Line 179: |
| \begin{array}{ll} | | \begin{array}{ll} |
| <!-- % -\frac{1}{\omega_\iparam^4} --> | | <!-- % -\frac{1}{\omega_\iparam^4} --> |
− | -h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 \\ | + | -h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 & {\rm if \quad} \iparam=\jparam \\ |
| 0 & {\rm otherwise.} | | 0 & {\rm otherwise.} |
| \end{array} | | \end{array} |
Line 243: |
Line 242: |
| | | |
| <br> | | <br> |
| + | |
| == Estimation using linearization of the model == | | == Estimation using linearization of the model == |
| | | |
Line 266: |
Line 266: |
| Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution: | | Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution: |
| | | |
− | {{EquationWIthRef | + | {{EquationWithRef |
| |equation=<div id="eq:fim_approx"><math> | | |equation=<div id="eq:fim_approx"><math> |
| y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) , | | y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) , |
Line 311: |
Line 311: |
| |reference=(6) }} | | |reference=(6) }} |
| | | |
| + | <br><br> |
| + | ------ |
| + | <br><br> |
| | | |
| {{OutlineText | | {{OutlineText |
| |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta)}$ using a linear approximation of the model consists of: | | |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta)}$ using a linear approximation of the model consists of: |
| | | |
− |
| |
− | 1. For $i=1,2,\ldots,N$, obtain some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (we can average for example the final terms of the sequence $(\psi_i^{(k)})$ drawn during the final iterations of the SAEM algorithm).
| |
− |
| |
− |
| |
− | 2. For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and ${\llike}(\theta)$ using this normal approximation.
| |
− |
| |
− |
| |
− | 3. Use [[#eq:fim_diff|(6)]] to approximate the matrix of second-order derivatives of $\llike(\theta)$.
| |
− | }}
| |
− |
| |
− | </div>
| |
− |
| |
− |
| |
− |
| |
− |
| |
− |
| |
− | <!-- OLD VERSION -->
| |
− |
| |
− | The ''observed'' Fisher information matrix (F.I.M.) is a function of $\theta$ defined as
| |
− |
| |
− | {{EquationWithRef
| |
− | |equation=<div it="eq_fim1"><math>\begin{eqnarray}
| |
− | I(\theta) &=& -\DDt{\log (\like(\theta;\by))} \\
| |
− | &=& -\DDt{\log (\py(\by;\theta))}
| |
− | \end{eqnarray}</math></div>
| |
− | |reference=(1) }}
| |
− |
| |
− | Due to the complex expression of the likelihood, $I(\theta)$ has no closed form expression. It is however possible to estimate it using a stochastic approximation procedure, based on <balloon title="Kuhn05: put here the reference!!" style="color:#177245">Louis' formula</balloon>:
| |
− |
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math>\DDt{\log (\pmacro(\by,\bpsi;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta},
| |
− | </math> }}
| |
− |
| |
− | where
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &=&
| |
− | \esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\
| |
− | && - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose}
| |
− | \end{eqnarray}</math> }}
| |
− |
| |
− | $\Dt{\log (\pmacro(\by,\bpsi;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte-Carlo, or equivalently approximated using a stochastic approximation algorithm.
| |
− |
| |
− | We can then draw a sequence $(\psi_i^{(k)})$ using a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hasting algorithm]] and estimate the observed F.I.M on-line.
| |
− | At iteration $k$ of the algorithm:
| |
− |
| |
− |
| |
− | * $\textbf{Simulation-step}$: for $i=1,2,\ldots N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters]] with $\pmacro(\psi_i |y_i ;{\theta})$ as limiting distribution.
| |
− |
| |
− |
| |
− | * $\textbf{Stochastic approximation}$: update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrent relations:
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | \Delta_k & = & \Delta_{k-1} + \gamma_k \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - \Delta_{k-1} \right), \\
| |
− | D_k & = & D_{k-1} + \gamma_k \left(\DDt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - D_{k-1} \right),\\
| |
− | G_k & = & G_{k-1} + \gamma_k \left((\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})^\transpose -G_{k-1} \right),
| |
− | \end{eqnarray}</math> }}
| |
− |
| |
− | : where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.
| |
− |
| |
− |
| |
− | * $\textbf{Estimation-step}$: update the estimate $H_k$ of the F.I.M. according to
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math> H_k = D_k + G_k - \Delta_k \Delta_k^{\transpose}. </math> }}
| |
− |
| |
− |
| |
− | Implementing this algorithm therefore requires to compute the first and second derivatives of
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math> \log (\pmacro(\by,\bpsi;\theta))=\sum_{i=1}^{N} \log (\pmacro(y_i,\psi_i;\theta)).
| |
− | </math> }}
| |
− |
| |
− | Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as
| |
− |
| |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_dec1"><math>
| |
− | \pypsi(\by,\bpsi;\theta) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi;\theta).
| |
− | </math></div>
| |
− | |reference=(2) }}
| |
− |
| |
− | his assumption means that, for any $i=1,2,\ldots N$, all the components of $\psi_i$ are random and that there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$.
| |
− | It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ for estimating the F.I.M. This can be done relatively simply in a closed form when the individual parameters are normally distributed (eventually up to a transformation $h$).
| |
− |
| |
− |
| |
− | If some component of $\psi_i$ has no variability, [[#eq:fim_dec1|(2)]] does not hold anymore but we will decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that
| |
− |
| |
− | {{Equation1
| |
− | |equation=<math>
| |
− | \pyipsii(y_i,\psi_i;\theta) = \pcyipsii(y_i {{!}} \psi_i ; \theta_y)\ppsii(\psi_i;\theta_\psi).
| |
− | </math> }}
| |
− |
| |
− | We will then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that don't have closed form expressions can be obtained by central differences.
| |
− |
| |
− |
| |
− | {{Remarks
| |
− | |title=Remarks
| |
− | |text=
| |
| <blockquote> | | <blockquote> |
− | 1. Using $\gamma_k=1/k$ for $k=1,2,\ldots$ means that is approximated each term with an empirical mean obtained from $(\bpsi^{(k)} , k=1,2,\ldots)$. For instance, | + | 1. For $i=1,2,\ldots,N$, obtain some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (we can average for example the final terms of the sequence $(\psi_i^{(k)})$ drawn during the final iterations of the [[The SAEM algorithm for estimating population parameters| SAEM algorithm]]). |
| </blockquote> | | </blockquote> |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_Delta1"><math>\begin{eqnarray}
| |
− | \Delta_k &=& \Delta_{k-1} + \frac{1}{k} \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};\theta))} - \Delta_{k-1} \right)
| |
− | \end{eqnarray}</math></div>
| |
− | |reference=(3) }}
| |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_Delta2"><math>\begin{eqnarray}
| |
− | &=& \frac{1}{k} \sum_{j=1}^{k} \Dt{\log (\pmacro(\by,\bpsi^{(j)};\theta))}
| |
− | \end{eqnarray}</math></div>
| |
− | |reference=(4) }}
| |
− |
| |
| <blockquote> | | <blockquote> |
− | [[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] instead of [[#eq:fim_Delta2|(4)]] avoids to store all the simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ for computing $\Delta_k$.
| + | 2. For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and ${\llike}(\theta)$ using this normal approximation. |
− | | |
− | | |
− | 2. This approach is used in practice for computing the F.I.M. $I(\hat{\theta})$ where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$. Then, the only difference with the Metropolis-Hastings used for SAEM is that the population parameter $\theta$ is not updated but remains fixed to $\hat{\theta}$. | |
− | </blockquote>
| |
− | }}
| |
− | | |
− | | |
− | {{OutlineText
| |
− | |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta)}$ consists in:
| |
− | | |
− | | |
− | * for $i=1,2,\ldots N$, run a Metropolis-Hastings algorithm to draw a sequence $\psi_i^{(k)}$ with limiting distribution $\pmacro(\psi_i {{!}} y_i ;\hat{\theta})$,
| |
− | * at iteration $k$ of the M-H algorithm, compute the first and second derivatives of $\pypsi(\by,\bpsi^{(k)};\hat{\theta})$
| |
− | * update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M.
| |
− | }}
| |
− | | |
− | | |
− | | |
− | {{Example
| |
− | |title=Example 1
| |
− | |text= Consider the model
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | y_i {{!}} \psi_i &\sim& \pcyipsii(y_i {{!}} \psi_i) \\
| |
− | h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega),
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | where $\Omega = {\rm diag}(\omega_1^2,\omega_2^2,\ldots,\omega_d^2)$ is a diagonal matrix and $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^{\transpose}$. The vector of population parameters is $\theta = (\psi_{\rm pop} , \Omega)=(\psi_{ {\rm pop},1},\ldots,\psi_{ {\rm pop},d},\omega_1^2,\ldots,\omega_d^2)$.
| |
− | | |
− | Here,
| |
− | | |
− | {{Equation1
| |
− | |equation=<math> \log (\pyipsii(y_i,\psi_i;\theta)) = \log (\pcyipsii(y_i {{!}} \psi_i)) + \log (\ppsii(\psi_i;\theta)).
| |
− | </math> }}
| |
− | | |
− | Then,
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | \Dt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \Dt{\log (\ppsii(\psi_i;\theta))} \\
| |
− | \DDt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \DDt{\log (\ppsii(\psi_i;\theta))}
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | More precisely,
| |
− | | |
− | {{Equation1_special
| |
− | |equation=<math>\begin{eqnarray}
| |
− | \log (\ppsii(\psi_i;\theta)) &=& -\frac{d}{2}\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam}))
| |
− | -\frac{1}{2} \sum_{\iparam=1}^d \log(\omega_\iparam^2)
| |
− | -\sum_{\iparam=1}^d \frac{1}{2\, \omega_\iparam^2}( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2, \\
| |
− | \partial \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} &=&
| |
− | \frac{1}{\omega_\iparam^2} h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) ), \\
| |
− | \partial \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} &=&
| |
− | -\frac{1}{2\omega_\iparam^2}
| |
− | + \frac{1}{2\, \omega_\iparam^4}( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2, \\
| |
− | \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \psi_{ {\rm pop},\jparam} &=&
| |
− | \left\{
| |
− | \begin{array}{ll}
| |
− | <!-- % \frac{1}{\omega_\iparam^2} -->
| |
− | \left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2, & {\rm \quad if \quad} \iparam=\jparam \\
| |
− | 0, & {\rm \quad otherwise,}
| |
− | \end{array}
| |
− | \right.
| |
− | \\
| |
− | \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} \partial \omega^2_{\jparam} &=& \left\{
| |
− | \begin{array}{ll}
| |
− | <!-- % \frac{1}{2\omega_\iparam^4} - \frac{1}{\omega_\iparam^6} -->
| |
− | 1/(2\omega_\iparam^4) -
| |
− | ( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6, & {\quad \rm if \quad} \iparam=\jparam \\
| |
− | 0, & {\rm \quad otherwise,}
| |
− | \end{array}
| |
− | \right.
| |
− | \\
| |
− | \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \omega^2_{\jparam} &=& \left\{
| |
− | \begin{array}{ll}
| |
− | <!-- % -\frac{1}{\omega_\iparam^4} -->
| |
− | -h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 \\
| |
− | 0, & {\rm \quad \quad otherwise.}
| |
− | \end{array}
| |
− | \right.
| |
− | \end{eqnarray}</math> }}
| |
− | }}
| |
− | | |
− | | |
− | | |
− | {{Example
| |
− | |title=Example 2
| |
− | |text= We consider the same model for continuous data, assuming a constant error model and assuming that the variance $a^2$ of the residual error has no variability:
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | y_{ij} {{!}} \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i) \ , \ a^2) \ \ ; \ \ 1 \leq j \leq n_i \\
| |
− | h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega),
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | Here, $\theta_y=a^2$, $\theta_\psi=(\psi_{\rm pop},\Omega)$ and
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{equation}
| |
− | \log(\pyipsii(y_i,\psi_i;\theta)) = \log(\pcyipsii(y_i {{!}} \psi_i ; a^2)) + \log(\ppsii(\psi_i;\psi_{\rm pop},\Omega)).
| |
− | \end{equation}</math> }}
| |
− | | |
− | where
| |
− | | |
− | {{Equation1
| |
− | |equation=<math> \log(\pcyipsii(y_i {{!}} \psi_i ; a^2))
| |
− | =-\frac{n_i}{2}\log(2\pi)- \frac{n_i}{2}\log(a^2) - frac{1}{2a^2}\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i))^2
| |
− | </math> }}
| |
− | | |
− | Derivatives of $\log(\pcyipsii(y_i {{!}}\psi_i ; a^2))$ with respect to $a^2$ are straightforward to compute. Derivatives of $\log(\ppsii(\psi_i;\psi_{\rm pop},\Omega))$ with respect to $\psi_{\rm pop}$ and $\Omega$ remain unchanged.
| |
− | }}
| |
− | | |
− | | |
− | | |
− | {{Example
| |
− | |title=Example 3
| |
− | |text= We still consider the same model for continuous data, assuming now that a subset $\xi$ of the parameters of the structural model has no variability:
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | y_{ij} {{!}} \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i,\xi) \ , \ a^2) \ \ ; \ \ 1 \leq j \leq n_i \\
| |
− | h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega),
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | $\psi$ remains the subset of individual parameters with variability. Here, $\theta_y=(\xi,a^2)$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and
| |
− | | |
− | {{Equation1
| |
− | |equation=<math> \log(\pcyipsii(y_i {{!}} \psi_i ; \xi,a^2))
| |
− | =-\frac{n_i}{2}\log(2\pi)- \frac{n_i}{2}\log(a^2) - \frac{1}{2 a^2}\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i,\xi))^2
| |
− | </math> }}
| |
− | | |
− | Derivatives of $\log(\pcyipsii(y_i {{!}}\psi_i ; \xi, a^2))$ with respect to $\xi$ require to compute the derivative of $f$ with respect to $\xi$. These derivatives are usually not available. An alternative is to numerically approximate these derivatives using finite differences.
| |
− | }}
| |
− | | |
− | | |
− | <br>
| |
− | | |
− | ==Estimation of the F.I.M. using a linearization of the model==
| |
− | Consider here a model for continuous data, using the $\phi$-parametrization for the individual parameters:
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | y_{ij} &= & f(t_{ij} , \phi_i) + g(t_{ij} , \phi_i)\teps_{ij} \\
| |
− | \phi_i &=& \phi_{\rm pop} + \eta_i
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | Let $\hphi_i$ be some predicted value of $\phi_i$ ($\hphi_i$ can for instance be the estimated mean or the estimated mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$).
| |
− | | |
− | We can therefore linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around the vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$. Then,
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>\begin{eqnarray}
| |
− | y_{ij} &\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_i - \hphi_i) + g(t_{ij} , \hphi_i)\teps_{ij} \\
| |
− | &\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i)
| |
− | + \Dphi{f(t_{ij} , \hphi_i)} \, \eta_i + g(t_{ij} , \hphi_i)\teps_{ij}
| |
− | \end{eqnarray}</math> }}
| |
− | | |
− | Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution:
| |
− | | |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_approx"><math>
| |
− | y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,
| |
− | \Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{ij} , \hphi_i)^{\transpose} \right).
| |
− | </math></div>
| |
− | |reference=(5) }}
| |
− | | |
− | where $\Sigma_{n_i}$ is the variance-covariance matrix of $\teps_{i,1},\ldots,\teps_{i,n_i})$. If the $\teps_{ij}$'s are ''i.i.d'', then $\Sigma_{n_i}$ is the identity matrix.
| |
− | | |
− | | |
− | We can equivalently use the original $\psi$-parametrization using the fact that $\phi_i=h(\psi_i)$. Then,
| |
− | | |
− | {{Equation1
| |
− | |equation=<math> \Dphi{f(t_{i} , \hphi_i)} = \Dpsi{f(t_{i} , \hpsi_i)} J_h(\hpsi_i)^{\transpose}
| |
− | </math> }}
| |
− | | |
− | where $J_h$ is the Jacobian of $h$.
| |
− | | |
− | | |
− | We then can approximate the observed log-likelihood ${\llike}(\theta) = \log({\like}(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation. We can also derive the F.I.M by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.
| |
− | | |
− | | |
− | Except for very simple models, computing these second-order order derivatives in a closed form is not straightforward. Then, finite differences can be used for approximating numerically these quantities. We can use for instance a central difference approximation of the second derivative of ${\llike}(\theta)$:
| |
− | | |
− | Let $\nu>0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that
| |
− | | |
− | {{Equation1
| |
− | |equation=<math>
| |
− | \nu^{(j)}_{k} = \left\{
| |
− | \begin{array}{ll}
| |
− | \nu, & {\rm if \quad} j= k; \\
| |
− | 0, & {\rm otherwise.}
| |
− | \end{array}
| |
− | \right.
| |
− | </math> }}
| |
− | | |
− | Then, for $\nu$ small enough,
| |
− | | |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_diff_a"><math>\begin{eqnarray}
| |
− | \partial_{\theta_j}{ {\llike}(\theta)} &\approx& \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)})- {\llike}(\theta-\nu^{(j)})}{2\nu} } \ ,
| |
− | \end{eqnarray}</math></div>
| |
− | |reference=(6) }}
| |
− | {{EquationWithRef
| |
− | |equation=<div id="eq:fim_diff_b"><math>\begin{eqnarray}
| |
− | \partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &\approx& \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})-{\llike}(\theta+\nu^{(j)}-\nu^{(k)})
| |
− | -{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } .
| |
− | \end{eqnarray}</math></div>
| |
− | |reference=(7) }}
| |
− | | |
− | | |
− | {{OutlineText
| |
− | |text=
| |
− | In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta)}$ using a linear approximation of the model consists in:
| |
− | | |
− | <blockquote>
| |
− | 1. for $i=1,2,\ldots N$, get some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (we can average for example the last terms of the sequence $(\psi_i^{(k)})$ drawn during the last iterations of the SAEM algorithm),
| |
− | </blockquote>
| |
− | <blockquote>
| |
− | 2. for $i=1,2,\ldots N$, compute $\hphi_i=h(\hpsi_i)$, compute the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and compute ${\llike}(\theta)$ using this normal approximation,
| |
| </blockquote> | | </blockquote> |
| <blockquote> | | <blockquote> |
− | 3. use [[#eq:fim_diff_a|(6)]] to approximate the matrix of second-order derivatives of ${\llike}(\theta)$. | + | 3. Use [[#eq:fim_diff|(6)]] to approximate the matrix of second-order derivatives of ${\llike}(\theta)$. |
| </blockquote> | | </blockquote> |
| }} | | }} |
| | | |
| {{Back&Next | | {{Back&Next |
− | |linkBack=The Metropolis-Hastings algorithm for simulating the individual parameters | + | |linkNext=Estimation of the log-likelihood |
− | |linkNext=Estimation of the log-likelihood via importance sampling }}
| + | |linkBack=The Metropolis-Hastings algorithm for simulating the individual parameters }} |
− | | |
− | </div>
| |
$ \def\hphi{\tilde{\phi}} $
Estimation using stochastic approximation
The observed Fisher information matrix (F.I.M.) is a function of $\theta$ defined as
\(\begin{eqnarray}
I(\theta) &=& -\DDt{\log ({\like}(\theta;\by))} \\
&=& -\DDt{\log (\py(\by;\theta))} .
\end{eqnarray}\)
|
(1)
|
Due to the likelihood being quite complex, $I(\theta)$ usually has no closed form expression. It is however possible to estimate it using a stochastic approximation procedure based on Louis' formula:
\(\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} | \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta},
\)
where
\(\begin{eqnarray}
\cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta} &=&
\esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} | \by ; \theta} \\
&& - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} | \by ; \theta}^{\transpose} .
\end{eqnarray}\)
Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is defined as a combination of conditional expectations. Each of these conditional expectations can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm.
We can then draw a sequence $(\psi_i^{(k)})$ using a Metropolis-Hasting algorithm and estimate the observed F.I.M. online. At iteration $k$ of the algorithm:
- Simulation step: for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in The Metropolis-Hastings algorithm section with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution.
- Stochastic approximation: update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations:
\(\begin{eqnarray}
\Delta_k & = & \Delta_{k-1} + \gamma_k \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - \Delta_{k-1} \right) \\
D_k & = & D_{k-1} + \gamma_k \left(\DDt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - D_{k-1} \right)\\
G_k & = & G_{k-1} + \gamma_k \left((\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})^\transpose -G_{k-1} \right),
\end{eqnarray}\)
- where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $ \sum_{k=1}^{\infty} \gamma_k = \infty$, and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$.
- Estimation step: update the estimate $H_k$ of the F.I.M. according to
\(H_k = D_k + G_k - \Delta_k \Delta_k^{\transpose}. \)
Implementing this algorithm therefore requires computation of the first and second derivatives of
\(\log (\pmacro(\by,\bpsi;\theta))=\sum_{i=1}^{N} \log (\pmacro(y_i,\psi_i;\theta)).\)
Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as
\(
\pypsi(\by,\bpsi;\theta) = \pcypsi(\by | \bpsi)\ppsi(\bpsi;\theta).
\)
|
(2)
|
This assumption means that for any $i=1,2,\ldots,N$, all of the components of $\psi_i$ are random and there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$. It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ in order to estimate the F.I.M. This can be done relatively simply in closed form when the individual parameters are normally distributed (or a transformation $h$ of them is).
If some component of $\psi_i$ has no variability, (2) no longer holds, but we can decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that
\(
\pyipsii(y_i,\psi_i;\theta) = \pcyipsii(y_i | \psi_i ; \theta_y)\ppsii(\psi_i;\theta_\psi).
\)
We then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that do not have a closed form expression can be obtained using central differences.
Remarks
1. Using $\gamma_k=1/k$ for $k \geq 1$ means that each term is approximated with an empirical mean obtained from $(\bpsi^{(k)}, k \geq 1)$. For instance,
\(\begin{eqnarray}
\Delta_k
&=& \Delta_{k-1} + \displaystyle{ \frac{1}{k} } \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};\theta))} - \Delta_{k-1} \right)
\end{eqnarray}\)
|
(3)
|
\(\begin{eqnarray}
&=& \displaystyle{ \frac{1}{k} }\sum_{j=1}^{k} \Dt{\log (\pmacro(\by,\bpsi^{(j)};\theta))} .
\end{eqnarray}\)
|
(4)
|
(3) (resp. (4)) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in (3) instead of (4) avoids having to store all simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$.
2. This approach is used for computing the F.I.M. $I(\hat{\theta})$ in practice, where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$. The only difference with the
Metropolis-Hastings used for SAEM is that the population parameter $\theta$ is not updated and remains fixed at $\hat{\theta}$.
In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta)}$ consists of:
1. For $i=1,2,\ldots,N$, run a Metropolis-Hastings algorithm to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i |y_i ;\hat{\theta})$.
2. At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\pypsi(\by,\bpsi^{(k)};\hat{\theta})$.
3.Update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M.
Example 1
Consider the model
\(\begin{eqnarray}
y_i | \psi_i &\sim& \pcyipsii(y_i | \psi_i) \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega),
\end{eqnarray}\)
where $\Omega = {\rm diag}(\omega_1^2,\omega_2^2,\ldots,\omega_d^2)$ is a diagonal matrix and $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^{\transpose}$.
The vector of population parameters is $\theta = (\psi_{\rm pop} , \Omega)=(\psi_{ {\rm pop},1},\ldots,\psi_{ {\rm pop},d},\omega_1^2,\ldots,\omega_d^2)$.
Here,
\(
\log (\pyipsii(y_i,\psi_i;\theta)) = \log (\pcyipsii(y_i | \psi_i)) + \log (\ppsii(\psi_i;\theta)).
\)
Then,
\(\begin{eqnarray}
\Dt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \Dt{\log (\ppsii(\psi_i;\theta))} \\
\DDt{\log (\pyipsii(y_i,\psi_i;\theta))} &=& \DDt{\log (\ppsii(\psi_i;\theta))} .
\end{eqnarray}\)
More precisely,
\(\begin{eqnarray}
\log (\ppsii(\psi_i;\theta)) &=& -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam}))
-\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2)
-\sum_{\iparam=1}^d \displaystyle{ \frac{1}{2\, \omega_\iparam^2} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\
\partial \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} &=&
\displaystyle{\frac{1}{\omega_\iparam^2} }h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) ) \\
\partial \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} &=&
-\displaystyle{ \frac{1}{2\omega_\iparam^2} }
+\displaystyle{\frac{1}{2\, \omega_\iparam^4} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \psi_{ {\rm pop},\jparam} &=&
\left\{
\begin{array}{ll}
\left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 & {\rm if \quad } \iparam=\jparam \\
0 & {\rm otherwise}
\end{array}
\right.
\\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} \partial \omega^2_{\jparam} &=& \left\{
\begin{array}{ll}
1/(2\omega_\iparam^4) -
( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 & {\rm if \quad} \iparam=\jparam \\
0 & {\rm otherwise}
\end{array}
\right.
\\
\partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \omega^2_{\jparam} &=& \left\{
\begin{array}{ll}
-h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 & {\rm if \quad} \iparam=\jparam \\
0 & {\rm otherwise.}
\end{array}
\right.
\end{eqnarray}\)
Example 2
We consider the same model for continuous data, assuming a constant error model and that the variance $a^2$ of the residual error has no variability:
\(\begin{eqnarray}
y_{ij} | \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i) \ , \ a^2), \ \ 1 \leq j \leq n_i \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega).
\end{eqnarray}\)
Here, $\theta_y=a^2$, $\theta_\psi=(\psi_{\rm pop},\Omega)$ and
\(
\log(\pyipsii(y_i,\psi_i;\theta)) = \log(\pcyipsii(y_i | \psi_i ; a^2)) + \log(\ppsii(\psi_i;\psi_{\rm pop},\Omega)),
\)
where
\(
\log(\pcyipsii(y_i | \psi_i ; a^2))
=-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i))^2 .
\)
Derivatives of $\log(\pcyipsii(y_i | \psi_i ; a^2))$ with respect to $a^2$ are straightforward to compute. Derivatives of $\log(\ppsii(\psi_i;\psi_{\rm pop},\Omega))$ with respect to $\psi_{\rm pop}$ and $\Omega$ remain unchanged.
Example 3
Consider again the same model for continuous data, assuming now that a subset $\xi$ of the parameters of the structural model has no variability:
\(\begin{eqnarray}
y_{ij} | \psi_i &\sim& {\cal N}(f(t_{ij}, \psi_i,\xi) \ , \ a^2), \ \ 1 \leq j \leq n_i \\
h(\psi_i) &\sim_{i.i.d}& {\cal N}( h(\psi_{\rm pop}) , \Omega).
\end{eqnarray}\)
Let $\psi$ remain as the subset of individual parameters with variability. Here, $\theta_y=(\xi,a^2)$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and
\(
\log(\pcyipsii(y_i | \psi_i ; \xi,a^2))
=-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2 a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i,\xi))^2 .
\)
Derivatives of $\log(\pcyipsii(y_i | \psi_i ; \xi, a^2))$ with respect to $\xi$ require computation of the derivative of $f$ with respect to $\xi$. These derivatives are usually not calculable. One possibility is to numerically approximate them using finite differences.
Estimation using linearization of the model
Consider here a model for continuous data that uses a $\phi$-parametrization for the individual parameters:
\(\begin{eqnarray}
y_{ij} &= & f(t_{ij} , \phi_i) + g(t_{ij} , \phi_i)\teps_{ij} \\
\phi_i &=& \phi_{\rm pop} + \eta_i .
\end{eqnarray}\)
Let $\hphi_i$ be some predicted value of $\phi_i$, such as for instance the estimated mean or estimated mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$.
We can then choose to linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around the vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$. Then,
\(\begin{eqnarray}
y_{ij} &\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_i - \hphi_i) + g(t_{ij} , \hphi_i)\teps_{ij} \\
&\simeq& f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i)
+ \Dphi{f(t_{ij} , \hphi_i)} \, \eta_i + g(t_{ij} , \hphi_i)\teps_{ij} .
\end{eqnarray}\)
Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution:
\(
y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,
\Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{ij} , \hphi_i)^{\transpose} \right),
\)
|
(5)
|
where $\Sigma_{n_i}$ is the variance-covariance matrix of $\teps_{i,1},\ldots,\teps_{i,n_i}$. If the $\teps_{ij}$ are i.i.d., then
$\Sigma_{n_i}$ is the identity matrix.
We can equivalently use the original $\psi$-parametrization and the fact that $\phi_i=h(\psi_i)$. Then,
\(
\Dphi{f(t_{i} , \hphi_i)} = \Dpsi{f(t_{i} , \hpsi_i)} J_h(\hpsi_i)^{\transpose} , \)
where $J_h$ is the Jacobian of $h$.
We then can approximate the observed log-likelihood ${\llike}(\theta) = \log(\like(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation. We can also derive the F.I.M. by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.
Except for very simple models, computing these second-order partial derivatives in closed form is not straightforward. In such cases, finite differences can be used for numerically approximating them. We can use for instance a central difference approximation of the second derivative of $\llike(\theta)$. To this end, let $\nu>0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that
\(
\nu^{(j)}_{k} = \left\{
\begin{array}{ll}
\nu & {\rm if \quad j= k} \\
0 & {\rm otherwise.}
\end{array}
\right.
\)
Then, for $\nu$ small enough,
\(\begin{eqnarray}
\partial_{\theta_j}{ {\llike}(\theta)} &\approx& \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)})- {\llike}(\theta-\nu^{(j)})}{2\nu} } \\
\end{eqnarray}\)
\(\begin{eqnarray}
\partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &\approx& \displaystyle{\frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})- {\llike}(\theta+\nu^{(j)}-\nu^{(k)})
-{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } .
\end{eqnarray}\)
|
(6)
|
In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta)}$ using a linear approximation of the model consists of:
1. For $i=1,2,\ldots,N$, obtain some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (we can average for example the final terms of the sequence $(\psi_i^{(k)})$ drawn during the final iterations of the SAEM algorithm).
2. For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in (5), and ${\llike}(\theta)$ using this normal approximation.
3. Use (6) to approximate the matrix of second-order derivatives of ${\llike}(\theta)$.