4. Derivations#
4.1. GPFA model#
Equation (4.1) represents the GPFA model:
where \(x_{kr}(\cdot)\) is the latent process \(k\) in trial \(r\), \(h_{nr}(\cdot)\) is the embedding process for neuron \(n\) and trial \(r\), and \(y_{nr}\) is the activity of neuron \(n\) in trial \(r\).
Notes:
the first equation shows that the latent processes are independent,
the second equation shows that the latent processes share mean and covariance functions across trials. That is, for any \(k\), the mean and covariance functions of the latent processes of different trials, \(x_{kr}(\cdot), r=1,\ldots,R\), are the same (\(\mu_k(\cdot)\) and \(\kappa_k(\cdot,\cdot)\)),
the fourth equation shows that, given the embedding processes, the responses of different neurons are independent.
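For concreteness, the following is a hedged sketch of the model equations these notes refer to, written with the linear embedding (loading coefficients \(c_{nk}\) and offsets \(d_n\)) of Duncker and Sahani [DS18]:

\[\begin{split}x_{kr}(\cdot)&\perp x_{k'r'}(\cdot)\quad\text{for }(k,r)\neq(k',r'),\\ x_{kr}(\cdot)&\sim\mathcal{GP}\left(\mu_k(\cdot),\kappa_k(\cdot,\cdot)\right),\\ h_{nr}(t)&=\sum_{k=1}^K c_{nk}\,x_{kr}(t)+d_n,\\ p\left(\{y_{nr}\}_{n=1}^N|\{h_{nr}(\cdot)\}_{n=1}^N\right)&=\prod_{n=1}^N p\left(y_{nr}|h_{nr}(\cdot)\right).\end{split}\]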
4.2. GPFA with inducing points model#
To use the sparse variational framework for Gaussian processes, Duncker and Sahani [DS18] augmented the GPFA model by introducing inducing points \(\mathbf{u}_{kr}\) for each latent process \(k\) and trial \(r\). The inducing points \(\mathbf{u}_{kr}\) represent evaluations of the latent process \(x_{kr}(\cdot)\) at locations \(\mathbf{z}_{kr}=\left[z_{kr}[0],\ldots,z_{kr}[M_{kr}-1]\right]\). A joint prior over the latent process \(x_{kr}(\cdot)\) and its inducing points \(\mathbf{u}_{kr}\) is given in Eq. (4.2).
where \(K_{zz}^{(kr)}[i,j]=\kappa_k(z_{kr}[i],z_{kr}[j])\).
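Written in the standard sparse-GP factorized form, a hedged sketch of this joint prior is

\[p\left(x_{kr}(\cdot),\mathbf{u}_{kr}\right)=p\left(x_{kr}(\cdot)|\mathbf{u}_{kr}\right)p\left(\mathbf{u}_{kr}\right),\qquad p\left(\mathbf{u}_{kr}\right)=\mathcal{N}\left(\mathbf{u}_{kr};\mu_k\left(\mathbf{z}_{kr}\right),K_{zz}^{(kr)}\right),\]

where \(\mu_k\left(\mathbf{z}_{kr}\right)\) denotes the elementwise evaluation of the mean function at the inducing-point locations.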
We next derive the functional form of \(p(x_{kr}(\cdot)|\mathbf{u}_{kr})\).
Define the random vector \(\mathbf{x}_{kr}\) as the random process \(x_{kr}(\cdot)\) evaluated at times \(\mathbf{t}^{(r)}=\left\{t_1^{(r)},\ldots,t_M^{(r)}\right\}\) (i.e., \(\mathbf{x}_{kr}=[x_{kr}(t_1^{(r)}),\ldots,x_{kr}(t_M^{(r)})]^\intercal\)). Because the inducing points \(\mathbf{u}_{kr}\) are evaluations of the latent process \(x_{kr}(\cdot)\) at \(\mathbf{z}_{kr}\), the vectors \(\mathbf{x}_{kr}\) and \(\mathbf{u}_{kr}\) are jointly Gaussian:
where \(K_\mathbf{tz}^{(kr)}[i,j]=\kappa_k(t^{(r)}_i,z_{kr}[j])\), \(K_\mathbf{zt}^{(kr)}[i,j]=\kappa_k(z_{kr}[i],t_j^{(r)})\) and \(K_\mathbf{tt}^{(r)}[i,j]=\kappa_k(t_i^{(r)},t_j^{(r)})\).
Now, applying the formula for the conditional pdf of jointly Normal random vectors (Eq. 2.116 in Bishop [Bis16]) to Eq. (4.3), we obtain
Because Eq. (4.4) is valid for any \(\mathbf{t}^{(r)}\), it follows that
with
which is Eq. 3 in Duncker and Sahani [DS18].
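As a concrete illustration, the following minimal NumPy sketch evaluates the mean and covariance of this conditional at the times \(\mathbf{t}^{(r)}\), assuming for simplicity a zero mean function \(\mu_k\equiv 0\); the function and variable names are illustrative, and `kernel(a, b)` is assumed to return the matrix of \(\kappa_k\) evaluations.

```python
import numpy as np

def conditional_given_inducing_points(kernel, t, z, u):
    """Mean and covariance of x_kr(t) | u_kr, with u_kr = x_kr(z_kr).

    A sketch of the conditional above, assuming a zero prior mean function.
    """
    Ktt = kernel(t, t)                                     # K_tt
    Ktz = kernel(t, z)                                     # K_tz
    Kzz = kernel(z, z) + 1e-8 * np.eye(len(z))             # K_zz (+ jitter)
    L = np.linalg.cholesky(Kzz)                            # K_zz = L L^T
    A = np.linalg.solve(L.T, np.linalg.solve(L, Ktz.T)).T  # A = K_tz K_zz^{-1}
    mean = A @ u                                           # K_tz K_zz^{-1} u
    cov = Ktt - A @ Ktz.T                                  # K_tt - K_tz K_zz^{-1} K_zt
    return mean, cov

# Illustrative usage with a squared-exponential kernel and made-up values
kernel = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
t = np.linspace(0.0, 1.0, 50)       # evaluation times t^{(r)}
z = np.linspace(0.0, 1.0, 10)       # inducing-point locations z_kr
u = np.sin(2.0 * np.pi * z)         # a made-up value of u_kr
mean, cov = conditional_given_inducing_points(kernel, t, z, u)
```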
4.3. svGPFA variational lower bound#
Theorem 4.1 proves Eq. 4 in Duncker and Sahani [DS18].
Let \(\mathcal{Y}=\{y_{nr}\}_{n=1,r=1}^{N,R}\). Then
We begin with the joint data likelihood of the full model, given in Eq. 1 of the supplementary material of Duncker and Sahani [DS18].
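Under the augmented model of Eq. (4.2), a hedged sketch of this joint distribution is

\[p\left(\mathcal{Y},\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)=\prod_{n=1}^N\prod_{r=1}^R p\left(y_{nr}|h_{nr}(\cdot)\right)\prod_{k=1}^K\prod_{r=1}^R p\left(x_{kr}(\cdot)|\mathbf{u}_{kr}\right)p\left(\mathbf{u}_{kr}\right),\]

where each \(h_{nr}(\cdot)\) is determined by \(\{x_{kr}(\cdot)\}_{k=1}^K\) through the embedding in Eq. (4.1).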
For notational clarity, from now on we omit the bounds of the \(k\) and \(r\) indices. From Corollary 4.3, taking \(x=\mathcal{Y}\) and \(z=\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\), we obtain
Choosing
and using Eq. (4.6) we can rewrite Eq. (4.7) as
\[\begin{split}\log p\left(\mathcal{Y}\right)\ge&\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\left(\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)-\sum_{r=1}^R\sum_{k=1}^K\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}\right)d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}\nonumber\\ =&\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}-\nonumber\\ &\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\sum_{r=1}^R\sum_{k=1}^K\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}\nonumber\\ =&\int q\left(\{x_{kr}(\cdot)\}\right)\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)d\{x_{kr}(\cdot)\}-\sum_{r=1}^R\sum_{k=1}^K\int q\left(\mathbf{u}_{kr}\right)\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}d\mathbf{u}_{kr}\nonumber\\ =&\;\mathbb{E}_{q\left(\{x_{kr}(\cdot)\}\right)}\left\{\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\mathbb{E}_{q\left(\{h_{nr}(\cdot)\}\right)}\left\{\log p\left(\mathcal{Y}|\{h_{nr}(\cdot)\}\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\mathbb{E}_{q\left(\{h_{nr}(\cdot)\}\right)}\left\{\sum_{n=1}^N\sum_{r=1}^R\log p\left(y_{nr}|h_{nr}(\cdot)\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\sum_{n=1}^N\sum_{r=1}^R\mathbb{E}_{q\left(h_{nr}(\cdot)\right)}\left\{\log p\left(y_{nr}|h_{nr}(\cdot)\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\nonumber\end{split}\]
- Notes:
the derivation of the equation in the sixth line from that in the fifth one is subtle. It assumes that there exists a measurable and injective change of variables function \(f(\{x_{kr}(\cdot)\})=\{h_{nr}(\cdot)\}\).
the equation in the seventh line follows from that in the sixth one using the last line in Eq. (4.1).
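For the KL term in the last line, Duncker and Sahani [DS18] use a Gaussian variational distribution over the inducing points, \(q(\mathbf{u}_{kr})=\mathcal{N}(\mathbf{m}_{kr},S_{kr})\). Assuming this form, a minimal NumPy sketch of the closed-form KL divergence between two Gaussians is given below; all names are illustrative.

```python
import numpy as np

def gaussian_kl(m, S, m0, K):
    """KL( N(m, S) || N(m0, K) ), e.g., KL( q(u_kr) || p(u_kr) ).

    A minimal sketch: m, S are the variational mean and covariance of the
    inducing points, and m0, K are the prior mean and covariance K_zz^{(kr)}.
    """
    M = len(m)
    L = np.linalg.cholesky(K)                              # K = L L^T
    Ls = np.linalg.cholesky(S)                             # S = Ls Ls^T
    Kinv_S = np.linalg.solve(L.T, np.linalg.solve(L, S))   # K^{-1} S
    alpha = np.linalg.solve(L, m0 - m)                     # L^{-1} (m0 - m)
    logdet_K = 2.0 * np.sum(np.log(np.diag(L)))
    logdet_S = 2.0 * np.sum(np.log(np.diag(Ls)))
    return 0.5 * (np.trace(Kinv_S) + alpha @ alpha - M + logdet_K - logdet_S)
```

The lower bound is then the sum over neurons and trials of the expected log-likelihood terms minus the sum over latents and trials of these KL divergences.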
- Notes:
the first equation uses Bayes' rule,
the third equation applies the expected value to both sides of the second equation,
the last equation uses the definition of the KL divergence.
with equality if and only if \(q(z)=p(z|x)\).
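These notes describe a standard evidence-lower-bound argument. A hedged sketch of the corresponding derivation, consistent with the notes and with the equality condition above, is

\[\begin{split}p\left(z|x\right)&=\frac{p\left(x|z\right)p\left(z\right)}{p\left(x\right)}\\ \log p\left(x\right)&=\log p\left(x|z\right)-\log\frac{p\left(z|x\right)}{p\left(z\right)}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\mathbb{E}_{q\left(z\right)}\left\{\log\frac{p\left(z|x\right)}{p\left(z\right)}\right\}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\mathbb{E}_{q\left(z\right)}\left\{\log\frac{q\left(z\right)}{p\left(z\right)}\right\}+\mathbb{E}_{q\left(z\right)}\left\{\log\frac{q\left(z\right)}{p\left(z|x\right)}\right\}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\text{KL}\left(q\left(z\right)||p\left(z\right)\right)+\text{KL}\left(q\left(z\right)||p\left(z|x\right)\right)\ge\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\text{KL}\left(q\left(z\right)||p\left(z\right)\right),\end{split}\]

where the inequality holds because \(\text{KL}\left(q\left(z\right)||p\left(z|x\right)\right)\ge 0\), with equality exactly when \(q(z)=p(z|x)\).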
4.4. Variational distribution of \(h_{nr}(\cdot)\)#
To compute the lower bound on the right-hand side of Eq. (4.5), we next derive the distribution \(q(h_{nr}(\cdot))\).
We first deduce the distribution \(q(x_{kr}(\cdot))\). Note, from Eq. (4.2), that for any \(P\in\mathbb{N}\) and any \(\mathbf{t}=(t_1,\ldots,t_P)\in\mathbb{R}^P\), the approximate variational posterior of the random vectors \(\mathbf{x}_{kr}=(x_{kr}(t_1),\ldots,x_{kr}(t_P))\) and \(\mathbf{u}_{kr}\) is jointly Gaussian
where \(K_{tt}\), \(K_{tz}\), \(K_{zt}\), and \(K_{zz}\) are covariance matrices obtained by evaluating \(\kappa_k(t,t')\), \(\kappa_k(t,z)\), \(\kappa_k(z,t)\), and \(\kappa_k(z,z')\), respectively, at \(t,t'\in \{t_1,\ldots,t_P\}\) and \(z,z'\in \{\mathbf{z}_{kr}[0],\ldots,\mathbf{z}_{kr}[M_{kr}-1]\}\). Next, using the expression for the marginal of a joint Gaussian distribution (e.g., Eq. 2.115 in Bishop [Bis16]), we obtain
Because Eq. (4.9) holds for any \(P\in\mathbb{N}\) and any \((t_1,\ldots,t_P)\in\mathbb{R}^P\), it follows that
Finally, because affine transformations of Gaussians are Gaussians, and \(h_{nr}(\cdot)\) is an affine transformation of the Gaussian processes \(\{x_{kr}(\cdot)\}\) (Eq. (4.10)), the approximate posterior of \(h_{nr}(\cdot)\) is the Gaussian process in Eq. (4.11).
which is Eq. 5 in Duncker and Sahani [DS18].
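Putting these results together, the following minimal NumPy sketch evaluates the pointwise moments of \(q(x_{kr}(\cdot))\) and \(q(h_{nr}(\cdot))\), assuming a zero mean function, the Gaussian variational distribution \(q(\mathbf{u}_{kr})=\mathcal{N}(\mathbf{m}_{kr},S_{kr})\), and the linear embedding \(h_{nr}(t)=\sum_k c_{nk}x_{kr}(t)+d_n\) of Duncker and Sahani [DS18]; all names are illustrative.

```python
import numpy as np

def latent_posterior_moments(kernel, t, z, m, S):
    """Pointwise mean and variance of q(x_kr(t)) after marginalizing u_kr.

    A sketch assuming a zero prior mean and q(u_kr) = N(m, S);
    `kernel(a, b)` is assumed to return the matrix of kappa_k evaluations.
    """
    Ktz = kernel(t, z)
    Kzz = kernel(z, z) + 1e-8 * np.eye(len(z))      # jitter for stability
    ktt_diag = np.diag(kernel(t, t))
    A = np.linalg.solve(Kzz, Ktz.T).T               # A = K_tz K_zz^{-1}
    nu = A @ m                                      # posterior mean of x_kr(t)
    var = ktt_diag - np.sum(A * Ktz, axis=1) + np.sum((A @ S) * A, axis=1)
    return nu, var

def embedding_posterior_moments(nu, var, c_n, d_n):
    """Pointwise mean and variance of q(h_nr(t)) from per-latent moments.

    nu[k], var[k] are the outputs of latent_posterior_moments for latent k;
    uses h_nr(t) = sum_k c_nk x_kr(t) + d_n and the independence of the
    latents under the variational posterior.
    """
    mean_h = c_n @ nu + d_n                         # sum_k c_nk nu_kr(t) + d_n
    var_h = (c_n ** 2) @ var                        # sum_k c_nk^2 var_kr(t)
    return mean_h, var_h
```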