4. Derivations#
4.1. GPFA model#
Equation (4.1) represents the GPFA model:
where \(x_{kr}(\cdot)\) is the latent process \(k\) in trial \(r\), \(h_{nr}(\cdot)\) is the embedding process for neuron \(n\) and trial \(r\), and \(y_{nr}\) is the activity of neuron \(n\) in trial \(r\).
Notes:
the first equation shows that the latent processes are independent,
the second equation shows that the latent processes share mean and covariance functions across trials. That is, for any \(k\), the mean and covariance functions of the latent processes of different trials, \(x_{kr}(\cdot), r=1,\ldots,R\), are the same (\(\mu_k(\cdot)\) and \(\kappa_k(\cdot,\cdot)\)),
the fourth equation shows that, given the embedding processes, the responses of different neurons are independent.
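For concreteness, the following is a hedged sketch of the model equations these notes refer to, written with the linear embedding (loading coefficients \(c_{nk}\) and offsets \(d_n\)) of Duncker and Sahani [DS18]:

\[\begin{split}x_{kr}(\cdot)&\perp x_{k'r'}(\cdot)\quad\text{for }(k,r)\neq(k',r'),\\ x_{kr}(\cdot)&\sim\mathcal{GP}\left(\mu_k(\cdot),\kappa_k(\cdot,\cdot)\right),\\ h_{nr}(t)&=\sum_{k=1}^K c_{nk}\,x_{kr}(t)+d_n,\\ p\left(\{y_{nr}\}_{n=1}^N|\{h_{nr}(\cdot)\}_{n=1}^N\right)&=\prod_{n=1}^N p\left(y_{nr}|h_{nr}(\cdot)\right).\end{split}\]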
4.2. GPFA with inducing points model#
To use the sparse variational framework for Gaussian processes, Duncker and Sahani [DS18] augmented the GPFA model by introducing inducing points \(\mathbf{u}_{kr}\) for each latent process \(k\) and trial \(r\). The inducing points \(\mathbf{u}_{kr}\) represent evaluations of the latent process \(x_{kr}(\cdot)\) at locations \(\mathbf{z}_{kr}=\left[z_{kr}[0],\ldots,z_{kr}[M_{kr}-1]\right]\). A joint prior over the latent process \(x_{kr}(\cdot)\) and its inducing points \(\mathbf{u}_{kr}\) is given in Eq. (4.2).
where \(K_{zz}^{(kr)}[i,j]=\kappa_k(z_{kr}[i],z_{kr}[j])\).
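Written in the standard sparse-GP factorized form, a hedged sketch of this joint prior is

\[p\left(x_{kr}(\cdot),\mathbf{u}_{kr}\right)=p\left(x_{kr}(\cdot)|\mathbf{u}_{kr}\right)p\left(\mathbf{u}_{kr}\right),\qquad p\left(\mathbf{u}_{kr}\right)=\mathcal{N}\left(\mathbf{u}_{kr};\mu_k\left(\mathbf{z}_{kr}\right),K_{zz}^{(kr)}\right),\]

where \(\mu_k\left(\mathbf{z}_{kr}\right)\) denotes the elementwise evaluation of the mean function at the inducing-point locations.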
We next derive the functional form of \(p(x_{kr}(\cdot)|\mathbf{u}_{kr})\).
Define the random vector \(\mathbf{x}_{kr}\) as the random process \(x_{kr}(\cdot)\) evaluated at times \(\mathbf{t}^{(r)}=\left\{t_1^{(r)},\ldots,t_M^{(r)}\right\}\) (i.e., \(\mathbf{x}_{kr}=[x_{kr}(t_1^{(r)}),\ldots,x_{kr}(t_M^{(r)})]^\intercal\)). Because the inducing points \(\mathbf{u}_{kr}\) are evaluations of the latent process \(x_{kr}(\cdot)\) at \(\mathbf{z}_{kr}\), the vectors \(\mathbf{x}_{kr}\) and \(\mathbf{u}_{kr}\) are jointly Gaussian:
where \(K_\mathbf{tz}^{(kr)}[i,j]=\kappa_k(t^{(r)}_i,z_{kr}[j])\), \(K_\mathbf{zt}^{(kr)}[i,j]=\kappa_k(z_{kr}[i],t_j^{(r)})\) and \(K_\mathbf{tt}^{(r)}[i,j]=\kappa_k(t_i^{(r)},t_j^{(r)})\).
Now, applying the formula for the conditional pdf of jointly Normal random vectors (Eq. 2.116 in Bishop [Bis16]) to Eq. (4.3), we obtain
Because Eq. (4.4) is valid for any \(\mathbf{t}^{(r)}\), it follows that
with
which is Eq. 3 in Duncker and Sahani [DS18].
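As a concrete illustration, the following minimal NumPy sketch evaluates the mean and covariance of this conditional at the times \(\mathbf{t}^{(r)}\), assuming for simplicity a zero mean function \(\mu_k\equiv 0\); the function and variable names are illustrative, and `kernel(a, b)` is assumed to return the matrix of \(\kappa_k\) evaluations.

```python
import numpy as np

def conditional_given_inducing_points(kernel, t, z, u):
    """Mean and covariance of x_kr(t) | u_kr, with u_kr = x_kr(z_kr).

    A sketch of the conditional above, assuming a zero prior mean function.
    """
    Ktt = kernel(t, t)                                     # K_tt
    Ktz = kernel(t, z)                                     # K_tz
    Kzz = kernel(z, z) + 1e-8 * np.eye(len(z))             # K_zz (+ jitter)
    L = np.linalg.cholesky(Kzz)                            # K_zz = L L^T
    A = np.linalg.solve(L.T, np.linalg.solve(L, Ktz.T)).T  # A = K_tz K_zz^{-1}
    mean = A @ u                                           # K_tz K_zz^{-1} u
    cov = Ktt - A @ Ktz.T                                  # K_tt - K_tz K_zz^{-1} K_zt
    return mean, cov

# Illustrative usage with a squared-exponential kernel and made-up values
kernel = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
t = np.linspace(0.0, 1.0, 50)       # evaluation times t^{(r)}
z = np.linspace(0.0, 1.0, 10)       # inducing-point locations z_kr
u = np.sin(2.0 * np.pi * z)         # a made-up value of u_kr
mean, cov = conditional_given_inducing_points(kernel, t, z, u)
```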
4.3. svGPFA variational lower bound#
Theorem 4.1 proves Eq. 4 in Duncker and Sahani [DS18].
Let \(\mathcal{Y}=\{y_{nr}\}_{n=1,r=1}^{N,R}\). Then
We begin with the joint data likelihood of the full model, given in Eq. 1 of the supplementary material of Duncker and Sahani [DS18].
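Under the augmented model of Eq. (4.2), a hedged sketch of this joint distribution is

\[p\left(\mathcal{Y},\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)=\prod_{n=1}^N\prod_{r=1}^R p\left(y_{nr}|h_{nr}(\cdot)\right)\prod_{k=1}^K\prod_{r=1}^R p\left(x_{kr}(\cdot)|\mathbf{u}_{kr}\right)p\left(\mathbf{u}_{kr}\right),\]

where each \(h_{nr}(\cdot)\) is determined by \(\{x_{kr}(\cdot)\}_{k=1}^K\) through the embedding in Eq. (4.1).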
For notational clarity, from now on we omit the bounds of the \(k\) and \(r\) indices. From Corollary 4.3, taking \(x=\mathcal{Y}\) and \(z=\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\), we obtain
Choosing
and using Eq. (4.6) we can rewrite Eq. (4.7) as
\[\begin{split}\log p\left(\mathcal{Y}\right)\ge&\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\left(\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)-\sum_{r=1}^R\sum_{k=1}^K\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}\right)d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}\nonumber\\ =&\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}-\nonumber\\ &\int\int q\left(\{x_{kr}(\cdot)\},\{\mathbf{u}_{kr}\}\right)\sum_{r=1}^R\sum_{k=1}^K\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}d\{x_{kr}(\cdot)\}d\{\mathbf{u}_{kr}\}\nonumber\\ =&\int q\left(\{x_{kr}(\cdot)\}\right)\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)d\{x_{kr}(\cdot)\}-\sum_{r=1}^R\sum_{k=1}^K\int q\left(\mathbf{u}_{kr}\right)\log\frac{q\left(\mathbf{u}_{kr}\right)}{p\left(\mathbf{u}_{kr}\right)}d\mathbf{u}_{kr}\nonumber\\ =&\;\mathbb{E}_{q\left(\{x_{kr}(\cdot)\}\right)}\left\{\log p\left(\mathcal{Y}|\{x_{kr}(\cdot)\}\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\mathbb{E}_{q\left(\{h_{nr}(\cdot)\}\right)}\left\{\log p\left(\mathcal{Y}|\{h_{nr}(\cdot)\}\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\mathbb{E}_{q\left(\{h_{nr}(\cdot)\}\right)}\left\{\sum_{n=1}^N\sum_{r=1}^R\log p\left(y_{nr}|h_{nr}(\cdot)\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\\ =&\;\sum_{n=1}^N\sum_{r=1}^R\mathbb{E}_{q\left(h_{nr}(\cdot)\right)}\left\{\log p\left(y_{nr}|h_{nr}(\cdot)\right)\right\}-\sum_{r=1}^R\sum_{k=1}^K\text{KL}\left(q\left(\mathbf{u}_{kr}\right)||p\left(\mathbf{u}_{kr}\right)\right)\nonumber\end{split}\]
- Notes:
the derivation of the equation in the sixth line from that in the fifth one is subtle. It assumes that there exists a measurable and injective change of variables function \(f(\{x_{kr}(\cdot)\})=\{h_{nr}(\cdot)\}\).
the equation in the seventh line follows from that in the sixth one using the last line in Eq. (4.1).
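For the KL term in the last line, Duncker and Sahani [DS18] use a Gaussian variational distribution over the inducing points, \(q(\mathbf{u}_{kr})=\mathcal{N}(\mathbf{m}_{kr},S_{kr})\). Assuming this form, a minimal NumPy sketch of the closed-form KL divergence between two Gaussians is given below; all names are illustrative.

```python
import numpy as np

def gaussian_kl(m, S, m0, K):
    """KL( N(m, S) || N(m0, K) ), e.g., KL( q(u_kr) || p(u_kr) ).

    A minimal sketch: m, S are the variational mean and covariance of the
    inducing points, and m0, K are the prior mean and covariance K_zz^{(kr)}.
    """
    M = len(m)
    L = np.linalg.cholesky(K)                              # K = L L^T
    Ls = np.linalg.cholesky(S)                             # S = Ls Ls^T
    Kinv_S = np.linalg.solve(L.T, np.linalg.solve(L, S))   # K^{-1} S
    alpha = np.linalg.solve(L, m0 - m)                     # L^{-1} (m0 - m)
    logdet_K = 2.0 * np.sum(np.log(np.diag(L)))
    logdet_S = 2.0 * np.sum(np.log(np.diag(Ls)))
    return 0.5 * (np.trace(Kinv_S) + alpha @ alpha - M + logdet_K - logdet_S)
```

The lower bound is then the sum over neurons and trials of the expected log-likelihood terms minus the sum over latents and trials of these KL divergences.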
- Notes:
the first equation uses Bayes' rule,
the third equation applies the expected value to both sides of the second equation,
the last equation uses the definition of the KL divergence.
with equality if and only if \(q(z)=p(z|x)\).
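These notes describe a standard evidence-lower-bound argument. A hedged sketch of the corresponding derivation, consistent with the notes and with the equality condition above, is

\[\begin{split}p\left(z|x\right)&=\frac{p\left(x|z\right)p\left(z\right)}{p\left(x\right)}\\ \log p\left(x\right)&=\log p\left(x|z\right)-\log\frac{p\left(z|x\right)}{p\left(z\right)}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\mathbb{E}_{q\left(z\right)}\left\{\log\frac{p\left(z|x\right)}{p\left(z\right)}\right\}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\mathbb{E}_{q\left(z\right)}\left\{\log\frac{q\left(z\right)}{p\left(z\right)}\right\}+\mathbb{E}_{q\left(z\right)}\left\{\log\frac{q\left(z\right)}{p\left(z|x\right)}\right\}\\ \log p\left(x\right)&=\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\text{KL}\left(q\left(z\right)||p\left(z\right)\right)+\text{KL}\left(q\left(z\right)||p\left(z|x\right)\right)\ge\mathbb{E}_{q\left(z\right)}\left\{\log p\left(x|z\right)\right\}-\text{KL}\left(q\left(z\right)||p\left(z\right)\right),\end{split}\]

where the inequality holds because \(\text{KL}\left(q\left(z\right)||p\left(z|x\right)\right)\ge 0\), with equality exactly when \(q(z)=p(z|x)\).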
4.4. Variational distribution of \(h_{nr}(\cdot)\)#
To compute the lower bound on the right-hand side of Eq. (4.5), we next derive the distribution \(q(h_{nr}(\cdot))\).
We first deduce the distribution \(q(x_{kr}(\cdot))\). Note, from Eq. (4.2), that for any \(P\in\mathbb{N}\) and any \(\mathbf{t}=(t_1,\ldots,t_P)\in\mathbb{R}^P\), the approximate variational posterior of the random vectors \(\mathbf{x}_{kr}=(x_{kr}(t_1),\ldots,x_{kr}(t_P))\) and \(\mathbf{u}_{kr}\) is jointly Gaussian
where \(K_{tt}\), \(K_{tz}\), \(K_{zt}\), and \(K_{zz}\) are covariance matrices obtained by evaluating \(\kappa_k(t,t')\), \(\kappa_k(t,z)\), \(\kappa_k(z,t)\), and \(\kappa_k(z,z')\), respectively, at \(t,t'\in \{t_1,\ldots,t_P\}\) and \(z,z'\in \{\mathbf{z}_{kr}[0],\ldots,\mathbf{z}_{kr}[M_{kr}-1]\}\). Next, using the expression for the marginal of a joint Gaussian distribution (e.g., Eq. 2.115 in Bishop [Bis16]), we obtain
Because Eq. (4.9) holds for any \(P\in\mathbb{N}\) and any \((t_1,\ldots,t_P)\in\mathbb{R}^P\), it follows that
Finally, because affine transformations of Gaussians are Gaussians, and \(h_{nr}(\cdot)\) is an affine transformation of the Gaussian processes \(\{x_{kr}(\cdot)\}\) (Eq. (4.10)), the approximate posterior of \(h_{nr}(\cdot)\) is the Gaussian process in Eq. (4.11).
which is Eq. 5 in Duncker and Sahani [DS18].
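Putting these results together, the following minimal NumPy sketch evaluates the pointwise moments of \(q(x_{kr}(\cdot))\) and \(q(h_{nr}(\cdot))\), assuming a zero mean function, the Gaussian variational distribution \(q(\mathbf{u}_{kr})=\mathcal{N}(\mathbf{m}_{kr},S_{kr})\), and the linear embedding \(h_{nr}(t)=\sum_k c_{nk}x_{kr}(t)+d_n\) of Duncker and Sahani [DS18]; all names are illustrative.

```python
import numpy as np

def latent_posterior_moments(kernel, t, z, m, S):
    """Pointwise mean and variance of q(x_kr(t)) after marginalizing u_kr.

    A sketch assuming a zero prior mean and q(u_kr) = N(m, S);
    `kernel(a, b)` is assumed to return the matrix of kappa_k evaluations.
    """
    Ktz = kernel(t, z)
    Kzz = kernel(z, z) + 1e-8 * np.eye(len(z))      # jitter for stability
    ktt_diag = np.diag(kernel(t, t))
    A = np.linalg.solve(Kzz, Ktz.T).T               # A = K_tz K_zz^{-1}
    nu = A @ m                                      # posterior mean of x_kr(t)
    var = ktt_diag - np.sum(A * Ktz, axis=1) + np.sum((A @ S) * A, axis=1)
    return nu, var

def embedding_posterior_moments(nu, var, c_n, d_n):
    """Pointwise mean and variance of q(h_nr(t)) from per-latent moments.

    nu[k], var[k] are the outputs of latent_posterior_moments for latent k;
    uses h_nr(t) = sum_k c_nk x_kr(t) + d_n and the independence of the
    latents under the variational posterior.
    """
    mean_h = c_n @ nu + d_n                         # sum_k c_nk nu_kr(t) + d_n
    var_h = (c_n ** 2) @ var                        # sum_k c_nk^2 var_kr(t)
    return mean_h, var_h
```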