1. Parameters and their specification#
svGPFA uses different groups of parameters. We provide a utility function svGPFA.utils.initUtils.getParamsAndKernelsTypes() that builds them from parameter specifications. These specifications are short descriptions of how to build a parameter. For example, a parameter specification for the inducing points locations can be equidistant, indicating that the inducing points locations should be set to equidistant values between the start and end of a trial.
Parameter specifications are organised in a nested dictionary (e.g., param_spec[group_name][param_name] contains the specification of parameter param_name in group group_name). This dictionary can be built:
1. automatically, using default values, with the utility function svGPFA.utils.initUtils.getDefaultParamsDict(),
2. manually, by setting parameter specifications in the Python code,
3. from the command line, with the utility function svGPFA.utils.initUtils.getParamsDictFromArgs(),
4. from a configuration file, with the utility function svGPFA.utils.initUtils.getParamsDictFromStringsDict().

The Colab notebooks build this dictionary automatically (option 1). This script builds the parameter specification dictionary from the command line and from this configuration file (options 3 and 4).
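For instance, option 2 amounts to assembling the nested dictionary directly in Python. Below is a minimal sketch using the group and item names described in the sections that follow; the values shown are placeholders only, not recommended settings.

```python
# Minimal sketch of a manually-built parameter specification (option 2).
# Group and item names follow the sections below; values are placeholders.
params_spec = {
    "data_structure_params": {"trials_start_time": 0.0,
                              "trials_end_time": 1.0},
    "variational_params0": {"variational_mean0_constant_value": 0.0,
                            "variational_cov0_diag_value": 0.01},
    "embedding_params0": {"c0_distribution": "Normal",
                          "c0_loc": 0.0, "c0_scale": 1.0,
                          "d0_distribution": "Normal",
                          "d0_loc": 0.0, "d0_scale": 1.0},
    "kernels_params0": {"k_types": "exponentialQuadratic",
                        "k_lengthscales0": 1.0},
    "ind_points_locs_params0": {"common_n_ind_points": 10,
                                "ind_points_locs0_layout": "equidistant"},
    "optim_params": {"optim_method": "ECM", "em_max_iter": 50},
}
```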
Below we describe all svGPFA parameters and their specifications. Refer to the documentation of the above utility functions for details on how to use them.
1.1. Data structure parameters#
There are two data structure parameters, trials_start_times and trials_end_times, which are tensors of length n_trials giving the start and end times of each trial; i.e., \(\tau_i\) in Eq. 7 of Duncker and Sahani [DS18].
These parameters can be specified in a trial-specific or trial-common format. If both are specified, the longer (trial-specific) format takes precedence.
1.1.1. Trial-specific format#
Two items need to be specified:
- trials_start_times should provide a list of length n_trials, with float values indicating seconds, such that trials_start_times[i] gives the start time of the ith trial.
- trials_end_times should provide a list of length n_trials, with float values indicating seconds, such that trials_end_times[i] gives the end time of the ith trial.

```python
params_spec["data_structure_params"] = {
    "trials_start_times": [0.0, 0.4, 0.7],
    "trials_end_times": [0.2, 0.5, 0.9],
}
```
1.1.2. Trial-common format#
Two items need to be specified:
- trials_start_time should provide the start time (float value, in seconds) of all trials.
- trials_end_time should provide the end time (float value, in seconds) of all trials.

```python
params_spec["data_structure_params"] = {
    "trials_start_time": 0.0,
    "trials_end_time": 1.0,
}
```
1.1.3. Defaults#
All trials start at 0.0 sec and end at 1.0 sec.
params_spec["data_structure_params"] = { "trials_start_time": 0.0, "trials_end_time": 1.0, }
1.2. Initial values of model parameters#
Initial values for four types of model parameters need to be specified: the variational parameters (Section 1.2.1), the embedding parameters (Section 1.2.2), the kernel parameters (Section 1.2.3) and the inducing points locations (Section 1.2.4).
For most parameter types, initial values can be specified in a binary format or in non-binary shorter or longer formats. In the binary format, parameters are given as PyTorch tensors. The shorter format provides the same initial value for all latents and trials, whereas the longer format gives different initial values for each latent and trial. If both the shorter and longer formats are specified, the longer format takes precedence.
1.2.1. Variational parameters#
The variational parameters are the means (\(\mathbf{m}_k^{(r)}\), Duncker and Sahani [DS18], p.3) and covariances (\(S_k^{(r)}\), Duncker and Sahani [DS18], p.3) of the inducing points (\(\mathbf{u}_k^{(r)}\), Duncker and Sahani [DS18], p.3). The data structures for these parameters are described in the next section.
1.2.1.1. Python variable format#
Two items need to be specified:
- variational_mean0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of dimension (n_trials, n_indPoints[k], 1), where variational_mean0[k][r, :, 0] gives the initial variational mean for latent k and trial r.
- variational_cov0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of dimension (n_trials, n_indPoints[k], n_indPoints[k]), where variational_cov0[k][r, :, :] gives the initial variational covariance for latent k and trial r.

```python
n_latents = 3
n_ind_points = [20, 10, 15]
var_mean0 = [torch.normal(mean=0, std=1,
                          size=(n_trials, n_ind_points[k], 1),
                          dtype=torch.double)
             for k in range(n_latents)]
diag_value = 1e-2
var_cov0 = [[] for r in range(n_latents)]
for k in range(n_latents):
    var_cov0[k] = torch.empty((n_trials, n_ind_points[k], n_ind_points[k]),
                              dtype=torch.double)
    for r in range(n_trials):
        var_cov0[k][r, :, :] = torch.eye(n_ind_points[k],
                                         dtype=torch.double) * diag_value
params_spec["variational_params0"] = {
    "variational_mean0": var_mean0,
    "variational_cov0": var_cov0,
}
```
1.2.1.2. Latent-trial-specific filename format#
For every latent, k, and every trial, r, two items need to be specified:
- variational_mean0_filename_latent<k>_trial<r> should provide the filename (csv format, readable by the pandas read_csv function) containing the initial values of the variational mean for latent k and trial r. This file should contain a vector of size number_of_inducing_points.
- variational_cov0_filename_latent<k>_trial<r> should provide the filename (csv format, readable by the pandas read_csv function) containing the initial values of the variational covariance for latent k and trial r. This file should contain a matrix of size number_of_inducing_points x number_of_inducing_points.

```python
params_spec["variational_params0"] = {
    "variational_mean0_filename_latent0_trial0": "../data/uniform_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent0_trial0": "../data/identity_scaled1e-2_09x09.csv",
    "variational_mean0_filename_latent0_trial1": "../data/gaussian_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent0_trial1": "../data/identity_scaled1e-4_09x09.csv",
    "variational_mean0_filename_latent1_trial0": "../data/uniform_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent1_trial0": "../data/identity_scaled1e-2_09x09.csv",
    "variational_mean0_filename_latent1_trial1": "../data/gaussian_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent1_trial1": "../data/identity_scaled1e-4_09x09.csv",
}
```
1.2.1.3. Latent-trial-common filename format#
Two items need to be specified:
- variational_means0_filename should provide the filename (csv format, readable by the pandas read_csv function) containing the initial values of the variational mean for all latents and trials. This file should contain a vector of size number_of_inducing_points.
- variational_covs0_filename should provide the filename (csv format, readable by the pandas read_csv function) containing the initial values of the variational covariance for all latents and trials. This file should contain a matrix of size number_of_inducing_points x number_of_inducing_points.

```python
params_spec["variational_params0"] = {
    "variational_means0_filename": "../data/uniform_0.00_1.00_len09.csv",
    "variational_covs0_filename": "../data/identity_scaled1e-2_09x09.csv",
}
```
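How these CSV files are produced is left to the user. As a rough sketch, the following writes a mean vector and a scaled-identity covariance of the sizes described above, assuming plain CSV files without header or index columns (whether a header row is expected depends on how svGPFA's readers call read_csv); the file names reuse those of the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: write an initial variational mean (vector of length
# n_ind_points) and covariance (n_ind_points x n_ind_points matrix) to CSV.
n_ind_points = 9
mean0 = np.random.uniform(low=0.0, high=1.0, size=n_ind_points)
cov0 = np.eye(n_ind_points) * 1e-2
pd.DataFrame(mean0).to_csv("../data/uniform_0.00_1.00_len09.csv",
                           header=False, index=False)
pd.DataFrame(cov0).to_csv("../data/identity_scaled1e-2_09x09.csv",
                          header=False, index=False)
```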
1.2.1.4. Constant value format#
This initialisation option sets the same variational mean and covariance across all latents and trials. The common variational mean has all elements equal to a constant value, and the common variational covariance is a scaled identity matrix.
Two items need to be specified:
- variational_mean0_constant_value should provide a float value giving the constant value of all elements of the common variational mean.
- variational_cov0_diag_value should provide a float value giving the diagonal value of the common variational covariance.

```python
params_spec["variational_params0"] = {
    "variational_mean0_constant_value": 0.0,
    "variational_cov0_diag_value": 0.01,
}
```
1.2.1.5. Defaults#
The default variational mean and covariance have constant values. For the variational mean the constant value is zero and for the variational covariance the constant diagonal value is 0.01.
params_spec["variational_params0"] = { "variational_mean0_constant_value": 0.0, "variational_cov0_diag_value": 0.01, }
1.2.2. Embedding parameters#
The embedding parameters are the loading matrix (\(C\), Duncker and Sahani [DS18], Eq. 1, middle) and offset vector (\(\mathbf{d}\), Duncker and Sahani [DS18], Eq. 1 middle). The data structures for these parameters are described in the next section.
1.2.2.1. Python variable format#
Two items need to be specified:
- c0 should be a torch.DoubleTensor of size (n_neurons, n_latents),
- d0 should be a torch.DoubleTensor of size (n_neurons, 1).

```python
n_neurons = 100
n_latents = 3
params_spec["embedding_params0"] = {
    "c0": torch.normal(mean=0.0, std=1.0, size=(n_neurons, n_latents),
                       dtype=torch.double),
    "d0": torch.normal(mean=0.0, std=1.0, size=(n_neurons, 1),
                       dtype=torch.double),
}
```
1.2.2.2. Filename format#
Two items need to be specified:
- c0_filename gives the filename (csv format, readable by the pandas read_csv function) containing the values of the loading matrix C,
- d0_filename gives the filename (csv format, readable by the pandas read_csv function) containing the values of the offset vector d.

```python
params_spec["embedding_params0"] = {
    "c0_filename": "../data/C_constant_1.00constant_100neurons_02latents.csv",
    "d0_filename": "../data/d_constant_0.00constant_100neurons.csv",
}
```
1.2.2.3. Random format#
Eight items need to be specified:
- c0_distribution: string value giving the name of the distribution of the loading matrix C (e.g., Normal).
- c0_loc: float value giving the location of the distribution of the loading matrix C (e.g., 0.0).
- c0_scale: float value giving the scale of the distribution of the loading matrix C (e.g., 1.0).
- c0_random_seed: optional integer value giving the random seed to be set prior to generating the random loading matrix C. This value can be specified for replicability. If not given, the random seed is not changed prior to generating C.
- d0_distribution: string value giving the name of the distribution of the offset vector d (e.g., Normal).
- d0_loc: float value giving the location of the distribution of the offset vector d (e.g., 0.3).
- d0_scale: float value giving the scale of the distribution of the offset vector d (e.g., 1.0).
- d0_random_seed: optional integer value giving the random seed to be set prior to generating the random offset vector d. This value can be specified for replicability. If not given, the random seed is not changed prior to generating d.

```python
params_spec["embedding_params0"] = {
    "c0_distribution": "Normal",
    "c0_loc": 0.0,
    "c0_scale": 1.0,
    "c0_random_seed": 102030,
    "d0_distribution": "Normal",
    "d0_loc": 0.0,
    "d0_scale": 1.0,
    "d0_random_seed": 203040,
}
```
1.2.2.4. Defaults#
By default, the loading matrix c0 and the offset vector d0 are random matrices/vectors with elements drawn from a standard normal distribution (zero location, unit scale).
params_spec["embedding_params0"] = { "c0_distribution": "Normal", "c0_loc": 0.0, "c0_scale": 1.0, "d0_distribution": "Normal", "d0_loc": 0.0, "d0_scale": 1.0, }
1.2.3. Kernel parameters#
The kernel parameters of latent k are those of the Gaussian process covariance function (\(\kappa_k(\cdot,\cdot)\), Duncker and Sahani [DS18], p. 2). The data structures for these parameters are described in the next section.
1.2.3.1. Python variable format#
Two items need to be specified:
- k_types should be a list of size n_latents. The kth element of this list should be a string with the type of kernel for the kth latent (e.g., k_types[k]="exponentialQuadratic").
- k_params0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor containing the parameters of the kth kernel (e.g., k_params0[k]=torch.DoubleTensor([3.2])).

```python
expQuadK1_lengthscale = 2.9
expQuadK2_lengthscale = 0.5
periodK1_lengthscale = 3.1
periodK1_period = 1.2
params_spec["kernels_params0"] = {
    "k_types": ["exponentialQuadratic", "exponentialQuadratic", "periodic"],
    "k_params0": [torch.DoubleTensor([expQuadK1_lengthscale]),
                  torch.DoubleTensor([expQuadK2_lengthscale]),
                  # the periodic kernel takes a lengthscale and a period
                  torch.DoubleTensor([periodK1_lengthscale, periodK1_period])],
}
```
1.2.3.2. Latent-specific textual format#
For each latent k, the item k_type_latent<k> needs to be specified, giving the name of the kernel for latent k. The other required items depend on the value of k_type_latent<k>. For example, for k_type_latent<k>=exponentialQuadratic, the item k_lengthscale0_latent<k> should specify the lengthscale parameter, and for k_type_latent<k>=periodic the items k_lengthscale0_latent<k> and k_period0_latent<k> should specify the lengthscale and period parameters of the periodic kernel, respectively.
params_spec["kernels_params0"] = { "k_type_latent0": "exponentialQuadratic", "k_lengthscale0_latent0": 2.0, "k_type_latent1": "periodic", "k_lengthscale0_latent1": 1.0, "k_period0_latent1": 0.75, }
1.2.3.3. Latent-common textual format#
This shorter format requires the item k_types, giving the name of the kernel to be used for all latent variables. The other required items depend on the value of k_types. For example, for k_types=exponentialQuadratic, the item k_lengthscales0 should specify the lengthscale parameter, and for k_types=periodic the items k_lengthscales0 and k_periods0 should specify the lengthscale and period parameters of the periodic kernel, respectively.
params_spec["kernels_params0"] = { "k_types": "exponentialQuadratic", "k_lengthscales0": 1.0, }
1.2.3.4. Defaults#
For all latents, the default kernel is an exponential quadratic kernel with lengthscale 1.0.
params_spec["kernels_params0"] = { "k_types": "exponentialQuadratic", "k_lengthscales0": 1.0, }
1.2.4. Inducing points locations parameters#
The inducing points locations, or input locations, are the points (\(\mathbf{z}_k^{(r)}\), Duncker and Sahani [DS18], p.3) where the Gaussian processes are evaluated to obtain the inducing points. The data structures for these parameters are described in the next section.
1.2.4.1. Python variable format#
One item needs to be specified:
- ind_points_locs0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of size (n_trials, n_indPoints[k], 1), where ind_points_locs0[k][r, :, 0] gives the initial inducing points locations for latent k and trial r.

```python
n_latents = 3
n_ind_points = (10, 20, 15)
n_trials = 50
trials_start_time = 0.0
trials_end_time = 7.0
params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0": [trials_start_time
                         + (trials_end_time - trials_start_time)
                         * torch.rand(n_trials, n_ind_points[k], 1,
                                      dtype=torch.double)
                         for k in range(n_latents)]
}
```
1.2.4.2. Latent-trial-specific filename format#
For each latent k and trial r one item needs to be specified:
- ind_points_locs0_latent<k>_trial<r>_filename giving the name of the file (csv format, readable by the pandas read_csv function) containing the initial inducing points locations for latent k and trial r.

```python
params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0_latent0_trial0_filename": "ind_points_locs0_latent0_trial0.csv",
    "ind_points_locs0_latent0_trial1_filename": "ind_points_locs0_latent0_trial1.csv",
    "ind_points_locs0_latent1_trial0_filename": "ind_points_locs0_latent1_trial0.csv",
    "ind_points_locs0_latent1_trial1_filename": "ind_points_locs0_latent1_trial1.csv",
}
```
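Such a file simply holds a vector of locations within the trial. A rough, hypothetical sketch that writes equidistant initial locations for one latent and one trial (assuming a CSV file without header or index columns; the file name reuses the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: equidistant initial inducing points locations for
# latent 0 and trial 0, written as a single-column CSV file.
trial_start_time, trial_end_time = 0.0, 1.0
n_ind_points = 10
locs = np.linspace(trial_start_time, trial_end_time, n_ind_points)
pd.DataFrame(locs).to_csv("ind_points_locs0_latent0_trial0.csv",
                          header=False, index=False)
```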
1.2.4.3. Latent-trial-common filename format#
This shorter format requires the specification of the item ind_points_locs0_filename, giving the name of the file (csv format, readable by the pandas read_csv function) containing the initial inducing points locations for all latents and trials.
params_spec["ind_points_locs_params0"] = { "ind_points_locs0_filename": "ind_points_locs0.csv", }
1.2.4.4. Layout format#
The layout format requires the specification of the number of inducing points in the item n_ind_points or in the item common_n_ind_points. Item n_ind_points is a list of length n_latents, such that n_ind_points[k] gives the number of inducing points for latent k. Item common_n_ind_points is an integer that gives the number of inducing points of all latents.
The layout of the initial inducing points locations is given by the item ind_points_locs0_layout. If ind_points_locs0_layout = equidistant, the initial locations of the inducing points are equidistant between the trial start and trial end. If ind_points_locs0_layout = uniform, the initial inducing points locations are drawn uniformly at random between the start and end of the trial.
```python
n_ind_points = (10, 20, 15)
params_spec["ind_points_locs_params0"] = {
    "n_ind_points": n_ind_points,
    "ind_points_locs0_layout": "equidistant",
}
```
1.2.4.5. Defaults#
The default inducing points locations for trial r and latent k are n_ind_points[k] equidistant points between the start and end of trial r.
```python
common_n_ind_points = 10
params_spec["ind_points_locs_params0"] = {
    "common_n_ind_points": common_n_ind_points,
    "ind_points_locs0_layout": "equidistant",
}
```
1.3. Optimisation parameters#
Parameter values that control the optimisation should be specified in section [optim_params].
- optim_method: specifies the method used for parameter optimisation. If optim_method = ECM, the Expectation Conditional Maximisation method is used (McLachlan and Krishnan [MK08], section 5.2). Here the M-step is broken into three conditional maximisation steps: maximisation of the lower bound wrt the embedding parameters (mstep-embedding), wrt the kernels parameters (mstep-kernels) and wrt the inducing points locations (mstep-indPointsLocs). Thus, one ECM iteration comprises one E-step (i.e., maximisation of the lower bound wrt the variational parameters) followed by the three conditional M-steps above. If optim_method = mECM, the Multicycle ECM method is used (McLachlan and Krishnan [MK08], section 5.3). Here one E-step maximisation is performed before each of the M-step conditional maximisations. Thus, one mECM iteration comprises estep, mstep-embedding, estep, mstep-kernels, estep, mstep-indPointsLocs. A sketch of both iteration orders is shown after this list.
- em_max_iter: integer value specifying the maximum number of EM iterations.
- verbose: boolean value indicating whether the optimisation should be verbose or silent.
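The following sketch only illustrates the step order of the two methods; the step names come from the list above and the helper below is illustrative, not part of svGPFA.

```python
# Illustrative only: enumerate the steps of one optimisation iteration for
# each method, using the step names from the text above.
M_STEPS = ["mstep-embedding", "mstep-kernels", "mstep-indPointsLocs"]

def iteration_steps(optim_method):
    if optim_method == "ECM":
        # one E-step followed by the three conditional M-steps
        return ["estep"] + M_STEPS
    if optim_method == "mECM":
        # one E-step before each conditional M-step
        return [step for m in M_STEPS for step in ("estep", m)]
    raise ValueError(f"unknown optim_method: {optim_method}")

print(iteration_steps("ECM"))
# ['estep', 'mstep-embedding', 'mstep-kernels', 'mstep-indPointsLocs']
print(iteration_steps("mECM"))
# ['estep', 'mstep-embedding', 'estep', 'mstep-kernels',
#  'estep', 'mstep-indPointsLocs']
```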
For each <step> in {estep, mstep_embedding, mstep_kernels, mstep_indPointsLocs}, section [optim_params] should contain the following items:
- <step>_estimate: boolean value indicating whether <step> should be estimated or not.
- <step>_max_iter: integer value indicating the maximum number of iterations used by torch.optim.LBFGS for the optimisation of <step> within one EM iteration.
- <step>_lr: float value indicating the learning rate used by torch.optim.LBFGS for the optimisation of <step> within one EM iteration.
- <step>_tolerance_grad: float value indicating the termination tolerance on first-order optimality used by torch.optim.LBFGS for the optimisation of <step> within one EM iteration.
- <step>_tolerance_change: float value indicating the termination tolerance on function value/parameter changes used by torch.optim.LBFGS for the optimisation of <step> within one EM iteration.
- <step>_line_search_fn: string value indicating the line search method used by torch.optim.LBFGS. If <step>_line_search_fn=strong_wolfe, line search is performed using the strong Wolfe method. If <step>_line_search_fn=None, line search is not used.

```python
params_spec["optim_params"] = {
    "n_quad": 200,
    "prior_cov_reg_param": 1e-5,
    #
    "optim_method": "ECM",
    "em_max_iter": 200,
    #
    "estep_estimate": True,
    "estep_max_iter": 20,
    "estep_lr": 1.0,
    "estep_tolerance_grad": 1e-7,
    "estep_tolerance_change": 1e-9,
    "estep_line_search_fn": "strong_wolfe",
    #
    "mstep_embedding_estimate": True,
    "mstep_embedding_max_iter": 20,
    "mstep_embedding_lr": 1.0,
    "mstep_embedding_tolerance_grad": 1e-7,
    "mstep_embedding_tolerance_change": 1e-9,
    "mstep_embedding_line_search_fn": "strong_wolfe",
    #
    "mstep_kernels_estimate": True,
    "mstep_kernels_max_iter": 20,
    "mstep_kernels_lr": 1.0,
    "mstep_kernels_tolerance_grad": 1e-7,
    "mstep_kernels_tolerance_change": 1e-9,
    "mstep_kernels_line_search_fn": "strong_wolfe",
    #
    "mstep_indpointslocs_estimate": True,
    "mstep_indpointslocs_max_iter": 20,
    "mstep_indpointslocs_lr": 1.0,
    "mstep_indpointslocs_tolerance_grad": 1e-7,
    "mstep_indpointslocs_tolerance_change": 1e-9,
    "mstep_indpointslocs_line_search_fn": "strong_wolfe",
    #
    "verbose": True,
}
```
1.3.1. Defaults#
The default optimisation parameters are shown below.
```python
n_quad = 200
prior_cov_reg_param = 1e-3
em_max_iter = 50
params_spec["optim_params"] = {
    "n_quad": n_quad,
    "prior_cov_reg_param": prior_cov_reg_param,
    "optim_method": "ecm",
    "em_max_iter": em_max_iter,
    "verbose": True,
    #
    "estep_estimate": True,
    "estep_max_iter": 20,
    "estep_lr": 1.0,
    "estep_tolerance_grad": 1e-7,
    "estep_tolerance_change": 1e-9,
    "estep_line_search_fn": "strong_wolfe",
    #
    "mstep_embedding_estimate": True,
    "mstep_embedding_max_iter": 20,
    "mstep_embedding_lr": 1.0,
    "mstep_embedding_tolerance_grad": 1e-7,
    "mstep_embedding_tolerance_change": 1e-9,
    "mstep_embedding_line_search_fn": "strong_wolfe",
    #
    "mstep_kernels_estimate": True,
    "mstep_kernels_max_iter": 20,
    "mstep_kernels_lr": 1.0,
    "mstep_kernels_tolerance_grad": 1e-7,
    "mstep_kernels_tolerance_change": 1e-9,
    "mstep_kernels_line_search_fn": "strong_wolfe",
    #
    "mstep_indpointslocs_estimate": True,
    "mstep_indpointslocs_max_iter": 20,
    "mstep_indpointslocs_lr": 1.0,
    "mstep_indpointslocs_tolerance_grad": 1e-7,
    "mstep_indpointslocs_tolerance_change": 1e-9,
    "mstep_indpointslocs_line_search_fn": "strong_wolfe",
}
```