1. Parameters and their specification#
svGPFA uses different groups of parameters. We provide a utility function
svGPFA.utils.initUtils.getParamsAndKernelsTypes() that builds them from
parameter specifications. These specifications are short descriptions on how
to build a parameter. For example, a parameter specification for the inducing
points locations can be equidistant, indicating that the inducing points
locations should be set to equidistant values between the start and end of a
trial.
A parameter specification is a nested dictionary (e.g.,
params_spec[group_name][param_name]) containing the specification of
parameter param_name in group group_name. It can be built:
1. automatically, using default values, with the utility function svGPFA.utils.initUtils.getDefaultParamsDict(),
2. manually, by setting parameter specifications in the Python code,
3. from the command line, with the utility function svGPFA.utils.initUtils.getParamsDictFromArgs(),
4. from a configuration file, with the utility function svGPFA.utils.initUtils.getParamsDictFromStringsDict().

The Colab notebooks automatically build this dictionary (option 1). This script builds the parameter specification dictionary from the command line and from this configuration file (options 3 and 4).
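For example, when building the specification manually (option 2), one starts from an empty Python dictionary and adds one entry per parameter group. The sketch below uses placeholder group contents copied from the default examples shown later in this section.

# minimal sketch of manually assembling a parameter specification;
# the group contents are placeholder values taken from the default
# examples shown later in this section
params_spec = {}

# data structure parameters (trial-common format)
params_spec["data_structure_params"] = {
    "trials_start_time": 0.0,
    "trials_end_time": 1.0,
}

# initial kernel parameters (latent-common textual format)
params_spec["kernels_params0"] = {
    "k_types": "exponentialQuadratic",
    "k_lengthscales0": 1.0,
}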
Below we describe all svGPFA parameters and their specifications. Refer to the documentation of the above utility functions for details on how to use them.
1.1. Data structure parameters#
There are two data structure parameters, trials_start_times and
trials_end_times, which are tensors of length n_trials giving the start
and end times of each trial (i.e., \(\tau_i\) in Eq. 7 of
Duncker and Sahani [DS18]).
These parameters can be specified in a trial-specific or a trial-common format. If both are specified, the trial-specific (longer) format takes precedence.
1.1.1. Trial-specific format#
Two items need to be specified:
- trials_start_times should provide a list of length n_trials, with float values indicating seconds, such that trials_start_times[i] gives the start time of the ith trial.
- trials_end_times should provide a list of length n_trials, with float values indicating seconds, such that trials_end_times[i] gives the end time of the ith trial.

Listing 1.1 adding data_structure_params in the trial-specific Python variable format to params_spec (3 trials)#

params_spec["data_structure_params"] = {
    "trials_start_times": [0.0, 0.4, 0.7],
    "trials_end_times": [0.2, 0.5, 0.9],
}
1.1.2. Trial-common format#
Two items need to be specified:
- trials_start_time should provide the start time (float value, secs) of all trials.
- trials_end_time should provide the end time (float value, secs) of all trials.

params_spec["data_structure_params"] = {
    "trials_start_time": 0.0,
    "trials_end_time": 1.0,
}
1.1.3. Defaults#
All trials start at 0.0 sec and end at 1.0 sec.
params_spec["data_structure_params"] = {
    "trials_start_time": 0.0,
    "trials_end_time": 1.0,
}
1.2. Initial values of model parameters#
Initial values for four types of model parameters need to be specified: variational parameters, embedding parameters, kernel parameters, and inducing points locations parameters.
For most parameter types, initial values can be specified in a binary format or in non-binary shorter or longer formats. In the binary format, parameters are given as PyTorch tensors. The shorter format provides the same initial value for all latents and trials, whereas the longer format gives different initial values for each latent and trial. If both the shorter and the longer formats are specified, the longer format takes precedence.
1.2.1. Variational parameters#
The variational parameters are the means (\(\mathbf{m}_k^{(r)}\), Duncker and Sahani [DS18], p.3) and covariances (\(S_k^{(r)}\), Duncker and Sahani [DS18], p.3) of the inducing points (\(\mathbf{u}_k^{(r)}\), Duncker and Sahani [DS18], p.3). The data structures for these parameters are described in the following subsections.
1.2.1.1. Python variable format#
Two items need to be specified:
- variational_mean0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of dimension (n_trials, n_indPoints[k], 1), where variational_mean0[k][r, :, 0] gives the initial variational mean for latent k and trial r.
- variational_cov0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of dimension (n_trials, n_indPoints[k], n_indPoints[k]), where variational_cov0[k][r, :, :] gives the initial variational covariance for latent k and trial r.

n_latents = 3
n_ind_points = [20, 10, 15]
var_mean0 = [torch.normal(mean=0, std=1,
                          size=(n_trials, n_ind_points[k], 1),
                          dtype=torch.double)
             for k in range(n_latents)]
diag_value = 1e-2
var_cov0 = [[] for r in range(n_latents)]
for k in range(n_latents):
    var_cov0[k] = torch.empty((n_trials, n_ind_points[k], n_ind_points[k]),
                              dtype=torch.double)
    for r in range(n_trials):
        var_cov0[k][r, :, :] = torch.eye(n_ind_points[k],
                                         dtype=torch.double) * diag_value
params_spec["variational_params0"] = {
    "variational_mean0": var_mean0,
    "variational_cov0": var_cov0,
}
1.2.1.2. Latent-trial-specific filename format#
For every latent k and every trial r, two items need to be specified:
- variational_mean0_filename_latent<k>_trial<r> should provide the filename (csv format readable by pandas read_csv function) containing the initial values of the variational mean for latent k and trial r. This file should contain a vector of size number_of_inducing_points.
- variational_cov0_filename_latent<k>_trial<r> should provide the filename (csv format readable by pandas read_csv function) containing the initial values of the variational covariance for latent k and trial r. This file should contain a matrix of size number_of_inducing_points x number_of_inducing_points.

Listing 1.5 adding variational_params0 in the latent-trial-specific filename format to params_spec (2 trials and 2 latents)#

params_spec["variational_params0"] = {
    "variational_mean0_filename_latent0_trial0": "../data/uniform_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent0_trial0": "../data/identity_scaled1e-2_09x09.csv",
    "variational_mean0_filename_latent0_trial1": "../data/gaussian_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent0_trial1": "../data/identity_scaled1e-4_09x09.csv",
    "variational_mean0_filename_latent1_trial0": "../data/uniform_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent1_trial0": "../data/identity_scaled1e-2_09x09.csv",
    "variational_mean0_filename_latent1_trial1": "../data/gaussian_0.00_1.00_len09.csv",
    "variational_cov0_filename_latent1_trial1": "../data/identity_scaled1e-4_09x09.csv",
}
1.2.1.3. Latent-trial-common filename format#
Two items need to be specified:
- variational_means0_filename should provide the filename (csv format readable by pandas read_csv function) containing the initial values of the variational mean for all latents and trials. This file should contain a vector of size number_of_inducing_points.
- variational_covs0_filename should provide the filename (csv format readable by pandas read_csv function) containing the initial values of the variational covariance for all latents and trials. This file should contain a matrix of size number_of_inducing_points x number_of_inducing_points.

params_spec["variational_params0"] = {
    "variational_means0_filename": "../data/uniform_0.00_1.00_len09.csv",
    "variational_covs0_filename": "../data/identity_scaled1e-2_09x09.csv",
}
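The CSV files referenced by these filename items can be prepared with standard tools. The sketch below writes a headerless comma-separated vector and matrix with numpy; the file names are hypothetical, and the exact CSV conventions expected by svGPFA (e.g., headers) are assumed here to match plain headerless files such as those shipped in the ../data directory above.

import numpy as np

n_ind_points = 9
diag_value = 1e-2

# hypothetical variational mean file: a vector of length n_ind_points
mean0 = np.random.uniform(low=0.0, high=1.0, size=n_ind_points)
np.savetxt("variational_mean0.csv", mean0, delimiter=",")

# hypothetical variational covariance file: an
# n_ind_points x n_ind_points scaled identity matrix
cov0 = np.eye(n_ind_points) * diag_value
np.savetxt("variational_cov0.csv", cov0, delimiter=",")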
1.2.1.4. Constant value format#
This initialisation option sets the same variational mean and covariance across all latents and trials. The common variational mean has all elements equal to a constant value, and the common variational covariance is a scaled identity matrix.
Two items need to be specified:
- variational_mean0_constant_value should provide a float value giving the constant value of all elements of the common variational mean.
- variational_cov0_diag_value should provide a float value giving the diagonal value of the common variational covariance.

params_spec["variational_params0"] = {
    "variational_mean0_constant_value": 0.0,
    "variational_cov0_diag_value": 0.01,
}
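In terms of the tensors of the Python variable format (Section 1.2.1.1), this specification corresponds to the sketch below, where n_trials and the common number of inducing points are assumed values used only for illustration.

import torch

n_trials = 3
n_ind_points = 10  # assumed common number of inducing points
mean_constant_value = 0.0
cov_diag_value = 0.01

# the common variational mean and covariance that the constant value
# format describes, shared across all latents and trials
variational_mean0 = torch.full((n_trials, n_ind_points, 1),
                               mean_constant_value, dtype=torch.double)
variational_cov0 = (torch.eye(n_ind_points, dtype=torch.double) *
                    cov_diag_value).repeat(n_trials, 1, 1)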
1.2.1.5. Defaults#
The default variational mean and covariance use the constant value format: the variational mean has constant value 0.0 and the variational covariance has constant diagonal value 0.01.
params_spec["variational_params0"] = {
    "variational_mean0_constant_value": 0.0,
    "variational_cov0_diag_value": 0.01,
}
1.2.2. Embedding parameters#
The embedding parameters are the loading matrix (\(C\), Duncker and Sahani [DS18], Eq. 1, middle) and the offset vector (\(\mathbf{d}\), Duncker and Sahani [DS18], Eq. 1, middle). The data structures for these parameters are described in the following subsections.
1.2.2.1. Python variable format#
Two items need to be specified:
- c0 should be a torch.DoubleTensor of size (n_neurons, n_latents)
- d0 should be a torch.DoubleTensor of size (n_neurons, 1)

n_neurons = 100
n_latents = 3
params_spec["embedding_params0"] = {
    "c0": torch.normal(mean=0.0, std=1.0, size=(n_neurons, n_latents),
                       dtype=torch.double),
    "d0": torch.normal(mean=0.0, std=1.0, size=(n_neurons, 1),
                       dtype=torch.double),
}
1.2.2.2. Filename format#
Two items need to be specified:
- c0_filename gives the filename (csv format readable by pandas read_csv function) containing the values of the loading matrix C,
- d0_filename gives the filename (csv format readable by pandas read_csv function) containing the values of the offset vector d.

params_spec["embedding_params0"] = {
    "c0_filename": "../data/C_constant_1.00constant_100neurons_02latents.csv",
    "d0_filename": "../data/d_constant_0.00constant_100neurons.csv",
}
1.2.2.3. Random format#
Eight items can be specified (the two random seed items are optional):
- c0_distribution string value giving the name of the distribution of the loading matrix C (e.g., Normal).
- c0_loc float value giving the location of the distribution of the loading matrix C (e.g., 0.0).
- c0_scale float value giving the scale of the distribution of the loading matrix C (e.g., 1.0).
- c0_random_seed optional integer value giving the random seed to be set prior to generating the random loading matrix C. This value can be specified for replicability. If not given, the random seed is not changed prior to generating C.
- d0_distribution string value giving the name of the distribution of the offset vector d (e.g., Normal).
- d0_loc float value giving the location of the distribution of the offset vector d (e.g., 0.3).
- d0_scale float value giving the scale of the distribution of the offset vector d (e.g., 1.0).
- d0_random_seed optional integer value giving the random seed to be set prior to generating the random offset vector d. This value can be specified for replicability. If not given, the random seed is not changed prior to generating d.

params_spec["embedding_params0"] = {
    "c0_distribution": "Normal",
    "c0_loc": 0.0,
    "c0_scale": 1.0,
    "c0_random_seed": 102030,
    "d0_distribution": "Normal",
    "d0_loc": 0.0,
    "d0_scale": 1.0,
    "d0_random_seed": 203040,
}
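For illustration only (this is a sketch of what the specification above describes, not the code svGPFA runs internally; n_neurons and n_latents are assumed values), a Normal specification with the loc, scale and seed items shown in the listing corresponds to draws such as:

import torch

n_neurons = 100
n_latents = 3

torch.manual_seed(102030)  # c0_random_seed
c0 = (torch.distributions.Normal(loc=0.0, scale=1.0)   # c0_loc, c0_scale
      .sample((n_neurons, n_latents)).double())

torch.manual_seed(203040)  # d0_random_seed
d0 = (torch.distributions.Normal(loc=0.0, scale=1.0)   # d0_loc, d0_scale
      .sample((n_neurons, 1)).double())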
1.2.2.4. Defaults#
By default, the loading matrix c0 and the offset vector d0 are random matrices/vectors drawn from a zero-mean, unit-scale (standard) normal distribution.
params_spec["embedding_params0"] = {
    "c0_distribution": "Normal",
    "c0_loc": 0.0,
    "c0_scale": 1.0,
    "d0_distribution": "Normal",
    "d0_loc": 0.0,
    "d0_scale": 1.0,
}
1.2.3. Kernel parameters#
The kernel parameters of latent k are those of its Gaussian process covariance function (\(\kappa_k(\cdot,\cdot)\), Duncker and Sahani [DS18], p. 2). The data structures for these parameters are described in the following subsections.
1.2.3.1. Python variable format#
Two items need to be specified:
- k_types should be a list of size n_latents. The kth element of this list should be a string with the type of kernel for the kth latent (e.g., k_types[k] = "exponentialQuadratic").
- k_params0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor containing the parameters of the kth kernel (e.g., k_params0[k] = torch.DoubleTensor([3.2])).

expQuadK1_lengthscale = 2.9
expQuadK2_lengthscale = 0.5
periodK1_lengthscale = 3.1
periodK1_period = 1.2
params_spec["kernels_params0"] = {
    "k_types": ["exponentialQuadratic", "exponentialQuadratic", "periodic"],
    "k_params0": [torch.DoubleTensor([expQuadK1_lengthscale]),
                  torch.DoubleTensor([expQuadK2_lengthscale]),
                  torch.DoubleTensor([periodK1_lengthscale, periodK1_period]),
                  ],
}
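For orientation, a standard parameterisation of these two kernels is given below. The precise constants are an assumption here (svGPFA's kernel classes define the exact form), but the formulas show the role of the lengthscale \(\ell\) and period \(p\) held by k_params0:

\[\kappa_{\text{expQuad}}(t, t') = \exp\!\left(-\frac{(t - t')^2}{2\ell^2}\right), \qquad
\kappa_{\text{periodic}}(t, t') = \exp\!\left(-\frac{2\sin^2\!\big(\pi (t - t')/p\big)}{\ell^2}\right).\]

Accordingly, k_params0[k] contains \([\ell]\) for an exponential quadratic kernel and \([\ell, p]\) for a periodic kernel, as in the listing above.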
1.2.3.2. Latent-specific textual format#
For each latent k, item k_type_latent<k> needs to be specified, giving the
name of the kernel for latent k. The other required items depend on
the value of item k_type_latent<k>. For example, for
k_type_latent<k>=exponentialQuadratic, item
k_lengthscale0_latent<k> should specify the lengthscale parameter, and for
k_type_latent<k>=periodic items k_lengthscale0_latent<k> and
k_period0_latent<k> should specify the lengthscale and period parameter of
the periodic kernel, respectively.
params_spec["kernels_params0"] = {
    "k_type_latent0": "exponentialQuadratic",
    "k_lengthscale0_latent0": 2.0,
    "k_type_latent1": "periodic",
    "k_lengthscale0_latent1": 1.0,
    "k_period0_latent1": 0.75,
}
1.2.3.3. Latent-common textual format#
The shorter format requires
item k_types, giving the name of the kernel to be used for all latent variables.
Other required items depend on the value of
item k_types. For example, for k_types=exponentialQuadratic,
item k_lengthscales0 should specify the lengthscale parameter, and for
k_types=periodic items k_lengthscales0 and k_periods0 should
specify the lengthscale and period parameter of the periodic kernel,
respectively.
params_spec["kernels_params0"] = {
    "k_types": "exponentialQuadratic",
    "k_lengthscales0": 1.0,
}
1.2.3.4. Defaults#
For all latents, the default kernel is an exponential quadratic kernel with lengthscale 1.0.
params_spec["kernels_params0"] = {
    "k_types": "exponentialQuadratic",
    "k_lengthscales0": 1.0,
}
1.2.4. Inducing points locations parameters#
The inducing points locations, or input locations, are the points (\(\mathbf{z}_k^{(r)}\), Duncker and Sahani [DS18], p.3) at which the Gaussian processes are evaluated to obtain the inducing points. The data structures for these parameters are described in the following subsections.
1.2.4.1. Python variable format#
One item needs to be specified:
- ind_points_locs0 should be a list of size n_latents. The kth element of this list should be a torch.DoubleTensor of size (n_trials, n_indPoints[k], 1), where ind_points_locs0[k][r, :, 0] gives the initial inducing points locations for latent k and trial r.

Listing 1.17 adding ind_points_locs_params0 in Python variable format with uniformly distributed inducing points locations to params_spec#

n_latents = 3
n_ind_points = (10, 20, 15)
n_trials = 50
trials_start_time = 0.0
trials_end_time = 7.0
params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0": [trials_start_time +
                         (trials_end_time - trials_start_time) *
                         torch.rand(n_trials, n_ind_points[k], 1,
                                    dtype=torch.double)
                         for k in range(n_latents)]
}
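Equidistant initial locations can be specified in the same Python variable format, for example with torch.linspace; the sketch below reuses the sizes and trial times assumed in Listing 1.17.

import torch

params_spec = {}  # or the specification dictionary built so far
n_latents = 3
n_ind_points = (10, 20, 15)
n_trials = 50
trials_start_time = 0.0
trials_end_time = 7.0

# n_ind_points[k] equidistant locations between trial start and end,
# repeated for every trial
params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0": [
        torch.linspace(trials_start_time, trials_end_time, n_ind_points[k],
                       dtype=torch.double)
        .reshape(1, n_ind_points[k], 1)
        .repeat(n_trials, 1, 1)
        for k in range(n_latents)
    ]
}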
1.2.4.2. Latent-trial-specific filename format#
For each latent k and trial r one item needs to be specified:
- ind_points_locs0_latent<k>_trial<r>_filename giving the name of the file (csv format readable by pandas read_csv function) containing the initial inducing points locations for latent k and trial r.

Listing 1.18 adding ind_points_locs_params0 in the latent-trial-specific filename format to params_spec (2 latents, 2 trials)#

params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0_latent0_trial0_filename": "ind_points_locs0_latent0_trial0.csv",
    "ind_points_locs0_latent0_trial1_filename": "ind_points_locs0_latent0_trial1.csv",
    "ind_points_locs0_latent1_trial0_filename": "ind_points_locs0_latent1_trial0.csv",
    "ind_points_locs0_latent1_trial1_filename": "ind_points_locs0_latent1_trial1.csv",
}
1.2.4.3. Latent-trial-common filename format#
This shorter format requires the specification of the item
ind_points_locs0_filename giving the name of the file (csv format readable by
pandas read_csv function) containing the initial inducing points locations
for all latents and trials.
Listing 1.19 adding ind_points_locs_params0 in the latent-trial-common filename format to params_spec#

params_spec["ind_points_locs_params0"] = {
    "ind_points_locs0_filename": "ind_points_locs0.csv",
}
1.2.4.4. Layout format#
The layout format requires the specification of the number of inducing points
in the item n_ind_points or in the item common_n_ind_points. Item
n_ind_points is a list of length n_latents, such that
n_ind_points[k] gives the number of inducing points of latent k. Item
common_n_ind_points is an integer that gives the number of inducing points
common to all latents.
The layout of the initial inducing points locations is given by the item
ind_points_locs0_layout. If ind_points_locs0_layout = equidistant the
initial locations of the inducing points are equidistant between the trial
start and trial end. If ind_points_locs0_layout = uniform the initial
inducing points locations are drawn uniformly at random between the start and
end of the trial.
n_ind_points = (10, 20, 15)
params_spec["ind_points_locs_params0"] = {
    "n_ind_points": n_ind_points,
    "ind_points_locs0_layout": "equidistant",
}
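The layout format can also combine the common number of inducing points with the uniform layout; for example (the common_n_ind_points value is arbitrary, for illustration):

params_spec["ind_points_locs_params0"] = {
    "common_n_ind_points": 12,
    "ind_points_locs0_layout": "uniform",
}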
1.2.4.5. Defaults#
The default initial inducing points locations for each latent and trial are equidistant points between the start and end of the trial, with a common number of inducing points (common_n_ind_points = 10) for all latents.
common_n_ind_points = 10
params_spec["ind_points_locs_params0"] = {
    "common_n_ind_points": common_n_ind_points,
    "ind_points_locs0_layout": "equidistant",
}
1.3. Optimisation parameters#
Parameter values that control the optimisation should be specified
in the section [optim_params].
- optim_method specifies the method used for parameter optimisation.
  - If optim_method = ECM, the Expectation Conditional Maximisation method is used (McLachlan and Krishnan [MK08], section 5.2). Here the M-step is broken into three conditional maximisation steps: maximisation of the lower bound wrt the embedding parameters (mstep-embedding), wrt the kernels parameters (mstep-kernels) and wrt the inducing points locations (mstep-indPointsLocs). Thus, one ECM iteration comprises one E-step (i.e., maximisation of the lower bound wrt the variational parameters) followed by the three previous M-step conditional maximisations.
  - If optim_method = mECM, the multicycle ECM method is used (McLachlan and Krishnan [MK08], section 5.3). Here one E-step maximisation is performed before each of the M-step conditional maximisations. Thus, one mECM iteration comprises estep, mstep-embedding, estep, mstep-kernels, estep, mstep-indPointsLocs.
- em_max_iter integer value specifying the maximum number of EM iterations.
- verbose boolean value indicating whether the optimisation should be verbose or silent.
For each <step> in {estep,mstep_embedding,mstep_kernels,mstep_indPointsLocs}
section [optim_params] should contain items:
- <step>_estimate boolean value indicating whether <step> should be estimated or not.
- <step>_max_iter integer value indicating the maximum number of iterations used by torch.optim.LBFGS for the optimisation of the <step> within one EM iteration.
- <step>_lr float value indicating the learning rate used by torch.optim.LBFGS for the optimisation of the <step> within one EM iteration.
- <step>_tolerance_grad float value indicating the termination tolerance on first-order optimality used by torch.optim.LBFGS for the optimisation of the <step> within one EM iteration.
- <step>_tolerance_change float value indicating the termination tolerance on function value/parameter changes used by torch.optim.LBFGS for the optimisation of the <step> within one EM iteration.
- <step>_line_search_fn string value indicating the line search method used by torch.optim.LBFGS. If <step>_line_search_fn = strong_wolfe, line search is performed using the strong Wolfe method. If <step>_line_search_fn = None, line search is not used.

params_spec["optim_params"] = {
    "n_quad": 200,
    "prior_cov_reg_param": 1e-5,
    #
    "optim_method": "ECM",
    "em_max_iter": 200,
    #
    "estep_estimate": True,
    "estep_max_iter": 20,
    "estep_lr": 1.0,
    "estep_tolerance_grad": 1e-7,
    "estep_tolerance_change": 1e-9,
    "estep_line_search_fn": "strong_wolfe",
    #
    "mstep_embedding_estimate": True,
    "mstep_embedding_max_iter": 20,
    "mstep_embedding_lr": 1.0,
    "mstep_embedding_tolerance_grad": 1e-7,
    "mstep_embedding_tolerance_change": 1e-9,
    "mstep_embedding_line_search_fn": "strong_wolfe",
    #
    "mstep_kernels_estimate": True,
    "mstep_kernels_max_iter": 20,
    "mstep_kernels_lr": 1.0,
    "mstep_kernels_tolerance_grad": 1e-7,
    "mstep_kernels_tolerance_change": 1e-9,
    "mstep_kernels_line_search_fn": "strong_wolfe",
    #
    "mstep_indpointslocs_estimate": True,
    "mstep_indpointslocs_max_iter": 20,
    "mstep_indpointslocs_lr": 1.0,
    "mstep_indpointslocs_tolerance_grad": 1e-7,
    "mstep_indpointslocs_tolerance_change": 1e-9,
    "mstep_indpointslocs_line_search_fn": "strong_wolfe",
    #
    "verbose": True,
}
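The per-step items above mirror the constructor arguments of torch.optim.LBFGS. As a rough illustration of the correspondence (a sketch only; some_parameters is a hypothetical list of tensors, and this is not how svGPFA wires up its optimisers internally):

import torch

# hypothetical parameters to be optimised in one step of one EM iteration
some_parameters = [torch.zeros(5, dtype=torch.double, requires_grad=True)]

# e.g. for <step> = estep, using the values from the listing above
optimizer = torch.optim.LBFGS(
    some_parameters,
    max_iter=20,                     # estep_max_iter
    lr=1.0,                          # estep_lr
    tolerance_grad=1e-7,             # estep_tolerance_grad
    tolerance_change=1e-9,           # estep_tolerance_change
    line_search_fn="strong_wolfe",   # estep_line_search_fn
)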
1.3.1. Defaults#
The default optimisation parameters are shown below.
n_quad = 200
prior_cov_reg_param = 1e-3
em_max_iter = 50
params_spec["optim_params"] = {
    "n_quad": n_quad,
    "prior_cov_reg_param": prior_cov_reg_param,
    "optim_method": "ecm",
    "em_max_iter": em_max_iter,
    "verbose": True,
    #
    "estep_estimate": True,
    "estep_max_iter": 20,
    "estep_lr": 1.0,
    "estep_tolerance_grad": 1e-7,
    "estep_tolerance_change": 1e-9,
    "estep_line_search_fn": "strong_wolfe",
    #
    "mstep_embedding_estimate": True,
    "mstep_embedding_max_iter": 20,
    "mstep_embedding_lr": 1.0,
    "mstep_embedding_tolerance_grad": 1e-7,
    "mstep_embedding_tolerance_change": 1e-9,
    "mstep_embedding_line_search_fn": "strong_wolfe",
    #
    "mstep_kernels_estimate": True,
    "mstep_kernels_max_iter": 20,
    "mstep_kernels_lr": 1.0,
    "mstep_kernels_tolerance_grad": 1e-7,
    "mstep_kernels_tolerance_change": 1e-9,
    "mstep_kernels_line_search_fn": "strong_wolfe",
    #
    "mstep_indpointslocs_estimate": True,
    "mstep_indpointslocs_max_iter": 20,
    "mstep_indpointslocs_lr": 1.0,
    "mstep_indpointslocs_tolerance_grad": 1e-7,
    "mstep_indpointslocs_tolerance_change": 1e-9,
    "mstep_indpointslocs_line_search_fn": "strong_wolfe"
}