Gaussian Processes Regression

Modified on Thu, 17 Aug, 2023 at 9:27 AM

Description

Gaussian Processes is a data-driven modelling technique that fits smooth functions to the data given to the model. This allows Gaussian Processes to be used for numerical regression, i.e. to predict outputs that are a function of the input data.

Application

To make predictions, Gaussian Processes computes the similarity between training and test inputs. This similarity is evaluated by the Kernel, and there are multiple types of Kernel, such as RBF and Matérn. Each test point is compared against the points used to train the model, and based on these similarities the model outputs the most probable result (the predictive mean) together with its uncertainty. Test points that are more similar to the training inputs produce predictions with smaller uncertainty.
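The prediction step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the platform's implementation: it assumes an RBF kernel with fixed hyperparameters (variance 1, length scale 1), a zero prior mean and a tiny diagonal jitter for numerical stability. The names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, variance=1.0, length_scale=1.0):
    """Similarity between every row of `a` and every row of `b` (RBF kernel)."""
    # Squared Euclidean distance between each pair of points.
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))


def gp_predict(x_train, y_train, x_test, jitter=1e-6):
    """Return the predictive mean and standard deviation at x_test."""
    # Train-train similarity, with a small jitter on the diagonal for numerical stability.
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)  # test-train similarity
    k_ss = rbf_kernel(x_test, x_test)   # test-test similarity

    # Use a Cholesky factorisation instead of inverting k_tt directly.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha

    # Predictive variance: prior variance minus what the training data explains.
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.sin(x_train).ravel()
x_test = np.array([[1.5], [8.0]])  # one point near the data, one far away
mean, std = gp_predict(x_train, y_train, x_test)
print(mean, std)
```

The test point at x = 1.5 lies between training inputs, so its standard deviation is small. The point at x = 8.0 is far from every training input, so its prediction falls back to the prior mean (zero) with a standard deviation close to the prior value of 1.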

Gaussian Processes are one of the most commonly used data-driven techniques because of their:

Flexibility: Gaussian Processes can model nonlinear relationships between input and output variables without explicitly specifying a functional form.
Efficiency with small/medium datasets: Gaussian Processes can be trained effectively on small to medium-sized datasets, unlike neural networks, which typically need large amounts of training data. This makes Gaussian Processes models particularly suitable for engineering applications, where the amount of available data is often limited by the cost and time needed to produce it.
Uncertainty quantification: Gaussian Processes provide an uncertainty estimate for each prediction, as a direct result of how the algorithm is designed.

On the other hand, some downsides of the Gaussian Processes are:

Inefficiency with large datasets: exact Gaussian Processes training scales cubically in time (and quadratically in memory) with the number of training points, so it becomes too computationally expensive for large datasets.
Kernel selection: as with most data-driven models, the choice of hyperparameters has a significant impact on the model’s performance. Additionally, there is no universally “best” kernel; the appropriate choice depends on the structure of the dataset.
Limited scalability: Gaussian Processes may not scale well to high-dimensional problems and might not be suitable for modelling non-smooth response surfaces.
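To put “too computationally expensive” in perspective: exact Gaussian Processes training stores a dense n × n kernel matrix (n = number of training points) and factorises it, which costs memory proportional to n² and time proportional to n³. The back-of-the-envelope calculation below assumes 64-bit floats and a single dense matrix; real implementations need extra working memory, so treat the figures as lower bounds.

```python
def kernel_matrix_gib(n_points, bytes_per_entry=8):
    """Memory for one dense n x n kernel matrix of float64 values, in GiB."""
    return n_points ** 2 * bytes_per_entry / 2 ** 30


def relative_training_cost(n_points, n_reference):
    """Training cost (dominated by an O(n^3) factorisation) relative to a reference size."""
    return (n_points / n_reference) ** 3


for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: kernel matrix {kernel_matrix_gib(n):8.2f} GiB, "
          f"training cost x{relative_training_cost(n, 1_000):,.0f} vs n=1,000")
```

Going from 1,000 to 100,000 training points multiplies the kernel matrix memory by 10,000 (to roughly 75 GiB) and the factorisation cost by a million.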

How to use

Under Data input, choose a tabular dataset to build the model on
Under Inputs, select the input features you would like to use for the model
Under Outputs, select the output features you would like to predict with this model
Select the Kernels you want to use for the model. You can select a single kernel or multiple kernels (see below for more information on kernels)
Specify a Name for the model
Click Apply to train the model

Kernels

The Kernel defines how the similarity between inputs is assessed. Five types of Kernel are available in the platform, and Kernels can also be combined, as shown in the “More on this step” section below. As mentioned above, the “best” Kernel depends on the dataset being modelled: try adding several Gaussian Processes Regression steps with kernels of different smoothness, then evaluate the trained models to see which performs best.
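One way to run this kind of comparison outside the platform is to fit the same data with each candidate kernel and compare errors on a held-out validation set. The sketch below assumes synthetic data, fixed hyperparameters (σ = 1, ℓ = 1) and a fixed noise term in place of the hyperparameter optimisation the platform performs; `KERNELS` and `gp_mean` are hypothetical names used only for illustration.

```python
import numpy as np


def pairwise_distance(a, b):
    """Euclidean distance between every row of `a` and every row of `b`."""
    return np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))


# Candidate kernels as functions of distance r (sigma = 1, length scale = 1 for brevity).
KERNELS = {
    "RBF": lambda r: np.exp(-0.5 * r ** 2),
    "Matern 1/2": lambda r: np.exp(-r),
    "Matern 3/2": lambda r: (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r),
    "Matern 5/2": lambda r: (1 + np.sqrt(5) * r + 5 * r ** 2 / 3) * np.exp(-np.sqrt(5) * r),
}


def gp_mean(kernel, x_train, y_train, x_test, noise=1e-2):
    """Posterior mean of a zero-mean GP with the given kernel and a fixed noise term."""
    k_tt = kernel(pairwise_distance(x_train, x_train)) + noise * np.eye(len(x_train))
    k_st = kernel(pairwise_distance(x_test, x_train))
    return k_st @ np.linalg.solve(k_tt, y_train)


# Synthetic noisy data, split into a fitting set and a validation set.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(60, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=60)
x_fit, y_fit, x_val, y_val = x[:45], y[:45], x[45:], y[45:]

scores = {
    name: float(np.sqrt(np.mean((gp_mean(k, x_fit, y_fit, x_val) - y_val) ** 2)))
    for name, k in KERNELS.items()
}
for name, rmse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:<11} validation RMSE = {rmse:.3f}")
```

The kernel with the lowest validation error is the best candidate for this particular dataset; on a different dataset the ranking can change.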

Advanced Options

The Advanced Options of the Gaussian Processes model currently only allow you to enable or disable Cross Validation.
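This page does not describe how the Cross Validation option splits the data. As an assumption, the sketch below shows the common k-fold scheme: the data is shuffled and split into k folds, and each fold is held out once for validation while the model is trained on the remaining folds. `k_fold_indices` is a hypothetical helper, not a platform function.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random but reproducible
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(val_idx))
```

Every sample is used for validation exactly once, and the cross-validated score is typically the average of the k validation scores.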

Examples

See tutorial 2.2 on the Getting Started page of the Monolith platform for a walkthrough of training a Gaussian Processes Regression model.

More on this step

Kernels supported in the platform

Radial Basis Function (RBF)

σ is the output variance
ℓ is the length scale, which dictates the wiggliness of the fitting function (i.e., how quickly the output can vary as the input changes; a smaller ℓ gives a wigglier function)
‖ • ‖² is the squared Euclidean distance between input values
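The equation these definitions refer to is not reproduced on this page. A standard form of the RBF kernel that is consistent with them, taking σ as the output variance exactly as defined above, is:

```latex
k_{\mathrm{RBF}}(x_i, x_j) = \sigma \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\ell^2} \right)
```

Some references write the output variance as σ² instead of σ, and the factor of 2 in the denominator is a common convention (used, for example, by scikit-learn’s RBF kernel); the platform’s exact parametrisation is an assumption here.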

White Kernel

σ_y is the noise variance
I_n is the n × n identity matrix.

This kernel adds white noise to the model. If your data contains a significant level of noise, adding this kernel could improve the model fit.
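Written out with the symbols above (δ_ij is the Kronecker delta, equal to 1 when i = j and 0 otherwise, and I_n is the n × n identity matrix for n training points), the White kernel only contributes to the diagonal of the kernel matrix:

```latex
k_{\mathrm{White}}(x_i, x_j) = \sigma_y \, \delta_{ij}, \qquad K_{\mathrm{White}} = \sigma_y I_n
```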

Matérn Kernel

The Matérn kernel is a generalisation of the RBF kernel: as ν → ∞, the Matérn kernel becomes equivalent to the RBF.

σ is the output variance
ν dictates the smoothness of the fitting function (the larger the value of ν, the smoother the fitting function becomes)
ℓ is the length scale, which dictates the wiggliness of the fitting function
‖x_i − x_j‖ is the Euclidean distance between inputs x_i and x_j
K_ν( • ) is a modified Bessel function of the second kind
Γ( • ) is the gamma function.
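The general Matérn kernel in its standard form, written with the symbols listed above and with σ again taken as the output variance, is shown below. At x_i = x_j the expression is understood as its limit, which equals σ.

```latex
k_{\nu}(x_i, x_j) = \sigma \, \frac{2^{1-\nu}}{\Gamma(\nu)}
  \left( \frac{\sqrt{2\nu}\, \lVert x_i - x_j \rVert}{\ell} \right)^{\nu}
  K_{\nu}\!\left( \frac{\sqrt{2\nu}\, \lVert x_i - x_j \rVert}{\ell} \right)
```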

Different values of ν give different kernel functions. The platform offers Matérn ¹⁄₂, ³⁄₂, and ⁵⁄₂; higher ν values are more appropriate for smoother functions. The equations for all three options are shown below.
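For these half-integer values of ν the Bessel-function form reduces to a closed form. The standard expressions for the three options, with r = ‖x_i − x_j‖ and σ as the output variance, are:

```latex
\begin{aligned}
k_{1/2}(x_i, x_j) &= \sigma \exp\left( -\frac{r}{\ell} \right) \\
k_{3/2}(x_i, x_j) &= \sigma \left( 1 + \frac{\sqrt{3}\, r}{\ell} \right) \exp\left( -\frac{\sqrt{3}\, r}{\ell} \right) \\
k_{5/2}(x_i, x_j) &= \sigma \left( 1 + \frac{\sqrt{5}\, r}{\ell} + \frac{5 r^2}{3 \ell^2} \right) \exp\left( -\frac{\sqrt{5}\, r}{\ell} \right)
\end{aligned}
```

Matérn ¹⁄₂ (also known as the exponential kernel) produces the roughest functions, and Matérn ⁵⁄₂ is the closest of the three to the RBF kernel.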

Combining kernels

Multiple Kernel functions can be combined: in the platform, selecting multiple Kernels produces a final Kernel that is the sum of all the selected functions. For instance, in the example below we combine the RBF and White kernels:
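With the symbols used in the kernel definitions (σ output variance, ℓ length scale, σ_y noise variance), the RBF + White combination is k(x_i, x_j) = σ exp(−‖x_i − x_j‖² / (2ℓ²)) + σ_y δ_ij, where δ_ij is 1 when i = j and 0 otherwise. The NumPy sketch below builds that combined matrix for a set of training inputs. `rbf_plus_white` is a hypothetical helper, not a platform function, and the RBF parametrisation (factor of 2 in the denominator) is an assumption.

```python
import numpy as np


def rbf_plus_white(x, variance=1.0, length_scale=1.0, noise_variance=1.0):
    """Kernel matrix of RBF + White for the rows of x (training inputs)."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k_rbf = variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))
    k_white = noise_variance * np.eye(len(x))  # white noise only affects the diagonal
    return k_rbf + k_white  # combining kernels = adding their matrices


x = np.array([[0.0], [1.0], [2.0]])
print(rbf_plus_white(x, noise_variance=0.1))
```

The off-diagonal entries are identical to the RBF kernel alone; only the diagonal grows by σ_y, which is how the White kernel lets the model treat part of the variation in the data as noise.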

Default hyperparameter values

σ, σ_y, and ℓ are kernel hyperparameters. During Gaussian Processes Regression training, σ_y and ℓ are automatically optimised, while σ is kept constant. For each optimised hyperparameter, training starts from an initial value and searches for the optimal value within the bounds listed in the table below.

Hyperparameter	Initial Value	Bounds
σ	1	n/a*
σ_y	1	[0.01, 10]
ℓ	1	[0.1, 10]

*σ is kept constant at its initial value (i.e., it is not optimised).
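The optimisation above can be sketched as maximising a fitting criterion over ℓ and σ_y within the bounds in the table, starting from the initial values, with σ held constant at 1. A common criterion is the GP log marginal likelihood; whether the platform uses it, and which optimiser it runs, are assumptions here. This sketch substitutes a simple log-spaced grid search for a gradient-based optimiser and assumes an RBF + White kernel. All function names are hypothetical.

```python
import numpy as np


def log_marginal_likelihood(x, y, length_scale, noise_variance, variance=1.0):
    """Log marginal likelihood of a zero-mean GP with an RBF + White kernel."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = variance * np.exp(-sq_dist / (2.0 * length_scale ** 2)) + noise_variance * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi), with log|K| taken from the Cholesky factor.
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))


def fit_hyperparameters(x, y, ell_bounds=(0.1, 10.0), noise_bounds=(0.01, 10.0), n_grid=30):
    """Pick (length_scale, noise_variance) within the bounds that maximise the likelihood.

    A log-spaced grid search stands in for a gradient-based optimiser;
    the output variance sigma is kept constant at 1.
    """
    best = (1.0, 1.0)  # initial values for length scale and noise variance
    best_score = log_marginal_likelihood(x, y, *best)
    for ell in np.geomspace(*ell_bounds, n_grid):
        for noise in np.geomspace(*noise_bounds, n_grid):
            score = log_marginal_likelihood(x, y, ell, noise)
            if score > best_score:
                best, best_score = (float(ell), float(noise)), score
    return best


rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(40, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=40)
print(fit_hyperparameters(x, y))
```

Bounding the search keeps the optimiser away from degenerate solutions, such as a tiny length scale that interpolates the noise or a huge noise variance that explains everything as noise.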