LLMpedia
The first transparent, open encyclopedia generated by LLMs

Radial basis function kernel

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: SVR Hop 4
Expansion Funnel Raw 67 → Dedup 0 → NER 0 → Enqueued 0
Radial basis function kernel
Name: Radial basis function kernel
Type: Stationary kernel
Parameters: Length scale (γ or σ)
Domain: Euclidean space
Range: (0, 1]
Notation: K(x, y)

In machine learning and statistics, the radial basis function kernel is a popular kernel function used in various kernel methods. It is a stationary kernel that depends only on the distance between its input points, making it invariant to translations in the input space. Its widespread use is primarily due to its effectiveness in support vector machines and Gaussian process regression.

Definition and mathematical form

The radial basis function kernel, often called the Gaussian kernel, is defined for two input vectors x and y in a Euclidean space. Its most common form is given by the equation K(x, y) = exp(-γ ||x - y||²), where γ is a positive parameter controlling the kernel's width and ||·|| denotes the Euclidean norm. An equivalent parameterization uses a length scale parameter σ, expressed as K(x, y) = exp(-||x - y||² / (2σ²)), establishing a direct link to the probability density function of the normal distribution. This formulation ensures the kernel's value lies in the interval (0, 1], attaining its maximum of 1 exactly when the inputs are identical and approaching 0 as they move apart. The kernel is a specific instance of a radial basis function and is central to the theory of reproducing kernel Hilbert space.
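The two parameterizations above can be checked numerically. A minimal NumPy sketch (the function name is illustrative, not a library API):

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])

# The two forms coincide when gamma = 1 / (2 * sigma^2).
sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)
k_gamma = rbf_kernel(x, y, gamma)
k_sigma = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
assert np.isclose(k_gamma, k_sigma)

# K(x, x) = 1 for identical inputs; values decay toward 0 with distance.
assert rbf_kernel(x, x, gamma) == 1.0
```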

Properties and characteristics

A key property of the radial basis function kernel is its positive definiteness, which guarantees its use in algorithms like the kernel trick for support vector machines. It is a stationary kernel, meaning its value depends solely on the difference x - y, not the absolute positions of the inputs, a concept related to Bochner's theorem in harmonic analysis. The kernel induces an infinite-dimensional feature space, allowing it to model highly complex, non-linear relationships. Its smoothness is controlled by its parameter, with smaller length scales leading to more rapidly varying functions, a behavior studied in the context of Mercer's theorem. The kernel is also isotropic, treating all directions in the input space equally, unlike anisotropic variants based on a Mahalanobis distance.
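Positive definiteness can be verified empirically: the Gram matrix of any finite point set has non-negative eigenvalues. A small NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random points in R^3
gamma = 0.5

# Pairwise squared Euclidean distances, then the RBF Gram matrix.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq_dists)

# Positive (semi-)definiteness: all eigenvalues of the symmetric
# Gram matrix are non-negative, up to numerical round-off.
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-8
```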

Applications in machine learning

The radial basis function kernel is a cornerstone of kernel methods, most famously in support vector machines for classification and regression analysis, as developed by researchers like Vladimir Vapnik. It is the default or common choice in libsvm and scikit-learn for tackling non-linear problems. In Gaussian process regression, it serves as a prevalent covariance function, modeling smooth functions for applications in geostatistics (kriging) and Bayesian optimization. The kernel is also employed in kernel principal component analysis for non-linear dimensionality reduction and forms the basis of the radial basis function network, a type of artificial neural network. Its use extends to computer vision tasks within frameworks like OpenCV.
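The kind of non-linear problem this addresses can be illustrated with scikit-learn's SVC (a sketch on synthetic data; the ring radii and parameter values are arbitrary choices for the example):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two concentric rings: not linearly separable in the input space,
# but easily separated by an RBF-kernel SVM.
n = 100
radii = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
angles = rng.uniform(0, 2 * np.pi, 2 * n)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))  # training accuracy on the separable rings
```

A linear kernel would fail here, since no straight line separates the inner ring from the outer one.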

Relation to other kernels

The radial basis function kernel is a special case of the Matérn covariance function when the smoothness parameter ν approaches infinity. It is closely related to the Laplacian kernel, which uses the Manhattan distance instead of the squared Euclidean distance, resulting in less smooth functions. The polynomial kernel offers a different inductive bias, suitable for problems where feature conjunctions are important, as explored in the context of the Perceptron. The sigmoid kernel, historically used in neural networks, can mimic the behavior of a two-layer perceptron. Theoretical connections exist via kernel embedding of distributions, linking it to metrics like the maximum mean discrepancy used in two-sample tests.

Parameter selection and optimization

Selecting the kernel parameter γ (or σ) and the regularization parameter C in support vector machines is critical and is typically done via cross-validation or grid search. Bayesian optimization and gradient descent can be used to tune these hyperparameters, especially within frameworks like Gaussian process models. The parameter influences the bias–variance tradeoff: a large γ can lead to overfitting, while a small γ may cause underfitting. Methods like kernel alignment aim to learn the kernel parameters directly from data. The challenge of scaling these methods to large datasets is addressed by approximations such as the random Fourier features method of Ali Rahimi and Benjamin Recht, presented at NIPS.
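The random Fourier features approximation can be sketched in a few lines of NumPy. The construction below follows the standard recipe (sample frequencies from the kernel's spectral density, here a Gaussian with covariance 2γI); the variable names and feature count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 5000          # input dimension, number of random features
gamma = 0.5

# Sample w ~ N(0, 2*gamma*I) and b ~ Uniform[0, 2*pi]; the map
# z(x) = sqrt(2/D) * cos(W x + b) then satisfies z(x)·z(y) ≈ K(x, y).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
assert abs(exact - approx) < 0.1   # error shrinks as D grows
```

The appeal is that a linear method applied to z(x) behaves like a kernel method, at cost linear in the number of features rather than quadratic in the number of data points.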

Category:Kernel methods for machine learning Category:Support vector machines Category:Covariance functions