| Support-vector machine | |
|---|---|
| Name | Support-vector machine |
| Inventor | Vladimir Vapnik, Alexey Chervonenkis |
| Year | 1963 |
| Influenced by | Statistical learning theory |
| Influenced | Machine learning, Pattern recognition |
A support-vector machine (SVM) is a supervised learning model used for classification and regression analysis within the field of machine learning. Developed from the theoretical foundations of statistical learning theory, it constructs a hyperplane or set of hyperplanes in a high-dimensional space to perform tasks like classification. The model is renowned for its effectiveness in high-dimensional spaces and its versatility through the use of kernel functions.
The foundational concepts for the support-vector machine were developed by Vladimir Vapnik and Alexey Chervonenkis at the Institute of Control Sciences in the 1960s. The modern incarnation, including the kernel trick, was later popularized by Bernhard Boser, Isabelle Guyon, and Vapnik at AT&T Bell Laboratories. The core principle involves finding the optimal separating hyperplane that maximizes the margin between different classes in the training data, a concept rooted in Vapnik–Chervonenkis theory. This approach provides strong theoretical guarantees against overfitting, making it a robust tool in the arsenal of pattern recognition.
For linearly separable data, a linear SVM aims to find the hyperplane with the maximum margin. This margin is defined by the distance from the hyperplane to the nearest training data points of either class, known as support vectors. The optimization problem is typically formulated as a convex quadratic programming task; in its soft-margin form it is equivalent to minimizing a regularized hinge loss. The solution depends only on the support vectors, a property that contributes to the model's efficiency. The seminal work on this formulation is often associated with research published by Corinna Cortes and Vapnik.
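The margin-maximization objective above can be illustrated with a minimal sketch: subgradient descent on the regularized hinge loss for a linear classifier, on a hypothetical toy dataset. This is a didactic approximation of the quadratic-programming formulation, not a production solver; the data, learning rate, and regularization constant are all illustrative assumptions.

```python
import numpy as np

# Hypothetical linearly separable toy data; hinge loss expects labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y*(w.x + b)))
    by subgradient descent (a sketch of the soft-margin objective)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # points on or inside the margin drive the update
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
print(preds)
```

Note that only the points with `margins < 1` contribute to the gradient, mirroring the property that the final hyperplane depends solely on the support vectors.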
To handle data that is not linearly separable, SVMs employ the kernel trick. This method implicitly maps input vectors into a higher-dimensional feature space where a linear separation becomes possible. Common kernel functions include the polynomial kernel, the radial basis function kernel, and the sigmoid kernel. The use of kernels allows SVMs to create complex, nonlinear decision boundaries without explicitly performing the computationally expensive transformation. This innovation was crucial for applying SVMs to complex domains like bioinformatics and computer vision.
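The "implicit mapping" idea can be made concrete with a small check: for a degree-2 polynomial kernel on 2-D inputs, the kernel value computed in input space equals the inner product under an explicit 6-dimensional feature map. The vectors and constant below are arbitrary illustrative choices.

```python
import numpy as np

def poly_kernel(x, z, c=1.0):
    # Degree-2 polynomial kernel: k(x, z) = (x . z + c)^2
    return (x @ z + c) ** 2

def phi(x, c=1.0):
    # Explicit feature map whose inner product reproduces the kernel above.
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,
                     c])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly_kernel(x, z)   # computed in the 2-D input space
k_explicit = phi(x) @ phi(z)     # computed in the 6-D feature space
print(k_implicit, k_explicit)
```

For higher-degree kernels or the radial basis function kernel, the explicit feature space grows combinatorially or becomes infinite-dimensional, which is precisely why computing the kernel directly is the cheaper route.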
Several important extensions have been developed to enhance the basic SVM model. The soft-margin SVM was introduced by Cortes and Vapnik to handle noisy, non-separable data by introducing slack variables. For regression tasks, support-vector regression applies similar principles by fitting an ε-insensitive tube around the data. Other variants include the ν-SVM, which provides a different parameterization for controlling the margin and errors, and transductive SVMs for semi-supervised learning. The LIBSVM library, created by Chih-Chung Chang and Chih-Jen Lin, has been instrumental in popularizing these methods.
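The "tube" in support-vector regression corresponds to an ε-insensitive loss: prediction errors smaller than ε cost nothing, and larger errors grow linearly. A minimal sketch of that loss, with made-up target and prediction values:

```python
import numpy as np

def epsilon_insensitive(y_true, y_pred, eps=0.1):
    """SVR's epsilon-insensitive loss: errors inside the tube of
    half-width eps are ignored; outside it, the cost grows linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
losses = epsilon_insensitive(y_true, y_pred)
print(losses)
```

The first prediction sits inside the tube and incurs zero loss, while the other two are penalized only for the part of the error that exceeds ε; this is what keeps the SVR solution sparse in support vectors.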
Efficient training of SVMs, especially for large datasets, is a critical area of research. The sequential minimal optimization algorithm, developed by John Platt at Microsoft Research, breaks the large quadratic programming problem into smaller sub-problems. Other significant software implementations include LIBLINEAR for large-scale linear classification and the scikit-learn library in Python. These tools have made SVMs accessible for practical applications across various industries and research institutions like Stanford University.
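As a usage sketch of the scikit-learn implementation mentioned above, the classic XOR problem shows why kernels matter in practice: a linear SVM cannot fit it, while an RBF-kernel SVM can. This assumes scikit-learn is installed; the `C` and `gamma` values are illustrative, not recommendations.

```python
from sklearn.svm import SVC

# XOR: no single hyperplane separates these four points.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

linear = SVC(kernel="linear", C=10.0).fit(X, y)
rbf = SVC(kernel="rbf", C=10.0, gamma=2.0).fit(X, y)

linear_acc = linear.score(X, y)  # a linear boundary cannot fit XOR
rbf_acc = rbf.score(X, y)        # the RBF kernel separates it
print(linear_acc, rbf_acc)
```

Under the hood, scikit-learn's `SVC` delegates to LIBSVM, whose solver is a variant of the sequential minimal optimization decomposition described above.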
Support-vector machines have been successfully applied across a diverse range of fields. In bioinformatics, they are used for protein structure prediction and cancer classification from gene expression data. Within computer vision, SVMs are a core component for tasks such as handwriting recognition and object detection in systems developed by companies like Google. They have also seen significant use in natural language processing for text categorization and sentiment analysis, as well as in geostatistics for remote sensing data interpretation by agencies like NASA.
Category:Machine learning Category:Classification algorithms Category:Statistical classification