One of the most prevalent and exciting supervised-learning models, with associated learning algorithms that analyze data and recognize patterns, is the support vector machine (SVM). SVMs can be used to solve both regression and classification problems, but they are mostly applied to classification. They were first introduced by B.E. Boser et al. in 1992 and became popular thanks to their success in handwritten digit recognition in 1994. Before the emergence of boosting algorithms such as XGBoost and AdaBoost, SVMs were in common use.
If you want to build a solid foundation in machine-learning algorithms, you should definitely have SVMs in your arsenal. The SVM algorithm is powerful, but the concepts behind it are not as complicated as you might think.
Problem with Logistic Regression
Logistic regression helps solve classification problems by separating the instances into two classes. However, there is an infinite number of possible decision boundaries, and logistic regression simply settles on one of them.
Logistic regression doesn’t care whether the instances are close to the decision boundary, so the boundary it picks may not be optimal. If a point lies far from the decision boundary, we can be more confident in our prediction. Therefore, the optimal decision boundary should maximize the distance between itself and all instances (i.e., maximize the margins). That’s why the SVM algorithm is important!
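To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available, and using made-up toy data) that fits both a logistic regression and a linear SVM to the same separable points and compares the smallest distance from any training point to each model's decision boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two linearly separable clusters in two dimensions (toy data, made up here).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

log_reg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

def smallest_distance_to_boundary(model, X):
    """Smallest geometric distance from any training point to the model's boundary."""
    w, b = model.coef_[0], model.intercept_[0]
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

print("logistic regression margin:", smallest_distance_to_boundary(log_reg, X))
print("linear SVM margin:         ", smallest_distance_to_boundary(svm, X))
# On separable data the SVM's minimum distance should be at least as large,
# because maximizing exactly that distance is the SVM's objective.
```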
What Are SVMs?
Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a nonprobabilistic binary linear classifier.
The objective of applying SVMs is to find the best line in two dimensions, or the best hyperplane in more than two dimensions, that separates our space into classes. The hyperplane (line) is found by maximizing the margin, i.e., the distance between the hyperplane and the nearest data points of either class.
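As a rough illustration of that objective (again assuming scikit-learn, with hypothetical three-dimensional data), the sketch below fits a linear SVM, reads off the hyperplane parameters w and b, and reports the margin width 2/||w|| along with the support vectors that define it:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data in three dimensions (made up), where the separating surface is a
# plane rather than a line; the same code works for any number of features.
X = np.array([[1.0, 0.5, 0.2], [0.8, 1.2, 0.1], [1.1, 0.9, 0.4],   # class 0
              [3.0, 2.8, 2.5], [3.2, 3.1, 2.9], [2.9, 3.3, 2.7]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

print("hyperplane normal w:", w, " offset b:", b)
print("margin width (2 / ||w||):", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)

# New examples are assigned to whichever side of the hyperplane they fall on.
new_points = np.array([[1.0, 1.0, 0.5], [3.0, 3.0, 3.0]])
print("predicted classes:", clf.predict(new_points))
```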
Support Vector Machines
Imagine a labeled training set consisting of two classes of data points in two dimensions: Alice and Cinderella. To separate the two classes, there are many possible hyperplanes that do so correctly, and on the training data they all achieve exactly the same result. However, once we add new data points, the consequences of choosing different hyperplanes are very different in terms of classifying each new point into the right class.
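The toy sketch below (pure NumPy, with made-up points standing in for the Alice and Cinderella classes) shows two hand-picked lines that both separate the training data perfectly, yet place a new point on opposite sides:

```python
import numpy as np

alice      = np.array([[1.0, 1.0], [2.0, 1.5]])   # "Alice" class (hypothetical points)
cinderella = np.array([[4.0, 4.0], [5.0, 3.5]])   # "Cinderella" class (hypothetical points)

def side(w, b, p):
    """Which side of the line w·x + b = 0 the point p falls on (+1 or -1)."""
    return int(np.sign(p @ w + b))

# Two candidate separating lines (w, b); both classify the training data correctly.
line1 = (np.array([1.0, 1.0]), -5.5)   # roughly midway between the classes
line2 = (np.array([1.0, 0.0]), -3.7)   # a vertical line hugging the Cinderella class

for name, (w, b) in [("line 1", line1), ("line 2", line2)]:
    print(name,
          "Alice sides:", [side(w, b, p) for p in alice],
          "Cinderella sides:", [side(w, b, p) for p in cinderella])

# A new point far from the training clusters ends up on different sides of the
# two lines, so its predicted class depends on which hyperplane we picked.
new_point = np.array([3.8, 1.0])
print("line 1 puts the new point on side:", side(*line1, new_point))
print("line 2 puts the new point on side:", side(*line2, new_point))
```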