A few months ago, whenever I heard the term Support Vector Machine (SVM) I would imagine something that looks like this:

Actually, this happened whenever I tried to make sense of the mathematics behind it. I'll admit it: the maths behind SVMs can be pretty brutal.

It feels like there are very few ML resources out there that describe SVMs in layman’s terms. This post is my attempt to put together a simple illustration of how SVMs work.

**First things first…What is SVM?**

An SVM is a supervised learning model used for classification and regression analysis.

The simplest form of SVM is one that takes a set of *linearly separable* input data and predicts, for each given input, which of two possible classes forms the output. In other words: non-probabilistic binary linear SVM.

Consider the simple binary example below, where we’re trying to find a hyperplane that achieves optimal separation between two classes of data points.

- H1 would be a pretty terrible hyperplane, as it fails to separate the classes at all, grouping some of the black points together with the white ones.
- H2 successfully separates the two classes, but only with a very small margin between the closest black points and the separator.
- H3 is the optimal separator, since it separates the classes with the maximum possible margin.
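To make the margin comparison concrete, here is a small numpy sketch (the points and candidate hyperplanes are made up for illustration): a point’s distance to a separator w·x + b = 0 is |w·x + b| / ‖w‖, and the margin is the smallest such distance over all points.

```python
import numpy as np

# Toy, linearly separable 2-D points (hypothetical, for illustration only)
black = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]])
white = np.array([[4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
points = np.vstack([black, white])

def margin(w, b, X):
    """Smallest distance from any point in X to the hyperplane w.x + b = 0."""
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

# Two candidate separators with the same orientation: one passing close to
# the black cluster (an "H2"-like choice), one centred between the clusters
# (an "H3"-like choice).
narrow = margin(np.array([1.0, 1.0]), -5.0, points)  # hugs the black points
wide = margin(np.array([1.0, 1.0]), -6.0, points)    # centred between clusters

print(narrow, wide)  # the centred separator has the larger margin
```

Both lines separate the toy clusters, but the centred one leaves more room on each side, which is exactly the quantity the SVM maximizes.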

*In essence, SVM seeks the optimal H3:*

SVM will find you that red line (H3) you see in the figure, with the optimal angle and position which maximizes the separation of the two classes of data. The data points that “support” this hyperplane on either side are called the *support vectors*, hence the name SVM. Once your SVM has settled on the optimal position and orientation of the line, you can use it to classify new data points you haven’t plotted yet.
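The post's own code uses Orange, but as a quick sketch in scikit-learn (an assumption for illustration, not the library used below) you can fit a linear SVM on toy 2-D data and inspect those support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical, for illustration)
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0 ("black" points)
              [4, 4], [5, 4], [4, 5]])   # class 1 ("white" points)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a large C approximates a hard margin
clf.fit(X, y)

# Only the points touching the margin define the separator
print(clf.support_vectors_)

# Classify new, unplotted points on each side of the boundary
print(clf.predict([[1.5, 1.5], [4.5, 4.5]]))  # → [0 1]
```

Notice that only a handful of the training points end up in `support_vectors_`; the rest could be deleted without moving the boundary.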

**But… Is it always that straightforward?**

Nope, it’s not. In realistic scenarios, data is rarely linearly separable, and we often need more than 2 dimensions of features to represent it.

Here comes the ‘cool’ idea behind SVMs:

If your data is not linearly separable, why not ‘trick’ your model into thinking it is? I found the following video to be a great way to illustrate what this means:

Essentially, the idea behind SVMs is to map your data points into a higher-dimensional dot product space via a nonlinear map. Normally, working with data in a higher dimension means increased computational expense. However, SVMs make use of a so-called *kernel function* that computes the dot products in that space directly and can be evaluated easily, without ever constructing the mapping explicitly.
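Here is a small numpy sketch of what “evaluated easily” means. For 2-D inputs, the degree-2 polynomial kernel k(x, y) = (x·y)² gives exactly the same number as mapping both points through an explicit quadratic feature map φ and taking the dot product in the 3-D space, yet the kernel never leaves the original 2-D space:

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map for a 2-D input (illustrative)."""
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, evaluated in the original 2-D space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Same number, two routes:
print(poly_kernel(x, y))       # (1*3 + 2*0.5)^2 = 16.0
print(np.dot(phi(x), phi(y)))  # dot product after the explicit map: 16.0
```

This identity is the kernel trick in miniature: the SVM only ever needs dot products between mapped points, so it can work in the higher-dimensional space without paying for it.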

Besides the linear kernel, the most common kernel functions (tricks) are the polynomial, the radial basis function (RBF) and the sigmoid.
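To see why the kernel choice matters, here is a hedged scikit-learn sketch (again an assumption for illustration, not the Orange code used below): two concentric rings of points cannot be separated by any straight line, so a linear kernel struggles while the RBF kernel separates them cleanly.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

# The RBF kernel implicitly lifts the rings into a space where they separate
print("linear: %.2f, rbf: %.2f" % (linear_acc, rbf_acc))
```

A linear separator hovers around chance on this data, while the RBF-kernel SVM fits it almost perfectly.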

**Let’s see how we can implement a simple SVM classifier in Orange for our zoo scenario again:**

Note that *SVMLearner* is what can be used to construct an SVM in Orange. SVMLearner supports several built-in kernel types and even user-defined kernels written in Python. The kernel type is denoted by the constants Linear, Polynomial, RBF, Sigmoid and Custom defined in *Orange.classification.svm.kernels*.

This post doesn’t dive deep into the specifics of kernel functions available, so we will just implement a simple default SVM.

The following code constructs and evaluates an SVM classifier. The first 20 data instances are used for training, and the next 20 are used for testing.

```python
import Orange
from Orange.classification import svm
from Orange.evaluation import testing, scoring

data = Orange.data.Table("zoo.tab")
trainingData = data[:20]   # first 20 instances for training
testData = data[20:40]     # next 20 instances for testing

learner = svm.SVMLearner()
classifier = learner(trainingData)

# Evaluate the learner on the held-out test set
results = testing.learn_and_test_on_test_data([learner], trainingData, testData)
print "CA: %.2f" % scoring.CA(results)[0]
print "AUC: %.2f" % scoring.AUC(results)[0]

# Compare each prediction with the actual class
for d in testData:
    print "%10s; originally %s" % (classifier(d), d.getclass())
```