## Definition of Decision Boundary

In classification problems with two or more classes, a decision boundary is a hypersurface that separates the underlying vector space into regions, one for each class. Andrew Ng provides a nice example of the decision boundary of logistic regression.

Some algorithms produce linear decision boundaries (like logistic regression), while others produce non-linear ones (like Random Forest). Let's create a dummy dataset with two explanatory variables and a binary target and look at the decision boundaries of different algorithms.
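To make the "linear" part concrete: in two dimensions, a fitted logistic regression predicts class 1 wherever `w1*x1 + w2*x2 + b >= 0`, so its decision boundary is the straight line `w1*x1 + w2*x2 + b = 0`. Here is a minimal sketch of that idea; the tiny dataset is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up 2-D dataset, purely for illustration
X_toy = np.array([[0, 0], [1, 1], [2, 2], [0, 2], [2, 0], [3, 3]])
y_toy = np.array([0, 0, 1, 0, 1, 1])

model = LogisticRegression().fit(X_toy, y_toy)
w1, w2 = model.coef_[0]
b = model.intercept_[0]

# The decision boundary is the line w1*x1 + w2*x2 + b = 0,
# i.e. x2 = -(w1*x1 + b) / w2
print(f"Boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
```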

## Create the Dummy Dataset

Using scikit-learn, we will create a dummy dataset with 200 rows, two informative independent variables, and one binary target.

```python
from sklearn.datasets import make_classification

# 200 samples, 2 informative features, no redundant features, 2 classes
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=1)
```
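Before fitting anything, it can help to eyeball the two classes. This quick scatter plot is optional and simply reuses the `X` and `y` created above:

```python
import matplotlib.pyplot as plt

# Plot the two classes of the dummy dataset in different colors
plt.scatter(X[y == 0, 0], X[y == 0, 1], label='class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='class 1')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()
```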

## Create the Decision Boundary of each Classifier

We will compare six classification algorithms:

- **Logistic Regression**
- **Decision Tree**
- **Random Forest**
- **Support Vector Machines** (SVM)
- **Naive Bayes**
- **Neural Network**

We will work with the mlxtend library for plotting the decision regions. For simplicity, we keep the default parameters of every algorithm.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Initializing the classifiers with their default parameters
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf3 = RandomForestClassifier()
clf4 = SVC(gamma='auto')
clf5 = GaussianNB()
clf6 = MLPClassifier()

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from mlxtend.plotting import plot_decision_regions
%matplotlib inline

gs = gridspec.GridSpec(3, 2)
fig = plt.figure(figsize=(14, 10))

labels = ['Logistic Regression', 'Decision Tree', 'Random Forest',
          'SVM', 'Naive Bayes', 'Neural Network']

# Fit each classifier and draw its decision regions in a 3x2 grid
for clf, lab, grd in zip([clf1, clf2, clf3, clf4, clf5, clf6],
                         labels,
                         [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]):
    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    plot_decision_regions(X=X, y=y, clf=clf, legend=2)
    plt.title(lab)

plt.show()
```
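For intuition, `plot_decision_regions` essentially evaluates the fitted classifier on a dense grid over the feature space and colors each grid point by the predicted class. Here is a minimal hand-rolled sketch of the same idea with plain matplotlib, using the `clf1` (logistic regression) fitted above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Build a dense grid covering the feature space (with a small margin)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))

# Predict the class of every grid point and shade the resulting regions
Z = clf1.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.title('Logistic Regression (manual decision regions)')
plt.show()
```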

## Discussion

Clearly, Logistic Regression has a linear decision boundary, whereas the tree-based algorithms like Decision Tree and Random Forest create rectangular partitions. Naive Bayes leads to a linear decision boundary in many common cases but, as here, it can also be quadratic. SVMs can capture many different boundaries depending on the `gamma` value and the kernel, and the same applies to Neural Networks.
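To see that sensitivity directly, here is a small variation on the loop above that fits three SVMs differing only in kernel and `gamma` (the specific values are arbitrary picks for illustration, not recommendations):

```python
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

# Three SVMs that differ only in kernel / gamma
svms = [SVC(kernel='linear'),
        SVC(kernel='rbf', gamma=0.1),
        SVC(kernel='rbf', gamma=10)]
titles = ['linear kernel', 'rbf, gamma=0.1', 'rbf, gamma=10']

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for clf, title, ax in zip(svms, titles, axes):
    clf.fit(X, y)
    plot_decision_regions(X=X, y=y, clf=clf, legend=2, ax=ax)
    ax.set_title(title)
plt.show()
```

The linear kernel gives a straight boundary much like logistic regression, while increasing `gamma` with the RBF kernel lets the boundary wrap ever more tightly around individual points.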