Decision Boundary in Python

Definition of Decision Boundary

In classification problems with two or more classes, a decision boundary is a hypersurface that partitions the underlying vector space into regions, one for each class. Andrew Ng provides a nice example of a decision boundary in logistic regression.
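To make the definition concrete: in a two-feature logistic regression, the boundary is the set of points where the predicted probability equals 0.5, i.e. where w1*x1 + w2*x2 + b = 0. Below is a minimal sketch of how to recover that line from a fitted model; the dataset and the names X_demo and y_demo are illustrative and separate from the example that follows.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature dataset, used only for this sketch
X_demo, y_demo = make_classification(n_samples=100, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)

model = LogisticRegression().fit(X_demo, y_demo)
w1, w2 = model.coef_[0]    # feature weights
b = model.intercept_[0]    # bias term

# The boundary is where the log-odds are zero: w1*x1 + w2*x2 + b = 0,
# which we can solve for x2: x2 = -(w1*x1 + b) / w2
x1 = np.linspace(X_demo[:, 0].min(), X_demo[:, 0].max(), 50)
x2_boundary = -(w1 * x1 + b) / w2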

Some algorithms, like logistic regression, produce linear decision boundaries, while others, like Random Forest, produce non-linear ones. Let's create a dummy dataset with two explanatory variables and a binary target and look at the decision boundaries of different algorithms.

Create the Dummy Dataset

Using scikit-learn, we will create a dummy dataset of 200 rows with 2 informative independent variables and a binary target.

from sklearn.datasets import make_classification

# 200 samples, 2 informative features, no redundant features, 2 classes
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=1)
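As a quick sanity check (an optional step, not part of the original snippet), you can scatter-plot the two classes before fitting anything:

import matplotlib.pyplot as plt

# Quick look at how the two classes are distributed
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()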
  

Create the Decision Boundary of each Classifier

We will compare the following six classification algorithms:

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • Support Vector Machines (SVM)
  • Naive Bayes
  • Neural Network

We will work with the Mlxtend library for plotting the decision regions. For simplicity, we keep the default parameters of every algorithm.
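If Mlxtend is not already installed in your environment, it is available on PyPI:

pip install mlxtend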

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB 
from sklearn.neural_network import MLPClassifier



# Initializing classifiers with default parameters
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf3 = RandomForestClassifier()
clf4 = SVC(gamma='auto')
clf5 = GaussianNB()
clf6 = MLPClassifier()  # may raise a ConvergenceWarning with the default max_iter

import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import matplotlib.gridspec as gridspec
%matplotlib inline  

gs = gridspec.GridSpec(3, 2)

fig = plt.figure(figsize=(14,10))

labels = ['Logistic Regression', 'Decision Tree', 'Random Forest', 'SVM', 'Naive Bayes', 'Neural Network']
for clf, lab, grd in zip([clf1, clf2, clf3, clf4, clf5, clf6],
                         labels,
                         [(0,0), (0,1), (1,0), (1,1), (2,0), (2,1)]):

    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    plot_decision_regions(X=X, y=y, clf=clf, legend=2)
    plt.title(lab)

plt.show()
 
[Figure: decision regions of the six classifiers]

Discussion

Clearly, logistic regression has a linear decision boundary, whereas tree-based algorithms like Decision Tree and Random Forest create rectangular partitions. Naive Bayes leads to a linear decision boundary in many common cases, but it can also be quadratic, as in our case. SVMs can capture many different boundaries depending on the gamma and the kernel, and the same applies to neural networks.
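To see this sensitivity in practice, you can refit the SVM with different gamma values and plot the regions again. A minimal sketch, reusing X and y from above; the gamma values 0.1 and 10 are arbitrary choices for illustration:

from sklearn.svm import SVC
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt

# Refit an RBF-kernel SVM at a small and a large gamma and compare the regions
for g in [0.1, 10]:
    svm = SVC(kernel='rbf', gamma=g).fit(X, y)
    plot_decision_regions(X=X, y=y, clf=svm, legend=2)
    plt.title('SVM (RBF kernel, gamma={})'.format(g))
    plt.show()

A small gamma produces a smooth, nearly linear boundary, while a large gamma lets the boundary wrap tightly around individual points.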
