Predictive Hacks

# Decision Boundary in Python

## Definition of Decision Boundary

In classification problems with two or more classes, a decision boundary is a hypersurface that separates the underlying vector space into sets, one for each class. Andrew Ng provides a nice example of Decision Boundary in Logistic Regression.

We know that there are some Linear (like logistic regression) and some non-Linear (like Random Forest) decision boundaries. Let’s create a dummy dataset of two explanatory variables and a target of two classes and see the Decision Boundaries of different algorithms.

## Create the Dummy Dataset

We will create a dummy dataset with scikit-learn of 200 rows, 2 informative independent variables, and 1 target of two classes.

```from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=1)

```

## Create the Decision Boundary of each Classifier

We will compare 6 classification algorithms such as:

• Logistic Regression
• Decision Tree
• Random Forest
• Support Vector Machines (SVM)
• Naive Bayes
• Neural Network

We will work with the Mlxtend library. For simplicity, we decided to keep the default parameters of every algorithm.

```from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Initializing Classifiers
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf3 = RandomForestClassifier()
clf4 = SVC(gamma='auto')
clf5 = GaussianNB()
clf6 = MLPClassifier()

import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import matplotlib.gridspec as gridspec
%matplotlib inline

gs = gridspec.GridSpec(3, 2)

fig = plt.figure(figsize=(14,10))

labels = ['Logistic Regression', 'Decision Tree', 'Random Forest', 'SVM', 'Naive Bayes', 'Neural Network']
for clf, lab, grd in zip([clf1, clf2, clf3, clf4, clf5, clf6],
labels,
[(0,0), (0,1), (1,0), (1,1), (2,0), (2,1)]):

clf.fit(X, y)
ax = plt.subplot(gs[grd[0], grd[1]])
fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
plt.title(lab)

plt.show()

```

## Discussion

Clearly, the Logistic Regression has a Linear Decision Boundary, where the tree-based algorithms like Decision Tree and Random Forest create rectangular partitions. The Naive Bayes leads to a linear decision boundary in many common cases but can also be quadratic as in our case. The SVMs can capture many different boundaries depending on the `gamma` and the kernel. The same applies to the Neural Networks.

### Get updates and learn from the best

Python

#### Creating Dynamic Forms with Streamlit: A Step-by-Step Guide

In this blog post, we’ll teach you how to create dynamic forms based on user input using Streamlit’s session state

Python

#### How to Connect Wikipedia with ChatGPT and LangChain

ChatGPT’s knowledge is limited to its training data, which has the cutoff year of 2021. This implies that we cannot