Predictive Hacks

Dimension Reduction in Python


In statistics and machine learning, it is quite common to reduce the dimension of the feature space. There are many algorithms and techniques for doing so, and many reasons to do it. In this post, we give an example of two dimension reduction algorithms, PCA and t-SNE. We assume that the goal is to represent our data in 2 dimensions so that we can draw it as a scatterplot.

We are going to work with the famous iris dataset, but this time we are going to get the data directly from the URL.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline  

# import data from URL
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame
df = pd.read_csv(url, names=['sepal length','sepal width','petal length','petal width','target'])

df.head()

sepal length	sepal width	petal length	petal width	target
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa
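If the URL is not reachable, a similar data frame can be built from the copy of iris that ships with scikit-learn. This is only a sketch: the name `df_local` is made up for illustration, and the bundled copy may differ slightly from the UCI file in a few values.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of iris bundled with scikit-learn (no network access needed)
iris = load_iris()

# Reuse the column names from the post
columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
df_local = pd.DataFrame(iris.data, columns=columns)

# Map the integer class codes to labels in the same style as the UCI file
df_local['target'] = ['Iris-' + str(iris.target_names[i]) for i in iris.target]

print(df_local.shape)
print(df_local['target'].value_counts())
```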

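Now we are going to separate the 4 features into a data frame X and the dependent variable into a Series y. It is good practice to standardize the data before applying a dimension reduction algorithm, especially PCA: PCA looks for directions of maximum variance, so features measured on larger scales would otherwise dominate the components.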

X = df.iloc[:,0:4]
y = df.iloc[:,4]

# standardize each feature to zero mean and unit variance (X becomes a NumPy array)
X = StandardScaler().fit_transform(X)
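To see what the scaling step does, a quick check like the one below can help. It is a minimal sketch that assumes the bundled scikit-learn copy of iris instead of the URL; `X_raw`, `X_scaled`, `col_means` and `col_stds` are illustrative names, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumption: use the bundled iris features rather than the UCI download
X_raw = load_iris().data
X_scaled = StandardScaler().fit_transform(X_raw)

# After standardization every column should have mean ~0 and std ~1
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

print(np.round(col_means, 6))
print(np.round(col_stds, 6))
```

Note that StandardScaler divides by the population standard deviation, which matches NumPy's default `std` (ddof=0).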

PCA Algorithm

# The two Principal Components
PCs = pd.DataFrame(PCA(n_components=2).fit_transform(X), columns = ['PC1', 'PC2'])

# add the target y to the data frame
PCs['target'] = y

sns.scatterplot(x='PC1', y='PC2', data=PCs, hue='target')
[Figure: scatterplot of PC1 against PC2, colored by target]
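A natural follow-up question is how much of the original variance the two components keep. One way to check is the `explained_variance_ratio_` attribute of a fitted PCA object. The sketch below assumes the bundled scikit-learn copy of iris, and the names `pca` and `explained` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris features, standardized as in the post
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit PCA separately so the fitted object can be inspected afterwards
pca = PCA(n_components=2).fit(X_scaled)

# Fraction of the total variance captured by each component
explained = pca.explained_variance_ratio_
print(explained)
print(explained.sum())
```

If the sum is close to 1, the 2-dimensional scatterplot loses little of the variance in the 4 original features.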

t-SNE Algorithm

# the two components
tSNE = pd.DataFrame(TSNE(n_components=2).fit_transform(X), columns = ['tSNE1', 'tSNE2'])

# add the target
tSNE['target'] = y

sns.scatterplot(x='tSNE1', y='tSNE2', data=tSNE, hue='target')
[Figure: scatterplot of tSNE1 against tSNE2, colored by target]
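Unlike PCA, t-SNE is stochastic, so the plot can look different from run to run. A possible way to make it repeatable is to fix `random_state`, and the `perplexity` parameter can be set explicitly as well. The values 42 and 30 below are arbitrary choices for illustration, and the data again comes from the bundled scikit-learn copy of iris.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris features, standardized as in the post
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fix the seed for repeatable output; perplexity must be below the number of samples
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)
```

Trying a few perplexity values is usually worthwhile, since the shape of the clusters can change noticeably with it.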
