Predictive Hacks

# Dimension Reduction in Python

In statistics and machine learning, it is quite common to reduce the dimension of the feature space. There are many algorithms and techniques available, and many reasons for doing it. In this post, we give an example of two dimension reduction algorithms, PCA and t-SNE. We assume that the goal is to represent the data in 2 dimensions so that it can be shown in a scatterplot.

We are going to work with the famous iris dataset, but this time we are going to get the data directly from the URL.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

# import data from URL
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame (the file has no header row, so we pass the column names)
df = pd.read_csv(url, names=['sepal length','sepal width','petal length','petal width','target'])

# show the first five rows
df.head()
```
```
   sepal length  sepal width  petal length  petal width       target
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
```

Now we separate the 4 features into a data frame X and the dependent variable into y. It is good practice to standardize the data before applying a dimension reduction algorithm, especially PCA, which is sensitive to the scale of the features.

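```python
# the first four columns are the features, the fifth is the species label
X = df.iloc[:, 0:4]
y = df.iloc[:, 4]

# standardize the data: each feature gets zero mean and unit variance
# (fit_transform returns a NumPy array, so X is no longer a DataFrame)
X = StandardScaler().fit_transform(X)
```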
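
To make the scaling step concrete, here is a minimal sketch that checks what `StandardScaler` does to each column. It assumes scikit-learn's bundled copy of iris (`load_iris`) as a stand-in for the CSV loaded from the URL, so it runs without a network connection; the names `X_raw`, `X_scaled` and `manual` are ours, not part of the post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data stands in for the CSV read from the URL
X_raw = load_iris().data  # NumPy array of shape (150, 4)

X_scaled = StandardScaler().fit_transform(X_raw)

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0, which is also NumPy's default)
manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print(np.allclose(X_scaled, manual))   # do both approaches agree?
print(X_scaled.mean(axis=0).round(6))  # per-feature means, close to 0
print(X_scaled.std(axis=0).round(6))   # per-feature standard deviations, close to 1
```

After this step every feature contributes on the same scale, so no single measurement dominates the principal components just because of its units.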

## PCA Algorithm

```python
# The two Principal Components
PCs = pd.DataFrame(PCA(n_components=2).fit_transform(X), columns=['PC1', 'PC2'])

# add the target y to the data frame
PCs['target'] = y

sns.scatterplot(x='PC1', y='PC2', data=PCs, hue='target')
```
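
A natural follow-up question is how much of the original variance the two components keep. The sketch below reads this from the `explained_variance_ratio_` attribute of a fitted `PCA` object. As an assumption, it again uses scikit-learn's bundled iris data rather than the CSV from the URL, and the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data stands in for the CSV read from the URL
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit PCA separately so the fitted object can be inspected afterwards
pca = PCA(n_components=2).fit(X_scaled)

# Fraction of the total variance captured by each component, largest first
ratios = pca.explained_variance_ratio_
print(ratios)
print(ratios.sum())  # share of the variance retained by the 2D projection
```

If the sum is close to 1, the scatterplot loses little information compared with the original 4 features.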

## t-SNE Algorithm

```python
# the two components
tSNE = pd.DataFrame(TSNE(n_components=2).fit_transform(X), columns=['tSNE1', 'tSNE2'])

# add the target y to the data frame
tSNE['target'] = y

sns.scatterplot(x='tSNE1', y='tSNE2', data=tSNE, hue='target')
```
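
Unlike PCA, t-SNE is a stochastic algorithm, so the plot above can look different on each run. Here is a minimal sketch, under the same assumption of using scikit-learn's bundled iris data, that fixes `random_state` for repeatable results and sets `perplexity` explicitly. The values `30` and `0` are illustrative choices, not recommendations from the post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data stands in for the CSV read from the URL
X_scaled = StandardScaler().fit_transform(load_iris().data)

# perplexity roughly controls how many neighbours each point attends to
# and must be smaller than the number of samples (150 here);
# an integer random_state makes repeated runs reproducible
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)  # one 2D point per flower
```

It is worth trying a few perplexity values, since the apparent cluster shapes and distances in a t-SNE plot depend on this setting and should not be over-interpreted.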