
Autoencoders for Dimensionality Reduction


In the previous post, we explained how to reduce dimensions by applying PCA and t-SNE, and how to apply Non-Negative Matrix Factorization for the same purpose. In this post, we will provide a concrete example of how to apply autoencoders for dimensionality reduction. We will work with Python and TensorFlow 2.x.


Autoencoders on MNIST Dataset

We will use the MNIST dataset from TensorFlow, where the images are 28 x 28 pixels; in other words, if we flatten each image, we are dealing with 784 dimensions. Our goal is to reduce the dimensions from 784 to 2 while retaining as much information as possible.

Let’s get our hands dirty!

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Reshape
from tensorflow.keras.optimizers import SGD


from tensorflow.keras.datasets import mnist

# Load MNIST and scale the pixel values from [0, 255] to [0, 1]
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train / 255.0
X_test = X_test / 255.0
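
To make the 784 dimensions concrete, here is a quick shape check (a minimal sketch, using the arrays loaded above):

print(X_train.shape)                       # (60000, 28, 28): 60,000 images of 28 x 28 pixels
print(X_train.reshape(-1, 28 * 28).shape)  # (60000, 784): each image flattened to a 784-dimensional vector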


### Encoder
encoder = Sequential()
encoder.add(Flatten(input_shape=[28, 28]))  # 28 x 28 image -> 784-dimensional vector
encoder.add(Dense(400, activation="relu"))
encoder.add(Dense(200, activation="relu"))
encoder.add(Dense(100, activation="relu"))
encoder.add(Dense(50, activation="relu"))
encoder.add(Dense(2, activation="relu"))    # 2-dimensional bottleneck: the encoding we will plot


### Decoder: mirrors the encoder, expanding from 2 dimensions back to 784
decoder = Sequential()
decoder.add(Dense(50, input_shape=[2], activation='relu'))
decoder.add(Dense(100, activation='relu'))
decoder.add(Dense(200, activation='relu'))
decoder.add(Dense(400, activation='relu'))
decoder.add(Dense(28 * 28, activation="relu"))
decoder.add(Reshape([28, 28]))  # back to a 28 x 28 image

### Autoencoder: the input images are also the targets (reconstruction)
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer=SGD())  # SGD is imported above; without it compile falls back to RMSprop
autoencoder.fit(X_train, X_train, epochs=50)
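
As a quick sanity check (a minimal sketch, not part of the original pipeline), we can compare a few test images with their reconstructions after the round trip through the 2-dimensional bottleneck:

# Reconstruct the first 5 test images and plot originals (top) vs. reconstructions (bottom)
reconstructed = autoencoder.predict(X_test[:5])
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i in range(5):
    axes[0, i].imshow(X_test[i], cmap='gray')
    axes[1, i].imshow(reconstructed[i], cmap='gray')
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()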


# Map each training image to its 2-dimensional encoding
encoded_2dim = encoder.predict(X_train)


# Put the 2D encodings into a DataFrame together with the digit labels
AE = pd.DataFrame(encoded_2dim, columns=['X1', 'X2'])
AE['target'] = y_train

# Scatter plot of the two encoded dimensions, colored by digit
sns.lmplot(x='X1', y='X2', data=AE, hue='target', fit_reg=False, height=10)

Example of the MNIST Dataset

Every image in the MNIST dataset is a grayscale image of 28 x 28 pixels. Let's have a look at the first one.

plt.imshow(X_train[0], cmap='gray')  # display the first training image

This is an example of the digit 5, and the corresponding 28 x 28 array (shown here before the scaling to [0, 1]) is:

X_train[0].shape
(28, 28)
X_train[0]
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  30,  36,  94, 154, 170,
        253, 253, 253, 253, 253, 225, 172, 253, 242, 195,  64,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  49, 238, 253, 253, 253, 253,
        253, 253, 253, 253, 251,  93,  82,  82,  56,  39,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  18, 219, 253, 253, 253, 253,
        253, 198, 182, 247, 241,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
      ...,
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)

Our goal is to reduce the dimensions of the MNIST images from 784 to 2 and to represent them in a scatter plot!

Results of the Autoencoder

# Scatter plot of the two encoded dimensions, colored by digit
# (lmplot creates its own figure, so its size is set via height rather than plt.figure)
sns.lmplot(x='X1', y='X2', data=AE, hue='target', fit_reg=False, height=10)

We ended up with two dimensions, and we can see the corresponding scatter plot below, colored by digit.


As we can see from the plot above, using only 2 dimensions out of 784 we are still able, to some extent, to distinguish between the different images (digits). Hence, keep in mind that apart from PCA and t-SNE, we can also apply autoencoders for dimensionality reduction.
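
For comparison, here is a minimal sketch of the PCA counterpart from the previous post, assuming scikit-learn is available (X_train, y_train, pd and sns come from the code above):

from sklearn.decomposition import PCA

# Project the flattened images onto the first 2 principal components
pca = PCA(n_components=2)
encoded_pca = pca.fit_transform(X_train.reshape(-1, 28 * 28))

PCA_df = pd.DataFrame(encoded_pca, columns=['X1', 'X2'])
PCA_df['target'] = y_train
sns.lmplot(x='X1', y='X2', data=PCA_df, hue='target', fit_reg=False, height=10)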
