In the previous post, we explained how to reduce dimensions by applying PCA and t-SNE. In this post, we will walk through a concrete example of applying autoencoders for dimensionality reduction. We will work with Python and TensorFlow 2.x.

## Autoencoders on MNIST Dataset

We will use the MNIST dataset that ships with TensorFlow, where each image is 28 x 28 pixels. In other words, if we flatten an image, we are dealing with **784** **dimensions**. Our goal is to reduce the dimensions from **784** to **2** while retaining as much information as possible.
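To make the 784 figure concrete, here is a small NumPy sketch of the flattening step. The zero-filled array is only a stand-in for one MNIST image (an assumption, so the snippet runs without downloading the data); in the model, the `Flatten` layer does the equivalent job.

```python
import numpy as np

# Stand-in for a single 28 x 28 grayscale image
# (assumption: zeros instead of real MNIST pixels).
image = np.zeros((28, 28), dtype=np.uint8)

# Flattening turns the 2-D grid into one vector with 28 * 28 = 784 entries.
flat = image.reshape(-1)

print(image.shape)  # (28, 28)
print(flat.shape)   # (784,)
```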

Let’s get our hands dirty!

```python
import pandas as pd
import seaborn as sns
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Reshape
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale the pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

### Encoder: 784 -> 400 -> 200 -> 100 -> 50 -> 2
encoder = Sequential()
encoder.add(Flatten(input_shape=[28, 28]))
encoder.add(Dense(400, activation="relu"))
encoder.add(Dense(200, activation="relu"))
encoder.add(Dense(100, activation="relu"))
encoder.add(Dense(50, activation="relu"))
encoder.add(Dense(2, activation="relu"))

### Decoder: 2 -> 50 -> 100 -> 200 -> 400 -> 784
decoder = Sequential()
decoder.add(Dense(50, input_shape=[2], activation="relu"))
decoder.add(Dense(100, activation="relu"))
decoder.add(Dense(200, activation="relu"))
decoder.add(Dense(400, activation="relu"))
decoder.add(Dense(28 * 28, activation="relu"))
decoder.add(Reshape([28, 28]))

### Autoencoder: trained to reproduce its own input
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(loss="mse")
autoencoder.fit(X_train, X_train, epochs=50)

# The 2D representation of the training images
encoded_2dim = encoder.predict(X_train)

AE = pd.DataFrame(encoded_2dim, columns=['X1', 'X2'])
AE['target'] = y_train

# `height` replaces the `size` argument of older seaborn versions
sns.lmplot(x='X1', y='X2', data=AE, hue='target', fit_reg=False, height=10)
```
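To get a feel for the size of this network, the sketch below counts the trainable parameters implied by the layer widths in the code above. It uses the standard formula for a fully connected layer (weights plus biases); `dense_params` and `total_params` are hypothetical helper names introduced only for this illustration, not part of Keras.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


def total_params(widths):
    """Sum the parameters over consecutive pairs of layer widths."""
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))


# Layer widths taken from the encoder and decoder defined above.
encoder_widths = [784, 400, 200, 100, 50, 2]
decoder_widths = [2, 50, 100, 200, 400, 784]

encoder_total = total_params(encoder_widths)
decoder_total = total_params(decoder_widths)

print(encoder_total)                  # 419452
print(decoder_total)                  # 420234
print(encoder_total + decoder_total)  # 839686
```

If the model has been built, `autoencoder.summary()` should report matching totals.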

## Example of MNIST Dataset

Every image in the MNIST dataset is a grayscale image of 28 x 28 pixels. Let’s have a look at the first image.

```python
import matplotlib.pyplot as plt

plt.imshow(X_train[0], cmap='gray')
```

This image shows the digit 5. The corresponding 28 x 28 array, shown here with its raw pixel values before the division by 255, is:

```python
X_train[0].shape
```

`(28, 28)`

```python
X_train[0]
```

```
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,
18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170,
253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253,
253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253,
253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
...,
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], dtype=uint8)
```
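The raw pixels shown above are `uint8` values between 0 and 255, whereas the training code divides by 255.0 before fitting. Here is a minimal sketch of that rescaling, using a handful of values copied from the array above rather than the full image:

```python
import numpy as np

# A few raw pixel values taken from the array printed above.
pixels = np.array([0, 3, 18, 126, 255], dtype=np.uint8)

# Dividing by 255.0 converts to floating point and maps the range to [0, 1].
scaled = pixels / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```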

Our goal is to reduce the dimensions of MNIST images from **784** to **2** and to represent them in a scatter plot!

## Results of Autoencoders

We ended up with two dimensions per image, and the corresponding scatter plot is shown below, with the digits used as labels.

As the plot above shows, using only 2 dimensions instead of 784 still lets us distinguish, to some extent, between the different digits. So keep in mind that, apart from PCA and t-SNE, we can also apply autoencoders for dimensionality reduction.