How to Prevent Overfitting in Neural Networks with TensorFlow 2.0

In this post, we will cover several techniques for preventing overfitting in neural networks when working with TensorFlow 2.0. We will apply the following techniques at the same time: L2 regularization, dropout, batch normalization, and early stopping.

We will work with the diabetes dataset provided by scikit-learn.

Load the Data

We will load the data and split it into train and test sets (90%-10%):

import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense, BatchNormalization
from tensorflow.keras import regularizers

import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline



diabetes_dataset = load_diabetes()

data = diabetes_dataset['data']
targets = diabetes_dataset['target']

train_data, test_data, train_targets, test_targets = train_test_split(data, targets, test_size=0.1)
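Note that train_test_split shuffles the data randomly, so the exact split (and therefore the numbers reported below) will differ between runs. If you want a reproducible split, you can pass a random_state; the value 42 below is just an arbitrary example:

# Optional: fix the seed of the split for reproducibility
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=42)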

Build the Model

We will build a model with six hidden layers of 128 neurons each, adding L2 regularization with a factor of 0.00001 to every Dense layer and a dropout of 30% after the first five of them. Finally, we will add one batch normalization layer after the first Dense layer.

wd = 0.00001
rate = 0.3 

model = Sequential([
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu", input_shape=(train_data.shape[1],)),
        BatchNormalization(),
        Dropout(rate),
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu"),
        Dropout(rate),
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu"),
        Dropout(rate),
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu"),
        Dropout(rate),
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu"),
        Dropout(rate),
        Dense(128, kernel_regularizer=regularizers.l2(wd), activation="relu"),
        Dense(1)
    ])

model.summary()

The model summary:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 128)               1408      
_________________________________________________________________
batch_normalization_1 (Batch (None, 128)               512       
_________________________________________________________________
dropout_5 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 128)               16512     
_________________________________________________________________
dropout_7 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               16512     
_________________________________________________________________
dropout_8 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 128)               16512     
_________________________________________________________________
dropout_9 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 128)               16512     
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 129       
=================================================================
Total params: 84,609
Trainable params: 84,353
Non-trainable params: 256
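Note that regularizers.l2 is only one option: Keras also provides L1 and combined L1/L2 penalties through regularizers.l1 and regularizers.l1_l2. As a minimal sketch (the penalty values below are only illustrative), a layer could instead be defined as:

# Combined L1/L2 weight penalty on a single layer (illustrative values)
Dense(128, kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-5), activation="relu")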

Compile the Model

We will compile the model using the Adam optimizer, with the mean squared error (MSE) as the loss function and the mean absolute error (MAE) as the reported metric.

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
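If you prefer to control the learning rate explicitly instead of using the string shortcut, you can pass an optimizer instance; the value below is Adam's default and is shown only as an example:

# Equivalent compile step with an explicit optimizer instance
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse', metrics=['mae'])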

Train the Model

We will train the model for up to 1000 epochs, keeping an internal validation set of 15% of the training data and using a batch size of 64. Finally, we will apply early stopping with a patience of 20, meaning that if the model has not improved on the validation set for 20 consecutive epochs, training stops.

model_history = model.fit(train_data, train_targets, epochs=1000,
                          validation_split=0.15, batch_size=64, verbose=False,
                          callbacks=[tf.keras.callbacks.EarlyStopping(patience=20)])
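By default, EarlyStopping monitors the validation loss. A variation worth knowing (not used in this post) is restore_best_weights=True, which rolls the model back to the weights of its best epoch instead of keeping the weights from the last epoch before stopping:

# Early stopping that also restores the best weights seen during training
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                                  restore_best_weights=True)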

Plot the Learning Curves

We can plot the loss of the model across epochs for both the training and validation datasets:

# Plot the training and validation loss

import matplotlib.pyplot as plt

plt.plot(model_history.history['loss'])
plt.plot(model_history.history['val_loss'])
plt.title('Loss vs. epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
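We can also check exactly how many epochs were run before early stopping kicked in, since the history object keeps one entry per epoch (the exact number will vary between runs):

# Number of epochs actually trained
print(len(model_history.history['loss']))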

As we can see, the model ran for only 36 epochs and then stopped, since the validation loss was no longer improving. We can evaluate the model on the test dataset:

# Evaluate the model on the test set

model.evaluate(test_data, test_targets, verbose=2)
45/1 - 0s - loss: 4512.4256 - mae: 54.0345
[4839.613487413195, 54.034515]

Let’s compare the regularized model with a model that uses none of the techniques mentioned above.

Model without Regularization

model = Sequential([
        Dense(128, activation="relu", input_shape=(train_data.shape[1],)),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(1)
    ])

# Compile and train the unregularized model with the same settings as before,
# but without the early-stopping callback, so it runs for all 1000 epochs
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

model_history = model.fit(train_data, train_targets, epochs=1000,
                          validation_split=0.15, batch_size=64, verbose=False)

import matplotlib.pyplot as plt

plt.plot(model_history.history['loss'])
plt.plot(model_history.history['val_loss'])
plt.title('Loss vs. epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
# Evaluate the model on the test set

model.evaluate(test_data, test_targets, verbose=2)
45/1 - 0s - loss: 8459.9902 - mae: 66.0895
[7391.370073784722, 66.08951]

As we can see, without regularization the MAE on the test dataset was 66.09, compared to 54.03 with regularization, and the loss was 7391 compared to 4839. We also saved a lot of time, since the regularized model ran for only 36 epochs instead of 1000 thanks to early stopping.

The Takeaway

When we build a neural network there is a high chance of overfitting; this is much less of an issue with bagging and boosting algorithms like Random Forest, Gradient Boosting, XGBoost, LightGBM, etc. For that reason, we should be very careful and always add regularization techniques, especially when we build complicated neural networks with many layers and many neurons, and when we train for many epochs, since both increase the risk of overfitting.
