How to Remove Stopwords from Text in Python

In many NLP tasks, it is necessary to remove “stopwords” from the text. By “stopwords” we usually mean words that occur frequently and contribute little to the overall meaning of a sentence. Some examples are {"a", "an", "the", "this", "that", "is", "it", "to", "and"}, and so on.

In this tutorial, we will show how to remove stopwords in Python using the NLTK library.

Let’s load the libraries and download the NLTK data we need:

import nltk
# newer NLTK releases may ask for 'punkt_tab' instead of 'punkt'
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

The English stop words that ship with NLTK are given by the list:

stopwords.words('english')

Alternatively, you can define your own stop word list, for example:

stop_words = ["a", "an", "the", "this", "that", "is", "it", "to", "and"]
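As a minimal sketch of how such a hand-written list might be used, the snippet below filters a list of tokens against it. The helper name drop_stop_words and the sample sentence are illustrative assumptions, and the sentence is split on whitespace rather than tokenized with NLTK so that the example runs without any downloaded data.

```python
# Hand-written stop word list, as defined in the text
stop_words = ["a", "an", "the", "this", "that", "is", "it", "to", "and"]

# A set gives constant-time membership checks, which helps on long texts
stop_word_set = set(stop_words)


def drop_stop_words(tokens, stop_word_set):
    """Return the tokens whose lowercase form is not a stop word.

    Hypothetical helper, named here only for illustration.
    """
    return [t for t in tokens if t.lower() not in stop_word_set]


# Whitespace split is a stand-in for a real tokenizer (assumption)
sample_tokens = "This is an example and it is short".split()
kept = drop_stop_words(sample_tokens, stop_word_set)
print(kept)  # ['example', 'short']
```

Lowercasing each token before the lookup means that "This" is matched against the lowercase entry "this".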

How to Add Stopwords to the NLTK Stopword List

You can also add your own custom stop words to the NLTK stopword list. For example:

# stopwords from NLTK
my_stopwords = nltk.corpus.stopwords.words('english')

# my new custom stopwords
my_extra = ['abc', 'google', 'apple']

# add the new custom stopwords to my stopwords
my_stopwords.extend(my_extra)

How to Remove Stopwords from the NLTK Stopword List

Similarly, you can remove some words from the “stopword list” using a list comprehension. For example:

# remove these words from stop words
my_lst = ['have', 'few']

# update the stopwords list without the words above
my_stopwords = [el for el in my_stopwords if el not in my_lst]
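Adding and removing entries can also be expressed with set operations, which some readers may find easier to follow. The sketch below is one possible way to do it. The name base_stopwords is an assumption: it is a short stand-in for the list returned by nltk.corpus.stopwords.words('english'), used here so that the example runs without downloading NLTK data.

```python
# Short stand-in for nltk.corpus.stopwords.words('english') (assumption:
# shortened so the sketch runs without any NLTK data)
base_stopwords = ["a", "an", "the", "have", "few", "is"]

my_extra = ["abc", "google", "apple"]  # words to add
my_lst = ["have", "few"]               # words to stop treating as stop words

# Union adds the extra words; difference drops the unwanted entries
custom_stopwords = (set(base_stopwords) | set(my_extra)) - set(my_lst)

print(sorted(custom_stopwords))
# ['a', 'abc', 'an', 'apple', 'google', 'is', 'the']
```

The result is a set, so the original order of the list is lost. If the order matters, the list-based approach is the safer choice.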

How to Remove Stopwords from Text

Now, we are ready to remove the stopwords from the text. Let’s consider the following nonsense text for demonstration purposes:

my_txt = "I'm George. I live in Athens! This is my blog, hopefully you enjoy this post! Look at this!"

filtered_list = []
stop_words = nltk.corpus.stopwords.words('english')

# Tokenize the sentence
words = word_tokenize(my_txt)
for w in words:
    # keep the token only if its lowercase form is not a stop word
    if w.lower() not in stop_words:
        filtered_list.append(w)


Now, we may want to convert the list to a string. Let’s do it:

my_clean_txt = " ".join(filtered_list)
my_clean_txt

"'m George . live Athens ! blog , hopefully enjoy post ! Look !"
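The output still contains punctuation tokens such as "." and "!". If those are not wanted either, the whole procedure could be wrapped in a small function like the sketch below. The function name remove_stopwords, its parameters, and the regex fallback tokenizer are assumptions made for illustration. They are not part of NLTK, and the fallback splits text differently from word_tokenize (for instance around apostrophes).

```python
import re
import string


def remove_stopwords(text, stop_words, tokenize=None, drop_punctuation=False):
    """Return text with stop words (and optionally punctuation tokens) removed.

    Hypothetical helper; pass NLTK's word_tokenize as `tokenize` to
    follow the tutorial's tokenization.
    """
    if tokenize is None:
        # Fallback tokenizer (assumption): runs of word characters,
        # or single punctuation marks
        def tokenize(s):
            return re.findall(r"\w+|[^\w\s]", s)

    stop_set = {w.lower() for w in stop_words}
    kept = []
    for token in tokenize(text):
        if token.lower() in stop_set:
            continue  # stop word
        if drop_punctuation and all(ch in string.punctuation for ch in token):
            continue  # token made up only of punctuation characters
        kept.append(token)
    return " ".join(kept)


# Small illustrative stop word list (stand-in for the NLTK list)
demo_stop_words = ["i", "in", "this", "is", "my", "you", "at"]
demo_text = "I live in Athens! This is my blog, hopefully you enjoy this post!"

print(remove_stopwords(demo_text, demo_stop_words))
# live Athens ! blog , hopefully enjoy post !
print(remove_stopwords(demo_text, demo_stop_words, drop_punctuation=True))
# live Athens blog hopefully enjoy post
```

With NLTK installed and its data downloaded, the same function could be called with stop_words=stopwords.words('english') and tokenize=word_tokenize.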

