Predictive Hacks

How to Remove Punctuation from Text in Python

In NLP projects, we often remove punctuation from the text. However, we should be careful when doing so, because depending on the project, punctuation can carry important information, for example in sentiment analysis. Let’s look at some examples:

import re
import string

text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."
text.translate(str.maketrans('', '', string.punctuation))
'This is a text It has parenthesis square and curly brackets  and hashtags '
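One thing to keep in mind is that string.punctuation contains only the 32 ASCII punctuation characters, so typographic marks such as curly quotes or an ellipsis character are left untouched by the call above. Here is a minimal sketch of one way to handle them. The EXTRA string is our own illustrative and non-exhaustive choice, not something Python provides:

```python
import string

# string.punctuation is ASCII-only and has 32 characters.
print(len(string.punctuation))

# Hypothetical, non-exhaustive set of extra marks chosen for this sketch:
# curly single quotes, curly double quotes, em dash, ellipsis.
EXTRA = "\u2018\u2019\u201c\u201d\u2014\u2026"

# Translation table that deletes ASCII punctuation plus the extra marks.
table = str.maketrans("", "", string.punctuation + EXTRA)

sample = "Let\u2019s test \u201cthis\u201d\u2026 ok!"
cleaned = sample.translate(table)
print(cleaned)
```

Depending on the data, you may need to add more characters to EXTRA.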

Another way to remove all punctuation is to use a regular expression:

re.compile('[%s]' % re.escape(string.punctuation)).sub('', text)
'This is a text It has parenthesis square and curly brackets  and hashtags '
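If you clean many strings, it can help to build the translation table and the compiled pattern once and reuse them. The sketch below wraps the two approaches in helper functions and checks that they agree on our example. The names strip_with_translate and strip_with_regex are our own, chosen for illustration:

```python
import re
import string

text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."

# Build each helper object once so it can be reused across many strings.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))


def strip_with_translate(s):
    """Delete every ASCII punctuation character using str.translate."""
    return s.translate(PUNCT_TABLE)


def strip_with_regex(s):
    """Delete every ASCII punctuation character using a compiled regex."""
    return PUNCT_RE.sub("", s)


# Both approaches should produce the same string.
assert strip_with_translate(text) == strip_with_regex(text)
print(strip_with_translate(text))
```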

Awesome, we managed to remove all punctuation. But what if we want to keep some marks, such as the hashtag?

Remove Some Punctuation and Keep Others

Let’s see how we can keep some punctuation. First, let’s look at the pattern that matches all punctuation.

('[%s]' % re.escape(string.punctuation))
'[!"\\#\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~]'

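The above is the regular expression, a character class that matches every punctuation character. To keep hashtags, we drop \\# from the character class and remove everything else: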

re.compile('[!"\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~]').sub('', text)
'This is a text It has parenthesis square and curly brackets  and hashtags #'

Voilà! We managed to keep hashtags!
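Editing the escaped pattern by hand is error-prone, so a possibly safer option is to build the pattern from string.punctuation after filtering out the characters we want to keep. Here is a minimal sketch. The function remove_punctuation and its keep parameter are our own illustrative names, not part of any library:

```python
import re
import string


def remove_punctuation(text, keep=""):
    """Remove ASCII punctuation from text, except the characters in keep."""
    # Punctuation characters that should still be removed.
    to_remove = "".join(ch for ch in string.punctuation if ch not in keep)
    if not to_remove:
        # Nothing to remove; avoid compiling an empty character class.
        return text
    # re.escape makes every character safe inside the character class.
    pattern = re.compile("[%s]" % re.escape(to_remove))
    return pattern.sub("", text)


text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."
print(remove_punctuation(text, keep="#"))
```

With keep="#" this should give the same result as the hand-edited pattern above, and keeping several marks only requires passing them together, for example keep="#@".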
