Predictive Hacks

How to Remove Punctuation from Text in Python

In NLP projects, we usually remove punctuation from the text. However, we should be careful when doing so, because depending on the project, punctuation can be very important. In sentiment analysis, for example, exclamation marks and emoticons can carry meaning. Let's look at some examples:
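Here is a small illustration of why this matters. The two sample sentences are made up for this example: once punctuation is stripped, a smiley and a frowny face become indistinguishable, so a sentiment model can no longer tell the two messages apart.

```python
import string

# Two made-up messages whose sentiment is carried only by the emoticon
positive = "See you tomorrow :)"
negative = "See you tomorrow :("

# Translation table that deletes every ASCII punctuation character
table = str.maketrans('', '', string.punctuation)

# After stripping punctuation, both messages are identical
print(positive.translate(table) == negative.translate(table))  # True
```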

import re
import string

text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."

# Delete every character listed in string.punctuation
text.translate(str.maketrans('', '', string.punctuation))
# 'This is a text It has parenthesis square and curly brackets  and hashtags '
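If you need to clean many strings, a minimal sketch is to build the translation table once and reuse it. The helper name strip_punctuation and the sample list are illustrative assumptions, not part of the original post.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deletes it).
# Built once, so repeated calls do not rebuild it.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(text: str) -> str:
    """Hypothetical helper: return text with all string.punctuation characters removed."""
    return text.translate(PUNCT_TABLE)


texts = ["Hello, world!", "Is this (really) a test?"]
cleaned = [strip_punctuation(t) for t in texts]
print(cleaned)  # ['Hello world', 'Is this really a test']
```

Note that string.punctuation only contains ASCII punctuation, so characters such as the curly apostrophe ’ are left untouched.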

Another way to do that is with a regular expression:

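# Build a character class containing every (escaped) punctuation mark and replace matches with ''
re.compile('[%s]' % re.escape(string.punctuation)).sub('', text)
# 'This is a text It has parenthesis square and curly brackets  and hashtags '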
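Both outputs leave a double space where [[{{}}]] was and a trailing space where #. was. A minimal sketch, assuming you want single spaces between words, is to collapse runs of whitespace after removing punctuation. The names PUNCT_RE, no_punct and tidy are just illustrative.

```python
import re
import string

text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."

# Compile once: a character class containing every ASCII punctuation mark
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))

no_punct = PUNCT_RE.sub('', text)
# Collapse runs of whitespace into a single space and trim both ends
tidy = re.sub(r'\s+', ' ', no_punct).strip()
print(tidy)  # This is a text It has parenthesis square and curly brackets and hashtags
```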

Awesome, we managed to remove all punctuation. But what if we want to keep some of them, like the hashtag?

Remove Some Punctuation and Keep Others

Let's see how we can keep some punctuation. First, let's look at the pattern that matches all punctuation.

'[%s]' % re.escape(string.punctuation)
# '[!"\\#\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~]'

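The above is the regular expression: a character class that matches any punctuation mark. (On Python 3.7 and later, re.escape only escapes characters that have a special meaning in regular expressions, which is why marks such as !, % and @ appear unescaped.) To keep the hashtag, we delete its escaped form from the pattern and remove all the other marks: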

# Same pattern as above, with the escaped hashtag removed
re.compile('[!"\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~]').sub('', text)
# 'This is a text It has parenthesis square and curly brackets  and hashtags #'

Voilà! We managed to keep hashtags!
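Hand-editing the escaped pattern is easy to get wrong, since a stray backslash silently changes the character class. A minimal sketch of an alternative builds the pattern from the characters you want to keep; the function names remove_punctuation and remove_punctuation_translate and their keep parameter are hypothetical, not from the original post. The same keep set also works with str.translate.

```python
import re
import string


def remove_punctuation(text: str, keep: str = '') -> str:
    """Hypothetical helper: remove ASCII punctuation except characters listed in keep."""
    # Punctuation we actually want to delete
    to_remove = ''.join(c for c in string.punctuation if c not in keep)
    if not to_remove:
        return text  # nothing to delete; an empty class '[]' would be an invalid regex
    # re.escape makes characters such as ] \ ^ - safe inside the character class
    pattern = re.compile('[%s]' % re.escape(to_remove))
    return pattern.sub('', text)


def remove_punctuation_translate(text: str, keep: str = '') -> str:
    """Hypothetical helper: same result using str.translate instead of a regex."""
    to_remove = ''.join(c for c in string.punctuation if c not in keep)
    return text.translate(str.maketrans('', '', to_remove))


text = "This is a text!!! It has (parenthesis), square and curly brackets [[{{}}]] and hashtags #."
print(remove_punctuation(text, keep='#'))
print(remove_punctuation_translate(text, keep='#'))
```

Both calls keep the hashtag and drop every other punctuation mark, exactly like the hand-edited pattern above.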
