Remove Punctuation from String using Python (4 Best Methods)

  • Feb 21, 2023
  • By Komal Gupta

Computers do not understand natural language, and raw text usually contains noise that can distort textual analysis, so text data is typically cleaned programmatically first. Converting all characters to lowercase, removing punctuation, and removing stop words and typos are common ways to get rid of these unhelpful portions of the data.

Today, we'll look at how to remove punctuation from a string in Python using several different methods, with code for each.

Punctuation removal is part of the NLP preprocessing phase, which also covers steps such as dividing the text into sentences, paragraphs, and phrases, and which affects the results of any text-processing strategy. Adding a text preprocessing layer to activities like sentiment analysis, document categorization, and document retrieval based on user queries can improve accuracy.

What are String Punctuations in Python?

Punctuation marks are symbols that give written English its grammatical structure. When processing text, however, it often becomes necessary to remove or replace them. Which punctuation marks to discard depends on the use case, so the list should be chosen with care.

Before learning how to get rid of them, we must know how punctuation is defined in Python.

string.punctuation is a pre-defined constant string in Python's string module that contains all ASCII punctuation characters, including quotation marks, brackets, and slashes.

It is a convenient way to reference all punctuation characters in one place, rather than having to type out each individual character. It has the following list of punctuation:

import string
string.punctuation

 

Output :

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
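As a quick check of what the constant covers, the following sketch tests a few characters for membership. The sample characters are chosen here purely for illustration. Note that string.punctuation holds ASCII punctuation only, so a symbol such as '©' is not part of it.

```python
import string

# string.punctuation is an ordinary str, so `in` tests single characters.
samples = ['!', '_', 'a', ' ', '©']  # illustrative characters, not from the article
membership = {ch: ch in string.punctuation for ch in samples}

for ch, is_punct in membership.items():
    print(repr(ch), is_punct)

# The constant contains 32 ASCII punctuation characters.
print(len(string.punctuation))
```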

 

You can also add other special symbols to the list of punctuation that you want to discard. Example: '©', '®', '¾', '¡', etc.

import string

regular_punct = list(string.punctuation)  # Python's built-in ASCII punctuation
special_punct = ['©', '^', '®', ' ', '¾', '¡', '!']  # user-defined special characters to remove

def remove_punctuation(text, punct_list):
    # Replace each listed character with a space, then trim both ends.
    for punc in punct_list:
        if punc in text:
            text = text.replace(punc, ' ')
    return text.strip()

a = remove_punctuation(" Hello!! welcome to Favetutor blogs©", regular_punct)
b = remove_punctuation(a, special_punct)
print("Sentence after removing python punctuations", a)
print("Sentence after removing special punctuations", b)

 

Output:

Sentence after removing python punctuations Hello   welcome to Favetutor blogs©
Sentence after removing special punctuations Hello   welcome to Favetutor blogs
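If you would rather clean the text in one pass, the two lists can be merged into a single set first. This is only a sketch: the names extra_symbols, all_punct, and strip_symbols are made up for illustration, and unlike the example above it deletes each symbol instead of replacing it with a space.

```python
import string

# Hypothetical extra symbols, chosen only for illustration.
extra_symbols = "©®¾¡"
all_punct = set(string.punctuation) | set(extra_symbols)

def strip_symbols(text, symbols=all_punct):
    """Return text with every character found in `symbols` dropped."""
    return "".join(ch for ch in text if ch not in symbols)

cleaned = strip_symbols(" Hello!! welcome to Favetutor blogs©").strip()
print(cleaned)  # Hello welcome to Favetutor blogs
```

Because the symbols are deleted rather than swapped for spaces, no runs of extra spaces are left behind.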

 

How to Remove Punctuations from Strings?

As discussed, removing punctuation from strings is a common task that every programmer should know. Below, we look at 4 different methods to do it in Python:

1) The Translation Function

It is one of the best ways to easily strip punctuation.

The translate() function is a string method that replaces characters in a string according to a translation table. It is used together with str.maketrans() to remove punctuation from a string. The first two arguments of str.maketrans() are empty strings, and its third argument is a string containing all the punctuation marks you want to eliminate.

Syntax: string_name.translate(str.maketrans('', '', string.punctuation))

We use the maketrans() function to create a translation table that maps the punctuation characters in the punctuation constant string to None. We apply the translation table to the sample string using the translate() method, which replaces each character in the string that matches a key in the translation table with its corresponding value, or removes it if the value is None. 
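To make the table less abstract, the sketch below prints a small one. The variable names small_table and full_table are illustrative. The table is a plain dictionary whose keys are Unicode code points and whose values are None for characters to delete.

```python
import string

# Build a small table that deletes only '!' and '?'.
small_table = str.maketrans('', '', '!?')
print(small_table)  # {33: None, 63: None}

# Keys are Unicode code points, so ord() shows which character each key is.
print(ord('!'), ord('?'))  # 33 63

# The full table has one entry per character in string.punctuation.
full_table = str.maketrans('', '', string.punctuation)
print(len(full_table))  # 32
```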

Let's understand with an example:

import string
text = "I'm Komal from Favtutor, hello. How may I assist you??"
translating = str.maketrans('', '', string.punctuation)
new_string = text.translate(translating)
print(new_string)

 

Output:

Im Komal from Favtutor hello How may I assist you

 

This function can also be used if you want to replace specific characters with other characters. Check the example below:

# Map each vowel to a digit: a->1, e->2, i->3, o->4, u->5
table = str.maketrans('aeiou', '12345')
text = 'This is a sample string'
translated_string = text.translate(table)
print(translated_string)

 

Output:

Th3s 3s 1 s1mpl2 str3ng
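The same mapping idea can replace punctuation with a space instead of deleting it, which may help when words are separated only by punctuation. This is a sketch under that assumption. It uses the one-argument dictionary form of str.maketrans(), and the sample text and variable names are illustrative.

```python
import string

# Map every punctuation character to a single space instead of deleting it.
space_table = str.maketrans({ch: ' ' for ch in string.punctuation})

text = "red,green;blue -- done!"
spaced = text.translate(space_table)

# split() with no arguments discards runs of whitespace, so join() leaves
# exactly one space between words.
cleaned = ' '.join(spaced.split())
print(cleaned)  # red green blue done
```

Deleting the punctuation here would have produced "redgreenblue  done", so the choice between deleting and replacing depends on the input.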

 

2) Using the Loop & Replace function

This is a standard way of doing the operation without importing any module. We iterate over the string to check for punctuation and remove each punctuation character we find by replacing it with an empty string using the replace() method. It is a brute-force way to complete the task.

Here is an example:

test_str = "Hi, Welcome to the Favtutor live coding classes -24*7 Expert help availaible. Register Now!!"
print("The original string --> ", "\n", test_str)
# Raw string, so the backslash is a literal character and not an escape.
punc_list = r'''!()-[]{};*:'"\,<>./?@_~'''
# The loop walks the original string object; test_str is rebound to a
# new, shorter copy each time a punctuation character is found.
for i in test_str:
    if i in punc_list:
        test_str = test_str.replace(i, "")
# printing updated string
print('-----------------------------------------------------------------')
print("The string after removing punctuation -->", '\n', test_str)

 

Output:

The original string -->  
 Hi, Welcome to the Favtutor live coding classes -24*7 Expert help availaible. Register Now!!
-----------------------------------------------------------------
The string after removing punctuation --> 
 Hi Welcome to the Favtutor live coding classes 247 Expert help availaible Register Now
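A small variation, sketched here with the hypothetical helper name remove_with_replace, loops over the punctuation symbols rather than over the text. That way replace() runs at most once per symbol, however long the text is. It assumes you want to strip everything in string.punctuation rather than a hand-written list.

```python
import string

def remove_with_replace(text, symbols=string.punctuation):
    """Call replace() once for each symbol that actually occurs in text."""
    for symbol in symbols:
        if symbol in text:
            text = text.replace(symbol, "")
    return text

sample = "Register Now!! -24*7"
result = remove_with_replace(sample)
print(result)  # Register Now 247
```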

 

3) Using Regex

Regex is a powerful tool for pattern matching and text manipulation, including removing specific characters from a string. Python's re module provides a function named sub(), which searches a string for a pattern and replaces all occurrences of that pattern with a specified string.

The arguments of the sub() function are pattern, repl, string, count, and flags. In the example below, the pattern [^\w\s] matches any character that is neither a word character nor whitespace:

import re
string = "Hello, welcome to my blog! :)"
new_string = re.sub(r'[^\w\s]', '', string)
print(new_string)

 

Output:

Hello welcome to my blog 
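One detail worth knowing about this pattern is that \w also matches the underscore, so underscores survive the substitution. The sketch below, using an illustrative sample string, shows that behavior, one way to remove underscores too, and the effect of the count argument.

```python
import re

text = "snake_case, ok!"

# \w matches letters, digits, and the underscore, so '_' is kept here.
kept = re.sub(r'[^\w\s]', '', text)
print(kept)  # snake_case ok

# Adding '_' as an alternative removes underscores as well.
stripped = re.sub(r'[^\w\s]|_', '', text)
print(stripped)  # snakecase ok

# count limits the number of replacements; only the first match is removed.
first_only = re.sub(r'[^\w\s]', '', text, count=1)
print(first_only)  # snake_case ok!
```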

 

4) The filter() function

The built-in filter() function keeps only the elements of an iterable for which a given function returns True. You can understand it easily with the following example:

import string

text = "Hello, world!"  # avoid naming a variable str, which would shadow the built-in type
punctuations = string.punctuation

def is_not_punctuation(char):
    # True for characters that should be kept
    return char not in punctuations

clean_text = ''.join(filter(is_not_punctuation, text))

print(clean_text)  # output: Hello world

 

Output:

Hello world
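The same idea can be written more compactly with a lambda, and the punctuation can be stored in a set for fast membership tests. This is a sketch, and the name punct_set is illustrative.

```python
import string

# Set lookups take constant time on average.
punct_set = set(string.punctuation)

text = "Hello, world!"
clean_text = ''.join(filter(lambda ch: ch not in punct_set, text))
print(clean_text)  # Hello world
```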

 

What is the quickest method?

In the comparison below, the str.translate() method is the fastest way to remove punctuation from a string. Speed isn't everything, of course, but code that drastically slows down your program will frequently result in a worse user experience.

We can compare all the methods in terms of execution time:

import re, string, timeit

# Named text rather than str, so the built-in str type is not shadowed.
text = "Hi, Welcome to the Favtutor live coding classes -24*7 Expert help availaible. Register Now!!"
punctuations = set(string.punctuation)
table = str.maketrans("", "", string.punctuation)
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_join(s):
    return ''.join(ch for ch in s if ch not in punctuations)

def test_re(s):
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table)

def test_repl(s):
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

# Each statement is executed 1,000,000 times; the total time in seconds is printed.
print("Join       :", timeit.Timer('f(text)', 'from __main__ import text, test_join as f').timeit(1000000))
print("regex     :", timeit.Timer('f(text)', 'from __main__ import text, test_re as f').timeit(1000000))
print("translate :", timeit.Timer('f(text)', 'from __main__ import text, test_trans as f').timeit(1000000))
print("replace   :", timeit.Timer('f(text)', 'from __main__ import text, test_repl as f').timeit(1000000))

 

Output:

Join       : 7.931632200023159
regex     : 2.2031622999347746
translate : 2.171550799976103
replace   : 3.7943124000448734

 

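Hence, in this run the translate() function is the fastest, narrowly ahead of the precompiled regex. Exact timings vary with the machine, the Python version, and the input string.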
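Since single timings can be noisy, one common approach is to repeat the measurement and keep the minimum. The sketch below does that for two of the methods. The function names, the shortened sample text, and the repeat and number values are illustrative choices, and no particular result is implied.

```python
import re
import string
import timeit

text = "Hi, Welcome to the Favtutor live coding classes -24*7. Register Now!!"
table = str.maketrans('', '', string.punctuation)
pattern = re.compile('[%s]' % re.escape(string.punctuation))

def with_translate(s):
    return s.translate(table)

def with_regex(s):
    return pattern.sub('', s)

# Both functions should agree before their speed is compared.
assert with_translate(text) == with_regex(text)

for name, func in (("translate", with_translate), ("regex", with_regex)):
    # repeat() returns one total per run; the minimum is the least noisy.
    runs = timeit.repeat(lambda: func(text), repeat=3, number=10000)
    print(f"{name:10s} {min(runs):.4f} s")
```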


Conclusion

Punctuation often adds noise when processing natural-language strings, so it is commonly removed before the strings are used for further processing. We learned different methods to strip punctuation from strings in Python. Happy Learning :)

