Computers do not understand natural language, and raw text usually carries junk that can mislead textual analysis, so text data is normally cleaned programmatically first. Changing all characters to lowercase, removing punctuation, and eliminating stop words and typos are common ways to get rid of these unhelpful portions of the data, or noise.
Today, we'll look at several methods for removing punctuation from a string in Python, along with code for each.
The results of any text-processing strategy are affected by this crucial NLP preprocessing phase, in which the text is also divided into paragraphs, sentences, and phrases. Adding a text preprocessing layer to tasks such as sentiment analysis, document categorization, and document retrieval based on user queries can improve accuracy.
What are String Punctuations in Python?
Punctuation marks are symbols that give written English its grammatical structure. When processing text, however, it is often necessary to remove or replace them. Which punctuation marks to discard depends on the use case, so the list should be chosen with care.
Before learning how to get rid of them, we need to know how punctuation is defined in Python.
string.punctuation is a predefined constant in Python's string module that contains the ASCII punctuation characters. Besides common marks such as the period, comma, and question mark, it includes quotation marks, brackets, slashes, and other symbols.
It is a convenient way to reference all punctuation characters in one place, rather than having to type out each individual character. It contains the following 32 characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
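As a quick check, the constant can be inspected directly. This is a minimal sketch; the variable name punctuation is only for illustration.

```python
import string

# string.punctuation is an ordinary str, so it can be printed,
# measured, and searched like any other string.
punctuation = string.punctuation
print(punctuation)          # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(len(punctuation))     # 32
print("?" in punctuation)   # True
```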
You can also add other special symbols that you want to discard to the list of punctuation. Example: '©', '®', ' ', '¾', '¡', etc. (The '^' character is already part of string.punctuation, so it does not need to be added.)
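One way to extend the list is to concatenate your own symbols onto string.punctuation. The sketch below assumes a hypothetical set of extra symbols and a made-up sample string; adjust both to your data.

```python
import string

# Hypothetical extra symbols to discard, chosen only for illustration.
extra_symbols = "©®¾¡"
all_punctuation = string.punctuation + extra_symbols

text = "Acme® rocks¡ 100%"
# Keep only the characters that are not in the extended list.
cleaned = "".join(ch for ch in text if ch not in all_punctuation)
print(cleaned)  # Acme rocks 100
```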
How to Remove Punctuations from Strings?
As discussed, removing punctuation from strings is a common task that every programmer should know how to do. Below, we look at 4 different methods for doing it in Python:
1) The Translation Function
It is one of the best ways to easily strip punctuation.
translate() is a string method that replaces characters in a string according to a translation table. To remove punctuation, it is used together with str.maketrans(), which builds that table. The first two arguments of maketrans() are empty strings, and its third argument is a string of all the punctuation marks you want to eliminate from your string.
Syntax: string_name.translate(str.maketrans('', '', string.punctuation))
We use the maketrans() function to create a translation table that maps the punctuation characters in the punctuation constant string to None. We apply the translation table to the sample string using the translate() method, which replaces each character in the string that matches a key in the translation table with its corresponding value, or removes it if the value is None.
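To see what such a translation table looks like, it can be printed. In this small sketch the table is built from just two characters so that the output stays short; the keys are Unicode code points.

```python
# A small table built from just two characters, to keep the output short.
table = str.maketrans("", "", "!?")
print(table)               # {33: None, 63: None}
print(ord("!"), ord("?"))  # 33 63
```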
Let's understand with an example:
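A minimal sketch of the approach described above; the sample sentence is made up for illustration.

```python
import string

sample = "Hello, World! How's it going?"

# Build the table once; every punctuation character maps to None.
table = str.maketrans("", "", string.punctuation)

result = sample.translate(table)
print(result)  # Hello World Hows it going
```

Because the table does not depend on the input, it can be built once and reused for many strings.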
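This method can also be used to replace characters instead of deleting them. If you pass two strings of equal length as the first two arguments of maketrans(), each character in the first string is mapped to the character at the same position in the second. Note that translate() works character by character, so it cannot match whole words.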
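Replacement with translate() might look like the following sketch. The sample strings are invented; the dictionary form of str.maketrans() is shown as one way to map a single character to a longer replacement.

```python
# Two equal-length strings: '-' maps to '/', and ':' maps to '.'.
timestamp = "2024-01-15 10:30"
swap_table = str.maketrans("-:", "/.")
swapped = timestamp.translate(swap_table)
print(swapped)  # 2024/01/15 10.30

# A dict maps single characters to replacement strings of any length.
expand_table = str.maketrans({"&": " and ", "@": " at "})
expanded = "R&D@HQ".translate(expand_table)
print(expanded)  # R and D at HQ
```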
2) Using the Loop & Replace function
This is a basic way of doing the operation with nothing more than a loop and the replace() string method. We iterate over the string, check whether each character is a punctuation mark, and if so replace it with an empty string using replace(). It is a brute-force way to complete the task.
Here is an example:
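The loop described above can be sketched as follows. The function name remove_punctuation_loop is hypothetical, and string.punctuation is assumed as the list of characters to discard.

```python
import string

def remove_punctuation_loop(text):
    """Strip punctuation with a plain loop and str.replace()."""
    cleaned = text
    # Iterate over the original characters; the loop is unaffected
    # by reassigning `cleaned` inside the body.
    for ch in text:
        if ch in string.punctuation:
            # replace() removes every occurrence of ch at once.
            cleaned = cleaned.replace(ch, "")
    return cleaned

loop_result = remove_punctuation_loop("Wow!!! Is this real? Yes, it is.")
print(loop_result)  # Wow Is this real Yes it is
```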
3) Using Regex
Regex (regular expressions) is a powerful tool for pattern matching and manipulation of text, including removing specific characters from a string. Python's re module provides a function named sub(), which searches for a pattern in a string and replaces all occurrences of that pattern with a specified string.
The arguments of the sub() function are pattern, repl, string, count, and flags; the last two are optional.
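A minimal sketch with re.sub(), assuming string.punctuation as the set of characters to remove. The pattern is a character class built with re.escape() so that characters such as ] and \ are treated literally.

```python
import re
import string

# re.escape() makes every punctuation character safe inside [...].
pattern = "[" + re.escape(string.punctuation) + "]"

text = "Hello, World! (regex) rocks."
no_punct = re.sub(pattern, "", text)
print(no_punct)  # Hello World regex rocks

# count=1 limits the substitution to the first match only.
first_only = re.sub(pattern, "", text, count=1)
print(first_only)  # Hello World! (regex) rocks.
```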
4) The filter() function
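The built-in filter() function keeps only the elements of an iterable for which a given condition is true. Applied to a string, it yields the characters that pass the test, and these can be joined back into a string with ''.join().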
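As a sketch, filter() can be combined with a lambda that rejects punctuation. Converting string.punctuation to a set is an optional choice made here for faster membership checks.

```python
import string

# A set gives constant-time membership checks.
punctuation_set = set(string.punctuation)

text = "Filter, this: string!"
# filter() yields the characters for which the lambda returns True.
filtered = "".join(filter(lambda ch: ch not in punctuation_set, text))
print(filtered)  # Filter this string
```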
What is the quickest method?
The str.translate() method is generally the fastest way to remove punctuation from a string in Python. Speed isn't everything, of course, but code that runs drastically slower than it needs to will frequently result in a worse user experience.
We can compare all the methods in terms of their execution time:
Hence, we can say that the translate() method is the fastest.
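One way to run such a comparison yourself is with the standard timeit module. The sketch below uses hypothetical function names, a made-up input, and an arbitrary repetition count; the absolute numbers depend on your machine, Python version, and input, so none are shown here.

```python
import re
import string
import timeit

# Made-up benchmark input: a short sentence repeated 100 times.
TEXT = "Hello, World! How's it going? (Fine, thanks.) " * 100

TABLE = str.maketrans("", "", string.punctuation)
PATTERN = re.compile("[" + re.escape(string.punctuation) + "]")
PUNCT = set(string.punctuation)

def with_translate(text):
    return text.translate(TABLE)

def with_loop(text):
    cleaned = text
    for ch in text:
        if ch in PUNCT:
            cleaned = cleaned.replace(ch, "")
    return cleaned

def with_regex(text):
    return PATTERN.sub("", text)

def with_filter(text):
    return "".join(filter(lambda ch: ch not in PUNCT, text))

methods = (with_translate, with_loop, with_regex, with_filter)
for func in methods:
    # Total time in seconds for 200 calls on the same input.
    elapsed = timeit.timeit(lambda: func(TEXT), number=200)
    print(f"{func.__name__:15s} {elapsed:.4f} s")
```

All four functions return the same cleaned string, so the timings compare like with like.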
Because punctuation is usually noise when processing natural-language strings, we often need to remove it before using the strings for further processing. We learned different methods to strip punctuation from strings in Python. Happy Learning :)