When analyzing and manipulating data with Pandas, you will often need to remove columns from a DataFrame, for example to clean up the data, reduce memory usage, or simplify an analysis. In this article, we will learn how to drop columns from Pandas DataFrames.
Drop() Method to Remove Columns in Pandas
The drop() method in Pandas removes one or more columns from a DataFrame and returns a new DataFrame without them. The original DataFrame is not modified unless you set the inplace parameter to True.
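To make the difference between the default behaviour and inplace=True concrete, here is a minimal sketch. The small frame and the names df, trimmed, and result are illustrative only.

```python
import pandas as pd

# Illustrative frame; the column names are arbitrary.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Default behaviour: drop() returns a new DataFrame and leaves df untouched.
trimmed = df.drop(columns=['B'])
print(list(trimmed.columns))  # ['A', 'C']
print(list(df.columns))       # ['A', 'B', 'C']

# With inplace=True the call returns None and df itself loses the column.
result = df.drop(columns=['B'], inplace=True)
print(result)                 # None
print(list(df.columns))       # ['A', 'C']
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace df with None, so use one style or the other, not both.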
Here is the method's signature:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Here’s a breakdown of the syntax:
- labels: This parameter represents the labels to be dropped. It can be a single label or a list-like object. It specifies the labels to be dropped from either the index (rows) or columns, depending on the value of the axis parameter.
- axis: This parameter determines whether the labels are dropped from the index (rows) or columns. It can take values of 0 or ‘index’ to drop rows, and 1 or ‘columns’ to drop columns.
- index: An alternative to passing labels with axis=0. It specifies the labels to be dropped from the index (rows).
- columns: An alternative to passing labels with axis=1. It specifies the labels to be dropped from the columns.
- level: If the axis is a MultiIndex (hierarchical), this parameter specifies the level from which to drop labels.
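- inplace: When set to True, this parameter modifies the DataFrame in place and the method returns None. If False (the default), the method returns a new DataFrame with the specified labels dropped.
- errors: This parameter controls what happens when a label is not found. With ‘raise’ (the default), a KeyError is raised. With ‘ignore’, the error is suppressed and only the labels that exist are dropped.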
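The parameters above can be tried out on a small frame. The following is a sketch under illustrative assumptions: the frames df and wide, their column names, and the missing label 'Z' are all invented for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# labels + axis=1 and columns= are two spellings of the same operation.
by_axis = df.drop(labels=['B'], axis=1)
by_columns = df.drop(columns=['B'])
print(by_axis.equals(by_columns))  # True

# index= (or labels with axis=0) removes rows instead of columns.
without_first_row = df.drop(index=[0])
print(without_first_row.shape)  # (1, 3)

# errors='raise' (the default) fails on a label that does not exist...
try:
    df.drop(columns=['Z'])
except KeyError as exc:
    print('KeyError:', exc)

# ...while errors='ignore' drops only the labels that are present.
lenient = df.drop(columns=['B', 'Z'], errors='ignore')
print(list(lenient.columns))  # ['A', 'C']

# level= targets one level of a MultiIndex; here, level 1 of the columns.
wide = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_product([['x', 'y'], ['min', 'max']]),
)
no_max = wide.drop(columns='max', level=1)
print(list(no_max.columns))  # [('x', 'min'), ('y', 'min')]
```

Passing everything except labels by keyword, as done here, keeps the calls readable and avoids depending on argument order.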
Now let us look at the different ways of using the .drop() method to remove columns.
Drop a Single Column
The simplest way to remove a single column from a DataFrame is to call the .drop() method with the columns parameter. Let’s consider an example where we have a DataFrame with columns ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’, and we want to remove column ‘B’.
In the example, we create the DataFrame and then call .drop(), passing ‘B’ in a list to the columns parameter. Because .drop() returns a new DataFrame by default, we assign the result back to the name df. This rebinds df to the new DataFrame rather than modifying the original object in place.
Here is the Python program to drop a column from a DataFrame:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10],
        'C': [11, 12, 13, 14, 15],
        'D': [16, 17, 18, 19, 20],
        'E': [21, 22, 23, 24, 25]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Drop the column 'B'
df = df.drop(columns=['B'])

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
    A   B   C   D   E
0  1   6  11  16  21
1  2   7  12  17  22
2  3   8  13  18  23
3  4   9  14  19  24
4  5  10  15  20  25
Modified DataFrame:
    A   C   D   E
0  1  11  16  21
1  2  12  17  22
2  3  13  18  23
3  4  14  19  24
4  5  15  20  25
Removing Multiple Columns
To remove multiple columns from a DataFrame using the .drop() method, pass a list of column names to the columns parameter. Let’s consider an example where we have a DataFrame with columns ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’, and we want to remove both columns ‘C’ and ‘D’.
In the example, we create the DataFrame and then call .drop(), passing ‘C’ and ‘D’ as a list to the columns parameter. As before, we assign the returned DataFrame back to the name df, since .drop() does not modify the original object unless inplace=True is set.
Below is the code to remove multiple columns in a Pandas DataFrame:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10],
        'C': [11, 12, 13, 14, 15],
        'D': [16, 17, 18, 19, 20],
        'E': [21, 22, 23, 24, 25]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Drop columns 'C' and 'D'
df = df.drop(columns=['C', 'D'])

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
    A   B   C   D   E
0  1   6  11  16  21
1  2   7  12  17  22
2  3   8  13  18  23
3  4   9  14  19  24
4  5  10  15  20  25
Modified DataFrame:
    A   B   E
0  1   6  21
1  2   7  22
2  3   8  23
3  4   9  24
4  5  10  25
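When the columns to remove follow a naming pattern, the list passed to the columns parameter can be built programmatically instead of typed out. The sketch below assumes a hypothetical frame whose temporary columns share an invented 'tmp_' prefix; neither the data nor the prefix comes from the example above.

```python
import pandas as pd

# Hypothetical frame; the 'tmp_' prefix is an invented naming convention.
df = pd.DataFrame({
    'id': [1, 2],
    'tmp_a': [0.1, 0.2],
    'tmp_b': [0.3, 0.4],
    'score': [9, 8],
})

# Collect every column whose name starts with the prefix...
to_drop = [name for name in df.columns if name.startswith('tmp_')]
print(to_drop)  # ['tmp_a', 'tmp_b']

# ...and hand the whole list to drop() in a single call.
cleaned = df.drop(columns=to_drop)
print(list(cleaned.columns))  # ['id', 'score']
```

Building the list first also makes it easy to inspect or log which columns are about to be removed before the call is made.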
Removing Columns by Index
Instead of typing out column names, you can also select the columns to remove by their integer positions. This can help when several columns have similar names that are easy to confuse. Note that .drop() itself still works with labels: the positions are first looked up in df.columns to obtain the corresponding names, so if the DataFrame contains duplicate column names, every column that shares a dropped name is removed.
To do this, index df.columns with a list of positions and pass the resulting labels to the columns parameter. Let’s consider an example where we have a DataFrame with columns ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’, and we want to remove columns ‘B’ and ‘C’ by their positions.
In the example, we create the DataFrame and then call .drop(), passing df.columns[[1, 2]] (the labels at positions 1 and 2) to the columns parameter. Finally, we assign the returned DataFrame back to the name df.
Using the code below, you can remove columns by index:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10],
        'C': [11, 12, 13, 14, 15],
        'D': [16, 17, 18, 19, 20],
        'E': [21, 22, 23, 24, 25]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Drop columns 'B' and 'C' by index
df = df.drop(columns=df.columns[[1, 2]])

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
    A   B   C   D   E
0  1   6  11  16  21
1  2   7  12  17  22
2  3   8  13  18  23
3  4   9  14  19  24
4  5  10  15  20  25
Modified DataFrame:
    A   D   E
0  1  16  21
1  2  17  22
2  3  18  23
3  4  19  24
4  5  20  25
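One caveat with this approach is that df.columns[[...]] turns positions into labels before .drop() sees them. The sketch below uses a hypothetical frame with a duplicated column name to show the effect, followed by a strictly positional alternative based on .iloc. The alternative is a swapped-in technique, not part of the example above.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name 'B'.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'B'])

# df.columns[[1]] resolves position 1 to the label 'B', and drop()
# then removes every column carrying that label.
by_label = df.drop(columns=df.columns[[1]])
print(list(by_label.columns))  # ['A']

# A strictly positional alternative: keep every position except the
# ones listed, using iloc.
drop_positions = {1}
keep_positions = [i for i in range(df.shape[1]) if i not in drop_positions]
by_position = df.iloc[:, keep_positions]
print(list(by_position.columns))  # ['A', 'B']
```

When all column names are unique, both approaches give the same result, so the .iloc variant mainly matters for frames with repeated names.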
Alternative: Pop() method
Another way to remove columns in Pandas is by using the .pop() method. Unlike the .drop() method, the .pop() method modifies the original DataFrame in place and returns the removed column as a Series.
Let’s consider an example where we have a DataFrame with columns ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’. We want to remove the column ‘B’ using the .pop() method.
In the example, we will create a DataFrame with columns ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’. We will use the .pop() method to remove the column ‘B’ from the DataFrame. The modified DataFrame will no longer contain the column ‘B’, and the removed column will be returned as a Series.
Check how to use it below:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10],
        'C': [11, 12, 13, 14, 15],
        'D': [16, 17, 18, 19, 20],
        'E': [21, 22, 23, 24, 25]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Remove the column 'B' using .pop()
df.pop('B')

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
    A   B   C   D   E
0  1   6  11  16  21
1  2   7  12  17  22
2  3   8  13  18  23
3  4   9  14  19  24
4  5  10  15  20  25
Modified DataFrame:
    A   C   D   E
0  1  11  16  21
1  2  12  17  22
2  3  13  18  23
3  4  14  19  24
4  5  15  20  25
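The program above discards the value returned by .pop(). As a minimal sketch on an illustrative frame (the name removed is an assumption for the demonstration), the returned Series can be captured and reused:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# pop() removes the column from df and hands it back as a Series.
removed = df.pop('B')
print(type(removed).__name__)  # Series
print(removed.tolist())        # [4, 5, 6]
print(list(df.columns))        # ['A', 'C']

# pop() takes one label at a time; a missing label raises KeyError.
try:
    df.pop('B')
except KeyError:
    print("'B' has already been removed")
```

This makes .pop() a good fit for separating a target column from the feature columns in one step, while .drop() remains the choice for removing several columns at once.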
Conclusion
In this article, we have learned several techniques for dropping one or more columns in Pandas. Removing columns from a DataFrame is a common task in data analysis and manipulation. Use .drop() when you want a new DataFrame without the columns (or set inplace=True to modify the existing one), and use .pop() when you want to remove a single column in place and keep it as a Series.