The Pandas library in Python is widely used for data analysis, and a common task is converting a column to the integer data type. In Pandas, this conversion is handled by the astype() method. In this article, we will explore how to use it to convert one or more columns to integers.
How to convert Pandas column type to int?
This conversion is needed in many situations: to perform mathematical operations on a column, to make the column sort or filter correctly, or to prepare the data for visualization and statistical analysis.
Using the astype() function
The astype() method is the primary tool in Pandas for changing the data type of a column. You specify the desired data type, and it applies the conversion to an entire DataFrame, a single column, or a selected set of columns.
Let us look at an example of how to use the astype() method to convert every column of a DataFrame to the integer data type:
import pandas as pd

# Creating a DataFrame
data = {'Roll Number': [46.0, 35.0, 42.0], 'Age': [25.1, 30.2, 22.2]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Using the astype() method on the whole DataFrame
df = df.astype('int')

# Display the new DataFrame
print('New DataFrame:\n', df)
Output:
Original DataFrame:
Roll Number Age
0 46.0 25.1
1 35.0 30.2
2 42.0 22.2
New DataFrame:
Roll Number Age
0 46 25
1 35 30
2 42 22
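Note that in the output above 25.1 became 25: astype('int') drops the fractional part rather than rounding. The short sketch below illustrates the difference, using hypothetical sample values (not from the example above) chosen to make the truncation visible. If you want the nearest whole number, one option is to call round() before converting.

```python
import pandas as pd

# Hypothetical sample values chosen to make truncation visible
ages = pd.Series([25.9, 30.2, -1.7], name='Age')

# astype('int') drops the fractional part (truncates toward zero)
truncated = ages.astype('int')

# Rounding first gives the nearest whole number instead
rounded = ages.round().astype('int')

print(truncated.tolist())  # [25, 30, -1]
print(rounded.tolist())    # [26, 30, -2]
```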
Converting a Single Column to Int
To convert a single column to the integer data type, we can use the astype() method. First, access the column using the column name and then apply the conversion. Let us see an example:
import pandas as pd

# Creating a DataFrame
data = {'Roll Number': [46.0, 35.0, 42.0], 'Age': [25.1, 30.2, 22.2]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Using the astype() method on the 'Age' column only
df['Age'] = df['Age'].astype('int')

# Display the new DataFrame
print('New DataFrame:\n', df)
Output:
Original DataFrame:
Roll Number Age
0 46.0 25.1
1 35.0 30.2
2 42.0 22.2
New DataFrame:
Roll Number Age
0 46.0 25
1 35.0 30
2 42.0 22
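The same pattern applies when a column holds whole numbers stored as text, which is a common reason arithmetic on a column does not behave as expected. The sketch below is only an illustration: the DataFrame and its 'Name' and 'Age' values are made up, and it assumes every string in the column is a valid whole number.

```python
import pandas as pd

# Hypothetical data: ages stored as text rather than numbers
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': ['25', '30', '22']})

# Convert only the 'Age' column; the other column keeps its type
df['Age'] = df['Age'].astype('int')

# The column now supports numeric operations
print(df['Age'].sum())  # 77
print(pd.api.types.is_integer_dtype(df['Age']))  # True
```

If any value cannot be parsed as an integer, astype() raises an error instead of converting the column.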
Converting Multiple Columns to Int
To convert several columns at once, pass a dictionary to the astype() method. The keys of the dictionary are the column names, and the values are the desired data types. Columns that are not listed in the dictionary keep their current type. Let us see an example:
import pandas as pd

# Creating a DataFrame
data = {'Roll Number': [46.0, 35.0, 42.0], 'Age': [25.1, 30.2, 22.2]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Using the astype() method with a {column: dtype} dictionary
convert_dict = {'Roll Number': int, 'Age': int}
df = df.astype(convert_dict)

# Display the new DataFrame
print('New DataFrame:\n', df)
Output:
Original DataFrame:
Roll Number Age
0 46.0 25.1
1 35.0 30.2
2 42.0 22.2
New DataFrame:
Roll Number Age
0 46 25
1 35 30
2 42 22
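When many columns need converting, the dictionary can be built from a list of column names instead of being typed by hand. The sketch below assumes a hypothetical extra 'Score' column (not part of the example above) to show that unlisted columns are left untouched, and it prints df.dtypes to confirm the result.

```python
import pandas as pd

data = {
    'Roll Number': [46.0, 35.0, 42.0],
    'Age': [25.1, 30.2, 22.2],
    'Score': [88.5, 91.0, 79.5],  # hypothetical column that should stay float
}
df = pd.DataFrame(data)

# Columns to convert; 'Score' is deliberately not in this list
int_columns = ['Roll Number', 'Age']

# Build the {column: dtype} dictionary from the list
df = df.astype({col: 'int64' for col in int_columns})

# Check the resulting data type of every column
print(df.dtypes)
```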
Handling Missing Values during Conversion
Missing (NaN) values must be handled before converting a column to an integer type. NaN cannot be represented as a standard integer, so astype(int) raises an error when it encounters one. One way to handle this is to replace the missing values first with the fillna() method, choosing a fill value that makes sense for your data.
Here is a simple example:
import pandas as pd

# Creating a DataFrame with a missing value in 'Age'
data = {'Roll Number': [46.0, 35.0, 42.0], 'Age': [25.1, 30.2, None]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Using fillna() to replace NaN values with 0 before converting
df['Age'] = df['Age'].fillna(0)

# Using the astype() method
convert_dict = {'Roll Number': int, 'Age': int}
df = df.astype(convert_dict)

# Display the new DataFrame
print('New DataFrame:\n', df)
Output:
Original DataFrame:
Roll Number Age
0 46.0 25.1
1 35.0 30.2
2 42.0 NaN
New DataFrame:
Roll Number Age
0 46 25
1 35 30
2 42 0
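Filling with 0 changes the data, which may not always be acceptable (an age of 0 is not the same as an unknown age). As an alternative to the approach above, the sketch below first shows the error raised by a direct conversion and then uses the nullable 'Int64' data type (note the capital I), which keeps the missing value as <NA>. The values are rounded first on the assumption that pandas refuses to cast floats with a fractional part to 'Int64'.

```python
import pandas as pd

ages = pd.Series([25.1, 30.2, None], name='Age')

# Converting directly fails because NaN has no integer representation
try:
    ages.astype('int')
    failed = False
except ValueError as exc:
    failed = True
    print('Conversion failed:', exc)

# Alternative: round first, then use the nullable 'Int64' dtype,
# which stores the missing value as <NA> instead of replacing it
nullable_ages = ages.round().astype('Int64')
print(nullable_ages)
```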
Conclusion
In this article, we learned how to convert a column type to an integer in Pandas. We discussed using the astype() method to convert a whole DataFrame, a single column, or multiple columns to the integer data type, and how to handle missing values before the conversion.