Pandas is a widely used Python library for data manipulation and analysis, with tools that come in handy across data science projects. In this article, we will explore different methods to change column types in Pandas DataFrames and discuss their use cases.
4 Methods to Change Pandas Column Type
One common task when working with data is to change the column types in a DataFrame. This can be useful for various reasons, such as converting string columns to numeric types or changing the data type to a more appropriate one.
Here are 4 easy methods to change the type of one or more columns in a Pandas DataFrame:
1) Using DataFrame.astype() Method
The DataFrame.astype() method casts a Pandas object, such as a DataFrame or a Series, to a specified data type. It accepts either a single data type, which is applied to every column, or a mapping of column name to data type, which converts only the listed columns.
Let us change every column of a DataFrame to a string type with the following Python code:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 1.0, 1.3, 2, 5]
})

# Display the DataFrame
print('Original DataFrame:\n', df)

# Display the type of the original DataFrame
print('Type of original DataFrame:\n', df.dtypes)

# Change the column types of the DataFrame to string objects
df = df.astype(str)

# Display the type of the New DataFrame
print('Type of new DataFrame:\n', df.dtypes)
Output:
Original DataFrame:
A B C
0 1 a 1.1
1 2 b 1.0
2 3 c 1.3
3 4 d 2.0
4 5 e 5.0
Type of original DataFrame:
A int64
B object
C float64
dtype: object
Type of new DataFrame:
A object
B object
C object
dtype: object
To learn more about astype(), refer to the pandas documentation for DataFrame.astype().
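The mapping form of astype() mentioned above can be sketched as follows. This is a minimal illustration rather than a recommended recipe: the target types float64 and float32 are arbitrary choices made for the example, and the variable name converted is our own.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 1.0, 1.3, 2, 5]
})

# Pass a mapping of column name -> data type to convert only those columns.
# Column B is not listed, so its type is left as it was.
converted = df.astype({'A': 'float64', 'C': 'float32'})

print('Types after selective conversion:\n', converted.dtypes)
```

Because astype() returns a new object, the original df keeps its types unless the result is assigned back to it.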
2) Using DataFrame.apply() Method
We can also use the DataFrame.apply() method to convert columns in a DataFrame.
The DataFrame.apply() method, combined with functions like pd.to_numeric(), pd.to_datetime(), and pd.to_timedelta(), allows us to change column types to numeric, datetime, or timedelta types, respectively.
Let us try an example:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, '4', '5'],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, '2.1', 3.0, '4.1', '5.1']
})

# Display the DataFrame
print('Original DataFrame:\n', df)

# Display the type of the original DataFrame
print('Type of original DataFrame:\n', df.dtypes)

# Change the column types of the columns of the DataFrame
df[['A', 'C']] = df[['A', 'C']].apply(pd.to_numeric)

# Display the type of the New DataFrame
print('Type of new DataFrame:\n', df.dtypes)
Output:
Original DataFrame:
A B C
0 1 a 1.1
1 2 b 2.1
2 3 c 3.0
3 4 d 4.1
4 5 e 5.1
Type of original DataFrame:
A object
B object
C object
dtype: object
Type of new DataFrame:
A int64
B object
C float64
dtype: object
3) Using DataFrame.infer_objects() Method
The DataFrame.infer_objects() method performs a soft conversion: it attempts to infer a more appropriate data type for each object-type column. It is particularly useful when a column was created or loaded with the object type even though all of its values share one underlying type, such as Python integers or floats. It does not parse strings, so a column of numeric strings is not converted to a numeric type; for that, use pd.to_numeric().
Let us try an example:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.1, 3.0, 4.1, 5.1]
}, dtype='object')

# Display the DataFrame
print('Original DataFrame:\n', df)

# Display the type of the original DataFrame
print('Type of original DataFrame:\n', df.dtypes)

# Change the column types of the columns of the DataFrame
df = df.infer_objects()

# Display the type of the New DataFrame
print('Type of new DataFrame:\n', df.dtypes)
Output:
Original DataFrame:
A B C
0 1 a 1.1
1 2 b 2.1
2 3 c 3.0
3 4 d 4.1
4 5 e 5.1
Type of original DataFrame:
A object
B object
C object
dtype: object
Type of new DataFrame:
A int64
B object
C float64
dtype: object
4) Using DataFrame.convert_dtypes() Method
The DataFrame.convert_dtypes() method is available in Pandas version 1.0 and above. It converts each column of a DataFrame to the best possible data type that supports the pd.NA missing value, which saves us from choosing a nullable type for every column by hand.
Here is how to do it in Python:
import pandas as pd

data = {
    "name": ["Aman", "Hardik", pd.NA],
    "qualified": [True, False, pd.NA]
}
df = pd.DataFrame(data)

# Display the DataFrame
print('Original DataFrame:\n', df)

# Display the type of the original DataFrame
print('Type of original DataFrame:\n', df.dtypes)

# Change the column types of the columns of the DataFrame
df = df.convert_dtypes()

# Display the type of the New DataFrame
print('Type of new DataFrame:\n', df.dtypes)
Output:
Original DataFrame:
name qualified
0 Aman True
1 Hardik False
2 <NA> <NA>
Type of original DataFrame:
name object
qualified object
dtype: object
Type of new DataFrame:
name string
qualified boolean
dtype: object
Conclusion
In this article, we explored four methods to change the type of a column in a Pandas DataFrame: astype() for explicit casts, apply() with the pd.to_*() functions for parsing values, infer_objects() for soft conversion of object columns, and convert_dtypes() for nullable types that support pd.NA. Each method has its own use cases and benefits. By understanding them, you can convert columns to the data types your analysis needs.