When working with data, it’s common to encounter null values, which are represented as NaN (Not a Number) in Pandas DataFrames. To address this issue, Pandas provides the fillna() method, which allows users to replace NaN values with their desired values. In this article, we will explore the various parameters and methods available in Pandas DataFrame’s fillna() method and how to use it effectively.
What is the fillna() method?
The fillna() method in Pandas DataFrame is used to replace NaN values with a specified value. It’s a common operation when working with data, as missing values can create issues during analysis or modeling. It provides flexibility in handling missing data and allows users to customize the replacement process.
Alternatively, missing values can be handled by dropping the rows or columns that contain them with the dropna() method.
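As a point of comparison, here is a minimal sketch of the dropping approach using dropna(). The column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from a real dataset)
df = pd.DataFrame({
    "name": ["John", "Jane", "Jade"],
    "age": [25, 30, 35],
    "city": [None, "London", "Paris"],
})

# Drop every column that contains at least one missing value
without_columns = df.dropna(axis=1)

# Drop every row that contains at least one missing value
without_rows = df.dropna(axis=0)

print(without_columns.columns.tolist())  # ['name', 'age']
print(len(without_rows))                 # 2
```

Dropping discards the non-missing values in the same row or column as well, which is why filling is often preferred when data is scarce.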
The syntax for the fillna() method is as follows:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Here’s a breakdown of the syntax:
- value: This parameter specifies the value to replace the NaN values with. It can be a scalar, a dictionary, a Series, or another DataFrame. A dictionary or Series maps each column name to its own replacement value. A plain list is not accepted.
- method: If the value parameter is not provided, the method parameter can be used to fill missing values from neighbouring entries. Accepted values are ‘ffill’ (alias ‘pad’), ‘bfill’ (alias ‘backfill’), or None. Since pandas 2.1 this parameter is deprecated in favour of the dedicated ffill() and bfill() methods.
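- axis: The axis along which missing values are filled: 0 (or ‘index’) fills down each column, 1 (or ‘columns’) fills across each row. The default, None, behaves like 0. The choice matters mainly for directional fills such as forward and backward fill; a scalar replacement gives the same result on either axis.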
- inplace: This boolean parameter determines whether the changes should be made directly to the original DataFrame. If set to True, the original DataFrame will be modified, while False (default) will return a new DataFrame with the replacements.
- limit: When a fill method is used, limit is the maximum number of consecutive NaN values to fill forward or backward. When a plain value is used instead, it is the maximum number of NaN entries filled along the entire axis.
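- downcast: This parameter takes a dictionary mapping columns to dtypes, or the string ‘infer’, which asks pandas to downcast the result to an appropriate equivalent type where possible (for example, float64 to int64). Recent pandas 2.x releases mark this parameter as deprecated, so it is usually best left at its default.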
- kwargs: The signature shown above does not include a catch-all for extra keyword arguments, so in practice only the parameters listed here apply.
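To make the value parameter more concrete, the following sketch passes a dictionary so that each column gets its own replacement. The column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from a real dataset)
df = pd.DataFrame({
    "age": [25, None, 35],
    "city": ["New York", None, "Paris"],
    "score": [None, 8.5, 9.0],
})

# A dict maps column names to their own replacement values.
# "score" is not listed, so its missing value is left untouched.
filled = df.fillna(value={"age": 0, "city": "Unknown"})

print(filled)
```

Because inplace is left at its default of False, df itself still contains its original NaN values after this call.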
Now let’s look at the different ways to use the fillna() method.
Replace the NaN with a Static Value
One common scenario is replacing NaN values with a static value. This is useful when you want to replace all missing values with a specific value throughout the DataFrame.
Let’s consider an example where we have a DataFrame df with a column named “city” that contains some NaN or NULL values. We can use the fillna() method to replace these NaN values with a static value.
In the example, we apply the fillna() method to the “city” column of the df DataFrame and replace its NaN values with the string “No City”. The result is assigned back to df["city"]. Calling fillna(..., inplace=True) on a selected column is discouraged in pandas 2.x, because the selection may be a temporary copy, in which case df itself is left unchanged.
The Python example below replaces NaN with a static value using fillna():
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, 30, 35, 40],
        'city': [None, 'London', 'Paris', None]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Replace missing values in "city" with a static value and assign the result back
df["city"] = df["city"].fillna("No City")

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25 None
1 Jane 30 London
2 Jade 35 Paris
3 Jan 40 None
Modified DataFrame:
name age city
0 John 25 No City
1 Jane 30 London
2 Jade 35 Paris
3 Jan 40 No City
Using the Method Parameter
The method parameter in the fillna() method fills missing values by propagating neighbouring values instead of using a fixed replacement. Accepted values are ‘ffill’ (alias ‘pad’), ‘bfill’ (alias ‘backfill’), or None. Since pandas 2.1 this parameter is deprecated in favour of the dedicated ffill() and bfill() methods, which behave the same way.
In the example below, we forward fill the “city” column of the df DataFrame with ffill(), the equivalent of fillna(method='ffill').
This method replaces the NaN values with the previous non-null value in the same column. Similarly, you can use bfill() (backward fill) to replace NaN values with the next non-null value in the column.
Here is an example to better understand it:
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, 30, 35, 40],
        'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Forward fill: replace each missing value with the previous non-null value.
# Older code writes this as df["city"].fillna(method='ffill').
df["city"] = df["city"].ffill()

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25 New York
1 Jane 30 None
2 Jade 35 Paris
3 Jan 40 None
Modified DataFrame:
name age city
0 John 25 New York
1 Jane 30 New York
2 Jade 35 Paris
3 Jan 40 Paris
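For comparison, the sketch below runs a forward fill and a backward fill on the same made-up Series. The variable names are illustrative.

```python
import pandas as pd

# Illustrative data (an assumption): gaps in the middle and at the end
city = pd.Series(["New York", None, "Paris", None])

forward = city.ffill()   # copy the previous valid value downwards
backward = city.bfill()  # copy the next valid value upwards

print(forward.tolist())  # ['New York', 'New York', 'Paris', 'Paris']
print(backward)
```

In the backward fill, the gap at index 1 becomes “Paris”, while the last entry stays missing because no later value exists to copy from. A forward fill has the same limitation for gaps at the start.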
Using the Axis Parameter
The axis parameter in the fillna() method determines the axis along which values are filled. The default behaves like axis=0, which fills down each column, while axis=1 fills across each row. Let’s consider an example to illustrate the usage of the axis parameter.
In the example below, the fillna() method replaces the NaN values in the “city” column with the string “Unknown” while passing axis=1. Because the replacement is a single scalar, every NaN is replaced regardless of the axis, so axis=0 would give the same result here. The axis makes a visible difference with directional fills such as forward or backward fill.
Here is an example in Python that uses the axis parameter:
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, 30, 35, 40],
        'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Fill missing values with a scalar while passing axis=1
df = df.fillna(value='Unknown', axis=1)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25 New York
1 Jane 30 None
2 Jade 35 Paris
3 Jan 40 None
Modified DataFrame:
name age city
0 John 25 New York
1 Jane 30 Unknown
2 Jade 35 Paris
3 Jan 40 Unknown
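With a directional fill, the axis does change the result. The sketch below forward fills the same made-up numeric DataFrame along each axis. The column names q1 to q3 are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): one gap in each row
df = pd.DataFrame({
    "q1": [1.0, 4.0],
    "q2": [np.nan, 5.0],
    "q3": [3.0, np.nan],
})

down = df.ffill(axis=0)    # each gap takes the value from the row above
across = df.ffill(axis=1)  # each gap takes the value from the column to its left

print(down)
print(across)
```

With axis=0, the gap in q2 stays NaN because there is no row above it, and the gap in q3 becomes 3.0. With axis=1, the gap in q2 becomes 1.0 (taken from q1) and the gap in q3 becomes 5.0 (taken from q2).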
Modifying the DataFrame Inplace
The fillna() method provides the flexibility to modify the DataFrame inplace or return a new DataFrame with the replacements. By default, the inplace parameter is set to False, indicating that a new DataFrame will be returned. However, if you want to modify the original DataFrame directly, you can set the inplace parameter to True. Let’s see an example to demonstrate this behavior.
In the example below, the fillna() method is called on the whole df DataFrame with a dictionary that targets only the “city” column. By setting the inplace parameter to True, the changes are made directly to the original DataFrame.
Here is an example:
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, 30, 35, 40],
        'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Fill missing values in "city" by modifying the DataFrame in place
df.fillna(value={"city": "No City"}, inplace=True)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25 New York
1 Jane 30 None
2 Jade 35 Paris
3 Jan 40 None
Modified DataFrame:
name age city
0 John 25 New York
1 Jane 30 No City
2 Jade 35 Paris
3 Jan 40 No City
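The default behaviour (inplace=False) can be sketched as follows. The data is made up, and the variable name filled is illustrative.

```python
import pandas as pd

# Illustrative data (an assumption)
df = pd.DataFrame({"city": ["New York", None, "Paris", None]})

# Default (inplace=False): a new DataFrame is returned and df is unchanged
filled = df.fillna({"city": "No City"})

print(df["city"].isna().sum())      # 2
print(filled["city"].isna().sum())  # 0

# Reassigning the result is a common alternative to inplace=True
df = df.fillna({"city": "No City"})
print(df["city"].isna().sum())      # 0
```

Reassigning keeps each step explicit and works the same way whether fillna() is called on a DataFrame or on a single column.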
Limiting the Number of Replacements
The limit parameter in the fillna() method allows you to specify the maximum number of consecutive NaN values to fill if the method parameter is used. This parameter is useful when you want to limit the number of replacements to a specific number. Let’s consider an example to understand the usage of the limit parameter.
In the example below, the “city” column of the df DataFrame is forward filled with the limit parameter set to 1, which means that at most one NaN in each consecutive run is replaced. The second NaN in the run stays missing.
This can be helpful when you want to control the extent of filling missing values. Check the code below:
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, 30, 35, 40],
        'city': ['New York', None, None, 'Paris']}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Forward fill at most one consecutive missing value.
# Older code writes this as df["city"].fillna(method='ffill', limit=1).
df["city"] = df["city"].ffill(limit=1)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25 New York
1 Jane 30 None
2 Jade 35 None
3 Jan 40 Paris
Modified DataFrame:
name age city
0 John 25 New York
1 Jane 30 New York
2 Jade 35 None
3 Jan 40 Paris
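The limit parameter also works with a plain replacement value, where it caps the total number of NaN entries filled in each column rather than the length of a consecutive run. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): three missing scores
df = pd.DataFrame({"score": [np.nan, 7.0, np.nan, np.nan]})

# Fill with 0, but replace at most two NaN entries in the column
filled = df.fillna(0, limit=2)

print(filled["score"].tolist())  # [0.0, 7.0, 0.0, nan]
```

The first two NaN entries are replaced even though they are not adjacent, and the third one is left as NaN.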
Additional Parameters
The parameters discussed above can also be combined in a single call, and the replacement value does not have to be a constant: it can be computed from the data itself, for example from a column’s mean.
Let’s consider an example that combines several parameters.
In the example below, the NaN values in the “age” column of the df DataFrame are replaced with the mean age of that column, with inplace set to True and axis set to 0. Here is the Python code:
import pandas as pd

data = {'name': ['John', 'Jane', 'Jade', 'Jan'],
        'age': [25, None, 35, None],
        'city': ['New York', 'London', 'Delhi', 'Paris']}
df = pd.DataFrame(data)

# Display the original DataFrame
print('Original DataFrame:\n', df)

# Fill missing ages with the column mean; mean() skips NaN values by default
df.fillna(value={"age": df["age"].mean()}, inplace=True, axis=0)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)
Output:
Original DataFrame:
name age city
0 John 25.0 New York
1 Jane NaN London
2 Jade 35.0 Delhi
3 Jan NaN Paris
Modified DataFrame:
name age city
0 John 25.0 New York
1 Jane 30.0 London
2 Jade 35.0 Delhi
3 Jan 30.0 Paris
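The same idea extends to several columns at once. In the sketch below, which uses made-up data, a Series of column means is passed as the value so that each numeric column is filled with its own mean.

```python
import pandas as pd

# Illustrative data (an assumption)
df = pd.DataFrame({
    "age": [25, None, 35, None],
    "height": [170.0, 180.0, None, 160.0],
    "city": ["New York", "London", None, "Paris"],
})

# mean(numeric_only=True) returns a Series indexed by column name,
# so each numeric column is filled with its own mean; "city" is skipped
filled = df.fillna(df.mean(numeric_only=True))

print(filled)
```

Here the missing ages become 30.0 and the missing height becomes 170.0, while the missing city is left as it is because no replacement was supplied for that column.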
Conclusion
In this article, we explored the fillna() method in Pandas DataFrame, which provides a powerful tool for replacing NaN values with desired values. We discussed the various parameters available in the fillna() method, including value, method, axis, inplace, limit and other parameters. We also provided examples and use cases to demonstrate the practical application of it. By using the fillna() method effectively, you can handle missing data and ensure the completeness of your data analysis.