Pandas DataFrame fillna() Method (with Examples)

When working with data, it’s common to encounter missing values, which Pandas represents as NaN (Not a Number) or None. To deal with them, Pandas provides the fillna() method, which replaces missing values with values of your choice. In this article, we will explore the parameters of the Pandas DataFrame fillna() method and how to use it effectively.

What is the fillna() method?

The fillna() method in Pandas DataFrame is used to replace NaN values with a specified value. It’s a common operation when working with data, as missing values can create issues during analysis or modeling. It provides flexibility in handling missing data and allows users to customize the replacement process.

By the way, we can also handle missing values by deleting the rows or columns that contain them, using the dropna() method.
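
As a rough sketch of the deletion approach, the snippet below uses dropna() on a small made-up DataFrame (the column names and values are illustrative only). It shows dropping columns that contain a missing value and, for comparison, dropping rows instead.

```python
import numpy as np
import pandas as pd

# Illustrative data: only the "city" column has a missing value.
df = pd.DataFrame({
    "name": ["John", "Jane", "Jade"],
    "age": [25, 30, 35],
    "city": ["London", np.nan, "Paris"],
})

# Drop every column that contains at least one missing value.
without_missing_columns = df.dropna(axis=1)

# Drop every row that contains at least one missing value instead.
without_missing_rows = df.dropna(axis=0)

print(without_missing_columns.columns.tolist())
print(len(without_missing_rows))
```

Deleting discards data, so fillna() is usually preferable when the remaining values in the row or column are still useful.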

The syntax of the fillna() method is as follows:

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Here’s a breakdown of the syntax:

value: This parameter specifies the value to replace the NaN values with. It can be a scalar, a dictionary, a Series, or another DataFrame. A list is not accepted.
method: If the value parameter is not provided, the method parameter specifies how the missing values are filled: ‘pad’ or ‘ffill’ propagates the previous valid value forward, and ‘backfill’ or ‘bfill’ uses the next valid value. The default is None. This parameter is deprecated since pandas 2.1 in favor of the ffill() and bfill() methods.
axis: The axis along which missing values are filled. With axis=0 (the default behaviour for a DataFrame) values are filled down each column, and with axis=1 they are filled across each row.
inplace: This boolean parameter determines whether the changes should be made directly to the original DataFrame. If set to True, the original DataFrame will be modified, while False (default) will return a new DataFrame with the replacements.
limit: If the method parameter is used, limit is the maximum number of consecutive NaN values to fill. If method is not used, it is the maximum number of NaN entries to fill along the entire axis.
downcast: This parameter allows for downcasting the data types of the DataFrame. It takes a dictionary or the string ‘infer’ to automatically downcast to an appropriate equal type. It is deprecated since pandas 2.2.
kwargs: Some older pandas releases listed additional keyword arguments in the signature. The signature shown above has none, and recent releases reject unknown keyword arguments.
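
To make the value parameter more concrete, here is a minimal sketch that passes a dictionary so that each column gets its own replacement. The DataFrame and the replacement values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values in both columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0],
    "city": ["London", np.nan, np.nan],
})

# A dict maps column names to the replacement value for that column.
filled = df.fillna(value={"age": 0, "city": "Unknown"})

print(filled)
```

Because inplace is left at its default of False, df itself is unchanged and the filled copy is returned.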

Now, let us see the various ways to use the fillna() method.

Replace the NaN with a Static Value

One common scenario is replacing NaN values with a static value. This is useful when you want to replace all missing values with a specific value throughout the DataFrame.

Let’s consider an example where we have a DataFrame df with a column named “city” that contains some NaN or NULL values. We can use the fillna() method to replace these NaN values with a static value.

In the example, we apply the fillna() method to the “city” column of the df DataFrame. The NaN values in this column are replaced with the string “No City”, and the result is assigned back to the column so that df is updated.

Check the Python example below to replace NaN with a static value using the fillna() method:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, 30, 35, 40], 'city': [None, 'London', 'Paris', None]}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Use fillna to replace the missing values with a static value.
# Assigning the result back works across pandas versions; calling
# fillna(inplace=True) on df["city"] can act on a temporary copy in newer versions.
df["city"] = df["city"].fillna("No City")

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name  age    city
0  John   25    None
1  Jane   30  London
2  Jade   35   Paris
3   Jan   40    None


Modified DataFrame:
    name  age     city
0  John   25  No City
1  Jane   30   London
2  Jade   35    Paris
3   Jan   40  No City
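
The same static value can also be applied to the whole DataFrame rather than to one column. The sketch below assumes a small illustrative DataFrame with an extra “country” column that is not part of the example above.

```python
import pandas as pd

# Illustrative data: two text columns contain missing values.
df = pd.DataFrame({
    "name": ["John", "Jane", "Jade"],
    "city": [None, "London", None],
    "country": ["UK", None, "France"],
})

# A scalar passed to DataFrame.fillna() is applied to every column
# that has missing values.
filled = df.fillna("Unknown")

print(filled)
```

This is convenient when one placeholder suits every column. When columns need different placeholders, a dictionary keyed by column name is the safer choice.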

Using the Method Parameter

The method parameter of fillna() fills missing values by propagating neighbouring values instead of using a fixed value. The accepted options are ‘pad’ or ‘ffill’ (forward fill), ‘backfill’ or ‘bfill’ (backward fill), and None. Since pandas 2.1 this parameter is deprecated, and the dedicated ffill() and bfill() methods are the recommended equivalents.

In the example below, we forward fill the “city” column of the df DataFrame.

Forward fill replaces each NaN value with the previous non-null value in the same column. Similarly, backward fill (‘bfill’) replaces NaN values with the next non-null value in the column.

Here is an example to better understand it:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, 30, 35, 40], 'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Forward fill: replace each missing value with the previous non-null value.
# ffill() is the current spelling of fillna(method='ffill'), which is deprecated since pandas 2.1.
df["city"] = df["city"].ffill()

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30      None
2  Jade   35     Paris
3   Jan   40      None


Modified DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30  New York
2  Jade   35     Paris
3   Jan   40     Paris
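
For comparison, here is a small sketch of forward fill next to backward fill on the same “city” values, using the ffill() and bfill() methods. The variable names are illustrative.

```python
import pandas as pd

cities = pd.Series(["New York", None, "Paris", None])

# Forward fill carries the previous non-null value forward.
forward = cities.ffill()

# Backward fill pulls the next non-null value backward.
backward = cities.bfill()

print(forward.tolist())
print(backward.tolist())
```

Note that backward fill leaves the last entry missing, because there is no later value to copy from. Forward fill has the same limitation for a missing value at the start.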

Using the Axis Parameter

The axis parameter in the fillna() method determines the direction in which values are filled. With axis=0 (the default) values are filled down each column, and with axis=1 they are filled across each row. The direction matters mainly for method-based fills such as forward fill; when a single static value is supplied, every NaN in the DataFrame is replaced whichever axis is chosen. Let’s consider an example to illustrate the usage of the axis parameter.

In the below example, the fillna() method replaces the NaN values with the string “Unknown” while passing axis=1. Only the “city” column contains NaN values, so only that column changes.

Here is an example in Python that uses the axis parameter:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, 30, 35, 40], 'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Use fillna with axis=1; with a static value the same cells are filled as with axis=0
df = df.fillna(value='Unknown', axis=1)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30      None
2  Jade   35     Paris
3   Jan   40      None


Modified DataFrame:
    name age      city
0  John  25  New York
1  Jane  30   Unknown
2  Jade  35     Paris
3   Jan  40   Unknown
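
The effect of axis is easier to see with a forward fill on numeric data. The following sketch uses a made-up DataFrame of scores (the column names q1 to q3 are hypothetical) and fills it in both directions.

```python
import numpy as np
import pandas as pd

# Illustrative numeric data with gaps in every column.
scores = pd.DataFrame({
    "q1": [1.0, np.nan, 3.0],
    "q2": [np.nan, 5.0, np.nan],
    "q3": [7.0, np.nan, 9.0],
})

# axis=0: each gap takes the value from the row above, within the same column.
down_columns = scores.ffill(axis=0)

# axis=1: each gap takes the value from the column to its left, within the same row.
across_rows = scores.ffill(axis=1)

print(down_columns)
print(across_rows)
```

The two results differ: filling down leaves the first entry of q2 missing, while filling across leaves the middle entry of q1 missing, since neither has an earlier value along its axis.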

Modifying the DataFrame Inplace

The fillna() method can either modify the DataFrame in place or return a new DataFrame with the replacements. By default, the inplace parameter is set to False, so a new DataFrame is returned and the original is left untouched. If you want to modify the original DataFrame directly, set the inplace parameter to True; in that case there is no need to assign the result. Let’s see an example to demonstrate this behavior.

In the below example, the fillna() method is called on the df DataFrame with a dictionary that targets the “city” column. By setting the inplace parameter to True, the changes are made directly to the original DataFrame.

Here is an example:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, 30, 35, 40], 'city': ['New York', None, 'Paris', None]}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Use fillna to fill missing values by modifying the DataFrame inplace
df.fillna(value={"city": "No City"}, inplace=True)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30      None
2  Jade   35     Paris
3   Jan   40      None


Modified DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30   No City
2  Jade   35     Paris
3   Jan   40   No City
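
The difference between the two modes can be checked by counting missing values before and after each call. This is a minimal sketch on illustrative data; the variable names are not from the example above.

```python
import pandas as pd

df = pd.DataFrame({"city": ["New York", None, "Paris"]})

# Default (inplace=False): a filled copy is returned, df keeps its gap.
filled = df.fillna({"city": "No City"})
missing_before = int(df["city"].isna().sum())

# inplace=True: df itself is modified, so the result is not assigned.
df.fillna({"city": "No City"}, inplace=True)
missing_after = int(df["city"].isna().sum())

print(missing_before, missing_after)
```

Reassigning the returned copy (df = df.fillna(...)) achieves the same end result as inplace=True and tends to be easier to follow in longer chains of operations.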

Limiting the Number of Replacements

The limit parameter in the fillna() method allows you to specify the maximum number of consecutive NaN values to fill if the method parameter is used. This parameter is useful when you want to limit the number of replacements to a specific number. Let’s consider an example to understand the usage of the limit parameter.

In the below example, the fillna() method is applied to the “city” column of the df DataFrame using the ‘ffill’ method. The limit parameter is set to 1, which means only one consecutive NaN value will be replaced.

This can be helpful when you want to control the extent of filling missing values. Check the code below:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, 30, 35, 40], 'city': ['New York', None, None, 'Paris']}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Use forward fill to fill at most one consecutive missing value
# (ffill(limit=1) is the current spelling of fillna(method='ffill', limit=1))
df["city"] = df["city"].ffill(limit=1)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30      None
2  Jade   35      None
3   Jan   40     Paris


Modified DataFrame:
    name  age      city
0  John   25  New York
1  Jane   30  New York
2  Jade   35      None
3   Jan   40     Paris
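
The limit parameter also works together with a static value, in which case it caps the total number of NaN entries that are filled rather than the length of a consecutive run. A small sketch with made-up readings:

```python
import numpy as np
import pandas as pd

readings = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# With a value (no method), limit caps how many NaN entries are filled in total.
capped = readings.fillna(0.0, limit=2)

print(capped.tolist())
```

Here the first two gaps are filled and the third is left as NaN, because the limit of two replacements has already been reached.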

Additional Parameters

Apart from being used one at a time, the parameters discussed above can be combined in a single call, and the replacement value can be computed from the data itself. Note that fillna() in recent pandas releases does not accept arbitrary extra keyword arguments (kwargs); passing an unknown keyword raises a TypeError.

Let’s consider an example that combines several parameters.

In the below example, the fillna() method is called on the df DataFrame with a dictionary that targets the “age” column. The NaN values in this column are replaced with the mean of the column (NaN values are skipped when the mean is computed), and the axis parameter is set to 0 to fill down the column. Here is the Python code:

import pandas as pd
data = {'name': ['John', 'Jane', 'Jade', 'Jan'], 'age': [25, None, 35, None], 'city': ['New York', 'London', 'Delhi', 'Paris']}
df = pd.DataFrame(data)
# Display the original DataFrame
print('Original DataFrame:\n', df)

# Use fillna to replace the missing values with the column mean
df.fillna(value={"age": df["age"].mean()}, inplace=True, axis=0)

# Display the modified DataFrame
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
    name   age      city
0  John  25.0  New York
1  Jane   NaN    London
2  Jade  35.0     Delhi
3   Jan   NaN     Paris

Modified DataFrame:
    name   age      city
0  John  25.0  New York
1  Jane  30.0    London
2  Jade  35.0     Delhi
3   Jan  30.0     Paris
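
The same idea extends to several columns at once: passing a Series of per-column statistics fills each column with its own value. The sketch below uses the median and a hypothetical “salary” column that is not part of the example above.

```python
import numpy as np
import pandas as pd

# Illustrative numeric data with gaps in both columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, np.nan],
    "salary": [50000.0, 60000.0, np.nan, 70000.0],
})

# df.median() returns a Series indexed by column name; NaN values are skipped.
column_medians = df.median()

# Each column is filled with the median that matches its name.
filled = df.fillna(column_medians)

print(filled)
```

Whether the mean, the median or another statistic is the right replacement depends on the data. The median is less sensitive to outliers.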

Conclusion

In this article, we explored the fillna() method in Pandas DataFrame, which provides a powerful tool for replacing NaN values with desired values. We discussed the parameters of the fillna() method, including value, method, axis, inplace and limit, and provided examples to demonstrate how each one is used in practice. By using the fillna() method effectively, you can handle missing data and keep your analysis from being skewed or interrupted by gaps.