Two essential functions in the Pandas library, isna() and isnull(), play significant roles in data cleaning. They are commonly used to detect NULL or NaN values in DataFrames. In this article, we will cover everything a beginner needs to know about the isna() function in Pandas.
What is the isna() function in Pandas?
The isna() function in Pandas is designed to identify missing values within the given input. This could be a scalar or array-like object. The function returns a boolean result or an array of booleans indicating the presence or absence of missing values. It can be used to detect missing values (NaN) in a DataFrame or Series.
Missing values should generally be handled before any type conversion or further data manipulation. One common way to deal with them is the fillna() method, which replaces each missing value with a value you choose.
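As a minimal sketch of that workflow, the snippet below counts the missing entries with isna() and then fills them with fillna(). The Series name prices and the fill value 0.0 are illustrative assumptions, not a recommendation for real data.

```python
import pandas as pd

# Hypothetical data: one entry is missing
prices = pd.Series([10.0, None, 30.0])

# Count the missing entries first (True counts as 1 when summed)
missing_count = prices.isna().sum()
print(missing_count)  # 1

# Replace each missing entry with an assumed default of 0.0
filled = prices.fillna(0.0)
print(filled.tolist())  # [10.0, 0.0, 30.0]
```

The right fill value depends on the data; 0.0 is used here only to keep the example short.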
When isna() is called on a DataFrame or Series, it returns an object of the same type and shape, where each element is a boolean indicating whether the corresponding element in the original is missing. 'NA' stands for 'Not Available' and is the term pandas uses for missing values such as None or NaN.
We will now learn how to use the isna() method on a scalar as well as a DataFrame.
Using isna on Scalar Arguments
When passed a scalar, the top-level pandas.isna() function returns a single boolean. A Series exposes the same check as a method, and the syntax is quite simple:
Series.isna()
When pandas.isna() is applied to an ndarray (n-dimensional array), it returns an ndarray of booleans, and Series.isna() likewise returns a boolean Series. In both cases the result holds True at every position where the element is missing and False everywhere else.
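The scalar and ndarray cases can be sketched with the top-level pd.isna() function. The sample values and the variable names values and mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Scalars: a single boolean comes back
print(pd.isna(None))     # True
print(pd.isna(np.nan))   # True
print(pd.isna(5))        # False
print(pd.isna("text"))   # False

# ndarray: a boolean ndarray of the same shape comes back
values = np.array([1.0, np.nan, 3.0])
mask = pd.isna(values)
print(mask)              # [False  True False]
```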
Let us see an example:
import pandas as pd

# Create a sample Series
series = pd.Series([1, 2, None, 4, 5])

# Check for missing values in the Series
missing_values = series.isna()

# Display the result
print("Series with Missing Value Indicator:")
print(missing_values)
Output:
Series with Missing Value Indicator:
0    False
1    False
2     True
3    False
4    False
dtype: bool
Similarly, for indexes too, an ndarray of booleans is returned, representing the presence or absence of missing values in the indexed data.
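A short sketch of the Index case follows. The index values are invented for illustration; the point is only that Index.isna() hands back a NumPy array rather than another Index.

```python
import numpy as np
import pandas as pd

# Hypothetical index containing one missing label
idx = pd.Index([1.0, np.nan, 3.0])

# Index.isna() returns a boolean ndarray
index_mask = idx.isna()
print(type(index_mask).__name__)  # ndarray
print(index_mask)                 # [False  True False]
```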
If we want to check if certain indices in a Series are associated with missing values, we can create a boolean mask using the isna() function and then use that mask with specific indices.
Let us see an example of this as well:
import pandas as pd

# Create a sample Series
data = pd.Series([1, 2, None, 4, 5])

# Check for missing values in the Series
missing_values = data.isna()

# Define a list of indices to check
indices_to_check = [1, 2, 4]

# Use the boolean mask to check if values at specific indices are missing
for index in indices_to_check:
    if missing_values[index]:
        print(f"Value at index {index} is missing.")
    else:
        print(f"Value at index {index} is not missing.")
Output:
Value at index 1 is not missing.
Value at index 2 is missing.
Value at index 4 is not missing.
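The same check can be done without an explicit loop. The sketch below reuses the names from the example above and assumes the default integer index, so the labels passed to .loc are the positions 1, 2 and 4.

```python
import pandas as pd

data = pd.Series([1, 2, None, 4, 5])
missing_values = data.isna()
indices_to_check = [1, 2, 4]

# .loc selects mask entries by label in one step
checked = missing_values.loc[indices_to_check]
print(checked.to_dict())  # {1: False, 2: True, 4: False}

# The mask can also filter the Series to list every missing label
missing_labels = data[missing_values].index.tolist()
print(missing_labels)     # [2]
```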
Using isna on DataFrame
The isna() method can also detect missing values in a DataFrame. As with a Series, the result has the same type as the input, here a DataFrame of booleans. The syntax is also very simple:
DataFrame.isna()
The function can be applied to a DataFrame to create a new DataFrame where each element is True if the corresponding element in the original DataFrame is NaN, and False otherwise.
You will understand it better with an example:
import pandas as pd

# Create a sample DataFrame
data = {'Column1': [1, 2, 3, None, 5],
        'Column2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Display the original DataFrame
# (printed separately so the header row is not shifted by a stray space)
print('Original DataFrame:')
print(df)

# Check for missing values in the DataFrame
missing_values = df.isna()

# Display the result
print("DataFrame with Missing Value Indicator:")
print(missing_values)
Output:
Original DataFrame:
   Column1  Column2
0      1.0       10
1      2.0       20
2      3.0       30
3      NaN       40
4      5.0       50
DataFrame with Missing Value Indicator:
   Column1  Column2
0    False    False
1    False    False
2    False    False
3     True    False
4    False    False
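A boolean DataFrame like this is usually summarized rather than read cell by cell. The following sketch, which rebuilds the same sample DataFrame, counts missing values per column and selects the rows that contain at least one; the variable names per_column and rows_with_missing are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, None, 5],
                   'Column2': [10, 20, 30, 40, 50]})
mask = df.isna()

# Summing the mask counts the True values in each column
per_column = mask.sum()
print(per_column.to_dict())  # {'Column1': 1, 'Column2': 0}

# any(axis=1) flags rows with at least one missing value
rows_with_missing = df[mask.any(axis=1)]
print(rows_with_missing.index.tolist())  # [3]
```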
Distinction Between isna() and isnull()
Like isna(), isnull() is a method used to detect missing values. It is, in fact, an alias for isna() and behaves identically. Called on a DataFrame, it returns a DataFrame of True and False values, where True marks a missing value and False marks a value that is present.
Let us see an example:
import pandas as pd

# Create a sample DataFrame
data = {'Column1': [1, 2, 3, None, 5],
        'Column2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Display the original DataFrame
# (printed separately so the header row is not shifted by a stray space)
print('Original DataFrame:')
print(df)

# Check for missing values in the DataFrame
missing_values = df.isnull()

# Display the result
print("DataFrame with Missing Value Indicator:")
print(missing_values)
Output:
Original DataFrame:
   Column1  Column2
0      1.0       10
1      2.0       20
2      3.0       30
3      NaN       40
4      5.0       50
DataFrame with Missing Value Indicator:
   Column1  Column2
0    False    False
1    False    False
2    False    False
3     True    False
4    False    False
As shown by the examples above, both functions perform the same operation. The difference is not in functionality but in naming: isnull() is an alias for isna(), so the two are interchangeable.
Both return a boolean object of the same type and size as the input (a DataFrame for a DataFrame, a Series for a Series), indicating whether each value is NA. NA values, such as None or numpy.nan, get mapped to True. Everything else gets mapped to False, including empty strings and numpy.inf.
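A quick way to confirm both points is sketched below: it compares the two results on the sample DataFrame, then checks which values in a small, made-up object Series count as NA.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, None, 5],
                   'Column2': [10, 20, 30, 40, 50]})

# isna() and isnull() produce identical boolean DataFrames
same_result = df.isna().equals(df.isnull())
print(same_result)  # True

# Illustrative values: only None, NaN and NaT are treated as missing
samples = pd.Series([None, np.nan, pd.NaT, "", 0, np.inf], dtype=object)
print(samples.isna().tolist())  # [True, True, True, False, False, False]
```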
Conclusion
In this article, we have learned about the isna() function provided by the Pandas library in Python. It is a very convenient and efficient method to detect any NaN values in a Series or a DataFrame.