Pandas is a widely used Python library for data manipulation and analysis. One of its most powerful functions is apply(), which applies a function to each element of a Pandas series, or to each column or row of a DataFrame. It is versatile and can be used for a wide range of data science and machine learning tasks. In this article, we will explore the various ways of using the apply() function.
What is the apply() function in Pandas?
The apply() function in Pandas lets users apply a function to each element of a Pandas series, or along an axis (each column or each row) of a DataFrame. This makes it a powerful tool for data manipulation, because complex operations can be expressed as ordinary Python functions. By applying a function in this way, users can transform or aggregate their data based on specific conditions or requirements.
This method is particularly useful in data science and machine learning tasks, where the ability to process and transform data efficiently is crucial. Whether you need to calculate statistics, perform data cleaning operations, or even create new features based on existing ones, the function provides a flexible and efficient way to accomplish these tasks.
The apply() function returns a DataFrame or Series object with the changes applied. It is important to note that it does not modify the original DataFrame or series.
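As a minimal sketch of the two points above, the snippet below calls apply() on a series and checks that a new object comes back while the original keeps its values. The names values and square are made up for this illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
values = pd.Series([1, 2, 3])

def square(element):
    """Return the square of a single element."""
    return element * element

# Series.apply() calls square() once per element and returns a new Series
squared = values.apply(square)

print(squared.tolist())  # squared values
print(values.tolist())   # the original series is unchanged
```

If the transformed values should replace the originals, the result has to be assigned back explicitly, for example values = values.apply(square).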
Here is the basic syntax:
dataframe.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
Let’s break it down:
- func: This is the function that we want to apply to the DataFrame or series.
- axis: This parameter specifies the axis along which the function is applied. The default value is 0 (or ‘index’), which applies the function to each column. Setting axis=1 (or ‘columns’) applies the function to each row of the DataFrame.
- raw: This parameter determines whether each row or column is passed to the function as a NumPy ndarray (n-dimensional array). The default value is False, which passes each row or column as a Pandas Series.
- result_type: This parameter controls the shape of the result and only takes effect when axis=1. The possible values are ‘expand’, ‘reduce’, ‘broadcast’, or None. The default behaviour (None) infers the return type from what the applied function returns.
- args: This parameter allows users to pass additional positional arguments to the function if needed.
- **kwds: This parameter enables users to pass additional keyword arguments to the function.
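The raw and result_type parameters are easier to understand with a small example. The following is only a sketch on made-up data; the variable names col_range and expanded are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [50, 40, 30], "y": [300, 1112, 42]})

# raw=True: each column reaches the function as a NumPy array, not a Series
col_range = df.apply(lambda arr: arr.max() - arr.min(), raw=True)
print(col_range)

# result_type="expand" with axis=1: the list returned for each row
# is spread across separate columns of the resulting DataFrame
expanded = df.apply(
    lambda row: [row["x"], row["x"] + row["y"]],
    axis=1,
    result_type="expand",
)
print(expanded)
```

Here col_range holds one value per column, while expanded has one row per original row and two columns labelled 0 and 1.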
Now, let us see the various methods to use it.
Applying a Function to DataFrame Elements
One of the most frequent use cases of the apply() function is to run a function over each column or each row of a DataFrame. This allows us to summarize or transform our data accordingly.
Let’s consider a simple example where we have a DataFrame with two columns, ‘x’ and ‘y’. We want to calculate the sum of each column by applying a function to the DataFrame.
In this example, we will define a function called calculate_sum() that takes a column as input and returns the sum of its elements.
We will then apply this function to the DataFrame. Because axis defaults to 0, apply() passes each column to the function in turn, so the result contains one sum per column.
Here is an example:
    import pandas as pd

    # Function to calculate the sum of a column
    def calculate_sum(column):
        return column.sum()

    data = {"x": [50, 40, 30], "y": [300, 1112, 42]}
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print('Original DataFrame:', df, sep='\n')

    # Use apply() to find the sum of each column (axis=0 is the default)
    result = df.apply(calculate_sum)
    print('Calculated result:', result, sep='\n')
Output:
    Original DataFrame:
        x     y
    0  50   300
    1  40  1112
    2  30    42
    Calculated result:
    x     120
    y    1454
    dtype: int64
Utilizing Lambda Functions with apply()
In addition to defining separate functions, we can also use lambda functions directly with the apply() function. Lambda functions are anonymous functions that can be defined on the fly, without the need for a formal function definition. In simpler terms, a lambda function lets us write shorter code; it makes the code more concise, not faster.
In this case, we define a lambda function directly within the apply() function call. The lambda function takes a column as input and returns the sum of its elements. The result contains the sum of each column, just like in the previous example.
This can be particularly useful when we need to perform simple calculations or transformations on our data. Lambda functions provide a concise way to express such operations, without the need for separate function definitions.
The Python code below demonstrates the use of a lambda function:
    import pandas as pd

    data = {"x": [50, 40, 30], "y": [300, 1112, 42]}
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print('Original DataFrame:', df, sep='\n')

    # Use apply() with a lambda function to sum each column
    result = df.apply(lambda col: col.sum())
    print('Calculated result:', result, sep='\n')
Output:
    Original DataFrame:
        x     y
    0  50   300
    1  40  1112
    2  30    42
    Calculated result:
    x     120
    y    1454
    dtype: int64
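A lambda function passed to apply() does not have to return a single number. As a rough sketch on the same data, the lambda below returns a small series for each column, and apply() combines these into a DataFrame. The labels "min" and "max" and the name summary are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [50, 40, 30], "y": [300, 1112, 42]})

# For each column, return a Series holding its smallest and largest value;
# apply() assembles the per-column results into one DataFrame
summary = df.apply(lambda col: pd.Series({"min": col.min(), "max": col.max()}))
print(summary)
```

The result has one row per label ("min", "max") and one column per original column. Once the logic grows beyond one short expression, a named function is usually easier to read than a lambda.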
Applying Functions Along Different Axes
The apply() function in Pandas can apply a function along either axis of a DataFrame. By default (axis=0), it applies the function to each column. However, we can set axis=1 to apply the function to each row instead.
In this example, we will use a lambda function to calculate the sum of each row. By setting axis=1 in the apply() function, we pass each row to the lambda function instead of each column. The resulting series, result, will contain the sum of each row.
Applying functions along different axes allows us to perform calculations or transformations on different dimensions of our data. This flexibility lets us handle a wide range of data analysis tasks.
The following code demonstrates it:
    import pandas as pd

    data = {"x": [50, 40, 30], "y": [300, 1112, 42]}
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print('Original DataFrame:', df, sep='\n')

    # Apply the function to each row (axis=1)
    result = df.apply(lambda row: row.sum(), axis=1)
    print('Calculated result:', result, sep='\n')
Output:
    Original DataFrame:
        x     y
    0  50   300
    1  40  1112
    2  30    42
    Calculated result:
    0     350
    1    1152
    2      72
    dtype: int64
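Row-wise application is a common way to create a new feature from existing columns. The sketch below assumes a made-up feature, the difference between ‘y’ and ‘x’, stored in a hypothetical column named "diff". With axis=1 the function receives one row at a time, so columns can be accessed by name.

```python
import pandas as pd

df = pd.DataFrame({"x": [50, 40, 30], "y": [300, 1112, 42]})

def difference(row):
    """Return y minus x for a single row."""
    return row["y"] - row["x"]

# axis=1 passes each row to the function as a Series
df["diff"] = df.apply(difference, axis=1)

# The same feature computed with plain column arithmetic, for comparison
vectorized = df["y"] - df["x"]

print(df)
```

For simple arithmetic like this, column arithmetic gives the same values and is generally faster on large data. Row-wise apply() is most useful when the per-row logic cannot easily be written as column operations.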
Using Additional Arguments with apply()
In some cases, we may need to pass additional arguments to the function that we are applying using the apply() function. This can be achieved by using the args parameter, which takes a tuple of additional positional arguments to pass to the function.
Let’s consider an example where we have a DataFrame with two columns, ‘x’ and ‘y’. We want to multiply every element by a factor by applying a function that takes an additional argument.
In this example, we will define a function called calculate_product() that takes an element and a factor as input and returns the product of the two. apply() passes each column to this function as a series, and multiplying a series by a number multiplies every element in it.
We will then pass the function to apply() together with args=(5,), so that 5 is supplied as the factor. The resulting DataFrame, result, will contain each element multiplied by 5.
By utilizing the args parameter, we can pass additional arguments to the function and customize its behavior based on our specific requirements, without writing a separate function for every factor.
Here is how to pass additional arguments to the function:
    import pandas as pd

    # Function to calculate the product
    def calculate_product(element, factor):
        return element * factor

    data = {"x": [2, 4, 6], "y": [10, 20, 30]}
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print('Original DataFrame:', df, sep='\n')

    # Use apply() with an additional positional argument (factor=5)
    result = df.apply(calculate_product, args=(5,))
    print('Calculated result:', result, sep='\n')
Output:
    Original DataFrame:
       x   y
    0  2  10
    1  4  20
    2  6  30
    Calculated result:
        x    y
    0  10   50
    1  20  100
    2  30  150
With Positional and Keyword Arguments
In addition to passing additional positional arguments, we can also pass keyword arguments to the function using the apply() function. This is what the **kwds parameter is for: any extra keyword arguments given to apply() are forwarded to the function.
Let’s consider an example where we have a DataFrame with two columns, ‘x’ and ‘y’. We want to scale and shift every element by applying a function that takes both positional and keyword arguments.
In this example, we will define a function called calculate_product() that takes an element, a factor, and an offset as input and returns the product of the element and the factor plus the offset.
We will then call apply() with args=(5,) for the factor and offset=10 as a keyword argument. The result will contain each element multiplied by 5 and incremented by 10.
By utilizing the **kwds parameter, we can pass additional keyword arguments to the function and fine-tune its behaviour based on our specific requirements.
Here is an example:
    import pandas as pd

    # Function to calculate the product plus an offset
    def calculate_product(element, factor, offset=0):
        return (element * factor) + offset

    data = {"x": [2, 4, 6], "y": [10, 20, 30]}
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print('Original DataFrame:', df, sep='\n')

    # Use apply() with a positional argument (factor) and a keyword argument (offset)
    result = df.apply(calculate_product, args=(5,), offset=10)
    print('Calculated result:', result, sep='\n')
Output:
    Original DataFrame:
       x   y
    0  2  10
    1  4  20
    2  6  30
    Calculated result:
        x    y
    0  20   60
    1  30  110
    2  40  160
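The same arguments can also be used when calling apply() on a single column. In that case the function receives one value at a time rather than a whole column. The following is a small sketch that reuses the calculate_product() function from above; the name scaled_x is chosen for illustration.

```python
import pandas as pd

def calculate_product(element, factor, offset=0):
    """Multiply a single element by factor, then add offset."""
    return (element * factor) + offset

df = pd.DataFrame({"x": [2, 4, 6], "y": [10, 20, 30]})

# Series.apply() calls the function once per element of column 'x',
# forwarding 5 as factor (via args) and 10 as offset (keyword argument)
scaled_x = df["x"].apply(calculate_product, args=(5,), offset=10)
print(scaled_x.tolist())
```

The values match the ‘x’ column of the DataFrame result above, since the same formula is applied to the same data.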
Conclusion
In this article, we have explored the apply() function in Pandas and its various applications in data analysis and manipulation. This function in Pandas provides a powerful and flexible way to process and transform data. By applying custom functions to our data, we can perform complex calculations, transformations, and feature engineering tasks efficiently and effectively.