In the world of data manipulation, the ability to copy a DataFrame in Pandas is an essential skill for data scientists. This is important when working with large datasets or simply wanting to create a new DataFrame based on an existing one. In this article, we will learn everything about DataFrame copy() method to clone a DataFrame in Python.
What is the copy() Method in Pandas?
The copy() function provided in Pandas library is used to create a copy of a DataFrame. It creates a new DataFrame with the same data and column names, essentially duplicating the structure and content of the original DataFrame.
This function is crucial when you want to create an independent copy of a DataFrame. Without copy(), if you assign one DataFrame to another variable, changes made in one might affect the other. This is because the assignment creates a reference to the original DataFrame rather than a new one.
Let us look at a simple example to understand this better:
import pandas as pd # Creating a DataFrame data = {'Name': ['John', 'Jane', 'Bob'], 'Age': [25, 30, 22], 'City': ['New York', 'San Francisco', 'Los Angeles']} original_df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', original_df) # Creating a copy of the DataFrame copied_df = original_df.copy() # Display the original DataFrame print('Copied DataFrame:\n', copied_df)
Output:
Original DataFrame:
Name Age City
0 John 25 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
Copied DataFrame:
Name Age City
0 John 25 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
How to copy a DataFrame in Pandas?
Depending on our needs the .copy() method provides two ways to copy a DataFrame:
- Deep Copy: A deep copy involves traversing the indices and individually copying them into a new DataFrame. When the deep parameter of the .copy() function = True, a deep copy is made, meaning that the data and indices are completely independent of the original DataFrame. Changes in the copied DataFrame will not affect the original.
- Shallow Copy: When the deep parameter of the .copy() = False, a shallow copy is made, which means that the data and indices are shared with the original DataFrame. Modifications to the copied DataFrame may impact the original. Shallow copying is more memory-efficient but requires caution to avoid unintended side effects.
Let us now look at how to create Deep and Shallow copies of our Dataframe depending on our needs.
Creating a Deep Copy
We can simply create deep copies by setting the deep parameter of the .copy() function = True. Let us see an example:
import pandas as pd # Creating a DataFrame data = {'Name': ['John', 'Jane', 'Bob'], 'Age': [25, 30, 22], 'City': ['New York', 'San Francisco', 'Los Angeles']} original_df = pd.DataFrame(data) # Creating a copy of the DataFrame copied_df = original_df.copy(deep=True) # Modifying the copied DataFrame copied_df['Age'] = [26, 31, 23] # Display the original DataFrame after modifying the copy print('Original DataFrame:\n', original_df) # Display the copied DataFrame print('Copied DataFrame:\n', copied_df)
Output:
Original DataFrame:
Name Age City
0 John 25 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
Copied DataFrame:
Name Age City
0 John 26 New York
1 Jane 31 San Francisco
2 Bob 23 Los Angeles
In the above example, we can see that even when we edit our copy it does not change the original DataFrame.
Creating a Shallow Copy
We can make the Shallow copy by setting the deep parameter to False. Let us see an example:
import pandas as pd # Creating a DataFrame data = {'Name': ['John', 'Jane', 'Bob'], 'Age': [25, 30, 22], 'City': ['New York', 'San Francisco', 'Los Angeles']} original_df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', original_df) # Creating a copy of the DataFrame copied_df = original_df.copy(deep=False) # Modifying the copied DataFrame copied_df['Age'][0] = 26 # Display the original DataFrame after modifying the copy print('Original DataFrame after modification of the copy:\n', original_df) # Display the copied DataFrame print('Copied DataFrame:\n', copied_df)
Output:
Original DataFrame:
Name Age City
0 John 25 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
Original DataFrame after modification of the copy:
Name Age City
0 John 26 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
Copied DataFrame:
Name Age City
0 John 26 New York
1 Jane 30 San Francisco
2 Bob 22 Los Angeles
As we can see, changes made in the copy of the DataFrame can be seen in the original DataFrame as well.
Does copy () work on lists in Python?
No, in Python, the copy() method is not directly applicable to lists. However, it can be used with the copy module to create a shallow copy of a list. To create a deep copy of a list in Python, you can use the copy module’s deepcopy() function. You can also convert the list to DataFrame in python in such situations.
Conclusion
In this article, we explored the various ways and methods to copy a DataFrame in Pandas Python. By understanding the concept of deep copying and shallow copying, you can ensure data integrity, avoid unintended side effects, and optimize memory usage.