Pandas is a Python library for data analysis built around the DataFrame. We often need to combine two DataFrames, either vertically (along rows) or horizontally (along columns), depending on the analysis. This article explains how to concatenate two or more DataFrames using the concat() function in pandas.
Concatenation of Two or More DataFrames in Pandas
Concatenation simply means putting entities together. In pandas, it refers to combining two or more DataFrames along either the row axis or the column axis. It lets us combine datasets with similar or different structures into a single DataFrame that is easier to analyze and manipulate.
Let us now explore the techniques we can use to concatenate two or more DataFrames.
Concatenate Along Rows
One way to concatenate DataFrames is to stack them vertically, along the row axis. We do this by calling pd.concat() with axis=0, which is also the default.
Let’s see how to concatenate along rows with an example:
import pandas as pd
import numpy as np

# Create the DataFrames
df1 = pd.DataFrame(np.random.randint(25, size=(4, 4)),
                   index=["1", "2", "3", "4"],
                   columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(np.random.randint(25, size=(6, 4)),
                   index=["5", "6", "7", "8", "9", "10"],
                   columns=["A", "B", "C", "D"])

# Display the original DataFrames
print('DataFrame 1:\n', df1)
print('DataFrame 2:\n', df2)

# Concat along rows
df = pd.concat([df1, df2], axis=0)

# Display the concated DataFrame
print('Concated DataFrame:\n', df)
Output:
DataFrame 1:
    A   B   C   D
1  13  21  22  24
2   0   8  16   8
3  21  16   1  21
4  10  14  19  17
DataFrame 2:
     A   B   C   D
5   24  17  10   0
6    0   8   5   4
7   16  17  22   6
8   18  21   5  10
9    2  23   4  16
10  15   7   0   2
Concated DataFrame:
     A   B   C   D
1   13  21  22  24
2    0   8  16   8
3   21  16   1  21
4   10  14  19  17
5   24  17  10   0
6    0   8   5   4
7   16  17  22   6
8   18  21   5  10
9    2  23   4  16
10  15   7   0   2
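The example above uses index labels that do not overlap. When the inputs share labels (for instance, two DataFrames that both use the default integer index), row-wise concatenation keeps the duplicates. The sketch below uses two small made-up DataFrames to show this, along with the ignore_index parameter of pd.concat(), which renumbers the rows instead:

```python
import pandas as pd

# Two small illustrative frames that both use the default 0..n-1 index
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Default behaviour keeps the original labels, so 0 and 1 appear twice
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and renumbers the rows
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```

Whether to keep or discard the old labels depends on whether they carry meaning; if they are just row counters, renumbering usually avoids surprises in later lookups.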
Concatenate Along Columns
We can also concatenate DataFrames horizontally, along the column axis, by setting axis=1. This is useful when the DataFrames have different columns but the same index values. pandas aligns the rows on their index labels; where a label is present in only one DataFrame, the cells that would come from the other are filled with NaN, and integer columns containing NaN are converted to floats. To learn how to handle the missing values, refer to the pandas fillna() method.
Here is the Python code to do it:
import pandas as pd
import numpy as np

# Create the DataFrames
df1 = pd.DataFrame(np.random.randint(25, size=(4, 4)),
                   index=["1", "2", "3", "4"],
                   columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(np.random.randint(25, size=(6, 4)),
                   index=["5", "6", "7", "8", "9", "10"],
                   columns=["A", "B", "C", "D"])

# Display the original DataFrames
print('DataFrame 1:\n', df1)
print('DataFrame 2:\n', df2)

# Concat along columns (the index labels do not overlap, so expect NaN)
df = pd.concat([df1, df2], axis=1)

# Display the concated DataFrame
print('Concated DataFrame:\n', df)
Output:
DataFrame 1:
    A   B   C   D
1   5  15   1  22
2   7  10   0   2
3  11   4   8  13
4  15  18  18   4
DataFrame 2:
     A   B   C   D
5   15   1  10  19
6    2  12  16  10
7   15  20   8   2
8   17  13   1  10
9    7   5   9  16
10   2  20  13   9
Concated DataFrame:
       A     B     C     D     A     B     C     D
1    5.0  15.0   1.0  22.0   NaN   NaN   NaN   NaN
2    7.0  10.0   0.0   2.0   NaN   NaN   NaN   NaN
3   11.0   4.0   8.0  13.0   NaN   NaN   NaN   NaN
4   15.0  18.0  18.0   4.0   NaN   NaN   NaN   NaN
5    NaN   NaN   NaN   NaN  15.0   1.0  10.0  19.0
6    NaN   NaN   NaN   NaN   2.0  12.0  16.0  10.0
7    NaN   NaN   NaN   NaN  15.0  20.0   8.0   2.0
8    NaN   NaN   NaN   NaN  17.0  13.0   1.0  10.0
9    NaN   NaN   NaN   NaN   7.0   5.0   9.0  16.0
10   NaN   NaN   NaN   NaN   2.0  20.0  13.0   9.0
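The NaN values above appear because the two DataFrames have no index labels in common. For contrast, here is a minimal sketch of the case where the labels do match. The frames and the names left, right, and partial are made up for illustration. It also shows the join="inner" option of pd.concat(), which keeps only the labels present in every input:

```python
import pandas as pd

# Illustrative frames with different columns but the same index labels
left = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"B": [10, 20, 30]}, index=["x", "y", "z"])

# Rows are aligned on the index labels, so no cell is left empty
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# With only partly overlapping labels, join="inner" keeps just the shared ones
partial = pd.DataFrame({"C": [100, 200]}, index=["y", "z"])
shared_only = pd.concat([left, partial], axis=1, join="inner")
print(shared_only)
```

The default is join="outer", which keeps every label and fills the gaps with NaN, as in the output above.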
Concatenating with Append
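In addition to the pd.concat() function, older versions of pandas provided a shortcut for row-wise concatenation: the DataFrame.append() method, which appends one or more DataFrames to another DataFrame. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat([df1, df2]) is the equivalent call.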
Let us see an example:
import pandas as pd
import numpy as np

# Create the DataFrames
df1 = pd.DataFrame(np.random.randint(25, size=(4, 4)),
                   index=["1", "2", "3", "4"],
                   columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(np.random.randint(25, size=(6, 4)),
                   index=["5", "6", "7", "8", "9", "10"],
                   columns=["A", "B", "C", "D"])

# Display the original DataFrames
print('DataFrame 1:\n', df1)
print('DataFrame 2:\n', df2)

# Concat using append() (pandas < 2.0 only; the method was removed in 2.0)
# df = df1.append(df2)
# Equivalent call that also works on pandas 2.0 and later
df = pd.concat([df1, df2])

# Display the concated DataFrame
print('Concated DataFrame:\n', df)
Output:
DataFrame 1:
    A   B   C   D
1  12  10  12   3
2   3  12   9  19
3  14  13  17  20
4  19  20   2  14
DataFrame 2:
     A   B   C   D
5   16   3  21  22
6   19  12  21  23
7    7  14  24  23
8    0  11  16  23
9    9  23   2   8
10  21  10  21  18
Concated DataFrame:
     A   B   C   D
1   12  10  12   3
2    3  12   9  19
3   14  13  17  20
4   19  20   2  14
5   16   3  21  22
6   19  12  21  23
7    7  14  24  23
8    0  11  16  23
9    9  23   2   8
10  21  10  21  18
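When more than two DataFrames need to be stacked, a single pd.concat() call over a list does the same job as a chain of append() calls, and it also runs on pandas 2.0 and later, where append() is no longer available. A minimal sketch, using made-up monthly data (the column names month and sales are purely illustrative):

```python
import pandas as pd

# Illustrative one-row frames that share the same columns
chunks = [
    pd.DataFrame({"month": [m], "sales": [s]})
    for m, s in [("Jan", 100), ("Feb", 120), ("Mar", 90)]
]

# One concat call over the whole list, instead of chained append() calls
combined = pd.concat(chunks, ignore_index=True)
print(combined)
```

Collecting the pieces in a list and concatenating once also avoids copying the growing DataFrame on every step, which is what repeated appends would do.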
Using Various Types of Joins
Another way to combine DataFrames is with joins, using pd.merge(). Several types of joins are available, and the type of join determines how the rows from the original DataFrames are matched and combined in the resulting DataFrame.
The common types of joins are:
- Inner Join: The resulting DataFrame will only contain rows where the key exists in both DataFrames being joined. It acts like an intersection of the two DataFrames.
- Outer Join: The resulting DataFrame will contain all rows from both DataFrames. It acts like a union of the two DataFrames. The missing values will be replaced by NaN.
- Left Join: The resulting DataFrame will contain all rows from the left DataFrame and the matched rows from the right DataFrame. Again, the missing values will be replaced by NaN.
- Right Join: The resulting DataFrame will contain all rows from the right DataFrame and the matched rows from the left DataFrame. The missing values will be replaced by NaN.
Let us see how to implement it in Python:
import pandas as pd
import numpy as np

# Create the DataFrames
df1 = pd.DataFrame(np.random.randint(25, size=(4, 4)),
                   index=["1", "2", "3", "4"],
                   columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(np.random.randint(25, size=(6, 4)),
                   index=["5", "6", "7", "8", "9", "10"],
                   columns=["A", "B", "C", "D"])

# Display the original DataFrames
print('DataFrame 1:\n', df1)
print('DataFrame 2:\n', df2)

# Merge with an outer join on column 'B'; rows are matched on the values
# in 'B', and the other shared columns get the suffixes _x and _y
df = pd.merge(df1, df2, on='B', how='outer')

# Display the merged DataFrame
print('Concated DataFrame:\n', df)
Output:
DataFrame 1:
   A   B   C   D
1  7  13  21   0
2  0   6  19   5
3  4   1   2  14
4  9  20  16   1
DataFrame 2:
     A   B   C   D
5   10  23   0   3
6    4  11  10  24
7    2  13  13  23
8    7  12   9  10
9   19  23   4  15
10   0  21   9   2
Concated DataFrame:
   A_x   B   C_x   D_x   A_y   C_y   D_y
0  7.0  13  21.0   0.0   2.0  13.0  23.0
1  0.0   6  19.0   5.0   NaN   NaN   NaN
2  4.0   1   2.0  14.0   NaN   NaN   NaN
3  9.0  20  16.0   1.0   NaN   NaN   NaN
4  NaN  23   NaN   NaN  10.0   0.0   3.0
5  NaN  23   NaN   NaN  19.0   4.0  15.0
6  NaN  11   NaN   NaN   4.0  10.0  24.0
7  NaN  12   NaN   NaN   7.0   9.0  10.0
8  NaN  21   NaN   NaN   0.0   9.0   2.0
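The example above shows only the outer join. To compare all four join types listed earlier, the sketch below merges two small made-up DataFrames on a hypothetical key column and records which key values survive each join. Only keys 2 and 3 exist in both inputs:

```python
import pandas as pd

# Illustrative frames sharing a "key" column; only keys 2 and 3 are in both
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Run the same merge with each join type and collect the surviving keys
keys_by_join = {}
for how in ["inner", "outer", "left", "right"]:
    merged = pd.merge(left, right, on="key", how=how)
    keys_by_join[how] = sorted(merged["key"].tolist())
    print(f"{how:>5}: {keys_by_join[how]}")
```

The inner join keeps the intersection of the keys, the outer join keeps their union, and the left and right joins keep every key from the respective input.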
Conclusion
In this article, we have covered several ways to concatenate or join two or more pandas DataFrames. Experiment with the different techniques so that you can combine and analyze datasets with ease. You can now move on to learning how to iterate over rows in pandas, which is another useful skill for beginners.