Once you have built or imported a DataFrame in Pandas, you can manipulate it in many ways, including changing its columns. If most of your data comes from one source and some from another, you will need to know how to add columns to a Pandas DataFrame. The task itself is simple, but there are several ways to do it, which can leave newcomers wondering which one to use. Don't worry; in this article, we'll go over four different ways to add a column. So, let's get started!
What is Pandas in Python?
Pandas is a widely used open-source Python library for data science, data analysis, and machine learning tasks. It provides many functions and methods for working with tabular data. Its main data structure is the DataFrame, a table with labeled rows and columns.
Now, let us dive deep into learning Pandas DataFrames below:
What is a DataFrame?
A DataFrame represents a table of data with rows and columns. Rows in a DataFrame are observations or data points, and columns hold the properties or attributes of those observations. Consider a set of property pricing data: each row represents a house, and each column represents a characteristic of the house, such as its age, number of rooms, or price.
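To make the property-pricing picture concrete, here is a minimal sketch. The column names and figures are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical housing data: each row is one house, each column one attribute.
houses = pd.DataFrame({
    'age_years': [12, 3, 40],
    'rooms': [3, 4, 2],
    'price': [250000, 410000, 180000],
})

print(houses.shape)          # (number of rows, number of columns)
print(list(houses.columns))  # the column labels
print(houses.loc[0])         # the first observation (one house)
```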
Using Pandas, what can you do with DataFrames?
Many of the time-consuming, repetitive processes connected to working with data are made simple with Pandas. Following are a few of the tasks that you can efficiently perform with Pandas DataFrame:
- Data Inspection
- Data Cleansing
- Data Normalization
- Data Visualization
- Statistical Analysis
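As a rough illustration of some of these tasks, the sketch below runs inspection, cleansing, normalization, and a basic statistic on a small made-up column. The data and the variable names are assumptions for the example, and visualization is left out because it needs an extra plotting library.

```python
import pandas as pd

# Hypothetical measurements with one missing value.
data = pd.DataFrame({'height_cm': [150.0, None, 170.0, 190.0]})

# Data inspection: count missing values per column.
missing = data.isna().sum()

# Data cleansing: drop rows that contain missing values.
clean = data.dropna()

# Data normalization: min-max scale the column to the range [0, 1].
col = clean['height_cm']
scaled = (col - col.min()) / (col.max() - col.min())

# Statistical analysis: mean of the cleaned column.
mean_height = col.mean()

print(missing['height_cm'], mean_height)
print(scaled.tolist())
```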
First, let's create an example DataFrame that we'll use throughout this article to explain a few ideas related to adding columns to Pandas DataFrames.
For example:
    import pandas as pd  # importing pandas library

    df = pd.DataFrame({
        'colA': [True, False, False],
        'colB': [1, 2, 3],
    })  # creating the DataFrame
    print(df)
Output
        colA  colB
    0   True     1
    1  False     2
    2  False     3
Suppose we need to add a new column named 'colC' containing the values 'a', 'b', and 'c' for the indices 0, 1, and 2, respectively. How will we do it? Let's see!
How to Add a Column to a Pandas DataFrame?
Below are four methods for adding a column to a Pandas DataFrame. In each case, we'll add 'colC' to the sample DataFrame created earlier in the article, and each example starts from that original two-column DataFrame:
1) Using the simple assignment
You can add a new column to a DataFrame by assigning the data to a new column label on the existing frame. It is one of the easiest and most widely used methods among Python programmers. The name of the new column goes in quotes inside the square brackets, as shown in the example below.
For example:
    s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])  # the values for the new column
    df['colC'] = s.values
    print(df)
Output
        colA  colB colC
    0   True     1    a
    1  False     2    b
    2  False     3    c
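Note that assigning s.values copies the values by position and ignores the index of the Series. If you assign the Series itself, Pandas aligns it with the DataFrame's index: the result is the same when the indices match, but any DataFrame index label that is missing from the Series gets NaN.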
For example:
    df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
    print(df)
Output
        colA  colB colC
    0   True     1  NaN
    1  False     2    a
    2  False     3    b
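The difference between label alignment and positional assignment can be seen side by side in the following sketch. The column names 'aligned' and 'by_position' and the variable shifted are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
shifted = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])  # labels do not match df

# Assigning the Series aligns on labels: label 0 has no match, label 3 is dropped.
df['aligned'] = shifted

# Assigning the underlying array ignores labels and fills by position.
df['by_position'] = shifted.values

print(df)
```

Passing the raw values is only safe when the Series is already in the same row order as the DataFrame, since no labels are checked.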
2) Using assign() method
With the pandas.DataFrame.assign() method, you can add several columns at once or modify the values of existing columns. The method returns a new DataFrame containing all of the original columns plus the newly added ones; the original DataFrame is not modified. Any existing column that is re-assigned is overwritten. As with simple assignment, a Series passed to assign() is aligned on its index, so the example below passes .values to ignore the index and add the data by position.
For example:
    e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])
    s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
    print(df.assign(colC=s.values, colB=e.values))
Output
        colA  colB colC
    0   True   1.0    a
    1  False   3.0    b
    2  False   2.0    c
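Because assign() returns a new object, the result has to be captured in a variable. The sketch below shows this, and also that assign() accepts a callable that receives the DataFrame. The column 'colD' and the variable result are hypothetical names used only for this example.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])

# assign() returns a new DataFrame; df itself is left untouched.
result = df.assign(
    colC=s.values,
    colD=lambda frame: frame['colB'] * 10,  # a callable receives the DataFrame
)

print(list(df.columns))      # original columns only
print(list(result.columns))  # original columns plus colC and colD
```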
3) Using insert() method
Apart from the two methods above, you can also use pandas.DataFrame.insert() to add a column. This method comes in handy when you need to add a column at a specific position, and it modifies the DataFrame in place. The example below uses len(df.columns), the number of existing columns, as the position, which adds the new column 'colC' at the end of the DataFrame.
For example:
    df.insert(len(df.columns), 'colC', s.values)
    print(df)
Output
        colA  colB colC
    0   True     1    a
    1  False     2    b
    2  False     3    c
To add 'colC' between 'colA' and 'colB' instead, pass 1 as the position.
For example:
    df.insert(1, 'colC', s.values)
    print(df)
Output
        colA colC  colB
    0   True    a     1
    1  False    b     2
    2  False    c     3
Note that insert() cannot be used to add a column whose name already exists in the DataFrame. By default, a ValueError is raised in that case.
For example:
    df.insert(1, 'colC', s.values)
    df.insert(1, 'colC', s.values)
Output
ValueError: cannot insert colC, already exists
Nevertheless, the DataFrame will allow two columns with the same name if you pass the argument allow_duplicates=True to the insert() method.
For example:
    df.insert(1, 'colC', s.values)
    df.insert(1, 'colC', s.values, allow_duplicates=True)
    print(df)
Output
        colA colC colC  colB
    0   True    a    a     1
    1  False    b    b     2
    2  False    c    c     3
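If a script might run the same insert() twice, one option is to catch the error rather than allow duplicate names. The following is a minimal sketch of that idea, not the only way to handle it; the variable message is introduced just for the example.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'])

df.insert(1, 'colC', s.values)  # modifies df in place and returns None

try:
    df.insert(1, 'colC', s.values)  # same name again, duplicates not allowed
except ValueError as err:
    message = str(err)

print(message)
print(list(df.columns))
```

Checking 'colC' in df.columns before calling insert() would be an alternative that avoids raising the exception at all.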
4) Using concat() method
The pandas.concat() function can also add a column to an existing DataFrame when you pass axis=1. It returns a new DataFrame that includes the added column. concat() aligns the Series with the original DataFrame on the index. Check out the example below for a better understanding.
For example:
    df = pd.concat([df, s.rename('colC')], axis=1)
    print(df)
Output
        colA  colB colC
    0   True     1    a
    1  False     2    b
    2  False     3    c
In general, use this method when the indices of the objects being combined match. If they don't match, the index labels of every object appear in the resulting DataFrame, and the positions with no data are filled with NaN, as shown in the example below.
For example:
    s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
    df = pd.concat([df, s.rename('colC')], axis=1)
    print(df)
Output
         colA  colB colC
    0    True   1.0  NaN
    1   False   2.0  NaN
    2   False   3.0  NaN
    10    NaN   NaN    a
    20    NaN   NaN    b
    30    NaN   NaN    c
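When the indices differ, there are a couple of ways to avoid the extra NaN rows. The sketch below shows two possibilities under the assumption that the Series is already in the same row order as the DataFrame; the names by_position and common_only are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30], name='colC')

# Option 1: rebuild the Series on the DataFrame's index so rows line up by position.
repositioned = pd.Series(s.values, index=df.index, name=s.name)
by_position = pd.concat([df, repositioned], axis=1)

# Option 2: keep only the labels present in both objects (none in this case).
common_only = pd.concat([df, s], axis=1, join='inner')

print(by_position)
print(common_only.shape)
```

The second option uses the join parameter of pandas.concat(), which defaults to 'outer'. With no shared labels, the inner join leaves an empty result, so it only helps when the indices overlap at least partly.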
Conclusion
Adding columns to a DataFrame is a common data analysis and manipulation operation, and Pandas offers several ways to do it, four of which are covered in this article. The index is one of the trickiest aspects of adding new columns to DataFrames. Be careful, because the methods covered in this article may handle indices differently. Once you are comfortable with all of the methods above, you are good to go for adding new columns to your DataFrames.
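As a closing illustration of the index handling, the sketch below passes the same Series, with a partly mismatched index, to all four methods and compares the shapes of the results. The variable names are chosen for this example only.

```python
import pandas as pd

base = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3], name='colC')  # label 3 is not in base

# 1) Plain assignment: the Series is aligned to base's index.
assigned = base.copy()
assigned['colC'] = s

# 2) assign(): same alignment, but it returns a new DataFrame.
via_assign = base.assign(colC=s)

# 3) insert(): same alignment, and it modifies the DataFrame in place.
inserted = base.copy()
inserted.insert(len(inserted.columns), 'colC', s)

# 4) concat(): outer join by default, so label 3 becomes an extra row.
concatenated = pd.concat([base, s], axis=1)

for name, frame in [('assignment', assigned), ('assign', via_assign),
                    ('insert', inserted), ('concat', concatenated)]:
    print(name, frame.shape)
```

The first three keep the three original rows and fill label 0 with NaN, while concat() grows to four rows.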