
Add Column to DataFrame Pandas (with Examples)

  • Nov 17, 2021
  • 7 Minutes Read
  • By Adrita Das

 

Once we have built or imported a DataFrame in Pandas, we can manipulate its data in many ways, including changing its columns. For example, if most of the data comes from one source but some comes from another, we need to know how to add columns to a Pandas DataFrame. The task itself is simple, but there are several approaches, and that can be perplexing for newcomers who wonder which one to use. Don't worry; in this article, we'll go over four different ways to add a column. So, let's get started!

What is Pandas in Python?

Pandas is a widely used open-source Python library for data analysis, data science, and machine learning tasks. It provides many functions and methods for working with tabular data. Its main data structure is the DataFrame, a table with labeled rows and columns. If you are a beginner in Python, you can also try these 20 pandas exercises.

Now, let us dive deep into learning Pandas DataFrames below:

What is a DataFrame?

A DataFrame represents a table of data with rows and columns, and it is the structure you will work with most often in Pandas. Rows in a DataFrame are observations or data points, and columns hold the properties or attributes of those observations. Consider a set of property pricing data: each row represents a house, and each column represents a characteristic of the house, such as its age, number of rooms, or price.
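To make the housing example concrete, here is a minimal sketch. The column names and values are made up for illustration and do not come from a real data set.

```python
import pandas as pd

# Hypothetical housing data: each row is one house, each column one attribute.
houses = pd.DataFrame({
    'age_years': [12, 3, 40],
    'rooms': [3, 4, 2],
    'price': [250000, 410000, 180000],
})

print(houses.shape)          # (number of rows, number of columns)
print(list(houses.columns))  # the column labels
print(houses.loc[1])         # every attribute of the house with index label 1
```

Here `houses.loc[1]` selects a row by its index label, which is the same label that Pandas uses when it lines up new columns with existing rows.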

Using Pandas, what can you do with DataFrames?

Pandas simplifies many of the time-consuming, repetitive processes involved in working with data. The following are a few of the tasks that you can perform efficiently with a Pandas DataFrame:

  • Data Inspection
  • Data Cleansing
  • Data Normalization
  • Data Visualization
  • Statistical Analysis

First, let's create an example DataFrame that we'll use throughout this article to explain a few ideas related to adding columns to Pandas DataFrames.

For example:

import pandas as pd           # importing pandas library
df = pd.DataFrame({
    'colA':[True, False, False], 
    'colB': [1, 2, 3],
})                         # creating the DataFrame

print(df)    

 

Output

    colA  colB
0   True     1
1  False     2
2  False     3  

 

Suppose we need to add a new column named 'colC' containing the values 'a', 'b', and 'c' for the indices 0, 1, and 2, respectively. How will we do it? Let's see!

 

How to Add Column to Pandas DataFrame?

Below are four methods for adding a column to a Pandas DataFrame. In each case, we'll add 'colC' to the sample DataFrame created earlier in the article:

1) Using the simple assignment

You can add a new column to a DataFrame by assigning the data of a Series to a new column label on the existing frame. It is one of the easiest and most efficient methods and is widely used by Python programmers. Note that the name of the new column is written as a string (in single or double quotes) inside the square brackets, as shown in the example below.

For example:

s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])   # the data for the new column
df['colC'] = s.values
print(df)

 

Output

    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

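Note that the example above assigns s.values, a plain array, so the values are placed by position and the index of the Series is ignored. If you assign the Series itself, Pandas aligns it with the DataFrame on the index: rows whose labels have no match in the Series receive NaN, and Series values whose labels are missing from the DataFrame are dropped.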

For example:

df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
print(df)

 

Output

    colA  colB colC
0   True     1  NaN
1  False     2    a
2  False     3    b
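If the labels of the Series don't match but the values are already in the right row order, there are two common ways to avoid the NaN values. The sketch below assumes that positional order is what you want; the column name 'colD' is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])  # labels differ from df's 0, 1, 2

# Option 1: assign the underlying array, so values are placed by position.
df['colC'] = s.to_numpy()

# Option 2: give a copy of the Series the DataFrame's index, then assign it.
aligned = s.copy()
aligned.index = df.index
df['colD'] = aligned

print(df)
```

Both options discard the original labels of the Series, so they are only appropriate when those labels carry no meaning for the rows.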

 

2) Using assign() method

Using the pandas.DataFrame.assign() method, you can add several columns to a DataFrame at once or modify the values of existing columns. The method returns a new DataFrame object containing all of the original columns plus the newly added ones; the original DataFrame is left unchanged. Note that existing columns are overwritten if they are re-assigned. The index of a new column is ignored only when you pass plain values, such as s.values in the example below; a Series passed directly is aligned on the index.

For example:

e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
print(df.assign(colC=s.values, colB=e.values))   # assign() returns a new DataFrame

 

Output

    colA  colB colC
0   True   1.0    a
1  False   3.0    b
2  False   2.0    c
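The following sketch contrasts passing a Series to assign() with passing its values, and checks that the original DataFrame is untouched. The variable names by_label and by_position are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])

# Passing the Series: values are matched to rows by index label.
by_label = df.assign(colB=e)

# Passing the raw values: the index is ignored and values go in by position.
by_position = df.assign(colB=e.values)

print(by_label['colB'].tolist())     # [1.0, 2.0, 3.0]
print(by_position['colB'].tolist())  # [1.0, 3.0, 2.0]
print(df['colB'].tolist())           # [1, 2, 3], the original is unchanged
```

To keep the result, assign it back to a name, for example df = df.assign(...).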

 

3) Using insert() method

Apart from the above two methods, you can also use the pandas.DataFrame.insert() method to add a column to a DataFrame. This method comes in handy when you need to add a column at a specific position, and it modifies the DataFrame in place. In the example below, the built-in len() function gives the number of existing columns, so the new column 'colC' is added at the end of the DataFrame.

For example:

df.insert(len(df.columns), 'colC', s.values)
print(df)

 

Output

    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

 

Now, to add the column 'colC' between two existing columns, 'colA' and 'colB', pass the target position as the first argument.

For example:

df.insert(1, 'colC', s.values)
print(df)

 

Output

    colA colC  colB
0   True    a     1
1  False    b     2
2  False    c     3
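When you know the neighbouring column by name rather than by position, you can look the position up first. This is a small sketch that assumes the column names are unique, in which case Index.get_loc() returns an integer.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})

# Find the position of 'colB' by name, then insert the new column just before it.
position = df.columns.get_loc('colB')

# insert() changes df in place and returns None, so do not reassign its result.
df.insert(position, 'colC', ['a', 'b', 'c'])

print(df)
```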

 

Note that the insert() method cannot be used to add a column whose name already exists. By default, a ValueError is raised when a column with that name is already present in the DataFrame.

For example: 

df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values)

 

Output

ValueError: cannot insert colC, already exists

 

Nevertheless, the DataFrame will accept two columns with the same name if you pass the argument allow_duplicates=True to the insert() method.

For example:

df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values, allow_duplicates=True)
print(df)

Output

    colA colC colC  colB
0   True    a    a     1
1  False    b    b     2
2  False    c    c     3
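Duplicate column names are allowed, but they change how selection behaves. As a sketch, with different values in the two columns so they can be told apart, selecting the duplicated label returns a DataFrame rather than a single Series:

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
df.insert(1, 'colC', ['a', 'b', 'c'])
df.insert(1, 'colC', ['x', 'y', 'z'], allow_duplicates=True)

# With a duplicated label, df['colC'] holds both columns.
selected = df['colC']
print(type(selected).__name__)
print(selected.shape)
```

For this reason, it is usually safer to give each column a unique name unless duplicates are really needed.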

 

4) Using concat() method

The pandas.concat() function can also be used to add a column to an existing DataFrame by passing axis=1. It returns a new DataFrame that includes the newly added column. The Series is joined to the original DataFrame on the index, so rows are matched by their labels, and the name given to the Series (here through rename('colC')) becomes the name of the new column. Check out the example below for a better understanding.

For example:

df = pd.concat([df, s.rename('colC')], axis=1)
print(df)

 

Output 

    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

 

In general, you should use this method when the indices of the objects being combined match. If they don't, the result contains the union of all index labels, and cells with no matching data are filled with NaN, as shown in the example below.

For example:

s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)

 

Output

     colA  colB colC
0    True   1.0  NaN
1   False   2.0  NaN
2   False   3.0  NaN
10    NaN   NaN    a
20    NaN   NaN    b
30    NaN   NaN    c
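If the labels of the Series are not meaningful and the values are already in row order, one way to avoid the extra rows is to reset the index of the Series before concatenating. This sketch assumes the DataFrame uses the default 0, 1, 2 index; the name s_positional is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

# Drop the Series' labels so it is renumbered 0, 1, 2 and lines up with df.
s_positional = s.reset_index(drop=True).rename('colC')

df = pd.concat([df, s_positional], axis=1)
print(df)
```

If the DataFrame has a non-default index, reset_index() alone will not line the two objects up, and setting the index of the Series to df.index is the more general choice.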

 

Conclusion

Adding columns to a DataFrame is a common operation in data analysis and manipulation. Pandas offers several ways to do it, and this article covered four of them. The index is one of the trickiest aspects of adding new columns to DataFrames, so be cautious: each of the methods covered in this article may handle indices differently. Once you are comfortable with these methods, you are ready to add new columns to your own DataFrames.


About The Author
Adrita Das
I am Adrita Das, a Biomedical Engineering student and technical content writer with a background in Computer Science. I am pursuing this goal through a career in life science research, augmented with computational techniques. I am passionate about technology and it gives immense happiness to share my knowledge through the content I make.