
Get Unique values in Pandas DataFrame Column

  • Feb 07, 2022
  • 7 Minutes Read
  • By Shivali Bhadaniya

 

Pandas is one of the most widely used Python libraries for data science. It provides high-performance, easy-to-use data structures and data analysis tools, and it becomes a powerful way to manipulate data once you know its core operations. In this article, we look at one such operation: getting the unique values in a column of a pandas DataFrame, with examples and their output. Before fetching unique values from columns, here is a quick recap of the pandas DataFrame.

What is Pandas DataFrame?

A pandas DataFrame is a two-dimensional, labeled data structure whose columns can hold different types. It is a convenient way to work with structured data in Python, and its design is similar to the data frame in R. A DataFrame stores data in tabular form: each row holds one record and each column is a named field. DataFrames are part of the pandas library, a Python package for data analysis, and can be created from many different sources. To learn more about pandas DataFrames, visit "Create Empty DataFrame in Pandas".

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

 

Output:

      Students    Subjects
0      Ray       Maths
1     John   Economics
2     Mole     Science
3    Smith       Maths
4      Jay  Statistics
5    Milli  Statistics
6      Tom  Statistics
7     Rick   Computers

 

How to Get Unique Values in DataFrame Column?

Consider a DataFrame that lists students along with the subjects they study. If you want to know which subjects are being studied overall, going through the DataFrame row by row and picking out each distinct subject by hand is tedious and error-prone. To avoid that manual work, here are 5 methods for getting or counting the unique values in the columns of a pandas DataFrame:

1) Using the unique() method

Series.unique() works on a single column of a DataFrame and returns all the distinct elements of that column. It returns a NumPy array of the unique values in order of first appearance (for some extension dtypes, a pandas array instead); the index labels are not kept.

Syntax:

Series.unique()

Note that the term “Series” in the above syntax refers to a column of the DataFrame. A pandas Series is a one-dimensional labeled array that can hold data of any type.

 

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df["Subjects"].unique())
print(type(df["Subjects"].unique()))

 

Output:

['Maths' 'Economics' 'Science' 'Statistics' 'Computers']
<class 'numpy.ndarray'>

 

As the example shows, unique() returns an array of the distinct values in the column, in the order in which they first appear.
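The following is a minimal sketch of two details that often matter in practice: how unique() treats missing values, and how to turn the result into a plain Python list. The Series name subjects and its contents are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeated and missing entries
subjects = pd.Series(["Maths", "Economics", "Maths", np.nan, "Statistics", np.nan])

# unique() keeps first-appearance order and reports the missing value once
unique_subjects = subjects.unique()

# Drop missing values first, then convert the result to a plain list
unique_list = subjects.dropna().unique().tolist()

print(unique_subjects)
print(unique_list)  # ['Maths', 'Economics', 'Statistics']
```

Calling dropna() before unique() is one simple way to leave missing values out of the result when they are not meaningful as a category.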

2) Using the drop_duplicates() method

drop_duplicates() is a built-in pandas method that removes duplicate rows from a DataFrame. It returns an object of the same type as the one it was called on (a DataFrame or a Series) and, by default, keeps the first occurrence of each duplicated value. Unlike unique(), it preserves the index labels and the other columns of the rows it keeps, which makes it a good choice when you need the full rows rather than only the distinct values.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df.drop_duplicates(subset = "Subjects"))
print(type(df.drop_duplicates(subset = "Subjects")))

 

Output:

      Students    Subjects
0      Ray       Maths
1     John   Economics
2     Mole     Science
4      Jay  Statistics
7     Rick   Computers
<class 'pandas.core.frame.DataFrame'>

 

Note that the output of the drop_duplicates() method is a DataFrame that contains one row for each unique value in the given column.
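As a small sketch of how the result depends on the options used, the snippet below compares the default behaviour with keep="last" and with calling drop_duplicates() directly on a single column. The shortened DataFrame is an assumption made only to keep the example compact.

```python
import pandas as pd

# Shortened, hypothetical version of the students table
df = pd.DataFrame({
    "Students": ["Ray", "John", "Mole", "Smith", "Jay"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", "Economics"],
})

# Default keep="first": the first row seen for each subject is kept
first_rows = df.drop_duplicates(subset="Subjects")

# keep="last": the last row seen for each subject is kept instead
last_rows = df.drop_duplicates(subset="Subjects", keep="last")

# On a single column the result is a Series that keeps its original index labels
subject_series = df["Subjects"].drop_duplicates()

print(first_rows)
print(last_rows)
print(subject_series)
```

Both DataFrame results keep the original row order and index labels, so they can still be matched back to the source DataFrame.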

3) Get unique values in multiple columns

So far we have seen how to get the set of unique values from a single column. But what if you want the unique values across more than one column? In that case, you can concatenate the columns whose unique values you need into one Series and then call the unique() method on it.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Smith", "Tom", "John"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

# Series.append() was removed in pandas 2.0, so the columns are combined with pd.concat()
uniqueValues = pd.concat([df['Students'], df['Subjects']]).unique()
print(uniqueValues)

 

Output:

['Ray' 'John' 'Mole' 'Smith' 'Jay' 'Tom' 'Maths' 'Economics' 'Science'
 'Statistics' 'Computers']

 

As shown above, you will get the array of unique elements from both the columns from the dataframe.
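An alternative sketch, assuming the selected columns fit comfortably in memory as one NumPy array: flatten the selected columns column by column and pass the result to pd.unique(). The list name cols and the shortened data are illustrative assumptions.

```python
import pandas as pd

# Shortened, hypothetical version of the students table
df = pd.DataFrame({
    "Students": ["Ray", "John", "Smith", "John"],
    "Subjects": ["Maths", "Economics", "Maths", "Computers"],
})

cols = ["Students", "Subjects"]

# order="F" flattens column by column, so all students come before all subjects
flat = df[cols].to_numpy().ravel(order="F")

# pd.unique() keeps first-appearance order, unlike numpy.unique(), which sorts
values = pd.unique(flat)

print(values)  # ['Ray' 'John' 'Smith' 'Maths' 'Economics' 'Computers']
```

This form scales to any number of columns by extending the list of column names.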

4) Count unique values in a single column 

Suppose that, instead of listing the unique values in a column, you want to count them. In that case, use the nunique() method instead of the unique() method.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

uniqueValues = df['Subjects'].nunique()
print(uniqueValues)

 

Output:

5

 

Here, the output will return the count of unique elements from the given column of pandas dataframe.

Note that nunique() does not count NaN (missing) values by default. To count NaN as a distinct value as well, pass the “dropna” argument and set it to False.

For example:

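import pandas as pd
import numpy as np

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  # np.nan marks a real missing value; the string "NaN" would be counted as ordinary text
  "Subjects": ["Maths", "Economics", "Science", "Maths", np.nan, np.nan, "Statistics", "Computers"]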
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

uniqueValues = df['Subjects'].nunique(dropna=False)
print(uniqueValues)

 

Output:

6
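To make the effect of dropna visible, here is a minimal side-by-side sketch on a column that contains real missing values (np.nan) rather than the text "NaN". The Series subjects is a made-up example.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two real missing values
subjects = pd.Series(["Maths", "Economics", np.nan, "Maths", np.nan, "Computers"])

# Default (dropna=True): missing values are ignored
without_nan = subjects.nunique()

# dropna=False: all missing values together count as one extra distinct value
with_nan = subjects.nunique(dropna=False)

print(without_nan)  # 3
print(with_nan)     # 4
```

With dropna=False, the count matches the length of the array returned by unique(), since unique() also reports the missing value once.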

 

5) Count unique values in each column

In the previous method, we counted the unique values in a single column of the DataFrame. If you want the number of unique elements in every column, call the nunique() method on the DataFrame itself. For a better understanding, take a look at the example below.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

uniqueValues = df.nunique()
print(uniqueValues)

 

Output:

Students    8
Subjects    5
dtype: int64
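The result of DataFrame.nunique() is itself a pandas Series indexed by column name, so individual counts can be read by label. The sketch below assumes a shortened, made-up DataFrame; the variable names are illustrative only.

```python
import pandas as pd

# Shortened, hypothetical version of the students table
df = pd.DataFrame({
    "Students": ["Ray", "John", "Mole", "Smith"],
    "Subjects": ["Maths", "Economics", "Maths", "Maths"],
})

# One count per column, returned as a Series indexed by column name
per_column = df.nunique()

# Read a single count by its column label
subject_count = per_column["Subjects"]

# Convert to a plain dictionary if a non-pandas structure is more convenient
as_dict = per_column.to_dict()

print(subject_count)  # 2
print(as_dict)        # {'Students': 4, 'Subjects': 2}
```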

 

Conclusion

A DataFrame is an efficient way to store tabular data: each column is a one-dimensional series holding a single feature, and the columns together form a two-dimensional table. DataFrames work well for large datasets and fit naturally into further analysis and machine learning workflows. In this article, we studied how to get and count the unique values in pandas DataFrame columns using built-in methods, each with an example. Visit FavTutor to learn more such useful insights.


About The Author
Shivali Bhadaniya
I'm Shivali Bhadaniya, a computer engineer student and technical content writer, very enthusiastic to learn and explore new technologies and looking towards great opportunities. It is amazing for me to share my knowledge through my content to help curious minds.