Pandas is one of the most widely used Python libraries for data science. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language, and it becomes a powerful tool for manipulating data once you know its core operations. In this article, we study one such operation: getting the unique values in a column of a pandas DataFrame, with examples and their output. Before fetching unique values from columns, here is a quick recap of the pandas DataFrame.

**What is Pandas DataFrame?**

A pandas DataFrame is a two-dimensional, labeled data structure whose columns can hold different data types. It is a convenient way to work with structured data in Python and is similar to the data frame in R. A DataFrame stores data in tabular form: each row holds one record and each column holds one named attribute. DataFrames are part of the pandas library, a Python module for data analysis, and can be created from many different sources such as dictionaries, lists, or CSV files. To learn more about pandas DataFrames, visit "**Create Empty DataFrame in Pandas**".

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)
print(df)
```

**Output:**

```
  Students    Subjects
0      Ray       Maths
1     John   Economics
2     Mole     Science
3    Smith       Maths
4      Jay  Statistics
5    Milli  Statistics
6      Tom  Statistics
7     Rick   Computers
```

**How to Get Unique Values in DataFrame Column?**

Consider a DataFrame that lists students alongside the subjects they study. If you want to know which subjects are being studied overall, going through the DataFrame row by row to pick out each distinct subject is tedious and error-prone. To avoid that manual work, below are 5 methods for getting or counting the unique values in a column of a pandas DataFrame:

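**1) Using the unique() method**

The **Series.unique()** method is used on a single column of a DataFrame and returns all unique elements of that column. It is a Series method, not a DataFrame method, so you first select the column and then call unique() on it. The result is not a DataFrame: for most columns it is a NumPy array (for extension dtypes, a pandas extension array) containing the unique values without their index labels.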

**Syntax:**

```
Series.unique()
```

Note that "Series" in the above syntax refers to the column selected from the DataFrame. A pandas Series is a one-dimensional labeled array that can hold data of any type.

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Select the column (a Series) and get its unique values
print(df["Subjects"].unique())
print(type(df["Subjects"].unique()))
```

**Output:**

```
['Maths' 'Economics' 'Science' 'Statistics' 'Computers']
<class 'numpy.ndarray'>
```

As the output shows, unique() returns the distinct values of the column as an array rather than as a Series or DataFrame.
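The array returned by unique() lists values in order of first appearance; it is not sorted. The following is a minimal sketch, reusing the sample subjects from above, of how the result might be turned into a plain Python list and sorted when alphabetical order is wanted. The variable names `subjects_list` and `subjects_sorted` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
})

# unique() keeps values in order of first appearance (it does not sort)
subjects_list = list(df["Subjects"].unique())
print(subjects_list)

# Sort explicitly when alphabetical order is needed
subjects_sorted = sorted(subjects_list)
print(subjects_sorted)
```

Converting to a list is optional; it simply makes the result easier to pass to code that expects built-in Python types.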

**2) Using the drop_duplicates method**

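**drop_duplicates()** is a built-in method in the pandas library that removes duplicate rows from a DataFrame. Unlike unique(), it preserves the type of the object it is called on: called on a DataFrame (or a subset of it), it returns a DataFrame with the duplicate rows removed and the original index labels kept. When you pass the subset argument, only the listed columns are compared when deciding which rows are duplicates, and by default the first occurrence of each value is kept. This makes drop_duplicates() a convenient choice when you need the full rows rather than only the unique values.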

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Keep only the first row for each distinct subject
print(df.drop_duplicates(subset="Subjects"))
print(type(df.drop_duplicates(subset="Subjects")))
```

**Output:**

```
  Students    Subjects
0      Ray       Maths
1     John   Economics
2     Mole     Science
4      Jay  Statistics
7     Rick   Computers
<class 'pandas.core.frame.DataFrame'>
```

Note that the output of drop_duplicates() is a DataFrame object containing one row for each unique value in the given column.
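Two variations are often useful here. The sketch below, which reuses the same sample data, shows that calling drop_duplicates() directly on the column returns a Series, and that the `keep="last"` argument retains the last row for each subject instead of the first. The variable names `unique_subjects` and `last_rows` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
})

# Called on a single column, drop_duplicates() returns a Series
# that keeps the original index labels
unique_subjects = df["Subjects"].drop_duplicates()
print(type(unique_subjects))
print(unique_subjects)

# keep="last" retains the last row seen for each subject
last_rows = df.drop_duplicates(subset="Subjects", keep="last")
print(last_rows)
```

With this data, `last_rows` keeps the rows for Smith (Maths) and Tom (Statistics) rather than Ray and Jay.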

**3) Get unique values in multiple columns**

So far we have seen how to get the set of unique values from a single column. But what if you wish to identify the unique values across more than one column? In such cases, you can **concatenate the contents of those columns** and then call the **unique() method** on the resulting Series.

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Smith", "Tom", "John"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Stack the two columns into one Series, then take its unique values.
# pd.concat is used because Series.append was removed in pandas 2.0.
unique_values = pd.concat([df["Students"], df["Subjects"]]).unique()
print(unique_values)
```

**Output:**

```
['Ray' 'John' 'Mole' 'Smith' 'Jay' 'Tom' 'Maths' 'Economics' 'Science'
 'Statistics' 'Computers']
```

As shown above, you get a single array of the unique elements from both columns of the DataFrame.
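When more than two columns are involved, an alternative is to flatten the selected columns into one array and pass it to the top-level `pd.unique()` function. The following is a sketch under the assumption that all selected columns hold comparable values (strings here); the names `cols` and `pooled` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Smith", "Tom", "John"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
})

cols = ["Students", "Subjects"]

# Flatten the selected columns into a 1-D array (row by row),
# then keep each value the first time it appears
pooled = pd.unique(df[cols].to_numpy().ravel())
print(pooled)
print(len(pooled))
```

The set of values matches the concatenation approach, but the order differs because the array is flattened row by row rather than column by column.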

**4) Count unique values in a single column**

Suppose that instead of listing the unique values in a column, you wish to count how many unique elements it contains. In that case, you can use the **nunique()** method instead of the unique() method, as shown in the example below:

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Count the distinct values in the column
unique_count = df["Subjects"].nunique()
print(unique_count)
```

**Output:**

`5`

Here, the output is the count of unique elements in the given column of the pandas DataFrame.

Note that by default nunique() does not count **NaN (missing) values** among the unique elements of the column. To count NaN as a distinct value as well, pass the **dropna** argument set to **False**, as shown in the example below. The missing entries must be real missing values such as None or numpy.nan; the string "NaN" is an ordinary value and is always counted.

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    # None marks a genuinely missing subject (pandas treats it as NaN)
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 None, None, "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# dropna=False counts the missing value as one more distinct element
unique_count = df["Subjects"].nunique(dropna=False)
print(unique_count)
```

**Output:**

```
6
```
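To see the effect of dropna side by side, the sketch below compares the default nunique(), nunique(dropna=False), and the length of the array returned by unique() on a column with genuinely missing entries. The Series name `subjects` and the three counter variables are illustrative.

```python
import pandas as pd

subjects = pd.Series(
    ["Maths", "Economics", "Science", "Maths", None, None, "Statistics", "Computers"]
)

n_without_missing = subjects.nunique()           # missing values are ignored
n_with_missing = subjects.nunique(dropna=False)  # missing value counted once
n_from_unique = len(subjects.unique())           # unique() keeps the missing value

print(n_without_missing, n_with_missing, n_from_unique)
```

This prints `5 6 6`: unique() includes the missing value in its result, so its length agrees with nunique(dropna=False) rather than with the default nunique().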

**5) Count unique values in each column**

In the previous method, we counted the number of unique values in one given column of the DataFrame. If you instead wish to find the number of unique elements in every column, you can call the **nunique()** method on the DataFrame object itself. It returns a Series with one count per column. For better understanding, take a look at the example below.

**For example:**

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
}

# Load data into a DataFrame object
df = pd.DataFrame(data)

# Count the distinct values in every column
unique_counts = df.nunique()
print(unique_counts)
```

**Output:**

```
Students    8
Subjects    5
dtype: int64
```
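Because DataFrame.nunique() returns a Series indexed by column name, the per-column counts can be looked up or filtered like any other Series. The following sketch, with the illustrative names `counts` and `all_distinct`, picks out the columns in which every row holds a different value.

```python
import pandas as pd

df = pd.DataFrame({
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths",
                 "Statistics", "Statistics", "Statistics", "Computers"],
})

counts = df.nunique()

# Look up the count for one column by its name
print(counts["Subjects"])

# Columns whose number of unique values equals the number of rows
all_distinct = counts[counts == len(df)].index.tolist()
print(all_distinct)
```

With this data only the Students column qualifies, since each student name appears exactly once.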

**Conclusion**

A DataFrame is an efficient way to store data in tabular form, with each column holding one feature of the data. DataFrames are a common starting point for analysing large datasets and for preparing data for machine learning. In this article, we studied how to get and count the unique values in the columns of a pandas DataFrame using the built-in unique(), drop_duplicates(), and nunique() methods, each with an example. Visit Favtutor to learn more such useful insights.