20 Pandas Exercises for Beginners (Python Solutions)

  • May 13, 2021
  • 10 Minutes Read
  • By Navoneel Chakrabarty

Pandas is a Python data analysis library that deals primarily with tabular data. It forms a major data analysis toolbox, widely used in domains such as Data Mining, Data Warehousing, Machine Learning and general Data Science. It is an open-source library released under a liberal BSD license. It has two main data structures:

  1. Series: Contains data related to a single variable (can be visualized as a vector) along with indexing information.
  2. DataFrame: Contains tabular data.

Here are 20 basic Pandas exercises for beginners, covering operations that are the bread and butter of every budding Data Analyst or Data Scientist.

Pandas Installation in Python

In the command line (cmd), type the following command:

pip install pandas

 

20 Pandas Exercises for Beginners

Importing Pandas and printing version number

import pandas as pd

print(pd.__version__)

 

Corresponding Output

1.1.3

 

EXERCISE 1 - List-to-Series Conversion

Given a list, output the corresponding pandas series

Sample Solution

given_list = [2, 4, 5, 6, 9]

series = pd.Series(given_list)

print(series)

 

Corresponding Output

0    2
1    4
2    5
3    6
4    9
dtype: int64

 

EXERCISE 2 - List-to-Series Conversion with Custom Indexing

Given a list, output the corresponding pandas series indexed by odd numbers only

Sample Solution

given_list = [2, 4, 5, 6, 9]

series = pd.Series(given_list, index = [1, 3, 5, 7, 9])

print(series)

 

Corresponding Output

1    2
3    4
5    5
7    6
9    9
dtype: int64

 

EXERCISE 3 - Date Series Generation

Generate the series of dates from 1st May, 2021 to 12th May, 2021 (both inclusive)

Sample Solution

date_series = pd.date_range(start = '2021-05-01', end = '2021-05-12') # ISO dates (YYYY-MM-DD) avoid day/month ambiguity

print(date_series)

 

Corresponding Output

DatetimeIndex(['2021-05-01', '2021-05-02', '2021-05-03', '2021-05-04',
               '2021-05-05', '2021-05-06', '2021-05-07', '2021-05-08',
               '2021-05-09', '2021-05-10', '2021-05-11', '2021-05-12'],
              dtype='datetime64[ns]', freq='D')
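
The same range can also be described by a start date and a number of periods instead of an end date. The following is a minimal sketch, assuming a daily frequency is wanted:

```python
import pandas as pd

# 12 consecutive days starting on 1st May, 2021
date_series = pd.date_range(start='2021-05-01', periods=12, freq='D')

# First and last dates of the generated range
print(date_series[0].date(), date_series[-1].date())
```

This form is handy when the length of the range is known but the end date is not.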

 

EXERCISE 4 - Implementing a function on each and every element of a series

Apply the function, f(x) = x/2 on each and every element of a given pandas series

Sample Solution

series = pd.Series([2, 4, 6, 8, 10])

print(series) # pandas series initially

print()

modified_series = series.apply(lambda x:x/2)

print(modified_series) # pandas series after function application

 

Corresponding Output

0     2
1     4
2     6
3     8
4    10
dtype: int64

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
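
For simple arithmetic like this, a pandas series also supports operators directly, so a lambda is optional. A short sketch of that alternative, using the same series as above:

```python
import pandas as pd

series = pd.Series([2, 4, 6, 8, 10])

# Arithmetic operators work element-wise on a series
halved = series / 2

# Compare against the apply-based solution; both hold the same values
print(halved.equals(series.apply(lambda x: x / 2)))
```

The apply method remains useful when the function cannot be written as plain arithmetic.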

 

EXERCISE 5 - Dictionary-to-Dataframe Conversion

Given a dictionary, convert it into corresponding dataframe and display it

Sample Solution

dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age' : [22, 25, 24],
              'occ' : ['engineer', 'doctor', 'accountant']}

dataframe = pd.DataFrame(dictionary)

print(dataframe)

 

Corresponding Output

     name  age         occ
0   Vinay   22    engineer
1  Kushal   25      doctor
2    Aman   24  accountant

 

EXERCISE 6 - 2D List-to-Dataframe Conversion

Given a 2D List, convert it into corresponding dataframe and display it

Sample Solution

lists = [[2, 'Vishal', 22],
         [1, 'Kushal', 25],
         [1, 'Aman', 24]]

dataframe = pd.DataFrame(lists, columns = ['id', 'name', 'age'])

print(dataframe)

 

Corresponding Output

   id    name  age
0   2  Vishal   22
1   1  Kushal   25
2   1    Aman   24

 

EXERCISE 7 - Reading CSV to Dataframe

Given a CSV file, read it into a dataframe and display it

Sample Solution

dataframe = pd.read_csv('data.csv')

print(dataframe)

 

Corresponding Output

   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant
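
The file data.csv itself is not shown in this article. To try the exercise without it, the sketch below reads the same rows from an in-memory string; the CSV text is an assumption reconstructed from the output above.

```python
import io
import pandas as pd

# Stand-in for the contents of data.csv (assumed from the output shown above)
csv_text = """id,name,age,occ
1,Vinay,22,engineer
2,Kushal,25,doctor
3,Aman,24,accountant
"""

# read_csv accepts a file path or a file-like object such as StringIO
dataframe = pd.read_csv(io.StringIO(csv_text))

print(dataframe)
```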

 

EXERCISE 8 - Setting Custom Index in Dataframe

Given a dataframe, change the index of a dataframe from the default indexes to a particular column

Sample Solution

print(dataframe) # original dataframe before custom indexing

print()

dataframe_customindex = dataframe.set_index('id') # custom indexed dataframe with column, 'id'

print(dataframe_customindex)

 

Corresponding Output 

   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant

      name  age         occ
id                         
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant

 

EXERCISE 9 - Sorting a Dataframe by Index

Given a dataframe (say, with custom indexing), sort it by its index

Sample Solution

print(dataframe) # original unsorted dataframe with custom indexing (id)

print()

dataframe_sorted = dataframe.sort_index()

print(dataframe_sorted)

 

Corresponding Output

      name  age         occ
id                         
2    Vinay   22    engineer
3   Kushal   25      doctor
1     Aman   24  accountant

      name  age         occ
id                         
1     Aman   24  accountant
2    Vinay   22    engineer
3   Kushal   25      doctor
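
The solution above assumes a dataframe that already has an unsorted custom index. A self-contained sketch that rebuilds it (values copied from the output above) and also shows descending order:

```python
import pandas as pd

# Dataframe with an unsorted custom index named 'id'
dataframe = pd.DataFrame(
    {'name': ['Vinay', 'Kushal', 'Aman'],
     'age': [22, 25, 24],
     'occ': ['engineer', 'doctor', 'accountant']},
    index=pd.Index([2, 3, 1], name='id'),
)

print(dataframe.sort_index())                 # index in ascending order
print()
print(dataframe.sort_index(ascending=False))  # index in descending order
```

Note that sort_index returns a new dataframe and leaves the original untouched.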

 

EXERCISE 10 - Sorting a Dataframe by Multiple Columns

Given a dataframe, sort it by multiple columns

Sample Solution

print(dataframe) # original dataframe

print()

dataframe_sorted = dataframe.sort_values(by = ['id', 'age']) # dataframe after sorting by 'id' and 'age'

print(dataframe_sorted)

 

Corresponding Output

   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
2   1    Aman   24  accountant
1   1  Kushal   25      doctor
0   2   Vinay   22    engineer
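
The sort direction can also be set per column. The sketch below rebuilds the dataframe from the output above and sorts by 'id' ascending, then by 'age' descending within equal ids:

```python
import pandas as pd

dataframe = pd.DataFrame({'id': [2, 1, 1],
                          'name': ['Vinay', 'Kushal', 'Aman'],
                          'age': [22, 25, 24],
                          'occ': ['engineer', 'doctor', 'accountant']})

# One ascending flag per column listed in 'by'
dataframe_sorted = dataframe.sort_values(by=['id', 'age'], ascending=[True, False])

print(dataframe_sorted)
```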

 

EXERCISE 11 - DataFrame with Custom Index to DataFrame with Default Indexes

Given a dataframe with custom indexing, convert it to default indexing and display it

Sample Solution

print(dataframe_customindex) # printing the original dataframe with custom indexing

print()

dataframe = dataframe_customindex.reset_index()

print(dataframe) # printing the dataframe with default indexes

 

Corresponding Output

      name  age         occ
id                         
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant

   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant

 

EXERCISE 12 - Indexing and Selecting Columns in a DataFrame

Given a dataframe, select a particular column and display it

Sample Solution

print(dataframe) # original dataframe

print()

o = dataframe['name'] # extracting the column 'name'

print(o)

Alternative Solution 1

print(dataframe) # original dataframe

print()

o = dataframe.iloc[:,1] # extracting the column 'name'

print(o)

Alternative Solution 2

print(dataframe) # original dataframe

print()

o = dataframe.loc[:,'name'] # extracting the column 'name'

print(o)

 

Corresponding Output

   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

0     Vinay
1    Kushal
2      Aman
Name: name, dtype: object

 

EXERCISE 13 - Indexing and Selecting Rows in a DataFrame

Given a dataframe, select the first 2 rows and output them

Sample Solution

print(dataframe) # original dataframe

print()

o = dataframe.iloc[[0,1], :] # extracting the 1st 2 rows of the dataframe

print(o)

Alternative Solution

print(dataframe) # original dataframe

print()

o = dataframe.loc[[0,1], :] # extracting the 1st 2 rows of the dataframe

print(o)

 

Corresponding Output

   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age       occ
0   2   Vinay   22  engineer
1   1  Kushal   25    doctor
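
The two solutions give the same result here only because the default index labels (0, 1, 2) coincide with the row positions. A small sketch with a non-default index, chosen purely for illustration, makes the difference visible:

```python
import pandas as pd

dataframe = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman'],
                          'age': [22, 25, 24]},
                         index=[10, 20, 30])  # illustrative labels, not from the exercise

by_position = dataframe.iloc[[0, 1], :]  # rows at positions 0 and 1
by_label = dataframe.loc[[10, 20], :]    # rows labelled 10 and 20
first_two = dataframe.head(2)            # shorthand for the first 2 rows

print(by_position.equals(by_label), by_position.equals(first_two))
```

With this index, dataframe.loc[[0, 1], :] would raise a KeyError, since no rows carry the labels 0 and 1.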

 

EXERCISE 14 - Conditional Selection of Rows in a DataFrame

Given a dataframe, select rows based on a condition

Sample Solution

print(dataframe) # original dataframe

print()

# selecting people with age greater than or equal to 24

dataframe_condition = dataframe.loc[dataframe.age >= 24]

print(dataframe_condition)

 

Corresponding Output

   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
1   1  Kushal   25      doctor
2   1    Aman   24  accountant
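
Several conditions can be combined in the same way. A minimal sketch, assuming the dataframe shown above, that selects people aged 24 or more who are also doctors:

```python
import pandas as pd

dataframe = pd.DataFrame({'id': [2, 1, 1],
                          'name': ['Vinay', 'Kushal', 'Aman'],
                          'age': [22, 25, 24],
                          'occ': ['engineer', 'doctor', 'accountant']})

# Wrap each condition in parentheses; use & for "and", | for "or"
selected = dataframe.loc[(dataframe['age'] >= 24) & (dataframe['occ'] == 'doctor')]

print(selected)
```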

 

EXERCISE 15 - Applying Aggregate Functions

Given a dataframe showing the name, occupation and salary of people, find the average salary per occupation

Sample Solution

print(dataframe) # original dataframe

print()

occ_average_salary = dataframe.groupby('occ')['salary'].mean() # required series: mean salary per occupation

print(occ_average_salary)

 

Corresponding Output

     name       occ  salary
0   Vinay  engineer   60000
1  Kushal    doctor   70000
2    Aman  engineer   50000
3   Rahul    doctor   60000
4  Ramesh    doctor   65000

occ
doctor      65000.0
engineer    55000.0
Name: salary, dtype: float64
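
The solution above assumes the dataframe already exists. The sketch below rebuilds it from the output shown and computes several aggregates per occupation in one call:

```python
import pandas as pd

dataframe = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman', 'Rahul', 'Ramesh'],
                          'occ': ['engineer', 'doctor', 'engineer', 'doctor', 'doctor'],
                          'salary': [60000, 70000, 50000, 60000, 65000]})

# agg accepts a list of aggregate names and returns one column per aggregate
summary = dataframe.groupby('occ')['salary'].agg(['mean', 'min', 'max'])

print(summary)
```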

 

EXERCISE 16 - Filling NaN Values in a DataFrame

Given a dataframe with NaN Values, fill the NaN values with 0

Sample Solution

print(dataframe) # original dataframe

print()

dataframe_nullfill = dataframe.fillna(0)

print(dataframe_nullfill) # dataframe after filling NaN values with 0

 

Corresponding Output

     name       occ   salary
0   Vinay  engineer      NaN
1  Kushal    doctor  70000.0
2    Aman  engineer      NaN
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0

     name       occ   salary
0   Vinay  engineer      0.0
1  Kushal    doctor  70000.0
2    Aman  engineer      0.0
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0
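
Filling with 0 is not always appropriate, for instance when a 0 salary would distort later averages. As one possible alternative, the sketch below fills the missing salaries with the mean of the known ones; the dataframe is rebuilt from the output above.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman', 'Rahul', 'Ramesh'],
                          'occ': ['engineer', 'doctor', 'engineer', 'doctor', 'doctor'],
                          'salary': [np.nan, 70000, np.nan, 60000, 65000]})

# mean() skips NaN by default, so this is the mean of the three known salaries
salary_mean = dataframe['salary'].mean()

# A dictionary limits the fill to the named column
dataframe_meanfill = dataframe.fillna({'salary': salary_mean})

print(dataframe_meanfill)
```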

 

EXERCISE 17 - Applying Functions (UDFs) on DataFrame

Given a dataframe showing Company Names (cname) and corresponding Profits (profit), convert the values of the profit column such that values greater than 0 are set to True and the rest are set to False.

Sample Solution

print(company_data) # original dataframe

print()

company_data['profit'] = company_data['profit'].apply(lambda x:x>0)

print(company_data) # required dataframe

 

Corresponding Output

                cname  profit
0         Shyam & Co.  -10000
1      Ramlal & Bros.   10000
2  Sharma Enterprises   -5000
3    Verma Furnitures   15000
4        Rahul Stores   20000

                cname  profit
0         Shyam & Co.   False
1      Ramlal & Bros.    True
2  Sharma Enterprises   False
3    Verma Furnitures    True
4        Rahul Stores    True
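
For a plain comparison like this one, the same result can be obtained without a user-defined function. A sketch of that alternative, with the dataframe rebuilt from the output above:

```python
import pandas as pd

company_data = pd.DataFrame({'cname': ['Shyam & Co.', 'Ramlal & Bros.', 'Sharma Enterprises',
                                       'Verma Furnitures', 'Rahul Stores'],
                             'profit': [-10000, 10000, -5000, 15000, 20000]})

# Comparison operators work element-wise and return a boolean series
is_profitable = company_data['profit'] > 0

print(is_profitable)
```

The apply approach is still the one to reach for when the logic is too involved for a single expression.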

 

EXERCISE 18 - Joining 2 DataFrames by a Common Column (key)

Given 2 dataframes, one containing Employee ID (eid), Employee Name (ename) and Stipend (stipend), and the other containing Employee ID (eid) and the designation of the employee (position), output the dataframe containing Employee ID (eid), Employee Name (ename), Stipend (stipend) and Position (position).

Sample Solution

print(emp_data) # 1st DataFrame containing employee id (eid), employee name (ename) and stipend

print()

print(company_data) # 2nd DataFrame containing employee id (eid) and designation of the employee (position)

print()

dataframe = pd.merge(emp_data, company_data, how = 'inner', on = 'eid') # required dataframe

print(dataframe)

 

Corresponding Output

   eid   ename  stipend
0    1     Sid    10000
1    2  Ramesh    10000
2    3     Ron     5000
3    4   Harry    15000

   eid         position
0    1         employee
1    2         employee
2    3           intern
3    4  senior_employee

   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee
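
The choice of how = 'inner' matters only when some keys are missing from one side. The sketch below uses a hypothetical variation of the data above, in which the position of employee 4 is unknown, to contrast an inner join with a left join:

```python
import pandas as pd

emp_data = pd.DataFrame({'eid': [1, 2, 3, 4],
                         'ename': ['Sid', 'Ramesh', 'Ron', 'Harry'],
                         'stipend': [10000, 10000, 5000, 15000]})

# Hypothetical variation: no row for eid 4
company_data = pd.DataFrame({'eid': [1, 2, 3],
                             'position': ['employee', 'employee', 'intern']})

inner = pd.merge(emp_data, company_data, how='inner', on='eid')  # drops eid 4
left = pd.merge(emp_data, company_data, how='left', on='eid')    # keeps eid 4, position is NaN

print(inner)
print()
print(left)
```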

 

EXERCISE 19 - Getting the Non-Null Count and Data Type for Every Column

Given a dataframe, output the non-null count and data-type for every column

Sample Solution

print(dataframe) # the dataframe

print()

print(dataframe.info())

 

Corresponding Output

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   eid       4 non-null      int64 
 1   ename     4 non-null      object
 2   stipend   4 non-null      int64 
 3   position  4 non-null      object
dtypes: int64(2), object(2)
memory usage: 160.0+ bytes
None
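
The trailing None in the output appears because info() prints its summary itself and returns None, which the outer print then displays. A sketch of calling it directly, alongside two methods that return the same facts as series (the dataframe is rebuilt from Exercise 18):

```python
import pandas as pd

dataframe = pd.DataFrame({'eid': [1, 2, 3, 4],
                          'ename': ['Sid', 'Ramesh', 'Ron', 'Harry'],
                          'stipend': [10000, 10000, 5000, 15000],
                          'position': ['employee', 'employee', 'intern', 'senior_employee']})

dataframe.info()          # prints the summary; no print() needed
print()
print(dataframe.count())  # non-null count per column
print()
print(dataframe.dtypes)   # data type per column
```

Unlike info(), count() and dtypes return objects that can be stored and used in later steps.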

 

EXERCISE 20 - Getting the Statistical Summary of all the Numerical Features of a DataFrame

Given a dataframe, generate the statistical summary of all the numerical features present in it

Sample Solution

print(dataframe) # the dataframe

print()

print(dataframe.describe())

 

Corresponding Output

   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee

            eid       stipend
count  4.000000      4.000000
mean   2.500000  10000.000000
std    1.290994   4082.482905
min    1.000000   5000.000000
25%    1.750000   8750.000000
50%    2.500000  10000.000000
75%    3.250000  11250.000000
max    4.000000  15000.000000
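
By default describe() covers only the numerical columns. If a summary of the text columns is wanted as well, the include parameter can be used, as in this sketch (the dataframe is rebuilt from the output above):

```python
import pandas as pd

dataframe = pd.DataFrame({'eid': [1, 2, 3, 4],
                          'ename': ['Sid', 'Ramesh', 'Ron', 'Harry'],
                          'stipend': [10000, 10000, 5000, 15000],
                          'position': ['employee', 'employee', 'intern', 'senior_employee']})

# include='all' adds count, unique, top and freq rows for non-numerical columns
summary_all = dataframe.describe(include='all')

print(summary_all)
```

Cells that do not apply to a column, such as the mean of ename, are shown as NaN.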

 

Conclusion

The above are the building blocks of Pandas that every beginner Data Analyst or Data Scientist should have a firm grasp of. In case you are stuck on any of the pandas exercises or need further clarification on a concept of data science or Python, FavTutor experts are available 24/7 to help.


About The Author
Navoneel Chakrabarty
I'm Navoneel Chakrabarty, a Data Scientist, Machine Learning & AI Enthusiast, and a Regular Python Coder. Apart from that, I am also a Natural Language Processing (NLP), Deep Learning, and Computer Vision Enthusiast.