# 20 Pandas Exercises for Beginners (Python Solutions)

Pandas is a Python data analysis library that deals primarily with tabular data. It forms a major data analysis toolbox that is widely used in domains such as data mining, data warehousing, machine learning, and general data science. It is an open-source library released under a liberal BSD license. It has two main data structures:

1. Series: Contains data related to a single variable (can be visualized as a vector) along with indexing information.
2. DataFrame: Contains tabular data.
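As a quick illustration of the two structures, here is a minimal sketch. The variable names (`ages`, `people`, `age_column`) are only illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

# A Series: the values of a single variable plus an index
ages = pd.Series([22, 25, 24], name="age")

# A DataFrame: several columns that share one index
people = pd.DataFrame({"name": ["Vinay", "Kushal", "Aman"],
                       "age": [22, 25, 24]})

# Selecting one column of a DataFrame gives back a Series
age_column = people["age"]

print(type(ages).__name__)        # Series
print(type(people).__name__)      # DataFrame
print(type(age_column).__name__)  # Series
print(people.shape)               # (3, 2)
```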

Here are 20 basic Pandas exercises for beginners, which should be the bread and butter of every budding data analyst or data scientist.

### Pandas Installation in Python

In the command line (cmd), type the following command:

```pip install pandas
```

## 20 Pandas Exercises for Beginners

### Importing Pandas and printing version number

```import pandas as pd

print(pd.__version__)
```

Corresponding Output

```1.1.3
```

### EXERCISE 1 - List-to-Series Conversion

Given a list, output the corresponding pandas series

Sample Solution

```given_list = [2, 4, 5, 6, 9]

series = pd.Series(given_list)

print(series)
```

Corresponding Output

```0    2
1    4
2    5
3    6
4    9
dtype: int64
```

### EXERCISE 2 - List-to-Series Conversion with Custom Indexing

Given a list, output the corresponding pandas series indexed with odd numbers only

Sample Solution

```given_list = [2, 4, 5, 6, 9]

series = pd.Series(given_list, index = [1, 3, 5, 7, 9])

print(series)
```

Corresponding Output

```1    2
3    4
5    5
7    6
9    9
dtype: int64
```
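Once a series has a custom index, looking a value up by label and looking it up by position are no longer the same thing. A minimal sketch, reusing the series from this exercise (the names `by_label` and `by_position` are illustrative):

```python
import pandas as pd

series = pd.Series([2, 4, 5, 6, 9], index=[1, 3, 5, 7, 9])

by_label = series.loc[3]      # the element whose index label is 3
by_position = series.iloc[3]  # the fourth element, counting from 0

print(by_label)     # 4
print(by_position)  # 6
```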

### EXERCISE 3 - Date Series Generation

Generate the series of dates from 1st May, 2021 to 12th May, 2021 (both inclusive)

Sample Solution

```date_series = pd.date_range(start = '2021-05-01', end = '2021-05-12') # ISO dates avoid day/month ambiguity

print(date_series)
```

Corresponding Output

```DatetimeIndex(['2021-05-01', '2021-05-02', '2021-05-03', '2021-05-04',
               '2021-05-05', '2021-05-06', '2021-05-07', '2021-05-08',
               '2021-05-09', '2021-05-10', '2021-05-11', '2021-05-12'],
              dtype='datetime64[ns]', freq='D')
```
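Note that `pd.date_range` returns a `DatetimeIndex` rather than a Series. The sketch below shows two equivalent ways to build the same range and one way to wrap it in a Series, should a Series be required. The names `by_end` and `by_count` are illustrative.

```python
import pandas as pd

# Same 12 days, specified by end date or by number of periods
by_end = pd.date_range(start="2021-05-01", end="2021-05-12")
by_count = pd.date_range(start="2021-05-01", periods=12, freq="D")

# Wrap the DatetimeIndex in a Series with the default integer index
date_series = pd.Series(by_end)

print(by_end.equals(by_count))  # True
print(len(date_series))         # 12
```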

### EXERCISE 4 - Applying a Function to Every Element of a Series

Apply the function f(x) = x/2 to every element of a given pandas series

Sample Solution

```series = pd.Series([2, 4, 6, 8, 10])

print(series) # pandas series initially

print()

modified_series = series.apply(lambda x:x/2)

print(modified_series) # pandas series after function application
```

Corresponding Output

```0     2
1     4
2     6
3     8
4    10
dtype: int64

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
```
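For simple arithmetic like this, pandas also supports element-wise operators directly on the series, which avoids calling a Python function once per element. A small sketch comparing the two approaches (the names `with_apply` and `vectorized` are illustrative):

```python
import pandas as pd

series = pd.Series([2, 4, 6, 8, 10])

with_apply = series.apply(lambda x: x / 2)  # calls the lambda per element
vectorized = series / 2                     # element-wise division

print(with_apply.equals(vectorized))  # True
```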

### EXERCISE 5 - Dictionary-to-Dataframe Conversion

Given a dictionary, convert it into the corresponding dataframe and display it

Sample Solution

```dictionary = {'name': ['Vinay', 'Kushal', 'Aman'],
              'age' : [22, 25, 24],
              'occ' : ['engineer', 'doctor', 'accountant']}

dataframe = pd.DataFrame(dictionary)

print(dataframe)
```

Corresponding Output

```     name  age         occ
0   Vinay   22    engineer
1  Kushal   25      doctor
2    Aman   24  accountant
```

### EXERCISE 6 - 2D List-to-Dataframe Conversion

Given a 2D list, convert it into the corresponding dataframe and display it

Sample Solution

```lists = [[2, 'Vishal', 22],
         [1, 'Kushal', 25],
         [1, 'Aman', 24]]

dataframe = pd.DataFrame(lists, columns = ['id', 'name', 'age'])

print(dataframe)
```

Corresponding Output

```   id    name  age
0   2  Vishal   22
1   1  Kushal   25
2   1    Aman   24
```

### EXERCISE 7 - Reading CSV to Dataframe

Given a CSV file, read it into a dataframe and display it

Sample Solution

```dataframe = pd.read_csv('data.csv')

print(dataframe)
```

Corresponding Output

```   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant
```
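The file `data.csv` is not included with this article. To try the exercise without creating a file, the sketch below reads the same data from an in-memory string. The CSV text is an assumption reconstructed from the output shown above.

```python
import io

import pandas as pd

# Assumed contents of data.csv, reconstructed from the printed output
csv_text = """id,name,age,occ
1,Vinay,22,engineer
2,Kushal,25,doctor
3,Aman,24,accountant
"""

# read_csv accepts a file path or any file-like object
dataframe = pd.read_csv(io.StringIO(csv_text))

print(dataframe)
```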

### EXERCISE 8 - Setting Custom Index in Dataframe

Given a dataframe, change its index from the default integer index to a particular column

Sample Solution

```print(dataframe) # original dataframe before custom indexing

print()

dataframe_customindex = dataframe.set_index('id') # custom indexed dataframe with column, 'id'

print(dataframe_customindex)
```

Corresponding Output

```   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant

      name  age         occ
id
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant
```

### EXERCISE 9 - Sorting a Dataframe by Index

Given a dataframe (say, with custom indexing), sort it by its index

Sample Solution

```print(dataframe) # original unsorted dataframe with custom indexing (id)

print()

dataframe_sorted = dataframe.sort_index()

print(dataframe_sorted)
```

Corresponding Output

```      name  age         occ
id
2    Vinay   22    engineer
3   Kushal   25      doctor
1     Aman   24  accountant

      name  age         occ
id
1     Aman   24  accountant
2    Vinay   22    engineer
3   Kushal   25      doctor
```
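The solution above assumes an existing dataframe whose `id` index is out of order. A self-contained sketch that builds such a dataframe from the values in the output and sorts it both ways (the names `ascending` and `descending` are illustrative):

```python
import pandas as pd

# Rebuild the unsorted dataframe shown in the output above
dataframe = pd.DataFrame({
    "id": [2, 3, 1],
    "name": ["Vinay", "Kushal", "Aman"],
    "age": [22, 25, 24],
    "occ": ["engineer", "doctor", "accountant"],
}).set_index("id")

ascending = dataframe.sort_index()                  # index order 1, 2, 3
descending = dataframe.sort_index(ascending=False)  # index order 3, 2, 1

print(ascending)
print()
print(descending)
```

`sort_index` returns a new dataframe by default, so `dataframe` itself keeps its original order.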

### EXERCISE 10 - Sorting a Dataframe by Multiple Columns

Given a dataframe, sort it by multiple columns

Sample Solution

```print(dataframe) # original dataframe

print()

dataframe_sorted = dataframe.sort_values(by = ['id', 'age']) # dataframe after sorting by 'id' and 'age'

print(dataframe_sorted)
```

Corresponding Output

```   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
2   1    Aman   24  accountant
1   1  Kushal   25      doctor
0   2   Vinay   22    engineer
```
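When sorting by several columns, each column can have its own direction by passing a list to `ascending`. A sketch using the same data, sorting `id` ascending and `age` descending (the names `default_order` and `mixed_order` are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "id": [2, 1, 1],
    "name": ["Vinay", "Kushal", "Aman"],
    "age": [22, 25, 24],
    "occ": ["engineer", "doctor", "accountant"],
})

# Both columns ascending (the default)
default_order = dataframe.sort_values(by=["id", "age"])

# 'id' ascending, ties broken by 'age' descending
mixed_order = dataframe.sort_values(by=["id", "age"], ascending=[True, False])

print(mixed_order)
```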

### EXERCISE 11 - DataFrame with Custom Index to DataFrame with Default Index

Given a dataframe with custom indexing, convert it to default indexing and display it

Sample Solution

```print(dataframe_customindex) # printing the original dataframe with custom indexing

print()

dataframe = dataframe_customindex.reset_index()

print(dataframe) # printing the dataframe with default indexes
```

Corresponding Output

```      name  age         occ
id
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant

   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant
```

### EXERCISE 12 - Indexing and Selecting Columns in a DataFrame

Given a dataframe, select a particular column and display it

Sample Solution

```print(dataframe) # original dataframe

print()

o = dataframe['name'] # extracting the column 'name'

print(o)
```

Alternative Solution 1

```print(dataframe) # original dataframe

print()

o = dataframe.iloc[:,1] # extracting the column 'name'

print(o)
```

Alternative Solution 2

```print(dataframe) # original dataframe

print()

o = dataframe.loc[:,'name'] # extracting the column 'name'

print(o)
```

Corresponding Output

```   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

0     Vinay
1    Kushal
2      Aman
Name: name, dtype: object
```

### EXERCISE 13 - Indexing and Selecting Rows in a DataFrame

Given a dataframe, select the first 2 rows and output them

Sample Solution

```print(dataframe) # original dataframe

print()

o = dataframe.iloc[[0,1], :] # extracting the 1st 2 rows of the dataframe

print(o)
```

Alternative Solution

```print(dataframe) # original dataframe

print()

o = dataframe.loc[[0,1], :] # extracting the 1st 2 rows of the dataframe

print(o)
```

Corresponding Output

```   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age       occ
0   2   Vinay   22  engineer
1   1  Kushal   25    doctor
```
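The two solutions give the same result here only because the dataframe has the default index 0, 1, 2. `iloc` selects by position, while `loc` selects by index label. The sketch below, which assumes a dataframe indexed by `id`, shows where they differ (the variable names are illustrative):

```python
import pandas as pd

indexed = pd.DataFrame({
    "id": [2, 3, 1],
    "name": ["Vinay", "Kushal", "Aman"],
}).set_index("id")

first_two_by_position = indexed.iloc[[0, 1], :]  # first two rows
first_two_head = indexed.head(2)                 # same rows, shorter spelling
by_label = indexed.loc[[1, 2], :]                # rows labelled 1 and 2

print(first_two_by_position)
print()
print(by_label)
```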

### EXERCISE 14 - Conditional Selection of Rows in a DataFrame

Given a dataframe, select rows based on a condition

Sample Solution

```print(dataframe) # original dataframe

print()

# selecting people with age greater than or equal to 24

dataframe_condition = dataframe.loc[dataframe.age >= 24]

print(dataframe_condition)
```

Corresponding Output

```   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
1   1  Kushal   25      doctor
2   1    Aman   24  accountant
```
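Several conditions can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. A sketch that adds a second, purely illustrative condition on `occ`, along with the equivalent `query` call:

```python
import pandas as pd

dataframe = pd.DataFrame({
    "id": [2, 1, 1],
    "name": ["Vinay", "Kushal", "Aman"],
    "age": [22, 25, 24],
    "occ": ["engineer", "doctor", "accountant"],
})

# People aged 24 or more who are not doctors
mask = (dataframe["age"] >= 24) & (dataframe["occ"] != "doctor")
selected = dataframe.loc[mask]

# The same selection written as a query string
selected_query = dataframe.query("age >= 24 and occ != 'doctor'")

print(selected)
```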

### EXERCISE 15 - Applying Aggregate Functions

Given is a dataframe showing name, occupation, salary of people. Find the average salary per occupation

Sample Solution

```print(dataframe) # original dataframe

print()

occ_average_salary = dataframe.groupby('occ')['salary'].mean() # required series of average salaries

print(occ_average_salary)
```

Corresponding Output

```     name       occ  salary
0   Vinay  engineer   60000
1  Kushal    doctor   70000
2    Aman  engineer   50000
3   Rahul    doctor   60000
4  Ramesh    doctor   65000

occ
doctor      65000.0
engineer    55000.0
Name: salary, dtype: float64
```
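The solution assumes an existing dataframe. A self-contained sketch that builds it from the values in the output and also computes several aggregates at once with `agg` (the name `summary` is illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "name": ["Vinay", "Kushal", "Aman", "Rahul", "Ramesh"],
    "occ": ["engineer", "doctor", "engineer", "doctor", "doctor"],
    "salary": [60000, 70000, 50000, 60000, 65000],
})

# Average salary per occupation (mean() yields floats)
occ_average_salary = dataframe.groupby("occ")["salary"].mean()

# Several aggregates per occupation in one call
summary = dataframe.groupby("occ")["salary"].agg(["mean", "min", "max"])

print(occ_average_salary)
print()
print(summary)
```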

### EXERCISE 16 - Filling NaN Values in a DataFrame

Given a dataframe with NaN Values, fill the NaN values with 0

Sample Solution

```print(dataframe) # original dataframe

print()

dataframe_nullfill = dataframe.fillna(0)

print(dataframe_nullfill) # dataframe after filling NaN values with 0
```

Corresponding Output

```     name       occ   salary
0   Vinay  engineer      NaN
1  Kushal    doctor  70000.0
2    Aman  engineer      NaN
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0

     name       occ   salary
0   Vinay  engineer      0.0
1  Kushal    doctor  70000.0
2    Aman  engineer      0.0
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0
```
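Filling with 0 is not always appropriate. `fillna` also accepts a dictionary that maps column names to fill values, so a column can be filled with, for example, its own mean. A sketch under that assumption, rebuilding the dataframe from the output above (the names `filled_zero` and `filled_mean` are illustrative):

```python
import pandas as pd

nan = float("nan")

dataframe = pd.DataFrame({
    "name": ["Vinay", "Kushal", "Aman", "Rahul", "Ramesh"],
    "occ": ["engineer", "doctor", "engineer", "doctor", "doctor"],
    "salary": [nan, 70000, nan, 60000, 65000],
})

# Fill every NaN in the dataframe with 0
filled_zero = dataframe.fillna(0)

# Fill NaN in 'salary' with the mean of the known salaries
filled_mean = dataframe.fillna({"salary": dataframe["salary"].mean()})

print(filled_mean)
```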

### EXERCISE 17 - Applying Functions (UDFs) on DataFrame

Given is a dataframe showing Company Names (cname) and corresponding Profits (profit). Convert the values of Profit column such that values in it greater than 0 are set to True and the rest are set to False.

Sample Solution

```print(company_data) # original dataframe

print()

company_data['profit'] = company_data['profit'].apply(lambda x:x>0)

print(company_data) # required dataframe
```

Corresponding Output

```                cname  profit
0         Shyam & Co.  -10000
1      Ramlal & Bros.   10000
2  Sharma Enterprises   -5000
3    Verma Furnitures   15000
4        Rahul Stores   20000

                cname  profit
0         Shyam & Co.   False
1      Ramlal & Bros.    True
2  Sharma Enterprises   False
3    Verma Furnitures    True
4        Rahul Stores    True
```
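The solution overwrites the `profit` column, so the original amounts are lost. One alternative is to pass a named function to `apply` and store the result in a new column. In the sketch below, the helper `is_profitable` and the column of the same name are hypothetical and chosen only for illustration.

```python
import pandas as pd

company_data = pd.DataFrame({
    "cname": ["Shyam & Co.", "Ramlal & Bros.", "Sharma Enterprises",
              "Verma Furnitures", "Rahul Stores"],
    "profit": [-10000, 10000, -5000, 15000, 20000],
})


def is_profitable(value):
    """Hypothetical helper: True when the profit is strictly positive."""
    return value > 0


# Keep 'profit' intact and store the flags in a separate column
company_data["is_profitable"] = company_data["profit"].apply(is_profitable)

print(company_data)
```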

### EXERCISE 18 - Joining 2 DataFrames by a Common Column (key)

Given are 2 dataframes: one containing Employee ID (eid), Employee Name (ename) and Stipend (stipend), and the other containing Employee ID (eid) and the designation of the employee (position). Output the dataframe containing Employee ID (eid), Employee Name (ename), Stipend (stipend) and Position (position).

Sample Solution

```print(emp_data) # 1st DataFrame containing employee id (eid), employee name (ename) and stipend

print()

print(company_data) # 2nd DataFrame containing employee id (eid) and designation of the employee (position)

print()

dataframe = pd.merge(emp_data, company_data, how = 'inner', on = 'eid') # required dataframe

print(dataframe)
```

Corresponding Output

```   eid   ename  stipend
0    1     Sid    10000
1    2  Ramesh    10000
2    3     Ron     5000
3    4   Harry    15000

   eid         position
0    1         employee
1    2         employee
2    3           intern
3    4  senior_employee

   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee
```
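A self-contained sketch of this exercise, with both dataframes rebuilt from the output above. To show what `how` controls, it also merges against a hypothetical `partial` table that lacks the row for `eid` 4: an inner join drops that employee, while a left join keeps the employee with a missing position.

```python
import pandas as pd

emp_data = pd.DataFrame({
    "eid": [1, 2, 3, 4],
    "ename": ["Sid", "Ramesh", "Ron", "Harry"],
    "stipend": [10000, 10000, 5000, 15000],
})
company_data = pd.DataFrame({
    "eid": [1, 2, 3, 4],
    "position": ["employee", "employee", "intern", "senior_employee"],
})

inner = pd.merge(emp_data, company_data, how="inner", on="eid")

# Hypothetical variation: the second table has no entry for eid 4
partial = company_data.iloc[:3]
inner_partial = pd.merge(emp_data, partial, how="inner", on="eid")
left_partial = pd.merge(emp_data, partial, how="left", on="eid")

print(len(inner_partial))  # 3
print(len(left_partial))   # 4
print(left_partial)
```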

### EXERCISE 19 - Getting the Non-Null Count and Data Type for Every Column

Given a dataframe, output the non-null count and data-type for every column

Sample Solution

```print(dataframe) # the dataframe

print()

dataframe.info() # info() prints the summary itself and returns None
```

Corresponding Output

```   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   eid       4 non-null      int64
 1   ename     4 non-null      object
 2   stipend   4 non-null      int64
 3   position  4 non-null      object
dtypes: int64(2), object(2)
memory usage: 160.0+ bytes
```
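`info()` only prints its report. When the counts or types are needed as values for further processing, `count()` and the `dtypes` attribute return them as Series. A minimal sketch, with the dataframe rebuilt from the earlier merge output (the names `non_null_counts` and `column_types` are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "eid": [1, 2, 3, 4],
    "ename": ["Sid", "Ramesh", "Ron", "Harry"],
    "stipend": [10000, 10000, 5000, 15000],
    "position": ["employee", "employee", "intern", "senior_employee"],
})

non_null_counts = dataframe.count()  # non-null values per column
column_types = dataframe.dtypes      # data type per column

print(non_null_counts)
print()
print(column_types)
```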

### EXERCISE 20 - Getting the Statistical Summary of all the Numerical Features of a DataFrame

Given a dataframe, generate the statistical summary of all the numerical features present in it

Sample Solution

```print(dataframe) # the dataframe

print()

print(dataframe.describe())
```

Corresponding Output

```   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee

            eid       stipend
count  4.000000      4.000000
mean   2.500000  10000.000000
std    1.290994   4082.482905
min    1.000000   5000.000000
25%    1.750000   8750.000000
50%    2.500000  10000.000000
75%    3.250000  11250.000000
max    4.000000  15000.000000
```
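The result of `describe()` is itself a dataframe, so individual statistics can be read from it with `loc`. By default only numeric columns are summarized; passing `include="all"` adds the remaining columns as well. A short sketch using the same data (the names `summary` and `full_summary` are illustrative):

```python
import pandas as pd

dataframe = pd.DataFrame({
    "eid": [1, 2, 3, 4],
    "ename": ["Sid", "Ramesh", "Ron", "Harry"],
    "stipend": [10000, 10000, 5000, 15000],
    "position": ["employee", "employee", "intern", "senior_employee"],
})

summary = dataframe.describe()                   # numeric columns only
mean_stipend = summary.loc["mean", "stipend"]    # a single statistic

full_summary = dataframe.describe(include="all") # every column

print(mean_stipend)  # 10000.0
print(full_summary)
```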

## Conclusion

The above are the building blocks of Pandas that every beginner data analyst or data scientist should be comfortable with. If you get stuck on any of these pandas exercises or need further clarification on a data science or Python concept, FavTutor experts are available 24/7 to help.
