What’s New ?

The Top 10 favtutor Features You Might Have Overlooked

Read More

Using the PARTITION BY in SQL (with Examples)

  • Jan 15, 2024
  • 7 Minutes Read
  • Why Trust Us
    We uphold a strict editorial policy that emphasizes factual accuracy, relevance, and impartiality. Our content is crafted by top technical writers with deep knowledge in the fields of computer science and data science, ensuring each piece is meticulously reviewed by a team of seasoned editors to guarantee compliance with the highest standards in educational content creation and publishing.
  • By Tanish Mallik
Using the PARTITION BY in SQL (with Examples)

When working with large datasets, it's not always practical to consider the entire dataset as a single unit. This is where PARTITION BY helps us by allowing us to divide our data into more manageable and meaningful segments for in-depth examination. In this article, we will learn PARTITION BY in SQL along with examples.

What is Partition By in SQL?

SQL PARTITION BY is like a special tool in the database management system. Imagine you have a big list of data, and you want to look at specific groups within that list. That's where SQL PARTITION BY comes in.

It is like having a magnifying glass for your data. Instead of looking at the whole list at once, you can use PARTITION BY to zoom in on smaller parts. Each section is called a "partition," and you can do specific calculations or checks within these partitions.

So, instead of dealing with the entire dataset in one go, you break it down into smaller, more manageable pieces using the PARTITION BY clause. This way, you can focus on specific chunks of data, making it easier to analyze and work with. 

PARTITION BY in SQL

1) SQL PARTITION BY Multiple Columns

In SQL, using PARTITION BY with multiple columns is like creating organized groups within your data. Imagine you have a big list of transactions, and you want to break it down into smaller sections based on different aspects, such as both the product and the customer involved. In simpler terms, it's like sorting your transactions into folders by product and customer.

Syntax:

SELECT
column1,
column2,
...,
columnN,
aggregateFunction(columnX) OVER (PARTITION BY partitionColumn1, partitionColumn2, ..., partitionColumnM ORDER BY orderColumn) AS aliasName
FROM
tableName;

 

This example below calculates the running total of the 'Amount' column for each combination of 'ProductID' and 'Category' based on the 'SaleDate'.

Let's take the example of 'sales' table:

sales table

SQL Query:

SELECT
ProductID,
Category,
SaleDate,
Amount,
SUM(Amount) OVER (PARTITION BY ProductID, Category ORDER BY SaleDate) AS RunningTotal
FROM
Sales;

 

Output:

SQL PARTITION BY Multiple Columns output

2) SQL PARTITION BY Date

In SQL, using PARTITION BY Date is like having a special tool to look at data day by day. Imagine you have a list of events or sales that happened on different dates. With PARTITION BY Date, you can organize and study your data one day at a time.

Syntax:

SELECT
column1,
column2,
...,
columnN,
AGGREGATE_FUNCTION(columnX) OVER (PARTITION BY partitionColumn ORDER BY orderColumn) AS AliasName
FROM
TableName;


This example calculates the daily total of the 'Amount' column, partitioned by the 'SaleDate' and ordered by 'ProductID'.

We will take the same example of the 'sales' table.

SQL Query:

SELECT
ProductID,
Category,
SaleDate,
Amount,
SUM(Amount) OVER (PARTITION BY SaleDate ORDER BY ProductID) AS DailyTotal
FROM
Sales;


Output: 

SQL PARTITION BY Date output

3) SQL PARTITION BY ORDER BY

In SQL, using PARTITION BY ORDER BY is like creating a neat view of papers on your desk. Imagine you have a pile of reports, and you want to organize them into different categories. But wait, you also want each category to be sorted, perhaps by the date or importance.

Syntax:

SELECT
column1,
column2,
...,
columnN,
ROW_NUMBER() OVER (PARTITION BY partitionColumn1, partitionColumn2 ORDER BY orderColumn) AS AliasName
FROM
TableName;

 

This example assigns a row number to each row within each combination of 'ProductID' and 'Category' based on the 'SaleDate'.

We will take the same example of the 'sales' table.

SQL Query:

SELECT
ProductID,
Category,
SaleDate,
Amount,
ROW_NUMBER() OVER (PARTITION BY ProductID, Category ORDER BY SaleDate) AS RowNum
FROM
Sales;

 

Output:

SQL PARTITION BY ORDER BY output

SQL PARTITION BY Row Number

PARTITION BY Row Number is like assigning unique numbers to each row within specific groups of your data. Imagine you have a large dataset, and you want to label each row in a way that makes sense for each subgroup.

Syntax:

SELECT
column1,
column2,
...,
columnN,
ROW_NUMBER() OVER (ORDER BY orderColumn) AS AliasName
FROM
TableName;

 

This example assigns a row number to each row based on the overall order of 'SaleDate'.

We will take the same example of the 'sales' table.

SQL Query:

SELECT
ProductID,
Category,
SaleDate,
Amount,
ROW_NUMBER() OVER (ORDER BY SaleDate) AS OverallRowNum
FROM
Sales;


Output:

SQL PARTITION BY Row Number output

Partition By vs Group By in SQL

GROUP BY is like putting similar things together and then summarizing them. Imagine you have a list of fruits, and you want to know the total quantity for each type. You'd use GROUP BY to group all apples, all oranges, etc., and then find the sum for each group.

On the other hand, PARTITION BY is like dividing your list into sections and then doing something special within each section. If you have a list of daily expenses and you want to know the running total for each day separately, you'd use PARTITION BY. It helps you focus on one section at a time, like each day, and calculate things just for that part.

We will use this table to illustrate the difference between PARTITION BY and GROUP BY in SQL.

Let's take an example of the 'Transactions' table.

transactions table

Example of PARTITION BY:

Suppose you want to calculate the running total of Amount for each Category based on the TransactionID.

SQL Query:

SELECT
TransactionID,
Category,
Amount,
SUM(Amount) OVER (PARTITION BY Category ORDER BY TransactionID) AS RunningTotal
FROM
Transactions;

 

Output:

Example of PARTITION BY difference output

In this example, the PARTITION BY clause is used to create partitions for each unique Category. The SUM function then calculates the running total of Amount within each partition, ordered by TransactionID.

Example of GROUP BY:

Now, let's say you want to find the total amount for each Category.

SQL Query:

SELECT
Category,
SUM(Amount) AS TotalAmount
FROM
Transactions
GROUP BY
Category;

 

Output:

Example of GROUP BY difference output

In this query, the GROUP BY clause is used to group the data by Category, and the SUM function calculates the total amount for each category.

Conclusion

In the world of SQL, the PARTITION BY clause stands out as a wonderful tool that adds a layer of precision to data analysis. It allows you to break down your data into smaller groups or sections, and then perform calculations or analyses within each of these subsets. It's handy for tasks like calculating running totals, and averages or assigning row numbers within specific categories.

FavTutor - 24x7 Live Coding Help from Expert Tutors!

About The Author
Tanish Mallik
I am Tanish, a final-year computer science student with extensive expertise in SQL. My background in the field of data science has honed my strong analytical skills, and I also have hands-on experience in developing data integration and optimization pipelines using Python and MSBI. I am eager to contribute innovative ideas and make an impact in the ever-evolving world of technology.