Sorting data is an essential task in data analysis, and Pandas provides a powerful method called sort_values() to sort DataFrames based on one or more columns. In this article, we will learn how to use sort_value method to Sort by Column a Pandas DataFrame and with the different parameters that can be used to customize the sorting behavior.
What is the sort_values() Function in Pandas?
The sort_values() function in Pandas allows us to sort DataFrames on the basis of one or more columns. It takes several parameters that can define the sorting behavior, such as: the columns to sort by, the sorting order (ascending or descending), and handling null values.
Let us now explore the various ways to use the sort_values function to sort a DataFrame based on single and multiple columns.
Sorting in Ascending Order
Sorting data in a DataFrame in ascending order is the default behavior of the sort_values() function. This means that when no sorting order is defined, the sort_values will sort the DataFrame in Ascending Order.
Sorting by a Single Column
To sort a DataFrame by a single column, we can simply pass the column name to the by parameter of the sort_values() function.
For example, let’s sort a DataFrame called df by the ‘Age’’ column with implementation in Python:
import pandas as pd data = { "Name": ["John", "Emma", "Michael"], "Age": [45, 30, 35], "City": ["New York", "London", "Paris"] } df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', df) # Sort the DataFrame by the 'Age' column df = df.sort_values(by='Age') # Display the sorted DataFrame print('Sorted DataFrame:\n', df)
Output:
Original DataFrame:
Name Age City
0 John 45 New York
1 Emma 30 London
2 Michael 35 Paris
Sorted DataFrame:
Name Age City
1 Emma 30 London
2 Michael 35 Paris
0 John 45 New York
Sorting by Multiple Columns
Sorting by multiple columns allows us to define a hierarchical sorting order. We can pass a list of column names to the by parameter to sort the DataFrame based on multiple columns. The sorting is performed sequentially, with the first column taking precedence over the second, and so on. This simply means, in the case of clashes in the first column, the second column will be used as a second preference.
Let us consider an example, let’s sort the DataFrame df by the ‘Age’ and ‘Name’ columns:
import pandas as pd data = { "Name": ["John", "Zmma", "Michael"], "Age": [45, 30, 35], "City": ["New York", "London", "Paris"] } df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', df) # Sort the DataFrame by multiple columns ('Age' and 'Name') df = df.sort_values(by=['Age', 'Name']) # Display the sorted DataFrame print('Sorted DataFrame:\n', df)
Output:
Original DataFrame:
Name Age City
0 John 45 New York
1 Zmma 30 London
2 Michael 35 Paris
Sorted DataFrame:
Name Age City
1 Zmma 30 London
2 Michael 35 Paris
0 John 45 New York
Sorting with Null Values
By default, the sort_values() function places null values at the end of the sorted DataFrame. However, we can change this behavior by setting the na_position parameter to ‘first’ to place null values at the beginning.
For example, let us try to sort the DataFrame df by the ‘Age’’ column with null values first:
import pandas as pd data = { "Name": ["John", "Emma", "Michael"], "Age": [45, 30, None], "City": ["New York", "London", "Paris"] } df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', df) # Sort the DataFrame by column ('Age') with null values placed first df = df.sort_values(by='Age', na_position='first') # Display the sorted DataFrame print('Sorted DataFrame:\n', df)
Output:
Original DataFrame:
Name Age City
0 John 45.0 New York
1 Emma 30.0 London
2 Michael NaN Paris
Sorted DataFrame:
Name Age City
2 Michael NaN Paris
1 Emma 30.0 London
0 John 45.0 New York
Sorting in Descending Order
We can also sort the DataFrame in similar ways in Descending Order. To do this we simply need to set the ascending parameter to False. This will reverse the sorting order of the specified column(s).
Let us try a simple example:
import pandas as pd data = { "Name": ["John", "Emma", "Michael"], "Age": [45, 30, 55], "City": ["New York", "London", "Paris"] } df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', df) # Sort the DataFrame by column ('Age') in descending order df = df.sort_values(by='Age', ascending=False) # Display the sorted DataFrame print('Sorted DataFrame:\n', df)
Output:
Original DataFrame:
Name Age City
0 John 45 New York
1 Emma 30 London
2 Michael 55 Paris
Sorted DataFrame:
Name Age City
2 Michael 55 Paris
0 John 45 New York
1 Emma 30 London
Sorting with Custom Sorting Algorithm
We can also use custom sorting algorithms to sort a DataFrame. The sort_values() function in Pandas provides different sorting algorithms to choose from. By default, the sorting algorithm used for the sort_values function is ‘quicksort’. We can change this to any algorithm according to our needs.
For example, let us try to sort a DataFrame using the ‘merge sort’:
import pandas as pd data = { "Name": ["John", "Emma", "Michael"], "Age": [45, 30, 55], "City": ["New York", "London", "Paris"] } df = pd.DataFrame(data) # Display the original DataFrame print('Original DataFrame:\n', df) # Sort the DataFrame by column ('Age') using mergesort df = df.sort_values(by='Age', kind='mergesort') # Display the sorted DataFrame print('Sorted DataFrame:\n', df)
Output:
Original DataFrame:
Name Age City
0 John 45 New York
1 Emma 30 London
2 Michael 55 Paris
Sorted DataFrame:
Name Age City
1 Emma 30 London
0 John 45 New York
2 Michael 55 Paris
You should also now learn how to rename columns in Pandas.
Conclusion
In this article, we have explored the different ways and parameters used to sort the DataFrames on the basis of single or multiple columns. By mastering the techniques discussed in this article, we can manipulate and analyze data more effectively using Pandas.