Reordering Columns in R (With Examples)

  • Feb 01, 2024
  • 7 Minutes Read

One of the most fundamental tasks in data analysis is reshaping datasets to meet certain requirements. Reordering columns is a common operation, whether to improve readability or to prepare the data for analysis. In this article, we'll look at how to reorder columns in R by name, how to reorder rows, and how missing values affect the result.

What are Data Frames?

Before we start to learn how to reorder columns and rows, let us first learn about the fundamental data structure that lies at the core of it in R programming - a data frame. A data frame is a two-dimensional, tabular data structure, akin to a spreadsheet, where data is organized in rows and columns. Most datasets we encounter in R will be in the form of a data frame.

Let's create a simple data frame for illustration purposes.

Code:

data <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

print(data)

 

This code will generate the following data frame.

Output:

     Name Age Score
1  Ashley  25    95
2   Bobby  30    80
3 Charles  22    75

 

Reordering Columns in R by Name

There are several ways to reorder columns in R using their name. Let us take a look at a few of them.

1. Using Select Function from dplyr Package

The select function from the dplyr package keeps the columns we name, in the order we list them. Listing every column in the desired order therefore reorders the data frame. The %>% pipe operator, which dplyr makes available, passes the data frame on its left to the function on its right.

Code:

library(dplyr)

data <- data %>%
  select(Age, Name, Score)

print(data)

 

This will result in the following reordered data frame.

 

Output:

  Age    Name Score
1  25  Ashley    95
2  30   Bobby    80
3  22 Charles    75
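
When a data frame has many columns, listing every name inside select gets tedious. The following is a minimal sketch of two shortcuts, assuming dplyr 1.0.0 or later (needed for relocate). The data frame name df is only used for this illustration and holds a fresh copy of the example data.

```r
library(dplyr)

# Fresh copy of the example data (the name `df` is only for this sketch)
df <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

# Put Score first and keep the remaining columns in their current order
score_first <- df %>% select(Score, everything())

# relocate() moves a column without listing the others;
# .after = Score places Name directly after Score
name_last <- df %>% relocate(Name, .after = Score)

names(score_first)  # "Score" "Name" "Age"
names(name_last)    # "Age" "Score" "Name"
```

Both calls leave the rows untouched and only change the column order.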

 

2. Using Core R Function

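Apart from external packages, you can reorder columns with base R. Indexing a data frame with a vector of column positions returns the columns in that order, and order(names(data)) returns the positions that sort the column names alphabetically.

Code:

data <- data[, order(names(data))]

 

Because the alphabetical order of the column names is Age, Name, Score, this code produces the same output as shown above.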
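
Alphabetical order is rarely the order we actually want. As a minimal sketch, base R indexing also accepts an explicit vector of column names or positions. The data frame name df is only used for this illustration.

```r
# Fresh copy of the example data (the name `df` is only for this sketch)
df <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

# Reorder by an explicit vector of column names
by_name <- df[, c("Score", "Name", "Age")]

# Reorder by position (3rd, 1st, 2nd column)
by_index <- df[, c(3, 1, 2)]

# Move one column to the front without typing the others
age_first <- df[, c("Age", setdiff(names(df), "Age"))]

names(by_name)    # "Score" "Name" "Age"
names(by_index)   # "Score" "Name" "Age"
names(age_first)  # "Age" "Name" "Score"
```

Indexing by name is usually the safer choice, since it keeps working if columns are later added or moved.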

Reordering Rows in R

R also provides several ways to reorder rows. Here we will take a look at a few of them.

1. Using the arrange Function from dplyr

Just as we used select for column reordering, the arrange function from the dplyr package is used for ordering rows based on one or more columns. This feature is useful when we want to reorder our data frame based on specific criteria.

Code:

data <- data %>%
  arrange(Age)

print(data)

 

Output:

  Age    Name Score
1  22 Charles    75
2  25  Ashley    95
3  30   Bobby    80

 

To sort in descending order, we can use the desc function.

Code:

data <- data %>%
  arrange(desc(Age))

print(data)

 

Output:

  Age    Name Score
1  30   Bobby    80
2  25  Ashley    95
3  22 Charles    75

 

By doing this, we get a data frame sorted by the 'Age' column in descending order.
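
The arrange function also accepts several columns, using the later ones to break ties in the earlier ones. The following is a minimal sketch. The data frame scores and the extra row for "Dana" are hypothetical, added only so that two people share the same age.

```r
library(dplyr)

# Hypothetical data: "Dana" is added so that two rows share Age 25
scores <- data.frame(
  Name = c("Ashley", "Bobby", "Charles", "Dana"),
  Age = c(25, 30, 22, 25),
  Score = c(95, 80, 75, 88)
)

# Sort by Age ascending; within the same Age, sort by Score descending
sorted <- scores %>% arrange(Age, desc(Score))

sorted$Name  # "Charles" "Ashley" "Dana" "Bobby"
```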

2. Using Core R Function

The order function returns the row indices that sort the given column (or columns) in ascending order. Using these indices in the row position of the square brackets reorders the rows with base R alone.

Code:

data <- data[order(data$Age), ]

print(data)

 

Output:

  Age    Name Score
3  22 Charles    75
2  25  Ashley    95
1  30   Bobby    80

 

This gives the same row order as arrange(Age). Unlike arrange, base R indexing keeps each row's original row name, which is why the labels on the left now read 3, 2, 1.
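
Base R can sort in descending order too. As a minimal sketch, order takes a decreasing argument, and negating a numeric key has the same effect. The data frame name df is only used for this illustration.

```r
# Fresh copy of the example data (the name `df` is only for this sketch)
df <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

# Descending order via the decreasing argument
desc_age <- df[order(df$Age, decreasing = TRUE), ]

# Negating a numeric key gives the same row order
desc_age2 <- df[order(-df$Age), ]

# Base R indexing keeps the original row names; reset them if they are not needed
rownames(desc_age) <- NULL

desc_age$Name        # "Bobby" "Ashley" "Charles"
rownames(desc_age2)  # "2" "1" "3"
rownames(desc_age)   # "1" "2" "3"
```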

Reordering Columns and Rows Simultaneously

In certain situations, we may need to reorder both columns and rows. This can be accomplished by combining the select and arrange functions from the dplyr package.

Code:

data <- data %>%
  select(Score, Age, Name) %>%
  arrange(Age)

print(data)

 

Output:

  Score Age    Name
1    75  22 Charles
2    95  25  Ashley
3    80  30   Bobby

 

In this example, we reordered both the columns and rows simultaneously.
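
The same result can be reached without dplyr. As a minimal sketch, a single base R indexing call can take the row order and the column order at once. The data frame name df is only used for this illustration.

```r
# Fresh copy of the example data (the name `df` is only for this sketch)
df <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

# Rows sorted by Age (first index), columns in a chosen order (second index)
result <- df[order(df$Age), c("Score", "Age", "Name")]

names(result)  # "Score" "Age" "Name"
result$Name    # "Charles" "Ashley" "Bobby"
```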

Handling Missing Values

When working with real-world datasets, we often come across missing values. It's critical to understand how reordering procedures handle NA (Not Available) values.

na.last Parameter

The order function's na.last parameter controls where missing values are placed. With na.last = TRUE (the default), missing values are placed at the end. With na.last = FALSE, they are placed at the start, and with na.last = NA, they are removed.

Code:

data_missing <- data.frame(
  Name = c("Ashley", "Bobby", "Charles", NA),
  Age = c(25, 30, 22, NA),
  Score = c(95, 80, 75, NA)
)

data_missing <- data_missing[order(data_missing$Age, na.last = TRUE), ]

print(data_missing)

 

Output:

     Name Age Score
3 Charles  22    75
1  Ashley  25    95
2   Bobby  30    80
4    <NA>  NA    NA

 

Here, the row with missing values is placed at the end due to the na.last = TRUE setting.
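
To see the effect of na.last in isolation, here is a minimal sketch that calls order directly on the Age values from the example. The variable names are only used for this illustration.

```r
# Age values from the example, including one missing value
ages <- c(25, 30, 22, NA)

na_at_end   <- order(ages, na.last = TRUE)   # 3 1 2 4
na_at_start <- order(ages, na.last = FALSE)  # 4 3 1 2
na_dropped  <- order(ages, na.last = NA)     # 3 1 2
```

Each result is a vector of row positions, so it can be used directly inside the square brackets, as in data_missing[na_dropped, ] to sort and drop the incomplete row in one step.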

 

Best Practices and Tips

Let us now look at a few best practices for applying these techniques.

1. Use Meaningful Column Names

When reordering columns, make sure the column names are descriptive. This not only improves the readability of your code but also helps others grasp the structure of the data.

2. Document Your Code

As with any programming work, documenting our code is good practice. Include comments explaining why columns or rows are being reordered, as well as any sorting criteria used.

3. Understand the Impact on Analysis

Before rearranging columns or rows, we need to consider the implications for later analysis. Ensure that the new order follows the logical flow of your analysis and does not introduce unexpected bugs, for example in code that refers to columns by position.
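
One way to guard against such surprises is to check that a reordering step changed only the order and not the content. The following is a minimal sketch of such a check, not a complete validation routine. The names df, reordered, and restored are only used for this illustration.

```r
# Fresh copy of the example data (the name `df` is only for this sketch)
df <- data.frame(
  Name = c("Ashley", "Bobby", "Charles"),
  Age = c(25, 30, 22),
  Score = c(95, 80, 75)
)

# Reorder rows by Age and columns by an explicit name vector
reordered <- df[order(df$Age), c("Score", "Age", "Name")]

# The set of columns and the number of rows should be unchanged
same_columns <- setequal(names(reordered), names(df))
same_rows <- nrow(reordered) == nrow(df)

# Undo the reordering using the preserved row names and original column order,
# then compare the values with the original data
restored <- reordered[order(as.integer(rownames(reordered))), names(df)]
same_values <- isTRUE(all.equal(restored, df, check.attributes = FALSE))

stopifnot(same_columns, same_rows, same_values)
```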

Conclusion

In conclusion, learning the methods for rearranging columns and rows in R is an essential skill for data analysts and statisticians. Whether using the 'dplyr' package or base R functions, the ability to rearrange data frames makes the data analysis process more organized and informative. In this article, we looked at how to reorder columns by name, rows by specific criteria, and columns and rows at the same time. We also looked at how missing values are handled and at best practices to improve code readability.


About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.