What’s New ?

The Top 10 favtutor Features You Might Have Overlooked

Read More

Remove Column in a DataFrame in R (with code)

  • Jan 29, 2024
  • 9 Minutes Read
Remove Column in a DataFrame in R (with code)

Refining the available raw data to make it easier to obtain valuable insights from it is called data manipulation. Removing columns from a dataset is also one such technique. It is necessary in situations where some parts of the data are redundant, not needed, or may cause confusion. Let’s look at how to remove columns in R.

Identifying Columns to Remove in R

It is important to ensure that the columns you remove don’t cause a problem for your analysis process in the future. So how do you identify the right columns to remove? Here are the steps to do before removing a column:

  1. Assess the dataset: Explore the dataset using functions like head() to display the first few rows, summary() to get statistical summaries, and str() to view the structure of the dataset. These functions give you a snapshot of the data, revealing its dimensions and data types.
  2. Identify the unnecessary columns: While trying to identify unnecessary columns, look out for the following:
    • Redundant columns
    • Constant columns
    • Highly correlated columns
    • Columns irrelevant to the analysis task
  3. Consider the impact of column removal on analysis: After you’ve identified the unnecessary columns, analyze how removing them might affect the achievement of the analysis goal. Consider its impact on the integrity and reliability of the data. Confirm that what you are removing is noise and not essential information.

How to Remove Column in R?

Consider the following sample dataframe:

employee_data <- data.frame(
ID = c(101, 102, 103),
Name = c("Lana", "David", "Emma"),
Age = c(30, 28, 35),
Department = c("Marketing", "Finance", "HR"),
Salary = c(60000, 75000, 65000)
)

 

Now let’s consider the different methods to remove a column in R:

1) Remove Column by Index

An index is the numerical position of a column in the dataframe. Using indexes to remove columns is not a very common method as it is less readable and is prone to errors. Let’s see how we can do it:

# Removing the column at index 3 (Age)
employee_data <- employee_data[, -3]

 

Output:

Remove Column by Index

2) Using subset() with Negative Index

We can remove columns by using the concept of negative indexing along with the subset() function. This provides a convenient way to create subsets of the dataset. For Example:

# Removing the column at index 3 (Age)
employee_data <- subset(employee_data, select = -3) 

It gives the same output as above.

3) Drop Column by Name

Removing columns by column name is a better alternative because it is flexible, improves readability and maintainability, and is less prone to errors. We will use $ Operator. This is one of the most commonly used methods. It is simple, concise, and directly modifies the original dataframe.

Example:

# Removing the 'Department' column
employee_data$Department <- NULL

 

Output:

Drop Column by Name

4) Using subset() Function

This function is more commonly used with column names than indexes. For Example:

# Removing the 'Department' column
employee_data <- subset(employee_data, select = -Department) 

 

5) Using the detach() function

This helps you detach a column, making it independent of your dataframe. This is how we can do it:

# Removing the 'Department' column
detach(employee_data, columns = c("Department"))

 

6) Using the exclude argument in summary()

This helps you exclude specific columns from the summary. It can be done using:

# Removing the 'Department' column
summary(employee_data, exclude = c("Department"))

 

7) Using dplyr Package's select() Function

Several packages in R have functions that can be used to remove columns from the dataset. This dplyr package provides a concise and readable syntax for data manipulation tasks. We can code it using:

library(dplyr)
# Removing the 'Department' column
employee_data <- select(employee_data, -Department) 

 

8) Using remove_column() Function in tibble Package:

This is a package that enhances data frames, providing them with modern and tidy data structures. For example,

library(tibble)
# Removing the 'Department' column
employee_data <- remove_column(employee_data, Department)  

 

9) Using remove_columns() Function from janitor Package

This package simplifies data cleaning and tabulation tasks with easy-to-use functions. For example:

library(janitor)
# Removing the 'Department' column
employee_data <- remove_columns(employee_data, Department)   

 

10) Conditional Column Removal

Most of the above methods can also be used to remove columns that match specific conditions. Here is how you do that using the dplyr Package select() Function:

# Remove column if its name starts with a specific prefix
library(dplyr)
prefix_to_remove <- "De"
employee_data <- select(employee_data, -starts_with(prefix_to_remove))

 

Conclusion

In a nutshell, removing columns makes the data more readable and easier to process and analyze. We learned the different ways that a column can be removed. We also saw the problems of removing important columns, and how to avoid this. This whole process helps you obtain more accurate results while working with data.

FavTutor - 24x7 Live Coding Help from Expert Tutors!

About The Author
Aarthi Juryala
I'm Aarthi, a final-year student in Artificial Intelligence & Data Science. My education and experience have introduced me to machine learning and Generative AI, and I have developed skills in languages like R, Python, and SQL. I'm passionate about exploring how AI can positively influence various fields.