Remove Row in R | 7 Methods (With Examples)

  • Sep 19, 2023
  • By Abhisek Ganguly
Data cleaning is a crucial step in any data analysis project. One common task in data cleaning is removing rows with unwanted or irrelevant data. In R, there are several methods to remove rows from a dataset. In this article, we will explore various techniques and functions for removing rows in R.

Why Remove Rows?

Before moving on, it helps to understand why you might need to remove rows at all. Common scenarios include:

  1. Missing Data: Rows with missing values may need to be removed to ensure the integrity of your analysis.
  2. Outliers: Sometimes, outliers can skew your analysis results. Removing rows containing outliers can help produce more accurate insights.
  3. Duplicates: Duplicate rows can distort your analysis and lead to incorrect conclusions. Removing duplicates is essential for data accuracy.
  4. Irrelevant Data: In some cases, you may have data that is irrelevant to your analysis. Removing these rows can make your dataset more focused.

How to Remove Rows in R?

There are several ways to remove rows in R, each with its own pros and cons. Let us explore them one by one.

1. Using Subsetting

One of the most straightforward ways to remove rows in R is by subsetting the data. You can specify conditions to filter out rows that meet certain criteria. Here's an example:

data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, NA, 45, 28)
)

clean_data <- data[!is.na(data$Age), ]

 

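In this example, `!is.na(data$Age)` produces a logical vector that is `TRUE` wherever the "Age" column is not missing. Used as the row index, it keeps only those rows; leaving the column index after the comma empty keeps all columns.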
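Bracket subsetting has one `NA` pitfall worth knowing. The sketch below reuses the toy data with a hypothetical extra `Score` column (not part of the original example) to show what happens when a comparison meets a missing value. It also shows base R's `complete.cases()`, which flags rows that have no missing value in any column.

```r
data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, NA, 45, 28),
  Score = c(80, NA, 75, 90, 85)   # hypothetical second column with a gap
)

# A plain comparison yields NA for the missing Age, and `[` turns
# that NA into an extra row filled entirely with NA values
naive <- data[data$Age < 40, ]

# Guarding against NA keeps only genuine matches (IDs 1, 2 and 5)
young <- data[!is.na(data$Age) & data$Age < 40, ]

# complete.cases() is TRUE only for rows with no NA in any column
complete_rows <- data[complete.cases(data), ]
```

Here `naive` has four rows, one of them made up entirely of `NA`s, while `young` has the three rows you would expect.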

2. Using the subset() Function

The subset() function is another handy tool for removing rows in R. It allows you to specify conditions to filter rows based on column values.

data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, NA, 45, 28)
)

clean_data <- subset(data, !is.na(Age))

 

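Compared with bracket indexing, `subset()` lets you refer to columns by name without repeating `data$`, which makes the code shorter and easier to read. R's own documentation describes `subset()` as a convenience function intended for interactive use; inside your own functions, bracket indexing is the safer choice.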
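As a small sketch on the same toy data, the example below shows two things about `subset()`: rows whose condition evaluates to `NA` are dropped instead of being returned as all-`NA` rows, and several conditions can be combined with `&` (and) or `|` (or). The variable names `under_40` and `in_range` are illustrative.

```r
data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, NA, 45, 28)
)

# subset() treats a condition that evaluates to NA as FALSE,
# so the row with a missing Age is dropped automatically
under_40 <- subset(data, Age < 40)

# Combine conditions with & (both must hold) or | (either may hold)
in_range <- subset(data, Age >= 28 & Age <= 45)
```

`under_40` keeps IDs 1, 2 and 5, and `in_range` keeps IDs 2, 4 and 5.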

You can learn more about subsetting in R in our blog: Subset in R

3. Removing Duplicates

To remove duplicate rows from a dataset, you can use the `unique()` function or the `duplicated()` function in combination with subsetting. Here's an example using both:

data <- data.frame(
  ID = c(1, 2, 2, 3, 4),
  Age = c(25, 32, 32, 45, 28)
)

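# Option 1: unique() returns the data frame with duplicate rows removed
clean_data <- unique(data)

# Option 2: duplicated() flags every repeat of an earlier row as TRUE,
# so negating it keeps only the first occurrence of each row
duplicate_rows <- duplicated(data)
clean_data <- data[!duplicate_rows, ]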

 

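Both approaches keep the first occurrence of each repeated row; here the third row, an exact copy of the second, is removed. `unique()` is the shorter option, while `duplicated()` returns the logical flags themselves, which is handy if you want to inspect or count the duplicates before dropping them.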
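By default, `unique()` and `duplicated()` compare entire rows. If duplicates should be judged by a key column only, you can pass just that column to `duplicated()`. The sketch below assumes a modified version of the example data (the second row with ID 2 now has Age 33) so the two rows are no longer exact copies; the `fromLast = TRUE` argument flips which occurrence is kept.

```r
data <- data.frame(
  ID = c(1, 2, 2, 3, 4),
  Age = c(25, 32, 33, 45, 28)   # hypothetical: the ID 2 rows now differ in Age
)

# Rows 2 and 3 differ in Age, so whole-row comparison keeps all five rows
all_unique <- unique(data)

# Judge duplicates on the ID column only and keep the first row per ID
first_per_id <- data[!duplicated(data$ID), ]

# fromLast = TRUE flags earlier occurrences instead, keeping the last row per ID
last_per_id <- data[!duplicated(data$ID, fromLast = TRUE), ]
```

`first_per_id` keeps the row with Age 32, while `last_per_id` keeps the row with Age 33.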

4. Using the dplyr Package

The dplyr package is a popular choice for data manipulation in R. It provides a set of functions that make data cleaning tasks more intuitive and readable. To remove rows using dplyr, you can use the `filter()` function:

library(dplyr)

data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, NA, 45, 28)
)

clean_data <- data %>%
  filter(!is.na(Age))

 

Here, `filter()` from the dplyr package keeps only the rows for which the condition is `TRUE`, so the row with a missing Age is dropped.
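dplyr verbs chain naturally, so several cleaning steps can be written as one pipeline. The sketch below assumes a small hypothetical data set containing an exact duplicate row and a missing value. Comma-separated conditions inside `filter()` are combined with AND, and `distinct()` drops exact duplicate rows. Unlike bracket indexing, `filter()` also drops rows where a condition evaluates to `NA`, so the explicit `!is.na(Age)` mainly documents intent.

```r
library(dplyr)

# Hypothetical data: rows 2 and 3 are exact duplicates, row 4 has a missing Age
data <- data.frame(
  ID = c(1, 2, 2, 3, 4, 5),
  Age = c(25, 32, 32, NA, 45, 28)
)

clean_data <- data %>%
  filter(!is.na(Age), Age < 40) %>%   # both conditions must be TRUE
  distinct()                          # then drop exact duplicate rows
```

The result contains the rows with IDs 1, 2 and 5.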

5. Removing Rows by Row Index

Sometimes you may want to remove specific rows by position. You can do this with negative row indexing:

data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, 22, 45, 28)
)

clean_data <- data[-c(3, 5), ]

 

Here, the negative index `-c(3, 5)` tells R to return every row except those at positions 3 and 5.
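For trimming rows at the start or end, base R's `head()` and `tail()` accept a negative `n`: `head(data, -2)` drops the last two rows and `tail(data, -2)` drops the first two. The sketch also shows a pitfall when the negative index is computed: if the index vector turns out empty, `data[-idx, ]` returns zero rows rather than all of them, so a length check is worth adding. The variable names are illustrative.

```r
data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, 22, 45, 28)
)

without_last <- data[-nrow(data), ]   # drop the final row
without_first_two <- tail(data, -2)   # negative n drops rows from the start
without_last_two <- head(data, -2)    # negative n drops rows from the end

# Pitfall: which() returns integer(0) when nothing matches, and
# data[-integer(0), ] selects zero rows instead of every row
to_drop <- which(data$Age > 100)
guarded <- if (length(to_drop) > 0) data[-to_drop, ] else data
```

With the guard in place, `guarded` still has all five rows.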

6. Deleting Rows by Range

To delete rows by range in R, you can use numeric indices to specify the range of rows you want to remove. Here's an example:

data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, 22, 45, 28)
)

clean_data <- data[-c(2:4), ]

 

In this example, negative indexing with the range `2:4` removes rows 2 through 4 from `data`, and the result is stored in `clean_data`.
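When you write the range without `c()`, operator precedence matters. The sketch below explains the difference in a comment and then builds a range from variables; `from` and `to` are illustrative names, not part of the original example.

```r
data <- data.frame(
  ID = 1:5,
  Age = c(25, 32, 22, 45, 28)
)

# Unary minus binds more tightly than `:`, so -2:4 would mean (-2):4,
# i.e. -2, -1, 0, 1, 2, 3, 4, which mixes negative and positive indices.
# Parentheses make the intent explicit: drop rows 2 through 4.
clean_data <- data[-(2:4), ]

# The bounds of the range can also come from variables (hypothetical values)
from <- 2
to <- 3
dropped <- data[-seq(from, to), ]
```

`clean_data` keeps IDs 1 and 5, and `dropped` keeps IDs 1, 4 and 5.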

7. Deleting Rows by Name

To delete rows by name in R, you can match against the data frame's row names. Here's an example:

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, 32, 22, 45, 28)
)

rownames(data) <- c("Row1", "Row2", "Row3", "Row4", "Row5")

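clean_data <- data[rownames(data) != "Row3", ]

In this example, we first set row names for the data frame using `rownames()`. Then `rownames(data) != "Row3"` produces a logical vector that is `FALSE` only for the row named "Row3", so that row is removed and the remaining rows are stored in `clean_data`. A common alternative, `data[-which(rownames(data) == "Row3"), ]`, works when the name exists, but if no row matches, `which()` returns an empty vector and the expression returns zero rows instead of the full data frame.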
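To remove several rows by name at once, `%in%` is a convenient option. This sketch sets the row names directly through the `row.names` argument of `data.frame()` and uses a hypothetical vector of names to drop, including one that does not exist.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, 32, 22, 45, 28),
  row.names = c("Row1", "Row2", "Row3", "Row4", "Row5")
)

# %in% tests each row name against the whole vector; names that do not
# exist (such as "Row9") are simply ignored instead of causing an error
to_remove <- c("Row2", "Row4", "Row9")
clean_data <- data[!(rownames(data) %in% to_remove), ]
```

The result keeps the rows for Alice, Charlie and Eve.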

Conclusion

Removing rows is a fundamental part of data cleaning in R. Depending on your specific data and analysis requirements, you can choose the method that best suits your needs. Whether you prefer basic subsetting, the simplicity of the subset() function, the power of dplyr, or other techniques, R provides you with the flexibility to clean and prepare your data effectively. 

About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.