What’s New ?

The Top 10 favtutor Features You Might Have Overlooked

Read More

Calculate Cumulative Average in R (with Examples)

  • Feb 03, 2024
  • 7 Minutes Read
  • Why Trust Us
    We uphold a strict editorial policy that emphasizes factual accuracy, relevance, and impartiality. Our content is crafted by top technical writers with deep knowledge in the fields of computer science and data science, ensuring each piece is meticulously reviewed by a team of seasoned editors to guarantee compliance with the highest standards in educational content creation and publishing.
  • By Aarthi Juryala
Calculate Cumulative Average in R (with Examples)

Finding any kind of averages of the data helps us see trends and patterns in everyday data. Whether you want to keep your savings in check or track how hot it has been this summer, averages smooth out the bumpy details and give us a clear idea about the data. Cumulative Averages also simplify complex data. In this article, we will learn how to find the Cumulative Average in R.

How to Find Cumulative Average in R?

Cumulative average is the average of all values of a dataset up to a given value. It is useful for understanding patterns in data over time while also maintaining context.

Cumulative average, unlike simple averages, considers the entire dataset's evolution. It is particularly useful for datasets with a cumulative nature, like financial or sequential data. 

However, it is sensitive to extreme values and may hide short-term fluctuations. So, cumulative averages are a better alternative when a long-term perspective is more important than small, immediate fluctuations.

The cumulative average is calculated as:

Cumulative Average formula

In R, it is usually written as:

Cumulative Average = cumsum(data) / seq_along(data) 

Seq_along(data) is the sequence of numbers representing the positions in the data. It helps to calculate the average at each step.

The most basic way to calculate the cumulative average of a given data is by using a loop. Here is how you do it:

data <- c(5, 8, 18, 12, 24)

# Initialize an empty vector for cumulative averages
cumulative_average <- numeric(length(data))

# Calculate cumulative average using a loop
for (i in 1:length(data)) {
  cumulative_average[i] <- mean(data[1:i])
}

cumulative_average

 

Output:

5.00000  6.50000 10.33333 10.75000 13.40000


But the most efficient way to do this in R is by using the cumsum() function. It adds up the values one by one and stores the running totals in a new vector.

# Sample dataframe of Monthly Spending
spendings <- data.frame(
  Person = c("Mare", "Billy", "Evie", "Johnny"),
  Jan = c(30, 25, 15, 22),
  Feb = c(35, 20, 18, 30),
  Mar = c(28, 22, 20, 25)
)

# Calculating cumulative average for each month
spendings $CA_Jan <- cumsum(spendings$Jan) / seq_along(spendings$Jan)
spendings $CA_Feb <- cumsum(spendings$Feb) / seq_along(spendings$Feb)
spendings $CA_Mar <- cumsum(spendings$Mar) / seq_along(spendings$Mar)

# Print the result
print(spendings)

 

Output:

Cumulative Average R Output 1

Here, the term seq_along(data) can be replaced with (1:length(data)). This creates a sequence of numbers from 1 to the length of the vector.

Cumulative Average Using dplyr

The dplyr package has a function called cummean() which can be used to calculate the cumulative average. It is quick and straightforward. Using dplyr also has the bonus of pipelining. Let’s replicate the same result as before to get a better understanding.

Let's take an example:

# Load the dplyr package
library(dplyr)

# Calculate cumulative average for each person using cummean() from dplyr and round to 2 decimals
spendings <- spendings %>%
  mutate(Cum_Avg_Jan = round(cummean(Jan), 2),
         Cum_Avg_Feb = round(cummean(Feb), 2),
         Cum_Avg_Mar = round(cummean(Mar), 2))

# Print the result
print(spendings)

 

Output:

Cumulative Average R Output 2

However, this method does not provide the customization and control that the previous method does. So, use cummeans() when you want to calculate quick cumulative averages with minimal code.

The dplyr also helps to reorder columns in R.

Conclusion

In a nutshell, cumulative averages in R help us understand trends in the data without getting lost in the details. It helps to simplify complex information and provides a broader view of patterns over time. The cumsum() function coupled with seq_along() makes the process more efficient and user-friendly.  

FavTutor - 24x7 Live Coding Help from Expert Tutors!

About The Author
Aarthi Juryala
I'm Aarthi, a final-year student in Artificial Intelligence & Data Science. My education and experience have introduced me to machine learning and Generative AI, and I have developed skills in languages like R, Python, and SQL. I'm passionate about exploring how AI can positively influence various fields.