
melt() function in R (with Code Examples)

  • Dec 07, 2023
  • 6 Minutes Read

In R, the melt() function from the reshape2 package is a powerful tool for reshaping data, especially when working with complex datasets. It converts data frames into a long format, which is often better suited to analysis and visualization. In this article, we will explore the melt() function, its syntax, and its various applications.

Understanding the Basics

Before jumping into code and examples, let us first look at what melt() does and how it works.

What is the Melt Function in R?

The melt function is fundamentally used for reshaping data frames. Its primary role is to convert a wide-format data frame into a long-format one. This change is especially handy when the initial structure of the dataset poses difficulties for specific types of analysis or visualization.

In essence, the melt function "melts" or "unpivots" the data. In a wide-format data frame, the values of a single variable are often spread across several columns (for example, one column per year), which makes them harder to work with. The melt function stacks these columns into two: one holding the original column names and one holding their values. This simplifies the dataset and makes it more adaptable to various analytical tasks.
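As a minimal sketch of this idea, here is a hypothetical two-city temperature table (the City, Mon, and Tue columns are invented for illustration). The two day columns hold values of one variable, temperature, and melt() stacks them so that each row holds a single city-day reading:

```r
library(reshape2)

# Hypothetical wide table: one row per city, one column per day
temps <- data.frame(
  City = c("Oslo", "Rome"),
  Mon = c(3, 15),
  Tue = c(5, 17)
)

# City identifies each row; Mon and Tue are stacked into variable/value
long_temps <- melt(temps, id.vars = "City")
print(long_temps)

# 2 rows x 2 day columns = 4 rows in long format
print(nrow(long_temps))
```

The column names Mon and Tue end up as entries in the variable column, and the temperatures end up in the value column.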

Installing and Loading reshape2 Package

Before diving into practical examples, it's essential to ensure that the reshape2 package is installed and loaded. If you haven't installed it yet, you can do so using the following command:

install.packages("reshape2")

 

Once the package is installed, you can load it into your R environment with:

library(reshape2)
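For scripts that may run on machines without reshape2, one common pattern (a sketch, not something reshape2 itself requires) installs the package only when requireNamespace() reports it missing. Note that install.packages() may need a CRAN mirror to be set when R runs non-interactively.

```r
# Install reshape2 only if it is not already available
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}

# Attach the package so melt() can be called directly
library(reshape2)

# Report the installed version (useful when comparing behavior across machines)
print(packageVersion("reshape2"))
```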

 

With the reshape2 package in hand, let's look at the different aspects of the melt function.

Basic Syntax

The basic syntax of the melt function is straightforward. Here's the code:

melted_data <- melt(original_data, id.vars = c("ID_var1", "ID_var2"), measure.vars = c("measure_var1", "measure_var2"))

 

Following are its parameters in detail.

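original_data: The data frame you want to melt.
id.vars: The identifier variables that you want to keep as they are; their values are repeated for every melted variable.
measure.vars: The variables you want to melt. Their names are stacked into a single variable column and their values into a single value column. If you omit measure.vars, every column not listed in id.vars is melted.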
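According to the reshape2 documentation, leaving out measure.vars melts every non-identifier column, and setting na.rm = TRUE drops rows whose value is missing. The sketch below checks both behaviors on a small hypothetical data frame (scores, Math, and Art are invented names):

```r
library(reshape2)

# Hypothetical data with one missing value
scores <- data.frame(
  ID = c(1, 2),
  Math = c(90, NA),
  Art = c(75, 88)
)

# Only id.vars given: every other column becomes a measure variable
long_default <- melt(scores, id.vars = "ID")

# The same call with measure.vars spelled out explicitly
long_explicit <- melt(scores, id.vars = "ID", measure.vars = c("Math", "Art"))
print(identical(long_default, long_explicit))

# na.rm = TRUE removes the row whose value is NA
long_no_na <- melt(scores, id.vars = "ID", na.rm = TRUE)
print(nrow(long_default)) # 4
print(nrow(long_no_na))   # 3
```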

Application of R Melt

Let’s learn about the application of the melt() function in R using different examples.

Melt Function Example

Let's consider a practical example using a hypothetical dataset. Suppose we have a data frame, wide_data, that records each person's age and height for two years:

Code:

wide_data <- data.frame(
  ID = c(1, 2, 3),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)

print("Original Wide-format Data:")
print(wide_data)

 

Output:

  ID Age_2019 Age_2020 Height_2019 Height_2020
1  1       25       26         160         162
2  2       30       31         175         177
3  3       22       23         155         157

 

Now, let's use the melt function to convert this wide-format data frame into a long-format one:

Code:

melted_data <- melt(wide_data, id.vars = "ID", measure.vars = c("Age_2019", "Age_2020", "Height_2019", "Height_2020"))

print("Melted Long-format Data:")
print(melted_data)

 

Output:

   ID    variable value
1   1    Age_2019    25
2   2    Age_2019    30
3   3    Age_2019    22
4   1    Age_2020    26
5   2    Age_2020    31
6   3    Age_2020    23
7   1 Height_2019   160
8   2 Height_2019   175
9   3 Height_2019   155
10  1 Height_2020   162
11  2 Height_2020   177
12  3 Height_2020   157

 

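As you can observe, the melt function has transformed the wide-format data frame into a long-format one: each of the 3 original rows became 4 rows, one per measured column, for 12 rows in total. Each row now holds a single ID, variable, and value, which makes the data easier to filter, group, and plot.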
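reshape2 also provides dcast(), which reverses melt() by spreading long data back into one column per variable. A round trip is one way to check that melting lost nothing. This sketch rebuilds the wide_data example so it runs on its own; recast_data is a hypothetical name:

```r
library(reshape2)

wide_data <- data.frame(
  ID = c(1, 2, 3),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)

melted_data <- melt(wide_data, id.vars = "ID",
                    measure.vars = c("Age_2019", "Age_2020", "Height_2019", "Height_2020"))

# ID ~ variable: one row per ID, one column per level of variable
recast_data <- dcast(melted_data, ID ~ variable, value.var = "value")
print(recast_data)

# Compare the values column by column, ignoring attributes such as row names
print(all.equal(recast_data, wide_data, check.attributes = FALSE))
```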

Handling Multiple Identifier Variables

Many datasets have more than one identifier variable. To keep several of them, pass a character vector of column names to the id.vars parameter. In the following example, each person has both an ID and a Country:

Code:

wide_data_multiple_ids <- data.frame(
  ID = c(1, 2, 3),
  Country = c("USA", "Canada", "Mexico"),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)

print("Original Wide-format Data with Multiple ID variables:")
print(wide_data_multiple_ids)

# Both ID and Country are kept as identifier columns
melted_data_multiple_ids <- melt(
  wide_data_multiple_ids,
  id.vars = c("ID", "Country"),
  measure.vars = c("Age_2019", "Age_2020", "Height_2019", "Height_2020")
)

print("Melted Long-format Data with Multiple ID variables:")
print(melted_data_multiple_ids)

 

In this example, ID and Country are both identifier variables, so the melted data frame repeats both columns next to each variable-value pair.

Output:

   ID Country    variable value
1   1     USA    Age_2019    25
2   2  Canada    Age_2019    30
3   3  Mexico    Age_2019    22
4   1     USA    Age_2020    26
5   2  Canada    Age_2020    31
6   3  Mexico    Age_2020    23
7   1     USA Height_2019   160
8   2  Canada Height_2019   175
9   3  Mexico Height_2019   155
10  1     USA Height_2020   162
11  2  Canada Height_2020   177
12  3  Mexico Height_2020   157
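The reshape2 documentation states that id.vars (and measure.vars) may also be given as column positions, and that leaving out measure.vars melts every remaining column. The sketch below relies on both, using a copy of the example data that includes an ID column next to Country, and checks that the positional and name-based calls agree:

```r
library(reshape2)

# Example data with two identifier columns (ID and Country)
wide_data_multiple_ids <- data.frame(
  ID = c(1, 2, 3),
  Country = c("USA", "Canada", "Mexico"),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)

# Identifiers by name; measure.vars omitted, so the other four columns are melted
by_name <- melt(wide_data_multiple_ids, id.vars = c("ID", "Country"))

# Identifiers by position (columns 1 and 2)
by_position <- melt(wide_data_multiple_ids, id.vars = 1:2)

print(identical(by_name, by_position))

# Each country appears once per measured column (4 times)
print(table(by_name$Country))
```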

 

Handling Variable Names in Melted Data

By default, the melted data frame stores the original column names in a column called variable and their values in a column called value. You can choose more descriptive names for these two columns with the variable.name and value.name parameters. Here's an example:

Code:

melted_data_custom_names <- melt(
  wide_data,
  id.vars = "ID",
  measure.vars = c("Age_2019", "Age_2020", "Height_2019", "Height_2020"),
  variable.name = "Year_Variable",
  value.name = "Measurement"
)

print("Melted Long-format Data with Custom Variable and Value Names:")
print(melted_data_custom_names)

 
Output:

   ID Year_Variable Measurement
1   1      Age_2019          25
2   2      Age_2019          30
3   3      Age_2019          22
4   1      Age_2020          26
5   2      Age_2020          31
6   3      Age_2020          23
7   1   Height_2019         160
8   2   Height_2019         175
9   3   Height_2019         155
10  1   Height_2020         162
11  2   Height_2020         177
12  3   Height_2020         157
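Each entry in Year_Variable packs two pieces of information, a measurement name and a year. Assuming every melted column follows a Name_YYYY naming scheme, base R's sub() can split them into separate columns; Measure and Year are hypothetical column names added for this sketch:

```r
library(reshape2)

wide_data <- data.frame(
  ID = c(1, 2, 3),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)

melted_data_custom_names <- melt(wide_data, id.vars = "ID",
                                 variable.name = "Year_Variable",
                                 value.name = "Measurement")

# Year_Variable is a factor; convert it to character before using regex
key <- as.character(melted_data_custom_names$Year_Variable)

# "Age_2019" -> Measure "Age" and Year 2019 (assumes the Name_YYYY scheme)
melted_data_custom_names$Measure <- sub("_[0-9]{4}$", "", key)
melted_data_custom_names$Year <- as.integer(sub("^.*_", "", key))

print(head(melted_data_custom_names))
```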

 

Melt Function in Matrix Reshaping

The melt function isn't restricted to data frames; reshape2 also provides a method for matrices and arrays. For a matrix, the row and column indices act as the identifier variables, and the cell values become the measured values. Let's look at an example:

Code:

matrix_data <- matrix(1:12, nrow = 3, ncol = 4)

print("Original Matrix:")
print(matrix_data)

melted_matrix <- melt(matrix_data)

print("Melted Matrix:")
print(melted_matrix)

 

In this example, the melt function is applied directly to a matrix. Because the matrix has no dimnames, the resulting data frame has columns named Var1, Var2, and value, holding the row index, column index, and cell value, respectively.

Output:

   Var1 Var2 value
1     1    1     1
2     2    1     2
3     3    1     3
4     1    2     4
5     2    2     5
6     3    2     6
7     1    3     7
8     2    3     8
9     3    3     9
10    1    4    10
11    2    4    11
12    3    4    12
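When a matrix has named dimnames, melt() uses those names for the identifier columns instead of Var1 and Var2. A sketch with a hypothetical store-by-month sales matrix (Store, Month, and Sales are invented names), also passing the value.name argument that reshape2's array method accepts:

```r
library(reshape2)

# Hypothetical sales matrix with named dimensions
sales <- matrix(
  1:6, nrow = 2,
  dimnames = list(Store = c("A", "B"), Month = c("Jan", "Feb", "Mar"))
)

# The dimnames names (Store, Month) become the identifier column names
long_sales <- melt(sales, value.name = "Sales")
print(long_sales)
```

As in the unnamed case, the matrix is read column by column, so the Store labels cycle fastest.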

 

Aggregating Data Using Melted Format

One of the advantages of the long-format data is its compatibility with aggregation functions. After melting the data, you can easily perform operations like calculating means, sums, or other summary statistics. Let's consider an example:

Code:

mean_values <- aggregate(value ~ variable, data = melted_data, mean)

print("Mean Values by Variable:")
print(mean_values)

 

Output:

     variable     value
1    Age_2019  25.66667
2    Age_2020  26.66667
3 Height_2019 163.33333
4 Height_2020 165.33333

 

In this example, the aggregate function calculates the mean of value for each level of variable, giving a concise summary of each measurement across all IDs.
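Because melting only rearranges values, the per-variable means should match the column means of the original wide table. A small cross-check, sketched with base R's colMeans() (wide_means is a hypothetical name):

```r
library(reshape2)

wide_data <- data.frame(
  ID = c(1, 2, 3),
  Age_2019 = c(25, 30, 22),
  Age_2020 = c(26, 31, 23),
  Height_2019 = c(160, 175, 155),
  Height_2020 = c(162, 177, 157)
)
melted_data <- melt(wide_data, id.vars = "ID")

# Long format: group by variable, then average
mean_values <- aggregate(value ~ variable, data = melted_data, FUN = mean)

# Wide format: average each column directly (dropping ID)
wide_means <- colMeans(wide_data[, -1])

# Align the wide means to the order of mean_values before comparing
print(all.equal(mean_values$value,
                unname(wide_means[as.character(mean_values$variable)])))
```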

Conclusion

In R programming, the melt function from the reshape2 package is a handy tool for reshaping and organizing data. It turns wide data frames into long ones, which are often easier to analyze and visualize. In this article, we walked through the melt function step by step: we saw basic examples, handled multiple identifier variables, customized the variable and value column names, applied melt to matrices, and saw how melted data simplifies aggregation.


About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.