Kruskal Wallis Test in R

  • Dec 10, 2023
  • 6 Minutes Read
Statistical analysis is an important aspect of research and data-driven decision-making. When the data are not normally distributed, or the assumption of normality cannot be verified, we turn to alternative tests. One such test is the Kruskal-Wallis test, a rank-based method used to compare the distributions of several independent groups without assuming a specific shape for the data. In this article, we'll explore the details of the Kruskal-Wallis test, how to use it in R, what it aims to find out, and how to interpret the results.

What is the Kruskal-Wallis Test?

The Kruskal-Wallis test is a non-parametric substitute for the one-way analysis of variance (ANOVA). It comes into play when the assumptions of ANOVA, like normally distributed data and equal variances, are not met. This test helps determine whether there are statistically significant differences between the medians of three or more independent groups.

Hypotheses of the Kruskal-Wallis Test

Before running the test, we need to know which hypotheses the Kruskal-Wallis test evaluates.

Null Hypothesis (H0): The medians of the groups are equal (more precisely, all groups come from the same distribution).

Alternative Hypothesis (H1): At least one group has a different median (its values tend to be larger or smaller than those of another group).

If the p-value obtained from the test is less than the significance level (commonly set at 0.05), we reject the null hypothesis, suggesting that there are significant differences between the groups.
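To make the decision rule concrete, here is a minimal sketch of how the test statistic and p-value can be computed by hand from ranks. It uses the same three small samples as the example later in this article; the variable names (values, groups, h_stat, and so on) are illustrative assumptions, and the formula is the standard Kruskal-Wallis H statistic with a correction for ties.

```r
# Same 15 observations as the article's example, in one vector
values <- c(23, 18, 30, 15, 27,   # group A
            21, 25, 20, 28, 19,   # group B
            30, 26, 33, 24, 29)   # group C
groups <- factor(rep(c("A", "B", "C"), each = 5))

n <- length(values)
r <- rank(values)                       # tied values receive averaged ranks

rank_sums <- tapply(r, groups, sum)     # sum of ranks in each group
n_i <- tapply(r, groups, length)        # number of observations in each group

# Uncorrected H statistic
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / n_i) - 3 * (n + 1)

# Correction for ties (the value 30 appears twice)
tie_counts <- table(values)
h_stat <- h_raw / (1 - sum(tie_counts^3 - tie_counts) / (n^3 - n))

# Under H0, H approximately follows a chi-squared distribution with k - 1 df
deg_free <- nlevels(groups) - 1
p_value <- pchisq(h_stat, df = deg_free, lower.tail = FALSE)

cat("H =", h_stat, " df =", deg_free, " p-value =", p_value, "\n")
```

The p-value computed this way should agree with the one reported by kruskal.test() for the same data, which is a useful sanity check when learning the method.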

Kruskal-Wallis Test in R

Let's now see how to run and interpret the Kruskal-Wallis test in the R programming language.

Implementing the Kruskal-Wallis Test

In R, using the Kruskal-Wallis test is simple. Let's look at a basic example with three independent groups (A, B, and C). We'll make use of the kruskal.test() function in R.

group_a <- c(23, 18, 30, 15, 27)
group_b <- c(21, 25, 20, 28, 19)
group_c <- c(30, 26, 33, 24, 29)

data <- list(Group_A = group_a, Group_B = group_b, Group_C = group_c)

result <- kruskal.test(data)
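Data often arrives as a data frame rather than a list of vectors. As a sketch under that assumption, kruskal.test() also accepts a formula of the form response ~ group. The data frame name scores and its columns value and group are hypothetical names chosen for this illustration; the numbers are the same as in the list-based example.

```r
# Same values as the list-based example, reshaped to long format:
# one row per observation, with a factor identifying the group
scores <- data.frame(
  value = c(23, 18, 30, 15, 27,
            21, 25, 20, 28, 19,
            30, 26, 33, 24, 29),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

# Formula interface: response on the left, grouping factor on the right
result_formula <- kruskal.test(value ~ group, data = scores)

print(result_formula)        # prints the statistic, degrees of freedom, p-value
result_formula$statistic     # the Kruskal-Wallis chi-squared statistic
result_formula$parameter     # degrees of freedom (number of groups minus 1)
```

Both interfaces run the same test, so the choice mostly depends on how the data is already stored.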

 

Interpreting the Results

After running the test, the result object contains information about the test. You can access the p-value using result$p.value. Here's how you might interpret the results:

cat("P-value:", result$p.value, "\n")

if (result$p.value < 0.05) {
  cat("Reject the null hypothesis. There are significant differences between groups.\n")
} else {
  cat("Fail to reject the null hypothesis. No significant differences between groups.\n")
}

 

Output:

P-value: 0.1285853 

Fail to reject the null hypothesis. No significant differences between groups.

 

In the example above, the p-value comes out to be 0.1285853, which is greater than 0.05; hence we fail to reject the null hypothesis. The data do not provide sufficient evidence of a difference between the groups.
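A p-value on its own says nothing about how the groups actually differ, so it can help to look at simple descriptive summaries alongside the test. The following sketch recomputes the group medians and ranges for the same three samples; the list name samples is an assumption made for this illustration.

```r
# Same three samples as in the example above
samples <- list(
  Group_A = c(23, 18, 30, 15, 27),
  Group_B = c(21, 25, 20, 28, 19),
  Group_C = c(30, 26, 33, 24, 29)
)

# Median of each group
group_medians <- sapply(samples, median)
print(group_medians)

# Minimum and maximum of each group (one column per group)
group_ranges <- sapply(samples, range)
print(group_ranges)
```

Here Group_C has the highest median, but the ranges overlap considerably, which is consistent with a test result that is not significant for samples this small.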

Kruskal-Wallis vs. ANOVA

Now let's look at the major differences between ANOVA and the Kruskal-Wallis test, and which one to use under different conditions.

Understanding the Differences

Both the Kruskal-Wallis test and ANOVA compare three or more independent groups, but they differ in what they compare and in their assumptions. ANOVA compares group means and relies on assumptions like normal distribution and equal variances, making it suitable for parametric data. Conversely, the Kruskal-Wallis test is non-parametric: it works on the ranks of the observations and doesn't assume normality, making it robust for situations where normality isn't present.

Let's illustrate it with an example:

Code:

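group_anova <- c(5, 8, 10, 12, 15)
group_bnova <- c(9, 11, 13, 15, 18)
group_cnova <- c(15, 12, 10, 8, 5)

data_anova <- list(Group_A = group_anova, Group_B = group_bnova, Group_C = group_cnova)

# aov() needs long-format data: one column of values and one grouping factor
df_anova <- data.frame(
  value = unlist(data_anova),
  Group = factor(rep(names(data_anova), each = 5))
)

anova_result <- aov(value ~ Group, data = df_anova)

summary(anova_result)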

 

Output:

            Df Sum Sq Mean Sq F value Pr(>F)
Group        2  34.13   17.07   1.243  0.323
Residuals   12 164.80   13.73   

 

ANOVA's p-value of 0.323 is only trustworthy if its assumptions hold, and with just five observations per group those assumptions are hard to verify. The Kruskal-Wallis test does not rely on normality, so it is a useful alternative in such scenarios.

Code:

kruskal_result <- kruskal.test(data_anova)

cat("P-value:", kruskal_result$p.value, "\n")

 

Output:

P-value: 0.3343215 

In this example, the Kruskal-Wallis p-value (about 0.33) is close to the ANOVA p-value (0.323), and neither test finds a significant difference between the groups at the 0.05 level. The two tests tend to agree when the data are roughly symmetric, as they are here; they can diverge when the data are strongly skewed or contain outliers, which violates ANOVA's assumption of normality. The Kruskal-Wallis test, being non-parametric, works on ranks and doesn't assume normality, making it suitable for ordinal or skewed datasets. Interpretation of the p-value is the same for both tests: a small p-value indicates significant differences between groups.

Choosing the Right Test

Choosing between ANOVA and the Kruskal-Wallis test hinges on your data's characteristics. If your data is roughly normally distributed and the variances are similar across groups, ANOVA might be more suitable. However, if these assumptions are not met or if the data is ordinal or skewed, the Kruskal-Wallis test provides a better alternative.
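One possible way to inform this choice is to check the assumptions with base R's shapiro.test() (normality within each group) and bartlett.test() (equality of variances). The sketch below is an illustration rather than a fixed rule: the list name samples, the 0.05 threshold, and the decision logic are assumptions, and with very small samples these checks have little power, so plots and subject knowledge should also be considered.

```r
# Same three samples as in the ANOVA example above
samples <- list(
  Group_A = c(5, 8, 10, 12, 15),
  Group_B = c(9, 11, 13, 15, 18),
  Group_C = c(15, 12, 10, 8, 5)
)

# Shapiro-Wilk test per group: a small p-value suggests non-normality
normality_p <- sapply(samples, function(x) shapiro.test(x)$p.value)

# Bartlett's test: a small p-value suggests unequal variances
variance_p <- bartlett.test(samples)$p.value

alpha <- 0.05  # assumed significance level for the checks
if (all(normality_p > alpha) && variance_p > alpha) {
  cat("No evidence against ANOVA's assumptions; ANOVA is a reasonable choice.\n")
} else {
  cat("Assumptions look doubtful; consider the Kruskal-Wallis test.\n")
}
```

Failing to reject these checks does not prove the assumptions hold, so when in doubt the Kruskal-Wallis test remains the more cautious option.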

Interpretation of Kruskal-Wallis Test Results

Interpreting the results of the Kruskal-Wallis test involves understanding the p-value and making informed decisions based on it. Here are some key points to consider:

  1. Small p-value (typically < 0.05): Reject the null hypothesis. There is evidence that at least one group has a different median.

  2. Large p-value (typically ≥ 0.05): Fail to reject the null hypothesis. There is not sufficient evidence to conclude that there are significant differences in medians between groups.

  3. Pairwise Comparisons: If the overall test is significant, perform pairwise comparisons to identify which specific groups differ.
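For the pairwise comparisons mentioned in the last point, one option available in base R is pairwise.wilcox.test(), which runs a Wilcoxon rank-sum test for each pair of groups and adjusts the p-values for multiple testing. The sketch below uses hypothetical data, invented for this illustration so that the groups are clearly separated; the Holm adjustment is one reasonable choice among several, and other post-hoc procedures (such as Dunn's test from add-on packages) are also commonly used.

```r
# Hypothetical data: three clearly separated groups, no tied values
values <- c(12, 15, 11, 14, 13, 16,
            22, 25, 21, 24, 23, 26,
            32, 35, 31, 34, 33, 36)
groups <- factor(rep(c("Low", "Mid", "High"), each = 6),
                 levels = c("Low", "Mid", "High"))

# Step 1: overall Kruskal-Wallis test
overall <- kruskal.test(values, groups)
cat("Overall p-value:", overall$p.value, "\n")

# Step 2: only if the overall test is significant, compare each pair of groups
if (overall$p.value < 0.05) {
  pairwise <- pairwise.wilcox.test(values, groups, p.adjust.method = "holm")
  print(pairwise$p.value)   # matrix of adjusted p-values, one per pair
}
```

Each entry of the printed matrix is the adjusted p-value for one pair of groups; pairs with a value below the chosen significance level are the ones that differ.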

Conclusion

The Kruskal-Wallis test in R is a valuable tool for researchers and data analysts dealing with data that are not normally distributed or situations where the assumptions of ANOVA are not met. By understanding the hypotheses, implementing the test in R, and interpreting the results, you can make informed decisions about the equality of group medians. Remember to choose the appropriate test based on the nature of your data and assumptions. Whether you're comparing the effectiveness of different treatments or analyzing survey responses across multiple groups, the Kruskal-Wallis test provides a robust solution for assessing group differences in a variety of scenarios.

About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.