What are Histograms in R? (With Code Examples)

  • Jan 23, 2024
  • 8 Minutes Read
  • By Abhisek Ganguly
Data visualization is a central part of data analysis, and R offers an extensive ecosystem for creating engaging and informative graphs. The histogram is a popular tool in R for exploring data distributions. In this article, we will look at how to create histograms in R, including histograms of two variables and grouped histograms with the ggplot2 package.

What are Histograms?

A histogram is a graphical representation of the distribution of a dataset. It divides the range of values into bins (intervals) and shows how many observations, or what proportion of them, fall into each bin. In base R, the hist() function creates a histogram.

Code:

hist(mtcars$mpg, main = "Histogram of MPG", xlab = "Miles Per Gallon", col = "skyblue", border = "black")

 

Plot:

Histogram - image 1

 

This simple code creates a histogram for the 'mpg' (miles per gallon) variable using the built-in 'mtcars' dataset to show the distribution of fuel efficiency.
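
The difference between a frequency histogram and a probability (density) histogram can be seen directly in what hist() computes. The following is a minimal sketch using the same 'mtcars' data; the bin edges (10 to 35 in steps of 5) are an illustrative choice, not something the original example specifies.

```r
# Compute the histogram without drawing it, using explicit bin edges.
# The edges are an assumption for illustration; they must span the data range.
h <- hist(mtcars$mpg, breaks = seq(10, 35, by = 5), plot = FALSE)

h$counts   # number of cars in each bin (frequencies)
h$density  # counts rescaled so that the bar areas sum to 1

# freq = FALSE puts density, rather than counts, on the y-axis
hist(mtcars$mpg, breaks = seq(10, 35, by = 5), freq = FALSE,
     main = "Density Histogram of MPG", xlab = "Miles Per Gallon",
     col = "skyblue", border = "black")
```

The shape of the histogram is the same in both versions; only the scale of the y-axis changes.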

Bivariate Histograms

Sometimes it's useful to explore the relationship between two variables in a dataset. Bivariate histograms visualize the joint distribution of two continuous variables by counting observations in two-dimensional bins. The hexbin() function from the hexbin package, which uses hexagonal bins, and the geom_bin2d() layer in ggplot2, which uses rectangular bins, are popular ways to create them.

First, make sure the hexbin package is installed and loaded in your R session.

install.packages("hexbin")

library(hexbin)

 

With the package loaded, we can create the bivariate histogram.

Code:

hbin <- hexbin(mtcars$mpg, mtcars$hp)
plot(hbin, main = "Bivariate Histogram: MPG vs. Horsepower", xlab = "MPG", ylab = "Horsepower", colramp = function(n) heat.colors(n))

library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) +
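  geom_bin2d(binwidth = c(5, 50), color = "white") +  # leave fill mapped to the bin count
  scale_fill_gradient(low = "skyblue", high = "navy") +  # darker tiles = more cars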
  labs(title = "Bivariate Histogram: MPG vs. Horsepower", x = "MPG", y = "Horsepower")

 

Plot:

Bivariate histogram - image 2

 

These examples show how to create bivariate histograms that reveal the joint distribution of fuel efficiency (MPG) and horsepower for the 'mtcars' dataset.
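
To see what a bivariate histogram is actually counting, the binning can be reproduced by hand in base R. This is only a sketch: the bin edges below are assumptions chosen to cover the data, and the plotting functions above may place their bin boundaries differently, so the counts need not match the plots cell for cell.

```r
# Cut each variable into intervals, then cross-tabulate the two sets of bins.
# The edges are illustrative; each sequence must span the variable's range.
mpg_bins <- cut(mtcars$mpg, breaks = seq(10, 35, by = 5))
hp_bins  <- cut(mtcars$hp,  breaks = seq(50, 350, by = 50))

# Each cell holds the number of cars falling in that (mpg, hp) rectangle
joint_counts <- table(mpg_bins, hp_bins)
joint_counts
```

A bivariate histogram is a picture of this table, with the count in each cell mapped to a color.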

ggplot2 for Histograms

Base R includes functions for creating simple histograms. The ggplot2 package adds versatility and finer control over styling. It implements the grammar of graphics, which lets users build complex plots by adding layers.

Let's look at how ggplot2 can be used to create a basic histogram.

Code:

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 4, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of MPG with ggplot2", x = "Miles Per Gallon", y = "Frequency")

 

Plot:

Histogram of MPG with ggplot2 - image 3

 

In this example, we use the ggplot2 library to generate a histogram of the 'mpg' variable, adjusting the binwidth, fill color, and other styling features to improve the visual appeal of our histogram.
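
Bin width strongly affects how a histogram reads, so it can help to inspect the bins ggplot2 actually computed. The sketch below assumes ggplot2 is installed and uses the bins argument (number of bins) as an alternative to binwidth; the value 8 is arbitrary.

```r
library(ggplot2)

# Specify the number of bins instead of their width
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 8, fill = "skyblue", color = "black", alpha = 0.7)

# layer_data() returns the data frame ggplot2 computed for the layer,
# including each bin's edges (xmin, xmax) and its count
bin_table <- layer_data(p)[, c("xmin", "xmax", "count")]
bin_table
```

Comparing this table for a few values of bins or binwidth is a quick way to check whether a pattern in the plot depends on the binning.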

Customizing ggplot2 Histograms

One of ggplot2's strengths is its customizability. You can adjust the histogram's colors, axis labels, titles, and theme to meet your specific requirements.

Code:

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 4, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Customized Histogram of MPG with ggplot2", x = "Miles Per Gallon", y = "Frequency") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 16),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 45, hjust = 1))

 

Plot:

Customized Histogram of MPG with ggplot2 - image 4

 

This example showcases a customized histogram with a minimal theme, adjusted title positioning, and rotated x-axis labels for better readability.
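
Customization can also add information, not just styling. As one possible extension (not part of the original example), the sketch below overlays a dashed reference line at the mean of 'mpg'; the line color and style are arbitrary choices.

```r
library(ggplot2)

# Mean fuel efficiency, used as the position of the reference line
mean_mpg <- mean(mtcars$mpg)

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 4, fill = "skyblue", color = "black", alpha = 0.7) +
  geom_vline(xintercept = mean_mpg, linetype = "dashed", color = "red") +
  labs(title = "Histogram of MPG with Mean Line",
       x = "Miles Per Gallon", y = "Frequency") +
  theme_minimal()
```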

Two-Variable Histograms with ggplot2

To continue our investigation, let's use ggplot2 to generate a histogram with two variables. In this scenario, we'll look at the joint distribution of the 'mpg' and 'hp' variables from the 'mtcars' dataset.

Code:

ggplot(mtcars, aes(x = mpg, y = hp)) +
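  geom_bin2d(binwidth = c(5, 50), color = "white") +  # leave fill mapped to the bin count
  scale_fill_gradient(low = "skyblue", high = "navy") +  # darker tiles = more cars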
  labs(title = "Two-Variable Histogram: MPG vs. Horsepower", x = "MPG", y = "Horsepower")

 

Plot:

Two-Variable Histogram - image 5

 

This ggplot2 code generates a heatmap-like representation, providing insights into the combined distribution of miles per gallon and horsepower.

Grouped Histograms with ggplot2

Grouped histograms are especially useful for comparing the distributions of various groups within a dataset. This could include comparing the distribution of a variable across various categories or groups. We can accomplish this with ggplot2 by using facets or dodged bar plots. Let's look into both approaches.

Faceted Grouped Histogram

Code:

ggplot(mtcars, aes(x = mpg, fill = as.factor(cyl))) +
  geom_histogram(binwidth = 2, position = "identity", alpha = 0.7) +
  facet_wrap(~cyl, scales = "free_y") +
  labs(title = "Faceted Grouped Histogram: MPG by Cylinder", x = "Miles Per Gallon", y = "Frequency", fill = "Cylinders")

 

Plot:

Faceted Grouped Histogram - image 6

 

In this example, we use facets to create a grouped histogram, separating the distribution of 'mpg' for different cylinder categories in the 'mtcars' dataset.
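
The cylinder groups differ in size, which is why free y-axis scales can be helpful here. The sketch below checks the group sizes and shows one alternative layout, stacking the panels in a single column so the shared x-axis lines up; ncol = 1 is a layout choice added for illustration.

```r
library(ggplot2)

# Number of cars in each cylinder group (4, 6 and 8 cylinders: 11, 7 and 14 cars)
group_sizes <- table(mtcars$cyl)
group_sizes

# Stack the facets vertically so the mpg axes align across groups
ggplot(mtcars, aes(x = mpg, fill = as.factor(cyl))) +
  geom_histogram(binwidth = 2, alpha = 0.7) +
  facet_wrap(~cyl, ncol = 1) +
  labs(title = "MPG by Cylinder (Stacked Facets)",
       x = "Miles Per Gallon", y = "Frequency", fill = "Cylinders")
```

With a common y-axis the smaller groups look shorter, which makes differences in group size visible; free scales emphasize each group's shape instead.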

Dodged Grouped Histogram

Code:

ggplot(mtcars, aes(x = as.factor(cyl), fill = as.factor(vs))) +
  geom_bar(position = "dodge", alpha = 0.7, stat = "count") +  # stat = "count" (the geom_bar default) tallies rows per group
  labs(title = "Dodged Grouped Bar Plot: Cylinder vs. V/S", x = "Cylinders", y = "Frequency", fill = "V/S") +
  scale_fill_manual(values = c("0" = "skyblue", "1" = "orange"))

 

Plot:

Dodged Grouped Bar Plot - image 7

 

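In this instance, we opt for a dodged bar plot to compare the counts of 'vs' (engine shape, a binary variable where 0 is V-shaped and 1 is straight) across cylinder categories. Because both variables are categorical, geom_bar() counts rows in each group rather than binning a continuous variable.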
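
The same dodging idea can be applied to a continuous variable, which gives a grouped histogram in the stricter sense. The following sketch bins 'mpg' and places the bars for the two 'vs' groups side by side within each bin; the bin width of 5 is an arbitrary choice.

```r
library(ggplot2)

# Bin the continuous variable mpg, dodging the bars by engine shape (vs)
p_dodge <- ggplot(mtcars, aes(x = mpg, fill = as.factor(vs))) +
  geom_histogram(binwidth = 5, position = "dodge", color = "black", alpha = 0.7) +
  labs(title = "Dodged Grouped Histogram: MPG by V/S",
       x = "Miles Per Gallon", y = "Frequency", fill = "V/S") +
  scale_fill_manual(values = c("0" = "skyblue", "1" = "orange"))

p_dodge
```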

Conclusion

In this article, we examined the fundamentals of building histograms in R, looked at how to generate histograms with two variables, and investigated grouped histograms with ggplot2. Histograms in R, particularly when built with the ggplot2 package, are an effective way to visualize the distribution of data. ggplot2 provides a flexible toolkit for data analysts and researchers, allowing them to explore the relationship between two variables, customize visualizations, and compare group distributions.

About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.