
Testing Proportions in R (With Code Examples)

  • Feb 07, 2024
  • 7 Minutes Read
Statistical analysis is an important part of any data science research. R, a popular statistical programming language, provides a wide range of tools for conducting statistical tests. The prop.test function is a commonly used R function for testing proportions. In this article, we will look at proportion tests in R and compare prop.test with the chisq.test function. We will also learn how to construct confidence intervals for proportions and perform a two-sample proportion test.

Introduction to Proportion Tests

Proportion tests are used when we are dealing with categorical data. When the target variable falls into distinct categories, we can test hypotheses about the share of observations that fall into a given category. The most basic setting for a proportion test is a binary outcome, such as success/failure or yes/no.
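As a small illustration of what a proportion looks like in practice, the sketch below tabulates a made-up vector of yes/no answers and converts the counts into proportions. The `responses` data is hypothetical and is not the survey discussed later in the article.

```r
# Hypothetical yes/no survey answers (illustrative data only)
responses <- c("yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes")

# Count how many answers fall into each category
counts <- table(responses)

# Convert the counts into proportions that sum to 1
proportions <- prop.table(counts)

print(counts)
print(proportions)
```

The count of "yes" answers and the total number of answers are exactly the two numbers a proportion test needs as input.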

The prop.test Function in R

R's prop.test function is built specifically for proportion tests. It is useful for comparing an observed proportion with a hypothesized value, as well as for comparing proportions between two groups.

Let us consider a hypothetical example to demonstrate the use of prop.test. Assume we have completed a survey on customer satisfaction and want to determine whether the proportion of satisfied customers differs significantly from a predefined value.

Code:

satisfied_customers <- 75
total_customers <- 100
expected_proportion <- 0.8

result <- prop.test(satisfied_customers, total_customers, p = expected_proportion)

print(result)

 

Output:

	1-sample proportions test with continuity correction

data:  satisfied_customers out of total_customers, null probability expected_proportion
X-squared = 1.2656, df = 1, p-value = 0.2606
alternative hypothesis: true p is not equal to 0.8
95 percent confidence interval:
 0.6516159 0.8288245
sample estimates:
   p 
0.75 

 

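In this example, prop.test checks whether the observed proportion of satisfied customers (0.75) differs significantly from the hypothesized proportion of 0.8. The output reports the chi-squared test statistic, the degrees of freedom, the p-value, a 95 percent confidence interval, and the sample estimate. Because the p-value of 0.2606 is above the usual 0.05 threshold, we fail to reject the null hypothesis that the true proportion is 0.8.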
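The printed summary is convenient to read, but the individual numbers can also be pulled out of the result object. The sketch below repeats the survey example, extracts the p-value and estimate, and recomputes the continuity-corrected statistic by hand as a sanity check. The significance level `alpha` and the names `reject_null` and `manual_stat` are our own choices for illustration.

```r
# Same inputs as the survey example
satisfied_customers <- 75
total_customers <- 100
expected_proportion <- 0.8

result <- prop.test(satisfied_customers, total_customers, p = expected_proportion)

# prop.test returns an "htest" list; pull out individual components
p_value <- result$p.value
estimate <- unname(result$estimate)

# Decision at a significance level of 0.05 (alpha is our own choice here)
alpha <- 0.05
reject_null <- p_value < alpha

# Recompute the continuity-corrected chi-squared statistic by hand:
# (|observed - expected| - 0.5)^2 / (n * p * (1 - p))
expected_count <- total_customers * expected_proportion
manual_stat <- (abs(satisfied_customers - expected_count) - 0.5)^2 /
  (total_customers * expected_proportion * (1 - expected_proportion))

print(reject_null)
print(manual_stat)
```

The hand-computed value should agree with the X-squared figure in the printed output, which shows that the test is a chi-squared comparison of observed and expected counts with a 0.5 continuity adjustment.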

The chisq.test Function in R

The prop.test function is built specifically for proportions, whereas the chisq.test function is more general and can be used to assess independence in contingency tables. It can also be used for proportion tests when the data form a 2x2 contingency table.

Let's compare the usage of chisq.test with the previous example.

Code:

contingency_table <- matrix(c(satisfied_customers, total_customers - satisfied_customers,
                              expected_proportion * total_customers, (1 - expected_proportion) * total_customers),
                            nrow = 2)

result_chisq <- chisq.test(contingency_table)

print(result_chisq)

 

Output:

	Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_table
X-squared = 0.45878, df = 1, p-value = 0.4982

 

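In this scenario, we built a 2x2 contingency table for use with chisq.test. The output gives the chi-squared statistic, degrees of freedom, and p-value. Note that the p-value (0.4982) differs from the prop.test result (0.2606): placing the expected counts in the table makes chisq.test treat them as a second observed sample of 100 customers, so this call tests whether two groups share the same proportion rather than whether one sample matches a fixed value of 0.8.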
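A closer chisq.test analogue of the one-sample proportion test is the goodness-of-fit form, where the observed counts are passed as a vector and the hypothesized probabilities go in the `p` argument. The sketch below is one way to set this up; the names `observed`, `gof`, and `uncorrected` are ours. The goodness-of-fit call applies no continuity correction, so it is compared against prop.test with `correct = FALSE`.

```r
satisfied_customers <- 75
total_customers <- 100
expected_proportion <- 0.8

# Observed counts for the two categories: satisfied and not satisfied
observed <- c(satisfied_customers, total_customers - satisfied_customers)

# Goodness-of-fit test against the hypothesized category probabilities
gof <- chisq.test(observed, p = c(expected_proportion, 1 - expected_proportion))

# One-sample prop.test without the continuity correction, for comparison
uncorrected <- prop.test(satisfied_customers, total_customers,
                         p = expected_proportion, correct = FALSE)

print(gof)
print(uncorrected$p.value)
```

Both calls should report the same statistic and p-value, which will differ slightly from the continuity-corrected prop.test output shown earlier.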

prop.test() vs chisq.test() in R

Now, let's discuss the differences between prop.test and chisq.test and when to use each.

Use Cases for prop.test

1. Testing a Single Proportion: prop.test is ideal when you wish to determine whether a single observed proportion differs significantly from a hypothesized value.

2. Comparing Two Proportions: prop.test is the recommended method for comparing proportions between two groups, particularly when the groups are independent.

3. One-Sample and Two-Sample Tests: prop.test can do both one-sample and two-sample proportion tests, allowing for greater versatility in various experimental scenarios.

Use Cases for chisq.test

1. Testing Independence in Contingency Tables: chisq.test is more general and can be used to assess independence in contingency tables with more than two categories. If your data contains more than two levels or groups, the chi-squared test may be better suited.

2. Handling 2x2 Contingency Tables: prop.test can handle 2x2 tables, but chisq.test is a good alternative when the data are already arranged as a contingency table, and it extends naturally to larger tables where independence must be tested.

3. Appropriate for Expected Frequencies: chisq.test is useful when you have expected frequencies (or probabilities) for each category and want to determine whether the observed frequencies differ significantly from them.
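To make the first use case concrete, here is a minimal sketch of a test of independence on a table with three groups. The counts and the names `satisfaction_table` and `result_independence` are hypothetical and chosen only for illustration.

```r
# Hypothetical satisfaction counts for three products (illustrative data only).
# matrix() fills column by column, so each pair below is one product.
satisfaction_table <- matrix(c(45, 15,
                               60, 15,
                               40, 25),
                             nrow = 2,
                             dimnames = list(Response = c("satisfied", "not_satisfied"),
                                             Product = c("A", "B", "C")))

# Test whether satisfaction is independent of product
result_independence <- chisq.test(satisfaction_table)

print(result_independence)

# Expected counts under independence, useful for checking the approximation
print(result_independence$expected)
```

For a table with 2 rows and 3 columns the test has (2 - 1) * (3 - 1) = 2 degrees of freedom. Inspecting the expected counts is a common check, since the chi-squared approximation is less reliable when they are small.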

Comparing Results

It is worth noting that for a 2x2 table of two independent groups, the two-sample prop.test and chisq.test compute the same chi-squared statistic and p-value. prop.test, however, also reports the estimated proportions and a confidence interval for their difference, which makes it easier to interpret in proportion-related problems.
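The agreement between the two functions on a 2x2 table can be checked directly. The sketch below uses counts chosen for illustration; the names `two_sample`, `two_by_two`, and `independence` are ours.

```r
# Illustrative counts for two independent groups
successes <- c(45, 60)
totals <- c(60, 75)

# Two-sample proportion test
two_sample <- prop.test(successes, totals)

# The same data as a 2x2 table: one row per group, columns = success / failure
two_by_two <- cbind(success = successes, failure = totals - successes)
independence <- chisq.test(two_by_two)

# Both calls apply Yates' continuity correction by default
print(unname(two_sample$statistic))
print(unname(independence$statistic))
print(all.equal(two_sample$p.value, independence$p.value))
```

Note that the table must hold successes and failures for each group. Passing successes and totals instead would describe different data and give a different result.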

Confidence Intervals for Proportions

In addition to hypothesis testing, statistical analysis frequently involves establishing confidence intervals for proportions. The prop.test function in R can be used to compute confidence intervals for proportions.

Let's extend our previous example to include the calculation of a confidence interval.

Code:

confidence_interval <- prop.test(satisfied_customers, total_customers, p = expected_proportion)$conf.int

print(confidence_interval)

 

Output:

[1] 0.6516159 0.8288245
attr(,"conf.level")
[1] 0.95

 

This code snippet uses the prop.test() function to calculate a confidence interval for the proportion of satisfied customers. The interval gives a range of plausible values for the true population proportion. The p argument affects only the hypothesis test, not the interval, and the confidence level defaults to 95 percent.
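If a different confidence level is needed, prop.test accepts a `conf.level` argument. The sketch below compares the default 95 percent interval with a 99 percent one for the same survey counts; the names `ci_95` and `ci_99` are ours.

```r
satisfied_customers <- 75
total_customers <- 100

# Default 95 percent interval
ci_95 <- prop.test(satisfied_customers, total_customers)$conf.int

# Wider 99 percent interval via the conf.level argument
ci_99 <- prop.test(satisfied_customers, total_customers, conf.level = 0.99)$conf.int

print(ci_95)
print(ci_99)
```

A higher confidence level produces a wider interval, which is the usual trade-off between confidence and precision.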

Two-Sample Proportion Test in R

In some cases, you might want to compare proportions between two separate groups. This is usually known as the two-sample proportion test. The prop.test function can be used for this purpose.

Assume you want to compare the proportions of satisfied consumers across two different products.

Code:

satisfied_product_A <- 45
total_product_A <- 60

satisfied_product_B <- 60
total_product_B <- 75

result_two_sample <- prop.test(c(satisfied_product_A, satisfied_product_B),
                                c(total_product_A, total_product_B),
                                alternative = "two.sided")

print(result_two_sample)

 

Output:

	2-sample test for equality of proportions with continuity correction

data:  c(satisfied_product_A, satisfied_product_B) out of c(total_product_A, total_product_B)
X-squared = 0.23625, df = 1, p-value = 0.6269
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.2071255  0.1071255
sample estimates:
prop 1 prop 2 
  0.75   0.80 

 

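In this example, prop.test performs a two-sample proportion test comparing the proportion of satisfied customers for Products A (0.75) and B (0.80). The alternative argument is set to "two.sided", which requests a two-tailed test. With a p-value of 0.6269 and a confidence interval for the difference that includes zero, there is no evidence that the two proportions differ.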
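When the question is directional, for example whether Product A has a lower satisfaction rate than Product B, the `alternative` argument can be set to "less" or "greater" instead. The sketch below is one way to run such a one-sided test on the same counts; the names `result_two_sided` and `result_less` are ours.

```r
satisfied_product_A <- 45
total_product_A <- 60

satisfied_product_B <- 60
total_product_B <- 75

successes <- c(satisfied_product_A, satisfied_product_B)
totals <- c(total_product_A, total_product_B)

# Two-sided test, as in the example
result_two_sided <- prop.test(successes, totals, alternative = "two.sided")

# One-sided test: is the proportion for A less than the proportion for B?
result_less <- prop.test(successes, totals, alternative = "less")

print(result_two_sided$p.value)
print(result_less$p.value)
```

Because the observed proportion for A is below that for B, the one-sided p-value here is half of the two-sided one. The direction should be chosen before looking at the data, not after.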

Conclusion

Understanding and implementing proportion tests in R is critical for deriving meaningful conclusions from categorical data. The type of your data and the hypothesis you want to test determine whether you should use prop.test or chisq.test. Use prop.test when dealing with proportions, particularly in one- or two-sample cases. Use chisq.test when assessing independence in contingency tables or working with variables that have more than two categories. Furthermore, confidence intervals for proportions provide useful information about the range in which the true population proportion is likely to fall.

About The Author
Abhisek Ganguly
Passionate machine learning enthusiast with a deep love for computer science, dedicated to pushing the boundaries of AI through academic research and sharing knowledge through teaching.