Chi-square test of independence in R

Photo by Giorgio Tomassetti

Introduction

This article explains how to perform the Chi-square test of independence in R and how to interpret its results. To learn more about how the test works and how to do it by hand, I invite you to read the article “Chi-square test of independence by hand”.

To briefly recap what has been said in that article, the Chi-square test of independence tests whether there is a relationship between two categorical variables. The null and alternative hypotheses are:

  • H0: the variables are independent; there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other variable.
  • H1: the variables are dependent; there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other variable.

The Chi-square test of independence works by comparing the observed frequencies (the frequencies observed in your sample) to the expected frequencies (the frequencies you would expect if the null hypothesis were true, that is, if there were no relationship between the two categorical variables).
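To make this comparison concrete, here is a minimal sketch of the computation on a small hypothetical 2x2 table. The counts and the names `observed`, `expected` and `chi2` are made up for illustration and are not part of the example used in the rest of this article. Under independence, the expected count of a cell is its row total times its column total divided by the grand total, and the test statistic sums the squared differences between observed and expected counts, each divided by the expected count.

```r
# Hypothetical 2x2 table of observed counts (illustrative numbers only)
observed <- matrix(c(30, 10, 20, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), outcome = c("yes", "no"))
)

# Expected counts under independence: row total * column total / grand total
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value from the upper tail of the Chi-square distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

chi2
p_value
```

The statistic should match `chisq.test(observed, correct = FALSE)$statistic`. The argument `correct = FALSE` is needed for this comparison because R applies Yates' continuity correction to 2x2 tables by default.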

Example

Data

For our example, let’s reuse the dataset introduced in the article “Descriptive statistics in R”. This dataset is the well-known iris dataset, slightly enhanced. Since it contains only one categorical variable and the Chi-square test requires two, we add the variable size, which is small if the length of the sepal is smaller than the median of all flowers, and big otherwise:

dat <- iris
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length),
  "small", "big"
)

We now create a contingency table of the two variables Species and size with the table() function:

table(dat$Species, dat$size)
##
##              big small
##   setosa       1    49
##   versicolor  29    21
##   virginica   47     3

The contingency table gives the observed number of cases in each subgroup. For instance, there is only one big setosa flower, while there are 49 small setosa flowers in the dataset.
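Raw counts can be easier to compare across species when they are turned into row proportions. The sketch below rebuilds the same table and uses the base R function prop.table(); the object names `tab` and `row_prop` are my own choice for illustration.

```r
# Recreate the data used in this article
dat <- iris
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length),
  "small", "big"
)

# Contingency table of observed counts
tab <- table(dat$Species, dat$size)

# Row proportions: within each species, the share of big and small flowers
# (margin = 1 means that each row sums to 1)
row_prop <- prop.table(tab, margin = 1)
round(row_prop, 2)
```

If the two variables were independent, the rows of this table would look roughly alike, which is clearly not the case here.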

It is also good practice to draw a barplot to visually represent the data:

library(ggplot2)

ggplot(dat) +
  aes(x = Species, fill = size) +
  geom_bar() +
  scale_fill_hue() +
  theme_minimal()

Chi-square test of independence in R

For this example, we are going to test in R whether there is a relationship between the variables Species and size. For this, the chisq.test() function is used:

test <- chisq.test(table(dat$Species, dat$size))
test
##
##  Pearson's Chi-squared test
## 
## data:  table(dat$Species, dat$size)
## X-squared = 86.035, df = 2, p-value < 2.2e-16

Everything you need appears in this output: the title of the test, which variables were used, the test statistic, the degrees of freedom and the p-value of the test. You can also retrieve the χ2 test statistic and the p-value with:

test$statistic # test statistic
## X-squared 
##  86.03451
test$p.value # p-value
## [1] 2.078944e-19

If you need to find the expected frequencies, use test$expected.
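As a short sketch, the expected frequencies can be inspected and checked against the usual rule of thumb of at least 5 per cell. The code rebuilds the data so that it runs on its own, and the threshold check on the last line is my own addition for illustration.

```r
# Recreate the data and the test used in this article
dat <- iris
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length),
  "small", "big"
)
test <- chisq.test(table(dat$Species, dat$size))

# Expected frequencies if Species and size were independent.
# Each species has 50 flowers, so all rows share the same expected counts:
# 50 * 77 / 150 for big and 50 * 73 / 150 for small
round(test$expected, 2)

# TRUE if all expected frequencies are at least 5
all(test$expected >= 5)
```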

If a warning such as “Chi-squared approximation may be incorrect” appears, it means that at least one expected frequency is lower than 5. To avoid this issue, you can either:

  • gather some levels (especially those with a small number of observations) to increase the number of observations in the subgroups, or
  • use Fisher’s exact test

Fisher’s exact test does not require the assumption of a minimum of 5 expected counts. It can be applied in R with the function fisher.test(). This test is similar to the Chi-square test in terms of hypotheses and interpretation of the results. Learn more about it in the article “Fisher’s exact test in R: independence test for a small sample”.
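Here is a minimal sketch of that situation with a small hypothetical table. The counts and the name `small_tab` are made up for illustration and do not come from the iris example.

```r
# Hypothetical 2x2 table with few observations (illustrative numbers only)
small_tab <- matrix(c(8, 2, 1, 5),
  nrow = 2,
  dimnames = list(group = c("A", "B"), outcome = c("yes", "no"))
)

# chisq.test() would warn here, so the warning is silenced
# just to look at the expected frequencies
expected <- suppressWarnings(chisq.test(small_tab))$expected
expected

# TRUE: at least one expected frequency is below 5
any(expected < 5)

# Fisher's exact test on the same table
fisher <- fisher.test(small_tab)
fisher$p.value
```

The p-value is interpreted in the same way as for the Chi-square test: if it is below the significance level, the null hypothesis of independence is rejected.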

Conclusion and interpretation

From the output and from test$p.value we see that the p-value is less than the significance level of 5%. As with any other statistical test, if the p-value is less than the significance level, we reject the null hypothesis.

⇒ In our context, rejecting the null hypothesis for the Chi-square test of independence means that there is a significant relationship between the species and the size. Therefore, knowing the value of one variable helps to predict the value of the other variable.
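The decision rule can also be written out in code. This is only a sketch: the name `alpha` is my own choice, and the look at the Pearson residuals, that is (observed - expected) / sqrt(expected), which chisq.test() stores in the residuals component, is an optional extra to see which cells depart most from independence.

```r
# Recreate the data and the test used in this article
dat <- iris
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length),
  "small", "big"
)
test <- chisq.test(table(dat$Species, dat$size))

# Significance level of 5%
alpha <- 0.05

# TRUE means that the null hypothesis of independence is rejected
test$p.value < alpha

# Pearson residuals: negative values mean fewer observations than expected
# under independence, positive values mean more
round(test$residuals, 2)
```

In this example, the residual for big setosa flowers is negative and the one for big virginica flowers is positive, in line with the contingency table shown earlier.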

Thanks for reading. I hope the article helped you to perform the Chi-square test of independence in R and interpret its results. If you would like to learn how to do this test by hand and how it works, read the article “Chi-square test of independence by hand”.

