Mann-Whitney-Wilcoxon Test in R

  • Published on 16 Oct 2024
  • In a previous tutorial, I showed how to compare two groups under different scenarios using Student's t-test. With small samples, Student's t-test requires the data to be normally distributed. In this tutorial, I show how to compare two groups when the normality assumption is violated, using the Wilcoxon test.
    The Wilcoxon test is a non-parametric test, meaning that it does not assume the data belong to any particular parametric family of probability distributions. Non-parametric tests have the same objective as their parametric counterparts, with two advantages: they do not require the distributions to be normal, and they are less sensitive to outliers. A Student's t-test, for instance, is only applicable if the data are Gaussian or if the sample size is large enough (a common rule of thumb is n ≥ 30, thanks to the central limit theorem).
    In other cases, a non-parametric test should be used.
    One may wonder why we do not always use a non-parametric test, so that we never have to test for normality. The reason is that non-parametric tests are usually less powerful than the corresponding parametric tests when the normality assumption holds. Therefore, all else being equal, if the data follow a normal distribution, a non-parametric test is less likely to reject the null hypothesis when it is false. It is thus preferable to use the parametric version of a statistical test when its assumptions are met.
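The power argument above can be illustrated with a small simulation. This is only a sketch: the number of simulations, the group size, the effect size and the 5% significance level are all assumptions chosen for illustration, not values from the tutorial.

```r
# Hypothetical simulation settings (assumptions, not taken from the tutorial)
set.seed(123)
n_sims <- 2000   # number of simulated experiments
n      <- 12     # observations per group
shift  <- 1      # true difference in means, in standard deviations

# For each simulated experiment, record whether each test rejects H0 at 5%
reject <- replicate(n_sims, {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = shift)
  c(t_test   = t.test(x, y)$p.value < 0.05,
    wilcoxon = wilcox.test(x, y)$p.value < 0.05)
})

# Share of simulations in which each test rejected H0 (estimated power)
power <- rowMeans(reject)
power
```

Because both groups are drawn from normal distributions and the null hypothesis is false by construction, the two rejection rates estimate the power of each test under the conditions where the t-test is expected to do best.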
    There are actually two versions of the Wilcoxon test:
    The Mann-Whitney-Wilcoxon test (also referred to as the Wilcoxon rank sum test or the Mann-Whitney U test) is performed when the samples are independent, so it is the non-parametric equivalent of Student's t-test for independent samples.
    The Wilcoxon signed-rank test (also sometimes referred to as the Wilcoxon test for paired samples) is performed when the samples are paired/dependent, so it is the non-parametric equivalent of Student's t-test for paired samples.
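In R, both versions are available through `wilcox.test()`, and the `paired` argument selects between them. The sketch below uses invented measurements (`before` and `after` are hypothetical names and values) purely to show the two calls side by side.

```r
# Hypothetical measurements, invented for illustration only
before <- c(12.1, 14.3, 11.8, 15.2, 13.4, 12.9, 14.8, 13.1)
after  <- c(13.0, 15.1, 11.5, 16.4, 14.9, 13.5, 15.9, 13.3)

# Independent samples: Wilcoxon rank sum (Mann-Whitney U) test
rank_sum <- wilcox.test(before, after)

# Paired samples: Wilcoxon signed-rank test on the within-pair differences
signed_rank <- wilcox.test(before, after, paired = TRUE)

# The method element records which version was run
rank_sum$method
signed_rank$method
```

Treating the same two vectors as independent or as paired answers different questions, so the choice should follow from the study design, not from which call gives the smaller p-value.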
    ###########################
    The code
    # Grain yield (GY) of two varieties (VAR), 12 observations each
    dat <- data.frame(
      VAR = as.factor(c(rep("Kora", 12), rep("Dagim", 12))),
      GY = c(19, 18, 9, 17, 8, 7, 16, 19, 20, 9, 11, 18,
             16, 5, 15, 2, 14, 15, 4, 7, 15, 6, 7, 14)
    )
    dat
    # Draw a boxplot of grain yield by variety
    library(ggplot2)
    ggplot(dat) +
    aes(x = VAR, y = GY) +
    geom_boxplot(fill = "#0c4c8a") +
    theme_minimal()
    # Check normality within each variety (Shapiro-Wilk test)
    shapiro.test(subset(dat, VAR == "Kora")$GY)
    shapiro.test(subset(dat, VAR == "Dagim")$GY)
    # Remember that the null and alternative hypotheses of the Wilcoxon test are as follows:
    # H0: the 2 groups are equal in terms of the variable of interest
    # H1: the 2 groups are different in terms of the variable of interest
    # Applied to our research question, we have:
    # H0: grain yields of the Kora and Dagim varieties are equal
    # H1: grain yields of the Kora and Dagim varieties are different
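    # Mann-Whitney-Wilcoxon test for independent samples
    # (GY contains tied values, so R warns that it cannot compute an exact p-value)
    t1 <- wilcox.test(GY ~ VAR, data = dat)
    t1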
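To see where the W statistic printed by `wilcox.test()` comes from, it can be recomputed from the ranks. The following is a minimal sketch using the same data as above; the names `r`, `first`, `n1`, `W_manual` and `W_r` are introduced here only for illustration.

```r
# Same data as in the tutorial
dat <- data.frame(
  VAR = as.factor(c(rep("Kora", 12), rep("Dagim", 12))),
  GY = c(19, 18, 9, 17, 8, 7, 16, 19, 20, 9, 11, 18,
         16, 5, 15, 2, 14, 15, 4, 7, 15, 6, 7, 14)
)

# Rank all 24 observations together; tied values receive the average rank
r <- rank(dat$GY)

# With a formula, wilcox.test() treats the first factor level as group x
first <- levels(dat$VAR)[1]
n1 <- sum(dat$VAR == first)

# W = (sum of the ranks in the first group) - n1 * (n1 + 1) / 2
W_manual <- sum(r[dat$VAR == first]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test()
# (suppressWarnings() hides the warning about ties and exact p-values)
W_r <- suppressWarnings(wilcox.test(GY ~ VAR, data = dat))$statistic
c(manual = W_manual, wilcox_test = unname(W_r))
```

Since factor levels are sorted alphabetically, the first group here is Dagim, so W counts how the Dagim ranks compare with the Kora ranks.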
    # Plot with statistical results for independent samples
    # Load package
    library(ggstatsplot)
    ggbetweenstats(
      data = dat, x = VAR, y = GY,
      plot.type = "box",
      type = "nonparametric",
      centrality.plotting = FALSE
    )
