Welch's t-Test

In this activity we will show you how to use Welch's \( t \)-test to compare two independent samples. First, here are the assumptions for Welch's \( t \)-test:

  1. The first sample of size \( n_1 \) is drawn from a normal population with mean \( \mu_1 \) and variance \( \sigma_1^2 \).

  2. The second sample of size \( n_2 \) is also drawn from a normal population with a mean \( \mu_2 \) and variance \( \sigma_2^2 \).

  3. The two samples are independent.

Unlike the independent samples in the activity Pooled t-Test, we do not assume that the variances of the two parent distributions are equal. That is, we do not assume that \( \sigma_1=\sigma_2 \).

We learned in the activity Distribution of the Differences of the Sample Means Two Random Samples that if two samples of size \( n_1 \) and \( n_2 \) are drawn from two normal distributions, one having mean \( \mu_1 \) and variance \( \sigma_1^2 \), the second having mean \( \mu_2 \) and variance \( \sigma_2^2 \), the distribution of \( \overline{X}_1-\overline{X}_2 \) is a normal distribution having mean \( \mu_1-\mu_2 \) and variance \( \sigma_1^2/n_1+\sigma_2^2/n_2 \). That is: \[ \overline{X}_1-\overline{X}_2 \text{ is } N\left(\mu_1-\mu_2, \sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}\right) \] If we replace \( \sigma_1^2 \) and \( \sigma_2^2 \) with the sample variances \( s_1^2 \) and \( s_2^2 \), respectively, then the \( t \)-statistic becomes: \[ t=\frac{(\overline{X}_1-\overline{X}_2)-(\mu_2-\mu_2)}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}} \] Historically, the difficulty with this model is determining what to use for the degrees of freedom. In 1947, B. L. Welch wrote an article entitled The Generalization of a “Student's” Problem when Several Different Population Variances are Involved. Welch established the following formula for the degrees of freedom for this current \( t \)-test: \[ \text{df}=\frac{\left(\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{\sigma_1^2}{n_1}\right)^2}{n_1-1}+\dfrac{\left(\dfrac{\sigma_2^2}{n_2}\right)^2}{n_2-1}} \]

Reading Activities of Third Grade Children

The following example comes from David Moore and George McCabe's Introduction to the Practice of Statistics, Second Edition.

An educator believes that new directed reading activities in the classroom will help elementary school pupils improve some aspects of their reading ability. She arranges for a third grade class of 21 students to take part in these activities for an 8-week period. A control classroom of 23 third graders follows the same curriculum without the activities. At the end of 8 weeks, all students are given a Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve.

Here are the reading scores for both the treatment group and the control group. \[ \begin{array}{cc|cc} \text{Treatement} & \text{Group} & \text{Control} & \text{Group}\\\hline 24 & 56 & 42 & 46\\ 43 & 59 & 43 & 10\\ 58 & 52 & 55 & 17\\ 71 & 62 & 26 & 60\\ 43 & 54 & 62 & 53\\ 49 & 57 & 37 & 42\\ 61 & 33 & 33 & 37\\ 44 & 46 & 41 & 42\\ 67 & 43 & 19 & 55\\ 49 & 57 & 54 & 28\\ 53 & & 20 & 48\\ & & 85 \end{array} \]

Next, let's state the hypotheses for the test.

Now, let's let \( \mu_1 \) represent the mean reading score of the treatment group and \( \mu_2 \) represent the mean reading score of the control group and write the hypotheses in symbolic format.

If we subtract \( \mu_2 \) from both sides of the symbolic hypotheses, then the hypotheses are prepared for Welch's \( t \)-test.

Note that this is a one-tailed test whose direction of extreme is to the right.

Testing Assumptions of the Test

We begin by testing whether the treatment group and the control group were drawn from normal populations. First, enter the data.


Let's draw some qqnorm plots.

qqnorm(treatment,main="Treatment Group")
qqnorm(control,main="Control Group")

plot of chunk unnamed-chunk-2


The evidence looks good. Both groups seem to be drawn from normal populations. Now, let's draw a boxplot to do some more comparison.


plot of chunk unnamed-chunk-3

Note that the median score of the Treatment Group seems higher than the median score of the Control Group. Furthermore, the inner quartile range (IQR) of the Treatment Group seems less than the inner quartile range of the Control Group. Also, the whiskers of the control group extend further in both directions than the whiskers of the Treatment Group, leading one to believe that the sample deviation of the Control Group is larger than the sample deviation of the Treatment Group.

Let's load the psych package again and use its describe command to compare the sample deviations.

##   var  n  mean    sd median trimmed   mad min max range  skew kurtosis  se
## 1   1 21 51.48 11.01     53   52.12 10.38  24  71    47 -0.54     0.04 2.4
##   var  n  mean    sd median trimmed   mad min max range skew kurtosis   se
## 1   1 23 41.52 17.15     42   41.11 17.79  10  85    75 0.27    -0.04 3.58

Sure enough, the standard deviation of the Treatment Group is 11.01 and the standard deviation of the Control Group is 17.15. They are different. Because the pooled \( t \)-test depends on these being the same, in this example we will use Welch's \( t \)-test to make a decision on our hypotheses.

Performing Welch's t-Test

First, we store the data we need for the \( t \)-statistic and Welch's degrees of freedom formula.


Because the null hypothesis assumes that there is no difference in the population means, \( \mu_1-\mu_2=0 \) . Thus, the \( t \)-statistic formula becomes: \[ \begin{align*} t&=\frac{(\overline{X}_1-\overline{X}_2)-(\mu_2-\mu_2)}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}\\ t&=\frac{\overline{X}_1-\overline{X}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}\\ \end{align*} \]

We can now use R to compute the \( t \)-statistic as follows.

## [1] 2.311

Next, we use Welch's formula to calculate the degrees of freedom for the \( t \)-distribution. Note that the formula \[ \text{df}=\frac{\left(\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{\sigma_1^2}{n_1}\right)^2}{n_1-1}+\dfrac{\left(\dfrac{\sigma_2^2}{n_2}\right)^2}{n_2-1}} \] will be extremely difficult to enter. However, if we let \[ A=\frac{s_1^2}{n1}\qquad\text{and}\qquad B=\frac{s_2^2}{n_2}, \] then Welch's formula becomes \[ \text{df}=\frac{(A+B)^2}{\dfrac{A^2}{n_1-1}+\dfrac{B^2}{n_2-1}}, \] which will be a bit easier to enter. So, let's begin by entering \( A \) and \( B \) in R.


Now enter the degrees of freedom formula that has been adjusted with the variables \( A \) and \( B \).

## [1] 37.86

Notice that the degrees of freedom is not a whole number.

Calculating the \( p \)-Value

We now state the definition of the \( p \)-value, then translate it symbolically using the \( t \)-statistic. The alternative hypothesis was \( H_1 \): \( \mu_1-\mu_2>0 \) , so the direction of extreme is to the right. \[ \begin{align*} p\text{-value} &=P(\text{Observed Value or more extreme under $H_0$})\\ &=P(t\ge 2.311) \end{align*} \] Next, we draw the distribution and shade the region representing the \( t \)-value.

text(2.311,.25,"df = 37.86")

plot of chunk unnamed-chunk-9

We can now use R's pt(t,df) command to compute the \( p \)-value. Remember that the command returns the area to the left, but we want the area shaded to the right. Hence, we'll have to subtract our answer from 1.

## [1] 0.01319

Hence, the \( p \)-value is 0.01319.

Technology Helps Us Avoid Cumbersome Computations

The formula for the \( t \)-statistic and Welch's formula for the degrees of freedom are difficult to remember and they are hard to enter and arrive at the correct answer. As the statistics tests become more difficult, statisticians begin to depend more and more on technology. In this case, it is far easier to use R's t.test command to determine the \( p \)-value for the test of the hypotheses.

Typing ?t.test in the console window brings up the help file for the t.test command. Here is the important piece from the help file.

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

We're not using a paired \( t \)-test, so we won't change the default value of the argument paired = FALSE. We are not assuming the variance of our two groups are equal, so we won't change the default value of the argument var.equal = FALSE. Our null hypotheses is \( H_0 \): \( \mu_1-\mu_2=0 \) , so we won't change the default value of the argument mu = 0. We're not doing a confidence interval, so we will also ignore that argument. However, the direction of extreme is to the right, so we will enter the argument alternative = “greater”.

##  Welch Two Sample t-test
## data:  treatment and control
## t = 2.311, df = 37.85, p-value = 0.01319
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.691   Inf
## sample estimates:
## mean of x mean of y 
##     51.48     41.52

Note that the output of R's t.test command produces the same \( t \)-statistic, nearly the same degrees of freedom (roundoff error?), and the same \( p \)-value with a minimal amount of work.