What statistical technique can be used when comparing before/after data of a group?
This post was written with reference to Primer of Biostatistics, 7th edition by S. Glantz.
Prerequisites
To better understand this post, it is recommended to know the following topics:
Motivation
In a previous post, Meaning of tvalue and Student’s ttest, we introduced the ttest as one of the statistical techniques for comparing two sample groups. The ttest we introduced is also called the independent sample ttest, because it compares two sample groups that are independently drawn.
However, in some experiments, there may be a need to compare before/after data of the same subject. For example, experiments on the effectiveness of diet pills or smoking cessation may require this. If it is possible to statistically compare before/after data of the same subject, it would be possible to verify the effects of treatment more sensitively by confirming the effects for each individual.
In this post, we introduce the paired ttest as a parametric statistical technique that allows for the statistical comparison of before/after data of the same subjects.
Explanation of paired ttest
In the Motivation section, we mentioned that if it is possible to compare before/after data of the same subject for a given treatment, it would be more effective in verifying the effect of the treatment. Let’s take a closer look at what this means with an example illustrated in Figure 1, which shows the results of an experiment to verify the effect of a diuretic drug (a drug that increases urine output).
Figure 1. Results of an experiment to observe the treatment effect of a diuretic drug
This figure was created with reference to Figure 91 in Primer of biostatistics, 7th ed., by S. Glantz.
If you look at Figure 1, you can easily see the difference between the left (A) picture and the right (B) picture. The difference is whether the line is crossed out or not. The left (A) picture assumes that the placebo and drug groups are randomly sampled groups that are independent of each other. In contrast, the right (B) picture assumes that the experiment was conducted by administering a placebo and a drug to the same group of subjects.
Interpretation with independent sample ttest
Let’s conduct a ttest in the situation of the left (A) in Figure 1.
The means of the two groups are 994.4 and 1046.6, respectively. The standard deviations of each group are 113.0046 and 132.7983, respectively. If we calculate the tvalue, it is as follows^{1}:
\[t=\frac{\bar{X}_1  \bar{X}_2}{\sqrt{s_1^2/{n_1^2} + s_2^2/{n_2^2}}}=0.9467\]Here, both $n_1$ and $n_2$ are 10.
The degree of freedom is $n_1+n_22=18$. If we conduct a twotailed ttest, the criterion for a significant tvalue is $t_{0.05}=2.101$. Therefore, we can say that the tvalue given to us, 0.9467, is far from reaching 2.101. In other words, if statistical analysis is conducted in this way, we can only conclude that this medicine does not work as a diuretic.
Interpretation with paired sample ttest
However, let’s carefully look at the right(B) situation in Figure 1 to see what the experimental results look like. We can see that only 2 of the 10 subjects decreased urine output after taking the medicine, while 8 subjects had increased urine output. In terms of the individual effect of treatment that was applied, we can conclude that this medicine clearly works as a diuretic.
Therefore, let’s make a statistic about the difference before and after, which can be seen in the right (B) of Figure 1.
As mentioned in the meaning of tvalue and Student’s ttest, the fundamental meaning of tvalue is “difference / uncertainty”. In brief, we explained that since the sample mean always involves errors, a measure of the difference in mean values should be made while considering the error that occurs. Therefore, we can also think of the tvalue as follows.
\[\text{(paired test) }t = \frac{\text{mean difference}}{\text{standard error of the mean difference}}\]The mean difference can be denoted as $\bar{d}$ from the meaning of difference, and we can express the tvalue as follows.
\[t=\frac{\bar{d}}{s_{\bar{d}}}\]Here, the standard error for $\bar{d}$ is $s_{\bar{d}}=s_d/\sqrt{n}$.
If we think about it, both the independent ttest and the paired sample ttest divide the difference into uncertainty in the same logic.
Let’s solve the problem by considering the paired ttest.
The following are the daily urine volumes for the placebo and medication groups:
\[\text{Placebo}=[788, 859, 928, 957, 994, 1034, 1049, 1078, 1110, 1147]\] \[\text{Medication}=[849, 879, 910, 1019, 1145, 1003, 1114, 1162, 1201, 1184]\]Therefore, the difference between placebo and medication is as follows:
\[d = \text{Placebo}\text{Medication}=[61, 20, 18, 62, 151, 31, 65, 84, 91, 37]\]Thus, $\bar{d}$ is the mean of $d$, which is:
\[\bar{d}=52.2\]$s_{\bar{d}}$ can be calculated as $s_{\bar{d}}=s_d/\sqrt{n}$, where $s_d$ is the standard deviation of $d$, hence:
\[s_{\bar{d}}=\frac{s_d}{\sqrt{n}}=\frac{53.4723}{\sqrt{10}}=16.9094\]Therefore, the tvalue is:
\[\text{(paired test) }t=\frac{52.2}{16.9094}=3.0870\]This value is smaller than the criterion tvalue for a significant difference, $t_{0.05}=2.2622$ (df=9, twotailed), so we can conclude that there is a significant difference in urine output between the placebo and medication conditions.
Reference
 Primer of biostatistics, 7th ed., S. Glantz / Ch. 9 Experiments when each subject receives more than one treatment

Refer to the meaning of tvalue and Student’s ttest for the method of calculating tvalue. ↩