# t-Test for Two Samples: Independent and Overlapping

## Two-sample t-tests for a difference in means may involve independent samples, paired samples, or overlapping samples.

#### Key Points

• To test the null hypothesis of no difference in means, the observed t-statistic is computed as the difference between the two sample means divided by the standard error of that difference.

• The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.

• An overlapping samples t-test is used when there are paired samples with data missing in one or the other sample.

#### Terms

• null hypothesis: A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

• blocking: A schedule for conducting treatment combinations in an experimental study such that any effects on the experimental results due to a known change in raw materials, operators, machines, etc., become concentrated in the levels of the blocking variable.

#### Figures

Figure 1. Medical Treatment Research: medical experiments that compare a treatment group with a control group can be analyzed with an independent two-sample t-test.

The two-sample t-test is used to compare the means of two independent samples. To test the null hypothesis of no difference, the observed t-statistic is the difference between the two sample means divided by the standard error of that difference. If the two population variances can be assumed equal, the standard error is estimated from the pooled variance, a weighted average of the two sample variances about their own means. If the variances cannot be assumed equal, the standard error of the difference between means is taken as the square root of the sum of each sample variance divided by its sample size. In the latter case the t-statistic must either be tested with modified degrees of freedom (the Welch-Satterthwaite approximation) or be tested against different critical values. A weighted t-test must be used if the unit of analysis comprises percentages or means based on different sample sizes.
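The two standard errors described above can be written out as follows. The notation is an assumption introduced here, not taken from the original text: $\bar{x}_i$, $s_i^2$ and $n_i$ are the mean, sample variance and size of sample $i$, $s_p^2$ is the pooled variance, and $\nu$ is the modified degrees of freedom.

```latex
% Equal variances assumed: pooled-variance t-statistic
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2

% Equal variances not assumed: Welch t-statistic with
% Welch-Satterthwaite degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx
\frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
     {\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}
```

When $n_1 = n_2$ the two t-statistics coincide and only the degrees of freedom differ.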

The two-sample t-test is probably the most widely used (and misused) statistical test. Comparing means based on convenience sampling or non-random allocation is meaningless. If, for any reason, one is forced to use haphazard rather than probability sampling, then every effort must be made to minimize selection bias.

## Unpaired and Overlapping Two-Sample T-Tests

Two-sample t-tests for a difference in means involve independent samples, paired samples, and overlapping samples. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.
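The gain in power from pairing can be illustrated with a small sketch. The measurements below are made-up numbers chosen so that subjects differ a lot from each other (the "noise factor") while the change within each subject is small and consistent. The function names `paired_t` and `unpaired_t` are hypothetical helpers, not part of any library.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for the mean of the within-pair differences second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / se, n - 1


def unpaired_t(first, second):
    """Return (t, df) for second - first using the pooled-variance two-sample test."""
    n1, n2 = len(first), len(second)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(first)
              + (n2 - 1) * statistics.variance(second)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.fmean(second) - statistics.fmean(first)) / se, df


# Made-up data: the same five subjects measured before and after a treatment.
before = [120, 135, 150, 165, 180]
after = [118, 131, 147, 161, 177]

t_paired, df_paired = paired_t(before, after)
t_unpaired, df_unpaired = unpaired_t(before, after)
print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
```

Because the between-subject spread cancels out of the differences, the paired statistic is far larger in magnitude than the unpaired one on these data, even though it has fewer degrees of freedom.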

### Independent Samples

The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test (Figure 1).
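The randomized design above can be sketched in a few lines using only the Python standard library. The outcome values are simulated, made-up numbers (a normal distribution with an assumed treatment shift of 1.0), and `two_sample_t` is a hypothetical helper name. The sketch returns the t-statistic and degrees of freedom only; turning them into a p-value needs a t-distribution table or a statistics package.

```python
import math
import random
import statistics


def two_sample_t(a, b, equal_var=True):
    """Return (t, df) for the difference in means of two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    if equal_var:
        # Pooled variance: sample variances weighted by their degrees of freedom.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch form: each variance divided by its own sample size,
        # with Welch-Satterthwaite degrees of freedom.
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, df


# Enroll 100 subjects and randomize 50 to each group.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
subjects = list(range(100))
rng.shuffle(subjects)
treatment_ids = set(subjects[:50])

# Simulated outcomes (assumption: normal noise, treatment shifts the mean by 1.0).
treatment = [rng.gauss(11.0, 2.0) for s in subjects if s in treatment_ids]
control = [rng.gauss(10.0, 2.0) for s in subjects if s not in treatment_ids]

t_pooled, df_pooled = two_sample_t(treatment, control, equal_var=True)
t_welch, df_welch = two_sample_t(treatment, control, equal_var=False)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}")
```

With equal group sizes, as here, the pooled and Welch statistics are identical and only the degrees of freedom differ.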

### Overlapping Samples

An overlapping samples t-test is used when there are paired samples with data missing in one or the other sample (e.g., due to selection of "I don't know" options in questionnaires, or because respondents are randomly assigned to a subset of the questions). These tests are widely used in commercial survey research (e.g., by polling companies) and are available in many standard crosstab software packages.
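One way to see how the overlap enters the calculation is through the variance of a difference of means: when some respondents contribute to both means, the covariance between their two answers reduces the standard error. The sketch below is a minimal illustration under that assumption, not the method of any particular survey package. The helper name `overlapping_t`, the use of `None` for a missing answer, and the example ratings are all hypothetical, and the choice of degrees of freedom is deliberately left out.

```python
import math
import statistics


def overlapping_t(x, y):
    """Return the t-statistic for mean(x) - mean(y) with partially paired data.

    x and y are equal-length lists aligned by respondent; None marks a
    missing answer. Assumes at least two answers per question.
    """
    xs = [v for v in x if v is not None]
    ys = [v for v in y if v is not None]
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n1, n2, nc = len(xs), len(ys), len(pairs)

    # Covariance is estimated from the respondents who answered both questions.
    cov = 0.0
    if nc >= 2:
        mx = statistics.fmean([a for a, _ in pairs])
        my = statistics.fmean([b for _, b in pairs])
        cov = sum((a - mx) * (b - my) for a, b in pairs) / (nc - 1)

    # Var(mean_x - mean_y) = s1^2/n1 + s2^2/n2 - 2 * nc * cov / (n1 * n2)
    se_sq = (statistics.variance(xs) / n1 + statistics.variance(ys) / n2
             - 2 * nc * cov / (n1 * n2))
    if se_sq <= 0:
        raise ValueError("estimated variance of the difference is not positive")
    return (statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(se_sq)


# Made-up ratings: some respondents skipped one of the two questions.
q1 = [4, 5, None, 3, 4, 2, None, 5]
q2 = [3, None, 4, 2, 4, 1, 3, None]
print(f"t = {overlapping_t(q1, q2):.3f}")
```

Under this formula the statistic reduces to the paired t-statistic when every respondent answers both questions, and to the unequal-variance two-sample statistic when no respondent does.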

#### Key Term Glossary

bias
(Uncountable) Inclination towards something; predisposition, partiality, prejudice, preference, predilection.

Bias
A lack of impartiality. Biased sources favour a particular point of view.

blocking
A schedule for conducting treatment combinations in an experimental study such that any effects on the experimental results due to a known change in raw materials, operators, machines, etc., become concentrated in the levels of the blocking variable.

confounding
Describes a phenomenon in which an extraneous variable in a statistical model correlates (positively or negatively) with both the dependent variable and the independent variable. The noun form is "confounder".

control
A separate group or subject in an experiment against which the results are compared, where the primary variable is low or nonexistent.

control group
The group of test subjects left untreated or unexposed to some procedure and then compared with treated subjects in order to validate the results of the test.

critical value
The value of the test statistic that marks the boundary of the rejection region at a given significance level.

data
Pieces of information; numbers with a context.

datum
A measurement of something on a scale understood by both the recorder (a person or device) and the reader (another person or device).

degrees of freedom
The number of values in the calculation of a statistic that are free to vary.

error
The difference between a population parameter and the calculated sample statistic.

factor
The explanatory, or independent, variable in an experiment.

haphazard
Random; chaotic; incomplete; not thorough, constant, or consistent.

independent
Not dependent; not contingent or depending on something else; free.

independent sample
Two samples are independent when they are drawn from two different populations and the samples have no effect on each other.

mean
One measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution.

null hypothesis
A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

observational study
A study drawing inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator.

population
A group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn.

probability
The relative likelihood of an event happening.

random
Determined by chance; in sampling, selected by a chance mechanism rather than by choice or convenience.

sample
A subset of a population selected for measurement, observation, or questioning to provide statistical information about the population.

sample mean
The mean of a sample of random variables taken from the entire population of those variables.

sampling
The process or technique of obtaining a representative sample.

standard error
The standard deviation of the sampling distribution of a statistic, such as a sample mean; it measures how much the statistic varies from sample to sample.

statistics
A mathematical science concerned with data collection, presentation, analysis, and interpretation.

Statistics
The study of the collection, organization, analysis, interpretation, and presentation of data.

variance
A measure of how far a set of numbers is spread out, computed as the average squared deviation from the mean.