# Cohen's d

## Cohen's d is a method of estimating effect size in a t-test, based on the difference between (or among) means.

#### Key Points

• An effect size is a measure of the strength of a phenomenon (for example, the relationship between two variables in a statistical population) or a sample-based estimate of that quantity.

• An effect size calculated from data is a descriptive statistic that conveys the estimated magnitude of a relationship without making any statement about whether the apparent relationship in the data reflects a true relationship in the population.

• Cohen's d is an example of a standardized measure of effect. Standardized measures are used when the metrics of variables do not have intrinsic meaning, when results from multiple studies are being combined, when the studies use different scales, or when effect size is conveyed relative to the variability in the population.

• As in any statistical setting, effect sizes are estimated with error, and may be biased unless the effect size estimator that is used is appropriate for the manner in which the data were sampled and the manner in which the measurements were made.

• Cohen's d is defined as the difference between two means divided by a standard deviation for the data: $d = \frac{\bar{x}_1 - \bar{x}_2}{\sigma}$.

#### Terms

• **Cohen's d**: a measure of effect size indicating the amount of difference between two groups on a construct of interest, in standard deviation units.

• **p-value**: the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.

#### Figures

1. ##### Pooled Standard Deviation for Cohen's d

This formula calculates a pooled standard deviation with two independent samples.

2. ##### Cohen's d

Plots of the densities of Gaussian distributions illustrating different values of Cohen's d.

Cohen's d is a method of estimating effect size in a t-test based on means or distances between/among means (Figure 2). An effect size is a measure of the strength of a phenomenon—for example, the relationship between two variables in a statistical population (or a sample-based estimate of that quantity). An effect size calculated from data is a descriptive statistic that conveys the estimated magnitude of a relationship without making any statement about whether the apparent relationship in the data reflects a true relationship in the population. In that way, effect sizes complement inferential statistics such as p-values. Among other uses, effect size measures play an important role in meta-analysis studies that summarize findings from a specific area of research, and in statistical power analyses.

The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is an indicator of the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. These are both examples of "absolute effect sizes," meaning that they convey the average difference between two groups without any discussion of the variability within the groups.

Reporting effect sizes is considered good practice when presenting empirical research findings in many fields. The reporting of effect sizes facilitates the interpretation of the substantive, as opposed to the statistical, significance of a research result. Effect sizes are particularly prominent in social and medical research.

Cohen's d is an example of a standardized measure of effect. Standardized effect size measures are typically used when the metrics of variables being studied do not have intrinsic meaning (e.g., a score on a personality test on an arbitrary scale), when results from multiple studies are being combined, when some or all of the studies use different scales, or when it is desired to convey the size of an effect relative to the variability in the population. In meta-analysis, standardized effect sizes are used as a common measure that can be calculated for different studies and then combined into an overall summary.

As in any statistical setting, effect sizes are estimated with error, and may be biased unless the effect size estimator that is used is appropriate for the manner in which the data were sampled and the manner in which the measurements were made. An example of this is publication bias, which occurs when scientists only report results when the estimated effect sizes are large or are statistically significant. As a result, if many researchers are carrying out studies under low statistical power, the reported results are biased to be stronger than true effects, if any.
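This "winner's curse" can be illustrated with a small simulation, using only the standard library. The parameters below (a true d of 0.2, ten subjects per group, 2,000 studies) are made-up illustrative values, not from the text; significance is judged against the two-sided 5% critical t value for 18 degrees of freedom:

```python
import math
import random

random.seed(1)

TRUE_D = 0.2    # assumed true standardized effect in the population
N = 10          # per-group sample size (a deliberately low-powered study)
T_CRIT = 2.101  # two-sided 5% critical value for t with df = 18

def one_study():
    """Simulate one two-sample study; return its observed Cohen's d
    and whether its t-test comes out 'significant'."""
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(TRUE_D, 1) for _ in range(N)]
    ma, mb = sum(a) / N, sum(b) / N
    va = sum((x - ma) ** 2 for x in a) / (N - 1)
    vb = sum((x - mb) ** 2 for x in b) / (N - 1)
    s_pooled = math.sqrt(((N - 1) * va + (N - 1) * vb) / (2 * N - 2))
    d = (mb - ma) / s_pooled
    t = d * math.sqrt(N / 2)  # t statistic for two equal-sized groups
    return d, abs(t) > T_CRIT

studies = [one_study() for _ in range(2000)]
reported = [d for d, significant in studies if significant]
print(f"significant studies: {len(reported)} of {len(studies)}")
print(f"mean reported d: {sum(reported) / len(reported):.2f} vs true d {TRUE_D}")
```

Because only studies whose observed |t| clears the significance threshold are "reported," the average reported effect is several times larger than the true effect of 0.2.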

## Relationship to Test Statistics

Sample-based effect sizes are distinguished from test statistics used in hypothesis testing in that they estimate the strength of an apparent relationship, rather than assigning a significance level reflecting whether the relationship could be due to chance. The effect size does not determine the significance level, or vice-versa. Given a sufficiently large sample size, a statistical comparison will always show a significant difference unless the population effect size is exactly zero. For example, a sample Pearson correlation coefficient of 0.1 is strongly statistically significant if the sample size is 1,000. Reporting only the significant p-value from this analysis could be misleading if a correlation of 0.1 is too small to be of interest in a particular application.
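The r = 0.1, n = 1,000 example can be checked with a short calculation. The sketch below uses the standard t statistic for a correlation coefficient, t = r√(n−2)/√(1−r²), and approximates the two-sided p-value with the normal distribution (adequate here, since the degrees of freedom are large):

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0, via the t statistic
    t = r * sqrt(n - 2) / sqrt(1 - r^2). With large df, the t distribution
    is close enough to standard normal for this illustration."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return math.erfc(abs(t) / math.sqrt(2))  # 2 * (1 - Phi(|t|))

# A tiny correlation of 0.1 is highly significant with n = 1000...
print(correlation_p_value(0.1, 1000))  # well below 0.01
# ...but nowhere near significant with n = 50.
print(correlation_p_value(0.1, 50))
```

The effect size (0.1) is identical in both calls; only the sample size changes the p-value, which is exactly the distinction the paragraph above draws.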

## Cohen's d

Cohen's d is defined as the difference between two means divided by a standard deviation for the data:

$d = \frac{\bar{x}_1 - \bar{x}_2}{\sigma}$

Cohen's d is frequently used in estimating required sample sizes: a smaller Cohen's d implies that a larger sample is needed, and vice versa. The required sample size can then be determined from the effect size together with the desired significance level and statistical power.
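This relationship can be sketched with the usual normal-approximation formula for a two-sided, two-sample t-test, n ≈ 2·((z₁₋α/₂ + z_power)/d)², computed here with the standard library's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample t-test,
    via the normal approximation n ~ 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Smaller effects demand much larger samples: halving d quadruples n.
for d in (0.8, 0.5, 0.2):
    print(d, sample_size_per_group(d))
```

At the conventional 5% significance level and 80% power, a "medium" effect of d = 0.5 needs roughly 63 subjects per group, while a "small" effect of d = 0.2 needs several hundred.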

The precise definition of the standard deviation was not originally made explicit by Jacob Cohen; he defined it (using the symbol σ) as "the standard deviation of either population," since the two are assumed equal. Other authors make the computation more explicit, defining a pooled standard deviation for two independent samples (Figure 1).
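A common form of the pooled standard deviation for two independent samples (the quantity Figure 1 depicts) is

$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

A minimal sketch of the full computation, with two made-up illustrative samples:

```python
import math

def cohens_d(sample1, sample2):
    """Cohen's d for two independent samples, using the pooled standard
    deviation s = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2))."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)  # unbiased variance
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

# Two small illustrative samples (made-up numbers):
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1]
print(round(cohens_d(group_a, group_b), 2))
```

Note that d is a signed quantity: swapping the two groups flips its sign, so the difference is conventionally taken in whichever direction the research question specifies.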

#### Key Term Glossary

• **average**: any measure of central tendency, especially any mean, the median, or the mode
• **bias**: a lack of impartiality; an inclination or predisposition toward a particular point of view
• **biased**: prejudiced; favouring a particular point of view
• **correlation**: one of the several measures of the linear statistical relationship between two random variables, indicating both the strength and direction of the relationship
• **correlation coefficient**: any of the several measures indicating the strength and direction of a linear relationship between two random variables
• **data**: pieces of information; numbers with a context
• **datum**: a measurement of something on a scale understood by both the recorder (a person or device) and the reader (another person or device)
• **descriptive statistics**: the discipline of quantitatively describing the main features of a collection of data (e.g., via the arithmetic mean, median, and mode); unlike inferential statistics, descriptive statistics aim to summarize a sample rather than use the data to learn about the population the sample is thought to represent
• **deviation**: for interval and ratio variables, a measure of difference between the observed value and the mean
• **empirical**: verifiable by means of scientific experimentation
• **error**: the difference between the population parameter and the calculated sample statistic
• **independent**: not dependent; not contingent or depending on something else
• **independent samples**: samples drawn from two different populations that have no effect on each other
• **inferential statistics**: a branch of mathematics that involves drawing conclusions about a population based on sample data drawn from it
• **level**: the specific value of a factor in an experiment
• **mean**: a measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution
• **population**: a group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn
• **sample**: a subset of a population selected for measurement, observation, or questioning to provide statistical information about the population
• **significance level**: a measure of how likely it is to draw a false conclusion in a statistical test when the results are really just random variations
• **standard deviation**: a measure of how much variation or dispersion exists from the average (mean); defined as the square root of the variance
• **statistical power**: the probability that a statistical test will reject a false null hypothesis, i.e., that it will not make a type II error (a false negative)
• **statistical significance**: a measure of how unlikely it is that a result has occurred by chance
• **statistics**: the mathematical science concerned with the collection, organization, analysis, interpretation, and presentation of data
• **variable**: a quantity that may assume any one of a set of values