Coefficient of Determination
The coefficient of determination provides a measure of how well observed outcomes are replicated by a model.
Learning Objective

Interpret the properties of the coefficient of determination in regard to correlation.
Key Points
 The coefficient of determination, $r^2$, is a statistic whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information.
 The most general definition of the coefficient of determination is $r^2 = 1 - \frac{SS_\text{err}}{SS_\text{tot}}$, where $SS_\text{err}$ is the residual sum of squares and $SS_\text{tot}$ is the total sum of squares.
 $r^2$, when expressed as a percent, represents the percent of variation in the dependent variable $y$ that can be explained by variation in the independent variable $x$ using the regression (best fit) line.
 $1 - r^2$, when expressed as a percent, represents the percent of variation in $y$ that is NOT explained by variation in $x$ using the regression line. This can be seen as the scattering of the observed data points about the regression line.
Terms

regression
An analytic method to measure the association of one or more independent variables with a dependent variable.

correlation coefficient
Any of the several measures indicating the strength and direction of a linear relationship between two random variables.
Full Text
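The coefficient of determination (denoted $r^2$) is a statistic whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information. It provides a measure of how well observed outcomes are replicated by a model.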
The Math
A data set will have observed values and modelled values, sometimes known as predicted values. The "variability" of the data set is measured through different sums of squares, such as:
 the total sum of squares (proportional to the sample variance);
 the regression sum of squares (also called the explained sum of squares); and
 the sum of squares of residuals, also called the residual sum of squares.
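As a sketch of what these three quantities look like in symbols, the block below writes them out. The notation is an assumption, not taken from the text: $y_i$ stands for the observed values, $f_i$ for the modelled (predicted) values, $\bar{y}$ for the mean of the observed values, and $SS_\text{reg}$ for the regression sum of squares.

```latex
\begin{align*}
SS_\text{tot} &= \sum_{i} (y_i - \bar{y})^2 && \text{(total sum of squares)} \\
SS_\text{reg} &= \sum_{i} (f_i - \bar{y})^2 && \text{(regression, or explained, sum of squares)} \\
SS_\text{err} &= \sum_{i} (y_i - f_i)^2     && \text{(residual sum of squares)}
\end{align*}
```

Dividing $SS_\text{tot}$ by $n - 1$ gives the sample variance of the observed values, which is the sense in which the total sum of squares is proportional to it.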
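The most general definition of the coefficient of determination is:
$r^2 = 1 - \frac{SS_\text{err}}{SS_\text{tot}}$
where $SS_\text{err}$ is the residual sum of squares and $SS_\text{tot}$ is the total sum of squares.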
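The definition $r^2 = 1 - SS_\text{err} / SS_\text{tot}$ can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `r_squared` and the sample values are made up for the example.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_err / SS_tot for paired observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observed values about their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Residual sum of squares: spread of the observed values about the predictions.
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("SS_tot is zero: the observed values are all identical")
    return 1 - ss_err / ss_tot


# Made-up illustrative values (assumption, not data from the text).
observed = [2.0, 4.0, 6.0, 8.0]
perfect = [2.0, 4.0, 6.0, 8.0]    # predictions that match exactly
mean_only = [5.0, 5.0, 5.0, 5.0]  # always predict the mean of the observations

print(r_squared(observed, perfect))    # 1.0
print(r_squared(observed, mean_only))  # 0.0
```

The two calls mark the reference points of the scale: predictions that reproduce the data exactly give 1, and predictions that do no better than the mean give 0.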
Properties and Interpretation of $r^2$
For a regression line fitted to one independent variable, the coefficient of determination is the square of the correlation coefficient $r$. It is usually stated as a percent, rather than in decimal form. In the context of data:
 $r^2$, when expressed as a percent, represents the percent of variation in the dependent variable $y$ that can be explained by variation in the independent variable $x$ using the regression (best fit) line.
 $1 - r^2$, when expressed as a percent, represents the percent of variation in $y$ that is NOT explained by variation in $x$ using the regression line. This can be seen as the scattering of the observed data points about the regression line.
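To make the link with the correlation coefficient concrete, the following Python sketch fits a least-squares line to a small made-up data set and computes $r^2$ two ways: from the sums of squares and by squaring $r$. The helper name `summarize_fit` and the data are assumptions chosen for illustration.

```python
from math import sqrt


def summarize_fit(xs, ys):
    """Fit a least-squares line; return (r, r2) with r2 taken from sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # this is SS_tot
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Correlation coefficient r.
    r = sxy / sqrt(sxx * syy)

    # Least-squares (best fit) line and its residual sum of squares.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_err = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    r2 = 1 - ss_err / syy
    return r, r2


# Made-up illustrative data (assumption, not data from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, r2 = summarize_fit(xs, ys)

print(f"r = {r:.4f}")
print(f"r^2 from sums of squares = {r2:.4f}, r squared directly = {r * r:.4f}")
print(f"explained: {r2:.0%}, not explained: {1 - r2:.0%}")
```

For this data the two values of $r^2$ agree at 0.6, so about 60% of the variation in $y$ is explained by the line and about 40% is not.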
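So $r^2$ is a statistic that gives some information about the goodness of fit of a model. An $r^2$ of 1 indicates that the regression line fits the data perfectly.
In many (but not all) instances where $r^2$ is used, the predictors are calculated by ordinary least-squares regression, that is, by minimizing $SS_\text{err}$.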
Note that $r^2$ does not indicate whether:
 the independent variables are a cause of the changes in the dependent variable;
 omitted-variable bias exists;
 the correct regression was used;
 the most appropriate set of independent variables has been chosen;
 there is collinearity present in the data on the explanatory variables; or
 the model might be improved by using transformed versions of the existing set of independent variables.
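Two of the points above, whether the correct regression was used and whether transformed variables would improve the model, can be illustrated with a small Python sketch. It fits a straight line to data that are exactly quadratic. The data and the helper name `linear_fit` are assumptions made up for this example.

```python
def linear_fit(xs, ys):
    """Fit a least-squares line; return (r2, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus value predicted by the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_err = sum(e ** 2 for e in residuals)
    return 1 - ss_err / ss_tot, residuals


# Made-up data following y = x^2 exactly (assumption for illustration).
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r2, residuals = linear_fit(xs, ys)
print(f"r^2 of the straight-line fit: {r2:.3f}")
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```

The straight line reaches an $r^2$ of roughly 0.95 even though the relationship is not linear. The residuals are positive at both ends and negative in the middle, a systematic pattern that $r^2$ alone does not reveal.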
Example
Consider the third exam/final exam example introduced in the previous section. The correlation coefficient is
The interpretation of