Coefficient of Determination
The coefficient of determination provides a measure of how well observed outcomes are replicated by a model.
Learning Objectives

Interpret the properties of the coefficient of determination in regard to correlation.

Compute the coefficient of determination.
Key Points
 The coefficient of determination, r^{2}, is a statistic used with statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information.
 The most general definition of the coefficient of determination is r^{2} = 1 - SS_{err} / SS_{tot}, where SS_{err} is the residual sum of squares and SS_{tot} is the total sum of squares.
 r^{2}, when expressed as a percent, represents the percent of variation in the dependent variable y that can be explained by variation in the independent variable x using the regression (best fit) line.
 1 - r^{2}, when expressed as a percent, represents the percent of variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
Terms

correlation coefficient
Any of the several measures indicating the strength and direction of a linear relationship between two random variables.

regression
An analytic method to measure the association of one or more independent variables with a dependent variable.
Full Text
The coefficient of determination (denoted r^{2} and pronounced "r squared") is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information. It measures how well observed outcomes are replicated by the model, expressed as the proportion of the total variation in outcomes that the model explains. Values for r^{2} can be calculated for any type of predictive model, which need not have a statistical basis.
The Math
A data set will have observed values and modelled values, sometimes known as predicted values. The "variability" of the data set is measured through different sums of squares, such as:
 the total sum of squares (proportional to the sample variance);
 the regression sum of squares (also called the explained sum of squares); and
 the sum of squares of residuals, also called the residual sum of squares.
The most general definition of the coefficient of determination is
r^{2} = 1 - SS_{err} / SS_{tot}
where SS_{err} is the residual sum of squares and SS_{tot} is the total sum of squares.
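The general definition, r^{2} = 1 - SS_{err} / SS_{tot}, can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name r_squared and the small data set are hypothetical and do not come from the text.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_err / SS_tot for paired observed and modelled values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: squared gaps between observed and modelled values.
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: squared gaps between observed values and their mean.
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("r^2 is undefined when all observed values are equal")
    return 1 - ss_err / ss_tot


# Hypothetical data, for illustration only.
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
print(r_squared(observed, predicted))  # about 0.6, up to floating-point rounding
```

Because the function takes only observed and modelled values, it applies to any predictive model, not just a regression line.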
Properties and Interpretation of r^{2}
In simple linear regression with an intercept, the coefficient of determination is the square of the correlation coefficient r. It is usually stated as a percent rather than in decimal form. In the context of data, r^{2} can be interpreted as follows:
 r^{2}, when expressed as a percent, represents the percent of variation in the dependent variable y that can be explained by variation in the independent variable x using the regression (best fit) line.
 1 - r^{2}, when expressed as a percent, represents the percent of variation in y that is NOT explained by variation in x using the regression line. This can be seen as the scattering of the observed data points about the regression line.
So r^{2} is a statistic that gives some information about the goodness of fit of a model. In regression, the r^{2} coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An r^{2} of 1 indicates that the regression line perfectly fits the data.
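For a least-squares line with a single independent variable, the value 1 - SS_{err} / SS_{tot} and the square of the correlation coefficient should agree. The sketch below checks this numerically. The helper names fit_line and pearson_r and the data are assumptions made for illustration only.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

slope, intercept = fit_line(x, y)
predicted = [intercept + slope * a for a in x]
mean_y = sum(y) / len(y)
ss_err = sum((b - f) ** 2 for b, f in zip(y, predicted))
ss_tot = sum((b - mean_y) ** 2 for b in y)

r2_from_sums = 1 - ss_err / ss_tot
r2_from_r = pearson_r(x, y) ** 2
print(r2_from_sums, r2_from_r)  # the two values agree up to rounding
```

The agreement relies on the line being fitted by least squares with an intercept. For other kinds of model the two quantities can differ.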
In many (but not all) instances where r^{2} is used, the predictors are calculated by ordinary least-squares regression: that is, by minimizing SS_{err}. In this case, r^{2} never decreases as we increase the number of variables in the model. This illustrates a drawback to one possible use of r^{2}, where one might keep adding variables to increase the r^{2} value. For example, if one is trying to predict the sales of a car model from the car's gas mileage, price, and engine power, one can include such irrelevant factors as the first letter of the model's name or the height of the lead engineer designing the car, because r^{2} will never decrease as variables are added and will probably increase due to chance alone. This leads to the alternative approach of looking at the adjusted r^{2}. The interpretation of this statistic is almost the same as that of r^{2}, but it penalizes the statistic as extra variables are included in the model.
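The text does not give a formula for the adjusted r^{2}. One commonly used form is 1 - (1 - r^{2})(n - 1) / (n - p - 1), where n is the number of observations and p is the number of independent variables. The sketch below assumes that form. The function name and the input values are hypothetical.

```python
def adjusted_r_squared(r2, n_observations, n_predictors):
    """Adjusted r^2, assuming the form 1 - (1 - r^2)(n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_observations - 1) / degrees_of_freedom


# Hypothetical case: the same r^2 of 0.5 from 11 observations,
# reached with 1 predictor and then with 3 predictors.
print(adjusted_r_squared(0.5, 11, 1))
print(adjusted_r_squared(0.5, 11, 3))
```

With r^{2} held fixed, the adjusted value falls as predictors are added, which is the penalty described above.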
Note that r^{2} does not indicate whether:
 the independent variables are a cause of the changes in the dependent variable;
 omitted-variable bias exists;
 the correct regression was used;
 the most appropriate set of independent variables has been chosen;
 there is collinearity present in the data on the explanatory variables; or
 the model might be improved by using transformed versions of the existing set of independent variables.
Example
Consider the third exam/final exam example introduced in the previous section. The correlation coefficient is r = 0.6631. Therefore, the coefficient of determination is r^{2} = 0.6631^{2} = 0.4397.
The interpretation of r^{2} in the context of this example is as follows. Approximately 44% of the variation (0.4397 is approximately 0.44) in the final exam grades can be explained by the variation in the grades on the third exam. Therefore, approximately 56% of the variation (1 - 0.44 = 0.56) in the final exam grades can NOT be explained by the variation in the grades on the third exam.
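The arithmetic in this example can be reproduced with a short script. Only the value r = 0.6631 comes from the text. The variable names are chosen for illustration.

```python
r = 0.6631  # correlation coefficient reported in the example

r_squared = round(r ** 2, 4)                  # coefficient of determination
percent_explained = round(r_squared * 100)    # variation explained by the third exam
percent_unexplained = 100 - percent_explained

print(r_squared, percent_explained, percent_unexplained)  # 0.4397 44 56
```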
Cite This Source
Source: Boundless. “Coefficient of Determination.” Boundless Statistics. Boundless, 21 Jul. 2015. Retrieved 27 Nov. 2015 from https://www.boundless.com/statistics/textbooks/boundlessstatisticstextbook/correlationandregression11/correlation44/coefficientofdetermination2092661/