📊 Linear Regression Calculator

Calculate line of best fit, R-squared, and make predictions

📚 Understanding Linear Regression

What is Linear Regression?

Linear regression is a statistical method for modeling the relationship between a dependent variable (Y) and an independent variable (X) by fitting a linear equation to observed data. The most common approach, ordinary least squares, finds the "line of best fit" that minimizes the sum of squared vertical distances (residuals) between the observed Y values and the values predicted by the line.
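To make "minimizes the sum of squared distances" concrete, here is a minimal Python sketch that scores a candidate line by its sum of squared residuals. The five data points are hypothetical. The slope 0.6 and intercept 2.2 are the least-squares values for those points, obtained from the slope and intercept formulas in this guide. Nudging either value in any direction increases the score.

```python
# Minimal sketch: the quantity that ordinary least squares minimizes.
# The data points are hypothetical, chosen only for illustration.

def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared vertical distances between each observed y and m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares slope and intercept for this data set.
best_m, best_b = 0.6, 2.2
best_score = sum_squared_residuals(xs, ys, best_m, best_b)

# Moving the slope or intercept away from the least-squares values
# always gives a larger sum of squared residuals.
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sum_squared_residuals(xs, ys, best_m + dm, best_b + db) > best_score

print(round(best_score, 4))  # 2.4
```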

The Regression Equation

Standard Form

y = mx + b

m: Slope (rate of change of Y with respect to X)
b: Y-intercept (predicted value of Y when X = 0)

How to Calculate the Slope (m)

Slope Formula

m = Σ((xᵢ - x̄)(yᵢ - ȳ)) / Σ(xᵢ - x̄)²

Where x̄ is the mean of the X values and ȳ is the mean of the Y values. The slope represents how much the predicted Y changes for each one-unit increase in X.
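The slope formula translates almost line for line into code. This is a minimal Python sketch with no external libraries; the function names `mean` and `slope` are our own choices, and the sample data are hypothetical.

```python
# Minimal sketch of m = Σ((xᵢ - x̄)(yᵢ - ȳ)) / Σ(xᵢ - x̄)² in plain Python.

def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    # Numerator: sum of cross-deviations of X and Y from their means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of X from its mean.
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

print(slope([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # 0.6
```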

How to Calculate the Y-Intercept (b)

Y-Intercept Formula

b = ȳ - m × x̄

Because the intercept is computed from the slope and the two means, the regression line always passes through the point (x̄, ȳ).
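Continuing in the same style, the intercept follows directly from the slope and the two means. The final check illustrates why the line passes through (x̄, ȳ): substituting x̄ into y = mx + b with b = ȳ - m × x̄ gives back ȳ. The helper name `fit_line` and the data are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar  # intercept from b = ȳ - m × x̄
    return m, b

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)

# Evaluating the fitted line at x̄ returns ȳ (up to floating-point rounding).
print(m, b, m * mean(xs) + b)
```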

Understanding R² (Coefficient of Determination)

R² measures how well the regression line fits the data. For a least-squares line with an intercept, it ranges from 0 to 1 and represents the proportion of the variance in Y that is explained by X.

For example, R² = 0.75 means that 75% of the variation in Y is explained by the linear relationship with X, while the remaining 25% is left unexplained by the model (other factors, measurement noise, or nonlinearity).
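One common way to compute R² for a fitted line is R² = 1 - SS_res / SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² is the sum of squared residuals (with ŷᵢ = m × xᵢ + b) and SS_tot = Σ(yᵢ - ȳ)² is the total variation of Y around its mean. The Python sketch below applies that formula to hypothetical data. It assumes the Y values are not all identical, since SS_tot would then be zero.

```python
def mean(values):
    return sum(values) / len(values)

def r_squared(xs, ys):
    """R² = 1 - SS_res / SS_tot for the least-squares line (assumes Y is not constant)."""
    x_bar, y_bar = mean(xs), mean(ys)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                     # total variation
    return 1 - ss_res / ss_tot

print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.6
```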

Correlation Coefficient (r)

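The correlation coefficient r measures the strength and direction of the linear relationship. It ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive), its sign matches the sign of the slope, and in simple linear regression r² = R².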
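The Pearson correlation coefficient can be computed from the same deviation sums used for the slope: r = Σ((xᵢ - x̄)(yᵢ - ȳ)) / √(Σ(xᵢ - x̄)² × Σ(yᵢ - ȳ)²). The sketch below (hypothetical data, our own function name `pearson_r`) also shows that squaring r reproduces R² for a simple linear regression.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviation sums."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))       # 0.7746
print(round(r ** 2, 4))  # 0.6, equal to R² for this simple linear regression
```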

Real-World Applications

Assumptions of Linear Regression

Interpreting the Results

Frequently Asked Questions

What's the difference between correlation and regression?

Correlation measures the strength and direction of a relationship between two variables (r), while regression creates a predictive equation (y = mx + b) that can be used to estimate Y values from X values. Correlation tells you "how related," regression tells you "how to predict."
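One way to see the difference: correlation is symmetric in X and Y, but the regression slope is not. In this minimal sketch with hypothetical data, swapping the roles of X and Y leaves r unchanged while the fitted slope changes, so the line for predicting Y from X is not simply the inverse of the line for predicting X from Y. The helper names are our own.

```python
import math

def deviation_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): cross and squared deviations from the means."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares line predicting the second argument from the first."""
    sxy, sxx, _ = deviation_sums(xs, ys)
    return sxy / sxx

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print(pearson_r(xs, ys) == pearson_r(ys, xs))  # True: correlation is symmetric
print(slope(xs, ys), slope(ys, xs))            # 0.6 1.0: two different prediction lines
```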

How many data points do I need?

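You need at least 2 data points with different X values to calculate a line, but with only 2 points the line passes through both exactly (R² = 1), which tells you nothing about how well a line describes the relationship. As a rough guideline, 10-30 data points often give reasonable results. More data points help reveal the true relationship and reduce the influence of any single outlier.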
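A calculator also needs to reject inputs for which the formulas break down. This hypothetical `fit_line` helper (our own name, not a library function) raises an error for fewer than two points or when every X value is the same, because Σ(xᵢ - x̄)² is then zero and the slope is undefined. With exactly two distinct points it returns the line through both of them.

```python
def fit_line(xs, ys):
    """Least-squares (m, b) with basic input validation (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    if len(xs) < 2:
        raise ValueError("at least 2 data points are required")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical, so the slope is undefined")
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return m, y_bar - m * x_bar

print(fit_line([1, 3], [2, 6]))  # (2.0, 0.0): the line through both points

try:
    fit_line([2, 2, 2], [1, 2, 3])
except ValueError as err:
    print(err)  # all X values are identical, so the slope is undefined
```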

What does a negative slope mean?

A negative slope means there is a negative (inverse) relationship: as X increases, Y tends to decrease. For example, in a study of car age vs. value, the slope would be negative because older cars typically have lower values.

Can I use linear regression for any data?

Linear regression works best when the relationship between variables is approximately linear. If your data shows a curved pattern, exponential growth, or other non-linear relationships, you may need polynomial regression or other methods. Always visualize your data first.
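A high R² alone does not prove the relationship is linear. In this sketch the hypothetical data follow y = x² exactly, yet a straight line still scores R² ≈ 0.963. The residuals reveal the problem: they run positive, negative, then positive again instead of scattering randomly. Plotting residuals against X is one common way to spot this.

```python
def fit_line(xs, ys):
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # a curved (quadratic) relationship

m, b = fit_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
y_bar = sum(ys) / len(ys)
r2 = 1 - sum(r ** 2 for r in residuals) / sum((y - y_bar) ** 2 for y in ys)

print(m, b)          # 6.0 -7.0
print(round(r2, 3))  # 0.963
print(residuals)     # [2.0, -1.0, -2.0, -1.0, 2.0]: a U-shaped pattern, not random noise
```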

What's a good R² value?

It depends on your field. As rough rules of thumb, R² > 0.9 is often expected in the physical sciences, R² > 0.5 can be considered good in the social sciences, and R² > 0.7 is typically considered strong in business. The key question is whether the model is useful for your specific purpose, not the R² value alone.

How do outliers affect linear regression?

Outliers can significantly affect the regression line because the method minimizes squared residuals, so large errors carry disproportionate weight. A single extreme point can pull the line away from the majority of the data. Always check for outliers and consider whether they should be included or investigated separately.
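A quick way to see this sensitivity is to refit after adding one extreme point. In the hypothetical data below, five points lie exactly on y = 2x; adding a single point at (6, 30), where the pattern would predict 12, more than doubles the fitted slope.

```python
def slope(xs, ys):
    """Least-squares slope from the deviation-sum formula."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
print(slope(xs, ys))                         # 2.0
print(round(slope(xs + [6], ys + [30]), 3))  # 4.571: one outlier changes the slope sharply
```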

Can I extrapolate beyond my data range?

Extrapolation (predicting outside your data range) is risky because the linear relationship may not hold beyond observed values. Interpolation (predicting within your data range) is generally safer. Use extrapolation cautiously and only when you have good reason to believe the relationship continues.
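One way a tool can support this advice is to flag predictions that fall outside the observed X range. This sketch uses a hypothetical `predict` helper (not part of any library) that returns the predicted Y together with an extrapolation flag; the data are also hypothetical.

```python
def fit_line(xs, ys):
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar

def predict(xs, m, b, x_new):
    """Return (predicted y, True if x_new lies outside the observed X range)."""
    y_hat = m * x_new + b
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return y_hat, is_extrapolation

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)

for x_new in (3.5, 20):
    y_hat, outside = predict(xs, m, b, x_new)
    label = "extrapolation, use with caution" if outside else "interpolation"
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({label})")
```

The flag only checks the X range; it cannot tell you whether the linear relationship actually continues beyond it.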