What is Linear Regression?
Linear regression is one of the simplest predictive models. It draws a straight line through your data that best represents the relationship between your input features and the output value.
The equation is one you've seen in maths class:

y = mx + c

Where:
- y = the value you're predicting
- x = the input feature
- m = the slope (how much y changes for each unit of x)
- c = the intercept (the value of y when x is 0)
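A minimal sketch of fitting that line with NumPy. The data values here are made up purely for illustration (hours studied vs. test score):

```python
import numpy as np

# Made-up data: hours studied (x) vs. test score (y)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([52, 55, 61, 64, 70], dtype=float)

# np.polyfit with degree 1 returns the best-fit slope m and intercept c
m, c = np.polyfit(x, y, 1)

# Predict y for a new x using y = mx + c
y_pred = m * 3.5 + c
print(f"slope m = {m:.2f}, intercept c = {c:.2f}, prediction = {y_pred:.2f}")
```

For this data the fit comes out to m = 4.5 and c = 46.9, so a student studying 3.5 hours is predicted to score 62.65.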
How Do We Know if the Line is Good?
We use evaluation metrics to measure how far off our predictions are from the actual values.
Mean Absolute Error (MAE)
The average of absolute differences between predicted and actual values. Easy to interpret — it's in the same units as your data.
Mean Squared Error (MSE)
The average of squared differences. Penalises large errors more heavily than MAE.
Root Mean Squared Error (RMSE)
The square root of MSE. Brings the error back to the original units while still penalising large errors.
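The three error metrics are easy to compute by hand. A small sketch with hypothetical predicted and actual values:

```python
import numpy as np

# Hypothetical actual vs. predicted values
actual = np.array([10.0, 12.0, 15.0, 18.0])
predicted = np.array([11.0, 11.0, 16.0, 14.0])

errors = predicted - actual            # [1, -1, 1, -4]
mae = np.mean(np.abs(errors))          # (1 + 1 + 1 + 4) / 4 = 1.75
mse = np.mean(errors ** 2)             # (1 + 1 + 1 + 16) / 4 = 4.75
rmse = np.sqrt(mse)                    # sqrt(4.75) ≈ 2.18

print(mae, mse, rmse)
```

Notice how the single large error (−4) dominates MSE and RMSE but not MAE: that is the "penalises large errors more heavily" behaviour in action.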
R² (R-Squared)
R² measures how much of the variance in y the model explains:
- R² = 1.0 → the model explains all the variance (perfect fit)
- R² = 0.0 → the model explains none of the variance (no better than guessing the mean)
- R² < 0 → the model is worse than just predicting the mean
Calculated as: R² = 1 − (RSS / TSS)
Where RSS is the residual sum of squares and TSS is the total sum of squares.
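The R² formula can be checked directly from its definition. A sketch with made-up numbers:

```python
import numpy as np

# Made-up actual vs. predicted values
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 9.0])

rss = np.sum((actual - predicted) ** 2)      # residual sum of squares = 0.5
tss = np.sum((actual - actual.mean()) ** 2)  # total sum of squares = 20.0

r2 = 1 - rss / tss                           # 1 - 0.025 = 0.975
print(r2)
```

If the model always predicted the mean (6.0), RSS would equal TSS and R² would be 0 — the "no better than guessing the mean" baseline.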
Checking Correlation First
Before building a regression model, check which features are correlated with your target:
```python
print(df.corr())
```

Features with high correlation to the target are good candidates for your model.
Multiple Linear Regression
When you use more than one input feature, it's called multiple linear regression. The equation extends to:

y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ
Each feature gets its own coefficient. A negative coefficient means that feature has a negative association with the target — as it increases, the prediction decreases.
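A sketch of fitting two features at once using NumPy's least-squares solver. The data is synthetic, generated from known coefficients so you can see the fit recover them:

```python
import numpy as np

# Two hypothetical features, x1 and x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Target generated from y = 1 + 2*x1 + 3*x2
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Add a column of ones so least squares also fits the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])
b0, b1, b2 = np.linalg.lstsq(X_design, y, rcond=None)[0]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The solver recovers b₀ = 1, b₁ = 2, b₂ = 3 — each feature's own coefficient, exactly as the equation describes.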
Polynomial Regression
What if the relationship isn't a straight line? Polynomial regression fits a curve instead:

y = a + b₁x + b₂x² + b₃x³ + ...
You're essentially creating extra features (x², x³, etc.) from your original feature to capture non-linear patterns.
Use polynomial regression when:
- The scatter plot shows a curved relationship
- A linear model has poor R² despite reasonable correlations
- You understand that higher-degree polynomials risk overfitting
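The "extra features" idea can be sketched with NumPy: fitting a degree-2 polynomial is just linear regression on x and x². The data below is an exact quadratic, invented for illustration:

```python
import numpy as np

# Made-up curved data, generated from y = 2 + 0.5*x + 1.5*x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 0.5 * x + 1.5 * x ** 2

# Degree-2 fit; coefficients come back highest power first
coeffs = np.polyfit(x, y, 2)   # ≈ [1.5, 0.5, 2.0]
print(coeffs)
```

Because the data is an exact quadratic, the fit recovers the generating coefficients. On real, noisy data a higher degree will always reduce training error — that is exactly the overfitting risk noted above.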
From Simple to Multiple
| Type | Features | Equation |
|---|---|---|
| Simple Linear | 1 input | y = mx + c |
| Multiple Linear | 2+ inputs | y = b₀ + b₁x₁ + b₂x₂ + ... |
| Polynomial | 1+ input with powers | y = a + b₁x + b₂x² + ... |
The Designer's Takeaway
Linear regression is the foundation. Even if you never implement one yourself, understanding it helps you:
- Interpret dashboards and analytics in ML-powered products
- Know when a prediction is just a "best-fit line" and not magic
- Design appropriate confidence indicators in UIs that show predictions