# From Regression to Classification
Linear regression predicts numbers. But what if you need a yes/no answer?
- Is this email spam?
- Is this water sample polluted?
- Will this customer churn?
That's classification — and the go-to algorithm for binary classification is logistic regression.
## The Sigmoid Function
Instead of a straight line, logistic regression produces an S-shaped curve called the sigmoid function:

σ(z) = 1 / (1 + e^(−z))

This function squashes any input z into a value between 0 and 1, which you can interpret as a probability. Above 0.5? Classify as "yes." Below? "No."
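In code, the sigmoid is a one-liner. A minimal sketch:

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(-4))  # ≈ 0.018, confidently "no"
print(sigmoid(0))   # 0.5, right on the decision boundary
print(sigmoid(4))   # ≈ 0.982, confidently "yes"
```

Large negative inputs land near 0, large positive inputs near 1, and 0 maps exactly to 0.5.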
## The Confusion Matrix
Once your model makes predictions, you need to know how well it did. The confusion matrix is a table that breaks down predictions into four categories:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actually Positive | True Positive (TP) | False Negative (FN) |
| Actually Negative | False Positive (FP) | True Negative (TN) |
- True Positive — the model said yes, and it was right
- True Negative — the model said no, and it was right
- False Positive — the model said yes, but it was wrong (Type I error)
- False Negative — the model said no, but it was wrong (Type II error)
### Example
If your model classifies 20 water samples (treating "clean" as the positive class):
- 10 correctly identified as clean (TP = 10)
- 3 incorrectly identified as clean when they were polluted (FP = 3)
- 1 incorrectly identified as polluted when it was clean (FN = 1)
- 6 correctly identified as polluted (TN = 6)
## Evaluation Metrics
### Accuracy

Overall correctness:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sounds great, but accuracy can be misleading. If 95% of your data is one class, a model that always predicts that class gets 95% accuracy while being useless.
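A quick sketch of that trap, using hypothetical labels:

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5

# A "model" that blindly predicts the majority class every time
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, yet it never catches a single positive
```

95% accuracy, and not one positive case caught. This is exactly why precision and recall exist.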
### Precision

Of all the things the model predicted as positive, how many were actually positive?

Precision = TP / (TP + FP)
Use when false positives are costly. Example: a spam filter — you don't want real emails going to spam.
### Recall

Of all the things that were actually positive, how many did the model catch?

Recall = TP / (TP + FN)
Use when false negatives are costly. Example: disease screening — you don't want to miss a sick patient.
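Using the counts from the water-sample example (TP = 10, FP = 3, FN = 1, TN = 6), a quick hand computation:

```python
# Counts from the water-sample example, with "clean" as the positive class
TP, FP, FN, TN = 10, 3, 1, 6

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print(f"Accuracy:  {accuracy:.2f}")   # 0.80
print(f"Precision: {precision:.2f}")  # 0.77
print(f"Recall:    {recall:.2f}")     # 0.91
```

The model catches almost every clean sample (high recall) but roughly one in four of its "clean" calls is actually polluted water (lower precision).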
## Precision vs Recall — The Trade-off
There's almost always a tension between precision and recall. Improving one usually hurts the other. The right balance depends on your use case:
| Scenario | Prioritise |
|---|---|
| Medical diagnosis | Recall (don't miss cases) |
| Spam filter | Precision (don't block good email) |
| Fraud detection | Recall (catch all fraud) |
| Content recommendation | Precision (only show relevant content) |
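One practical lever for shifting this balance is the decision threshold applied to the model's predicted probabilities. A minimal sketch with hypothetical probabilities:

```python
# Hypothetical predicted probabilities from a fitted classifier
proba = [0.2, 0.45, 0.55, 0.7, 0.9]

# Default threshold of 0.5: a balanced starting point
print([int(p >= 0.5) for p in proba])  # [0, 0, 1, 1, 1]

# Lower threshold: more "yes" predictions, so recall rises
# and precision usually falls
print([int(p >= 0.3) for p in proba])  # [0, 1, 1, 1, 1]
```

Raising the threshold has the opposite effect: fewer, more confident "yes" predictions, favouring precision.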
## Logistic Regression in Practice
Logistic regression is part of the scikit-learn library:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Assumes X_train, X_test, y_train, y_test from an earlier train/test split
model = LogisticRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))
```

## The Designer's Takeaway
As a designer, understanding the confusion matrix helps you:
- Design appropriate error states for AI-powered features
- Decide whether your product should err on the side of catching everything (high recall) or only flagging sure things (high precision)
- Communicate model limitations to users through thoughtful UI copy
- Help product teams decide acceptable thresholds for model confidence