Models · 10 March 2026 · 7 min read

Logistic Regression & the Confusion Matrix

When the answer is yes or no — understanding classification, the sigmoid curve, and how to measure if your model is actually right.

From Regression to Classification

Linear regression predicts numbers. But what if you need a yes/no answer?

  • Is this email spam?
  • Is this water sample polluted?
  • Will this customer churn?

That's classification — and the go-to algorithm for binary classification is logistic regression.

The Sigmoid Function

Instead of a straight line, logistic regression produces an S-shaped curve called the sigmoid function:

\sigma(x) = \frac{1}{1 + e^{-x}}

This function squashes any input into a value between 0 and 1 — which you can interpret as a probability. Above 0.5? Classify as "yes." Below? "No."
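The squashing behaviour is easy to see by evaluating the function at a few points. A minimal sketch (the `classify` helper and its 0.5 default threshold are illustrative, not from any library):

```python
import math

def sigmoid(x):
    # squash any real number into the open interval (0, 1)
    return 1 / (1 + math.exp(-x))

print(sigmoid(-5))  # close to 0
print(sigmoid(0))   # exactly 0.5
print(sigmoid(5))   # close to 1

def classify(x, threshold=0.5):
    # decision rule: "yes" when the probability clears the threshold
    return "yes" if sigmoid(x) >= threshold else "no"
```

Large negative inputs map near 0, large positive inputs near 1, and 0 maps to exactly 0.5 — the natural decision boundary.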

The Confusion Matrix

Once your model makes predictions, you need to know how well it did. The confusion matrix is a table that breaks down predictions into four categories:

|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| **Actually Positive** | True Positive (TP) | False Negative (FN) |
| **Actually Negative** | False Positive (FP) | True Negative (TN) |
  • True Positive — the model said yes, and it was right
  • True Negative — the model said no, and it was right
  • False Positive — the model said yes, but it was wrong (Type I error)
  • False Negative — the model said no, but it was wrong (Type II error)

Example

Suppose your model classifies 20 water samples, treating "clean" as the positive class:

  • 10 correctly identified as clean (TP = 10)
  • 3 incorrectly identified as clean when they were polluted (FP = 3)
  • 1 incorrectly identified as polluted when it was clean (FN = 1)
  • 6 correctly identified as polluted (TN = 6)
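Laid out as a 2×2 matrix, the four counts look like this (a sketch of the example above, with "clean" as the positive class):

```python
# counts from the water-sample example
TP, FP, FN, TN = 10, 3, 1, 6

confusion = [
    [TP, FN],  # actually positive: correctly caught vs. missed
    [FP, TN],  # actually negative: wrongly flagged vs. correctly cleared
]

total = TP + FP + FN + TN
print(total)  # all 20 samples are accounted for
```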

Evaluation Metrics

Accuracy

Overall correctness:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

Sounds great, but accuracy can be misleading. If 95% of your data is one class, a model that always predicts that class gets 95% accuracy while being useless.
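The imbalanced-class trap is easy to demonstrate with a toy dataset (illustrative numbers, not from the article's example):

```python
# 95 negative cases, 5 positive cases
y_true = [0] * 95 + [1] * 5

# a "model" that always predicts the majority class
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 — yet every single positive case was missed
```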

Precision

Of all the things the model predicted as positive, how many were actually positive?

Precision = \frac{TP}{TP + FP}

Use when false positives are costly. Example: a spam filter — you don't want real emails going to spam.

Recall

Of all the things that were actually positive, how many did the model catch?

Recall = \frac{TP}{TP + FN}

Use when false negatives are costly. Example: disease screening — you don't want to miss a sick patient.
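Applying both formulas to the water-sample counts from earlier (a quick sketch):

```python
TP, FP, FN = 10, 3, 1  # counts from the water-sample example

# of everything predicted clean, how much really was clean?
precision = TP / (TP + FP)  # 10 / 13 ≈ 0.77

# of everything actually clean, how much did the model catch?
recall = TP / (TP + FN)     # 10 / 11 ≈ 0.91

print(precision, recall)
```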

Precision vs Recall — The Trade-off

There's almost always a tension between precision and recall. Improving one usually hurts the other. The right balance depends on your use case:

| Scenario | Prioritise |
| --- | --- |
| Medical diagnosis | Recall (don't miss cases) |
| Spam filter | Precision (don't block good email) |
| Fraud detection | Recall (catch all fraud) |
| Content recommendation | Precision (only show relevant content) |
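One practical lever for this trade-off is the classification threshold itself. Instead of the default 0.5, you can threshold the predicted probabilities yourself. A sketch on synthetic data (the dataset and thresholds here are arbitrary, chosen only to show the effect):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# synthetic binary-classification data, just for illustration
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# lowering the threshold catches more positives (recall up);
# raising it makes each "yes" more trustworthy (precision up)
results = {}
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_test, preds),
        recall_score(y_test, preds),
    )
    print(threshold, results[threshold])
```

Because every sample flagged at the higher threshold is also flagged at the lower one, recall can only fall (or stay flat) as the threshold rises.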

Logistic Regression in Practice

Scikit-learn provides logistic regression out of the box:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# assumes X_train, y_train, X_test, y_test have already been split
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

The Designer's Takeaway

As a designer, understanding the confusion matrix helps you:

  • Design appropriate error states for AI-powered features
  • Decide whether your product should err on the side of caution (high recall) or precision
  • Communicate model limitations to users through thoughtful UI copy
  • Help product teams decide acceptable thresholds for model confidence
