Practice · 12 March 2026 · 4 min read

ML Best Practices for Real Projects

Reproducibility, virtual environments, and the habits that separate a quick experiment from production-ready machine learning.

Make It Reproducible

The most important principle in applied ML: anyone should be able to run your project and get the same results.

This means:

  • Documenting your dependencies
  • Using version control
  • Setting random seeds where randomness is involved
  • Keeping your data pipeline clear and traceable

If your colleague can't reproduce your results on their machine, the work isn't complete.
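
As a minimal sketch of the seeding point above: the helper below fixes the seed for Python's built-in random module and, if it happens to be installed, NumPy's global generator. The names set_seed and sample_after_seeding and the default value 42 are illustrative assumptions, not something this post prescribes.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed the random number generators this sketch knows about."""
    random.seed(seed)
    try:
        import numpy as np  # optional: only seeded if NumPy is installed
    except ImportError:
        return
    np.random.seed(seed)


def sample_after_seeding(seed: int) -> list[float]:
    """Return three random numbers drawn right after seeding."""
    set_seed(seed)
    return [random.random() for _ in range(3)]


# Same seed, same numbers; a different seed gives a different sequence.
first = sample_after_seeding(42)
second = sample_after_seeding(42)
print(first == second)
```

Many scikit-learn estimators and helpers such as train_test_split also accept a random_state argument, which is usually the more targeted way to pin down randomness in a single step.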

Generalise Your Work

Your model should work on data it hasn't seen before — not just the specific dataset you trained on. This is called generalisation.

Signs your model isn't generalising:

  • High accuracy on training data, low accuracy on test data (overfitting)
  • The model relies on quirks in your specific dataset
  • Performance drops dramatically on slightly different data
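
One rough way to spot the first sign above is to compare accuracy on the training split with accuracy on a held-out split. The sketch below assumes scikit-learn is installed and uses a synthetic dataset and an unconstrained decision tree purely for illustration; the dataset sizes and noise level are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise: flip_y=0.2 assigns a random class
# to about 20% of the samples, so there is noise for the model to memorise.
X, y = make_classification(
    n_samples=500, n_features=20, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unconstrained tree can keep splitting until it fits every training point.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
gap = train_acc - test_acc

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
print(f"gap:            {gap:.2f}")
```

A large gap between the two numbers suggests the model has memorised the training data. Limiting the tree (for example with max_depth) is one common way to narrow it.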

Virtual Environments

Always work inside a virtual environment. This isolates your project's dependencies from your system Python installation.

# Create a virtual environment
python -m venv myproject-env

# Activate it (Windows)
myproject-env\Scripts\activate

# Activate it (Mac/Linux)
source myproject-env/bin/activate

# Install packages into this environment only
pip install pandas scikit-learn matplotlib

# Save your dependencies
pip freeze > requirements.txt
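
If you're not sure whether the environment is actually active, a small check from inside Python can help. This is a sketch that assumes an environment created with python -m venv as above; the function name in_virtualenv is made up for this example, and other tools (such as conda) are not detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f"Using virtual environment at {sys.prefix}")
else:
    print("No virtual environment active; pip would install into the base Python.")
```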

Why This Matters

  • Different projects may need different versions of the same library
  • Your system Python stays clean
  • requirements.txt records the exact package versions, so others can recreate your environment with pip install -r requirements.txt
  • Tip: cd .. moves you up to the parent directory, which is handy when switching between project folders
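
To make the requirements.txt point concrete, here is a rough sketch that compares pinned versions with what is installed in the current environment. It only understands simple name==version lines like those pip freeze typically writes; the helper names parse_pins and find_mismatches and the version numbers in the example are illustrative assumptions.

```python
from importlib import metadata


def parse_pins(lines):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and anything not pinned with ==
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} where the two differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Illustrative pins only; real projects would read requirements.txt instead.
example = ["pandas==2.2.0", "scikit-learn==1.4.0  # pinned", "", "# a comment"]
print(parse_pins(example))
```

In a real project you could pass the lines of your own file, for example parse_pins(open("requirements.txt")), and print whatever find_mismatches returns.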

The Scikit-learn Ecosystem

Much of classical ML in Python happens through scikit-learn (imported as sklearn). It provides a consistent API for:

  • Data preprocessing
  • Model training
  • Model evaluation
  • Feature selection

Logistic regression, linear regression, decision trees, random forests — they all follow the same pattern:

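# Pattern: from sklearn.<module> import <ModelClass>, for example:
from sklearn.linear_model import LogisticRegression

# X_train, y_train and X_test are assumed to come from your own train/test split
model = LogisticRegression()
model.fit(X_train, y_train)          # Train
predictions = model.predict(X_test)   # Predict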
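
To see the shared pattern end to end, the sketch below trains three different classifiers with exactly the same fit and score calls. It assumes scikit-learn is installed and uses its bundled iris dataset; the split size, seeds, and hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # same call for every estimator
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")
```

Because every estimator exposes the same methods, swapping one model for another is usually a one-line change.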

Project Structure Checklist

A well-organised ML project should have:

  • A virtual environment with requirements.txt
  • Clear separation of data, notebooks, and source code
  • Version control (Git)
  • A README explaining how to set up and run the project
  • Reproducible results (random seeds, documented preprocessing steps)
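
Parts of this checklist can be checked automatically. The sketch below looks for a handful of files and folders in a project directory. The names in EXPECTED (data, notebooks, src, and so on) are a hypothetical layout based on common convention, not something this post requires, so adjust them to your own project.

```python
from pathlib import Path

# Hypothetical layout: these names are a common convention, not a requirement.
EXPECTED = ["data", "notebooks", "src", "README.md", "requirements.txt", ".git"]


def missing_items(project_root):
    """Return the expected files/directories that are absent from project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED if not (root / name).exists()]


missing = missing_items(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All checklist items found.")
```

A check like this only confirms that the pieces exist. Whether the README is clear or the results are reproducible still needs a human to verify.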

The Designer's Takeaway

These practices aren't just for engineers. If you're prototyping ML features, running data explorations, or collaborating with data scientists — following these habits means your work is shareable, verifiable, and trustworthy.

Understanding the development workflow also helps you design better tools for ML practitioners. Know the pain points, and you can design solutions for them.
