Make It Reproducible
The most important principle in applied ML: anyone should be able to run your project and get the same results.
This means:
- Documenting your dependencies
- Using version control
- Setting random seeds where randomness is involved
- Keeping your data pipeline clear and traceable
If your colleague can't reproduce your results on their machine, the work isn't complete.
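As a minimal sketch of the seeding point above, the snippet below draws a random sample of row indices twice with the same seed and confirms that both runs agree. The helper name `sample_indices` and the seed value are illustrative assumptions, not part of any library. It uses only Python's standard `random` module; libraries such as NumPy and scikit-learn take their own seeds (for example, scikit-learn's `random_state` parameter), which need setting separately.

```python
import random


def sample_indices(n_rows: int, n_samples: int, seed: int) -> list:
    """Pick row indices with a dedicated, seeded random generator."""
    # A local generator avoids relying on hidden global random state.
    rng = random.Random(seed)
    return rng.sample(range(n_rows), n_samples)


SEED = 42  # arbitrary value chosen for this example

first_run = sample_indices(n_rows=100, n_samples=5, seed=SEED)
second_run = sample_indices(n_rows=100, n_samples=5, seed=SEED)

print(first_run == second_run)  # True: same seed, same selection
```

Passing the seed explicitly, rather than seeding once at the top of a script, makes it easier to see which steps depend on randomness.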
Generalise Your Work
Your model should work on data it hasn't seen before — not just the specific dataset you trained on. This is called generalisation.
Signs your model isn't generalising:
- High accuracy on training data, low accuracy on test data (overfitting)
- The model relies on quirks in your specific dataset
- Performance drops dramatically on slightly different data
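One way to check for the first sign above is to compare accuracy on the training data with accuracy on held-out data. The sketch below assumes scikit-learn is installed and uses a synthetic dataset with deliberately noisy labels; the dataset sizes and parameters are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns random labels to about 20% of samples,
# so there is noise for an over-flexible model to memorise.
X, y = make_classification(
    n_samples=500, n_features=20, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A tree with no depth limit can keep splitting until it fits the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
gap = train_accuracy - test_accuracy

print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
print(f"gap:            {gap:.2f}")  # a large gap is a warning sign of overfitting
```

Limiting model complexity (for instance with the tree's `max_depth` parameter) is one common way to narrow such a gap, though the right setting depends on the data.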
Virtual Environments
Always work inside a virtual environment. This isolates your project's dependencies from your system Python installation.
# Create a virtual environment
python -m venv myproject-env
# Activate it (Windows)
myproject-env\Scripts\activate
# Activate it (Mac/Linux)
source myproject-env/bin/activate
# Install packages into this environment only
pip install pandas scikit-learn matplotlib
# Save your dependencies
pip freeze > requirements.txt
Why This Matters
- Different projects may need different versions of the same library
- Your system Python stays clean
- requirements.txt makes your project reproducible
- Use cd .. to navigate to the parent directory when organising projects
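If you are unsure whether an environment is active, a short check from inside Python can help. This is a sketch that assumes an environment created with `python -m venv` on Python 3.3 or later; the function name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```

If the first line prints False, packages installed with pip will go to the base installation rather than to the project's environment.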
The Scikit-learn Ecosystem
Much of classical ML in Python is done with scikit-learn (imported as sklearn). It provides a consistent API for:
- Data preprocessing
- Model training
- Model evaluation
- Feature selection
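To make these four areas concrete, here is a small end-to-end sketch that touches each one. It assumes scikit-learn is installed and uses its bundled iris dataset. The choice of scaler, selector, model, and the value `k=2` are illustrative assumptions only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data preprocessing: learn the scaling from training data, apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Feature selection: keep the 2 features with the highest ANOVA F-scores
selector = SelectKBest(f_classif, k=2).fit(X_train_scaled, y_train)
X_train_selected = selector.transform(X_train_scaled)
X_test_selected = selector.transform(X_test_scaled)

# Model training
model = LogisticRegression(max_iter=1000)
model.fit(X_train_selected, y_train)

# Model evaluation on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test_selected))
print(f"test accuracy: {accuracy:.2f}")
```

Each step uses the same small set of methods (`fit`, `transform`, `predict`), which is what makes it straightforward to swap one component for another.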
Logistic regression, linear regression, decision trees, random forests — they all follow the same pattern:
from sklearn.module_name import ModelClass # e.g. from sklearn.linear_model import LogisticRegression
model = ModelClass()
model.fit(X_train, y_train) # Train
predictions = model.predict(X_test) # Predict
Project Structure Checklist
A well-organised ML project should have:
- requirements.txt
The Designer's Takeaway
These practices aren't just for engineers. If you're prototyping ML features, running data explorations, or collaborating with data scientists, following these habits makes your work shareable, verifiable, and trustworthy.
Understanding the development workflow also helps you design better tools for ML practitioners. Know the pain points, and you can design solutions for them.