
How We Track Machine Learning Experiments with MLFlow

Properly tracking your machine learning experiments is easier than you think.

by Markus Schmitt

When you build machine learning models, it’s common to run dozens or hundreds of experiments to find the correct input data, parameters, and algorithm. The more experiments you run, the harder it gets to remember what works and what doesn’t. 

Sometimes you see a great result, but because it took hours or days to produce, you've already changed the code by the time it appears. And now you can't remember which parameters you used to get that result!

Of course, this is an eternal problem for all scientific research. It’s one that’s traditionally solved by the Lab Notebook: a journal where scientists carefully record every experiment they run.

A handwritten page full of recorded numbers.
A page from Linus Pauling’s Lab Notebook

Many data scientists follow a similar approach, keeping notes of their experiments digitally or by hand. But this doesn’t always work for machine learning. You need to keep track of three things: the data, the code, and the model. And even if you’re super precise in your note-taking, it’s not feasible to record everything in full.

Luckily, you can automate experiment tracking with MLFlow. Not only is it simple to set up, but it also adapts easily to your existing workflow.

What is MLFlow?

MLFlow is an extensive machine learning platform with several components. We’ll just focus on its experiment tracking features for now. We’ve talked before about model registries and why they’re valuable. There’s definitely some overlap between experiment tracking and model registries, but they aren’t the same.

Experiment tracking vs model registries

Model registries are for system managers and focus on models that make it to production. On the other hand, experiment tracking tools are for scientists to track all experiments, including unsuccessful ones.

Registries track and visualise these models’ performance and associated metadata. But for every model in production, there are usually hundreds of failed attempts. Experiment tracking is designed for scientists and researchers to help make sense of unsuccessful and not-yet-successful experiments. It makes the research process more efficient by ensuring that work isn’t lost or duplicated.

MLFlow’s Python library and dashboard

The two components of MLFlow you’ll use for experiment tracking are its Web UI and Python library. You can get them both by running a single install command:

pip install mlflow

You can immediately view the dashboard where all your experiments will live by running mlflow ui and visiting http://localhost:5000 in your browser.
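Run it from the same directory as your training code, so the dashboard can find the runs you log there later:

mlflow ui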

A screenshot of the MLFLow dashboard, showing an empty table of experiments and some controls to filter, search, or create experiments.
The MLFlow dashboard’s default view

This dashboard already has a bunch of similar features to a Lab Notebook: you can take notes, search through previous data, and set up new experiments. 

Things get more interesting when you use this in conjunction with the same mlflow Python library you’ve just installed. By adding a few lines to your existing machine learning code files, you can easily track every single run. This automatically saves the parameters you used, the results, and even the full model binary.

The screenshot below shows how you can start to automatically track your experiments in around ten lines of code. 

A code sample showing scikit-learn training code with lines highlighted to show how to initialise mlflow and log parameters, metrics, and the full model.
Code to predict wine quality with tracking using MLFlow

The highlighted lines are MLFlow-specific, while the rest is a standard scikit-learn example to predict wine quality.
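If you'd rather copy and paste than squint at a screenshot, here's a minimal sketch of roughly what that code looks like. The dataset path, the experiment ID, and the alpha and l1_ratio values are illustrative; substitute your own.

import numpy as np
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the wine quality data (path is illustrative)
data = pd.read_csv("winequality-red.csv", sep=";")
train, test = train_test_split(data)
train_x, train_y = train.drop(columns=["quality"]), train["quality"]
test_x, test_y = test.drop(columns=["quality"]), test["quality"]

alpha, l1_ratio = 0.5, 0.5

# The experiment_id comes from the experiment you created in the dashboard
with mlflow.start_run(experiment_id="1"):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(train_x, train_y)
    predictions = model.predict(test_x)

    # Log the parameters, the metrics, and the full model binary
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", np.sqrt(mean_squared_error(test_y, predictions)))
    mlflow.log_metric("r2", r2_score(test_y, predictions))
    mlflow.sklearn.log_model(model, "model")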

Each time you run an experiment with this code, it logs an entry you can view in the dashboard. Note the following:

  • In the Python code, we referred to an experiment_id. You can see this ID in the dashboard after you create a new experiment in the UI. One experiment can have many runs, so you won’t need a new experiment ID each time you try new hyperparameters or other variations.
  • Your existing training and evaluation code needs to be within the mlflow.start_run block.
  • You can log parameters, metrics, and the entire model with mlflow.log_param, mlflow.log_metric, and mlflow.sklearn.log_model.

The MLFlow dashboard showing two runs

From here, you can change parameters as you like, without worrying about forgetting which ones you’ve used. For example, in the screenshot you can see that MLFlow tracked the results of two experiments where we changed the alpha and l1_ratio parameters, making it easy to compare how these parameters affected the r2 and rmse metrics.
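If you prefer to compare runs in a notebook rather than in the dashboard, the same library can return your runs as a pandas DataFrame. A small sketch, assuming the experiment ID and the parameter and metric names from the example above:

import mlflow

# Fetch every run in the experiment as a pandas DataFrame
runs = mlflow.search_runs(experiment_ids=["1"])
print(runs[["params.alpha", "params.l1_ratio", "metrics.rmse", "metrics.r2"]])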

Keeping free-form notes

The reason many scientists keep their own notebook rather than using a structured platform is that they feel pre-defined templates aren’t flexible enough.

Something we like about MLFlow is that you can keep notes at different levels: notes relating to an entire experiment, or to a specific run.

A screenshot of a rich-text editor showing an example note that a scientist might leave about a specific run.
The note-taking functionality of MLFlow.
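You can write these notes directly in the dashboard. If you'd rather keep them next to your code, MLFlow stores a run's note as an ordinary tag that the UI reads, so a sketch like this should also work (the tag name is mlflow.note.content; the note text is just an example):

with mlflow.start_run(experiment_id="1"):
    # ... training and logging as above ...
    # The dashboard displays this tag as the run's free-form note
    mlflow.set_tag("mlflow.note.content", "Doubled the training data; rmse barely moved.")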

How MLFlow works internally

MLFlow can be set up in different ways. You can do it locally, as we did in the example above, or on a remote server, which lets you share a single dashboard among a whole team of scientists.
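If you do run a shared server, each scientist simply points their code at it before logging anything. A minimal sketch, assuming the server lives at http://mlflow.your-company.internal:5000 (the address is illustrative):

import mlflow

# Send all runs to the shared tracking server instead of the local mlruns directory
mlflow.set_tracking_uri("http://mlflow.your-company.internal:5000")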

If you install it locally, you’ll notice it creates a directory called `mlruns` in the same directory where you executed your Python code. Inside this directory you can find the full model binaries (as .pkl files) for every experiment run.

If you want to track lots of models and you need more space, you can configure MLFlow to store these files on S3 or another cloud storage provider instead.
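For example, a tracking server that keeps its run metadata in a local SQLite database but writes model artifacts to S3 might be started like this (the bucket name is illustrative):

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root s3://my-mlflow-bucket/artifacts --host 0.0.0.0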

Scaling up your machine learning infrastructure

We use MLFlow as part of our Open MLOps architecture. You can use MLFlow on its own but it also works well in conjunction with other MLOps components, such as Prefect for scheduling and managing tasks, and a cloud-based Jupyter Notebook environment for easy collaboration.

We love discussing the challenges of building and keeping track of machine learning models. Contact us if you’d like to discuss your team’s approach to machine learning.
