Increasingly, data science projects do not simply end with a few key takeaway conclusions, but result in trained machine learning models going into production. With these models taking ever more critical roles in organisations’ services and tooling, it’s important for us to track how models were created, know why a particular model was selected over other candidates, and be able to reproduce them when necessary.

Experiment tracking in Faculty Platform makes it easy to keep track of this information. We’ve integrated MLflow, a popular open source project providing tooling for the data science workflow, into Faculty, so you only need to add minor annotations to your existing code.

For more information on why we introduced the experiment tracking feature, see our blog post on it.

Getting started

All you need to do to use MLflow and the experiment tracking feature in Faculty is to import the Python library and start logging experiment runs:

import mlflow

with mlflow.start_run():
    mlflow.log_param("alpha", 0.1)
    mlflow.log_metric("accuracy", 0.98)

This will create a new run in the ‘Default’ experiment of the open project, which you can view in the Experiments screen:


Clicking on the run will open a more detailed view:


What can I log?

MLflow and experiment tracking automatically record a lot of useful information about each experiment run (start time, duration, who ran it, git commit, etc.), but to get the full value out of the feature you also need to log information such as model parameters and performance metrics during the run.

As shown in the above example, model parameters and metrics can be logged. These are shown both in the list of runs for an experiment and in the run detail screen. In the following, more complete example, we’re logging multiple useful metrics on the performance of a scikit-learn Support Vector Machine classifier:

from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
import mlflow

# Load and split training data
digits = datasets.load_digits()
data_train, data_test, target_train, target_test = train_test_split(
    digits.data, digits.target, random_state=221
)

with mlflow.start_run():

    gamma = 0.01
    mlflow.log_param("gamma", gamma)

    # Train model
    classifier = svm.SVC(gamma=gamma)
    classifier.fit(data_train, target_train)

    # Evaluate model performance
    predictions = classifier.predict(data_test)
    accuracy = metrics.accuracy_score(target_test, predictions)
    precision = metrics.precision_score(target_test, predictions, average="weighted")
    recall = metrics.recall_score(target_test, predictions, average="weighted")

    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)

If we then run the above code with different values of the gamma parameter, we can see and compare various runs and their metrics in the run history screen:


Logging models and artifacts

In addition to parameters and metrics, we can also log artifacts with experiment runs. These can be anything that can be stored in a file, including images and models themselves.

Logging models is fairly straightforward: first import the module in MLflow that corresponds to the model type you’re using, and call its log_model function. In the above example:

from sklearn import datasets, svm, metrics
import mlflow
import mlflow.sklearn

# Load and split training data
# ...

with mlflow.start_run():

    gamma = 0.01
    mlflow.log_param("gamma", gamma)

    # Train model
    classifier = svm.SVC(gamma=gamma)
    classifier.fit(data_train, target_train)

    # Log model
    mlflow.sklearn.log_model(classifier, "svm")

    # Evaluate model performance
    # ...


The following model types are supported in MLflow:

  • Keras (see mlflow.keras.log_model)
  • TensorFlow (see mlflow.tensorflow.log_model)
  • Spark (see mlflow.spark.log_model)
  • scikit-learn (see mlflow.sklearn.log_model)
  • MLeap (see mlflow.mleap.log_model)
  • H2O (see mlflow.h2o.log_model)
  • PyTorch (see mlflow.pytorch.log_model)

It’s also possible to wrap arbitrary Python functions in an MLflow model with mlflow.pyfunc.

The model will then be stored as artifacts of the run in MLflow’s MLmodel serialisation format. Such models can be inspected and exported from the artifacts view on the run detail page:


Context menus in the artifacts view provide the ability to download models and artifacts from the UI or load them into Python for further use.

It’s also possible to log any other kind of file as an artifact of a run. For example, to store a matplotlib plot in a run, first write it out as a file, then log that file as an artifact:

import os
import tempfile
import numpy
from matplotlib import pyplot
import mlflow

with mlflow.start_run():

    # Plot the sinc function
    x = numpy.linspace(-10, 10, 201)
    pyplot.plot(x, numpy.sinc(x))

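    # Log as MLflow artifact
    with tempfile.TemporaryDirectory() as temp_dir:
        image_path = os.path.join(temp_dir, "sinc.svg")
        # Write the figure out to a file, then upload that file to the run
        pyplot.savefig(image_path)
        mlflow.log_artifact(image_path)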

The plot is then stored with the run’s artifacts and can be previewed and exported from the UI:


By the same mechanism, many types of files can be stored and previewed as part of an experiment run’s artifacts. A whole directory of artifacts can also be logged with mlflow.log_artifacts().

Multiple experiments

Each Faculty project has a ‘Default’ experiment that runs will be stored in, unless configured otherwise. However, if you have a lot of experiment runs, you may wish to break them up into multiple experiments. To do this, just set the name of the experiment you wish to use in your notebook before starting any runs:

import mlflow

mlflow.set_experiment("SVM classifier")

with mlflow.start_run():
    # ...
    pass

If the experiment does not already exist, it will be created for you.

Using experiment tracking from Faculty Jobs

It’s also possible to use experiment tracking with jobs. Just include the same MLflow tracking code as above in the Python script that the job runs, and experiment runs will be logged automatically each time the job executes.

Experiment runs created by jobs display the job and job run number that generated them. Clicking on the job / run shown on an experiment run takes you to the corresponding job, where you can see its logs and other runtime information.

Further reading

For more detail on how to use the experiment tracking feature and MLflow, have a look at the MLflow documentation.