Changing Storage Backend
Out of the box, LineaPy is the default storage backend for all artifacts. When another storage backend is already in use (e.g., storing model artifacts in MLflow), saving an extra copy of the same artifact into LineaPy can cause sync issues between the two systems. Thus, LineaPy supports using a different storage backend for certain data types (e.g., ML models). This support is essential for users who want to leverage functionality from both LineaPy and other familiar toolkits (e.g., MLflow).
Note
Storage backend refers to the overall system handling storage and should be distinguished from specific storage locations such as Amazon S3. For instance, LineaPy is a storage backend that can use different storage locations.
Saving ML Models to MLflow Backend
By default, LineaPy uses its own backend to save artifacts of all object types. However, users who have access to MLflow may prefer MLflow for saving ML models. Thus, we enable using MLflow as the storage backend for ML models.
Info
Supported model flavors currently include sklearn, xgboost, prophet, and statsmodels. The list is planned to expand to cover all MLflow-supported model flavors.
Info
For objects not supported by MLflow, storage will fall back to the LineaPy backend as usual.
Configuring MLflow
Depending on how MLflow is configured, we might need to specify the tracking URI and (optionally) the registry URI before we can start using MLflow.
mlflow.set_tracking_uri('your_mlflow_tracking_uri')
mlflow.set_registry_uri('your_mlflow_registry_uri')
To make LineaPy aware of MLflow, we need to set the corresponding config items if we want to use MLflow as the storage backend for ML models.
lineapy.options.set('mlflow_tracking_uri','your_mlflow_tracking_uri')
lineapy.options.set('mlflow_registry_uri','your_mlflow_registry_uri')
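With both configs in place, a minimal end-to-end sketch might look like the following. This is illustrative only: it assumes a LineaPy-enabled session (e.g., a notebook with the lineapy extension loaded), a toy scikit-learn model, and a placeholder tracking URI.

import lineapy
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Placeholder URI; replace with your actual MLflow tracking server
mlflow.set_tracking_uri("http://localhost:5000")
lineapy.options.set("mlflow_tracking_uri", "http://localhost:5000")

# Train a toy model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# With mlflow_tracking_uri set, the model artifact is logged to MLflow by default
artifact = lineapy.save(model, "model")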
Setting Default Storage Backend for ML Models
Each user might have a different usage pattern for MLflow; some might use it for logging purposes and record all models under development, while others might treat it as a public space and only publish models that meet specific criteria. In the first case, users want to use MLflow to save artifacts (ML models) by default; in the second case, users only want to use MLflow when they explicitly choose to. Thus, we provide an option (default_ml_models_storage_backend) to let users decide the default storage backend for ML models when mlflow_tracking_uri has been set.
The storage backend used for ML models is determined as follows:
- Set mlflow_tracking_uri but not default_ml_models_storage_backend
lineapy.options.set("mlflow_tracking_uri", "databricks")
lineapy.save(model, "model") # Use MLflow (if mlflow_tracking_uri is set, default value of default_ml_models_storage_backend is mlflow)
lineapy.save(model, "model", storage_backend="mlflow") # Use MLflow
lineapy.save(model, "model", storage_backend="lineapy") # Use LineaPy
- Set mlflow_tracking_uri and set default_ml_models_storage_backend to "mlflow"
lineapy.options.set("mlflow_tracking_uri", "databricks")
lineapy.options.set("default_ml_models_storage_backend", "mlflow")
lineapy.save(model, "model") # Use MLflow
lineapy.save(model, "model", storage_backend="mlflow") # Use MLflow
lineapy.save(model, "model", storage_backend="lineapy") # Use LineaPy
- Set mlflow_tracking_uri and set default_ml_models_storage_backend to "lineapy"
lineapy.options.set("mlflow_tracking_uri", "databricks")
lineapy.options.set("default_ml_models_storage_backend", "lineapy")
lineapy.save(model, "model") # Use LineaPy
lineapy.save(model, "model", storage_backend="mlflow") # Use MLflow
lineapy.save(model, "model", storage_backend="lineapy") # Use LineaPy
Note that when using MLflow as the storage backend, lineapy.save() wraps mlflow.<flavor>.log_model (e.g., mlflow.sklearn.log_model) under the hood. Users can pass any argument accepted by mlflow.<flavor>.log_model to lineapy.save() as well.
For instance, if we want to specify registered_model_name, we can write the save statement as:
lineapy.save(model, name="model", storage_backend="mlflow", registered_model_name="clf")
Retrieving an Artifact from Both LineaPy and MLflow
Depending on what users want to do (or which API they are more familiar with), they can retrieve the same artifact (ML model) through either the LineaPy API or the MLflow API, once the artifact has been saved with lineapy.save() using mlflow as the storage backend.
- Retrieve artifact (model) with LineaPy API
artifact = lineapy.get("model")
lineapy_model = artifact.get_value()
- Retrieve artifact (model) with MLflow API
client = mlflow.MlflowClient()
latest_version = client.search_model_versions("name='clf'")[0].version
# This is the same model as `lineapy_model` in the previous example
mlflow_model = mlflow.sklearn.load_model(f'models:/clf/{latest_version}')
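As a quick sanity check (hypothetical; X stands in for whatever feature matrix the model expects), both handles should produce identical predictions, since they load the same logged model:

import numpy as np

# `X` is a placeholder feature matrix with the shape the model was trained on
assert np.array_equal(lineapy_model.predict(X), mlflow_model.predict(X))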