MLflow report

This task is located in the mlflow branch.

This experiment compares the metrics of two models, 1) catboost and 2) logistic_regression, fitted on TF-IDF embeddings to predict toxic comments.

MLflow infrastructure

In this step we simulate how MLflow is used in production by running it in Docker containers. Our docker-compose.yaml file defines several services:

minio: creates an S3-like storage service, which is useful because production setups usually rely on storage such as AWS S3 or Google Storage.

pgsql: acts as the database used for the MLflow backend store.

mlflow-web: acts as the MLflow Tracking UI, which helps us manage our experiments.

Dockerfile of mlflow server

Running the MLflow service

Before starting our experiment, we need to start the services by running the command:

docker-compose up -d

This starts all of our services. Once they are all running, there are a few things to do. First, create the storage bucket: visit http://localhost:9000, log in with the credentials defined in docker-compose.yaml, and create a bucket named mlflow.
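The bucket can also be created from code rather than through the web console. The following is a minimal sketch, assuming the boto3 package is installed and that MinIO accepts the mlflow / password key pair and the 127.0.0.1:9000 endpoint used later in this report. The helper names ensure_bucket and make_minio_client are hypothetical, and the region value is an assumption.

```python
def ensure_bucket(client, name="mlflow"):
    """Create the bucket if it does not exist yet; return True if it was created."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True


def make_minio_client(endpoint_url="http://127.0.0.1:9000",
                      access_key="mlflow", secret_key="password"):
    """Build an S3 client pointed at the local MinIO service (assumed values)."""
    import boto3  # imported lazily so ensure_bucket works with any S3-like client
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",  # assumption: MinIO's default region
    )
```

With the services running, calling ensure_bucket(make_minio_client()) should create the mlflow bucket on the first call and leave it untouched on later calls.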

Access the MLflow UI by visiting http://localhost:5000.

Running MLflow experiments

To run MLflow experiments on a local computer, we need to define the following OS environment variables:

os.environ["AWS_ACCESS_KEY_ID"] = "mlflow"
os.environ["AWS_SECRET_ACCESS_KEY"] = "password"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://127.0.0.1:9000"

We also need to define the tracking URI and the path to the local dataset:

MLFLOW_TRACKING_URI = "http://localhost:5000"
DATASET_PATH = "../data/mlflow_data/toxic_comments.csv"

Then we set the tracking URI and the experiment name:

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
embeddings_experiment = mlflow.set_experiment("TF_IDF")

Then we start an MLflow run:

run_name = "logistic_regression"
with mlflow.start_run(run_name=run_name) as run:
    ...

In MLflow we have two runs: 1) catboost and 2) logistic regression.
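The body of the run is elided above. As a rough sketch of what such a run might log (not the notebook's actual code), the example below computes a few binary-classification metrics in plain Python and records them with mlflow.log_params and mlflow.log_metrics. The choice of metrics, the helper names binary_metrics and log_run, and the convention that label 1 means toxic are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = toxic, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def log_run(run_name, params, y_true, y_pred):
    """Open an MLflow run and log the given params plus the metrics above."""
    import mlflow  # the tracking URI and experiment must already be set
    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)  # params: a dict of hypothetical settings
        mlflow.log_metrics(binary_metrics(y_true, y_pred))
        return run.info.run_id
```

Calling log_run once per model, for example with run_name="catboost" and run_name="logistic_regression", would produce two runs under the same experiment that share metric names and can therefore be compared side by side.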

In the MLflow UI we can compare the metrics of the two models.
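The same comparison can be made programmatically. Below is a minimal sketch, assuming both runs logged a metric called f1 (an assumption; this report does not list the metric names). mlflow.search_runs returns the runs of an experiment as a pandas DataFrame, and the hypothetical helper rank_runs orders the resulting records by that metric.

```python
def _has_value(value):
    """True for real numbers; False for None and NaN (NaN is not equal to itself)."""
    return value is not None and value == value


def rank_runs(records, metric="f1"):
    """Sort run records (dicts) by a metric, best first; skip runs lacking it."""
    key = f"metrics.{metric}"
    scored = [r for r in records if _has_value(r.get(key))]
    return sorted(scored, key=lambda r: r[key], reverse=True)


def fetch_runs(experiment_name="TF_IDF"):
    """Fetch all runs of the experiment as a list of dicts, one per run."""
    import mlflow  # the tracking URI must already be set
    runs = mlflow.search_runs(experiment_names=[experiment_name])
    return runs.to_dict("records")
```

For example, rank_runs(fetch_runs())[0]["tags.mlflow.runName"] would give the name of the run with the highest f1, provided that metric was actually logged.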

After the experiments finish, the model and metrics artifacts are uploaded to the S3 storage.

The full code for this task is in mlflow.ipynb.