Repository of the course “MLOps and production in DS research 3.0”
Tools and Usage
For details see CONTRIBUTING.md
Repository management methodology
This repository follows a specific development and code maintenance methodology. Below are the main principles and guidelines for working with it.
Branching
We use the following branching strategy:
- `main`: the `main` branch is only for stable, production-ready code. Direct changes to it are prohibited.
- If you want to make changes or add new features, use a separate branch (`git branch feature/*`).
- Before pushing a new commit, run pre-commit!
Pull Requests
- To merge code from `feature/*` branches into `develop`, or from `develop` into `main`, a Pull Request must be created.
- Each Pull Request must be reviewed by at least one colleague before merging.
- Clear comments and a description of what has been done are required in the Pull Request.
Code Commenting
- All code must be adequately commented.
- Long functions or complex code sections require detailed comments.
Testing
- Each new feature must be covered by tests.
- Special attention should be paid to regression testing during feature development.
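As an illustration of this guideline only (the module and function names below are assumptions, not the repository's actual code), a pytest-style feature test plus a regression test could look like this:

```python
# tests/test_preprocessing.py -- illustrative only; module and function names are assumed
import numpy as np

from mlops_course.preprocessing import fill_missing_values  # hypothetical helper


def test_fill_missing_values_removes_nans():
    # Feature test: every missing value must be imputed.
    data = np.array([1.0, np.nan, 3.0])
    assert not np.isnan(fill_missing_values(data)).any()


def test_fill_missing_values_is_stable():
    # Regression test: pin the output for a fixed input so later changes
    # cannot silently alter behaviour (here assuming mean imputation).
    data = np.array([1.0, np.nan, 3.0])
    np.testing.assert_allclose(fill_missing_values(data), [1.0, 2.0, 3.0])
```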
Running a Local Server
- Instructions for running a local server or development environment must be provided in README.md.
- README.md should contain information about environment requirements, dependencies installation, and application startup.
Docker
To get started with the project, follow these steps:
- Clone the repository to your computer using the following command:
```bash
git clone https://gitlab.com/ivan_ds/mlops_course
```
- Navigate to the project directory:
```bash
cd mlops_course
```
- Build the Docker image:
```bash
docker build -t mlops_course .
```
- Run the container:
```bash
docker run -it mlops_course /bin/bash
```
CI stages
- linting
- build and publish of the project as a package to the GitLab PyPI registry (https://gitlab.com/ivan_golt/mlops_course/-/packages)
- DinD build of a Docker image; the image is published to the GitLab Docker Registry
- EDA stage for mlops_course
- pages: publishes the project documentation to GitLab Pages
Snakemake
The following workflow is implemented in the Snakefile (a minimal sketch is shown after the list):
- preprocessing: two types of preprocessing of the input dataset are implemented
- fitting models: at this stage two models are fitted, a GBDT (gbdt_params, config/model/GBT.yaml) and a RandomForest (rndf_params)
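A minimal Snakefile sketch of such a two-stage workflow is shown below; all file paths, script names, and rule names are assumptions rather than the repository's actual layout:

```
# Snakefile -- illustrative sketch only; paths and script names are assumed
rule all:
    input:
        "artifacts/gbt_model.pkl",
        "artifacts/rndf_model.pkl"

rule preprocessing:
    input:
        "data/raw/dataset.csv"
    output:
        "data/processed/dataset.csv"
    script:
        "scripts/preprocess.py"   # scaling / imputation variant chosen via config

rule fit_gbt:
    input:
        "data/processed/dataset.csv"
    output:
        "artifacts/gbt_model.pkl"
    script:
        "scripts/fit_gbt.py"      # gbdt_params from config/model/GBT.yaml

rule fit_rndf:
    input:
        "data/processed/dataset.csv"
    output:
        "artifacts/rndf_model.pkl"
    script:
        "scripts/fit_rndf.py"     # rndf_params
```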
As part of the task, the pipeline was run in a conda virtual environment containing Scikit-learn and the other packages needed to train the ML models inside the Snakemake pipeline. The launch command:
```bash
snakemake --cores 4
```
The artifacts created after running the pipeline are in `artifacts`.
Hydra
The Hydra framework is applied. The configuration file config.yaml includes the parameters of the data scaler and imputers, as well as the parameters of the ML model.
- Two config groups were created (one for preprocessing and one for the model)
- Reading configurations via the Compose API was integrated into the EDA code
- instantiate is used to initialise the model
See the report for details of the implementation in the code; a minimal usage sketch follows.
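The sketch below shows how the Compose API and instantiate can be combined; the config group names, keys, and model target are assumptions, not the repository's actual config.yaml contents:

```python
# Illustrative sketch only: group names, keys, and the _target_ are assumed.
from hydra import compose, initialize
from hydra.utils import instantiate

# Assumed layout:
#   config/config.yaml            defaults: [preprocessing: standard, model: GBT]
#   config/preprocessing/*.yaml   scaler / imputer parameters
#   config/model/GBT.yaml         _target_: sklearn.ensemble.GradientBoostingClassifier + params

with initialize(version_base=None, config_path="config"):
    cfg = compose(config_name="config", overrides=["model=GBT"])

model = instantiate(cfg.model)  # builds the estimator from _target_ and its parameters
```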
DVC
Contacts
If you have any questions or suggestions regarding the development methodology in this repository, please contact us at:
- Email: milovidov.999@gmail.com
- Telegram: @ivan_golt