Repository of the course “MLOps and production in DS research 3.0”
Tools and Usage
For details see CONTRIBUTING.md
Repository management methodology
This repository follows a specific development and code maintenance methodology. Below are the main principles and guidelines for working with it.
Branching
We use the following branching strategy:
- `main`: the `main` branch is only for stable, production-ready code. Direct changes to it are prohibited.
- If you want to make changes or add new features, use a separate branch (`git branch feature/*`).
- Before pushing a new commit, run pre-commit!
Pull Requests
- To merge code from `feature/*` branches into `develop`, or from `develop` into `main`, a Pull Request must be created.
- Each Pull Request must be reviewed by at least one colleague before merging.
- Clear comments and a description of what has been done are required in the Pull Request.
Code Commenting
- All code must be adequately commented.
- Long functions or complex code sections require detailed comments.
Testing
- Each new feature must be covered by tests.
- Special attention should be paid to regression testing during feature development.
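As an illustration of this guideline only (the module and function names below are assumptions, not the repository's actual code), a pytest-style feature test plus a regression test could look like this:

```python
# tests/test_preprocessing.py -- illustrative only; module and function names are assumed
import numpy as np

from mlops_course.preprocessing import fill_missing_values  # hypothetical helper


def test_fill_missing_values_removes_nans():
    # Feature test: every missing value must be imputed.
    data = np.array([1.0, np.nan, 3.0])
    assert not np.isnan(fill_missing_values(data)).any()


def test_fill_missing_values_is_stable():
    # Regression test: pin the output for a fixed input so later changes
    # cannot silently alter behaviour (here assuming mean imputation).
    data = np.array([1.0, np.nan, 3.0])
    np.testing.assert_allclose(fill_missing_values(data), [1.0, 2.0, 3.0])
```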
Running a Local Server
- Instructions for running a local server or development environment must be provided in README.md.
- README.md should contain information about environment requirements, dependencies installation, and application startup.
Docker
To get started with the project, follow these steps:
- Clone the repository to your computer using the following command:
```bash
git clone https://gitlab.com/ivan_ds/mlops_course
```
- Navigate to the project directory:
```bash
cd mlops_course
```
- Build the Docker image:
```bash
docker build -t mlops_course .
```
- Run the container:
```bash
docker run -it mlops_course /bin/bash
```
CI stages
- linting
- build and publish of the project as a package to the GitLab PyPI registry (https://gitlab.com/ivan_golt/mlops_course/-/packages)
- DinD build of a Docker image; the image is published to the GitLab Docker Registry
- EDA stage for mlops_course
- pages: publishes the project documentation to GitLab Pages
Snakemake
The following workflow is implemented in the Snakefile (a minimal sketch is shown after the list):
- preprocessing: two types of preprocessing of the input dataset are implemented
- fitting models: at this stage two models are fitted, a GBDT (gbdt_params, config/model/GBT.yaml) and a RandomForest (rndf_params)
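A minimal Snakefile sketch of such a two-stage workflow is shown below; all file paths, script names, and rule names are assumptions rather than the repository's actual layout:

```
# Snakefile -- illustrative sketch only; paths and script names are assumed
rule all:
    input:
        "artifacts/gbt_model.pkl",
        "artifacts/rndf_model.pkl"

rule preprocessing:
    input:
        "data/raw/dataset.csv"
    output:
        "data/processed/dataset.csv"
    script:
        "scripts/preprocess.py"   # scaling / imputation variant chosen via config

rule fit_gbt:
    input:
        "data/processed/dataset.csv"
    output:
        "artifacts/gbt_model.pkl"
    script:
        "scripts/fit_gbt.py"      # gbdt_params from config/model/GBT.yaml

rule fit_rndf:
    input:
        "data/processed/dataset.csv"
    output:
        "artifacts/rndf_model.pkl"
    script:
        "scripts/fit_rndf.py"     # rndf_params
```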
As part of the task, the pipeline was run in a conda virtual environment containing Scikit-learn and the other packages needed to train the ML models inside the Snakemake pipeline. The launch command:
```bash
snakemake --cores 4
```
The artifacts created after running the pipeline are in `artifacts`.
Hydra
The Hydra framework is applied. The configuration file config.yaml includes the parameters of the data scaler and imputers, as well as the parameters of the ML model.
- Two config groups were created (one for preprocessing and one for the model)
- Reading configurations via the Compose API was integrated into the EDA code
- instantiate is used to initialise the model
See the report for details of the implementation in the code; a minimal usage sketch follows.
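The sketch below shows how the Compose API and instantiate can be combined; the config group names, keys, and model target are assumptions, not the repository's actual config.yaml contents:

```python
# Illustrative sketch only: group names, keys, and the _target_ are assumed.
from hydra import compose, initialize
from hydra.utils import instantiate

# Assumed layout:
#   config/config.yaml            defaults: [preprocessing: standard, model: GBT]
#   config/preprocessing/*.yaml   scaler / imputer parameters
#   config/model/GBT.yaml         _target_: sklearn.ensemble.GradientBoostingClassifier + params

with initialize(version_base=None, config_path="config"):
    cfg = compose(config_name="config", overrides=["model=GBT"])

model = instantiate(cfg.model)  # builds the estimator from _target_ and its parameters
```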
DVC
Contacts
If you have any questions or suggestions regarding the development methodology in this repository, please contact us at:
- Email: milovidov.999@gmail.com
- Telegram: @ivan_golt