ML Pipeline

This web app lets users train a supervised machine learning model with simple configuration of K-fold cross-validation and the train/test split ratio. It supports modern algorithms such as XGBoost and Random Forest, with hyperparameter tuning.

The application is dockerized and deployed on an AWS EC2 instance (http://44.203.130.230:8501/) for public use.
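
For context, the kind of training the app automates can be sketched with scikit-learn. This is only an illustration of a train/test split plus K-fold cross-validation with a Random Forest, not the application's internal code; the dataset path, split ratio, and hyperparameters below are arbitrary assumptions.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Features in the leading columns, label/target in the rightmost column.
df = pd.read_csv("tests/data/sample_winequality.csv")
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# Hold out a test set (the split ratio is one of the configurable options).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# K-fold cross-validation on the training set.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
print(f"Mean CV accuracy: {cross_val_score(model, X_train, y_train, cv=cv).mean():.3f}")

# Final fit and evaluation on the held-out test set.
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")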


How to use the application

Application URL: http://44.203.130.230:8501/

Requirements:

  • the wget command (optional, for downloading a sample dataset)
  • a CSV dataset with the label/target in the rightmost column, for example (see the snippet after this list):
    Feature1,Feature2,Label
    1,2,label1
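
To check that your own data matches this layout, a minimal pandas snippet such as the one below writes a CSV with the label as the rightmost column; the file name and values are placeholders.

import pandas as pd

# Tiny illustrative dataset with the label/target as the last column.
df = pd.DataFrame({
    "Feature1": [1, 3, 5],
    "Feature2": [2, 4, 6],
    "Label": ["label1", "label2", "label1"],
})
df.to_csv("my_dataset.csv", index=False)  # ready to upload to the app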

Steps

  1. Prepare a sample dataset, or download one of the following datasets:
wget https://raw.githubusercontent.com/yiwei-ang/ML_pipeline/main/tests/data/sample_winequality.csv
wget https://raw.githubusercontent.com/yiwei-ang/ML_pipeline/main/tests/data/sample_iris.csv
  2. Access the URL and upload the dataset; your results should be ready within seconds!

How to run locally

  1. Install git and the latest Anaconda/Miniconda from https://www.anaconda.com/products/distribution
  2. Clone repository:
git clone [email protected]:yiwei-ang/ML_pipeline.git
  3. Open an Anaconda Prompt (or any terminal with conda available) and run the following to create the conda environment:
conda env create -f environment.yml
conda activate ml_pipeline
  4. Run the application from the repository root:
streamlit run engine/frontend/main.py
  5. (Optional) To run a quick Python test:
from engine.model.model import SupervisedModels
import pandas as pd

# Load a sample dataset (features first, label in the rightmost column).
df = pd.read_csv("tests/data/sample_winequality.csv")

# Train the supervised models and collect the results.
model = SupervisedModels(input_data=df)
result = model.run_pipeline()
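
Since the deployed application is dockerized, another way to run it locally is in a container. The commands below are a minimal sketch, assuming the repository's Dockerfile starts Streamlit on its default port 8501; the ml_pipeline image tag is arbitrary.

docker build -t ml_pipeline .
docker run -p 8501:8501 ml_pipeline
# Then open http://localhost:8501 in a browser.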

To-Do

  • Add more analyses to the UI:
    • Feature importance
    • AUC/ROC curve
    • Learning curve
  • Techdebt:
    • Add make arguments for flexibility based on the problem type (binary/classification)
  • Add more features:
    • Config before run
      • Model Type - optional
      • Problem Type
      • Train Test Split ratio
    • Tuning (yes/no)
    • Download results
      • Whether a DB is required for a long-running training service.
    • Show the running time in the UI
  • Schema validator (we can use Pydantic) on the POST request (see the sketch after this list).
  • Preview Dataset before processing
  • Dockerize the process and deployment.
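
As a starting point for the schema-validator item above, a Pydantic sketch could look like the following; the field names and constraints are assumptions for illustration, not the project's actual request schema.

from typing import Optional
from pydantic import BaseModel, Field

class TrainingRequest(BaseModel):
    # Assumed payload for a training POST request.
    problem_type: str = Field(..., description="e.g. 'classification' or 'regression'")
    model_type: Optional[str] = None  # e.g. 'xgboost' or 'random_forest'
    train_test_split_ratio: float = Field(0.8, gt=0.0, lt=1.0)
    tuning: bool = False

# Validation runs on construction and raises pydantic.ValidationError on bad input.
request = TrainingRequest(problem_type="classification", train_test_split_ratio=0.75)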
