This web app lets users train a supervised machine learning model with simple configuration of K-fold cross-validation and the train/test split ratio. It also supports modern algorithms such as XGBoost and Random Forest, as well as hyperparameter tuning.
The application is dockerized and deployed on an AWS EC2 instance (http://44.203.130.230:8501/) for public use.
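To make the two configuration options concrete, here is a minimal, standard-library-only sketch of how a train/test split ratio and K folds partition the rows of a dataset. The function names `train_test_split_indices` and `k_fold_indices` are hypothetical and are not taken from this repository; the app itself may rely on library implementations instead.

```python
import random


def train_test_split_indices(n_rows, test_ratio=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hold out 20% of 100 rows for testing, then cross-validate on the rest.
train_idx, test_idx = train_test_split_indices(n_rows=100, test_ratio=0.2)
cv_folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), len(cv_folds))  # 80 20 5
```

Each of the five folds validates on 16 rows and trains on the remaining 64, while the 20 held-out test rows are never seen during cross-validation.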
- wget command (optional, to download a sample dataset)
- A CSV dataset with the label/target in the rightmost column. Example:

| Feature1 | Feature2 | Label |
| --- | --- | --- |
| 1 | 2 | label1 |
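The expected layout can be checked before uploading. The sketch below, which uses only the standard library, reads a small in-memory CSV and separates the feature columns from the rightmost label column. The second data row (`3,4,label2`) is made up for illustration, and numeric features are an assumption.

```python
import csv
import io

# In-memory stand-in for a dataset file; the last column holds the label.
sample = io.StringIO("Feature1,Feature2,Label\n1,2,label1\n3,4,label2\n")

reader = csv.reader(sample)
header = next(reader)
rows = list(reader)

# Everything except the last column is a feature; the last column is the label.
feature_names, label_name = header[:-1], header[-1]
features = [[float(value) for value in row[:-1]] for row in rows]
labels = [row[-1] for row in rows]

print(feature_names, label_name)  # ['Feature1', 'Feature2'] Label
```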
- Prepare a dataset, or download one of the following sample datasets:
wget https://github.com/yiwei-ang/ML_pipeline/blob/main/tests/data/sample_winequality.csv
wget https://github.com/yiwei-ang/ML_pipeline/blob/main/tests/data/sample_iris.csv
- Access the URL and upload the dataset; your results should be ready within seconds!
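One caveat about the sample dataset links above: GitHub `blob` URLs normally serve an HTML page rather than the file itself, so the downloaded file may not be a valid CSV. The sketch below rewrites a `blob` URL into its `raw.githubusercontent.com` form, assuming the standard GitHub URL layout. The helper name `github_blob_to_raw` is hypothetical and not part of this repository.

```python
def github_blob_to_raw(url):
    """Convert a GitHub 'blob' page URL into the matching raw-file URL."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # "<owner>/<repo>/blob/<branch>/<path>" -> "<owner>/<repo>", "<branch>/<path>"
    owner_repo, branch_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}"


raw_url = github_blob_to_raw(
    "https://github.com/yiwei-ang/ML_pipeline/blob/main/tests/data/sample_iris.csv"
)
print(raw_url)
```

The resulting URL can be passed to wget in place of the original link.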
- Install git and the latest Anaconda/Miniconda from: https://www.anaconda.com/products/distribution
- Clone the repository:
git clone git@github.com:yiwei-ang/ML_pipeline.git
- Open an Anaconda prompt or a terminal where conda is available, then run the following to prepare a conda environment:
conda env create -f environment.yml
conda activate ml_pipeline
- Run the application from the repository root:
streamlit run engine/frontend/main.py
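- (Optional) To run a quick Python test from the repository root:
import pandas as pd
from engine.model.model import SupervisedModels

# Load the sample dataset (label in the rightmost column).
df = pd.read_csv("tests/data/sample_winequality.csv")

# Build the model wrapper and run the full training pipeline.
model = SupervisedModels(input_data=df)
result = model.run_pipeline()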
- Add more analysis to UI:
  - Feature importance
  - AUC/ROC curve
  - Learning curve
- Techdebt:
  - Add make arguments for flexibility based on the problem type (binary/classification)
- Add more features:
  - Config before run
    - Model Type (optional)
    - Problem Type
    - Train/test split ratio
    - Tuning (yes/no)
  - Download results
  - Whether a DB is required for a long-running training service
  - Running time and UI
- Config before run
- Schema validator (We can use Pydantic) on POST request.
- Preview Dataset before processing
- Dockerize the process and deployment.
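
As a rough illustration of the "Config before run" and "Schema validator" items, the sketch below validates a run configuration with a standard-library dataclass. This is a stand-in, not Pydantic: a Pydantic model would express the same checks declaratively. The class name `RunConfig`, the field names, the default values, and the allowed problem types are all assumptions, not code from this repository.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of problem types; the real app may use different names.
ALLOWED_PROBLEM_TYPES = {"binary", "multiclass"}


@dataclass
class RunConfig:
    """Hypothetical run configuration mirroring the options listed above."""

    problem_type: str
    train_test_split_ratio: float = 0.8  # fraction of rows used for training
    tuning: bool = False
    model_type: Optional[str] = None  # optional, e.g. "random_forest"

    def __post_init__(self):
        # Reject invalid values as soon as the config object is created.
        if self.problem_type not in ALLOWED_PROBLEM_TYPES:
            raise ValueError(f"unknown problem type: {self.problem_type!r}")
        if not 0.0 < self.train_test_split_ratio < 1.0:
            raise ValueError("train_test_split_ratio must be between 0 and 1")


config = RunConfig(problem_type="binary", train_test_split_ratio=0.75, tuning=True)
print(config)
```

Validating at construction time means a bad request fails before any training starts, which matters more once training becomes a long-running service.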