I'm an enthusiastic data scientist with over eight years of experience in data analysis, data visualization, and data storytelling. I enjoy solving challenging problems, harnessing the power of machine learning to derive valuable insights, and effectively communicating complex information.
Category | Skill |
---|---|
Programming | |
Data Manipulation | |
Data Visualization | |
AI & Machine Learning | |
Big Data | |
Cloud | |
DevOps | |
Version Control | |
Web Development | |
Development Environments | |
To empower YouTube content creators and marketers with actionable insights into their channel's performance, especially in comparison to related channels, I developed a comprehensive ETL pipeline and designed an interactive Power BI report. This project involved:
- Data Extraction: Utilized the YouTube API to gather extensive data from three selected channels, including videos and comments.
- Data Transformation: Performed sentiment analysis on video comments via API requests to a RoBERTa sentiment analysis model, which I deployed using Gradio on a private Hugging Face Space.
- Data Loading: Stored the transformed data in a MySQL database hosted on AWS.
- Automation: Managed the ETL workflow using Apache Airflow, Docker, and AWS.
- Data Visualization: Designed an interactive Power BI report to deliver insights into channel performance, featuring key metrics and comparative analysis.
This project enables YouTube content creators to easily monitor and evaluate their channel's performance relative to their peers, allowing for more informed decision-making and strategic planning.
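The extract, transform, and load steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the sample comments are made up, `classify_sentiment` is a toy keyword rule standing in for the request to the RoBERTa model, and SQLite from the standard library stands in for the MySQL database on AWS. All function and table names are hypothetical.

```python
import sqlite3

# Hypothetical raw comments, standing in for YouTube API responses.
RAW_COMMENTS = [
    {"channel": "channel_a", "video_id": "v1", "text": "Great video, loved it!"},
    {"channel": "channel_a", "video_id": "v1", "text": "This was terrible."},
    {"channel": "channel_b", "video_id": "v2", "text": "Okay I guess."},
]


def extract():
    """Stand-in for the extraction step (the project calls the YouTube API)."""
    return list(RAW_COMMENTS)


def classify_sentiment(text):
    """Toy keyword rule standing in for the RoBERTa sentiment model."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "loved")):
        return "positive"
    if any(word in lowered for word in ("terrible", "awful")):
        return "negative"
    return "neutral"


def transform(comments):
    """Attach a sentiment label to every comment."""
    return [{**c, "sentiment": classify_sentiment(c["text"])} for c in comments]


def load(rows, conn):
    """Write labelled comments to a table (SQLite here, MySQL in the project)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(channel TEXT, video_id TEXT, text TEXT, sentiment TEXT)"
    )
    conn.executemany(
        "INSERT INTO comments VALUES (:channel, :video_id, :text, :sentiment)",
        rows,
    )
    conn.commit()


def run_pipeline(conn):
    """Run all three steps and return per-channel sentiment counts."""
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT channel, sentiment, COUNT(*) FROM comments "
        "GROUP BY channel, sentiment ORDER BY channel, sentiment"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_pipeline(connection):
        print(row)
```

In a scheduled setup, each of these functions would typically become its own task so that a failed step can be retried without repeating the others.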
- Motivation: Simplify the process of finding rental properties in Singapore's expensive real estate market by using machine learning to estimate rental prices.
- Data Collection: Scraped 1680 property listings from an online property portal, including information on price, size, address, bedrooms, bathrooms and more.
- Exploratory Data Analysis: Visualized property locations on an interactive map, generated a word cloud to extract insights from property agent descriptions, and examined descriptive statistics, distributions, and correlations.
- Data Preprocessing: Handled missing address data and engineered location-related features using the Google Maps API, extracted property features from agent descriptions, and systematically evaluated multiple outlier handling methods.
- Model Training: Trained five machine learning models with baseline configurations, then selected an XGBoost regression model with optimized hyperparameters. On the test set, it achieved an RMSE of 995, a MAPE of 0.13, and an R² of 0.90.
- Model Deployment: Created a web application for serving the XGBoost model using the Flask framework. Containerized this application using Docker and successfully deployed the Docker container on render.com.
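For readers unfamiliar with the three metrics reported above, here is a small, dependency-free sketch of how RMSE, MAPE, and R² are computed. The rent values are hypothetical and chosen only for illustration; they are not taken from the project's dataset, and the project itself most likely relied on library implementations.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.13 means 13%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical monthly rents, for illustration only.
actual = [3000, 4500, 6000, 8000]
predicted = [3300, 4200, 6300, 7600]

if __name__ == "__main__":
    print("RMSE:", rmse(actual, predicted))
    print("MAPE:", mape(actual, predicted))
    print("R2:  ", r2(actual, predicted))
```

RMSE penalizes large misses more heavily, while MAPE expresses the error relative to each listing's price, which is why reporting both gives a fuller picture for a market with a wide price range.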
- Motivation: Develop a hate speech detector for social media comments.
- Data: Utilized the ETHOS Hate Speech Detection Dataset.
- Models: Trained and evaluated the performance of three deep learning models using TensorFlow and scikit-learn. The fine-tuned BERT model demonstrated superior performance (78.0% accuracy) compared to the SimpleRNN (66.3%) and LSTM (70.7%) models.
- Deployment: Prepared the fine-tuned BERT model for production by integrating it into a web application and an API endpoint using the Flask web framework.
[Figures: Fine-tuned BERT confusion matrix; model deployment]
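As a brief illustration of how an accuracy figure relates to a confusion matrix, the sketch below counts the four outcome types for a binary classifier and derives accuracy from them. The labels are hypothetical and unrelated to the ETHOS results reported above.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs for a binary task."""
    return Counter(zip(y_true, y_pred))


def accuracy(counts):
    """Share of predictions on the diagonal of the confusion matrix."""
    correct = counts[(1, 1)] + counts[(0, 0)]
    return correct / sum(counts.values())


# Hypothetical labels: 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)

if __name__ == "__main__":
    print("true positives: ", counts[(1, 1)])
    print("true negatives: ", counts[(0, 0)])
    print("false positives:", counts[(0, 1)])
    print("false negatives:", counts[(1, 0)])
    print("accuracy:       ", accuracy(counts))
```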
This AI-powered tool streamlines the cover letter writing process, transforming job postings into compelling, personalized applications. Simply provide a job posting URL and your personal background information to generate tailored cover letters.
Project Highlights:
- Web Scraping: Retrieves the job description text directly from the job posting URL using `Requests` and `BeautifulSoup`, with automatic language detection using `langdetect`.
- Intelligent Information Extraction: Analyzes job descriptions to identify and extract crucial information, including the job requirements, tasks, job title, and employer details, using OpenAI's chat completions API with structured outputs. Leverages `Pydantic` to validate the JSON output.
- Automated Cover Letter Generation: Creates three unique cover letter suggestions by matching personal qualifications with job requirements using the chat completions API.
- Refinement and Consolidation: Reviews and refines each cover letter, then consolidates them into a final, optimized cover letter, providing you with ready-to-edit drafts.
- Multilingual and Personalization Support: Provides cover letters in both English and German and allows customization for personal background, motivation, salary expectations, and start date.
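To show the idea behind validating a model's JSON output against a schema, here is a dependency-free sketch. It uses a standard-library dataclass as a stand-in for the Pydantic model mentioned above; the `JobInfo` schema, its field names, and `parse_job_info` are assumptions made for illustration, not the tool's actual definitions.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class JobInfo:
    """Hypothetical schema for the fields extracted from a job posting."""

    job_title: str
    employer: str
    requirements: list
    tasks: list


def parse_job_info(raw_json):
    """Parse a JSON response and reject missing or mistyped fields."""
    data = json.loads(raw_json)
    expected = {f.name: f.type for f in fields(JobInfo)}

    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    for name, expected_type in expected.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")

    return JobInfo(**{name: data[name] for name in expected})


if __name__ == "__main__":
    # Hypothetical response text, shaped like the schema above.
    response = json.dumps(
        {
            "job_title": "Data Scientist",
            "employer": "Example GmbH",
            "requirements": ["Python", "SQL"],
            "tasks": ["Build models"],
        }
    )
    print(parse_job_info(response))
```

Validating before generation means a malformed extraction fails early with a clear message instead of producing a cover letter built on incomplete job details.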
Streamline your job application process and increase your chances of landing interviews with professionally crafted cover letters that effectively showcase your qualifications.
Advanced SQL: MySQL for Ecommerce & Web Analytics, Udemy, February 2024, 🔗 see certificate
Skills: MySQL · SQL
AWS Certified Cloud Practitioner, AWS, January 2024, 🔗 see certificate
Skills: Amazon Web Services (AWS)
Ultimate AWS Certified Cloud Practitioner CLF-C02, Udemy, January 2024, 🔗 see certificate
Skills: Amazon Web Services (AWS)
Spark and Python for Big Data with PySpark, Udemy, January 2024, 🔗 see certificate
Skills: Spark · PySpark · AWS · Python · Machine Learning · Linear Regression · Logistic Regression · Decision Trees · Random Forest · Gradient Boosting · k-means clustering · Recommender Systems · Natural Language Processing (NLP)
Microsoft Power BI Data Analyst, Udemy, November 2023, 🔗 see certificate
Skills: Power BI
Deep Learning, alfatraining Bildungszentrum GmbH, April 2023
Skills: TensorFlow · NumPy · Natural Language Processing (NLP) · Python · Deep Learning · Recurrent Neural Networks (RNN) · Neural Networks · Scikit-Learn · Reinforcement Learning · Transfer Learning · Convolutional Neural Networks (CNN) · Time Series Analysis
Machine Learning by Stanford University & DeepLearning.AI, Coursera, April 2023, 🔗 see certificate
Skills: Decision Trees · Recommender Systems · Anomaly Detection · Python · Linear Regression · Neural Networks · Logistic Regression · Reinforcement Learning · Principal Component Analysis · k-means clustering
Python for Machine Learning & Data Science Masterclass, Udemy, March 2023, 🔗 see certificate
Skills: Decision Trees · Support Vector Machine (SVM) · Matplotlib · Random Forest · Naive Bayes · NumPy · Seaborn · Hierarchical Clustering · Natural Language Processing (NLP) · Pandas · Python · Linear Regression · Scikit-Learn · Logistic Regression · Principal Component Analysis · Gradient Boosting · DBSCAN · k-means clustering · K-Nearest Neighbors (KNN)
Machine Learning, alfatraining Bildungszentrum GmbH, February 2023
Skills: Decision Trees · Support Vector Machine (SVM) · Matplotlib · Naive Bayes · NumPy · Hierarchical Clustering · Pandas · Python · Linear Regression · Neural Networks · Scikit-Learn · Principal Component Analysis · DBSCAN · k-means clustering · K-Nearest Neighbors (KNN)
The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert, Udemy, December 2022, 🔗 see certificate
Skills: MySQL · SQL
Profile banner GIF based on the video by RDNE Stock project from Pexels