An in-depth analysis of a gaming retailer's customer data to improve overall sales performance through enhanced understanding of customer behavior and loyalty patterns.

Sujith-DA279/TurtleGames_Prediction_Analysis

Turtle Games: Predicting Future Outcomes

📋 Project Brief

I conducted an in-depth analysis of Turtle Games customer data to improve overall sales performance through enhanced understanding of customer behavior and loyalty patterns. Using Python, R, and advanced statistical methods, I delivered actionable insights that can drive enhanced marketing effectiveness, customer retention, and targeted product offerings to increase revenue.

✅ Objectives

The analysis addressed three key business needs:

  • Understanding customer behavior and purchasing patterns.
  • Identifying key factors influencing customer loyalty.
  • Developing predictive models to forecast future sales.

These needs were addressed using several analytical techniques:

  • Customer Loyalty Drivers Analysis: Applied linear regression modeling to identify and quantify key factors influencing customer loyalty
  • Customer Segmentation: Implemented decision tree and K-means clustering to develop behavior-driven customer personas
  • Sentiment Analysis: Utilized NLP techniques (VADER, TextBlob) on 2,000 customer reviews to assess product satisfaction
  • Integrated Recommendation System: Combined quantitative insights with sentiment analysis to create segment-specific strategies

🎯 Key Findings & Business Impact

The analysis revealed significant opportunities for enhanced customer engagement through strategic loyalty program redesign and targeted marketing. Spending score and remuneration emerged as primary loyalty drivers (correlation scores of 0.67 and 0.62), while five distinct customer segments were identified with significantly different loyalty behaviors (ranging from 275 to 3988 average loyalty points).

Data-Driven Recommendations:

  • Implement a tiered loyalty program with differentiated benefits based on five identified customer segments
  • Deploy predictive loyalty model to anticipate customer behavior and tailor marketing approaches
  • Enhance product quality and gameplay mechanics based on sentiment analysis findings
  • Develop targeted strategies for "Occasional Affluents" with high income but inconsistent spending patterns

Project Overview:

| ⏱️ Duration | 🏆 Grade | 🛠️ Technologies | 🧠 ML Algorithms | 📊 Datasets |
| --- | --- | --- | --- | --- |
| 6 Weeks | 97% (High Distinction) | | | 2,000 customer reviews<br>6 customer attributes |

Analytical Approach:

1. Data Preparation & Engineering

| Process | Highlights & Technical Implementation |
| --- | --- |
| Data Import & Validation | • Loaded customer reviews dataset with purchase and demographic data<br>• Validated the dataset with the .info() and .describe() methods and the .shape attribute<br>• Confirmed no missing values in the dataset with .isna().sum() |
| Data Cleaning | • Identified and retained potential outliers in loyalty points to preserve dataset integrity<br>• Flagged 2.5% of age data showing potential misreporting<br>• Applied exploratory visualizations to assess data distributions and relationships |
| Feature Analysis | • Analyzed correlation matrix between quantitative variables<br>• Visualized relationships between loyalty points and potential predictor variables<br>• Assessed categorical variables (gender, education) for their impact on loyalty points |
| Feature Engineering | • Applied log and square root transformations to address skewness in loyalty points<br>• One-hot encoded categorical features for decision tree analysis<br>• Created normalized features for clustering algorithms |
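The preparation steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook: the column names, the random data, and the choice of `log1p` and z-score normalization are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the customer dataset; all column names are assumed.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "remuneration": rng.uniform(12, 112, size=n).round(2),  # income in £k
    "spending_score": rng.integers(1, 100, size=n),
    "education": rng.choice(["graduate", "phd", "diploma"], size=n),
})
# Right-skewed loyalty points, so the transformations have something to fix.
df["loyalty_points"] = rng.gamma(shape=2.0, scale=800.0, size=n).round().astype(int)

# Validation: .info() and .describe() are methods, .shape is an attribute.
print(df.shape)
df.info()
print(df.describe())
missing = df.isna().sum()  # per-column count of missing values

# Log and square root transformations to reduce skewness.
df["loyalty_log"] = np.log1p(df["loyalty_points"])
df["loyalty_sqrt"] = np.sqrt(df["loyalty_points"])
skew = df[["loyalty_points", "loyalty_log", "loyalty_sqrt"]].skew()
print(skew)

# One-hot encode the categorical feature for tree-based models.
encoded = pd.get_dummies(df, columns=["education"])

# Z-score normalization of the clustering features.
features = df[["remuneration", "spending_score"]]
normalised = (features - features.mean()) / features.std(ddof=0)
```

Comparing the three skewness values is a quick way to decide which transformation, if any, is worth carrying into the regression models.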

2. Linear Regression Analysis

| Process | Technical Implementation |
| --- | --- |
| Simple Linear Regression | • Applied the OLS function in statsmodels to analyze relationships between loyalty points and individual predictors<br>• Developed multiple iterations of models to identify optimal variable combinations<br>• Implemented model validation through training/test split |
| Multiple Linear Regression | • Created comprehensive models incorporating spending score, remuneration and age<br>• Applied statistical tests for normality (Shapiro-Wilk) and heteroscedasticity (Breusch-Pagan)<br>• Evaluated model performance using adjusted R-squared (84.0%) |
| Model Optimization | • Tested variable transformations (log, square root) to improve model performance<br>• Implemented Weighted Least Squares regression to address heteroscedasticity<br>• Validated final model with VIF assessment to confirm absence of multicollinearity |
[Figure: OLS_LoyaltyPoints] Simple and multiple linear regressions plotted

3. Customer Segmentation Analysis

| Component | Technical Implementation |
| --- | --- |
| Decision Tree Analysis | • Applied DecisionTreeRegressor from sklearn to identify key loyalty drivers<br>• Implemented feature importance analysis to identify spending score and remuneration as primary factors<br>• Used GridSearchCV to optimize hyperparameters (max_depth=3, max_leaf_nodes=40)<br>• Achieved 91.04% R-squared score on the full dataset |
| K-means Clustering | • Used elbow method and silhouette scores to determine optimal number of clusters (k=5)<br>• Applied K-means clustering to segment customers based on spending score and remuneration<br>• Achieved silhouette score of 0.604 indicating good cluster separation<br>• Characterized clusters through descriptive statistics and visualization |
| Persona Development | • Created detailed customer personas based on cluster analysis<br>• Developed cross-tabulation analysis to examine loyalty point distribution across segments<br>• Generated cluster visualization using scatter plots with color-coding<br>• Formulated segment-specific marketing recommendations |
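A hyperparameter search of the kind described for the decision tree might look like the sketch below. The data is synthetic, and the parameter grid, the 70/30 split, and the 5-fold cross-validation are assumptions for illustration rather than the project's actual settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: loyalty driven by spending and income, with age irrelevant.
rng = np.random.default_rng(1)
n = 600
spending = rng.uniform(1, 100, n)
remuneration = rng.uniform(12, 112, n)
age = rng.uniform(18, 70, n)
X = np.column_stack([spending, remuneration, age])
y = 0.4 * spending * remuneration + rng.normal(0, 100, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical grid; the values actually searched in the project are not given.
param_grid = {"max_depth": [2, 3, 4, 5], "max_leaf_nodes": [8, 20, 40]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42), param_grid, cv=5, scoring="r2"
)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
test_r2 = best_tree.score(X_test, y_test)  # R-squared on held-out data
importances = dict(
    zip(["spending_score", "remuneration", "age"], best_tree.feature_importances_)
)
print(search.best_params_)
print(f"Held-out R-squared: {test_r2:.3f}")
print(importances)
```

One detail worth knowing when reading tuned values: a binary tree with max_depth=3 can have at most 8 leaves, so a max_leaf_nodes setting of 40 does not constrain it further. Scoring on a held-out split, as done here, also gives a more conservative estimate than scoring on the full dataset.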
[Figure: ClassificationKNN] Scatter plot of loyalty points showing the distinct customer clusters
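Choosing k with the elbow method and silhouette scores can be sketched as follows. The five synthetic groups, their centres, and the range of k values tried are assumptions chosen so the example has a clear answer; the real customer data will be less tidy.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Five synthetic customer groups in (remuneration, spending_score) space.
rng = np.random.default_rng(7)
centres = np.array(
    [[20, 20], [20, 80], [50, 50], [80, 20], [80, 80]], dtype=float
)
points = np.vstack([c + rng.normal(0, 6, size=(80, 2)) for c in centres])

# K-means is distance-based, so put both features on the same scale.
X = StandardScaler().fit_transform(points)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_                         # elbow method input
    silhouettes[k] = silhouette_score(X, model.labels_)  # separation quality

best_k = max(silhouettes, key=silhouettes.get)
final = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)
print(f"Best k by silhouette: {best_k}")
print(f"Silhouette at best k: {silhouettes[best_k]:.3f}")
```

Plotting `inertias` against k gives the elbow curve; the silhouette score provides a second, less subjective check on the same choice.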

4. Sentiment Analysis

| Process | Technical Implementation |
| --- | --- |
| Text Preprocessing | • Tokenized and cleaned review and summary text using NLTK<br>• Verified no relevant duplicates in the review dataset<br>• Performed a manual verification on 20 rows to identify the most reliable text source (summary or full review text) |
| Sentiment Classification | • Implemented VADER SIA model for sentiment polarity scoring<br>• Applied TextBlob for polarity and subjectivity analysis<br>• Conducted comparative analysis between VADER (72.5% accuracy) and TextBlob (22.5% accuracy)<br>• Created sentiment distribution visualizations |
| Word Analysis | • Generated word clouds for positive and negative sentiment reviews<br>• Extracted top 20 frequent terms by sentiment polarity<br>• Identified key positive terms (play, great, love, fun) and negative terms (anger, disappointed, boring)<br>• Conducted product-level sentiment aggregation |
[Figure: SentimentPolarity] Distribution showing Sentiment Polarity and Subjectivity scores with a rule-based classification of sentiment scores
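A rule-based classification of polarity scores, and the accuracy comparison against manually labelled rows, can be sketched as below. The ±0.05 cut-offs are the thresholds conventionally used with VADER's compound score; whether the project used exactly these is an assumption, and the scores and manual labels are made up for the example.

```python
from collections import Counter


def classify(compound: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted: list, actual: list) -> float:
    """Share of predictions that match the manual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical compound scores. In practice they would come from, for example,
# nltk's SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
scores = [0.82, -0.41, 0.02, 0.67, -0.05, 0.31]
manual = ["positive", "negative", "neutral", "positive", "neutral", "positive"]

predicted = [classify(s) for s in scores]
print(Counter(predicted))
print(f"Accuracy vs manual labels: {accuracy(predicted, manual):.3f}")
```

Running the same `accuracy` function on labels derived from each analyzer's scores gives a like-for-like comparison between the two tools on the manually checked rows.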


πŸ—οΈ Key Insights:

Loyalty Drivers Analysis

  • Identified spending score (0.67) and remuneration (0.62) as strongest predictors of customer loyalty
  • Determined that age has a weak negative correlation (-0.04) with loyalty points but remains statistically significant
  • Developed robust Weighted Least Squares regression model explaining 82.1% of loyalty point variance
  • Created predictive formula: Loyalty Points = -1944.89 + 31.87 × Spending Score + 31.39 × Remuneration + 10.57 × Age

Important

Value Discovery: Every 1-point increase in spending score correlates with a 31.87-point increase in loyalty points, while each £1,000 in income corresponds to a 31.39-point increase
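As a worked example, the predictive formula can be wrapped in a small function. The coefficients are taken from the formula above; the function name, the argument names, and the sample customer are hypothetical, and remuneration is assumed to be expressed in thousands of pounds.

```python
def predict_loyalty_points(spending_score: float, remuneration_k: float, age: float) -> float:
    """Apply the reported regression formula (remuneration in £k)."""
    return (-1944.89
            + 31.87 * spending_score
            + 31.39 * remuneration_k
            + 10.57 * age)


# Hypothetical customer: spending score 50, income £50k, age 40.
base = predict_loyalty_points(50, 50, 40)
print(round(base, 2))  # 1640.91

# Marginal effect of one extra spending-score point, all else held equal.
delta = predict_loyalty_points(51, 50, 40) - base
print(round(delta, 2))  # 31.87
```

Because the intercept is strongly negative, the formula can return negative loyalty points for customers with low spending scores and incomes, so predictions near the bottom of the range should be treated with caution.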

Customer Segmentation

  • Identified five distinct customer segments with clear behavior patterns and loyalty characteristics
  • Labelled "Premium Buyers" (17.8% of customers) averaging 3988 loyalty points with high income and frequent purchases
  • Classified "Occasional Affluents" (16.5% of customers) with high income but relatively low loyalty engagement (912 points)
  • Categorized the largest segment as "Regular Customers" (38.7%) with mid-range income and consistent spending

Important

Value Discovery: "Bargain Hunters" (13.4% of customers) show high spending despite lower income, representing opportunity for targeted value-based marketing

Sentiment Analysis

  • Determined that 90% of customer reviews express positive sentiment toward Turtle Games products
  • Validated that VADER sentiment analyzer (72.5% accuracy) significantly outperformed TextBlob (22.5%) for game product reviews
  • Identified key positive review themes around gameplay experience, physical components, and family enjoyment
  • Located negative sentiment clusters around product quality issues, age-appropriateness, and gameplay mechanics

Important

Value Discovery: Premium Buyers show highest positive sentiment (90%) with only 5% negative reviews, indicating strong correlation between satisfaction and loyalty

[Figure: Sentiment_Product_CustomerTrends] Average Sentiment Polarity for different products and Customer Personas


📊 Data-Driven Recommendations:

Based on comprehensive data analysis, the following strategic recommendations were developed to enhance Turtle Games' business performance:

Loyalty Program Optimization

  • Implement a tiered, gamified loyalty program with distinct tiers matching the five customer segments
  • Offer VIP "Platinum" benefits to Premium Buyers, including exclusive products and early access
  • Create customized "Gold" tier for Regular Customers with bundle offers and upselling opportunities
  • Develop entry-level tiers with targeted incentives for Basic Buyers, Bargain Hunters, and Occasional Affluents
  • Deploy the WLS predictive model to anticipate loyalty behaviors and tailor marketing approaches

Customer Engagement Strategies

  • Develop targeted strategies for "Occasional Affluents" who have high income but inconsistent spending
  • Create personalized incentives based on spending score and income level analysis
  • Investigate and address lower engagement among younger customers with age-appropriate offerings
  • Monitor segment migration patterns to evaluate effectiveness of marketing interventions
  • Implement clear upgrade paths between segments with tailored incentives

Product Development & Quality Enhancement

  • Address product quality issues identified through sentiment analysis, particularly for budget offerings
  • Enhance gameplay mechanics and instructions based on negative sentiment patterns
  • Develop additional family-oriented products based on positive sentiment around family gameplay
  • Implement feature-based rating system to gather more structured feedback on product qualities
  • Invest in advanced NLP techniques fine-tuned for Turtle Games' review data to extract deeper insights

Technical & Data Recommendations

  • Enhance data collection to include customer ID information to reduce potential duplicate analysis
  • Improve age data collection to address the 2.5% potential misreporting identified
  • Clarify loyalty points mechanics including expiration dates and redemption structures
  • Add timestamp information to purchase and review data to enable trend analysis
  • Develop comprehensive product category mappings to enable more granular product performance analysis