I conducted an in-depth analysis of Turtle Games customer data to improve overall sales performance through a better understanding of customer behavior and loyalty patterns. Using Python, R, and statistical modeling, I delivered actionable insights to support more effective marketing, stronger customer retention, and targeted product offerings that increase revenue.
The analysis addressed three key business needs:
- Understanding customer behavior and purchasing patterns.
- Identifying key factors influencing customer loyalty.
- Developing predictive models to forecast future sales.
These needs were addressed through four analytical techniques:
- Customer Loyalty Drivers Analysis: Applied linear regression modeling to identify and quantify key factors influencing customer loyalty
- Customer Segmentation: Implemented decision tree regression and K-means clustering to develop behavior-driven customer personas
- Sentiment Analysis: Utilized NLP techniques (VADER, TextBlob) on 2,000 customer reviews to assess product satisfaction
- Integrated Recommendation System: Combined quantitative insights with sentiment analysis to create segment-specific strategies
The analysis revealed significant opportunities for enhanced customer engagement through strategic loyalty program redesign and targeted marketing. Spending score and remuneration emerged as the primary loyalty drivers (correlation coefficients of 0.67 and 0.62 with loyalty points), and five distinct customer segments were identified with markedly different loyalty behaviors (ranging from 275 to 3988 average loyalty points).
Data-Driven Recommendations:
- Implement a tiered loyalty program with differentiated benefits based on five identified customer segments
- Deploy predictive loyalty model to anticipate customer behavior and tailor marketing approaches
- Enhance product quality and gameplay mechanics based on sentiment analysis findings
- Develop targeted strategies for "Occasional Affluents" with high income but inconsistent spending patterns
Duration | Grade | Technologies | ML Algorithms | Datasets |
---|---|---|---|---|
6 Weeks | 97% (High Distinction) | | | 2,000 customer reviews, 6 customer attributes |
Process | Highlights & Technical Implementation |
---|---|
Data Import & Validation | • Loaded customer reviews dataset with purchase and demographic data • Performed dataset validation with .info(), .describe(), and the .shape attribute • Confirmed no missing values in the dataset with .isna().sum() |
Data Cleaning | • Identified and retained potential outliers in loyalty points to preserve dataset integrity • Flagged 2.5% of age data showing potential misreporting • Applied exploratory visualizations to assess data distributions and relationships |
Feature Analysis | • Analyzed correlation matrix between quantitative variables • Visualized relationships between loyalty points and potential predictor variables • Assessed categorical variables (gender, education) for their impact on loyalty points |
Feature Engineering | • Applied log and square root transformations to address skewness in loyalty points • One-hot encoded categorical features for decision tree analysis • Created normalized features for clustering algorithms |
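
The feature engineering steps in the table above can be sketched as follows. This is a minimal illustration, not the project's notebook: the sample rows, the column names, and the choice of min-max scaling for normalization are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "loyalty_points": [210, 524, 40, 1622, 3500, 866],
    "spending_score": [39, 81, 6, 77, 91, 50],
    "remuneration": [12.3, 12.3, 13.1, 63.9, 98.4, 44.3],
    "education": ["graduate", "graduate", "PhD", "diploma", "graduate", "PhD"],
})

# Log and square-root transforms, two common ways to reduce right skew.
df["loyalty_log"] = np.log1p(df["loyalty_points"])
df["loyalty_sqrt"] = np.sqrt(df["loyalty_points"])

# One-hot encode the categorical column so tree models receive numeric input.
encoded = pd.get_dummies(df, columns=["education"], prefix="edu")

# Min-max scale the clustering inputs to [0, 1] (assumed normalization method).
cluster_cols = ["spending_score", "remuneration"]
span = df[cluster_cols].max() - df[cluster_cols].min()
normalised = (df[cluster_cols] - df[cluster_cols].min()) / span

print(encoded.columns.tolist())
print(normalised.round(2))
```

Scaling matters for K-means because the algorithm relies on Euclidean distance, so an unscaled feature with a wider range would dominate the clustering.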
Process | Technical Implementation |
---|---|
Simple Linear Regression | • Applied the OLS function in statsmodels to analyze relationships between loyalty points and individual predictors • Developed multiple iterations of models to identify optimal variable combinations • Implemented model validation through training/test split |
Multiple Linear Regression | • Created comprehensive models incorporating spending score, remuneration and age • Applied statistical tests for normality (Shapiro-Wilk) and heteroscedasticity (Breusch-Pagan) • Evaluated model performance using adjusted R-squared (84.0%) |
Model Optimization | • Tested variable transformations (log, square root) to improve model performance • Implemented Weighted Least Squares regression to address heteroscedasticity • Validated final model with VIF assessment to confirm absence of multicollinearity |
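
To make the Weighted Least Squares and VIF steps concrete, the sketch below implements both in plain NumPy on synthetic data. It is an illustration only: the project itself used statsmodels, the data here is generated rather than taken from Turtle Games, and the weights (inverse of spending score squared) are an assumption chosen to match the noise built into the synthetic data. The weighting scheme used in the project is not stated.

```python
import numpy as np

def fit_wls(X, y, weights):
    """Minimise sum(w_i * (y_i - b0 - x_i @ b)^2); returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(w) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the remaining columns."""
    factors = []
    for j in range(X.shape[1]):
        design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        residual = X[:, j] - design @ coef
        r_squared = 1 - residual.var() / X[:, j].var()
        factors.append(1 / (1 - r_squared))
    return np.array(factors)

# Synthetic predictors (hypothetical ranges), with noise that grows with spending.
rng = np.random.default_rng(0)
n = 500
spending = rng.uniform(1, 100, n)
remuneration = rng.uniform(12, 112, n)
age = rng.uniform(18, 70, n)
X = np.column_stack([spending, remuneration, age])
true_beta = np.array([-1900.0, 32.0, 31.0, 10.0])  # made-up coefficients
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 2 * spending)

# Down-weight the noisier observations: weight = 1 / variance (up to a constant).
beta = fit_wls(X, y, weights=1 / spending**2)
vif_values = vif(X)
print(np.round(beta, 2))
print(np.round(vif_values, 3))
```

VIF values close to 1 indicate that a predictor is nearly uncorrelated with the others; values above roughly 5 to 10 are commonly treated as a sign of problematic multicollinearity.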
Component | Technical Implementation |
---|---|
Decision Tree Analysis | • Applied DecisionTreeRegressor from sklearn to identify key loyalty drivers • Implemented feature importance analysis to identify spending score and remuneration as primary factors • Used GridSearchCV to optimize hyperparameters (max_depth=3, max_leaf_nodes=40) • Achieved 91.04% R-squared score on the full dataset |
K-means Clustering | • Used elbow method and silhouette scores to determine optimal number of clusters (k=5) • Applied K-means clustering to segment customers based on spending score and remuneration • Achieved silhouette score of 0.604 indicating good cluster separation • Characterized clusters through descriptive statistics and visualization |
Persona Development | • Created detailed customer personas based on cluster analysis • Developed cross-tabulation analysis to examine loyalty point distribution across segments • Generated cluster visualization using scatter plots with color-coding • Formulated segment-specific marketing recommendations |
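
Choosing the number of clusters by silhouette score, as described in the K-means row above, might look like the sketch below. The data is a synthetic stand-in for remuneration and spending score generated with `make_blobs`, so the scores it prints are not the project's results; the candidate range of k and the use of `StandardScaler` are also assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic (remuneration, spending score) points around five made-up centres.
centers = [(20, 20), (20, 80), (50, 50), (80, 20), (80, 80)]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=5.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The k with the highest silhouette score gives the best-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

In practice the silhouette result is usually read alongside the elbow plot of inertia, since either criterion alone can be ambiguous on real customer data.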
Process | Technical Implementation |
---|---|
Text Preprocessing | • Tokenized and cleaned review and summary text using NLTK • Verified no relevant duplicates in the review dataset • Performed a manual verification on 20 rows to identify most reliable text source (summary or full review text) |
Sentiment Classification | • Implemented VADER SIA model for sentiment polarity scoring • Applied TextBlob for polarity and subjectivity analysis • Conducted comparative analysis between VADER (72.5% accuracy) and TextBlob (22.5% accuracy) • Created sentiment distribution visualizations |
Word Analysis | • Generated word clouds for positive and negative sentiment reviews • Extracted top 20 frequent terms by sentiment polarity • Identified key positive terms (play, great, love, fun) and negative terms (anger, disappointed, boring) • Conducted product-level sentiment aggregation |

Distribution showing Sentiment Polarity and Subjectivity scores with a rule-based classification of sentiment scores
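
A rule-based classification of polarity scores, together with the accuracy check against manually labelled rows, can be sketched in a few lines. The ±0.05 cut-offs follow the convention suggested in the VADER documentation for its compound score; whether the project used these exact thresholds is an assumption, and the scores and labels below are invented for illustration.

```python
def classify(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def accuracy(predicted, expected):
    """Share of predictions that match the manually assigned labels."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)

# Hypothetical polarity scores and hand-assigned labels for four reviews.
scores = [0.82, -0.41, 0.01, 0.30]
manual = ["positive", "negative", "neutral", "negative"]

predicted = [classify(s) for s in scores]
print(predicted)
print(accuracy(predicted, manual))
```

The same `accuracy` helper can be applied to the labels produced by each analyzer to compare them on an identical set of manually verified rows.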
Loyalty Drivers Analysis
- Identified spending score (0.67) and remuneration (0.62) as strongest predictors of customer loyalty
- Determined that age has only a weak negative correlation (-0.04) with loyalty points, yet remains statistically significant in the regression model
- Developed robust Weighted Least Squares regression model explaining 82.1% of loyalty point variance
- Created predictive formula: Loyalty Points = -1944.89 + 31.87 × Spending Score + 31.39 × Remuneration + 10.57 × Age
Important
Value Discovery: Every 1-point increase in spending score correlates with a 31.87-point increase in loyalty points, while each £1,000 in income corresponds to a 31.39-point increase
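
As a worked example, the predictive formula above can be wrapped in a small function. The coefficients are taken directly from the formula; the function name, the parameter names, and the sample customer are hypothetical, and remuneration is assumed to be expressed in thousands of pounds, in line with the interpretation given above.

```python
def predict_loyalty_points(spending_score, remuneration_k, age):
    """Apply the fitted WLS formula; remuneration_k is income in £1,000s."""
    return (
        -1944.89
        + 31.87 * spending_score
        + 31.39 * remuneration_k
        + 10.57 * age
    )

# Hypothetical customer: spending score 50, £50k income, 40 years old.
baseline = predict_loyalty_points(50, 50, 40)

# Raising the spending score by one point adds exactly one coefficient's worth.
uplift = predict_loyalty_points(51, 50, 40) - baseline

print(round(baseline, 2), round(uplift, 2))
```

For this customer the formula gives -1944.89 + 1593.50 + 1569.50 + 422.80 = 1640.91 points. Because the model is linear, inputs well below the typical range can produce negative predictions, so outputs are best treated as estimates within the range of the observed data.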
Customer Segmentation
- Identified five distinct customer segments with clear behavior patterns and loyalty characteristics
- Labelled "Premium Buyers" (17.8% of customers) averaging 3988 loyalty points with high income and frequent purchases
- Classified "Occasional Affluents" (16.5% of customers) with high income but relatively low loyalty engagement (912 points)
- Categorized the largest segment as "Regular Customers" (38.7%) with mid-range income and consistent spending
Important
Value Discovery: "Bargain Hunters" (13.4% of customers) show high spending despite lower income, representing opportunity for targeted value-based marketing
Sentiment Analysis
- Determined that 90% of customer reviews express positive sentiment toward Turtle Games products
- Validated that VADER sentiment analyzer (72.5% accuracy) significantly outperformed TextBlob (22.5%) for game product reviews
- Identified key positive review themes around gameplay experience, physical components, and family enjoyment
- Located negative sentiment clusters around product quality issues, age-appropriateness, and gameplay mechanics
Important
Value Discovery: Premium Buyers show the highest positive sentiment (90%) with only 5% negative reviews, suggesting a link between satisfaction and loyalty
Based on comprehensive data analysis, the following strategic recommendations were developed to enhance Turtle Games' business performance:
Loyalty Program Optimization
- Implement a gamified loyalty program with distinct tiers matching the five customer segments
- Offer VIP "Platinum" benefits to Premium Buyers, including exclusive products and early access
- Create customized "Gold" tier for Regular Customers with bundle offers and upselling opportunities
- Develop entry-level tiers with targeted incentives for Basic Buyers, Bargain Hunters, and Occasional Affluents
- Deploy the WLS predictive model to anticipate loyalty behaviors and tailor marketing approaches
Customer Engagement Strategies
- Develop targeted strategies for "Occasional Affluents" who have high income but inconsistent spending
- Create personalized incentives based on spending score and income level analysis
- Investigate and address lower engagement among younger customers with age-appropriate offerings
- Monitor segment migration patterns to evaluate effectiveness of marketing interventions
- Implement clear upgrade paths between segments with tailored incentives
Product Development & Quality Enhancement
- Address product quality issues identified through sentiment analysis, particularly for budget offerings
- Enhance gameplay mechanics and instructions based on negative sentiment patterns
- Develop additional family-oriented products based on positive sentiment around family gameplay
- Implement feature-based rating system to gather more structured feedback on product qualities
- Invest in advanced NLP techniques fine-tuned for Turtle Games' review data to extract deeper insights
Technical & Data Recommendations
- Include customer IDs in data collection so that duplicate records can be detected reliably
- Improve age data collection to address the 2.5% potential misreporting identified
- Clarify loyalty points mechanics including expiration dates and redemption structures
- Add timestamp information to purchase and review data to enable trend analysis
- Develop comprehensive product category mappings to enable more granular product performance analysis