A/B Testing Analysis
Analysis of an A/B test of a loyalty program promotion run by the company SkyCrossroads.
Business Context
- Control Group (A): Customers receive 1,000 additional loyalty points for purchases over 100 rubles
- Test Group (B): Customers receive 2,000 additional loyalty points for purchases over 100 rubles (double the control)
The experiment was conducted across multiple trading points to determine the effectiveness of the enhanced promotion.
Dataset Description
The analysis uses the following variables:
| Variable | Description |
|---|---|
| id_client | Unique customer ID |
| id_group | Group identifier (control = 1,000 points, test = 2,000 points) |
| sum_pay | Purchase amount |
| id_point | Trading point ID |
| months_reg | Duration of customer registration in the loyalty program (in months) |
Analysis Tasks
Task 1: Statistical Analysis Function
Built a comprehensive statistical analysis function that:
- Validates input data types and sample sizes
- Calculates descriptive statistics (mean, variance, standard deviation)
- Computes quantiles (including median, quartiles, and deciles)
- Generates histogram visualizations
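A minimal sketch of what such a function could look like is shown here. The name `describe_sample`, the `min_size` and `plot` parameters, and the choice of the sample (n - 1) variance are assumptions made for illustration, not the repository's actual implementation.

```python
import numpy as np


def describe_sample(values, min_size=2, plot=False):
    """Validate a numeric sample and return its descriptive statistics."""
    # Conversion raises ValueError/TypeError if the input is not numeric.
    data = np.asarray(values, dtype=float)
    if data.ndim != 1:
        raise ValueError("expected a one-dimensional sample")
    if data.size < min_size:
        raise ValueError(f"need at least {min_size} observations")
    if np.isnan(data).any():
        raise ValueError("sample contains missing values")

    q1, median, q3 = np.quantile(data, [0.25, 0.5, 0.75])
    deciles = np.quantile(data, np.arange(1, 10) / 10)  # 10%, 20%, ..., 90%

    stats = {
        "n": int(data.size),
        "mean": float(data.mean()),
        "variance": float(data.var(ddof=1)),  # sample variance (assumption)
        "std": float(data.std(ddof=1)),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "deciles": [float(d) for d in deciles],
    }

    if plot:
        # Imported lazily so the statistics work without a plotting backend.
        import matplotlib.pyplot as plt

        plt.hist(data, bins="auto")
        plt.xlabel("value")
        plt.ylabel("count")
        plt.show()

    return stats


summary = describe_sample([120.0, 250.0, 310.0, 480.0, 990.0])
```

Keeping the plot optional lets the same function be reused in per-segment loops where only the numbers are needed.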
Task 2: Parametric Test (Student’s t-test)
Implemented a t-test function to compare means between control and test groups, including:
- t-statistic calculation
- p-value determination
- Statistical significance assessment at 5% alpha level
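The comparison described above can be sketched with `scipy.stats.ttest_ind`. The wrapper name `compare_means` and its return format are hypothetical. `equal_var=True` gives the classic Student's test named in the heading; whether the original analysis assumed equal variances is not stated, so it is exposed as a parameter.

```python
from scipy import stats


def compare_means(control, test, alpha=0.05, equal_var=True):
    """Two-sided t-test for a difference in means between two samples."""
    # equal_var=True is Student's t-test; equal_var=False would be Welch's.
    t_stat, p_value = stats.ttest_ind(control, test, equal_var=equal_var)
    return {
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


result = compare_means([410, 395, 430, 402, 418], [455, 470, 448, 462, 480])
```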
Task 3: Non-parametric Test (Mann-Whitney Test)
Created a Mann-Whitney test function for comparing distributions when normality assumptions may not hold.
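As an illustration, a rank-based comparison could be wrapped around `scipy.stats.mannwhitneyu` as follows. The function name `compare_distributions` and the two-sided alternative are assumptions; the original function may differ.

```python
from scipy import stats


def compare_distributions(control, test, alpha=0.05):
    """Two-sided Mann-Whitney U test on two independent samples."""
    u_stat, p_value = stats.mannwhitneyu(control, test, alternative="two-sided")
    return {
        "u_stat": float(u_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


result = compare_distributions([410, 395, 430, 402, 418], [455, 470, 448, 462, 480])
```

Because the test works on ranks rather than raw values, a few very large purchases influence it far less than they influence the t-test.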
Task 4: Data Cleaning and Comparative Analysis
Comprehensive data preparation and analysis:
- Removed null values and outliers
- Created visualization functions for comparing distributions
- Applied both parametric and non-parametric tests
- Analyzed overall test results
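The cleaning step might look like the sketch below. The source does not say how outliers were defined, so the 1.5 x IQR rule applied within each group is an assumption, as are the function name `clean_payments` and the `k` parameter.

```python
import pandas as pd


def clean_payments(df, value_col="sum_pay", group_col="id_group", k=1.5):
    """Drop rows with missing values, then drop per-group IQR outliers."""
    df = df.dropna(subset=[value_col, group_col])

    def within_fences(s):
        # Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule).
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return s.between(q1 - k * iqr, q3 + k * iqr)

    mask = df.groupby(group_col)[value_col].transform(within_fences)
    return df[mask.astype(bool)]


raw = pd.DataFrame({
    "id_group": ["control"] * 6 + ["test"] * 4,
    "sum_pay": [100, 110, 120, 130, 10000, None, 200, 210, 220, 230],
})
cleaned = clean_payments(raw)
```

Applying the fences per group avoids trimming one group more aggressively just because its typical purchase amount is different.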
Task 5: Analysis by Trading Points
Segmented analysis across six trading points:
- Visualized results for each location
- Applied statistical tests per trading point
- Ensured sample size comparability
- Identified location-specific patterns
Task 6: User Segmentation by Registration Duration
Analyzed correlation between payment amounts and customer tenure:
- Calculated Pearson and Spearman correlations
- Created scatter plots for visualization
- Examined correlation patterns in control vs. test groups
- Generated business insights based on customer lifetime
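The per-group correlations could be computed along these lines. The function name `tenure_correlations` and the output layout are hypothetical; plotting is left out of this sketch.

```python
import pandas as pd
from scipy import stats


def tenure_correlations(df):
    """Pearson and Spearman correlation of sum_pay vs months_reg per group."""
    rows = []
    for group, part in df.groupby("id_group"):
        pearson_r, pearson_p = stats.pearsonr(part["months_reg"], part["sum_pay"])
        spearman_r, spearman_p = stats.spearmanr(part["months_reg"], part["sum_pay"])
        rows.append({
            "id_group": group,
            "pearson_r": float(pearson_r),
            "pearson_p": float(pearson_p),
            "spearman_r": float(spearman_r),
            "spearman_p": float(spearman_p),
        })
    return pd.DataFrame(rows).set_index("id_group")


# Synthetic data: a linear relation in one group, a monotone curve in the other.
months = list(range(1, 11))
demo = pd.DataFrame({
    "id_group": ["control"] * 10 + ["test"] * 10,
    "months_reg": months + months,
    "sum_pay": [2 * m + 5 for m in months] + [m ** 2 for m in months],
})
corr = tenure_correlations(demo)
```

Pearson measures linear association while Spearman measures monotone association, so reporting both helps show whether a relationship is present but non-linear.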
Key Findings
Overall Results
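- Mann-Whitney test: no statistically significant difference between the control and test distributions
- t-test: statistically significant difference in mean purchase amounts between the groups
- The two tests answer different questions (the t-test compares means, the Mann-Whitney test compares ranks), so they can disagree; here the discrepancy suggests heterogeneity across trading points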
Trading Point Analysis
Three major trading points (#1178, #1179, #1182) showed:
- Large sample sizes (1,000+ observations)
- Consistent patterns that support applying the parametric test
- Varying effectiveness of the enhanced promotion
Customer Segmentation Insights
- Strong correlation between purchase amounts and registration duration
- Correlation is stronger in the test group than control group
- Visual analysis (heatmap) confirms these patterns
- Suggests targeted strategies based on customer lifetime
Technologies Used
- Python 3.9+
- Libraries:
  - pandas - Data manipulation and analysis
  - numpy - Numerical computations
  - scipy - Statistical tests
  - seaborn - Statistical data visualization
  - matplotlib - Plotting and visualization
Methodology
The analysis follows these steps:
- Data Validation: Check for data quality, missing values, and outliers
- Exploratory Analysis: Understand distributions and patterns
- Statistical Testing: Apply both parametric and non-parametric tests for robustness
- Segmentation: Analyze results by trading points and customer segments
- Business Insights: Translate statistical findings into actionable recommendations
Business Recommendations
Based on the analysis:
- Effectiveness of the enhanced promotion varies by location: consider targeted implementation
- Customer tenure matters: long-term customers show higher engagement with the increased rewards
- The test group shows a stronger correlation: the enhanced rewards program may improve customer lifetime value
- Sample size considerations: focus implementation on high-traffic locations initially