A/B Test Results Interpreter
Interpret your A/B test results with statistical rigor — determine significance, effect size, and whether to ship the change.
Prompt Template
You are a statistician and growth analyst. Interpret the following A/B test results:

Test name: [what was tested]
Hypothesis: [expected outcome and why]
Variant A (control): [description]
Variant B (treatment): [what changed]

Results:
- Control: [visitors] visitors, [conversions] conversions
- Treatment: [visitors] visitors, [conversions] conversions
- Test duration: [days]
- Confidence level target: [e.g., 95%]

Also provide:
1. Conversion rates for each variant
2. Statistical significance (p-value explanation in plain English)
3. Relative uplift %
4. Minimum detectable effect — was the test powered correctly?
5. Ship / don't ship recommendation with reasoning
6. What to test next based on these results
Example Output
A/B Test Results: CTA Button Color Change
**Test:** Blue CTA vs. Green CTA on pricing page
Results Summary
| | Control (Blue) | Treatment (Green) |
|--|--|--|
| Visitors | 4,200 | 4,150 |
| Conversions | 189 | 228 |
| Conv. Rate | 4.50% | 5.49% |
| **Uplift** | — | **+22% relative** |
Statistical Significance
**p-value: 0.037** (two-sided, two-proportion z-test). This is below the 0.05 threshold.
In plain English: if the button color truly had no effect, we'd see a difference this large or larger only about 3.7% of the time by chance. That is not the same as a 96.3% probability that the effect is real; a p-value only says how surprising the data would be if the change did nothing.
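These figures are easy to reproduce. Below is a minimal two-proportion z-test in Python using only the standard library; the counts are the ones from the table above, and the `erfc` identity stands in for a normal survival function:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal tail: P(|Z| > z) = erfc(z / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

p_a, p_b, z, p = two_proportion_ztest(189, 4200, 228, 4150)
uplift = (p_b - p_a) / p_a
print(f"control {p_a:.2%}, treatment {p_b:.2%}, uplift {uplift:+.0%}")  # 4.50%, 5.49%, +22%
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.08, p = 0.037
```

Statistics libraries such as statsmodels (`proportions_ztest`) return the same z and p with less code; the hand-rolled version just makes the arithmetic visible.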
Power Analysis
With ~4,000 visitors per variant and a baseline of 4.5%, the test was powered to detect a ~1.3 percentage-point absolute change at 80% power. The observed change (0.99 percentage points) sits just under that threshold, so the test was only marginally powered for an effect of this size: the result is still significant, but the size of the uplift is a noisy estimate and may be somewhat overstated.
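The minimum detectable effect can be checked with the usual normal-approximation formula. A sketch at 95% confidence and 80% power (1.96 and 0.8416 are the corresponding z-scores, and both arms are assumed to share the baseline variance):

```python
import math

def mde_absolute(baseline, n_per_arm, z_alpha=1.96, z_power=0.8416):
    """Approximate absolute minimum detectable effect for a two-arm
    proportion test (normal approximation, equal arm sizes)."""
    se = math.sqrt(2 * baseline * (1 - baseline) / n_per_arm)
    return (z_alpha + z_power) * se

print(f"MDE ≈ {mde_absolute(0.045, 4000):.2%} absolute")  # ≈ 1.30%
```

Run this before launching a test, not after: pick the smallest effect worth shipping and solve for the traffic you need.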
✅ Recommendation: SHIP
The result is statistically significant and practically meaningful (+22% relative conversions). Ship the green button, and confirm the lift holds up in post-launch metrics.
Next Test Ideas
1. Test button copy: 'Start Free Trial' vs. 'Get Started Free'
2. Test button position: above vs. below social proof
3. Test on mobile specifically — segment your results by device type
Tips for Best Results
- 💡 Never call a test early because results 'look good': significance thresholds only hold if the sample size or end date was fixed in advance, so wait for the planned end unless you are using a sequential testing method designed for peeking
- 💡 Segment results by device, user type, and traffic source — the overall number often hides important differences
- 💡 At a 95% confidence level, a truly neutral change will still come up 'significant' about 1 time in 20, so the more variants you run at once, the more false positives you collect: run tests sequentially or correct for multiple comparisons
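The one-in-twenty figure in the last tip is easy to verify by simulation: run many A/A tests, where both arms draw from the same true conversion rate, and count how often a z-test still crosses the 95% bar. A sketch with an assumed 4.5% baseline and 1,000 visitors per arm:

```python
import math
import random

random.seed(0)

def aa_test_significant(n, p, z_crit=1.96):
    """One A/A test: both arms sample the same true rate p."""
    a = sum(random.random() < p for _ in range(n))
    b = sum(random.random() < p for _ in range(n))
    pooled = (a + b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return abs(a / n - b / n) / se > z_crit

trials = 2000
false_positives = sum(aa_test_significant(1000, 0.045) for _ in range(trials))
print(f"{false_positives / trials:.1%} of A/A tests flagged 'significant'")
```

The printed rate should land near 5%, which is exactly the false-positive budget a 95% confidence level buys you, even though nothing was changed between the two arms.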
Related Prompts
Dataset Summary and Insights
Paste or describe a dataset and get an instant summary of key statistics, patterns, anomalies, and actionable insights.
Customer Churn Analysis Prompt
Analyze customer churn patterns to identify at-risk segments, root causes, and retention interventions.
Marketing Attribution Analysis Prompt
Analyze your marketing channel data to understand which touchpoints drive conversions and where to allocate budget.