
A/B Test Results Interpreter

Interpret your A/B test results with statistical rigor — determine significance, effect size, and whether to ship the change.

Prompt Template

You are a statistician and growth analyst. Interpret the following A/B test results:

Test name: [what was tested]
Hypothesis: [expected outcome and why]
Variant A (control): [description]
Variant B (treatment): [what changed]

Results:
- Control: [visitors] visitors, [conversions] conversions
- Treatment: [visitors] visitors, [conversions] conversions
- Test duration: [days]
- Confidence level target: [e.g., 95%]

Also provide:
1. Conversion rates for each variant
2. Statistical significance (p-value explanation in plain English)
3. Relative uplift %
4. Minimum detectable effect — was the test powered correctly?
5. Ship / don't ship recommendation with reasoning
6. What to test next based on these results

Example Output

A/B Test Results: CTA Button Color Change

**Test:** Blue CTA vs. Green CTA on pricing page

Results Summary

| | Control (Blue) | Treatment (Green) |
|--|--|--|
| Visitors | 4,200 | 4,150 |
| Conversions | 189 | 228 |
| Conv. Rate | 4.50% | 5.49% |
| **Uplift** | — | **+22% relative** |
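The rates and relative uplift in the table are straightforward to reproduce; a quick sketch using the counts from the example:

```python
control_rate = 189 / 4200        # conversions / visitors for the control
treatment_rate = 228 / 4150      # conversions / visitors for the treatment

# Relative uplift: how much the treatment rate improves on the control rate
relative_uplift = (treatment_rate - control_rate) / control_rate

print(f"{control_rate:.2%} vs {treatment_rate:.2%}, uplift {relative_uplift:+.0%}")
# ~ "4.50% vs 5.49%, uplift +22%"
```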

Statistical Significance

**p-value: 0.037** — This is below the 0.05 threshold.

In plain English: If the button color truly had no effect, we'd see a difference this large or larger only about 3.7% of the time by chance. That is rare enough to reject "no effect" at the 95% confidence level.
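The p-value can be checked with a pooled two-proportion z-test. A minimal sketch using only the Python standard library (the helper name `two_proportion_p` is mine, not from any library):

```python
from math import sqrt, erfc

def two_proportion_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))  # two-sided normal tail probability

print(round(two_proportion_p(189, 4200, 228, 4150), 3))  # ~0.037
```

In practice you would likely reach for `statsmodels.stats.proportion.proportions_ztest`, which implements the same pooled test.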

Power Analysis

With ~4,200 visitors per variant and a 4.5% baseline rate, the test could reliably detect an absolute change of roughly 1.3 percentage points (at 95% confidence and 80% power). The observed change (0.99 percentage points) falls slightly below that minimum detectable effect, so the test was marginally underpowered for an effect of this size. The significant result still stands, but the +22% point estimate is noisy.
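The minimum detectable effect follows from the standard two-sample approximation. A rough sketch under the usual 95% confidence / 80% power assumptions (the function name is mine):

```python
from math import sqrt

def mde_absolute(baseline_rate, n_per_arm, z_alpha=1.96, z_beta=0.84):
    """Approximate absolute MDE for a two-arm test (95% conf., 80% power)."""
    return (z_alpha + z_beta) * sqrt(
        2 * baseline_rate * (1 - baseline_rate) / n_per_arm
    )

print(round(mde_absolute(0.045, 4200), 4))  # ~0.0127, i.e. ~1.3 percentage points
```

Run this before launching a test: if the uplift you care about is smaller than the MDE at your expected traffic, the test needs more visitors or a longer duration.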

✅ Recommendation: SHIP

The result is statistically significant and practically meaningful (+22% relative uplift). Ship the green button, but treat the measured uplift as an upper-end estimate given the marginal power.

Next Test Ideas

1. Test button copy: 'Start Free Trial' vs. 'Get Started Free'

2. Test button position: above vs. below social proof

3. Test on mobile specifically — segment your results by device type

Tips for Best Results

  • 💡 Never call a test early because results 'look good': peeking inflates the false-positive rate, so judge significance only at the planned end date
  • 💡 Segment results by device, user type, and traffic source — the overall number often hides important differences
  • 💡 A 95% confidence level means roughly 1 in 20 tests of a change with no real effect will still look 'significant' by chance, so the more tests you run in parallel, the more false positives you accept
