
A/B Test Results Interpreter

Interpret your A/B test results with statistical rigor — determine significance, effect size, and whether to ship the change.

Prompt Template

You are a statistician and growth analyst. Interpret the following A/B test results:

Test name: [what was tested]
Hypothesis: [expected outcome and why]
Variant A (control): [description]
Variant B (treatment): [what changed]

Results:
- Control: [visitors] visitors, [conversions] conversions
- Treatment: [visitors] visitors, [conversions] conversions
- Test duration: [days]
- Confidence level target: [e.g., 95%]

Also provide:
1. Conversion rates for each variant
2. Statistical significance (p-value explanation in plain English)
3. Relative uplift %
4. Minimum detectable effect — was the test powered correctly?
5. Ship / don't ship recommendation with reasoning
6. What to test next based on these results

Example Output

A/B Test Results: CTA Button Color Change

**Test:** Blue CTA vs. Green CTA on pricing page

Results Summary

| | Control (Blue) | Treatment (Green) |
|--|--|--|
| Visitors | 4,200 | 4,150 |
| Conversions | 189 | 228 |
| Conv. Rate | 4.50% | 5.49% |
| **Uplift** | — | **+22% relative** |
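The rates and relative uplift in the table are straightforward to reproduce; a quick sketch using the counts from the example:

```python
control_rate = 189 / 4200        # conversions / visitors for the control
treatment_rate = 228 / 4150      # conversions / visitors for the treatment

# Relative uplift: how much the treatment rate improves on the control rate
relative_uplift = (treatment_rate - control_rate) / control_rate

print(f"{control_rate:.2%} vs {treatment_rate:.2%}, uplift {relative_uplift:+.0%}")
# ~ "4.50% vs 5.49%, uplift +22%"
```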

Statistical Significance

**p-value: 0.037** — This is below the 0.05 threshold.

In plain English: If the button color truly had no effect, we'd see a difference this large or larger only about 3.7% of the time by chance. That is rare enough to reject "no effect" at the 95% confidence level.
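The p-value can be checked with a pooled two-proportion z-test. A minimal sketch using only the Python standard library (the helper name `two_proportion_p` is mine, not from any library):

```python
from math import sqrt, erfc

def two_proportion_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))  # two-sided normal tail probability

print(round(two_proportion_p(189, 4200, 228, 4150), 3))  # ~0.037
```

In practice you would likely reach for `statsmodels.stats.proportion.proportions_ztest`, which implements the same pooled test.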

Power Analysis

With ~4,200 visitors per variant and a 4.5% baseline rate, the test could reliably detect an absolute change of roughly 1.3 percentage points (at 95% confidence and 80% power). The observed change (0.99 percentage points) falls slightly below that minimum detectable effect, so the test was marginally underpowered for an effect of this size. The significant result still stands, but the +22% point estimate is noisy.
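The minimum detectable effect follows from the standard two-sample approximation. A rough sketch under the usual 95% confidence / 80% power assumptions (the function name is mine):

```python
from math import sqrt

def mde_absolute(baseline_rate, n_per_arm, z_alpha=1.96, z_beta=0.84):
    """Approximate absolute MDE for a two-arm test (95% conf., 80% power)."""
    return (z_alpha + z_beta) * sqrt(
        2 * baseline_rate * (1 - baseline_rate) / n_per_arm
    )

print(round(mde_absolute(0.045, 4200), 4))  # ~0.0127, i.e. ~1.3 percentage points
```

Run this before launching a test: if the uplift you care about is smaller than the MDE at your expected traffic, the test needs more visitors or a longer duration.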

✅ Recommendation: SHIP

The result is statistically significant and practically meaningful (+22% relative uplift). Ship the green button, but treat the measured uplift as an upper-end estimate given the marginal power.

Next Test Ideas

1. Test button copy: 'Start Free Trial' vs. 'Get Started Free'

2. Test button position: above vs. below social proof

3. Test on mobile specifically — segment your results by device type

Tips for Best Results

  • 💡 Never call a test early because results 'look good': peeking inflates the false-positive rate, so judge significance only at the planned end date
  • 💡 Segment results by device, user type, and traffic source — the overall number often hides important differences
  • 💡 A 95% confidence level means roughly 1 in 20 tests of a change with no real effect will still look 'significant' by chance, so the more tests you run in parallel, the more false positives you accept
