Reading the Results Dashboard
A walkthrough of every panel on the experiment Results page — verdict card, summary stats, variant comparison, device segments, revenue, custom metrics, and the statistical analysis accordion.
The Results tab on any experiment is where you’ll spend most of your post-launch time. This guide walks through every panel in the order you’ll see them, with a quick read on what to look for and what each number actually means.
Opening the Results
From any experiment detail page, click the Results tab. The page renders top-down with the most decision-relevant content at the top.
Status Badge
A small badge at the top of the page tells you the experiment’s current state — Live, Paused, Completed, Winner Declared, or Archived. Use this to quickly confirm what you’re looking at: a live experiment’s results are still accumulating, while a completed experiment’s are frozen.
Verdict Card
The most important panel. The Verdict Card synthesizes the rest of the dashboard into one decision-oriented statement:
- Probability to be best for the leading variant, e.g., “Variant B has a 73% probability of being best.”
- A recommendation — Keep running, Implement, or Revert — based on the probability and how long the experiment has run.
- A CTA — when the leading variant crosses 95% probability to be best on the primary metric, a Declare Winner button appears. See Declaring a Winner.
If you only read one panel, read this one. Everything below explains why the verdict says what it says.
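If you want to see how a "probability to be best" number falls out of raw counts, here is a minimal sketch. It is not Split Test Pro's implementation; the Beta(1, 1) prior and the conversion/session counts are illustrative assumptions. It samples each variant's conversion-rate posterior and counts how often each variant comes out on top:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative counts only -- (conversions, sessions) per variant.
variants = {
    "Control":   (120, 2500),
    "Variant B": (148, 2480),
}

# Draw from each variant's Beta posterior (uniform Beta(1, 1) prior assumed).
samples = {
    name: rng.beta(conv + 1, sess - conv + 1, size=100_000)
    for name, (conv, sess) in variants.items()
}

# Probability to be best = share of draws where a variant has the highest rate.
stacked = np.stack(list(samples.values()))          # shape: (variants, draws)
best_counts = np.bincount(stacked.argmax(axis=0), minlength=len(variants))
for name, wins in zip(samples, best_counts):
    print(f"{name}: {wins / stacked.shape[1]:.1%} probability to be best")
```

The same idea extends to any number of variants: more draws landing on one variant means a higher probability to be best.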
Summary Stats
Two big numbers under the verdict:
- Total Sessions — every session that visited a page matching your targeting (across all variants).
- Overall Conversion Rate — the combined conversion rate across all variants on the primary metric. Useful as a sanity check (“does this match what I expect from the page in general?”).
Variant Comparison Table
The core data view. One row per variant with these columns:
| Column | Meaning |
|---|---|
| Variant | Name (Control / Variant B / etc.) and a color indicator. |
| Sessions | Sessions assigned to this variant. |
| Conversions | Count of primary-metric events tagged to this variant. |
| Conversion Rate | Conversions / Sessions, as a percent. |
| Probability | Probability this variant beats the control. |
| 95% Credible Interval | Range of plausible conversion rates for this variant (we’re 95% confident the true rate is in this range). |
A wide credible interval means "we don't have enough data to be sure." A narrow interval means the data has converged on a tight estimate of the true rate.
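As a rough illustration of where an interval like that comes from (a sketch, not the app's exact model; the Beta(1, 1) prior and counts are assumptions), you can read the 2.5th and 97.5th percentiles straight off a variant's Beta posterior:

```python
from scipy.stats import beta

conversions, sessions = 148, 2480          # illustrative counts
posterior = beta(conversions + 1, sessions - conversions + 1)

low, high = posterior.ppf([0.025, 0.975])  # central 95% of plausible rates
print(f"Observed conversion rate: {conversions / sessions:.2%}")
print(f"95% credible interval:    {low:.2%} to {high:.2%}")
```

With more sessions, the posterior tightens and the printed interval narrows.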
Device Segment Cards
Below the main table you’ll see per-device breakdowns: Desktop, Tablet, Mobile. Each shows the variant comparison filtered to that device type.
This is where you spot device-specific reactions: the aggregate result might be flat, but Variant B might be a clear win on mobile and a wash on desktop — important context for the decision. See Segmenting Results.
Revenue Metrics
(Shopify only, when applicable.) When the experiment includes purchase data, additional cards appear:
- Average Order Value (AOV) — total revenue / number of orders, per variant.
- Revenue per Session — total revenue / total sessions, per variant.
- Revenue per Purchaser — total revenue / unique purchasers, per variant.
These are continuous metrics — see Continuous Metrics for the statistical model.
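All three revenue numbers are ratios over the same order data. A quick sketch with made-up figures (the variable names are illustrative, not the app's data model):

```python
# Illustrative per-variant figures -- not real data.
total_revenue     = 18_450.00   # sum of order totals attributed to the variant
order_count       = 310
unique_purchasers = 288         # some purchasers placed more than one order
session_count     = 9_600

aov                   = total_revenue / order_count
revenue_per_session   = total_revenue / session_count
revenue_per_purchaser = total_revenue / unique_purchasers

print(f"AOV:                   ${aov:,.2f}")
print(f"Revenue per session:   ${revenue_per_session:,.2f}")
print(f"Revenue per purchaser: ${revenue_per_purchaser:,.2f}")
```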
Custom Metrics
If you’ve fired any custom events with a value (see Custom Events), they appear here as Mean, Total, and Trials per variant. Trials = the count of events that included a value (which may be less than the conversion count if some events were fired without a value).
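To make the Mean / Total / Trials split concrete, here is a sketch with a hypothetical event list (the event shape is illustrative; only events that carry a value count as trials):

```python
# Hypothetical custom events for one variant -- some fired without a value.
events = [
    {"name": "add_to_wishlist", "value": 12.0},
    {"name": "add_to_wishlist"},                # no value: conversion, not a trial
    {"name": "add_to_wishlist", "value": 30.0},
    {"name": "add_to_wishlist", "value": 8.5},
]

values = [e["value"] for e in events if "value" in e]

trials = len(values)            # events that included a value
total  = sum(values)
mean   = total / trials if trials else 0.0

print(f"Trials: {trials}  Total: {total}  Mean: {mean:.2f}")
print(f"Conversions (all events): {len(events)}")
```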
Advanced DOM Metrics
(Shopify only, when the Web Pixel captures them.) For experiments where additional engagement data was recorded:
- Scroll depth — how far down the page each variant’s visitors scrolled.
- Click count — how many click events each variant generated.
- Variant visibility — confirmation that variant CSS/JS rendered for the variant’s visitors.
These are diagnostic, not primary signals. Useful for catching cases where a variant changed visitor behavior in ways that don’t show up in conversion rate alone.
Statistical Analysis Accordion
At the bottom, an accordion panel per metric (primary first, then secondaries). Each accordion shows:
- Distribution chart — the posterior distribution for each variant. For binary metrics, you see Beta curves; for continuous, Normal curves. Where the curves overlap is where the uncertainty is.
- Modeled improvement — the distribution of likely lift values, with a violin/density plot. Wider = more uncertainty; narrower and shifted right of zero = high confidence in a positive lift.
- Credible interval — the 25th, 50th (median), and 75th percentiles of modeled improvement, plus the 95% bounds.
This is the geek-out section. You don’t need to read the curves to make a decision — the verdict card already summarized them — but they’re useful for explaining the result to a stakeholder or sanity-checking a surprising outcome.
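If you want to reproduce the shape of a modeled-improvement readout, here is a rough sketch (assuming a Beta(1, 1) prior and illustrative counts; the app's exact model may differ): sample both posteriors, turn each pair of draws into a relative lift, then summarize the resulting distribution.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative counts: Beta(conversions + 1, non-conversions + 1) per variant.
control = rng.beta(120 + 1, 2500 - 120 + 1, size=100_000)
variant = rng.beta(148 + 1, 2480 - 148 + 1, size=100_000)

lift = (variant - control) / control   # relative improvement per posterior draw

p2_5, p25, p50, p75, p97_5 = np.percentile(lift, [2.5, 25, 50, 75, 97.5])
print(f"Median lift: {p50:+.1%}  (25th to 75th: {p25:+.1%} to {p75:+.1%})")
print(f"95% bounds:  {p2_5:+.1%} to {p97_5:+.1%}")
print(f"P(lift > 0): {(lift > 0).mean():.1%}")
```

A lift distribution that sits mostly right of zero is exactly what the "narrower and shifted right of zero" description above is pointing at.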
Refreshing the Data
Results compute server-side and are cached briefly. A Refresh button on the page forces a recompute — useful after you launch and want to see the first wave of data, or when you suspect the cached version is stale. Computation typically takes 1–10 seconds; for large datasets it can take up to a couple of minutes.
What to Look For
A quick mental checklist when you open the Results page:
- Probability to be best — is the leading variant past 95%? If yes, candidate winner.
- Sample sizes — does each variant have at least 300–500 sessions? Below that, even a high probability is fragile.
- Time elapsed — has it run a full week minimum? Day-of-week effects can swing day-to-day.
- Device segments — does the result hold across devices, or is it a mobile-only / desktop-only effect?
- Funnel breakdown (Shopify) — is the lift coming from the right stage of the funnel, or are you just shifting the bottleneck?
If all five check out, you’ve got a real result. See Declaring a Winner for what to do next.
Common Mistakes
- Stopping at 80% or 90% probability. This is the single most common A/B testing mistake. Wait for 95%. See Common Mistakes.
- Reading the conversion rate without looking at the credible interval. A 5.0% vs 4.5% comparison is meaningless if the credible intervals are 4.0%–6.0% for both variants. The interval is what tells you whether the difference is real.
- Ignoring the device segments. A flat aggregate result with a strong mobile win is still actionable — apply the variant to mobile only.
- Refreshing repeatedly during a low-traffic experiment. The math doesn’t move faster because you watched it. Set a reminder to check tomorrow.
Next Steps
- Get the full picture of the Bayesian model behind the numbers: Bayesian Stats Explained.
- Drill into per-device behavior: Segmenting Results.
- Get an AI-generated read on the result: AI Review.
Ready to start testing?
Install Split Test Pro and run your first experiment today.