Split Test Pro
Advanced · 5 min read

Anomaly Alerts

The four rule-based checks Split Test Pro runs against every active experiment — sample ratio mismatch, traffic drops, zero conversions, runaway long tests — plus the optional Claude-driven anomaly review.

Split Test Pro runs background checks on every active experiment to catch the kinds of problems that usually go unnoticed until they’ve cost you a week of bad data. This guide covers what the platform looks for, when each alert fires, and what to do about it.

How Detection Runs

Detection runs in two layers:

  1. Rule-based checks — fast, deterministic checks against fixed thresholds. Always run, every 12 hours per experiment.
  2. Claude-powered anomaly review — runs every 24 hours per experiment when AI is enabled. Looks for subtler patterns: novelty effects, trend divergence, day-of-week patterns, and sudden shifts.

Each detected issue produces an alert with a severity, a type, and a recommendation.
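
The alert payload isn't documented in detail here, so treat this as a rough mental model only: a TypeScript sketch of the fields an alert carries, with illustrative names rather than the real API schema.

```typescript
// Illustrative shape only -- not the actual Split Test Pro API schema.
type AlertSeverity = "info" | "warning" | "critical";

type AlertType =
  // The four rule-based checks:
  | "sample_ratio_mismatch"
  | "traffic_drop"
  | "zero_conversions"
  | "test_running_too_long"
  // The four Claude-detected patterns:
  | "novelty_effect"
  | "trend_divergence"
  | "day_pattern"
  | "sudden_shift";

interface ExperimentAlert {
  experimentId: string;
  type: AlertType;
  severity: AlertSeverity;
  recommendation: string; // human-readable "what to do" text
  detectedAt: string;     // ISO 8601 timestamp
  dismissedBy: string[];  // per-user dismissal (see "Dismissing an Alert")
}
```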

The Four Rule-Based Checks

1. Sample Ratio Mismatch (SRM)

Severity: Critical

When it fires: After at least 1,000 total sessions, if the actual traffic split deviates more than 2% from the configured split.
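
In code terms the rule is simple. A minimal sketch, assuming the 2% threshold is an absolute difference in traffic share (the product doesn't specify relative vs. absolute):

```typescript
// Sketch of the SRM rule as documented: fire after 1,000+ total sessions
// when any variant's observed share deviates from its configured share.
// Assumes the 2% threshold is absolute (e.g. 50% configured vs 53% observed).
function hasSampleRatioMismatch(
  sessions: Record<string, number>,   // observed sessions per variant
  configured: Record<string, number>, // configured split, e.g. { A: 0.5, B: 0.5 }
): boolean {
  const total = Object.values(sessions).reduce((a, b) => a + b, 0);
  if (total < 1000) return false; // not enough data yet
  return Object.keys(configured).some((variant) => {
    const observedShare = (sessions[variant] ?? 0) / total;
    return Math.abs(observedShare - configured[variant]) > 0.02;
  });
}
```

Many platforms test for SRM with a chi-squared test rather than a fixed cutoff; the documented rule here is the simpler threshold.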

What it means: Visitors aren’t being assigned in the proportions you intended. If you set 50/50 and you’re seeing 55/45, something is wrong with assignment: possibly a bot filter hitting one variant disproportionately, a redirect loop, or a script-loading issue on the pages one variant changes.

What to do: Don’t trust the results until you fix it. SRM is a sign your data is biased. Common causes:

  • Variant CSS or JS that breaks the page badly enough that visitors leave before their session is recorded, undercounting that variant.
  • A redirect variant where the destination page itself sets the experiment cookie.
  • A bot or crawler hitting one URL pattern more than the other.

Pause the experiment, identify the cause, and start fresh after the fix.

2. Traffic Drop

Severity: Warning

When it fires: When traffic over the last 7 days is less than 50% of traffic over the prior 7 days.
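
A sketch of the comparison, assuming you have daily session counts for the experiment (illustrative only):

```typescript
// Sketch of the traffic-drop rule: compare the last 7 days of sessions
// against the 7 days before that. `daily` is ordered oldest -> newest.
function hasTrafficDrop(daily: number[]): boolean {
  if (daily.length < 14) return false; // need two full weeks to compare
  const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
  const recent = sum(daily.slice(-7));
  const prior = sum(daily.slice(-14, -7));
  return prior > 0 && recent < 0.5 * prior;
}
```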

What it means: Significantly less traffic is reaching the experiment than in the prior week. This is often unrelated to the experiment itself: a campaign ending, seasonality, or a site issue.

What to do: Check whether the traffic drop is global (look at your overall analytics) or specific to the experiment’s targeted pages. If global, the experiment is fine — just expect a longer runtime. If specific, investigate whether something’s broken on the targeted pages.

3. Zero Conversions

Severity: Warning

When it fires: After 7+ days of running, if at least one variant has zero conversions on the primary metric.
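
The rule itself is the simplest of the four; roughly:

```typescript
// Sketch of the zero-conversions rule: after 7+ days of runtime,
// flag if any variant has no conversions on the primary metric.
function hasZeroConversions(
  startedAt: Date,
  conversionsByVariant: Record<string, number>,
): boolean {
  const daysRunning = (Date.now() - startedAt.getTime()) / 86_400_000;
  if (daysRunning < 7) return false;
  return Object.values(conversionsByVariant).some((c) => c === 0);
}
```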

What it means: Either tracking is broken for that variant, or the variant is dramatically worse than control. Both warrant a look.

What to do: First, sanity-check tracking — did the conversion event fire correctly for visitors in that variant? Open DevTools as a fresh visitor, force the variant, and trigger a conversion. If the event fires correctly and the dashboard still shows zero, something deeper is wrong — possibly the variant breaks the conversion flow entirely.

4. Test Running Too Long

Severity: Info

When it fires: After 60+ days of running, if the leading variant hasn’t crossed a 90% probability of being best.
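
Split Test Pro doesn’t publish the model behind “probability to be best,” but a common way to produce a number like this is Monte Carlo sampling from Beta posteriors. A self-contained sketch for intuition, not the product’s actual math:

```typescript
// Illustrative only: estimate "probability to be best" by sampling each
// variant's conversion rate from a Beta(conversions + 1, failures + 1)
// posterior and counting how often each variant wins.
function probabilityToBeBest(
  variants: { sessions: number; conversions: number }[],
  draws = 10_000,
): number[] {
  const wins = new Array(variants.length).fill(0);
  for (let i = 0; i < draws; i++) {
    const rates = variants.map((v) =>
      betaSample(v.conversions + 1, v.sessions - v.conversions + 1),
    );
    wins[rates.indexOf(Math.max(...rates))]++;
  }
  return wins.map((w) => w / draws);
}

// Beta(a, b) via two Gamma draws; Gamma via Marsaglia-Tsang (valid for
// shape >= 1, which holds here since both parameters are counts + 1).
function betaSample(a: number, b: number): number {
  const x = gammaSample(a);
  const y = gammaSample(b);
  return x / (x + y);
}

function gammaSample(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x: number, v: number;
    do {
      x = gaussian();
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function gaussian(): number {
  // Box-Muller transform
  const u1 = Math.random() || Number.MIN_VALUE;
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * Math.random());
}
```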

What it means: This experiment isn’t going to converge. Either the effect is too small to detect with your traffic, or there’s no real effect at all.

What to do: Make a call. Either complete the experiment as inconclusive (and use the learning that the change has no detectable effect — that’s still a result), or accept that you’re investing more runtime for diminishing return. There’s no virtue in running a 90-day test that’s stuck at 78% probability.

Claude Anomaly Review

When AI is enabled, a separate analysis runs every 24 hours. It looks at the last 21 days of conversion data and flags four pattern types:

  • novelty_effect — variant outperforms early but the lift decays as visitors get used to the change. Common with anything visually attention-grabbing.
  • trend_divergence — variant and control conversion rates were tracking together, then suddenly diverged. Often signals a campaign-specific or audience-specific effect.
  • day_pattern — variant performs differently on weekdays vs weekends, or on specific days of the week.
  • sudden_shift — abrupt change in one variant’s behavior on a specific day, often correlated with an external event.

Severities for Claude alerts: info, warning, critical — matching the rule-based scale.
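
Claude’s review is free-form pattern analysis rather than fixed rules. For intuition only, here is a rough heuristic for the first pattern, novelty decay, that you could run yourself on exported daily rates; the window sizes and cutoff are arbitrary choices, not the product’s logic:

```typescript
// Rough novelty-effect heuristic, illustrative only: compare the variant's
// lift over control in the first week against its lift in the rest of the
// window. A large early lift that mostly evaporates suggests novelty.
interface DailyRates {
  control: number[]; // daily conversion rate, oldest -> newest
  variant: number[]; // same length and ordering as `control`
}

function looksLikeNoveltyEffect({ control, variant }: DailyRates): boolean {
  if (control.length < 14) return false; // need an early and a late window
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const lift = (c: number[], v: number[]) => mean(v) - mean(c);
  const early = lift(control.slice(0, 7), variant.slice(0, 7));
  const late = lift(control.slice(7), variant.slice(7));
  // Arbitrary cutoff: early lift was positive and has decayed by > 50%.
  return early > 0 && late < 0.5 * early;
}
```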

How Alerts Are Delivered

Today: Alerts are recorded in the data layer and exposed via the experiment results API. They surface in the in-app banner on the Results view (depending on the redesigned UI build you’re on — see the callout at the top).

Not today: No email notifications. No SMS. No webhook delivery to external systems. If you need anomaly notifications outside the app, the path today is to poll the results API on a schedule.
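
A minimal polling sketch (Node 18+ for built-in fetch); the endpoint path, auth header, and response shape below are placeholders, not the documented API:

```typescript
// Minimal polling sketch. Endpoint, auth, and response shape are
// placeholders -- substitute the real results API details from your account.
const API_BASE = "https://example.com/api"; // placeholder base URL
const EXPERIMENT_ID = "exp_123";            // placeholder experiment id

async function checkForNewAlerts(): Promise<void> {
  const res = await fetch(`${API_BASE}/experiments/${EXPERIMENT_ID}/results`, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  });
  if (!res.ok) throw new Error(`poll failed: ${res.status}`);
  const { alerts = [] } = await res.json();
  for (const alert of alerts) {
    if (!alert.dismissed) {
      // Forward to your own channel: Slack webhook, email, pager, etc.
      console.log(`[${alert.severity}] ${alert.type}: ${alert.recommendation}`);
    }
  }
}

// Every 12 hours matches the fastest detection cadence, so polling more
// often can't surface anything new.
setInterval(checkForNewAlerts, 12 * 60 * 60 * 1000);
```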

Dismissing an Alert

You can dismiss an alert that you’ve triaged. Dismissed alerts don’t disappear — they’re marked as acknowledged and stop showing in the active alerts banner. The dismissal is per-user, so a teammate seeing the same experiment will still see the alert until they also dismiss it.

Dismissing an alert doesn’t delete the underlying issue. If the issue persists (e.g., SRM is still happening), the next detection run will produce a fresh alert.

What Anomaly Detection Doesn’t Do

  • Doesn’t auto-pause the experiment. Even a critical SRM alert lets the experiment keep running — pausing is your call.
  • Doesn’t suggest the fix. It tells you something is wrong; investigation is on you.
  • Doesn’t catch experiment-design errors. Bad targeting, wrong primary metric, no hypothesis — those need pre-launch QA, not anomaly detection.
  • Doesn’t run on draft or paused experiments. Detection runs on started experiments only.

Tuning the Sensitivity

Most thresholds are fixed (1,000 sessions for SRM, a 50% drop for traffic, 7 days for zero conversions, 60 days for a too-long test). They’re not configurable today. If you find an alert is too noisy or too quiet for your workflow, file feedback; adjustment is on the roadmap.

Common Mistakes

  • Ignoring SRM as “probably nothing.” It’s almost always something. Treat critical alerts seriously.
  • Reacting to a single Claude alert without the rule-based context. Claude flags patterns; the rule checks flag thresholds. The two together give a clearer picture than either alone.
  • Dismissing without fixing. A dismissed alert about a real problem just hides the symptom. The data is still bad.
  • Treating “info” severity as ignorable. Info-level alerts (like “test running too long”) aren’t urgent but they’re often the most actionable — they tell you something about your experimentation cadence, not just one experiment.

Next Steps

Ready to start testing?

Install Split Test Pro and run your first experiment today.

Install on Shopify