Analytics + A/B testing

Run an A/B test that actually means something

Set up an experiment in Idukki: pick a primary goal, split traffic, target an audience, read significance the peek-safe way, then auto-promote the winner with server-side purchase attribution.

7 min read · last updated 2026-06

On this page

Set up an experiment
Reading significance without fooling yourself
Holdout arms and incremental lift
Auto-promote the winner
Server-side purchase attribution
Common questions

A/B testing without statistics is guessing twice. The Idukki experimentation engine runs a real significance test on every experiment, reads it in a way that survives mid-flight checking, and only calls a winner when the result holds up. Here is how to set one up and read it honestly.

Set up an experiment

Pick the widget and define the arms. Choose the surface to test (PDP, PLP, homepage, collection or lookbook). Control (A) is your current layout. Add one or more variants that differ in layout, CTA copy, source filter or overlay style. A widget runs one experiment at a time, so test scopes never collide.
Choose a primary goal. Pick the single metric the test optimises for: conversion rate, add-to-cart, checkout, or click-through. Everything else is reported as a secondary metric, but the winner is decided on the primary goal so you don’t cherry-pick after the fact.
Set the traffic split. Default is an even split across arms. Shift the weighting if you want to limit exposure to a risky variant, or hold back a slice as a holdout arm that sees the control experience so you can read true incremental lift.
Target an audience (optional). Scope the test to a segment (device, geography, new vs returning, or traffic source) when a change is only meant for that audience. Results break down per segment so a variant that wins on mobile but loses on desktop shows up as exactly that, not a misleading blended average.
Launch and watch it live. KPIs refresh on a short interval. You see per-arm impressions, clicks, conversions and revenue as the test runs, with the current read on significance for the primary goal.
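
It can help to picture the setup choices above as one small record that gets checked before launch. The sketch below is only an illustration: the class names, field names and goal identifiers are hypothetical and are not the Idukki API.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this is not the Idukki API, only a way to
# picture the choices made in the setup steps.
PRIMARY_GOALS = {"conversion_rate", "add_to_cart", "checkout", "click_through"}
SURFACES = {"pdp", "plp", "homepage", "collection", "lookbook"}


@dataclass
class Arm:
    name: str
    weight: float            # share of traffic, between 0 and 1
    is_holdout: bool = False  # holdout arms see the control experience


@dataclass
class Experiment:
    surface: str
    primary_goal: str
    arms: list
    audience: dict = field(default_factory=dict)  # optional, e.g. {"device": "mobile"}

    def validate(self):
        """Return a list of problems; an empty list means the setup is consistent."""
        problems = []
        if self.surface not in SURFACES:
            problems.append(f"unknown surface: {self.surface}")
        if self.primary_goal not in PRIMARY_GOALS:
            problems.append(f"unknown primary goal: {self.primary_goal}")
        if len(self.arms) < 2:
            problems.append("need control plus at least one variant")
        if abs(sum(arm.weight for arm in self.arms) - 1.0) > 1e-9:
            problems.append("arm weights must sum to 1")
        return problems


experiment = Experiment(
    surface="pdp",
    primary_goal="add_to_cart",
    arms=[
        Arm("control", 0.45),
        Arm("variant_b", 0.45),
        Arm("holdout", 0.10, is_holdout=True),
    ],
    audience={"device": "mobile"},
)
print(experiment.validate())
```

Keeping a single primary goal as a required field, rather than a list, mirrors the rule that the winner is decided on one metric only.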

Reading significance without fooling yourself

Calling a winner the moment a result crosses a threshold inflates your false-positive rate. This is the classic peeking problem: every extra look at a fixed-threshold test is another chance for random noise to cross the line. The engine reads significance in a peek-safe (always-valid) way, so the number you see mid-flight stays trustworthy however often you check it. It does not flip to "significant" just because you happened to look at a lucky moment.
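
To see why peeking matters, here is a minimal simulation of A/A tests, where both arms share the same true conversion rate and any "winner" is a false positive. It uses an ordinary two-proportion z-test checked at several looks. This illustrates the problem only; it is not the always-valid method the engine uses, and the sample sizes and rates are made-up parameters.

```python
import math
import random


def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic with a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variation observed yet, so no evidence either way
    return (conv_b / n_b - conv_a / n_a) / se


def simulate(n_tests=300, looks=10, per_look=100, rate=0.10, z_crit=1.96, seed=7):
    """Run A/A tests and count false positives under two stopping rules."""
    rng = random.Random(seed)
    any_look_hits = 0    # "winner" declared if ANY look crosses the threshold
    final_look_hits = 0  # "winner" declared only if the FINAL look crosses it
    for _test in range(n_tests):
        conv_a = conv_b = sessions = 0
        crossed = False
        z = 0.0
        for _look in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            sessions += per_look
            z = z_stat(conv_a, sessions, conv_b, sessions)
            if abs(z) > z_crit:
                crossed = True
        any_look_hits += crossed
        final_look_hits += abs(z) > z_crit
    return any_look_hits / n_tests, final_look_hits / n_tests


peeking_rate, fixed_rate = simulate()
print(f"false positives when stopping at the first crossing: {peeking_rate:.1%}")
print(f"false positives when testing once at the end: {fixed_rate:.1%}")
```

The first rate can never be lower than the second, because a test that crosses at the final look also counts as having crossed at some look. The gap between the two is the inflation that a peek-safe read is designed to remove.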

Holdout arms and incremental lift

A holdout arm keeps a portion of traffic on the control experience on purpose. Comparing the treated arms against the holdout gives you incremental lift (the conversion you would not have gotten anyway) rather than a raw rate that quietly includes baseline demand.
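
The arithmetic behind incremental lift is short enough to sketch. The function names and the traffic numbers below are hypothetical, chosen only to make the calculation concrete.

```python
def conversion_rate(conversions, sessions):
    """Conversions per session; 0.0 when the arm has no traffic yet."""
    return conversions / sessions if sessions else 0.0


def incremental_lift(treated_conv, treated_sessions, holdout_conv, holdout_sessions):
    """Return (absolute, relative) lift of the treated arm over the holdout.

    The relative lift is None when the holdout rate is zero, because a
    percentage change from zero is undefined.
    """
    treated = conversion_rate(treated_conv, treated_sessions)
    baseline = conversion_rate(holdout_conv, holdout_sessions)
    absolute = treated - baseline
    relative = absolute / baseline if baseline else None
    return absolute, relative


# Illustrative numbers: 540 conversions from 12,000 treated sessions,
# 48 conversions from 1,200 holdout sessions.
absolute, relative = incremental_lift(540, 12_000, 48, 1_200)
print(f"absolute lift: {absolute:.4f}")
print(f"relative lift: {relative:.1%}")
```

Here the treated arm converts at 4.5% and the holdout at 4.0%, so only the half-point difference counts as incremental. The 4.0% is baseline demand that would have converted anyway.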

Auto-promote the winner

Once a variant beats control on the primary goal and the result clears significance, Idukki can promote it automatically and stop the experiment, so the winning experience ships without a manual step. Notifications fire when a result reaches significance, so the team that ran the test hears about it without watching the dashboard.

Server-side purchase attribution

Revenue and purchase outcomes are attributed server-side from your store integration, not inferred from a browser pixel that ad-blockers and Safari ITP can drop. The lift number in the experiment readout is the same one your finance team can reconcile.
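
Conceptually, server-side attribution is a join between the orders your store records and the arm each customer was assigned to. How the store integration performs that join is not described here, so the sketch below is an assumption: the field names (`customer_id`, `total`) and the shape of the assignment map are hypothetical.

```python
from collections import defaultdict


def attribute_orders(assignments, orders):
    """Sum revenue and count purchases per arm.

    assignments: dict mapping customer_id to arm name (hypothetical shape).
    orders: iterable of dicts with "customer_id" and "total" keys, as they
            might arrive from a store backend rather than a browser pixel.
    """
    revenue = defaultdict(float)
    purchases = defaultdict(int)
    for order in orders:
        arm = assignments.get(order["customer_id"])
        if arm is None:
            continue  # customer was never enrolled in the experiment
        revenue[arm] += order["total"]
        purchases[arm] += 1
    return dict(revenue), dict(purchases)


assignments = {"c1": "control", "c2": "variant_b", "c3": "variant_b"}
orders = [
    {"customer_id": "c1", "total": 40.0},
    {"customer_id": "c2", "total": 25.5},
    {"customer_id": "c3", "total": 60.0},
    {"customer_id": "c9", "total": 15.0},  # not in the experiment, ignored
]
revenue, purchases = attribute_orders(assignments, orders)
print(revenue, purchases)
```

Because the totals come from order records, an order still counts when the shopper's browser blocked every client-side script.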

Common questions

How long should I run a test? Long enough for each arm to accumulate a meaningful sample, and ideally one full purchase cycle so returning-visitor novelty effects wash out. The engine’s peek-safe read tells you when the result is trustworthy; don’t stop the moment it first looks good.
Can I test more than two variants? You can run control plus one or more variants. Keep the variant count sensible relative to your traffic: every extra arm thins the traffic each one receives, so each arm takes longer to reach significance on the primary goal.
What if a variant wins overall but loses for one segment? That shows up in the per-segment breakdown. Use audience targeting to ship the variant only to the segments it wins for, rather than rolling out a blended average that hurts part of your traffic.