
How to Run A/B Test Analysis from Your Database Without a Data Scientist


Dr. Elena Vasquez · AI Research Lead · April 12, 2026 · 8 min read

You shipped a feature behind a flag two weeks ago. Half your users saw the new checkout flow, half saw the old one. The engineer who built it is asking whether it worked. The product manager wants a decision by end of week.

In most companies, this is where the bottleneck appears. Someone submits a data request to the analytics team. The analyst is backed up. Three weeks later, you get a spreadsheet with no confidence intervals and a conclusion that doesn't account for the fact that the two cohorts had different plan distributions.

The data to answer this question is sitting in your database right now. This article shows you how to query it directly, apply the right statistical tests, and get to a confident decision without waiting for a data scientist or learning statistics from scratch.

Understanding What's in Your Database

A typical A/B test creates records in your database in a few different places:

  • Assignment table: which users were assigned to which variant, and when
  • Event table: user actions you're measuring (purchases, clicks, upgrades, completions)
  • User/subscription table: attributes for segmenting results (plan, signup date, geography)

    The assignment table is often under-instrumented. If you're using a feature flag tool like LaunchDarkly or GrowthBook, those assignment events may be logged to your database or accessible via their API. If you rolled your own flags, the assignment is probably in a feature_flags or experiments table.

    Here's a common schema:

    -- experiment_assignments table
    CREATE TABLE experiment_assignments (
      user_id     UUID NOT NULL,
      experiment  VARCHAR(100) NOT NULL,
      variant     VARCHAR(50) NOT NULL, -- e.g. 'control', 'treatment'
      assigned_at TIMESTAMPTZ NOT NULL
    );
    
    -- events table (actions you're measuring)
    CREATE TABLE events (
      user_id    UUID NOT NULL,
      event_type VARCHAR(100) NOT NULL,
      occurred_at TIMESTAMPTZ NOT NULL,
      properties JSONB
    );

    With these two tables, you can calculate conversion rates per variant and run statistical significance tests directly in SQL.

    Step 1: Calculate Raw Conversion Rates

    Before any statistics, you need to know how many users were in each variant and how many completed the target event. Let's say you're measuring checkout completions:

    WITH assigned_users AS (
      -- Users assigned to the experiment, with their variant
      SELECT
        user_id,
        variant,
        assigned_at
      FROM experiment_assignments
      WHERE experiment = 'checkout_redesign_v2'
    ),
    converters AS (
      -- Users who completed checkout after being assigned
      SELECT DISTINCT
        ea.user_id,
        ea.variant
      FROM assigned_users ea
      JOIN events e ON ea.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= ea.assigned_at -- only post-assignment events
    )
    SELECT
      a.variant,
      COUNT(DISTINCT a.user_id)                               AS total_users,
      COUNT(DISTINCT c.user_id)                               AS conversions,
      ROUND(
        100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT a.user_id),
        2
      )                                                        AS conversion_rate_pct
    FROM assigned_users a
    LEFT JOIN converters c ON a.user_id = c.user_id
    GROUP BY a.variant
    ORDER BY a.variant;
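If you want to sanity-check this logic outside the database, the same calculation is easy to sketch in plain Python on a few hypothetical rows (the user IDs, timestamps, and counts below are made up). The key detail to preserve is the post-assignment filter:

```python
from datetime import datetime

# Hypothetical rows mirroring experiment_assignments and events.
assignments = {
    # user_id: (variant, assigned_at)
    "u1": ("control",   datetime(2026, 4, 1)),
    "u2": ("control",   datetime(2026, 4, 1)),
    "u3": ("treatment", datetime(2026, 4, 1)),
    "u4": ("treatment", datetime(2026, 4, 1)),
}
events = [
    ("u1", "checkout_completed", datetime(2026, 4, 2)),   # after assignment: counts
    ("u3", "checkout_completed", datetime(2026, 3, 30)),  # before assignment: ignored
    ("u4", "checkout_completed", datetime(2026, 4, 5)),   # after assignment: counts
]

converted = set()
for user_id, event_type, occurred_at in events:
    if event_type != "checkout_completed" or user_id not in assignments:
        continue
    variant, assigned_at = assignments[user_id]
    if occurred_at >= assigned_at:  # only post-assignment events, as in the SQL
        converted.add(user_id)

totals, conversions = {}, {}
for user_id, (variant, _) in assignments.items():
    totals[variant] = totals.get(variant, 0) + 1
    if user_id in converted:
        conversions[variant] = conversions.get(variant, 0) + 1

rates = {v: 100.0 * conversions.get(v, 0) / totals[v] for v in totals}
print(rates)  # {'control': 50.0, 'treatment': 50.0}
```

Without the `occurred_at >= assigned_at` filter, u3's pre-assignment checkout would inflate the treatment rate to 100%.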

    This gives you a clean table like:

    variant   | total_users | conversions | conversion_rate_pct
    ----------+-------------+-------------+--------------------
    control   |       4,821 |         386 |                8.01
    treatment |       4,903 |         441 |                8.99

    About 1 percentage point difference. But is it real, or is it noise?

    Step 2: Run a Z-Test for Statistical Significance

    A z-test for proportions tells you whether the difference between two conversion rates is likely to be real or just random variation. You don't need Python or R for this; PostgreSQL can do it:

    WITH stats AS (
      SELECT
        a.variant,
        COUNT(DISTINCT a.user_id)                         AS n,
        COUNT(DISTINCT c.user_id)                         AS k,
        COUNT(DISTINCT c.user_id)::FLOAT /
          COUNT(DISTINCT a.user_id)                       AS p
      FROM experiment_assignments a
      LEFT JOIN (
        SELECT DISTINCT ea.user_id, ea.variant
        FROM experiment_assignments ea
        JOIN events e ON ea.user_id = e.user_id
        WHERE e.event_type = 'checkout_completed'
          AND e.occurred_at >= ea.assigned_at
      ) c ON a.user_id = c.user_id AND a.variant = c.variant
      WHERE a.experiment = 'checkout_redesign_v2'
      GROUP BY a.variant
    ),
    control AS (SELECT n, k, p FROM stats WHERE variant = 'control'),
    treatment AS (SELECT n, k, p FROM stats WHERE variant = 'treatment')
    SELECT
      ROUND((t.p - c.p)::NUMERIC, 4)                         AS absolute_lift,
      ROUND(((t.p - c.p) / c.p * 100)::NUMERIC, 2)           AS relative_lift_pct,
      ROUND(
        (
          (t.p - c.p) /
          SQRT(
            ((c.p * (1 - c.p)) / c.n) +
            ((t.p * (1 - t.p)) / t.n)
          )
        )::NUMERIC,
        4
      )                                                        AS z_score
    FROM control c, treatment t;

    An absolute z-score above 1.96 corresponds to 95% statistical confidence (two-tailed); above 2.576 is 99% confidence.
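You can verify the SQL's arithmetic in a few lines of Python, using the counts from the results table above (unpooled standard error, matching the query):

```python
from math import sqrt

# Counts from the example results table.
n_c, k_c = 4821, 386   # control: users, conversions
n_t, k_t = 4903, 441   # treatment: users, conversions

p_c, p_t = k_c / n_c, k_t / n_t
# Unpooled standard error of the difference in proportions.
se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z = (p_t - p_c) / se
print(round(z, 3))  # 1.747
```

Note what this says about the example: a z-score of about 1.75 is below 1.96, so a roughly one-point lift on samples of this size is suggestive but not yet significant at 95%.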

    If you're not comfortable running this SQL yourself, you can describe the question to AI for Database in plain English: "In the checkout_redesign_v2 experiment, what's the conversion rate by variant and is the difference statistically significant at 95% confidence?" The system handles the SQL, runs it, and returns the z-score alongside the rates.

    Step 3: Check for Sample Ratio Mismatch

    Before trusting any result, check whether your two groups are actually the right size relative to each other. If you intended a 50/50 split but your assignment ended up 55/45, that mismatch can invalidate your results: it typically means something is wrong with your assignment logic.

    SELECT
      variant,
      COUNT(DISTINCT user_id)                                  AS users,
      ROUND(
        100.0 * COUNT(DISTINCT user_id) /
        SUM(COUNT(DISTINCT user_id)) OVER (),
        1
      )                                                         AS pct_of_total
    FROM experiment_assignments
    WHERE experiment = 'checkout_redesign_v2'
    GROUP BY variant;

    If you see 47% / 53% when you expected 50/50, dig into why. Common causes: a bug in your flag rollout, users switching devices and getting re-assigned to a different variant, or the assignment query including some users who were exposed before the experiment officially started.
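For a 50/50 design, you can quantify "how far off is too far" with a chi-square goodness-of-fit test. A minimal sketch using the example counts above (3.841 is the 95% critical value for one degree of freedom):

```python
# Chi-square goodness-of-fit check for sample ratio mismatch,
# assuming an intended 50/50 split.
observed = {"control": 4821, "treatment": 4903}
total = sum(observed.values())
expected = total / 2  # 50/50 design

chi2 = sum((obs - expected) ** 2 / expected for obs in observed.values())
# Critical value for 1 degree of freedom at alpha = 0.05 is 3.841.
print(round(chi2, 3), "SRM detected" if chi2 > 3.841 else "no SRM detected")
```

Here chi2 is about 0.69, comfortably below the threshold, so the small imbalance between 4,821 and 4,903 is consistent with random assignment.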

    Step 4: Segment Your Results

    An aggregate conversion rate hides a lot. A feature might help power users and hurt new users. It might perform well on mobile and poorly on desktop. Running segmented analysis tells you whether your result is consistent across the user population or driven by one specific segment.

    Here's how to break down conversion rates by user plan:

    WITH assigned AS (
      SELECT
        ea.user_id,
        ea.variant,
        ea.assigned_at,
        u.plan_id
      FROM experiment_assignments ea
      JOIN users u ON ea.user_id = u.id
      WHERE ea.experiment = 'checkout_redesign_v2'
    ),
    converted AS (
      SELECT DISTINCT ea.user_id, ea.variant
      FROM assigned ea
      JOIN events e ON ea.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= ea.assigned_at
    )
    SELECT
      a.plan_id,
      a.variant,
      COUNT(DISTINCT a.user_id)                                AS total,
      COUNT(DISTINCT c.user_id)                                AS conversions,
      ROUND(
        100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT a.user_id),
        2
      )                                                         AS conversion_rate
    FROM assigned a
    LEFT JOIN converted c ON a.user_id = c.user_id
    GROUP BY a.plan_id, a.variant
    ORDER BY a.plan_id, a.variant;

    If the new checkout flow converts free users at 12% vs 8% for control, but converts paid users at 7% vs 9% for control, you have a problem. Shipping to everyone based on the aggregate number would hurt a valuable user segment.
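The arithmetic behind that warning is worth seeing once. With hypothetical user counts chosen to match the rates above, the aggregate can favor the treatment even while paid users are worse off (an uneven plan mix between arms like this is also exactly the kind of thing the Step 3 sample-ratio check helps catch):

```python
# Hypothetical per-segment counts illustrating how an aggregate lift
# can hide harm to a segment; rates loosely match the text above.
segments = {
    # plan: (control_users, control_conv, treatment_users, treatment_conv)
    "free": (3000, 240, 4000, 480),   # 8% vs 12%: treatment helps
    "paid": (1900, 171,  900,  63),   # 9% vs 7%: treatment hurts
}

c_users = sum(s[0] for s in segments.values())
c_conv  = sum(s[1] for s in segments.values())
t_users = sum(s[2] for s in segments.values())
t_conv  = sum(s[3] for s in segments.values())

agg_control   = c_conv / c_users    # 411 / 4900, about 8.4%
agg_treatment = t_conv / t_users    # 543 / 4900, about 11.1%
print(agg_treatment > agg_control)  # aggregate says ship it
print(63 / 900 < 171 / 1900)        # but paid users convert worse
```

Both prints are True: the aggregate number alone would green-light a change that hurts the paid segment.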

    Step 5: Measure Revenue Impact, Not Just Conversion Rate

    Conversion rate is a proxy. What you actually care about is revenue. A variant with a 10% higher conversion rate but 20% lower average order value is a net loss.

    WITH assigned AS (
      SELECT user_id, variant, assigned_at
      FROM experiment_assignments
      WHERE experiment = 'checkout_redesign_v2'
    ),
    revenue AS (
      SELECT
        a.user_id,
        a.variant,
        SUM((e.properties->>'amount')::NUMERIC) AS total_revenue
      FROM assigned a
      JOIN events e ON a.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= a.assigned_at
        AND e.properties->>'amount' IS NOT NULL
      GROUP BY a.user_id, a.variant
    )
    SELECT
      a.variant,
      COUNT(DISTINCT a.user_id)                                AS total_users,
      COUNT(DISTINCT r.user_id)                                AS purchasers,
      ROUND(AVG(r.total_revenue), 2)                           AS avg_revenue_per_converter,
      ROUND(
        SUM(COALESCE(r.total_revenue, 0)) / COUNT(DISTINCT a.user_id),
        2
      )                                                         AS revenue_per_user
    FROM assigned a
    LEFT JOIN revenue r ON a.user_id = r.user_id
    GROUP BY a.variant;

    The revenue_per_user column (total revenue divided by all assigned users, not just converters) is your most important metric. It accounts for both conversion rate and order value together.
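If you want error bars on revenue_per_user, revenue distributions are usually skewed enough that a bootstrap is safer than a normal approximation. A minimal sketch on hypothetical per-user revenue lists (zeros for users who never purchased; the counts and amounts are made up):

```python
import random

# Hypothetical per-user revenue for each arm, zeros for non-purchasers.
rng = random.Random(42)
control   = [0.0] * 900 + [rng.uniform(20, 120) for _ in range(100)]
treatment = [0.0] * 880 + [rng.uniform(20, 130) for _ in range(120)]

def mean(xs):
    return sum(xs) / len(xs)

# Bootstrap: resample each arm with replacement, record the difference
# in revenue per user, and take the central 95% of the differences.
diffs = []
for _ in range(2000):
    c = [rng.choice(control) for _ in range(len(control))]
    t = [rng.choice(treatment) for _ in range(len(treatment))]
    diffs.append(mean(t) - mean(c))
diffs.sort()
lo, hi = diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))]
print(f"95% CI for revenue-per-user lift: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the revenue lift is significant at roughly 95% confidence; if it straddles zero, you don't have a revenue result yet, whatever the conversion rates say.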

    Setting Up Ongoing Experiment Monitoring

    Running an A/B test analysis once is useful. Having it update automatically so you can check in without re-running queries is better.

    With AI for Database, you can set up a dashboard tile that refreshes daily, showing:

  • Current conversion rates by variant
  • Statistical significance as of today
  • Revenue per user by variant
  • Sample ratio check

    You can also set up an action workflow that alerts you when an experiment reaches statistical significance: "When the z-score for checkout_redesign_v2 crosses 1.96, send me a Slack message with the current conversion rates and lift."

    This means you don't have to remember to check. The experiment result comes to you.

    Common Pitfalls That Invalidate A/B Test Results

    Peeking too early. If you check significance daily and stop when it crosses 95%, you'll be wrong about 1 in 3 times. Set a minimum run time (usually 2–4 weeks) and a minimum sample size before calling the experiment.
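You can see the cost of peeking with a short simulation: run many A/A experiments (no real effect), check the z-score daily, and stop at the first "significant" day. The parameters below are arbitrary, but the false positive rate lands well above the nominal 5%:

```python
import random
from math import sqrt

# Simulate A/A experiments (no true effect) with daily peeking.
rng = random.Random(7)
BASE_RATE, USERS_PER_DAY, DAYS, SIMS = 0.10, 400, 14, 200

def daily_conversions(n):
    return sum(rng.random() < BASE_RATE for _ in range(n))

false_positives = 0
for _ in range(SIMS):
    n_c = n_t = k_c = k_t = 0
    for _ in range(DAYS):
        n_c += USERS_PER_DAY; k_c += daily_conversions(USERS_PER_DAY)
        n_t += USERS_PER_DAY; k_t += daily_conversions(USERS_PER_DAY)
        p_c, p_t = k_c / n_c, k_t / n_t
        se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
        if se > 0 and abs((p_t - p_c) / se) > 1.96:
            false_positives += 1  # declared a winner where none exists
            break

print(f"false positive rate with daily peeking: {false_positives / SIMS:.0%}")
```

With fourteen looks instead of one, a threshold designed to be wrong 5% of the time gets crossed far more often than that purely by chance.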

    Not filtering for post-assignment events. A user who converted before being assigned to the experiment shouldn't count as a conversion. The query above handles this with e.occurred_at >= ea.assigned_at; make sure your analysis does the same.

    Counting the same user twice. If a user switches devices or clears cookies, they might appear in both variants. Use your user_id (logged-in user identifier) rather than anonymous IDs for assignment wherever possible.

    Ignoring network effects. If control and treatment users interact with each other (e.g., in a marketplace or collaboration tool), one group's behaviour can influence the other. Standard A/B testing assumes independence between groups.

    Measuring too many metrics. Every metric you test multiplies your chance of a false positive. Pick your primary metric before the experiment starts and treat everything else as directional signal.
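The multiplication is easy to quantify: assuming the metrics are independent and each is tested at 95% confidence, the chance of at least one false positive across m metrics is 1 - 0.95**m:

```python
# Family-wise false positive rate for m independent metrics,
# each tested at the 95% confidence level.
for m in (1, 3, 10):
    print(m, round(1 - 0.95 ** m, 3))
```

With ten metrics the chance of at least one spurious "win" is about 40%, which is why you pick one primary metric up front.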

    From Analysis to Decision

    The goal of an A/B test isn't to run analysis. It's to make a decision faster and with more confidence than you'd have otherwise. The queries in this article get you from raw database events to a statistically valid conclusion in minutes rather than weeks.

    Connect your database to AI for Database at aifordatabase.com, and you can query your experiment results in plain English, without writing a line of SQL, and pin the results as a live dashboard so you always know where your active experiments stand.

    Ready to try AI for Database?

    Query your database in plain English. No SQL required. Start free today.