
How to Run A/B Test Analysis from Your Database Without a Data Scientist


Dr. Elena Vasquez · AI Research Lead · April 12, 2026 · 8 min read

You shipped a feature behind a flag two weeks ago. Half your users saw the new checkout flow, half saw the old one. The engineer who built it is asking whether it worked. The product manager wants a decision by end of week.

In most companies, this is where the bottleneck appears. Someone submits a data request to the analytics team. The analyst is backed up. Three weeks later, you get a spreadsheet with no confidence intervals and a conclusion that doesn't account for the fact that the two cohorts had different plan distributions.

The data to answer this question is sitting in your database right now. This article shows you how to query it directly, apply the right statistical tests, and get to a confident decision without waiting for a data scientist or learning statistics from scratch.

Understanding What's in Your Database

A typical A/B test creates records in your database in a few different places:

  • Assignment table: which users were assigned to which variant, and when
  • Event table: user actions you're measuring (purchases, clicks, upgrades, completions)
  • User/subscription table: attributes for segmenting results (plan, signup date, geography)

    The assignment table is often under-instrumented. If you're using a feature flag tool like LaunchDarkly or GrowthBook, those assignment events may be logged to your database or accessible via their API. If you rolled your own flags, the assignment is probably in a feature_flags or experiments table.

    Here's a common schema:

    -- experiment_assignments table
    CREATE TABLE experiment_assignments (
      user_id     UUID NOT NULL,
      experiment  VARCHAR(100) NOT NULL,
      variant     VARCHAR(50) NOT NULL, -- e.g. 'control', 'treatment'
      assigned_at TIMESTAMPTZ NOT NULL
    );
    
    -- events table (actions you're measuring)
    CREATE TABLE events (
      user_id    UUID NOT NULL,
      event_type VARCHAR(100) NOT NULL,
      occurred_at TIMESTAMPTZ NOT NULL,
      properties JSONB
    );

    With these two tables, you can calculate conversion rates per variant and run statistical significance tests directly in SQL.

    Step 1: Calculate Raw Conversion Rates

    Before any statistics, you need to know how many users were in each variant and how many completed the target event. Let's say you're measuring checkout completions:

    WITH assigned_users AS (
      -- Users assigned to the experiment, with their variant
      SELECT
        user_id,
        variant,
        assigned_at
      FROM experiment_assignments
      WHERE experiment = 'checkout_redesign_v2'
    ),
    converters AS (
      -- Users who completed checkout after being assigned
      SELECT DISTINCT
        ea.user_id,
        ea.variant
      FROM assigned_users ea
      JOIN events e ON ea.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= ea.assigned_at -- only post-assignment events
    )
    SELECT
      a.variant,
      COUNT(DISTINCT a.user_id)                               AS total_users,
      COUNT(DISTINCT c.user_id)                               AS conversions,
      ROUND(
        100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT a.user_id),
        2
      )                                                        AS conversion_rate_pct
    FROM assigned_users a
    LEFT JOIN converters c ON a.user_id = c.user_id
    GROUP BY a.variant
    ORDER BY a.variant;
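If you want to sanity-check this logic outside the database, the same calculation is easy to sketch in plain Python on a few hypothetical rows (the user IDs, timestamps, and counts below are made up). The key detail to preserve is the post-assignment filter:

```python
from datetime import datetime

# Hypothetical rows mirroring experiment_assignments and events.
assignments = {
    # user_id: (variant, assigned_at)
    "u1": ("control",   datetime(2026, 4, 1)),
    "u2": ("control",   datetime(2026, 4, 1)),
    "u3": ("treatment", datetime(2026, 4, 1)),
    "u4": ("treatment", datetime(2026, 4, 1)),
}
events = [
    ("u1", "checkout_completed", datetime(2026, 4, 2)),   # after assignment: counts
    ("u3", "checkout_completed", datetime(2026, 3, 30)),  # before assignment: ignored
    ("u4", "checkout_completed", datetime(2026, 4, 5)),   # after assignment: counts
]

converted = set()
for user_id, event_type, occurred_at in events:
    if event_type != "checkout_completed" or user_id not in assignments:
        continue
    variant, assigned_at = assignments[user_id]
    if occurred_at >= assigned_at:  # only post-assignment events, as in the SQL
        converted.add(user_id)

totals, conversions = {}, {}
for user_id, (variant, _) in assignments.items():
    totals[variant] = totals.get(variant, 0) + 1
    if user_id in converted:
        conversions[variant] = conversions.get(variant, 0) + 1

rates = {v: 100.0 * conversions.get(v, 0) / totals[v] for v in totals}
print(rates)  # {'control': 50.0, 'treatment': 50.0}
```

Without the `occurred_at >= assigned_at` filter, u3's pre-assignment checkout would inflate the treatment rate to 100%.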

    This gives you a clean table like:

    variant   | total_users | conversions | conversion_rate_pct
    ----------+-------------+-------------+--------------------
    control   |       4,821 |         386 |                8.01
    treatment |       4,903 |         441 |                8.99

    About 1 percentage point difference. But is it real, or is it noise?

    Step 2: Run a Z-Test for Statistical Significance

    A z-test for proportions tells you whether the difference between two conversion rates is likely to be real or just random variation. You don't need Python or R for this; PostgreSQL can do it:

    WITH stats AS (
      SELECT
        a.variant,
        COUNT(DISTINCT a.user_id)                         AS n,
        COUNT(DISTINCT c.user_id)                         AS k,
        COUNT(DISTINCT c.user_id)::FLOAT /
          COUNT(DISTINCT a.user_id)                       AS p
      FROM experiment_assignments a
      LEFT JOIN (
        SELECT DISTINCT ea.user_id, ea.variant
        FROM experiment_assignments ea
        JOIN events e ON ea.user_id = e.user_id
        WHERE e.event_type = 'checkout_completed'
          AND e.occurred_at >= ea.assigned_at
      ) c ON a.user_id = c.user_id AND a.variant = c.variant
      WHERE a.experiment = 'checkout_redesign_v2'
      GROUP BY a.variant
    ),
    control AS (SELECT n, k, p FROM stats WHERE variant = 'control'),
    treatment AS (SELECT n, k, p FROM stats WHERE variant = 'treatment')
    SELECT
      ROUND((t.p - c.p)::NUMERIC, 4)                         AS absolute_lift,
      ROUND(((t.p - c.p) / c.p * 100)::NUMERIC, 2)           AS relative_lift_pct,
      ROUND(
        (
          (t.p - c.p) /
          SQRT(
            ((c.p * (1 - c.p)) / c.n) +
            ((t.p * (1 - t.p)) / t.n)
          )
        )::NUMERIC,
        4
      )                                                        AS z_score
    FROM control c, treatment t;

    An absolute z-score above 1.96 corresponds to 95% statistical confidence (two-tailed); above 2.576 is 99% confidence.
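You can verify the SQL's arithmetic in a few lines of Python, using the counts from the results table above (unpooled standard error, matching the query):

```python
from math import sqrt

# Counts from the example results table.
n_c, k_c = 4821, 386   # control: users, conversions
n_t, k_t = 4903, 441   # treatment: users, conversions

p_c, p_t = k_c / n_c, k_t / n_t
# Unpooled standard error of the difference in proportions.
se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z = (p_t - p_c) / se
print(round(z, 3))  # 1.747
```

Note what this says about the example: a z-score of about 1.75 is below 1.96, so a roughly one-point lift on samples of this size is suggestive but not yet significant at 95%.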

    If you're not comfortable running this SQL yourself, you can describe the question to AI for Database in plain English: "In the checkout_redesign_v2 experiment, what's the conversion rate by variant and is the difference statistically significant at 95% confidence?" The system handles the SQL, runs it, and returns the z-score alongside the rates.

    Step 3: Check for Sample Ratio Mismatch

    Before trusting any result, check whether your two groups are actually the right size relative to each other. If you intended a 50/50 split but your assignment ended up 55/45, that mismatch can invalidate your results: it typically means something is wrong with your assignment logic.

    SELECT
      variant,
      COUNT(DISTINCT user_id)                                  AS users,
      ROUND(
        100.0 * COUNT(DISTINCT user_id) /
        SUM(COUNT(DISTINCT user_id)) OVER (),
        1
      )                                                         AS pct_of_total
    FROM experiment_assignments
    WHERE experiment = 'checkout_redesign_v2'
    GROUP BY variant;

    If you see 47% / 53% when you expected 50/50, dig into why. Common causes: a bug in your flag rollout, users switching devices and getting re-assigned to a different variant, or the assignment query including some users who were exposed before the experiment officially started.
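For a 50/50 design, you can quantify "how far off is too far" with a chi-square goodness-of-fit test. A minimal sketch using the example counts above (3.841 is the 95% critical value for one degree of freedom):

```python
# Chi-square goodness-of-fit check for sample ratio mismatch,
# assuming an intended 50/50 split.
observed = {"control": 4821, "treatment": 4903}
total = sum(observed.values())
expected = total / 2  # 50/50 design

chi2 = sum((obs - expected) ** 2 / expected for obs in observed.values())
# Critical value for 1 degree of freedom at alpha = 0.05 is 3.841.
print(round(chi2, 3), "SRM detected" if chi2 > 3.841 else "no SRM detected")
```

Here chi2 is about 0.69, comfortably below the threshold, so the small imbalance between 4,821 and 4,903 is consistent with random assignment.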

    Step 4: Segment Your Results

    An aggregate conversion rate hides a lot. A feature might help power users and hurt new users. It might perform well on mobile and poorly on desktop. Running segmented analysis tells you whether your result is consistent across the user population or driven by one specific segment.

    Here's how to break down conversion rates by user plan:

    WITH assigned AS (
      SELECT
        ea.user_id,
        ea.variant,
        ea.assigned_at,
        u.plan_id
      FROM experiment_assignments ea
      JOIN users u ON ea.user_id = u.id
      WHERE ea.experiment = 'checkout_redesign_v2'
    ),
    converted AS (
      SELECT DISTINCT ea.user_id, ea.variant
      FROM assigned ea
      JOIN events e ON ea.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= ea.assigned_at
    )
    SELECT
      a.plan_id,
      a.variant,
      COUNT(DISTINCT a.user_id)                                AS total,
      COUNT(DISTINCT c.user_id)                                AS conversions,
      ROUND(
        100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT a.user_id),
        2
      )                                                         AS conversion_rate
    FROM assigned a
    LEFT JOIN converted c ON a.user_id = c.user_id
    GROUP BY a.plan_id, a.variant
    ORDER BY a.plan_id, a.variant;

    If the new checkout flow converts free users at 12% vs 8% for control, but converts paid users at 7% vs 9% for control, you have a problem. Shipping to everyone based on the aggregate number would hurt a valuable user segment.
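The arithmetic behind that warning is worth seeing once. With hypothetical user counts chosen to match the rates above, the aggregate can favor the treatment even while paid users are worse off (an uneven plan mix between arms like this is also exactly the kind of thing the Step 3 sample-ratio check helps catch):

```python
# Hypothetical per-segment counts illustrating how an aggregate lift
# can hide harm to a segment; rates loosely match the text above.
segments = {
    # plan: (control_users, control_conv, treatment_users, treatment_conv)
    "free": (3000, 240, 4000, 480),   # 8% vs 12%: treatment helps
    "paid": (1900, 171,  900,  63),   # 9% vs 7%: treatment hurts
}

c_users = sum(s[0] for s in segments.values())
c_conv  = sum(s[1] for s in segments.values())
t_users = sum(s[2] for s in segments.values())
t_conv  = sum(s[3] for s in segments.values())

agg_control   = c_conv / c_users    # 411 / 4900, about 8.4%
agg_treatment = t_conv / t_users    # 543 / 4900, about 11.1%
print(agg_treatment > agg_control)  # aggregate says ship it
print(63 / 900 < 171 / 1900)        # but paid users convert worse
```

Both prints are True: the aggregate number alone would green-light a change that hurts the paid segment.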

    Step 5: Measure Revenue Impact, Not Just Conversion Rate

    Conversion rate is a proxy. What you actually care about is revenue. A variant with a 10% higher conversion rate but 20% lower average order value is a net loss.

    WITH assigned AS (
      SELECT user_id, variant, assigned_at
      FROM experiment_assignments
      WHERE experiment = 'checkout_redesign_v2'
    ),
    revenue AS (
      SELECT
        a.user_id,
        a.variant,
        SUM((e.properties->>'amount')::NUMERIC) AS total_revenue
      FROM assigned a
      JOIN events e ON a.user_id = e.user_id
      WHERE e.event_type = 'checkout_completed'
        AND e.occurred_at >= a.assigned_at
        AND e.properties->>'amount' IS NOT NULL
      GROUP BY a.user_id, a.variant
    )
    SELECT
      a.variant,
      COUNT(DISTINCT a.user_id)                                AS total_users,
      COUNT(DISTINCT r.user_id)                                AS purchasers,
      ROUND(AVG(r.total_revenue), 2)                           AS avg_revenue_per_converter,
      ROUND(
        SUM(COALESCE(r.total_revenue, 0)) / COUNT(DISTINCT a.user_id),
        2
      )                                                         AS revenue_per_user
    FROM assigned a
    LEFT JOIN revenue r ON a.user_id = r.user_id
    GROUP BY a.variant;

    The revenue_per_user column (total revenue divided by all assigned users, not just converters) is your most important metric. It accounts for both conversion rate and order value together.
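If you want error bars on revenue_per_user, revenue distributions are usually skewed enough that a bootstrap is safer than a normal approximation. A minimal sketch on hypothetical per-user revenue lists (zeros for users who never purchased; the counts and amounts are made up):

```python
import random

# Hypothetical per-user revenue for each arm, zeros for non-purchasers.
rng = random.Random(42)
control   = [0.0] * 900 + [rng.uniform(20, 120) for _ in range(100)]
treatment = [0.0] * 880 + [rng.uniform(20, 130) for _ in range(120)]

def mean(xs):
    return sum(xs) / len(xs)

# Bootstrap: resample each arm with replacement, record the difference
# in revenue per user, and take the central 95% of the differences.
diffs = []
for _ in range(2000):
    c = [rng.choice(control) for _ in range(len(control))]
    t = [rng.choice(treatment) for _ in range(len(treatment))]
    diffs.append(mean(t) - mean(c))
diffs.sort()
lo, hi = diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))]
print(f"95% CI for revenue-per-user lift: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the revenue lift is significant at roughly 95% confidence; if it straddles zero, you don't have a revenue result yet, whatever the conversion rates say.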

    Setting Up Ongoing Experiment Monitoring

    Running an A/B test analysis once is useful. Having it update automatically so you can check in without re-running queries is better.

    With AI for Database, you can set up a dashboard tile that refreshes daily, showing:

  • Current conversion rates by variant
  • Statistical significance as of today
  • Revenue per user by variant
  • Sample ratio check

    You can also set up an action workflow that alerts you when an experiment reaches statistical significance: "When the z-score for checkout_redesign_v2 crosses 1.96, send me a Slack message with the current conversion rates and lift."

    This means you don't have to remember to check. The experiment result comes to you.

    Common Pitfalls That Invalidate A/B Test Results

    Peeking too early. If you check significance daily and stop when it crosses 95%, you'll be wrong about 1 in 3 times. Set a minimum run time (usually 2–4 weeks) and a minimum sample size before calling the experiment.
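You can see the cost of peeking with a short simulation: run many A/A experiments (no real effect), check the z-score daily, and stop at the first "significant" day. The parameters below are arbitrary, but the false positive rate lands well above the nominal 5%:

```python
import random
from math import sqrt

# Simulate A/A experiments (no true effect) with daily peeking.
rng = random.Random(7)
BASE_RATE, USERS_PER_DAY, DAYS, SIMS = 0.10, 400, 14, 200

def daily_conversions(n):
    return sum(rng.random() < BASE_RATE for _ in range(n))

false_positives = 0
for _ in range(SIMS):
    n_c = n_t = k_c = k_t = 0
    for _ in range(DAYS):
        n_c += USERS_PER_DAY; k_c += daily_conversions(USERS_PER_DAY)
        n_t += USERS_PER_DAY; k_t += daily_conversions(USERS_PER_DAY)
        p_c, p_t = k_c / n_c, k_t / n_t
        se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
        if se > 0 and abs((p_t - p_c) / se) > 1.96:
            false_positives += 1  # declared a winner where none exists
            break

print(f"false positive rate with daily peeking: {false_positives / SIMS:.0%}")
```

With fourteen looks instead of one, a threshold designed to be wrong 5% of the time gets crossed far more often than that purely by chance.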

    Not filtering for post-assignment events. A user who converted before being assigned to the experiment shouldn't count as a conversion. The query above handles this with e.occurred_at >= ea.assigned_at; make sure your analysis does the same.

    Counting the same user twice. If a user switches devices or clears cookies, they might appear in both variants. Use your user_id (logged-in user identifier) rather than anonymous IDs for assignment wherever possible.

    Ignoring network effects. If control and treatment users interact with each other (e.g., in a marketplace or collaboration tool), one group's behaviour can influence the other. Standard A/B testing assumes independence between groups.

    Measuring too many metrics. Every metric you test multiplies your chance of a false positive. Pick your primary metric before the experiment starts and treat everything else as directional signal.
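The multiplication is easy to quantify: assuming the metrics are independent and each is tested at 95% confidence, the chance of at least one false positive across m metrics is 1 - 0.95**m:

```python
# Family-wise false positive rate for m independent metrics,
# each tested at the 95% confidence level.
for m in (1, 3, 10):
    print(m, round(1 - 0.95 ** m, 3))
```

With ten metrics the chance of at least one spurious "win" is about 40%, which is why you pick one primary metric up front.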

    From Analysis to Decision

    The goal of an A/B test isn't to run analysis. It's to make a decision faster and with more confidence than you'd have otherwise. The queries in this article get you from raw database events to a statistically valid conclusion in minutes rather than weeks.

    Connect your database to AI for Database at aifordatabase.com, and you can query your experiment results in plain English, without writing a line of SQL, and pin the results as a live dashboard so you always know where your active experiments stand.

    Ready to try AI for Database?

    Query your database in plain English. No SQL required. Start free today.