
How to Query Multiple Databases at Once With Natural Language


Dr. Elena Vasquez · AI Research Lead · March 29, 2026 · 9 min read

Most growing companies don't have one database; they have several. A production PostgreSQL database, a Supabase project for a newer feature, a BigQuery dataset loaded with analytics events, maybe a MySQL instance running behind a legacy service. Each one holds a piece of the picture.

The problem is that asking a question that spans multiple databases traditionally means writing SQL against each one separately, then combining the results manually in a spreadsheet. It's slow, error-prone, and requires someone who knows the schema of each system.

Natural language interfaces have started changing this. This article explains the practical side: why multi-database querying is hard, when you actually need it, and how to do it without writing any SQL yourself.

Why Data Ends Up Scattered Across Multiple Databases

It's almost never a deliberate choice. Data ends up in multiple databases because of how companies grow:

  • You started with a single PostgreSQL instance, then adopted Supabase for a new project because of its built-in auth and real-time features.
  • A legacy acquisition brought along its own MySQL database that you haven't migrated yet.
  • You put analytics events in BigQuery because it handles high-volume append-only data better than PostgreSQL.
  • Your data warehouse (Snowflake, Redshift) is the "source of truth" for historical data, but your production database has the freshest records.

None of these decisions were wrong. But the result is that answering questions like "what's our revenue per user segment this month?" requires pulling from the production database for transactions, BigQuery for event data, and maybe Supabase for user profile details.

    The Traditional Approach and Why It Breaks Down

    Before tools existed to help with this, the standard workflow was:

  • Export data from database A as a CSV
  • Export data from database B as a CSV
  • Load both into a spreadsheet or a Python script
  • Join and aggregate manually
  • Present the result as if it's authoritative, knowing it's already slightly stale

    Some companies built proper ETL pipelines to solve this: copying data from all sources into a central warehouse on a schedule. That works, but it's expensive to build and maintain. You need engineers to write the pipelines, monitor them, handle schema changes, and debug failures when something breaks. For a team under 20 people, a full data warehouse pipeline is often overkill.
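The manual join step in that workflow is usually a short throwaway script. A minimal sketch of what it looks like, using inline sample data in place of the two exported CSV files (all column names here are hypothetical):

```python
import csv
import io
from collections import defaultdict

# In practice these would be the files exported from each database;
# inline samples keep the sketch self-contained.
users_csv = io.StringIO("user_id,plan\n1,free\n2,pro\n3,pro\n")
events_csv = io.StringIO(
    "user_id,event_name\n1,export_used\n2,export_used\n2,export_used\n"
)

# Count events per user from the analytics export.
event_counts = defaultdict(int)
for row in csv.DictReader(events_csv):
    event_counts[row["user_id"]] += 1

# Join on user_id against the production export and aggregate by plan.
stats = defaultdict(lambda: {"users": 0, "events": 0})
for row in csv.DictReader(users_csv):
    bucket = stats[row["plan"]]
    bucket["users"] += 1
    bucket["events"] += event_counts.get(row["user_id"], 0)

print(dict(stats))
# free: 1 user / 1 event; pro: 2 users / 2 events
```

Every such script re-implements the same join-and-aggregate logic, and each one goes stale the moment either export does.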

    The other approach is to live with the fragmentation and answer questions database by database, mentally combining the results. This works until the questions get more complicated, and business questions always get more complicated.

    When You Actually Need Cross-Database Queries

    Not every question requires data from multiple databases. Before building any kind of multi-database query infrastructure, it's worth asking which questions you're actually trying to answer.

    Questions that typically require a single database:

  • "Show me revenue last 30 days" (production DB)
  • "Which users signed up this week?" (production DB)
  • "What's our average ticket size?" (production DB)

    Questions that typically require multiple databases:

  • "What do users who completed onboarding event X look like in terms of revenue?" (events in BigQuery, revenue in production DB)
  • "Which customers from the acquired company are also using our main product?" (legacy MySQL + production PostgreSQL)
  • "What's conversion rate by ad campaign?" (analytics events in BigQuery, conversions in production DB)
  • "Show me churn rate for users who used feature Y at least 3 times" (feature usage in events DB, subscription state in production DB)

    If your questions are mostly in the first category, you probably don't need multi-database queries right now. If they're in the second category and you're answering them manually today, that's the problem worth solving.

    How Natural Language Interfaces Handle Multi-Database Connections

    A natural language interface for databases works by:

  • Taking your question in plain English
  • Identifying which tables and columns are relevant
  • Generating SQL that retrieves the data
  • Running the query and formatting the results

    For a single database, step 2 is straightforward: the system looks at the schema of the connected database to understand what's available. For multiple databases, it needs to know the schemas of all connected databases and understand which ones to use for a given question.

    The practical implementation varies by tool. Some require you to specify which database to query for each question. Better implementations can infer from context: if you ask "show me revenue by user acquisition channel," the system should recognize that revenue lives in your production database and acquisition channel lives in your events database, and generate queries for both.

    What makes this significantly easier for users is that you don't need to know which database holds which data. You ask the question; the tool figures out where the data lives.
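The routing step can be pictured as matching the question against each connection's schema. This is only a toy sketch: real tools hand the full schemas to a language model rather than doing token overlap, and every connection, table, and column name below is hypothetical.

```python
# Hypothetical registry of connected databases and their schemas.
schemas = {
    "production_pg": {
        "users": ["id", "plan", "signup_source"],
        "subscriptions": ["user_id", "monthly_amount"],
    },
    "bigquery_events": {
        "events": ["user_id", "event_name", "created_at"],
        "user_attribution": ["user_id", "first_touch_source"],
    },
}

def route(question: str) -> set[str]:
    """Return the connections whose schema mentions a token of the question."""
    words = set(question.lower().replace("?", "").split())
    hits = set()
    for conn, tables in schemas.items():
        for table, columns in tables.items():
            if words & ({table} | set(columns)):
                hits.add(conn)
    return hits

print(route("show plan by signup_source and event_name"))
# matches both connections (order may vary)
```

The point is only that schema awareness is what lets the tool decide where to send each part of a question; the user never has to.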

    AI for Database lets you connect multiple databases (PostgreSQL, MySQL, Supabase, BigQuery, MongoDB, and others) and then ask questions that span all of them. The system knows the schema of each connection and can pull from multiple sources in the same workflow.

    Practical Examples of Queries That Span Multiple Databases

    Here's what these cross-database questions look like in practice, with the SQL they'd generate against separate systems.

    Example 1: Conversion rate by traffic source

    Question: "What's the free-to-paid conversion rate by signup source this month?"

    Against the production database (PostgreSQL):

    SELECT
      signup_source,
      COUNT(*) AS total_signups,
      COUNT(CASE WHEN plan != 'free' THEN 1 END) AS paid_conversions,
      ROUND(
        100.0 * COUNT(CASE WHEN plan != 'free' THEN 1 END) / COUNT(*),
        2
      ) AS conversion_rate
    FROM users
    WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY signup_source
    ORDER BY conversion_rate DESC;

    If attribution data lives in BigQuery:

    SELECT
      user_id,
      first_touch_source,
      first_touch_campaign
    FROM user_attribution
    WHERE signup_date >= DATE_TRUNC(CURRENT_DATE(), MONTH)

    The result joins on user_id to combine attribution data with conversion status from your production database.
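That combining step is an in-memory join on user_id. A sketch with made-up rows shaped like the two result sets above:

```python
# Rows shaped like the two queries above would return (values are made up).
production_rows = [
    {"user_id": 1, "plan": "pro"},
    {"user_id": 2, "plan": "free"},
    {"user_id": 3, "plan": "pro"},
]
attribution_rows = [
    {"user_id": 1, "first_touch_source": "google"},
    {"user_id": 2, "first_touch_source": "google"},
    {"user_id": 3, "first_touch_source": "newsletter"},
]

# Index one side by the join key, then walk the other.
attribution = {r["user_id"]: r for r in attribution_rows}
by_source: dict[str, list[bool]] = {}
for row in production_rows:
    source = attribution[row["user_id"]]["first_touch_source"]
    by_source.setdefault(source, []).append(row["plan"] != "free")

conversion = {s: sum(flags) / len(flags) for s, flags in by_source.items()}
print(conversion)
# google converts at 0.5, newsletter at 1.0 in this sample
```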

    Example 2: Feature usage and revenue correlation

    Question: "Do users who used the export feature more than 5 times have higher average MRR?"

    Against the events database:

    SELECT
      user_id,
      COUNT(*) AS export_event_count
    FROM events
    WHERE event_name = 'export_used'
      AND created_at >= NOW() - INTERVAL '90 days'
    GROUP BY user_id
    HAVING COUNT(*) > 5

    Against the production database:

    SELECT
      user_id,
      monthly_amount
    FROM subscriptions
    WHERE status = 'active'

    Join on user_id to compare MRR for heavy exporters vs everyone else.
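The comparison itself is a one-liner once both result sets are in memory. A sketch with hypothetical values shaped like the two queries above:

```python
from statistics import mean

# Hypothetical result sets shaped like the two queries above.
heavy_exporters = {101, 103}  # user_ids with more than 5 export events
subscriptions = [
    {"user_id": 101, "monthly_amount": 99},
    {"user_id": 102, "monthly_amount": 29},
    {"user_id": 103, "monthly_amount": 49},
    {"user_id": 104, "monthly_amount": 29},
]

heavy = [s["monthly_amount"] for s in subscriptions if s["user_id"] in heavy_exporters]
rest = [s["monthly_amount"] for s in subscriptions if s["user_id"] not in heavy_exporters]
print(mean(heavy), mean(rest))
# heavy exporters average 74 vs 29 for everyone else in this sample
```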

    Example 3: Legacy customer overlap

    If you acquired a company and need to know which of their customers are also your customers:

    Against legacy MySQL:

    SELECT email, company_name, legacy_plan
    FROM legacy_customers
    WHERE status = 'active'

    Against production PostgreSQL:

    SELECT email, current_plan, mrr
    FROM customers
    WHERE active = true

    Join on email to find the overlap and understand cross-sell opportunity.
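Since email is the shared key here, the overlap is a set intersection. One wrinkle worth handling: the two systems may store emails with different casing, so normalize before comparing (all addresses below are made up):

```python
# Hypothetical rows from each query, reduced to their email columns.
legacy = {"Ana@Example.com", "bo@example.com", "cy@example.com"}
current = {"ana@example.com", "dee@example.com"}

# Lowercase both sides before intersecting, or the match silently fails.
overlap = {e.lower() for e in legacy} & {e.lower() for e in current}
print(overlap)  # {'ana@example.com'}
```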

    Setting Up Multi-Database Connections in AI for Database

    The setup process in AI for Database is straightforward. You add each database as a separate connection, providing the connection string and credentials for each. The system then knows about all connected databases and can query across them.

    For a typical setup:

  • Connect your production PostgreSQL this holds users, subscriptions, transactions
  • Connect your Supabase project if you use it for a specific product feature
  • Connect BigQuery for analytics events if you stream them there

    Once all connections are active, you can ask questions that span all three without specifying which database to use. Ask "which user cohort from last quarter has the highest 90-day retention?" and the system pulls subscription data from PostgreSQL and event data from BigQuery to give you a combined answer.

    The dashboard feature means you can build a single view that pulls metrics from multiple sources simultaneously, with no manual merging required. Set it to auto-refresh and it stays current without any action on your part.

    Common Problems When Querying Across Multiple Databases

    A few things to watch for when working with data from multiple sources:

    Clock skew and timezone differences: one database might store timestamps in UTC, another in the server's local timezone. Always normalize to UTC before comparing dates across databases.

    -- Normalize to UTC when querying PostgreSQL
    SELECT * FROM events
    WHERE created_at AT TIME ZONE 'UTC' >= '2026-03-01T00:00:00Z';

    ID mismatches: user IDs might be integers in one database and UUIDs in another. If you're joining on user identifiers, make sure they're the same type (or use an email or other shared identifier).
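A mismatched key type is easy to miss because the join doesn't error, it just returns nothing. A sketch of the defensive cast, with made-up rows where one side returns integer IDs and the other strings:

```python
# One side returns integer IDs, the other strings (e.g. from a JSON API).
pg_rows = [{"user_id": 42, "mrr": 99}]
events_rows = [{"user_id": "42", "exports": 7}]

# Cast both sides to str before indexing; without this the join is empty.
index = {str(r["user_id"]): r for r in pg_rows}
joined = [
    {**index[str(e["user_id"])], **e}
    for e in events_rows
    if str(e["user_id"]) in index
]
print(joined)  # one merged row instead of a silently empty result
```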

    Latency differences: querying a BigQuery dataset with 500 million rows takes longer than querying a PostgreSQL table with 50,000 rows. For dashboards that pull from both, set expectations on refresh time accordingly.

    Schema drift: if a column is renamed in one database, queries that reference it will break. Good multi-database tooling should flag when a referenced column no longer exists rather than returning empty results silently.

    Ready to try AI for Database?

    Query your database in plain English. No SQL required. Start free today.