
How to Query Multiple Databases at Once With Natural Language


Dr. Elena Vasquez · AI Research Lead · March 29, 2026 · 9 min read

Most growing companies don't have one database; they have several. A production PostgreSQL database, a Supabase project for a newer feature, a BigQuery dataset loaded with analytics events, maybe a MySQL instance running behind a legacy service. Each one holds a piece of the picture.

The problem is that asking a question that spans multiple databases traditionally means writing SQL against each one separately, then combining the results manually in a spreadsheet. It's slow, error-prone, and requires someone who knows the schema of each system.

Natural language interfaces have started changing this. This article explains the practical side: why multi-database querying is hard, when you actually need it, and how to do it without writing any SQL yourself.

Why Data Ends Up Scattered Across Multiple Databases

It's almost never a deliberate choice. Data ends up in multiple databases because of how companies grow:

  • You started with a single PostgreSQL instance, then adopted Supabase for a new project because of its built-in auth and real-time features.
  • A legacy acquisition brought along its own MySQL database that you haven't migrated yet.
  • You put analytics events in BigQuery because it handles high-volume append-only data better than PostgreSQL.
  • Your data warehouse (Snowflake, Redshift) is the "source of truth" for historical data, but your production database has the freshest records.

None of these decisions were wrong. But the result is that answering questions like "what's our revenue per user segment this month?" requires pulling from the production database for transactions, BigQuery for event data, and maybe Supabase for user profile details.

    The Traditional Approach and Why It Breaks Down

    Before tools existed to help with this, the standard workflow was:

  • Export data from database A as a CSV
  • Export data from database B as a CSV
  • Load both into a spreadsheet or a Python script
  • Join and aggregate manually
  • Present the result as if it's authoritative, knowing it's already slightly stale

    Some companies built proper ETL pipelines to solve this: copying data from all sources into a central warehouse on a schedule. That works, but it's expensive to build and maintain. You need engineers to write the pipelines, monitor them, handle schema changes, and debug failures when something breaks. For a team under 20 people, a full data warehouse pipeline is often overkill.
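The manual join step in that workflow is usually a short throwaway script. A minimal sketch of what it looks like, using inline sample data in place of the two exported CSV files (all column names here are hypothetical):

```python
import csv
import io
from collections import defaultdict

# In practice these would be the files exported from each database;
# inline samples keep the sketch self-contained.
users_csv = io.StringIO("user_id,plan\n1,free\n2,pro\n3,pro\n")
events_csv = io.StringIO(
    "user_id,event_name\n1,export_used\n2,export_used\n2,export_used\n"
)

# Count events per user from the analytics export.
event_counts = defaultdict(int)
for row in csv.DictReader(events_csv):
    event_counts[row["user_id"]] += 1

# Join on user_id against the production export and aggregate by plan.
stats = defaultdict(lambda: {"users": 0, "events": 0})
for row in csv.DictReader(users_csv):
    bucket = stats[row["plan"]]
    bucket["users"] += 1
    bucket["events"] += event_counts.get(row["user_id"], 0)

print(dict(stats))
# free: 1 user / 1 event; pro: 2 users / 2 events
```

Every such script re-implements the same join-and-aggregate logic, and each one goes stale the moment either export does.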

    The other approach is to live with the fragmentation and answer questions database by database, mentally combining the results. This works until the questions get more complicated, and business questions always get more complicated.

    When You Actually Need Cross-Database Queries

    Not every question requires data from multiple databases. Before building any kind of multi-database query infrastructure, it's worth asking which questions you're actually trying to answer.

    Questions that typically require a single database:

  • "Show me revenue last 30 days" (production DB)
  • "Which users signed up this week?" (production DB)
  • "What's our average ticket size?" (production DB)

    Questions that typically require multiple databases:

  • "What do users who completed onboarding event X look like in terms of revenue?" (events in BigQuery, revenue in production DB)
  • "Which customers from the acquired company are also using our main product?" (legacy MySQL + production PostgreSQL)
  • "What's conversion rate by ad campaign?" (analytics events in BigQuery, conversions in production DB)
  • "Show me churn rate for users who used feature Y at least 3 times" (feature usage in events DB, subscription state in production DB)

    If your questions are mostly in the first category, you probably don't need multi-database queries right now. If they're in the second category and you're answering them manually today, that's the problem worth solving.

    How Natural Language Interfaces Handle Multi-Database Connections

    A natural language interface for databases works by:

  • Taking your question in plain English
  • Identifying which tables and columns are relevant
  • Generating SQL that retrieves the data
  • Running the query and formatting the results

    For a single database, step 2 is straightforward: the system looks at the schema of the connected database to understand what's available. For multiple databases, it needs to know the schemas of all connected databases and understand which ones to use for a given question.

    The practical implementation varies by tool. Some require you to specify which database to query for each question. Better implementations can infer from context: if you ask "show me revenue by user acquisition channel," the system should recognize that revenue lives in your production database and acquisition channel lives in your events database, and generate queries for both.

    What makes this significantly easier for users is that you don't need to know which database holds which data. You ask the question; the tool figures out where the data lives.
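The routing step can be pictured as matching the question against each connection's schema. This is only a toy sketch: real tools hand the full schemas to a language model rather than doing token overlap, and every connection, table, and column name below is hypothetical.

```python
# Hypothetical registry of connected databases and their schemas.
schemas = {
    "production_pg": {
        "users": ["id", "plan", "signup_source"],
        "subscriptions": ["user_id", "monthly_amount"],
    },
    "bigquery_events": {
        "events": ["user_id", "event_name", "created_at"],
        "user_attribution": ["user_id", "first_touch_source"],
    },
}

def route(question: str) -> set[str]:
    """Return the connections whose schema mentions a token of the question."""
    words = set(question.lower().replace("?", "").split())
    hits = set()
    for conn, tables in schemas.items():
        for table, columns in tables.items():
            if words & ({table} | set(columns)):
                hits.add(conn)
    return hits

print(route("show plan by signup_source and event_name"))
# matches both connections (order may vary)
```

The point is only that schema awareness is what lets the tool decide where to send each part of a question; the user never has to.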

    AI for Database lets you connect multiple databases (PostgreSQL, MySQL, Supabase, BigQuery, MongoDB, and others) and then ask questions that span all of them. The system knows the schema of each connection and can pull from multiple sources in the same workflow.

    Practical Examples of Queries That Span Multiple Databases

    Here's what these cross-database questions look like in practice, with the SQL they'd generate against separate systems.

    Example 1: Conversion rate by traffic source

    Question: "What's the free-to-paid conversion rate by signup source this month?"

    Against the production database (PostgreSQL):

    SELECT
      signup_source,
      COUNT(*) AS total_signups,
      COUNT(CASE WHEN plan != 'free' THEN 1 END) AS paid_conversions,
      ROUND(
        100.0 * COUNT(CASE WHEN plan != 'free' THEN 1 END) / COUNT(*),
        2
      ) AS conversion_rate
    FROM users
    WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY signup_source
    ORDER BY conversion_rate DESC;

    If attribution data lives in BigQuery:

    SELECT
      user_id,
      first_touch_source,
      first_touch_campaign
    FROM user_attribution
    WHERE signup_date >= DATE_TRUNC(CURRENT_DATE(), MONTH)

    The result joins on user_id to combine attribution data with conversion status from your production database.
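That combining step is an in-memory join on user_id. A sketch with made-up rows shaped like the two result sets above:

```python
# Rows shaped like the two queries above would return (values are made up).
production_rows = [
    {"user_id": 1, "plan": "pro"},
    {"user_id": 2, "plan": "free"},
    {"user_id": 3, "plan": "pro"},
]
attribution_rows = [
    {"user_id": 1, "first_touch_source": "google"},
    {"user_id": 2, "first_touch_source": "google"},
    {"user_id": 3, "first_touch_source": "newsletter"},
]

# Index one side by the join key, then walk the other.
attribution = {r["user_id"]: r for r in attribution_rows}
by_source: dict[str, list[bool]] = {}
for row in production_rows:
    source = attribution[row["user_id"]]["first_touch_source"]
    by_source.setdefault(source, []).append(row["plan"] != "free")

conversion = {s: sum(flags) / len(flags) for s, flags in by_source.items()}
print(conversion)
# google converts at 0.5, newsletter at 1.0 in this sample
```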

    Example 2: Feature usage and revenue correlation

    Question: "Do users who used the export feature more than 5 times have higher average MRR?"

    Against the events database:

    SELECT
      user_id,
      COUNT(*) AS export_event_count
    FROM events
    WHERE event_name = 'export_used'
      AND created_at >= NOW() - INTERVAL '90 days'
    GROUP BY user_id
    HAVING COUNT(*) > 5

    Against the production database:

    SELECT
      user_id,
      monthly_amount
    FROM subscriptions
    WHERE status = 'active'

    Join on user_id to compare MRR for heavy exporters vs everyone else.
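The comparison itself is a one-liner once both result sets are in memory. A sketch with hypothetical values shaped like the two queries above:

```python
from statistics import mean

# Hypothetical result sets shaped like the two queries above.
heavy_exporters = {101, 103}  # user_ids with more than 5 export events
subscriptions = [
    {"user_id": 101, "monthly_amount": 99},
    {"user_id": 102, "monthly_amount": 29},
    {"user_id": 103, "monthly_amount": 49},
    {"user_id": 104, "monthly_amount": 29},
]

heavy = [s["monthly_amount"] for s in subscriptions if s["user_id"] in heavy_exporters]
rest = [s["monthly_amount"] for s in subscriptions if s["user_id"] not in heavy_exporters]
print(mean(heavy), mean(rest))
# heavy exporters average 74 vs 29 for everyone else in this sample
```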

    Example 3: Legacy customer overlap

    If you acquired a company and need to know which of their customers are also your customers:

    Against legacy MySQL:

    SELECT email, company_name, legacy_plan
    FROM legacy_customers
    WHERE status = 'active'

    Against production PostgreSQL:

    SELECT email, current_plan, mrr
    FROM customers
    WHERE active = true

    Join on email to find the overlap and understand cross-sell opportunity.
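Since email is the shared key here, the overlap is a set intersection. One wrinkle worth handling: the two systems may store emails with different casing, so normalize before comparing (all addresses below are made up):

```python
# Hypothetical rows from each query, reduced to their email columns.
legacy = {"Ana@Example.com", "bo@example.com", "cy@example.com"}
current = {"ana@example.com", "dee@example.com"}

# Lowercase both sides before intersecting, or the match silently fails.
overlap = {e.lower() for e in legacy} & {e.lower() for e in current}
print(overlap)  # {'ana@example.com'}
```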

    Setting Up Multi-Database Connections in AI for Database

    The setup process in AI for Database is straightforward. You add each database as a separate connection, providing the connection string and credentials for each. The system then knows about all connected databases and can query across them.

    For a typical setup:

  • Connect your production PostgreSQL this holds users, subscriptions, transactions
  • Connect your Supabase project if you use it for a specific product feature
  • Connect BigQuery for analytics events if you stream them there

    Once all connections are active, you can ask questions that span all three without specifying which database to use. Ask "which user cohort from last quarter has the highest 90-day retention?" and the system pulls subscription data from PostgreSQL and event data from BigQuery to give you a combined answer.

    The dashboard feature means you can build a single view that pulls metrics from multiple sources simultaneously, with no manual merging required. Set it to auto-refresh and it stays current without any action on your part.

    Common Problems When Querying Across Multiple Databases

    A few things to watch for when working with data from multiple sources:

    Clock skew and timezone differences: one database might store timestamps in UTC, another in the server's local timezone. Always normalize to UTC before comparing dates across databases.

    -- Normalize to UTC when querying PostgreSQL
    SELECT * FROM events
    WHERE created_at AT TIME ZONE 'UTC' >= '2026-03-01T00:00:00Z';

    ID mismatches: user IDs might be integers in one database and UUIDs in another. If you're joining on user identifiers, make sure they're the same type (or use an email or other shared identifier).
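A mismatched key type is easy to miss because the join doesn't error, it just returns nothing. A sketch of the defensive cast, with made-up rows where one side returns integer IDs and the other strings:

```python
# One side returns integer IDs, the other strings (e.g. from a JSON API).
pg_rows = [{"user_id": 42, "mrr": 99}]
events_rows = [{"user_id": "42", "exports": 7}]

# Cast both sides to str before indexing; without this the join is empty.
index = {str(r["user_id"]): r for r in pg_rows}
joined = [
    {**index[str(e["user_id"])], **e}
    for e in events_rows
    if str(e["user_id"]) in index
]
print(joined)  # one merged row instead of a silently empty result
```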

    Latency differences: querying a BigQuery dataset with 500 million rows takes longer than querying a PostgreSQL table with 50,000 rows. For dashboards that pull from both, set expectations on refresh time accordingly.

    Schema drift: if a column is renamed in one database, queries that reference it will break. Good multi-database tooling should flag when a referenced column no longer exists rather than returning empty results silently.

    Ready to try AI for Database?

    Query your database in plain English. No SQL required. Start free today.