Most growing companies don't have one database; they have several. A production PostgreSQL database, a Supabase project for a newer feature, a BigQuery dataset loaded with analytics events, maybe a MySQL instance running behind a legacy service. Each one holds a piece of the picture.
The problem is that asking a question that spans multiple databases traditionally means writing SQL against each one separately, then combining the results manually in a spreadsheet. It's slow, error-prone, and requires someone who knows the schema of each system.
Natural language interfaces have started changing this. This article explains the practical side: why multi-database querying is hard, when you actually need it, and how to do it without writing any SQL yourself.
Why Data Ends Up Scattered Across Multiple Databases
It's almost never a deliberate choice. Data ends up in multiple databases because of how companies grow: the production database comes first, an analytics dataset like BigQuery gets added for event data, a newer feature ships on a managed platform like Supabase, and a legacy service or acquisition brings its own MySQL instance along.
None of these decisions were wrong. But the result is that answering questions like "what's our revenue per user segment this month?" requires pulling from the production database for transactions, BigQuery for event data, and maybe Supabase for user profile details.
The Traditional Approach and Why It Breaks Down
Before tools existed to help with this, the standard workflow was to write SQL against each database separately, export the results, and combine them by hand in a spreadsheet.
Some companies built proper ETL pipelines to solve this: copying data from all sources into a central warehouse on a schedule. That works, but it's expensive to build and maintain. You need engineers to write the pipelines, monitor them, handle schema changes, and debug failures when something breaks. For a team under 20 people, a full data warehouse pipeline is often overkill.
The other approach is to live with the fragmentation and answer questions database by database, mentally combining results. This works until the questions get more complicated, and business questions always get more complicated.
When You Actually Need Cross-Database Queries
Not every question requires data from multiple databases. Before building any kind of multi-database query infrastructure, it's worth asking which questions you're actually trying to answer.
Questions that typically require a single database: how many users signed up this month, what current MRR is, which subscriptions churned last week. Each of these can be answered from one system's tables.
Questions that typically require multiple databases: conversion rate by traffic source when attribution lives in an events warehouse, whether heavy users of a feature pay more, which customers from an acquired company overlap with your own. Each of these joins data that lives in different systems.
If your questions are mostly in the first category, you probably don't need multi-database queries right now. If they're in the second category and you're answering them manually today, that's the problem worth solving.
How Natural Language Interfaces Handle Multi-Database Connections
A natural language interface for databases works by:
For a single database, step 2 is straightforward: the system looks at the schema of the connected database to understand what's available. For multiple databases, it needs to know the schemas of all connected databases and understand which ones to use for a given question.
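As an illustrative sketch (not any specific tool's internals), schema discovery plus routing can be as simple as collecting table and column names from each connection and matching a question's terms against them. The SQLite connections, connection names, and matching logic below are all hypothetical stand-ins:

```python
import sqlite3

def get_schema(conn):
    """Return {table: [columns]} for a SQLite connection (stand-in for any database)."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [c[1] for c in cols]  # column name is field 1
    return schema

def route_question(question, schemas):
    """Pick the connections whose schemas mention a term from the question."""
    terms = question.lower().split()
    hits = set()
    for name, schema in schemas.items():
        for table, cols in schema.items():
            if table in terms or any(c in terms for c in cols):
                hits.add(name)
    return hits

# Two in-memory databases standing in for production and analytics.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER, signup_source TEXT, plan TEXT)")
events = sqlite3.connect(":memory:")
events.execute("CREATE TABLE events (user_id INTEGER, event_name TEXT)")

schemas = {"production": get_schema(prod), "analytics": get_schema(events)}
print(route_question("show plan by signup_source", schemas))  # {'production'}
```

Real implementations do far more (synonym matching, sampling data, ranking candidates), but the core idea is the same: introspect every connection once, then route each question to the schemas that can answer it.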
The practical implementation varies by tool. Some require you to specify which database to query for each question. Better implementations can infer from context: if you ask "show me revenue by user acquisition channel," the system should recognize that revenue lives in your production database and acquisition channel lives in your events database, and generate queries for both.
What makes this significantly easier for users is that you don't need to know which database holds which data. You ask the question; the tool figures out where the data lives.
AI for Database lets you connect multiple databases (PostgreSQL, MySQL, Supabase, BigQuery, MongoDB, and others) and then ask questions that span all of them. The system knows the schema of each connection and can pull from multiple sources in the same workflow.
Practical Examples of Queries That Span Multiple Databases
Here's what these cross-database questions look like in practice, with the SQL they'd generate against separate systems.
Example 1: Conversion rate by traffic source
Question: "What's the free-to-paid conversion rate by signup source this month?"
Against the production database (PostgreSQL):
SELECT
signup_source,
COUNT(*) AS total_signups,
COUNT(CASE WHEN plan != 'free' THEN 1 END) AS paid_conversions,
ROUND(
100.0 * COUNT(CASE WHEN plan != 'free' THEN 1 END) / COUNT(*),
2
) AS conversion_rate
FROM users
WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY signup_source
ORDER BY conversion_rate DESC;
If attribution data lives in BigQuery:
SELECT
user_id,
first_touch_source,
first_touch_campaign
FROM user_attribution
WHERE signup_date >= DATE_TRUNC(CURRENT_DATE(), MONTH)
The result joins on user_id to combine attribution data with conversion status from your production database.
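The client-side join that combines the two result sets can be sketched in a few lines. The column layout follows the queries above; the row data is made up for illustration:

```python
# Rows as returned from each database (illustrative data).
production_rows = [  # (user_id, plan) — from PostgreSQL
    (1, "pro"), (2, "pro"), (3, "pro"), (4, "free"),
]
attribution_rows = [  # (user_id, first_touch_source) — from BigQuery
    (1, "google"), (2, "google"), (3, "newsletter"), (4, "newsletter"),
]

plan_by_user = dict(production_rows)

# Join on user_id, then tally (total, paid) signups per source.
by_source = {}
for user_id, source in attribution_rows:
    total, paid = by_source.get(source, (0, 0))
    converted = plan_by_user.get(user_id) not in (None, "free")
    by_source[source] = (total + 1, paid + int(converted))

for source, (total, paid) in sorted(by_source.items()):
    print(f"{source}: {100 * paid / total:.1f}%")  # google: 100.0%, newsletter: 50.0%
```

A natural language tool does this merge for you, but it's the same operation: one lookup table keyed on the shared identifier, one pass over the other result set.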
Example 2: Feature usage and revenue correlation
Question: "Do users who used the export feature more than 5 times have higher average MRR?"
Against the events database:
SELECT
user_id,
COUNT(*) AS export_event_count
FROM events
WHERE event_name = 'export_used'
AND created_at >= NOW() - INTERVAL '90 days'
GROUP BY user_id
HAVING COUNT(*) > 5
Against the production database:
SELECT
user_id,
monthly_amount
FROM subscriptions
WHERE status = 'active'
Join on user_id to compare MRR for heavy exporters vs everyone else.
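A minimal sketch of that comparison, joining the two result sets in application code (user IDs and amounts are illustrative):

```python
# From the events database: user_ids with more than 5 export events.
heavy_exporters = {101, 103}

# From the production database: (user_id, monthly_amount) for active subscriptions.
subscriptions = [(101, 99.0), (102, 29.0), (103, 99.0), (104, 29.0)]

def avg(values):
    """Average of a list, 0.0 when empty."""
    return sum(values) / len(values) if values else 0.0

heavy = [amt for uid, amt in subscriptions if uid in heavy_exporters]
rest = [amt for uid, amt in subscriptions if uid not in heavy_exporters]

print(f"heavy exporters: ${avg(heavy):.2f} avg MRR")  # $99.00
print(f"everyone else:   ${avg(rest):.2f} avg MRR")   # $29.00
```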
Example 3: Legacy customer overlap
If you acquired a company and need to know which of their customers are also your customers:
Against legacy MySQL:
SELECT email, company_name, legacy_plan
FROM legacy_customers
WHERE status = 'active'
Against production PostgreSQL:
SELECT email, current_plan, mrr
FROM customers
WHERE active = true
Join on email to find the overlap and understand cross-sell opportunity.
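Sketching that overlap join in application code, with illustrative rows; normalizing email case before joining avoids missing matches between systems that store addresses differently:

```python
# From legacy MySQL: (email, company_name, legacy_plan)
legacy = [("Ann@acme.com", "Acme", "gold"), ("bo@beta.io", "Beta", "silver")]
# From production PostgreSQL: (email, current_plan, mrr)
production = [("ann@acme.com", "pro", 99.0), ("cy@cyan.dev", "free", 0.0)]

# Index production customers by lowercased email, then intersect.
prod_by_email = {email.lower(): (plan, mrr) for email, plan, mrr in production}
overlap = [
    (email.lower(), legacy_plan, *prod_by_email[email.lower()])
    for email, _, legacy_plan in legacy
    if email.lower() in prod_by_email
]
print(overlap)  # [('ann@acme.com', 'gold', 'pro', 99.0)]
```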
Setting Up Multi-Database Connections in AI for Database
The setup process in AI for Database is straightforward. You add each database as a separate connection, providing the connection string and credentials for each. The system then knows about all connected databases and can query across them.
For a typical setup, you might add three connections: the production PostgreSQL database, the BigQuery analytics dataset, and the Supabase project.
Once all connections are active, you can ask questions that span all three without specifying which database to use. Ask "which user cohort from last quarter has the highest 90-day retention?" and the system pulls subscription data from PostgreSQL and event data from BigQuery to give you a combined answer.
The dashboard feature means you can build a single view that pulls metrics from multiple sources simultaneously, with no manual merging required. Set it to auto-refresh and it stays current without any action on your part.
Common Problems When Querying Across Multiple Databases
A few things to watch for when working with data from multiple sources:
Clock skew and timezone differences: one database might store timestamps in UTC, another in the server's local timezone. Always normalize to UTC before comparing dates across databases.
-- Normalize to UTC when querying PostgreSQL
SELECT * FROM events
WHERE created_at AT TIME ZONE 'UTC' >= '2026-03-01T00:00:00Z';
ID mismatches: user IDs might be integers in one database and UUIDs in another. If you're joining on user identifiers, make sure they're the same type (or use an email or other shared identifier).
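One defensive habit for the type-mismatch problem is to normalize every identifier to a string before joining. A minimal sketch, with made-up IDs (not specific to any tool):

```python
import uuid

# IDs as they might come back from two systems: integers vs. strings/UUIDs.
events_ids = [42, 43]
profile_ids = ["42", str(uuid.uuid4())]

def key(value):
    """Normalize any identifier to a lowercase string for joining."""
    return str(value).lower()

# Without key(), 42 != "42" and the join silently drops the match.
shared = {key(i) for i in events_ids} & {key(i) for i in profile_ids}
print(shared)  # {'42'}
```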
Latency differences: querying a BigQuery dataset with 500 million rows takes longer than querying a PostgreSQL table with 50,000 rows. For dashboards that pull from both, set expectations on refresh time accordingly.
Schema drift: if a column is renamed in one database, queries that reference it will break. Good multi-database tooling should flag when a referenced column no longer exists rather than returning empty results silently.
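That flagging can be approximated with a pre-flight check that compares a query's referenced columns against the live schema. A sketch using SQLite's introspection as a stand-in for any database's catalog; the saved-query column list is hypothetical:

```python
import sqlite3

def missing_columns(conn, table, referenced):
    """Return referenced columns that no longer exist on the table."""
    live = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return sorted(set(referenced) - live)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, current_plan TEXT)")

# A saved query still references the old column name 'plan'.
print(missing_columns(conn, "customers", ["email", "plan"]))  # ['plan']
```

Running this before executing a saved query turns a silent empty result into an explicit "column renamed or dropped" warning.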