Database compliance sounds like a problem for enterprises with dedicated legal and security teams. In practice, it hits small teams hardest — a 12-person SaaS startup is expected to meet the same GDPR requirements as a 5,000-person company, with a fraction of the resources.
The good news: most of what compliance actually requires is not that complicated. The hard part is knowing which requirements apply to you, understanding what your database needs to do to satisfy them, and keeping evidence that you're actually doing it.
This guide covers the three frameworks that come up most often for growing SaaS and data-driven companies — GDPR, SOC 2, and HIPAA — and explains what they mean in concrete, database-level terms.
Which Frameworks Apply to You?
Before worrying about what to implement, figure out what you're actually required to comply with.
GDPR (General Data Protection Regulation) applies if you process personal data of people in the European Union — regardless of where your company is based. If a user in Germany signs up for your product, GDPR applies. Personal data is broadly defined: names, email addresses, IP addresses, and any identifier that can be linked to a specific person.
HIPAA (Health Insurance Portability and Accountability Act) applies if you handle Protected Health Information (PHI) in the US, either as a covered entity (such as a healthcare provider or insurer) or as a business associate handling PHI on a covered entity's behalf. PHI includes medical records, diagnoses, treatment data, and anything that links health information to an identifiable individual. If you're building healthcare software, a medical practice management tool, or anything that stores patient data for such customers, HIPAA almost certainly applies.
SOC 2 is different — it's not a legal requirement but a voluntary audit standard. Customers (especially enterprise buyers) increasingly require a SOC 2 Type II report before signing contracts. It covers security, availability, processing integrity, confidentiality, and privacy across your whole system, but your database is a central piece of the audit evidence.
You may need all three, two of them, or just one. If you're a general SaaS product serving EU customers, GDPR is the baseline. If you want to sell to mid-market or enterprise, SOC 2 matters. If you're in healthcare, HIPAA is non-negotiable.
GDPR: What Your Database Actually Needs to Do
GDPR is primarily about data subject rights and data minimisation. Here's what that means in database terms.
Know what personal data you store and where
You need a data map: a record of what personal data you store, which tables it lives in, what you use it for, and how long you keep it. This sounds bureaucratic, but it's genuinely useful. Most teams have personal data scattered across tables they've forgotten about.
A basic audit query for a PostgreSQL database:
-- Find tables likely to contain personal data
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND (
       column_name ILIKE '%email%'
    OR column_name ILIKE '%phone%'
    OR column_name ILIKE '%address%'
    OR column_name ILIKE '%name%'
    OR column_name ILIKE '%ip%'
  )
ORDER BY table_name, column_name;

This won't catch everything, and broad patterns such as '%ip%' and '%name%' will also match unrelated columns (zip_code, description, filename), but it's a starting point to identify where personal data lives.
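The results of a query like this can be recorded in a small inventory table so that the data map lives next to the data. The sketch below is one possible shape, not a prescribed schema: the table name `data_map`, its columns, and the sample row are all hypothetical.

```sql
-- Hypothetical data map: one row per column that holds personal data
CREATE TABLE IF NOT EXISTS data_map (
    table_name    text NOT NULL,
    column_name   text NOT NULL,
    data_category text NOT NULL,   -- e.g. 'contact', 'network identifier'
    purpose       text NOT NULL,   -- why you hold it
    retention     interval,        -- NULL = not yet decided
    reviewed_at   date NOT NULL DEFAULT CURRENT_DATE,
    PRIMARY KEY (table_name, column_name)
);

-- Sample entry (illustrative values only)
INSERT INTO data_map (table_name, column_name, data_category, purpose, retention)
VALUES ('users', 'email', 'contact', 'login and notifications', INTERVAL '3 years')
ON CONFLICT (table_name, column_name) DO NOTHING;

-- Columns that look personal but are not yet recorded in the map
SELECT c.table_name, c.column_name
FROM information_schema.columns c
LEFT JOIN data_map m
       ON m.table_name = c.table_name
      AND m.column_name = c.column_name
WHERE c.table_schema = 'public'
  AND (c.column_name ILIKE '%email%' OR c.column_name ILIKE '%phone%')
  AND m.table_name IS NULL
ORDER BY c.table_name, c.column_name;
```

Re-running the last query after each schema migration is a cheap way to notice new personal-data columns before they are forgotten.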
Support data subject requests (DSARs)
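GDPR gives individuals the right to request their data (Subject Access Request), correct it, or have it deleted (Right to Erasure, also called the "right to be forgotten"). You need to be able to fulfil these requests without undue delay, and in any case within one month of receiving them (extendable by up to two further months for complex or numerous requests).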
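That means you need to be able to find every record tied to one person, export it in a readable form, correct it, and delete or anonymise it on request.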
An example deletion query pattern:
-- Anonymise rather than delete (preserves referential integrity).
-- One transaction, so a failure cannot leave the person half-anonymised.
BEGIN;

UPDATE users
SET
    email = CONCAT('deleted-', id, '@anon.invalid'),
    name = 'Deleted User',
    phone = NULL,
    ip_address = NULL,
    deleted_at = NOW()
WHERE id = :user_id;

-- Also clean related tables
UPDATE orders SET customer_email = NULL WHERE user_id = :user_id;
UPDATE audit_logs SET ip_address = NULL WHERE user_id = :user_id;

COMMIT;

Anonymisation is usually safer than hard deletion because foreign key constraints often make hard deletes messy. The key is that the person is no longer identifiable from the data that remains, so check free-text columns and any other tables that reference the user as well.
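The access-request side can be sketched in the same style. The query below gathers one person's records into a single JSON document, assuming the same `users` and `orders` tables used in the deletion example; a real export would need to cover every table in your data map.

```sql
-- Sketch of a Subject Access Request export for one user.
-- Assumes orders.user_id references users.id, as in the deletion example.
SELECT jsonb_build_object(
    'user',   to_jsonb(u),
    'orders', COALESCE(
        (SELECT jsonb_agg(to_jsonb(o))
         FROM orders o
         WHERE o.user_id = u.id),
        '[]'::jsonb               -- empty array when the user has no orders
    )
) AS subject_export
FROM users u
WHERE u.id = :user_id;
```

Note that `to_jsonb(u)` includes every column of the row, so internal fields such as password hashes should be removed or excluded before the export is sent to the requester.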
Data retention limits
GDPR requires you to not keep personal data longer than necessary. Define retention periods per data type:
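-- Anonymise inactive users' personal data after 3 years
UPDATE users
SET
    email = CONCAT('expired-', id, '@anon.invalid'),
    name = 'Expired User',
    phone = NULL,
    ip_address = NULL,
    deleted_at = NOW()  -- marks the row as processed so later runs skip it
WHERE last_login_at < NOW() - INTERVAL '3 years'
  AND deleted_at IS NULL;

Run this as a scheduled job, not a one-off. Retention must be ongoing. Users who have never logged in have a NULL last_login_at and are not matched by this condition, so decide separately how to handle them.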
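One way to schedule the job inside PostgreSQL itself is the pg_cron extension. This is a sketch under the assumption that pg_cron is installed and enabled on your server; the job name is hypothetical, and an external scheduler would work equally well.

```sql
-- Assumes the pg_cron extension is available on this server
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Run the retention update every day at 03:00
-- (the time zone depends on your pg_cron configuration)
SELECT cron.schedule(
    'anonymise-inactive-users',   -- hypothetical job name
    '0 3 * * *',
    $$
    UPDATE users
    SET email = CONCAT('expired-', id, '@anon.invalid'),
        name = 'Expired User',
        phone = NULL
    WHERE last_login_at < NOW() - INTERVAL '3 years'
      AND deleted_at IS NULL
      AND email NOT LIKE 'expired-%'   -- skip rows already anonymised
    $$
);

-- Evidence that the job exists and has been running
SELECT jobid, jobname, schedule FROM cron.job;

SELECT jobid, status, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```

The run history in `cron.job_run_details` doubles as evidence that retention is being enforced continuously rather than once.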
Encryption and access controls
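Personal data should be encrypted at rest and in transit. At the database level, this typically means TLS on every client connection and encryption of the underlying storage, which most managed database services offer as a setting.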
Access controls matter too: not everyone on the team should be able to run SELECT * FROM users. Use database roles to limit who can query what.
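As a rough illustration of limiting who can query what, PostgreSQL supports column-level grants. The role name `support_readonly` is hypothetical, and the column list assumes the `users` table from the earlier examples.

```sql
-- Hypothetical support role: can look up accounts, cannot read contact details
CREATE ROLE support_readonly;
GRANT USAGE ON SCHEMA public TO support_readonly;

-- Column-level grant: only the listed columns are readable
GRANT SELECT (id, last_login_at, deleted_at) ON users TO support_readonly;

-- Verify: should be false unless the role has been granted access elsewhere
SELECT has_column_privilege('support_readonly', 'users', 'email', 'SELECT')
       AS can_read_email;
```

With only column-level privileges, `SELECT * FROM users` fails with a permission error for this role, while a query naming just the granted columns succeeds.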
SOC 2: What Auditors Will Look For in Your Database
SOC 2 audits cover five Trust Service Criteria: Security, Availability, Processing Integrity, Confidentiality, and Privacy. Most small teams start with just Security and Availability.
At the database level, auditors focus on:
Access control and authentication
Who can connect to your database, and how? Auditors want to see least-privilege roles assigned to named, individual users rather than shared logins, for example:
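-- PostgreSQL: create a read-only analytics role
CREATE ROLE analytics_readonly;
GRANT CONNECT ON DATABASE mydb TO analytics_readonly;
GRANT USAGE ON SCHEMA public TO analytics_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analytics_readonly;
-- The grant above only covers existing tables; this covers tables
-- created later by the role that runs this statement
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO analytics_readonly;

-- Apply to a specific, named user (placeholder password: use a generated secret)
CREATE USER priya WITH PASSWORD 'strongpassword';
GRANT analytics_readonly TO priya;

Audit logging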
SOC 2 requires evidence that you know who did what and when, so your database should keep an audit log of connections and of the statements each user runs.
In PostgreSQL, enable pgaudit for structured audit logging. In MySQL, use an audit log plugin (MySQL Enterprise Audit in the commercial edition, or a community audit plugin). Cloud databases (RDS, Cloud SQL, AlloyDB) have managed audit logging you can enable with a config change.
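For a self-hosted PostgreSQL server, a minimal pgaudit setup might look like the sketch below. It assumes pgaudit is installed and already listed in `shared_preload_libraries` (which needs a restart); on managed services these settings are usually changed through the provider's parameter configuration rather than `ALTER SYSTEM`. The choice of log classes is an example, not a requirement of SOC 2.

```sql
-- Prerequisite (postgresql.conf, needs a restart):
--   shared_preload_libraries = 'pgaudit'
CREATE EXTENSION IF NOT EXISTS pgaudit;

-- Log schema changes, role/privilege changes, and data modifications
ALTER SYSTEM SET pgaudit.log = 'ddl, role, write';

-- Emit one audit entry per table touched by a statement
ALTER SYSTEM SET pgaudit.log_relation = on;

-- Standard PostgreSQL settings: record who connected and when
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;

-- Apply the changes without a restart
SELECT pg_reload_conf();

-- Confirm the active value
SHOW pgaudit.log;
```

Adding the `read` class also logs SELECT statements, which is more complete but can produce a very large volume of log data on a busy system.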
Change management
Every schema migration should be tracked: who approved it, when it ran, what it changed. Use a migration tool like Flyway, Liquibase, or Prisma Migrate. Keep the migration history in version control. Auditors will ask for this.
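Most migration tools keep their own history table, which can be queried directly when an auditor asks what ran and when. The example assumes Flyway with its default history table name, `flyway_schema_history`; other tools use different tables and columns.

```sql
-- Migration history as recorded by Flyway (default table name assumed)
SELECT version,
       description,
       script,
       installed_by,
       installed_on,
       success
FROM flyway_schema_history
ORDER BY installed_rank;
```

This shows who applied each migration and when; approval evidence usually lives in the pull request or ticket linked to the migration file in version control.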
Backup and recovery
You need documented, tested backup procedures. "We have automated backups" isn't enough: auditors want to see that you've tested restoration. Document the backup procedure itself and the outcome of each restore test.
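One lightweight way to keep evidence of restore tests is a log table that whoever performs the test writes to. Everything in this sketch is hypothetical, including the table name `restore_tests`, its columns, and the 90-day threshold.

```sql
-- Hypothetical evidence log: one row per restore test
CREATE TABLE IF NOT EXISTS restore_tests (
    id              bigserial PRIMARY KEY,
    tested_at       timestamptz NOT NULL DEFAULT NOW(),
    backup_taken_at timestamptz NOT NULL,   -- which backup was restored
    restored_by     text NOT NULL DEFAULT CURRENT_USER,
    duration        interval,               -- how long the restore took
    succeeded       boolean NOT NULL,
    notes           text
);

-- When was the last successful test, and is the next one overdue?
-- (overdue is true when no successful test has been recorded at all)
SELECT MAX(tested_at) AS last_successful_test,
       COALESCE(MAX(tested_at) < NOW() - INTERVAL '90 days', true) AS overdue
FROM restore_tests
WHERE succeeded;
```

The recorded durations also give you a measured recovery time to quote, rather than an estimate.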
HIPAA: Database Requirements for Healthcare Data
HIPAA has two main rules relevant to databases: the Privacy Rule and the Security Rule.
The Security Rule has specific technical safeguards:
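Encryption: PHI should be encrypted in transit (TLS 1.2+ is the usual baseline) and at rest. The Security Rule classes encryption as an "addressable" safeguard, but in practice you need a documented reason not to do it, so treat it as required. For cloud databases, this is usually a checkbox. For self-hosted, you need to verify it's actually enabled.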
Access controls: Each person accessing PHI needs a unique identifier. No shared logins. Role-based access must be documented and reviewed regularly.
Audit controls: You must have hardware, software, or procedural mechanisms to examine activity in systems containing PHI. This is the same audit logging requirement as SOC 2, but with legal teeth.
Automatic logoff: Application sessions must time out. This is usually an application-layer concern, but if your team uses database GUI tools (like TablePlus or DBeaver) to access PHI directly, those sessions need to time out too.
Data backup and disaster recovery: Documented backup procedures with tested restoration. Same as SOC 2 requirements.
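For the encryption safeguard above, the in-transit part can be checked from inside a self-hosted PostgreSQL server. This is a sketch using the standard `pg_stat_ssl` and `pg_stat_activity` views; encryption at rest is typically handled below the database (disk or provider level), so it is not covered here.

```sql
-- Is TLS enabled on the server at all?
SHOW ssl;

-- Current client connections that are NOT using TLS
SELECT a.usename,
       a.client_addr,
       a.application_name
FROM pg_stat_ssl s
JOIN pg_stat_activity a ON a.pid = s.pid
WHERE a.backend_type = 'client backend'
  AND a.client_addr IS NOT NULL   -- skip local socket connections
  AND NOT s.ssl;
```

An empty result from the second query at a given moment only shows that no unencrypted sessions are open right now; to enforce TLS, the server's `pg_hba.conf` rules need to use `hostssl` entries.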
A critical HIPAA point: every vendor who has access to PHI needs a signed Business Associate Agreement (BAA). If you're using a managed database service (AWS RDS, Google Cloud SQL, Azure Database), check that your cloud provider will sign a BAA. Most major providers will. If you're using AI for Database to query a database containing PHI, the same applies — any tool that touches the data needs a BAA in place.
Monitoring Compliance in Your Database
Compliance isn't a one-time project. It requires ongoing monitoring, such as recurring checks for access anomalies and retention violations.
A practical query to review database users in PostgreSQL:
SELECT
    rolname AS username,
    rolsuper AS is_superuser,
    rolcreatedb AS can_create_db,
    rolcreaterole AS can_create_roles,
    rolcanlogin AS can_login,
    rolvaliduntil AS password_valid_until
FROM pg_catalog.pg_roles
WHERE rolcanlogin = true
ORDER BY rolname;

Run this, compare it to your expected user list, and immediately revoke access for anyone who shouldn't be there.
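The comparison step can itself be a query. In this sketch the allow-list is written inline and its entries are hypothetical; in practice it might live in a table. The built-in superuser and any provider-managed roles will appear in the result unless you add them to the list.

```sql
-- Login roles that are not on the expected list
WITH expected(username) AS (
    VALUES ('app_service'), ('priya')   -- hypothetical allow-list
)
SELECT r.rolname AS unexpected_login
FROM pg_catalog.pg_roles r
LEFT JOIN expected e ON e.username = r.rolname
WHERE r.rolcanlogin
  AND e.username IS NULL
ORDER BY r.rolname;

-- To disable an unexpected account without dropping objects it owns:
-- ALTER ROLE some_old_account NOLOGIN;
```

Setting `NOLOGIN` is reversible, which makes it a safer first response than `DROP ROLE` while you confirm whether the account is still needed.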
With AI for Database, you can build this kind of compliance monitoring into a dashboard. Connect your database, set up queries that check for access anomalies or retention violations, and schedule them to run weekly — automatically. You'll get an alert if something unexpected shows up, without needing to remember to check.
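A retention check suitable for that kind of schedule could look like the following sketch. It assumes the `users` table and the three-year period from the retention example earlier; any non-zero count means the retention job is not keeping up.

```sql
-- Users past the retention period whose data has not been anonymised
SELECT COUNT(*)           AS overdue_users,
       MIN(last_login_at) AS oldest_last_login
FROM users
WHERE last_login_at < NOW() - INTERVAL '3 years'
  AND deleted_at IS NULL
  AND email NOT LIKE 'expired-%';
```

Whatever tool runs it, alert when `overdue_users` is greater than zero rather than reading the number by eye.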
Building a Minimal Viable Compliance Programme
If you're starting from zero, here's a practical sequence:
This isn't everything — a real compliance programme involves policies, training, incident response plans, and more. But these six steps handle the majority of the database-level requirements across GDPR, SOC 2, and HIPAA, and they're all achievable by a small team without a dedicated security hire.
The Bottom Line
Database compliance feels overwhelming because the regulatory language is dense and the requirements feel abstract. But most of what GDPR, SOC 2, and HIPAA actually require at the database level is: know what you have, control who can see it, log what's done to it, and delete it when you no longer need it.
You don't need an enterprise compliance team. You need good database hygiene and the discipline to maintain it. Start with a data map and an access audit this week. The rest follows from there.
If ongoing monitoring is the hard part — remembering to check things, running audits on a schedule, alerting when something looks off — AI for Database can help. Connect your database, set up monitoring queries, and schedule them to run automatically. Free to try at aifordatabase.com.