Multi-Tenant SaaS with PostgreSQL Row-Level Security: Beyond the Basics
When I started building our SaaS platform, I had a simple question: how do I stop Tenant A from seeing Tenant B's data? The platform handles API management for multiple organisations. Each one has its own routes, consumers, analytics, and billing. The data lives in PostgreSQL 16, the application is written in Go with pgx, and there are now several hundred tenants on the system.
I want to talk about what worked, what broke, and where Row-Level Security stopped being enough.
Why not just separate databases?
At ten tenants, separate databases are fine. Each tenant gets its own database. You run migrations once per database. Backups are clean. Connection pools are small. Life is good.
At a thousand tenants, you have a thousand connection pools. A thousand migration runs. A thousand backup schedules. Your ops team (if you have one) is now babysitting infrastructure instead of building product.
RLS gives you one database, one schema, one migration path, and per-row isolation enforced by PostgreSQL itself. Not by your application code. Not by an ORM plugin. By the database engine, on every query, whether you remembered to add the WHERE clause or not.
I did not fully appreciate that until we had our first data isolation bug. A developer forgot a WHERE clause in a dashboard query. Without RLS, that query would have returned every tenant's data. With RLS, it returned only the correct tenant's data. The bug was still a bug, but it was not a security incident.
How it actually works
Every table in our schema has a tenant_id column. When a request comes in, the Go application picks a connection from the pool and sets a session variable:
SET app.current_tenant = 'tenant_xyz';
Then a policy on each table says: you can only see rows where tenant_id matches current_setting('app.current_tenant'). PostgreSQL checks this on every SELECT, INSERT, UPDATE, and DELETE. Your application code does not need to add WHERE tenant_id = ... to every query. If a developer forgets, the database catches it. If a bug in the routing layer sends the wrong tenant context, the query returns empty, not someone else's data.
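As a sketch of what that setup looks like (table and policy names here are illustrative, not our actual schema):

```sql
-- Enable RLS, and force it so even the table owner is subject to the policy.
ALTER TABLE routes ENABLE ROW LEVEL SECURITY;
ALTER TABLE routes FORCE ROW LEVEL SECURITY;

-- One policy covers SELECT, INSERT, UPDATE, and DELETE.
-- current_setting(..., true) returns NULL instead of raising an error when
-- the variable is unset, so a missing tenant context yields zero rows
-- rather than a crash.
CREATE POLICY tenant_isolation ON routes
    USING (tenant_id = current_setting('app.current_tenant', true))
    WITH CHECK (tenant_id = current_setting('app.current_tenant', true));
```

The WITH CHECK clause matters: USING filters what you can read, WITH CHECK stops you from inserting or updating rows into another tenant's partition of the table.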
That is the real value. RLS is a guard that works even when your code does not.
The things that bit us
Superusers ignore RLS
This one is in the documentation, but I missed it the first time. RLS policies do not apply to superusers, and table owners bypass them by default unless you enable FORCE ROW LEVEL SECURITY. Our migration user was a superuser. So every migration script, every seed script, every ad-hoc fix that ran under that user could read and write any tenant's data without restriction.
The fix was simple: the migration user is a superuser, the application user is not. Never mix them. We also added a check in CI that fails if the application connection string uses the superuser credentials.
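Our actual CI check is a shell one-liner, but the idea can be sketched in Go like this. The role names here are assumptions; substitute whatever your privileged roles are called:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// forbiddenRoles lists roles the application must never connect as.
// "postgres" and "migrator" are illustrative; use your real role names.
var forbiddenRoles = map[string]bool{"postgres": true, "migrator": true}

// checkAppDSN fails if the application connection string uses a
// privileged role that would bypass RLS entirely.
func checkAppDSN(dsn string) error {
	u, err := url.Parse(dsn)
	if err != nil {
		return fmt.Errorf("cannot parse DSN: %w", err)
	}
	user := strings.ToLower(u.User.Username())
	if forbiddenRoles[user] {
		return fmt.Errorf("application DSN uses privileged role %q", user)
	}
	return nil
}

func main() {
	// Privileged role: the check reports an error.
	fmt.Println(checkAppDSN("postgres://postgres:secret@db:5432/app"))
	// Unprivileged application role: the check passes.
	fmt.Println(checkAppDSN("postgres://app_user:secret@db:5432/app"))
}
```

Run it against every connection string in your deployment config and fail the build on any error.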
Performance is not free
RLS adds a filter to every query. For small tables, you will not notice. For a table with 50 million rows of API analytics data, you will notice very quickly.
The fix: put tenant_id as the leading column in your composite indexes. Not just as a column in the table. In the index. We went from 800ms queries to 3ms queries on our analytics table by changing the index from (created_at) to (tenant_id, created_at). PostgreSQL can use the RLS filter to do an index scan instead of a sequential scan. But only if the index supports it.
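The change itself, sketched (the table and index names are illustrative):

```sql
-- Old index on (created_at) forces PostgreSQL to scan by time and then
-- filter out other tenants' rows. Leading with tenant_id lets the planner
-- seek straight to one tenant's slice of the table.
CREATE INDEX CONCURRENTLY idx_analytics_tenant_time
    ON analytics (tenant_id, created_at);
DROP INDEX CONCURRENTLY IF EXISTS idx_analytics_created_at;
```

CONCURRENTLY keeps the table writable during the build, which is slower but the only sane option on a live multi-tenant table.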
Joins get strange
If you join two RLS-protected tables, both policies apply. This sounds obvious, but it can produce confusing results. If the session variable is not set, or is set to the wrong value, you get empty result sets with no error. Not a permission denied. Not an exception. Just zero rows.
We lost a day debugging a report that returned empty for one tenant. Turned out a background job was not setting the session variable before running the report query. The query was correct. The context was not.
Bulk imports need care
When you import data for one tenant while other tenants are querying, you need to think about transaction isolation. We use READ COMMITTED isolation for normal queries and move bulk imports into separate transactions with explicit locking on the import staging table. Without this, we saw partial reads where a tenant could see half-imported data.
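The import shape, sketched (staging table name assumed, and the exact lock mode depends on what else touches that table):

```sql
BEGIN;
-- Serialise concurrent imports so two jobs never interleave rows
-- in the staging table.
LOCK TABLE import_staging IN EXCLUSIVE MODE;
COPY import_staging FROM STDIN;
-- Move everything into the live table in a single statement. Under
-- READ COMMITTED, other tenants' queries see either none of this
-- import or all of it, never a partial slice.
INSERT INTO analytics SELECT * FROM import_staging;
TRUNCATE import_staging;
COMMIT;
```

The single INSERT ... SELECT is the load-bearing part: one statement, one snapshot, no window where half the rows are visible.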
RLS for things beyond access control
Once every table is tenant-scoped, you get some useful things for free.
Resource quotas
We limit each tenant to a maximum number of API routes based on their plan. The check is a simple count query: SELECT COUNT(*) FROM routes. Because RLS is active, this only counts the current tenant's routes. No WHERE clause needed. The quota check does not even know other tenants exist.
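The application side of that check reduces to a comparison, sketched here with illustrative plan limits (the real limits live in a plans table):

```go
package main

import "fmt"

// planLimits maps plan names to the maximum number of API routes.
// These numbers are illustrative, not our real pricing tiers.
var planLimits = map[string]int{"free": 5, "pro": 100, "enterprise": 10000}

// canCreateRoute decides whether a tenant may add another route.
// currentCount comes from SELECT COUNT(*) FROM routes -- under RLS that
// query already counts only the current tenant's rows, with no WHERE clause.
func canCreateRoute(plan string, currentCount int) (bool, error) {
	limit, ok := planLimits[plan]
	if !ok {
		return false, fmt.Errorf("unknown plan %q", plan)
	}
	return currentCount < limit, nil
}

func main() {
	ok, _ := canCreateRoute("free", 4)
	fmt.Println(ok) // true: room for one more route
	ok, _ = canCreateRoute("free", 5)
	fmt.Println(ok) // false: at the limit
}
```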
Billing isolation
Usage metering works the same way. SELECT SUM(request_count) FROM usage WHERE period = '2026-03' automatically returns only the current tenant's usage. The billing pipeline sets the session variable and reads usage figures. It cannot accidentally bill Tenant A for Tenant B's traffic.
Audit logging
Our audit log table is also RLS-protected. When a tenant asks "show me who changed this route," the query is tenant-scoped without any application logic. The audit trail cannot leak across tenants, and we did not write a single line of filtering code to make that happen.
When RLS is not enough
We hit the wall at around 400 tenants. Not because RLS broke. Because one tenant was responsible for 60% of all traffic.
That tenant had 12,000 API routes. The next largest had 300. Their analytics table had 40 million rows. Everyone else combined had 10 million. Query planning was fine for the long tail, but this one tenant's queries were competing for the same shared resources as everyone else.
We could have thrown more hardware at it. We did, for a while. But the real problem was that this tenant needed different vacuum settings, different connection pool sizes, and different backup frequency than the rest.
So we moved them to a dedicated database. Just that one tenant. The other 399 stayed on shared RLS. Our application routing layer checks a tenant config table: if the tenant has a dedicated database, use that connection pool. Otherwise, use the shared pool.
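The routing decision itself is tiny. A sketch, with field names assumed (our real config table carries more than this):

```go
package main

import "fmt"

// tenantConfig mirrors one row of the tenant config table (shape assumed).
type tenantConfig struct {
	ID           string
	DedicatedDSN string // empty means the tenant lives on the shared cluster
}

// poolFor returns the connection target for a tenant: its dedicated
// database if one is configured, otherwise the shared RLS pool.
func poolFor(cfg tenantConfig, sharedDSN string) string {
	if cfg.DedicatedDSN != "" {
		return cfg.DedicatedDSN
	}
	return sharedDSN
}

func main() {
	big := tenantConfig{ID: "whale", DedicatedDSN: "postgres://whale-db/app"}
	small := tenantConfig{ID: "minnow"}
	fmt.Println(poolFor(big, "postgres://shared-db/app"))
	fmt.Println(poolFor(small, "postgres://shared-db/app"))
}
```

Everything downstream of this function is identical for both kinds of tenant: the dedicated database keeps the same schema and the same RLS policies, so queries do not know or care which pool they ran on.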
The threshold for us was: if a tenant uses more than 40% of total database resources for two consecutive weeks, they get their own database. We have moved three tenants off so far. The rest are happy on shared RLS and probably will be for years.
What I would do differently
I would add tenant_id to every index from day one. We had to rebuild several large indexes in production, which meant hours of CREATE INDEX CONCURRENTLY and careful monitoring.
I would also build the dedicated-database escape hatch earlier. Not because we needed it early, but because knowing the path out makes you less nervous about putting everything on shared RLS in the first place.
RLS is not magic. It is a PostgreSQL feature that does one thing well: it makes sure your application code cannot accidentally show one tenant's data to another. The quotas, the billing, the audit trail, those are side effects of a good data model. But that one thing, the isolation guarantee that works even when your developers are tired and your code has bugs, that is the reason we use it.