The starting problem
The client — a PropTech SaaS platform managing lease agreements, maintenance requests, and tenant communications across a UK property portfolio — came to me with three compounding problems.
First: data isolation was broken. The platform had been built with a shared schema and application-level row filtering. This worked well enough at ten landlords. At three hundred, it was producing intermittent cross-tenant data leaks — a landlord would occasionally see another landlord's maintenance tickets in their dashboard. The team had patched it repeatedly, but the root cause had never been addressed.
Second: performance had degraded sharply over the previous six months. With a shared database, no tenant-level indexing strategy and no query optimisation, slow queries from large landlords were dragging down response times for everyone on the platform.
Third: a compliance review triggered by a new enterprise client had flagged the architecture as insufficient. The enterprise deal — worth roughly four times the existing ARR — was conditional on the platform being able to demonstrate data isolation at the infrastructure level, not just the application level.
The founder had one developer, who had spent most of the previous year firefighting. No one had owned the architectural problem end to end.
My role
I came in as fractional CTO and SaaS architect. My scope was to assess the existing architecture, define the target architecture, own the migration plan, and lead the developer through the implementation.
The engagement ran for four months. The first month was assessment and design. The remaining three were implementation, migration, and validation.
The architectural decisions
The first decision was the multi-tenancy strategy. The platform ran on PostgreSQL. There are three standard options: a shared schema with row-level security, a separate schema per tenant, or a separate database per tenant. Each has different implications for cost, operational complexity, compliance and query performance.
For this client, the right choice was PostgreSQL row-level security (RLS) policies, with the tenant_id propagated from the application into each database session. Separate schemas would have created operational complexity they could not sustain, because every schema migration would have had to be applied once per tenant, three hundred times over. Separate databases would have been prohibitively expensive at their volume and would have made cross-tenant reporting far harder. RLS gave us infrastructure-level isolation, enforced by the database rather than the application, with a manageable migration path.
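As a rough illustration of what such a policy can look like, the Python sketch below generates PostgreSQL DDL for a single table. The setting name app.tenant_id, the uuid type of the tenant column and the helper name rls_ddl are assumptions made for illustration. They are not details from the engagement. ENABLE/FORCE ROW LEVEL SECURITY, CREATE POLICY and current_setting are standard PostgreSQL features.

```python
import re

# Hypothetical name for the custom setting that carries the current tenant.
TENANT_SETTING = "app.tenant_id"

# Plain lowercase SQL identifiers only; names are interpolated into SQL text.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def rls_ddl(table: str, tenant_column: str = "tenant_id") -> list[str]:
    """Return DDL placing one table under a tenant-isolation RLS policy."""
    for name in (table, tenant_column):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Assumption: the tenant column is of type uuid.
    predicate = f"{tenant_column} = current_setting('{TENANT_SETTING}')::uuid"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Without FORCE, the table's owner is exempt from its own policies.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # USING limits which existing rows are visible or modifiable;
        # WITH CHECK rejects inserted or updated rows for another tenant.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]


if __name__ == "__main__":
    # Hypothetical table names standing in for the platform's real tables.
    for t in ("leases", "maintenance_requests"):
        print("\n".join(rls_ddl(t)))
```

Reading the tenant with current_setting and no fallback value is a deliberate choice in this sketch. If the application forgets to set the tenant, the query should fail with an error rather than silently return rows.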
The second decision was how to migrate without downtime. The platform served landlords across the UK at all hours, with no maintenance window. We designed a four-stage migration. First, add the RLS policies in audit mode alongside the existing application-level filtering. Second, validate that no queries were escaping the new policies. Third, enable enforcement in shadow mode, with both layers active. Finally, after two weeks of clean operation, remove the application-level filtering.
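One way to make the stage gates explicit is to encode them, as in the Python sketch below. The ordering of the stages and the two-week clean period come from the migration plan. The stage names, the escapes_seen metric (queries that returned rows the new policies would have hidden) and the rule that any escape blocks progress are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    AUDIT = 1      # RLS policies in audit mode alongside app-level filtering
    VALIDATED = 2  # audit shows no queries escaping the new policies
    SHADOW = 3     # RLS enforced while app-level filtering is still active
    RLS_ONLY = 4   # app-level filtering removed; RLS alone enforces isolation


CLEAN_DAYS_REQUIRED = 14  # two weeks of clean operation before stage 4


@dataclass
class Observations:
    escapes_seen: int  # hypothetical: queries returning rows RLS would hide
    clean_days: int    # consecutive days without an isolation incident


def next_stage(current: Stage, obs: Observations) -> Stage:
    """Return the stage to move to, holding whenever a gate is not met."""
    if current is Stage.RLS_ONLY:
        return current  # migration complete
    if obs.escapes_seen > 0:
        return current  # any escaping query blocks progress
    if current is Stage.SHADOW and obs.clean_days < CLEAN_DAYS_REQUIRED:
        return current  # keep both layers until the clean period is over
    return Stage(current + 1)
```

The key property is that the application-level filter, the existing safety net, is only removed after enforcement has already run in production with both layers active.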
The third decision was indexing strategy. We ran query analysis across the top fifty landlord accounts and identified six query patterns responsible for eighty per cent of slow queries. All six required composite indexes that included tenant_id as the leading column. Adding these required no structural changes and cut median query time by sixty per cent during a low-traffic test window.
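A small Python model shows why tenant_id goes first. It treats a composite B-tree index as a sorted list of key tuples. With tenant_id leading, one tenant's entries are contiguous, so two binary searches bound them and no other tenant's entries are read. With a timestamp leading, those entries would be interleaved with every other tenant's. The created_at column and the integer tenant ids are simplifications for this sketch.

```python
from bisect import bisect_left


def build_index(rows):
    """Model a B-tree on (tenant_id, created_at) as a sorted list of keys.

    rows: iterable of (row_id, tenant_id, created_at) tuples.
    Each entry is (tenant_id, created_at, row_id), ordered as the index would be.
    """
    return sorted((tenant, created, row_id) for row_id, tenant, created in rows)


def tenant_range(index, tenant_id):
    """Return one tenant's entries, already ordered by created_at.

    Because tenant_id is the leading key, the tenant's entries form one
    contiguous run. (t,) sorts before any (t, ...) entry, and (t + 1,) sorts
    after all of them, so two binary searches find the run's bounds.
    Integer tenant ids are assumed so that t + 1 is the next possible key.
    """
    lo = bisect_left(index, (tenant_id,))
    hi = bisect_left(index, (tenant_id + 1,))
    return index[lo:hi]
```

In PostgreSQL, the equivalent is a composite index on something like (tenant_id, created_at). Building it with CREATE INDEX CONCURRENTLY avoids blocking writes during the build, at the cost of a slower build.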
The fourth decision was connection pooling. The platform was using direct PostgreSQL connections without pooling. PgBouncer in transaction pooling mode reduced connection overhead and made the connection count predictable under load.
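Transaction pooling has one consequence for RLS worth spelling out. PgBouncer may hand a server connection to a different client after each transaction, so a tenant set with a session-level SET could leak to the next client. A common pattern is to set the tenant inside each transaction with PostgreSQL's set_config(name, value, true), whose effect is discarded at commit or rollback. The Python sketch below assumes a DB-API-style connection in autocommit mode that uses %s placeholders, as psycopg does. The Executor protocol, tenant_transaction and the app.tenant_id setting name are hypothetical.

```python
from collections.abc import Iterator, Sequence
from contextlib import contextmanager
from typing import Any, Protocol


class Executor(Protocol):
    """Anything with a DB-API-style execute(sql, params) method."""

    def execute(self, sql: str, params: Sequence[Any]) -> Any: ...


@contextmanager
def tenant_transaction(conn: Executor, tenant_id: str) -> Iterator[Executor]:
    """Run a unit of work with the tenant bound to this transaction only.

    Assumes conn is in autocommit mode, so the explicit BEGIN/COMMIT here
    define the transaction boundary that PgBouncer pools on.
    """
    conn.execute("BEGIN", ())
    try:
        # is_local = true: the setting is discarded at COMMIT or ROLLBACK,
        # so the next client given this server connection cannot inherit it.
        conn.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield conn
    except BaseException:
        conn.execute("ROLLBACK", ())
        raise
    else:
        conn.execute("COMMIT", ())
```

Every query that touches tenant data would then run inside a tenant_transaction block, so the RLS policies always see the correct tenant for the lifetime of that transaction and no longer.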
The outcome
The migration completed in week eleven. No data was lost. There was no visible downtime. Landlords reported no disruption.
The compliance review passed. The enterprise deal closed three weeks after the architectural sign-off.
Median API response times dropped from 420ms to 155ms, a reduction of roughly sixty-three per cent. The cross-tenant data leak, which had been patched repeatedly and kept reappearing for eighteen months, did not recur.
The developer, who had been firefighting for most of the previous year, described the three months after the migration as the first period in which the codebase had felt maintainable.
The founder lesson
Architecture debt does not accumulate linearly. It accumulates quietly for a long time, then compounds at the worst possible moment — when you have a large deal on the table, or when you are trying to scale, or when a compliance requirement surfaces that you cannot meet.
The cost of fixing this architecture correctly was four months of fractional engagement. The cost of not fixing it was a failed enterprise deal and ongoing operational risk. The economics were not close.
The other lesson: a single developer who owns both product and infrastructure simultaneously will almost always end up firefighting instead of building. Architecture work requires uninterrupted focus and a different mode of thinking than feature development. Getting the architecture right requires someone whose job it is to think about the architecture — not someone trying to fit it in between support tickets.