Designing row-level security for a 1,600-user lakehouse

When eight organizations share one lakehouse, the platform question stops being "how do we store the data" and becomes "how do we make sure the right people see the right rows." This post covers how we approached row-level security (RLS) on an Apache Iceberg medallion architecture serving 1,600+ users.

The problem

A shared platform is only viable if every consuming team trusts that their data is isolated. The naive answers each fail at scale:

  • Separate copies per org — storage costs multiply and datasets drift out of sync.
  • Views per org — thousands of views to maintain, and one missed view is a data leak.
  • Application-level filtering — works until someone connects a BI tool directly to the warehouse.

The approach

The design that held up pushes authorization into the query path itself and keys it on a small set of governed attributes rather than on per-dataset rules.

  1. Classify once, enforce everywhere. Every dataset in the semantic layer carries ownership and sensitivity metadata. Enforcement reads that metadata; access rules are never encoded per pipeline.
  2. Filter at the storage boundary. Row filters apply where the query engine reads data, so every access path — SQL, BI tools, the embedded AI agent — passes through the same control.
  3. Make the safe path the easy path. Onboarding a new dataset with correct RLS had to be less work than onboarding it without. Otherwise governance becomes the thing teams route around.
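
To make the three principles concrete, here is a minimal sketch in plain Python. It is not our production code, and it is not how Iceberg or any particular query engine implements row filters. Every name in it (DatasetMeta, Principal, CATALOG, TABLES, read_rows, the "org" column) is a hypothetical stand-in, and the in-memory TABLES dictionary stands in for the storage layer. It only illustrates the shape: metadata is declared once per dataset, and a single read function applies the filter for every caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """Governed attributes attached to a dataset once, at onboarding (hypothetical)."""
    owner_column: str  # column naming the organization that owns each row
    sensitivity: str   # e.g. "public", "internal", "restricted"


@dataclass(frozen=True)
class Principal:
    """Any caller: a person, a BI tool's service account, or an AI agent."""
    name: str
    org: str
    clearance: frozenset  # sensitivity levels this caller may read


# Classify once: rules live in one catalog, never in individual pipelines.
CATALOG = {
    "sales.orders": DatasetMeta(owner_column="org", sensitivity="internal"),
}

# Stand-in for table storage; a real system would read through the engine.
TABLES = {
    "sales.orders": [
        {"order_id": 1, "org": "org_a", "amount": 120},
        {"order_id": 2, "org": "org_b", "amount": 75},
        {"order_id": 3, "org": "org_a", "amount": 40},
    ],
    "hr.salaries": [{"employee": "x", "org": "org_a", "salary": 1}],
}


def read_rows(dataset: str, principal: Principal) -> list[dict]:
    """The single read path. Every client goes through this function."""
    meta = CATALOG.get(dataset)
    if meta is None:
        # Fail closed: an unclassified dataset is unreadable, so onboarding
        # with metadata is the only path that works at all.
        raise PermissionError(f"{dataset} has no governance metadata")
    if meta.sensitivity not in principal.clearance:
        return []
    return [
        row for row in TABLES[dataset]
        if row[meta.owner_column] == principal.org
    ]


analyst = Principal("analyst", "org_a", frozenset({"internal"}))
agent = Principal("ai_agent", "org_b", frozenset({"internal"}))

print(read_rows("sales.orders", analyst))  # rows owned by org_a only
print(read_rows("sales.orders", agent))    # rows owned by org_b only
```

The analyst and the agent call the same function and get different rows, which is the point of the second principle. Failing closed on unclassified datasets is one way to make the safe path the easy one: a dataset onboarded without metadata cannot be read at all.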

What I'd tell someone building this today

  • Audit access patterns before designing the filter model. The distribution of who queries what is usually more skewed than you expect.
  • Treat the AI/agent layer as just another client. If your RLS only works for humans typing SQL, it doesn't work.
  • Budget real time for the migration of existing consumers — the security model is the easy half; moving 1,600 users onto it is the hard half.
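
For the first point, the audit can start very small. The sketch below assumes you can export a query log as (user, dataset) pairs; the function name top_share and the sample log are made up for illustration and are not measurements from our platform. It reports what fraction of all accesses lands on the most-read datasets.

```python
from collections import Counter


def top_share(access_log, top_n):
    """Fraction of all accesses that hit the top_n most-read datasets."""
    counts = Counter(dataset for _user, dataset in access_log)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _dataset, count in counts.most_common(top_n))
    return top / total


# Hypothetical log: eight reads of one dataset, one read each of two others.
sample_log = (
    [("user_1", "sales.orders")] * 8
    + [("user_2", "hr.headcount")]
    + [("user_3", "ops.tickets")]
)

print(top_share(sample_log, 1))  # share of accesses on the single hottest dataset
```

If a handful of datasets carries most of the traffic, those are the ones whose filter rules and migration deserve attention first.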