How the AGNT5 runtime keeps one tenant's runs from affecting another — across journals, Parquet partitions, and the coordinator's routing table.
Multi-tenancy is one of those design choices that has to be made on day one. Bolting it onto a single-tenant system later is usually a rewrite — the isolation invariants touch every storage prefix, every index, every cache key, every request path. AGNT5 was built multi-tenant from the start, and the split reaches down into the segment crate and out through the query layer.
A tenant is the billing and access-control unit. A project is a grouping inside a tenant — usually one application or one deployment target. A run belongs to a project, which belongs to a tenant. The tenant boundary is the hard one: no code path in the runtime reads or writes data across tenants by accident.
The boundary in the storage layer
Every RocksDB segment path is tenant-scoped. A concrete layout for an on-disk data plane looks like this:
/var/lib/agnt5/
├── segments/
│   ├── tenant=proj_abc123/
│   │   ├── run=0193f4.../records.rocksdb
│   │   └── run=0193f5.../records.rocksdb
│   └── tenant=proj_def456/
│       └── run=0193f6.../records.rocksdb
Opening a segment takes a (tenant_id, run_id) tuple. The record keys inside a segment are the run’s offsets. There is no shared index across tenants, which means a compaction on one tenant’s segments cannot stall another tenant’s appends.
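To make the tuple-to-path mapping concrete, here is a minimal sketch of how a segment path can be derived from (tenant_id, run_id). The segment_path helper and its string-typed IDs are illustrative, not the runtime’s actual API:
use std::path::{Path, PathBuf};

// Illustrative helper, not the real crate API: the path is a pure function of
// the (tenant_id, run_id) tuple, so a write can never land under another tenant's prefix.
fn segment_path(root: &Path, tenant_id: &str, run_id: &str) -> PathBuf {
    root.join("segments")
        .join(format!("tenant={tenant_id}"))
        .join(format!("run={run_id}"))
        .join("records.rocksdb")
}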
The Parquet archive in S3 carries the same boundary:
s3://agnt5-engine/engine/runs/
├── tenant=proj_abc123/day=2026-04-11/runs-00042.parquet
├── tenant=proj_abc123/day=2026-04-12/runs-00043.parquet
└── tenant=proj_def456/day=2026-04-12/runs-00044.parquet
Hive partitioning by tenant is how DuckDB prunes cross-tenant reads at plan time. A query scoped to tenant=proj_abc123 never opens a file under tenant=proj_def456. This is not a convention enforced by the application — it falls out of the object store prefix scan, which lists only the files under the requested prefix.
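Here is a sketch of what that pruning looks like from the query side, assuming the duckdb Rust crate and the httpfs extension; the bucket path and filter values come from the layout above, and everything else is illustrative:
use duckdb::Connection;

fn main() -> duckdb::Result<()> {
    let conn = Connection::open_in_memory()?;
    // httpfs is needed for s3:// paths; hive_partitioning exposes tenant/day as columns.
    conn.execute_batch("INSTALL httpfs; LOAD httpfs;")?;
    let mut stmt = conn.prepare(
        "SELECT count(*) \
         FROM read_parquet('s3://agnt5-engine/engine/runs/*/*/*.parquet', hive_partitioning = true) \
         WHERE tenant = 'proj_abc123' AND day = '2026-04-12'",
    )?;
    // DuckDB matches the WHERE clause against the hive partition values and
    // skips every file outside tenant=proj_abc123 before reading any row data.
    let runs: i64 = stmt.query_row([], |row| row.get(0))?;
    println!("runs archived: {runs}");
    Ok(())
}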
The boundary at the wire
The gateway is where tenant attribution gets stamped. Every request — whether from a Studio browser session, an SDK client, or a webhook — resolves through an auth middleware that produces a (tenant_id, project_id, user_id) triple. The rest of the request path passes that triple explicitly. There is no ambient global or thread-local tenant; if a handler needs to know which tenant it is serving, the triple is in the request.
pub struct RequestContext {
    pub tenant_id: TenantId,
    pub project_id: ProjectId,
    pub user_id: Option<UserId>,
    pub auth_mode: AuthMode,
}

impl Handler {
    pub async fn create_run(
        &self,
        ctx: RequestContext,
        payload: CreateRunRequest,
    ) -> Result<Run, Error> {
        let run_id = RunId::new();
        self.engine
            .append_invocation(ctx.tenant_id, ctx.project_id, run_id, payload)
            .await
    }
}
The engine’s append_invocation takes the tenant ID as a mandatory argument. The compiler will not let the gateway forget to pass it. The engine’s implementation constructs the segment path from the tuple and cannot write outside it.
This is the most boring form of isolation, and that is the point. Explicit arguments beat implicit globals. A security review can trace tenant propagation by reading the type signatures, and no one needs to read the runtime’s internals to verify that request A cannot see request B’s data.
The boundary in the coordinator
The coordinator’s routing table tracks which worker currently owns which entity key. That table is partitioned by tenant. Two tenants can both have an entity called Account/user-423 — they resolve to different leases, different workers, different journals. The coordinator does not collapse keys across tenants.
Worker pools are scoped the same way. A worker registers with the coordinator under a (tenant_id, service_name) pair. When a run for tenant A needs dispatch, the coordinator picks from tenant A’s worker pool. Tenant B’s workers are not in that pool and cannot be selected — not as a policy check, but as a data-structure fact.
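As a rough sketch of that data-structure fact — the types and field names here are invented for illustration, not the coordinator’s real code — tenant-first keys make cross-tenant collisions unrepresentable:
use std::collections::HashMap;

// Illustrative newtypes; the coordinator's actual types are not shown in this post.
#[derive(Clone, PartialEq, Eq, Hash)]
struct TenantId(String);
#[derive(Clone, PartialEq, Eq, Hash)]
struct EntityKey(String); // e.g. "Account/user-423"
#[derive(Clone, PartialEq, Eq, Hash)]
struct ServiceName(String);
#[derive(Clone)]
struct WorkerId(String);

struct RoutingState {
    // Tenant is part of the key, so two tenants' "Account/user-423" entries
    // are distinct rows that resolve to different leases and workers.
    leases: HashMap<(TenantId, EntityKey), WorkerId>,
    // Dispatch for tenant A only ever selects from tenant A's pool.
    pools: HashMap<(TenantId, ServiceName), Vec<WorkerId>>,
}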
What stays shared
We do not tenant-partition everything. The runtime binary itself is shared — every tenant’s runs execute on the same process in the common case. The Rust allocator, the tokio runtime, and the gRPC server are shared. The RocksDB block cache is configured with a global size but served per-segment, so cache pressure from one tenant can in principle push out another tenant’s blocks.
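A sketch of that shared-cache setup, assuming a recent rust-rocksdb crate: one cache handle is passed into every segment’s options, so capacity is global even though each segment is opened separately. The helper name, paths, and the 512 MiB figure are illustrative, not the runtime’s actual settings.
use rocksdb::{BlockBasedOptions, Cache, DB, Options};

// One LRU block cache handle, shared by every segment the process opens,
// regardless of which tenant the segment belongs to.
fn open_segment(shared_cache: &Cache, path: &str) -> Result<DB, rocksdb::Error> {
    let mut table_opts = BlockBasedOptions::default();
    table_opts.set_block_cache(shared_cache);

    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_block_based_table_factory(&table_opts);
    DB::open(&opts, path)
}

fn main() -> Result<(), rocksdb::Error> {
    // Global size: blocks from any tenant compete for the same 512 MiB.
    let cache = Cache::new_lru_cache(512 * 1024 * 1024);
    let _a = open_segment(&cache, "/tmp/segment-a")?;
    let _b = open_segment(&cache, "/tmp/segment-b")?;
    Ok(())
}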
We accept this because the alternative — one runtime instance per tenant — collapses the platform’s efficiency story. Shared compute with strict data isolation is the right spot for most workloads. Tenants with regulatory or blast-radius requirements that demand full compute isolation run on dedicated deployments; the operator CRDs make that a configuration flag, not an architecture change.
Quota and noisy-neighbor defenses
Isolation of data is not isolation of throughput. A tenant firing ten thousand runs a second can saturate the shared worker pool. The platform layer handles this with rate limiting and quota enforcement at the gateway — concurrency caps per project, rate caps per API key, dispatch fairness across projects in the coordinator’s scheduling.
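As an illustration of the gateway-side half, here is a minimal per-project concurrency cap built on tokio’s Semaphore; the struct and its fields are invented for this example and are not the platform’s real limiter:
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, OwnedSemaphorePermit, Semaphore};

// Invented for illustration: a per-project concurrency cap applied at the gateway,
// before a request is allowed to reach the engine.
struct ConcurrencyLimits {
    per_project: Mutex<HashMap<String, Arc<Semaphore>>>,
    cap: usize,
}

impl ConcurrencyLimits {
    async fn acquire(&self, project_id: &str) -> OwnedSemaphorePermit {
        let sem = {
            let mut map = self.per_project.lock().await;
            map.entry(project_id.to_string())
                .or_insert_with(|| Arc::new(Semaphore::new(self.cap)))
                .clone()
        };
        // Waits (asynchronously) while the project is at its cap; the permit is
        // released when the request finishes and the permit is dropped.
        sem.acquire_owned().await.expect("semaphore never closed")
    }
}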
The runtime itself stays agnostic. It will cheerfully execute whatever the gateway lets through. Quota is a policy decision in the control plane, not a correctness invariant in the data plane. Keeping them separate means we can tune quota policy without touching execution code.
Why this matters
A platform that promises durable execution has to promise durable isolation too. Losing a run is bad; leaking it into another tenant’s query is worse. The runtime’s design puts the tenant boundary in the storage layout, in the function signatures, and in the routing table — three places where it is mechanical to check and hard to forget. That mechanical isolation is what lets the platform share a binary across tenants without sharing any of their data.
Multi-tenancy done loudly, with partitioned storage and explicit arguments, beats multi-tenancy done quietly, with shared globals and policy checks. We went loud.