
Architecture overview

The map — Gateway, Engine, Coordinator, plus your workers — and how they fit together around a single binary.

AGNT5 is one runtime binary with three components — Gateway (ingress), Engine (workflow scheduling + journal), Coordinator (worker dispatch) — plus your workers, which are separate processes that connect out to the Coordinator over gRPC.

                  ┌─────────────────────────────────────┐
                  │         AGNT5 runtime binary        │
                  │                                     │
client ──HTTP──►  │  Gateway  ──►  Engine  ──►  Coord.  │  ──gRPC──►  worker
                  │           ◄──         ◄──           │  ◄────────  (your code)
                  │                                     │
                  │   journal · S3 archive · query      │
                  └─────────────────────────────────────┘

A single process serves all three components by default. Larger deployments split them with --target gateway | engine | coordinator | all. Workers are always separate.
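Concretely, the split might look like this on the command line. This is a hedged sketch: the binary name agnt5-runtime is a placeholder, and only the --target values come from this page.

    agnt5-runtime --target all            # default: Gateway + Engine + Coordinator in one process
    agnt5-runtime --target gateway        # just the stateless ingress tier
    agnt5-runtime --target engine         # just the journal and lease manager
    agnt5-runtime --target coordinator    # just the worker bridge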

The mental model

Gateway is the front door. It accepts HTTP from clients (REST, SSE for streaming) and forwards run starts, signals, and queries to the Engine. It is stateless: runs never execute in the Gateway and no run state lives there; requests only pass through it.
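A rough sketch of that surface from the client side. The paths, port, and payload shape below are assumptions for illustration, not the documented API:

    import httpx  # third-party HTTP client, used here only for illustration

    BASE = "http://localhost:8080"  # assumed local Gateway address

    # Start a run. The Gateway forwards the request to the Engine and returns;
    # it keeps no run state of its own.
    run = httpx.post(f"{BASE}/v1/runs",
                     json={"workflow": "order_flow", "input": {"sku": "A-1"}}).json()

    # Follow progress over SSE. The Gateway relays Engine events as they happen.
    with httpx.stream("GET", f"{BASE}/v1/runs/{run['id']}/events") as events:
        for line in events.iter_lines():
            print(line)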

Engine is the brain. It owns the journal — every step’s input, output, error, and timing — and the lease manager, which tracks which worker holds which run. When a workflow calls ctx.step(...), the Engine decides whether to replay from the journal or dispatch the step to a worker; it writes the outcome to the journal regardless.
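A minimal sketch of that decision in Python. The journal shape and the dispatch callback are assumptions; the replay-then-record behavior is what the paragraph above describes:

    class StepFailed(Exception):
        pass

    def run_step(journal: dict, step_id: str, dispatch):
        entry = journal.get(step_id)
        if entry is not None:
            # Replay: the step already ran once; surface the recorded outcome
            # without calling a worker again.
            if entry["error"] is not None:
                raise StepFailed(entry["error"])
            return entry["output"]
        # First execution: hand the step to a worker via the Coordinator, then
        # record the outcome in the journal whether it succeeded or failed.
        try:
            output = dispatch(step_id)
            journal[step_id] = {"output": output, "error": None}
            return output
        except Exception as exc:
            journal[step_id] = {"output": None, "error": str(exc)}
            raise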

Coordinator is the worker bridge. Workers connect outbound to the Coordinator over gRPC and stay connected. When the Engine needs a step executed, it hands the call to the Coordinator, which routes it over the worker’s open stream. Worker output flows back the same way.
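An illustrative routing table, not the Coordinator's real implementation. The routing key here anticipates the (tenant_id, deployment_id, component_id) tuple described below:

    class Coordinator:
        def __init__(self):
            # (tenant_id, deployment_id, component_id) -> the worker's open stream
            self.streams = {}

        def register(self, key, stream):
            # A worker dialed in and stays connected; remember its stream.
            self.streams[key] = stream

        async def dispatch(self, key, call):
            # The Engine hands us a step; route it over the worker's open stream
            # and wait for the reply to come back the same way.
            stream = self.streams[key]
            await stream.send(call)
            return await stream.recv()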

Workers host your code. They are separate processes you run — agnt5 dev for local, container deployments for managed environments. A worker registers its @workflow/@function/@tool/Agent instances at startup, then waits for dispatch from the Coordinator. Multiple workers can serve the same project; the Coordinator routes by (tenant_id, deployment_id, component_id).
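A hedged sketch of a worker module. The @workflow/@function decorators and ctx.step(...) appear on this page; the import path, signatures, and payloads are assumptions about the SDK:

    from agnt5 import function, workflow  # import path is an assumption

    @function
    def charge_card(order: dict) -> dict:
        # An ordinary function the Engine can dispatch as a step.
        return {"charged": order["total"]}

    @workflow
    async def order_flow(ctx, order: dict) -> dict:
        # Each ctx.step(...) round-trips Engine -> Coordinator -> this worker,
        # and its outcome lands in the journal either way.
        receipt = await ctx.step(charge_card, order)
        return receipt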

The single-binary default makes local development one command — agnt5 dev starts one process and you have a working runtime. The split-binary mode (--target) lets larger deployments scale Gateway, Engine, and Coordinator independently. The client-facing surface is identical in both modes, so code does not change between them.

Why it works this way

A single binary makes the runtime fit on a developer’s laptop, in a Docker container, or on a small cloud machine — Railway, Render, Fly.io, even a Raspberry Pi can host the whole runtime. Splitting only when you need to scale keeps the operations story clean: one process, one config, one log stream until traffic forces otherwise.

Worker-initiated gRPC connections invert the usual ingress model. Instead of the runtime needing to route inbound to workers (which means knowing every worker’s address, opening firewalls, and managing TLS for each), workers dial out to a single coordinator endpoint. That endpoint can sit behind a load balancer, the workers can live anywhere with outbound network access, and TLS terminates once at the LB.
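The direction of the arrow, sketched with grpc-python purely for illustration (Tonic-based workers differ in address format; see the gotchas below):

    import grpc

    # The worker opens the connection to one well-known endpoint; nothing ever
    # routes inbound to the worker.
    channel = grpc.insecure_channel("coordinator.internal:7070")  # assumed address
    grpc.channel_ready_future(channel).result(timeout=10)         # block until connected
    # From here the worker keeps a long-lived stream open and waits for dispatches.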

The journal-and-lease pattern makes the Engine the single source of truth for run state. Every other component (Gateway, Coordinator, query layer) either reads from the journal or routes work whose outcomes land in it — there is no second source of truth to keep consistent.
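Illustrative shapes only. The field names are read off the description on this page (input, output, error, timing; which worker holds which run), not a real schema:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class JournalEntry:
        run_id: str
        step_id: str
        input: bytes
        output: Optional[bytes]
        error: Optional[str]
        started_at: float
        finished_at: Optional[float]

    @dataclass
    class Lease:
        run_id: str
        worker_id: str  # which worker currently holds this run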

Edge cases and gotchas

  • The --target flag splits the deployment, not the binary. all (default) runs everything in one process; gateway, engine, and coordinator each run that one component. The same binary serves every target — selection happens only at startup.
  • Gateway is stateless; Engine is the stateful one. Engine holds the journal and lease manager. Scaling Engine is a different problem from scaling Gateway — Engine needs HA-aware storage; Gateway needs only more replicas.
  • Workers connect out, not in. The runtime never opens a connection to a worker. This means workers can run inside private networks, behind NAT, or in environments that block inbound traffic, as long as they can reach the Coordinator endpoint.
  • The Coordinator endpoint must use the http:// scheme. Tonic (the Rust gRPC client) does not normalize bare host:port strings, so worker config must include the scheme — the local dev stack pins this in config.managed.yml. See the one-line illustration after this list.
  • Standalone and HA modes share the client surface. A single-node agnt5 dev and a three-node Envoy-fronted HA cluster expose the same gRPC services on the same ports. Worker code does not change.
  • The runtime’s storage is RocksDB + S3 + DuckDB. RocksDB holds the active journal (write-ahead log). Sealed segments are uploaded to S3 as Parquet. DuckDB queries the Parquet over S3 for the trace UI and eval reads. Storage choices are visible to operators; user code never touches them.
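For the scheme gotcha above, a one-line illustration. The setting name is invented; the http:// requirement is the documented part:

    COORDINATOR_ENDPOINT = "http://coordinator.internal:7070"  # OK: Tonic parses the scheme
    # COORDINATOR_ENDPOINT = "coordinator.internal:7070"       # rejected: bare host:port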