A walk through the Studio run view, the data sources behind it, and why the timeline is the artifact every other feature points back to.
The debugging story for an AI workflow is almost always the same. Something went wrong. The user wants to know which step, with which input, called which model, returned what, and why the retry behaved differently than the first attempt. A good run view answers those questions in five clicks. A bad one sends engineers into Grafana, then Sentry, then the provider dashboard, then a Slack thread with a product manager.
AGNT5 Studio’s run view is the place those questions resolve. The data behind it is not a separate observability pipeline — it is the same journal the runtime reads and writes during execution, surfaced through the gateway with minimal transformation.
What the view shows
Open a run in Studio and you see a timeline. Each step is a row. Each row has:
- The step name (from ctx.step("name", ...)).
- Start and end timestamps.
- Status: running, completed, failed, memoized on replay.
- For LLM steps: model, tokens_in, tokens_out, cost_usd.
- The step’s input payload, exactly as it was passed into the closure.
- The step’s output payload, exactly as it was returned.
- A link to the span in your OpenTelemetry backend, for correlating with upstream traces.
The timeline is hierarchical. A workflow that calls ctx.invoke to dispatch a child workflow shows the child as a nested block. An agent that makes three tool calls shows the tool invocations as sub-rows under the agent step. Nesting goes as deep as the workflow code does.
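To make the mapping concrete, here is a minimal sketch of workflow code and the rows it would produce. Only ctx.step and ctx.invoke appear in this post; the import path, decorator name, and helper function are hypothetical stand-ins, not the real SDK surface.

```python
# Hypothetical sketch: how workflow code maps to timeline rows.
# ctx.step and ctx.invoke come from the post; the import path, the
# @workflow decorator, and validate_order are assumptions.
from agnt5 import workflow  # hypothetical import

def validate_order(order: dict) -> dict:  # stand-in business logic
    return {**order, "valid": True}

@workflow
async def process_order(ctx, order: dict) -> dict:
    # One ctx.step call becomes one timeline row, keyed by the step name.
    validated = await ctx.step("validate", lambda: validate_order(order))

    # ctx.invoke dispatches a child workflow; its steps render as a
    # nested block under this row, as deep as the call graph goes.
    enriched = await ctx.invoke("enrich_order", validated)
    return enriched
```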
Human-in-the-loop pauses appear as a distinct row type — “waiting for user input” — with the prompt text, the input type, and (once resolved) the responder and timestamp. A run that paused for four hours and resumed shows the gap clearly, with no confusion about whether it was “stuck.”
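A paused step might look like the sketch below. The post describes only how the row renders; ctx.wait_for_input, its parameters, and the refund helpers are invented for illustration.

```python
# Hypothetical sketch of a human-in-the-loop pause. ctx.wait_for_input
# and its parameters are assumptions; the post describes only how the
# resulting "waiting for user input" row renders in the timeline.
from agnt5 import workflow  # hypothetical import, as above

def build_refund(req: dict) -> dict:   # stand-in helpers
    return {"amount": req["amount"]}

def issue_refund(draft: dict) -> dict:
    return {"status": "refunded", **draft}

@workflow
async def refund_flow(ctx, request: dict) -> dict:
    draft = await ctx.step("draft_refund", lambda: build_refund(request))

    # Shows as a distinct row with the prompt text and input type; the
    # responder and timestamp fill in once someone answers.
    approved = await ctx.wait_for_input(
        prompt=f"Approve refund of ${draft['amount']}?",
        input_type="boolean",
    )
    if approved:
        return await ctx.step("execute", lambda: issue_refund(draft))
    return {"status": "rejected"}
```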
Where the data comes from
For a live run — one that is still executing — Studio subscribes to an SSE stream from the gateway. The gateway tails the run’s journal in RocksDB and pushes each new entry down the stream as it is appended. Latency from append to screen is in the tens of milliseconds. No polling, no refresh button.
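Consuming that stream is plain SSE. A minimal Python sketch, assuming a /v1/runs/{id}/events route and a JSON entry payload, neither of which is confirmed by this post:

```python
import json
import httpx

def tail_run(gateway_url: str, run_id: str) -> None:
    url = f"{gateway_url}/v1/runs/{run_id}/events"  # assumed route
    with httpx.stream("GET", url, timeout=None) as resp:
        for line in resp.iter_lines():
            if line.startswith("data: "):
                entry = json.loads(line[len("data: "):])
                # Each SSE event carries one appended journal entry.
                print(entry.get("seq"), entry.get("type"), entry.get("step_name"))
```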
For a completed run — one that finished more than a few seconds ago and has been flushed to Parquet — Studio queries the gateway’s runs endpoint, which goes through the DuckDB-backed query crate. The query pulls the run’s Parquet row for summary data and, if the user opens the timeline, reads the serialized journal entries that were archived alongside the run.
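The completed-run path is, in effect, a SQL query over the archive. A sketch of the kind of query the DuckDB-backed crate might run, with an assumed file layout and assumed column names:

```python
import duckdb

con = duckdb.connect()
summary = con.execute(
    """
    SELECT run_id, status, started_at, completed_at, cost_usd
    FROM read_parquet('runs/*.parquet')  -- assumed archive layout
    WHERE run_id = ?
    """,
    ["run_7f3a"],  # hypothetical run ID
).fetchone()
print(summary)
```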
The split is invisible to the user. A run that was executing 30 seconds ago and is now completed serves its summary from Parquet and its timeline from whichever side has the data — usually a mix, with recent entries still in RocksDB and older ones in Parquet.
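One way to picture the stitch: deduplicate by a per-run sequence number and sort. The seq field is an assumption; the post says only that entries can live in either store.

```python
def merged_timeline(parquet_entries: list[dict],
                    rocksdb_entries: list[dict]) -> list[dict]:
    # Assumed: every journal entry carries a per-run monotonic "seq".
    by_seq = {e["seq"]: e for e in parquet_entries}
    by_seq.update({e["seq"]: e for e in rocksdb_entries})  # live store wins
    return [by_seq[s] for s in sorted(by_seq)]
```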
The timeline as the canonical artifact
Every other feature in Studio points back to the timeline. The runs listing shows status and cost, and clicking a row opens the timeline. The cost-attribution dashboard shows spend by project and model, and clicking a cell filters to the runs that produced it, which open their timelines. The eval view shows per-case scores, and clicking a case opens the child run’s timeline alongside the grader’s.
This is deliberate. The timeline is the true record. Everything else — aggregates, dashboards, listings — is a view over it. If the timeline is right, the rest is right by construction. If the timeline is wrong, no dashboard can save you.
Because the timeline is derived directly from the journal, there is no “observability pipeline” to fall behind. The journal entry for step 7 becomes the timeline row for step 7, with no intermediate processing that could sample, drop, or delay it. If the runtime wrote the step, Studio shows it.
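The claim is easy to state as code: the transformation is a field-for-field projection, with nothing in between that could buffer or sample. The field names here are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimelineRow:
    step_name: str
    status: str
    started_at: str
    ended_at: Optional[str]
    input: dict
    output: Optional[dict]

def row_from_entry(entry: dict) -> TimelineRow:
    # One journal entry in, one timeline row out. No queue, no sampler.
    return TimelineRow(
        step_name=entry["step_name"],
        status=entry["status"],
        started_at=entry["started_at"],
        ended_at=entry.get("ended_at"),
        input=entry["input"],
        output=entry.get("output"),
    )
```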
Replay in the view
A run that executed, failed, and resumed shows both executions. The timeline renders an epoch marker where the resume happened — “Execution 2 started after worker crash” — and colors memoized steps distinctly from re-executed ones. Looking at a resumed run, you can see at a glance which steps ran once and which ran twice, and you can inspect the journal entries for both executions if you need to.
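Rendering that amounts to grouping journal entries by execution epoch and flagging memoized steps. A sketch, assuming an integer epoch field starting at 1 and a memoized status value; both are assumptions consistent with the marker and coloring described above.

```python
from itertools import groupby

def render_resumed_run(entries: list[dict]) -> None:
    # Journal entries arrive in append order, so epochs are contiguous.
    for epoch, group in groupby(entries, key=lambda e: e["epoch"]):
        if epoch > 1:
            print(f"-- Execution {epoch} started after worker crash --")
        for e in group:
            tag = "memoized" if e["status"] == "memoized" else "executed"
            print(f"  {e['step_name']}: {tag}")
```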
For an engineer debugging a production issue, this is the difference between “something happened” and “here is exactly what happened, in order.” The timeline does not hide the retry mechanics; it surfaces them.
Inputs and outputs, unredacted
Step inputs and outputs are stored as-is. We do not apply redaction in the runtime. If a step received a customer’s PII in its input, the PII is in the journal entry, and therefore in the timeline. The control plane does let operators configure per-project redaction policies that the SDK applies before writing, so sensitive fields can be masked at source — but that is a choice the application owner makes, not a default.
The reason is honest: we cannot guess what matters. Masking too aggressively makes the timeline useless for debugging. Masking too little puts data in a place users did not expect. Giving the application control, and making the default be “we show what was sent,” keeps the contract clear.
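A redaction policy applied at source might look like the sketch below. The policy shape and the dotted field paths are assumptions; the post says only that operators configure masking in the control plane and the SDK applies it before writing.

```python
import copy

def redact(payload: dict, mask_fields: list[str]) -> dict:
    """Mask configured fields before the entry is written to the journal."""
    masked = copy.deepcopy(payload)
    for path in mask_fields:  # e.g. "customer.email" (assumed dotted paths)
        node = masked
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break
        else:
            if leaf in node:
                node[leaf] = "***"
    return masked
```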
What the view does not try to be
Studio’s run view is not a log aggregator, not a metrics dashboard, and not a trace visualizer. For logs, you use your existing logging stack — the SDK ships structured logs with run IDs that let you join. For metrics, you use whatever you already have pointed at the OTLP collector. For traces that span beyond the runtime, you use your OTel backend, linked from the timeline.
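The join itself is just a shared key. A generic Python sketch of the pattern; the run_id field name is an assumption, and nothing here is specific to AGNT5:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker")

def log_step(run_id: str, step: str, msg: str) -> None:
    # A structured line carrying the run ID lets the log backend
    # filter to one run and join against the Studio timeline.
    log.info(json.dumps({"run_id": run_id, "step": step, "msg": msg}))

log_step("run_7f3a", "validate", "input accepted")  # hypothetical run ID
```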
The run view tries to be the best answer to one question: “what did this workflow do?” Not “what happened in my system as a whole” — that is a different view, and it belongs in tools built for it.
Why this matters
The practical test of an observability story is whether an engineer who has never seen the run before can figure out what went wrong in under a minute. The timeline is designed around that test. It is a linear, copyable, permalinkable artifact that answers the question the same way every time. That consistency is what makes debugging fast.
Every feature in Studio points back to the run. The run points back to the journal. The journal is the runtime. The chain is short, and the data is the data.