Deep Research Agent

Autonomous research agent with web search sourcing and cited report generation

The deep-research template is a four-phase workflow that produces a cited report from a single research question. It plans a set of sub-questions, runs parallel searches for each, reduces the findings into structured notes, and writes a final report. It’s the canonical fan-out/fan-in shape in AGNT5 — the kind of work that looks like a sequence but benefits enormously from parallelism.

What you’ll build

  • A plan step that decomposes a research question into 5–10 targeted sub-questions (schema sketched just after this list)
  • A parallel fan-out that runs web search plus LLM extraction for every sub-question concurrently
  • A reduce step that deduplicates findings, resolves conflicts, and structures notes by theme
  • A write step that turns the notes into a cited markdown report
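
The plan step's structured output can be as small as a typed list. A minimal sketch, assuming the template uses a Pydantic model for structured output; the field names here are illustrative, not necessarily the template's own:

from pydantic import BaseModel, Field

class SubQuestion(BaseModel):
    text: str        # the targeted sub-question itself
    rationale: str   # why answering it helps the main question

class ResearchPlan(BaseModel):
    # The plan prompt asks for 5-10 entries; the schema enforces the range.
    sub_questions: list[SubQuestion] = Field(min_length=5, max_length=10)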

Requirements

  • Python 3.10+
  • OPENAI_API_KEY
  • A web search API key (Tavily, Exa, or Brave — TAVILY_API_KEY by default)
  • The AGNT5 CLI

Install

curl -LsSf https://agnt5.com/cli.sh | bash

Setup

Scaffold the project

agnt5 create deep_research research-bot
cd research-bot

Set environment variables

export OPENAI_API_KEY=sk-...
export TAVILY_API_KEY=tvly-...

Install dependencies

uv sync
# or, if you're not using uv:
pip install -e .

Run a research job

agnt5 dev up
agnt5 invoke research --input '{"question": "How have battery energy densities evolved since 2015?"}'

How it works

The workflow is a straight four-step pipeline, but step two is a fan-out. First, plan prompts the LLM to break the question into sub-questions with structured output. Second, the workflow dispatches a search_and_extract call per sub-question — these run concurrently because AGNT5 schedules independent ctx.step() invocations in parallel. Each call hits the web search API, pulls the top results, and asks the model to extract claims with source URLs. Third, reduce receives all extracted notes and produces a deduplicated, thematically grouped set. Fourth, write turns that into a markdown report.
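
The fan-out unit itself is an ordinary async function. A minimal sketch of what functions/search_and_extract.py could look like, assuming the tavily-python and openai SDKs; the model name, prompt, and return shape are illustrative:

import json
import os

from openai import AsyncOpenAI
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
llm = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def search_and_extract(sub_question: str) -> list[dict]:
    # Tavily results carry a URL and extracted page content.
    results = tavily.search(sub_question, max_results=5)["results"]
    sources = "\n\n".join(f"{r['url']}\n{r['content']}" for r in results)

    # Ask the model for claims tied to source URLs, returned as JSON.
    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use the template's configured model
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Extract factual claims that answer the question, with the "
                'source URL for each. Return JSON: {"claims": '
                '[{"claim": "...", "url": "..."}]}\n\n'
                f"Question: {sub_question}\n\nSources:\n{sources}"
            ),
        }],
    )
    return json.loads(response.choices[0].message.content)["claims"]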

Every step is journaled, which has two payoffs. The fan-out is crash-safe: if the worker dies after seven of ten sub-question searches complete, replay only re-executes the missing three. And the journal gives you a full audit trail — every URL fetched, every claim extracted, every reduction decision. You can replay the same research run with a stricter extraction prompt and diff the final reports to measure the change.

Parallelism comes from the shape of the code, not a decorator. Returning a list of step futures and awaiting them together is how you get fan-out. See research.py for the pattern.
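
A minimal sketch of that shape, assuming ctx.step() returns an awaitable future and using hypothetical function names that mirror the key files below; the real signatures are in the scaffolded research.py:

import asyncio

async def research(ctx, question: str) -> str:
    # Step 1: plan. One journaled LLM call producing sub-questions.
    plan = await ctx.step(plan_subquestions, question)

    # Step 2: fan-out. Create every step future first, then await them
    # together; independent steps scheduled this way run concurrently.
    futures = [ctx.step(search_and_extract, sq) for sq in plan["sub_questions"]]
    notes = await asyncio.gather(*futures)

    # Step 3: reduce. Dedupe and group the extracted claims by theme.
    grouped = await ctx.step(reduce_notes, notes)

    # Step 4: write. Render the grouped notes as a cited markdown report.
    return await ctx.step(write_report, question, grouped)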

Key files

  • worker.py — Registers the workflow and its four functions.
  • research.py — The top-level workflow: plan, fan-out, reduce, write.
  • functions/plan.py — LLM call that produces structured sub-questions.
  • functions/search_and_extract.py — The fan-out unit: web search plus per-result extraction.
  • functions/reduce.py — Deduplicates and groups notes by theme.
  • functions/write.py — Generates the final cited report.

Customize

Swap the search provider. functions/search_and_extract.py is the only place the search API is referenced. Replace Tavily with Exa, Brave, or your own indexer — keep the return shape and the rest of the pipeline works unchanged.
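
For example, an Exa-backed replacement might look like the sketch below. The exa-py calls are written from memory and should be checked against the SDK docs; the two-key return shape (url plus content) is the contract the rest of the pipeline assumes:

import os

from exa_py import Exa  # assumption: the exa-py SDK

exa = Exa(api_key=os.environ["EXA_API_KEY"])

def web_search(query: str, k: int = 5) -> list[dict]:
    # Keep this return shape and nothing downstream needs to change.
    resp = exa.search_and_contents(query, num_results=k, text=True)
    return [{"url": r.url, "content": r.text} for r in resp.results]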

Change the report format. functions/write.py ends with a prompt that produces markdown. Point it at a Jinja template for structured reports, or change the output to JSON for downstream processing.
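
A sketch of the Jinja variant; templates/report.md.j2 and the grouped-notes shape are assumptions, not files the template ships with:

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("report.md.j2")  # hypothetical template file

def render_report(question: str, themes: list[dict]) -> str:
    # themes: [{"title": ..., "claims": [{"claim": ..., "url": ...}]}]
    return template.render(question=question, themes=themes)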

Cap the fan-out. plan.py decides how many sub-questions to generate. Clamp it in code if you need a strict budget on search API calls.
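
One slice after the plan step is enough; MAX_SUBQUESTIONS is a constant you would introduce yourself:

MAX_SUBQUESTIONS = 6  # hard budget on downstream search calls

sub_questions = plan["sub_questions"][:MAX_SUBQUESTIONS]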

Next steps