Coding Agent
Autonomous test-driven development agent with E2B sandbox
The coding-agent template is a test-driven agent loop. Given a task description and a test suite, it writes code, executes it inside an E2B sandbox, reads the failures, and iterates until the tests pass or the step budget is exhausted. It uses Groq for fast inference, E2B for isolated execution, and the AGNT5 runtime to make every iteration durable.
What you’ll build
- An agent workflow that alternates between LLM reasoning and sandboxed code execution
- An E2B-backed `run_tests` tool that executes the candidate code against a pytest suite
- A bounded iteration loop with a step budget and final-answer termination
- A durable journal of every attempt — every model call, every sandbox run, every test output
Requirements
- Python 3.10+
- `GROQ_API_KEY` from console.groq.com
- `E2B_API_KEY` from e2b.dev
- The AGNT5 CLI
Install
```shell
curl -LsSf https://agnt5.com/cli.sh | bash
```
Setup
Scaffold the project
```shell
agnt5 create coding_agent tdd-agent
cd tdd-agent
```
Set environment variables
```shell
export GROQ_API_KEY=gsk_...
export E2B_API_KEY=e2b_...
```
Install dependencies
```shell
uv sync
```
or, without uv:
```shell
pip install -e .
```
Run the agent
```shell
agnt5 dev up
agnt5 invoke coding_agent --input '{
  "task": "Write a function `fizzbuzz(n)` that returns the classic list.",
  "tests": "def test_fizzbuzz():\n  assert fizzbuzz(5) == [1, 2, \"Fizz\", 4, \"Buzz\"]"
}'
```
How it works
Each iteration of the loop follows the same shape. The LLM — served by Groq for sub-second turn latency — receives the task, the current candidate code, and the last test output. It returns either an updated source file or a final-answer signal. If it returns code, run_tests spins up an E2B sandbox, writes the candidate plus the test file, runs pytest, and returns the exit code and captured stdout/stderr. The workflow feeds that output into the next model turn.
Every LLM call and every sandbox execution is a durable step. If the worker dies while pytest is running, replay reconstructs the loop exactly — prior iterations return their journaled outputs, and only the in-flight step re-executes. The step budget is enforced by the workflow, not the model, so there’s a hard ceiling on cost and latency independent of what the agent decides.
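The loop and budget described above can be sketched in plain Python. This is an illustrative skeleton, not the template's actual code: `call_model` and `run_tests` stand in for the helpers in agent.py and tools/e2b.py, and durability (journaling each step) is handled by the AGNT5 runtime rather than shown here.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    exit_code: int
    output: str

def solve(task, tests, call_model, run_tests, max_steps=8):
    """Iterate: model turn -> sandboxed test run -> feed failures back.

    `call_model(task, code, last_output)` returns either ("code", source)
    or ("final", answer); `run_tests(code, tests)` returns a TestResult.
    The step budget is enforced by this loop, not by the model.
    """
    code, last_output = None, None
    for _ in range(max_steps):
        kind, payload = call_model(task, code, last_output)
        if kind == "final":
            return payload                 # model signalled completion
        code = payload
        result = run_tests(code, tests)
        if result.exit_code == 0:          # all tests passed
            return code
        last_output = result.output        # failures feed the next turn
    return None                            # budget exhausted
```

Because the budget lives in the workflow, the worst-case cost is `max_steps` model calls plus `max_steps` sandbox runs, regardless of what the agent decides.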
E2B sandboxes cost money per minute and have startup latency. For fast iteration, the template reuses a single sandbox across iterations when possible; see tools/e2b.py.
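The reuse pattern amounts to lazy creation plus caching. A minimal sketch, assuming a generic `make_sandbox` factory in place of E2B's actual sandbox constructor (the real logic lives in tools/e2b.py):

```python
_sandbox = None

def get_sandbox(make_sandbox):
    """Create the sandbox on first use and return the cached instance after.

    `make_sandbox` is whatever constructs the sandbox (for E2B, the SDK's
    sandbox constructor). Caching amortizes startup latency and per-minute
    cost across iterations; remember to close the sandbox when the loop ends.
    """
    global _sandbox
    if _sandbox is None:
        _sandbox = make_sandbox()
    return _sandbox
```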
Key files
- worker.py — Registers the agent workflow and its tools.
- agent.py — The iteration loop: model turn, tool dispatch, budget check.
- tools/e2b.py — The `run_tests` tool wrapping E2B's Python SDK.
- prompts/system.txt — Instructs the model to return code diffs and call `run_tests` after every change.
- agnt5.toml — Project config, including the step budget for the workflow.
Customize
Swap Groq for another model. Groq is the default for its latency, but the loop works with any chat model that supports tool use. Change the client in agent.py and set the corresponding API key.
Replace E2B with a local runner. For air-gapped environments, swap tools/e2b.py for a Docker-based runner. Keep the function signature stable — (code: str, tests: str) -> TestResult — and the loop is unchanged.
Tighten the step budget. Reduce the max iterations in agent.py to cap cost for simple tasks. The workflow returns the passing candidate if one is found within the budget, or a failure carrying the last test output.
Next steps
- Read /docs/build/agents for the agent loop model
- Compare with code_reviewer for a non-agentic counterpart
- See /docs/build/workflows for retry semantics around sandboxed steps