Code Reviewer

AI-powered code review agent with GitHub and Jira integration

The code-reviewer template is a deterministic review pipeline, not a chat agent. Given a repository and a pull request number, it fetches the diff, runs an LLM review pass over each changed file, and posts grouped comments back to GitHub. Every step is durable, so the review survives transient GitHub or LLM failures and can be replayed with an updated prompt to compare output.

What you’ll build

  • A workflow triggered by a GitHub PR (webhook or manual invoke) that fetches and parses the PR diff
  • A per-file LLM review step that produces structured comments
  • A posting step that groups comments and publishes them via the GitHub API
  • A reviewable journal of every model call, so you can replay a review with a new prompt and diff the results

Requirements

  • Python 3.10+
  • OPENAI_API_KEY or ANTHROPIC_API_KEY
  • GITHUB_TOKEN with repo scope (or a GitHub App installation token)
  • The AGNT5 CLI

Install

curl -LsSf https://agnt5.com/cli.sh | bash

Setup

Scaffold the project

agnt5 create code_reviewer pr-reviewer
cd pr-reviewer

Set environment variables

export OPENAI_API_KEY=sk-...
export GITHUB_TOKEN=ghp_...

Install dependencies

uv sync   # or, without uv: pip install -e .

Run a review

agnt5 dev up
agnt5 invoke review_pull_request --input '{"repo": "owner/name", "pr_number": 42}'

How it works

The workflow has three phases. First, fetch_diff calls GitHub’s REST API to retrieve the PR’s unified diff and parses it into per-file chunks. Second, the workflow iterates over changed files and calls review_file for each — this is the LLM step, prompted to return structured review items (path, line range, severity, comment text). Third, post_comments groups the items into a single PR review and submits them via the GitHub Reviews API.
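
For orientation, the diff fetch can be as small as the sketch below. This is not the template's fetch_diff.py verbatim: it assumes the requests library and the token from GITHUB_TOKEN, but the Accept header and the per-file "diff --git" boundaries are standard GitHub behavior.

import os
import re
import requests

def fetch_diff(repo: str, pr_number: int) -> dict[str, str]:
    """Fetch a PR's unified diff from GitHub and split it per file."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
        headers={
            # This media type makes GitHub return the raw unified diff.
            "Accept": "application/vnd.github.v3.diff",
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        },
        timeout=30,
    )
    resp.raise_for_status()
    files: dict[str, str] = {}
    # Each file's section begins with a "diff --git a/<path> b/<path>" header.
    for chunk in re.split(r"(?m)^(?=diff --git )", resp.text):
        if chunk.startswith("diff --git"):
            path = chunk.split(" b/", 1)[1].splitlines()[0]
            files[path] = chunk
    return files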

Each phase runs through ctx.step(), so the journal captures the diff, every per-file review, and the final post. This is what makes retries safe: if the LLM step times out on file 7 of 12, replay picks up from file 7. Files 1–6 return their journaled reviews instead of re-invoking the model, and the diff fetch and the final post each hit GitHub exactly once across all retries. The workflow is not an agentic loop; the model never decides when to stop. Deterministic flow keeps reviews predictable and makes prompt iteration easy.
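
The orchestration is short enough to sketch. The step convention below (a name, then the function and its arguments) is an assumption for illustration; worker.py has the template's real signatures.

async def review_pull_request(ctx, repo: str, pr_number: int):
    # Journaled: on replay this returns the cached diff, no GitHub call.
    files = await ctx.step("fetch_diff", fetch_diff, repo, pr_number)

    comments = []
    for path, chunk in files.items():
        # One journal entry per file. A timeout on file 7 means replay
        # re-runs files 7..N; files 1..6 come back from the journal.
        items = await ctx.step(f"review:{path}", review_file, path, chunk)
        comments.extend(items)

    # Journaled: the grouped review is posted to GitHub at most once.
    await ctx.step("post_comments", post_comments, repo, pr_number, comments)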

Because every model call is journaled, you can replay a historical review with a new prompt or a new model, diff the output, and promote the change only if it’s actually better. This is the replay-as-evaluation pattern.
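
One way to run that comparison, assuming you have exported each run's comments as JSON (the file names and item fields here are hypothetical):

import json

def diff_reviews(baseline: str, candidate: str) -> None:
    """Print review comments that differ between two journaled runs."""
    def load(path: str) -> set:
        with open(path) as f:
            # Key each comment by file, line, and text so set math applies.
            return {(c["path"], c["line"], c["comment"]) for c in json.load(f)}

    base, cand = load(baseline), load(candidate)
    for item in sorted(cand - base):
        print("+ only in new prompt:", item)
    for item in sorted(base - cand):
        print("- only in old prompt:", item)

diff_reviews("review-old-prompt.json", "review-new-prompt.json")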

Key files

  • worker.py — Registers the workflow and its three constituent functions.
  • functions/fetch_diff.py — GitHub API call and diff parsing.
  • functions/review_file.py — LLM call with the review prompt; returns structured comments.
  • functions/post_comments.py — Groups comments and posts one review to GitHub.
  • prompts/review.txt — The reviewer system prompt. Edit this first to change behavior.

Customize

Change the review style. prompts/review.txt is the lever — tighten it for nit-picky reviews, loosen it for architectural feedback. Because the prompt is in git, every review is traceable to a prompt version.

Skip generated files. Add a filter in fetch_diff that drops paths matching **/*.generated.* or files over a size threshold before they reach review_file.
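
A sketch of that filter, assuming fetch_diff returns a path-to-chunk mapping as above; the glob list and the 40 kB cutoff are illustrative, not template defaults:

from fnmatch import fnmatch

SKIP_GLOBS = ["**/*.generated.*", "*.lock", "vendor/*"]  # illustrative patterns
MAX_DIFF_BYTES = 40_000  # illustrative size cutoff

def filter_files(files: dict[str, str]) -> dict[str, str]:
    """Drop generated or oversized files before they reach review_file."""
    return {
        path: chunk
        for path, chunk in files.items()
        if not any(fnmatch(path, glob) for glob in SKIP_GLOBS)
        and len(chunk.encode()) <= MAX_DIFF_BYTES
    }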

Swap the model. The LLM call lives in review_file.py. Replace the OpenAI client with Anthropic or Groq; structured output parsing stays the same.
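
Swapping in Anthropic is roughly this shape. The model id is illustrative, and the JSON-array contract is an assumption about what prompts/review.txt asks for:

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def review_file(path: str, diff_chunk: str, system_prompt: str) -> list[dict]:
    """Ask the model for review items; downstream parsing is unchanged."""
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": f"File: {path}\n\n{diff_chunk}"}],
    )
    # Assumes the prompt asks for a JSON array of
    # {path, line, severity, comment} objects.
    return json.loads(msg.content[0].text)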

Add a severity gate. Wrap post_comments with a step that only posts high or critical items, turning the template into a lint-grade reviewer.
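
A sketch of that gate, assuming items carry the severity field described under How it works and the same assumed step convention as the orchestration sketch:

POST_SEVERITIES = {"high", "critical"}

def gate(items: list[dict]) -> list[dict]:
    """Keep only items severe enough to post; drop the rest."""
    return [i for i in items if i.get("severity") in POST_SEVERITIES]

async def gated_post(ctx, repo: str, pr_number: int, items: list[dict]):
    kept = gate(items)
    if not kept:
        return  # nothing high or critical; skip the GitHub call entirely
    await ctx.step("post_comments", post_comments, repo, pr_number, kept)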

Next steps