---
title: Improve with AGNT5
description: Use production run data to create datasets, score outputs, compare candidates, and gate changes.
last_verified: 2026-06-07
---

Improve covers the eval loop for deployed AGNT5 components. You use production runs to curate datasets, score outputs, and compare candidate prompts, models, or agent behavior before rolling changes forward.

## What you'll do

- **[Datasets](/docs/improve/datasets.md)**: collect representative inputs from production runs or authored examples.
- **[Scorers](/docs/improve/scorers.md)**: define how AGNT5 decides whether an output meets the target behavior.
- **[Experiments](/docs/improve/experiments.md)**: compare a candidate prompt, model, or agent behavior against a baseline.
- **[Online evals](/docs/improve/online-evals.md)**: score production runs asynchronously and alert when quality drops.
- **[Batch eval](/docs/improve/batch-eval.md)**: run evaluations from code using `client.eval()` or `client.batch_eval()`.
- **[Quality cases](/docs/improve/quality-cases.md)**: track regressions and production issues through a structured lifecycle.
- **[Prompts](/docs/build/prompts.md)**: version prompt changes and compare them before they affect production traffic.

Outcome: you can decide whether a change improves agent behavior without leaving the platform that runs the agent.
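
To make the loop concrete, here is a minimal sketch in plain Python of the pattern the pages above describe: a dataset, a scorer, a baseline-versus-candidate comparison, and a gate. It does not use the AGNT5 SDK. Every name in it (`Example`, `exact_match`, `evaluate`, `gate`, the toy `baseline` and `candidate` functions, and the `min_gain` threshold) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One dataset row: an input and the output we expect for it."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Toy scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(
    system: Callable[[str], str],
    dataset: Sequence[Example],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the system on every example and return the mean score."""
    return mean(scorer(system(ex.input), ex.expected) for ex in dataset)


def gate(baseline_score: float, candidate_score: float, min_gain: float = 0.0) -> bool:
    """Pass only if the candidate beats the baseline by at least min_gain."""
    return candidate_score - baseline_score >= min_gain


# Hypothetical stand-ins for a deployed component and a proposed change.
def baseline(prompt: str) -> str:
    answers = {"2+2": "4", "capital of France": "paris"}
    return answers.get(prompt, "unknown")


def candidate(prompt: str) -> str:
    answers = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}
    return answers.get(prompt, "unknown")


dataset = [
    Example("2+2", "4"),
    Example("capital of France", "Paris"),
    Example("3*3", "9"),
]

baseline_score = evaluate(baseline, dataset, exact_match)
candidate_score = evaluate(candidate, dataset, exact_match)
ship = gate(baseline_score, candidate_score, min_gain=0.05)

print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f} ship={ship}")
```

In practice the dataset would come from production runs, the scorer would encode the target behavior, and the gate would run in CI. The linked pages cover how AGNT5 handles each of those steps.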

## Next steps

- [Datasets](/docs/improve/datasets.md): curate test cases from production runs and publish immutable versions.
- [Scorers](/docs/improve/scorers.md): pick built-in checks or write custom scorer code.
- [Experiments](/docs/improve/experiments.md): run experiments, compare results, and gate CI on eval results.
- [Batch eval](/docs/improve/batch-eval.md): run evaluations directly from the Python or TypeScript SDK.
- [Quality cases](/docs/improve/quality-cases.md): track issues from discovery to verified fix.
- [Run with AGNT5](/docs/run/overview.md): capture the production run data that feeds improvement work.
