---
title: Quality cases
description: Track production failures, eval regressions, and behavior issues through a structured lifecycle from discovery to verified fix.
last_verified: 2026-06-23
---

A **quality case** is a structured issue record that tracks a behavior problem from discovery to resolution. When an [online eval](/docs/improve/online-evals.md) alert fires, an [experiment](/docs/improve/experiments.md) run regresses, or a production run fails in an unexpected way, you open a quality case to investigate the root cause, iterate on a fix, and verify it before shipping.

Quality cases link directly to the runs, scores, datasets, and deployments already in AGNT5, so you don't need a separate issue tracker. In Studio, find them under **Evaluate** -> **Quality cases**.

## Case anatomy

Every case has the following fields:

| Field | Description |
|---|---|
| `title` | Short summary of the problem |
| `description` | Detail about what happened and why it matters |
| `category` | Problem type (see below) |
| `severity` | `low`, `medium`, `high`, or `critical` |
| `status` | Current lifecycle stage (see below) |
| `source_type` | What triggered the case |
| `expected_behavior` | What the component should have done |
| `observed_behavior` | What it actually did |
| `labels` | Free-form tags for filtering |

### Categories

| Category | When to use |
|---|---|
| `behavior_quality` | Output was wrong, incomplete, or unhelpful |
| `eval_regression` | An experiment run score dropped below baseline |
| `production_failure` | A live run raised an error or timed out |
| `deployment_health` | A deployment is unhealthy or unresponsive |
| `runtime_infra` | Platform-level infrastructure issue |
| `support_request` | Issue escalated from a user report |
| `release_risk` | A change introduces risk before a planned release |

## Lifecycle

Cases flow through a fixed set of statuses:

```
open → triaged → investigating → candidate_ready → verified → shipped → closed
```

| Status | Meaning |
|---|---|
| `open` | Newly created; not yet reviewed |
| `triaged` | Reviewed; severity and category confirmed |
| `investigating` | Root cause analysis in progress |
| `candidate_ready` | A fix candidate is ready for eval |
| `verified` | Eval results confirm the fix works |
| `shipped` | Fix deployed to production |
| `closed` | Resolved or won't fix |

Move a case forward by updating its status via the API or Studio.
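
The linear order above can be encoded in a small helper that prints the status following the current one. This is a minimal sketch: `next_status` is a hypothetical function, not part of the AGNT5 CLI, and it assumes cases advance one step at a time. It ignores shortcuts such as closing a case early as won't fix, and the API may not enforce this order.

```bash
# Hypothetical helper: print the status that follows the given one
# in the lifecycle listed above. Not part of the AGNT5 CLI.
next_status() {
  case "$1" in
    open)            echo "triaged" ;;
    triaged)         echo "investigating" ;;
    investigating)   echo "candidate_ready" ;;
    candidate_ready) echo "verified" ;;
    verified)        echo "shipped" ;;
    shipped)         echo "closed" ;;
    closed)          echo "closed is the final status" >&2; return 1 ;;
    *)               echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```

For example, `next_status investigating` prints `candidate_ready`, which you can then send as the `status` value when updating the case.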

## Create a case

### From Studio

Open **Evaluate** -> **Quality cases** -> **New case**, fill in the fields, and optionally link a run, experiment, or alert at creation time.

### From the API

```bash
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Support agent cites wrong order status",
    "description": "Agent reported order as shipped when it was still processing for 3 of 50 eval items.",
    "category": "behavior_quality",
    "severity": "high",
    "source_type": "eval_run_item",
    "experiment_run_id": "<run-id>",
    "expected_behavior": "Agent returns current status from orders API",
    "observed_behavior": "Agent returned stale cached status",
    "labels": ["orders", "caching"]
  }'
```

### From an MCP tool

If you're using Claude with AGNT5, three MCP tools let you create and query cases from inside agent workflows or conversations: `create_quality_case`, `get_quality_case`, and `list_quality_cases`. These accept the same fields as the REST API and return the same case structure.
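
For orientation, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below prints what such a request for `create_quality_case` might look like. The argument names are an assumption based on the statement above that the tools accept the same fields as the REST API; the function name `mcp_create_case_request` and the request `id` are illustrative, and in practice the MCP client builds and sends this message for you.

```bash
# Print a JSON-RPC 2.0 "tools/call" request for the create_quality_case tool.
# Assumption: the tool's arguments mirror the REST fields documented above.
mcp_create_case_request() {
  cat <<'EOF'
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_quality_case",
    "arguments": {
      "title": "Support agent cites wrong order status",
      "category": "behavior_quality",
      "severity": "high",
      "source_type": "manual"
    }
  }
}
EOF
}
```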

### From an experiment run failure

After an experiment run with failures, create a case from the failing items in Studio, or link an existing case to a failing run item through the API:

```bash
# From the experiment run page in Studio:
# Select failing items → Actions → Create quality case

# Or link an existing case to a run item via API:
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases/<case-id>/links" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"link_type": "experiment_run_item", "link_id": "<run-item-id>"}'
```

## Update a case

```bash
# Move to investigating
curl -X PATCH "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases/<case-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "investigating",
    "description": "Root cause: prompt does not refresh order cache. Fixing in PR #42."
  }'
```

## List and filter cases

```bash
# All open high-severity cases
curl "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases?status=open&severity=high" \
  -H "Authorization: Bearer <token>"

# Cases in a specific category
curl "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases?category=eval_regression" \
  -H "Authorization: Bearer <token>"

# Cases with a label
curl "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases?label=orders" \
  -H "Authorization: Bearer <token>"
```

Filters: `status`, `severity`, `category`, `source_type`, `label`. All filters are optional and combinable.
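
Because the filters combine as ordinary query parameters, a small helper can assemble the list URL from whichever filters you need. This is a sketch only: `cases_url` is a hypothetical function, and it assumes filter values are URL-safe (no spaces or `&`), which holds for the statuses, severities, categories, and source types on this page but may not hold for free-form labels.

```bash
# Hypothetical helper: build the list URL from a project ID and any number
# of filter=value pairs. Values are assumed to be URL-safe.
cases_url() {
  project_id="$1"
  shift
  url="https://api.agnt5.com/api/v1/projects/${project_id}/quality/cases"
  sep="?"
  for pair in "$@"; do
    url="${url}${sep}${pair}"
    sep="&"   # every filter after the first is joined with "&"
  done
  echo "$url"
}

# Usage: open, high-severity eval regressions
# curl "$(cases_url "<project-id>" status=open severity=high category=eval_regression)" \
#   -H "Authorization: Bearer <token>"
```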

## Create a regression dataset from a case

Once a fix candidate is ready for eval (`candidate_ready`), build a regression dataset from the failing items so the fix can be verified before shipping:

```bash
# In Studio: quality case detail → Actions → Create regression dataset
# Or via the experiments regression-dataset command linked to the run:
agnt5 experiments runs regression-dataset <run-id> \
  --name "order-status-regression" \
  --start-run --wait
```

Link the resulting dataset and experiment run back to the case. When the regression experiment passes, mark the case `verified`:

```bash
curl -X PATCH "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases/<case-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"status": "verified"}'
```

After the fix is deployed, move the case to `shipped`.

## Add events to the audit trail

Every status change, note, or link creates a case event automatically. Add an explicit note at any lifecycle stage:

```bash
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/quality/cases/<case-id>/events" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "note_added",
    "note": "Confirmed the stale cache issue reproduces on 10% of cold-start runs."
  }'
```

Event types: `note_added`, `investigation_added`, `candidate_linked`, `release_evidence_linked`.
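
If you post events from a script, it can help to check the event type before sending the request. The guard below is a minimal sketch: `is_valid_event_type` is a hypothetical helper, and it assumes the four types listed above are the only ones you can add explicitly.

```bash
# Hypothetical helper: succeed only for the event types listed above.
is_valid_event_type() {
  case "$1" in
    note_added|investigation_added|candidate_linked|release_evidence_linked)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage: stop before calling the events endpoint with an unsupported type.
# is_valid_event_type "$event_type" || { echo "unsupported event type" >&2; exit 1; }
```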

## Source types

Cases can originate from any AGNT5 signal:

| Source type | Typical trigger |
|---|---|
| `eval_run_item` | Failing item in an experiment run |
| `eval_alert` | Online eval alert threshold breach |
| `runtime_run` | A specific production run failure |
| `guardrail_decision` | A guardrail blocked or flagged a run |
| `deployment` | Generic deployment health issue |
| `deployment_failure` | A deployment went unhealthy |
| `worker_health` | A worker is unhealthy or unreachable |
| `runtime_cluster` | Cluster-level runtime infrastructure issue |
| `manual` | Manually created from a report or observation |
| `support_ticket` | Escalated from a user |


**REST endpoints**

| Endpoint | Purpose |
|---|---|
| `POST /api/v1/projects/{id}/quality/cases` | Create a case |
| `GET /api/v1/projects/{id}/quality/cases` | List cases (filters: `status`, `severity`, `category`, `source_type`, `label`) |
| `GET /api/v1/projects/{id}/quality/cases/{caseId}` | Get a case |
| `PATCH /api/v1/projects/{id}/quality/cases/{caseId}` | Update a case |
| `POST /api/v1/projects/{id}/quality/cases/{caseId}/events` | Add an event |
| `POST /api/v1/projects/{id}/quality/cases/{caseId}/links` | Add a link |

* **MCP tools**: `create_quality_case`, `get_quality_case`, `list_quality_cases`.
* **Statuses**: `open` → `triaged` → `investigating` → `candidate_ready` → `verified` → `shipped` → `closed`.
* **Categories**: `behavior_quality`, `eval_regression`, `production_failure`, `deployment_health`, `runtime_infra`, `support_request`, `release_risk`.
* **Severities**: `low`, `medium`, `high`, `critical`.
* **Source types**: `eval_run_item`, `eval_alert`, `runtime_run`, `guardrail_decision`, `deployment`, `deployment_failure`, `worker_health`, `runtime_cluster`, `manual`, `support_ticket`.


## Next steps

* [Experiments](/docs/improve/experiments.md): run a fix candidate against a regression dataset and gate on the result.
* [Online evals](/docs/improve/online-evals.md): set up alerts that automatically surface cases when production quality drops.
* [Datasets](/docs/improve/datasets.md): build the regression dataset that proves a fix works before shipping.
* [Deploying](/docs/run/deploying.md): ship the verified fix and close the case.
