> For the complete documentation index, see [llms.txt](/llms.txt).
> A full single-fetch corpus is available at [llms-full.txt](/llms-full.txt).
---
title: Prompt caching
description: Enable provider-native prompt caching from one SDK surface across Anthropic, OpenAI, and Gemini.
last_verified: 2026-06-30
---

**Prompt caching** lets a model provider reuse the stable leading portion of a request, such as instructions, tool definitions, documents, and earlier conversation turns. AGNT5 exposes one `cache` option and translates it to the provider-specific request shape under the hood. Use it when repeated calls share a long prefix and you want lower latency, lower input-token spend, and cache-hit metrics in one place.

AGNT5 reports cache hits through `response.usage.cached_tokens` across Anthropic, OpenAI, and Gemini. Providers that expose cache writes also populate `response.usage.cache_creation_tokens`.

## Enable caching

Pass `cache=True` when you want AGNT5 to turn on provider-native prompt caching wherever the provider needs an explicit signal. Pass a `PromptCache` policy when you need TTL, key, retention, or a reusable Gemini cache resource.

### Python

```python
from agnt5 import Agent, lm

agent = Agent(
    name="support_agent",
    model="anthropic/claude-sonnet-4-6",
    instructions=LONG_STABLE_INSTRUCTIONS,
    tools=[lookup_order, create_refund],
    cache=lm.PromptCache(ttl="1h"),
)

result = await agent.run("Customer order 1042 arrived damaged.")
```

Use the same `cache` option for direct model calls:

```python
from agnt5 import lm

response = await lm.generate(
    model="openai/gpt-4o-mini",
    system_prompt=LONG_STABLE_INSTRUCTIONS,
    prompt="Summarize the customer escalation.",
    cache=lm.PromptCache(
        key="support-agent",
        retention="24h",
    ),
)

print(response.usage.cached_tokens)
```

### TypeScript

```ts
const model = LM.anthropic();

const agent = new Agent({
  name: "support_agent",
  model,
  modelName: "anthropic/claude-sonnet-4-6",
  instructions: LONG_STABLE_INSTRUCTIONS,
  tools: [lookupOrder, createRefund],
  cache: { ttl: "1h" },
});

const result = await agent.run("Customer order 1042 arrived damaged.");
```

Direct model calls use `config.cache`:

```ts
const openai = LM.openai();

const response = await openai.generate({
  model: "openai/gpt-4o-mini",
  systemPrompt: LONG_STABLE_INSTRUCTIONS,
  messages: [{ role: "user", content: "Summarize the customer escalation." }],
  config: {
    cache: {
      key: "support-agent",
      retention: "24h",
    },
  },
});

console.log(response.usage?.cachedTokens);
```

### Go

```go
model := agnt5.NewAnthropicModel(agnt5.AnthropicConfig{
    APIKey: os.Getenv("ANTHROPIC_API_KEY"),
    Model:  "claude-sonnet-4-6",
})

agent, err := agnt5.NewAgent(
    "support_agent",
    agnt5.WithAgentModel(model),
    agnt5.WithAgentInstructions(LONG_STABLE_INSTRUCTIONS),
    agnt5.WithAgentTools(lookupOrder, createRefund),
    agnt5.WithAgentPromptCache(agnt5.PromptCacheWithTTL("1h")),
)
if err != nil {
    return err
}

result, err := agent.Run(ctx, agnt5.AgentInput{
    Message: "Customer order 1042 arrived damaged.",
})
```

Direct model calls use `GenerateRequest.Cache`:

```go
response, err := model.Generate(ctx, agnt5.GenerateRequest{
    Messages: []agnt5.Message{
        {Role: agnt5.MessageRoleSystem, Content: LONG_STABLE_INSTRUCTIONS},
        {Role: agnt5.MessageRoleUser, Content: "Summarize the customer escalation."},
    },
    Cache: agnt5.PromptCacheWithTTL("1h"),
})
if err != nil {
    return err
}

fmt.Println(response.Usage.CachedTokens)
```

## Provider behavior

Use the same `cache` option in application code. AGNT5 adapts the request per provider.

| Provider | AGNT5 input | Provider translation | Metrics |
|---|---|---|---|
| Anthropic | `cache=True` or `ttl` | Sends ephemeral `cache_control`; includes TTL when set | `cached_tokens`, `cache_creation_tokens` |
| OpenAI | Stable prefix; optional `key` and `retention` | Uses automatic prompt caching; Responses-backed SDKs map hints to `prompt_cache_key` and `prompt_cache_retention` | `cached_tokens` |
| Gemini implicit | `cache=True` | No request field; Gemini detects repeated prefixes | `cached_tokens` |
| Gemini explicit | `ContextCache` or `resource` | Sends `cachedContent` | `cached_tokens` |

AGNT5 ignores hints a provider does not support. That keeps provider switches from leaking into agent code: a TTL matters for Anthropic, key and retention matter for OpenAI Responses, and resource names matter for Gemini explicit caches.

## Reuse a Gemini context cache

Gemini supports explicit context caches for large, stable content. Create the cache once, pass the returned object through `cache`, and delete it when the workload is finished.

### Python

```python
from agnt5 import lm

cache = await lm.create_cache(
    model="google/gemini-2.5-flash",
    system_prompt="You are a contract analyst.",
    contents=contract_text,
    ttl_seconds=3600,
)

response = await lm.generate(
    model="google/gemini-2.5-flash",
    prompt="List the termination clauses.",
    cache=cache,
)

await cache.delete()
```

### TypeScript

```ts
const gemini = LM.google();

const cache = await gemini.createCache({
  model: "google/gemini-2.5-flash",
  systemPrompt: "You are a contract analyst.",
  contents: contractText,
  ttlSeconds: 3600,
});

const response = await gemini.generate({
  model: "google/gemini-2.5-flash",
  messages: [{ role: "user", content: "List the termination clauses." }],
  config: { cache },
});

await gemini.deleteCache(cache);
```

### Go

```go
gemini := agnt5.NewGoogleModel(agnt5.GoogleConfig{
    APIKey: os.Getenv("GOOGLE_API_KEY"),
    Model:  "google/gemini-2.5-flash",
})

cacheName, err := gemini.CreateCachedContent(
    ctx,
    "google/gemini-2.5-flash",
    "You are a contract analyst.",
    []string{contractText},
    3600,
)
if err != nil {
    return err
}
defer gemini.DeleteCachedContent(ctx, cacheName)

response, err := gemini.Generate(ctx, agnt5.GenerateRequest{
    Messages: []agnt5.Message{
        {Role: agnt5.MessageRoleUser, Content: "List the termination clauses."},
    },
    Cache: agnt5.PromptCacheResource(cacheName),
})
```

## Read cache metrics

Cache metrics are normalized onto the same usage fields in each SDK.

| Field | Meaning |
|---|---|
| `cached_tokens` / `cachedTokens` / `CachedTokens` | Input tokens served from a provider prompt cache. |
| `cache_creation_tokens` / `cacheCreationTokens` / `CacheCreationTokens` | Input tokens written into a cache. Anthropic reports this today. |
| `prompt_tokens` / `promptTokens` / `InputTokens` | All input tokens, including uncached tokens and cached-token reads. |

```python
response = await lm.generate(
    model="anthropic/claude-sonnet-4-6",
    system_prompt=LONG_STABLE_INSTRUCTIONS,
    prompt="Answer the next support question.",
    cache=True,
)

usage = response.usage
print(f"Cache hits: {usage.cached_tokens}")
print(f"Cache writes: {usage.cache_creation_tokens}")
```

The first matching call usually records a cache write or a cache miss. Later calls with the same stable prefix should report cache hits.

## Design for hits

Prompt caching only works when the provider sees the same leading tokens.

* **Keep instructions stable**: Put request-specific facts in the user message, not in `Agent(instructions=...)` or `system_prompt`.
* **Keep tool schemas stable**: Adding, removing, or reordering tools changes the cacheable prefix.
* **Keep serialization deterministic**: Sort JSON keys before embedding generated JSON in instructions.
* **Keep model IDs stable**: Changing the model string creates a different provider cache boundary.

<Callout type="warning">
Avoid timestamps, UUIDs, random seeds, and per-user template branches in the stable prefix. Put those values after the cached prefix so the reusable portion remains byte-identical.
</Callout>

## Next steps

* [Agents](/docs/build/agents.md): configure the agent loop that carries stable instructions and tools.
* [Prompts](/docs/build/prompts.md): version prompt text so cacheable prefixes stay reproducible across deployments.
* [AI providers](/docs/integrations/ai-providers.md): configure provider credentials and model routing.
