This post is a technical collection of patterns for AI agents using Anthropic Claude and FastAPI, as we use them in customer setups. Not a tutorial — more a quick reference for the decisions that make the difference between "works in the demo" and "works in production."

1. Streaming instead of blocking responses for user-facing apps

Most tutorials show messages.create() returning the full response in one piece. For user-facing apps that is a poor fit: the UI sits idle for 3–15 seconds before anything appears. Stream the response with server-sent events instead.

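import json

from anthropic import AsyncAnthropic
from fastapi import APIRouter, Body
from fastapi.responses import StreamingResponse

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
router = APIRouter()


@router.post("/agent/chat")
async def chat(prompt: str = Body(..., embed=True)):
    # embed=True reads the prompt from a JSON body: {"prompt": "..."}
    async def event_stream():
        async with client.messages.stream(
            model="claude-opus-4-6",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=4096,
        ) as stream:
            async for chunk in stream.text_stream:
                # JSON-encode each chunk: a raw newline inside the text
                # would otherwise break the SSE framing.
                yield f"data: {json.dumps(chunk)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")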

2. Use prompt caching

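When a large system prompt stays identical across requests, mark it with cache_control. By default the cached prefix lives for 5 minutes, and every cache hit refreshes that window. Cache reads are billed at 10% of the normal input price, while the initial cache write costs about 25% more than uncached input. In practice: we saw a 73% cost reduction on a customer support bot that sends the same documentation as context in every request.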
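A minimal sketch of the request side, assuming the documentation is passed as a system block. The helper build_cached_request and the DOCS constant are hypothetical names for illustration; the part that matters is the cache_control marker on the last block of the stable prefix.

```python
from typing import Any

# Hypothetical stand-in for the large, unchanging documentation context.
DOCS = "Product documentation ... (several thousand tokens in practice)"


def build_cached_request(question: str, model: str = "claude-opus-4-6") -> dict[str, Any]:
    """Build kwargs for client.messages.create() with a cacheable system prompt."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": DOCS,
                # Everything up to and including this block is the cached prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only the part after the cached prefix changes per request.
        "messages": [{"role": "user", "content": question}],
    }
```

It would be used as await client.messages.create(**build_cached_request("How do I reset my password?")). To confirm the cache is actually hit, log response.usage.cache_read_input_tokens; prompts shorter than the model's minimum cacheable length are not cached at all.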

3. Tool-use with Pydantic validation

Claude's tool use is powerful, but the arguments it returns can be invalid. Our defense: use a Pydantic model both to generate the tool's schema and to validate the arguments Claude sends. On a validation error, send the error back to Claude as the tool result; it often corrects the arguments on the next turn.
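A sketch of that pattern, assuming Pydantic v2. The tool get_order_status, its fields, and the handle_tool_use helper are hypothetical; the returned dict follows the tool_result content block that goes into the next user message.

```python
from pydantic import BaseModel, Field, ValidationError


class GetOrderStatus(BaseModel):
    """Hypothetical tool: look up the status of a customer order."""

    order_id: str = Field(min_length=1, description="Order number")
    include_history: bool = False


# The same model generates the schema that is sent to Claude ...
TOOL = {
    "name": "get_order_status",
    "description": GetOrderStatus.__doc__,
    "input_schema": GetOrderStatus.model_json_schema(),
}


def handle_tool_use(tool_use_id: str, raw_input: dict) -> dict:
    """... and validates the arguments that come back."""
    try:
        args = GetOrderStatus.model_validate(raw_input)
    except ValidationError as exc:
        # Play the error back so Claude can correct its arguments.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Invalid arguments: {exc}",
            "is_error": True,
        }
    # Placeholder for the real lookup.
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Order {args.order_id}: shipped",
    }
```

Pass tools=[TOOL] in the request, then call handle_tool_use(block.id, block.input) for each tool_use block in the response.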

4. Respect rate limits

Anthropic has per-minute and per-day limits. In production agent loops, you hit them fast. We use aiolimiter to throttle outgoing requests, plus tenacity retries with exponential backoff for 429 (rate limit) and 529 (overloaded) responses.
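To show the retry policy itself, here is a dependency-free stand-in, not the aiolimiter/tenacity setup described above. An asyncio.Semaphore caps concurrency (cruder than aiolimiter's leaky bucket), and ApiStatusError is a hypothetical substitute for the SDK's anthropic.APIStatusError, which also carries a status_code.

```python
import asyncio
import random

RETRYABLE_STATUS = {429, 529}  # rate limited, overloaded


class ApiStatusError(Exception):
    """Hypothetical stand-in for the SDK's status error."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


async def call_with_backoff(call, *, limiter: asyncio.Semaphore,
                            max_attempts: int = 5, base_delay: float = 1.0):
    """Run `call()` under a concurrency cap, retrying 429/529 with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            async with limiter:
                return await call()
        except ApiStatusError as exc:
            # Give up on non-retryable errors and on the final attempt.
            if exc.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        # Exponential backoff with full jitter: up to base_delay * 2^(attempt-1) s.
        await asyncio.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

The jitter matters in agent setups: without it, parallel workers that were rejected together retry together and hit the limit again.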

5. Structured output without tool-use

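Sometimes you need strict JSON output without tool use. A Claude trick: prefill the assistant turn with {. Claude continues from that character, which steers it strongly toward emitting a JSON object. The response contains only the continuation, so prepend the { again before parsing. This works about 99% of the time in our experience; validate with Pydantic and fall back for the rest.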
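A small sketch of both halves of the prefill trick, assuming the completion text has already been extracted from the response. build_messages and parse_json_reply are hypothetical helper names.

```python
import json
from typing import Optional

PREFILL = "{"


def build_messages(prompt: str) -> list:
    """End the conversation with a partial assistant turn that Claude continues."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": PREFILL},
    ]


def parse_json_reply(completion: str) -> Optional[dict]:
    """Re-attach the prefill and parse; return None if the result is not valid JSON."""
    try:
        parsed = json.loads(PREFILL + completion)
    except json.JSONDecodeError:
        return None  # caller falls back: retry, or repair and re-validate
    return parsed if isinstance(parsed, dict) else None
```

A None result is the signal for the fallback path; a successfully parsed dict can then go through a Pydantic model for field-level validation.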

6. Agent loops: set max iterations

An agent that decides for itself when it's done can get stuck in loops. Always set a hard upper bound on iterations. Log the iteration count and stop_reason on every step; that makes it much easier to debug why an agent looped.
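The bounded loop can be sketched like this. The step callable is a hypothetical abstraction for one model call plus any tool execution; it returns the stop_reason and the text of that turn.

```python
import logging
from typing import Callable

log = logging.getLogger("agent")
MAX_ITERATIONS = 10  # hard upper bound; tune per use case


def run_agent(step: Callable[[list], tuple], messages: list,
              max_iterations: int = MAX_ITERATIONS) -> str:
    """Call `step` until the model stops asking for tools or the bound is hit."""
    for iteration in range(1, max_iterations + 1):
        stop_reason, text = step(messages)
        log.info("agent step iteration=%d stop_reason=%s", iteration, stop_reason)
        if stop_reason != "tool_use":
            return text  # e.g. "end_turn" or "max_tokens": the loop is over
        messages.append({"role": "assistant", "content": text})
    # Fail loudly instead of silently returning a half-finished answer.
    raise RuntimeError(f"agent exceeded {max_iterations} iterations")
```

Raising at the bound is a design choice; returning a partial result with a flag works too, as long as the caller cannot mistake it for a completed run.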

7. Observability: structlog + Anthropic request IDs

Every Claude response comes with a request ID (sent in the request-id response header; the Python SDK exposes it on the response object as _request_id). Always log it. When chasing bugs, we have used it to trace which prompt led to which response, and Anthropic support can look up request IDs weeks later.
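One way to do this is a small helper that copies the interesting fields into a plain dict. The name response_log_fields is hypothetical, and the attribute names are assumptions about the SDK's response object, read defensively with getattr.

```python
from typing import Any


def response_log_fields(response: Any) -> dict:
    """Pull plain, serializable fields off a response object for logging."""
    usage = getattr(response, "usage", None)
    return {
        # Assumption: the SDK exposes the request-id header as `_request_id`.
        "request_id": getattr(response, "_request_id", None),
        "model": getattr(response, "model", None),
        "stop_reason": getattr(response, "stop_reason", None),
        "input_tokens": getattr(usage, "input_tokens", None),
        "output_tokens": getattr(usage, "output_tokens", None),
    }
```

With structlog this becomes structlog.get_logger().info("claude_response", **response_log_fields(response)). Because the result is a plain dict, it is also safe to hand to another process or a task queue.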

What we don't recommend

LangChain/LlamaIndex for Anthropic tools: too many abstraction layers, and debugging becomes hell. Use the Anthropic SDK directly.
Hand-rolled message serialization: the SDK already handles this correctly, including Pydantic models.
Passing response objects across process boundaries: they are not picklable. Extract the fields you need instead.

We have a minimal Claude + FastAPI template at github.com/TechLogia-de with streaming, caching, tool use, and structured logging. For questions about agent setups or code reviews: kontakt@techlogia.de.

Claude + FastAPI: 7 agent patterns from production