This post is a technical collection of patterns for AI agents using Anthropic Claude and FastAPI, as we use them in customer setups. Not a tutorial — more a quick reference for the decisions that make the difference between "works in the demo" and "works in production."
1. Streaming instead of batch for user-facing apps
Most tutorials show messages.create() returning the full response in one piece. For
user-facing apps that is a poor fit: the UI freezes for 3–15 seconds before the answer arrives.
Stream the response over server-sent events instead.
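import json

from anthropic import AsyncAnthropic
from fastapi import APIRouter
from fastapi.responses import StreamingResponse

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
router = APIRouter()


@router.post("/agent/chat")
async def chat(prompt: str):
    async def event_stream():
        # Stream tokens as they arrive instead of waiting for the full message.
        async with client.messages.stream(
            model="claude-opus-4-6",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=4096,
        ) as stream:
            async for chunk in stream.text_stream:
                # JSON-encode each chunk so embedded newlines cannot break
                # SSE framing; the client decodes it with JSON.parse.
                yield f"data: {json.dumps(chunk)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")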
2. Use prompt caching
When a large system prompt stays identical across requests, mark it with
cache_control. By default the cached prefix lives for 5 minutes, and the timer resets on
every cache hit. Cache reads cost 10% of the normal input price; the initial cache write
costs 25% more than normal input. Real-world: 73% cost reduction on a customer support
bot that includes the same documentation as context in every request.
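A minimal sketch of what such a request could look like. The helper name build_cached_request, the instruction text, and the max_tokens value are illustrative assumptions, not taken from our customer setup. The point is the shape: the large, unchanging documentation goes into its own system block with cache_control, and everything that varies per request comes after it.

```python
def build_cached_request(docs: str, question: str, model: str = "claude-opus-4-6") -> dict:
    """Build keyword arguments for client.messages.create() with a cacheable prefix.

    Hypothetical helper: the cached prefix must be byte-identical across
    requests, so only the user question varies.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": "You are a support assistant. Answer only from the documentation.",
            },
            {
                "type": "text",
                "text": docs,
                # Marks the end of the prefix that should be cached.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

Usage would be along the lines of message = await client.messages.create(**build_cached_request(DOCS, question)). To check that caching actually kicks in, compare message.usage.cache_creation_input_tokens and message.usage.cache_read_input_tokens between the first and second request. Prompts below a model-specific minimum length are not cached at all.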
3. Tool-use with Pydantic validation
Claude's tool use is powerful, but it can return invalid arguments. Defense: use Pydantic models both to generate the tool schema and to validate the arguments Claude sends back. On a validation error, return the error to Claude as the tool result; the retry often succeeds on the next turn.
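The pattern can be sketched as follows, assuming Pydantic v2. The lookup_order tool, its fields, and the placeholder result are hypothetical examples; only the flow matters: one model yields the schema Claude sees and also validates what Claude sends, and failures go back as a tool_result with is_error set.

```python
from pydantic import BaseModel, Field, ValidationError


class LookupOrder(BaseModel):
    """Arguments for a hypothetical lookup_order tool."""

    order_id: str = Field(pattern=r"^ORD-\d{6}$", description="Order ID such as ORD-123456")
    include_items: bool = False


# Tool definition passed to the API; the schema is generated from the model.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": "Look up a customer order by its ID.",
    "input_schema": LookupOrder.model_json_schema(),
}


def handle_tool_use(tool_use_id: str, raw_input: dict) -> dict:
    """Validate Claude's arguments and return a tool_result content block."""
    try:
        args = LookupOrder.model_validate(raw_input)
    except ValidationError as exc:
        # Summarize each error as "field: message" so Claude can correct itself.
        details = "; ".join(
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        )
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": f"Invalid arguments: {details}",
        }
    # Placeholder for the real lookup.
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Order {args.order_id}: shipped",
    }
```

The returned block goes into the next user message, so Claude sees exactly which field failed and why.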
4. Respect rate limits
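Anthropic enforces request and token limits per minute (the exact numbers depend on your
usage tier), and production agent loops hit them fast. We use aiolimiter to throttle outgoing
calls, plus tenacity retries with exponential backoff for 529 overload responses. Rate-limit
rejections arrive as 429 and deserve the same treatment.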
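To show the mechanics without extra dependencies, here is a standard-library stand-in for the throttle-plus-backoff combination. It is not the aiolimiter/tenacity code itself: SimpleRateLimiter and call_with_backoff are hypothetical names, and the sketch assumes the raised error exposes an HTTP status_code attribute.

```python
import asyncio
import random
import time

RETRYABLE_STATUS = {429, 529}  # 429 = rate limited, 529 = overloaded


class SimpleRateLimiter:
    """Spaces out call starts so roughly `rate` calls begin per `period` seconds."""

    def __init__(self, rate: int, period: float = 60.0):
        self._interval = period / rate
        self._next_slot = 0.0

    async def wait(self) -> None:
        # Reserve the next free slot synchronously, then sleep until it arrives.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def call_with_backoff(make_call, limiter, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Run `make_call()` under the limiter, retrying 429/529 with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        await limiter.wait()
        try:
            return await make_call()
        except Exception as exc:
            # Assumption: API errors carry the HTTP status as `status_code`.
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Exponential ceiling (1s, 2s, 4s, ...) with full jitter.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, ceiling))
```

A call site would wrap the request in a zero-argument coroutine function, for example call_with_backoff(lambda: client.messages.create(...), limiter). The Anthropic SDK also retries some failures on its own (configurable through max_retries), so an outer retry like this mainly matters when you want longer backoff windows.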
5. Structured output without tool-use
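Sometimes you need strict JSON output without tool use. Claude trick: prefill the assistant
turn with { so that Claude continues the JSON object instead of opening with prose. This is
strong steering, not a guaranteed JSON mode. In our experience it works 99% of the time; fall
back to Pydantic validation for the rest.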
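A small sketch of the two halves of the trick, using hypothetical helper names. The detail that is easy to miss: the API returns only the continuation, so the prefilled { has to be re-attached before parsing. The validate parameter is an assumed hook where a Pydantic model's model_validate can be plugged in.

```python
import json
from typing import Any, Callable, Optional

PREFILL = "{"


def build_prefilled_messages(prompt: str) -> list:
    """Append an assistant turn that already opens the JSON object."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": PREFILL},
    ]


def parse_prefilled_json(
    completion_text: str,
    validate: Optional[Callable[[dict], Any]] = None,
) -> Any:
    """Re-attach the prefill and parse; return None so the caller can retry."""
    raw = PREFILL + completion_text
    try:
        # raw_decode parses the first JSON value and ignores trailing prose.
        data, _end = json.JSONDecoder().raw_decode(raw)
        return validate(data) if validate else data
    except ValueError:
        # Covers json.JSONDecodeError and Pydantic v2's ValidationError.
        return None
```

With a Pydantic model this becomes parse_prefilled_json(text, validate=MyModel.model_validate); a None result is the signal to retry or fall back. Prefill is not available in every configuration (for instance, it cannot be combined with extended thinking), so check that the model you call accepts it.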
6. Agent loops: set max iterations
An agent that decides for itself when it's done can get stuck in loops. Always set a
hard upper bound on iterations. Log the iteration count and stop_reason on every step; that
makes it much easier to debug why an agent looped.
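A bounded loop could look like the sketch below. run_agent, AgentLoopLimitExceeded, and the default of 8 iterations are illustrative assumptions; create_message and run_tool are injected coroutines (in production, create_message would wrap client.messages.create), which also keeps the loop testable without network access.

```python
import asyncio
import logging

logger = logging.getLogger("agent")
MAX_ITERATIONS = 8  # assumed default; tune per use case


class AgentLoopLimitExceeded(RuntimeError):
    """Raised when the agent has not finished within the iteration budget."""


async def run_agent(create_message, run_tool, messages, max_iterations=MAX_ITERATIONS):
    """Drive a tool-use loop with a hard upper bound on iterations."""
    messages = list(messages)  # do not mutate the caller's history
    for iteration in range(1, max_iterations + 1):
        response = await create_message(messages)
        logger.info("agent_step iteration=%d stop_reason=%s", iteration, response.stop_reason)
        if response.stop_reason != "tool_use":
            return response  # end_turn, max_tokens, ...: the agent is done
        # Echo the assistant turn, then answer every tool_use block it contains.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": await run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise AgentLoopLimitExceeded(f"no final answer after {max_iterations} iterations")
```

Raising a dedicated exception, rather than silently returning the last response, makes runaway loops show up in error tracking instead of looking like normal answers.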
7. Observability: structlog + Anthropic request IDs
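Every Claude API response comes with a request ID that Anthropic indexes in
their dashboard. It arrives in the request-id response header, and the Python SDK exposes it
as the _request_id property on response objects. Always log it. When chasing bugs, it has let
us trace which prompt led to which response, and Anthropic support can look up request IDs
weeks later.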
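One way to make this a habit is a tiny helper that pulls the loggable fields off a response. The function names and the chosen field set are assumptions for illustration; the logger only needs an info(event, **fields) method, which is the call shape structlog loggers provide.

```python
def response_log_fields(message) -> dict:
    """Extract plain, serializable fields from an SDK response object."""
    usage = getattr(message, "usage", None)
    return {
        # The Python SDK exposes the request-id header as `_request_id`.
        "request_id": getattr(message, "_request_id", None),
        "model": getattr(message, "model", None),
        "stop_reason": getattr(message, "stop_reason", None),
        "input_tokens": getattr(usage, "input_tokens", None),
        "output_tokens": getattr(usage, "output_tokens", None),
    }


def log_response(log, message, **context) -> None:
    """Emit one structured event per Claude response, plus caller context."""
    log.info("claude_response", **response_log_fields(message), **context)
```

With structlog this would be called as log_response(structlog.get_logger(), message, tenant="acme"), where tenant is just an example context key. Because the helper returns a plain dict, the same fields can be handed to another process without passing the response object itself.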
What we don't recommend
- LangChain/LlamaIndex for Anthropic tools: too many abstraction layers, debugging becomes hell. Use the Anthropic SDK directly.
- Custom message serialization: the SDK handles this correctly, including Pydantic models.
- Passing response objects across process boundaries: not picklable. Extract the fields you need instead.
We have a minimal Claude + FastAPI template on github.com/TechLogia-de with streaming, caching, tool-use, and structured logging. Questions on agent setups or code reviews: kontakt@techlogia.de.

