This post is a technical collection of patterns for AI agents built with Anthropic Claude and FastAPI, as we use them in customer setups. It is not a tutorial. It is a quick reference for the decisions that separate "works in the demo" from "works in production."

1. Streaming instead of batch for user-facing apps

Most tutorials show messages.create() returning the complete response at once. For user-facing apps that's a poor experience: the UI sits frozen for 3 to 15 seconds before the answer appears. Stream the text to the client with server-sent events (SSE) instead:

```python
import json

from anthropic import AsyncAnthropic
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter()
client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


class ChatRequest(BaseModel):
    prompt: str  # sent as a JSON body, not as a query parameter


@router.post("/agent/chat")
async def chat(req: ChatRequest):
    async def event_stream():
        async with client.messages.stream(
            model="claude-opus-4-6",
            max_tokens=4096,
            messages=[{"role": "user", "content": req.prompt}],
        ) as stream:
            async for text in stream.text_stream:
                # JSON-encode each delta so newlines in the text can't break SSE framing
                yield f"data: {json.dumps(text)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

On the client, JSON.parse each data field to recover the original text delta.

2. Use prompt caching

When a large system prompt stays identical across requests, mark it with cache_control. Claude keeps the cached prefix for 5 minutes by default (the timer resets on every cache hit), and cache hits are billed at 10% of the normal input-token price. Writing to the cache costs 25% more than regular input, so caching pays off as soon as the prefix is reused. In practice, we saw a 73% cost reduction on a customer support bot that includes the same documentation as context in every request.

3. Tool use with Pydantic validation

Claude's tool use is powerful, but it can return invalid arguments. Our defense: Pydantic models serve as both the tool-schema generator and the validator. On a validation error, send the error back to Claude as the tool result; a retry often succeeds on the next turn.

4. Respect rate limits

Anthropic enforces per-minute and per-day limits, and agent loops in production hit them fast. We use aiolimiter for client-side throttling plus tenacity for retries with exponential backoff on 529 (overloaded) responses.

5. Structured output without tool use

Sometimes you need strict JSON output without tool use. The trick with Claude: prefill the assistant turn with {, which steers Claude to continue with a JSON object. In our experience this works about 99% of the time; validate with Pydantic and fall back for the rest. Note that the response does not repeat the prefilled {, so prepend it before parsing.
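For point 2 (prompt caching), here is a minimal sketch of a request with a cached system prompt. It assumes the documentation lives in a hypothetical SUPPORT_DOCS string and that the request is built as keyword arguments for client.messages.create(). Everything up to and including the block carrying cache_control becomes the cached prefix; content below a model-specific minimum token count is not cached at all.

```python
from typing import Any

MODEL = "claude-opus-4-6"

# Hypothetical placeholder for the large, unchanging documentation block.
SUPPORT_DOCS = "(product documentation goes here)"


def build_request(docs: str, question: str) -> dict[str, Any]:
    """Build kwargs for client.messages.create() with a cached system prompt."""
    return {
        "model": MODEL,
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are a customer support assistant."},
            {
                "type": "text",
                "text": docs,
                # Cache breakpoint: the prefix up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Only the per-request question changes between calls.
        "messages": [{"role": "user", "content": question}],
    }


if __name__ == "__main__":
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        **build_request(SUPPORT_DOCS, "How do I reset my password?")
    )
    usage = response.usage
    # The first call writes the cache; repeat calls within the TTL should read from it.
    print(usage.cache_creation_input_tokens, usage.cache_read_input_tokens)
```

Watching cache_creation_input_tokens versus cache_read_input_tokens in your logs is the quickest way to confirm the cache is actually being hit.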
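For point 3 (tool use with Pydantic validation), a sketch under assumptions: the lookup_order tool, its LookupOrderArgs model, and run_lookup_order are hypothetical. Pydantic's model_json_schema() supplies the tool's input_schema, and model_validate() checks whatever arguments Claude returns. On failure, the error text goes back as a tool_result with is_error set, so Claude can correct itself on the next turn.

```python
from pydantic import BaseModel, Field, ValidationError


class LookupOrderArgs(BaseModel):
    """Hypothetical tool: look up an order by ID."""

    order_id: str = Field(pattern=r"^ORD-\d{6}$", description="Order ID like ORD-123456")
    include_items: bool = False


# Tool definition for the Messages API; Pydantic generates the JSON schema.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": "Look up a customer order by its ID.",
    "input_schema": LookupOrderArgs.model_json_schema(),
}


def run_lookup_order(args: LookupOrderArgs) -> str:
    # Placeholder business logic (hypothetical).
    return f"Order {args.order_id} found"


def handle_tool_use(tool_use_id: str, raw_input: dict) -> dict:
    """Validate Claude's tool arguments and build a tool_result block."""
    try:
        args = LookupOrderArgs.model_validate(raw_input)
    except ValidationError as exc:
        # Play the validation error back to Claude instead of crashing.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Invalid arguments: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": run_lookup_order(args),
    }
```

In the agent loop, collect the results for every tool_use block in the response and send them back as the next user message, e.g. `[handle_tool_use(b.id, b.input) for b in response.content if b.type == "tool_use"]`.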
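For point 5 (structured output via prefill), a sketch assuming the model you target accepts a prefilled assistant turn (prefill is not compatible with extended thinking). The Triage schema and the prompt wording are hypothetical. The key detail is that the completion starts after the prefill, so "{" must be prepended before Pydantic parses it.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

PREFILL = "{"


class Triage(BaseModel):
    """Hypothetical output schema for ticket classification."""

    category: str
    priority: int


def build_messages(ticket: str) -> list[dict]:
    return [
        {
            "role": "user",
            "content": "Classify this support ticket. Reply with JSON only, "
            f'with keys "category" and "priority".\n\n{ticket}',
        },
        # Prefilled assistant turn: Claude continues from here.
        {"role": "assistant", "content": PREFILL},
    ]


def parse_prefilled(completion: str) -> Optional[Triage]:
    # The response does not repeat the prefill, so add it back before parsing.
    try:
        return Triage.model_validate_json(PREFILL + completion)
    except ValidationError:  # covers malformed JSON and schema mismatches
        return None  # caller falls back: retry, repair prompt, or error


if __name__ == "__main__":
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=build_messages("I was charged twice this month."),
    )
    print(parse_prefilled(response.content[0].text))
```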
6. Agent loops: set max iterations

An agent that decides for itself when it's done can get stuck in a loop. Always set a hard upper bound on iterations. Log the iteration number and stop_reason on every turn; that makes it much easier to debug why an agent looped.

7. Observability: structlog + Anthropic request IDs

Every Claude response comes with a request ID that Anthropic indexes in their dashboard. Always log it. When bugs came up, we've used it to trace which prompt led to which response, and Anthropic support can look up request IDs weeks later.

What we don't recommend

- LangChain/LlamaIndex for Anthropic tools: too many abstraction layers, and debugging becomes hell. Use the Anthropic SDK directly.
- Rolling your own message serialization: the SDK handles this correctly, including Pydantic models.
- Passing response objects across process boundaries: they are not picklable. Extract the fields you need instead.

We have a minimal Claude + FastAPI template on github.com/TechLogia-de with streaming, caching, tool use, and structured logging. For questions on agent setups or code reviews: kontakt@techlogia.de.
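For point 6 (bounded agent loops), a minimal sketch. call_model and run_tools are hypothetical callables you supply (for example, wrappers around client.messages.create() and your tool dispatcher), and the cap of 10 iterations is an assumed default. The loop logs iteration and stop_reason on every turn and raises instead of spinning forever.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("agent")

MAX_ITERATIONS = 10  # assumed cap; tune per agent


def run_agent(
    messages: list[dict],
    call_model: Callable[[list[dict]], Any],
    run_tools: Callable[[Any], list[dict]],
    max_iterations: int = MAX_ITERATIONS,
) -> Any:
    for iteration in range(1, max_iterations + 1):
        response = call_model(messages)
        # Log every turn so a looping agent leaves a trail.
        log.info("agent iteration=%d stop_reason=%s", iteration, response.stop_reason)
        if response.stop_reason != "tool_use":
            return response  # end_turn, max_tokens, stop_sequence, ...
        # Feed the assistant turn and the tool results back for the next iteration.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": run_tools(response.content)})
    raise RuntimeError(f"agent did not finish within {max_iterations} iterations")
```

Raising rather than returning a partial answer is a design choice: it forces the caller to decide explicitly what a runaway agent means for the user.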
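For point 7 (observability), a sketch that pulls the request ID and a few other fields off a response and hands them to a structured logger. It relies on the Anthropic Python SDK exposing the request-id response header as a _request_id attribute on returned objects; the helper names and the extra user_id field are hypothetical. The logger is passed in, so any structlog-style logger with keyword arguments works.

```python
from typing import Any


def claude_log_fields(message: Any) -> dict:
    """Extract the fields worth logging from an anthropic Message."""
    usage = getattr(message, "usage", None)
    return {
        # The Python SDK exposes the request-id response header as _request_id.
        "request_id": getattr(message, "_request_id", None),
        "model": getattr(message, "model", None),
        "stop_reason": getattr(message, "stop_reason", None),
        "input_tokens": getattr(usage, "input_tokens", None),
        "output_tokens": getattr(usage, "output_tokens", None),
    }


def log_claude_response(log: Any, message: Any, **context: Any) -> None:
    # One structured event per Claude call, plus caller-supplied context.
    log.info("claude_response", **claude_log_fields(message), **context)


if __name__ == "__main__":
    import anthropic
    import structlog

    log = structlog.get_logger()
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": "ping"}],
    )
    log_claude_response(log, message, user_id="demo")
```

Because claude_log_fields returns a plain dict, it also covers the "extract the fields you need" advice for passing results across process boundaries.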