What gets generated
Multi-provider gateway
Anthropic is part of the 6-provider chain: OpenRouter (which proxies Anthropic, OpenAI, and 65+ others) → direct Anthropic → direct OpenAI → Google → DeepSeek → HuggingFace. Each provider can hold up to 5 API keys, rotated when rate limits hit.
# app/services/llm_gateway.py — generated structure
def call_claude(prompt, system=None, max_tokens=2000, stream=False, use_prompt_cache=True):
    headers = {"anthropic-version": "2023-06-01"}
    if use_prompt_cache and system:
        # Mark system prompt as cacheable — saves tokens on repeat calls
        system = [{"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}]
    return _dispatch_anthropic(prompt, system, max_tokens, stream, headers)
Prompt caching
Anthropic's prompt cache is wired for system prompts and large context blocks. When the same context is reused (common in agent loops, RAG pipelines, and conversational threads), the cached prefix is read back at a fraction of the normal input-token price instead of being re-billed in full. The generated app/services/prompt_cache_strategy.py decides what to cache based on size and reuse patterns.
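A minimal sketch of that size-plus-reuse decision, assuming illustrative names and thresholds (`should_cache`, `MIN_CACHEABLE_CHARS`, the reuse-count heuristic) rather than the actual generated code:

```python
# Illustrative cache-decision sketch; thresholds are assumptions,
# not the values the generated prompt_cache_strategy.py uses.
MIN_CACHEABLE_CHARS = 4096   # small prompts aren't worth a cache write
MIN_EXPECTED_REUSES = 2      # a cache write must pay for itself

def should_cache(text: str, expected_reuses: int) -> bool:
    """Cache only blocks big enough, and reused often enough,
    to amortize the cache-write surcharge."""
    return len(text) >= MIN_CACHEABLE_CHARS and expected_reuses >= MIN_EXPECTED_REUSES

def mark_cacheable(system_text: str, expected_reuses: int) -> list[dict]:
    """Wrap a system prompt in Anthropic's content-block form,
    adding cache_control only when worthwhile."""
    block = {"type": "text", "text": system_text}
    if should_cache(system_text, expected_reuses):
        block["cache_control"] = {"type": "ephemeral"}
    return [block]
```

The point of the gate: every cache write costs more than a plain input token, so caching a short, rarely reused prompt loses money.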
Streaming with cancellation
SSE-based streaming on the frontend. Cancellation propagates: closing the connection aborts the in-flight Anthropic request, saving token cost. Partial completions are persisted to the conversation log, so a page refresh resumes the message.
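The persist-on-cancel behavior can be sketched with a plain generator. `save_partial` and the chunk source are stand-ins for the generated app's code; the key mechanic is that when a framework closes the generator on client disconnect, the `finally` block still runs:

```python
# Cancel-aware streaming sketch: partial text is persisted whether
# the stream finishes or the client disconnects mid-message.
from typing import Callable, Iterable, Iterator

def stream_with_persistence(
    chunks: Iterable[str],
    save_partial: Callable[[str], None],
) -> Iterator[str]:
    buffer = ""
    try:
        for chunk in chunks:
            buffer += chunk
            yield chunk  # forwarded to the SSE response
    finally:
        # Runs on normal completion AND on client disconnect (the web
        # framework closes the generator, raising GeneratorExit here),
        # so the partial completion always reaches the conversation log.
        save_partial(buffer)
```

In an ASGI/WSGI app the framework calls `.close()` on the response generator when the socket drops, which is exactly the cancellation path described above.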
Tool use (function calling)
Claude's tool-use API is wired with a registry of tools. Each tool has explicit JSON schema for inputs and outputs. Tools that perform sensitive operations require human confirmation before execution.
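A compact sketch of such a registry, assuming illustrative names (`register`, `execute`, `requires_confirmation`) rather than the generated API:

```python
# Tool registry sketch: each tool carries a JSON input schema, and
# sensitive tools refuse to run without an explicit confirmation flag.
# All names here are illustrative assumptions.
from typing import Any, Callable

TOOLS: dict[str, dict] = {}

def register(name: str, input_schema: dict, *, requires_confirmation: bool = False):
    """Decorator: add a tool plus its JSON input schema to the registry."""
    def wrap(fn: Callable[..., Any]):
        TOOLS[name] = {
            "fn": fn,
            "input_schema": input_schema,
            "requires_confirmation": requires_confirmation,
        }
        return fn
    return wrap

def execute(name: str, args: dict, *, confirmed: bool = False) -> Any:
    tool = TOOLS[name]
    if tool["requires_confirmation"] and not confirmed:
        raise PermissionError(f"tool {name!r} needs human confirmation")
    return tool["fn"](**args)

@register(
    "delete_record",
    {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]},
    requires_confirmation=True,
)
def delete_record(id: str) -> str:
    return f"deleted {id}"
```

The schemas double as the `tools` payload sent to the Messages API, so the model and the executor validate against the same contract.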
Usage metering and hard caps
Every call writes timestamp, user, workspace, model, input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, latency, cost. Aggregated per workspace per period. Hard cap at the gateway — when hit, returns 402 with a clear message instead of silently burning credits.
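A sketch of the metering record and cap check. The field names mirror the prose; the in-memory storage and the `RuntimeError` standing in for the HTTP 402 response are assumptions:

```python
# Metering + hard-cap sketch. A real gateway persists records to a
# database and returns HTTP 402; here storage is in-memory and the
# cap raises an exception, purely for illustration.
import time
from dataclasses import dataclass, field

@dataclass
class Meter:
    monthly_cap: int                                    # tokens per workspace per period
    used: dict[str, int] = field(default_factory=dict)  # workspace -> tokens this period
    records: list[dict] = field(default_factory=list)

    def record(self, workspace: str, user: str, model: str,
               input_tokens: int, output_tokens: int,
               cache_read_tokens: int = 0, cache_write_tokens: int = 0,
               latency_ms: float = 0.0, cost_usd: float = 0.0) -> None:
        self.records.append({
            "ts": time.time(), "workspace": workspace, "user": user,
            "model": model, "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cache_read_tokens": cache_read_tokens,
            "cache_write_tokens": cache_write_tokens,
            "latency_ms": latency_ms, "cost_usd": cost_usd,
        })
        self.used[workspace] = self.used.get(workspace, 0) + input_tokens + output_tokens

    def check_cap(self, workspace: str) -> None:
        # Called before dispatching: reject instead of silently burning credits.
        if self.used.get(workspace, 0) >= self.monthly_cap:
            raise RuntimeError("402: workspace token cap reached")
```

Checking the cap before dispatch, not after, is what makes it a hard cap rather than an alert.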
Retry and failover
429 → rotate to the next API key. 529 (overloaded) → retry with exponential backoff. Other 5xx → fall through to the next provider. The fallback chain means an Anthropic outage degrades transparently to OpenAI / Google / DeepSeek, with a logged warning so ops can see provider-mix shifts.
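The three rules can be sketched as one loop. The `(name, keys, call_fn)` provider shape and the retry limits are illustrative assumptions, not the generated gateway's signatures:

```python
# Failover sketch: 429 rotates keys, 529 backs off and retries,
# other 5xx falls through to the next provider in the chain.
import time

class ProviderError(Exception):
    def __init__(self, status: int):
        self.status = status
        super().__init__(f"provider returned {status}")

def call_with_failover(providers, prompt, *, max_retries=3, base_delay=0.5):
    """providers: list of (name, keys, call_fn); call_fn(key, prompt)
    returns text or raises ProviderError."""
    for name, keys, call_fn in providers:
        key_idx = 0
        for attempt in range(max_retries):
            try:
                return call_fn(keys[key_idx % len(keys)], prompt)
            except ProviderError as e:
                if e.status == 429:
                    key_idx += 1                           # rotate to next key
                elif e.status == 529:
                    time.sleep(base_delay * 2 ** attempt)  # overloaded: back off
                elif 500 <= e.status < 600:
                    break                                  # hard 5xx: next provider
                else:
                    raise                                  # other 4xx: surface to caller
        # provider exhausted; the real gateway logs a warning here
    raise RuntimeError("all providers failed")
```

Note the asymmetry: 529 retries the same provider (transient overload), while a generic 5xx skips ahead immediately (likely a real outage).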
Prompt versioning
Prompts live in app/prompts/<name>/v<N>.txt. Versions are immutable once deployed. A/B routing splits traffic by user-hash so you can roll a new prompt to 10% of customers and measure before full rollout.
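The user-hash split can be sketched as follows; function names are assumptions, but the path layout and percentage routing follow the description above:

```python
# User-hash A/B routing sketch: a stable hash buckets each user into
# 0-99, so the same user always lands on the same prompt version.
import hashlib
from pathlib import Path

def user_bucket(user_id: str) -> int:
    """Stable 0-99 bucket (md5 is deterministic across processes,
    unlike Python's salted built-in hash())."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

def pick_version(user_id: str, stable: int, candidate: int, rollout_pct: int) -> int:
    """Route rollout_pct% of users to the candidate prompt version."""
    return candidate if user_bucket(user_id) < rollout_pct else stable

def prompt_path(name: str, version: int) -> Path:
    # Immutable, versioned prompt files: app/prompts/<name>/v<N>.txt
    return Path("app/prompts") / name / f"v{version}.txt"
```

Because the bucket is derived from the user id rather than per-request randomness, a 10% rollout shows every affected user the new prompt consistently, which keeps conversation threads coherent during the experiment.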
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — Anthropic's place in the 6-provider chain
- docs/decisions/ADR-0009-usage-metering.md — the cost-control architecture
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable strategy with A/B routing
- docs/decisions/ADR-0019-prompt-cache-strategy.md — what to cache, what not to (cost vs hit rate)
Environment variables generated
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_API_KEY_2=sk-ant-... # rotation key, optional
LLM_MODEL_ANTHROPIC=claude-sonnet-4,claude-haiku-4 # comma-separated cascade within Anthropic
LLM_MONTHLY_TOKEN_CAP_PER_WORKSPACE=10000000
Internal links
- OpenAI integration for the GPT side of the cascade
- AI SaaS wrapper use case
CTA
Try it — free plan, no credit card. archiet.com.
Generate a codebase with Anthropic wired through the multi-provider gateway plus prompt caching, then decide if it's the architecture you'd ship.