What gets generated
Multi-provider gateway
Anthropic is part of the 6-provider chain: OpenRouter (which proxies Anthropic, OpenAI, and 65+ others) → direct Anthropic → direct OpenAI → Google → DeepSeek → HuggingFace. Each provider can hold up to 5 API keys, rotated when rate limits hit.
# app/services/llm_gateway.py — generated structure
def call_claude(prompt, system=None, max_tokens=2000, stream=False, use_prompt_cache=True):
    headers = {"anthropic-version": "2023-06-01"}
    if use_prompt_cache and system:
        # Mark system prompt as cacheable — saves tokens on repeat calls
        system = [{"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}]
    return _dispatch_anthropic(prompt, system, max_tokens, stream, headers)
Prompt caching
Anthropic's prompt cache is wired for system prompts and large context blocks. When the same context is reused (common in agent loops, RAG pipelines, and conversational threads), the cached prefix is read back at a fraction of the normal input-token price instead of being re-billed in full. The generated app/services/prompt_cache_strategy.py decides what to cache based on size and reuse patterns.
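A minimal sketch of that size-plus-reuse decision, assuming illustrative names and thresholds (`should_cache`, `MIN_CACHEABLE_CHARS`, the reuse-count heuristic) rather than the actual generated code:

```python
# Illustrative cache-decision sketch; thresholds are assumptions,
# not the values the generated prompt_cache_strategy.py uses.
MIN_CACHEABLE_CHARS = 4096   # small prompts aren't worth a cache write
MIN_EXPECTED_REUSES = 2      # a cache write must pay for itself

def should_cache(text: str, expected_reuses: int) -> bool:
    """Cache only blocks big enough, and reused often enough,
    to amortize the cache-write surcharge."""
    return len(text) >= MIN_CACHEABLE_CHARS and expected_reuses >= MIN_EXPECTED_REUSES

def mark_cacheable(system_text: str, expected_reuses: int) -> list[dict]:
    """Wrap a system prompt in Anthropic's content-block form,
    adding cache_control only when worthwhile."""
    block = {"type": "text", "text": system_text}
    if should_cache(system_text, expected_reuses):
        block["cache_control"] = {"type": "ephemeral"}
    return [block]
```

The point of the gate: every cache write costs more than a plain input token, so caching a short, rarely reused prompt loses money.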
Streaming with cancellation
SSE-based streaming on the frontend. Cancellation propagates: closing the connection aborts the in-flight Anthropic request, saving token cost. Partial completions are persisted to the conversation log, so a page refresh resumes the message.
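The persist-on-cancel behavior can be sketched with a plain generator. `save_partial` and the chunk source are stand-ins for the generated app's code; the key mechanic is that when a framework closes the generator on client disconnect, the `finally` block still runs:

```python
# Cancel-aware streaming sketch: partial text is persisted whether
# the stream finishes or the client disconnects mid-message.
from typing import Callable, Iterable, Iterator

def stream_with_persistence(
    chunks: Iterable[str],
    save_partial: Callable[[str], None],
) -> Iterator[str]:
    buffer = ""
    try:
        for chunk in chunks:
            buffer += chunk
            yield chunk  # forwarded to the SSE response
    finally:
        # Runs on normal completion AND on client disconnect (the web
        # framework closes the generator, raising GeneratorExit here),
        # so the partial completion always reaches the conversation log.
        save_partial(buffer)
```

In an ASGI/WSGI app the framework calls `.close()` on the response generator when the socket drops, which is exactly the cancellation path described above.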
Tool use (function calling)
Claude's tool-use API is wired with a registry of tools. Each tool has explicit JSON schema for inputs and outputs. Tools that perform sensitive operations require human confirmation before execution.
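A compact sketch of such a registry, assuming illustrative names (`register`, `execute`, `requires_confirmation`) rather than the generated API:

```python
# Tool registry sketch: each tool carries a JSON input schema, and
# sensitive tools refuse to run without an explicit confirmation flag.
# All names here are illustrative assumptions.
from typing import Any, Callable

TOOLS: dict[str, dict] = {}

def register(name: str, input_schema: dict, *, requires_confirmation: bool = False):
    """Decorator: add a tool plus its JSON input schema to the registry."""
    def wrap(fn: Callable[..., Any]):
        TOOLS[name] = {
            "fn": fn,
            "input_schema": input_schema,
            "requires_confirmation": requires_confirmation,
        }
        return fn
    return wrap

def execute(name: str, args: dict, *, confirmed: bool = False) -> Any:
    tool = TOOLS[name]
    if tool["requires_confirmation"] and not confirmed:
        raise PermissionError(f"tool {name!r} needs human confirmation")
    return tool["fn"](**args)

@register(
    "delete_record",
    {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]},
    requires_confirmation=True,
)
def delete_record(id: str) -> str:
    return f"deleted {id}"
```

The schemas double as the `tools` payload sent to the Messages API, so the model and the executor validate against the same contract.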
Usage metering and hard caps
Every call writes timestamp, user, workspace, model, input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, latency, cost. Aggregated per workspace per period. Hard cap at the gateway — when hit, returns 402 with a clear message instead of silently burning credits.
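A sketch of the metering record and cap check. The field names mirror the prose; the in-memory storage and the `RuntimeError` standing in for the HTTP 402 response are assumptions:

```python
# Metering + hard-cap sketch. A real gateway persists records to a
# database and returns HTTP 402; here storage is in-memory and the
# cap raises an exception, purely for illustration.
import time
from dataclasses import dataclass, field

@dataclass
class Meter:
    monthly_cap: int                                    # tokens per workspace per period
    used: dict[str, int] = field(default_factory=dict)  # workspace -> tokens this period
    records: list[dict] = field(default_factory=list)

    def record(self, workspace: str, user: str, model: str,
               input_tokens: int, output_tokens: int,
               cache_read_tokens: int = 0, cache_write_tokens: int = 0,
               latency_ms: float = 0.0, cost_usd: float = 0.0) -> None:
        self.records.append({
            "ts": time.time(), "workspace": workspace, "user": user,
            "model": model, "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cache_read_tokens": cache_read_tokens,
            "cache_write_tokens": cache_write_tokens,
            "latency_ms": latency_ms, "cost_usd": cost_usd,
        })
        self.used[workspace] = self.used.get(workspace, 0) + input_tokens + output_tokens

    def check_cap(self, workspace: str) -> None:
        # Called before dispatching: reject instead of silently burning credits.
        if self.used.get(workspace, 0) >= self.monthly_cap:
            raise RuntimeError("402: workspace token cap reached")
```

Checking the cap before dispatch, not after, is what makes it a hard cap rather than an alert.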
Retry and failover
429 → rotate to the next API key. 529 (overloaded) → retry with exponential backoff. Other 5xx → fall through to the next provider. The fallback chain means an Anthropic outage degrades transparently to OpenAI / Google / DeepSeek, with a logged warning so ops can see provider-mix shifts.
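The three rules can be sketched as one loop. The `(name, keys, call_fn)` provider shape and the retry limits are illustrative assumptions, not the generated gateway's signatures:

```python
# Failover sketch: 429 rotates keys, 529 backs off and retries,
# other 5xx falls through to the next provider in the chain.
import time

class ProviderError(Exception):
    def __init__(self, status: int):
        self.status = status
        super().__init__(f"provider returned {status}")

def call_with_failover(providers, prompt, *, max_retries=3, base_delay=0.5):
    """providers: list of (name, keys, call_fn); call_fn(key, prompt)
    returns text or raises ProviderError."""
    for name, keys, call_fn in providers:
        key_idx = 0
        for attempt in range(max_retries):
            try:
                return call_fn(keys[key_idx % len(keys)], prompt)
            except ProviderError as e:
                if e.status == 429:
                    key_idx += 1                           # rotate to next key
                elif e.status == 529:
                    time.sleep(base_delay * 2 ** attempt)  # overloaded: back off
                elif 500 <= e.status < 600:
                    break                                  # hard 5xx: next provider
                else:
                    raise                                  # other 4xx: surface to caller
        # provider exhausted; the real gateway logs a warning here
    raise RuntimeError("all providers failed")
```

Note the asymmetry: 529 retries the same provider (transient overload), while a generic 5xx skips ahead immediately (likely a real outage).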
Prompt versioning
Prompts live in app/prompts/<name>/v<N>.txt. Versions are immutable once deployed. A/B routing splits traffic by user-hash so you can roll a new prompt to 10% of customers and measure before full rollout.
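The user-hash split can be sketched as follows; function names are assumptions, but the path layout and percentage routing follow the description above:

```python
# User-hash A/B routing sketch: a stable hash buckets each user into
# 0-99, so the same user always lands on the same prompt version.
import hashlib
from pathlib import Path

def user_bucket(user_id: str) -> int:
    """Stable 0-99 bucket (md5 is deterministic across processes,
    unlike Python's salted built-in hash())."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

def pick_version(user_id: str, stable: int, candidate: int, rollout_pct: int) -> int:
    """Route rollout_pct% of users to the candidate prompt version."""
    return candidate if user_bucket(user_id) < rollout_pct else stable

def prompt_path(name: str, version: int) -> Path:
    # Immutable, versioned prompt files: app/prompts/<name>/v<N>.txt
    return Path("app/prompts") / name / f"v{version}.txt"
```

Because the bucket is derived from the user id rather than per-request randomness, a 10% rollout shows every affected user the new prompt consistently, which keeps conversation threads coherent during the experiment.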
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — Anthropic's place in the 6-provider chain
- docs/decisions/ADR-0009-usage-metering.md — the cost-control architecture
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable strategy with A/B routing
- docs/decisions/ADR-0019-prompt-cache-strategy.md — what to cache, what not to (cost vs hit rate)
Environment variables generated
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_API_KEY_2=sk-ant-... # rotation key, optional
LLM_MODEL_ANTHROPIC=claude-sonnet-4,claude-haiku-4 # comma-separated cascade within Anthropic
LLM_MONTHLY_TOKEN_CAP_PER_WORKSPACE=10000000
Internal links
- OpenAI integration for the GPT side of the cascade
- AI SaaS wrapper use case
CTA
Try it — free plan, no credit card. archiet.com.
Generate a codebase with Anthropic wired through the multi-provider gateway plus prompt caching, then decide if it's the architecture you'd ship.