What gets generated
Multi-provider gateway
OpenAI is one provider in a 6-provider cascade: OpenRouter (which itself proxies OpenAI plus 65+ others), then direct Anthropic, then direct OpenAI, then Google, then DeepSeek, then HuggingFace. Each can have up to 5 API keys for rotation under rate limits.
# app/services/llm_gateway.py — generated structure
def call_llm(prompt, model_hint=None, max_tokens=2000, stream=False):
    chain = _build_provider_chain(model_hint)
    last_err = None
    for provider, model in chain:
        try:
            return _dispatch(provider, model, prompt, max_tokens, stream)
        except RateLimitError:
            # try the next key, then the next provider
            continue
        except QuotaExceededError:
            # 402 — skip the remaining keys for this provider
            _skip_provider(provider)
            continue
        except Exception as e:
            last_err = e
            continue
    raise LLMGatewayError("All providers failed", cause=last_err)
Streaming
OpenAI streaming is wired with SSE on the frontend. Cancellation propagates: if the user closes the connection, the OpenAI request is aborted (saves token cost). Partial responses are stored on the user's conversation log so a refresh resumes the message.
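The cancellation path can be sketched as a generator that relays tokens as SSE frames; the names (`sse_relay`, the `upstream` iterator, the `log` sink) are illustrative, not the generated code's actual identifiers. When the client disconnects, the framework closes the generator, which lets the relay abort the upstream call and persist the partial text:

```python
def sse_relay(upstream, log):
    """Relay tokens from an upstream iterator as SSE frames (illustrative).

    Closing this generator (client disconnect) aborts the upstream call,
    stopping token spend; whatever arrived so far is logged so a refresh
    can resume the message.
    """
    parts = []
    try:
        for token in upstream:
            parts.append(token)
            yield f"data: {token}\n\n"
    except GeneratorExit:
        # Client closed the connection — stop the upstream request too.
        upstream.close()
        raise
    finally:
        # Persist the partial (or full) response to the conversation log.
        log.append("".join(parts))
```

The key design point is that the SSE layer and the provider call share one lifetime: tearing down the response generator tears down the upstream request.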
Retry policy
429 (rate limit): exponential backoff, then key rotation. 500/502/503/504: 3 retries with backoff, then fall through to the next provider. 401/403: no retries; surface the credential issue to ops.
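That policy reduces to a small status-code classifier plus jittered backoff. A minimal sketch (function names and the `base`/`cap` delay parameters are assumptions, not the generated code):

```python
import random

RETRYABLE = {500, 502, 503, 504}

def classify(status: int) -> str:
    """Map an HTTP status to the retry action described above (illustrative)."""
    if status == 429:
        return "rotate_key"      # backoff, then try the next API key
    if status in RETRYABLE:
        return "retry"           # up to 3 retries with backoff
    if status in (401, 403):
        return "surface"         # credential issue — never retry
    return "next_provider"       # anything else falls through the cascade

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^n))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```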
Usage metering
Every call writes a row: timestamp, user_id, workspace_id, provider, model, prompt_tokens, completion_tokens, latency_ms, cost_estimate. Aggregated per workspace per billing period. Hard cap enforced at the gateway — when the workspace's monthly cap is hit, new requests return 402 with a clear message.
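The cap check is a sum over the workspace's rows for the current billing period. A sketch of the enforcement logic, assuming hypothetical names (`UsageRow`, `CapExceeded`, `check_cap`) rather than the generated schema:

```python
from dataclasses import dataclass

@dataclass
class UsageRow:
    """One row per gateway call (subset of the fields listed above)."""
    user_id: str
    workspace_id: str
    provider: str
    model: str
    prompt_tokens: int
    completion_tokens: int

class CapExceeded(Exception):
    """Maps to the 402 the gateway returns once the monthly cap is hit."""

def check_cap(rows, workspace_id, cap):
    """Raise if the workspace is at its cap; otherwise return remaining tokens."""
    used = sum(r.prompt_tokens + r.completion_tokens
               for r in rows if r.workspace_id == workspace_id)
    if used >= cap:
        raise CapExceeded(f"workspace {workspace_id} hit its monthly token cap")
    return cap - used
```

In the generated code the aggregation runs in the database, not in Python, but the shape of the check is the same.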
Prompt versioning
Prompts live in app/prompts/<name>/v<N>.txt. The deployed prompt is immutable. Updates create a new version. The gateway can A/B route based on user-hash, so you can roll a new prompt to 10% of traffic and measure before full rollout.
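Hash-based routing makes the A/B split deterministic: the same user always lands in the same bucket, so a user never flips between prompt versions mid-rollout. A minimal sketch (the function name and signature are illustrative):

```python
import hashlib

def prompt_version(user_id: str, stable: int, candidate: int, rollout_pct: int) -> int:
    """Route a user to the candidate prompt version for rollout_pct% of traffic.

    SHA-256 of the user id mod 100 gives a stable bucket in [0, 100), so the
    assignment is deterministic per user, with no routing state to store.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return candidate if bucket < rollout_pct else stable
```

With `rollout_pct=10`, roughly 10% of users get the candidate prompt and the rest stay on the stable version until you promote it.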
Function calling
OpenAI function-calling support is wired with a registry of tools the LLM can call. Each tool has explicit input/output schemas. Tools that touch sensitive operations require human-in-the-loop confirmation before execution.
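A registry like that can be sketched as a decorator that records each tool's schema and whether it needs confirmation; the names (`tool`, `dispatch`, the example tools) are hypothetical, not the generated registry:

```python
REGISTRY = {}

def tool(name, input_schema, requires_confirmation=False):
    """Register a callable as an LLM-invocable tool (illustrative registry)."""
    def wrap(fn):
        REGISTRY[name] = {
            "fn": fn,
            "input_schema": input_schema,
            "requires_confirmation": requires_confirmation,
        }
        return fn
    return wrap

@tool("get_balance", {"account_id": "string"})
def get_balance(account_id):
    return {"balance": 0}

# A sensitive operation: held until a human confirms it.
@tool("issue_refund", {"order_id": "string"}, requires_confirmation=True)
def issue_refund(order_id):
    return {"status": "queued"}

def dispatch(name, args, confirmed=False):
    """Execute a tool call from the LLM, gating sensitive tools on confirmation."""
    entry = REGISTRY[name]
    if entry["requires_confirmation"] and not confirmed:
        return {"status": "awaiting_human_confirmation"}
    return entry["fn"](**args)
```

The gateway passes the registered schemas to OpenAI as the tool definitions; `dispatch` is what runs when the model emits a tool call.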
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — OpenAI's place in the 6-provider chain
- docs/decisions/ADR-0009-usage-metering.md — the cost-control architecture
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable prompt strategy
- docs/runbooks/llm-cost-spike.md — what to do when usage spikes unexpectedly
Environment variables generated
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-... # optional, for organisation billing
OPENAI_API_KEY_2=sk-... # rotation key, optional
LLM_MODEL_OPENAI=gpt-4o,gpt-4o-mini # comma-separated cascade within OpenAI
LLM_MONTHLY_TOKEN_CAP_PER_WORKSPACE=10000000
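These variables expand into the OpenAI leg of the cascade: each model in `LLM_MODEL_OPENAI` is tried with each available key before falling through to the next provider. A sketch of that expansion, assuming a hypothetical `openai_cascade` helper:

```python
import os

def openai_cascade(env=os.environ):
    """Expand the env vars above into ordered (model, key) pairs (illustrative).

    Models come from the comma-separated LLM_MODEL_OPENAI list; keys are the
    primary key plus the optional rotation key, tried in order per model.
    """
    models = [m.strip() for m in env.get("LLM_MODEL_OPENAI", "gpt-4o").split(",")]
    keys = [env[k] for k in ("OPENAI_API_KEY", "OPENAI_API_KEY_2") if env.get(k)]
    return [(model, key) for model in models for key in keys]
```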
OpenAI documentation references
Internal links
- Anthropic integration for the Claude side of the cascade
- AI SaaS wrapper use case
CTA
Try it — free plan, no credit card. archiet.com.
Generate a codebase with OpenAI wired through the multi-provider gateway, then decide whether it's the cost-plus-reliability shape you'd ship.