What gets generated
Multi-provider gateway
OpenAI is one provider in a 6-provider cascade: OpenRouter (which itself proxies OpenAI plus 65+ others), then direct Anthropic, then direct OpenAI, then Google, then DeepSeek, then HuggingFace. Each can have up to 5 API keys for rotation under rate limits.
# app/services/llm_gateway.py — generated structure
def call_llm(prompt, model_hint=None, max_tokens=2000, stream=False):
    chain = _build_provider_chain(model_hint)
    last_err = None
    for provider, model in chain:
        try:
            return _dispatch(provider, model, prompt, max_tokens, stream)
        except RateLimitError:
            # try the next key, then the next provider
            continue
        except QuotaExceededError:
            # 402 — skip the remaining keys for this provider
            _skip_provider(provider)
            continue
        except Exception as e:
            last_err = e
            continue
    raise LLMGatewayError("All providers failed", cause=last_err)
Streaming
OpenAI streaming is wired with SSE on the frontend. Cancellation propagates: if the user closes the connection, the OpenAI request is aborted (saves token cost). Partial responses are stored on the user's conversation log so a refresh resumes the message.
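The cancellation path can be sketched as a generator that relays tokens as SSE frames; the names (`sse_relay`, the `upstream` iterator, the `log` sink) are illustrative, not the generated code's actual identifiers. When the client disconnects, the framework closes the generator, which lets the relay abort the upstream call and persist the partial text:

```python
def sse_relay(upstream, log):
    """Relay tokens from an upstream iterator as SSE frames (illustrative).

    Closing this generator (client disconnect) aborts the upstream call,
    stopping token spend; whatever arrived so far is logged so a refresh
    can resume the message.
    """
    parts = []
    try:
        for token in upstream:
            parts.append(token)
            yield f"data: {token}\n\n"
    except GeneratorExit:
        # Client closed the connection — stop the upstream request too.
        upstream.close()
        raise
    finally:
        # Persist the partial (or full) response to the conversation log.
        log.append("".join(parts))
```

The key design point is that the SSE layer and the provider call share one lifetime: tearing down the response generator tears down the upstream request.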
Retry policy
429 (rate limit): exponential backoff, then key rotation. 500/502/503/504: 3 retries with backoff, then fall through to the next provider. 401/403: no retries; surface the credential issue to ops.
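That policy reduces to a small status-code classifier plus jittered backoff. A minimal sketch (function names and the `base`/`cap` delay parameters are assumptions, not the generated code):

```python
import random

RETRYABLE = {500, 502, 503, 504}

def classify(status: int) -> str:
    """Map an HTTP status to the retry action described above (illustrative)."""
    if status == 429:
        return "rotate_key"      # backoff, then try the next API key
    if status in RETRYABLE:
        return "retry"           # up to 3 retries with backoff
    if status in (401, 403):
        return "surface"         # credential issue — never retry
    return "next_provider"       # anything else falls through the cascade

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^n))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```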
Usage metering
Every call writes a row: timestamp, user_id, workspace_id, provider, model, prompt_tokens, completion_tokens, latency_ms, cost_estimate. Aggregated per workspace per billing period. Hard cap enforced at the gateway — when the workspace's monthly cap is hit, new requests return 402 with a clear message.
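The cap check is a sum over the workspace's rows for the current billing period. A sketch of the enforcement logic, assuming hypothetical names (`UsageRow`, `CapExceeded`, `check_cap`) rather than the generated schema:

```python
from dataclasses import dataclass

@dataclass
class UsageRow:
    """One row per gateway call (subset of the fields listed above)."""
    user_id: str
    workspace_id: str
    provider: str
    model: str
    prompt_tokens: int
    completion_tokens: int

class CapExceeded(Exception):
    """Maps to the 402 the gateway returns once the monthly cap is hit."""

def check_cap(rows, workspace_id, cap):
    """Raise if the workspace is at its cap; otherwise return remaining tokens."""
    used = sum(r.prompt_tokens + r.completion_tokens
               for r in rows if r.workspace_id == workspace_id)
    if used >= cap:
        raise CapExceeded(f"workspace {workspace_id} hit its monthly token cap")
    return cap - used
```

In the generated code the aggregation runs in the database, not in Python, but the shape of the check is the same.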
Prompt versioning
Prompts live in app/prompts/<name>/v<N>.txt. The deployed prompt is immutable. Updates create a new version. The gateway can A/B route based on user-hash, so you can roll a new prompt to 10% of traffic and measure before full rollout.
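Hash-based routing makes the A/B split deterministic: the same user always lands in the same bucket, so a user never flips between prompt versions mid-rollout. A minimal sketch (the function name and signature are illustrative):

```python
import hashlib

def prompt_version(user_id: str, stable: int, candidate: int, rollout_pct: int) -> int:
    """Route a user to the candidate prompt version for rollout_pct% of traffic.

    SHA-256 of the user id mod 100 gives a stable bucket in [0, 100), so the
    assignment is deterministic per user, with no routing state to store.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return candidate if bucket < rollout_pct else stable
```

With `rollout_pct=10`, roughly 10% of users get the candidate prompt and the rest stay on the stable version until you promote it.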
Function calling
OpenAI function-calling support is wired with a registry of tools the LLM can call. Each tool has explicit input/output schemas. Tools that touch sensitive operations require human-in-the-loop confirmation before execution.
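A registry like that can be sketched as a decorator that records each tool's schema and whether it needs confirmation; the names (`tool`, `dispatch`, the example tools) are hypothetical, not the generated registry:

```python
REGISTRY = {}

def tool(name, input_schema, requires_confirmation=False):
    """Register a callable as an LLM-invocable tool (illustrative registry)."""
    def wrap(fn):
        REGISTRY[name] = {
            "fn": fn,
            "input_schema": input_schema,
            "requires_confirmation": requires_confirmation,
        }
        return fn
    return wrap

@tool("get_balance", {"account_id": "string"})
def get_balance(account_id):
    return {"balance": 0}

# A sensitive operation: held until a human confirms it.
@tool("issue_refund", {"order_id": "string"}, requires_confirmation=True)
def issue_refund(order_id):
    return {"status": "queued"}

def dispatch(name, args, confirmed=False):
    """Execute a tool call from the LLM, gating sensitive tools on confirmation."""
    entry = REGISTRY[name]
    if entry["requires_confirmation"] and not confirmed:
        return {"status": "awaiting_human_confirmation"}
    return entry["fn"](**args)
```

The gateway passes the registered schemas to OpenAI as the tool definitions; `dispatch` is what runs when the model emits a tool call.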
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — OpenAI's place in the 6-provider chain
- docs/decisions/ADR-0009-usage-metering.md — the cost-control architecture
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable prompt strategy
- docs/runbooks/llm-cost-spike.md — what to do when usage spikes unexpectedly
Environment variables generated
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-... # optional, for organisation billing
OPENAI_API_KEY_2=sk-... # rotation key, optional
LLM_MODEL_OPENAI=gpt-4o,gpt-4o-mini # comma-separated cascade within OpenAI
LLM_MONTHLY_TOKEN_CAP_PER_WORKSPACE=10000000
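These variables expand into the OpenAI leg of the cascade: each model in `LLM_MODEL_OPENAI` is tried with each available key before falling through to the next provider. A sketch of that expansion, assuming a hypothetical `openai_cascade` helper:

```python
import os

def openai_cascade(env=os.environ):
    """Expand the env vars above into ordered (model, key) pairs (illustrative).

    Models come from the comma-separated LLM_MODEL_OPENAI list; keys are the
    primary key plus the optional rotation key, tried in order per model.
    """
    models = [m.strip() for m in env.get("LLM_MODEL_OPENAI", "gpt-4o").split(",")]
    keys = [env[k] for k in ("OPENAI_API_KEY", "OPENAI_API_KEY_2") if env.get(k)]
    return [(model, key) for model in models for key in keys]
```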
OpenAI documentation references
Internal links
- Anthropic integration for the Claude side of the cascade
- AI SaaS wrapper use case
CTA
Try it — free plan, no credit card. archiet.com.
Generate a codebase with OpenAI wired through the multi-provider gateway, then decide whether it's the cost-plus-reliability shape you'd ship.