What you get in the generated ZIP
A typical AI-SaaS generation includes:
app/blueprints/
├── chat_bp.py # the user-facing AI endpoint
├── prompt_template_bp.py # versioned prompts
├── usage_bp.py # metered usage per user / workspace
├── credit_bp.py # prepaid credits or pay-as-you-go
├── billing_bp.py # Stripe metered billing integration
└── llm_router_bp.py # provider selection per request
app/services/
├── llm_gateway.py # multi-provider cascade with failover
├── usage_meter.py # writes usage records, enforces caps
├── prompt_versioning.py # A/B routing
└── rate_limiter.py # per-user, per-tier rate limits
What's already wired
- 6-provider LLM cascade: OpenRouter → Anthropic → OpenAI → Google → DeepSeek → HuggingFace, with multi-key failover within each provider. Account-level quota detection (HTTP 402) skips the rest of a provider's keys instead of cycling through them. Workspace owners can exclude providers via an env var (LLM_EXCLUDED_PROVIDERS=google,deepseek).
- Usage metering: every LLM call writes a row with input tokens, output tokens, model, provider, latency, and cost, aggregated per user / workspace / billing period.
- Hard caps: each tier has a monthly usage cap. When the cap is hit, the API returns 402 with a clear message instead of silently burning credits. Soft warnings fire at 75% and 90%.
- Per-user rate limits: a per-user leaky bucket with tier-specific rates. Free tier: 10 req/min; paid tier: 100 req/min. Both are configurable.
- Prompt versioning: prompts live in app/prompts/ with version tags. Deployed prompts are immutable; updates create a new version. A/B routing splits traffic by user hash.
- Stripe metered billing: usage records are pushed to Stripe's metered-usage records on a schedule. Customers see "X requests this period" on their invoice.
- Streaming responses: SSE-based streaming so the user sees output as it generates. Cancellation propagates to the LLM call (both Anthropic and OpenAI support it).
- Caching: optional response caching keyed by a hash of prompt + input. Cuts cost on repeat queries by 30-60%, depending on workload.
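The cascade-with-failover behavior above can be sketched like this. A minimal sketch, assuming illustrative names: the exception classes, the `cascade` function, and the `call_fn` signature are not the generated code, just one way to express the "skip a provider on 402, try the next key otherwise" policy.

```python
class QuotaExhausted(Exception):
    """Raised when a provider answers HTTP 402 (account-level quota)."""

class ProviderError(Exception):
    """Any other per-key failure worth retrying on the next key."""

def cascade(providers, call_fn, prompt):
    """Try each provider's keys in order; skip a whole provider on 402.

    providers: list of (provider_name, [api_keys]) in cascade order.
    call_fn(provider, key, prompt): performs one LLM call.
    """
    for provider, keys in providers:
        for key in keys:
            try:
                return provider, call_fn(provider, key, prompt)
            except QuotaExhausted:
                break      # account-level quota: skip this provider's other keys
            except ProviderError:
                continue   # key-level failure: try the next key
    raise RuntimeError("all providers exhausted")
```

The point of the 402 fast-path is visible in the `break`: once a provider reports account-level exhaustion, its remaining keys are never tried.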
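The hard-cap policy with 75% / 90% soft warnings reduces to a small check. A sketch under assumptions: the function name and token-based accounting are hypothetical; the generated code may meter requests or dollar cost instead.

```python
def check_cap(used_tokens, tier_cap):
    """Return (allowed, warning) per the hard-cap / soft-warning policy."""
    if used_tokens >= tier_cap:
        return False, "monthly cap reached (HTTP 402)"
    frac = used_tokens / tier_cap
    if frac >= 0.90:
        return True, "90% of monthly cap used"
    if frac >= 0.75:
        return True, "75% of monthly cap used"
    return True, None
```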
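A per-user leaky bucket with tier-specific rates looks roughly like this. The class name, the burst capacity equal to the per-minute rate, and the injectable clock are assumptions for the sketch, not the generated implementation.

```python
import time

# Illustrative defaults mirroring the tiers above: 10 req/min free, 100 req/min paid.
TIER_RATE_PER_MIN = {"free": 10, "paid": 100}

class LeakyBucket:
    def __init__(self, rate_per_min, capacity=None, clock=time.monotonic):
        self.rate = rate_per_min / 60.0          # drain rate in requests/second
        self.capacity = capacity or rate_per_min  # max burst size
        self.level = 0.0                          # current "water" in the bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Drain the bucket for elapsed time, then try to add one request."""
        now = self.clock()
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

In the generated app this state would live in Redis or the database keyed by user id; the in-memory version just shows the arithmetic.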
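Splitting A/B traffic by user hash keeps each user pinned to one prompt variant across requests. A hypothetical sketch: the function name, the two-variant split, and the modulus are illustrative.

```python
import hashlib

def prompt_variant(user_id, split=0.5):
    """Stable A/B bucket: hash the user id into [0, 1) and compare to split."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "A" if (h % 10_000) / 10_000 < split else "B"
```

Because the bucket is derived from the user id rather than chosen randomly per request, a user never flips between prompt versions mid-session.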
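The SSE framing behind the streaming endpoint is simple to show in isolation. A sketch only: the `[DONE]` sentinel is an assumption borrowed from common LLM streaming APIs, not necessarily what the generated endpoint emits.

```python
def sse_format(chunks):
    """Frame an iterable of text chunks as Server-Sent Events messages."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"   # each SSE event ends with a blank line
    yield "data: [DONE]\n\n"         # assumed end-of-stream sentinel
```

A Flask route would return this generator with `mimetype="text/event-stream"`; because it is a generator, abandoning the response stops iteration, which is what lets cancellation propagate back to the LLM call.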
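The prompt+input cache key mentioned above can be derived like this. A minimal sketch, assuming the key covers prompt version, user input, and model (so a new prompt version or a model switch never serves a stale cached answer); the field names are illustrative.

```python
import hashlib
import json

def cache_key(prompt_version, user_input, model):
    """Deterministic cache key: SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {"prompt": prompt_version, "input": user_input, "model": model},
        sort_keys=True, separators=(",", ":"),  # canonical form, stable across runs
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```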
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — why 6 providers, why this order, when to add more
- docs/decisions/ADR-0009-usage-metering.md — application-level vs proxy-level metering, with rejected alternatives
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable vs mutable prompts
- docs/cost/tco.md — projected LLM cost per user at different tier MAUs
Internal links
- OpenAI integration and Anthropic integration cover specific providers
- Stripe integration for the metered billing wiring
CTA
Try it — free plan, no credit card. archiet.com.
Generate an AI SaaS, look at the usage meter and the hard caps, and decide if that's the shape that protects your runway.