What you get in the generated ZIP
A typical AI-SaaS generation includes:
app/blueprints/
├── chat_bp.py # the user-facing AI endpoint
├── prompt_template_bp.py # versioned prompts
├── usage_bp.py # metered usage per user / workspace
├── credit_bp.py # prepaid credits or pay-as-you-go
├── billing_bp.py # Stripe metered billing integration
└── llm_router_bp.py # provider selection per request
app/services/
├── llm_gateway.py # multi-provider cascade with failover
├── usage_meter.py # writes usage records, enforces caps
├── prompt_versioning.py # A/B routing
└── rate_limiter.py # per-user, per-tier rate limits
What's already wired
- 6-provider LLM cascade: OpenRouter → Anthropic → OpenAI → Google → DeepSeek → HuggingFace, with multi-key failover within each provider. Account-level quota detection (HTTP 402) skips the rest of a provider's keys instead of cycling through them. Workspace owners can exclude providers via an env var (LLM_EXCLUDED_PROVIDERS=google,deepseek).
- Usage metering: every LLM call writes a row with input tokens, output tokens, model, provider, latency, and cost, aggregated per user / workspace / billing period.
- Hard caps: each tier has a monthly usage cap. When the cap is hit, the API returns 402 with a clear message instead of silently burning credits. Soft warnings fire at 75% and 90%.
- Per-user rate limits: a per-user leaky bucket with tier-specific rates. Free tier: 10 req/min; paid tier: 100 req/min. Both are configurable.
- Prompt versioning: prompts live in app/prompts/ with version tags. Deployed prompts are immutable; updates create a new version. A/B routing splits traffic by user hash.
- Stripe metered billing: usage records are pushed to Stripe's metered-usage records on a schedule. Customers see "X requests this period" on their invoice.
- Streaming responses: SSE-based streaming so the user sees output as it generates. Cancellation propagates to the LLM call (both Anthropic and OpenAI support it).
- Caching: optional response caching keyed by a hash of prompt + input. Cuts cost on repeat queries by 30-60%, depending on workload.
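The cascade-with-failover behavior above can be sketched like this. A minimal sketch, assuming illustrative names: the exception classes, the `cascade` function, and the `call_fn` signature are not the generated code, just one way to express the "skip a provider on 402, try the next key otherwise" policy.

```python
class QuotaExhausted(Exception):
    """Raised when a provider answers HTTP 402 (account-level quota)."""

class ProviderError(Exception):
    """Any other per-key failure worth retrying on the next key."""

def cascade(providers, call_fn, prompt):
    """Try each provider's keys in order; skip a whole provider on 402.

    providers: list of (provider_name, [api_keys]) in cascade order.
    call_fn(provider, key, prompt): performs one LLM call.
    """
    for provider, keys in providers:
        for key in keys:
            try:
                return provider, call_fn(provider, key, prompt)
            except QuotaExhausted:
                break      # account-level quota: skip this provider's other keys
            except ProviderError:
                continue   # key-level failure: try the next key
    raise RuntimeError("all providers exhausted")
```

The point of the 402 fast-path is visible in the `break`: once a provider reports account-level exhaustion, its remaining keys are never tried.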
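The hard-cap policy with 75% / 90% soft warnings reduces to a small check. A sketch under assumptions: the function name and token-based accounting are hypothetical; the generated code may meter requests or dollar cost instead.

```python
def check_cap(used_tokens, tier_cap):
    """Return (allowed, warning) per the hard-cap / soft-warning policy."""
    if used_tokens >= tier_cap:
        return False, "monthly cap reached (HTTP 402)"
    frac = used_tokens / tier_cap
    if frac >= 0.90:
        return True, "90% of monthly cap used"
    if frac >= 0.75:
        return True, "75% of monthly cap used"
    return True, None
```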
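A per-user leaky bucket with tier-specific rates looks roughly like this. The class name, the burst capacity equal to the per-minute rate, and the injectable clock are assumptions for the sketch, not the generated implementation.

```python
import time

# Illustrative defaults mirroring the tiers above: 10 req/min free, 100 req/min paid.
TIER_RATE_PER_MIN = {"free": 10, "paid": 100}

class LeakyBucket:
    def __init__(self, rate_per_min, capacity=None, clock=time.monotonic):
        self.rate = rate_per_min / 60.0          # drain rate in requests/second
        self.capacity = capacity or rate_per_min  # max burst size
        self.level = 0.0                          # current "water" in the bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Drain the bucket for elapsed time, then try to add one request."""
        now = self.clock()
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

In the generated app this state would live in Redis or the database keyed by user id; the in-memory version just shows the arithmetic.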
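Splitting A/B traffic by user hash keeps each user pinned to one prompt variant across requests. A hypothetical sketch: the function name, the two-variant split, and the modulus are illustrative.

```python
import hashlib

def prompt_variant(user_id, split=0.5):
    """Stable A/B bucket: hash the user id into [0, 1) and compare to split."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "A" if (h % 10_000) / 10_000 < split else "B"
```

Because the bucket is derived from the user id rather than chosen randomly per request, a user never flips between prompt versions mid-session.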
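The SSE framing behind the streaming endpoint is simple to show in isolation. A sketch only: the `[DONE]` sentinel is an assumption borrowed from common LLM streaming APIs, not necessarily what the generated endpoint emits.

```python
def sse_format(chunks):
    """Frame an iterable of text chunks as Server-Sent Events messages."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"   # each SSE event ends with a blank line
    yield "data: [DONE]\n\n"         # assumed end-of-stream sentinel
```

A Flask route would return this generator with `mimetype="text/event-stream"`; because it is a generator, abandoning the response stops iteration, which is what lets cancellation propagate back to the LLM call.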
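The prompt+input cache key mentioned above can be derived like this. A minimal sketch, assuming the key covers prompt version, user input, and model (so a new prompt version or a model switch never serves a stale cached answer); the field names are illustrative.

```python
import hashlib
import json

def cache_key(prompt_version, user_input, model):
    """Deterministic cache key: SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {"prompt": prompt_version, "input": user_input, "model": model},
        sort_keys=True, separators=(",", ":"),  # canonical form, stable across runs
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```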
What ships in docs/
- docs/decisions/ADR-0006-llm-provider-cascade.md — why 6 providers, why this order, when to add more
- docs/decisions/ADR-0009-usage-metering.md — application-level vs proxy-level metering, with rejected alternatives
- docs/decisions/ADR-0012-prompt-versioning.md — versioned-immutable vs mutable prompts
- docs/cost/tco.md — projected LLM cost per user at different tier MAUs
Internal links
- OpenAI integration and Anthropic integration cover specific providers
- Stripe integration for the metered billing wiring
CTA
Try it — free plan, no credit card. archiet.com.
Generate an AI SaaS, look at the usage meter and the hard caps, and decide if that's the shape that protects your runway.