# Budgeting Guide
How to plan spend, set caps, and forecast runway for token-metered inference.
## The three numbers that matter

- Tokens per request = `prompt_tokens` + `completion_tokens` (returned in every response)
- Requests per day = typical + peak call volume
- Daily cap = hard stop on tokens/day per customer ID (HTTP 429 when exceeded)
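The first number above can be read straight off each response. A minimal sketch, assuming an OpenAI-style `usage` object in the response body (the field names are illustrative; check your API's actual schema):

```python
# Example response body with a usage block (shape assumed, not from this guide).
response = {
    "usage": {"prompt_tokens": 512, "completion_tokens": 288},
}

# Tokens per request = prompt_tokens + completion_tokens.
usage = response["usage"]
tokens_per_request = usage["prompt_tokens"] + usage["completion_tokens"]
print(tokens_per_request)  # 800
```

Logging this sum per call gives you the per-request average you need for the capacity math below.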
## Quick cost math
At $0.75 per 1,000 tokens:
| Tokens | Cost |
|---|---|
| 1,000 | $0.75 |
| 10,000 | $7.50 |
| 100,000 | $75.00 |
| 1,000,000 | $750.00 |
Finance framing: daily caps turn AI spend into a known maximum in dollars per day.
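The table above is just linear scaling of the per-1,000-token price, which can be written as a one-line helper:

```python
PRICE_PER_1K_TOKENS = 0.75  # dollars, per the rate quoted above

def cost_dollars(tokens: int) -> float:
    """Dollar cost of a token count at $0.75 per 1,000 tokens."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS

print(cost_dollars(100_000))  # 75.0
```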
## Set a cap from your budget

Pick your maximum dollars/day, then convert to tokens/day:

`tokens_per_day_cap = (dollars_per_day / 0.75) * 1000`
Example: $10/day → ~13,333 tokens/day.
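The same conversion as a small function, rounding down so the cap never exceeds the budget:

```python
PRICE_PER_1K_TOKENS = 0.75  # dollars, per the rate quoted above

def tokens_per_day_cap(dollars_per_day: float) -> int:
    """Convert a daily dollar budget into a daily token cap (rounded down)."""
    return int(dollars_per_day / PRICE_PER_1K_TOKENS * 1000)

print(tokens_per_day_cap(10.0))  # 13333
```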
## Plans, runway, and what “cap” protects
Bundles prepay your token usage. Caps are enforced at request time and reset daily (UTC).
When you hit the cap, additional requests return HTTP 429 until the next reset.
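Because caps reset daily at midnight UTC, a client that receives a 429 can compute how long the cap will remain in effect. A minimal sketch (the daily UTC reset is from this guide; the wait-until-reset strategy is an assumption, not a documented client behavior):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_utc_reset(now=None):
    """Seconds until the next midnight UTC, when daily caps reset."""
    now = now or datetime.now(timezone.utc)
    next_reset = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_reset - now).total_seconds()

# On HTTP 429, a client could pause until the next reset (or fail fast):
# if resp.status_code == 429:
#     time.sleep(seconds_until_utc_reset())
```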
| | Solo | Team | Scale |
|---|---|---|---|
| Bundle | $50 | $150 | $300 |
| Tokens included | ~66,667 | ~200,000 | ~400,000 |
| Typical default cap | 2,000 / day | 7,000 / day | 15,000 / day |
| ~Requests/day example (assumes ~800 tokens/request) | ~2–3 | ~8–10 | ~18–20 |
| Max spend/day at cap ($0.75 / 1k tokens) | $1.50/day | $5.25/day | $11.25/day |
Runway estimate (days) ≈ tokens_in_bundle / daily_cap. Example: 200,000 / 7,000 ≈ 28.6 days at cap.
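The runway estimate above is a worst-case lower bound (it assumes the cap is hit every day):

```python
def runway_days(tokens_in_bundle: int, daily_cap: int) -> float:
    """Worst-case bundle runway, assuming the full cap is consumed every day."""
    return tokens_in_bundle / daily_cap

# Team plan example from above: 200,000 tokens at a 7,000/day cap.
print(round(runway_days(200_000, 7_000), 1))  # 28.6
```

Actual runway is longer whenever daily usage stays below the cap.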
## Latency note (Standard vs Premium)
Larger models are typically higher-latency and may consume more tokens per useful answer.
As a rule of thumb, qwen25-32b-awq runs slower than qwen25-14b-awq.
Actual latency depends on prompt length, max_tokens, concurrent load, and warm vs cold starts.
## Cap adjustments
Default caps start conservative to prevent accidental spend. Caps can be raised or lowered on request. Typical increases happen after a short clean-usage period and may depend on model tier (14B-only vs mixed).
To request an adjustment, include: desired cap, model(s), expected tokens/request, and expected requests/day.
Contact: [email protected]