Predictable GPU inference spend — without surprise bills.
Darktree provides OpenAI-compatible inference endpoints backed by prepaid compute credits.
Credits are consumed by token usage (prompt_tokens + completion_tokens) returned in API responses.
Add daily caps for budget control and an append-only audit ledger for reconciliation.
/v1 endpoints
Token usage in every response
Daily caps (HTTP 429 on limit)
API key auth
Append-only usage ledger
1) Buy prepaid credits
Purchase a bundle via Stripe. Credits fund token-metered inference and help you lock your spend up front.
2) Send requests
Use OpenAI-compatible endpoints with standard headers. Most clients work with minimal changes.
3) Cap, meter, audit
Token usage is logged per request. Daily caps protect against runaway usage; the ledger supports reconciliation.
What you get
- Cost certainty: prepaid bundles + customer-level daily caps
- Auditability: append-only per-request usage ledger
- Compatibility: OpenAI-style endpoints and usage fields
- Support: direct operator help during onboarding
What credits are (and aren’t)
- Credits are a prepaid balance consumed by token usage
- Credits are not GPU hours, server rentals, or reserved hardware
- Token usage returned by the API response is the billing source of truth
Larger models and heavier workloads may consume credits faster. The API’s token counts remain authoritative.
Plans & Credits
Prepaid bundles fund token-metered inference. Each plan includes a conservative daily cap to prevent runaway spend. Caps reset daily and can be raised or lowered on request.
| Solo | Team | Scale | |
|---|---|---|---|
| Bundle price | $50 | $150 | $300 |
| Tokens included | ~66,667 | ~200,000 | ~400,000 |
| Default daily cap | 2,000 tokens/day | 7,000 tokens/day | 15,000 tokens/day |
| ~Requests/day (example) Assumes ~800 tokens/request |
~2–3 | ~8–10 | ~18–20 |
| Max spend/day at cap $0.75 / 1k tokens |
$1.50/day | $5.25/day | $11.25/day |
Token totals assume $0.75 per 1,000 tokens.
Daily caps reset at 00:00 UTC and act as a hard stop
(requests return HTTP 429 when exceeded) until the next reset.
Caps are set per customer by Darktree and can be adjusted on request.
Latency note: Premium models are typically higher-latency than Standard models.
In steady state, qwen25-14b-awq is commonly ~100 tokens/sec and qwen25-32b-awq ~45 tokens/sec
(typical medians; depends on prompt length, max_tokens, concurrency, and warm vs cold starts).
Budgeting Guide · Usage & Billing PDF
Note: credits are prepaid and non-refundable. Usage is measured in tokens, not time. Token usage returned by API responses is the authoritative record for credit deduction.
Headers & limits
- Auth:
Authorization: Bearer <API_KEY> - Customer tag:
X-Customer-Id: <your-id> - Caps: enforced per customer per day (HTTP 429)
- Billing source: token usage + audit ledger
Models: qwen25-14b-awq (Standard) · qwen25-32b-awq (Premium)
Need higher caps or a dedicated lane? Email expected tokens/day and latency needs.
Need help keeping Stripe + usage ledger + QuickBooks clean? Book hourly bookkeeping / ops support.