LLM Pricing Operations

Practical advice for running LLM pricing in production without surprises.

For type: token-based routes, maintain a models map for the models you care about. Providers ship new model IDs frequently, and an unmapped model falls through to pricing that may not match your target margin.

```yaml
routes:
  "POST /v1/chat/completions":
    upstream: openai
    type: token-based
    models:
      gpt-4o: "$0.05"
      gpt-4o-mini: "$0.005"
      claude-sonnet-4-5: "$0.02"
    fallback: "$0.02"  # safe default for unknown models
```

A conservative fallback prevents silent undercharging when new models appear.
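
The lookup this implies can be sketched as a few lines of TypeScript. This is an illustration of the exact-match-then-fallback behavior, not the gateway's actual implementation; `priceFor` and `ModelPrices` are hypothetical names.

```typescript
// Sketch of per-model price resolution: exact model match first,
// then the route's fallback price. Names here are illustrative.
type ModelPrices = { models: Record<string, string>; fallback: string };

const route: ModelPrices = {
  models: { "gpt-4o": "$0.05", "gpt-4o-mini": "$0.005" },
  fallback: "$0.02",
};

function priceFor(model: string, route: ModelPrices): string {
  return route.models[model] ?? route.fallback;
}

// A model released after the config was written still gets billed:
priceFor("gpt-5-preview", route); // "$0.02" (the fallback)
```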

Fine-tunes often use custom IDs (ft:*, provider suffixes, deployment names). Map known IDs explicitly:

```yaml
routes:
  "POST /v1/chat/completions":
    upstream: openai
    type: token-based
    models:
      gpt-4o: "$0.05"
      # Keys containing colons must be quoted in YAML:
      "ft:gpt-4o:acme-support-v2": "$0.08"
      "ft:gpt-4o-mini:triage": "$0.015"
    fallback: "$0.03"
```
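
If you want something smarter than a flat fallback for fine-tunes, one option is to parse the base model out of the `ft:<base>:<suffix>` pattern and reuse its price. The config above only does exact match plus fallback, so the base-model step below is an assumption you would implement yourself (e.g. in a hook); `resolveFineTunePrice` is a hypothetical helper.

```typescript
// Hypothetical resolution order for fine-tune IDs: explicit entry first,
// then the base model parsed from "ft:<base>:<suffix>", then the fallback.
const models: Record<string, string> = {
  "gpt-4o": "$0.05",
  "ft:gpt-4o:acme-support-v2": "$0.08",
};
const fallback = "$0.03";

function resolveFineTunePrice(id: string): string {
  if (models[id]) return models[id];
  const m = /^ft:([^:]+)/.exec(id); // extract the base model, if any
  if (m && models[m[1]]) return models[m[1]];
  return fallback;
}
```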

For fail-closed behavior on unknown models, reject them in a hook:

hooks/enforce-model-allowlist.ts

```typescript
const allowed = new Set(["gpt-4o", "gpt-4o-mini", "ft:gpt-4o:acme-support-v2"]);

export default async (ctx) => {
  const body = (ctx.req.body ?? {}) as { model?: string };
  const model = body.model;
  if (!model || !allowed.has(model)) {
    return { reject: true, status: 400, body: "Unsupported model" };
  }
};
```
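
You can smoke-test a hook like this by calling it directly with a stub context. The `ctx` shape below is a stand-in matching only what the snippet reads (`req.body`), not the gateway's real context type, and the hook body is repeated here so the sketch is self-contained.

```typescript
// Minimal stand-in for the hook's result and context shapes (assumed).
type HookResult = { reject: boolean; status: number; body: string } | undefined;

const allowed = new Set(["gpt-4o", "gpt-4o-mini"]);
const hook = async (ctx: { req: { body?: unknown } }): Promise<HookResult> => {
  const body = (ctx.req.body ?? {}) as { model?: string };
  if (!body.model || !allowed.has(body.model)) {
    return { reject: true, status: 400, body: "Unsupported model" };
  }
  return undefined; // undefined = let the request through
};

// Unknown model → { reject: true, status: 400, ... }
// Known model   → undefined (request proceeds)
```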
| Basis | Best for | Tradeoff |
| --- | --- | --- |
| Token-based (`type: token-based`) | Chat/completion APIs with variable output | Price varies per request |
| Flat per request (`price` or `match`) | Short, bounded requests | Heavy requests can erode margin |

Start with token-based for general LLM APIs. Use flat pricing only when output size is tightly bounded.
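
A back-of-envelope check makes the "tightly bounded" caveat concrete. All figures below are illustrative assumptions (a $0.02 flat price and a $10 per million output tokens upstream rate), not real provider pricing.

```typescript
// Flat pricing vs a per-token upstream cost: fine for short outputs,
// underwater on long ones. Numbers are made up for illustration.
const flatPrice = 0.02;              // what you charge per request (assumed)
const upstreamPerMTokOut = 10;       // upstream $/1M output tokens (assumed)

const costFor = (outTokens: number) =>
  (outTokens / 1_000_000) * upstreamPerMTokOut;

costFor(500);  // ≈ $0.005 — healthy margin under the flat price
costFor(4000); // ≈ $0.04  — a single long response loses money
```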

  • Always set a conservative fallback for unknown models.
  • Keep an allowlist for approved model IDs on high-risk routes.
  • Prefer settlement: after-response on expensive upstreams where failures are common. See Refund Protection.
