Input vs. Output Tokens: The Asymmetry That Decides Your AI Bill

Output tokens usually cost several times more than input. Here's how that asymmetry shapes the cost of chat, classification, and content generation, and how to plan for it.

Almost every model API prices two things separately: the tokens you send (input) and the tokens it generates (output). On nearly every model, output is the more expensive of the two, often several times more. That single asymmetry explains why two features with the same "token count" can have very different bills.

Why output costs more

Reading your prompt is cheap relative to generating a response. The model can process the whole prompt in parallel in a single pass, but it produces output one token at a time, each step depending on every token before it. Providers price that extra work into a higher output rate. So the shape of your workload, meaning how much you send versus how much you ask the model to write, matters as much as the total volume.

Two workloads, two cost profiles

Take the same model on two jobs:

  • Classification or extraction sends a lot of context and asks for a tiny answer (a category, a JSON field). It's input-heavy: cost is dominated by the prompt, so the input price drives the bill and an inexpensive model is often good enough.
  • Drafting or long-form generation sends a short instruction and asks for a long answer. It's output-heavy: cost is dominated by generation, and the output price is what bites.

A support bot that replies in two sentences and a writing tool that drafts a full article can run on the same model and still land in completely different cost brackets.
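
To make the gap concrete, here is a minimal sketch that prices two calls with the same total token count but opposite shapes. The prices are hypothetical placeholders (an assumed 1:4 input-to-output ratio), not any provider's real rates, and the token counts are illustrative.

```python
# Hypothetical prices in USD per million tokens; real rates vary by provider and model.
INPUT_PRICE_PER_MTOK = 1.00
OUTPUT_PRICE_PER_MTOK = 4.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call, pricing input and output separately."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the call's cost that comes from generated tokens."""
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return output_cost / request_cost(input_tokens, output_tokens)


# Both workloads use 2,000 tokens per call in total.
workloads = {
    "classification": (1_900, 100),  # long context in, short label out
    "drafting": (100, 1_900),        # short instruction in, long draft out
}

for name, (tokens_in, tokens_out) in workloads.items():
    cost = request_cost(tokens_in, tokens_out)
    share = output_share(tokens_in, tokens_out)
    print(f"{name}: ${cost:.4f} per call, {share:.0%} of it from output")
```

Under these assumed prices the classification call costs about $0.0023 and the drafting call about $0.0077, more than three times as much for the same 2,000 tokens.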

The hidden input multiplier: context

Input isn't just the user's message. It's the system prompt, the conversation history, retrieved documents, and any examples, all re-sent on every call. In a chat feature, input grows turn after turn as history accumulates, so a conversation that starts cheap can become expensive by message twenty. Trimming context is often the most effective cost lever you have.
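
A rough sketch of that growth, assuming the full history is re-sent on every call. The token sizes are made-up round numbers, and `history_cap` is a hypothetical parameter that stands in, crudely, for whatever trimming strategy you actually use.

```python
from typing import List, Optional


def input_tokens_per_turn(
    turns: int,
    system_tokens: int,
    user_tokens: int,
    reply_tokens: int,
    history_cap: Optional[int] = None,
) -> List[int]:
    """Input tokens sent on each turn when prior messages are re-sent every call."""
    sizes = []
    history = 0  # tokens of earlier user messages and model replies
    for _ in range(turns):
        sizes.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
        if history_cap is not None:
            # Crude stand-in for trimming: keep at most history_cap tokens of history.
            history = min(history, history_cap)
    return sizes


full = input_tokens_per_turn(turns=20, system_tokens=500, user_tokens=50, reply_tokens=150)
trimmed = input_tokens_per_turn(
    turns=20, system_tokens=500, user_tokens=50, reply_tokens=150, history_cap=1_000
)

print("turn 1 input:", full[0])
print("turn 20 input:", full[-1])
print("total input, full history:", sum(full))
print("total input, capped history:", sum(trimmed))
print("total output:", 20 * 150)
```

With these numbers the first call sends 550 input tokens and the twentieth sends 4,350. Over the conversation that adds up to 49,000 input tokens against 3,000 output tokens, and capping history at 1,000 tokens brings the input total down to 28,000.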

How to plan for it

  • Estimate input and output separately. Don't lean on one "tokens per request" number; split it. The Token Cost Calculator takes input and output tokens independently, so you can see which side dominates.
  • Model the whole feature, not one call. For chat and agents, account for growing history and multi-step calls. The AI Cost Calculator scales a per-request estimate to daily and monthly cost across your user base.
  • Match the model to the shape. Use a cheap, fast model for input-heavy routing and classification; reserve premium models for the output-heavy work that truly needs them.
  • Cut output where you can. Ask for concise answers, request structured fields instead of prose, and cap response length. Per token, output is usually the more expensive side.
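
As a back-of-the-envelope version of the steps above, the sketch below scales a split per-request estimate to a monthly figure and compares two optimizations. It is not how either calculator is implemented; the function name, traffic numbers, and prices are all assumptions chosen for illustration.

```python
def monthly_cost(
    users: int,
    requests_per_user_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    days: int = 30,
) -> float:
    """Monthly cost in USD from a per-request estimate split into input and output."""
    calls = users * requests_per_user_per_day * days
    # Price per call, still expressed per million tokens until the final division.
    per_call_scaled = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return calls * per_call_scaled / 1_000_000


# Hypothetical feature: 1,000 users, 5 requests each per day, assumed 1:4 price ratio.
traffic = dict(users=1_000, requests_per_user_per_day=5,
               input_price_per_mtok=1.00, output_price_per_mtok=4.00)

baseline = monthly_cost(input_tokens=800, output_tokens=600, **traffic)
half_input = monthly_cost(input_tokens=400, output_tokens=600, **traffic)
half_output = monthly_cost(input_tokens=800, output_tokens=300, **traffic)

print(f"baseline:        ${baseline:.2f}")
print(f"halve the input:  ${half_input:.2f}")
print(f"halve the output: ${half_output:.2f}")
```

For this output-heavy example the baseline comes to $480 a month. Halving the input saves $60, while halving the output saves $180. An input-heavy feature would flip that result, which is why the two sides are worth estimating separately.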

Once you stop thinking in "tokens" and start thinking in "input tokens versus output tokens," your estimates get sharper and your optimization choices become obvious: shorten what you send, constrain what you generate, and price the two sides separately.