How this Image Token Calculator works
Why images aren't counted like text
Vision models bill images as input tokens, but they don't count characters — they count the image's dimensions and detail. A single high-resolution screenshot can cost as much as pages of text, and the same image maps to very different token counts from one provider to the next. Enter the width, height, and how many images you send per request to size the token load before you ship a vision feature.
How each provider counts an image
OpenAI counts tiles. In high detail, it scales the image to fit within 2048 × 2048 pixels, shrinks the shortest side to 768 pixels, and covers the result with 512-pixel tiles, charging 85 base tokens plus 170 per tile; low detail is a flat 85 tokens. Anthropic estimates tokens as roughly width times height divided by 750, first scaling down very large images (for example, those whose long edge exceeds 1,568 pixels). Google's Gemini charges about 258 tokens per 768-pixel tile. Switch the model in the calculator above to see the same image priced three ways.
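The three counting rules can be sketched as per-image estimators. This is a minimal Python sketch, not any provider's code. The function names (`openai_tokens`, `anthropic_tokens`, `gemini_tokens`) are hypothetical, and several details are assumptions: OpenAI's resize steps only ever scale down, Anthropic's estimate also caps the area near 1,600 tokens, and Gemini treats images no larger than 384 pixels on both sides as one 258-token unit and counts tiles by ceiling division.

```python
import math

# Rough per-image token estimators. Each is a simplified sketch of a
# provider's published rule; exact resampling and edge cases may differ.


def openai_tokens(width: int, height: int, detail: str = "high") -> int:
    """Tile-based estimate: 85 base tokens plus 170 per 512-px tile."""
    if detail == "low":
        return 85  # low detail is a flat cost regardless of size
    w, h = float(width), float(height)
    # Step 1: fit within a 2048 x 2048 square, preserving aspect ratio.
    longest = max(w, h)
    if longest > 2048:
        w, h = w * 2048 / longest, h * 2048 / longest
    # Step 2: shrink so the shortest side is 768 px (assumption: never upscale).
    shortest = min(w, h)
    if shortest > 768:
        w, h = w * 768 / shortest, h * 768 / shortest
    # Step 3: count the 512-px tiles needed to cover the scaled image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def anthropic_tokens(width: int, height: int) -> int:
    """Area-based estimate: roughly width * height / 750 tokens."""
    w, h = width, height
    # Downscale when the long edge exceeds 1568 px, preserving aspect ratio.
    longest = max(w, h)
    if longest > 1568:
        w, h = max(1, w * 1568 // longest), max(1, h * 1568 // longest)
    # Assumption: also cap the area near 1,600 tokens (1,600 * 750 pixels).
    max_pixels = 1600 * 750
    if w * h > max_pixels:
        scale = math.sqrt(max_pixels / (w * h))
        w, h = max(1, int(w * scale)), max(1, int(h * scale))
    return math.ceil(w * h / 750)


def gemini_tokens(width: int, height: int) -> int:
    """Tile-based estimate: about 258 tokens per 768-px tile."""
    if width <= 384 and height <= 384:
        return 258  # assumption: small images count as a single unit
    # Assumption: tile count via ceiling division; Gemini's actual
    # crop-and-scale step may produce a different tile layout.
    tiles = math.ceil(width / 768) * math.ceil(height / 768)
    return 258 * tiles


# Usage: the same 1024 x 1024 image estimated three ways.
for name, estimate in [("OpenAI (high)", openai_tokens),
                       ("Anthropic", anthropic_tokens),
                       ("Gemini", gemini_tokens)]:
    print(name, estimate(1024, 1024))
```

Keeping each rule in its own function mirrors how the calculator swaps models: the image dimensions stay fixed and only the counting rule changes.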
What this estimate leaves out
The calculator applies each provider's documented tokenization rules as an approximation and multiplies the result by the model's standard input-token price. The model versions and rates here are planning placeholders, so treat the result as a baseline, not a billing guarantee. The estimate also excludes any text prompt sent with the image, output tokens, detail-level edge cases, and provider discounts. Confirm against real usage.
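The pricing step is plain multiplication: tokens per image, times images per request, times the per-token input price. A minimal sketch, assuming a per-image token count is already known and the price is quoted in US dollars per million input tokens; the helper `estimate_input_cost` and the $2.50 rate in the example are hypothetical placeholders, not any provider's actual price. Like the calculator, it ignores text prompts, output tokens, and discounts.

```python
def estimate_input_cost(tokens_per_image: int, images_per_request: int,
                        usd_per_million_tokens: float) -> tuple[int, float]:
    """Return (image input tokens per request, estimated USD per request).

    Hypothetical helper: counts image input tokens only, not text prompts,
    output tokens, or provider discounts.
    """
    if tokens_per_image < 0 or images_per_request < 0 or usd_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    total_tokens = tokens_per_image * images_per_request
    cost = total_tokens * usd_per_million_tokens / 1_000_000
    return total_tokens, cost


# Example: ten 765-token images at a placeholder $2.50 per million input tokens.
tokens, usd = estimate_input_cost(765, 10, 2.50)
print(f"{tokens} tokens, about ${usd:.4f} per request")
```

Multiplying the per-request figure by expected request volume gives a rough monthly baseline to check against real usage.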