Token

The basic unit of text processed by an LLM, roughly equivalent to 3/4 of a word. Models have token limits for input and output.

A token is the fundamental unit of text that a large language model processes. Tokens are not exactly words: they are chunks of text determined by the model's tokenizer. In English, a token is roughly 3/4 of a word on average. Common words like "the" or "is" are single tokens. Less common words get split into multiple tokens: "unbelievable" might become "un," "believ," "able." Numbers, punctuation, and code have their own tokenization patterns.

Why it matters: tokens directly impact the cost, speed, and capability of LLM usage. Providers charge per token (both input and output). Models have maximum context windows measured in tokens. Longer prompts cost more and take longer to process. Understanding tokenization helps you optimize for cost, stay within context limits, and design efficient prompts and RAG systems.

Context windows: each model has a maximum number of tokens it can process in a single interaction (input + output combined). GPT-4o supports up to 128K tokens. Claude supports up to 200K tokens (with extended context up to 1M). Gemini 1.5 Pro supports up to 2M tokens. Larger context windows let you process longer documents, include more RAG context, and maintain longer conversation histories. But larger contexts are slower and more expensive.
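Because input and output share one window, fitting a prompt is a simple budget check. A minimal sketch (the function name and the 128K window are illustrative):

```python
# Sketch: check that a prompt plus its expected response fits in a
# model's context window (input and output share the same token budget).
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    return input_tokens + max_output_tokens <= context_window

# A 120K-token prompt leaves room for a 4K response in a 128K window...
print(fits_in_context(120_000, 4_000, 128_000))   # True
# ...but not for a 10K response.
print(fits_in_context(120_000, 10_000, 128_000))  # False
```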

Cost implications: providers bill per token, with prices typically quoted per million tokens and separate (usually higher) rates for output tokens. At typical rates, processing a 10-page document costs $0.01-0.05 depending on the model. But at scale, costs add up: a customer support bot handling 10,000 conversations per day at 2,000 tokens each = 20M tokens/day. At $3 per million input tokens, that is $60/day or $1,800/month just for input tokens. Choosing the right model size for each task is essential for cost management.
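The support-bot arithmetic above can be reproduced directly. The rate and volumes are the example's assumptions, not any provider's actual pricing:

```python
# Sketch of the support-bot cost calculation. The $3/million rate is the
# assumed figure from the example, not a real price list.
PRICE_PER_MILLION_INPUT = 3.00  # dollars

def daily_input_cost(conversations_per_day: int, tokens_per_conversation: int) -> float:
    tokens = conversations_per_day * tokens_per_conversation  # 20M tokens/day here
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT

cost = daily_input_cost(10_000, 2_000)
print(f"${cost:.0f}/day, ${cost * 30:.0f}/month")  # $60/day, $1800/month
```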

Tokenization tools: OpenAI's tiktoken library lets you count tokens before sending requests. Anthropic provides a token counting API. Most LLM libraries include token counting utilities. Always count tokens before making API calls to ensure you stay within limits and can estimate costs. For prompt optimization, knowing that your system prompt uses 800 tokens helps you budget the remaining context window for user input and retrieval context.
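When an exact tokenizer is not at hand, a character-based estimate is a common quick pre-check: for English text, a token averages roughly 4 characters. A minimal sketch (the function name is illustrative; for exact counts use the model's own tokenizer, such as tiktoken):

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English: ~4 characters per token.
    # For exact counts, use the model's tokenizer, e.g. with tiktoken:
    #   len(tiktoken.encoding_for_model("gpt-4o").encode(text))
    return max(1, len(text) // 4)

print(estimate_tokens("Understanding tokenization helps you design efficient prompts."))
```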

Common mistakes: not accounting for output tokens in context window calculations (if your context window is 128K and your input uses 120K, you only have 8K tokens left for the response). Assuming tokens and words are 1:1 (they are not; plan for roughly 1.3 tokens per word). Not monitoring token usage in production, leading to unexpected costs. Wasting tokens on redundant or irrelevant content in prompts.
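The two calculations these mistakes call out, reserving output room and planning tokens from a word count, can be sketched as follows (function names are illustrative):

```python
# Mistake 1: forgetting that the response consumes the same window as the input.
def remaining_output_budget(context_window: int, input_tokens: int) -> int:
    return context_window - input_tokens

# Mistake 2: assuming tokens and words are 1:1.
def words_to_tokens(word_count: int) -> int:
    # Plan for roughly 1.3 tokens per English word.
    return int(word_count * 1.3)

print(remaining_output_budget(128_000, 120_000))  # 8000 tokens left for the response
print(words_to_tokens(1_000))                     # a 1,000-word draft is ~1300 tokens
```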

Practical example: a company building a RAG application calculates their token budget: 1,000 tokens for the system prompt, 3,000 tokens for retrieved context (roughly 5 document chunks), 500 tokens for the user's question and conversation history, and 2,000 tokens reserved for the model's response. Total: 6,500 tokens per interaction. Using Claude Haiku at roughly $0.25 per million input tokens and $1.25 per million output tokens, each interaction costs approximately $0.0036. At 5,000 queries per day, that is about $18 per day, or roughly $545 per month, well within budget.
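That budget can be checked in a few lines (the rates are the Claude Haiku figures quoted in the example):

```python
INPUT_RATE = 0.25 / 1_000_000   # $/input token, rate quoted in the example
OUTPUT_RATE = 1.25 / 1_000_000  # $/output token

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# 1,000 system + 3,000 retrieved context + 500 user/history = 4,500 input tokens,
# plus 2,000 tokens reserved for the response.
per_query = interaction_cost(4_500, 2_000)
monthly = per_query * 5_000 * 30  # 5,000 queries/day over a 30-day month
print(f"${per_query:.4f} per query, ${monthly:.0f}/month")
```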
