We built this because we often use OpenAI's tokenizer UI when writing prompts for their LLMs, and no equivalent exists for other LLM providers such as Anthropic and Mistral. You can use this tool to see how a large language model from Anthropic (prior to Claude 3) or Mistral will tokenize the text you give it.
Large language models turn text into 'tokens': common sequences of characters that the model groups together and treats as single units. LLMs work by learning to predict which token is likely to come next, so how text is split into tokens is a key consideration when engineering with LLMs.
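To make this concrete, here is a minimal sketch of counting and inspecting tokens, assuming Anthropic's published TypeScript tokenizer package (@anthropic-ai/tokenizer), which covers models prior to Claude 3:

```ts
import { countTokens, getTokenizer } from "@anthropic-ai/tokenizer";

const text = "Large language models turn text into 'tokens'.";

// Quick count of how many tokens the text becomes.
console.log(countTokens(text));

// Inspect the individual token ids the text is split into.
const tokenizer = getTokenizer();
const ids = tokenizer.encode(text.normalize("NFKC"), "all");
console.log(Array.from(ids));
tokenizer.free(); // release the underlying WASM tokenizer
```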
For inputs, token count lets you compare how compactly different text formats (YAML, JSON, TypeScript) encode the same content, and is a crude measure of how much weight each part of a prompt carries. For outputs, it is a relative measure of generation speed between prompts (an API's tokens per second varies by time of day) and a crude measure of the compute spent producing them. That last point is part of why "Think step-by-step" works: the model spends a higher compute budget on the answer. Token count also determines the cost of a prompt, since providers bill per token.
So if you want to compare prompts by efficacy, speed, or cost, a quick look at token counts is the place to start.
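As a sketch of those comparisons (again assuming the @anthropic-ai/tokenizer package; the price constant below is a hypothetical placeholder, not any provider's real rate):

```ts
import { countTokens } from "@anthropic-ai/tokenizer";

// The same data in two formats often tokenizes to different lengths.
const asJson = JSON.stringify({ name: "Ada", role: "engineer" }, null, 2);
const asYaml = "name: Ada\nrole: engineer";

console.log("JSON tokens:", countTokens(asJson));
console.log("YAML tokens:", countTokens(asYaml));

// Crude cost estimate: token count times a per-token price.
const USD_PER_MILLION_INPUT_TOKENS = 3; // hypothetical rate for illustration
const cost = (countTokens(asJson) / 1_000_000) * USD_PER_MILLION_INPUT_TOKENS;
console.log(`~$${cost.toFixed(6)} to send the JSON version`);
```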