What Is an AI Token? The Invisible Unit Behind Every AI Bill
AI tools charge by the token, not by the question. Here's what a token actually is, how it shapes your AI bill, and five ways to cut token costs by 40-70%.
What Is an AI Token? A Clear Definition for Business Owners
There is a version of AI pricing that most business owners believe, and almost all of it is wrong. The common assumption is that you pay "per question." You do not. You pay per token.
A token is the smallest chunk of text that an AI model reads, processes, and bills you for. One token is roughly 4 characters of English, or about three-quarters of an English word. In Chinese, one character is typically 1–2 tokens. Every prompt you send and every reply the AI writes is broken down into tokens before the model does anything.
When you see "GPT-4 costs US$10 per 1 million input tokens," that translates to roughly US$10 per 750,000 English words of input — about the length of 10 full-length novels.
Tokens are the unit of measurement behind every AI bill, every context window limit, and every "model too slow" complaint.
How Does Tokenization Actually Work?
Tokenization is the process of chopping text into smaller units (tokens) that the AI model can process mathematically. The model does not see letters or words the way humans do — it sees a sequence of numerical token IDs.
The word "unbelievable" might split into three tokens: un, believ, able. The phrase "香港" might be one token or two, depending on the tokenizer. Each unique token in the model's vocabulary has a number, and that number is what the AI actually processes.
A simple way to estimate tokens:
--- English: 1 word ≈ 1.3 tokens. 100 words ≈ 130 tokens.
--- Traditional Chinese: 1 character ≈ 1.5–2 tokens. 100 characters ≈ 150–200 tokens.
--- Code: highly variable; punctuation and operators often get their own tokens.
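The rules of thumb above can be turned into a quick back-of-envelope estimator. This is a sketch using the approximate ratios from the list (a heuristic, not a real tokenizer — actual counts vary by model and provider):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for mixed English/Chinese text.

    Heuristics: ~1.3 tokens per English word, ~1.75 tokens per
    Chinese character (midpoint of the 1.5-2 range above).
    """
    # Count CJK characters (the common Chinese character range).
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Replace CJK characters with spaces, then count remaining words.
    non_cjk = "".join(
        " " if "\u4e00" <= ch <= "\u9fff" else ch for ch in text
    )
    english_words = len(non_cjk.split())
    return round(english_words * 1.3 + cjk_chars * 1.75)

print(estimate_tokens("Hello world"))  # ~3 tokens
print(estimate_tokens("香港"))          # ~4 tokens: Chinese costs more per character
```

For real billing estimates, use the tokenizer published by your model provider; this sketch is only for sizing up a workload quickly.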
This matters because Chinese costs you more tokens per character than English. The same meaning in Chinese can consume roughly 2× the tokens of English — and therefore roughly 2× the cost.
What Is the Difference Between Input Tokens and Output Tokens?
Every AI interaction has two token meters running in parallel. Input tokens are everything you send to the model — your question, your system prompt, any documents you attach. Output tokens are everything the model writes back. They are priced differently, and output tokens almost always cost more.
For GPT-4 Turbo (2026 pricing), input tokens cost roughly US$10 per million, while output tokens cost roughly US$30 per million. That 3× gap is not a rounding error — it is a core part of how you should think about AI cost.
The business implication:
--- A short question with a long answer costs mostly output tokens — the expensive kind.
--- A long document with a short summary costs mostly input tokens — the cheap kind.
--- Asking AI to "be concise" or "answer in under 100 words" can cut your bill by 40–60% on heavy workloads, without changing quality.
If you are running AI at scale and have not audited your input/output ratio, you are almost certainly overpaying.
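To see the input/output asymmetry in numbers, here is a minimal sketch using the illustrative GPT-4 Turbo rates quoted above (US$10 per million input tokens, US$30 per million output tokens):

```python
INPUT_RATE = 10.00 / 1_000_000   # US$ per input token
OUTPUT_RATE = 30.00 / 1_000_000  # US$ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in US$ for a single request at the rates above."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Short question, long answer: output-heavy, the expensive pattern.
chatty = request_cost(input_tokens=50, output_tokens=800)

# Long document, short summary: input-heavy, the cheap pattern.
summary = request_cost(input_tokens=8_000, output_tokens=200)

print(f"chatty:  US${chatty:.4f}")   # US$0.0245 for 850 tokens
print(f"summary: US${summary:.4f}")  # US$0.0860 for 8,200 tokens
```

Note the ratio: the summary request processes nearly ten times the tokens of the chatty one, but costs only about 3.5× as much, because almost all of its tokens are the cheap input kind.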
What Is a Context Window, and How Does It Use Tokens?
A context window is the maximum number of tokens a model can "see" at once — the input plus the output in a single exchange. When people say "Claude has a 200,000-token context window," they mean you can fit roughly 150,000 words of text (an entire novel) into a single prompt.
Every time you send a message, the model reads the entire conversation history within the current session. That history counts against your context window. In a long chat, old messages still cost tokens to re-process on every new turn.
Typical context window sizes in 2026:
--- GPT-4o: 128,000 tokens (~96,000 English words)
--- Claude 4: 200,000 tokens standard, up to 1,000,000 in enterprise
--- Gemini 2.5 Pro: 1,000,000 tokens (~750,000 English words)
A larger context window is not automatically better. Processing a million tokens in a single prompt is slow and expensive. For most SME use cases, 8,000–32,000 tokens is plenty.
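A common way to keep a long chat from blowing past the window (and from re-billing you for ancient history on every turn) is to drop the oldest messages once a token budget is exceeded. A minimal sketch, assuming token counts per message come from an estimator rather than the provider's real tokenizer:

```python
CONTEXT_LIMIT = 8_000       # token budget for one exchange
RESERVED_FOR_REPLY = 1_000  # leave headroom for the model's answer

def trim_history(history: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Keep the newest messages that fit in the budget.

    `history` is a list of (message, token_count), oldest first.
    """
    budget = CONTEXT_LIMIT - RESERVED_FOR_REPLY
    kept: list[tuple[str, int]] = []
    total = 0
    # Walk from newest to oldest, keeping turns while they fit.
    for message, tokens in reversed(history):
        if total + tokens > budget:
            break
        kept.append((message, tokens))
        total += tokens
    return list(reversed(kept))
```

Dropping whole old turns is the simplest policy; production systems often summarise the dropped turns into a short recap instead, trading a little quality for a large token saving.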
How Do Tokens Actually Show Up on a Hong Kong SME's Bill?
For a small business using AI for customer service, content drafting, or document summarisation, monthly AI token spend typically lands between HK$200 and HK$3,000 — depending on volume, model choice, and how well you manage prompt length.
Three real-world cost examples (2026 GPT-4o pricing):
--- A property agent drafting 50 listing descriptions per month. Each listing averages 600 Chinese characters of input + 400 characters of output. Monthly cost: roughly HK$30.
--- A boutique handling 500 customer enquiries per month via AI-assisted replies. Each thread averages 300 tokens input + 200 tokens output. Monthly cost: roughly HK$150–250.
--- An accounting firm processing 2,000 invoices per month with AI data extraction. Heavy input (1,500 tokens per invoice), short output (150 tokens). Monthly cost: roughly HK$800–1,200.
The "AI is too expensive" narrative usually comes from businesses using the biggest, most expensive model for tasks that a smaller model handles at one-twentieth the cost.
Can You Reduce Your AI Token Costs Without Losing Quality?
Most Hong Kong SMEs can cut their AI token spend by 40–70% without noticeably affecting output quality. The savings come from prompt hygiene, model selection, and output discipline — not from using AI less.
Five proven cost-cutting tactics:
--- Right-size the model. Use GPT-4o mini or Claude Haiku for routine tasks; reserve GPT-4 and Claude Sonnet for hard reasoning. Typical savings: 80–90%.
--- Cap output length. Add "respond in 100 words or fewer" to your system prompt. Typical savings: 30–50% on output tokens.
--- Cache repeated inputs. If your system prompt is 2,000 tokens and you send it 1,000 times per day, a prompt cache (offered by OpenAI and Anthropic) drops that cost by roughly 90%.
--- Compress long documents before sending. Summarise an 80-page PDF down to a 2-page brief before asking AI to analyse it. Typical savings: 95% on input tokens.
--- Switch to batch APIs. OpenAI and Anthropic offer 50% discounts on non-urgent batch workloads. Ideal for overnight analytics or bulk content generation.
Most SMEs leave all five of these on the table because no one told them the levers existed.
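To make the first tactic concrete, here is a sketch comparing a large and a small model on the same workload. The prices are illustrative placeholders, not current list prices for any named model:

```python
# US$ per million tokens: (input_price, output_price). Placeholder rates.
PRICES = {
    "big-model":   (10.00, 30.00),
    "small-model": (1.00, 3.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Monthly US$ cost for a given token volume at the rates above."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Assumed workload: 500 routine enquiries/day x 30 days,
# ~300 input + 200 output tokens each.
monthly_in = 500 * 30 * 300    # 4.5M input tokens
monthly_out = 500 * 30 * 200   # 3.0M output tokens

big = monthly_cost("big-model", monthly_in, monthly_out)      # US$135.00
small = monthly_cost("small-model", monthly_in, monthly_out)  # US$13.50
print(f"saving: {1 - small / big:.0%}")                       # 90%
```

The same structure works for tactics 2–5: each lever changes one variable (output volume, cached input volume, input size, or the rate itself), so you can estimate the payoff before touching your workflow.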
Why Do Tokens Matter When You Are Choosing an AI Tool?
Two AI tools with "similar" pricing can have wildly different real-world costs once tokens are factored in. A model that charges US$5 per million tokens but consumes 3× the tokens for the same task is more expensive than a model priced at US$10 per million that uses fewer tokens.
Token efficiency varies by tokenizer design. According to a 2024 Hugging Face benchmark, some open-source Chinese-optimised tokenizers use 30–40% fewer tokens for Traditional Chinese text than general-purpose English tokenizers — a meaningful cost difference at scale.
Questions to ask before signing up:
--- What is the published token price for input and output?
--- How does the tokenizer handle Chinese characters specifically?
--- Is prompt caching supported, and what is the discount?
--- Is there a batch API for non-real-time tasks?
The right question is never "how much per month?" — it is "how much per 1,000 customer interactions in my exact workflow?"
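That "per 1,000 interactions" question takes a few lines of arithmetic to answer. The token counts and prices below are assumptions you would replace with measurements from your own workflow; they mirror the US$5-but-3×-the-tokens example above:

```python
def cost_per_1000(in_tokens: int, out_tokens: int,
                  in_price: float, out_price: float) -> float:
    """US$ per 1,000 interactions.

    Token counts are per interaction; prices are US$ per million tokens.
    """
    per_interaction = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_interaction * 1_000

# Tool A: cheaper per token, but its tokenizer uses 3x the tokens.
tool_a = cost_per_1000(900, 600, in_price=5.0, out_price=15.0)

# Tool B: dearer per token, but token-efficient for the same task.
tool_b = cost_per_1000(300, 200, in_price=10.0, out_price=30.0)

print(f"Tool A: US${tool_a:.2f}, Tool B: US${tool_b:.2f}")  # A: 13.50, B: 9.00
```

Despite charging half the headline rate, the "cheap" tool costs 50% more per 1,000 interactions — which is why the headline price alone tells you almost nothing.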
The Bottom Line for Hong Kong SME Bosses
Tokens are the invisible unit that decides how much you actually pay for AI. Once you understand tokens, AI pricing stops feeling like a black box and starts looking like any other operational cost — measurable, manageable, and optimisable.
You do not need to become a tokenization expert. You need a clear view of where tokens come from in your workflow, how much each one costs, and which levers reduce that cost without hurting quality.
We understand AI, and we understand you. With UD at your side, AI never feels cold.
Make Sense of Your AI Spend — Before It Grows
AI pricing is confusing, but it becomes clear once someone walks you through where tokens go in your actual workflow.
UD has spent 28 years helping Hong Kong SMEs adopt technology that pays for itself, and we'll guide you through every step: auditing your current AI tool use, picking the right model, and cutting token waste without cutting output quality.
Start with a free AI Ready Check — no commitment, no jargon.