What Is a Context Window? A Plain-Language Guide for Hong Kong Business Owners
A clear guide to AI context windows — what they are, how they work, and what size your Hong Kong business actually needs.
By the end of this guide, you will know exactly what a context window is, why it quietly controls how well AI works for your business, and why the model with the biggest number on the box is not always the one that will save you the most time. No jargon, no hype — just the part Hong Kong business owners actually need to understand.
Context windows have become a 2026 business topic for a simple reason: the numbers exploded. Google Gemini now advertises 1 million tokens. GPT-5 handles 400,000. Claude offers 200,000. Those sound like marketing specs, but they decide whether your AI remembers the whole conversation with your client, the entire price list you just pasted in, and the last quarter of your staff handover notes. If you have ever pasted a long document into ChatGPT and watched it "forget" half of it, you have already met the context window — you just did not know its name.
This article explains the concept from the ground up, shows how it plays out in real Hong Kong SME workflows, and helps you tell the difference between real capability and a number on a spec sheet.
What is a context window in AI?
A context window is the maximum amount of text — measured in tokens — that an AI model can read, remember, and respond to in a single conversation. Everything counts: your question, the system instructions, any files you paste in, the conversation history, and the AI's own answer. Once you exceed this limit, the model starts forgetting the earliest parts.
Think of it as the AI's short-term memory. A human assistant can remember the general shape of a two-hour meeting but will forget the specific numbers without notes. An AI has no notes — only its context window. If the window holds 200,000 tokens, that is roughly 150,000 English words or about 250 pages of a normal business document. Anything beyond that is gone the moment it scrolls off the edge.
Tokens, not words. AI models do not count words; they count tokens. One token is roughly 0.75 of an English word, which means one English word averages about 1.3 tokens. One Chinese character is usually 1 token on its own. So a 200,000-token window fits around 150,000 English words or 200,000 Chinese characters. Your mileage varies depending on language mix.
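If you want to sanity-check a document before pasting it in, the arithmetic above is easy to automate. The sketch below is a rough heuristic only: real token counts come from each model's own tokenizer (for example OpenAI's tiktoken library), and the 1.3 tokens-per-word and 1 token-per-character ratios are the rule-of-thumb figures from this article, not exact values.

```python
# Rough token estimator for mixed English/Chinese text.
# Heuristic only: real tokenizers give exact counts, and ratios vary
# by model. Assumes ~1.3 tokens per English word and ~1 token per
# Chinese character, as described above.

def estimate_tokens(text: str) -> int:
    # Count CJK characters in the main Unicode block
    chinese = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')
    # Replace CJK characters with spaces, then count remaining words
    latin_only = ''.join(
        ' ' if '\u4e00' <= ch <= '\u9fff' else ch for ch in text
    )
    latin_words = len(latin_only.split())
    return round(latin_words * 1.3 + chinese * 1.0)

print(estimate_tokens("Please review the attached tenancy agreement."))  # → 8
```

A quick estimate like this tells you whether a document will fit long before you hit the window limit mid-conversation.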
How does a context window actually work?
Inside the model, every token "attends to" every other token in the window. The longer the window, the more comparisons the model has to run. According to IBM's research overview on context windows, the computation scales roughly quadratically — doubling the window does not double the cost, it roughly quadruples the work in the attention layers.
That is why large context windows were technically impossible until 2023, why they were expensive in 2024, and why they became affordable in 2026. Engineering improvements — not bigger buildings full of GPUs — drove the jump.
The practical consequence: a model with a 1-million-token window is not just "bigger" than a 200,000-token model. It is slower per request, more expensive per query, and — critically — often less accurate in the middle of the window. The model treats the beginning and the end well. The middle is where things get lost.
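The quadratic scaling described above can be made concrete with a few lines of arithmetic. This is a schematic illustration of why bigger windows cost disproportionately more, not a real benchmark of any model.

```python
# Illustration of quadratic attention cost: every token attends to every
# other token, so the number of pairwise comparisons grows with the
# square of the window length. Numbers are schematic, not benchmarks.

def attention_pairs(window_tokens: int) -> int:
    return window_tokens * window_tokens

small = attention_pairs(200_000)    # 200k-token window
large = attention_pairs(1_000_000)  # 1M-token window

# 5x the window means 25x the attention work
print(large / small)  # → 25.0
```

Five times the window, twenty-five times the work: that gap is what you are paying for when you choose the biggest model on the menu.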
Why does context window size matter for a small business?
Context window size decides what kind of work you can give the AI in one shot. A small window forces you to chop documents into pieces. A large window lets you paste in the full contract, the full staff manual, or the full customer chat history and ask a question across all of it.
Three concrete Hong Kong SME scenarios where size matters:
--- Customer service handover. A restaurant owner wants the AI to read a 30-page table-booking policy and the last 200 customer messages before answering a new inquiry. That is roughly 40,000 tokens — fine for any modern model, but would have broken a 2023-era 4,000-token window instantly.
--- Lease and contract review. A property agency asks the AI to compare a 60-page tenancy agreement against a 40-page landlord template to flag differences. Around 120,000 tokens total. Works well at 200,000. Fails cleanly at 100,000.
--- Full accounting year review. A small retail shop uploads 12 months of transaction descriptions — around 300,000 tokens of text — and asks the AI to spot anomalies. Only a model with a 400,000+ token window can see all of it at once.
If your day-to-day documents are short, window size barely matters. If you regularly work with long contracts, thick manuals, or full conversation archives, it decides what is possible.
Is a bigger context window always better?
No. Bigger windows cost more, run slower, and often drop accuracy in the middle of the text — a phenomenon researchers call "context rot." Multiple 2026 benchmarks show that models advertised at 200,000 tokens often become unreliable around 130,000 tokens. Performance degrades suddenly, not gradually.
This matters because pricing is usually per token. A question that uses 500,000 tokens of context costs about 125 times as much as the same question against 4,000 tokens of context. For a Hong Kong small business running 100 queries a day, that is the difference between a few dollars and a few hundred dollars.
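Here is the back-of-envelope version of that cost comparison. The US$3 per million input tokens figure is an assumed example rate for illustration, not any specific provider's published price.

```python
# Back-of-envelope query cost under per-token pricing.
# PRICE_PER_MILLION is an assumed example rate, not a real vendor price.

PRICE_PER_MILLION = 3.00  # US$ per 1M input tokens (assumption)

def query_cost(context_tokens: int, queries_per_day: int = 1) -> float:
    return context_tokens / 1_000_000 * PRICE_PER_MILLION * queries_per_day

lean = query_cost(4_000, queries_per_day=100)     # small, focused context
heavy = query_cost(500_000, queries_per_day=100)  # whole-library context

print(f"US${lean:.2f} vs US${heavy:.2f} per day")  # → US$1.20 vs US$150.00 per day
print(round(heavy / lean))                         # → 125
```

Swap in your provider's actual rate and your own daily query volume to see what oversized context is really costing you.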
The rule of thumb: use the smallest window that comfortably fits your task. A 200,000-token model is plenty for 95% of SME workflows. The 1-million-token models are mainly useful for legal document review, codebase analysis, and long-form research — not for replying to customer emails.
What happens when you exceed the context window?
When you exceed the context window, the model either refuses the request with an error or silently drops the earliest part of the conversation. Neither behaviour is what most users expect. The AI does not warn you it is forgetting — it simply produces a worse answer based on a partial picture.
Here is what this looks like in practice for a business owner:
--- The AI suddenly "forgets" the product price list you pasted 40 messages ago. The list rolled out of the window. The AI now makes up prices.
--- The AI gives an answer that contradicts the rules you set up at the start of the session. Those setup instructions also rolled out of the window. The AI is operating without them.
The practical fix: for anything that must not be forgotten — pricing, policies, approval rules, brand guidelines — either paste it at the end of your latest prompt (the end of the window is best remembered), or use an AI solution that has a dedicated knowledge base outside the context window.
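For teams building their own prompts, the "paste it at the end" advice can be wired into the prompt itself. The sketch below is a minimal illustration of that pattern; PRICE_LIST and the ten-turn history cutoff are placeholder assumptions, not a recommended configuration.

```python
# Minimal prompt-assembly sketch: re-attach must-not-forget material
# (prices, policies) at the END of each prompt, where models attend best.
# PRICE_LIST is a placeholder for your real business data.

PRICE_LIST = "Set lunch: HK$88. Dinner set: HK$188."

def build_prompt(history: list[str], question: str) -> str:
    recent = history[-10:]  # keep only the latest turns in the window
    return "\n".join(recent + [
        "Reference (always current):",
        PRICE_LIST,
        "Customer question: " + question,
    ])

print(build_prompt(["Hi", "Hello, how can I help?"], "How much is lunch?"))
```

Because the price list is re-attached on every turn, it can never roll out of the window no matter how long the conversation runs.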
How do AI employees handle long context differently?
A well-built AI employee does not rely on stuffing everything into the context window. Instead, it combines a reasonable-size context window with an external knowledge base — sometimes called retrieval-augmented generation, or RAG — so that the AI fetches only the most relevant pieces of information per question, rather than carrying the whole library around.
This matters because it is the difference between a consumer chatbot and a production-grade AI for business.
--- Consumer chatbots expect you to paste everything each time. When the conversation gets long, they forget.
--- Purpose-built AI employees store your policies, price lists, product catalogues, and historical conversations in a searchable knowledge base. The context window holds only what is relevant to the current question, so the AI stays accurate even across thousands of customer conversations.
For Hong Kong SMEs, this architectural choice matters more than the raw context window number. A 200,000-token AI employee with a good knowledge base will outperform a 1-million-token chatbot that relies on pasted text every single time.
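To make the retrieval idea concrete, here is a toy sketch of fetching only the relevant entry instead of pasting the whole knowledge base. Production RAG systems match on vector embeddings; simple keyword overlap stands in for that here, and the knowledge base contents are invented examples.

```python
# Toy retrieval sketch: instead of pasting the whole knowledge base into
# the context window, fetch only the entries that match the question.
# Real systems use vector embeddings; keyword overlap stands in here.

KNOWLEDGE_BASE = {
    "booking policy": "Tables held 15 minutes past booking time.",
    "lunch price": "Set lunch is HK$88, Monday to Friday.",
    "opening hours": "Open 11:30-22:00 daily.",
}

def retrieve(question: str, top_k: int = 1) -> list[str]:
    q_words = set(question.lower().replace("?", "").split())
    scored = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda kv: len(q_words & set(kv[0].split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

print(retrieve("What is the lunch price?"))
```

Only the matching entry enters the context window, so the window stays small, cheap, and accurate no matter how large the knowledge base grows.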
Common misconceptions about context windows
Most SME owners meet context windows through marketing headlines, which create a few stubborn misconceptions. Clearing them up saves real money and prevents the wrong tool being bought for the wrong job.
--- "Bigger is always better." No. Bigger is more expensive and often less accurate in the middle. Match the window to the task.
--- "The context window is the AI's long-term memory." No. It is short-term working memory for a single session. Long-term memory requires a separate knowledge base or fine-tuning.
--- "My conversation history is safe inside the window." No. If the conversation exceeds the window, the earliest parts are dropped silently. Save anything important outside the chat.
--- "All tokens are equal in attention." No. Models pay more attention to the beginning and end of the window. Put critical instructions in those positions for best results.
How much context window does a Hong Kong SME really need?
Most Hong Kong SMEs need between 32,000 and 128,000 tokens of context — enough to handle a full customer conversation plus a detailed business policy. Anything above 200,000 is overkill for most use cases and starts costing real money per query.
Here is a rough sizing guide by business type:
--- Restaurant / F&B: 32,000 tokens is plenty. A full menu plus three months of booking history fits comfortably.
--- Retail shop: 64,000 tokens. Product catalogue (up to 500 items) plus current promotions and customer FAQ.
--- Property agency: 128,000 to 200,000 tokens. Enough to fit a full tenancy agreement, the landlord template, and relevant precedents at once.
--- Professional services (legal, accounting, consulting): 200,000+ tokens. Document-heavy work benefits most from larger windows.
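The sizing guide above can be turned into a quick self-check. The tier list and the 1.3 tokens-per-word ratio are this article's rule-of-thumb figures, and the 65% headroom factor is an assumption reflecting the accuracy drop-off near the window limit, not a vendor specification.

```python
# Rough window-sizing check: estimate your typical workload in tokens
# and pick the smallest common window tier that fits with headroom.
# Tiers, the 1.3 tokens/word ratio, and the 0.65 headroom factor are
# rule-of-thumb assumptions, not vendor guarantees.

TIERS = [32_000, 64_000, 128_000, 200_000, 400_000, 1_000_000]

def recommended_window(english_words: int, chinese_chars: int = 0) -> int:
    needed = round(english_words * 1.3 + chinese_chars)
    for tier in TIERS:
        if needed <= tier * 0.65:  # leave headroom: accuracy drops when full
            return tier
    return TIERS[-1]

# e.g. a workload of about 50,000 English words
print(recommended_window(50_000))  # → 128000
```

Run it against your longest real document bundle, not your average one: the window has to fit your worst day, not your typical day.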
If you are not sure which category fits your business, the safest choice is a modern AI employee with a knowledge base. The knowledge base does the heavy lifting, and the context window only needs to hold the current question and its most relevant documents.
Conclusion: context window is a ceiling, not a feature
A context window is the ceiling on what an AI can see in one shot. It is not magic, it is not free, and it is not the same as intelligence. The AI with the biggest number on the box is not always the AI that will save your business the most time — especially if you are paying per token for context you did not need to send.
What matters for a Hong Kong SME is not the biggest window. It is the right architecture: a practical context window paired with a searchable knowledge base, set up by someone who understands both the technology and your business. That is the difference between AI that forgets at the wrong moment and AI that becomes a reliable part of your team.
At UD, we have spent 28 years helping Hong Kong businesses make new technology work for them. We know the AI landscape cold, and we understand your challenges even better.
Ready to build an AI employee that actually remembers your business?
Now that you understand what a context window is and why architecture matters more than specs, the next step is finding the right fit for your business. UD's AI Employee Hub helps you pick, train, and deploy AI employees that combine the right context window with a knowledge base of your real business content. We'll walk you through every step — from assessing your needs to going live.