What Is Few-Shot Prompting?
Few-shot prompting is the practice of including a small number of worked examples, typically 2 to 5, in your prompt before asking the AI to do the real task. Each example shows an input paired with the exact output you want. The model uses those examples to infer your formatting, tone, and decision logic, then applies the same pattern to your real request.
Zero-shot prompting gives the model only the instruction. Few-shot gives the model the instruction plus a small demonstration set. The demonstration is where the gain lives.
The Prompt Engineering Guide compiled by Vellum reports that adding 2 to 5 examples typically lifts task performance on structured outputs by 15 to 35 percent compared to zero-shot, with the curve flattening after about five examples on most tasks.
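To make the zero-shot versus few-shot distinction concrete, here is a minimal Python sketch that builds both kinds of prompt from the same instruction. The function names, the sentiment task, and the sample reviews are illustrative assumptions, not part of any library or of the guides cited above.

```python
def build_zero_shot(instruction: str, task_input: str) -> str:
    """Instruction plus the real input, with no demonstrations."""
    return f"{instruction}\n\nInput: {task_input}\nOutput:"


def build_few_shot(
    instruction: str, examples: list[tuple[str, str]], task_input: str
) -> str:
    """Instruction, then labelled input/output pairs, then the real input."""
    parts = [instruction, ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts += [
            f"Example {i}",
            f"Input: {example_input}",
            f"Output: {example_output}",
            "",
        ]
    parts += [f"Input: {task_input}", "Output:"]
    return "\n".join(parts)


# Hypothetical task and examples, chosen only for illustration.
instruction = "Classify the sentiment of the review as POSITIVE or NEGATIVE."
examples = [
    ("The battery lasts all day.", "POSITIVE"),
    ("It stopped working after a week.", "NEGATIVE"),
]
real_input = "Setup took five minutes and it just works."

zero_shot_prompt = build_zero_shot(instruction, real_input)
few_shot_prompt = build_few_shot(instruction, examples, real_input)
print(few_shot_prompt)
```

Both prompts end with an open "Output:" line. The only difference is the demonstration set placed between the instruction and the real input.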
Why Does It Beat Just Writing a Better Instruction?
Instructions tell the model what you want. Examples show the model what success looks like. When the two compete, examples almost always win. A vague instruction with three good examples will often outperform a 400-word instruction with no examples on tasks that have any structural complexity.
This is because the model uses examples as the dominant signal for three things at once: the output format, the level of detail, and the implicit edge-case decisions. A 400-word instruction that tries to describe all three in prose leaves room for misreading. Three well-chosen examples settle all three at once, with far less room for ambiguity.
Microsoft's AI documentation puts it plainly. Few-shot prompting works when the task has a clear input-output shape but is hard to describe in pure prose. Classification, extraction, transformation, and summarisation in a specific format all reward examples more than they reward longer or more elaborate instructions.
When Should You Use Few-Shot Versus Zero-Shot?
The decision rule is whether you can write an unambiguous example faster than you can write an unambiguous instruction. Most of the time, the example is faster.
Use few-shot when:
--- The output has a specific structure you need preserved (JSON keys, markdown tables, a specific number of bullets).
--- The task involves a judgement call where the boundaries are hard to write but easy to demonstrate (sentiment that depends on context, severity grading, tone matching).
--- You want consistent voice across many outputs.
--- The task is repetitive and you'll run the same prompt with different inputs many times.
Use zero-shot when:
--- The task is open-ended and you actively want the model to bring its own structure.
--- The output is conversational rather than structured.
--- You have only one input to process and can edit the result faster than crafting examples.
A common mistake is using few-shot for tasks that should be zero-shot. If you're asking the model to brainstorm or generate creative variation, three examples can lock the model into copying the examples rather than producing something genuinely new.
How Do You Choose Good Examples?
Three rules separate examples that work from examples that confuse the model.
Rule 1: Cover the edge cases, not the average case. If your task has ambiguous boundaries, your examples should sit on those boundaries. Three examples that all look easy teach the model nothing about the hard ones. The teaching value lives in the close calls.
Rule 2: Keep examples close in length to your real task. If your real input will be 200 words, your examples should be 150 to 250 words. Three two-sentence examples followed by a five-paragraph real input can lead the model to cut its real answer short to match the length of the example outputs.
Rule 3: Make every example obey the format exactly. If your output should be JSON, every example output must be valid JSON. If it should start with "Summary:", every example must start with "Summary:". The model treats your examples as the rule, so any inconsistency between them creates output drift.
Anthropic's published prompt engineering guidance for Claude adds a fourth practical rule. When examples are not strictly sequential, label them clearly. Put each example inside its own block with a heading like Example 1, Example 2, Example 3. This prevents the model from running examples together when they share similar inputs.
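Rules 2 and 3 are mechanical enough to check before you send a prompt. The sketch below is one possible way to do that in Python, under stated assumptions: the helper names are hypothetical, the 25 percent tolerance simply mirrors the "150 to 250 words for a 200-word input" guideline above, and the format check assumes your outputs are meant to be JSON objects with identical keys.

```python
import json


def word_count(text: str) -> int:
    return len(text.split())


def length_outliers(example_inputs, real_input, tolerance=0.25):
    """Return (example number, word count) for examples whose length falls
    outside real length * (1 +/- tolerance). Rule 2."""
    target = word_count(real_input)
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    return [
        (i, word_count(text))
        for i, text in enumerate(example_inputs, start=1)
        if not low <= word_count(text) <= high
    ]


def json_format_problems(example_outputs):
    """Return a message for each example output that is not valid JSON or
    whose keys differ from the first valid example. Rule 3."""
    problems = []
    expected_keys = None
    for i, raw in enumerate(example_outputs, start=1):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append(f"Example {i}: invalid JSON ({err.msg})")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"Example {i}: expected a JSON object")
            continue
        keys = set(parsed)
        if expected_keys is None:
            expected_keys = keys
        elif keys != expected_keys:
            problems.append(
                f"Example {i}: keys {sorted(keys)} differ from {sorted(expected_keys)}"
            )
    return problems


# Made-up data: a 200-word real input and two example inputs.
real_input = " ".join(["word"] * 200)
example_inputs = [" ".join(["word"] * 180), " ".join(["word"] * 20)]
too_short_or_long = length_outliers(example_inputs, real_input)

# Made-up example outputs: the third uses single quotes, the fourth renames a key.
example_outputs = [
    '{"intent": "BILLING", "confidence": 5}',
    '{"intent": "ACCOUNT", "confidence": 4}',
    "{'intent': 'FEEDBACK', 'confidence': 3}",
    '{"intent": "OTHER", "score": 2}',
]
format_issues = json_format_problems(example_outputs)
print(too_short_or_long)
print(format_issues)
```

A check like this will not tell you whether your examples sit on the right edge cases (Rule 1), which still needs human judgement, but it catches the length and format drift that is easy to miss by eye.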
What Does a Real Few-Shot Prompt Look Like?
The fastest way to make this concrete is to look at a complete working prompt. The one below is for a common practitioner task: classifying customer support emails into intent categories with a confidence score. It uses three examples that cover an easy case, a tricky case, and an ambiguous case.
Try This Prompt:
You are a customer support email classifier. For each email, output the intent category and a confidence score from 1 to 5. Use only these categories: BILLING, TECHNICAL, ACCOUNT, FEEDBACK, OTHER.
Example 1
Email: "Hi, I was charged twice for my October invoice. Order number 884321. Can you refund the duplicate?"
Output: BILLING, 5
Example 2
Email: "My password reset email is not arriving. I've checked spam. Tried three times."
Output: ACCOUNT, 4
Example 3
Email: "Just wanted to say the new dashboard looks great, although the export feature crashes sometimes when I have more than 1000 rows."
Output: FEEDBACK, 3
Now classify this email:
Email: [paste the email here]
Output:
Three things make this prompt work. The instruction is short. The examples cover one easy intent, one moderate, and one mixed-signal case where confidence should drop. And the output format is fully demonstrated by Example 1 before the model ever sees the real input.
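If you run this prompt from code rather than a chat window, it helps to assemble it from data and to validate what comes back. The following is a minimal Python sketch under stated assumptions: `build_prompt` and `parse_reply` are hypothetical helper names, no model is actually called, and the reply string in the demo is made up to show the parsing step.

```python
import re

CATEGORIES = ("BILLING", "TECHNICAL", "ACCOUNT", "FEEDBACK", "OTHER")

INSTRUCTION = (
    "You are a customer support email classifier. For each email, output the "
    "intent category and a confidence score from 1 to 5. Use only these "
    "categories: BILLING, TECHNICAL, ACCOUNT, FEEDBACK, OTHER."
)

# The three worked examples from the prompt above, as (email, output) pairs.
EXAMPLES = [
    (
        "Hi, I was charged twice for my October invoice. Order number 884321. "
        "Can you refund the duplicate?",
        "BILLING, 5",
    ),
    (
        "My password reset email is not arriving. I've checked spam. "
        "Tried three times.",
        "ACCOUNT, 4",
    ),
    (
        "Just wanted to say the new dashboard looks great, although the export "
        "feature crashes sometimes when I have more than 1000 rows.",
        "FEEDBACK, 3",
    ),
]


def build_prompt(email: str) -> str:
    """Assemble instruction, labelled examples, and the real email."""
    lines = [INSTRUCTION]
    for i, (example_email, output) in enumerate(EXAMPLES, start=1):
        lines += [f"Example {i}", f'Email: "{example_email}"', f"Output: {output}"]
    lines += ["Now classify this email:", f'Email: "{email}"', "Output:"]
    return "\n".join(lines)


# Accepts replies shaped like "BILLING, 5", with optional surrounding whitespace.
REPLY_PATTERN = re.compile(r"^\s*([A-Z]+)\s*,\s*([1-5])\s*$")


def parse_reply(reply: str) -> tuple[str, int]:
    """Return (category, confidence) or raise if the reply breaks the format."""
    match = REPLY_PATTERN.match(reply)
    if match is None or match.group(1) not in CATEGORIES:
        raise ValueError(f"Reply does not match 'CATEGORY, score': {reply!r}")
    return match.group(1), int(match.group(2))


prompt = build_prompt("The app logs me out every few minutes.")
# Made-up reply standing in for a model response.
category, confidence = parse_reply("TECHNICAL, 4")
print(category, confidence)
```

Rejecting replies that do not match the demonstrated format gives you a simple signal for when to retry or route an email to a person.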
How Much Does Adding Examples Actually Cost?
Few-shot prompts cost more per call. The question is whether the extra cost is worth the accuracy gain.
A prompt with three 50-word examples adds roughly 200 tokens to every call (about 150 words at a little over one token per word). On GPT-5.5 at $5 per million input tokens, that's a tenth of a cent per call. On Claude Sonnet 4.6 at $3 per million input tokens, it's six hundredths of a cent. The cost difference is real but small.
The accuracy gain on structured tasks is typically 15 to 35 percent. Mem0's 2026 prompting benchmark report measured a 22 percent reduction in formatting errors and a 28 percent reduction in misclassifications on customer support classification tasks when moving from zero-shot to four-shot.
For one-off prompts, the cost analysis doesn't matter. For prompts you'll run 10,000 times in a workflow, the cost of inconsistent outputs is far higher than the cost of the extra tokens. This is why production AI workflows almost always use few-shot for structured tasks.
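The cost arithmetic above is easy to reproduce for your own prompt. This short Python sketch uses the 200-token figure and the two per-million-token prices quoted in this section; the function name is a hypothetical helper, and you should substitute your provider's current pricing.

```python
def added_cost_usd(
    extra_tokens: int, usd_per_million_input_tokens: float, calls: int = 1
) -> float:
    """Extra input cost, in US dollars, of sending extra_tokens on every call."""
    return extra_tokens * usd_per_million_input_tokens / 1_000_000 * calls


EXTRA_TOKENS = 200  # three 50-word examples, as estimated above

for price in (5.0, 3.0):  # USD per million input tokens, as quoted above
    per_call_cents = added_cost_usd(EXTRA_TOKENS, price) * 100
    per_10k_usd = added_cost_usd(EXTRA_TOKENS, price, calls=10_000)
    print(
        f"${price:.2f}/M tokens: {per_call_cents:.2f} cents per call, "
        f"${per_10k_usd:.2f} per 10,000 calls"
    )
```

At these prices the examples add about $10 or $6 across 10,000 calls, which is the figure to weigh against the cost of cleaning up inconsistent outputs.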
What's the One Habit That Levels Up Your Prompting Today?
Build a personal example bank. Every time you write a prompt that produces an output you like, save the input-output pair into a notes file. Tag it by task type. The next time you need to do a similar task, pull two or three from your bank, paste them as examples, and you'll skip the trial-and-error round that most users do every time.
This single habit separates intermediate AI users from the people who consistently get production-quality outputs on their first attempt. The examples are the work. The instruction is just the framing.
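An example bank does not need special tooling; a notes file works. If you prefer something you can query, the sketch below shows one possible shape in Python: a single JSON file with entries tagged by task type. The file layout, field names, and function names are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_example(
    bank_path: Path, task_type: str, example_input: str, example_output: str
) -> None:
    """Append one input/output pair, tagged by task type, to the bank file."""
    if bank_path.exists():
        bank = json.loads(bank_path.read_text(encoding="utf-8"))
    else:
        bank = []
    bank.append(
        {"task_type": task_type, "input": example_input, "output": example_output}
    )
    bank_path.write_text(
        json.dumps(bank, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def pull_examples(bank_path: Path, task_type: str, limit: int = 3) -> list[dict]:
    """Return up to `limit` of the most recently saved examples for a task type."""
    if not bank_path.exists():
        return []
    bank = json.loads(bank_path.read_text(encoding="utf-8"))
    matches = [entry for entry in bank if entry["task_type"] == task_type]
    return matches[-limit:]


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    bank_file = Path(tmp_dir) / "example_bank.json"
    save_example(bank_file, "classification", "I was charged twice.", "BILLING, 5")
    save_example(bank_file, "summary", "Long meeting notes", "Summary: short notes")
    save_example(bank_file, "classification", "Reset email never arrives.", "ACCOUNT, 4")
    picked = pull_examples(bank_file, "classification", limit=2)
    print([entry["output"] for entry in picked])
```

Returning the most recent matches is a simple default. You may prefer to pick examples by hand so that the close calls are represented.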
Few-shot prompting is one of the highest-leverage techniques in the prompting toolkit. It costs almost nothing extra, requires no new tools, works with every major model, and reliably improves the outputs that matter most: structured ones you'll use as part of a larger workflow. The next time your outputs feel inconsistent, ask yourself one question. Did I show the model what success looks like? If not, that's where to start.