Adaptive thinking is the prompting upgrade most practitioners are still missing
Anthropic shipped adaptive thinking in Claude Sonnet 4.6 and Claude Opus 4.8, and it changed how serious prompting works. Instead of telling Claude to "think step by step" inside your prompt, you set a single API parameter called effort, and the model decides on its own when to think hard and when to answer fast. Most practitioners are still writing chain-of-thought instructions Claude no longer needs.
Adaptive thinking is not extended thinking with a new name. Extended thinking forces a reasoning step on every request. Adaptive thinking is a switch that says "use extended thinking only when this problem deserves it." For agentic workflows, multi-tool pipelines, and long conversations, that distinction is the difference between a fast, focused model and one that wastes tokens overthinking trivial messages.
This article gives you the practical mental model for adaptive thinking, the four effort levels and when to use each, and the three kinds of instruction in your existing prompts that you should delete because Claude no longer needs them.
What is adaptive thinking in Claude?
Adaptive thinking is a mode in Claude Sonnet 4.6 and Claude Opus 4.8 where the model decides on each request whether to use extended reasoning, and how deep that reasoning should go. You enable it by setting thinking.type to "adaptive" in the API call, then choosing an effort level. Claude evaluates the prompt's complexity in real time and allocates thinking tokens accordingly.
The mechanism replaces the older approach where you specified budget_tokens, a fixed reasoning budget for every call. Budget tokens forced you to predict how hard a task would be before the model saw it. Adaptive thinking hands that decision to the model, which sees the actual input and is usually better placed to allocate reasoning than a developer guessing in advance. According to Anthropic's documentation, adaptive thinking also automatically enables interleaved thinking, meaning Claude can think between tool calls in an agentic workflow.
Practical implication: for any application that runs Claude across mixed task types (a customer support agent that handles both "what are your hours" and "process this refund request involving 5 line items"), adaptive thinking will let the model glide on simple cases and dig in on complex ones, without you having to route the request to different model configurations.
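To make the contrast concrete, here is a minimal sketch of the two thinking configurations as plain Python dictionaries. The function names are hypothetical. The adaptive shape (effort placed inside the thinking object) follows the layout described in this article and should be checked against the current API reference before you rely on it.

```python
from typing import Any, Dict


def fixed_budget_thinking(budget_tokens: int) -> Dict[str, Any]:
    """Older style: the caller guesses a reasoning budget before the model sees the task."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    return {"type": "enabled", "budget_tokens": budget_tokens}


def adaptive_thinking(effort: str = "high") -> Dict[str, Any]:
    """Adaptive style, using the field layout described in this article (an assumption)."""
    return {"type": "adaptive", "effort": effort}


old_config = fixed_budget_thinking(8000)
new_config = adaptive_thinking("medium")
print(old_config)
print(new_config)
```

The point of the comparison is that the adaptive form carries no token budget at all: the only thing the caller chooses is the effort level.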
What are the effort levels and when do you use each?
The effort parameter in adaptive thinking takes four values: minimal, low, medium, and high. Each controls how aggressively Claude explores possible reasoning paths before committing to an output. Sonnet 4.6 defaults to high. Each level has a clear best-fit use case, and using the wrong one wastes either tokens or accuracy.
minimal is for tasks where speed matters more than depth: classification, simple extractions, friendly chitchat, content rewrites in a fixed style. Claude almost never enters extended thinking. Latency and token cost are both at their lowest. Use this for high-volume tier-1 support, simple form data extraction, or short-answer FAQ responses.
low is the everyday setting for most business tasks: short reports, summaries that need accuracy, structured data transformations, basic SQL or code generation. Claude thinks selectively when it spots a tricky bit. This is the right default for most internal tooling where you do not want extended thinking on every request, but you do want it when the input gets complex.
medium is for genuinely hard problems where you expect Claude to reason: multi-step analysis, code review on a non-trivial codebase, debugging logic errors, strategic recommendations from raw data. Claude reliably enters extended thinking. Latency goes up, but the quality difference is visible on the kinds of tasks where extended thinking matters.
high is the maximum reasoning setting: complex research synthesis, multi-document reasoning, agentic workflows requiring planning across many tool calls, hard math, novel proofs. Claude almost always thinks, often deeply. This is the default for Sonnet 4.6 and the right choice for any task where being wrong costs more than being slow.
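The four levels above can be captured in a small lookup table. This is an illustrative sketch: the task labels are hypothetical names chosen for this example, and the mapping simply restates the best-fit use cases listed above rather than anything the API enforces.

```python
from typing import Dict

# Hypothetical task labels mapped to the effort levels described in this article.
EFFORT_BY_TASK: Dict[str, str] = {
    "classification": "minimal",
    "faq": "minimal",
    "summary": "low",
    "sql_generation": "low",
    "code_review": "medium",
    "debugging": "medium",
    "research_synthesis": "high",
    "agentic_planning": "high",
}

# The article states that Sonnet 4.6 defaults to high, so unknown tasks fall back to it.
DEFAULT_EFFORT = "high"


def effort_for(task_type: str) -> str:
    """Return the effort level for a task label, falling back to the default."""
    return EFFORT_BY_TASK.get(task_type, DEFAULT_EFFORT)


for task in ("faq", "code_review", "something_unlisted"):
    print(task, "->", effort_for(task))
```

Keeping the mapping in one place makes it easy to revise a single task's level after you measure real latency and quality.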
How do you actually call adaptive thinking?
The API call structure for adaptive thinking is simple. You add a thinking object to your request body with type "adaptive" and an effort level. Claude handles everything else. The same call works whether you are using Claude via the Anthropic API directly, via Amazon Bedrock, or via Vertex AI, with minor parameter naming differences.
Try this request structure right now (Anthropic API):
POST https://api.anthropic.com/v1/messages
Headers: x-api-key, anthropic-version: 2023-06-01, content-type: application/json
Body (JSON):
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "thinking": { "type": "adaptive", "effort": "medium" },
  "messages": [ { "role": "user", "content": "Your prompt here" } ]
}
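The same request can be assembled in Python with only the standard library. This is a sketch under stated assumptions: the body mirrors the JSON shown in this article (including the placement of effort inside thinking, which you should verify against the current API reference), the helper names are hypothetical, and the content-type header is added because JSON POST bodies generally need it. The demo lines only build the request; call send() yourself with a real key to actually contact the API.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.anthropic.com/v1/messages"


def build_body(prompt: str, effort: str = "medium", max_tokens: int = 4096) -> Dict[str, Any]:
    """Build the request body in the shape shown in this article."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive", "effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


def build_request(body: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Wrap the body in a POST request with the headers listed in this article."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(body: Dict[str, Any], api_key: str, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(body, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Demo: build the request without sending it. The key below is a placeholder.
body = build_body("Your prompt here")
req = build_request(body, "sk-placeholder")
print(req.get_method(), req.full_url)
print(json.dumps(body, indent=2))
```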
The response includes a thinking content block alongside the final answer when Claude chose to reason. You can read it for transparency, log it for audit, or strip it before showing the answer to your end user. The decision is yours.
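Separating the reasoning from the answer can be sketched as follows. The example assumes the response's content field is a list of blocks, where thinking blocks carry their text under a "thinking" key and answer blocks carry theirs under a "text" key. The sample response is made up for illustration, and split_content is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def split_content(response: Dict[str, Any]) -> Tuple[List[str], str]:
    """Return (thinking_texts, answer_text) from a decoded response body."""
    thinking: List[str] = []
    answer: List[str] = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
    return thinking, "".join(answer)


# Made-up response used only to exercise the helper.
sample_response = {
    "content": [
        {"type": "thinking", "thinking": "The refund covers 5 line items..."},
        {"type": "text", "text": "Your refund total is $42.10."},
    ]
}

reasoning, final_answer = split_content(sample_response)
print("log for audit:", reasoning)
print("show to user:", final_answer)
```

When Claude chooses not to reason, the thinking list simply comes back empty, so the same code path handles both cases.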
If you are using Claude.ai or Claude Code: adaptive thinking is on by default in supported models. You do not need to configure it. The same dynamic applies: the model decides per turn whether to think hard. That is why a quick "summarise this email" prompt feels fast while an "audit this 4000-line script for race conditions" prompt takes longer.
What prompt instructions should you delete now?
Adaptive thinking deprecates a chunk of the prompt scaffolding many practitioners still ship in production. Three categories of instructions are now actively counterproductive on Sonnet 4.6 and Opus 4.8. Deleting them shortens your prompts, reduces token cost, and often improves output quality because Claude is no longer fighting your instructions with its own reasoning behaviour.
Delete: "Think step by step before answering." Sonnet 4.6 thinks step by step on its own when the task calls for it and effort is medium or high. Adding this instruction can push Claude into verbose reasoning on simple requests. Replace it with the effort parameter at the API level. If you cannot set the API parameter (consumer Claude.ai), simply leave the instruction out; the default behaviour is now smarter than the prompt.
Delete: "Take your time and reason carefully." Same issue. Claude is already calibrating reasoning depth to the task. Soft instructions like this either get ignored or push the model toward overthinking. Specify your output requirements instead, for example "your answer must include reasoning for each step you propose."
Delete: "Think hard about whether the user really means X or Y." Replace with a structured disambiguation instruction: "If the user's intent is ambiguous between X and Y, ask one clarifying question before proceeding." Claude will use adaptive thinking to evaluate that condition, but the instruction tells it what to do when it does, which is far more useful than asking it to think harder.
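For the first two deletions, a prompt cleanup pass can be sketched with a regular expression. This is an illustrative helper, not an official tool: the phrase list covers only the exact wordings quoted above, and the third case is left out on purpose because it calls for a rewritten instruction rather than a deletion.

```python
import re

# Exact wordings quoted in this article; extend the list for your own prompts.
REASONING_PHRASES = [
    r"think step by step(?: before answering)?",
    r"take your time and reason carefully",
]

_PATTERN = re.compile(
    r"[ \t]*(?:" + "|".join(REASONING_PHRASES) + r")[ \t]*[.!]?",
    re.IGNORECASE,
)


def strip_reasoning_instructions(prompt: str) -> str:
    """Remove the listed reasoning instructions and tidy leftover spaces."""
    cleaned = _PATTERN.sub(" ", prompt)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    return cleaned.strip()


before = "You are a support agent. Think step by step before answering. Reply in under 100 words."
after = strip_reasoning_instructions(before)
print(after)
```

Run it over your stored system prompts as a one-off audit, and review the diffs by hand before shipping, since a blunt pattern match can remove a phrase you meant to keep.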
How do you choose the right effort level for your application?
The effort level choice maps to four questions about your application. Each question narrows your options. Within five minutes, you can lock in a starting level for any new build, then adjust as you observe real behaviour. The framework below is a practical starting point.
Question 1: What is your error tolerance? If a wrong answer costs the user time or money in a way that matters (financial calculation, legal review, medical context, customer escalation logic), default to high effort. If a wrong answer is annoying but recoverable in the next turn (chitchat, basic search, casual content drafting), minimal or low is fine.
Question 2: What is your latency budget? If your user is staring at a chat window expecting an answer within 3 seconds, you cannot run high effort on every turn. Adaptive thinking helps because Claude only invokes reasoning when needed, but high effort still adds latency on the calls where it triggers. Test your p95 latency at each level and pick the one that fits.
Question 3: How variable are your inputs? If your application sees a narrow input distribution (a structured intake form, a fixed-template request), you can predict the right effort level and lock it. If your inputs are wildly variable (open-ended chat, mixed-domain Q&A, a general-purpose agent), use medium or high so adaptive thinking can scale up on the hard ones without you needing to detect them.
Question 4: What does the bill look like? Extended thinking generates more tokens. Token cost scales with effort. For high-volume applications, run a representative sample at low and at medium effort and measure the quality delta. If the delta is small, low is the rational choice. If the delta is large, the higher quality is usually worth the cost.
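The four questions can be folded into a rough decision helper. Treat this as one possible reading of the framework, not a rule from Anthropic: the AppProfile fields, the 3-second threshold (borrowed from the latency example above), and the order of the checks are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    high_cost_of_error: bool   # Question 1: does a wrong answer cost real time or money?
    latency_budget_s: float    # Question 2: how long will the user wait?
    variable_inputs: bool      # Question 3: open-ended inputs or a narrow template?
    high_volume: bool          # Question 4: is token cost a first-order concern?


def choose_effort(profile: AppProfile) -> str:
    """Suggest a starting effort level; confirm it with your own measurements."""
    if profile.high_cost_of_error:
        return "high"
    if profile.variable_inputs:
        # Let adaptive thinking scale up on hard inputs, but respect tight latency.
        return "medium" if profile.latency_budget_s <= 3.0 else "high"
    # Narrow, recoverable inputs: pick the cheaper levels.
    if profile.high_volume or profile.latency_budget_s <= 3.0:
        return "minimal"
    return "low"


support_bot = AppProfile(high_cost_of_error=False, latency_budget_s=3.0,
                         variable_inputs=True, high_volume=True)
print(choose_effort(support_bot))
```

The output is only a starting point. Question 4 still applies afterwards: run a representative sample at the suggested level and at the one next to it, then compare quality and cost.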
What mistakes break adaptive thinking workflows?
Three mistakes consistently turn adaptive thinking from a win into a regression. Each is easy to diagnose once you know to look. Avoiding them keeps the upgrade working as designed.
Mistake 1: Mixing thinking instructions with effort settings. If you set effort to medium in the API and also write "think step by step" in your prompt, Claude gets a double signal and may overthink trivial cases. Pick one channel for thinking control: the parameter, not the prompt. Strip your prompt clean of reasoning instructions when effort is set.
Mistake 2: Hardcoding budget_tokens alongside adaptive. The two systems are mutually exclusive. If your code still has budget_tokens from an older Sonnet 4.5 setup, remove it when you switch to adaptive. Otherwise the request will fail or produce inconsistent behaviour. Audit any model-call wrapper before deploying.
Mistake 3: Treating effort as a quality knob to maximise. High effort is not "better." It is "more reasoning." For tasks that do not need reasoning (formatting, translation, simple extraction), high effort can hurt quality by introducing irrelevant intermediate steps. Match effort to task type, not to "give me your best answer."
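A pre-deployment check for these three mistakes could look like the sketch below. It inspects a request body in the shape used in this article. The helper names are hypothetical, the phrase list is deliberately short, and the reasoning_heavy_task flag is something you supply yourself, since the code cannot know what kind of task the request represents.

```python
from typing import Any, Dict, List

REASONING_PHRASES = ("think step by step", "take your time and reason carefully")


def _prompt_text(body: Dict[str, Any]) -> str:
    """Collect the system prompt and plain-string message contents, lowercased."""
    parts: List[str] = []
    system = body.get("system")
    if isinstance(system, str):
        parts.append(system)
    for message in body.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            parts.append(content)
    return "\n".join(parts).lower()


def audit_request(body: Dict[str, Any], reasoning_heavy_task: bool = True) -> List[str]:
    """Return a list of warnings for a request that uses adaptive thinking."""
    problems: List[str] = []
    thinking = body.get("thinking") or {}
    if thinking.get("type") != "adaptive":
        return problems
    # Mistake 2: a leftover fixed budget next to adaptive thinking.
    if "budget_tokens" in thinking:
        problems.append("budget_tokens is set alongside adaptive thinking")
    # Mistake 1: reasoning instructions in the prompt while effort is set.
    text = _prompt_text(body)
    for phrase in REASONING_PHRASES:
        if phrase in text:
            problems.append(f"prompt contains reasoning instruction: {phrase!r}")
    # Mistake 3: maximum effort on a task that does not need reasoning.
    if thinking.get("effort") == "high" and not reasoning_heavy_task:
        problems.append("high effort on a task that does not need reasoning")
    return problems


risky = {
    "thinking": {"type": "adaptive", "effort": "high", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Think step by step. Translate this to French."}],
}
for warning in audit_request(risky, reasoning_heavy_task=False):
    print("WARNING:", warning)
```

Wiring a check like this into the wrapper around your model calls catches stale configuration before it reaches production.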
Conclusion: Reasoning becomes infrastructure
Adaptive thinking moves Claude reasoning from a prompt-level trick into an infrastructure-level setting. You stop coaching the model in plain English and start tuning it through a parameter the model handles better than you can. That is the right direction. The practitioners who pick this up will spend less time wrangling chain-of-thought wording and more time on the parts of their workflow that actually need a human.
Adaptive thinking is one example of AI tools becoming smart enough that the work is no longer about clever prompts. It is about clean configuration and a clear product decision.