The Research Agent That Actually Reads 160 Sources For You
There is a Gemini feature called Deep Research Max that runs an autonomous research agent across up to 160 web queries and 900,000 input tokens for a single question. Most people who use Gemini have never touched it. The ones who have are quietly compressing what used to be a full day of research into a 30-minute background task.
Google launched Deep Research Max on April 21, 2026, built on the new Gemini 3.1 Pro reasoning model. It works differently from an ordinary chat exchange: instead of answering your question directly, it builds a research plan, executes dozens of searches, reads the source pages, refines its strategy, and returns a structured report with cited sources.
This is not the same thing as the original Deep Research that launched in late 2024. The new version handles roughly twice as many queries, processes private data through MCP (Model Context Protocol), and scored 93.3% on DeepSearchQA, up from 66.1% in December. If you do research work of any kind, from market analysis to competitive intelligence to due diligence, this is the most consequential AI feature shipped this quarter.
This guide covers what it actually does, when it is worth using, the prompt structure that produces the cleanest reports, and the specific tasks where it falls apart.
What Is Gemini Deep Research Max?
Deep Research Max is Google's autonomous research agent inside Gemini, designed for accuracy-critical investigations that synthesise hundreds of sources into a single cited report. It runs on Gemini 3.1 Pro, executes up to 160 web searches per task, and can process up to 900,000 input tokens. Tasks typically run 20 to 60 minutes in the background.
The standard Deep Research tier handles around 80 queries with 250,000 input tokens and runs in under 20 minutes. Deep Research Max is the heavier tier, designed for tasks where comprehensiveness matters more than speed.
The agent works in four stages. First, it decomposes your question into sub-questions. Second, it generates search queries that cover the topic from multiple angles. Third, it reads the actual content of source pages, not just snippets. Fourth, it iterates, finding gaps and adjusting its strategy until the report is complete.
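For readers who think in code, here is a minimal, runnable sketch of that four-stage loop. Every function below is a stub we invented for illustration; none of this is Google's actual implementation, and the real planner, search, and synthesis stages are model-driven rather than hard-coded.

```python
# Conceptual sketch of the plan -> search -> read -> iterate loop.
# All functions are illustrative stubs, not Deep Research Max internals.

def decompose(question: str) -> list[str]:
    # Stage 1 stub: a real planner derives sub-questions from the prompt.
    return [f"{question} (angle {i})" for i in range(1, 4)]

def generate_queries(open_questions: list[str]) -> list[str]:
    # Stage 2 stub: cover each open sub-question from multiple angles.
    return [f"search: {q}" for q in open_questions]

def read_source(query: str) -> str:
    # Stage 3 stub: the real agent reads full source pages, not snippets.
    return f"notes from sources found via '{query}'"

def find_gaps(open_questions: list[str], findings: list[str]) -> list[str]:
    # Stage 4 stub: a real agent re-plans around missing evidence;
    # here we simply declare coverage complete after one pass.
    return []

def deep_research(question: str, max_queries: int = 160) -> str:
    open_questions = decompose(question)
    findings: list[str] = []
    queries_used = 0
    while open_questions and queries_used < max_queries:
        for query in generate_queries(open_questions):
            if queries_used >= max_queries:
                break
            findings.append(read_source(query))
            queries_used += 1
        open_questions = find_gaps(open_questions, findings)
    return "\n".join(findings)  # the real output is a structured, cited report

print(deep_research("EV battery supply chain"))
```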
The output is a structured document with inline citations, often with native charts and tables generated from the source data. You can view the full research plan before execution and edit it if the agent has misunderstood your scope.
When Is Deep Research Max Worth Using?
Use Deep Research Max when the question is broad enough to require multiple sub-investigations, the answer needs cited sources, and you can wait 20 to 60 minutes for the result. For quick factual lookups, fast follow-ups, or single-source questions, the standard Gemini chat is faster and cheaper.
Three task types are where this tool earns its keep. Market and competitor analysis: producing a 20-page report on a sector, its key players, recent moves, and pricing data. Due diligence: investigating a company, a regulatory framework, or a technology vendor before a decision. Trend reports: pulling together discussions, news, and academic work on a fast-moving topic.
The cost reality matters. A typical Deep Research Max run consumes 500K to 2M tokens, putting individual tasks in the $2 to $15 range when called via API. On the Gemini app at the AI Pro or AI Ultra tier, it is included in your subscription quota, but the tasks count against your daily limit.
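A quick back-of-envelope check makes the API pricing concrete. The per-million-token rates below are placeholders we chose so the output matches the $2 to $15 range above; substitute Google's current published pricing before budgeting anything on this basis.

```python
# Rough cost model for an API-driven run. The rates are illustrative
# assumptions, not published Gemini pricing.

def run_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of a run at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million

for tokens in (500_000, 2_000_000):                # typical run, per the text
    low = run_cost(tokens, rate_per_million=4.0)   # assumed low blended rate
    high = run_cost(tokens, rate_per_million=7.5)  # assumed high blended rate
    print(f"{tokens:>9,} tokens: ${low:.2f} to ${high:.2f}")
```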
Skip this tool for anything that needs real-time data, anything where the answer comes from a single authoritative source, or anything you could verify with two minutes on Google. The agent is designed for synthesis at scale, not for speed.
How Do You Write a Prompt That Produces a Useful Report?
The single biggest determinant of output quality is the structure of the input prompt. Vague questions produce shallow reports. Structured prompts with explicit sub-questions, source preferences, and output format produce reports you can actually use without rewriting.
The framework that works has four components. Goal: state the decision the report should inform, not just the topic. Sub-questions: list five to ten specific questions the report must answer. Source guidance: tell the agent which kinds of sources to prioritise and which to deprioritise. Output format: specify the exact structure you want, including section headings, table formats, and any required comparisons.
Here is a complete prompt template you can copy and adapt:
Try This Prompt:
I am evaluating [TOPIC] to inform [SPECIFIC DECISION]. Produce a research report that answers the following sub-questions:
1. [Specific question 1]
2. [Specific question 2]
3. [Specific question 3]
4. [Specific question 4]
5. [Specific question 5]
Source priorities:
- Prioritise: official documentation, peer-reviewed research, financial filings published in the last 12 months, statements from named executives.
- Deprioritise: marketing blogs, content farms, opinion pieces without data, sources older than 24 months unless historical context is required.
Output format:
- Executive summary, 5 bullet points, each under 25 words.
- One H2 section per sub-question, each containing a 60-word answer capsule, supporting evidence with inline citations, and one comparison table where relevant.
- Final section: open questions and gaps in the available research.
Tone: analytical, neutral, no marketing language. Cite every numeric claim.
This template forces the agent to commit to a structure, which in turn forces the planner stage to allocate searches across each sub-question rather than over-investing in one.
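To calibrate how specific "specific" means in practice, here is the top of the template filled in for an invented scenario; the topic, decision, and sub-questions are ours, purely for illustration:
I am evaluating European warehouse-automation vendors to inform a shortlist decision this quarter. Produce a research report that answers the following sub-questions:
1. Who are the five largest vendors by installed base, and how has that ranking changed in the last two years?
2. What do their published case studies claim about deployment time and cost, and which of those claims are independently verified?
3. Which vendors have had recalls, outages, or regulatory findings in the last 24 months?
The remaining sub-questions, source priorities, and output format carry over from the template unchanged.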
How Do You Get Consistent Results from Deep Research Max?
Consistency comes from controlling three variables: scope, source preferences, and the research plan review step. The agent allows you to review and edit the plan before it executes the searches. This is the single most underused control in the entire feature.
When you submit a prompt, the agent first generates a plan: typically 8 to 15 sub-investigations, each with proposed search queries and source types. You can edit any of them. If the plan covers an irrelevant angle or misses a critical sub-question, fix it here. Five minutes of plan editing saves you a 30-minute report that misses the point.
The second consistency lever is recency control. By default the agent applies no date cutoff to sources. If you are investigating something that moved in the last six months, add an explicit one: "Only cite sources published after [DATE]. If a critical claim requires older sources, flag it explicitly." This prevents the report from leaning on stale 2023 articles.
The third lever is contradiction handling. Real research sources contradict each other. Tell the agent how to handle this: "When sources disagree, present both positions, name the source of each, and explain the methodological difference. Do not silently pick one." Without this instruction, the agent often defaults to the most-cited position, even when it is the weaker one.
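Scope is set in the prompt itself and the plan review happens in the Gemini interface, but the other two levers can live in a reusable suffix. Here is one you can append to any Deep Research Max prompt; the wording is ours, so adapt it freely:
Only cite sources published after [DATE]. If a critical claim requires older sources, flag it explicitly.
When sources disagree, present both positions, name the source of each, and explain the methodological difference. Do not silently pick one.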
Where Does Deep Research Max Fall Apart?
Deep Research Max struggles with three categories of work: real-time data, niche technical topics with thin web coverage, and tasks where the answer requires reasoning beyond the source material. Knowing the failure modes helps you avoid wasting a 30-minute run on something it will not handle well.
The first failure mode is real-time data. If you ask for "the current stock price" or "today's exchange rate," the agent will pull whatever cached numbers appear in its searches, often stale by hours or days. Use the live web search in standard Gemini chat for these.
The second failure mode is thin coverage. The agent is only as good as the sources it can find. For very narrow technical topics, regional regulatory questions, or niche industries, the report can be padded with adjacent material that does not actually answer the question. If you suspect thin coverage, run a quick standard Gemini search first to gauge how much real source material exists.
The third failure mode is original analysis. The agent synthesises what is in its sources. It does not generate genuinely new analytical frameworks or independently verify contested claims. If your question requires "is this argument valid" rather than "what do experts say," you need a different tool: often a back-and-forth conversation with Claude or GPT-5.5.
The honest summary: Deep Research Max is the best AI tool available right now for synthesising existing knowledge into a structured cited report. It is not a substitute for a domain expert, a live data feed, or your own judgement on what the synthesis means.
How Does Deep Research Max Compare to ChatGPT Deep Research and Claude Research?
All three major AI vendors now ship a research agent. Each has different strengths, and the right choice depends on the type of report you need. As of May 2026, Deep Research Max leads on benchmark scores and source coverage, ChatGPT Deep Research leads on conversational depth, and Claude Research leads on document-grounded analysis.
Deep Research Max scored 93.3% on DeepSearchQA and 54.6% on Humanity's Last Exam. ChatGPT Deep Research, running on GPT-5.5, scored 91.2% on DeepSearchQA. These are close, but the practical difference is in how they handle ambiguity. Deep Research Max tends to enumerate possibilities and cite source disagreement. ChatGPT Deep Research often produces a more confident, narrative-style answer.
For competitive analysis or market reports, Deep Research Max is the better choice because of its higher source ceiling and inline visualisations. For exploratory thinking or topics where you want a back-and-forth refinement, ChatGPT Deep Research is more flexible. For analysing a specific document set or your own uploaded research, Claude Research handles long-context grounding better.
If you are committing to one for daily use, Deep Research Max has the strongest output structure right now. If you already pay for ChatGPT Plus or Claude Pro, the marginal value of switching depends on how often you need a fully cited 20-page report rather than a quick analytical answer.
Try It Now: A 30-Minute Test Drive
The best way to learn what Deep Research Max can and cannot do is to run a single test on a question you already know the answer to. This calibrates your expectations for production use.
Pick a topic you have researched manually in the past three months: a competitor, a market, a regulation, a technology. Use the prompt template above to produce a structured report. Compare the output against what you found yourself.
You will notice three things. First, the agent finds sources you missed. Second, it misses some you found, especially anything paywalled or behind a login. Third, the synthesis is good but generic: the kind of report a competent analyst would produce after one day of research. Once you understand that baseline, you can decide where it adds value to your actual workflow.
Most practitioners who integrate Deep Research Max use it for a specific class of tasks: weekly competitor monitoring, quarterly market scans, and pre-meeting briefings on companies or sectors. They do not use it for daily questions, real-time data, or anything where speed matters more than depth.
This is the dividing line between using AI tools and being limited by them. The people pulling ahead right now are not the ones writing cleverer prompts. They are the ones who know which tool to reach for, which tasks each tool actually handles well, and which to leave for human judgement. UD has been helping Hong Kong businesses bridge technology and human work for 28 years. We understand AI, and we understand you even better. With UD at your side, AI is never cold.
Build a Reliable AI Workflow Around Tools Like This
Knowing how to use Deep Research Max is one piece. Building it into a workflow that runs reliably every week is the next step. We'll walk you through every stage, from tool selection to prompt templates to deployment, so AI actually compounds value in your work.