A Google Research preprint reports that a dead-simple trick — duplicating the entire input prompt — improves performance for popular non-reasoning LLMs across many benchmarks, while keeping outputs the same length.
What happened
A new preprint from Google Research, titled Prompt Repetition Improves Non-Reasoning LLMs, claims that simply repeating the full user prompt (turning <QUERY> into <QUERY><QUERY>) consistently boosts accuracy when you ask models not to use explicit reasoning.
Headline result: the authors report prompt repetition wins 47 out of 70 benchmark–model tests, with 0 losses, under their significance criterion — when reasoning is disabled.
They tested seven widely used models from major providers (Gemini, GPT, Claude, DeepSeek) on standard benchmarks (including ARC, OpenBookQA, GSM8K, MMLU-Pro, MATH) plus two custom “needle-in-a-haystack” style tasks designed to stress long-context behavior.
Why would repeating a prompt help?
The paper’s intuition is rooted in how most LLMs are trained: as causal language models, tokens attend only to previous tokens, never to future ones. In practice, that means the order of information in your prompt can matter a lot.
For example, a prompt shaped like <CONTEXT> <QUESTION> can behave differently than <QUESTION> <CONTEXT>. In multiple-choice settings, placing the answer options first can be especially awkward, because the model reads the options before it knows what the question is.
Repeating the entire prompt effectively gives tokens a “second pass” where important details appear again later in the context, making it easier for the model to align the right parts together — without changing your expected output format.
What the paper highlights
- Best gains appear when reasoning is not used.
- Usually no increase in generated output length.
- Latency stays similar because repetition happens in the prefill stage (with some exceptions on very long prompts).
- Biggest boosts show up in tasks that stress long prompts and position sensitivity.
The practical version: how to do it
The simplest implementation is brutally literal: take the exact prompt you were going to send, and paste it twice.
Base pattern
<QUERY>
<QUERY>
Tip: keep the second copy identical. The point is not to “rephrase,” but to repeat.
A cleaner variant (same idea, less awkward to read)
<QUERY>
Let me repeat that:
<QUERY>
The paper also explores “repeat ×3” variants and a more verbose repetition template. In some long-context stress tests, repeating three times can outperform repeating twice — but it also risks bumping into context limits and (for some providers) can increase prefill time on very large inputs.
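In code, the trick is essentially a one-liner. Here is a minimal sketch in Python; the function name, the separator, and the `n` parameter are my own framing of the idea, not an API from the paper:

```python
def repeat_prompt(prompt: str, n: int = 2, separator: str = "\n\n") -> str:
    """Concatenate the prompt with itself n times.

    n=2 is the base pattern (<QUERY><QUERY>);
    n=3 corresponds to the "repeat x3" variant.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return separator.join([prompt] * n)


# Example: duplicate a prompt before sending it as the user message.
doubled = repeat_prompt("Reply with ONE letter only.\nQuestion: {{QUESTION}}")
```

Pass the result wherever your pipeline puts the user message; nothing downstream changes, because the expected output format stays the same.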
When this is most likely to help
If you only remember one thing: prompt repetition is a “non-reasoning” booster. So it’s most useful when you want quick, direct answers rather than step-by-step chains.
High-impact use cases
- Multiple choice tasks where the model can get confused by ordering (especially “options-first” formats).
- Long prompts with constraints: formatting rules, guardrails, and “don’t do X” instructions.
- Extraction tasks: “Return only the email + phone number,” “List only the entities,” etc.
- Needle-in-a-haystack scenarios: where one crucial detail in a long context must not be missed.
- Classification with strict output schema.
When it might do little (or you shouldn’t bother)
- If you already use an explicit reasoning mode that generates long chains (the paper finds repetition is mostly neutral there).
- If your prompt is already near the model’s context limit.
- If you’re optimizing for minimal input tokens (repetition obviously increases input length).
Real prompt examples you can copy
1) Strict formatting (JSON only)
You are a data extraction system.
Extract the following fields from the text: full_name, company, job_title.
Return ONLY valid JSON. No markdown. No commentary.
TEXT:
{{PASTE_TEXT_HERE}}
You are a data extraction system.
Extract the following fields from the text: full_name, company, job_title.
Return ONLY valid JSON. No markdown. No commentary.
TEXT:
{{PASTE_TEXT_HERE}}
2) Multiple-choice with options-first (the “risky” format)
Options:
A) ...
B) ...
C) ...
D) ...
Question:
{{QUESTION}}
Reply with ONE letter only.
Options:
A) ...
B) ...
C) ...
D) ...
Question:
{{QUESTION}}
Reply with ONE letter only.
3) Long instruction set (policy + constraints)
Write a 700-word article in HTML.
Constraints:
- No bullet lists
- Use H2 headings
- Keep tone punchy and practical
- Include one short FAQ at the end
Topic: {{TOPIC}}
Write a 700-word article in HTML.
Constraints:
- No bullet lists
- Use H2 headings
- Keep tone punchy and practical
- Include one short FAQ at the end
Topic: {{TOPIC}}
The trade-offs (don’t skip this part)
Prompt repetition isn’t magic — it’s a cost/benefit swap: you pay in input tokens to gain reliability, especially in non-reasoning mode.
Important: the preprint reports that output length generally stays the same, and measured latency is typically similar, but extremely long repeated prompts can slow down some providers in the prefill stage.
Quick rules of thumb
- If your prompt is short-to-medium: repetition is often a “free” accuracy bump (input costs aside).
- If your prompt is long: try repeating only when the task is high-value (extraction, compliance, strict formats).
- If you’re near context limits: consider repeating only the most critical block (constraints/output format) instead of everything.
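For that near-limit case, one option is to keep the long context once and restate only the critical block. A sketch of that idea (my own variant for illustration; the paper lists partial repetition as future work, not a tested result):

```python
def repeat_constraints(context: str, constraints: str, n: int = 2) -> str:
    """Keep the long context once, but restate the critical
    constraints/output-format block n times after it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "\n\n".join([context] + [constraints] * n)


# Example: a long document plus a twice-stated output rule.
prompt = repeat_constraints("{{LONG_DOCUMENT}}", "Return ONLY valid JSON.")
```

This spends far fewer extra tokens than full duplication, at the cost of leaving the long context itself unrepeated.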
Why this matters for everyday AI users
Most prompt tricks fall into two buckets: they either (1) add lots of reasoning tokens (“think step by step”), or (2) require complex templates and examples. Prompt repetition is different: it’s drop-in, keeps your output format stable, and targets a real weakness of causal LMs, namely position sensitivity.
If you build workflows where LLMs must behave consistently — customer support macros, extraction pipelines, agents that call tools, or content systems that must follow strict HTML/JSON rules — this is the kind of small hack that can quietly lower your error rate.
The authors also list future directions like repeating parts of the prompt, repeating recently generated tokens during generation, and exploring multi-turn conversation effects — which could turn this “copy/paste hack” into a more principled reliability technique.
FAQ
Does repeating the prompt help with reasoning models too?
In the preprint’s experiments, when reasoning is enabled the effect is reported as mostly neutral to slightly positive overall. The bigger, cleaner gains show up when reasoning is disabled.
Isn’t this just “making the prompt longer”?
The paper includes a padding baseline (adding filler characters to match length) and reports it does not produce the same gains, suggesting the improvements come from repetition itself, not merely longer inputs.
Should I repeat twice or three times?
Twice is the easiest default. The paper explores a repeat ×3 variant that can outperform ×2 on some long-context stress tests, but it increases input tokens further and can stress context limits.