We fine-tuned Google’s Gemma 3 (270M) to reverse the typical LLM workflow: given an AI-generated response, the model reconstructs the most likely prompt that produced it. We generated 100,000 synthetic prompt-response pairs using Gemini 2.5 Flash, trained for a single epoch on a consumer GPU, and built a Streamlit app that sweeps 24 decoding configurations to produce ranked prompt candidates. The model demo runs on CPU and is available here.
The Idea
Large language models take prompts and produce responses. We wanted to see if a small model could learn to do the opposite: take a response and work backwards to the prompt.
This isn’t about recovering the exact original prompt; the goal is to surface the most plausible prompts, ranked by model confidence. Think of it as asking: “What question would most naturally lead to this answer?”
Training Data Generation
The training pipeline has two stages, both powered by Gemini 2.5 Flash via Vertex AI.
Stage 1: Prompt generation. We generated 100,000 diverse prompts across five categories designed to cover different user behaviours:
- Mid-tail, search query style (single or multi-faceted)
- Long-tail, search query style (multi-faceted)
- Simple, prompt-like (single-faceted)
- Typical, prompt-like (single or multi-faceted)
- Detailed, prompt-like (multi-faceted)
Each API call generated a batch of 100 prompts as JSON with thinking disabled. We ran 100 concurrent calls, stored results in SQLite, and had the full dataset in minutes.
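The generation loop can be sketched with asyncio. Here `generate_batch` is a hypothetical placeholder for the real Gemini 2.5 Flash call (the actual Vertex AI client and the SQLite storage step are omitted):

```python
import asyncio

CATEGORIES = [
    "mid-tail search query",
    "long-tail search query",
    "simple prompt",
    "typical prompt",
    "detailed prompt",
]

async def generate_batch(category: str) -> list[str]:
    # Hypothetical stand-in for the real Gemini 2.5 Flash call, which
    # returns a JSON array of 100 prompts with thinking disabled.
    await asyncio.sleep(0)
    return [f"{category} #{i}" for i in range(100)]

async def generate_prompts(n_batches: int, concurrency: int = 100) -> list[str]:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight API calls at 100

    async def worker(i: int) -> list[str]:
        async with sem:
            return await generate_batch(CATEGORIES[i % len(CATEGORIES)])

    batches = await asyncio.gather(*(worker(i) for i in range(n_batches)))
    return [p for batch in batches for p in batch]

# 1,000 batches of 100 would yield the full 100,000-prompt dataset.
prompts = asyncio.run(generate_prompts(n_batches=10))
```

The semaphore is what keeps exactly 100 requests in flight at a time while `asyncio.gather` preserves batch order.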
Stage 2: Response generation. Each of the 100,000 prompts was sent back to Gemini 2.5 Flash to produce a corresponding AI assistant response. Same concurrency, same speed. The result: 100,000 prompt-response pairs ready for training.
Data Preparation
The key design decision was how to format the training data. We needed the model to learn a clear boundary between the response (input) and the prompt (target). We settled on a simple separator:
{response}\n###\n{prompt}<eos>
During tokenization, we masked the loss over the response and separator tokens (setting labels to -100) so the model only learns to predict the prompt portion. This is critical: without masking, the model would waste capacity learning to reproduce the response text rather than focusing on the reverse mapping.
Sequences were capped at 2,048 tokens. Tokenization was batched in groups of 5,000 to manage memory, then concatenated into a single dataset.
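A minimal sketch of the formatting and masking logic, using a toy character-level encoder as a stand-in for the Gemma tokenizer (the real pipeline would use the Hugging Face tokenizer and process batches of 5,000):

```python
SEP = "\n###\n"
IGNORE_INDEX = -100  # positions PyTorch's cross-entropy loss ignores
MAX_LEN = 2048

def encode(text: str) -> list[int]:
    # Toy stand-in for the Gemma tokenizer: one id per character.
    return [ord(c) for c in text]

def build_example(response: str, prompt: str, eos_id: int = 1) -> dict:
    # Format: {response}\n###\n{prompt}<eos>
    input_ids = encode(response + SEP)
    n_masked = len(input_ids)  # response + separator tokens
    input_ids = (input_ids + encode(prompt) + [eos_id])[:MAX_LEN]
    # Mask the loss over response + separator so the model only
    # learns to predict the prompt portion.
    labels = [IGNORE_INDEX] * min(n_masked, len(input_ids)) + input_ids[n_masked:]
    return {"input_ids": input_ids, "labels": labels}

ex = build_example("Paris is the capital of France.",
                   "What is the capital of France?")
```

The labels mirror the input ids except over the masked prefix, which is exactly what makes the model spend its capacity on the reverse mapping rather than on copying the response.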
Model Selection
We chose Gemma 3 270M for several reasons:
- Size. At 270M parameters, it’s small enough to train on a single consumer GPU and fast enough to run inference on CPU. This matters for a free demo.
- Architecture. Gemma 3 uses a mix of sliding window and full attention layers, giving it a good balance of local and global context within its 2,048-token training window.
- Capability. Despite its size, Gemma 3 270M has a 262K vocabulary and was pretrained on enough data to have reasonable language understanding out of the box.
A larger model would almost certainly perform better, but the goal was a practical tool that could run anywhere, not a benchmark result.
Training
Training was straightforward. Full fine-tune, single epoch, on an NVIDIA RTX 4090.
| Parameter | Value |
|---|---|
| Method | Full fine-tune |
| Precision | bfloat16 |
| Batch size | 2 (effective 16 with gradient accumulation) |
| Learning rate | 5e-5 |
| Optimizer | AdamW (torch fused) |
| Warmup steps | 100 |
| Gradient checkpointing | Enabled |
| Training time | 4 hours 14 minutes |
One epoch was sufficient. The loss curve showed steady convergence without signs of underfitting, and we wanted to avoid overfitting on synthetic data where the model might memorise specific phrasing patterns rather than learning the general reverse mapping.
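Assuming the run used Hugging Face's Trainer, the table translates into roughly this configuration (a sketch of the settings, not the exact training script):

```python
from transformers import TrainingArguments

# Effective batch size 16 = 2 per device x 8 gradient-accumulation steps.
args = TrainingArguments(
    output_dir="gemma3-270m-reverse-prompter",  # hypothetical name
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    warmup_steps=100,
    bf16=True,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
)
```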
Inference Strategy
A single generation pass from the model produces one candidate prompt. To get a diverse set of candidates, we sweep across 24 contrastive search configurations by varying two parameters:
- top_k: [2, 4, 6, 15]
- penalty_alpha: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
Contrastive search balances token probability with a degeneration penalty, which encourages diverse yet coherent outputs. Different configurations produce different candidate prompts from the same input.
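Assuming Hugging Face's `generate` interface (where passing `top_k` together with `penalty_alpha` triggers contrastive search), the sweep is just the Cartesian product of the two grids:

```python
from itertools import product

TOP_K = [2, 4, 6, 15]
PENALTY_ALPHA = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# 4 top_k values x 6 penalty_alpha values = 24 decoding configurations,
# each passed as keyword arguments to model.generate().
CONFIGS = [
    {"top_k": k, "penalty_alpha": a}
    for k, a in product(TOP_K, PENALTY_ALPHA)
]
```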
Each candidate is then scored by perplexity: we run the full sequence (response + separator + generated prompt) through the model, take the average negative log-probability over the prompt tokens only, and exponentiate it. Lower perplexity means the model finds that prompt more natural given the response.
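The scoring step reduces to a few lines once per-token log-probabilities are in hand; a self-contained sketch (in practice the log-probabilities come from a forward pass over the full sequence):

```python
import math

def prompt_perplexity(logprobs: list[float]) -> float:
    # logprobs: per-token log-probabilities for the prompt portion only;
    # response and separator tokens are excluded from the average.
    return math.exp(-sum(logprobs) / len(logprobs))

def rank_candidates(candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    # Lower perplexity = the model finds the prompt more natural.
    scored = [(p, prompt_perplexity(lp)) for p, lp in candidates.items()]
    return sorted(scored, key=lambda s: s[1])
```

For example, a prompt whose tokens each had probability 0.5 scores a perplexity of exactly 2.0.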
The top 10 candidates are displayed with per-token confidence visualisation, where each word’s opacity reflects how confident the model was in predicting it.
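One simple way to implement that opacity mapping (a hypothetical sketch; the app's exact formula isn't shown here) is to scale each token's probability into an opacity with a readability floor, so low-confidence tokens fade but stay legible:

```python
import math

def token_opacity(logprob: float, floor: float = 0.2) -> float:
    # Map token probability (0..1) onto opacity in [floor, 1.0].
    prob = math.exp(logprob)
    return floor + (1.0 - floor) * prob
```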
The Tool
The Streamlit app has two modes.
Paste mode is the primary interface. Paste any AI-generated text, click Reconstruct Prompts, and the model generates ranked candidates. The results include a prompt table with perplexity scores and per-token confidence bar charts, a key phrases panel that extracts the most important shared phrases across candidates, and a word frequency heatmap.
URL mode is experimental. Enter a URL and the app scrapes the page content via the DataForSEO API, converts it to markdown, and runs it through the model. This isn’t the intended use case since the model was trained on AI assistant responses, not web pages. But it produces interesting results: the reconstructed “prompts” reveal what the model considers the core semantic intent of the page content. It’s less prompt reconstruction and more semantic summarisation through the lens of “what question would this page answer?”
Possible Uses
Prompt engineering. Understanding what prompts lead to certain outputs helps refine prompt design. If you have an output you like, reverse prompting can suggest more efficient or precise ways to get there.
Content analysis. Running web content through the model reveals what the model perceives as the core intent behind the text. This could be useful for understanding how AI models interpret and categorise content.
AI content forensics. While this isn’t a detector (it doesn’t classify text as AI-generated or not), the confidence scores and perplexity values could serve as signals. Text that was genuinely produced by an AI assistant in response to a clear prompt may produce lower-perplexity reconstructions than text that wasn’t.
Training data curation. When building datasets, reverse prompting can help verify that responses actually match their intended prompts, or surface cases where the mapping is ambiguous.
Insights
A few things we noticed during development:
Synthetic data works. The model was trained entirely on Gemini-generated data and generalises to outputs from other models. The reverse mapping from response to prompt is more about structure and intent than model-specific quirks.
Small models can learn non-trivial mappings. At 270M parameters, this model is tiny by current standards. Yet it reliably produces sensible prompt reconstructions. The task is well-constrained enough that a small model can handle it.
Diversity in decoding matters more than model size. The contrastive search sweep across 24 configurations produces more useful results than a single greedy decode from a larger model would. The ranking by perplexity then surfaces the best candidates.
The separator matters. We tested several formats. The simple \n###\n separator worked best, likely because it’s distinct enough that the model learns a clean boundary between input and output.
The model and code are available on Hugging Face, and a live demo runs at https://dejan.ai/tools/reverse-prompter/
