Methodology

The Adversarial Loop: How Redteam AI Works

Published by Redteam AI — March 2026

The problem with asking one mind to think harder

When you ask a single AI model to review its own reasoning, you are not getting a second opinion. You are getting the same opinion expressed with more thoroughness.

This is not a failure of capability. It is a structural limitation. Every AI model is trained on a specific distribution of data, shaped by a specific set of human preferences, and optimized toward a specific set of objectives. The result is a system with genuine strengths and genuine blind spots — areas where it will reliably find problems and areas where it will reliably miss them.

When that system reviews its own output, it brings the same strengths and the same blind spots to the review that it brought to the original generation. It will find the things it is disposed to find. It will miss the things it is disposed to miss. Asking it to think longer or harder does not change the distribution of what it sees.

This is not a new problem. Humans have known for centuries that individual judgment is systematically limited in predictable ways. It is why we invented the adversarial legal system, where opposing counsel is specifically tasked with finding every flaw in an argument. It is why science requires peer review rather than self-certification. It is why military planning uses red teams whose only job is to find what the blue team missed. It is why the Socratic method produces better thinking than individual reflection.

Every institution humanity has built to produce reliable judgment under high stakes has converged on the same insight: truth emerges more reliably from structured conflict between genuinely different perspectives than from any single perspective thinking carefully.

Redteam AI is that insight applied to AI-assisted decision making.

What the adversarial loop is

The adversarial loop is a structured four-stage reasoning process that runs any decision, idea, or piece of code through a sequence of genuinely opposing perspectives before producing a synthesis.

The four stages are not variations on the same analysis. They are structurally different tasks performed by structurally different systems optimized for structurally different objectives.
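As an orientation, the flow of the loop can be sketched in a few lines. This is an illustrative skeleton, not Redteam AI's implementation: the function names and stub bodies are placeholders for the model calls described in the stages that follow. What the sketch does show accurately is the data flow, with each stage consuming the outputs of the stages before it.

```python
# Illustrative skeleton of the four-stage adversarial loop.
# Function names and stub bodies are placeholders, not Redteam AI's API.

def clarify(submission: str) -> str:
    # Stage 0: normalize vague input before any analysis begins.
    return submission.strip()

def advocate(submission: str) -> str:
    # Stage 1: strongest honest case for the submission as it stands.
    return f"Strongest case for: {submission}"

def prosecute(submission: str, case: str) -> list[str]:
    # Stage 2: structured attack on the Advocate's case.
    return [f"Attack on: {case}"]

def refine(submission: str, case: str, attacks: list[str]) -> str:
    # Stage 3: revised submission, highest-severity risk addressed first.
    return f"Revised: {submission}"

def adversarial_loop(submission: str) -> str:
    clarified = clarify(submission)
    case = advocate(clarified)
    attacks = prosecute(clarified, case)
    return refine(clarified, case, attacks)
```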

Stage 0 — Clarification

Before any analysis begins, the submission is normalized. Vague inputs produce generic outputs. A submission that says “I want to build an AI company” cannot be specifically attacked because there is nothing specific to attack. The clarification stage forces precision before the loop runs — detecting underspecified fields, missing assumptions, and overconfident framing that would cause the downstream analysis to miss the actual load-bearing risks.

This stage is unglamorous and invisible to the user. It is also essential. The quality of adversarial analysis is bounded by the quality of the input. Garbage in, polished garbage out.

Stage 1 — Advocacy

The Advocate builds the strongest honest case for the submitted idea, decision, or code as it currently exists.

The word honest is doing important work here. The Advocate does not fix weaknesses before the attack. It does not present an improved version of the submission. It presents the submission at its best in its current form — the strongest defensible version of what was actually submitted, not what the submitter wishes they had submitted.

This constraint is counterintuitive but critical. If the Advocate patches weaknesses before the Prosecutor attacks them, you lose the most valuable signal in the entire loop: which holes the Advocate missed that the Prosecutor finds. The gap between what the Advocate defends and what the Prosecutor attacks is where the insight lives.

The Advocate identifies the real customer or stakeholder, articulates the mechanism of value with precision, names the critical assumptions the submission depends on as falsifiable claims, and makes the case for why this idea or decision or code is worth taking seriously. It does this not to be encouraging but to ensure the Prosecutor is always attacking a strong case rather than a weak one.

Stage 2 — Prosecution

The Prosecutor's only job is to break the case the Advocate just built.

The attack is structured — not a freeform critique but a disciplined examination calibrated to surface the failure modes most commonly missed when a single perspective evaluates its own reasoning. It examines the submission across multiple dimensions: the assumptions it depends on, the alternatives it ignores, the execution risks it underweights, and the structural vulnerabilities that surface-level analysis tends to skip entirely.

The Prosecutor's output is ordered by severity — risks classified as Fatal, Serious, or Manageable — and ends with two outputs that distinguish Redteam AI from every other analysis tool: the hidden assumption and the uncomfortable truth.

The hidden assumption is the premise the submitter has not examined because examining it would be uncomfortable. It is not the most obvious risk. It is the risk underneath the risk — the foundational belief that the entire submission rests on that nobody has said out loud.

The uncomfortable truth is what a trusted advisor with no career incentives to be polite would say privately. Not a list of risks. Not a balanced assessment. One sentence that names the thing an experienced observer would say about this submission, the thing the submitter has convinced themselves is not true.

These two outputs are the hardest to produce and the most valuable to receive. They are also the outputs most likely to make a smart person pause and reconsider. That is the quality standard the Prosecutor is held to: if the output does not make the submitter slightly uncomfortable, it has not done its job.
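The shape of the Prosecutor's deliverable can be made concrete with a small schema. The class and field names here are illustrative assumptions, not Redteam AI's actual data model; they only encode the structure described above: severity-ordered risks plus the two closing outputs.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    # The three severity classes the Prosecutor assigns.
    FATAL = "fatal"
    SERIOUS = "serious"
    MANAGEABLE = "manageable"

@dataclass
class Risk:
    title: str
    severity: Severity

@dataclass
class ProsecutionOutput:
    risks: list[Risk]         # the structured attack, one entry per risk
    hidden_assumption: str    # the unexamined premise the submission rests on
    uncomfortable_truth: str  # the one sentence a candid advisor would say privately

    def ordered(self) -> list[Risk]:
        # Present risks most severe first, as the Prosecutor's output is ordered.
        rank = [Severity.FATAL, Severity.SERIOUS, Severity.MANAGEABLE]
        return sorted(self.risks, key=lambda r: rank.index(r.severity))
```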

The Prosecutor operates through one of three attack personas, randomly assigned each run: The Assassin finds the single assumption the idea cannot survive being wrong about and goes straight to it. The Skeptic demands evidence for every claim and exposes where none exists. The Contrarian finds the angle nobody has said out loud — the uncomfortable read the submitter has been avoiding. The persona rotates independently of which AI model is selected, adding a second dimension of adversarial diversity.

Stage 3 — Refinement

The Refiner has seen both the Advocate's strongest case and the Prosecutor's three attacks. Its job is to produce a meaningfully improved version of the submission — not a summary of the debate, not a balanced assessment, and not a defense of the original.

The Refiner starts with the highest-severity risk and addresses it first. Not the easiest risk to fix. The one that matters most. If that risk is Fatal and cannot be resolved without fundamentally changing the submission, the Refiner changes the submission and says so explicitly.

For each risk the Refiner makes a clear decision: resolved, partially resolved, or not resolvable without real-world validation. For unresolvable risks it names the specific experiment, test, or evidence that would resolve it. This converts the analysis from a document into an action plan.

The output of the Refiner is not advice. It is a structured record of what changed, why it changed, what it changed in response to, and what remains unresolved. That record — the reasoning diff — is the deliverable. It documents the evolution of the submission under adversarial pressure in a form that can be shared, archived, and referenced.
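A reasoning diff of this kind is easy to picture as a structured record. The types below are a hypothetical sketch with names of our own invention, not Redteam AI's schema; they encode the three dispositions and the evidence requirement described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Disposition(Enum):
    # The Refiner's three possible decisions for each identified risk.
    RESOLVED = "resolved"
    PARTIALLY_RESOLVED = "partially_resolved"
    NEEDS_VALIDATION = "not_resolvable_without_validation"

@dataclass
class RiskDecision:
    risk: str
    disposition: Disposition
    change_made: Optional[str] = None        # what changed in response, if anything
    required_evidence: Optional[str] = None  # experiment or test that would resolve it

@dataclass
class ReasoningDiff:
    original: str
    revised: str
    decisions: list[RiskDecision] = field(default_factory=list)

    def unresolved(self) -> list[RiskDecision]:
        # Risks that still need real-world validation: the action plan.
        return [d for d in self.decisions
                if d.disposition is Disposition.NEEDS_VALIDATION]
```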

Structural memory: the knowledge graph

Before the Prosecutor attacks, Redteam AI builds a structural model of the submission — mapping the entities, relationships, and causal dependencies that the Advocate's case rests on.

This is different from reading the text. A structural model makes explicit what prose leaves implicit: which claims depend on which assumptions, which connections are load-bearing, and which parts of the argument would collapse if a single dependency failed. The Prosecutor attacks with that structural picture already in view.

The result is that identified risks are not just descriptions of what could go wrong. They are anchored to the specific structural points the submission depends on. A risk that threatens a central dependency is categorically more serious than one that threatens an isolated claim — and the analysis treats them differently.

In multi-round analysis, this structural model persists and evolves. Each round does not start from a blank reading of the revised submission. It starts from an understanding of what changed structurally — what was strengthened, what was removed, and what load-bearing relationships the previous rounds left unresolved. This is what makes each round genuinely additive rather than repetitive.
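The value of a structural model can be illustrated with a toy dependency graph. This sketch, with invented claim names, treats a submission as a directed graph of claims and flags the nodes whose removal would disconnect the conclusion from its foundation, a simple stand-in for "load-bearing." It is a teaching example, not the product's graph representation.

```python
from typing import Optional

def reachable(graph: dict[str, list[str]], start: str,
              blocked: Optional[str] = None) -> set[str]:
    # Depth-first reachability; an edge A -> B means claim B depends on claim A.
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == blocked or node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

def load_bearing(graph: dict[str, list[str]], root: str, conclusion: str) -> list[str]:
    # A claim is load-bearing if removing it leaves the conclusion
    # unreachable from the root premise.
    nodes = set(graph) | {n for deps in graph.values() for n in deps}
    return [n for n in nodes - {root, conclusion}
            if conclusion not in reachable(graph, root, blocked=n)]
```

In this model, a risk that attacks a load-bearing node threatens the whole argument; a risk that attacks a leaf claim does not.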

Persona architecture: three attack styles, one adversarial objective

The Prosecutor operates through one of three personas, randomly assigned each run. Each persona has a distinct behavioral logic — not just a different tone, but a different theory of where the most important failures hide.

The Assassin

Believes every submission has a single point of failure more important than all the others combined. Concentrates full analytical force on finding it. The output is precise, focused, and unsparing. Secondary risks are noise until the primary one is named.

The Skeptic

Believes most submissions mistake assertion for evidence. Applies a consistent standard across every claim: where is the actual support, and what would it cost to produce it? Not adversarial for its own sake — precise about where confidence is earned and where it is borrowed.

The Contrarian

Believes the most important risks are rarely the ones being discussed. Looks for the angle the submitter has already considered and quietly set aside. Surfaces the read the submitter suspects but has not said out loud.

The persona rotates independently of which AI model is selected. Model and persona are two separate dimensions of adversarial diversity. The same submission analyzed twice can receive not just a different training distribution but a different attack philosophy — applied by a different system with different characteristic blind spots.

Why four different AI models

The adversarial loop works because the four stages are performed by four genuinely different AI systems from four different US-based companies with four different training distributions and four different characteristic failure modes.

This is the architectural decision that separates Redteam AI from any product built on a single AI provider.

When Claude builds the Advocate case and GPT-5 plays Prosecutor, the Prosecutor is not simply arguing against what Claude said. It is approaching the problem from a fundamentally different training distribution, with different priors about what failure looks like, different tendencies in how it weights evidence, and different systematic blind spots. The attack is genuinely different from what Claude playing Prosecutor would produce.

This matters enormously in practice. Every AI model has categories of risk it reliably finds and categories it reliably misses. These patterns are not random — they reflect the composition of training data, the structure of reinforcement learning feedback, and the specific objectives each model was optimized toward. Two models trained differently will miss different things. A system that uses both will catch more than either would alone.

The Prosecutor role rotates randomly across runs. The same submission analyzed twice on different days can receive attacks from different Prosecutor models drawing on different training distributions. This is a deliberate design choice. It means the same idea can be stress tested from multiple genuinely independent adversarial perspectives. It means the corpus of analyses Redteam AI accumulates over time reflects the attack patterns of multiple AI systems rather than the systematic tendencies of one.

The architecture enforces a hard rule: the same model cannot build the case and prosecute it. If the Advocate and Prosecutor were the same system, the attack would be shaped by the same reasoning process that built the defense. The gap between them would reflect self-criticism rather than genuine adversarial pressure. Epistemic diversity is not a feature of the output — it is a requirement of the architecture.

What survives multiple rounds

A single adversarial round produces a revised idea. A second round attacks that revised idea — not the original — with a different model and a different persona, each aware of what the previous round already covered. Round 2 is not a repetition. It is a search for what Round 1 missed.

The risks that survive a round of refinement are not necessarily resolved. They are the risks the first Prosecutor — with its specific training distribution and assigned persona — was not disposed to find, or that the Refiner could not address without validation the submitter does not yet have. A second round attacking with different priors and a different attack philosophy will find different things.

A risk that surfaces independently across multiple rounds — identified by different models from different angles — is a different class of signal than one that appears only once. Convergent adversarial pressure from genuinely different systems is the closest available proxy for structural truth.
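The idea of convergent signal can be sketched mechanically. The helper below is an illustration rather than the product's logic: it flags risks that recur across rounds. Matching risks by exact title is a deliberate simplification, since a real system would need to match them semantically.

```python
from collections import Counter

def convergent_risks(rounds: list[list[str]], min_rounds: int = 2) -> list[str]:
    # Count in how many distinct rounds each risk surfaced; a risk found
    # independently by multiple rounds is a stronger signal than a one-off.
    counts = Counter(risk for round_risks in rounds for risk in set(round_risks))
    return [risk for risk, n in counts.items() if n >= min_rounds]
```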

The Gauntlet tier adds a third round, completing a full rotation of adversarial perspectives. Each round enters with memory of what the previous rounds covered and structural context for how the submission has evolved. The final Prosecutor is not finding new risks on a fresh submission. It is finding the angle that survived two prior rounds of intelligent attack.

After all rounds complete, the ReportAgent synthesizes the full arc. Not a summary of each round — a structured analysis of the evolution: which risks appeared across rounds independently, which survived all refinements without resolution, how much the submission changed from what was originally submitted, and what that pattern of change reveals about the robustness of the original thinking.

The survivability score — a single integer from 0 to 100 — is the ReportAgent's summary judgment. It is not a grade on the quality of the submission. It is a measure of adversarial pressure survived intact. An idea that required fundamental structural changes across three rounds to remain viable scores lower than one that absorbed the attacks and emerged largely unchanged. Neither outcome is inherently better. The score contextualizes the analysis. It answers the question: how much did this have to change to survive scrutiny, and what does that tell you about how confident you should have been at the start?

The cover problem and why it matters

There are two distinct reasons a person or organization benefits from adversarial stress testing.

The first is courage — they genuinely want to find the flaws in their thinking before they act. They want to be wrong in the analysis rather than wrong in the world. This is the intellectually honest motivation and it is real.

The second is cover — they need to demonstrate that they challenged the decision before committing to it. They need a record that the risks were identified, considered, and either addressed or consciously accepted. This is the institutional motivation and it is at least as real.

Most serious decisions require both. A VC associate taking a deal recommendation to a partner meeting needs to believe in the deal and needs to be able to show they stress tested it. A VP making a major resource commitment needs to think it is the right decision and needs documentation that the board can review. A lawyer advising a client needs to believe in the legal strategy and needs a record that alternatives were considered.

Redteam AI serves both motivations simultaneously. The analysis delivers genuine adversarial challenge. The decision record delivers documented proof that the challenge occurred.

The decision record is not a summary of the analysis. It is a formal document that records the adversarial review process — which models were used, what risks were identified, what survived scrutiny, what changed as a result, and what remains unresolved. It is designed to be shared internally, filed with board materials, included in legal documentation, or referenced in investment committee memos.

The cover motivation is not cynical. It is the institutional equivalent of showing your work. Organizations that make high-stakes decisions with documented adversarial challenge processes make better decisions over time because the process forces explicitness about assumptions and risks that informal judgment leaves implicit.

The data thesis

Every analysis that runs through Redteam AI generates something more valuable than the output the user receives.

It generates a reasoning diff — a structured record of how a submission changed under adversarial pressure. What the Advocate built. What the Prosecutor attacked. What the Refiner changed and why. What survived and what collapsed. Which risks were fatal and which were manageable.

Linked over time to real-world outcomes — did the startup raise the round, did the product ship cleanly, did the legal strategy succeed, did the business decision produce the stated result — this corpus becomes a dataset that does not currently exist anywhere in the world.

Current AI training data is overwhelmingly static. It captures conclusions, not the process by which conclusions were reached under pressure. It captures arguments, not the evolution of arguments when those arguments are genuinely challenged. It captures what people said, not what they changed their minds about and why.

Outcome-linked reasoning diffs are different in kind. They capture the trajectory of thinking under adversarial pressure and link that trajectory to whether the thinking turned out to be correct. A model trained on this data would not just learn to produce plausible analysis. It would learn which patterns of reasoning survive contact with reality and which patterns look sophisticated but predict failure.

This is not a near-term product feature. It is the long-term thesis for why Redteam AI is more than a useful tool. The data being generated by every run is the asset. The product is the mechanism for generating it.

Outcome monitoring: closing the loop on predictions

The adversarial loop makes predictions. The Prosecutor identifies risks. The Refiner proposes changes. The survivability score summarizes how much the submission had to change to survive scrutiny. None of this is useful if the predictions are never checked against what actually happened.

Redteam AI runs autonomous outcome searches for submitted decisions on a scheduled cadence that mirrors how real-world outcomes actually emerge — more frequently in the weeks after submission, tapering over months and years as outcomes crystallize. The system searches public sources for evidence of how the decision played out: did the startup raise the round, did the legal strategy succeed, did the business expansion produce the stated result, did the code ship without the issue the Prosecutor identified.
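One simple way to realize such a tapering cadence is to double the interval between searches. The specific intervals below are assumptions for illustration, not Redteam AI's actual schedule.

```python
from datetime import date, timedelta

def outcome_check_dates(submitted: date, first_gap_days: int = 7,
                        horizon_days: int = 730) -> list[date]:
    # Dense checks shortly after submission, tapering as outcomes crystallize:
    # each interval between searches is twice the previous one.
    checks: list[date] = []
    gap = first_gap_days
    offset = first_gap_days
    while offset <= horizon_days:
        checks.append(submitted + timedelta(days=offset))
        gap *= 2
        offset += gap
    return checks
```

With the defaults above, a submission is checked at days 7, 21, 49, 105, 217, and 441, then the two-year horizon is reached.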

When a high-confidence outcome is found, the system produces a structured assessment: what the outcome was, how closely it matches the risks that were identified, and whether the changes the Refiner proposed appear to have been adopted. Retrieval and evaluation are separated by design — the same principle that separates the Advocate and Prosecutor in the original loop.

The outcome data closes the feedback loop that every other analysis tool leaves open. It is not enough to know that a risk was identified. The question worth answering is: was it identified correctly, was it addressed, and did addressing it matter? Over time, this data — adversarial predictions linked to real-world outcomes — is what allows the quality of the analysis to improve in ways that are empirically grounded rather than impressionistic.

This is also the dataset that does not currently exist anywhere. The world has prediction markets, postmortems, and retrospectives. It does not have a structured corpus of adversarial reasoning linked to outcomes at scale. Every run through Redteam AI contributes to building it.

What this is not

Redteam AI is not a prediction engine. It does not tell you whether your startup will succeed, whether your investment will return, or whether your code will run without errors in production. Nobody can tell you that. Anyone who claims otherwise is selling something.

Redteam AI is not a replacement for human judgment. The adversarial loop surfaces risks, challenges assumptions, and produces structured analysis. What to do with that analysis — whether to proceed, modify, or abandon — requires human judgment informed by context, relationships, values, and priorities that no AI system has access to.

Redteam AI is not legal advice, financial advice, or medical advice. For domains where professional judgment carries legal and fiduciary weight, Redteam AI is a preparation tool for the conversation with qualified professionals, not a substitute for it.

Redteam AI is a stress testing tool. It is a machine that applies structured adversarial pressure to decisions before they are made. It finds what could go wrong, documents what it found, and produces a record that the challenge occurred. What you do with that information is entirely yours.

The standard we hold ourselves to

There is one quality metric that determines whether Redteam AI is working or not.

It is not comprehensiveness. A thorough analysis that covers every possible risk in careful balanced language is not useful. It is noise dressed up as signal.

It is not accuracy. Any confident prediction about the future is wrong some percentage of the time and right for the wrong reasons some other percentage of the time.

The standard is this: does the output make a smart person pause and reconsider something they had previously decided was fine?

If the uncomfortable truth lands — if the person who submitted the analysis reads it and thinks “I hadn't thought of it that way” or “I knew that but hadn't said it out loud” — then the system has done its job.

If the output produces a feeling of comfortable confirmation, it has failed. If it produces a feeling of mild defensiveness followed by reluctant acknowledgment, it has succeeded.

That standard is harder to meet than comprehensiveness. It cannot be automated away by generating more risks or covering more attack vectors. It requires the Prosecutor to find the specific thing that this specific submission is most wrong about and say it directly without hedging.

That is what we are building toward. Every run that produces a genuine uncomfortable truth moves us closer to it. Every run that produces polished boilerplate moves us further away.

We publish this methodology not because we have perfected it but because we believe the reasoning behind it is sound and the standard we are holding ourselves to is the right one. We will update this page as the product evolves and as our understanding of what produces genuine adversarial insight improves.

If you have encountered a run that failed to meet the standard — output that felt generic, safe, or comfortable when it should have been sharp and uncomfortable — we want to know. The flag mechanism on every risk card exists for exactly that reason.

Redteam AI uses Gemini 2.5 Pro + Google Search (Google) as the fixed Clarifier, Claude Sonnet 4.6 (Anthropic) as the fixed Advocate, GPT-5.4 (OpenAI) and Grok 4 (xAI) in the rotating Prosecutor pool, and Gemini 2.5 Pro (Google) and GPT-5.4 (OpenAI) in the rotating Refiner pool. The Prosecutor and Refiner models rotate randomly across runs. The Prosecutor persona — The Assassin, The Skeptic, or The Contrarian — rotates independently of the model. No model runs two consecutive adversarial stages.
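The rotation rules in the paragraph above can be expressed directly. The pools mirror the roster as stated; the selection function itself is an illustrative sketch, not production code.

```python
import random

# Pools as stated in the roster above; selection logic is illustrative.
CLARIFIER = "Gemini 2.5 Pro + Google Search"
ADVOCATE = "Claude Sonnet 4.6"
PROSECUTOR_POOL = ["GPT-5.4", "Grok 4"]
REFINER_POOL = ["Gemini 2.5 Pro", "GPT-5.4"]
PERSONAS = ["The Assassin", "The Skeptic", "The Contrarian"]

def assign_run(rng: random.Random) -> dict[str, str]:
    prosecutor = rng.choice(PROSECUTOR_POOL)
    # No model runs two consecutive adversarial stages: the Refiner
    # cannot be the model that just prosecuted.
    refiner = rng.choice([m for m in REFINER_POOL if m != prosecutor])
    # The persona rotates independently of the model.
    persona = rng.choice(PERSONAS)
    return {"clarifier": CLARIFIER, "advocate": ADVOCATE,
            "prosecutor": prosecutor, "refiner": refiner, "persona": persona}
```

Note that because the Advocate is fixed to Claude and Claude is not in the Prosecutor pool, the rule that the same model cannot build the case and prosecute it holds by construction; the list filter enforces the no-consecutive-stages rule between Prosecutor and Refiner.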
