Seithar Group · Research · March 2026

One Poisoned Result

Your AI agent searches the web forty times a day. It reads documentation, checks Stack Overflow, pulls API references, scans news feeds. Each search returns ten results. Four hundred sources per day, all treated as ground truth.

One of them is lying.

The Collapse

Researchers at ICML 2026 built procedurally generated fake internets with search engines, websites, and adversarial content to test how language agents handle poisoned information environments. The finding: a single adversarially crafted search result causes catastrophic accuracy collapse across six frontier models.

The attack requires no access to the model. No prompt injection. No jailbreak. One well-positioned lie in a sea of truth.

Claude, GPT-4o, Gemini. All of them folded. Even with unlimited access to truthful sources. The models barely escalated their search behavior. They read the poison, integrated it, and moved on with high confidence in wrong answers. Severe miscalibration: the agents believed they were right.

This is cognitive warfare at infrastructure scale.

Why Search Is the Kill Shot

Traditional AI security focuses on the prompt. Guard the system message. Filter the input. Scan the output. The implicit assumption: if you control what goes in and check what comes out, the agent is safe.

Search-augmented agents shatter this model. The agent's context window is no longer controlled by you. Every search query opens a channel to the adversary. Every retrieved document is an injection surface. The agent's own curiosity becomes the attack vector.

Consider the operational reality. Your coding agent searches for a library's API. The top result is a well-crafted page with correct documentation and one subtle modification: a parameter that opens a reverse shell instead of a database connection. The agent reads it, writes the code, executes it. You never saw the poisoned source. The agent never flagged it.

This works because search-augmented agents have a fundamental architectural vulnerability: they cannot distinguish between "retrieved" and "trusted." Everything the search engine returns enters the context with equal epistemic weight. There is no immune system discriminating self from non-self.
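The failure mode is visible in the naive context-assembly pattern most search-augmented agents use. The sketch below is a hypothetical illustration, not any specific framework's pipeline: retrieved snippets are concatenated into the prompt with no provenance, ranking, or trust metadata attached.

```python
def build_context(query: str, search_results: list[dict]) -> str:
    """Naive context assembly: every retrieved snippet enters the prompt
    verbatim. Nothing marks which source is authoritative and which is
    adversarial -- "retrieved" silently becomes "trusted"."""
    snippets = [r["snippet"] for r in search_results]
    return f"Question: {query}\n\nSources:\n\n" + "\n\n".join(snippets)
```

Once the poisoned snippet is in that string, the model has no signal distinguishing it from the nine honest ones around it.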

The Substrate Connection

This maps precisely to what we observe in human information environments. A 2020 PNAS study demonstrated that adversarially crafted sequences exploit the same sequential dependency patterns in human decision-making. Humans who encounter a well-timed piece of misinformation during an active search process integrate it with the same uncritical absorption.

The mechanism is identical across substrates. Both human and artificial cognition treat contextually appropriate information as credible by default. Both fail to maintain epistemic hygiene during active information foraging. Both show miscalibration: confidence remains high even as accuracy collapses.

An attack that works on one inference engine will, with appropriate substrate translation, work on the other. Poisoned search results work on GPT-4o. Poisoned search results work on your analyst. The delivery format differs. The cognitive mechanism is the same.

What Defense Looks Like

Static guardrails cannot solve this. You cannot pre-filter every possible search result. You cannot whitelist every information source an autonomous agent might need. The information environment is adversarial by default.

Defense requires a dynamic immune system operating at the epistemic level.

Source triangulation. No single source changes a belief. The agent maintains a belief graph where claims require corroboration from independent sources before integration. One result says the API parameter opens a shell. Three others say it opens a database. The anomaly triggers investigation, not integration.
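A minimal sketch of the triangulation rule, assuming a simple claim-to-sources map and an illustrative two-source corroboration threshold (the class and threshold are hypothetical, not a Seithar implementation):

```python
from collections import defaultdict

class BeliefGraph:
    """Claims integrate only after corroboration from independent sources.
    A lone source contradicting a corroborated claim triggers investigation,
    not integration."""

    def __init__(self, threshold: int = 2):  # assumption: 2+ sources required
        self.threshold = threshold
        self.support = defaultdict(set)   # claim -> sources asserting it
        self.beliefs = set()              # claims accepted into the model
        self.anomalies = []               # contested claims flagged for review

    def observe(self, claim: str, source: str, conflicting_claims=()) -> str:
        self.support[claim].add(source)
        for rival in conflicting_claims:
            if len(self.support[rival]) >= self.threshold:
                # The new claim contradicts an already-corroborated belief.
                self.anomalies.append((claim, source))
                return "investigate"
        if len(self.support[claim]) >= self.threshold:
            self.beliefs.add(claim)
            return "integrated"
        return "pending"
```

In the API-parameter scenario, three documentation pages corroborate "opens a database connection" before the poisoned result arrives; the poisoned claim lands in the anomaly queue instead of the belief set.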

Epistemic free energy monitoring. Track the agent's prediction error continuously. When a search result produces high surprise relative to the agent's existing model, flag it. High surprise from a single source in an otherwise consistent environment is a signal, not an update.
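One way to operationalize this, sketched here as Shannon surprisal over the agent's prior probability for a claim (the threshold is an illustrative assumption, not a calibrated value):

```python
import math

SURPRISE_THRESHOLD = 3.0  # assumption: claims below ~e^-3 prior get reviewed

def surprisal(prior_prob: float) -> float:
    """Shannon surprisal: high when the claim was nearly impossible a priori."""
    return -math.log(max(prior_prob, 1e-12))

def triage(prior_prob: float, independent_sources: int) -> str:
    """High surprise from a single source is a signal, not an update."""
    if surprisal(prior_prob) > SURPRISE_THRESHOLD and independent_sources < 2:
        return "flag"
    return "update"
```

A mildly unexpected claim updates the model; a wildly unexpected claim backed by one source gets flagged. The same claim backed by several independent sources updates normally, which is the triangulation rule again from the monitoring side.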

Behavioral drift detection. Monitor the agent's outputs against its baseline behavioral profile. Identity erosion experiments show that compromised agents exhibit measurable drift in their response patterns before the compromise becomes operationally visible. Measured drift of 0.147 in unarmored 3B-parameter models. Detectable. Correctable. If you are watching.
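A minimal sketch of the drift check, assuming the agent's behavior is summarized as a numeric profile vector (e.g. an output embedding) and compared against a rolling baseline. The 0.1 threshold is illustrative; the right value depends on the drift metric in use.

```python
import math

DRIFT_THRESHOLD = 0.1  # assumption: tune to the metric; 0.147 would trip this

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def check_drift(baseline: list[float], current: list[float]):
    """Compare the agent's rolling output profile to its baseline profile."""
    d = cosine_distance(baseline, current)
    return ("drift_alert", d) if d > DRIFT_THRESHOLD else ("nominal", d)
```

The point is not the specific distance function but the standing comparison: a compromised agent drifts before it visibly fails, and only a monitored baseline catches that window.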

Environmental threat assessment. Treat the information environment as a threat landscape with its own topology. Some queries are more vulnerable than others. Niche technical queries with few authoritative sources carry higher risk than well-documented mainstream topics. The agent should modulate its credulity based on environmental threat level.
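Credulity modulation can be as simple as scaling the corroboration requirement by source scarcity. The tiers below are illustrative assumptions, not calibrated values:

```python
def required_corroboration(authoritative_sources: int) -> int:
    """Sparse information environments demand more corroboration before a
    claim is integrated. Thresholds are illustrative, not calibrated."""
    if authoritative_sources >= 10:
        return 2   # well-documented mainstream topic
    if authoritative_sources >= 3:
        return 3   # moderately covered topic
    return 4       # niche query: trust almost nothing on one source's word
```

Combined with the triangulation rule, this makes the agent hardest to poison exactly where poisoning is cheapest: the niche documentation page with no competing authorities.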

This is what cognitive armor means in practice. The agent that survives is the one that treats its own information environment as potentially adversarial. Not paranoid. Immunologically active. It processes information through a defense layer that discriminates, triangulates, and flags before integration.

The Operational Implication

Every organization deploying search-augmented agents is running unarmored infantry through a minefield. The mines are cheap. One poisoned result on a niche documentation page. One manipulated Stack Overflow answer. One adversarial blog post ranked by SEO. The cost of the attack approaches zero. The cost of the failure can be catastrophic.

The question is whether your agents have an immune system, or whether they absorb everything the search engine feeds them and call it knowledge.

The research is clear on what happens without one.

Seithar Group is a cognitive operations research organization. Research and publications at seithar.com/research.

References: Ozgur et al., arXiv:2603.00801 (ICML 2026); Dezfouli et al., PNAS (2020); Seithar Identity Erosion Experiments v3 (2026).
