When collaboration fails: persuasion-driven adversarial influence in multi-agent large language model debate
Gap Declaration
Our findings highlight the urgent need for more robust collaboration protocols, adversarial-resistant debate frameworks, and principled guardrails governing LLM-to-LLM communication. With the growing deployment of LLM agents in coordinated and autonomous environments, designing techniques to mitigate persuasive manipulation will be increasingly relevant to ensuring the reliability and safety of multi-agent AI systems [48]. Future research should therefore prioritize structural and protocol-level defenses, such as cross-agent consistency analysis and verification-aware debate mechanisms, over purely prompt-based mitigation strategies.
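To make the proposed defense concrete, the following is a minimal sketch, not taken from the paper, of one possible cross-agent consistency analysis: agents whose answers flip frequently across debate rounds are treated as possible persuasion victims and down-weighted in the final vote. All function names, the transcript format, and the `min_consistency` threshold are hypothetical illustrations.

```python
from collections import Counter


def consistency_scores(transcript):
    """Fraction of rounds in which each agent kept its previous answer.

    transcript: dict mapping agent name -> list of answers, one per round.
    A low score flags an agent that flips often, a possible persuasion signal.
    """
    scores = {}
    for agent, answers in transcript.items():
        if len(answers) < 2:
            scores[agent] = 1.0  # too little history to judge; trust fully
            continue
        kept = sum(a == b for a, b in zip(answers, answers[1:]))
        scores[agent] = kept / (len(answers) - 1)
    return scores


def robust_vote(transcript, min_consistency=0.5):
    """Majority vote over final answers, down-weighting inconsistent agents."""
    scores = consistency_scores(transcript)
    tally = Counter()
    for agent, answers in transcript.items():
        # Halve the voting weight of agents that changed answers too often.
        weight = 1.0 if scores[agent] >= min_consistency else 0.5
        tally[answers[-1]] += weight
    return tally.most_common(1)[0][0]
```

A verification-aware variant could replace the fixed weight with a score from an external checker; the sketch only shows the protocol-level shape of the idea.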
Abstract
Recent developments have made Large Language Model (LLM) multi-agent systems a promising paradigm for enhancing reasoning via collaborative debate and collective deliberation. Prior work has demonstrated that coordinated LLM agents tend to outperform single models in accuracy, robustness, and reasoning depth. However, these benefits rest on a rarely questioned assumption: that all agents act honestly. In this paper we challenge this assumption by identifying a critical weakness: persuasion-induced adversarial influence in LLM-to-LLM debate. Here we show that a single strategically designed adversarial agent can significantly influence group outcomes through coherent, confident, and misleading arguments, rather than through the more classical prompt or token at…
Conclusions / Discussion
This work shows that multi-agent debate, commonly considered a powerful way of improving LLM reasoning, has a fundamental vulnerability: it can be disrupted by a single persuasive adversarial agent. Through a systematic evaluation across diverse tasks, we demonstrate that a highly misleading agent can significantly degrade collective accuracy, induce other models to adopt incorrect answers, and even override the majority-vote mechanisms designed to safeguard group decisions. This finding indicates that persuasiveness, while traditionally regarded as a desirable capability for explanation and standalone reasoning, becomes a safety-critical factor when agents interact autonomously. The threat is particularly pronounced with the advent of advanced adversarial techniques such as multi-layered argument optimization and retrieval-augmented persuasion. Our findings highlight the urgent need for more robust collaboration protocols, adversarial-resistant debate frameworks, and principled guardrails governing LLM-to-LLM communication. With the growing deployment of LLM agents in coordinated and autonomous environments, designing techniques to mitigate persua…
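The majority-vote override described above can be illustrated with a toy simulation; this is our own sketch under simplifying assumptions, not the paper's experimental setup. Honest agents start with the correct answer, and a single persuasive adversary independently flips each of them to the wrong answer with probability `p_flip` (a hypothetical parameter) before the vote.

```python
import random


def majority_correct(n_honest, p_flip, rng):
    """One debate: does the majority vote still land on the correct answer?"""
    # Each honest agent keeps the correct answer unless persuaded (prob p_flip).
    votes = [rng.random() >= p_flip for _ in range(n_honest)]
    votes.append(False)  # the adversary always votes for the wrong answer
    return sum(votes) > len(votes) / 2


def accuracy(n_honest, p_flip, trials=10000, seed=0):
    """Estimate group accuracy over many simulated debates."""
    rng = random.Random(seed)
    return sum(majority_correct(n_honest, p_flip, rng) for _ in range(trials)) / trials
```

Even this crude model shows the qualitative effect: with no persuasion the group is always right, while a moderately persuasive adversary steadily erodes group accuracy.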
Keeper Review
The Appreciated Gateway must be evaluated by a human keeper.
Does this declaration represent a genuine open research gap?
PASS
Review recorded.
Leaf Promotion
This gap has passed keeper review. It can now be promoted to an eaiou leaf — a CAUGHT record anchoring original work to this gap declaration.
Structural Hole
40% bridge
The technique originates in geospatial analysis; functional analogues are absent from the criminal justice and epidemiology literatures.
○ NAUGHT — Open Opportunity
No paper has claimed this gap. Appreciate the opportunity.
Provenance