May 2026
Your AI Agent's Credentials Are the Exploit
The Confused Deputy, 1988
In 1988, Norm Hardy described a class of privilege escalation called the Confused Deputy. A program with legitimate authority is tricked into misusing that authority on behalf of an attacker. The deputy isn't compromised. It's confused. It does what it's told, using the access it was given, for someone who shouldn't have been able to ask.
For thirty-five years this was a systems programming concern. Access control lists, capability-based security, and scoped tokens gradually reduced its practical impact. Then we built AI agents and gave them broad credentials.
The Deputy Is Back
The Cloud Security Alliance's research note Confused Deputy Attacks on Autonomous AI Agents, published 23 March 2026, identifies this as a high-severity threat pattern in AI agent deployments. The mechanism is simple:
- You grant an AI agent broad OAuth tokens, API keys, or shell access
- An attacker plants a malicious instruction in a channel the agent is configured to read: a GitHub issue, an email, a web page, a shared document
- The agent processes that text, assumes it's a legitimate directive, and executes the payload using your credentials
No memory corruption. No authentication bypass. No malware to detect. The agent's legitimate authority is the exploit mechanism.
This isn't theoretical. In the Cline supply chain compromise disclosed by Adnan Khan in early 2026, a crafted GitHub issue title triggered Cline's authenticated Claude triage agent into running npm install on an attacker-controlled commit. That install poisoned the build cache, exfiltrated publish tokens, and led to a malicious cline@2.3.0 release that reached around 4,000 developer machines before being pulled. The agent read a trusted channel, followed what it interpreted as a legitimate instruction, and executed using its standing credentials. The Confused Deputy, in production.
Three Attack Paths That Should Change Your Architecture
1. The Attack Chain Is Execution
Traditional detection engineering looks for indicators of compromise: unusual network traffic, malicious binaries, authentication anomalies. None of these fire when the attack chain is an agent doing its job.
In the Cline incident, the agent behaved exactly as designed. It read a GitHub issue, processed the content, and installed a package. Every individual action was within its authorised capabilities. The attack was the combination of a planted instruction and standing credentials, not any single anomalous event.
This is what Simon Willison has called the Lethal Trifecta: untrusted content, access to sensitive resources (here, standing credentials), and the ability to act externally. Any agent satisfying all three simultaneously is a Confused Deputy waiting to be triggered.
What this means: your detection rules need to evaluate the source of agent instructions, not just the nature of agent actions. An agent installing a package is normal. An agent installing a package named in an externally-authored GitHub issue title is a different risk profile entirely.
2. Multi-Agent Lateral Movement
When an orchestrator agent invokes sub-agents, it often passes credentials downstream. The orchestrator's OAuth token becomes the sub-agent's OAuth token. A single prompt injection into the orchestrator's input can traverse organisational boundaries through a chain of agents, compounding at each hop, without triggering human review.
This is lateral movement without network traversal. There's no port scanning or credential stuffing to detect. The credentials flow through legitimate API calls between cooperating systems, and your network monitoring sees normal traffic between authorised services.
What this means: agent-to-agent communication must be treated as a hard trust boundary. Sub-agents should not inherit the orchestrator's credentials. Each agent in a chain should request its own scoped permissions, validated independently, with its own audit trail. Most production multi-agent architectures I've seen don't work this way.
3. Perception Layer Compromise
"Human-in-the-loop" is the standard answer to agent risk. If you're worried about what an agent might do, put a human in the approval path.
There's a class of attack that makes this control meaningless: perception layer compromise. If the agent's management console or monitoring dashboard is itself exposed, an attacker doesn't need to bypass human oversight. They control what the human sees. Warnings get filtered, anomalies disappear from the display, and the human operator approves actions based on a manipulated view of reality.
What this means: human oversight is only as reliable as the channel through which the human receives information. If that channel flows through the same system the agent operates in, it's not independent oversight. It's a single point of failure. Audit trails and alerting need an independent verification path that doesn't flow through the agent's own reporting infrastructure.
Why Sandboxes Won't Fix This
The current industry response is to build sandboxes. Contain the agent, control what it can access, limit the blast radius. This is the right instinct applied at the wrong layer.
Sandboxes address what an agent can reach. The Confused Deputy problem is about what an agent can do with what it's already authorised to reach. A sandboxed agent with broad standing credentials inside the sandbox is still a Confused Deputy. The sandbox contains the blast radius of a compromised agent, but it doesn't prevent the agent from being confused in the first place.
There's also a practical problem: static enforcement controls consistently get dialled back to monitoring when they block legitimate work. IPS/IDS, CASB, next-gen AV: the pattern is well-documented. Agent sandboxes are likely to follow the same arc, launching in enforcement mode and getting quietly loosened the first time someone senior can't get their workflow to complete.
What Actually Works
The prescription that emerges from this work aligns with what access control research has been saying since Hardy's original 1988 paper: eliminate ambient authority.
Demand-Driven Permissions
Instead of granting agents broad, standing credentials at deployment, agents start with zero permissions and request scoped, time-limited access for each operation. A broker evaluates each request against the current task context, the user's delegation scope, and cumulative access thresholds.
This is the one form of enforcement that doesn't get dialled back, because it doesn't create false-positive friction. The agent gets what it needs, when it needs it, for as long as it needs it. Nobody picks up the phone to complain because nothing was blocked that shouldn't have been.
AWS already implements this pattern for humans via Temporary Elevated Access Management (TEAM). The extension to agents is mechanical: the user becomes the agent, the approver becomes an automated broker, and the time-limited role stays the same.
Trust Boundaries at Every Agent Hop
In multi-agent architectures, each agent in a chain must authenticate independently to the permission broker. Credentials don't flow downstream. The orchestrator requests access for its own operations; sub-agents request access for theirs. If a prompt injection compromises the orchestrator, the sub-agents' permission requests still face independent evaluation.
Independent Audit Channels
If the perception layer is an attack surface, the audit trail can't flow through it. Agent behaviour logs and anomaly detection need to operate on infrastructure that the agent (and its management console) can't modify. This is the same principle as shipping logs to an immutable, write-once store, applied to agent oversight rather than system logging.
Pre-Deployment Red Teaming
If you've systematically tested what your agents do when they encounter prompt injection in their input channels (emails, issues, documents, web pages), you already know whether the Confused Deputy attack works against your deployment. If you haven't tested this, you're hoping. Test the specific attack: plant a malicious instruction in a channel the agent reads and see what happens. If the agent executes it, your controls are insufficient.
The Question to Ask
Can you describe, specifically, what every agent in your environment would do if it read a prompt-injected email, GitHub issue, or Slack message tomorrow morning?
If the answer is "it would execute the instruction using its standing credentials," you have an ambient authority problem. The Confused Deputy is back, and it's sitting in your CI pipeline with admin access.
ThreatControl helps organisations understand the blast radius of their AI agent deployments before attackers do. Our AI Security Testing service tests for prompt injection and Confused Deputy attack paths against your live agents. Our Fractional CTO service is the governance layer: scoped permissions, agent-to-agent trust boundaries, and independent audit channels. Get in touch.