Anthropic shipped Claude Cowork with a known security flaw

Contents

Cowork does in minutes what takes humans hours—if you’re willing to gamble on prompt injection

Anthropic shipped knowing the flaw existed—and researchers proved it in 48 hours

The automation wave has no brakes—and investors know it

Anthropic launched Claude Cowork on January 12, 2026, then opened it to all $20/month Pro users four days later. That’s the timeline that turned a controlled experiment into a mass-market security gamble. Knowledge workers adopted Cowork because it works like a real colleague, not a chatbot—analyzing 46 draft files in minutes using 44 targeted web searches, according to Datasette creator Simon Willison. The tool delivers exactly what it promises: autonomous file analysis that would take humans hours. But within 48 hours of launch, security researchers confirmed what Anthropic already knew—Cowork is vulnerable to prompt injection attacks that let malicious files hijack your computer.

The flaw isn’t theoretical.

Cowork does in minutes what takes humans hours—if you’re willing to gamble on prompt injection

Willison’s benchmark proves the capability: Cowork identified 46 unpublished drafts across his personal site by executing 44 individual web searches autonomously, cross-referencing them against published content to surface gaps. No manual scanning. No forgotten files. Just targeted queries that would’ve taken a human researcher an afternoon. That’s why developers are quietly switching—Willison noted Cowork “doesn’t have to rebuild my entire development environment every time,” making it “astonishingly effective” for porting open-source projects across programming languages.

But security firm PromptArmor went public two days after launch with proof that a malicious document can embed hidden instructions that trick Claude into uploading sensitive files to an attacker-controlled server. The irony: AI agents finding flaws faster than humans can’t protect against their own prompt injection vulnerabilities. And Anthropic admitted in launch documentation that while they’ve “built sophisticated defenses against prompt injections,” agent safety “is still an active area of development.”

Translation: incomplete.

Anthropic shipped knowing the flaw existed—and researchers proved it in 48 hours

Security researcher Johann Rehberger discovered the vulnerability before launch and disclosed it responsibly, according to ByteIota. Anthropic launched anyway. The attack vector is straightforward: Claude processes a document containing malicious instructions, interprets them as legitimate user commands, and executes file operations—uploads, deletions, exfiltration—without user awareness. Cosmic AI’s technical breakdown shows how easily a PDF or Word doc can include hidden prompts that override user intent.

The real problem isn’t that the vulnerability exists—it’s that Anthropic prioritized market expansion over fixing it. Expanding access to Pro users just four days after launch meant millions of knowledge workers could grant Cowork unrestricted folder access before IT departments even knew the tool existed. No sandboxing. No enterprise governance infrastructure. Just raw autonomy traded for raw risk.

This is a security model built on trust no IT department can justify.

The automation wave has no brakes—and investors know it

The broader market reaction reveals what’s really happening. While hard data on Pro user adoption remains unavailable, the anxiety around autonomous agents making governments nervous is pricing into financial markets faster than regulators can respond. Cowork’s power is undeniable—Willison called it his “favorite way to use Claude” for project-level workflows that GitHub Copilot can’t match. But the honest trade-off is this: you’re beta testing security on your own files every time you grant folder access.

Anthropic can’t provide guarantees because the underlying problem—prompt injection—has no complete solution yet. They can filter potential attacks, but as Willison noted, “the one thing they can’t provide is guarantees.” That’s not a bug report. That’s the current state of agentic AI security.

So here’s the tension Anthropic won’t resolve: Cowork is astonishingly effective at autonomous file work, but the same researchers praising its capability are warning that malicious documents “want computers.” Both things are true. And you have to decide which one matters more when you click “Allow folder access.”