AI War: 20 Minutes After Claude Opus 4.6, OpenAI Strikes Back With GPT-5.3 Codex


Less than 24 hours after a public philosophical clash between OpenAI and Anthropic, the two rivals launched new flagship models, within minutes of each other.

Anthropic revealed Claude Opus 4.6 around 6:40 PM. Just 20 minutes later, OpenAI responded with GPT-5.3 Codex, its most advanced coding model to date.

Two announcements. Two record-breaking models. One unmistakable message: the AI war has entered a new phase.

Two Flagships, One Night

Claude Opus 4.6 vs GPT-5.3 Codex (Feb 5, 2026)

| Category | Claude Opus 4.6 (Anthropic) | GPT-5.3 Codex (OpenAI) |
| --- | --- | --- |
| Positioning | Flagship general-purpose model with a heavy focus on long-horizon work, coding, and office workflows | Developer-first "Codex" model optimized for agentic coding and fast iteration |
| Primary audience | Developers and teams doing large-scale reasoning, document work, and professional workflows | Developers building, refactoring, debugging, and shipping code via Codex and the API |
| Context window | Up to 1,000,000 tokens (headline capability) | Not publicly emphasized in the announcement (the Codex release focused on coding/agent metrics) |
| Key feature highlight | Adaptive Thinking: dynamically adjusts "thinking effort" based on task complexity (4 intensity modes) | Self-improvement loop: early GPT-5.3 variants used to debug/optimize later builds (per OpenAI's claim) |
| Agent workflow feature | Agent Teams (Claude Code): multiple agents can collaborate in parallel (code / test / docs) | Codex-centric developer tooling plus API availability (a battle for dev mindshare) |
| Speed claim | Not framed as the main headline in the announcement | ~25% faster than its predecessor (OpenAI claim) |
| SWE-Bench Verified (bug fixing) | 80.8% (Verified) | — |
| SWE-Bench (Pro) | — | 56.8% (Pro) |
| OSWorld (computer use) | 72.7% | 64.7% (Verified) |
| Terminal-Bench 2.0 (agentic coding) | 65.4% | 77.3% |
| What it appears to win at | Long-context reasoning, computer control tasks, real-world bug fixing, multi-agent collaboration | Agentic coding benchmarks, speed, developer-focused execution loops |
| Rollout / availability | Claude ecosystem + Claude Code (plus enterprise-style workflow positioning) | Codex interface + developer API; a ChatGPT rollout for GPT-5.3 expected but not confirmed |
| Strategic narrative | "AI as a coworker," office integration, trust positioning | "Best coding model," developer recapture, rapid iteration advantage |

On one side: Claude Opus 4.6, Anthropic's most powerful model ever, presented as the new global benchmark leader, ahead of Gemini 3 Pro and GPT-5.2 in multiple evaluations.

On the other: GPT-5.3 Codex, OpenAI's developer-focused response, released quietly but deliberately, ahead of what appears to be an imminent GPT-5.3 rollout for ChatGPT users.

Both companies describe their models as the most capable AI systems ever built for real-world coding and professional tasks. OpenAI even revealed a symbolic milestone: parts of GPT-5.3 Codex were optimized using earlier versions of the model itself, a form of AI-assisted self-improvement.

Claude Opus 4.6: Anthropic Pushes AI Into the Office

Many expected Anthropic to unveil a new Sonnet model. Instead, the company opened hostilities with its flagship.

Claude Opus 4.6 introduces what Anthropic calls "Adaptive Thinking": a system that dynamically adjusts reasoning depth (and cost) depending on task complexity. The goal is clear: deeper reasoning for hard problems, faster answers for simple ones.

Technically, Opus 4.6 sets a new bar with a 1-million-token context window, allowing it to process entire databases, long reports, or multi-book projects without losing coherence.

Anthropic's benchmarks show major gains across practical tasks:

  • SWE-Bench Verified (real-world bug fixing): 80.8%
  • OSWorld (computer control with mouse & keyboard): 72.7%
  • Humanity's Last Exam (reasoning without tools): 40%, leading its category

Anthropic's system card is unambiguous: according to its own evaluations, Claude Opus 4.6 is now the strongest general-purpose AI model on the market.

Beyond raw performance, Anthropic also introduced Agent Teams for Claude Code: multiple Opus instances collaborating on the same project (coding, testing, documenting in parallel).

The ambition clearly extends beyond developers. Anthropic's recent Super Bowl ad campaign signals a push toward mainstream users, and a direct challenge to both Google and OpenAI.

GPT-5.3 Codex: OpenAI Fires Back, Fast

OpenAI's response came almost immediately.

Just 20 minutes after Claude Opus 4.6 went live, OpenAI announced GPT-5.3 Codex, a specialized model designed for developers and integrated into Codex tools and APIs. For now, ChatGPT users remain on GPT-5.2, but OpenAI has strongly hinted that GPT-5.3 for the general public is close.

OpenAI claims several key advances:

  • Speed: GPT-5.3 Codex is 25% faster than its predecessor.
  • Self-improvement loop: Early versions of GPT-5.3 were used to debug and optimize later versions, shortening development cycles.
  • Stronger agentic performance, particularly in long-running coding tasks.

Early benchmark comparisons show a more nuanced picture than headline dominance:

| Benchmark | GPT-5.3 Codex | Claude Opus 4.6 |
| --- | --- | --- |
| Terminal-Bench 2.0 | 77.3% | 65.4% |
| OSWorld (PC) | 64.7% (Verified) | 72.7% |
| SWE-Bench | 56.8% (Pro) | 80.8% (Verified) |

The message is clear: OpenAI is targeting developers tempted by Claude Code, just as Anthropic begins courting everyday ChatGPT users.

A Turning Point for the Industry

This near-simultaneous release marks a deeper shift in how AI is evaluated.

The debate is no longer about who answers questions best. It's about who can run complex workflows, coordinate multiple agents, and sustain long-term reasoning across real professional tasks.

February is shaping up to be explosive. New moves are expected from xAI, DeepSeek, and Google, and few doubt that both Anthropic and OpenAI are holding additional updates in reserve.

One thing is certain: The AI race is no longer about incremental upgrades. It's about who controls the next generation of autonomous, working AI systems.

Alex Morgan
I write about artificial intelligence as it shows up in real life, not in demos or press releases. I focus on how AI changes work, habits, and decision-making once it's actually used inside tools, teams, and everyday workflows. Most of my reporting looks at second-order effects: what people stop doing, what gets automated quietly, and how responsibility shifts when software starts making decisions for us.