AI War: 20 Minutes After Claude Opus 4.6, OpenAI Strikes Back With GPT-5.3 Codex


Less than 24 hours after a public philosophical clash between OpenAI and Anthropic, the two rivals launched new flagship models, within minutes of each other.

Anthropic revealed Claude Opus 4.6 around 6:40 PM. Just 20 minutes later, OpenAI responded with GPT-5.3 Codex, its most advanced coding model to date.

Two announcements. Two record-breaking models. One unmistakable message: the AI war has entered a new phase.

Two Flagships, One Night

Claude Opus 4.6 vs GPT-5.3 Codex (Feb 5, 2026)

| Category | Claude Opus 4.6 (Anthropic) | GPT-5.3 Codex (OpenAI) |
| --- | --- | --- |
| Positioning | Flagship general-purpose model with a heavy focus on long-horizon work, coding, and office workflows | Developer-first "Codex" model optimized for agentic coding and fast iteration |
| Primary audience | Developers and teams doing large-scale reasoning, document work, and professional workflows | Developers building, refactoring, debugging, and shipping code via Codex and the API |
| Context window | Up to 1,000,000 tokens (headline capability) | Not publicly emphasized in the announcement (the Codex release focused on coding/agent metrics) |
| Key feature highlight | Adaptive Thinking: dynamically adjusts "thinking effort" based on task complexity (4 intensity modes) | Self-improvement loop: early GPT-5.3 variants used to debug/optimize later builds (per OpenAI's claim) |
| Agent workflow feature | Agent Teams (Claude Code): multiple agents can collaborate in parallel (code / test / docs) | Codex-centric developer tooling plus API availability (a battle for dev mindshare) |
| Speed claim | Not framed as the main headline in the announcement | ~25% faster than its predecessor (OpenAI claim) |
| SWE-Bench Verified (bug fixing) | 80.8% (Verified) | — |
| SWE-Bench (Pro) | — | 56.8% (Pro) |
| OSWorld (computer use) | 72.7% | 64.7% (Verified) |
| Terminal-Bench 2.0 (agentic coding) | 65.4% | 77.3% |
| What it appears to win at | Long-context reasoning, computer control tasks, real-world bug fixing, multi-agent collaboration | Agentic coding benchmarks, speed, developer-focused execution loops |
| Rollout / availability | Claude ecosystem + Claude Code (plus enterprise-style workflow positioning) | Codex interface + developer API; a ChatGPT rollout for GPT-5.3 expected but not confirmed |
| Strategic narrative | "AI as a coworker," office integration, trust positioning | "Best coding model," developer recapture, rapid iteration advantage |

On one side: Claude Opus 4.6, Anthropic's most powerful model ever, presented as the new global benchmark leader, ahead of Gemini 3 Pro and GPT-5.2 in multiple evaluations.

On the other: GPT-5.3 Codex, OpenAI's developer-focused response, released quietly but deliberately, ahead of what appears to be an imminent GPT-5.3 rollout for ChatGPT users.

Both companies describe their models as the most capable AI systems ever built for real-world coding and professional tasks. OpenAI even revealed a symbolic milestone: parts of GPT-5.3 Codex were optimized using earlier versions of the model itself, a form of AI-assisted self-improvement.

Claude Opus 4.6: Anthropic Pushes AI Into the Office

Many expected Anthropic to unveil a new Sonnet model. Instead, the company opened hostilities with its flagship.

Claude Opus 4.6 introduces what Anthropic calls "Adaptive Thinking": a system that dynamically adjusts reasoning depth (and cost) depending on task complexity. The goal is clear: deeper reasoning for hard problems, faster answers for simple ones.

Technically, Opus 4.6 sets a new bar with a 1-million-token context window, allowing it to process entire databases, long reports, or multi-book projects without losing coherence.

Anthropic's benchmarks show major gains across practical tasks:

  • SWE-Bench Verified (real-world bug fixing): 80.8%
  • OSWorld (computer control with mouse & keyboard): 72.7%
  • Humanity's Last Exam (reasoning without tools): 40%, leading its category

Anthropic's system card is unambiguous: according to its own evaluations, Claude Opus 4.6 is now the strongest general-purpose AI model on the market.

Beyond raw performance, Anthropic also introduced Agent Teams for Claude Code: multiple Opus instances collaborating on the same project (coding, testing, documenting in parallel).

The ambition clearly extends beyond developers. Anthropic's recent Super Bowl ad campaign signals a push toward mainstream users, and a direct challenge to both Google and OpenAI.

GPT-5.3 Codex: OpenAI Fires Back, Fast

OpenAI's response came almost immediately.

Just 20 minutes after Claude Opus 4.6 went live, OpenAI announced GPT-5.3 Codex, a specialized model designed for developers and integrated into Codex tools and APIs. For now, ChatGPT users remain on GPT-5.2, but OpenAI has strongly hinted that GPT-5.3 for the general public is close.

OpenAI claims several key advances:

  • Speed: GPT-5.3 Codex is 25% faster than its predecessor.
  • Self-improvement loop: Early versions of GPT-5.3 were used to debug and optimize later versions, shortening development cycles.
  • Stronger agentic performance, particularly in long-running coding tasks.

Early benchmark comparisons show a more nuanced picture than headline dominance:

| Benchmark | GPT-5.3 Codex | Claude Opus 4.6 |
| --- | --- | --- |
| Terminal-Bench 2.0 | 77.3% | 65.4% |
| OSWorld (PC) | 64.7% (Verified) | 72.7% |
| SWE-Bench | 56.8% (Pro) | 80.8% (Verified) |

The message is clear: OpenAI is targeting developers tempted by Claude Code, just as Anthropic begins courting everyday ChatGPT users.

A Turning Point for the Industry

This near-simultaneous release marks a deeper shift in how AI is evaluated.

The debate is no longer about who answers questions best. It's about who can run complex workflows, coordinate multiple agents, and sustain long-term reasoning across real professional tasks.

February is shaping up to be explosive. New moves are expected from xAI, DeepSeek, and Google, and few doubt that both Anthropic and OpenAI are holding additional updates in reserve.

One thing is certain: The AI race is no longer about incremental upgrades. It's about who controls the next generation of autonomous, working AI systems.

Alex Morgan
I write about artificial intelligence as it shows up in real life, not in demos or press releases. I focus on how AI changes work, habits, and decision-making once it's actually used inside tools, teams, and everyday workflows. Most of my reporting looks at second-order effects: what people stop doing, what gets automated quietly, and how responsibility shifts when software starts making decisions for us.