Google Built a Cheaper AI That Beats GPT — But It’s Still Not Ready for Production


Google just dropped a reasoning model that doubles benchmark scores while keeping prices flat. The problem: it’s still in “preview,” which means most enterprises can’t actually use it.

Gemini 3.1 Pro launched February 19, 2026, and within 12 hours hit 566 points and 719 comments on Hacker News—the kind of developer frenzy usually reserved for GPT releases. The performance numbers explain why. On the ARC-AGI-2 benchmark, which tests a model’s ability to solve entirely new logic patterns, 3.1 Pro scored 77.1%—more than double the 31.1% its predecessor managed. This isn’t a modest 0.1 update that nudges benchmarks by a few points. This is generational.

And Google did it without raising prices.

Google’s $2 reasoning model just made GPT-5 look overpriced

The competitive positioning is brutal. On Humanity’s Last Exam, which evaluates advanced domain expertise, Gemini 3.1 Pro hit 44.4% while OpenAI’s GPT-5.2 managed just 34.5%. That’s a 10-point gap on the hardest reasoning test available. But here’s the kicker: Google kept pricing at $2 per million input tokens and $12 per million output tokens—identical to Gemini 3 Pro. The context window remains massive at 1 million input tokens (roughly 1,500 A4 pages) with 64K output capacity.

This continues Google’s pricing strategy of undercutting OpenAI while matching or exceeding performance benchmarks. The pricing gap also compounds OpenAI’s financial pressures, especially as the company reportedly burns through capital while maintaining premium pricing. For document analysis, legal review, or research synthesis—tasks that demand both massive context and deep reasoning—3.1 Pro delivers more capability for less money than anything else on the market.
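To put those rates in concrete terms, here is a back-of-envelope sketch of what a single full-context request would cost at the article’s stated prices ($2 per million input tokens, $12 per million output tokens). The function name and request shape are illustrative, not part of any Google API:

```python
# Illustrative cost math at the stated Gemini 3.1 Pro preview rates:
# $2 per million input tokens, $12 per million output tokens.
INPUT_RATE = 2.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 12.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the article's quoted pricing."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A maxed-out call: the full 1M-token context window in, 64K tokens out.
print(f"${estimate_cost(1_000_000, 64_000):.2f}")  # → $2.77
```

In other words, analyzing roughly 1,500 pages of input and generating a long report back costs under three dollars per request at these rates.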

The top Hacker News comment captured the moment: “Google just dropped a reasoning nuke—77% ARC-AGI while keeping prices flat? Devs, drop everything.”

The preview problem Google can’t solve

Except they can’t drop everything, because Gemini 3.1 Pro remains in “preview” status across every distribution channel: the Gemini API in Google AI Studio, Vertex AI, and Google Antigravity. So does Gemini 3 Pro, despite being the older model. Preview means no service-level agreement, no guaranteed uptime, no liability protection if the model fails in production.

And it does fail. Developers reported 503 errors on February 19 with the message “This model is currently experiencing high demand,” with resolution times ranging from 30 to 120 minutes. File editing tasks generated “model produced a malformed edit” errors, where the API reports success despite the operation failing. One developer documented the issue: “The prior attempt to resolve the issue by replacing the entire file content failed. The ‘target content not found’ error is perplexing.”
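For teams using the preview anyway, the standard defense against transient 503s is retrying with jittered exponential backoff. The sketch below is a generic wrapper, not Google’s SDK: `call` and the `status_code` attribute on the raised exception are assumptions you would adapt to your client library’s actual error type:

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry a model call on transient 503-style failures.

    `call` is any zero-argument function that raises an exception carrying a
    `status_code` attribute on failure (a hypothetical shape -- map it to
    whatever error class your actual client library raises).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            status = getattr(err, "status_code", None)
            if status != 503 or attempt == max_attempts - 1:
                raise  # non-transient error, or out of retries
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise,
            # so simultaneous clients don't all retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Backoff papers over minutes-long outages poorly, but it turns the short “high demand” blips into retried successes instead of user-facing failures. It does nothing for the malformed-edit case, where the API reports success, which is the more dangerous failure mode.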

The preview status complicates enterprise AI decisions, forcing IT teams to choose between cutting-edge capabilities and contractual guarantees. Google has the technology. It just can’t ship it with the stability enterprises demand.

Why developers are using it anyway—and what they’re risking

The performance gap is too large to ignore. Developers on Reddit’s r/MachineLearning noted that “3.1 Pro crushes agentic workflows but still preview-only—feels like Google’s playing 4D chess with previews.” The model’s AI agent capabilities, combined with the 1-million-token context window, make it particularly attractive for complex automation tasks that require reasoning across massive datasets.

But the risk calculus is clear: use the most powerful reasoning model at the lowest price, or wait for Google to ship something production-ready. The company that dominates search still hasn’t figured out how to ship stable AI infrastructure. That’s the actual story here—not the benchmarks, not the pricing, but the fact that Google can build breakthrough technology and still lose the enterprise market to OpenAI because “preview” is a warning, not just a label.

Google proved it can build the most powerful reasoning model at the lowest price. It just can’t prove the model will work when you need it to.

Alex Morgan
I write about artificial intelligence as it shows up in real life — not in demos or press releases. I focus on how AI changes work, habits, and decision-making once it’s actually used inside tools, teams, and everyday workflows. Most of my reporting looks at second-order effects: what people stop doing, what gets automated quietly, and how responsibility shifts when software starts making decisions for us.