GLM-5 ranks #1 on agentic coding leaderboards while crashing on casual conversation and SVG generation, according to early testing on Feb 11, 2026. This isn’t a bug in China’s newest open-source AI. It’s the entire business model.
The era of general-purpose open-source models just ended. What replaced it: specialized toolkits where you’re assembling a stack, not choosing a model.
GLM-5 beats closed models at one metric, then fails at everything users actually do
The 744-billion-parameter model (with 40 billion active via mixture-of-experts) achieves the lowest hallucination rate on reasoning benchmarks, surpassing Claude Opus 4.5 and GPT-5.2-xhigh. That’s legitimate. But the same model that dominates technical accuracy can’t handle frontend work or conversational flow.
This isn’t model immaturity.
Zhipu AI trained GLM-5 deliberately on system-focused tasks: coding, agentic planning, long-context analysis. The result excels at backend logic while breaking on tasks that made ChatGPT mainstream. Developers already vibe coding with AI will find GLM-5 useful. Everyone else hits walls immediately.
The Feb 11 tester ranked it #3 overall despite its #1 agentic performance, a gap that tells you everything about what “state-of-the-art” actually means now. You’re not choosing between models anymore. You’re choosing which narrow capability you need right now, then switching tools.
The open-source premium: GLM-5 costs more and does less than Llama 4
GLM-5 charges $1 per million input tokens and $3.20 per million output tokens, according to Artificial Analysis benchmarks. Compare that to DeepSeek R1’s 96% cost advantage over ChatGPT; GLM-5 moves in the opposite direction.
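Taken at face value, those list prices make per-request cost easy to sketch. A minimal example, where the 30k-in/10k-out token mix is an assumed workload, not a benchmark figure:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request at per-million-token list prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# GLM-5 list prices per Artificial Analysis: $1.00 in, $3.20 out
glm5 = request_cost(30_000, 10_000, in_price=1.00, out_price=3.20)
print(f"${glm5:.3f} per request")  # $0.062 for a 30k-token prompt, 10k-token reply
```

Output pricing dominating input pricing is why the verbosity numbers below matter: a chattier model inflates exactly the expensive side of the bill.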
But pricing isn’t the killer. Context is.
GLM-5’s 200,000-token context window looks adequate until you compare it to Llama 4 Scout’s 1 million tokens or Maverick’s 10 million. For multi-file codebases or long-document analysis, you hit limits before loading your full context. The model generates 110 million tokens versus a 15 million median; you’re paying for verbosity you didn’t ask for.
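That context gap is easy to hit in practice. A rough pre-flight check, using the window sizes quoted above and an assumed ~4-characters-per-token heuristic rather than a real tokenizer:

```python
CONTEXT_WINDOWS = {  # tokens, as quoted above
    "GLM-5": 200_000,
    "Llama 4 Scout": 1_000_000,
    "Llama 4 Maverick": 10_000_000,
}

def fits(total_chars: int, model: str, chars_per_token: float = 4.0) -> bool:
    """Rough check: does this much raw text fit the model's context window?"""
    return total_chars / chars_per_token <= CONTEXT_WINDOWS[model]

# A ~1.2M-character codebase (~300k tokens) overflows GLM-5 but not Scout
print(fits(1_200_000, "GLM-5"), fits(1_200_000, "Llama 4 Scout"))  # False True
```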
Speed penalty: 81.8 output tokens per second, with a 1.46-second time-to-first-token against a 1.10-second median. And that MIT license that looks generous? It’s attached to a model that costs 5-20x more than alternatives with broader capabilities.
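Those two latency numbers combine into wall-clock time a user actually feels. A back-of-envelope sketch, assuming steady-state streaming at the quoted throughput:

```python
def response_seconds(output_tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """Wall-clock estimate: time-to-first-token plus streaming time."""
    return ttft_s + output_tokens / tokens_per_s

# Benchmark figures quoted above: 1.46s TTFT, 81.8 tokens/s output
t = response_seconds(1_000, ttft_s=1.46, tokens_per_s=81.8)
print(f"{t:.1f}s for a 1,000-token answer")  # ~13.7s
```

The takeaway: the TTFT gap versus the median is noise next to throughput, which dominates any response longer than a sentence.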
The open-source premium paradox is real.
Nobody can verify the Huawei chip claim โ and that’s the actual story
The “100% trained on Huawei chips” narrative driving geopolitical coverage has zero public verification. No energy efficiency data. No kilowatt-hour comparisons to Nvidia H100s. No independent audits.
Research confirms GLM-5 uses DeepSeek Sparse Attention and FP16 precision training. Infrastructure claims? Unverified as of Feb 18, 2026. This echoes broader questions about Chinese data centers sitting empty while chipmakers run at capacity โ raising doubts about the entire supply chain narrative.
For US developers, this means: evaluate GLM-5 on performance and cost, not on geopolitical assertions you can’t verify. The model is real. The infrastructure story might not be.
The hallucination benchmark dominance sits next to casual chat failures. The $3.20 per million output pricing sits next to the “open source” label. When specialized excellence requires you to assemble a toolkit instead of deploying one model, and when that toolkit costs more than closed alternatives, what exactly are we opening?