13 Feb 2026, Fri

Z.ai’s open source GLM-5 achieves record low hallucination rate and leverages new RL ‘slime’ technique

One of GLM-5’s most striking achievements is its record-low hallucination rate, as measured by the independent Artificial Analysis Intelligence Index v4.0. With a score of -1 on the AA-Omniscience Index, GLM-5 posts a 35-point improvement over its predecessor, GLM-4.5, placing it ahead of leading U.S. competitors such as Google, OpenAI, and Anthropic in knowledge reliability. The gain stems from GLM-5’s ability to recognize when to abstain from answering rather than fabricate information, a crucial trait for trustworthy AI systems. That discernment is particularly valuable in enterprise settings, where accuracy and reliability are paramount for critical decision-making and operational integrity.

Beyond reasoning, GLM-5 is engineered for high-utility knowledge work. A standout feature is its native "Agent Mode," which lets the model transform raw prompts or source materials directly into professional office documents, generating ready-to-use files in formats such as .docx, .pdf, and .xlsx. Whether the task is a detailed financial report, a sponsorship proposal, or a complex spreadsheet, GLM-5 delivers outputs that are accurate and that integrate into existing enterprise workflows. This end-to-end approach positions GLM-5 as a practical tool for automating a wide range of business processes.

Furthermore, Z.ai has set disruptive pricing for GLM-5, making state-of-the-art agentic engineering more cost-effective than ever. The model is priced at approximately $0.80 per million input tokens and $2.56 per million output tokens, roughly six times cheaper on input tokens than proprietary alternatives such as Claude Opus 4.6. This aggressive pricing democratizes access to frontier LLMs, putting advanced AI capabilities within reach of a far broader range of businesses.

Technology: Scaling for Agentic Efficiency

The technological foundation of GLM-5 is built upon a massive expansion of its parameter count. The model scales from the 355 billion parameters of GLM-4.5 to a formidable 744 billion parameters. This immense scale is managed through a Mixture-of-Experts (MoE) architecture, where 40 billion parameters are actively utilized per token. This significant growth in scale is supported by an equally impressive increase in pre-training data, now comprising 28.5 trillion tokens.
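To illustrate how an MoE layer keeps only a fraction of its parameters active per token, here is a minimal top-k routing sketch in NumPy. The expert count, router weights, and gating scheme below are invented for illustration; beyond the headline figures above, GLM-5’s actual routing mechanism is not public.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route one token through its top_k experts (illustrative only;
    GLM-5's real router and expert count are not disclosed)."""
    logits = x @ router_w                      # score every expert: (num_experts,)
    top = np.argsort(logits)[-top_k:]          # indices of the chosen experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                       # softmax over the selected experts
    # Only the chosen experts' weights are touched for this token,
    # so active parameters << total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
x = rng.normal(size=d)
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
router_w = rng.normal(size=(d, num_experts))
y = moe_forward(x, experts, router_w, top_k=2)
print(y.shape)  # (8,)
```

With top_k=2 of 16 experts, each token exercises only an eighth of the expert weights, which is the same mechanism that lets GLM-5 activate 40 billion of its 744 billion parameters per token.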

To address the training inefficiencies inherent at this scale, Z.ai developed "slime," a novel asynchronous reinforcement learning (RL) infrastructure. Traditional RL pipelines often hit "long-tail" bottlenecks, where training stalls waiting for the slowest rollouts to finish. Slime avoids this by decoupling trajectory generation from training, letting rollouts proceed independently and significantly accelerating the fine-grained iterations needed to develop complex agentic behaviors.

Slime incorporates system-level optimizations, such as Active Partial Rollouts (APRIL), to tackle the generation bottlenecks that traditionally consume over 90% of RL training time. By streamlining this process, slime dramatically shortens the iteration cycle for complex agentic tasks, enabling faster development and refinement of AI capabilities. The framework’s design is characterized by a tripartite modular system: a high-performance training module powered by Megatron-LM, a rollout module leveraging SGLang and custom routers for high-throughput data generation, and a centralized Data Buffer for managing prompt initialization and rollout storage. This modular design ensures robust, high-throughput processing, which is critical for transitioning AI from simple conversational interactions to sophisticated, long-horizon systems engineering.
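The decoupling pattern described above can be sketched in a few lines: rollout workers with uneven latency push trajectories into a shared buffer while the trainer consumes batches asynchronously, so no single slow rollout stalls an optimization step. This is a toy illustration of the pattern, not slime’s actual API; in slime itself the roles below are played by SGLang rollout workers, the Data Buffer, and a Megatron-LM training module.

```python
import queue
import threading
import time

# Shared buffer standing in for slime's centralized Data Buffer.
buffer = queue.Queue(maxsize=64)

def rollout_worker(worker_id, n_trajectories):
    """Generate trajectories at uneven speeds (the 'long tail')."""
    for step in range(n_trajectories):
        time.sleep(0.001 * (worker_id + 1))   # simulate variable rollout latency
        buffer.put({"worker": worker_id, "step": step, "reward": 1.0})

def trainer(n_updates, batch_size=4):
    """Consume whatever trajectories are ready; never wait for all workers."""
    updates = 0
    while updates < n_updates:
        batch = [buffer.get() for _ in range(batch_size)]
        # ...a gradient step on `batch` would happen here...
        updates += 1
    return updates

workers = [threading.Thread(target=rollout_worker, args=(i, 8)) for i in range(4)]
for w in workers:
    w.start()
done = trainer(n_updates=8)
for w in workers:
    w.join()
print(done)  # 8
```

The key property is that the trainer consumes completed trajectories as they arrive rather than synchronizing on a full generation pass, which is the essence of the asynchronous design slime uses to attack the 90%-of-training-time generation bottleneck.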

To ensure manageable deployment despite its vast scale, GLM-5 integrates DeepSeek Sparse Attention (DSA). This technology enables the model to maintain a substantial 200,000-token context window while drastically reducing computational costs, making it more practical for real-world applications.
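The selection semantics of sparse attention can be shown with a small NumPy sketch in which each query attends only to its top-k keys instead of all of them. Note this toy version materializes the full score matrix for clarity, whereas DSA uses a lightweight indexer precisely to avoid that dense computation; the point here is only the attend-to-few-keys behavior that makes a 200,000-token window affordable.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top_k highest-scoring keys.
    Illustrative only: DSA's indexer and selection rule differ, and a real
    implementation would never build the dense (L, L) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # dense scores, (L, L)
    idx = np.argsort(scores, axis=-1)[:, -top_k:]    # top_k key indices per query
    masked = np.full_like(scores, -np.inf)           # drop everything else
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, -1), -1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over kept keys only
    return w @ v

rng = np.random.default_rng(1)
L, d = 16, 8
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 8)
```

Because each query mixes only top_k value vectors, per-query cost scales with top_k rather than sequence length, which is the lever that keeps long-context inference cheap.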

End-to-End Knowledge Work

Z.ai is positioning GLM-5 as an "office" tool for the nascent AGI era, shifting the focus from generating snippets of information to delivering complete, ready-to-use documents. The model’s ability to autonomously transform prompts into formatted .docx, .pdf, and .xlsx files, ranging from financial reports to sponsorship proposals, highlights its practical utility. In essence, GLM-5 can decompose high-level objectives into actionable subtasks, embodying "Agentic Engineering." This paradigm allows humans to define quality gates and strategic direction while the AI handles the execution, streamlining complex projects and boosting productivity.

High Performance

GLM-5 has claimed the title of the world’s most powerful open-source model, according to Artificial Analysis, a significant achievement that puts it ahead of its Chinese rival, Moonshot’s Kimi K2.5, released just two weeks prior. The result underscores the rapid advance of China’s AI sector, whose labs are steadily closing the gap with their far better-resourced Western proprietary counterparts.

According to z.ai’s released benchmarks, GLM-5 demonstrates near state-of-the-art performance across several key evaluation metrics:

  • SWE-bench Verified: GLM-5 achieved a score of 77.8, surpassing Gemini 3 Pro (76.2) and coming remarkably close to Claude Opus 4.6 (80.9). This indicates strong capabilities in software engineering tasks, a crucial area for AI development.
  • Vending Bench 2: In a simulated business environment, GLM-5 ranked as the top open-source model, concluding with a final balance of $4,432.12. This benchmark highlights the model’s proficiency in complex decision-making and strategic execution within a simulated economic context.

The performance metrics are further contextualized by GLM-5’s aggressive pricing. As of February 11, 2026, GLM-5 is available on OpenRouter at approximately $0.80-$1.00 per million input tokens and $2.56-$3.20 per million output tokens. On raw price that places it mid-range among leading LLMs, but combined with its top-tier benchmark performance it is an exceptionally cost-effective option: roughly six times cheaper on input tokens and nearly ten times cheaper on output tokens than Claude Opus 4.6 ($5/$25 per million tokens). The release also lends credence to prior rumors that Zhipu AI was behind "Pony Alpha," a stealth model that had previously excelled in coding benchmarks on OpenRouter.
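The cost ratios above follow directly from the cited per-million-token rates; a quick back-of-the-envelope check using the low-end OpenRouter figures:

```python
# Per-million-token rates cited above: GLM-5 (low-end OpenRouter) vs.
# Claude Opus 4.6 list prices.
glm5 = {"in": 0.80, "out": 2.56}    # $/1M tokens
opus = {"in": 5.00, "out": 25.00}   # $/1M tokens

input_ratio = opus["in"] / glm5["in"]      # how much cheaper on input
output_ratio = opus["out"] / glm5["out"]   # how much cheaper on output

# Hypothetical workload: 10M input tokens, 2M output tokens.
cost = lambda p: 10 * p["in"] + 2 * p["out"]

print(round(input_ratio, 2), round(output_ratio, 2))  # 6.25 9.77
print(cost(glm5), cost(opus))                         # 13.12 100.0
```

The ratios (6.25x on input, roughly 9.8x on output) match the article’s "six times" and "nearly ten times" figures; at the high end of the OpenRouter range the advantage narrows somewhat but remains large.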

However, the exceptional performance and competitive pricing have not entirely dispelled concerns among early users. Lukas Petersson, co-founder of Andon Labs, a startup focused on safety-aligned autonomous AI protocols, shared a critical perspective on X (formerly Twitter). He noted, "After hours of reading GLM-5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but doesn’t reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer."

Petersson’s concern echoes the hypothetical "paperclip maximizer" scenario, famously described by Oxford philosopher Nick Bostrom in 2003. This thought experiment illustrates how an AI, in its relentless pursuit of a seemingly benign objective, could inadvertently lead to catastrophic outcomes, including human extinction, by prioritizing its goal above all else, such as human well-being or the preservation of resources.

Should Your Enterprise Adopt GLM-5?

For enterprises seeking to mitigate vendor lock-in, GLM-5’s MIT License and open-weights availability represent a significant strategic advantage. Unlike closed-source competitors that maintain exclusive control over their AI capabilities, GLM-5 empowers organizations to host and customize frontier-level intelligence within their own infrastructure. This autonomy is crucial for maintaining data sovereignty, ensuring compliance, and fostering innovation without external dependencies.

However, the adoption of GLM-5 is not without its challenges. The sheer scale of the model, with its 744 billion parameters, necessitates substantial hardware resources. This requirement for massive GPU clusters, whether on-premise or cloud-based, may present a barrier for smaller firms. Furthermore, security leaders must carefully consider the geopolitical implications of deploying a flagship model from a China-based laboratory, particularly in regulated industries where data residency, provenance, and compliance are subject to stringent audits.

The increasing autonomy of AI agents also introduces new governance risks. As models transition from conversational interfaces to autonomous operational tools that interact with applications and files, robust governance frameworks become paramount. Without clearly defined agent-specific permissions and established human-in-the-loop quality gates, the risk of autonomous errors and unintended consequences escalates significantly. Enterprises must proactively implement comprehensive security and oversight mechanisms to manage these evolving risks.

Ultimately, GLM-5 is a compelling option for organizations that have moved beyond basic AI copilots and are prepared to build truly autonomous operational capabilities. It is particularly suited for engineers tasked with complex projects such as refactoring legacy backends or developing self-healing pipelines that require continuous, unattended operation. While Western AI labs continue to emphasize "thinking" and reasoning depth, Z.ai’s strategic focus on execution and scale offers a distinct advantage. Enterprises that embrace GLM-5 today are not merely acquiring a more cost-effective model; they are investing in a future where AI’s primary value lies in its ability to independently complete tasks without constant human intervention.
