DeepSeek, the ambitious Chinese AI startup backed by quantitative hedge fund High-Flyer Capital Management, has once again sent shockwaves through the global artificial intelligence landscape. Following its near-overnight sensation in January 2025 with the release of the open-source R1 model, which matched the performance of proprietary U.S. giants, DeepSeek has now unveiled DeepSeek-V4, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model poised to redefine the economics and accessibility of frontier-class AI. Available freely under the commercially permissive MIT License, DeepSeek-V4 not only rivals but, on some benchmarks, surpasses the world’s most advanced closed-source systems, all while offering API access at approximately one-sixth the cost.
This monumental release, described by DeepSeek AI researcher Deli Chen on X as a "labor of love" crafted over the 484 days since the launch of V3, is being hailed as the "second DeepSeek moment." Chen’s declaration that "AGI belongs to everyone" encapsulates the company’s ethos of democratizing access to cutting-edge AI. DeepSeek-V4 is immediately available on the AI model-sharing hub Hugging Face and through DeepSeek’s own API.
Frontier-Class AI Enters a Lower Price Band
The most immediate and impactful consequence of the DeepSeek-V4 launch is its seismic effect on the economics of advanced AI. While the initial speculation regarding near-zero pricing was corrected, DeepSeek’s strategy remains clear: to dramatically reduce the cost barrier for accessing high-performance AI models. DeepSeek-V4-Pro, the flagship offering, is priced through its API at $1.74 USD per 1 million input tokens on a cache miss and $3.48 per million output tokens. This translates to a combined cost of $5.22 for a simple one-million-input, one-million-output interaction. When input tokens are cached, the price for input drops to a mere $0.145 per million tokens, bringing the blended cost for the same interaction down to an astonishing $3.625.
This pricing strategy starkly contrasts with the premium rates commanded by leading U.S. competitors. For instance, GPT-5.5 is priced at $5.00 per million input tokens and a hefty $30.00 per million output tokens, resulting in a combined cost of $35.00 for the same basic interaction. Similarly, Claude Opus 4.7 is priced at $5.00 for input and $25.00 for output, totaling $30.00. These figures highlight that DeepSeek-V4-Pro, on standard, cache-miss pricing, is approximately one-seventh the cost of GPT-5.5 and about one-sixth the cost of Claude Opus 4.7. The economic disparity widens further with cached input, where DeepSeek-V4-Pro costs about one-tenth as much as GPT-5.5 and roughly one-eighth as much as Claude Opus 4.7.
Adding another layer to this disruptive pricing model is the DeepSeek-V4-Flash variant. Priced at $0.14 per million input tokens on a cache miss and $0.28 per million output tokens, the combined cost for a million input and output tokens is a mere $0.42. With cached input, this price drops to $0.308. At these rates, DeepSeek’s more accessible model is over 98% cheaper than GPT-5.5 and Claude Opus 4.7, approaching one-hundredth of their cost, albeit with a commensurate dip in performance.
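For teams modeling their own workloads, the arithmetic is straightforward to reproduce. The sketch below uses only the per-token prices quoted above; the helper function and dictionary keys are illustrative, not official SDK identifiers:

```python
# Per-million-token API prices (USD) as quoted in this article.
PRICES = {
    "deepseek-v4-pro":   {"input": 1.74, "cached_input": 0.145, "output": 3.48},
    "deepseek-v4-flash": {"input": 0.14, "cached_input": 0.028, "output": 0.28},
    "gpt-5.5":           {"input": 5.00, "output": 30.00},
    "claude-opus-4.7":   {"input": 5.00, "output": 25.00},
}

def interaction_cost(model: str, cached: bool = False) -> float:
    """Cost of a 1M-input, 1M-output interaction, optionally with cached input."""
    p = PRICES[model]
    input_rate = p.get("cached_input", p["input"]) if cached else p["input"]
    return input_rate + p["output"]

for model in PRICES:
    print(f"{model}: ${interaction_cost(model):.3f} "
          f"(cached input: ${interaction_cost(model, cached=True):.3f})")

# deepseek-v4-pro:   $5.220 (cached: $3.625)  -> roughly 1/7 of GPT-5.5's $35.00
# deepseek-v4-flash: $0.420 (cached: $0.308)  -> roughly 1/83 of GPT-5.5 on a cache miss
```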
DeepSeek’s aggressive pricing is not merely a promotional tactic; it represents a fundamental shift in the AI market. By compressing advanced model economics into a significantly lower price band, DeepSeek compels developers and enterprises to re-evaluate the cost-benefit analysis of adopting premium closed-source models. For organizations with substantial inference workloads, the price differential can make automation tasks viable that were previously uneconomical. The launch effectively challenges the market’s reliance on performance alone as the primary differentiator for premium AI providers.
Benchmarking the Frontier: DeepSeek-V4-Pro Nears, But Doesn’t Quite Surpass, Top Closed Models
DeepSeek-V4-Pro-Max is best understood as a significant leap forward for open-weight models rather than a dethroning of the latest proprietary frontier systems. DeepSeek’s internal benchmark comparisons, which pit V4 against GPT-5.4 xHigh, Claude Opus 4.6 Max, and Gemini 3.1 Pro High, showcase superiority on certain tasks such as Codeforces and Apex Shortlist, but those are comparisons against slightly older models, not head-to-head matchups with the current GPT-5.5 and Claude Opus 4.7.
In a more direct comparison with the latest proprietary models on shared benchmarks, DeepSeek-V4-Pro-Max demonstrates a more nuanced picture. GPT-5.5 and Claude Opus 4.7 generally maintain a lead across most categories. DeepSeek-V4-Pro-Max’s strongest performance is observed on BrowseComp, a benchmark assessing agentic AI web browsing capabilities, where it achieves 83.4%, narrowly trailing GPT-5.5 at 84.4% and leading Claude Opus 4.7 at 79.3%. On Terminal-Bench 2.0, DeepSeek scores 67.9%, closely matching Claude Opus 4.7’s 69.4%, but falling significantly behind GPT-5.5’s 82.7%.
The academic reasoning benchmarks lean towards the closed models. On GPQA Diamond, DeepSeek-V4-Pro-Max scores 90.1%, compared to GPT-5.5’s 93.6% and Claude Opus 4.7’s 94.2%. Similarly, on Humanity’s Last Exam (without tools), DeepSeek’s 37.7% lags behind GPT-5.5 (41.4%), GPT-5.5 Pro (43.1%), and Claude Opus 4.7 (46.9%). Even with tools enabled, DeepSeek’s 48.2% trails GPT-5.5 (52.2%), GPT-5.5 Pro (57.2%), and Claude Opus 4.7 (54.7%).
Agentic and software-engineering benchmarks tell a similarly mixed story. On SWE-Bench Pro, DeepSeek’s 55.4% is surpassed by GPT-5.5 (58.6%) and Claude Opus 4.7 (64.3%). On MCP Atlas, its 73.6% sits slightly behind GPT-5.5 (75.3%) and Claude Opus 4.7 (79.1%). BrowseComp remains the standout: as noted above, DeepSeek beats Claude Opus 4.7 outright and nearly matches GPT-5.5, though GPT-5.5 Pro’s 90.1% remains considerably ahead.
Ultimately, DeepSeek-V4-Pro-Max does not dethrone GPT-5.5 or Claude Opus 4.7 on directly comparable benchmarks. But its performance on several enterprise-relevant tasks, particularly agentic workflows and reasoning, is close enough to warrant serious attention. Coupled with API pricing at roughly one-sixth to one-seventh that of its proprietary counterparts, DeepSeek-V4 forces a fundamental reevaluation of AI deployment economics. It is undeniably the leading open-weight model currently available, and its proximity to frontier closed systems on practical benchmarks, combined with its affordability and open accessibility, positions it as a transformative force.
A Significant Leap from DeepSeek V3.2
The magnitude of the DeepSeek-V4 release is best appreciated by examining the gains over its predecessor, DeepSeek-V3.2-Base. The V4-Pro-Base model represents a substantial advancement. In world knowledge, V4-Pro-Base achieved 90.1 on MMLU (5-shot), up from V3.2’s 87.8, and jumped from 65.5 to 73.5 on MMLU-Pro. High-level reasoning and factuality show even more pronounced improvements: on SuperGPQA, V4-Pro-Base reached 53.9, up from V3.2’s 45.0, and on the FACTS Parametric benchmark it more than doubled its predecessor’s score, soaring from 27.1 to 62.6. SimpleQA Verified scores also rose dramatically, from 28.3 to 55.2.
Long context capabilities have been significantly enhanced. On LongBench-V2, V4-Pro-Base scored 51.5, substantially outpacing the 40.2 achieved by V3.2-Base. In code and math, V4-Pro-Base reached 76.8 on HumanEval (Pass@1), an increase from 62.8 on V3.2-Base. These figures underscore DeepSeek’s success not only in optimizing for inference cost but also in fundamentally enhancing the intelligence density of its base architecture. The efficiency gains are equally compelling for the Flash variant. DeepSeek-V4-Flash-Base, despite its considerably smaller parameter count, outperforms the larger V3.2-Base across a wide array of benchmarks, particularly in long-context scenarios.
Introducing Manifold-Constrained Hyper-Connections (mHC): A New Information Traffic Controller
DeepSeek’s ability to deliver such competitive performance at lower price points is underpinned by radical architectural innovations detailed in its technical report, "Towards Highly Efficient Million-Token Context Intelligence." The standout technical achievement of V4 is its native one-million-token context window. Historically, supporting such extensive context required immense memory, primarily through the key-value (KV) cache. DeepSeek has overcome this challenge by introducing a Hybrid Attention Architecture. This architecture combines Compressed Sparse Attention (CSA) to reduce initial token dimensionality with Heavily Compressed Attention (HCA) to aggressively compress the memory footprint for long-range dependencies. In practice, the V4-Pro model requires only 10% of the KV cache and 27% of the single-token inference FLOPs compared to DeepSeek-V3.2, even when operating at a 1M token context.
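DeepSeek has not published absolute memory figures, but the scale of the claimed savings is easy to illustrate with a back-of-envelope calculation. The dimensions below are assumptions chosen purely for illustration, not disclosed V4 specifications:

```python
# Hypothetical dimensions for illustration only; DeepSeek has not published these.
context_len   = 1_000_000   # native context window (tokens)
n_layers      = 64          # assumed transformer depth
n_kv_heads    = 8           # assumed number of key-value heads
head_dim      = 128         # assumed per-head dimension
bytes_per_val = 2           # fp16/bf16 storage

# Baseline KV cache: two tensors (K and V) per layer, one entry per token.
baseline_gb = (2 * context_len * n_layers * n_kv_heads
               * head_dim * bytes_per_val) / 1e9
print(f"Uncompressed KV cache at 1M tokens: ~{baseline_gb:.0f} GB")

# The report claims V4 needs only 10% of V3.2's KV cache at the same length.
print(f"At 10% of baseline: ~{baseline_gb * 0.10:.0f} GB")
```

Under these assumed dimensions, the uncompressed cache would run to roughly 262 GB at a million tokens, which makes clear why a 10x reduction is the difference between multi-node serving and a single accelerator.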
To stabilize a network of 1.6 trillion parameters, DeepSeek has moved beyond traditional residual connections, incorporating Manifold-Constrained Hyper-Connections (mHC). The technique strengthens signal propagation across layers while preserving the model’s expressivity, letting the network carry a wider flow of information for learning complex patterns without risking training instability; it functions like a traffic controller keeping information moving smoothly on a massive highway. This is complemented by the Muon optimizer, which delivered faster convergence and greater training stability during pre-training on over 32 trillion diverse, high-quality tokens. The training data was meticulously filtered to remove auto-generated content, mitigating model collapse and prioritizing unique, high-value academic material. The model’s 1.6T parameters are organized in a Mixture-of-Experts (MoE) design in which only 49 billion parameters, roughly 3% of the network, are activated per token, further reducing computational requirements.
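DeepSeek has not released reference code for mHC, but the underlying hyper-connections idea, widening the single residual stream into several learnably mixed streams, can be sketched in a few lines. The class below is a conceptual illustration, not DeepSeek’s implementation; the manifold constraint is approximated here by simply normalizing the mixing weights:

```python
import torch
import torch.nn as nn

class HyperConnection(nn.Module):
    """Conceptual sketch: n parallel residual streams with a learnable,
    normalized mixing matrix (a stand-in for the manifold constraint)."""
    def __init__(self, n_streams: int = 4):
        super().__init__()
        self.mix = nn.Parameter(torch.eye(n_streams))              # stream-mixing weights
        self.out = nn.Parameter(torch.ones(n_streams) / n_streams) # read-out weights

    def forward(self, streams: torch.Tensor, layer: nn.Module) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        mix = torch.softmax(self.mix, dim=-1)                 # keep mixing well-conditioned
        mixed = torch.einsum("ij,jbsd->ibsd", mix, streams)   # exchange info across streams
        h = layer(torch.einsum("i,ibsd->bsd", self.out, mixed))  # layer reads one combination
        return mixed + h.unsqueeze(0)                         # residual written to all streams
```

The design intuition is that multiple streams give the gradient several routes through the depth of the network, while the normalized mixing matrix keeps any one route from dominating, which is what makes very deep, very wide training stable.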
Cultivating the Mixture-of-Experts (MoE) for Holistic Performance
DeepSeek-V4 was not merely trained; it was "cultivated" through a unique two-stage paradigm. Its reasoning capabilities are further segmented into three distinct "effort" modes, including the "Think Max" setting referenced in DeepSeek’s deployment guidance, allowing performance to be tuned to available compute and desired output quality.
Breaking the Nvidia GPU Stranglehold with Homegrown Huawei Ascend NPUs
While the model weights are the headline, the accompanying open-sourced software stack is arguably more consequential for the future of "Sovereign AI." Analyst Rui Ma highlighted a crucial detail: DeepSeek validated its fine-grained Expert Parallelism (EP) scheme on Huawei Ascend NPUs (neural processing units). By achieving a 1.50x to 1.73x speedup on non-Nvidia hardware, DeepSeek has provided a blueprint for high-performance AI deployment that is resilient to Western GPU supply-chain disruptions and export controls. Notably, DeepSeek has stated that officially licensed, legal Nvidia GPUs were still used for DeepSeek-V4’s training, alongside the Huawei NPUs.
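The report’s exact EP scheme is not reproduced here, but the core mechanic of fine-grained expert parallelism, sharding experts across devices and sending each token only to the devices hosting its routed experts, can be sketched conceptually (all sizes below are illustrative):

```python
import torch

# Conceptual sketch of fine-grained expert parallelism (not DeepSeek's kernel):
# experts are sharded across devices, and each token is dispatched only to the
# devices that host its routed experts, followed by an all-to-all exchange.
N_EXPERTS, N_DEVICES, TOP_K = 64, 8, 2
EXPERTS_PER_DEVICE = N_EXPERTS // N_DEVICES

def dispatch_plan(router_logits: torch.Tensor) -> dict[int, list[int]]:
    top_experts = router_logits.topk(TOP_K, dim=-1).indices  # (n_tokens, TOP_K)
    device_of = top_experts // EXPERTS_PER_DEVICE            # device hosting each expert
    plan = {d: [] for d in range(N_DEVICES)}
    for tok, devices in enumerate(device_of.tolist()):
        for d in set(devices):        # send each token at most once per device
            plan[d].append(tok)
    return plan                       # device -> token indices to exchange

plan = dispatch_plan(torch.randn(16, N_EXPERTS))  # 16 tokens, random router scores
```

The performance of a scheme like this lives or dies on the all-to-all exchange that follows the dispatch plan, which is precisely the layer DeepSeek says it tuned for Ascend interconnects.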
Furthermore, DeepSeek has open-sourced the MegaMoE mega-kernel as part of its DeepGEMM library. This CUDA-based implementation offers up to a 1.96x speedup for latency-sensitive tasks such as RL rollouts and high-speed agent serving. This initiative ensures that developers can run these massive models with extreme efficiency on existing hardware, solidifying DeepSeek’s role as a primary driver of open-source AI infrastructure. The technical report emphasizes that these optimizations are vital for supporting the standard 1M context across all official services.
Licensing, Local Deployment, and Community Impact
DeepSeek-V4 is released under the MIT License, among the most permissive licensing frameworks in the industry, allowing developers to use, copy, modify, and distribute the weights commercially without royalties. This stands in stark contrast to the more restrictive "open-weight" licenses favored by other companies.

For local deployment, DeepSeek recommends sampling parameters of temperature = 1.0 and top_p = 1.0. For the "Think Max" reasoning mode, a context window of at least 384K tokens is advised to prevent truncation of the model’s internal reasoning chains. The release includes a dedicated encoding folder with Python scripts for OpenAI-compatible message encoding and for parsing the model’s output, including its reasoning content. DeepSeek-V4 also integrates with leading AI coding agents such as Claude Code, OpenClaw, and OpenCode, underscoring its potential as a foundation for developer tools and an open-source alternative to proprietary cloud ecosystems.
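Because the API is OpenAI-compatible, wiring up the recommended sampling parameters takes only a few lines with the official openai Python client. The base URL matches DeepSeek’s published API; the model identifier below is a placeholder to be checked against the current documentation:

```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; authenticate with a DeepSeek key.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-v4",   # placeholder identifier; consult the current API docs
    messages=[{"role": "user", "content": "Summarize the mHC architecture."}],
    temperature=1.0,       # DeepSeek's recommended sampling settings
    top_p=1.0,
)
print(response.choices[0].message.content)
```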
The community reaction has been overwhelmingly positive, characterized by a sense of shock and validation. Hugging Face officially welcomed the "whale" back, proclaiming the arrival of the era of cost-effective 1M context length. Industry experts have echoed this sentiment, labeling the release the "second DeepSeek moment" and asserting that it has reset the developmental trajectory of the entire field, placing significant pressure on closed-source providers like OpenAI and Anthropic to justify their premium pricing. AI evaluation firm Vals AI noted that DeepSeek-V4 has become the "#1 open-weight model on our Vibe Code Benchmark, and it’s not close."
DeepSeek is rapidly phasing out its older architectures, with legacy deepseek-chat and deepseek-reasoner endpoints scheduled for full retirement on July 24, 2026. All current traffic is being rerouted to the V4-Flash architecture, signifying a complete transition to the million-token standard.
DeepSeek-V4 represents more than a new model; it is a direct challenge to the established order. By demonstrating that architectural innovation can substitute for brute-force compute, DeepSeek has made cutting-edge AI accessible to the global developer community at a fraction of the usual cost. The development holds immense potential benefit for the world, even as lawmakers and leaders grapple with concerns that Chinese labs may be training open-source models on the outputs of U.S. proprietary systems, and with fears that open-source or jailbroken proprietary models could be misused for nefarious purposes. Such risks are inherent to any powerful new technology, mirroring the history of the internet itself, and the benefits of broadened access to advanced AI appear to far outweigh them. DeepSeek’s unwavering commitment to keeping frontier AI models open serves potential AI users worldwide, particularly enterprises seeking to adopt cutting-edge technology at the most competitive price point.

