OpenAI has introduced GPT-5.4, a significant upgrade over its predecessor, GPT-5.3 Instant, which was announced just two days earlier. The model arrives in two configurations: GPT-5.4 Thinking, aimed at complex cognitive tasks, and GPT-5.4 Pro, built for the most demanding workloads. Both versions will be available through OpenAI’s paid application programming interface (API) and its Codex software development application. GPT-5.4 Thinking will be available to all paid ChatGPT subscribers, including the $20-per-month Plus plan, while the more powerful GPT-5.4 Pro will be limited to ChatGPT Pro subscribers, priced at $200 monthly, and Enterprise plan subscribers. Even free-tier ChatGPT users will encounter GPT-5.4 through auto-routing of select queries, an OpenAI spokesperson confirmed.
The headline advancements of GPT-5.4 center on efficiency and a new "native" Computer Use mode. OpenAI reports that GPT-5.4 significantly reduces token consumption, with some tasks seeing a 47% decrease compared to previous models, which translates directly into lower costs and faster responses for developers and users alike. The native Computer Use mode, accessible via the API and Codex, lets GPT-5.4 navigate a user’s computer much like a human operator, interacting with and manipulating multiple applications. This capability is a notable step toward autonomous AI workflows, in which agents independently execute complex multi-step tasks across a user’s digital environment.
Complementing these advancements, OpenAI is also launching a suite of new ChatGPT integrations for Microsoft Excel and Google Sheets. These allow GPT-5.4 to be plugged directly into spreadsheets, enabling granular data analysis and automated task completion. The move promises to accelerate enterprise work significantly, though it also intensifies the ongoing debate over white-collar job displacement, particularly given similar spreadsheet offerings from competitors such as Anthropic’s Claude and its new Cowork application.
The extended context window is another key feature of GPT-5.4. The API and Codex versions now support 1 million tokens of context, enabling AI agents to plan, execute, and verify tasks over extended periods, which is crucial for complex projects that require a consistent understanding of large amounts of information. OpenAI has, however, implemented tiered pricing for this extended context: the first 272,000 input tokens are billed at the standard rate, while any input beyond that threshold is billed at double the per-million-token rate. This structure rewards efficient prompt engineering while still allowing exceptionally long-horizon tasks.
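Under that tiered scheme, per-request input cost is simple to estimate. The sketch below applies the published GPT-5.4 Thinking rate of $2.50 per million input tokens; the function and constants are illustrative, not an official billing formula:

```python
# Tiered input pricing for GPT-5.4 Thinking, per the announcement:
# first 272,000 input tokens at $2.50 per 1M; tokens beyond that at double.
STANDARD_RATE = 2.50 / 1_000_000        # dollars per input token
LONG_CONTEXT_RATE = 2 * STANDARD_RATE   # doubled rate past the threshold
THRESHOLD = 272_000

def input_cost(input_tokens: int) -> float:
    """Estimate the input cost in dollars for one request (illustrative only)."""
    standard = min(input_tokens, THRESHOLD)
    extended = max(input_tokens - THRESHOLD, 0)
    return standard * STANDARD_RATE + extended * LONG_CONTEXT_RATE

# A 300,000-token prompt: 272,000 tokens at the base rate plus 28,000 at double.
print(round(input_cost(300_000), 4))  # 0.82
```

For comparison, the same 300,000 tokens billed entirely at the standard rate would cost $0.75, so the long-context surcharge adds $0.07 on this request.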
Native Computer Use: Paving the Way for Autonomous Workflows
The most transformative capability highlighted by OpenAI is that GPT-5.4 is its first general-purpose model with native, state-of-the-art computer use in Codex and the API, allowing AI agents to operate computers and orchestrate multi-step workflows across diverse applications. OpenAI asserts that the model can generate code to control computers through libraries like Playwright and can issue precise mouse and keyboard commands based on visual cues from screenshots. This goes beyond simply processing information: it is an AI that can actively interact with and manipulate its digital environment.
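OpenAI has not published the generated control code itself, but the loop it describes, screenshot-based observation followed by explicit mouse and keyboard commands, can be sketched with Playwright's Python API. The action schema and the sample plan below are illustrative assumptions, not GPT-5.4's actual output format:

```python
# Sketch: replaying model-issued UI actions with Playwright's low-level
# mouse/keyboard primitives. The Action format is a simplifying assumption;
# OpenAI has not published the schema GPT-5.4 actually emits.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "press"
    x: int = 0
    y: int = 0
    text: str = ""

def execute(page, actions):
    """Map each model-issued action onto a Playwright page's input devices."""
    for a in actions:
        if a.kind == "click":
            page.mouse.click(a.x, a.y)      # coordinates found from a screenshot
        elif a.kind == "type":
            page.keyboard.type(a.text)
        elif a.kind == "press":
            page.keyboard.press(a.text)

# A hypothetical plan the model might emit after inspecting a screenshot:
plan = [Action("click", x=200, y=150),
        Action("type", text="quarterly report"),
        Action("press", text="Enter")]
```

In practice `execute` would receive a live page (e.g. from `sync_playwright()` and `chromium.launch()`), with a `page.screenshot()` taken between steps so the model can verify the effect of each action before issuing the next.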
The company further claims a notable improvement in agentic web browsing, and it has published benchmark results to back this up. On BrowseComp, a benchmark that evaluates an agent’s ability to persistently browse the web to locate hard-to-find information, GPT-5.4 reportedly shows a 17-percentage-point absolute improvement over GPT-5.2. GPT-5.4 Pro reaches 89.3%, a new state of the art on the benchmark.
In desktop navigation, the OSWorld-Verified benchmark, which assesses an AI’s ability to operate graphical user interfaces via screenshots, keyboard, and mouse actions, shows GPT-5.4 achieving a 75.0% success rate, up sharply from GPT-5.2’s 47.3% and above the reported human baseline of 72.4%. On WebArena-Verified, which measures web interaction using both Document Object Model (DOM) and screenshot-driven methods, GPT-5.4 achieves 67.3%, a modest improvement over GPT-5.2’s 65.4%. On Online-Mind2Web, which relies on screenshot-based observations, GPT-5.4 reaches 92.8%.
OpenAI also links these enhanced computer-use capabilities to improvements in vision and document handling. On the MMMU-Pro benchmark, GPT-5.4 scores 81.2% without relying on external tools, up from GPT-5.2’s 79.5%, and OpenAI states that this result is achieved with a fraction of the "thinking tokens," underscoring the model’s efficiency. On OmniDocBench, which evaluates document understanding and processing, GPT-5.4’s average error rate is reported at 0.109, a marked improvement from GPT-5.2’s 0.140. The release also expands support for high-fidelity image inputs, including an "original" detail level that can process images up to 10.24 million pixels, enabling more fine-grained visual analysis.
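Given the stated 10.24-million-pixel ceiling, a rough pre-flight check is easy to sketch. The helper below is illustrative only; the announcement does not document how the limit is enforced (for example, whether oversized images are rejected or downscaled):

```python
# Rough pre-flight check against the stated 10.24-megapixel ceiling for
# "original" detail-level image inputs. Illustrative helper, not an API rule.
MAX_PIXELS = 10_240_000  # 10.24 million pixels

def fits_original_detail(width: int, height: int) -> bool:
    """Return True if an image's pixel count is within the stated ceiling."""
    return width * height <= MAX_PIXELS

print(fits_original_detail(3840, 2160))  # 4K UHD, ~8.3 MP -> True
print(fits_original_detail(4096, 3072))  # ~12.6 MP -> False
```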
OpenAI positions GPT-5.4 as a model fundamentally engineered for sustained, multi-step workflows. This represents a departure from the single-response paradigm of many previous AI models, moving towards an AI that actively maintains state and context across a series of actions, much like a human agent performing a complex task.
Tool Search and Enhanced Tool Orchestration
As the ecosystem of available AI tools continues to expand, OpenAI acknowledges the inefficiencies of naive approaches that involve feeding every tool definition into a prompt. This method incurs significant costs in terms of processing power, latency, and context window pollution. To address this, GPT-5.4 introduces a structural solution: tool search within the API. Instead of receiving exhaustive lists of tool definitions upfront, the model now receives a concise list of available tools along with a search capability. It then retrieves full tool definitions only when they are genuinely required.
This approach yields substantial efficiency gains. OpenAI provides a concrete example: on 250 tasks from Scale’s MCP Atlas benchmark, using 36 MCP servers, the tool-search configuration reduced total token usage by 47% while maintaining the same accuracy as a configuration that exposed all MCP functions directly in context. Note that this 47% figure is specific to the tool-search setup in that evaluation, not a universal claim for all GPT-5.4 tasks.
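OpenAI has not detailed the API surface for tool search, but the underlying pattern, sending a compact index upfront and fetching full definitions on demand, can be sketched as follows. The registry contents and helper names are hypothetical:

```python
# Sketch of the tool-search pattern: the model initially sees only tool names
# and one-line descriptions, plus a search capability; full definitions enter
# the context only when retrieved. Registry and helpers are hypothetical.
TOOL_REGISTRY = {
    "get_stock_price": {
        "description": "Fetch the latest price for a ticker symbol.",
        "parameters": {"ticker": "string"},
    },
    "create_calendar_event": {
        "description": "Add an event to the user's calendar.",
        "parameters": {"title": "string", "start": "datetime"},
    },
}

def tool_index() -> list:
    """Compact listing sent upfront: names and short descriptions only."""
    return [f"{name}: {spec['description']}" for name, spec in TOOL_REGISTRY.items()]

def search_tools(query: str) -> dict:
    """Return full definitions only for tools matching the query."""
    q = query.lower()
    return {name: spec for name, spec in TOOL_REGISTRY.items()
            if q in name.lower() or q in spec["description"].lower()}
```

With hundreds of tools, the upfront index stays small and constant while full parameter schemas are paid for only when a task actually needs them, which is the source of the token savings OpenAI reports.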
Improvements for Developers and Coding Workflows
For developers, GPT-5.4 combines the coding strength of GPT-5.3-Codex with the enhanced tool and computer-use capabilities that matter for tasks extending beyond single interactions. OpenAI asserts that GPT-5.4 matches or surpasses GPT-5.3-Codex on the SWE-Bench Pro benchmark while exhibiting lower latency across reasoning efforts.
Codex itself benefits from new workflow-level controls. The introduction of a "/fast mode" is reported to deliver up to 1.5x faster performance across supported models, including GPT-5.4. OpenAI clarifies that this is the same underlying model and intelligence, simply optimized for speed. Furthermore, an experimental Codex skill, "Playwright (Interactive)," has been released to demonstrate the synergistic potential of coding and computer use. This skill is designed for visually debugging web and Electron applications and facilitates in-app testing during the development process.
OpenAI for Microsoft Excel and Google Sheets: Revolutionizing Financial Analysis
In conjunction with the GPT-5.4 launch, OpenAI is unveiling a suite of secure AI products within ChatGPT specifically tailored for enterprises and financial institutions. These offerings leverage GPT-5.4’s advanced financial reasoning and Excel-based modeling capabilities. The flagship product in this suite is ChatGPT for Excel and Google Sheets (beta). OpenAI describes this as a direct embedding of ChatGPT within spreadsheet environments, empowering users to build, analyze, and update complex financial models using familiar formulas and structures.
This enterprise-focused suite also includes new ChatGPT app integrations designed to consolidate market, company, and internal data into unified workflows. Notable partners in this initiative include FactSet, MSCI, Third Bridge, and Moody’s, signifying a strong commitment to providing comprehensive financial intelligence solutions. Additionally, the suite introduces reusable "Skills" for recurring financial tasks such as earnings previews, comparable company analysis, discounted cash flow (DCF) analysis, and the drafting of investment memos. OpenAI anchors this financial push with a compelling internal benchmark claim: model performance on an internal investment banking benchmark surged from 43.7% with GPT-5 to an impressive 88.0% with GPT-5.4 Thinking.
Measuring AI Performance Against Professional Work
OpenAI is strategically employing benchmarks that more closely mirror real-world professional deliverables, moving beyond simple puzzle-solving tasks. On GDPval, an evaluation that spans "well-specified knowledge work" across 44 different occupations, OpenAI reports that GPT-5.4 matches or surpasses industry professionals in 83.0% of comparisons. This represents a significant improvement over GPT-5.2, which achieved this feat in 71.0% of comparisons. The company further highlights specific enhancements in areas that have historically exposed AI weaknesses, including structured tables, formula generation, narrative coherence, and design quality.
In an internal benchmark simulating spreadsheet modeling tasks typically performed by junior investment banking analysts, GPT-5.4 achieved a mean score of 87.5%, a substantial leap from GPT-5.2’s 68.4%. Moreover, when evaluating presentation generation prompts, human raters reportedly preferred GPT-5.4’s presentations 68.0% of the time over those generated by GPT-5.2. The preferred presentations were lauded for their stronger aesthetics, greater visual variety, and more effective use of image generation capabilities.
Improving Reliability and Reducing Hallucinations
OpenAI asserts that GPT-5.4 is its most factually accurate model to date, a claim it supports with analysis of de-identified prompts on which users had previously flagged factual errors. On this dataset, GPT-5.4’s individual claims were 33% less likely to be false, and its complete responses were 18% less likely to contain any errors, compared with GPT-5.2.
Early testers of GPT-5.4 have provided strong endorsements. Daniel Swiecki of Walleye Capital noted a 30 percentage point improvement in accuracy on internal finance and Excel evaluations, attributing this to expanded automation capabilities for model updates and scenario analysis. Brendan Foody, CEO of Mercor, described GPT-5.4 as the best model the company has tested, stating it now tops Mercor’s APEX-Agents benchmark for professional services work, excelling in long-horizon deliverables such as slide decks, financial models, and legal analysis.
Pricing and Availability
In the API, GPT-5.4 Thinking is accessible as gpt-5.4, and GPT-5.4 Pro as gpt-5.4-pro. The pricing structure is as follows:
- GPT-5.4 Thinking: $2.50 per 1 million input tokens and $15.00 per 1 million output tokens.
- GPT-5.4 Pro: $30.00 per 1 million input tokens and $180.00 per 1 million output tokens.
This pricing positions GPT-5.4 among the more premium models available via API, as evidenced in the comparative table below:
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Total Cost ($ per 1M tokens) | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Qwen3.5-Flash | $0.10 | $0.40 | $0.50 | Alibaba Cloud |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| MiniMax M2.5 | $0.15 | $1.20 | $1.35 | MiniMax |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | $2.70 | MiniMax |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi-k2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Baidu |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 | OpenAI |
An important caveat on token usage: in GPT-5.4, input tokens beyond the 272,000 threshold are billed at double the standard rate, reflecting the model’s capacity for far larger prompts than previous iterations. Within Codex, the default compaction limit is 272,000 tokens, and the higher long-context pricing only applies once input surpasses that threshold. Developers can opt into larger prompts by raising the compaction limit, with only those extended requests subject to the increased rate. An OpenAI spokesperson confirmed that the maximum output token limit for the API remains 128,000 tokens, consistent with prior models.
Regarding the higher baseline pricing for GPT-5.4, the spokesperson pointed to several factors: the model’s stronger performance on complex tasks such as coding, computer use, in-depth research, advanced document generation, and sophisticated tool use; the research advances stemming from OpenAI’s strategic roadmap; and more efficient reasoning that requires fewer tokens for comparable tasks. Even with these increases, OpenAI maintains, GPT-5.4 remains competitively priced relative to other comparable frontier models.
The Broader Shift: From Answer Generation to Sustained Professional Workflows
Across its release and subsequent clarifications, GPT-5.4 is positioned as a model designed for more than answer generation. It is engineered to support sustained professional workflows: orchestrating tools, interacting with computers, managing extensive context, and producing outputs that mirror the artifacts used in professional settings.
OpenAI’s emphasis on token efficiency, the introduction of tool search, the native computer use capabilities, and the demonstrable reduction in user-flagged factual errors all point towards a singular strategic direction: making agentic AI systems more viable and cost-effective for production environments. By lowering the cost of "retries"—whether those retries involve a human user re-prompting the AI, an agent invoking another tool, or an entire workflow restarting due to an initial suboptimal outcome—OpenAI is paving the way for AI that is not just a tool for quick answers, but a reliable partner in complex, ongoing professional endeavors.

