Just last year, comparing revenue across geographies and customer cohorts was an hours-long slog for an OpenAI finance analyst: sifting through 70,000 datasets, hand-crafting SQL queries, and verifying table schemas. Today, that same analyst can get the same results, or better, in minutes by typing a question in plain English into Slack and receiving a finished chart. The transformation is the work of an AI data agent that OpenAI built on its own technology.
The tool was conceived and built by a team of two engineers in three months, and an estimated 70% of its codebase was generated by AI. The agent is now part of daily operations for more than 4,000 of OpenAI’s roughly 5,000 employees, one of the most widespread deployments of an AI data agent inside any company.
In an interview with VentureBeat, Emma Tang, head of data infrastructure at OpenAI and leader of the team behind the agent, offered a rare look at how it works, where it struggles, and what it implies for enterprise data management. Her account, together with OpenAI’s blog post announcing the tool, describes a company that has turned its own AI inward and uncovered a bottleneck that will soon confront every enterprise: the limit on smarter organizations is not better models, but the accessibility and quality of data.
"The agent is used for any kind of analysis," Tang said. "Almost every team in the company uses it." That breadth matters: the agent puts data insights within reach across the whole organization rather than inside one team’s silo.
A Plain-English Gateway to 600 Petabytes of Corporate Data
To grasp the significance of OpenAI’s achievement, consider the scale of the problem. The company’s data platform holds more than 600 petabytes across 70,000 datasets; even locating the right table could consume hours of a data scientist’s time. Tang’s Data Platform team, part of the infrastructure organization responsible for big data systems, streaming, and the data tooling layer, serves an enormous internal user base. "There are 5,000 employees at OpenAI right now," Tang said. "Over 4,000 use data tools that our team provides."
The agent, built on GPT-5.2, plugs into existing workflows: Slack, a dedicated web interface, IDEs, the Codex command line interface (CLI), and OpenAI’s internal ChatGPT application. It interprets natural language queries and returns charts, interactive dashboards, and long-form analytical reports. The team estimates the agent saves two to four hours of work per query, but Tang said the bigger benefit is harder to quantify: it gives people access to analyses that previously weren’t feasible at all, no matter how much time they had.
"Engineers, growth, product, as well as non-technical teams, who may not know all the ins and outs of the company data systems and table schemas" can now pull sophisticated insights on their own, her team noted, regardless of their expertise in querying or database structures.
From Revenue Breakdowns to Latency Debugging, One Agent Handles It All
Tang provided a series of concrete use cases that vividly illustrate the agent’s extensive capabilities. The finance team at OpenAI routinely queries the agent for detailed revenue comparisons, segmented by both geography and specific customer cohorts. "It can, just literally in plain text, send the agent a query, and it will be able to respond and give you charts and give you dashboards, all of these things," she explained, highlighting the ease of use and the richness of the output.
The agent’s real power shows in complex, multi-step strategic analyses. Tang recounted a recent case in which a user spotted discrepancies between two dashboards tracking Plus subscriber growth. "The data agent can give you a chart and show you, stack rank by stack rank, exactly what the differences are," she said. "There turned out to be five different factors. For a human, that would take hours, if not days, but the agent can do it in a few minutes."
Product managers leverage the agent to gain a deeper understanding of feature adoption rates, enabling them to refine product development strategies based on real-time user behavior. Engineers utilize it for diagnosing performance regressions, posing questions like whether a particular ChatGPT component is indeed slower than the previous day and, if so, identifying the specific latency components contributing to the change. The agent can efficiently break down these complex performance metrics and provide comparative analyses against prior periods from a single, straightforward prompt.
What sets the agent apart is that it works across organizational boundaries. Where most enterprise AI agents today are siloed by department (a finance bot, an HR bot), OpenAI’s agent runs horizontally across the company. Tang explained that while the agent launched with memory and context curated per department, "at some point it’s all in the same database." That unification lets senior leaders combine sales figures, engineering metrics, and product analytics in a single query. "That’s a really unique feature of ours," Tang said.
How Codex Solved the Hardest Problem in Enterprise Data
According to Tang herself, identifying the correct table amidst a vast sea of 70,000 datasets represents the most formidable technical challenge her team confronts. "That’s the biggest problem with this agent," she admitted. This is precisely where Codex, OpenAI’s sophisticated AI coding agent, plays its most innovative and critical role.
Codex does triple duty in the system. First, users can reach the data agent through Codex via MCP (Model Context Protocol), giving it a unified access point. Second, the team used Codex to generate over 70% of the agent’s own code, which is how two engineers shipped the entire system in three months. The third, and most technically interesting, role is an ongoing asynchronous process: Codex examines critical data tables, reads the pipeline code that generates them, and works out each table’s upstream and downstream dependencies, ownership, granularity, key join fields, and similar tables.
"We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database," Tang explained. This "Codex Enrichment" process effectively creates a rich metadata layer for the data. When a user subsequently queries for information related to revenue, the agent consults a vector database to pinpoint tables that Codex has already mapped and categorized as relevant to that concept.
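The lookup Tang describes can be sketched as a two-step pipeline: persist Codex-written metadata per table, then rank tables against a plain-English query. The sketch below substitutes a toy bag-of-words embedding for the real embedding model and vector database, and every table name and description is invented for illustration.

```python
from dataclasses import dataclass, field
import math

@dataclass
class TableMetadata:
    """Enrichment record of the kind Codex persists per table (fields illustrative)."""
    name: str
    description: str                    # Codex-written summary of what the table holds
    owners: list = field(default_factory=list)
    upstream: list = field(default_factory=list)

def embed(text: str) -> dict:
    # Placeholder embedding: bag-of-words term counts. A real system would
    # call an embedding model and store the vectors in a vector database.
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_candidate_tables(query: str, catalog: list, k: int = 2) -> list:
    """Rank enriched tables against a plain-English query."""
    qv = embed(query)
    ranked = sorted(catalog, key=lambda t: cosine(qv, embed(t.description)), reverse=True)
    return [t.name for t in ranked[:k]]

catalog = [
    TableMetadata("fct_revenue_daily", "daily revenue by geography and customer cohort"),
    TableMetadata("dim_users", "user accounts and signup attributes"),
    TableMetadata("fct_latency", "request latency metrics by service component"),
]
print(find_candidate_tables("compare revenue across geographies and cohorts", catalog))
```

The key design point is that the expensive reasoning (reading pipeline code, summarizing tables) happens once, offline; query time is just a cheap similarity lookup over the persisted metadata.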
This "Codex Enrichment" is just one of six distinct context layers that the AI data agent utilizes to inform its analysis. These layers encompass a spectrum of information, ranging from fundamental schema metadata and expertly curated descriptions of datasets to institutional knowledge extracted from internal communication platforms like Slack, collaborative document tools like Google Docs, and knowledge management systems like Notion. Furthermore, the agent incorporates a learning memory that stores corrections and feedback from previous user interactions, continuously refining its understanding and accuracy. When pre-existing contextual information is insufficient, the agent intelligently falls back to performing live queries directly against the data warehouse, ensuring it can always access the most up-to-date information.
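One way to sketch the layered-context behavior is to collect whatever each layer can contribute and hit the warehouse only when every layer comes up empty. The article does not specify the merge logic, so the priority order, layer names, and stub implementations below are all assumptions:

```python
# Stubs standing in for the six context layers the article lists;
# each returns a context string or None when it has nothing cached.
def schema_metadata(q):   return None
def curated_descriptions(q): return None
def codex_enrichment(q):  return f"enriched tables relevant to: {q}"
def tribal_knowledge(q):  return None        # Slack / Google Docs / Notion
def learned_memory(q):    return None        # corrections from past sessions

LAYERS = [schema_metadata, curated_descriptions, codex_enrichment,
          tribal_knowledge, learned_memory]

def live_warehouse_query(q):
    # Last resort: query the warehouse directly for up-to-date answers.
    return f"live scan for: {q}"

def gather_context(question: str) -> list:
    pieces = [c for layer in LAYERS if (c := layer(question)) is not None]
    return pieces if pieces else [live_warehouse_query(question)]

print(gather_context("weekly revenue by cohort"))
```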
The team has also implemented a tiered system for historical query patterns. "All query history is everybody’s ‘select star, limit 10.’ It’s not really helpful," Tang remarked, pointing out the often superficial nature of raw query logs. Instead, canonical dashboards and executive reports – those meticulously crafted by analysts who invested significant effort in defining the correct data representations – are designated as "source of truth." All other, less authoritative data sources are consequently deprioritized, ensuring the agent focuses on the most reliable and validated information.
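The tiering could look like a simple weighted ranking in which curated "source of truth" artifacts outweigh raw query history even when their raw relevance score is lower. The tier names and weights below are invented for illustration:

```python
# Hypothetical tier weights: canonical, analyst-built artifacts dominate
# ad-hoc query logs ("select star, limit 10" noise).
TIER_WEIGHTS = {
    "executive_report": 3.0,
    "canonical_dashboard": 2.0,
    "ad_hoc_query": 0.5,
}

def rank_sources(sources: list) -> list:
    """sources: list of (name, tier, relevance) tuples; highest weighted score first."""
    return sorted(sources, key=lambda s: TIER_WEIGHTS[s[1]] * s[2], reverse=True)

hits = [
    ("select * from revenue limit 10", "ad_hoc_query", 0.9),
    ("Monthly Revenue Dashboard", "canonical_dashboard", 0.7),
]
# The dashboard wins despite a lower raw relevance score.
print(rank_sources(hits)[0][0])
```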
The Prompt That Forces the AI to Slow Down and Think
Despite the architecture and its six context layers, Tang was candid about the agent’s biggest behavioral flaw: overconfidence, a failure mode familiar to anyone who has worked extensively with large language models.
"It’s a really big problem, because what the model often does is feel overconfident," Tang stated. "It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach." The AI’s inclination to jump to conclusions without sufficient validation can lead to inaccurate or misleading insights.
The fix was prompt engineering that forces the agent to spend more time in a discovery phase before running an analysis. "We found that the more time it spends gathering possible scenarios and comparing which table to use – just spending more time in the discovery phase – the better the results," she explained. The prompt reads like guidance for a junior analyst: "Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data."
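The "slow down" instruction might be wired in as a system message ahead of the user’s question. The prompt text below is a paraphrase of Tang’s description, not OpenAI’s actual prompt:

```python
# Paraphrased discovery-phase instruction; the real prompt is not public.
DISCOVERY_PROMPT = (
    "Before running any analysis, spend time in a discovery phase: "
    "list every candidate table that could answer the question, "
    "compare their schemas and freshness, and validate your choice "
    "against at least two independent sources (docs, lineage, owners). "
    "Only then write and execute the query."
)

def build_messages(user_question: str) -> list:
    """Prepend the discovery instruction to every user question."""
    return [
        {"role": "system", "content": DISCOVERY_PROMPT},
        {"role": "user", "content": user_question},
    ]

msgs = build_messages("Why did Plus subscriber growth differ between dashboards?")
print(msgs[0]["role"], len(msgs))
```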
Through rigorous evaluation, the team also discovered a counterintuitive but highly effective principle: less context can sometimes yield better results. "It’s very easy to dump everything in and just expect it to do better," Tang observed. "From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results." This emphasizes the importance of quality and relevance over sheer quantity of information.
To build trust, the agent streams its intermediate reasoning to users in real time, exposing which tables it selected and why, with direct links to the underlying query results. Users can interrupt an analysis mid-process to redirect it, and checkpointing lets the system resume an interrupted task after a failure. Crucially, at the end of every task, the model is prompted to evaluate its own performance. "We ask the model, ‘how did you think that went? Was that good or bad?’" Tang shared. "And it’s actually fairly good at evaluating how well it’s doing."
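The end-of-task self-check can be sketched as one extra model call over the task transcript. The rubric and the stub model below are assumptions; the article only confirms the question "how did you think that went?":

```python
def self_evaluate(task_log: list, ask_model) -> str:
    """After a task completes, ask the model to grade its own run."""
    transcript = "\n".join(task_log)
    prompt = (
        "Here is the transcript of the analysis you just ran:\n"
        f"{transcript}\n"
        "How did you think that went? Was that good or bad? Give one reason."
    )
    return ask_model(prompt)

# Stub model for illustration; a real deployment would call the LLM here.
verdict = self_evaluate(
    ["picked fct_revenue_daily", "joined on cohort_id", "chart rendered"],
    ask_model=lambda p: "good: the chosen table matched the question",
)
print(verdict)
```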
Guardrails That Are Deliberately Simple – And Surprisingly Effective
When it comes to ensuring the safety and security of the AI data agent, Tang adopted a pragmatic approach that might surprise enterprises expecting highly sophisticated AI alignment techniques. "I think you just have to have even more dumb guardrails," she stated. "We have really strong access control. It’s always using your personal token, so whatever you have access to is only what you have access to."
The agent functions purely as an interface layer, meticulously inheriting the same granular permissions that govern all of OpenAI’s internal data. It never appears in public communication channels, operating exclusively within private channels or a user’s individual interface. Write access to data is strictly confined to a temporary test schema that is periodically purged and cannot be shared across users. "We don’t let it randomly write to systems either," Tang clarified, reinforcing the system’s commitment to data integrity and security.
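The two guardrails Tang describes, per-user tokens and writes confined to a scratch schema, amount to a short gatekeeping function in front of every statement. The schema name, token format, and write-detection heuristic here are illustrative:

```python
SCRATCH_SCHEMA = "tmp_agent_scratch"   # hypothetical name for the purgeable test schema

class GuardrailError(Exception):
    pass

def run_statement(sql: str, user_token: str, target_schema: str = None) -> str:
    # Guardrail 1: always act with the user's personal token, so the agent
    # can only read what that user can already read.
    if not user_token:
        raise GuardrailError("no personal token: agent never uses shared credentials")
    # Guardrail 2: writes are only allowed in the scratch schema.
    is_write = sql.strip().lower().startswith(
        ("insert", "update", "delete", "create", "drop")
    )
    if is_write and target_schema != SCRATCH_SCHEMA:
        raise GuardrailError(f"writes allowed only in {SCRATCH_SCHEMA}")
    return f"executed as {user_token[:8]}...: {sql[:40]}"

print(run_statement("select * from fct_revenue_daily", user_token="tok12345678"))
```

The point of "dumb guardrails" is that neither check depends on the model behaving well: even a fully misled agent cannot exceed the user’s own permissions or write outside the scratch schema.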
User feedback closes the loop: employees can flag incorrect results directly in the system, triggering an investigation by the team, and the model’s self-evaluation adds another layer of scrutiny. Looking ahead, Tang said the team plans to evolve toward a multi-agent architecture in which specialized agents collaborate, monitor, and assist each other. "We’re moving towards that eventually," she said, "but right now, even as it is, we’ve gotten pretty far."
Why OpenAI Won’t Sell This Tool – But Wants You to Build Your Own
Despite the clear and substantial commercial potential of its internal data agent, OpenAI has explicitly stated that it has no plans to productize this specific tool. Instead, the company’s strategy is to provide the foundational building blocks and empower enterprises to construct their own bespoke solutions. Tang underscored that all the components and APIs her team utilized in building this system are already available externally.
"We use all the same APIs that are available externally," she confirmed. "The Responses API, the Evals API. We don’t have a fine-tuned model. We just use 5.2. So you can definitely build this." This message aligns with OpenAI’s broader strategic push into the enterprise market. In early February, the company launched OpenAI Frontier, an end-to-end platform designed for enterprises to build and manage their own AI agents. To support this initiative, OpenAI has partnered with leading consulting firms including McKinsey, Boston Consulting Group, Accenture, and Capgemini, who will assist in selling and implementing the Frontier platform. Furthermore, Amazon Web Services (AWS) and OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock, which will incorporate some of the persistent context capabilities that OpenAI developed for its data agent. Apple has also recently integrated Codex directly into Xcode, demonstrating the broad applicability and adoption of OpenAI’s development tools.
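Since Tang says the agent runs on the externally available Responses API with stock GPT-5.2, a request might be assembled like this. The payload only sketches the shape of a Responses API call; the system-message content and table name are invented:

```python
import json

def build_responses_request(question: str, context: str) -> dict:
    """Assemble a Responses API-style payload: curated context plus the question."""
    return {
        "model": "gpt-5.2",   # the article says the agent uses GPT-5.2, not a fine-tune
        "input": [
            {"role": "system", "content": f"Curated table context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }

req = build_responses_request(
    "Compare revenue by geography",
    "fct_revenue_daily: daily revenue by geography and cohort",  # illustrative
)
print(json.dumps(req, indent=2)[:80])
```

In the real service this dict would be sent via the OpenAI SDK; the sketch stops at payload construction so it stays self-contained.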
Information OpenAI shared with VentureBeat shows how deeply Codex is embedded in the company: 95% of OpenAI engineers now use it, and it reviews every pull request before merge. Codex’s global weekly active users have tripled since the start of the year, passing one million, with overall usage up fivefold.
Tang described a significant evolution in how employees are utilizing Codex, extending far beyond its original purpose of coding. "Codex isn’t even a coding tool anymore. It’s much more than that," she observed. "I see non-technical teams use it to organize thoughts and create slides and to create daily summaries." She further elaborated on how one of her engineering managers uses Codex to review her notes each morning, identify the most critical tasks, pull relevant Slack messages and direct messages, and draft responses. "It’s really operating on her behalf in a lot of ways," Tang concluded, highlighting the agent’s growing role as a proactive assistant.
The Unsexy Prerequisite That Will Determine Who Wins the AI Agent Race
When asked what key takeaways other enterprises should glean from OpenAI’s experience, Tang did not point to the advanced model capabilities or the clever prompt engineering techniques. Instead, she highlighted a far more fundamental, albeit less glamorous, prerequisite. "This is not sexy, but data governance is really important for data agents to work well," she stated. "Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl."
The underlying infrastructure – storage, compute, orchestration, and business intelligence layers – is not obsolete; the agent still depends on all of it. What the agent changes is the entry point: a level of autonomy and accessibility for data intelligence that did not exist before.
Tang concluded the interview with a pointed warning for companies that hesitate to embrace this technological shift. "Companies that adopt this are going to see the benefits very rapidly," she asserted. "And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly."
Asked whether that acceleration worried her colleagues, particularly in light of recent widespread layoffs at companies like Block, Tang paused. "How much we’re able to do as a company has accelerated," she said, "but it still doesn’t match our ambitions, not even one bit."

