On a dusty stretch of prairie in Abilene, Texas, hardware engineers from OpenAI and Crusoe, Oracle's data center contractor, worked overtime for days to get multiple gas turbine units running stably alongside the most expensive AI supercomputer in history.

A gas turbine unit is installed next to the Stargate AI data center built by Oracle and OpenAI in Abilene, Texas.

According to several people familiar with the project, on-site engineers and power grid experts, the work is part of OpenAI's Stargate computing infrastructure project, and both the difficulty of execution and the capital required have far exceeded initial expectations.

The Abilene site has long been regarded as a benchmark for AI data center construction worldwide. Crusoe's customer Oracle has deployed servers for OpenAI there that draw at least hundreds of megawatts of power; the company plans to install more chips in new buildings this summer, bringing the total power load to as much as 1.2 gigawatts, enough to meet the lighting needs of the entire city of San Francisco.

But the first challenge is ensuring an uninterrupted power supply. People familiar with the matter said that a combination of cooling system failures, turbine anomalies, and new grid-fluctuation rules about to be introduced by Texas's grid regulator forced Crusoe to suspend operations in stages to limit the risks to equipment, personnel, and funds.

Beyond the operational difficulties on site, AI infrastructure builders across the board are seeing costs spiral out of control. A few weeks ago, Crusoe CEO Chase Lochmiller said in a guest lecture at Stanford University that building the “main electrified plant” for a 1-gigawatt data center costs as much as $19.2 billion, covering the main building materials, mechanical and electrical equipment, the supporting gas power station and all labor.

That figure is sharply higher than quotes for projects of the same specification two or three years ago. Amid the AI computing boom, wages for contractors' skilled workers have generally risen by 30%, and labor now accounts for nearly a quarter of total investment. "The competition for technical manpower in the industry has never been fiercer," Lochmiller said.

The cost of other supporting hardware has also skyrocketed. He told the students that the cost of a 1-gigawatt gas-fired power station has nearly tripled in the past few years, to as much as $3 billion; data from the Federal Reserve Bank of St. Louis show that prices of transformers and switchgear have risen by 80% since 2020. Procuring the chips and server equipment required for a 1-gigawatt data center costs approximately another $40 billion.
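As a back-of-the-envelope check, the figures quoted above can be added up in a few lines of Python. This is only a sketch under stated assumptions: it treats the $3 billion gas-fired station as already included in the $19.2 billion plant figure (as the lecture's description suggests), and it assumes the "nearly a quarter" labor share applies to that same plant figure. The function and variable names are illustrative.

```python
# Figures quoted in the article, in billions of US dollars for 1 gigawatt.
PLANT_COST = 19.2      # building, mechanical/electrical, gas plant, labor
GAS_PLANT_COST = 3.0   # upper figure quoted; ASSUMED to be part of PLANT_COST
IT_COST = 40.0         # chips and server equipment (approximate)
LABOR_SHARE = 0.25     # "nearly a quarter"; ASSUMED to apply to PLANT_COST


def cost_breakdown():
    """Return a rough per-gigawatt cost breakdown in billions of dollars."""
    total = PLANT_COST + IT_COST
    labor = LABOR_SHARE * PLANT_COST
    return {
        "total": round(total, 1),
        "labor": round(labor, 1),
        "plant_share_of_total": round(PLANT_COST / total, 3),
        "gas_share_of_plant": round(GAS_PLANT_COST / PLANT_COST, 3),
    }
```

Under these assumptions, the all-in cost of a 1-gigawatt site comes to roughly $59 billion, of which the chips and servers make up about two thirds and labor just under $5 billion.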

For now, the cost-sharing arrangement among Crusoe, Oracle, OpenAI and other partners has not been disclosed, and it remains unclear which party would bear legal liability for budget overruns or construction delays. A Crusoe spokesperson responded that the company's budget includes risk reserves to cover contingencies.

One thing is clear: data center construction timelines are lengthening worldwide, with longer land-use approvals, shortages of core equipment, and labor shortages all continuing to slow progress. JPMorgan economists said in a report last month that satellite images show more than 60% of data centers originally scheduled to come online before 2027 have not yet broken ground, and another 7% are behind schedule, suggesting that the pace of industry expansion may slow.

Crusoe's troubleshooting at the Abilene site is also a warning to the entire industry: there is no room for sloppiness in building gigawatt-scale hyperscale data centers. Any mistake can leave chips overheated and damaged, turbine blades and drive shafts broken, construction workers injured, including by electric shock, or grid equipment burned out entirely.

Crusoe CEO Chase Lochmiller

Power supply bottlenecks, new regulatory constraints and similar difficulties are also key reasons why AI companies such as OpenAI and Anthropic say they cannot get enough computing power from newly built data centers or iterate on new technology at the pace they expected.

Crusoe was founded eight years ago. In its early days it ran cryptocurrency mines on waste energy, and in 2022 it pivoted fully to AI infrastructure. The company's private valuation exceeded $10 billion seven months ago, and the latest reports indicate that its pre-IPO financing round is expected to value it in the range of $300 billion to $400 billion. Executives at companies that have worked with Crusoe speak highly of its management team, saying it has greatly sped up construction in the industry and flexibly resolved problems in project execution and oversight.

Crusoe said in a statement: "The power demand characteristics of AI computing loads are fundamentally different from what traditional backup power supplies in the power industry were designed for. This is a major engineering problem that the entire industry has to overcome. The projects we have delivered to our customers have set industry precedents in terms of construction speed and implementation scale, and we are very proud of this."

Because Crusoe is a pioneer in AI data centers, the hidden hazards its project has exposed amount to clearing the minefield in advance for the rest of the industry. Tesla did something similar earlier when it deployed energy storage batteries to absorb power surges at the data center of xAI (now merged into SpaceX).

Another Texas infrastructure builder commented that Crusoe is willing to run fast trial and error and iterate on its solutions to achieve maximum construction speed, at the price of high investment costs. A former OpenAI engineer familiar with the Abilene project confirmed this. Project insiders said the site's initial backup power design could not adequately withstand sudden voltage changes and power oscillations, and the team had to revise the design several times.

Because these gas turbines serve only as backup power for the data center, they do not affect the site's main connection to the Texas public power grid. Project partner Lancium is responsible for building the on-site substation. People familiar with the schedule said the substation work is on schedule or even ahead of it, ensuring that OpenAI can draw up to 1.2 gigawatts of grid power this summer.

However, sufficient grid power does not mean that OpenAI and Oracle can run the site at full capacity immediately. Engineers must finish burn-in testing of the server chips while optimizing the power supply and cooling system designs, so that the entire computing cluster is commissioned before summer. A former engineer involved in the project said that earlier this year, the chillers that keep the chip servers from overheating and failing (thermal runaway) broke down in cold weather, interrupting computing for nearly a full day.

Risks of going off grid

The power draw of AI computing loads swings sharply on a millisecond timescale. Research indicates that, if poorly managed, these swings produce frequency mismatch (harmonic distortion), damage household appliances and substation equipment, and accelerate wear on the data center's own batteries. When a data center detects an abnormality on the grid, it proactively disconnects to protect itself. On two occasions, in 2024 and 2025, dozens of facilities in Virginia's “data center corridor” dropped offline at once, almost triggering a regional blackout.

In the summer and autumn of 2024, a crypto mining farm in West Texas repeatedly caused violent power oscillations on the grid because of a firmware defect. The fault was resolved after the manufacturer rewrote the firmware.

Texas grid operators are on high alert. According to GridMonitor, a service that tracks power grid meetings, the term "power oscillation" has been mentioned 80 times this year alone in meetings of the Electric Reliability Council of Texas (ERCOT). The agency is implementing new distortion control rules that require data centers to install high-precision power buffering and voltage stabilization systems. The mainstream solution is energy storage batteries, and manufacturers are also developing alternatives such as small generator sets, capacitors, and fuel cells.
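The buffering idea can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration, not a description of any system deployed at Abilene or of ERCOT's rules: it limits how fast the grid draw may change from one time step to the next and lets a battery supply or absorb the difference. The function name, the ramp limit, and the load values are hypothetical, and battery capacity and losses are ignored.

```python
def smooth_with_battery(load_mw, max_ramp_mw):
    """Ramp-limit the grid draw and let a battery cover the rest.

    load_mw:     data center load at each time step, in megawatts
    max_ramp_mw: largest allowed change in grid draw per time step
    Returns (grid_draw, battery_power); battery_power is positive when
    the battery discharges and negative when it charges.
    """
    grid_draw = []
    battery_power = []
    current = load_mw[0]  # assume the grid starts matched to the load
    for load in load_mw:
        # Move the grid draw toward the load, but no faster than the limit.
        step = max(-max_ramp_mw, min(max_ramp_mw, load - current))
        current += step
        grid_draw.append(current)
        battery_power.append(load - current)
    return grid_draw, battery_power
```

For a hypothetical load trace of `[100, 100, 400, 400, 100]` megawatts and a ramp limit of 100 megawatts per step, the grid sees a gradual climb and descent while the battery briefly discharges up to 200 megawatts and then recharges as the load drops.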

Another pending rule requires data centers to be able to ride through grid faults rather than disconnecting outright when an abnormality occurs. The good news is that the new generation of campuses is designed with more efficient buffer batteries, and the AI hardware itself has been adapted as well. Sean James, energy system architect at NVIDIA, said: "NVIDIA continues to optimize the built-in circuitry of servers to improve the ability to buffer power pulses."

Keeping AI computing loads from destabilizing the grid has become a matter of coordinated oversight across North America. The North American Electric Reliability Corporation (NERC) issued a rare Level 3 alert on May 4, requiring grid planners to implement core corrective measures before August 3 to demonstrate that the grid can carry new very large computing loads such as AI data centers.

NERC CEO Jim Robb said: "Silicon Valley has always believed in rapid trial and error and in breaking with the old, but this logic does not apply to the power grid, because all the infrastructure that society runs on depends on a stable power supply. The way data centers and crypto mines operate must ensure the overall reliability of the power grid."

An Oracle spokesperson responded: "Ensuring the stability of the power grid is the core design principle of Oracle's hyperscale data centers. The company has worked closely with Lancium and coordinated with local power companies to ensure the safe operation of the power grid."