Yao Quan, president of Huawei Data Center Facility Domain

Huawei held a conference on the Top 10 Trends of Data Center Facility in 2024 and released the corresponding White Paper. At the conference, Yao Quan, President of Huawei Data Center Facility Domain, defined the three characteristics of future data centers: reliable, simplified, and sustainable. Yao shared the trends of technological evolution in terms of component, product, system, and architecture for the sake of developing the industry and harnessing collective wisdom.

According to Yao, in the context of growing AI foundation models, the compound annual growth rate (CAGR) of global AI computing power is expected to exceed 80% in the next five years, which drives the transition from cloud data centers to cloud + intelligent computing data centers. Based on extensive explorations and long-term practices, Huawei released the Top 10 Trends of Data Center Facility in 2024 to the global audience, sharing with the industry its insights and thoughts on the future of data centers.
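To make the scale of that forecast concrete, the short sketch below compounds an 80% CAGR over five years. The function name is illustrative, and 80% is used as the lower bound because the forecast says growth will "exceed" that rate.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` annually for `years` years."""
    return (1.0 + cagr) ** years


# 80% is the floor quoted in the article; actual growth is expected to be higher.
multiple = growth_multiple(0.80, 5)
print(f"{multiple:.1f}x")  # about 18.9x
```

In other words, growth at that rate would put global AI computing power at roughly 19 times today's level within five years.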

In recent years, data center safety accidents have risen noticeably. According to data from the Uptime Institute, the percentage of losses over US$100,000 attributed to interrupted data center services increased from 39% to 71% between 2019 and 2022, and that number is set to multiply as demand for computing power soars. Beyond all doubt, safety and reliability lie at the very core of data centers and should not be underestimated or underdeveloped.

Trend 1: High-Reliability Products and Professional Services Are the Key to Ensuring Secure and Reliable Data Center Operation

Data centers store, process, and transmit massive volumes of data, ensuring the stable operation of various industries. One of the shortcomings of data centers has been a lack of attention to safety and reliability. To achieve safe and reliable data center operations, the concept of "full-chain safety" must be implemented throughout product design and manufacturing. On top of that, the quality of the product line should be strictly controlled with a high degree of automation to reduce human intervention, guaranteeing the reliability of products. Furthermore, it is necessary to fully consider the countermeasures to be taken once product problems emerge. By providing professional deployment and O&M services, we can reduce product failure rates, minimize post-disaster impacts, and improve the end-to-end assurance mechanism. With highly reliable products and professional services, the normal operation of data centers can be better safeguarded.

Trend 2: Distributed Cooling Architecture Will Become a Better Choice for Ensuring Cooling Safety

Traditional large-scale data centers mainly adopt the centralized cooling architecture. For example, in a traditional chilled water system, seven subsystems and dozens of devices co-exist in a chiller plant. As these devices cannot run independently, a single-point failure impacts the safe operation of the entire plant and can cause the large-scale breakdown of a data center. Industry safety accidents in recent years also show that the risk of single-point failures continues to haunt the centralized cooling architecture. By contrast, the distributed cooling architecture is more flexible: subsystems are independent of each other, and a fault in a single device has no impact on the operation of other devices. With a smaller fault domain, the distributed cooling architecture avoids single-point failures of the cooling system by design, ensuring the reliable operation of data centers.

Trend 3: Predictive Maintenance Will Become a Basic Feature of Data Center Infrastructure

Data center maintenance is usually performed after an event, and the causes are revealed only once the event has occurred. However, with the advent of the intelligent computing era, the response time to data center faults will be greatly shortened. Going forward, predictive maintenance will become a basic feature of data center infrastructure; that is, maintenance after an event will give way to maintenance before an event. Thanks to the rapid development of AI technologies, the scope of predictive maintenance will be further expanded. The service life of vulnerable components such as capacitors and fans, thermal runaway of devices, and leakage of the cooling system can be predicted to prevent accidents. In this way, data centers shift from passive, reactive maintenance to proactive, predictive maintenance, improving O&M reliability to a great extent.

Trend 4: The Lifecycle Network Security Protection System Will Become a Shield of Data Center Facility

As digital and intelligent technologies keep advancing, network attacks are occurring more frequently, posing exponentially increasing network security risks. When UPSs or cooling equipment is subject to malicious attacks, data centers are impacted in terms of both security and reliability. Moving forwards, the overall security of data center infrastructure can only be ensured based on both hardware security and software security. Software security must be built upon a lifecycle network security protection system from three dimensions: supply security, in-depth defense, and O&M/operation security to secure the reliable running of data centers.

Trend 5: Prefabricated and Modular Solution Will Become an Optimal Choice for High-Quality and Fast Delivery

The rapid development of global services of Internet cloud vendors is placing significant demand on the construction of more data centers. However, traditional data centers feature slow construction and complex engineering, falling far short of current demand. Therefore, the prefabricated and modular solution with a shorter construction period and higher quality will emerge as an optimal choice. Through product-like engineering and prefabrication design, products are prefabricated and commissioned early in the factory. This ensures the onsite delivery of high-quality products, effectively shortens the delivery period, meets customers' requirements for fast service rollout, and greatly reduces waste brought by onsite construction.

Trend 6: Professional Management Platform Makes Data Center O&M More Secure and Efficient

From 1,000-rack buildings to 10,000-rack campuses, data centers are scaling up and becoming more intensive. Consequently, the complexity of overall O&M increases dramatically. Since most data center devices are "dumb" devices, a traditional all-around inspection is difficult: it requires highly skilled personnel and takes a long time to locate faults. A professional management platform can significantly improve the O&M efficiency and accuracy of data centers. The professional management platforms provided by original vendors help customers build in-depth device management capabilities, which greatly simplifies O&M through quick and timely fault location and rectification. As a result, data centers can operate more safely and reliably.

Trend 7: The Convergence of Air and Liquid Cooling Becomes the Preferred Architecture in Uncertain Service Requirements Scenarios

At present, the industry stands in the transition from general-purpose computing to intelligent computing. Scenarios supported by general-purpose computing and intelligent computing may exist in a data center. Generally, the power density of a single rack for general-purpose servers does not exceed 15 kW, where air-cooled equipment can meet the cooling requirements. However, the power density of a single rack in an intelligent computing center exceeds 30 kW. In this scenario, liquid cooling is required for heat dissipation. For service scenarios with uncertain requirements, the convergence of air and liquid cooling will become a preferred architecture, where the proportion of air cooling and liquid cooling can be adjusted to flexibly adapt to future service evolution and maximize customers' ROI.
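The rack-density thresholds above can be read as a simple selection rule. The sketch below is illustrative only: the 15 kW and 30 kW figures come from the article, while the function names and the treatment of the 15-30 kW band as a mixed air and liquid zone are assumptions made for the example.

```python
# Thresholds quoted in the article.
AIR_MAX_KW = 15.0      # general-purpose racks do not exceed this; air cooling suffices
LIQUID_MIN_KW = 30.0   # intelligent computing racks exceed this; liquid cooling required


def suggest_cooling(rack_kw: float) -> str:
    """Suggest a cooling mode for one rack from its power density in kW."""
    if rack_kw <= AIR_MAX_KW:
        return "air"
    if rack_kw > LIQUID_MIN_KW:
        return "liquid"
    # Assumption: the article does not cover this band, so treat it as mixed.
    return "air+liquid"


def liquid_share(rack_loads_kw: list[float]) -> float:
    """Fraction of total IT load that sits on racks requiring liquid cooling."""
    total = sum(rack_loads_kw)
    if total == 0:
        return 0.0
    liquid = sum(kw for kw in rack_loads_kw if suggest_cooling(kw) == "liquid")
    return liquid / total


racks = [8.0, 12.0, 40.0, 40.0]  # hypothetical rack loads in kW
print([suggest_cooling(kw) for kw in racks])  # ['air', 'air', 'liquid', 'liquid']
print(f"{liquid_share(racks):.0%}")           # 80%
```

A converged design keeps the liquid share adjustable, so the same hall can move from mostly air-cooled to mostly liquid-cooled as the proportion of high-density racks grows.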

Trend 8: Indirect Evaporative Cooling Is Still the Best Refrigeration Scheme Now and in the Future

The air cooling solution is still the undisputed champion of mainstream application scenarios. With respect to cooling sources, the indirect evaporative cooling system has obvious advantages over the chilled water system in terms of architecture, efficiency, and O&M. Therefore, the indirect evaporative cooling system remains the most cost-effective cooling solution. The distributed cooling architecture of an indirect evaporative cooling system effectively prevents single-point failures, leading to higher reliability. Because it maximizes the use of free cooling sources, only one heat exchange is required. In cold regions, compressors can stay dormant the majority of the time, achieving optimal PUE. In response to intelligent computing power demands, the indirect evaporative cooling system future-proofs the architecture, allowing it to further adapt to liquid-cooled computing scenarios.

Trend 9: To Further Reduce PUE, the Optimal Solution Is to Shift the Focus from Efficient Components to System Engineering Optimization

Carbon neutrality is a global consensus and mission. Traditional data centers focus on improving the efficiency of equipment such as UPSs and air conditioners. However, due to physical limitations, the efficiency improvement of components is about to hit a bottleneck. The time and cost invested in minor improvements are far from satisfying the requirements of the computing power era. Therefore, to reduce the PUE of data centers, the focus should shift from efficient components to system engineering optimization. We should approach the matter from the perspective of system engineering, balancing actual conditions against the current level of component technology, in order to draw up the optimal solution. For example, the UPS dual-conversion mode is transformed to S-ECO mode, and the data center PUE is changed to PFPUE (petaflops PUE), which optimizes the energy efficiency of data centers in an end-to-end manner.
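For readers less familiar with the metric, PUE is conventionally the ratio of total facility power to IT equipment power, so every non-IT load (cooling, power conversion losses, lighting) pushes it above 1.0. The sketch below is a minimal illustration under that standard definition; the function name and all load figures are hypothetical and do not come from the white paper, and PFPUE is not modeled because the article does not define its formula.

```python
def pue(it_kw: float, cooling_kw: float, power_loss_kw: float,
        other_kw: float = 0.0) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + cooling_kw + power_loss_kw + other_kw) / it_kw


# Hypothetical 1 MW IT load before and after system-level optimization.
before = pue(it_kw=1000, cooling_kw=350, power_loss_kw=80, other_kw=20)
after = pue(it_kw=1000, cooling_kw=200, power_loss_kw=40, other_kw=20)
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 1.45, after: 1.26
```

Because cooling and power losses appear together in the numerator, reducing them jointly at the system level moves PUE more than squeezing a further fraction of a percent out of any single component.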

Trend 10: AI Optimization Will Become the Optimal Choice for Intelligent Optimization of Energy Efficiency for Existing Data Centers

There are still a large number of existing data centers that require better energy-saving performance, especially data centers whose PUE is much higher than that required for China's national integrated big data centers. To meet the energy-saving requirements, these data centers await urgent modernization. Traditional energy-saving renovations involve the suspension of lines and services, which may cause service interruption, whereas manual optimization is also unsatisfying owing to its high difficulty, poor effect, and low frequency. In contrast, the AI energy efficiency optimization solution optimizes the energy efficiency of existing data centers with preset AI algorithms and big data models. In addition, as AI optimization does not rely on the expertise of relevant personnel, it features fast optimization and excellent effects, facilitating the transition from traditional cooling to intelligent cooling.

Infrastructure is to a data center what roots are to a tree. Decades of concentrated effort by practitioners in the data center industry have laid a solid foundation for the digital economy. Today, the explosion of intelligent computing power unveils a bright future for the industry. Looking ahead, Huawei will continue to create reliable, simplified, and sustainable data center facility products and solutions to help customers and partners build green and reliable infrastructure for computing by enabling each watt to drive more computing power, as part of our efforts to power the digital world.

Copyright © BusinessKorea. Prohibited from unauthorized reproduction and redistribution