A Legacy of Innovation: Ahead of the Curve on Advanced Cooling for AI & HPC
Reflections on 2023 and Looking Ahead to 2024
By Chris Sharp, Chief Technology Officer, Digital Realty
The end of the year is a great time to reflect on achievements and lessons learned, and on how we’re building on the past as we plan for the future.
Artificial intelligence (AI) and high-performance computing (HPC) have emerged as key areas of opportunity for innovation and business transformation.
The challenge for IT leaders is to support these high-density workloads with the right IT infrastructure, and the industry is increasingly discussing advanced cooling technologies such as liquid cooling.
While direct liquid cooling (DLC) is being deployed in data centres today more than ever before, would you be surprised to learn that we’ve been deploying it in our data centre designs at Digital Realty since 2015? Did you also know that liquid cooling technology isn’t always the right choice for every high-density AI or HPC workload?
In this post, I’ll cover the basics of data centre cooling needs for high-density workloads such as AI and HPC, and explain how Digital Realty’s legacy of innovation has prepared us for the accelerating demand for advanced cooling techniques of all kinds, including liquid cooling solutions.
I’ll also share case studies from our innovation journey that demonstrate how enabling innovation is about having the right strategy and the right partners, rather than a one-size-fits-all approach.
Cooling needs of high-density workloads
The density of an AI or HPC deployment determines its unique cooling needs.
The power density requirements for AI and HPC can be 5-10 times higher than those of other data centre use cases. Traditional workloads typically draw 5-8 kW per rack.
In 2024, some computing hardware is likely to enable power densities exceeding 100 kW per rack, and peak densities in the data centre could reach 150 kW per rack over the next couple of years.
Traditional workload densities can be air cooled. Broadly speaking, however, most AI and HPC workloads require specialized cooling such as direct liquid cooling (DLC), air-assisted liquid cooling (AALC), or a rear-door heat exchanger.
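As a rough illustration that uses only the estimates quoted in this section, the Python sketch below scales the 5-8 kW traditional range by the 5-10x multiplier and expresses the projected 100 kW and 150 kW per rack figures as equivalent numbers of traditional racks. The helper names are hypothetical, and the inputs are the post’s own estimates rather than measured data.

```python
# Illustrative arithmetic only; all figures are the estimates quoted in this post.
TRADITIONAL_KW_PER_RACK = (5.0, 8.0)  # typical traditional workload range
AI_HPC_MULTIPLIER = (5.0, 10.0)       # "5-10 times higher" for AI and HPC
PROJECTED_KW_PER_RACK = {"2024 hardware": 100.0, "peak, next few years": 150.0}


def implied_ai_hpc_range(base=TRADITIONAL_KW_PER_RACK, mult=AI_HPC_MULTIPLIER):
    """Scale the traditional kW/rack range by the AI/HPC multiplier (hypothetical helper)."""
    return base[0] * mult[0], base[1] * mult[1]


def traditional_rack_equivalents(kw_per_rack, base=TRADITIONAL_KW_PER_RACK):
    """Return (fewest, most) traditional racks drawing the same power as one rack at kw_per_rack."""
    # Dividing by the high end of the range gives the fewest equivalents, the low end the most.
    return kw_per_rack / base[1], kw_per_rack / base[0]


low, high = implied_ai_hpc_range()
print(f"Implied AI/HPC density: {low:.0f}-{high:.0f} kW/rack")
for label, kw in PROJECTED_KW_PER_RACK.items():
    fewest, most = traditional_rack_equivalents(kw)
    print(f"{label}: {kw:.0f} kW/rack = {fewest:.1f}-{most:.1f} traditional racks")
```

With these figures, the multiplier implies roughly 25-80 kW per rack, so the 100-150 kW projections sit beyond even the upper end of that range, which is why cooling choices at those densities need careful planning.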
Not all AI & HPC workloads require liquid cooling
The requirements for liquid cooling solutions
Requirements for liquid cooling vary by the hardware vendor, the specific hardware itself, and the workload type. Liquid cooling solutions are not appropriate for all hardware or every scenario.
Even in the age of AI, not every rack will draw 100 kW, and many racks may not need specialized advanced cooling technology at all.
For example, inferencing deployments tend to be less power-hungry than training deployments and may be cooled with traditional air-cooling techniques. Traditional machine learning workloads generally require fewer resources, while deep learning and generative AI require massive environments because of their size and complexity.
It’s important for IT leaders to understand that different AI and HPC workloads have different cooling needs and that not every data centre partner will have the specialised knowledge or infrastructure capabilities to enable the technology.
The requirements for each deployment will vary, so it’s important to work with a partner who will design a customized solution rather than rely on a one-size-fits-all approach. That’s why Digital Realty’s legacy of data centre design expertise with advanced cooling makes a difference for our customers.
Strategies for advanced cooling innovation
Digital Realty’s global data centre platform, PlatformDIGITAL®, has been chosen as the home of many groundbreaking AI and HPC workloads.
We’ve learned that a few key strategies help us not only keep pace with technology but stay a step ahead of it.
IT strategies to support AI and HPC workloads must enable:
- Agility
- Scale
- Sustainable growth
These case studies from our own innovation journey over the last decade highlight these strategies in action. They also demonstrate how our expertise and innovation strategy helps us identify the right solution for the situation rather than relying on a one-size-fits-all approach.
Advanced Cooling Innovation case studies
Enable scale: a high-capacity trading engine with liquid cooling
2015 was a transformative year for us at Digital Realty; it was also my first year with the company. We embarked on an ambitious project to build the foundation for a global financial services company that specialized in algorithmic high-frequency trading.
A significant part of this venture was a strategic shift from traditional air cooling to advanced liquid cooling down to the chip level to support HPC clusters. This engineering feat not only improved cooling efficiency but also allowed us to scale our infrastructure to keep supporting our client as their deployment grew to nearly 6 MW.
Investing in next-generation liquid cooling technology was a decision we knew would serve our customer beyond their immediate needs and establish a capability built for long-term scalability and sustainability.
Enable sustainable growth: supercomputing with adaptable design
Recently, we partnered with a European customer to develop a sophisticated supercomputing deployment supporting up to 70 kW per rack in a mixed environment. The customer needed to deploy quickly while also complying with new sustainability regulations.
Waiting 3-5 years for a new data centre to be built was not an option, and our ability to retrofit existing facilities gets customers up and running faster. Starting from an energy-efficient facility we built in 2013, we met their demanding requirements for high power density and connectivity with minimal changes to the facility. This enabled a 400% faster deployment.1
Our customer projected a 30% improvement in energy efficiency by switching to liquid cooling.1 They also benefitted from Digital Realty’s aquifer thermal energy storage (ATES) cooling system and fully renewable energy sources to achieve CO2 targets set by local sustainability regulations.
Our ability to develop retrofit designs shows our commitment to cutting-edge, agile design that enables sustainable and timely growth. Our design principles ensure that our infrastructure will meet not just present needs but also requirements decades into the future.
Enable agility: a flexible, future-proofed generative AI deployment
Today, we’re playing a key role in the advancement of generative AI (GenAI). We’re working with a customer that's integrating over 30,000 of the most advanced GPUs into one massive platform.
To enable advanced computing performance, the deployment requires that every GPU be connected in a single computing cluster. The customer needed a data centre platform provider that could help them deploy quickly and start realizing value from their GPU investment, a goal made even more challenging by their specialized design requirements.
Because our investment strategy aims to anticipate future demand, we were able to match them with a facility that was shell-ready, with designs already in place. Our agile, modular design approach let us solve their complex design challenges while keeping 99% of the original design, which meant we could start building sooner.
Our agile approach will enable them to deploy in as little as 12 months instead of the 36 months a custom build would require.1 Our customers’ requirements are changing rapidly, as are the technologies and solutions that meet them, which is why agility needs to be a core strategy for enabling innovation.
Although this deployment is the very definition of an advanced AI workload, direct liquid cooling was not the best cooling choice for it. This is a good example of why a one-size-fits-all approach to high-density workload cooling doesn’t work.
Beyond infrastructure: fostering a culture of innovation
Another key element in executing these innovation strategies is your team. For all IT leaders, it’s important to remember that our achievements aren’t just about infrastructure; they’re about the culture of innovation we’ve cultivated.
At Digital Realty, our talented teams bring a legacy of innovation and engineering for which we've received multiple awards as trailblazers in the data centre space.
Our culture of innovation at Digital Realty keeps us aligned with our customers, giving our partners confidence that they can grow with Digital Realty far into the future.
A vision for the future
My role as Chief Technology Officer at Digital Realty is to understand our customers’ technological needs and ensure that Digital Realty can support those needs, not only for today, but for tomorrow.
As we look to the future, we remain dedicated to not just participating in the technological landscape but actively shaping it. Our mission is to support our customers’ innovation by enabling agility, scale, and sustainable growth.
Sustainability is particularly important to us. We continue to expand our coverage of carbon-free and renewable power sources to keep pace with customer demand (we have more than 1 gigawatt of solar and wind energy under contract), and we have begun to use alternative-fuel secondary power solutions to further reduce the lifecycle carbon footprint of our data centres.
We’ll focus on applying the best technology at the right time to meet our customers’ needs, rather than deploying the status quo wholesale and forcing tomorrow’s customers to accept yesterday’s limitations. This approach is what has enabled Digital Realty to support the customers highlighted throughout this post and to meet many other customer needs around the globe.
Our adaptability, innovative spirit, and rich heritage are what make us a unique and enduring company in the ever-evolving world of technology.
Building a legacy of innovation doesn’t happen overnight, but at Digital Realty we’ve learned that we’re always moving in the right direction when we’re true to our values and focused on how we can best serve our customers’ needs.
Join us at Digital Realty as we continue to define the future of technology. Stay innovative, reach out to us, and let's deploy AI and HPC in a way that transforms your organization.
Learn more about AI-ready data centre infrastructure:
- Harness AI’s Potential and Navigate Disruption with Digital Realty
- Are Data Centres Obsolete in the Age of AI? Not on our Watch
- Integrating AI with Legacy Infrastructure
1 Projected outcomes for this customer as compared to their existing infrastructure prior to being deployed and connected on PlatformDIGITAL®, or compared with alternative solutions available at the time of purchase.
Frequently asked questions
What do advanced cooling technologies do?
Advanced cooling technologies in data centres aim to dissipate the heat generated by servers and other equipment more efficiently, ensuring optimal operating conditions and preventing overheating. These technologies help improve the overall performance and reliability of the data centre infrastructure.
What is the difference between active cooling and passive cooling?
Active Cooling: Requires systems like fans, air conditioners, or liquid cooling systems to actively remove heat from the environment. It consumes energy to operate but can offer more precise control over temperature and humidity levels.
Passive Cooling: Relies on natural methods such as airflow management, ventilation, and heat sinks to dissipate heat without the need for mechanical systems. It typically consumes less energy but can provide less control over temperature and humidity.
What is a liquid cooling solution?
Liquid cooling solutions circulate a liquid coolant close to heat-generating components such as CPUs and GPUs within servers, dissipating heat more efficiently than traditional air-cooling methods. The liquid absorbs the heat and carries it away, either to a heat exchanger or to an external cooling system. Liquid cooling solutions can be more effective at cooling high-density servers and can result in lower energy consumption than air cooling.