On Exascale Day We Recall How AI Broke The Barrier

Thorsten Kurth still remembers the night he discovered his team had broken the exascale barrier.

On the couch at home at 9 p.m., he was poring over the latest results from one of the first big jobs run on Summit, then the world’s top supercomputer, based at Oak Ridge National Laboratory.

The 12-person team had spent nights and weekends seeking a way that AI could track hundreds of hurricanes and atmospheric rivers buried in terabytes of historical climate data.

Only a few months earlier, their software failed to run on more than 64 of the system’s nodes.

But this time, just two days before a paper on the work was due, it exercised 4,560 of Summit’s 4,608 nodes to deliver the results. In the process, it achieved 1.13 exaflops of mixed-precision AI performance.

“That was a good feeling, a lot of hard work paid off,” recalled Kurth of the work he led while at Lawrence Berkeley Lab in 2018.

Entering the Exascale Era

Today, we celebrate the work of everyone who’s cracked a quintillion operations per second.

That’s a billion billion, or 10 to the 18th power. That’s why we mark Exascale Day on Oct. 18.

About the same time Kurth’s team was finishing its work, researchers at Oak Ridge also entered the exascale era, hitting 1.8, then 2.36 exaflops on Summit, analyzing genomics to better understand the nature of opioid addiction.

COVID-19 Ignites Exascale Work

Since then, many others have pushed the boundaries of science with GPUs.

In March 2020, the Folding@home project put out a call for donations of free cycles on home computers to run research analyzing the COVID-19 virus.

Ten days later, their virtual, distributed system surpassed 1.5 exaflops, creating a crowd-sourced exascale supercomputer fueled in part by more than 356,000 NVIDIA GPUs.

AI Supercomputing Goes Global

Today, academic and commercial labs worldwide are deploying a new generation of accelerated supercomputers capable of exascale-class AI.

The latest is Polaris, a system Hewlett Packard Enterprise (HPE) is building at Argonne National Lab that is capable of up to 1.4 AI exaflops. Researchers will use it to advance cancer treatments, explore clean energy and push the limits of physics, work that will be accelerated by 2,240 NVIDIA A100 Tensor Core GPUs.

Another powerful system stands on the campus of the University of California at Berkeley. Perlmutter uses 6,159 A100 GPUs to deliver nearly 4 exaflops of AI performance for more than 7,000 researchers working on projects that include drawing the largest 3D map of the visible universe to date.

Polaris and Perlmutter also use NVIDIA’s software tools to help researchers prototype exascale applications.

Europe Erects Exascale AI Infrastructure

Atos will build an even larger AI supercomputer for Italy’s CINECA research center. Leonardo will pack 14,000 A100 GPUs on an NVIDIA Quantum 200Gb/s InfiniBand network to hit up to 10 exaflops of AI performance.

It’s one of eight systems in a regional network that backers call “an engine to power Europe’s data economy.”

One of Europe’s largest AI-capable supercomputers is slated to come online in Switzerland in 2024. Alps will be built by HPE at the Swiss National Computing Center using NVIDIA GPUs and Grace, our first data center CPU. It’s expected to scale up to 20 AI exaflops.

An Industrial HPC Revolution Begins

The move to high-performance AI extends beyond academic labs.

Advances in deep learning combined with the simulation technology of accelerated computing have put us at the beginnings of an industrial HPC revolution, said NVIDIA founder and CEO Jensen Huang in a keynote earlier this year.

Selene uses a modular architecture based on the NVIDIA DGX SuperPOD.

NVIDIA was an early participant in this trend.

In the first days of the pandemic, we commissioned Selene, currently ranked as the world’s fastest industrial supercomputer. It helps train autonomous vehicles, refine conversational AI techniques and more.

In June, Tesla Inc. unveiled its own industrial HPC system to train deep neural networks for its electric vehicles. It packs 5,760 NVIDIA GPUs to deliver up to 1.8 exaflops.

Beyond the Numbers

Three years after winning a Gordon Bell award for breaking the exascale barrier, Kurth, now a senior software engineer at NVIDIA, sees the real fruit of his team’s labors.

Improved versions of the AI model they pioneered are now available online for any climate scientist to use. They handle in an hour what used to take weeks. Governments can use them to plan budgets for disaster response.

In the end, Exascale Day is all about the people, because to succeed at this level, “you have to have an excellent team with experts who understand every aspect of what you are trying to do,” Kurth said.