The Exascale Horizon: Redefining the Boundaries of Computational Science

Abstract

The ascendancy of exascale computing represents a pivotal juncture in the trajectory of human technological capability. Defined by the capacity to execute one quintillion (10^18) floating-point operations per second (FLOPS), exascale systems have shattered the performance ceilings established during the petascale era, offering a thousand-fold increase in computational throughput. This deep-dive research report provides an exhaustive analysis of the exascale landscape, tracing its lineage from the breakdown of Dennard scaling to the architectural renaissance of heterogeneous computing. We dissect the engineering marvels of flagship systems such as Frontier, Aurora, and El Capitan, elucidating the critical role of unified memory architectures, high-radix interconnects, and novel storage paradigms like Distributed Asynchronous Object Storage (DAOS). Furthermore, the report explores the transformative impact of these systems across scientific domains—from resolving cloud physics in climate models to simulating nuclear aging for stockpile stewardship. Finally, we interrogate the formidable challenges of energy consumption and reliability that loom over the future, evaluating the potential of zettascale roadmaps and post-silicon technologies, including neuromorphic and optical computing, to sustain the momentum of discovery in a post-Moore’s Law world.

1. Introduction: The Quintillion Calculation Threshold

1.1 The Exascale Milestone

In the annals of high-performance computing (HPC), few milestones have carried as much weight as the arrival of exascale. The term "exascale" denotes a computing system capable of performing at least 10^18 calculations per second. To grasp the magnitude of this figure, one must look beyond standard human intuition. An exaFLOP is a billion billion operations. It is a number so vast that if every single one of the approximately 8 billion human beings on Earth were to perform one calculation every second—working nonstop, without sleep or breaks—it would take the entire human race over four years to accomplish what an exascale supercomputer can complete in a single second.1
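
For readers who want to verify the comparison, the back-of-the-envelope arithmetic (taking one year as roughly 3.15 × 10^7 seconds) works out as follows:

```latex
\frac{10^{18}\ \text{operations}}{8 \times 10^{9}\ \text{operations/second}}
  = 1.25 \times 10^{8}\ \text{seconds}
  \approx \frac{1.25 \times 10^{8}}{3.15 \times 10^{7}\ \text{s/year}}
  \approx 4\ \text{years}
```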

This threshold is not merely an arbitrary metric of speed; it represents a functional phase transition in scientific modeling. For decades, simulation has been the "third pillar" of science, standing alongside theory and experimentation. However, prior generations of supercomputers, despite their power, were often forced to rely on approximations. They simulated the climate by averaging cloud behaviors over large grids, or they modeled materials by simplifying atomic interactions. Exascale computing provides the computational density required to strip away these approximations. It allows for "first-principles" simulation: modeling the physics of cloud formation directly, simulating the quantum mechanical forces between individual atoms in a battery electrolyte, or resolving the turbulence of plasma in a fusion reactor with high-fidelity 3D granularity.2

1.2 From Petascale to Exascale: A Historical Context

The journey to this point has been defined by a relentless pursuit of performance against the backdrop of increasingly stubborn physical laws. In 2008, the IBM Roadrunner system at Los Alamos National Laboratory became the first supercomputer to break the petascale barrier (10^15 FLOPS).1 This achievement heralded a golden age of simulation, enabling high-resolution fluid dynamics and the first large-scale genomic analyses. Yet, as scientists pushed the boundaries of inquiry, the petascale ceiling became a hindrance.

The transition from petascale to exascale took approximately 14 years, a duration significantly longer than the transition from terascale to petascale. This deceleration was not due to a lack of ambition but rather the collision with the end of Dennard Scaling—the observation that as transistors got smaller, their power density remained constant. By the mid-2000s, this scaling law broke down; making transistors smaller no longer automatically made them more power-efficient at higher frequencies. Consequently, simply cranking up the clock speed of processors was no longer a viable path to performance. A 2010-era projection estimated that achieving exascale using existing architectures would require over a gigawatt of power—roughly the output of a dedicated nuclear power plant.4
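
The root cause can be stated in one relation. Dynamic power in CMOS logic scales roughly as shown below; under Dennard scaling, the switched capacitance C and supply voltage V shrank with each process generation, so the frequency f could rise at constant power density. Once V could no longer be lowered (leakage currents dominate at low voltage), any further increase in f translated directly into more heat, which is what forced the pivot from frequency scaling to parallelism.

```latex
P_{\text{dynamic}} \;\approx\; \alpha \, C \, V^{2} \, f
\qquad
\text{($\alpha$: activity factor, $C$: switched capacitance, $V$: supply voltage, $f$: clock frequency)}
```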

This energy crisis necessitated a complete architectural revolution. The industry had to pivot from "frequency scaling" to "massive parallelism." The result was a shift toward heterogeneous architectures, where energy-efficient accelerators (GPUs) handle the bulk of the computation, and CPUs act as coordinators. This paradigm shift defined the engineering of the exascale era.5

1.3 The Geopolitical Arena

Supercomputing is rarely discussed without acknowledging its role as a strategic national asset. The capability to model complex systems—be it the aerodynamics of a hypersonic missile, the efficacy of a new bio-weapon defense, or the stability of a financial market—confers a distinct competitive advantage. Consequently, the "race to exascale" became a focal point of technological rivalry between the United States, China, and the European Union.2

While the United States officially claimed the title of "first exascale" with the Frontier system in 2022, the landscape is nuanced. Reports indicate that China likely fielded exascale-class systems, such as the Sunway OceanLight and Tianhe-3, prior to Frontier's debut. However, these systems were not submitted to the Top500 ranking list, a decision widely interpreted as a strategic move to avoid attracting further trade sanctions or technology export bans.7 Meanwhile, Europe has asserted its technological sovereignty with the JUPITER system, marking its entry into the exascale club and reducing its reliance on foreign computational resources.9

2. The Architectural Renaissance: Engineering the Impossible

Achieving exascale performance required solving a trifecta of hard engineering problems: power efficiency, memory bandwidth, and interconnect latency. The solutions to these problems have reshaped the fundamental architecture of the computer.

2.1 The Heterogeneous Compute Node

The heart of the exascale machine is the heterogeneous compute node. In the past, supercomputers were often homogeneous, composed of thousands of identical CPUs. Today, the CPU has retreated to a managerial role, while the heavy lifting is performed by Graphics Processing Units (GPUs) or specialized accelerators.

This shift is driven by the physics of computation. CPUs are designed for latency-sensitive serial tasks—running an operating system, handling user input, or branching logic. They dedicate significant silicon area to cache and control units. GPUs, conversely, are designed for throughput. They consist of thousands of small, simple cores capable of performing the same operation on massive blocks of data simultaneously (Single Instruction, Multiple Data, or SIMD). For scientific simulations, which often involve applying the same physical equations to millions of grid points, GPUs are vastly more energy-efficient.11
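
To make the contrast concrete, the sketch below applies one identical update rule to millions of grid points, which is the access pattern that maps naturally onto GPU-style hardware. It is only a minimal illustration using an OpenMP directive (one of the models listed later in Table 5); the same loop body could be offloaded to an accelerator with OpenMP's target constructs.

```cpp
// Minimal sketch: the same arithmetic applied uniformly to millions of grid
// points, the data-parallel pattern that suits GPU/SIMD hardware.
#include <vector>
#include <cstdio>

int main() {
    const std::size_t n = 10'000'000;                      // grid points
    std::vector<double> t_old(n, 300.0), t_new(n, 0.0);

    // Identical update rule for every interior point (a 1D diffusion step).
    // On a CPU this threads/vectorizes; the body is unchanged if offloaded to a GPU.
    #pragma omp parallel for simd
    for (std::size_t i = 1; i < n - 1; ++i) {
        t_new[i] = t_old[i] + 0.1 * (t_old[i - 1] - 2.0 * t_old[i] + t_old[i + 1]);
    }

    std::printf("t_new[n/2] = %f\n", t_new[n / 2]);
    return 0;
}
```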

  • Frontier utilizes a node architecture comprising one AMD EPYC CPU and four AMD Instinct MI250X GPUs. The interconnectivity within the node is handled by Infinity Fabric, allowing the GPUs to operate as peers to the CPU rather than subservient peripherals.12

  • Aurora employs a "max-series" approach, pairing Intel Xeon Max CPUs with Intel Data Center GPU Max (Ponte Vecchio) accelerators. This system is designed with an exceptionally high density of compute capability per node.11

  • El Capitan represents the next evolutionary step: the Accelerated Processing Unit (APU). The AMD Instinct MI300A integrates both CPU cores and GPU cores onto a single package, sharing physical resources. This integration eliminates the bottleneck of communicating over an external bus, creating a tightly coupled engine for simulation.13

2.2 The Memory Wall and Unified Architectures

For years, the "Memory Wall"—the growing disparity between the speed at which processors can calculate and the speed at which memory can feed them data—threatened to stall progress. In traditional discrete GPU architectures, data resides in two distinct places: the system memory (RAM) attached to the CPU, and the video memory (VRAM) attached to the GPU. Moving data between these two pools across a PCIe bus is slow and energy-intensive.

Exascale systems have dismantled this wall through Unified Memory Architectures. In the case of El Capitan’s MI300A, the CPU and GPU cores share a unified pool of High Bandwidth Memory (HBM3). The memory is built from vertically stacked dies co-packaged alongside the compute chiplets using advanced 2.5D/3D packaging. This physical proximity allows for bandwidths measured in terabytes per second. More importantly, it allows for "zero-copy" data sharing. A simulation running on the CPU can populate a data structure, and the GPU can immediately begin processing it without the data ever needing to be copied to a separate memory space. This dramatically simplifies programming and improves performance for memory-bound applications.13
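
Below is a minimal HIP sketch of the zero-copy idea, assuming a managed-memory allocation path (hipMallocManaged): on a discrete GPU the runtime migrates pages behind the scenes, while on an APU such as the MI300A the CPU and GPU touch the same physical HBM, so no copy takes place. This illustrates the programming model, not the actual code running on El Capitan.

```cpp
// Zero-copy sketch with HIP managed memory: the CPU initializes a buffer and a
// GPU kernel consumes it without an explicit hipMemcpy.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(double* data, size_t n, double factor) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;
    double* data = nullptr;

    // One allocation visible to both CPU and GPU (physically shared HBM on an APU).
    hipMallocManaged(reinterpret_cast<void**>(&data), n * sizeof(double));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0;          // CPU populates the data

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0);         // GPU reads/writes it in place
    hipDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);                // CPU sees the GPU's result
    hipFree(data);
    return 0;
}
```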

2.3 The Nervous System: Interconnect Topology

A supercomputer is more than a collection of fast nodes; it is a single cohesive instrument. The network interconnect is the nervous system that binds these nodes together, allowing them to communicate with microsecond latency.

The technology of choice for the US exascale systems is HPE Slingshot. Unlike traditional Ethernet, which was designed for robust but best-effort delivery, Slingshot is an HPC-optimized Ethernet. It introduces distinct features to handle the "bursty" nature of scientific traffic.

  • Congestion Control: Slingshot switches perform real-time monitoring of network traffic. If a pathway is congested, the hardware can dynamically reroute packets through alternative paths. This "adaptive routing" prevents the "tail latency" problem, where a calculation running on 10,000 nodes is stalled waiting for the single slowest packet to arrive.16

  • Quality of Service: It allows different classes of traffic to coexist. This is crucial as supercomputers increasingly run complex workflows where simulation traffic, file system I/O, and in-situ visualization data all compete for bandwidth.

2.4 The Storage Revolution: Beyond Files

The data generated by exascale simulations is staggering. A single run can produce petabytes of output. Traditional parallel file systems, such as Lustre or GPFS, which have served the community for decades, are struggling to scale to these levels. The issue is often metadata—the information about the files (names, permissions, locations). When an application tries to create millions of files simultaneously, the metadata server becomes a bottleneck.

Aurora has pioneered a new approach called DAOS (Distributed Asynchronous Object Storage). DAOS abandons the traditional POSIX file system paradigm. Instead of managing data as files and blocks, it manages data as objects in a vast key-value store. It is designed specifically for non-volatile memory (NVM) like Intel Optane or NVMe SSDs. By removing the overhead of legacy file system locking mechanisms, DAOS allows for massive concurrency. It enables the system to handle the random I/O patterns typical of AI training and big data analytics without choking, representing a fundamental shift in how supercomputers "remember" information.18
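
The shift in access pattern can be sketched without reproducing the actual DAOS client API (which is not shown here). Conceptually, each writer addresses a record by key inside a shared container rather than creating a path in a directory tree, so there is no per-file metadata to create or lock. The stand-in below uses an in-memory map purely to illustrate that key-value pattern.

```cpp
// Conceptual stand-in for object/key-value storage (NOT the DAOS API): writers
// address records by key inside one container, so no directory-tree metadata
// has to be created or locked per record.
#include <string>
#include <unordered_map>
#include <vector>
#include <cstdio>

int main() {
    // Container of objects: key -> opaque value, mimicking a key-value store.
    std::unordered_map<std::string, std::vector<double>> container;

    const int ranks = 1000, steps = 10;
    for (int r = 0; r < ranks; ++r) {            // with a POSIX file-per-process scheme,
        for (int s = 0; s < steps; ++s) {        // this would be 10,000 tiny files plus metadata ops
            std::string key = "rank" + std::to_string(r) + "/step" + std::to_string(s);
            container[key] = std::vector<double>(8, r + 0.1 * s);   // one keyed put
        }
    }
    std::printf("records stored: %zu\n", container.size());
    return 0;
}
```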

3. Flagship Systems: A Tour of the Titans

The theoretical promise of exascale has been realized in hardware through a series of flagship systems, each reflecting the technological philosophy of its creators.

3.1 Frontier (Oak Ridge National Laboratory, USA)

Frontier stands as the pioneer, the first system to officially cross the exascale threshold on the Top500 list in May 2022. It is a machine of brute force and elegance.

  • Specifications: Frontier is powered by the HPE Cray EX architecture, housing 9,408 compute nodes in 74 liquid-cooled cabinets. It achieves a peak performance of 1.685 exaFLOPS and a sustained Linpack performance of 1.353 exaFLOPS.11

  • Cooling Engineering: To manage the immense heat generated by 21 megawatts of power, Frontier employs an aggressive direct liquid cooling system. Over 6,000 gallons of water circulate through the system every minute. This warm-water cooling allows the heat to be captured and potentially reused, and it eliminates the need for energy-intensive chillers, contributing to its high efficiency rating on the Green500 list (52.23 GFLOPS/watt).17

3.2 Aurora (Argonne National Laboratory, USA)

Aurora’s path to exascale was winding, involving architectural shifts and delays, but the result is a machine of extraordinary capability, particularly in AI.

  • Specifications: Aurora features a distinct configuration with a heavy reliance on GPU density. It contains over 60,000 Intel Data Center GPU Max units. This massive GPU count is specifically targeted at accelerating deep learning training and inference at scale.11

  • AI Focus: Aurora is positioned as a convergence machine, designed to seamlessly blend traditional simulation with AI. Its DAOS storage system is a critical enabler here, streaming data fast enough to keep the GPUs saturated during the training of massive foundation models.22

3.3 El Capitan (Lawrence Livermore National Laboratory, USA)

As of late 2024 and entering 2025, El Capitan reigns as the most powerful supercomputer on the planet.

  • Mission Critical: Unlike Frontier and Aurora, which are primarily open science machines, El Capitan’s primary tenant is the National Nuclear Security Administration (NNSA). Its mission is classified: to ensure the safety and reliability of the US nuclear stockpile.24

  • The APU Advantage: Its AMD MI300A architecture is a game-changer for the specific physics codes used in nuclear modeling. These codes are often "memory-bound," meaning the processor spends more time waiting for data than calculating. The unified memory of the APU minimizes this wait time, delivering performance gains that exceed what raw FLOP counts would suggest. It achieved 1.809 exaFLOPS on the HPL benchmark.9

3.4 JUPITER (Forschungszentrum Jülich, Germany)

JUPITER marks Europe's declaration of independence in high-performance computing.

  • Modular Architecture: JUPITER is unique in its "Modular Supercomputing Architecture" (MSA). It consists of a "Cluster Module" for general-purpose tasks and a "Booster Module" for highly parallel exascale workloads. The Booster is powered by NVIDIA GH200 Grace Hopper Superchips, which, like the AMD MI300A, integrate an ARM-based CPU with a powerful GPU.9

  • Sustainability: JUPITER places a heavy emphasis on green computing, utilizing dynamic software to schedule jobs on the most energy-efficient module for the specific task. It is the first exascale system in Europe and currently ranks as the 4th fastest system globally.9

3.5 The Hidden Dragons: Tianhe-3 and Sunway OceanLight

While the Top500 list is dominated by US and European machines, the shadow of China's capabilities looms large.

  • Sunway OceanLight: A successor to the TaihuLight, OceanLight is located in Qingdao. It reportedly utilizes the SW26010-Pro processor, a homegrown many-core architecture. It has demonstrated sustained exascale performance on complex scientific applications, such as the "quantum many-body problem," winning the Gordon Bell Prize despite not appearing on the Top500 list.8

  • Tianhe-3: Based on the Phytium processor (ARM architecture) and Matrix accelerators, Tianhe-3 represents another indigenous path. These systems demonstrate China’s ability to achieve exascale despite restricted access to Western semiconductor technology, relying on massive parallelism and novel architectural designs to overcome hardware limitations.7

4. Scientific Applications: The Why of Exascale

The billions of dollars invested in exascale systems are not merely for bragging rights. These machines are scientific instruments, akin to the James Webb Space Telescope or the Large Hadron Collider, designed to see into the complexity of the natural world.

4.1 Climate Science: Resolving the Cloud Problem

Climate modeling has long been plagued by a scale problem. Climate is global, but weather is local. To simulate the whole Earth for a century, models historically used grid cells that were 50 to 100 kilometers wide. At this resolution, clouds—which are crucial for reflecting sunlight and trapping heat—cannot be seen. They are "sub-grid" phenomena. Modelers had to approximate them using statistical rules, or "parameterizations."

With the Energy Exascale Earth System Model (E3SM), scientists are deploying a technique called "Superparameterization" or Cloud-Resolving Models (CRMs).

  • Mechanism: Inside every single grid cell of the global climate model, the exascale computer runs a smaller, high-resolution 2D or 3D model specifically to simulate the physics of cloud formation. It is a simulation within a simulation (see the schematic sketch after this list).

  • Impact: This allows for the explicit resolution of convection currents and cloud lifecycles. The result is a dramatic improvement in the prediction of the water cycle. We can move from predicting "global average temperature rise" to predicting "changes in afternoon thunderstorm patterns in the US Midwest," which has direct implications for agricultural planning and flood infrastructure.29
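
The structure of that nested approach is sketched below. The coarse grid, the embedded column model, and the placeholder physics are all illustrative stand-ins; the point is simply that every outer cell advances its own inner cloud-resolving model each timestep and feeds the result back.

```cpp
// Schematic of superparameterization: every coarse climate-grid cell advances
// its own embedded high-resolution cloud-resolving model (CRM) each timestep.
#include <vector>
#include <cstdio>

struct Cell {
    double temperature = 288.0;               // coarse-grid state (illustrative)
    std::vector<double> crm_column;           // embedded fine-grid CRM state
    Cell() : crm_column(64, 0.0) {}
};

// Hypothetical embedded CRM step: resolves convection inside one coarse cell
// and returns a heating tendency fed back to the global model.
double crm_step(Cell& c) {
    double tendency = 0.0;
    for (double& w : c.crm_column) {          // fine-scale update (placeholder physics)
        w = 0.9 * w + 0.01;
        tendency += w;
    }
    return tendency / c.crm_column.size();
}

int main() {
    std::vector<Cell> globe(100 * 50);        // toy global grid (100 x 50 coarse cells)
    for (int step = 0; step < 10; ++step) {
        for (Cell& c : globe) {               // in E3SM-MMF this inner work runs on GPUs
            c.temperature += crm_step(c);     // inner model feeds back to the outer model
        }
    }
    std::printf("cell[0] temperature: %.3f\n", globe[0].temperature);
    return 0;
}
```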

4.2 Precision Medicine: The CANDLE Project

Cancer is a disease of immense complexity, driven by thousands of potential genetic mutations and protein interactions.

  • CANDLE (CANcer Distributed Learning Environment): This project leverages exascale to transform cancer research. It utilizes deep learning to analyze millions of clinical patient records to identify patterns in treatment response.

  • Molecular Dynamics: Simultaneously, it performs massive molecular dynamics simulations of the RAS protein, a protein family mutated in 30% of all cancers. These simulations model the protein's movement at the femtosecond scale, searching for transient "pockets" on the protein's surface where a drug molecule could potentially bind. This virtual drug screening allows researchers to test millions of compounds in silico before synthesizing a single one in the lab, accelerating the timeline of drug discovery.31

4.3 High-Energy Physics: WarpX and Plasma Accelerators

Particle accelerators are vital tools for science, used for everything from studying the fundamental nature of matter to treating cancer with proton therapy. However, they are enormous; the Stanford Linear Accelerator is 3 kilometers long.

  • Plasma Wakefield Acceleration: This novel technology proposes using a laser or particle beam to create a "wake" in a plasma, on which electrons can surf and gain energy. This could shrink accelerators from kilometers to meters.

  • The Simulation Role: Designing these accelerators is notoriously difficult because the physics involves the chaotic interaction of lasers and plasma at microscopic scales. The WarpX code runs on Frontier to simulate these interactions in full 3D. It recently validated a new concept for a combined injector and accelerator, a critical step toward making tabletop particle accelerators a reality.33

4.4 National Security: Stockpile Stewardship

For the United States, the primary driver for exascale funding is the National Nuclear Security Administration (NNSA). Since the cessation of underground nuclear testing in 1992, the US has relied on "science-based stockpile stewardship" to certify the safety and reliability of its nuclear arsenal.

  • The Challenge: Nuclear weapons are complex devices that age. Plastics degrade, metals corrode, and plutonium's crystal structure changes due to self-irradiation.

  • El Capitan's Role: El Capitan allows physicists to simulate the full weapons cycle in 3D. It models how these aging defects—microscopic cracks or chemical changes—would affect the weapon's performance during detonation. These high-fidelity simulations provide the confidence that the deterrent remains effective (and safe from accidental detonation) without the need for physical testing.24

5. Software and Algorithms: Taming the Beast

Building an exascale machine is an engineering triumph; programming one is a herculean intellectual challenge. The hardware complexity—thousands of nodes, heterogeneous processors, deep memory hierarchies—must be abstracted away so that scientists can focus on physics rather than chip architecture.

5.1 The Portability Nightmare

The diversity of exascale hardware presents a significant hurdle. Frontier uses AMD GPUs; Aurora uses Intel GPUs; JUPITER uses NVIDIA GPUs. Each vendor has its own preferred programming language (HIP, SYCL, CUDA). A scientist writing a fluid dynamics code does not want to rewrite their software three times.

To solve this, the community has coalesced around Performance Portability Layers like Kokkos and RAJA.

  • Mechanism: These are C++ libraries that allow developers to express their algorithms (e.g., a loop over a grid of particles) using high-level abstractions. At compile time, the library maps these abstract loops to the specific instructions required by the backend hardware (a minimal Kokkos example follows this list).

  • Impact: This allows a single source code to run efficiently on Frontier, Aurora, and JUPITER. It "future-proofs" scientific codes against the changing landscape of hardware vendors.36
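
Here is a minimal Kokkos example of the single-source approach; depending on how Kokkos was configured at build time, the identical loop compiles down to a CUDA, HIP, SYCL, or OpenMP backend, which is exactly the portability property described above.

```cpp
// Single-source Kokkos loops: the backend (CUDA, HIP, SYCL, OpenMP, serial) is
// chosen at build time; the scientific code does not change.
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1'000'000;
        Kokkos::View<double*> x("x", n);      // memory lives where the backend needs it

        // Abstract parallel loop; Kokkos maps it to GPU threads or CPU threads.
        Kokkos::parallel_for("init_and_scale", n, KOKKOS_LAMBDA(const int i) {
            x(i) = 2.0 * i;
        });

        double sum = 0.0;
        Kokkos::parallel_reduce("sum", n, KOKKOS_LAMBDA(const int i, double& local) {
            local += x(i);
        }, sum);
        Kokkos::fence();
        std::printf("sum = %.1f\n", sum);
    }
    Kokkos::finalize();
    return 0;
}
```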

5.2 The Convergence of AI and HPC

Exascale marks the era where Artificial Intelligence and High-Performance Computing merge.

  • AI Surrogates: In traditional simulation, solving differential equations for every timestep is computationally expensive. Researchers are now training AI models (surrogates) on data from these high-fidelity simulations. Once trained, the AI can predict the outcome of a similar physical scenario thousands of times faster than the traditional solver.

  • Mixed Precision: AI workloads do not always require the rigorous 64-bit double-precision (FP64) math used in physics. They often thrive on 16-bit or even 8-bit math. Exascale hardware like the MI250X and GH200 includes specialized matrix-math units (AMD Matrix Cores, NVIDIA Tensor Cores) designed to accelerate these lower-precision calculations, dramatically boosting throughput for AI tasks without sacrificing the precision required for the physics parts of the workflow.11
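
The trade-off is easy to demonstrate. In the illustrative snippet below, a long accumulation of tiny increments drifts visibly in 32-bit arithmetic (and would drift far more in 16-bit) while staying accurate in FP64, which is why exascale workflows assign precision per task rather than lowering it globally.

```cpp
// Why precision is mixed per task: low precision is fast and adequate for many
// AI operations, but long physics-style accumulations need FP64.
#include <cstdio>

int main() {
    const long n = 10'000'000;
    float  sum32 = 0.0f;     // analogous to low-precision accumulation
    double sum64 = 0.0;      // FP64, the HPC default for physics

    for (long i = 0; i < n; ++i) {
        sum32 += 1.0e-4f;    // tiny increments, as in time-stepped integrals
        sum64 += 1.0e-4;
    }
    // Exact answer is 1000.0; FP32 loses digits once the running sum dwarfs each term.
    std::printf("FP32: %.6f   FP64: %.6f\n", sum32, sum64);
    return 0;
}
```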

5.3 Resilience at Scale

With systems containing tens of millions of components, failure is not a possibility; it is a statistical certainty. If a node fails every few hours, a simulation that takes weeks to run would never finish.

  • Checkpoint/Restart: The traditional defense is to periodically save the entire state of the machine to disk. However, at exascale, the memory is so large that writing it all to disk takes too long (a minimal sketch of the pattern follows this list).

  • Burst Buffers: Systems now use layers of fast non-volatile memory (NVMe) located near the compute nodes to absorb these checkpoints quickly, allowing the simulation to resume computations almost immediately.

  • Algorithmic Fault Tolerance: Research is also advancing in algorithms that can survive data corruption. These methods use mathematical properties to detect and correct errors on the fly, allowing the simulation to "compute through" a hardware failure without stopping.39
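
The checkpoint/restart pattern itself is simple, as the sketch below shows (the file name, state layout, and solver are hypothetical): the job periodically persists its state so a restart resumes from the last saved step rather than from zero. In production the write would target a node-local burst buffer rather than the parallel file system.

```cpp
// Minimal checkpoint/restart pattern: periodically persist solver state so a
// failed job resumes from the last checkpoint instead of step 0.
#include <cstdio>
#include <vector>

static void write_checkpoint(const char* path, int step, const std::vector<double>& state) {
    if (FILE* f = std::fopen(path, "wb")) {                // in practice: node-local NVMe burst buffer
        std::fwrite(&step, sizeof(step), 1, f);
        std::fwrite(state.data(), sizeof(double), state.size(), f);
        std::fclose(f);
    }
}

static bool read_checkpoint(const char* path, int& step, std::vector<double>& state) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;                                   // no checkpoint: start fresh
    bool ok = std::fread(&step, sizeof(step), 1, f) == 1 &&
              std::fread(state.data(), sizeof(double), state.size(), f) == state.size();
    std::fclose(f);
    return ok;
}

int main() {
    std::vector<double> state(1 << 20, 0.0);
    int start = 0;
    read_checkpoint("ckpt.bin", start, state);              // resume if a checkpoint exists

    for (int step = start; step < 1000; ++step) {
        for (double& v : state) v += 1.0;                   // stand-in for the real solver
        if (step % 100 == 0) write_checkpoint("ckpt.bin", step + 1, state);
    }
    std::printf("final state[0] = %.0f\n", state[0]);
    return 0;
}
```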

6. Challenges and Limitations: The Walls We Face

Despite the triumphs, the exascale era is defined by the limits it pushes against.

6.1 The Energy Wall

The most formidable challenge is power. Frontier, among the most energy-efficient machines of its generation, consumes approximately 21 megawatts to run its Linpack benchmark. This is roughly the electricity consumption of a town of 20,000 people.

  • Thermodynamic Limits: We are approaching the limits of how much we can reduce the voltage of silicon transistors without causing errors (leakage). Scaling current architectures to Zettascale (10^21) would require hundreds of megawatts, which is economically and environmentally unsustainable.

  • The Cost of Movement: The primary energy consumer is not the calculation (flipping a bit) but the data movement (sending a bit across a wire). This reality is driving the architectural shifts toward 3D stacking and optical interconnects to minimize the distance data must travel.6

6.2 The Data Deluge

Exascale machines generate data at a rate that overwhelms human ability to analyze it. A single simulation might generate hundreds of petabytes of data. Storing this data forever is impossible; moving it to long-term tape storage is too slow.

  • In-Situ Visualization: The solution is to analyze the data where it lives. Visualization and analysis routines run concurrently with the simulation on the same compute nodes. They extract the key insights—a graph, a rendered image, a statistical summary—and discard the raw data. This requires a fundamental rethinking of scientific workflows, moving from "Simulate then Analyze" to "Analyze while Simulating".41
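
The inversion of the workflow is visible in a few lines: the sketch below (with a purely illustrative "analysis") reduces each timestep's field to a small summary inside the simulation loop and keeps only that summary, instead of dumping the raw field to disk.

```cpp
// In-situ analysis: reduce each timestep's field to a small summary on the fly
// and keep only the summary, rather than writing raw data every step.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> field(1 << 22, 1.0);                 // stand-in for a simulation field

    for (int step = 0; step < 50; ++step) {
        for (std::size_t i = 0; i < field.size(); ++i)       // stand-in for the solver
            field[i] += 1e-3 * static_cast<double>(i % 7);

        // Analyze while simulating: extract the insight, discard the raw data.
        double max_v = *std::max_element(field.begin(), field.end());
        double mean = 0.0;
        for (double v : field) mean += v;
        mean /= field.size();

        std::printf("step %2d  max=%.4f  mean=%.4f\n", step, max_v, mean);
    }
    return 0;
}
```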

7. The Future: Zettascale and Post-Silicon Frontiers

As the exascale era matures, the eyes of the HPC community are already fixed on the next horizon: Zettascale.

7.1 The Road to 2035

Projections suggest that zettascale systems—machines a thousand times faster than Frontier—could emerge around 2035. However, getting there will require more than just "more of the same."

  • Mixed Precision as the Key: It is unlikely we will see a zettascale machine that performs 10^21 64-bit operations per second. Instead, "Zettascale AI" machines may achieve this throughput using lower precision (16-bit) math, which is sufficient for the AI models that will increasingly dominate scientific discovery.42

7.2 Post-Silicon Technologies

To break the energy wall, we must look beyond standard CMOS silicon.

7.2.1 Neuromorphic Computing

Inspired by the biological brain, neuromorphic computing abandons the clock-driven logic of standard CPUs. Chips like Intel’s Loihi 2 use Spiking Neural Networks (SNNs).

  • Event-Driven: In an SNN, parts of the chip are only active when a "spike" of information occurs. This is analogous to how the human brain works; neurons only fire when stimulated (a toy example follows this list).

  • Efficiency: This architecture promises orders-of-magnitude improvements in energy efficiency for specific tasks like optimization problems, graph searches, and real-time control systems. While they won't replace CPUs for fluid dynamics, they may become the standard "co-processor" for AI logic in future supercomputers.44
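
The event-driven idea can be captured with a toy leaky integrate-and-fire neuron; the sketch below uses illustrative constants and is not Loihi code. Work happens only when an input spike arrives, and an output spike is emitted only when the membrane potential crosses a threshold.

```cpp
// Toy leaky integrate-and-fire (LIF) neuron: activity is driven by discrete
// input spikes; between events the state simply decays.
#include <cstdio>
#include <vector>

int main() {
    const double leak = 0.9, weight = 0.6, threshold = 1.0;
    double potential = 0.0;

    // Sparse input: a spike train where most timesteps carry no event.
    std::vector<int> input_spikes = {0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0};

    for (std::size_t t = 0; t < input_spikes.size(); ++t) {
        potential *= leak;                         // passive decay between events
        if (input_spikes[t]) potential += weight;  // integrate only when an event arrives

        if (potential >= threshold) {              // fire and reset when threshold is crossed
            std::printf("t=%zu: spike!\n", t);
            potential = 0.0;
        }
    }
    return 0;
}
```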

7.2.2 Optical Computing

Companies like Lightmatter are pioneering the use of photons (light) instead of electrons to perform calculations.

  • The Speed of Light: Optical processors use interferometers to perform matrix multiplications—the core operation of AI—at the speed of light. Because light generates negligible heat compared to electrical resistance, these chips could offer a way to scale performance without the accompanying thermal crisis.

  • Integration: The near-term future likely involves "optical interconnects" replacing copper wires within the supercomputer to boost bandwidth, followed eventually by optical compute cores for specific AI workloads.46

7.2.3 Quantum-Classical Hybrids

Quantum computers are not replacements for supercomputers; they are specialized accelerators.

  • The Hybrid Model: The future supercomputer will likely be a hybrid monster. A massive classical system (like Frontier) will handle the bulk of the simulation code, data management, and control logic. When the simulation hits a problem that is intractable for classical physics—such as calculating the exact electron correlation in a complex molecule—it will offload that specific kernel to a Quantum Processing Unit (QPU) attached to the node. This integration is already being tested in pilot projects worldwide.48

8. Conclusion

The arrival of exascale computing is a technological triumph that redefines the boundaries of the possible. Systems like Frontier, El Capitan, and JUPITER are not merely faster calculators; they are time machines. They allow us to peer into the future of our changing climate, to look inside the microscopic degradation of nuclear materials, and to visualize the formation of galaxies.

However, exascale is also a warning. The extreme engineering required to build these machines—the liquid cooling, the 3D stacked silicon, the 20-megawatt power draws—signals that the era of "easy scaling" is over. We have reached the end of the roadmap that Gordon Moore sketched out decades ago. The path forward to Zettascale and beyond will not be paved with smaller transistors alone. It will be built on the convergence of diverse technologies: silicon and light, classical and quantum, simulation and AI.

As we stand on this new frontier, the focus of the supercomputing community is shifting. It is no longer just about the raw speed of the FLOP; it is about the efficiency of the watt, the insight of the data, and the ingenuity of the architecture. The exascale era is not the end of the story; it is the beginning of the post-silicon age.

Table 1: Comparative Analysis of Leading Exascale Systems

| Feature | Frontier (ORNL) | Aurora (Argonne) | El Capitan (LLNL) | JUPITER (Jülich) |
| --- | --- | --- | --- | --- |
| Sustained Performance (HPL Rmax) | 1.353 exaFLOPS | 1.012 exaFLOPS | 1.809 exaFLOPS | 1.000 exaFLOPS |
| Primary Architecture | HPE Cray EX | HPE Cray EX | HPE Cray EX | BullSequana XH3000 |
| CPU Technology | AMD EPYC "Trento" | Intel Xeon Max Series | AMD MI300A (APU) | ARM Neoverse (Grace) |
| Accelerator | AMD Instinct MI250X | Intel Data Center GPU Max | AMD MI300A (Integrated) | NVIDIA GH200 |
| Interconnect | HPE Slingshot-11 | HPE Slingshot-11 | HPE Slingshot-11 | NVIDIA InfiniBand |
| Storage Paradigm | Orion (Lustre-based) | DAOS (Object Store) | Rabbit (Near-node NVMe) | Parallel File System |
| Primary Mission | Open Science / Energy | AI / Data Science | Nuclear Stewardship | Modular Science / AI |
| Energy Efficiency | ~52 GFLOPS/Watt | Lower Efficiency | ~60 GFLOPS/Watt | High Efficiency |
| Cooling | Warm Water Direct Liquid | Direct Liquid Cooling | Direct Liquid Cooling | Direct Liquid Cooling |

Table 2: Key Exascale Applications and Impact

| Domain | Project / Code | Scientific Goal | The Exascale Advantage |
| --- | --- | --- | --- |
| Climate Science | E3SM-MMF | Predict regional water cycles and severe weather. | Superparameterization: Resolves cloud physics explicitly within global grid cells. |
| Biomedicine | CANDLE | Precision cancer treatment and drug discovery. | Scale: Analyzes millions of patient records; simulates RAS protein dynamics at femtosecond resolution. |
| High Energy Physics | WarpX | Design compact plasma wakefield accelerators. | 3D Fidelity: Simulates laser-plasma interactions in full 3D, validating novel injector designs. |
| National Security | Stockpile Stewardship | Certify safety of aging nuclear arsenal. | 3D Aging: Models material degradation (cracks, corrosion) in full 3D coupled with hydrodynamics. |
| Clean Energy | Fusion Modeling | Design viable fusion reactors (ITER/DEMO). | Turbulence: Resolves plasma turbulence and boundary instabilities in tokamak reactors. |

Table 3: The Path to Zettascale (Projected)

| Era | Milestone | Performance | Key Technology Enabler | Power Challenge |
| --- | --- | --- | --- | --- |
| 2008 | Petascale | 10^15 FLOPS | Multi-core CPUs | 1-3 MW |
| 2022 | Exascale | 10^18 FLOPS | Heterogeneous (CPU+GPU) | 20-30 MW |
| 2028 | Post-Exascale | 10^19 FLOPS | AI-HPC Convergence, Chiplets | 40-50 MW |
| 2035 | Zettascale | 10^21 FLOPS | Optical Interconnects, 3D Stacking, Mixed Precision | >100 MW (Projected) |

Table 4: Emerging Post-Silicon Technologies

| Technology | Mechanism | Best Use Case | Maturity |
| --- | --- | --- | --- |
| Neuromorphic | Spiking Neural Networks (Event-driven) | Real-time control, edge AI, sparse data processing. | Research / Early Commercial (Intel Loihi) |
| Optical Computing | Photonic Matrix Multiplication (Interferometry) | Linear algebra, AI inference at light speed. | Startups / Prototypes (Lightmatter) |
| Quantum Computing | Qubits (Superposition/Entanglement) | Factorization, Quantum Chemistry, Optimization. | NISQ Era (Noisy Intermediate-Scale Quantum) |
| In-Memory Computing | Processing logic embedded in RAM/Storage | Database search, graph analytics, reduction of data movement. | Developing / Niche Deployment |

Table 5: Programming Models for Exascale

| Model | Description | Target Hardware | Primary Benefit |
| --- | --- | --- | --- |
| Kokkos | C++ Abstraction Library | Portable (NVIDIA, AMD, Intel, CPU) | Performance Portability: Write once, run anywhere with optimized backends. |
| RAJA | C++ Abstraction Library | Portable (NVIDIA, AMD, Intel, CPU) | Loop Abstraction: Decouples loop bodies from execution policies. |
| SYCL | Open Standard C++ | Heterogeneous Systems (Intel focused) | Single-Source: Combines host and device code in one file (Khronos standard). |
| HIP | AMD's GPU Language | AMD GPUs (and NVIDIA via translation) | CUDA Compatibility: Easy porting of legacy NVIDIA codes to AMD hardware. |
| OpenMP | Directive-based API | CPUs and GPUs (Offloading) | Simplicity: Uses #pragma directives to offload code blocks to accelerators. |

Works cited

  1. What is exascale computing? - McKinsey, accessed January 7, 2026, https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-exascale-computing

  2. Exascale computing - Wikipedia, accessed January 7, 2026, https://en.wikipedia.org/wiki/Exascale_computing

  3. Overview of the ECP - Exascale Computing Project, accessed January 7, 2026, https://www.exascaleproject.org/about/

  4. The Opportunities and Challenges of Exascale Computing - DOE Office of Science, accessed January 7, 2026, https://science.osti.gov/-/media/ascr/ascac/pdf/reports/Exascale_subcommittee_report.pdf

  5. From Petascale to Exascale Computing - YouTube, accessed January 7, 2026, https://www.youtube.com/watch?v=RNqbCjl6QmU

  6. Exascale Computing Technology Challenges, accessed January 7, 2026, https://pdsw.org/pdsw10/resources/slides/PDSW_SC2010_ExascaleChallenges-Shalf.pdf

  7. The Mystery Of Tianhe-3, The World's Fastest Supercomputer, Solved? - The Next Platform, accessed January 7, 2026, https://www.nextplatform.com/2024/02/09/the-mystery-of-tianhe-3-the-worlds-fastest-supercomputer-solved/

  8. Three Chinese Exascale Systems Detailed at SC21: Two Operational and One Delayed - HPCwire, accessed January 7, 2026, https://www.hpcwire.com/2021/11/24/three-chinese-exascale-systems-detailed-at-sc21-two-operational-and-one-delayed/

  9. El Capitan Retains #1 as JUPITER Becomes Europe's First Exascale System in the 66th TOP500 List, accessed January 7, 2026, https://top500.org/news/el-capitan-retains-1-as-jupiter-becomes-europes-first-exascale-system-in-the-66th-top500-list/

  10. NVIDIA Grace Hopper Superchip Powers JUPITER, Defining a New Class of Supercomputers to Propel AI for Scientific Discovery, accessed January 7, 2026, https://nvidianews.nvidia.com/news/nvidia-grace-hopper-superchip-powers-jupiter-defining-a-new-class-of-supercomputers-to-propel-ai-for-scientific-discovery

  11. June 2025 - TOP500, accessed January 7, 2026, https://top500.org/lists/top500/2025/06/

  12. Frontier User Guide - OLCF User Documentation, accessed January 7, 2026, https://docs.olcf.ornl.gov/systems/frontier_user_guide.html

  13. AMD MI300A APU: Unified CPU-GPU Accelerator - Emergent Mind, accessed January 7, 2026, https://www.emergentmind.com/topics/amd-mi300a-accelerated-processing-unit-apu

  14. Using El Capitan Systems: Hardware Overview | HPC @ LLNL, accessed January 7, 2026, https://hpc.llnl.gov/documentation/user-guides/using-el-capitan-systems/hardware-overview

  15. Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs - arXiv, accessed January 7, 2026, https://arxiv.org/html/2508.12743v1

  16. Cray, AMD Tag Team On 1.5 Exaflops “Frontier” Supercomputer - The Next Platform, accessed January 7, 2026, https://www.nextplatform.com/2019/05/07/cray-amd-tag-team-on-1-5-exaflops-frontier-supercomputer/

  17. Exclusive Inside Look at First US Exascale Supercomputer - HPCwire, accessed January 7, 2026, https://www.hpcwire.com/2022/07/01/exclusive-inside-look-at-first-us-exascale-supercomputer/

  18. DAOS - ALCF User Guides, accessed January 7, 2026, https://docs.alcf.anl.gov/aurora/data-management/daos/daos-overview/

  19. DAOS: Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence, accessed January 7, 2026, https://insidehpc.com/2019/08/daos-scale-out-software-defined-storage-for-hpc-big-data-ai-convergence/

  20. Frontier (supercomputer) - Wikipedia, accessed January 7, 2026, https://en.wikipedia.org/wiki/Frontier_(supercomputer)

  21. Facts about Frontier - Oak Ridge National Laboratory, accessed January 7, 2026, https://www.ornl.gov/blog/facts-about-frontier

  22. Distributed Asynchronous Object Storage (DAOS) on Aurora, accessed January 7, 2026, https://www.alcf.anl.gov/events/distributed-asynchronous-object-storage-daos-aurora

  23. Overview of DAOS - YouTube, accessed January 7, 2026, https://www.youtube.com/watch?v=J8pV3U5Rv1s

  24. LLNL El Capitan Fact Sheet, accessed January 7, 2026, https://www.llnl.gov/sites/www/files/2024-12/llnl-el-capitan-fact-sheet.pdf

  25. El Capitan: NNSA's first exascale machine | Advanced Simulation and Computing, accessed January 7, 2026, https://asc.llnl.gov/exascale/el-capitan

  26. November 2025 - TOP500, accessed January 7, 2026, https://top500.org/lists/top500/2025/11/

  27. NVIDIA GH200 Grace Hopper Superchip, accessed January 7, 2026, https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/

  28. China Stretches Another AI Framework To Exascale - The Next Platform, accessed January 7, 2026, https://www.nextplatform.com/2022/04/26/china-stretches-another-ai-framework-to-exascale/

  29. E3SM-MMF - Exascale Computing Project, accessed January 7, 2026, https://www.exascaleproject.org/research-project/e3sm-mmf/

  30. ECP Advances the Science of Atmospheric Convection Modeling, accessed January 7, 2026, https://www.exascaleproject.org/ecp-advances-the-science-of-atmospheric-convection-modeling/

  31. Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer, accessed January 7, 2026, https://www.anl.gov/exascale/exascale-deep-learning-and-simulation-enabled-precision-medicine-for-cancer

  32. Exascale Computing Project Contributes to Accelerating Cancer Research, accessed January 7, 2026, https://www.exascaleproject.org/exascale-computing-project-contributes-to-accelerating-cancer-research/

  33. WarpX - Exascale Computing Project, accessed January 7, 2026, https://www.exascaleproject.org/research-project/warpx/

  34. Exascale's New Frontier: WarpX - Oak Ridge Leadership Computing Facility, accessed January 7, 2026, https://www.olcf.ornl.gov/2023/07/18/exascales-new-frontier-warpx/

  35. El Capitan supercomputer is ready to handle nuclear stockpile and AI workflows, accessed January 7, 2026, https://www.nextgov.com/emerging-tech/2025/01/el-capitan-supercomputer-ready-handle-nuclear-stockpile-and-ai-workflows/402088/

  36. Kokkos/RAJA - Exascale Computing Project, accessed January 7, 2026, https://www.exascaleproject.org/research-project/kokkos-raja/

  37. Kokkos/RAJA - Exascale Computing Project, accessed January 7, 2026, https://www.exascaleproject.org/wp-content/uploads/2019/10/KOKKOS.pdf

  38. AMD says zettascale supercomputers will need half a gigawatt to operate, enough for ... - Tom's Hardware, accessed January 7, 2026, https://www.tomshardware.com/pc-components/gpus/amd-says-zettascale-supercomputers-will-need-half-a-gigawatt-to-operate-enough-for-375-000-homes

  39. Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems - IEEE Xplore, accessed January 7, 2026, http://ieeexplore.ieee.org/document/6374769/

  40. A Performance and Energy Comparison of Fault Tolerance Techniques for Exascale Computing Systems - Colorado State University, accessed January 7, 2026, https://www.engr.colostate.edu/~hj/conferences/366.pdf

  41. Exascale Computing's Four Biggest Challenges and How They Were Overcome, accessed January 7, 2026, https://www.olcf.ornl.gov/2021/10/18/exascale-computings-four-biggest-challenges-and-how-they-were-overcome/

  42. Zettascale computing - Wikipedia, accessed January 7, 2026, https://en.wikipedia.org/wiki/Zettascale_computing

  43. Zettascale by 2035? China Thinks So - HPCwire, accessed January 7, 2026, https://www.hpcwire.com/2018/12/06/zettascale-by-2035/

  44. Taking Neuromorphic Computing with Loihi 2 to the Next Level Technology Brief - Intel, accessed January 7, 2026, https://download.intel.com/newsroom/2021/new-technologies/neuromorphic-computing-loihi-2-brief.pdf

  45. Intel Loihi 2 Neuromorphic Compute Tile on Intel 4 - ServeTheHome, accessed January 7, 2026, https://www.servethehome.com/intel-loihi-2-neuromorphic-compute-tile-on-intel-4/

  46. Lightmatter® - The photonic (super)computer company., accessed January 7, 2026, https://lightmatter.co/

  47. Lightmatter: Solving How to Interconnect Millions of Chips - The Futurum Group, accessed January 7, 2026, https://futurumgroup.com/insights/lightmatter-solving-how-to-interconnect-millions-of-chips/

  48. Hybrid Quantum-Classical Computing - Dell Technologies, accessed January 7, 2026, https://www.delltechnologies.com/asset/en-us/solutions/infrastructure-solutions/briefs-summaries/hybrid-quantum-classical-computing-brochure.pdf

  49. RIKEN-led Project Seeks to Combine The Powers of Quantum Computers And Supercomputers, accessed January 7, 2026, https://thequantuminsider.com/2026/01/06/riken-led-project-seeks-to-combine-the-powers-of-quantum-computers-and-supercomputers/
