Modular Supercomputing: a System-Wide Orchestration of Heterogeneous Resources
Plenary Opening Session on January 24, 2022.
Very diverse applications run on today’s HPC systems: traditional tightly coupled codes, machine learning and AI codes, multi-physics and multi-scale simulations, and application workflows combining all of these. Each computational science domain asks for a particular kind of machine. HPC centres must serve all these requirements under strict operational constraints and within a limited power envelope. This makes it nearly impossible to find a single technology that suits them all. In response, most HPC sites opt for heterogeneous system architectures.
The specific approach to heterogeneous computing chosen and developed at JSC (Jülich Supercomputing Centre) through the DEEP projects is the Modular Supercomputing Architecture (MSA). The MSA orchestrates heterogeneous computing resources (CPUs, GPUs, many-core accelerators, disruptive technologies, etc.) at system level, organizing them into compute modules. Modules are clusters of potentially large size, each configured with a specific type of user requirement in mind. The different modules are interconnected via a high-speed network, and a common software stack brings all modules together into a single machine.
Each application can dynamically decide how many nodes to use in each module, mapping its intrinsic requirements and concurrency patterns onto the hardware. An advanced scheduler and a dynamic resource manager assign resources to jobs with the goal of maximizing system utilization. Codes that perform multi-physics or multi-scale simulations can run across compute modules thanks to a global system software and programming environment. Application workflows that execute different steps one after another (or in parallel) can also be distributed, so that each workflow component runs on the best-suited hardware and exchanges data either directly (via message-passing communication) or through the file system.
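To make the idea of mapping workflow components onto modules more concrete, here is a toy model in Python. It is only an illustrative sketch under stated assumptions: the module names ("cluster", "booster"), the type labels ("cpu", "gpu"), the node counts and the greedy placement policy are all hypothetical. It does not reproduce JSC's scheduler, the DEEP software stack or the API of any real resource manager. Each component states which kind of module suits it and how many nodes it wants, and the allocator places it on a matching module that still has enough free nodes.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str        # hypothetical module name, e.g. "cluster" or "booster"
    kind: str        # hypothetical hardware label, e.g. "cpu" or "gpu"
    free_nodes: int  # nodes currently available in this module


@dataclass
class Component:
    name: str   # workflow step, e.g. "simulation"
    kind: str   # module type this step runs best on
    nodes: int  # node count the application chose for this step


def allocate(modules, components):
    """Greedy toy allocator: place each component on a module of the
    requested kind that still has enough free nodes.

    Returns a dict {component name: module name} and decrements the
    free node count of each chosen module. Raises RuntimeError if a
    component cannot be placed.
    """
    placement = {}
    for comp in components:
        candidates = [m for m in modules
                      if m.kind == comp.kind and m.free_nodes >= comp.nodes]
        if not candidates:
            raise RuntimeError(f"no {comp.kind} module can host {comp.name}")
        # Hypothetical policy: prefer the matching module with the most free nodes.
        target = max(candidates, key=lambda m: m.free_nodes)
        target.free_nodes -= comp.nodes
        placement[comp.name] = target.name
    return placement


if __name__ == "__main__":
    modules = [Module("cluster", "cpu", 64), Module("booster", "gpu", 32)]
    workflow = [
        Component("simulation", "cpu", 48),
        Component("training", "gpu", 16),
        Component("analysis", "cpu", 8),
    ]
    print(allocate(modules, workflow))
    # {'simulation': 'cluster', 'training': 'booster', 'analysis': 'cluster'}
```

A real system would also consider queue state, job priorities, runtime malleability and data locality; the greedy rule here only illustrates how per-module node requests let each step land on the hardware it prefers.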
This talk will describe the Modular Supercomputing Architecture, its main hardware and software elements and the user experience, and will conclude with the perspectives for the MSA in the Exascale context.
Dr. Estela Suarez is Senior Scientist and deputy lead of the Technology Department at the Jülich Supercomputing Centre, which she joined in 2010. Her research focuses on HPC system architectures and co-design. As leader of the DEEP series of EU-funded projects, she has driven the development of the Cluster-Booster and the Modular Supercomputing Architectures, including hardware, software and application implementation and validation. Additionally, since 2018 she has been lecturing on HPC architectures at the University of Bonn, and she leads the co-design efforts within the European Processor Initiative. She holds a PhD in Physics from the University of Geneva and a Master's degree in Astrophysics from the Complutense University of Madrid.