
Khronos Blog

Exascale Computing Project at the University of Cambridge uses Khronos SYCL Standard to Develop Performance Portable FEniCS Libraries for the Finite Element Method

Image courtesy of https://fenicsproject.org/. The figure shows the von Mises stresses computed from a nonlinear thermomechanical FEniCSx simulation of a turbocharger.

Researchers from the University of Cambridge use SYCL™ as a high-performance language for solving differential equations with the finite element method.  SYCL is an open, non-proprietary, royalty-free programming language developed and maintained by the Khronos® Group open standards consortium, with multiple compiler implementations that enable performance portable code on new-generation and multi-vendor hardware. 

Overview of the Research

The FEniCS Project libraries are in development at the University of Cambridge for research into solving partial differential equations (PDEs) on high-performance computers. FEniCSx launched in 2018 as the new version of the FEniCS libraries. It includes numerous improvements over legacy FEniCS, including greater extensibility, support for more cell types and elements, improved parallelization, and complex number support. 

FEniCSx enables users to translate scientific models into efficient finite element code quickly. With the high-level Python and C++ interfaces to FEniCSx, it is easy to get started, but FEniCSx also offers powerful capabilities for more experienced programmers. FEniCSx runs on many platforms, ranging from laptops to high-performance clusters.

Exascale Computing

The most complex scientific research questions demand ever more computing power. Exascale computing systems can perform at least 10¹⁸ IEEE 754 double-precision (64-bit) floating-point operations per second, allowing researchers to create more realistic models and tackle many new challenges.

As part of the FEniCS Project, researchers under Professor Garth Wells are working on two core projects towards exascale computing. 

The first is the ASiMoV project, a collaboration with Rolls-Royce and other university partners researching electrification to achieve net-zero flight. The team is building virtual designs and simulations of aero engines to study the electromagnetic, mechanical, thermal, and fluid-flow responses, and how they interact. This research focuses on propulsion and involves complex physical simulations over highly complex geometries. The second is an ExCALIBUR project, part of the UK’s exascale computing program focused on software and experimental hardware. This research is essential for developing future exascale computing capabilities and for providing accelerated computing platforms for application domains including fusion energy.

The FEniCS Project team uses GPUs for high-performance computing to solve differential equations with the finite element method. This involves performing operations over vast, unstructured grids distributed across multiple compute devices.

The Challenge 

In scientific computing, sparse matrix formats are commonly used to store matrices in which most elements are zero. Historically, research labs used CPUs to solve these systems of equations. However, in recent years there has been a trend in high-performance computing (HPC) toward accelerator architectures, GPUs in particular, because they provide significant energy savings. The largest supercomputers in the world are embracing the performance and efficiency advantages of accelerators for many data-parallel workloads, and smaller GPU clusters are also seeing increasing use in research and production.
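
To make the data structure concrete, here is a minimal sketch of a sparse matrix-vector product in the widely used compressed sparse row (CSR) format. This is illustrative C++ only, not FEniCSx code, and the struct layout is an assumption for the example.

```cpp
#include <cstddef>
#include <vector>

// Minimal compressed sparse row (CSR) storage: only the nonzero entries
// are kept, together with their column indices and per-row offsets.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<double> values;       // nonzero entries, row by row
    std::vector<std::size_t> cols;    // column index of each nonzero
    std::vector<std::size_t> row_ptr; // rows + 1 offsets into values/cols
};

// y = A * x. Each row touches only its own nonzeros, so the cost scales
// with the number of nonzeros rather than rows * columns.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.values[k] * x[A.cols[k]];
    return y;
}
```

Note the indirect read x[A.cols[k]]: that irregular, bandwidth-bound access pattern is a large part of why assembled sparse matrices are awkward to accelerate on GPUs.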

Yet the shift toward GPUs has created new design challenges for code performance and portability. Traditional methods built around large sparse matrices pose significant challenges when accelerating with GPUs rather than CPUs. Early struggles arose because software stacks were immature, the hardware was still maturing, and developers did not yet understand what had to change mathematically to exploit accelerators fully. Research teams were porting what they had done before onto modern hardware: they were using low-order methods limited by memory bandwidth and relying on large sparse matrices, which do not map efficiently onto GPU accelerators. They were adapting CPU algorithms to GPUs, and the performance was underwhelming.

The Finite Element Method

The research team began adapting their methods to run better on GPUs. They moved toward higher-order versions of the finite element method, which boost arithmetic intensity, and they stopped assembling large sparse matrices, removing a memory-bandwidth bottleneck on GPUs. These changes allowed them to increase both accuracy and arithmetic intensity, and the researchers saw a substantial boost in throughput from matrix-free methods. There is now a growing consensus that matrix-free methods are the way forward.
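
As a schematic sketch of the matrix-free idea (not the FEniCSx implementation), the operator y = A x can be applied element by element without ever assembling the global sparse matrix. The fixed element size and the pre-stored element matrices below are simplifying assumptions; in practice the element kernel is typically evaluated on the fly from geometry and quadrature data.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t ndofs = 4; // degrees of freedom per element (illustrative)

using ElementMatrix = std::array<std::array<double, ndofs>, ndofs>;
using ElementDofs   = std::array<std::size_t, ndofs>;

// y += A * x applied element by element: gather local values, apply the
// small dense element matrix, scatter the result back. The global sparse
// matrix is never formed, trading memory traffic for dense arithmetic.
void matrix_free_apply(const std::vector<ElementMatrix>& A_e,
                       const std::vector<ElementDofs>& dofmap,
                       const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t e = 0; e < A_e.size(); ++e) {
        std::array<double, ndofs> xl{}, yl{};
        for (std::size_t i = 0; i < ndofs; ++i) // gather
            xl[i] = x[dofmap[e][i]];
        for (std::size_t i = 0; i < ndofs; ++i) // local dense matvec
            for (std::size_t j = 0; j < ndofs; ++j)
                yl[i] += A_e[e][i][j] * xl[j];
        for (std::size_t i = 0; i < ndofs; ++i) // scatter/accumulate
            y[dofmap[e][i]] += yl[i];
    }
}
```

The dense inner loop has regular memory access and a high ratio of arithmetic to data movement, and higher-order elements (a larger ndofs) raise that ratio further, which is exactly what suits GPU architectures.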

For these reasons, many supercomputing and exascale programs are GPU-accelerated and use high-order methods. However, different architectures and platforms use different programming models, which creates a development burden.

HPC and exascale computing require significant development effort. Ideally, a programming model is portable across systems and supports the latest hardware. Achieving this requires an open interface that offers performance portability across architectures, with multiple implementations targeting different backends. Open, cross-architecture programming is essential for accelerated distributed computing.

Researchers at the University of Cambridge need to be able to develop code that runs on hardware from multiple vendors while achieving high performance. 

The Solution 

The research team at the University of Cambridge chose the Khronos SYCL open standard programming language to program the GPU accelerators, combining it with Message Passing Interface (MPI) for cluster communication. SYCL is a C++-based heterogeneous parallel programming framework for accelerating HPC and compute-intensive applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and tensor accelerators. SYCL provides a consistent language, APIs, and ecosystem in which to write and tune code for various accelerator architectures. 
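
To give a flavour of the programming model, here is a minimal single-source SYCL program (illustrative only, not FEniCSx code): the same C++ compiles for whatever device the SYCL runtime selects, whether a GPU, a CPU, or another accelerator.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    sycl::queue q; // the default selector picks the best available device
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    constexpr std::size_t n = 1024;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    {
        sycl::buffer<double> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<double> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<double> bufC(c.data(), sycl::range<1>(n));
        q.submit([&](sycl::handler& h) {
            sycl::accessor A{bufA, h, sycl::read_only};
            sycl::accessor B{bufB, h, sycl::read_only};
            sycl::accessor C{bufC, h, sycl::write_only, sycl::no_init};
            // One work-item per array element.
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    } // buffers go out of scope; results are copied back to the host vectors
    std::cout << "c[0] = " << c[0] << "\n"; // expect 3
}
```

In a cluster setting, each MPI rank would own a piece of the distributed grid and drive its own SYCL queue, with MPI exchanging boundary data between ranks.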

The FEniCS Project needed a high-level programming model that builds on modern standard C++. They chose SYCL implementations to run their code directly on GPUs, delivering high performance and a substantial boost in throughput. It was attractive to the FEniCS Project developers that they could use a single-source, open specification with multiple implementations that runs on different hardware. This allows them to write their software against the SYCL standard and run it on any supported hardware or supercomputing architecture. SYCL lets developers build their applications on top of the framework, which handles performance portability between the different GPUs.

“We acknowledge that the field is moving to accelerators. We also know that this may mean various architectures, and we don’t have the bandwidth to support a large range of acceleration backends. With SYCL, we get an independent, open specification that does not lock us into any one hardware vendor. We can use SYCL to develop computing-intensive applications and execute them on various hardware architectures,” said Professor Wells. “Because SYCL is an open standard, we all benefit from the community sharing feedback and working on issues.”

Because SYCL is hardware agnostic by design and can potentially target many of the largest machines for HPC and exascale computing, it was a solid choice for the FEniCS Project. SYCL supports a range of development use cases, including writing new offload-acceleration or heterogeneous compute applications, converting existing C or C++ code to SYCL, and migrating from other accelerator languages or frameworks.

Khronos® is a registered trademark, and SYCL™ is a trademark owned by The Khronos Group Inc.  All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.
