Graphics Processing Units

Supercomputers are mainly built from general-purpose components. Until recently, that meant multicore CPUs, because commercial demand for desktop and business computing was the main driver of processor development. However, there is another market where processor performance is important: computer gaming.

Computer games are such an enormous market that developing specialized processors, GPUs, for high-quality 3D graphics is easily worthwhile. Special video processors were already in use in the 1970s, but GPUs as we know them have been around since the 1990s. Sony introduced the term graphics processing unit in 1994 with the PlayStation, and by the end of the decade, GPUs started to appear in the PC world.

In the early 2000s, the first experiments in using GPUs for scientific computing began, and as some GPUs started including features targeting high-performance computing as well, the term GPGPU (General Purpose Graphics Processing Unit) was coined. Today, the term GPU is frequently used in high-performance computing.

When comparing a CPU to a GPU, one word is enough to describe the differences: complexity. Below is a schematic representation of a CPU and a GPU side by side.

A CPU is a more complex, flexible device oriented towards general-purpose usage. It’s fast and versatile, designed to run operating systems and many very different types of applications. It has lots of features, such as sophisticated control logic, caches, and cache coherence, that are not related to pure computing.

A GPU has, in comparison, a relatively small number of transistors dedicated to control and caching, and a much larger share of transistors dedicated to mathematical operations. Since the cores in a GPU are designed just for 3D graphics, they can be made much simpler, and there can be a very large number of them: current GPUs contain thousands. An individual core in a GPU is less powerful than one in a CPU, but thanks to the high degree of parallelism available, a GPU can outperform even a multicore CPU.

GPUs typically have much faster access to their own memory than CPUs have to theirs, which can also be important for good performance. In addition, due to their simpler construction, GPUs are more energy-efficient than CPUs, meaning that they use less energy per floating-point operation. For this reason, most new systems targeting exascale performance utilize GPUs; otherwise, the electricity consumption of running these systems would become too high.

The video below illustrates the power of massive parallelism in GPUs (note that real CPUs also have parallelism, though to a more limited extent than GPUs).
Mythbusters demoing GPU vs. CPU

Because GPU cores are simpler and need massive parallelism to be kept busy, not all scientific problems can be adapted easily to GPUs. Furthermore, while CPUs can also handle task-level parallelism effectively, with different cores performing different operations, GPUs work efficiently only in highly data-parallel cases where all the cores perform identical arithmetic operations on different data. Generally, programming GPUs is more involved, and extracting good performance might require that the programmer consider some quite low-level details about the hardware.
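The difference between the two kinds of parallelism can be sketched in a few lines of Python. This is only an illustration of the program structure, not GPU code: a thread pool on the CPU stands in for the cores, and the function names (scale_and_shift, data_parallel, task_parallel) are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_shift(x):
    """The single arithmetic operation applied to every data element."""
    return 2.0 * x + 1.0


def data_parallel(values):
    # Data parallelism: every worker runs the SAME operation on a
    # DIFFERENT element. This is the pattern that suits GPUs.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(scale_and_shift, values))


def task_parallel(values):
    # Task parallelism: workers run DIFFERENT operations side by side.
    # CPUs handle this well; GPUs do not.
    with ThreadPoolExecutor() as pool:
        total = pool.submit(sum, values)
        largest = pool.submit(max, values)
        ordered = pool.submit(sorted, values)
        return total.result(), largest.result(), ordered.result()


if __name__ == "__main__":
    print(data_parallel([0.0, 1.0, 2.0]))
    print(task_parallel([3, 1, 2]))
```

In the data-parallel function, the work per element is identical, so it could in principle be spread over thousands of simple cores. In the task-parallel function, each worker follows its own instruction stream, which is what the more complex CPU cores are designed for.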

Due to their specialized nature, GPUs need CPUs by their side. GPUs do not run an operating system, and application execution always starts on the CPU. Once the application is running, computations can be offloaded to the GPU to speed them up. Accordingly, GPUs are often referred to as accelerators.

Depending on the use case, only a part of the application may be offloaded to the GPUs, and it is also possible to perform computations both on the CPU and the GPU simultaneously. However, the main memories of the CPU and the GPU are separate, so carrying out computations requires that the data be first copied from the CPU to the GPU.

Another thing to consider is that when the CPU needs to operate on results from the GPU, the data needs to be copied back from the GPU to the CPU. As mentioned earlier, the memory bus between the GPU main memory and the GPU cores is typically faster than the corresponding bus on the CPU side, but moving data between the CPU and the GPU is relatively slow and can often become a performance bottleneck. The programmer must therefore pay careful attention to minimizing data transfers between the CPU and the GPUs.
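A rough way to reason about this bottleneck is to add the two copies to the GPU compute time and compare the total with the CPU-only time. The following is a minimal sketch of such an estimate. The function names and all the numbers in the usage example are assumptions made for illustration, not measurements, and the model ignores effects such as overlapping transfers with computation.

```python
def offload_time(n_bytes, cpu_seconds, gpu_speedup, link_bytes_per_s):
    """Estimate the run time of a computation offloaded to a GPU.

    n_bytes          -- data copied to the GPU (assumed equal in size
                        to the results copied back)
    cpu_seconds      -- time the computation takes on the CPU alone
    gpu_speedup      -- assumed factor by which the GPU computes faster
    link_bytes_per_s -- assumed CPU-GPU transfer rate
    """
    copy_in = n_bytes / link_bytes_per_s    # CPU -> GPU
    compute = cpu_seconds / gpu_speedup     # work done on the GPU
    copy_out = n_bytes / link_bytes_per_s   # GPU -> CPU
    return copy_in + compute + copy_out


def offload_pays_off(n_bytes, cpu_seconds, gpu_speedup, link_bytes_per_s):
    """True if the offloaded run, including both copies, beats the CPU."""
    total = offload_time(n_bytes, cpu_seconds, gpu_speedup, link_bytes_per_s)
    return total < cpu_seconds


if __name__ == "__main__":
    # Hypothetical case: 8 GB of data, a link moving 16 GB/s, 8x GPU speed-up.
    for cpu_seconds in (10.0, 1.0):
        t = offload_time(8e9, cpu_seconds, 8.0, 16e9)
        print(cpu_seconds, t, offload_pays_off(8e9, cpu_seconds, 8.0, 16e9))
```

With these assumed numbers, a computation that takes 10 seconds on the CPU drops to 2.25 seconds when offloaded, while one that takes only 1 second on the CPU becomes slower (1.125 seconds), because the two copies alone take a full second. Keeping data on the GPU across several computation steps avoids paying for the copies repeatedly.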

Despite the challenges and limitations, many applications benefit from GPU acceleration. In GPU-equipped supercomputers, there are typically four to six GPUs and one to two CPUs per node. Compared with running on the CPUs alone, GPUs can typically speed up an application by a factor of four to eight, and in some cases by a factor of ten or more.
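When only a part of an application is offloaded, the speed-up of the whole application is smaller than the speed-up of the offloaded part. A standard estimate for this is Amdahl's law, sketched below. The function name and the example values are assumptions for illustration, and data transfer costs are left out.

```python
def overall_speedup(offloaded_fraction, gpu_speedup):
    """Amdahl's law: speed-up of the whole application when only a
    fraction of its CPU run time is accelerated.

    offloaded_fraction -- share of the CPU-only run time that is moved
                          to the GPU, between 0.0 and 1.0
    gpu_speedup        -- factor by which the offloaded part gets faster
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    remaining_on_cpu = 1.0 - offloaded_fraction
    accelerated = offloaded_fraction / gpu_speedup
    return 1.0 / (remaining_on_cpu + accelerated)


if __name__ == "__main__":
    # Hypothetical case: 90 % of the run time offloaded, 8x faster on the GPU.
    print(round(overall_speedup(0.9, 8.0), 2))
```

In this hypothetical case the application as a whole runs about 4.7 times faster, even though the offloaded part runs 8 times faster, because the 10 % that stays on the CPU is not accelerated at all.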

GPUs play a particularly important role in machine learning applications such as training neural networks. This is because the arithmetic operations involved in training neural networks are inherently highly parallel and, as such, very well suited to GPUs. In the best cases, GPUs are up to 30 times faster than CPUs at performing these operations.

Even though GPUs have played a significant role in high-performance computing for only about a decade, using accelerators in supercomputers is nothing new. Using some type of co-processor to carry out parts of the calculations has been an on-and-off trend since the 1960s. At the moment, it looks like GPUs or other types of accelerators are here to stay, but only time will tell.

Nvidia has traditionally been the most visible GPU vendor due to both the performance of its hardware and the maturity of its software development ecosystem. However, Intel and AMD have recently been active in designing GPUs for high-performance computing, and the LUMI supercomputer uses AMD GPUs.

Includes material from the "Supercomputing" online course (https://www.futurelearn.com/courses/supercomputing/) by Edinburgh Supercomputing Center (EPCC), licensed under Creative Commons BY-SA.