Supercomputer performance

There are several common metrics for measuring and comparing supercomputer performance. The oldest and most widely used is the number of floating-point operations per second (FLOP/s) that the computer can perform.

What does FLOP/s mean, then? A floating-point number is the computer representation of a real number. If a computer performs a single calculation on two such numbers, such as 2.1 + 4.3, in one second, its performance is one floating-point operation per second (1 FLOP/s).

While computers can also work with integers, which is important for some applications, most scientific problems are formulated in terms of real numbers. Consequently, the operations counted in this measurement are the basic floating-point arithmetic operations: addition, subtraction, multiplication, and division. Over the years, FLOP/s has become the standard measure.

When a computer can execute

- one billion (10⁹) FLOP/s, its performance is one Giga FLOP/s (GFLOP/s)
- one trillion (10¹²) FLOP/s, its performance is one Tera FLOP/s (TFLOP/s)
- one quadrillion (10¹⁵) FLOP/s, its performance is one Peta FLOP/s (PFLOP/s)

The theoretical computing power of a CPU core is determined by its clock speed and the maximum number of floating-point operations it can perform in one clock cycle. As an example, a CPU core in a laptop might have a clock speed of 3 GHz, meaning that it performs 3 billion cycles per second.

$ 3\text{GHz}=3\times10^{9}\frac{1}{\text{s}} $

Furthermore, suppose this hypothetical core can perform 16 floating-point operations (16 FLOPs) per cycle. Altogether, the core then has a peak performance of 48 GFLOP/s.

$ 3\times10^{9}\frac{1}{\text{s}} \times16\text{FLOP}=48\frac{\text{GFLOP}}{\text{s}}$

The theoretical peak performance of GPUs can be calculated similarly. Furthermore, the theoretical peak performance of a whole supercomputer is obtained by multiplying the theoretical peak performance of its CPUs and GPUs by the number of these components used in the system.
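The calculation above can be sketched in a few lines of Python. This is a minimal illustration for a CPU-only machine: the 3 GHz clock and 16 FLOPs per cycle come from the text, while the core, CPU, and node counts are hypothetical values that do not describe any real system.

```python
def core_peak_flops(clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak of one core: cycles per second times FLOPs per cycle."""
    return clock_hz * flops_per_cycle


def system_peak_flops(core_peak: float, cores_per_cpu: int,
                      cpus_per_node: int, nodes: int) -> float:
    """Theoretical peak of a CPU-only system: per-core peak times total core count."""
    return core_peak * cores_per_cpu * cpus_per_node * nodes


# Values from the text: a 3 GHz core doing 16 FLOPs per cycle.
core = core_peak_flops(3e9, 16)
print(f"core peak: {core / 1e9:.0f} GFLOP/s")

# Hypothetical machine (assumed numbers): 64 cores per CPU, 2 CPUs per node, 1000 nodes.
system = system_peak_flops(core, cores_per_cpu=64, cpus_per_node=2, nodes=1000)
print(f"system peak: {system / 1e15:.3f} PFLOP/s")
```

For a system with GPUs, the same multiplication would be repeated with the per-GPU peak and the GPU count, and the two results added.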

Performance terminology:

| Operations per second     | Scientific notation | Metric prefix | Unit    |
|---------------------------|---------------------|---------------|---------|
| 1 000                     | 10³                 | Kilo          | KFLOP/s |
| 1 000 000                 | 10⁶                 | Mega          | MFLOP/s |
| 1 000 000 000             | 10⁹                 | Giga          | GFLOP/s |
| 1 000 000 000 000         | 10¹²                | Tera          | TFLOP/s |
| 1 000 000 000 000 000     | 10¹⁵                | Peta          | PFLOP/s |
| 1 000 000 000 000 000 000 | 10¹⁸                | Exa           | EFLOP/s |
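As a small illustration of how these prefixes are applied, the following sketch converts a raw FLOP/s value into the unit from the table. The helper name `format_flops` is made up for this example.

```python
# (scale, unit) pairs from the terminology table, largest first.
PREFIXES = [
    (1e18, "EFLOP/s"),
    (1e15, "PFLOP/s"),
    (1e12, "TFLOP/s"),
    (1e9, "GFLOP/s"),
    (1e6, "MFLOP/s"),
    (1e3, "KFLOP/s"),
]


def format_flops(flops: float) -> str:
    """Express a FLOP/s value with the largest prefix that keeps the number >= 1."""
    for scale, unit in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:g} {unit}"
    return f"{flops:g} FLOP/s"


print(format_flops(48e9))    # the example core from the text
print(format_flops(7.5e15))
```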

As the term theoretical suggests, this computing power can generally not be reached in real calculations. Before a CPU can calculate 2.1 + 4.3, it needs to fetch two numbers from memory, and afterward, it needs to store the result back to memory. This does not happen instantaneously, so in practice, the computational speed is determined not only by the pure computing power of a CPU but also by how fast the CPU can access the memory.

Different applications have different ratios of floating-point operations to memory accesses. In some cases, the same number is used in multiple computations, as when calculating both 2.1 + 4.3 and 2.1 + 5.3: the value 2.1 needs to be fetched only once. If memory access speed is the limiting factor (as is the case with most modern computers), applications that perform many floating-point operations on the same data can achieve higher performance than applications that perform few operations per data item.
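One common way to reason about this trade-off is the roofline model. The text does not name it; it is used here only as an illustrative simplification in which performance is capped either by the compute peak or by memory bandwidth times the number of operations per byte moved. The 50 GB/s bandwidth and the operations-per-byte values are assumptions picked for the example.

```python
def attainable_flops(peak_flops: float, mem_bandwidth_bytes: float,
                     flops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute limit and the memory limit."""
    return min(peak_flops, mem_bandwidth_bytes * flops_per_byte)


PEAK = 48e9        # the 48 GFLOP/s example core from the text
BANDWIDTH = 50e9   # assumed 50 GB/s memory bandwidth (hypothetical)

# Load two 8-byte numbers, add them, store the 8-byte result: 1 FLOP per 24 bytes.
low_reuse = attainable_flops(PEAK, BANDWIDTH, 1 / 24)
# A kernel that reuses loaded data heavily (assumed 4 FLOPs per byte).
high_reuse = attainable_flops(PEAK, BANDWIDTH, 4.0)

print(f"low reuse:  {low_reuse / 1e9:.1f} GFLOP/s")
print(f"high reuse: {high_reuse / 1e9:.1f} GFLOP/s")
```

Under these assumptions the low-reuse case is limited by memory to a small fraction of the peak, while the high-reuse case can reach the compute limit.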

In supercomputers, a CPU in one node might also need to access data in another node, so the speed of communication between nodes can limit the practical performance as well. Furthermore, real-world applications need to read and write data on disk, which means that the speed of I/O (input/output, data transfer between processors and storage) may limit the performance even further.

TOP500 List

A benchmark is a standard by which the performance and functionality of a certain device, such as a (super)computer, can be measured. Typically, the runtime of the benchmark application is recorded and used as a metric. With benchmarks, it is easier to compare the performance of different computers.

LINPACK is a common benchmark that measures a system's floating-point computing power. It performs linear algebra operations to solve a system of linear equations, and the measured performance is typically about 75% of the theoretical peak.
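The idea behind such a measurement can be illustrated with a toy example. This is not the LINPACK benchmark itself, only a sketch in the same spirit: it solves a random dense linear system with Gaussian elimination, times the solve, and divides an approximate operation count by the elapsed time. The count uses only the leading term, about (2/3)·n³ operations, and the function names and problem size are assumptions made for this example.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (modifies a and b)."""
    n = len(b)
    for k in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n=120, seed=0):
    """Time one solve and convert it to FLOP/s using the ~(2/3) n^3 operation count."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    return (2 / 3) * n**3 / elapsed


achieved = measure_flops()
print(f"achieved: {achieved / 1e6:.1f} MFLOP/s (pure Python, far below hardware peak)")
```

Dividing a figure measured this way by the theoretical peak gives the efficiency; an optimized benchmark run on a real system gets much closer to the peak than interpreted Python does.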

TOP500 is a supercomputer ranking list that collects LINPACK results submitted by organizations operating supercomputers. The list is updated twice a year, and the 500 most powerful supercomputers worldwide are ranked according to their computational power measured by the LINPACK benchmark.

In the first-ever TOP500 list in June 1993, the most powerful supercomputer came from the USA and had a performance of 60 GFLOP/s. In comparison, in November 2020, the fastest supercomputer (from Japan) had a performance of 440 PFLOP/s, which is more than seven million times faster than the leader 27 years earlier. Similarly, the last system (#500) on the list had a performance of 0.4 GFLOP/s in June 1993 and 1.3 PFLOP/s in November 2020, which is about three million times faster. Just like ordinary computers, supercomputers have seen tremendous increases in computational power over the years.
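As a quick check of these ratios, using the rounded figures quoted above for the #1 and #500 systems respectively:

$ \frac{440\times10^{15}\ \text{FLOP/s}}{60\times10^{9}\ \text{FLOP/s}}\approx7.3\times10^{6} $

$ \frac{1.3\times10^{15}\ \text{FLOP/s}}{0.4\times10^{9}\ \text{FLOP/s}}\approx3.3\times10^{6} $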

The figure below illustrates the performance development of TOP500 supercomputers between 1993 and 2021. Green dots represent the total performance of all 500 computers on the list, orange triangles represent the #1 computer, and blue squares the #500 computer.

Figure: TOP500 list performance development from 1993.

For perspective, a modern laptop can have a performance of around 300 GFLOP/s, which means it would have been the number one supercomputer in June 1993. In 2020, the number one supercomputer had the computing power of over a million such laptops combined.

The Mahti supercomputer at CSC – IT Center for Science has a theoretical peak performance of 7.5 PFLOP/s, which means $ 7.5\times10^{15}$ operations per second, corresponding to around 25,000 laptops combined. Even if every person on Earth performed one arithmetic operation per second, the combined performance would still be about a million times lower than Mahti’s.
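These comparisons are simple divisions, sketched below. The laptop and Mahti figures are the ones quoted in the text; the world population of roughly 8 billion is an assumption. With the rounded 300 GFLOP/s laptop figure, the laptop ratio comes out near 25,000.

```python
LAPTOP_FLOPS = 300e9      # ~300 GFLOP/s laptop, as quoted in the text
MAHTI_PEAK = 7.5e15       # 7.5 PFLOP/s theoretical peak, as quoted in the text
WORLD_POPULATION = 8e9    # assumption: roughly 8 billion people


def laptop_equivalents(system_flops: float, laptop_flops: float = LAPTOP_FLOPS) -> float:
    """How many laptops would be needed to match the given performance."""
    return system_flops / laptop_flops


print(f"Mahti is roughly {laptop_equivalents(MAHTI_PEAK):,.0f} laptops")
# Everyone on Earth doing one operation per second yields WORLD_POPULATION FLOP/s.
print(f"Mahti vs. humanity: about {MAHTI_PEAK / WORLD_POPULATION:,.0f} times faster")
```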

The new LUMI supercomputer has a theoretical peak performance of 428.70 PFLOP/s and was, as of 2023, the third most powerful supercomputer in the world.