In scientific research, various methods are used to find explanations for natural phenomena and to make predictions about their behavior. The two traditional paradigms of science are experimental and theoretical. Experimental science is based on observations and measurements. Theoretical science develops models, sometimes called laws of nature, which fit or “explain” the measurements and, in the best case, predict phenomena that have not yet been observed. Since the mid-20th century, experimental and theoretical science have been complemented by computational science. Computational science can be defined as a discipline concerned with the design, implementation, and use of mathematical models to analyze and solve scientific problems via computer simulations and numerical analysis. The three disciplines, or paradigms, are often strongly interlinked.
Early astronomy is an example of experimental science. Early astronomers watched the skies, first with the naked eye and later with telescopes, and recorded the positions and movements of the Sun, the Moon, the planets of our solar system, and some stars. They noted that their observations followed specific patterns. Similarly, one can drop stones of various weights from towers of different heights and measure the time it takes for the stones to hit the ground. Likewise, the random movement of small particles in a liquid (Brownian motion) can be observed with microscopes, elementary particles with particle accelerators, and so on.
Different experiments require widely different measurement apparatuses, and carrying out experiments can be very expensive. Nevertheless, experimental science can be considered the fundamental paradigm: theories and computer simulations are, after all, confirmed or rejected based on the measurements.
Based on the observations, one may predict the sunset time for a given day, or conclude that the fall time of a stone dropped from a given height is roughly the same regardless of its weight, and so on. Next, a theoretical scientist looks at the results and wonders how to explain them. For the motion of these bodies, the mathematical models of gravity and dynamics developed by Newton produce predictions that fit the measured results very nicely.
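As a small illustration, Newton's model (neglecting air resistance) gives a fall time of t = sqrt(2h/g), which does not depend on the mass at all. The short Python sketch below evaluates this formula for a few made-up drop heights:

```python
import math

g = 9.81  # gravitational acceleration at the Earth's surface (m/s^2)

# Fall time from rest depends only on the drop height, not on the mass:
# t = sqrt(2*h/g). The heights below are arbitrary illustrative values.
for height in (10.0, 20.0, 50.0):  # metres
    t = math.sqrt(2.0 * height / g)
    print(f"Drop from {height:4.0f} m: fall time {t:.2f} s, regardless of the stone's weight")
```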
This is an example of the interplay between the experimental and theoretical approaches. Usually, observations come first, and then a theory is developed to explain them. But not always; for example, Paul Dirac first predicted the existence of the positron (the electron’s antiparticle) in 1931 from a purely theoretical point of view. Subsequently, Carl Anderson confirmed the existence of the positron experimentally in 1932 and received a Nobel Prize in 1936 for his discovery.
However, there are limits to what can be done with pen and paper or to what kinds of experiments can be set up. For example, continuing with the astronomical theme, it is easy enough to write down the equations that describe the motion of the Sun, the Moon, and the Earth. This is a so-called three-body problem, a relatively simple set of equations that, however, cannot be solved in closed form. That is, there is no formula into which one can plug the time to get the positions of the heavenly bodies involved. Nevertheless, an approximate solution can be found numerically.
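To make the idea of an approximate numerical solution concrete, the sketch below steps the equations of motion of three gravitating bodies forward in time with a simple leapfrog integrator. It is only a toy: the masses, initial conditions, and units are made up rather than real Sun-Moon-Earth values, and production codes use far more sophisticated methods.

```python
import numpy as np

# Toy three-body problem: Newtonian gravity integrated with a leapfrog
# (velocity Verlet) scheme. All values below are illustrative, not physical.
G = 1.0                                      # gravitational constant (arbitrary units)
mass = np.array([1.0, 0.01, 0.001])          # one heavy and two light bodies
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5]])    # 2D positions
vel = np.array([[0.0, 0.0], [0.0, 1.0], [-0.82, 0.0]])  # 2D velocities

def acceleration(pos):
    """Pairwise Newtonian gravitational acceleration on each body."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += G * mass[j] * r / np.linalg.norm(r)**3
    return acc

dt, steps = 1e-3, 20000                      # time step and number of steps
acc = acceleration(pos)
for step in range(steps):                    # leapfrog time stepping
    vel += 0.5 * dt * acc
    pos += dt * vel
    acc = acceleration(pos)
    vel += 0.5 * dt * acc

print(pos)                                   # approximate positions at the final time
```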
On the other hand, consider the climate and glaciers: it is crucial to know what will happen to them in the future. But naturally, we cannot perform experiments on a global scale to find that out. Similarly, in many cases, the required experiments would be too expensive, too dangerous, too slow, too difficult or complex, unethical, and so on. But we may still have an idea of the underlying theory.
In computational science, complex mathematical models (e.g., systems of partial differential equations) describing the problem are typically solved with numerical algorithms, which are implemented as computer programs and finally run on high-performance computers. These steps form a rough outline of a computational science project:

1. Formulate a mathematical model that describes the problem.
2. Choose or develop a numerical algorithm for solving the model.
3. Implement the algorithm as a computer program.
4. Run the program, i.e., the simulation, on a high-performance computer.
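As a toy instance of these steps, the sketch below solves one of the simplest possible models, the one-dimensional heat (diffusion) equation, with an explicit finite-difference scheme. The parameter values are arbitrary; a real computational science application would involve much larger systems of equations and run in parallel on a supercomputer.

```python
import numpy as np

# Step 1: mathematical model: the 1D heat equation du/dt = alpha * d2u/dx2.
# Step 2: numerical algorithm: explicit finite differences in space and time.
# Step 3: implementation as a (serial) computer program.
# Step 4: here the program runs in seconds on a laptop; real applications
#         would be parallelized and run on a high-performance computer.
alpha = 0.01                      # diffusion coefficient (illustrative value)
nx, nt = 101, 2000                # number of grid points and time steps
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha          # time step chosen to keep the explicit scheme stable

u = np.zeros(nx)
u[nx // 2] = 1.0                  # initial condition: a heat spike in the middle

for step in range(nt):
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

print(u.max())                    # the initial spike has spread into a smooth, low profile
```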
Nowadays, data science is sometimes considered a new paradigm. Data science can be defined as a field that uses various mathematical methods and algorithms to extract knowledge and insight from data. This insight can, in some cases, be used to formulate new mathematical models. As an early example, the measurements of Tycho Brahe formed the data that Johannes Kepler analyzed when formulating his laws of planetary motion. Later on, Newton showed that Kepler's laws are a consequence of his theory of gravitation.
Two closely related fields within data science that have become more important in recent years are machine learning and artificial intelligence. Artificial intelligence can be defined as technology that simulates intelligence as humans understand it, for example, the kind of intelligence we use to play chess or go. Instead of relying on predefined, static algorithms, such a system uses algorithms that learn from previously seen information; in other words, the system learns from its own experience. Machine learning, in turn, refers to the algorithms and methods, often based on probabilistic models, that are used to learn from data and make predictions. The input to these models is often a massive data set that requires a large amount of computation to crunch through, which makes machine learning a very good example of a supercomputing application.
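As a minimal, hypothetical illustration of what “learning from data” means, the sketch below fits a straight line to noisy synthetic measurements with gradient descent. Real machine learning workloads use far larger models and data sets, which is exactly where supercomputers come in.

```python
import numpy as np

# Toy machine learning example: fit y = w*x + b to noisy synthetic data by
# gradient descent on the mean squared error.
rng = np.random.default_rng(seed=0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=x.size)   # synthetic "measurements"

w, b, lr = 0.0, 0.0, 0.1                 # initial parameters and learning rate
for epoch in range(500):
    y_pred = w * x + b
    grad_w = 2.0 * np.mean((y_pred - y) * x)   # gradients of the mean squared error
    grad_b = 2.0 * np.mean(y_pred - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w = {w:.3f}, b = {b:.3f}")     # should approach w = 2.0, b = 0.5
```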
If the computational problem is complex (e.g., the climate), the simulated system is large in some sense (e.g., a galaxy comprising a vast number of stars), the simulated time span is long (e.g., long-term climate simulations), or very high accuracy is required, the amount of computing work can quickly become enormous. Solving such a problem with a standard computer might take years, or be outright impossible because the problem does not fit into memory.
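A hypothetical back-of-the-envelope estimate shows how quickly memory runs out: the snippet below computes the storage needed for a modest three-dimensional simulation grid in double precision, with made-up but typical-looking values.

```python
# Hypothetical example: memory needed for a 3D simulation grid in double precision.
nx = ny = nz = 1000            # grid points in each direction (illustrative value)
variables = 5                  # e.g., density, pressure, three velocity components
bytes_per_value = 8            # one double-precision floating-point number

total_bytes = nx * ny * nz * variables * bytes_per_value
print(f"{total_bytes / 1e9:.0f} GB")   # about 40 GB, more than a typical laptop has
```

Doubling the resolution in each direction multiplies the memory requirement by eight, and since the time step typically has to shrink as well, the computing time grows even faster.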
In addition to solving mathematical models numerically, supercomputers are ubiquitous in analyzing experimental data. Various experiments produce vast amounts of data that cannot be analyzed with pen and paper or with a normal laptop or desktop computer. As an example, the experiments at CERN’s LHC particle accelerator (figure below) produce, on average, one petabyte (one million gigabytes) of data per day, which equals about 20 000 Blu-ray discs. Numerical simulations can produce large amounts of data, too. Data analysis can also involve heavy numerical computation, for example when artificial intelligence and machine learning methods are applied.
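The Blu-ray comparison is simple arithmetic, assuming 50 GB dual-layer discs:

```python
petabyte_in_gb = 1_000_000           # 1 PB = one million gigabytes
blu_ray_gb = 50                      # capacity of a dual-layer Blu-ray disc
print(petabyte_in_gb / blu_ray_gb)   # 20000 discs per day
```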
High-performance computing and supercomputers enable solutions to the most demanding problems of computational and data science. They make amazingly complex large-scale simulations possible. Supercomputers are unique scientific tools in the sense that the same apparatus can be used for studying both extremely small length and time scales (e.g., elementary particles) and extremely large length and time scales (e.g., the motion of galaxies in the universe), and everything in between. This is in contrast to experimental research, where different scientific instruments are needed for different problems: the particle accelerator at CERN cannot be used for studying cosmic waves, and a vast radio telescope cannot be used for studying subatomic particles. Because of this versatility, supercomputers can be used in a multitude of application areas.
Thus, high-performance computing resources and knowledge are a hugely important asset for any modern society and provide a competitive advantage for research and industry. In the next section, a number of application examples are described in more detail.