In scientific research, various methods are used to find explanations for natural phenomena and to make predictions about their behavior. The two traditional paradigms of science are experimental and theoretical. Experimental science is based on observations and measurements. Theoretical science develops models, sometimes called laws of nature, which fit or “explain” the measurements and, in the best case, predict phenomena that have not yet been observed. Since the mid-20th century, experimental and theoretical science have been complemented by computational science. Computational science can be defined as a discipline concerned with the design, implementation, and use of mathematical models to analyze and solve scientific problems via computer simulations and numerical analysis. The three disciplines, or paradigms, are often strongly interlinked.
Early astronomy is an example of experimental science. Early astronomers watched the skies, first with the naked eye and later with telescopes, and recorded the positions and movements of the Sun, the Moon, the planets of our solar system, and some stars. They noted that their observations followed specific patterns. Similarly, one can drop stones of various weights from towers of different heights and measure the time it takes for the stones to hit the ground. Likewise, the random movement of small particles in a liquid (Brownian motion) can be observed with microscopes, elementary particles with particle accelerators, and so on.
Different experiments require widely different measurement apparatuses, and carrying out experiments can be very expensive. Nevertheless, experimental science can be considered the fundamental paradigm: theories and computer simulations are, after all, confirmed or rejected based on the measurements.
Based on the observations, one may predict the sunset time for a given day, or conclude that the fall time of a stone dropped from a given height is roughly the same regardless of its weight, and so on. Next, a theoretical scientist looks at the results and wonders how to explain them. For the motion of these bodies, the mathematical models of gravity and dynamics developed by Newton produce predictions that fit the measured results very nicely.
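As a small illustration, Newton's model (neglecting air resistance) gives a fall time of t = sqrt(2h/g), which does not depend on the mass at all. The short Python sketch below evaluates this formula for a few made-up drop heights:

```python
import math

g = 9.81  # gravitational acceleration at the Earth's surface (m/s^2)

# Fall time from rest depends only on the drop height, not on the mass:
# t = sqrt(2*h/g). The heights below are arbitrary illustrative values.
for height in (10.0, 20.0, 50.0):  # metres
    t = math.sqrt(2.0 * height / g)
    print(f"Drop from {height:4.0f} m: fall time {t:.2f} s, regardless of the stone's weight")
```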
This is an example of the interplay between the experimental and theoretical approaches. Usually, observations come first, and then a theory is developed to explain them. But not always; for example, Paul Dirac first predicted the existence of the positron (the electron’s antiparticle) in 1931 from a purely theoretical point of view. Subsequently, Carl Anderson confirmed the existence of the positron experimentally in 1932 and received a Nobel Prize in 1936 for his discovery.
However, there are limits to what can be done with pen and paper or to what kinds of experiments can be set up. For example, continuing with the astronomical theme, it is easy enough to write down the equations that describe the motion of the Sun, the Moon, and the Earth. This is a so-called three-body problem, a relatively simple set of equations that, however, cannot be solved in closed form. That is, there is no formula into which one can plug the time to get the positions of the heavenly bodies involved. Nevertheless, an approximate solution can be found numerically.
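To make the idea of an approximate numerical solution concrete, the sketch below steps the equations of motion of three gravitating bodies forward in time with a simple leapfrog integrator. It is only a toy: the masses, initial conditions, and units are made up rather than real Sun-Moon-Earth values, and production codes use far more sophisticated methods.

```python
import numpy as np

# Toy three-body problem: Newtonian gravity integrated with a leapfrog
# (velocity Verlet) scheme. All values below are illustrative, not physical.
G = 1.0                                      # gravitational constant (arbitrary units)
mass = np.array([1.0, 0.01, 0.001])          # one heavy and two light bodies
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5]])    # 2D positions
vel = np.array([[0.0, 0.0], [0.0, 1.0], [-0.82, 0.0]])  # 2D velocities

def acceleration(pos):
    """Pairwise Newtonian gravitational acceleration on each body."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += G * mass[j] * r / np.linalg.norm(r)**3
    return acc

dt, steps = 1e-3, 20000                      # time step and number of steps
acc = acceleration(pos)
for step in range(steps):                    # leapfrog time stepping
    vel += 0.5 * dt * acc
    pos += dt * vel
    acc = acceleration(pos)
    vel += 0.5 * dt * acc

print(pos)                                   # approximate positions at the final time
```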
On the other hand, consider the climate and glaciers: it is crucial to know what will happen to them in the future. But naturally, we cannot perform experiments on a global scale to find that out. Similarly, in many cases, the required experiments would be too expensive, too dangerous, too slow, too difficult or complex, unethical, and so on. But we may still have an idea of the underlying theory.
In computational science, complex mathematical models (e.g., systems of partial differential equations) describing the problem are typically solved with numerical algorithms, which are implemented as computer programs and finally run on high-performance computers. These steps form a rough outline of a computational science project:

1. Formulate a mathematical model that describes the problem.
2. Choose or develop a numerical algorithm for solving the model.
3. Implement the algorithm as a computer program.
4. Run the program, i.e., the simulation, on a high-performance computer.
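As a toy instance of these steps, the sketch below solves one of the simplest possible models, the one-dimensional heat (diffusion) equation, with an explicit finite-difference scheme. The parameter values are arbitrary; a real computational science application would involve much larger systems of equations and run in parallel on a supercomputer.

```python
import numpy as np

# Step 1: mathematical model: the 1D heat equation du/dt = alpha * d2u/dx2.
# Step 2: numerical algorithm: explicit finite differences in space and time.
# Step 3: implementation as a (serial) computer program.
# Step 4: here the program runs in seconds on a laptop; real applications
#         would be parallelized and run on a high-performance computer.
alpha = 0.01                      # diffusion coefficient (illustrative value)
nx, nt = 101, 2000                # number of grid points and time steps
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha          # time step chosen to keep the explicit scheme stable

u = np.zeros(nx)
u[nx // 2] = 1.0                  # initial condition: a heat spike in the middle

for step in range(nt):
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

print(u.max())                    # the initial spike has spread into a smooth, low profile
```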
Nowadays, data science is sometimes considered a new paradigm. Data science can be defined as a field that uses various mathematical methods and algorithms to extract knowledge and insight from data. This insight can, in some cases, be used to formulate new mathematical models. As an early example, the measurements of Tycho Brahe formed the data that Johannes Kepler analyzed when formulating his laws of planetary motion. Later on, Newton showed that Kepler's laws are a consequence of his theory of gravitation.
Two closely related fields within data science that have become more important in recent years are machine learning and artificial intelligence. Artificial intelligence can be defined as technology that simulates intelligence as humans understand it, for example, the kind of intelligence we use to play chess or go. Instead of relying on predefined, static algorithms, such a system uses algorithms that learn from previously seen information; in other words, the system learns from its own experience. Machine learning, in turn, refers to the algorithms and methods, often based on probabilistic models, that are used to learn from data and make predictions. The input to these models is often a massive data set that requires a large amount of computation to crunch through, which makes machine learning a very good example of a supercomputing application.
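As a minimal, hypothetical illustration of what “learning from data” means, the sketch below fits a straight line to noisy synthetic measurements with gradient descent. Real machine learning workloads use far larger models and data sets, which is exactly where supercomputers come in.

```python
import numpy as np

# Toy machine learning example: fit y = w*x + b to noisy synthetic data by
# gradient descent on the mean squared error.
rng = np.random.default_rng(seed=0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=x.size)   # synthetic "measurements"

w, b, lr = 0.0, 0.0, 0.1                 # initial parameters and learning rate
for epoch in range(500):
    y_pred = w * x + b
    grad_w = 2.0 * np.mean((y_pred - y) * x)   # gradients of the mean squared error
    grad_b = 2.0 * np.mean(y_pred - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w = {w:.3f}, b = {b:.3f}")     # should approach w = 2.0, b = 0.5
```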
If the computational problem is complex (e.g., the climate), the simulated system is large in some sense (e.g., a galaxy comprising a vast number of stars), the simulated time span is long (e.g., long-term climate simulations), or very high accuracy is required, the amount of computing work can quickly become enormous. Solving such a problem with a standard computer might take years, or be outright impossible because the problem does not fit into memory.
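A hypothetical back-of-the-envelope estimate shows how quickly memory runs out: the snippet below computes the storage needed for a modest three-dimensional simulation grid in double precision, with made-up but typical-looking values.

```python
# Hypothetical example: memory needed for a 3D simulation grid in double precision.
nx = ny = nz = 1000            # grid points in each direction (illustrative value)
variables = 5                  # e.g., density, pressure, three velocity components
bytes_per_value = 8            # one double-precision floating-point number

total_bytes = nx * ny * nz * variables * bytes_per_value
print(f"{total_bytes / 1e9:.0f} GB")   # about 40 GB, more than a typical laptop has
```

Doubling the resolution in each direction multiplies the memory requirement by eight, and since the time step typically has to shrink as well, the computing time grows even faster.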
In addition to solving mathematical models numerically, supercomputers are ubiquitous in analyzing experimental data. Various experiments produce vast amounts of data that cannot be analyzed with pen and paper or with a normal laptop or desktop computer. As an example, the experiments at CERN’s LHC particle accelerator (figure below) produce, on average, one petabyte (one million gigabytes) of data per day, which equals about 20 000 Blu-ray discs. Numerical simulations can produce large amounts of data, too. Data analysis can also involve heavy numerical computation, for example when artificial intelligence and machine learning methods are applied.
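The Blu-ray comparison is simple arithmetic, assuming 50 GB dual-layer discs:

```python
petabyte_in_gb = 1_000_000           # 1 PB = one million gigabytes
blu_ray_gb = 50                      # capacity of a dual-layer Blu-ray disc
print(petabyte_in_gb / blu_ray_gb)   # 20000 discs per day
```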
High-performance computing and supercomputers enable solutions to the most demanding problems of computational and data science. They make amazingly complex large-scale simulations possible. Supercomputers are unique scientific tools in the sense that the same apparatus can be used for studying both extremely small length and time scales (e.g., elementary particles) and extremely large length and time scales (e.g., the motion of galaxies in the universe), and everything in between. This is in contrast to experimental research, where different scientific instruments are needed for different problems: the particle accelerator at CERN cannot be used for studying cosmic waves, and a vast radio telescope cannot be used for studying subatomic particles. Because of this versatility, supercomputers can be used in a multitude of application areas.
Thus, high-performance computing resources and knowledge are a hugely important asset for any modern society and provide a competitive advantage for research and industry. In the next section, a number of application examples are described in more detail.