The machines themselves are very complicated pieces of technology, and just keeping them running requires a number of experts. Additionally, even national-class systems are physically quite large and need adequate housing, power, cooling, and other infrastructure, along with the staff to run it.
Because getting good performance from a supercomputer is becoming more and more demanding, supercomputing centers typically provide user support on several levels, ranging from basic problems such as connecting to and accessing the system to highly specialized ones such as computational methods and performance-related issues.
The following list is not at all exhaustive but provides a simplified description of the functions or groups of staff that computing centers typically have:
Data center engineers operate and maintain the power (including emergency power) and cooling infrastructure (pumps and fans, heat exchangers, plumbing, ducting, etc.) and the network connections. This covers the connections within the data center as well as those to the outside world.
System administrators make sure that the system runs smoothly. They monitor usage, hardware status, and system load, and make adjustments accordingly. In a large machine, there are always broken components that need to be replaced, which may require a service break. System software is also regularly updated to fix problems, improve and add functionality, and maintain system security. The programming environment (compilers, libraries, performance analysis tools, etc.) may also be their responsibility, although CSC, for example, has a dedicated group of experts for it.
Scientific software support staff procure, install, and maintain software packages based on the users’ needs. Additionally, they help customers use the packages, write user guides and instructions, and provide related training. In some cases, there is also in-house scientific software development. For example, CSC develops Elmer, a multiphysics modeling package based on the finite element method, which has a global user base.
HPC programming support staff help users port, parallelize, and optimize their codes for the supercomputers. Extracting good performance from modern accelerated systems is not always easy, and scientists in various fields cannot be expected to be experts in high-performance computing, especially when computational methods are adopted in fields of science with little or no tradition of HPC use.
Examples of Computing Centers (optional, not included in the exam)
The following are examples of computing centers in Europe, the USA, and Japan: