HPC and cloud computing

Traditionally, HPC resources are supercomputers and clusters designed to run applications developed for solving specific problems. All applications and libraries need to be compiled for the operating system installed on the machine. The computational work is submitted as jobs, which are queued and executed once sufficient resources are available in the machine. The applications run with direct access to the physical hardware.

As a result, traditional HPC platforms are finely tuned to extract the best performance from the hardware for the most demanding problems.

This approach is preferred for massively parallel processing, which usually also requires a fast interconnect between the processors and a high-speed disk system for data input and output. However, there are also some restrictions. Users are limited to the available software stack and must wait for the specific resources to become free. They are also limited to a fixed storage capacity.
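To illustrate how queued jobs wait for resources, the following is a minimal sketch of a strict first-come, first-served queue. The names `Job` and `simulate_fifo` and the example numbers are hypothetical and chosen only for illustration; production batch schedulers use considerably more elaborate policies.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes the job requests
    runtime: int  # number of time units the job occupies its nodes


def simulate_fifo(jobs, total_nodes):
    """Return {job name: start time} for a strict first-come, first-served queue."""
    queue = deque(jobs)
    running = []        # (end_time, nodes) for jobs currently executing
    free = total_nodes
    now = 0
    starts = {}
    while queue:
        job = queue[0]
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than the machine has")
        if job.nodes <= free:
            # Enough free nodes: start the job at the head of the queue.
            queue.popleft()
            free -= job.nodes
            running.append((now + job.runtime, job.nodes))
            starts[job.name] = now
        else:
            # Not enough free nodes: advance time to the next job completion.
            running.sort()
            end_time, nodes = running.pop(0)
            now = max(now, end_time)
            free += nodes
    return starts


# Hypothetical workload on a 4-node machine.
jobs = [Job("A", nodes=2, runtime=3), Job("B", nodes=4, runtime=2), Job("C", nodes=1, runtime=1)]
starts = simulate_fifo(jobs, total_nodes=4)
print(starts)
```

In this sketch, job C has to wait behind job B even though enough nodes for it were free from the start, which shows the kind of waiting time users of a shared machine can experience.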

Cloud computing usually refers to access to a pool of configurable computing resources that are available on demand. The computing services are virtualized and, to a large extent, automated, requiring no direct active management by the user. Virtualization essentially means running a guest operating system on a host machine. Multiple virtual machines (VMs, or instances) can run on each physical node, and Linux or Windows can be installed on them with a completely custom software stack. The main advantages of cloud computing are as follows:

on-demand self-service (automation, everything provided by the service provider)
resilience and elasticity (no data loss or downtime in case of hardware failure)
flexibility and scalability (for the user, resources are seemingly unlimited)

There are three common kinds of cloud resources:

Infrastructure as a Service (IaaS): The user is responsible for setting up the operating system and everything above it (middleware, runtime, data, and applications). This is the most basic and most flexible cloud service, and it is available from a variety of commercial service providers. Several supercomputing centers, including CSC, also provide these types of resources.

Platform as a Service (PaaS): The provider sets up the operating system, middleware, and the runtime, and the user brings in applications and data. This is in practice quite close to the traditional computing environments the computing centers provide.

Software as a Service (SaaS): The provider is responsible for everything, including software. Most people use this kind of cloud daily: email, various document storing and sharing services, office applications, etc.
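The division of responsibilities between the three service models can be summarized as a small lookup table. This is only a sketch of the split as described above; the names `LAYERS`, `USER_MANAGED`, and `responsibilities` are hypothetical, and real offerings may draw the boundaries somewhat differently.

```python
# Layers of the software stack, from the operating system upwards.
LAYERS = ("operating system", "middleware", "runtime", "data", "applications")

# Layers the user sets up in each service model; the provider handles the rest.
USER_MANAGED = {
    "IaaS": {"operating system", "middleware", "runtime", "data", "applications"},
    "PaaS": {"data", "applications"},
    "SaaS": set(),
}


def responsibilities(model):
    """Map each layer to 'user' or 'provider' for the given service model."""
    user_layers = USER_MANAGED[model]
    return {layer: ("user" if layer in user_layers else "provider") for layer in LAYERS}


for model in USER_MANAGED:
    print(model, responsibilities(model))
```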

Overall, cloud computing reduces complexity for the user in some cases, saves money for small businesses as there are no startup costs, and still delivers good performance. At the same time, setting up virtual machines requires knowledge of installing and maintaining operating systems and software.

Cloud computing can also be used for some HPC workloads. Even though virtualization adds some overhead, single-node performance is often close to that of bare-metal clusters. In addition, some providers offer special HPC clouds with high-speed interconnects. However, massively parallel computations that use a large number of nodes can perform worse in the cloud than on bare-metal clusters.

HPC cloud computing can be significantly more expensive than basic cloud resources. While there are no startup costs and the services are available immediately when requested, the price of computing time can vary depending on the load. Finally, there can also be concerns about the privacy and confidentiality of the data in the cloud.
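As a rough illustration of how load-dependent pricing affects the cost of a job, the sketch below multiplies the consumed node-hours by an hourly price scaled with a load factor. The function `job_cost`, the multiplicative pricing model, and all numbers are assumptions made for this example and do not reflect any actual provider's pricing.

```python
def job_cost(node_hours, base_price, load_factor=1.0):
    """Cost of a job: node-hours times an hourly price scaled by the current load.

    The multiplicative load factor is a simplifying assumption.
    """
    if node_hours < 0 or base_price < 0 or load_factor <= 0:
        raise ValueError("node_hours and base_price must be >= 0, load_factor > 0")
    return node_hours * base_price * load_factor


# Hypothetical job: 16 nodes for 10 hours at a made-up base price per node-hour.
node_hours = 16 * 10
quiet = job_cost(node_hours, base_price=2.0)                  # low-load period
busy = job_cost(node_hours, base_price=2.0, load_factor=1.5)  # high-load period
print(f"quiet: {quiet:.2f}, busy: {busy:.2f}")
```

Under these assumptions, the same job costs 50% more when it is run during a period of high load.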