Traditionally, HPC resources are supercomputers and clusters designed to run HPC applications that were developed to solve specific problems. All applications and libraries must be compiled for the operating system installed on the machine. Computational work is submitted as jobs, which are queued and executed once sufficient resources are available on the machine. The applications run with direct physical access to the hardware.
This approach is preferred for massively parallel processing, which usually also requires a fast interconnect between the processors and a high-speed disk system for data input and output. However, it comes with some restrictions. Users are limited to the software stack available on the machine and must wait for the requested resources to become free. They are also limited to a fixed storage capacity.
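The queueing behaviour described above can be illustrated with a small simulation. This is only a minimal sketch, assuming a strict first-come, first-served policy; real batch schedulers use more elaborate policies such as priorities and backfilling. The names `Job` and `simulate_fifo`, the node counts, and the run times are hypothetical and chosen purely for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes the job requests
    runtime: int  # run time in abstract time steps


def simulate_fifo(jobs, total_nodes):
    """Return {job name: start time} for a strict first-come, first-served queue."""
    queue = deque(jobs)
    running = []        # (finish_time, nodes) for each job currently running
    free = total_nodes  # nodes that are idle right now
    clock = 0
    start_times = {}
    while queue:
        job = queue[0]
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than the machine has")
        if job.nodes <= free:
            # Enough idle nodes: start the job at the head of the queue.
            queue.popleft()
            free -= job.nodes
            running.append((clock + job.runtime, job.nodes))
            start_times[job.name] = clock
        else:
            # Not enough idle nodes: advance the clock to the next job
            # completion and release the nodes that job was holding.
            running.sort()
            finish, nodes = running.pop(0)
            clock = finish
            free += nodes
    return start_times


# Hypothetical workload on an 8-node machine.
jobs = [Job("a", 4, 10), Job("b", 4, 5), Job("c", 6, 3)]
print(simulate_fifo(jobs, total_nodes=8))  # {'a': 0, 'b': 0, 'c': 10}
```

In this example, job "c" cannot start when job "b" finishes at time 5, because only 4 of the 6 nodes it needs are free. It has to wait until job "a" also completes at time 10, which is the kind of waiting the text refers to.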
Cloud computing usually refers to access to a pool of configurable computing resources that are available on demand. The computing services are virtualized and, to a large extent, automated, requiring no direct active management by the user. Virtualization essentially means running a guest operating system on a host machine. Each physical node can host multiple virtual machines (VMs, or instances) running on a virtualization platform. On these VMs, Linux or Windows can be installed together with a complete custom software stack. The main advantages of cloud computing are as follows:
There are three common kinds of cloud resources:
Cloud computing can also be used for some HPC workloads. Even though virtualization adds some overhead, single-node performance is often close to that of bare-metal clusters. In addition, some providers offer special HPC clouds with high-speed interconnects. However, massively parallel computations that use a large number of nodes can perform worse in the cloud than on bare-metal clusters.
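One way to see why multi-node performance is more sensitive than single-node performance is a toy timing model. The sketch below is an assumption made for illustration, not a measurement or a model taken from any provider: it supposes that the compute work divides perfectly across nodes and that every communication step costs a fixed latency whenever more than one node is involved. The function name `parallel_time` and all numbers are hypothetical.

```python
def parallel_time(work, nodes, latency, steps):
    """Toy model: perfectly divisible compute plus a fixed cost per communication step.

    work    -- total compute time on a single node (arbitrary units)
    nodes   -- number of nodes used
    latency -- assumed cost of one communication step (same units as work)
    steps   -- number of communication steps in the run
    """
    compute = work / nodes
    # A single node does not communicate over the interconnect.
    communication = steps * latency if nodes > 1 else 0.0
    return compute + communication


# Compare an assumed fast interconnect with one that is ten times slower.
for latency in (0.001, 0.01):
    times = [parallel_time(1000.0, n, latency, steps=1000) for n in (1, 10, 100)]
    print(f"latency={latency}: {times}")
```

Under these assumptions, the single-node time does not depend on the interconnect at all, while at 100 nodes the communication term becomes comparable to the compute term for the slower interconnect. Real applications have more complicated communication patterns, so this shows only the general tendency.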
HPC cloud computing can be significantly more expensive than basic cloud resources. While there are no startup costs and the services are available immediately when requested, the price of computing time can vary depending on the load. Finally, there can also be concerns about the privacy and confidentiality of the data in the cloud.