Shared-memory computer



When a problem is solved in parallel, the CPU cores usually need to communicate, that is, to share information with each other. One way to communicate is through shared memory.




The fundamental feature of a shared-memory computer is that all the CPU cores are connected to the same piece of memory.

This is achieved by having a memory bus that takes requests for data from multiple sources (in the above illustration, each of the four separate CPU cores) and fetches the data from a single piece of memory. The term bus apparently comes from the Latin omnibus, meaning for all, which indicates that it is a single resource shared by many CPU cores.

This is the basic architecture of a modern mobile phone, laptop, or desktop PC. So, for example, if you buy a system with a quad-core processor and four gigabytes of RAM, each of the four CPU cores will be connected to the same RAM, and they’ll have to play nicely and share the memory fairly with each other.

A good analogy here is to think of four office mates or workers (the CPU cores) sharing a single office (the computer) with a single whiteboard (the memory). Each worker has their own set of whiteboard markers and an eraser, but they are not allowed to talk to each other. Instead, they can only communicate by writing to and reading from the whiteboard. In the case of Alice, Bob, Joe, and Lucy summing up the numbers, each of them can read their part of the numbers from the whiteboard and compute the partial results. Once finished with the computations, the four can write their partial results on the whiteboard. Alice can then read the partial results and perform the final summation.
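The workflow in the analogy can be sketched in code. The following is a minimal illustration, not part of the original course material, using Python threads (which share the same memory, like the workers share the whiteboard); all identifiers such as `worker` and `partials` are illustrative. Each of four workers sums its own slice of the numbers and writes its partial result to a shared list, and the main thread then performs the final summation:

```python
import threading

numbers = list(range(1, 101))   # the numbers written on the whiteboard
partials = [0] * 4              # one slot per worker for the partial results

def worker(idx, chunk):
    # Each worker reads its own part of the numbers and writes its
    # partial sum back to the shared list (the whiteboard).
    partials[idx] = sum(chunk)

chunk_size = len(numbers) // 4
threads = [
    threading.Thread(target=worker,
                     args=(i, numbers[i * chunk_size:(i + 1) * chunk_size]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One worker (Alice, in the analogy) reads the partial results
# from the shared list and performs the final summation.
total = sum(partials)
print(total)  # 5050
```

Note that each worker writes only to its own slot of the shared list, so no two workers ever touch the same data; this is what makes the final result well defined.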



This whiteboard analogy already illustrates some key challenges of the shared-memory approach:

  1. Memory capacity: There is a limit to the size of the whiteboard that can fit into an office, and likewise, there is a limit to the amount of memory that can be put into a single shared-memory computer.

  2. Memory access speed: Imagine ten people in the same office. Although they can, in principle, all read from and write to the whiteboard, there is room for only around four of them to do so at the same time before they start to get in each other’s way. The office can be filled with more and more workers, but productivity stalls after about four, because the additional workers spend more and more time idle, queuing for access to the shared whiteboard. At worst, the extra workers get in each other’s way, and productivity can even decrease.

  3. Race conditions: Because all workers have access to all the data, they might inadvertently delete or alter each other’s data. Imagine the whiteboard being full of numbers: to write something new, part of the existing writing has to be erased first. It is therefore vital to ensure that erasing one part does not destroy another worker’s work.



Limitations

As the example above illustrates, memory access speed is a real issue in shared-memory machines. If you look at the processor diagram above, you’ll see that all the CPU cores share the same memory bus. The connection between the bus and the memory eventually becomes a bottleneck, and beyond a certain number of CPU cores there is simply no benefit in adding more. Coupled with the fact that the kinds of programs run on supercomputers tend to read and write large quantities of data, it is often the memory access speed, not the floating-point performance of the CPU cores, that limits how quickly a calculation can be done.

There are various tricks to overcome these issues, but the overcrowded office example clearly illustrates the fundamental challenges of an approach that requires many hundreds, thousands, or even hundreds of thousands of CPU cores.



Advantages

There are naturally also advantages to the shared-memory approach. Even though race conditions need to be considered, parallel programming with shared memory is relatively simple. Also, as long as the memory bus is not a bottleneck, performance is typically good because there is no additional overhead from communication.


Adapted from material in the "Supercomputing" online course (https://www.futurelearn.com/courses/supercomputing/) by the Edinburgh Parallel Computing Centre (EPCC), licensed under Creative Commons BY-SA