(Originally recorded 2019-05-02)
In this lecture we turn our attention to parallel computing. Although we have looked at some types of hardware- and compiler-supplied parallelism (pipelining and SIMD), we haven’t had to explicitly manage multiple threads of program control ourselves. That will be the focus of the remainder of this course.
There is a fundamental mental shift we have to make in thinking about computing at this point, however. Up until now we have been thinking about a computation as a sequence of instructions carried out one after the other by a CPU core – a single sequence of instructions. The seismic shift – from which all of the different models of parallel computing will follow – is that our “program” is going to consist of multiple sequences of instructions. Each sequence of instructions – each instruction stream, or each thread – will itself be executed by a core in the same fashion that we have been considering in the course so far. The core gets an instruction, puts it in the execution pipeline, gets the next instruction, and so on. Now we just have multiple instruction streams, each of which is following this process.
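To make the idea of multiple instruction streams concrete, here is a minimal sketch (not code from the lecture) using POSIX threads, assuming a C environment compiled with `-pthread`. Each thread is its own instruction stream that a core fetches and executes just like the single-stream programs we have studied so far.

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread runs this function as its own instruction stream:
 * the core executing it fetches and executes these instructions
 * one after another, exactly as in a sequential program. */
void *stream(void *arg) {
    long id = (long)arg;
    printf("thread %ld running its own instruction stream\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];

    /* Create four instruction streams; each may be scheduled on a
     * different core and executed independently of the others. */
    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, stream, (void *)i);

    /* Wait for all of the streams to finish. */
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);

    return 0;
}
```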
Some of the subtleties in parallel programming have to do with writing programs so that they can be expressed as multiple threads of execution rather than just one. To make sure a computation expressed this way is correct, we have to make sure dependencies between data and computation are respected – which can limit how many threads we can have or how many threads we can run concurrently. And we have to distinguish between concurrency, which means that computations can be run independently of each other (order doesn’t matter), and parallelism, which means computations are truly running at the same time on separate execution units. Concurrent programs aren’t necessarily parallel, but parallel programs must be concurrent.
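As a hypothetical illustration (not an example from the lecture) of how data dependencies limit concurrency, compare the two loops below: the first has independent iterations that could be split among threads in any order, while the second has a loop-carried dependency that prevents the iterations from simply being run concurrently as written.

```c
#define N 1000

/* Independent iterations: each c[i] depends only on a[i] and b[i],
 * so the iterations can be divided among threads in any order. */
void add_arrays(const double *a, const double *b, double *c) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependency: each a[i] needs the already-updated
 * a[i - 1], so the iterations cannot run concurrently as written. */
void prefix_sum(double *a) {
    for (int i = 1; i < N; i++)
        a[i] = a[i] + a[i - 1];
}
```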
We introduce parallelism and concurrency with an anthropomorphized example of grading a set of exams. We consider different ways of spreading the work of grading the exams (and grading individual questions on the exams) among different graders. In class we analyzed how we might gain efficiency with more graders (a shorter time before all the exams were graded) and noted that parallelism, not just concurrency, is what matters for speedup. Spreading the work among 22 graders might give each grader 1/22 of the work and thus 1/22 of the grading time, but the time until all of the exams are graded is 1/22 of the original time only if every grader starts at the same time and finishes at the same time; the grading is done when the last grader finishes.
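The arithmetic behind that observation can be sketched with hypothetical numbers (not the exact figures used in class): say one grader needs 22 hours, and 22 graders split the work evenly. The wall-clock time is set by the latest finisher, so one straggler erodes the ideal 22x speedup.

```c
#include <stdio.h>

int main(void) {
    double total_work = 22.0;  /* hours for one grader working alone */
    int graders = 22;

    /* Ideal case: everyone starts together and does an equal share. */
    double ideal_time = total_work / graders;              /* 1 hour */

    /* If even one grader starts 2 hours late, the exams are not all
     * graded until that grader finishes. */
    double straggler_time = 2.0 + total_work / graders;    /* 3 hours */

    printf("ideal wall-clock time: %.1f hours (speedup %.1fx)\n",
           ideal_time, total_work / ideal_time);
    printf("with a late grader:    %.1f hours (speedup %.1fx)\n",
           straggler_time, total_work / straggler_time);
    return 0;
}
```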