Problem Set #5 (Part I)¶

Assigned: May 12, 2020

Due: May 20, 2020

Background¶

Lectures 10, 11, and 12

You may also find the videos on page fault / memory management and debugging interesting

Introduction¶

This assignment is extracted as with the previous assignment into a directory ps5a. There will be a follow-up ps5b released in a day or two that will have the same due date as ps5a.

Extract the homework files¶

Create a new subdirectory named “ps5a” to hold your files for this assignment. The files for this assignment are in a “tar ball” that can be downloaded with this link.

To extract the files for this assignment, download the tar ball and copy it (or move it) to your ps5a subdirectory. From a shell program, change your working directory to be the ps5a subdirectory and execute the command

$ tar zxvf ps5a.tar.gz

The files that should be extracted from the tape archive file are

Makefile:

Matrix.hpp: Declarations for Matrix class

Questions.rst: File containing questions to be answered for this assignment

Vector.hpp: Declarations for Vector class

amath583.cpp: Definitions (implementations) of Matrix and Vector functions

amath583.hpp: Declarations of Matrix and Vector functions

bonnie_and_clyde.cpp: Bank account example with futures

bonnie_and_clyde_function_object.cpp: Bank account example with function objects

catch.hpp: Catch2 testing framework header

norm_order.cpp: driver program that calls three different norm functions

norm_test.cpp: test program for three different norm functions

verify-all.bash: Meta bash script for running other verifiers

verify-make.bash: Bash script that attempts to make all matrix programs for PS4

verify-opts.bash: Bash script that sets options for other scripts

verify-run.bash: Bash script that attempts to make and run all matrix timing programs for PS4

verify-test.bash: Bash script that attempts to make and run all matrix test programs for PS4

Sign up For AWS Credits¶

For the last few assignments in the course we will be using a centralized compute cluster set up on AWS. The credits granted for the course are also centralized, but granted on a per-student basis. Use this link , select University of Washington for your institution and High-Performance Scientific Computing AMATH 583/583 as the course. You do not need a promotional code.

Preliminaries¶

Futures¶

C++ provides the mechanism to enable execution of a function call as an asynchronous task. Its semantics are important. Consider the example we have been using in lecture.

typedef size_t size_t;

size_t bank_balance = 300;

static std::mutex atm_mutex;
static std::mutex msg_mutex;

size_t withdraw(const std::string& msg, size_t amount) {
  std::lock(atm_mutex, msg_mutex);
  std::lock_guard<std::mutex> message_lock(msg_mutex, std::adopt_lock);

  std::cout << msg << " withdraws " << std::to_string(amount) << std::endl;

  std::lock_guard<std::mutex> account_lock(atm_mutex, std::adopt_lock);

  bank_balance -= amount;
  return bank_balance;
}

int main() {
  std::cout << "Starting balance is " << bank_balance << std::endl;

  std::future<size_t> bonnie = std::async(withdraw, "Bonnie", 100);
  std::future<size_t> clyde  = std::async(deposit, "Clyde", 100);

  std::cout << "Balance after Clyde's deposit is " << clyde.get() << std::endl;
  std::cout << "Balance after Bonnie's withdrawal is " << bonnie.get() << std::endl;

  std::cout << "Final bank balance is " << bank_balance << std::endl;

  return 0;
}

Here we just show the withdraw function. Refer to the full source code for complete implementation bonnie_and_clyde.cpp .

Note that we have modified the code from lecture so that the deposit and withdrawal functions return a value, rather than just modifying the global state.

Compile and run the bonnie_and_clyde program several times. Does it always return the same answer?

On my laptop I ran the example and obtained the following results (which were different every time):

$ ./bonnie_and_clyde.exe
Starting balance is 300
Bonnie withdraws 100
Balance after Clyde's deposit is Clyde deposits 100
300
Balance after Bonnie's withdrawal is 200
Final bank balance is 300

$ ./bonnie_and_clyde.exe
Starting balance is 300
Balance after Clyde's deposit is Clyde deposits 100
Bonnie withdraws 100
400
Balance after Bonnie's withdrawal is 300
Final bank balance is 300

$ ./bonnie_and_clyde.exe
Starting balance is 300
Bonnie withdraws 100
Clyde deposits 100
Balance after Clyde's deposit is 300
Balance after Bonnie's withdrawal is 200
Final bank balance is 300

$ ./bonnie_and_clyde.exe
Starting balance is 300
Balance after Clyde's deposit is Bonnie withdraws 100
Clyde deposits 100
300
Balance after Bonnie's withdrawal is 200
Final bank balance is 300

See if you can trace out how the different tasks are running and being interleaved. Notice that everything is protected from races – we get the correct answer at the end every time.

Note the ordering of calls in the program:

std::future<size_t> bonnie = std::async(withdraw, "Bonnie", 100);
std::future<size_t> clyde  = std::async(deposit, "Clyde", 100);

std::cout << "Balance after Clyde's deposit is " << clyde.get() << std::endl;
std::cout << "Balance after Bonnie's withdrawal is " << bonnie.get() << std::endl;

That is, the ordering of the calls is Bonnie makes a withdrawal, Clyde makes a deposit, we get the value back from Clyde’s deposit and then get the value back from Bonnie’s withdrawal. In run 0 above, that is how things are printed out: Bonnie makes a withdrawal, Clyde makes a deposit, we get the value after Clyde’s deposit, which is 300, and we get the value after Bonnie’s withdrawal, which is 200. Even though we are querying the future values with Clyde first and Bonnie second, the values that are returned depend on the order of the original task execution, not the order of the queries.

But let’s look at run 2. In this program, there are three threads: the main thread (which launches the tasks), the thread for the bonnie task and the thread for the clyde task. The first thing that is printed out after the tasks are launched is from Clyde’s deposit. But, before that statement can finish printing in its entirety, the main thread interrupts and starts printing the balance after Clyde’s deposit. The depositing function isn’t finished yet, but the get() has already been called on its future. That thread cannot continue until the clyde task completes and returns its value. The clyde task finishes its cout statement with “100”. But then the bonnie task starts running - and it runs to completion, printing “Bonnie withdraws 100” and it is done. But, the main thread was still waiting for its value to come back from the clyde task – and it does – so the main thread completes the line “Balance after Clyde’s deposit is” and prints “400”. Then the next line prints (“Balance after Bonnie’s withdrawal”) and the program completes.

Try to work out the sequence of interleavings for run 1 and run 3. There is no race condition here – there isn’t ever the case that the two tasks are running in their critical sections at the same time. However, they can be interrupted and another thread can run its own code – it is only the critical section code itself that can only have one thread executing it at a time. Also note that the values returned by bonnie and clyde can vary, depending on which order they are executed, and those values are returned by the future, regardless of which order the futures are evaluated.

std::future::get¶

Modify your bonnie_and_clyde program in the following way. Add a line in the function after the printout from Bonnie’s withdrawal to double check the value:

std::cout << "Balance after Clyde's deposit is " << clyde.get() << std::endl;
std::cout << "Balance after Bonnie's withdrawal is " << bonnie.get() << std::endl;
std::cout << "Double checking Clyde's deposit " << clyde.get() << std::endl;

(The third line above is the one you add.) What happens when you compile and run this example? Why do you think the designers of C++ and its standard library made futures behave this way?

Callables¶

We have looked at two ways of specifying concurrent work to be executed in parallel: threads and tasks. Both approaches take an argument that represents the work that we want to be done (along with additional arguments to be used by that first argument). One kind of thing that we can pass is a function.

void pi_helper(int begin, int end, double h) {
  for (int i = begin; i < end; ++i) {
    pi += (h*4.0) / (1.0 + (i*h*i*h));
  }
}

int main() {

  // ...

  std::thread(pi_helper, 0, N/4, h);  // pass function

  return 0;
}

The interfaces to thread and to async are quite general. In fact, the type of the thing being passed in is parameterized. What is important about it is not what type it is, but rather what kinds of expressions can be applied to it.

To be more concrete, the C++11 declaration for async is the following:

template< class Function, class... Args>
std::future<std::result_of_t<std::decay_t<Function>(std::decay_t<Args>...)>>
async( Function&& f, Args&&... args );

There are a number of advanced C++ features being used in this declaration, but the key thing to note is that, indeed, the first argument f is of a parameterized type (Function).

The requirements on whatever type is bound to Function are not that it be a function, but rather that it be Callable. Meaning, if f is Callable, we can say this as a valid expression:

f();

Or, more generally

f(args);

This is purely syntactic – the requirement is that we be able to write down the name of the thing that is Callable, and put some parentheses after it (in general with some other things inside the parentheses).

Of course, a function is one of the things that we can write down and put some parentheses after. But we have also seen in C++ that we can overload operator() within a class. Objects of a class with operator() can be written down with parentheses – just like a function call. In fact, this is what we have been doing to index into Vector and Matrix.

We can also create expressions that themselves are functions (“anonymous functions” or lambda expressions).

One way of thinking about thread and async is that they are “higher-order functions” – functions that operate on other functions. As you become more advanced in your use of C++, you will find that many common computational tasks can be expresses as higher order functions.

A bit more information about function objects and lambdas can be found as excursus: Function Objects and Lambda . Lambda is also covered in Lecture 13. There is also an example of function objects for our bank account example in the file bonnie_and_clyde_function_object.cpp.

Warm Up¶

Norm!¶

In lecture (and in previous assignments) we introduced the formal definition of vector spaces as a motivation for the interface and functionality of our Vector class. But besides the interface syntax, our Vector class needs to also support the semantics. One of the properties of a vector space is associativity of the addition operation, which we are going to verify experimentally in this assignment.

We have previously defined and used the function

double two_norm(const Vector&  x);

prototyped in amath583.hpp and defined in amath583.hpp.

In your source directory is a file norm_order.cpp that is the driver for this problem. It currently creates a Vector v and then invokes versions of two_norm on it four times, checking the results against that obtained by the first invocation.

The first invocation of two_norm is saved to the value norm0. We invoke two_norm a second time on v and save that value to norm1. We then check that norm0 and norm1 are equal.

As the program is delivered to you it also invokes two_norm_r on v and two_norm_s` on ``v (for “reverse” and “sorted”, respectively). The program also prints the absolute and relative differences between the values returned from two_norm_r and two_norm_s with the sequential two_norm to make sure the returned values are the same. The functions two_norm_r and two_norm_s are defined in amath583.cpp.

Absolute and relative differences are computed this way:

if (norm2 != norm0) {
  std::cout << "Absolute difference: " << std::abs(norm2-norm0) << std::endl;
  std::cout << "Relative difference: " << std::abs(norm2-norm0)/norm0 << std::endl;
}

(This is for norm2 - the code for the others should be obvious.)

Modify the code for two_norm_r so that it adds the values of v in reverse of the order given. I.e., rather than looping from 0 through num_rows()-1, loop from num_rows()-1 to 0.

Note

size_t

In almost all of the for loops we have been using in this course, I have written them using a size_t

for (size_t i = 0; i < x.num_rows(); ++i) {
   // ..
}

In C++ a size_t is unsigned and large enough to be able to index the largest thing representable in the system it is being compiled with. For all intents and purposes that means it is an unsigned 64-bit integer.

Be careful if you are going to use a size_t to count down.

What happens with the following code, for example:

int main() {

  size_t i = 1;
  assert(i > 0);

  i = i - 1;       // subtract one from one
  assert(i == 0);

  i = i - 1;       // subtract one from zero
  assert(i < 0);

  return 0;
}

Do you expect all the assertions to pass? Do they?

Modify the code for two_norm_s so that it adds the values of v in ascending order of its values. To do this, make a copy of v and sort those values using std::sort as follows

Vector w(v);  // w is a copy of v
std::sort(&w(0), &w(0)+w.num_rows());

What happens when you run norm.exe? Are the values of norm2 and norm3 equal to norm0?

Note that the program takes a command line argument to set the max size of the vector. The default is 1M elements.

Answer these questions in Questions.rst.

At what problem size do the answers between the computed norms start
to differ?
How do the absolute and relative errors change as a function of problem size?
Does the Vector class behave strictly like a member of an abstract vector class?
Do you have any concerns about this kind of behavior?

Problems Preview¶

Parallel Norm (Loop Parallelism)
Parallel Norm (Divide and Conquer)
Parallel sparse matvec
Parallel pagerank

Submitting Your Work¶

Do a make clean in your ps5a directory and then create a tarball ps5a-submit.tar.gz containing all of the files in that directory. From within the ps5a directory:

$ cd ps5a
$ make clean
$ cd ..
$ tar zcvf ps5a-submit.tar.gz --exclude "./ps5a/.vscode" ./ps5a

If you relied on any external references, list them in a file refs.txt and include that in your tarball as well (in the ps5a subdirectory).