Assigned: April 27, 2021

Due: May 4, 2021

Lecture 9

Page Rank from Lecture 13 (starts at 54:57)

I suggest that Lecture 9 *not* be listened to at 2X speed. Sparse matrices are *the*
central data structure in high-performance scientific computing and we will continue
to build on them for the rest of the course.

This assignment is extracted as with the previous assignment into a
directory `midterm`

Create a new subdirectory named “midterm” to hold your files for this
assignment. The files for this assignment are in a “tar ball” that can be
downloaded with `this link`

.

To extract the files for this assignment, download the tar ball and copy it (or move it) to your midterm subdirectory. From a shell program, change your working directory to be the midterm subdirectory and execute the command

```
$ tar zxvf midterm.tar.gz
```

The files that should be extracted from the tape archive file are

AOSMatrix.hpp: Declarations for Array of Structs Format Sparse Matrix class

COOMatrix.hpp: Declarations for Coordinate Format Sparse Matrix class

CSCMatrix.hpp: Declarations for Compressed Sparse Column Format Sparse Matrix class

CSRMatrix.hpp: Declarations for Compressed Sparse Row Format Sparse Matrix class

Makefile:

Matrix.hpp: Declarations for Matrix class

Questions.rst: File containing questions to be answered for this assignment

Timer.hpp: Simple timer class

Vector.hpp: Declarations for Vector class

amath583IO.cpp: Definitions (implementations) of Matrix and Vector IO functions

amath583IO.hpp: Declarations of Matrix and Vector IO functions

amath583sparse.cpp: Definitions (implementations) of Sparse Matrix functions

amath583sparse.hpp: Declarations of Sparse Matrix functions

amath583.cpp: Definitions (implementations) of Matrix and Vector functions

amath583.hpp: Declarations of Matrix and Vector functions

catch.hpp: Catch2 testing framework header

matmat.cpp: sparse matrix by dense matrix product driver

matmat_test.cpp: sparse matrix by dense matrix product test

matvec.cpp: sparse matrix by vector product driver

matvec_test.cpp: sparse matrix by vector product test

verify-*.bash: Verification scripts

For historical reasons this course has had a midterm exam, about, well, midway through the course.

As I hope you have seen with the assignments thus far in the course, my intention for the assignments is that they be instructive and tutorial much more than they be evaluative. I feel the same way about the exams in the course - they are intended to be learning experiences.

At the same time, I do have to do some assessment of how well each of you has progressed in these learning experiences.
So, this assignment and the very last one are designated as exams.
The rules are the same as for the other assignments, *except*

**The exam may not be discussed with anyone except course instructors**Do not discuss it with your classmates, with your friends, with your enemies. You*may*refer to your previous assignments, AMATH 483/583 slides and lectures. You may contact the instructors via*private*messages on Piazza to clarify questions. The same policies and penalties for plagiarism apply for the mid-term as for the regular assignments.

The mid-term will be evaluated by hand rather than with gradescope.
As with previous assignments, there are tests supplied with the midterm that you can use to verify your solutions.
We haven’t really had any requirements on commenting your code, either with
`//`

or the `/*`

`*/`

pair.
Continuing in this regard,
general code comments will not be required for the midterm, but are
*highly suggested*. Any clarifying comments may help the graders award
partial credit if your code doesn’t exhibit the correct behavior.

The work for this assignment will mostly be carried out in the header files for the corresponding classes (any other files will be specified in each problem). There are four driver programs that will be used to test and time your implementations:

matvec_test.cpp

matvec.cpp

matmat_test.cpp

matmat.cpp

Targets for the corresponding exe files are given in your Makefile.

The source code files for 483 and 583 are the same. There are certain
sections in the source files above that are guarded by a preprocessor
directive. In your Makefile there is a macro `DEFS`

. If you are a 583 student,
make sure it is defined as

```
DEFS := -D__583
```

So that the guarded code (having to do with the AOS matrix) is compiled.

If you are a 483 student, make sure the macro is not set:

```
DEFS :=
```

(So that the guarded code is not compiled.)

Similarly, to properly run the verification scripts (depending on 483 or 583),
comment out the appropriate line in `verify-opts.bash`

.
If you are a 483 student, the correct define should be

```
declare -a ALLDEFS=("")
```

If you are a 583 student, the correct define should be

```
declare -a ALLDEFS=("-D__583")
```

Recall that mathematically, matrix-vector product \(y \leftarrow A x\) is defined by:

\[y_i = \sum_k a_{ik} x_k\]

In the dense matrix case, we implemented this as

```
for (size_t i = 0; i < A.num_rows(); ++i) {
for (size_t k = 0; k < A.num_cols(); ++k) {
y(i) += A(i, k) * x(k);
}
}
```

The function loops through all values of the matrix `A`

,
indexes into `A`

with `i`

and `k`

, indexes into `y`

with `i`

and indexes into `x`

with `k`

.

In the definition of matrix-vector product, the key invariant relationship is that the matrix element has two indices – the left index corresponds to the index into the \(y\) and the right index corresponds to the index into \(x\). When we implement this product as a program, we assume we have all the values stored, and so we store them in a rectangular array – which is indexed by the same indices associated with each value of the finite dimensional linear operator being represented by `A`

.

But, we don’t have to represent matrices that way. In particular, if we have a sparse matrix (one with many more zero elements than non-zero elements), the non-zero elements are not stored in a rectangular array (with implicit indices) – rather, each non-zero element is stored along with its index values. That is, abstractly, a sparse matrix (any matrix, really), is represented by a set of triples:

\[A = \{ ( a, i, j ) \}\]

Where \(a\) is a Real or Complex number and \(i\) and \(j\) are integers.

Matrix-vector product is still abstractly the same but now for each \(y_i\) we are summing over those triples in \(A\) where the first index is equal to \(i\):

\[y_i = \sum_{(a, i, k)} a x_k\]

There are two complementary ways to store a set of triples in C++: as a struct of arrays or as an array of structs. In the former case, we store the values for each triple in separate arrays (at the same location) such that for some integer, say, `m`

, `a[m]`

, `i[m]`

, and `j[m]`

represent a triple in the sparse matrix being stored. This is how the COO format is defined, where `row_indices_`

, `col_indices_`

, and `storage_`

are used to store the components of each triple.

Matrix-vector product is then carried out as:

```
void matvec(const Vector& x, Vector& y) const {
for (size_t m = 0; m < storage_.size(); ++m) {
y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
}
}
```

Note that `m`

is being used to index into each of `row_indices_`

, `col_indices_`

, and `storage_`

As we saw in lecture, we can eliminate some redundant index information that is being stored by sorting the triples either according to the first index (the row number) or the second index (the column number) and then compressing the resulting sequences of repeated index values. This gives us the CSR and CSC formats, respectively.

Regardless of representation, COO, CSR, CSC, or AOS – the fundamental idea is that we are storing triples – which we will then use in matrix-vector (and matrix-matrix) product. All of the matrix-vector product functions for these formats carry out essentially the same process – get the row number, the column number, and the value for each matrix element and then update the appropriate element of the vector `y`

using the product of the matrix element and the appropriate value of the vector `x`

.

Computing the product \(y \leftarrow A^\top x\) – or, equivalently – \(y \leftarrow x A\) is defined as

\[y_k = \sum_{(a, i, k)} a x_i\]

We are doing the same thing as before, but the roles of the first and second index are exchanged.

How would you modify the following code to carry out the transpose matrix-vector product?

```
void matvec(const Vector& x, Vector& y) const {
for (size_t m = 0; m < storage_.size(); ++m) {
y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
}
}
```

Hint: In the mathematical formula, we swapped to use the second index for indexing \(y\) and the first index for indexing \(x\). What would the corresponding change be to the code?

As supplied, the driver programs matvec_test.cpp and matvec.cpp will compile and run with CSR and COO formats, but respectively fail the tests and print nonsense values for CSC (which has not been implemented).

Compile and run matvec_test.exe and matvec.exe

```
$ make matvec_test.exe
$ ./matvec_test.exe
$ make matvec.exe
$ ./matvec.exe
```

The driver matvec.exe will run and time matrix-vector product using a discretized 2D Laplacian (as presented in lecture) for various problem sizes and print the performance (in GFlops/sec) on each line.

Notice that there are
columns in the matvec.exe output for both
the matvec time and for the matvec *transpose* time.

```
NNZ COO COO^T CSR CSR^T CSC CSC^T AOS AOS^T
20480 1.40571 1.57306 2.09741 2.03288 2.03288 2.03288 1.34834 1.8101
81920 1.36646 1.65683 2.03918 2.10391 2.10391 2.10391 1.41007 1.69931
327680 1.30436 1.5622 1.97572 1.89224 1.84039 1.89224 1.13855 1.4142
```

We will be implementing both matrix-vector product and transpose-matrix vector product.

The functionality for sparse matrix classes is contained in the sparse matrix class declaration files (COO.hpp, CSR.hpp, CSC.hpp, and AOS.hpp) and in the two files amath583sparse.hpp and amath583sparse.cpp. All of the member functions for the classes are contained in the class declarations themselves. The functions in amath583sparse are things like norm, matrix-vector product, and so forth.

There is a bit more inside of the sparse matrix classes than there was in the `Matrix`

class. With the `Matrix`

class, we could maintain a private internal representation of the `Matrix`

and
expose just `operator()`

and we could do everything we needed to with the `Matrix`

. For a sparse matrix, storing \((a, i, j)\) triples, we can’t provide an `operator()`

interface – we have
to instead provide an interface with more specialized functionality.

In the case of `Matrix`

, the `mult`

function was a free function, just using `operator()`

:

1 2 3 4 5 6 7 8 | ```
void mult(const Matrix& A, const Vector& x, Vector& y) {
assert(y.num_rows() == A.num_rows());
for (size_t i = 0; i < A.num_rows(); ++i) {
for (size_t j = 0; j < A.num_cols(); ++j) {
y(i) += A(i, j) * x(j);
}
}
}
``` |

It doesn’t matter how the matrix is implemented inside, this function just needs `operator()`

. Note that we also provide a function `mult`

of two arguments, that returns `y`

:

1 2 3 4 5 | ```
Vector mult(const Matrix& A, const Vector& x) {
Vector y(A.num_rows());
mult(A, x, y);
return y;
}
``` |

It allocates `y`

, calls the three-argument version of `mult`

and then returns `y`

.

We have `mult`

functions with the same interface for `COOMatrix`

1 2 3 | ```
void mult(const COOMatrix& A, const Vector& x, Vector& y) {
A.matvec(x, y);
}
``` |

1 2 3 4 5 | ```
Vector mult(const COOMatrix& A, const Vector& x) {
Vector y(A.num_rows());
mult(A, x, y);
return y;
}
``` |

Note that now we invoke the member function `matvec()`

of the `COOMatrix`

. That in turn is defined in COOMatrix.hpp:

1 2 3 4 5 | ```
void matvec(const Vector& x, Vector& y) const {
for (size_t k = 0; k < storage_.size(); ++k) {
y(row_indices_[k]) += storage_[k] * x(col_indices_[k]);
}
}
``` |

This pattern is also used by `CSRMatrix`

, `CSCMatrix`

, and `AOSMatrix`

.

Note

DRY

All four of the sparse matrix classes we are working with have the same set of operations declared for them.
E.g. as declared in `amath583sparse.hpp`

,

```
void mult(const COOMatrix& A, const Vector& x, Vector& y);
void mult(const CSRMatrix& A, const Vector& x, Vector& y);
void mult(const CSCMatrix& A, const Vector& x, Vector& y);
void mult(const AOSMatrix& A, const Vector& x, Vector& y);
```

This seems like huge amounts of unnecessary repetition. As in all cases where something is repeated but with a
slight difference, we need a way to express what is the same once, while parameterizing what is different so that
we can capture the four (here) cases we actually need. In this case, it is the *types* of the parameters being
passed that vary. C++ has a mechanism (an extremely powerful mechanism) for parameterizing on types: templates,
which allow both functions and data structures to be parameterized. We will come back to templates in a week or two,
but using templates, the declarations above would compact to:

```
template <class MatrixType>
mult(const MatrixType& A, const Vector& x, Vector& y);
```

(In fact, if you look at the definitions of `mult`

for the sparse matrices, they are also all the same.
As with regular functions, function templates need to have a matching declaration and definition – so the implementations
would be compacted as well.)

For the first part of this assignment we are going to complete the implementations of the function necessary to fill in the table printed out by matvec.exe.

In the lecture on the Page Rank algorithm, we derived Page Rank as essentially a power method for revealing the
eigenvector associated with largest eigenvalue of the matrix \(P\). There was some hand waving about left
and right eigenvectors and we arrived at \(y \leftarrow P x\)
as being the core operation – realized as `mult(P, x, y)`

.

In other derivations of Page Rank, you may see \(y \leftarrow x P\) or \(y \leftarrow P^T x\) as the core operation. Both are correct, depending on how \(P\) is formed to represent the underlying graph of web links.

Complete the definition of the member function `t_matvec`

in COOMatrix.hpp and in CSRMatrix.hpp.

There are several functions that are not completed as the code is handed to you for this assignment. As a result, when you run matvec.exe you will see error messages for many failures. One of them will look like:

```
-------------------------------------------------------------------------------
Identity
COO
-------------------------------------------------------------------------------
```

When your function is working properly that error message will go away.

In lecture we derived the compressed sparse row format from coordinate format by first sorting the data based on the row indices stored in COO. By applying run-length encoding we compressed the row indices from a size equal to the number of non-zeros in the matrix down to the number of rows in the matrix (plus one).

A similar process could have been applied had we instead sorted the coordinate data by columns rather than rows. Such a format is known as compressed sparse column (CSC).

An implementation for the CSC format has been provided in CSCMatrix.hpp.

Complete the implementations for the two member functions: 1. matvec – computes matrix by vector product 2. t_matvec – computes the transpose matrix by vector product

In class we discussed that there are two ways to represent a coordinate (COO) sparse matrix – one that explicitly keeps all three values of the triples representing the matrix data - struct of arrays or array of structs. Our implementation (that we call COOMatrix) thus far has been the former. The row index information, the column index information, and the values are all stored in separate arrays.

You are to complete functionality for an implementation of COO sparse matrix as an array of structs (AOS). With AOS, the sparse matrix information is kept in a single array, where each entry in the array is a data structure consisting of three pieces of information: row index, column index, and value.

In our AOS implementation, we represent each triple as a C++ tuple
holding two indices and a value (two `size_t`

and a `double`

):

```
typedef std::tuple<size_t, size_t, double> element;
```

and then keep a single array (as an `std::vector`

) of those tuples as the matrix storage:

```
std::vector<element> storage_;
```

Thus, the expression to get the `m`

element of the `storage`

would be

```
element aik = storage_[m];
```

which would return an element – which is a triple. The function for accessing
a particular element of a C++ tuple is called `get`

. To get the row, column, and value of a tuple, we would write:

```
element aik = storage_[m];
size_t row = std::get<0>(aik);
size_t col = std::get<1>(aik);
double val = std::get<2>(aik);
```

This is equivalent to the COO case:

```
size_t row = row_indices_[m];
size_t col = col_indices_[m];
double val = storage_[m];
```

An implementation for the AOS format has been provided in AOSMatrix.hpp.

Complete the implementations for the two member functions: 1. matvec – computes matrix by vector product 2. t_matvec – computes the transpose matrix by vector product

The matvec.exe programs creates a sparse matrix representing a discretized 2D Laplacian.
The program takes as argument the maximum
size of the grid along one dimension and runs sizes from 64 to the specified max in powers of 2 (the default max is 2048).
Note that these are *not* the matrix dimensions. For the grid size of 64, the matrix dimension is 4096 by 4096 (which was about the limit of what was reasonable to do with a dense matrix).
A grid size of 2048 is represented by a matrix with 4M rows and 4M columns. (How much memory would a dense matrix of that size require?)

For each problem (grid) size, the program prints the performance (in GFlops/sec) along with the grid size, matrix size, and number of nonzeros in the matrix.

Compile and run matvec_test.exe and matvec.exe.

Write (and answer) your own question about sparse matrix computation. This should be a non-trival question. “How do you spell CSR?” is not non-trivial.

(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product compare to what you previously achieved for dense-matrix by dense-matrix product? Explain, and quantify if you can, (e.g., using the roofline model).

(What I am looking for here and in similarly phrased questions is the following: The matmat performance you measured previously and the sparse matrix-vector performance you just measured. By what factor do they differ? (What is the ratio of one to the other?) Then, using your roofline model for your CPU, you should be able to predict what that ratio might be, based on the algorithm characteristics. Hint: slides 75, 77, and maybe 79 from lecture 8. If the numbers don’t come out exactly the same, that’s okay – the point is to show that you can use the roofline model to *estimate* performance.)

Referring to the roofline model and the sparse matrix-vector and dense matrix-matrix algorithms, what performance ratio would you theoretically expect to see if neither algorithm was able to obtain any reuse from cache?

(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product for COO compare to CSR? Explain, and quantify if you can, (e.g., using the roofline model).

So far we have considered sparse matrices with their use in linear systems – noting that sparse matrix by vector product is a fundamental operation. In this problem we want to consider the effects of sparse matrix by dense matrix product.

First, in the file amath583sparse.cpp there are functions not just for matrix
by vector product but for matrix-by matrix product as well. As with matrix-vector
`mult`

, these dispatch to member functions of the corresponding classes.

Complete the mult functions for matrix-matrix product in COOMatrix, CSRMatrix, and CSCMatrix.

(AMATH 583) Complete the mult functions for matrix-matrix product in AOSMatrix.

Compile and run matmat_test.exe and matmat.exe. As with matvec.exe, matmat.exe takes an argument that specifies the maximum problem size – where problem size in this case is the size of the sparse matrix. It also takes an argument that specifies the maximum number of columns in the matrix B (the dense matrix operand in the sparse matrix by dense matrix product).

(Include your answer in Questions.rst.) How does the performance
(in GFLOP/s) for sparse-matrix by dense matrix product (**SPMM**)
compare to sparse-matrix by vector product (**SPMV**)? The performance
for SPMM should be about the same as for SPMV in the case of a 1
column dense matrix. What is the trend with increasing numbers of
columns? Explain, and quantify if you can, using the roofline model.

(Include your answer in Questions.rst.) How does the performance of sparse matrix by dense matrix product (in GFLOP/s) compare to the results you got dense matrix-matrix product in previous assignments? Explain, and quantify if you can, using the roofline model.

Answer the following questions (append to Questions.rst): a) The most important thing I learned from this assignment was… b) One thing I am still not clear on is…

Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by dense-matrix product.

Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by vector product.

(Be creative. Don’t necessarily restrict yourself to loop optimizations. Think back to previous assignments where performance was increased in ways maybe having not to do with just loop optimizations.)

Your midterm solutions will be uploaded to Gradescope. Do a make clean in your midterm directory and then upload all the files of your workind directory (code + Questions.rst). No need to convert Questions.rst to a PDF this time.

If you relied on any external references, list them in a file refs.txt and include that in your write-up as well.