Midterm Evaluation

Assigned: Apr 28, 2022

Due: May 5, 2022

Extended to May 6, 2022

Background

  • Lecture 9

Sparse matrices are the central data structure in high-performance scientific computing and we will continue to build on them for the rest of the course.

Introduction

As with the previous assignment, the files for this assignment are extracted into a directory named midterm

Extract the homework files

Create a new subdirectory named “midterm” to hold your files for this assignment. The files for this assignment are in a “tar ball” that can be downloaded as midterm.tar.gz.

To extract the files for this assignment, download the tar ball and copy it (or move it) to your midterm subdirectory. From a shell program, change your working directory to be the midterm subdirectory and execute the command

$ tar zxvf midterm.tar.gz

The files that should be extracted from the tape archive file are

  • AOSMatrix.hpp: Declarations for Array of Structs Format Sparse Matrix class

  • COOMatrix.hpp: Declarations for Coordinate Format Sparse Matrix class

  • CSCMatrix.hpp: Declarations for Compressed Sparse Column Format Sparse Matrix class

  • CSRMatrix.hpp: Declarations for Compressed Sparse Row Format Sparse Matrix class

  • Makefile

  • Matrix.hpp: Declarations for Matrix class

  • Questions.rst: File containing questions to be answered for this assignment

  • Timer.hpp: Simple timer class

  • Vector.hpp: Declarations for Vector class

  • amath583IO.cpp: Definitions (implementations) of Matrix and Vector IO functions

  • amath583IO.hpp: Declarations of Matrix and Vector IO functions

  • amath583sparse.cpp: Definitions (implementations) of Sparse Matrix functions

  • amath583sparse.hpp: Declarations of Sparse Matrix functions

  • amath583.cpp: Definitions (implementations) of Matrix and Vector functions

  • amath583.hpp: Declarations of Matrix and Vector functions

  • catch.hpp: Catch2 testing framework header

  • matmat.cpp: sparse matrix by dense matrix product driver

  • matmat_test.cpp: sparse matrix by dense matrix product test

  • matvec.cpp: sparse matrix by vector product driver

  • matvec_test.cpp: sparse matrix by vector product test

  • verify-*.bash: Verification scripts

Special Instructions for Mid Term

For historical reasons this course has had a midterm exam, about, well, midway through the course.

As I hope you have seen with the assignments thus far in the course, my intention for the assignments is that they be instructive and tutorial much more than they be evaluative. I feel the same way about the exams in the course - they are intended to be learning experiences.

At the same time, I do have to do some assessment of how well each of you has progressed in these learning experiences. So, this assignment and the very last one are designated as exams. The rules are the same as for the other assignments, except

  1. The exam may not be discussed with anyone except course instructors
    Do not discuss it with your classmates, with your friends, with your enemies. You may refer to your previous assignments, AMATH 483/583 slides and lectures. You may contact the instructors via private messages on Piazza to clarify questions. The same policies and penalties for plagiarism apply for the mid-term as for the regular assignments.

The coding part of the midterm will be evaluated with autograder, same as ps2/ps3. The writing part of the midterm will be evaluated by hand. As with previous assignments, there are tests supplied with the midterm that you can use to verify your solutions. We haven’t really had any requirements on commenting your code, either with // or the /* */ pair. Continuing in this regard, general code comments will not be required for the midterm, but are highly suggested. Any clarifying comments may help the graders award partial credit if your code doesn’t exhibit the correct behavior. “Code is a medium to communicate with other programmers”. Good comments will also make your code understandable to both you and others.

The work for this assignment will mostly be carried out in the header files for the corresponding classes (any other files will be specified in each problem). There are four driver programs that will be used to test and time your implementations:

  1. matvec_test.cpp

  2. matvec.cpp

  3. matmat_test.cpp

  4. matmat.cpp

Targets for the corresponding exe files are given in your Makefile.

The source code files for 483 and 583 are the same. There are certain sections in the source files above that are guarded by a preprocessor directive. In your Makefile there is a macro DEFS. If you are a 583 student, make sure it is defined as

DEFS := -D__583

by removing the comment symbol “#”, so that the guarded code (having to do with the AOS matrix) is compiled.

If you are a 483 student, make sure the macro is not set:

DEFS :=

by removing the comment symbol “#” (so that the guarded code is not compiled).

Similarly, to properly run the verification scripts (depending on 483 or 583), comment out the appropriate line in verify-opts.bash. If you are a 483 student, the correct define should be

declare -a ALLDEFS=("")

If you are a 583 student, the correct define should be

declare -a ALLDEFS=("-D__583")

Sparse Matrix-Vector Product

Recall that mathematically, matrix-vector product \(y \leftarrow A x\) is defined by:

\[y_i = \sum_k a_{ik} x_k\]

In the dense matrix case, we implemented this as

for (size_t i = 0; i < A.num_rows(); ++i) {
  for (size_t k = 0; k < A.num_cols(); ++k) {
    y(i) += A(i, k) * x(k);
  }
}

The function loops through all values of the matrix A, indexes into A with i and k, indexes into y with i and indexes into x with k.

In the definition of matrix-vector product, the key invariant relationship is that the matrix element has two indices – the left index corresponds to the index into the \(y\) and the right index corresponds to the index into \(x\). When we implement this product as a program, we assume we have all the values stored, and so we store them in a rectangular array – which is indexed by the same indices associated with each value of the finite dimensional linear operator being represented by A.

But, we don’t have to represent matrices that way. In particular, if we have a sparse matrix (one with many more zero elements than non-zero elements), the non-zero elements are not stored in a rectangular array (with implicit indices) – rather, each non-zero element is stored along with its index values. That is, abstractly, a sparse matrix (any matrix, really), is represented by a set of triples:

\[A = \{ ( a, i, j ) \}\]

Where \(a\) is a Real or Complex number and \(i\) and \(j\) are integers.

Matrix-vector product is still abstractly the same but now for each \(y_i\) we are summing over those triples in \(A\) where the first index is equal to \(i\):

\[y_i = \sum_{(a, i, k)} a x_k\]

There are two complementary ways to store a set of triples in C++: as a struct of arrays or as an array of structs. In the former case, we store the values for each triple in separate arrays (at the same location) such that for some integer, say, m, a[m], i[m], and j[m] represent a triple in the sparse matrix being stored. This is how the COO format is defined, where row_indices_, col_indices_, and storage_ are used to store the components of each triple.
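The struct-of-arrays layout can be sketched as three parallel vectors. (This is a simplified sketch for illustration only; the actual COOMatrix class in COOMatrix.hpp has a larger interface. The class name COOSketch and the num_nonzeros accessor here are made up for the example.)

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch of the struct-of-arrays (COO) layout.
// The m-th nonzero is the triple
//   (storage_[m], row_indices_[m], col_indices_[m]).
class COOSketch {
public:
  void push_back(std::size_t i, std::size_t j, double a) {
    row_indices_.push_back(i);
    col_indices_.push_back(j);
    storage_.push_back(a);
  }
  std::size_t num_nonzeros() const { return storage_.size(); }

private:
  std::vector<std::size_t> row_indices_;
  std::vector<std::size_t> col_indices_;
  std::vector<double>      storage_;
};
```

The three vectors always have the same length, and a single index m selects one triple across all of them.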

Matrix-vector product is then carried out as:

void matvec(const Vector& x, Vector& y) const {
  for (size_t m = 0; m < storage_.size(); ++m) {
    y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
  }
}

Note that m is being used to index into each of row_indices_, col_indices_, and storage_.

As we saw in lecture, we can eliminate some redundant index information that is being stored by sorting the triples either according to the first index (the row number) or the second index (the column number) and then compressing the resulting sequences of repeated index values. This gives us the CSR and CSC formats, respectively.
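To make the compression concrete, here is a sketch of matrix-vector product in CSR form using plain std::vector indexing (a sketch, not the course's CSRMatrix member function; the array names row_ptr, col_indices, and values are illustrative). After sorting by row and run-length encoding, a row-pointer array of length num_rows + 1 replaces the per-nonzero row indices: the nonzeros of row i occupy positions row_ptr[i] through row_ptr[i+1] - 1.

```cpp
#include <cstddef>
#include <vector>

// Sketch of CSR matrix-vector product y += A * x.
// row_ptr has num_rows + 1 entries; the nonzeros of row i live in
// positions [row_ptr[i], row_ptr[i+1]) of col_indices and values.
void csr_matvec(const std::vector<std::size_t>& row_ptr,
                const std::vector<std::size_t>& col_indices,
                const std::vector<double>&      values,
                const std::vector<double>& x, std::vector<double>& y) {
  for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i) {
    for (std::size_t m = row_ptr[i]; m < row_ptr[i + 1]; ++m) {
      y[i] += values[m] * x[col_indices[m]];
    }
  }
}
```

Compare this with the COO loop above: the row index is no longer read from an array per nonzero; it is implied by which segment of the compressed storage the inner loop is traversing.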

Regardless of representation, COO, CSR, CSC, or AOS – the fundamental idea is that we are storing triples – which we will then use in matrix-vector (and matrix-matrix) product. All of the matrix-vector product functions for these formats carry out essentially the same process – get the row number, the column number, and the value for each matrix element and then update the appropriate element of the vector y using the product of the matrix element and the appropriate value of the vector x.

Transpose Matrix Vector Product

The product \(y \leftarrow A^\top x\) – or, equivalently, \(y \leftarrow x A\) – is defined by

\[y_k = \sum_{(a, i, k)} a x_i\]

We are doing the same thing as before, but the roles of the first and second index are exchanged.

How would you modify the following code to carry out the transpose matrix-vector product?

void matvec(const Vector& x, Vector& y) const {
  for (size_t m = 0; m < storage_.size(); ++m) {
    y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
  }
}

Hint: In the mathematical formula, we swapped to use the second index for indexing \(y\) and the first index for indexing \(x\). What would the corresponding change be to the code?

Warm Up

matvec.exe

As supplied, the driver programs matvec_test.cpp and matvec.cpp will compile and run with the CSR and COO formats, but for CSC (which has not been implemented) they will, respectively, fail the tests and print nonsense values.

Compile and run matvec_test.exe and matvec.exe:

$ make matvec_test.exe
$ ./matvec_test.exe

$ make matvec.exe
$ ./matvec.exe

The driver matvec.exe will run and time matrix-vector product using a discretized 2D Laplacian (as presented in lecture) for various problem sizes and print the performance (in GFlops/sec) on each line.

Notice that there are columns in the matvec.exe output for both the matvec time and for the matvec transpose time.

   NNZ         COO       COO^T         CSR       CSR^T         CSC       CSC^T         AOS       AOS^T
 20480     1.40571     1.57306     2.09741     2.03288     2.03288     2.03288     1.34834      1.8101
 81920     1.36646     1.65683     2.03918     2.10391     2.10391     2.10391     1.41007     1.69931
327680     1.30436      1.5622     1.97572     1.89224     1.84039     1.89224     1.13855      1.4142

We will be implementing both matrix-vector product and transpose-matrix vector product.

COOMatrix.hpp (et al.)

The functionality for the sparse matrix classes is contained in the sparse matrix class declaration files (COOMatrix.hpp, CSRMatrix.hpp, CSCMatrix.hpp, and AOSMatrix.hpp) and in the two files amath583sparse.hpp and amath583sparse.cpp. All of the member functions for the classes are contained in the class declarations themselves. The functions in amath583sparse are things like norm, matrix-vector product, and so forth.

There is a bit more inside of the sparse matrix classes than there was in the Matrix class. With the Matrix class, we could maintain a private internal representation of the Matrix and expose just operator() and we could do everything we needed to with the Matrix. For a sparse matrix, storing \((a, i, j)\) triples, we can’t provide an operator() interface – we have to instead provide an interface with more specialized functionality.

In the case of Matrix, the mult function was a free function, just using operator():

void mult(const Matrix& A, const Vector& x, Vector& y) {
  assert(y.num_rows() == A.num_rows());
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < A.num_cols(); ++j) {
      y(i) += A(i, j) * x(j);
    }
  }
}

It doesn’t matter how the matrix is implemented internally; this function needs only operator(). Note that we also provide a function mult of two arguments that returns y:

Vector mult(const Matrix& A, const Vector& x) {
  Vector y(A.num_rows());
  mult(A, x, y);
  return y;
}

It allocates y, calls the three-argument version of mult and then returns y.

We have mult functions with the same interface for COOMatrix

void mult(const COOMatrix& A, const Vector& x, Vector& y) {
  A.matvec(x, y);
}

Vector mult(const COOMatrix& A, const Vector& x) {
  Vector y(A.num_rows());
  mult(A, x, y);
  return y;
}

Note that now we invoke the member function matvec() of the COOMatrix. That in turn is defined in COOMatrix.hpp:

void matvec(const Vector& x, Vector& y) const {
  for (size_t k = 0; k < storage_.size(); ++k) {
    y(row_indices_[k]) += storage_[k] * x(col_indices_[k]);
  }
}

This pattern is also used by CSRMatrix, CSCMatrix, and AOSMatrix.

Note

DRY

All four of the sparse matrix classes we are working with have the same set of operations declared for them. E.g. as declared in amath583sparse.hpp,

void   mult(const COOMatrix& A, const Vector& x, Vector& y);
void   mult(const CSRMatrix& A, const Vector& x, Vector& y);
void   mult(const CSCMatrix& A, const Vector& x, Vector& y);
void   mult(const AOSMatrix& A, const Vector& x, Vector& y);

This seems like huge amounts of unnecessary repetition. As in all cases where something is repeated but with a slight difference, we need a way to express what is the same once, while parameterizing what is different so that we can capture the four (here) cases we actually need. In this case, it is the types of the parameters being passed that vary. C++ has a mechanism (an extremely powerful mechanism) for parameterizing on types: templates, which allow both functions and data structures to be parameterized. We will come back to templates in a week or two, but using templates, the declarations above would compact to:

template <class MatrixType>
void mult(const MatrixType& A, const Vector& x, Vector& y);

(In fact, if you look at the definitions of mult for the sparse matrices, they are also all the same. As with regular functions, function templates need to have a matching declaration and definition – so the implementations would be compacted as well.)
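To make that parenthetical concrete, here is a sketch of what the compacted definition could look like, exercised with a toy matrix type. (DiagonalMatrix and the Vector alias below are hypothetical stand-ins for this example, not part of the assignment code; the only requirement the template places on MatrixType is a matvec(x, y) member function like the ones shown above.)

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-ins so the sketch is self-contained; the course's
// Vector and sparse matrix classes play these roles in the real code.
using Vector = std::vector<double>;

struct DiagonalMatrix {  // hypothetical toy matrix type
  std::vector<double> diag;
  void matvec(const Vector& x, Vector& y) const {
    for (std::size_t i = 0; i < diag.size(); ++i) y[i] += diag[i] * x[i];
  }
};

// One template replaces all four mult overloads: what varies (the
// matrix type) is now a template parameter.
template <class MatrixType>
void mult(const MatrixType& A, const Vector& x, Vector& y) {
  A.matvec(x, y);
}
```

Any type with a conforming matvec member can be passed to this one mult, which is exactly the "express what is the same once, parameterize what is different" idea.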

Problems

For the first part of this assignment we are going to complete the implementations of the functions necessary to fill in the table printed out by matvec.exe.

Transposed Matrix-Vector Product

We are going to implement both matrix-vector product and transpose matrix-vector product for our sparse matrix classes. Both are needed by matvec.exe. We have provided matrix-vector product (matvec) as a hint in both COOMatrix.hpp and CSRMatrix.hpp.

Complete the definition of the member function t_matvec in COOMatrix.hpp and in CSRMatrix.hpp.

There are several functions that are not completed in the code as it is handed to you for this assignment. As a result, when you run matvec_test.exe you will see error messages for many test failures. One of them will look like:

-------------------------------------------------------------------------------
Identity
  COO
-------------------------------------------------------------------------------

When your function is working properly that error message will go away.

Compressed Sparse Column Format

In lecture we derived the compressed sparse row format from coordinate format by first sorting the data based on the row indices stored in COO. By applying run-length encoding we compressed the row indices from a size equal to the number of non-zeros in the matrix down to the number of rows in the matrix (plus one).

A similar process could have been applied had we instead sorted the coordinate data by columns rather than rows. Such a format is known as Compressed Sparse Column (CSC) format.

A partial implementation of the CSC format has been provided in CSCMatrix.hpp.

Complete the implementations for the two member functions in CSCMatrix.hpp:

  1. matvec – computes matrix by vector product

  2. t_matvec – computes the transpose matrix by vector product

Array of Structs (AMATH583)

In class we discussed that there are two ways to represent a coordinate (COO) sparse matrix – both explicitly keep all three values of the triples representing the matrix data: struct of arrays or array of structs. Our implementation thus far (which we call COOMatrix) has been the former. The row index information, the column index information, and the values are all stored in separate arrays.

You are to complete functionality for an implementation of COO sparse matrix as an array of structs (AOS). With AOS, the sparse matrix information is kept in a single array, where each entry in the array is a data structure consisting of three pieces of information: row index, column index, and value.

In our AOS implementation, we represent each triple as a C++ tuple holding two indices and a value (two size_t and a double):

typedef std::tuple<size_t, size_t, double> element;

and then keep a single array (as an std::vector) of those tuples as the matrix storage:

std::vector<element> storage_;

Thus, the expression to get the m-th element of the storage would be

element aik = storage_[m];

which would return an element – which is a triple. The function for accessing a particular element of a C++ tuple is called get. To get the row, column, and value of a tuple, we would write:

element aik = storage_[m];
size_t row = std::get<0>(aik);
size_t col = std::get<1>(aik);
double val = std::get<2>(aik);

This is equivalent to the COO case:

size_t row = row_indices_[m];
size_t col = col_indices_[m];
double val = storage_[m];
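As an aside (not required for the assignment): if you compile with C++17 or later, a structured binding unpacks the tuple in a single declaration, equivalent to the three std::get<> calls. A minimal sketch, with a hypothetical helper value_of introduced just for this example:

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

using element = std::tuple<std::size_t, std::size_t, double>;

// Hypothetical helper: return the value stored in the m-th triple.
// The structured binding is equivalent to three std::get<> calls.
double value_of(const std::vector<element>& storage_, std::size_t m) {
  auto [row, col, val] = storage_[m];
  (void)row;  // indices unused in this sketch
  (void)col;
  return val;
}
```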

A partial implementation of the AOS format has been provided in AOSMatrix.hpp.

Complete the implementations for the two member functions in AOSMatrix.hpp:

  1. matvec – computes matrix by vector product

  2. t_matvec – computes the transpose matrix by vector product

Sparse-Matrix Vector Product Performance

The matvec.exe program creates a sparse matrix representing a discretized 2D Laplacian. The program takes as an argument the maximum size of the grid along one dimension and runs sizes from 64 to the specified max in powers of 2 (the default max is 2048). Note that these are not the matrix dimensions. For a grid size of 64, the matrix dimension is 4096 by 4096 (which was about the limit of what was reasonable to do with a dense matrix). A grid size of 2048 is represented by a matrix with 4M rows and 4M columns. (How much memory would a dense matrix of that size require?)
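As a back-of-the-envelope sanity check on that parenthetical question (an illustration, not part of the graded work): 4M rows is \(2^{22}\), so a dense matrix would hold \(2^{22} \times 2^{22} = 2^{44}\) doubles, or \(2^{47}\) bytes – 128 TiB.

```cpp
#include <cstdint>

// Back-of-the-envelope: memory for a dense n x n matrix of doubles,
// where n = 2048 * 2048 (the matrix dimension for grid size 2048).
constexpr std::uint64_t n     = 2048ull * 2048ull;  // 4,194,304 = 2^22
constexpr std::uint64_t bytes = n * n * sizeof(double);
static_assert(bytes == (1ull << 47), "2^22 * 2^22 * 2^3 bytes = 128 TiB");
```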

For each problem (grid) size, the program prints the performance (in GFlops/sec) along with the grid size, matrix size, and number of nonzeros in the matrix.

Compile and run matvec_test.exe and matvec.exe.

Write (and answer) your own question about sparse matrix computation. This should be a non-trivial question; “How do you spell CSR?” does not qualify.

(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product compare to what you previously achieved for dense-matrix by dense-matrix product? Explain, and quantify if you can (e.g., using the roofline model).

(What I am looking for here and in similarly phrased questions is the following: The matmat performance you measured previously and the sparse matrix-vector performance you just measured. By what factor do they differ? (What is the ratio of one to the other?) Then, using your roofline model for your CPU, you should be able to predict what that ratio might be, based on the algorithm characteristics. Hint: slides 68, 70, and maybe 72 from lecture 8. If the numbers don’t come out exactly the same, that’s okay – the point is to show that you can use the roofline model to estimate performance.)

Referring to the roofline model and the sparse matrix-vector and dense matrix-matrix algorithms, what performance ratio would you theoretically expect to see if neither algorithm was able to obtain any reuse from cache?

(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product for COO compare to CSR? Explain, and quantify if you can (e.g., using the roofline model).

Sparse-Matrix Dense-Matrix Product

So far we have considered sparse matrices in the context of linear systems – noting that sparse matrix by vector product is a fundamental operation. In this problem we want to consider sparse matrix by dense matrix product.

First, in the file amath583sparse.cpp there are functions not just for matrix by vector product but for matrix-by-matrix product as well. As with matrix-vector mult, these dispatch to member functions of the corresponding classes.

Complete the mult functions for matrix-matrix product in COOMatrix, CSRMatrix, and CSCMatrix.

(AMATH 583) Complete the mult functions for matrix-matrix product in AOSMatrix.

Compile and run matmat_test.exe and matmat.exe. As with matvec.exe, matmat.exe takes an argument that specifies the maximum problem size – where problem size in this case is the size of the sparse matrix. It also takes an argument that specifies the maximum number of columns in the matrix B (the dense matrix operand in the sparse matrix by dense matrix product).

(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by dense matrix product (SPMM) compare to sparse-matrix by vector product (SPMV)? The performance for SPMM should be about the same as for SPMV in the case of a 1 column dense matrix. What is the trend with increasing numbers of columns? Explain, and quantify if you can, using the roofline model.

(Include your answer in Questions.rst.) How does the performance of sparse matrix by dense matrix product (in GFLOP/s) compare to the results you got for dense matrix-matrix product in previous assignments? Explain, and quantify if you can, using the roofline model.

Answer the following questions (append to Questions.rst):

  a) The most important thing I learned from this assignment was…

  b) One thing I am still not clear on is…

Array of Structs vs Struct of Arrays (AMATH583)

In ps4, we got a taste of how best to store and access data in memory in the context of storing images. We could, for instance, store each color plane separately, so that the image is a “structure of arrays”:

[Figure: “struct of arrays” pixel layout – each color plane stored as a separate array]

Or, we could store the three color values of each pixel together, so that the image is an “array of structures”:

[Figure: “array of structs” pixel layout – the three color values of each pixel stored together]

Or, we could store just a single large array, treating the image as a “tensor” and use a lookup formula to determine how the data are stored.

In sparse matrix storage, COOMatrix is essentially a “structure of arrays”, where row_indices_, col_indices_, and storage_ are stored as three separate arrays. On the other hand, AOSMatrix is an “array of structures”, where storage_ is an array of elements, each of which is a std::tuple<size_t, size_t, double>.

How does the performance (in GFLOP/s) for sparse-matrix by vector product of struct of arrays (COOMatrix) compare to that of array of structs (AOSMatrix)? Explain what about the data layout and access pattern would result in better performance.

How does the performance (in GFLOP/s) for sparse-matrix by dense matrix product of struct of arrays (COOMatrix) compare to that of array of structs (AOSMatrix)? Explain what about the data layout and access pattern would result in better performance.

Extra Credit

Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by dense-matrix product. Explain, and quantify if you can, using the roofline model.

Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by vector product. Explain, and quantify if you can, using the roofline model.

(Be creative. Don’t necessarily restrict yourself to loop optimizations. Think back to previous assignments where performance was increased in ways maybe having not to do with just loop optimizations.)

Submitting Your Work

Submit your files to Gradescope. To log in to Gradescope, point your browser to gradescope.com and click the login button. On the form that pops up select “School Credentials” and then “University of Washington netid”. You will be asked to authenticate with the UW netid login page. Once you have authenticated you will be brought to a page with your courses. Select amath583sp22.

There you will find two assignments for 483 and 583 respectively: “midterm - coding” and “midterm - written”. For the coding part, you only need to submit the files that you edited:

  • AOSMatrix.hpp (583 only)

  • COOMatrix.hpp

  • CSCMatrix.hpp

  • CSRMatrix.hpp

The autograder should provide you feedback within 10 minutes of submission. Please contact the TAs if you see “Autograder failed to execute correctly”. Do not attach your code; the TAs will have access to your submission on Gradescope.

For the written part, you will need to submit your Questions.rst as a PDF file. You can use any modern text processor (Word, Pages, TextEdit, etc.), or console tools like pandoc or rst2pdf, to convert it to a PDF file. Please make sure to match the relevant pages of your PDF with the questions on Gradescope.

If you relied on any external resources, include references to them in your document as well.