Assigned: Apr 28, 2022
Due: May 5, 2022
Extended to May 6, 2022
Lecture 9
Sparse matrices are the central data structure in high-performance scientific computing and we will continue to build on them for the rest of the course.
As with the previous assignment, this assignment is extracted into a directory named midterm. Create a new subdirectory named “midterm” to hold your files for this assignment. The files for this assignment are in a “tar ball” that can be downloaded as midterm.tar.gz.
To extract the files for this assignment, download the tar ball and copy it (or move it) to your midterm subdirectory. From a shell program, change your working directory to be the midterm subdirectory and execute the command
$ tar zxvf midterm.tar.gz
The files that should be extracted from the tape archive file are
AOSMatrix.hpp: Declarations for Array of Structs Format Sparse Matrix class
COOMatrix.hpp: Declarations for Coordinate Format Sparse Matrix class
CSCMatrix.hpp: Declarations for Compressed Sparse Column Format Sparse Matrix class
CSRMatrix.hpp: Declarations for Compressed Sparse Row Format Sparse Matrix class
Makefile
Matrix.hpp: Declarations for Matrix class
Questions.rst: File containing questions to be answered for this assignment
Timer.hpp: Simple timer class
Vector.hpp: Declarations for Vector class
amath583IO.cpp: Definitions (implementations) of Matrix and Vector IO functions
amath583IO.hpp: Declarations of Matrix and Vector IO functions
amath583sparse.cpp: Definitions (implementations) of Sparse Matrix functions
amath583sparse.hpp: Declarations of Sparse Matrix functions
amath583.cpp: Definitions (implementations) of Matrix and Vector functions
amath583.hpp: Declarations of Matrix and Vector functions
catch.hpp: Catch2 testing framework header
matmat.cpp: sparse matrix by dense matrix product driver
matmat_test.cpp: sparse matrix by dense matrix product test
matvec.cpp: sparse matrix by vector product driver
matvec_test.cpp: sparse matrix by vector product test
verify-*.bash: Verification scripts
For historical reasons this course has had a midterm exam, about, well, midway through the course.
As I hope you have seen with the assignments thus far in the course, my intention for the assignments is that they be instructive and tutorial much more than they be evaluative. I feel the same way about the exams in the course - they are intended to be learning experiences.
At the same time, I do have to do some assessment of how well each of you has progressed in these learning experiences. So, this assignment and the very last one are designated as exams. The rules are the same as for the other assignments, except that the coding part of the midterm will be evaluated with the autograder, same as ps2/ps3, and the writing part will be evaluated by hand.
As with previous assignments, there are tests supplied with the midterm that you can use to verify your solutions.
We haven’t really had any requirements on commenting your code, either with // or with the /* */ pair. Continuing in this regard, general code comments will not be required for the midterm, but they are highly suggested. Clarifying comments may help the graders award partial credit if your code doesn’t exhibit the correct behavior.
“Code is a medium to communicate with other programmers”.
Good comments will also make your code understandable to both you and others.
The work for this assignment will mostly be carried out in the header files for the corresponding classes (any other files will be specified in each problem). There are four driver programs that will be used to test and time your implementations:
matvec_test.cpp
matvec.cpp
matmat_test.cpp
matmat.cpp
Targets for the corresponding exe files are given in your Makefile.
The source code files for 483 and 583 are the same. There are certain sections in the source files above that are guarded by a preprocessor directive. In your Makefile there is a macro DEFS. If you are a 583 student, make sure it is defined as

DEFS := -D__583

by removing the comment symbol “#” from that line, so that the guarded code (having to do with the AOS matrix) is compiled. If you are a 483 student, make sure the macro is left empty:

DEFS :=

by removing the comment symbol “#” from that line instead, so that the guarded code is not compiled.
Similarly, to properly run the verification scripts (depending on whether you are in 483 or 583), comment out the appropriate line in verify-opts.bash.
If you are a 483 student, the correct define should be
declare -a ALLDEFS=("")
If you are a 583 student, the correct define should be
declare -a ALLDEFS=("-D__583")
Recall that mathematically, matrix-vector product \(y \leftarrow A x\) is defined by:

\[y_i \leftarrow y_i + \sum_{j} a_{ij} \, x_j\]
In the dense matrix case, we implemented this as
for (size_t i = 0; i < A.num_rows(); ++i) {
for (size_t k = 0; k < A.num_cols(); ++k) {
y(i) += A(i, k) * x(k);
}
}
The function loops through all values of the matrix A, indexes into A with i and k, indexes into y with i, and indexes into x with k.
In the definition of matrix-vector product, the key invariant relationship is that the matrix element has two indices – the left index corresponds to the index into \(y\) and the right index corresponds to the index into \(x\). When we implement this product as a program, we assume we have all the values stored, and so we store them in a rectangular array – which is indexed by the same indices associated with each value of the finite dimensional linear operator being represented by A.
But, we don’t have to represent matrices that way. In particular, if we have a sparse matrix (one with many more zero elements than non-zero elements), the non-zero elements are not stored in a rectangular array (with implicit indices) – rather, each non-zero element is stored along with its index values. That is, abstractly, a sparse matrix (any matrix, really) is represented by a set of triples:

\[A = \{\, (a, i, j) \,\}\]

where \(a\) is a Real or Complex number and \(i\) and \(j\) are integers.
Matrix-vector product is still abstractly the same, but now for each \(y_i\) we are summing over those triples in \(A\) whose first index is equal to \(i\):

\[y_i \leftarrow y_i + \sum_{(a,\, i,\, j) \in A} a \, x_j\]
There are two complementary ways to store a set of triples in C++: as a struct of arrays or as an array of structs. In the former case, we store the values for each triple in separate arrays (at the same location), such that for some integer, say, m, the values a[m], i[m], and j[m] represent a triple in the sparse matrix being stored. This is how the COO format is defined, where row_indices_, col_indices_, and storage_ are used to store the components of each triple.
Matrix-vector product is then carried out as:
void matvec(const Vector& x, Vector& y) const {
for (size_t m = 0; m < storage_.size(); ++m) {
y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
}
}
Note that m is being used to index into each of row_indices_, col_indices_, and storage_.
As we saw in lecture, we can eliminate some redundant index information that is being stored by sorting the triples either according to the first index (the row number) or the second index (the column number) and then compressing the resulting sequences of repeated index values. This gives us the CSR and CSC formats, respectively.
Regardless of representation – COO, CSR, CSC, or AOS – the fundamental idea is that we are storing triples, which we then use in matrix-vector (and matrix-matrix) product. All of the matrix-vector product functions for these formats carry out essentially the same process: get the row number, the column number, and the value for each matrix element, and then update the appropriate element of the vector y using the product of the matrix element and the appropriate value of the vector x.
Computing the product \(y \leftarrow A^\top x\) – or, equivalently, \(y \leftarrow x A\) – is defined as:

\[y_j \leftarrow y_j + \sum_{i} a_{ij} \, x_i\]
We are doing the same thing as before, but the roles of the first and second index are exchanged.
How would you modify the following code to carry out the transpose matrix-vector product?
void matvec(const Vector& x, Vector& y) const {
for (size_t m = 0; m < storage_.size(); ++m) {
y(row_indices_[m]) += storage_[m] * x(col_indices_[m]);
}
}
Hint: In the mathematical formula, we swapped to use the second index for indexing \(y\) and the first index for indexing \(x\). What would the corresponding change be to the code?
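In code form, the swap looks like the following – a sketch using the COO member names from above (this is the pattern the handout calls t_matvec):

void t_matvec(const Vector& x, Vector& y) const {
  for (size_t m = 0; m < storage_.size(); ++m) {
    // The roles of the row and column indices are exchanged relative to matvec.
    y(col_indices_[m]) += storage_[m] * x(row_indices_[m]);
  }
}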
As supplied, the driver programs matvec_test.cpp and matvec.cpp will compile and run for the CSR and COO formats; for CSC (which has not been implemented), matvec_test.exe will fail the tests and matvec.exe will print nonsense values.
Compile and run matvec_test.exe and matvec.exe
$ make matvec_test.exe
$ ./matvec_test.exe
$ make matvec.exe
$ ./matvec.exe
The driver matvec.exe will run and time matrix-vector product using a discretized 2D Laplacian (as presented in lecture) for various problem sizes and print the performance (in GFlops/sec) on each line.
Notice that there are columns in the matvec.exe output for both the matvec time and for the matvec transpose time.
NNZ COO COO^T CSR CSR^T CSC CSC^T AOS AOS^T
20480 1.40571 1.57306 2.09741 2.03288 2.03288 2.03288 1.34834 1.8101
81920 1.36646 1.65683 2.03918 2.10391 2.10391 2.10391 1.41007 1.69931
327680 1.30436 1.5622 1.97572 1.89224 1.84039 1.89224 1.13855 1.4142
We will be implementing both matrix-vector product and transpose-matrix vector product.
The functionality for sparse matrix classes is contained in the sparse matrix class declaration files (COOMatrix.hpp, CSRMatrix.hpp, CSCMatrix.hpp, and AOSMatrix.hpp) and in the two files amath583sparse.hpp and amath583sparse.cpp. All of the member functions for the classes are contained in the class declarations themselves. The functions in amath583sparse are things like norm, matrix-vector product, and so forth.
There is a bit more inside of the sparse matrix classes than there was in the Matrix class. With the Matrix class, we could maintain a private internal representation of the Matrix, expose just operator(), and do everything we needed to with the Matrix. For a sparse matrix, storing \((a, i, j)\) triples, we can’t provide an operator() interface – we instead have to provide an interface with more specialized functionality.

In the case of Matrix, the mult function was a free function, just using operator():
void mult(const Matrix& A, const Vector& x, Vector& y) {
  assert(y.num_rows() == A.num_rows());
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < A.num_cols(); ++j) {
      y(i) += A(i, j) * x(j);
    }
  }
}
It doesn’t matter how the matrix is implemented inside; this function just needs operator(). Note that we also provide a function mult of two arguments, that returns y:
Vector mult(const Matrix& A, const Vector& x) {
  Vector y(A.num_rows());
  mult(A, x, y);
  return y;
}
It allocates y, calls the three-argument version of mult, and then returns y.
We have mult functions with the same interface for COOMatrix:
void mult(const COOMatrix& A, const Vector& x, Vector& y) {
  A.matvec(x, y);
}
Vector mult(const COOMatrix& A, const Vector& x) {
  Vector y(A.num_rows());
  mult(A, x, y);
  return y;
}
Note that now we invoke the member function matvec() of the COOMatrix. That in turn is defined in COOMatrix.hpp:
void matvec(const Vector& x, Vector& y) const {
  for (size_t k = 0; k < storage_.size(); ++k) {
    y(row_indices_[k]) += storage_[k] * x(col_indices_[k]);
  }
}
This pattern is also used by CSRMatrix, CSCMatrix, and AOSMatrix.
Note: DRY (Don’t Repeat Yourself)
All four of the sparse matrix classes we are working with have the same set of operations declared for them.
E.g., as declared in amath583sparse.hpp:
void mult(const COOMatrix& A, const Vector& x, Vector& y);
void mult(const CSRMatrix& A, const Vector& x, Vector& y);
void mult(const CSCMatrix& A, const Vector& x, Vector& y);
void mult(const AOSMatrix& A, const Vector& x, Vector& y);
This seems like huge amounts of unnecessary repetition. As in all cases where something is repeated but with a slight difference, we need a way to express what is the same once, while parameterizing what is different so that we can capture the four (here) cases we actually need. In this case, it is the types of the parameters being passed that vary. C++ has a mechanism (an extremely powerful mechanism) for parameterizing on types: templates, which allow both functions and data structures to be parameterized. We will come back to templates in a week or two, but using templates, the declarations above would compact to:
template <class MatrixType>
void mult(const MatrixType& A, const Vector& x, Vector& y);
(In fact, if you look at the definitions of mult for the sparse matrices, they are also all the same. As with regular functions, function templates need to have a matching declaration and definition – so the implementations would be compacted as well.)
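Using templates, the four identical definitions could likewise collapse into one. A sketch (assuming each matrix class provides the matvec member function shown above):

template <class MatrixType>
void mult(const MatrixType& A, const Vector& x, Vector& y) {
  A.matvec(x, y);  // dispatch to the member function of whichever matrix type is passed
}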
For the first part of this assignment we are going to complete the implementations of the functions necessary to fill in the table printed out by matvec.exe. We are going to have both Matrix-Vector Product and Transpose Matrix-Vector Product implemented for our sparse matrix classes; both are needed by matvec.exe. We have provided the Matrix-Vector Product matvec as a hint in both COOMatrix.hpp and CSRMatrix.hpp.
Complete the definition of the member function t_matvec in COOMatrix.hpp and in CSRMatrix.hpp.
There are several functions that are not completed as the code is handed to you for this assignment. As a result, when you run matvec_test.exe you will see error messages for many failures. One of them will look like:
-------------------------------------------------------------------------------
Identity
COO
-------------------------------------------------------------------------------
When your function is working properly that error message will go away.
In lecture we derived the compressed sparse row format from coordinate format by first sorting the data based on the row indices stored in COO. By applying run-length encoding we compressed the row indices from a size equal to the number of non-zeros in the matrix down to the number of rows in the matrix (plus one).
A similar process could have been applied had we instead sorted the coordinate data by columns rather than rows. Such a format is known as Compressed Sparse Column (CSC) format.
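For illustration (with hypothetical member names – the actual names are in the provided CSCMatrix.hpp), sorting the triples of a small matrix by column and run-length encoding the column indices produces a column-pointer array:

// Triples (value, row, col) of a small 3x3 matrix, sorted by column:
//   (1.0, 0, 0)  (2.0, 2, 0)  (3.0, 1, 1)  (4.0, 0, 2)  (5.0, 2, 2)
// The sorted column indices 0 0 1 2 2 compress to a column-pointer array:
// the entries of column j occupy positions col_ptr_[j] through
// col_ptr_[j+1] - 1 of the other two arrays.
std::vector<size_t> col_ptr_     = {0, 2, 3, 5};
std::vector<size_t> row_indices_ = {0, 2, 1, 0, 2};
std::vector<double> storage_     = {1.0, 2.0, 3.0, 4.0, 5.0};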
An implementation for the CSC format has been provided in CSCMatrix.hpp.
Complete the implementations for the two member functions in CSCMatrix.hpp:

1. matvec – computes matrix by vector product
2. t_matvec – computes the transpose matrix by vector product
In class we discussed that there are two ways to represent a coordinate (COO) sparse matrix that explicitly keeps all three values of the triples representing the matrix data: struct of arrays or array of structs. Our implementation thus far (which we call COOMatrix) has been the former: the row index information, the column index information, and the values are all stored in separate arrays.
You are to complete functionality for an implementation of COO sparse matrix as an array of structs (AOS). With AOS, the sparse matrix information is kept in a single array, where each entry in the array is a data structure consisting of three pieces of information: row index, column index, and value.
In our AOS implementation, we represent each triple as a C++ tuple holding two indices and a value (two size_t and a double):
typedef std::tuple<size_t, size_t, double> element;
and then keep a single array (as a std::vector) of those tuples as the matrix storage:
std::vector<element> storage_;
Thus, the expression to get the m-th element of the storage would be
element aik = storage_[m];
which would return an element – which is a triple. The function for accessing a particular element of a C++ tuple is called std::get. To get the row, column, and value of a tuple, we would write:
element aik = storage_[m];
size_t row = std::get<0>(aik);
size_t col = std::get<1>(aik);
double val = std::get<2>(aik);
This is equivalent to the COO case:
size_t row = row_indices_[m];
size_t col = col_indices_[m];
double val = storage_[m];
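As an aside (not required for the assignment), if you compile with C++17 or later, structured bindings unpack the tuple in a single declaration:

auto [row, col, val] = storage_[m];  // same as the three std::get<> calls above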
An implementation for the AOS format has been provided in AOSMatrix.hpp.
Complete the implementations for the two member functions in AOSMatrix.hpp:

1. matvec – computes matrix by vector product
2. t_matvec – computes the transpose matrix by vector product
The matvec.exe program creates a sparse matrix representing a discretized 2D Laplacian (as presented in lecture). The program takes as an argument the maximum size of the grid along one dimension and runs sizes from 64 to the specified max in powers of 2 (the default max is 2048). Note that these are not the matrix dimensions. For a grid size of 64, the matrix dimension is 4096 by 4096 (which was about the limit of what was reasonable to do with a dense matrix). A grid size of 2048 is represented by a matrix with 4M rows and 4M columns. (How much memory would a dense matrix of that size require?)
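As a back-of-the-envelope estimate (ours, not part of the handout): a grid size of 2048 gives a matrix dimension of \(2048^2 = 2^{22}\), so a dense matrix of double would require

\[2^{22} \times 2^{22} \times 8 \text{ bytes} = 2^{47} \text{ bytes} = 128\ \text{TiB}.\]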
For each problem (grid) size, the program prints the performance (in GFlops/sec) along with the grid size, matrix size, and number of nonzeros in the matrix.
Compile and run matvec_test.exe and matvec.exe.
Write (and answer) your own question about sparse matrix computation. This should be a non-trivial question; “How do you spell CSR?” does not qualify.
(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product compare to what you previously achieved for dense-matrix by dense-matrix product? Explain, and quantify if you can (e.g., using the roofline model).
(What I am looking for here, and in similarly phrased questions, is the following: compare the matmat performance you measured previously with the sparse matrix-vector performance you just measured. By what factor do they differ? (What is the ratio of one to the other?) Then, using your roofline model for your CPU, you should be able to predict what that ratio might be, based on the algorithm characteristics. Hint: slides 68, 70, and maybe 72 from lecture 8. If the numbers don’t come out exactly the same, that’s okay – the point is to show that you can use the roofline model to estimate performance.)
Referring to the roofline model and the sparse matrix-vector and dense matrix-matrix algorithms, what performance ratio would you theoretically expect to see if neither algorithm was able to obtain any reuse from cache?
(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by vector product for COO compare to CSR? Explain, and quantify if you can (e.g., using the roofline model).
So far we have considered sparse matrices in the context of linear systems – noting that sparse matrix by vector product is a fundamental operation. In this problem we want to consider the performance of sparse matrix by dense matrix product.
First, note that in the file amath583sparse.cpp there are functions not just for matrix by vector product but for matrix by matrix product as well. As with the matrix-vector mult, these dispatch to member functions of the corresponding classes.
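By analogy with the matrix-vector case, the dispatch presumably looks like the following (the member function name matmat is our assumption here – check amath583sparse.cpp for the actual names):

void mult(const CSRMatrix& A, const Matrix& B, Matrix& C) {
  A.matmat(B, C);  // free function forwards to the class's matrix-matrix member
}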
Complete the mult functions for matrix-matrix product in COOMatrix, CSRMatrix, and CSCMatrix.
(AMATH 583) Complete the mult functions for matrix-matrix product in AOSMatrix.
Compile and run matmat_test.exe and matmat.exe. As with matvec.exe, matmat.exe takes an argument that specifies the maximum problem size – where problem size in this case is the size of the sparse matrix. It also takes an argument that specifies the maximum number of columns in the matrix B (the dense matrix operand in the sparse matrix by dense matrix product).
(Include your answer in Questions.rst.) How does the performance (in GFLOP/s) for sparse-matrix by dense matrix product (SPMM) compare to sparse-matrix by vector product (SPMV)? The performance for SPMM should be about the same as for SPMV in the case of a 1 column dense matrix. What is the trend with increasing numbers of columns? Explain, and quantify if you can, using the roofline model.
(Include your answer in Questions.rst.) How does the performance of sparse matrix by dense matrix product (in GFLOP/s) compare to the results you got for dense matrix-matrix product in previous assignments? Explain, and quantify if you can, using the roofline model.
Answer the following questions (append to Questions.rst):

a) The most important thing I learned from this assignment was…
b) One thing I am still not clear on is…
In ps4 we got a taste of how best to store and access data in memory, in the context of storing images. We could, for instance, store each color plane separately, so that the image is a “structure of arrays.” Or, we could store the three color values of each pixel together, so that the image is an “array of structures.” Or, we could store just a single large array, treating the image as a “tensor,” and use a lookup formula to determine how the data are stored. (The three layouts are sketched below.)
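A sketch of the three layouts (illustrative names, not the actual ps4 definitions):

// Structure of arrays: one array per color plane.
std::vector<double> red, green, blue;   // pixel k is (red[k], green[k], blue[k])

// Array of structures: one struct per pixel.
struct Pixel { double red, green, blue; };
std::vector<Pixel> pixels;

// Single "tensor" array with a lookup formula.
std::vector<double> image;              // image[(i * num_cols + j) * 3 + plane]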
In sparse matrix storage, COOMatrix is essentially a “structure of arrays,” where row_indices_, col_indices_, and storage_ are stored as three separate arrays. On the other hand, AOSMatrix is an “array of structures,” where storage_ is an array of elements, each of which is a std::tuple<size_t, size_t, double>.
How does the performance (in GFLOP/s) for sparse-matrix by vector product of struct of arrays (COOMatrix) compare to that of array of structs (AOSMatrix)? Explain what about the data layout and access pattern would result in better performance.
How does the performance (in GFLOP/s) for sparse-matrix by dense matrix product of struct of arrays (COOMatrix) compare to that of array of structs (AOSMatrix)? Explain what about the data layout and access pattern would result in better performance.
Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by dense-matrix product. Explain, and quantify if you can, using the roofline model.
Experiment with some of the optimizations we developed previously in the course for matrix-matrix product and apply them to sparse-matrix by vector product. Explain, and quantify if you can, using the roofline model.
(Be creative. Don’t necessarily restrict yourself to loop optimizations. Think back to previous assignments where performance was increased in ways that perhaps had nothing to do with loop optimizations.)
Submit your files to Gradescope. To log in to Gradescope, point your browser to gradescope.com and click the login button. On the form that pops up select “School Credentials” and then “University of Washington netid”. You will be asked to authenticate with the UW netid login page. Once you have authenticated you will be brought to a page with your courses. Select amath583sp22.
There you will find two assignments for 483 and 583 respectively: “midterm - coding” and “midterm - written”. For the coding part, you only need to submit the files that you edited:
AOSMatrix.hpp (583 only)
COOMatrix.hpp
CSCMatrix.hpp
CSRMatrix.hpp
The autograder should provide you feedback within 10 minutes of your submission. Please contact the TAs if you see “Autograder failed to execute correctly.” Do not attach your code – the TAs will have access to your submission on Gradescope.
For the written part, you will need to submit your Questions.rst as a PDF file. You can use any modern text processor (Word, Pages, TextEdit, etc.), or console tools like pandoc or rst2pdf, to convert it to a PDF. Please make sure to match the relevant pages of your PDF with the questions on Gradescope. If you relied on any external resources, include the references in your document as well.