Assigned: May 30, 2020
Due: Jun 5, 2020
Lectures 15, 16 (CUDA)
Lectures 21, 22 (C++ templates and generic programming)
Walk through of connecting to AWS using VS Code
As with the previous assignment, this assignment extracts into a directory ps7. The tar file is at this link.
When you extract the files, you will see the following directory structure, which is similar to, but not exactly the same as the previous assignment:
ps7/
├── Questions.rst
├── README.rst
├── data/
├── hello_omp/
│   ├── Makefile
│   ├── getenv.cpp
│   ├── hello_omp_2_threads.cpp
│   ├── hello_omp_critical.cpp
│   ├── hello_omp_num_threads.cpp
│   ├── hello_omp_parallel.cpp
│   ├── hello_omp_parallel_critical.cpp
│   └── ompi_info.cpp
├── include/
│   ├── AOSMatrix.hpp
│   ├── COOMatrix.hpp
│   ├── CSCMatrix.hpp
│   ├── CSRMatrix.hpp
│   ├── Make.inc
│   ├── Matrix.hpp
│   ├── Timer.hpp
│   ├── Vector.hpp
│   ├── amath583.hpp      /* Everything is in here now */
│   ├── amath583IO.hpp
│   ├── catch.hpp
│   ├── getenv.hpp
│   ├── norm_utils.hpp
│   ├── norms.hpp
│   ├── pagerank.hpp
│   └── util.hpp          /* NEW */
├── matvec/
│   ├── Makefile
│   ├── matmat.cpp
│   ├── matvec.cpp
│   ├── pmatmat.cpp
│   └── pmatvec.cpp
├── norm/
│   ├── Makefile
│   ├── norm_block_critical.cpp
│   ├── norm_block_reduction.cpp
│   ├── norm_cyclic_critical.cpp
│   ├── norm_cyclic_reduction.cpp
│   ├── norm_parfor.cpp
│   └── norm_seq.cpp
├── pagerank/
│   ├── Makefile
│   └── pagerank.cpp
├── test/
│   ├── Makefile
│   ├── matmat_test.cpp
│   ├── matvec_test.cpp
│   ├── norms_test.cpp
│   ├── pagerank_test.cpp
│   ├── pmatmat_test.cpp
│   └── pmatvec_test.cpp
└── warmup/
    ├── Makefile
    ├── hello_script.bash
    ├── nonsense.cpp
    ├── overloads.cpp
    └── specialization.cpp
Note that:
There is one new file: include/util.hpp
There is no longer a src
subdirectory (nor the library source files)
There is no longer include/amath583sparse.hpp
Most (all?) of the functions have been completed with a reference (my own) implementation.
In addition to the setup for connecting to AWS, you should update the setting “C_Cpp > Default: Cpp Standard” to c++17.
Per the walkthrough video, connect to AWS:
Copy or move your public and private keys to your .ssh subdirectory.
Use VS Code (or an editor of your choice, if you feel comfortable doing so) to create (or modify) the config file .ssh/config so that it contains an entry for the AWS instance.
Copy your existing ps6 subdirectory there using rsync (not ps7)
Rerun the timing programs in matvec, norm, and pagerank. (You may also find running those in hello_omp to be interesting). These aren’t “official” timings, but are just to give you a feel for a platform with a few more cores than you may have already been using (we will be using even more a bit further down).
Copy ps7 to AWS – you can either copy the whole directory with rsync after you have extracted it on your laptop or copy the tar file (using rsync or scp) and extract the tar files there. I suggest keeping your ps7 files in just one place – AWS – so that you don’t get into the problem of making some edits in one place and some in another and not keeping different locations consistent. There are tools to manage that process (source code control, e.g., git) but those take a while to get up to speed with and can cause more trouble than they save in the meantime.
As a reminder, an entry in .ssh/config
should look like the following:
# Read more about SSH config files: https://linux.die.net/man/5/ssh_config
Host ps6
    HostName 54.184.35.40
    User al75
    IdentityFile /Users/lums658/.ssh/al75_id_rsa
Replace “al75” with your UW net ID, and change the IdentityFile path to point at the key in your own .ssh subdirectory.
Note
EFS
The home directories on our AWS instance are stored in Amazon’s EFS (elastic file system). We will be using different machines for multicore, GPU, and cluster assignments, but they will all connect to the same file system – the same files will be on any machine we use.
Most (all?) modern C++ libraries are expected to compose with other libraries – libraries written by other authors, at other times, in other places. Being able to safely and reliably (and confidently) compose independently developed libraries is of utmost importance for constructing large software systems from smaller software components.
In C++ this mechanism is provided by type parameterization (“templates”).
In the pagerank problems in previous assignments, we found that certain data structure choices offered better performance in different situations – we might want to use CSRMatrix
in some situations or CSCMatrix
in others.
But consider the implementations of pagerank
if we wanted to support both:
Vector pagerank(const CSRMatrix& P, double alpha, double tol, size_t max_iters) {
  Vector x(P.num_rows(), 1.0 / static_cast<double>(P.num_rows()));
  for (size_t i = 0; i < max_iters; ++i) {
    Vector y = mult(x, P);
    y *= alpha;
    y += (1.0 - alpha) / static_cast<double>(x.num_rows());
    if (two_norm(x - y) < tol * two_norm(x)) {
      return y;
    }
    std::swap(x, y);
  }
  return x;
}
Vector pagerank(const CSCMatrix& P, double alpha, double tol, size_t max_iters) {
  Vector x(P.num_rows(), 1.0 / static_cast<double>(P.num_rows()));
  for (size_t i = 0; i < max_iters; ++i) {
    Vector y = mult(x, P);
    y *= alpha;
    y += (1.0 - alpha) / static_cast<double>(x.num_rows());
    if (two_norm(x - y) < tol * two_norm(x)) {
      return y;
    }
    std::swap(x, y);
  }
  return x;
}
There is literally only one character that differs between the two functions. But, we need both if we are going to support both CSCMatrix
and CSRMatrix
(and we would need a new one for COOMatrix
and one for AOSMatrix
and one for any new sparse matrix type).
Similarly, in the body of the pagerank
function, we have a call to mult
, for which we then also need multiple almost-identical-except-for-one-character implementations (one for every matrix type):
void mult(const Vector& x, const CSCMatrix& A, Vector& y) {
  A.t_matvec(x, y);
}

void mult(const Vector& x, const CSRMatrix& A, Vector& y) {
  A.t_matvec(x, y);
}
Notice that in each of these cases, nothing about the functions depends on implementation details of the matrix type (which is why we have identical function bodies for both). This is a fundamental principle of generic programming: functions should be written in terms of what they need from their types in order to operate correctly and efficiently. Types then need to be written to adhere to those requirements. (Generic programming is sometimes – rightly – characterized as “algorithm-oriented” programming.)
To have a slightly richer function body, let’s consider the case of dense row-oriented and column-oriented matrices – the RowMatrix and ColMatrix classes, respectively (they have been lurking in Matrix.hpp all quarter). The matrix multiply algorithms for each class would be very similar:
void multiply(const RowMatrix& A, const RowMatrix& B, RowMatrix& C) {
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < B.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
void multiply(const ColMatrix& A, const ColMatrix& B, ColMatrix& C) {
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < B.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
In fact, the function bodies would be the same as for the dense Matrix
class
we have been using all quarter. Moreover,
the functions (which are overloads) are invoked identically:
RowMatrix A(size), B(size), C(size);
ColMatrix D(size), E(size), F(size);
multiply(A, B, C);
multiply(D, E, F);
Notice that RowMatrix
and ColMatrix
have the same interface based primarily on operator()
–
and deliberately so. A matrix has particular semantics and both
RowMatrix
and ColMatrix
should present those. The difference
between the two types has nothing to do with how they are used – the
difference is simply how they internally represent the matrix.
Now, despite having the same interfaces and same semantics, since
RowMatrix
and ColMatrix
are different types, we have to have
different multiply
functions for them. That is, although the syntax for accessing elements is the same in each case – a call to operator()
– these are actually different functions and in the code that is actually running, the operator()
function belonging to the class being used is the one that needs to be called. That is, if we are multiplying two RowMatrix
matrices together, the operator()
inside multiply
needs to be the function belonging to RowMatrix
(and similarly for ColMatrix
). So even though the syntax of the function bodies is exactly the same, they are in fact doing different things (calling completely different operator()
functions) – and so we actually need two different functions. If we have additional matrix types, we need functions for those as well.
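To make that concrete, here is a rough sketch of how the two classes might be laid out internally (this is illustrative; the actual definitions live in Matrix.hpp and may differ):

#include <cstddef>
#include <vector>

// Hypothetical sketches (not the actual Matrix.hpp definitions) of row-major
// and column-major storage.  The interface is identical; only the indexing
// inside operator() differs.
class RowMatrix {
public:
  RowMatrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(M * N) {}
  size_t num_rows() const { return num_rows_; }
  size_t num_cols() const { return num_cols_; }
  double&       operator()(size_t i, size_t j)       { return storage_[i * num_cols_ + j]; }  // row-major
  const double& operator()(size_t i, size_t j) const { return storage_[i * num_cols_ + j]; }
private:
  size_t num_rows_, num_cols_;
  std::vector<double> storage_;
};

class ColMatrix {
public:
  ColMatrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(M * N) {}
  size_t num_rows() const { return num_rows_; }
  size_t num_cols() const { return num_cols_; }
  double&       operator()(size_t i, size_t j)       { return storage_[i + j * num_rows_]; }  // column-major
  const double& operator()(size_t i, size_t j) const { return storage_[i + j * num_rows_]; }
private:
  size_t num_rows_, num_cols_;
  std::vector<double> storage_;
};

With definitions like these, C(i, j) += A(i, k) * B(k, j) compiles against either class, but the operator() that actually runs (and hence the memory access pattern) is different for each.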
To really confuse matters, suppose you wanted to use ColMatrix
and RowMatrix
together? You would need eight multiply
functions (I just show the prototypes here, the function bodies are all the same):
void multiply(const ColMatrix& A, const ColMatrix& B, ColMatrix& C);
void multiply(const ColMatrix& A, const ColMatrix& B, RowMatrix& C);
void multiply(const ColMatrix& A, const RowMatrix& B, ColMatrix& C);
void multiply(const ColMatrix& A, const RowMatrix& B, RowMatrix& C);
void multiply(const RowMatrix& A, const ColMatrix& B, ColMatrix& C);
void multiply(const RowMatrix& A, const ColMatrix& B, RowMatrix& C);
void multiply(const RowMatrix& A, const RowMatrix& B, ColMatrix& C);
void multiply(const RowMatrix& A, const RowMatrix& B, RowMatrix& C);
If we had three matrix types we would need 27 versions. And so on.
And yet, the function bodies are all the same. If you had to maintain a library with a combinatorially large number of functions, all identical except for the types, you might start thinking about automatically generating the functions. That is, if you knew you needed a multiply for a RowMatrix with a RowMatrix into a ColMatrix – or a FooMatrix with a BarMatrix into a BazMatrix – you might write a script that takes the three types and generates the necessary function:
def gen_multiply(m1, m2, m3):
    print('''
void multiply(const {first}& A, const {second}& B, {third}& C) {{
  for (size_t i = 0; i < A.num_rows(); ++i) {{
    for (size_t j = 0; j < B.num_cols(); ++j) {{
      for (size_t k = 0; k < A.num_cols(); ++k) {{
        C(i, j) += A(i, k) * B(k, j);
      }}
    }}
  }}
}}
'''.format(first=m1, second=m2, third=m3))
When we invoke this script:
gen_multiply("RowMatrix", "RowMatrix", "ColMatrix")
We get the specific function we want:
void multiply(const RowMatrix& A, const RowMatrix& B, ColMatrix& C) {
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < B.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
Let’s think for a second about what our script is doing. It is a “template” if you like (pun intended) for the function we really want. Notice that everywhere we want to stamp something in we have a formatting placeholder ({first}
, {second}
, {third}
). We could spread those placeholders out wherever we like and still generate a correct function (assuming the function is correct when the type name is substituted for the placeholder).
So if we had a matvec
generating script for the two-argument matvec
(the one that returns the result):
def gen_matvec(m1, v1):
    print('''
{second} matvec(const {first}& A, const {second}& x) {{
  {second} y(x.num_rows());
  matvec(A, x, y);
  return y;
}}
'''.format(first=m1, second=v1))
We could invoke it as
gen_matvec("RowMatrix", "Vector")
to obtain
Vector matvec(const RowMatrix& A, const Vector& x) {
  Vector y(x.num_rows());
  matvec(A, x, y);
  return y;
}
Basically, everywhere we would have a type, we can put in a placeholder and then stamp out the function we want.
Of course, this isn’t how we are really going to do things. Although it saves all of the duplication of code – we only need the “template” that is in the script – the process for using this for real programs would be quite unwieldy (i.e., impossible to really do). You have two different software systems coupled to each other – we have to feed the types we want to Python which would generate the C++ functions we want, which we would have to feed back to C++.
But the essential capability – write the function as a template in terms of placeholders that are filled in with types when you need a particular version of the function – would seem to be, well, essential for the types of situations above (these situations are ubiquitous in real code).
C++ does in fact have such a template mechanism. It lets you write function templates (and, in fact, class templates) that are then stamped out on demand to create type-specific functions as you need them (and then compiled). It is important to keep in mind that that is what is really happening – which is why I illustrated this first in terms of an external script. The difference with it all happening in C++ is that you will not see the generated function (or class) as such – but C++ is stamping them out and compiling them just as if we had generated them manually.
Consider a multiply
function template:
template <class Matrix_1_t, class Matrix_2_t, class Matrix_3_t>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < B.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
and compare this to the script we wrote to do the same thing. We are telling C++ that we are defining a function template that has three placeholders and then writing a function in terms of those placeholders.
If we now want to multiply a RowMatrix
by a RowMatrix
into a ColMatrix
, we just need to invoke the function template:
RowMatrix A(10, 10);
RowMatrix B(10, 10);
ColMatrix C(10, 10);
multiply<RowMatrix, RowMatrix, ColMatrix>(A, B, C);
When the compiler runs across the call to multiply<RowMatrix, RowMatrix, ColMatrix>
, it generates the necessary function (unless it has already generated that function from a previous use with the same types). Then, since this is a function call, it makes a call to that generated function.
Now, we have to be careful: just as we can’t call a function defined for certain types with arguments of other types, we have to make sure that the types of the arguments match the types passed to the placeholders. This would fail to compile, for example:
RowMatrix A(10, 10);
RowMatrix B(10, 10);
ColMatrix C(10, 10);
multiply<RowMatrix, RowMatrix, RowMatrix>(A, B, C);
We are defining a multiply for three RowMatrix matrices but passing in a ColMatrix as the third argument. Fortunately, in many cases, C++ can infer the types you want for the placeholders based on the arguments you pass it (template argument deduction). For multiply, we just need to invoke it as:
RowMatrix A(10, 10);
RowMatrix B(10, 10);
ColMatrix C(10, 10);
multiply(A, B, C);
and C++ can match the type of A to the placeholder for A, the type of B for the placeholder for B, and the type of C for the placeholder of C. Sometimes C++ may not be able to do this, in which case you may need to pass in explicit types to create the function you want from the function template.
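To see where this leaves our motivating example, the two pagerank overloads from the beginning of this discussion can now collapse into one function template. Below is a sketch, assuming the Vector, mult, and two_norm operations used earlier (the version in ps7a may differ in its details):

template <class SparseMatrix_t>
Vector pagerank(const SparseMatrix_t& P, double alpha, double tol, size_t max_iters) {
  Vector x(P.num_rows(), 1.0 / static_cast<double>(P.num_rows()));
  for (size_t i = 0; i < max_iters; ++i) {
    Vector y = mult(x, P);     // dispatches to whichever mult matches P's type
    y *= alpha;
    y += (1.0 - alpha) / static_cast<double>(x.num_rows());
    if (two_norm(x - y) < tol * two_norm(x)) {
      return y;
    }
    std::swap(x, y);
  }
  return x;
}

The same template serves CSRMatrix, CSCMatrix, COOMatrix, AOSMatrix, or any future sparse matrix type, as long as the type provides what the function body needs.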
Templates are an extraordinarily powerful mechanism in C++. In fact, the template mechanism in C++ is its own Turing-complete language – one can perform arbitrary compile-time operations using it. (C++ has more recently introduced the constexpr mechanism for compile-time computation, which should generally be used instead of template metaprogramming.)
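As a small taste of constexpr (a generic illustration, not code from the assignment):

#include <cstddef>

// Evaluated entirely by the compiler; fibonacci(10) is a compile-time constant.
constexpr size_t fibonacci(size_t n) {
  return (n < 2) ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

static_assert(fibonacci(10) == 55, "checked at compile time, not at run time");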
There are a few things to be aware of when using function templates.
1. Function templates are not functions. As the name implies, function templates are, well, templates. They are not actual functions and are not compiled – by themselves they don’t have enough information to be compiled. The type parameter needs to be bound to an actual type – then the compiler has enough information to compile the function. One refers to the creation of a function when the type parameter is bound as “instantiation.”
2. The model for using function templates is different than for concrete functions. To use a concrete function in a program we need its declaration. The function itself can exist elsewhere and be compiled separately from where it is being called. We can’t compile a function template separately – we don’t even know what it will be until we compose it with a particular type. When the function template is composed with a particular type, then the compiler can create an actual function based on this composition (an instantiation) and compile it. Accordingly, we have
3. Function templates are included completely in header files. The full text of the function template needs to be present when it is instantiated, so the full text of the function template is included as its declaration and definition in the header file.
4. Errors in function templates only become exposed upon composition. Since the function is compiled only when it is composed with a particular type, we won’t see any errors in the function (or with the composition) until the composition occurs and the compiler attempts to compile the composed function. Although error messages are getting better (and C++ will be including methods for better checking of templates, i.e., concepts), error messages from errors in function templates are notoriously verbose.
5. Instantiated function templates perform exactly as the equivalent
concrete function. That is, the function template multiply
will
give exactly the same performance when composed with RowMatrix
or
ColMatrix
as the earlier functions specifically written for those
types.
“Aha, not so fast!” you might be saying. “We don’t necessarily want to use the same function for RowMatrix and ColMatrix. Different loop orderings can provide different performance, depending on the matrix orientation.”
This is true. And the C++ template mechanism provides a feature just for handling this issue – that is, the issue of needing selectively different functionality (to get better performance, for example) for a given type.
The file specialization.cpp
in the warmup
subdirectory shows how
this is done. The generic function template is this
template <class Matrix_1_t, class Matrix_2_t, class Matrix_3_t>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
  std::cout << "Generic multiply" << std::endl;
  for (size_t i = 0; i < C.num_rows(); ++i) {
    for (size_t j = 0; j < C.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
(Note we’ve added a diagnostic print statement just for this example.)
We can create specializations of multiply for particular values of the placeholders; e.g., the specialization for ColMatrix is this:
template <>
void multiply(const ColMatrix& A, const ColMatrix& B, ColMatrix& C) {
  std::cout << "Specialized ColMatrix multiply" << std::endl;
  for (size_t j = 0; j < C.num_cols(); ++j) {
    for (size_t k = 0; k < A.num_cols(); ++k) {
      for (size_t i = 0; i < C.num_rows(); ++i) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
If we compile this program
int main() {
  RowMatrix A(10, 10);
  RowMatrix B(10, 10);
  RowMatrix C(10, 10);

  ColMatrix D(10, 10);
  ColMatrix E(10, 10);
  ColMatrix F(10, 10);

  multiply(A, B, C);
  multiply(D, E, F);
  multiply(A, B, D);

  return 0;
}
We get
$ make specialization.exe
$ ./specialization.exe
Specialized RowMatrix multiply
Specialized ColMatrix multiply
Generic multiply
When deciding how to instantiate a function template, the compiler examines the types of the arguments being passed in and attempts to match them against any specializations; if none match, it uses the base template.
The C++ template mechanism is exceedingly powerful – we’ve only barely scratched the surface here. In fact, the template system is “Turing complete” – meaning you can write arbitrary programs with it, using a technique known as “template metaprogramming”. These are programs that are executed by the compiler itself at compile time and that manipulate the program being compiled (the syntax for template metaprogramming is Byzantine, to say the least, however).
A number of techniques have been developed over the years to try to rein in some of the excesses of template metaprogramming, as well as some of the deficiencies of templates themselves. The most important of these is concepts.
The basic idea behind concepts is this: despite that power (or maybe because of it), there are some big problems with the fundamental notion of a function template as a stencil that we can just stamp types into. For example, with our multiply template we can invoke it as:
std::string A = "ten";
int B = 10;
Matrix C(10, 10);
multiply(A, B, C);
This is nonsense of course, but the compiler doesn’t know that. The compiler does for these types what it does for any types: it substitutes the indicated types for the placeholders and then tries to compile the result. You will get errors from the resulting compilation process – but the error won’t be “this is nonsense, you can’t pass a string when you said you wanted a matrix”; rather, it may be errors from deep inside the function(s) being compiled. For the example above:
$ make nonsense.exe
nonsense.cpp: In instantiation of 'void multiply(const Matrix_1_t&, const Matrix_2_t&, Matrix_3_t&) [with Matrix_1_t = std::__cxx11::basic_string<char>; Matrix_2_t = int; Matrix_3_t = Matrix]':
nonsense.cpp:24:19: required from here
nonsense.cpp:12:32: error: 'const class std::__cxx11::basic_string<char>' has no member named 'num_cols'
12 | for (size_t k = 0; k < A.num_cols(); ++k) {
| ~~^~~~~~~~
nonsense.cpp:13:21: error: no match for call to '(const std::__cxx11::basic_string<char>) (size_t&, size_t&)'
13 | C(i, j) += A(i, k) * B(k, j);
| ~^~~~~~
nonsense.cpp:13:31: error: expression cannot be used as a function
13 | C(i, j) += A(i, k) * B(k, j);
| ~^~~~~~
(“Instantiation” is the technical term C++ uses for stamping a type into a template).
This happens because (as of C++17) the placeholders are just placeholders, with no semantic meaning. The upcoming “concepts” feature in C++20 will provide a formal mechanism for attaching meaning to placeholders, so that the compiler will be able to tell you “what you are asking is nonsense”.
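As a preview, a concept-constrained multiply might look something like the sketch below (illustrative only; the requirement list shown is an assumption and this is not part of the assignment code):

#include <concepts>
#include <cstddef>

// A hypothetical concept stating what multiply needs from a dense matrix type.
template <class M>
concept DenseMatrix = requires(const M& m, size_t i, size_t j) {
  { m.num_rows() } -> std::convertible_to<size_t>;
  { m.num_cols() } -> std::convertible_to<size_t>;
  { m(i, j) }      -> std::convertible_to<double>;
};

// Calling this with a std::string would be rejected with "constraint not
// satisfied" at the call site, rather than with errors from inside the body.
template <DenseMatrix Matrix_1_t, DenseMatrix Matrix_2_t, DenseMatrix Matrix_3_t>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C);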
There is a more problematic difficulty that also comes up due to not having any semantic information attached to placeholders. Over the quarter we have worked with two types of matrix-matrix product, one for dense matrices and one for sparse matrices. Now, of course, for this course (ha!), we want to have both versions of multiply
available – and we have. But now, we want to have them available as function templates.
So let’s define a multiply
template for sparse matrices.
template <class SparseMatrix_t, class Matrix_1_t, class Matrix_2_t>
void multiply(const SparseMatrix_t& A, const Matrix_1_t& B, Matrix_2_t& C) {
  A.matmat(B, C);
}
If we try to compile this we run into a problem:
$ make overloads.exe
overloads.cpp:28:6: error: redefinition of 'template<class SparseMatrix_t, class Matrix_1_t, class Matrix_2_t> void multiply(const SparseMatrix_t&, const Matrix_1_t&, Matrix_2_t&)'
28 | void multiply(const SparseMatrix_t& A, const Matrix_1_t& B, Matrix_2_t& C) {
| ^~~~~~~~
overloads.cpp:17:6: note: 'template<class Matrix_1_t, class Matrix_2_t, class Matrix_3_t> void multiply(const Matrix_1_t&, const Matrix_2_t&, Matrix_3_t&)' previously declared here
17 | void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
| ^~~~~~~~
overloads.cpp: In instantiation of 'void multiply(const Matrix_1_t&, const Matrix_2_t&, Matrix_3_t&) [with Matrix_1_t = CSRMatrix; Matrix_2_t = Matrix; Matrix_3_t = Matrix]':
overloads.cpp:38:19: required from here
overloads.cpp:21:21: error: no match for call to '(const CSRMatrix) (size_t&, size_t&)'
21 | C(i, j) += A(i, k) * B(k, j);
| ~^~~~~~
make: *** [overloads.o] Error 1
The problem is that we have defined two templates named multiply. We need two such templates, one for sparse matrices and one for dense – but to the compiler, the placeholders are just placeholders; there isn’t any way (without using concepts) to directly tell the compiler to use one version if the first placeholder gets associated with a dense matrix and a different version if it is associated with a sparse matrix.
Warning
Magic Ahead
While it is true that we can’t directly use concepts (yet), we can trick the compiler into at least selecting the right overload based on arbitrary properties of the types that we pass to the templates. Among all of the potential (valid) overloads available to the compiler, all that is needed is that there be a single best valid one. Candidates for which substituting the types produces an error are not generated and compiled – they are simply removed from consideration (a property known as “substitution failure is not an error”, or SFINAE).
There are various ways we can leverage this property to prune templates from being considered. What we are going to do is the following:
template <class Matrix_1_t, class Matrix_2_t, class Matrix_3_t,
          std::enable_if_t<   is_dense_matrix<Matrix_1_t>::value
                           && is_dense_matrix<Matrix_2_t>::value
                           && is_dense_matrix<Matrix_3_t>::value>* = nullptr>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
  for (size_t i = 0; i < C.num_rows(); ++i) {
    for (size_t j = 0; j < C.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
template <class SparseMatrix_t, class Matrix_1_t, class Matrix_2_t,
          std::enable_if_t<   is_sparse_matrix<SparseMatrix_t>::value
                           && is_dense_matrix<Matrix_1_t>::value
                           && is_dense_matrix<Matrix_2_t>::value>* = nullptr>
void multiply(const SparseMatrix_t& A, const Matrix_1_t& B, Matrix_2_t& C) {
  A.matmat(B, C);
}
Now, this can look extremely intimidating, but let’s break down what is happening, say, in the first multiply
. If we ignore the template mechanisms, what is left is just what we have been writing all quarter for multiply
(written in terms of placeholders rather than concrete types).
template <class Matrix_1_t, class Matrix_2_t, class Matrix_3_t,
          std::enable_if_t<   is_dense_matrix<Matrix_1_t>::value
                           && is_dense_matrix<Matrix_2_t>::value
                           && is_dense_matrix<Matrix_3_t>::value>* = nullptr>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
  for (size_t i = 0; i < C.num_rows(); ++i) {
    for (size_t j = 0; j < C.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
Let’s look again at the concept-overloaded multiply
functions, looking at the template statements.
template <class Matrix_1_t, class Matrix_2_t, class Matrix_3_t,
          std::enable_if_t<   is_dense_matrix<Matrix_1_t>::value
                           && is_dense_matrix<Matrix_2_t>::value
                           && is_dense_matrix<Matrix_3_t>::value>* = nullptr>
void multiply(const Matrix_1_t& A, const Matrix_2_t& B, Matrix_3_t& C) {
  for (size_t i = 0; i < C.num_rows(); ++i) {
    for (size_t j = 0; j < C.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}
template <class SparseMatrix_t, class Matrix_1_t, class Matrix_2_t,
          std::enable_if_t<   is_sparse_matrix<SparseMatrix_t>::value
                           && is_dense_matrix<Matrix_1_t>::value
                           && is_dense_matrix<Matrix_2_t>::value>* = nullptr>
void multiply(const SparseMatrix_t& A, const Matrix_1_t& B, Matrix_2_t& C) {
  A.matmat(B, C);
}
Again, for each, we have the function bodies for multiply
– those should be very familiar by now (and shouldn’t be intimidating). Above each function definition (and highlighted) is a template
statement – indicating, just as before, that the function being defined is a function template. And, just as before, we have placeholder arguments for the matrix arguments being passed to each variant of multiply
.
Now, each multiply function template has one more placeholder argument – and this is where the magic with SFINAE happens. The enable_if template is one that will basically “blow up” if its argument evaluates to false, or will produce something valid if its argument evaluates to true. If it blows up, the compiler will not even consider using that function template to stamp types into. If it does not blow up, the compiler will consider it. In addition, your course repo now has a file util.hpp with some other (magic) class templates that (again using SFINAE) evaluate to true if the type argument is a dense matrix (i.e., it has an operator() that takes two index arguments), a sparse matrix, or a dense vector. The concepts mechanism in C++20 will replace all of this machinery by essentially elevating placeholders to be “types of types.”
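To give a rough idea of how such a trait can be built, here is a hypothetical sketch using the C++17 detection idiom (the actual definitions in util.hpp may well be different):

#include <cstddef>
#include <type_traits>
#include <utility>

// Primary template: by default, a type is not considered a dense matrix.
template <class T, class = void>
struct is_dense_matrix : std::false_type {};

// Partial specialization: chosen only if T has an operator() taking two indices.
// If the expression inside void_t is ill-formed, this specialization is silently
// discarded (SFINAE) and the primary template is used instead.
template <class T>
struct is_dense_matrix<T, std::void_t<decltype(std::declval<const T&>()(size_t(0), size_t(0)))>>
    : std::true_type {};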
You will not be required to write any code like this in the course (or even to be particularly facile with it). But if you do want to be able to read it, focus first on the function body, then the first template arguments, then the enable_if. Here we can see from the enable_if that the first version of multiply takes three types that are each a dense matrix, whereas the second multiply expects the first argument to be a sparse matrix and the next two to be dense matrices.
I will be more than happy to arrange extra office hours for those who would like questions answered about this (or just additional explanation).
The code in ps7a is basically the code that was in ps6, but with template versions of all of the math functions. The invocations of those (now) function templates have not changed. If you look in your repo you will notice we no longer have the various .cpp files that we had in the past. All of those functions have been converted to function templates and are now included completely in header files. If you browse through some of those files you will see how this is done. Notice that a lot of the previous redundancy has been eliminated.
Note also that since the math functionality we are using (in amath583.hpp) is generic, it no longer depends on Matrix.hpp nor any of the other headers that define specific classes, so the include statements for those files have been removed.
The files in ps7a can be considered a reference implementation (solution?) for many of the problems we did in the last few assignments. You should spend a little time browsing the class files as well as the algorithms.
Repeat what you did above with ps6, but using ps7a and using srun to execute your programs (you can also experiment with using srun and your programs from ps6). Again, you are encouraged to browse some of the driver code to convince yourself that there is no difference in using the code between the concrete versions and the generic versions.
Note in particular that you can switch between CSRMatrix
and CSCMatrix
just by changing one line now in pagerank.cpp. Verify that you can do this (and that the right thing happens).
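For example, the relevant line might look something like this (illustrative; the exact line in pagerank.cpp may differ):

CSRMatrix A = read_csrmatrix(input_file);       // the CSR version
// CSCMatrix A = read_cscmatrix(input_file);    // the one line to change for the CSC version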
Note
auto
In fact, you can switch between CSRMatrix
and CSCMatrix
by just changing one character.
There is an additional feature in C++ that works hand-in-glove with templates (particularly class templates): auto
. In many instances it is possible for C++ to determine the type of a variable based on how the variable is being initialized (recall that we should initialize variables when they are created, if at all possible). To let C++ determine the type of the thing you are creating, you use the auto
keyword instead of a type. When we create the matrix A
in pagerank.cpp
we can say
auto A = read_csrmatrix(input_file);
The “Right Thing” (tm) will happen – A
will be a CSRMatrix
. Knowing explicitly what types are is important for a programmer when the programmer is responsible for matching up types and functions, but since a generic function can take any type, it is less important that you explicitly define types (and you really don’t want to when you are working with placeholders in generic functions).
You can create a CSCMatrix
by changing just one character:
auto A = read_cscmatrix(input_file);
I don’t recommend relying on auto
(in fact I recommend against it) until you are comfortable with the strong typing in C++ – and with the basic operation of generic functions and data types.
Next up: GPUs.