Lecture 5: The Matrix class¶

(Originally recorded 2019-04-16)

In this lecture we continue exploration of C++ support for classes, looking at constructors and accessors and overloading. And const in particular.

We present the following Matrix class:

#include <vector>

class Matrix {
public:
  Matrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(num_rows_ * num_cols_) {}

        double& operator()(size_t i, size_t j)       { return storage_[i * num_cols_ + j]; }
  const double& operator()(size_t i, size_t j) const { return storage_[i * num_cols_ + j]; }

  size_t num_rows() const { return num_rows_; }
  size_t num_cols() const { return num_cols_; }

private:
  size_t              num_rows_, num_cols_;
  std::vector<double> storage_;
};

As with the definition we had for Vector, this is a straightforward, but surprisingly powerful, representation for a matrix. No more really needs to be done with the class itself. The operations we are now going to be doing with it simply use the public interface provided by operator().

A key issue with a matrix representation is its layout: how the two-dimensions of the matrix mapped to the one-dimensional structure of computer memory. We look at both row-major and column-major and settle on row-major as the implementation we will be using in the rest of the course.

Matrix Representation¶

In the previous lecture we developed a Vector class, which was a one-dimensional structure, meaning we used one index to designate unique values in the Vector. A matrix, however, has two indices, so we need some way to represent two-dimensional data.

Based on our development for the Vector class, we can propose the following public interface for a Matrix class:

#include <vector>

class Matrix {
public:
  Matrix(size_t M, size_t N);

        double& operator()(size_t i, size_t j);
  const double& operator()(size_t i, size_t j) const;

  size_t num_rows();
  size_t num_cols();
};

The question now is how do we store the data for the Matrix and how do we access it through operator().

One seemingly obvious, but quite sub-optimal, representation is to use a vector of vectors as the matrix representation. The corresponding implementation would look like

#include <vector>

class Matrix {
public:
  Matrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(num_rows_, std::vector(num_cols_)) {}

        double& operator()(size_t i, size_t j)       { return storage_[i][j]; }
  const double& operator()(size_t i, size_t j) const { return storage_[i][j]; }

  size_t num_rows() const { return num_rows_; }
  size_t num_cols() const { return num_cols_; }

private:
  size_t                      num_rows_, num_cols_;
  std::vector<vector<double>> storage_;
  size_
};

Now why do we say this is sub-optimal? For two reasons. First, we are using more memory than is needed for matrix data. That is, for an \(M \times N\) matrix, we should only use \(M \times N\) data. Unfortunately, with a vector of vectors, we are using \(M \times N\) data in the inner vectors – plus \(M\) more data to represent the outer vector. Second, when we are looking up a value in the matrix, we have to make two trips to memory, one into the outer vector to lookup the inner vector, and then one into the inner vector. As we will see over the next couple of weeks, memory accesses are one of primary impediments to high performance and we should always strive to minimize memory accesses. Making two trips to memory rather than just one means that it is twice as expensive to access data compared to one trip – meaning we can only achieve half the performance. (cf slide 55)

Another approach is to “ravel” the two-dimensional matrix data into one-dimensional data. What we are going to do is just use a single std::vector to store the matrix data. For \(M \times N\) matrix, we will use a std::vector of \(M \times N\) elements. To ravel the matrix data, we arrange the two-dimensional data as follows. The first row of the matrix goes at the beginning of the std::vector. The second row of the matrix goes next to the first, and so on, until all \(M\) rows have been raveled. This layout of the matrix – row by row – is known as row-major order.

The actual raveling is carried out by the operator() in the Matrix class. Suppose we have element \((i,j)\) in the matrix. Where is that in the vector? Well, it is in the \(i^{\textrm{th}}\) row, so we move down \(i\) rows in the vector. Then, since it is in the \(j^{\textrm{th}}\) column, i.e., the \(j^{\textrm{th}}\) location in the row, we move down \(j\) more elements. That is, element \((i,j)\) in the matrix is at location \(i\) times the size of a row plu \(j\) in the vector. For an \(M \times N\) matrix, the size of a row is the number of columns, so the formula for the location \(k\) in the vector corresponding to entry \((i,j)\) in the matrix is:

\[k = i * N + j\]

We can now fill in the rest of the Matrix. First, we use a single std::vector for storage. Then, when we construct the matrix, we set the values of the number of rows and number of columns as well as allocate the appropriate amount of storage. Note that the formula for accessing a matrix element in storage_ is :cpp`storage_[i * num_cols_ + j]`;

#include <vector>

class Matrix {
public:
  Matrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(num_rows_ * num_cols_) {}

        double& operator()(size_t i, size_t j)       { return storage_[i * num_cols_ + j]; }
  const double& operator()(size_t i, size_t j) const { return storage_[i * num_cols_ + j]; }

  size_t num_rows() const { return num_rows_ }
  size_t num_cols() const { return num_cols_ }

private:
  size_t              num_rows_, num_cols_;
  std::vector<double> storage_;
};

The choice we made to ravel the matrix row by row was one option. Another would be to ravel it column by column, i.e., to use column-major order. In that case, for an \(M \times N\) matrix, the formula for the location \(k\) in the vector corresponding to entry \((i,j)\) in the matrix is \(j\) times the size of column (which is the number of rows) plus \(i\), that is

\[k = j * M + i\]

A column-major Matrix class would only differ from a row-major Matrix by the formula we use in operator():

#include <vector>

class Matrix {
public:
  Matrix(size_t M, size_t N) : num_rows_(M), num_cols_(N), storage_(num_rows_ * num_cols_) {}

        double& operator()(size_t i, size_t j)       { return storage_[j * num_rows_ + i]; }
  const double& operator()(size_t i, size_t j) const { return storage_[j * num_rows_ + i]; }

  size_t num_rows() const { return num_rows_ }
  size_t num_cols() const { return num_cols_ }

private:
  size_t              num_rows_, num_cols_;
  std::vector<double> storage_;
};

Note that which orientation we use for the Matrix class does not change its interface.

The formula for matrix-matrix product is

\[C_{ij} = \sum_{k} A_{ik} B_{kj}\]

Our realization with our Matrix class would look like:

Matrix A(M, K), B(K, N), C(M, N);
// ...

for (size_t i = 0; i < M; ++i) {
  for (size_t j = 0; j < N; ++j) {
    for (size_t k = 0; k < K; ++k) {
      C(i, j) += A(i, k) * B(k, j);
    }
  }
}

Note that we can mix Matrix objects and Vector objects, for instance in a matrix-vector product

Matrix A(M, N);
Vector y(M), x(N);
// ...

for (size_t i = 0; i < M; ++i) {
  for (size_t j = 0; j < N; ++j) {
    y(i) += A(i, j) * x(j);
  }
}

These are the loops for a matrix-vector product – how would we define its interface as a function? Given a matrix A and vectors x and y, we could define a matrix-vector product as follows:

void matvec(const Matrix& A, const Vector& x, Vector& y) {
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < A.num_cols(); ++j) {
      y(i) += A(i, j) * x(j);
    }
  }
}

Note that we have vector y as an “in-out” variable – it is passed by reference no by const reference. (Pop question: Explain what is meant by const Matrix& A and why we use that.)

There are also situations where we might want to use matvec to initialize a vector, i.e., to return y rather than pass y as an in-out variable.

Vector matvec(const Matrix& A, const Vector& x) {
  Vector y(A.num_rows());    // Create vector of appropriate size
  for (size_t i = 0; i < A.num_rows(); ++i) {
    for (size_t j = 0; j < A.num_cols(); ++j) {
      y(i) += A(i, j) * x(j);
    }
  }
}

Note that we can use the same name for the two functions — matvec — because they have different parameter lists. (Using the same name for two functions is called overloading the function.)

Now we need to make one more observation. We have the same verbatim code for both versions of matvec. When we talked about procedural abstraction we emphasized the importance of encapsulating specific functionality in one place — and in just one place. Since the first matvec does what is in the inner loop of the second, we can implement the second with a call to the first:

Vector matvec(const Matrix& A, const Vector& x) {
  Vector y(A.num_rows());    // Create vector of appropriate size
  matvec(A, x, y);           // Call the three-argument matvec
}

We can do the same thing for matrix-matrix product:

void matmat(const Matrix& A, const Matrix& B, Matrix& C) {
  for (size_t i = 0; i < C.num_rows(); ++i) {
    for (size_t j = 0; j < C.num_cols(); ++j) {
      for (size_t k = 0; k < A.num_cols(); ++k) {
        C(i, j) += A(i, k) * B(k, j);
      }
    }
  }
}

Matrix matmat(const Matrix& A, const Matrix& B) {
  Matrix C(A.num_rows(), B.num_cols());
  matmat(A, B);
}

We will see this pattern in several places as we build out more math functions in this course.

Finally, we haven’t done any error checking in these operations. We would want to make sure, for instance, that A, B, and C are compatible for multiplication. We can use assertions to check this:

Question for thought, where else might we want to use assertions in the code we have developed above (e.g., in the class definitions for Matrix and Vector)? In problem set #2 we discuss assertions in more detail. Why is it important that we be able to, in effect, “remove” the calls to assert with the appropriate setting of the NDEBUG macro?