Lecture 4: The Vector class

(Recorded 2022-04-07)

Overview

In this lecture we start to get into some of the features of C++ that make it such a powerful language for HPC – and, in general, for building large-scale software systems.

In particular, we go step-by-step through the development of a C++ class for representing a mathematical vector (an N-tuple).

By the end of lecture we arrive at the following Vector class:

#include <vector>

class Vector {
public:
  Vector(size_t M) : num_rows_(M), storage_(num_rows_) {}

        double& operator()(size_t i)       { return storage_[i]; }
  const double& operator()(size_t i) const { return storage_[i]; }

  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

This is a straightforward, but surprisingly powerful, representation for a vector.

Important C++ concepts that we focus on during the development are

  • class definition

  • member functions

  • private and public

  • constructors

  • initialization syntax

  • operators

    • operator+

    • operator()

  • const

Vectors and the Vector class

In scientific computing we are almost always concerned with discrete quantities of one kind or another—often quantities that represent discretized forms of continuous variables (arising, say, from differential equations). Mathematically, the fundamental type of object that we use to represent discretized quantities is a finite-dimensional vector.

Abstractly, we can define a vector as follows:

Definition (Vector Space)

(Halmos) A vector space is a set \(V\) of elements called vectors satisfying the following axioms:

  1. To every pair \(x\) and \(y\) of vectors in \(V\) there corresponds a vector \(x+y\), called the sum of \(x\) and \(y\), in such a way that

    • addition is commutative, \(x+y=y+x\),

    • addition is associative, \(x+(y+z)=(x+y)+z\),

    • there exists in \(V\) a unique vector \(0\) (called the origin) such that \(x+0=x\) for every vector \(x\), and

    • to every vector \(x\) in \(V\) there corresponds a unique vector \(-x\) such that \(x+(-x)=0\).

  2. To every pair \(a\) and \(x\), where \(a\) is a scalar and \(x\) is a vector in \(V\), there corresponds a vector \(ax\) in \(V\), called the product of \(a\) and \(x\), in such a way that

    • multiplication by scalars is associative, \(a(bx)=(ab)x\),

    • \(1x = x\) for every vector \(x\),

    • multiplication by scalars is distributive with respect to vector addition, \(a(x+y) = ax + ay\), and

    • multiplication by vectors is distributive with respect to scalar addition, \((a+b)x = ax + bx\).

Any vector space with a finite basis is a finite-dimensional vector space. The particular finite-dimensional space that we are concerned with in scientific computing is the set of \(n\)-tuples of real numbers, i.e., \(\mathbb{R}^n\). For the software that we will be developing for scientific computing (high-performance scientific computing), we want programming abstractions that correspond to the mathematical abstractions underlying the software. That is, we want to represent finite-dimensional vectors. We will do that with our first C++ class, namely class Vector.
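As a concrete instance of the axioms above (a standard fact, stated here only for orientation), addition and scalar multiplication on \(n\)-tuples of real numbers act component by component. The component names \(x_i\) and \(y_i\) are notation introduced for this illustration:

```latex
\[
x + y = (x_1 + y_1, \ldots, x_n + y_n), \qquad
a x = (a x_1, \ldots, a x_n), \qquad
x, y \in \mathbb{R}^n,\ a \in \mathbb{R}.
\]
```

The subscript \(i\) that picks out a component is what the Vector class will later expose through its subscripting operator.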

Classes in C++ are programming constructs that allow us to encapsulate related data and functions. Variables of a given class (instances of a class) are called “objects”. By encapsulating related data and functions, we can provide a visible interface for manipulating the contents of an object while hiding (abstracting) the actual implementation.

Note

Separating interface from implementation (or, equivalently, policy from mechanism) is a fundamental principle of software development.

Some important Core Guidelines related to classes are:

  • C.1 : Organize related data into structures (structs or classes)

  • C.3 : Represent the distinction between an interface and an implementation using a class

  • C.4 : Make a function a member only if it needs direct access to the representation of a class

  • C.10 : Prefer concrete types over class hierarchies

  • C.11 : Make concrete types regular

Here is a proposed class for Vector:

 1#include <vector>
 2
 3class Vector {
 4public:
 5  Vector(size_t M) : num_rows_(M), storage_(num_rows_) {}
 6  
 7  double& operator()(size_t i) { return storage_[i]; }
 8  const double& operator()(size_t i) const { return storage_[i]; }
 9  
10  size_t num_rows() const { return num_rows_; }
11  
12private:
13  size_t num_rows_;
14  std::vector<double> storage_;
15};

In the following we are going to build up to this definition step by step. But first, observe what we have encapsulated in this class: a set of related functions (lines 5-10) and data (lines 13-14). Note also that we have declared the functions to be public and the data to be private. We have a public interface and a private implementation.

A mathematical vector has a dimension, it has data, and the data are distinguished (and accessed) by a subscript. A type that models a mathematical vector should therefore provide the same three things: a size, storage for its entries, and subscript access to them.

We create a class using the C++ keyword class and give it a name. Everything between the opening and closing braces is a member of the class.

class Vector {
public:
  size_t num_rows_;
  std::vector<double> storage_;
};

Here, we have two data members – num_rows_, which maintains the size of our vector (its number of rows, or dimension), and storage_, which contains the data comprising the vector itself.

We can create an object of a particular class in much the same way as we create a variable of a given type:

Vector x;

(We actually want to do a bit more when we declare a variable of Vector type: we need to define a size. We will see how to do that, and will discuss creating, or constructing, objects in more detail below.)

Members are usually functions or data. Members are accessed with a syntax similar to that of Python: object.member. For instance, in the above example we can access the num_rows_ member as follows:

Vector x;
x.num_rows_ = 4;
x.storage_.resize(4);

Here, we create an object of type Vector, set its num_rows_ to 4, and then resize storage_ to hold 4 elements.

But now we have a fundamental problem (and one which the paradigm of objects is designed to address). The size of the storage and the value of num_rows_ always need to be the same: together they form an invariant. Unfortunately, with an interface that lets us access the data members however we like, we can easily violate that invariant:

Vector x;
x.num_rows_ = 4;
x.storage_.resize(2);

In a case like this, if the number of rows is 4, we would expect to be able to access the third element of the vector. Unfortunately, because the size of storage_ is equal to two, the third element does not exist. In order to maintain an invariant, we make the data members inaccessible, by marking them private.

class Vector {
private:
  size_t num_rows_;
  std::vector<double> storage_;
};
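To make the invariant problem concrete, the following sketch uses a hypothetical struct, OpenVector (not part of the lecture code), that keeps both members public as in the first version of the class. The helpers invariant_holds and make_inconsistent are likewise assumptions introduced only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the first, all-public version of Vector.
struct OpenVector {
  std::size_t num_rows_ = 0;
  std::vector<double> storage_;
};

// Illustrative helper: the invariant is that the recorded dimension
// matches the number of elements actually stored.
bool invariant_holds(const OpenVector& v) {
  return v.num_rows_ == v.storage_.size();
}

// Builds an object the way the text warns against: the two members
// are set independently, so nothing keeps them in step.
OpenVector make_inconsistent() {
  OpenVector x;
  x.num_rows_ = 4;
  x.storage_.resize(2);
  return x;
}
```

With the members marked private, as in the class above, the two assignments inside make_inconsistent would be rejected by the compiler when written outside the class.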

Now, we still want to be able to access these members, but we are going to provide member functions to do it, and the member functions are going to always guarantee that all invariants are maintained.

The first member function that we introduce is one that retrieves the size of the vector for us.

We deal with member functions in much the same way as we deal with non-member functions. We need to declare the function in order to use it and we also need to define it. Note that a class definition is also a class declaration. If we want to separately declare and define a member function, one convention is to put the declaration in the body of the class, which goes in a .hpp file, and to put the definition separately in a .cpp file. However, for short, well-defined functions, the convention is to simply include the definition in the class declaration, so that no separate function definition is needed. And, if we adhere to the C++ Core Guideline of preferring free functions to member functions, it is often the case that all of the member functions can be defined in the .hpp file, so that no corresponding .cpp file is needed at all. This will in fact be the case for all of the classes we develop in this course.

With the member function added, our class definition now looks like this:

class Vector {
public:
  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

The member function simply returns the value of the member num_rows_.

Note that we have marked the function as const, meaning the function does not change any data within the object it is called on – equivalently that the object the member is being called on is const for the scope of that call. (The object that a member is invoked on is essentially a hidden argument – more on that in lecture 5).
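As a sketch of the two ideas above, the following shows num_rows() declared inside the class and defined separately, as one might do across a .hpp/.cpp pair, and then called through a const reference. The default member initializer on num_rows_ and the free function rows_of are assumptions added so that the fragment stands alone; the lecture's class does not define them.

```cpp
#include <cstddef>
#include <vector>

// Declaration: this part would typically go in a header file.
class Vector {
public:
  std::size_t num_rows() const;  // declared here, defined below

private:
  std::size_t num_rows_ = 0;     // assumed initializer, for this sketch only
  std::vector<double> storage_;
};

// Definition: this part would typically go in a source file.
// The const qualifier is part of the signature and must be repeated.
std::size_t Vector::num_rows() const { return num_rows_; }

// Hypothetical helper: v is const here, so only const member
// functions such as num_rows() may be called on it.
std::size_t rows_of(const Vector& v) { return v.num_rows(); }
```

If the const were dropped from num_rows(), the call inside rows_of would no longer compile, because v is a reference to a const object.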

Now we have protected the member data from having num_rows_ and the size of storage_ become inconsistent. But now we need a way to set those two things in the first place. We do that in a special member function called a constructor. A constructor is a member function with the same name as the class; in this case it is called Vector. A constructor takes arguments like any function, but it has no return type. It is also not invoked as a member function with “dot” notation. Rather, the constructor is invoked when we declare a variable to be of the type of the class.

For example, the statement we saw before

Vector x;

invokes the constructor to create the object x; in this case it would be a constructor with no arguments. But that isn’t what we want. We want a constructor that takes as its argument the size of the vector being created, viz:

Vector x(10);

We would like for this to create a Vector of size 10. Note that the constructor takes 10 as an argument – so the constructor we create to carry out this functionality will need to take an integer argument.

Such a constructor might look like the following:

class Vector {
public:
  Vector(size_t M) {
    num_rows_ = M;
    storage_.resize(num_rows_);
  }

  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

Notice how the constructor maintains its invariant. It sets the value of num_rows_ to be equal to M and sets the size of storage_ to be equal to M.

But notice one thing. Inside the body of the constructor, the storage_ member has itself already been constructed. But since it was constructed without any guidance from us – it was default constructed – its size was zero. Hence, we have to call resize(). However, what we really want to do is have storage_ be created with the proper size when its constructor is invoked. C++ provides a special initialization syntax to allow just that. Using the initialization syntax, our Vector constructor would look like this.

class Vector {
public:
  Vector(size_t M) : num_rows_(M), storage_(num_rows_) {}

  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

In general, you should initialize the members of your class with initialization syntax in your class’s constructors. Your object should be fully constructed and initialized before the body of the constructor is executed. (Often the body is empty, as it is here.)
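One detail worth knowing about this syntax: C++ initializes data members in the order in which they are declared in the class, not the order in which they appear in the initializer list. The constructor above relies on num_rows_ being declared before storage_. A variant that avoids that dependency, shown here as a sketch rather than as the lecture's version, sizes storage_ directly from the argument M. The accessor storage_size() is a hypothetical addition used only to make the invariant observable.

```cpp
#include <cstddef>
#include <vector>

class Vector {
public:
  // Both members are initialized from M, so the result does not
  // depend on the declaration order of num_rows_ and storage_.
  Vector(std::size_t M) : num_rows_(M), storage_(M) {}

  std::size_t num_rows() const { return num_rows_; }

  // Hypothetical accessor, for illustration only.
  std::size_t storage_size() const { return storage_.size(); }

private:
  std::size_t num_rows_;
  std::vector<double> storage_;
};
```

Either form establishes the invariant at construction time; the difference only matters if the member declarations are later reordered.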

Now, there is one more bit of functionality that we need our Vector class to have. Namely, we need to be able to read and write (get and set) the values in the Vector.

In analogy to the mathematical way of accessing vector data, and in similarity to other programming languages, we would like to access the data with subscripting notation. That is, we would like to be able to write statements like

Vector x(10);
double y = x(3);
x(4) = 2.0;

With built-in types you can use syntax like a + b to add two things and a * b to multiply two things. One of the goals of C++ is to allow user-defined types to act like built-in types to the greatest extent possible. That is, C++ has mechanisms to allow us to define functions that will let us say a + b when a and b are Vectors. (This doesn’t happen magically, we do have to define those functions, but the machinery is there to make it possible).

One particular bit of syntax appears in the statements above: the parentheses that we would like to use for subscripting. In C++, when f is a function, we can say f(3). C++ also allows us to define a function so that we can support f(3) when f is of a user-defined type (an object of a class we have defined).

Let’s see how this is done.

The case of supporting f(3) is just a bit confusing, so let’s look at another case first: addition.

We could create an add member function that would have the following declaration:

class Vector {
public:
  Vector add(const Vector& y);

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

This add function takes a const reference to a Vector as an argument and returns a Vector. We would use this function as follows:

Vector x(5), y(5), z(5);
z = x.add(y);

This would invoke the member function add that belongs to x, passing in y and returning the result, which is assigned to z. This function could have any name; add is just a meaningful choice. Let’s pick a different name.

class Vector {
public:
  Vector operator+(const Vector& y);

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

Instead of add, we are calling this function operator+. But that is just its name (just like add is just a name). We would invoke it as

Vector x(5), y(5), z(5);
z = x.operator+(y);

But here is where C++ does something special for us. There is a different way of invoking functions that have “operator” in their name. In particular, we can invoke operator+ this way

Vector x(5), y(5), z(5);
z = x + y;

Isn’t that cool? Again, there is nothing magical happening. The + syntax is just dispatching to a function with a particular name, operator+ in this case. We can do the same thing for other arithmetic operators: -, *, / (the corresponding function names should be obvious). And, we can do it for function call syntax – parentheses. The name of the function that gets invoked in response to parentheses is operator().
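The declaration of operator+ shown earlier has no body. One possible definition, given here as a sketch and not as the lecture's implementation, adds the entries one by one. It marks the operator const (a choice of this sketch, since the operator does not modify its left operand) and assumes both vectors have the same number of rows, which it does not check. The helpers fill and entry are hypothetical and exist only so that values can be set and inspected before operator() is introduced.

```cpp
#include <cstddef>
#include <vector>

class Vector {
public:
  Vector(std::size_t M) : num_rows_(M), storage_(num_rows_) {}

  // Member operator+: the left operand is the object itself.
  // Assumes y has the same number of rows as this object.
  Vector operator+(const Vector& y) const {
    Vector z(num_rows_);
    for (std::size_t i = 0; i < num_rows_; ++i) {
      z.storage_[i] = storage_[i] + y.storage_[i];
    }
    return z;
  }

  // Hypothetical helpers, for illustration only.
  void fill(double value) { storage_.assign(num_rows_, value); }
  double entry(std::size_t i) const { return storage_[i]; }

private:
  std::size_t num_rows_;
  std::vector<double> storage_;
};
```

With this definition, x + y and x.operator+(y) call the same function and produce the same result.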

The Vector class with the corresponding definition is as follows:

class Vector {
public:
  Vector(size_t M) : num_rows_(M), storage_(num_rows_) {}

  double& operator()(size_t i) { return storage_[i]; }

  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

The syntax for operator() might look a little confusing because there are two sets of parentheses. But realize that the first set belongs to “operator”: the name operator() is simply the name of a function. The second set of parentheses designates the arguments to the function. In the case of operator(), the function just looks up the indicated entry in storage_ and returns a reference to it. That it returns a reference is very important. If it returned a value, we would be able to do this

Vector x(10);
double a = x(5);

But we would not be able to do this

Vector x(10);
x(5) = 2.0;

because x(5) would be a value, that is, a temporary copy of the entry rather than the entry itself. However, if operator() returns a reference, we can use x(5) on either side of the assignment operator (the = sign).
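The difference between returning a reference and returning a value can be seen side by side in the following sketch. The member value_at is a hypothetical by-value accessor added only for comparison; it is not part of the lecture's class.

```cpp
#include <cstddef>
#include <vector>

class Vector {
public:
  Vector(std::size_t M) : num_rows_(M), storage_(num_rows_) {}

  // Returns a reference: the caller can read or write the stored entry.
  double& operator()(std::size_t i) { return storage_[i]; }

  // Hypothetical accessor returning a copy: the caller can only read.
  double value_at(std::size_t i) const { return storage_[i]; }

  std::size_t num_rows() const { return num_rows_; }

private:
  std::size_t num_rows_;
  std::vector<double> storage_;
};
```

With this class, x(5) = 2.0; stores 2.0 in the vector, whereas changing a variable initialized from x.value_at(5) changes only that local copy and leaves the vector untouched.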

By the way, functions like operator+ do not need to be member functions. operator() does need to be, but that is okay since we need to access internal data with it anyway. And since operator() gives us read and write access to the contents of the vector, the other arithmetic operators can all be defined using it. For example

Vector operator+(const Vector& x, const Vector& y) {
  Vector z(x.num_rows());
  for (size_t i = 0; i < z.num_rows(); ++i) {
    z(i) = x(i) + y(i);
  }
  return z;
}

There is one last thing we need to do before we are done with Vector, however. The example above won’t actually compile with the definition we have for Vector. In our definition of operator+, we declared the arguments to be const references. That is as it should be: we are not going to change them, and we pass them by reference because they may be large (passing them by value would copy them). But our definition of operator() makes no promise that it will not change x or y. Any member function called on a const object must itself be const.

To solve this, we declare another version of operator(), but indicate that it is the version to be used on const objects.

This is then our final definition of Vector

#include <vector>

class Vector {
public:
  Vector(size_t M) : num_rows_(M), storage_(num_rows_) {}

  double& operator()(size_t i) { return storage_[i]; }
  const double& operator()(size_t i) const { return storage_[i]; }

  size_t num_rows() const { return num_rows_; }

private:
  size_t num_rows_;
  std::vector<double> storage_;
};

Note that we have added the appropriate #include statement so that we can properly declare storage_ to be of type std::vector<double>.
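Putting the pieces together, the following sketch combines the final class with the free-function operator+ from earlier so that the fragment compiles on its own. It spells size_t as std::size_t and includes <cstddef>, which is a stylistic choice of this sketch rather than something the lecture requires. It also assumes both operands have the same number of rows, which the code does not check.

```cpp
#include <cstddef>
#include <vector>

class Vector {
public:
  Vector(std::size_t M) : num_rows_(M), storage_(num_rows_) {}

  // Non-const overload: used on modifiable objects, allows writing.
  double& operator()(std::size_t i) { return storage_[i]; }
  // Const overload: used on const objects, allows reading only.
  const double& operator()(std::size_t i) const { return storage_[i]; }

  std::size_t num_rows() const { return num_rows_; }

private:
  std::size_t num_rows_;
  std::vector<double> storage_;
};

// Reading x(i) and y(i) uses the const overload of operator();
// writing z(i) uses the non-const overload.
Vector operator+(const Vector& x, const Vector& y) {
  Vector z(x.num_rows());
  for (std::size_t i = 0; i < z.num_rows(); ++i) {
    z(i) = x(i) + y(i);
  }
  return z;
}
```

Because the compiler selects the overload from the constness of the object, the same x(i) syntax works for both reading from const arguments and writing to the result.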