Lecture 2: Starting out with C++

(Originally recorded 2019-04-04)

Overview

Recap of Lecture 1. Intro to C++.

Abstraction

Whenever I teach programming I start with the following question: How do humans manage complexity? For example, how is it possible for us to build a jet airliner or a CPU? These things have millions of parts (billions, in the case of a CPU). These systems are designed and built by huge teams of engineers, and the complexity of the resulting artifacts is beyond what a single human can possibly comprehend. Yet we build these things, and some of the most complex things we build, such as airliners and CPUs, are among the most reliable.

Usually after some discussion involving processes such as decomposition, divide-and-conquer, and the like, I turn the discussion toward the mental tool that underlies these processes, the one that lets us manage complexity.

The answer is: Abstraction.

If we look up abstraction in the dictionary, this is what we find:

Abstraction (noun):

  1. the quality of dealing with ideas rather than events: topics will vary in degrees of abstraction. - something which exists only as an idea: the question can no longer be treated as an academic abstraction.

  2. freedom from representational qualities in art: geometric abstraction has been a mainstay in her work. - an abstract work of art: critics sought the meaning of O’Keeffe’s abstractions | a series of black-and-white abstractions.

  3. a state of preoccupation: she sensed his momentary abstraction.

  4. the process of considering something independently of its associations, attributes, or concrete accompaniments: duty is no longer determined in abstraction from the consequences.

  5. the process of removing something, especially water from a river or other source: the abstraction of water from springs and wells.

Number 4 is what we mean when we talk about abstraction as a mental tool: the process of considering something independently of its attributes.

In programming, there are two primary types of abstraction that let us manage the complexity of computer programs: procedural abstraction and data abstraction. Procedural abstraction lets us consider procedures (aka functions) independently of how they are implemented. We only need to know what the procedure does with the arguments we give it. We use a square root function without regard to how the square root is computed. We just give the procedure a number and it gives us back the square root of that number. As we organize our programs we will create our own procedural abstractions to encapsulate well-defined functionality that we can use without regard to implementation (and without having to recapitulate the implementation). Data abstraction lets us create compound data types to encapsulate related pieces of information. Objects combine data abstraction and procedural abstraction by encapsulating related pieces of data together with the functions that can operate on those data. (You should be familiar at this point with objects from programming in Python.)
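As a rough illustration of the two kinds of abstraction, here is a minimal C++ sketch. The names Point2D and distance_from_origin are hypothetical and chosen only for this example; std::sqrt is the standard library's square root function.

```cpp
#include <cmath>

// Data abstraction: a compound type that groups two related values.
// (Point2D is a hypothetical name used only for this illustration.)
struct Point2D {
  double x = 0.0;
  double y = 0.0;
};

// Procedural abstraction: a caller only needs to know that this function
// returns the distance of p from the origin. How std::sqrt computes a
// square root is hidden behind its interface.
double distance_from_origin(const Point2D& p) {
  return std::sqrt(p.x * p.x + p.y * p.y);
}
```

Code that calls distance_from_origin needs to know only what goes in and what comes out; the body could be rewritten without changing any caller.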

Programming in C++

When we write and then run a program, the mental model is that the computer is “running our program.” But what does that even mean? As we discussed briefly, and will discuss in more detail in a few lectures, what a CPU does is in some sense very simple. It reads binary information from its memory, interprets that binary information as an instruction from its instruction set, and then executes that instruction. It does this very quickly, billions of times per second. But the instructions themselves are not at all like what we see in a high-level programming language. There are no variables. The operations are things like “add these two values together” or “get the data at this memory location.” In a real sense, the CPU has its own language, one that is low-level and expressed as a sequence of binary strings. (cf slides 10-11, 25)

To “run a program”, somehow, what we write in the language of our program needs to be carried out in the language of the CPU. There are two primary paradigms for how this might be done. A program may be interpreted, or it may be compiled. (cf slides 25-30)

In an interpreted program, there is another program, called the interpreter, that is actually running on the CPU. The interpreter program is an executable program, meaning it has been translated from its original source language into the binary language of the CPU. The CPU runs the binary instructions of the interpreter. When you present a program to the interpreter, it reads, interprets, and executes the textual statements in your program, but it does not turn the statements of your program into statements in the CPU language. (NB: Python may “compile” a program in the sense that it turns the textual statements into a more compact representation that can be interpreted more efficiently. There is also a notion in some languages of “just-in-time compilation”, where some parts of an interpreted program may be compiled into machine instructions.)

Interpreting a single statement from an interpreted language may take many hundreds of machine instructions. Each character of the statement must be separately read, and even reading a single character takes many machine instructions. Then the characters must be turned into tokens. Then the tokens must be interpreted as program statements. And on and on. A lot happens when you execute even a single Python statement!

A similar process has to happen with a compiled program. The text of your program is given to a compiler. The compiler then has to do many of the same things that an interpreter has to do. The characters have to be read, groups of characters have to be turned into tokens, and so on. But there is a fundamental difference: the compiler does not execute your program. Instead, it translates your program into assembly language, a direct textual representation of the instructions that the CPU can execute. Another program (which might be part of the compiler), called the assembler, then translates that representation into the binary instructions that the CPU can execute. A final program, called a linker, wraps those binary instructions up in a way that the operating system can use to start the program on the CPU. The program itself, after it is compiled, is just another file, but it is a file containing a program that can be run on the CPU. (cf slides 25-30)
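To make the read, tokenize, and interpret steps a little more concrete, here is a toy sketch in C++. It is only an illustration under stated assumptions: the function names tokenize, is_number, and interpret are made up for this example, and the only "statement" it understands is a non-negative integer, then + or *, then another non-negative integer. A real interpreter such as Python's is far more elaborate.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Step 1: read the characters one at a time and group them into tokens.
// Digits are collected into number tokens; '+' and '*' become operator
// tokens; every other character (such as a space) is skipped.
std::vector<std::string> tokenize(const std::string& text) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : text) {
    if (std::isdigit(static_cast<unsigned char>(c))) {
      current += c;  // extend the number currently being read
    } else {
      if (!current.empty()) {  // a number has just ended
        tokens.push_back(current);
        current.clear();
      }
      if (c == '+' || c == '*') {
        tokens.push_back(std::string(1, c));
      }
    }
  }
  if (!current.empty()) {
    tokens.push_back(current);
  }
  return tokens;
}

// A token produced by tokenize is a number exactly when it starts with a digit.
bool is_number(const std::string& token) {
  return !token.empty() &&
         std::isdigit(static_cast<unsigned char>(token[0])) != 0;
}

// Step 2: interpret the tokens as a statement and carry it out.
// Returns 0 for anything that is not "<number> <operator> <number>".
int interpret(const std::string& text) {
  const std::vector<std::string> tokens = tokenize(text);
  const bool well_formed = tokens.size() == 3 && is_number(tokens[0]) &&
                           !is_number(tokens[1]) && is_number(tokens[2]);
  if (!well_formed) {
    return 0;
  }
  const int lhs = std::stoi(tokens[0]);
  const int rhs = std::stoi(tokens[2]);
  return (tokens[1] == "+") ? lhs + rhs : lhs * rhs;
}
```

Even in this toy, evaluating "12 + 30" involves a loop over every character, several string copies, and two text-to-integer conversions before a single addition is performed.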

Hello World!

The traditional first program in many programming language texts is the so-called “Hello World” program. This is a simple program that, when run, prints the text “Hello World!” to the terminal from which it was run. (cf slides 34-37)

In C++, the “Hello World” program looks like this:

#include <iostream>

int main() {

  std::cout << "Hello World!" << std::endl;

  return 0;
}

The different parts of this program are diagrammed on slide 34.

If we save this program text to a file, say, hello.cpp, we can compile it with the following:

$ c++ hello.cpp

This is the simplest form of compilation and will carry out all of the steps: translation to assembly, translation to binary, and linking. It will leave the executable program in a file called “a.out”. You can then run a.out as follows:

$ ./a.out
Hello World!

You will have the opportunity to write and run your own hello world program (and some others) in the first assignment. However, you may want to just try it now.

#include

There is a statement at the top of the hello world program:

#include <iostream>

The #include statement is similar to the import statement in Python. It tells the compiler (technically, the preprocessor) to pull in the contents of the indicated file, in this case the file named “iostream”. System files are enclosed in angle brackets; local files that may be included are enclosed in double quotes.
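The two forms of #include can be sketched as follows. The header name my_shapes.hpp and the function circle_area are hypothetical; the local include is left commented out because no such file exists here.

```cpp
#include <cmath>  // system header: angle brackets

// A local header would use double quotes. The file name below is
// hypothetical, so the line is commented out to keep this sketch compilable.
// #include "my_shapes.hpp"

// Uses std::acos from <cmath>; acos(-1.0) evaluates to pi.
double circle_area(double radius) {
  const double pi = std::acos(-1.0);
  return pi * radius * radius;
}
```

Without the #include of cmath at the top, the compiler would not know what std::acos refers to.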

Types

One of the fundamental differences between C++ and Python is that C++ is statically typed. That is, when we create a variable, we have to tell the compiler what the type of that variable is. In Python, we don’t have to do this, and in fact, the type of the value a variable refers to can change over the course of a program. In C++, a variable is assigned a type when it is created and it will always remain that type. Variables in C++ can be basic built-in types such as an int or a double, or they can be a compound type. A compound type in C++ is a class or a struct; a variable of a compound type is an object. (This is similar to Python.)

In general, when you create a variable, you should also initialize it by assigning it some value. (cf C++ Core Guidelines ES.22.)
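A small sketch of what this looks like in practice. The struct Sample and the function typed_variables_demo are hypothetical names invented for this example.

```cpp
// A hypothetical compound type with two members of built-in type.
struct Sample {
  int count = 0;
  double mean = 0.0;
};

double typed_variables_demo() {
  int count = 3;       // count is an int for its whole lifetime
  double scale = 2.5;  // declared and initialized in one step

  count = 7;           // fine: the new value is still an int
  // count = "seven";  // would not compile: a string literal is not an int

  Sample s;            // s is an object of the compound type Sample
  s.count = count;
  s.mean = scale * count;
  return s.mean;
}
```

The commented-out assignment is the kind of statement Python would accept at run time but a C++ compiler rejects before the program ever runs.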

Namespaces

C++ is a language intended for building large software systems, which inevitably means composing separately developed components. In order to prevent naming collisions (that is, two types or functions with the same name in two different software components), C++ provides the namespace mechanism, by which you can “wrap up” your variable names in another name. Of course, it is possible for namespaces themselves to collide, but namespace names are usually chosen based on an organization or project name, and there are far fewer namespaces across a project than variable names, so namespace collisions are very rare (and there are ways to work around such collisions).

To wrap a variable in a namespace, you declare it in the scope of a namespace.

namespace amath583 {
  double pi = 3.14;
}

Now, the full name of the variable we just declared is amath583::pi. When we are outside of the namespace and want to refer to this particular pi, we need to call it by its full name. Inside of the namespace we only need to say pi.
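The difference between the two ways of naming pi can be sketched like this. The functions disk_area and circumference are hypothetical additions for illustration; only amath583::pi comes from the example above.

```cpp
namespace amath583 {
  double pi = 3.14;

  // Inside the namespace, the short name pi is enough.
  double disk_area(double r) {
    return pi * r * r;
  }
}

// Outside the namespace, the full name amath583::pi is required.
double circumference(double r) {
  return 2.0 * amath583::pi * r;
}
```

Writing plain pi inside circumference would be a compile error, because no variable with that unqualified name is visible at that point.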

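C++ comes with a fairly rich standard library. To use elements from the standard library we use #include statements. Everything in the standard library is in the namespace std.

We can open up a namespace with a “using” statement (for example, using namespace std;), after which the names in that namespace can be written without the namespace prefix.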
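For comparison, here is a sketch of the same kind of code written with full names and with a using statement. The function names qualified_names and count_with_using are hypothetical; std::vector and std::string are from the standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fully qualified: every standard library name carries the std:: prefix.
std::vector<std::string> qualified_names() {
  std::vector<std::string> names;
  names.push_back("Ada");
  return names;
}

// With a using statement: std is opened for the rest of this function body,
// so vector and string can be written without the prefix.
std::size_t count_with_using() {
  using namespace std;
  vector<string> names;
  names.push_back("Ada");
  names.push_back("Grace");
  return names.size();
}
```

In the second function a reader has to remember that vector and string come from std; in the first, the code says so directly.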

In general, I suggest not opening namespaces in this way, and using the full name for variables in your programs, including the standard library. It may save you some typing to have a using statement, but code is read more often than it is written and leaving out the namespace associated with types obscures intent. You should never do anything in your programs just to save typing.