PageRank

In the next problem sets we will be parallelizing and evaluating PageRank (as well as other application programs). And, as we progress with the program we will also be introducing some of the more advanced C++ techniques that we have discussed in lecture, such as standard library algorithms and templates.

The program itself can take several options. It must be passed either a filename of graph (or sparse matrix) data upon which to perform PageRank, or it must be given the dimensions for a matrix generator (-p specify the size of the domain for a discretized Laplacian operator). The program also accepts options for number of lines of ranked data to print out (default is none), an output file to store its results (default is cout), the name of a file containing vertex labels, and number of threads to use once we parallelize.

We are processing command line options in a more sophisticated way than before. Rather than requiring that particular options be in a particular order on the command line, we are recognizing flags (denoted with a leading -) and using those to specify options.

To see an example of the operation of the pagerank program, build it and execute it

$ make pagerank.exe
$ ./pagerank.exe data/Erdos02.mtx -l data/Erdos02_nodename.txt -v 10

This will run the program on the “Erdos” data set and print the top ten results. (Just as Kevin Bacon is the center of the IMDB universe, Paul Erdos is the center of the mathematics universe. The Erdos02.mtx dataset contains a graph of authors who have written papers with each other and who are a distance of 2 in the graph from Erdos himself.)

You should read through the file pagerank.cpp, looking in particular at the function pagerank and, of course, main.