The subdirectory norm contains a skeleton for a thrust-based algorithm for computing the two-norm of a vector.
Documentation for thrust can be found at https://docs.nvidia.com/cuda/thrust/index.html.
|d_|
Complete the implementation of the norm_thrust function in norm_thrust.cu.
|_d|
You may find it helpful to refer to the axpy examples for the rest of
this problem. In cu_axpy_t, we used thrust::transform
to combine two vectors. To compute the norm of a single vector, we want an analogous operation: transforming over a single vector. We also want to reduce (accumulate) over the transformed values; we know how to do that efficiently from the reduction exercise, and thrust provides a reduce function.
We could do this in two steps: transform the vector into a vector holding the square of each value, and then reduce over that. That approach is inefficient, however, since it materializes an intermediate vector and makes two passes over the data. To do the transformation and reduction in a single pass, thrust provides the function transform_reduce, which is the suggested approach for this problem. You may use any technique you like to implement the operators that get passed in: a function object, a helper function, a lambda, or built-in functions.
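As a rough sketch only (this is not the provided skeleton; it assumes the data is already in a thrust::device_vector and uses a function object for the squaring step), a transform_reduce-based two-norm could look like:

#include <cmath>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

// Unary operator for the transform step: square each element.
template <typename T>
struct square_op {
  __host__ __device__ T operator()(T x) const { return x * x; }
};

// Single-pass two-norm sketch: square each element and sum the squares
// on the device, then take the square root on the host.
template <typename T>
T norm_sketch(const thrust::device_vector<T>& v) {
  T sum_of_squares = thrust::transform_reduce(
      v.begin(), v.end(), square_op<T>(), T(0), thrust::plus<T>());
  return std::sqrt(sum_of_squares);
}

A lambda or thrust::square (if available in your thrust version) would work equally well for the squaring operator.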
The norm_cuda subdirectory also contains norm_parfor and norm_seq, which you can run to compare to norm_cuda.
Note that norm_thrust is a function template. The driver that calls it for timing invokes two versions of it, one for single precision and one for double precision.
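For illustration only (the variable names here are hypothetical; the actual driver is provided in the skeleton), the two invocations amount to something like:

  float  nf = norm_thrust(x_float);    // instantiates norm_thrust<float>
  double nd = norm_thrust(x_double);   // instantiates norm_thrust<double>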
As with our previous norm drivers, this program runs timing tests over a range of problem sizes (in powers of two), and you can pass the lower and upper bounds on the command line. The program performs two timing runs each for single and double precision and prints the results.
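For example (the argument format here is an assumption; check the provided driver for its exact convention), a run covering 2^20 through 2^26 elements might be launched with the base-2 exponents as bounds:

$ ./norm_thrust.exe 20 26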
When you run your working norm_thrust, you may get an output that looks like the following:
$./norm_thrust.exe
Float
N Sequential First Second First Second
1048576 1.7798 6.09445 6.07604 2.7595e-08 2.7595e-08
2097152 1.68752 11.108 11.2314 4.51044e-08 4.51044e-08
4194304 1.41257 18.6075 19.1292 2.15465e-08 2.15465e-08
8388608 1.38701 27.5941 29.5374 3.51608e-08 3.51608e-08
16777216 1.38567 36.9667 45.4383 4.24208e-08 4.24208e-08
33554432 1.37679 41.2072 57.2881 8.84264e-09 8.84264e-09
67108864 1.38583 29.4984 65.4721 2.65331e-09 2.65331e-09
Double
N Sequential First Second First Second
1048576 1.78137 6.38466 6.44605 1.36909e-11 1.36909e-11
2097152 1.69176 11.0473 11.0473 2.58195e-11 2.58195e-11
4194304 1.41943 17.2002 17.3459 1.87745e-12 1.87745e-12
8388608 1.39531 24.1052 24.1052 8.97525e-12 8.97525e-12
16777216 1.39186 29.8772 29.8772 6.11285e-13 6.11285e-13
33554432 1.38983 33.5544 34.0407 2.9634e-12 2.9634e-12
67108864 1.38298 36.2751 36.2751 1.04758e-11 1.04758e-11
|q_| Consider just the Second column for single and double. Why might there be a difference in performance between the two? |_q|
|q_|
Consider just the First and Second columns for single precision. Why might there be a difference in performance between the two?
(Hint: What data structure are we starting with in the driver? Our own Vector
type. What are its element types as compared to what we are sending to the GPU?)
|_q|
|t_| Extra Credit. Use nvprof or some other means to test the hypothesis in your answer above. |_t|
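For example, a minimal nvprof invocation might look like the following (nvprof ships with older CUDA toolkits; on newer ones you may need Nsight Systems/Compute instead):

$ nvprof ./norm_thrust.exe
$ nvprof --print-gpu-trace ./norm_thrust.exe   # per-kernel and per-memcpy timings

The GPU trace lists each kernel launch and memory copy along with its duration.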
For this problem set, the warm-up will consist of revisiting ps6 and running some of those problems on the multicore nodes of Hyak.
Different development environments on Hyak are supported via the modules system.
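For example, the usual modules commands apply (the specific module names available on Hyak are not listed here; use module avail to see them):

$ module avail            # list the modules available on the cluster
$ module load <module>    # load a particular compiler/toolchain module
$ module list             # show what is currently loaded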
Use rsync to copy your ps6 directory from your laptop to Hyak.
From the directory above ps6:
$ rsync -avuzb ps6 klone.hyak.uw.edu:
(You will have to go through the 2FA process, just as if you were logging in.)
Connect to Hyak with VS Code and verify that ps6 has been copied.
|q_| How does the performance of the sparse matrix methods (in GFLOP/s) compare to your results for dense matrix-matrix product in previous assignments? There should be a fairly large difference in GFLOP/s between sparse and dense matrix methods. Give some reasons for such a big difference. |_q|
(Include your answer in Questions.rst.)
|q_| How does the performance of CSC compare to the performance of CSR and of COO? Explain why (or why not) there are any significant differences. |_q|
(Include your answer in Questions.rst.)