Cr. 3. (3-0). Prerequisites: COSC 4310, COSC 4330, and COSC 6303 or equivalent.

This course provides an introduction to High-Performance Computation (HPC), an essential tool in many areas of science and engineering and, increasingly, in Internet computing, medicine, the humanities, and the entertainment industry. HPC has traditionally driven computer architecture and computer system design and, over the last decade, has also driven industry’s efforts toward significantly improved energy efficiency in computation, for both economic and environmental reasons.

HPC today implies large-scale parallel computation and in many cases employs high-performance networking technologies. HPC places great emphasis on efficient use of resources by applications, which often requires a good understanding of the application; the processor, node, and platform architecture; and the compilers, operating systems, and programming tools. Increasingly, the drive for performance also includes a drive for energy efficiency and, as stated by Google researchers, energy-proportional computing.

This course focuses on the architecture of high-performance, energy-efficient, scalable computing environments and on algorithms for scientific and engineering problem solving in such environments. The course is suitable for scientists and engineers with computationally demanding problems, and for computer scientists and applied mathematicians with an interest in techniques for efficient use of platforms suitable for large-scale computations.

The course gives an overview of high-performance computer architectures and parallel programming paradigms, with an emphasis on MPI, OpenMP, and GPU programming. The Map-Reduce programming model will also be described. Basic algorithms for matrix operations, the solution of linear systems of equations, sorting, the Fast Fourier Transform, and other common operations will be taught.

Reducing data motion through careful data allocation and explicit management of data movement is critical for performance and energy efficiency. User-level techniques for managing memory hierarchies will be discussed, and tools for performance analysis will be covered briefly.
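As a rough illustration of what a user-level technique for managing the memory hierarchy can look like, the sketch below contrasts a straightforward matrix-matrix multiplication with a tiled (cache-blocked) variant. It is not taken from the course material: the function names `matmul_naive` and `matmul_tiled` and the tile size `TILE` are assumptions chosen for this example, and a suitable tile size depends on the cache sizes of the target machine.

```c
#include <stddef.h>

/* Hypothetical tile edge length; tune to the cache of the target machine. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C += A * B for n x n row-major matrices.
 * The inner loop walks B column-wise (stride n), which reuses cache lines poorly. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Tiled version: same arithmetic, but the loops are split so that each
 * TILE x TILE block of A, B, and C is reused while it is likely still in cache.
 * Inside a tile, the innermost loop accesses B and C with unit stride. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_size(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_size(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions perform the same number of floating-point operations; the tiled version only reorders them so that data already brought into cache is reused before it is evicted. Whether and by how much this helps on a given platform is something to measure, for example with the performance analysis tools mentioned above.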

Scalable platforms will be used for homework and projects.

**Lectures**

Lecture 1 – Overview: Applications

Lecture 2 – Technology I

Lecture 3 – Clusters

Lecture 4 – Technology II

Lecture 5 – Memory I

Lecture 6 – Memory II

Lecture 7 – Cache

Lecture 8 – Cache II

Lecture 9 – Parallel Computing Concepts

Lecture 10 – Vectorization I

Lecture 11 – OpenMP

Lecture 12 – Vectorization II

Lecture 13 – Matrix-Vector Multiplication

Lecture 14 – Matrix-Matrix Multiplication

Lecture 15 – Cache Oblivious Algorithms

Lecture 16 – OpenCL I

Lecture 17 – OpenCL II

Lecture 18 – Interconnection Networks I

Lecture 19 – Interconnection Networks II

Lecture 20 – Interconnection Networks III

Lecture 21 – Sorting I

Lecture 22 – Sorting II

Lecture 23 – MPI I

Lecture 24 – MPI II

Lecture 25 – MPI III

Lecture 26 – LU Factorization and Solve

Lecture 27 – Data Partitioning I

Lecture 28 – Data Partitioning II

Lecture 29 – Fast Fourier Transform I

Lecture 30 – Fast Fourier Transform II