This course is intended for students interested in the efficient use of modern parallel systems ranging from multi-core and many-core processors to large-scale distributed memory clusters. The course puts equal emphasis on the theoretical foundations of parallel computing and practical aspects of different parallel programming models. It provides a survey of common parallel architectures and types of parallelism, introduces formal approaches to assess scalability and efficiency of parallel algorithms and their implementations, and covers the most common and current parallel programming techniques and APIs, including for shared address space, many-core accelerators, distributed memory clusters and big data analytics platforms.