This project-based course provides practical experience in building, debugging, testing, and profiling end-to-end parallel applications. Throughout the semester, students work in small teams on either assigned or self-proposed projects, targeting a parallel architecture of their choice (e.g., GPGPU accelerators, shared-memory servers, or distributed-memory bare-metal or cloud-based clusters) and using their preferred programming model (e.g., CUDA, MPI, Spark, or RPC-based APIs). The course places equal emphasis on every stage of project execution, from conception and background research, through the implement-test-benchmark loop, to deployment and demonstration of the end product. The practical component is complemented by milestone presentations and written reports from each team, and culminates in a full-scale product presentation at the end of the semester.