Unlock the power of the GPU. Master parallel programming with NVIDIA CUDA to accelerate scientific computing, AI, and simulations.
CPUs have a few cores; GPUs have thousands. This course teaches heterogeneous computing with NVIDIA CUDA C++. You will learn CUDA's thread hierarchy (grids, blocks, threads), the GPU memory hierarchy (global, shared, constant), and how to write kernels that execute in parallel across many threads. We apply these skills to matrix multiplication, image processing, and simulation. GPU computing is the enabling technology behind modern AI training, crypto mining, and high-performance physics simulations.
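To make the grid, block, and thread terms concrete, here is a minimal sketch of a vector-addition kernel and the host code that launches it. It is an illustration rather than course material: the names `vector_add`, `blocks_for`, `check`, and `add_on_gpu`, and the choice of 256 threads per block, are assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Kernel: runs once per thread. Each thread handles one element.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    // Global index = block offset within the grid + thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        out[i] = a[i] + b[i];
    }
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Host wrapper: copy inputs to global memory, launch the kernel, copy the result back.
// For brevity this sketch does not free device memory if a call throws.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    const int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_out, bytes));

    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;  // threads per block (assumed; a common starting point)
    vector_add<<<blocks_for(n, threads), threads>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError());  // reports launch-configuration errors

    // A blocking copy on the default stream waits for the kernel to finish.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Compiled with `nvcc` and linked into a program, calling `add_on_gpu({1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})` should yield a three-element vector of sums. With 256 threads per block, an input of 1000 elements needs 4 blocks, so 24 threads in the last block fail the `i < n` check and do nothing.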
Course length: 21 lessons • Self-paced learning • Lifetime access
The focus is general-purpose compute (GPGPU), not graphics rendering.
CUDA is NVIDIA-specific, but the concepts carry over to OpenCL and HIP.
You need an NVIDIA GPU (or a cloud instance that has one).
This is the kind of low-level code that underlies PyTorch and TensorFlow.