Unlock the power of the GPU. Master parallel programming with NVIDIA CUDA to accelerate scientific computing, AI, and simulations.
CPUs have few cores; GPUs have thousands. This course teaches Heterogeneous Computing using NVIDIA CUDA C++. You will learn the GPU architecture (Grids, Blocks, Threads), memory hierarchy (Global, Shared, Constant), and how to write Kernels that execute in parallel. We apply these skills to matrix multiplication, image processing, and simulation. This is the enabling technology behind modern AI training, crypto mining, and high-performance physics simulations.
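To give a taste of the kernel-writing style the course teaches, here is a minimal sketch of a CUDA C++ vector-add kernel using the grid/block/thread model described above. It is illustrative only (the name `vecAdd` and the use of unified memory via `cudaMallocManaged` are choices made here for brevity, not the course's exact code), and it requires an NVIDIA GPU plus the `nvcc` compiler to build and run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    // Global thread index from block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard against out-of-range threads
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Unified memory keeps this sketch short; explicit cudaMalloc/cudaMemcpy
    // transfers between host and device are part of the full memory-hierarchy topic.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch configuration: a 1D grid of 1D blocks covering all n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the GPU before reading results

    printf("c[0] = %f\n", c[0]);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The same indexing pattern (block index × block dimension + thread index) generalizes to the 2D grids used for the matrix-multiplication and image-processing projects.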
Estimated completion time: 21 lessons • Self-paced learning • Lifetime access
Focus is Compute (GPGPU), not rendering graphics.
CUDA is NVIDIA-specific; the concepts carry over to OpenCL/HIP.
Need an NVIDIA GPU (or cloud instance).
This is the low-level code that powers PyTorch/TensorFlow.
3 recommended paths based on what you're learning
The natural next step after CUDA Programming? Becoming an Embedded Architect.
The secret weapon for CUDA Programming learners? Adding Memory Management to your toolkit.
This AI tool changes the game: Copilot + Rust Analyzer lets you write safer system-level code.