Explains the CUDA memory hierarchy: the register file, L1 cache, shared memory, constant cache, L2 cache, global memory, and local memory, along with the characteristics and usage of texture and constant memory. Covers access patterns for global and shared memory, and optimization techniques: SoA vs. AoS layouts, vectorized loads, __ldg, broadcast, padding, bank-aware layouts, occupancy tuning, TMA, and asynchronous copies.
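As a minimal sketch of two of the techniques listed above, the CUDA fragment below shows a vectorized read-only load via float4 and __ldg, and a shared-memory tile padded by one column to avoid bank conflicts in a transpose. Kernel and parameter names here are illustrative, not from the original text.

```cuda
#include <cuda_runtime.h>

// Vectorized load: each thread reads four consecutive floats as one
// 16-byte float4 transaction instead of four scalar loads.
__global__ void scale4(const float4* __restrict__ in, float4* out,
                       float s, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = __ldg(&in[i]);  // cached load through the read-only data path
        v.x *= s; v.y *= s; v.z *= s; v.w *= s;
        out[i] = v;
    }
}

// Bank-aware layout: the tile has one padding column (TILE + 1) so that
// a column of the tile maps to distinct banks, avoiding 32-way bank
// conflicts when the transpose reads tile[threadIdx.x][threadIdx.y].
#define TILE 32
__global__ void transpose(const float* __restrict__ in, float* out,
                          int width, int height) {
    __shared__ float tile[TILE][TILE + 1];  // +1 column of padding

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];  // coalesced read
    __syncthreads();

    // Swap block indices so the write to global memory is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}
```

The float4 path assumes the buffers are 16-byte aligned and the element count is a multiple of 4; a scalar tail loop would handle the remainder in practice.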