A collection of GPU programming learning resources, including NVIDIA GTC conference talks, CUTLASS library tutorials, CUDA programming books, and open-source projects, covering basic to advanced GPU development techniques.
Learning resources for the CUDA PTX ISA and SASS assembly language, including PTX instruction set architecture documentation, compiler APIs, inline assembly guides, dynamic loading techniques, and other low-level GPU programming materials.
An explanation of Flash Attention, covering parallelization strategies, work partitioning, and supported head dimensions, as well as Flash Attention 2's fused kernels, matrix tiling, causal masking, and other core optimization techniques.