A collection of GPU programming learning resources, including NVIDIA GTC conference talks, CUTLASS library tutorials, CUDA programming books, and open-source projects, covering GPU development techniques from basic to advanced.
An explanation of Flash Attention, covering its parallelization strategies, work-partitioning optimizations, and supported head dimensions, as well as Flash Attention 2's fused kernels, matrix tiling, causal masking, and other core optimizations.
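
To make the tiling and causal-masking ideas concrete, here is a minimal pure-Python sketch of block-wise attention with a running ("online") softmax. It is an illustration under stated assumptions, not the actual fused CUDA kernel: the function names `reference_attention` and `tiled_attention` and the `block` parameter are hypothetical, the causal case assumes self-attention with equally many queries and keys, and a real kernel would also tile the queries and keep each tile in on-chip memory.

```python
import math

def reference_attention(Q, K, V, causal=False):
    """Naive softmax attention over lists of row vectors (for comparison)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, q in enumerate(Q):
        n_keys = i + 1 if causal else len(K)  # causal: query i sees keys 0..i
        scores = [scale * sum(a * b for a, b in zip(q, K[j])) for j in range(n_keys)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(n_keys)) / total
                    for c in range(len(V[0]))])
    return out

def tiled_attention(Q, K, V, block=2, causal=False):
    """Same result, but K/V are visited one block at a time.

    Per query row we keep a running max `m`, a running softmax
    denominator `l`, and an unnormalized output accumulator `acc`,
    so the full row of scores is never stored at once.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, q in enumerate(Q):
        m = -math.inf
        l = 0.0
        acc = [0.0] * len(V[0])
        for start in range(0, len(K), block):
            end = min(start + block, len(K))
            if causal:
                if start > i:
                    break              # every key in this block is masked
                end = min(end, i + 1)  # partially masked block: drop future keys
            scores = [scale * sum(a * b for a, b in zip(q, K[j]))
                      for j in range(start, end)]
            m_new = max(m, max(scores))
            correction = math.exp(m - m_new)  # rescales earlier blocks; 0.0 on the first
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [a * correction + sum(p[t] * V[start + t][c] for t in range(len(p)))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])  # normalize once at the end
    return out
```

Skipping fully masked blocks in the causal case is what lets a tiled kernel avoid roughly half of the work for long sequences; only blocks that straddle the diagonal need element-level masking.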