A collection of GPU programming learning resources, including NVIDIA GTC conference talks, CUTLASS library tutorials, CUDA programming books, and open-source projects, covering basic through advanced GPU development techniques.
An analysis of GPU instruction throughput and latency, detailing the performance characteristics of different instruction types and the instruction execution capability of each SM, as reference data for GPU programming optimization.
An explanation of Flash Attention, covering parallelization strategies, work partitioning, and supported head dimensions, along with the core optimizations in Flash Attention 2 such as fused kernels, matrix tiling, and causal masking.
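
To make the tiling and causal-masking ideas in the Flash Attention entry concrete, here is a minimal sketch in plain Python. It is not the Flash Attention kernel itself: it handles a single query row on the CPU and only illustrates the running-maximum (online softmax) bookkeeping that lets attention be computed one key/value tile at a time. The function names, the `block` parameter, and the list-of-lists layout are assumptions made for illustration.

```python
import math


def naive_attention_row(q, keys, values, q_pos, causal=True):
    """Reference: softmax over all visible keys at once, for one query row."""
    scale = 1.0 / math.sqrt(len(q))
    n = min(q_pos + 1, len(keys)) if causal else len(keys)
    scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in range(n)]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(weights[j] * values[j][c] for j in range(n)) / total
            for c in range(d_v)]


def tiled_attention_row(q, keys, values, q_pos, block=2, causal=True):
    """Same result, but keys/values are visited one tile of `block` rows at a time.

    Only a running max `m`, a running denominator `l`, and an unnormalized
    output accumulator `acc` are kept, so the full score row is never stored.
    """
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(values[0])
    m = -math.inf          # running maximum of the scores seen so far
    l = 0.0                # running softmax denominator
    acc = [0.0] * d_v      # running unnormalized output
    for start in range(0, len(keys), block):
        if causal and start > q_pos:
            break          # every key in this tile is masked: skip the tile
        end = min(start + block, len(keys))
        if causal:
            end = min(end, q_pos + 1)   # mask keys after the query position
        scores = [scale * sum(a * b for a, b in zip(q, keys[j]))
                  for j in range(start, end)]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)   # rescale earlier partial results
        l *= correction
        acc = [a * correction for a in acc]
        for j, s in zip(range(start, end), scores):
            p = math.exp(s - m_new)
            l += p
            for c in range(d_v):
                acc[c] += p * values[j][c]
        m = m_new
    return [a / l for a in acc]
```

Calling `tiled_attention_row(q, keys, values, q_pos)` with any positive `block` should agree with `naive_attention_row` up to floating-point rounding. In a GPU kernel the per-tile work would be matrix multiplies on tiles held in on-chip memory, which this sketch does not attempt to model.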