A collection of GPU programming learning resources, including NVIDIA GTC conference talks, CUTLASS library tutorials, CUDA programming books, and open-source projects, covering GPU development techniques from basic to advanced.

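An introduction to new NVIDIA GPU features, including the Volta SIMT model and Cooperative Groups on the V100, as well as asynchronous copy, asynchronous barriers, task graph acceleration, 2:4 structured sparsity, and other advanced techniques on the A100.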

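The Roofline model, a GPU performance analysis tool for identifying compute performance bottlenecks. It helps developers understand how arithmetic intensity and memory bandwidth affect performance.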

Learning resources for the CUDA PTX ISA and SASS assembly language, including PTX instruction set architecture documentation, compiler APIs, inline assembly guides, dynamic loading techniques, and other low-level GPU programming materials.

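A detailed look at NVIDIA Tensor Core technology, including the architectural features, compute capabilities, and performance figures of the first-, second-, and third-generation Tensor Cores, and how their implementations differ across GPU architectures.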

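A comprehensive look at optimization techniques for large-model training, including the Megatron framework, compute optimization (operator fusion, mixed precision, communication fusion), GPU memory optimization (recomputation, offload), parallelism optimization (data parallelism, model parallelism, pipeline parallelism), and other core techniques.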

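A roundup of course resources on machine learning and parallel computing, including MLSys systems courses, links to GPU parallel programming courses, and high-performance computing lab resources, drawn from well-known universities such as CMU, EPFL, and the University of Washington.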

An explanation of Flash Attention, including parallelization strategies, work partitioning optimization, and supported head dimensions, along with Flash Attention 2's fused kernels, matrix tiling, causal masking, and other core optimization techniques.