An analysis of GPU instruction throughput and latency, covering the performance characteristics of different instruction types and the instruction execution capability of each SM, as reference data for GPU programming optimization.
Read more »

A collection of GPU programming learning resources, including NVIDIA GTC conference talks, CUTLASS library tutorials, CUDA programming books, and open source projects, covering GPU development techniques from basic to advanced.
Read more »

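An introduction to new NVIDIA GPU features, including the Volta SIMT model and Cooperative Groups on the V100, and asynchronous copy, asynchronous barriers, task graph acceleration, and 2:4 structured sparsity on the A100.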
Read more »

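The Roofline model, a GPU performance analysis tool for identifying compute performance bottlenecks, which helps developers understand how arithmetic intensity and memory bandwidth affect performance.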
Read more »

Learning resources for the CUDA PTX ISA and SASS assembly language, including PTX instruction set architecture documentation, compiler APIs, inline assembly guides, dynamic loading techniques, and other low-level GPU programming materials.
Read more »

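A detailed look at NVIDIA Tensor Core technology, including the architectural features, computational capabilities, and performance figures of first-, second-, and third-generation Tensor Cores, and how their implementations differ across GPU architectures.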
Read more »

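A comprehensive overview of optimization techniques for large-model training, covering core techniques such as the Megatron framework, compute optimization (operator fusion, mixed precision, communication fusion), GPU memory optimization (recomputation, offload), and parallelism optimization (data parallelism, model parallelism, pipeline parallelism).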
Read more »

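An introduction to the MLIR compiler infrastructure, covering compiler techniques such as Dialect design (types, attributes, operations, interfaces), Dialect conversion, code conversion, transformation, translation, and Pass optimization.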
Read more »