GPU materials
GPU lectures
CUTLASS: CUDA Template Library for Dense Linear Algebra at All Levels and Scales cutlass-gtc2018
Programming Tensor Cores: Native Volta Tensor Cores with CUTLASS gtc-2019
Developing CUDA Kernels to Push Tensor Cores to the Absolute Limit on NVIDIA A100 cutlass-gtc2020
Accelerating Backward Data Gradient by Increasing Tensor Core Utilization in CUTLASS cutlass-gtc2022
Use CUTLASS to Fuse Multiple GEMMs to Extreme Performance cutlass-gtc2022 (Chinese)
Auto48: A General Framework for Automatic Model Compression and Acceleration using Int4/Int8 Mixed Precision cutlass-gtc2022
https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/