GPU Instruction Throughput

Instruction Throughput

Instruction Latencies and Instructions/SM

Little’s Law

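Number of threads required = latency × throughput
(latency in clock cycles, throughput in operations per clock cycle; this assumes each thread keeps one independent operation in flight at a time)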
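A minimal Python sketch of Little's law as stated above. The function names and the example figures (a 4-cycle latency and 128 operations per cycle per SM) are hypothetical and chosen only to make the arithmetic concrete. The optional `ops_in_flight_per_thread` parameter is an added assumption: if each thread can overlap that many independent operations (instruction-level parallelism), the number of threads needed is divided by that factor.

```python
def required_parallelism(latency_cycles: float, throughput_per_cycle: float) -> float:
    """Little's law: operations that must be in flight to sustain full throughput."""
    return latency_cycles * throughput_per_cycle


def required_threads(latency_cycles: float, throughput_per_cycle: float,
                     ops_in_flight_per_thread: int = 1) -> float:
    """Threads needed if each thread keeps `ops_in_flight_per_thread`
    independent operations in flight (hypothetical ILP factor)."""
    ops = required_parallelism(latency_cycles, throughput_per_cycle)
    return ops / ops_in_flight_per_thread


# Illustrative numbers only, not measured on any specific GPU:
# 4-cycle latency, 128 operations completed per cycle per SM.
print(required_threads(4, 128))     # one op in flight per thread
print(required_threads(4, 128, 2))  # two independent ops per thread
```

With these numbers the first call gives 512 threads and the second 256, which shows why extra independent work per thread can substitute for more threads.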

Arithmetic Instruction Latency
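Applied to arithmetic instructions, the same formula is often expressed in warps rather than threads. The sketch below is a hypothetical helper, not taken from any NVIDIA tool: it assumes 32 threads per warp, one independent arithmetic operation in flight per thread, and example latencies and a per-SM throughput of 128 operations per cycle that are placeholders for the values of your target architecture.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_to_hide_arithmetic_latency(latency_cycles: int, ops_per_cycle_per_sm: int) -> int:
    """Little's law per SM, rounded up to whole warps
    (assumes one independent arithmetic op in flight per thread)."""
    threads_needed = latency_cycles * ops_per_cycle_per_sm
    return math.ceil(threads_needed / WARP_SIZE)


# Hypothetical latencies (cycles) with a hypothetical 128 ops/cycle/SM.
for latency in (4, 6, 20):
    print(latency, warps_to_hide_arithmetic_latency(latency, 128))
```

Rounding up with `math.ceil` reflects that warps are scheduled as whole units, so a fractional requirement still costs a full warp.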

Memory Instruction Latency

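Bytes read per clock cycle = memory bandwidth / clock frequency
(bytes per second divided by cycles per second gives bytes per cycle)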
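Combining this conversion with Little's law gives the number of bytes that must be in flight to saturate memory bandwidth. The sketch below assumes a hypothetical device (900 GB/s bandwidth, 1.5 GHz clock, 500-cycle memory latency) and that the latency is measured in the same clock used for the conversion; the 4-byte load per thread is likewise an illustrative assumption.

```python
def bytes_per_cycle(bandwidth_bytes_per_s: float, clock_hz: float) -> float:
    """Memory bandwidth expressed per clock cycle: (bytes/s) / (cycles/s)."""
    return bandwidth_bytes_per_s / clock_hz


def bytes_in_flight(latency_cycles: float, bandwidth_bytes_per_s: float,
                    clock_hz: float) -> float:
    """Little's law for memory: bytes that must be outstanding to saturate bandwidth."""
    return latency_cycles * bytes_per_cycle(bandwidth_bytes_per_s, clock_hz)


# Hypothetical device parameters, for illustration only.
bw = 900e9    # bytes per second
clk = 1.5e9   # cycles per second
per_cycle = bytes_per_cycle(bw, clk)       # bytes per cycle
in_flight = bytes_in_flight(500, bw, clk)  # bytes outstanding at 500-cycle latency
threads = in_flight / 4                    # assumes one 4-byte load outstanding per thread
print(per_cycle, in_flight, threads)
```

Here the device delivers 600 bytes per cycle, so roughly 300,000 bytes (about 75,000 threads each with one 4-byte load outstanding) must be in flight across the whole GPU under these assumptions.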