你的情况正确的转换不是GHz的:
fprintf(stdout, "%d:%ld=%f(ms)\n", i,runtime[i], (runtime[i]/1.62)*1000.0);
^^^^
但赫兹:
fprintf(stdout, "%d:%ld=%f(ms)\n", i,runtime[i], (runtime[i]/1620000000.0f)*1000.0);
^^^^^^^^^^^^^
在维分析:
clock cycles
clock cycles/-------------- = seconds
second
的第一项是时钟周期测量。第二项是GPU的频率(赫兹,而不是GHz),第三项是期望的测量(秒)。您可以通过1000
乘以秒转换成毫秒,这里有一个工作的例子,显示了一个与设备无关的方式做到这一点(这样你就不必硬编码时钟频率):
$ cat t1306.cu
#include <stdio.h>
const long long delay_time = 1000000000;
const int nthr = 1;
const int nTPB = 256;
__global__ void kernel(long long *clocks){
int idx=threadIdx.x+blockDim.x*blockIdx.x;
long long start=clock64();
while (clock64() < start+delay_time);
if (idx < nthr) clocks[idx] = clock64()-start;
}
int main(){
int peak_clk = 1;
int device = 0;
long long *clock_data;
long long *host_data;
host_data = (long long *)malloc(nthr*sizeof(long long));
cudaError_t err = cudaDeviceGetAttribute(&peak_clk, cudaDevAttrClockRate, device);
if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
err = cudaMalloc(&clock_data, nthr*sizeof(long long));
if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
kernel<<<(nthr+nTPB-1)/nTPB, nTPB>>>(clock_data);
err = cudaMemcpy(host_data, clock_data, nthr*sizeof(long long), cudaMemcpyDeviceToHost);
if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
printf("delay clock cycles: %ld, measured clock cycles: %ld, peak clock rate: %dkHz, elapsed time: %fms\n", delay_time, host_data[0], peak_clk, host_data[0]/(float)peak_clk);
return 0;
}
$ nvcc -arch=sm_35 -o t1306 t1306.cu
$ ./t1306
delay clock cycles: 1000000000, measured clock cycles: 1000000210, peak clock rate: 732000kHz, elapsed time: 1366.120483ms
$
这使用cudaDeviceGetAttribute
来获得时钟速率,它返回的结果为kHz,这使得我们可以在这种情况下轻松计算毫秒。
除以赫兹的数量,而不是GHz。除以1620000000.0f'。时钟周期除以时钟周期每秒给你的秒数。将秒数乘以1000得到毫秒数。 –
@RobertCrovella,现在按预期工作,谢谢!如果您以此作为答案,我很乐意将其标记为已接受。 – John