thrust

-1 votes

1 answer

Here is my code: //initialize the device_vector int size = N; thrust::device_vector<glm::vec3> value(size); thrust::device_vector<int> key(size); //get the device pointer of the device_vector //so that I

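0 votes

1 answer

Performance of Thrust vs. cuBLAS

I have a std::vector of matrices of different sizes, and I want to compute the square of each matrix. I have two solutions: 1/ Flatten all the matrices and store them on the device as one large flat array (float *), keep the start and end index of each matrix within that array, and use cuBLAS, for example, to do the squaring. 2/ Store the matrices in a thrust::device_vector<float *> and use thrust::for_each to square them. Obviously, the second solution gives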

1 vote

1 answer

thrust::device_vector: use thrust::replace or thrust::transform with a custom functor/predicate

I am using a CUDA kernel to apply a sigmoid activation to a Thrust vector: thrust::device_vector<float> output = input; float * output_ptr = thrust::raw_pointer_cast(output.data()); sigmoid_activation<<<num_blocks_x,block_threads_x>>>(output_ptr)
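For the thrust::transform route named in the title, a minimal sketch might look like the following. The functor name `Sigmoid` and the wrapper `sigmoid_activation_thrust` are hypothetical names chosen for illustration; they are not from the question.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cmath>

// Hypothetical functor (name is an assumption): element-wise logistic sigmoid.
struct Sigmoid {
    __host__ __device__ float operator()(float x) const {
        return 1.0f / (1.0f + expf(-x));
    }
};

// Hypothetical wrapper: applies Sigmoid to every element of `input`
// and returns the result in a new device vector.
thrust::device_vector<float> sigmoid_activation_thrust(
        const thrust::device_vector<float>& input) {
    thrust::device_vector<float> output(input.size());
    thrust::transform(input.begin(), input.end(), output.begin(), Sigmoid());
    return output;
}
```

Compared with a hand-written kernel, this leaves the launch configuration to Thrust, so no block or thread counts appear in the calling code.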

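0 votes

1 answer

How to sort an array of CUDA vector types

Specifically, how do I sort an array of float3 so that the .x component is the primary sort criterion, the .y component the secondary one, and the .z component the tertiary one? Is there a simple solution that needs only a single call to cub::DeviceRadixSort or thrust::sort_by_key? At the moment I am thinking that maybe I could create an array of uint32 keys, where the first third of each element's digits is taken from the first component of the input array, the first third of the digits is taken from the first digits of the input arr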
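One way to get this ordering without building packed keys is a lexicographic comparator passed to thrust::sort. This is a sketch of a different technique from the radix-sort idea in the question: with a user-defined comparator Thrust does not use its radix sort path. The names `Float3Less` and `sort_float3` are hypothetical, and the sketch assumes the data contains no NaN values, which would break the ordering.

```cuda
#include <cuda_runtime.h>          // float3, make_float3
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical comparator (name is an assumption): orders by .x, then .y, then .z.
struct Float3Less {
    __host__ __device__ bool operator()(const float3& a, const float3& b) const {
        if (a.x != b.x) return a.x < b.x;
        if (a.y != b.y) return a.y < b.y;
        return a.z < b.z;
    }
};

// Hypothetical helper: sorts the device vector in place.
void sort_float3(thrust::device_vector<float3>& v) {
    thrust::sort(v.begin(), v.end(), Float3Less());
}
```

Calling `sort_float3(v)` on a `thrust::device_vector<float3>` is a single Thrust call; whether it beats a multi-pass radix sort on packed keys would have to be measured.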

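1 vote

1 answer

Passing a CUDA array to thrust::inclusive_scan

I can use inclusive_scan on an array on the CPU, but is it possible to do this with an array on the GPU? (The commented-out code is the way I know works, but it is not what I need.) Alternatively, is there another simple way to perform an inclusive scan on an array in device memory? Code: #include <stdio.h> #include <stdlib.h> /* for rand() */ #include <unistd.h> /* for getpid(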
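A raw pointer obtained from cudaMalloc can be handed to Thrust by wrapping it in a thrust::device_ptr, which tells Thrust that the memory lives on the device. The sketch below assumes an int array; the function name `inclusive_scan_device` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>

// Hypothetical helper: in-place inclusive scan over n ints in device memory.
// d_data is assumed to have been allocated with cudaMalloc.
void inclusive_scan_device(int* d_data, int n) {
    // Wrap the raw pointer so Thrust dispatches to the device backend.
    thrust::device_ptr<int> p = thrust::device_pointer_cast(d_data);
    thrust::inclusive_scan(p, p + n, p);
}
```

Passing the raw `int*` directly, without the wrapper or an execution policy, would make Thrust treat it as a host pointer.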

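0 votes

1 answer

Accessing a pointer from __global__ with Thrust

How can I access a pointer from a __global__ function with Thrust? thrust::device_vector<int> testvec; int *send_data = thrust::raw_pointer_cast(&testvec[0]); << <1, 1 >> > (send_data, raw_ptr); I am able to use send_data inside the __global__ function. I am unable to retrieve it thrust::device_ptr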

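0 votes

1 answer

thrust::sort crashes with invalid argument

I am trying to use thrust::sort on device memory, but it crashes at runtime. I also tried disabling debug information generation. Here is a small example: cudaSetDevice(0); int u[10]; int* v; cudaMalloc(&v, 10 * sizeof(int)); for (int i = 0; i < 10 ; i++) u[i] = 10-i; cudaMemcpy(u, v, 10 * si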

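0 votes

1 answer

Getting the unique elements of multiple arrays in CUDA

Problem: there is a large number of arrays, for example 2000, but each array holds only 256 integers. The range of the integers is quite large, for example [0, 1000000]. I want to get the unique elements of each array, in other words remove the duplicates. I have two solutions: Use Thrust to get the unique elements of each array, which means calling thrust::unique 2000 times. But each array is small, so this approach may not perform well. Implement a hash table in a CUDA kernel, using 2000 blocks, each with 256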

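0 votes

1 answer

How to implement reduce by key with Thrust when the keys are strings or char arrays

Input: BC BD BC BC BD CD Output: BC 3 BD 2 CD 1. If I use char as the key type, it works. But it seems Thrust does not support strings as keys. #include <thrust/device_vector.h> #include <thrust/iterator/constant_iterator.h> #include <thrust/reduce.h> #incl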
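For fixed-length keys like the two-character strings in the example, one possible workaround is to pack each string into an integer and reduce on that. The sketch below is an illustration under that assumption, not the asker's code: `pack2` and `count_keys` are hypothetical names, and every input string is assumed to have at least two characters.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <string>
#include <vector>

// Hypothetical helper: packs the first two characters of s into one int key.
inline int pack2(const std::string& s) {
    return (static_cast<unsigned char>(s[0]) << 8) |
            static_cast<unsigned char>(s[1]);
}

// Hypothetical helper: counts how often each packed key occurs.
// Returns the number of distinct keys; the outputs are resized to match.
int count_keys(const std::vector<std::string>& words,
               thrust::host_vector<int>& out_keys,
               thrust::host_vector<int>& out_counts) {
    thrust::host_vector<int> h_keys(words.size());
    for (size_t i = 0; i < words.size(); ++i) h_keys[i] = pack2(words[i]);

    thrust::device_vector<int> keys = h_keys;
    // reduce_by_key only merges adjacent equal keys, so sort first.
    thrust::sort(keys.begin(), keys.end());

    thrust::device_vector<int> uniq(keys.size()), counts(keys.size());
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::constant_iterator<int>(1),
                                      uniq.begin(), counts.begin());
    int m = static_cast<int>(ends.first - uniq.begin());

    out_keys.resize(m);
    out_counts.resize(m);
    thrust::copy(uniq.begin(), uniq.begin() + m, out_keys.begin());
    thrust::copy(counts.begin(), counts.begin() + m, out_counts.begin());
    return m;
}
```

Because the first character occupies the high byte, sorting the packed integers keeps the keys in the same order as sorting the strings. Longer or variable-length strings would need a different encoding.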

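1 vote

1 answer

Thrust exception: "thrust::system::system_error at memory location 00000000"

I wrote this code for a CUDA kernel assign() that uses the device_vector class to initialize a vector. The kernel is launched through a class member function, as a solution to this question: CUDA kernel as member function of a class and according to https://devtalk.nvidia.com/default/topic/573289/mixing-c-and-cuda/. I am using a GTX650Ti GPU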