How can I sort GPU data into separate lists owned by CPU objects using Thrust and CUDA?

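I am new to Thrust and would appreciate some guidance on how to do this kind of parallel sort. I have a very large GPU list (1 million+ objects) that I am trying to sort into various CPU containers, where each container has its own device_vector. The idea is to distribute the objects in the GPU list into the device_vectors owned by the CPU containers.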

struct GpuObject
{
    int someData;
    int otherValue;
};

struct CpuContainer
{
    thrust::device_vector<GpuObject>* SortedGpuList;
};

std::vector<CpuContainer*> Containers;

for (int i = 0; i < 100; i++)
{
    Containers.push_back(new CpuContainer());
}

thrust::device_vector<GpuObject>* completeGpuList; 

__device__ __host__ 
void sortIntoContainers(....) 
{ 
    // ... is it possible to sort completeGpuList into Containers[i]->SortedGpuList based on GpuObject.someData?
}

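My first attempt was to build a device_vector (the same size as completeGpuList) holding one int per GpuObject, identifying the container that object maps to. I use thrust::transform with a functor whose operator() returns the container ID for each GpuObject. After that, I call sort_by_key on the original completeGpuList, using the new containerIDList as the keys. But after the sort, how do I efficiently copy all the entries out into their containers without looping over the list?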

Source: 2017-07-18, JMan Mousey

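How about storing all the vectors in one larger matrix? The values are stored in the matrix, and the other fields stay in the objects. For example, use a float* matrix of size 50 * 1M. Vector i then lives at offset 50 * i in that matrix, that is, at (matrix + 50 * i). This is a common way to manage many vectors.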

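You can then sort with thrust::sort_by_key. Before each sort, reset the keys matrix with a simple kernel to [0, 1, ..., 49, 0, 1, ..., 49, ..., 0, 1, ..., 49]. The elements can then be sorted with sort_columns_withIndices, shown below. After sorting, the keys are the indices of the objects.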

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>

// One thread per column: thread i sorts the numRows values of column i
// (stored contiguously at values + i * numRows) and permutes the matching keys.
// Note: how thrust::device behaves inside device code depends on the Thrust
// version and build flags; thrust::seq makes each thread sort sequentially.
extern "C"
__global__ void sort_columns_withIndices(float* values, int* keys, int numRows, int numCols, int descending)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numCols)
    {
        if (descending > 0) {
            thrust::sort_by_key(thrust::device, values + i * numRows, values + (i + 1) * numRows, keys + i * numRows, thrust::greater<float>());
        } else {
            thrust::sort_by_key(thrust::device, values + i * numRows, values + (i + 1) * numRows, keys + i * numRows, thrust::less<float>());
        }
    }
}

Source: 2017-07-19 02:37:00, Tom

Yes, this is basically a compressed sparse row format. The CUSP library has a GPU implementation of it built on Thrust. – talonmies
