C++ Intel TBB and Microsoft PPL: how to use next_permutation in a parallel loop?

I have Visual Studio 2012 with Intel Parallel Studio 2013 installed, so I have Intel TBB.

Say I have the following code:

const int cardsCount = 12; // will be READ by all threads 
// the required number of cards of each colour to complete its set: 
// NOTE that the required number of cards of each colour is not the same as the total number of cards of this colour available 
int required[] = {2,3,4}; // will be READ by all threads 
Card cards[cardsCount]; // will be READ by all threads 
int cardsIndices[cardsCount];// this will be permuted, permutations need to be split among threads ! 

// set "cards" to 4 cards of each colour (3 colours total = 12 cards) 
// set cardsIndices to {0,1,2,3...,11} 

// this variable will be written to by all threads, maybe have one for each thread and combine them later?? or can I use concurrent_vector<int> instead !? 
int logColours[] = {0,0,0}; 

int permutationsCount = fact(cardsCount); 

for (int pNum=0; pNum<permutationsCount; pNum++) // I want to make this loop parallel !! 
{ 
    int countColours[3] = {0,0,0}; // local loop variable, no problem with multithreading 
    for (int i=0; i<cardsCount; i++) 
    { 
     Card c = cards[cardsIndices[i]]; // accessed "cards" 

     countColours[c.Colour]++; // local loop variable, np. 
      // we got the required number of cards of this colour to complete it 
     if (countColours[c.Colour] == required[c.Colour]) // read global variable "required" ! 
     { 
        // log that we completed this colour and go to next permutation 
      logColours[c.Colour] ++; // should I use a concurrent_vector<int> for this shared variable? 
      break; 
     } 
    } 
    std::next_permutation(cardsIndices, cardsIndices+cardsCount); // !! this is my main issue 
}

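What I am computing is how many times each colour gets completed if we pick cards at random from the available ones. I do this exhaustively by going through every possible permutation and picking the cards in order; as soon as a colour is "completed", we break and move on to the next permutation. Note that we have 4 cards of each colour, but the required numbers of cards for red, green and blue are {2,3,4}. 2 red cards are enough to complete red and we have 4 available, so red is more likely to be completed than blue, which requires all 4 of its cards to be picked.

I want to make this for-loop parallel, but my main problem is how to handle the permutations of the cards. If I have 4 threads, I could split the permutations into 4 different regions and let each thread go through one of them.

What if I don't know the number of cores of the machine and I want the program to choose the right number of concurrent threads automatically? Surely there is a way to do this with the Intel or Microsoft tools?

Here is my Card struct, just in case: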

struct Card 
{ 
public: 
    int Colour; 
    int Symbol; 
};


2013-04-27 Space monkey

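You can easily make your code run in parallel on 1, 2, ..., or cardsCount threads by fixing the first element of the permutation and calling std::next_permutation on the remaining elements independently in each thread. Consider the following code: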

// declarations 

// #pragma omp parallel may be here 
{ // start of a parallel section 
    const int start = (cardsCount * threadIndex)/threadNumber; 
    const int end = (cardsCount * (threadIndex + 1))/threadNumber; 

    int cardsIndices[cardsCount]; // a local array for each thread 

    for (int firstElement = start; firstElement < end; ++firstElement) { // not const: it is incremented 
        cardsIndices[0] = firstElement; 
        // fill the other slots with 0..cardsCount-1 in increasing order, skipping firstElement 
        for (int i = 0, pos = 1; i < cardsCount; ++i) 
            if (i != firstElement) cardsIndices[pos++] = i; 
        do { 
            // your calculations go here 
        } while (std::next_permutation(cardsIndices + 1, cardsIndices + cardsCount)); // note the +1 here 
    } 
}

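If you want to use OpenMP as the parallelization tool, you only have to add #pragma omp parallel just before the parallel section, and use the omp_get_thread_num() function to get the thread index (omp_get_num_threads() gives the number of threads).
You also don't have to use a concurrent_vector here; it would probably make your program very slow. Use thread-specific accumulation arrays instead: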
```
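int logColours[threadNumber][3] = {}; // one row per thread; threadNumber must be a compile-time constant here 
++logColours[threadIndex][c.Colour]; // each thread only writes to its own row; sum the rows at the end 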
```
If Card is a rather heavy class, I would suggest using const Card& c = ... instead of copying it with Card c = ... every time.
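To make the idea above concrete, here is a minimal sketch that uses plain std::thread instead of OpenMP. The names scanPermutation, countParallel and Tally are made up for this illustration, and the deck is passed in as a vector so that a small deck can be used for checking. Each thread owns a range of first elements, permutes only the tail, and keeps its own tally, which is merged after all threads have joined. The same per-first-element loop body could presumably be handed to tbb::parallel_for or concurrency::parallel_for instead.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <thread>
#include <vector>

struct Card { int Colour; int Symbol; };

using Tally = std::array<long long, 3>; // completions per colour

// Walk one permutation and record the first colour that reaches its required count.
void scanPermutation(const std::vector<Card>& cards, const std::vector<int>& order,
                     const std::array<int, 3>& required, Tally& log) {
    std::array<int, 3> seen{};
    for (int idx : order) {
        const Card& c = cards[idx];
        if (++seen[c.Colour] == required[c.Colour]) {
            ++log[c.Colour];
            break;
        }
    }
}

// Splits the permutations by their first element across threadCount threads.
Tally countParallel(const std::vector<Card>& cards, const std::array<int, 3>& required,
                    unsigned threadCount) {
    assert(threadCount >= 1);
    const unsigned n = static_cast<unsigned>(cards.size());
    std::vector<Tally> perThread(threadCount, Tally{});
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            const unsigned start = n * t / threadCount;
            const unsigned end = n * (t + 1) / threadCount;
            Tally local{};               // thread-private accumulator
            std::vector<int> order(n);   // thread-private index array
            for (unsigned first = start; first < end; ++first) {
                order[0] = static_cast<int>(first);
                // Remaining indices in increasing order, so next_permutation visits them all.
                for (unsigned i = 0, pos = 1; i < n; ++i)
                    if (i != first) order[pos++] = static_cast<int>(i);
                do {
                    scanPermutation(cards, order, required, local);
                } while (std::next_permutation(order.begin() + 1, order.end()));
            }
            perThread[t] = local;        // single write to the shared vector
        });
    }
    for (std::thread& w : workers) w.join();
    Tally total{};
    for (const Tally& part : perThread)
        for (int c = 0; c < 3; ++c) total[c] += part[c];
    return total;
}
```

With more threads than cards, some threads simply get an empty range, so the useful thread count is capped at the number of cards.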


2013-04-27 10:46:29 Ixanezis

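You can use std::thread::hardware_concurrency() from <thread>. To quote A. Williams, "C++ Concurrency in Action":

One feature of the C++ Standard Library that helps here is std::thread::hardware_concurrency(). This function returns an indication of the number of threads that can truly run concurrently for a given execution of a program. On a multicore system it might be the number of CPU cores, for example.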
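As a small sketch of how this could be used here: the helper name pickThreadCount is hypothetical, and the cap reflects the assumption that work is split by the first element of the permutation, so more threads than cards would be idle. std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined, so the sketch falls back to one thread.

```cpp
#include <algorithm>
#include <thread>

// Returns a thread count between 1 and maxUseful (for example, cardsCount).
unsigned pickThreadCount(unsigned maxUseful) {
    unsigned hw = std::thread::hardware_concurrency(); // 0 means "unknown"
    if (hw == 0) hw = 1;                               // conservative fallback
    return std::max(1u, std::min(hw, maxUseful));
}
```

A call such as pickThreadCount(cardsCount) would then supply the value used as threadNumber in the partitioning code.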


2013-04-27 10:47:58 SChepurin

Let N = cardsNumber and M = required[0] * required[1] * ... * required[maxColor - 1]. Then your problem can actually be solved easily in O(N * M) time. In your case, that is 12 * 2 * 3 * 4 = 288 operations. :)

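One possible approach is to use a recurrence relation. Consider a function logColours f(n, required). Here n is the number of cards already considered and required is a vector, as in your example. The function returns the answer in the vector logColours. You are interested in the result for the full deck of 12 cards with required = {2,3,4}; since n counts the cards already placed, the top-level call in the code below is f(0, {2,3,4}). The function f, with simple memoization of repeated calculations inside, can be written like this: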

// cache and maxColor are assumed to be declared elsewhere 
std::vector<int> f(int n, std::vector<int> require) { 
    if (cache[n].count(require)) { 
        // we have already calculated the function with the same arguments, do not recalculate it 
        return cache[n][require]; 
    } 

    std::vector<int> logColours(maxColor, 0); // maxColor = 3 in your example 

    for (int putColor=0; putColor<maxColor; ++putColor) { 
        if (/* there is still at least one card with color 'putColor'*/) { 
            // put a card of color 'putColor' on place 'n' 
            if (require[putColor] == 1) { 
                // means we've reached the needed amount of cards of color 'putColor' 
                ++logColours[putColor]; 
            } else { 
                --require[putColor]; 
                std::vector<int> logColoursRec = f(n+1, require); 
                ++require[putColor]; 
                // merge the child array into your own 
                for (int i=0; i<maxColor; ++i) 
                    logColours[i] += logColoursRec[i]; 
            } 
        } 
    } 

    // store logColours in the cache entry for these function arguments 
    // (the parameter is named 'require', so that is the key to use) 
    cache[n][require] = std::move(logColours); 
    return cache[n][require]; 
}

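The cache can be implemented as std::unordered_map<int, std::unordered_map<std::vector<int>, std::vector<int>>>. Note that the standard library provides no std::hash specialization for std::vector<int>, so the inner map needs a custom hash function (or use std::map, which only needs operator<).

Once you understand the main idea, you can implement it with more efficient code.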
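For illustration, the cache type could be declared roughly as follows. VectorHash, InnerCache and Cache are names invented for this sketch, and the hash-combining formula is one common choice rather than anything prescribed by the answer.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hash functor for std::vector<int> keys, folding the element hashes together.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::size_t seed = v.size();
        for (int x : v) {
            seed ^= std::hash<int>{}(x) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// n -> (remaining requirements -> logColours result)
using InnerCache = std::unordered_map<std::vector<int>, std::vector<int>, VectorHash>;
using Cache = std::unordered_map<int, InnerCache>;
```

With a global Cache cache; the lookups cache[n].count(require) and cache[n][require] from the function above compile unchanged.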


2013-04-27 11:54:14 Ixanezis

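Hmm.. I don't get the idea, but I can see that you don't take into account which cards I have. How does this work if you don't know the cards? In my example I have 4 cards of each colour and 3 colours in total, so 12 cards. I edited my question to show that.. – 2013-04-27 13:22:38

I do take it into account. There is an 'if' in the 'f' function that checks how many cards of the needed colour are left. I assumed you could fill in that code yourself. – Ixanezis 2013-04-27 17:33:12

Please take a look at the full implementation: http://ideone.com/9ibXaa – Ixanezis 2013-04-27 18:29:57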

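I guess this is what @Ixanezis means; here is an amateur-friendly version.

If red wins, the cards picked up to that point are: 2 red, 0-2 green, 0-3 blue.

Call the winning red card A and the other red card B; there are 12 ways to choose A and B.

The possible cases are: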

Cases            #Cards after A    #Cards before A   #pick green   #pick blue 
0 green, 0 blue  10! = 3628800     1! = 1            1             1 
0 green, 1 blue   9! = 362880      2! = 2            1             4 
0 green, 2 blue   8! = 40320       3! = 6            1             6 
0 green, 3 blue   7! = 5040        4! = 24           1             4 
1 green, 0 blue   9! = 362880      2! = 2            4             1 
1 green, 1 blue   8! = 40320       3! = 6            4             4 
1 green, 2 blue   7! = 5040        4! = 24           4             6 
1 green, 3 blue   6! = 720         5! = 120          4             4 
2 green, 0 blue   8! = 40320       3! = 6            6             1 
2 green, 1 blue   7! = 5040        4! = 24           6             4 
2 green, 2 blue   6! = 720         5! = 120          6             6 
2 green, 3 blue   5! = 120         6! = 720          6             4

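Take the sum-product of those 4 columns (multiply the four values in each row, then add up the rows): 29064960. Then multiply by 12: 348779520.

The values for green wins and blue wins can be calculated in the same way.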
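The table can be turned into a short program that handles all three colours. This is only a sketch of the counting argument above: the names countWins, choose and factorial are made up for the illustration, and the deck layout (3 colours, 4 cards each) is taken from the question.

```cpp
#include <array>
#include <cstdint>

constexpr int kColours = 3;
constexpr int kPerColour = 4;
constexpr int kCards = kColours * kPerColour;

std::int64_t factorial(int n) {
    std::int64_t r = 1;
    for (int i = 2; i <= n; ++i) r *= i;
    return r;
}

std::int64_t choose(int n, int k) {
    if (k < 0 || k > n) return 0;
    return factorial(n) / (factorial(k) * factorial(n - k));
}

// Number of permutations of the deck in which colour `winner` is the first
// colour to reach its required count.
std::int64_t countWins(int winner, const std::array<int, kColours>& required) {
    const int a = (winner + 1) % kColours;  // the two other colours
    const int b = (winner + 2) % kColours;
    const int own = required[winner] - 1;   // same-colour cards before the winning card
    // Ways to pick the winning card (A) and the `own` same-colour cards before it.
    const std::int64_t ownWays = kPerColour * choose(kPerColour - 1, own);
    std::int64_t total = 0;
    for (int ka = 0; ka < required[a]; ++ka) {      // other colours stay below their requirement
        for (int kb = 0; kb < required[b]; ++kb) {
            const int before = own + ka + kb;       // cards placed before A
            const int after = kCards - 1 - before;  // cards placed after A
            total += choose(kPerColour, ka) * choose(kPerColour, kb)
                   * factorial(before) * factorial(after);
        }
    }
    return ownWays * total;
}
```

For red (colour 0) with required = {2,3,4}, the inner sum reproduces the twelve rows of the table and ownWays is the factor 12. Every permutation completes exactly one colour first, so the three results should add up to 12!.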


2013-04-28 16:07:13 colinfang
