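Moving execution from one thread to another to implement task parallelism and call-by-future

I'm trying to implement a call-by-future mechanism in C++. Although this is just test code (written in a bit of a hurry), I intend to use something similar in the runtime of a language I'm working on, to get transparent parallelism.

I've trimmed down the code I'm working on to make it a little smaller, but it's still quite large: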

#include <cstdlib> 
#include <cstdio> 
#include <iostream> 
#include <vector> 
#include <queue> 
#include <future> 
#include <thread> 
#include <functional> 
#include <type_traits> 
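#include <utility>
#include <mutex>     // std::mutex, std::recursive_mutex, std::lock_guard
#include <chrono>    // std::chrono::seconds
#include <exception> // std::exception_ptr, std::current_exception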
using namespace std; 
using namespace std::chrono; 

//------------------------------------------------------------------------------ 
// Simple locked printer 

static std::recursive_mutex print_lock; 

inline void print_() { 
    return; 
}; 

template<typename T, typename... Args> 
inline void print_(T t, Args... args) { 
    print_lock.lock(); 
    std::cout << t; 
    print_(args...); 
    print_lock.unlock(); 
}; 
//------------------------------------------------------------------------------ 

template<typename R> 
class PooledTask { 
    public: 
    explicit PooledTask(function<R()>); 

    // Possibly execute the task and return the value 
    R &operator()() {

        // If we can get the lock, we're not executing
        if(lock.try_lock()) {

            // We may already have executed it
            if(done)
                goto end;

            // Otherwise, execute it now
            try {
                result = move(task());
            } catch(...) {
                // If an exception is thrown, save it for later
                eptr = current_exception();
                failed = true;
            };

            done = true;

            goto end;

        } else {

            // Wait until the task is completed
            lock.lock();

            end: {
                lock.unlock();

                // Maybe we got an exception!
                if(failed)
                    rethrow_exception(eptr);

                // Otherwise, just return the result
                return result;
            };
        };
    };

    private: 
    exception_ptr eptr; 
    function<R()> task; 
    bool done; 
    bool failed; 
    mutex lock; 
    R result; 
}; 

extern class TaskPool pool; 

class TaskPool { 
    public: 
    TaskPool() noexcept: TaskPool(thread::hardware_concurrency() - 1) { 
     return; 
    }; 

    TaskPool(const TaskPool &) = delete; 
    TaskPool(TaskPool &&) = delete; 

    template<typename T> 
    void push(PooledTask<T> *task) noexcept {

        lock_guard<mutex> guard(lock);

        builders.push([=] {
            try {
                (*task)();
            } catch(...) {
                // Ignore it here! The task will save it. :)
            };
        });

    };

    ~TaskPool() { 
     // TODO: wait for all tasks to finish... 
    }; 
    private: 
    queue<thread *> threads; 
    queue<function<void()>> builders; 
    mutex lock; 

    TaskPool(signed N) noexcept {
        while(N --> 0)
            threads.push(new thread([this, N] {
                for(;;) {

                    pop_task();

                };
            }));
    };

    void pop_task() noexcept {

        lock.lock();

        if(builders.size()) {

            auto task = builders.front();

            builders.pop();

            lock.unlock();

            task();

        } else
            lock.unlock();
    };

} pool; 


template<typename R> 
PooledTask<R>::PooledTask(function<R()> fun): 
    task(fun), 
    done(false), 
    failed(false) 
{ 
    pool.push(this); 
}; 

// Should probably return a std::shared_ptr here... 
template<typename F, typename... Args> 
auto byfuture(F fun, Args&&... args) noexcept -> 
    PooledTask<decltype(fun(args...))> * 
{ 

    using R = decltype(fun(args...)); 

    auto pooled = new PooledTask<R> { 
    bind(fun, forward<Args>(args)...) 
    }; 

    return pooled; 
}; 


//------------------------------------------------------------------------------ 
#include <map> 

// Get the current thread id as a simple number 
static int myid() noexcept { 
    static unsigned N = 0; 
    static map<thread::id, unsigned> hash; 
    static mutex lock; 

    lock_guard<mutex> guard(lock); 

    auto current = this_thread::get_id(); 

    if(!hash[current]) 
    hash[current] = ++N; 

    return hash[current]; 
}; 
//------------------------------------------------------------------------------ 

//------------------------------------------------------------------------------ 
// The fibonacci test implementation 
int future_fib(int x, int parent) { 

    if(x < 3) 
    return 1; 

    print_("future_fib(", x, ")", " on thread ", myid(),
     ", asked by thread ", parent, "\n");

    auto f1 = byfuture(future_fib, x - 1, myid()); 
    auto f2 = byfuture(future_fib, x - 2, myid()); 

    auto res = (*f1)() + (*f2)(); 

    delete f1; 
    delete f2; 

    return res; 
}; 
//------------------------------------------------------------------------------ 

int main() { 
    // Force main thread to get id 1 
    myid(); 

    // Get task 
    auto f = byfuture(future_fib, 8, myid()); 

    // Make sure it starts on the task pool 
    this_thread::sleep_for(seconds(1)); 

    // Blocks 
    (*f)(); 

    // Simply wait to be sure all threads are clean 
    this_thread::sleep_for(seconds(2)); 

    // 
    return EXIT_SUCCESS; 
};

The output of this program looks like this (I have a quad-core, so there are 3 threads in the pool):

future_fib(8) on thread 2, asked by thread 1 
future_fib(7) on thread 3, asked by thread 2 
future_fib(6) on thread 4, asked by thread 2 
future_fib(6) on thread 3, asked by thread 3 
future_fib(5) on thread 4, asked by thread 4 
future_fib(5) on thread 3, asked by thread 3 
future_fib(4) on thread 4, asked by thread 4 
future_fib(4) on thread 3, asked by thread 3 
future_fib(3) on thread 4, asked by thread 4 
future_fib(3) on thread 3, asked by thread 3 
future_fib(3) on thread 4, asked by thread 4 
future_fib(3) on thread 3, asked by thread 3 
future_fib(4) on thread 4, asked by thread 4 
future_fib(4) on thread 3, asked by thread 3 
future_fib(3) on thread 4, asked by thread 4 
future_fib(3) on thread 3, asked by thread 3 
future_fib(5) on thread 3, asked by thread 3 
future_fib(4) on thread 3, asked by thread 3 
future_fib(3) on thread 3, asked by thread 3 
future_fib(3) on thread 3, asked by thread 3

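This implementation is much slower than a plain Fibonacci function.

So here is the problem: when the pool runs fib(8), it creates two tasks that will run on the next threads, but when it reaches auto res = (*f1)() + (*f2)();, both tasks are already running, so it blocks on f1 (which is running on thread 3).

To gain speed, what I need is for thread 2, instead of blocking on f1, to take over execution of whatever thread 3 is doing, leaving thread 3 free to pick up another task, so that no thread sleeps while there is computation to do.

This article (http://bartoszmilewski.com/2011/10/10/async-tasks-in-c11-not-quite-there-yet/) says that what I want is necessary, but it doesn't specify how to do it.

My question is: how could I possibly do that?

Are there other alternatives for doing what I want?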

Source

2014-02-20 paulotorrens

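How about the [Threading Building Blocks (TBB) library](https://www.threadingbuildingblocks.org/)? It provides a concurrent task system with a thread pool. – yohjp

Have you looked at the C++1z `.then()` proposal? `return pooled_fib(x-2).then([x](auto&& r1){ auto r2 = pooled_fib(x-1); return r1.get() + r2.get(); });` or somesuch. – Yakk

I think you may have a chance with the resumable functions currently proposed for C++ standardization. The proposal has not been approved yet, but Visual Studio 15 CTP implements it, so you can try making a prototype (if you can use the MSVC compiler).

Gor Nishanov (one of the authors of the latest proposal paper) describes a very similar example, computing Fibonacci numbers with "parent-stealing scheduling", starting at 23:47 in his CppCon talk: https://www.youtube.com/watch?v=KUhSjfSbINE

Note, however, that I couldn't find any sources or samples of the spawnable<T> implementation, so you may need to contact the proposal authors for details.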

Source

2014-12-21 08:19:16

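Looks really good! I will try to contact him. Thank you very much! :) – paulotorrens

After reading the paper, indeed: it really is the same idea (I was trying to build a stealing scheduler by hand), and by reading his approach I could see what I was doing wrong. Now I can finish my work! :) – paulotorrens

Looking at your code, it is full of things that take longer than computing fib 8.

For example, switching to kernel space to find out what the thread ID is will, on most flavors of Windows, most likely take longer than the work being done here.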
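To make the granularity point concrete, here is a minimal sketch of a sequential cutoff: only the top few levels of the recursion hand work to another thread, and everything below that runs as a plain function call. The names `serial_fib` and `cutoff_fib` and the `depth` parameter are assumptions made for illustration, and `std::async` stands in for the question's pool.

```cpp
#include <cassert>
#include <future>

// Plain recursive Fibonacci, with the same base case as the question's code.
int serial_fib(int x) {
    return x < 3 ? 1 : serial_fib(x - 1) + serial_fib(x - 2);
}

// Spawn threads only for the top `depth` levels of the recursion
// (hypothetical tuning knob); below that, fall back to serial_fib so the
// per-task overhead is paid a handful of times instead of once per call.
int cutoff_fib(int x, int depth) {
    assert(x >= 0 && depth >= 0);
    if (x < 3)
        return 1;
    if (depth == 0)
        return serial_fib(x);

    // One branch runs on a new thread, the other on the current thread.
    auto left = std::async(std::launch::async, cutoff_fib, x - 1, depth - 1);
    int right = cutoff_fib(x - 2, depth - 1);
    return left.get() + right;
}
```

Calling `cutoff_fib(20, 2)` starts at most three extra threads. The best value for `depth` depends on the machine and the workload, so treat it as something to measure rather than a recommendation.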

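Parallelization is not about having a bunch of threads compete for shared memory. That is the worst mistake you can make.

When you parallelize a task, you break the output into discrete chunks so that each parallel thread writes to its own memory, avoiding the memory and cache contention that would bog your application down.

When you have 3 threads touching 3 separate memory locations, you never need locks or other synchronization primitives, which on most flavors of Windows also require a kernel-mode switch.

So the only thing you really need to know is when the threads are all done. This can be achieved with any of the Interlocked Exchange methods or with OS-driven event handles.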
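As a rough illustration of the "discrete chunks" idea, the sketch below sums a vector with each worker writing only to its own output slot. Here `std::thread::join` plays the role of the "all done" signal, as a portable stand-in for the Interlocked or event-handle approaches mentioned above. `chunked_sum` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` using `workers` threads. Each worker reads its own chunk and
// writes only to its own slot of `partial`, so no lock is needed.
long long chunked_sum(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    // Ceiling division so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&data, &partial, w, chunk] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            long long local = 0; // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i)
                local += data[i];
            partial[w] = local;
        });
    }

    // The only synchronization point: wait until every worker is done.
    for (auto& t : threads)
        t.join();

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

This works because the input splits cleanly up front. The recursive fib in the question does not split that way, which is why the comments below keep coming back to scheduling.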

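If you want to be a serious developer, remove the thread IDs, remove the locking code, and start thinking about how to tackle this without them.

Consider 2 cars on a 2-lane highway. One is faster than the other. You never know which car is ahead of the other. Ask yourself: is there a way to position these cars in the two lanes so that it doesn't matter who is ahead or who is moving faster? You should come to the conclusion that as long as each car stays in its own lane, there will never be a problem. That is parallelization at its simplest.

Now consider that you will be spawning these jobs on different machines on different continents. Does it make sense to try to exchange information about threads and memory? No, it doesn't. You simply break the problem down into discrete functional chunks that have almost nothing to do with one another, forget about over-controlling, and let the magic of the information age happen.

I hope this helps.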

Source

2014-02-20 08:31:07 Dan

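This is just an example; of course, in the real case the code will be full of things that take longer than fib. You can replace the locks with a lock-free queue. That does not solve the problem that f1 implicitly has to wait for f2 before operator+ can execute. What is needed is a way to release the thread instead of waiting for a signal, whether that is a lock, a spin-wait on an interlocked compare-exchange (which is how a lock is usually implemented), or moving on to execute another task (i.e. via task stealing) –

Addendum: operator+ must execute after f1 and f2, and the goal is not to change the code to use continuations, nor to change the build by using a preprocessor such as CPC (Continuation-Passing C). What other approach could you take without locking? –

Exactly. Of course, I didn't keep the `print_` with the thread ID in the speed tests. The problem is that the work is only shared between threads 3 and 4, because 2 is blocked, waiting. I tried executing a task this morning, but I ended up with a `bus error: 10`, which I think was caused by a stack overflow (ironically). Well, I think that may be the problem, and maybe I can get some improvement by using spinlocks instead of mutexes. Perhaps even if I use all 3 threads it will still be slower than the serial version. I just want to try it, since I need automatic parallelism. – paulotorrens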
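For what it's worth, the "execute another task instead of waiting" idea from these comments can be sketched in a few dozen lines. This is a minimal illustration, not the resumable-functions approach from the other answer: a thread that needs a result keeps running queued tasks until that result is ready. All names here (`HelpingPool`, `IntTask`, `spawn`, `wait_helping`, `helped_fib`) are hypothetical, exceptions are not handled, and the helped tasks nest on the waiting thread's stack, which is likely the kind of stack growth described above.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class HelpingPool {
public:
    explicit HelpingPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                while (!stop_.load())
                    if (!run_one()) std::this_thread::yield();
            });
    }
    ~HelpingPool() {
        stop_ = true;
        for (auto& w : workers_) w.join();
    }
    void submit(std::function<void()> job) {
        std::lock_guard<std::mutex> guard(m_);
        q_.push_back(std::move(job));
    }
    // Run one queued job on the calling thread; false if the queue is empty.
    bool run_one() {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> guard(m_);
            if (q_.empty()) return false;
            job = std::move(q_.back()); // newest first keeps nesting shallow
            q_.pop_back();
        }
        job(); // executed outside the lock
        return true;
    }
private:
    std::mutex m_;
    std::deque<std::function<void()>> q_;
    std::atomic<bool> stop_{false};
    std::vector<std::thread> workers_;
};

struct IntTask {
    std::atomic<bool> done{false};
    int value = 0;
};

std::shared_ptr<IntTask> spawn(HelpingPool& pool, std::function<int()> fn) {
    auto t = std::make_shared<IntTask>();
    pool.submit([t, fn] {
        t->value = fn();
        t->done.store(true, std::memory_order_release);
    });
    return t;
}

// Instead of blocking, the waiting thread works on whatever is queued.
int wait_helping(HelpingPool& pool, const std::shared_ptr<IntTask>& t) {
    while (!t->done.load(std::memory_order_acquire))
        if (!pool.run_one()) std::this_thread::yield();
    return t->value;
}

int helped_fib(HelpingPool& pool, int x) {
    if (x < 3) return 1;
    auto f1 = spawn(pool, [&pool, x] { return helped_fib(pool, x - 1); });
    int r2 = helped_fib(pool, x - 2); // one branch stays on this thread
    return wait_helping(pool, f1) + r2;
}
```

With a pool created as `HelpingPool pool(3);`, a call such as `helped_fib(pool, 15)` keeps every thread busy, including the caller. It still spins when the queue is empty and still pays a mutex per task, so it addresses the idle-thread problem only, not the per-task overhead.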
