Why does this thread management pattern result in a deadlock?

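I'm using a common base class, has_threads, to manage any type that should be allowed to instantiate a boost::thread.

Each has_threads instance owns a set of threads (to support waitAll and interruptAll functions, which I don't include below), and should automatically invoke removeThread (named releaseThread in the code below) when a thread terminates, to keep this set intact.

In my program I have just one of these. A thread is created every 10 seconds, and each performs one database lookup. When the lookup completes, the thread runs to completion and removeThread is invoked; under a mutex lock, the thread object is removed from internal tracking. I can see this working properly from the output ABC.

Once in a while, though, the mechanisms collide. Perhaps removeThread is executed twice concurrently. What I can't figure out is why this results in a deadlock. From that point on, every thread invocation outputs nothing but A. [It's worth noting that I'm using a thread-safe stdlib, and that the problem persists when IOStreams are not used.] Stack traces indicate that the mutex is blocking these threads, but why isn't the lock eventually released by the first thread for the second, by the second for the third, and so on?

Am I missing something fundamental about how scoped_lock works? Is there anything obviously wrong here that could lead to a deadlock despite (or even due to?) the use of a mutex lock?

Sorry for the poor question, but as I'm sure you know, it's next to impossible to provide a real test case for a bug like this.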

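#include <iostream>
#include <set>
#include <signal.h>
#include <pthread.h>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>

class has_threads { 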
    protected: 
     template <typename Callable> 
     void createThread(Callable f, bool allowSignals) 
     { 
      boost::mutex::scoped_lock l(threads_lock); 

      // Create and run thread 
      boost::shared_ptr<boost::thread> t(new boost::thread()); 

      // Track thread 
      threads.insert(t); 

      // Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler) 
      *t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals); 
     } 

    private: 

     /** 
     * Entrypoint function for a thread. 
     * Sets up the on-end handler then invokes the user-provided worker function. 
     */ 
     template <typename Callable> 
     void runThread(Callable f, bool allowSignals) 
     { 
      boost::this_thread::at_thread_exit(
       boost::bind(
        &has_threads::releaseThread, 
        this, 
        boost::this_thread::get_id() 
       ) 
      ); 

      if (!allowSignals) 
       blockSignalsInThisThread(); 


      try { 
       f(); 
      } 
      catch (boost::thread_interrupted& e) { 

       // Yes, we should catch this exception! 
       // Letting it bubble over is _potentially_ dangerous: 
       // http://stackoverflow.com/questions/6375121 

       std::cout << "Thread " << boost::this_thread::get_id() << " interrupted (and ended)." << std::endl; 
      } 
      catch (std::exception& e) { 
       std::cout << "Exception caught from thread " << boost::this_thread::get_id() << ": " << e.what() << std::endl; 
      } 
      catch (...) { 
       std::cout << "Unknown exception caught from thread " << boost::this_thread::get_id() << std::endl; 
      } 
     } 

     void releaseThread(boost::thread::id thread_id) 
     { 
      std::cout << "A"; 
      boost::mutex::scoped_lock l(threads_lock); 

      std::cout << "B"; 
      for (threads_t::iterator it = threads.begin(), end = threads.end(); it != end; ++it) { 

       if ((*it)->get_id() != thread_id) 
        continue; 

       threads.erase(it); 
       break; 
      } 
      std::cout << "C"; 
     } 

     void blockSignalsInThisThread() 
     { 
      sigset_t signal_set; 
      sigemptyset(&signal_set); 
      sigaddset(&signal_set, SIGINT); 
      sigaddset(&signal_set, SIGTERM); 
      sigaddset(&signal_set, SIGHUP); 
      sigaddset(&signal_set, SIGPIPE); // http://www.unixguide.net/network/socketfaq/2.19.shtml 
      pthread_sigmask(SIG_BLOCK, &signal_set, NULL); 
     } 


     typedef std::set<boost::shared_ptr<boost::thread> > threads_t; 
     threads_t threads; 

     boost::mutex threads_lock; 
}; 

struct some_component : has_threads { 
    some_component() { 
     // set a scheduler to invoke createThread(bind(&some_work, this)) every 10s 
    } 

    void some_work() { 
     // usually pretty quick, but I guess sometimes it could take >= 10s 
    } 
};

Source

2011-12-08 Lightness Races in Orbit

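Well, a deadlock can occur if a thread locks a mutex that it has already locked (unless you use a recursive mutex).

If the release part is called a second time by the same thread, as seems to happen in your code, you get a deadlock.

I haven't studied your code in detail, but you will probably have to redesign (simplify?) it to make sure a lock cannot be acquired twice by the same thread. You could also add a safeguard that checks ownership of the lock...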
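As a rough illustration of such an ownership safeguard, here is a minimal sketch. It uses std::mutex rather than boost::mutex so that it is self-contained, and checked_mutex and detects_relock are hypothetical names that exist in neither Boost nor the standard library. The wrapper records which thread holds the lock and throws on a second lock() from that thread instead of blocking forever.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical wrapper: remembers the owning thread and refuses a
// second lock() from that same thread instead of self-deadlocking.
class checked_mutex {
public:
    void lock() {
        // Only the owner itself can ever observe its own id here.
        if (owner_.load() == std::this_thread::get_id())
            throw std::logic_error("recursive lock attempt");
        m_.lock();
        owner_.store(std::this_thread::get_id());
    }

    void unlock() {
        owner_.store(std::thread::id());  // reset to "no thread"
        m_.unlock();
    }

private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Returns true if relocking from the same thread is reported
// as an exception rather than hanging.
bool detects_relock() {
    checked_mutex m;
    std::lock_guard<checked_mutex> outer(m);
    try {
        std::lock_guard<checked_mutex> inner(m);  // would deadlock a plain mutex
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Swapping a wrapper like this in for threads_lock during debugging would turn a silent hang into an exception at the offending call site, if the problem really is a recursive lock.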

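EDIT: As said in my comment and in IronMensan's answer, one possible case is that the thread stops during creation, with at_exit being called before the mutex locked in the creation part of your code has been released.

EDIT 2:

Well, with a mutex and a scoped lock, I can only imagine a recursive lock, or a lock that is never released. The latter could happen if, for instance, a loop becomes infinite because of memory corruption.

I suggest adding more logging that includes the thread ID, to check whether there is a recursive lock or something else odd. Then I would check that the loop is correct. I would also check that at_exit is called only once per thread...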
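A minimal sketch of what such thread-ID logging could look like, assuming standard C++11 facilities instead of Boost; tagged and log_event are hypothetical helper names, not part of the code in the question.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: builds "[thread <id>] <event>".
std::string tagged(const std::string& event) {
    std::ostringstream os;
    os << "[thread " << std::this_thread::get_id() << "] " << event;
    return os.str();
}

// Hypothetical helper: writes one complete line per call.
// std::cerr is unbuffered, so the line is visible even if the
// program hangs right afterwards.
void log_event(const std::string& event) {
    static std::mutex log_lock;  // leaf lock: nothing else is acquired while it is held
    std::lock_guard<std::mutex> l(log_lock);
    std::cerr << tagged(event) << '\n';
}
```

Calling log_event("releaseThread: waiting for threads_lock") just before the scoped_lock and log_event("releaseThread: got threads_lock") just after it would show which thread is stuck and whether the same ID appears twice without a release in between.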

One more thing: check what effect erasing a thread (and thus calling its destructor) has while you are inside the at_exit function...

My 2 cents.

Source

2011-12-08 17:24:44 neuro

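Surely the deadlock could only occur if 'releaseThread', directly or indirectly, ends up calling itself (or 'createThread')? I don't see how that could be...

@Tomalak: My guess is that the thread can stop before the lock taken during creation is released, so that the release is invoked recursively... – neuro

Oh, that's a point... I wouldn't have expected it, but I suppose it's possible!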

You might need to do something like this:

template <typename Callable> 
    void createThread(Callable f, bool allowSignals) 
    { 
     // Create and run thread 
     boost::shared_ptr<boost::thread> t(new boost::thread()); 

     { 
      boost::mutex::scoped_lock l(threads_lock); 

      // Track thread 
      threads.insert(t); 
     } 

     //Do not hold threads_lock while starting the new thread in case 
     //it completes immediately 

     // Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler) 
     *t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals); 
    }

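In other words, use threads_lock exclusively to protect threads.

Update:

To expand on something from the comments, with some speculation about how boost::thread works, the locking pattern could look something like this: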

createThread:

1. (createThread) acquires threads_lock
2. (boost::thread::operator=) acquires the boost::thread internal lock
3. (boost::thread::operator=) releases the boost::thread internal lock
4. (createThread) releases threads_lock

Thread exit handler:

1. (at_thread_exit) acquires the boost::thread internal lock
2. (releaseThread) acquires threads_lock
3. (releaseThread) releases threads_lock
4. (at_thread_exit) releases the boost::thread internal lock

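If those two boost::thread locks are the same lock, the potential for deadlock is obvious. But this is speculation, because most of the Boost code scares me and I try not to look at it.

createThread could/should be reworked so that step 4 moves up between steps 1 and 2, which eliminates the potential deadlock.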
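The reworked ordering can be sketched with two plain std::mutex objects. This is only a model of the speculation above: internal_lock is a hypothetical stand-in for whatever lock boost::thread may hold internally, and create_path, exit_path and run_both are made-up names. The creation path never holds both locks at once, so the exit path can nest them without forming a cycle.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex threads_lock;   // protects the tracking set (name taken from the question)
std::mutex internal_lock;  // hypothetical stand-in for a boost::thread internal lock
int tracked = 0;           // guarded by threads_lock
int started = 0;           // guarded by internal_lock

// Reworked creation path: release threads_lock before touching
// the (assumed) internal lock, so the two are never held together.
void create_path() {
    {
        std::lock_guard<std::mutex> l(threads_lock);
        ++tracked;  // stands in for threads.insert(t)
    }
    {
        std::lock_guard<std::mutex> l(internal_lock);
        ++started;  // stands in for starting the thread
    }
}

// Exit path as speculated: internal lock first, threads_lock nested inside.
void exit_path() {
    std::lock_guard<std::mutex> outer(internal_lock);
    std::lock_guard<std::mutex> inner(threads_lock);
    --tracked;  // stands in for threads.erase(it)
}

// Runs both paths concurrently; returns the final value of tracked.
int run_both(int rounds) {
    tracked = 0;
    started = 0;
    std::thread a([rounds] { for (int i = 0; i < rounds; ++i) create_path(); });
    std::thread b([rounds] { for (int i = 0; i < rounds; ++i) exit_path(); });
    a.join();
    b.join();
    return tracked;
}
```

If create_path instead took internal_lock while still holding threads_lock, the two threads could each end up holding one lock and waiting for the other, which is the deadlock described above.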

Source

2011-12-08 17:28:25 IronMensan

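The 'shared_ptr' needs to be declared before the scope. – Xeo

You've found a flaw! I'm not entirely convinced that my threads could run that quickly, but it needs fixing regardless, and I'll give it a go. Thanks!

@Xeo Thanks, fixed. – IronMensan

The created thread might finish before createThread is done, or during the assignment operator. Using an event queue or some other structure might be necessary. A simpler, if less elegant, solution might also work. Don't change createThread, because you have to use threads_lock to protect both threads itself and the thread objects it points to. Instead, change runThread to this: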

template <typename Callable> 
    void runThread(Callable f, bool allowSignals) 
    { 
     //SNIP setup 

     try { 
      f(); 
     } 
     //SNIP catch blocks 

     //ensure that createThread is complete before this thread terminates 
     boost::mutex::scoped_lock l(threads_lock); 
    }
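For comparison, the event-queue alternative mentioned above might look roughly like the following. It is a sketch under assumptions, not the asker's code: it uses std::thread instead of boost::thread, and reaping_tracker, create, reap, tracked and demo are hypothetical names. Workers only report that they have finished; the owning thread joins and erases them later, so no thread ever removes itself from the set.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical std::thread-based stand-in for has_threads.
class reaping_tracker {
public:
    ~reaping_tracker() {
        // Join whatever is still tracked so no worker outlives the object.
        for (auto& entry : threads_)
            if (entry.second.joinable()) entry.second.join();
    }

    void create(std::function<void()> f) {
        std::lock_guard<std::mutex> l(threads_lock_);
        const int key = next_key_++;
        threads_.emplace(key, std::thread([this, f, key] {
            f();
            // Only report completion; never touch threads_ from a worker.
            std::lock_guard<std::mutex> q(queue_lock_);
            finished_.push(key);
        }));
    }

    // Called from the owning thread: joins and forgets finished workers.
    std::size_t reap() {
        std::vector<int> done;
        {
            std::lock_guard<std::mutex> q(queue_lock_);
            while (!finished_.empty()) {
                done.push_back(finished_.front());
                finished_.pop();
            }
        }
        std::lock_guard<std::mutex> l(threads_lock_);
        for (int key : done) {
            auto it = threads_.find(key);
            it->second.join();  // the worker has already reported, so this is brief
            threads_.erase(it);
        }
        return done.size();
    }

    std::size_t tracked() {
        std::lock_guard<std::mutex> l(threads_lock_);
        return threads_.size();
    }

private:
    std::mutex threads_lock_;  // protects threads_ and next_key_
    std::mutex queue_lock_;    // protects finished_
    std::map<int, std::thread> threads_;
    std::queue<int> finished_;
    int next_key_ = 0;
};

// Starts n workers, reaps them all, and returns how many ran
// (or -1 if something is still tracked afterwards).
int demo(int n) {
    std::atomic<int> work_done(0);
    reaping_tracker t;
    for (int i = 0; i < n; ++i)
        t.create([&work_done] { ++work_done; });
    std::size_t reaped = 0;
    while (reaped < static_cast<std::size_t>(n)) {
        reaped += t.reap();
        std::this_thread::yield();
    }
    return t.tracked() == 0 ? work_done.load() : -1;
}
```

Because workers take only queue_lock_ and the owner never holds both locks at once, there is no lock-ordering cycle. The cost is that the owner has to call reap() periodically, for example from the same scheduler that creates the threads.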

Source

2011-12-08 21:10:35 IronMensan
