Why does this code deadlock?

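I created 2 Linux kernel threads in my loadable module and bound them to separate CPU cores on a dual-core Android device. After running this a few times, I noticed that the device reboots with a hardware watchdog timer reset. I can reproduce the problem consistently. What could be causing the deadlock?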

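Basically, what I need is to make sure both threads run do_something() at the same time on different cores, without anybody stealing CPU cycles (i.e. with interrupts disabled). I am using a spinlock and a volatile variable for this. I also have a semaphore that the parent thread uses to wait for the child thread.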

#define CPU_COUNT 2 

/* Globals */ 
spinlock_t lock; 
struct semaphore sem; 
volatile unsigned long count; 

/* Thread util function for binding the thread to CPU*/ 
struct task_struct* thread_init(kthread_fn fn, void* data, int cpu) 
{ 
    struct task_struct *ts; 

    ts=kthread_create(fn, data, "per_cpu_thread"); 
    kthread_bind(ts, cpu); 
    if (!IS_ERR(ts)) { 
     wake_up_process(ts); 
    } 
    else { 
     ERR("Failed to bind thread to CPU %d\n", cpu); 
    } 
    return ts; 
} 

/* Sync both threads */ 
void thread_sync() 
{ 
    spin_lock(&lock); 
    ++count; 
    spin_unlock(&lock); 

    while (count != CPU_COUNT); 
} 

void do_something() 
{ 
} 

/* Child thread */ 
int per_cpu_thread_fn(void* data) 
{ 
    int i = 0; 
    unsigned long flags = 0; 
    int cpu = smp_processor_id(); 

    DBG("per_cpu_thread entering (cpu:%d)...\n", cpu); 

    /* Disable local interrupts */ 
    local_irq_save(flags); 

    /* sync threads */ 
    thread_sync(); 

    /* Do something */ 
    do_something(); 

    /* Enable interrupts */ 
    local_irq_restore(flags); 

    /* Notify parent about exit */ 
    up(&sem); 
    DBG("per_cpu_thread exiting (cpu:%d)...\n", cpu); 
    return 0; 
} 

/* Main thread */ 
int main_thread() 
{ 
    int cpuB; 
    int cpu = smp_processor_id(); 
    unsigned long flags = 0; 

    DBG("main thread running (cpu:%d)...\n", cpu); 

    /* Init globals*/ 
    sema_init(&sem, 0); 
    spin_lock_init(&lock); 
    count = 0; 

    /* Launch child thread and bind to the other CPU core */ 
    if (cpu == 0) cpuB = 1; else cpuB = 0;   
    thread_init(per_cpu_thread_fn, NULL, cpuB); 

    /* Disable local interrupts */ 
    local_irq_save(flags); 

    /* thread sync */ 
    thread_sync(); 

    /* Do something here */ 
    do_something(); 

    /* Enable interrupts */ 
    local_irq_restore(flags); 

    /* Wait for child to join */ 
    DBG("main thread waiting for all child threads to finish ...\n"); 
    down_interruptible(&sem); 
}

Source

2013-08-02 Gupta

I'm not sure this is the actual cause, but your code contains some serious errors.

The first is in while (count != CPU_COUNT);. You must not read a shared variable without holding the lock unless the read is atomic. For count, that is not guaranteed.

You must protect the read of count with the lock. You can replace your while loop with the following code:

unsigned long local_count; 
do { 
    spin_lock(&lock); 
    local_count = count; 
    spin_unlock(&lock); 
} while (local_count != CPU_COUNT);

Alternatively, you can use atomic types. Notice the absence of locking:

atomic_t count = ATOMIC_INIT(0); 

... 

void thread_sync() { 
    atomic_inc(&count); 
    while (atomic_read(&count) != CPU_COUNT); 
}
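
For readers who want to try the atomic-counter idea outside the kernel, here is a minimal user-space sketch. It is an analogue, not the kernel code from the question: it assumes C11 <stdatomic.h> and POSIX threads in place of atomic_t and kthreads, and the names arrived, passed, worker and run_barrier_demo are made up for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

#define THREAD_COUNT 2

/* Shared arrival counter; every access is an atomic operation. */
static atomic_int arrived;

/* Number of threads that got past the barrier (used only for checking). */
static atomic_int passed;

/* Spin until all THREAD_COUNT threads have arrived. */
static void thread_sync(void)
{
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) != THREAD_COUNT)
        ; /* busy-wait; no lock is held while spinning */
}

static void *worker(void *arg)
{
    (void)arg;
    thread_sync();
    atomic_fetch_add(&passed, 1);
    return NULL;
}

/* Hypothetical helper: start THREAD_COUNT workers and return how many
 * got past the barrier, or -1 if a thread could not be created. */
static int run_barrier_demo(void)
{
    pthread_t tid[THREAD_COUNT];
    int started, i;

    atomic_store(&arrived, 0);
    atomic_store(&passed, 0);

    for (started = 0; started < THREAD_COUNT; started++)
        if (pthread_create(&tid[started], NULL, worker, NULL) != 0)
            break;

    /* If creation failed, account for the missing threads so the ones
     * already started are released instead of spinning forever. */
    if (started < THREAD_COUNT)
        atomic_fetch_add(&arrived, THREAD_COUNT - started);

    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == THREAD_COUNT ? atomic_load(&passed) : -1;
}
```

Calling run_barrier_demo() from a test program should report that both workers passed the barrier. The point of the sketch is only that the increment and the polling read both go through atomic operations, so no lock is needed around either.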

The second problem is with interrupts. I think you don't understand what you are doing.

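local_irq_save() saves the current interrupt state and disables interrupts. Then you disable interrupts again with local_irq_disable(). After some work, you restore the previous state with local_irq_restore() and then enable interrupts with local_irq_enable(). That enable is completely wrong: you enable interrupts regardless of their previous state.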
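
To make the save/restore argument concrete, here is a toy user-space model. It is not kernel code: a single bool stands in for the CPU's interrupt-enable flag, and every name in it (irq_enabled, model_irq_save and so on) is hypothetical. It only illustrates why an unconditional enable after a restore loses the caller's state.

```c
#include <stdbool.h>

/* Toy stand-in for the CPU's interrupt-enable flag (not kernel code). */
static bool irq_enabled = true;

/* Model of local_irq_save(): remember the current state, then disable. */
static bool model_irq_save(void)
{
    bool flags = irq_enabled;
    irq_enabled = false;
    return flags;
}

/* Model of local_irq_restore(): put back exactly the saved state. */
static void model_irq_restore(bool flags)
{
    irq_enabled = flags;
}

/* Model of local_irq_enable(): enable unconditionally. */
static void model_irq_enable(void)
{
    irq_enabled = true;
}

/* Correct pairing: the caller's state survives the critical section. */
static void section_restore(void)
{
    bool flags = model_irq_save();
    /* ... work with interrupts off ... */
    model_irq_restore(flags);
}

/* Buggy pairing: the trailing enable overrides the restored state. */
static void section_restore_then_enable(void)
{
    bool flags = model_irq_save();
    /* ... work with interrupts off ... */
    model_irq_restore(flags);
    model_irq_enable();
}

/* Returns the flag as seen by a caller that already had interrupts
 * disabled before entering the given section. */
static bool state_after(void (*section)(void))
{
    irq_enabled = false;
    section();
    return irq_enabled;
}
```

In this model, state_after(section_restore) leaves the flag off, as the caller expects, while state_after(section_restore_then_enable) turns it on behind the caller's back.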

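The third problem: if the main thread is not bound to a CPU, you should not use smp_processor_id() unless you are sure the kernel will not reschedule the thread after you read the CPU number. It is better to use get_cpu(), which disables kernel preemption and then returns the CPU id. When you are done, call put_cpu().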

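However, once you have called get_cpu(), preemption is disabled, and creating and starting other threads in that state is a bug. That is why you should set the affinity of the main thread instead.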

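Fourth: local_irq_save() and local_irq_restore() are macros that take a variable, not a pointer to unsigned long. (I got an error and some warnings when passing pointers; I don't know how you compiled your code.) Your reply: http://pastebin.com/Ven6wqWf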

Source

2013-08-03 01:07:01

Thanks Rasen: removed the references

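The final code can be found here. I fixed the interrupt calls but still see the problem. I can't move while (count != CPU_COUNT) inside the spin_lock(), because it would deadlock immediately. Do you have any other suggestions? My requirement is that both threads start executing do_something() at the same time. – Gupta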

You must protect the read of 'count' with the lock. I have edited my post to show how to do this. –

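Thanks again for the pointers. This still hasn't solved my problem. I see one of the threads spinning while the other thread never increments the counter. Do you see a problem with the way the threads are created? – Gupta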
