Message-ID: <53965D86.2070803@cn.fujitsu.com>
Date: Tue, 10 Jun 2014 09:21:10 +0800
From: Lai Jiangshan <laijs@...fujitsu.com>
To: <jjherne@...ux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Sasha Levin <sasha.levin@...cle.com>,
Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at at kernel/workqueue.c:2176
On 06/09/2014 10:01 PM, Jason J. Herne wrote:
> On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
>> ------------
>>
>> Subject: [PATCH] sched: migrate the waking tasks
>>
>> When TTWU_QUEUE is enabled, the current code silently skips migrating
>> a waking task.
>>
>> While a task is waking, it is pending on the wake_list of the rq but is
>> not yet on the runqueue (task->on_rq == 0). In this case, both
>> set_cpus_allowed_ptr() and __migrate_task() will not migrate it,
>> because it is not on the runqueue.
>>
>> This behavior is incorrect: the task has already been woken up, so it
>> will keep running on the wrong CPU, without proper placement, until the
>> next wake-up or the next update of its cpus_allowed.
>>
>> To fix this, put the waking tasks on the runqueue (complete their
>> transition to the runnable state) before migrating them.
>>
>> Signed-off-by: Lai Jiangshan <laijs@...fujitsu.com>
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..d05a5a1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>  }
>>
>>  #ifdef CONFIG_SMP
>> -static void sched_ttwu_pending(void)
>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>  {
>> -	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>>  	struct task_struct *p;
>>
>> -	raw_spin_lock(&rq->lock);
>> -
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>>  		llist = llist_next(llist);
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>> +}
>>
>> +static void sched_ttwu_pending(void)
>> +{
>> +	struct rq *rq = this_rq();
>> +
>> +	raw_spin_lock(&rq->lock);
>> +	sched_ttwu_pending_locked(rq);
>>  	raw_spin_unlock(&rq->lock);
>>  }
>>
>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>  		goto out;
>>
>>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> +
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq);
>> +
>>  	if (p->on_rq) {
>>  		struct migration_arg arg = { p, dest_cpu };
>>  		/* Need help from migration thread: drop lock and wait. */
>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>  	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>  		goto fail;
>>
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq_src);
>> +
>>  	/*
>>  	 * If we're not on a rq, the next wake-up will ensure we're
>>  	 * placed properly.
>>
>
> FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.
Thank you for testing; it confirms that we have found the root cause.
Your testing is the most important part. The coding comes second and can
move forward step by step.
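
In case it helps to see the problem outside the kernel, here is a rough
user-space sketch of the race the patch addresses. It is only an
illustration: struct task, queue_wakeup(), process_wake_list() and the
change_affinity_*() helpers below are simplified stand-ins for
task->state/task->on_rq, ttwu_queue(), sched_ttwu_pending() and
set_cpus_allowed_ptr(), not real kernel code.

/*
 * Simplified, single-threaded illustration of the race described in the
 * patch. None of this is kernel code; the fields and helpers are
 * hypothetical stand-ins.
 */
#include <stdio.h>
#include <stdbool.h>

#define TASK_RUNNING	0
#define TASK_WAKING	1	/* the real kernel value differs; illustrative only */

struct task {
	int state;	/* TASK_WAKING while pending on a remote wake list */
	bool on_rq;	/* false until the target CPU flushes its wake list */
	int cpu;	/* CPU the task is currently placed on */
};

/* Stand-in for ttwu_queue(): park the task on the remote CPU's wake list. */
static void queue_wakeup(struct task *p, int cpu)
{
	p->state = TASK_WAKING;
	p->on_rq = false;
	p->cpu = cpu;
}

/* Stand-in for sched_ttwu_pending(): the target CPU activates the task. */
static void process_wake_list(struct task *p)
{
	p->on_rq = true;
	p->state = TASK_RUNNING;
}

/*
 * Stand-in for the old set_cpus_allowed_ptr() behavior: only tasks that
 * are already on a runqueue get migrated, so a TASK_WAKING task slips
 * through and keeps running on the now-disallowed CPU.
 */
static void change_affinity_old(struct task *p, int allowed_cpu)
{
	if (p->on_rq)
		p->cpu = allowed_cpu;	/* migrate */
	/* else: silently skipped -- this is the bug */
}

/* The idea behind the fix: flush the pending wakeup first, then migrate. */
static void change_affinity_fixed(struct task *p, int allowed_cpu)
{
	if (p->state == TASK_WAKING)
		process_wake_list(p);
	if (p->on_rq)
		p->cpu = allowed_cpu;
}

int main(void)
{
	struct task t;

	queue_wakeup(&t, 1);		/* wakeup queued to CPU 1 */
	change_affinity_old(&t, 2);	/* new mask only allows CPU 2 */
	process_wake_list(&t);
	printf("old logic  : task runs on CPU %d (wanted CPU 2)\n", t.cpu);

	queue_wakeup(&t, 1);
	change_affinity_fixed(&t, 2);
	printf("fixed logic: task runs on CPU %d\n", t.cpu);
	return 0;
}

Built with any C compiler, the first line reports the task still on
CPU 1 and the second one on CPU 2, which is the difference the patch
makes in the real scheduler.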
Thanks,
Lai
>
> I see that you guys are still discussing the details of the fix. When you decide on a final solution, I'm happy to retest; just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.
>
> Happy coding.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/